WP 4 Annotations

Task 1: participate in the rapid prototyping process, together with members of WP 2 (portal), WP 3 (DSM runtime), WP 5 (transform tools), and WP 7 (end application), to get a minimum viable system up and running. This creates a high degree of communication between partners early in the project, which uncovers hidden assumptions, develops shared ideas about approach, defines clean interfaces and interactions between partners, work packages, and deliverable pieces, and increases the smoothness and pace of progress for the rest of the project.

Task 2: define the high level annotations
  • Deliverable: detailed specification of high level annotations for use with OpenMP
  • Deliverable: detailed specification of high level annotations for use with Reo
  • Deliverable: detailed specification of high level annotations for use with C

Task 3: define the low level annotations that are output by the transform tools defined in WP 5
  • Deliverable: detailed specification of low level annotations

Task 4: identify the portions of the end user applications that are to be written in terms of CloudDSM, and rewrite those portions. This task is performed in a rapid prototyping, rapid iteration style, in close cooperation with the partners performing Task 1 and the partners performing WP 5 Tasks 1 through 3. This creates close collaboration among the partners who write the end user applications, those who define the annotations used in writing them, and those who implement the tools that take the annotations as input. The version of the annotations produced is not final; it is the candidate used going forward in creating the transform tools, and it will be modified if issues are uncovered during implementation of the tools, or if issues propagate up from the DSM runtime implementation.

Deliverable: first version of three end user applications written in terms of the CloudDSM annotations, one application for each version of the annotations. These versions neither compile nor run, but they are written in the candidate form of the high level annotations.

  • Task 1: Define more precisely the class of applications that CloudDSM targets
    • Milestone M2.: Month 12 --
    • Deliverable D2.: Month 12 --
    • Milestone M2.: Month 24 --
    • Deliverable D2.: Month 24 --
    • Milestone M2.: Month 36 --
    • Deliverable D2.: Month 36 --
  • Task 2: Define the needs of the toolchain: the degrees of freedom it needs in order to accomplish the desired transforms of the source code.
    • Milestone M2.: Month 12 --
    • Deliverable D2.: Month 12 --
    • Milestone M2.: Month 24 --
    • Deliverable D2.: Month 24 --
    • Milestone M2.: Month 36 --
    • Deliverable D2.: Month 36 --
  • Task 3: Define the needs of the runtime system: the characteristics it needs in the code generated by the toolchain in order to deliver high performance.
    • Milestone M2.: Month 12 --
    • Deliverable D2.: Month 12 --
    • Milestone M2.: Month 24 --
    • Deliverable D2.: Month 24 --
    • Milestone M2.: Month 36 --
    • Deliverable D2.: Month 36 --
  • Task 4: Define the needs of the application developer: the mental models, the syntax, and the debugging and code-checking support they desire.
    • Milestone M2.: Month 12 --
    • Deliverable D2.: Month 12 --
    • Milestone M2.: Month 24 --
    • Deliverable D2.: Month 24 --
    • Milestone M2.: Month 36 --
    • Deliverable D2.: Month 36 --
  • Task 5: Integrate the results of tasks 1 through 4 into a specification of the interfaces used by the application developer. There will be two levels of code annotation. The first, higher level is the one that application developers see and use, and it will have many variations. For example, Reo will have a different high level user interface than the pragma system for OpenMP. The second, lower level will be common to all versions of the higher level interface, and will be used directly by the toolchain to perform code transforms. Each of the higher level forms will be translated into the same common lower level form. This task only considers the top level forms of the code; WP 5 separately defines the common lower level form.
    • Milestone M2.: Month 12 --
    • Deliverable D2.: Month 12 --
    • Milestone M2.: Month 24 --
    • Deliverable D2.: Month 24 --
    • Milestone M2.: Month 36 --
    • Deliverable D2.: Month 36 --
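
The two-level structure described in Task 5 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `LowLevelAnnotation` record, the `divisible_loop` kind, and both translator functions are hypothetical, and the dictionary passed to `from_reo_like` is a made-up stand-in rather than real Reo syntax. The only point is that two different high level forms can map onto one identical low level form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LowLevelAnnotation:
    """Hypothetical common low level form consumed by the toolchain."""
    kind: str    # what the toolchain may do with the region, e.g. "divisible_loop"
    target: str  # name of the annotated code region


def from_openmp_pragma(pragma: str, target: str) -> LowLevelAnnotation:
    """Translate a simplified OpenMP-style pragma line (assumed front end)."""
    tokens = pragma.split()
    if tokens[:4] != ["#pragma", "omp", "parallel", "for"]:
        raise ValueError(f"unsupported pragma: {pragma!r}")
    return LowLevelAnnotation(kind="divisible_loop", target=target)


def from_reo_like(connector: dict) -> LowLevelAnnotation:
    """Translate a made-up Reo-style description (not real Reo syntax)."""
    if connector.get("pattern") != "replicate":
        raise ValueError(f"unsupported connector: {connector!r}")
    return LowLevelAnnotation(kind="divisible_loop", target=connector["component"])


# Two different high level forms describing the same region.
a = from_openmp_pragma("#pragma omp parallel for", "scale_rows")
b = from_reo_like({"pattern": "replicate", "component": "scale_rows"})
print(a == b)
```

In this sketch the toolchain would only ever need to understand `LowLevelAnnotation`, so adding a further high level variant means adding one more translator and nothing else.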

Tasks 1 through 5 will be performed iteratively, with multiple revisions during the first six months of the project. Each task affects the others, so a large amount of communication, over several iterations, will be required to find a common interface that supports all of these aspects well.

  • Task 6: Development tools to support the writing of application code. This includes code checkers that enforce restrictions on what coding practices are allowed inside the portions of the application that employ the CloudDSM system. It also includes debugging aids, to detect bugs and to narrow down the portion of the application code causing discrepancies from specified behavior.
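
As a rough illustration of the kind of code checker Task 6 has in mind, the sketch below scans C source text and reports restricted calls that appear inside a CloudDSM region. The region markers and the list of restricted calls are purely hypothetical; the actual restrictions and the way regions are delimited remain to be defined by the work package.

```python
import re

# Hypothetical region markers and restriction list, chosen only for illustration.
BEGIN = "/* clouddsm:begin */"
END = "/* clouddsm:end */"
FORBIDDEN = ("malloc", "free")


def check_source(text):
    """Return (line number, call name) for each restricted call inside a region."""
    violations = []
    inside = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped == BEGIN:
            inside = True
        elif stripped == END:
            inside = False
        elif inside:
            for name in FORBIDDEN:
                # Match the name as a whole word followed by an opening parenthesis.
                if re.search(rf"\b{name}\s*\(", line):
                    violations.append((lineno, name))
    return violations


SAMPLE = """\
int *a = malloc(64);
/* clouddsm:begin */
int *b = malloc(64);
b[0] = 1;
/* clouddsm:end */
free(a);
"""

violations = check_source(SAMPLE)
print(violations)
```

Only the call on line 3 of the sample is reported, because the other two calls lie outside the marked region. A production checker would work on a parsed representation rather than on raw text.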

-- Sean and INRIA and Douglas Connect will lead, with input from XLAB, Imperial, and partners for WP3 tasks 2 through 6.

Questions and Comments

Question: "Why use OpenMP?" Answer: Many existing parallel libraries are written in OpenMP. One early project goal was to hit the ground running with code that is relevant to industry and readily usable by industry. Using OpenMP libraries during development of the low-level interface allows compiler work to begin almost immediately on relevant, ready-to-run code.

Comment and Question: "It may provide benefit desired by end-users if location-aware high-level annotations were also provided, such as regions, effects, etc. With these, the programmers will be able to communicate placement information. We found that for Myrmics, the programmer can know more about placement than the runtime/compiler can infer, and once placement/allocation is done properly, the shared-mem abstraction still helps with coding, except that the manual placement provides superior locality and performance. The programmer knows about and can say things about locality (like the X10 places, or regions in Fortress)." Question: "Does it make sense to expose location within a Cloud-system that automatically and dynamically changes the number, type, and location of machines assigned to the computation?" Answer: You have nailed the heart of what this work package is all about. This will be an ongoing discussion during the first six to nine months of the project. Indeed, one desire is to capture the understanding that programmers have in their heads and use when they specify pieces of work and choose, themselves, where to place each piece. The goal of the WP is to discover an encoding of that mental process, so that programmers can express it in a parameterized way. The automation then chooses the parameter values and plugs them into what the programmer provided, which delivers pieces of work and the affinities among them. The content of the WP is the work of discovering programmer abstractions that get us as close to that goal as possible, in a way that we know how to implement.
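
One way to picture the parameterized encoding described in this answer is the sketch below. It assumes a very simple case: the programmer supplies a divider for a range of rows, and the automation supplies the number of pieces. The function names, the contiguous-range division, and the neighbor-based notion of affinity are all illustrative assumptions, not a proposed design.

```python
def divide_rows(num_rows, num_pieces):
    """Programmer-supplied divider: split rows into contiguous pieces.

    Returns the pieces as (start, end) half-open ranges, plus affinities as
    pairs of piece indices that touch neighboring rows.
    """
    if num_pieces < 1 or num_pieces > num_rows:
        raise ValueError("num_pieces must be between 1 and num_rows")
    base, extra = divmod(num_rows, num_pieces)
    pieces = []
    start = 0
    for i in range(num_pieces):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first pieces
        pieces.append((start, start + size))
        start += size
    affinities = [(i, i + 1) for i in range(num_pieces - 1)]
    return pieces, affinities


def choose_num_pieces(num_rows, available_cores):
    """Stand-in for the automation: one piece per core, capped by the row count."""
    return max(1, min(num_rows, available_cores))


pieces, affinities = divide_rows(10, choose_num_pieces(10, 4))
print(pieces)
print(affinities)
```

The programmer never states where a piece runs. The divider only says how the work splits for any given parameter value and which pieces are related, which leaves the automation free to change the parameter as the set of machines changes.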

Question: "any hints on what is the common low-level form of the source that is produced by the development toolchain? is it annotated source code?" Answer: Figuring this out is the content of WP4. Albert would like a source code annotation form, for project logistics reasons. In that case, by-hand modification of existing OpenMP libraries can begin at once and act as a test bed for rapid iterations on what the best low-level form is. At the same time, compiler techniques can be tried, and high level end-user annotations can be tested for how they "feel" to application programmers. The DSM-specializer can be worked on in tight iterations with figuring out what the best low-level representation should be. Any desired changes in representation are just done quickly by hand, with no need to fiddle with IR, and the work is reasonably decoupled from the high level annotation form.

Question: "How does the IBM fat-binary specializer interact with the DSM runtime system?" Answer: As far as I understand, the re-optimizer interrupts work in progress, changes the executable, then resumes the work. It does not control what work is assigned to what core. The optimizations it performs are single-thread optimizations. In addition, the re-optimizer may be told by the DSM runtime or by the portal to adjust the code so that the size of the chunk of work performed between DSM calls changes, or so that the layout or access pattern of data changes. It is not yet clear whether the re-optimizer tool will make decisions on its own about the best chunk size. It might communicate performance feedback to the DSM runtime, with optimal chunk size decisions made there, or those decisions may be passed along to the portal. Wherever they are ultimately made, it will be up to the Dorit tool to modify the code so that it actually performs work in the chosen chunk size. The details of how the DSM runtime will interact with the Dorit tool still remain to be worked out.
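
The idea of a chunk size that is chosen elsewhere and then applied to the code can be sketched as follows. This is only an illustration under stated assumptions: `run_in_chunks` and `counting_sync` are hypothetical names, and the sync callback is a stand-in for whatever a DSM call turns out to be. The sketch takes no position on which component picks the chunk size.

```python
def run_in_chunks(data, chunk_size, work, dsm_sync):
    """Apply `work` to every item, calling `dsm_sync` once after each chunk."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        results.extend(work(x) for x in chunk)
        dsm_sync()  # hypothetical stand-in for a DSM call between chunks
    return results


sync_calls = []


def counting_sync():
    """Record each simulated DSM call so the two runs can be compared."""
    sync_calls.append(1)


def square(x):
    return x * x


out_small = run_in_chunks(list(range(8)), 2, square, counting_sync)
calls_small = len(sync_calls)

sync_calls.clear()
out_large = run_in_chunks(list(range(8)), 4, square, counting_sync)
calls_large = len(sync_calls)

print(calls_small, calls_large, out_small == out_large)
```

Both runs produce the same results, while the larger chunk size halves the number of simulated DSM calls. That is the trade-off a chunk size decision would be adjusting, wherever the decision is made.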