WP 5 Transform Tools

  • Task 0: participate in the rapid prototyping process, together with members of WP 2 (portal), WP 3 (DSM runtime), WP 4 (annotations), and WP 7 (end application), in order to get a minimum viable system up and running. This creates a high degree of communication between partners early in the project, so as to uncover hidden assumptions, develop a shared view of the approach, define clean interfaces and interactions between partners, workpackages, and deliverables, and increase the smoothness and pace of progress for the rest of the project.

Task 1: create a transform tool that takes as input the application source code, in the form of C code with OpenMP pragmas plus additional high level DSM related annotations. It produces as output a transformed version of the source code that carries lower level DSM annotations, which identify data regions and the access patterns on that data. The exact annotations will be defined by the tasks of WP 4. Led by INRIA, with involvement by DC and SL regarding implications for the application programmer.
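To make the intended input/output relationship concrete, here is a minimal sketch. The pragma spellings, clouddsm for the high level form and clouddsm_ll for the low level form, are invented for this example only; the real syntax of the high level annotations is defined by WP 4, and the low level form is defined within this WP.

    /* Hypothetical input file (before transformation): C with OpenMP plus one
       high level DSM annotation. The pragma spelling is a placeholder. */
    #pragma clouddsm shared(a, b) pattern(stream)
    void scale(double *a, const double *b, int n, double f)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            a[i] = f * b[i];
    }

    /* Hypothetical output file (after transformation): the same kernel, now
       carrying low level annotations that name each data region and the access
       pattern on it. Again, the spelling is illustrative only. */
    #pragma clouddsm_ll region(a, 0, n) access(write, sequential)
    #pragma clouddsm_ll region(b, 0, n) access(read, sequential)
    void scale(double *a, const double *b, int n, double f)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            a[i] = f * b[i];
    }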

Task 2: create a transform tool, similar to that of Task 1, but one that takes as input source snippets intended as the CCFs of a Reo language program, which include DSM related annotations. It produces as output a transformed version of the source that carries the same form of low level annotations as in Task 1. Led by CWI, with involvement by DC and SL regarding implications for the application programmer, and by INRIA, providing advice and expertise in creating the tool.

Task 3: create a transform tool as in Tasks 1 and 2, but one that takes standard C code as input, with high level annotations related to DSM. The tool produces output with the same low level annotations as in Tasks 1 and 2. Led by INRIA, with involvement by DC and SL regarding implications for the application programmer.

Task 4: create a transform tool that takes as input the outputs of the tools defined in Tasks 1 through 3. This tool focuses only on the portions of code indicated by the low level annotations. For example, one set of such annotations may indicate portions of code that act as a computation kernel, while another may indicate code that packages the data used by such a kernel. The tool transforms the indicated code in order to arrive at a granularity of data that is appropriate to a particular physical machine configuration. Led by INRIA, with involvement by Gigas regarding machine details.
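One plausible shape for such a rewrite is ordinary loop blocking, sketched below. The 64 KB chunk size and the macro names are invented placeholders, not values defined by the project; the real tool would derive the granularity from the description of the target machine.

    /* Illustrative only: the kernel from the earlier sketch, rewritten so that
       each pass over the data touches one machine-appropriate chunk. */
    #define CHUNK_BYTES 65536
    #define CHUNK_ELEMS ((int)(CHUNK_BYTES / sizeof(double)))

    void scale_blocked(double *a, const double *b, int n, double f)
    {
        for (int base = 0; base < n; base += CHUNK_ELEMS) {
            int end = base + CHUNK_ELEMS;
            if (end > n)
                end = n;
            /* each outer iteration becomes one DSM-sized unit of work */
            for (int i = base; i < end; i++)
                a[i] = f * b[i];
        }
    }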

Task 5: create a compiler that takes as input the low level annotated outputs of Tasks 1 through 3. It produces as output a fat binary, in the form expected by a runtime JIT recompiler that incrementally replaces code with optimized versions specialized to low level hardware details. Led by IBM, with input from INRIA regarding the details of the low level annotations and advice on making use of the annotation information.

Task 6: create a runtime JIT compiler that takes as input a fat binary and incrementally replaces portions of the binary with optimized versions of the code, specialized to the low level hardware details. Where applicable, the JIT takes advantage of information that the compiler from Task 5 inserts into the fat binary, which is derived from the low level DSM annotations. The JIT uses this information to fine tune the granularity of DSM calls and to arrange the data accesses for high performance. The JIT interfaces to the DSM runtime by inserting calls to the DSM dynamic library. This work is specific to the IBM Power architecture, but it provides the blueprint for implementations on other ISAs as well. Led by IBM, with involvement by Sean regarding interfacing to the DSM runtime system.
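The following is a minimal sketch of the incremental replacement idea only; it is not IBM's actual fat binary or binary optimizer mechanism, which rewrites the machine code itself rather than going through a C-level dispatch slot. All names are invented for the example.

    typedef void (*scale_fn)(double *a, const double *b, int n, double f);

    /* portable version shipped in the fat binary */
    static void scale_generic(double *a, const double *b, int n, double f)
    {
        for (int i = 0; i < n; i++)
            a[i] = f * b[i];
    }

    /* the slot the runtime optimizer is allowed to retarget */
    static scale_fn scale_impl = scale_generic;

    /* the application always calls through the slot, so it transparently picks
       up whichever version is currently installed */
    void scale(double *a, const double *b, int n, double f)
    {
        scale_impl(a, b, n, f);
    }

    /* hypothetical hook used by the JIT once it has produced a version
       specialized to the concrete Power configuration */
    void clouddsm_install_scale(scale_fn specialized)
    {
        scale_impl = specialized;
    }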

  * Milestone   M5.: Month 12 --
  * Deliverable D5.: Month 12 --
  * Milestone   M5.: Month 24 --
  * Deliverable D5.: Month 24 --
  * Milestone   M5.: Month 36 --
  * Deliverable D5.: Month 36 --

   =========================================
  • Task 1: participate in WP 4 task 2, as part of arriving at the interface that WP 5 will take as input.
    • Milestone M5.: Month 12 --
    • Deliverable D5.: Month 12 --
    • Milestone M5.: Month 24 --
    • Deliverable D5.: Month 24 --
    • Milestone M5.: Month 36 --
    • Deliverable D5.: Month 36 --
  • Task 2: Define an intermediate, low level form of code annotation. The interfaces defined in WP 4 will be translated into this common lower level form.
    • Milestone M5.: Month 12 --
    • Deliverable D5.: Month 12 --
    • Milestone M5.: Month 24 --
    • Deliverable D5.: Month 24 --
    • Milestone M5.: Month 36 --
    • Deliverable D5.: Month 36 --
  • Task 3: Create tools that transform from each form of higher level code annotation into the common lower level code annotation form.
    • Milestone M5.: Month 12 --
    • Deliverable D5.: Month 12 --
    • Milestone M5.: Month 24 --
    • Deliverable D5.: Month 24 --
    • Milestone M5.: Month 36 --
    • Deliverable D5.: Month 36 --
  • Task 4: Create transform tools that translate from the common lower level form into the final C form of the code. The final C form includes the OS calls, DSM runtime system calls, and synchronization calls inserted by the tool. In this form the application performs large chunks of work in-between calls to the DSM system (see the sketch after this list). Each target hardware platform will require its own variation of the transform tool, tuned to the details of that hardware, especially the communication details. The tool may produce a single multi-versioned binary, it may include a runtime specializer, or it may generate many independent versions of the binary; a large portion of the research will involve determining the best approach.
    • Milestone M5.: Month 12 --
    • Deliverable D5.: Month 12 --
    • Milestone M5.: Month 24 --
    • Deliverable D5.: Month 24 --
    • Milestone M5.: Month 36 --
    • Deliverable D5.: Month 36 --
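Returning to Task 4 above, here is a rough sketch of what the final C form could look like, under the assumption of an acquire/release style DSM interface. The names dsm_acquire and dsm_release are invented for this sketch; the actual interface belongs to WP 3 and is not yet fixed.

    /* hypothetical DSM runtime entry points (the real API comes from WP 3) */
    void dsm_acquire(const void *addr, unsigned long bytes, int for_write);
    void dsm_release(const void *addr, unsigned long bytes);

    /* one unit of work in the final C form: a large chunk of computation
       bracketed by DSM calls, so the runtime only intervenes at chunk
       boundaries */
    void scale_chunk(double *a, const double *b, int base, int end, double f)
    {
        unsigned long bytes = (unsigned long)(end - base) * sizeof(double);

        dsm_acquire(b + base, bytes, 0);   /* read region  */
        dsm_acquire(a + base, bytes, 1);   /* write region */

        for (int i = base; i < end; i++)
            a[i] = f * b[i];

        dsm_release(a + base, bytes);      /* publish the written data */
    }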

Comment and Question: "For the IBM fat binary specializer, there are three stages (1) development stage: static generic compilation on developer machine which produces a custom IR form plus generic executable (2) static specialization compilation on a server, or during load, which generates a Power executable specialized to a specific HW (3) runtime fat-binary based recompilation on the actual deployed HW" Question: "How does this fit into CloudDSM?" Answer: Stage 1 will remain on the developer machine, stage 2 will take place inside the CloudDSM portal, and stage 3 inside the Cloud server during execution.

Question: "how will stage 2 fit with the DSM specific specializations?" Answer: this is an open question, to be resolved during the WP. We need some pictures, to figure out what tools do what at which point..

Comment: "Stage 1 happens inside the development environment on a desktop machine. The low-level annotated source is then sent to the CloudDSM portal by the developer. This process registers the application and makes it available for the end-user to run. This registration process also causes the low-level annotated source to be given to a specialization 'harness'. That harness invokes a number of specializer modules. One specializer module is provided by IBM. This module re-runs stage 1 and then runs stage 2 several times, once for each potential Power HW configuration that the CloudDSM system could send the fat-binary to (the module may, in fact cause stage 1 and stage 2 to run remotely on Power ISA machines, inside their own Cloud VM). Lastly, after the user starts the application and issues a request for computation, the portal deploys a unit of work to a Cloud VM running on a Power ISA machine. That Cloud VM has the DSM runtime in it, and that is given the unit of work. The unit of work includes a function within the fat binary to perform. The fat binary is dynamically linked to the DSM runtime. During execution, the work suspends and the binary optimizer takes over, modifies the code, then resumes the work. When the work reaches a DSM call, the DSM runtime suspends the execution context. That context will remain suspended while communication of data takes place. The DSM runtime will switch the CPU to a different context, whose communication has completed and is ready to resume."

Workpackage for Runtime Specialization

Goal/objectives:

During the ongoing execution of the application/computation, the DSM system continuously recalculates the division of work. This in turn requires compiling newly created tasks on the fly, specializing code to the platform it is deployed on or migrated to, and adapting various aspects of the computation, such as synchronization and data layout. The goal of this WP is to develop the tools and mechanisms to carry out these adaptations, based primarily on JIT compilation at load time and/or runtime, relying on (1) the high level annotations provided by the development stage, (2) information about the underlying HW and the location of the VMs to interact with, and (3) dynamically gathered profile information.

[ ? Is this general enough to also cover the kinds of more limited adaptations that the other platforms could support? (I assume that "compiling newly created tasks on the fly" is something we could do on all platforms, and also "specializing code to the platform it is deployed on/migrated to", since that could be just load time compilation.) ? ]

Description of Work/tasks:

The forms of adaptations/specializations considered include:

  1. low level code specialization to the underlying HW configuration (ISA generation, cache hierarchy, scheduling)
  2. specialization of synchronization patterns (based on high level annotations)
  3. specialization of data layout and data accesses (based on high level annotations)
  4. code adaptation based on runtime feedback information (specializing to the specific input, and adjusting to changing behavior of the computation at different phases of the computation)
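As a tiny illustration of form 4, the sketch below lets a runtime counter steer the choice between two pre-built variants of a kernel. The threshold, the counters, and the existence of exactly two variants are assumptions made up for the example; a real JIT would regenerate code rather than merely select among variants.

    /* two variants of the same kernel; real variants would differ in blocking,
       vectorization, etc. (identical bodies here, for brevity) */
    static void kernel_small(double *a, int n) { for (int i = 0; i < n; i++) a[i] += 1.0; }
    static void kernel_large(double *a, int n) { for (int i = 0; i < n; i++) a[i] += 1.0; }

    static unsigned long calls;          /* simple profile counters gathered at runtime */
    static unsigned long small_inputs;

    void kernel(double *a, int n)
    {
        calls++;
        if (n < 1024)
            small_inputs++;

        /* stay conservative during a warm-up phase, then follow the profile:
           if most calls saw short inputs, keep using the small variant */
        if (calls < 1000 || 2 * small_inputs > calls)
            kernel_small(a, n);
        else
            kernel_large(a, n);
    }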

These forms of adaptation are organized into the following tasks:

[ ? Here I'm still not sure how best to break this down into tasks...

There are "vertical" responsibilities to be assigned:

- specialization on x86 (INRIA? just a wild guess)
- specialization on Power (IBM)
- specialization on Kalray (Kalray)
- specialization on Forth (Forth)

But at the same time there are "horizontal" tasks, basically one for each of the specialization forms listed above. Each platform family will support a subset of these tasks; Power will probably support all of them. ? ]

Deliverables/Milestones:

- There should be a milestone to define the annotations we'll base the adaptation on.
- Deliverables: probably one per task, so let's figure out the tasks first...