Underneath the DSLs, the tools in the project will rely on polyhedral techniques along with the DKU pattern (paper here) to automate retargeting the code to a variety of different hardware platforms. To mitigate risk, we will focus on straightforward and well-structured applications at the start of the project, and move towards rarer and more difficult cases as the project matures.
In the paper, a primitive form of this approach is demonstrated running on the BlueGene supercomputer, which is treated as a collection of networked computing nodes. This view, of a collection of networked nodes, will be carried over to all forms of hardware.
The project will identify a clean API by which a quantity of work is placed within a standard packaging (defined within the project). The API is used to invoke such a package of work on a computing node. The package includes meta-information about other nodes that may need to communicate during the work. This approach yields a clean interface that hides the nature of the computing node. The tools in the project will generate multiple versions of the code and ensure that the appropriate version is invoked on the chosen computing node.
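One plausible shape for such a packaging and invocation API can be sketched as follows. This is a minimal illustration, not the project's actual API: the names (`WorkPackage`, `ComputeNode`, `register_variant`, `invoke`) and the field layout are assumptions for the sake of the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class WorkPackage:
    # Identifies which work kernel to run.
    kernel_name: str
    # Input data for the kernel.
    payload: list
    # Meta-information: other nodes that may need to communicate during the work.
    peer_nodes: List[str] = field(default_factory=list)

class ComputeNode:
    """A computing node hidden behind the API. Each node holds the code
    variants the project's tools generated for its own hardware target."""
    def __init__(self, name: str):
        self.name = name
        self._variants: Dict[str, Callable] = {}

    def register_variant(self, kernel_name: str, fn: Callable) -> None:
        # The tools would install the target-specific version of a kernel here.
        self._variants[kernel_name] = fn

    def invoke(self, package: WorkPackage):
        # The caller sees only this call; the nature of the node stays hidden.
        return self._variants[package.kernel_name](package.payload)

# Usage: register a trivial kernel on a node and invoke a package on it.
node = ComputeNode("cloud-gpu-0")
node.register_variant("sum", lambda xs: sum(xs))
pkg = WorkPackage(kernel_name="sum", payload=[1, 2, 3], peer_nodes=["cloud-gpu-1"])
result = node.invoke(pkg)  # returns 6
```

The key design point the sketch illustrates is that the caller never branches on the hardware: selecting the right code variant is the node's (and the tools') responsibility.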
The project plans to include several kinds of hardware target:
- Cloud-based multi-core servers
- Cloud-based high-performance GPU accelerators
- Cloud-based Intel Phi accelerators
- Networked racks containing multi-core and accelerated (Phi/GPU) machines with high-bandwidth local inter-machine networks
- The novel FORTH-based 512-node ultra-low-power Cube many-core hardware
- The Parallella "personal supercomputer" ultra-low-power computation hardware
Erol Gelenbe's work at Imperial College London will tackle automating the decision of what work can be profitably exported to which compute nodes.
It will be Albert Cohen's and Armin Grosslinger's contribution to provide the means to automatically divide the work in the way that Imperial's scheduling determines to be optimal. They will also provide the tools that generate the multiple versions of the work kernels, one for each hardware target in the project.
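The division step can be illustrated with a deliberately simple sketch: splitting a one-dimensional iteration space into contiguous chunks according to the fractions a scheduler assigned to each node. The real tools operate on polyhedral representations of full loop nests; the function name and the dict-based interface here are assumptions for illustration only.

```python
def divide_iterations(n, shares):
    """Split the iteration space [0, n) into contiguous chunks whose sizes
    follow the fractions the scheduler assigned to each node.
    `shares` maps an (illustrative) node name to its assigned fraction."""
    chunks, start = {}, 0
    names = list(shares)
    total = sum(shares.values())
    for i, name in enumerate(names):
        # The last chunk absorbs rounding error so the whole space is covered.
        end = n if i == len(names) - 1 else start + round(n * shares[name] / total)
        chunks[name] = (start, end)
        start = end
    return chunks

# Usage: the scheduler decided a 2:1:1 split across three targets.
split = divide_iterations(100, {"gpu": 0.5, "phi": 0.25, "cpu": 0.25})
# {'gpu': (0, 50), 'phi': (50, 75), 'cpu': (75, 100)}
```

Each chunk would then be packaged and dispatched to its node, using the kernel version generated for that node's hardware target.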
It will be XLab's contribution to implement the API as the front end for their Cloud-hosted compute nodes (including Tesla GPUs and Intel Phi accelerators).
It will be Polyvios Pratikakis's contribution to implement the API as the front end for FORTH's innovative CUBE 512-core hardware.
Using such standard APIs reduces the dependence of the project pieces on each other. After the project completes, the APIs remain a stand-alone means of invoking computation that is agnostic to the hardware target. This makes this part of the project's value freely available, while simplifying the work within the project and fitting well with the development of the DSL front ends and the supporting tools.
The main differences among the hardware offerings are the latency to get commands, status, and data to and from the hardware node, the bandwidth of the communication, and the rate at which the work completes. These can be treated in a generic way by the scheduler. It will be the job of the DSL design and the tools to treat the code of the work to be performed in a generic way that compiles to all the platforms. This is one of the strengths of the polyhedral compilation approach.
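A generic treatment of those three quantities can be sketched as a simple per-node cost estimate that the scheduler compares across targets. The field names, units, and the additive model below are illustrative assumptions, not the project's actual scheduler.

```python
def estimated_time(node, command_bytes, result_bytes, work_ops):
    """Estimate completion time on a node from its three generic parameters
    (illustrative names and units):
      latency_s      round-trip latency to the node, in seconds
      bandwidth_Bps  communication bandwidth, in bytes/second
      rate_ops       rate at which the node completes work, in ops/second"""
    transfer = (command_bytes + result_bytes) / node["bandwidth_Bps"]
    compute = work_ops / node["rate_ops"]
    return node["latency_s"] + transfer + compute

# Usage: compare a remote accelerator against a nearby multi-core server.
gpu = {"latency_s": 0.050, "bandwidth_Bps": 1e9,  "rate_ops": 1e12}
cpu = {"latency_s": 0.001, "bandwidth_Bps": 1e10, "rate_ops": 1e10}

# For a small job, the accelerator's latency dominates and the server wins.
gpu_loses_small_job = estimated_time(gpu, 1e6, 1e6, 1e7) > estimated_time(cpu, 1e6, 1e6, 1e7)
```

Under such a model, the same scheduling logic applies unchanged to cloud servers, accelerator racks, or the Cube hardware: only the three parameters differ per node.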
The limitation is that the approach won't initially be universal. At first, only certain kinds of applications will fit this approach well. The initial work should capture roughly 80% of high-computation cases, then advance to more difficult cases as the project matures.