Main.EuroDSLApproach History


January 30, 2015, at 07:21 AM by 65.183.45.146 -
Changed line 3 from:
The project will use polyhedral techniques along with the DKU pattern ([[http://opensourceresearchinstitute.org/uploads/BLIS/Kronawitter_masters_DKU.pdf|paper here]]). We will focus on straightforward applications at first, moving towards rarer and more difficult cases as the project matures.
to:
Underneath the DSLs, the tools in the project will rely on polyhedral techniques along with the DKU pattern ([[http://opensourceresearchinstitute.org/uploads/BLIS/Kronawitter_masters_DKU.pdf|paper here]]) in order to automate targeting the code to a variety of different hardware platforms. To mitigate risk, we will focus on straightforward and well-structured applications at the start of the project, and move towards rarer and more difficult cases as the project matures.
Added lines 24-25:
It will be Polyvios Pratikakis's contribution to implement the API as the front end for FORTH's innovative CUBE 512-core hardware.
Changed lines 28-30 from:
The idea is to have an interface that provides a standard way to package up work to be performed. That interface can front a GPU, a Cloud server, or a node on a BlueGene; the only differences among those are the latency to get there and back, the bandwidth of the communication, and the rate at which the work completes. (Assuming the code can be treated in a generic way that compiles to all those platforms -- this is what the polyhedral approach brings.)

The limitation will be that the approach isn't universal; only certain kinds of applications will fit this approach well (at least at first; we'll get the low-hanging fruit, which accounts for 80% of high-computation needs, then advance to more difficult cases over time).
to:
The main differences among the hardware offerings are the latency to get commands, status, and data to and from the hardware node, the bandwidth of the communication, and the rate at which the work completes. These can be treated in a generic way by the scheduler. It will be the job of the DSL design and the tools to treat the code of the work to be performed in a generic way that compiles to all of the platforms. This is one of the strengths of the polyhedral compilation approach.

The limitation will be that the approach won't initially be universal. At first, only certain kinds of applications will fit this approach well. The initial work should capture roughly 80% of high-computation cases, and then advance to more difficult cases as the project matures.
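
As a concrete illustration of the generic treatment described above, here is a minimal C sketch in which every node, whatever its hardware, is described by just those three quantities. The `NodeDescriptor` struct, the `estimate_round_trip` helper, and the figures in `main` are all hypothetical, invented for this note rather than defined by the project.

[@
#include <stdio.h>

/* Hypothetical generic view of a compute node: a GPU, a Cloud server and a
 * BlueGene node differ only in these three parameters.                      */
typedef struct {
    const char *name;
    double latency_s;        /* one-way latency to reach the node (seconds)  */
    double bandwidth_Bps;    /* communication bandwidth (bytes per second)   */
    double compute_rate;     /* rate the node completes work (ops per second)*/
} NodeDescriptor;

/* Estimated wall-clock time to ship a package of work to the node, run it
 * there, and ship the results back.                                         */
static double estimate_round_trip(const NodeDescriptor *n, double bytes_out,
                                  double ops, double bytes_back)
{
    return 2.0 * n->latency_s
         + (bytes_out + bytes_back) / n->bandwidth_Bps
         + ops / n->compute_rate;
}

int main(void)
{
    /* Made-up figures, only to show the same formula covers both nodes.     */
    NodeDescriptor cloud_gpu  = { "cloud GPU",       0.050, 1.0e9,  5.0e12 };
    NodeDescriptor local_core = { "local multicore", 0.0,   1.0e12, 5.0e10 };
    double bytes = 8.0e8, ops = 1.0e12;

    printf("%-16s %.3f s\n", cloud_gpu.name,
           estimate_round_trip(&cloud_gpu,  bytes, ops, bytes));
    printf("%-16s %.3f s\n", local_core.name,
           estimate_round_trip(&local_core, bytes, ops, bytes));
    return 0;
}
@]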
Changed lines 3-5 from:
The approach will be to use polyhedral techniques along with the DKU pattern (paper attached): something straightforward at first, then more sophisticated as the project matures.

In [[http://opensourceresearchinstitute.org/uploads/BLIS/Kronawitter_masters_DKU.pdf|the paper]] it talks about running on the BlueGene supercomputer, which is organized as nodes. The only major difference from using multiple Cloud servers would be the speed of the network. It will be Erol's contribution to automate the process of deciding what work can be profitably exported to your Cloud servers (including the Teslas). It will be Armin's contribution to provide the means to divide the work. And it will be yours to provide an API by which the chosen work is sent to a specified Cloud machine, executed, and the results shipped back. (This should make it safe for you, because that API can be called directly by users, making your effort usable by customers no matter what; it isn't dependent upon the other parts of the project in order to be offered by you and of value to your customers.)
to:
The project will use polyhedral techniques along with the DKU pattern ([[http://opensourceresearchinstitute.org/uploads/BLIS/Kronawitter_masters_DKU.pdf|paper here]]). We will focus on straightforward applications at first, moving towards rarer and more difficult cases as the project matures.

In [[http://opensourceresearchinstitute.org/uploads/BLIS/Kronawitter_masters_DKU.pdf|the paper]], a primitive form of this approach is demonstrated running on the BlueGene supercomputer, which is treated as a collection of networked computing nodes. This view of a collection of networked nodes will carry over to all of the forms of hardware in the project.

The project will identify a clean API by which a quantity of work is placed within a standard packaging (which is defined within the project). The API is used to invoke such a package of work on a computing node. The package includes meta-information about other nodes that may need to communicate during the work. This approach makes a clean interface that hides the nature of the computing node. The tools in the project will handle producing multiple versions of the code, and will ensure that the appropriate version is invoked on the chosen computing node.
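
The sketch below shows one possible shape such a packaging and invocation API could take, written only to make the description concrete; every name in it (`WorkPackage`, `invoke_on_node`, and the individual fields) is an assumption of this note rather than the API the project will define.

[@
#include <stddef.h>
#include <stdio.h>

/* Hypothetical "standard packaging" of a quantity of work.                 */
typedef struct {
    int          kernel_id;     /* which work kernel to run                  */
    const void  *input;         /* data shipped to the node                  */
    size_t       input_bytes;
    void        *output;        /* buffer for the results shipped back       */
    size_t       output_bytes;
    const int   *peer_nodes;    /* meta-information: other nodes that may    */
    size_t       n_peers;       /*   need to communicate during the work     */
} WorkPackage;

/* The caller never sees what kind of node sits behind node_id: a GPU, a
 * Cloud server, or a BlueGene node all hide behind the same call.  A stub
 * stands in here for the real transport and execution.                      */
static int invoke_on_node(int node_id, const WorkPackage *pkg)
{
    printf("node %d: kernel %d, %zu bytes in, %zu peers\n",
           node_id, pkg->kernel_id, pkg->input_bytes, pkg->n_peers);
    return 0;   /* 0 = work accepted, executed, results placed in output */
}

int main(void)
{
    double in[4] = { 1, 2, 3, 4 }, out[4];
    int peers[1] = { 7 };
    WorkPackage pkg = { 42, in, sizeof in, out, sizeof out, peers, 1 };
    return invoke_on_node(3, &pkg);
}
@]

Because the call hides the node entirely, such an API could also be called directly by end users, independently of the rest of the toolchain.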

The project plans to include several kinds of hardware target:
* Cloud based multi-core servers
* Cloud based high performance GPU accelerators
* Cloud based Intel PHI accelerators
* networked racks containing multi-core and (PHI/GPU) accelerated machines having high-bandwidth local inter-machine networks
* The novel FORTH-based 512-node ultra-low-power Cube many-core hardware
* The Parallella "personal supercomputer" ultra-low-power computation hardware


Erol Gelenbe's work at Imperial College London will tackle automating the decision of which work can be profitably exported to which compute nodes.
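
Purely as an illustration of the kind of decision involved, and not a description of Imperial's actual scheduling methods, the following sketch reuses the hypothetical `NodeDescriptor` and `estimate_round_trip` from the earlier sketch to test whether exporting a package of work is worthwhile.

[@
/* Illustration only: export the work when the estimated remote round trip
 * (latency + transfer + remote compute) beats running it on the local node.
 * Reuses the hypothetical NodeDescriptor / estimate_round_trip from above.  */
static int profitable_to_export(const NodeDescriptor *local,
                                const NodeDescriptor *remote,
                                double bytes_out, double ops, double bytes_back)
{
    double t_local  = ops / local->compute_rate;               /* no shipping */
    double t_remote = estimate_round_trip(remote, bytes_out, ops, bytes_back);
    return t_remote < t_local;
}
@]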

It will be Albert Cohen's and Armin Grosslinger's contributions to provide the means to automatically divide the work in the way that Imperial's scheduling determines to be optimal. It will also be theirs to provide the tools that generate the multiple versions of the work kernels, one for each hardware target in the project.
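
To make the idea of per-target kernel versions concrete, here is a small hypothetical sketch: one function pointer per hardware target from the list above, with a helper that invokes the version matching the chosen node. The enum values and the single stub variant are illustrative assumptions; the real versions would be produced by the project's code generation tools.

[@
#include <stddef.h>

/* Hypothetical target kinds mirroring the hardware list above.             */
typedef enum {
    TARGET_CLOUD_MULTICORE,
    TARGET_CLOUD_GPU,
    TARGET_CLOUD_PHI,
    TARGET_LOCAL_RACK,
    TARGET_CUBE,
    TARGET_PARALLELLA,
    TARGET_COUNT
} TargetKind;

/* One compiled version of the same work kernel per target.                 */
typedef void (*KernelVariant)(const double *in, double *out, size_t n);

/* A single stub stands in for all of the generated versions in this sketch. */
static void kernel_generic(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = 2.0 * in[i];
}

static const KernelVariant variants[TARGET_COUNT] = {
    kernel_generic, kernel_generic, kernel_generic,
    kernel_generic, kernel_generic, kernel_generic
};

/* The runtime's job: invoke the version that matches the chosen node.      */
static void run_on_target(TargetKind target, const double *in, double *out,
                          size_t n)
{
    variants[target](in, out, n);
}

int main(void)
{
    double in[3] = { 1, 2, 3 }, out[3];
    run_on_target(TARGET_CLOUD_GPU, in, out, 3);
    return (int)out[0];   /* 2, just to use the result */
}
@]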

It will be XLab's contribution to implement the API as the front end for their Cloud-hosted compute nodes (including Tesla GPUs and Intel Phi accelerators).

By using such standard APIs, the dependence of the project pieces on each other is reduced. The APIs can be used after the project completes as a stand-alone means of invoking computation that is agnostic to the hardware target. This makes this part of the value of the project freely available, while simplifying the work within the project and fitting well with the development of the DSL front ends and the supporting tools.
Added line 1:
!!Project Approach
Added lines 1-8:

The approach will be to use polyhedral techniques along with the DKU pattern (paper attached): something straightforward at first, then more sophisticated as the project matures.

In [[http://opensourceresearchinstitute.org/uploads/BLIS/Kronawitter_masters_DKU.pdf|the paper]] it talks about running on the BlueGene supercomputer, which is organized as nodes. The only major difference from using multiple Cloud servers would be the speed of the network. It will be Erol's contribution to automate the process of deciding what work can be profitably exported to your Cloud servers (including the Teslas). It will be Armin's contribution to provide the means to divide the work. And it will be yours to provide an API by which the chosen work is sent to a specified Cloud machine, executed, and the results shipped back. (This should make it safe for you, because that API can be called directly by users, making your effort usable by customers no matter what; it isn't dependent upon the other parts of the project in order to be offered by you and of value to your customers.)

The idea is to have an interface that provides a standard way to package up work to be performed. That interface can front a GPU, a Cloud server, or a node on a BlueGene; the only differences among those are the latency to get there and back, the bandwidth of the communication, and the rate at which the work completes. (Assuming the code can be treated in a generic way that compiles to all those platforms -- this is what the polyhedral approach brings.)

The limitation will be that the approach isn't universal; only certain kinds of applications will fit this approach well (at least at first; we'll get the low-hanging fruit, which accounts for 80% of high-computation needs, then advance to more difficult cases over time).