CloudDSM.WP2 History

April 01, 2014, at 08:29 PM by 178.11.216.240 -
April 01, 2014, at 08:28 PM by 178.11.216.240 -
Changed lines 4-5 from:
  • Task1: Develop portal prototype (XLab, ICL, DC, Gigas, Sean) -- Participate as part of a rapid prototyping process, together with members of WP 3 -- DSM runtime, WP 4 -- annotations, WP 5 -- transform tools, and WP 7 -- end application. Together get a minimum viable system up and running. This will create a high degree of communication between partners early in the project, to uncover hidden assumptions, develop shared ideas about approach, define clean interfaces and interactions between partners, workpackages, and deliverable pieces, and increase the smoothness and pace of progress for the rest of the project. This task includes several revisions of specifications for: the portal functionality; internal structure; external interfaces; and protocols for communicating with it. A new incremental revision will be delivered each month, which refines what was delivered in the previous month. The prototype will then be available for the other WPs to work with, and deliver feedback to direct the next prototype delivered. The first deliverable will be mainly interfaces, with dummy modules inside that return "canned" results that correspond to the interfaces, without computing anything inside. The first prototype's interfaces may be incomplete, leaving out functionality that will be introduced over time in subsequent versions, such as the interface for gathering of scheduling related information. The monthly delivery of new and improved prototypes will continue until month 8, at which point the candidate for the final interfaces will be delivered, with minimum viable functionality implemented inside, although the functionality will be simple, fast-to-implement versions, taken from previous EU projects or open source, or created fresh, which ever path proves fastest during development.
to:
  • Task1: Develop portal prototype (XLab, ICL, DC, Gigas, Sean) -- Participate as part of a rapid prototyping process, together with members of WP 3 -- DSM runtime, WP 4 -- annotations, WP 5 -- transform tools, and WP 7 -- end application. Together get a minimum viable system up and running, by iterating on the process of gathering specifications, producing a design, implementing, and testing with end users. This will create a high degree of communication between partners early in the project, to uncover hidden assumptions, develop shared ideas about the approach, define clean interfaces and interactions between partners, workpackages, and deliverable pieces, and increase the smoothness and pace of progress for the rest of the project.

This task includes several revisions of specifications for: the portal functionality; internal structure; external interfaces; and protocols for communicating with it. A new incremental revision will be delivered each month, which refines what was delivered in the previous month. The prototype will then be available for the other WPs to work with and to deliver feedback that directs the next prototype. The first deliverable will be mainly interfaces, with dummy modules inside that return "canned" results that correspond to the interfaces, without computing anything inside. The first prototype's interfaces may be incomplete, leaving out functionality that will be introduced over time in subsequent versions, such as the interface for gathering scheduling-related information. The monthly delivery of new and improved prototypes will continue until month 8, at which point the candidate for the final interfaces will be delivered, with minimum viable functionality implemented inside, although the functionality will consist of simple, fast-to-implement versions, taken from previous EU projects or open source, or created fresh, whichever path proves fastest during development.

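To make the "canned results" idea concrete, the following is a minimal sketch (in Python) of what a month-1 portal stub could look like; every class and method name here is a hypothetical placeholder, not an agreed CloudDSM interface:

    # Minimal sketch of a month-1 portal stub: the interface is defined,
    # but every call returns a fixed, pre-recorded ("canned") response.
    # All names are illustrative placeholders, not agreed interfaces.

    class PortalStub:
        """Dummy portal that exercises the external interface without computing anything."""

        def submit_request(self, app_id, work_description):
            # A real portal would schedule and deploy; the stub just acknowledges.
            return {"request_id": "req-0001", "status": "accepted"}

        def query_status(self, request_id):
            # Canned progress report so WP3/WP4/WP5/WP7 partners can code against it.
            return {"request_id": request_id, "status": "running", "progress": 0.5}

        def fetch_result(self, request_id):
            # Canned result payload corresponding to the interface, no computation inside.
            return {"request_id": request_id, "result": [1, 2, 3]}

    if __name__ == "__main__":
        portal = PortalStub()
        req = portal.submit_request("demo-app", {"kernel": "matmul", "size": 1024})
        print(portal.query_status(req["request_id"]))
        print(portal.fetch_result(req["request_id"]))
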
The portal contains a specialization harness, a scheduler, a deployer, a repository of specialized binaries, a repository of application characteristics relevant to scheduling, statistics on application resource usage, and a continuously updated view of the state of the hardware resources. The portal receives instructions directly from human users, as well as engaging in a protocol with the front-ends of applications. It observes resources and interacts with lower system levels, and also observes running applications, and makes decisions for tasks. As such, the portal contains a scheduler that self-optimises the system in real time. Thus the portal and scheduler must be efficient and avoid complex computations. The portal will achieve responsiveness by being highly distributed and by exploiting inputs from multiple agents that operate simultaneously, on different time scales and around different resources, and that share the gathered information with the scheduler. This will be the philosophy driving the development of the portal, so that it is made of many distributed/concurrent lightweight components with only lightweight synchronisation while maximising information sharing.

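The following is a rough, purely illustrative sketch of the "many lightweight agents, lightweight synchronisation" philosophy, written in Python with threads standing in for what would in reality be distributed components; all names and numbers are invented for the example:

    # Illustrative sketch only: several lightweight agents sample different
    # resources on their own time scales and share readings with the scheduler
    # through minimal synchronisation (one lock around a shared dictionary).
    import threading
    import time
    import random

    shared_state = {}                 # resource name -> latest reading
    state_lock = threading.Lock()     # the only synchronisation point
    stop = threading.Event()

    def agent(resource_name, period_s):
        """Sample one resource at its own period and publish the reading."""
        while not stop.is_set():
            reading = random.random()          # stand-in for a real measurement
            with state_lock:
                shared_state[resource_name] = (reading, time.time())
            time.sleep(period_s)

    def scheduler_snapshot():
        """The scheduler takes a cheap snapshot of everything the agents gathered."""
        with state_lock:
            return dict(shared_state)

    if __name__ == "__main__":
        threads = [
            threading.Thread(target=agent, args=("cpu_load", 0.1), daemon=True),
            threading.Thread(target=agent, args=("net_delay", 0.5), daemon=True),
            threading.Thread(target=agent, args=("energy", 1.0), daemon=True),
        ]
        for t in threads:
            t.start()
        time.sleep(1.5)
        print(scheduler_snapshot())
        stop.set()
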
Changed lines 28-31 from:
  * Deliverable D2.2: Month 18 --  A working portal that includes a working Scheduler, Specialization harness, executable repository, application statistics gathering, hardware statistics monitoring, and deployment onto a given Cloud stack -- Deliver the working code base plus a technical report, and other artifacts related to the implementation of the portal and supporting subsystems.
  * Deliverable D2.3: Month 30 -- Deliver the final, portal, ready for integration and testing of the final deliverables from the other work packages, such as the low level annotation transform tool, and the IBM toolchain that generates a fat binary. Deliver a technical report, code, and other artifacts related to the implementation of the portal and supporting subsystems.
to:
  * Deliverable D2.9: Month 18 --  A working portal that includes a working Scheduler, Specialization harness, executable repository, application statistics gathering, hardware statistics monitoring, and deployment onto a given Cloud stack -- Deliver the working code base plus a technical report, and other artifacts related to the implementation of the portal and supporting subsystems.
  * Deliverable D2.10: Month 30 -- Deliver the final portal, ready for integration and testing of the final deliverables from the other work packages, such as the low-level annotation transform tool, and the IBM toolchain that generates a fat binary. Deliver a technical report, code, and other artifacts related to the implementation of the portal and supporting subsystems.
Changed lines 33-38 from:
  * Deliverable D2.6: Month 12 --  Report on the results of testing and measuring performance of the initial end-user apps from DC and LB, when run on the prototype system, with appropriate pieces from other work packages integrated.
  * Deliverable D2.6: Month 24 --  Report on the results of testing and measuring performance of the initial end-user apps from DC and LB, when run on the prototype system, with appropriate pieces from other work packages integrated.
  * Deliverable D2.8: Month 36 --  Final Evaluation of the CloudDSM system portal 
  • Task5: Modifications to Cloud software API -- (Gigas, XLab, Sean) -- add the following to the Cloud API:
    • The ability to assign which physical machine (IaaS provider) a given VM image is started and animated on, via API.
to:
  * Deliverable D2.11: Month 12 --  Report on the results of testing and measuring performance of the initial end-user apps from DC and LB, when run on the prototype system, with appropriate pieces from other work packages integrated.
  * Deliverable D2.12: Month 24 --  Report on the results of testing and measuring performance of the initial end-user apps from DC and LB, when run on the prototype system, with appropriate pieces from other work packages integrated.
  * Deliverable D2.13: Month 36 --  Final Evaluation of the CloudDSM system portal 
  • Task4: Modifications to Cloud software API inside Host provider -- (Gigas, XLab, Sean) -- add the following to the Cloud API:
    • The ability to assign which physical machine (inside IaaS provider's premises) a given VM image is started and animated on, via API.
Changed lines 41-48 from:
  • Task1: Develop portal -- gather specifications, produce design, make prototype, test with end-user, iterate (Leader XLab, major contribution by ICL, with input by DC, Gigas, and Sean; ICL 2 PMs)
  • Task2: Define Scheduling component within portal -- interface with other components inside portal, which supply information upon which to base decision and produces a scheduler to be carried out (Leader ICL, major contribution by XLab, with input by DC, Gigas, and Sean; ICL 21PMs): The scheduler will exploit a self-aware dynamic analysis approach which in an on-line manner and in real-time, will receive as input the state of all system resources, and calculate resource availability and expected execution times for tasks currently active within the portal and expected to be launched based on predictions derived from usage patterns. Scheduling decisions will combine detailed status information regarding expected response times, internal network delays, possible security risks, and possible reliability problems. The self-aware dynamic analysis will return a "short list" of the best instantaneous provisioning decisions, ranked according to performance, security, reliability, energy consumption and other relevant metrics. Based on the task to be provisioned, the scheduler will be able to decide on provisioning rapidly. The portal and collection of DSM runtime systems include monitoring of whether the performance objectives of the task are being met, which will re-trigger the scheduler to make a decision again if the observations indicate an unsatisfactory outcome, which will be based on the new system state as well as the overhead related to any changes in provisioning.

Milestone M2.2: (12 Months after the start of the project) A working simple prototype scheduler, plus a detailed description of the final full featured scheduler. The prototype will implement all interfaces and successfully generate a schedule, but using simple algorithms. The targeted final scheduler Will be described in a Technical Report defining the scheduler functionality, structure, and all of its components, and describing its interfaces to and integration with the portal subsystems that monitor hardware and application features and with the DSM runtime system.

  • Task4: Develop portal, including all sub-systems (Leader XLab with all partners; ICL 12PMs): The portal contains a specialization harness, a scheduler, a deployer, repository of specialized binaries, repository of application characteristics relevant to scheduling, statistics on application resource usage, and continuously updated state of the hardware resources. The portal receives instructions directly from human users, and observes resources and interacts with lower system levels, and also observes runing applications, and makes decisions for tasks. As such, the portal contains a scheduler that self-optimises the system in real time. Thus the portal and scheduler must be efficient and avoid complex computations. The portal will accomplish responsiveness by being highly distributed and by exploiting inputs from multiple agents that operate simultantously around different time scales, and around different resources, which share the gathered information with the scheduler. This will be the philosophy driving the development of the portal so that it is made of many distributed/concurrent lightweight components with only lightweight synchronisation while maximising information sharing.
to:
  * Deliverable D2.14: Month 10 -- the functionality of this task is implemented and integrated with the prototype portal.  All of the functionality is exercised from the portal, and tests exist to exercise it; those tests are performed in Task 3 (testing).
  • Task5: Research and Develop Scheduling component within portal -- (Leader ICL, major contribution by XLab, with input by DC, Gigas, and Sean; ICL 21 PMs) -- Once the prototype is complete, the interface internal to the portal, between the scheduler and the other components inside the portal, is defined. The other components supply information upon which to base decisions and produce a schedule to be carried out. The scheduler will exploit a self-aware dynamic analysis approach which, operating online and in real time, will receive as input the state of all system resources, and calculate resource availability and expected execution times for tasks currently active within the portal and expected to be launched, based on predictions derived from usage patterns. Scheduling decisions will combine detailed status information regarding expected response times, internal network delays, possible security risks, and possible reliability problems. The self-aware dynamic analysis will return a "short list" of the best instantaneous provisioning decisions, ranked according to performance, security, reliability, energy consumption and other relevant metrics (a purely illustrative sketch of such a ranking is given after Deliverable D2.15 below). Based on the task to be provisioned, the scheduler will be able to decide on provisioning rapidly. The portal and collection of DSM runtime systems include monitoring of whether the performance objectives of the task are being met, which will re-trigger the scheduler to make a decision again if the observations indicate an unsatisfactory outcome; the new decision will be based on the new system state as well as the overhead related to any changes in provisioning.

Deliverable D2.15: 12 Months after the start of the project -- 6 PMs -- A working simple prototype scheduler, integrated with the prototype portal, plus a detailed description of the final full-featured scheduler. The prototype will implement all interfaces and successfully generate a schedule, but using simple algorithms. The targeted final scheduler is described in a delivered technical report defining the scheduler functionality, structure, and all of its components, and describing its interfaces to and integration with the portal subsystems that monitor hardware and application features and with the DSM runtime system.

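The following is a purely illustrative sketch of the ranked "short list" idea; the metrics, weights, and candidate values are invented for the example and do not represent the scheduling algorithm to be researched in this task:

    # Illustrative only: rank candidate provisioning decisions by a weighted
    # score over performance, security, reliability and energy.  A real
    # scheduler would derive these inputs from live monitoring rather than
    # from a hard-coded list.

    CANDIDATES = [
        {"site": "provider-A", "resp_time_s": 2.0, "security": 0.90, "reliability": 0.95, "energy_kj": 40},
        {"site": "provider-B", "resp_time_s": 1.2, "security": 0.70, "reliability": 0.90, "energy_kj": 55},
        {"site": "provider-C", "resp_time_s": 3.5, "security": 0.95, "reliability": 0.99, "energy_kj": 30},
    ]

    # Negative weights penalise response time and energy; positive reward the rest.
    WEIGHTS = {"resp_time_s": -1.0, "security": 2.0, "reliability": 2.0, "energy_kj": -0.02}

    def score(candidate):
        return sum(WEIGHTS[metric] * candidate[metric] for metric in WEIGHTS)

    def short_list(candidates, n=2):
        """Return the n best provisioning decisions, best first."""
        return sorted(candidates, key=score, reverse=True)[:n]

    if __name__ == "__main__":
        for c in short_list(CANDIDATES):
            print(c["site"], round(score(c), 2))
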
Changed lines 49-51 from:

Deliverable D2.3: (27 Months after the start of the project) Technical Report and Open Source Software that implements the PT.

  • Task5: Testing and performance tuning (IBM with all partners; ICL 6 PMs)
to:
Added line 51:
Changed line 58 from:

Comment: "A VM can yield a physical processor through a POWER Hypervisor call (instruction?). It puts idle VMs into a hibernation state so that they do not consume any physical CPU resources."

to:

Comment: "A VM can yield a physical processor through a POWER Hypervisor call. It puts idle VMs into a hibernation state so that they are not assigned to any physical CPU resources."

April 01, 2014, at 08:05 PM by 178.11.216.240 -
Changed lines 23-34 from:
  • Task2: Harden the internals of the portal. Integrate deliverables from other work packages into the portal infrastructure.
    • Deliverable D2.2: Month 24 -- Upgrades to portal that support changes made in the Scheduler and Specialization harness -- Technical Report, code, and other artifacts related to the implementation of the portal and supporting subsystems.
    • Deliverable D2.3: Month 36 -- Upgrades to portal that support changes made in the Scheduler (advanced scheduler features) and Specialization harness -- Technical Report, code, and other artifacts related to the implementation of the portal and supporting subsystems.
  • Task2: Define Scheduling component within portal (ICL, XLab, Sean)
    • Deliverable D2.4: Month 6 -- detailed description of the first version full featured scheduler -- technical report that describes the interfaces to the first version scheduler and what functions it performs
    • Deliverable D2.5: Month 24 -- detailed description of the final full featured advanced scheduler -- technical report that describes the interfaces to the advanced scheduler and what functions it performs
  • Task4: Testing and performance tuning of full CloudDSM system (IBM, DC, INRIA, CWI, Gigas, ICL, XLab)
    • Deliverable D2.6: Month 12 -- First Evaluation of the CloudDSM system portal
    • Deliverable D2.6: Month 24 -- Evaluation and Revised Requirements and Architecture of the CloudDSM
to:
  • Task2: Harden the internals of the portal. Integrate deliverables from other work packages into the portal infrastructure, such as the scheduler delivered by workpackage 6. Improve the function of the simple elements that are inside the prototype.
    • Deliverable D2.2: Month 18 -- A working portal that includes a working Scheduler, Specialization harness, executable repository, application statistics gathering, hardware statistics monitoring, and deployment onto a given Cloud stack -- Deliver the working code base plus a technical report, and other artifacts related to the implementation of the portal and supporting subsystems.
    • Deliverable D2.3: Month 30 -- Deliver the final, portal, ready for integration and testing of the final deliverables from the other work packages, such as the low level annotation transform tool, and the IBM toolchain that generates a fat binary. Deliver a technical report, code, and other artifacts related to the implementation of the portal and supporting subsystems.
  • Task3: Testing and performance tuning of full CloudDSM system (IBM, DC, LB, INRIA, CWI, Gigas, ICL, XLab) -- In this task, a working version of the CloudDSM portal is tested and its performance is measured when running applications from DC and LB.
    • Deliverable D2.6: Month 12 -- Report on the results of testing and measuring performance of the initial end-user apps from DC and LB, when run on the prototype system, with appropriate pieces from other work packages integrated.
    • Deliverable D2.6: Month 24 -- Report on the results of testing and measuring performance of the initial end-user apps from DC and LB, when run on the prototype system, with appropriate pieces from other work packages integrated.
April 01, 2014, at 10:03 AM by 178.11.216.240 -
April 01, 2014, at 09:57 AM by 178.11.216.240 -
Changed lines 2-5 from:

(details of tasks are below)

  • Task1: Develop portal (XLab, ICL, DC, Gigas, Sean) as part of a rapid prototyping process, together with members of WP 3 -- DSM runtime, WP 4 -- annotations, WP 5 -- transform tools, and WP 7 -- end application, to together get a minimum viable system up and running. This will create a high degree of communication between partners early in the project, to uncover hidden assumptions, develop shared ideas about approach, define clean interfaces and interactions between partners, workpackages, and deliverable pieces, and increase the smoothness and pace of progress for the rest of the project. This task includes several revisions of specifications for the portal functionality, internal structure, and external interface and protocols for communicating with it. A new incremental revision will be delivered each month, which refines what was delivered in the previous month, and reflects the operation of the current prototype, which will be available for the other WPs to work with in the following month. The first deliverable will be mainly interfaces, with dummy modules inside that return "canned" results that correspond to the interfaces, without computing anything inside. The interfaces may be incomplete, leaving out functionality that will be introduced in later versions, such as gathering of scheduling related information. The monthly delivery of new and improved prototypes will continue until month 8, at which point the candidate for the final interfaces will be delivered, with significant functionality implemented inside, although the functionality will be simple, fast-to-implement versions, taken from previous EU projects or open source, or created fresh, which ever path proves fastest during development.
to:
  • Task1: Develop portal prototype (XLab, ICL, DC, Gigas, Sean) -- Participate as part of a rapid prototyping process, together with members of WP 3 -- DSM runtime, WP 4 -- annotations, WP 5 -- transform tools, and WP 7 -- end application. Together get a minimum viable system up and running. This will create a high degree of communication between partners early in the project, to uncover hidden assumptions, develop shared ideas about approach, define clean interfaces and interactions between partners, workpackages, and deliverable pieces, and increase the smoothness and pace of progress for the rest of the project. This task includes several revisions of specifications for: the portal functionality; internal structure; external interfaces; and protocols for communicating with it. A new incremental revision will be delivered each month, which refines what was delivered in the previous month. The prototype will then be available for the other WPs to work with, and deliver feedback to direct the next prototype delivered. The first deliverable will be mainly interfaces, with dummy modules inside that return "canned" results that correspond to the interfaces, without computing anything inside. The first prototype's interfaces may be incomplete, leaving out functionality that will be introduced over time in subsequent versions, such as the interface for gathering of scheduling related information. The monthly delivery of new and improved prototypes will continue until month 8, at which point the candidate for the final interfaces will be delivered, with minimum viable functionality implemented inside, although the functionality will be simple, fast-to-implement versions, taken from previous EU projects or open source, or created fresh, which ever path proves fastest during development.
Changed lines 8-18 from:
  * Deliverable D2.2: Month 2  -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- improved interfaces, with some actual functionality filled in, as determined by circumstances uncovered during the project.  Upon delivery, these interfaces become available to the other workpackages, and feedback from them will drive design of the months 3 and 4 deliverables of this WP.  The interfaces will be documented in a convetient form for the other WP participants to understand and use.

  * Deliverable D2.3: Month 3  -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- improved interfaces, with some actual functionality filled in, as determined by circumstances uncovered during the project.  Upon delivery, these interfaces become available to the other workpackages, and feedback from them will drive design of the months 3 and 4 deliverables of this WP.  The interfaces will be documented in a convetient form for the other WP participants to understand and use.

  * Deliverable D2.4: Month 4  -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- working, simple, prototype of portal, with simple implementations of each subsystem -- code and related artifacts of a working simple portal prototype (ready for use by DC and INRIA)

  * Deliverable D2.5: Month 5  -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- working, simple, prototype of portal, with simple implementations of each subsystem -- code and related artifacts of a working simple portal prototype (ready for use by DC and INRIA)

  * Deliverable D2.6: Month 6  -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- working, simple, prototype of portal, with simple implementations of each subsystem -- code and related artifacts of a working simple portal prototype (ready for use by DC and INRIA)

  * Deliverable D2.7: Month 7  -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- working, simple, prototype of portal, with simple implementations of each subsystem -- code and related artifacts of a working simple portal prototype (ready for use by DC and INRIA)
to:
  * Deliverable D2.2: Month 2  -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- improved interfaces, with some functionality filled in, as determined by circumstances uncovered during the project.  Upon delivery, these interfaces become available to the other workpackages, and feedback from them will drive design of the months 3 and 4 deliverables of this WP.  The interfaces will be documented in a convenient form for the other WP participants to understand and use.

  * Deliverable D2.3: Month 3  -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- improved interfaces, with some actual functionality filled in, as determined by circumstances uncovered during the project.  Upon delivery, these interfaces become available to the other workpackages, and feedback from them will drive design of the months 4 and 5 deliverables of this WP.  The interfaces will be documented in a convenient form for the other WP participants to understand and use.

  * Deliverable D2.4: Month 4  -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- interfaces should be starting to stabilize in this version, and more functionality is filled in, as determined by circumstances uncovered during the project.  Upon delivery, the interfaces become available to the other workpackages, and feedback from them will drive design of the months 5 and 6 deliverables of this WP.  The interfaces will be documented in a convenient form for the other WP participants to understand and use.

  * Deliverable D2.5: Month 5  -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- interfaces should be nearly stable in this version, and more functionality is filled in, as determined by circumstances uncovered during the project.  Upon delivery, the interfaces become available to the other workpackages, and feedback from them will drive design of the month 6 deliverables of this WP.  Month 6 should be the last deliverable in which the interfaces change in any major way. The interfaces will be documented in a convenient form for the other WP participants to understand and use.

  * Deliverable D2.6: Month 6  -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- the last prototype in which the interfaces may change in major ways.  Some version of most functionality should be filled in, missing only where circumstances arising within the project cause delays.  Upon delivery, the interfaces become available to the other workpackages, as a candidate for the final interfaces to the portal.  The interfaces will be documented in a convenient form for the other WP participants to understand and use.

  * Deliverable D2.7: Month 7  -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- Nearly all functionality should have some form of implementation, missing only where circumstances arising within the project cause delays.  The interfaces will be documented in a convenient form for the other WP participants to understand and use.
April 01, 2014, at 09:42 AM by 178.11.216.240 -
Changed lines 5-8 from:
  • Task1: participate in rapid prototyping process, together with members of WP 3 -- DSM runtime, WP 4 -- annotations, WP 5 -- transform tools, and WP 7 -- end application, to together get a minimum viable system up and running. This will create a high degree of communication between partners early in the project, to uncover hidden assumptions, develop shared ideas about approach, define clean interfaces and interactions between partners, workpackages, and deliverable pieces, and increase the smoothness and pace of progress for the rest of the project.
  • Task1: Develop portal (XLab, ICL, DC, Gigas, Sean)
    • Deliverable D2.1: Month 6 -- working, simple, prototype of portal, with simple implementations of each subsystem -- code and related artifacts of a working simple portal prototype (ready for use by DC and INRIA)
to:
  • Task1: Develop portal (XLab, ICL, DC, Gigas, Sean) as part of a rapid prototyping process, together with members of WP 3 -- DSM runtime, WP 4 -- annotations, WP 5 -- transform tools, and WP 7 -- end application, to together get a minimum viable system up and running. This will create a high degree of communication between partners early in the project, to uncover hidden assumptions, develop shared ideas about approach, define clean interfaces and interactions between partners, workpackages, and deliverable pieces, and increase the smoothness and pace of progress for the rest of the project. This task includes several revisions of specifications for the portal functionality, internal structure, and external interface and protocols for communicating with it. A new incremental revision will be delivered each month, which refines what was delivered in the previous month, and reflects the operation of the current prototype, which will be available for the other WPs to work with in the following month. The first deliverable will be mainly interfaces, with dummy modules inside that return "canned" results that correspond to the interfaces, without computing anything inside. The interfaces may be incomplete, leaving out functionality that will be introduced in later versions, such as gathering of scheduling related information. The monthly delivery of new and improved prototypes will continue until month 8, at which point the candidate for the final interfaces will be delivered, with significant functionality implemented inside, although the functionality will be simple, fast-to-implement versions, taken from previous EU projects or open source, or created fresh, which ever path proves fastest during development.
    • Deliverable D2.1: Month 1 -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- simple set of interfaces to the portal, where the portal delivers "canned" responses when the interfaces are exercised. Upon delivery, these interfaces become available to the other workpackages, and feedback from them will drive design of the months 2 and 3 deliverables of this WP. The interfaces will be documented in a convetient form for the other WP participants to understand and use.
    • Deliverable D2.2: Month 2 -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- improved interfaces, with some actual functionality filled in, as determined by circumstances uncovered during the project. Upon delivery, these interfaces become available to the other workpackages, and feedback from them will drive design of the months 3 and 4 deliverables of this WP. The interfaces will be documented in a convetient form for the other WP participants to understand and use.
    • Deliverable D2.3: Month 3 -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- improved interfaces, with some actual functionality filled in, as determined by circumstances uncovered during the project. Upon delivery, these interfaces become available to the other workpackages, and feedback from them will drive design of the months 3 and 4 deliverables of this WP. The interfaces will be documented in a convetient form for the other WP participants to understand and use.
    • Deliverable D2.4: Month 4 -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- working, simple, prototype of portal, with simple implementations of each subsystem -- code and related artifacts of a working simple portal prototype (ready for use by DC and INRIA)
    • Deliverable D2.5: Month 5 -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- working, simple, prototype of portal, with simple implementations of each subsystem -- code and related artifacts of a working simple portal prototype (ready for use by DC and INRIA)
    • Deliverable D2.6: Month 6 -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- working, simple, prototype of portal, with simple implementations of each subsystem -- code and related artifacts of a working simple portal prototype (ready for use by DC and INRIA)
    • Deliverable D2.7: Month 7 -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- working, simple, prototype of portal, with simple implementations of each subsystem -- code and related artifacts of a working simple portal prototype (ready for use by DC and INRIA)
    • Deliverable D2.8: Month 8 -- XLab, with input from INRIA, DC, LB, IBM, Passau, and Gigas -- working, simple, prototype of portal, with simple implementations of each subsystem -- code and related artifacts of a working simple portal prototype (ready for use by DC and INRIA)
  • Task2: Harden the internals of the portal. Integrate deliverables from other work packages into the portal infrastructure.
Added line 26:
April 01, 2014, at 09:20 AM by 178.11.216.240 -
Changed line 1 from:

Work Package 2 DSM Runtime System

to:

Work Package 2 CloudDSM portal

April 01, 2014, at 09:20 AM by 178.11.216.240 -
Changed line 1 from:

Work Package 2

to:

Work Package 2 DSM Runtime System

April 01, 2014, at 09:16 AM by 178.11.216.240 -
Added lines 3-5:
  • Task1: participate in rapid prototyping process, together with members of WP 3 -- DSM runtime, WP 4 -- annotations, WP 5 -- transform tools, and WP 7 -- end application, to together get a minimum viable system up and running. This will create a high degree of communication between partners early in the project, to uncover hidden assumptions, develop shared ideas about approach, define clean interfaces and interactions between partners, workpackages, and deliverable pieces, and increase the smoothness and pace of progress for the rest of the project.
March 29, 2014, at 07:30 PM by Ales Cernivec XLAB - Refined tasks a bit - are there too many deliverables?
Changed line 7 from:
  * Deliverable D2.3: Month 36 -- Upgrades to portal that support changes made in the Scheduler and Specialization harness -- Technical Report, code, and other artifacts related to the implementation of the portal and supporting subsystems.
to:
  * Deliverable D2.3: Month 36 -- Upgrades to portal that support changes made in the Scheduler (advanced scheduler features) and Specialization harness -- Technical Report, code, and other artifacts related to the implementation of the portal and supporting subsystems.
Changed lines 10-11 from:
  * Deliverable D2.4: Month 6 -- detailed description of the final full featured scheduler -- technical report that describes the interfaces to the scheduler and what functions it performs
to:
  * Deliverable D2.4: Month 6 -- detailed description of the first version full featured scheduler -- technical report that describes the interfaces to the first version scheduler and what functions it performs
  * Deliverable D2.5: Month 24 -- detailed description of the final full featured advanced scheduler -- technical report that describes the interfaces to the advanced scheduler and what functions it performs
Changed lines 14-17 from:
  * Deliverable D2.: Month 12 --  
  * Deliverable D2.: Month 24 --  
  * Deliverable D2.: Month 36 --  
to:
  * Deliverable D2.6: Month 12 --  First Evaluation of the CloudDSM system portal
  * Deliverable D2.6: Month 24 --  Evaluation and Revised Requirements and Architecture of the CloudDSM 
  * Deliverable D2.8: Month 36 --  Final Evaluation of the CloudDSM system portal 
Changed lines 19-23 from:
  * The ability to assign which physical machine a given VM image is started and animated on, via API.
  * The ability to gather information about the CPU and network usage of each physical machine in a given hosting location, via API.
  * The ability for the CloudDSM portal to inform the hypervisor on a given physical machine that it should stop assigning a specific VM to CPUs, which is already loaded and active inside the hypervisor.  The portal also has the ability to tell the hypervisor to again begin assigning that VM to CPUs. (This eliminates the 2 minute overhead required to load a VM image into the hypervisor, and also eliminates charging the client for CPU time spent busy-waiting while waiting for data to arrive or for work to become free).
to:
  * The ability to assign which physical machine (IaaS provider) a given VM image is started and animated on, via API.
  * The ability to gather information about the CPU and network usage of each physical machine in a given hosting location, via API. Define interfaces to the chosen IaaS providers. 
  * The ability for the CloudDSM portal to inform the hypervisor on a given physical machine (via IaaS provider) that it should stop assigning a specific VM to CPUs, which is already loaded and active inside the hypervisor. The portal also has the ability to tell the hypervisor to again begin assigning that VM to CPUs. (This eliminates the 2 minute overhead required to load a VM image into the hypervisor, and also eliminates charging the client for CPU time spent busy-waiting while waiting for data to arrive or for work to become free).
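
To make the proposed extensions concrete, a placeholder sketch follows; none of these calls exist in the current Cloud API of Gigas or any other provider, and all names and signatures are invented here only for illustration:

    # Placeholder sketch of the proposed Cloud API extensions listed above.
    # These calls do not exist in any current IaaS stack; names and
    # signatures are invented to make the proposed functionality concrete.

    class CloudDSMCloudAPI:
        def place_vm(self, vm_image, physical_machine_id):
            """Start the given VM image on a chosen physical machine at the IaaS provider."""
            raise NotImplementedError("to be provided by the IaaS provider")

        def host_usage(self, hosting_location):
            """Return CPU and network usage for each physical machine at a hosting location."""
            raise NotImplementedError("to be provided by the IaaS provider")

        def pause_vm_scheduling(self, vm_id):
            """Tell the hypervisor to stop assigning an already-loaded VM to CPUs."""
            raise NotImplementedError("to be provided by the IaaS provider")

        def resume_vm_scheduling(self, vm_id):
            """Tell the hypervisor to begin assigning the VM to CPUs again."""
            raise NotImplementedError("to be provided by the IaaS provider")
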
Deleted line 28:
Deleted line 32:
March 28, 2014, at 09:18 AM by 80.114.134.224 -
Changed line 20 from:
  * The ability for the CloudDSM portal to inform the hypervisor on a given physical machine that it should stop assigning a specific VM to CPUs, which is already loaded and active inside the hypervisor.  The portal also has the ability to tell the hypervisor to again begin assigning that VM to CPUs.
to:
  * The ability for the CloudDSM portal to inform the hypervisor on a given physical machine that it should stop assigning a specific VM to CPUs, which is already loaded and active inside the hypervisor.  The portal also has the ability to tell the hypervisor to again begin assigning that VM to CPUs. (This eliminates the 2 minute overhead required to load a VM image into the hypervisor, and also eliminates charging the client for CPU time spent busy-waiting while waiting for data to arrive or for work to become free).
March 28, 2014, at 08:23 AM by 80.114.134.224 -
Changed line 17 from:
  • Task5: Modifications to Cloud software API -- Gigas Hosting -- add the following to the Cloud API:
to:
  • Task5: Modifications to Cloud software API -- (Gigas, XLab, Sean) -- add the following to the Cloud API:
March 28, 2014, at 08:21 AM by 80.114.134.224 -
March 28, 2014, at 08:21 AM by 80.114.134.224 -
Changed line 17 from:
  • Task5: Modifications to Cloud software API -- add the following to the Cloud API:
to:
  • Task5: Modifications to Cloud software API -- Gigas Hosting -- add the following to the Cloud API:
March 28, 2014, at 08:19 AM by 80.114.134.224 -
Changed lines 5-10 from:
  * Milestone   M2.1: Month 6  -- working, simple, prototype of portal, with simple implementations of each subsystem
  * Deliverable D2.1: Month 6  -- code and related artifacts of a working simple portal prototype (ready for use by DC and INRIA)
  * Milestone   M2.2: Month 24 -- Upgrades to portal that support changes made in the Scheduler and Specialization harness
  * Deliverable D2.2: Month 24 -- Technical Report, code, and other artifacts related to the implementation of the portal and supporting subsystems.
  * Milestone   M2.3: Month 36 -- Upgrades to portal that support changes made in the Scheduler and Specialization harness
  * Deliverable D2.3: Month 36 -- Technical Report, code, and other artifacts related to the implementation of the portal and supporting subsystems.
to:
  * Deliverable D2.1: Month 6  -- working, simple, prototype of portal, with simple implementations of each subsystem -- code and related artifacts of a working simple portal prototype (ready for use by DC and INRIA)
  * Deliverable D2.2: Month 24 -- Upgrades to portal that support changes made in the Scheduler and Specialization harness -- Technical Report, code, and other artifacts related to the implementation of the portal and supporting subsystems.
  * Deliverable D2.3: Month 36 -- Upgrades to portal that support changes made in the Scheduler and Specialization harness -- Technical Report, code, and other artifacts related to the implementation of the portal and supporting subsystems.
Changed lines 10-12 from:
  * Milestone   M2.4: Month 6 -- detailed description of the final full featured scheduler  
  * Deliverable D2.4: Month 6 -- technical report that describes the interfaces to the scheduler and what functions it performs
to:
  * Deliverable D2.4: Month 6 -- detailed description of the final full featured scheduler -- technical report that describes the interfaces to the scheduler and what functions it performs
Deleted line 12:
  * Milestone   M2.: Month 12 -- 
Deleted line 13:
  * Milestone   M2.: Month 24 -- 
Deleted line 14:
  * Milestone   M2.: Month 36 -- 
Added line 21:
March 28, 2014, at 08:17 AM by 80.114.134.224 -
Added lines 24-27:
  • Task5: Modifications to Cloud software API -- add the following to the Cloud API:
    • The ability to assign which physical machine a given VM image is started and animated on, via API.
    • The ability to gather information about the CPU and network usage of each physical machine in a given hosting location, via API.
    • The ability for the CloudDSM portal to inform the hypervisor on a given physical machine that it should stop assigning a specific VM to CPUs, which is already loaded and active inside the hypervisor. The portal also has the ability to tell the hypervisor to again begin assigning that VM to CPUs.
March 25, 2014, at 11:55 AM by 80.114.134.224 -
Deleted lines 15-22:
  • Task3: Develop scheduler (ICL, XLab, Sean)
    • Milestone M2.5: Month 12 -- First, simple, version of the full scheduler integrated into portal (replaces "dummy" version inside prototype portal)
    • Deliverable D2.5: Month 12 -- code and related artifacts of the working scheduler
    • Milestone M2.6: Month 24 -- Subset of advanced scheduler features functioning and integrated into portal.
    • Deliverable D2.6: Month 24 -- code and related artifacts of the working second-stage scheduler
    • Milestone M2.7: Month 36 -- All advanced scheduler features working within the integrated system.
    • Deliverable D2.7: Month 36 -- code and related artifacts of the working final, advanced scheduler
Changed lines 31-34 from:
  • Task3: Develop scheduler (Leader ICL with contributions to integration by XLab and Sean)

Milestone M2.3: (24 Months after the start of the project) A working scheduler that includes some advanced features. Measured performance improvements over the simple prototype scheduler. Milestone M2.4: (36 Months after the start of the project) All advanced features of scheduler added and demonstrated working within the system, with measured performance improvements delivered to end-user applications, which are due to the dynamic monitoring and re-provisioning.

to:
Changed line 90 from:

[1] http://www.conpaas.eu/

to:

[1] http://www.conpaas.eu/

March 24, 2014, at 08:48 AM by AlesCernivec XLAB - Added Contrail (ConPaaS) info
Added lines 84-101:

Suggested solution

ConPaaS[1] is an open source runtime environment for hosting applications in the cloud.

ConPaaS is part of the PaaS family and provides the full power of the cloud to application developers while shielding them from the complexity of the cloud. It can use different IaaS providers as a backend (EC2, OpenNebula; the plan is also to fully support OpenStack).

It is designed to host HPC scientific applications and web applications. Moreover, it automates the entire life-cycle of an application: development, deployment, performance monitoring, and automatic scaling. Importantly, it is under further development (as a result of the Contrail project).

It runs on a variety of public and private clouds and is easily extensible, which is where CloudDSM benefits the most. It allows developers to focus on application-specific concerns rather than on cloud-specific details.

Application Model: It provides a collection of services, where each service acts as a replacement for a commonly used runtime environment. For example, to replace a MySQL database, ConPaaS provides a cloud-based MySQL service which acts as a high-level database abstraction. The service uses real MySQL databases internally, and therefore makes it easy to port a cloud application to ConPaaS. Unlike a regular centralized database, however, it is self-managed and fully elastic: one can dynamically increase or decrease its processing capacity by requesting the service to reconfigure itself with a different number of virtual machines.

ConPaaS is also designed to facilitate the development of new services. All services derive from a single “generic service” which provides the basic functionalities for starting and stopping VMs, configuring and initializing the right services in them, etc. All the virtual machines in ConPaaS also rely on the same generic VM image. A service implementation therefore consists of a few hundred lines of Python (implementing the service-specific parts of the service manager) and Javascript (extending the front-end GUI with service-specific information and control).

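The following sketch illustrates the "derive from a generic service" structure described above; it is not actual ConPaaS code, and every class and method name is invented for the example:

    # Illustration of the "derive from a generic service" structure.
    # This is NOT actual ConPaaS code; names are invented to show the shape
    # of a service manager and its elasticity, nothing more.

    class GenericService:
        """Stand-in for the shared base: start/stop VMs, common configuration."""
        def __init__(self):
            self.vms = []

        def add_vms(self, count):
            base = len(self.vms)
            self.vms.extend(f"vm-{base + i}" for i in range(count))

        def remove_vms(self, count):
            if count > 0:
                del self.vms[-count:]

    class IllustrativeDSMService(GenericService):
        """A hypothetical CloudDSM service adding only service-specific behaviour."""
        def __init__(self):
            super().__init__()
            self.dsm_runtime_image = None

        def configure(self, dsm_runtime_image):
            # Service-specific part: record which DSM runtime image the VMs should boot.
            self.dsm_runtime_image = dsm_runtime_image

        def scale_to(self, target_vm_count):
            # Elasticity: grow or shrink to the requested number of VMs.
            delta = target_vm_count - len(self.vms)
            if delta > 0:
                self.add_vms(delta)
            elif delta < 0:
                self.remove_vms(-delta)

    if __name__ == "__main__":
        svc = IllustrativeDSMService()
        svc.configure("clouddsm-runtime-v0")
        svc.scale_to(4)
        print(svc.vms)
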
MapReduce and TaskFarming: these two applications make intensive use of the MapReduce and TaskFarming services from ConPaaS as their main runtime for high-performance big-data computations. Both applications also plan to offer Web-based frontends to their users.

[1] http://www.conpaas.eu/

March 21, 2014, at 07:23 AM by 80.114.134.224 -
Changed lines 2-32 from:
to:

(details of tasks are below)

  • Task1: Develop portal (XLab, ICL, DC, Gigas, Sean)
    • Milestone M2.1: Month 6 -- working, simple, prototype of portal, with simple implementations of each subsystem
    • Deliverable D2.1: Month 6 -- code and related artifacts of a working simple portal prototype (ready for use by DC and INRIA)
    • Milestone M2.2: Month 24 -- Upgrades to portal that support changes made in the Scheduler and Specialization harness
    • Deliverable D2.2: Month 24 -- Technical Report, code, and other artifacts related to the implementation of the portal and supporting subsystems.
    • Milestone M2.3: Month 36 -- Upgrades to portal that support changes made in the Scheduler and Specialization harness
    • Deliverable D2.3: Month 36 -- Technical Report, code, and other artifacts related to the implementation of the portal and supporting subsystems.
  • Task2: Define Scheduling component within portal (ICL, XLab, Sean)
    • Milestone M2.4: Month 6 -- detailed description of the final full featured scheduler
    • Deliverable D2.4: Month 6 -- technical report that describes the interfaces to the scheduler and what functions it performs
  • Task3: Develop scheduler (ICL, XLab, Sean)
    • Milestone M2.5: Month 12 -- First, simple, version of the full scheduler integrated into portal (replaces "dummy" version inside prototype portal)
    • Deliverable D2.5: Month 12 -- code and related artifacts of the working scheduler
    • Milestone M2.6: Month 24 -- Subset of advanced scheduler features functioning and integrated into portal.
    • Deliverable D2.6: Month 24 -- code and related artifacts of the working second-stage scheduler
    • Milestone M2.7: Month 36 -- All advanced scheduler features working within the integrated system.
    • Deliverable D2.7: Month 36 -- code and related artifacts of the working final, advanced scheduler
  • Task4: Testing and performance tuning of full CloudDSM system (IBM, DC, INRIA, CWI, Gigas, ICL, XLab)
    • Milestone M2.: Month 12 --
    • Deliverable D2.: Month 12 --
    • Milestone M2.: Month 24 --
    • Deliverable D2.: Month 24 --
    • Milestone M2.: Month 36 --
    • Deliverable D2.: Month 36 --
Added line 34:
Added line 49:

Questions and comments related to the WP

March 19, 2014, at 06:03 AM by 80.114.135.137 -
Added lines 49-51:

Research regarding _search_ for best division of work and best deployment of it onto available hardware

March 19, 2014, at 04:22 AM by 80.114.135.137 -
March 19, 2014, at 04:22 AM by 80.114.135.137 -
Added lines 1-48:

Work Package 2

  • Task1: Develop portal -- gather specifications, produce design, make prototype, test with end-user, iterate (Leader XLab, major contribution by ICL, with input by DC, Gigas, and Sean; ICL 2 PMs)
  • Task2: Define Scheduling component within portal -- interface with other components inside portal, which supply information upon which to base decision and produces a scheduler to be carried out (Leader ICL, major contribution by XLab, with input by DC, Gigas, and Sean; ICL 21PMs): The scheduler will exploit a self-aware dynamic analysis approach which in an on-line manner and in real-time, will receive as input the state of all system resources, and calculate resource availability and expected execution times for tasks currently active within the portal and expected to be launched based on predictions derived from usage patterns. Scheduling decisions will combine detailed status information regarding expected response times, internal network delays, possible security risks, and possible reliability problems. The self-aware dynamic analysis will return a "short list" of the best instantaneous provisioning decisions, ranked according to performance, security, reliability, energy consumption and other relevant metrics. Based on the task to be provisioned, the scheduler will be able to decide on provisioning rapidly. The portal and collection of DSM runtime systems include monitoring of whether the performance objectives of the task are being met, which will re-trigger the scheduler to make a decision again if the observations indicate an unsatisfactory outcome, which will be based on the new system state as well as the overhead related to any changes in provisioning.

Milestone M2.2: (12 Months after the start of the project) A working simple prototype scheduler, plus a detailed description of the final full featured scheduler. The prototype will implement all interfaces and successfully generate a schedule, but using simple algorithms. The targeted final scheduler Will be described in a Technical Report defining the scheduler functionality, structure, and all of its components, and describing its interfaces to and integration with the portal subsystems that monitor hardware and application features and with the DSM runtime system.

  • Task3: Develop scheduler (Leader ICL with contributions to integration by XLab and Sean)

Milestone M2.3: (24 Months after the start of the project) A working scheduler that includes some advanced features. Measured performance improvements over the simple prototype scheduler. Milestone M2.4: (36 Months after the start of the project) All advanced features of scheduler added and demonstrated working within the system, with measured performance improvements delivered to end-user applications, which are due to the dynamic monitoring and re-provisioning.

  • Task4: Develop portal, including all sub-systems (Leader XLab with all partners; ICL 12PMs): The portal contains a specialization harness, a scheduler, a deployer, repository of specialized binaries, repository of application characteristics relevant to scheduling, statistics on application resource usage, and continuously updated state of the hardware resources. The portal receives instructions directly from human users, and observes resources and interacts with lower system levels, and also observes runing applications, and makes decisions for tasks. As such, the portal contains a scheduler that self-optimises the system in real time. Thus the portal and scheduler must be efficient and avoid complex computations. The portal will accomplish responsiveness by being highly distributed and by exploiting inputs from multiple agents that operate simultantously around different time scales, and around different resources, which share the gathered information with the scheduler. This will be the philosophy driving the development of the portal so that it is made of many distributed/concurrent lightweight components with only lightweight synchronisation while maximising information sharing.

Deliverable D2.3: (27 Months after the start of the project) Technical Report and Open Source Software that implements the PT.

  • Task5: Testing and performance tuning (IBM with all partners; ICL 6 PMs)

Question: "Why would a single Cloud VM be used to perform computation tasks from multiple applications?" Answer: from an overhead perspective, sharing a single "worker" VM among multiple applications is the most efficient way to go. But there must be isolation between applications, so this raises security concerns. Those have to be addressed, which requires extra development effort. So there's a balance between performance (overhead), security, and effort (IE, add security features to CloudDSM runtime is extra effort).

Note that when a DSM command is issued by application code, the application context is suspended. If there are no ready contexts within the application, then the proto-runtime system can switch to work from a different application within nanoseconds. This allows fine-grained interleaving of work with communication. In contrast, if the time to switch among applications is on the order of hundreds of microseconds, which it would be if the CPU had to switch to a different Cloud-level VM, then the processor is better off simply sitting idle for any non-overlapped communication that is shorter than the application-switching time. In many situations, this will be the case, causing much lost performance and lost hardware utilization, and the customer being charged for idle CPU time. It is this loss that will be prevented by allowing a single VM, with its single DSM runtime instance inside it, to run work for multiple applications, and have the DSM runtime switch among the applications in its very fast way.

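A back-of-envelope illustration of this argument, using assumed latency numbers chosen only to show the shape of the trade-off:

    # Back-of-envelope illustration of the switching argument above.  The
    # latency numbers are assumptions for illustration, not measurements.

    COMM_GAP_US = 200.0        # non-overlapped communication wait (microseconds)
    DSM_SWITCH_US = 0.05       # DSM runtime switches application contexts in ~50 ns
    VM_SWITCH_US = 300.0       # switching to a different Cloud-level VM, ~hundreds of us

    def useful_overlap(gap_us, switch_us):
        """CPU time recoverable by switching to other work during one communication gap."""
        return max(0.0, gap_us - 2 * switch_us)   # must switch away and back

    if __name__ == "__main__":
        print("DSM-level switch recovers", useful_overlap(COMM_GAP_US, DSM_SWITCH_US), "us")
        print("VM-level switch recovers ", useful_overlap(COMM_GAP_US, VM_SWITCH_US), "us")
        # With a 200 us gap, the DSM runtime recovers almost all of it, while a
        # VM-level switch recovers nothing: the core is better off sitting idle.
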
Question: "Doesn't the hypervisor automatically handle suspending an idle VM under the covers? Why is an explicit sleep/wake command needed in the CloudAPI?" Answer: When an application is started, the portal may over-provision, starting VMs on many machines, and then putting them to sleep. As soon as the application issues a high computation request, the VMs are woken and given work to do, with expected duration on the order of seconds to at most a minute. When the computation is done, most of the VMs are put back to sleep until the next time. The time to suspend and resume is less than the time to start the VM.

Comment: "A VM can yield a physical processor through a POWER Hypervisor call (instruction?). It puts idle VMs into a hibernation state so that they do not consume any physical CPU resources."

Comment: "For IBM, our binary-reoptimizer tool has a special loader that runs the application as a thread within the same process as the optimizer and kicks off dynamic monitoring and recompilation. So there is one dynamic optimization process for each application running in a VM."

Comment: "In summary, there are two cases: (1) A given application has no work ready, with which to overlap communication (2) The whole DSM runtime system has no work ready. In case 1, the sharing of the hardware is more efficiently performed inside the DSM runtime, where it happens on the order of nanoseconds, which is why it's better for the DSM runtime to handle switching among applications. In case 2, the hypervisor has no way to know that the DSM runtime has no work! It sees the polling the DSM does while waiting for new incoming work as a busy application, so the hypervisor keeps the DSM going. The DSM runtime needs a way to tell the hypervisor to suspend the idle VM. After all, the polling consumes cpu time which the end-user pays money for! At the moment, proto-runtime simply uses pthread yield when it is idle, polling for new incoming work.. but it's not clear that will be enough to get the VM to stop re-scheduling it.. Also, from a higher level, the deployment tool knows periods when there's no work for a specific DSM runtime instance, and can issue a sleep/yield to the VM, which is faster than a full suspend-to-disk..

Comment: "On IBM there are two virtualization layers: Power hypervisor, and Cloud stack hypervisor. The Power hypervisor is not visible to the cloud stack, and is also much faster."

Question: "Why is it advantageous to have Gigas's KVM based Cloud stack, which allows an entire machine to be given exclusively to a given computation task. Isn't this wasteful?" Answer: This allows the DSM runtime to manage the assignment of work onto cores, which happens on the order of nano-seconds, far faster than a hypervisor could manage assignment of work onto cores. This assignment of whole machine to a VM is how EC2 from Amazon works, as well, for example. It doesn't waste, because the machine is fully occupied by the computation.

Question: "How does the cloud VM layer relate to the DSM runtime and the CloudDSM portal?" Answer: There is one instance of the DSM runtime inside each Cloud level VM. The DSM might directly use hypervisor commands to cause the VM it is inside of to fast-yield/sleep, at the point the DSM runtime detects that it has no ready work. The portal, though, decides when the VM should be long-term suspended or shutdown. The Cloud VM is given all the cores of a machine whenever possible, then the DSM within directly manages assigning application work to the cores, which includes suspending execution contexts at the point they perform a DSM call or synchronization. The portal runs inside its own Cloud VMs. It performs Cloud level control of creating VMs, suspending them to disk, starting DSM runtimes inside them, receiving command requests from application front-ends, and starting work-units within chosen VMs. The portal knows about the physical providers, and what hardware is at each, and what Cloud API commands to use to start VMs at them, and what Cloud API commands to use to learn how busy each location is.. hopefully the Cloud stack API has some way of informing the portal about uptime, or load, within a given physical location. The portal has a set of VMs that run code that performs the decision making about how to divide the work of a request among the providers and among the VMs created within a given provider. It may be advantageous for the portal to have Cloud APIs available that expose the characteristics of the hardware within the provider, and allows a measure of control over assignment of VMs to the hardware. The DSMs report status to the portal, which may decide to take work away from poor performing VMs and give it to others, perhaps even VMs in a different physical location. (It is unlikely that a VM itself will be migrated, but rather the work assigned to the DSM runtime inside the VM).

Question: "Is it possible that the cloud would want to move a VM from one core to another?" Answer: control over cores is inside the DSM runtime system. It's too fine grained for the Cloud stack or portal to manage. When a VM is created, all the cores of the physical machine should be given to that VM, and the hypervisor should let that VM own the hardware for as long as possible, ideally until the DSM runtime signals that it has run out of work.

Question: "How do units of work map onto Cloud level entities?" Answer: I see a hierarchical division of work.. at the highest level, work is divided among physical locations.. then at a given location, the work it receives is divided again, among VMs created within that Cloud host. If the Cloud stack at a host allows entire machines to be allocated, in the way Gigas and Amazon EC2 do, then a further division of work is performed, among the CPUs in the machine, which is managed by the DSM runtime. Hence, the portal invokes APIs to manage work chunks starting execution at the level of providers and at the level of Cloud VMs. The DSM system inside a given VM manages work chunks starting on individual cores inside a VM.

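A purely illustrative sketch of this hierarchical division of work; the even-split policy and numbers are placeholders, not the actual resource-calculation algorithm:

    # Illustrative sketch of the hierarchical division of work described above:
    # the portal splits a request among providers, then among VMs at each
    # provider; the DSM runtime inside each VM would split further among cores.
    # The even-split policy and numbers are placeholders only.

    def split_evenly(total_units, parts):
        base, extra = divmod(total_units, parts)
        return [base + (1 if i < extra else 0) for i in range(parts)]

    def divide_work(total_units, providers):
        """providers: dict mapping provider name -> number of VMs available there."""
        plan = {}
        per_provider = split_evenly(total_units, len(providers))
        for share, (name, vm_count) in zip(per_provider, providers.items()):
            plan[name] = split_evenly(share, vm_count)   # work units per VM
        return plan

    if __name__ == "__main__":
        # 100 work units across two providers, one with 3 VMs and one with 2.
        print(divide_work(100, {"provider-A": 3, "provider-B": 2}))
        # Inside each VM, the DSM runtime would assign its share to cores itself.
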
Within a given VM, the DSM runtime will create its own "virtual processors", or VPs, which are analogous to threads.. all VPs are inside the same hardware-enforced coherent virtual address space. The DSM will switch among these, in order to overlap communications to remote DSM instances, which are running inside different VMs.

Question: "What components of the CloudDSM system care about best resource allocation from the cloud perspective?" Answer: The calculation of best cloud level resource allocation is encapsulated inside a module that the portal runs. The portal collects information from the application, which the toolchain packages, and the portal collects information about the hardware at each provider, and about the current load on that hardware, and it collects statistics on each of the commands invoked by a given application, and it gives all of this information to the resource-calculation module. That module determines the best way to break up the work represented by the user command, and distribute the pieces across providers and across VMs. Within a VM, the DSM runtime system independently and dynamically decides allocation among the CPUs. Erol Gelenbe at Imperial College will be handling the algorithms for the Cloud level resource calculations.

Comment and Question: "if the DSM detects that it is running late, it can issue a request to the portal to add more machines to the on-going computation. The resource tool re-calculates the division of work." Question: "What information is the portal going to base its decisions on?" Answer: the DSM runtime running inside a given VM communicates status to the portal. Annotations might be inserted into the low-level source form that help the DSM runtime with this task, or the DSM runtime may end up handling this all by itself.