Open Source Research Institute - CloudDSM

March 25, 2014, at 12:07 PM by 80.114.134.224 -

Changed line 3 from:

* ~~Task3~~: Develop scheduler (ICL, XLab, Sean)

to:

* Task1: Develop scheduler (ICL, XLab, Sean)

Changed line 11 from:

* ~~Task3~~: Develop scheduler (Leader ICL with contributions to integration by XLab and Sean)

to:

* Task1: Develop scheduler (Leader ICL with contributions to integration by XLab and Sean)

Added lines 16-19:

Task 2: Develop a model of the system that speeds up the search for the best work division and the best scheduling choices. The model should be very lightweight so that it can be used inside the runtime to make choices as a program executes. Employ the model inside a scheduler, which is plugged into the CloudDSM system, and measure the improvement in work division and scheduling quality achievable with its use.

Task 3: Develop a theory of scheduling that establishes a well-founded mathematical basis for discussions of scheduling. Use the theory to prove conclusions about particular scheduler approaches when applied to applications that have particular features. Apply the theory to drive the development of a scheduler that takes advantage of the additional information provided by the CloudDSM annotations. Plug the scheduler into the CloudDSM system and measure the improvement in scheduling choices achieved with its use.

March 25, 2014, at 11:56 AM by 80.114.134.224 -

Added lines 1-15:

!!WP 6 -- Work Division and Scheduling

* Task3: Develop scheduler (ICL, XLab, Sean)
* Milestone M2.5: Month 12 -- First, simple version of the full scheduler integrated into the portal (replaces the "dummy" version inside the prototype portal)
* Deliverable D2.5: Month 12 -- code and related artifacts of the working scheduler
* Milestone M2.6: Month 24 -- Subset of advanced scheduler features functioning and integrated into portal.
* Deliverable D2.6: Month 24 -- code and related artifacts of the working second-stage scheduler
* Milestone M2.7: Month 36 -- All advanced scheduler features working within the integrated system.
* Deliverable D2.7: Month 36 -- code and related artifacts of the working final, advanced scheduler

* Task3: Develop scheduler (Leader ICL with contributions to integration by XLab and Sean)
Milestone M2.3: (24 Months after the start of the project) A working scheduler that includes some advanced features. Measured performance improvements over the simple prototype scheduler.
Milestone M2.4: (36 Months after the start of the project) All advanced features of the scheduler added and demonstrated working within the system, with measured performance improvements delivered to end-user applications that are attributable to the dynamic monitoring and re-provisioning.

The scheduler will use a self-aware dynamic analysis approach that operates online and in real time. It will receive as input the state of all system resources and calculate resource availability and expected execution times, both for tasks currently active within the portal and for tasks expected to be launched, the latter based on predictions derived from usage patterns. Scheduling decisions will combine detailed status information regarding expected response times, internal network delays, possible security risks, and possible reliability problems. The self-aware dynamic analysis will return a "short list" of the best instantaneous provisioning decisions, ranked according to performance, security, reliability, energy consumption, and other relevant metrics. From this short list, the scheduler will be able to decide rapidly on provisioning for the task at hand. The portal and the collection of DSM runtime systems monitor whether the performance objectives of the task are being met. If the observations indicate an unsatisfactory outcome, the scheduler is re-triggered, and the new decision takes into account the new system state as well as the overhead of any change in provisioning.
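
The ranking and re-triggering described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Candidate` fields, the `WEIGHTS` values, the weighted-sum cost, and the function names are hypothetical and are not part of the CloudDSM design, which does not specify how the metrics are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible provisioning decision (hypothetical fields)."""
    name: str
    expected_time_s: float  # predicted execution time, in seconds
    security_risk: float    # 0.0 (none) to 1.0 (high)
    failure_risk: float     # 0.0 (none) to 1.0 (high)
    energy_j: float         # predicted energy use, in joules


# Assumed relative weights; a real scheduler would tune or learn these.
WEIGHTS = {"time": 1.0, "security": 10.0, "reliability": 10.0, "energy": 0.01}


def cost(c: Candidate, weights: dict = WEIGHTS) -> float:
    """Combine the metrics into one number; lower is better."""
    return (weights["time"] * c.expected_time_s
            + weights["security"] * c.security_risk
            + weights["reliability"] * c.failure_risk
            + weights["energy"] * c.energy_j)


def short_list(candidates, k: int = 3, weights: dict = WEIGHTS):
    """Return the k lowest-cost candidates, best first."""
    return sorted(candidates, key=lambda c: cost(c, weights))[:k]


def should_reprovision(current: Candidate, best: Candidate,
                       switch_overhead: float) -> bool:
    """Switch only if the best candidate still wins after paying
    the overhead of changing the provisioning."""
    return cost(best) + switch_overhead < cost(current)


candidates = [
    Candidate("vm-small", 120.0, 0.1, 0.1, 500.0),
    Candidate("vm-large", 40.0, 0.2, 0.1, 2000.0),
    Candidate("gpu-node", 30.0, 0.5, 0.3, 4000.0),
]
ranked = short_list(candidates, k=2)
```

With these made-up numbers, `ranked` holds `vm-large` followed by `gpu-node`. A task running on `vm-small` would be moved to `vm-large` when the switching overhead is 50 cost units, but not when it is 70.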

March 25, 2014, at 11:51 AM by 80.114.134.224 -

Deleted lines 0-32:

* Task1: divide application into user client, computation kernels, and work division
* Milestone M6.: Month 12 --
* Deliverable D6.: Month 12 --
* Milestone M6.: Month 24 --
* Deliverable D6.: Month 24 --
* Milestone M6.: Month 36 --
* Deliverable D6.: Month 36 --
* Task2: mock up using annotations
* Milestone M6.: Month 12 --
* Deliverable D6.: Month 12 --
* Milestone M6.: Month 24 --
* Deliverable D6.: Month 24 --
* Milestone M6.: Month 36 --
* Deliverable D6.: Month 36 --
* Task3: employ the various interfaces
* Milestone M6.: Month 12 --
* Deliverable D6.: Month 12 --
* Milestone M6.: Month 24 --
* Deliverable D6.: Month 24 --
* Milestone M6.: Month 36 --
* Deliverable D6.: Month 36 --

* Task3: employ the various interfaces
* Milestone M7.: Month

* Task4: Develop a software solution for de-novo genome assembly using de Bruijn graphs, which utilizes the CloudDSM system. (LarkBio)
Metagenomic studies aim to identify novel enzyme candidates in bacterial communities and have a variety of applications in medicine, biotechnology, and agriculture. The sequence data to process is large and comes from the DNA of different species. More data means a higher probability of also covering the least abundant of these species. To identify novel enzyme candidates, we have to run de-novo assembly algorithms, which join the short reads into longer fragments of DNA, and then run gene prediction algorithms on these fragments. Current de-novo algorithms build an in-memory graph from all reads and then analyze this graph. The size of the available memory limits the number of reads that can be handled this way, which in turn limits the enzyme candidates discovered in the sample. This is the problem that we address using the CloudDSM system.

* Task5: Validate the solution on a dataset which can be solved using currently available tools. (LarkBio)
To validate the software, we run the de-novo assembly on a smaller data set using both CloudDSM and a plain EC2 instance, and compare the results.

* Task6: Validate the solution on a large dataset, which is not feasible without CloudDSM. (LarkBio)
The next step of the validation is to run the algorithm on an extended data set and verify whether novel fragments are assembled this way.
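
The in-memory graph that Task4 refers to can be illustrated with a minimal sketch of de Bruijn graph construction. This is a textbook-style toy under stated assumptions, not LarkBio's implementation: the function names `build_de_bruijn` and `extend_contig` are hypothetical, and sequencing errors, reverse complements, and k-mer coverage are all ignored.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer found in a read adds an
    edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def extend_contig(graph, start):
    """Follow edges from `start` while the path is unambiguous
    (exactly one successor) and has not looped back on itself."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        nxt = next(iter(graph[node]))
        if nxt in visited:
            break
        contig += nxt[-1]  # each step adds one new base
        visited.add(nxt)
        node = nxt
    return contig


graph = build_de_bruijn(["ACGT", "CGTA"], k=3)
contig = extend_contig(graph, "AC")
```

For the two overlapping reads `ACGT` and `CGTA` with `k = 3`, the walk from `AC` joins them into the contig `ACGTA`. The whole graph lives in one process's memory here, which is the limitation that the task description identifies.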

March 25, 2014, at 10:56 AM by 80.99.199.138 -

Changed lines 27-28 from:

to:

Metagenomic studies aim to identify novel enzyme candidates in bacterial communities and have a variety of applications in medicine, biotechnology, and agriculture. The sequence data to process is large and comes from the DNA of different species. More data means a higher probability of also covering the least abundant of these species. To identify novel enzyme candidates, we have to run de-novo assembly algorithms, which join the short reads into longer fragments of DNA, and then run gene prediction algorithms on these fragments. Current de-novo algorithms build an in-memory graph from all reads and then analyze this graph. The size of the available memory limits the number of reads that can be handled this way, which in turn limits the enzyme candidates discovered in the sample. This is the problem that we address using the CloudDSM system.

Changed lines 30-31 from:

* Task6: Validate the ~~solution on large dataset~~, ~~which is not feasible without CloudDSM.~~ (LarkBio)

to:

To validate the software, we run the de-novo assembly on a smaller data set using both CloudDSM and a plain EC2 instance, and compare the results.

* Task6: Validate the solution on a large dataset, which is not feasible without CloudDSM. (LarkBio)
The next step of the validation is to run the algorithm on an extended data set and verify whether novel fragments are assembled this way.

March 24, 2014, at 07:22 AM by 92.76.168.74 -

Deleted line 0:

Added lines 22-30:

* Task3: employ the various interfaces
* Milestone M7.: Month

* Task4: Develop a software solution for de-novo genome assembly using de Bruijn graphs, which utilizes the CloudDSM system. (LarkBio)

* Task5: Validate the solution on a dataset which can be solved using currently available tools. (LarkBio)

* Task6: Validate the solution on a large dataset, which is not feasible without CloudDSM. (LarkBio)

March 19, 2014, at 05:28 AM by 80.114.135.137 -

Added lines 1-22:

* Task1: divide application into user client, computation kernels, and work division
* Milestone M6.: Month 12 --
* Deliverable D6.: Month 12 --
* Milestone M6.: Month 24 --
* Deliverable D6.: Month 24 --
* Milestone M6.: Month 36 --
* Deliverable D6.: Month 36 --
* Task2: mock up using annotations
* Milestone M6.: Month 12 --
* Deliverable D6.: Month 12 --
* Milestone M6.: Month 24 --
* Deliverable D6.: Month 24 --
* Milestone M6.: Month 36 --
* Deliverable D6.: Month 36 --
* Task3: employ the various interfaces
* Milestone M6.: Month 12 --
* Deliverable D6.: Month 12 --
* Milestone M6.: Month 24 --
* Deliverable D6.: Month 24 --
* Milestone M6.: Month 36 --
* Deliverable D6.: Month 36 --

Page last modified on March 25, 2014, at 12:07 PM

CloudDSM.WP6 History