Zoltan Kovacs

Larkbio zoltan.kovacs@larkbio.com

-] What application will you create (or enhance) as part of the project (what does it do)

Genetics-based personalized medicine will inevitably enter everyday clinical routine within a few years. Numerous medical fields will use data generated by Next Generation Sequencing in diagnostic processes (prenatal screening, oncology, syndrome identification) and therapeutic processes (genetics-based drug choice, prediction of effects and adverse effects). On-site analysis, however, requires specifically designed applications and huge computational capacity, both for processing the raw data and for validated interpretation.

-] Who will use the application (will it be used by end users in the community, or people inside a company using it as a tool, or for advancing science..)

The application will be used by patient care institutions at the periphery. Since diverse medical fields and requirements have to be taken into account, separate use case scenarios will be implemented.

-] What are the application's computation needs that will benefit from parallel computation

Genome sequencing at the highest sensitivity produces on the order of a terabyte of data for one human. Alignment, interpretation and validation multiply the demand, and when several analyses run in parallel, the computation far exceeds the limits of a single computer.

-] What has blocked you from using parallel computation up to this point

Up to this point the routine use of sequence data has not been feasible in clinical care, because the processing time is in the range of days or weeks. If the processing time can be reduced to hours or even to real time, sequencing could become a routine clinical test. By the time personalized genetics enters routine use, the supporting infrastructure must be ready and at hand.

-] How will your application provide higher benefit as a result of the parallel computation (what currently can't be done that will be enabled, or what aspect will be improved. For example, will weeks of waiting for simulation results drop to hours? Will a researcher be able to interactively search, rather than doing a scatter shot of simulations and hoping one of them was the right one? Will a product be producible with less material or less design effort? Will the graphics be richer, or render faster, or use less battery?)

Computation time is greatly reduced, and several analyses can be run in parallel.

-] Who will receive this benefit and how (for example, will the application help cure cancer for millions of EU citizens by enabling doctors to use personalized genetics?)

Patients waiting for a diagnosis benefit from a highly specific result; patients under therapy benefit from a personalized choice of treatment, in which the therapeutic response is optimized and adverse effects are minimized. Health care systems may have lower expenses because of better therapy response and fewer hospitalizations due to side effects.

-] What market(s) will be affected by this benefit, and will this benefit be passed on, to yet further markets

Markets affected: medical diagnostics, pharmaceutical industry, medical industry.

-] What language(s) do you plan to use for the application

The application will consist of algorithms written in C++, with less critical parts possibly written in Java.

-] Is the application.. desktop based, Cloud (SAAS) based, browser based, or mobile

The application is Cloud (SAAS) based, because a large amount of raw data needs to be uploaded and processed per patient. The algorithms generate the information that the doctors can interpret. The main component of the system is the set of analysis algorithms on the server side. The resulting data needs to be displayed in a browser or mobile application.

-] A little bit about the architecture (do you have a server with database, or a large data set that is churned through such as Big Data style, what parts of the computation are performed on the end-user device versus in a server, and so on)

The raw data is text based and needs to be matched against a reference. The primary result of the server-side algorithms is the mapping information. The raw data will be discarded after the mapping; only the preprocessed information is stored and used for further analysis. The mapping can be highly parallelized, because the reads can be aligned independently of one another.
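
The independence of the reads can be illustrated with a minimal C++ sketch. It is not the planned implementation: the names MapPosition, map_read and map_reads_parallel are hypothetical, and exact substring search is used only as a stand-in for a real alignment algorithm, which would also have to tolerate mismatches and gaps. The sketch only shows that each worker thread can process its own slice of the reads without any locking, because no read depends on the result of another.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical result type: 0-based position of the read in the reference,
// or -1 if the read has no exact match.
using MapPosition = long long;

// Stand-in for a real aligner (assumption): exact substring search only.
MapPosition map_read(const std::string& reference, const std::string& read) {
    const std::size_t pos = reference.find(read);
    return pos == std::string::npos ? -1 : static_cast<MapPosition>(pos);
}

// Map all reads using up to num_threads worker threads. Each worker handles
// a contiguous slice of the reads and writes only to its own slice of the
// result vector, so no synchronization is needed besides the final join.
std::vector<MapPosition> map_reads_parallel(const std::string& reference,
                                            const std::vector<std::string>& reads,
                                            unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<MapPosition> positions(reads.size(), -1);
    // Slice size, rounded up so that all reads are covered.
    const std::size_t chunk = (reads.size() + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(reads.size(), t * chunk);
        const std::size_t end = std::min(reads.size(), begin + chunk);
        if (begin == end) break;  // no reads left for further workers
        workers.emplace_back([&reference, &reads, &positions, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                positions[i] = map_read(reference, reads[i]);
            }
        });
    }
    for (std::thread& worker : workers) worker.join();
    return positions;
}
```

For example, with the reference ACGTACGGTCA and the reads ACGG, GTCA, TTTT and ACGT, map_reads_parallel returns the positions 4, 7, -1 and 0, regardless of the number of threads. On some platforms the program has to be built with thread support enabled (for example -pthread with GCC or Clang). The same slicing idea carries over from threads on one machine to separate machines, since only the reference has to be shared.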