Main.EuroDSLEarth History

Dr Samantha Lavender

Director, Pixalytics Ltd - Trusted Earth Observation Experts

1 Davy Road, Plymouth Science Park, Plymouth, Devon, PL6 8BX, UK
Mobile: +44 (0)7739 905541 Phone: +44 (0)1752 764407
E-mail: slavender@pixalytics.com https://twitter.com/samlvndr

http://www.pixalytics.com
Earth Observation Products and Services: http://www.pixalytics.com/what-we-do/solutions/

I have been writing code for 20+ years, focused on the processing of Earth Observation (EO) data. On a few occasions I have got involved with parallel computing, but I have primarily relied on serial code with multiple instances running, because of the effort involved. At present I mainly code in IDL, with some C and other languages, but I’ve started to work with Python as it has become very popular and there is a large number of open-source, EO-focused libraries.

I’m attaching an overview of my company, Pixalytics Ltd, which includes projects we’ve worked on.

I primarily write code to implement scientific algorithms to process satellite data or perform quality-control activities. As an example, I’ve attached a paper I published last year on the atmospheric correction of Landsat data. The underlying approach uses a least-squares error-minimization approach to solve the non-linear equations. I’m currently looking at other statistical learning approaches, e.g. Generalized Additive Models and Support Vector Machines, as I have other algorithms where least squares isn’t performing sufficiently well. This also links into data mining, as a large amount of data is processed and information retained, which is then used when performing future data processing.
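To illustrate the least-squares element, here is a minimal sketch using SciPy's `least_squares` on a toy exponential model. The model and data below are purely illustrative stand-ins, not the actual atmospheric-correction equations from the paper:

```python
# Hypothetical sketch: fitting a simple non-linear model by
# least-squares error minimization. The exponential model and the
# synthetic data are illustrative only.
import numpy as np
from scipy.optimize import least_squares

def model(params, x):
    """Toy non-linear model: a * exp(-b * x) + c."""
    a, b, c = params
    return a * np.exp(-b * x) + c

def residuals(params, x, observed):
    """Difference between model prediction and observations."""
    return model(params, x) - observed

# Synthetic 'observations' generated from known parameters
x = np.linspace(0.0, 2.0, 50)
true_params = (2.0, 1.5, 0.3)
observed = model(true_params, x)

# Iterate from an initial guess towards the best solution
fit = least_squares(residuals, x0=[1.0, 1.0, 0.0], args=(x, observed))
print(fit.x)  # parameters recovered close to (2.0, 1.5, 0.3)
```

The same iterate-towards-a-solution pattern applies whatever the underlying non-linear equations are; only `model` and `residuals` change.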

Yes, assuming there’s sufficient funds available, my plan would be to employ a student / developer who’d work on the project (ideally full-time) alongside me.

In the current data mining project I work with an SME called Terradue (http://www.terradue.com/). I’d be happy to introduce them to you.


-] What application will you create (or enhance) as part of the project (what does it do)

We have a suite of software that’s processing Landsat data collected over the UK during 2013/14/15. A PhD student of mine has gone through and found the ‘best’ images so that we have an optimum set covering the whole of the UK. The processing includes correction for the effects of the atmosphere, cloud identification (including cloud shadows) and then processing to value-added products such as vegetation indices. At this point we have mainly concentrated on the atmospheric correction. Different techniques process the data on a pixel-by-pixel basis or as groups of pixels called objects. This processing includes statistical techniques to solve non-linear equations, which involves iterating towards the best solution.
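As a sketch of the per-pixel style of processing, a value-added product such as a normalised difference vegetation index (NDVI) can be computed for every pixel at once with NumPy. The 4 x 4 reflectance arrays below are illustrative; real Landsat scenes are of the order of 9000 x 9000 pixels:

```python
# Hypothetical sketch of a per-pixel value-added product:
# NDVI = (NIR - red) / (NIR + red), computed over a whole band pair.
import numpy as np

def ndvi(red, nir):
    """Compute NDVI for every pixel, guarding against zero denominators."""
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    denom = nir + red
    out = np.zeros_like(denom)
    valid = denom != 0  # e.g. avoid dividing by zero over masked pixels
    out[valid] = (nir[valid] - red[valid]) / denom[valid]
    return out

red = np.full((4, 4), 0.1)   # illustrative reflectance values
nir = np.full((4, 4), 0.5)
result = ndvi(red, nir)
print(result[0, 0])  # (0.5 - 0.1) / (0.5 + 0.1) = 0.666...
```

Object-based techniques would instead aggregate such per-pixel values over groups of pixels before further analysis.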

As new images become available this dataset will be updated, and the aim is to go from a single UK set to separate seasonal sets so that we can analyse the seasonal cycle. We would also like to go backwards in time, using older sensors, so that long-term changes can be mapped; this requires additional analysis to check that the data are cross-calibrated.

In the longer term we’d also look to extend the dataset to Europe and then internationally.

-] Who will use the application (how large is the audience, what areas.. end users in the community, or people inside a company using it as a tool, or for advancing science..)

The Landsat processing software will remain internal to the company, but the Python parallel computing module(s) will be made available to the community (e.g. through GitHub) so they can be used more widely. In addition, a version of the processed Landsat dataset will be made available to the community, with additional value-added products available for sale.

Python has become very popular for remote sensing applications, and so I think that providing a simple means of parallel processing will be of benefit to many organisations, both academic and commercial. We will promote the project at conferences and through papers/articles, with the aim of encouraging active take-up.

-] What are the computation needs that will benefit from parallel computation

The computing need stems from the large size of the dataset, both individual scenes (of the order of 9000 by 9000 pixels) and the dataset as a whole; the interest is in being able to process/reprocess it much more quickly. I would like to include on-the-fly processing so that datasets are created within a few hours of users requesting them, which will be very useful for us when testing software updates internally, as well as for users ordering data.

-] What has blocked you from using parallel computation up to this point

The time/resources to write code that performs parallel computing properly – we currently work around this by starting multiple instances of the code. We have looked at Hadoop as a way forward, but significant resources would be required to get to a working system.

-] How will your application provide higher benefit as a result of the parallel computation (what currently can't be done that will be enabled, or what aspect will be improved. For example, will weeks of waiting for simulation results drop to hours? Will a researcher be able to interactively search, rather than doing a scatter shot of simulations and hoping one of them was the right one? Will a product be producible with less material or less design effort? Will the graphics be richer, or render faster, or use less battery?)

Yes, the aim is for the processing to drop from weeks/months to a few hours/days, depending on what’s being processed. As mentioned above, on-the-fly (i.e. on-demand) processing is also of interest. As processing speeds improve we can increase the size of the dataset, both in terms of short-/long-term temporal changes (i.e. seasonal and climatological processing) and the area of interest (e.g. extending beyond the UK).

We would also be interested in having improved visualisation of the processed dataset, which itself remains large and so is slow to load and manipulate.

-] Who will receive this benefit and how (for example, will the application help cure cancer for millions of EU citizens by enabling doctors to use personalized genetics?)

There will be a benefit to the wider remote sensing community, who undertake many different applications. The aims for the Pixalytics Landsat dataset are to provide information to:
- aid urban planning decisions (e.g. understanding the green spaces within cities and how cities are developing/changing over time)
- understand changes in vegetation more widely across the UK that are linked to changing land-use practices and climate change.

In addition, there will also be benefits for Pixalytics Ltd (as a small commercial company) in that we’ll be able to develop our business offering.

And, for the logistics of the project: we would like to start by coding in C/C++ but are open to integrating with other languages, such as Python, Java, or even JavaScript. Could you say a bit about your development process:

-] What language(s) do you plan to use for the application

The application will be a combination of Interactive Data Language (IDL) and Python. The current thinking is that the computationally intensive elements will be converted to Python so that they can be run as parallel processes.

-] Is the application.. desktop based, Cloud (SAAS) based, browser based, or mobile

The application will primarily be run on a cloud-based server, but there will also be a web-based user interface to the results that can be accessed through handheld/mobile hardware as well as PCs/laptops.

-] A little bit about the architecture (do you have a server with database, or a large data set that is churned through such as Big Data style, what parts of the computation are performed on the end-user device versus in a server, and so on)

The data are stored within a directory tree structure, with individual files being 10-30 GB in size uncompressed; the overall UK input dataset is ~500 GB (compressed).
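With everything on disk in a directory tree, a small utility can keep track of how large the archive has grown. A minimal sketch, using a temporary 1 KB file as a stand-in for the real multi-GB band files:

```python
# Hypothetical sketch: tallying the on-disk size of a scene directory
# tree. A tiny temporary tree stands in for the real ~500 GB archive.
import os
import tempfile

def tree_size_bytes(root):
    """Total size of all files beneath root, in bytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

with tempfile.TemporaryDirectory() as root:
    scene_dir = os.path.join(root, "2014", "scene_001")
    os.makedirs(scene_dir)
    with open(os.path.join(scene_dir, "band4.dat"), "wb") as f:
        f.write(b"\x00" * 1024)  # 1 KB stand-in for a band file
    size = tree_size_bytes(root)
    print(size)  # 1024
```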

I really appreciate your time in figuring out those things; it will help make the case for obtaining funding for the project.

From what you mentioned, it sounds like your story will be strong as far as the computation parts go. I'm thinking the end-user benefits part may need clarification, about the impact on the EU. Also, perhaps there is a GUI aspect: do you need data visualization?

Yes, once the data is processed, my aim would be to have a GUI that can be used to visualize the data.