Dr Samantha Lavender

Director, Pixalytics Ltd - Trusted Earth Observation Experts

1 Davy Road, Plymouth Science Park, Plymouth, Devon, PL6 8BX, UK Mobile: +44 (0)7739 905541 Phone: +44 (0)1752 764407 E-mail: slavender@pixalytics.com https://twitter.com/samlvndr

http://www.pixalytics.com Earth Observation Products and Services: http://www.pixalytics.com/what-we-do/solutions/

-] What application will you create (or enhance) as part of the project (what does it do)

We have a suite of software that’s processing Landsat data collected over the UK during 2013/14/15. A PhD student of mine has gone through and found the ‘best’ images so that we have an optimum set covering the whole of the UK. The processing includes correction for the effects of the atmosphere, cloud identification (including cloud shadows) and then processing to value-added products such as vegetation indices. At this point we have mainly concentrated on the atmospheric correction. Different techniques process the data on a pixel-by-pixel basis or as groups of pixels called objects. This processing includes statistical techniques to solve non-linear equations, which involves iteration towards the best solution.
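As a concrete illustration of a pixel-by-pixel value-added product, the sketch below computes the Normalised Difference Vegetation Index (NDVI), one widely used vegetation index, from red and near-infrared reflectances. It is not taken from the Pixalytics software: the function names and the tiny 2 x 2 input are hypothetical, and a real scene would most likely be held in an array library rather than nested lists.

```python
from typing import List, Optional

Band = List[List[float]]


def ndvi(red: float, nir: float) -> Optional[float]:
    """Return (NIR - red) / (NIR + red), or None when the sum is zero."""
    total = nir + red
    if total == 0:
        return None  # no signal in either band, so the index is undefined
    return (nir - red) / total


def ndvi_image(red_band: Band, nir_band: Band) -> List[List[Optional[float]]]:
    """Apply ndvi() independently to every pixel of two equally sized bands."""
    return [
        [ndvi(r, n) for r, n in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red_band, nir_band)
    ]


# Hypothetical reflectance values, for illustration only.
red = [[0.10, 0.20], [0.0, 0.30]]
nir = [[0.50, 0.20], [0.0, 0.10]]
result = ndvi_image(red, nir)
```

Because each output pixel depends only on the matching input pixels, this kind of step can be split across workers without any communication between them.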

As new images become available this dataset will be updated and the aim is to go from a single UK set to separate seasonal sets so that we can analyse the seasonal cycle. Also, we would like to go backwards in time, using older sensors, so that long-term changes can be mapped. This requires additional analysis to check the data is cross-calibrated.

In the longer term we’d also look to extend the dataset to Europe / international countries.

-] Who will use the application (how large is the audience, what areas.. end users in the community, or people inside a company using it as a tool, or for advancing science..)

The Landsat processing software will remain internal to the company, but the Python parallel computing module(s) will be made available to the community (e.g. through GitHub) so they can be used more widely. In addition, a version of the processed Landsat dataset will be made available to the community, with additional value-added products available for sale.

Python has become very popular for remote sensing applications and so I think that providing a simple means to allow parallel processing will be of benefit to many organisations, both academic and commercial. We will promote the project at conferences and through papers / articles, with the aim of encouraging an active take-up.

-] What are the computation needs that will benefit from parallel computation

The computing need stems from the large size of the dataset, both the individual scenes (of the order of 9000 by 9000 pixels in size) and the dataset as a whole, with the interest being in having the ability to process / reprocess it much more quickly. I would like to include on-the-fly processing so that datasets are created within a few hours of users requesting them, which will be very useful both for us to test software updates internally and for users ordering data.
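One common way to make a scene of this size tractable is to cut it into tiles that can be processed independently. The sketch below is an assumption about how that could look, not a description of the existing software: the function name `tile_windows` and the tile size of 1024 pixels are hypothetical.

```python
from typing import Iterator, Tuple

# (row_start, row_stop, col_start, col_stop), with stop values exclusive
Window = Tuple[int, int, int, int]


def tile_windows(n_rows: int, n_cols: int, tile_size: int) -> Iterator[Window]:
    """Yield windows that cover an n_rows x n_cols scene without overlap."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    for row_start in range(0, n_rows, tile_size):
        row_stop = min(row_start + tile_size, n_rows)  # clip at the scene edge
        for col_start in range(0, n_cols, tile_size):
            col_stop = min(col_start + tile_size, n_cols)
            yield (row_start, row_stop, col_start, col_stop)


# A 9000 x 9000 scene with a hypothetical 1024-pixel tile size.
windows = list(tile_windows(9000, 9000, 1024))
```

Non-overlapping windows suit pixel-by-pixel steps. Object-based techniques that group neighbouring pixels would probably need overlapping windows, which this sketch does not handle.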

-] What has blocked you from using parallel computation up to this point

The time and resources needed to write code that performs parallel computing properly; we currently approximate this by starting multiple instances of the code. We have looked at Hadoop as a way forward, but significant resources would be required to get to a working system.
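For comparison with starting multiple instances by hand, the sketch below shows how the Python standard library's `concurrent.futures` module can spread independent chunks of work over a pool of worker processes from within a single program. It is a minimal sketch under stated assumptions: `mean_of_chunk` is a hypothetical stand-in for a real processing step, and `run_chunks` is an illustrative helper, not part of any existing Pixalytics module.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, Type


def mean_of_chunk(pixels: List[float]) -> float:
    """Hypothetical per-chunk step: the mean of a list of pixel values."""
    return sum(pixels) / len(pixels)


def run_chunks(
    worker: Callable[[List[float]], float],
    chunks: Iterable[List[float]],
    max_workers: Optional[int] = None,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
) -> List[float]:
    """Apply worker to every chunk using a pool; results keep input order."""
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map returns results in the order the chunks were given,
        # regardless of which worker finishes first.
        return list(pool.map(worker, chunks))
```

With the default `ProcessPoolExecutor`, the worker must be a module-level function so it can be sent to the child processes, and the call to `run_chunks` should sit under an `if __name__ == "__main__":` guard. Passing `executor_cls=ThreadPoolExecutor` switches to threads, which may be the better fit when a step is dominated by file reads rather than computation.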

-] How will your application provide higher benefit as a result of the parallel computation (what currently can't be done that will be enabled, or what aspect will be improved. For example, will weeks of waiting for simulation results drop to hours? Will a researcher be able to interactively search, rather than doing a scatter shot of simulations and hoping one of them was the right one? Will a product be producible with less material or less design effort? Will the graphics be richer, or render faster, or use less battery?)

Yes, the aim is for the processing time to drop from weeks / months to a few hours / days, depending on what’s being processed. As mentioned above, on-the-fly (i.e. on-demand) processing is also of interest. As processing speeds improve we can increase the size of the dataset, both in terms of short / long term temporal changes (i.e. seasonal and climatological processing) and the area of interest (e.g. extending beyond the UK).

We would also be interested in having improved visualisation of the processed dataset, which itself remains large and so is slow to load and manipulate.

-] Who will receive this benefit and how (for example, will the application help cure cancer for millions of EU citizens by enabling doctors to use personalized genetics?)

There will be a benefit to the wider remote sensing community, who undertake many different applications. The aims for the Pixalytics Landsat dataset are to provide information to:
- aid urban planning decisions (e.g. understanding the green spaces within cities and how cities are developing/changing over time)
- understand changes in vegetation more widely across the UK that are linked to changing land use practices and climate change.

In addition, there will also be benefits for Pixalytics Ltd (as a small commercial company) in that we’ll be able to develop our business offering.

And, for logistics of the project, we would like to start by coding in C/C++ but are open to integrating into other languages, such as Python, Java, or even Javascript. Could you say a bit about your development process:

-] What language(s) do you plan to use for the application

The application will be a combination of Interactive Data Language (IDL) and Python. The current thinking is that the computationally intensive elements will be converted to Python so that they can be run in parallel.

-] Is the application.. desktop based, Cloud (SAAS) based, browser based, or mobile

The application will primarily be run on a cloud-based server, but there will also be a web-based user interface to the results that can be accessed through hand-held / mobile hardware alongside PCs / laptops.

-] A little bit about the architecture (do you have a server with database, or a large data set that is churned through such as Big Data style, what parts of the computation are performed on the end-user device versus in a server, and so on)

The data is stored within a directory tree structure, with individual files being 10-30 GB in size, uncompressed, and the overall UK input dataset being ~500 GB (compressed) in size.

   I really appreciate your time in figuring out those things, it will help make the case for obtaining funding for the project.

   From what you mentioned, it sounds like your story will be strong as far as the computation parts.  I'm thinking the end-user benefits part may need clarification, about the impact on the EU.  Also, perhaps there is a GUI aspect..  do you need data visualization ? 

Yes, once the data is processed then my aim would be to have a GUI that can be used to visualize the data.

I have been writing code for 20+ years, focused on the processing of Earth Observation (EO) data. I have on a few occasions been involved with parallel computing, but have primarily relied on serial code with multiple instances running because of the effort involved. At present I mainly code in IDL, with some C and other languages, but I’ve started to work with Python as it has become very popular and there are a large number of open source EO-focused libraries.

I’m attaching an overview of my company, Pixalytics Ltd, which includes projects we’ve worked on.

I primarily write code to implement scientific algorithms to process satellite data or perform quality control activities. As an example, I’ve attached a paper I published last year on the atmospheric correction of Landsat data. The underlying approach uses Least Squares error minimization to solve the non-linear equations. I’m currently looking at other statistical learning approaches, e.g. Generalized Additive Models and Support Vector Machines, as I have other algorithms where Least Squares isn’t performing sufficiently well. This also links into data mining, as a large amount of data is processed and information retained, which is then used when performing future data processing.
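To make the idea of iterating towards a least-squares solution of a non-linear equation concrete, here is a minimal Gauss-Newton sketch. It is deliberately not the atmospheric correction from the attached paper: the one-parameter model y = exp(-k x), the synthetic data and the function name `fit_decay_rate` are all hypothetical choices made purely for illustration.

```python
import math
from typing import Sequence


def fit_decay_rate(
    xs: Sequence[float],
    ys: Sequence[float],
    k0: float = 0.1,
    tol: float = 1e-12,
    max_iter: int = 50,
) -> float:
    """Fit k in y = exp(-k * x) by Gauss-Newton least squares."""
    k = k0
    for _ in range(max_iter):
        model = [math.exp(-k * x) for x in xs]
        residuals = [y - m for y, m in zip(ys, model)]
        # Derivative of the model with respect to k at each sample point.
        jac = [-x * m for x, m in zip(xs, model)]
        jtj = sum(j * j for j in jac)
        if jtj == 0:
            raise ValueError("model is insensitive to k for these inputs")
        # Gauss-Newton update: solve the linearised least-squares problem.
        step = sum(j * r for j, r in zip(jac, residuals)) / jtj
        k += step
        if abs(step) < tol:
            break  # the update no longer changes k noticeably
    return k


# Synthetic, noise-free data generated with a hypothetical true value k = 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [math.exp(-0.5 * x) for x in xs]
k_fit = fit_decay_rate(xs, ys)
```

Each pixel or object would run its own iteration like this, which is part of why the processing cost grows so quickly with scene size.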

Yes, assuming there’s sufficient funds available, my plan would be to employ a student / developer who’d work on the project (ideally full-time) alongside me.

In the current data mining project I work with an SME called Terradue (http://www.terradue.com/). I’d be happy to introduce them to you.