SatelliteVu submission for the AWS Disaster Response Hackathon
Elevator pitch: The tech team at Satellite Vu applied our knowledge of wildfires, satellite imagery and machine learning to demonstrate a fire spread prediction system which could be applied globally using open data.
For this challenge, the tech team at Satellite Vu applied our knowledge of wildfires, satellite imagery and machine learning to demonstrate a fire spread prediction system. Satellites offer wide area coverage in near real time, and can access even the most remote locations. Accurate prediction of fire spread could improve disaster response, for example by identifying towns or properties at the greatest immediate risk. We used data from the MODIS and VIIRS satellite sensors, which provide long-term observational records in multiple spectral bands, as well as elevation and land cover data, to make predictions with both traditional machine learning and deep learning models. The models predicted the next day's fire pattern with higher precision and recall than the baseline model of fire persistence. We also demonstrated how the predictions could be used in an application for fire management teams. In future we hope that Satellite Vu’s high-resolution thermal imagery could be input to models such as these to provide even more spatially accurate fire spread predictions, helping fire management teams pinpoint specific populations and areas at risk. This work was completed as a team challenge in a two-week sprint.
Above: Landsat 8 satellite image of the Camp Fire on Nov. 8 2018
Data sources are mostly from the Registry of Open Data on AWS, apart from the NASA FIRMS data, which requires an account to access:
Data source | Description |
---|---|
Copernicus DEM | Digital Surface Model (DSM) which represents the surface of the Earth including buildings, infrastructure and vegetation. |
MODIS | The Moderate Resolution Imaging Spectroradiometer (MODIS) MCD43A4 Version 6 Nadir Bidirectional Reflectance Distribution Function (BRDF)-Adjusted Reflectance (NBAR) dataset is produced daily using 16 days of Terra and Aqua MODIS data at 500 meter (m) resolution. |
ESA Land Cover | The European Space Agency (ESA) WorldCover is a global land cover map with 11 different land cover classes produced at 10m resolution based on combination of both Sentinel-1 and Sentinel-2 data. |
ERA5 Atmospheric data | ERA5 is the fifth generation of ECMWF atmospheric reanalyses of the global climate, and the first reanalysis produced as an operational service. |
FIRMS active fire data | The Fire Information for Resource Management System (FIRMS) distributes Near Real-Time (NRT) active fire data within 3 hours of satellite observation from the Visible Infrared Imaging Radiometer Suite (VIIRS) aboard S-NPP and NOAA 20 at 375 meter (m) resolution. |
The processing code is in the folder `dataset_preparation`. For convenience the dataset has been uploaded to Kaggle at satellite-next-day-wildfire-spread.
Data quality checks involve visualising examples of the dataset and generating descriptive statistics for each variable, such as the mean and standard deviation, so that the data can be normalised when required by the model. This process should also identify any anomalies or outliers in the dataset which may require special attention. The readme in the `data_quality` folder provides further information. These checks were important since they identified that some features contained essentially the same information (e.g. NDVI & the landcover index) and allowed us to identify which features had the best overall data quality. An example single sample of data is shown below: note that not all features were used in the final model training.
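For illustration, per-feature statistics of this kind can be computed with NumPy. This is a minimal sketch, assuming all samples share the same `(H, W, n_features)` shape; `samples` and `feature_names` are illustrative placeholders rather than names used in the repo:

```python
import numpy as np

def feature_statistics(samples, feature_names):
    """Descriptive statistics per feature, for normalisation and simple outlier checks."""
    stacked = np.stack([np.asarray(s, dtype=np.float32) for s in samples])
    stats = {}
    for i, name in enumerate(feature_names):
        band = stacked[..., i]
        stats[name] = {
            "mean": float(np.nanmean(band)),
            "std": float(np.nanstd(band)),
            "min": float(np.nanmin(band)),
            "max": float(np.nanmax(band)),
            "nan_fraction": float(np.isnan(band).mean()),
        }
    return stats

def normalise(sample, stats, feature_names):
    """Standardise one sample with the computed statistics, e.g. before model training."""
    out = np.empty_like(sample, dtype=np.float32)
    for i, name in enumerate(feature_names):
        out[..., i] = (sample[..., i] - stats[name]["mean"]) / (stats[name]["std"] + 1e-8)
    return out
```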
Following the approach presented in the paper Next Day Wildfire Spread: A Machine Learning Data Set to Predict Wildfire Spreading from Remote-Sensing Data, both logistic regression and random forest models were trained on flattened 3x3 pixel patches of imagery. This required transforming the dataset of images into a table of data in CSV format; the processing for this transformation is in the notebook `classical_ml/create_csv.ipynb`. To avoid needing to copy image data locally and to conserve storage on the SageMaker instance, pixel data was accessed as numpy arrays hosted on AWS S3. To speed up processing, multiprocessing was used, allowing all 4 cores of the SageMaker instance to be used.
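A minimal sketch of that pattern is below, assuming each sample is stored as an `.npy` array of shape `(H, W, n_features)` on S3; the bucket name and key layout are illustrative, not the repo's actual layout:

```python
import io
from multiprocessing import Pool

import boto3
import numpy as np

BUCKET = "example-wildfire-bucket"  # placeholder bucket name

def sample_to_rows(key):
    """Read one sample from S3 and flatten its 3x3 pixel patches into table rows."""
    s3 = boto3.client("s3")  # create the client inside the worker to keep it fork-safe
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    arr = np.load(io.BytesIO(body))  # shape (H, W, n_features), never written to disk
    h, w, _ = arr.shape
    rows = [arr[i - 1:i + 2, j - 1:j + 2, :].ravel()
            for i in range(1, h - 1) for j in range(1, w - 1)]
    return np.stack(rows)

if __name__ == "__main__":
    keys = ["samples/000.npy", "samples/001.npy"]  # illustrative keys
    with Pool(processes=4) as pool:  # use all 4 cores of the Studio Lab instance
        tables = pool.map(sample_to_rows, keys)
    table = np.concatenate(tables)   # one row per centre pixel, ready to write to CSV
```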
The logistic regression model struggled to beat the naive baseline assumption that fires are persistent over time, whilst the random forest model achieved a significant improvement over the baseline (table below). Note that the training dataset is very imbalanced, with the significant majority of pixels being `not-fire`, but we want a model with good performance on the `fire` pixels. To deal with this imbalance we experimented with the `class_weight` parameter, as well as the `max_depth` of the trees. The precision and recall of the best performing model (selected for improving both of these metrics over the baseline) are in the table below.
Fire weight | max_depth | fire precision | fire recall | no-fire precision | no-fire recall |
---|---|---|---|---|---|
Baseline | NA | 0.37 | 0.37 | 0.98 | 0.98 |
4 | 10 | 0.41 | 0.41 | 0.98 | 0.98 |
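As an illustration, a class-weighted random forest along these lines can be trained with scikit-learn. This is a minimal sketch, assuming `X_train`/`y_train` and `X_test`/`y_test` are the flattened 3x3-patch feature tables and next-day fire labels (0 = no-fire, 1 = fire) described above; the variable names are placeholders:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,               # limit tree depth to reduce overfitting
    class_weight={0: 1, 1: 4},  # up-weight the rare fire class
    n_jobs=-1,
    random_state=42,
)
model.fit(X_train, y_train)

# Precision/recall per class, comparable to the table above
print(classification_report(y_test, model.predict(X_test), target_names=["no-fire", "fire"]))
```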
Inspection of feature importance identified the current fire location as the most significant indicator of fire spread, followed by elevation and landcover features.
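The feature-importance plot can be produced with Yellowbrick roughly as follows; this is a sketch assuming `model`, `X_train`, `y_train` and `feature_names` from the step above (note the visualizer refits the model it is given):

```python
from yellowbrick.model_selection import FeatureImportances

viz = FeatureImportances(model, labels=feature_names, relative=True)
viz.fit(X_train, y_train)  # fits the estimator and ranks feature importances
viz.show()
```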
Visualisation of fire predictions indicates that this simple model (based on a subset of features and 3x3 pixel patches of imagery) cannot predict significant movements of fires, but nevertheless provides useful predictions for slow moving fires.
Amongst several CNN model architectures tested, we selected the ResUNet as it showed the best performance during preliminary experiments. Its performance is compared to the baseline (fire persistence) in the table below.
Model | fire precision | fire recall | no-fire precision | no-fire recall |
---|---|---|---|---|
Baseline | 0.27 | 0.33 | 0.98 | 0.98 |
ResUNet | 0.30 | 0.44 | 0.98 | 0.97 |
The model achieves better precision and recall for the fire class than the baseline. The improvement in recall is larger, which suits an application where the cost of a missed fire is high. Note that the baseline metrics differ from the random forest values for two reasons: (1) the random forest model does not use the outermost pixels, and (2) the test dataset may have comprised slightly different images than those used for the random forest, as a discrepancy between the dataset versions used by the two team members was discovered very late in the challenge.
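For reference, a ResUNet is assembled from residual convolutional blocks of roughly the form below. This is a minimal Keras sketch; the exact block design, depth and filter counts of the model trained in this repo may differ:

```python
from tensorflow.keras import layers

def residual_block(x, filters, strides=1):
    """Pre-activation residual block of the kind used in ResUNet encoders/decoders."""
    shortcut = layers.Conv2D(filters, 1, strides=strides, padding="same")(x)
    y = layers.BatchNormalization()(x)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, strides=strides, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Add()([shortcut, y])  # skip connection around the two convolutions
```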
Visualisation of some predictions (below) shows that it is difficult for the model to predict big changes, and the classifications are very dependent on the previous day's fire mask used as input. Outputs often look like a smoothed-out version of the previous day's fire input feature.
Future work could include: class balancing support and a version of the Dice coefficient compatible with class balancing, and the use of AutoML pipelines to make tests of model architectures and hyperparameters more efficient.
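As a starting point for that future work, a soft Dice coefficient for the fire class can be written in Keras/TensorFlow as below. This is a minimal sketch; per-class weights would still need to be added to achieve the class balancing mentioned above:

```python
import tensorflow as tf

def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Soft Dice coefficient between the true fire mask and predicted probabilities."""
    y_true_f = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred_f = tf.reshape(tf.cast(y_pred, tf.float32), [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth
    )

def dice_loss(y_true, y_pred):
    """Loss usable with model.compile(loss=dice_loss, ...)."""
    return 1.0 - dice_coefficient(y_true, y_pred)
```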
Visualisation tools were created for demonstrating how model predictions can be used in applications.
Fire prediction mapping on a Sentinel-2 layer: this map tool was created to visualise the raw VIIRS fire data and the ML model predictions. The GeoJSON/metadata created during pre-processing is used to locate the fire pixels from the model output, and those fire pixels are then overlaid on the Sentinel-2 imagery. The tool is in notebook `fire_prediction_visualisation/split_map_s2_fire.ipynb`.
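A minimal sketch of such a split map with ipyleaflet is below; `s2_tiles_url` and `fire_pixels.geojson` are illustrative placeholders for a Sentinel-2 XYZ tile endpoint and the pre-processed fire pixel GeoJSON, not the repo's actual names:

```python
import json

from ipyleaflet import Map, TileLayer, GeoJSON, SplitMapControl, basemaps, basemap_to_tiles

m = Map(center=(39.8, -121.4), zoom=9)  # example centre near the Camp Fire

# Split view: OpenStreetMap basemap on the left, Sentinel-2 imagery on the right
osm = basemap_to_tiles(basemaps.OpenStreetMap.Mapnik)
s2 = TileLayer(url=s2_tiles_url, name="Sentinel-2")
m.add_control(SplitMapControl(left_layer=osm, right_layer=s2))

# Overlay the predicted fire pixels produced by pre-processing
with open("fire_pixels.geojson") as f:
    m.add_layer(GeoJSON(data=json.load(f), name="Predicted fire pixels"))
m
```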
Fire prediction mapping on top of OpenStreetMap: to make sure the predicted fire information can be easily used for fire management, we created a fire warning system, with an example below. You can zoom in and out to check the exact location of fire episodes. The system also fetches nearby building data from OpenStreetMap and generates a warning at the individual building/power station level for any asset within 10 km of a fire. The tool is in notebook `fire_prediction_visualisation/animation_fire_warning_using_open_street_map.ipynb` and exported to `fire_prediction_visualisation/fire_warning.html` (open in a browser).
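The OpenStreetMap lookup can be sketched with osmnx as below; `fire_lat`/`fire_lon` are placeholders for the centroid of a predicted fire pixel, and the 10 km radius matches the warning distance described above:

```python
import osmnx as ox

# Fetch buildings and power infrastructure within 10 km of the predicted fire pixel.
# osmnx >= 1.3 exposes features_from_point; older releases call this geometries_from_point.
tags = {"building": True, "power": True}
assets = ox.features_from_point((fire_lat, fire_lon), tags=tags, dist=10_000)

print(f"{len(assets)} buildings/power assets within 10 km of the predicted fire")
```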
All notebooks in this repo are runnable on AWS SageMaker Studio Lab. For the Python environment, some team members used custom environments defined in an `env.yml` file, whilst others installed packages into one of the supplied environments using a `requirements.txt` file.
- Python 3 used for processing, modelling and visualisation
- Scikit learn for classical ML modelling & Yellowbrick for feature importance visualisation
- Deep learning used Keras/tensorflow
- AWS Studio Lab used for running notebooks and training ML models
- AWS S3 used for data storage
- https://firms.modaps.eosdis.nasa.gov/ for VIIRS fire data
- https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs for Sentinel 2 images
- https://eod-catalog-svc-prod.astraea.earth/ for MODIS data
- https://registry.opendata.aws/ Registry of Open Data on AWS for all other data
- Open Street Map API (via osmnx)
- Git & github used for version control of code
- Weights & Biases was used for tracking training metrics of the deep learning model
- ipyleaflet was used to create interactive maps in Jupyter notebooks
- Zarr used as the format for storage of chunked, compressed, N-dimensional arrays
Note that each team member had their own AWS Sagemaker Studio Lab instance & git repositories, with the results and outputs aggregated in this repo.
The work for this submission was completed within a 2-week sprint, with team members coordinating over Slack and Zoom.
- Ed Jeffrey: data quality assurance
- Isobel Gray: scripted and voiced the video submission
- James O'Connor: generated all datasets except fire dataset
- Jade Constantinou: curated and edited the video submission & provided data visualisations
- Maria Ghironi: project manager
- Patrick Talon: training and evaluation of deep learning ResUNet model
- Robin Cole: trained the classical ML models and curated this repository
- Ross Hawton: generated the VIIRS fire dataset
- Tianran Zhang: wildfire scientist, created data visualisations