fqqlsun edited this page Mar 3, 2022 · 29 revisions

Welcome to the LEAF production tool wiki!

Lixin Sun and Richard Fernandes, Canada Centre for Remote Sensing, Government of Canada

1. Description

The LEAF production tool is an application developed based on the Earth Engine Python API. With this tool, users can efficiently produce biophysical parameter maps with associated uncertainty estimates from the surface reflectance satellite imagery stored in Google Earth Engine (GEE). The two major features of this tool are: (1) various production requirements can be achieved by configuring a flexible input parameter dictionary object; (2) all results can be automatically exported to a specified location (either Google Drive or Google Cloud Storage) in a batch mode.
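To illustrate the dictionary-driven configuration described above, the following is a minimal sketch of what such a parameter dictionary and a basic sanity check might look like. Every key name (`sensor`, `year`, `months`, `tile_names`, `prod_names`, `resolution`, `location`, `folder`), the tile identifier and the helper `validate_params` are hypothetical placeholders chosen for illustration; they are not taken from the tool's actual interface. Only the four product names and the two export destinations come from this page.

```python
# Hypothetical sketch: key names and values are assumptions for illustration,
# not the LEAF production tool's real parameter names.
VALID_PRODUCTS = {"LAI", "fCOVER", "fAPAR", "Albedo"}  # products listed on this page
VALID_LOCATIONS = {"drive", "storage"}  # Google Drive or Google Cloud Storage


def validate_params(params):
    """Return a list of problems found in a parameter dictionary (empty if none)."""
    problems = []
    unknown = set(params.get("prod_names", [])) - VALID_PRODUCTS
    if unknown:
        problems.append("unknown products: " + ", ".join(sorted(unknown)))
    if params.get("location") not in VALID_LOCATIONS:
        problems.append("location must be 'drive' or 'storage'")
    if not params.get("tile_names"):
        problems.append("at least one tile name is required")
    return problems


example_params = {
    "sensor": "S2",              # hypothetical sensor label
    "year": 2021,
    "months": [7],
    "tile_names": ["tile_A"],    # hypothetical tile identifier
    "prod_names": ["LAI", "fCOVER"],
    "resolution": 20,            # metres
    "location": "drive",
    "folder": "leaf_outputs",    # hypothetical export folder
}

print(validate_params(example_params))
```

Checking the dictionary before submitting a batch run is a design choice worth considering, since a mistyped product name would otherwise only surface after the export tasks have been queued.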

The regular output spatial unit of the LEAF production tool is a tile (900 km x 900 km) defined by the Canadian geospatial tile gridding system (26 tiles cover the Canadian landmass). However, a customized polygon can also be used to define the spatial extent of a production run. Currently, four types of biophysical products (LAI, fCOVER, fAPAR and Albedo) can be generated with this tool. Note that all the instructions in this document assume a Windows platform, because that is the only system on which the LEAF production tool has been tested.

2. Set Up Environment For The LEAF Production Tool

The prerequisites for running the LEAF production tool on a local/client computer are as follows:

  • Have a Google Cloud account;
  • Install Anaconda and the Earth Engine (EE) Python API on a local/client computer. The LEAF production tool must be executed through Jupyter Notebook. Anaconda is a widely used Python distribution for data science. It includes the libraries required by both the EE Python API and Jupyter Notebook;
  • Clone or download a copy of the Python LEAF production tool from GitHub and save it on the local computer.

2.1 Install Anaconda and the GEE Python API

Installing Anaconda on a local computer is a straightforward process: go to the Anaconda website and follow the given instructions. After the installation, search for Anaconda Prompt in the Windows Start menu and open it, because all conda commands must be issued from the Prompt window.

It is highly recommended to create a dedicated conda environment for installing the EE Python API and executing the LEAF production tool. The following command creates such an environment ("leaf_prod" is the name given to the environment here):

conda create -n leaf_prod

Once the dedicated conda environment has been created, the following commands activate it and install the EE Python API into it:

conda activate leaf_prod

conda install -c conda-forge earthengine-api
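After the installation, a quick check from Python can confirm that the environment is ready. The sketch below is an assumption about how one might verify the setup, not part of the LEAF production tool: `missing_packages` and `start_earth_engine` are hypothetical helper names. It relies only on the standard library plus the `ee.Authenticate()` and `ee.Initialize()` calls of the Earth Engine Python API.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def start_earth_engine():
    """Authenticate and initialize Earth Engine; call only once `ee` is installed."""
    import ee  # provided by the earthengine-api package

    ee.Authenticate()  # starts the Google sign-in flow when credentials are missing
    ee.Initialize()    # newer API versions may also require a Cloud project argument


if __name__ == "__main__":
    # "ee" is the import name of earthengine-api; "notebook" is Jupyter Notebook's.
    print("Missing packages:", missing_packages(["ee", "notebook"]))
```

If the printed list is empty, calling `start_earth_engine()` from the activated environment should complete the one-time authentication step.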

2.2 Obtain a copy of the LEAF production tool from GitHub

Currently, the biophysical parameter dataset associated with one tile consists of 11 image files in GeoTIFF format. Specifically, each of the four biophysical parameters (LAI, fCOVER, Albedo and fAPAR) has two associated images, a parameter estimate and its corresponding uncertainty map (8 image files in total). Additionally, there are three ancillary image files (quality control, acquisition date and land cover partition). With the spatial resolution set to 20 m, the size of one tile’s parameter dataset ranges from 3.1 GB to 44 GB depending on the location of the tile. Note that a smaller dataset size does not necessarily mean that less computing time is needed, because the size of the image files is determined by the homogeneity of the covered land surface. In northern Canada, the land cover is relatively homogeneous, but the number of satellite images involved in the calculation is larger due to the geometry of the satellite paths, which leads to a longer computing time.
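The file count and the raster dimensions implied by the paragraph above can be checked with a few lines of arithmetic. In this sketch the layer labels (for example `LAI_estimate`) and the helper names are hypothetical and do not reflect the tool's actual file naming. The tile size (900 km), the resolution (20 m) and the number of layers are taken from this page.

```python
PARAMETERS = ["LAI", "fCOVER", "Albedo", "fAPAR"]
ANCILLARY = ["quality_control", "acquisition_date", "land_cover_partition"]


def tile_layers():
    """List one tile's layers: estimate and uncertainty per parameter, plus ancillary."""
    layers = []
    for name in PARAMETERS:
        layers.append(name + "_estimate")      # hypothetical label
        layers.append(name + "_uncertainty")   # hypothetical label
    return layers + ANCILLARY


def pixels_per_side(tile_km=900, resolution_m=20):
    """Number of pixels along one side of a square tile."""
    return tile_km * 1000 // resolution_m


layers = tile_layers()
side = pixels_per_side()
print(len(layers), "layers per tile")        # 11 layers per tile
print(side, "x", side, "pixels per layer")   # 45000 x 45000 pixels per layer
```

Each layer of a full tile therefore holds roughly two billion pixels at 20 m, which helps explain why file sizes depend so strongly on how well the land surface compresses.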

The execution environment of the code is Jupyter Notebook. If the “geemap” Python package (https://geemap.org/) is also installed and the Chrome browser (rather than Firefox) is used to open the notebook file, the resulting parameter maps can also be visualized and explored within the notebook.

The LEAF production code is released under the Government of Canada's Open Government License.

Current Efficiency

A GEE application (in either JavaScript or Python) runs on a client-server architecture in which the majority of the computations are carried out on GEE servers. This means that the efficiency of the LEAF production code is not determined by the client-side computer.

Currently, producing a 20 m resolution biophysical vegetation parameter dataset for one tile from Sentinel-2 imagery takes 2 hours on average, meaning that generating a one-month, 20 m resolution Canadian national dataset (26 tiles) requires at least 52 hours. This time varies with the location of the tile, the network connection status and the time of execution (day, night or weekend). It can therefore be estimated that three to four days are required to operationally produce a monthly 20 m resolution national biophysical parameter map.
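The estimate above can be reproduced with simple arithmetic. The sketch below assumes that tiles are processed one after another, which is what the 26 x 2 = 52 hour figure implies. The function names are hypothetical, and the three-to-four-day range is converted back into an effective time per tile purely for illustration.

```python
def national_hours(tiles=26, hours_per_tile=2.0):
    """Total hours for a national run if tiles are processed sequentially."""
    return tiles * hours_per_tile


def implied_hours_per_tile(days, tiles=26):
    """Effective hours per tile implied by a total run time given in days."""
    return days * 24 / tiles


baseline = national_hours()
print("Baseline:", baseline, "hours, or", round(baseline / 24, 1), "days")
for days in (3, 4):
    per_tile = round(implied_hours_per_tile(days), 1)
    print(days, "days implies about", per_tile, "hours per tile")
```

Under these assumptions, the operational estimate corresponds to an effective rate somewhat slower than the 2-hour average, consistent with the variability factors listed above.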

Code in Repository

This repository does not include all the Python code of the LEAF production tool. Its main purpose is to provide a communication channel for investigating the efficiency of running the LEAF production code on a Google Cloud VM.
