Figure 1. The
- $TOA_{VHR}$ GeoTIFF
- $TOA_{VHR}$ cloudmask GeoTIFF
- Reference surface reflectance multispectral GeoTIFF
The
* singularity run -B <local path(s) to mount> <container name> python <python application> <runtime parameters>* |
---|
- -toa_dir - directory containing $TOA_{VHR}$ 2 m (default suffix = toa.tif). Note that a single, fully-qualified path to a single $TOA_{VHR}$ can be provided here instead of a directory.
- -target_dir - directory containing model data (default suffix = ccdc.tif)
- -cloudmask_dir - directory containing cloudmasks (default suffix = toa.cloudmask.v1.2.tif)
- -output_dir - directory containing results for this specific invocation
- -bandpairs - list of band pairs to be processed [(model band name B, TOA band name B), (model band name R, TOA band name R) …] [default="[['blue_ccdc', 'BAND-B'], ['green_ccdc', 'BAND-G'], ['red_ccdc', 'BAND-R'],['nir_ccdc', 'BAND-N'],['blue_ccdc', 'BAND-C'], ['green_ccdc', 'BAND-Y'], ['red_ccdc', 'BAND-RE'], ['nir_ccdc', 'BAND-N2']]"]
- --regressor - choose regression algorithm ['rma', 'simple', 'robust'] [default = 'rma']
- --cloudmask - apply cloud mask values to common mask [default = False]
- --csv - generate comma-separated values (CSV) for output statistics [default = True]
- --csv-dir - directory path to receive statistics files [default = output_dir, "None" turns off --csv flag]
- --err-dir - directory path to receive error files [default = output_dir]
- --warp-dir - directory path to receive interim warped files [default = output_dir]
- --band8 - create simulated bands for missing CCDC bands (C/Y/RE/N2) [default = False]
- --xres - specify target X resolution (default = 30.0)
- --yres - specify target Y resolution (default = 30.0)
- --toa_suffix - specify TOA file suffix (default = toa.tif)
- --target_suffix - specify TARGET file suffix (default = ccdc.tif)
- --cloudmask_suffix - specify CLOUDMASK file suffix (default = toa.cloudmask.v1.2.tif)
- --clean - overwrite previous output (default = True)
- --thmask - apply threshold mask values to common mask (default = False)
- --thrange - choose quality flag values to mask [default='-100, 2000']
- --pmask - suppress negative values from common mask [default = False]
- --debug - diagnostic level [0=None, 1=trace]
- --batch - name of run [default = None]
- --log - create log file for stdout/stderr [default = False]
Sample Invocation:
$ singularity run -B /panfs/ccds02/nobackup/people/iluser/projects/srlite,/panfs/ccds02/nobackup/people/gtamkin,/home/gtamkin/.conda/envs,/run,/explore/nobackup/people/gtamkin/dev,/explore/nobackup/projects/ilab/data/srlite/products /explore/nobackup/people/iluser/ilab_containers/srlite_1.0.1.sif python /usr/local/ilab/srlite/srlite/view/SrliteWorkflowCommandLineView.py -toa_dir /panfs/ccds02/nobackup/people/iluser/projects/srlite/test/input/baseline -target_dir /panfs/ccds02/nobackup/people/iluser/projects/srlite/test/input/baseline -cloudmask_dir /panfs/ccds02/nobackup/people/iluser/projects/srlite/test/input/baseline -bandpairs "[['blue_ccdc', 'BAND-B'], ['green_ccdc', 'BAND-G'], ['red_ccdc', 'BAND-R'], ['nir_ccdc', 'BAND-N'], ['blue_ccdc', 'BAND-C'], ['green_ccdc', 'BAND-Y'], ['red_ccdc', 'BAND-RE'], ['nir_ccdc', 'BAND-N2']]" -output_dir /explore/nobackup/projects/ilab/data/srlite/products/srlite_1.0.0-baseline/srlite_1.0.0-ADAPT-cli/Whitesands/srlite-1.0.0-rma-baseline --regressor rma --debug 1 --pmask --cloudmask --clean --csv --band8 |
---|
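The -bandpairs value is passed as a quoted string holding a nested list. As a minimal sketch of how such a string could be turned into (model band, TOA band) pairs, the snippet below uses Python's standard `ast.literal_eval`. The helper name `parse_bandpairs` is hypothetical, and this is not necessarily how SrliteWorkflowCommandLineView.py handles the argument.

```python
import ast


def parse_bandpairs(text):
    """Parse a -bandpairs string into a list of (model_band, toa_band) tuples.

    Hypothetical helper for illustration only; not part of the srlite code.
    """
    # literal_eval only accepts Python literals, so no arbitrary code runs.
    pairs = ast.literal_eval(text)
    result = []
    for pair in pairs:
        if len(pair) != 2:
            raise ValueError(f"expected a [model band, TOA band] pair, got {pair!r}")
        model_band, toa_band = pair
        result.append((str(model_band), str(toa_band)))
    return result


bandpairs = parse_bandpairs(
    "[['blue_ccdc', 'BAND-B'], ['green_ccdc', 'BAND-G'], "
    "['red_ccdc', 'BAND-R'], ['nir_ccdc', 'BAND-N']]"
)
for model_band, toa_band in bandpairs:
    print(model_band, "->", toa_band)
```

Keeping the argument a plain Python literal means the same string can be pasted between the command line and a notebook without reformatting.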
The output data is delivered to a directory specified in the program call (-output_dir). There are two outputs:
- The image data is output in COG format with the following naming convention: \<SENSOR\>\_\<YYYYMMDD\>\_\<CATID\>-sr-02m.tif.
- The regression results are output as .csv files (see Table 1). They contain linear model (slope and intercept) coefficients along with $SR_{VHR}$ performance statistics.
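To make the naming convention concrete, here is a small sketch that assembles an output image name from its three parts. The function name `srlite_output_name` and the example sensor code and catalog ID are placeholders chosen for illustration, not values taken from the workflow.

```python
from datetime import date


def srlite_output_name(sensor, acquired, catid):
    """Build <SENSOR>_<YYYYMMDD>_<CATID>-sr-02m.tif (hypothetical helper)."""
    return f"{sensor}_{acquired:%Y%m%d}_{catid}-sr-02m.tif"


# Placeholder sensor code and catalog ID, for illustration only.
example_name = srlite_output_name("WV02", date(2015, 6, 30), "EXAMPLECATID")
print(example_name)
```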
The
The Enhanced Very High Resolution Workflow (EVHR) [3] is used to produce
To identify valid surface pixels, we applied a convolutional neural network (CNN) algorithm to mask cloud cover for each WorldView VHR dataset [4]. The mask is returned as a binary map that separates cloud from non-cloud pixels. The development of this algorithm is ongoing. In some cases, transparent cirrus clouds are not identified as such, while some very bright non-cloud surfaces (smooth snow cover or bright lichen ground cover extents) may be misclassified as clouds.
To compile a reference of surface reflectance (
We compile an input filename list of the 3 datasets (
The input files are regridded by warping each to the projection of the input
After regridding, we build a common mask that includes all “no data” collected from both reference and input
We use the regridded and masked data stack to build a linear model to describe the relationship of the input
- simple: ordinary least squares regression from sklearn using the LinearRegression class.
- rma: Python’s pylr2 implementation of reduced major axis (RMA) regression to each
- robust: Python’s sklearn implementation using the HuberRegressor class.
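The difference between the simple and rma choices can be illustrated with a short NumPy sketch. It does not call pylr2 or sklearn: ordinary least squares is done with `np.polyfit`, and RMA uses the textbook closed form (slope = sign(r) · σ_y / σ_x). The function names and the synthetic reflectance values are assumptions made for this example, and the Huber-based robust option is not shown.

```python
import numpy as np


def fit_ols(x, y):
    """Ordinary least squares line y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def fit_rma(x, y):
    """Reduced major axis line using the closed-form slope sign(r) * sd(y) / sd(x)."""
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    intercept = np.mean(y) - slope * np.mean(x)
    return slope, intercept


# Synthetic, already-masked 30 m samples for one band (illustrative values only).
rng = np.random.default_rng(0)
toa = rng.uniform(0.05, 0.6, 500)                    # stand-in for TOA reflectance
ref = 0.9 * toa - 0.02 + rng.normal(0, 0.01, 500)    # stand-in for reference SR

ols_slope, ols_intercept = fit_ols(toa, ref)
rma_slope, rma_intercept = fit_rma(toa, ref)

# Apply the chosen band model back to the TOA values.
sr = rma_slope * toa + rma_intercept
print(f"OLS: slope={ols_slope:.4f} intercept={ols_intercept:.4f}")
print(f"RMA: slope={rma_slope:.4f} intercept={rma_intercept:.4f}")
```

Because the OLS slope equals r · σ_y / σ_x, the RMA slope is never smaller in magnitude than the OLS slope; the two converge as the correlation approaches 1.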
The bandwise model fit returned for the chosen model is then applied back to the corresponding input
Output images are returned as multi-band Cloud Optimized GeoTIFFs (COGs) with a corresponding linear model result table (CSVs). These tables include:
- a summary of the correction model coefficients applied bandwise to each pixel of the $TOA_{VHR}$,
- statistical results of the $SR_{VHR}$ comparison with input $SR_{reference}$ data and input $TOA_{VHR}$
The image products inherit the projection, extent, and cell size of the EVHR
Table 1. An example of the output CSV table for each
![][image2]
Table 2. Description of output linear model results table column names.
Column Name | Description |
---|---|
band_names | the name of the input TOA bands |
model | the name of the linear model choice |
intercept, slope | the values of the coefficients for each model relating input TOA to reference SR |
r2_score | coefficient of determination regression score, representing the proportion of variance of the dependent variable explained by the independent variables (https://scikit-learn.org/stable/modules/model_evaluation.html#r2-score) |
explained_variance | the “r-squared value” representing the proportion of variance between the dependent and independent variables explained by the linear model. Similar to r2_score but does not account for systematic offsets in the prediction. |
mae | mean absolute error in the prediction |
mbe | mean bias in the prediction |
mape | mean absolute percentage error from the model residuals |
medea | median absolute error (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.median_absolute_error.html#sklearn.metrics.median_absolute_error) |
mse | mean squared error from the model residuals (https://scikit-learn.org/stable/modules/model_evaluation.html#mean-squared-error) |
rmse | root mean squared error from the model residuals (y) |
mean_ccdc_sr | mean surface reflectance of the reflectance target (CCDC) in the final (fully masked) model array. |
mean_evhr_srlite | mean surface reflectance of the $SR_{VHR}$ product in the final (fully masked) model array. |
mae_norm | normalized mean absolute error, normalized by dividing the mae by the mean CCDC reflectance. |
rmse_norm | normalized root mean square error, normalized by dividing the rmse by the mean CCDC reflectance. |
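
Several of the columns in Table 2 follow directly from the model residuals. The sketch below computes them with NumPy for a pair of already-masked arrays. The function name `model_metrics` is hypothetical, and the residual sign convention (prediction minus reference) is an assumption made here, since the table does not state it.

```python
import numpy as np


def model_metrics(reference, predicted):
    """Residual-based statistics named after the Table 2 columns (illustrative)."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Assumed sign convention: positive residual means the prediction is too high.
    residuals = predicted - reference
    mae = np.mean(np.abs(residuals))
    mse = np.mean(residuals ** 2)
    rmse = np.sqrt(mse)
    mean_ref = np.mean(reference)
    return {
        "mae": mae,
        "mbe": np.mean(residuals),
        "medea": np.median(np.abs(residuals)),
        "mse": mse,
        "rmse": rmse,
        "mae_norm": mae / mean_ref,    # mae divided by mean reference reflectance
        "rmse_norm": rmse / mean_ref,  # rmse divided by mean reference reflectance
    }


# Tiny made-up sample: reference reflectance vs. modeled reflectance.
metrics = model_metrics([0.10, 0.20, 0.30], [0.12, 0.18, 0.33])
for name, value in metrics.items():
    print(f"{name}: {value:.5f}")
```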
For each individual
- the image’s $SR_{VHR}$ vs. input $TOA_{VHR}$ reflectance values (30 m).
- the image’s $SR_{VHR}$ vs. input reference reflectance values (30 m).
A suite of regression metrics for describing the strength of the model used to derive the output
For a batch of
- Snow: Snow is masked from CCDC, so a reference image from a partial-snow season will generally depict snow-free reflectance. Snow should be masked from EVHR before any regression against the CCDC reference is applied; however, no snow mask is currently available.
- Cloud shadow: Artificially dark areas in EVHR that are not masked can greatly affect the regression.
- Water: Not a stable reflectance target, especially with high sediment loads or ice.
- Saturation: More common in QB2/GE1 images and often associated with snow. Regression is not appropriate with saturated pixels. Saturated pixels are excluded from the CCDC model.
- Out-of-bounds values: Reflectance values below 0 or above 1 sometimes fall far outside the valid 0-1 range.
- VHR acquisition geometry plotted on polar coordinates, useful for understanding some of the variation within a batch of $SR_{VHR}$ output.
- $SR_{VHR}$ output assessment notebooks.
- Geographic shift: CCDC (generated on a 30 m grid) gets an extra arbitrary resampling step to match one corner of the $TOA_{VHR}$ image. This results in a subpixel shift, possibly accompanied by a smoothing if NN resampling is not used.
- Algorithm speed: mode resampling for cloud mask could be switched to average as long as pixels with a decimal value are then classified as cloud (conservative - more cloud area than the input). Mode may be an order of magnitude or more slower and is one of the slowest steps in the algorithm.
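
The cloud mask suggestion above can be sketched with plain NumPy block operations standing in for the warp resampling step. The code assumes a binary mask (1 = cloud) and an integer coarsening factor; the function names are hypothetical, and for a binary mask the mode is approximated as a simple majority (block mean above 0.5).

```python
import numpy as np


def block_mean(mask, factor):
    """Mean of each factor x factor block, trimming any ragged edge."""
    rows = mask.shape[0] - mask.shape[0] % factor
    cols = mask.shape[1] - mask.shape[1] % factor
    blocks = mask[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


def coarsen_average_conservative(mask, factor):
    """Average, then flag any block with a non-zero cloud fraction as cloud."""
    return (block_mean(mask, factor) > 0).astype(np.uint8)


def coarsen_majority(mask, factor):
    """Majority vote per block, the binary-mask equivalent of mode resampling."""
    return (block_mean(mask, factor) > 0.5).astype(np.uint8)


# 30 x 30 fine-resolution mask with one small 3 x 3 cloud patch (made-up data).
fine = np.zeros((30, 30), dtype=np.uint8)
fine[0:3, 0:3] = 1

conservative = coarsen_average_conservative(fine, 15)
majority = coarsen_majority(fine, 15)
print("conservative cloud cells:", int(conservative.sum()))
print("majority cloud cells:", int(majority.sum()))
```

The conservative variant flags every coarse cell that contains any cloud, so its cloud area always covers at least the area flagged by the majority vote.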
[1] P. M. Montesano et al., “Surface Reflectance From Commercial Very High Resolution Multispectral Imagery Estimated Empirically With Synthetic Landsat (2023),” IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., pp. 1–10, 2024, doi: 10.1109/JSTARS.2024.3456587.
[2] Z. Zhu, C. E. Woodcock, C. Holden, and Z. Yang, “Generating synthetic Landsat images based on all available Landsat data: Predicting Landsat surface reflectance at any given time,” Remote Sens. Environ., vol. 162, pp. 67–83, Jun. 2015, doi: 10.1016/j.rse.2015.02.009.
[3] C. S. R. Neigh et al., “An API for Spaceborne Sub-Meter Resolution Products for Earth Science,” Int. Geosci. Remote Sens. Symp. IGARSS, pp. 5397–5400, Jul. 2019, doi: 10.1109/IGARSS.2019.8898358.
[4] J. A. Caraballo-Vega et al., “Optimizing WorldView-2, -3 cloud masking using machine learning approaches,” Remote Sens. Environ., vol. 284, p. 113332, Jan. 2023, doi: 10.1016/j.rse.2022.113332.