This is a small Python program that allows easy and fast extraction of bioclimatic variables (commonly used in ecological analyses and modeling) from 2 of the most prominent climate datasets, Chelsa and WorldClim. To extract the data, you only need the lon/lat (x,y) coordinates in any Coordinate Reference System in combination with the downloaded GeoTIFF files. The program takes care of the offset + scale correction and outputs the data in the proper units for each variable.
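For reference, the offset + scale handling is conceptually equivalent to this minimal sketch with rasterio (the file name here is hypothetical; the per-band scale/offset come from the GeoTIFF metadata):

import rasterio

# Sample one bioclim layer at a lon/lat point and convert the raw pixel to proper units
with rasterio.open("data/bioclim/some_bio1_layer.tif") as src:  # hypothetical file name
    raw = next(src.sample([(-71.890068, 45.393869)]))[0]   # raw stored pixel value (band 1)
    scale, offset = src.scales[0], src.offsets[0]           # per-band scale/offset metadata
    value = raw * scale + offset                             # e.g. degrees Celsius for bio1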
Also included is a quick way to visualize the results with the Plotly library by running the data_viz.py script, which can be improved and customized.
WorldClim
Latest publication : WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas (2017)
Historical data range : 1970-2000
Data at 1 km2 resolution for all 19 BIOCLIM vars : https://www.worldclim.org/
Metadata : bioclim-data-extraction/data/specs/anuclim61.pdf
Chelsa
Latest publication : Global climate-related predictors at kilometre resolution for the past and future (preprint)
Historical data range : 1981-2010
Data at 1 km2 resolution for all 19 BIOCLIM vars : https://chelsa-climate.org
Metadata : bioclim-data-extraction/data/specs/CHELSA_tech_specification_V2.pdf
Use conda to set up the Python environment
For the full requirements, see requirements.txt. Mainly uses Python 3 with the following core libraries:
- rasterio
- pyproj
- pandas
Clone the repo
git clone git@github.com:simlal/bioclim-data-extraction.git
cd bioclim-data-extraction
Create a new conda environment
conda create --name bioclim --file requirements.txt
conda activate bioclim
There are some unused dependencies in there, but there are intricacies regarding compatible versions of rasterio and pyproj, so just go ahead and use requirements.txt as is.
Dir structure
.
├── data
│   ├── bioclim
│   ├── specs
│   │   ├── anuclim61.pdf
│   │   └── CHELSA_tech_specification_V2.pdf
│   ├── states.csv
│   └── urls
│       ├── chelsa_bioclim19_S3paths.txt
│       └── worldclim_bioclim19_30s_path.txt
├── scripts
│   ├── config.yaml
│   ├── data_extraction.py
│   ├── download.py
│   └── __init__.py
├── requirements.txt
└── run.py
We will perform our data analysis from the main directory (i.e. the directory containing run.py in this example).
mkdir data/bioclim/
wget -i data/urls/chelsa_bioclim19_S3paths.txt -P data/bioclim/
wget -i data/urls/worldclim_bioclim19_30s_path.txt -P data/bioclim/
Steps
- Run download.py from the command line. You will be prompted to choose one of the following keywords:
python scripts/download.py
chelsa : for Chelsa bioclim19 dataset
worldclim : for WorldClim bioclim19 dataset
both : for Chelsa and WorldClim bioclim datasets
- Enter which dataset(s) you would like to download. Entering a dataset keyword (or 'both') will start the download.
The terminal will display progress and download-speed information. Note that the download proceeds in chunks.
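Roughly, a chunked download looks like the following sketch (a generic illustration, not the actual download.py code; the requests library and file names here are placeholders):

import requests

url = "https://example.com/some_bioclim_layer.tif"  # placeholder URL (the real ones are listed in data/urls/)
with requests.get(url, stream=True) as r, open("data/bioclim/some_bioclim_layer.tif", "wb") as f:
    for chunk in r.iter_content(chunk_size=1024 * 1024):  # download and write 1 MiB at a time
        f.write(chunk)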
Use the CrsDataPoint class directly from run.py
>>> from scripts.data_extraction import CrsDataPoint
>>> sherby_3857 = CrsDataPoint('Sherbrooke', epsg=3857, x=-8002765.769038227, y=5683742.6823244635)
>>> sherby_3857.get_info()
id : Sherbrooke
EPSG : 3857
Map coordinates :
x = -8002765.769038227
y = 5683742.6823244635
(x,y) = (-8002765.769038227, 5683742.6823244635)
>>> sherby_4326 = sherby_3857.transform_crs() # default is 4326 (aka GPS)
>>> sherby_4326.get_info()
id : Sherbrooke_transformed
EPSG : 4326
Map coordinates :
x = -71.89006805555556
y = 45.39386888888889
(x,y) = (-71.89006805555556, 45.39386888888889)
Since both the WorldClim and Chelsa databases (GeoTIFFs) are encoded with the EPSG:4326 coordinate reference system (CRS), we need to convert the x/y coordinates to EPSG:4326.
The transform_crs() method will be called automatically when a non-EPSG:4326 object is encountered and will convert the coordinates accordingly.
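Under the hood this is a pyproj transformation; roughly equivalent to this sketch (not the exact implementation):

>>> from pyproj import Transformer
>>> transformer = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)
>>> lon, lat = transformer.transform(-8002765.769038227, 5683742.6823244635)
>>> # lon, lat ~= (-71.890068, 45.393869), matching the get_info() output above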
Get values and relevant metadata
>>> from scripts.data_extraction import CrsDataPoint
>>> sherby_3857 = CrsDataPoint('Sherbrooke', epsg=3857, x=-8002765.769038227, y=5683742.6823244635)
>>> sherby_4326_chelsa = sherby_3857.extract_bioclim_elev(dataset='chelsa') # As dictionary
Data point with x,y other than EPSG:4326. Calling transform_crs() method...
...then extracting values for Sherbrooke_transformed at lon=-71.890 lat=45.394 for all climate variables bio1 to bio19 in CHELSA V2.1 (1981-2010) + elevation in WorldClim 2.1 dataset...
Done!
# Convert to DataFrame
>>> import pandas as pd
>>> sherby_bio_chelsa = pd.DataFrame([sherby_4326_chelsa])
>>> with pd.option_context('display.max_colwidth', 15):
...     print(sherby_bio_chelsa) # Output will vary based on terminal width
id epsg lon lat bio1 (Celcius) ... bio19 (kg / m**2 / month) bio19_longname bio19_explanation elevation_Meters elevation_explanation
0 Sherbrooke_... 4326 -71.890068 45.393869 6.05 ... 243.0 mean monthl... The coldest... 158 Elevation i...
[1 rows x 63 columns]
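The full dictionary also carries each variable's long name and explanation (assuming the *_longname / *_explanation keys shown for bio19 exist for every variable, as the 63-column output suggests):

>>> sherby_4326_chelsa['bio1_longname']      # long-form name of bio1
>>> sherby_4326_chelsa['bio1_explanation']   # short description of what bio1 measures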
To get a leaner dataframe with only the relevant class information + values, use the trim_data() helper:
>>> from scripts.data_extraction import trim_data
>>> sherby_bio_chelsa_trimmed = trim_data(sherby_4326_chelsa) # From full dict to trimmed dict
>>> print(pd.DataFrame([sherby_bio_chelsa_trimmed])) # Display as df
id epsg lon ... bio18 (kg / m**2 / month) bio19 (kg / m**2 / month) elevation_Meters
0 Sherbrooke_transformed 4326 -71.890068 ... 375.8 243.0 158
[1 rows x 24 columns]
Use the load_csv() classmethod with a CSV file containing the CrsDataPoint attributes as header.
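A minimal input CSV could look like this (an assumption based on the id, epsg, x, y attributes shown in the example below; check your own file's header against the class attributes):

id,epsg,x,y
Montgomery_Alabama,4326,-86.279118,32.361538
Juneau_Alaska,4326,-134.41974,58.301935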
>>> from scripts.data_extraction import CrsDataPoint
>>> from pathlib import Path
# US state capitals example
>>> csv_file = Path("./data/us-state-capitals.csv")
>>> data = CrsDataPoint.load_csv(csv_file) # As list of CrsDataPoint objects
# First 3 capitals
>>> print(data[0:3])
[CrsDataPoint(Montgomery_Alabama, epsg=4326, x=-86.279118, y=32.361538), CrsDataPoint(Juneau_Alaska, epsg=4326, x=-134.41974, y=58.301935), CrsDataPoint(Phoenix_Arizona, epsg=4326, x=-112.073844, y=33.448457)]
We can check the complete list of all instantiated objects at any point by using the CrsDataPoint.all attribute.
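For example (assuming CrsDataPoint.all is a plain list of instances, as its name suggests):

>>> len(CrsDataPoint.all)            # total number of CrsDataPoint objects created so far
>>> CrsDataPoint.all[-1].get_info()  # inspect the most recently created data point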
To extract the bioclim 1 to 19 + elevation values for a given dataset
Calling the extract_multiple_bioclim_elev(specimens_list, dataset, trimmed=True) function returns a dataframe (trimmed or exhaustive depending on the trimmed argument).
>>> from scripts.data_extraction import extract_multiple_bioclim_elev
# Extract bioclim values for all states
>>> df_trimmed = extract_multiple_bioclim_elev(data, 'worldclim', trimmed=True) # if False : full df
>>> df_trimmed = df_trimmed.set_index('id')
Extracting values for Montgomery_Alabama at lon=-86.279 lat=32.362 for all climate variables bio1 to bio19 + elevation in WorldClim 2.1 (1970-2000) dataset...
Done!
Extracting values for Juneau_Alaska at lon=-134.420 lat=58.302 for all climate variables bio1 to bio19 + elevation in WorldClim 2.1 (1970-2000) dataset...
Done!
...
Extracting values for Cheyenne_Wyoming at lon=-104.802 lat=41.146 for all climate variables bio1 to bio19 + elevation in WorldClim 2.1 (1970-2000) dataset...
Done!
# Checking the first five capitals
>>> print(df_trimmed.head())
epsg lon lat bio1 (Celcius) ... bio17 (kg / m**2 / month) bio18 (kg / m**2 / month) bio19 (kg / m**2 / month) elevation_Meters
id ...
Montgomery_Alabama 4326 -86.279118 32.361538 18.850000 ... 264.0 330.0 392.0 88
Juneau_Alaska 4326 -134.419740 58.301935 5.045834 ... 306.0 409.0 463.0 45
Phoenix_Arizona 4326 -112.073844 33.448457 22.787500 ... 13.0 53.0 59.0 336
Little Rock_Arkansas 4326 -92.331122 34.736009 16.950001 ... 270.0 275.0 301.0 104
Sacramento_California 4326 -121.468926 38.555605 16.587500 ... 9.0 9.0 249.0 9
[5 rows x 23 columns]
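From there the DataFrame can be used directly for downstream analysis, for example (column labels taken verbatim from the trimmed output above):

>>> coldest = df_trimmed.sort_values('bio1 (Celcius)').head()  # 5 capitals with the lowest mean annual temperature
>>> highest = df_trimmed.nlargest(5, 'elevation_Meters')       # 5 highest-elevation capitals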
Then save to csv
>>> data_dir = Path("./data")  # assuming we keep outputs alongside the raw data
>>> bioclim_out = data_dir / "us-capitals_bioclim.csv"
>>> if not bioclim_out.is_file():
...     df_trimmed.to_csv(bioclim_out)
All visualizations are made with the Plotly graphing library for Python. Run the data_viz.py script from the command line with the previously generated CSV as follows:
python data_viz.py us-capitals_bioclim.csv
The data_viz.py script was written quickly to provide a basic visualization of the climate data pipeline.
Example with scatterplot on Mapbox
Example with bio1 selection from dropdown menu
Example with elevation selection from dropdown menu
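For reference, a similar map can be produced with a few lines of Plotly Express (a standalone sketch, not the actual data_viz.py code; column names follow the trimmed CSV generated above):

import pandas as pd
import plotly.express as px

df = pd.read_csv("us-capitals_bioclim.csv")
fig = px.scatter_mapbox(
    df,
    lat="lat",
    lon="lon",
    color="bio1 (Celcius)",  # column label spelling as produced by the extraction step
    hover_name="id",
    zoom=3,
)
fig.update_layout(mapbox_style="open-street-map")  # free basemap, no Mapbox token needed
fig.show()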
Feel free to contact me by email or any other platform mentioned in my GitHub profile for any questions or feedback!