Add ERA5 Monthly Single Levels 1979-2024 artifact #70

Open
wants to merge 1 commit into
base: main
from


imreddyTeja
Contributor

@imreddyTeja imreddyTeja commented Nov 13, 2024

The linked issue has conflicting information on whether there should be separate files for atmos and land calibration. The issue says "this can be in the same file but land wont use it" and
"We can have separate data files for atmos and land calibration. Only surface fluxes variables will be shared, and they can stay in the land data file."

I currently have all the variables in one file. I thought this made more sense because only one variable is not used by land, so a separate atmos calibration file would contain a single variable.

Checklist:

  • I created a new folder $artifact_name
    • I added a README.md in that folder that
      • describes the data and processing done to it
      • lists the sources of the raw data
      • lists the required citation, licenses
    • If applicable (e.g., for Creative Commons), I added a LICENSE file
    • I added the scripts that retrieve, process, and produce the artifact
    • I added the environment used for such scripts (typically, Project.toml
      and Manifest.toml)
    • I added the OutputArtifacts.toml file containing the information
      needed for package developers to add $artifact_name to their package
  • I uploaded the artifact folder to the Caltech cluster (in
    /groups/esm/ClimaArtifacts/artifacts/$artifact_name)
  • I added the relevant code to the Overrides.toml on the Caltech Cluster
    (in /groups/esm/ClimaArtifacts/artifacts/Overrides.toml)
  • I added a link to the main README.md to point to the new artifact

Plots of monthly mean:
mslhf_map
msror_map
msshf_map
mssror_map
msuwlwrf_map
msuwswrf_map
tcw_map

Plots of monthly means by hour of day (these plots are from 23:00-24:00):

mslhf_hourly_map
msror_hourly_map
msshf_hourly_map
mssror_hourly_map
msuwlwrf_hourly_map
msuwswrf_hourly_map
tcw_hourly_map

@imreddyTeja imreddyTeja linked an issue Nov 13, 2024 that may be closed by this pull request
@imreddyTeja imreddyTeja marked this pull request as ready for review November 13, 2024 18:44
@szy21
Member

szy21 commented Nov 13, 2024

I think it would be better to have total column water in a separate file. We will likely add more atmos variables to that file, or replace it with something else. But we can also do this later.

@kmdeck
Member

kmdeck commented Nov 13, 2024

> I think it would be better to have total column water in a separate file. We will likely add more atmos variables to that file, or replace it with something else. But we can also do this later.

Was there also a question of having surface variables in one file (some used by land, some by atmos), but 3d vars in a separate one only for atmos?

@szy21
Member

szy21 commented Nov 13, 2024

> > I think it would be better to have total column water in a separate file. We will likely add more atmos variables to that file, or replace it with something else. But we can also do this later.
>
> Was there also a question of having surface variables in one file (some used by land, some by atmos), but 3d vars in a separate one only for atmos?

Yeah, I was thinking surface variables in one file, 2d variables (e.g., vertical integrals such as water vapor path) for atmosphere only in one file, and 3d variables for atmosphere either in one file or one variable per file, as 3d variables can be very large.

7. `mean_surface_runoff_rate`
8. `mean_sub_surface_runoff_rate`
9. `total_column_water`
10. `number`
Member


we can define this too?

attrib = input_ds["longitude"].attrib,
)

new_times = map(input_ds["date"][:]) do t
Member


Could you explain what lines 116-125 do?

end

convert_to_f32(x) = Float32(x)
convert_to_f32(x::Missing) = x
Member


Out of curiosity, when was there missing data?

Member


maybe runoff over ocean?

Contributor Author


Downloading monthly averages by hour of day results in some weirdness with the time dimension: there are duplicate entries. Some variables are defined on one of the duplicates and some on the other. I had dealt with this improperly (assumed all were on one of the duplicates), which resulted in missing data. This is fixed now, and there is no missing data.
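
For context on the `convert_to_f32` pair discussed in this thread, here is a minimal sketch of how the two methods behave when broadcast over data that contains `missing`. The `raw` vector is a made-up stand-in, not data from the artifact.

```julia
# The two methods from the diff: convert numbers, pass `missing` through unchanged.
convert_to_f32(x) = Float32(x)
convert_to_f32(x::Missing) = x

# Hypothetical input (assumption): Float64 values with one missing entry.
raw = Union{Missing, Float64}[1.5, missing, 3.0]

# Broadcasting dispatches per element, so `missing` survives the conversion.
converted = convert_to_f32.(raw)
```

Without the `Missing` method, `Float32(missing)` would throw a `MethodError`, which is presumably why the second method was added while the duplicate time entries still produced missing values.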

Member

@kmdeck kmdeck left a comment


I think this looks good! Thank you! We can defer to @szy21 regarding the files

@imreddyTeja
Contributor Author

imreddyTeja commented Nov 14, 2024

I just noticed that the time in the monthly averages (non-hourly) is formatted incorrectly. I'll fix that and split up the files.

Member

@szy21 szy21 left a comment


Thanks!

### `era5_monthly_surface_fluxes_197901-202410.nc`

This file contains Monthly averaged reanalysis from 1979 to present (October 2024 at time of creation), which is produced by averaging all daily data for each month. This results in 12*(2024-1979 + 10/12) = 550 points on the
time dimension, where each point is the 15th of the month that the point represents. For example, the 6th index of `time` is `19790601-01-15T00:00:00`,
Member


Suggested change
- time dimension, where each point is the 15th of the month that the point represents. For example, the 6th index of `time` is `19790601-01-15T00:00:00`,
+ time dimension, where each point is the 15th of the month that the point represents. For example, the 6th index of `time` is `1979-06-15T00:00:00`,
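
As a quick check of the arithmetic in the README excerpt, the time axis it describes can be rebuilt with the `Dates` standard library. This is only a sketch: `monthly_midpoints` is a hypothetical helper, not a function from the PR, and it assumes one timestamp per month on the 15th at 00:00:00.

```julia
using Dates

# Hypothetical helper (assumption): one DateTime on the 15th of every month
# from `first_month` through `last_month`, inclusive.
monthly_midpoints(first_month::Date, last_month::Date) =
    [DateTime(year(d), month(d), 15) for d in first_month:Month(1):last_month]

times = monthly_midpoints(Date(1979, 1, 1), Date(2024, 10, 1))

n_points = length(times)  # 45 full years * 12 + 10 months of 2024
sixth = times[6]          # June 1979
```

Under these assumptions `n_points` is 550 and `sixth` is `1979-06-15T00:00:00`, which agrees with the corrected timestamp in the suggested change.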

Comment on lines 150 to 248
ignore_mod_31 = [1, 3, 5, 7, 9, 11, 13]
time_indx = filter(i -> !(i % 31 in ignore_mod_31), 1:length(input_ds["time"][:]))
defDim(output_ds, "time", length(time_indx))
missing_indices = filter(i -> (i % 31 in ignore_mod_31), 1:length(input_ds["time"][:]));
for index in missing_indices
    if !all(input_ds["tcw"][:,:,index] .=== missing)
        @error "The index pattern of the invalid data is not as expected"
    end
end
Member


What does this block of code do?

Contributor Author


Downloading monthly averages by hour of day results in some weirdness with the time dimension: there are duplicate entries. Some variables are defined on one of the duplicates and some on the other. This block removes the duplicate points, which hold only missing data. I added a comment describing this.
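
The index filter described above can be tried on synthetic data without opening the NetCDF file. This is a sketch only: the array `data`, its 2x2 grid, and the length `n = 62` are made up for illustration, while `ignore_mod_31` and the modulo test are copied from the diff.

```julia
# Pattern from the diff: time indices whose remainder mod 31 is in this list
# are the duplicate entries expected to hold only missing data.
ignore_mod_31 = [1, 3, 5, 7, 9, 11, 13]

n = 62  # hypothetical number of time steps (assumption)
keep = filter(i -> !(i % 31 in ignore_mod_31), 1:n)
drop = filter(i -> (i % 31 in ignore_mod_31), 1:n)

# Synthetic stand-in for one variable with dimensions (lon, lat, time),
# filled with `missing` on the slices that should be dropped.
data = Array{Union{Missing, Float32}}(undef, 2, 2, n)
for i in 1:n
    data[:, :, i] .= (i in drop) ? missing : Float32(i)
end

# Same sanity check as in the diff: every dropped slice must be entirely missing.
pattern_ok = all(i -> all(ismissing, data[:, :, i]), drop)

# Keeping only the remaining indices leaves no missing values.
clean = data[:, :, keep]
```

With `n = 62`, `drop` has 14 entries (1, 3, ..., 13 and 32, 34, ..., 44) and `keep` has the other 48. Note that `@error` in the original block only logs a message; it does not stop the script if the pattern check fails.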

@imreddyTeja imreddyTeja force-pushed the tr/era5-calibration branch 3 times, most recently from 78c99b3 to f2e81f4 Compare November 15, 2024 15:57
Successfully merging this pull request may close these issues.

Download calibration data from 1979-present
3 participants