How can I use S3 links when earthaccess does not detect that I'm in-region (in us-west-2)? #845
-
Recently, via Slack, Shane Coffield (at GSFC) indicated that he ran into a situation where he used code similar to the following in an environment within the AWS us-west-2 region, but earthaccess was unable to determine this to be the case, and thus used HTTPS rather than S3 URLs:

```python
import earthaccess
import xarray as xr

earthaccess.login()

results = earthaccess.search_data(
    short_name='VNP03IMG',
    bounding_box=(-125, 31, -102, 49.2),
    temporal=('2020-06-01', '2020-06-02'),
    count=5
)

files = earthaccess.open(results, provider="LAADS")
print(files[0])

xr.open_dataset(files[0], engine='h5netcdf', group='geolocation_data')
```

This printed an HTTPS URL rather than an S3 URL.
So the question is: knowing that we are indeed running in us-west-2, how can we get earthaccess to do the right thing and use S3 rather than HTTPS, given that its ability to detect whether or not we're "in region" is not particularly robust?
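For anyone who wants to confirm the region independently of earthaccess, here is a minimal sketch (not part of the earthaccess API) that checks the common AWS environment variables and then falls back to the EC2 instance metadata service; note that the metadata service is reachable only from inside EC2 and may be blocked in some containerized environments:

```python
import os
from typing import Optional

import requests

IMDS = "http://169.254.169.254/latest"

def current_aws_region(timeout: float = 1.0) -> Optional[str]:
    """Best-effort region detection: env vars first, then EC2 instance metadata (IMDSv2)."""
    region = os.environ.get("AWS_REGION") or os.environ.get("AWS_DEFAULT_REGION")
    if region:
        return region
    try:
        # IMDSv2 requires fetching a session token before querying metadata paths.
        token = requests.put(
            f"{IMDS}/api/token",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
            timeout=timeout,
        ).text
        return requests.get(
            f"{IMDS}/meta-data/placement/region",
            headers={"X-aws-ec2-metadata-token": token},
            timeout=timeout,
        ).text
    except requests.RequestException:
        return None  # likely not on EC2 (or the metadata service is blocked)

print(current_aws_region() == "us-west-2")
```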
-
A very simple way to get this to work is to add the following line (suggested by David Giles) after the call to `earthaccess.login()`:

```python
earthaccess.__store__.in_region = True
```

However, use the approach above at your own risk. I recommend against it because `__store__` is not part of the library's "public" API. Thus, it is subject to change without notice, and there is no promise of any sort of backward compatibility, which is generally the case for the non-public API of any library.

Instead, I recommend the following approach:

```python
import earthaccess
import xarray as xr

earthaccess.login()

results = earthaccess.search_data(
    short_name='VNP03IMG',
    bounding_box=(-125, 31, -102, 49.2),
    temporal=('2020-06-01', '2020-06-02'),
    count=5
)

# Note: as of earthaccess 0.11.0, use get_s3_filesystem instead of get_s3fs_session.
# The two functions behave identically, but the new name accurately reflects the
# return type.
fs = earthaccess.get_s3fs_session(provider="LAADS")

s3_urls = [s3_url for result in results for s3_url in result.data_links(access="direct")]
print(s3_urls[0])

with (
    fs.open(s3_urls[0]) as nc,
    xr.open_dataset(nc, engine='h5netcdf', group='geolocation_data') as ds,
):
    # Do whatever you need to do with `ds` here.
    print(ds)
```

This prints an S3 URL rather than an HTTPS URL (ignoring the dataset output).
In addition to not accessing a "private" object (`earthaccess.__store__`), the code above explicitly closes both the file and the dataset when the `with` block exits. Unfortunately, using `earthaccess.open`, there is no convenient way to close the files it opens. The downside of this is most notable within a Jupyter notebook, particularly when opening many files, especially when fetching large portions of each file, as this will likely produce memory problems due to the unreleased resources.
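If you do need to open many files, a loop like the following keeps at most one file open at a time. This is only a sketch, reusing `results` and the `geolocation_data` group from the example above; the `latitude` variable and the mean reduction are stand-in assumptions for whatever computation you actually need:

```python
import xarray as xr

fs = earthaccess.get_s3fs_session(provider="LAADS")
s3_urls = [url for result in results for url in result.data_links(access="direct")]

summaries = []
for s3_url in s3_urls:
    with (
        fs.open(s3_url) as nc,
        xr.open_dataset(nc, engine='h5netcdf', group='geolocation_data') as ds,
    ):
        # Do all work *inside* the block, so nothing holds a reference to the
        # dataset after its underlying file has been closed.
        summaries.append(float(ds['latitude'].mean()))
```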
The following, similar code can be used by those who need to download files rather than open them directly (via `fs.open`):

```python
fs = earthaccess.get_s3fs_session(provider="LAADS")

s3_urls = [s3_url for result in results for s3_url in result.data_links(access="direct")]
filenames = [s3_url.split("/")[-1] for s3_url in s3_urls]

for s3_url, filename in zip(s3_urls, filenames):
    fs.download(s3_url, filename)
```
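As a side note, the filesystem returned by `get_s3fs_session` is an fsspec/s3fs filesystem, so `fs.get` (for which `fs.download` is an alias) should also accept matching lists of remote and local paths, letting s3fs dispatch the transfers concurrently. A minimal sketch, assuming the `s3_urls` and `filenames` lists from above:

```python
# Batched form of the download loop: one call with matching lists of remote
# and local paths.
fs.get(s3_urls, filenames)
```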