```python
import os
import boto3
import rasterio as rio
from rasterio.session import AWSSession
import rioxarray as rxr
import hvplot.xarray
import earthaccess
```
How to: Directly Access ECOSTRESS Data (S3)
NASA Earthdata Cloud data is stored in Amazon Web Services (AWS) Simple Storage Service (S3) buckets. Data located within these buckets can be directly accessed via temporary credentials; this access is limited to requests made from within the US West (Oregon) (`us-west-2`) AWS region. In this notebook, we will access data for the ECOSTRESS Tiled Land Surface Temperature and Emissivity Instantaneous L2 Global 70 m V002 data product. These data are archived and distributed as Cloud Optimized GeoTIFF (COG) files, one file for each spectral band. We will access a single COG file, Land Surface Temperature (LST), loading it directly into memory from inside the AWS cloud (specifically, the `us-west-2` region). To accomplish this we will retrieve temporary AWS credentials and load the data directly into memory as an `xarray` `DataArray`. This approach leverages native protocols for efficient access to the data directly in its S3 bucket.
Background
The ECOSTRESS mission addresses questions about plant water use and stress by accurately measuring the temperature of plants. Plants regulate their temperature by releasing water through tiny pores on their leaves called stomata. If they have sufficient water they can maintain their temperature, but if there is insufficient water, their temperatures rise, and this temperature rise can be measured with ECOSTRESS. The images acquired by ECOSTRESS are the most detailed temperature images of the surface ever acquired from space and can be used to measure the temperature of an individual farmer's field. These temperature images, along with auxiliary inputs, are used to produce one of the primary science outputs of ECOSTRESS: evapotranspiration, an indicator of plant health via the measure of evaporation and transpiration of water through a plant.
Learning Objectives
- How to authenticate to the NASA Earthdata system using the `earthaccess` Python library and a netrc file
- How to access and load a Cloud Optimized GeoTIFF (COG) file using `rasterio` and `rioxarray`
- How to visualize the data using `hvplot`
Requirements
- A cloud computing environment located within the AWS us-west-2 region.
- NASA Earthdata Login account. If you do not have an Earthdata Account, you can create one here.
- A compatible Python Environment - See setup_instructions.md in the
/setup/
folder
Outline
1. Setup
2. Load file directly to memory
3. Visualize the data
1. Setup
The required libraries are imported at the top of this notebook.
Authentication and Temporary AWS Credentials
Log into Earthdata using the `login` function from the `earthaccess` library. The `persist=True` argument will create a local `.netrc` file if it doesn't exist, or add your login info to an existing `.netrc` file. If no Earthdata Login credentials are found in the `.netrc`, you'll be prompted for them. After signing into Earthdata, these credentials can be used to get temporary AWS credentials so we can interact with S3 objects in the applicable Earthdata Cloud buckets. Each NASA data center has a different AWS credentials endpoint. The `earthaccess` library can retrieve credentials given just a data center name (e.g. 'podaac', 'gesdisc', 'lpdaac', 'ornldaac', 'ghrcdaac'). In this case, ECOSTRESS data is archived by the LP DAAC ('lpdaac'), so we'll want those temporary credentials.
```python
# Log into Earthdata
auth = earthaccess.login(persist=True)

# Retrieve Temporary Credentials for LP DAAC Data
temp_creds_req = earthaccess.get_s3_credentials(daac='lpdaac')
```
Alternatively, we can manually define the S3 credential endpoints and make a request using the `requests` Python library to retrieve them. This process will use the login info from the `.netrc` file. Uncomment and use the code below if you prefer.
```python
# import requests

# s3_cred_endpoint = {
#     'podaac': 'https://archive.podaac.earthdata.nasa.gov/s3credentials',
#     'gesdisc': 'https://data.gesdisc.earthdata.nasa.gov/s3credentials',
#     'lpdaac': 'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials',
#     'ornldaac': 'https://data.ornldaac.earthdata.nasa.gov/s3credentials',
#     'ghrcdaac': 'https://data.ghrc.earthdata.nasa.gov/s3credentials'
# }

# def get_temp_creds(provider):
#     return requests.get(s3_cred_endpoint[provider]).json()

# temp_creds_req = get_temp_creds('lpdaac')
```
Create a `boto3` Session object using your temporary credentials. This Session is used to pass credentials and configuration to AWS so we can interact with S3 objects from applicable buckets.
Context Manager
For this exercise, we are going to open up a context manager for the notebook using the rasterio.env
module to store the required GDAL configurations we need to access the data from Earthdata Cloud. The context manager sends the authentication information when connecting to a file and can also customize how the file is handled locally. The GDAL environment variables must be configured to access COGs in Earthdata Cloud. Geospatial data access Python packages like rasterio and rioxarray depend on GDAL, leveraging GDAL’s “Virtual File Systems” to read remote files. GDAL has a lot of environment variables that control its behavior. Changing these settings can mean the difference between being able to access a file or not. They can also have an impact on the performance. Please see the GDAL config options documentation for more details and all available options.
While the context manager is open (rio_env.__enter__()) we will be able to run commands that open or get data that would typically be executed within a “with” statement. Entering the context manager for the entirety of the notebook allows us to more freely interact with the data. We’ll close the context manager (rio_env.__exit__()) at the end of the notebook.
```python
rio_env = rio.Env(AWSSession(session),
                  GDAL_DISABLE_READDIR_ON_OPEN='TRUE',
                  GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/cookies.txt'),
                  GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/cookies.txt'),
                  GDAL_HTTP_MAX_RETRY=10,
                  GDAL_HTTP_RETRY_DELAY=0.5)
rio_env.__enter__()
```
Above, `GDAL_HTTP_COOKIEFILE` and `GDAL_HTTP_COOKIEJAR` tell GDAL to use a cookie for authentication and where to find that cookie. `GDAL_DISABLE_READDIR_ON_OPEN` tells GDAL not to look for any auxiliary or sidecar files in the directory, which can slow down access. `GDAL_HTTP_MAX_RETRY` and `GDAL_HTTP_RETRY_DELAY` tell GDAL how many times to retry the connection and how long to wait before retrying. These options are useful when a connection fails temporarily, allowing the workflow to continue without re-running.
2. Load the File Directly into Memory (S3)
In this example we're interested in the ECOSTRESS data collection from NASA's LP DAAC in Earthdata Cloud. Below we specify the URL to the data asset in Earthdata Cloud. This URL can be found via Earthdata Search or programmatically through `earthaccess`, the CMR API, or the CMR-STAC API. There are programmatic examples in the Python tutorials for ECOSTRESS, and an Earthdata Search example is available as well.
```python
s3_url_lst = 's3://lp-prod-protected/ECO_L2T_LSTE.002/ECOv002_L2T_LSTE_24479_001_11SKU_20221030T092522_0710_01/ECOv002_L2T_LSTE_24479_001_11SKU_20221030T092522_0710_01_LST.tif'
```
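As an aside, the granule name encodes the orbit, scene, MGRS tile, and acquisition time. The field positions below are an interpretation of the naming pattern seen in this URL, not an official parser:

```python
s3_url_lst = 's3://lp-prod-protected/ECO_L2T_LSTE.002/ECOv002_L2T_LSTE_24479_001_11SKU_20221030T092522_0710_01/ECOv002_L2T_LSTE_24479_001_11SKU_20221030T092522_0710_01_LST.tif'

# Split the file name on underscores; the indices assume the pattern
# ECOv002_L2T_LSTE_<orbit>_<scene>_<tile>_<datetime>_<build>_<version>_LST.tif
fname = s3_url_lst.split('/')[-1]
parts = fname.split('_')
orbit, scene, tile, acq_time = parts[3], parts[4], parts[5], parts[6]
# e.g. tile '11SKU' acquired at '20221030T092522' UTC
```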
Read the ECOSTRESS LST COG at the URL above into our workspace using `rioxarray`. This utilizes the context manager that we have entered. Optionally, we can use the `mask_and_scale` argument to mask nodata and apply the scale and offset values for the data.
```python
da = rxr.open_rasterio(s3_url_lst)
da
```
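To illustrate what `mask_and_scale=True` does under the hood: LST values in this product are stored as scaled integers, and reading with `mask_and_scale` applies the file's `scale_factor` and `add_offset` and masks nodata on read. The constants below are illustrative values for the sketch, not values read from this file's metadata:

```python
import numpy as np

# Illustrative conversion from stored digital numbers (DN) to Kelvin.
# mask_and_scale=True performs the equivalent masking and scaling on read.
scale_factor, add_offset, nodata = 0.02, 0.0, 0      # assumed example values
dn = np.array([0, 14500, 15000])                     # raw DNs; 0 = nodata
lst_k = np.where(dn == nodata, np.nan, dn * scale_factor + add_offset)
# lst_k -> [nan, 290.0, 300.0]
```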
The file is read into Python as an `xarray` `DataArray` with band, x, and y dimensions. In this example the band dimension is meaningless, so we'll use the `squeeze()` function to remove band as a dimension.
```python
da_lst = da.squeeze('band', drop=True)
da_lst
```
3. Visualize the Data
Plot the `DataArray`, representing the ECOSTRESS LST band, using `hvplot`. Since ECOSTRESS tiles are in UTM projections, to visualize this over a basemap tile we'll need to reproject to EPSG:4326. This can be accomplished using the `rio.reproject()` function.
```python
da_lst_reproj = da_lst.rio.reproject("EPSG:4326")
```
```python
da_lst_reproj.hvplot.image(x='x',
                           y='y',
                           crs='EPSG:4326',
                           cmap='jet',
                           tiles='EsriImagery',
                           title=f'{s3_url_lst.split("/")[-1]}',
                           frame_width=500)
```
Exit the context manager.
```python
rio_env.__exit__()
```
Contact Info:
Email: LPDAAC@usgs.gov
Voice: +1-866-573-3222
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹
Website: https://www.earthdata.nasa.gov/centers/lp-daac
¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.