How to: Directly Access ECOSTRESS Data (S3)

NASA Earthdata Cloud data is stored in Amazon Web Services (AWS) Simple Storage Service (S3) buckets. Data in these buckets can be accessed directly using temporary credentials; this access is limited to requests made from within the US West (Oregon) (code: us-west-2) AWS region. In this notebook, we will access data from the ECOSTRESS Tiled Land Surface Temperature and Emissivity Instantaneous L2 Global 70 m V002 data product. These data are archived and distributed as Cloud Optimized GeoTIFF (COG) files. We will retrieve temporary AWS credentials and then load a single COG file, Land Surface Temperature (LST), directly into memory as an xarray dataarray from inside the AWS cloud (specifically, the us-west-2 region). This approach leverages native protocols for efficient access to the data directly in its S3 bucket.

Background

The ECOSTRESS mission answers questions about plant water use and stress by accurately measuring the temperature of plants. Plants regulate their temperature by releasing water through tiny pores on their leaves called stomata. If they have sufficient water they can maintain their temperature, but if water is insufficient, their temperatures rise, and this rise can be measured by ECOSTRESS. The images acquired by ECOSTRESS are the most detailed temperature images of the surface ever acquired from space and can be used to measure the temperature of an individual farmer's field. These temperature images, along with auxiliary inputs, are used to produce one of the primary science outputs of ECOSTRESS: evapotranspiration, an indicator of plant health via the measure of evaporation and transpiration of water through a plant.

Learning Objectives

Requirements

Outline

1. Setup
2. Load file directly to memory
3. Visualize the data

1. Setup

Import the required libraries.

import os
import boto3
import rasterio as rio
from rasterio.session import AWSSession
import rioxarray as rxr
import hvplot.xarray
import earthaccess

Authentication and Temporary AWS Credentials

Log into Earthdata using the login function from the earthaccess library. The persist=True argument will create a local .netrc file if it doesn’t exist, or add your login info to an existing .netrc file. If no Earthdata Login credentials are found in the .netrc, you’ll be prompted for them. After signing into Earthdata, these credentials can be used to get temporary AWS credentials so we can interact with S3 objects in the applicable Earthdata Cloud buckets. Each NASA data center has a different AWS credentials endpoint. The earthaccess library can retrieve credentials given just a data center name (e.g., ‘podaac’, ‘gesdisc’, ‘lpdaac’, ‘ornldaac’, ‘ghrcdaac’). ECOSTRESS data is archived by the LP DAAC (“lpdaac”), so we’ll request those temporary credentials.

# Log into Earthdata
auth = earthaccess.login(persist = True)
# Retrieve Temporary Credentials for LP DAAC Data
temp_creds_req = earthaccess.get_s3_credentials(daac='lpdaac')

Alternatively, we can manually define the S3 credential endpoints and make a request using the Python requests library to retrieve them. This process uses the login info from the .netrc file. Uncomment and use the cell below if you prefer this approach.

# import requests
# s3_cred_endpoint = {
#     'podaac':'https://archive.podaac.earthdata.nasa.gov/s3credentials',
#     'gesdisc': 'https://data.gesdisc.earthdata.nasa.gov/s3credentials',
#     'lpdaac':'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials',
#     'ornldaac': 'https://data.ornldaac.earthdata.nasa.gov/s3credentials',
#     'ghrcdaac': 'https://data.ghrc.earthdata.nasa.gov/s3credentials'
# }
# def get_temp_creds(provider):
#     return requests.get(s3_cred_endpoint[provider]).json()
# temp_creds_req = get_temp_creds('lpdaac')

Create a boto3 Session object using your temporary credentials. This Session is used to pass credentials and configuration to AWS so we can interact with S3 objects from applicable buckets.

session = boto3.Session(aws_access_key_id=temp_creds_req['accessKeyId'], 
                        aws_secret_access_key=temp_creds_req['secretAccessKey'],
                        aws_session_token=temp_creds_req['sessionToken'],
                        region_name='us-west-2')
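Temporary S3 credentials expire, typically after about an hour. A small helper like the one below (a sketch; it assumes the credential response carries an ISO 8601 'expiration' field, as the DAAC S3 credential endpoints return) can tell you when it is time to request fresh credentials:

```python
from datetime import datetime, timezone

def creds_need_refresh(creds, buffer_seconds=300):
    """Return True if temporary credentials are expired or will expire
    within `buffer_seconds`. Assumes creds['expiration'] is an ISO 8601
    timestamp, e.g. '2022-10-30 10:25:22+00:00' (format may vary by endpoint)."""
    expires = datetime.fromisoformat(creds['expiration'])
    if expires.tzinfo is None:  # treat a naive timestamp as UTC
        expires = expires.replace(tzinfo=timezone.utc)
    remaining = (expires - datetime.now(timezone.utc)).total_seconds()
    return remaining < buffer_seconds
```

If this returns True, call earthaccess.get_s3_credentials(daac='lpdaac') again and rebuild the boto3 Session with the new values.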

Context Manager

For this exercise, we are going to open a context manager for the notebook using the rasterio.env module to store the GDAL configuration required to access data in Earthdata Cloud. The context manager sends the authentication information when connecting to a file and can also customize how the file is handled locally. The GDAL environment variables must be configured to access COGs in Earthdata Cloud. Geospatial data access Python packages like rasterio and rioxarray depend on GDAL, leveraging GDAL’s “Virtual File Systems” to read remote files. GDAL has many environment variables that control its behavior; changing these settings can mean the difference between being able to access a file or not, and they can also affect performance. Please see the GDAL config options documentation for more details and all available options.

While the context manager is open (rio_env.__enter__()) we will be able to run commands that open or get data that would typically be executed within a “with” statement. Entering the context manager for the entirety of the notebook allows us to more freely interact with the data. We’ll close the context manager (rio_env.__exit__()) at the end of the notebook.

rio_env = rio.Env(AWSSession(session),
                  GDAL_DISABLE_READDIR_ON_OPEN='TRUE',
                  GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/cookies.txt'),
                  GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/cookies.txt'),
                  GDAL_HTTP_MAX_RETRY=10,
                  GDAL_HTTP_RETRY_DELAY=0.5)
rio_env.__enter__()

Above, GDAL_HTTP_COOKIEFILE and GDAL_HTTP_COOKIEJAR tell GDAL to use a cookie for authentication and where to find that cookie. GDAL_DISABLE_READDIR_ON_OPEN tells GDAL not to look for any auxiliary or sidecar files in the directory, which can slow down access. GDAL_HTTP_MAX_RETRY and GDAL_HTTP_RETRY_DELAY tell GDAL how many times to retry a connection and how long to wait between retries. These options are useful when a connection fails temporarily, allowing the workflow to continue without re-running.
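If you prefer not to hold the environment open for the whole notebook, the same configuration can be scoped to a single read with a "with" statement instead of calling __enter__() and __exit__() manually. A sketch, using the same session and options as above (uncomment to use):

```python
# with rio.Env(AWSSession(session),
#              GDAL_DISABLE_READDIR_ON_OPEN='TRUE',
#              GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/cookies.txt'),
#              GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/cookies.txt'),
#              GDAL_HTTP_MAX_RETRY=10,
#              GDAL_HTTP_RETRY_DELAY=0.5):
#     da = rxr.open_rasterio(s3_url)  # any s3:// COG URL; env is reset on exit
```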

2. Load the File Directly into Memory (S3)

In this example we’re interested in the ECOSTRESS data collection from NASA’s LP DAAC in Earthdata Cloud. Below we specify the URL to the data asset in Earthdata Cloud. This URL can be found via Earthdata Search or programmatically through earthaccess, the CMR API, or the CMR-STAC API. There are programmatic examples in the Python tutorials for ECOSTRESS, and an Earthdata Search example is available as well.

s3_url_lst = 's3://lp-prod-protected/ECO_L2T_LSTE.002/ECOv002_L2T_LSTE_24479_001_11SKU_20221030T092522_0710_01/ECOv002_L2T_LSTE_24479_001_11SKU_20221030T092522_0710_01_LST.tif'
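The granule file name itself encodes useful metadata (orbit, scene, MGRS tile, acquisition time, layer). Below is a small parser for ECO_L2T_LSTE V002 file names; the field labels are descriptive assumptions based on the tiled-product naming convention, not official field names:

```python
import os

def parse_ecostress_name(url):
    """Split an ECO_L2T_LSTE V002 granule file name into labeled fields.
    Field labels are descriptive, following the tiled-product naming pattern:
    ECOv002_L2T_LSTE_<orbit>_<scene>_<tile>_<datetime>_<build>_<iteration>_<layer>.tif"""
    stem = os.path.basename(url).rsplit('.', 1)[0]  # file name without extension
    keys = ['mission', 'level', 'product', 'orbit', 'scene',
            'tile', 'datetime', 'build', 'iteration', 'layer']
    return dict(zip(keys, stem.split('_')))

info = parse_ecostress_name(
    's3://lp-prod-protected/ECO_L2T_LSTE.002/'
    'ECOv002_L2T_LSTE_24479_001_11SKU_20221030T092522_0710_01/'
    'ECOv002_L2T_LSTE_24479_001_11SKU_20221030T092522_0710_01_LST.tif')
print(info['tile'], info['datetime'], info['layer'])  # 11SKU 20221030T092522 LST
```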

Read the ECOSTRESS LST COG into our workspace using rioxarray; this read uses the context manager that we have entered. Optionally, the mask_and_scale=True argument can be passed to mask fill values and apply the scale and offset to the data.

da = rxr.open_rasterio(s3_url_lst)
da

The file is read into Python as an xarray dataarray with band, y, and x dimensions. The file contains a single band, so the band dimension carries no information; we’ll use the squeeze() function to remove band as a dimension.

da_lst = da.squeeze('band', drop=True)
da_lst

3. Visualize the Data

Plot the dataarray, representing the ECOSTRESS LST layer, using hvplot. ECOSTRESS tiles are in UTM projections, so to visualize the data over a basemap tile we’ll need to reproject it to EPSG:4326. This can be accomplished using the rio.reproject() function.

da_lst_reproj = da_lst.rio.reproject("EPSG:4326")
da_lst_reproj.hvplot.image(x = 'x',
                           y = 'y',
                           crs = 'EPSG:4326',
                           cmap='jet',
                           tiles='EsriImagery',
                           title = f'{s3_url_lst.split("/")[-1]}',
                           frame_width=500)

Exit the context manager.

rio_env.__exit__()

Contact Info:

Email: LPDAAC@usgs.gov
Voice: +1-866-573-3222
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹
Website: https://www.earthdata.nasa.gov/centers/lp-daac

¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.