How to: Directly Access ECOSTRESS Data (HTTP)

Summary

In this notebook, we will access data for the ECOSTRESS Tiled Land Surface Temperature and Emissivity Instantaneous L2 Global 70 m V002 data product. These data are archived and distributed as Cloud Optimized GeoTIFF (COG) files, one file for each spectral band. We will access a single COG file, Land Surface Temperature (LST), and load it directly into memory, taking advantage of the cloud-optimized format rather than downloading the file. To accomplish this we will authenticate with the earthaccess Python library, which handles our Earthdata Login credentials, and use the rasterio and rioxarray Python libraries to load the data into memory so we can easily work with it.

Background

The ECOSTRESS mission accurately measures the temperature of plants. Plants regulate their temperature by releasing water through tiny pores on their leaves called stomata. If they have sufficient water they can maintain their temperature, but if there is insufficient water, their temperatures rise, and this temperature rise can be measured with ECOSTRESS. The images acquired by ECOSTRESS are the most detailed temperature images of the surface ever acquired from space and can be used to measure the temperature of an individual farm field. These temperature images, along with auxiliary inputs, are used to produce one of the primary science outputs of ECOSTRESS: evapotranspiration, an indicator of plant health obtained by measuring the evaporation and transpiration of water through a plant.

Learning Objectives

Requirements

Outline

1. Setup
2. Load file directly to memory
3. Visualize the data

1. Setup

Import the required libraries.

import os
import rasterio as rio
import rioxarray as rxr
import hvplot.xarray
import earthaccess

Authentication

Log into Earthdata using the login function from the earthaccess library. The persist=True argument will create a local .netrc file if it doesn’t exist, or add your login info to an existing .netrc file. If no Earthdata Login credentials are found in the .netrc file, you’ll be prompted for them.

# Authenticate with Earthdata Login; persist=True saves the credentials to ~/.netrc
auth = earthaccess.login(persist=True)

Context Manager

For this exercise, we will open a context manager for the notebook using the rasterio.env module to store the GDAL configuration options required to access data from Earthdata Cloud. The context manager sends the authentication information when connecting to a file and can also customize how the file is handled locally. Geospatial data access Python packages like rasterio and rioxarray depend on GDAL, leveraging GDAL’s “Virtual File Systems” to read remote files. GDAL has many configuration options that control its behavior. Changing these settings can determine whether a file can be accessed at all, and can also affect performance. Please see the GDAL config options documentation for more details and all available options.

While the context manager is open (rio_env.__enter__()) we will be able to run the open or get data commands that would typically be executed within a “with” statement. Entering the context manager for the entirety of the notebook allows us to more freely interact with the data. We’ll close the context manager (rio_env.__exit__()) at the end of the notebook.

# Set up and enter context manager
rio_env = rio.Env(GDAL_DISABLE_READDIR_ON_OPEN='TRUE',
                  GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/cookies.txt'),
                  GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/cookies.txt'),
                  GDAL_HTTP_MAX_RETRY=10,
                  GDAL_HTTP_RETRY_DELAY=0.5)
rio_env.__enter__()

Above, GDAL_HTTP_COOKIEFILE and GDAL_HTTP_COOKIEJAR tell GDAL to use a cookie for authentication, where to read that cookie from, and where to store it. GDAL_DISABLE_READDIR_ON_OPEN tells GDAL not to look for any auxiliary or sidecar files in the directory, which can slow down access. GDAL_HTTP_MAX_RETRY and GDAL_HTTP_RETRY_DELAY tell GDAL how many times to retry the connection and how long to wait before retrying. These options are useful when a connection fails temporarily, because they allow the workflow to continue without being re-run.
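To make the role of each option easier to see, the configuration can be collected in a plain dictionary before it is handed to rasterio. This is a minimal sketch: the helper name build_gdal_http_options is hypothetical (it is not part of rasterio or GDAL), and the default values simply mirror the ones used in this notebook.

```python
import os


def build_gdal_http_options(cookie_path="~/cookies.txt", max_retry=10, retry_delay=0.5):
    """Hypothetical helper: return GDAL config options as a dict."""
    cookie_file = os.path.expanduser(cookie_path)
    return {
        # Skip the search for auxiliary/sidecar files next to the remote file
        "GDAL_DISABLE_READDIR_ON_OPEN": "TRUE",
        # Read cookies from, and write cookies to, the same local file
        "GDAL_HTTP_COOKIEFILE": cookie_file,
        "GDAL_HTTP_COOKIEJAR": cookie_file,
        # Retry a failed HTTP request up to max_retry times,
        # starting with a delay of retry_delay seconds
        "GDAL_HTTP_MAX_RETRY": max_retry,
        "GDAL_HTTP_RETRY_DELAY": retry_delay,
    }


options = build_gdal_http_options()
for name, value in options.items():
    print(f"{name} = {value}")
```

The resulting dictionary could then be unpacked into the environment with rio.Env(**options), which is equivalent to passing the keyword arguments directly as we did above.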

In this example, we use cookies to pass authentication information via the context manager; however, this can also be accomplished by sending an Earthdata Login token when working with GDAL versions later than 3.7.0. See below for a commented-out example.

# rio_env = rio.Env(GDAL_DISABLE_READDIR_ON_OPEN='TRUE',
#                   GDAL_HTTP_AUTH='BEARER',
#                   GDAL_HTTP_BEARER=auth.token['access_token'],
#                   GDAL_HTTP_MAX_RETRY=10,
#                   GDAL_HTTP_RETRY_DELAY=0.5)
# rio_env.__enter__()

2. Load the File Directly into Memory

In this example we’re interested in the ECOSTRESS data collection from NASA’s LP DAAC in Earthdata Cloud. Below we specify the URL to the data asset in Earthdata Cloud. This URL can be found via Earthdata Search or programmatically through earthaccess, the CMR API, or the CMR-STAC API. There are programmatic examples in the Python tutorials for ECOSTRESS, and an Earthdata Search example is available as well.

https_url = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ECO_L2T_LSTE.002/ECOv002_L2T_LSTE_24479_001_11SKU_20221030T092522_0710_01/ECOv002_L2T_LSTE_24479_001_11SKU_20221030T092522_0710_01_LST.tif'
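The file name at the end of this URL encodes several pieces of metadata. The sketch below splits it into named fields using only the standard library. The field order (orbit, scene, tile, acquisition time, build, iteration, layer) and the treatment of the timestamp as UTC are assumptions made for illustration and should be checked against the product user guide; the function name parse_ecostress_tile_url is hypothetical.

```python
import posixpath
from datetime import datetime, timezone
from urllib.parse import urlparse

https_url = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ECO_L2T_LSTE.002/ECOv002_L2T_LSTE_24479_001_11SKU_20221030T092522_0710_01/ECOv002_L2T_LSTE_24479_001_11SKU_20221030T092522_0710_01_LST.tif'


def parse_ecostress_tile_url(url):
    """Hypothetical helper: split an L2T LSTE file name into named fields."""
    filename = posixpath.basename(urlparse(url).path)
    stem = filename.rsplit(".", 1)[0]
    parts = stem.split("_")
    if len(parts) < 10:
        raise ValueError(f"Unexpected file name layout: {filename}")
    return {
        "filename": filename,
        "orbit": parts[3],      # assumed: orbit number
        "scene": parts[4],      # assumed: scene ID within the orbit
        "tile": parts[5],       # assumed: tile ID
        # assumed: acquisition time, interpreted here as UTC
        "acquired": datetime.strptime(parts[6], "%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc),
        "build": parts[7],      # assumed: software build ID
        "iteration": parts[8],  # assumed: processing iteration
        # Layer names may themselves contain underscores, so join the rest
        "layer": "_".join(parts[9:]),
    }


info = parse_ecostress_tile_url(https_url)
for key, value in info.items():
    print(f"{key}: {value}")
```

Parsing the name this way is only a convenience for labeling plots or filtering lists of URLs; the authoritative metadata lives in the file itself and in CMR.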

Read the ECOSTRESS LST file at this URL into our workspace using rioxarray. This relies on the context manager that we have entered. Optionally, we can use the mask_and_scale argument to mask fill values and apply the scale and offset values stored with the data.

# Open data with rioxarray
da = rxr.open_rasterio(https_url, mask_and_scale=True)
da
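Conceptually, mask_and_scale replaces the fill value with NaN and then applies a linear transform to the remaining values. The sketch below reproduces that idea with NumPy on a tiny array. The raw values, fill value, scale factor, and offset are invented for illustration and are not taken from the ECOSTRESS file; the function is a simplified stand-in, not the rioxarray implementation.

```python
import numpy as np

# Hypothetical stored values and metadata (not read from the ECOSTRESS file)
raw = np.array([[0, 14500], [15000, 15250]], dtype=np.uint16)
fill_value = 0
scale_factor = 0.02
add_offset = 0.0


def mask_and_scale(values, fill_value, scale_factor=1.0, add_offset=0.0):
    """Simplified stand-in: mask the fill value, then scale and offset."""
    data = values.astype("float64")
    # Masking: cells equal to the fill value become NaN
    data[values == fill_value] = np.nan
    # Scaling: unpacked = stored * scale_factor + add_offset
    return data * scale_factor + add_offset


lst = mask_and_scale(raw, fill_value, scale_factor, add_offset)
print(lst)
```

Because masked cells become NaN, the result is always a floating-point array, even when the stored values are integers.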

The file is read into Python as an xarray DataArray with band, x, and y dimensions. In this example the band dimension is meaningless because the file holds a single band, so we’ll use the squeeze() function to remove band as a dimension.

da_lst = da.squeeze('band', drop=True)
da_lst
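To see what squeeze() does without touching the remote file, we can try it on a small synthetic DataArray with the same dimension layout. The array values and coordinates below are made up for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic single-band raster with the same (band, y, x) layout
da_demo = xr.DataArray(
    np.arange(6, dtype="float32").reshape(1, 2, 3),
    dims=("band", "y", "x"),
    coords={"band": [1], "y": [20.0, 10.0], "x": [0.0, 10.0, 20.0]},
)

# Remove the length-1 band dimension; drop=True also discards the
# leftover scalar 'band' coordinate instead of keeping it
da_2d = da_demo.squeeze("band", drop=True)

print(da_demo.dims)
print(da_2d.dims)
print("band" in da_2d.coords)
```

Without drop=True, the band dimension would still be removed, but band would remain attached to the result as a scalar coordinate.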

3. Visualize the Data

Plot the DataArray, representing the ECOSTRESS band, using hvplot. Since ECOSTRESS tiles are in UTM projections, we’ll need to reproject to EPSG:4326 to display the data over a basemap tile. This can be accomplished using the rio.reproject() function.

da_lst_reproj = da_lst.rio.reproject("EPSG:4326")
da_lst_reproj.hvplot.image(x='x',
                           y='y',
                           crs='EPSG:4326',
                           cmap='jet',
                           tiles='EsriImagery',
                           title=https_url.split('/')[-1],
                           frame_width=500)
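As a side note on the source projection, the UTM zone of a tile can usually be guessed from its tile ID. The sketch below assumes the tile ID (11SKU in our file name) follows the MGRS convention, where the first two digits are the UTM zone and the following letter is the latitude band, with bands N through X in the northern hemisphere. The function name is hypothetical, and the CRS stored in the file (da_lst.rio.crs) remains the authoritative source.

```python
def utm_epsg_from_mgrs_tile(tile_id):
    """Hypothetical helper: guess the WGS 84 / UTM EPSG code for an MGRS-style tile ID."""
    zone = int(tile_id[:2])
    band = tile_id[2].upper()
    if not 1 <= zone <= 60:
        raise ValueError(f"Invalid UTM zone in tile ID: {tile_id}")
    if band not in "CDEFGHJKLMNPQRSTUVWX":
        raise ValueError(f"Invalid latitude band in tile ID: {tile_id}")
    # Latitude bands N..X lie north of the equator, C..M south of it
    northern = band >= "N"
    # EPSG 326xx = WGS 84 / UTM north, EPSG 327xx = WGS 84 / UTM south
    return (32600 if northern else 32700) + zone


epsg = utm_epsg_from_mgrs_tile("11SKU")
print(f"EPSG:{epsg}")
```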

Exit the context manager.

rio_env.__exit__()
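The manual __enter__() and __exit__() calls used in this notebook do the same work that a “with” statement does automatically. The sketch below shows the equivalence with a toy context manager. DemoEnv is a hypothetical stand-in written for this example and is not rasterio's Env class.

```python
class DemoEnv:
    """Toy context manager standing in for a configuration environment."""

    def __init__(self, **options):
        self.options = options
        self.active = False

    def __enter__(self):
        self.active = True
        return self

    def __exit__(self, exc_type=None, exc_value=None, traceback=None):
        self.active = False
        return False  # do not suppress exceptions


# Pattern 1: a "with" statement enters and exits automatically
with DemoEnv(GDAL_HTTP_MAX_RETRY=10) as env:
    inside_with = env.active
after_with = env.active

# Pattern 2: manual enter/exit, as used across the cells of this notebook
manual_env = DemoEnv(GDAL_HTTP_MAX_RETRY=10)
manual_env.__enter__()
try:
    inside_manual = manual_env.active
finally:
    # Always exit, even if the work in between raised an error
    manual_env.__exit__()
after_manual = manual_env.active

print(inside_with, after_with, inside_manual, after_manual)
```

In a script, the “with” form is usually preferable because the exit step cannot be forgotten. In a notebook, where the work is spread over many cells, the manual form is more convenient, but the environment stays open until the exit cell is run.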

Contact Info:

Email: LPDAAC@usgs.gov
Voice: +1-866-573-3222
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹
Website: https://www.earthdata.nasa.gov/centers/lp-daac

¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.