Demonstration for working with TEMPO data via `Harmony-py`

Overview

This notebook demonstrates how to retrieve and visualize TEMPO data using harmony-py.

Prerequisites

A free(!) account at https://www.earthdata.nasa.gov/ is needed to login and download the appropriate files.

This notebook was last tested using Python 3.10.15, and requires these libraries:

harmony-py
matplotlib
numpy
[python-cmr]
[netCDF4]
[xarray]
[geopandas]
[wget]

Note

This notebook is based on examples from the harmony-py repository and https://harmony.earthdata.nasa.gov/docs. - Created: 3 May 2023 - Last updated: 7 May 2025

What is Harmony?

Harmony is the cloud services orchestrator for NASA Earth Science Data and Information System (ESDIS). The goal of Harmony is to provide services to increase usage and ease of use of ESDIS’ Earth observation data, especially focusing on opportunities made possible by cloud-accessible data.

Data processed by Harmony are staged in Amazon s3 buckets, and Harmony services run in containers in pods in a Kubernetes cluster.

Note that services provided via Harmony can transform NASA data as well as provide access to the original data, through an Application Programmable Interface (API)…

Harmony services can be invoked through curl commands, e.g.:

https://harmony.earthdata.nasa.gov/{collectionId}/ogc-api-coverages/1.0.0/{variable}/coverage/rangeset

(Harmony services REST API conforms to the OGC Coverages API version 1.0.0. It accepts parameters in the URL path as well as query parameters.)

Harmony can also be invoked via wrapper libraries such as harmony-py, which is demonstrated below.

1. Setup

# Load packages into current runtime
import datetime as dt
import getpass

import numpy as np
import xarray as xr
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
from xarray.plot.utils import label_from_attrs

import matplotlib.pyplot as plt

from harmony import BBox, Client, Collection, Request
from harmony.config import Environment

2. Retrieve a data file (via Harmony)

All users will need an Earthdata Login (EDL) account in order to access NASA data and services.

Once a user has an EDL username and password they will need to use these when accessing Harmony.

print("Please provide your Earthdata Login credentials to allow data access")
print("Your credentials will only be passed to Earthdata and will not be exposed in the notebook")
username = input("Username:")

harmony_client = Client(env=Environment.PROD, auth=(username, getpass.getpass()))

Please provide your Earthdata Login credentials to allow data access
Your credentials will only be passed to Earthdata and will not be exposed in the notebook

Username: dkaufas
 ········

Choose a granule, and write out the request with a chosen collection ID and granule name.

In this case, we’ll choose a granule (a ~6 minute window) containing nitrogen dioxide data crossing part of the great lakes region.

# "Nitrogen Dioxide total column"
request = Request(
    collection=Collection(id="C2930725014-LARC_CLOUD"),
    granule_name=["TEMPO_NO2_L2_V03_20250406T215103Z_S012G07.nc"],
)
request.is_valid()

True

# Submit the request
job_id = harmony_client.submit(request)
print(f"jobID = {job_id}")

# Wait for the processing to complete.
harmony_client.wait_for_processing(job_id, show_progress=True)

jobID = 508af3b8-5d2d-4a5d-97fe-0e247042e9fe

 [ Processing: 100% ] |###################################################| [|]

# Download the resulting files
results = harmony_client.download_all(job_id, directory="/tmp")
all_results_stored = [f.result() for f in results]

print(f"Number of result files: {len(all_results_stored)}")

/tmp/TEMPO_NO2_L2_V03_20250406T215103Z_S012G07.nc
/tmp/TEMPO_NO2_L2_V03_20250406T215103Z_S012G07.nc
Number of result files: 2
/tmp/100137757_TEMPO_HCHO_L2_V03_20231230T145736Z_S003G08_product_vertical_column_subsetted.nc4
/tmp/100137762_TEMPO_HCHO_L2_V03_20231230T224418Z_S010G06_subsetted.nc4
/tmp/100137759_TEMPO_HCHO_L2_V03_20231230T222426Z_S010G03_subsetted.nc4
/tmp/100137761_TEMPO_HCHO_L2_V03_20231230T223741Z_S010G05_subsetted.nc4
/tmp/100137760_TEMPO_HCHO_L2_V03_20231230T223104Z_S010G04_subsetted.nc4
/tmp/100137764_TEMPO_HCHO_L2_V03_20231230T222426Z_S010G03_subsetted.nc4
/tmp/100137766_TEMPO_HCHO_L2_V03_20231230T223741Z_S010G05_subsetted.nc4
/tmp/100137765_TEMPO_HCHO_L2_V03_20231230T223104Z_S010G04_subsetted.nc4

3. Open the data file

# Open the data file using the Xarray package.
#   Alternatively, one could use the netCDF4-python library.
datatree = xr.open_datatree(all_results_stored[0])
datatree

4. Visualize one of the data variables

Map

Here we take a look at the Vertical Column Troposphere variable, both its metadata and then by creating a map visualization. Note the use of main_data_quality_flag in the plotting code to ensure that we are examining only the “normal” quality data.

product_variable_name = "product/vertical_column_troposphere"
da = datatree[product_variable_name]
da

data_proj = ccrs.PlateCarree()


def make_nice_map(axis):
    axis.add_feature(cfeature.STATES, color="gray", lw=0.1)
    axis.coastlines(resolution="50m", color="gray", linewidth=0.5)

    axis.set_extent([-150, -40, 14, 65], crs=data_proj)
    grid = axis.gridlines(draw_labels=["left", "bottom"], dms=True)
    grid.xformatter = LONGITUDE_FORMATTER
    grid.yformatter = LATITUDE_FORMATTER

fig, ax = plt.subplots(figsize=(10, 6), subplot_kw={"projection": data_proj})

make_nice_map(ax)

contour_handle = ax.contourf(
    datatree["geolocation/longitude"],
    datatree["geolocation/latitude"],
    da.where(datatree["product/main_data_quality_flag"] == 0),
    levels=100,
    vmin=0,
    zorder=2,
)

cb = plt.colorbar(contour_handle)
cb.set_label(label_from_attrs(da))

plt.show()

Zonal means

# Define two-degree wide latitude bins.
lat_bins = np.arange(15, 61, 5)

# Define a label for each bin corresponding to the central latitude.
lat_centers = np.arange(15, 60, 5)

# Group according to those bins and take the mean.
product_lat_mean = da.groupby_bins(
    datatree["geolocation/latitude"], lat_bins, labels=lat_centers
).mean(dim=xr.ALL_DIMS)

product_lat_mean.plot()
plt.gca().invert_xaxis()

da.where(datatree["product/main_data_quality_flag"] == 0).plot.contourf(
    x="mirror_step", y="xtrack", vmin=0
)

plt.gca().invert_xaxis()
plt.show()

5. Retrieve only a single variable

# TEMPO Formaldehyde
request = Request(
    collection=Collection(id="C2930730944-LARC_CLOUD"),
    granule_name=["TEMPO_HCHO_L2_V03_20231230T145736Z_S003G08.nc"],
    variables=["product/vertical_column"],
)

job_id = harmony_client.submit(request)
print(f"jobID = {job_id}")
harmony_client.wait_for_processing(job_id, show_progress=True)

# Download the resulting files
results = harmony_client.download_all(job_id, directory="/tmp", overwrite=True)
all_results_stored = [f.result() for f in results]
print(f"Number of result files: {len(all_results_stored)}")

jobID = 8252e95f-ee46-43f8-8d9c-a282a3b17c16

 [ Processing: 100% ] |###################################################| [|]

Number of result files: 1

# Open the data file.
datatree = xr.open_datatree(all_results_stored[0])
datatree

Note the reduced list of data variables.

6. Retrieve data from a select time range

# @title
# A clean-up step to ensure there isn't a clash between newly downloaded granules
for rf in all_results_stored:
    !rm {rf}

request = Request(
    collection=Collection(id="C2930730944-LARC_CLOUD"),
    # Note there is not a granule specified!
    temporal={
        "start": dt.datetime(2023, 12, 30, 22, 30, 0),
        "stop": dt.datetime(2023, 12, 30, 22, 45, 0),
    },
)

job_id = harmony_client.submit(request)

print(f"jobID = {job_id}")
harmony_client.wait_for_processing(job_id, show_progress=True)

# Download the resulting files
results = harmony_client.download_all(job_id, directory="/tmp", overwrite=True)
all_results_stored = [f.result() for f in results]
print(f"Number of result files: {len(all_results_stored)}")

jobID = 892616e0-3d42-4376-a850-c2086ebabd37

 [ Processing: 100% ] |###################################################| [|]

Number of result files: 4

# Open the data files
datatree_dict = dict()
for r in sorted(all_results_stored):
    datatree_dict[r] = xr.open_datatree(r)

    print(
        f"Time range: {datatree_dict[r]['geolocation/time'].values.min()} - "
        f"{datatree_dict[r]['geolocation/time'].values.max()}"
    )

Time range: 2023-12-30T22:30:00.279620864 - 2023-12-30T22:31:19.139312896
Time range: 2023-12-30T22:31:22.172382976 - 2023-12-30T22:37:56.470801152
Time range: 2023-12-30T22:37:59.503866880 - 2023-12-30T22:44:33.802316800
Time range: 2023-12-30T22:44:36.835376128 - 2023-12-30T22:44:58.066825984

Note how the time ranges fit within the requested temporal range

# Visualize each data file
fig, axs = plt.subplots(
    nrows=len(datatree_dict),
    sharex=True,
    figsize=(7, 10),
    gridspec_kw=dict(hspace=0.25),
    subplot_kw={"projection": data_proj},
)

for i, (key, tree) in enumerate(datatree_dict.items()):
    Var = tree["product/vertical_column"]

    if np.count_nonzero(~np.isnan(Var.values)) > 0:
        ax = axs[i]
        make_nice_map(ax)

        contour_handle = ax.contourf(
            tree["geolocation/longitude"],
            tree["geolocation/latitude"],
            Var,
            levels=100,
            vmin=0,
            zorder=2,
        )

        ax.set_title(key, fontsize=8)

plt.show()

# A clean-up step to ensure there isn't a clash between newly downloaded granules
for rf in all_results_stored:
    !rm {rf}

7. Retrieve data from a spatial bounding box

request = Request(
    collection=Collection(id="C2930730944-LARC_CLOUD"),
    # Note there is not a granule specified!
    spatial=BBox(-115, 35, -95, 45),
    temporal={
        "start": dt.datetime(2023, 12, 30, 22, 30, 0),
        "stop": dt.datetime(2023, 12, 30, 22, 45, 0),
    },
)

job_id = harmony_client.submit(request)

print(f"jobID = {job_id}")
harmony_client.wait_for_processing(job_id, show_progress=True)

# Download the resulting files
results = harmony_client.download_all(job_id, directory="/tmp", overwrite=True)
all_results_stored = [f.result() for f in results]
print(f"Number of result files: {len(all_results_stored)}")

jobID = 6b9e4884-3634-4073-9688-a23ef1ade651

 [ Processing: 100% ] |###################################################| [|]

Number of result files: 3

# Open the data files
datatree_dict = dict()
for r in sorted(all_results_stored):
    datatree_dict[r] = xr.open_datatree(r)

# Visualize each data file
fig, axs = plt.subplots(
    nrows=len(datatree_dict),
    sharex=True,
    figsize=(7, 10),
    gridspec_kw=dict(hspace=0.25),
    subplot_kw={"projection": data_proj},
)

for i, (key, tree) in enumerate(datatree_dict.items()):
    Var = tree["product/vertical_column"]
    ax = axs[i]
    if np.count_nonzero(~np.isnan(Var.values)) > 0:
        ax = axs[i]
        make_nice_map(ax)

        ax.scatter(
            tree["geolocation/longitude"], tree["geolocation/latitude"], s=1, c=Var, zorder=2
        )

        ax.set_title(key, fontsize=8)

        # Coordinates of rectangle vertices in clockwise order
        xs = [-115, -115, -95, -95, -115]
        ys = [35, 45, 45, 35, 35]
        ax.plot(xs, ys, color="red", linestyle=":")

plt.show()

Note the cut-off of data for the bounding box: (-115, 35, -95, 45)

End of Notebook.