Demonstration for working with TEMPO data via Harmony-py

Overview

This notebook demonstrates how to retrieve and visualize TEMPO data using harmony-py.

Prerequisites

A free(!) account at https://www.earthdata.nasa.gov/ is needed to login and download the appropriate files.

This notebook was last tested using Python 3.10.15, and requires these libraries:

Note

This notebook is based on examples from the harmony-py repository and https://harmony.earthdata.nasa.gov/docs. - Created: 3 May 2023 - Last updated: 7 May 2025

What is Harmony?

Harmony is the cloud services orchestrator for NASA Earth Science Data and Information System (ESDIS). The goal of Harmony is to provide services to increase usage and ease of use of ESDIS’ Earth observation data, especially focusing on opportunities made possible by cloud-accessible data.

Data processed by Harmony are staged in Amazon s3 buckets, and Harmony services run in containers in pods in a Kubernetes cluster.

Note that services provided via Harmony can transform NASA data as well as provide access to the original data, through an Application Programmable Interface (API)…

harmony_orchestrator_pic.png

Harmony services can be invoked through curl commands, e.g.:

https://harmony.earthdata.nasa.gov/{collectionId}/ogc-api-coverages/1.0.0/{variable}/coverage/rangeset

(Harmony services REST API conforms to the OGC Coverages API version 1.0.0. It accepts parameters in the URL path as well as query parameters.)

Harmony can also be invoked via wrapper libraries such as harmony-py, which is demonstrated below.

1. Setup

# Load packages into current runtime
import datetime as dt
import getpass

import numpy as np
import xarray as xr
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
from xarray.plot.utils import label_from_attrs

import matplotlib.pyplot as plt

from harmony import BBox, Client, Collection, Request
from harmony.config import Environment

2. Retrieve a data file (via Harmony)

All users will need an Earthdata Login (EDL) account in order to access NASA data and services.

Once a user has an EDL username and password they will need to use these when accessing Harmony.

print("Please provide your Earthdata Login credentials to allow data access")
print("Your credentials will only be passed to Earthdata and will not be exposed in the notebook")
username = input("Username:")

harmony_client = Client(env=Environment.PROD, auth=(username, getpass.getpass()))
Please provide your Earthdata Login credentials to allow data access
Your credentials will only be passed to Earthdata and will not be exposed in the notebook
Username: dkaufas
 ········

Choose a granule, and write out the request with a chosen collection ID and granule name.

In this case, we’ll choose a granule (a ~6 minute window) containing nitrogen dioxide data crossing part of the great lakes region.

no2_granule_EDS_screenshot.jpg
# "Nitrogen Dioxide total column"
request = Request(
    collection=Collection(id="C2930725014-LARC_CLOUD"),
    granule_name=["TEMPO_NO2_L2_V03_20250406T215103Z_S012G07.nc"],
)
request.is_valid()
True
# Submit the request
job_id = harmony_client.submit(request)
print(f"jobID = {job_id}")

# Wait for the processing to complete.
harmony_client.wait_for_processing(job_id, show_progress=True)
jobID = 508af3b8-5d2d-4a5d-97fe-0e247042e9fe
 [ Processing: 100% ] |###################################################| [|]
# Download the resulting files
results = harmony_client.download_all(job_id, directory="/tmp")
all_results_stored = [f.result() for f in results]

print(f"Number of result files: {len(all_results_stored)}")
/tmp/TEMPO_NO2_L2_V03_20250406T215103Z_S012G07.nc
/tmp/TEMPO_NO2_L2_V03_20250406T215103Z_S012G07.nc
Number of result files: 2
/tmp/100137757_TEMPO_HCHO_L2_V03_20231230T145736Z_S003G08_product_vertical_column_subsetted.nc4
/tmp/100137762_TEMPO_HCHO_L2_V03_20231230T224418Z_S010G06_subsetted.nc4
/tmp/100137759_TEMPO_HCHO_L2_V03_20231230T222426Z_S010G03_subsetted.nc4
/tmp/100137761_TEMPO_HCHO_L2_V03_20231230T223741Z_S010G05_subsetted.nc4
/tmp/100137760_TEMPO_HCHO_L2_V03_20231230T223104Z_S010G04_subsetted.nc4
/tmp/100137764_TEMPO_HCHO_L2_V03_20231230T222426Z_S010G03_subsetted.nc4
/tmp/100137766_TEMPO_HCHO_L2_V03_20231230T223741Z_S010G05_subsetted.nc4
/tmp/100137765_TEMPO_HCHO_L2_V03_20231230T223104Z_S010G04_subsetted.nc4

3. Open the data file

# Open the data file using the Xarray package.
#   Alternatively, one could use the netCDF4-python library.
datatree = xr.open_datatree(all_results_stored[0])
datatree
<xarray.DatasetView> Size: 9kB
Dimensions:      (xtrack: 2048, mirror_step: 131)
Coordinates:
  * xtrack       (xtrack) int32 8kB 0 1 2 3 4 5 ... 2043 2044 2045 2046 2047
  * mirror_step  (mirror_step) int32 524B 788 789 790 791 ... 915 916 917 918
Data variables:
    *empty*
Attributes: (12/38)
    tio_commit:                       bd5e3a3bec43c1970873e2537a09576f5e1d10df
    product_type:                     NO2
    processing_level:                 2
    processing_version:               3
    sdpc_version:                     TEMPO_SDPC_v4.4.3
    scan_num:                         12
    ...                               ...
    collection_shortname:             TEMPO_NO2_L2
    collection_version:               1
    keywords:                         EARTH SCIENCE>ATMOSPHERE>AIR QUALITY>NI...
    summary:                          Nitrogen dioxide Level 2 files provide ...
    coremetadata:                     \nGROUP                  = INVENTORYMET...
    history:                          2025-04-07T02:04:27Z:/tempo/nas0/sdpc_s...

4. Visualize one of the data variables

Map

Here we take a look at the Vertical Column Troposphere variable, both its metadata and then by creating a map visualization. Note the use of main_data_quality_flag in the plotting code to ensure that we are examining only the “normal” quality data.

product_variable_name = "product/vertical_column_troposphere"
da = datatree[product_variable_name]
da
<xarray.DataArray 'vertical_column_troposphere' (mirror_step: 131, xtrack: 2048)> Size: 2MB
[268288 values with dtype=float64]
Coordinates:
  * xtrack       (xtrack) int32 8kB 0 1 2 3 4 5 ... 2043 2044 2045 2046 2047
  * mirror_step  (mirror_step) int32 524B 788 789 790 791 ... 915 916 917 918
Attributes:
    long_name:  troposphere nitrogen dioxide vertical column
    units:      molecules/cm^2
data_proj = ccrs.PlateCarree()


def make_nice_map(axis):
    axis.add_feature(cfeature.STATES, color="gray", lw=0.1)
    axis.coastlines(resolution="50m", color="gray", linewidth=0.5)

    axis.set_extent([-150, -40, 14, 65], crs=data_proj)
    grid = axis.gridlines(draw_labels=["left", "bottom"], dms=True)
    grid.xformatter = LONGITUDE_FORMATTER
    grid.yformatter = LATITUDE_FORMATTER
fig, ax = plt.subplots(figsize=(10, 6), subplot_kw={"projection": data_proj})

make_nice_map(ax)

contour_handle = ax.contourf(
    datatree["geolocation/longitude"],
    datatree["geolocation/latitude"],
    da.where(datatree["product/main_data_quality_flag"] == 0),
    levels=100,
    vmin=0,
    zorder=2,
)

cb = plt.colorbar(contour_handle)
cb.set_label(label_from_attrs(da))

plt.show()

Zonal means

# Define two-degree wide latitude bins.
lat_bins = np.arange(15, 61, 5)

# Define a label for each bin corresponding to the central latitude.
lat_centers = np.arange(15, 60, 5)

# Group according to those bins and take the mean.
product_lat_mean = da.groupby_bins(
    datatree["geolocation/latitude"], lat_bins, labels=lat_centers
).mean(dim=xr.ALL_DIMS)

product_lat_mean.plot()
plt.gca().invert_xaxis()

da.where(datatree["product/main_data_quality_flag"] == 0).plot.contourf(
    x="mirror_step", y="xtrack", vmin=0
)

plt.gca().invert_xaxis()
plt.show()

5. Retrieve only a single variable

# TEMPO Formaldehyde
request = Request(
    collection=Collection(id="C2930730944-LARC_CLOUD"),
    granule_name=["TEMPO_HCHO_L2_V03_20231230T145736Z_S003G08.nc"],
    variables=["product/vertical_column"],
)

job_id = harmony_client.submit(request)
print(f"jobID = {job_id}")
harmony_client.wait_for_processing(job_id, show_progress=True)

# Download the resulting files
results = harmony_client.download_all(job_id, directory="/tmp", overwrite=True)
all_results_stored = [f.result() for f in results]
print(f"Number of result files: {len(all_results_stored)}")
jobID = 8252e95f-ee46-43f8-8d9c-a282a3b17c16
 [ Processing: 100% ] |###################################################| [|]
Number of result files: 1
# Open the data file.
datatree = xr.open_datatree(all_results_stored[0])
datatree
<xarray.DatasetView> Size: 0B
Dimensions:  ()
Data variables:
    *empty*
Attributes: (12/39)
    tio_commit:                       040d27c76f5040d777912886b53b0f3a5cbb3e67
    product_type:                     HCHO
    processing_level:                 2
    processing_version:               3
    sdpc_version:                     TEMPO_SDPC_v4.4.2
    scan_num:                         3
    ...                               ...
    collection_version:               1
    keywords:                         EARTH SCIENCE>ATMOSPHERE>AIR QUALITY>VO...
    summary:                          Formaldehyde Level 2 files provide trac...
    coremetadata:                     \nGROUP                  = INVENTORYMET...
    history:                          2024-08-05T19:15:47Z:/tempo/nas0/sdpc_s...
    history_json:                     [{"date_time": "2025-05-08T18:45:04.249...

Note the reduced list of data variables.

6. Retrieve data from a select time range

# @title
# A clean-up step to ensure there isn't a clash between newly downloaded granules
for rf in all_results_stored:
    !rm {rf}
request = Request(
    collection=Collection(id="C2930730944-LARC_CLOUD"),
    # Note there is not a granule specified!
    temporal={
        "start": dt.datetime(2023, 12, 30, 22, 30, 0),
        "stop": dt.datetime(2023, 12, 30, 22, 45, 0),
    },
)

job_id = harmony_client.submit(request)

print(f"jobID = {job_id}")
harmony_client.wait_for_processing(job_id, show_progress=True)

# Download the resulting files
results = harmony_client.download_all(job_id, directory="/tmp", overwrite=True)
all_results_stored = [f.result() for f in results]
print(f"Number of result files: {len(all_results_stored)}")
jobID = 892616e0-3d42-4376-a850-c2086ebabd37
 [ Processing: 100% ] |###################################################| [|]
Number of result files: 4
# Open the data files
datatree_dict = dict()
for r in sorted(all_results_stored):
    datatree_dict[r] = xr.open_datatree(r)

    print(
        f"Time range: {datatree_dict[r]['geolocation/time'].values.min()} - "
        f"{datatree_dict[r]['geolocation/time'].values.max()}"
    )
Time range: 2023-12-30T22:30:00.279620864 - 2023-12-30T22:31:19.139312896
Time range: 2023-12-30T22:31:22.172382976 - 2023-12-30T22:37:56.470801152
Time range: 2023-12-30T22:37:59.503866880 - 2023-12-30T22:44:33.802316800
Time range: 2023-12-30T22:44:36.835376128 - 2023-12-30T22:44:58.066825984

Note how the time ranges fit within the requested temporal range

# Visualize each data file
fig, axs = plt.subplots(
    nrows=len(datatree_dict),
    sharex=True,
    figsize=(7, 10),
    gridspec_kw=dict(hspace=0.25),
    subplot_kw={"projection": data_proj},
)

for i, (key, tree) in enumerate(datatree_dict.items()):
    Var = tree["product/vertical_column"]

    if np.count_nonzero(~np.isnan(Var.values)) > 0:
        ax = axs[i]
        make_nice_map(ax)

        contour_handle = ax.contourf(
            tree["geolocation/longitude"],
            tree["geolocation/latitude"],
            Var,
            levels=100,
            vmin=0,
            zorder=2,
        )

        ax.set_title(key, fontsize=8)

plt.show()

# A clean-up step to ensure there isn't a clash between newly downloaded granules
for rf in all_results_stored:
    !rm {rf}

7. Retrieve data from a spatial bounding box

request = Request(
    collection=Collection(id="C2930730944-LARC_CLOUD"),
    # Note there is not a granule specified!
    spatial=BBox(-115, 35, -95, 45),
    temporal={
        "start": dt.datetime(2023, 12, 30, 22, 30, 0),
        "stop": dt.datetime(2023, 12, 30, 22, 45, 0),
    },
)

job_id = harmony_client.submit(request)

print(f"jobID = {job_id}")
harmony_client.wait_for_processing(job_id, show_progress=True)

# Download the resulting files
results = harmony_client.download_all(job_id, directory="/tmp", overwrite=True)
all_results_stored = [f.result() for f in results]
print(f"Number of result files: {len(all_results_stored)}")
jobID = 6b9e4884-3634-4073-9688-a23ef1ade651
 [ Processing: 100% ] |###################################################| [|]
Number of result files: 3
# Open the data files
datatree_dict = dict()
for r in sorted(all_results_stored):
    datatree_dict[r] = xr.open_datatree(r)
# Visualize each data file
fig, axs = plt.subplots(
    nrows=len(datatree_dict),
    sharex=True,
    figsize=(7, 10),
    gridspec_kw=dict(hspace=0.25),
    subplot_kw={"projection": data_proj},
)

for i, (key, tree) in enumerate(datatree_dict.items()):
    Var = tree["product/vertical_column"]
    ax = axs[i]
    if np.count_nonzero(~np.isnan(Var.values)) > 0:
        ax = axs[i]
        make_nice_map(ax)

        ax.scatter(
            tree["geolocation/longitude"], tree["geolocation/latitude"], s=1, c=Var, zorder=2
        )

        ax.set_title(key, fontsize=8)

        # Coordinates of rectangle vertices in clockwise order
        xs = [-115, -115, -95, -95, -115]
        ys = [35, 45, 45, 35, 35]
        ax.plot(xs, ys, color="red", linestyle=":")

plt.show()

Note the cut-off of data for the bounding box: (-115, 35, -95, 45)

End of Notebook.