Exploring nitrogen dioxide (NO2) data from PACE/OCI#

Author(s): Anna Windle (NASA, SSAI), Zachary Fasnacht (NASA, SSAI)

Last updated: July 28, 2025

Summary#

This tutorial describes how to access and download nitrogen dioxide (NO2) data products developed from PACE OCI data. More information on how these products were created can be found in Fasnacht et al. (2025). This notebook will also include examples on how to plot NO2 data as a static and interactive map, as well as how to plot an interactive time series plot.

Learning Objectives#

At the end of this notebook you will know:

  • Where to access NO2 data products in development for the PACE Mission at the NASA Aura Validation Data Center

  • What to select from the XArray DataTree objects representing hierarchichal datasets

  • Where to find high NO2 vertical column retrievals (hint: it’s a big city)

  • How to create a time series of NO2 data collected at a single location

1. Setup#

Begin by importing all of the packages used in this notebook. If your kernel uses an environment defined following the guidance on the tutorials page, then the imports will be successful.

from datetime import datetime

import cartopy
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import fsspec
import hvplot.xarray
import matplotlib.colors
import matplotlib.pyplot as plt
import pandas as pd
import xarray as xr
import holoviews
holoviews.config.image_rtol = 1e-2

swap_dims = {"nlon": "longitude", "nlat": "latitude"}

2. Download NO2 Data#

While under development, the NO2 product is available at NASA’s Aura Validation Data Center (AVDC). While the data are hosted there, we can access files from this site using the fsspec package. Once this product is implemented by the PACE Mission Science Data Segment (SDS), it will be available in the Earthdata Cloud and accessible as usual using the earthaccess package.

url = "https://avdc.gsfc.nasa.gov/pub/tmp/PACE_NO2/NO2_L3_Gridded_NAmerica/PACE_NO2_Gridded_NAmerica_2024m0401.nc"
fs = fsspec.filesystem("https")
path = fs.open(url, cache_type="blockcache")

3. Preview this hierarchical dataset#

We will use xarray.open_datatree to open all groups in the NetCDF and then merge them into a single dataset. We need to clean up some superfluous data stored as nlat and nlon along the way.

datatree = xr.open_datatree(path)
datatree["/"].dataset = datatree["/"].dataset.drop_vars(swap_dims)
dataset = xr.merge(datatree.to_dict().values()).swap_dims(swap_dims)
dataset
<xarray.Dataset> Size: 128MB
Dimensions:                                 (longitude: 4000, latitude: 2667)
Coordinates:
  * longitude                               (longitude) float32 16kB -130.0 ....
  * latitude                                (latitude) float32 11kB 20.01 ......
Data variables:
    nitrogen_dioxide_total_vertical_column  (longitude, latitude) float32 43MB ...
    U10M                                    (longitude, latitude) float32 43MB ...
    V10M                                    (longitude, latitude) float32 43MB ...
Attributes:
    title:               PACE OCI Level-3 Data NO2
    product_name:        PACE_NO2_Gridded_NAmerica_2024m0401.nc
    processing_version:  1.0.0
    Conventions:         CF-1.8
    instrument:          OCI
    date_created:        2024-12-19T12:36:59.110439
    institution:         NASA/GSFC
    spatial_resolution:  1500m
    spatial_coverage:    -130W to -70W, 20N to 60N
    date:                2024-04-01T00:00:00.000000
    PlatformShortName:   PACE

If you expand the nitrogen_dioxide_total_vertical_column variable, you’ll see that it is a 2D variable consisting of total vertical column NO2 measurements with units of molecules cm-2. Let’s plot it!

fig, ax = plt.subplots(figsize=(9, 5), subplot_kw={"projection": ccrs.PlateCarree()})
ax.gridlines(draw_labels={"left": "y", "bottom": "x"}, linewidth=0.25)
ax.coastlines(linewidth=0.5)
ax.add_feature(cfeature.BORDERS, linewidth=0.5)
ax.add_feature(cfeature.OCEAN, linewidth=0.5)
ax.add_feature(cfeature.LAKES, linewidth=0.5)
ax.add_feature(cfeature.LAND, facecolor="oldlace", linewidth=0.5)
cmap = matplotlib.colors.LinearSegmentedColormap.from_list(
    "",
    ["lightgrey", "cyan", "yellow", "orange", "red", "darkred"],
)
dataset["nitrogen_dioxide_total_vertical_column"].plot(
    x="longitude", y="latitude", vmin=3e15, vmax=10e15, cmap=cmap, zorder=100
)
plt.show()
../../_images/62e90a8f4632cd6140d945c1528d9fbf54c9e6cff6bb5941654d2f4d522a4c05.png

Let’s zoom in to Los Angeles, California … i.e., the bright red blob in the lower left.

fig, ax = plt.subplots(figsize=(9, 5), subplot_kw={"projection": ccrs.PlateCarree()})
ax.gridlines(draw_labels={"left": "y", "bottom": "x"}, linewidth=0.25)
ax.coastlines(linewidth=0.5)
ax.add_feature(cfeature.BORDERS, linewidth=0.5)
ax.add_feature(cfeature.OCEAN, linewidth=0.5)
ax.add_feature(cfeature.LAKES, linewidth=0.5)
ax.add_feature(cfeature.LAND, facecolor="oldlace", linewidth=0.5)
dataset["nitrogen_dioxide_total_vertical_column"].plot(
    x="longitude", y="latitude", vmin=3e15, vmax=10e15, cmap=cmap, zorder=100
)
ax.set_extent([-125, -110, 30, 40])
plt.show()
../../_images/40eea9cee479b92792af9b9b52a7d9571774ae2801b9b995cacae9be328976ae.png

You’ll also see other variables in the dataset U10M, and V10M. These are 10-meter Eastward Wind, and 10-meter Northward Wind, respectively.

4. Interactive NO2 plot#

An interative plot allows you to engage with the data such as zooming, panning, and hovering over for more information. We will use the hvplot accessor on XArray data structures to make an interactive plot of the single file we accessed above.

dataset["nitrogen_dioxide_total_vertical_column"].hvplot(
    x="longitude",
    y="latitude",
    cmap=cmap,
    clim=(3e15, 10e15),
    aspect=2,
    title="Total vertical column NO₂ April 2024",
    widget_location="top",
    geo=True,
    coastline=True,
    tiles="EsriWorldStreetMap",
)

5. Time Series#

Since there are many NO2 granules available for testing, we can make a time series of NO2 concentrations over time. First, we have to get URLs for all the granules and “open” them for streaming. Alternatively, fs.get could be used to download files locally, but we don’t want to duplicate data storage if working in the commercial cloud.

pattern = "https://avdc.gsfc.nasa.gov/pub/tmp/PACE_NO2/NO2_L3_Gridded_NAmerica/PACE_NO2_Gridded_*.nc"
results = fs.glob(pattern)
paths = [fs.open(i, cache_type="blockcache") for i in results]

Now we will combine the files into one xarray dataset, for which we have to access one group at a time within the hierarchichal datasets. This can take several minutes.

dataset = xr.open_mfdataset(
    paths,
    group="geophysical_data",
    concat_dim="time",
    combine="nested",
)

We can get the spatial coordinates from the first granule, since these are invariant. We have concatenated along a new dimension (i.e., a dimension not present in the datasets). To incorporate the temporal coordinates, we can add a variable for the “time” dimension by grabbing it from the filename.

space = xr.open_dataset(paths[0], group="navigation_data")
time = [datetime.strptime(i[-12:-3], "%Ym%m%d") for i in results]
dataset = (
    xr.merge((dataset, space, {"time": ("time", time)}))
    .swap_dims(swap_dims)
)

Let’s select the nearest pixel to 34°N, 118°W.

array = (
    dataset["nitrogen_dioxide_total_vertical_column"]
    .sel({"latitude": 34, "longitude": -118}, method="nearest")
)
array
<xarray.DataArray 'nitrogen_dioxide_total_vertical_column' (time: 84)> Size: 336B
dask.array<getitem, shape=(84,), dtype=float32, chunksize=(1,), chunktype=numpy.ndarray>
Coordinates:
  * time       (time) datetime64[ns] 672B 2024-04-01 2024-04-02 ... 2024-06-30
    latitude   float32 4B 34.0
    longitude  float32 4B -118.0
Attributes:
    valid_min:  -999.0
    valid_max:  1e+30
    long_name:  Total vertical column nitrogen dioxide
    source:     ML
    units:      molecules/cm2

You can see how this selection creates a new 1D dataset with values for one pixel across time. Let’s plot it using hvplot.

line = array.hvplot.line(line_width=2, color="darkblue", alpha=0.8)
dots = array.hvplot.scatter(size=20, color="crimson", marker="o", alpha=0.7)

We’ve created two plots, and now we combine them, add styling, and display.

(line * dots).opts(
    title="Time series of total vertical column NO₂ at (34, -118)",
    width=800,
    height=400,
    xlabel="Time",
    ylabel="NO₂ (molecules cm⁻²)",
    show_grid=True,
)