!conda install cf_xarray -y
Exploring PACE OCI L2 Surface Reflectance
In this tutorial we will work with Level 2 data from PACE's Ocean Color Instrument (OCI), specifically the Surface Reflectance data product. For information on how this data product has been created, please visit: PACE OCI Surface Reflectance (SFREFL) Algorithm Theoretical Basis Document (ATBD)
The OCI instrument is a hyperspectral imaging radiometer, collecting hyperspectral measurements from 340 nm to 895 nm (ultraviolet (UV) to near-infrared (NIR)) and multispectral measurements from 940 nm to 2260 nm (NIR to shortwave infrared (SWIR)).
This tutorial leverages code from Ocean Color OB DAAC Training Tutorials and NASA LP DAAC VITALS tutorials
Learning Objectives
This tutorial will guide you through working with PACE OCI L2 SFREFL data on a cloud platform. Upon completion of this tutorial, you will be able to:
- Access and search for available granules by time and location.
- Explore data structure and format.
- Visualize data by plotting individual spectral bands.
- Apply cloud and water flags to effectively mask out unwanted pixels.
- Create interactive RGB maps.
- Display and export spectral signatures for selected locations.
- Calculate advanced hyperspectral indices using these L2 datasets.
Contents
- Setup
- Searching for collections and granules
- Read PACE L2 Surface Reflectance data
- Mask clouds and water from images
- Display spectral signatures for selected locations
- Calculate unique spectral indices
Setup
# Import required libraries
import csv
import math
from io import BytesIO
import cf_xarray
import earthaccess
import rasterio as rio
import rioxarray  # registers the .rio accessor used later for gridding
import xarray as xr
import cartopy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import holoviews as hv
import geoviews as gv
import hvplot.xarray
Authentication
earthaccess creates and leverages Earthdata Login tokens to authenticate with NASA systems. Earthdata Login tokens expire after a month. To retrieve a token from Earthdata Login, you can either enter your username and password each time you use earthaccess, or use a .netrc file. A .netrc file is a configuration file commonly used to store login credentials for remote systems. If you don't have a .netrc file, or aren't sure whether you have one, you can use the persist argument of the login function below to create or update an existing one, then use it for authentication.
If you do not have an Earthdata Account, you can create one here.
auth = earthaccess.login(persist=True)
Searching for Collections
The PACE mission produces several collections, or datasets, available via the NASA Earthdata cloud archive. To view what's available, we can use the search_datasets function with the keyword argument.
# Retrieve Collections
collections = earthaccess.search_datasets(keyword='PACE Surface Reflectance')
# Print Quantity of Results
print(f'Collections found: {len(collections)}')
If you print the collections object, you can explore all of its JSON metadata.
# # Print collections
# collections
We can also create a list of the short-name, concept-id, and version of each resulting collection using a list comprehension. These fields are important for specifying and searching for data within collections.
collections_info = [
    {'short_name': c.summary()['short-name'],
     'collection_concept_id': c.summary()['concept-id'],
     'version': c.summary()['version'],
     'entry_title': c['umm']['EntryTitle']
     } for c in collections
]
pd.set_option('display.max_colwidth', 150)
collections_info = pd.DataFrame(collections_info)
collections_info
The collection concept-id is the best way to search for data within a collection, as it is unique to each collection. The short-name can be used as well; however, the version should then also be passed, since there can be multiple versions available under the same short name. After finding the collection you want to search, you can use its concept-id to search for granules within that collection.
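As a quick optional illustration (a sketch, not part of the main workflow), a concept-id pulled from the table above can be passed straight to search_data; the temporal range here is just a placeholder:
# Hypothetical example: search a collection directly by its concept-id
granules = earthaccess.search_data(
    concept_id=collections_info['collection_concept_id'][0],  # first row of the table above
    temporal=("2025-03-01", "2025-03-20"),
)
print(f'Granules found: {len(granules)}')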
Searching for Granules
A granule can be thought of as a unique spatiotemporal grouping within a collection. To search for granules, we can use the search_data function from earthaccess and provide the arguments for our search. It's possible to specify search criteria using several parameters, shown in the table below:
| dataset origin and location | spatiotemporal parameters | dataset metadata parameters |
|---|---|---|
| archive_center | bounding_box | concept_id |
| data_center | temporal | entry_title |
| daac | point | granule_name |
| provider | polygon | version |
| cloud_hosted | line | short_name |
For this example we will search for the PACE OCI Level-2 Regional Surface Reflectance Data using a bounding box and temporal parameters, and add a cloud_cover parameter. Note that not all datasets have cloud cover information, so this parameter may not work for all datasets.
tspan = ("2025-03-01", "2025-03-20")
bbox = (-86.512, 31.70, -86.512, 34.938)

results = earthaccess.search_data(
    short_name="PACE_OCI_L2_SFREFL",
    version="3.0",
    temporal=tspan,
    bounding_box=bbox,
    cloud_cover=(0, 50)
)
print(f'Granules found: {len(results)}')
results is a list, so we can use an index to view a single result.
results[0]
We can also retrieve specific metadata for a result using .keys(), since this object also acts as a dictionary.
results[0].keys()
# results[0]['meta']
Feel free to explore the metadata of the results to check whether there are fields relevant to your use case.
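For instance, as an optional sketch (assuming the collection populates the UMM CloudCover field), you can pull a single metadata field out of each result:
# List the cloud cover reported for the first few granules (None if the field is not populated)
[granule['umm'].get('CloudCover') for granule in results[:5]]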
From here, we can also retrieve the links for the data using earthaccess, which we can either download or stream.
links = [result.data_links()[0] for result in results]
# Display the first 5 links
links[:5]
To access data from NASA, you'll need to provide your Earthdata Login credentials. When streaming, this can best be done using the token or cookies set up by the earthaccess library. Since we've already logged in, we can start an fsspec session to manage our connection to a remote file, including sending credentials. This allows other libraries to work with a URL as if it were a local file.
In addition to fsspec, for this example we'll also use BytesIO to read the full dataset into memory. This is not always the best approach, but for working with a single PACE scene, which is ~750 MB, it speeds up the process due to the data structure and some caveats regarding fsspec.
fs = earthaccess.get_fsspec_https_session()
Now, open the connection to the remote file and read it into memory.
with fs.open(links[0]) as f:
    file_content = BytesIO(f.read())
Reading dataset
The PACE_OCI_L2_SFREFL files are hierarchical netCDF files, so we will use the xarray.open_datatree function to read in and view all of the groups. Using the datatree representation, we can explore the data structure, which is useful for understanding the groups, variables, dimension order, and coordinates.
datatree = xr.open_datatree(file_content, decode_timedelta=False)
datatree
It will be easier to work with the data if we create a single dataset, which we can do by merging the data and coordinate variables we need from different groups. Longitude and latitude appear as data variables in the navigation_data group, so they need to be explicitly set as coordinates. Note the geophysical_data group and its data variables: rhos and l2_flags. The rhos variable contains surface reflectances, and the l2_flags variable contains quality flags as defined by the Ocean Biology Processing Group (OBPG).
ds = xr.merge(
    (
        datatree.ds,
        datatree["geophysical_data"].ds[["rhos", "l2_flags"]],
        datatree["sensor_band_parameters"].coords,
        datatree["navigation_data"].ds.set_coords(("longitude", "latitude")).coords,
    )
)
ds
We can also see which wavelengths we have surface reflectance measurements at by accessing the wavelength_3d coordinate:
# Check the wavelengths available for PACE
ds["wavelength_3d"]
Note that wavelength_3d is an indexed coordinate, which allows us to subset the dataset by slicing or by choosing individual wavelengths. The method="nearest" argument lets us select a single wavelength without knowing its exact value, so we can pick a band even if the value we request does not appear exactly in the wavelength_3d array.
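As a quick illustration (a sketch using the ds built above), slicing on this coordinate subsets a whole range of bands at once:
# Example: keep only the roughly visible portion of the spectrum (400-700 nm)
rhos_vis = ds["rhos"].sel({"wavelength_3d": slice(400, 700)})
rhos_vis.sizes
Below, we go back to selecting a single band near 860 nm and plotting it.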
rhos_860 = ds["rhos"].sel({"wavelength_3d": 860}, method="nearest")

fig, ax = plt.subplots(figsize=(9, 5), subplot_kw={"projection": cartopy.crs.PlateCarree()})
ax.gridlines(draw_labels={"left": "y", "bottom": "x"}, linewidth=0.25)
ax.coastlines(linewidth=0.5)
ax.add_feature(cartopy.feature.OCEAN, edgecolor="w", linewidth=0.01)
ax.add_feature(cartopy.feature.LAND, edgecolor="w", linewidth=0.01)
rhos_860.plot(x="longitude", y="latitude", cmap="Greys_r", vmin=0, vmax=1.0)
plt.show()
Masking clouds
# Now the flags
ds["l2_flags"]
We will use the cf_xarray library to create a mask including only land and excluding clouds and ice.
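If you want to see which flag names are available before building the mask, the CF flag attributes on the variable list them (a quick optional check, assuming the standard flag_meanings attribute is present on l2_flags):
# List the flag names defined for l2_flags (e.g., LAND, CLDICE, ...)
ds["l2_flags"].attrs.get("flag_meanings")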
# Create and apply Mask
ds["l2_flags"].cf.is_flag_variable
cldwater_mask = (
    (ds["l2_flags"].cf == "LAND")
    & ~(ds["l2_flags"].cf == "CLDICE")
)
# Apply mask, creating new dataset
rhos = ds["rhos"].where(cldwater_mask)
# Plot a band of masked data
rhos_860 = rhos.sel({"wavelength_3d": 860}, method="nearest")

fig, ax = plt.subplots(figsize=(9, 5), subplot_kw={"projection": cartopy.crs.PlateCarree()})
ax.coastlines(linewidth=0.25)
ax.gridlines(draw_labels={"left": "y", "bottom": "x"}, linewidth=0.25)
ax.add_feature(cartopy.feature.OCEAN, edgecolor="w", linewidth=0.01)
ax.add_feature(cartopy.feature.LAND, edgecolor="w", linewidth=0.01)
ax.add_feature(cartopy.feature.LAKES, edgecolor="k", linewidth=0.1)
rhos_860.plot(x="longitude", y="latitude", cmap="Greys_r", vmin=0, vmax=1.0)
plt.show()
Projecting data
From viewing the data structure, we can see that the data is in swath format based on the dimensions. This means that we will need to grid the data to work with it in a typical manner.
There are a couple of ways to do this transformation, but for this example we'll opt for a simple approach using rioxarray. Note that the process shown here will create a grid unique to this scene, so it won't easily merge or stack with others created in the same way for something like a time-series analysis.
We encourage you to check the following resources for additional information on data transformation for PACE datasets: SeaDAS and Python Jupyter notebook
To grid the data with rioxarray, we need to change the order of the dimensions in the array. Hyperspectral data is often stored as (y, x, band), since most processing is conducted along the band dimension; however, rioxarray expects the dimensions to be in the order (band, y, x), so we must transpose the array before gridding it. Note that we'll also want to ensure we're using nearest neighbor resampling to preserve the spectral information during the gridding process. If we use another method, we could create artificial spectra, which could lead to inaccurate unmixing or classification results.
# Transpose data
sr_src = rhos.transpose("wavelength_3d", ...)
sr_src
# Set spatial dimensions and swath CRS
sr_src = sr_src.rio.set_spatial_dims("pixels_per_line", "number_of_lines").rio.write_crs("epsg:4326")
sr_src
# Grid the data using rioxarray and rename dimensions
sr_dst = sr_src.rio.reproject(
    dst_crs=sr_src.rio.crs,
    src_geoloc_array=(
        sr_src.coords["longitude"],
        sr_src.coords["latitude"],
    ),
    nodata=np.nan,
    resampling=rio.warp.Resampling.nearest,
).rename({'y': 'latitude', 'x': 'longitude'})
sr_dst
From this point, we can either subset the data spatially to a smaller region of interest, or extract a series of location points to work with.
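For example, a spatial subset is just a coordinate slice on the gridded array (a sketch; the longitude and latitude bounds below are placeholders, and note that after reprojection the latitude coordinate typically runs north to south, so its slice is given high to low):
# Hypothetical region of interest cut from the gridded data
roi = sr_dst.sel(
    longitude=slice(-87.5, -85.0),
    latitude=slice(34.0, 31.5),  # descending latitude after gridding
)
roi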
Interactive Spatial and Spectral Plots
Combining the spatial and spectral information into a single visualization can be a powerful tool for exploring and inspecting imaging spectroscopy data. Using the streams module from Holoviews, we can link a spatial map to a plot of spectra.
We could plot a single-band image as we previously have, but using a multiband image, like an RGB, may help us infer what targets we're examining. Below we build a function to select bands and adjust their brightness to create a nice RGB image.
Select bands to represent red (650 nm), green (560 nm), and blue (470 nm) by finding the nearest available wavelength to the one chosen to represent each color.
rgb_da = sr_dst.sel(wavelength_3d=[650, 560, 470], method='nearest')
rgb_da
To get a better RGB image, we can balance the brightness of the selected wavelengths to make a prettier map. This will not affect the data, just the visuals. To do this we will use the function below; we can change the bright argument to increase or decrease the brightness of the scene as a whole.
def gamma_adjust(rgb_da, bright=0.3):
    """
    Adjust gamma across all bands in the RGB dataset.
    """
    array = rgb_da.data
    # Mask nan and negative values
    invalid = np.isnan(array) | (array < 0)
    valid = ~invalid

    # Calculate gamma based on the mean of valid values
    mean_valid = np.nanmean(array[valid])
    gamma = math.log(bright) / math.log(mean_valid)
    # Apply scaling and clip
    scaled = np.full_like(array, np.nan)
    scaled[valid] = np.power(array[valid], gamma)
    rgb_da.data = np.clip(scaled, 0, 1)
    return rgb_da

rgb_da = gamma_adjust(rgb_da, bright=0.3)
rgb_da.hvplot.rgb(x='longitude', y='latitude', bands='wavelength_3d', frame_height=400, geo=True, crs='EPSG:4326', tiles='OSM', title="Stretched RGB PACE Image")
Now that we have an RGB dataset, we can use it to create a spatial plot. Points selected by clicking on the RGB map become the inputs for a function that returns values from the full dataset at the selected longitude and latitude, using the cells below. To visualize the spectral and spatial data side by side, we use the Point Draw tool from the holoviews library.
Click in the RGB image to add spectra to the plot. You can also click and hold the mouse button, then drag previously placed points. To remove a point, click and hold the mouse button down, then press the backspace key.
# Starting Point from center pixel
x_start = -85.3879
y_start = 31.3446

# Set Point Limit
POINT_LIMIT = 10

# Set up Color Cycling
color_cycle = hv.Cycle('Category20')

first_point = ([x_start], [y_start], [0])

points = gv.Points(first_point, kdims=['longitude','latitude'], vdims='id', crs=cartopy.crs.PlateCarree())

points_stream = hv.streams.PointDraw(
    data=points.columns(),
    source=points,
    drag=True,
    num_objects=POINT_LIMIT,
    styles={'fill_color': color_cycle.values[:POINT_LIMIT], 'line_color': 'gray'}
)

# RGB Plot without Basemap
rgb_map = rgb_da.hvplot.rgb(x='longitude', y='latitude', bands='wavelength_3d',
                            frame_height=480, frame_width=480,
                            crs=cartopy.crs.PlateCarree(),
                            title="Stretched RGB PACE Image")

# Coastlines
coastlines = gv.feature.coastline(projection=cartopy.crs.PlateCarree()).opts(line_color='black', line_width=1)

posxy = hv.streams.PointerXY(source=rgb_map, x=x_start, y=y_start)
clickxy = hv.streams.Tap(source=rgb_map, x=x_start, y=y_start)
# Function to build spectral plot of clicked location to show on hover stream plot
def click_spectra(data):
    coordinates = [c for c in zip(data['longitude'], data['latitude'])]

    plots = {}
    for i, coords in enumerate(coordinates):
        x, y = coords
        selected = sr_dst.sel(longitude=x, latitude=y, method="nearest")
        plots[i] = (
            selected.hvplot.line(
                y="rhos",
                x="wavelength_3d",
                xlim=(340, 895),  # you can comment or change the range here to see the multispectral bands too
                label=f"{i}"
            )
        )
        points_stream.data["id"][i] = i
    return hv.NdOverlay(plots).opts(hv.opts.Curve(color=color_cycle))
def hover_spectra(x, y):
    return sr_dst.sel(longitude=x, latitude=y, method='nearest').hvplot.line(
        y='rhos', x='wavelength_3d', color='black', frame_width=480)

# Define the Dynamic Maps
click_dmap = hv.DynamicMap(click_spectra, streams=[points_stream])
hover_dmap = hv.DynamicMap(hover_spectra, streams=[posxy])
# Plot the Map and Dynamic Map side by side
hv.Layout(hover_dmap * click_dmap + rgb_map * coastlines * points).cols(2).opts(
    hv.opts.Points(active_tools=['point_draw'], size=10, tools=['hover'], color='white', line_color='gray'),
    hv.opts.Overlay(show_legend=False, show_title=False, fontscale=1.5, frame_height=480)
)
We can take these selected points and the corresponding reflectance spectra and save them as a .csv for later use.
data = points_stream.data
wavelengths = sr_dst.wavelength_3d.values

rows = [["id", "longitude", "latitude"] + [str(i) for i in wavelengths]]

for p in zip(data['longitude'], data['latitude'], data['id']):
    x, y, i = p
    spectra = sr_dst.sel(longitude=x, latitude=y, method="nearest").values
    row = [i, x, y] + list(spectra)
    rows.append(row)

with open('../data/pace_interactive_plot_data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(rows)
# In case you cannot save the file, check your directory
# import os
# print(os.getcwd())
Calculate unique indices
Hyperspectral-enabled Vegetation Indices (VI)
We’ll take the Chlorophyll Index Red Edge (CIRE) as an example of a hyperspectral-enabled VI. CIRE uses bands from the red edge and the NIR to get at relative canopy chlorophyll content.
Chlorophyll Index Red Edge: CIRE = (ρ800 / ρ705) − 1
Carotenoid Content Index: Car = [(1 / ρ495) − (1 / ρ705)] × ρ800
Information on the hyperspectral-enabled indices offered by PACE is available here.
## Chlorophyll Index Red Edge (CIRE)
# Select bands
sr_800 = sr_dst.sel({"wavelength_3d": 800}, method="nearest")
sr_705 = sr_dst.sel({"wavelength_3d": 705}, method="nearest")

# Calculate
cire = (sr_800 / sr_705) - 1
cire.attrs["long_name"] = "Chlorophyll Index Red Edge (CIRE)"
cire
# Plot
map1 = cire.hvplot.image(x='longitude', y='latitude', frame_height=480, geo=True, cmap="viridis", tiles='CartoDark', title="CIRE from PACE OCI")
map1
## Carotenoid Content Index (Car)
# Select additional bands
sr_495 = sr_dst.sel({"wavelength_3d": 495}, method="nearest")

# Calculate
car = ((1 / sr_495) - (1 / sr_705)) * sr_800
car.attrs["long_name"] = "Carotenoid Content Index (Car)"
map2 = car.hvplot.image(x='longitude', y='latitude', frame_height=480, geo=True, cmap="plasma", tiles='CartoDark', title="CAR from PACE OCI")
map2
# Compare CIRE and Car side by side
map1.opts(frame_height=400, frame_width=480) + map2.opts(frame_height=400, frame_width=480)
Enjoy PACE data
Calculate your own vegetation index
- Select relevant bands
- Calculate index
- Plot data
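As a starting point, here is a minimal sketch for a familiar index, NDVI, computed from the gridded sr_dst array (the 665 nm and 865 nm band choices are just illustrative):
# Normalized Difference Vegetation Index (NDVI) from the gridded reflectances
sr_665 = sr_dst.sel({"wavelength_3d": 665}, method="nearest")  # red
sr_865 = sr_dst.sel({"wavelength_3d": 865}, method="nearest")  # near-infrared
ndvi = (sr_865 - sr_665) / (sr_865 + sr_665)
ndvi.attrs["long_name"] = "Normalized Difference Vegetation Index (NDVI)"
ndvi.hvplot.image(x='longitude', y='latitude', frame_height=480, geo=True, cmap="RdYlGn", clim=(-1, 1), title="NDVI from PACE OCI")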
Contact Info:
Email: LPDAAC@usgs.gov
Voice: +1-866-573-3222
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹
Website: https://www.earthdata.nasa.gov/centers/lp-daac
¹Work performed under USGS contract 140G0121D0001 for NASA contract NNG14HH33I.