Access Data from the Ocean Color Instrument (OCI)

Author(s): Anna Windle (NASA, SSAI), Ian Carroll (NASA, UMBC), Carina Poulin (NASA, SSAI)

Last updated: August 3, 2025

Summary

In this example we will use the earthaccess package to search for OCI products on NASA Earthdata. The earthaccess package, published on the Python Package Index and conda-forge, facilitates discovery and use of all NASA Earth Science data products by providing an abstraction layer for NASA’s Common Metadata Repository (CMR) API and by simplifying requests to NASA’s Earthdata Cloud. Searching for data with earthaccess is more approachable than using low-level HTTP requests or working directly with S3 object stores.

In short, earthaccess helps authenticate with Earthdata Login, makes search easier, and provides a streamlined way to load data into xarray containers. For more on earthaccess, visit the documentation site. Be aware that earthaccess is under active development.

To understand the discussions below on downloading and opening data, we need to clearly understand where our notebook is running. There are three cases to distinguish:

  1. The notebook is running on the local host. A Jupyter server on your laptop is a local host.

  2. The notebook is running on a remote host, but it does not have direct access to the NASA Earthdata Cloud. For instance, GitHub Codespaces is a remote host that runs on a different cloud platform than the NASA Earthdata Cloud.

  3. The notebook is running on a remote host that does have direct access to the NASA Earthdata Cloud.

Learning Objectives

At the end of this notebook you will know:

  • How to store your NASA Earthdata Login credentials with earthaccess

  • How to use earthaccess to search for OCI data using search filters

  • How to download OCI data, but only when you need to

1. Setup

We begin by importing the packages used in this notebook.

import earthaccess
import xarray as xr

2. NASA Earthdata Authentication

Next, we authenticate using our Earthdata Login credentials. Authentication is not needed to search publicly available collections in Earthdata, but is always needed to access data. We use the login function from the earthaccess package. This will create an authenticated session when we provide a valid Earthdata Login username and password. The earthaccess package will search for credentials defined by environment variables or within a .netrc file saved in the home directory. If credentials are not found, an interactive prompt will allow you to input credentials.

auth = earthaccess.login(persist=True)
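
Before calling earthaccess.login, it can help to see which of the credential sources described above are present on the host. The following is a minimal sketch and not part of earthaccess: the helper name credential_sources is hypothetical, and the environment variable names EARTHDATA_USERNAME and EARTHDATA_PASSWORD and the host name urs.earthdata.nasa.gov are assumptions taken from the earthaccess documentation rather than from this notebook.

```python
import netrc
import os
from pathlib import Path

# Assumed Earthdata Login host name used as the "machine" entry in .netrc.
EDL_HOST = "urs.earthdata.nasa.gov"


def credential_sources(environ=None, netrc_path=None):
    """Return the credential sources found, in the order they are checked here.

    Hypothetical helper: it only inspects the environment and a .netrc file;
    it never contacts Earthdata Login and never prints secrets.
    """
    environ = os.environ if environ is None else environ
    netrc_path = Path.home() / ".netrc" if netrc_path is None else Path(netrc_path)

    found = []
    # Assumed variable names; both must be set and non-empty.
    if environ.get("EARTHDATA_USERNAME") and environ.get("EARTHDATA_PASSWORD"):
        found.append("environment")
    if netrc_path.is_file():
        try:
            if netrc.netrc(str(netrc_path)).authenticators(EDL_HOST):
                found.append("netrc")
        except (netrc.NetrcParseError, OSError):
            # An unreadable or malformed file counts as "not found".
            pass
    return found


print(credential_sources())
```

An empty list suggests that earthaccess.login will fall back to the interactive prompt.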

3. Search for Data

Collections on NASA Earthdata are discovered with the search_datasets function, which accepts an instrument filter as an easy way to get started. Each item in the list of collections returned in the search results has a “short-name”.

results = earthaccess.search_datasets(instrument="oci")
for item in results:
    summary = item.summary()
    print(summary["short-name"])
PACE_OCI_L0_SCI
PACE_OCI_L1A_SCI
PACE_OCI_L1B_SCI
PACE_OCI_L1C_SCI
PACE_OCI_L2_UVAI_UAA_NRT
PACE_OCI_L2_UVAI_UAA
PACE_OCI_L2_AER_UAA_NRT
PACE_OCI_L2_AER_UAA
PACE_OCI_L2_AOP_NRT
PACE_OCI_L2_AOP
PACE_OCI_L2_CLOUD_MASK_NRT
PACE_OCI_L2_CLOUD_MASK
PACE_OCI_L2_CLOUD_NRT
PACE_OCI_L2_CLOUD
PACE_OCI_L2_IOP_NRT
PACE_OCI_L2_IOP
PACE_OCI_L2_LANDVI_NRT
PACE_OCI_L2_LANDVI
PACE_OCI_L2_BGC
PACE_OCI_L2_BGC_NRT
PACE_OCI_L2_PAR_NRT
PACE_OCI_L2_PAR
PACE_OCI_L2_SFREFL_NRT
PACE_OCI_L2_SFREFL
PACE_OCI_L3B_AOT_NRT
PACE_OCI_L3B_AOT
PACE_OCI_L3B_AVW_NRT
PACE_OCI_L3B_AVW
PACE_OCI_L3B_CARBON
PACE_OCI_L3B_CARBON_NRT
PACE_OCI_L3B_CHL_NRT
PACE_OCI_L3B_CHL
PACE_OCI_L3B_KD_NRT
PACE_OCI_L3B_KD
PACE_OCI_L3B_FLH_NRT
PACE_OCI_L3B_FLH
PACE_OCI_L3B_IOP_NRT
PACE_OCI_L3B_IOP
PACE_OCI_L3B_LANDVI_NRT
PACE_OCI_L3B_LANDVI
PACE_OCI_L3B_PIC_NRT
PACE_OCI_L3B_PIC
PACE_OCI_L3B_POC_NRT
PACE_OCI_L3B_POC
PACE_OCI_L3B_PAR_NRT
PACE_OCI_L3B_PAR
PACE_OCI_L3B_RRS_NRT
PACE_OCI_L3B_RRS
PACE_OCI_L3B_SFREFL_NRT
PACE_OCI_L3B_SFREFL
PACE_OCI_L3M_UVAI_UAA_NRT
PACE_OCI_L3M_UVAI_UAA
PACE_OCI_L3M_AER_UAA_NRT
PACE_OCI_L3M_AER_UAA
PACE_OCI_L3M_AOT_NRT
PACE_OCI_L3M_AOT
PACE_OCI_L3M_AVW_NRT
PACE_OCI_L3M_AVW
PACE_OCI_L3M_CARBON
PACE_OCI_L3M_CARBON_NRT
PACE_OCI_L3M_CHL_NRT
PACE_OCI_L3M_CHL
PACE_OCI_L3M_CLOUD_MASK_NRT
PACE_OCI_L3M_CLOUD_MASK
PACE_OCI_L3M_CLOUD_NRT
PACE_OCI_L3M_CLOUD
PACE_OCI_L3M_KD_NRT
PACE_OCI_L3M_KD
PACE_OCI_L3M_FLH_NRT
PACE_OCI_L3M_FLH
PACE_OCI_L3M_IOP_NRT
PACE_OCI_L3M_IOP
PACE_OCI_L3M_LANDVI_NRT
PACE_OCI_L3M_LANDVI
PACE_OCI_L3M_PIC_NRT
PACE_OCI_L3M_PIC
PACE_OCI_L3M_POC_NRT
PACE_OCI_L3M_POC
PACE_OCI_L3M_PAR_NRT
PACE_OCI_L3M_PAR
PACE_OCI_L3M_RRS_NRT
PACE_OCI_L3M_RRS
PACE_OCI_L3M_SFREFL_NRT
PACE_OCI_L3M_SFREFL
PACE_OCI_L4M_MOANA
PACE_OCI_L4M_MOANA_NRT
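
The list above is long, and the short names appear to follow a pattern: mission, instrument, processing level, product, and an optional _NRT suffix for near-real-time collections. That pattern is inferred from the listing, not from a published specification. Under that assumption, a small hypothetical helper can group the products by processing level:

```python
from collections import defaultdict


def parse_short_name(short_name):
    """Split a short name such as 'PACE_OCI_L2_BGC_NRT' into its parts.

    Assumes the layout MISSION_INSTRUMENT_LEVEL_PRODUCT[_NRT] seen in the
    listing above.
    """
    _mission, _instrument, level, *product = short_name.split("_")
    nrt = bool(product) and product[-1] == "NRT"
    if nrt:
        product = product[:-1]
    return {"level": level, "product": "_".join(product), "nrt": nrt}


def group_by_level(short_names, include_nrt=False):
    """Map each processing level to the list of product names found for it."""
    groups = defaultdict(list)
    for name in short_names:
        info = parse_short_name(name)
        if info["nrt"] and not include_nrt:
            continue
        groups[info["level"]].append(info["product"])
    return dict(groups)


# A few names copied from the search results above.
names = [
    "PACE_OCI_L1B_SCI",
    "PACE_OCI_L2_BGC",
    "PACE_OCI_L2_BGC_NRT",
    "PACE_OCI_L3M_CHL",
    "PACE_OCI_L3M_UVAI_UAA_NRT",
]
print(group_by_level(names))
# {'L1B': ['SCI'], 'L2': ['BGC'], 'L3M': ['CHL']}
```

In the notebook, the same function could be applied to the short names collected in the loop over results.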

Next, we use the search_data function (as opposed to search_datasets) to find granules within a collection. You can use search_data across collections too, but we’ll limit the search to a single collection by specifying one of the short names listed above as the short_name argument. Let’s use the short name for the PACE/OCI Level-2 biogeochemistry (BGC) products.

The count argument limits the number of granule records that are returned in the search results.

results = earthaccess.search_data(
    short_name="PACE_OCI_L2_BGC",
    count=1,
)

Displaying results shows the direct download link (try it!), along with a “quick-look” of some variable within the granule. The link will download the granule to your local machine, which may or may not be what you want to do. Even if you are running the notebook on a remote host, this download link will open a new browser tab or window and offer to save a file to your local machine. If you are running the notebook locally, this may be of use. More likely, you want to open or download the granules by following the steps below.

results[0]

Data: PACE_OCI.20240305T000858.L2.OC_BGC.V3_1.nc

Size: 14.45 MB

Cloud Hosted: True

Data Preview

We can refine our search by passing more parameters that describe the spatiotemporal domain of our use case. Here, we use the temporal parameter to request a date range and the bounding_box parameter to request granules that intersect with a bounding box. We can also provide a cloud_cover range, given as a (minimum, maximum) percentage, to keep only granules whose cloud cover falls within it. We do not provide a count, so we’ll get all granules that satisfy the constraints.

tspan = ("2024-05-01", "2024-05-16")
bbox = (-76.75, 36.97, -75.74, 39.01)
clouds = (0, 50)
results = earthaccess.search_data(
    short_name="PACE_OCI_L2_BGC",
    temporal=tspan,
    bounding_box=bbox,
    cloud_cover=clouds,
)
len(results)
3
for item in results:
    display(item)

Data: PACE_OCI.20240502T172807.L2.OC_BGC.V3_1.nc

Size: 22.29 MB

Cloud Hosted: True

Data Preview

Data: PACE_OCI.20240508T174100.L2.OC_BGC.V3_1.nc

Size: 22.14 MB

Cloud Hosted: True

Data Preview

Data: PACE_OCI.20240513T171853.L2.OC_BGC.V3_1.nc

Size: 23.18 MB

Cloud Hosted: True

Data Preview
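
A search that silently returns nothing is often caused by a mis-ordered tuple. The sketch below checks the three filters used in the search above before they are passed on. The function name build_search_kwargs is hypothetical, and the bounding box order (west, south, east, north) is an assumption consistent with the values used in this notebook.

```python
def build_search_kwargs(short_name, temporal, bounding_box, cloud_cover=None):
    """Validate search filters and return them as keyword arguments.

    Hypothetical helper; it performs no network requests.
    """
    west, south, east, north = bounding_box
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    # west > east is allowed here, since a box may cross the antimeridian.
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")

    start, end = temporal
    # ISO 8601 date strings sort chronologically, so a string compare works.
    if start > end:
        raise ValueError("temporal range must be (start, end)")

    kwargs = {
        "short_name": short_name,
        "temporal": (start, end),
        "bounding_box": (west, south, east, north),
    }
    if cloud_cover is not None:
        low, high = cloud_cover
        if not (0 <= low <= high <= 100):
            raise ValueError("cloud_cover must satisfy 0 <= min <= max <= 100")
        kwargs["cloud_cover"] = (low, high)
    return kwargs


kwargs = build_search_kwargs(
    "PACE_OCI_L2_BGC",
    temporal=("2024-05-01", "2024-05-16"),
    bounding_box=(-76.75, 36.97, -75.74, 39.01),
    cloud_cover=(0, 50),
)
# In the notebook: results = earthaccess.search_data(**kwargs)
```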

4. Open Data

The results returned by earthaccess.search_data are just catalog entries, but include links to the data that we are able to access with xarray. The earthaccess.open function is used when you want to directly load data from a remote filesystem without downloading whole granules. When running code on a host with direct access to the NASA Earthdata Cloud, you don’t need to download the granule and earthaccess.open is the way to go.

Here is another search, this time for Level-3 granules from a single day, followed by earthaccess.open on the results list.

results = earthaccess.search_data(
    short_name="PACE_OCI_L3M_CHL",
    temporal=("2024-06-01", "2024-06-01"),
)
paths = earthaccess.open(results)

The list of outputs, which we called paths, contains references to files on a remote filesystem. They’re not paths to a local file, but many utilities that expect a file path can also use these “file-like” paths.

Despite not having downloaded these granules, we can now access their content with xarray. As always, the xarray package does “lazy loading”, so only coordinates are loaded until the data variables are actually needed.

dataset = xr.open_dataset(paths[0])
dataset
<xarray.Dataset> Size: 26MB
Dimensions:  (lat: 1800, lon: 3600, rgb: 3, eightbitcolor: 256)
Coordinates:
  * lat      (lat) float32 7kB 89.95 89.85 89.75 89.65 ... -89.75 -89.85 -89.95
  * lon      (lon) float32 14kB -179.9 -179.9 -179.8 ... 179.8 179.9 180.0
Dimensions without coordinates: rgb, eightbitcolor
Data variables:
    chlor_a  (lat, lon) float32 26MB ...
    palette  (rgb, eightbitcolor) uint8 768B ...
Attributes: (12/64)
    product_name:                      PACE_OCI.20240321_20240620.L3m.SNSP.CH...
    instrument:                        OCI
    title:                             OCI Level-3 Standard Mapped Image
    project:                           Ocean Biology Processing Group (NASA/G...
    platform:                          PACE
    source:                            satellite observations from OCI-PACE
    ...                                ...
    identifier_product_doi:            10.5067/PACE/OCI/L3M/CHL/3.1
    keywords:                          Earth Science > Oceans > Ocean Chemist...
    keywords_vocabulary:               NASA Global Change Master Directory (G...
    data_bins:                         3554869
    data_minimum:                      0.0009999999
    data_maximum:                      99.464676
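
Because data variables are loaded lazily, selecting a region before touching the values limits how much is read. The sketch below illustrates the selection on a synthetic stand-in for the dataset (a coarser 1-degree grid filled with random numbers, not real chlorophyll data), so it runs without any granule. Note that lat is stored from north to south, as in the output above, so the latitude slice runs from the northern bound to the southern one.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in: same dimension names and orientation as the granule,
# but a 1-degree grid and random values.
lat = np.linspace(89.5, -89.5, 180)  # descending, like the real file
lon = np.linspace(-179.5, 179.5, 360)
rng = np.random.default_rng(0)
synthetic = xr.Dataset(
    {"chlor_a": (("lat", "lon"), rng.random((lat.size, lon.size), dtype=np.float32))},
    coords={"lat": lat, "lon": lon},
)

# Same bounding box as the earlier search: (west, south, east, north).
west, south, east, north = (-76.75, 36.97, -75.74, 39.01)

# Label-based slices follow the coordinate order, hence slice(north, south).
subset = synthetic["chlor_a"].sel(lat=slice(north, south), lon=slice(west, east))
print(subset.shape)
# (2, 1)
```

With the real dataset, the same sel call on dataset["chlor_a"] should return a larger array, because the 0.1-degree grid has more cells inside the box.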

Even if you only want to read a slice of the data, and downloading seems unnecessary, performance will be very poor if you use earthaccess.open while not running on a remote host with direct access to the NASA Earthdata Cloud. This is not a problem with the cloud or with earthaccess; it has to do with the data format and may soon be improved.

For one reason or another, you also need to know how to download whole granules to the local or remote host running your code.

5. Download Data

When you do not have direct access to the Earthdata Cloud, you’ll want to download the data. You may also want to download a granule for faster reads while you are learning your way around the files. Rather than earthaccess.open, we call earthaccess.download on the same search results.

For this function, provide the list returned by earthaccess.search_data along with a directory for earthaccess to use for the downloads.

paths = earthaccess.download(results, local_path="granules")

The paths list now contains paths to actual files.

paths
[PosixPath('granules/PACE_OCI.20240321_20240620.L3m.SNSP.CHL.V3_1.chlor_a.0p1deg.nc'),
 PosixPath('granules/PACE_OCI.20240321_20240620.L3m.SNSP.CHL.V3_1.chlor_a.4km.nc'),
 PosixPath('granules/PACE_OCI.20240508_20240608.L3m.R32.CHL.V3_1.chlor_a.0p1deg.nc'),
 PosixPath('granules/PACE_OCI.20240508_20240608.L3m.R32.CHL.V3_1.chlor_a.4km.nc'),
 PosixPath('granules/PACE_OCI.20240516_20240616.L3m.R32.CHL.V3_1.chlor_a.0p1deg.nc'),
 PosixPath('granules/PACE_OCI.20240516_20240616.L3m.R32.CHL.V3_1.chlor_a.4km.nc'),
 PosixPath('granules/PACE_OCI.20240524_20240624.L3m.R32.CHL.V3_1.chlor_a.0p1deg.nc'),
 PosixPath('granules/PACE_OCI.20240524_20240624.L3m.R32.CHL.V3_1.chlor_a.4km.nc'),
 PosixPath('granules/PACE_OCI.20240601.L3m.DAY.CHL.V3_1.chlor_a.0p1deg.nc'),
 PosixPath('granules/PACE_OCI.20240601.L3m.DAY.CHL.V3_1.chlor_a.4km.nc'),
 PosixPath('granules/PACE_OCI.20240601_20240608.L3m.8D.CHL.V3_1.chlor_a.4km.nc'),
 PosixPath('granules/PACE_OCI.20240601_20240608.L3m.8D.CHL.V3_1.chlor_a.0p1deg.nc'),
 PosixPath('granules/PACE_OCI.20240601_20240630.L3m.MO.CHL.V3_1.chlor_a.0p1deg.nc'),
 PosixPath('granules/PACE_OCI.20240601_20240630.L3m.MO.CHL.V3_1.chlor_a.4km.nc'),
 PosixPath('granules/PACE_OCI.20240601_20240702.L3m.R32.CHL.V3_1.chlor_a.0p1deg.nc'),
 PosixPath('granules/PACE_OCI.20240601_20240702.L3m.R32.CHL.V3_1.chlor_a.4km.nc')]
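
The single-day search matched composites covering several periods and two resolutions, so the first entry in paths is not necessarily the daily file. The sketch below picks files by the period and resolution tokens in their names. The helper select_granules is hypothetical, and the reading of the tokens (for example DAY for daily and 0p1deg for 0.1 degree) is inferred from the file names above.

```python
from fnmatch import fnmatchcase
from pathlib import PurePath


def select_granules(paths, period="DAY", resolution="0p1deg"):
    """Keep the paths whose file name contains the period and resolution tokens."""
    pattern = f"*.{period}.*.{resolution}.*"
    return [p for p in paths if fnmatchcase(PurePath(p).name, pattern)]


# A few names copied from the listing above.
example_paths = [
    "granules/PACE_OCI.20240321_20240620.L3m.SNSP.CHL.V3_1.chlor_a.0p1deg.nc",
    "granules/PACE_OCI.20240601.L3m.DAY.CHL.V3_1.chlor_a.0p1deg.nc",
    "granules/PACE_OCI.20240601.L3m.DAY.CHL.V3_1.chlor_a.4km.nc",
    "granules/PACE_OCI.20240601_20240608.L3m.8D.CHL.V3_1.chlor_a.0p1deg.nc",
]
daily = select_granules(example_paths)
print(daily)
# ['granules/PACE_OCI.20240601.L3m.DAY.CHL.V3_1.chlor_a.0p1deg.nc']
```

It may also be possible to narrow the search itself with a wildcard on the granule name, which would avoid downloading the other files; check the earthaccess documentation for the supported search parameters.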

We can open one of these downloaded files in just the same way with xarray.

dataset = xr.open_dataset(paths[0])
dataset
<xarray.Dataset> Size: 26MB
Dimensions:  (lat: 1800, lon: 3600, rgb: 3, eightbitcolor: 256)
Coordinates:
  * lat      (lat) float32 7kB 89.95 89.85 89.75 89.65 ... -89.75 -89.85 -89.95
  * lon      (lon) float32 14kB -179.9 -179.9 -179.8 ... 179.8 179.9 180.0
Dimensions without coordinates: rgb, eightbitcolor
Data variables:
    chlor_a  (lat, lon) float32 26MB ...
    palette  (rgb, eightbitcolor) uint8 768B ...
Attributes: (12/64)
    product_name:                      PACE_OCI.20240321_20240620.L3m.SNSP.CH...
    instrument:                        OCI
    title:                             OCI Level-3 Standard Mapped Image
    project:                           Ocean Biology Processing Group (NASA/G...
    platform:                          PACE
    source:                            satellite observations from OCI-PACE
    ...                                ...
    identifier_product_doi:            10.5067/PACE/OCI/L3M/CHL/3.1
    keywords:                          Earth Science > Oceans > Ocean Chemist...
    keywords_vocabulary:               NASA Global Change Master Directory (G...
    data_bins:                         3554869
    data_minimum:                      0.0009999999
    data_maximum:                      99.464676

Anywhere in any of these notebooks where paths = earthaccess.open(...) is used to read data directly from the NASA Earthdata Cloud, you need to substitute paths = earthaccess.download(..., local_path) before running the notebook on a local host or a remote host that does not have direct access to the NASA Earthdata Cloud.
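
One way to make that substitution in a single place is a small wrapper that chooses between the two calls. This is only a sketch: the function get_paths and its direct_access flag are hypothetical, and you set the flag yourself based on where the notebook is running. The ea parameter exists only so the logic can be exercised without earthaccess installed.

```python
def get_paths(results, direct_access, local_path="granules", ea=None):
    """Return file-like objects or local paths for the search results.

    direct_access: True only on a host with direct access to the
    NASA Earthdata Cloud (case 3 in the summary); otherwise False.
    """
    if ea is None:
        # Imported lazily so the function can be defined without earthaccess.
        import earthaccess as ea
    if direct_access:
        return ea.open(results)
    return ea.download(results, local_path=local_path)


# In the notebook, on a laptop or GitHub Codespaces:
# paths = get_paths(results, direct_access=False)
```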