Access Data from the Ocean Color Instrument (OCI)
Author(s): Anna Windle (NASA, SSAI), Ian Carroll (NASA, UMBC), Carina Poulin (NASA, SSAI)
Last updated: August 3, 2025
An Earthdata Login account is required to access data from the NASA Earthdata system, including NASA ocean color data.
Summary
In this example we will use the earthaccess package to search for
OCI products on NASA Earthdata. The earthaccess package, published
on the Python Package Index and conda-forge,
facilitates discovery and use of all NASA Earth Science data
products by providing an abstraction layer for NASA’s Common
Metadata Repository (CMR) API and by simplifying requests to
NASA’s Earthdata Cloud. Searching for data is more
approachable with earthaccess than with low-level
HTTP requests or working directly with S3 object stores.
In short, earthaccess helps authenticate with Earthdata Login,
makes search easier, and provides a streamlined way to load
data into xarray containers. For more on earthaccess, visit
the documentation site. Be aware that
earthaccess is under active development.
To follow the discussion below on downloading and opening data, we need to be clear about where our notebook is running. There are three cases to distinguish:
The notebook is running on the local host. A Jupyter server on your laptop is a local host.
The notebook is running on a remote host, but it does not have direct access to the NASA Earthdata Cloud. For instance, GitHub Codespaces is a remote host that runs on a different cloud platform than the NASA Earthdata Cloud.
The notebook is running on a remote host that does have direct access to the NASA Earthdata Cloud.
Learning Objectives
At the end of this notebook you will know:
How to store your NASA Earthdata Login credentials with earthaccess
How to use earthaccess to search for OCI data using search filters
How to download OCI data, but only when you need to
1. Setup
We begin by importing the packages used in this notebook.
import earthaccess
import xarray as xr
2. NASA Earthdata Authentication
Next, we authenticate using our Earthdata Login
credentials. Authentication is not needed to search publicly
available collections in Earthdata, but is always needed to access
data. We use the login method from the earthaccess
package. This will create an authenticated session when we provide a
valid Earthdata Login username and password. The earthaccess
package will search for credentials defined by environment
variables or within a .netrc file saved in the home
directory. If credentials are not found, an interactive prompt will
allow you to input credentials.
The persist=True argument ensures any discovered credentials are
stored in a .netrc file, so the argument is not necessary (but
it’s also harmless) for subsequent calls to earthaccess.login.
auth = earthaccess.login(persist=True)
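Before logging in, it can help to know which credential source earthaccess is likely to find. The following is a minimal sketch using only the Python standard library, not part of earthaccess itself. The helper name credential_sources is hypothetical, and the environment variable names and the location of the .netrc file are assumptions to check against the earthaccess documentation for your version and operating system.

```python
import netrc
import os
from pathlib import Path

EDL_HOST = "urs.earthdata.nasa.gov"  # Earthdata Login host name


def credential_sources(environ=None, netrc_path=None):
    """Return the places where Earthdata Login credentials appear to be defined."""
    environ = os.environ if environ is None else environ
    netrc_path = Path.home() / ".netrc" if netrc_path is None else Path(netrc_path)
    found = []
    # Assumed variable names; confirm them for your earthaccess version.
    if environ.get("EARTHDATA_USERNAME") and environ.get("EARTHDATA_PASSWORD"):
        found.append("environment")
    if netrc_path.is_file():
        try:
            # authenticators() returns None when the host has no entry.
            if netrc.netrc(str(netrc_path)).authenticators(EDL_HOST):
                found.append("netrc")
        except netrc.NetrcParseError:
            pass  # A file that cannot be parsed counts as "no credentials found".
    return found
```

Calling credential_sources() with no arguments inspects the current environment and home directory. An empty list suggests that earthaccess.login will fall back to the interactive prompt.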
3. Search for Data
Collections on NASA Earthdata are discovered with the
search_datasets function, which accepts an instrument filter as an
easy way to get started. Each item in the list of
collections returned in the search results has a “short-name”.
results = earthaccess.search_datasets(instrument="oci")
for item in results:
    summary = item.summary()
    print(summary["short-name"])
PACE_OCI_L0_SCI
PACE_OCI_L1A_SCI
PACE_OCI_L1B_SCI
PACE_OCI_L1C_SCI
PACE_OCI_L2_UVAI_UAA_NRT
PACE_OCI_L2_UVAI_UAA
PACE_OCI_L2_AER_UAA_NRT
PACE_OCI_L2_AER_UAA
PACE_OCI_L2_AOP_NRT
PACE_OCI_L2_AOP
PACE_OCI_L2_CLOUD_MASK_NRT
PACE_OCI_L2_CLOUD_MASK
PACE_OCI_L2_CLOUD_NRT
PACE_OCI_L2_CLOUD
PACE_OCI_L2_IOP_NRT
PACE_OCI_L2_IOP
PACE_OCI_L2_LANDVI_NRT
PACE_OCI_L2_LANDVI
PACE_OCI_L2_BGC
PACE_OCI_L2_BGC_NRT
PACE_OCI_L2_PAR_NRT
PACE_OCI_L2_PAR
PACE_OCI_L2_SFREFL_NRT
PACE_OCI_L2_SFREFL
PACE_OCI_L3B_AOT_NRT
PACE_OCI_L3B_AOT
PACE_OCI_L3B_AVW_NRT
PACE_OCI_L3B_AVW
PACE_OCI_L3B_CARBON
PACE_OCI_L3B_CARBON_NRT
PACE_OCI_L3B_CHL_NRT
PACE_OCI_L3B_CHL
PACE_OCI_L3B_KD_NRT
PACE_OCI_L3B_KD
PACE_OCI_L3B_FLH_NRT
PACE_OCI_L3B_FLH
PACE_OCI_L3B_IOP_NRT
PACE_OCI_L3B_IOP
PACE_OCI_L3B_LANDVI_NRT
PACE_OCI_L3B_LANDVI
PACE_OCI_L3B_PIC_NRT
PACE_OCI_L3B_PIC
PACE_OCI_L3B_POC_NRT
PACE_OCI_L3B_POC
PACE_OCI_L3B_PAR_NRT
PACE_OCI_L3B_PAR
PACE_OCI_L3B_RRS_NRT
PACE_OCI_L3B_RRS
PACE_OCI_L3B_SFREFL_NRT
PACE_OCI_L3B_SFREFL
PACE_OCI_L3M_UVAI_UAA_NRT
PACE_OCI_L3M_UVAI_UAA
PACE_OCI_L3M_AER_UAA_NRT
PACE_OCI_L3M_AER_UAA
PACE_OCI_L3M_AOT_NRT
PACE_OCI_L3M_AOT
PACE_OCI_L3M_AVW_NRT
PACE_OCI_L3M_AVW
PACE_OCI_L3M_CARBON
PACE_OCI_L3M_CARBON_NRT
PACE_OCI_L3M_CHL_NRT
PACE_OCI_L3M_CHL
PACE_OCI_L3M_CLOUD_MASK_NRT
PACE_OCI_L3M_CLOUD_MASK
PACE_OCI_L3M_CLOUD_NRT
PACE_OCI_L3M_CLOUD
PACE_OCI_L3M_KD_NRT
PACE_OCI_L3M_KD
PACE_OCI_L3M_FLH_NRT
PACE_OCI_L3M_FLH
PACE_OCI_L3M_IOP_NRT
PACE_OCI_L3M_IOP
PACE_OCI_L3M_LANDVI_NRT
PACE_OCI_L3M_LANDVI
PACE_OCI_L3M_PIC_NRT
PACE_OCI_L3M_PIC
PACE_OCI_L3M_POC_NRT
PACE_OCI_L3M_POC
PACE_OCI_L3M_PAR_NRT
PACE_OCI_L3M_PAR
PACE_OCI_L3M_RRS_NRT
PACE_OCI_L3M_RRS
PACE_OCI_L3M_SFREFL_NRT
PACE_OCI_L3M_SFREFL
PACE_OCI_L4M_MOANA
PACE_OCI_L4M_MOANA_NRT
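The list is long, and most products appear twice: once as a standard product and once with an _NRT (near real-time) suffix. As a rough sketch, the short names can be grouped by processing level with a few lines of standard-library Python. The helper name group_by_level is hypothetical, and the code assumes every short name follows the PACE_OCI_<level>_<product>[_NRT] pattern seen in the output above.

```python
from collections import defaultdict


def group_by_level(short_names):
    """Group short names such as 'PACE_OCI_L2_BGC_NRT' by processing level.

    Assumes the pattern PACE_OCI_<level>_<product>[_NRT].
    Returns {level: {product: has_nrt_variant}}.
    """
    groups = defaultdict(dict)
    for name in short_names:
        parts = name.split("_")
        level = parts[2]
        is_nrt = parts[-1] == "NRT"
        product = "_".join(parts[3:-1] if is_nrt else parts[3:])
        # Record whether a near real-time variant exists for this product.
        groups[level][product] = groups[level].get(product, False) or is_nrt
    return dict(groups)


sample = ["PACE_OCI_L2_BGC", "PACE_OCI_L2_BGC_NRT", "PACE_OCI_L3M_CHL", "PACE_OCI_L1B_SCI"]
for level, products in sorted(group_by_level(sample).items()):
    print(level, sorted(products))
```

In the notebook, the same function could be fed the real search results with group_by_level(item.summary()["short-name"] for item in results).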
The short name can also be found on Earthdata Search, directly under the collection name, after clicking on the “i” button for a collection in any search result.
Next, we use the search_data function (as opposed to search_datasets) to find granules within a collection.
You can use search_data across collections too, but we’ll limit the search to a single collection by specifying one of the short_name values above.
Let’s use the short_name for the PACE/OCI Level-2 biogeochemistry (BGC) products.
The count argument limits the number of granule records that are returned in the search results.
results = earthaccess.search_data(
    short_name="PACE_OCI_L2_BGC",
    count=1,
)
Displaying results shows the direct download link (try it!), along with a “quick-look” of some variable within the granule. The link will download the granule to your local machine, which may or may not be what you want to do. Even if you are running the notebook on a remote host, this download link will open a new browser tab or window and offer to save a file to your local machine. If you are running the notebook locally, this may be of use. More likely, you want to open or download the granules by following the steps below.
results[0]
We can refine our search by passing more parameters that describe
the spatiotemporal domain of our use case. Here, we use the
temporal parameter to request a date range and the bounding_box
parameter to request granules that intersect with a bounding box. We
can even provide a cloud_cover range to limit the results to granules
whose percentage of cloud cover falls within it. We do not provide a
count, so we’ll get all granules that satisfy the constraints.
tspan = ("2024-05-01", "2024-05-16")
bbox = (-76.75, 36.97, -75.74, 39.01)
clouds = (0, 50)
results = earthaccess.search_data(
    short_name="PACE_OCI_L2_BGC",
    temporal=tspan,
    bounding_box=bbox,
    cloud_cover=clouds,
)
len(results)
3
for item in results:
    display(item)
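A search that unexpectedly returns no granules is often caused by a malformed filter, such as swapped latitudes. The sketch below checks the three filters before they are passed to earthaccess.search_data. The helper name check_filters is hypothetical, and the code assumes the bounding box is ordered (west, south, east, north), as in the example above.

```python
from datetime import date


def check_filters(tspan, bbox, clouds):
    """Raise ValueError if the search filters are not well formed."""
    start, end = (date.fromisoformat(day) for day in tspan)
    if start > end:
        raise ValueError("temporal: start date is after end date")
    west, south, east, north = bbox
    # West may exceed east for a box that crosses the antimeridian,
    # so only the valid range is checked for longitudes.
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("bounding_box: longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("bounding_box: expected south <= north within [-90, 90]")
    low, high = clouds
    if not (0 <= low <= high <= 100):
        raise ValueError("cloud_cover: expected 0 <= min <= max <= 100")
    return True


check_filters(("2024-05-01", "2024-05-16"), (-76.75, 36.97, -75.74, 39.01), (0, 50))
```

These checks cannot catch every mistake. A box with longitude and latitude swapped can still pass if the values happen to fall within the valid ranges.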
4. Open Data
The results returned by earthaccess.search_data are just catalog entries, but include
links to the data that we are able to access with xarray. The earthaccess.open function
is used when you want to directly load data from a remote filesystem without downloading whole granules.
When running code on a host with direct access to the NASA Earthdata Cloud, you don’t need to download
the granule and earthaccess.open is the way to go.
Here is another search, this time for Level-3 granules from a single day, followed by earthaccess.open
on the results list.
results = earthaccess.search_data(
    short_name="PACE_OCI_L3M_CHL",
    temporal=("2024-06-01", "2024-06-01"),
)
paths = earthaccess.open(results)
The list of outputs, which we called paths, contains references to files on a remote filesystem. They’re not
paths to a local file, but many utilities that expect a file path can also use these “file-like” paths.
If you see HTTPFileSystem in the output when displaying paths, then earthaccess has determined that you do not have
direct access to the NASA Earthdata Cloud.
That determination may be wrong.
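The same check can be made in code rather than by reading the displayed output. This is only a sketch under assumptions: it assumes each object returned by earthaccess.open exposes its underlying fsspec filesystem as an fs attribute, and that direct access shows up as a class named S3FileSystem. The helper names are hypothetical.

```python
def filesystem_name(file_like):
    """Return the class name of the filesystem behind a file-like object, if any.

    Assumes the object exposes its filesystem as an ``fs`` attribute.
    """
    fs = getattr(file_like, "fs", None)
    return None if fs is None else type(fs).__name__


def describe_access(file_like):
    """Summarize how a file-like object reaches its data."""
    name = filesystem_name(file_like)
    if name == "S3FileSystem":
        return "direct (S3) access"
    if name == "HTTPFileSystem":
        return "HTTPS access; downloading may be faster"
    return "unknown"
```

In the notebook, describe_access(paths[0]) would report which case applies to the first opened granule.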
Despite not having downloaded these granules, we can now access their content with xarray. As always,
the xarray package does “lazy loading”, so only coordinates are loaded until the data variables are
actually needed.
dataset = xr.open_dataset(paths[0])
dataset
<xarray.Dataset> Size: 26MB
Dimensions: (lat: 1800, lon: 3600, rgb: 3, eightbitcolor: 256)
Coordinates:
* lat (lat) float32 7kB 89.95 89.85 89.75 89.65 ... -89.75 -89.85 -89.95
* lon (lon) float32 14kB -179.9 -179.9 -179.8 ... 179.8 179.9 180.0
Dimensions without coordinates: rgb, eightbitcolor
Data variables:
chlor_a (lat, lon) float32 26MB ...
palette (rgb, eightbitcolor) uint8 768B ...
Attributes: (12/64)
product_name: PACE_OCI.20240321_20240620.L3m.SNSP.CH...
instrument: OCI
title: OCI Level-3 Standard Mapped Image
project: Ocean Biology Processing Group (NASA/G...
platform: PACE
source: satellite observations from OCI-PACE
... ...
identifier_product_doi: 10.5067/PACE/OCI/L3M/CHL/3.1
keywords: Earth Science > Oceans > Ocean Chemist...
keywords_vocabulary: NASA Global Change Master Directory (G...
data_bins: 3554869
data_minimum: 0.0009999999
data_maximum: 99.464676
Even if you only want to read a slice of the data, and downloading
seems unnecessary, performance will be very poor if you use
earthaccess.open while not running on a remote host with direct
access to the NASA Earthdata Cloud. This is not a problem with the
cloud or with earthaccess; it has to do with the data format and
may soon be improved.
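Reading a slice means selecting by coordinate label before any values are loaded. One detail is easy to miss: in the dataset shown above, lat runs from north to south (89.95 down to -89.95), so a label-based slice on lat has to be given as (north, south). The sketch below builds the keyword arguments for xarray's sel method from a bounding box. The helper name bbox_to_slices is hypothetical, and the (west, south, east, north) ordering is an assumption carried over from the search example.

```python
def bbox_to_slices(bbox, lat_descending=True):
    """Turn (west, south, east, north) into keyword arguments for ``Dataset.sel``."""
    west, south, east, north = bbox
    # A descending latitude coordinate needs its slice written north to south.
    lat = slice(north, south) if lat_descending else slice(south, north)
    return {"lat": lat, "lon": slice(west, east)}


selection = bbox_to_slices((-76.75, 36.97, -75.74, 39.01))
print(selection)
```

In the notebook, subset = dataset["chlor_a"].sel(**selection) would then describe only the region of interest, and values would be read when the subset is actually used.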
For one reason or another, you also need to know how to download whole granules to the local or remote host running your code.
5. Download Data
When you do not have direct access to the Earthdata Cloud, you’ll want to download the data. You may also
want to download a granule for faster reads while you are learning your way around the files. Rather
than earthaccess.open we call earthaccess.download on the same search results.
For this function, provide the list returned by earthaccess.search_data
along with a directory for earthaccess to use for the downloads.
paths = earthaccess.download(results, local_path="granules")
The paths list now contains paths to actual files.
paths
[PosixPath('granules/PACE_OCI.20240321_20240620.L3m.SNSP.CHL.V3_1.chlor_a.0p1deg.nc'),
PosixPath('granules/PACE_OCI.20240321_20240620.L3m.SNSP.CHL.V3_1.chlor_a.4km.nc'),
PosixPath('granules/PACE_OCI.20240508_20240608.L3m.R32.CHL.V3_1.chlor_a.0p1deg.nc'),
PosixPath('granules/PACE_OCI.20240508_20240608.L3m.R32.CHL.V3_1.chlor_a.4km.nc'),
PosixPath('granules/PACE_OCI.20240516_20240616.L3m.R32.CHL.V3_1.chlor_a.0p1deg.nc'),
PosixPath('granules/PACE_OCI.20240516_20240616.L3m.R32.CHL.V3_1.chlor_a.4km.nc'),
PosixPath('granules/PACE_OCI.20240524_20240624.L3m.R32.CHL.V3_1.chlor_a.0p1deg.nc'),
PosixPath('granules/PACE_OCI.20240524_20240624.L3m.R32.CHL.V3_1.chlor_a.4km.nc'),
PosixPath('granules/PACE_OCI.20240601.L3m.DAY.CHL.V3_1.chlor_a.0p1deg.nc'),
PosixPath('granules/PACE_OCI.20240601.L3m.DAY.CHL.V3_1.chlor_a.4km.nc'),
PosixPath('granules/PACE_OCI.20240601_20240608.L3m.8D.CHL.V3_1.chlor_a.4km.nc'),
PosixPath('granules/PACE_OCI.20240601_20240608.L3m.8D.CHL.V3_1.chlor_a.0p1deg.nc'),
PosixPath('granules/PACE_OCI.20240601_20240630.L3m.MO.CHL.V3_1.chlor_a.0p1deg.nc'),
PosixPath('granules/PACE_OCI.20240601_20240630.L3m.MO.CHL.V3_1.chlor_a.4km.nc'),
PosixPath('granules/PACE_OCI.20240601_20240702.L3m.R32.CHL.V3_1.chlor_a.0p1deg.nc'),
PosixPath('granules/PACE_OCI.20240601_20240702.L3m.R32.CHL.V3_1.chlor_a.4km.nc')]
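The search matched several compositing periods (DAY, 8D, MO, R32, SNSP) at two resolutions, so the list mixes files you may not want to treat alike. As a small sketch, the downloaded paths can be narrowed by file name with a shell-style wildcard. The helper name select_paths is hypothetical, and the pattern relies on the file naming seen in the output above.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def select_paths(paths, pattern):
    """Keep the paths whose file name matches a shell-style wildcard pattern."""
    # fnmatchcase is case-sensitive on every platform.
    return [Path(p) for p in paths if fnmatchcase(Path(p).name, pattern)]


example = [
    "granules/PACE_OCI.20240601.L3m.DAY.CHL.V3_1.chlor_a.0p1deg.nc",
    "granules/PACE_OCI.20240601.L3m.DAY.CHL.V3_1.chlor_a.4km.nc",
    "granules/PACE_OCI.20240601_20240630.L3m.MO.CHL.V3_1.chlor_a.4km.nc",
]
daily_4km = select_paths(example, "*.DAY.*.4km.nc")
print(daily_4km)
```

In the notebook, select_paths(paths, "*.DAY.*.4km.nc") would apply the same filter to the downloaded list.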
We can open one of these downloaded files with xarray in just the same way.
dataset = xr.open_dataset(paths[0])
dataset
<xarray.Dataset> Size: 26MB
Dimensions: (lat: 1800, lon: 3600, rgb: 3, eightbitcolor: 256)
Coordinates:
* lat (lat) float32 7kB 89.95 89.85 89.75 89.65 ... -89.75 -89.85 -89.95
* lon (lon) float32 14kB -179.9 -179.9 -179.8 ... 179.8 179.9 180.0
Dimensions without coordinates: rgb, eightbitcolor
Data variables:
chlor_a (lat, lon) float32 26MB ...
palette (rgb, eightbitcolor) uint8 768B ...
Attributes: (12/64)
product_name: PACE_OCI.20240321_20240620.L3m.SNSP.CH...
instrument: OCI
title: OCI Level-3 Standard Mapped Image
project: Ocean Biology Processing Group (NASA/G...
platform: PACE
source: satellite observations from OCI-PACE
... ...
identifier_product_doi: 10.5067/PACE/OCI/L3M/CHL/3.1
keywords: Earth Science > Oceans > Ocean Chemist...
keywords_vocabulary: NASA Global Change Master Directory (G...
data_bins: 3554869
data_minimum: 0.0009999999
data_maximum: 99.464676
Anywhere in any of these notebooks where paths = earthaccess.open(...) is used to read data directly from the NASA Earthdata Cloud, you need to substitute paths = earthaccess.download(..., local_path) before running the notebook on a local host or a remote host that does not have direct access to the NASA Earthdata Cloud.
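One way to avoid editing every notebook by hand is to route the choice through a single function. The sketch below is an illustration, not an earthaccess feature: the helper name get_paths and the direct_access flag are hypothetical, and you set the flag yourself according to where the notebook is running.

```python
def get_paths(results, direct_access, open_fn, download_fn, local_path="granules"):
    """Open granules remotely with direct access, otherwise download them."""
    if direct_access:
        # Returns file-like objects that read from the remote filesystem.
        return open_fn(results)
    # Returns paths to files saved under local_path.
    return download_fn(results, local_path=local_path)
```

In the notebook, this would be called as paths = get_paths(results, direct_access=False, open_fn=earthaccess.open, download_fn=earthaccess.download). Either way, the returned list can be passed to xr.open_dataset as shown above.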
You have completed the notebook on downloading and opening datasets. We now suggest starting the notebook on “File Structure at Three Processing Levels”.



