This notebook is from EMIT-Data-Resources

How To: Extract EMIT Spectra at Specified Coordinates

Summary

In this notebook we will open a netCDF4 file from the Earth Surface Mineral Dust Source Investigation (EMIT) as an xarray.Dataset, extract the spectra at point coordinates listed in a .csv into a dataframe, then save and plot the data.

Requirements: + A NASA Earthdata Login account is required to download EMIT data
+ Select the emit_tutorials environment as the kernel for this notebook. + For instructions on setting up the environment, follow the setup_instructions.md included in the /setup/ folder of the repository.

Learning Objectives + How to open an EMIT file as an xarray.Dataset + How to extract spectra at coordinates listed in a .csv file

Import the required Python libraries.

# Import Packages
import sys
import os
import earthaccess
import numpy as np
import pandas as pd
import xarray as xr
import hvplot.pandas
import hvplot.xarray
import holoviews as hv
sys.path.append('../modules/')
from emit_tools import emit_xarray

Login to your NASA Earthdata account and create a .netrc file using the login function from the earthaccess library. If you do not have an Earthdata account, you can create one at https://urs.earthdata.nasa.gov/.

earthaccess.login(persist=True)

For this notebook we will download the necessary files using earthaccess. You can also access the data in place or stream it, but this can be slow due to the file sizes. Provide a URL for an EMIT L2A Reflectance granule.

url = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20220903T163129_2224611_012/EMIT_L2A_RFL_001_20220903T163129_2224611_012.nc'
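If you would rather stream the granule than download it, earthaccess can open the remote file directly. A minimal sketch, assuming the h5netcdf backend is installed; note it reads only the file's root group, while emit_xarray normally handles the full group structure, including geolocation:

# Sketch: stream the granule in place instead of downloading it (can be slow)
# earthaccess.open returns authenticated, file-like objects for remote files
remote_files = earthaccess.open([url])
# Quick look at the root group (reflectance); geolocation lives in the
# 'location' group of the netCDF file
ds_streamed = xr.open_dataset(remote_files[0], engine='h5netcdf')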

Get an HTTPS session using your Earthdata login, set a local path to save the file, and download the granule asset. This may take a while; the reflectance file is approximately 1.8 GB.

# Get requests https Session using Earthdata Login Info
fs = earthaccess.get_requests_https_session()
# Retrieve granule asset ID from URL (to maintain existing naming convention)
granule_asset_id = url.split('/')[-1]
# Define Local Filepath
fp = f'../../data/{granule_asset_id}'
# Download the Granule Asset if it doesn't exist
if not os.path.isfile(fp):
    with fs.get(url, stream=True) as src:
        with open(fp, 'wb') as dst:
            for chunk in src.iter_content(chunk_size=64*1024*1024):
                dst.write(chunk)

Open the downloaded file defined as fp. To do this, we will use the emit_tools module, which contains a few helpful functions for working with EMIT data. Use the ortho=True option to orthorectify the dataset.

ds = emit_xarray(fp, ortho=True)
ds
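Before extracting points, it is worth a quick look at the dataset's structure. A small sketch, not specific to this granule:

# The orthorectified dataset is gridded on latitude/longitude, with a
# wavelengths dimension for the spectral axis
print(ds.dims)
print(f"Wavelength range: {ds.wavelengths.min().item():.1f} - {ds.wavelengths.max().item():.1f} nm")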

Now open the .csv included in the /data/ directory as a pandas.DataFrame.

Note: The category values here are arbitrary and included as an example of an additional column users may want.

points = pd.read_csv('../../data/sample_coords.csv')
points
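The rest of the notebook assumes this file has ID, Latitude, Longitude, and Category columns. A small defensive check can save confusion if you substitute your own .csv:

# Verify the columns the rest of this notebook relies on
required_cols = {'ID', 'Latitude', 'Longitude', 'Category'}
missing = required_cols - set(points.columns)
if missing:
    raise ValueError(f'sample_coords.csv is missing expected columns: {missing}')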

Make a plot to visualize the points we’re going to select on the dataset. Here we use the reflectance values at 850 nm as our basemap. To mask fill values, assign them to np.nan.

# Assign fill-values to NaN
ds.reflectance.data[ds.reflectance.data == -9999] = np.nan
# Select a single band and plot points on top
ds.sel(wavelengths=850, method='nearest').hvplot.image(cmap='greys', frame_height=600, frame_width=600, geo=True, crs='EPSG:4326').opts(title='Example Points')*\
points.hvplot.points(x='Longitude', y='Latitude', by='ID', color=hv.Cycle('Dark2'), geo=True, crs='EPSG:4326')

Set the points dataframe index to ID so we can use our existing point IDs as an index.

points = points.set_index(['ID'])

Convert the dataframe to an xarray.Dataset.

xp = points.to_xarray()
xp

Select the data from our EMIT dataset using the Latitude and Longitude coordinates from our point dataset, then convert the output to a pandas.DataFrame. Because xp.Latitude and xp.Longitude share the ID dimension, xarray performs pointwise (vectorized) selection, returning one nearest-pixel spectrum per point rather than a full latitude-longitude grid.

extracted = ds.sel(latitude=xp.Latitude, longitude=xp.Longitude, method='nearest').to_dataframe()
extracted
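Because method='nearest' snaps each requested coordinate to the closest pixel center, it can be useful to check how far each point moved. A quick sketch:

# Offset (in degrees) between each requested coordinate and the
# pixel center that was actually selected
sel = ds.sel(latitude=xp.Latitude, longitude=xp.Longitude, method='nearest')
pd.DataFrame({
    'lat_offset': sel.latitude.values - xp.Latitude.values,
    'lon_offset': sel.longitude.values - xp.Longitude.values,
}, index=xp.ID.values)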

The output is a long-form dataframe using the 'ID' field as an index, but it is missing the 'Category' column from our original dataframe. Use the dataframe's join method to add the 'Category' column, joining on 'ID'.

df = extracted.join(points['Category'], on=['ID'])
df

Now we have a dataframe containing our initial data in addition to the extracted point data. This is a good place to save an output as a .csv. Go ahead and do that below.

df.to_csv('../../data/example_out.csv')

We can use our dataframe to plot the extracted reflectance data, but first mask the reflectance values of -0.01, which represent deep water vapor absorption regions. To do this, assign values where reflectance = -0.01 to np.nan.

# Data values are slightly different than -0.01 because of floating point precision, which requires a tolerance to be set
df['reflectance'] = df['reflectance'].where(abs(df['reflectance'] - (-0.01)) > 1e-8, np.nan)
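An equivalent way to express the same tolerance-based mask uses np.isclose; which form you prefer is a matter of style:

# Same masking as above: values within tolerance of -0.01 become NaN
df['reflectance'] = df['reflectance'].mask(np.isclose(df['reflectance'], -0.01), np.nan)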

Plot the data using hvplot. We can use by= to separate the reflectance spectra by their ID.

df.hvplot(x='wavelengths', y='reflectance', by=['ID'], color=hv.Cycle('Dark2'), frame_height=400, frame_width=600).opts(title='Example Points - Reflectance', xlabel='Wavelengths (nm)', ylabel='Reflectance')
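If you want to keep the figure as well as the .csv, hvplot can write it to a standalone HTML file. A short sketch; the output filename is arbitrary:

# Save the interactive plot to a self-contained HTML file
plot = df.hvplot(x='wavelengths', y='reflectance', by=['ID'], color=hv.Cycle('Dark2'), frame_height=400, frame_width=600).opts(title='Example Points - Reflectance', xlabel='Wavelengths (nm)', ylabel='Reflectance')
hvplot.save(plot, '../../data/example_points_reflectance.html')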

Contact Info:

Email: LPDAAC@usgs.gov
Voice: +1-866-573-3222
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹
Website: https://lpdaac.usgs.gov/
Date last modified: 03-13-2024

¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.