This notebook is from EMIT-Data-Resources

Source: Working with EMIT L2B Mineralogy Data

Imported on: 2025-01-22

Disclaimer: The EMITL2BMIN product is generated to support the EMIT mission objectives of constraining the sign of dust related radiative forcing. Ten mineral types are the core focus of this work: Calcite, Chlorite, Dolomite, Goethite, Gypsum, Hematite, Illite+Muscovite, Kaolinite, Montmorillonite, and Vermiculite. The EMIT_L3_ASA product contain the aggregate abundance of these minerals at a coarser resolution for use in Earth System Models. Additional minerals are included in the EMITL2BMIN product for transparency but were not the focus of this product. Further validation is required to use these additional mineral maps, particularly in the case of resource exploration. Similarly, the separation of minerals with similar spectral features, such as a fine-grained goethite and hematite, is an area of active research. The results presented here are an initial offering, but the precise categorization is likely to evolve over time, and the limits of what can and cannot be separated on the global scale is still being explored. The user is encouraged to read the Algorithm Theoretical Basis Document (ATBD) for more details.

Working with EMIT L2B Mineralogy Data

Summary

In this notebook we will open the EMIT L2B Estimated Mineral Identification and Band Depth and Uncertainty (EMITL2BMIN) products, find a mineral of interest from the ten mineral types focused on by the EMIT mission, evaluate uncertainty, orthorectify the data, then create an output mask or vector file for the granule.

Background

The EMIT instrument is an imaging spectrometer that measures light in visible and infrared wavelengths. These measurements display unique spectral signatures that correspond to the composition on the Earth’s surface. The EMIT mission focuses specifically on mapping the composition of minerals to better understand the effects of mineral dust throughout the Earth system and human populations now and in the future. More details about EMIT and its associated products can be found in the README.md and on the EMIT website.

The EMITL2BMIN data product provides estimated mineral identification, band depths and uncertainty in a spatially raw, non-orthocorrected format. Two spectral groups, which correspond to different regions of the spectra, are identified independently and often co-occur are used to identify minerals. These estimates are generated using the Tetracorder system(code) and are based on EMITL2ARFL reflectance values. The product also consists of an EMIT_L2B_MINUNCERT file, which provides band depth uncertainty estimates calculated using surface Reflectance Uncertainty values from the EMITL2ARFL data product. The band depth uncertainties are presented as standard deviations, and the fit score for each mineral identification is also provided as the coefficient of determination (r²) of the match between the continuum normalized library reference and the continuum normalized observed spectrum. Associated metadata indicates the name and reference information for each identified mineral, and additional information about aggregating minerals into different categories, and the code used for product generation is available in the emit-sds-l2b repository.

Disclaimer

The EMIT_L2B_MIN product is generated to support the EMIT mission objectives of constraining the sign of dust related radiative forcing. Ten mineral types are the core focus of this work: Calcite, Chlorite, Dolomite, Goethite, Gypsum, Hematite, Illite+Muscovite, Kaolinite, Montmorillonite, and Vermiculite. A future product will aggregate these results for use in Earth System Models. Additional minerals are included in this product for transparency but were not the focus of this product. Further validation is required to use these additional mineral maps, particularly in the case of resource exploration. Similarly, the separation of minerals with similar spectral features, such as a fine-grained goethite and hematite, is an area of active research. The results presented here are an initial offering, but the precise categorization is likely to evolve over time, and the limits of what can and cannot be separated on the global scale is still being explored. The user is encouraged to read the Algorithm Theoretical Basis Document (ATBD) for more details.

Requirements - Set up Python Environment - See setup_instructions.md in the /setup/ folder

Learning Objectives
- How to open an EMIT L2B .nc file as an xarray.Dataset - Apply the Geometry Lookup Table (GLT) to orthorectify the image. - Find minerals of interest within a granule - Visualize Mineral Identification and Band depth - Evaluate mineral uncertainty - Calculate and Visualize mineral Abundance

Tutorial Outline

Setup
Working with Mineral Identification and Band Depth
Aggregating and Mineral Abundance
Exporting to Cloud-Optimized GeoTIFF (COG)

1. Setup

Import the required Python libraries.

import earthaccess
import geopandas as gp
import os
import sys
import numpy as np
import pandas as pd
import xarray as xr
import hvplot.xarray
import holoviews as hv
import panel as pn
sys.path.append('../modules/')
import emit_tools as et

Login to your NASA Earthdata account and create a .netrc file using the login function from the earthaccess library. If you do not have an Earthdata Account, you can create one here.

earthaccess.login(persist=True)

For this notebook we will download the files necessary using earthaccess. You can also access the data in place or stream it, but this can slow due to the file sizes. Provide a URL for an EMIT L2B Mineral Identification and Band Depth granule.

# List the browse images from the text file output of the previous notebook.
min_list = '../../data/results_urls.txt'
with open(min_list) as f:
    min_urls = [line.rstrip('\n') for line in f]
min_urls

# List the browse images from the text file output of the previous notebook.
unc_list = '../../data/min_uncert_urls.txt'
with open(unc_list) as f:
    unc_urls = [line.rstrip('\n') for line in f]
unc_urls

Get an HTTPS Session using your earthdata login, set a local path to save the file, and download the granule asset, in this case we are selecting the second scene (index 1) from the list of granules for both the mineral and uncertainty files.

fs = earthaccess.get_fsspec_https_session()
fp = fs.open(min_urls[0])
fp_un = fs.open(unc_urls[0])

1.2 Downloaded Data

If you’ve already downloaded the data using the workflow shown in Section 6 of the Finding EMIT L2B Mineralogy Data , you can just set filepaths using the cell below.

#fp = '../../data/EMIT_L2B_MIN_001_20230427T173309_2311711_010.nc' # Mineral
#fp_rgb = '../../data/EMIT_L2A_RFL_001_20230427T173309_2311711_010.png' # RGB
#fp_un = '../../data/EMIT_L2B_MINUNCERT_001_20230427T173309_2311711_010.nc' # Mineral Uncertainty

2. Working with the L2B Mineral Identification and Band Depth

EMITL2BMIN data are distributed in a non-orthocorrected spatially raw NetCDF4 (.nc) format consisting of the data and its associated metadata. Inside the .nc file there are 3 groups. Groups can be thought of as containers to organize the data.

The root group that can be considered the main dataset contains 4 data variables data described by the downtrack, and crosstrack dimensions. These variables are group_1_mineral_id, group_1_band_depth, group_2_mineral_id, and group_2_band_depth. These contain the ID and a band depth for each mineral group. These groups do not correspond to the .netcdf file groups, but rather the spectral library groups used to identify the minerals based on which region of the spectra the mineral features correspond to.
The mineral_metadata group containing the spectral library entry name, index, record, group, and url for each entry.
The location group contains latitude and longitude values at the center of each pixel described by the crosstrack and downtrack dimensions, as well as a geometry lookup table (GLT) described by the ortho_x and ortho_y dimensions. The GLT is an orthorectified image (EPSG:4326) consisting of 2 layers containing downtrack and crosstrack indices. These index positions allow us to quickly project the raw data onto this geographic grid.

To access the .nc file, you can use the netCDF4, xarray libraries, or fuctions from the emit_tools.py library. Here we will use the emit_xarray function from this library, which will open and organize the data into an easy to work with xarray.Dataset object. We can also pass the ortho=True argument to orthorectify the data at this stage, but we will start just examining the data to get a better understanding.

ds_min = et.emit_xarray(fp)
ds_min

If we look at the mineral index by printing the first 5 values, we can see that values start with 1. If we look at the minimum values of the mineral IDs we can see these have 0 as a possible value.

print(ds_min.index.data[:5])

print(f'Group_1_minimum:{ds_min.group_1_mineral_id.data.min()} Group_2_minimum: {ds_min.group_2_mineral_id.data.min()}')

The 0 here represents no match. For convenience, let’s make a DataFrame that holds the mineral data, and add that ‘No match’ reference to it:

min_df = pd.DataFrame({x: ds_min[x].values for x in [var for var in ds_min.coords if 'mineral_name' in ds_min[var].dims]})
min_df.loc[-1] = {'index': 0, 'mineral_name': 'No_Match', 'record': -1.0, 'url': 'NA', 'group': 1.0, 'library': 'NA', 'spatial_ref': 0}
min_df = min_df.sort_index().reset_index(drop=True)
min_df

2.1 Orthorectification

The orthorectifation process has already been done for EMIT data. Here we are just using the crosstrack and downtrack indices contained in the GLT to place our spatially raw mineralogy data a into geographic grid with the ortho_x and ortho_y dimensions.

ds_min = et.ortho_xr(ds_min)
ds_min

We can see from these outputs that the dimensions are now latitude and longitude.

In this example, we’ll just work with the group_1 mineral data. We can find the minerals present in the scene by finding unique values in the group_1_mineral_id, but first we will replace fill-values introduced during orthorectification with np.nan, to omit them from our analysis and improve visualizations.

# Assign fill to np.nan
for var in ds_min.data_vars:
    ds_min[var].data[ds_min[var].data == -9999] = np.nan

2.2 Visualize Group 1 Minerals

To visualize minerals present, plot Group 1 Minerals using a categorical color set. You can hover over a colored region to see the zero-based mineral id from the spectral library. Note that these values correspond to the 1-based index value.

ds_min['group_1_mineral_id'].hvplot.image(cmap='glasbey', geo=True, tiles='ESRI', alpha=0.8,frame_width=750).opts(title='Group 1 Mineral ID')

This figure shows the minerals present in the scene, but doesn’t really quantify how well they matched with the spectral library. For that we can look at the band depth for each mineral. We can build an interactive tool to do this using the panel and hvplot libraries. This will take a bit of time to load for each selection.

Because many minerals are scarce, we’ll start by updating the names to include relative fractions

g1_min_percent = [np.round(np.sum(ds_min.group_1_mineral_id.data.flatten() == g1min) / np.sum(ds_min.group_1_mineral_id.data.flatten() > 0),2) * 100 for g1min in range(len(min_df))]
g1_dropdown_names = [str(g1_min_percent[_x]) + ' %: ' + x for (_x, x) in enumerate(min_df.mineral_name.tolist()) if g1_min_percent[_x] > 0 and x != 'No_Match']
g1_dropdown_names = np.array(g1_dropdown_names)[np.argsort([float(x.split(' %:')[0]) for x in g1_dropdown_names])[::-1]].tolist()

# Interactive Panel Control For Mineral Band Depth
min_select = pn.widgets.Select(name='Mineral Name', options = g1_dropdown_names, value = g1_dropdown_names[0])
@pn.depends(min_select)
def min_browse(min_select):
    mask = ds_min['group_1_band_depth'].where(ds_min['group_1_mineral_id'] == min_df['mineral_name'].tolist().index(min_select.split('%: ')[-1]))
    map = mask.hvplot.image(cmap='viridis', geo=True, tiles='ESRI', alpha=0.8,frame_width=450, clim=(0,np.nanpercentile(mask,98))).opts(title=f'{min_select} Band Depth')
    return map
pn.Row(pn.WidgetBox(min_select),min_browse)

2.3 Visualize Group 2 Minerals

We can do the same thing with Group 2 Minerals. Group 2 will show a more diverse set of minerals in this region, including clays and carbonates.

ds_min['group_2_mineral_id'].hvplot.image(cmap='glasbey', geo=True, tiles='ESRI', alpha=0.8,frame_width=750).opts(title='Group 2 Mineral ID')

g2_min_percent = [np.round(np.sum(ds_min.group_2_mineral_id.data.flatten() == g1min) / np.sum(ds_min.group_2_mineral_id.data.flatten() > 0),2) * 100 for g1min in range(len(min_df))]
g2_dropdown_names = [str(g2_min_percent[_x]) + ' %: ' + x for (_x, x) in enumerate(min_df.mineral_name.tolist()) if g2_min_percent[_x] > 0 and x != 'No_Match']
g2_dropdown_names = np.array(g2_dropdown_names)[np.argsort([float(x.split(' %:')[0]) for x in g2_dropdown_names])[::-1]].tolist()

# Interactive Panel Control For Mineral Band Depth
min_select_g2 = pn.widgets.Select(name='Mineral Name', options = g2_dropdown_names, value = g2_dropdown_names[0])
@pn.depends(min_select_g2)
def min_browse_g2(min_select):
    # print(min_select)
    mask = ds_min['group_2_band_depth'].where(ds_min['group_2_mineral_id'] == min_df['mineral_name'].tolist().index(min_select.split('%: ')[-1]))
    map = mask.hvplot.image(cmap='viridis', geo=True, tiles='ESRI', alpha=0.8,frame_width=450, clim=(0,np.nanpercentile(mask,98))).opts(title=f'{min_select} Band Depth')
    return map
pn.Row(pn.WidgetBox(min_select_g2),min_browse_g2)

2.4 Visualizing and filtering with uncertainties

Often just as important as visualizing the core mineral detections is displaying the corresponding uncertainties. The EMIT L2B MIN proudct comes with two different uncertainties (repeated for each group):

Fit - this is how good of a fit the individual mineral detection was, as estimated by the correlation coefficient of the alignment between the continuum removed library reference and the continuum removed observed spectra (after scaling).
Band depth uncertainty - this is the propogation of the reflectance uncertainty through the band depth calculation.

Let’s take a look at each of these by loading the relevant datasets:

# Open an Orthorectify the Uncertainty Data
ds_min_unc = et.emit_xarray(fp_un)
ds_min_unc = et.ortho_xr(ds_min_unc)
ds_min_unc

min_select_fit_g2 = pn.widgets.Select(name='Mineral Name', options = g1_dropdown_names, value = g1_dropdown_names[0])
@pn.depends(min_select_fit_g2)
def min_browse_fit_g2(min_select):
    mask = ds_min_unc['group_1_fit'].where(ds_min['group_1_mineral_id'] == min_df['mineral_name'].tolist().index(min_select.split('%: ')[-1]))
    # The fits are scaled down by a factor of two in the current EMIT L2B products, so scale them back up:
    mask *= 2
    map = mask.hvplot.image(cmap='viridis', geo=True, tiles='ESRI', alpha=0.8,frame_width=450, clim=(0,np.nanpercentile(mask,98))).opts(title=f'{min_select} Fit')
    return map
pn.Row(pn.WidgetBox(min_select_fit_g2),min_browse_fit_g2)

3. Aggregating and Mineral Abundance

The above visualizations walk through the identification of individual library contituents, and visualize band depths. However, the Tetracorder library used by EMIT contains many substrates that are spectrally distinct, but which may be useful to group together for some science applications. The library also contains many mixtures - both aerial and intimate.

To start, the mineral_grouping_matrix from the emit-sds-l2b repository (coppied locally) contains information aggregated from laboratory XRD analyses to attempt to quantify the abundance of different minerals within each constituent. A -1 in the spreadsheet indicates an unknown but non-zero quantity, which in the few cases in the EMIT-10 columns we assume to be 100%. Let’s open that spreadsheet and take a look:

# Open Mineral Groupings .csv
mineral_groupings = pd.read_csv('../../data/mineral_grouping_matrix_20230503.csv')
# The EMIT 10 Minerals are in columns 6 - 17.  Columns after 17 are experimental, and we'll drop for this tutorial:
mineral_groupings = mineral_groupings.drop([x for _x, x in enumerate(mineral_groupings) if _x >= 17], axis=1)

# Retrieve the EMIT 10 Mineral Names from Columns 7-16 (starting with 0) in .csv
mineral_names = [x for _x, x in enumerate(list(mineral_groupings)) if _x > 6 and _x < 17]
# Use EMIT 10 Mineral Names to Subset .csv to only columns with EMIT 10 mineral_names
mineral_abundance_ref = np.array(mineral_groupings[mineral_names])
# Replace Some values in the .csv
mineral_abundance_ref[np.isnan(mineral_abundance_ref)] = 0
mineral_abundance_ref[mineral_abundance_ref == -1] = 1

mineral_groupings

3.1 Approximating Mineral Spectral Abundance

If we make the assumption that the XRD analysis are accurate, and that band-depth scales linearly with abundance, we can approximate the mineral abundance at the surface. It should be noted that both of these assumptions are fraught - XRD analyses break down for some very small grainsize particles, particularly the Iron Oxides (Goethite and Hematite), and band-depth is heavily influenced by particle grain size, and so the abundance-band depth relationship is not linear. We will expand on these details a bit more later, but for now lets take a look at what happens if we hold both of these as true.

The first step is to run through each mineral that has a non-nan value in the mineral_groupings dataframe, and add up the sum of each of those within the scene. A little matrix multiplication is all we need to do that. Notably, in the emit-sds/emit-sds-l2b, this functionality is already built into the group_aggregator.py script in an efficient manner…but because the calculation is simple we’ll reproduce here for learning purposes.

Below, we step through each mineral one at a time, and multiply the band depth by the XRD value for each constintuent in the scene, storing it in a (y,x,band) numpy array, which we can cast into x-array for visualization down the line.

mineral_abundance = np.zeros((ds_min['group_1_band_depth'].shape[0], ds_min['group_1_band_depth'].shape[1], len(mineral_names)))
for _m, mineral_name in enumerate(mineral_names):
    print(f'Calculating {mineral_name}')
    for _c in range(mineral_groupings.shape[0]):
        if np.isnan(mineral_groupings[mineral_name][_c]) == False:      
            group = mineral_groupings["Group"][_c]
            mineral_abundance[...,_m] += (ds_min[f'group_{group}_mineral_id'].values == mineral_groupings['Index'][_c]) * ds_min[f'group_{group}_band_depth'].values
 
            
mineral_abundance[np.isnan(mineral_abundance)] = 0
mineral_abundance[np.all(mineral_abundance == 0, axis=-1),:] = np.nan

# Cast as x-array for consistent visualization
mineral_abundance_xarray = xr.merge([xr.DataArray(mineral_abundance[...,_x],
                                                  name=mineral_names[_x],
                                                  coords=ds_min['group_1_band_depth'].coords,
                                                  attrs=ds_min.attrs,) 
                                     for _x in range(len(mineral_names))])

After which we can show them as a widget:

min_abun_wid = pn.widgets.Select(name='Mineral Abundance', options = mineral_names, value = mineral_names[0])
@pn.depends(min_abun_wid)
def min_abun_browse(mineral_name):
    map = mineral_abundance_xarray[mineral_name].hvplot.image(cmap='viridis', geo=True, tiles='ESRI', alpha=0.8,frame_width=450, 
                                                              clim=(0,np.nanpercentile(mineral_abundance_xarray[mineral_name],98))).opts(\
                                                              title=f'{mineral_name} Spectral Abundance')

    return map
pn.Row(pn.WidgetBox(min_abun_wid),min_abun_browse)

We can also take a look at the histogram distributions

max_bin_max = np.max([np.nanpercentile(mineral_abundance_xarray[x].values,98) for x in mineral_names])
mineral_abundance_xarray.hvplot.hist(y=mineral_names, cmap='glasbey', bins=50, alpha=0.5, legend='left', bin_range=(0.001, max_bin_max), 
                                     xlabel='Spectral Abundance', ylabel='Pixel Count', title='',frame_width=650)

Notably, these abundance fractions do not sum to one. This can either be due to copmlex mixtures that aren’t fully captured, or due to the presence of additional materials that do not directly absorb in the VSWIR - most namely quartz and feldspar, but also including organic soil matter, as well as NPV and PV. To illustrate this, let’s examine the totals.

total_mineral_abundance = xr.DataArray(np.nansum(mineral_abundance,axis=-1), 
                                       name='Total Abundance', 
                                       coords=ds_min['group_1_band_depth'].coords, 
                                       attrs=ds_min.attrs,)
total_mineral_abundance.hvplot.hist(y=['Total Abundance'], cmap='glasbey', bins=50, alpha=0.5, legend='left', 
                                    bin_range=(0.001, np.nanpercentile(total_mineral_abundance.values, 99.9)), 
                                    xlabel='Spectral Abundance', ylabel='Pixel Count', title='',frame_width=650)

4. Exporting To Cloud-Optimized Geotiffs

We can select layers from any of the xarray datasets we’ve created and export them to a Cloud-Optimized Geotiff (COG) using the rio.to_raster function from the rasterio library. This will allow us to share the data with others or use it in other GIS software.

First create an output folder.

# Create Out filenames and set folder
out_folder = '../../data/output/' # may need to change based on your directory structure
# Create out_folder if it does not exist
if not os.path.exists(out_folder):
   os.makedirs(out_folder)

4.1 Mineral ID

# Set output Filename
out_name = f'{ds_min.granule_id}_group_1_mineral_id.tif'
# Select Group to Output
dat_out = ds_min['group_1_mineral_id']
# Replace nan with a fill value
dat_out.data = np.nan_to_num(dat_out.data, nan=-9999)
# Change datatype for integers/categorical
dat_out.data = dat_out.data.astype(int)
# Encode nodata value
dat_out.rio.write_nodata(-9999, encoded=True, inplace=True)
# Write data to COG
dat_out.rio.to_raster(raster_path=f'{out_folder}{out_name}', driver='COG')

4.2 Band Depth

# Set output Filename
out_name = f'{ds_min.granule_id}_group_1_band_depth.tif'
# Select Group to Output
dat_out = ds_min['group_1_band_depth']
# Replace nan with a fill value
dat_out.data = np.nan_to_num(dat_out.data, nan=-9999)
# Encode nodata value
dat_out.rio.write_nodata(-9999, encoded=True, inplace=True)
# Write data to COG
dat_out.rio.to_raster(raster_path=f'{out_folder}{out_name}', driver='COG')

4.3 Abundance

# Set output Filename
out_name = f'{mineral_abundance_xarray.granule_id}_Calcite.tif'
# Select Layer
dat_out = mineral_abundance_xarray['Calcite']
# Replace nan with a fill value
dat_out.data = np.nan_to_num(dat_out.data, nan=-9999)
# Encode nodata value
dat_out.rio.write_nodata(-9999, encoded=True, inplace=True)
# Write data to COG
dat_out.rio.to_raster(raster_path=f'{out_folder}{out_name}', driver='COG')

Contact Info:

Email: LPDAAC@usgs.gov
Voice: +1-866-573-3222
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹
Website: https://lpdaac.usgs.gov/
Date last modified: 07-07-2024

¹Work performed under USGS contract 140G0121D0001 for NASA contract NNG14HH33I.