import earthaccess
import geopandas as gp
import os
import sys
import numpy as np
import pandas as pd
import xarray as xr
import hvplot.xarray
import holoviews as hv
import panel as pn
'../modules/')
sys.path.append(import emit_tools as et
This notebook is from EMIT-Data-Resources
Source: Working with EMIT L2B Mineralogy Data
Imported on: 2025-01-22
Disclaimer: The EMITL2BMIN product is generated to support the EMIT mission objectives of constraining the sign of dust related radiative forcing. Ten mineral types are the core focus of this work: Calcite, Chlorite, Dolomite, Goethite, Gypsum, Hematite, Illite+Muscovite, Kaolinite, Montmorillonite, and Vermiculite. The EMIT_L3_ASA product contain the aggregate abundance of these minerals at a coarser resolution for use in Earth System Models. Additional minerals are included in the EMITL2BMIN product for transparency but were not the focus of this product. Further validation is required to use these additional mineral maps, particularly in the case of resource exploration. Similarly, the separation of minerals with similar spectral features, such as a fine-grained goethite and hematite, is an area of active research. The results presented here are an initial offering, but the precise categorization is likely to evolve over time, and the limits of what can and cannot be separated on the global scale is still being explored. The user is encouraged to read the Algorithm Theoretical Basis Document (ATBD) for more details.
Working with EMIT L2B Mineralogy Data
Summary
In this notebook we will open the EMIT L2B Estimated Mineral Identification and Band Depth and Uncertainty (EMITL2BMIN) products, find a mineral of interest from the ten mineral types focused on by the EMIT mission, evaluate uncertainty, orthorectify the data, then create an output mask or vector file for the granule.
Background
The EMIT instrument is an imaging spectrometer that measures light in visible and infrared wavelengths. These measurements display unique spectral signatures that correspond to the composition on the Earth’s surface. The EMIT mission focuses specifically on mapping the composition of minerals to better understand the effects of mineral dust throughout the Earth system and human populations now and in the future. More details about EMIT and its associated products can be found in the README.md and on the EMIT website.
The EMITL2BMIN data product provides estimated mineral identification, band depths and uncertainty in a spatially raw, non-orthocorrected format. Two spectral groups, which correspond to different regions of the spectra, are identified independently and often co-occur are used to identify minerals. These estimates are generated using the Tetracorder system(code) and are based on EMITL2ARFL reflectance values. The product also consists of an EMIT_L2B_MINUNCERT file, which provides band depth uncertainty estimates calculated using surface Reflectance Uncertainty values from the EMITL2ARFL data product. The band depth uncertainties are presented as standard deviations, and the fit score for each mineral identification is also provided as the coefficient of determination (r2) of the match between the continuum normalized library reference and the continuum normalized observed spectrum. Associated metadata indicates the name and reference information for each identified mineral, and additional information about aggregating minerals into different categories, and the code used for product generation is available in the emit-sds-l2b repository.
Disclaimer
The EMIT_L2B_MIN product is generated to support the EMIT mission objectives of constraining the sign of dust related radiative forcing. Ten mineral types are the core focus of this work: Calcite, Chlorite, Dolomite, Goethite, Gypsum, Hematite, Illite+Muscovite, Kaolinite, Montmorillonite, and Vermiculite. A future product will aggregate these results for use in Earth System Models. Additional minerals are included in this product for transparency but were not the focus of this product. Further validation is required to use these additional mineral maps, particularly in the case of resource exploration. Similarly, the separation of minerals with similar spectral features, such as a fine-grained goethite and hematite, is an area of active research. The results presented here are an initial offering, but the precise categorization is likely to evolve over time, and the limits of what can and cannot be separated on the global scale is still being explored. The user is encouraged to read the Algorithm Theoretical Basis Document (ATBD) for more details.
Requirements - Set up Python Environment - See setup_instructions.md in the /setup/
folder
Learning Objectives
- How to open an EMIT L2B .nc
file as an xarray.Dataset
- Apply the Geometry Lookup Table (GLT) to orthorectify the image. - Find minerals of interest within a granule - Visualize Mineral Identification and Band depth - Evaluate mineral uncertainty - Calculate and Visualize mineral Abundance
Tutorial Outline
- Setup
- Working with Mineral Identification and Band Depth
- Aggregating and Mineral Abundance
- Exporting to Cloud-Optimized GeoTIFF (COG)
1. Setup
Import the required Python libraries.
Login to your NASA Earthdata account and create a .netrc
file using the login
function from the earthaccess
library. If you do not have an Earthdata Account, you can create one here.
=True) earthaccess.login(persist
For this notebook we will download the files necessary using earthaccess
. You can also access the data in place or stream it, but this can slow due to the file sizes. Provide a URL for an EMIT L2B Mineral Identification and Band Depth granule.
# List the browse images from the text file output of the previous notebook.
= '../../data/results_urls.txt'
min_list with open(min_list) as f:
= [line.rstrip('\n') for line in f]
min_urls min_urls
# List the browse images from the text file output of the previous notebook.
= '../../data/min_uncert_urls.txt'
unc_list with open(unc_list) as f:
= [line.rstrip('\n') for line in f]
unc_urls unc_urls
Get an HTTPS Session using your earthdata login, set a local path to save the file, and download the granule asset, in this case we are selecting the second scene (index 1) from the list of granules for both the mineral and uncertainty files.
= earthaccess.get_fsspec_https_session()
fs = fs.open(min_urls[0])
fp = fs.open(unc_urls[0]) fp_un
1.2 Downloaded Data
If you’ve already downloaded the data using the workflow shown in Section 6 of the Finding EMIT L2B Mineralogy Data , you can just set filepaths using the cell below.
#fp = '../../data/EMIT_L2B_MIN_001_20230427T173309_2311711_010.nc' # Mineral
#fp_rgb = '../../data/EMIT_L2A_RFL_001_20230427T173309_2311711_010.png' # RGB
#fp_un = '../../data/EMIT_L2B_MINUNCERT_001_20230427T173309_2311711_010.nc' # Mineral Uncertainty
2. Working with the L2B Mineral Identification and Band Depth
EMITL2BMIN data are distributed in a non-orthocorrected spatially raw NetCDF4 (.nc) format consisting of the data and its associated metadata. Inside the .nc
file there are 3 groups. Groups can be thought of as containers to organize the data.
- The root group that can be considered the main dataset contains 4 data variables data described by the downtrack, and crosstrack dimensions. These variables are
group_1_mineral_id
,group_1_band_depth
,group_2_mineral_id
, andgroup_2_band_depth
. These contain the ID and a band depth for each mineral group. These groups do not correspond to the.netcdf
file groups, but rather the spectral library groups used to identify the minerals based on which region of the spectra the mineral features correspond to. - The
mineral_metadata
group containing the spectral library entry name, index, record, group, and url for each entry. - The
location
group contains latitude and longitude values at the center of each pixel described by the crosstrack and downtrack dimensions, as well as a geometry lookup table (GLT) described by the ortho_x and ortho_y dimensions. The GLT is an orthorectified image (EPSG:4326) consisting of 2 layers containing downtrack and crosstrack indices. These index positions allow us to quickly project the raw data onto this geographic grid.
To access the .nc
file, you can use the netCDF4
, xarray
libraries, or fuctions from the emit_tools.py
library. Here we will use the emit_xarray
function from this library, which will open and organize the data into an easy to work with xarray.Dataset
object. We can also pass the ortho=True
argument to orthorectify the data at this stage, but we will start just examining the data to get a better understanding.
= et.emit_xarray(fp)
ds_min ds_min
If we look at the mineral index
by printing the first 5 values, we can see that values start with 1. If we look at the minimum values of the mineral IDs we can see these have 0 as a possible value.
print(ds_min.index.data[:5])
print(f'Group_1_minimum:{ds_min.group_1_mineral_id.data.min()} Group_2_minimum: {ds_min.group_2_mineral_id.data.min()}')
The 0 here represents no match. For convenience, let’s make a DataFrame that holds the mineral data, and add that ‘No match’ reference to it:
= pd.DataFrame({x: ds_min[x].values for x in [var for var in ds_min.coords if 'mineral_name' in ds_min[var].dims]})
min_df -1] = {'index': 0, 'mineral_name': 'No_Match', 'record': -1.0, 'url': 'NA', 'group': 1.0, 'library': 'NA', 'spatial_ref': 0}
min_df.loc[= min_df.sort_index().reset_index(drop=True)
min_df min_df
2.1 Orthorectification
The orthorectifation process has already been done for EMIT data. Here we are just using the crosstrack and downtrack indices contained in the GLT to place our spatially raw mineralogy data a into geographic grid with the ortho_x
and ortho_y
dimensions.
= et.ortho_xr(ds_min)
ds_min ds_min
We can see from these outputs that the dimensions are now latitude and longitude.
In this example, we’ll just work with the group_1 mineral data. We can find the minerals present in the scene by finding unique values in the group_1_mineral_id
, but first we will replace fill-values introduced during orthorectification with np.nan
, to omit them from our analysis and improve visualizations.
# Assign fill to np.nan
for var in ds_min.data_vars:
== -9999] = np.nan ds_min[var].data[ds_min[var].data
2.2 Visualize Group 1 Minerals
To visualize minerals present, plot Group 1 Minerals using a categorical color set. You can hover over a colored region to see the zero-based mineral id from the spectral library. Note that these values correspond to the 1-based index value.
'group_1_mineral_id'].hvplot.image(cmap='glasbey', geo=True, tiles='ESRI', alpha=0.8,frame_width=750).opts(title='Group 1 Mineral ID') ds_min[
This figure shows the minerals present in the scene, but doesn’t really quantify how well they matched with the spectral library. For that we can look at the band depth for each mineral. We can build an interactive tool to do this using the panel
and hvplot
libraries. This will take a bit of time to load for each selection.
Because many minerals are scarce, we’ll start by updating the names to include relative fractions
= [np.round(np.sum(ds_min.group_1_mineral_id.data.flatten() == g1min) / np.sum(ds_min.group_1_mineral_id.data.flatten() > 0),2) * 100 for g1min in range(len(min_df))]
g1_min_percent = [str(g1_min_percent[_x]) + ' %: ' + x for (_x, x) in enumerate(min_df.mineral_name.tolist()) if g1_min_percent[_x] > 0 and x != 'No_Match']
g1_dropdown_names = np.array(g1_dropdown_names)[np.argsort([float(x.split(' %:')[0]) for x in g1_dropdown_names])[::-1]].tolist() g1_dropdown_names
# Interactive Panel Control For Mineral Band Depth
= pn.widgets.Select(name='Mineral Name', options = g1_dropdown_names, value = g1_dropdown_names[0])
min_select @pn.depends(min_select)
def min_browse(min_select):
= ds_min['group_1_band_depth'].where(ds_min['group_1_mineral_id'] == min_df['mineral_name'].tolist().index(min_select.split('%: ')[-1]))
mask map = mask.hvplot.image(cmap='viridis', geo=True, tiles='ESRI', alpha=0.8,frame_width=450, clim=(0,np.nanpercentile(mask,98))).opts(title=f'{min_select} Band Depth')
return map
pn.Row(pn.WidgetBox(min_select),min_browse)
2.3 Visualize Group 2 Minerals
We can do the same thing with Group 2 Minerals. Group 2 will show a more diverse set of minerals in this region, including clays and carbonates.
'group_2_mineral_id'].hvplot.image(cmap='glasbey', geo=True, tiles='ESRI', alpha=0.8,frame_width=750).opts(title='Group 2 Mineral ID') ds_min[
= [np.round(np.sum(ds_min.group_2_mineral_id.data.flatten() == g1min) / np.sum(ds_min.group_2_mineral_id.data.flatten() > 0),2) * 100 for g1min in range(len(min_df))]
g2_min_percent = [str(g2_min_percent[_x]) + ' %: ' + x for (_x, x) in enumerate(min_df.mineral_name.tolist()) if g2_min_percent[_x] > 0 and x != 'No_Match']
g2_dropdown_names = np.array(g2_dropdown_names)[np.argsort([float(x.split(' %:')[0]) for x in g2_dropdown_names])[::-1]].tolist() g2_dropdown_names
# Interactive Panel Control For Mineral Band Depth
= pn.widgets.Select(name='Mineral Name', options = g2_dropdown_names, value = g2_dropdown_names[0])
min_select_g2 @pn.depends(min_select_g2)
def min_browse_g2(min_select):
# print(min_select)
= ds_min['group_2_band_depth'].where(ds_min['group_2_mineral_id'] == min_df['mineral_name'].tolist().index(min_select.split('%: ')[-1]))
mask map = mask.hvplot.image(cmap='viridis', geo=True, tiles='ESRI', alpha=0.8,frame_width=450, clim=(0,np.nanpercentile(mask,98))).opts(title=f'{min_select} Band Depth')
return map
pn.Row(pn.WidgetBox(min_select_g2),min_browse_g2)
2.4 Visualizing and filtering with uncertainties
Often just as important as visualizing the core mineral detections is displaying the corresponding uncertainties. The EMIT L2B MIN proudct comes with two different uncertainties (repeated for each group):
- Fit - this is how good of a fit the individual mineral detection was, as estimated by the correlation coefficient of the alignment between the continuum removed library reference and the continuum removed observed spectra (after scaling).
- Band depth uncertainty - this is the propogation of the reflectance uncertainty through the band depth calculation.
Let’s take a look at each of these by loading the relevant datasets:
# Open an Orthorectify the Uncertainty Data
= et.emit_xarray(fp_un)
ds_min_unc = et.ortho_xr(ds_min_unc)
ds_min_unc ds_min_unc
= pn.widgets.Select(name='Mineral Name', options = g1_dropdown_names, value = g1_dropdown_names[0])
min_select_fit_g2 @pn.depends(min_select_fit_g2)
def min_browse_fit_g2(min_select):
= ds_min_unc['group_1_fit'].where(ds_min['group_1_mineral_id'] == min_df['mineral_name'].tolist().index(min_select.split('%: ')[-1]))
mask # The fits are scaled down by a factor of two in the current EMIT L2B products, so scale them back up:
*= 2
mask map = mask.hvplot.image(cmap='viridis', geo=True, tiles='ESRI', alpha=0.8,frame_width=450, clim=(0,np.nanpercentile(mask,98))).opts(title=f'{min_select} Fit')
return map
pn.Row(pn.WidgetBox(min_select_fit_g2),min_browse_fit_g2)
3. Aggregating and Mineral Abundance
The above visualizations walk through the identification of individual library contituents, and visualize band depths. However, the Tetracorder library used by EMIT contains many substrates that are spectrally distinct, but which may be useful to group together for some science applications. The library also contains many mixtures - both aerial and intimate.
To start, the mineral_grouping_matrix from the emit-sds-l2b repository (coppied locally) contains information aggregated from laboratory XRD analyses to attempt to quantify the abundance of different minerals within each constituent. A -1 in the spreadsheet indicates an unknown but non-zero quantity, which in the few cases in the EMIT-10 columns we assume to be 100%. Let’s open that spreadsheet and take a look:
# Open Mineral Groupings .csv
= pd.read_csv('../../data/mineral_grouping_matrix_20230503.csv')
mineral_groupings # The EMIT 10 Minerals are in columns 6 - 17. Columns after 17 are experimental, and we'll drop for this tutorial:
= mineral_groupings.drop([x for _x, x in enumerate(mineral_groupings) if _x >= 17], axis=1)
mineral_groupings
# Retrieve the EMIT 10 Mineral Names from Columns 7-16 (starting with 0) in .csv
= [x for _x, x in enumerate(list(mineral_groupings)) if _x > 6 and _x < 17]
mineral_names # Use EMIT 10 Mineral Names to Subset .csv to only columns with EMIT 10 mineral_names
= np.array(mineral_groupings[mineral_names])
mineral_abundance_ref # Replace Some values in the .csv
= 0
mineral_abundance_ref[np.isnan(mineral_abundance_ref)] == -1] = 1
mineral_abundance_ref[mineral_abundance_ref
mineral_groupings
3.1 Approximating Mineral Spectral Abundance
If we make the assumption that the XRD analysis are accurate, and that band-depth scales linearly with abundance, we can approximate the mineral abundance at the surface. It should be noted that both of these assumptions are fraught - XRD analyses break down for some very small grainsize particles, particularly the Iron Oxides (Goethite and Hematite), and band-depth is heavily influenced by particle grain size, and so the abundance-band depth relationship is not linear. We will expand on these details a bit more later, but for now lets take a look at what happens if we hold both of these as true.
The first step is to run through each mineral that has a non-nan value in the mineral_groupings dataframe, and add up the sum of each of those within the scene. A little matrix multiplication is all we need to do that. Notably, in the emit-sds/emit-sds-l2b, this functionality is already built into the group_aggregator.py script in an efficient manner…but because the calculation is simple we’ll reproduce here for learning purposes.
Below, we step through each mineral one at a time, and multiply the band depth by the XRD value for each constintuent in the scene, storing it in a (y,x,band) numpy array, which we can cast into x-array for visualization down the line.
= np.zeros((ds_min['group_1_band_depth'].shape[0], ds_min['group_1_band_depth'].shape[1], len(mineral_names)))
mineral_abundance for _m, mineral_name in enumerate(mineral_names):
print(f'Calculating {mineral_name}')
for _c in range(mineral_groupings.shape[0]):
if np.isnan(mineral_groupings[mineral_name][_c]) == False:
= mineral_groupings["Group"][_c]
group += (ds_min[f'group_{group}_mineral_id'].values == mineral_groupings['Index'][_c]) * ds_min[f'group_{group}_band_depth'].values
mineral_abundance[...,_m]
= 0
mineral_abundance[np.isnan(mineral_abundance)] all(mineral_abundance == 0, axis=-1),:] = np.nan
mineral_abundance[np.
# Cast as x-array for consistent visualization
= xr.merge([xr.DataArray(mineral_abundance[...,_x],
mineral_abundance_xarray =mineral_names[_x],
name=ds_min['group_1_band_depth'].coords,
coords=ds_min.attrs,)
attrsfor _x in range(len(mineral_names))])
After which we can show them as a widget:
= pn.widgets.Select(name='Mineral Abundance', options = mineral_names, value = mineral_names[0])
min_abun_wid @pn.depends(min_abun_wid)
def min_abun_browse(mineral_name):
map = mineral_abundance_xarray[mineral_name].hvplot.image(cmap='viridis', geo=True, tiles='ESRI', alpha=0.8,frame_width=450,
=(0,np.nanpercentile(mineral_abundance_xarray[mineral_name],98))).opts(\
clim=f'{mineral_name} Spectral Abundance')
title
return map
pn.Row(pn.WidgetBox(min_abun_wid),min_abun_browse)
We can also take a look at the histogram distributions
= np.max([np.nanpercentile(mineral_abundance_xarray[x].values,98) for x in mineral_names])
max_bin_max =mineral_names, cmap='glasbey', bins=50, alpha=0.5, legend='left', bin_range=(0.001, max_bin_max),
mineral_abundance_xarray.hvplot.hist(y='Spectral Abundance', ylabel='Pixel Count', title='',frame_width=650) xlabel
Notably, these abundance fractions do not sum to one. This can either be due to copmlex mixtures that aren’t fully captured, or due to the presence of additional materials that do not directly absorb in the VSWIR - most namely quartz and feldspar, but also including organic soil matter, as well as NPV and PV. To illustrate this, let’s examine the totals.
= xr.DataArray(np.nansum(mineral_abundance,axis=-1),
total_mineral_abundance ='Total Abundance',
name=ds_min['group_1_band_depth'].coords,
coords=ds_min.attrs,)
attrs=['Total Abundance'], cmap='glasbey', bins=50, alpha=0.5, legend='left',
total_mineral_abundance.hvplot.hist(y=(0.001, np.nanpercentile(total_mineral_abundance.values, 99.9)),
bin_range='Spectral Abundance', ylabel='Pixel Count', title='',frame_width=650) xlabel
4. Exporting To Cloud-Optimized Geotiffs
We can select layers from any of the xarray datasets we’ve created and export them to a Cloud-Optimized Geotiff (COG) using the rio.to_raster
function from the rasterio
library. This will allow us to share the data with others or use it in other GIS software.
First create an output folder.
# Create Out filenames and set folder
= '../../data/output/' # may need to change based on your directory structure
out_folder # Create out_folder if it does not exist
if not os.path.exists(out_folder):
os.makedirs(out_folder)
4.1 Mineral ID
# Set output Filename
= f'{ds_min.granule_id}_group_1_mineral_id.tif'
out_name # Select Group to Output
= ds_min['group_1_mineral_id']
dat_out # Replace nan with a fill value
= np.nan_to_num(dat_out.data, nan=-9999)
dat_out.data # Change datatype for integers/categorical
= dat_out.data.astype(int)
dat_out.data # Encode nodata value
-9999, encoded=True, inplace=True)
dat_out.rio.write_nodata(# Write data to COG
=f'{out_folder}{out_name}', driver='COG') dat_out.rio.to_raster(raster_path
4.2 Band Depth
# Set output Filename
= f'{ds_min.granule_id}_group_1_band_depth.tif'
out_name # Select Group to Output
= ds_min['group_1_band_depth']
dat_out # Replace nan with a fill value
= np.nan_to_num(dat_out.data, nan=-9999)
dat_out.data # Encode nodata value
-9999, encoded=True, inplace=True)
dat_out.rio.write_nodata(# Write data to COG
=f'{out_folder}{out_name}', driver='COG') dat_out.rio.to_raster(raster_path
4.3 Abundance
# Set output Filename
= f'{mineral_abundance_xarray.granule_id}_Calcite.tif'
out_name # Select Layer
= mineral_abundance_xarray['Calcite']
dat_out # Replace nan with a fill value
= np.nan_to_num(dat_out.data, nan=-9999)
dat_out.data # Encode nodata value
-9999, encoded=True, inplace=True)
dat_out.rio.write_nodata(# Write data to COG
=f'{out_folder}{out_name}', driver='COG') dat_out.rio.to_raster(raster_path
Contact Info:
Email: LPDAAC@usgs.gov
Voice: +1-866-573-3222
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹
Website: https://lpdaac.usgs.gov/
Date last modified: 07-07-2024
¹Work performed under USGS contract 140G0121D0001 for NASA contract NNG14HH33I.