# Import required libraries
import os
import sys
import folium
import earthaccess
import warnings
import folium.plugins
import pandas as pd
import geopandas as gpd
import math
from branca.element import Figure
from IPython.display import display
from shapely import geometry
from skimage import io
from datetime import timedelta
from shapely.geometry.polygon import orient
from matplotlib import pyplot as plt
import matplotlib.cm as cm
sys.path.append('../modules/')
from tutorial_utils import list_metadata_fields, results_to_geopandas, convert_bounds
This notebook is from EMIT-Data-Resources
Source: Finding EMIT L2B Data
Imported on: 2025-01-22
Finding EMIT L2B Data
Summary
This notebook demonstrates how to search for Earth Mineral dust source Investigation (EMIT) L2B Estimated Mineral Identification and Band Depth and Uncertainty (EMITL2BMIN) data. This data is available in the NASA Earthdata Cloud, to download or stream, and can be found by querying the NASA Earthdata Common Metadata Repository (CMR) API or by using the Earthdata Search interface. In this example we will use the earthaccess Python library, which abstracts the NASA CMR API, provides an easy way to search and review results programmatically, and manages authentication credentials to simplify streaming or downloading data.
Background
The EMIT instrument is an imaging spectrometer that measures light in visible and infrared wavelengths. These measurements display unique spectral signatures that correspond to the composition on the Earth’s surface. The EMIT mission focuses specifically on mapping the composition of minerals to better understand the effects of mineral dust throughout the Earth system and human populations now and in the future. More details about EMIT and its associated products can be found in the README.md and on the EMIT website.
The EMITL2BMIN data product provides estimated mineral identification, band depths and uncertainty in a spatially raw, non-orthocorrected format. Two spectral groups, which correspond to different regions of the spectra, are identified independently and often co-occur; both are used to identify minerals. These estimates are generated using the Tetracorder system (code) and are based on EMITL2ARFL reflectance values. The product also includes an EMIT_L2B_MINUNCERT file, which provides band depth uncertainty estimates calculated using surface Reflectance Uncertainty values from the EMITL2ARFL data product. The band depth uncertainties are presented as standard deviations, and the fit score for each mineral identification is provided as the coefficient of determination (r²) of the match between the continuum normalized library reference and the continuum normalized observed spectrum. Associated metadata indicates the name and reference information for each identified mineral, along with additional information about aggregating minerals into different categories. The code used for product generation is available in the emit-sds-l2b repository.
Disclaimer
The EMIT_L2B_MIN product is generated to support the EMIT mission objectives of constraining the sign of dust related radiative forcing. Ten mineral types are the core focus of this work: Calcite, Chlorite, Dolomite, Goethite, Gypsum, Hematite, Illite+Muscovite, Kaolinite, Montmorillonite, and Vermiculite. A future product will aggregate these results for use in Earth System Models. Additional minerals are included in this product for transparency but were not the focus of this product. Further validation is required to use these additional mineral maps, particularly in the case of resource exploration. Similarly, the separation of minerals with similar spectral features, such as fine-grained goethite and hematite, is an area of active research. The results presented here are an initial offering, but the precise categorization is likely to evolve over time, and the limits of what can and cannot be separated on the global scale are still being explored. The user is encouraged to read the Algorithm Theoretical Basis Document (ATBD) for more details.
Requirements
- NASA Earthdata Account. This is free and can be set up quickly.
- No Python setup requirements if connected to the workshop cloud instance!
- Local Only: Set up Python Environment - See setup_instructions.md in the /setup/ folder to set up a local compatible Python environment
Learning Objectives
- How to get information about data collections using earthaccess
- How to query for EMIT L2B Mineralogy data using spatiotemporal parameters
- How to create a geodataframe from the search results
- How to further filter and save results URLs to a list
Tutorial Outline
- Setup
- Searching for EMIT L2B Mineralogy Data
- Advanced Filtering
- Visualizing Data
- Creating a List of Results and Asset URLs
- Streaming or Downloading Data
1. Setup
Import the required Python libraries.
1.2 NASA Earthdata Login Credentials
To download or stream NASA data you will need an Earthdata account; you can create one here. We will use the login function from the earthaccess library for authentication before downloading at the end of the notebook. This function can also create a local .netrc file if one doesn’t exist, or add your login info to an existing .netrc file. If no Earthdata Login credentials are found in the .netrc, you’ll be prompted for them. This step is not necessary to conduct searches but is needed to download or stream data.
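Before authenticating, it can be useful to check whether a .netrc entry for Earthdata Login already exists. The following is a minimal sketch using only Python's standard netrc module. The helper name has_earthdata_entry is hypothetical (it is not part of earthaccess), and the sketch assumes the Earthdata Login host is urs.earthdata.nasa.gov.

```python
import netrc
from pathlib import Path

# Assumed Earthdata Login host name
EDL_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_entry(netrc_path=None):
    """Return True if the .netrc file holds credentials for Earthdata Login.

    Hypothetical helper for illustration; earthaccess handles this internally.
    """
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.exists():
        return False
    try:
        entries = netrc.netrc(str(path))
    except netrc.NetrcParseError:
        # A malformed file is treated the same as a missing entry
        return False
    return entries.authenticators(EDL_HOST) is not None
```

If the check returns False, the login function is expected to prompt for credentials, as described above.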
2. Searching for EMIT L2B Mineralogy Data
To find data we will use the earthaccess Python library. earthaccess searches NASA's Common Metadata Repository (CMR) API, a metadata system that catalogs Earth Science data and associated metadata records. The results can then be used to download granules or generate lists of granule search result URLs.
Using earthaccess we can search based on the attributes of a granule, which can be thought of as a spatiotemporal scene from an instrument containing multiple assets (e.g., Reflectance, Reflectance Uncertainty, and Masks for the EMIT L2A Reflectance collection). When conducting a search we can provide a product (in this case the mineralogy product), a date-time range, and spatial constraints. This process can also be used with other EMIT products and other NASA collections.
2.1 Querying for Datasets
Our first step in searching for data is determining which collection (e.g. EMIT L2A Estimated Surface Reflectance Uncertainty and Masks, EMIT L2B Estimated Mineral Identification and Band Depth and Uncertainty) we want to search for. The best way to do this is using the collection short_name (e.g. EMITL2ARFL, EMITL2BMIN) or concept-id. In rare cases, the short_name of two collections can be the same, so we will use the concept-id, which is a unique identifier for each collection. To find the concept-id we can search using some keywords.
# EMIT Collection Query
emit_collection_query = earthaccess.collection_query().keyword('EMIT L2B Mineral')
emit_collection_query.fields(['ShortName','EntryTitle','Version']).get()
[{
"meta": {
"concept-id": "C2408034484-LPCLOUD",
"granule-count": 87550,
"provider-id": "LPCLOUD"
},
"umm": {
"ShortName": "EMITL2BMIN",
"EntryTitle": "EMIT L2B Estimated Mineral Identification and Band Depth and Uncertainty 60 m V001",
"Version": "001"
}
},
{
"meta": {
"concept-id": "C2748097305-LPCLOUD",
"granule-count": 1748,
"provider-id": "LPCLOUD"
},
"umm": {
"ShortName": "EMITL2BCH4ENH",
"EntryTitle": "EMIT L2B Methane Enhancement Data 60 m V001",
"Version": "001"
}
},
{
"meta": {
"concept-id": "C2748088093-LPCLOUD",
"granule-count": 1285,
"provider-id": "LPCLOUD"
},
"umm": {
"ShortName": "EMITL2BCH4PLM",
"EntryTitle": "EMIT L2B Estimated Methane Plume Complexes 60 m V001",
"Version": "001"
}
},
{
"meta": {
"concept-id": "C2872578364-LPCLOUD",
"granule-count": 402,
"provider-id": "LPCLOUD"
},
"umm": {
"ShortName": "EMITL2BCO2ENH",
"EntryTitle": "EMIT L2B Carbon Dioxide Enhancement Data 60 m V001",
"Version": "001"
}
},
{
"meta": {
"concept-id": "C2867824144-LPCLOUD",
"granule-count": 173,
"provider-id": "LPCLOUD"
},
"umm": {
"ShortName": "EMITL2BCO2PLM",
"EntryTitle": "EMIT L2B Estimated Carbon Dioxide Plume Complexes 60 m V001",
"Version": "001"
}
},
{
"meta": {
"concept-id": "C2408752948-LPCLOUD",
"granule-count": 1,
"provider-id": "LPCLOUD"
},
"umm": {
"ShortName": "EMITL3ASA",
"EntryTitle": "EMIT L3 Aggregated Mineral Spectral Abundance and Uncertainty 0.5 Deg V001",
"Version": "001"
}
}]
From this list of results we can see that the concept-id for the desired mineral product is C2408034484-LPCLOUD. We can use this to define one of our search arguments.
concept_id = 'C2408034484-LPCLOUD'
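Rather than copying the concept-id by hand, it could also be looked up by short name in the query output. This is a minimal sketch that assumes the output has the same "meta"/"umm" layout printed above; the helper name concept_id_for is hypothetical, and the two entries are trimmed copies from that output.

```python
def concept_id_for(collections, short_name):
    """Return the concept-id of the first collection whose ShortName matches, else None."""
    for item in collections:
        if item["umm"].get("ShortName") == short_name:
            return item["meta"]["concept-id"]
    return None


# Two entries trimmed from the collection query output
collections = [
    {"meta": {"concept-id": "C2408034484-LPCLOUD"}, "umm": {"ShortName": "EMITL2BMIN"}},
    {"meta": {"concept-id": "C2748097305-LPCLOUD"}, "umm": {"ShortName": "EMITL2BCH4ENH"}},
]

concept_id = concept_id_for(collections, "EMITL2BMIN")
print(concept_id)  # C2408034484-LPCLOUD
```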
2.2 Define Temporal Range
For our date range, we’ll look at all EMIT data collected over 2023. The date_range can be specified as a pair of dates: a start and an end, where the end is exclusive (up to, but not including).
date_range = ('2023-01-01','2024-01-01')
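The same tuple could be generated for any calendar year. This is a small sketch under the convention stated above that the end date is not included; the helper name year_range is hypothetical.

```python
from datetime import date


def year_range(year):
    """Return (start, end) ISO date strings spanning one calendar year, end not included."""
    return (date(year, 1, 1).isoformat(), date(year + 1, 1, 1).isoformat())


date_range = year_range(2023)
print(date_range)  # ('2023-01-01', '2024-01-01')
```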
2.3 Define Spatial Region of Interest
For this example, our spatial region of interest will be the area around Cuprite, NV, a location where several previous mineralogy studies have been conducted. We can define this region using a rectangular polygon. If you want to make a polygon for a different region, you can use a tool like geojson.io.
Open the geojson as a geodataframe, and check the coordinate reference system (CRS) of the data.
roi_gdf = gpd.read_file('../../data/cuprite_bbox.geojson')
roi_gdf.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
roi_gdf
| | geometry |
|---|---|
| 0 | POLYGON ((-117.24309 37.59129, -117.24309 37.5... |
We can see this geodataframe consists of a single polygon which we want to include in our search, but the geometry is the only information contained in the file, so let’s add a column for the site name, and set the value to “Cuprite”.
roi_gdf['Name'] = 'Cuprite'
roi_gdf
| | geometry | Name |
|---|---|---|
| 0 | POLYGON ((-117.24309 37.59129, -117.24309 37.5... | Cuprite |
We can visualize the ROI using the folium library and the explore function from geopandas. First, we’ll create a helper function. Then we will create a new map, using Google Maps tiles as our basemap, and add the polygon to the map. We’ll also use our convert_bounds helper function to limit the map view to roughly the extent of the polygon.
# Function to convert a bounding box for use in leaflet notation
def convert_bounds(bbox, invert_y=False):
    """
    Helper method for changing bounding box representation to leaflet notation
    ``(lon1, lat1, lon2, lat2) -> ((lat1, lon1), (lat2, lon2))``
    """
    x1, y1, x2, y2 = bbox
    if invert_y:
        y1, y2 = y2, y1
    return ((y1, x1), (y2, x2))
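To make the coordinate swap concrete, here is a short usage sketch. The helper is repeated so the snippet runs on its own, and the bounding box is assumed to be the (minx, miny, maxx, maxy) extent of the Cuprite polygon, built from the corner coordinates printed later in this notebook.

```python
def convert_bounds(bbox, invert_y=False):
    """(lon1, lat1, lon2, lat2) -> ((lat1, lon1), (lat2, lon2))"""
    x1, y1, x2, y2 = bbox
    if invert_y:
        y1, y2 = y2, y1
    return ((y1, x1), (y2, x2))


# (minx, miny, maxx, maxy) of the Cuprite ROI, in degrees
cuprite_bounds = (-117.24309240198033, 37.50102626452812,
                  -117.14631968357332, 37.59129385913785)

# Leaflet expects (lat, lon) corner pairs: south-west first, then north-east
leaflet_bounds = convert_bounds(cuprite_bounds)
print(leaflet_bounds)
# ((37.50102626452812, -117.24309240198033), (37.59129385913785, -117.14631968357332))
```

With invert_y=True the two latitudes are swapped, which gives the north-west and south-east corners instead.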
fig = Figure(width="750px", height="375px")
map1 = folium.Map(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', attr='Google')
fig.add_child(map1)
# Add roi geodataframe
roi_gdf.explore(
    "Name",
    popup=True,
    categorical=True,
    cmap='Set3',
    style_kwds=dict(opacity=0.7, fillOpacity=0.4),
    name="Regions of Interest",
    m=map1
)
map1.add_child(folium.LayerControl())
map1.fit_bounds(bounds=convert_bounds(roi_gdf.unary_union.bounds))
display(fig)
In our earthaccess search, we will use the polygon argument to find where this geometry intersects with the footprint of the EMIT scenes. To do this, we need to create a list of exterior polygon vertices in counter-clockwise order to submit in our search.
# Use orient to place vertices in counter-clockwise order
roi = orient(roi_gdf.geometry[0], sign=1.0)
# Put the exterior coordinates in a list
roi = list(roi.exterior.coords)
roi
[(-117.24309240198033, 37.59129385913785),
(-117.24309240198033, 37.50102626452812),
(-117.14631968357332, 37.50102626452812),
(-117.14631968357332, 37.59129385913785),
(-117.24309240198033, 37.59129385913785)]
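One way to confirm that a ring is counter-clockwise is the shoelace formula, whose signed area is positive for counter-clockwise vertex order. The sketch below is an independent check in plain Python, not how shapely's orient is implemented; the function name signed_area is illustrative.

```python
def signed_area(ring):
    """Shoelace formula for a closed ring: positive if counter-clockwise, negative if clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


# Exterior coordinates printed above (first and last vertex are identical)
roi = [(-117.24309240198033, 37.59129385913785),
       (-117.24309240198033, 37.50102626452812),
       (-117.14631968357332, 37.50102626452812),
       (-117.14631968357332, 37.59129385913785),
       (-117.24309240198033, 37.59129385913785)]

is_ccw = signed_area(roi) > 0
print(is_ccw)  # True
```

Reversing the list (roi[::-1]) flips the sign, which is the clockwise case that orient with sign=1.0 is meant to correct.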
After we have all of the pieces (spatial extent, temporal range, and concept-id), we can perform a search. Note that we are limiting our search to 500 results using the count argument; this has no effect here because only 9 granules match.
results = earthaccess.search_data(
    concept_id=concept_id,
    polygon=roi,
    temporal=date_range,
    count=500
)
Granules found: 9
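For context on what the library abstracts, the sketch below builds a comparable granule search URL for the CMR API without sending it. This is an assumption about how such a request could be formed, not a description of what earthaccess does internally. The function name build_cmr_query is hypothetical, and the parameter names (collection_concept_id, polygon, temporal, page_size) are taken from the public CMR search API as best understood here.

```python
from urllib.parse import urlencode

# Public CMR granule search endpoint (JSON response format)
CMR_GRANULE_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_cmr_query(concept_id, polygon, date_range, page_size=500):
    """Build (but do not send) a CMR granule search URL."""
    # CMR takes the polygon as a flat lon,lat,lon,lat,... string for a closed ring
    polygon_param = ",".join(f"{lon},{lat}" for lon, lat in polygon)
    start, end = date_range
    params = {
        "collection_concept_id": concept_id,
        "polygon": polygon_param,
        "temporal": f"{start}T00:00:00Z,{end}T00:00:00Z",
        "page_size": page_size,
    }
    return f"{CMR_GRANULE_URL}?{urlencode(params)}"


roi = [(-117.24309240198033, 37.59129385913785),
       (-117.24309240198033, 37.50102626452812),
       (-117.14631968357332, 37.50102626452812),
       (-117.14631968357332, 37.59129385913785),
       (-117.24309240198033, 37.59129385913785)]

url = build_cmr_query('C2408034484-LPCLOUD', roi, ('2023-01-01', '2024-01-01'))
print(url)
```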
Our search returned 9 results, which we can convert to a geodataframe for further filtering and analysis.
results
[Collection: {'ShortName': 'EMITL2BMIN', 'Version': '001'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -116.86429595947266, 'Latitude': 37.96258544921875}, {'Longitude': -117.72748565673828, 'Latitude': 37.362998962402344}, {'Longitude': -117.27104949951172, 'Latitude': 36.70587921142578}, {'Longitude': -116.4078598022461, 'Latitude': 37.30546569824219}, {'Longitude': -116.86429595947266, 'Latitude': 37.96258544921875}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2023-01-29T21:14:07Z', 'EndingDateTime': '2023-01-29T21:14:19Z'}}
Size(MB): 75.91049480438232
Data: ['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230129T211407_2302914_008/EMIT_L2B_MIN_001_20230129T211407_2302914_008.nc', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230129T211407_2302914_008/EMIT_L2B_MINUNCERT_001_20230129T211407_2302914_008.nc'],
Collection: {'ShortName': 'EMITL2BMIN', 'Version': '001'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -117.79698181152344, 'Latitude': 38.333797454833984}, {'Longitude': -118.24942016601562, 'Latitude': 37.6685905456543}, {'Longitude': -117.468505859375, 'Latitude': 37.1374397277832}, {'Longitude': -117.01606750488281, 'Latitude': 37.80264663696289}, {'Longitude': -117.79698181152344, 'Latitude': 38.333797454833984}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2023-04-27T17:32:57Z', 'EndingDateTime': '2023-04-27T17:33:09Z'}}
Size(MB): 101.56163311004639
Data: ['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173257_2311711_009/EMIT_L2B_MIN_001_20230427T173257_2311711_009.nc', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173257_2311711_009/EMIT_L2B_MINUNCERT_001_20230427T173257_2311711_009.nc'],
Collection: {'ShortName': 'EMITL2BMIN', 'Version': '001'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -117.09042358398438, 'Latitude': 37.85598373413086}, {'Longitude': -117.54974365234375, 'Latitude': 37.194339752197266}, {'Longitude': -116.77647399902344, 'Latitude': 36.65752029418945}, {'Longitude': -116.31715393066406, 'Latitude': 37.31916427612305}, {'Longitude': -117.09042358398438, 'Latitude': 37.85598373413086}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2023-04-27T17:33:09Z', 'EndingDateTime': '2023-04-27T17:33:21Z'}}
Size(MB): 100.93134117126465
Data: ['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173309_2311711_010/EMIT_L2B_MIN_001_20230427T173309_2311711_010.nc', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173309_2311711_010/EMIT_L2B_MINUNCERT_001_20230427T173309_2311711_010.nc'],
Collection: {'ShortName': 'EMITL2BMIN', 'Version': '001'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -117.15681457519531, 'Latitude': 38.27824783325195}, {'Longitude': -118.03521728515625, 'Latitude': 37.68120193481445}, {'Longitude': -117.58502197265625, 'Latitude': 37.018863677978516}, {'Longitude': -116.70661926269531, 'Latitude': 37.615909576416016}, {'Longitude': -117.15681457519531, 'Latitude': 38.27824783325195}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2023-08-04T19:16:50Z', 'EndingDateTime': '2023-08-04T19:17:02Z'}}
Size(MB): 101.90474605560303
Data: ['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230804T191650_2321613_007/EMIT_L2B_MIN_001_20230804T191650_2321613_007.nc', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230804T191650_2321613_007/EMIT_L2B_MINUNCERT_001_20230804T191650_2321613_007.nc'],
Collection: {'ShortName': 'EMITL2BMIN', 'Version': '001'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -116.7279052734375, 'Latitude': 37.98891830444336}, {'Longitude': -117.59428405761719, 'Latitude': 37.39017105102539}, {'Longitude': -117.13900756835938, 'Latitude': 36.73139572143555}, {'Longitude': -116.27262878417969, 'Latitude': 37.330142974853516}, {'Longitude': -116.7279052734375, 'Latitude': 37.98891830444336}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2023-08-08T17:39:53Z', 'EndingDateTime': '2023-08-08T17:40:05Z'}}
Size(MB): 102.3443374633789
Data: ['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230808T173953_2322012_011/EMIT_L2B_MIN_001_20230808T173953_2322012_011.nc', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230808T173953_2322012_011/EMIT_L2B_MINUNCERT_001_20230808T173953_2322012_011.nc'],
Collection: {'ShortName': 'EMITL2BMIN', 'Version': '001'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -117.73664093017578, 'Latitude': 38.3143196105957}, {'Longitude': -118.18502044677734, 'Latitude': 37.653934478759766}, {'Longitude': -117.4017105102539, 'Latitude': 37.12208938598633}, {'Longitude': -116.95333099365234, 'Latitude': 37.782474517822266}, {'Longitude': -117.73664093017578, 'Latitude': 38.3143196105957}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2023-10-23T18:39:05Z', 'EndingDateTime': '2023-10-23T18:39:17Z'}}
Size(MB): 101.62794208526611
Data: ['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20231023T183905_2329612_009/EMIT_L2B_MIN_001_20231023T183905_2329612_009.nc', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20231023T183905_2329612_009/EMIT_L2B_MINUNCERT_001_20231023T183905_2329612_009.nc'],
Collection: {'ShortName': 'EMITL2BMIN', 'Version': '001'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -117.02520751953125, 'Latitude': 37.83479690551758}, {'Longitude': -117.48287963867188, 'Latitude': 37.17849349975586}, {'Longitude': -116.71324157714844, 'Latitude': 36.641788482666016}, {'Longitude': -116.25556945800781, 'Latitude': 37.298091888427734}, {'Longitude': -117.02520751953125, 'Latitude': 37.83479690551758}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2023-10-23T18:39:17Z', 'EndingDateTime': '2023-10-23T18:39:29Z'}}
Size(MB): 100.52127742767334
Data: ['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20231023T183917_2329612_010/EMIT_L2B_MIN_001_20231023T183917_2329612_010.nc', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20231023T183917_2329612_010/EMIT_L2B_MINUNCERT_001_20231023T183917_2329612_010.nc'],
Collection: {'ShortName': 'EMITL2BMIN', 'Version': '001'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -117.53460693359375, 'Latitude': 38.284488677978516}, {'Longitude': -118.40948486328125, 'Latitude': 37.68711471557617}, {'Longitude': -117.95744323730469, 'Latitude': 37.025081634521484}, {'Longitude': -117.08256530761719, 'Latitude': 37.62245559692383}, {'Longitude': -117.53460693359375, 'Latitude': 38.284488677978516}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2023-12-01T20:10:58Z', 'EndingDateTime': '2023-12-01T20:11:10Z'}}
Size(MB): 101.65311431884766
Data: ['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20231201T201058_2333513_006/EMIT_L2B_MIN_001_20231201T201058_2333513_006.nc', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20231201T201058_2333513_006/EMIT_L2B_MINUNCERT_001_20231201T201058_2333513_006.nc'],
Collection: {'ShortName': 'EMITL2BMIN', 'Version': '001'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -116.26719665527344, 'Latitude': 39.1148796081543}, {'Longitude': -117.70616149902344, 'Latitude': 38.172176361083984}, {'Longitude': -117.26689147949219, 'Latitude': 37.50168228149414}, {'Longitude': -115.82792663574219, 'Latitude': 38.44438552856445}, {'Longitude': -116.26719665527344, 'Latitude': 39.1148796081543}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2023-12-01T20:11:10Z', 'EndingDateTime': '2023-12-01T20:11:31Z'}}
Size(MB): 170.77763843536377
Data: ['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20231201T201110_2333513_007/EMIT_L2B_MIN_001_20231201T201110_2333513_007.nc', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20231201T201110_2333513_007/EMIT_L2B_MINUNCERT_001_20231201T201110_2333513_007.nc']]
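The granule names in these results encode the acquisition time, which matches the BeginningDateTime of each entry. The sketch below splits a name into its parts. The field layout is inferred from the names shown above, and the labels "orbit" and "scene" for the last two fields are assumptions; parse_granule_name is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_granule_name(name):
    """Split an EMIT granule name into labeled parts.

    Assumed layout: EMIT_<level>_<product>_<version>_<timestamp>_<orbit>_<scene>
    """
    mission, level, product, version, stamp, orbit, scene = name.split("_")
    acquired = datetime.strptime(stamp, "%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc)
    return {
        "mission": mission,
        "level": level,
        "product": product,
        "version": version,
        "acquired": acquired,
        "orbit": orbit,   # assumed meaning of the sixth field
        "scene": scene,   # assumed meaning of the seventh field
    }


info = parse_granule_name("EMIT_L2B_MIN_001_20230129T211407_2302914_008")
print(info["acquired"].isoformat())  # 2023-01-29T21:14:07+00:00
```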
3. Advanced Filtering
Now that we have some results, we will place them into a geodataframe that includes links to browse imagery and the files, so we can do some more advanced filtering of the data.
List the metadata fields available in the search results.
list_metadata_fields(results)
['size',
'concept-type',
'concept-id',
'revision-id',
'native-id',
'collection-concept-id',
'provider-id',
'format',
'revision-date',
'_beginning_date_time',
'_ending_date_time',
'_granule_ur',
'_additional_attributes',
'_gpolygons',
'_provider_dates',
'_short_name',
'_version',
'_pgename',
'_pgeversion',
'_related_urls',
'_cloud_cover',
'_day_night_flag',
'_archive_and_distribution_information',
'_production_date_time',
'_platforms',
'_url',
'_name',
'_version']
Some datasets have unique metadata that we can choose to include when we use our results_to_geopandas function from the tutorial_utils.py module to create a geodataframe. Below is a list of default fields. We can also include additional fields by passing them as a list to the fields argument.
default_fields = [
    "size",
    "concept_id",
    "dataset-id",
    "native-id",
    "provider-id",
    "_related_urls",
    "_single_date_time",
    "_beginning_date_time",
    "_ending_date_time",
    "geometry",
]
For example, _cloud_cover is not always available. We can add it to the default fields of this function by adding it to a fields argument in list form.
results_gdf = results_to_geopandas(results, fields=['_cloud_cover'])
Add an index column so we can reference it using the explore function from geopandas.
# Specify index so we can reference it with gdf.explore()
results_gdf['index'] = results_gdf.index
results_gdf
| | size | native-id | provider-id | _beginning_date_time | _ending_date_time | _related_urls | _cloud_cover | geometry | index |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.910495 | EMIT_L2B_MIN_001_20230129T211407_2302914_008 | LPCLOUD | 2023-01-29T21:14:07Z | 2023-01-29T21:14:19Z | [{'URL': 'https://data.lpdaac.earthdatacloud.n... | 99 | POLYGON ((-116.86430 37.96259, -117.72749 37.3... | 0 |
| 1 | 101.561633 | EMIT_L2B_MIN_001_20230427T173257_2311711_009 | LPCLOUD | 2023-04-27T17:32:57Z | 2023-04-27T17:33:09Z | [{'URL': 'https://data.lpdaac.earthdatacloud.n... | 21 | POLYGON ((-117.79698 38.33380, -118.24942 37.6... | 1 |
| 2 | 100.931341 | EMIT_L2B_MIN_001_20230427T173309_2311711_010 | LPCLOUD | 2023-04-27T17:33:09Z | 2023-04-27T17:33:21Z | [{'URL': 'https://data.lpdaac.earthdatacloud.n... | 8 | POLYGON ((-117.09042 37.85598, -117.54974 37.1... | 2 |
| 3 | 101.904746 | EMIT_L2B_MIN_001_20230804T191650_2321613_007 | LPCLOUD | 2023-08-04T19:16:50Z | 2023-08-04T19:17:02Z | [{'URL': 'https://data.lpdaac.earthdatacloud.n... | 4 | POLYGON ((-117.15681 38.27825, -118.03522 37.6... | 3 |
| 4 | 102.344337 | EMIT_L2B_MIN_001_20230808T173953_2322012_011 | LPCLOUD | 2023-08-08T17:39:53Z | 2023-08-08T17:40:05Z | [{'URL': 'https://data.lpdaac.earthdatacloud.n... | 6 | POLYGON ((-116.72791 37.98892, -117.59428 37.3... | 4 |
| 5 | 101.627942 | EMIT_L2B_MIN_001_20231023T183905_2329612_009 | LPCLOUD | 2023-10-23T18:39:05Z | 2023-10-23T18:39:17Z | [{'URL': 'https://data.lpdaac.earthdatacloud.n... | 74 | POLYGON ((-117.73664 38.31432, -118.18502 37.6... | 5 |
| 6 | 100.521277 | EMIT_L2B_MIN_001_20231023T183917_2329612_010 | LPCLOUD | 2023-10-23T18:39:17Z | 2023-10-23T18:39:29Z | [{'URL': 'https://data.lpdaac.earthdatacloud.n... | 32 | POLYGON ((-117.02521 37.83480, -117.48288 37.1... | 6 |
| 7 | 101.653114 | EMIT_L2B_MIN_001_20231201T201058_2333513_006 | LPCLOUD | 2023-12-01T20:10:58Z | 2023-12-01T20:11:10Z | [{'URL': 'https://data.lpdaac.earthdatacloud.n... | 65 | POLYGON ((-117.53461 38.28449, -118.40948 37.6... | 7 |
| 8 | 170.777638 | EMIT_L2B_MIN_001_20231201T201110_2333513_007 | LPCLOUD | 2023-12-01T20:11:10Z | 2023-12-01T20:11:31Z | [{'URL': 'https://data.lpdaac.earthdatacloud.n... | 53 | POLYGON ((-116.26720 39.11488, -117.70616 38.1... | 8 |
Filter the results geodataframe by cloud cover. We’ll use a cloud cover of 10% as our threshold.
# Filter Results
results_gdf = results_gdf[results_gdf['_cloud_cover'] < 10]
results_gdf.reset_index(drop=True, inplace=True)
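To see which granules pass the 10% threshold without running the notebook, the same filter can be expressed on plain Python records. The values below are copied from the results table above; the variable names are illustrative, and the enumerate call stands in for the reset index.

```python
# (native-id, cloud cover %) pairs copied from the results table
granules = [
    ("EMIT_L2B_MIN_001_20230129T211407_2302914_008", 99),
    ("EMIT_L2B_MIN_001_20230427T173257_2311711_009", 21),
    ("EMIT_L2B_MIN_001_20230427T173309_2311711_010", 8),
    ("EMIT_L2B_MIN_001_20230804T191650_2321613_007", 4),
    ("EMIT_L2B_MIN_001_20230808T173953_2322012_011", 6),
    ("EMIT_L2B_MIN_001_20231023T183905_2329612_009", 74),
    ("EMIT_L2B_MIN_001_20231023T183917_2329612_010", 32),
    ("EMIT_L2B_MIN_001_20231201T201058_2333513_006", 65),
    ("EMIT_L2B_MIN_001_20231201T201110_2333513_007", 53),
]

MAX_CLOUD = 10  # percent; strict "less than" comparison, as in the filter above

kept = [name for name, cloud in granules if cloud < MAX_CLOUD]

# After the index is reset, the surviving rows are renumbered from 0
for new_index, name in enumerate(kept):
    print(new_index, name)
```

Three granules remain, which is why the URL lists later in the notebook each contain three entries.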
Visualize the filtered results by iterating over the rows of the geodataframe and adding each geometry to the map. We do this instead of using the explore function so that each result is added as a separate layer, allowing us to add or remove them using the LayerControl widget.
# Set up Figure and Basemap tiles
fig = Figure(width="1080px", height="540")
map1 = folium.Map(tiles=None)
folium.TileLayer(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', name='Google Satellite', attr='Google', overlay=True).add_to(map1)
folium.TileLayer(tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}.png',
                 name='ESRI World Imagery',
                 attr='Tiles © Esri — Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
                 overlay=True).add_to(map1)

fig.add_child(map1)
# Create a color map for the results
cmap = cm.get_cmap('Set3')
n = len(results_gdf['native-id'].unique())
colors = [cmap(i) for i in range(n)]
colors = [cm.colors.rgb2hex(color) for color in colors]

# Add Search Results by Row
for index, row in results_gdf.iterrows():
    color = colors[index % len(colors)]
    folium.GeoJson(row.geometry, name=row['native-id'], style_function=lambda feature, color=color: {'color': color, 'fillColor': color}).add_to(map1)

folium.GeoJson(roi_gdf,
               name='Cuprite_ROI',
               ).add_to(map1)
# Zoom to Data
map1.fit_bounds(bounds=convert_bounds(results_gdf.unary_union.bounds))
# Add Layer controls
map1.add_child(folium.LayerControl(collapsed=False))
display(fig)
C:\Users\ebolch\AppData\Local\Temp\1\ipykernel_16232\2570979738.py:13: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed two minor releases later. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap(obj)`` instead.
cmap = cm.get_cmap('Set3')
View the related urls for the first result. We can see that there are multiple assets available for each result, including the mineralogy data, uncertainty data, and browse images, as well as multiple ways to access the data, https or s3 links.
results_gdf._related_urls[0]
[{'URL': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173309_2311711_010/EMIT_L2B_MIN_001_20230427T173309_2311711_010.nc',
'Description': 'Download EMIT_L2B_MIN_001_20230427T173309_2311711_010.nc',
'Type': 'GET DATA'},
{'URL': 's3://lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173309_2311711_010/EMIT_L2B_MIN_001_20230427T173309_2311711_010.nc',
'Description': 'This link provides direct download access via S3 to the granule',
'Type': 'GET DATA VIA DIRECT ACCESS'},
{'URL': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173309_2311711_010/EMIT_L2B_MINUNCERT_001_20230427T173309_2311711_010.nc',
'Description': 'Download EMIT_L2B_MINUNCERT_001_20230427T173309_2311711_010.nc',
'Type': 'GET DATA'},
{'URL': 's3://lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173309_2311711_010/EMIT_L2B_MINUNCERT_001_20230427T173309_2311711_010.nc',
'Description': 'This link provides direct download access via S3 to the granule',
'Type': 'GET DATA VIA DIRECT ACCESS'},
{'URL': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173309_2311711_010/EMIT_L2B_MIN_001_20230427T173309_2311711_010.png',
'Description': 'Download EMIT_L2B_MIN_001_20230427T173309_2311711_010.png',
'Type': 'GET RELATED VISUALIZATION'},
{'URL': 's3://lp-prod-public/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173309_2311711_010/EMIT_L2B_MIN_001_20230427T173309_2311711_010.png',
'Description': 'This link provides direct download access via S3 to the granule',
'Type': 'GET RELATED VISUALIZATION'}]
We can use a function to return an asset URL for a given result. This function searches the list in the _related_urls column and returns the URL string of the first entry whose type and file name match the requested asset.
def get_asset_url(row, asset, key='Type', value='GET DATA'):
    """
    Retrieve a url from the list of dictionaries for a row in the _related_urls column.
    Asset examples: CH4PLM, CH4PLMMETA, RFL, MASK, RFLUNCERT
    """
    # Add _ to asset so string matching works
    asset = f"_{asset}_"
    # Retrieve URL matching parameters
    for _dict in row['_related_urls']:
        if _dict.get(key) == value and asset in _dict['URL'].split('/')[-1]:
            return _dict['URL']
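The underscores added around the asset name are what keep L2B_MIN from also matching the L2B_MINUNCERT file. The sketch below repeats the function so it runs on its own and applies it to a trimmed copy of the related URLs shown above; the row dictionary stands in for a geodataframe row.

```python
def get_asset_url(row, asset, key='Type', value='GET DATA'):
    """Return the first URL whose type and file name match the asset, else None."""
    asset = f"_{asset}_"
    for _dict in row['_related_urls']:
        if _dict.get(key) == value and asset in _dict['URL'].split('/')[-1]:
            return _dict['URL']


base = 'https://data.lpdaac.earthdatacloud.nasa.gov'
folder = 'EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173309_2311711_010'

# Trimmed stand-in for one row of results_gdf
row = {'_related_urls': [
    {'URL': f'{base}/lp-prod-protected/{folder}/EMIT_L2B_MIN_001_20230427T173309_2311711_010.nc',
     'Type': 'GET DATA'},
    {'URL': f'{base}/lp-prod-protected/{folder}/EMIT_L2B_MINUNCERT_001_20230427T173309_2311711_010.nc',
     'Type': 'GET DATA'},
    {'URL': f'{base}/lp-prod-public/{folder}/EMIT_L2B_MIN_001_20230427T173309_2311711_010.png',
     'Type': 'GET RELATED VISUALIZATION'},
]}

min_url = get_asset_url(row, 'L2B_MIN')
unc_url = get_asset_url(row, 'L2B_MINUNCERT')
png_url = get_asset_url(row, 'L2B_MIN', value='GET RELATED VISUALIZATION')
missing = get_asset_url(row, 'L2A_RFL')  # no such asset in this row, so None

print(min_url.split('/')[-1])  # EMIT_L2B_MIN_001_20230427T173309_2311711_010.nc
print(unc_url.split('/')[-1])  # EMIT_L2B_MINUNCERT_001_20230427T173309_2311711_010.nc
```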
Apply the function to the results geodataframe to get the URL of the L2B_MIN asset for each result.
# Iterate over rows in results_gdf, get the mineral URLs, and store them in a list
min_urls = results_gdf.apply(lambda row: get_asset_url(row, asset='L2B_MIN'), axis=1).tolist()
min_urls
['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173309_2311711_010/EMIT_L2B_MIN_001_20230427T173309_2311711_010.nc',
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230804T191650_2321613_007/EMIT_L2B_MIN_001_20230804T191650_2321613_007.nc',
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230808T173953_2322012_011/EMIT_L2B_MIN_001_20230808T173953_2322012_011.nc']
We can repeat this for the uncertainty URLs.
min_unc_urls = results_gdf.apply(lambda row: get_asset_url(row, asset='L2B_MINUNCERT'), axis=1).tolist()
min_unc_urls
['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173309_2311711_010/EMIT_L2B_MINUNCERT_001_20230427T173309_2311711_010.nc',
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230804T191650_2321613_007/EMIT_L2B_MINUNCERT_001_20230804T191650_2321613_007.nc',
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230808T173953_2322012_011/EMIT_L2B_MINUNCERT_001_20230808T173953_2322012_011.nc']
With some knowledge of how the granules and assets are named, we can grab the RGB browse images to get an idea of what the location looks like. First, retrieve the browse images for the mineral product. These show the mineral band depth only.
min_png = results_gdf.apply(lambda row: get_asset_url(row, asset='L2B_MIN', value='GET RELATED VISUALIZATION'), axis=1).tolist()
min_png
['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230427T173309_2311711_010/EMIT_L2B_MIN_001_20230427T173309_2311711_010.png',
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230804T191650_2321613_007/EMIT_L2B_MIN_001_20230804T191650_2321613_007.png',
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/EMITL2BMIN.001/EMIT_L2B_MIN_001_20230808T173953_2322012_011/EMIT_L2B_MIN_001_20230808T173953_2322012_011.png']
With some slight changes to these, we can retrieve the RGB browse images from the L2A Reflectance product.
# Replace Collection ID
rgb_urls = [s.replace('EMITL2BMIN', 'EMITL2ARFL') for s in min_png]
# Update Product and Asset Names
rgb_urls = [s.replace('EMIT_L2B_MIN', 'EMIT_L2A_RFL') for s in rgb_urls]
# Change file extension
#rgb_urls = [s.replace('.nc', '.png') for s in rgb_urls]
rgb_urls
['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/EMITL2ARFL.001/EMIT_L2A_RFL_001_20230427T173309_2311711_010/EMIT_L2A_RFL_001_20230427T173309_2311711_010.png',
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/EMITL2ARFL.001/EMIT_L2A_RFL_001_20230804T191650_2321613_007/EMIT_L2A_RFL_001_20230804T191650_2321613_007.png',
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/EMITL2ARFL.001/EMIT_L2A_RFL_001_20230808T173953_2322012_011/EMIT_L2A_RFL_001_20230808T173953_2322012_011.png']
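The two string replacements can also be wrapped in a single function, which makes the renaming rule easier to reuse and check. This is only a sketch: `min_browse_to_rfl_browse` is a hypothetical name, and it assumes the L2A reflectance browse image shares the timestamp, orbit, and scene ID of the L2B mineral browse image, as in the output above.

```python
def min_browse_to_rfl_browse(url):
    """Turn an L2B mineral browse URL into the matching L2A reflectance browse URL.

    Assumes only the collection ID and the product name differ between the two.
    """
    # Swap the collection ID, then the product and asset names
    return url.replace('EMITL2BMIN', 'EMITL2ARFL').replace('EMIT_L2B_MIN', 'EMIT_L2A_RFL')

# Example using the first mineral browse URL listed above
example = ('https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/EMITL2BMIN.001/'
           'EMIT_L2B_MIN_001_20230427T173309_2311711_010/'
           'EMIT_L2B_MIN_001_20230427T173309_2311711_010.png')
print(min_browse_to_rfl_browse(example))
```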
Visualize the RGB browse images to get an idea for what the area we are investigating looks like.
cols = 3
rows = math.ceil(len(results_gdf)/cols)
fig, ax = plt.subplots(rows, cols, figsize=(12,12))
ax = ax.flatten()

for _n, index in enumerate(results_gdf.index.to_list()):
    img = io.imread(rgb_urls[index])
    ax[_n].imshow(img)
    ax[_n].set_title(f"Index: {index} - {results_gdf['native-id'][index]}", fontsize=8)
    ax[_n].axis('off')
plt.tight_layout()
plt.show()
The black line in the third scene is caused by the on-board cloud masking.
5. Saving Lists of Results
We can save our lists of result URLs as text files for later use, either to download the data or to stream it.
with open('../../data/rgb_browse_urls.txt', 'w') as f:
    for line in rgb_urls:
        f.write(f"{line}\n")
with open('../../data/results_urls.txt', 'w') as f:
    for line in min_urls:
        f.write(f"{line}\n")
with open('../../data/min_uncert_urls.txt', 'w') as f:
    for line in min_unc_urls:
        f.write(f"{line}\n")
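The three cells above repeat the same pattern, so a small pair of helpers can write a list and read it back later. This is a sketch under assumptions: `write_url_list` and `read_url_list` are hypothetical names, not functions from `tutorial_utils` or `earthaccess`.

```python
from pathlib import Path

def write_url_list(urls, path):
    """Write one URL per line, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(''.join(f"{url}\n" for url in urls))

def read_url_list(path):
    """Read a text file of URLs back into a list, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]
```

For example, `write_url_list(min_urls, '../../data/results_urls.txt')` followed by `read_url_list('../../data/results_urls.txt')` should return the same list of URLs in a later session.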
6. Streaming or Downloading Data
For the workshop, we will stream the data, but either method can be used, and each has trade-offs based on the internet speed, storage space, or use case. The EMIT files are very large due to the number of bands, so operations can take some time if streaming with a slower internet connection. Since the workshop is hosted in a Cloud workspace, we can stream the data directly to the workspace.
6.1 Streaming Data Workflow
For an example of streaming netCDF data, please see Working with EMIT L2B Mineralogy.ipynb.
If you plan to stream the data, you can stop here and move to the next notebook.
6.2 Downloading Data Workflow
To download the scenes, we can use the earthaccess library to authenticate then download the files.
First, log into Earthdata using the login function from the earthaccess library. The persist=True argument will create a local .netrc file if it doesn’t exist, or add your login info to an existing .netrc file. If no Earthdata Login credentials are found in the .netrc you’ll be prompted for them. As mentioned in section 1.2, this step is not necessary to conduct searches, but is needed to download or stream data.
The outputs saved in section 5 can be downloaded by uncommenting and running the following cells.
# # Authenticate using earthaccess
# earthaccess.login(persist=True)
# # Open Text File and Read Lines
# file_list = ['../../data/rgb_browse_urls.txt','../../data/results_urls.txt']
# urls = []
# for file in file_list:
#     with open(file) as f:
#         urls.extend([line.rstrip('\n') for line in f])
# # Get requests https Session using Earthdata Login Info
# fs = earthaccess.get_requests_https_session()
# # Retrieve granule asset ID from URL (to maintain existing naming convention)
# for url in urls:
#     granule_asset_id = url.split('/')[-1]
#     # Define Local Filepath
#     fp = f'../../data/{granule_asset_id}'
#     # Download the Granule Asset if it doesn't exist
#     if not os.path.isfile(fp):
#         with fs.get(url,stream=True) as src:
#             with open(fp,'wb') as dst:
#                 for chunk in src.iter_content(chunk_size=64*1024*1024):
#                     dst.write(chunk)
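Before starting a long download, it can help to see which files are still missing locally. The sketch below mirrors the naming and skip-if-exists logic of the commented cell above without touching the network; `pending_downloads` is a hypothetical helper, not part of `earthaccess`.

```python
import os

def pending_downloads(urls, out_dir):
    """Return (url, local_path) pairs for assets not yet present in out_dir.

    The local file name is the last path segment of the URL, matching the
    naming convention used in the download cell.
    """
    pending = []
    for url in urls:
        fp = os.path.join(out_dir, url.split('/')[-1])
        if not os.path.isfile(fp):
            pending.append((url, fp))
    return pending
```

Calling `pending_downloads(urls, '../../data')` with the list read from the text files gives a quick count of how many large netCDF files a download run would actually fetch.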
Contact Info:
Email: LPDAAC@usgs.gov
Voice: +1-866-573-3222
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹
Website: https://lpdaac.usgs.gov/
Date last modified: 06-28-2024
¹Work performed under USGS contract 140G0121D0001 for NASA contract NNG14HH33I.