```python
# Import required libraries
import sys
import os
import glob
import requests
import numpy as np
import pandas as pd
import xarray as xr
from osgeo import gdal
import geopandas as gpd
from datetime import datetime
import folium
import earthaccess
import folium.plugins
import rioxarray as rxr
import hvplot.xarray
import hvplot.pandas
from branca.element import Figure
from IPython.display import display
from shapely.geometry.polygon import orient
from shapely.geometry import Point

sys.path.append('../modules/')
from emit_tools import emit_xarray, ortho_xr, ortho_browse
from tutorial_utils import list_metadata_fields, results_to_geopandas, convert_bounds
```
This notebook is from EMIT-Data-Resources
Source: Visualizing Methane Plume Timeseries
Imported on: 2025-01-22
Visualizing Methane Plume Timeseries
Summary
In this notebook, we’ll conduct a search and visualize the available methane plume complex observations detected by the Earth Surface Mineral Dust Source Investigation (EMIT) instrument, then select a region of interest (ROI) containing several plumes that appear to come from the same source. We will then visualize a time series of plume data for the ROI and calculate the integrated methane enhancement (IME), the mass of methane observed in each plume. Lastly, to add more context to the plume timeseries, we will look at all EMIT observations over the target region and determine whether factors such as clouds limited plume detection, or whether there were simply no methane emissions during additional overpasses.
NOTE: Throughout this notebook, several complex functions and workflows are used to process and visualize data. These can be found in the `emit_tools.py` and `tutorial_utils.py` modules.
Background
Methane is a major contributor to atmospheric radiative forcing because it is more efficient at trapping radiation than other greenhouse gases, like carbon dioxide. Because the lifetime of methane in the atmosphere is only about 10 years, reducing methane emissions offers an effective way to curb anthropogenic contributions to atmospheric radiative forcing.
The EMIT instrument is an imaging spectrometer that measures light in visible and infrared wavelengths. These measurements display unique spectral signatures that correspond to chemical composition. Although the primary mission focuses on mapping the mineral composition of Earth’s surface, the EMIT instrument has also been used to successfully map methane point source emissions within its target mask using the unique spectral fingerprint of methane. More details about EMIT and its associated products can be found in the README.md and on the EMIT website.
The L2B Estimated Methane Plume Complexes (EMITL2BCH4PLM) product provides delineated methane plume complexes in parts per million meter (ppm m) along with associated uncertainties. This product is a plume-specific subset of the EMIT L2B Methane Enhancement Data (EMITL2BCH4ENH) product. Each EMITL2BCH4PLM granule is sized to a specific plume complex but may cross multiple EMITL2BCH4ENH granules. A list of source EMITL2BCH4ENH granules is included in the GeoTIFF file metadata as well as the GeoJSON file. Each EMITL2BCH4PLM granule contains two files: one Cloud Optimized GeoTIFF (COG) file at a spatial resolution of 60 meters (m) and one GeoJSON file. The EMITL2BCH4PLM COG file contains a raster image of a methane plume complex extracted from EMITL2BCH4ENH v001 data. The EMITL2BCH4PLM GeoJSON file contains a vector outline of the plume complex, a list of source scenes, and the coordinates of the maximum enhancement value with its associated uncertainty.
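To make the two-file structure concrete, here is a minimal sketch (not part of the tutorial workflow, which streams these assets later) of how a downloaded granule pair could be opened; the filenames are hypothetical stand-ins following the EMITL2BCH4PLM naming pattern:

```python
import rioxarray as rxr
import geopandas as gpd

# Hypothetical local filenames standing in for a real EMITL2BCH4PLM granule pair
plume_cog = rxr.open_rasterio('EMIT_L2B_CH4PLM_001_20230405T070348_000829.tif')
plume_vector = gpd.read_file('EMIT_L2B_CH4PLMMETA_001_20230405T070348_000829.json')

print(plume_cog.rio.resolution())  # ~60 m pixels of methane enhancement (ppm m)
print(plume_vector.columns)        # plume outline, source scenes, max enhancement and uncertainty
```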
The EMITL2BCH4ENH product only includes granules where methane plume complexes have been identified. To reduce the risk of false positives, all EMITL2BCH4ENH data undergo a manual review (or identification and confirmation) process before being designated as a plume complex. For more information on the manual review process, see Section 4.2.2 of the EMIT GHG Algorithm Theoretical Basis Document (ATBD).
References
Thorpe, A. K., et al. Attribution of individual methane and carbon dioxide emission sources using EMIT observations from space. Sci. Adv. 9, eadh2391 (2023). DOI: 10.1126/sciadv.adh2391
Requirements
- Set up Python Environment - See setup_instructions.md in the /setup/ folder
- NASA Earthdata Login Account - Sign Up

Data Used
- EMIT L2B Estimated Methane Plume Complexes (EMITL2BCH4PLM)
- EMIT L2A Estimated Surface Reflectance and Uncertainty and Masks (EMITL2ARFL)

Learning Objectives
- Search for EMIT L2B Estimated Methane Plume Complexes
- Visualize search results on a map
- Retrieve and visualize the EMIT L2B Estimated Methane Plume Complexes metadata
- Select a region of interest and build a timeseries of plume data
- Further investigate plume detection by looking at browse images and quality information
Tutorial Outline
- Setup
- Search for EMIT L2B Estimated Methane Plume Complexes
- Creating a Timeseries from Plume Data
- Calculating the Integrated Mass Enhancement for Plumes
- Further Investigation into Plume Detection
1. Set up
Import the necessary Python libraries.

Log in using `earthaccess` and create a `.netrc` file if necessary. This file will store your NASA Earthdata Login credentials and use them to authenticate when necessary.
```python
earthaccess.login(persist=True)
```
All of the data we use or save will go into the `methane_tutorial` directory, so we can go ahead and define that filepath now, relative to this notebook.
```python
methane_dir = '../../data/methane_tutorial/'
```
2. Search for EMIT L2B Estimated Methane Plume Complexes
Use `earthaccess` to find all EMIT L2B Estimated Methane Plume Complexes (EMITL2BCH4PLM) data available from 2023. Define the date range and concept-ids (unique product identifiers) for the EMIT products we want to search for, but leave spatial arguments like `polygon` and `bbox` empty so we can preview detected methane plumes globally.
```python
# Data Collections for our search, using a dictionary
concept_ids = {'plumes': 'C2748088093-LPCLOUD', 'reflectance': 'C2408750690-LPCLOUD'}
# Define Date Range
date_range = ('2023-01-01', '2023-12-31')

results = earthaccess.search_data(
    concept_id=concept_ids['plumes'],
    temporal=date_range,
    count=2000
)
```
Convert the results to a `geopandas.GeoDataFrame` using a function from our `tutorial_utils` module. This gives a nice way to organize and visualize the search results.
```python
gdf = results_to_geopandas(results)
gdf
```
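The `tutorial_utils` implementation isn't shown in this notebook. As a rough sketch of the idea (assuming, as holds for EMIT granules, that each result's UMM metadata carries a `GPolygons` boundary), a simplified conversion might look like this; the real function also extracts the additional metadata fields discussed below:

```python
from shapely.geometry import Polygon
import geopandas as gpd

def simple_results_to_gdf(results):
    """Minimal stand-in for results_to_geopandas: geometry plus a single field."""
    records = []
    for g in results:
        # earthaccess results behave like dicts of CMR UMM-JSON metadata
        pts = g['umm']['SpatialExtent']['HorizontalSpatialDomain']['Geometry']['GPolygons'][0]['Boundary']['Points']
        records.append({
            'native-id': g['meta']['native-id'],
            'geometry': Polygon([(p['Longitude'], p['Latitude']) for p in pts]),
        })
    return gpd.GeoDataFrame(records, geometry='geometry', crs='EPSG:4326')
```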
The function includes default fields, but you can add more with the `fields` argument. To see all of the metadata available, use the `list_metadata_fields` function imported from the `tutorial_utils.py` module.
```python
list_metadata_fields(results)
```
Add an index column to the dataframe to include it in the tooltips for our visualization.
```python
# Specify index so we can reference it with gdf.explore()
gdf['index'] = gdf.index
```
```python
# Set up Figure and Basemap tiles
fig = Figure(width="1080px", height="540")
map1 = folium.Map(tiles=None)
folium.TileLayer(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', name='Google Satellite', attr='Google', overlay=True).add_to(map1)
folium.TileLayer(tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}.png',
                 name='ESRI World Imagery',
                 attr='Tiles © Esri — Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
                 overlay=True).add_to(map1)
fig.add_child(map1)

# Add Search Results gdf
gdf.explore(
    "_single_date_time",
    categorical=True,
    style_kwds={"fillOpacity": 0.1, "width": 2},
    name="EMIT L2B CH4PLM",
    tooltip=[
        "index",
        "native-id",
        "_single_date_time",
    ],
    m=map1,
    legend=False
)

# Zoom to Data
map1.fit_bounds(bounds=convert_bounds(gdf.unary_union.bounds))
# Add Layer controls
map1.add_child(folium.LayerControl(collapsed=False))
display(fig)
```
In this example we’ll choose a region that looks like it has several plumes emitting from the same source, a landfill in Jordan. We could use the plumes that extend furthest in the cardinal directions to generate a simple bounding box around the region that we can use in our upcoming analysis.
However, to simplify things we will import an existing geojson as a `GeoDataFrame` with a bounding box around this region, because the plume indices may change, which results in inconsistent outputs. A commented-out cell below is included as an example of how the `GeoDataFrame` was created before being written to geojson.
```python
# # Select a list of plumes to create geometry we can use for a spatial subset
# plumes = [146,198,243]
# bbox = gdf.loc[plumes].geometry.unary_union.envelope
# bbox = orient(bbox, sign=1)
# plume_bbox = gpd.GeoDataFrame({"name":['plume_bbox'], "geometry":[bbox]}, crs=gdf.crs)
```
Open the predefined geojson of our plume bounding box as a `GeoDataFrame`, then visualize it on our `folium` figure.
```python
# Open the geojson with our plume bbox
plume_bbox = gpd.read_file(f'{methane_dir}/plume_bbox.geojson')
```
```python
# Set up Figure and Basemap tiles
fig = Figure(width="1080px", height="540")
map1 = folium.Map(tiles=None)
folium.TileLayer(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', name='Google Satellite', attr='Google', overlay=True).add_to(map1)
folium.TileLayer(tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}.png',
                 name='ESRI World Imagery',
                 attr='Tiles © Esri — Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
                 overlay=True).add_to(map1)
fig.add_child(map1)

plume_bbox.explore(
    "name",
    name='Plume BBox',
    style_kwds={"fillOpacity": 0, "width": 2},
    m=map1,
    legend=False)

# Add Search Results gdf
gdf.explore(
    "_single_date_time",
    categorical=True,
    style_kwds={"fillOpacity": 0.1, "width": 2},
    name="EMIT L2B CH4PLM",
    tooltip=[
        "index",
        "native-id",
        "_single_date_time",
    ],
    m=map1,
    legend=False
)

# Zoom to Data
map1.fit_bounds(bounds=convert_bounds(plume_bbox.unary_union.bounds))
# Add Layer controls
map1.add_child(folium.LayerControl(collapsed=False))
display(fig)
```
Subset our geodataframe of global plumes to only those that intersect our bounding box.
```python
plm_gdf = gdf[gdf.geometry.intersects(plume_bbox.geometry[0])]
plm_gdf
```
Let’s examine the `_related_urls` column in our geodataframe. This column contains various links to assets related to each plume.
```python
plm_gdf['_related_urls'].iloc[0]
```
We can write a function to return the asset URL for a given asset and row in our dataframe.
```python
def get_asset_url(row, asset, key='Type', value='GET DATA'):
    """
    Retrieve a url from the list of dictionaries for a row in the _related_urls column.
    Asset examples: CH4PLM, CH4PLMMETA, RFL, MASK, RFLUNCERT
    """
    # Add _ to asset so string matching works
    asset = f"_{asset}_"
    # Retrieve URL matching parameters
    for _dict in row['_related_urls']:
        if _dict.get(key) == value and asset in _dict['URL'].split('/')[-1]:
            return _dict['URL']
```
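For example, the CH4PLM COG URL for the first plume in our subset:

```python
# Example usage: plume COG URL for the first plume in plm_gdf
get_asset_url(plm_gdf.iloc[0], 'CH4PLM')
```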
Write another function to request the JSON metadata for a given plume, using our `get_asset_url` function to select the correct URL from the dataframe.
```python
# Function to fetch CH4 Plume Metadata
# Speed could be improved here by using asyncio/aiohttp
def fetch_ch4_metadata(row):
    response = requests.get(get_asset_url(row, 'CH4PLMMETA'))
    json = response.json()
    return json['features'][0]['properties']
```

```python
fetch_ch4_metadata(plm_gdf.iloc[0])
```
Retrieve additional plume metadata contained in the EMIT L2B Estimated Methane Plume Complexes (EMITL2BCH4PLM) data product, which includes the maximum enhancement value, the uncertainty of the plume complex, and the list of source scenes.
```python
# Apply the function to each row and convert the result to a DataFrame
plm_meta = plm_gdf.apply(fetch_ch4_metadata, axis=1).apply(pd.Series)
plm_meta
```
We can add the points with the highest methane concentration to our visualization. Create an index column, as we did for the plumes, then convert the latitude and longitude of max concentration to a shapely `Point` object and add it to our `GeoDataFrame`.
```python
# Specify index so we can reference it with gdf.explore()
plm_meta['index'] = plm_meta.index
# Add Geometry and convert to geodataframe
plm_meta['geometry'] = plm_meta.apply(lambda row: Point(row['Longitude of max concentration'], row['Latitude of max concentration']), axis=1)
plm_meta = gpd.GeoDataFrame(plm_meta, geometry='geometry', crs='EPSG:4326')
plm_meta
```
Now add this to our visualization.
```python
# Set up Figure and Basemap tiles
fig = Figure(width="1080px", height="540")
map1 = folium.Map(tiles=None)
folium.TileLayer(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', name='Google Satellite', attr='Google', overlay=True).add_to(map1)
folium.TileLayer(tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}.png',
                 name='ESRI World Imagery',
                 attr='Tiles © Esri — Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
                 overlay=True).add_to(map1)
fig.add_child(map1)

plume_bbox.explore(
    "name",
    name='Plume BBox',
    style_kwds={"fillOpacity": 0, "width": 2},
    m=map1,
    legend=False)

# Add Search Results gdf
plm_gdf.explore(
    "index",
    categorical=True,
    style_kwds={"fillOpacity": 0.1, "width": 2},
    name="EMIT L2B CH4PLM",
    tooltip=[
        "index",
        "native-id",
        "_single_date_time",
    ],
    m=map1,
    legend=False
)

plm_meta.explore(
    "index",
    categorical=True,
    style_kwds={"fillOpacity": 0.1, "width": 2},
    name="Location of Max Concentration (ppm m)",
    tooltip=[
        "DAAC Scene Names",
        "UTC Time Observed",
        "Max Plume Concentration (ppm m)",
        "Concentration Uncertainty (ppm m)",
        "Orbit"
    ],
    m=map1,
    legend=False
)

# Zoom to Data
map1.fit_bounds(bounds=convert_bounds(plume_bbox.unary_union.bounds))
# Add Layer controls
map1.add_child(folium.LayerControl(collapsed=False))
display(fig)
```
3. Creating a Timeseries from Plume Data
We can visualize a timeseries of these plumes that appear to be from the same source. To do this we’ll generate a list of the COG URLs for the plumes, then use rioxarray to stream the data and build a timeseries dataset based on the timestamps in the filenames.
Use the `get_asset_url` function to retrieve the CH4PLM asset URLs by applying it to our dataframe and converting the output to a list.
```python
# Iterate over rows in the plm_gdf and get the CH4PLM URLs and store them in a list
plm_urls = plm_gdf.apply(lambda row: get_asset_url(row, asset='CH4PLM'), axis=1).tolist()
plm_urls
```
Now that we have a list of COG URLs, we can set our `gdal` configuration options to pass our NASA Earthdata Login credentials when we access each COG.
```python
# GDAL configurations used to successfully access LP DAAC Cloud Assets via vsicurl
gdal.SetConfigOption('GDAL_HTTP_COOKIEFILE', '~/cookies.txt')
gdal.SetConfigOption('GDAL_HTTP_COOKIEJAR', '~/cookies.txt')
gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN', 'EMPTY_DIR')
gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS', 'TIF')
gdal.SetConfigOption('GDAL_HTTP_UNSAFESSL', 'YES')
gdal.SetConfigOption('GDAL_HTTP_MAX_RETRY', '10')
gdal.SetConfigOption('GDAL_HTTP_RETRY_DELAY', '0.5')
```
To build our timeseries we will start by opening all the necessary data. Loop over our list of URLs, open each plume, merge plumes acquired at the same time, and store them in a dictionary where keys correspond to the acquisition time and values are the plume data in an `xarray.DataArray`.
```python
plm_ts_dict = {}
# Set max retries for vsicurl errors
max_retries = 5
# Iterate over plm urls
for url in plm_urls:
    # retrieve acquisition time from url
    acquisition_time = url.split('/')[-1].split('.')[-2].split('_')[-2]
    # list plumes identified in same scene if there are any
    same_scene = [url for url in plm_urls if acquisition_time in url.split('/')[-1].split('.')[-2].split('_')[-2]]
    to_merge = []
    # prevent duplicate processing of plumes from the same scene
    if acquisition_time not in list(plm_ts_dict.keys()):
        # Open and merge plumes identified from each scene
        for _plm in same_scene:
            print(f"Opening {_plm.split('/')[-1]}")
            # Open COG and squeeze band dimension
            plm = rxr.open_rasterio(_plm).squeeze('band', drop=True)
            # Add to list of plumes to merge
            to_merge.append(plm)
        # Merge plumes and add to timeseries
        plm_ts_dict[acquisition_time] = rxr.merge.merge_arrays(to_merge)
```
Now that we have a plume object for each date in our timeseries, we need to put them all on a common grid so they are spatially aligned before we can stack them along the time dimension.
To do this, we will find the minimum and maximum bounds of each dataarray in our dictionary, then create a common grid to reproject to.
```python
from typing import List

def create_common_grid(data_arrays: List[xr.DataArray]) -> xr.DataArray:
    """
    Create a common grid for a list of xarray DataArrays, matching the resolution of the first data array.
    """
    # Initial Bounds for common grid
    minx = miny = float('inf')
    maxx = maxy = float('-inf')

    for array in data_arrays:
        left, bottom, right, top = array.rio.bounds()
        minx = min(minx, left)
        miny = min(miny, bottom)
        maxx = max(maxx, right)
        maxy = max(maxy, top)

    bounds = (minx, miny, maxx, maxy)

    res = data_arrays[0].rio.resolution()
    crs = data_arrays[0].rio.crs
    nodata = data_arrays[0].rio.nodata

    # Calculate new raster shape using the new extent, maintaining the original resolution
    height = int(np.ceil((bounds[3] - bounds[1]) / abs(res[1])))
    width = int(np.ceil((bounds[2] - bounds[0]) / abs(res[0])))
    data = np.full((height, width), nodata)
    coords = {'y': (['y'], np.arange(bounds[1], bounds[3], abs(res[1]))),
              'x': (['x'], np.arange(bounds[0], bounds[2], abs(res[0])))}
    common_grid = xr.DataArray(data, coords=coords)
    common_grid.rio.write_crs(crs, inplace=True)
    return common_grid
```
```python
common_grid = create_common_grid(list(plm_ts_dict.values()))
common_grid
```
Reproject each of the plume dataarrays to the common grid.
```python
plm_ts_dict = {key: value.rio.reproject_match(common_grid) for key, value in plm_ts_dict.items()}
```
Now that we have all of our plumes on a standard grid, we can concatenate them along a time dimension to create a timeseries. Create an xarray variable called ‘time’ from our dictionary keys, then use `xarray.concat` to concatenate all of our plumes along the time dimension.
```python
plm_time = xr.Variable('time', [datetime.strptime(t, '%Y%m%dT%H%M%S') for t in list(plm_ts_dict.keys())])
plm_time
```
```python
plm_ts_ds = xr.concat(list(plm_ts_dict.values()), dim=plm_time)
plm_ts_ds
```
Set our no-data values to `np.nan` to make sure they are transparent for improved visualization.
```python
plm_ts_ds.data[plm_ts_ds.data == -9999] = np.nan
```
Plot the plume time series.
```python
plm_ts_plot = plm_ts_ds.hvplot.image(x='x', y='y', geo=True, tiles='ESRI', crs='EPSG:4326', cmap='inferno',
                                     clim=(0, np.nanmax(plm_ts_ds.data)), clabel=f'Methane Concentration ({plm_ts_ds.Units})',
                                     frame_width=600, frame_height=600, rasterize=True)
plm_ts_plot
```
4. Calculating the IME for each plume
The integrated mass enhancement (IME) is the total mass of methane present in a plume, calculated by summing the mass of methane in all plume pixels. The IME in kilograms (kg) can be estimated using the equation below, which includes the mixing ratio length per pixel $\alpha_i$ (ppm m) and the pixel area $S_i$ (m²), where 22.4 is the volume of 1 mole of gas at standard temperature and pressure (STP) in liters and 0.01604 is the molar mass of methane in kg/mol:

$$\mathrm{IME\,(kg)} = \sum_{i=1}^{n} \alpha_i \times \frac{1}{10^6} \times S_i \times \frac{1000\,\mathrm{L}}{1\,\mathrm{m^3}} \times \frac{1\,\mathrm{mol}}{22.4\,\mathrm{L}} \times \frac{0.01604\,\mathrm{kg}}{1\,\mathrm{mol}}$$
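As a quick illustrative sanity check of the units (not part of the original notebook): a single 60 m × 60 m pixel with an enhancement of 1000 ppm m holds roughly 2.6 kg of methane.

```python
# 1000 ppm m -> mixing-ratio length (x 1e-6), x 3600 m^2 pixel area = m^3 of CH4,
# x 1000 L/m^3, / 22.4 L/mol at STP, x 0.01604 kg/mol
pixel_kg = 1000 * (1/1e6) * (60*60) * 1000 * (1/22.4) * 0.01604
print(round(pixel_kg, 2))  # ~2.58 kg
```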
The IME can be combined with a plume length and windspeed to estimate emissions in units of kg per hour, but in this tutorial we will not calculate emissions; instead we will keep things simple, calculate the IME for each plume, and observe how these values change over time.
We can write this as a function for each pixel, then apply it to the entire timeseries to calculate the IME for each plume.
```python
def calc_ime(plume_da):
    molar_volume = 22.4  # L/mol at STP
    molar_mass_ch4 = 0.01604  # kg/mol
    # ppm m -> mixing-ratio length (1e-6), 60 m x 60 m pixel area, m^3 -> L (1000)
    kg = plume_da * (1/1e6) * (60*60) * (1000) * (1/molar_volume) * molar_mass_ch4
    ime = np.nansum(kg)
    return ime
```
```python
# Apply the function along the 'x' and 'y' dimensions
ime_ts = xr.apply_ufunc(calc_ime, plm_ts_ds, input_core_dims=[['y', 'x']], vectorize=True)
ime_ts.name = 'value'
```
```python
ime_plot = ime_ts.hvplot.scatter(x='time', y='value', title='Observed Methane IME over 2023', color='black',
                                 xticks=list(ime_ts.time.data), rot=90, grid=True, xlabel='Date Observed', ylabel='kg')
ime_plot
```
5. Further investigation into plume detection
The timeseries shown above doesn’t necessarily give us a full picture of methane emissions at the landfill. In addition to cases where no emissions were observed, gaps in data can result from an absence of observations due to the variable revisit period of the ISS, or from clouds. To add more context to this plume timeseries, we will look at all EMIT acquisitions over the target region and try to determine whether factors were limiting plume detection, or whether there were simply no methane emissions during the other overpasses.
For this search we’ll want to restrict our search to a smaller ROI than our bounding box. We want something smaller, centered on the source of the emission, so we avoid retrieving irrelevant data, for example an overpass barely crossing a corner of our large bounding box. To do this we will use the maximum concentration points from our plumes to create a smaller ROI that is likely to include a methane plume if one is present.
Create a new polygon using the maximum concentration points from our plumes as vertices, orienting the points in counter-clockwise order so they are compatible with an `earthaccess` search. This is just a simple example approach; there are several ways we could define a smaller ROI using the plume data or ancillary information.
```python
max_conc_poly = plm_meta.geometry.unary_union.envelope
max_conc_poly = orient(max_conc_poly, sign=1.0)
max_conc_gdf = gpd.GeoDataFrame({"name": ['max_points'], "geometry": [max_conc_poly]}, crs=plm_meta.crs)
```
```python
roi = list(max_conc_gdf.geometry[0].exterior.coords)
roi
```
Now conduct a search for reflectance data over our ROI and convert the results to a `geopandas.GeoDataFrame`.
```python
rfl_results = earthaccess.search_data(
    concept_id=concept_ids['reflectance'],
    polygon=roi,
    temporal=date_range,  # ('2023-03-01','2023-03-31')
    count=2000
)
rfl_gdf = results_to_geopandas(rfl_results)
rfl_gdf
```
From the results we can see we have 20 scenes that intersect our ROI. We can visualize the footprints of these scenes to gain some insight into coverage over our ROI.
```python
# Set up Figure and Basemap tiles
fig = Figure(width="1080px", height="540")
map1 = folium.Map(tiles=None)
folium.TileLayer(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', name='Google Satellite', attr='Google', overlay=True).add_to(map1)
folium.TileLayer(tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}.png',
                 name='ESRI World Imagery',
                 attr='Tiles © Esri — Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
                 overlay=True).add_to(map1)
fig.add_child(map1)

# Add Reflectance Scenes with no CH4
rfl_gdf.explore(
    color='red',
    style_kwds={"fillOpacity": 0, "width": 2},
    name="Scenes with no CH4 Plumes",
    tooltip=[
        "native-id",
        "_beginning_date_time",
    ],
    m=map1,
    legend=False)

# Add Plume BBox to Map
max_conc_gdf.explore(
    m=map1,
    name='Plumes Bounding Box',
    legend=False)

# Zoom to Data
map1.fit_bounds(bounds=convert_bounds(rfl_gdf.unary_union.bounds))
# Add Layer controls
map1.add_child(folium.LayerControl(collapsed=False))
display(fig)
```
From this we can see that there are several scenes that intersect with our ROI, but likely have no relevant information since they only cover a small portion or corner of the ROI.
We can use a similar process as with the methane product to construct a time series to better understand the data gathered on each overpass. To do this, we will use the browse imagery and the masks included in the EMITL2ARFL product. By default, the mask files and browse images are not orthorectified, so we must do that as part of our workflow.
First, get the URLs for the browse images and masks for each scene in our `rfl_gdf` search results using the `get_asset_url` function.
```python
png_urls = rfl_gdf.apply(lambda row: get_asset_url(row, asset='RFL', value='GET RELATED VISUALIZATION'), axis=1).tolist()
png_urls
```
```python
mask_urls = rfl_gdf.apply(lambda row: get_asset_url(row, asset='MASK'), axis=1).tolist()
mask_urls
```
We can write these to a text file so we don’t need to search again, although we will use the `rfl_gdf` GeoDataFrame later in the tutorial.
```python
# # Save URL List
# with open(f'{methane_dir}rfl_mask_urls.txt', 'w') as f:
#     for line in mask_urls:
#         f.write(f"{line}\n")
```
Since the mask files are not chunked, it’s quicker to download them to do the processing. Log in with `earthaccess` and download these files.
```python
earthaccess.login(persist=True)
# Get requests https Session using Earthdata Login Info
fs = earthaccess.get_requests_https_session()
# Retrieve granule asset ID from URL (to maintain existing naming convention)
for url in mask_urls:
    granule_asset_id = url.split('/')[-1]
    # Define Local Filepath
    fp = f'{methane_dir}{granule_asset_id}'
    # Download the Granule Asset if it doesn't exist
    if not os.path.isfile(fp):
        with fs.get(url, stream=True) as src:
            with open(fp, 'wb') as dst:
                for chunk in src.iter_content(chunk_size=64*1024*1024):
                    dst.write(chunk)
```
For each of these scenes we want to open the EMIT L2A Mask data, then subset spatially and select only the variable we want; in this case we’ll use the `Cloud flag` from the `mask` dataarray. There are other flags, such as a cirrus mask, dilated cloud flag, spacecraft flag, water flag, and aerosol optical depth. We could potentially use some of these as well to inform our decisions about plume detection, but will stick to the `Cloud flag` in this example for simplicity.
As we do this, we will also use the GLT included in the mask file to orthorectify our RGB browse image. We can do this because the browse PNG files are at the native resolution and can be broadcast onto an orthorectified grid using the GLT. We will use the `reproject_match` function to reproject the data onto our `common_grid`, which will automatically clip the data to the `common_grid` extent.
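Conceptually, GLT-based orthorectification is an index lookup: each cell of the GLT grid stores the (sample, line) indices of the raw pixel that belongs at that orthorectified location. Below is a minimal sketch of that core operation, assuming (as in `emit_tools`) 1-based indices with 0 marking empty cells; the real `ortho_browse` and `ortho_xr` functions also attach the georeferencing metadata.

```python
import numpy as np

def apply_glt_sketch(raw: np.ndarray, glt: np.ndarray, fill_value=0):
    """Broadcast a (lines, samples, bands) raw array onto the GLT's (rows, cols) grid."""
    out = np.full((glt.shape[0], glt.shape[1], raw.shape[-1]), fill_value, dtype=raw.dtype)
    valid = np.all(glt != 0, axis=-1)  # cells with a mapped source pixel
    # glt[..., 0] holds sample (x) indices, glt[..., 1] holds line (y) indices, both 1-based
    out[valid] = raw[glt[valid, 1] - 1, glt[valid, 0] - 1, :]
    return out
```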
First, get the filepaths for our downloaded mask data.
```python
# List the downloaded files
fps = glob.glob(f'{methane_dir}*.nc')
# Make sure only files listed are also in the mask_urls list of files downloaded
fns = [os.path.basename(fp) for fp in fps if any(os.path.basename(fp) in url.split('/')[-1] for url in mask_urls)]
fns
```
Create a function to loop through our files, orthorectifying the mask and browse image, clipping and reprojecting to our predefined `common_grid`, and finally saving the outputs as COGs.
```python
def process_scenes(fns, outdir, common_grid):
    """
    This function will process a list of EMIT Mask scenes, selecting the cloud flag, orthorectifying the
    mask and browse image, then reprojecting both to a common grid and saving the outputs.
    """
    for fn in fns:
        # Get Granule Asset ID for First Adjacent Scene (may only be one)
        granule_asset_id = fn.split('.')[-2]
        # Set Output Paths
        outpath_mask = f"{outdir}{granule_asset_id}_cloud_flag.tif"
        outpath_browse = f"{outdir}{granule_asset_id}_ortho_browse.tif"
        # Check if the file exists
        if not os.path.isfile(outpath_mask):
            # Open Mask Dataset
            emit_ds = emit_xarray(f'{methane_dir}{fn}', ortho=False)
            # Retrieve GLT, spatial_ref, and geotransform to use on browse image
            glt = np.nan_to_num(np.stack([emit_ds["glt_x"].data, emit_ds["glt_y"].data], axis=-1), nan=0).astype(int)
            spatial_ref = emit_ds.spatial_ref
            gt = emit_ds.geotransform
            # Select browse image url corresponding to the scene
            png_url = [url for url in png_urls if fn.split('.')[-2].split('_')[-3] in url][0]
            # Orthorectify browse and mask
            rgb = ortho_browse(png_url, glt, spatial_ref, gt, white_background=True)
            emit_ds = ortho_xr(emit_ds)
            # Select only mask array and desired quality flag and reproject to match our chosen extent
            mask_da = emit_ds['mask'].sel(mask_bands='Cloud flag')
            # Drop elevation
            mask_da = mask_da.drop_vars('elev')
            mask_da.name = 'Cloud flag'
            mask_da.data = np.nan_to_num(mask_da.data, nan=-9999)
            mask_da = mask_da.rio.reproject_match(common_grid, nodata=-9999)
            # mask_da.rio.write_nodata(np.nan, inplace=True)
            # Reproject rgb
            rgb = rgb.rio.reproject_match(common_grid, nodata=255)  # 255 for white background
            # Write cog outputs
            mask_da.rio.to_raster(outpath_mask, driver="COG")
            rgb.rio.to_raster(outpath_browse, driver="COG")
```
Run the function.
```python
process_scenes(fns, methane_dir, common_grid)
```
Create a list of the processed files to use in creating the timeseries. We’ll use a similar process to what we did for the plumes, adding a `time` variable to our datasets and concatenating.
```python
mask_files = sorted(glob.glob(f'{methane_dir}*cloud_flag.tif'))
mask_files
```
```python
rgb_files = sorted(glob.glob(f'{methane_dir}*ortho_browse.tif'))
rgb_files
```
Build a time index from the filenames.
```python
def time_index_from_filenames(file_names, datetime_pos):
    """
    Helper function to create a pandas DatetimeIndex
    """
    return [datetime.strptime(f.split('_')[datetime_pos], '%Y%m%dT%H%M%S') for f in file_names]

mask_time = xr.Variable('time', time_index_from_filenames(mask_files, -5))
```
Open and concatenate our datasets along the time dimension, then assign fill values to `np.nan` to make those sections of the data transparent in our visualization.
```python
quality_ts_da = xr.concat([rxr.open_rasterio(f).squeeze('band', drop=True).rio.reproject_match(common_grid) for f in mask_files], dim=mask_time)
```
Create a plot object for the quality timeseries, first setting the quality mask values representing no clouds (0) or no data (-9999) to `np.nan` so they will be transparent in our visualization.
```python
quality_ts_da.data[quality_ts_da.data < 1] = np.nan
quality_ts_map = quality_ts_da.hvplot.image(x='x', y='y', cmap='greys', groupby='time', clim=(0, 1), geo=True, frame_height=400)
```
RGB images are a good way to add something more visually understandable than just the mask layers. Follow the same process as above to build an RGB timeseries, then plot it with the bounding box and plume extents.
```python
rgb_ts_ds = xr.concat([rxr.open_rasterio(f).rio.reproject_match(common_grid) for f in rgb_files], dim=mask_time)
rgb_ts_ds.data[rgb_ts_ds.data == -1] = 255
```
```python
rgb_ts_map = rgb_ts_ds.hvplot.rgb(x='x', y='y', bands='band', groupby='time', geo=True, frame_height=400, crs='EPSG:4326')
```
Lastly, add a new column to our plume geodataframe named `time` so we are using the same naming convention and can layer our plume polygons from our filtered search on top of our quality data.
```python
plm_gdf['time'] = pd.to_datetime(plm_gdf.loc[:, '_single_date_time'])
```
Because we’ve set our RGB to display as white where there is no data, and our cloud mask as transparent where no clouds are present, areas with no data over our ROI appear white/transparent and cloudy areas appear black. The larger black lines/rectangles are the result of onboard cloud masking, where no data was downlinked due to a high volume of clouds detected.
```python
rgb_ts_map * quality_ts_map * plm_gdf.hvplot(groupby='time', geo=True, line_color='red', fill_color=None) * max_conc_gdf.hvplot(color='red', crs='EPSG:4326', fill_color=None, line_color='yellow')
```
With this information, we can build a dataframe assigning a category to each of the dates where there was no plume detected and add this information to our IME timeseries figure.
Create a dataframe of dates where no plume was detected by removing the timestamps in our `plm_time` array from our `mask_time` array.
```python
no_plm_time = np.setdiff1d(mask_time.data, plm_time.data)
no_plm_time
```
Build a dataframe out of these dates where there were no plumes detected.
```python
# Build a dataframe
no_plm_df = pd.DataFrame({'time': no_plm_time})
no_plm_df
```
Next, categorize them based on the visualizations we made and remove any times where there wasn’t a good observation of our area of interest.
```python
# add a category column to describe the observation
no_plm_df['category'] = 'no_data'
no_plm_df.loc[[0, 4], 'category'] = 'cloud'
no_plm_df.loc[7, 'category'] = 'no_plume'
```
```python
no_plm_df = no_plm_df[~no_plm_df['category'].str.contains('no_data')]
no_plm_df['time'] = pd.to_datetime(no_plm_df['time'])
no_plm_df.reset_index(drop=True, inplace=True)
no_plm_df
```
We can now merge our IME timeseries dataframe with this one on the `time` column, adding these categorized observations to our timeseries.
```python
ime_df = pd.merge(ime_ts.to_dataframe(), no_plm_df, on=['time'], how='outer')
ime_df = ime_df.sort_values(by='time')
ime_df['category'] = ime_df['category'].fillna('plume')
ime_df['value'] = ime_df['value'].fillna(0)
ime_df
```
We can now plot this IME timeseries alongside our plume timeseries.
```python
ime_timeline = ime_df.hvplot.scatter(x='time', y='value', by='category', size=100, xlabel='Date Observed', ylabel='kg',
                                     title='Observed Methane IME over 2023', rot=90, grid=True, frame_height=400,
                                     frame_width=800, ylim=(0, 11000), xticks=list(ime_df.time))

ime_timeline.opts(legend_position='bottom') + plm_ts_plot.opts(title='Methane Plumes Observed', frame_height=400, frame_width=400)
```
Contact Info:
Email: LPDAAC@usgs.gov
Voice: +1-866-573-3222
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹
Website: https://lpdaac.usgs.gov/
Date last modified: 10-25-2024
¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.