```python
# Import required libraries
import sys
import os
import glob
import requests
import numpy as np
import pandas as pd
import xarray as xr
from osgeo import gdal
import geopandas as gpd
from datetime import datetime
import folium
import earthaccess
import folium.plugins
import rioxarray as rxr
import hvplot.xarray
import hvplot.pandas
from branca.element import Figure
from IPython.display import display
from shapely.geometry.polygon import orient
from shapely.geometry import Point

sys.path.append('../modules/')
from emit_tools import emit_xarray, ortho_xr, ortho_browse
from tutorial_utils import list_metadata_fields, results_to_geopandas, convert_bounds
```
This notebook is from EMIT-Data-Resources
Source: Visualizing Methane Plume Timeseries
Imported on: 2025-01-22
Visualizing Methane Plume Timeseries
Summary
In this notebook, we’ll conduct a search and visualize the available methane plume complex observations detected by the Earth Surface Mineral Dust Source Investigation (EMIT) instrument, then select a region of interest (ROI) containing several plumes that appear to come from the same source. We will then visualize a time series of plume data for the ROI and calculate the integrated methane enhancement (IME), the mass of methane observed in each plume. Lastly, to add more context to the plume timeseries, we will look at all EMIT observations over the target region and determine whether factors such as clouds limited plume detection, or whether there were simply no methane emissions during additional overpasses.
NOTE: Throughout this notebook, several complex functions and workflows are used to process and visualize data. These can be found in the `emit_tools.py` and `tutorial_utils.py` modules.
Background
Methane is a major contributor to atmospheric radiative forcing because it is more efficient at trapping radiation than other greenhouse gases, like carbon dioxide. Because the lifetime of methane in the atmosphere is only about 10 years, reducing methane emissions offers an effective way to curb anthropogenic contributions to atmospheric radiative forcing.
The EMIT instrument is an imaging spectrometer that measures light in visible and infrared wavelengths. These measurements display unique spectral signatures that correspond to chemical composition. Although the primary mission focuses on mapping the mineral composition of Earth’s surface, the EMIT instrument has also been used to successfully map methane point source emissions within its target mask using the unique spectral fingerprint of methane. More details about EMIT and its associated products can be found in the README.md and on the EMIT website.
The L2B Estimated Methane Plume Complexes (EMITL2BCH4PLM) product provides delineated methane plume complexes in parts per million meter (ppm m) along with associated uncertainties. This product is a plume-specific subset of the EMIT L2B Methane Enhancement Data (EMITL2BCH4ENH) product. Each EMITL2BCH4PLM granule is sized to a specific plume complex but may cross multiple EMITL2BCH4ENH granules. A list of source EMITL2BCH4ENH granules is included in the GeoTIFF file metadata as well as the GeoJSON file. Each EMITL2BCH4PLM granule contains two files: one Cloud Optimized GeoTIFF (COG) file at a spatial resolution of 60 meters (m) and one GeoJSON file. The EMITL2BCH4PLM COG file contains a raster image of a methane plume complex extracted from EMITL2BCH4ENH v001 data. The EMITL2BCH4PLM GeoJSON file contains a vector outline of the plume complex, a list of source scenes, and the coordinates of the maximum enhancement value with its associated uncertainty.
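To make the two-file structure concrete, here is a minimal sketch (not part of the tutorial workflow, which streams these assets later) of how a downloaded granule pair could be opened; the filenames are hypothetical stand-ins following the EMITL2BCH4PLM naming pattern:

```python
import rioxarray as rxr
import geopandas as gpd

# Hypothetical local filenames standing in for a real EMITL2BCH4PLM granule pair
plume_cog = rxr.open_rasterio('EMIT_L2B_CH4PLM_001_20230405T070348_000829.tif')
plume_vector = gpd.read_file('EMIT_L2B_CH4PLMMETA_001_20230405T070348_000829.json')

print(plume_cog.rio.resolution())  # ~60 m pixels of methane enhancement (ppm m)
print(plume_vector.columns)        # plume outline, source scenes, max enhancement and uncertainty
```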
The EMITL2BCH4ENH product only includes granules where methane plume complexes have been identified. To reduce the risk of false positives, all EMITL2BCH4ENH data undergo a manual review (or identification and confirmation) process before being designated as a plume complex. For more information on the manual review process, see Section 4.2.2 of the EMIT GHG Algorithm Theoretical Basis Document (ATBD).
References
Thorpe, A. K., et al. Attribution of individual methane and carbon dioxide emission sources using EMIT observations from space. Sci. Adv. 9, eadh2391 (2023). DOI: 10.1126/sciadv.adh2391
Requirements
- Set up Python Environment - See setup_instructions.md in the /setup/ folder
- NASA Earthdata Login Account - Sign Up

Data Used
- EMIT L2B Estimated Methane Plume Complexes (EMITL2BCH4PLM)
- EMIT L2A Estimated Surface Reflectance and Uncertainty and Masks (EMITL2ARFL)

Learning Objectives
- Search for EMIT L2B Estimated Methane Plume Complexes
- Visualize search results on a map
- Retrieve and visualize the EMIT L2B Estimated Methane Plume Complexes metadata
- Select a region of interest and build a timeseries of plume data
- Further investigate plume detection by looking at browse images and quality information
Tutorial Outline
- Setup
- Search for EMIT L2B Estimated Methane Plume Complexes
- Creating a Timeseries from Plume Data
- Calculating the Integrated Mass Enhancement for Plumes
- Further Investigation into Plume Detection
1. Set up
Import the necessary Python libraries.

Log in using `earthaccess` and create a `.netrc` file if necessary. This file will store your NASA Earthdata Login credentials and use them to authenticate when necessary.
```python
earthaccess.login(persist=True)
```
All of the data we use or save will go into the `methane_tutorial` directory, so we can go ahead and define that filepath now, relative to this notebook.
```python
methane_dir = '../../data/methane_tutorial/'
```
2. Search for EMIT L2B Estimated Methane Plume Complexes
Use `earthaccess` to find all EMIT L2B Estimated Methane Plume Complexes (EMITL2BCH4PLM) data available from 2023. Define the date range and concept-ids (unique product identifiers) for the EMIT products we want to search for, but leave spatial arguments like `polygon` and `bbox` empty so we can preview detected methane plumes globally.
```python
# Data Collections for our search, using a dictionary
concept_ids = {'plumes': 'C2748088093-LPCLOUD', 'reflectance': 'C2408750690-LPCLOUD'}
# Define Date Range
date_range = ('2023-01-01', '2023-12-31')

results = earthaccess.search_data(
    concept_id=concept_ids['plumes'],
    temporal=date_range,
    count=2000
)
```
Convert the results to a `geopandas.GeoDataFrame` using a function from our `tutorial_utils` module. This gives a nice way to organize and visualize the search results.
```python
gdf = results_to_geopandas(results)
gdf
```
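The `tutorial_utils` implementation isn't shown in this notebook. As a rough sketch of the idea (assuming, as holds for EMIT granules, that each result's UMM metadata carries a `GPolygons` boundary), a simplified conversion might look like this; the real function also extracts the additional metadata fields discussed below:

```python
from shapely.geometry import Polygon
import geopandas as gpd

def simple_results_to_gdf(results):
    """Minimal stand-in for results_to_geopandas: geometry plus a single field."""
    records = []
    for g in results:
        # earthaccess results behave like dicts of CMR UMM-JSON metadata
        pts = g['umm']['SpatialExtent']['HorizontalSpatialDomain']['Geometry']['GPolygons'][0]['Boundary']['Points']
        records.append({
            'native-id': g['meta']['native-id'],
            'geometry': Polygon([(p['Longitude'], p['Latitude']) for p in pts]),
        })
    return gpd.GeoDataFrame(records, geometry='geometry', crs='EPSG:4326')
```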
The function includes default fields, but you can add more with the `fields` argument. To see all of the metadata available, use the `list_metadata_fields` function imported from the `tutorial_utils.py` module.
```python
list_metadata_fields(results)
```
Add an index column to the dataframe to include it in the tooltips for our visualization.
```python
# Specify index so we can reference it with gdf.explore()
gdf['index'] = gdf.index
```
```python
# Set up Figure and Basemap tiles
fig = Figure(width="1080px", height="540")
map1 = folium.Map(tiles=None)
folium.TileLayer(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', name='Google Satellite', attr='Google', overlay=True).add_to(map1)
folium.TileLayer(tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}.png',
                 name='ESRI World Imagery',
                 attr='Tiles © Esri — Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
                 overlay=True).add_to(map1)
fig.add_child(map1)

# Add Search Results gdf
gdf.explore(
    "_single_date_time",
    categorical=True,
    style_kwds={"fillOpacity": 0.1, "width": 2},
    name="EMIT L2B CH4PLM",
    tooltip=[
        "index",
        "native-id",
        "_single_date_time",
    ],
    m=map1,
    legend=False
)

# Zoom to Data
map1.fit_bounds(bounds=convert_bounds(gdf.unary_union.bounds))
# Add Layer controls
map1.add_child(folium.LayerControl(collapsed=False))
display(fig)
```
In this example we’ll choose a region that looks like it has several plumes emitting from the same source, a landfill in Jordan. We could use the plumes that extend furthest in the cardinal directions to generate a simple bounding box around the region that we can use in our upcoming analysis.
However, to simplify things we will import an existing geojson as a `GeoDataFrame` with a bounding box around this region, because the plume indices may change, which results in inconsistent outputs. A commented-out cell below is included as an example of how the `GeoDataFrame` was created before being written to geojson.
```python
# # Select a list of plumes to create geometry we can use for a spatial subset
# plumes = [146,198,243]
# bbox = gdf.loc[plumes].geometry.unary_union.envelope
# bbox = orient(bbox, sign=1)
# plume_bbox = gpd.GeoDataFrame({"name":['plume_bbox'], "geometry":[bbox]}, crs=gdf.crs)
```
Open the predefined geojson of our plume bounding box as a `GeoDataFrame`, then visualize it on our `folium` figure.
```python
# Open the geojson with our plume bbox
plume_bbox = gpd.read_file(f'{methane_dir}/plume_bbox.geojson')
```
```python
# Set up Figure and Basemap tiles
fig = Figure(width="1080px", height="540")
map1 = folium.Map(tiles=None)
folium.TileLayer(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', name='Google Satellite', attr='Google', overlay=True).add_to(map1)
folium.TileLayer(tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}.png',
                 name='ESRI World Imagery',
                 attr='Tiles © Esri — Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
                 overlay=True).add_to(map1)
fig.add_child(map1)

plume_bbox.explore(
    "name",
    name='Plume BBox',
    style_kwds={"fillOpacity": 0, "width": 2},
    m=map1,
    legend=False)

# Add Search Results gdf
gdf.explore(
    "_single_date_time",
    categorical=True,
    style_kwds={"fillOpacity": 0.1, "width": 2},
    name="EMIT L2B CH4PLM",
    tooltip=[
        "index",
        "native-id",
        "_single_date_time",
    ],
    m=map1,
    legend=False
)

# Zoom to Data
map1.fit_bounds(bounds=convert_bounds(plume_bbox.unary_union.bounds))
# Add Layer controls
map1.add_child(folium.LayerControl(collapsed=False))
display(fig)
```
Subset our geodataframe of global plumes to only those that intersect our bounding box.
```python
plm_gdf = gdf[gdf.geometry.intersects(plume_bbox.geometry[0])]
plm_gdf
```
Let’s examine the `_related_urls` column in our geodataframe. This column contains various links to assets related to each plume.
```python
plm_gdf['_related_urls'].iloc[0]
```
We can write a function to return the asset URL for a given asset and row in our dataframe.
```python
def get_asset_url(row, asset, key='Type', value='GET DATA'):
    """
    Retrieve a url from the list of dictionaries for a row in the _related_urls column.
    Asset examples: CH4PLM, CH4PLMMETA, RFL, MASK, RFLUNCERT
    """
    # Add _ to asset so string matching works
    asset = f"_{asset}_"
    # Retrieve URL matching parameters
    for _dict in row['_related_urls']:
        if _dict.get(key) == value and asset in _dict['URL'].split('/')[-1]:
            return _dict['URL']
```
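For example, the CH4PLM COG URL for the first plume in our subset:

```python
# Example usage: plume COG URL for the first plume in plm_gdf
get_asset_url(plm_gdf.iloc[0], 'CH4PLM')
```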
Write another function to request the JSON metadata for a given plume, using our `get_asset_url` function to select the correct URL from the dataframe.
```python
# Function to fetch CH4 Plume Metadata
# Speed could be improved here by using asyncio/aiohttp
def fetch_ch4_metadata(row):
    response = requests.get(get_asset_url(row, 'CH4PLMMETA'))
    json = response.json()
    return json['features'][0]['properties']
```

```python
fetch_ch4_metadata(plm_gdf.iloc[0])
```
Retrieve additional plume metadata contained in the EMIT L2B Estimated Methane Plume Complexes (EMITL2BCH4PLM) data product, which includes the maximum enhancement value, the uncertainty of the plume complex, and the list of source scenes.
```python
# Apply the function to each row and convert the result to a DataFrame
plm_meta = plm_gdf.apply(fetch_ch4_metadata, axis=1).apply(pd.Series)
plm_meta
```
We can add the points with the highest methane concentration to our visualization. Create an index column, as we did for the plumes, then convert the latitude and longitude of max concentration to a shapely `Point` object and add it to our `GeoDataFrame`.
```python
# Specify index so we can reference it with gdf.explore()
plm_meta['index'] = plm_meta.index
# Add Geometry and convert to geodataframe
plm_meta['geometry'] = plm_meta.apply(lambda row: Point(row['Longitude of max concentration'], row['Latitude of max concentration']), axis=1)
plm_meta = gpd.GeoDataFrame(plm_meta, geometry='geometry', crs='EPSG:4326')
plm_meta
```
Now add this to our visualization.
```python
# Set up Figure and Basemap tiles
fig = Figure(width="1080px", height="540")
map1 = folium.Map(tiles=None)
folium.TileLayer(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', name='Google Satellite', attr='Google', overlay=True).add_to(map1)
folium.TileLayer(tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}.png',
                 name='ESRI World Imagery',
                 attr='Tiles © Esri — Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
                 overlay=True).add_to(map1)
fig.add_child(map1)

plume_bbox.explore(
    "name",
    name='Plume BBox',
    style_kwds={"fillOpacity": 0, "width": 2},
    m=map1,
    legend=False)

# Add Search Results gdf
plm_gdf.explore(
    "index",
    categorical=True,
    style_kwds={"fillOpacity": 0.1, "width": 2},
    name="EMIT L2B CH4PLM",
    tooltip=[
        "index",
        "native-id",
        "_single_date_time",
    ],
    m=map1,
    legend=False
)

plm_meta.explore(
    "index",
    categorical=True,
    style_kwds={"fillOpacity": 0.1, "width": 2},
    name="Location of Max Concentration (ppm m)",
    tooltip=[
        "DAAC Scene Names",
        "UTC Time Observed",
        "Max Plume Concentration (ppm m)",
        "Concentration Uncertainty (ppm m)",
        "Orbit"
    ],
    m=map1,
    legend=False
)

# Zoom to Data
map1.fit_bounds(bounds=convert_bounds(plume_bbox.unary_union.bounds))
# Add Layer controls
map1.add_child(folium.LayerControl(collapsed=False))
display(fig)
```
3. Creating a Timeseries from Plume Data
We can visualize a timeseries of these plumes that appear to be from the same source. To do this we’ll generate a list of the COG URLs for the plumes, then use rioxarray to stream the data and build a timeseries dataset based on the timestamps in the filenames.
Use the `get_asset_url` function to retrieve the CH4PLM asset URLs by applying it to our dataframe and converting the output to a list.
```python
# Iterate over rows in the plm_gdf and get the CH4PLM URLs and store them in a list
plm_urls = plm_gdf.apply(lambda row: get_asset_url(row, asset='CH4PLM'), axis=1).tolist()
plm_urls
```
Now that we have a list of COG URLs, we can set our `gdal` configuration options to pass our NASA Earthdata Login credentials when we access each COG.
```python
# GDAL configurations used to successfully access LP DAAC Cloud Assets via vsicurl
gdal.SetConfigOption('GDAL_HTTP_COOKIEFILE', '~/cookies.txt')
gdal.SetConfigOption('GDAL_HTTP_COOKIEJAR', '~/cookies.txt')
gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN', 'EMPTY_DIR')
gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS', 'TIF')
gdal.SetConfigOption('GDAL_HTTP_UNSAFESSL', 'YES')
gdal.SetConfigOption('GDAL_HTTP_MAX_RETRY', '10')
gdal.SetConfigOption('GDAL_HTTP_RETRY_DELAY', '0.5')
```
To build our timeseries we will start by opening all the necessary data. Loop over our list of URLs, open each plume, merge plumes acquired at the same time, and store them in a dictionary where keys correspond to the acquisition time and values are the plume data in an `xarray.DataArray`.
```python
plm_ts_dict = {}
# Set max retries for vsicurl errors
max_retries = 5
# Iterate over plm urls
for url in plm_urls:
    # retrieve acquisition time from url
    acquisition_time = url.split('/')[-1].split('.')[-2].split('_')[-2]
    # list plumes identified in same scene if there are any
    same_scene = [url for url in plm_urls if acquisition_time in url.split('/')[-1].split('.')[-2].split('_')[-2]]
    to_merge = []
    # prevent duplicate processing of plumes from the same scene
    if acquisition_time not in list(plm_ts_dict.keys()):
        # Open and merge plumes identified from each scene
        for _plm in same_scene:
            print(f"Opening {_plm.split('/')[-1]}")
            # Open COG and squeeze band dimension
            plm = rxr.open_rasterio(_plm).squeeze('band', drop=True)
            # Add to list of plumes to merge
            to_merge.append(plm)
        # Merge plumes and add to timeseries
        plm_ts_dict[acquisition_time] = rxr.merge.merge_arrays(to_merge)
```
Now that we have a plume object for each date in our timeseries, we need to put them all on a common grid so they are spatially aligned before we can stack them along the time dimension.
To do this, we will find the minimum and maximum bounds of each dataarray in our dictionary, then create a common grid to reproject to.
```python
from typing import List

def create_common_grid(data_arrays: List[xr.DataArray]) -> xr.DataArray:
    """
    Create a common grid for a list of xarray DataArrays, matching the resolution of the first data array.
    """
    # Initial Bounds for common grid
    minx = miny = float('inf')
    maxx = maxy = float('-inf')

    for array in data_arrays:
        left, bottom, right, top = array.rio.bounds()
        minx = min(minx, left)
        miny = min(miny, bottom)
        maxx = max(maxx, right)
        maxy = max(maxy, top)

    bounds = (minx, miny, maxx, maxy)

    res = data_arrays[0].rio.resolution()
    crs = data_arrays[0].rio.crs
    nodata = data_arrays[0].rio.nodata

    # Calculate new raster shape using the new extent, maintaining the original resolution
    height = int(np.ceil((bounds[3] - bounds[1]) / abs(res[1])))
    width = int(np.ceil((bounds[2] - bounds[0]) / abs(res[0])))
    data = np.full((height, width), nodata)
    coords = {'y': (['y'], np.arange(bounds[1], bounds[3], abs(res[1]))),
              'x': (['x'], np.arange(bounds[0], bounds[2], abs(res[0])))}
    common_grid = xr.DataArray(data, coords=coords)
    common_grid.rio.write_crs(crs, inplace=True)
    return common_grid
```
```python
common_grid = create_common_grid(list(plm_ts_dict.values()))
common_grid
```
Reproject each of the plume dataarrays to the common grid.
```python
plm_ts_dict = {key: value.rio.reproject_match(common_grid) for key, value in plm_ts_dict.items()}
```
Now that we have all of our plumes on a standard grid, we can concatenate them along a time dimension to create a timeseries. Create an xarray variable called ‘time’ from our dictionary keys, then use `xarray.concat` to concatenate all of our plumes along the time dimension.
```python
plm_time = xr.Variable('time', [datetime.strptime(t, '%Y%m%dT%H%M%S') for t in list(plm_ts_dict.keys())])
plm_time
```
```python
plm_ts_ds = xr.concat(list(plm_ts_dict.values()), dim=plm_time)
plm_ts_ds
```
Set our no-data values to `np.nan` to make sure they are transparent for improved visualization.
```python
plm_ts_ds.data[plm_ts_ds.data == -9999] = np.nan
```
Plot the plume time series.
```python
plm_ts_plot = plm_ts_ds.hvplot.image(x='x', y='y', geo=True, tiles='ESRI', crs='EPSG:4326', cmap='inferno',
                                     clim=(0, np.nanmax(plm_ts_ds.data)), clabel=f'Methane Concentration ({plm_ts_ds.Units})',
                                     frame_width=600, frame_height=600, rasterize=True)
plm_ts_plot
```
4. Calculating the IME for each plume
The integrated mass enhancement (IME) is the total mass of methane present in a plume, calculated by summing the mass of methane in all plume pixels. The IME in kilograms (kg) can be estimated using the equation below, which includes the mixing ratio length per pixel $\alpha_i$ (ppm m) and the pixel area $S_i$ (m²), where 22.4 is the volume of 1 mole of gas at standard temperature and pressure (STP) in liters and 0.01604 is the molar mass of methane in kg/mol:

$$\mathrm{IME\,(kg)} = \sum_{i=1}^{n} \alpha_i \times \frac{1}{10^6} \times S_i \times \frac{1000\,\mathrm{L}}{1\,\mathrm{m^3}} \times \frac{1\,\mathrm{mol}}{22.4\,\mathrm{L}} \times \frac{0.01604\,\mathrm{kg}}{1\,\mathrm{mol}}$$
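As a quick illustrative sanity check of the units (not part of the original notebook): a single 60 m × 60 m pixel with an enhancement of 1000 ppm m holds roughly 2.6 kg of methane.

```python
# 1000 ppm m -> mixing-ratio length (x 1e-6), x 3600 m^2 pixel area = m^3 of CH4,
# x 1000 L/m^3, / 22.4 L/mol at STP, x 0.01604 kg/mol
pixel_kg = 1000 * (1/1e6) * (60*60) * 1000 * (1/22.4) * 0.01604
print(round(pixel_kg, 2))  # ~2.58 kg
```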
The IME can be combined with a plume length and windspeed to estimate emissions in units of kg per hour, but in this tutorial we will not calculate emissions; instead we will keep things simple, calculate the IME for each plume, and observe how these values change over time.
We can write this as a function for each pixel, then apply it to the entire timeseries to calculate the IME for each plume.
```python
def calc_ime(plume_da):
    molar_volume = 22.4  # L/mol at STP
    molar_mass_ch4 = 0.01604  # kg/mol
    # ppm m -> mixing-ratio length (1e-6), 60 m x 60 m pixel area, m^3 -> L (1000)
    kg = plume_da * (1/1e6) * (60*60) * (1000) * (1/molar_volume) * molar_mass_ch4
    ime = np.nansum(kg)
    return ime
```
```python
# Apply the function along the 'x' and 'y' dimensions
ime_ts = xr.apply_ufunc(calc_ime, plm_ts_ds, input_core_dims=[['y', 'x']], vectorize=True)
ime_ts.name = 'value'
```
```python
ime_plot = ime_ts.hvplot.scatter(x='time', y='value', title='Observed Methane IME over 2023', color='black',
                                 xticks=list(ime_ts.time.data), rot=90, grid=True, xlabel='Date Observed', ylabel='kg')
ime_plot
```
5. Further investigation into plume detection
The timeseries shown above doesn’t necessarily give us a full picture of methane emissions at the landfill. In addition to cases where no emissions were observed, gaps in data can result from an absence of observations due to the variable revisit period of the ISS, or from clouds. To add more context to this plume timeseries, we will look at all EMIT acquisitions over the target region and try to determine whether factors were limiting plume detection, or whether there were simply no methane emissions during the other overpasses.
For this search we’ll want to restrict our search to a smaller ROI than our bounding box. We want something smaller, centered on the source of the emission, so we avoid retrieving irrelevant data, for example an overpass barely crossing a corner of our large bounding box. To do this we will use the maximum concentration points from our plumes to create a smaller ROI that is likely to include a methane plume if one is present.
Create a new polygon using the maximum concentration points from our plumes as vertices, orienting the points in counter-clockwise order so they are compatible with an `earthaccess` search. This is just a simple example approach; there are several ways we could define a smaller ROI using the plume data or ancillary information.
```python
max_conc_poly = plm_meta.geometry.unary_union.envelope
max_conc_poly = orient(max_conc_poly, sign=1.0)
max_conc_gdf = gpd.GeoDataFrame({"name": ['max_points'], "geometry": [max_conc_poly]}, crs=plm_meta.crs)
```
```python
roi = list(max_conc_gdf.geometry[0].exterior.coords)
roi
```
Now conduct a search for reflectance data over our ROI and convert the results to a `geopandas.GeoDataFrame`.
```python
rfl_results = earthaccess.search_data(
    concept_id=concept_ids['reflectance'],
    polygon=roi,
    temporal=date_range,  # ('2023-03-01','2023-03-31')
    count=2000
)
rfl_gdf = results_to_geopandas(rfl_results)
rfl_gdf
```
From the results we can see we have 20 scenes that intersect our ROI. We can visualize the footprints of these scenes to gain some insight into coverage over our ROI.
```python
# Set up Figure and Basemap tiles
fig = Figure(width="1080px", height="540")
map1 = folium.Map(tiles=None)
folium.TileLayer(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', name='Google Satellite', attr='Google', overlay=True).add_to(map1)
folium.TileLayer(tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}.png',
                 name='ESRI World Imagery',
                 attr='Tiles © Esri — Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
                 overlay=True).add_to(map1)
fig.add_child(map1)

# Add Reflectance Scenes with no CH4
rfl_gdf.explore(
    color='red',
    style_kwds={"fillOpacity": 0, "width": 2},
    name="Scenes with no CH4 Plumes",
    tooltip=[
        "native-id",
        "_beginning_date_time",
    ],
    m=map1,
    legend=False)

# Add Plume BBox to Map
max_conc_gdf.explore(
    m=map1,
    name='Plumes Bounding Box',
    legend=False)

# Zoom to Data
map1.fit_bounds(bounds=convert_bounds(rfl_gdf.unary_union.bounds))
# Add Layer controls
map1.add_child(folium.LayerControl(collapsed=False))
display(fig)
```
From this we can see that there are several scenes that intersect with our ROI, but likely have no relevant information since they only cover a small portion or corner of the ROI.
We can use a similar process as with the methane product to construct a time series to better understand the data gathered on each overpass. To do this, we will use the browse imagery and the masks included in the EMITL2ARFL product. By default, the mask files and browse images are not orthorectified, so we must do that as part of our workflow.
First, get the URLs for the browse images and masks for each scene in our `rfl_gdf` search results using the `get_asset_url` function.
```python
png_urls = rfl_gdf.apply(lambda row: get_asset_url(row, asset='RFL', value='GET RELATED VISUALIZATION'), axis=1).tolist()
png_urls
```
```python
mask_urls = rfl_gdf.apply(lambda row: get_asset_url(row, asset='MASK'), axis=1).tolist()
mask_urls
```
We can write these to a text file so we don’t need to search again, although we will use the `rfl_gdf` GeoDataFrame later in the tutorial.
```python
# # Save URL List
# with open(f'{methane_dir}rfl_mask_urls.txt', 'w') as f:
#     for line in mask_urls:
#         f.write(f"{line}\n")
```
Since the mask files are not chunked, it’s quicker to download them to do the processing. Log in with `earthaccess` and download these files.
```python
earthaccess.login(persist=True)
# Get requests https Session using Earthdata Login Info
fs = earthaccess.get_requests_https_session()
# Retrieve granule asset ID from URL (to maintain existing naming convention)
for url in mask_urls:
    granule_asset_id = url.split('/')[-1]
    # Define Local Filepath
    fp = f'{methane_dir}{granule_asset_id}'
    # Download the Granule Asset if it doesn't exist
    if not os.path.isfile(fp):
        with fs.get(url, stream=True) as src:
            with open(fp, 'wb') as dst:
                for chunk in src.iter_content(chunk_size=64*1024*1024):
                    dst.write(chunk)
```
For each of these scenes we want to open the EMIT L2A Mask data, then subset spatially and select only the variable we want; in this case we’ll use the `Cloud flag` from the `mask` dataarray. There are other flags, such as a cirrus mask, dilated cloud flag, spacecraft flag, water flag, and aerosol optical depth. We could potentially use some of these as well to inform our decisions about plume detection, but will stick to the `Cloud flag` in this example for simplicity.
As we do this, we will also use the GLT included in the mask file to orthorectify our RGB browse image. We can do this because the browse PNG files are at the native resolution and can be broadcast onto an orthorectified grid using the GLT. We will use the `reproject_match` function to reproject the data onto our `common_grid`, which will automatically clip the data to the `common_grid` extent.
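Conceptually, GLT-based orthorectification is an index lookup: each cell of the GLT grid stores the (sample, line) indices of the raw pixel that belongs at that orthorectified location. Below is a minimal sketch of that core operation, assuming (as in `emit_tools`) 1-based indices with 0 marking empty cells; the real `ortho_browse` and `ortho_xr` functions also attach the georeferencing metadata.

```python
import numpy as np

def apply_glt_sketch(raw: np.ndarray, glt: np.ndarray, fill_value=0):
    """Broadcast a (lines, samples, bands) raw array onto the GLT's (rows, cols) grid."""
    out = np.full((glt.shape[0], glt.shape[1], raw.shape[-1]), fill_value, dtype=raw.dtype)
    valid = np.all(glt != 0, axis=-1)  # cells with a mapped source pixel
    # glt[..., 0] holds sample (x) indices, glt[..., 1] holds line (y) indices, both 1-based
    out[valid] = raw[glt[valid, 1] - 1, glt[valid, 0] - 1, :]
    return out
```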
First, get the filepaths for our downloaded mask data.
```python
# List the downloaded files
fps = glob.glob(f'{methane_dir}*.nc')
# Make sure only files listed are also in the mask_urls list of files downloaded
fns = [os.path.basename(fp) for fp in fps if any(os.path.basename(fp) in url.split('/')[-1] for url in mask_urls)]
fns
```
Create a function to loop through our files, orthorectifying the mask and browse image, clipping and reprojecting to our predefined `common_grid`, and finally saving the outputs as COGs.
```python
def process_scenes(fns, outdir, common_grid):
    """
    This function will process a list of EMIT Mask scenes, selecting the cloud flag, orthorectifying the
    mask and browse image, then reprojecting both to a common grid and saving the outputs.
    """
    for fn in fns:
        # Get Granule Asset ID for First Adjacent Scene (may only be one)
        granule_asset_id = fn.split('.')[-2]
        # Set Output Paths
        outpath_mask = f"{outdir}{granule_asset_id}_cloud_flag.tif"
        outpath_browse = f"{outdir}{granule_asset_id}_ortho_browse.tif"
        # Check if the file exists
        if not os.path.isfile(outpath_mask):
            # Open Mask Dataset
            emit_ds = emit_xarray(f'{methane_dir}{fn}', ortho=False)
            # Retrieve GLT, spatial_ref, and geotransform to use on browse image
            glt = np.nan_to_num(np.stack([emit_ds["glt_x"].data, emit_ds["glt_y"].data], axis=-1), nan=0).astype(int)
            spatial_ref = emit_ds.spatial_ref
            gt = emit_ds.geotransform
            # Select browse image url corresponding to the scene
            png_url = [url for url in png_urls if fn.split('.')[-2].split('_')[-3] in url][0]
            # Orthorectify browse and mask
            rgb = ortho_browse(png_url, glt, spatial_ref, gt, white_background=True)
            emit_ds = ortho_xr(emit_ds)
            # Select only mask array and desired quality flag and reproject to match our chosen extent
            mask_da = emit_ds['mask'].sel(mask_bands='Cloud flag')
            # Drop elevation
            mask_da = mask_da.drop_vars('elev')
            mask_da.name = 'Cloud flag'
            mask_da.data = np.nan_to_num(mask_da.data, nan=-9999)
            mask_da = mask_da.rio.reproject_match(common_grid, nodata=-9999)
            # mask_da.rio.write_nodata(np.nan, inplace=True)
            # Reproject rgb
            rgb = rgb.rio.reproject_match(common_grid, nodata=255)  # 255 for white background
            # Write cog outputs
            mask_da.rio.to_raster(outpath_mask, driver="COG")
            rgb.rio.to_raster(outpath_browse, driver="COG")
```
Run the function.
```python
process_scenes(fns, methane_dir, common_grid)
```
Create a list of the processed files to use in creating the timeseries. We’ll use a similar process to what we did for the plumes, adding a `time` variable to our datasets and concatenating.
```python
mask_files = sorted(glob.glob(f'{methane_dir}*cloud_flag.tif'))
mask_files
```
```python
rgb_files = sorted(glob.glob(f'{methane_dir}*ortho_browse.tif'))
rgb_files
```
Build a time index from the filenames.
```python
def time_index_from_filenames(file_names, datetime_pos):
    """
    Helper function to create a pandas DatetimeIndex
    """
    return [datetime.strptime(f.split('_')[datetime_pos], '%Y%m%dT%H%M%S') for f in file_names]

mask_time = xr.Variable('time', time_index_from_filenames(mask_files, -5))
```
Open and concatenate our datasets along the time dimension, then assign fill values to `np.nan` to make those sections of the data transparent in our visualization.
```python
quality_ts_da = xr.concat([rxr.open_rasterio(f).squeeze('band', drop=True).rio.reproject_match(common_grid) for f in mask_files], dim=mask_time)
```
Create a plot object for the quality timeseries, first setting the quality mask values representing no clouds (0) or no data (-9999) to `np.nan` so they will be transparent in our visualization.
```python
quality_ts_da.data[quality_ts_da.data < 1] = np.nan
quality_ts_map = quality_ts_da.hvplot.image(x='x', y='y', cmap='greys', groupby='time', clim=(0, 1), geo=True, frame_height=400)
```
RGB images are a good way to add something more visually understandable than just the mask layers. Follow the same process as above to build an RGB timeseries, then plot it with the bounding box and plume extents.
```python
rgb_ts_ds = xr.concat([rxr.open_rasterio(f).rio.reproject_match(common_grid) for f in rgb_files], dim=mask_time)
rgb_ts_ds.data[rgb_ts_ds.data == -1] = 255
```
```python
rgb_ts_map = rgb_ts_ds.hvplot.rgb(x='x', y='y', bands='band', groupby='time', geo=True, frame_height=400, crs='EPSG:4326')
```
Lastly, add a new column to our plume geodataframe named `time` so we are using the same naming convention and can layer our plume polygons from our filtered search on top of our quality data.
```python
plm_gdf['time'] = pd.to_datetime(plm_gdf.loc[:, '_single_date_time'])
```
Because we’ve set our RGB to display as white where there is no data, and our cloud mask as transparent where no clouds are present, areas with no data over our ROI appear white/transparent and cloudy areas appear black. The larger black lines/rectangles are the result of onboard cloud masking, where no data was downlinked due to a high volume of clouds detected.
```python
rgb_ts_map * quality_ts_map * plm_gdf.hvplot(groupby='time', geo=True, line_color='red', fill_color=None) * max_conc_gdf.hvplot(color='red', crs='EPSG:4326', fill_color=None, line_color='yellow')
```
With this information, we can build a dataframe assigning a category to each of the dates where there was no plume detected and add this information to our IME timeseries figure.
Create a dataframe of dates where no plume was detected by removing the timestamps in our `plm_time` array from our `mask_time` array.
```python
no_plm_time = np.setdiff1d(mask_time.data, plm_time.data)
no_plm_time
```
Build a dataframe out of these dates where there were no plumes detected.
```python
# Build a dataframe
no_plm_df = pd.DataFrame({'time': no_plm_time})
no_plm_df
```
Next, categorize them based on the visualizations we made and remove any times where there wasn’t a good observation of our area of interest.
```python
# add a category column to describe the observation
no_plm_df['category'] = 'no_data'
no_plm_df.loc[[0, 4], 'category'] = 'cloud'
no_plm_df.loc[7, 'category'] = 'no_plume'
```
```python
no_plm_df = no_plm_df[~no_plm_df['category'].str.contains('no_data')]
no_plm_df['time'] = pd.to_datetime(no_plm_df['time'])
no_plm_df.reset_index(drop=True, inplace=True)
no_plm_df
```
We can now merge our IME timeseries dataframe with this one on the `time` column, adding these categorized observations to our timeseries.
```python
ime_df = pd.merge(ime_ts.to_dataframe(), no_plm_df, on=['time'], how='outer')
ime_df = ime_df.sort_values(by='time')
ime_df['category'] = ime_df['category'].fillna('plume')
ime_df['value'] = ime_df['value'].fillna(0)
ime_df
```
We can now plot this IME timeseries alongside our plume timeseries.
```python
ime_timeline = ime_df.hvplot.scatter(x='time', y='value', by='category', size=100, xlabel='Date Observed', ylabel='kg',
                                     title='Observed Methane IME over 2023', rot=90, grid=True, frame_height=400,
                                     frame_width=800, ylim=(0, 11000), xticks=list(ime_df.time))

ime_timeline.opts(legend_position='bottom') + plm_ts_plot.opts(title='Methane Plumes Observed', frame_height=400, frame_width=400)
```
Contact Info:
Email: LPDAAC@usgs.gov
Voice: +1-866-573-3222
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹
Website: https://lpdaac.usgs.gov/
Date last modified: 10-25-2024
¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.