NetCDF to CSV Transformation

Published

October 27, 2025

Summary

This notebook demonstrates an approach for converting a Network Common Data Form (netCDF4) file from the Multi-Angle Imager for Aerosols (MAIA) instrument into a comma-separated values (CSV) file.

Outline

The NetCDF file is opened using nc.Dataset in read mode.
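A dictionary called data_dict is created to store data from each variable. The read_variables function populates the dictionary, recursing into subgroups and flattening any multi-dimensional variable into a one-dimensional array.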
The maximum length among the variable data arrays is determined.
Variables with shorter lengths are extended to the maximum length using numpy.resize, which fills the extra positions by repeating the array's values from the start rather than inserting missing values. The extended arrays are stored back in data_dict.
A pandas DataFrame called data is created using the data_dict dictionary, where each variable becomes a column in the DataFrame.
The DataFrame is saved to a CSV file. The index=False argument is used to exclude the row index from being written to the CSV file.
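
The length-matching step above is worth checking on a small array, because numpy.resize repeats existing values instead of marking the extra rows as missing. The sketch below contrasts the two behaviours. The helper pad_with_nan is hypothetical and not part of this notebook; it assumes numeric data and a target length at least as large as the input.

```python
import numpy as np

short = np.array([1.0, 2.0, 3.0])

# np.resize fills the extra slots by repeating the array from the start
repeated = np.resize(short, 7)
print(repeated)  # [1. 2. 3. 1. 2. 3. 1.]


def pad_with_nan(values, length):
    """Hypothetical helper: extend a 1-D numeric array to `length` with NaN.

    Assumes length >= values.size.
    """
    values = np.asarray(values, dtype=float)
    padded = np.full(length, np.nan)
    padded[: values.size] = values
    return padded


# The first three values are kept; the remaining four entries are NaN
padded = pad_with_nan(short, 7)
print(padded)
```

With numpy.resize, the repeated values in the CSV are indistinguishable from real measurements. NaN padding may be preferable when downstream users need to tell them apart, since pandas writes NaN as an empty field by default.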

Prerequisites

netCDF4 - for reading netCDF files
numpy - for array operations
pandas - for data manipulation

Notebook Author / Affiliation

Hazem Mahmoud / Atmospheric Science Data Center

1. Setup

import netCDF4 as nc
import pandas as pd
import numpy as np

2. NetCDF conversion to CSV

def read_variables(group):
    """
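    Recursively read the variables of a NetCDF group and its subgroups.

    Parameters:
        group: an open netCDF4 Dataset or Group.

    Returns:
        a dictionary mapping variable names to their data arrays
        (multi-dimensional data is flattened). If a subgroup variable shares
        a name with one already read, the subgroup's data replaces it.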
    """
    data_dict = {}

    # Iterate over variables in the current group
    for var in group.variables:
        var_data = group.variables[var][:]

        # Flatten the variable data if it has more than one dimension
        if len(var_data.shape) > 1:
            var_data = var_data.flatten()

        data_dict[var] = var_data

    # Iterate over subgroups in the current group
    for subgroup in group.groups.values():
        subgroup_data = read_variables(subgroup)
        data_dict.update(subgroup_data)

    return data_dict

def netcdf_to_csv(input_file, output_file):
    # Open the NetCDF file in read mode; the with-block closes it afterwards
    with nc.Dataset(input_file, "r") as dataset:
        # Create a dictionary to store the variable data
        data_dict = read_variables(dataset)

    # Find the maximum length among the variable data
    max_length = max(var_data.size for var_data in data_dict.values())

    # Extend shorter variables to the maximum length
    # (np.resize repeats the existing values; it does not insert NaN)
    for var in data_dict:
        var_data = data_dict[var]
        if var_data.size < max_length:
            var_data = np.resize(var_data, max_length)
            data_dict[var] = var_data

    # Create a pandas DataFrame using the data dictionary
    data = pd.DataFrame(data_dict)

    # Save the DataFrame to a CSV file
    data.to_csv(output_file, index=False)

    print("Conversion complete.")


# Usage example
input_file = "/content/MAIA_L4_GFPM_20180101T000000Z_FB_NOM_R01_USA-Boston_F01_VSIM01p01p01p01.nc"
output_file = "/content/output.csv"

netcdf_to_csv(input_file, output_file)
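
Because read_variables merges subgroup dictionaries with update, two variables that share a name in different groups end up in a single column, with the later one winning. One possible way around this, sketched below, is to key each variable by its group path. The function read_variables_qualified and the stand-in objects are hypothetical and not part of this notebook; the stand-ins only mimic the .variables and .groups attributes of a netCDF4 group so the sketch runs without a real file.

```python
from types import SimpleNamespace

import numpy as np


def read_variables_qualified(group, prefix=""):
    """Hypothetical variant of read_variables that keys each variable by
    its group path (e.g. "sub/value") so equal names do not collide."""
    data_dict = {}
    for name, variable in group.variables.items():
        # [:] reads the whole variable; np.ravel flattens it to 1-D
        data_dict[prefix + name] = np.ravel(variable[:])
    for sub_name, subgroup in group.groups.items():
        data_dict.update(
            read_variables_qualified(subgroup, f"{prefix}{sub_name}/")
        )
    return data_dict


# Stand-in objects (assumption): same attribute names as a netCDF4 group
child = SimpleNamespace(
    variables={"value": np.array([[1, 2], [3, 4]])}, groups={}
)
root = SimpleNamespace(
    variables={"value": np.array([9, 8])}, groups={"sub": child}
)

columns = read_variables_qualified(root)
print(sorted(columns))  # ['sub/value', 'value']
```

With a real file, the opened nc.Dataset object would be passed in place of root, and the resulting dictionary could be used in netcdf_to_csv in the same way as the output of read_variables.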