This notebook demonstrates an approach to converting a Network Common Data Form (netCDF4) file from the MAIA mission into a Comma Separated Value (CSV) file.
Outline
The NetCDF file is opened using nc.Dataset in read mode, and the resulting dataset object is stored in the dataset variable.
A dictionary called data_dict is created to store the variable data. The read_variables function is called with the dataset object as the input to populate the dictionary with variable data.
The maximum length among the variable data arrays is determined using a generator expression and the max function.
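Variables with shorter lengths are extended to the maximum length using np.resize, which fills the extra positions by repeating the array's values from the start rather than inserting missing values. The updated variable data arrays are stored back in the data_dict dictionary.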
A pandas DataFrame called data is created using the data_dict dictionary, where each variable becomes a column in the DataFrame.
The DataFrame is saved to a CSV file using the to_csv method of the DataFrame. The index=False argument is used to exclude the row index from being written to the CSV file.
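The last four steps of the outline can be tried on a toy dictionary before touching a real file. The sketch below is only an illustration: the variable names and values are made up, and the CSV is returned as a string instead of being written to disk. It shows that np.resize extends a shorter array by repeating it, and that index=False keeps the row index out of the output.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the arrays that would be read from a NetCDF file.
data_dict = {
    "time": np.array([0, 1, 2, 3]),
    "site_id": np.array([7, 8]),
}

# Longest array determines the number of rows.
max_length = max(values.size for values in data_dict.values())

# np.resize repeats each shorter array from its start until it has max_length items.
resized = {name: np.resize(values, max_length) for name, values in data_dict.items()}
print(resized["site_id"])  # [7 8 7 8]

# Each dictionary key becomes a column.
data = pd.DataFrame(resized)

# With no path given, to_csv returns the CSV text; index=False omits the row index.
csv_text = data.to_csv(index=False)
print(csv_text)
```

Because the repeated values look like real measurements in the CSV, it is worth checking which columns were shorter than the rest before interpreting the output.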
Setup
Libraries are installed and then imported:
- netCDF4 for reading NetCDF files
- pandas for data manipulation
- numpy for array operations
!pip install netCDF4 pandas
Successfully installed cftime-1.6.2 netCDF4-1.6.4
import netCDF4 as nc
import pandas as pd
import numpy as np
NetCDF conversion to CSV
def netcdf_to_csv(input_file, output_file):
    """
    Main function that performs the conversion from NetCDF to CSV.

    Parameters:
        input_file (the path to the NetCDF file)
        output_file (the desired path for the output CSV file).
    """
    dataset = nc.Dataset(input_file)  # Open the NetCDF file

    # Get the variable names from the dataset
    variables = dataset.variables.keys()

    # Create a dictionary to store the variable data
    data_dict = {}
    max_length = 0  # Track the maximum length among all variables

    for var in variables:
        var_data = dataset.variables[var][:]

        # Flatten the variable data if it has more than one dimension
        if len(var_data.shape) > 1:
            var_data = var_data.flatten()

        # Update the maximum length if needed
        max_length = max(max_length, var_data.size)
        data_dict[var] = var_data

    # Pad variables with shorter lengths to match the maximum length
    for var in data_dict:
        var_data = data_dict[var]
        if var_data.size < max_length:
            var_data = np.resize(var_data, max_length)
            data_dict[var] = var_data

    # Create a pandas DataFrame using the data dictionary
    data = pd.DataFrame(data_dict)

    # Save the DataFrame to a CSV file
    data.to_csv(output_file, index=False)
    print("Conversion complete.")


# Usage example
input_file = "/content/MAIA_L4_GFPM_20180101T000000Z_FB_NOM_R01_USA-Boston_F01_VSIM01p01p01p01.nc"
output_file = "/content/output.csv"
netcdf_to_csv(input_file, output_file)
Conversion complete.
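The function above extends shorter variables with np.resize, which repeats their values. If empty cells are preferred for the missing rows, one possible alternative is to wrap each array in a pd.Series and let pandas align the columns. This is a sketch under assumptions, not the notebook's method: the arrays are made up and assumed to be plain one-dimensional NumPy arrays.

```python
import numpy as np
import pandas as pd

# Made-up arrays of unequal length (assumed plain 1-D NumPy arrays).
data_dict = {
    "time": np.array([0.0, 1.0, 2.0, 3.0]),
    "site_id": np.array([7.0, 8.0]),
}

# Each Series gets a default 0..n-1 index. When the DataFrame is built,
# pandas aligns the columns on the union of those indexes and fills the
# positions a shorter Series does not cover with NaN.
data = pd.DataFrame({name: pd.Series(values) for name, values in data_dict.items()})

# NaN is written as an empty field by default.
csv_text = data.to_csv(index=False)
print(csv_text)
```

The trade-off is that integer columns padded this way are converted to floating point, since NaN has no integer representation in a standard NumPy-backed column.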
NetCDF conversion to CSV with flat groups
def read_variables(group):
    """
    Recursive function to read variables from a NetCDF group and its subgroups.

    Parameters:
        group

    Returns:
        a dictionary containing variable names as keys and their
        corresponding data arrays as values.
    """
    data_dict = {}

    # Iterate over variables in the current group
    for var in group.variables:
        var_data = group.variables[var][:]

        # Flatten the variable data if it has more than one dimension
        if len(var_data.shape) > 1:
            var_data = var_data.flatten()

        data_dict[var] = var_data

    # Iterate over subgroups in the current group
    for subgroup in group.groups.values():
        subgroup_data = read_variables(subgroup)
        data_dict.update(subgroup_data)

    return data_dict


def netcdf_to_csv(input_file, output_file):
    dataset = nc.Dataset(input_file, "r")  # Open the NetCDF file in read mode

    # Create a dictionary to store the variable data
    data_dict = read_variables(dataset)

    # Find the maximum length among the variable data
    max_length = max(var_data.size for var_data in data_dict.values())

    # Pad variables with shorter lengths to match the maximum length
    for var in data_dict:
        var_data = data_dict[var]
        if var_data.size < max_length:
            var_data = np.resize(var_data, max_length)
            data_dict[var] = var_data

    # Create a pandas DataFrame using the data dictionary
    data = pd.DataFrame(data_dict)

    # Save the DataFrame to a CSV file
    data.to_csv(output_file, index=False)
    print("Conversion complete.")


# Usage example
input_file = "/content/MAIA_L4_GFPM_20180101T000000Z_FB_NOM_R01_USA-Boston_F01_VSIM01p01p01p01.nc"
output_file = "/content/output.csv"
netcdf_to_csv(input_file, output_file)
Conversion complete.
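Because read_variables merges each subgroup's variables with data_dict.update, two variables that share a name in different groups end up in one column, with the later one overwriting the earlier. A possible variation is to prefix each key with the group path. The sketch below is hypothetical: read_variables_qualified is not part of the notebook, and SimpleNamespace objects with made-up arrays stand in for netCDF4 groups so that it runs without a data file. It relies only on the variables and groups mappings that the notebook's code already uses.

```python
from types import SimpleNamespace

import numpy as np


def read_variables_qualified(group, prefix=""):
    """Hypothetical variant of read_variables that keys each variable by its group path."""
    data = {}

    # Variables in the current group, flattened to one dimension.
    for name, var in group.variables.items():
        data[prefix + name] = np.ravel(var[:])

    # Recurse into subgroups, extending the prefix with the subgroup name.
    for sub_name, subgroup in group.groups.items():
        data.update(read_variables_qualified(subgroup, f"{prefix}{sub_name}/"))

    return data


# Stand-ins for netCDF4 groups (assumption: only .variables and .groups are needed).
group_a = SimpleNamespace(variables={"value": np.array([1.0, 2.0])}, groups={})
group_b = SimpleNamespace(variables={"value": np.array([[3.0, 4.0], [5.0, 6.0]])}, groups={})
root = SimpleNamespace(variables={"time": np.array([0, 1])}, groups={"a": group_a, "b": group_b})

result = read_variables_qualified(root)
print(sorted(result))  # ['a/value', 'b/value', 'time']
```

With the original read_variables, the same structure would produce a single "value" column holding only the data from the last subgroup visited.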
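Both versions of the conversion flatten multi-dimensional variables, which drops each value's position in the original grid. If that position matters, one option is to write the indices as extra columns. This is a minimal sketch with a made-up 2 x 3 array; the column names row, col, and value are illustrative and do not come from the MAIA file.

```python
import numpy as np
import pandas as pd

# Made-up 2 x 3 grid standing in for a two-dimensional NetCDF variable.
grid = np.array([[10.0, 11.0, 12.0],
                 [20.0, 21.0, 22.0]])

# np.indices returns one index array per dimension, each with the grid's shape.
rows, cols = np.indices(grid.shape)

# Flatten all three arrays in the same (row-major) order so they stay aligned.
table = pd.DataFrame({
    "row": rows.ravel(),
    "col": cols.ravel(),
    "value": grid.ravel(),
})
print(table)
```

Each row of the resulting table identifies one grid cell, so the original array can be rebuilt from the CSV if needed.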