Data Functionalization¶

This notebook shows how to use the Functionalize_Dataset function to functionalize a gridded dataset, optionally with a custom interpolator. The first cell prints the function's docstring, which describes the inputs and the output.

How to functionalize data¶

The example below builds the two input dictionaries (coordinates and variables) for a pair of example datasets with seven dimensions. Any number of dimensions can be functionalized.

In [1]:

            
from kamodo_ccmc.tools.functionalize import Functionalize_Dataset
help(Functionalize_Dataset)

Help on function Functionalize_Dataset in module kamodo_ccmc.tools.functionalize:

Functionalize_Dataset(coord_dict, data_dict, kamodo_object=None, coord_str='', func=None, func_default='data')
    Determine and call the correct functionalize routine.
    Inputs:
        coord_dict: a dictionary containing the coordinate information.
            {'name_of_coord1': {'units': 'coord1_units', 'data': coord1_data},
             'name_of_coord2': {'units': 'coord2_units', 'data': coord2_data},
             etc...}
            coordX_data should be a 1D array. All others should be strings.
        data_dict: a dictionary containing the data information.
            {'variable_name1': {'units': 'data1_units', 'data': data1_array},
             'variable_name2': {'units': 'data2_units', 'data': data2_array},
             etc...}
            dataX_array should have the same shape as
                (coord1, coord2, coord3, ..., coordN)
        Note: The datasets given in the data_dict dictionary should all have the
            same dimensions. Datasets with different dimensions can be
            functionalized by simply calling the function again with the other
            dataset and the associated coordinate arrays. The datasets must
            also EACH depend upon ALL of the coordinate arrays given.
        coord_str: a string indicating the coordinate system of the data
            (e.g. "SMcar" or "GEOsph").
        kamodo_object: the previously created kamodo object. If one is not
            given, then one will be created.
        func: the function to be used for interpolation through the given
            datasets. The function must accept values for interpolation in an
            identical call structure as SciPy's RegularGridInterpolator or
            interp1d. See SciPy's documentation for more information.
        func_default: a string indicating whether a custom interpolation
            method is desired. The default is 'data', indicating that the
            standard interpolation method will be used. Set this to 'custom' to
            indicate that func is a custom interpolator.
    
    Output: A kamodo object with the functionalized dataset.
    
    This is similar to RU.Functionalize_Dataset, except only the gridded
        interpolator is registered.
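
The shape requirement in the docstring (each data array must match the lengths of the coordinate arrays, in order) can be checked before calling Functionalize_Dataset. The sketch below is one way to do that; `check_shapes` is a hypothetical helper written for this page and is not part of kamodo_ccmc.

```python
import numpy as np


def check_shapes(coord_dict, data_dict):
    """Raise ValueError if a data array does not match the coordinate grid.

    Hypothetical helper for illustration; not part of kamodo_ccmc.
    """
    # The expected shape follows the order of the coordinates in coord_dict.
    expected = tuple(len(c['data']) for c in coord_dict.values())
    for name, var in data_dict.items():
        actual = np.shape(var['data'])
        if actual != expected:
            raise ValueError(
                f"{name}: data shape {actual} does not match "
                f"coordinate shape {expected}")
    return expected


coords = {'time': {'units': 'hr', 'data': np.linspace(0., 24., 25)},
          'lon': {'units': 'deg', 'data': np.linspace(-180., 180., 12)}}
good = {'A': {'units': 'S', 'data': np.zeros((25, 12))}}
bad = {'B': {'units': 'S', 'data': np.zeros((12, 25))}}  # axes swapped

print(check_shapes(coords, good))  # (25, 12)
try:
    check_shapes(coords, bad)
except ValueError as err:
    print(err)
```

A check like this catches swapped axes early, before the mismatch surfaces inside the interpolator.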

In [2]:

            
# Example of functionalizing a 7D array
import numpy as np
rng1 = np.random.RandomState(1)  # Seed the random generators differently
rng2 = np.random.RandomState(2)  # or the arrays created below will be identical.
coord_dict = {'time': {'units': 'hr', 'data': np.linspace(0., 24., 25)},
              'lon': {'units': 'deg', 'data': np.linspace(-180., 180., 12)},
              'lat': {'units': 'deg', 'data': np.linspace(-90., 90., 5)},
              'radius': {'units': 'R_E', 'data': np.linspace(0., 50., 10)},
              'nonsense': {'units': 'm/m', 'data': np.linspace(1., 15., 15)},
              'nope': {'units': 'm', 'data': np.linspace(1., 150., 25)},
              'nada': {'units': 'hPa', 'data': np.linspace(0.00005, 15000., 20)}}
var_dict = {'Test_7D': {'units': 'S', 'data': rng1.rand(25, 12, 5, 10, 15, 25, 20)},
            'Good_7D': {'units': 'mK', 'data': rng2.rand(25, 12, 5, 10, 15, 25, 20)}}
kamodo_object = Functionalize_Dataset(coord_dict, var_dict)
kamodo_object

Out[2]:

\begin{equation}\operatorname{Test_{7D}}(time[hr],lon[deg],lat[deg],radius[R_{E}],nonsense[1],nope[m],nada[hPa])[S] = \lambda{\left(time,lon,lat,radius,nonsense,nope,nada \right)}\end{equation} \begin{equation}\operatorname{Good_{7D}}(time[hr],lon[deg],lat[deg],radius[R_{E}],nonsense[1],nope[m],nada[hPa])[mK] = \lambda{\left(time,lon,lat,radius,nonsense,nope,nada \right)}\end{equation}
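
The 7D example arrays are larger than their coordinate lengths suggest. The quick estimate below assumes 8-byte float64 values, which is what `RandomState.rand` returns; on a machine with limited memory, shorter coordinate arrays can be used instead.

```python
import numpy as np

# Coordinate lengths used in the 7D example above.
lengths = (25, 12, 5, 10, 15, 25, 20)

n_values = int(np.prod(lengths, dtype=np.int64))
n_bytes = n_values * np.dtype(np.float64).itemsize

print(n_values)       # 112500000 values per array
print(n_bytes / 1e6)  # 900.0 MB per array
```

With two such arrays in var_dict, the example holds roughly 1.8 GB of data in memory.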

Generating a generic 1D Plot¶

Plot a 1D slice of all the variables by choosing a slice value in all but one dimension.

kamodo_object.plot('Test_7D', 'Good_7D', plot_partial={
    'Test_7D': {'time': 12., 'lon': 0.5, 'lat': -20., 'radius': 15., 'nonsense': 11.5, 'nope': 5.},
    'Good_7D': {'time': 12., 'lon': 0.5, 'lat': -20., 'radius': 15., 'nonsense': 11.5, 'nope': 5.}})

Screenshot
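
To make the idea of a 1D slice concrete, the sketch below builds the list of positions along such a slice: every fixed coordinate is held constant and the one free coordinate runs over its grid values. `slice_points_1d` is a hypothetical helper and does not reproduce how Kamodo evaluates plot_partial internally.

```python
import numpy as np


def slice_points_1d(coord_dict, fixed):
    """Return (free_name, points) for a 1D slice through a gridded dataset.

    Hypothetical helper for illustration. points has shape (n, n_dims),
    with one column per coordinate in coord_dict order.
    """
    names = list(coord_dict)
    free = [name for name in names if name not in fixed]
    if len(free) != 1:
        raise ValueError(f"expected exactly one free coordinate, got {free}")
    free_vals = np.asarray(coord_dict[free[0]]['data'], dtype=float)
    columns = [free_vals if name == free[0]
               else np.full(free_vals.size, fixed[name])
               for name in names]
    return free[0], np.column_stack(columns)


coords = {'time': {'units': 'hr', 'data': np.linspace(0., 24., 25)},
          'lon': {'units': 'deg', 'data': np.linspace(-180., 180., 12)},
          'lat': {'units': 'deg', 'data': np.linspace(-90., 90., 5)}}
name, points = slice_points_1d(coords, {'time': 12., 'lon': 0.5})
print(name, points.shape)  # lat (5, 3)
print(points[0])           # first position: time=12, lon=0.5, lat=-90
```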

Generating a generic 2D Plot¶

Plot a 2D slice of one variable by choosing a slice value in all but two dimensions.

kamodo_object.plot('Test_7D', plot_partial={
    'Test_7D': {'time': 12., 'lon': 0.5, 'lat': -20., 'radius': 15., 'nonsense': 11.5}})

Screenshot
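
The number of coordinates left out of a plot_partial entry determines the kind of plot: one free coordinate gives a line, two give a 2D plot. The sketch below lists the free coordinates for a given entry and rejects misspelled names; `free_coordinates` is a hypothetical helper, not a Kamodo function.

```python
def free_coordinates(coord_names, partial):
    """List the coordinates that a plot_partial entry leaves free.

    Hypothetical helper for illustration; not part of Kamodo.
    """
    unknown = sorted(set(partial) - set(coord_names))
    if unknown:
        raise KeyError(f"not coordinates of this variable: {unknown}")
    return [name for name in coord_names if name not in partial]


coord_names = ['time', 'lon', 'lat', 'radius', 'nonsense', 'nope', 'nada']
one_d = {'time': 12., 'lon': 0.5, 'lat': -20., 'radius': 15.,
         'nonsense': 11.5, 'nope': 5.}
two_d = {'time': 12., 'lon': 0.5, 'lat': -20., 'radius': 15.,
         'nonsense': 11.5}

print(free_coordinates(coord_names, one_d))  # ['nada']
print(free_coordinates(coord_names, two_d))  # ['nope', 'nada']
```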

Adding new functionalized datasets to a kamodo object¶

In [3]:

            
# You can add datasets of other dimensions to the same kamodo_object.
coord_dict = {'time': {'units': 'hr', 'data': np.linspace(0., 24., 25)}}
var_dict = {'Test_1D': {'units': 'S', 'data': rng1.rand(25)},
            'Good_1D': {'units': 'mK', 'data': rng2.rand(25)}}
kamodo_object = Functionalize_Dataset(coord_dict, var_dict, kamodo_object)
kamodo_object

Out[3]:

\begin{equation}\operatorname{Test_{7D}}(time[hr],lon[deg],lat[deg],radius[R_{E}],nonsense[1],nope[m],nada[hPa])[S] = \lambda{\left(time,lon,lat,radius,nonsense,nope,nada \right)}\end{equation} \begin{equation}\operatorname{Good_{7D}}(time[hr],lon[deg],lat[deg],radius[R_{E}],nonsense[1],nope[m],nada[hPa])[mK] = \lambda{\left(time,lon,lat,radius,nonsense,nope,nada \right)}\end{equation} \begin{equation}\operatorname{Test_{1D}}(time[hr])[S] = \lambda{\left(time \right)}\end{equation} \begin{equation}\operatorname{Good_{1D}}(time[hr])[mK] = \lambda{\left(time \right)}\end{equation}

You can plot all of the functions on the same plot as long as the independent variable is the same (time in this example).

kamodo_object.plot('Test_1D', 'Good_1D', 'Test_7D', 'Good_7D', plot_partial={
    'Test_7D': {'lon': 0.5, 'lat': -20., 'radius': 15., 'nonsense': 11.5, 'nope': 5., 'nada': 12.},
    'Good_7D': {'lon': 0.5, 'lat': -20., 'radius': 15., 'nonsense': 11.5, 'nope': 5., 'nada': 12.}})

Screenshot

In [10]:

            
# You can even use a custom interpolator for a new dataset added to the same kamodo_object.
# The interpolator must be defined separately for each dataset.
coord_dict = {'time': {'units': 'hr', 'data': np.linspace(0., 24., 25)},
              'lon': {'units': 'deg', 'data': np.linspace(-180., 180., 12)},
              'lat': {'units': 'deg', 'data': np.linspace(-90., 90., 5)}}
var_dict = {'TestCustomA_3D': {'units': 'S', 'data': rng1.rand(25, 12, 5)},
            'TestCustomB_3D': {'units': 'm/s', 'data': rng2.rand(25, 12, 5)*-2.}}

# Define a custom interpolator (simple example)
# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.RegularGridInterpolator.html
from scipy.interpolate import RegularGridInterpolator as RGI
coord_list = [value['data'] for value in coord_dict.values()]

def make_interp(rgi):
    # Bind this variable's interpolator now. A function defined directly in the
    # loop would look up rgi at call time and always use the last one created.
    def interp(xvec):
        return rgi(xvec)
    return interp

for key in var_dict.keys():
    rgi = RGI(coord_list, var_dict[key]['data'], bounds_error=False,
              fill_value=-10., method='nearest')
    tmp_dict = {key: var_dict[key]}  # construct a separate dictionary for the current variable
    kamodo_object = Functionalize_Dataset(coord_dict, tmp_dict, kamodo_object,
                                          func=make_interp(rgi), func_default='custom')
kamodo_object

Out[10]:

\begin{equation}\operatorname{Test_{7D}}(time[hr],lon[deg],lat[deg],radius[R_{E}],nonsense[1],nope[m],nada[hPa])[S] = \lambda{\left(time,lon,lat,radius,nonsense,nope,nada \right)}\end{equation} \begin{equation}\operatorname{Good_{7D}}(time[hr],lon[deg],lat[deg],radius[R_{E}],nonsense[1],nope[m],nada[hPa])[mK] = \lambda{\left(time,lon,lat,radius,nonsense,nope,nada \right)}\end{equation} \begin{equation}\operatorname{Test_{1D}}(time[hr])[S] = \lambda{\left(time \right)}\end{equation} \begin{equation}\operatorname{Good_{1D}}(time[hr])[mK] = \lambda{\left(time \right)}\end{equation} \begin{equation}\operatorname{TestCustomA_{3D}}(time[hr],lon[deg],lat[deg])[S] = \lambda{\left(time,lon,lat \right)}\end{equation} \begin{equation}\operatorname{TestCustomB_{3D}}(time[hr],lon[deg],lat[deg])[m / s] = \lambda{\left(time,lon,lat \right)}\end{equation}
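
The docstring requires a custom interpolator to accept positions in the same call structure as SciPy's RegularGridInterpolator. As a reminder of what that structure looks like, the sketch below calls one directly on a small, made-up 2D grid: the input is an array of shape (n_points, n_dims) and the output has one value per point. The grid and values here are illustrative only.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

time = np.array([0., 1., 2.])
lon = np.array([0., 10.])
# data[i, j] is the value at (time[i], lon[j]).
data = np.arange(6, dtype=float).reshape(3, 2)

rgi = RegularGridInterpolator((time, lon), data, method='nearest',
                              bounds_error=False, fill_value=-10.)

points = np.array([[0.9, 1.0],   # nearest node: time=1, lon=0  -> data[1, 0]
                   [2.0, 9.0],   # nearest node: time=2, lon=10 -> data[2, 1]
                   [5.0, 0.0]])  # outside the time range -> fill_value
values = rgi(points)
print(values)  # [  2.   5. -10.]
```

With bounds_error=False, positions outside the grid return fill_value instead of raising an error, which is why the custom example above yields -10 outside the coordinate ranges.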

Plot a 2D slice of one of the custom-interpolated variables by fixing the time, which leaves lon and lat free.

kamodo_object.plot('TestCustomB_3D', plot_partial={'TestCustomB_3D':{'time': 12.56}})

Screenshot

Metadata functions¶

In [6]:

            
# Access the metadata
kamodo_object['Test_1D'].meta

Out[6]:

{'units': 'S',
 'arg_units': {'time': 'hr'},
 'citation': None,
 'equation': None,
 'hidden_args': []}

In [7]:

            
# Add to the metadata
kamodo_object['Test_1D'].meta['description'] = 'Testing the functionalize.py script'
kamodo_object['Test_1D'].meta['citation'] = 'Ringuette et al. 2022'
kamodo_object['Test_1D'].meta

Out[7]:

{'units': 'S',
 'arg_units': {'time': 'hr'},
 'citation': 'Ringuette et al. 2022',
 'equation': None,
 'hidden_args': [],
 'description': 'Testing the functionalize.py script'}

In [8]:

            
# See a pandas format output
kamodo_object.detail()

Out[8]:

	symbol	units	lhs	rhs	arg_units
Test_7D	Test_7D(time, lon, lat, radius, nonsense, nope...	S	Test_7D	lambda(time, lon, lat, radius, nonsense, nope,...	{'time': 'hr', 'lon': 'deg', 'lat': 'deg', 'ra...
Good_7D	Good_7D(time, lon, lat, radius, nonsense, nope...	mK	Good_7D	lambda(time, lon, lat, radius, nonsense, nope,...	{'time': 'hr', 'lon': 'deg', 'lat': 'deg', 'ra...
Test_1D	Test_1D(time)	S	Test_1D	lambda(time)	{'time': 'hr'}
Good_1D	Good_1D(time)	mK	Good_1D	lambda(time)	{'time': 'hr'}
TestCustomA_3D	TestCustomA_3D(time, lon, lat)	S	TestCustomA_3D	lambda(time, lon, lat)	{'time': 'hr', 'lon': 'deg', 'lat': 'deg'}
TestCustomB_3D	TestCustomB_3D(time, lon, lat)	m/s	TestCustomB_3D	lambda(time, lon, lat)	{'time': 'hr', 'lon': 'deg', 'lat': 'deg'}

In [9]:

            
# Determine the dependent coordinates and the coordinate ranges
import kamodo_ccmc.flythrough.model_wrapper as MW
MW.Coord_Range(kamodo_object, ['Test_7D'])

The minimum and maximum values for each variable and coordinate are:
Test_7D:
time: [0.0, 24.0, 'hr']
lon: [-180.0, 180.0, 'deg']
lat: [-90.0, 90.0, 'deg']
radius: [0.0, 50.0, 'R_E']
nonsense: [1.0, 15.0, 'm/m']
nope: [1.0, 150.0, 'm']
nada: [5e-05, 15000.0, 'hPa']
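
The same ranges can be read directly from the coordinate arrays when the original coord_dict is still available. The sketch below is only a cross-check under that assumption; `coord_ranges` is a hypothetical helper and does not reflect how MW.Coord_Range is implemented.

```python
import numpy as np


def coord_ranges(coord_dict):
    """Return {name: [min, max, units]} for each coordinate array.

    Hypothetical helper for illustration; not part of kamodo_ccmc.
    """
    return {name: [float(np.min(c['data'])), float(np.max(c['data'])),
                   c['units']]
            for name, c in coord_dict.items()}


coords = {'time': {'units': 'hr', 'data': np.linspace(0., 24., 25)},
          'nada': {'units': 'hPa', 'data': np.linspace(0.00005, 15000., 20)}}
ranges = coord_ranges(coords)
for name, value in ranges.items():
    print(f"{name}: {value}")
```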