Uncertain Data#
The progpy.uncertain_data package includes classes for representing data with uncertainty. All types of UncertainData can be operated on using the interface. Individual classes for representing uncertain data of different kinds are described below, in Implemented UncertainData Types.
Interface#
- class progpy.uncertain_data.UncertainData(_type=<class 'dict'>)#
Abstract base class for data with uncertainty. Any new uncertainty type must implement this class.
- abstract property cov#
The covariance matrix of the UncertainData distribution or samples, in order of keys (i.e., cov[1][1] is the variance for key keys()[1])
- Returns
Covariance matrix
- Return type
np.array[np.array[float]]
Example
covariance_matrix = data.cov
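To make the layout of the covariance matrix concrete, here is a minimal standard-library sketch. It is not ProgPy's implementation: the helper name `sample_cov`, the choice of the n-1 denominator, and the sample values are all assumptions made for illustration. It shows that the diagonal entries are the per-key variances and that rows and columns follow the key order.

```python
from statistics import mean


def sample_cov(samples: dict) -> list:
    """Sample covariance matrix (n-1 denominator), rows/columns in key order.

    Hypothetical helper for illustration only; not part of ProgPy.
    """
    keys = list(samples)
    n = len(samples[keys[0]])
    means = {k: mean(samples[k]) for k in keys}
    return [
        [
            sum((a - means[ki]) * (b - means[kj])
                for a, b in zip(samples[ki], samples[kj])) / (n - 1)
            for kj in keys
        ]
        for ki in keys
    ]


# Hypothetical samples for two keys, where y is exactly 2 * x
samples = {'x': [1.0, 2.0, 3.0, 4.0], 'y': [2.0, 4.0, 6.0, 8.0]}
cov = sample_cov(samples)

# cov[0][0] is the variance of 'x', cov[1][1] the variance of 'y',
# and cov[0][1] == cov[1][0] is the covariance between them.
```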
- describe(title: str = 'UncertainData Metrics', print: bool = True) → collections.defaultdict#
Print and view basic statistical information about this UncertainData object in a text-based printed table.
- Parameters
title (str) – Title of the table, printed before data rows.
print (bool, optional) – Whether to print the table. Defaults to True.
- Returns
- defaultdict
Dictionary of lists used to print metrics.
Example
data.describe()
- abstract keys()#
Get the keys for the property represented
Example
keys = data.keys()
- abstract property mean#
The mean of the UncertainData distribution or samples
Example
mean_value = data.mean
- abstract property median#
The median of the UncertainData distribution or samples
Example
median_value = data.median
- metrics(**kwargs) → dict#
Calculate metrics for this distribution
- Keyword Arguments
- Returns
Dictionary of metrics
- Return type
Example
print(data.metrics())
m = data.metrics(ground_truth={'key1': 200, 'key2': 350})
m = data.metrics(keys=['key1', 'key3'])
- percentage_in_bounds(bounds: tuple, keys: list = None, n_samples: int = 1000) → dict#
Calculate the percentage of the distribution that is within the specified bounds
- Parameters
- Returns
Percentage within bounds for each key in keys (where 0.5 = 50%), e.g., {'key1': 1, 'key2': 0.75}
- Return type
Example
data.percentage_in_bounds((1025, 1075))
data.percentage_in_bounds({'key1': (1025, 1075), 'key2': (2520, 2675)})
data.percentage_in_bounds((1025, 1075), keys=['key1', 'key3'])
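The idea behind this calculation can be sketched on plain sample lists with the standard library. This is an illustrative approximation, not ProgPy's code: the function name `fraction_in_bounds`, the inclusive bound comparison, and the sample values are assumptions. As in the examples above, bounds may be a single (lower, upper) tuple applied to every key or a dict of per-key tuples.

```python
def fraction_in_bounds(samples: dict, bounds, keys=None) -> dict:
    """Fraction of samples per key inside (lower, upper), bounds inclusive.

    Hypothetical helper for illustration only; not part of ProgPy.
    """
    keys = list(samples) if keys is None else keys
    result = {}
    for k in keys:
        # Accept one tuple for all keys, or a dict mapping key -> tuple
        lower, upper = bounds[k] if isinstance(bounds, dict) else bounds
        values = samples[k]
        result[k] = sum(lower <= v <= upper for v in values) / len(values)
    return result


# Hypothetical samples for two keys
samples = {
    'key1': [1030, 1050, 1070, 1100],
    'key2': [2500, 2600, 2650, 2700],
}

same_bounds = fraction_in_bounds(samples, (1025, 1075), keys=['key1'])
per_key = fraction_in_bounds(
    samples, {'key1': (1025, 1075), 'key2': (2520, 2675)})
```

Here `same_bounds` is `{'key1': 0.75}` and `per_key` is `{'key1': 0.75, 'key2': 0.5}`, following the convention that 0.5 means 50%.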
- plot_hist(fig=None, keys=None, num_samples=100, **kwargs)#
Create a histogram
- Parameters
Example
from progpy.uncertain_data import MultivariateNormalDist
m = [5, 7, 3]
c = [[0.3, 0.5, 0.1], [0.6, 0.7, 1e-9], [1e-9, 1e-10, 1]]
d = MultivariateNormalDist(['a', 'b', 'c'], m, c)
d.plot_hist()  # With 100 samples
d.plot_hist(num_samples=20)  # Specifying the number of samples to plot
d.plot_hist(keys=['a', 'b'])  # Only plot those keys
- plot_scatter(fig: matplotlib.figure.Figure = None, keys: list = None, num_samples: int = 100, **kwargs) → matplotlib.figure.Figure#
Produce a scatter plot
- Parameters
fig (Figure, optional) – Existing figure previously used to plot states. If passed a figure argument additional data will be added to the plot. Defaults to creating new figure
keys (list[str], optional) – Keys to plot. Defaults to all keys.
num_samples (int, optional) – Number of samples to plot. Defaults to 100
**kwargs (optional) – Additional keyword arguments passed to scatter function.
- Returns
Figure
Example
from progpy.uncertain_data import MultivariateNormalDist
m = [5, 7, 3]
c = [[0.3, 0.5, 0.1], [0.6, 0.7, 1e-9], [1e-9, 1e-10, 1]]
d = MultivariateNormalDist(['a', 'b', 'c'], m, c)
d.plot_scatter()  # With 100 samples
d.plot_scatter(num_samples=5)  # Specifying the number of samples to plot
d.plot_scatter(keys=['a', 'b'])  # Only plot those keys
- relative_accuracy(ground_truth: dict) → dict#
The relative accuracy measures how close the mean of the distribution is to the ground truth, in relative terms:
\(RA = 1 - \dfrac{\| r-p \|}{r}\)
where r is the ground truth and p is the mean of the predicted distribution [0].
- Returns
Relative accuracy for each event where value is relative accuracy between [0,1]
- Return type
Example
ra = data.relative_accuracy({'key1': 22, 'key2': 57})
References
- [0] Prognostics: The Science of Making Predictions (Goebel et al., 239)
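As a worked instance of the relative accuracy formula, the sketch below applies RA = 1 - |r - p| / r per key using plain dictionaries. The function name `relative_accuracy_sketch` and the predicted means are hypothetical; the ground-truth values match the example call above. It illustrates the formula only and is not ProgPy's implementation.

```python
def relative_accuracy_sketch(predicted_mean: dict, ground_truth: dict) -> dict:
    """Apply RA = 1 - |r - p| / r for each key in ground_truth.

    Hypothetical helper for illustration only; not part of ProgPy.
    r is the ground truth and p is the mean of the predicted distribution.
    """
    return {
        key: 1 - abs(r - predicted_mean[key]) / r
        for key, r in ground_truth.items()
    }


# Hypothetical predicted means, compared against the ground truth
predicted_mean = {'key1': 20.0, 'key2': 60.0}
ra = relative_accuracy_sketch(predicted_mean, {'key1': 22, 'key2': 57})

# key1: 1 - |22 - 20| / 22 = 1 - 2/22
# key2: 1 - |57 - 60| / 57 = 1 - 3/57
```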
- abstract sample(nSamples: int = 1)#
Generate samples from data
- Parameters
nSamples (int, optional) – Number of samples to generate. Defaults to 1.
- Returns
Array of nSamples samples
- Return type
samples (UnweightedSamples)
Example
samples = data.sample(100)
Implemented UncertainData Types#
- class progpy.uncertain_data.UnweightedSamples(samples: list = [], _type=<class 'dict'>)#
Uncertain Data represented by a set of samples. Objects of this class can be treated like a list where samples[n] returns the nth sample (Dict).
- Parameters
samples (array, dict, or model.*Container, optional) – Array of samples. Defaults to empty array.
If dict, must be of the form of {key: [value, …], …}
If list, must be of the form of [{key: value, …}, …]
If InputContainer, OutputContainer, or StateContainer, must be of the form of *Container({'key': value, …})
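The dict and list forms accepted above carry the same information in two layouts: one list of values per key, or one dict per sample. The following standard-library sketch converts between them. The helper names `to_list_of_dicts` and `to_dict_of_lists` are hypothetical and are not ProgPy functions; the sketch only illustrates the two shapes.

```python
def to_list_of_dicts(samples: dict) -> list:
    """Convert {key: [value, ...], ...} into [{key: value, ...}, ...].

    Hypothetical helper for illustration only; not part of ProgPy.
    """
    keys = list(samples)
    columns = (samples[k] for k in keys)
    return [dict(zip(keys, row)) for row in zip(*columns)]


def to_dict_of_lists(samples: list) -> dict:
    """Convert [{key: value, ...}, ...] into {key: [value, ...], ...}.

    Hypothetical helper for illustration only; not part of ProgPy.
    """
    return {k: [s[k] for s in samples] for k in samples[0]}


# Hypothetical data: three samples of keys 'a' and 'b'
by_key = {'a': [1, 2, 3], 'b': [4, 5, 6]}
by_sample = to_list_of_dicts(by_key)
round_trip = to_dict_of_lists(by_sample)
```

In the list form, `by_sample[n]` is the nth sample as a dict, which mirrors how samples[n] is described for this class.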
- class progpy.uncertain_data.MultivariateNormalDist(labels, mean: numpy.array, covar: numpy.array, _type=<class 'dict'>)#
Data represented by a multivariate normal distribution with the given mean and covariance matrix.
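To show what drawing samples from a labeled multivariate normal involves, here is a standard-library sketch based on a Cholesky factorization. It is not how ProgPy is implemented, which this page does not describe. The function names `cholesky` and `sample_mvn`, the seed, and the two-key mean and covariance are assumptions for illustration. The covariance matrix must be symmetric and positive definite for the factorization to succeed.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L such that L times its transpose equals cov."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sample_mvn(labels, mean, cov, n_samples, rng):
    """Draw n_samples dicts {label: value} with x = mean + L z, z ~ N(0, I).

    Hypothetical helper for illustration only; not part of ProgPy.
    """
    L = cholesky(cov)
    draws = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in labels]
        x = [m + sum(L[i][k] * z[k] for k in range(i + 1))
             for i, m in enumerate(mean)]
        draws.append(dict(zip(labels, x)))
    return draws


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = sample_mvn(['a', 'b'], [5.0, 7.0],
                   [[0.3, 0.1], [0.1, 0.7]], 2000, rng)
```

Each element of `draws` is a dict keyed by label, the same per-sample shape described for UnweightedSamples.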