Uncertain Data

The progpy.uncertain_data package includes classes for representing data with uncertainty. All UncertainData types share the common interface described under Interface. The individual classes for representing different kinds of uncertain data are described below, in Implemented UncertainData Types.

Interface

class progpy.uncertain_data.UncertainData(_type=<class 'dict'>)

Abstract base class for data with uncertainty. Any new uncertainty type must subclass this class and implement its abstract members.

abstract property cov

The covariance matrix of the UncertainData distribution or samples, in order of keys (i.e., cov[1][1] is the variance for key keys()[1]).

Returns

Covariance matrix

Return type

np.array[np.array[float]]

Example

covariance_matrix = data.cov
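To make the key ordering and the meaning of the diagonal concrete, here is a standalone sketch in plain Python that does not use progpy. The helper name `sample_covariance`, the sample values, and the n - 1 denominator are assumptions made for illustration; the source does not state which estimator a given UncertainData type uses.

```python
from statistics import mean


def sample_covariance(samples, keys):
    """Sample covariance matrix (n - 1 denominator, an assumption).

    Rows and columns follow the order of `keys`.
    """
    n = len(samples)
    means = {k: mean(s[k] for s in samples) for k in keys}
    return [
        [
            sum((s[a] - means[a]) * (s[b] - means[b]) for s in samples) / (n - 1)
            for b in keys
        ]
        for a in keys
    ]


# Hypothetical samples, one dict per sample
samples = [
    {"x": 1.0, "y": 2.0},
    {"x": 2.0, "y": 4.0},
    {"x": 3.0, "y": 9.0},
]
keys = ["x", "y"]
cov = sample_covariance(samples, keys)

# cov[i][i] is the variance of keys[i]; cov[i][j] is the covariance
# between keys[i] and keys[j], so the matrix is symmetric.
print(cov)
```

In this sketch the diagonal entries are the per-key variances; taking their square roots would give the standard deviations.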
describe(title: str = 'UncertainData Metrics', print: bool = True) → collections.defaultdict

Print basic statistical information about this UncertainData object as a text-based table, and return the values used to build it.

Parameters
  • title (str) – Title of the table, printed before the data rows.

  • print (bool, optional) – Whether to print the table. Defaults to True.

Returns

defaultdict

Dictionary of lists used to print metrics.

Example

data.describe()
abstract keys()

Get the keys for the property represented

Returns

keys

Return type

list[str]

Example

keys = data.keys()
abstract property mean

The mean of the UncertainData distribution or samples

Returns

Mean value, e.g., {'key1': 23.2, …}

Return type

dict[str, float]

Example

mean_value = data.mean
abstract property median

The median of the UncertainData distribution or samples

Returns

Median value, e.g., {'key1': 23.2, …}

Return type

dict[str, float]

Example

median_value = data.median
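As an illustration of the return shape of mean and median (one value per key), the following standalone sketch computes both from a list of sample dicts using only the Python standard library. It does not call progpy, and the sample values are hypothetical.

```python
from statistics import mean, median

# Hypothetical samples, one dict per sample
samples = [
    {"key1": 1.0, "key2": 10.0},
    {"key1": 2.0, "key2": 20.0},
    {"key1": 9.0, "key2": 60.0},
]
keys = list(samples[0])

# One entry per key, matching the dict[str, float] return type described above
mean_value = {k: mean(s[k] for s in samples) for k in keys}
median_value = {k: median(s[k] for s in samples) for k in keys}

print(mean_value)
print(median_value)
```

With skewed samples like these, the mean and median differ for each key, which is why both are exposed.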
metrics(**kwargs) → dict

Calculate metrics for this distribution.

Keyword Arguments
  • ground_truth (int or dict, optional) – Ground truth value. Defaults to None.

  • n_samples (int, optional) – Number of samples to use for calculating metrics (used when the data is not already UnweightedSamples)

  • keys (list[str], optional) – Keys to calculate metrics for. Defaults to all keys.

Returns

Dictionary of metrics

Return type

dict

Example

print(data.metrics())
m = data.metrics(ground_truth={'key1': 200, 'key2': 350})
m = data.metrics(keys=['key1', 'key3'])
percentage_in_bounds(bounds: tuple, keys: list = None, n_samples: int = 1000) → dict

Calculate the percentage of the distribution that lies within the specified bounds.

Parameters
  • bounds (tuple[float, float] or dict) –

    Lower and upper bounds.

    if tuple: (lower, upper)

    if dict: {key: (lower, upper), …}

  • keys (list[str], optional) – UncertainData keys to consider when calculating. Defaults to all keys.

  • n_samples (int, optional) – Number of samples to use when calculating

Returns

Fraction within bounds for each key in keys, where 0.5 means 50%, e.g., {'key1': 1, 'key2': 0.75}

Return type

dict

Example

data.percentage_in_bounds((1025, 1075))
data.percentage_in_bounds({'key1': (1025, 1075), 'key2': (2520, 2675)})
data.percentage_in_bounds((1025, 1075), keys=['key1', 'key3'])
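The calculation described above can be sketched in plain Python on a list of sample dicts. This is not progpy's implementation: the helper name `fraction_in_bounds`, the inclusive bounds, and the sample values are assumptions for illustration.

```python
def fraction_in_bounds(samples, bounds, keys=None):
    """Fraction of samples within bounds, per key.

    `bounds` is either a (lower, upper) tuple applied to every key,
    or a dict {key: (lower, upper)}. Bounds are treated as inclusive
    here, which is an assumption.
    """
    if keys is None:
        keys = list(samples[0].keys())
    if not isinstance(bounds, dict):
        bounds = {k: bounds for k in keys}
    result = {}
    for k in keys:
        lower, upper = bounds[k]
        inside = sum(1 for s in samples if lower <= s[k] <= upper)
        result[k] = inside / len(samples)
    return result


# Hypothetical samples
samples = [
    {"key1": 1030, "key2": 2500},
    {"key1": 1050, "key2": 2600},
    {"key1": 1100, "key2": 2650},
    {"key1": 1060, "key2": 2700},
]

same_bounds = fraction_in_bounds(samples, (1025, 1075))
per_key_bounds = fraction_in_bounds(
    samples, {"key1": (1025, 1075), "key2": (2520, 2675)}
)
one_key = fraction_in_bounds(samples, (1025, 1075), keys=["key1"])
```

For a distribution that is not already sample-based, n_samples controls how many samples are drawn before a count like this is taken.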
plot_hist(fig=None, keys=None, num_samples=100, **kwargs)

Create a histogram

Parameters
  • fig (Matplotlib Figure, optional) – Existing histogram figure to be overwritten. Defaults to creating a new figure.

  • num_samples (int, optional) – Number of samples to plot. Defaults to 100

  • keys (list[str], optional) – Keys to be plotted. Defaults to None.

Example

from progpy.uncertain_data import MultivariateNormalDist

m = [5, 7, 3]
c = [[0.3, 0.5, 0.1], [0.6, 0.7, 1e-9], [1e-9, 1e-10, 1]]
d = MultivariateNormalDist(['a', 'b', 'c'], m, c)
d.plot_hist() # With 100 samples
d.plot_hist(num_samples=20) # Specifying the number of samples to plot
d.plot_hist(keys=['a', 'b']) # Only plot those keys
plot_scatter(fig: matplotlib.figure.Figure = None, keys: list = None, num_samples: int = 100, **kwargs) → matplotlib.figure.Figure

Produce a scatter plot

Parameters
  • fig (Figure, optional) – Existing figure previously used to plot states. If a figure is passed, the new data is added to that plot. Defaults to creating a new figure

  • keys (list[str], optional) – Keys to plot. Defaults to all keys.

  • num_samples (int, optional) – Number of samples to plot. Defaults to 100

  • **kwargs (optional) – Additional keyword arguments passed to scatter function.

Returns

Figure

Example

from progpy.uncertain_data import MultivariateNormalDist

m = [5, 7, 3]
c = [[0.3, 0.5, 0.1], [0.6, 0.7, 1e-9], [1e-9, 1e-10, 1]]
d = MultivariateNormalDist(['a', 'b', 'c'], m, c)
d.plot_scatter() # With 100 samples
d.plot_scatter(num_samples=5) # Specifying the number of samples to plot
d.plot_scatter(keys=['a', 'b']) # Only plot those keys
relative_accuracy(ground_truth: dict) → dict

The relative accuracy is how close the mean of the distribution is to the ground truth, in relative terms:

\(RA = 1 - \dfrac{\| r-p \|}{r}\)

where r is the ground truth and p is the mean of the predicted distribution [0].

Returns

Relative accuracy for each event, where each value lies in [0, 1]

Return type

dict[str, float]

Example

ra = data.relative_accuracy({'key1': 22, 'key2': 57})
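As a worked instance of the formula above, the following standalone sketch applies RA = 1 - |r - p| / r per key in plain Python. The function here is a hypothetical stand-in written for illustration, not progpy's method, and it assumes scalar, positive ground-truth values.

```python
def relative_accuracy(ground_truth, mean):
    """RA = 1 - |r - p| / r for each key.

    r is the ground truth and p is the predicted mean.
    Assumes each r is a positive scalar.
    """
    return {k: 1 - abs(r - mean[k]) / r for k, r in ground_truth.items()}


# Hypothetical values: key1 is off by 5 out of 20, key2 is exact
ground_truth = {"key1": 20, "key2": 50}
predicted_mean = {"key1": 25, "key2": 50}

ra = relative_accuracy(ground_truth, predicted_mean)
# key1: 1 - 5/20 = 0.75, key2: 1 - 0/50 = 1.0
print(ra)
```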

References

[0] Prognostics: The Science of Making Predictions (Goebel et al., 239)

abstract sample(nSamples: int = 1)

Generate samples from data

Parameters

nSamples (int, optional) – Number of samples to generate. Defaults to 1.

Returns

Array of nSamples samples

Return type

UnweightedSamples

Example

samples = data.sample(100)

Implemented UncertainData Types

class progpy.uncertain_data.UnweightedSamples(samples: list = [], _type=<class 'dict'>)

Uncertain data represented by a set of samples. Objects of this class can be treated like a list, where samples[n] returns the nth sample (a dict).

Parameters

samples (array, dict, or model.*Container, optional) –

Array of samples. Defaults to an empty array.

If dict, must be of the form {key: [value, …], …}

If list, must be of the form [{key: value, …}, …]

If InputContainer, OutputContainer, or StateContainer, must be of the form *Container({'key': value, …})
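The dict and list forms above carry the same information in different layouts. The following standalone sketch converts the dict-of-lists form into the list-of-dicts form in plain Python. The helper name `columns_to_rows` and the values are hypothetical, and the sketch does not show how UnweightedSamples stores samples internally.

```python
def columns_to_rows(columns):
    """Convert {key: [value, ...], ...} into [{key: value, ...}, ...]."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


# Dict form: one list of values per key
columns = {"a": [1, 2, 3], "b": [4, 5, 6]}

# List form: one dict per sample
rows = columns_to_rows(columns)
print(rows)       # all samples
print(rows[1])    # the sample at index 1, as a dict
```

In the list form, indexing by position gives a single sample as a dict, which mirrors the list-like behavior described for this class.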

key(key) → list

Return samples for given key

Parameters

key (str) – Key to return samples for

Returns

list of values for given key

Return type

list
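Conceptually, key(key) gathers one value per sample for the requested key. The following standalone sketch shows the equivalent operation on a plain list of sample dicts. The helper `values_for_key` is a hypothetical stand-in, not the library method.

```python
def values_for_key(samples, key):
    """Collect the value stored under `key` from every sample, in order."""
    return [sample[key] for sample in samples]


# Hypothetical samples, one dict per sample
samples = [{"a": 1, "b": 4}, {"a": 2, "b": 5}, {"a": 3, "b": 6}]

a_values = values_for_key(samples, "a")
print(a_values)
```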