`cape.statutils`: Statistics tools

This module provides shorthand calls to statistical functions from scipy.stats. Its primary purpose is to calculate coverage ranges, at 99% or any other fraction, for two data sets.

This module depends on scipy.stats from the SciPy package. To make sure SciPy is installed, even without root privileges on your system, run:

pip install --user --upgrade scipy

This module is not a general-purpose statistical toolkit and does not wrap a complete package like scipy.stats. Instead, it provides a small set of tools that are frequently needed when handling aerosciences databases but are rarely found in standard statistics libraries.

cape.statutils.check_outliers(dx, cov=None, **kw)

Find outliers in a data set

Call:

>>> I = check_outliers(dx, cov, **kw)

Inputs:

dx: np.ndarray[float]: Array of signed deltas
cov, Coverage: {None} | 0 < float < 1: Strict coverage fraction
ksig, CoverageSigma: {None} | float: Number of standard deviations to cover (default based on cov; user must supply either cov or ksig or both)
cdf, CoverageCDF: {cov} | 0 < float < 1: Fraction to use to define ksig
osig, OutlierSigma: {1.5*ksig} | float: Multiple of standard deviation to identify outliers

Outputs:

I: np.ndarray[bool]: Flags for non-outlier cases, False if case is an outlier

Versions:

2019-02-04 @ddalle: Version 1.0
2021-09-20 @ddalle: Version 1.1
- use _parse_options()
- allow 100% coverage
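
As a rough illustration of the osig idea described above, the following minimal sketch flags entries that lie within a chosen multiple of the sample standard deviation from the sample mean. It is not the implementation of check_outliers(): the helper name flag_non_outliers, the choice of the sample mean as the center, and the example value osig=2.5 are all assumptions made for illustration.

```python
import numpy as np


def flag_non_outliers(dx, osig):
    """Hypothetical helper: True where an entry is NOT an outlier."""
    dx = np.asarray(dx, dtype=float)
    # Center and spread of the signed deltas
    mu = np.mean(dx)
    sigma = np.std(dx, ddof=1)
    # An entry counts as an outlier if it is more than osig*sigma from the mean
    return np.abs(dx - mu) <= osig * sigma


# Ten small deltas and one obviously bad point
dx = np.array([0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.1, -0.15, 0.05, 5.0])
mask = flag_non_outliers(dx, osig=2.5)
# Keep only the cases that were not flagged
dx_clean = dx[mask]
```

Note that a single large outlier inflates the standard deviation, so in small samples a large osig may fail to flag anything; this is one reason the multiplier is a user option.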

cape.statutils.check_outliers_range(R, cov=None, **kw)

Find outliers in an array of ranges

Call:

>>> I = check_outliers_range(R, cov, **kw)

Inputs:

R: np.ndarray[float]: Array of ranges (unsigned deltas)
cov, Coverage: {None} | 0 < float < 1: Strict coverage fraction
ksig, CoverageSigma: {None} | float: Number of standard deviations to cover (default based on cov; user must supply either cov or ksig or both)
cdf, CoverageCDF: {cov} | 0 < float < 1: Fraction to use to define ksig
osig, OutlierSigma: {1.5*ksig} | float: Multiple of standard deviation to identify outliers

Outputs:

I: np.ndarray[bool]: Flags for non-outlier cases, False if case is an outlier

Versions:

2021-02-20 @ddalle: Version 1.0

cape.statutils.get_cov_interval(dx, cov=None, **kw)

Calculate Student’s t-distribution confidence range

If the nominal application of the Student’s t-distribution fails to cover a high enough fraction of the data, the bounds are extended until cov (user-defined fraction) of the data is covered.

Call:

>>> a, b = get_cov_interval(dx, cov, **kw)

Inputs:

dx: np.ndarray[float]: Array of signed deltas
cov, Coverage: {None} | 0 < float < 1: Strict coverage fraction
ksig, CoverageSigma: {None} | float: Number of standard deviations to cover (default based on cov; user must supply either cov or ksig or both)
cdf, CoverageCDF: {cov} | 0 < float < 1: Fraction to use to define ksig
osig, OutlierSigma: {1.5*ksig} | float: Multiple of standard deviation to identify outliers

Outputs:

a: float: Lower bound of coverage interval
b: float: Upper bound of coverage interval

Versions:

2019-02-04 @ddalle: Version 1.0
2021-09-20 @ddalle: Version 1.1
- use _parse_options()
- allow 100% coverage
- remove confusing kcov vs ksig scaling
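
The "extend the bounds until cov of the data is covered" behavior described above can be sketched as follows. This is a simplified stand-in under stated assumptions, not the code behind get_cov_interval(): the helper name widen_to_cover is hypothetical, the interval is assumed to be symmetric about the sample mean, and the nominal half-width is taken to be ksig sample standard deviations rather than a value from the Student's t-distribution.

```python
import numpy as np


def widen_to_cover(dx, cov, ksig):
    """Hypothetical helper: interval about the mean covering >= cov of dx."""
    dx = np.asarray(dx, dtype=float)
    mu = dx.mean()
    sigma = dx.std(ddof=1)
    # Nominal half-width (assumption: ksig standard deviations)
    w_nominal = ksig * sigma
    # Smallest half-width about mu that contains at least cov of the samples
    dev = np.sort(np.abs(dx - mu))
    n_needed = min(int(np.ceil(cov * dx.size)), dx.size)
    w_required = dev[n_needed - 1]
    # Extend the nominal bounds only if they cover too little of the data
    w = max(w_nominal, w_required)
    return mu - w, mu + w


rng = np.random.default_rng(0)
dx = rng.standard_normal(200)
# A one-sigma nominal interval is too narrow for 99% coverage, so it is widened
a, b = widen_to_cover(dx, cov=0.99, ksig=1.0)
```

With ksig=1.0 the nominal interval covers only about two thirds of normally distributed data, so the returned bounds are set by the coverage requirement rather than by the nominal width.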

cape.statutils.get_coverage(dx, cov=None, **kw)

Calculate Student’s t-distribution confidence range

If the nominal application of the Student’s t-distribution fails to cover a high enough fraction of the data, the bounds are extended until cov (user-defined fraction) of the data is covered.

Call:

>>> width = get_coverage(dx, cov, **kw)

Inputs:

dx: np.ndarray[float]: Array of signed deltas
cov, Coverage: {None} | 0 < float < 1: Strict coverage fraction
ksig, CoverageSigma: {None} | float: Number of standard deviations to cover (default based on cov; user must supply either cov or ksig or both)
cdf, CoverageCDF: {cov} | 0 < float < 1: Fraction to use to define ksig
osig, OutlierSigma: {1.5*ksig} | float: Multiple of standard deviation to identify outliers

Outputs:

width: float: Half-width of confidence region

Versions:

2019-02-04 @ddalle: Version 1.0
2021-09-20 @ddalle: Version 1.1
- use _parse_options()
- allow 100% coverage
- remove confusing kcov vs ksig scaling
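
The cov, cdf, and ksig options listed above are linked through a cumulative distribution function. The sketch below shows that link for a Gaussian distribution using only the Python standard library. It is an illustration under that Gaussian assumption; the module itself relies on scipy.stats and may use the Student's t-distribution instead, and the helper names ksig_from_cdf and cdf_from_ksig are hypothetical.

```python
from statistics import NormalDist


def ksig_from_cdf(cdf):
    """Number of standard deviations whose two-sided Gaussian coverage is cdf."""
    # Two-sided coverage cdf leaves (1 - cdf)/2 in each tail
    return NormalDist().inv_cdf(0.5 + 0.5 * cdf)


def cdf_from_ksig(ksig):
    """Two-sided Gaussian coverage fraction within +/- ksig standard deviations."""
    return 2.0 * NormalDist().cdf(ksig) - 1.0


ksig99 = ksig_from_cdf(0.99)   # roughly 2.576 standard deviations
cov3 = cdf_from_ksig(3.0)      # roughly 0.9973, the familiar three-sigma fraction
```

Under this assumption, asking for 99% coverage and asking for about 2.576 standard deviations are equivalent requests, which is why supplying either cov or ksig can be enough.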

cape.statutils.get_ordered_lower(V, cov)

Calculate value less than fraction cov of V’s values

Call:

>>> v = get_ordered_lower(V, cov)

Inputs:

V: np.ndarray[float]: Array of scalar values
cov: float: Coverage fraction, 0 < cov <= 1

Outputs:

v: float: Value such that cov*V.size entries in V are greater than or equal to v; may be interpolated between sorted values of V

Versions:

2021-09-30 @ddalle: Version 1.0
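
A value of this kind can be approximated with a plain NumPy quantile. The sketch below is an assumption-based analogue, not the implementation of get_ordered_lower(): the helper name ordered_lower_sketch is hypothetical, and NumPy's default linear interpolation may differ from the interpolation rule used in this module.

```python
import numpy as np


def ordered_lower_sketch(V, cov):
    """Hypothetical helper: value with about cov of V's entries at or above it."""
    V = np.asarray(V, dtype=float)
    # A lower bound for fraction cov is the (1 - cov) quantile
    return np.quantile(V, 1.0 - cov)


V = np.arange(1.0, 11.0)          # the values 1.0, 2.0, ..., 10.0
v = ordered_lower_sketch(V, 0.9)  # interpolated between 1.0 and 2.0
n_above = int(np.sum(V >= v))     # number of entries at or above v
```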

cape.statutils.get_ordered_stats(V, cov=None, onesided=False, **kw)

Calculate coverage using ordered statistics

Call:

>>> vmin, vmax = get_ordered_stats(V, cov)
>>> vmin, vmax = get_ordered_stats(V, **kw)
>>> vlim = get_ordered_stats(V, cov, onesided=True)
>>> vlim = get_ordered_stats(V, onesided=True, **kw)

Inputs:

V: np.ndarray[float]: Array of scalar values
cov: {None} | float: Coverage fraction, 0 < cov <= 1
onesided: True | {False}: Option to find coverage of one-sided distribution
ksig: {None} | float: Option to calculate cov based on Gaussian distribution
tsig: {None} | float: Option to calculate cov based on Student’s t-distribution

Outputs:

vmin: float: Lower limit of two-sided coverage interval
vmax: float: Upper limit of two-sided coverage interval
vlim: float: Upper limit of one-sided coverage interval

Versions:

2021-09-30 @ddalle: Version 1.0
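
The difference between the two-sided and one-sided results can be illustrated with NumPy quantiles. This is a minimal sketch, not the implementation of get_ordered_stats(): the helper name ordered_interval is hypothetical, the two-sided interval is assumed to leave equal fractions in both tails, and the ksig and tsig options are not modeled.

```python
import numpy as np


def ordered_interval(V, cov, onesided=False):
    """Hypothetical helper: coverage limits from sorted values of V."""
    V = np.asarray(V, dtype=float)
    if onesided:
        # One-sided: a single upper limit with fraction cov at or below it
        return np.quantile(V, cov)
    # Two-sided (assumed equal tails): leave (1 - cov)/2 outside on each side
    tail = 0.5 * (1.0 - cov)
    return np.quantile(V, tail), np.quantile(V, 1.0 - tail)


V = np.linspace(-1.0, 1.0, 201)
# Two-sided 90% interval of signed values
vmin, vmax = ordered_interval(V, 0.9)
# One-sided 90% limit of the corresponding unsigned values
vlim = ordered_interval(np.abs(V), 0.9, onesided=True)
```

For this symmetric sample, the two-sided interval of the signed values and the one-sided limit of their absolute values describe the same 90% region.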

cape.statutils.get_ordered_upper(V, cov)

Calculate value greater than fraction cov of V’s values

Call:

>>> v = get_ordered_upper(V, cov)

Inputs:

V: np.ndarray[float]: Array of scalar values
cov: float: Coverage fraction, 0 < cov <= 1

Outputs:

v: float: Value such that cov*V.size entries in V are less than or equal to v; may be interpolated between sorted values of V

Versions:

2021-09-30 @ddalle: Version 1.0
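
The phrase "may be interpolated between sorted values" can be made concrete with a hand-written interpolation over the sorted array. The sketch below is one possible convention, chosen for illustration only: the helper name ordered_upper_sketch is hypothetical, and mapping cov to the fractional position cov*(n-1) is an assumption that may not match the rule used by get_ordered_upper().

```python
import numpy as np


def ordered_upper_sketch(V, cov):
    """Hypothetical helper: value with about cov of V's entries at or below it."""
    Vs = np.sort(np.asarray(V, dtype=float))
    n = Vs.size
    # Fractional position in the sorted array (assumed convention)
    pos = cov * (n - 1)
    # Linear interpolation between the two neighboring sorted values
    return float(np.interp(pos, np.arange(n), Vs))


V = [3.0, 1.0, 2.0, 5.0, 4.0]
v_half = ordered_upper_sketch(V, 0.5)  # position 2.0 in the sorted array: the median
v_90 = ordered_upper_sketch(V, 0.9)    # position 3.6: between the sorted values 4.0 and 5.0
```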

cape.statutils.get_range(R, cov=None, **kw)

Calculate Student’s t-distribution confidence range

If the nominal application of the Student’s t-distribution fails to cover a high enough fraction of the data, the bounds are extended until cov (user-defined fraction) of the data is covered.

Call:

>>> width = get_range(R, cov, **kw)

Inputs:

R: np.ndarray[float]: Array of ranges (absolute values of deltas)
cov, Coverage: {None} | 0 < float < 1: Strict coverage fraction
ksig, CoverageSigma: {None} | float: Number of standard deviations to cover (default based on cov; user must supply either cov or ksig or both)
cdf, CoverageCDF: {cov} | 0 < float < 1: Fraction to use to define ksig
osig, OutlierSigma: {1.5*ksig} | float: Multiple of standard deviation to identify outliers

Outputs:

width: float: Half-width of confidence region

Versions:

2018-09-28 @ddalle: Version 1.0
2021-09-20 @ddalle: Version 1.1
- use _parse_options()
- allow 100% coverage
- remove confusing kcov vs ksig scaling
