`cape.tnakit.statutils`: Statistics tools

This module provides shorthand calls to several statistical functions from scipy.stats. Its primary tool calculates 99% (or any other fraction) coverage ranges for two data sets.

This module depends on scipy.stats from the SciPy package. To ensure that this package is installed, even without root privileges on your system, run

pip install --user --upgrade scipy
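
One quick way to confirm that the installation succeeded is to import scipy.stats from Python. This is only a sanity check and is not part of cape.tnakit.statutils itself.

```python
# Sanity check that SciPy is importable after installation
import scipy
from scipy import stats

# Print the installed version string
print("SciPy version:", scipy.__version__)

# Confirm that the Student's t-distribution object is available
print("t-distribution available:", hasattr(stats, "t"))
```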

This module is not a general-purpose statistical toolkit and does not wrap a complete package like scipy.stats. Instead, it provides a small set of tools that are frequently needed when handling data for aerosciences databases but are rarely found in standard statistics libraries.

cape.tnakit.statutils.check_outliers(dx, cov, **kw)

Find outliers in a data set

Call:

>>> I = check_outliers(dx, cov, **kw)

Inputs:

dx: np.ndarray[float]: Array of signed deltas
cov: 0.95 | 0 < float < 1: Coverage fraction (e.g. 0.95 for 95% coverage)
cdf, CoverageCDF: {cov} | 0 < float < 1: CDF if no extra coverage needed
osig, OutlierSigma: {1.5*ksig} | float: Multiple of standard deviation used to identify outliers; the default is 150% of the nominal coverage multiplier ksig calculated using the Student’s t-distribution.

Outputs:

I: np.ndarray[bool]: Flags for non-outlier cases; False if the case is an outlier

Versions:

2019-02-04 @ddalle: First version
2019-02-13 @ddalle: Moved to stats
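
The inputs above suggest a simple rule: a case is treated as an outlier when its delta lies more than osig standard deviations away. The following is a minimal sketch of that rule, not the CAPE implementation. It assumes a single pass, measures deviations from the sample mean, and ignores the cdf option; the name sketch_check_outliers is hypothetical.

```python
import numpy as np
from scipy import stats


def sketch_check_outliers(dx, cov=0.95, osig=None):
    """Illustrative outlier flags; NOT the cape.tnakit.statutils code."""
    dx = np.asarray(dx, dtype=float)
    # Assumption: deviations are measured from the sample mean
    mu = dx.mean()
    sig = dx.std(ddof=1)
    # Two-sided Student's t multiplier for the nominal coverage
    ksig = stats.t.ppf(0.5 + 0.5 * cov, dx.size - 1)
    # Default outlier threshold is 150% of the nominal multiplier
    if osig is None:
        osig = 1.5 * ksig
    # True marks a non-outlier case, False marks an outlier
    return np.abs(dx - mu) <= osig * sig


# Twenty small deltas plus one obviously bad case
dx = np.append(np.linspace(-1.0, 1.0, 20), 100.0)
I = sketch_check_outliers(dx, 0.95)
print("cases kept:", int(I.sum()), "of", dx.size)
```

Because a large outlier inflates the sample standard deviation, a single pass like this can miss smaller outliers; a production routine may repeat the check after removing flagged cases.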

cape.tnakit.statutils.get_cov_interval(dx, cov, **kw)

Calculate Student’s t-distribution confidence range

If the nominal application of the Student’s t-distribution fails to cover a high enough fraction of the data, the bounds are extended until the data is covered.

Call:

>>> a, b = get_cov_interval(dx, cov, **kw)

Inputs:

dx: np.ndarray[float]: Array of signed deltas
cov: 0 < float < 1: Coverage fraction (e.g. 0.95 for 95% coverage)
cdf, CoverageCDF: {cov} | 0 < float < 1: CDF if no extra coverage needed
osig, OutlierSigma: {1.5*ksig} | float: Multiple of standard deviation used to identify outliers; the default is 150% of the nominal coverage multiplier ksig calculated using the Student’s t-distribution.

Outputs:

a: float: Lower bound of coverage interval
b: float: Upper bound of coverage interval

Versions:

2019-02-04 @ddalle: First version
2019-02-13 @ddalle: Moved to stats
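
The idea of a nominal Student’s t interval that is widened when it covers too little of the data can be sketched as follows. This is an assumption-laden illustration, not the CAPE implementation: it centers the interval on the sample mean, widens it symmetrically, and ignores the cdf and osig options (so no outliers are removed). The name sketch_cov_interval is hypothetical.

```python
import numpy as np
from scipy import stats


def sketch_cov_interval(dx, cov=0.95):
    """Illustrative coverage interval; NOT the cape.tnakit.statutils code."""
    dx = np.asarray(dx, dtype=float)
    n = dx.size
    mu = dx.mean()
    sig = dx.std(ddof=1)
    # Nominal half-width from the two-sided Student's t multiplier
    ksig = stats.t.ppf(0.5 + 0.5 * cov, n - 1)
    width = ksig * sig
    # Fraction of samples inside the nominal interval
    frac = np.mean(np.abs(dx - mu) <= width)
    if frac < cov:
        # Assumption: widen symmetrically until at least cov*n samples
        # fall inside, using the sorted absolute deviations
        dev = np.sort(np.abs(dx - mu))
        width = dev[int(np.ceil(cov * n)) - 1]
    return mu - width, mu + width


# Bimodal deltas: the nominal 50% interval covers no samples at all,
# so the bounds are extended out to the data
dx = np.array([-1.0] * 10 + [1.0] * 10)
a, b = sketch_cov_interval(dx, 0.5)
print("interval:", a, b)
```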

cape.tnakit.statutils.get_range(R, cov, **kw)

Calculate Student’s t-distribution confidence range

If the nominal application of the Student’s t-distribution fails to cover a high enough fraction of the data, the bounds are extended until the data is covered.

Call:

>>> width = get_range(R, cov, **kw)

Inputs:

R: np.ndarray[float]: Array of ranges (absolute values of deltas)
cov: 0.95 | 0 < float < 1: Coverage fraction (e.g. 0.95 for 95% coverage)
cdf, CoverageCDF: {cov} | 0 < float < 1: CDF if no extra coverage needed
osig, OutlierSigma: {1.5*ksig} | float: Multiple of standard deviation used to identify outliers; the default is 150% of the nominal coverage multiplier ksig calculated using the Student’s t-distribution.

Outputs:

width: float: Half-width of confidence region

Versions:

2018-09-28 @ddalle: First version
2019-01-30 @ddalle: Offloaded to get_range()
2019-02-13 @ddalle: Moved to stats
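
Because R holds absolute values, a half-width can be estimated without knowing the signs of the underlying deltas. The sketch below is one possible reading, not the CAPE implementation: it assumes the deltas are symmetric about zero, so the root-mean-square of R stands in for the standard deviation, and it ignores the cdf and osig options. The name sketch_range is hypothetical.

```python
import numpy as np
from scipy import stats


def sketch_range(R, cov=0.95):
    """Illustrative half-width; NOT the cape.tnakit.statutils code."""
    R = np.abs(np.asarray(R, dtype=float))
    n = R.size
    # Assumption: zero-mean deltas, so the RMS of R acts as sigma
    sig = np.sqrt(np.mean(R ** 2))
    # Nominal half-width from the two-sided Student's t multiplier
    width = stats.t.ppf(0.5 + 0.5 * cov, n - 1) * sig
    # Extend the half-width if too few ranges fall inside it
    if np.mean(R <= width) < cov:
        width = np.sort(R)[int(np.ceil(cov * n)) - 1]
    return width


# Evenly spread ranges between 0 and 1
R = np.linspace(0.0, 1.0, 50)
print("half-width:", sketch_range(R, 0.95))
```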
