cape.tnakit.statutils: Statistics tools

This module provides shorthand calls to statistical functions from scipy.stats. Its primary purpose is to calculate coverage ranges (99% or any other fraction) for two data sets.

This module depends on scipy.stats from the SciPy package. To install or upgrade SciPy without root privileges on your system, run:

pip install --user --upgrade scipy

This module is not a general-purpose statistical toolkit and does not wrap a complete package like scipy.stats. Instead, it provides a small set of tools that are useful for handling aerosciences database data but are rarely found in standard statistics libraries.

cape.tnakit.statutils.check_outliers(dx, cov, **kw)

Find outliers in a data set

Call:
>>> I = check_outliers(dx, cov, **kw)
Inputs:
dx: np.ndarray[float]

Array of signed deltas

cov: 0.95 | 0 < float < 1

Coverage fraction (for example, 0.95 for 95% coverage)

cdf, CoverageCDF: {cov} | 0 < float < 1

CDF if no extra coverage needed

osig, OutlierSigma: {1.5*ksig} | float

Multiple of the standard deviation used to identify outliers; the default is 1.5 times the nominal coverage multiple (ksig) calculated from the Student's t-distribution.

Outputs:
I: np.ndarray[bool]

Flags that are True for non-outlier cases and False for outliers

Versions:
  • 2019-02-04 @ddalle: First version

  • 2019-02-13 @ddalle: Moved to stats
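
The flagging rule described for check_outliers can be sketched in plain Python. This is a minimal illustration, not the module's implementation: flag_non_outliers is a hypothetical helper, the threshold osig is passed explicitly rather than derived from the t-distribution, and the mean and standard deviation are assumed to be taken over the full sample, outliers included.

```python
import statistics


def flag_non_outliers(dx, osig):
    """Return one flag per delta: True if kept, False if an outlier.

    Hypothetical helper for illustration only.
    """
    mu = statistics.fmean(dx)
    # Population standard deviation about the sample mean (an assumption)
    sig = statistics.pstdev(dx, mu)
    # A case is kept when it lies within osig standard deviations of the mean
    return [abs(x - mu) <= osig * sig for x in dx]


dx = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, -0.05, 5.0]
flags = flag_non_outliers(dx, osig=2.0)
# Keep only the cases flagged True
kept = [x for x, ok in zip(dx, flags) if ok]
```

Here only the last delta (5.0) is flagged False. A small osig is used because, with only eight samples, a single extreme value inflates the standard deviation it is measured against.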

cape.tnakit.statutils.get_cov_interval(dx, cov, **kw)

Calculate Student’s t-distribution confidence range

If the nominal application of the Student’s t-distribution fails to cover a high enough fraction of the data, the bounds are extended until the data is covered.

Call:
>>> a, b = get_cov_interval(dx, cov, **kw)
Inputs:
dx: np.ndarray[float]

Array of signed deltas

cov: 0 < float < 1

Coverage fraction (for example, 0.95 for 95% coverage)

cdf, CoverageCDF: {cov} | 0 < float < 1

CDF if no extra coverage needed

osig, OutlierSigma: {1.5*ksig} | float

Multiple of the standard deviation used to identify outliers; the default is 1.5 times the nominal coverage multiple (ksig) calculated from the Student's t-distribution.

Outputs:
a: float

Lower bound of coverage interval

b: float

Upper bound of coverage interval

Versions:
  • 2019-02-04 @ddalle: First version

  • 2019-02-13 @ddalle: Moved to stats
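
The idea of extending the bounds until enough data is covered can be sketched as follows. This is an illustration under stated assumptions, not the module's code: coverage_interval is a hypothetical helper, the nominal multiple ksig is passed in directly instead of being computed from the t-distribution, and the interval is assumed to be symmetric about the sample mean.

```python
import math
import statistics


def coverage_interval(dx, cov, ksig):
    """Return (a, b) holding at least a fraction cov of the deltas.

    Hypothetical helper; ksig is the nominal multiple of the
    standard deviation.
    """
    mu = statistics.fmean(dx)
    sig = statistics.stdev(dx)
    # Nominal half-width from the sample standard deviation
    nominal = ksig * sig
    # Smallest half-width about mu that contains the required count
    dist = sorted(abs(x - mu) for x in dx)
    k = math.ceil(cov * len(dx))
    needed = dist[k - 1]
    # Extend the nominal bounds only if they cover too little data
    width = max(nominal, needed)
    return mu - width, mu + width


dx = [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2]
a, b = coverage_interval(dx, cov=0.9, ksig=1.0)
# Fraction of the deltas that fall inside [a, b]
inside = sum(a <= x <= b for x in dx) / len(dx)
```

In this example the nominal interval contains only 8 of the 10 deltas, short of the requested 0.9, so the bounds are widened to (-2.0, 2.0).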

cape.tnakit.statutils.get_range(R, cov, **kw)

Calculate Student’s t-distribution confidence range

If the nominal application of the Student’s t-distribution fails to cover a high enough fraction of the data, the bounds are extended until the data is covered.

Call:
>>> width = get_range(R, cov, **kw)
Inputs:
R: np.ndarray[float]

Array of ranges (absolute values of deltas)

cov: 0.95 | 0 < float < 1

Coverage fraction (for example, 0.95 for 95% coverage)

cdf, CoverageCDF: {cov} | 0 < float < 1

CDF if no extra coverage needed

osig, OutlierSigma: {1.5*ksig} | float

Multiple of the standard deviation used to identify outliers; the default is 1.5 times the nominal coverage multiple (ksig) calculated from the Student's t-distribution.

Outputs:
width: float

Half-width of confidence region

Versions:
  • 2018-09-28 @ddalle: First version

  • 2019-01-30 @ddalle: Offloaded to get_range()

  • 2019-02-13 @ddalle: Moved to stats
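
The relationship between cdf, ksig, and the default osig can be illustrated with a short sketch of the nominal step only (the coverage-extension step is omitted). Several things here are assumptions rather than documented behavior: nominal_multiplier and nominal_half_width are hypothetical helpers, the two-sided coverage is mapped to the quantile 0.5 + 0.5*cdf, the scale is taken as the root-mean-square of the ranges, and the standard library's normal distribution stands in for the Student's t-distribution so that the example runs without SciPy.

```python
import math
from statistics import NormalDist


def nominal_multiplier(cdf):
    """Multiple of sigma for a two-sided coverage fraction cdf."""
    # Two-sided coverage mapped to an upper quantile (an assumption)
    q = 0.5 + 0.5 * cdf
    # Normal stand-in; with SciPy the t-based value would be
    # scipy.stats.t.ppf(q, df) for some degrees of freedom df
    return NormalDist().inv_cdf(q)


def nominal_half_width(R, cov, cdf=None):
    """Nominal half-width from absolute ranges R (hypothetical helper)."""
    # cdf falls back to cov, mirroring the documented default {cov}
    cdf = cov if cdf is None else cdf
    ksig = nominal_multiplier(cdf)
    # RMS of the ranges, assuming the deltas are centered on zero
    sig = math.sqrt(sum(r * r for r in R) / len(R))
    return ksig * sig


R = [0.5, 1.0, 1.5, 1.0, 0.5, 1.0]
ksig = nominal_multiplier(0.95)
# Documented default outlier threshold: 1.5 times ksig
osig_default = 1.5 * ksig
width = nominal_half_width(R, cov=0.95)
```

For small samples the t-based multiple is larger than this normal approximation, so the half-width computed here should be read as a lower bound on the t-based value.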