cape.tnakit.statutils: Statistics tools
This module includes several shorthand calls to statistical functions from scipy.stats. The primary tool provided by this module calculates 99% (or any other fraction) coverage ranges for two data sets.
This module depends on scipy.stats from the SciPy package. To ensure that this package is installed, even without root privileges on your system, run
pip install --user --upgrade scipy
This module does not provide a general-purpose statistical toolkit that wraps a complete package like scipy.stats. Instead, it provides a small set of tools that are useful for handling data in aerosciences databases but are rarely found in general statistics libraries.
- cape.tnakit.statutils.check_outliers(dx, cov, **kw)
Find outliers in a data set
- Call:
>>> I = check_outliers(dx, cov, **kw)
- Inputs:
  - dx: np.ndarray [float]
    Array of signed deltas
  - cov: 0.95 | 0 < float < 1
    Coverage fraction (for example, 0.95 for 95% coverage)
  - cdf, CoverageCDF: {cov} | 0 < float < 1
    CDF if no extra coverage needed
  - osig, OutlierSigma: {1.5*ksig} | float
    Multiple of standard deviation to identify outliers; default is 150% of the nominal coverage calculated using the t-distribution.
- Outputs:
  - I: np.ndarray (bool)
    Flags for non-outlier cases, False if case is an outlier
- Versions:
  - 2019-02-04 @ddalle: First version
  - 2019-02-13 @ddalle: Moved to stats
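The outlier test described above can be sketched with scipy.stats directly. This is a minimal illustration and not the CAPE implementation. The function name flag_non_outliers is hypothetical. The sketch assumes that ksig is the two-sided Student's t multiplier for the coverage cov with n - 1 degrees of freedom, and that deviations are measured from the sample mean.

```python
import numpy as np
from scipy.stats import t as student_t


def flag_non_outliers(dx, cov=0.95, osig=None):
    """Hypothetical helper: return True for each entry that is NOT an outlier."""
    dx = np.asarray(dx, dtype=float)
    n = dx.size
    # Sample mean and (unbiased) sample standard deviation
    mu = dx.mean()
    sig = dx.std(ddof=1)
    # Assumption: nominal multiplier from the two-sided t-distribution
    ksig = student_t.ppf(0.5 + 0.5 * cov, n - 1)
    # Default outlier threshold is 150% of the nominal multiplier
    if osig is None:
        osig = 1.5 * ksig
    # Entries within osig standard deviations of the mean are kept
    return np.abs(dx - mu) <= osig * sig


# Twenty small deltas plus one very large one
dx = np.concatenate([np.linspace(-1.0, 1.0, 20), [50.0]])
I = flag_non_outliers(dx, 0.95)
print(I)
```

The returned array follows the convention documented for check_outliers: False marks an outlier, so dx[I] selects the retained cases.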
- cape.tnakit.statutils.get_cov_interval(dx, cov, **kw)
Calculate Student’s t-distribution confidence range
If the nominal application of the Student’s t-distribution fails to cover a high enough fraction of the data, the bounds are extended until the requested fraction of the data is covered.
- Call:
>>> a, b = get_cov_interval(dx, cov, **kw)
- Inputs:
  - dx: np.ndarray [float]
    Array of signed deltas
  - cov: 0 < float < 1
    Coverage fraction (for example, 0.95 for 95% coverage)
  - cdf, CoverageCDF: {cov} | 0 < float < 1
    CDF if no extra coverage needed
  - osig, OutlierSigma: {1.5*ksig} | float
    Multiple of standard deviation to identify outliers; default is 150% of the nominal coverage calculated using the t-distribution.
- Outputs:
  - a: float
    Lower bound of coverage interval
  - b: float
    Upper bound of coverage interval
- Versions:
  - 2019-02-04 @ddalle: First version
  - 2019-02-13 @ddalle: Moved to stats
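The idea of starting from a nominal t-distribution interval and widening it until enough data is covered can be sketched as follows. This is an illustration under stated assumptions, not the CAPE implementation. The name cov_interval_sketch is hypothetical, the interval is kept symmetric about the sample mean (the real function may not do this), and outlier filtering is omitted.

```python
import numpy as np
from scipy.stats import t as student_t


def cov_interval_sketch(dx, cov=0.95):
    """Hypothetical helper: interval (a, b) covering at least cov of dx."""
    dx = np.asarray(dx, dtype=float)
    n = dx.size
    mu = dx.mean()
    sig = dx.std(ddof=1)
    # Nominal half-width from the two-sided Student's t-distribution
    ksig = student_t.ppf(0.5 + 0.5 * cov, n - 1)
    width = ksig * sig
    # Distance of each point from the mean
    dist = np.abs(dx - mu)
    # Number of points that must be covered; the small tolerance
    # guards against floating-point round-off in cov * n
    ntarget = int(np.ceil(cov * n - 1e-9))
    if np.count_nonzero(dist <= width) < ntarget:
        # Assumption: widen symmetrically to the smallest half-width
        # that covers the required number of points
        width = np.sort(dist)[ntarget - 1]
    return mu - width, mu + width


rng = np.random.default_rng(0)
dx = rng.standard_normal(200)
a, b = cov_interval_sketch(dx, 0.95)
print(a, b)
```

For well-behaved data the nominal interval is usually wide enough; the widening step matters mainly for heavy-tailed or skewed deltas.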
- cape.tnakit.statutils.get_range(R, cov, **kw)
Calculate Student’s t-distribution confidence range
If the nominal application of the Student’s t-distribution fails to cover a high enough fraction of the data, the bounds are extended until the requested fraction of the data is covered.
- Call:
>>> width = get_range(R, cov, **kw)
- Inputs:
  - R: np.ndarray [float]
    Array of ranges (absolute values of deltas)
  - cov: 0.95 | 0 < float < 1
    Coverage fraction (for example, 0.95 for 95% coverage)
  - cdf, CoverageCDF: {cov} | 0 < float < 1
    CDF if no extra coverage needed
  - osig, OutlierSigma: {1.5*ksig} | float
    Multiple of standard deviation to identify outliers; default is 150% of the nominal coverage calculated using the t-distribution.
- Outputs:
  - width: float
    Half-width of confidence region
- Versions:
  - 2018-09-28 @ddalle: First version
  - 2019-01-30 @ddalle: Offloaded to get_range()
  - 2019-02-13 @ddalle: Moved to stats
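A half-width calculation from absolute deltas can be sketched in the same spirit. This is an illustration only, not the CAPE implementation. The name half_width_sketch is hypothetical. The sketch assumes that the underlying deltas have zero mean, so that the root-mean-square of R estimates their standard deviation, and it uses n - 1 degrees of freedom as a simple choice. Outlier filtering is omitted.

```python
import numpy as np
from scipy.stats import t as student_t


def half_width_sketch(R, cov=0.95):
    """Hypothetical helper: half-width covering at least cov of R."""
    R = np.abs(np.asarray(R, dtype=float))
    n = R.size
    # Assumption: zero-mean deltas, so the RMS of R estimates sigma
    sig = np.sqrt(np.mean(R**2))
    # Nominal multiplier from the two-sided Student's t-distribution
    ksig = student_t.ppf(0.5 + 0.5 * cov, n - 1)
    width = ksig * sig
    # Required number of covered points (tolerance guards round-off)
    ntarget = int(np.ceil(cov * n - 1e-9))
    if np.count_nonzero(R <= width) < ntarget:
        # Extend the half-width to the smallest value that covers
        # the required number of points
        width = np.sort(R)[ntarget - 1]
    return float(width)


# Eighteen small ranges and two large ones
R = np.array([0.1] * 18 + [5.0, 6.0])
width = half_width_sketch(R, 0.95)
print(width)
```

In this data set the nominal half-width covers only 18 of the 20 values, so the sketch extends it to reach the nineteenth value.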