cape.statutils: Statistics tools
This module includes several shorthand calls to statistical functions
from scipy.stats. The primary tool provided by this module calculates
99% (or any other fraction) coverage ranges for two data sets.
This module depends on scipy.stats from the SciPy package. To
ensure that this package is installed, even without root privileges on
your system, run
pip install --user --upgrade scipy
This module does not provide a general-purpose statistical toolkit that
wraps a complete package like scipy.stats. Instead, it provides
a small set of tools that are useful for handling data from
aerosciences databases but are rarely found in standard statistics
libraries.
- cape.statutils.check_outliers(dx, cov=None, **kw)
Find outliers in a data set
- Call:
>>> I = check_outliers(dx, cov, **kw)
- Inputs:
  - dx: np.ndarray[float]
    Array of signed deltas
  - cov, Coverage: {None} | 0 < float < 1
    Strict coverage fraction
  - ksig, CoverageSigma: {None} | float
    Number of standard deviations to cover (default based on cov; user must supply either cov or ksig or both)
  - cdf, CoverageCDF: {cov} | 0 < float < 1
    Fraction to use to define ksig
  - osig, OutlierSigma: {1.5*ksig} | float
    Multiple of standard deviation to identify outliers
- Outputs:
  - I: np.ndarray[bool]
    Flags for non-outlier cases, False if case is an outlier
- Versions:
  - 2019-02-04 @ddalle: Version 1.0
  - 2021-09-20 @ddalle: Version 1.1
    - use _parse_options()
    - allow 100% coverage
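The way cov, ksig, and osig relate to each other can be illustrated with a minimal sketch. This is not the cape.statutils implementation: the function name flag_outliers_sketch, the Gaussian conversion from cov to ksig, and the use of the sample mean and sample standard deviation are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm


def flag_outliers_sketch(dx, cov=None, ksig=None, osig=None):
    """Hypothetical sketch: flag points within osig standard deviations."""
    dx = np.asarray(dx, dtype=float)
    if ksig is None:
        if cov is None:
            raise ValueError("supply cov, ksig, or both")
        # Assumption: two-sided Gaussian width that covers fraction cov
        ksig = norm.ppf(0.5 + 0.5 * cov)
    if osig is None:
        # Documented default: 1.5 times ksig
        osig = 1.5 * ksig
    # Distance of each point from the sample mean
    dev = np.abs(dx - np.mean(dx))
    # True marks a non-outlier, False marks an outlier
    return dev <= osig * np.std(dx, ddof=1)


# Evenly spread deltas plus one value far from the rest
dx = np.concatenate([np.linspace(-1.0, 1.0, 41), [10.0]])
mask = flag_outliers_sketch(dx, cov=0.95)
```

Here mask follows the documented convention for I: entries are True for retained cases and False for the case treated as an outlier.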
- cape.statutils.check_outliers_range(R, cov=None, **kw)
Find outliers in an array of ranges
- Call:
>>> I = check_outliers_range(R, cov, **kw)
- Inputs:
  - R: np.ndarray[float]
    Array of ranges (unsigned deltas)
  - cov, Coverage: {None} | 0 < float < 1
    Strict coverage fraction
  - ksig, CoverageSigma: {None} | float
    Number of standard deviations to cover (default based on cov; user must supply either cov or ksig or both)
  - cdf, CoverageCDF: {cov} | 0 < float < 1
    Fraction to use to define ksig
  - osig, OutlierSigma: {1.5*ksig} | float
    Multiple of standard deviation to identify outliers
- Outputs:
  - I: np.ndarray[bool]
    Flags for non-outlier cases, False if case is an outlier
- Versions:
  - 2021-02-20 @ddalle: Version 1.0
- cape.statutils.get_cov_interval(dx, cov=None, **kw)
Calculate Student’s t-distribution confidence range
If the nominal application of the Student’s t-distribution fails to cover a high enough fraction of the data, the bounds are extended until cov (user-defined fraction) of the data is covered.
- Call:
>>> a, b = get_cov_interval(dx, cov, **kw)
- Inputs:
  - dx: np.ndarray[float]
    Array of signed deltas
  - cov, Coverage: {None} | 0 < float < 1
    Strict coverage fraction
  - ksig, CoverageSigma: {None} | float
    Number of standard deviations to cover (default based on cov; user must supply either cov or ksig or both)
  - cdf, CoverageCDF: {cov} | 0 < float < 1
    Fraction to use to define ksig
  - osig, OutlierSigma: {1.5*ksig} | float
    Multiple of standard deviation to identify outliers
- Outputs:
- Versions:
  - 2019-02-04 @ddalle: Version 1.0
  - 2021-09-20 @ddalle: Version 1.1
    - use _parse_options()
    - allow 100% coverage
    - remove confusing kcov vs ksig scaling
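The idea of starting from a Student's t-distribution interval and widening it until fraction cov of the data is covered can be sketched as follows. This is an illustrative approximation, not the module's code: the name cov_interval_sketch is hypothetical, and the sketch assumes an interval that is symmetric about the sample mean, whereas get_cov_interval may extend its two bounds independently.

```python
import numpy as np
from scipy.stats import t


def cov_interval_sketch(dx, cov=0.95):
    """Hypothetical sketch: t-based interval widened to cover fraction cov."""
    dx = np.asarray(dx, dtype=float)
    n = dx.size
    mu = np.mean(dx)
    sig = np.std(dx, ddof=1)
    # Nominal two-sided half-width from Student's t, n-1 degrees of freedom
    width = t.ppf(0.5 + 0.5 * cov, n - 1) * sig
    # Smallest half-width about mu that contains at least cov*n points
    dev = np.sort(np.abs(dx - mu))
    k = int(np.ceil(cov * n)) - 1
    # Keep the nominal width unless it covers too few points
    width = max(width, dev[k])
    return mu - width, mu + width


# Small sample with two large deviations in the tails
dx = np.array([-3.0, -0.2, -0.1, 0.0, 0.0, 0.1, 0.1, 0.2, 0.3, 4.0])
a, b = cov_interval_sketch(dx, cov=0.9)
```

For this sample the nominal t-based width covers fewer than 90% of the points, so the sketch falls back on the sorted deviations to widen the interval.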
- cape.statutils.get_coverage(dx, cov=None, **kw)
Calculate Student’s t-distribution confidence range
If the nominal application of the Student’s t-distribution fails to cover a high enough fraction of the data, the bounds are extended until cov (user-defined fraction) of the data is covered.
- Call:
>>> width = get_coverage(dx, cov, **kw)
- Inputs:
  - dx: np.ndarray[float]
    Array of signed deltas
  - cov, Coverage: {None} | 0 < float < 1
    Strict coverage fraction
  - ksig, CoverageSigma: {None} | float
    Number of standard deviations to cover (default based on cov; user must supply either cov or ksig or both)
  - cdf, CoverageCDF: {cov} | 0 < float < 1
    Fraction to use to define ksig
  - osig, OutlierSigma: {1.5*ksig} | float
    Multiple of standard deviation to identify outliers
- Outputs:
  - width: float
    Half-width of confidence region
- Versions:
  - 2019-02-04 @ddalle: Version 1.0
  - 2021-09-20 @ddalle: Version 1.1
    - use _parse_options()
    - allow 100% coverage
    - remove confusing kcov vs ksig scaling
- cape.statutils.get_ordered_lower(V, cov)
Calculate value less than fraction cov of V’s values
- Call:
>>> v = get_ordered_lower(V, cov)
- Inputs:
  - V: np.ndarray[float]
    Array of scalar values
  - cov: float
    Coverage fraction, 0 < cov <= 1
- Outputs:
  - v: float
    Value such that cov*V.size entries in V are greater than or equal to v; may be interpolated between sorted values of V
- Versions:
  - 2021-09-30 @ddalle: Version 1.0
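An interpolated lower limit of this kind can be approximated with a quantile. The sketch below uses numpy.quantile with its default linear interpolation. That choice is an assumption, the name ordered_lower_sketch is hypothetical, and the interpolation rule used by get_ordered_lower may differ.

```python
import numpy as np


def ordered_lower_sketch(V, cov):
    """Hypothetical sketch: value that roughly fraction cov of V exceeds."""
    V = np.asarray(V, dtype=float)
    # Fraction (1 - cov) of the sorted data lies below the returned value
    return float(np.quantile(V, 1.0 - cov))


# Integers 0 through 100, so quantiles are easy to read off
V = np.arange(101.0)
v = ordered_lower_sketch(V, 0.9)
# Fraction of entries at or above the limit
frac_above = np.mean(V >= v)
```

With evenly spaced data like this, v lands at about 10 and frac_above is slightly above 0.9.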
- cape.statutils.get_ordered_stats(V, cov=None, onesided=False, **kw)
Calculate coverage using ordered statistics
- Call:
>>> vmin, vmax = get_ordered_stats(V, cov)
>>> vmin, vmax = get_ordered_stats(V, **kw)
>>> vlim = get_ordered_stats(V, cov, onesided=True)
>>> vlim = get_ordered_stats(V, onesided=True, **kw)
- Inputs:
  - V: np.ndarray[float]
    Array of scalar values
  - cov: float
    Coverage fraction, 0 < cov <= 1
  - onesided: True | {False}
    Option to find coverage of one-sided distribution
  - ksig: {None} | float
    Option to calculate cov based on Gaussian distribution
  - tsig: {None} | float
    Option to calculate cov based on Student’s t-distribution
- Outputs:
- Versions:
  - 2021-09-30 @ddalle: Version 1.0
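The ksig and tsig options suggest converting a sigma multiple into a coverage fraction before the ordered statistics are taken. A rough sketch of that conversion and of the two-sided and one-sided limits is given below. Both function names are hypothetical. The use of n-1 degrees of freedom, the even split of the uncovered fraction between the tails, and the reading of onesided as an upper limit are all assumptions rather than documented behavior.

```python
import numpy as np
from scipy.stats import norm, t


def cov_from_sigma_sketch(n, ksig=None, tsig=None):
    """Hypothetical helper: two-sided coverage for a sigma multiple."""
    if ksig is not None:
        # Gaussian probability mass within +/- ksig standard deviations
        return 2.0 * norm.cdf(ksig) - 1.0
    if tsig is not None:
        # Student's t with n-1 degrees of freedom (assumed convention)
        return 2.0 * t.cdf(tsig, n - 1) - 1.0
    raise ValueError("supply ksig or tsig")


def ordered_stats_sketch(V, cov, onesided=False):
    """Hypothetical sketch: coverage limits from sorted data."""
    V = np.asarray(V, dtype=float)
    if onesided:
        # Assumption: a single upper limit covering fraction cov
        return float(np.quantile(V, cov))
    # Split the uncovered fraction evenly between the two tails
    tail = 0.5 * (1.0 - cov)
    return float(np.quantile(V, tail)), float(np.quantile(V, 1.0 - tail))


V = np.linspace(-1.0, 1.0, 201)
cov = cov_from_sigma_sketch(V.size, ksig=2.0)
vmin, vmax = ordered_stats_sketch(V, cov)
vlim = ordered_stats_sketch(V, 0.5, onesided=True)
```

For ksig=2.0 the Gaussian conversion gives a coverage fraction of roughly 0.9545, and on this symmetric sample the two limits are mirror images about zero.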
- cape.statutils.get_ordered_upper(V, cov)
Calculate value greater than fraction cov of V’s values
- Call:
>>> v = get_ordered_upper(V, cov)
- Inputs:
  - V: np.ndarray[float]
    Array of scalar values
  - cov: float
    Coverage fraction, 0 < cov <= 1
- Outputs:
  - v: float
    Value such that cov*V.size entries in V are less than or equal to v; may be interpolated between sorted values of V
- Versions:
  - 2021-09-30 @ddalle: Version 1.0
- cape.statutils.get_range(R, cov=None, **kw)
Calculate Student’s t-distribution confidence range
If the nominal application of the Student’s t-distribution fails to cover a high enough fraction of the data, the bounds are extended until cov (user-defined fraction) of the data is covered.
- Call:
>>> width = get_range(R, cov, **kw)
- Inputs:
  - R: np.ndarray[float]
    Array of ranges (absolute values of deltas)
  - cov, Coverage: {None} | 0 < float < 1
    Strict coverage fraction
  - ksig, CoverageSigma: {None} | float
    Number of standard deviations to cover (default based on cov; user must supply either cov or ksig or both)
  - cdf, CoverageCDF: {cov} | 0 < float < 1
    Fraction to use to define ksig
  - osig, OutlierSigma: {1.5*ksig} | float
    Multiple of standard deviation to identify outliers
- Outputs:
  - width: float
    Half-width of confidence region
- Versions:
  - 2018-09-28 @ddalle: Version 1.0
  - 2021-09-20 @ddalle: Version 1.1
    - use _parse_options()
    - allow 100% coverage
    - remove confusing kcov vs ksig scaling