scikitplot.stats#
Elegant statistical tools for intuitive and insightful data visualization and interpretation.
The stats module offers a wide range of probability distributions, summary and frequency statistics, correlation functions, statistical tests, masked statistics, and additional tools.
User guide. See the Stats (experimental) section for further details.
Astrostatistics: Bayesian Blocks for Time Series Analysis#
Use Bayesian Blocks for Time Series Analysis.
Bayesian Blocks for Time Series Analysis#
A dynamic programming algorithm for fitting a piecewise-constant model to various kinds of data. It is based on the algorithm presented in Scargle et al. (2013) [1]. The code was ported from the astroML project [2].
Applications include:
- finding an optimal histogram with adaptive bin widths
- finding optimal segmentation of time series data
- detecting inflection points in the rate of event data
The primary interface to these routines is the bayesian_blocks function. This module provides fitness functions suitable for three types of data:
- Irregularly-spaced event data via the Events class
- Regularly-spaced event data via the RegularEvents class
- Irregularly-spaced point measurements via the PointMeasures class
For more fine-tuned control over the fitness functions used, it is possible to define custom FitnessFunc classes directly and use them with the bayesian_blocks routine.
One common application of the Bayesian Blocks algorithm is the determination of optimal adaptive-width histogram bins. This uses the same fitness function as for irregularly-spaced time series events. The easiest interface for creating Bayesian Blocks histograms is the astropy.stats.histogram function.
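The dynamic program behind the algorithm can be sketched in plain Python for the simplest case, distinct unbinned event times. This is an illustrative sketch, not the code shipped in this module: the function name `bayesian_blocks_events_sketch` is hypothetical, and both the block fitness N (ln N - ln T) and the `p0`-based penalty on the number of blocks are assumed here to follow Scargle et al. (2013).

```python
import math


def bayesian_blocks_events_sketch(times, p0=0.05):
    """Return block edges for distinct event times (illustrative only)."""
    t = sorted(times)
    n = len(t)
    if n < 2 or any(a == b for a, b in zip(t, t[1:])):
        raise ValueError("need at least two event times, all distinct")

    # Cell edges: first event, midpoints between neighbours, last event.
    edges = [t[0]] + [(a + b) / 2 for a, b in zip(t, t[1:])] + [t[-1]]

    # Penalty per block (assumed calibration from Scargle et al. 2013).
    ncp_prior = 4 - math.log(73.53 * p0 * n ** -0.478)

    best = [0.0] * n  # best[r]: optimal total fitness for cells 0..r
    last = [0] * n    # last[r]: first cell of the final block in that optimum

    for r in range(n):
        best_value, best_start = -math.inf, 0
        for s in range(r + 1):
            count = r - s + 1                # events in the block s..r
            width = edges[r + 1] - edges[s]  # duration of the block s..r
            fitness = count * (math.log(count) - math.log(width)) - ncp_prior
            if s > 0:
                fitness += best[s - 1]
            if fitness > best_value:
                best_value, best_start = fitness, s
        best[r], last[r] = best_value, best_start

    # Walk back through the stored block starts to recover the change points.
    cuts = [n]
    while cuts[-1] > 0:
        cuts.append(last[cuts[-1] - 1])
    return [edges[i] for i in reversed(cuts)]


# A dense burst of events followed by a sparse tail.
events = [0.05 * i for i in range(20)] + [10.0 + i for i in range(20)]
block_edges = bayesian_blocks_events_sketch(events)
```

The returned edges always start at the first event and end at the last one; interior edges mark where the event rate is judged to change. The double loop is quadratic in the number of events, which is acceptable for a sketch but is the reason practical implementations vectorise the inner step.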
For detailed explanations, see [1], [2], [3], and [4].
References
Bellman, R.E., Dreyfus, S.E., (1962). Applied Dynamic Programming. Princeton University Press, Princeton. https://press.princeton.edu/books/hardcover/9780691651873/applied-dynamic-programming
Bellman, R., Roth, R., (1969). Curve fitting by segmented straight lines. J. Amer. Statist. Assoc. 64, 1079–1084. https://www.tandfonline.com/doi/abs/10.1080/01621459.1969.10501038
- Events: Bayesian blocks fitness for binned or unbinned events.
- FitnessFunc: Base class for bayesian blocks fitness functions.
- PointMeasures: Bayesian blocks fitness for point measures.
- RegularEvents: Bayesian blocks fitness for regular events.
- bayesian_blocks: Compute optimal segmentation of data with Scargle's Bayesian Blocks.
Astrostatistics Tools#
This module contains simple statistical algorithms that are straightforwardly implemented as a single Python function (or family of functions).
This module should generally not be used directly. Everything in __all__ is imported into astropy.stats, and hence that package should be used for access.
- Binomial proportion and confidence interval in bins of a continuous variable.
- Binomial proportion confidence interval given k successes, n trials.
- Performs bootstrap resampling on numpy arrays.
- Construct a callable piecewise-linear CDF from a pair of arrays.
- Fold the weighted intervals to the interval (0,1).
- Convert a string or number to a floating point number, if possible.
- Convert a string or number to a floating point number, if possible.
- Histogram of a piecewise-constant weight function.
- Compute the length of overlap of two intervals.
- Compute the Kuiper statistic.
- Compute the false positive probability for the Kuiper statistic.
- Compute the Kuiper statistic to compare two samples.
- Calculate the median absolute deviation (MAD).
- Poisson parameter confidence interval given observed counts.
- Computes the signal to noise ratio for source being observed in the optical/IR using a CCD.
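Two of the quantities listed above are simple enough to sketch with the standard library alone. The function names below are hypothetical, and the Wilson score interval is only one of several binomial intervals in common use; which interval the library function applies by default is not stated here, so treat that choice as an assumption made for illustration.

```python
import math
import statistics


def mad_sketch(values):
    """Median of the absolute deviations from the median."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)


def wilson_interval_sketch(k, n, z=1.0):
    """Wilson score interval for k successes in n trials at z standard errors."""
    p_hat = k / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# The outlier 100 barely moves the MAD, unlike the standard deviation.
spread = mad_sketch([1, 2, 3, 4, 100])
low, high = wilson_interval_sketch(5, 10)
```

With five successes in ten trials the interval is symmetric about 0.5, which is a quick sanity check on the formula.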
Astrostatistics: Selecting the bin width of histograms#
Methods for selecting the bin width of histograms.
Ported from the astroML project: https://www.astroml.org/
- Calculate histogram bin edges like
- Return the optimal histogram bin width using the Freedman-Diaconis rule.
- Enhanced histogram function, providing adaptive binnings.
- Return the optimal histogram bin width using Knuth's rule.
- Return the optimal histogram bin width using Scott's rule.
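For orientation, the Freedman-Diaconis and Scott rules reduce to one-line formulas. The sketch below uses their textbook forms, 2 * IQR / n^(1/3) and 3.5 * sigma / n^(1/3); the function names are hypothetical, and the quartile method and the use of the population standard deviation are assumptions that may differ from what the library functions do.

```python
import statistics


def freedman_diaconis_width_sketch(data):
    """Bin width 2 * IQR / n**(1/3), with linearly interpolated quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return 2 * (q3 - q1) / len(data) ** (1 / 3)


def scott_width_sketch(data):
    """Bin width 3.5 * sigma / n**(1/3), using the population std. dev."""
    return 3.5 * statistics.pstdev(data) / len(data) ** (1 / 3)


sample = [0, 1, 2, 3, 4, 5, 6, 7]
fd_width = freedman_diaconis_width_sketch(sample)
scott_width = scott_width_sketch(sample)
```

Because the Freedman-Diaconis rule depends on the interquartile range rather than the standard deviation, it is less sensitive to outliers than Scott's rule.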
Astrostatistics: Model Selection#
This module contains simple functions for model selection.
- Computes the Akaike Information Criterion (AIC).
- Computes the Akaike Information Criterion assuming that the observations are Gaussian distributed.
- Computes the Bayesian Information Criterion (BIC) given the log of the likelihood function evaluated at the estimated (or analytically derived) parameters, the number of parameters, and the number of samples.
- Computes the Bayesian Information Criterion (BIC) assuming that the observations come from a Gaussian distribution.
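As a reminder of what these criteria measure, here is a minimal sketch of the textbook definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The function names and the two example fits are hypothetical, and the library functions listed above may differ in detail.

```python
import math


def aic_sketch(log_likelihood, n_params):
    """Textbook AIC: penalise each parameter by a constant 2."""
    return 2 * n_params - 2 * log_likelihood


def bic_sketch(log_likelihood, n_params, n_samples):
    """Textbook BIC: penalise each parameter by ln(n_samples)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


# Two made-up fits to the same 50 samples: the richer model fits slightly
# better but pays a larger complexity penalty.
bic_simple = bic_sketch(-120.0, n_params=2, n_samples=50)
bic_rich = bic_sketch(-118.5, n_params=5, n_samples=50)
preferred = "simple" if bic_simple < bic_rich else "rich"
```

Lower values are better for both criteria; only differences between models fitted to the same data are meaningful.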
Discrete Distributions Tools#
Tweedie Distribution Module#
This module implements the Tweedie distribution, a member of the exponential dispersion model (EDM) family, using SciPy’s rv_continuous class.
It is especially useful for modeling claim amounts in the insurance industry, where data often exhibit a mixture of zeroes and positive continuous values.
The primary focus of this package is the compound-Poisson behavior of the Tweedie distribution, particularly in the range 1 < p < 2. However, it supports calculations for all valid values of the shape parameter p.
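The compound-Poisson behavior for 1 < p < 2 can be illustrated by sampling directly: a Poisson-distributed number of gamma variates is summed, so an empty sum yields an exact zero. The sketch below is not this module's implementation. The function names are hypothetical, and the mean `mu` and dispersion `phi` parametrisation is an assumption, chosen so that the result has mean mu and variance phi * mu^p.

```python
import math
import random


def compound_poisson_params(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to Poisson and gamma parameters."""
    if not 1 < p < 2:
        raise ValueError("this sketch covers only 1 < p < 2")
    lam = mu ** (2 - p) / (phi * (2 - p))  # Poisson rate of the count
    shape = (2 - p) / (p - 1)              # gamma shape of each summand
    scale = phi * (p - 1) * mu ** (p - 1)  # gamma scale of each summand
    return lam, shape, scale


def sample_tweedie_sketch(mu, phi, p, rng):
    """Draw one value as a Poisson-distributed sum of gamma variates."""
    lam, shape, scale = compound_poisson_params(mu, phi, p)
    # Knuth's multiplication method for a Poisson draw (fine for small lam).
    threshold, count, product = math.exp(-lam), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    # Sum of `count` gamma variates; an empty sum gives an exact zero.
    return sum(rng.gammavariate(shape, scale) for _ in range(count))


rng = random.Random(0)
draws = [sample_tweedie_sketch(2.0, 1.0, 1.5, rng) for _ in range(20000)]
zero_fraction = sum(d == 0 for d in draws) / len(draws)
sample_mean = sum(draws) / len(draws)
```

The fraction of exact zeros should approach exp(-lam) and the sample mean should approach mu, which is the mixture of zeros and positive continuous values described above.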
Notes
The probability density function (PDF) of the Tweedie distribution cannot be expressed in a closed form for most values of p. However, approximations and numerical methods are employed to compute the PDF for practical purposes.
The Tweedie distribution family includes several well-known distributions based on the value of the shape parameter p:
- p = 0: Normal distribution
- p = 1: Poisson distribution
- 1 < p < 2: Compound Poisson-Gamma distribution
- p = 2: Gamma distribution
- 2 < p < 3: Positive stable distributions
- p = 3: Inverse Gaussian distribution
- p > 3: Positive stable distributions
The Tweedie distribution is undefined for values of p in the range (0, 1).
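The mapping from p to the named special cases can be written as a small lookup. This is a sketch with hypothetical function names, not part of the module; the variance helper relies on the standard Tweedie property Var(Y) = phi * mu^p, where the dispersion symbol phi is an assumption not used elsewhere in this section.

```python
def tweedie_family_sketch(p):
    """Name the special case for a shape parameter p, following the list above."""
    if 0 < p < 1:
        raise ValueError("the Tweedie distribution is undefined for 0 < p < 1")
    if p < 0:
        raise ValueError("p < 0 is not covered by the list above")
    if p == 0:
        return "Normal"
    if p == 1:
        return "Poisson"
    if p < 2:
        return "Compound Poisson-Gamma"
    if p == 2:
        return "Gamma"
    if p == 3:
        return "Inverse Gaussian"
    return "Positive stable"


def tweedie_variance_sketch(mu, phi, p):
    """Variance implied by the power variance function."""
    return phi * mu ** p


family = tweedie_family_sketch(1.5)
variance = tweedie_variance_sketch(2.0, 1.0, 2)
```

Raising an error for 0 < p < 1 mirrors the statement above that no Tweedie distribution exists in that range.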
References
- [1] Jørgensen, B. (1987). “Exponential dispersion models”. Journal of the Royal Statistical Society, Series B. 49 (2): 127–162.
- [2] Tweedie, M. C. K. (1984). “An index which distinguishes between some important exponential families”. In Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference.
- [3] [YouTube] Statistical Methods Series: Zero-Inflated GLM and GLMM.
- [4] [Google]
- A Tweedie continuous random variable inherited
- An instance of