scikitplot.stats#

Elegant statistical tools for intuitive and insightful data visualization and interpretation.

The stats module offers a wide range of probability distributions, summary and frequency statistics, correlation functions, statistical tests, masked statistics, and additional tools.

User guide. See the Stats (experimental) section for further details.

Astrostatistics: Bayesian Blocks for Time Series Analysis#

Use Bayesian Blocks for Time Series Analysis.

Bayesian Blocks for Time Series Analysis#

A dynamic programming algorithm for fitting a piecewise-constant model to various kinds of data. It is based on the algorithm presented in Scargle et al. (2013) [1]. The code was ported from the astroML project [2].

Applications include:

  • finding an optimal histogram with adaptive bin widths

  • finding optimal segmentation of time series data

  • detecting inflection points in the rate of event data

The primary interface to these routines is the bayesian_blocks function. This module provides fitness functions suitable for three types of data:

  • Irregularly-spaced event data via the Events class

  • Regularly-spaced event data via the RegularEvents class

  • Irregularly-spaced point measurements via the PointMeasures class

For more fine-tuned control over the fitness functions used, it is possible to define custom FitnessFunc classes directly and use them with the bayesian_blocks routine.

One common application of the Bayesian Blocks algorithm is the determination of optimal adaptive-width histogram bins. This uses the same fitness function as for irregularly-spaced time series events. The easiest interface for creating Bayesian Blocks histograms is the astropy.stats.histogram function.
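The dynamic programming idea behind the events fitness can be sketched in plain Python. The block below is a minimal illustration, not the code used by this module: the function name `bayesian_blocks_events` and the fixed per-block penalty `ncp_prior=4.0` are assumptions made for the example, and it expects at least two distinct event times.

```python
import math


def bayesian_blocks_events(times, ncp_prior=4.0):
    """Return block edges for unbinned event times (illustrative sketch)."""
    t = sorted(times)
    n = len(t)
    if n < 2 or len(set(t)) != n:
        raise ValueError("need at least two distinct event times")

    # Candidate edges: the first event, midpoints between events, the last event.
    edges = [t[0]] + [0.5 * (a + b) for a, b in zip(t, t[1:])] + [t[-1]]

    best = [0.0] * n  # best[k]: optimal total fitness for the first k + 1 events
    last = [0] * n    # last[k]: edge index where the final block of that optimum starts

    for k in range(n):
        best_fit, best_start = -math.inf, 0
        for r in range(k + 1):
            width = edges[k + 1] - edges[r]
            count = k + 1 - r
            # Events fitness N * (ln N - ln T), minus a fixed penalty per block.
            fit = count * (math.log(count) - math.log(width)) - ncp_prior
            if r > 0:
                fit += best[r - 1]
            if fit > best_fit:
                best_fit, best_start = fit, r
        best[k], last[k] = best_fit, best_start

    # Walk back through the stored block starts to recover the change points.
    change_points = [n]
    index = n
    while index > 0:
        index = last[index - 1]
        change_points.append(index)
    change_points.reverse()
    return [edges[i] for i in change_points]


# Dense events in [0, 1) followed by sparse events out to 10.
events = [i / 50 for i in range(50)] + [2.0, 4.0, 6.0, 8.0, 10.0]
block_edges = bayesian_blocks_events(events)
```

The returned edges can be used directly as histogram bin edges. The double loop is quadratic in the number of events, which is fine for a sketch but not for large inputs.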

For detailed explanations, see [1], [2], [3], and [4].

References

[3] Bellman, R.E., Dreyfus, S.E. (1962). Applied Dynamic Programming. Princeton University Press, Princeton. https://press.princeton.edu/books/hardcover/9780691651873/applied-dynamic-programming

[4] Bellman, R., Roth, R. (1969). Curve fitting by segmented straight lines. J. Amer. Statist. Assoc. 64, 1079–1084. https://www.tandfonline.com/doi/abs/10.1080/01621459.1969.10501038

Events

Bayesian blocks fitness for binned or unbinned events.

FitnessFunc

Base class for bayesian blocks fitness functions.

PointMeasures

Bayesian blocks fitness for point measures.

RegularEvents

Bayesian blocks fitness for regular events.

bayesian_blocks

Compute optimal segmentation of data with Scargle's Bayesian Blocks.

Astrostatistics Tools#

This module contains simple statistical algorithms that are each implemented as a single Python function (or family of functions).

This module should generally not be used directly. Everything in __all__ is imported into astropy.stats, and hence that package should be used for access.

binned_binom_proportion

Binomial proportion and confidence interval in bins of a continuous variable x.

binom_conf_interval

Binomial proportion confidence interval given k successes, n trials.
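As an illustration of what such an interval looks like, the sketch below computes the Wilson score interval, one common choice for binomial proportions. The function name `wilson_interval` and its default `confidence_level` are assumptions for this example and are not taken from this module's signature.

```python
import math
from statistics import NormalDist


def wilson_interval(k, n, confidence_level=0.68269):
    """Wilson score interval for k successes in n trials (illustrative sketch)."""
    # Two-sided normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    p_hat = k / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


lower, upper = wilson_interval(5, 10)
lower_zero, upper_zero = wilson_interval(0, 10)
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and remains informative when k is 0 or n.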

bootstrap

Perform bootstrap resampling on NumPy arrays.
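Bootstrap resampling draws new samples of the same size from the data, with replacement, and evaluates a statistic on each one. The sketch below uses only the standard library; the helper name `bootstrap_replicates` and its parameters are hypothetical and do not mirror the signature of `bootstrap`.

```python
import random
import statistics


def bootstrap_replicates(data, n_resamples=1000, statistic=statistics.mean, seed=0):
    """Evaluate `statistic` on resamples drawn with replacement from `data`."""
    rng = random.Random(seed)
    size = len(data)
    return [statistic(rng.choices(data, k=size)) for _ in range(n_resamples)]


sample = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 2.0, 3.0]
replicates = bootstrap_replicates(sample)
# The spread of the replicates estimates the standard error of the mean.
standard_error = statistics.stdev(replicates)
```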

cdf_from_intervals

Construct a callable piecewise-linear CDF from a pair of arrays.

fold_intervals

Fold the weighted intervals to the interval (0,1).

gaussian_fwhm_to_sigma

Factor for converting a Gaussian full width at half maximum (FWHM) to a standard deviation (sigma): 1 / (2 * sqrt(2 * ln 2)), approximately 0.4247.

gaussian_sigma_to_fwhm

Factor for converting a Gaussian standard deviation (sigma) to a full width at half maximum (FWHM): 2 * sqrt(2 * ln 2), approximately 2.3548.
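For a Gaussian profile, the full width at half maximum (FWHM) and the standard deviation (sigma) are related by FWHM = 2 * sqrt(2 * ln 2) * sigma. A minimal sketch of the two conversion factors follows; the uppercase names are local to this example.

```python
import math

# FWHM = 2 * sqrt(2 * ln 2) * sigma for a Gaussian profile.
SIGMA_TO_FWHM = 2.0 * math.sqrt(2.0 * math.log(2.0))
FWHM_TO_SIGMA = 1.0 / SIGMA_TO_FWHM

fwhm = 3.0                      # for example, a width measured in pixels
sigma = fwhm * FWHM_TO_SIGMA    # equivalent Gaussian standard deviation
```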

histogram_intervals

Histogram of a piecewise-constant weight function.

interval_overlap_length

Compute the length of overlap of two intervals.
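The overlap of two intervals (a, b) and (c, d) is the distance between the larger of the two lower bounds and the smaller of the two upper bounds, or zero when the intervals are disjoint. The helper name `overlap_length` below is hypothetical.

```python
def overlap_length(first, second):
    """Length of the intersection of two intervals given as (low, high) pairs."""
    (a, b), (c, d) = first, second
    return max(0.0, min(b, d) - max(a, c))


partial = overlap_length((0.0, 2.0), (1.0, 3.0))    # intervals share [1, 2]
disjoint = overlap_length((0.0, 1.0), (2.0, 3.0))   # no shared points
nested = overlap_length((0.0, 10.0), (4.0, 6.0))    # second lies inside first
```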

kuiper

Compute the Kuiper statistic.
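The Kuiper statistic is V = D+ + D-, the sum of the largest deviations of the empirical CDF above and below the model CDF. The sketch below assumes a uniform model on [0, 1] by default; the name `kuiper_statistic` is hypothetical, and the helper returns only the statistic, not a false positive probability.

```python
def kuiper_statistic(data, cdf=lambda x: x):
    """Kuiper V = D+ + D- of `data` against a model CDF (illustrative sketch)."""
    values = sorted(cdf(x) for x in data)
    n = len(values)
    # D+: largest amount by which the empirical CDF exceeds the model CDF.
    d_plus = max((i + 1) / n - v for i, v in enumerate(values))
    # D-: largest amount by which the model CDF exceeds the empirical CDF.
    d_minus = max(v - i / n for i, v in enumerate(values))
    return d_plus + d_minus


v_stat = kuiper_statistic([0.25, 0.5, 0.75])
```

Because it adds both deviations, the statistic is invariant under cyclic shifts of the data, which makes it suitable for phases and other circular quantities.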

kuiper_false_positive_probability

Compute the false positive probability for the Kuiper statistic.

kuiper_two

Compute the Kuiper statistic to compare two samples.

mad_std

Calculate a robust standard deviation using the median absolute deviation (MAD).

median_absolute_deviation

Calculate the median absolute deviation (MAD).
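The MAD is the median of the absolute deviations from the median. Dividing it by the 0.75 quantile of the standard normal distribution (about 0.6745) gives a robust estimate of the standard deviation for normally distributed data. The helper names below are local to this sketch.

```python
import statistics
from statistics import NormalDist


def mad(data):
    """Median of the absolute deviations from the median."""
    center = statistics.median(data)
    return statistics.median(abs(x - center) for x in data)


def robust_std(data):
    """Scale the MAD so that it estimates sigma for Gaussian data."""
    return mad(data) / NormalDist().inv_cdf(0.75)


values = [1, 2, 3, 4, 100]          # one strong outlier
mad_value = mad(values)
robust_sigma = robust_std(values)
```

The outlier at 100 leaves the MAD untouched, whereas the ordinary standard deviation of the same data is dominated by it.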

poisson_conf_interval

Poisson parameter confidence interval given observed counts.

signal_to_noise_oir_ccd

Compute the signal-to-noise ratio for a source observed in the optical/IR using a CCD.

Astrostatistics: Selecting the bin width of histograms#

Methods for selecting the bin width of histograms.

Ported from the astroML project: https://www.astroml.org/

calculate_bin_edges

Calculate histogram bin edges like numpy.histogram_bin_edges.

freedman_bin_width

Return the optimal histogram bin width using the Freedman-Diaconis rule.

histogram

Enhanced histogram function, providing adaptive binnings.

knuth_bin_width

Return the optimal histogram bin width using Knuth's rule.

scott_bin_width

Return the optimal histogram bin width using Scott's rule.
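The two simplest rules have closed forms: Scott's rule uses 3.5 * sigma / n^(1/3) and the Freedman-Diaconis rule uses 2 * IQR / n^(1/3). The sketch below applies both with the standard library. The function names are hypothetical, and the choices of population standard deviation and linearly interpolated quartiles are assumptions that may differ from this module.

```python
import statistics


def scott_width(data):
    """Scott's rule: 3.5 * sigma / n ** (1/3), with the population sigma."""
    n = len(data)
    return 3.5 * statistics.pstdev(data) / n ** (1.0 / 3.0)


def freedman_diaconis_width(data):
    """Freedman-Diaconis rule: 2 * IQR / n ** (1/3)."""
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return 2.0 * (q3 - q1) / n ** (1.0 / 3.0)


data = [1, 2, 3, 4, 5, 6, 7, 8]
width_scott = scott_width(data)
width_fd = freedman_diaconis_width(data)
```

The Freedman-Diaconis rule relies on the interquartile range, so it is less sensitive to outliers than Scott's rule.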

Astrostatistics: Model Selection#

This module contains simple functions for model selection.

akaike_info_criterion

Computes the Akaike Information Criterion (AIC).

akaike_info_criterion_lsq

Computes the Akaike Information Criterion assuming that the observations are Gaussian distributed.

bayesian_info_criterion

Computes the Bayesian Information Criterion (BIC) given the log of the likelihood function evaluated at the estimated (or analytically derived) parameters, the number of parameters, and the number of samples.

bayesian_info_criterion_lsq

Computes the Bayesian Information Criterion (BIC) assuming that the observations come from a Gaussian distribution.
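The textbook forms of these criteria are short enough to write out. The sketch below is an assumption-labeled illustration rather than this module's implementation: the function names are local to the example, and library versions may add refinements such as a small-sample correction to the AIC.

```python
import math


def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """BIC = k ln n - 2 ln L."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def aic_lsq(ssr, n_params, n_samples):
    """AIC for Gaussian residuals, up to an additive constant: n ln(SSR / n) + 2k."""
    return n_samples * math.log(ssr / n_samples) + 2.0 * n_params


def bic_lsq(ssr, n_params, n_samples):
    """BIC for Gaussian residuals, up to an additive constant: n ln(SSR / n) + k ln n."""
    return n_samples * math.log(ssr / n_samples) + n_params * math.log(n_samples)


# Compare a 2-parameter and a 5-parameter fit to the same 100 points.
simple_model = bic_lsq(ssr=12.0, n_params=2, n_samples=100)
complex_model = bic_lsq(ssr=11.5, n_params=5, n_samples=100)
```

Lower values indicate the preferred model. In this example the small drop in the sum of squared residuals does not pay for three extra parameters, so the simpler model has the lower BIC.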

Discrete Distributions Tools#

Tweedie Distribution Module#

This module implements the Tweedie distribution, a member of the exponential dispersion model (EDM) family, using SciPy’s rv_continuous class.

It is especially useful for modeling claim amounts in the insurance industry, where data often exhibit a mixture of zeroes and positive continuous values.

The primary focus of this package is the compound-Poisson behavior of the Tweedie distribution, particularly in the range 1 < p < 2. However, it supports calculations for all valid values of the shape parameter p.

Notes

The probability density function (PDF) of the Tweedie distribution cannot be expressed in a closed form for most values of p. However, approximations and numerical methods are employed to compute the PDF for practical purposes.

The Tweedie distribution family includes several well-known distributions based on the value of the shape parameter p:

  • p = 0 : Normal distribution

  • p = 1 : Poisson distribution

  • 1 < p < 2 : Compound Poisson-Gamma distribution

  • p = 2 : Gamma distribution

  • 2 < p < 3 : Positive stable distributions

  • p = 3 : Inverse Gaussian distribution

  • p > 3 : Positive stable distributions

The Tweedie distribution is undefined for values of p in the range (0, 1).
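For 1 < p < 2, a Tweedie variable with mean mu and dispersion phi can be written as a Poisson-distributed number of gamma-distributed terms, which explains the point mass at zero. The sketch below shows the standard parameter mapping and a simple sampler using only the standard library. All function names are hypothetical, and this is not how `tweedie_gen` is implemented.

```python
import math
import random


def compound_poisson_gamma_params(mu, phi, p):
    """Map Tweedie (mu, phi, p) with 1 < p < 2 to Poisson and gamma parameters."""
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    rate = mu ** (2.0 - p) / (phi * (2.0 - p))   # Poisson mean (number of terms)
    shape = (2.0 - p) / (p - 1.0)                # gamma shape of each term
    scale = phi * (p - 1.0) * mu ** (p - 1.0)    # gamma scale of each term
    return rate, shape, scale


def sample_poisson(rng, rate):
    """Draw one Poisson variate with Knuth's multiplication method (small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_tweedie(mu, phi, p, size, seed=0):
    """Draw compound Poisson-gamma variates; a zero arises when no terms occur."""
    rng = random.Random(seed)
    rate, shape, scale = compound_poisson_gamma_params(mu, phi, p)
    draws = []
    for _ in range(size):
        n_terms = sample_poisson(rng, rate)
        draws.append(sum((rng.gammavariate(shape, scale) for _ in range(n_terms)), 0.0))
    return draws


rate, shape, scale = compound_poisson_gamma_params(mu=1.0, phi=1.0, p=1.5)
draws = sample_tweedie(mu=1.0, phi=1.0, p=1.5, size=2000)
```

With this mapping the mean is rate * shape * scale = mu and the variance is phi * mu^p, and the probability of an exact zero is exp(-rate).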

References

[1] Jørgensen, B. (1987). “Exponential dispersion models”. Journal of the Royal Statistical Society, Series B. 49 (2): 127–162.

[2] Tweedie, M. C. K. (1984). “An index which distinguishes between some important exponential families”. In Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference.

[3] [YouTube] Statistical Methods Series: Zero-Inflated GLM and GLMM.

[4] [Google] https://www.statisticshowto.com/tweedie-distribution/

tweedie_gen

A Tweedie continuous random variable, inheriting from scipy.stats.rv_continuous.

tweedie

An instance of tweedie_gen, providing Tweedie distribution functionality.