tweedie_gen
- class scikitplot._tweedie.tweedie_gen(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, seed=None)
A Tweedie continuous random variable, inherited from scipy.stats.rv_continuous.
See also
tweedie
An instance of tweedie_gen, providing Tweedie distribution functionality.
scipy.stats.rv_continuous
doc https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html
Notes
Tweedie is a family of distributions belonging to the class of exponential dispersion models.
\[f(x; \mu, \phi, p) = a(x, \phi, p) \exp((x \theta - \kappa(\theta)) / \phi)\]
where \(\theta = \frac{\mu^{1-p}}{1-p}\) when \(p \ne 1\) and \(\theta = \log(\mu)\) when \(p = 1\), and \(\kappa(\theta) = [\{(1 - p) \theta + 1\} ^ {(2 - p) / (1 - p)} - 1] / (2 - p)\) for \(p \ne 2\) and \(\kappa(\theta) = - \log(1 - \theta)\) for \(p = 2\).
Except in a few special cases (discussed below), \(a(x, \phi, p)\) is hard to write out.
This class incorporates the series method of evaluating the Tweedie density for \(1 < p < 2\) and \(p > 2\). There are special cases at \(p = 0, 1, 2, 3\) where the distribution is equivalent to the Gaussian (Normal), Poisson, Gamma, and Inverse Gaussian distributions, respectively.
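For \(1 < p < 2\) the Tweedie distribution can be written as a Poisson sum of gamma variables, which gives a series for the density. The following is a minimal standard-library sketch of that compound Poisson-gamma series; the function name `tweedie_pdf_series` and the fixed truncation `max_terms` are assumptions made for illustration, and this is not necessarily how the class evaluates the series internally.

```python
import math


def tweedie_pdf_series(x, p, mu, phi, max_terms=500):
    """Sketch: Tweedie density for 1 < p < 2 via a compound Poisson-gamma series."""
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))    # Poisson rate of the number of jumps
    alpha = (2.0 - p) / (p - 1.0)                # gamma shape of a single jump
    scale = phi * (p - 1.0) * mu ** (p - 1.0)    # gamma scale of a single jump
    if x < 0:
        return 0.0
    if x == 0:
        return math.exp(-lam)                    # probability mass at zero, not a density
    total = 0.0
    for n in range(1, max_terms + 1):
        # log of the Poisson(n; lam) weight
        log_pois = -lam + n * math.log(lam) - math.lgamma(n + 1)
        # log of the gamma density with shape n * alpha evaluated at x
        shape = n * alpha
        log_gam = ((shape - 1.0) * math.log(x) - x / scale
                   - shape * math.log(scale) - math.lgamma(shape))
        total += math.exp(log_pois + log_gam)
    return total


density = tweedie_pdf_series(1.0, p=1.5, mu=1.0, phi=1.0)
print(density)
```

With p=1.5, mu=1 and phi=1 the sum agrees with the 0.357... value quoted in the Examples section. A fixed number of terms is a simplification; parameter choices with a large Poisson rate may need more terms or an adaptive stopping rule.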
For cdfs, only the special cases and \(1 < p < 2\) are implemented. The author has not found any documentation on series evaluation of the cdf for \(p > 2\).
Additionally, the R package tweedie also incorporates a (potentially) faster method that involves a Fourier inversion. This method is harder to understand, so I’ve not implemented it. However, others should feel free to attempt to add this themselves.
References
Dunn, Peter K. and Smyth, Gordon K. 2001, Tweedie Family Densities: Methods of Evaluation
Dunn, Peter K. and Smyth, Gordon K. 2005, Series evaluation of Tweedie exponential dispersion model densities
Examples
The density can be found using the pdf method.
>>> tweedie(p=1.5, mu=1, phi=1).pdf(1)
0.357...
The cdf can be found using the cdf method.
>>> tweedie(p=1.5, mu=1, phi=1).cdf(1)
0.603...
The ppf can be found using the ppf method.
>>> tweedie(p=1.5, mu=1, phi=1).ppf(0.603)
0.998...
- __call__(*args, **kwds)
Freeze the distribution for the given arguments.
- Parameters:
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution. Should include all the non-optional arguments, may include loc and scale.
- Returns:
- rv_frozen : rv_frozen instance
The frozen distribution.
- cdf(x, *args, **kwds)
Cumulative distribution function of the given RV.
- Parameters:
- x : array_like
quantiles
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc : array_like, optional
location parameter (default=0)
- scale : array_like, optional
scale parameter (default=1)
- Returns:
- cdf : ndarray
Cumulative distribution function evaluated at x
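The loc and scale arguments shift and rescale the quantiles before the standardized CDF is evaluated. Because this method is inherited from scipy.stats.rv_continuous, the sketch below uses scipy.stats.norm as a stand-in distribution (an assumption made so the example does not depend on the Tweedie class being installed).

```python
import numpy as np
from scipy.stats import norm

x = np.array([-1.0, 0.0, 2.5])
loc, scale = 1.0, 2.0

# CDF evaluated with explicit location and scale ...
shifted = norm.cdf(x, loc=loc, scale=scale)
# ... matches the standard CDF evaluated at the standardized quantiles.
standardized = norm.cdf((x - loc) / scale)

print(shifted)
print(standardized)
```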
- entropy(*args, **kwds)
Differential entropy of the RV.
- Parameters:
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
- loc : array_like, optional
Location parameter (default=0).
- scale : array_like, optional (continuous distributions only)
Scale parameter (default=1).
Notes
Entropy is defined base e:
>>> import numpy as np
>>> from scipy.stats._distn_infrastructure import rv_discrete
>>> drv = rv_discrete(values=((0, 1), (0.5, 0.5)))
>>> np.allclose(drv.entropy(), np.log(2.0))
True
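For a continuous distribution the same base-e convention applies to the differential entropy. As a small check, assuming scipy.stats.norm as a stand-in for any rv_continuous subclass, the normal distribution has the closed form 0.5 * log(2 * pi * e * sigma**2):

```python
import numpy as np
from scipy.stats import norm

sigma = 2.0
h = norm.entropy(scale=sigma)                      # differential entropy in nats
closed_form = 0.5 * np.log(2 * np.pi * np.e * sigma**2)

print(float(h), float(closed_form))
```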
- expect(func=None, args=(), loc=0, scale=1, lb=None, ub=None, conditional=False, **kwds)
Calculate expected value of a function with respect to the distribution by numerical integration.
The expected value of a function f(x) with respect to a distribution dist is defined as:
\[E[f(x)] = \int_{lb}^{ub} f(x) \, \mathrm{dist.pdf}(x) \, dx\]
where ub and lb are arguments and x has the dist.pdf(x) distribution. If the bounds lb and ub correspond to the support of the distribution, e.g. [-inf, inf] in the default case, then the integral is the unrestricted expectation of f(x). Also, the function f(x) may be defined such that f(x) is 0 outside a finite interval, in which case the expectation is calculated within the finite range [lb, ub].
- Parameters:
- func : callable, optional
Function for which integral is calculated. Takes only one argument. The default is the identity mapping f(x) = x.
- args : tuple, optional
Shape parameters of the distribution.
- loc : float, optional
Location parameter (default=0).
- scale : float, optional
Scale parameter (default=1).
- lb, ub : scalar, optional
Lower and upper bound for integration. Default is set to the support of the distribution.
- conditional : bool, optional
If True, the integral is corrected by the conditional probability of the integration interval. The return value is the expectation of the function, conditional on being in the given interval. Default is False.
- Additional keyword arguments are passed to the integration routine.
- Returns:
- expect : float
The calculated expected value.
Notes
The integration behavior of this function is inherited from scipy.integrate.quad. Neither this function nor scipy.integrate.quad can verify whether the integral exists or is finite. For example, cauchy(0).mean() returns np.nan and cauchy(0).expect() returns 0.0.
Likewise, the accuracy of results is not verified by the function. scipy.integrate.quad is typically reliable for integrals that are numerically favorable, but it is not guaranteed to converge to a correct value for all possible intervals and integrands. This function is provided for convenience; for critical applications, check results against other integration methods.
The function is not vectorized.
Examples
To understand the effect of the bounds of integration consider
>>> from scipy.stats import expon
>>> expon(1).expect(lambda x: 1, lb=0.0, ub=2.0)
0.6321205588285578
This is close to
>>> expon(1).cdf(2.0) - expon(1).cdf(0.0)
0.6321205588285577
If conditional=True
>>> expon(1).expect(lambda x: 1, lb=0.0, ub=2.0, conditional=True)
1.0000000000000002
The slight deviation from 1 is due to numerical integration.
The integrand can be treated as a complex-valued function by passing complex_func=True to scipy.integrate.quad.
>>> import numpy as np
>>> from scipy.stats import vonmises
>>> res = vonmises(loc=2, kappa=1).expect(lambda x: np.exp(1j*x),
...                                       complex_func=True)
>>> res
(-0.18576377217422957+0.40590124735052263j)
>>> np.angle(res)  # location of the (circular) distribution
2.0
- fit(data, *args, **kwds)
Return estimates of shape (if applicable), location, and scale parameters from data. The default estimation method is Maximum Likelihood Estimation (MLE), but Method of Moments (MM) is also available.
Starting estimates for the fit are given by input arguments; for any arguments not provided with starting estimates, self._fitstart(data) is called to generate such.
One can hold some parameters fixed to specific values by passing in keyword arguments f0, f1, …, fn (for shape parameters) and floc and fscale (for location and scale parameters, respectively).
- Parameters:
- data : array_like or CensoredData instance
Data to use in estimating the distribution parameters.
- arg1, arg2, arg3,… : floats, optional
Starting value(s) for any shape-characterizing arguments (those not provided will be determined by a call to _fitstart(data)). No default value.
- **kwds : floats, optional
loc : initial guess of the distribution’s location parameter.
scale : initial guess of the distribution’s scale parameter.
Special keyword arguments are recognized as holding certain parameters fixed:
f0…fn : hold respective shape parameters fixed. Alternatively, shape parameters to fix can be specified by name. For example, if self.shapes == "a, b", fa and fix_a are equivalent to f0, and fb and fix_b are equivalent to f1.
floc : hold location parameter fixed to specified value.
fscale : hold scale parameter fixed to specified value.
optimizer : The optimizer to use. The optimizer must take func and starting position as the first two arguments, plus args (for extra arguments to pass to the function to be optimized) and disp. The fit method calls the optimizer with disp=0 to suppress output. The optimizer must return the estimated parameters.
method : The method to use. The default is “MLE” (Maximum Likelihood Estimate); “MM” (Method of Moments) is also available.
- Returns:
- parameter_tuple : tuple of floats
Estimates for any shape parameters (if applicable), followed by those for location and scale. For most random variables, shape statistics will be returned, but there are exceptions (e.g. norm).
- Raises:
- TypeError, ValueError
If an input is invalid
- scipy.stats.FitError
If fitting fails or the fit produced would be invalid
Notes
With method="MLE" (default), the fit is computed by minimizing the negative log-likelihood function. A large, finite penalty (rather than infinite negative log-likelihood) is applied for observations beyond the support of the distribution.
With method="MM", the fit is computed by minimizing the L2 norm of the relative errors between the first k raw (about zero) data moments and the corresponding distribution moments, where k is the number of non-fixed parameters. More precisely, the objective function is:
(((data_moments - dist_moments) / np.maximum(np.abs(data_moments), 1e-8))**2).sum()
where the constant 1e-8 avoids division by zero in case of vanishing data moments. Typically, this error norm can be reduced to zero. Note that the standard method of moments can produce parameters for which some data are outside the support of the fitted distribution; this implementation does nothing to prevent this.
For either method, the returned answer is not guaranteed to be globally optimal; it may only be locally optimal, or the optimization may fail altogether. If the data contain any of np.nan, np.inf, or -np.inf, the fit method will raise a RuntimeError.
When passing a CensoredData instance to data, the log-likelihood function is defined as:
\[\begin{split}l(\pmb{\theta}; k) & = \sum \log(f(k_u; \pmb{\theta})) + \sum \log(F(k_l; \pmb{\theta})) \\ & + \sum \log(1 - F(k_r; \pmb{\theta})) \\ & + \sum \log(F(k_{\text{high}, i}; \pmb{\theta}) - F(k_{\text{low}, i}; \pmb{\theta}))\end{split}\]
where \(f\) and \(F\) are the pdf and cdf, respectively, of the function being fitted, \(\pmb{\theta}\) is the parameter vector, \(u\) are the indices of uncensored observations, \(l\) are the indices of left-censored observations, \(r\) are the indices of right-censored observations, subscripts “low”/“high” denote endpoints of interval-censored observations, and \(i\) are the indices of interval-censored observations.
Examples
Generate some data to fit: draw random variates from the beta distribution
>>> import numpy as np
>>> from scipy.stats import beta
>>> a, b = 1., 2.
>>> rng = np.random.default_rng(172786373191770012695001057628748821561)
>>> x = beta.rvs(a, b, size=1000, random_state=rng)
Now we can fit all four parameters (a, b, loc and scale):
>>> a1, b1, loc1, scale1 = beta.fit(x)
>>> a1, b1, loc1, scale1
(1.0198945204435628, 1.9484708982737828, 4.372241314917588e-05, 0.9979078845964814)
The fit can be done also using a custom optimizer:
>>> from scipy.optimize import minimize
>>> def custom_optimizer(func, x0, args=(), disp=0):
...     res = minimize(func, x0, args, method="slsqp", options={"disp": disp})
...     if res.success:
...         return res.x
...     raise RuntimeError('optimization routine failed')
>>> a1, b1, loc1, scale1 = beta.fit(x, method="MLE", optimizer=custom_optimizer)
>>> a1, b1, loc1, scale1
(1.0198821087258905, 1.948484145914738, 4.3705304486881485e-05, 0.9979104663953395)
We can also use some prior knowledge about the dataset: let’s keep loc and scale fixed:
>>> a1, b1, loc1, scale1 = beta.fit(x, floc=0, fscale=1)
>>> loc1, scale1
(0, 1)
We can also keep shape parameters fixed by using f-keywords. To keep the zero-th shape parameter a equal to 1, use f0=1 or, equivalently, fa=1:
>>> a1, b1, loc1, scale1 = beta.fit(x, fa=1, floc=0, fscale=1)
>>> a1
1
Not all distributions return estimates for the shape parameters. norm for example just returns estimates for location and scale:
>>> from scipy.stats import norm
>>> x = norm.rvs(a, b, size=1000, random_state=123)
>>> loc1, scale1 = norm.fit(x)
>>> loc1, scale1
(0.92087172783841631, 2.0015750750324668)
- fit_loc_scale(data, *args)
Estimate loc and scale parameters from data using 1st and 2nd moments.
- Parameters:
- data : array_like
Data to fit.
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
- Returns:
- Lhat : float
Estimated location parameter for the data.
- Shat : float
Estimated scale parameter for the data.
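To illustrate moment matching, the sketch below uses scipy.stats.norm as a stand-in (an assumption; any rv_continuous subclass exposes the same method). The standardized normal has mean 0 and variance 1, so the estimates reduce to the sample mean and the population standard deviation of the data.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=5000)

# Match the first two moments of the data to those of the distribution.
loc_hat, scale_hat = norm.fit_loc_scale(data)

print(loc_hat, scale_hat)
```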
- freeze(*args, **kwds)
Freeze the distribution for the given arguments.
- Parameters:
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution. Should include all the non-optional arguments, may include loc and scale.
- Returns:
- rv_frozen : rv_frozen instance
The frozen distribution.
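Freezing binds the shape, loc and scale arguments once so that later calls do not need to repeat them. A short sketch, assuming scipy.stats.gamma as a stand-in distribution; calling the distribution object directly is equivalent to calling freeze:

```python
from scipy.stats import gamma

# Both forms return a frozen distribution with shape a=2 and scale=3.
frozen_a = gamma.freeze(2.0, scale=3.0)
frozen_b = gamma(2.0, scale=3.0)

# Methods of the frozen object no longer take distribution parameters.
pdf_frozen = frozen_a.pdf(1.0)
pdf_direct = gamma.pdf(1.0, 2.0, scale=3.0)

print(pdf_frozen, pdf_direct, frozen_b.mean())
```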
- interval(confidence, *args, **kwds)
Confidence interval with equal areas around the median.
- Parameters:
- confidence : array_like of float
Probability that an rv will be drawn from the returned range. Each value should be in the range [0, 1].
- arg1, arg2, … : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
- loc : array_like, optional
location parameter, Default is 0.
- scale : array_like, optional
scale parameter, Default is 1.
- Returns:
- a, b : ndarray of float
end-points of range that contain 100 * confidence % of the rv’s possible values.
Notes
This is implemented as ppf([p_tail, 1-p_tail]), where ppf is the inverse cumulative distribution function and p_tail = (1-confidence)/2. Suppose [c, d] is the support of a discrete distribution; then ppf([0, 1]) == (c-1, d). Therefore, when confidence=1 and the distribution is discrete, the left end of the interval will be beyond the support of the distribution. For discrete distributions, the interval will limit the probability in each tail to be less than or equal to p_tail (usually strictly less).
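The relationship between interval and ppf described in the Notes can be checked directly. The sketch assumes scipy.stats.norm as a stand-in distribution and passes the confidence level positionally:

```python
import numpy as np
from scipy.stats import norm

confidence = 0.95
low, high = norm.interval(confidence)

# Equal tail probability on each side of the median.
p_tail = (1 - confidence) / 2
manual = norm.ppf([p_tail, 1 - p_tail])

print(low, high)
print(manual)
```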
- isf(q, *args, **kwds)
Inverse survival function (inverse of sf) at q of the given RV.
- Parameters:
- q : array_like
upper tail probability
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc : array_like, optional
location parameter (default=0)
- scale : array_like, optional
scale parameter (default=1)
- Returns:
- x : ndarray or scalar
Quantile corresponding to the upper tail probability q.
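As a quick illustration, assuming scipy.stats.expon as a stand-in distribution: the exponential survival function is exp(-x), so its inverse at q is -log(q), and isf(q) agrees with ppf(1 - q).

```python
import math
from scipy.stats import expon

q = 0.1
x_upper = expon.isf(q)           # quantile with upper tail probability q

print(x_upper, -math.log(q))
print(expon.sf(x_upper))         # recovers q
print(expon.ppf(1 - q))          # same quantile via the lower tail
```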
- logcdf(x, *args, **kwds)
Log of the cumulative distribution function at x of the given RV.
- Parameters:
- x : array_like
quantiles
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc : array_like, optional
location parameter (default=0)
- scale : array_like, optional
scale parameter (default=1)
- Returns:
- logcdf : array_like
Log of the cumulative distribution function evaluated at x
- logpdf(x, *args, **kwds)
Log of the probability density function at x of the given RV.
This uses a more numerically accurate calculation if available.
- Parameters:
- x : array_like
quantiles
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc : array_like, optional
location parameter (default=0)
- scale : array_like, optional
scale parameter (default=1)
- Returns:
- logpdf : array_like
Log of the probability density function evaluated at x
- logsf(x, *args, **kwds)
Log of the survival function of the given RV.
Returns the log of the “survival function,” defined as (1 - cdf), evaluated at x.
- Parameters:
- x : array_like
quantiles
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc : array_like, optional
location parameter (default=0)
- scale : array_like, optional
scale parameter (default=1)
- Returns:
- logsf : ndarray
Log of the survival function evaluated at x.
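The log-scale methods matter most far in the tails, where the plain probability underflows. The sketch below assumes scipy.stats.norm as a stand-in, since it provides a dedicated log survival function; a distribution without a specialized implementation may not show the same benefit.

```python
import numpy as np
from scipy.stats import norm

x = 40.0

# The survival function underflows to 0.0 in double precision ...
with np.errstate(divide="ignore"):
    naive = np.log(norm.sf(x))

# ... while the log survival function stays finite.
stable = norm.logsf(x)

print(naive, stable)
```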
- mean(*args, **kwds)
Mean of the distribution.
- Parameters:
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc : array_like, optional
location parameter (default=0)
- scale : array_like, optional
scale parameter (default=1)
- Returns:
- mean : float
the mean of the distribution
- median(*args, **kwds)
Median of the distribution.
- Parameters:
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc : array_like, optional
Location parameter, Default is 0.
- scale : array_like, optional
Scale parameter, Default is 1.
- Returns:
- median : float
The median of the distribution.
See also
rv_discrete.ppf
Inverse of the CDF
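The median is the quantile at probability 0.5, so it can differ from the mean for skewed distributions. A brief sketch, assuming scipy.stats.expon as a stand-in, whose median is scale * log(2) and whose mean is scale:

```python
import math
from scipy.stats import expon

scale = 3.0
med = expon.median(scale=scale)
avg = expon.mean(scale=scale)

print(med, scale * math.log(2))       # median of the exponential distribution
print(avg)                            # mean is larger because of the right skew
print(expon.cdf(med, scale=scale))    # half of the probability lies below the median
```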
- moment(order, *args, **kwds)
Non-central moment of distribution of specified order.
- Parameters:
- order : int, order >= 1
Order of moment.
- arg1, arg2, arg3,… : float
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
- loc : array_like, optional
location parameter (default=0)
- scale : array_like, optional
scale parameter (default=1)
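Non-central (raw) moments are taken about zero rather than about the mean. As a check, assuming scipy.stats.expon as a stand-in, the k-th raw moment of the standard exponential distribution is k!:

```python
import math
from scipy.stats import expon

# The order is passed positionally; raw moments of Exp(1) are 1!, 2!, 3!, 4!.
raw_moments = [float(expon.moment(k)) for k in range(1, 5)]
factorials = [float(math.factorial(k)) for k in range(1, 5)]

print(raw_moments)
print(factorials)
```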
- nnlf(theta, x)
Negative loglikelihood function.
Notes
This is -sum(log pdf(x, theta), axis=0) where theta are the parameters (including loc and scale).
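The definition in the Notes can be reproduced by summing logpdf values. The sketch assumes scipy.stats.norm as a stand-in; because norm has no shape parameters, theta is just (loc, scale).

```python
import numpy as np
from scipy.stats import norm

x = np.array([0.2, -1.3, 0.7, 2.1])
theta = (0.5, 1.5)   # (loc, scale); shape parameters would come first if present

nll = norm.nnlf(theta, x)
manual = -np.sum(norm.logpdf(x, loc=theta[0], scale=theta[1]))

print(nll, manual)
```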
- pdf(x, *args, **kwds)
Probability density function at x of the given RV.
- Parameters:
- x : array_like
quantiles
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc : array_like, optional
location parameter (default=0)
- scale : array_like, optional
scale parameter (default=1)
- Returns:
- pdf : ndarray
Probability density function evaluated at x
- ppf(q, *args, **kwds)
Percent point function (inverse of cdf) at q of the given RV.
- Parameters:
- q : array_like
lower tail probability
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc : array_like, optional
location parameter (default=0)
- scale : array_like, optional
scale parameter (default=1)
- Returns:
- x : array_like
quantile corresponding to the lower tail probability q.
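Since ppf inverts cdf, a round trip through both should return the original probabilities. A small sketch, assuming scipy.stats.gamma as a stand-in distribution with one shape parameter:

```python
import numpy as np
from scipy.stats import gamma

q = np.array([0.05, 0.5, 0.95])
a = 2.0                                # shape parameter of the gamma distribution

x = gamma.ppf(q, a, scale=1.5)         # quantiles for the given probabilities
back = gamma.cdf(x, a, scale=1.5)      # mapping them back recovers q

print(x)
print(back)
```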
- property random_state
Get or set the generator object for generating random variates.
If random_state is None (or np.random), the numpy.random.RandomState singleton is used. If random_state is an int, a new RandomState instance is used, seeded with random_state. If random_state is already a Generator or RandomState instance, that instance is used.
- rvs(*args, **kwds)
Random variates of given type.
- Parameters:
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
- loc : array_like, optional
Location parameter (default=0).
- scale : array_like, optional
Scale parameter (default=1).
- size : int or tuple of ints, optional
Defining number of random variates (default is 1).
- random_state : {None, int, numpy.random.Generator, numpy.random.RandomState}, optional
If random_state is None (or np.random), the numpy.random.RandomState singleton is used. If random_state is an int, a new RandomState instance is used, seeded with random_state. If random_state is already a Generator or RandomState instance, that instance is used.
- Returns:
- rvs : ndarray or scalar
Random variates of given size.
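Passing an explicitly seeded generator makes draws reproducible. The sketch assumes scipy.stats.gamma as a stand-in distribution; the seed 42 is arbitrary.

```python
import numpy as np
from scipy.stats import gamma

# Two generators created from the same seed yield identical samples.
first = gamma.rvs(2.0, scale=1.5, size=5, random_state=np.random.default_rng(42))
second = gamma.rvs(2.0, scale=1.5, size=5, random_state=np.random.default_rng(42))

print(first)
print(np.array_equal(first, second))
```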
- sf(x, *args, **kwds)
Survival function (1 - cdf) at x of the given RV.
- Parameters:
- x : array_like
quantiles
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc : array_like, optional
location parameter (default=0)
- scale : array_like, optional
scale parameter (default=1)
- Returns:
- sf : array_like
Survival function evaluated at x
- stats(*args, **kwds)
Some statistics of the given RV.
- Parameters:
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc : array_like, optional
location parameter (default=0)
- scale : array_like, optional (continuous RVs only)
scale parameter (default=1)
- moments : str, optional
composed of letters [‘mvsk’] defining which moments to compute: ‘m’ = mean, ‘v’ = variance, ‘s’ = (Fisher’s) skew, ‘k’ = (Fisher’s) kurtosis. (default is ‘mv’)
- Returns:
- stats : sequence
of requested moments.
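The moments string selects which statistics are returned and in what order. A sketch assuming scipy.stats.gamma as a stand-in: with shape a and a given scale, the mean is a * scale, the variance is a * scale**2, the skew is 2 / sqrt(a), and the (excess) kurtosis is 6 / a.

```python
import math
from scipy.stats import gamma

a, scale = 2.0, 3.0
mean, var, skew, kurt = gamma.stats(a, scale=scale, moments="mvsk")

print(float(mean), float(var), float(skew), float(kurt))
```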
- std(*args, **kwds)
Standard deviation of the distribution.
- Parameters:
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc : array_like, optional
location parameter (default=0)
- scale : array_like, optional
scale parameter (default=1)
- Returns:
- std : float
standard deviation of the distribution
- support(*args, **kwargs)
Support of the distribution.
- Parameters:
- arg1, arg2, … : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
- loc : array_like, optional
location parameter, Default is 0.
- scale : array_like, optional
scale parameter, Default is 1.
- Returns:
- a, b : array_like
end-points of the distribution’s support.
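The end-points of the support move with loc and scale. A short sketch, assuming scipy.stats.expon and scipy.stats.uniform as stand-in distributions with supports [0, inf) and [0, 1] in standard form:

```python
import math
from scipy.stats import expon, uniform

# Half-infinite support: only the lower end-point is shifted by loc.
exp_low, exp_high = expon.support(loc=2.0, scale=3.0)

# Bounded support: [0, 1] becomes [loc, loc + scale].
uni_low, uni_high = uniform.support(loc=1.0, scale=4.0)

print(exp_low, exp_high)
print(uni_low, uni_high)
```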
- var(*args, **kwds)
Variance of the distribution.
- Parameters:
- arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc : array_like, optional
location parameter (default=0)
- scale : array_like, optional
scale parameter (default=1)
- Returns:
- var : float
the variance of the distribution