akaike_info_criterion_lsq#
- scikitplot._astropy.stats.akaike_info_criterion_lsq(ssr, n_params, n_samples)[source]#
Computes the Akaike Information Criterion assuming that the observations are Gaussian distributed.
In this case, AIC is given as
\[\mathrm{AIC} = n\ln\left(\dfrac{\mathrm{SSR}}{n}\right) + 2k\]In case that the sample size is not “large enough”, a correction is applied, i.e.
\[\mathrm{AIC} = n\ln\left(\dfrac{\mathrm{SSR}}{n}\right) + 2k + \dfrac{2k(k+1)}{n-k-1}\]in which \(n\) is the sample size, \(k\) is the number of free parameters and \(\mathrm{SSR}\) stands for the sum of squared residuals between model and data.
This is applicable, for instance, when the parameters of a model are estimated using the least squares statistic.
For detailed explanations see [1]-[2].
- Parameters:
- ssrfloat
Sum of squared residuals (SSR) between model and data.
- n_paramsint
Number of free parameters of the model, i.e., the dimension of the parameter space.
- n_samplesint
Number of observations.
- Returns:
- aicfloat
Akaike Information Criterion.
- Parameters:
- Return type:
References
[1]Akaike Information Criterion. <https://en.wikipedia.org/wiki/Akaike_information_criterion>
[2]Origin Lab. Comparing Two Fitting Functions. <https://www.originlab.com/doc/Origin-Help/PostFit-CompareFitFunc>
Examples
This example is based on Astropy Modeling webpage, Compound models section.
>>> import numpy as np >>> from astropy.modeling import models, fitting >>> from astropy.stats.info_theory import akaike_info_criterion_lsq >>> np.random.seed(42) >>> # Generate fake data >>> g1 = models.Gaussian1D(.1, 0, 0.2) # changed this to noise level >>> g2 = models.Gaussian1D(.1, 0.3, 0.2) # and added another Gaussian >>> g3 = models.Gaussian1D(2.5, 0.5, 0.1) >>> x = np.linspace(-1, 1, 200) >>> y = g1(x) + g2(x) + g3(x) + np.random.normal(0., 0.2, x.shape) >>> # Fit with three Gaussians >>> g3_init = (models.Gaussian1D(.1, 0, 0.1) ... + models.Gaussian1D(.1, 0.2, 0.15) ... + models.Gaussian1D(2.4, .4, 0.1)) >>> fitter = fitting.LevMarLSQFitter() >>> g3_fit = fitter(g3_init, x, y) >>> # Fit with two Gaussians >>> g2_init = (models.Gaussian1D(.1, 0, 0.1) + ... models.Gaussian1D(2, 0.5, 0.1)) >>> g2_fit = fitter(g2_init, x, y) >>> # Fit with only one Gaussian >>> g1_init = models.Gaussian1D(amplitude=2., mean=0.3, stddev=.5) >>> g1_fit = fitter(g1_init, x, y) >>> # Compute the mean squared errors >>> ssr_g3 = np.sum((g3_fit(x) - y)**2.0) >>> ssr_g2 = np.sum((g2_fit(x) - y)**2.0) >>> ssr_g1 = np.sum((g1_fit(x) - y)**2.0) >>> akaike_info_criterion_lsq(ssr_g3, 9, x.shape[0]) np.float64(-634.5257517810961) >>> akaike_info_criterion_lsq(ssr_g2, 6, x.shape[0]) np.float64(-662.83834510232043) >>> akaike_info_criterion_lsq(ssr_g1, 3, x.shape[0]) np.float64(-647.47312032659499)
Hence, from the AIC values, we would prefer to choose the model g2_fit. However, we can considerably support the model g3_fit, since the difference in AIC is about 2.4. We should reject the model g1_fit.