plot_residuals_distribution

scikitplot.api.metrics.plot_residuals_distribution(y_true, y_pred, *, dist_type='normal', var_power=1.5, title='Precision-Recall AUC Curves', title_fontsize='large', text_fontsize='medium', cmap=None, show_labels=True, digits=4, figsize=(10, 5), nrows=1, ncols=3, index=3, **kwargs)

Plot residuals and fit various distributions to assess their goodness of fit.
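As a rough illustration of the idea (not the scikit-plot implementation), the sketch below computes residuals, assuming they are defined as y_true - y_pred, fits a normal distribution by maximum likelihood with scipy.stats.norm.fit, and measures goodness of fit with a Kolmogorov-Smirnov test. The helper normal_residual_fit and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def normal_residual_fit(y_true, y_pred):
    """Hypothetical helper: fit a normal to residuals and run a KS test."""
    # Residuals assumed to be observed minus predicted values
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Maximum-likelihood estimates of the mean (loc) and std (scale)
    mu, sigma = stats.norm.fit(residuals)
    # KS test of the residuals against the fitted normal distribution
    ks = stats.kstest(residuals, "norm", args=(mu, sigma))
    return residuals, mu, sigma, ks.statistic, ks.pvalue


# Synthetic regression output: predictions off by Gaussian noise (std 2)
rng = np.random.default_rng(0)
y_true = rng.normal(100.0, 10.0, size=500)
y_pred = y_true + rng.normal(0.0, 2.0, size=500)

residuals, mu, sigma, ks_stat, p_value = normal_residual_fit(y_true, y_pred)
print(f"mu={mu:.3f}, sigma={sigma:.3f}, KS={ks_stat:.3f}, p={p_value:.3f}")
```

Because mu and sigma are estimated from the same residuals that are then tested, the plain KS p-value tends to be too large; treat it as a rough diagnostic rather than a formal test.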

Parameters:
y_true : array-like, shape (n_samples,)

Ground truth (correct) target values.

y_pred : array-like, shape (n_samples,)

Estimated target values as returned by a regressor.

dist_type : str, optional, default='normal'

Type of distribution to fit to the residuals. Options are:

  • ‘normal’: For symmetrically distributed residuals (mean μ, std σ).

  • ‘poisson’: For count-based residuals or rare events (mean λ).

  • ‘gamma’: For positive, skewed residuals with a long right tail (shape k or α, scale θ; equivalently rate β = 1/θ).

  • ‘inverse_gaussian’: For residuals with a distribution similar to the inverse Gaussian (mean μ, scale λ).

  • ‘exponential’: For non-negative residuals with a long tail (rate λ, i.e. scale 1/λ).

  • ‘lognormal’: For positively skewed residuals with a multiplicative effect (shape σ, scale exp(μ)).

  • ‘tweedie’: For data that combine count-like and continuous components.

The Tweedie distribution can model different types of data based on the variance power (var_power):

  • var_power = 0: Normal distribution (mean μ, std σ)

  • var_power = 1: Poisson distribution (mean λ)

  • 1 < var_power < 2: Compound Poisson-Gamma distribution

  • var_power = 2: Gamma distribution (shape k, scale θ)

  • var_power = 3: Inverse Gaussian distribution (mean μ, scale λ)
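As a hedged sketch of how the continuous options above could be fitted and compared (the name-to-family mapping, the helper fit_family, and fixing the location at 0 are assumptions, not scikit-plot's internals), each family can be fitted by maximum likelihood with scipy.stats and ranked by log-likelihood. 'poisson' is left out because it is discrete (its MLE is simply the sample mean of the counts), and 'tweedie' because scipy.stats has no Tweedie family. Note that SciPy's invgauss uses its own (mu, loc, scale) parameterization, which differs from the (μ, λ) notation above.

```python
import numpy as np
from scipy import stats

# Assumed mapping from dist_type names to continuous scipy.stats families.
CONTINUOUS_FAMILIES = {
    "normal": stats.norm,
    "gamma": stats.gamma,
    "inverse_gaussian": stats.invgauss,
    "exponential": stats.expon,
    "lognormal": stats.lognorm,
}


def fit_family(x, dist_type):
    """Fit one family by maximum likelihood; return (params, log-likelihood)."""
    dist = CONTINUOUS_FAMILIES[dist_type]
    x = np.asarray(x, dtype=float)
    if dist_type == "normal":
        params = dist.fit(x)  # (loc, scale) = (mu, sigma)
    else:
        # Positive-support families: require x > 0 and pin the location at 0
        if np.any(x <= 0):
            raise ValueError(f"{dist_type!r} requires strictly positive data")
        params = dist.fit(x, floc=0)
    loglik = float(np.sum(dist.logpdf(x, *params)))
    return params, loglik


# Positive, right-skewed sample (e.g. absolute residuals); true family: gamma
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.5, size=2000)

results = {name: fit_family(x, name) for name in CONTINUOUS_FAMILIES}
for name, (params, loglik) in sorted(results.items(), key=lambda kv: -kv[1][1]):
    print(f"{name:>16}: log-likelihood={loglik:.1f}")
```

Log-likelihoods are only comparable for families fitted to the same data. Because the gamma family with loc fixed at 0 contains the exponential as the special case k = 1, its maximized log-likelihood can never be lower than the exponential's.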

var_power : float or None, optional, default=1.5

The variance power for the Tweedie distribution; used only when dist_type='tweedie'.

  • The default, 1.5, selects a compound Poisson-Gamma distribution.

  • Other example values: 2 for a Gamma distribution, 3 for an inverse Gaussian distribution.
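For orientation, Tweedie distributions are characterized by the power variance function Var(Y) = φ·μ^p, where p is var_power and φ is a dispersion parameter; no Tweedie distribution exists for 0 < p < 1. The hypothetical helpers below (not part of scikit-plot) encode that relation and the special cases listed for dist_type='tweedie'.

```python
def tweedie_variance(mu, var_power, dispersion=1.0):
    """Tweedie variance function: Var(Y) = dispersion * mu ** var_power.

    Assumes mu > 0 whenever var_power is non-integer.
    """
    if 0 < var_power < 1:
        raise ValueError("no Tweedie distribution exists for 0 < var_power < 1")
    return dispersion * mu ** var_power


def tweedie_family(var_power):
    """Name the special case selected by var_power."""
    if 0 < var_power < 1:
        raise ValueError("no Tweedie distribution exists for 0 < var_power < 1")
    if var_power == 0:
        return "normal"  # constant variance
    if var_power == 1:
        return "poisson"  # variance equals the mean when dispersion is 1
    if 1 < var_power < 2:
        return "compound Poisson-Gamma"
    if var_power == 2:
        return "gamma"
    if var_power == 3:
        return "inverse Gaussian"
    return "other Tweedie"


for p in (0, 1, 1.5, 2, 3):
    print(p, tweedie_family(p), tweedie_variance(2.0, p))
```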

title : str, optional, default='Precision-Recall AUC Curves'

Title of the generated plot.

title_fontsize : str or int, optional, default='large'

Font size for the plot title.

text_fontsize : str or int, optional, default='medium'

Font size for the text in the plot.

cmap : None, str or matplotlib.colors.Colormap, optional, default=None

Colormap used for plotting. Options include ‘viridis’, ‘PiYG’, ‘plasma’, ‘inferno’, ‘nipy_spectral’, etc. See Matplotlib Colormap documentation for available choices.
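To show what a cmap value resolves to, the sketch below draws evenly spaced RGBA colors from a colormap name or object with matplotlib.pyplot.get_cmap. Whether scikit-plot samples its colors this way is an assumption; sample_colors is a hypothetical helper.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def sample_colors(cmap=None, n=3):
    """Return n evenly spaced RGBA tuples from a colormap name or object.

    cmap=None falls back to Matplotlib's default colormap (rcParams['image.cmap']).
    """
    colormap = plt.get_cmap(cmap)
    return [colormap(v) for v in np.linspace(0.0, 1.0, n)]


colors = sample_colors("viridis", n=3)
print(colors)
```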

show_labels : bool, optional, default=True

Whether to display the legend labels.

digits : int, optional, default=4

Number of digits used when formatting numeric values in the plot.

Added in version 0.3.9.
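A minimal sketch of the kind of formatting digits controls, assuming fixed-point rounding of values such as fitted distribution parameters in legend labels; format_fit_label and the parameter names are hypothetical.

```python
def format_fit_label(dist_type, params, digits=4):
    """Build a legend label such as 'normal (mu=0.1235, sigma=1.5000)'."""
    # Each value is rendered in fixed-point notation with `digits` decimals
    parts = ", ".join(f"{name}={value:.{digits}f}" for name, value in params.items())
    return f"{dist_type} ({parts})"


print(format_fit_label("normal", {"mu": 0.123456, "sigma": 1.5}, digits=4))
```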

**kwargs : dict

Additional keyword arguments; see Other Parameters.

Returns:
ax : matplotlib.axes.Axes

The axes on which the plot was drawn.

Other Parameters:
ax : matplotlib.axes.Axes, optional, default=None

The axes to plot the figure on. If None is passed, the current axes are used (or created if required).

fig : matplotlib.figure.Figure, optional, default=None

The figure to plot on. If None is passed, the current figure is used (or created if required).

figsize : tuple, optional, default=(10, 5)

Figure size as (width, height) in inches, e.g. (12, 5).

nrows : int, optional, default=1

Number of rows in the subplot grid.

ncols : int, optional, default=3

Number of columns in the subplot grid.
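The grid parameters follow Matplotlib's subplot conventions. The sketch below builds the signature's default grid (one row of three panels at 10 x 5 inches) and shows that Matplotlib's add_subplot(nrows, ncols, index) uses a 1-based index, so index=3 in a 1 x 3 grid is the rightmost cell. That scikit-plot forwards nrows, ncols and index to add_subplot in exactly this way is an assumption.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Full grid: one row of three panels, 10 x 5 inches
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(10, 5))
print(axes.shape)  # a single row is squeezed to a 1-D array of axes

# Single panel: 1-based index, so 3 selects the rightmost cell of a 1 x 3 grid
fig2 = plt.figure(figsize=(10, 5))
ax = fig2.add_subplot(1, 3, 3)
print(ax.get_position().x0)
```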

plot_style : str, optional, default=None

Check available styles with “plt.style.available”. Examples include: [‘ggplot’, ‘seaborn’, ‘bmh’, ‘classic’, ‘dark_background’, ‘fivethirtyeight’, ‘grayscale’, ‘seaborn-bright’, ‘seaborn-colorblind’, ‘seaborn-dark’, ‘seaborn-dark-palette’, ‘tableau-colorblind10’, ‘fast’].

Added in version 0.4.0.
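Style names must appear in plt.style.available for the installed Matplotlib; since Matplotlib 3.6 the seaborn styles are named 'seaborn-v0_8-*', so some names in the list above may be missing on newer versions. The sketch below applies a style temporarily with plt.style.context, which restores the previous rcParams on exit; whether scikit-plot applies plot_style through a context manager or plt.style.use is an assumption.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

print("ggplot" in plt.style.available)

outside = plt.rcParams["axes.facecolor"]
# Apply a style temporarily; rcParams are restored when the block exits
with plt.style.context("ggplot"):
    inside = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])

print(outside, inside)
```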

show_fig : bool, default=True

Whether to show the plot.

save_fig : bool, default=False

Whether to save the plot.

save_fig_filename : str, optional, default=''

Path and file type for the saved plot. If not specified, the plot is saved as a PNG file inside a result_images directory under the current working directory, named after the plotting function's __name__.
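The default naming rule described above can be sketched as a small path computation. default_save_path is a hypothetical helper that mirrors the documented behaviour; it is not scikit-plot's code, and the exact path the library builds may differ.

```python
from pathlib import Path


def default_save_path(func, save_fig_filename="", base_dir=None):
    """Hypothetical helper mirroring the documented save_fig_filename default."""
    if save_fig_filename:
        # An explicit path (with its file extension) takes precedence
        return Path(save_fig_filename)
    # Otherwise: <cwd>/result_images/<function name>.png
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return base / "result_images" / f"{func.__name__}.png"


def plot_residuals_distribution():  # stand-in with the documented name
    pass


print(default_save_path(plot_residuals_distribution))
```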

verbose : bool, optional

If True, prints debugging information.

Raises:
ValueError: If an unsupported distribution type is provided or if var_power is invalid.
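The two documented error conditions can be sketched as an up-front argument check. validate_dist_args is hypothetical, and the rule used for an invalid var_power (missing, or in the range 0 < p < 1 where no Tweedie distribution exists) is an assumption about what the library rejects.

```python
SUPPORTED_DIST_TYPES = (
    "normal", "poisson", "gamma", "inverse_gaussian",
    "exponential", "lognormal", "tweedie",
)


def validate_dist_args(dist_type, var_power=1.5):
    """Hypothetical checks matching the documented ValueError conditions."""
    if dist_type not in SUPPORTED_DIST_TYPES:
        raise ValueError(
            f"Unsupported dist_type {dist_type!r}; expected one of {SUPPORTED_DIST_TYPES}"
        )
    if dist_type == "tweedie":
        # Assumed rule: a numeric power is required and 0 < p < 1 is invalid
        if var_power is None or 0 < var_power < 1:
            raise ValueError(f"Invalid var_power {var_power!r} for dist_type='tweedie'")


validate_dist_args("tweedie", 1.5)  # passes silently
```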

Examples

>>> import numpy as np
>>> np.random.seed(0)
>>> from sklearn.datasets import (
...     load_diabetes as data_regression,
... )
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import Ridge
>>> import scikitplot as skplt
>>>
>>> X, y = data_regression(return_X_y=True, as_frame=False)
>>> X_train, X_val, y_train, y_val = train_test_split(
...     X, y, test_size=0.5, random_state=0
... )
>>> model = Ridge(alpha=1.0).fit(X_train, y_train)
>>> y_val_pred = model.predict(X_val)
>>> skplt.metrics.plot_residuals_distribution(
...     y_val, y_val_pred, dist_type='tweedie',
... );

Figure: Residuals Distribution