plot_residuals_distribution
- scikitplot.api.metrics.plot_residuals_distribution(y_true, y_pred, *, dist_type='normal', var_power=1.5, title='Precision-Recall AUC Curves', ax=None, fig=None, figsize=(10, 5), title_fontsize='large', text_fontsize='medium', cmap=None, show_labels=True, digits=4, **kwargs)
Plot residuals and fit various distributions to assess their goodness of fit.
- Parameters:
- y_true : array-like, shape (n_samples,)
Ground truth (correct) target values.
- y_pred : array-like, shape (n_samples,)
Estimated targets as returned by a regressor.
- dist_type : str, optional, default='normal'
Type of distribution to fit to the residuals. Options are:
‘normal’: For symmetrically distributed residuals (mean μ, std σ).
‘poisson’: For count-based residuals or rare events (mean λ).
‘gamma’: For positive, skewed residuals with a heavy tail (shape k or α, scale θ or β).
‘inverse_gaussian’: For residuals with a distribution similar to the inverse Gaussian (mean μ, scale λ).
‘exponential’: For non-negative residuals with a long tail (scale λ).
‘lognormal’: For positively skewed residuals with a multiplicative effect (shape σ, scale exp(μ)).
‘tweedie’: For complex data including counts and continuous components.
The Tweedie distribution can model different types of data based on the variance power (var_power):
var_power = 0: Normal distribution (mean μ, std σ)
var_power = 1: Poisson distribution (mean λ)
1 < var_power < 2: Compound Poisson-Gamma distribution
var_power = 2: Gamma distribution (shape k, scale θ)
var_power = 3: Inverse Gaussian distribution (mean μ, scale λ)
- var_power : float or None
The variance power for the Tweedie distribution, applicable if dist_type='tweedie'. Default is 1.5, which falls in the Compound Poisson-Gamma range.
Example values: 1.5 for Compound Poisson-Gamma distribution, 2 for Gamma distribution.
- title : str, optional, default='Precision-Recall AUC Curves'
Title of the generated plot.
- ax : list of matplotlib.axes.Axes, optional, default=None
The axis to plot the figure on. If None is passed in, the current axes will be used (or generated if required). Examples of suitable axes are fig.add_subplot(1, 1, 1) or plt.gca().
- fig : matplotlib.pyplot.figure, optional, default=None
The figure to plot the Visualizer on. If None is passed in, the current plot will be used (or generated if required).
Added in version 0.3.9.
- figsize : tuple of int, optional, default=(10, 5)
Size of the figure (width, height) in inches.
- title_fontsize : str or int, optional, default='large'
Font size for the plot title.
- text_fontsize : str or int, optional, default='medium'
Font size for the text in the plot.
- cmap : None, str or matplotlib.colors.Colormap, optional, default=None
Colormap used for plotting. Options include 'viridis', 'PiYG', 'plasma', 'inferno', 'nipy_spectral', etc. See the Matplotlib colormap documentation for available choices:
https://matplotlib.org/stable/users/explain/colors/index.html
plt.colormaps()
plt.get_cmap() # None == 'viridis'
- show_labels : bool, optional, default=True
Whether to display the legend labels.
- digits : int, optional, default=4
Number of digits used to format numeric values in the plot.
Added in version 0.3.9.
- Returns:
- ax : matplotlib.axes.Axes
The axes on which the plot was drawn.
- Raises:
- ValueError: If an unsupported distribution type is provided or if var_power is invalid.
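The variance-power cases listed under dist_type can be sketched as a small lookup. This is only an illustration: `tweedie_family` is a hypothetical helper, not part of scikitplot, and the page does not say which var_power values the library itself rejects. The one rule assumed here comes from the Tweedie family itself, which has no member for variance powers strictly between 0 and 1.

```python
import math


def tweedie_family(var_power):
    """Name the Tweedie special case for a variance power p.

    Hypothetical helper for illustration only; it is not scikitplot code.
    In the Tweedie family the variance is proportional to mean ** p.
    """
    p = float(var_power)
    if math.isnan(p):
        raise ValueError("var_power must be a number, got NaN")
    if 0 < p < 1:
        # No Tweedie distribution exists for 0 < p < 1.
        raise ValueError(f"var_power={p} is invalid: no Tweedie distribution for 0 < p < 1")
    if p == 0:
        return "normal"
    if p == 1:
        return "poisson"
    if 1 < p < 2:
        return "compound_poisson_gamma"
    if p == 2:
        return "gamma"
    if p == 3:
        return "inverse_gaussian"
    # Remaining powers (p < 0, or p > 2 other than 3) are valid but unnamed here.
    return "other_tweedie"


for p in (0, 1, 1.5, 2, 3):
    print(p, tweedie_family(p))
```

Checking the power up front like this lets a caller see which special case a given var_power selects before any plotting takes place.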
Examples
>>> import numpy as np; np.random.seed(0)
>>> from sklearn.datasets import load_diabetes as data_regression
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import Ridge
>>> import scikitplot as skplt
>>>
>>> X, y = data_regression(return_X_y=True, as_frame=False)
>>> X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)
>>> model = Ridge(alpha=1.0).fit(X_train, y_train)
>>> y_val_pred = model.predict(X_val)
>>> skplt.metrics.plot_residuals_distribution(
...     y_val, y_val_pred, dist_type='tweedie',
... );
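To make the idea behind dist_type='normal' concrete, the following standard-library sketch computes residuals and fits a normal distribution to them. It does not call scikitplot and may differ from what the function does internally. The sample values are made up for illustration, and the sign convention (y_true minus y_pred) is an assumption, since this page does not state one.

```python
from statistics import NormalDist

# Made-up illustrative values, not taken from any dataset.
y_true = [3.0, 5.0, 7.5, 10.0, 12.0]
y_pred = [2.5, 5.5, 7.0, 11.0, 11.0]

# Assumed convention: residual = observed - predicted.
residuals = [t - p for t, p in zip(y_true, y_pred)]

# Estimate the mean and (sample) standard deviation of a normal fit.
fitted = NormalDist.from_samples(residuals)

print("residuals:", residuals)
print("fitted mean:", fitted.mean)
print("fitted std:", fitted.stdev)
# Density of the fitted normal at zero residual.
print("pdf at 0:", fitted.pdf(0.0))
```

A fitted mean close to zero and a histogram that tracks the fitted density are the usual signs that a normal model suits the residuals.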
Gallery examples
plot_residuals_distribution with examples