ModelPlotPy

class scikitplot.decile.ModelPlotPy(feature_data=None, label_data=None, dataset_labels=None, models=None, model_labels=None, ntiles=10, seed=0)

Decile/ntile analysis for sklearn classifiers.

Parameters:

feature_data (Sequence[Any] or None, default=None): Sequence of feature matrices (DataFrame or ndarray), one per dataset.
label_data (Sequence[Any] or None, default=None): Sequence of label vectors (Series/ndarray/list), one per dataset.
dataset_labels (Sequence[str] or None, default=None): Names for datasets; must match the length of feature_data and label_data.
models (Sequence[ClassifierMixin] or None, default=None): Fitted sklearn-like classifiers that implement predict_proba and expose classes_.
model_labels (Sequence[str] or None, default=None): Names for models; must match the length of models.
ntiles (int, default=10): Number of ntiles. Must satisfy 2 <= ntiles <= n_samples for each dataset.
seed (int, default=0): Reserved for backward compatibility. Not used (ntiles are deterministic).

Returns:

ModelPlotPy: Instance.

Raises:

ValueError: If list lengths are inconsistent or ntiles is invalid.
TypeError: If models are not sklearn classifiers.

Parameters:

feature_data (Sequence[Any] | None)
label_data (Sequence[Any] | None)
dataset_labels (Sequence[str] | None)
models (Sequence[ClassifierMixin] | None)
model_labels (Sequence[str] | None)
ntiles (int)
seed (int)

See also

sklearn.base.ClassifierMixin

Notes

Key design rules:

No mutable defaults in __init__.
No randomness in ntile assignment (the legacy implementation added random noise before calling qcut).

Examples

>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> from scikitplot.decile import ModelPlotPy
>>> X = pd.DataFrame({"x": [0, 1, 2, 3]})
>>> y = pd.Series([0, 0, 1, 1])
>>> m = LogisticRegression().fit(X, y)
>>> mp = ModelPlotPy([X], [y], ["train"], [m], ["lr"], ntiles=2)
>>> scores = mp.prepare_scores_and_ntiles()
>>> set(scores.columns) >= {
...     "dataset_label",
...     "model_label",
...     "target_class",
...     "prob_0",
...     "prob_1",
...     "dec_0",
...     "dec_1",
... }
True

aggregate_over_ntiles()

Aggregate counts and lift/gain metrics per ntile.

Parameters:

None

Returns:

pandas.DataFrame: Aggregated metrics per (model_label, dataset_label, target_class, ntile). The output schema matches the legacy implementation so it can be consumed by the existing plot functions.

Raises:

ValueError: If any group has zero positives for a requested target class.

Return type:

DataFrame

See also

prepare_scores_and_ntiles

Notes

Dev note: this implementation avoids mutating shared columns in the scores dataframe during loops (the legacy code writes pos/neg repeatedly).
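
As a rough illustration of what a per-ntile aggregation involves, the sketch below computes cumulative gain and cumulative lift for a single (model, dataset, target class) group. The helper name aggregate_one_group and the column names tot, pos, cumpos, cumtot, cumgain, and cumlift are assumptions made for this example; they are not confirmed to match the library's output schema.

```python
import pandas as pd


def aggregate_one_group(dec, is_pos):
    """Hypothetical helper: per-ntile counts plus cumulative gain and lift.

    dec    : ntile of each row (1 = highest predicted probability)
    is_pos : 1 if the row belongs to the target class, else 0
    """
    df = pd.DataFrame({"ntile": dec, "pos": is_pos})
    # Count rows and positives per ntile without touching the input frame.
    agg = df.groupby("ntile")["pos"].agg(tot="count", pos="sum").sort_index()

    total_pos = agg["pos"].sum()
    if total_pos == 0:
        raise ValueError("group has zero positives for the target class")

    agg["cumpos"] = agg["pos"].cumsum()
    agg["cumtot"] = agg["tot"].cumsum()
    # Cumulative gain: share of all positives captured up to this ntile.
    agg["cumgain"] = agg["cumpos"] / total_pos
    # Cumulative lift: positive rate up to this ntile relative to the overall rate.
    overall_rate = total_pos / agg["tot"].sum()
    agg["cumlift"] = (agg["cumpos"] / agg["cumtot"]) / overall_rate
    return agg.reset_index()


agg = aggregate_one_group(dec=[1, 1, 2, 2, 3, 3], is_pos=[1, 1, 1, 0, 0, 0])
print(agg)
```

The zero-positives check mirrors the ValueError documented above: with no positives in a group, gain and lift would require dividing by zero.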

Examples

>>> # agg = mp.aggregate_over_ntiles()

get_params()

Get parameters (sklearn-style API).

Parameters:

None

Returns:

dict[str, Any]: Parameter dictionary.

Raises:

None

Return type:

dict[str, Any]

See also

set_params

Notes

The returned objects are the current attributes; callers should not mutate them in-place if they want stable behavior.

Examples

>>> # mp.get_params()

plotting_scope(scope='auto', select_model_label=None, select_dataset_label=None, select_targetclass=None, select_smallest_targetclass=True)

Build plot_input subset according to a strict scope contract.

Parameters:

scope ({'auto', 'no_comparison', 'compare_models', 'compare_datasets', 'compare_targetclasses'}, default='auto')

Evaluation perspective.

If scope='auto', the scope is inferred deterministically from the provided selectors and the available options.

select_model_label (Sequence[str] or None, default=None)

Model labels to include.

select_dataset_label (Sequence[str] or None, default=None)

Dataset labels to include.

select_targetclass (Sequence[Any] or None, default=None)

Target classes to include.

select_smallest_targetclass (bool, default=True)

Whether the plot should contain only the results for the smallest target class. If True, the target class is determined from the first dataset. If False and select_targetclass is None, the method falls back to list(self.models[0].classes_).

Returns:

pandas.DataFrame: Subset dataframe ready for plotting functions.

Raises:

ValueError: If the scope is invalid, selector values are invalid, or the selection is ambiguous under the strict contract.

Parameters:

scope (str)
select_model_label (Sequence[str] | None)
select_dataset_label (Sequence[str] | None)
select_targetclass (Sequence[Any] | None)
select_smallest_targetclass (bool)

Return type:

DataFrame

See also

aggregate_over_ntiles

Notes

Inference rules for scope='auto'

Let the universes be:

M = all model_labels
D = all dataset_labels
T = all fitted classes_

After validating selectors (membership is strict):

If exactly one selector among (models, datasets, targetclasses) contains two or more values, then auto selects the corresponding comparison scope:
  - len(select_model_label) >= 2 -> compare_models
  - len(select_dataset_label) >= 2 -> compare_datasets
  - len(select_targetclass) >= 2 -> compare_targetclasses

If more than one selector has length >= 2, the request is ambiguous and a ValueError is raised.

If no selector has length >= 2:
  - If all dimensions are fixed (either explicitly selected with length 1, or the universe size is 1), auto selects no_comparison.
  - Otherwise, if exactly one dimension is unfixed (universe size > 1) while the other two are fixed, auto selects the corresponding comparison scope, comparing all values in that unfixed dimension.
  - If the remaining degrees of freedom are not unique, the request is ambiguous and a ValueError is raised.
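
The inference rules above can be sketched as a small standalone function. This is an illustration of the stated rules only, not the library's implementation: the function name infer_scope and its argument names are hypothetical, and the sketch ignores select_smallest_targetclass, which may fix the target-class dimension in the real method.

```python
def infer_scope(sel_models, sel_datasets, sel_classes,
                all_models, all_datasets, all_classes):
    """Hypothetical sketch of the scope='auto' rules; not the library's code."""
    dims = [
        ("compare_models", sel_models, all_models),
        ("compare_datasets", sel_datasets, all_datasets),
        ("compare_targetclasses", sel_classes, all_classes),
    ]

    # Strict membership: every selected value must exist in its universe.
    for name, selected, universe in dims:
        for value in selected or []:
            if value not in universe:
                raise ValueError(f"{value!r} is not a valid option for {name}")

    # Selectors with two or more values request a comparison explicitly.
    multi = [name for name, selected, _ in dims
             if selected is not None and len(selected) >= 2]
    if len(multi) == 1:
        return multi[0]
    if len(multi) > 1:
        raise ValueError("ambiguous: more than one selector has length >= 2")

    # A dimension is fixed if selected with length 1 or its universe has size 1.
    unfixed = [name for name, selected, universe in dims
               if not (selected is not None and len(selected) == 1)
               and len(universe) > 1]
    if not unfixed:
        return "no_comparison"
    if len(unfixed) == 1:
        return unfixed[0]
    raise ValueError("ambiguous: more than one dimension is unfixed")


universes = dict(all_models=["lr", "rf"], all_datasets=["train", "test"],
                 all_classes=[0, 1])
print(infer_scope(None, ["test"], [1], **universes))
print(infer_scope(["lr"], ["test"], [1], **universes))
```

With two models available, fixing the dataset and the target class leaves models as the only unfixed dimension, so the first call resolves to compare_models; fixing all three resolves to no_comparison.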

Examples

>>> # Compare all models on a fixed dataset and target class (scope inferred):
>>> # plot_input = mp.plotting_scope(select_dataset_label=['test'], select_targetclass=[1])
>>>
>>> # Compare two datasets for a fixed model and target class (scope inferred):
>>> # plot_input = mp.plotting_scope(select_model_label=['lr'], select_targetclass=[1])

prepare_scores_and_ntiles()

Compute per-row class probabilities and deterministic ntiles.

Parameters:

None

Returns:

pandas.DataFrame

DataFrame containing:

dataset_label and model_label
target_class (true label)
prob_<class> columns
dec_<class> columns (1..ntiles; 1 = highest probability)

Raises:

ValueError: If there are no models/datasets, ntiles is invalid, or any dataset has fewer rows than ntiles.

Return type:

DataFrame

See also

aggregate_over_ntiles

Notes

This replaces the legacy approach that added random noise and used pandas.qcut for binning.
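
One way to obtain deterministic ntiles without random noise is rank-based binning with a stable sort. The sketch below assumes this approach purely for illustration; the source does not state which algorithm the library uses, and the helper name assign_ntiles is hypothetical.

```python
import numpy as np


def assign_ntiles(prob, ntiles=10):
    """Hypothetical helper: deterministic ntiles, 1 = highest probability."""
    prob = np.asarray(prob, dtype=float)
    n = prob.shape[0]
    if not 2 <= ntiles <= n:
        raise ValueError("ntiles must satisfy 2 <= ntiles <= n_samples")

    # Stable sort on negated scores: tied rows keep their original order,
    # so no random noise is needed to break ties.
    order = np.argsort(-prob, kind="stable")
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(n)

    # Map ranks 0..n-1 onto bins 1..ntiles of near-equal size.
    return ranks * ntiles // n + 1


dec = assign_ntiles([0.9, 0.1, 0.5, 0.5, 0.7, 0.2], ntiles=3)
print(dec.tolist())  # [1, 3, 2, 2, 1, 3]
```

Because ties are broken by row order rather than by noise, repeated calls on the same input always return the same assignment.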

Examples

>>> # scores = mp.prepare_scores_and_ntiles()

reset_params()

Reset all parameters to a default empty state.

Parameters:

None

Returns:

None

Raises:

None

Return type:

None

See also

set_params, get_params

Notes

The object remains usable after repopulating fields.

Examples

>>> # mp.reset_params()

set_params(**params)

Set parameters (sklearn-style API) and re-validate.

Parameters:

**params (Any): Attributes to set on the object.

Returns:

None

Raises:

ValueError: If an invalid parameter is provided.

Parameters:

params (Any)

Return type:

None

See also

get_params

Notes

After updating attributes, the internal state is validated.
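
The set-then-validate pattern described here can be illustrated with a stand-in class. ParamsSketch and its _validate method are hypothetical and cover only the ntiles lower bound; whether the real object rolls back attributes after a failed validation is not stated in the source, and this sketch does not roll back.

```python
class ParamsSketch:
    """Hypothetical stand-in showing set_params followed by re-validation."""

    _param_names = ("dataset_labels", "model_labels", "ntiles")

    def __init__(self, dataset_labels=None, model_labels=None, ntiles=10):
        self.dataset_labels = dataset_labels
        self.model_labels = model_labels
        self.ntiles = ntiles
        self._validate()

    def _validate(self):
        # Only the lower bound is checked; the upper bound depends on the data.
        if not isinstance(self.ntiles, int) or self.ntiles < 2:
            raise ValueError("ntiles must be an integer >= 2")

    def set_params(self, **params):
        # Reject unknown names before changing any attribute.
        for name in params:
            if name not in self._param_names:
                raise ValueError(f"Invalid parameter {name!r}")
        for name, value in params.items():
            setattr(self, name, value)
        # Re-validate the whole state after the update.
        self._validate()


p = ParamsSketch()
p.set_params(ntiles=20)
print(p.ntiles)  # 20
```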

Examples

>>> # mp.set_params(ntiles=20)

Gallery examples

Introduction to modelplotpy
