ModelPlotPy

class scikitplot.decile.ModelPlotPy(feature_data=None, label_data=None, dataset_labels=None, models=None, model_labels=None, ntiles=10, seed=0)

Decile/ntile analysis for sklearn classifiers.

Parameters:
feature_data : Sequence[Any] or None, default=None

Sequence of feature matrices (DataFrame or ndarray). One per dataset.

label_data : Sequence[Any] or None, default=None

Sequence of label vectors (Series/ndarray/list). One per dataset.

dataset_labels : Sequence[str] or None, default=None

Names for datasets; must match length of feature_data and label_data.

models : Sequence[ClassifierMixin] or None, default=None

Fitted sklearn-like classifiers that implement predict_proba and classes_.

model_labels : Sequence[str] or None, default=None

Names for models; must match length of models.

ntiles : int, default=10

Number of ntiles. Must satisfy 2 <= ntiles <= n_samples for each dataset.

seed : int, default=0

Reserved for backward compatibility. Not used (ntiles are deterministic).

Returns:
ModelPlotPy

Instance.

Raises:
ValueError

If list lengths are inconsistent or ntiles is invalid.

TypeError

If models are not sklearn classifiers.

Notes

Key design rules:

  • No mutable defaults in __init__.

  • No randomness in ntile assignment (the legacy code added random noise before calling pandas.qcut).

Examples

>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> from scikitplot.decile import ModelPlotPy
>>> X = pd.DataFrame({"x": [0, 1, 2, 3]})
>>> y = pd.Series([0, 0, 1, 1])
>>> m = LogisticRegression().fit(X, y)
>>> mp = ModelPlotPy([X], [y], ["train"], [m], ["lr"], ntiles=2)
>>> scores = mp.prepare_scores_and_ntiles()
>>> set(scores.columns) >= {
...     "dataset_label",
...     "model_label",
...     "target_class",
...     "prob_0",
...     "prob_1",
...     "dec_0",
...     "dec_1",
... }
True
aggregate_over_ntiles()

Aggregate counts and lift/gain metrics per ntile.

Parameters:
None
Returns:
pandas.DataFrame

Aggregated metrics per (model_label, dataset_label, target_class, ntile). The output schema matches the legacy implementation so it can be consumed by the existing plot functions.

Raises:
ValueError

If any group has zero positives for a requested target class.

Return type:

DataFrame

Notes

Dev note: this implementation avoids mutating shared columns in the scores dataframe during loops (the legacy code writes pos/neg repeatedly).
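
The aggregation can be pictured with a small pure-Python sketch for one (model, dataset, target class) group. This is an illustration under assumptions, not the library's code: the function name aggregate_ntiles and the output keys (tot, pos, cumgain, cumlift) are hypothetical and do not reproduce the legacy schema.

```python
def aggregate_ntiles(ntile_ids, is_positive, ntiles):
    """Per-ntile counts plus cumulative gain and lift for one group.

    Hypothetical helper: `ntile_ids` holds values in 1..ntiles and
    `is_positive` holds 1 where the row belongs to the target class.
    """
    total = len(ntile_ids)
    total_pos = sum(is_positive)
    if total_pos == 0:
        # Gain and lift are undefined without positives.
        raise ValueError("group has zero positives for the target class")
    base_rate = total_pos / total

    rows = []
    cum_tot = cum_pos = 0
    for k in range(1, ntiles + 1):
        members = [p for t, p in zip(ntile_ids, is_positive) if t == k]
        cum_tot += len(members)
        cum_pos += sum(members)
        rows.append({
            "ntile": k,
            "tot": len(members),
            "pos": sum(members),
            # Share of all positives captured up to and including ntile k.
            "cumgain": cum_pos / total_pos,
            # Cumulative positive rate relative to the overall rate.
            "cumlift": (cum_pos / cum_tot) / base_rate if cum_tot else float("nan"),
        })
    return rows


rows = aggregate_ntiles([1, 1, 2, 2, 3, 3], [1, 1, 1, 0, 0, 0], ntiles=3)
print([r["cumlift"] for r in rows])  # [2.0, 1.5, 1.0]
```

The sketch builds fresh rows rather than writing into its inputs, in the spirit of the dev note above.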

Examples

>>> # agg = mp.aggregate_over_ntiles()
get_params()

Get parameters (sklearn-style API).

Parameters:
None
Returns:
dict[str, Any]

Parameter dictionary.

Raises:
None
Return type:

dict[str, Any]

See also

set_params

Notes

The returned objects are the current attributes; callers should not mutate them in-place if they want stable behavior.
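
A minimal sketch of the safe pattern, using a plain dict as a stand-in for the return value of get_params() (the keys and values shown are made up for illustration):

```python
import copy

# Stand-in for the dict returned by mp.get_params(); the values are made up.
params = {"dataset_labels": ["train", "test"], "ntiles": 10}

# Modify a copy so the original containers are left untouched.
snapshot = copy.deepcopy(params)
snapshot["dataset_labels"].append("holdout")

print(params["dataset_labels"])    # ['train', 'test']
print(snapshot["dataset_labels"])  # ['train', 'test', 'holdout']
```

If deep-copying fitted models is too costly, copying only the containers you intend to change (for example with list(...)) achieves the same isolation.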

Examples

>>> # mp.get_params()
plotting_scope(scope='auto', select_model_label=None, select_dataset_label=None, select_targetclass=None, select_smallest_targetclass=True)

Build plot_input subset according to a strict scope contract.

Parameters:
scope : {'auto', 'no_comparison', 'compare_models', 'compare_datasets', 'compare_targetclasses'}, default='auto'

Evaluation perspective.

If scope='auto', the scope is inferred deterministically from the provided selectors and the available options.

select_model_label : Sequence[str] or None, default=None

Model labels to include.

select_dataset_label : Sequence[str] or None, default=None

Dataset labels to include.

select_targetclass : Sequence[Any] or None, default=None

Target classes to include.

select_smallest_targetclass : bool, default=True

Whether the plot should contain only the results for the smallest target class. If True, that target class is determined from the first dataset. If False and select_targetclass is None, the method tries to use list(self.models[0].classes_).

Returns:
pandas.DataFrame

Subset dataframe ready for plotting functions.

Raises:
ValueError

If the scope is invalid, selector values are invalid, or the selection is ambiguous under the strict contract.

Return type:

DataFrame

Notes

Inference rules for scope='auto'

Let the universes be:

  • M = all model_labels

  • D = all dataset_labels

  • T = all fitted classes_

After validating selectors (membership is strict):

  1. If exactly one selector among (models, datasets, targetclasses) contains two or more values, then auto selects the corresponding comparison scope.

    • len(select_model_label) >= 2 -> compare_models

    • len(select_dataset_label) >= 2 -> compare_datasets

    • len(select_targetclass) >= 2 -> compare_targetclasses

    If more than one selector has length >= 2, the request is ambiguous and a ValueError is raised.

  2. If no selector has length >= 2:

    • If all dimensions are fixed (either explicitly selected with length 1, or the universe size is 1), auto selects no_comparison.

    • Otherwise, if exactly one dimension is unfixed (universe size > 1) while the other two are fixed, auto selects the corresponding comparison scope comparing all values in that unfixed dimension.

    • If the remaining degrees of freedom are not unique, the request is ambiguous and a ValueError is raised.
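
The rules above can be sketched in plain Python. This is a reading of the documented contract, not the library's implementation: the function name infer_scope and its argument names are hypothetical, and the sketch ignores select_smallest_targetclass, which the rules above do not mention.

```python
def infer_scope(models, datasets, classes,
                select_models=None, select_datasets=None, select_classes=None):
    """Return the scope that scope='auto' would pick under the rules above."""
    dims = [
        ("compare_models", models, select_models),
        ("compare_datasets", datasets, select_datasets),
        ("compare_targetclasses", classes, select_classes),
    ]

    # Strict membership: every selected value must exist in its universe.
    for scope, universe, selected in dims:
        for value in selected or []:
            if value not in universe:
                raise ValueError(f"unknown value {value!r} for {scope}")

    # Rule 1: a selector with two or more values decides the scope.
    multi = [scope for scope, _, sel in dims if sel is not None and len(sel) >= 2]
    if len(multi) > 1:
        raise ValueError("ambiguous: more than one selector has length >= 2")
    if len(multi) == 1:
        return multi[0]

    # Rule 2: a dimension is fixed if one value is selected or only one exists.
    unfixed = [
        scope for scope, universe, sel in dims
        if not (sel is not None and len(sel) == 1) and len(universe) > 1
    ]
    if not unfixed:
        return "no_comparison"
    if len(unfixed) == 1:
        return unfixed[0]
    raise ValueError("ambiguous: more than one dimension is unfixed")


M, D, T = ["lr", "rf"], ["train", "test"], [0, 1]
print(infer_scope(M, D, T, select_datasets=["test"], select_classes=[1]))
# compare_models
print(infer_scope(["lr"], ["train"], T, select_classes=[1]))
# no_comparison
```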

Examples

>>> # Compare all models on a fixed dataset and target class (scope inferred):
>>> # plot_input = mp.plotting_scope(select_dataset_label=['test'], select_targetclass=[1])
>>>
>>> # Compare all datasets for a fixed model and target class (scope inferred):
>>> # plot_input = mp.plotting_scope(select_model_label=['lr'], select_targetclass=[1])
prepare_scores_and_ntiles()

Compute per-row class probabilities and deterministic ntiles.

Parameters:
None
Returns:
pandas.DataFrame

DataFrame containing:

  • dataset_label and model_label

  • target_class (true label)

  • prob_<class> columns

  • dec_<class> columns (1..ntiles; 1 = highest probability)

Raises:
ValueError

If there are no models/datasets, ntiles is invalid, or any dataset has fewer rows than ntiles.

Return type:

DataFrame

Notes

This replaces the legacy approach that added random noise and used pandas.qcut for binning.
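
One way to get deterministic ntiles without noise or pandas.qcut is to rank rows by descending probability with a stable sort and cut the ranks into near-equal groups. The sketch below illustrates that idea only; the function name assign_ntiles and the tie-breaking rule (original row order) are assumptions, not the library's documented behavior.

```python
def assign_ntiles(probs, ntiles=10):
    """Assign each row an ntile in 1..ntiles, where 1 holds the highest scores.

    Hypothetical helper: ties are broken by original row order, so the
    same input always yields the same assignment.
    """
    n = len(probs)
    if not 2 <= ntiles <= n:
        raise ValueError("ntiles must satisfy 2 <= ntiles <= n_samples")

    # sorted() is stable, so rows with equal probability keep their input order.
    order = sorted(range(n), key=lambda i: -probs[i])

    result = [0] * n
    for rank, row in enumerate(order):
        # Map rank 0..n-1 onto groups 1..ntiles of near-equal size.
        result[row] = rank * ntiles // n + 1
    return result


print(assign_ntiles([0.9, 0.1, 0.5, 0.5, 0.7, 0.3], ntiles=3))
# [1, 3, 2, 2, 1, 3]
```

Because ties are resolved by position rather than by noise, the two rows with probability 0.5 always land in the same ntile across runs.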

Examples

>>> # scores = mp.prepare_scores_and_ntiles()
reset_params()

Reset all parameters to a default empty state.

Parameters:
None
Returns:
None
Raises:
None
Return type:

None

Notes

The object remains usable after repopulating fields.

Examples

>>> # mp.reset_params()
set_params(**params)

Set parameters (sklearn-style API) and re-validate.

Parameters:
**params : Any

Attributes to set on the object.

Returns:
None
Raises:
ValueError

If an invalid parameter is provided.

Return type:

None

See also

get_params

Notes

After updating attributes, the internal state is validated.
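
The set-then-validate pattern can be sketched with a small stand-in class. TinyParams, its two attributes, and its validation rule are hypothetical and only illustrate the flow; they are not the ModelPlotPy implementation.

```python
class TinyParams:
    """Hypothetical stand-in showing set_params followed by re-validation."""

    _valid = ("ntiles", "dataset_labels")

    def __init__(self, ntiles=10, dataset_labels=None):
        self.ntiles = ntiles
        # Build a fresh list instead of using a mutable default argument.
        self.dataset_labels = [] if dataset_labels is None else list(dataset_labels)
        self._validate()

    def _validate(self):
        if not isinstance(self.ntiles, int) or self.ntiles < 2:
            raise ValueError("ntiles must be an integer >= 2")

    def set_params(self, **params):
        # Reject unknown names before touching any attribute.
        for name in params:
            if name not in self._valid:
                raise ValueError(f"invalid parameter: {name!r}")
        for name, value in params.items():
            setattr(self, name, value)
        # Re-check the whole state after the update.
        self._validate()


tp = TinyParams()
tp.set_params(ntiles=20)
print(tp.ntiles)  # 20
```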

Examples

>>> # mp.set_params(ntiles=20)