ModelPlotPy
- class scikitplot.decile.ModelPlotPy(feature_data=None, label_data=None, dataset_labels=None, models=None, model_labels=None, ntiles=10, seed=0)
Decile/ntile analysis for sklearn classifiers.
- Parameters:
- feature_data : Sequence[Any] or None, default=None
Sequence of feature matrices (DataFrame or ndarray). One per dataset.
- label_data : Sequence[Any] or None, default=None
Sequence of label vectors (Series/ndarray/list). One per dataset.
- dataset_labels : Sequence[str] or None, default=None
Names for datasets; must match length of feature_data and label_data.
- models : Sequence[ClassifierMixin] or None, default=None
Fitted sklearn-like classifiers that implement predict_proba and classes_.
- model_labels : Sequence[str] or None, default=None
Names for models; must match length of models.
- ntiles : int, default=10
Number of ntiles. Must satisfy 2 <= ntiles <= n_samples for each dataset.
- seed : int, default=0
Reserved for backward compatibility. Not used (ntiles are deterministic).
- Returns:
- ModelPlotPy
Instance.
- Raises:
- ValueError
If list lengths are inconsistent or ntiles is invalid.
- TypeError
If models are not sklearn classifiers.
Notes
Key design rules:
- No mutable defaults in __init__.
- No randomness in ntile assignment (the legacy implementation added random noise before calling qcut).
Examples
>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> from scikitplot.decile import ModelPlotPy
>>> X = pd.DataFrame({"x": [0, 1, 2, 3]})
>>> y = pd.Series([0, 0, 1, 1])
>>> m = LogisticRegression().fit(X, y)
>>> mp = ModelPlotPy([X], [y], ["train"], [m], ["lr"], ntiles=2)
>>> scores = mp.prepare_scores_and_ntiles()
>>> set(scores.columns) >= {
...     "dataset_label",
...     "model_label",
...     "target_class",
...     "prob_0",
...     "prob_1",
...     "dec_0",
...     "dec_1",
... }
True
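The constructor constraints listed under Parameters and Raises can be sketched as a standalone check. This is only an illustration: `validate_inputs` and `DummyModel` are hypothetical names, and the duck-typed `hasattr` test is an assumption about what "sklearn-like" means here (the library may check `ClassifierMixin` instead).

```python
from typing import Any, Sequence


def validate_inputs(
    feature_data: Sequence[Any],
    label_data: Sequence[Any],
    dataset_labels: Sequence[str],
    models: Sequence[Any],
    model_labels: Sequence[str],
    ntiles: int = 10,
) -> None:
    """Hypothetical helper mirroring the documented constructor constraints."""
    # Dataset-related sequences must line up one-to-one.
    if not (len(feature_data) == len(label_data) == len(dataset_labels)):
        raise ValueError("feature_data, label_data and dataset_labels must have equal length")
    # Each model needs exactly one label.
    if len(models) != len(model_labels):
        raise ValueError("models and model_labels must have equal length")
    # Assumed duck-typed classifier check: predict_proba and classes_ are required.
    for model in models:
        if not (hasattr(model, "predict_proba") and hasattr(model, "classes_")):
            raise TypeError("each model must expose predict_proba and classes_")
    # 2 <= ntiles <= n_samples must hold for every dataset.
    for labels in label_data:
        if not 2 <= ntiles <= len(labels):
            raise ValueError("ntiles must satisfy 2 <= ntiles <= n_samples")


class DummyModel:
    """Stand-in for a fitted classifier (illustration only)."""

    classes_ = [0, 1]

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


# Four samples and ntiles=2 satisfy every constraint, so nothing is raised.
validate_inputs([[0, 1, 2, 3]], [[0, 0, 1, 1]], ["train"], [DummyModel()], ["lr"], ntiles=2)
```

With the same data, `ntiles=5` would raise ValueError because the dataset has only four rows.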
- aggregate_over_ntiles()
Aggregate counts and lift/gain metrics per ntile.
- Parameters:
- None
- Returns:
- pandas.DataFrame
Aggregated metrics per (model_label, dataset_label, target_class, ntile). The output schema matches the legacy implementation so it can be consumed by the existing plot functions.
- Raises:
- ValueError
If any group has zero positives for a requested target class.
Notes
Dev note: this implementation avoids mutating shared columns in the scores dataframe during loops (the legacy code writes pos/neg repeatedly).
Examples
>>> # agg = mp.aggregate_over_ntiles()
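To make "counts and lift/gain metrics per ntile" concrete, here is a minimal pure-Python sketch for a single (model, dataset, target class) group. The function name `aggregate_ntiles` and the output keys (`tot`, `pos`, `cumgain`, `lift`, `cumlift`) are assumptions chosen for illustration; the actual output schema of the method is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def aggregate_ntiles(rows: Sequence[Tuple[int, bool]], ntiles: int) -> List[Dict[str, float]]:
    """Aggregate (ntile, is_target) pairs for one group into per-ntile metrics."""
    tot = defaultdict(int)
    pos = defaultdict(int)
    for ntile, is_target in rows:
        tot[ntile] += 1
        pos[ntile] += int(is_target)

    total_rows = len(rows)
    total_pos = sum(pos.values())
    if total_pos == 0:
        # Gain and lift are undefined without positives.
        raise ValueError("group has zero positives for the requested target class")

    base_rate = total_pos / total_rows
    out = []
    cum_tot = cum_pos = 0
    for ntile in range(1, ntiles + 1):
        cum_tot += tot[ntile]
        cum_pos += pos[ntile]
        out.append({
            "ntile": ntile,
            "tot": tot[ntile],
            "pos": pos[ntile],
            # Share of all positives captured up to and including this ntile.
            "cumgain": cum_pos / total_pos,
            # Positive rate in this ntile relative to the overall rate.
            "lift": (pos[ntile] / tot[ntile]) / base_rate if tot[ntile] else float("nan"),
            # Positive rate up to this ntile relative to the overall rate.
            "cumlift": (cum_pos / cum_tot) / base_rate if cum_tot else float("nan"),
        })
    return out


# Eight rows in four ntiles; ntile 1 holds the highest-scored rows.
rows = [(1, True), (1, True), (2, False), (2, True),
        (3, False), (3, False), (4, False), (4, False)]
agg = aggregate_ntiles(rows, ntiles=4)
```

In this toy group the cumulative gain reaches 1.0 at ntile 2 and the cumulative lift falls to 1.0 at the last ntile, as it must once all rows are included.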
- get_params()
Get parameters (sklearn-style API).
- Parameters:
- None
- Returns:
- dict[str, Any]
Parameter dictionary.
- Raises:
- None
Notes
The returned objects are the current attributes; callers should not mutate them in-place if they want stable behavior.
Examples
>>> # mp.get_params()
- plotting_scope(scope='auto', select_model_label=None, select_dataset_label=None, select_targetclass=None, select_smallest_targetclass=True)
Build the plot_input subset according to a strict scope contract.
- Parameters:
- scope : {'auto', 'no_comparison', 'compare_models', 'compare_datasets', 'compare_targetclasses'}, default='auto'
Evaluation perspective.
If scope='auto', the scope is inferred deterministically from the provided selectors and the available options.
- select_model_label : Sequence[str] or None, default=None
Model labels to include.
- select_dataset_label : Sequence[str] or None, default=None
Dataset labels to include.
- select_targetclass : Sequence[Any] or None, default=None
Target classes to include.
- select_smallest_targetclass : bool, default=True
Whether the plot should contain only the results for the smallest target class. If True, the target class is determined from the first dataset. If False and select_targetclass is None, the method tries to use list(self.models[0].classes_).
- Returns:
- pandas.DataFrame
Subset dataframe ready for plotting functions.
- Raises:
- ValueError
If the scope is invalid, selector values are invalid, or the selection is ambiguous under the strict contract.
Notes
Inference rules for scope='auto'
Let the universes be:
- M = all model_labels
- D = all dataset_labels
- T = all fitted classes_
After validating selectors (membership is strict):
- If exactly one selector among (models, datasets, targetclasses) contains two or more values, then auto selects the corresponding comparison scope.
  - len(select_model_label) >= 2 -> compare_models
  - len(select_dataset_label) >= 2 -> compare_datasets
  - len(select_targetclass) >= 2 -> compare_targetclasses
- If more than one selector has length >= 2, the request is ambiguous and a ValueError is raised.
- If no selector has length >= 2:
  - If all dimensions are fixed (either explicitly selected with length 1, or the universe size is 1), auto selects no_comparison.
  - Otherwise, if exactly one dimension is unfixed (universe size > 1) while the other two are fixed, auto selects the corresponding comparison scope, comparing all values in that unfixed dimension.
  - If the remaining degrees of freedom are not unique, the request is ambiguous and a ValueError is raised.
Examples
>>> # Compare all models on a fixed dataset and target class (scope inferred):
>>> # plot_input = mp.plotting_scope(select_dataset_label=['test'], select_targetclass=[1])
>>>
>>> # Compare two datasets for a fixed model and target class (scope inferred):
>>> # plot_input = mp.plotting_scope(select_model_label=['lr'], select_targetclass=[1])
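The inference rules in the Notes can be read as a small decision procedure. The sketch below is one possible reading, not the library's code: `infer_scope` is a hypothetical function, the universes are passed in explicitly, and the effect of select_smallest_targetclass is deliberately ignored.

```python
from typing import Any, Optional, Sequence


def infer_scope(
    model_labels: Sequence[str],
    dataset_labels: Sequence[str],
    target_classes: Sequence[Any],
    select_model_label: Optional[Sequence[str]] = None,
    select_dataset_label: Optional[Sequence[str]] = None,
    select_targetclass: Optional[Sequence[Any]] = None,
) -> str:
    """Hypothetical reading of the documented scope='auto' rules."""
    dims = {
        "compare_models": (select_model_label, model_labels),
        "compare_datasets": (select_dataset_label, dataset_labels),
        "compare_targetclasses": (select_targetclass, target_classes),
    }
    # Strict membership: every selected value must exist in its universe.
    for selected, universe in dims.values():
        for value in selected or []:
            if value not in universe:
                raise ValueError(f"unknown selector value: {value!r}")

    # Selectors that explicitly name two or more values.
    multi = [s for s, (sel, _) in dims.items() if sel is not None and len(sel) >= 2]
    if len(multi) > 1:
        raise ValueError("ambiguous: more than one selector has length >= 2")
    if len(multi) == 1:
        return multi[0]

    # A dimension is fixed if one value is selected or its universe has size 1.
    unfixed = [
        s for s, (sel, uni) in dims.items()
        if not (sel is not None and len(sel) == 1) and len(uni) > 1
    ]
    if not unfixed:
        return "no_comparison"
    if len(unfixed) == 1:
        return unfixed[0]
    raise ValueError("ambiguous: more than one dimension is unfixed")


# Fixed dataset and target class, two available models: models are compared.
scope = infer_scope(["lr", "rf"], ["train", "test"], [0, 1],
                    select_dataset_label=["test"], select_targetclass=[1])
```

Under this reading, calling the function with no selectors while every universe has more than one value raises ValueError, because the remaining degrees of freedom are not unique.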
- prepare_scores_and_ntiles()
Compute per-row class probabilities and deterministic ntiles.
- Parameters:
- None
- Returns:
- pandas.DataFrame
DataFrame containing:
- dataset_label and model_label
- target_class (true label)
- prob_<class> columns
- dec_<class> columns (1..ntiles; 1 = highest probability)
- Raises:
- ValueError
If there are no models/datasets, ntiles is invalid, or any dataset has fewer rows than ntiles.
Notes
This replaces the legacy approach that added random noise and used pandas.qcut for binning.
Examples
>>> # scores = mp.prepare_scores_and_ntiles()
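One way to obtain deterministic ntiles without random noise is to rank rows by descending probability and cut the ranks into equal-width groups. The sketch below assumes ties are broken by original row order and that ntile sizes follow integer division of the rank; neither detail is stated in the documentation, and `assign_ntiles` is a hypothetical name.

```python
from typing import List, Sequence


def assign_ntiles(probs: Sequence[float], ntiles: int) -> List[int]:
    """Assign ntiles 1..ntiles, where 1 holds the highest probabilities."""
    n = len(probs)
    if not 2 <= ntiles <= n:
        raise ValueError("ntiles must satisfy 2 <= ntiles <= n_samples")
    # sorted() is stable, so tied probabilities keep their original row order
    # (assumed tie-breaking rule).
    order = sorted(range(n), key=lambda i: -probs[i])
    result = [0] * n
    for rank, idx in enumerate(order):
        # Map rank 0..n-1 onto ntile 1..ntiles.
        result[idx] = rank * ntiles // n + 1
    return result


dec = assign_ntiles([0.1, 0.9, 0.4, 0.8], ntiles=2)
# dec == [2, 1, 2, 1]: the two highest probabilities land in ntile 1.
```

Because no random component is involved, repeated calls on the same input return the same assignment, including when all probabilities are tied.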
- reset_params()
Reset all parameters to a default empty state.
- Parameters:
- None
- Returns:
- None
- Raises:
- None
- Return type:
None
Notes
The object remains usable after repopulating fields.
Examples
>>> # mp.reset_params()
- set_params(**params)
Set parameters (sklearn-style API) and re-validate.
- Parameters:
- **params : Any
Attributes to set on the object.
- Returns:
- None
- Raises:
- ValueError
If an invalid parameter is provided.
- Return type:
None
Notes
After updating attributes, the internal state is validated.
Examples
>>> # mp.set_params(ntiles=20)
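The sklearn-style get_params / set_params pattern with re-validation can be illustrated with a small stand-in class. `ParamHolder` and its `_validate` method are hypothetical and cover only three of the documented parameters; this is not the library's implementation.

```python
from typing import Any, Dict


class ParamHolder:
    """Hypothetical minimal class showing get/set with re-validation."""

    _param_names = ("models", "model_labels", "ntiles")

    def __init__(self, models=None, model_labels=None, ntiles=10):
        # None defaults avoid sharing mutable default arguments between instances.
        self.models = list(models) if models is not None else []
        self.model_labels = list(model_labels) if model_labels is not None else []
        self.ntiles = ntiles
        self._validate()

    def _validate(self) -> None:
        if len(self.models) != len(self.model_labels):
            raise ValueError("models and model_labels must have equal length")
        if not isinstance(self.ntiles, int) or self.ntiles < 2:
            raise ValueError("ntiles must be an integer >= 2")

    def get_params(self) -> Dict[str, Any]:
        # Returns the current attribute objects themselves, not copies.
        return {name: getattr(self, name) for name in self._param_names}

    def set_params(self, **params: Any) -> None:
        # Reject unknown names before touching any attribute.
        for name in params:
            if name not in self._param_names:
                raise ValueError(f"invalid parameter: {name}")
        for name, value in params.items():
            setattr(self, name, value)
        # Re-validate the internal state after the update.
        self._validate()


holder = ParamHolder()
holder.set_params(ntiles=20)
params = holder.get_params()
```

Since `get_params` hands back the live attributes, mutating `params["models"]` in place would also change the holder, which is why the Notes for get_params advise against in-place mutation.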