ModelPlotPy

class scikitplot.modelplotpy.ModelPlotPy(feature_data=[], label_data=[], dataset_labels=[], models=[], model_labels=[], ntiles=10, seed=0)

ModelPlotPy decile analysis.

Parameters:
feature_data : list of objects (n_datasets, )

Objects containing the X matrix for one or more different datasets.

label_data : list of objects (n_datasets, )

Objects containing the y vector for one or more different datasets.

dataset_labels : list of str (n_datasets, )

Names of the different feature_data and label_data combination pairs.

models : list of objects (n_models, )

The scikit-learn model objects.

model_labels : list of str (n_models, )

Names of the scikit-learn models.

ntiles : int, default=10

The number of splits. Range is (2, inf]. Some values have conventional names:

  • 10 is called deciles

  • 100 is called percentiles

  • any other value is an ntile

seed : int, default=0

Seed that makes the splits reproducible.

Changed in version 0.3.9: Default changed from 999 to 0.

Raises:
ValueError

If there is no match with the complete list or the input list.
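To make the ntiles parameter concrete, here is a minimal standard-library sketch of splitting scored observations into equally sized groups. It is an illustration only, not the scikit-plots implementation: the helper name assign_ntiles and the convention that ntile 1 holds the highest scores are assumptions.

```python
def assign_ntiles(scores, ntiles=10):
    """Return an ntile number for every score (hypothetical helper).

    Assumption: ntile 1 contains the highest scores.
    """
    n = len(scores)
    # Rank observations from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    result = [0] * n
    for rank, idx in enumerate(order):
        # Map ranks 0..n-1 onto groups 1..ntiles of near-equal size.
        result[idx] = rank * ntiles // n + 1
    return result


scores = [0.91, 0.15, 0.62, 0.48, 0.77, 0.05, 0.33, 0.58, 0.84, 0.21]
deciles = assign_ntiles(scores, ntiles=10)   # one observation per decile here
quintiles = assign_ntiles(scores, ntiles=5)  # two observations per group
```

With ten observations, ntiles=10 puts each observation in its own group, while ntiles=5 pairs them up.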

aggregate_over_ntiles()

Create eval_t_tot

This function builds the pandas DataFrame eval_t_tot, which contains the aggregated output. The data is aggregated over the datasets (feature and label data pairs) and the list of models.

Parameters:
feature_data : list of objects (n_datasets, )

Objects containing the X matrix for one or more different datasets.

label_data : list of objects (n_datasets, )

Objects containing the y vector for one or more different datasets.

dataset_labels : list of str (n_datasets, )

Names of the different feature_data and label_data combination pairs.

models : list of objects (n_models, )

The scikit-learn model objects.

model_labels : list of str (n_models, )

Names of the scikit-learn models.

ntiles : int, default=10

The number of splits. 10 is called deciles, 100 is called percentiles, and any other value is an ntile. Range is (2, inf].

seed : int, default=0

Seed that makes the splits reproducible.

Changed in version 0.3.9: Default changed from 999 to 0.

Returns:
pandas.DataFrame

Pandas DataFrame with every combination of datasets, models, target values, and ntiles. It already contains almost all the information needed for model plotting.

Raises:
ValueError

If there is no match with the complete list or the input list.
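The aggregation described above can be pictured as counting observations and target hits per ntile, then accumulating them. The sketch below uses only the standard library. The helper name aggregate_by_ntile and the output keys (tot, pos, cumgain, cumlift) are hypothetical and are not claimed to match the columns of eval_t_tot.

```python
from collections import defaultdict


def aggregate_by_ntile(rows, ntiles):
    """Aggregate (ntile, is_target) pairs into one summary dict per ntile."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for ntile, is_target in rows:
        totals[ntile] += 1
        positives[ntile] += int(is_target)

    all_pos = sum(positives.values())
    all_tot = sum(totals.values())
    out, cum_pos, cum_tot = [], 0, 0
    for ntile in range(1, ntiles + 1):
        cum_pos += positives[ntile]
        cum_tot += totals[ntile]
        # Cumulative gain: share of all targets captured up to this ntile.
        cumgain = cum_pos / all_pos if all_pos else 0.0
        # Cumulative lift: target rate so far relative to the overall rate.
        if cum_tot and all_pos:
            cumlift = (cum_pos / cum_tot) / (all_pos / all_tot)
        else:
            cumlift = 0.0
        out.append({"ntile": ntile, "tot": totals[ntile],
                    "pos": positives[ntile],
                    "cumgain": cumgain, "cumlift": cumlift})
    return out


# Eight observations already assigned to four ntiles (toy data).
rows = [(1, 1), (1, 1), (2, 1), (2, 0), (3, 0), (3, 1), (4, 0), (4, 0)]
table = aggregate_by_ntile(rows, ntiles=4)
```

In the real class this aggregation would be repeated for every dataset, model, and target class combination. The toy version handles a single combination.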

get_params()

Get parameters of the model plots object.

Added in version 0.3.9.

plotting_scope(scope='no_comparison', select_model_label=[], select_dataset_label=[], select_targetclass=[], select_smallest_targetclass=True)

Create plot_input

This function builds the pandas DataFrame plot_input, which is a subset of scores_and_ntiles. Which subset is selected depends on which of the 4 evaluation types the user requests.

Changed in version 0.3.9: Parameters have been reorganized.

Parameters:
scope : {'no_comparison', 'compare_models', 'compare_datasets', 'compare_targetclasses'}, default='no_comparison'

The perspective from which the model plots are evaluated. There are 4 different perspectives:

  1. scope='no_comparison'

    This perspective will show a single plot that contains the viewpoint from:

    • 1 dataset

    • 1 model

    • 1 target class

  2. scope='compare_models'

    This perspective will show plots that contain the viewpoint from:

    • 2 or more different models

    • 1 dataset

    • 1 target class

  3. scope='compare_datasets'

    This perspective will show plots that contain the viewpoint from:

    • 2 or more different datasets

    • 1 model

    • 1 target class

  4. scope='compare_targetclasses'

    This perspective will show plots that contain the viewpoint from:

    • 2 or more different target classes

    • 1 dataset

    • 1 model

select_model_label : list of str

List of one or more elements from the model_labels parameter.

select_dataset_label : list of str

List of one or more elements from the dataset_labels parameter.

select_targetclass : list of str

List of one or more elements from the label data.

select_smallest_targetclass : bool, default=True

Whether the plot should contain only the results of the smallest target class. If True, the target class is determined from the first dataset.

Returns:
pandas.DataFrame

Pandas DataFrame, a subset of scores_and_ntiles, covering all selected dataset, model, and target value combinations for all ntiles. It contains all the information needed for model plotting.

Raises:
ValueError

If the wrong scope value is specified.

Return type:

pandas.DataFrame
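The four scopes differ only in which dimension is allowed to vary while the other two stay fixed. The standard-library sketch below captures that rule together with the documented ValueError for an unknown scope. The names VALID_SCOPES and varying_dimension are hypothetical and do not come from scikit-plots.

```python
# Maps each documented scope to the dimension that may take 2 or more values.
# None means nothing varies: 1 dataset, 1 model, 1 target class.
VALID_SCOPES = {
    "no_comparison": None,
    "compare_models": "model",
    "compare_datasets": "dataset",
    "compare_targetclasses": "target class",
}


def varying_dimension(scope="no_comparison"):
    """Return the dimension compared under `scope` (hypothetical helper)."""
    if scope not in VALID_SCOPES:
        raise ValueError(
            f"scope must be one of {sorted(VALID_SCOPES)}, got {scope!r}"
        )
    return VALID_SCOPES[scope]


default_dim = varying_dimension()                  # nothing is compared
models_dim = varying_dimension("compare_models")   # only models vary
```

Under this reading, select_model_label would hold two or more entries only for compare_models, and likewise for the dataset and target class selections.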

prepare_scores_and_ntiles()

Create eval_tot

This function builds the pandas DataFrame eval_tot, which contains the actual and predicted values for each feature and label data pair, identified by its dataset label. It loops over the different models, identified by their model labels.

Parameters:
feature_data : list of objects (n_datasets, )

Objects containing the X matrix for one or more different datasets.

label_data : list of objects (n_datasets, )

Objects containing the y vector for one or more different datasets.

dataset_labels : list of str (n_datasets, )

Names of the different feature_data and label_data combination pairs.

models : list of objects (n_models, )

The scikit-learn model objects.

model_labels : list of str (n_models, )

Names of the scikit-learn models.

ntiles : int, default=10

The number of splits. 10 is called deciles, 100 is called percentiles, and any other value is an ntile. Range is (2, inf].

seed : int, default=0

Seed that makes the splits reproducible.

Changed in version 0.3.9: Default changed from 999 to 0.

Returns:
scores_and_ntiles : pandas.DataFrame

Pandas DataFrame containing all the given information plus, for each target class, a prediction and an ntile. For the ntile assignment, a small value (based on the seed) is added and the result is normalized to make the results reproducible.

Raises:
ValueError

If there is no match with the complete list or the input list.
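The seed matters because tied scores could otherwise land in different ntiles from run to run. A minimal standard-library sketch of seeded tie-breaking follows. The helper jitter_scores, the noise scale, and the use of Python's random module are assumptions for illustration. The normalization step mentioned above is omitted.

```python
import random


def jitter_scores(scores, seed=0, scale=1e-10):
    """Add tiny seeded noise so equal scores get a stable, distinct order."""
    rng = random.Random(seed)  # local generator; global state is untouched
    return [s + rng.random() * scale for s in scores]


scores = [0.5, 0.5, 0.5, 0.5, 0.9, 0.1]
first = jitter_scores(scores, seed=0)
second = jitter_scores(scores, seed=0)

# Ranking on the jittered scores breaks the four-way tie deterministically.
order = sorted(range(len(scores)), key=lambda i: first[i], reverse=True)
```

Because the noise is far smaller than any real difference between scores, it only decides the order among ties. The clearly highest and lowest observations keep their places.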

reset_params()

Reset all parameters to default values.

Added in version 0.3.9.

set_params(**params)

Set parameters of the model plots object.

Added in version 0.3.9.