ModelPlotPy

class scikitplot.modelplotpy.ModelPlotPy(feature_data=[], label_data=[], dataset_labels=[], models=[], model_labels=[], ntiles=10, seed=999)

Create a model_plots object

Parameters:
  • feature_data (list of objects) – Objects containing the X matrix for one or more different datasets.

  • label_data (list of objects) – Objects of the y vector for one or more different datasets.

  • dataset_labels (list of str) – Names of the feature_data and label_data pairs.

  • models (list of objects) – The scikit-learn model objects.

  • model_labels (list of str) – Names of the (scikit-learn) models.

  • ntiles (int, default 10) – The number of splits: 10 gives deciles, 100 gives percentiles, and any other value gives ntiles.

  • seed (int, default 999) – Random seed that makes the results reproducible. This matters when a small dataset cannot be split into unique ntiles.

Raises:

ValueError – If the input list has no match in the complete list.

Notes

modelplotpy is an alias of ModelPlotPy, kept for backward compatibility or convenience: modelplotpy = ModelPlotPy
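The list parameters are parallel: entry i of feature_data, label_data and dataset_labels describes the same dataset, and entry i of models and model_labels describes the same model. The sketch below shows one way to assemble them. build_constructor_kwargs and make_model_plots are hypothetical helpers, not part of scikitplot, and the string placeholders stand in for real X matrices, y vectors and fitted models.

```python
from typing import Any, Dict, List, Tuple


def build_constructor_kwargs(
    datasets: List[Tuple[str, Any, Any]],
    models: List[Tuple[str, Any]],
    ntiles: int = 10,
    seed: int = 999,
) -> Dict[str, Any]:
    """Hypothetical helper: turn (name, X, y) and (name, model) tuples
    into the parallel lists named in the constructor signature."""
    return {
        "feature_data": [X for _, X, _ in datasets],
        "label_data": [y for _, _, y in datasets],
        "dataset_labels": [name for name, _, _ in datasets],
        "models": [model for _, model in models],
        "model_labels": [name for name, _ in models],
        "ntiles": ntiles,
        "seed": seed,
    }


def make_model_plots(kwargs: Dict[str, Any]):
    """Hypothetical wrapper: pass the keyword arguments to the documented
    constructor. Requires scikitplot, so the import is done lazily."""
    from scikitplot.modelplotpy import ModelPlotPy

    return ModelPlotPy(**kwargs)


# Placeholders only; real code would pass arrays/dataframes and fitted models.
kwargs = build_constructor_kwargs(
    datasets=[
        ("train data", "X_train", "y_train"),
        ("test data", "X_test", "y_test"),
    ],
    models=[("random forest", "rf_model")],
)
```

With real data and fitted models in place of the placeholders, make_model_plots(kwargs) would build the object. Keeping each name next to its data in one tuple makes it harder for the lists to drift out of alignment.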

aggregate_over_ntiles()

Create eval_t_tot

This function builds the pandas dataframe eval_t_tot, which contains the aggregated output. The data is aggregated over the datasets (feature and label data pairs) and the list of models.

Parameters:
  • feature_data (list of objects) – Objects containing the X matrix for one or more different datasets.

  • label_data (list of objects) – Objects of the y vector for one or more different datasets.

  • dataset_labels (list of str) – Names of the feature_data and label_data pairs.

  • models (list of objects) – The scikit-learn model objects.

  • model_labels (list of str) – Names of the (scikit-learn) models.

  • ntiles (int, default 10) – The number of splits: 10 gives deciles, 100 gives percentiles, and any other value gives ntiles.

  • seed (int, default 999) – Random seed that makes the splits reproducible.

Returns:

  • Pandas dataframe with the combinations of all datasets, models, target values and ntiles.

  • It already contains almost all the information needed for model plotting.

Raises:

ValueError – If the input list has no match in the complete list.
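To make the idea of aggregating over ntiles concrete, here is a minimal pure-Python sketch for a single dataset, model and target class. It is not the library's implementation: the function name and the column names (tot, pos, cumtot, cumpos, cumgain) are assumptions chosen for illustration, and the documented method returns a pandas dataframe covering all combinations.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_by_ntile(
    rows: List[Tuple[int, int]], ntiles: int
) -> List[Dict[str, float]]:
    """rows holds (ntile, is_target) pairs, with is_target equal to 0 or 1.
    Returns one summary dict per ntile, in ntile order."""
    totals: Dict[int, int] = defaultdict(int)
    positives: Dict[int, int] = defaultdict(int)
    for ntile, is_target in rows:
        totals[ntile] += 1
        positives[ntile] += is_target

    all_positives = sum(positives.values())
    summary = []
    cum_total = 0
    cum_pos = 0
    for ntile in range(1, ntiles + 1):
        cum_total += totals[ntile]
        cum_pos += positives[ntile]
        summary.append(
            {
                "ntile": ntile,
                "tot": totals[ntile],      # observations in this ntile
                "pos": positives[ntile],   # target-class observations
                "cumtot": cum_total,       # observations up to this ntile
                "cumpos": cum_pos,         # targets up to this ntile
                # share of all targets captured up to this ntile
                "cumgain": cum_pos / all_positives if all_positives else 0.0,
            }
        )
    return summary


rows = [(1, 1), (1, 1), (1, 0), (2, 1), (2, 0), (2, 0)]
summary = aggregate_by_ntile(rows, ntiles=2)
```

In this toy input the first ntile holds two of the three targets, so its cumulative gain is 2/3, and the second ntile brings it to 1.0.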

get_params()

Get parameters of the model plots object.

plotting_scope(scope='no_comparison', select_model_label=[], select_dataset_label=[], select_targetclass=[], select_smallest_targetclass=True)

Create plot_input

This function builds the pandas dataframe plot_input, which is a subset of scores_and_ntiles. Which subset is selected depends on which of the 4 evaluation types the user requests.

How is this function evaluated? There are 4 different perspectives from which to evaluate model plots:

1. no_comparison – Shows a single plot from the viewpoint of 1 dataset, 1 model and 1 target class.

2. compare_models – Shows plots from the viewpoint of 2 or more different models, 1 dataset and 1 target class.

3. compare_datasets – Shows plots from the viewpoint of 2 or more different datasets, 1 model and 1 target class.

4. compare_targetclasses – Shows plots from the viewpoint of 2 or more different target classes, 1 dataset and 1 model.

Parameters:
  • scope (str, default 'no_comparison') – One of the 4 evaluation types: 'no_comparison', 'compare_models', 'compare_datasets' or 'compare_targetclasses'.

  • select_model_label (list of str) – List of one or more elements from the model_labels parameter.

  • select_dataset_label (list of str) – List of one or more elements from the dataset_labels parameter.

  • select_targetclass (list of str) – List of one or more elements from the label data.

  • select_smallest_targetclass (bool, default True) – Whether the plot should contain only the results of the smallest target class. If True, the specific target is determined from the first dataset.

Returns:

  • Pandas dataframe, a subset of scores_and_ntiles, for all dataset, model and target value combinations for all ntiles.

  • It contains all the information needed for model plotting.

Raises:

ValueError – If the wrong scope value is specified.
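The four perspectives of plotting_scope can be summarized as a lookup table of how many models, datasets and target classes each scope compares. The sketch below is illustrative only: SCOPE_RULES and describe_scope are hypothetical names, not part of scikitplot, and the exact error message is an assumption. Only the ValueError on an unknown scope is taken from the documentation.

```python
from typing import Dict

# How many items each scope compares, following the four perspectives
# described for plotting_scope.
SCOPE_RULES: Dict[str, Dict[str, str]] = {
    "no_comparison": {
        "models": "1", "datasets": "1", "targetclasses": "1",
    },
    "compare_models": {
        "models": "2 or more", "datasets": "1", "targetclasses": "1",
    },
    "compare_datasets": {
        "models": "1", "datasets": "2 or more", "targetclasses": "1",
    },
    "compare_targetclasses": {
        "models": "1", "datasets": "1", "targetclasses": "2 or more",
    },
}


def describe_scope(scope: str) -> Dict[str, str]:
    """Hypothetical helper: return the comparison rule for a scope,
    raising ValueError when the scope is not one of the four types."""
    if scope not in SCOPE_RULES:
        allowed = ", ".join(sorted(SCOPE_RULES))
        raise ValueError(f"Unknown scope {scope!r}; expected one of: {allowed}")
    return SCOPE_RULES[scope]


rule = describe_scope("compare_models")
```

A call on the real object would pass the same scope string together with the matching selections, for example scope='compare_models' with two or more entries in select_model_label.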

prepare_scores_and_ntiles()

Create eval_tot

This function builds the pandas dataframe eval_tot, which contains the actual and predicted values for each feature and label data pair, identified by its dataset label. It loops over the different models, identified by their model labels.

Parameters:
  • feature_data (list of objects) – Objects containing the X matrix for one or more different datasets.

  • label_data (list of objects) – Objects of the y vector for one or more different datasets.

  • dataset_labels (list of str) – Names of the feature_data and label_data pairs.

  • models (list of objects) – The scikit-learn model objects.

  • model_labels (list of str) – Names of the (scikit-learn) models.

  • ntiles (int, default 10) – The number of splits: 10 gives deciles, 100 gives percentiles, and any other value gives ntiles.

  • seed (int, default 999) – Random seed that makes the splits reproducible.

Returns:

  • Pandas dataframe for all given information; for each target_class it makes a prediction and an ntile.

  • For each ntile a small value (based on the seed) is added and normalized to make the results reproducible.

Raises:

ValueError – If the input list has no match in the complete list.
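The role of the seed is easiest to see with tied predictions. The sketch below is a pure-Python illustration under stated assumptions, not the library's code: it adds a tiny seeded jitter to break ties, ranks observations from highest to lowest predicted probability, and assigns ntile 1 to the top group. The function name, the jitter size and the convention that ntile 1 holds the highest scores are all assumptions.

```python
import random
from typing import List


def assign_ntiles(
    probabilities: List[float], ntiles: int = 10, seed: int = 999
) -> List[int]:
    """Return an ntile (1..ntiles) for each probability, in input order."""
    rng = random.Random(seed)
    # A tiny seeded jitter breaks ties the same way on every run.
    jittered = [p + rng.uniform(0.0, 1e-6) for p in probabilities]
    # Indices ordered from highest to lowest jittered probability.
    order = sorted(
        range(len(jittered)), key=lambda i: jittered[i], reverse=True
    )
    n = len(probabilities)
    result = [0] * n
    for rank, index in enumerate(order):
        # Equal-sized groups: ranks 0..n/ntiles-1 fall in ntile 1, and so on.
        result[index] = rank * ntiles // n + 1
    return result


scores = [0.9, 0.5, 0.5, 0.5, 0.5, 0.1]
first = assign_ntiles(scores, ntiles=3, seed=999)
second = assign_ntiles(scores, ntiles=3, seed=999)
```

Without the jitter, the four tied 0.5 scores could not be split cleanly across ntiles. With it, each ntile gets exactly two observations, and repeating the call with the same seed gives the same assignment.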

reset_modules()

Reset the internal state of the ModelPlotPy object.

reset_params()

Reset all parameters to default values.

set_params(**params)

Set parameters of the model plots object.