ModelPlotPy#
- class scikitplot.modelplotpy.ModelPlotPy(feature_data=[], label_data=[], dataset_labels=[], models=[], model_labels=[], ntiles=10, seed=999)#
Create a model_plots object
- Parameters:
feature_data (list of objects) – Objects containing the X matrix for one or more different datasets.
label_data (list of objects) – Objects of the y vector for one or more different datasets.
dataset_labels (list of str) – Containing the names of the different feature_data and label_data combination pairs.
models (list of objects) – Containing the sk-learn model objects.
seed (int, default 999) – Makes the results reproducible. In the case of a small dataset, the data cannot be split into unique ntiles.
- Raises:
ValueError – If there is no match with the complete list or the input list again.
Notes
Alias for backward compatibility or convenience: modelplotpy = ModelPlotPy
- aggregate_over_ntiles()#
Create eval_t_tot
This function builds the pandas dataframe eval_t_tot, which contains the aggregated output. The data is aggregated over the datasets (feature and label data pairs) and the list of models.
- Parameters:
feature_data (list of objects) – Objects containing the X matrix for one or more different datasets.
label_data (list of objects) – Objects of the y vector for one or more different datasets.
dataset_labels (list of str) – Containing the names of the different feature_data and label_data combination pairs.
models (list of objects) – Containing the sk-learn model objects.
model_labels (list of str) – Names of the (sk-learn) models.
ntiles (int, default 10) – The number of splits: 10 gives deciles, 100 gives percentiles, and any other value gives ntiles.
seed (int, default 999) – Making the splits reproducible.
- Returns:
Pandas dataframe with combination of all datasets, models, target values and ntiles.
It already contains almost all necessary information for model plotting.
- Raises:
ValueError – If there is no match with the complete list or the input list again.
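To make the idea of aggregating over ntiles concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the helper name aggregate_rows and the output keys (tot, pos, cumgain) are assumptions chosen for illustration, and the actual columns of eval_t_tot are not listed in this reference.

```python
from collections import defaultdict

def aggregate_rows(rows, ntiles):
    """Hypothetical helper: summarise (ntile, is_target) pairs per ntile.

    `rows` holds the pairs for one dataset / model / target class combination.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for ntile, is_target in rows:
        totals[ntile] += 1
        positives[ntile] += int(is_target)

    all_pos = sum(positives.values())
    table, cum_pos = [], 0
    for n in range(1, ntiles + 1):
        cum_pos += positives[n]
        table.append({
            "ntile": n,
            "tot": totals[n],          # observations in this ntile
            "pos": positives[n],       # target-class observations in this ntile
            # share of all target-class observations found up to this ntile
            "cumgain": cum_pos / all_pos if all_pos else 0.0,
        })
    return table

rows = [(1, True), (1, True), (2, True), (2, False), (3, False), (3, False)]
table = aggregate_rows(rows, ntiles=3)
for line in table:
    print(line)
```

One output row per ntile is the shape most model plots (cumulative gains, lift, response) are drawn from, which is presumably why the aggregated dataframe is described as containing almost all information needed for plotting.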
- get_params()#
Get parameters of the model plots object.
- plotting_scope(scope='no_comparison', select_model_label=[], select_dataset_label=[], select_targetclass=[], select_smallest_targetclass=True)#
Create plot_input
This function builds the pandas dataframe plot_input, which is a subset of scores_and_ntiles. The subset depends on which 1 of the 4 evaluation types the user requests.
How is this function evaluated? There are 4 different perspectives to evaluate model plots.
1. no_comparison: a single plot with the viewpoint of 1 dataset, 1 model and 1 target class.
2. compare_models: plots with the viewpoint of 2 or more different models, 1 dataset and 1 target class.
3. compare_datasets: plots with the viewpoint of 2 or more different datasets, 1 model and 1 target class.
4. compare_targetclasses: plots with the viewpoint of 2 or more different target classes, 1 dataset and 1 model.
- Parameters:
scope (str, default is 'no_comparison') – One of the 4 evaluation types: 'no_comparison', 'compare_models', 'compare_datasets' or 'compare_targetclasses'.
select_model_label (list of str) – List of one or more elements from model_labels.
select_dataset_label (list of str) – List of one or more elements from dataset_labels.
select_targetclass (list of str) – List of one or more elements from the label data.
select_smallest_targetclass (bool, default = True) – Whether the plot should contain only the results of the smallest target class. If True, the specific target is defined from the first dataset.
- Returns:
Pandas dataframe, a subset of scores_and_ntiles, for all dataset, model
and target value combinations for all ntiles.
It contains all necessary information for model plotting.
- Raises:
ValueError – If the wrong scope value is specified.
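The four scopes differ only in which dimension is allowed to hold two or more entries. The sketch below illustrates that mapping and the kind of check that leads to a ValueError for an unknown scope. The helper describe_scope and the dictionary SCOPE_VARIES are hypothetical names introduced here, not part of the library.

```python
# Scope names as listed in the parameter description above.
# The mapped value is the dimension that may hold 2 or more entries.
SCOPE_VARIES = {
    "no_comparison": None,
    "compare_models": "model",
    "compare_datasets": "dataset",
    "compare_targetclasses": "target class",
}

def describe_scope(scope):
    """Hypothetical helper: return the dimension that varies under `scope`."""
    if scope not in SCOPE_VARIES:
        raise ValueError(
            f"Invalid scope {scope!r}; expected one of {sorted(SCOPE_VARIES)}"
        )
    return SCOPE_VARIES[scope]

for name in SCOPE_VARIES:
    print(name, "->", describe_scope(name))
```

Every dimension other than the one returned is held at a single value, which matches the "1 dataset, 1 model, 1 target class" wording in the list of perspectives.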
- prepare_scores_and_ntiles()#
Create eval_tot
This function builds the pandas dataframe eval_tot, which contains the actual and predicted values for each feature and label data pair with a given description. It loops over the different models with the given model_name.
- Parameters:
feature_data (list of objects) – Objects containing the X matrix for one or more different datasets.
label_data (list of objects) – Objects of the y vector for one or more different datasets.
dataset_labels (list of str) – Containing the names of the different feature_data and label_data combination pairs.
models (list of objects) – Containing the sk-learn model objects.
model_labels (list of str) – Names of the (sk-learn) models.
ntiles (int, default 10) – The number of splits: 10 gives deciles, 100 gives percentiles, and any other value gives ntiles.
seed (int, default 999) – Making the splits reproducible.
- Returns:
Pandas dataframe with all given information; for each target_class it contains
a prediction and ntile. For each ntile a small value (based on the seed) is added
and normalized to make the results reproducible.
- Raises:
ValueError – If there is no match with the complete list or the input list again.
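The role of seed in ntile assignment can be illustrated with a small sketch. It assumes that ties between equal predicted scores are broken by adding a tiny seeded random value before ranking. The helper assign_ntiles and the jitter size 1e-9 are illustrative assumptions, not the library's actual code.

```python
import random

def assign_ntiles(scores, ntiles=10, seed=999):
    """Hypothetical helper: ntile per score, where 1 holds the highest scores."""
    rng = random.Random(seed)
    # A tiny seeded jitter gives tied scores a unique, repeatable ordering.
    jittered = [s + rng.uniform(0, 1e-9) for s in scores]
    # Indices ordered from the highest to the lowest jittered score.
    order = sorted(range(len(scores)), key=lambda i: jittered[i], reverse=True)
    result = [0] * len(scores)
    for rank, idx in enumerate(order):
        # Map rank 0..n-1 onto ntiles 1..ntiles in equal-sized groups.
        result[idx] = rank * ntiles // len(scores) + 1
    return result

scores = [0.9, 0.1, 0.5, 0.5, 0.5, 0.5, 0.8, 0.2]
result = assign_ntiles(scores, ntiles=4)
print(result)
```

Four observations share the score 0.5 but must be spread over two ntiles. Without a fixed seed the split between them could change from run to run, which is the reproducibility problem the seed parameter addresses.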
- reset_modules()#
Reset the internal state of the ModelPlotPy object.
- reset_params()#
Reset all parameters to default values.
- set_params(**params)#
Set parameters of the model plots object.