ModelPlotPy
- class scikitplot.modelplotpy.ModelPlotPy(feature_data=[], label_data=[], dataset_labels=[], models=[], model_labels=[], ntiles=10, seed=0)
ModelPlotPy decile analysis.
- Parameters:
- feature_datalist of objects (n_datasets, )
Objects containing the X matrix for one or more different datasets.
- label_datalist of objects (n_datasets, )
Objects of the y vector for one or more different datasets.
- dataset_labelslist of str (n_datasets, )
Containing the names of the different feature_data and label_data combination pairs.
- modelslist of objects (n_models, )
Containing the sk-learn model objects.
- model_labelslist of str (n_models, )
Names of the (sk-learn) models.
- ntilesint, default 10
The number of splits. Range is (2, inf]:
10 is called deciles
100 is called percentiles
any other value is an ntile
- seedint, default=0
Making the splits reproducible.
Changed in version 0.3.9: Default changed from 999 to 0.
- Raises:
- ValueError
If there is no match with the complete list or the input list again.
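To make the ntiles parameter concrete, the following is a minimal sketch of how scored rows can be split into ntiles. It is not the library's implementation: the helper name assign_ntiles and the example scores are hypothetical, and the sketch assumes that ntile 1 holds the highest scores and does not enforce the documented range.

```python
def assign_ntiles(scores, ntiles=10):
    """Rank scores from high to low and split them into `ntiles` near-equal groups.

    Returns a list of ntile numbers aligned with `scores`.
    Assumption: ntile 1 contains the highest scores.
    """
    n = len(scores)
    # Row indices ordered by descending score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    result = [0] * n
    for rank, idx in enumerate(order):
        # Map rank 0..n-1 onto groups 1..ntiles.
        result[idx] = rank * ntiles // n + 1
    return result


# Hypothetical model scores for ten rows.
scores = [0.91, 0.15, 0.62, 0.48, 0.77, 0.05, 0.33, 0.84, 0.26, 0.55]
deciles = assign_ntiles(scores, ntiles=10)   # one row per decile
quintiles = assign_ntiles(scores, ntiles=5)  # two rows per ntile
```

With ten rows, ntiles=10 puts each row in its own decile, while ntiles=5 groups the rows in pairs.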
- aggregate_over_ntiles()
Create eval_t_tot
This function builds the pandas dataframe eval_t_tot, which contains the aggregated output. The data is aggregated over the datasets (feature and label data pairs) and the list of models.
- Parameters:
- feature_datalist of objects (n_datasets, )
Objects containing the X matrix for one or more different datasets.
- label_datalist of objects (n_datasets, )
Objects of the y vector for one or more different datasets.
- dataset_labelslist of str (n_datasets, )
Containing the names of the different feature (feature_data) and label (label_data) data combination pairs.
- modelslist of objects (n_models, )
Containing the sk-learn model objects.
- model_labelslist of str (n_models, )
Names of the (sk-learn) models.
- ntilesint, default 10
The number of splits: 10 is called deciles, 100 is called percentiles, and any other value is an ntile. Range is (2, inf].
- seedint, default=0
Making the splits reproducible.
Changed in version 0.3.9: Default changed from 999 to 0.
- Returns:
- pandas.DataFrame
Pandas dataframe with combination of all datasets, models, target values and ntiles. It already contains almost all necessary information for model plotting.
- Raises:
- ValueError
If there is no match with the complete list or the input list again.
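As an illustration of what aggregating over ntiles can look like, here is a minimal sketch that summarises binary outcomes per ntile for one dataset, model and target class. The function name aggregate_by_ntile and the keys size, positives, cum_gain and cum_lift are hypothetical and are not taken from the library's eval_t_tot columns. The sketch assumes ntile 1 is non-empty and that there is at least one positive.

```python
def aggregate_by_ntile(ntile_ids, is_target, ntiles):
    """Summarise binary outcomes per ntile.

    Assumptions: ntile 1 is non-empty and `is_target` contains at least one 1,
    so none of the divisions below hit zero.
    """
    total = len(is_target)
    total_pos = sum(is_target)
    rows = []
    cum_n = cum_pos = 0
    for t in range(1, ntiles + 1):
        members = [y for g, y in zip(ntile_ids, is_target) if g == t]
        cum_n += len(members)
        cum_pos += sum(members)
        rows.append({
            "ntile": t,
            "size": len(members),
            "positives": sum(members),
            # Share of all positives captured up to and including this ntile.
            "cum_gain": cum_pos / total_pos,
            # Positive rate so far, relative to the overall positive rate.
            "cum_lift": (cum_pos / cum_n) / (total_pos / total),
        })
    return rows


# Hypothetical ntile assignments and outcomes for ten rows.
ntile_ids = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
is_target = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
summary = aggregate_by_ntile(ntile_ids, is_target, ntiles=5)
```

In this toy data the first ntile captures half of the positives, and the cumulative lift falls back to 1 once all rows are included.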
- plotting_scope(scope='no_comparison', select_model_label=[], select_dataset_label=[], select_targetclass=[], select_smallest_targetclass=True)
Create plot_input
This function builds the pandas dataframe plot_input, which is a subset of scores_and_ntiles. The subset depends on which of the 4 evaluation types (scopes) the user requests.
Changed in version 0.3.9: Parameters have been reorganized.
- Parameters:
- scope{'no_comparison', 'compare_models', 'compare_datasets', 'compare_targetclasses'}, default='no_comparison'
How is this function evaluated? There are 4 different perspectives to evaluate model plots.
scope='no_comparison'
This perspective will show a single plot that contains the viewpoint from:
1 dataset
1 model
1 target class
scope='compare_models'
This perspective will show plots that contain the viewpoint from:
2 or more different models
1 dataset
1 target class
scope='compare_datasets'
This perspective will show plots that contain the viewpoint from:
2 or more different datasets
1 model
1 target class
scope='compare_targetclasses'
This perspective will show plots that contain the viewpoint from:
2 or more different target classes
1 dataset
1 model
- select_model_labellist of str
List of one or more elements from the model_labels parameter.
- select_dataset_labellist of str
List of one or more elements from the dataset_labels parameter.
- select_targetclasslist of str
List of one or more elements from the label data.
- select_smallest_targetclassbool, default=True
Whether the plot should contain only the results of the smallest target class. If True, the specific target is defined from the first dataset.
- Returns:
- pandas.DataFrame
Pandas dataframe, a subset of scores_and_ntiles, for all dataset, model and target value combinations for all ntiles. It contains all necessary information for model plotting.
- Raises:
- ValueError
If an invalid scope value is specified.
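The scope rules and the default target-class choice described for plotting_scope can be sketched as follows. This is an illustration only, not the library's code: the helper names compared_dimension and smallest_target_class are hypothetical, and the sketch assumes that "smallest" means the least frequent class in the first dataset's labels.

```python
from collections import Counter

# For each scope: the dimension that varies; the other two are fixed to one value.
COMPARED = {
    "no_comparison": None,
    "compare_models": "model",
    "compare_datasets": "dataset",
    "compare_targetclasses": "target class",
}


def compared_dimension(scope):
    """Return the dimension compared under `scope`, or raise ValueError."""
    if scope not in COMPARED:
        raise ValueError(
            f"Invalid scope {scope!r}; expected one of {sorted(COMPARED)}"
        )
    return COMPARED[scope]


def smallest_target_class(labels):
    """Pick the least frequent class (assumed meaning of 'smallest')."""
    counts = Counter(labels)
    # Ties are broken by the string form of the label to stay deterministic.
    return min(counts, key=lambda c: (counts[c], str(c)))


# Hypothetical labels of the first dataset.
first_dataset_labels = ["no", "no", "yes", "no", "yes", "no"]
target = smallest_target_class(first_dataset_labels)
```

Here "yes" occurs twice and "no" four times, so "yes" would be the class selected when no target class is given explicitly.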
- prepare_scores_and_ntiles()
Create eval_tot
This function builds the pandas dataframe eval_tot, which contains the actual and predicted values for each feature and label data pair, identified by its dataset label. It loops over the different models, identified by their model labels.
- Parameters:
- feature_datalist of objects (n_datasets, )
Objects containing the X matrix for one or more different datasets.
- label_datalist of objects (n_datasets, )
Objects of the y vector for one or more different datasets.
- dataset_labelslist of str (n_datasets, )
Containing the names of the different feature (feature_data) and label (label_data) data combination pairs.
- modelslist of objects (n_models, )
Containing the sk-learn model objects.
- model_labelslist of str (n_models, )
Names of the (sk-learn) models.
- ntilesint, default 10
The number of splits: 10 is called deciles, 100 is called percentiles, and any other value is an ntile. Range is (2, inf].
- seedint, default=0
Making the splits reproducible.
Changed in version 0.3.9: Default changed from 999 to 0.
- Returns:
- scores_and_ntilespandas.DataFrame
Pandas dataframe combining all given information; for each target class it contains a prediction and an ntile. Before the ntiles are computed, a small value (based on the seed) is added and normalized to make the results reproducible.
- Raises:
- ValueError
If there is no match with the complete list or the input list again.
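The role of seed can be illustrated with a small sketch of seeded tie-breaking. Identical scores would otherwise land in an ntile arbitrarily, so adding tiny seeded noise makes the split repeatable. The helper jitter_scores and its scale argument are hypothetical, and the sketch does not reproduce the normalization step mentioned above.

```python
import random


def jitter_scores(scores, seed=0, scale=1e-10):
    """Add tiny seeded noise so tied scores get a reproducible order.

    Assumption: `scale` is far smaller than any real gap between scores,
    so the ordering of untied scores is unchanged.
    """
    rng = random.Random(seed)  # local generator, global state is untouched
    return [s + rng.random() * scale for s in scores]


# Three tied scores and one lower score (hypothetical values).
tied = [0.5, 0.5, 0.5, 0.2]
first_run = jitter_scores(tied, seed=0)
second_run = jitter_scores(tied, seed=0)
```

Both runs return the same values because they share a seed, and the three tied scores become distinct while staying above the lower score.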