Building, scoring and trusting predictive models
0.5.dev0+git.20260626.e137512 - June 26, 2026 18:41 UTC
Data Preparation & Analysis
This hub covers the applied predictive-modelling workflow: framing a prediction problem, fitting a model and, most importantly for scikit-plots, evaluating it with the right chart for the right question. It is the practical companion to the Terminology reference: the terminology pages define the metrics, and this page shows the workflow that produces and reads them.
It is written for three readers at once:
newcomers who want the intuition behind model evaluation;
practitioners choosing between ROC, lift, gains and threshold tuning;
reviewers who need diagnostics (residuals, outliers) before shipping.
Note
Detail is collapsed by default. Open the dropdown for a term, follow the See also cross-links to explore related ideas, and use Ctrl + F or the Sphinx search to jump straight to a topic.
Every code snippet uses a real scikitplot / scikit-learn call.
Discovery at a Glance
Part 1 (Prediction Models): what a predictive model is, and what “good” means.
What is a Prediction Model? Inputs → learned mapping → scored output, and the train / validate / test discipline that keeps it honest.
Assessing the Quality of a Prediction Model: discrimination vs. calibration vs. business value, three different questions with three different checks.
Binary vs. Nominal (Multiclass) Targets: how the target’s shape (two classes vs. many unordered classes) changes which metrics apply.
Part 2 (Classification Evaluation): the everyday toolkit for scoring classifiers.
ROC Curve & AUC: ranking quality across every threshold at once, and how to plot it in scikit-plots.
Threshold Optimization: turning scores into decisions by choosing the cut-off that matches your cost trade-off.
Part 3 (Gains, Lift & Deciles): “If I contact the top 20 %, how much better than random?” The campaign manager’s metric.
Part 4 (Decision Trees & Diagnostics): interpretable models and what to check before trusting them.
Decision Trees & CART: piecewise, rule-based models that capture interactions and explain themselves.
Residual Diagnostics & Outliers: studentized residuals to find the points your model cannot explain.
Explaining Clustering Results with a Tree: using a tree to turn an opaque clustering into human-readable rules.
Part 1 — Prediction Models & What “Good” Means
Before any chart, fix the question: what are we predicting, and how will we know the model is any good?
What is a Prediction Model?
What is it?
A prediction (or supervised) model learns a mapping from input features \(X\) to a target \(y\) from labelled examples, so that it can score new, unseen inputs. Classification predicts a category; regression predicts a number.
The honesty discipline
Performance must be measured on data the model never saw during fitting. The standard split is train → validation (for tuning) → test (for the final, untouched estimate):
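from sklearn.model_selection import train_test_split
# X: feature matrix, y: target labels (assumed already loaded)
# First split: 60 % train, 40 % held out; stratify keeps class proportions equal
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
# Second split: halve the held-out 40 % into 20 % validation and 20 % test
X_val, X_te, y_val, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0
)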
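A quick way to confirm the proportions is to run the same two calls on synthetic data. This is a minimal sketch: `make_classification` and its parameters stand in for a real dataset and are assumptions for illustration, not part of the workflow above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data stands in for a real X, y (assumption for illustration).
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.8, 0.2], random_state=0)

# 60 % train, then split the remaining 40 % in half: 20 % validation, 20 % test.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0
)

print(len(X_tr), len(X_val), len(X_te))  # 600 200 200
# Stratification keeps the positive rate nearly identical in every split.
print(round(y_tr.mean(), 3), round(y_val.mean(), 3), round(y_te.mean(), 3))
```

The test set is touched once, at the very end; any tuning decision made by looking at it turns it into a second validation set.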
When to use it — any task where past labelled outcomes can guide future decisions (churn, fraud, response, risk).
Assessing the Quality of a Prediction Model
Three independent questions
A model can be strong on one axis and weak on another, so check all three:
Discrimination — does it rank positives above negatives? (ROC-AUC, gains, lift).
Calibration — do predicted probabilities match observed frequencies? (reliability curve, Brier score).
Business value — does acting on it beat the baseline at your operating point? (lift at the contacted fraction, profit curve).
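The three questions map onto three different computations. The sketch below is illustrative only: the synthetic dataset, the logistic-regression model and the 20 % contacted fraction are assumptions, and the lift is computed by hand with NumPy rather than through a scikit-plots call.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data and model (assumptions for illustration).
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.85, 0.15], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y, random_state=1)
p = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# 1) Discrimination: does the score rank positives above negatives?
auc = roc_auc_score(y_te, p)

# 2) Calibration: do predicted probabilities match observed frequencies?
brier = brier_score_loss(y_te, p)
frac_pos, mean_pred = calibration_curve(y_te, p, n_bins=10)  # reliability-curve points

# 3) Business value: lift when acting on the top 20 % of scores (assumed budget).
k = int(0.2 * len(p))
top = np.argsort(-p)[:k]  # indices of the highest-scored cases
lift_at_20 = y_te[top].mean() / y_te.mean()

print(f"AUC={auc:.3f}  Brier={brier:.3f}  lift@20%={lift_at_20:.2f}")
```

A model can score well on the first number and poorly on the second; recalibrating (e.g. with `CalibratedClassifierCV`) changes the probabilities without changing the ranking that drives AUC and lift.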
scikit-plots connection
import scikitplot as skplt
# One call renders confusion matrix + ROC + PR for a quick read
skplt.metrics.plot_classifier_eval(y_true, y_pred, y_probas)
Binary vs. Nominal (Multiclass) Targets
Binary — exactly two outcomes (positive / negative). The full confusion-matrix vocabulary (TP/FP/FN/TN) and threshold tuning apply directly.
Nominal / multiclass — three or more unordered categories. Each metric must be averaged across classes (macro / micro / weighted), and ROC-AUC is computed One-vs-Rest or One-vs-One.
from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred)) # per-class + averages
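To see how the averaging choice changes the headline number, the sketch below scores a multiclass model on the iris data (an assumed example dataset and model) using scikit-learn’s `average=` options and both multiclass ROC-AUC strategies.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # three unordered classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_proba = clf.predict_proba(X_te)  # shape (n_samples, n_classes), rows sum to 1

# The same per-class F1 scores, averaged three different ways.
scores = {avg: f1_score(y_te, y_pred, average=avg) for avg in ("macro", "micro", "weighted")}

# Multiclass ROC-AUC needs an explicit strategy: One-vs-Rest or One-vs-One.
auc_ovr = roc_auc_score(y_te, y_proba, multi_class="ovr")
auc_ovo = roc_auc_score(y_te, y_proba, multi_class="ovo")
print(scores, round(auc_ovr, 3), round(auc_ovo, 3))
```

For single-label multiclass data, micro-averaged F1 equals plain accuracy; macro gives every class equal weight, and weighted scales each class by its support.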
Part 2 — Classification Evaluation
The core loop: score → rank → choose a threshold → read the trade-off.
ROC Curve & AUC
What is it?
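The ROC curve plots True Positive Rate against False Positive Rate as the decision threshold sweeps from 1 to 0. The AUC (area under it) summarises ranking quality in a single number:
\[
\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}\; d(\mathrm{FPR}) = P\big(s(x^{+}) > s(x^{-})\big) + \tfrac{1}{2}\,P\big(s(x^{+}) = s(x^{-})\big),
\]
i.e. the probability that a randomly chosen positive \(x^{+}\) receives a higher score \(s\) than a randomly chosen negative \(x^{-}\), with ties counted as one half. 0.5 = random, 1.0 = perfect.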
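The pairwise reading of AUC can be checked directly: compare every positive score with every negative score and count how often the positive wins. The labels and scores below are synthetic and hypothetical; the point is only that the pairwise fraction matches scikit-learn’s `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=300)
# Hypothetical scores: positives are drawn from a slightly higher distribution.
y_score = rng.normal(loc=y_true * 1.0, scale=1.0)

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
# Compare every positive with every negative; ties count as half a "win".
diff = pos[:, None] - neg[None, :]
pairwise_auc = (diff > 0).mean() + 0.5 * (diff == 0).mean()

print(round(pairwise_auc, 4), round(roc_auc_score(y_true, y_score), 4))  # the two agree
```

The all-pairs matrix grows with the product of the class sizes, so this is a teaching check for small samples, not a replacement for `roc_auc_score`.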
scikit-plots connection
import scikitplot as skplt
skplt.metrics.plot_roc(y_true, y_probas)
When to use it — ranking/threshold-free comparison. For imbalanced problems, read it alongside Precision–Recall / lift, which are more sensitive to the minority class.
Threshold Optimization
The problem
A classifier outputs a score; a decision needs a cut-off. The default 0.5 is rarely optimal — the right threshold depends on the relative cost of false positives vs. false negatives.
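A cost-aware choice
Pick the threshold \(t\) that minimises expected misclassification cost, estimated on validation data as
\[
t^{*} = \arg\min_{t}\; \big[\, C_{\mathrm{FP}}\,\mathrm{FP}(t) + C_{\mathrm{FN}}\,\mathrm{FN}(t) \,\big],
\]
where \(\mathrm{FP}(t)\) and \(\mathrm{FN}(t)\) are the false-positive and false-negative counts at cut-off \(t\), and \(C_{\mathrm{FP}}, C_{\mathrm{FN}}\) their unit costs. When those costs are hard to pin down, a common cost-free proxy is the threshold that maximises F1:
import numpy as np
from sklearn.metrics import precision_recall_curve
# y_score: predicted probability of the positive class, e.g. predict_proba(X_val)[:, 1]
prec, rec, thr = precision_recall_curve(y_true, y_score)
f1 = 2 * prec * rec / (prec + rec + 1e-12)  # epsilon avoids 0/0 where prec = rec = 0
# prec and rec have one more entry than thr (the final point has no threshold), hence [:-1]
best_t = thr[np.nanargmax(f1[:-1])]  # threshold maximising F1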
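The F1 snippet above ignores costs. A minimal sketch of the cost-based rule itself sweeps a grid of candidate thresholds on validation data and keeps the one with the lowest total misclassification cost. The synthetic data, the model and the cost values `C_FP` and `C_FN` are hypothetical assumptions; substitute your own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data and a simple model (assumptions for illustration).
X, y = make_classification(n_samples=3000, weights=[0.9, 0.1], random_state=2)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, stratify=y, random_state=2)
y_score = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_val)[:, 1]

# Hypothetical unit costs: a missed positive is 10x as expensive as a false alarm.
C_FP, C_FN = 1.0, 10.0


def expected_cost(t):
    """Total misclassification cost on the validation set at cut-off t."""
    pred = y_score >= t
    fp = np.sum(pred & (y_val == 0))
    fn = np.sum(~pred & (y_val == 1))
    return C_FP * fp + C_FN * fn


grid = np.linspace(0.01, 0.99, 99)  # candidate thresholds 0.01, 0.02, ..., 0.99
costs = np.array([expected_cost(t) for t in grid])
best_t = grid[np.argmin(costs)]
print(f"cost-optimal t ~ {best_t:.2f}, cost {costs.min():.0f} vs {expected_cost(0.5):.0f} at t=0.5")
```

If the predicted probabilities are well calibrated, decision theory gives the optimum in closed form: predict positive when \(p > C_{\mathrm{FP}} / (C_{\mathrm{FP}} + C_{\mathrm{FN}})\), about 0.09 for these costs. The grid search makes no calibration assumption, which is why it is shown here.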
When to use it — whenever a model’s output drives an action with asymmetric consequences (medical screening, fraud holds, mailing cost).
Part 3 — Gains, Lift & Decile Analysis#
The “how much better than random, at the fraction I can afford to act on” family — a particular strength of scikit-plots’ decile plots.
Cumulative Gains, Lift & Deciles
What is it?
Rank all cases by predicted score, descending, and bin into deciles (top 10 %, next 10 %, …).
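Cumulative gains: the share of all true positives captured by the top fraction \(k\) of ranked cases (e.g. \(k = 0.1\) for the top 10 %).
Lift: cumulative gains divided by what a random ordering would capture, which is simply \(k\) itself:
\[
\mathrm{Lift}(k) = \frac{\mathrm{Gains}(k)}{k} = \frac{\text{positive rate among the top } k}{\text{overall positive rate}}
\]
A lift of 3 in the top decile means those cases respond at three times the average rate, exactly the question behind targeted campaigns.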
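Gains and lift per decile can be computed by hand, which makes the definitions concrete. A minimal sketch on synthetic labels and hypothetical scores (both assumptions); by construction, gains and lift both reach 1 once every case is contacted.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
y_true = rng.binomial(1, 0.1, size=n)  # about 10 % positives
# Hypothetical scores that are informative but noisy.
y_score = y_true * 0.8 + rng.normal(size=n)

order = np.argsort(-y_score)  # rank cases by score, highest first
y_sorted = y_true[order]
deciles = np.array_split(y_sorted, 10)  # top 10 %, next 10 %, ...

total_pos = y_true.sum()
captured = np.cumsum([d.sum() for d in deciles])  # positives captured up to each decile
fraction_contacted = np.arange(1, 11) / 10  # k = 0.1, 0.2, ..., 1.0
gains = captured / total_pos  # cumulative gains
lift = gains / fraction_contacted  # cumulative lift vs. random targeting

for k, (g, l) in enumerate(zip(gains, lift), start=1):
    print(f"top {10 * k:3d}%  gains={g:.2f}  lift={l:.2f}")
```

Because the positive rate is tiny in the bottom deciles of a useful model, lift falls steadily toward 1 as the contacted fraction grows; the budget decides where on that curve you stop.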
scikit-plots connection
import scikitplot as skplt
skplt.metrics.plot_cumulative_gain(y_true, y_probas)
skplt.metrics.plot_lift_curve(y_true, y_probas)
skplt.metrics.plot_ks_statistic(y_true, y_probas) # max separation
When to use it — ranked-action problems with a budget: direct mail, retention offers, lead scoring, collections.
Part 4 — Decision Trees & Diagnostics
Interpretable models, and the residual checks that reveal where any model breaks down.
Decision Trees & CART (Interactions, Piecewise Structure)
What is it?
A CART (Classification And Regression Tree) recursively splits the feature space into axis-aligned regions, predicting a constant in each leaf. It is therefore a piecewise-constant model that captures interactions automatically: a split on one feature changes which splits matter below it.
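Splits are chosen to maximise the decrease in impurity; for classification the usual criterion (and scikit-learn’s default) is Gini impurity:
\[
G(\text{node}) = 1 - \sum_{k=1}^{K} p_k^{2},
\]
where \(p_k\) is the share of class \(k\) among the node’s training samples. A pure node has \(G = 0\).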
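To make the criterion concrete, the sketch below computes the Gini impurity of a node and the size-weighted impurity decrease of a candidate split on one feature. It is a simplified illustration on made-up data: it tries each observed value as a threshold, whereas a full CART implementation such as scikit-learn’s searches every feature and places thresholds midway between sorted values.

```python
import numpy as np


def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of one node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def split_gain(x, y, threshold):
    """Impurity decrease from splitting on x <= threshold (children weighted by size)."""
    left, right = y[x <= threshold], y[x > threshold]
    n = len(y)
    child = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(y) - child


# Made-up single-feature node with 5 negatives and 3 positives.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([0, 0, 0, 0, 1, 1, 1, 0])
print(gini(y))  # 0.46875

# Try every observed value except the largest (which would leave the right child empty).
best = max(np.unique(x)[:-1], key=lambda t: split_gain(x, y, t))
print(best, split_gain(x, y, best))  # 4 0.28125
```

The winning split sends the four pure negatives left, which is why its left child contributes zero impurity.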
scikit-learn
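from sklearn.tree import DecisionTreeClassifier, plot_tree
# Shallow depth and a minimum leaf size keep the rules readable and limit overfitting
tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=50)
tree.fit(X_tr, y_tr)
# cols: list of feature names matching the columns of X_tr (assumed defined)
plot_tree(tree, filled=True, feature_names=cols)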
When to use it — when an interpretable, rule-based model and explicit interactions matter more than squeezing out the last point of accuracy. Control depth / leaf size to avoid overfitting.
Explaining Clustering Results with a Tree
The idea
Clustering (e.g. k-means) produces group labels but no explanation. Treat those cluster labels as a target and fit a shallow decision tree on the original features — the tree’s splits become a human-readable description of what makes each cluster different.
from sklearn.tree import DecisionTreeClassifier
explainer = DecisionTreeClassifier(max_depth=3)
explainer.fit(X, cluster_labels) # labels from KMeans, etc.
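An end-to-end sketch of the idea on synthetic blobs: cluster with k-means, fit the shallow explainer tree, and print its rules with scikit-learn’s `export_text`. The dataset and the feature names (`spend`, `tenure`, `visits`, `basket`) are hypothetical. Checking how often the tree reproduces the cluster labels (its fidelity, here training accuracy against those labels) is a sensible precaution before presenting the rules, since a low score means the rules describe the clusters only loosely.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data with three groups; column names are hypothetical.
X, _ = make_blobs(n_samples=600, centers=3, n_features=4, random_state=0)
feature_names = ["spend", "tenure", "visits", "basket"]

# Step 1: unsupervised segmentation.
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: a shallow tree that predicts the cluster from the original features.
explainer = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, cluster_labels)

# Fidelity: share of points whose cluster the rules reproduce.
fidelity = explainer.score(X, cluster_labels)
rules = export_text(explainer, feature_names=feature_names)
print(f"fidelity = {fidelity:.3f}")
print(rules)
```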
When to use it — segmentation deliverables where stakeholders need “Cluster 2 = high spend, low tenure” rather than centroid coordinates.
Residual Diagnostics & Outliers
What is it?
A residual is the gap between observed and predicted value, \(e_i = y_i - \hat{y}_i\). Studentized residuals rescale each residual by its estimated standard deviation so they are comparable; points with \(|e_i^{\text{stud}}| > 3\) are candidate outliers the model cannot explain.
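For an ordinary least-squares fit, the internally studentized residual divides each residual by its own estimated standard deviation, \(r_i = e_i / (\hat{\sigma}\sqrt{1 - h_{ii}})\), where \(h_{ii}\) is the leverage of point \(i\) (the diagonal of the hat matrix) and \(\hat{\sigma}\) the residual standard error. The NumPy sketch below assumes a simple linear model on synthetic data with two planted outliers; it is an illustration of the formula, not how scikit-plots computes its residual plots. The leave-one-out (externally studentized) variant is also common and flags outliers more sharply.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)
y[[10, 50]] += 8.0  # plant two outliers for illustration

X = np.column_stack([np.ones(n), x])  # design matrix with an intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients
resid = y - X @ beta  # e_i = y_i - y_hat_i

p = X.shape[1]
sigma = np.sqrt(resid @ resid / (n - p))  # residual standard error
# Leverages h_ii: diagonal of the hat matrix X (X^T X)^{-1} X^T, computed row by row.
h = np.einsum("ij,ij->i", X @ np.linalg.inv(X.T @ X), X)
stud = resid / (sigma * np.sqrt(1.0 - h))  # internally studentized residuals

outliers = np.flatnonzero(np.abs(stud) > 3)
print(outliers)
```

Dividing by \(\sqrt{1 - h_{ii}}\) matters for high-leverage points: they pull the fit toward themselves, so their raw residuals look deceptively small.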
scikit-plots connection
import scikitplot as skplt
skplt.api.metrics.plot_residuals_distribution(y_true, y_pred)
When to use it — after fitting any regression (or probability) model, to check for structure, heteroscedasticity, and influential outliers before trusting predictions.
Map to scikit-plots Examples
Worked, runnable galleries for the workflow above (verified links):
Confusion matrix, ROC and PR in one figure.
Per-class and averaged ROC with AUC.
The imbalance-aware companion to ROC.
Share of positives captured by the top deciles.
Improvement over random at each decile.
Maximum class separation along the ranked score.
Business-facing gains / lift / response reports.
Residual and Q–Q diagnostics for fitted models.
Sources
Verified during preparation of this page; links were resolvable at the documentation build date.
Source context (framing only, re-expressed in our own words)
Data Preparation and Analysis category (56 posts): https://insightful-data-lab.com/category/data-preparation-and-analysis/
Official documentation (API calls used above)
scikit-learn — model evaluation metrics: https://scikit-learn.org/stable/modules/model_evaluation.html
scikit-learn — decision trees: https://scikit-learn.org/stable/modules/tree.html
scikit-learn — train/test splitting: https://scikit-learn.org/stable/modules/cross_validation.html
scikit-plots (this project)
Example gallery: https://scikit-plots.github.io/dev/auto_examples/index.html
API reference: https://scikit-plots.github.io/dev/apis/index.html
Terminology reference: terminology-index
Standard references
James, Witten, Hastie & Tibshirani, An Introduction to Statistical Learning (free): https://www.statlearning.com/
Breiman, Friedman, Olshen & Stone, Classification and Regression Trees (CART), 1984.