🎲 Bayesian Data Analysis
Reasoning about uncertainty with priors, likelihoods and posteriors
0.5.dev0+git.20260626.e137512 - June 26, 2026 18:41 UTC

Bayesian Data Analysis#

Bayesian analysis treats unknown quantities as probability distributions and updates them with data. Instead of a single “best” estimate, you get a full posterior — a principled account of what the data do and do not tell you. This hub builds from first principles up to the nonparametric models (mixtures, density estimation, Dirichlet processes) that the source corpus emphasises.

Three reading levels run through the page:

  • newcomers — the intuition of prior → likelihood → posterior;

  • practitioners — how to actually compute and check posteriors;

  • researchers — hierarchical and nonparametric (infinite-mixture) models.

Note

Open a dropdown for detail; follow See also links to related ideas. Code snippets use real scipy.stats / PyMC / ArviZ / scikit-learn calls. This page pairs with the Terminology reference (probability and distributions) and the Time Series hub (where Bayesian estimation also appears).


Discovery at a Glance#

The one equation everything rests on.

🔁 Bayes’ Theorem

Posterior ∝ likelihood × prior — how belief is updated by evidence.

Bayes’ Theorem

🎯 Prior, Likelihood, Posterior

The three ingredients, what each encodes, and where they come from.

Prior, Likelihood & Posterior

📐 Credible Intervals

A 95 % interval you can read as “95 % probability” — unlike a confidence interval.

Credible Intervals (and how they differ from CIs)

From conjugate shortcuts to general-purpose sampling.

✨ Conjugacy

When prior and posterior share a family, the update is exact and closed-form.

Conjugacy (the exact, closed-form case)

⛓️ MCMC Sampling

Drawing from any posterior when no formula exists — the workhorse of modern Bayes.

MCMC Sampling

🔮 Posterior Predictive

Simulating new data to check the model and forecast.

Posterior Predictive Checks

Sharing strength across groups; letting complexity grow with data.

🏛️ Hierarchical Models

Partial pooling: groups borrow strength from each other.

Hierarchical Models (Partial Pooling)

🌗 Mixture Models

Sub-populations, label switching, and choosing the number of components.

Mixture Models & Label Switching

♾️ Dirichlet Processes

Nonparametric priors that let the number of clusters grow with the data.

Dirichlet Processes (Nonparametric Bayes)

Part 1 — The Bayesian Idea#

Bayes’ Theorem#

What is it?

Bayes’ theorem inverts conditional probability to turn a model of “how data arise given parameters” into “what parameters are plausible given data”:

\[p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \;\;\propto\;\; \underbrace{p(y \mid \theta)}_{\text{likelihood}}\; \underbrace{p(\theta)}_{\text{prior}}\]

The denominator \(p(y)\) (the evidence) is a normalising constant; for inference about \(\theta\) the proportionality on the right is what matters.

When to use it — whenever you want to combine prior knowledge with observed data and quantify the remaining uncertainty as a distribution.
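The proportionality can be checked numerically with a grid approximation. The sketch below is illustrative only: the data (8 successes in 10 trials), the uniform prior, and the grid resolution are assumptions chosen for the example, not values taken from the source.

```python
import numpy as np
from scipy import stats

# Illustrative data (assumed): 8 successes in 10 trials
k, n = 8, 10

# Discretise theta on a fine grid of bin midpoints over (0, 1)
theta = np.linspace(0.0005, 0.9995, 1000)
width = theta[1] - theta[0]

prior = stats.beta.pdf(theta, 1, 1)           # uniform prior p(theta)
likelihood = stats.binom.pmf(k, n, theta)     # p(y | theta) for the observed y
unnormalised = likelihood * prior             # numerator of Bayes' theorem

evidence = np.sum(unnormalised) * width       # p(y), approximated by a Riemann sum
posterior = unnormalised / evidence           # density that integrates to 1

posterior_mean = np.sum(theta * posterior) * width
print(f"evidence ~ {evidence:.4f}, posterior mean ~ {posterior_mean:.4f}")
```

Dividing by the evidence only rescales the curve; the shape of the posterior is fixed by likelihood × prior, which is why the proportional form is enough for inference about \(\theta\).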

Prior, Likelihood & Posterior#
  • Prior \(p(\theta)\) — belief about the parameter before seeing this data (from theory, past studies, or a deliberately weak “let the data speak” choice).

  • Likelihood \(p(y\mid\theta)\) — the data-generating model, read as a function of \(\theta\) for the observed \(y\).

  • Posterior \(p(\theta\mid y)\) — the updated belief; the output of the analysis and the input to every decision.

As data accumulate, the likelihood dominates and the posterior becomes insensitive to a reasonable prior.
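A small experiment makes the last point concrete. This is a sketch under stated assumptions: the two Beta priors, the success rate of 0.7, and the idealised counts are all hypothetical values picked for illustration.

```python
from scipy import stats

true_rate = 0.7  # assumed success rate used to build illustrative counts

# Two deliberately different priors on theta, as Beta(alpha, beta) pairs
priors = {"uniform": (1, 1), "sceptical": (2, 8)}

gaps = {}
for n in (10, 100, 10_000):
    k = round(true_rate * n)  # idealised data: exactly 70% successes
    means = {
        name: stats.beta(a + k, b + n - k).mean()
        for name, (a, b) in priors.items()
    }
    # distance between the two posterior means at this sample size
    gaps[n] = abs(means["uniform"] - means["sceptical"])
    print(n, {name: round(m, 4) for name, m in means.items()})
```

The gap between the two posterior means shrinks as \(n\) grows, because the prior contributes a fixed number of pseudo-observations while the data contribute \(n\).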

Credible Intervals (and how they differ from CIs)#

What is it?

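A 95 % credible interval is any region containing 95 % of the posterior probability mass. Conditional on the model and the data, it supports the natural statement “there is a 95 % probability the parameter lies in this range”. A frequentist confidence interval does not: its 95 % describes how often the interval-building procedure covers the true value over repeated samples.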

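import numpy as np
# posterior_samples: 1-D array of posterior draws; placeholder Beta(9, 3) draws (assumed)
posterior_samples = np.random.default_rng(0).beta(9, 3, size=10_000)
# equal-tailed 95% credible interval from posterior samples
lo, hi = np.percentile(posterior_samples, [2.5, 97.5])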
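Because "any region" with 95 % mass qualifies, the equal-tailed interval is only one choice. A common alternative is the highest-density interval (HDI), the narrowest such region. The helper below is a hypothetical, minimal implementation that assumes a unimodal posterior; `hdi_from_samples` is not a library function, and the Beta(9, 3) draws are stand-in data.

```python
import numpy as np

def hdi_from_samples(samples, mass=0.95):
    """Narrowest interval containing `mass` of the draws (unimodal case)."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    window = int(np.ceil(mass * n))          # number of draws the interval must cover
    widths = x[window - 1:] - x[: n - window + 1]
    start = int(np.argmin(widths))           # left edge of the narrowest window
    return x[start], x[start + window - 1]

rng = np.random.default_rng(0)
draws = rng.beta(9, 3, size=20_000)          # stand-in posterior draws (assumed)

eti = np.percentile(draws, [2.5, 97.5])      # equal-tailed interval
hdi = hdi_from_samples(draws, mass=0.95)     # highest-density interval
print("equal-tailed:", eti, "HDI:", hdi)
```

For a skewed posterior the two intervals differ, and the HDI is never wider than the equal-tailed one. ArviZ provides `az.hdi` for the same purpose.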

Part 2 — Computing Posteriors#

Conjugacy (the exact, closed-form case)#

What is it?

A prior is conjugate to a likelihood when the posterior stays in the same family. The classic example is Beta–Binomial: a \(\text{Beta}(\alpha, \beta)\) prior on a success probability, with \(k\) successes in \(n\) trials, yields

\[p(\theta \mid y) = \text{Beta}(\alpha + k,\; \beta + n - k)\]
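The result follows in one line: multiply the binomial likelihood by the Beta prior density and drop every factor that does not depend on \(\theta\):

\[p(\theta \mid y) \;\propto\; \underbrace{\theta^{k}(1-\theta)^{n-k}}_{\text{likelihood}}\; \underbrace{\theta^{\alpha-1}(1-\theta)^{\beta-1}}_{\text{prior}} \;=\; \theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1}\]

The right-hand side is the kernel of a \(\text{Beta}(\alpha + k,\; \beta + n - k)\) density, so the normalising constant is known without any integration.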

scipy

from scipy import stats
alpha, beta = 1, 1          # uniform prior
k, n = 8, 10
post = stats.beta(alpha + k, beta + n - k)
print(post.mean(), post.interval(0.95))

When to use it — quick, exact updates for simple models, and as building blocks inside larger samplers.
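One practical consequence of conjugacy is that updating batch by batch gives the same posterior as updating once with all the data. The sketch below assumes three made-up batches that together amount to 8 successes in 10 trials.

```python
from scipy import stats

# Illustrative batches of (successes, trials); together they give 8 of 10
batches = [(3, 4), (2, 3), (3, 3)]

alpha, beta = 1, 1                     # uniform Beta(1, 1) prior
for k, n in batches:
    # conjugate update: add successes to alpha, failures to beta
    alpha, beta = alpha + k, beta + (n - k)

sequential = stats.beta(alpha, beta)
all_at_once = stats.beta(1 + 8, 1 + 10 - 8)
print(sequential.args, all_at_once.args)
```

Yesterday's posterior serves as today's prior, which is what makes conjugate updates convenient for streaming data.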

See also

MCMC Sampling

MCMC Sampling#

What is it?

When the posterior has no closed form, Markov chain Monte Carlo constructs a Markov chain whose stationary distribution is the posterior; its draws are correlated but, once the chain has converged, can be used like posterior samples. Modern tools use Hamiltonian Monte Carlo / NUTS for efficient exploration.
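To see the mechanism without a library, here is a minimal random-walk Metropolis sampler for the Beta–Binomial example. It is a teaching sketch, not what PyMC runs (PyMC defaults to NUTS for continuous parameters); the step size, number of draws, and warm-up length are hand-picked assumptions.

```python
import numpy as np
from scipy import stats

k, n = 8, 10  # illustrative data: 8 successes in 10 trials

def log_post(theta):
    """Unnormalised log posterior: binomial likelihood with a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -np.inf                       # zero prior mass outside (0, 1)
    return k * np.log(theta) + (n - k) * np.log1p(-theta)

rng = np.random.default_rng(42)
n_draws, step = 20_000, 0.2                  # step size chosen by hand (assumed)
chain = np.empty(n_draws)
current, current_lp = 0.5, log_post(0.5)

for i in range(n_draws):
    proposal = current + step * rng.normal() # symmetric random-walk proposal
    proposal_lp = log_post(proposal)
    # accept with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < proposal_lp - current_lp:
        current, current_lp = proposal, proposal_lp
    chain[i] = current                       # a rejection repeats the current value

samples = chain[2_000:]                      # discard warm-up draws
exact = stats.beta(1 + k, 1 + n - k)
print(samples.mean(), exact.mean())
```

Only the unnormalised posterior is needed, because the evidence \(p(y)\) cancels in the acceptance ratio.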

PyMC + ArviZ

import pymc as pm
import arviz as az

with pm.Model() as model:
    theta = pm.Beta("theta", alpha=1, beta=1)
    y = pm.Binomial("y", n=10, p=theta, observed=8)
    idata = pm.sample(2000, tune=1000)

az.summary(idata)            # means, sd, 94% HDI, r_hat
az.plot_trace(idata)         # convergence diagnostics

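Check before trusting: \(\hat{R} \approx 1.0\) for every parameter (values above about 1.01 suggest the chains have not mixed), an effective sample size in at least the hundreds, and no divergences.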
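The idea behind \(\hat{R}\) is to compare between-chain and within-chain variance. The function below is a hypothetical helper implementing the classic Gelman–Rubin form; ArviZ reports a more robust rank-normalised split version, so the numbers will not match `az.summary` exactly. The synthetic chains are assumptions for illustration.

```python
import numpy as np

def r_hat(chains):
    """Classic Gelman-Rubin statistic for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)     # B: variance of chain means, scaled by n
    pooled = (n - 1) / n * within + between / n       # pooled estimate of posterior variance
    return np.sqrt(pooled / within)

rng = np.random.default_rng(1)
mixed = rng.normal(0.0, 1.0, size=(4, 1000))          # four chains sampling the same target
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])  # one chain sits in a different region

print(r_hat(mixed), r_hat(stuck))
```

Chains that agree give a value near 1; a chain stuck elsewhere inflates the between-chain term and pushes the statistic well above 1.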

Posterior Predictive Checks#

What is it?

The posterior predictive distribution simulates new data by integrating the likelihood over the posterior:

\[p(\tilde{y} \mid y) = \int p(\tilde{y}\mid\theta)\,p(\theta\mid y)\,d\theta\]

Comparing simulated datasets to the real one is the primary Bayesian model-checking tool: systematic mismatch signals a misspecified model.

with model:
    pm.sample_posterior_predictive(idata, extend_inferencedata=True)
az.plot_ppc(idata)
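For the conjugate Beta–Binomial example the integral can be approximated directly by simulation, which shows what the PyMC call does conceptually. This is a sketch that assumes the uniform prior and the 8-of-10 data used earlier on this page; the summary statistic is one arbitrary choice among many.

```python
import numpy as np

rng = np.random.default_rng(0)
k_obs, n = 8, 10                                   # observed data from the conjugacy example

# Step 1: draw theta from the posterior, Beta(1 + 8, 1 + 2) under a uniform prior
theta_draws = rng.beta(1 + k_obs, 1 + n - k_obs, size=5_000)
# Step 2: for each theta, simulate one replicated dataset of n trials
y_rep = rng.binomial(n, theta_draws)

# Simple check: how often do replicated counts reach the observed count?
frac_at_least_obs = np.mean(y_rep >= k_obs)
print(y_rep.mean(), frac_at_least_obs)
```

A fraction very close to 0 or 1 would indicate that the observed count sits in the tail of what the model predicts, a hint of misspecification.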

Part 3 — Hierarchies, Mixtures & Nonparametrics#

Hierarchical Models (Partial Pooling)#

What is it?

When data come in groups (schools, patients, sites), a hierarchical model gives each group its own parameter while tying those parameters to a shared population distribution:

\[y_{ij} \sim p(\cdot \mid \theta_j), \qquad \theta_j \sim \mathcal{N}(\mu, \tau^2)\]

This partial pooling shrinks noisy small-group estimates toward the overall mean — between “one estimate for everyone” (complete pooling) and “every group alone” (no pooling). The source’s hierarchical dependence posts develop exactly this structure.
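The shrinkage has a closed form when the within-group standard deviation and the population parameters are treated as known. The sketch below makes exactly that simplifying assumption (a full hierarchical model would estimate \(\mu\) and \(\tau\) as well), and all group summaries are hypothetical numbers.

```python
import numpy as np

# Hypothetical group summaries: sample means and sizes for four groups
group_means = np.array([12.0, 3.0, 8.0, 7.5])
group_sizes = np.array([2, 3, 50, 200])

sigma = 5.0          # within-group sd, assumed known
mu, tau = 7.0, 2.0   # population mean and sd, assumed known for this sketch

# Precision of each group's own estimate versus precision of the population prior
data_precision = group_sizes / sigma**2
prior_precision = 1.0 / tau**2

# Weight on the group's own mean; the remainder goes to the population mean
weight = data_precision / (data_precision + prior_precision)
partial_pooled = weight * group_means + (1 - weight) * mu

for n_j, raw, w, est in zip(group_sizes, group_means, weight, partial_pooled):
    print(f"n={n_j:>3}  raw={raw:5.2f}  weight={w:.2f}  partial-pooled={est:5.2f}")
```

Small groups get a low weight and are pulled strongly toward \(\mu\); large groups keep estimates close to their own sample mean.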

Mixture Models & Label Switching#

What is it?

A finite mixture models a population as a weighted blend of sub-populations:

\[p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_k \pi_k = 1\]

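Label switching is an identifiability issue: permuting the component labels leaves the likelihood unchanged, so with a symmetric prior the posterior has \(K!\) equivalent modes, and per-component summaries (for example the posterior mean of \(\mu_1\)) are meaningless unless an ordering constraint or post-hoc relabelling is applied. Choosing \(K\) is a model-selection problem: compare fits with AIC / BIC, or let \(K\) be unbounded (see Dirichlet processes).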
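The permutation invariance is easy to verify numerically. The sketch below uses synthetic one-dimensional data and hand-picked parameters (all assumptions); `mixture_loglik` is a hypothetical helper written for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic 1-D data from two well-separated groups (illustrative only)
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 0.5, 100)])

def mixture_loglik(x, weights, means, sds):
    """Log-likelihood of a 1-D Gaussian mixture."""
    # density of every point under every component, shape (n_points, K)
    dens = stats.norm.pdf(x[:, None], loc=means, scale=sds)
    return np.sum(np.log(dens @ weights))

weights = np.array([0.6, 0.4])
means = np.array([-2.0, 3.0])
sds = np.array([1.0, 0.5])

perm = np.array([1, 0])   # swap the labels of the two components
ll_original = mixture_loglik(x, weights, means, sds)
ll_swapped = mixture_loglik(x, weights[perm], means[perm], sds[perm])
print(ll_original, ll_swapped)
```

Both labelings give the same log-likelihood, so an MCMC chain can hop between them and averaging "component 1" across draws mixes two different sub-populations.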

scikit-learn + scikit-plots

from sklearn.mixture import GaussianMixture
# X: array-like of shape (n_samples, n_features), assumed to be defined
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# scikit-plots: compare K via AIC / AICc / BIC
import scikitplot as skplt
skplt.stats.plot_gaussian_mixture_models(X)

Dirichlet Processes (Nonparametric Bayes)#

What is it?

A Dirichlet process \(\text{DP}(\alpha, G_0)\) is a prior over distributions. It is the foundation of infinite mixture / density-estimation models where the number of clusters is not fixed in advance but grows with the data. The stick-breaking view builds the mixing weights as

\[\pi_k = v_k \prod_{l<k}(1 - v_l), \qquad v_k \sim \text{Beta}(1, \alpha)\]

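The concentration \(\alpha\) controls how readily new clusters appear: small \(\alpha\) puts most of the mass on a few components, while large \(\alpha\) spreads it over many, and the expected number of occupied clusters among \(n\) observations grows roughly as \(\alpha \log(1 + n/\alpha)\). This underpins the source’s Dirichlet process mixtures, Bayesian histograms, and density estimation posts.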
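The stick-breaking construction translates directly into a few lines of NumPy. This is a truncated sketch: `stick_breaking` is a hypothetical helper, and the truncation at 200 sticks and the two \(\alpha\) values are assumptions for illustration.

```python
import numpy as np

def stick_breaking(alpha, n_sticks, rng):
    """Draw the first `n_sticks` mixing weights of a truncated stick-breaking process."""
    v = rng.beta(1.0, alpha, size=n_sticks)                        # v_k ~ Beta(1, alpha)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])  # prod_{l<k} (1 - v_l)
    return v * remaining                                           # pi_k

rng = np.random.default_rng(0)
for alpha in (0.5, 5.0):
    pi = stick_breaking(alpha, n_sticks=200, rng=rng)
    n_big = int(np.sum(pi > 0.01))       # components holding more than 1% of the mass
    print(f"alpha={alpha}: sum={pi.sum():.4f}, weights above 1%: {n_big}")
```

The weights sum to slightly less than 1 because the stick is truncated; the leftover mass belongs to the components that were not drawn.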

When to use it — clustering / density estimation where you cannot commit to a fixed number of components a priori.
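In practice, one readily available approximation is scikit-learn's `BayesianGaussianMixture`, which fits a truncated Dirichlet-process mixture by variational inference rather than MCMC. The sketch below uses synthetic data, and the truncation level and concentration value are hand-picked assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic data: two well-separated 1-D clusters (illustrative only)
X = np.concatenate([rng.normal(-5.0, 1.0, 300), rng.normal(5.0, 1.0, 300)]).reshape(-1, 1)

dpgmm = BayesianGaussianMixture(
    n_components=10,                                   # truncation level, an upper bound on clusters
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,                    # plays the role of alpha (chosen by hand)
    max_iter=500,
    random_state=0,
).fit(X)

active = int(np.sum(dpgmm.weights_ > 0.05))            # components carrying non-trivial weight
print(np.round(dpgmm.weights_, 3), "active components:", active)
```

`n_components` is only an upper bound here: components the data do not support end up with weights near zero, so the effective number of clusters is inferred rather than fixed.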


Map to scikit-plots & the Bayesian Stack#

scikit-plots’ role here is diagnostic and model-selection visual support; the heavy lifting is done by the probabilistic-programming stack.

Gaussian Mixture Models (AIC / BIC)

Choose the number of mixture components by information criteria.

https://scikit-plots.github.io/dev/auto_examples/stats/plot_gaussian_mixture_models.html

Residuals distribution

Distributional / Q–Q checks on fitted models.

https://scikit-plots.github.io/dev/auto_examples/stats/plot_residuals_distribution_script.html

PyMC

Probabilistic programming for building and sampling models.

https://www.pymc.io/

ArviZ

Diagnostics, summaries and plots for Bayesian inference.

https://python.arviz.org/

Sources#

Verified during preparation of this page; resolvable at build date.

Source context (framing only, re-expressed in our own words)

Official documentation (API calls used above)

scikit-plots (this project)

Standard reference

Tags: purpose: reference, domain: statistics, level: beginner, level: intermediate, level: advanced