Modelling and forecasting data that arrives in order
0.5.dev0+git.20260626.e137512 - June 26, 2026 18:41 UTC
Time Series#
A time series is a sequence of observations indexed by time, where order and dependence matter. This hub walks the classical Box–Jenkins path that the source corpus follows: from stationarity and autocorrelation, through the AR / MA / ARMA / ARIMA / SARIMA model family, to estimation, diagnostics and forecasting.
Read it at any depth:
newcomers — what makes time-series data special, and stationarity;
practitioners — reading ACF/PACF and fitting ARIMA in statsmodels;
researchers — estimation (Yule–Walker, MLE), order selection and residual diagnostics.
Warning
Time series breaks the i.i.d. assumption behind ordinary cross-validation. Never shuffle: validate forward in time (walk-forward) to avoid leaking the future into the past.
Note
Open a dropdown for detail and follow See also links. Snippets use real statsmodels / pandas / scikit-learn calls. This page pairs with the Terminology reference (Signal Processing & Time Series) and the Bayesian Data Analysis hub.
Discovery at a Glance#
What is different about ordered data.
Trend, seasonality and noise — the components hiding in a sequence.
The property most classical models assume, and how to get it by differencing.
The two correlation fingerprints that reveal model order.
AR, MA and their combinations.
Regress on the past (AR) or on past shocks (MA) — the two atoms.
How a nonstationary model is built from a stationary ARMA via differencing.
Adding a seasonal layer for weekly / yearly periodicity.
Fit it, check it, project it forward.
Yule–Walker and Gaussian maximum likelihood for ARMA parameters.
AIC/BIC to pick (p, d, q); residual checks to trust the fit.
Best linear prediction, multi-step horizons, and exponential smoothing.
Part 1 — Time Series Foundations#
What is a Time Series?#
What is it?
An ordered sequence \(\{x_t\}_{t=1}^{T}\) of observations sampled over time. It is usually decomposed into:
Trend — long-run direction;
Seasonality — fixed-period cycles (daily, weekly, yearly);
Residual / noise — what is left after trend and seasonality.
pandas
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
s = pd.read_csv("series.csv", parse_dates=["date"], index_col="date")
result = seasonal_decompose(s["value"], model="additive", period=12)
result.plot()
Stationarity#
What is it?
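A series is (weakly) stationary when its mean and variance are constant over time and its autocovariance depends only on the lag, not on the time point. Most classical models assume this, so a trending or seasonal series is first differenced to remove the changing parts:
\[\nabla x_t = x_t - x_{t-1}, \qquad \nabla_m x_t = x_t - x_{t-m} \ \text{(seasonal difference, period } m\text{)}\]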
The ADF test checks for a unit root (nonstationarity):
from statsmodels.tsa.stattools import adfuller
stat, pvalue, *_ = adfuller(s["value"])
# small p-value → reject unit root → treat as stationary
Autocorrelation — ACF & PACF#
What is it?
ACF (autocorrelation function) — correlation between the series and its own lag \(k\):
\[\rho(k) = \frac{\gamma(k)}{\gamma(0)}, \qquad \gamma(k) = \operatorname{Cov}(x_t, x_{t+k})\]
PACF (partial autocorrelation) — the correlation at lag \(k\) after removing the effect of shorter lags.
Their decay/cut-off patterns are the classic fingerprint for choosing AR vs. MA order: a PACF that cuts off after lag p suggests AR(p); an ACF that cuts off after lag q suggests MA(q).
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_acf(s["value"], lags=40)
plot_pacf(s["value"], lags=40, method="ywm") # Yule–Walker
Part 2 — The Classical Model Family#
AR & MA Models#
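Autoregressive — AR(p) regresses the present on its own past:
\[x_t = c + \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + \varepsilon_t\]
Moving average — MA(q) regresses the present on past shocks:
\[x_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}\]
Here \(\varepsilon_t\) is white noise: zero-mean, uncorrelated shocks with constant variance.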
ARMA(p, q) combines both on a stationary series.
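One way to explore these processes is `ArmaProcess` from `statsmodels.tsa.arima_process`. The sketch below is illustrative only and the coefficients 0.6 and 0.5 are arbitrary. Note the sign convention: statsmodels takes lag-polynomial coefficients, so each list starts with 1 and the AR terms are negated.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

ar1 = ArmaProcess(ar=[1, -0.6], ma=[1])          # x_t = 0.6 x_{t-1} + e_t
ma1 = ArmaProcess(ar=[1], ma=[1, 0.5])           # x_t = e_t + 0.5 e_{t-1}
arma11 = ArmaProcess(ar=[1, -0.6], ma=[1, 0.5])  # both terms combined

print(arma11.isstationary, arma11.isinvertible)

# Theoretical autocorrelations at lags 0..3.
ar1_acf = ar1.acf(lags=4)   # theory: 1, 0.6, 0.36, 0.216 (geometric decay)
ma1_acf = ma1.acf(lags=4)   # theory: 1, 0.4, 0, 0 (cut-off after lag 1)
print(np.round(ar1_acf, 3), np.round(ma1_acf, 3))

np.random.seed(0)  # generate_sample draws from NumPy's global RNG by default
sample = arma11.generate_sample(nsample=500)
```

The MA(1) value 0.4 follows from \(\rho(1) = \theta / (1 + \theta^2)\) with \(\theta = 0.5\).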
ARIMA — Integrating Nonstationary Series#
What is it?
ARIMA(p, d, q) applies an ARMA(p, q) model to a series that has been differenced \(d\) times to make it stationary — exactly the “build a nonstationary model from a stationary one” idea in the source.
statsmodels
from statsmodels.tsa.arima.model import ARIMA
model = ARIMA(s["value"], order=(1, 1, 1)) # (p, d, q)
fit = model.fit()
print(fit.summary())
SARIMA — Adding Seasonality#
What is it?
SARIMA extends ARIMA with a seasonal \((P, D, Q)_m\) component (period \(m\)) to capture repeating cycles on top of the non-seasonal dynamics.
from statsmodels.tsa.statespace.sarimax import SARIMAX
model = SARIMAX(s["value"], order=(1, 1, 1),
                seasonal_order=(1, 1, 1, 12))  # m = 12: yearly cycle in monthly data
fit = model.fit(disp=False)
Part 3 — Estimate, Select & Forecast#
Estimation — Yule–Walker & Gaussian MLE#
What is it?
Yule–Walker — solves the linear system linking AR coefficients to the autocovariances; a fast, closed-form preliminary estimate for AR models.
Gaussian MLE — maximises the likelihood under a Gaussian innovation assumption; the standard estimator for full ARMA/ARIMA models (what statsmodels reports).
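To make the Yule–Walker idea concrete, here is a minimal NumPy sketch that builds the Toeplitz system from sample autocovariances and solves it. The helper `yule_walker_ar` and the simulated AR(2) data are hypothetical, written for this example and not taken from the source; statsmodels offers its own `yule_walker` function in `statsmodels.regression.linear_model`.

```python
import numpy as np

def yule_walker_ar(x, p):
    """Estimate AR(p) coefficients and innovation variance (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased sample autocovariances gamma(0), ..., gamma(p).
    gamma = np.array([x[: n - k] @ x[k:] / n for k in range(p + 1)])
    # Toeplitz system: Gamma_p @ phi = (gamma(1), ..., gamma(p)).
    Gamma = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(Gamma, gamma[1:])
    sigma2 = gamma[0] - phi @ gamma[1:]
    return phi, sigma2

# Hypothetical AR(2) with unit-variance shocks.
rng = np.random.default_rng(3)
true_phi = np.array([0.6, -0.3])
n = 5000
x = np.zeros(n)
eps = rng.normal(size=n)
for t in range(2, n):
    x[t] = true_phi[0] * x[t - 1] + true_phi[1] * x[t - 2] + eps[t]

phi_hat, sigma2_hat = yule_walker_ar(x, p=2)
print(np.round(phi_hat, 2), round(sigma2_hat, 2))
```

The estimates should sit close to (0.6, -0.3) and 1; for models with MA terms the system is no longer linear, which is where likelihood-based fitting takes over.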
Order Selection & Residual Diagnostics#
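Selecting (p, d, q) — fit candidates and compare information criteria; lower is better:
\[\mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad \mathrm{BIC} = -2\ln\hat{L} + k\ln n\]
where \(\hat{L}\) is the maximised likelihood, \(k\) the number of estimated parameters and \(n\) the number of observations. Compare candidates at a fixed \(d\), since differencing changes the data the likelihood is computed on.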
Diagnostics after fitting — the residuals should look like white noise: no autocorrelation (Ljung–Box test), roughly normal, constant variance.
import statsmodels.api as sm
fit.plot_diagnostics(figsize=(10, 8)) # built-in panel
sm.stats.acorr_ljungbox(fit.resid, lags=[10]) # whiteness test
Forecasting — Linear Prediction & Smoothing#
Best linear predictor — under stationarity, the minimum-MSE linear forecast is built from the autocovariance structure: the Durbin–Levinson recursion yields its coefficients, and the PACF at lag \(k\) is the last coefficient of the one-step predictor that uses \(k\) past values. Forecasts extend to multi-step horizons with widening uncertainty bands.
Exponential smoothing — a complementary family that forecasts by exponentially weighting recent observations (Holt–Winters adds trend and seasonality):
from statsmodels.tsa.holtwinters import ExponentialSmoothing
hw = ExponentialSmoothing(s["value"], trend="add",
                          seasonal="add", seasonal_periods=12).fit()
forecast = hw.forecast(12)
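The exponential weighting itself is a one-line recursion. As a bare-bones sketch of simple exponential smoothing (no trend or seasonality), the hypothetical helper below updates a level as `alpha * x_t + (1 - alpha) * previous level`; initialising the level at the first observation is one common convention, assumed here.

```python
import numpy as np

def simple_exp_smoothing(x, alpha):
    """Return the smoothed level after each observation (level_0 = x_0)."""
    x = np.asarray(x, dtype=float)
    level = np.empty_like(x)
    level[0] = x[0]
    for t in range(1, len(x)):
        level[t] = alpha * x[t] + (1 - alpha) * level[t - 1]
    return level

levels = simple_exp_smoothing([10, 12, 14], alpha=0.5)
# levels: 10.0, 11.0, 12.5
flat_forecast = levels[-1]  # the same value is forecast for every future step
print(levels, flat_forecast)
```

Unrolling the recursion shows that an observation k steps back receives weight `alpha * (1 - alpha)**k`, which is the exponential decay the method is named for.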
Validate forward in time:
from sklearn.model_selection import TimeSeriesSplit
for tr_idx, te_idx in TimeSeriesSplit(n_splits=5).split(s):
    ...  # train on the past, test on the next block
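A fuller sketch of the same loop, assuming a simulated series and a naive "repeat the last training value" forecaster as a stand-in for a real model. It checks that every training index precedes every test index and averages the per-fold errors.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(5)
y = np.cumsum(rng.normal(size=120))  # hypothetical series

folds = list(TimeSeriesSplit(n_splits=5).split(y))
errors = []
for train_idx, test_idx in folds:
    # No leakage: the training window ends before the test block starts.
    assert train_idx.max() < test_idx.min()
    # Naive baseline: carry the last training value forward.
    pred = np.full(len(test_idx), y[train_idx[-1]])
    errors.append(np.mean(np.abs(y[test_idx] - pred)))

mae = float(np.mean(errors))
print(len(folds), round(mae, 3))
```

The training window grows from fold to fold by default; `TimeSeriesSplit` also accepts `max_train_size` to cap it and `gap` to leave space between train and test blocks.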
Map to the Python Time-Series Stack#
ARIMA, SARIMAX, exponential smoothing, ACF/PACF, diagnostics.
Datetime indexing, resampling, rolling windows.
Leakage-free walk-forward cross-validation.
Distribution / Q–Q checks for model residuals.
Sources#
Verified during preparation of this page; resolvable at build date.
Source context (framing only, re-expressed in our own words)
Introduction to Time Series category (18 posts): https://insightful-data-lab.com/category/introduction-to-time-series/
Official documentation (API calls used above)
statsmodels — time-series analysis (tsa): https://www.statsmodels.org/stable/tsa.html
pandas — time-series / date functionality: https://pandas.pydata.org/docs/user_guide/timeseries.html
scikit-learn — TimeSeriesSplit: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html
scikit-plots (this project)
Example gallery: https://scikit-plots.github.io/dev/auto_examples/index.html
Terminology reference: terminology-index
Standard references
Hyndman & Athanasopoulos, Forecasting: Principles and Practice (3rd ed., free): https://otexts.com/fpp3/
Brockwell & Davis, Introduction to Time Series and Forecasting.