⏱️ Time Series
Modelling and forecasting data that arrives in order
0.5.dev0+git.20260626.e137512 - June 26, 2026 18:41 UTC

Time Series#

A time series is a sequence of observations indexed by time, where order and dependence matter. This hub walks the classical Box–Jenkins path that the source corpus follows: from stationarity and autocorrelation, through the AR / MA / ARMA / ARIMA / SARIMA model family, to estimation, diagnostics and forecasting.

Read it at any depth:

  • newcomers — what makes time-series data special, and stationarity;

  • practitioners — reading ACF/PACF and fitting ARIMA in statsmodels;

  • researchers — estimation (Yule–Walker, MLE), order selection and residual diagnostics.

Warning

Time-series data breaks the i.i.d. assumption behind ordinary cross-validation. Never shuffle: validate forward in time (walk-forward) so that no information from the future leaks into training.

Note

Open a dropdown for detail and follow See also links. Snippets use real statsmodels / pandas / scikit-learn calls. This page pairs with the Terminology reference (Signal Processing & Time Series) and the Bayesian Data Analysis hub.


Discovery at a Glance#

What is different about ordered data.

📈 What is a Time Series?

Trend, seasonality and noise: the components hiding in a sequence.

⚖️ Stationarity

The property most classical models assume, and how to get it by differencing.

🔗 ACF & PACF

The two correlation fingerprints that reveal model order.

AR, MA and their combinations.

🔁 AR & MA

Regress on the past (AR) or on past shocks (MA): the two atoms.

🧱 ARIMA

How a nonstationary model is built from a stationary ARMA via differencing.

🌗 SARIMA

Adding a seasonal layer for weekly / yearly periodicity.

Fit it, check it, project it forward.

🧮 Estimation

Yule–Walker and Gaussian maximum likelihood for ARMA parameters.

🎯 Order Selection & Diagnostics

AIC/BIC to pick (p, d, q); residual checks to trust the fit.

🔮 Forecasting

Best linear prediction, multi-step horizons, and exponential smoothing.

Part 1 — Time Series Foundations#

What is a Time Series?#

What is it?

An ordered sequence \(\{x_t\}_{t=1}^{T}\) of observations sampled over time. It is usually decomposed into:

  • Trend — long-run direction;

  • Seasonality — fixed-period cycles (daily, weekly, yearly);

  • Residual / noise — what is left after trend and seasonality.

pandas

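import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# "series.csv" needs a "date" column (parsed as the index) and a "value" column
s = pd.read_csv("series.csv", parse_dates=["date"], index_col="date")
# additive model: value = trend + seasonal + residual; period=12 suits monthly data
result = seasonal_decompose(s["value"], model="additive", period=12)
result.plot()   # observed, trend, seasonal and residual panels

Stationarity#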

What is it?

A series is (weakly) stationary when its mean and variance are constant over time and its autocovariance depends only on the lag, not on \(t\). Most classical models assume this, so a trending or seasonal series is usually differenced first to remove the changing parts:

\[\nabla x_t = x_t - x_{t-1}\]
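As a minimal sketch of the difference operator, assuming NumPy and a made-up five-point series (not data from this page): one difference turns a quadratic trend into a linear one, and a second difference leaves a constant.

```python
import numpy as np

# Hypothetical series with a quadratic trend: the gaps between points grow by 1 each step.
x = np.array([3.0, 5.0, 8.0, 12.0, 17.0])

d1 = np.diff(x)        # first difference:  x_t - x_{t-1}
d2 = np.diff(x, n=2)   # second difference: difference of the differences

print(d1)
print(d2)
```

Each difference shortens the series by one observation, which is why pandas `Series.diff()` keeps the length and puts NaN in the first position instead.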

The augmented Dickey–Fuller (ADF) test checks for a unit root; its null hypothesis is that the series has one (nonstationarity):

from statsmodels.tsa.stattools import adfuller

# returns (statistic, p-value, lags used, n obs, critical values, best IC)
stat, pvalue, *_ = adfuller(s["value"])
# small p-value → reject unit root → treat as stationary

Autocorrelation — ACF & PACF#

What is it?

  • ACF (autocorrelation function) — correlation between the series and its own lag \(k\):

\[\rho(k) = \frac{\gamma(k)}{\gamma(0)}, \qquad \gamma(k) = \operatorname{Cov}(x_t, x_{t-k})\]
  • PACF (partial autocorrelation) — the correlation at lag \(k\) after removing the effect of shorter lags.

Their decay/cut-off patterns are the classic fingerprint for choosing AR vs. MA order: a PACF that cuts off after lag p suggests AR(p); an ACF that cuts off after lag q suggests MA(q).
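The sample version of \(\rho(k)\) can be sketched in a few lines of NumPy. The helper name `sample_acf` and the sine-wave test series are illustrative assumptions, not part of any library.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation rho(0..max_lag), using the usual 1/n autocovariance."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()                      # centre the series
    gamma0 = np.dot(xc, xc) / n            # lag-0 autocovariance (variance)
    acf = [np.dot(xc[k:], xc[:n - k]) / n / gamma0 for k in range(max_lag + 1)]
    return np.array(acf)

# Hypothetical monthly-style series: a pure 12-step cycle, 20 full periods.
t = np.arange(240)
x = np.sin(2 * np.pi * t / 12)

rho = sample_acf(x, max_lag=12)
print(rho[[0, 6, 12]])   # lag 0, half a period, one full period
```

For this pure sine wave the ACF is strongly negative at half the period and strongly positive at the full period, which is the pattern to look for when a series is seasonal.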

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_acf(s["value"], lags=40)
plot_pacf(s["value"], lags=40, method="ywm")   # Yule–Walker without bias adjustment

Part 2 — The Classical Model Family#

AR & MA Models#

Autoregressive — AR(p) regresses the present on its own past:

\[x_t = c + \sum_{i=1}^{p} \phi_i\, x_{t-i} + \varepsilon_t\]

Moving average — MA(q) regresses the present on past shocks:

\[x_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j}\]

ARMA(p, q) combines both on a stationary series.
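A minimal simulation sketch, assuming NumPy, Gaussian shocks and arbitrary illustrative coefficients (\(\phi = 0.7\), \(\theta = 0.5\), \(c = \mu = 0\)): it generates AR(1) and MA(1) paths directly from the two equations above and compares the lag-1 sample autocorrelation with the theoretical values \(\phi\) and \(\theta / (1 + \theta^2)\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
eps = rng.standard_normal(n)           # white-noise shocks

phi, theta = 0.7, 0.5                  # illustrative coefficients, not estimates

# AR(1): x_t = phi * x_{t-1} + eps_t
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + eps[t]

# MA(1): x_t = eps_t + theta * eps_{t-1}
ma = eps.copy()
ma[1:] += theta * eps[:-1]

def lag1_corr(x):
    """Lag-1 sample autocorrelation."""
    xc = x - x.mean()
    return np.dot(xc[1:], xc[:-1]) / np.dot(xc, xc)

print(lag1_corr(ar), phi)                       # should be close to phi
print(lag1_corr(ma), theta / (1 + theta**2))    # should be close to 0.4
```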

ARIMA — Integrating Nonstationary Series#

What is it?

ARIMA(p, d, q) applies an ARMA(p, q) model to a series that has been differenced \(d\) times to make it stationary — exactly the “build a nonstationary model from a stationary one” idea in the source.
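The "integrated" part can be sketched with NumPy alone: differencing and cumulative summation are inverses, so forecasts made on the differenced scale are summed back onto the last observed level. The series and the two forecast increments are made-up numbers for illustration, not model output.

```python
import numpy as np

x = np.array([100.0, 102.0, 105.0, 104.0, 108.0])   # hypothetical nonstationary series

w = np.diff(x)                                       # d = 1: work on the differences
rebuilt = x[0] + np.concatenate([[0.0], np.cumsum(w)])
print(np.allclose(rebuilt, x))                       # integration undoes differencing

# Suppose a model fitted to w predicts these next two increments (hypothetical values).
w_forecast = np.array([1.5, 1.0])
x_forecast = x[-1] + np.cumsum(w_forecast)           # back on the original scale
print(x_forecast)
```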

statsmodels

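from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(s["value"], order=(1, 1, 1))   # (p, d, q): AR(1), one difference, MA(1)
fit = model.fit()                            # Gaussian maximum likelihood
print(fit.summary())                         # coefficients, standard errors, AIC/BIC

SARIMA — Adding Seasonality#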

What is it?

SARIMA extends ARIMA with a seasonal \((P, D, Q)_m\) component (period \(m\)) to capture repeating cycles on top of the non-seasonal dynamics.

from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(s["value"], order=(1, 1, 1),
                seasonal_order=(1, 1, 1, 12))   # monthly seasonality
fit = model.fit(disp=False)
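The seasonal order \(D\) refers to seasonal differencing, \(\nabla_m x_t = x_t - x_{t-m}\). As a small NumPy sketch on a made-up series (a linear trend plus a repeating 12-step pattern, both chosen only for illustration):

```python
import numpy as np

m = 12                                              # seasonal period
t = np.arange(5 * m)                                # five full seasons
pattern = np.array([3, 1, 0, -2, -4, -3, -1, 0, 2, 4, 3, -3], dtype=float)
x = 0.5 * t + pattern[t % m]                        # hypothetical trend + seasonality

seasonal_diff = x[m:] - x[:-m]                      # D = 1: x_t - x_{t-m}
both = np.diff(seasonal_diff)                       # then d = 1: x_t - x_{t-1}

print(seasonal_diff[:3])   # the repeating pattern cancels, leaving m * slope
print(both[:3])            # the remaining constant is removed by the first difference
```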

Part 3 — Estimate, Select & Forecast#

Estimation — Yule–Walker & Gaussian MLE#

What is it?

  • Yule–Walker — solves the linear system linking AR coefficients to the autocovariances; a fast, closed-form preliminary estimate for AR models.

  • Gaussian MLE — maximises the likelihood under a Gaussian innovation assumption; the standard estimator for full ARMA/ARIMA models (what statsmodels reports).
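A minimal Yule–Walker sketch for an AR(2), assuming NumPy, a zero-mean simulated series and illustrative true coefficients (0.5, -0.3). The helper `yule_walker_ar` is a hypothetical name, not the statsmodels function; it builds the Toeplitz system from sample autocovariances and solves it.

```python
import numpy as np

def yule_walker_ar(x, p):
    """Estimate AR(p) coefficients by solving R @ phi = r with sample autocovariances."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    gamma = np.array([np.dot(xc[k:], xc[:n - k]) / n for k in range(p + 1)])
    R = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, gamma[1:])
    sigma2 = gamma[0] - np.dot(phi, gamma[1:])       # innovation variance estimate
    return phi, sigma2

# Simulate x_t = 0.5 x_{t-1} - 0.3 x_{t-2} + eps_t (illustrative coefficients).
rng = np.random.default_rng(1)
n = 20_000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + eps[t]

phi_hat, sigma2_hat = yule_walker_ar(x, p=2)
print(phi_hat, sigma2_hat)
```

In practice statsmodels ships its own `yule_walker` in `statsmodels.regression.linear_model`; the sketch only shows where the linear system comes from.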

Order Selection & Residual Diagnostics#

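Selecting (p, d, q): fit candidate orders and compare information criteria (lower is better). Likelihoods are only comparable across models fitted to the same data, so compare AIC/BIC across (p, q) at a fixed d and choose d separately, for example with a unit-root test. Here \(k\) is the number of estimated parameters, \(n\) the number of observations and \(\hat{L}\) the maximised likelihood: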

\[\text{AIC} = 2k - 2\ln \hat{L}, \qquad \text{BIC} = k\ln n - 2\ln \hat{L}\]
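As a worked example of the two formulas, with made-up log-likelihoods for two hypothetical candidates fitted to the same \(n = 100\) observations:

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * loglik

n = 100
# (label, maximised log-likelihood, number of estimated parameters): illustrative values only
candidates = [("small", -120.0, 3), ("large", -118.5, 5)]

for name, loglik, k in candidates:
    print(name, round(aic(loglik, k), 2), round(bic(loglik, k, n), 2))
```

With these numbers the larger model raises the log-likelihood by 1.5 but pays for two extra parameters, so both criteria prefer the smaller one. BIC penalises each parameter by \(\ln 100 \approx 4.6\) rather than 2, so its preference is stronger.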

Diagnostics after fitting — the residuals should look like white noise: no autocorrelation (Ljung–Box test), roughly normal, constant variance.
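The Ljung–Box statistic behind that whiteness check is \(Q = n(n+2)\sum_{k=1}^{h} \hat\rho(k)^2/(n-k)\), compared against a \(\chi^2_h\) distribution. A NumPy sketch on simulated series follows; the helper name `ljung_box_q` and both test series are assumptions for illustration, and degrees-of-freedom corrections for fitted parameters are ignored.

```python
import numpy as np

def ljung_box_q(resid, h):
    """Ljung-Box Q statistic using the first h sample autocorrelations."""
    e = np.asarray(resid, dtype=float)
    n = e.size
    ec = e - e.mean()
    denom = np.dot(ec, ec)
    rho = np.array([np.dot(ec[k:], ec[:n - k]) / denom for k in range(1, h + 1)])
    return n * (n + 2) * np.sum(rho**2 / (n - np.arange(1, h + 1)))

rng = np.random.default_rng(2)
n = 2_000
white = rng.standard_normal(n)            # what good residuals should resemble

corr = np.zeros(n)                        # AR(1) "residuals" with leftover structure
for t in range(1, n):
    corr[t] = 0.7 * corr[t - 1] + white[t]

CHI2_10_95 = 18.307                       # 95% critical value of chi-square, 10 d.o.f.
print(ljung_box_q(white, 10), ljung_box_q(corr, 10), CHI2_10_95)
```

A value of Q far above the critical value, as for the autocorrelated series, signals structure the model has not captured.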

import statsmodels.api as sm

fit.plot_diagnostics(figsize=(10, 8))            # built-in panel
sm.stats.acorr_ljungbox(fit.resid, lags=[10])    # whiteness test

Forecasting — Linear Prediction & Smoothing#

Best linear predictor: under stationarity, the minimum-MSE linear forecast is built from the autocovariance structure. The PACF at lag \(k\) is the last coefficient of the one-step predictor that uses the \(k\) most recent observations. Forecasts extend to multi-step horizons with widening uncertainty bands.
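A sketch of how the one-step predictor coefficients follow from the autocorrelations, assuming NumPy and the theoretical ACF of an AR(1) with \(\phi = 0.6\) (an illustrative choice). Solving the Toeplitz system for \(k\) lags gives the best linear predictor; its last coefficient is the PACF at lag \(k\).

```python
import numpy as np

def predictor_coeffs(rho, k):
    """Coefficients a_1..a_k of the best linear one-step predictor from rho(0..k)."""
    R = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
    return np.linalg.solve(R, rho[1:k + 1])

phi = 0.6
rho = phi ** np.arange(6)                 # theoretical AR(1) autocorrelations rho(k) = phi^k

for k in (1, 2, 3):
    a = predictor_coeffs(rho, k)
    print(k, np.round(a, 6), "PACF at this lag:", round(float(a[-1]), 6))
```

For an AR(1) only the first coefficient is nonzero, so the PACF vanishes beyond lag 1; this is the cut-off pattern used to identify AR order.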

Exponential smoothing — a complementary family that forecasts by exponentially weighting recent observations (Holt–Winters adds trend and seasonality):

from statsmodels.tsa.holtwinters import ExponentialSmoothing

hw = ExponentialSmoothing(s["value"], trend="add",
                          seasonal="add", seasonal_periods=12).fit()
forecast = hw.forecast(12)
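The simplest member of the family, simple exponential smoothing, updates a level by \(\ell_t = \alpha x_t + (1-\alpha)\ell_{t-1}\) and forecasts the last level. A plain-Python sketch with made-up data, \(\alpha = 0.5\) and the level initialised at the first observation (all illustrative assumptions; statsmodels can estimate these values instead):

```python
def simple_exp_smoothing(x, alpha):
    """Return the smoothed levels; the flat forecast is the final level."""
    level = x[0]                       # assumption: start the level at the first point
    levels = [level]
    for obs in x[1:]:
        level = alpha * obs + (1 - alpha) * level
        levels.append(level)
    return levels

x = [10.0, 12.0, 11.0, 13.0]           # hypothetical observations
levels = simple_exp_smoothing(x, alpha=0.5)
forecast = levels[-1]                  # same value for every future horizon
print(levels, forecast)
```

Unrolling the recursion shows that an observation \(j\) steps back receives weight \(\alpha(1-\alpha)^j\), which is the exponential weighting the method is named for.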

Validate forward in time:

from sklearn.model_selection import TimeSeriesSplit
for tr_idx, te_idx in TimeSeriesSplit(n_splits=5).split(s):
    ...   # train on the past, test on the next block
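A self-contained sketch of the walk-forward idea, using an expanding training window and a naive last-value forecast. The splitter `expanding_splits` is a hypothetical helper written for illustration; it mimics the spirit of `TimeSeriesSplit` but is not its implementation, and the series is made up.

```python
def expanding_splits(n, n_splits, test_size):
    """Yield (train_idx, test_idx) with every training index before every test index."""
    first_test = n - n_splits * test_size
    for i in range(n_splits):
        start = first_test + i * test_size
        yield list(range(start)), list(range(start, start + test_size))

x = [5.0, 6.0, 7.0, 6.5, 8.0, 9.0, 8.5, 10.0, 11.0, 10.5]   # hypothetical series

errors = []
for train_idx, test_idx in expanding_splits(len(x), n_splits=3, test_size=2):
    last_seen = x[train_idx[-1]]                 # naive forecast: repeat the last training value
    fold_mae = sum(abs(x[j] - last_seen) for j in test_idx) / len(test_idx)
    errors.append(fold_mae)

print(errors)   # one mean absolute error per fold
```

A naive forecast like this is a useful baseline: a fitted ARIMA or Holt–Winters model should beat it on the same folds before it is trusted.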

Map to the Python Time-Series Stack#

statsmodels: tsa

ARIMA, SARIMAX, exponential smoothing, ACF/PACF, diagnostics.

https://www.statsmodels.org/stable/tsa.html

pandas: time series

Datetime indexing, resampling, rolling windows.

https://pandas.pydata.org/docs/user_guide/timeseries.html

scikit-learn: TimeSeriesSplit

Leakage-free walk-forward cross-validation.

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html

scikit-plots: residual diagnostics

Distribution / Q–Q checks for model residuals.

https://scikit-plots.github.io/dev/auto_examples/stats/plot_residuals_distribution_script.html

Sources#

Verified during preparation of this page; resolvable at build date.

Source context (framing only, re-expressed in our own words)

Official documentation (API calls used above)

scikit-plots (this project)

Standard references

  • Hyndman & Athanasopoulos, Forecasting: Principles and Practice (3rd ed., free): https://otexts.com/fpp3/

  • Brockwell & Davis, Introduction to Time Series and Forecasting.

Tags: purpose: reference, domain: statistics, level: beginner, level: intermediate, level: advanced