compute_stats

scikitplot.corpus.compute_stats(docs)

Compute aggregate statistics over a document collection.

Parameters:
docs : sequence of CorpusDocument

Documents to analyse. May be empty.

Returns:
CorpusStats

Frozen statistics object.

Parameters:

docs (Sequence[CorpusDocument])

Return type:

CorpusStats

Notes

This is a pure function: the same input always yields the same output, with no I/O and no mutation of the input. It is safe to call from multiple threads concurrently.

The median is computed without NumPy, using a sort-based O(n log n) algorithm, so that this module has zero optional dependencies.
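
A sort-based median of the kind described above might look like the following minimal sketch. The helper name `sorted_median` and the choice to return 0.0 for empty input are assumptions made for illustration; they are not taken from the scikitplot source, which may handle these details differently.

```python
from typing import Sequence


def sorted_median(values: Sequence[float]) -> float:
    """Return the median of ``values`` using only the standard library.

    Hypothetical helper for illustration; not part of scikitplot.
    """
    if not values:
        # Assumption: an empty input maps to 0.0 rather than raising.
        return 0.0
    # sorted() copies the data, so the caller's sequence is not mutated.
    # This sort is the O(n log n) step.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        # Odd length: the middle element is the median.
        return float(ordered[mid])
    # Even length: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2.0
```

Because the helper works on a sorted copy and touches no shared state, it stays consistent with the purity and thread-safety properties noted above.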

Examples

>>> stats = compute_stats(docs)
>>> print(stats.summary())
CorpusStats
  Documents  : 487
  Tokens     : 42,310 total (mean 86.9, ...)