NormalizationPipeline

class scikitplot.corpus.NormalizationPipeline(steps)

Apply a sequence of normalisers in order.

Each normaliser in the pipeline receives the output of the previous one. A normaliser that has no effect returns the document unchanged, so a replace() call is incurred only for documents that are actually modified.
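The chaining behaviour described above can be sketched in a few lines. This is an illustrative sketch only, not the library's implementation: Doc, LowercaseNormalizer, CollapseSpacesNormalizer, and SketchPipeline are hypothetical stand-ins, and it assumes that replace() refers to something like dataclasses.replace on an immutable document.

```python
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for the library's document type."""
    text: str


class LowercaseNormalizer:
    """Hypothetical normaliser: lowercases the text."""

    def normalize_doc(self, doc: Doc) -> Doc:
        new_text = doc.text.lower()
        # Return the same object when nothing changed, so no copy is made.
        return doc if new_text == doc.text else replace(doc, text=new_text)


class CollapseSpacesNormalizer:
    """Hypothetical normaliser: collapses runs of whitespace."""

    def normalize_doc(self, doc: Doc) -> Doc:
        new_text = " ".join(doc.text.split())
        return doc if new_text == doc.text else replace(doc, text=new_text)


class SketchPipeline:
    """Hypothetical pipeline: applies each step to the previous step's output."""

    def __init__(self, steps: Sequence) -> None:
        if not steps:
            # Mirrors the documented ValueError for an empty steps sequence.
            raise ValueError("steps must not be empty")
        self.steps = list(steps)

    def normalize_doc(self, doc: Doc) -> Doc:
        for step in self.steps:
            doc = step.normalize_doc(doc)
        return doc


pipeline = SketchPipeline([LowercaseNormalizer(), CollapseSpacesNormalizer()])
changed = pipeline.normalize_doc(Doc("Hello   World"))
already_clean = Doc("hello world")
untouched = pipeline.normalize_doc(already_clean)
```

In this sketch, a document that is already normalised passes through every step as the same object, which is one way to realise the "only modified documents incur a replace() call" behaviour.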

Parameters:

steps (sequence of NormalizerBase): Ordered list of normalisers to apply.

Raises:

ValueError: If steps is empty.

Examples

>>> pipeline = NormalizationPipeline(
...     [
...         UnicodeNormalizer(form="NFKC"),
...         HTMLStripNormalizer(),
...         WhitespaceNormalizer(),
...     ]
... )
>>> result = pipeline.normalize_doc(doc)