NormalizationPipeline

class scikitplot.corpus.NormalizationPipeline(steps)

Apply a sequence of normalisers in order.

Each normaliser in the pipeline receives the output of the previous one. A normaliser that has no effect returns the document unchanged, so replace() is called only for documents that are actually modified.
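The chaining behaviour described above can be sketched with stand-in classes. This is a minimal illustration, not the library's implementation: `Doc`, `StripStep`, `LowerStep`, and `run_steps` are hypothetical names, and it assumes the `replace()` mentioned above behaves like `dataclasses.replace` on an immutable document.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for CorpusDocument."""
    text: str


class StripStep:
    """Hypothetical normaliser: trims surrounding whitespace."""

    def normalize_doc(self, doc: Doc) -> Doc:
        new_text = doc.text.strip()
        if new_text == doc.text:
            return doc  # no effect: return the same object, no copy made
        return replace(doc, text=new_text)


class LowerStep:
    """Hypothetical normaliser: lower-cases the text."""

    def normalize_doc(self, doc: Doc) -> Doc:
        new_text = doc.text.lower()
        if new_text == doc.text:
            return doc
        return replace(doc, text=new_text)


def run_steps(steps, doc: Doc) -> Doc:
    """Feed each step the output of the previous one."""
    for step in steps:
        doc = step.normalize_doc(doc)
    return doc


steps = [StripStep(), LowerStep()]
clean = Doc("hello")
messy = Doc("  Hello ")

same = run_steps(steps, clean)      # already normalised: same object comes back
changed = run_steps(steps, messy)   # modified twice: a new object with text "hello"
```

Under these assumptions, an already-clean document passes through every step without being copied, which is the saving the description refers to.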

Parameters:
steps : sequence of NormalizerBase

Ordered list of normalisers to apply.

Raises:
ValueError

If steps is empty.
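The empty-steps check can be sketched as follows. `SketchPipeline` is a hypothetical stand-in, and the error message is an assumption; only the `ValueError` type comes from the documentation above.

```python
class SketchPipeline:
    """Hypothetical stand-in showing constructor validation only."""

    def __init__(self, steps):
        # An empty list or tuple is falsy, so this rejects any empty sequence.
        if not steps:
            raise ValueError("steps must contain at least one normaliser")
        self.steps = list(steps)


try:
    SketchPipeline([])
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None
```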

Examples

>>> pipeline = NormalizationPipeline(
...     [
...         UnicodeNormalizer(form="NFKC"),
...         HTMLStripNormalizer(),
...         WhitespaceNormalizer(),
...     ]
... )
>>> result = pipeline.normalize_doc(doc)
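Because each step sees the previous step's output, the order of `steps` can change the result. The sketch below uses plain functions on strings rather than the library's normalisers; `strip_tags`, `collapse_whitespace`, and `apply_in_order` are hypothetical helpers that only approximate what the names `HTMLStripNormalizer` and `WhitespaceNormalizer` suggest.

```python
import re


def strip_tags(text: str) -> str:
    """Hypothetical helper: remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def collapse_whitespace(text: str) -> str:
    """Hypothetical helper: squeeze whitespace runs to one space and trim."""
    return re.sub(r"\s+", " ", text).strip()


def apply_in_order(steps, text: str) -> str:
    """Apply each step to the output of the previous one."""
    for step in steps:
        text = step(text)
    return text


raw = "a <b></b> c"

# Removing tags first leaves a double space, which the second step collapses.
tags_then_space = apply_in_order([strip_tags, collapse_whitespace], raw)  # "a c"

# Collapsing first finds nothing to do; removing tags afterwards leaves "a  c".
space_then_tags = apply_in_order([collapse_whitespace, strip_tags], raw)
```

In this sketch, whitespace clean-up works best as the last step, since earlier steps may leave gaps where they removed content.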

normalize_batch(docs)

Apply the pipeline to a list of documents.

Parameters:
docs : list[CorpusDocument]
Returns:
list[CorpusDocument]
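One plausible reading of normalize_batch is that it applies the full pipeline to each document independently and returns the results in input order. The sketch below assumes exactly that; `Doc`, `collapse_whitespace`, and `normalize_batch_sketch` are hypothetical names, and the one-to-one, order-preserving behaviour is an assumption rather than something the documentation states.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for CorpusDocument."""
    text: str


def collapse_whitespace(doc: Doc) -> Doc:
    """Hypothetical step: squeeze whitespace runs to one space and trim."""
    new_text = re.sub(r"\s+", " ", doc.text).strip()
    return doc if new_text == doc.text else replace(doc, text=new_text)


def normalize_batch_sketch(steps, docs):
    """Run every step over every document, keeping input order."""
    out = []
    for doc in docs:
        for step in steps:
            doc = step(doc)
        out.append(doc)
    return out


docs = [Doc("a   b"), Doc("c"), Doc(" d ")]
result = normalize_batch_sketch([collapse_whitespace], docs)
texts = [d.text for d in result]  # ["a b", "c", "d"]
```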

normalize_doc(doc)

Apply all normalisers in order.

Parameters:
doc : CorpusDocument
Returns:
CorpusDocument

Document after all normalisation stages.
