NormalizerBase

class scikitplot.corpus.NormalizerBase

Abstract base class for all text normalisers.

A normaliser receives a CorpusDocument and returns a new instance with normalized_text updated. If the normaliser has nothing to do (empty text, text that is already clean, etc.), it returns the document unchanged.
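A minimal sketch of a concrete normaliser follows. It assumes that CorpusDocument behaves like a frozen dataclass with text and normalized_text fields and a replace() method that returns a modified copy, and that normalized_text is None until some normaliser sets it. The CorpusDocument and NormalizerBase classes below are hypothetical stand-ins so the example runs on its own, and WhitespaceNormalizer is an illustrative name, not part of scikitplot.

```python
import dataclasses
from abc import ABC, abstractmethod
from typing import Optional


# Hypothetical stand-in for scikitplot.corpus.CorpusDocument: only the two
# fields this page mentions, plus a replace() built on dataclasses.replace.
@dataclasses.dataclass(frozen=True)
class CorpusDocument:
    text: str
    normalized_text: Optional[str] = None

    def replace(self, **changes) -> "CorpusDocument":
        return dataclasses.replace(self, **changes)


# Hypothetical stand-in mirroring the documented abstract interface.
class NormalizerBase(ABC):
    @abstractmethod
    def normalize_doc(self, doc: CorpusDocument) -> CorpusDocument:
        ...


class WhitespaceNormalizer(NormalizerBase):
    """Collapse runs of whitespace and strip leading/trailing blanks."""

    def normalize_doc(self, doc: CorpusDocument) -> CorpusDocument:
        # Assumption: build on normalized_text if an earlier step set it.
        current = doc.normalized_text if doc.normalized_text is not None else doc.text
        cleaned = " ".join(current.split())
        if cleaned == current:
            # Nothing to do (empty or already clean): return the same instance.
            return doc
        # Only normalized_text changes; the raw text is left untouched.
        return doc.replace(normalized_text=cleaned)


doc = CorpusDocument(text="  Hello \n  world  ")
out = WhitespaceNormalizer().normalize_doc(doc)
print(repr(out.normalized_text))  # 'Hello world'
print(repr(out.text))             # '  Hello \n  world  '
```

In this sketch the no-op branch returns the very same object, so a caller can check `out is doc` to see whether anything changed.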

Normalisers are composable via NormalizationPipeline.
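Composition can be illustrated with a hypothetical SimplePipeline that applies each step in order, feeding one step's output document into the next. This is not the real NormalizationPipeline, whose constructor and behaviour are not documented on this page; the stand-in classes, the step names Lowercase and CollapseWhitespace, and the rule that each step reads normalized_text (falling back to text) are all assumptions.

```python
import dataclasses
from abc import ABC, abstractmethod
from typing import Optional, Sequence


@dataclasses.dataclass(frozen=True)
class CorpusDocument:  # hypothetical stand-in for the real class
    text: str
    normalized_text: Optional[str] = None

    def replace(self, **changes) -> "CorpusDocument":
        return dataclasses.replace(self, **changes)


class NormalizerBase(ABC):  # hypothetical stand-in for the real base class
    @abstractmethod
    def normalize_doc(self, doc: CorpusDocument) -> CorpusDocument: ...


def _current(doc: CorpusDocument) -> str:
    # Assumption: each step builds on the previous step's normalized_text.
    return doc.normalized_text if doc.normalized_text is not None else doc.text


class Lowercase(NormalizerBase):
    def normalize_doc(self, doc):
        return doc.replace(normalized_text=_current(doc).lower())


class CollapseWhitespace(NormalizerBase):
    def normalize_doc(self, doc):
        return doc.replace(normalized_text=" ".join(_current(doc).split()))


class SimplePipeline(NormalizerBase):
    """Hypothetical composite (not NormalizationPipeline): runs steps in order."""

    def __init__(self, steps: Sequence[NormalizerBase]):
        self.steps = list(steps)

    def normalize_doc(self, doc):
        for step in self.steps:
            doc = step.normalize_doc(doc)
        return doc


pipe = SimplePipeline([Lowercase(), CollapseWhitespace()])
result = pipe.normalize_doc(CorpusDocument(text="  Hello   WORLD "))
print(result.normalized_text)  # hello world
print(repr(result.text))       # '  Hello   WORLD '
```

Making the composite itself a NormalizerBase subclass lets one pipeline be nested inside another; whether NormalizationPipeline does the same is not stated here.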

Notes

Subclasses must implement normalize_doc. They must never call CorpusDocument.replace with text=; only normalized_text= may be modified.
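The contract in this note can be checked mechanically. The sketch below again uses hypothetical stand-ins for CorpusDocument and NormalizerBase (the latter assumed to be built on abc.ABC, as the abstractmethod marker suggests). It shows a compliant normaliser, one that breaks the rule by passing text= to replace, and a hypothetical respects_contract() helper of the kind that could sit in a test suite.

```python
import dataclasses
from abc import ABC, abstractmethod
from typing import Optional


@dataclasses.dataclass(frozen=True)
class CorpusDocument:  # hypothetical stand-in for the real class
    text: str
    normalized_text: Optional[str] = None

    def replace(self, **changes) -> "CorpusDocument":
        return dataclasses.replace(self, **changes)


class NormalizerBase(ABC):  # hypothetical stand-in, assumed to use abc.ABC
    @abstractmethod
    def normalize_doc(self, doc: CorpusDocument) -> CorpusDocument: ...


class GoodUpper(NormalizerBase):
    def normalize_doc(self, doc):
        return doc.replace(normalized_text=doc.text.upper())  # allowed


class BadUpper(NormalizerBase):
    def normalize_doc(self, doc):
        return doc.replace(text=doc.text.upper())  # violates the contract


def respects_contract(normalizer: NormalizerBase, doc: CorpusDocument) -> bool:
    """Hypothetical test helper: True if the raw text survived normalisation."""
    out = normalizer.normalize_doc(doc)
    return isinstance(out, CorpusDocument) and out.text == doc.text


sample = CorpusDocument(text="Mixed Case")
print(respects_contract(GoodUpper(), sample))  # True
print(respects_contract(BadUpper(), sample))   # False

# With abc.ABC, the base class itself cannot be instantiated.
try:
    NormalizerBase()
except TypeError:
    print("abstract: subclasses must implement normalize_doc")
```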

abstractmethod normalize_doc(doc)

Apply normalisation to doc.

Parameters:
doc : CorpusDocument

Input document. Must be an instance that has already been validated.

Returns:
CorpusDocument

A new instance with normalized_text updated, or doc itself when there is nothing to normalise. text is never modified.
