NormalizerBase
- class scikitplot.corpus.NormalizerBase
Abstract base class for all text normalisers.
A normaliser receives a CorpusDocument and returns a new instance with normalized_text updated. If the normaliser has nothing to do (empty text, already clean, etc.), it returns the document unchanged.
Normalisers are composable via NormalizationPipeline.
Notes
Subclasses must implement normalize_doc. They must never call CorpusDocument.replace with text=; only normalized_text= may be modified.
- abstractmethod normalize_doc(doc)
Apply normalisation to doc.
- Parameters:
- doc : CorpusDocument
Input document. Must be an already validated instance.
- Returns:
- CorpusDocument
New instance with normalized_text updated. text is never modified.
- Parameters:
doc (CorpusDocument)
- Return type:
CorpusDocument
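The contract above can be illustrated with a minimal, self-contained sketch. It does not import scikitplot: the CorpusDocument and NormalizerBase classes below are simplified stand-ins written for this example (the real classes are assumed to carry more fields and validation), and WhitespaceNormalizer is a hypothetical subclass. The sketch only shows the documented behaviour: write to normalized_text, leave text alone, and return the same document when there is nothing to do.

```python
import dataclasses
from abc import ABC, abstractmethod
from typing import Optional


@dataclasses.dataclass(frozen=True)
class CorpusDocument:
    """Simplified stand-in for illustration; NOT the scikitplot class."""

    text: str
    normalized_text: Optional[str] = None

    def replace(self, **changes) -> "CorpusDocument":
        # Return a new instance with the given fields changed.
        return dataclasses.replace(self, **changes)


class NormalizerBase(ABC):
    """Stand-in mirroring the documented abstract interface."""

    @abstractmethod
    def normalize_doc(self, doc: CorpusDocument) -> CorpusDocument:
        """Return a document with normalized_text updated."""


class WhitespaceNormalizer(NormalizerBase):
    """Hypothetical subclass that collapses runs of whitespace."""

    def normalize_doc(self, doc: CorpusDocument) -> CorpusDocument:
        # Work on the output of earlier normalisers if there is any,
        # otherwise start from the raw text.
        source = doc.normalized_text if doc.normalized_text is not None else doc.text
        cleaned = " ".join(source.split())
        if cleaned == source:
            # Nothing to do (empty or already clean): return unchanged.
            return doc
        # Only normalized_text= is passed; text= is never touched.
        return doc.replace(normalized_text=cleaned)


doc = CorpusDocument(text="a  b\n c")
out = WhitespaceNormalizer().normalize_doc(doc)
print(repr(out.normalized_text))  # 'a b c'
print(repr(out.text))             # 'a  b\n c'
```

Reading from normalized_text when it is already set is a design choice of this sketch, made so that several normalisers can be applied one after another; whether NormalizationPipeline relies on the same convention is not stated here.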