NormalizationPipeline
- class scikitplot.corpus.NormalizationPipeline(steps)
Apply a sequence of normalisers in order.
Each normaliser in the pipeline receives the output of the previous one. Normalisers that have no effect return the document unchanged, so only modified documents incur a replace() call.
- Parameters:
- steps : sequence of NormalizerBase
Ordered list of normalisers to apply.
- Raises:
- ValueError
If steps is empty.
Examples
>>> pipeline = NormalizationPipeline(
...     [
...         UnicodeNormalizer(form="NFKC"),
...         HTMLStripNormalizer(),
...         WhitespaceNormalizer(),
...     ]
... )
>>> result = pipeline.normalize_doc(doc)
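The chaining behaviour described above can be sketched without the library. This is a minimal illustration, not the scikitplot implementation: `Doc`, `NFKC`, `CollapseWhitespace` and `Pipeline` are hypothetical stand-ins, and using `dataclasses.replace` for the copy step is an assumption about what the `replace()` call refers to.

```python
import unicodedata
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for CorpusDocument (assumed immutable)."""
    text: str


class NFKC:
    """Hypothetical normaliser: Unicode NFKC folding."""

    def normalize_doc(self, doc: Doc) -> Doc:
        new_text = unicodedata.normalize("NFKC", doc.text)
        # Return the same object when nothing changed, so no copy is made.
        return doc if new_text == doc.text else replace(doc, text=new_text)


class CollapseWhitespace:
    """Hypothetical normaliser: collapse runs of whitespace to one space."""

    def normalize_doc(self, doc: Doc) -> Doc:
        new_text = " ".join(doc.text.split())
        return doc if new_text == doc.text else replace(doc, text=new_text)


class Pipeline:
    """Sketch of a sequential pipeline; rejects an empty step list."""

    def __init__(self, steps):
        if not steps:
            raise ValueError("steps must not be empty")
        self.steps = list(steps)

    def normalize_doc(self, doc: Doc) -> Doc:
        # Each step receives the output of the previous one.
        for step in self.steps:
            doc = step.normalize_doc(doc)
        return doc


pipeline = Pipeline([NFKC(), CollapseWhitespace()])

clean = Doc("already clean")
same = pipeline.normalize_doc(clean)          # no step changes anything
changed = pipeline.normalize_doc(Doc("\ufb01ne   text"))  # ligature + extra spaces
```

In this sketch an already-clean document comes back as the very same object, while a document that needs changes is copied only by the steps that actually modify it.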
- normalize_batch(docs)
Apply the pipeline to a list of documents.
- Parameters:
- docs : list[CorpusDocument]
- Returns:
- list[CorpusDocument]
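One plausible reading of the batch method is a per-document map over the input list. The sketch below assumes that reading. The helper `normalize_batch` and the `collapse_ws` function are hypothetical, plain strings stand in for documents, and the order-preserving, non-mutating behaviour shown here is an assumption rather than something the reference states.

```python
from typing import Callable, List


def normalize_batch(normalize_doc: Callable[[str], str], docs: List[str]) -> List[str]:
    # One output per input, in the same order; the input list is not mutated.
    return [normalize_doc(d) for d in docs]


def collapse_ws(text: str) -> str:
    # Stand-in for a full pipeline's normalize_doc.
    return " ".join(text.split())


docs = ["a  b", "c", " d "]
out = normalize_batch(collapse_ws, docs)
```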
- normalize_doc(doc)
Apply all normalisers in order.
- Parameters:
- doc : CorpusDocument
- Returns:
- CorpusDocument
Document after all normalisation stages.
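Because each stage consumes the previous stage's output, the order of the steps can change the result. The sketch below illustrates this with plain string functions rather than the library's normaliser classes; `strip_tags`, `collapse_ws` and `run` are hypothetical helpers, and the regex is a deliberately naive tag stripper.

```python
import re
from functools import reduce


def strip_tags(text: str) -> str:
    # Naive tag removal, for illustration only.
    return re.sub(r"<[^>]+>", "", text)


def collapse_ws(text: str) -> str:
    return " ".join(text.split())


def run(steps, text: str) -> str:
    # Left fold: feed each step the output of the one before it.
    return reduce(lambda acc, step: step(acc), steps, text)


raw = "a <b> b </b> c"
tags_first = run([strip_tags, collapse_ws], raw)  # "a b c"
ws_first = run([collapse_ws, strip_tags], raw)    # "a  b  c" (gaps left by removed tags)
```

Stripping markup before collapsing whitespace removes the gaps that the deleted tags leave behind, which is consistent with the step order used in the class example.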
Gallery examples
corpus WHO European Region local or url per file with examples