WhitespaceNormalizer
- class scikitplot.corpus.WhitespaceNormalizer(collapse_newlines=False, strip=True)
Collapse runs of whitespace and optionally strip leading/trailing space.
- Parameters:
- collapse_newlines : bool, optional
When True, newline characters are treated as whitespace and collapsed with other spaces. When False, newlines are preserved (only intra-line spaces are collapsed). Default: False.
- strip : bool, optional
Strip leading and trailing whitespace from the result. Default: True.
Examples
>>> norm = WhitespaceNormalizer()
>>> doc = CorpusDocument.create("f.txt", 0, "Hello world.")
>>> norm.normalize_doc(doc).normalized_text
'Hello world.'
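The effect of the two parameters can be sketched with a standalone function built on the standard `re` module. This is not scikitplot's implementation: `normalize_whitespace` is a hypothetical stand-in written only to illustrate the documented behaviour, and it assumes that a run of whitespace is replaced by a single space and that "intra-line" whitespace means any whitespace character other than `\n`.

```python
import re

# Hypothetical stand-in, NOT the library's code: it only mirrors the
# behaviour described for collapse_newlines and strip.
_INTRA_LINE_WS = re.compile(r"[^\S\n]+")  # whitespace runs, excluding "\n"
_ANY_WS = re.compile(r"\s+")              # whitespace runs, including "\n"


def normalize_whitespace(text, collapse_newlines=False, strip=True):
    """Collapse whitespace runs to one space; optionally strip the ends."""
    pattern = _ANY_WS if collapse_newlines else _INTRA_LINE_WS
    result = pattern.sub(" ", text)
    return result.strip() if strip else result


sample = "  Hello   world.\nSecond\t\tline.  "

# Defaults: newlines survive, other runs collapse, ends are stripped.
default_result = normalize_whitespace(sample)
# collapse_newlines=True: the newline is folded into a single space.
collapsed_result = normalize_whitespace(sample, collapse_newlines=True)
# strip=False: the collapsed leading and trailing spaces are kept.
unstripped_result = normalize_whitespace(sample, strip=False)

print(repr(default_result))     # 'Hello world.\nSecond line.'
print(repr(collapsed_result))   # 'Hello world. Second line.'
print(repr(unstripped_result))  # ' Hello world.\nSecond line. '
```

With the defaults the line structure of the text is preserved, which is usually what you want before sentence or paragraph splitting; `collapse_newlines=True` suits cases where the text should become a single line.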
- normalize_doc(doc)
Collapse whitespace in the document text.
- Parameters:
- doc : CorpusDocument
- Returns:
- CorpusDocument
Gallery examples
corpus WHO European Region local or url per file with examples