WhitespaceNormalizer

class scikitplot.corpus.WhitespaceNormalizer(collapse_newlines=False, strip=True)

Collapse runs of whitespace and optionally strip leading/trailing space.

Parameters:
collapse_newlines : bool, optional

When True, newline characters are treated as whitespace and collapsed with other spaces. When False, newlines are preserved (only intra-line spaces are collapsed). Default: False.

strip : bool, optional

Strip leading and trailing whitespace from the result. Default: True.
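
The interaction of the two flags can be sketched with the standard library's re module. This is only an illustration of the behavior described above, not scikitplot's implementation: the function name collapse_whitespace is hypothetical, and whether the real class leaves consecutive blank lines untouched when collapse_newlines=False is an assumption.

```python
import re


def collapse_whitespace(text, collapse_newlines=False, strip=True):
    """Hypothetical stand-in for the documented behavior (not library code)."""
    if collapse_newlines:
        # Every run of whitespace, newlines included, becomes a single space.
        result = re.sub(r"\s+", " ", text)
    else:
        # Only runs of non-newline whitespace are collapsed; newlines are kept.
        result = re.sub(r"[^\S\n]+", " ", text)
    # Optionally trim whitespace at both ends of the result.
    return result.strip() if strip else result


print(repr(collapse_whitespace("a  b\n\nc   d")))
# 'a b\n\nc d'
print(repr(collapse_whitespace("a  b\n\nc   d", collapse_newlines=True)))
# 'a b c d'
```

With strip=False, leading and trailing runs are still collapsed to a single space in this sketch but are not removed.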

Examples

>>> norm = WhitespaceNormalizer()
>>> doc = CorpusDocument.create("f.txt", 0, "Hello   world.")
>>> norm.normalize_doc(doc).normalized_text
'Hello world.'

normalize_doc(doc)

Collapse whitespace in the document text.

Parameters:
doc : CorpusDocument
Returns:
CorpusDocument