CustomNormalizer
class scikitplot.corpus.CustomNormalizer(fn, *, name=None, text_mode=False)
Wrap any callable as a NormalizerBase.

- Parameters:
  - fn : callable
    Normalizer callable. One of two signatures is accepted:
    - (doc: CorpusDocument) -> CorpusDocument
      Full document transform: the callable controls exactly which fields change via doc.replace().
    - (text: str) -> str
      Pure text transform: the module wraps the result in doc.replace(normalized_text=result) automatically. Detected by inspecting whether the return value is a str.
  - name : str, optional
    Human-readable label used in __repr__.
  - text_mode : bool, optional
    When True, treat fn as a str → str transform and wrap automatically. When False (default), treat fn as a full CorpusDocument → CorpusDocument transform. Pass True for simple string-level operations (regex substitution, lowercasing, etc.) without writing the doc.replace() boilerplate.
- Raises:
  - TypeError
    If fn is not callable.
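The two accepted signatures and the text_mode flag can be pictured with a small stand-in. This is a minimal sketch under stated assumptions, not scikitplot's implementation: Doc and SketchNormalizer are hypothetical names, dataclasses.replace stands in for doc.replace(), and both the normalize method name and the choice of input text in text mode are guesses made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for CorpusDocument (only two fields)."""

    text: str
    normalized_text: Optional[str] = None


class SketchNormalizer:
    """Hypothetical stand-in that mirrors the documented parameters."""

    def __init__(self, fn: Callable, *, name: Optional[str] = None,
                 text_mode: bool = False) -> None:
        if not callable(fn):
            # Mirrors the documented Raises section.
            raise TypeError(f"fn must be callable, got {type(fn).__name__}")
        self.fn = fn
        self.name = name or getattr(fn, "__name__", "custom")
        self.text_mode = text_mode

    def normalize(self, doc: Doc) -> Doc:
        if self.text_mode:
            # str -> str: apply fn to the text, then wrap the result.
            # Assumption: already-normalized text is preferred as input.
            source = doc.normalized_text or doc.text
            return replace(doc, normalized_text=self.fn(source))
        # Doc -> Doc: fn decides which fields change.
        return self.fn(doc)

    def __repr__(self) -> str:
        return f"SketchNormalizer(name={self.name!r}, text_mode={self.text_mode})"


lowered = SketchNormalizer(str.lower, text_mode=True).normalize(Doc(text="Hello World"))
```

In this sketch the original text field is left untouched and only normalized_text is written, which is the behaviour the parameter description suggests for text mode.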
See also
scikitplot.corpus._normalizers.NormalizationPipeline
    Chain normalizers.
scikitplot.corpus._normalizers.NormalizerBase
    Abstract base class.
Notes
User note: Combine with NormalizationPipeline to slot a custom step anywhere in the normalisation sequence.

Examples
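Strip citation markers such as [1] and [2] from academic text:

    import re

    from scikitplot.corpus import CustomNormalizer

    def strip_citations(text: str) -> str:
        # Remove bracketed numeric citation markers such as [1] or [12].
        return re.sub(r"\[\d+\]", "", text)

    norm = CustomNormalizer(strip_citations, text_mode=True)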
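Full document transform (language detection side-channel):

    # The source of `detect` is not stated in the original example; it is
    # assumed here to be a language-detection function such as
    # langdetect.detect.
    from langdetect import detect

    def tag_language(doc):
        # Prefer already-normalized text and fall back to the raw text.
        lang = detect(doc.normalized_text or doc.text)
        return doc.replace(language=lang)

    norm = CustomNormalizer(tag_language)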
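The note about slotting a custom step anywhere in the normalisation sequence can be illustrated without the library. The sketch below composes plain str → str callables in order. The chain helper and collapse_whitespace are hypothetical names introduced for illustration, and nothing here reflects the actual NormalizationPipeline constructor or its arguments.

```python
import re
from functools import reduce
from typing import Callable, Sequence

TextStep = Callable[[str], str]


def chain(steps: Sequence[TextStep]) -> TextStep:
    """Compose text steps left to right (hypothetical helper)."""

    def run(text: str) -> str:
        # Feed the output of each step into the next one.
        return reduce(lambda acc, step: step(acc), steps, text)

    return run


def strip_citations(text: str) -> str:
    # Remove bracketed numeric citation markers such as [3] or [12].
    return re.sub(r"\[\d+\]", "", text)


def collapse_whitespace(text: str) -> str:
    # Squeeze runs of whitespace left behind by earlier steps.
    return re.sub(r"\s+", " ", text).strip()


# The custom step sits first here, but it could be placed at any position.
normalize = chain([strip_citations, str.lower, collapse_whitespace])
result = normalize("Prior Work [3]   shows  GAINS [12].")
```

Order matters in a chain like this: running collapse_whitespace before strip_citations would leave the gaps created by the removed markers in place.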