CustomNLPEnricher

class scikitplot.corpus.CustomNLPEnricher(config=None, *, custom_config=None)

NLPEnricher extended with fully-replaceable NLP backends.

Wraps a standard NLPEnricher and intercepts each processing stage when the corresponding custom callable is set in custom_config. Built-in backends are used as fallback for any stage without a custom override.

Parameters:

config (EnricherConfig or None, optional): Standard enrichment configuration. None uses defaults.
custom_config (CustomEnricherConfig or None, optional): Custom backend callables. None disables all custom overrides (equivalent to using plain NLPEnricher).

Parameters:

config (Any | None)
custom_config (CustomEnricherConfig | None)

See also

scikitplot.corpus._enrichers.NLPEnricher: Built-in enricher.
CustomEnricherConfig: Custom backend callables configuration.

Notes

User note: Drop-in replacement for NLPEnricher. The same enrich_documents() interface is preserved.

Developer note: Delegation order per stage:

If custom_config.<stage> is set → call the custom callable.
Otherwise → delegate to the wrapped NLPEnricher method.

This keeps the built-in lazy-loading cache (spaCy, NLTK, stemmer) intact for any stage that does not have a custom override.
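
The delegation order above can be sketched in a few lines. This is a minimal illustration, not the library's actual implementation: SketchCustomConfig, SketchBuiltinEnricher, and SketchCustomEnricher are hypothetical stand-ins, and only a single tokenize stage is modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SketchCustomConfig:
    """Hypothetical stand-in for CustomEnricherConfig (one field only)."""

    custom_tokenizer: Optional[Callable[[str], List[str]]] = None


class SketchBuiltinEnricher:
    """Hypothetical stand-in for the wrapped NLPEnricher."""

    def tokenize(self, text: str) -> List[str]:
        # Placeholder built-in backend: plain whitespace splitting.
        return text.split()


class SketchCustomEnricher:
    """Illustrates the delegation order only; not the library's code."""

    def __init__(self, custom_config: Optional[SketchCustomConfig] = None):
        self._builtin = SketchBuiltinEnricher()
        self._custom = custom_config or SketchCustomConfig()

    def tokenize(self, text: str) -> List[str]:
        custom = self._custom.custom_tokenizer
        if custom is not None:
            # A custom callable is set for this stage: call it.
            return custom(text)
        # No override: delegate to the wrapped built-in method.
        return self._builtin.tokenize(text)


# Without an override, the built-in backend handles the stage.
default_tokens = SketchCustomEnricher().tokenize("a-b c")

# With an override, the custom callable handles it instead.
custom_tokens = SketchCustomEnricher(
    SketchCustomConfig(custom_tokenizer=lambda t: t.replace("-", " ").split())
).tokenize("a-b c")
```

Because the wrapper holds the built-in enricher rather than replacing it, stages without an override still go through the same object, which is consistent with the caching behaviour described in this note.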

Examples

Integrate a custom tokenizer (e.g. SentencePiece):

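import sentencepiece as spm

# Assumes CustomEnricherConfig is exported from the same package as CustomNLPEnricher.
from scikitplot.corpus import CustomEnricherConfig, CustomNLPEnricher

# Load a trained SentencePiece model from disk.
sp = spm.SentencePieceProcessor()
sp.load("bpe.model")

def sp_tokenize(text):
    # Return subword pieces as strings rather than integer ids.
    return sp.encode(text, out_type=str)

# Override only the tokenizer; every other stage uses the built-in backend.
ccfg = CustomEnricherConfig(custom_tokenizer=sp_tokenize)
enricher = CustomNLPEnricher(custom_config=ccfg)
# corpus_docs: an existing sequence of CorpusDocument instances.
docs = enricher.enrich_documents(corpus_docs)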
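
A custom tokenizer does not have to come from an external library. Judging from the example above, the callable takes a string and returns a list of string tokens; under that assumption, a dependency-free alternative might look like the sketch below. The name regex_tokenize is hypothetical and not part of the library.

```python
import re
from typing import List

# Words (runs of letters, digits, underscore) or single punctuation marks.
_TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def regex_tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word and punctuation tokens."""
    return _TOKEN_PATTERN.findall(text.lower())


tokens = regex_tokenize("Hello, world!")
```

Such a function would be passed in the same way as sp_tokenize, that is, as the custom_tokenizer argument of CustomEnricherConfig.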

enrich_documents(documents, *, overwrite=False)

Enrich a batch of CorpusDocument instances using custom or built-in backends per stage.

Parameters:

documents (Sequence[CorpusDocument]): The documents to enrich.
overwrite (bool, optional): Whether to overwrite existing values. Defaults to False.

Returns:

list[CorpusDocument]

Parameters:

documents (Sequence[Any])
overwrite (bool)

Return type:

list[Any]