CustomNLPEnricher

class scikitplot.corpus.CustomNLPEnricher(config=None, *, custom_config=None)

NLPEnricher extended with fully-replaceable NLP backends.

Wraps a standard NLPEnricher and intercepts each processing stage for which a custom callable is set in custom_config. Any stage without a custom override falls back to the built-in backend.

Parameters:
config : EnricherConfig or None, optional

Standard enrichment configuration. None uses defaults.

custom_config : CustomEnricherConfig or None, optional

Custom backend callables. None disables all custom overrides (equivalent to using plain NLPEnricher).

See also

scikitplot.corpus._enrichers.NLPEnricher

Built-in enricher.

CustomEnricherConfig

Custom backend callables configuration.

Notes

User note: Drop-in replacement for NLPEnricher. The same enrich_documents() interface is preserved.

Developer note: Delegation order per stage:

  1. If custom_config.<stage> is set → call the custom callable.

  2. Otherwise → delegate to the wrapped NLPEnricher method.

This keeps the built-in lazy-loading cache (spaCy, NLTK, stemmer) intact for any stage that does not have a custom override.
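The delegation order above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: the Sketch* classes are stand-ins written for this illustration, not the library's actual classes, and only a single tokenizer stage is modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SketchCustomConfig:
    """Hypothetical stand-in for CustomEnricherConfig (one stage only)."""

    custom_tokenizer: Optional[Callable[[str], List[str]]] = None


class SketchBuiltinEnricher:
    """Hypothetical stand-in for the wrapped NLPEnricher."""

    def tokenize(self, text: str) -> List[str]:
        # Placeholder for a built-in backend; here just whitespace splitting.
        return text.split()


class SketchCustomEnricher:
    """Illustrates per-stage delegation; not the library's implementation."""

    def __init__(self, custom_config: Optional[SketchCustomConfig] = None):
        self._builtin = SketchBuiltinEnricher()
        # No custom config means no overrides at all.
        self._custom = custom_config or SketchCustomConfig()

    def tokenize(self, text: str) -> List[str]:
        # 1. A custom callable, when set, takes precedence.
        if self._custom.custom_tokenizer is not None:
            return self._custom.custom_tokenizer(text)
        # 2. Otherwise delegate to the wrapped built-in enricher.
        return self._builtin.tokenize(text)


plain = SketchCustomEnricher()
custom = SketchCustomEnricher(
    SketchCustomConfig(custom_tokenizer=lambda t: t.lower().split("-"))
)
print(plain.tokenize("Drop-in replacement"))
print(custom.tokenize("Drop-in replacement"))
```

Because the built-in object is only touched in the fallback branch, any state it caches stays untouched for stages that have an override.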

Examples

Integrate a custom tokenizer (e.g. SentencePiece):

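import sentencepiece as spm

# Import path assumed from the class path shown above.
from scikitplot.corpus import CustomEnricherConfig, CustomNLPEnricher

# Load a trained SentencePiece model from disk.
sp = spm.SentencePieceProcessor()
sp.load("bpe.model")

def sp_tokenize(text):
    # Return subword pieces as strings rather than integer ids.
    return sp.encode(text, out_type=str)

# Override only the tokenizer; other stages keep the built-in backends.
ccfg = CustomEnricherConfig(custom_tokenizer=sp_tokenize)
enricher = CustomNLPEnricher(custom_config=ccfg)
# corpus_docs: an existing sequence of CorpusDocument instances.
docs = enricher.enrich_documents(corpus_docs)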
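
A tokenizer with the same shape as sp_tokenize (one string in, a list of string tokens out) can presumably be plugged in the same way. The sketch below uses only the standard library; regex_tokenize and its pattern are hypothetical names chosen for illustration, and the wiring into CustomEnricherConfig is left in comments because it simply mirrors the example above.

```python
import re

# Hypothetical pattern: alphanumeric runs, with apostrophes kept inside words.
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def regex_tokenize(text):
    """Return lowercased word tokens as a list of strings."""
    return [match.group(0).lower() for match in _TOKEN_RE.finditer(text)]


print(regex_tokenize("Don't split URLs, please!"))

# Assumed wiring, mirroring the SentencePiece example:
# ccfg = CustomEnricherConfig(custom_tokenizer=regex_tokenize)
# enricher = CustomNLPEnricher(custom_config=ccfg)
```

Keeping the tokenizer a plain function makes it easy to test in isolation before handing it to the enricher.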

enrich_documents(documents, *, overwrite=False)

Enrich a batch of CorpusDocument instances using custom or built-in backends per stage.

Parameters:
documents : Sequence[CorpusDocument]

Documents to enrich.

overwrite : bool, optional

Whether to overwrite existing enrichment results. Defaults to False.

Returns:
list[CorpusDocument]
Return type:

list[Any]