CustomEnricherConfig

class scikitplot.corpus.CustomEnricherConfig(custom_tokenizer=None, custom_lemmatizer=None, custom_stemmer=None, custom_keyword_extractor=None, custom_stopwords=None)

Custom backend callables for CustomNLPEnricher.

Every field is optional. When set, a field replaces the corresponding built-in backend in NLPEnricher. None means “use the built-in backend from EnricherConfig”.

Parameters:
custom_tokenizer : callable or None, optional

Replaces the built-in tokenizer. Signature:

def custom_tokenizer(text: str) -> list[str]: ...
custom_lemmatizer : callable or None, optional

Replaces the built-in lemmatizer. Signature:

def custom_lemmatizer(tokens: list[str]) -> list[str]: ...
custom_stemmer : callable or None, optional

Replaces the built-in stemmer. Signature:

def custom_stemmer(tokens: list[str]) -> list[str]: ...
custom_keyword_extractor : callable or None, optional

Replaces the built-in keyword extractor. Signature:

def custom_keyword_extractor(
    text: str,
    tokens: list[str],
) -> list[str]: ...
custom_stopwords : frozenset[str] or None, optional

Replaces the built-in stopword set used by _filter_tokens. When None the built-in NLTK / fallback set is used.
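
The signatures above can be illustrated with a few dependency-free callables. This is a minimal sketch: regex_tokenizer, suffix_stemmer, and DOMAIN_STOPWORDS are hypothetical names chosen for illustration and are not part of scikitplot. The stemmer is deliberately naive and only shows the expected input and output shapes.

```python
import re

# Hypothetical stand-ins that match the documented signatures;
# none of these names are provided by scikitplot.
_WORD_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def regex_tokenizer(text: str) -> list[str]:
    """Lowercase the text and split it into word-like tokens."""
    return _WORD_RE.findall(text.lower())


def suffix_stemmer(tokens: list[str]) -> list[str]:
    """Strip one common English suffix per token (deliberately naive)."""
    suffixes = ("ing", "ed", "es", "s")
    stemmed = []
    for tok in tokens:
        for suf in suffixes:
            # Only strip when at least three characters of stem remain.
            if tok.endswith(suf) and len(tok) - len(suf) >= 3:
                tok = tok[: -len(suf)]
                break
        stemmed.append(tok)
    return stemmed


# A custom stopword set must be a frozenset of strings.
DOMAIN_STOPWORDS: frozenset[str] = frozenset({"the", "a", "an", "of", "and"})

tokens = regex_tokenizer("The cats were chasing the painted boxes")
stems = suffix_stemmer(tokens)
```

Callables shaped like these could then be passed as CustomEnricherConfig(custom_tokenizer=regex_tokenizer, custom_stemmer=suffix_stemmer, custom_stopwords=DOMAIN_STOPWORDS).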

Notes

User note: Pass a CustomEnricherConfig together with the standard EnricherConfig to CustomNLPEnricher. Built-in fields (tokenizer, lemmatizer, etc.) in EnricherConfig are still honoured for any stage that has no custom callable.
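
The "custom callable if set, otherwise built-in" rule described in this note can be pictured with a small stand-alone sketch. It is not the library's implementation: CustomConfigSketch, builtin_tokenizer, and resolve_tokenizer are hypothetical names, and the real CustomNLPEnricher may resolve its stages differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CustomConfigSketch:
    """Hypothetical stand-in for CustomEnricherConfig (two fields only)."""

    custom_tokenizer: Optional[Callable[[str], list[str]]] = None
    custom_stemmer: Optional[Callable[[list[str]], list[str]]] = None


def builtin_tokenizer(text: str) -> list[str]:
    """Placeholder for whichever backend EnricherConfig would select."""
    return text.split()


def resolve_tokenizer(cfg: CustomConfigSketch) -> Callable[[str], list[str]]:
    """Prefer the custom callable; fall back to the built-in when it is None."""
    if cfg.custom_tokenizer is not None:
        return cfg.custom_tokenizer
    return builtin_tokenizer


# No custom tokenizer set: the built-in placeholder is used.
default_tok = resolve_tokenizer(CustomConfigSketch())

# Custom tokenizer set: it takes precedence over the built-in one.
custom_tok = resolve_tokenizer(
    CustomConfigSketch(custom_tokenizer=lambda t: t.lower().split("-"))
)
```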

Examples

Replace keyword extraction with a KeyBERT-based extractor:

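from keybert import KeyBERT
# Assumes CustomNLPEnricher is importable from the same module as CustomEnricherConfig.
from scikitplot.corpus import CustomEnricherConfig, CustomNLPEnricher

# Load the KeyBERT model once and reuse it across calls.
_kb = KeyBERT()

def kb_extractor(text, tokens):
    # extract_keywords returns (keyword, score) pairs; keep only the keywords.
    return [kw for kw, _ in _kb.extract_keywords(text, top_n=10)]

ccfg = CustomEnricherConfig(custom_keyword_extractor=kb_extractor)
enricher = CustomNLPEnricher(custom_config=ccfg)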
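
Where an extra dependency such as KeyBERT is not wanted, a frequency-based extractor with the same (text, tokens) signature is one possible alternative. The sketch below uses only the standard library; frequency_extractor and STOPWORDS are hypothetical names, and the cut-off of five keywords is an arbitrary choice for illustration.

```python
from collections import Counter

# Small illustrative stopword set (an assumption, not the library's set).
STOPWORDS = frozenset({"the", "a", "of", "and", "is"})


def frequency_extractor(text: str, tokens: list[str]) -> list[str]:
    """Return up to five of the most frequent non-stopword tokens.

    ``text`` is accepted to match the documented signature but is unused here.
    """
    counts = Counter(t.lower() for t in tokens if t.lower() not in STOPWORDS)
    return [word for word, _ in counts.most_common(5)]


sample_text = "the model and the data and the model"
keywords = frequency_extractor(sample_text, sample_text.split())
```

A function like this could be passed as custom_keyword_extractor in the same way as kb_extractor.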
custom_keyword_extractor: Callable[[str, list[str]], list[str]] | None = None
custom_lemmatizer: Callable[[list[str]], list[str]] | None = None
custom_stemmer: Callable[[list[str]], list[str]] | None = None
custom_stopwords: frozenset[str] | None = None
custom_tokenizer: Callable[[str], list[str]] | None = None