SentenceChunkerConfig
class scikitplot.corpus.SentenceChunkerConfig(backend=SentenceBackend.REGEX, min_length=10, overlap=0, spacy_model=None, nltk_language='english', strip_whitespace=True, include_offsets=True)
Configuration for SentenceChunker.

Parameters:
- backend : SentenceBackend
  Splitting strategy. REGEX has no extra dependencies. NLTK requires the punkt model. SPACY requires a model name supplied via spacy_model.
- min_length : int
  Minimum character length for a sentence to be kept.
- overlap : int
  Number of preceding sentences to prepend to each sentence as context.
- spacy_model : str or None
  spaCy model name, e.g. "en_core_web_sm". Required when backend is SPACY.
- nltk_language : str
  Language string forwarded to nltk.tokenize.sent_tokenize.
- strip_whitespace : bool
  Strip leading and trailing whitespace from each sentence.
- include_offsets : bool
  Compute character offsets (start_char, end_char) for each sentence.
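The interplay of min_length, overlap, strip_whitespace and include_offsets can be illustrated with a small pure-Python sketch of a regex-style splitter. This is not the library's implementation. The helper name split_sentences is hypothetical, and several details are assumptions made for illustration: the boundary pattern (a run of text ending in ., ! or ?), applying min_length after stripping, drawing overlap context only from sentences that were kept, and the dict keys text, start_char and end_char.

```python
import re
from typing import Dict, List, Tuple

# Assumption: a naive boundary rule, i.e. a run of non-terminal characters
# followed by any terminal punctuation (., ! or ?). Abbreviations and decimals
# such as "3.14" are split incorrectly; the NLTK and spaCy backends handle
# such cases better.
_SENTENCE = re.compile(r"[^.!?]+[.!?]*")


def split_sentences(
    text: str,
    min_length: int = 10,
    overlap: int = 0,
    strip_whitespace: bool = True,
    include_offsets: bool = True,
) -> List[Dict[str, object]]:
    """Hypothetical helper illustrating the SentenceChunkerConfig fields."""
    kept: List[Tuple[str, int, int]] = []
    for match in _SENTENCE.finditer(text):
        sentence, start, end = match.group(), match.start(), match.end()
        if strip_whitespace:
            # Shift the offsets so they cover only the stripped sentence.
            start += len(sentence) - len(sentence.lstrip())
            sentence = sentence.strip()
            end = start + len(sentence)
        # Assumption: min_length is checked after stripping; blank runs are skipped.
        if not sentence.strip() or len(sentence) < min_length:
            continue
        kept.append((sentence, start, end))

    chunks: List[Dict[str, object]] = []
    for i, (sentence, start, end) in enumerate(kept):
        # Prepend up to `overlap` preceding kept sentences as context.
        context = [prev for prev, _, _ in kept[max(0, i - overlap):i]]
        chunk: Dict[str, object] = {"text": " ".join(context + [sentence])}
        if include_offsets:
            # Assumption: offsets locate the sentence itself, not the context.
            chunk["start_char"] = start
            chunk["end_char"] = end
        chunks.append(chunk)
    return chunks


text = "Hello world. How are you? Fine!"
for chunk in split_sentences(text, min_length=10, overlap=1):
    print(chunk)
```

With overlap=1 the second chunk's text is "Hello world. How are you?", while its offsets (13, 25) still point at "How are you?" alone in the original string. "Fine!" is dropped because its five characters fall below min_length=10.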
Attributes:
- backend: SentenceBackend = 'regex'
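The signature sets the backend default to SentenceBackend.REGEX, while the attribute listing renders the same default as 'regex'. One plausible reading, assumed here and not confirmed by this page, is that SentenceBackend is a string-valued enum. The sketch below mirrors the documented fields in a hypothetical dataclass, ChunkerConfigSketch, alongside a hypothetical copy of the enum. Three details are assumptions: the member values "nltk" and "spacy", the coercion of plain strings, and the construction-time check that spacy_model is set when backend is SPACY.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class SentenceBackend(str, Enum):
    """Hypothetical mirror of the backend enum; member values are assumed."""

    REGEX = "regex"
    NLTK = "nltk"
    SPACY = "spacy"


@dataclass
class ChunkerConfigSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    backend: Union[SentenceBackend, str] = SentenceBackend.REGEX
    min_length: int = 10
    overlap: int = 0
    spacy_model: Optional[str] = None
    nltk_language: str = "english"
    strip_whitespace: bool = True
    include_offsets: bool = True

    def __post_init__(self) -> None:
        # Assumption: plain strings such as "regex" are coerced to the enum.
        self.backend = SentenceBackend(self.backend)
        # Assumption: the "required when backend is SPACY" rule is enforced here.
        if self.backend is SentenceBackend.SPACY and not self.spacy_model:
            raise ValueError("spacy_model is required when backend is SPACY")


config = ChunkerConfigSketch()
print(config.backend == "regex")  # a str-valued enum compares equal to its value
```

Because the enum mixes in str, SentenceBackend.REGEX == "regex" holds, which would account for both spellings appearing on this page. Whether the real SentenceChunkerConfig accepts plain strings, or when it checks spacy_model, is not stated here.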
Gallery examples
corpus WHO European Region YouTube shorts with examples
corpus WHO European Region local .zip with examples