SentenceChunkerConfig

class scikitplot.corpus.SentenceChunkerConfig(backend=SentenceBackend.REGEX, min_length=10, overlap=0, spacy_model=None, nltk_language='english', strip_whitespace=True, include_offsets=True)

Configuration for SentenceChunker.

Parameters:
backend : SentenceBackend

Splitting strategy. REGEX has no extra dependencies. NLTK requires the punkt model. SPACY requires a model name supplied via spacy_model.

min_length : int

Minimum character length for a sentence to be kept.

overlap : int

Number of preceding sentences to prepend as context.

spacy_model : str or None

spaCy model name, e.g. "en_core_web_sm". Required when backend is SPACY.

nltk_language : str

Language string forwarded to nltk.tokenize.sent_tokenize.

strip_whitespace : bool

Strip leading/trailing whitespace from each sentence.

include_offsets : bool

Compute character offsets (start_char, end_char).
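
To make the interaction of min_length, overlap, strip_whitespace and include_offsets concrete, here is a minimal standalone sketch. It is not the library's implementation: the function name sketch_chunks, the regex, the choice to draw overlap context from kept sentences only, and the choice to report offsets for the sentence itself (not the prepended context) are all assumptions made for illustration.

```python
import re

# Illustrative pattern only; the library's REGEX backend may split differently.
_SENTENCE = re.compile(r"[^.!?]+(?:[.!?]+|$)")


def sketch_chunks(text, min_length=10, overlap=0,
                  strip_whitespace=True, include_offsets=True):
    """Hypothetical stand-in showing how the documented options could interact."""
    kept = []  # (sentence, start_char, end_char) for sentences passing the filter
    for match in _SENTENCE.finditer(text):
        sentence, start = match.group(), match.start()
        if strip_whitespace:
            # Shift the start offset past any leading whitespace being removed.
            start += len(sentence) - len(sentence.lstrip())
            sentence = sentence.strip()
        if len(sentence) >= min_length:
            kept.append((sentence, start, start + len(sentence)))

    chunks = []
    for i, (sentence, start, end) in enumerate(kept):
        # Prepend up to `overlap` preceding kept sentences as context.
        context = [s for s, _, _ in kept[max(0, i - overlap):i]]
        chunk = {"text": " ".join(context + [sentence])}
        if include_offsets:
            # Assumption: offsets describe the sentence itself, not the context.
            chunk["start_char"], chunk["end_char"] = start, end
        chunks.append(chunk)
    return chunks


sample = "Hi. This is a longer sentence. And here is another one!"
chunks = sketch_chunks(sample, min_length=10, overlap=1)
```

In this sketch "Hi." is dropped because it is shorter than min_length, and the second chunk carries the first kept sentence as context. Slicing sample with a chunk's start_char and end_char returns the sentence without its context.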

Attributes:
backend: SentenceBackend = 'regex'
include_offsets: bool = True
min_length: int = 10
nltk_language: str = 'english'
overlap: int = 0
spacy_model: str | None = None
strip_whitespace: bool = True
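
The defaults listed above, and the rule that spacy_model is required for the SPACY backend, can be mirrored in a small stand-in for experimentation. This sketch runs without scikitplot installed and makes several assumptions: ConfigSketch and this local SentenceBackend enum are hypothetical, the enum values 'nltk' and 'spacy' are guesses (only 'regex' appears in the documentation), and the real class may not be a frozen dataclass or perform any of this validation.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class SentenceBackend(str, Enum):
    # Stand-in enum; values other than 'regex' are assumptions.
    REGEX = "regex"
    NLTK = "nltk"
    SPACY = "spacy"


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical mirror of the documented fields and defaults."""
    backend: SentenceBackend = SentenceBackend.REGEX
    min_length: int = 10
    overlap: int = 0
    spacy_model: Optional[str] = None
    nltk_language: str = "english"
    strip_whitespace: bool = True
    include_offsets: bool = True

    def __post_init__(self):
        # Documented rule: spacy_model is required when backend is SPACY.
        if self.backend is SentenceBackend.SPACY and not self.spacy_model:
            raise ValueError("spacy_model is required when backend is SPACY")
        # Assumed sanity checks; the real class may not enforce these.
        if self.min_length < 0 or self.overlap < 0:
            raise ValueError("min_length and overlap must be non-negative")


default_cfg = ConfigSketch()
# Derive a variant without mutating the defaults.
spacy_cfg = replace(default_cfg, backend=SentenceBackend.SPACY,
                    spacy_model="en_core_web_sm")
```

Keeping the stand-in immutable and deriving variants with dataclasses.replace makes it easy to compare configurations side by side. Whether the real SentenceChunkerConfig supports the same pattern is not stated here.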