ParagraphChunkerConfig
- class scikitplot.corpus.ParagraphChunkerConfig(min_length=0, max_length=None, overlap=0, strip_whitespace=True, include_offsets=True, merge_short=False)
Configuration for ParagraphChunker.
- Parameters:
- min_length : int
Minimum character length to retain a paragraph.
- max_length : int or None
Maximum character length. Paragraphs exceeding this are split at sentence boundaries ([.!?]). None disables the limit.
- overlap : int
Number of preceding paragraphs prepended as context.
- strip_whitespace : bool
Strip leading/trailing whitespace from each paragraph.
- include_offsets : bool
Compute and store character offsets.
- merge_short : bool
Merge consecutive short paragraphs (below min_length) into one block instead of discarding them.
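The way these options interact can be sketched in plain Python. The block below is an illustration only, not the library's implementation: `SketchConfig`, `split_long`, and `sketch_chunk` are hypothetical names, and the blank-line paragraph separator, the order of the steps, and the rule that a merged block is kept even if it is still shorter than min_length are all assumptions. Offsets (include_offsets) are left out to keep the sketch short.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in mirroring the documented fields and defaults."""
    min_length: int = 0
    max_length: Optional[int] = None
    overlap: int = 0
    strip_whitespace: bool = True
    include_offsets: bool = True  # not used in this sketch
    merge_short: bool = False


def split_long(paragraph: str, max_length: int) -> List[str]:
    """Greedily pack sentences (ending in . ! ?) into pieces up to max_length."""
    sentences = re.split(r"(?<=[.!?])\s+", paragraph)
    pieces: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_length:
            pieces.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def sketch_chunk(text: str, cfg: SketchConfig) -> List[str]:
    # Assumption: paragraphs are separated by one or more blank lines.
    paras = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    if cfg.strip_whitespace:
        paras = [p.strip() for p in paras]

    kept: List[str] = []
    short_run: List[str] = []  # consecutive paragraphs below min_length

    def flush() -> None:
        # Assumption: a merged run is kept whatever its final length.
        if short_run and cfg.merge_short:
            kept.append(" ".join(short_run))
        short_run.clear()

    for p in paras:
        if len(p) < cfg.min_length:
            short_run.append(p)
        else:
            flush()
            kept.append(p)
    flush()

    sized: List[str] = []
    for p in kept:
        if cfg.max_length is not None and len(p) > cfg.max_length:
            sized.extend(split_long(p, cfg.max_length))
        else:
            sized.append(p)

    if cfg.overlap > 0:
        # Prepend up to `overlap` preceding chunks as context.
        return ["\n\n".join(sized[max(0, i - cfg.overlap): i + 1])
                for i in range(len(sized))]
    return sized


text = ("Intro.\n\nOk.\n\n"
        "This paragraph is long enough to keep. It has two sentences.\n\nEnd.")
dropped = sketch_chunk(text, SketchConfig(min_length=10))
merged = sketch_chunk(text, SketchConfig(min_length=10, merge_short=True))
split = sketch_chunk(text, SketchConfig(min_length=10, max_length=40))
context = sketch_chunk(text, SketchConfig(overlap=1))
```

In this sketch, `dropped` keeps only the long paragraph, `merged` additionally keeps "Intro. Ok." and "End." as blocks, and `split` breaks the long paragraph into its two sentences. The real ParagraphChunker may order or define these steps differently.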