ParagraphChunkerConfig

class scikitplot.corpus.ParagraphChunkerConfig(min_length=0, max_length=None, overlap=0, strip_whitespace=True, include_offsets=True, merge_short=False)

Configuration for ParagraphChunker.

Parameters:
min_length : int

Minimum character length to retain a paragraph.

max_length : int or None

Maximum character length. Paragraphs exceeding this are split at sentence boundaries ([.!?]). None disables the limit.

overlap : int

Number of preceding paragraphs prepended as context.

strip_whitespace : bool

Strip leading/trailing whitespace from each paragraph.

include_offsets : bool

Compute and store character offsets.

merge_short : bool

Merge consecutive short paragraphs (below min_length) into one block instead of discarding them.
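The interaction of these parameters can be illustrated with a small, self-contained sketch. This is not the library's implementation: `SketchConfig`, `split_long`, and `sketch_chunk` are hypothetical names, and the blank-line paragraph separator, the order of the steps, and the handling of an over-long single sentence are all assumptions. `include_offsets` is left out for brevity.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in mirroring the documented fields and defaults."""
    min_length: int = 0
    max_length: Optional[int] = None
    overlap: int = 0
    strip_whitespace: bool = True
    merge_short: bool = False


def split_long(paragraph: str, max_length: int) -> List[str]:
    """Greedily pack sentences (split after . ! ?) into pieces <= max_length.

    Assumption: a single sentence longer than max_length is kept whole.
    """
    sentences = re.split(r"(?<=[.!?])\s+", paragraph)
    pieces: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_length:
            pieces.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def sketch_chunk(text: str, cfg: SketchConfig) -> List[str]:
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    if cfg.strip_whitespace:
        paragraphs = [p.strip() for p in paragraphs]

    # min_length / merge_short: drop short paragraphs, or merge each run of
    # consecutive short paragraphs into one block.
    kept: List[str] = []
    short_run: List[str] = []

    def flush() -> None:
        if short_run and cfg.merge_short:
            kept.append("\n\n".join(short_run))
        short_run.clear()

    for p in paragraphs:
        if len(p) < cfg.min_length:
            short_run.append(p)
        else:
            flush()
            kept.append(p)
    flush()

    # max_length: split over-long paragraphs at sentence boundaries.
    if cfg.max_length is not None:
        limit = cfg.max_length
        kept = [
            piece
            for p in kept
            for piece in (split_long(p, limit) if len(p) > limit else [p])
        ]

    # overlap: prepend up to `overlap` preceding paragraphs as context.
    chunks: List[str] = []
    for i, p in enumerate(kept):
        context = kept[max(0, i - cfg.overlap):i]
        chunks.append("\n\n".join(context + [p]))
    return chunks


sample = (
    "Hi.\n\nOk.\n\n"
    "This is a longer paragraph. It has two sentences.\n\n"
    "Final paragraph here."
)
for chunk in sketch_chunk(sample, SketchConfig(min_length=10, merge_short=True)):
    print(repr(chunk))
```

In this sketch, `min_length=10` alone discards the two short paragraphs, while adding `merge_short=True` keeps them as a single merged block. The real ParagraphChunker may order or implement these steps differently.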

include_offsets: bool = True
max_length: int | None = None
merge_short: bool = False
min_length: int = 0
overlap: int = 0
strip_whitespace: bool = True