FixedWindowChunkerConfig

class scikitplot.corpus.FixedWindowChunkerConfig(window_size=512, step_size=256, unit=WindowUnit.CHARS, min_length=10, include_offsets=True, strip_whitespace=True)

Configuration for FixedWindowChunker.

Parameters:
window_size : int

Size of each chunk, measured in the units given by unit.

step_size : int

Stride between consecutive chunk starts. step_size == window_size gives non-overlapping chunks; step_size < window_size gives sliding-window overlap.

unit : WindowUnit

Measurement unit: CHARS (default) or TOKENS.

min_length : int

Minimum length, in characters, that the last (possibly partial) chunk must have to be kept.

include_offsets : bool

Whether to compute and store character offsets.

strip_whitespace : bool

Whether to strip leading and trailing whitespace from each chunk.
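
Example:

The interaction between window_size, step_size, and min_length can be illustrated with a minimal, standalone sketch. It does not call FixedWindowChunker and is not the library's implementation: the helper fixed_window_chunks is hypothetical, it handles only character units, and it assumes that min_length is checked against the final chunk only and that the reported offsets refer to the window before whitespace stripping.

```python
def fixed_window_chunks(text, window_size=512, step_size=256,
                        min_length=10, strip_whitespace=True):
    """Illustrative character-based fixed-window chunking.

    Hypothetical helper, not part of scikitplot. Returns a list of
    (start, end, chunk_text) tuples, where start/end are the offsets
    of the window in the original text (before any stripping).
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")

    chunks = []
    for start in range(0, len(text), step_size):
        end = min(start + window_size, len(text))
        piece = text[start:end]
        if strip_whitespace:
            piece = piece.strip()

        is_last = end == len(text)
        # Assumption: only the last (possibly partial) chunk is
        # subject to the min_length filter.
        if not is_last or len(piece) >= min_length:
            chunks.append((start, end, piece))
        if is_last:
            break
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"  # 26 characters

# step_size < window_size: consecutive chunks overlap by 5 characters.
overlapping = fixed_window_chunks(text, window_size=10, step_size=5, min_length=3)
print([chunk for _, _, chunk in overlapping])
# ['abcdefghij', 'fghijklmno', 'klmnopqrst', 'pqrstuvwxy', 'uvwxyz']

# step_size == window_size: no overlap; the 6-character tail is
# dropped because it is shorter than min_length=10.
disjoint = fixed_window_chunks(text, window_size=10, step_size=10, min_length=10)
print([chunk for _, _, chunk in disjoint])
# ['abcdefghij', 'klmnopqrst']
```

With the defaults (window_size=512, step_size=256), each chunk shares half of its window with the next one under the same assumptions.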

Parameters:
include_offsets: bool = True
min_length: int = 10
step_size: int = 256
strip_whitespace: bool = True
unit: WindowUnit = 'chars'
window_size: int = 512