ChunkerBridge

class scikitplot.corpus.ChunkerBridge(inner)

Adapter that wraps a new-style chunker as a ChunkerBase-compatible object.

Parameters:
inner : object

The new-style chunker instance (SentenceChunker, ParagraphChunker, FixedWindowChunker, or WordChunker).

Attributes:
strategy : ChunkingStrategy

Required by _base.py:get_documents() line 739.

inner : object

The wrapped chunker — retained for direct access to the richer ChunkResult API when needed.

Notes

Developer note: _base.py calls exactly two things on a chunker:

  1. self.chunker.strategy — a ChunkingStrategy enum value.

  2. self.chunker.chunk(text, metadata=raw_chunk) -> list[tuple[int, str]], where int is char_start and str is the chunk text.

This bridge satisfies both without touching ChunkerBase or the new chunkers.
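The two-call contract described above can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in written for illustration: ToyStrategy plays the role of ChunkingStrategy, ToySentenceChunker plays the role of a new-style chunker, ToyChunk stands in for one entry of a richer result object, and ToyBridge mirrors the shape of the adapter. It does not import scikitplot and is not the library's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class ToyStrategy(Enum):
    """Hypothetical stand-in for ChunkingStrategy."""
    SENTENCE = "sentence"


@dataclass
class ToyChunk:
    """Hypothetical stand-in for one entry of a richer result object."""
    start: int
    text: str


class ToySentenceChunker:
    """Hypothetical new-style chunker: splits on '. ' and records offsets."""

    def chunk(self, text: str, extra_metadata: Optional[dict] = None) -> list[ToyChunk]:
        chunks, cursor = [], 0
        for piece in text.split(". "):
            if piece:
                chunks.append(ToyChunk(start=cursor, text=piece))
            cursor += len(piece) + 2  # step over the piece and its ". " separator
        return chunks


class ToyBridge:
    """Exposes only what the caller uses: .strategy and .chunk()."""

    strategy = ToyStrategy.SENTENCE

    def __init__(self, inner: Any) -> None:
        self.inner = inner  # kept so callers can still reach the richer API

    def chunk(self, text: str, metadata: Optional[dict[str, Any]] = None) -> list[tuple[int, str]]:
        # Forward the caller's metadata under the inner chunker's keyword name.
        result = self.inner.chunk(text, extra_metadata=metadata)
        # Flatten the richer result into (char_start, chunk_text) pairs.
        return [(c.start, c.text) for c in result]


text = "One. Two. Three"
bridge = ToyBridge(ToySentenceChunker())
pairs = bridge.chunk(text, metadata={"source": "demo"})
print(pairs)  # [(0, 'One'), (5, 'Two'), (10, 'Three')]
```

Because the adapter only reshapes the return value, neither the caller nor the wrapped chunker needs to change, which is the design goal stated in the note.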

chunk(text, metadata=None)

Chunk text and return (char_start, chunk_text) pairs.

Parameters:
text : str

Raw text to chunk.

metadata : dict[str, Any] or None, optional

Raw-chunk metadata dict passed by get_documents(). Forwarded as extra_metadata to the inner chunker where supported.

Returns:
list[tuple[int, str]]

Each element is (char_offset, chunk_text). If the inner chunker does not provide offsets, a forward-cursor scan computes them.
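A forward-cursor scan of the kind mentioned here might look like the sketch below. The function name offsets_by_forward_scan is hypothetical, and the sketch assumes the chunks appear in document order, do not overlap, and are verbatim substrings of the text; how the library handles overlapping windows or normalised text is not shown in this page.

```python
from __future__ import annotations


def offsets_by_forward_scan(text: str, chunks: list[str]) -> list[tuple[int, str]]:
    """Pair each chunk with its start offset, searching only forward.

    Hypothetical helper: assumes ordered, non-overlapping, verbatim chunks.
    """
    pairs: list[tuple[int, str]] = []
    cursor = 0
    for chunk in chunks:
        found = text.find(chunk, cursor)  # never look behind the cursor
        if found == -1:
            # Assumption: if the chunk cannot be located (e.g. whitespace was
            # normalised), fall back to the current cursor position.
            found = cursor
        pairs.append((found, chunk))
        cursor = found + len(chunk)  # the next search starts after this chunk
    return pairs


text = "the cat. the cat. the end"
chunks = ["the cat.", "the cat.", "the end"]
pairs = offsets_by_forward_scan(text, chunks)
print(pairs)  # [(0, 'the cat.'), (9, 'the cat.'), (18, 'the end')]
```

Advancing the cursor is what keeps repeated chunks apart: a plain text.find(chunk) would report offset 0 for both occurrences of "the cat.".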


strategy: ClassVar[ChunkingStrategy]