ChunkerBridge

class scikitplot.corpus.ChunkerBridge(inner)

Adapter that wraps a new-style chunker as a ChunkerBase-compatible object.

Parameters:
inner : object

The new-style chunker instance (SentenceChunker, ParagraphChunker, FixedWindowChunker, or WordChunker).

Attributes:
strategy : ChunkingStrategy

Required by _base.py:get_documents() line 739.

inner : object

The wrapped chunker, retained for direct access to the richer ChunkResult API when needed.


Notes

Developer note: _base.py relies on exactly two members of a chunker:

  1. self.chunker.strategy: a ChunkingStrategy enum value.

  2. self.chunker.chunk(text, metadata=raw_chunk) → list[tuple[int, str]], where int is char_start and str is the chunk text.

This bridge satisfies both requirements without modifying ChunkerBase or the new chunkers.
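The adapter idea can be sketched in a few lines of plain Python. This is a minimal, self-contained illustration under stated assumptions, not the library's implementation: ChunkingStrategy, Chunk, and ChunkResult are local stand-ins, and ToyWindowChunker and SketchBridge are hypothetical names. The sketch copies strategy from the inner chunker and forwards metadata as extra_metadata, and it returns a ChunkResult as described for chunk() rather than the tuple list above. How the real class populates its strategy ClassVar is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class ChunkingStrategy(Enum):
    # Hypothetical stand-in; the real enum's members are not assumed here.
    SENTENCE = "sentence"
    FIXED_WINDOW = "fixed_window"


@dataclass
class Chunk:
    text: str
    start_char: int
    end_char: int
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class ChunkResult:
    chunks: list[Chunk]


class ToyWindowChunker:
    """Hypothetical new-style chunker that splits text into fixed-size windows."""

    strategy = ChunkingStrategy.FIXED_WINDOW

    def __init__(self, size: int = 5) -> None:
        self.size = size

    def chunk(self, text: str, extra_metadata: dict[str, Any] | None = None) -> ChunkResult:
        meta = dict(extra_metadata or {})
        return ChunkResult(
            chunks=[
                # Each chunk gets its own copy of the forwarded metadata.
                Chunk(text[i : i + self.size], i, min(i + self.size, len(text)), dict(meta))
                for i in range(0, len(text), self.size)
            ]
        )


class SketchBridge:
    """Hypothetical adapter exposing .strategy and .chunk(text, metadata=...)."""

    def __init__(self, inner: Any) -> None:
        self.inner = inner  # kept for direct access to the inner chunker
        # Assumption: the strategy is taken from the wrapped chunker.
        self.strategy = inner.strategy

    def chunk(self, text: str, metadata: dict[str, Any] | None = None) -> ChunkResult:
        # Forward the raw-chunk metadata to the inner chunker as extra_metadata.
        return self.inner.chunk(text, extra_metadata=metadata)


bridge = SketchBridge(ToyWindowChunker(size=5))
result = bridge.chunk("hello world", metadata={"source": "a.txt"})
print([(c.start_char, c.text) for c in result.chunks])
# [(0, 'hello'), (5, ' worl'), (10, 'd')]
```

Delegating rather than subclassing keeps the inner chunker untouched, which matches the stated goal of not modifying ChunkerBase or the new chunkers.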

chunk(text, metadata=None)

Chunk text and return a ChunkResult.

CRITICAL-02 (Phase 2): Returns ChunkResult directly. DocumentReader.get_documents now iterates chunk_result.chunks instead of (char_start, chunk_text) tuples.
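The change in iteration shape can be illustrated with a small, self-contained sketch. Chunk and ChunkResult are local stand-ins using the field names documented below (text, start_char, end_char, metadata). iter_legacy and iter_chunk_result are hypothetical helpers, and the record dictionaries they build are an assumed shape, not what DocumentReader.get_documents actually produces.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    text: str
    start_char: int
    end_char: int
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class ChunkResult:
    chunks: list[Chunk]


def iter_legacy(pairs: list[tuple[int, str]]) -> list[dict[str, Any]]:
    # Pre-Phase-2 shape: only the start offset and the text are available.
    return [{"char_start": start, "text": text} for start, text in pairs]


def iter_chunk_result(chunk_result: ChunkResult) -> list[dict[str, Any]]:
    # Phase 2 shape: iterate chunk_result.chunks and read the richer fields,
    # including the end offset and per-chunk metadata.
    return [
        {
            "char_start": c.start_char,
            "char_end": c.end_char,
            "text": c.text,
            "metadata": dict(c.metadata),
        }
        for c in chunk_result.chunks
    ]


result = ChunkResult([Chunk("Hi.", 0, 3, {"section": 1}), Chunk("Bye.", 4, 8)])
records = iter_chunk_result(result)
```

The practical gain is that end offsets and per-chunk metadata survive into the consumer instead of being dropped by the two-element tuple.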

Parameters:
text : str

Raw text to chunk.

metadata : dict[str, Any] or None, optional

Raw-chunk metadata dict passed by get_documents(). Forwarded as extra_metadata to the inner chunker.

Returns:
ChunkResult

Result whose chunks attribute is an ordered list of Chunk objects, each carrying text, start_char, end_char, and metadata.

Return type:

ChunkResult

Notes

Use _to_tuples to convert the result to the legacy list[tuple[int, str]] format when backward compatibility is needed.
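A conversion to the legacy format could look like the following minimal sketch. It assumes that _to_tuples maps each chunk to (start_char, text) in order, consistent with the char_start/chunk-text tuple described in the class notes. to_tuples is a hypothetical stand-in, not the library's helper, and Chunk and ChunkResult are local stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Chunk:
    text: str
    start_char: int
    end_char: int
    metadata: dict[str, Any]


@dataclass
class ChunkResult:
    chunks: list[Chunk]


def to_tuples(result: ChunkResult) -> list[tuple[int, str]]:
    """Hypothetical equivalent of _to_tuples: (char_start, chunk_text) pairs."""
    # Chunk order is preserved; end_char and metadata are dropped,
    # which is exactly the information the legacy format cannot carry.
    return [(chunk.start_char, chunk.text) for chunk in result.chunks]


text = "One. Two."
result = ChunkResult([Chunk("One.", 0, 4, {}), Chunk("Two.", 5, 9, {})])
pairs = to_tuples(result)
print(pairs)  # [(0, 'One.'), (5, 'Two.')]
```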

strategy: ClassVar[ChunkingStrategy]