WordChunkerBridge

class scikitplot.corpus.WordChunkerBridge(inner)

Bridge that adapts WordChunker to the ChunkerBase contract.

Notes

WordChunker splits text at the word-token level, which does not correspond to any named ChunkingStrategy value. CUSTOM is used as the closest approximation: it signals that user-supplied or non-standard logic was applied, so downstream consumers should not assume standard segment boundaries.

Parameters:

inner (Any)

chunk(text, metadata=None)

Chunk text and return (char_start, chunk_text) pairs.

Parameters:
text : str

Raw text to chunk.

metadata : dict[str, Any] or None, optional

Raw-chunk metadata dict passed by get_documents(). Forwarded as extra_metadata to the inner chunker where supported.

Returns:
list[tuple[int, str]]

Each element is (char_offset, chunk_text). If the inner chunker does not provide offsets, a forward-cursor scan computes them.

Return type:

list[tuple[int, str]]
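
The forward-cursor scan mentioned under Returns can be pictured with the minimal sketch below. It is not the library's implementation: the helper name compute_offsets, the fallback for unmatched chunks, and the assumption that chunks are non-overlapping and appear in document order are all illustrative choices.

```python
def compute_offsets(text: str, chunks: list[str]) -> list[tuple[int, str]]:
    """Pair each chunk with a character offset using a forward cursor.

    Hypothetical helper for illustration only; assumes chunks are
    non-overlapping and listed in the order they occur in ``text``.
    """
    pairs: list[tuple[int, str]] = []
    cursor = 0
    for chunk in chunks:
        # Search only from the cursor onward, so a repeated chunk maps to
        # its next occurrence instead of the first one in the text.
        found = text.find(chunk, cursor)
        if found == -1:
            # Assumption: if the chunk no longer matches the source verbatim
            # (e.g. whitespace was normalized), fall back to the cursor.
            found = cursor
        pairs.append((found, chunk))
        # Advance past this chunk before locating the next one.
        cursor = found + len(chunk)
    return pairs


text = "the cat saw the cat"
pairs = compute_offsets(text, ["the cat", "saw", "the cat"])
print(pairs)
```

Searching from the cursor rather than from the start of the text is what keeps the two identical "the cat" chunks from both resolving to offset 0.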

strategy: ClassVar[ChunkingStrategy] = 'custom'