TokenizerProtocol#

class scikitplot.corpus.TokenizerProtocol(*args, **kwargs)[source]#

Structural protocol for word tokenizers.

Any object with a tokenize(text: str) -> list[str] method satisfies this protocol, regardless of inheritance. This includes MeCab wrappers, jieba objects, camel-tools tokenizers, Stanza pipelines, HuggingFace fast tokenizers, and plain callable wrappers via FunctionTokenizer.
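Structural checks like the one above typically rely on typing.runtime_checkable, which makes isinstance() test for the presence of the named methods rather than for inheritance. The following is a minimal sketch of how such a protocol can be declared and satisfied; the actual scikitplot.corpus source may differ in detail:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TokenizerProtocol(Protocol):
    """Structural protocol: any object with a matching tokenize() satisfies it."""

    def tokenize(self, text: str) -> list[str]:
        """Tokenize text into a list of token strings."""
        ...


# No inheritance from TokenizerProtocol -- the method alone is enough.
class WhitespaceTok:
    def tokenize(self, text: str) -> list[str]:
        return text.split()


print(isinstance(WhitespaceTok(), TokenizerProtocol))  # True
print(WhitespaceTok().tokenize("a b c"))               # ['a', 'b', 'c']
```

Note that runtime_checkable isinstance() checks only verify that a tokenize attribute exists; they do not validate its signature or return type.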

Parameters:
(none at construction time; a protocol defines only the method interface)

Examples

>>> class MyTok:
...     def tokenize(self, text: str) -> list[str]:
...         return text.split()
>>> isinstance(MyTok(), TokenizerProtocol)
True

tokenize(text)[source]#

Tokenize text into a list of token strings.

Parameters:
text : str

Raw input text.

Returns:
list[str]

Token list. Empty list for empty input.

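The "plain callable wrappers via FunctionTokenizer" mentioned above can be illustrated with a hypothetical stand-in (the real FunctionTokenizer's signature may differ; this only sketches the adapter idea):

```python
from typing import Callable


class FunctionTokenizer:
    """Hypothetical stand-in: adapts a plain callable to the tokenize() interface."""

    def __init__(self, fn: Callable[[str], list]):
        self._fn = fn

    def tokenize(self, text: str) -> list[str]:
        # Empty input yields an empty token list, per the contract above.
        if not text:
            return []
        return list(self._fn(text))


tok = FunctionTokenizer(str.split)
print(tok.tokenize("hello world"))  # ['hello', 'world']
print(tok.tokenize(""))             # []
```

Because the wrapper exposes a tokenize(text) method, it satisfies TokenizerProtocol structurally, exactly like the MyTok class in the Examples section.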