FunctionTokenizer
- class scikitplot.corpus.FunctionTokenizer(fn, name='custom')
Wrap any Callable[[str], list[str]] as a TokenizerProtocol.

Parameters:
- fn : Callable[[str], list[str]]
  Tokenization function. Must accept a single str argument and return a list[str].
- name : str, optional
  Human-readable name used in log messages and in the object's repr. Defaults to 'custom'.
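To make the wrapper's contract concrete, here is a minimal sketch of a class with the same constructor signature. It assumes that TokenizerProtocol amounts to a single tokenize(text: str) -> list[str] method (the method name comes from the Examples below); TokenizerLike and SketchFunctionTokenizer are hypothetical names, and the return-type check is an addition of this sketch, not documented behavior of FunctionTokenizer.

```python
from typing import Callable, Protocol, runtime_checkable


@runtime_checkable
class TokenizerLike(Protocol):
    """Assumed shape of TokenizerProtocol: one tokenize(str) -> list[str] method."""

    def tokenize(self, text: str) -> list[str]: ...


class SketchFunctionTokenizer:
    """Hypothetical stand-in for FunctionTokenizer, for illustration only."""

    def __init__(self, fn: Callable[[str], list[str]], name: str = "custom") -> None:
        # Store the callable and its name; nothing is called or loaded here.
        self.fn = fn
        self.name = name

    def tokenize(self, text: str) -> list[str]:
        # Delegate to the wrapped callable, then check the list[str] contract
        # (this check is an assumption of the sketch).
        tokens = self.fn(text)
        if not isinstance(tokens, list) or not all(isinstance(t, str) for t in tokens):
            raise TypeError(
                f"{self.name}: fn must return list[str], got {type(tokens).__name__}"
            )
        return tokens

    def __repr__(self) -> str:
        return f"SketchFunctionTokenizer(name={self.name!r})"


tok = SketchFunctionTokenizer(str.split, name="whitespace")
print(tok.tokenize("hello world"))    # ['hello', 'world']
print(isinstance(tok, TokenizerLike))  # True: it has a tokenize method
print(repr(tok))                       # SketchFunctionTokenizer(name='whitespace')
```

Because the protocol is structural, any object with a matching tokenize method satisfies it; the wrapper exists so that a bare function can be used where such an object is expected.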
Notes
User note: Use this to plug in any tokenization library:
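from scikitplot.corpus import FunctionTokenizer

# Japanese word segmentation with MeCab ("-Owakati" prints space-separated tokens)
import MeCab
tagger = MeCab.Tagger("-Owakati")
tok = FunctionTokenizer(lambda text: tagger.parse(text).strip().split())

# Chinese word segmentation with jieba (jieba.cut returns a generator, so wrap it in list)
import jieba
tok = FunctionTokenizer(lambda text: list(jieba.cut(text)))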
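A third-party library is not required: any plain function with the right signature works. The sketch below uses only the standard library re module; regex_tokenize is a hypothetical helper, and its pattern (word runs plus standalone punctuation) is just one possible choice.

```python
import re
from typing import Callable

# Match a run of word characters, or any single character that is
# neither a word character nor whitespace (i.e. punctuation/symbols).
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def regex_tokenize(text: str) -> list[str]:
    """Split text into word runs and standalone punctuation marks."""
    return _TOKEN_RE.findall(text)


# Satisfies Callable[[str], list[str]], so it can be passed as fn.
fn: Callable[[str], list[str]] = regex_tokenize
print(fn("Hello, world! It's 2024."))
# ['Hello', ',', 'world', '!', 'It', "'", 's', '2024', '.']
```

It would then be wrapped as FunctionTokenizer(regex_tokenize, name="regex"). Note how the apostrophe splits "It's" into three tokens; adjust the pattern if contractions should stay whole.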
Developer note: The wrapper stores only the callable; no model loading happens at construction time.
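Since construction loads nothing, a tokenizer that needs an expensive resource can defer that work into the callable itself. The sketch below caches a stand-in "model" with functools.lru_cache so it is built on the first tokenize call and reused afterwards; _load_model, lazy_tokenize, and the tiny contraction table are hypothetical placeholders for a real model load.

```python
from functools import lru_cache

LOAD_COUNT = 0  # counts how many times the stand-in model is built


@lru_cache(maxsize=None)
def _load_model() -> dict[str, str]:
    """Hypothetical expensive load; here just a small contraction table."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"can't": "can not", "won't": "will not"}


def lazy_tokenize(text: str) -> list[str]:
    # The first call builds the model; later calls hit the lru_cache.
    model = _load_model()
    tokens: list[str] = []
    for word in text.split():
        # Expand known contractions, otherwise keep the word as-is.
        tokens.extend(model.get(word, word).split())
    return tokens


print(LOAD_COUNT)                      # 0: nothing loaded yet
print(lazy_tokenize("I can't go"))     # ['I', 'can', 'not', 'go']
print(lazy_tokenize("we won't stop"))  # ['we', 'will', 'not', 'stop']
print(LOAD_COUNT)                      # 1: loaded once, then reused
```

Wrapping it as FunctionTokenizer(lazy_tokenize, name="lazy") would then keep construction cheap, with the load cost paid on the first tokenize call.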
Examples
>>> from scikitplot.corpus import FunctionTokenizer
>>> tok = FunctionTokenizer(str.split)
>>> tok.tokenize("hello world")
['hello', 'world']
>>> tok = FunctionTokenizer(lambda t: list(t), name="char_splitter")
>>> tok.tokenize("abc")
['a', 'b', 'c']