coerce_language
- scikitplot.corpus.coerce_language(lang, *, default='english')
Normalise any language specifier into a list of canonical NLTK names.
Accepts all three forms used by chunkers and the enricher:
- None → [default] (caller passes text for auto-detect separately)
- "en" → ["english"]
- "english" → ["english"]
- ["en", "ar"] → ["english", "arabic"]
- ["english"] → ["english"]
- Parameters:
- lang : str or list[str] or None
Language specifier.
- default : str, optional
Canonical NLTK name to use when lang is None. Default "english".
- Returns:
- list[str]
Non-empty list of canonical lowercase NLTK language names. Duplicates are removed while preserving order.
- Raises:
- TypeError
If lang is not a str, list, or None.
- ValueError
If lang is an empty list.
Examples
>>> coerce_language(None)
['english']
>>> coerce_language("en")
['english']
>>> coerce_language(["en", "ar"])
['english', 'arabic']
>>> coerce_language("english")
['english']
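The documented contract (three accepted input forms, order-preserving de-duplication, and the two error cases) can be illustrated with a minimal stand-alone sketch. This is not the library's implementation: the function name `coerce_language_sketch` and the two-entry alias table `_ALIASES` are hypothetical, chosen only to cover the codes shown in the examples above, and the sketch assumes every list element is a string.

```python
from typing import List, Union

# Hypothetical alias table covering only the codes used in the examples above;
# the mapping the real library uses is not shown on this page.
_ALIASES = {"en": "english", "ar": "arabic"}


def coerce_language_sketch(
    lang: Union[str, List[str], None], *, default: str = "english"
) -> List[str]:
    """Illustrative re-statement of the documented behaviour (an assumption)."""
    if lang is None:
        # None falls back to the default name, wrapped in a list.
        return [default]
    if isinstance(lang, str):
        items = [lang]
    elif isinstance(lang, list):
        if not lang:
            raise ValueError("lang must not be an empty list")
        items = lang
    else:
        raise TypeError(
            f"lang must be str, list, or None, got {type(lang).__name__}"
        )

    # A dict keeps insertion order, so it removes duplicates
    # while preserving the first-seen position of each name.
    seen = {}
    for item in items:
        key = item.lower()
        seen.setdefault(_ALIASES.get(key, key), None)
    return list(seen)
```

With this sketch, `coerce_language_sketch(["en", "english", "ar"])` returns `['english', 'arabic']`: the short code and the full name collapse to a single entry, and the original order is kept.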