coerce_language

scikitplot.corpus.coerce_language(lang, *, default='english')

Normalise any language specifier into a list of canonical NLTK names.

Accepts all three forms used by chunkers and the enricher:

  • None → [default] (caller passes text for auto-detect separately)

  • "en" → ["english"]

  • "english" → ["english"]

  • ["en", "ar"] → ["english", "arabic"]

  • ["english"] → ["english"]
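The mappings above can be sketched in a few lines. This is an illustrative stand-in, not the library's implementation: the function name `coerce_language_sketch` and the two-entry `_ALIASES` table are hypothetical, and passing unrecognised names through in lowercase is an assumption about behaviour this page does not specify.

```python
# Hypothetical alias table; the library's real mapping is not shown on this page.
_ALIASES = {"en": "english", "ar": "arabic"}


def coerce_language_sketch(lang, *, default="english"):
    """Illustrative stand-in for coerce_language (not the library code)."""
    if lang is None:
        return [default]
    # Treat a bare string as a one-element list.
    items = [lang] if isinstance(lang, str) else list(lang)
    # Map short codes to full names; assumed: unknown names pass through lowercased.
    names = [_ALIASES.get(item.lower(), item.lower()) for item in items]
    # dict.fromkeys keeps the first occurrence of each name, preserving order.
    return list(dict.fromkeys(names))
```

With this sketch, `["en", "english"]` collapses to a single `"english"` entry, which matches the documented de-duplication.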

Parameters:
lang : str or list[str] or None

Language specifier.

default : str, optional

Canonical NLTK name to use when lang is None. Default "english".

Returns:
list[str]

Non-empty list of canonical lowercase NLTK language names. Duplicates are removed while preserving order.

Raises:
TypeError

If lang is not a str, list, or None.

ValueError

If lang is an empty list.
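The two error conditions can be illustrated with a small validation helper. This is a minimal sketch under assumptions: `validate_lang_sketch` is a hypothetical name, the messages are invented for illustration, and the order of the checks in the real function is not documented here.

```python
def validate_lang_sketch(lang):
    """Hypothetical helper mirroring the documented error conditions."""
    if lang is None or isinstance(lang, str):
        return  # both forms are accepted as-is
    if not isinstance(lang, list):
        # Anything other than str, list, or None is rejected.
        raise TypeError(
            f"lang must be str, list, or None, got {type(lang).__name__}"
        )
    if not lang:
        # A list is the right type, but it must contain at least one entry.
        raise ValueError("lang must not be an empty list")
```

A caller that builds the list dynamically may want to catch `ValueError` and fall back to `None`, so that the default language applies.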

Return type:

list[str]

Examples

>>> coerce_language(None)
['english']
>>> coerce_language("en")
['english']
>>> coerce_language(["en", "ar"])
['english', 'arabic']
>>> coerce_language("english")
['english']