iso_to_nltk

scikitplot.corpus.iso_to_nltk(code)

Resolve an ISO 639-1/639-3 code to a canonical NLTK language name.

Parameters:
code : str

An ISO 639-1 two-letter code (e.g. "en", "ar"), an ISO 639-3 three-letter code (e.g. "grc" for Ancient Greek), or an already-canonical NLTK name (e.g. "english"). Case-insensitive.

Returns:
str

Canonical lowercase NLTK-compatible language name. If the code is not found in the registry, the function falls back to returning code itself (so passing "english" returns "english" unchanged).

Examples

>>> iso_to_nltk("en")
'english'
>>> iso_to_nltk("ar")
'arabic'
>>> iso_to_nltk("english")
'english'
>>> iso_to_nltk("grc")
'ancient_greek'
>>> iso_to_nltk("zz")  # unknown → returned as-is
'zz'