LanguageDetectionNormalizer
- class scikitplot.corpus.LanguageDetectionNormalizer(fallback_language=None, min_confidence=0.7, overwrite=False)
Detect document language and set CorpusDocument.language.

Uses langdetect (pip install langdetect), which is a port of Google’s language-detection library. Falls back to the provided fallback_language if detection fails or the detected language has confidence below min_confidence.

- Parameters:
- fallback_language : str or None, optional
  ISO 639-1 language code to use when detection fails. None leaves language unchanged on failure. Default: None.
- min_confidence : float, optional
  Minimum probability threshold for accepting a detected language. Must be in [0.0, 1.0]. Default: 0.7.
- overwrite : bool, optional
  When False, skip detection if the document already has a non-None language field. Default: False.
Examples
>>> norm = LanguageDetectionNormalizer(fallback_language="en")
>>> doc = CorpusDocument.create("f.txt", 0, "The quick brown fox.")
>>> result = norm.normalize_doc(doc)
>>> result.language
'en'
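
The interaction between fallback_language, min_confidence, and overwrite described above can be sketched as a small standalone function. This is an illustrative approximation under stated assumptions, not the library's implementation: resolve_language, langdetect_candidates, and the Candidate alias are hypothetical names, the detector is passed in as a callable so the sketch runs without langdetect installed, and it is an assumption that the highest-probability candidate is the one compared against min_confidence and that a probability exactly equal to the threshold is accepted.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Tuple[str, float]  # (ISO 639-1 code, probability); hypothetical alias


def resolve_language(
    text: str,
    detect: Callable[[str], List[Candidate]],
    current: Optional[str] = None,
    fallback_language: Optional[str] = None,
    min_confidence: float = 0.7,
    overwrite: bool = False,
) -> Optional[str]:
    """Hypothetical helper mirroring the documented rules; not the library's code."""
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be in [0.0, 1.0]")
    # overwrite=False: keep an existing language and skip detection entirely.
    if current is not None and not overwrite:
        return current
    try:
        candidates = detect(text)
    except Exception:
        candidates = []  # assumption: any detector error counts as a failed detection
    if candidates:
        lang, prob = max(candidates, key=lambda c: c[1])
        if prob >= min_confidence:
            return lang
    # Detection failed or was below the threshold: use the fallback,
    # or leave the language unchanged when no fallback is given.
    return fallback_language if fallback_language is not None else current


def langdetect_candidates(text: str) -> List[Candidate]:
    """Adapter around langdetect.detect_langs; imported lazily (pip install langdetect)."""
    from langdetect import detect_langs

    return [(c.lang, c.prob) for c in detect_langs(text)]


# Stub detectors stand in for langdetect so the example is deterministic.
confident = lambda text: [("fr", 0.95), ("en", 0.05)]
unsure = lambda text: [("fr", 0.55), ("en", 0.45)]

print(resolve_language("Bonjour", confident))                      # fr
print(resolve_language("ok", unsure, fallback_language="en"))      # en
print(resolve_language("ok", unsure))                              # None
print(resolve_language("Bonjour", confident, current="de"))        # de
```

To try the sketch against real detection, pass langdetect_candidates as the detect argument. Note that langdetect is non-deterministic by default on short or ambiguous text, so results near the threshold may vary between runs.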