CustomFilter
- class scikitplot.corpus.CustomFilter(fn, *, name=None)
Wrap any callable as a FilterBase.
- Parameters:
- fn : callable
Filter callable. Signature:
def fn(doc: CorpusDocument) -> bool: ...
Return True to include the document, False to discard it.
- name : str, optional
Human-readable label used in __repr__.
- Raises:
- TypeError
If fn is not callable.
See also
scikitplot.corpus._base.DefaultFilter
Built-in noise filter.
scikitplot.corpus._base.FilterBase
Abstract base class.
Notes
User note: Use this filter to apply domain-specific inclusion criteria, such as language detection, source-type gates, or keyword-presence checks.
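Each kind of criterion mentioned in the note can be written as a small predicate with the `fn(doc) -> bool` signature. The sketch below is illustrative only: `StubDoc` is a hypothetical stand-in for `CorpusDocument`, the `source_type` field is an assumption that this page does not confirm, and the helper names (`mentions`, `all_of`) are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubDoc:
    """Hypothetical stand-in for CorpusDocument (not the library class)."""
    text: str
    language: Optional[str] = None
    source_type: Optional[str] = None  # assumed field, for illustration only


def is_english_or_unknown(doc) -> bool:
    # Language gate: keep English and documents with no detected language.
    return doc.language is None or doc.language == "en"


def is_from_pdf(doc) -> bool:
    # Source-type gate (relies on the assumed `source_type` field).
    return doc.source_type == "pdf"


def mentions(keyword: str):
    # Keyword-presence check, built as a closure so the keyword is configurable.
    needle = keyword.lower()

    def predicate(doc) -> bool:
        return needle in doc.text.lower()

    return predicate


def all_of(*predicates):
    # Combine predicates; a document is kept only if every one passes.
    def combined(doc) -> bool:
        return all(p(doc) for p in predicates)

    return combined


keep = all_of(is_english_or_unknown, is_from_pdf, mentions("treatment"))
docs = [
    StubDoc("Treatment outcomes improved.", "en", "pdf"),
    StubDoc("Treatment outcomes improved.", "de", "pdf"),
    StubDoc("Unrelated text.", None, "pdf"),
]
kept = [d for d in docs if keep(d)]
```

A combined predicate such as `keep` could then be passed as `fn` to `CustomFilter`.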
Examples
Keep only documents that contain the word “treatment” and are either English or have no detected language:
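from pathlib import Path

# DocumentReader's import location is assumed here; adjust to your installation.
from scikitplot.corpus import CustomFilter, DocumentReader


def medical_filter(doc):
    # Keep English or language-undetected documents that mention "treatment".
    return (
        doc.language is None or doc.language == "en"
    ) and "treatment" in doc.text.lower()


reader = DocumentReader.create(
    Path("research.pdf"),
    filter_=CustomFilter(medical_filter),
)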
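To make the documented contract concrete (callable check, optional label shown in `__repr__`, boolean keep/discard decision), the sketch below mimics it with a stand-in class that runs without the library. `SketchFilter`, its `keep` method, the fallback to the function's own name, and `StubDoc` are all hypothetical; this is not the library's implementation, and the real `FilterBase` interface may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StubDoc:
    """Hypothetical stand-in for CorpusDocument."""
    text: str
    language: Optional[str] = None


class SketchFilter:
    """Illustrative wrapper; NOT scikitplot's CustomFilter."""

    def __init__(self, fn: Callable[[StubDoc], bool], *, name: Optional[str] = None):
        # Mirror the documented behavior: reject non-callables with TypeError.
        if not callable(fn):
            raise TypeError(f"fn must be callable, got {type(fn).__name__}")
        self.fn = fn
        # Assumed fallback: use the function's own name when no label is given.
        self.name = name or getattr(fn, "__name__", "custom")

    def keep(self, doc: StubDoc) -> bool:  # hypothetical method name
        return bool(self.fn(doc))

    def __repr__(self) -> str:
        return f"SketchFilter(name={self.name!r})"


def medical_filter(doc):
    return (
        doc.language is None or doc.language == "en"
    ) and "treatment" in doc.text.lower()


flt = SketchFilter(medical_filter, name="medical")
results = [
    flt.keep(StubDoc("A new treatment was tested.", "en")),
    flt.keep(StubDoc("A new treatment was tested.", "fr")),
    flt.keep(StubDoc("No relevant content.", None)),
]

# Passing a non-callable should fail fast at construction time.
try:
    SketchFilter("not a function")
    raised = False
except TypeError:
    raised = True
```

Validating `fn` in the constructor means a misconfigured filter fails at setup rather than partway through reading a corpus.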