CustomFilter

class scikitplot.corpus.CustomFilter(fn, *, name=None)

Wrap any callable as a FilterBase.

Parameters:
fn : callable

Filter callable. Signature:

def fn(doc: CorpusDocument) -> bool: ...

Return True to include the document, False to discard it.

name : str, optional

Human-readable label used in __repr__.

Raises:
TypeError

If fn is not callable.
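
The contract described so far (wrap a callable, reject non-callables with TypeError, use name in __repr__) can be illustrated with a small stand-in. This is a minimal sketch that does not import scikitplot: SketchFilter and Doc are hypothetical names, and the fallback label and the repr format are assumptions rather than the library's actual behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for CorpusDocument (only the fields used here)."""
    text: str
    language: Optional[str] = None


class SketchFilter:
    """Illustrative stand-in, not scikitplot's CustomFilter."""

    def __init__(self, fn: Callable[[Doc], bool], *, name: Optional[str] = None):
        if not callable(fn):
            # Mirrors the documented contract: a non-callable raises TypeError.
            raise TypeError(f"fn must be callable, got {type(fn).__name__}")
        self._fn = fn
        # Assumption: fall back to the callable's own name when none is given.
        self._name = name or getattr(fn, "__name__", type(fn).__name__)

    def include(self, doc: Doc) -> bool:
        # Delegate the decision to the wrapped callable.
        return bool(self._fn(doc))

    def __repr__(self) -> str:
        # Assumption: the real repr format may differ.
        return f"SketchFilter(name={self._name!r})"


def is_long(doc: Doc) -> bool:
    """Keep documents with at least 20 characters of text."""
    return len(doc.text) >= 20


long_only = SketchFilter(is_long, name="min-20-chars")
docs = [Doc("too short"), Doc("this sentence is long enough to keep")]
kept = [d for d in docs if long_only.include(d)]
```

Validating fn in the constructor means a misconfigured filter fails when it is built rather than on the first document it sees.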

See also

scikitplot.corpus._base.DefaultFilter

Built-in noise filter.

scikitplot.corpus._base.FilterBase

Abstract base class.

Notes

User note: Use CustomFilter to apply domain-specific inclusion criteria, such as language detection, source-type gates, or keyword presence checks.
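
One way to keep such criteria reusable is to write each as a small predicate and combine them into a single callable. The sketch below assumes a document object with text and language attributes (as used in the example on this page) plus a hypothetical source_type attribute; Doc and the helper functions are illustrative names, not part of scikitplot.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for CorpusDocument."""
    text: str
    language: Optional[str] = None
    source_type: Optional[str] = None  # hypothetical attribute


Predicate = Callable[[Doc], bool]


def language_in(allowed: Iterable[str]) -> Predicate:
    """Pass documents whose language is allowed or was not detected."""
    allowed_set = set(allowed)
    return lambda doc: doc.language is None or doc.language in allowed_set


def source_type_is(kind: str) -> Predicate:
    """Pass documents that come from the given source type."""
    return lambda doc: doc.source_type == kind


def has_keyword(word: str) -> Predicate:
    """Pass documents whose text contains the word, ignoring case."""
    needle = word.lower()
    return lambda doc: needle in doc.text.lower()


def all_of(*preds: Predicate) -> Predicate:
    """Combine predicates: a document passes only if every predicate passes."""
    return lambda doc: all(p(doc) for p in preds)


clinical_pdf = all_of(
    language_in({"en"}),
    source_type_is("pdf"),
    has_keyword("treatment"),
)

docs = [
    Doc("New treatment options", "en", "pdf"),
    Doc("Nouveau traitement", "fr", "pdf"),
    Doc("Treatment summary", "en", "html"),
    Doc("Unrelated notes", None, "pdf"),
]
results = [clinical_pdf(d) for d in docs]
```

Because the combined predicate has the fn(doc) -> bool signature, it could be passed as CustomFilter(clinical_pdf, name="clinical-pdf").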

Examples

Keep documents that are in English (or have no detected language) and contain the word “treatment”:

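from pathlib import Path

# Assumption: DocumentReader is importable from the same package as CustomFilter.
from scikitplot.corpus import CustomFilter, DocumentReader

def medical_filter(doc):
    # Pass documents with no detected language; otherwise require English.
    language_ok = doc.language is None or doc.language == "en"
    # Case-insensitive keyword check on the document text.
    return language_ok and "treatment" in doc.text.lower()

reader = DocumentReader.create(
    Path("research.pdf"),
    filter_=CustomFilter(medical_filter),
)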

include(doc)

Return the result of the user-supplied filter callable.

Parameters:
doc : CorpusDocument

Document to evaluate.

Returns:
bool

True to include; False to discard.
