ComponentRegistry

class scikitplot.corpus.ComponentRegistry

Central look-up table for corpus pipeline components.

Stores class references (not instances) for four component types: chunkers, filters, readers, and normalizers. Callers retrieve a class and instantiate it with their own parameters.

Notes

The module-level registry singleton is pre-populated with all built-in components via register_builtins. Third-party packages can register additional components after import.

Examples

>>> from scikitplot.corpus._registry import registry
>>> registry.register_builtins()
>>> cls = registry.get_chunker("paragraph")
>>> chunker = cls(min_chars=20)
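The class-reference pattern described above can be sketched without scikitplot installed. This is a minimal stand-in under stated assumptions, not the library's implementation: `MiniRegistry` and `ParagraphChunker` are hypothetical names, and only the chunker table is modeled.

```python
class ParagraphChunker:
    """Hypothetical chunker used only for this illustration."""

    def __init__(self, min_chars=0):
        self.min_chars = min_chars

    def chunk(self, text):
        # Split on blank lines and drop paragraphs shorter than min_chars.
        parts = [p.strip() for p in text.split("\n\n")]
        return [p for p in parts if len(p) >= self.min_chars]


class MiniRegistry:
    """Toy registry that stores classes, not instances."""

    def __init__(self):
        self._chunkers = {}

    def register_chunker(self, name, cls):
        self._chunkers[name] = cls

    def get_chunker(self, name):
        # A plain dict lookup raises KeyError for unknown names.
        return self._chunkers[name]


mini = MiniRegistry()
mini.register_chunker("paragraph", ParagraphChunker)

cls = mini.get_chunker("paragraph")  # a class, not an instance
chunker = cls(min_chars=20)          # the caller supplies the parameters
chunks = chunker.chunk("Too short.\n\nThis paragraph is long enough to keep.")
```

Because the registry hands back a class, two callers can build chunkers with different settings from the same registered name.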
build_chunker(name, **kwargs)

Instantiate the chunker registered under name.

Parameters:
name : str

Registry key.

**kwargs

Constructor keyword arguments.

Returns:
ChunkerBase instance
Raises:
KeyError

If name is not registered.

Return type:

Any

Examples

>>> chunker = registry.build_chunker("paragraph", min_chars=20)
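Conceptually, a build_* call is a lookup followed by a constructor call. The sketch below illustrates that reading with a plain dict; `build`, `table`, and `FixedSizeChunker` are hypothetical names, and the real method may do more than this.

```python
def build(table, name, **kwargs):
    """Look up a class in `table` and instantiate it with `kwargs`."""
    cls = table[name]  # raises KeyError if `name` is not registered
    return cls(**kwargs)


class FixedSizeChunker:
    """Hypothetical chunker used only for this illustration."""

    def __init__(self, size=100):
        self.size = size


table = {"fixed": FixedSizeChunker}

chunker = build(table, "fixed", size=50)

# An unregistered name surfaces as KeyError, matching the documented behavior.
try:
    build(table, "missing")
except KeyError as exc:
    missing_key = exc.args[0]
```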
build_filter(name, **kwargs)

Instantiate the filter registered under name.

Parameters:
name : str
**kwargs
Returns:
FilterBase instance
Return type:

Any

build_normalizer(name, **kwargs)

Instantiate the normalizer registered under name.

Parameters:
name : str
**kwargs
Returns:
NormalizerBase instance
Return type:

Any

get_chunker(name)

Return the chunker class registered under name.

Parameters:
name : str

Registry key.

Returns:
type

The registered chunker class.

Raises:
KeyError

If name is not registered.

Return type:

type

get_filter(name)

Return the filter class registered under name.

Parameters:
name : str
Returns:
type
Raises:
KeyError

If name is not registered.

Return type:

type

get_normalizer(name)

Return the normalizer class registered under name.

Parameters:
name : str
Returns:
type
Raises:
KeyError

If name is not registered.

Return type:

type

get_reader(name)

Return the reader class registered under name.

Parameters:
name : str
Returns:
type
Raises:
KeyError

If name is not registered.

Return type:

type

list_chunkers()

Return a sorted list of registered chunker names.

Return type:

list[str]

list_filters()

Return a sorted list of registered filter names.

Return type:

list[str]

list_normalizers()

Return a sorted list of registered normalizer names.

Return type:

list[str]

list_readers()

Return a sorted list of registered reader names / extensions.

Return type:

list[str]
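One practical use of the list_* methods is to turn a bare KeyError into a message that names the valid keys. The sketch below shows the idea on a plain dict; `list_names`, `get_or_explain`, and the placeholder `object` entries are hypothetical and stand in for the registry's own tables.

```python
def list_names(table):
    """Return the registered names in sorted order."""
    return sorted(table)


def get_or_explain(table, name):
    """Look up `name`, listing the valid keys if it is missing."""
    try:
        return table[name]
    except KeyError:
        available = ", ".join(list_names(table))
        raise KeyError(
            f"unknown component {name!r}; available: {available}"
        ) from None


# Placeholder values; a real table would hold reader classes.
readers = {".txt": object, ".pdf": object, ":url": object}
names = list_names(readers)
```

Sorting is by plain string comparison, so keys that start with "." come before keys that start with ":".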

classmethod load_from_snapshot(snapshot, *, allowed_module_prefixes='scikitplot.')

Reconstruct a registry from a snapshot.

Parameters:
snapshot : dict

Snapshot created by snapshot().

allowed_module_prefixes : str | list[str] | None, default="scikitplot."

If provided, only classes whose module name starts with one of these prefixes are allowed. Recommended for security.

Caution

  • ⚠ Loading an arbitrary fully qualified class name (FQCN) from untrusted JSON is a remote code execution risk.

Returns:
ComponentRegistry

New registry populated from snapshot.

Raises:
ValueError

If snapshot structure is invalid.

TypeError

If resolved class does not match expected base type.

Return type:

ComponentRegistry
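The reason for the prefix allow-list is that resolving a class name means importing its module, and importing runs code. The sketch below shows one way such a check could work; `resolve_class` is a hypothetical helper, raising ValueError for a disallowed module is an assumption, and the demonstration uses standard-library modules so it runs without scikitplot.

```python
import importlib


def resolve_class(fqcn, allowed_module_prefixes="scikitplot."):
    """Import and return the class named by `fqcn` if its module is allowed."""
    module_name, _, class_name = fqcn.rpartition(".")
    if not module_name:
        raise ValueError(f"not a fully qualified class name: {fqcn!r}")

    if allowed_module_prefixes is not None:
        if isinstance(allowed_module_prefixes, str):
            prefixes = (allowed_module_prefixes,)
        else:
            prefixes = tuple(allowed_module_prefixes)
        # Check the prefix BEFORE importing, since importing runs module code.
        if not module_name.startswith(prefixes):
            raise ValueError(f"module {module_name!r} is not in the allow-list")

    obj = getattr(importlib.import_module(module_name), class_name)
    if not isinstance(obj, type):
        raise TypeError(f"{fqcn!r} does not name a class")
    return obj


ordered_dict = resolve_class(
    "collections.OrderedDict", allowed_module_prefixes="collections"
)

# A module outside the allow-list is rejected without being imported.
try:
    resolve_class("subprocess.Popen", allowed_module_prefixes="collections")
    blocked = False
except ValueError:
    blocked = True
```

Passing `None` disables the check entirely, which is only reasonable for snapshots from a trusted source.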

register_builtins()

Register all built-in corpus pipeline components.

Safe to call multiple times; subsequent calls are no-ops. Also triggers the imports needed to populate the DocumentReader registry.

Notes

Importing scikitplot.corpus._readers as a side effect here is intentional: it populates the DocumentReader._registry extension map used by create.

Return type:

None
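The "subsequent calls are no-ops" behavior is commonly implemented with a run-once flag. The following sketch assumes that approach; `BuiltinLoader`, its attributes, and the placeholder classes are hypothetical, and the real method may guard itself differently.

```python
class BuiltinLoader:
    """Hypothetical holder showing a run-once registration guard."""

    def __init__(self):
        self._table = {}
        self._builtins_registered = False
        self.load_count = 0  # counts how often the real work ran

    def register_builtins(self):
        if self._builtins_registered:
            return  # later calls are no-ops
        self._table["paragraph"] = str   # placeholder class, for illustration
        self._table["sentence"] = bytes  # placeholder class, for illustration
        self.load_count += 1
        self._builtins_registered = True


loader = BuiltinLoader()
loader.register_builtins()
loader.register_builtins()  # does nothing the second time
```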

register_chunker(name, cls)

Register a chunker class under name.

Parameters:
name : str

Registry key (lowercase, underscore-separated). Must be non-empty.

cls : type

Concrete class inheriting from ChunkerBase.

Raises:
ValueError

If name is empty.

TypeError

If cls is not a type.

Return type:

None
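The two documented failure modes can be illustrated with a small stand-in. This sketch covers only the checks named above (empty name, non-type cls); the free function `register_chunker`, the stand-in `ChunkerBase`, and `SentenceChunker` are hypothetical, and whether the real method also verifies the base class is not stated here.

```python
class ChunkerBase:
    """Stand-in base class for this illustration only."""


class SentenceChunker(ChunkerBase):
    """Hypothetical concrete chunker."""


def register_chunker(table, name, cls):
    if not name:
        raise ValueError("name must be a non-empty string")
    if not isinstance(cls, type):
        raise TypeError(f"cls must be a class, got {type(cls).__name__}")
    table[name] = cls


table = {}
register_chunker(table, "sentence", SentenceChunker)

# First case: empty name. Second case: an instance passed instead of a class.
errors = []
for bad_name, bad_cls in [("", SentenceChunker), ("sentence2", SentenceChunker())]:
    try:
        register_chunker(table, bad_name, bad_cls)
    except (ValueError, TypeError) as exc:
        errors.append(type(exc).__name__)
```

Passing an instance instead of the class is the mistake the TypeError guards against, since the registry is meant to hold class references.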

register_filter(name, cls)

Register a filter class under name.

Parameters:
name : str
cls : type

Concrete class inheriting from FilterBase.

Return type:

None

register_normalizer(name, cls)

Register a normalizer class under name.

Parameters:
name : str
cls : type

Concrete class inheriting from NormalizerBase.

Return type:

None

register_reader(name, cls)

Register a reader class under name (typically a file extension).

Parameters:
name : str

File extension (e.g. ".txt") or URL scheme key (e.g. ":url").

cls : type

Concrete class inheriting from DocumentReader.

Return type:

None
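To show how extension keys and the ":url" key might be used together, the sketch below maps a source string to a registry key. How scikitplot actually derives the key is not described here, so `reader_key`, `TextReader`, and `UrlReader` are hypothetical, and the lowercase-suffix and http/https rules are assumptions.

```python
from pathlib import PurePosixPath


class TextReader:
    """Hypothetical reader for plain-text files."""


class UrlReader:
    """Hypothetical reader for web resources."""


readers = {".txt": TextReader, ":url": UrlReader}


def reader_key(source):
    """Map a path or URL to a registry key (an assumed convention)."""
    if source.startswith(("http://", "https://")):
        return ":url"
    # Lowercase the suffix so ".TXT" and ".txt" select the same reader.
    return PurePosixPath(source).suffix.lower()


txt_cls = readers[reader_key("notes/README.TXT")]
url_cls = readers[reader_key("https://example.com/page")]
```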

snapshot()

Return a JSON-safe snapshot of all registered components.

Returns:
dict[str, dict[str, str]]

Keys: "chunkers", "filters", "readers", "normalizers". Values: dicts mapping name → class qualname.

Return type:

dict[str, dict[str, str]]
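A snapshot of this shape contains only strings, so it survives a JSON round trip unchanged. The sketch below builds one from plain dicts; `take_snapshot` is a hypothetical helper, the standard-library classes are placeholders for real components, and storing the module path together with the qualname is an assumption made so the class could be re-imported later.

```python
import json
from collections import Counter
from fractions import Fraction


def take_snapshot(tables):
    """Map each component type to {name: 'module.QualName'} strings."""
    return {
        kind: {
            name: f"{cls.__module__}.{cls.__qualname__}"
            for name, cls in sorted(table.items())
        }
        for kind, table in tables.items()
    }


# Placeholder classes stand in for real chunkers and readers.
tables = {
    "chunkers": {"counter": Counter},
    "filters": {},
    "readers": {".frac": Fraction},
    "normalizers": {},
}

snap = take_snapshot(tables)
restored = json.loads(json.dumps(snap))  # strings only, so this is lossless
```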