ComponentRegistry
- class scikitplot.corpus.ComponentRegistry
Central look-up table for corpus pipeline components.
Stores class references (not instances) for four component types: chunkers, filters, readers, and normalizers. Callers retrieve a class and instantiate it with their own parameters.
Notes
The module-level registry singleton is pre-populated with all built-in components via register_builtins. Third-party packages can register additional components after import.
Examples
>>> from scikitplot.corpus._registry import registry
>>> registry.register_builtins()
>>> cls = registry.get_chunker("paragraph")
>>> chunker = cls(min_chars=20)
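The point of storing class references rather than instances can be illustrated with a toy registry. This is a minimal sketch, not the library's implementation: MiniRegistry and ParagraphChunker are hypothetical stand-ins invented for this example.

```python
class ParagraphChunker:
    """Hypothetical stand-in for a built-in chunker."""

    def __init__(self, min_chars=0):
        self.min_chars = min_chars


class MiniRegistry:
    """Toy registry (illustration only): maps names to classes, never to instances."""

    def __init__(self):
        self._chunkers = {}

    def register_chunker(self, name, cls):
        self._chunkers[name] = cls

    def get_chunker(self, name):
        # Return the class itself; the caller decides how to construct it.
        return self._chunkers[name]


mini = MiniRegistry()
mini.register_chunker("paragraph", ParagraphChunker)

cls = mini.get_chunker("paragraph")
a = cls(min_chars=20)   # each caller builds its own instance
b = cls(min_chars=100)  # with its own parameters
```

Because the registry hands back a class, two callers can configure the same component differently without sharing state.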
- build_chunker(name, **kwargs)
Instantiate the chunker registered under name.
- Parameters:
- name : str
Registry key.
- **kwargs
Constructor keyword arguments.
- Returns:
- ChunkerBase instance
- Raises:
- KeyError
If name is not registered.
Examples
>>> chunker = registry.build_chunker("paragraph", min_chars=20)
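Conceptually, building by name is a lookup followed by a constructor call. The sketch below assumes the lookup is a plain dict access, which would explain the documented KeyError. SentenceChunker and the module-level _chunkers table are hypothetical names used only for illustration.

```python
class SentenceChunker:
    """Hypothetical chunker used only for this illustration."""

    def __init__(self, min_chars=0):
        self.min_chars = min_chars


_chunkers = {"sentence": SentenceChunker}  # name -> class (assumed layout)


def build_chunker(name, **kwargs):
    # A plain dict lookup raises KeyError for unknown names.
    cls = _chunkers[name]
    # Keyword arguments are forwarded to the constructor unchanged.
    return cls(**kwargs)


chunker = build_chunker("sentence", min_chars=20)

try:
    build_chunker("no_such_chunker")
except KeyError as exc:
    missing = exc.args[0]  # the key that was not found
```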
- classmethod load_from_snapshot(snapshot, *, allowed_module_prefixes='scikitplot.')
Reconstruct a registry from a snapshot.
- Parameters:
- snapshot : dict
Snapshot created by snapshot().
- allowed_module_prefixes : str | list[str] | None, default="scikitplot."
If provided, only classes whose module starts with one of these prefixes are allowed. Recommended for security.
Caution
Loading arbitrary fully qualified class names (FQCNs) from untrusted JSON is a remote code execution risk.
- Returns:
- ComponentRegistry
New registry populated from snapshot.
- Raises:
- ValueError
If snapshot structure is invalid.
- TypeError
If resolved class does not match expected base type.
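The allow-list idea can be sketched with the standard library alone. This is an assumption-laden illustration, not the actual body of load_from_snapshot: the helper name resolve_class is hypothetical, and prefix matching via str.startswith is assumed from the parameter description.

```python
import importlib


def resolve_class(fqcn, allowed_module_prefixes):
    """Import and return the class named by fqcn if its module is allowed."""
    if isinstance(allowed_module_prefixes, str):
        allowed_module_prefixes = [allowed_module_prefixes]

    module_name, _, class_name = fqcn.rpartition(".")
    if not module_name:
        raise ValueError(f"not a fully qualified class name: {fqcn!r}")

    # Check the prefix BEFORE importing: importing a module runs its code.
    if allowed_module_prefixes is not None and not any(
        module_name.startswith(prefix) for prefix in allowed_module_prefixes
    ):
        raise ValueError(f"module {module_name!r} is not in the allow-list")

    obj = getattr(importlib.import_module(module_name), class_name)
    if not isinstance(obj, type):
        raise TypeError(f"{fqcn!r} does not resolve to a class")
    return obj


# Allowed: the module "collections.abc" starts with "collections."
mapping_cls = resolve_class("collections.abc.Mapping", "collections.")

# Rejected without ever importing the module.
try:
    resolve_class("subprocess.Popen", ["collections."])
    blocked = False
except ValueError:
    blocked = True
```

Under this assumption, the trailing dot in a prefix such as "scikitplot." matters: without it, a lookalike top-level package whose name merely begins with the same letters would also pass the check.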
- register_builtins()
Register all built-in corpus pipeline components.
Safe to call multiple times; subsequent calls are no-ops. Triggers the necessary imports to populate the DocumentReader registry as well.
Notes
Importing scikitplot.corpus._readers as a side effect here is intentional: it populates the DocumentReader._registry extension map used by create.
- Return type:
None
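One common way to make repeated calls no-ops is a guard flag. The sketch below shows that pattern under the assumption that a simple boolean is enough. The class, the flag name, and the placeholder component are all hypothetical, and the real method may be implemented differently.

```python
class MiniRegistry:
    """Toy registry (illustration only) with an idempotent register_builtins."""

    def __init__(self):
        self._chunkers = {}
        self._builtins_registered = False  # hypothetical guard flag
        self.calls_that_did_work = 0       # counter added only for the demo

    def register_builtins(self):
        if self._builtins_registered:
            return  # subsequent calls are no-ops
        # A real implementation would import its component modules here,
        # relying on those imports to populate other registries as a side effect.
        self._chunkers["whole_text"] = str  # placeholder class for the demo
        self._builtins_registered = True
        self.calls_that_did_work += 1


mini = MiniRegistry()
mini.register_builtins()
mini.register_builtins()
mini.register_builtins()
```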
- register_chunker(name, cls)
Register a chunker class under name.
- Parameters:
- name : str
Registry key (lowercase, underscore-separated). Must be non-empty.
- cls : type
Concrete class inheriting from ChunkerBase.
- Raises:
- ValueError
If name is empty.
- TypeError
If cls is not a type.
- Return type:
None
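The two documented failure modes can be reproduced with a small validation sketch. It checks only what the Raises section states (an empty name, a non-type cls). The free function, the table argument, and the DemoChunker class are hypothetical.

```python
class DemoChunker:
    """Hypothetical chunker class used only for this illustration."""


def register_chunker(table, name, cls):
    # Mirror the documented checks: empty name -> ValueError,
    # anything that is not a class -> TypeError.
    if not name:
        raise ValueError("name must be a non-empty string")
    if not isinstance(cls, type):
        raise TypeError(f"cls must be a class, got {type(cls).__name__}")
    table[name] = cls


table = {}
register_chunker(table, "demo", DemoChunker)

errors = []
for bad_name, bad_cls in [("", DemoChunker), ("instance", DemoChunker())]:
    try:
        register_chunker(table, bad_name, bad_cls)
    except (ValueError, TypeError) as exc:
        errors.append(type(exc).__name__)
```

Passing an instance instead of the class is the likeliest mistake here, since the registry stores class references.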
- register_filter(name, cls)
Register a filter class under name.
- Parameters:
- name : str
- cls : type
Concrete class inheriting from FilterBase.
- Return type:
None
- register_reader(name, cls)
Register a reader class under name (typically a file extension).
- Parameters:
- name : str
File extension (e.g. ".txt") or URL scheme key (e.g. ":url").
- cls : type
Concrete class inheriting from DocumentReader.
- Return type:
None