FactoryCorpusBuilder

class scikitplot.corpus.FactoryCorpusBuilder(config=None, *, factories=None)

CorpusBuilder extended with pluggable component factories.

All public methods (build, add, search, to_langchain, etc.) are available through delegation to the wrapped CorpusBuilder. When a factory is provided for a given component, it replaces the corresponding lazy-init method.

Parameters:
config : BuilderConfig or None, optional

Pipeline configuration. None uses defaults.

factories : BuilderFactories or None, optional

Component factory callables. None disables all overrides.

Notes

User note: Use FactoryCorpusBuilder when you need to inject components that configuration alone cannot describe, such as custom readers with per-source state, enrichers backed by remote APIs, or embedding engines with non-standard initialisation.
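The example in this page passes a zero-argument callable as a factory. Assuming that is the expected call shape, one way to carry state such as an API endpoint or a retry budget into a factory is to bind it ahead of time with functools.partial. The sketch below is only an illustration: RemoteEnricher, its fields, and the endpoint URL are hypothetical stand-ins and are not part of scikitplot.

```python
from dataclasses import dataclass
from functools import partial


@dataclass
class RemoteEnricher:
    """Hypothetical enricher backed by a remote API (not a scikitplot class)."""

    endpoint: str
    max_retries: int = 3


# Bind the state now; the result is a zero-argument callable.
enricher_factory = partial(
    RemoteEnricher,
    endpoint="https://example.invalid/enrich",  # placeholder URL
    max_retries=5,
)

# Each call constructs a fresh, fully configured instance.
enricher = enricher_factory()
print(enricher)
```

A partial keeps the factory picklable and inspectable, which a lambda or nested function generally does not; either form works if the builder only needs to call it.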

Developer note: Factory injection is performed by overriding the private _get_* lazy-init methods inherited from CorpusBuilder.
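The override pattern described in the developer note can be sketched in isolation. This is a minimal illustration under stated assumptions, not the library's actual implementation: BaseBuilder, FactoryBuilder, _get_engine, and the string "engines" are all hypothetical names chosen for the example.

```python
from typing import Any, Callable, Optional


class BaseBuilder:
    """Hypothetical stand-in for a builder with lazy component creation."""

    def __init__(self) -> None:
        self._engine: Optional[Any] = None

    def _get_engine(self) -> Any:
        # Create the default component on first use, then reuse it.
        if self._engine is None:
            self._engine = "default-engine"
        return self._engine

    def embed(self, text: str) -> str:
        return f"{self._get_engine()}:{text}"


class FactoryBuilder(BaseBuilder):
    """Hypothetical subclass that lets a factory replace the default."""

    def __init__(self, engine_factory: Optional[Callable[[], Any]] = None) -> None:
        super().__init__()
        self._engine_factory = engine_factory

    def _get_engine(self) -> Any:
        # No factory supplied: fall back to the inherited lazy-init path.
        if self._engine_factory is None:
            return super()._get_engine()
        # Factory supplied: call it once and cache the result.
        if self._engine is None:
            self._engine = self._engine_factory()
        return self._engine


default_builder = FactoryBuilder()
custom_builder = FactoryBuilder(engine_factory=lambda: "custom-engine")
print(default_builder.embed("x"), custom_builder.embed("x"))
```

Because public methods such as embed only reach the component through the lazy-init hook, overriding that one hook is enough to swap the component everywhere it is used.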

Examples

Inject a custom embedding engine factory:

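# Assumed import location: FactoryCorpusBuilder is documented under
# scikitplot.corpus; BuilderConfig and BuilderFactories are assumed to
# be importable from the same module.
from scikitplot.corpus import BuilderConfig, BuilderFactories, FactoryCorpusBuilder

def my_embed_factory():
    # MyEmbeddingEngine is a placeholder for your own engine class.
    return MyEmbeddingEngine(model="custom-embedder-v2")

factories = BuilderFactories(embedding_engine_factory=my_embed_factory)
builder = FactoryCorpusBuilder(
    config=BuilderConfig(embed=True, build_index=True),
    factories=factories,
)
result = builder.build("./papers/")
results = builder.search("attention mechanism")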

add(sources, **kwargs)

Add sources to the existing corpus. Delegates to the inner builder.

Return type:

Any

build(sources, **kwargs)

Build the corpus. Delegates to the inner builder, with factory overrides applied.

Return type:

Any

close()

Clean up temporary files.

Return type:

None
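Since close() exists to clean up temporary files, it is worth making sure it runs even when a build fails. Whether FactoryCorpusBuilder can be used directly in a with statement is not stated here, so the sketch below relies only on contextlib.closing, which works with any object that has a close() method. TempBackedBuilder is a hypothetical stand-in used to keep the example self-contained.

```python
from contextlib import closing


class TempBackedBuilder:
    """Hypothetical stand-in exposing the same close() contract."""

    def __init__(self) -> None:
        self.closed = False

    def build(self, source: str) -> str:
        if source == "":
            raise ValueError("empty source")
        return f"built:{source}"

    def close(self) -> None:
        self.closed = True


builder = TempBackedBuilder()
try:
    with closing(builder) as b:
        b.build("")  # raises, but close() still runs on exit
except ValueError:
    pass

print(builder.closed)
```

A plain try/finally block that calls builder.close() in the finally clause is equivalent.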

export(path, **kwargs)

Export documents to a file.

Return type:

Any

search(query, **kwargs)

Search the corpus. Delegates to the inner builder.

Return type:

Any

to_huggingface()

Export as HuggingFace Dataset.

Return type:

Any

to_jsonl()

Export as JSONL lines.

Return type:

Any

to_langchain()

Export as LangChain documents.

Return type:

Any

to_langchain_retriever()

Create a LangChain retriever.

Return type:

Any

to_langgraph_state(**kwargs)

Export as LangGraph state.

Parameters:

kwargs (Any)

Return type:

Any

to_mcp_resources(**kwargs)

Export as MCP resources.

Parameters:

kwargs (Any)

Return type:

Any

to_mcp_tool_result(query, **kwargs)

Search the corpus and format the results as an MCP tool result.

Return type:

Any

to_rag_tuples()

Export as RAG tuples.

Return type:

Any