StorageBase

class scikitplot.corpus.StorageBase

Abstract base class for all corpus storage backends.

Implementations must not acquire any external resource at construction time; acquisition is deferred until the first save or get call.

See also

InMemoryStorage

Dict-backed, testing only.

JSONLStorage

Flat JSONL file, zero dependencies.

SQLiteStorage

SQLite with FTS5, no external dependencies.
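One way to meet the deferred-acquisition requirement described above is to store only the location in the constructor and open the resource on first use. The sketch below is a minimal illustration under stated assumptions: `Doc`, `LazyStorage`, and `LazySQLiteStorage` are hypothetical stand-ins written for this example, not the actual `CorpusDocument`, `StorageBase`, or `SQLiteStorage` classes, and the table layout is invented.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for CorpusDocument (doc_id and text only)."""
    doc_id: str
    text: str


class LazyStorage(ABC):
    """Hypothetical stand-in for the save/get part of the storage contract."""

    @abstractmethod
    def save(self, doc: Doc) -> None: ...

    @abstractmethod
    def get(self, doc_id: str) -> Optional[Doc]: ...


class LazySQLiteStorage(LazyStorage):
    def __init__(self, path: str = ":memory:") -> None:
        # Only remember the location; nothing is opened here.
        self._path = path
        self._conn: Optional[sqlite3.Connection] = None

    def _connection(self) -> sqlite3.Connection:
        # Open the connection on first use and reuse it afterwards.
        if self._conn is None:
            self._conn = sqlite3.connect(self._path)
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS docs "
                "(doc_id TEXT PRIMARY KEY, text TEXT)"
            )
        return self._conn

    def save(self, doc: Doc) -> None:
        conn = self._connection()
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR REPLACE INTO docs (doc_id, text) VALUES (?, ?)",
                (doc.doc_id, doc.text),
            )

    def get(self, doc_id: str) -> Optional[Doc]:
        row = self._connection().execute(
            "SELECT doc_id, text FROM docs WHERE doc_id = ?", (doc_id,)
        ).fetchone()
        return Doc(*row) if row else None


store = LazySQLiteStorage()
assert store._conn is None       # nothing opened at construction time
store.save(Doc("a1", "hello"))
assert store._conn is not None   # opened by the first save
```

Keeping the constructor free of side effects means a backend can be created in configuration code or tests without touching the file system or a database.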

count()

Return the total number of stored documents.

The default implementation runs a StorageQuery with no filters. Override it in backends that can compute the count more efficiently.

Returns:
int
Return type:

int
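The relationship between the generic default and a cheaper override can be sketched as follows. This is an illustration only: `Query`, `Result`, `Base`, and `DictStorage` are hypothetical stand-ins, the `text_contains` and `documents` fields are invented for the example, and the real default may derive the count from QueryResult in a different way.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Query:
    """Hypothetical stand-in for StorageQuery; None means 'no filter'."""
    text_contains: Optional[str] = None


@dataclass
class Result:
    """Hypothetical stand-in for QueryResult."""
    documents: List[str] = field(default_factory=list)


class Base(ABC):
    @abstractmethod
    def query(self, q: Query) -> Result: ...

    def count(self) -> int:
        # Generic fallback: run an unfiltered query and count the matches.
        return len(self.query(Query()).documents)


class DictStorage(Base):
    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def query(self, q: Query) -> Result:
        hits = [
            text for text in self._docs.values()
            if q.text_contains is None or q.text_contains in text
        ]
        return Result(hits)

    def count(self) -> int:
        # Cheaper override: the dict already knows its own size.
        return len(self._docs)


store = DictStorage()
store._docs.update({"a": "alpha", "b": "beta"})
assert store.count() == Base.count(store)  # override agrees with the fallback
```

The fallback materializes every match just to count them, which is why a backend with a native size or `COUNT(*)` operation would usually override it.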

abstractmethod get(doc_id)

Retrieve a document by its identifier.

Parameters:
doc_id : str

The CorpusDocument.doc_id to look up.

Returns:
CorpusDocument or None

The stored document, or None if not found.

Return type:

CorpusDocument | None
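Because a missing identifier yields None rather than an exception, callers need an explicit check before using the result. A minimal sketch, assuming a dict-backed stand-in (`Doc`, `DictStorage`, and the helper `text_or_default` are hypothetical names introduced for this example):

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for CorpusDocument."""
    doc_id: str
    text: str


class DictStorage:
    """Hypothetical backend exposing only save and get."""

    def __init__(self) -> None:
        self._docs: Dict[str, Doc] = {}

    def save(self, doc: Doc) -> None:
        self._docs[doc.doc_id] = doc

    def get(self, doc_id: str) -> Optional[Doc]:
        # dict.get returns None for unknown keys, matching the contract.
        return self._docs.get(doc_id)


def text_or_default(storage: DictStorage, doc_id: str, default: str = "") -> str:
    # Check for None explicitly instead of relying on an exception.
    doc = storage.get(doc_id)
    return default if doc is None else doc.text


store = DictStorage()
store.save(Doc("a1", "hello"))
```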

abstractmethod query(q)

Retrieve documents matching the query parameters.

Parameters:
q : StorageQuery

Query specification.

Returns:
QueryResult

Return type:

QueryResult

abstractmethod save(doc)

Persist a single document.

Parameters:
doc : CorpusDocument

Document to store. Must be validated before calling.

Return type:

None

abstractmethod save_batch(docs)

Persist a batch of documents atomically.

Parameters:
docs : sequence of CorpusDocument

Documents to store. May be empty (no-op).

Return type:

None
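For a SQL-backed store, one plausible way to satisfy the atomicity and empty-batch requirements is to write the whole batch inside a single transaction. The sketch below uses the standard-library `sqlite3` module; the `Doc` class, the free function `save_batch`, and the `docs` table are assumptions made for illustration and do not describe how SQLiteStorage is actually implemented.

```python
import sqlite3
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for CorpusDocument."""
    doc_id: str
    text: str


def save_batch(conn: sqlite3.Connection, docs: Sequence[Doc]) -> None:
    if not docs:
        return  # an empty batch is a no-op
    with conn:  # one transaction: commit on success, roll back on any error
        conn.executemany(
            "INSERT INTO docs (doc_id, text) VALUES (?, ?)",
            [(d.doc_id, d.text) for d in docs],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (doc_id TEXT PRIMARY KEY, text TEXT NOT NULL)")

save_batch(conn, [])                                  # nothing happens
save_batch(conn, [Doc("a", "one"), Doc("b", "two")])  # both rows stored

try:
    # The second row violates the primary key, so "c" must not persist either.
    save_batch(conn, [Doc("c", "three"), Doc("a", "duplicate")])
except sqlite3.IntegrityError:
    pass

stored = [row[0] for row in conn.execute("SELECT doc_id FROM docs ORDER BY doc_id")]
print(stored)  # ['a', 'b']
```

A file-based backend would need a different mechanism to get the same all-or-nothing behavior, for example writing to a temporary file and renaming it into place.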