JSONLStorage

class scikitplot.corpus.JSONLStorage(path)

Append-friendly JSONL (newline-delimited JSON) flat-file store.

Documents are stored one JSON object per line. On construction, the file is read into an in-memory index keyed by doc_id, which makes get an O(1) lookup. Writes go to the file and update the index at the same time.
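A minimal sketch of the load step, assuming each line holds a plain JSON object with a top-level "doc_id" key (the real CorpusDocument serialization may differ). The helper name load_index is hypothetical. Once the dict is built, the document count is len(index) and a lookup is index.get(doc_id), both O(1) on average.

```python
import json
import tempfile
from pathlib import Path


def load_index(path: Path) -> dict[str, dict]:
    """Read a JSONL file into an in-memory dict keyed by doc_id (hypothetical layout)."""
    index: dict[str, dict] = {}
    if not path.exists():
        path.touch()  # create an empty store, mirroring "Created if absent"
        return index
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            index[record["doc_id"]] = record  # a later line with the same id wins
    return index


# Usage: build a two-document store and query the index.
store_path = Path(tempfile.mkdtemp()) / "corpus.jsonl"
store_path.write_text(
    '{"doc_id": "a", "text": "alpha"}\n{"doc_id": "b", "text": "beta"}\n',
    encoding="utf-8",
)
index = load_index(store_path)
print(len(index))            # document count
print(index.get("a"))        # lookup by doc_id
print(index.get("missing"))  # unknown ids give None
```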

Warning

save_batch writes all documents to a temporary file and then atomically renames it over the original. This can reorder existing documents on disk: new documents are placed at the end. Concurrent writers would corrupt the file, so use only one writer at a time.
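The temp-file-then-rename pattern described in this warning can be sketched with the standard library as follows. The function name rewrite_atomically and the one-dict-per-line record layout are assumptions, not this class's internals. The temporary file is created in the same directory as the target because os.replace is only atomic on POSIX when source and destination are on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path


def rewrite_atomically(path: Path, records: list[dict]) -> None:
    """Write every record to a temp file beside `path`, then rename it over `path`."""
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            for record in records:
                fh.write(json.dumps(record) + "\n")  # one JSON object per line
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_name, path)  # atomic swap on POSIX; replaces any existing file
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)  # do not leave a stray temp file behind
        raise


# Usage: rewrite a small store in a scratch directory.
workdir = Path(tempfile.mkdtemp())
target = workdir / "corpus.jsonl"
rewrite_atomically(target, [{"doc_id": "a"}, {"doc_id": "b"}])
print(target.read_text(encoding="utf-8").splitlines())
```

Readers see either the old file or the new one, never a half-written file. The pattern does not coordinate multiple writers, which is why one writer at a time is still required.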

Parameters:
path : pathlib.Path or str

Path to the .jsonl file. Created if absent.


Examples

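>>> from pathlib import Path
>>> from scikitplot.corpus import JSONLStorage
>>> store = JSONLStorage(Path("corpus.jsonl"))
>>> store.save(doc)  # doc: an existing CorpusDocument
>>> store.count()  # assuming corpus.jsonl started empty
1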

count()

Return the total number of stored documents in O(1) time.

Return type:

int

get(doc_id)

Retrieve a document by doc_id, or None if no document has that ID.

Parameters:
doc_id : str

Return type:

CorpusDocument | None

query(q)

Filter documents by the parameters in q.

Full-text search is not supported; any full-text search term in q is ignored.

Parameters:
q : StorageQuery

Return type:

QueryResult
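The fields of StorageQuery and QueryResult are not documented on this page, so the sketch below uses a hypothetical SimpleQuery dataclass (the names filters, text and limit are assumptions) and returns a plain list instead of a QueryResult. It shows one plausible way to filter without full-text support: a linear scan over the in-memory index with exact-match comparisons, where the text term is accepted but ignored.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SimpleQuery:
    """Hypothetical stand-in for StorageQuery; these field names are assumptions."""

    filters: dict[str, Any] = field(default_factory=dict)  # exact-match field filters
    text: Optional[str] = None  # full-text term: accepted but never used
    limit: Optional[int] = None  # maximum number of hits to return


def run_query(index: dict[str, dict], q: SimpleQuery) -> list[dict]:
    """Scan the in-memory index and keep records that match every filter."""
    # q.text is deliberately ignored, as full-text search is unsupported.
    hits = [
        record
        for record in index.values()
        if all(record.get(key) == value for key, value in q.filters.items())
    ]
    return hits if q.limit is None else hits[: q.limit]


# Usage: filter a three-document index by language.
index = {
    "a": {"doc_id": "a", "lang": "en"},
    "b": {"doc_id": "b", "lang": "de"},
    "c": {"doc_id": "c", "lang": "en"},
}
english = run_query(index, SimpleQuery(filters={"lang": "en"}, text="ignored"))
print([r["doc_id"] for r in english])
```

A scan is O(n) in the number of stored documents; only lookups by doc_id benefit from the index.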

save(doc)

Append or update a document.

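If a document with the same doc_id already exists, the whole file is rewritten to replace it (update semantics), so the cost grows with the size of the store. If the doc_id is new, the document is appended to the end of the file (O(1) write).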

Parameters:
doc : CorpusDocument

Return type:

None
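The upsert rule above (append when new, rewrite when existing) can be sketched as follows, assuming records are plain dicts with a "doc_id" key. The function name save_record is hypothetical, and the rewrite branch here is not atomic, for brevity; it only illustrates which branch costs what.

```python
import json
import tempfile
from pathlib import Path


def save_record(path: Path, index: dict[str, dict], record: dict) -> None:
    """Upsert one record: append if its doc_id is new, else rewrite the file."""
    doc_id = record["doc_id"]
    is_update = doc_id in index
    index[doc_id] = record  # an existing key keeps its position in the dict
    if is_update:
        # O(n): every stored record is written out again (non-atomic here, for brevity).
        path.write_text(
            "".join(json.dumps(rec) + "\n" for rec in index.values()), encoding="utf-8"
        )
    else:
        # O(1): only the new line is appended.
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")


# Usage: two inserts followed by one update.
store_path = Path(tempfile.mkdtemp()) / "corpus.jsonl"
index: dict[str, dict] = {}
save_record(store_path, index, {"doc_id": "a", "text": "v1"})
save_record(store_path, index, {"doc_id": "b", "text": "v1"})
save_record(store_path, index, {"doc_id": "a", "text": "v2"})
print(store_path.read_text(encoding="utf-8"))
```

Repeated updates through this path each rewrite the whole file, which is the main reason to prefer a batch save when changing many existing documents.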

save_batch(docs)

Save a batch of documents, rewriting the file atomically only once for the whole batch.

Parameters:
docs : sequence of CorpusDocument

Return type:

None
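The batch path can be sketched as one merge into the index followed by a single rewrite, so a batch costs one full-file write instead of one per updated document. The function name save_batch_records, the dict records, and the ".tmp" naming are assumptions. In this sketch the dict's insertion order decides the on-disk order: updated documents keep their position and new ones land at the end; the actual class may order records differently.

```python
import json
import os
import tempfile
from pathlib import Path


def save_batch_records(path: Path, index: dict[str, dict], records: list[dict]) -> None:
    """Merge a batch into the index, then rewrite the file exactly once."""
    for record in records:
        index[record["doc_id"]] = record  # updated ids keep their slot; new ids go last
    tmp = path.with_name(path.name + ".tmp")  # hypothetical temp-file name
    tmp.write_text(
        "".join(json.dumps(rec) + "\n" for rec in index.values()), encoding="utf-8"
    )
    os.replace(tmp, path)  # a single rename, whatever the batch size


# Usage: update "b" and add "c" and "d" in one call.
store_path = Path(tempfile.mkdtemp()) / "corpus.jsonl"
index = {"a": {"doc_id": "a", "v": 1}, "b": {"doc_id": "b", "v": 1}}
save_batch_records(
    store_path,
    index,
    [{"doc_id": "c", "v": 1}, {"doc_id": "b", "v": 2}, {"doc_id": "d", "v": 1}],
)
lines = store_path.read_text(encoding="utf-8").splitlines()
print([json.loads(line)["doc_id"] for line in lines])
```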