JSONLStorage
- class scikitplot.corpus.JSONLStorage(path)
Append-friendly JSONL (newline-delimited JSON) flat-file store.
Documents are written one JSON object per line. On construction the file is read into an in-memory index keyed by doc_id for O(1) get performance. Writes append to the file and update the index.
Warning
save_batch writes all documents atomically to a temporary file, then renames it over the original. This reorders existing documents on disk (puts new documents at the end). Concurrent writers would corrupt the file; use one writer at a time.
- Parameters:
- path : pathlib.Path or str
Path to the .jsonl file. Created if absent.
Examples
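>>> from pathlib import Path
>>> from scikitplot.corpus import JSONLStorage
>>> store = JSONLStorage(Path("corpus.jsonl"))
>>> store.save(doc)  # doc is an existing CorpusDocument
>>> store.count()
1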
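The load-on-construction behaviour described above can be illustrated with a standard-library sketch. This is not the library's implementation: the helper name load_index and the assumption that every record is a JSON object with a "doc_id" key are hypothetical, chosen only to show how a JSONL file maps onto a dictionary index.

```python
import json
import tempfile
from pathlib import Path


def load_index(path):
    """Read a JSONL file into a dict keyed by each record's "doc_id".

    Hypothetical helper for illustration; not part of scikitplot.
    """
    index = {}
    path = Path(path)
    if not path.exists():
        return index  # an absent file is treated as an empty store
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            index[record["doc_id"]] = record  # a later line wins on duplicate ids
    return index


# Build a small file to demonstrate against.
demo_path = Path(tempfile.mkdtemp()) / "corpus.jsonl"
demo_path.write_text(
    '{"doc_id": "a", "text": "first"}\n{"doc_id": "b", "text": "second"}\n',
    encoding="utf-8",
)
index = load_index(demo_path)
print(index["a"]["text"])    # dictionary lookup by id
print(index.get("missing"))  # unknown ids yield None
```

Reading the whole file up front trades memory for lookup speed: each later lookup is a dictionary access rather than a scan of the file.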
- get(doc_id)
Retrieve a document by doc_id.
- Parameters:
- doc_id : str
- Return type:
CorpusDocument | None
- query(q)
Filter documents by query parameters.
Full-text search is not supported; any full-text criterion in the query is ignored.
- Parameters:
- q : StorageQuery
- Return type:
- save(doc)
Append or update a document.
If the doc_id already exists, the file is rewritten (update semantics). If it is new, the document is appended (O(1) write).
- Parameters:
- doc : CorpusDocument
- Return type:
None
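The append-versus-rewrite distinction can be sketched with plain dictionaries and the standard library. The helper save_record and the "doc_id" record key are hypothetical, and keeping an updated record in its original position is a choice made for this sketch, not a statement about how JSONLStorage orders lines.

```python
import json
import tempfile
from pathlib import Path


def save_record(path, index, record):
    """Append a new record, or rewrite the file when its id already exists.

    Hypothetical helper; ``index`` is a dict keyed by "doc_id".
    """
    path = Path(path)
    doc_id = record["doc_id"]
    if doc_id in index:
        # Update: replace the entry, then rewrite every line of the file.
        index[doc_id] = record
        with path.open("w", encoding="utf-8") as fh:
            for rec in index.values():
                fh.write(json.dumps(rec) + "\n")
    else:
        # New id: one append, independent of how large the file already is.
        index[doc_id] = record
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")


store_path = Path(tempfile.mkdtemp()) / "corpus.jsonl"
records = {}
save_record(store_path, records, {"doc_id": "a", "text": "v1"})
save_record(store_path, records, {"doc_id": "b", "text": "v1"})
save_record(store_path, records, {"doc_id": "a", "text": "v2"})  # triggers a rewrite
lines = store_path.read_text(encoding="utf-8").splitlines()
print(len(lines))  # one line per distinct id
```

Because each update costs a full rewrite, many updates are cheaper when grouped into a single batch.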
- save_batch(docs)
Save a batch, rewriting the file atomically once.
- Parameters:
- docs : sequence of CorpusDocument
- Return type:
None
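The temporary-file-then-rename pattern mentioned for save_batch can be sketched as follows. This is a generic illustration using only the standard library, not the library's code; rewrite_atomically and the "doc_id" record key are hypothetical names. The rename is atomic on POSIX systems when the temporary file and the target share a filesystem, which is why the temporary file is created in the target's directory.

```python
import json
import os
import tempfile
from pathlib import Path


def rewrite_atomically(path, records):
    """Write all records to a temp file, then rename it over ``path``.

    Hypothetical helper for illustration; not part of scikitplot.
    """
    path = Path(path)
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            for rec in records:
                fh.write(json.dumps(rec) + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # push the data to disk before renaming
        os.replace(tmp_name, path)  # readers see the old file or the new one
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


target = Path(tempfile.mkdtemp()) / "corpus.jsonl"
existing = {"a": {"doc_id": "a", "text": "old"}}
batch = [{"doc_id": "b", "text": "new"}, {"doc_id": "a", "text": "updated"}]

# Merge the batch into the index, then write the file once.
merged = dict(existing)
for rec in batch:
    merged[rec["doc_id"]] = rec
rewrite_atomically(target, merged.values())

on_disk = [
    json.loads(line)
    for line in target.read_text(encoding="utf-8").splitlines()
]
print([rec["doc_id"] for rec in on_disk])
```

The rename protects readers from seeing a half-written file, but it does not coordinate writers: two processes doing this at once would each overwrite the other's result, which matches the one-writer-at-a-time warning above.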