SimilarityIndex
- class scikitplot.corpus.SimilarityIndex(config=None)
Multi-mode similarity index over CorpusDocument collections.
- Parameters:
- config : SearchConfig or None, optional
Default search configuration. Can be overridden per query.
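The default-then-override behaviour can be pictured as a simple fallback. This is a minimal sketch under stated assumptions: `SketchConfig`, its `top_k` and `mode` fields, and the `ConfigHolder.resolve_config` helper are hypothetical names, not the real `SearchConfig` API. It also assumes that a per-query config replaces the default wholesale; this page does not say whether fields are merged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for SearchConfig; field names are illustrative only."""
    top_k: int = 10
    mode: str = "semantic"


class ConfigHolder:
    """Hypothetical holder of an index-wide default config."""

    def __init__(self, config: Optional[SketchConfig] = None) -> None:
        # Fall back to a default-constructed config when none is supplied.
        self.default_config = config if config is not None else SketchConfig()

    def resolve_config(self, config: Optional[SketchConfig] = None) -> SketchConfig:
        # Assumption: a per-query config replaces the default entirely.
        return config if config is not None else self.default_config


holder = ConfigHolder(SketchConfig(top_k=5))
print(holder.resolve_config())                       # the constructor-level default
print(holder.resolve_config(SketchConfig(top_k=3)))  # a per-query override
```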
See also
scikitplot.corpus._schema.MatchMode : Enum of match modes.
scikitplot.corpus._adapters : Convert results to LangChain / MCP format.
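This page describes MatchMode only as an enum of match modes and names just one member, SEMANTIC (under search). The sketch below shows how a multi-mode index might dispatch a query to a per-mode scoring function. `SketchMatchMode`, its `KEYWORD` and `EXACT` members, and both scorers are hypothetical and are not the library's actual modes. SEMANTIC is deliberately left out of the text dispatch table because it compares embeddings rather than raw text.

```python
from enum import Enum


class SketchMatchMode(Enum):
    """Hypothetical stand-in for MatchMode. Only SEMANTIC appears in these docs."""
    SEMANTIC = "semantic"
    KEYWORD = "keyword"  # hypothetical member
    EXACT = "exact"      # hypothetical member


def keyword_score(query: str, text: str) -> float:
    # Fraction of distinct query tokens that also occur in the text.
    q_tokens = set(query.lower().split())
    t_tokens = set(text.lower().split())
    return len(q_tokens & t_tokens) / len(q_tokens) if q_tokens else 0.0


def exact_score(query: str, text: str) -> float:
    # 1.0 if the whole query appears verbatim (case-insensitive), else 0.0.
    return 1.0 if query.lower() in text.lower() else 0.0


# Dispatch table: each text-only mode maps to its scoring function.
TEXT_SCORERS = {
    SketchMatchMode.KEYWORD: keyword_score,
    SketchMatchMode.EXACT: exact_score,
}


def score(mode: SketchMatchMode, query: str, text: str) -> float:
    if mode not in TEXT_SCORERS:
        raise ValueError(f"{mode.name} requires embeddings, not raw text")
    return TEXT_SCORERS[mode](query, text)


print(score(SketchMatchMode.KEYWORD, "hamlet ghost", "Hamlet speaks of death"))
print(score(SketchMatchMode.EXACT, "speaks of", "Hamlet speaks of death"))
```

A dictionary keyed by the enum keeps adding a mode to a one-line change, which is one plausible reason for modelling modes as an enum at all.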
Notes
User note: Build the index once, query many times:
    index = SimilarityIndex()
    index.build(documents)
    results = index.search("What did Hamlet say about death?")
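The "build once, query many times" advice pays off when the expensive preprocessing happens in build and each search only consults precomputed structures. This is a minimal sketch of that split using a hypothetical `TinyKeywordIndex` with an inverted index over plain strings. It is not SimilarityIndex's implementation, which takes CorpusDocument objects and supports more than keyword matching.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class TinyKeywordIndex:
    """Hypothetical illustration of separating one-time build from per-query search."""

    def __init__(self) -> None:
        self._docs: List[str] = []
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def build(self, documents: List[str]) -> None:
        # One-time cost: tokenize every document and record which docs hold each token.
        self._docs = list(documents)
        self._postings.clear()
        for doc_id, text in enumerate(self._docs):
            for token in text.lower().split():
                self._postings[token].add(doc_id)

    def search(self, query: str) -> List[Tuple[int, float]]:
        # Per-query cost: count matched query tokens using the postings lists only.
        tokens = set(query.lower().split())
        hits: Dict[int, int] = defaultdict(int)
        for token in tokens:
            for doc_id in self._postings.get(token, ()):
                hits[doc_id] += 1
        scored = [(doc_id, n / len(tokens)) for doc_id, n in hits.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)


index = TinyKeywordIndex()
index.build([
    "to be or not to be",
    "alas poor yorick",
    "death the undiscovered country",
])  # preprocessing happens once, here
for query in ["to be country", "undiscovered country"]:
    print(query, index.search(query))  # each query reuses the built postings
```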
Developer note: The index stores references to the original documents. If documents are mutated after building, results are undefined.
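The developer note follows from Python's reference semantics. If an index keeps references to the input objects but precomputes derived data at build time, a later mutation changes what the index returns without updating what it scores. The sketch uses a hypothetical `Doc` dataclass and `RefIndex` class. The `defensive_copy` option only illustrates one possible trade-off (extra memory in exchange for isolation) and is not a documented SimilarityIndex feature.

```python
import copy
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Doc:
    """Hypothetical stand-in for a mutable CorpusDocument."""
    text: str


class RefIndex:
    """Hypothetical index that keeps references to its input documents."""

    def __init__(self, documents: List[Doc], defensive_copy: bool = False) -> None:
        # list(...) copies the container but shares the Doc objects themselves;
        # deepcopy gives the index private Doc objects instead.
        self.docs = copy.deepcopy(documents) if defensive_copy else list(documents)
        # Derived data computed once at build time and never refreshed.
        self.tokens: List[Set[str]] = [set(d.text.lower().split()) for d in self.docs]


doc = Doc("to be or not to be")
shared = RefIndex([doc])
private = RefIndex([doc], defensive_copy=True)

doc.text = "alas poor yorick"  # mutate after building

# The shared index now returns the new text but still holds the old tokens.
print(shared.docs[0].text, shared.tokens[0])
# The defensively copied index still sees the original document.
print(private.docs[0].text)
```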
Examples
>>> from scikitplot.corpus import SimilarityIndex
>>> index = SimilarityIndex()
>>> # index.build(corpus_documents)
>>> # results = index.search("quantum computing")
- search(query, *, config=None, query_embedding=None)
Search the index.
- Parameters:
- query : str
Query text.
- config : SearchConfig or None, optional
Override default config for this query.
- query_embedding : array-like or None, optional
Pre-computed query embedding. Required for SEMANTIC mode if no embedding engine is attached.
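The query_embedding requirement exists because SEMANTIC scoring compares vectors: without either an attached embedding engine or a precomputed query vector, there is nothing to compare the stored document embeddings against. The sketch below assumes cosine similarity as the score and models the embedding engine as a hypothetical `embed` callable. Both are assumptions, as is the choice that a supplied query_embedding takes precedence over the engine.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; 0.0 if either vector has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_scores(
    query: str,
    doc_embeddings: List[Vector],
    query_embedding: Optional[Vector] = None,
    embed: Optional[Callable[[str], Vector]] = None,
) -> List[float]:
    """Hypothetical SEMANTIC-mode scoring step; `embed` plays the embedding engine."""
    if query_embedding is None:
        if embed is None:
            raise ValueError(
                "SEMANTIC mode needs query_embedding when no embedding engine is attached"
            )
        query_embedding = embed(query)  # assumption: engine used only as a fallback
    return [cosine(query_embedding, d) for d in doc_embeddings]


docs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
print(semantic_scores("unused when a vector is given", docs, query_embedding=[1.0, 0.0]))
```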
- Returns:
- list[SearchResult]
Results sorted by descending score.
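Ordering results by descending score is a stable reverse sort on the score field. The sketch uses a hypothetical `SketchResult` dataclass (the real SearchResult fields are not listed here) and an optional top-k cut via `heapq.nlargest`, which the Python docs define as equivalent to `sorted(..., reverse=True)[:n]`. Whether SimilarityIndex breaks ties this way is not stated on this page.

```python
import heapq
from dataclasses import dataclass
from operator import attrgetter
from typing import List, Optional


@dataclass(frozen=True)
class SketchResult:
    """Hypothetical stand-in for SearchResult; field names are assumptions."""
    doc_id: str
    score: float


def rank(results: List[SketchResult], top_k: Optional[int] = None) -> List[SketchResult]:
    # Descending by score; Python's sort stays stable with reverse=True,
    # so equal scores keep their input order.
    if top_k is not None:
        # nlargest avoids sorting everything when only the best top_k are needed.
        return heapq.nlargest(top_k, results, key=attrgetter("score"))
    return sorted(results, key=attrgetter("score"), reverse=True)


raw = [SketchResult("a", 0.2), SketchResult("b", 0.9), SketchResult("c", 0.5)]
print([r.doc_id for r in rank(raw)])            # ['b', 'c', 'a']
print([r.doc_id for r in rank(raw, top_k=2)])   # ['b', 'c']
```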
- Return type: