CustomSimilarityIndex

class scikitplot.corpus.CustomSimilarityIndex(config=None, *, custom_scorer_fn=None)

SimilarityIndex extended with a fully-replaceable custom scorer callable.

When custom_scorer_fn is provided, search() calls it instead of the built-in match modes (strict, keyword, semantic, hybrid). The callable receives the query string, the full document list, and the SearchConfig object.

Parameters:
config : SearchConfig or None, optional

Default search configuration.

custom_scorer_fn : callable or None, optional

Custom scoring callable. When set, completely replaces the built-in match modes for every search call. Signature:

def custom_scorer_fn(
    query: str,
    documents: list[CorpusDocument],
    config: SearchConfig,
) -> list[SearchResult]: ...

The callable must return a list of SearchResult instances.

Raises:
TypeError

If custom_scorer_fn is provided but not callable.

Parameters:
  • config (Any | None)

  • custom_scorer_fn (Callable[[str, list[Any], Any], list[Any]] | None)

Notes

User note: Use this to plug in a reranker (Cohere, BGE, ColBERT), a dense retrieval backend (Weaviate, Qdrant, Pinecone), or any other scoring logic that requires access to the full document list at query time.
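As a rough illustration of the scorer contract, the sketch below scores each document by the fraction of query terms it contains. The three dataclasses are hypothetical stand-ins for the library's CorpusDocument, SearchConfig, and SearchResult; only the fields that appear in the Examples section (text, top_k, doc, score, match_mode) are modelled, and the real classes may carry more.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the library types; only the fields used
# below are modelled.
@dataclass
class CorpusDocument:
    text: str


@dataclass
class SearchConfig:
    top_k: int = 3


@dataclass
class SearchResult:
    doc: CorpusDocument
    score: float
    match_mode: str


def overlap_scorer(query, documents, config):
    """Score each document by the fraction of query terms it contains."""
    terms = set(query.lower().split())
    if not terms:
        return []
    results = []
    for doc in documents:
        words = set(doc.text.lower().split())
        score = len(terms & words) / len(terms)
        if score > 0:
            results.append(
                SearchResult(doc=doc, score=score, match_mode="overlap")
            )
    # Sort by descending score and keep the top_k best matches.
    results.sort(key=lambda r: r.score, reverse=True)
    return results[: config.top_k]


docs = [
    CorpusDocument("clinical trial outcomes for the new drug"),
    CorpusDocument("weather report for tuesday"),
    CorpusDocument("trial design and outcomes"),
]
hits = overlap_scorer("clinical trial outcomes", docs, SearchConfig(top_k=2))
```

A function with this shape could be passed as custom_scorer_fn, provided it returns the library's own SearchResult instances instead of the stand-in used here.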

Developer note: The built-in index (SimilarityIndex) is wrapped, not subclassed, to avoid MRO conflicts with its lazy-import dependencies. All build(), property, and __repr__ calls are delegated to the inner index.
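The wrap-and-delegate arrangement described in the developer note can be pictured roughly as follows. This is a simplified sketch, not the library's source: InnerIndex and WrappedIndex are hypothetical names, and the inner class merely stands in for SimilarityIndex.

```python
class InnerIndex:
    """Hypothetical stand-in for the built-in SimilarityIndex."""

    def __init__(self):
        self._docs = []

    def build(self, documents):
        if not documents:
            raise ValueError("documents must not be empty")
        self._docs = list(documents)

    @property
    def n_documents(self):
        return len(self._docs)

    @property
    def has_embeddings(self):
        return False

    def __repr__(self):
        return f"InnerIndex(n_documents={self.n_documents})"


class WrappedIndex:
    """Holds an inner index and forwards calls to it (composition)."""

    def __init__(self, custom_scorer_fn=None):
        # Reject non-callable scorers up front.
        if custom_scorer_fn is not None and not callable(custom_scorer_fn):
            raise TypeError("custom_scorer_fn must be callable or None")
        self._inner = InnerIndex()
        self._custom_scorer_fn = custom_scorer_fn

    def build(self, documents):
        self._inner.build(documents)

    @property
    def n_documents(self):
        return self._inner.n_documents

    @property
    def has_embeddings(self):
        return self._inner.has_embeddings

    def __repr__(self):
        return f"WrappedIndex(inner={self._inner!r})"


index = WrappedIndex()
index.build(["doc a", "doc b"])
```

Because the wrapper only holds a reference to the inner index, it never enters the inner class's inheritance chain, which is the property the note relies on.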

Examples

Plug in a Cohere reranker:

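import cohere

# Assumes SearchResult is exported from scikitplot.corpus alongside the index.
from scikitplot.corpus import CustomSimilarityIndex, SearchResult

co = cohere.Client("API_KEY")  # replace with a real API key

def cohere_rerank(query, docs, cfg):
    # Truncate each document to keep the rerank request small.
    texts = [d.text[:512] for d in docs]
    # Note: recent Cohere SDK releases may also require a model= argument.
    resp = co.rerank(query=query, documents=texts, top_n=cfg.top_k)
    # Map each reranked hit back to its source document by index.
    return [
        SearchResult(
            doc=docs[r.index], score=r.relevance_score, match_mode="cohere"
        )
        for r in resp.results
    ]

index = CustomSimilarityIndex(custom_scorer_fn=cohere_rerank)
index.build(corpus_documents)  # corpus_documents: non-empty sequence of CorpusDocument
results = index.search("clinical trial outcomes")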

build(documents)

Build the index from documents.

Parameters:
documents : Sequence[CorpusDocument]

Documents to index.

Raises:
ValueError

If documents is empty.

Parameters:

documents (Sequence[Any])

Return type:

None

property has_embeddings: bool

Whether dense embeddings are indexed.

property n_documents: int

Number of indexed documents.

search(query, *, config=None, query_embedding=None)

Search the index using the custom scorer or built-in modes.

When custom_scorer_fn is set it is called with (query, documents, resolved_config) and its return value is used directly. Otherwise search is called on the inner index.

Parameters:
query : str

Query string.

config : SearchConfig or None, optional

Per-query config override.

query_embedding : array-like or None, optional

Pre-computed query embedding for semantic/hybrid modes.

Returns:
list[SearchResult]

Results sorted by descending score.

Raises:
RuntimeError

If custom_scorer_fn raises an unexpected exception.

Parameters:
  • query (str)

  • config (Any | None)

  • query_embedding (Any | None)

Return type:

list[Any]
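
The dispatch described for search() might look something like the sketch below. It assumes that a per-query config takes precedence over the default one and that scorer failures are re-raised as RuntimeError with the original exception chained. DispatchingIndex, builtin_search, and the two scorers are hypothetical names used only for illustration, and the results are plain tuples, not SearchResult objects.

```python
class DispatchingIndex:
    """Hypothetical sketch of the search dispatch; not the library's code."""

    def __init__(self, inner_search, documents, default_config=None,
                 custom_scorer_fn=None):
        self._inner_search = inner_search
        self._documents = list(documents)
        self._default_config = default_config
        self._custom_scorer_fn = custom_scorer_fn

    def search(self, query, *, config=None, query_embedding=None):
        # A per-query config takes precedence over the default one.
        resolved = config if config is not None else self._default_config
        if self._custom_scorer_fn is None:
            # No custom scorer: fall back to the inner index's search.
            return self._inner_search(
                query, config=resolved, query_embedding=query_embedding
            )
        try:
            return self._custom_scorer_fn(query, self._documents, resolved)
        except Exception as exc:
            # Surface scorer failures as RuntimeError, keeping the cause.
            raise RuntimeError(f"custom_scorer_fn failed: {exc}") from exc


def builtin_search(query, *, config=None, query_embedding=None):
    return [("builtin", query, config)]


def echo_scorer(query, documents, config):
    return [(query, len(documents), config)]


def failing_scorer(query, documents, config):
    raise KeyError("boom")


plain = DispatchingIndex(builtin_search, ["a", "b"], default_config="default")
custom = DispatchingIndex(
    builtin_search, ["a", "b"], default_config="default",
    custom_scorer_fn=echo_scorer,
)
broken = DispatchingIndex(
    builtin_search, ["a", "b"], custom_scorer_fn=failing_scorer
)
```

In this sketch the scorer's return value is passed through untouched, so any ordering or top_k truncation is the scorer's own responsibility.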