SearchConfig

class scikitplot.corpus.SearchConfig(top_k=10, match_mode='semantic', semantic_threshold=0.0, keyword_threshold=0.0, hybrid_alpha=0.5, rrf_k=60, use_normalized_text=True, case_sensitive=False)

Configuration for similarity search.

Parameters:
top_k : int

Maximum number of results to return.

match_mode : str

One of "strict", "keyword", "semantic", or "hybrid".

semantic_threshold : float

Minimum cosine similarity for SEMANTIC results.

keyword_threshold : float

Minimum keyword overlap for KEYWORD results.

hybrid_alpha : float

Weight for semantic scores in HYBRID mode (0 = pure keyword, 1 = pure semantic). Default 0.5 (equal weight).

rrf_k : int

Reciprocal rank fusion (RRF) constant. Default 60, the commonly used value.

use_normalized_text : bool

Use normalized_text for matching when available.

case_sensitive : bool

Case-sensitive matching in STRICT mode.
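
This page does not say how hybrid_alpha and rrf_k are combined internally. As a rough illustration only, the sketch below shows the two standard techniques these parameters usually refer to: a linear blend weighted by alpha, and reciprocal rank fusion with constant k. The function names weighted_blend and reciprocal_rank_fusion, and the sample scores, are hypothetical and are not part of scikitplot.

```python
from typing import Dict, List


def weighted_blend(
    semantic: Dict[str, float], keyword: Dict[str, float], alpha: float = 0.5
) -> Dict[str, float]:
    """Linear blend: alpha * semantic + (1 - alpha) * keyword.

    A document missing from one score set contributes 0.0 for that set.
    """
    doc_ids = set(semantic) | set(keyword)
    return {
        d: alpha * semantic.get(d, 0.0) + (1.0 - alpha) * keyword.get(d, 0.0)
        for d in doc_ids
    }


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Sum 1 / (k + rank) over every ranking a document appears in (rank starts at 1)."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused


# Hypothetical scores, for illustration only.
semantic_scores = {"a": 0.9, "b": 0.4, "c": 0.1}
keyword_scores = {"b": 0.8, "c": 0.5}

blended = weighted_blend(semantic_scores, keyword_scores, alpha=0.5)

# RRF works on rank order rather than raw scores: best document first.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c"]], k=60)
top = sorted(fused, key=fused.get, reverse=True)
```

With alpha=1.0 the blend reduces to the semantic scores and with alpha=0.0 to the keyword scores, which matches the endpoints described for hybrid_alpha. A larger k in RRF flattens the difference between adjacent ranks.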

Notes

User note: For RAG pipelines, match_mode="hybrid" with the default settings (hybrid_alpha=0.5, equal weight for keyword and semantic scores) is a good starting point. For exact citation matching, use match_mode="strict".
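
The two setups in the note can be sketched as follows. To keep the snippet runnable on its own, it defines a stand-in dataclass that mirrors the documented signature. Whether the real class is a dataclass is an assumption, and with the library installed the stand-in would be replaced by an import from scikitplot.corpus. The values top_k=5 and case_sensitive=True are illustrative choices, not recommendations from the source.

```python
from dataclasses import dataclass


# Stand-in that mirrors the documented signature so the snippet runs by itself.
# With the library installed, use `from scikitplot.corpus import SearchConfig`.
@dataclass
class SearchConfig:
    top_k: int = 10
    match_mode: str = "semantic"
    semantic_threshold: float = 0.0
    keyword_threshold: float = 0.0
    hybrid_alpha: float = 0.5
    rrf_k: int = 60
    use_normalized_text: bool = True
    case_sensitive: bool = False


# RAG retrieval: hybrid mode, every other setting left at its default.
rag_config = SearchConfig(match_mode="hybrid")

# Exact citation matching: strict mode, here with case-sensitive comparison
# and a shorter result list (both values are illustrative).
citation_config = SearchConfig(match_mode="strict", case_sensitive=True, top_k=5)
```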

case_sensitive: bool = False
hybrid_alpha: float = 0.5
keyword_threshold: float = 0.0
match_mode: str = 'semantic'
rrf_k: int = 60
semantic_threshold: float = 0.0
top_k: int = 10
use_normalized_text: bool = True
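
As a rough picture of how a minimum-score threshold and top_k typically interact, the sketch below drops results under the threshold and then keeps the best top_k. The helper select_results and the sample scores are hypothetical. Whether the library compares with >= or >, and in which order it applies the two steps, is not stated on this page and is assumed here.

```python
from typing import Dict, List, Tuple


def select_results(
    scores: Dict[str, float], threshold: float = 0.0, top_k: int = 10
) -> List[Tuple[str, float]]:
    """Drop scores below `threshold`, then keep the `top_k` highest."""
    # Assumption: a score equal to the threshold is kept (>= comparison).
    kept = [(doc_id, s) for doc_id, s in scores.items() if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical cosine similarities, for illustration only.
cosine_scores = {"a": 0.82, "b": 0.35, "c": 0.61, "d": 0.10}
results = select_results(cosine_scores, threshold=0.3, top_k=2)
```

With the default threshold of 0.0, no non-negative score is filtered out and only top_k limits the result list.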