Index#
- class scikitplot.cexternals.annoy.Index[source]#
A robust, pickle-safe, MLOps-ready wrapper around AnnoyIndex.
Provides:
- Pickling support via a custom __reduce__ implementation.
- Optional compression (zlib or gzip).
- Thread-safe persistence using a module-level RLock.
- Zero-copy restore path through Annoy’s deserialize() buffer.
- Parameters:
- f (int)
Dimensionality of the vectors.
- metric (str)
Distance metric (“angular”, “euclidean”, “manhattan”, …).
Notes
Annoy stores all data in a compact contiguous block, so serialization is extremely fast even on multi-GB indexes. The wrapper is fully compatible with joblib multiprocessing and cloud model registries.
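The pickling approach described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the wrapper's actual code: `PickleSafeIndex`, `_restore`, and the `blob` attribute are hypothetical names, and `blob` merely stands in for the bytes that `serialize()` would return.

```python
import pickle
import threading
import zlib

# Module-level re-entrant lock, mirroring the "module-level RLock" in the text.
_LOCK = threading.RLock()


def _restore(compressed, f, metric):
    """Rebuild the object from the compressed payload emitted by __reduce__."""
    obj = PickleSafeIndex(f, metric)
    obj.blob = zlib.decompress(compressed)
    return obj


class PickleSafeIndex:
    """Hypothetical stand-in: `blob` plays the role of the serialized index bytes."""

    def __init__(self, f, metric="angular"):
        self.f = f
        self.metric = metric
        self.blob = b""

    def __reduce__(self):
        # Hold the lock so a concurrent writer cannot change `blob` mid-snapshot.
        with _LOCK:
            compressed = zlib.compress(self.blob)
        # pickle stores a callable plus its arguments and calls it on load.
        return (_restore, (compressed, self.f, self.metric))


original = PickleSafeIndex(4, "euclidean")
original.blob = b"\x00" * 10_000  # pretend this came from serialize()
clone = pickle.loads(pickle.dumps(original))
print(clone.f, clone.metric, len(clone.blob))
```

Returning a module-level function from `__reduce__` keeps the pickle payload independent of the object's internal attributes, which is one common way to make a wrapper around a C extension type picklable.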
- add_item()#
Adds item i (any nonnegative integer) with vector v.
Note that it will allocate memory for max(i)+1 items.
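Because memory is allocated for max(i)+1 items, sparse identifiers can reserve far more slots than there are vectors. The sketch below shows the arithmetic and one possible workaround; `allocated_slots` and `densify` are hypothetical helpers for illustration and are not part of the Index API.

```python
def allocated_slots(item_ids):
    """Slots reserved when ids are used directly: max(i) + 1, per the note above."""
    return max(item_ids) + 1 if item_ids else 0


def densify(item_ids):
    """Map arbitrary nonnegative ids to 0..n-1 (hypothetical helper)."""
    return {original: dense for dense, original in enumerate(sorted(set(item_ids)))}


sparse_ids = [3, 1_000_000, 42]
mapping = densify(sparse_ids)

raw_slots = allocated_slots(sparse_ids)                # slots with raw ids
dense_slots = allocated_slots(list(mapping.values()))  # slots after remapping
print(raw_slots, dense_slots)
```

Keeping the mapping alongside the index lets query results be translated back to the original identifiers.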
- build()#
Builds a forest of n_trees trees.
More trees give higher precision when querying. After calling build, no more items can be added. n_jobs specifies the number of threads used to build the trees. n_jobs=-1 uses all available CPU cores.
- deserialize()#
Deserializes the index from bytes.
- f#
dimension of vectors
- get_distance()#
Returns the distance between items i and j.
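For reference, the three metrics named in the Parameters section can be written out in plain Python. The angular form follows the definition given in upstream Annoy's documentation (the Euclidean distance between normalized vectors); that this wrapper returns exactly the same values is an assumption here, and the function names are illustrative only.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def angular(u, v):
    """Distance between unit-normalized vectors: sqrt(2 * (1 - cos(u, v))).

    Assumes both vectors are nonzero.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = dot / norm
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))  # clamp guards against rounding


u, v = [1.0, 0.0], [0.0, 1.0]
print(euclidean(u, v), manhattan(u, v), angular(u, v))
```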
- get_item_vector()#
Returns the vector for item i that was previously added.
- get_n_items()#
Returns the number of items in the index.
- get_n_trees()#
Returns the number of trees in the index.
- get_nns_by_item()#
Returns the n closest items to item i.
- Parameters:
search_k – the query will inspect up to search_k nodes. search_k gives you a run-time tradeoff between better accuracy and speed. search_k defaults to n_trees * n if not provided.
include_distances – If True, this function will return a 2-element tuple of lists. The first list contains the n closest items. The second list contains the corresponding distances.
- get_nns_by_vector()#
Returns the n closest items to vector vector.
- Parameters:
search_k – the query will inspect up to search_k nodes. search_k gives you a run-time tradeoff between better accuracy and speed. search_k defaults to n_trees * n if not provided.
include_distances – If True, this function will return a 2-element tuple of lists. The first list contains the n closest items. The second list contains the corresponding distances.
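The return shapes described above can be illustrated with an exact brute-force search. This is only a reference sketch: it scans every vector with Euclidean distance rather than walking a forest of trees, and `brute_force_nns` and `default_search_k` are hypothetical names that do not exist in the library.

```python
import math


def default_search_k(n_trees, n):
    """Default stated in the text: n_trees * n when search_k is not provided."""
    return n_trees * n


def brute_force_nns(vectors, query, n, include_distances=False):
    """Exact nearest neighbours by Euclidean distance (reference only, no trees)."""
    scored = sorted(
        (math.dist(query, vec), item_id) for item_id, vec in enumerate(vectors)
    )[:n]
    ids = [item_id for _, item_id in scored]
    if include_distances:
        # Two parallel lists: item ids, then their distances.
        return ids, [dist for dist, _ in scored]
    return ids


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
ids_only = brute_force_nns(vectors, [0.0, 0.0], n=3)
ids, dists = brute_force_nns(vectors, [0.0, 0.0], n=3, include_distances=True)
print(ids_only, ids, dists)
```

A brute-force result like this is also a convenient ground truth for measuring how recall changes as search_k or n_trees is varied.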
- load()#
Loads (mmaps) an index from disk.
- classmethod load_from_file(path)[source]#
Load an index previously saved with save_to_file.
- metric#
metric name
- on_disk_build()#
Build will be performed with storage on disk instead of RAM.
- save()#
Saves the index to disk.
- save_to_file(path)[source]#
Persist the index to disk using joblib.
- Parameters:
- path (str)
Output .joblib file path.
- Return type:
None
Notes
Thread-safe. Locks during serialization.
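The locking behaviour noted above can be sketched as follows. Several assumptions apply: `pickle` stands in for joblib so the example needs only the standard library, `locked_save` and `plain_load` are hypothetical names, and the write-then-rename step is an extra safeguard added for this illustration rather than something the source describes.

```python
import os
import pickle
import tempfile
import threading

_LOCK = threading.RLock()  # one lock shared by every save call in this module


def locked_save(obj, path):
    """Serialize under the lock, then move the finished file into place."""
    with _LOCK:
        tmp_path = f"{path}.tmp"
        with open(tmp_path, "wb") as fh:
            pickle.dump(obj, fh)  # stand-in for joblib.dump
        os.replace(tmp_path, path)  # readers never see a half-written file


def plain_load(path):
    with open(path, "rb") as fh:
        return pickle.load(fh)  # stand-in for joblib.load


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "index.joblib")
    payload = {"f": 8, "metric": "angular", "blob": b"\x01" * 1024}
    workers = [
        threading.Thread(target=locked_save, args=(payload, target))
        for _ in range(4)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    restored = plain_load(target)

print(restored["f"], restored["metric"], len(restored["blob"]))
```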
- serialize()#
Serializes the index to bytes.
- set_seed()#
Sets the seed of Annoy’s random number generator.
- unbuild()#
Unbuilds the tree in order to allow adding new items.
build() must be called again afterwards before queries can be run.
- unload()#
Unloads an index from disk.
- verbose()#