Index

class scikitplot.cexternals.annoy.Index

A robust, pickle-safe, MLOps-ready wrapper around AnnoyIndex.

Provides:

  • Pickling support via a custom __reduce__ implementation.

  • Optional compression (zlib or gzip).

  • Thread-safe persistence using a module-level RLock.

  • Zero-copy restore path through Annoy’s deserialize() buffer.

Parameters:

f (int) – Dimensionality of the vectors.

metric (str) – Distance metric (“angular”, “euclidean”, “manhattan”, …).

Notes

Annoy stores all data in a compact contiguous block, so serialization is extremely fast even on multi-GB indexes. The wrapper is fully compatible with joblib multiprocessing and cloud model registries.

add_item()

Adds item i (any nonnegative integer) with vector v.

Note that it will allocate memory for max(i)+1 items.
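Because memory is allocated for max(i)+1 items, large or sparse identifiers can waste space. One possible workaround, sketched here under the assumption that your own keys are hashable, is to remap them to consecutive integers before calling add_item. The helper densify_ids is hypothetical and not part of the library.

```python
def densify_ids(raw_ids):
    """Map arbitrary hashable keys to consecutive integers 0..n-1."""
    to_dense = {}
    for key in raw_ids:
        # setdefault assigns the next free integer only the first time a key is seen.
        to_dense.setdefault(key, len(to_dense))
    return to_dense


raw_ids = [1_000_000, 42, 1_000_000, 7]
to_dense = densify_ids(raw_ids)

slots_sparse = max(raw_ids) + 1            # slots implied by using raw ids directly
slots_dense = max(to_dense.values()) + 1   # slots implied after remapping
```

The reverse mapping (dense integer back to the original key) has to be stored alongside the index if query results need to be translated back.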

build()

Builds a forest of n_trees trees.

More trees give higher precision when querying. After calling build, no more items can be added. n_jobs specifies the number of threads used to build the trees. n_jobs=-1 uses all available CPU cores.

property compress: bool

Whether the binary buffer is compressed before pickling.

property compression_type: str

Compression algorithm to use when compress is True.

deserialize()

Deserializes the index from bytes.

f

Dimension of the vectors.

get_distance()

Returns the distance between items i and j.

get_item_vector()

Returns the vector for item i that was previously added.

get_n_items()

Returns the number of items in the index.

get_n_trees()

Returns the number of trees in the index.

get_nns_by_item()

Returns the n closest items to item i.

Parameters:

search_k – The query will inspect up to search_k nodes. search_k gives you a run-time tradeoff between better accuracy and speed, and defaults to n_trees * n if not provided.

include_distances – If True, this function will return a 2-element tuple of lists. The first list contains the n closest items; the second list contains the corresponding distances.
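To make the documented search_k default concrete, here is a small worked sketch of the arithmetic. The function effective_search_k is hypothetical, and using None to mean "not provided" is an assumption of this example rather than the library's actual sentinel.

```python
def effective_search_k(n, n_trees, search_k=None):
    """Return the node budget implied by the documented default n_trees * n."""
    if search_k is None:
        # Default: scale the budget with both the forest size and the result count.
        return n_trees * n
    return search_k


default_budget = effective_search_k(n=10, n_trees=50)                 # 50 * 10
custom_budget = effective_search_k(n=10, n_trees=50, search_k=2_000)  # explicit override
```

Raising search_k above the default trades query time for accuracy; lowering it does the opposite.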

get_nns_by_vector()

Returns the n closest items to the query vector, vector.

Parameters:

search_k – The query will inspect up to search_k nodes. search_k gives you a run-time tradeoff between better accuracy and speed, and defaults to n_trees * n if not provided.

include_distances – If True, this function will return a 2-element tuple of lists. The first list contains the n closest items; the second list contains the corresponding distances.
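The return shape with include_distances can be illustrated with an exact brute-force search, which is also a convenient baseline for checking the recall of approximate results. This is a sketch assuming Euclidean distance; brute_force_nns is a hypothetical helper and does not reproduce Annoy's tree-based algorithm.

```python
import math


def brute_force_nns(vectors, query, n, include_distances=False):
    """Exact Euclidean nearest neighbours over a list of vectors."""
    # Sort (distance, item) pairs so the closest items come first.
    scored = sorted(
        (math.dist(vec, query), item) for item, vec in enumerate(vectors)
    )[:n]
    items = [item for _, item in scored]
    if include_distances:
        # Same shape as documented above: (list of items, list of distances).
        return items, [dist for dist, _ in scored]
    return items


vectors = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
items, distances = brute_force_nns(vectors, [0.0, 0.0], 2, include_distances=True)
```

Comparing the items list from an approximate query against this exact list gives a simple recall estimate for a chosen n_trees and search_k.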

load()

Loads (mmaps) an index from disk.

classmethod load_from_file(path)

Load an index previously saved with save_to_file.

Parameters:

path (str)

Returns:

Restored index.

Return type:

Index

metric

Name of the distance metric.

on_disk_build()

Build will be performed with storage on disk instead of RAM.

save()

Saves the index to disk.

save_to_file(path)

Persist the index to disk using joblib.

Parameters:

path (str) – Output .joblib file path.

Return type:

None

Notes

Thread-safe. Locks during serialization.
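The locking pattern described in the note can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: it uses the standard-library pickle module as a stand-in for joblib, and _SAVE_LOCK and the two free functions are hypothetical names.

```python
import os
import pickle
import tempfile
import threading

_SAVE_LOCK = threading.RLock()  # module-level lock shared by every caller


def save_to_file(obj, path):
    """Serialize obj to path while holding the lock (pickle stands in for joblib)."""
    with _SAVE_LOCK:
        with open(path, "wb") as fh:
            pickle.dump(obj, fh)


def load_from_file(path):
    """Read the object back under the same lock, so no half-written file is seen."""
    with _SAVE_LOCK:
        with open(path, "rb") as fh:
            return pickle.load(fh)


state = {"f": 8, "metric": "angular"}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.pkl")
    # Several threads save concurrently; the lock serializes the writes.
    threads = [
        threading.Thread(target=save_to_file, args=(state, path)) for _ in range(4)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    loaded = load_from_file(path)
```

An RLock, unlike a plain Lock, can be re-acquired by the thread that already holds it, which helps when one locked method calls another.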

serialize()

Serializes the index to bytes.

set_seed()

Sets the seed of Annoy’s random number generator.

unbuild()

Unbuilds the tree in order to allow adding new items.

build() has to be called again afterwards in order to run queries.

unload()

Unloads an index from disk.

verbose()