ANNoy Vector Database (Approximate Nearest Neighbors)#
ANNoy helps you find similar items fast.
You give your data as vectors (arrays of numbers). Then you can search for the nearest neighbors (the most similar vectors).
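To make "nearest neighbors" concrete, here is a minimal pure-Python sketch of the exact search that an approximate index stands in for. It does not use Annoy at all. The helper names `angular_distance` and `exact_nearest` are hypothetical. The distance formula sqrt(2 * (1 - cos(u, v))) is the one the upstream Annoy README gives for the 'angular' metric, and the sketch assumes nonzero vectors.

```python
import math
import random


def angular_distance(u, v):
    """Return sqrt(2 * (1 - cos(u, v))) for two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    # Clamp to guard against floating-point overshoot outside [-1, 1].
    cos = max(-1.0, min(1.0, cos))
    return math.sqrt(2.0 * (1.0 - cos))


def exact_nearest(query, vectors, n):
    """Return the ids of the n vectors closest to query (exhaustive scan)."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: angular_distance(query, vectors[i]),
    )
    return ranked[:n]


random.seed(0)
f = 8  # vector length (illustrative value)
vectors = [[random.gauss(0, 1) for _ in range(f)] for _ in range(100)]

# Item 0 is its own nearest neighbor, so it comes first in the result.
neighbors = exact_nearest(vectors[0], vectors, 5)
print(neighbors)
```

The exhaustive scan compares the query against every stored vector, so its cost grows linearly with the number of items. An approximate index trades a little accuracy to avoid that full scan.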
This page documents the Annoy [1] user guide integration shipped with scikit-plots.
Public Python API#
This module exports:
Annoy: Low-level C-extension type (stable, picklable).
AnnoyIndex: Public alias of the low-level Annoy index.
Index: High-level Python wrapper subclass (stable, picklable).
Note
For backend and C-extension details, see spotify/ANNoy Vector Database (Approximate Nearest Neighbors).
High-level Python interface for the C++ Annoy backend.
This page documents the annoy module, which provides a stable import path
and a small, user-facing API built on the low-level bindings in
_annoy.
Workflow#
Create an AnnoyIndex with a fixed vector length f and a metric.
Add items with add_item.
Build the forest with build.
Save and load with save and load.
Quick start#
import random

# from annoy import Annoy, AnnoyIndex
# from scikitplot.cexternals._annoy import Annoy, AnnoyIndex
from scikitplot.annoy import Annoy, AnnoyIndex, Index

random.seed(0)

f = 40  # Length of item vector that will be indexed
t = AnnoyIndex(f, 'angular')
for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_item(i, v)

t.build(10)  # 10 trees
t.save('test.ann')

u = AnnoyIndex(f, 'angular')
u.load('test.ann')  # memory-mapped
print(u.get_nns_by_item(0, 1000))  # up to 1000 neighbors of item 0, nearest first
Notes#
Every added vector must have length f.
Add items before calling build.
Item ids are integers. Storage is allocated up to max(id) + 1.
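The notes above can be checked before any item reaches the index. The following is a minimal sketch in pure Python. `check_items` is a hypothetical helper, not part of this module. It also assumes ids must be non-negative, as the upstream Annoy README states for add_item.

```python
def check_items(items, f):
    """Validate (id, vector) pairs and return the slot count max(id) + 1.

    Hypothetical pre-flight check; it does not touch any index object.
    """
    if not items:
        return 0
    for item_id, vector in items:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(item_id, bool) or not isinstance(item_id, int) or item_id < 0:
            raise ValueError(f"item id must be a non-negative integer, got {item_id!r}")
        if len(vector) != f:
            raise ValueError(f"item {item_id}: expected length {f}, got {len(vector)}")
    return max(item_id for item_id, _ in items) + 1


f = 3
items = [(0, [0.1, 0.2, 0.3]), (1000, [0.4, 0.5, 0.6])]

# Two items, but the largest id is 1000, so 1001 slots would be allocated.
slots = check_items(items, f)
print(slots)
```

Because storage follows the largest id rather than the number of items, dense ids starting at 0 keep the index compact.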
High-level wrapper: Index#
Index is a Pythonic wrapper for Annoy-like objects.
It is designed for higher-level workflows where you want a Python object that is safe to serialize and move between processes.
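Moving an object between processes usually means serializing it to bytes and rebuilding it on the other side. The sketch below shows that round trip with the standard pickle module. `roundtrip` is a hypothetical helper, and a plain dict stands in for the index so the snippet runs without scikit-plots installed. Passing an Index instance instead assumes it supports pickle, as the "picklable" note in the API list suggests.

```python
import pickle


def roundtrip(obj):
    """Serialize obj to bytes and rebuild it, as a process boundary would."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return pickle.loads(payload)


# Stand-in object (assumption): replace with an Index instance in real use.
stand_in = {"f": 40, "metric": "angular"}
restored = roundtrip(stand_in)

# The restored object is an equal but distinct copy.
print(restored == stand_in)
```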
Mixins used by the high-level wrapper#
The wrapper uses mixins from _mixins
to keep features separate and explicit.
Further reading#
See also
Nearest neighbor search (background): https://en.wikipedia.org/wiki/Nearest_neighbor_search
https://www.researchgate.net/publication/363234433_Analysis_of_Image_Similarity_Using_CNN_and_ANNOY
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/XboxInnerProduct.pdf
https://link.springer.com/chapter/10.1007/978-981-97-7831-7_2