ANNoy
This page documents the Annoy [0] integration shipped with scikit-plots.
High-level Python interface for the C++ Annoy backend.
The annoy module (scikitplot.annoy) provides a stable import path and a small, user-facing API built on the low-level bindings in _annoy (scikitplot.cexternals._annoy).
Note
For backend and C-extension details, see spotify/ANNoy (experimental).
Exports
This module exports:
- Annoy: Low-level C-extension type (stable).
- AnnoyIndex: Public alias of the Annoy index.
- Index: High-level Python wrapper subclass (picklable).
Workflow
1. Create an AnnoyIndex with a fixed vector length f and a metric.
2. Add items with add_item.
3. Build the forest of n_trees trees with build(n_trees).
4. Save the index to disk with save, and load it (memory-mapped) with load.
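The forest built by build is a set of binary space-partitioning trees. Upstream Annoy describes each tree as recursively splitting the points with a hyperplane chosen between two sampled points. The pure-Python sketch below shows one such split for the angular metric. The names normalize and angular_split are hypothetical. The real C++ backend also refines the two points with a short 2-means step and recurses until the leaves are small, so read this as an illustration of the idea, not the backend's code.

```python
import math
import random


def normalize(v):
    # Scale v to unit length; the angular metric ignores vector magnitude.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def angular_split(points, rng):
    """Hypothetical helper: one hyperplane split (simplified sketch).

    Picks two distinct points p and q, uses the difference of their unit
    vectors as the hyperplane normal, and sends each point to the side
    given by the sign of its dot product with that normal.
    """
    i, j = rng.sample(range(len(points)), 2)
    p, q = normalize(points[i]), normalize(points[j])
    normal = normalize([a - b for a, b in zip(p, q)])
    left, right = [], []
    for idx, x in enumerate(points):
        margin = sum(a * b for a, b in zip(normal, x))
        (right if margin > 0 else left).append(idx)
    return left, right, (i, j)


rng = random.Random(0)
pts = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(50)]
left, right, (i, j) = angular_split(pts, rng)
print(len(left), len(right))
```

A tree is just this split applied recursively to each side. Building several trees with independent random choices is what lets a query recover neighbors that a single tree would have separated.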
Quick start
import random

# Alternative imports:
# from annoy import Annoy, AnnoyIndex                         # upstream package
# from scikitplot.cexternals._annoy import Annoy, AnnoyIndex  # low-level bindings
from scikitplot.annoy import Annoy, AnnoyIndex, Index

random.seed(0)  # reproducible random vectors

f = 40  # Length of item vector that will be indexed
t = AnnoyIndex(f, 'angular')
for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_item(i, v)

t.build(10)  # 10 trees
t.save('test.ann')

u = AnnoyIndex(f, 'angular')
u.load('test.ann')  # memory-mapped
print(u.get_nns_by_item(0, 1000))  # ids of the nearest neighbors of item 0
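get_nns_by_item returns approximate neighbors. On small data you can compute the exact answer by brute force and compare. The sketch below assumes the angular distance documented by upstream Annoy, sqrt(2 * (1 - cos(u, v))). The helpers angular_distance and exact_nns_by_item are hypothetical and are not part of scikitplot.annoy.

```python
import math
import random


def angular_distance(u, v):
    # Angular distance as documented upstream: sqrt(2 * (1 - cos(u, v))).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos = dot / (nu * nv)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))


def exact_nns_by_item(vectors, i, n):
    # Rank every item by exact distance to item i. The query item comes
    # first because its distance to itself is (numerically) zero.
    q = vectors[i]
    order = sorted(range(len(vectors)), key=lambda j: angular_distance(q, vectors[j]))
    return order[:n]


rng = random.Random(0)
vecs = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(100)]
print(exact_nns_by_item(vecs, 0, 5))
```

Given the same vectors that were added to an index, the overlap between exact_nns_by_item(vecs, 0, 10) and u.get_nns_by_item(0, 10), divided by 10, gives a rough recall estimate. Increasing the number of trees passed to build usually raises it.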
Notes
- Every added vector must have length f.
- Add all items before calling build; no items can be added afterwards.
- Item ids are non-negative integers. Storage is allocated for max(id) + 1 items, so sparse ids waste memory.
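The rules above can be made concrete with a toy stand-in. ToyIndex is a hypothetical pure-Python class, not the Annoy API. It only mirrors the documented constraints: a fixed length f, no additions after build, and storage sized to max(id) + 1. Its get_n_items mimics the upstream method of that name. The exception types it raises are assumptions, and the real backend's errors may differ.

```python
class ToyIndex:
    """Hypothetical stand-in that enforces the documented rules (not Annoy)."""

    def __init__(self, f):
        self.f = f
        self._items = []  # slot j holds the vector for id j, or None
        self._built = False

    def add_item(self, i, vector):
        if self._built:
            raise RuntimeError("cannot add items after build()")
        if not isinstance(i, int) or i < 0:
            raise ValueError("item ids must be non-negative integers")
        if len(vector) != self.f:
            raise ValueError(f"expected length {self.f}, got {len(vector)}")
        # Grow storage so that slot i exists: size becomes max(id) + 1.
        if i >= len(self._items):
            self._items.extend([None] * (i + 1 - len(self._items)))
        self._items[i] = list(vector)

    def build(self):
        # After this point the toy index is read-only.
        self._built = True

    def get_n_items(self):
        return len(self._items)


idx = ToyIndex(3)
idx.add_item(5, [0.0, 1.0, 2.0])
print(idx.get_n_items())  # one item, but six slots (ids 0..5)
```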
High-level wrapper: Index
Index is a Pythonic wrapper for Annoy-like objects.
It is designed for higher-level workflows where you want a Python object that is safe to serialize and move between processes.
Mixins used by the high-level wrapper
The wrapper is assembled from mixins (defined in _mixins) so that each feature stays separate and explicit.
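As a generic illustration of the mixin pattern, the sketch below puts each feature in its own small class. The concrete subclass only composes them, and Python's method resolution order (MRO) decides lookup order. All class names are hypothetical and do not correspond to the actual mixins in _mixins.

```python
class SummaryMixin:
    """Hypothetical mixin: adds a human-readable summary to the host class."""

    def summary(self):
        return f"{type(self).__name__}(f={self.f}, metric={self.metric!r})"


class ValidationMixin:
    """Hypothetical mixin: adds vector validation the host can reuse."""

    def _check_vector(self, v):
        if len(v) != self.f:
            raise ValueError(f"expected length {self.f}, got {len(v)}")
        return list(v)


class _Base:
    # Stand-in for the low-level type; it only stores configuration and items.
    def __init__(self, f, metric):
        self.f, self.metric = f, metric
        self._items = {}


class WrappedIndex(SummaryMixin, ValidationMixin, _Base):
    """Each feature lives in its own mixin; this class just composes them."""

    def add_item(self, i, v):
        self._items[i] = self._check_vector(v)


w = WrappedIndex(3, "angular")
w.add_item(0, [1, 2, 3])
print(w.summary())
```

Keeping features in separate mixins means each one can be read, tested, or dropped on its own, at the cost of having to reason about the MRO when two mixins define the same method.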
Further reading
See also
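- Nearest neighbor search (background): https://en.wikipedia.org/wiki/Nearest_neighbor_search
- Analysis of Image Similarity Using CNN and ANNOY: https://www.researchgate.net/publication/363234433_Analysis_of_Image_Similarity_Using_CNN_and_ANNOY
- Bachrach et al., Speeding Up the Xbox Recommender System Using a Euclidean Transformation for Inner-Product Spaces (Microsoft Research, 2014): https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/XboxInnerProduct.pdf
- https://link.springer.com/chapter/10.1007/978-981-97-7831-7_2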