ANNoy Vector Database (Approximate Nearest Neighbors)#

ANNoy helps you find similar items fast.

You give your data as vectors (arrays of numbers). Then you can search for the nearest neighbors (the most similar vectors).

This page is the user guide for the Annoy [1] integration shipped with scikit-plots. The integration has two layers:

  • Low-level C-API bindings: _annoy

  • High-level Python API: annoy

Public Python API#

This module exports:

  • Annoy: Low-level C-extension type (stable, picklable).

  • AnnoyIndex: Public alias of the low-level Annoy type.

  • Index: High-level Python wrapper subclass (stable, picklable).

Note

For backend and C-extension details, see spotify/ANNoy Vector Database (Approximate Nearest Neighbors).

This page documents annoy, the high-level Python interface to the C++ Annoy backend. The module provides a stable import path and a small, user-facing API built on the low-level bindings in _annoy.

Workflow#

  1. Create an AnnoyIndex with a fixed vector length f and a metric.

  2. Add items with add_item.

  3. Build the forest with build.

  4. Persist the index with save and reopen it later with load.

  5. Query the index, for example with get_nns_by_item.

Quick start#

import random

# Alternative import paths:
# from annoy import Annoy, AnnoyIndex
# from scikitplot.cexternals._annoy import Annoy, AnnoyIndex
from scikitplot.annoy import Annoy, AnnoyIndex, Index

random.seed(0)  # make the random vectors reproducible

f = 40  # length of each item vector that will be indexed
t = AnnoyIndex(f, 'angular')

# Add 1000 random vectors with ids 0..999
for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_item(i, v)

t.build(10)  # 10 trees
t.save('test.ann')

u = AnnoyIndex(f, 'angular')
u.load('test.ann')  # memory-mapped

# Ask for the 1000 nearest neighbors of item 0
print(u.get_nns_by_item(0, 1000))
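
For intuition about what a query returns, the following is a minimal pure-Python sketch of the exact search that an approximate index tries to reproduce. It is not part of scikit-plots or Annoy: the helper names angular_distance and exact_nns_by_item are hypothetical, the distance follows the sqrt(2 * (1 - cos)) definition that upstream Annoy documents for its 'angular' metric, and the handling of zero-length vectors is an assumption made only for this example.

```python
import math
import random


def angular_distance(u, v):
    """Angular distance as documented by upstream Annoy: sqrt(2 * (1 - cos(u, v)))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption for this sketch only: treat a zero vector as orthogonal.
        return math.sqrt(2.0)
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.sqrt(2.0 * (1.0 - cos))


def exact_nns_by_item(vectors, item, n):
    """Return the ids of the n vectors closest to vectors[item], nearest first."""
    query = vectors[item]
    ranked = sorted(
        range(len(vectors)),
        key=lambda j: angular_distance(query, vectors[j]),
    )
    return ranked[:n]


random.seed(0)
f = 40
vectors = [[random.gauss(0, 1) for _ in range(f)] for _ in range(1000)]

exact = exact_nns_by_item(vectors, 0, 10)
print(exact)
```

Brute force costs one distance computation per stored vector for every query, which is the cost a tree-based index avoids. If the same vectors are added to an index, the overlap between set(exact) and the ids returned by get_nns_by_item(0, 10) gives a rough recall estimate for a chosen number of trees.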

Notes#

  • Every added vector must have length f.

  • Add items before calling build.

  • Item ids are non-negative integers. Storage is allocated for max(id) + 1 items, so a sparse id range wastes memory.
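
The notes above can be enforced before any vector reaches the index. The sketch below is one possible approach, not something provided by the library: densify_ids is a hypothetical helper that checks every vector against f and maps arbitrary keys (for example large database ids) to dense ids 0..n-1, so that max(id) + 1 equals the number of items.

```python
def densify_ids(items, f):
    """Validate vector lengths and assign dense ids 0..n-1.

    items: iterable of (key, vector) pairs; keys may be any hashable value.
    Returns (rows, key_to_id), where rows is a list of (dense_id, vector).
    """
    key_to_id = {}
    rows = []
    for key, vector in items:
        if len(vector) != f:
            raise ValueError(
                f"vector for {key!r} has length {len(vector)}, expected {f}"
            )
        if key in key_to_id:
            raise ValueError(f"duplicate key {key!r}")
        dense_id = len(key_to_id)  # next unused id
        key_to_id[key] = dense_id
        rows.append((dense_id, list(vector)))
    return rows, key_to_id


# Sparse keys: indexing these directly would allocate storage for 10_000_001 items.
raw = [
    (10_000_000, [0.1, 0.2, 0.3]),
    (42, [0.0, 1.0, 0.0]),
    (7, [1.0, 0.0, 0.0]),
]
rows, key_to_id = densify_ids(raw, f=3)
id_to_key = {dense_id: key for key, dense_id in key_to_id.items()}
print(rows)
```

Each (dense_id, vector) pair in rows can then be passed to add_item, and id_to_key translates the ids returned by a query back to the original keys.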

High-level wrapper: Index#

Index is a Pythonic wrapper for Annoy-like objects.

It is designed for higher-level workflows where you want a Python object that is safe to serialize and move between processes.

Mixins used by the high-level wrapper#

The wrapper is composed from mixins defined in _mixins, which keeps each feature separate and explicit.
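
As a generic illustration of the mixin pattern, and not a description of what _mixins actually contains, the sketch below composes a toy class from two independent mixins. Every name in it (MetaMixin, JsonMixin, ToyIndex, describe, to_json, from_json) is hypothetical and chosen only for this example.

```python
import json


class MetaMixin:
    """Hypothetical mixin: read-only summary of the index."""

    def describe(self):
        return {"f": self.f, "metric": self.metric, "n_items": len(self._items)}


class JsonMixin:
    """Hypothetical mixin: JSON round trip, independent of MetaMixin."""

    def to_json(self):
        payload = {
            "f": self.f,
            "metric": self.metric,
            # Store items as [id, vector] pairs because JSON object keys must be strings.
            "items": sorted(self._items.items()),
        }
        return json.dumps(payload)

    @classmethod
    def from_json(cls, text):
        payload = json.loads(text)
        obj = cls(payload["f"], payload["metric"])
        for i, vector in payload["items"]:
            obj.add_item(i, vector)
        return obj


class ToyIndex(MetaMixin, JsonMixin):
    """Toy stand-in for a wrapper class; it only stores vectors in a dict."""

    def __init__(self, f, metric="angular"):
        self.f = f
        self.metric = metric
        self._items = {}

    def add_item(self, i, vector):
        if len(vector) != self.f:
            raise ValueError(f"expected length {self.f}, got {len(vector)}")
        self._items[i] = list(vector)


index = ToyIndex(3)
index.add_item(0, [1.0, 0.0, 0.0])
index.add_item(1, [0.0, 1.0, 0.0])

clone = ToyIndex.from_json(index.to_json())
print(clone.describe())
```

Each mixin relies only on the attributes the host class defines (f, metric, _items), so a feature can be added, removed, or tested without touching the others.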

Further reading#

References#