ANNoy#

This page documents the Annoy [0] integration shipped with scikit-plots.

The annoy module (imported as scikitplot.annoy) is a high-level Python interface for the C++ Annoy backend. It provides a stable import path and a small, user-facing API built on the low-level bindings in _annoy.

Note

For backend and C-extension details, see spotify/ANNoy (experimental).

Exports#

This module exports:

  • Annoy: Low-level C-extension type (stable).

  • AnnoyIndex: Public alias of the Annoy index.

  • Index: High-level Python wrapper subclass (picklable).

Workflow#

  1. Create an AnnoyIndex with a fixed vector length f and a metric.

  2. Add items with add_item.

  3. Build the forest with build.

  4. Persist the index to disk with save and reopen it with load.

Quick start#

import random

# Alternative import paths (upstream package or low-level bindings):
# from annoy import Annoy, AnnoyIndex
# from scikitplot.cexternals._annoy import Annoy, AnnoyIndex
from scikitplot.annoy import Annoy, AnnoyIndex, Index

random.seed(0)  # reproducible random vectors

f = 40  # Length of item vector that will be indexed
t = AnnoyIndex(f, 'angular')

# Add 1000 random Gaussian vectors with ids 0..999
for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_item(i, v)

t.build(10)  # 10 trees
t.save('test.ann')

u = AnnoyIndex(f, 'angular')
u.load('test.ann')  # memory-mapped

# Ask for up to 1000 nearest neighbors of item 0
print(u.get_nns_by_item(0, 1000))

Notes#

  • Every added vector must have length f.

  • Add items before calling build.

  • Item ids are non-negative integers. Storage is allocated for max(id) + 1 items, so sparse ids waste memory.
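
The notes above can be checked before any item reaches the index. The following is a minimal sketch, not part of scikit-plots or Annoy: check_items is a hypothetical helper that validates a batch of (id, vector) pairs in pure Python and reports how many slots the ids would imply. It assumes ids must be non-negative, as in upstream Annoy.

```python
from typing import Iterable, Sequence, Tuple


def check_items(items: Iterable[Tuple[int, Sequence[float]]], f: int) -> int:
    """Validate (id, vector) pairs and return the slot count implied by the ids.

    Hypothetical helper for illustration; not part of scikit-plots or Annoy.
    """
    max_id = -1
    for item_id, vector in items:
        # Item ids must be non-negative integers (bool is rejected on purpose).
        if isinstance(item_id, bool) or not isinstance(item_id, int) or item_id < 0:
            raise ValueError(f"item id must be a non-negative int, got {item_id!r}")
        # Every vector must have exactly f components.
        if len(vector) != f:
            raise ValueError(f"item {item_id}: expected length {f}, got {len(vector)}")
        max_id = max(max_id, item_id)
    # Storage is allocated up to max(id) + 1, even when the ids are sparse.
    return max_id + 1


items = [(0, [0.1, 0.2, 0.3]), (7, [0.4, 0.5, 0.6])]
slots = check_items(items, f=3)
print(slots)  # 8 slots for only 2 items, because the largest id is 7
```

Running such a check first keeps a malformed batch from being partially added, and the returned count makes the cost of sparse ids visible before build is called.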

High-level wrapper: Index#

Index is a Pythonic wrapper for Annoy-like objects.

It is designed for higher-level workflows where you want a Python object that is safe to serialize and move between processes.
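
To make "safe to serialize" concrete, here is a rough sketch of a pickle round trip. This page does not describe how Index stores its state, so the example does not import scikit-plots: ToyIndex is a hypothetical stand-in whose state is plain Python data, and its __getstate__/__setstate__ layout is an assumption made purely for illustration.

```python
import pickle


class ToyIndex:
    """Hypothetical stand-in for a picklable index wrapper (not scikit-plots' Index)."""

    def __init__(self, f, metric="angular"):
        self.f = f
        self.metric = metric
        self._items = {}

    def add_item(self, i, vector):
        # Enforce the fixed vector length, as the real index does.
        if len(vector) != self.f:
            raise ValueError(f"expected length {self.f}, got {len(vector)}")
        self._items[i] = list(vector)

    def __getstate__(self):
        # Only plain Python data goes into the pickle payload.
        return {"f": self.f, "metric": self.metric, "items": self._items}

    def __setstate__(self, state):
        # Rebuild the object from the plain-data payload.
        self.f = state["f"]
        self.metric = state["metric"]
        self._items = state["items"]


original = ToyIndex(3)
original.add_item(0, [1.0, 0.0, 0.0])
original.add_item(1, [0.0, 1.0, 0.0])

# Round trip through bytes, as happens when an object is sent to another process.
payload = pickle.dumps(original)
restored = pickle.loads(payload)

print(restored.f, restored.metric, sorted(restored._items))
```

With the real wrapper, the same pickle.dumps and pickle.loads calls would be the natural way to exercise the round trip; the internals shown here are only a stand-in.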

Mixins used by the high-level wrapper#

The wrapper is composed from mixins defined in _mixins, which keeps each feature separate and explicit.

Further reading#

References#