annoy.Index to NPY or CSV with examples

This example builds a small Annoy index and exports its item vectors to a DataFrame, to CSV files, and to .npy files.

import random

random.seed(0)

# from annoy import Annoy, AnnoyIndex
from scikitplot.annoy import Annoy, AnnoyIndex, Index

print(AnnoyIndex.__doc__)
High-level Pythonic Annoy wrapper with picklable (or pickle-able).

Minimal modify spotify/annoy low-level C-API to extend Python API.

.. seealso::
    * :py:obj:`~scikitplot.annoy.Index.from_low_level`
    * https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
import random
from pathlib import Path

random.seed(0)

HERE = Path.cwd().resolve()
OUT = HERE / "../../../scikitplot/annoy/tests" / "test_v2.tree"

f = 10
n = 1000
idx = AnnoyIndex(f, "angular")
for i in range(n):
    idx.add_item(i, [random.gauss(0, 1) for _ in range(f)])

idx.build(10)
idx.save(str(OUT))
print("Wrote", OUT)
idx
Wrote /home/circleci/repo/galleries/examples/annoy/../../../scikitplot/annoy/tests/test_v2.tree

Annoy(f=10, metric='angular', n_items=1000, n_trees=10, on_disk_path=/home/circleci/repo/galleries/examples/annoy/../../../scikitplot/annoy/tests/test_v2.tree)

Small subset → DataFrame/CSV

df = idx.to_dataframe(start=0, stop=1000)
df.to_csv("sample.csv", index=False)
import pandas as pd

pd.read_csv("sample.csv")
      id  feature_0  feature_1  feature_2  feature_3  feature_4  feature_5  feature_6  feature_7  feature_8  feature_9
0      0   0.941715  -1.396578  -0.679714   0.370504  -1.016349  -0.072120   0.179196  -0.831099  -1.309037   0.193888
1      1   0.993250  -0.646982  -0.333668   1.645672  -0.558890  -0.514157   2.404119  -1.531083   0.796466  -2.003649
2      2  -0.596963   1.503681   1.221436  -0.901120  -0.453699   0.080233  -1.258103   0.552220   2.227577  -1.355241
3      3  -1.981533   0.288244  -0.119123   1.804330  -0.160362  -0.050660  -0.190874  -0.990606   0.673030  -1.324083
4      4   1.166490   0.008376   0.503630  -0.552765  -0.920194   1.800263   0.468550   1.207003   0.187123   2.611608
...  ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...
995  995  -0.764022   0.174524  -0.816212   0.623093  -0.395465   0.193787  -0.769984  -0.147106   0.377592  -0.230512
996  996   0.812510  -1.125429  -0.725055   1.007468  -1.236581  -0.339250   0.958843  -0.857818   1.487129   0.667199
997  997   1.509753   0.877829  -0.604218   0.013888  -0.597203   1.374362   0.723732  -1.195797   0.084885  -0.644913
998  998  -0.479956  -0.314434   2.384329  -1.387915   1.522265   0.047036   0.547916   0.307560  -0.234338  -0.743033
999  999   1.797861   0.535607   0.371127   0.373999   1.999118  -1.771545  -0.133898  -0.841187  -0.977023  -0.905645

1000 rows × 11 columns
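
The exported frame has one id column followed by feature_0 ... feature_{f-1}. As a minimal sketch of how such a CSV could be turned back into an id array and a float32 matrix, assuming only that column layout, the snippet below uses a small in-memory stand-in for sample.csv. The helper frame_to_vectors is a hypothetical name for illustration and is not part of scikitplot.

```python
import io

import numpy as np
import pandas as pd


def frame_to_vectors(df):
    """Split an id/feature_* frame into (ids, float32 matrix).

    Hypothetical helper: it only assumes the column layout shown above.
    """
    feature_cols = [c for c in df.columns if c.startswith("feature_")]
    # Sort by the numeric suffix so feature_10 comes after feature_9.
    feature_cols.sort(key=lambda c: int(c.split("_", 1)[1]))
    ids = df["id"].to_numpy()
    vectors = df[feature_cols].to_numpy(dtype=np.float32)
    return ids, vectors


# Tiny stand-in for sample.csv, so the sketch runs without an index.
csv_text = "id,feature_0,feature_1\n0,0.5,-1.25\n1,2.0,0.75\n"
ids, vectors = frame_to_vectors(pd.read_csv(io.StringIO(csv_text)))
print(ids, vectors.shape, vectors.dtype)
```

Casting to float32 matches the dtype of the .npy exports shown later in this example.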



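Streaming CSV (warning: the file can be huge for a large index)

The index in this example holds only 1000 items, so the file read back below has 1000 rows even though stop=100_000 is requested.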

idx.to_csv("annoy_vectors.csv", start=0, stop=100_000)
'annoy_vectors.csv'
import pandas as pd

pd.read_csv("annoy_vectors.csv")
      id  feature_0  feature_1  feature_2  feature_3  feature_4  feature_5  feature_6  feature_7  feature_8  feature_9
0      0   0.941715  -1.396578  -0.679714   0.370504  -1.016349  -0.072120   0.179196  -0.831099  -1.309037   0.193888
1      1   0.993250  -0.646982  -0.333668   1.645672  -0.558890  -0.514157   2.404119  -1.531083   0.796466  -2.003649
2      2  -0.596963   1.503681   1.221436  -0.901120  -0.453699   0.080233  -1.258103   0.552220   2.227577  -1.355242
3      3  -1.981533   0.288244  -0.119123   1.804330  -0.160362  -0.050660  -0.190874  -0.990606   0.673030  -1.324082
4      4   1.166490   0.008376   0.503630  -0.552765  -0.920194   1.800263   0.468550   1.207003   0.187123   2.611608
...  ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...
995  995  -0.764022   0.174524  -0.816212   0.623093  -0.395465   0.193787  -0.769984  -0.147106   0.377592  -0.230512
996  996   0.812510  -1.125429  -0.725055   1.007468  -1.236580  -0.339250   0.958843  -0.857817   1.487129   0.667198
997  997   1.509753   0.877829  -0.604218   0.013888  -0.597203   1.374362   0.723732  -1.195797   0.084885  -0.644913
998  998  -0.479956  -0.314434   2.384329  -1.387915   1.522265   0.047036   0.547916   0.307560  -0.234338  -0.743033
999  999   1.797861   0.535607   0.371127   0.373999   1.999118  -1.771545  -0.133898  -0.841187  -0.977023  -0.905645

1000 rows × 11 columns
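
Streaming matters because a large index may not fit in memory as a single DataFrame. The sketch below illustrates the general idea with the standard csv module: rows are written one at a time, so only one vector is held in memory. It is not the library's implementation. The helper stream_vectors_to_csv and the get_vector callback are hypothetical names, and clamping stop to the index size is an assumption based on the output above. With upstream Annoy, get_item_vector would be a natural candidate for the callback.

```python
import csv
import io


def stream_vectors_to_csv(fh, get_vector, n_items, f, start=0, stop=None):
    """Write rows [start, stop) to an open text file, one row at a time."""
    # Assumption: clamp stop to the index size, as the output above suggests.
    stop = n_items if stop is None else min(stop, n_items)
    writer = csv.writer(fh)
    writer.writerow(["id"] + [f"feature_{j}" for j in range(f)])
    written = 0
    for i in range(start, stop):
        writer.writerow([i, *get_vector(i)])  # one vector in memory at a time
        written += 1
    return written


# Stand-in data source so the sketch runs without an index.
data = {0: [0.5, -1.0], 1: [2.0, 0.25], 2: [1.5, 3.0]}
buf = io.StringIO()
rows = stream_vectors_to_csv(
    buf, data.__getitem__, n_items=len(data), f=2, stop=100_000
)
print(rows)
print(buf.getvalue())
```

Passing an open file handle instead of a path keeps the helper usable with real files, in-memory buffers, or compressed streams.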



Large export → memory-safe .npy

Exports items [0, n_items) into a memory-mapped .npy file.

idx.save_vectors_npy("annoy_vectors.npy")
'annoy_vectors.npy'
import numpy as np

np.load("annoy_vectors.npy")
array([[ 0.9417154 , -1.3965781 , -0.67971444, ..., -0.8310992 ,
        -1.3090373 ,  0.19388774],
       [ 0.9932497 , -0.64698166, -0.333668  , ..., -1.5310826 ,
         0.7964658 , -2.0036485 ],
       [-0.59696275,  1.5036808 ,  1.2214364 , ...,  0.55222   ,
         2.2275772 , -1.3552415 ],
       ...,
       [ 1.5097532 ,  0.8778289 , -0.6042179 , ..., -1.1957974 ,
         0.0848854 , -0.64491284],
       [-0.47995627, -0.31443435,  2.3843286 , ...,  0.30755976,
        -0.23433805, -0.7430332 ],
       [ 1.7978611 ,  0.53560704,  0.37112716, ..., -0.8411868 ,
        -0.9770226 , -0.90564495]], shape=(1000, 10), dtype=float32)
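
One way to write a large matrix into a .npy file without materializing it in RAM is NumPy's numpy.lib.format.open_memmap, which creates the file with a valid header and returns a writable memory map. The sketch below illustrates that technique under stated assumptions; it is not necessarily how save_vectors_npy is implemented, and write_npy_in_chunks and get_vector are hypothetical names.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap


def write_npy_in_chunks(path, get_vector, n_items, f, chunk=256):
    """Fill a float32 .npy file chunk by chunk through a memory map."""
    out = open_memmap(path, mode="w+", dtype=np.float32, shape=(n_items, f))
    for lo in range(0, n_items, chunk):
        hi = min(lo + chunk, n_items)
        # Only `chunk` rows are built in memory at a time.
        out[lo:hi] = [get_vector(i) for i in range(lo, hi)]
    out.flush()
    del out  # release the map so the file is closed
    return path


# Stand-in vectors: row i is [i, i + 0.5, i + 1.0].
with tempfile.TemporaryDirectory() as tmp:
    path = write_npy_in_chunks(
        os.path.join(tmp, "vectors.npy"),
        lambda i: [i, i + 0.5, i + 1.0],
        n_items=1000,
        f=3,
    )
    arr = np.load(path, mmap_mode="r")  # read lazily, also via a memory map
    shape, dtype, last = arr.shape, arr.dtype, arr[-1].tolist()
    del arr  # drop the map before the directory is removed
print(shape, dtype, last)
```

Reading with mmap_mode="r" has the same benefit on the consumer side: rows are paged in on access instead of being loaded all at once.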

Range-only export (strict, sized)

Exports only the items in the requested [start, stop) range.

idx.save_vectors_npy("chunk_0_1m.npy", start=0, stop=1_000_000)
'chunk_0_1m.npy'
import numpy as np

np.load("chunk_0_1m.npy")
array([[ 0.9417154 , -1.3965781 , -0.67971444, ..., -0.8310992 ,
        -1.3090373 ,  0.19388774],
       [ 0.9932497 , -0.64698166, -0.333668  , ..., -1.5310826 ,
         0.7964658 , -2.0036485 ],
       [-0.59696275,  1.5036808 ,  1.2214364 , ...,  0.55222   ,
         2.2275772 , -1.3552415 ],
       ...,
       [ 1.5097532 ,  0.8778289 , -0.6042179 , ..., -1.1957974 ,
         0.0848854 , -0.64491284],
       [-0.47995627, -0.31443435,  2.3843286 , ...,  0.30755976,
        -0.23433805, -0.7430332 ],
       [ 1.7978611 ,  0.53560704,  0.37112716, ..., -0.8411868 ,
        -0.9770226 , -0.90564495]], shape=(1000, 10), dtype=float32)
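
The range calls in this example pass a stop far beyond the 1000 items in the index, yet the outputs contain exactly 1000 rows, which suggests that the range is clamped to [0, n_items). A minimal sketch of such clamping with Python's built-in slice.indices follows, assuming that behaviour. The helper clamp_range is hypothetical; consult the library documentation for the actual contract, since "strict" may mean that some out-of-range values raise an error instead.

```python
def clamp_range(start, stop, n_items):
    """Return (start, stop) clipped to [0, n_items], like list slicing."""
    lo, hi, _ = slice(start, stop).indices(n_items)
    return lo, max(lo, hi)  # never return an inverted range


print(clamp_range(0, 1_000_000, 1000))
print(clamp_range(250, 500, 1000))
print(clamp_range(None, None, 1000))
```

Knowing the clamped bounds in advance gives the exact row count, which is what allows an output array to be sized before any vector is written.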

Tags: level: beginner purpose: showcase

Related examples

annoy.Index python-api with examples

annoy.Annoy legacy c-api with examples

plot_aucplot_script with examples

Mmap annoy.AnnoyIndex with examples
