annoy.Index to NPY or CSV with examples
An example showing the Index class.
import random; random.seed(0)
# from annoy import Annoy, AnnoyIndex
from scikitplot.annoy import Annoy, AnnoyIndex, Index
print(AnnoyIndex.__doc__)
High-level Pythonic Annoy wrapper with picklable (or pickle-able).
Minimal modify spotify/annoy low-level C-API to extend Python API.
.. seealso::
* :py:obj:`~scikitplot.annoy.Index.from_low_level`
* https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
import random
from pathlib import Path
random.seed(0)
HERE = Path.cwd().resolve()
OUT = HERE / "../../../scikitplot/annoy/tests" / "test_v2.tree"
f = 10
n = 1000
idx = AnnoyIndex(f, "angular")
for i in range(n):
    idx.add_item(i, [random.gauss(0, 1) for _ in range(f)])
idx.build(10)
idx.save(str(OUT))
print("Wrote", OUT)
idx
Wrote /home/circleci/repo/galleries/examples/annoy/../../../scikitplot/annoy/tests/test_v2.tree
Annoy(f=10, metric='angular', n_items=1000, n_trees=10, on_disk_path=/home/circleci/repo/galleries/examples/annoy/../../../scikitplot/annoy/tests/test_v2.tree)
Small subset → DataFrame/CSV
df = idx.to_dataframe(start=0, stop=1000)
df.to_csv("sample.csv", index=False)
import pandas as pd
pd.read_csv("sample.csv")
Streaming CSV (warning: the output file can be huge)
idx.to_csv("annoy_vectors.csv", start=0, stop=100_000)
'annoy_vectors.csv'
import pandas as pd
pd.read_csv("annoy_vectors.csv")
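To illustrate why a streaming export keeps memory use bounded, here is a minimal sketch that uses only the standard library. It is not the implementation behind `idx.to_csv`. The generator `iter_vectors`, the helper `stream_to_csv`, the output file name, and the `id`/`feature_*` header layout are all hypothetical stand-ins chosen for this example.

```python
import csv
import os
import random
import tempfile
from typing import Iterator, List, Tuple


def iter_vectors(n: int, f: int, seed: int = 0) -> Iterator[Tuple[int, List[float]]]:
    """Hypothetical stand-in for an index: yields (id, vector) one item at a time."""
    rng = random.Random(seed)
    for i in range(n):
        yield i, [rng.gauss(0, 1) for _ in range(f)]


def stream_to_csv(path: str, rows: Iterator[Tuple[int, List[float]]], f: int) -> int:
    """Write rows one by one so only a single vector is held in memory at a time."""
    written = 0
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        # Assumed header layout; the real export may name its columns differently.
        writer.writerow(["id"] + [f"feature_{j}" for j in range(f)])
        for item_id, vec in rows:
            writer.writerow([item_id] + vec)
            written += 1
    return written


csv_path = os.path.join(tempfile.gettempdir(), "stream_sketch.csv")
n_written = stream_to_csv(csv_path, iter_vectors(1000, 10), f=10)
print(n_written)
```

Because the rows come from a generator, peak memory depends on the vector length rather than on the number of items, although the file on disk still grows with every row.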
Large export → memory-safe .npy
Exports items [0, n_items) into a memory-mapped .npy file.
idx.save_vectors_npy("annoy_vectors.npy")
'annoy_vectors.npy'
import numpy as np
np.load("annoy_vectors.npy")
array([[ 0.9417154 , -1.3965781 , -0.67971444, ..., -0.8310992 ,
-1.3090373 , 0.19388774],
[ 0.9932497 , -0.64698166, -0.333668 , ..., -1.5310826 ,
0.7964658 , -2.0036485 ],
[-0.59696275, 1.5036808 , 1.2214364 , ..., 0.55222 ,
2.2275772 , -1.3552415 ],
...,
[ 1.5097532 , 0.8778289 , -0.6042179 , ..., -1.1957974 ,
0.0848854 , -0.64491284],
[-0.47995627, -0.31443435, 2.3843286 , ..., 0.30755976,
-0.23433805, -0.7430332 ],
[ 1.7978611 , 0.53560704, 0.37112716, ..., -0.8411868 ,
-0.9770226 , -0.90564495]], shape=(1000, 10), dtype=float32)
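As a rough sketch of the memory-mapped idea, NumPy's `numpy.lib.format.open_memmap` can create a `.npy` file with its final shape and fill it chunk by chunk, and `np.load(..., mmap_mode="r")` can read it back lazily. Whether `save_vectors_npy` works this way internally is an assumption. The sizes, the file name, and the random data standing in for index vectors are illustrative only.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

n_items, f, chunk = 1000, 10, 256  # illustrative sizes, not taken from the library
npy_path = os.path.join(tempfile.gettempdir(), "memmap_sketch.npy")

rng = np.random.default_rng(0)

# Create the .npy file on disk with its final shape; rows are written in place.
out = open_memmap(npy_path, mode="w+", dtype=np.float32, shape=(n_items, f))
for start in range(0, n_items, chunk):
    stop = min(start + chunk, n_items)
    # Stand-in for fetching vectors [start, stop) from an index.
    out[start:stop] = rng.standard_normal((stop - start, f), dtype=np.float32)
out.flush()
del out  # release the writable memmap

# mmap_mode="r" maps the file instead of loading the whole array into RAM.
arr = np.load(npy_path, mmap_mode="r")
print(arr.shape, arr.dtype)
```

Only one chunk of rows is materialized at a time during the write, which is what makes this pattern suitable for exports larger than available memory.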
Range-only export (strict, sized)
idx.save_vectors_npy("chunk_0_1m.npy", start=0, stop=1_000_000)
'chunk_0_1m.npy'
import numpy as np
np.load("chunk_0_1m.npy")
array([[ 0.9417154 , -1.3965781 , -0.67971444, ..., -0.8310992 ,
-1.3090373 , 0.19388774],
[ 0.9932497 , -0.64698166, -0.333668 , ..., -1.5310826 ,
0.7964658 , -2.0036485 ],
[-0.59696275, 1.5036808 , 1.2214364 , ..., 0.55222 ,
2.2275772 , -1.3552415 ],
...,
[ 1.5097532 , 0.8778289 , -0.6042179 , ..., -1.1957974 ,
0.0848854 , -0.64491284],
[-0.47995627, -0.31443435, 2.3843286 , ..., 0.30755976,
-0.23433805, -0.7430332 ],
[ 1.7978611 , 0.53560704, 0.37112716, ..., -0.8411868 ,
-0.9770226 , -0.90564495]], shape=(1000, 10), dtype=float32)
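When a full export is split into several range exports, the `start`/`stop` pairs have to tile `[0, n_items)` without gaps or overlap. The helper below is a hypothetical sketch of one way to generate such half-open ranges. `chunk_ranges` is not part of the library, and how `save_vectors_npy` handles a `stop` beyond the item count is not assumed here.

```python
from typing import Iterator, Tuple


def chunk_ranges(n_items: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) pairs that cover [0, n_items) exactly once."""
    if n_items < 0 or chunk_size <= 0:
        raise ValueError("n_items must be >= 0 and chunk_size must be > 0")
    for start in range(0, n_items, chunk_size):
        # The last chunk is shortened so stop never exceeds n_items.
        yield start, min(start + chunk_size, n_items)


ranges = list(chunk_ranges(1000, 400))
print(ranges)  # [(0, 400), (400, 800), (800, 1000)]
```

Each pair could then be passed as `start` and `stop` to a range export such as the one above, producing one file per chunk.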
Total running time of the script: (0 minutes 0.068 seconds)