annoy.Index python-api with examples
An example showing the Index class.
import numpy as np
import random; random.seed(0)
# from annoy import Annoy, AnnoyIndex
# from scikitplot.cexternals._annoy import Annoy, AnnoyIndex
from scikitplot.annoy import Annoy, AnnoyIndex, Index
print(Annoy.__doc__)
Compiled with GCC/Clang. Using 512-bit AVX instructions.
Approximate Nearest Neighbors index (Annoy) with a small, lazy C-extension wrapper.
::
>>> Annoy(
>>> f=None,
>>> metric=None,
>>> *,
>>> n_neighbors=5,
>>> on_disk_path=None,
>>> prefault=None,
>>> seed=None,
>>> verbose=None,
>>> schema_version=None,
>>> )
Parameters
----------
f : int or None, optional, default=None
Vector dimension. If ``0`` or ``None``, dimension may be inferred from the
first vector passed to ``add_item`` (lazy mode).
If None, treated as ``0`` (reset to default).
metric : {"angular", "cosine", "euclidean", "l2", "lstsq", "manhattan", "l1", "cityblock", "taxicab", "dot", "@", ".", "dotproduct", "inner", "innerproduct", "hamming"} or None, optional, default=None
Distance metric (one of 'angular', 'euclidean', 'manhattan', 'dot', 'hamming').
If omitted and ``f > 0``, defaults to ``'angular'`` (cosine-like).
If omitted and ``f == 0``, metric may be set later before construction.
If None, behavior depends on ``f``:
* If ``f > 0``: defaults to ``'angular'`` (legacy behavior; may emit a
:class:`FutureWarning`).
* If ``f == 0``: leaves the metric unset (lazy). You may set
:attr:`metric` later before construction, or it will default to
``'angular'`` on first :meth:`add_item`.
n_neighbors : int, default=5
Non-negative integer. Number of neighbors to retrieve for each query.
on_disk_path : str or None, optional, default=None
If provided, configures the path for on-disk building. When the underlying
index exists, this enables on-disk build mode (equivalent to calling
:meth:`on_disk_build` with the same filename).
Note: Annoy core truncates the target file when enabling on-disk build.
This wrapper treats ``on_disk_path`` as strictly equivalent to calling
:meth:`on_disk_build` with the same filename (truncate allowed).
In lazy mode (``f==0`` and/or ``metric is None``), activation occurs once
the underlying C++ index is created.
prefault : bool or None, optional, default=None
If True, request page-faulting index pages into memory when loading
(when supported by the underlying platform/backing).
If None, treated as ``False`` (reset to default).
seed : int or None, optional, default=None
Non-negative integer seed. If set before the index is constructed,
the seed is stored and applied when the C++ index is created.
Seed value ``0`` is treated as "use Annoy's deterministic default seed"
(a :class:`UserWarning` is emitted when ``0`` is explicitly provided).
verbose : int or None, optional, default=None
Verbosity level. Values are clamped to the range ``[-2, 2]``.
``level >= 1`` enables Annoy's verbose logging; ``level <= 0`` disables it.
Logging level inspired by gradient-boosting libraries:
* ``<= 0`` : quiet (warnings only)
* ``1`` : info (Annoy's ``verbose=True``)
* ``>= 2`` : debug (currently same as info, reserved for future use)
schema_version : int, optional, default=None
Serialization/compatibility strategy marker.
This does not change the Annoy on-disk format, but it *does* control
how the index is snapshotted in pickles.
* ``0`` or ``1``: pickle stores a ``portable-v1`` snapshot (fast restore,
ABI-checked).
* ``2``: pickle stores ``canonical-v1`` (portable across ABIs; restores by
rebuilding deterministically).
* ``>=3``: pickle stores both portable and canonical (canonical is used as
a fallback if the ABI check fails).
If None, treated as ``0`` (reset to default).
Attributes
----------
f : int, default=0
Vector dimension. ``0`` means "unknown / lazy".
metric : {'angular', 'euclidean', 'manhattan', 'dot', 'hamming'}, default="angular"
Canonical metric name, or None if not configured yet (lazy).
n_neighbors : int, default=5
Non-negative integer. Number of neighbors to retrieve for each query.
on_disk_path : str or None, optional, default=None
Configured on-disk build path. Setting this attribute enables on-disk
build mode (equivalent to :meth:`on_disk_build`), with safety checks
to avoid implicit truncation of existing files.
prefault : bool, default=False
Stored prefault flag (see :meth:`load`/:meth:`save` prefault parameters).
seed : int or None, optional, default=None
Non-negative integer seed. Also provides :meth:`random_state`.
verbose : int or None, optional, default=None
Verbosity level.
schema_version : int, default=0
Reserved schema/version marker (stored; does not affect on-disk format).
n_features : int
Alias of :meth:`f` (dimension), provided for scikit-learn naming parity.
Also provides :meth:`n_features_`, :meth:`n_features_in_`.
n_features_out_ : int
Number of output features produced by transform.
feature_names_in_ : list-like
Input feature names seen during fit.
Set only when explicitly provided via fit(..., feature_names=...).
y : list-like | None, optional, default=None
Dense label cache aligned to item ids (``0 .. n_items-1``). This is a
convenience view for scikit-learn style APIs and may be derived from
``y_map`` lazily.
The setter accepts sequences only (a dict is not allowed); when possible it
validates that ``len(y) == n_items`` and updates ``y_map`` deterministically.
y_map : dict | None, optional, default=None
Canonical sparse mapping ``{item_id -> label}``. Keys must be non-negative
integers and (when an index exists) strictly less than ``n_items``.
Setting this property invalidates the dense ``y`` cache; ``y`` is
materialized lazily (missing keys become ``None``).
See Also
--------
add_item : Add a vector to the index.
build : Build the forest after adding items.
unbuild : Remove trees to allow adding more items.
get_nns_by_item, get_nns_by_vector : Query nearest neighbours.
save, load : Persist the index to/from disk.
serialize, deserialize : Persist the index to/from bytes.
set_seed : Set the random seed deterministically.
set_verbose : Set verbosity level.
info : Return a structured summary of the current index.
Notes
-----
* Once the underlying C++ index is created, ``f`` and ``metric`` are immutable.
This keeps the object consistent and avoids undefined behavior.
* The C++ index is created lazily when sufficient information is available:
when both ``f > 0`` and ``metric`` are known, or when an operation that
requires the index is first executed.
* If ``f == 0``, the dimensionality is inferred from the first non-empty vector
passed to :meth:`add_item` and is then fixed for the lifetime of the index.
* Assigning ``None`` to :attr:`f` is not supported. Use ``0`` for lazy
inference (this matches ``Annoy(f=None, ...)`` at construction time).
* If ``metric`` is omitted while ``f > 0``, the current behavior defaults to
``'angular'`` and may emit a :class:`FutureWarning`. To avoid warnings and
future behavior changes, always pass ``metric=...`` explicitly.
* Items must be added *before* calling :meth:`build`. After :meth:`build`, the
index becomes read-only; to add more items, call :meth:`unbuild`, add items
again with :meth:`add_item`, then call :meth:`build` again.
* Very large indexes can be built directly on disk with :meth:`on_disk_build`
and then memory-mapped with :meth:`load`.
* :meth:`info` returns a structured summary (dimension, metric, counts, and
optional memory usage) suitable for programmatic inspection.
* This wrapper stores user configuration (e.g., seed/verbosity) even before the
C++ index exists and applies it deterministically upon construction.
Developer Notes:
- Source of truth:
* ``f`` (int) and ``metric_id`` (enum) describe configuration.
* ``ptr`` is NULL when index is not constructed.
- Invariant:
* ``ptr != NULL`` implies ``f > 0`` and ``metric_id != METRIC_UNKNOWN``.
Examples
--------
>>> from annoy import Annoy, AnnoyIndex
High-level API:
>>> from scikitplot.cexternals._annoy import Annoy, AnnoyIndex
>>> from scikitplot.annoy import Annoy, AnnoyIndex, Index
The lifecycle follows the examples in ``test.ipynb``:
1. **Construct the index**
>>> import random; random.seed(0)
>>> # from annoy import AnnoyIndex
>>> from scikitplot.cexternals._annoy import Annoy, AnnoyIndex
>>> from scikitplot.annoy import Annoy, AnnoyIndex, Index
>>> idx = Annoy(f=3, metric="angular")
>>> idx.f, idx.metric
(3, 'angular')
If you pass ``f=0`` the dimension can be inferred on the first
call to :meth:`add_item`.
2. **Add items**
>>> idx.add_item(0, [1.0, 0.0, 0.0])
>>> idx.add_item(1, [0.0, 1.0, 0.0])
>>> idx.add_item(2, [0.0, 0.0, 1.0])
>>> idx.get_n_items()
3
3. **Build the forest**
>>> idx.build(n_trees=-1)
>>> idx.get_n_trees()
10
>>> idx.memory_usage() # bytes
543076
After :meth:`build` the index becomes read-only. You can still
query, save, load and serialize it.
4. **Query neighbours**
By stored item id:
>>> idx.get_nns_by_item(0, 5)
[0, 1, 2, ...]
With distances:
>>> idx.get_nns_by_item(0, 5, include_distances=True)
([0, 1, 2, ...], [0.0, 1.22, 1.26, ...])
Or by an explicit query vector:
>>> idx.get_nns_by_vector([0.1, 0.2, 0.3], 5, include_distances=True)
([103, 71, 160, 573, 672], [...])
5. **Persistence**
To work with memory-mapped indices on disk:
>>> idx.save("annoy_test.annoy")
>>> idx2 = Annoy(f=100, metric="angular")
>>> idx2.load("annoy_test.annoy")
>>> idx2.get_n_items()
1000
Or via raw bytes:
>>> buf = idx.serialize()
>>> new_idx = Annoy(f=100, metric="angular")
>>> new_idx.deserialize(buf)
>>> new_idx.get_n_items()
1000
You can release OS resources with :meth:`unload` and drop the
current forest with :meth:`unbuild`.
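The ``verbose`` parameter documented above is clamped to ``[-2, 2]``, and only levels of 1 or more enable Annoy's logging. A minimal sketch of that rule in plain Python is shown here. The names ``clamp_verbose`` and ``logging_enabled`` are hypothetical and exist only for illustration; they are not part of the package.

```python
def clamp_verbose(level: int) -> int:
    """Clamp a verbosity level to the documented range [-2, 2]."""
    return max(-2, min(2, int(level)))


def logging_enabled(level: int) -> bool:
    """Levels >= 1 turn verbose logging on; levels <= 0 keep it off."""
    return clamp_verbose(level) >= 1


# Hypothetical helper names; this only mirrors the rule stated in the docstring.
for requested in (-5, 0, 1, 7):
    print(requested, clamp_verbose(requested), logging_enabled(requested))
```

Out-of-range requests such as ``7`` or ``-5`` end up at the nearest bound rather than raising, which is what "clamped" means in the parameter description.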
print(Index.__doc__)
High-level ANNoy index composed from mixins.
Parameters
----------
f : int or None, optional, default=None
Vector dimension. If ``0`` or ``None``, dimension may be inferred from the
first vector passed to ``add_item`` (lazy mode).
If None, treated as ``0`` (reset to default).
metric : {"angular", "cosine", "euclidean", "l2", "lstsq", "manhattan", "l1", "cityblock", "taxicab", "dot", "@", ".", "dotproduct", "inner", "innerproduct", "hamming"} or None, optional, default=None
Distance metric (one of 'angular', 'euclidean', 'manhattan', 'dot', 'hamming').
If omitted and ``f > 0``, defaults to ``'angular'`` (cosine-like).
If omitted and ``f == 0``, metric may be set later before construction.
If None, behavior depends on ``f``:
* If ``f > 0``: defaults to ``'angular'`` (legacy behavior; may emit a
:class:`FutureWarning`).
* If ``f == 0``: leaves the metric unset (lazy). You may set
:attr:`metric` later before construction, or it will default to
``'angular'`` on first :meth:`add_item`.
n_neighbors : int, default=5
Non-negative integer. Number of neighbors to retrieve for each query.
on_disk_path : str or None, optional, default=None
If provided, configures the path for on-disk building. When the underlying
index exists, this enables on-disk build mode (equivalent to calling
:meth:`on_disk_build` with the same filename).
Note: Annoy core truncates the target file when enabling on-disk build.
This wrapper treats ``on_disk_path`` as strictly equivalent to calling
:meth:`on_disk_build` with the same filename (truncate allowed).
In lazy mode (``f==0`` and/or ``metric is None``), activation occurs once
the underlying C++ index is created.
prefault : bool or None, optional, default=None
If True, request page-faulting index pages into memory when loading
(when supported by the underlying platform/backing).
If None, treated as ``False`` (reset to default).
seed : int or None, optional, default=None
Non-negative integer seed. If set before the index is constructed,
the seed is stored and applied when the C++ index is created.
Seed value ``0`` is treated as "use Annoy's deterministic default seed"
(a :class:`UserWarning` is emitted when ``0`` is explicitly provided).
verbose : int or None, optional, default=None
Verbosity level. Values are clamped to the range ``[-2, 2]``.
``level >= 1`` enables Annoy's verbose logging; ``level <= 0`` disables it.
Logging level inspired by gradient-boosting libraries:
* ``<= 0`` : quiet (warnings only)
* ``1`` : info (Annoy's ``verbose=True``)
* ``>= 2`` : debug (currently same as info, reserved for future use)
schema_version : int, optional, default=None
Serialization/compatibility strategy marker.
This does not change the Annoy on-disk format, but it *does* control
how the index is snapshotted in pickles.
* ``0`` or ``1``: pickle stores a ``portable-v1`` snapshot (fast restore,
ABI-checked).
* ``2``: pickle stores ``canonical-v1`` (portable across ABIs; restores by
rebuilding deterministically).
* ``>=3``: pickle stores both portable and canonical (canonical is used as
a fallback if the ABI check fails).
If None, treated as ``0`` (reset to default).
Attributes
----------
f : int, default=0
Vector dimension. ``0`` means "unknown / lazy".
metric : {'angular', 'euclidean', 'manhattan', 'dot', 'hamming'}, default="angular"
Canonical metric name, or None if not configured yet (lazy).
n_neighbors : int, default=5
Non-negative integer. Number of neighbors to retrieve for each query.
on_disk_path : str or None, optional, default=None
Configured on-disk build path. Setting this attribute enables on-disk
build mode (equivalent to :meth:`on_disk_build`), with safety checks
to avoid implicit truncation of existing files.
seed, random_state : int or None, optional, default=None
Non-negative integer seed.
verbose : int or None, optional, default=None
Verbosity level.
prefault : bool, default=False
Stored prefault flag (see :meth:`load`/:meth:`save` prefault parameters).
schema_version : int, default=0
Reserved schema/version marker (stored; does not affect on-disk format).
n_features, n_features_, n_features_in_ : int
Alias of `f` (dimension), provided for scikit-learn naming parity.
n_features_out_ : int
Number of output features produced by transform.
feature_names_in_ : list-like
Input feature names seen during fit.
Set only when explicitly provided via fit(..., feature_names=...).
y : dict | None, optional, default=None
If provided to fit(X, y), labels are stored here after a successful build.
You may also set this property manually. When possible, the setter enforces
that len(y) matches the current number of items (n_items).
pickle_mode : PickleMode
Pickle strategy used by :class:`~scikitplot.annoy._mixins._pickle.PickleMixin`.
compress_mode : CompressMode or None
Optional compression used by :class:`~scikitplot.annoy._mixins._pickle.PickleMixin`
when serializing to bytes.
Notes
-----
This class is a direct subclass of the C-extension backend. It does not
override ``__new__`` and does not rely on cooperative initialization across
mixins. Mixins must be written so that their methods work even if they
define no ``__init__`` at all.
See Also
--------
scikitplot.cexternals._annoy.Annoy
Index.from_low_level
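The ``schema_version`` rules in the docstring above map an integer to the snapshot kinds stored in a pickle. The sketch below restates that mapping as a small lookup function. ``pickle_snapshots`` is a hypothetical name, and rejecting negative values is an assumption of this sketch, not documented behavior.

```python
from typing import Optional, Tuple


def pickle_snapshots(schema_version: Optional[int]) -> Tuple[str, ...]:
    """Return the snapshot kinds the docstring associates with a schema version."""
    # None is documented as "treated as 0 (reset to default)".
    version = 0 if schema_version is None else int(schema_version)
    if version < 0:
        # Assumption of this sketch: negative versions are invalid.
        raise ValueError("schema_version must be non-negative")
    if version in (0, 1):
        return ("portable-v1",)
    if version == 2:
        return ("canonical-v1",)
    # >= 3: both are stored; canonical is the fallback if the ABI check fails.
    return ("portable-v1", "canonical-v1")


for value in (None, 0, 1, 2, 3):
    print(value, pickle_snapshots(value))
```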
from scikitplot import annoy
annoy.__version__, dir(annoy), dir(annoy.Annoy)
('2.0.0+git.20251130.8a7e82cb537053926b0ac6ec132b9ccc875af40c', ['Annoy', 'AnnoyIndex', 'CompressMode', 'Index', 'IndexIOMixin', 'MetaMixin', 'NDArrayMixin', 'PickleMixin', 'PickleMode', 'PlottingMixin', 'VectorOpsMixin', '__all__', '__author__', '__author_email__', '__builtins__', '__cached__', '__doc__', '__file__', '__git_hash__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_base', '_metadata', '_mixins', '_utils', 'annotations', 'annoylib'], ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__sklearn_clone__', '__sklearn_is_fitted__', '__sklearn_tags__', '__str__', '__subclasshook__', '_f', '_metric_id', '_on_disk_path', '_prefault', '_repr_html_', '_schema_version', '_y', '_y_map', 'add_item', 'build', 'deserialize', 'f', 'feature_names_in_', 'fit', 'fit_transform', 'get_distance', 'get_feature_names_out', 'get_item_vector', 'get_n_items', 'get_n_trees', 'get_nns_by_item', 'get_nns_by_vector', 'get_params', 'info', 'load', 'memory_usage', 'metric', 'n_features', 'n_features_', 'n_features_in_', 'n_features_out_', 'n_neighbors', 'on_disk_build', 'on_disk_path', 'prefault', 'random_state', 'rebuild', 'repr_info', 'save', 'schema_version', 'seed', 'serialize', 'set_params', 'set_seed', 'set_verbose', 'set_verbosity', 'transform', 'unbuild', 'unload', 'verbose', 'y', 'y_map'])
import sys
# TODO: change this import to wherever your modified AnnoyIndex lives
# e.g. scikitplot.cexternals._annoy or similar
# import scikitplot.cexternals._annoy as annoy
from scikitplot import annoy
sys.modules["annoy"] = annoy # now `import annoy` will resolve to your module
import annoy
print(annoy.__doc__)
Public Annoy Python API for scikitplot.
Spotify ANNoy [1]_ (Approximate Nearest Neighbors Oh Yeah).
This package exposes **two layers**:
Exports:
1. Low-level C-extension types copied from Spotify's *annoy* project:
:class:`~scikitplot.cexternals._annoy.Annoy` and :class:`~scikitplot.cexternals._annoy.AnnoyIndex`.
2. A high-level, mixin-composed wrapper :class:`~scikitplot.annoy.Index` that:
- forwards the complete low-level API deterministically,
- adds versioned manifest import/export,
- provides explicit index I/O names (``save_index`` / ``load_index``),
- provides safe Python-object persistence helpers (pickling),
- adds optional NumPy export and plotting utilities.
Notes
-----
This module intentionally avoids side effects at import time (no implicit NumPy
or matplotlib imports).
.. seealso::
* :ref:`ANNoy <annoy-index>`
* :ref:`cexternals/ANNoy <cexternals-annoy-index>`
* https://github.com/spotify/annoy
* https://pypi.org/project/annoy
See Also
--------
scikitplot.cexternals._annoy
Low-level C-extension backend.
scikitplot.annoy.Index
High-level wrapper composed from mixins.
References
----------
.. [1] `Spotify AB. (2013, Feb 20). "ANNoy: Approximate Nearest Neighbors Oh Yeah"
Github. https://github.com/spotify/annoy <https://github.com/spotify/annoy>`_
Examples
--------
>>> import random
>>> random.seed(0)
>>> # from annoy import AnnoyIndex
>>> from scikitplot.cexternals._annoy import Annoy, AnnoyIndex
>>> from scikitplot.annoy import Annoy, AnnoyIndex, Index
>>> f = 40 # vector dimensionality
>>> t = Index(f, "angular") # same constructor as the low-level backend
>>> t.add_item(0, [1] * f)
>>> t.build(10) # Build 10 trees
>>> t.get_nns_by_item(0, 1) # Find nearest neighbor
Index()
# =============================================================
# 1. Construction
# =============================================================
idx = Index()
idx = Index(None, None)
print("Index dimension:", idx.f)
print("Metric :", idx.metric)
print(idx.info())
print(idx)
print(type(idx))
idx
# help(idx.info)
Index dimension: 0
Metric : None
{'f': 0, 'metric': None, 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 0, 'n_trees': 0}
Annoy(**{'f': 0, 'metric': None, 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
<class 'scikitplot.annoy._base.Index'>
dir(idx)
['_META_SCHEMA_VERSION', '_PICKLE_STATE_VERSION', '__annotations__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__sklearn_clone__', '__sklearn_is_fitted__', '__sklearn_tags__', '__str__', '__subclasshook__', '__weakref__', '_as_2d_coords', '_backend', '_compress_mode', '_f', '_get_lock', '_lock', '_metric_id', '_ndarray_expected_rows', '_ndarray_infer_f', '_ndarray_iter_ids', '_ndarray_materialize_dense', '_ndarray_require_unbuilt', '_on_disk_path', '_pickle_mode', '_plotting_backend', '_prefault', '_rebuild', '_repr_html_', '_schema_version', '_y', '_y_map', 'add_item', 'add_items', 'backend', 'build', 'compress_mode', 'deserialize', 'f', 'feature_names_in_', 'fit', 'fit_transform', 'from_bytes', 'from_json', 'from_low_level', 'from_metadata', 'from_yaml', 'get_distance', 'get_feature_names_out', 'get_item_vector', 'get_item_vectors', 'get_n_items', 'get_n_trees', 'get_nns_by_item', 'get_nns_by_vector', 'get_params', 'info', 'iter_item_vectors', 'kneighbors', 'kneighbors_graph', 'load', 'load_bundle', 'load_index', 'memory_usage', 'metric', 'n_features', 'n_features_', 'n_features_in_', 'n_features_out_', 'n_neighbors', 'on_disk_build', 'on_disk_path', 'pickle_mode', 'plot_index', 'plot_knn_edges', 'prefault', 'query_by_item', 'query_by_vector', 'query_vectors_by_item', 'query_vectors_by_vector', 'random_state', 'rebuild', 'repr_info', 'save', 'save_bundle', 'save_index', 'schema_version', 'seed', 'serialize', 'set_params', 'set_seed', 'set_verbose', 'set_verbosity', 'to_bytes', 'to_json', 'to_metadata', 'to_numpy', 'to_pandas', 'to_scipy_csr', 'to_yaml', 'transform', 'unbuild', 'unload', 'verbose', 'y', 'y_map']
# AttributeError: readonly attribute
# idx._metric_id = 1
idx._f, idx._metric_id, idx._on_disk_path
(0, 0, None)
idx.f, idx.metric, idx.on_disk_path
(0, None, None)
idx.metric = "dot"
idx
idx.f, idx.metric, idx.on_disk_path
(0, 'dot', None)
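The constructor docstring lists many accepted metric spellings but only five canonical names. The sketch below shows one way such aliases could be normalized. The grouping of aliases is an assumption read off the order of the documented list, and ``canonical_metric`` is a hypothetical helper, not the wrapper's own code.

```python
from typing import Optional

# Assumed grouping: each documented alias is placed under the canonical
# name it follows in the docstring's list of accepted values.
_ALIASES = {
    "angular": ("angular", "cosine"),
    "euclidean": ("euclidean", "l2", "lstsq"),
    "manhattan": ("manhattan", "l1", "cityblock", "taxicab"),
    "dot": ("dot", "@", ".", "dotproduct", "inner", "innerproduct"),
    "hamming": ("hamming",),
}
_CANONICAL = {
    alias: name for name, aliases in _ALIASES.items() for alias in aliases
}


def canonical_metric(name: Optional[str]) -> Optional[str]:
    """Map an alias to its canonical metric name; None stays None (lazy)."""
    if name is None:
        return None
    key = name.strip().lower()
    if key not in _CANONICAL:
        raise ValueError(f"unknown metric: {name!r}")
    return _CANONICAL[key]


print(canonical_metric("cosine"))
print(canonical_metric("@"))
print(canonical_metric(None))
```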
type(idx)
# =============================================================
# 1. Construction
# =============================================================
idx = Index(f=3, metric="angular")
print("Index dimension:", idx.f)
print("Metric :", idx.metric)
print(idx.info())
print(idx)
idx
Index dimension: 3
Metric : angular
{'f': 3, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 0, 'n_trees': 0}
Annoy(**{'f': 3, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
# =============================================================
# 2. Add items
# =============================================================
idx.add_item(0, [0, 0, 0])
idx.add_item(1, [1, 0, 0])
idx.add_item(2, [0, 1, 0])
idx.add_item(3, [0, 0, 1])
idx.add_item(4, [2, 0, 0])
idx.add_item(5, [0, 2, 0])
idx.add_item(6, [0, 0, 2])
idx.add_item(7, [3, 0, 0])
idx.add_item(8, [0, 3, 0])
idx.add_item(9, [0, 0, 3])
idx.add_item(10, [4, 0, 0])
idx.add_item(11, [0, 4, 0])
idx.add_item(12, [0, 0, 4])
idx.add_item(12, [4, 0, 0])  # note: id 12 was already used on the previous line
idx.add_item(13, [0, 4, 0])
idx.add_item(14, [0, 0, 4])
print("Number of items:", idx.get_n_items())
print("Index dimension:", idx.f)
print("Metric :", idx.metric)
print(idx.info())
print(idx)
idx
Number of items: 15
Index dimension: 3
Metric : angular
{'f': 3, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 15, 'n_trees': 0}
Annoy(**{'f': 3, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
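The cell above makes 16 ``add_item`` calls but reports 15 items, because id 12 appears twice. The plain-dict sketch below reproduces that count without the library. It does not show how the wrapper counts items internally; both "distinct ids" and "largest id plus one" give 15 for these ids.

```python
# Same ids, in the same order, as the add_item calls in the cell above.
ids_in_call_order = list(range(13)) + [12, 13, 14]

# A dict keyed by item id: adding an existing id replaces the old entry.
store = {}
for call_number, item_id in enumerate(ids_in_call_order):
    store[item_id] = call_number

print("add_item calls:", len(ids_in_call_order))  # 16
print("distinct ids  :", len(store))              # 15
print("largest id + 1:", max(store) + 1)          # 15
```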
def plot(idx, y=None, **kwargs):
    import numpy as np
    import matplotlib.pyplot as plt
    import scikitplot.cexternals._annoy._plotting as utils
    single = np.zeros(idx.get_n_items(), dtype=int)
    if y is None:
        double = np.random.uniform(0, 1, idx.get_n_items()).round()
    # single vs double
    fig, ax = plt.subplots(ncols=2, figsize=(12, 5))
    alpha = kwargs.pop("alpha", 0.8)
    y2 = utils.plot_annoy_index(
        idx,
        dims=list(range(idx.f)),
        plot_kwargs={"draw_legend": False},
        ax=ax[0],
    )[0]
    utils.plot_annoy_knn_edges(
        idx,
        y2,
        k=1,
        line_kwargs={"alpha": alpha},
        ax=ax[1],
    )
idx.unbuild()
idx.build(100)
plot(idx)

from scikitplot import annoy as a
print(a.Annoy) # same
print(a.AnnoyIndex) # same
print(a.Index) # should show <class '..._base.Index'>
print(isinstance(idx, a.Annoy))
print(isinstance(idx, a.AnnoyIndex))
print(isinstance(idx, a.Index))
print(type(idx))
print(idx.__class__.__module__)
print(idx.__class__.__mro__)
<class 'scikitplot.cexternals._annoy.Annoy'>
<class 'scikitplot.cexternals._annoy.Annoy'>
<class 'scikitplot.annoy._base.Index'>
True
True
True
<class 'scikitplot.annoy._base.Index'>
scikitplot.annoy._base
(<class 'scikitplot.annoy._base.Index'>, <class 'scikitplot.cexternals._annoy.Annoy'>, <class 'scikitplot.annoy._mixins._meta.MetaMixin'>, <class 'scikitplot.annoy._mixins._io.IndexIOMixin'>, <class 'scikitplot.annoy._mixins._pickle.PickleMixin'>, <class 'scikitplot.annoy._mixins._vectors.VectorOpsMixin'>, <class 'scikitplot.annoy._mixins._ndarray.NDArrayMixin'>, <class 'scikitplot.annoy._mixins._plotting.PlottingMixin'>, <class 'object'>)
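The MRO printed above lists the backend first and the mixins after it, and the ``Index`` docstring notes that mixins must work without defining ``__init__``. The toy classes below sketch that composition pattern. Every class name here is hypothetical and only stands in for the real backend and mixins.

```python
class ToyBackend:
    """Stand-in for the C-extension base; it owns construction and state."""

    def __init__(self, f=0):
        self.f = f


class ToyMetaMixin:
    """Defines no __init__; it only reads attributes set by the backend."""

    def describe(self):
        return {"f": self.f}


class ToyVectorMixin:
    """Also has no __init__, so no cooperative initialization is needed."""

    def zero_vector(self):
        return [0.0] * self.f


class ToyIndex(ToyBackend, ToyMetaMixin, ToyVectorMixin):
    """Backend first, then mixins, mirroring the order in the MRO above."""


idx_toy = ToyIndex(f=3)
print([cls.__name__ for cls in ToyIndex.__mro__])
# ['ToyIndex', 'ToyBackend', 'ToyMetaMixin', 'ToyVectorMixin', 'object']
print(idx_toy.describe(), idx_toy.zero_vector())
```

Because only the backend defines ``__init__``, constructing ``ToyIndex`` never depends on the mixins calling ``super().__init__()``.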
# =============================================================
# 1. Construction
# =============================================================
idx = Index(f=3, metric="angular")
print("Index dimension:", idx.f)
print("Metric :", idx.metric)
print(idx.info())
print(idx)
idx
Index dimension: 3
Metric : angular
{'f': 3, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 0, 'n_trees': 0}
Annoy(**{'f': 3, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
# =============================================================
# 2. Add items
# =============================================================
idx.add_item(0, [1, 0, 0])
idx.add_item(1, [0, 1, 0])
idx.add_item(2, [0, 0, 1])
print("Number of items:", idx.get_n_items())
print("Index dimension:", idx.f)
print("Metric :", idx.metric)
Number of items: 3
Index dimension: 3
Metric : angular
# =============================================================
# 1. Construction
# =============================================================
idx = Index(100, metric="angular")
print("Index dimension:", idx.f)
print("Metric :", idx.metric)
idx.on_disk_build("annoy_test_2.annoy")
# help(idx.on_disk_build)
Index dimension: 100
Metric : angular
# =============================================================
# 2. Add items
# =============================================================
f=100
n=1000
for i in range(n):
    if i % (n // 10) == 0:
        print(f"{i} / {n} = {1.0 * i / n}")
    # v = []
    # for z in range(f):
    #     v.append(random.gauss(0, 1))
    v = [random.gauss(0, 1) for _ in range(f)]
    idx.add_item(i, v)
print("Number of items:", idx.get_n_items())
print("Index dimension:", idx.f)
print("Metric :", idx.metric)
print(idx)
0 / 1000 = 0.0
100 / 1000 = 0.1
200 / 1000 = 0.2
300 / 1000 = 0.3
400 / 1000 = 0.4
500 / 1000 = 0.5
600 / 1000 = 0.6
700 / 1000 = 0.7
800 / 1000 = 0.8
900 / 1000 = 0.9
Number of items: 1000
Index dimension: 100
Metric : angular
Annoy(**{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
# =============================================================
# 3. Build index
# =============================================================
idx.build(10)
print("Trees:", idx.get_n_trees())
print("Memory usage:", idx.memory_usage(), "bytes")
print(idx.info())
print(idx)
idx
# help(idx.build)
Trees: 10
Memory usage: 543076 bytes
{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 1000, 'n_trees': 10, 'memory_usage_byte': 543076, 'memory_usage_mib': 0.5179176330566406}
Annoy(**{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
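The ``info()`` output above reports memory both as ``memory_usage_byte`` and ``memory_usage_mib``. The two printed values are consistent with dividing bytes by 1024 * 1024, as this short check shows. ``bytes_to_mib`` is a hypothetical helper name.

```python
def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


# 543076 is the memory_usage_byte value printed in the output above.
print(bytes_to_mib(543076))
```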
idx.unbuild()
print(idx.info())
print(idx)
idx
{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 1000, 'n_trees': 0}
Annoy(**{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
idx.build(10)
print(idx.info())
print(idx)
idx
{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 1000, 'n_trees': 10, 'memory_usage_byte': 543076, 'memory_usage_mib': 0.5179176330566406}
Annoy(**{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
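The build, unbuild, build cycle above follows the lifecycle described in the class notes: items are added before ``build``, the built index is read-only, and ``unbuild`` allows adding again. The toy class below sketches those rules together with lazy dimension inference. ``ToyIndex`` is hypothetical, and the exception types it raises are assumptions, not the library's documented errors.

```python
class ToyIndex:
    """Toy model of the add -> build -> unbuild lifecycle; no real trees."""

    def __init__(self, f=0):
        self.f = f            # 0 means "unknown / lazy"
        self._items = {}
        self._built = False

    def add_item(self, item_id, vector):
        if self._built:
            # Assumed error type; the real wrapper may raise something else.
            raise RuntimeError("index is built; call unbuild() first")
        vector = [float(x) for x in vector]
        if self.f == 0:
            if not vector:
                raise ValueError("cannot infer dimension from an empty vector")
            self.f = len(vector)  # inferred once, then fixed
        elif len(vector) != self.f:
            raise ValueError(f"expected {self.f} values, got {len(vector)}")
        self._items[item_id] = vector

    def build(self):
        self._built = True

    def unbuild(self):
        self._built = False


toy = ToyIndex()               # lazy: dimension not known yet
toy.add_item(0, [1, 0, 0])     # dimension becomes 3 here
toy.build()
try:
    toy.add_item(1, [0, 1, 0])
except RuntimeError as exc:
    print("rejected while built:", exc)
toy.unbuild()
toy.add_item(1, [0, 1, 0])     # allowed again after unbuild
toy.build()
print(toy.f, len(toy._items))
```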
# =============================================================
# 1. Construction
# =============================================================
idx = Index(0, metric="angular")
print("Index dimension:", idx.f)
print("Metric :", idx.metric)
print(idx.info())
print(idx)
idx
Index dimension: 0
Metric : angular
{'f': 0, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 0, 'n_trees': 0}
Annoy(**{'f': 0, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
# =============================================================
# 2. Add items
# =============================================================
f=100
n=1000
for i in range(n):
    if i % (n // 10) == 0:
        print(f"{i} / {n} = {1.0 * i / n}")
    # v = []
    # for z in range(f):
    #     v.append(random.gauss(0, 1))
    v = [random.gauss(0, 1) for _ in range(f)]
    idx.add_item(i, v)
print("Number of items:", idx.get_n_items())
print("Index dimension:", idx.f)
print("Metric :", idx.metric)
print(idx)
0 / 1000 = 0.0
100 / 1000 = 0.1
200 / 1000 = 0.2
300 / 1000 = 0.3
400 / 1000 = 0.4
500 / 1000 = 0.5
600 / 1000 = 0.6
700 / 1000 = 0.7
800 / 1000 = 0.8
900 / 1000 = 0.9
Number of items: 1000
Index dimension: 100
Metric : angular
Annoy(**{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
# =============================================================
# 3. Build index
# =============================================================
idx.build(10)
print("Trees:", idx.get_n_trees())
print("Memory usage:", idx.memory_usage(), "bytes")
print(idx.info())
print(idx)
idx
# help(idx.get_n_trees)
Trees: 10
Memory usage: 611056 bytes
{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 1000, 'n_trees': 10, 'memory_usage_byte': 611056, 'memory_usage_mib': 0.5827484130859375}
Annoy(**{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
# =============================================================
# 4. Query -> return
# =============================================================
res = idx.get_nns_by_item(
    0,
    5,
    # search_k = -1,
    include_distances=True,
)
print(res)
([0, 429, 39, 598, 47], [0.0, 1.2052730321884155, 1.212796926498413, 1.2235209941864014, 1.2287578582763672])
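The distances printed above start at 0.0 for the item itself and sit a little below 1.41 for its neighbors. Upstream Annoy documents its angular distance as sqrt(2 * (1 - cos(u, v))), so unrelated vectors land near sqrt(2). The sketch below computes that formula in plain Python, assuming the wrapper keeps the upstream definition. Raising on zero-length vectors is a choice made for this sketch only.

```python
import math


def angular_distance(u, v):
    """sqrt(2 * (1 - cos(u, v))), the angular distance described by upstream Annoy."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Sketch-only choice: the cosine is undefined for a zero vector.
        raise ValueError("angular distance is undefined for zero vectors")
    # Clamp to [-1, 1] to absorb floating-point rounding.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.sqrt(2.0 * (1.0 - cos))


print(angular_distance([1, 0, 0], [0, 1, 0]))   # orthogonal vectors: sqrt(2)
print(angular_distance([1, 0, 0], [-1, 0, 0]))  # opposite vectors: 2.0
```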
# =============================================================
# 8. Query using vector
# =============================================================
res2 = idx.get_nns_by_vector(
    [random.gauss(0, 1) for _ in range(f)],
    5,
    include_distances=True,
)
print("\nQuery by vector:", res2)
Query by vector: ([336, 499, 237, 815, 839], [1.2275598049163818, 1.2495390176773071, 1.2670350074768066, 1.2672923803329468, 1.269050121307373])
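Results such as the one above are approximate, so it can help to compare them against an exact search on a small sample. The sketch below is a brute-force baseline using only the standard library. ``exact_nns`` and the tiny random data set are hypothetical, and the distance assumes the upstream angular definition sqrt(2 * (1 - cos)).

```python
import heapq
import math
import random


def angular_distance(u, v):
    """Angular distance sqrt(2 * (1 - cos(u, v))) for non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norms))
    return math.sqrt(2.0 * (1.0 - cos))


def exact_nns(vectors, query, k):
    """Return (ids, distances) of the k exact nearest neighbors of query."""
    scored = ((angular_distance(v, query), i) for i, v in enumerate(vectors))
    best = heapq.nsmallest(k, scored)  # ascending by distance, then id
    return [i for _, i in best], [d for d, _ in best]


# Small illustrative data set; not the 1000 x 100 vectors used above.
rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]
ids, dists = exact_nns(vectors, vectors[0], 5)
print(ids, dists)
```

The returned pair has the same shape as ``get_nns_by_vector(..., include_distances=True)``, so the two id lists can be compared directly to estimate recall.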
# =============================================================
# 9. Low-level (non-result) mode
# =============================================================
items = idx.get_nns_by_item(0, 2, include_distances=False)
print("\nLow-level items only:", items)
items_low, d_low = idx.get_nns_by_item(0, 2, include_distances=True)
print("Low-level tuple return:", items_low, d_low)
Low-level items only: [0, 429]
Low-level tuple return: [0, 429] [0.0, 1.2052730321884155]
# =============================================================
# 10. Persistence
# =============================================================
print("\n=== Saving with binary annoy ===")
print(idx.info())
print(idx)
idx
idx.save("annoy_test_2.annoy")
print(idx.info())
print(idx)
idx
print("Loading...")
idx2 = Index(100, metric='angular').load("annoy_test_2.annoy")
print("Loaded index:", idx2)
=== Saving with binary annoy ===
{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 1000, 'n_trees': 10, 'memory_usage_byte': 611056, 'memory_usage_mib': 0.5827484130859375}
Annoy(**{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 1000, 'n_trees': 10, 'memory_usage_byte': 542252, 'memory_usage_mib': 0.5171318054199219}
Annoy(**{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
Loading...
Loaded index: Annoy(**{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
import joblib
joblib.dump(idx2, "test.joblib")
a = joblib.load("test.joblib")
a.info(), a.get_n_items(), a.get_n_trees()
({'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 1000, 'n_trees': 10, 'memory_usage_byte': 542252, 'memory_usage_mib': 0.5171318054199219}, 1000, 10)
np.array_equal(a.get_item_vector(0), idx2.get_item_vector(0))
True
np.array_equal(a.get_item_vector(0), idx.get_item_vector(0))
True
# =============================================================
# 11. Raw serialize / deserialize
# =============================================================
print("\n=== Raw serialize ===")
buf = idx.serialize()
new_idx = Index(100, metric='angular')
new_idx.deserialize(buf)
print("Deserialized index n_items:", new_idx.get_n_items())
print(idx.info())
print(idx)
idx
=== Raw serialize ===
Deserialized index n_items: 1000
{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 1000, 'n_trees': 10, 'memory_usage_byte': 542252, 'memory_usage_mib': 0.5171318054199219}
Annoy(**{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
idx.unload()
print(idx.info())
print(idx)
idx
{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 0, 'n_trees': 0}
Annoy(**{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
# idx.build(10)
idx.load("annoy_test_2.annoy")
print(idx)
type(idx)
Annoy(**{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
# joblib
import joblib
joblib.dump(idx, "test.joblib"), joblib.load("test.joblib")
(['test.joblib'], Annoy(**{'f': 100, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': 'annoy_test_2.annoy', 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0}))
from scikitplot import annoy as a
f = 10
idx = a.AnnoyIndex(f, "angular")
# Distinct non-zero content so we can see mismatches clearly
for i in range(20):
    idx.add_item(i, [float(i)] * f)
idx.build(10)
type(idx)
# Legacy support: wrap the low-level AnnoyIndex in the high-level Index
idx = a.Index.from_low_level(idx)
import joblib
joblib.dump(idx, "test.joblib")
type(idx)
print(idx.info())
print(idx)
idx
{'f': 10, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0, 'n_items': 20, 'n_trees': 10, 'memory_usage_byte': 4220, 'memory_usage_mib': 0.004024505615234375}
Annoy(**{'f': 10, 'metric': 'angular', 'n_neighbors': 5, 'on_disk_path': None, 'prefault': False, 'seed': None, 'verbose': None, 'schema_version': 0})
idx.get_nns_by_item(0, 10), len(idx.get_item_vector(0))
([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 10)
import random
from scikitplot.utils._time import Timer
n, f = 1_000_000, 10
X = [[random.gauss(0, 1) for _ in range(f)] for _ in range(n)]
q = [[random.gauss(0, 1) for _ in range(f)]]
# idx = Index().fit(X, feature_names=map("feature_{}".format, range(0,10)))
idx = Index().fit(X, feature_names=map("col_{}".format, range(0,10)))
idx
idx.feature_names_in_
('col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7', 'col_8', 'col_9')
idx.transform(X[:5], include_distances=True, return_labels=True)
([[[0.2995162308216095, 0.26872411370277405, -0.31986403465270996, 0.40183380246162415, -0.38237830996513367, 0.9011735916137695, 0.7422892451286316, 0.8437517285346985, 1.3799339532852173, -0.06174032390117645], [0.3315776586532593, 0.1875527799129486, -0.8046607375144958, 0.3848173916339874, -0.5218529105186462, 0.9577100872993469, 0.6407361626625061, 0.6109951138496399, 1.6533797979354858, -0.14335046708583832], [1.066840648651123, 0.5097151398658752, -0.02644050307571888, 0.5691455006599426, -0.6068020462989807, 1.3134701251983643, 0.8040322065353394, 1.5368680953979492, 1.9145973920822144, -0.1704350709915161], [0.5550230741500854, 0.1936453878879547, -0.6877626776695251, 0.3667159378528595, -1.054802417755127, 1.2736949920654297, 0.7120872139930725, 0.9665201306343079, 1.589842438697815, 0.11769255250692368], [0.6585002541542053, 1.2284033298492432, -0.5905928611755371, 0.8687102198600769, -0.38249245285987854, 1.7344026565551758, 2.142791271209717, 1.7588355541229248, 2.0976037979125977, 0.07778248935937881]], [[-0.4028843939304352, 0.6818151473999023, -1.117720365524292, 1.0333377122879028, 0.1900119036436081, -0.8227489590644836, 0.7598976492881775, 0.5180985927581787, 0.3719368278980255, 1.6910221576690674], [-0.6337267160415649, 0.2562711834907532, -1.0475351810455322, 0.6090968251228333, 0.16023726761341095, -0.25397247076034546, 0.7996667623519897, 0.4232807755470276, 0.3861091434955597, 1.6330668926239014], [0.11266162991523743, 1.1821759939193726, -1.179136037826538, 1.6884346008300781, 0.6768856048583984, -1.3160065412521362, 1.2716294527053833, 0.8974682092666626, 1.0359108448028564, 1.7555595636367798], [-0.8022650480270386, 0.8258086442947388, -0.6947636008262634, 0.865044355392456, 0.7216836214065552, -0.6661052703857422, 0.7980683445930481, 0.8103359341621399, 0.008209889754652977, 2.0164074897766113], [-0.5616939067840576, 0.6011213064193726, -1.0803884267807007, 0.9985421895980835, 0.612271785736084, -1.201594591140747, 0.5310771465301514, 
1.528005838394165, 0.7820205688476562, 2.5737133026123047]], [[-0.21217121183872223, 0.2056313157081604, 0.722652018070221, 0.8762103319168091, 0.6707500219345093, -1.6379401683807373, 0.9332223534584045, -0.5422225594520569, -1.1026482582092285, 0.056520331650972366], [-0.4803016185760498, 0.05752541869878769, 0.7424291372299194, 1.1357213258743286, 0.8284794092178345, -1.3418638706207275, 1.3532313108444214, -0.6681973934173584, -1.2024109363555908, -0.5458902716636658], [-0.09657622873783112, -0.03825854882597923, 0.19272591173648834, 1.1368770599365234, 0.6961694955825806, -1.9996322393417358, 0.7544685006141663, -0.9075682163238525, -1.4941128492355347, -0.3546978831291199], [-0.5726475119590759, 0.8345264792442322, 1.36396324634552, 0.6331958174705505, 1.1805782318115234, -1.7656670808792114, 1.5728577375411987, -1.2082107067108154, -1.7261351346969604, 0.2111993134021759], [0.5587325096130371, 0.7143872380256653, 0.7723369598388672, 1.6084697246551514, 1.0039435625076294, -3.12009859085083, 1.20151686668396, -0.9630723595619202, -1.747536063194275, -0.7610187530517578]], [[1.8360724449157715, -1.7788029909133911, -1.0985404253005981, -1.2299158573150635, -0.4852966070175171, 0.22859908640384674, -0.03444309160113335, -0.34960466623306274, -0.2747590243816376, 0.1640910655260086], [2.132922649383545, -1.8067975044250488, -0.5985732078552246, -1.4354743957519531, -0.6862561702728271, -0.055050190538167953, -0.20438416302204132, -0.10576765984296799, -0.18300966918468475, 0.0332980640232563], [1.1669689416885376, -1.1882712841033936, -1.2006605863571167, -0.9844658970832825, -0.35999438166618347, 0.1763678342103958, -0.16189533472061157, -0.22041620314121246, -0.4420163631439209, -0.25835075974464417], [1.172637701034546, -0.9567055702209473, -1.0078904628753662, -1.0543380975723267, -0.3939838111400604, 0.6376891136169434, 0.18943792581558228, -0.47181078791618347, -0.3672703504562378, 0.18777403235435486], [1.7341971397399902, -1.7087979316711426, 
-0.7155895829200745, -1.7988238334655762, -1.0472643375396729, 0.08940385282039642, 0.43338480591773987, -0.011753100901842117, -0.5730846524238586, 0.05322456732392311]], [[0.38939982652664185, -0.7888681292533875, 0.21797947585582733, -0.39556416869163513, 0.09195032715797424, -0.45746126770973206, 0.7257154583930969, 0.163970485329628, 0.3641418516635895, 0.2510545551776886], [1.57913339138031, -2.1115193367004395, 0.8659923672676086, -1.4170335531234741, 0.31213846802711487, -1.1963188648223877, 1.6555734872817993, 0.32366394996643066, 0.7790639996528625, 0.7397186160087585], [0.3067862093448639, -1.1996709108352661, 0.2953212559223175, -0.8477502465248108, -0.09012967348098755, -0.589461088180542, 1.2440359592437744, 0.19568035006523132, 0.5365380048751831, 0.5055891871452332], [0.5571768283843994, -1.5110305547714233, 0.44180983304977417, -0.579093873500824, -0.3039686977863312, -0.5685702562332153, 1.7404046058654785, 0.043175242841243744, 0.5812414884567261, 0.32155993580818176], [0.8853705525398254, -1.58010995388031, 0.23342616856098175, -1.356982946395874, 0.20337145030498505, -1.030630111694336, 0.989798367023468, 0.5244887471199036, 0.5933203101158142, 0.9632385969161987]]], [[0.0, 0.26271528005599976, 0.27752506732940674, 0.2870348393917084, 0.29192054271698], [0.0, 0.3105616569519043, 0.33396831154823303, 0.34322890639305115, 0.3432622253894806], [0.0, 0.31156572699546814, 0.31792789697647095, 0.32415494322776794, 0.3307192623615265], [0.0, 0.2397470921278, 0.2777545750141144, 0.324449747800827, 0.33787739276885986], [0.0, 0.2047322541475296, 0.22700326144695282, 0.2978989779949188, 0.31784898042678833]], [[None, None, None, None, None], [None, None, None, None, None], [None, None, None, None, None], [None, None, None, None, None], [None, None, None, None, None]])
with Timer("set_params"):
    for m in ["angular", "l1", "l2", ".", "hamming"]:
        idx = Index().set_params(metric=m).fit(X)
        print(m, idx.transform(q))
angular [[[-0.576092004776001, -1.1014655828475952, 1.5072448253631592, -0.37987226247787476, -0.12107884138822556, -0.16090495884418488, -0.9599498510360718, 0.6443582773208618, 0.7830631136894226, 0.1322690099477768], [-0.14904215931892395, -1.9222668409347534, 2.399625778198242, -0.568252444267273, 0.47048714756965637, -0.5003377199172974, -1.3439618349075317, 1.420609951019287, 1.4913909435272217, -0.13222387433052063], [-0.530595600605011, -0.9399460554122925, 2.0382790565490723, 0.26027607917785645, -0.5553302764892578, -0.12559548020362854, -1.445036768913269, 0.6240798830986023, 0.725025475025177, 0.3368868827819824], [-0.5415041446685791, -1.1800884008407593, 1.529439091682434, 0.11952074617147446, -0.2929523289203644, -0.5609923601150513, -0.9184246063232422, 1.3177564144134521, 1.6426866054534912, 0.32133588194847107], [0.047580111771821976, -0.6638967394828796, 1.7719310522079468, 0.10254024714231491, 0.41480061411857605, -0.6402038931846619, -0.8408301472663879, 0.6886392831802368, 0.7279884219169617, -0.45430782437324524]]]
l1 [[[-0.6069902777671814, -1.5303187370300293, 1.7485501766204834, 0.09983374923467636, -0.2737273573875427, -0.4870619773864746, -0.857934296131134, 0.7189533114433289, 0.4311307370662689, 0.388454407453537], [-0.4290771186351776, -1.3283613920211792, 1.600633144378662, -0.04403701052069664, 0.060187242925167084, -0.7421000599861145, -1.0723392963409424, 1.1093946695327759, 0.21864093840122223, 0.7096473574638367], [-0.576092004776001, -1.1014655828475952, 1.5072448253631592, -0.37987226247787476, -0.12107884138822556, -0.16090495884418488, -0.9599498510360718, 0.6443582773208618, 0.7830631136894226, 0.1322690099477768], [-0.5415041446685791, -1.1800884008407593, 1.529439091682434, 0.11952074617147446, -0.2929523289203644, -0.5609923601150513, -0.9184246063232422, 1.3177564144134521, 1.6426866054534912, 0.32133588194847107], [-0.30479490756988525, -1.4008817672729492, 1.698373556137085, 0.24252529442310333, 0.2701326012611389, -0.360563188791275, -0.9885985255241394, -0.042488664388656616, 1.2770600318908691, -0.04430120438337326]]]
l2 [[[-0.6069902777671814, -1.5303187370300293, 1.7485501766204834, 0.09983374923467636, -0.2737273573875427, -0.4870619773864746, -0.857934296131134, 0.7189533114433289, 0.4311307370662689, 0.388454407453537], [-0.576092004776001, -1.1014655828475952, 1.5072448253631592, -0.37987226247787476, -0.12107884138822556, -0.16090495884418488, -0.9599498510360718, 0.6443582773208618, 0.7830631136894226, 0.1322690099477768], [-0.530595600605011, -0.9399460554122925, 2.0382790565490723, 0.26027607917785645, -0.5553302764892578, -0.12559548020362854, -1.445036768913269, 0.6240798830986023, 0.725025475025177, 0.3368868827819824], [-0.8246579170227051, -1.5498437881469727, 1.722296118736267, 0.12505893409252167, -0.5007220506668091, -0.5869333148002625, -1.1538058519363403, 0.8996411561965942, 0.046726711094379425, -0.06863833218812943], [-0.5415041446685791, -1.1800884008407593, 1.529439091682434, 0.11952074617147446, -0.2929523289203644, -0.5609923601150513, -0.9184246063232422, 1.3177564144134521, 1.6426866054534912, 0.32133588194847107]]]
. [[[-1.439661979675293, -2.1038331985473633, 2.227424144744873, -0.31966137886047363, 1.9542486667633057, -0.053185444325208664, -1.2450610399246216, 2.6845853328704834, 2.367955446243286, -1.165417194366455], [-1.1499366760253906, -1.3602036237716675, 2.739813804626465, -0.02609434723854065, 1.4179996252059937, -0.8871392607688904, -1.1483855247497559, 2.639958143234253, 1.8300042152404785, -0.17696726322174072], [0.050369977951049805, -2.576887607574463, 3.0548112392425537, -0.10387302190065384, 0.7079399824142456, -1.0265858173370361, -0.614619255065918, 1.9724539518356323, 0.49327361583709717, -0.49182355403900146], [-0.14804844558238983, -1.1218527555465698, 3.8763976097106934, 1.917222261428833, 1.2766927480697632, -0.8002503514289856, -1.7977492809295654, 1.2221813201904297, -0.7088087201118469, -0.7126805186271667], [-0.9116369485855103, -2.9165921211242676, 1.7386298179626465, 0.5616129040718079, 1.1095495223999023, -0.8117503523826599, -1.5209518671035767, 0.7406372427940369, 2.0935792922973633, -1.7276729345321655]]]
hamming [[[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0]]]
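The loop above returns different neighbor sets for the same query depending on the metric. A tiny hand-made example can make the reason concrete: Manhattan (`l1`), Euclidean (`l2`), and dot-product scoring can each prefer a different candidate. The vectors and helper names below are hypothetical, and the dot product is treated simply as a similarity score (larger is better); how the library reports dot-product distances is not shown here.

```python
import math


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


query = [1.0, 1.0]
candidates = {"a": [1.4, 1.4], "b": [1.0, 1.7], "c": [3.0, 3.0]}

# Distances: smaller is closer. Dot product: larger is more similar.
nearest_l1 = min(candidates, key=lambda name: manhattan(candidates[name], query))
nearest_l2 = min(candidates, key=lambda name: euclidean(candidates[name], query))
nearest_dot = max(candidates, key=lambda name: dot(candidates[name], query))

print("l1:", nearest_l1, "l2:", nearest_l2, "dot:", nearest_dot)
```

Here `b` wins under `l1` (it differs in one coordinate only), `a` wins under `l2` (its difference is spread evenly), and `c` wins under the dot product (it has the largest magnitude along the query direction).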
with Timer("rebuild"):
    base = Index(metric="l2").fit(X)
    for m in ["angular", "l1", "l2", "dot", "hamming"]:
        idx_m = base.rebuild(metric=m)  # rebuild-from-index
        print(m, idx_m.transform(q))  # no .fit(X) here
angular [[[-0.576092004776001, -1.1014655828475952, 1.5072448253631592, -0.37987226247787476, -0.12107884138822556, -0.16090495884418488, -0.9599498510360718, 0.6443582773208618, 0.7830631136894226, 0.1322690099477768], [-0.14904215931892395, -1.9222668409347534, 2.399625778198242, -0.568252444267273, 0.47048714756965637, -0.5003377199172974, -1.3439618349075317, 1.420609951019287, 1.4913909435272217, -0.13222387433052063], [-0.530595600605011, -0.9399460554122925, 2.0382790565490723, 0.26027607917785645, -0.5553302764892578, -0.12559548020362854, -1.445036768913269, 0.6240798830986023, 0.725025475025177, 0.3368868827819824], [-0.5415041446685791, -1.1800884008407593, 1.529439091682434, 0.11952074617147446, -0.2929523289203644, -0.5609923601150513, -0.9184246063232422, 1.3177564144134521, 1.6426866054534912, 0.32133588194847107], [0.047580111771821976, -0.6638967394828796, 1.7719310522079468, 0.10254024714231491, 0.41480061411857605, -0.6402038931846619, -0.8408301472663879, 0.6886392831802368, 0.7279884219169617, -0.45430782437324524]]]
l1 [[[-0.6069902777671814, -1.5303187370300293, 1.7485501766204834, 0.09983374923467636, -0.2737273573875427, -0.4870619773864746, -0.857934296131134, 0.7189533114433289, 0.4311307370662689, 0.388454407453537], [-0.4290771186351776, -1.3283613920211792, 1.600633144378662, -0.04403701052069664, 0.060187242925167084, -0.7421000599861145, -1.0723392963409424, 1.1093946695327759, 0.21864093840122223, 0.7096473574638367], [-0.576092004776001, -1.1014655828475952, 1.5072448253631592, -0.37987226247787476, -0.12107884138822556, -0.16090495884418488, -0.9599498510360718, 0.6443582773208618, 0.7830631136894226, 0.1322690099477768], [-0.5415041446685791, -1.1800884008407593, 1.529439091682434, 0.11952074617147446, -0.2929523289203644, -0.5609923601150513, -0.9184246063232422, 1.3177564144134521, 1.6426866054534912, 0.32133588194847107], [-0.30479490756988525, -1.4008817672729492, 1.698373556137085, 0.24252529442310333, 0.2701326012611389, -0.360563188791275, -0.9885985255241394, -0.042488664388656616, 1.2770600318908691, -0.04430120438337326]]]
l2 [[[-0.6069902777671814, -1.5303187370300293, 1.7485501766204834, 0.09983374923467636, -0.2737273573875427, -0.4870619773864746, -0.857934296131134, 0.7189533114433289, 0.4311307370662689, 0.388454407453537], [-0.576092004776001, -1.1014655828475952, 1.5072448253631592, -0.37987226247787476, -0.12107884138822556, -0.16090495884418488, -0.9599498510360718, 0.6443582773208618, 0.7830631136894226, 0.1322690099477768], [-0.530595600605011, -0.9399460554122925, 2.0382790565490723, 0.26027607917785645, -0.5553302764892578, -0.12559548020362854, -1.445036768913269, 0.6240798830986023, 0.725025475025177, 0.3368868827819824], [-0.8246579170227051, -1.5498437881469727, 1.722296118736267, 0.12505893409252167, -0.5007220506668091, -0.5869333148002625, -1.1538058519363403, 0.8996411561965942, 0.046726711094379425, -0.06863833218812943], [-0.5415041446685791, -1.1800884008407593, 1.529439091682434, 0.11952074617147446, -0.2929523289203644, -0.5609923601150513, -0.9184246063232422, 1.3177564144134521, 1.6426866054534912, 0.32133588194847107]]]
dot [[[-1.439661979675293, -2.1038331985473633, 2.227424144744873, -0.31966137886047363, 1.9542486667633057, -0.053185444325208664, -1.2450610399246216, 2.6845853328704834, 2.367955446243286, -1.165417194366455], [-1.1499366760253906, -1.3602036237716675, 2.739813804626465, -0.02609434723854065, 1.4179996252059937, -0.8871392607688904, -1.1483855247497559, 2.639958143234253, 1.8300042152404785, -0.17696726322174072], [0.050369977951049805, -2.576887607574463, 3.0548112392425537, -0.10387302190065384, 0.7079399824142456, -1.0265858173370361, -0.614619255065918, 1.9724539518356323, 0.49327361583709717, -0.49182355403900146], [-0.14804844558238983, -1.1218527555465698, 3.8763976097106934, 1.917222261428833, 1.2766927480697632, -0.8002503514289856, -1.7977492809295654, 1.2221813201904297, -0.7088087201118469, -0.7126805186271667], [-0.9116369485855103, -2.9165921211242676, 1.7386298179626465, 0.5616129040718079, 1.1095495223999023, -0.8117503523826599, -1.5209518671035767, 0.7406372427940369, 2.0935792922973633, -1.7276729345321655]]]
hamming [[[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0]]]
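The `hamming` rows above are all 0/1 vectors, and the five neighbors are identical. That is plausible if items are reduced to binary values before indexing (an assumption; the binarization rule is not shown in this example): there are only 2**10 = 1024 distinct binary vectors of length 10, so a million items must contain many duplicates. A minimal sketch of the distance itself, using a hypothetical helper:

```python
def hamming_distance(u, v):
    """Number of positions at which two equal-length 0/1 vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(1 for a, b in zip(u, v) if a != b)


# One of the neighbor rows printed above.
row = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0]
flipped = row.copy()
flipped[0] = 1.0  # change a single position

print(hamming_distance(row, row))      # identical vectors
print(hamming_distance(row, flipped))  # one differing position
print(2 ** len(row))                   # number of distinct binary vectors
```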
Total running time of the script: (5 minutes 41.953 seconds)