VectorOpsMixin

class scikitplot.annoy.VectorOpsMixin

User-facing neighbor queries for Annoy-like backends.

This mixin exposes explicit per-query helpers (query_by_item, query_by_vector) and scikit-learn-style batch helpers (kneighbors, kneighbors_graph).

Notes

Output ordering for kneighbors is (neighbors, distances) when include_distances=True (neighbors first). This is not the same as sklearn.neighbors.NearestNeighbors.kneighbors (which returns distances first). The order is intentional and documented.
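To make the ordering concrete, here is a minimal brute-force sketch in NumPy. It is not the mixin's implementation: brute_force_kneighbors and its arguments are hypothetical names used only to illustrate the documented (neighbors, distances) order and how to swap it for code written against scikit-learn's (distances, indices) convention.

```python
import numpy as np


def brute_force_kneighbors(data, X, n_neighbors=5):
    """Hypothetical stand-in: exact Euclidean kNN returning (neighbors, distances)."""
    data = np.asarray(data, dtype=float)
    X = np.atleast_2d(np.asarray(X, dtype=float))
    # Pairwise Euclidean distances, shape (n_queries, n_items).
    dist = np.linalg.norm(X[:, None, :] - data[None, :, :], axis=2)
    # Ids of the n_neighbors closest items per query, nearest first.
    neighbors = np.argsort(dist, axis=1, kind="stable")[:, :n_neighbors]
    distances = np.take_along_axis(dist, neighbors, axis=1)
    return neighbors, distances  # neighbors first, matching the note above


data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
neighbors, distances = brute_force_kneighbors(data, [[0.0, 0.0]], n_neighbors=2)

# Code written for scikit-learn expects (distances, indices); swap explicitly.
sk_distances, sk_indices = distances, neighbors
```

Unpacking by name, as in the last line, is usually safer than relying on tuple position when code has to work with both conventions.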

kneighbors(X, n_neighbors=5, *, search_k=-1, include_distances=True, exclude_self=False, exclude_item_ids=None, ensure_all_finite=True, copy=False, output_type='vector')

Find k nearest neighbors for one or more query vectors.

This is a sklearn-like convenience wrapper that returns rectangular arrays.

Parameters:
X : array-like of shape (f,) or (n_queries, f)

Query vector(s).

n_neighbors : int, default=5

Number of neighbors to return per query.

search_k : int, default=-1

Search parameter forwarded to the backend.

include_distances : bool, default=True

If True, return (neighbors, distances). Otherwise return neighbors.

exclude_self : bool, default=False

If True, apply the same deterministic self-exclusion rule as query_by_vector for each query row.

exclude_item_ids : iterable of int, optional

Exclude these ids for every query.

ensure_all_finite : bool or 'allow-nan', default=True

Input validation option forwarded to scikit-learn.

copy : bool, default=False

Input validation option forwarded to scikit-learn.

output_type : {'item', 'vector'}, default='vector'

If 'item', return neighbor ids. If 'vector', return neighbor vectors.

Returns:
neighbors : numpy.ndarray

If output_type='item', shape is (n_queries, n_neighbors). If output_type='vector', shape is (n_queries, n_neighbors, f).

distances : numpy.ndarray of shape (n_queries, n_neighbors)

Neighbor distances. Returned when include_distances=True.

Raises:
sklearn.exceptions.NotFittedError

If the backend reports that the index is unbuilt.

ValueError

If n_neighbors <= 0 or any query yields too few neighbors after exclusions.

Return type:

ndarray | tuple[ndarray, ndarray]

See also

query_by_vector

Per-query 1D interface.

kneighbors_graph

CSR kNN graph.
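For kneighbors, the two output_type shapes and the "too few neighbors" error can be sketched with plain NumPy. This is an illustration under assumptions, not the library's code: stack_rows is a hypothetical helper, and data stands in for the vectors stored in the index.

```python
import numpy as np


def stack_rows(per_query_ids, n_neighbors):
    """Hypothetical helper: stack per-query id lists into a rectangular array."""
    rows = []
    for q, ids in enumerate(per_query_ids):
        if len(ids) < n_neighbors:
            # Mirrors the documented ValueError for too few neighbors.
            raise ValueError(
                f"query {q}: got {len(ids)} neighbors, need {n_neighbors}"
            )
        rows.append(list(ids[:n_neighbors]))
    return np.asarray(rows, dtype=np.int64)


f = 3
data = np.arange(5 * f, dtype=np.float32).reshape(5, f)  # 5 stored vectors
ids = stack_rows([[0, 2], [4, 1]], n_neighbors=2)  # like output_type='item'
vectors = data[ids]  # like output_type='vector'
```

Here ids has shape (n_queries, n_neighbors) and vectors has shape (n_queries, n_neighbors, f), which is why a single short row makes a rectangular result impossible.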

kneighbors_graph(X, n_neighbors=5, *, search_k=-1, mode='connectivity', exclude_self=False, exclude_item_ids=None, ensure_all_finite=True, copy=False, output_type='item')

Compute the k-neighbors graph (CSR) for query vectors.

Parameters:
X : array-like of shape (f,) or (n_queries, f)

Query vector(s).

n_neighbors : int, default=5

Number of neighbors per query.

search_k : int, default=-1

Search parameter forwarded to the backend.

mode : {'connectivity', 'distance'}, default='connectivity'

If 'connectivity', graph entries are 1. If 'distance', entries are backend distances.

exclude_self : bool, default=False

If True, apply the same deterministic self-exclusion rule as kneighbors for each query row.

exclude_item_ids : iterable of int, optional

Exclude these ids for every query.

ensure_all_finite : bool or 'allow-nan', default=True

Input validation option forwarded to scikit-learn.

copy : bool, default=False

Input validation option forwarded to scikit-learn.

output_type : {'item'}, default='item'

Must be 'item' for CSR construction.

Returns:
graph : scipy.sparse.csr_matrix

CSR matrix of shape (n_queries, n_items).

Raises:
ImportError

If SciPy is not installed.

ValueError

If mode is invalid or output_type != 'item'.

RuntimeError

If the backend returns an out-of-range neighbor id.

Return type:

Any

See also

kneighbors

Dense kNN results.
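For kneighbors_graph, the following sketch shows how a rectangular array of neighbor ids maps onto the three arrays of a CSR matrix. It uses only NumPy; knn_csr_parts is a hypothetical helper, not part of this mixin, and the actual construction in the library may differ.

```python
import numpy as np


def knn_csr_parts(neighbor_ids, n_items, distances=None):
    """Hypothetical helper: build the (data, indices, indptr) CSR triple.

    neighbor_ids: int array of shape (n_queries, n_neighbors), n_neighbors > 0.
    distances: optional array of the same shape; if None, entries are 1.
    """
    neighbor_ids = np.asarray(neighbor_ids, dtype=np.int64)
    n_queries, k = neighbor_ids.shape
    if neighbor_ids.size and (
        neighbor_ids.min() < 0 or neighbor_ids.max() >= n_items
    ):
        # Mirrors the documented RuntimeError for an out-of-range neighbor id.
        raise RuntimeError("neighbor id out of range")
    indices = neighbor_ids.ravel()  # column index of each stored entry
    indptr = np.arange(0, n_queries * k + 1, k)  # row r spans indptr[r]:indptr[r+1]
    if distances is None:
        data = np.ones(n_queries * k)  # like mode='connectivity'
    else:
        data = np.asarray(distances, dtype=float).ravel()  # like mode='distance'
    return data, indices, indptr


ids = [[1, 2], [0, 2]]
data, indices, indptr = knn_csr_parts(ids, n_items=3)
# With SciPy installed, the triple can be passed directly as
# scipy.sparse.csr_matrix((data, indices, indptr), shape=(2, 3)).
```

Because every query contributes exactly n_neighbors entries, indptr is a simple arithmetic progression, which is one reason the graph is built from ids rather than vectors.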

query_by_item(item, n_neighbors, *, search_k=-1, include_distances=False, exclude_self=False, exclude_item_ids=None, ensure_all_finite=True, copy=False)

Query neighbors by stored item id.

Parameters:
item : int

Stored item id.

n_neighbors : int

Number of neighbors to return after applying exclusions.

search_k : int, default=-1

Search parameter forwarded to the backend.

include_distances : bool, default=False

If True, also return distances.

exclude_self : bool, default=False

If True, exclude item from the returned neighbors.

exclude_item_ids : iterable of int, optional

Additional item ids to exclude.

ensure_all_finite : bool or 'allow-nan', default=True

Input validation option forwarded to scikit-learn.

copy : bool, default=False

Input validation option forwarded to scikit-learn.

Returns:
indices : numpy.ndarray of shape (n_neighbors,)

Neighbor ids.

(indices, distances) : tuple of numpy.ndarray

Returned when include_distances=True.

Raises:
sklearn.exceptions.NotFittedError

If the backend reports that the index is unbuilt.

ValueError

If n_neighbors <= 0 or not enough neighbors remain after exclusions.

Return type:

ndarray | tuple[ndarray, ndarray]

See also

query_by_vector

Query neighbors by an explicit vector.

kneighbors

Batch neighbor queries (sklearn-like).

Notes

Exclusions are applied deterministically in the order returned by the backend.
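The ordered-exclusion behavior described for query_by_item can be sketched in a few lines of plain Python. This is an assumption-laden illustration: apply_exclusions is a hypothetical helper, and candidates stands in for the ids the backend returned, nearest first.

```python
def apply_exclusions(candidates, n_neighbors, *, item=None,
                     exclude_self=False, exclude_item_ids=None):
    """Hypothetical helper: filter candidates in backend order, then truncate."""
    if n_neighbors <= 0:
        raise ValueError("n_neighbors must be positive")
    excluded = set(exclude_item_ids or ())
    if exclude_self and item is not None:
        excluded.add(item)  # the query item itself is dropped by id
    # Keep the backend's ordering; only remove excluded ids.
    kept = [c for c in candidates if c not in excluded]
    if len(kept) < n_neighbors:
        raise ValueError(
            f"only {len(kept)} neighbors remain after exclusions, "
            f"need {n_neighbors}"
        )
    return kept[:n_neighbors]


candidates = [7, 3, 9, 1, 4]  # ids as returned by a backend, nearest first
result = apply_exclusions(candidates, 2, item=7, exclude_self=True,
                          exclude_item_ids=[9])
```

Since n_neighbors counts results after exclusions, a caller in this sketch would need to request more candidates from the backend than it finally returns.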

query_by_vector(vector, n_neighbors, *, search_k=-1, include_distances=False, exclude_self=False, exclude_item_ids=None, ensure_all_finite=True, copy=False)

Query neighbors by an explicit vector.

Parameters:
vector : array-like of shape (f,)

Query vector.

n_neighbors : int

Number of neighbors to return after exclusions.

search_k : int, default=-1

Search parameter forwarded to the backend.

include_distances : bool, default=False

If True, also return distances.

exclude_self : bool, default=False

If True, exclude the first returned candidate whose distance is exactly 0.0. This is intended for queries where vector comes from the index itself.

exclude_item_ids : iterable of int, optional

Additional item ids to exclude.

ensure_all_finite : bool or 'allow-nan', default=True

Input validation option forwarded to scikit-learn.

copy : bool, default=False

Input validation option forwarded to scikit-learn.

Returns:
indices : numpy.ndarray of shape (n_neighbors,)

Neighbor ids.

(indices, distances) : tuple of numpy.ndarray

Returned when include_distances=True.

Raises:
sklearn.exceptions.NotFittedError

If the backend reports that the index is unbuilt.

ValueError

If n_neighbors <= 0, vector dimension mismatches f, or not enough neighbors remain after exclusions.

Return type:

ndarray | tuple[ndarray, ndarray]

See also

query_by_item

Query neighbors by stored item id.

kneighbors

Batch neighbor queries (sklearn-like).

Notes

Exclusions are applied deterministically in the order returned by the backend. If exclude_self=True and no exact 0.0 distance candidate is returned in the first position, no additional self-exclusion is applied.
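The self-exclusion rule for query_by_vector can be read as a check on the first candidate only. The sketch below follows that reading of the note above; drop_self_by_distance is a hypothetical helper, and the library's actual logic may handle edge cases differently.

```python
def drop_self_by_distance(ids, distances):
    """Hypothetical helper: drop the first candidate if its distance is exactly 0.0."""
    if ids and distances[0] == 0.0:
        # The query vector appears to be stored in the index; skip that hit.
        return ids[1:], distances[1:]
    # No exact-zero hit in first position: nothing is excluded.
    return ids, distances


# Query vector taken from the index: the first hit is the item itself.
ids_a, dist_a = drop_self_by_distance([5, 2, 8], [0.0, 0.4, 0.9])

# Query vector not in the index: the result is left unchanged.
ids_b, dist_b = drop_self_by_distance([2, 8], [0.4, 0.9])
```

Note that this rule keys on distance, not on item id, so a stored duplicate of the query vector would be treated as "self" in this sketch.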

query_vectors_by_item(item, n_neighbors, *, search_k=-1, include_distances=False, exclude_self=False, exclude_item_ids=None, ensure_all_finite=True, copy=False, dtype=numpy.float32, output_type='vector')

Query neighbor vectors by stored item id.

This is a convenience wrapper over query_by_item that materializes vectors using the backend’s get_item_vector.

Parameters:
item, n_neighbors, search_k, include_distances, exclude_self, exclude_item_ids

See query_by_item.

ensure_all_finite, copy

See query_by_vector.

dtype : numpy dtype, default=numpy.float32

Output dtype for the returned vectors.

output_type : {'item', 'vector'}, default='vector'

If 'vector', return neighbor vectors. If 'item', return neighbor ids.

Returns:
vectors : numpy.ndarray of shape (n_neighbors, f)

Neighbor vectors (when output_type='vector'). If output_type='item', neighbor ids are returned instead.

(vectors, distances) : tuple

Returned when include_distances=True.

Return type:

ndarray | tuple[ndarray, ndarray]

See also

query_vectors_by_vector

Vector query returning vectors (or ids).
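For query_vectors_by_item, the step of turning neighbor ids into vectors can be sketched as follows. ToyBackend and materialize are hypothetical names; the only thing taken from the description above is that vectors are fetched one id at a time through the backend's get_item_vector and cast to dtype.

```python
import numpy as np


class ToyBackend:
    """Hypothetical stand-in exposing only get_item_vector."""

    def __init__(self, vectors):
        self._vectors = [list(v) for v in vectors]

    def get_item_vector(self, i):
        return self._vectors[i]


def materialize(backend, ids, dtype=np.float32):
    """Fetch one vector per id and stack them into shape (n_neighbors, f)."""
    return np.asarray(
        [backend.get_item_vector(int(i)) for i in ids], dtype=dtype
    )


backend = ToyBackend([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
vectors = materialize(backend, [2, 0])  # rows follow the order of the ids
```

The rows of the result keep the order of the ids, so they stay aligned with any distances returned alongside them.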

query_vectors_by_vector(vector, n_neighbors, *, search_k=-1, include_distances=False, exclude_self=False, exclude_item_ids=None, ensure_all_finite=True, copy=False, dtype=numpy.float32, output_type='vector')

Query neighbor vectors by an explicit vector.

Convenience wrapper over query_by_vector. By default it returns vectors; set output_type='item' to return neighbor ids instead.

Parameters:
vector, n_neighbors, search_k, include_distances, exclude_self, exclude_item_ids

See query_by_vector.

ensure_all_finite, copy

See query_by_vector.

dtype : numpy dtype, default=numpy.float32

Output dtype for the returned vectors.

output_type : {'item', 'vector'}, default='vector'

If 'vector', return neighbor vectors. If 'item', return neighbor ids.

Returns:
neighbors : numpy.ndarray

If output_type='vector', an array of shape (n_neighbors, f). If output_type='item', an array of shape (n_neighbors,).

(neighbors, distances) : tuple

Returned when include_distances=True.

Return type:

ndarray | tuple[ndarray, ndarray]

See also

query_vectors_by_item

Item id query returning vectors.

query_by_vector

Per-query id interface.