NDArrayMixin

class scikitplot.annoy.NDArrayMixin

NumPy / SciPy / pandas interoperability for Annoy-like indexes.

add_items(X, ids=None, *, start_id=None, accept_sparse='error', ensure_all_finite=True, copy=False, dtype=numpy.float32, order='C', check_unique_ids=True)

Add many vectors to the index.

Parameters:
X : array-like of shape (n_samples, n_features)

Vectors to add.

ids : array-like of shape (n_samples,), optional

Explicit integer ids. If omitted, ids are allocated as a contiguous range starting at start_id (or get_n_items() at call time).

start_id : int, optional

Starting id used when ids is None. If None, defaults to backend.get_n_items() at call time.

accept_sparse : {'error', 'toarray'}, default='error'

Sparse input handling. 'toarray' explicitly densifies SciPy sparse inputs; with any other setting, sparse input raises an error.

ensure_all_finite : bool or 'allow-nan', default=True

Finiteness validation policy.

copy : bool, default=False

If True, copy the validated dense array before adding.

dtype : numpy dtype, default=numpy.float32

Dtype passed to the backend.

order : {'C', 'F', 'A', 'K'}, default='C'

Memory order used when coercing X.

check_unique_ids : bool, default=True

If True, require ids to be unique.

Returns:
ids_out : numpy.ndarray of shape (n_samples,)

The ids that were added, as int64.

Raises:
RuntimeError

If the backend indicates the index is built.

TypeError

If sparse input is given while accept_sparse='error'.

ValueError

If X is not 2D, the number of features in X does not match f, ids are invalid, or the finiteness policy is violated.
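The input checks listed above can be restated as a short standalone function. This is a sketch under stated assumptions, not the scikitplot implementation: `validate_add_items` is a hypothetical name, SciPy sparse input is recognized here by duck typing on `.toarray()`, the meaning of `'allow-nan'` (allow NaN, reject infinity) is borrowed from scikit-learn's convention, and the non-negative id rule is assumed from Annoy. The backend-level `RuntimeError` for a built index is left out.

```python
import numpy as np


def validate_add_items(X, f, ids=None, *, accept_sparse="error", ensure_all_finite=True,
                       copy=False, dtype=np.float32, order="C", check_unique_ids=True):
    """Hypothetical restatement of the documented add_items checks."""
    # Assumption: SciPy sparse input is recognized by its .toarray() method.
    if hasattr(X, "toarray"):
        if accept_sparse != "toarray":
            raise TypeError("sparse X requires accept_sparse='toarray'")
        X = X.toarray()  # explicit densification
    X = np.asarray(X, dtype=dtype, order=order)
    if copy:
        X = X.copy(order=order)
    if X.ndim != 2:
        raise ValueError(f"X must be 2D, got ndim={X.ndim}")
    if X.shape[1] != f:
        raise ValueError(f"X has {X.shape[1]} features, expected f={f}")
    # Assumption (scikit-learn convention): True rejects NaN and inf,
    # 'allow-nan' rejects only inf, False skips the check.
    if ensure_all_finite is True and not np.isfinite(X).all():
        raise ValueError("X contains NaN or infinity")
    if ensure_all_finite == "allow-nan" and np.isinf(X).any():
        raise ValueError("X contains infinity")
    if ids is not None:
        ids = np.asarray(ids, dtype=np.int64)
        if ids.shape != (X.shape[0],):
            raise ValueError("ids must have shape (n_samples,)")
        if (ids < 0).any():  # assumption: non-negative ids, as in Annoy
            raise ValueError("ids must be non-negative")
        if check_unique_ids and np.unique(ids).size != ids.size:
            raise ValueError("ids must be unique")
    return X, ids
```

For example, `validate_add_items([[1, 2], [3, 4]], 2, ids=[1, 1])` raises `ValueError` because the ids repeat, while passing `check_unique_ids=False` lets the same call through.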

Return type:

ndarray

See also

get_item_vectors

Fetch vectors by id selection.

to_numpy

Export vectors as a dense NumPy array.

Notes

This method is deterministic: ids are generated predictably and vectors are added in row order.
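The id allocation behind this determinism can be sketched with an in-memory stand-in for the backend. `ToyBackend`, `allocate_ids` and `add_rows` are hypothetical names; only `add_item(i, vector)` and `get_n_items()` echo Annoy's method names, and the toy's `get_n_items()` returns the largest id plus one, as Annoy's does, so an explicit `start_id` leaves a gap that later default allocations skip over.

```python
import numpy as np


class ToyBackend:
    """Hypothetical in-memory stand-in for an Annoy-like backend."""

    def __init__(self, f):
        self.f = f
        self._items = {}

    def add_item(self, i, vector):
        self._items[i] = [float(x) for x in vector]

    def get_n_items(self):
        # Like Annoy: largest id + 1, not the number of stored items.
        return max(self._items) + 1 if self._items else 0


def allocate_ids(backend, n_samples, ids=None, start_id=None):
    if ids is not None:
        return np.asarray(ids, dtype=np.int64)
    if start_id is None:
        start_id = backend.get_n_items()  # resolved at call time
    return np.arange(start_id, start_id + n_samples, dtype=np.int64)


def add_rows(backend, X, ids=None, start_id=None):
    X = np.asarray(X, dtype=np.float32)
    ids_out = allocate_ids(backend, X.shape[0], ids, start_id)
    # Rows are added in order, so the same input always yields the same ids.
    for i, row in zip(ids_out, X):
        backend.add_item(int(i), row)
    return ids_out
```

Starting from an empty toy backend, adding two rows yields ids 0 and 1, a following call with `start_id=10` yields 10, and the next default call continues at 11.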

get_item_vectors(ids=None, *, dtype=numpy.float32, start=0, stop=None, n_rows=None, return_ids=False, validate_vector_len=True)

Fetch many vectors as a dense NumPy array.

Parameters:
ids : sequence of int or iterable of int, optional

Ids to fetch. If None, selects range(start, stop or n_items).

dtype : numpy dtype, default=numpy.float32

Output dtype.

start, stop : int, optional

Range selection used when ids is None.

n_rows : int, optional

Required when ids is a non-sized iterable (e.g., a generator).

return_ids : bool, default=False

If True, also return the realized ids (int64) in row order.

validate_vector_len : bool, default=True

If True, verify every fetched vector has length f.

Returns:
X : numpy.ndarray of shape (n_rows, f)

Dense matrix of vectors.

ids_out : numpy.ndarray of shape (n_rows,), optional

Returned when return_ids=True.

Raises:
ValueError

If the id selection is inconsistent or vectors have unexpected length.

TypeError

If ids is a non-sized iterable and n_rows is not provided.
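The selection rules (a range when ids is None, a known row count otherwise) can be illustrated with a fill loop over a preallocated array. This sketch assumes a backend exposing `f`, `get_n_items()` and `get_item_vector(i)` (Annoy-style names); `ToyBackend` and this free-function `get_item_vectors` are hypothetical, and the real method may validate differently.

```python
from collections.abc import Sized

import numpy as np


class ToyBackend:
    """Hypothetical in-memory stand-in with Annoy-style getters."""

    def __init__(self, vectors):
        self._v = {i: list(v) for i, v in enumerate(vectors)}
        self.f = len(vectors[0])

    def get_n_items(self):
        return len(self._v)

    def get_item_vector(self, i):
        return self._v[i]


def get_item_vectors(backend, ids=None, *, dtype=np.float32, start=0, stop=None,
                     n_rows=None, return_ids=False, validate_vector_len=True):
    f = backend.f
    if ids is None:
        # Mirrors the documented default: range(start, stop or n_items).
        ids = range(start, stop or backend.get_n_items())
    if n_rows is None:
        if not isinstance(ids, Sized):
            raise TypeError("n_rows is required when ids is a non-sized iterable")
        n_rows = len(ids)
    X = np.empty((n_rows, f), dtype=dtype)  # allocated once, filled row by row
    ids_out = np.empty(n_rows, dtype=np.int64)
    it = iter(ids)
    for row in range(n_rows):
        try:
            i = int(next(it))
        except StopIteration:
            raise ValueError(f"ids yielded {row} items, expected {n_rows}") from None
        vec = backend.get_item_vector(i)
        if validate_vector_len and len(vec) != f:
            raise ValueError(f"item {i} has length {len(vec)}, expected f={f}")
        X[row] = vec
        ids_out[row] = i
    return (X, ids_out) if return_ids else X
```

Knowing `n_rows` up front is what allows a single allocation; a generator has no length, which is why it needs `n_rows` explicitly.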

Return type:

ndarray | tuple[ndarray, ndarray]

See also

to_numpy

Dense NumPy export alias.

iter_item_vectors

Streaming export without allocating a dense matrix.

iter_item_vectors(ids=None, *, start=0, stop=None, with_ids=True, dtype=None)

Iterate vectors without allocating a dense matrix.

Parameters:
ids, start, stop

Selection controls. See get_item_vectors.

with_ids : bool, default=True

If True, yield (id, vector). If False, yield vectors only.

dtype : numpy dtype, optional

If provided, cast output vectors to this dtype.

Yields:
(id, vector) or vector

Each vector is returned as a 1D NumPy array.
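A generator is the natural shape for this streaming export: only one vector exists in memory at a time. The sketch below assumes an Annoy-style backend with `get_n_items()` and `get_item_vector(i)`; `ToyBackend` and this free-function `iter_item_vectors` are hypothetical stand-ins, not the library code.

```python
import numpy as np


class ToyBackend:
    """Hypothetical in-memory stand-in with Annoy-style getters."""

    def __init__(self, vectors):
        self._v = {i: list(v) for i, v in enumerate(vectors)}

    def get_n_items(self):
        return len(self._v)

    def get_item_vector(self, i):
        return self._v[i]


def iter_item_vectors(backend, ids=None, *, start=0, stop=None, with_ids=True, dtype=None):
    if ids is None:
        ids = range(start, stop or backend.get_n_items())
    for i in ids:
        # One vector materialized per step; dtype=None keeps NumPy's inference.
        vec = np.asarray(backend.get_item_vector(int(i)), dtype=dtype)
        yield (int(i), vec) if with_ids else vec
```

As a usage example, `sum(iter_item_vectors(backend, with_ids=False))` computes a column-wise total without ever building the dense `(n_rows, f)` matrix.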

Return type:

Iterator[ndarray | tuple[int, ndarray]]

See also

get_item_vectors

Dense export.

to_numpy(ids=None, *, dtype=numpy.float32, start=0, stop=None, n_rows=None, validate_vector_len=True)

Export vectors to a dense NumPy array.

See also

get_item_vectors

Dense export with optional id output.

iter_item_vectors

Streaming export.

to_scipy_csr

Export as SciPy CSR.

to_pandas

Export as pandas DataFrame.

Notes

This is an alias of get_item_vectors with return_ids=False.
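Since to_numpy is documented as get_item_vectors with return_ids=False, the relationship amounts to a forwarding method. In this sketch `AliasSketch` and its dict-backed storage are hypothetical, and `get_item_vectors` is reduced to id selection only.

```python
import numpy as np


class AliasSketch:
    """Hypothetical host class; vectors live in a plain dict for illustration."""

    def __init__(self, vectors):
        self._v = dict(enumerate(vectors))

    def get_item_vectors(self, ids=None, *, dtype=np.float32, return_ids=False):
        ids = list(self._v) if ids is None else list(ids)
        X = np.asarray([self._v[i] for i in ids], dtype=dtype)
        return (X, np.asarray(ids, dtype=np.int64)) if return_ids else X

    def to_numpy(self, ids=None, *, dtype=np.float32):
        # Same selection as get_item_vectors, but ids are never returned.
        return self.get_item_vectors(ids, dtype=dtype, return_ids=False)
```

The practical difference is only the return type: to_numpy always yields a single ndarray, while get_item_vectors may yield a tuple.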

Return type:

ndarray

to_pandas(ids=None, *, dtype=numpy.float32, start=0, stop=None, n_rows=None, id_location='index', id_name='id', columns=None, validate_vector_len=True)

Export vectors to a pandas DataFrame.

Parameters:
ids, start, stop, n_rows

Selection controls. See get_item_vectors.

dtype : numpy dtype, default=numpy.float32

Output dtype.

id_location : {'index', 'column', 'both', 'none'}, default='index'

Where to place ids in the output.

id_name : str, default='id'

Name used for the id column / index.

columns : sequence of str, optional

Column names for the vector dimensions. If None, uses feature_names_in_ when that attribute is present and its length matches f; otherwise uses feature_0..feature_{f-1}.

validate_vector_len : bool, default=True

If True, verify every fetched vector has length f.

Returns:
df : pandas.DataFrame

DataFrame with shape (n_rows, f) plus optional id metadata.

Raises:
ImportError

If pandas is not installed.

ValueError

If id_location is invalid or the length of columns does not match f.
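The column-naming rule and the four id_location options can be captured in small helpers. `resolve_columns`, `id_placement` and `build_frame` are hypothetical names, and placing the id column first is an assumption; only `pandas.DataFrame`, `DataFrame.insert` and `pandas.Index` are real pandas APIs. pandas is imported inside `build_frame`, so a missing install surfaces as ImportError only when a frame is actually built.

```python
_ID_LOCATIONS = {"index", "column", "both", "none"}


def resolve_columns(f, columns=None, feature_names_in=None):
    """Hypothetical helper mirroring the documented column-naming rule."""
    if columns is not None:
        if len(columns) != f:
            raise ValueError(f"len(columns)={len(columns)} does not match f={f}")
        return list(columns)
    if feature_names_in is not None and len(feature_names_in) == f:
        return list(feature_names_in)
    return [f"feature_{j}" for j in range(f)]


def id_placement(id_location):
    """Return (as_index, as_column) flags for an id_location value."""
    if id_location not in _ID_LOCATIONS:
        raise ValueError(f"id_location must be one of {sorted(_ID_LOCATIONS)}")
    return id_location in ("index", "both"), id_location in ("column", "both")


def build_frame(X, ids, f, *, id_location="index", id_name="id",
                columns=None, feature_names_in=None):
    import pandas as pd  # deferred import: ImportError only when used

    as_index, as_column = id_placement(id_location)
    df = pd.DataFrame(X, columns=resolve_columns(f, columns, feature_names_in))
    if as_column:
        df.insert(0, id_name, ids)  # assumption: ids go in the first column
    if as_index:
        df.index = pd.Index(ids, name=id_name)
    return df
```

With `id_location='both'`, the ids appear both as the named index and as a regular column, which is convenient when the frame is later reset or exported.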

Return type:

Any

See also

to_numpy

Dense NumPy export.

to_scipy_csr

Export as SciPy CSR.

to_scipy_csr(ids=None, *, dtype=numpy.float32, start=0, stop=None, n_rows=None, validate_vector_len=True)

Export vectors as a SciPy CSR matrix.

Returns:
X : scipy.sparse.csr_matrix

CSR matrix with shape (n_rows, f).

Raises:
ImportError

If SciPy is not installed.
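A CSR matrix stores only nonzero entries through three arrays: `data` (the values), `indices` (the column of each value) and `indptr` (where each row's entries start). Assuming the export compresses dense rows (the source does not say how to_scipy_csr is implemented), these arrays can be computed with NumPy alone; `dense_to_csr_parts` is a hypothetical helper, and its output is the triple accepted by `scipy.sparse.csr_matrix((data, indices, indptr), shape=...)`.

```python
import numpy as np


def dense_to_csr_parts(X):
    """Return (data, indices, indptr) describing a dense 2D array in CSR layout."""
    X = np.asarray(X)
    n_rows = X.shape[0]
    # np.nonzero reports positions in row-major order, which CSR requires.
    rows, cols = np.nonzero(X)
    data = X[rows, cols]
    # indptr[r]:indptr[r + 1] spans row r's entries in data and indices.
    counts = np.bincount(rows, minlength=n_rows)
    indptr = np.concatenate(([0], np.cumsum(counts))).astype(np.int64)
    return data, cols.astype(np.int64), indptr
```

Exact zeros in the stored vectors become implicit entries, so an all-zero row contributes nothing to `data` and repeats the previous `indptr` value.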

Return type:

Any

See also

to_numpy

Dense NumPy export.

to_pandas

Export as pandas DataFrame.