NDArrayMixin

class scikitplot.annoy.NDArrayMixin

NumPy / SciPy / pandas interoperability for Annoy-like indexes.

add_items(X, ids=None, *, start_id=None, accept_sparse='error', ensure_all_finite=True, copy=False, dtype=numpy.float32, order='C', check_unique_ids=True)

Add many vectors to the index.

Parameters:

X (array-like of shape (n_samples, n_features)): Vectors to add.
ids (array-like of shape (n_samples,), optional): Explicit integer ids. If omitted, ids are allocated as a contiguous range starting at start_id (or at get_n_items() at call time when start_id is None).
start_id (int, optional): Starting id used when ids is None. If None, defaults to backend.get_n_items() at call time.
accept_sparse ({'error', 'toarray'}, default='error'): Sparse input handling. 'error' raises TypeError on sparse input; 'toarray' explicitly densifies SciPy sparse input.
ensure_all_finite (bool or 'allow-nan', default=True): Finiteness validation policy.
copy (bool, default=False): If True, copy the validated dense array before adding.
dtype (numpy dtype, default=numpy.float32): Dtype passed to the backend.
order ({'C', 'F', 'A', 'K'}, default='C'): Memory order used when coercing X.
check_unique_ids (bool, default=True): If True, require ids to be unique.

Returns:

ids_out (numpy.ndarray of shape (n_samples,)): The ids that were added, as int64.

Raises:

RuntimeError: If the backend reports that the index is already built.
TypeError: If sparse input is given while accept_sparse='error'.
ValueError: If X is not 2D, its feature dimension does not match f, ids are invalid, or the finiteness policy is violated.
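
The ValueError conditions above can be illustrated with a small pure-Python sketch. The helper name validate_rows is hypothetical and is not part of scikitplot. The reading of 'allow-nan' (NaN accepted, infinities still rejected) is an assumption borrowed from scikit-learn's convention for the same parameter name.

```python
import math
from typing import List, Sequence, Union


def validate_rows(
    X: Sequence[Sequence[float]],
    f: int,
    ensure_all_finite: Union[bool, str] = True,
) -> List[List[float]]:
    """Hypothetical validator mirroring the documented ValueError cases."""
    rows: List[List[float]] = []
    for r, row in enumerate(X):
        # A flat list of numbers is 1D input, which the docs say is rejected.
        if not isinstance(row, (list, tuple)):
            raise ValueError("X must be 2D (a sequence of rows)")
        values = [float(v) for v in row]
        if len(values) != f:
            raise ValueError(f"row {r} has {len(values)} features, expected {f}")
        for v in values:
            if ensure_all_finite is False:
                continue  # validation disabled
            if ensure_all_finite == "allow-nan" and math.isnan(v):
                continue  # assumption: NaN tolerated, inf still rejected
            if not math.isfinite(v):
                raise ValueError(f"row {r} contains a non-finite value")
        rows.append(values)
    return rows


print(validate_rows([[1, 2], [3, 4]], f=2))  # [[1.0, 2.0], [3.0, 4.0]]
```

Validating every row before adding anything keeps a failed call from leaving a partially populated index, although whether the library validates up front in this way is not stated here.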

Parameters:

X (Any)
ids (Sequence[int] | Iterable[int] | None)
start_id (int | None)
accept_sparse (Literal['error', 'toarray'])
ensure_all_finite (bool | Literal['allow-nan'])
copy (bool)
dtype (Any)
order (Literal['C', 'F', 'A', 'K'])
check_unique_ids (bool)

Return type:

ndarray

See also

get_item_vectors: Fetch vectors by id selection.
to_numpy: Export vectors as a dense NumPy array.

Notes

This method is deterministic: ids are generated predictably and vectors are added in row order.
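
As a rough illustration of the id rules described for add_items, the following sketch reproduces the allocation logic in plain Python. The function allocate_ids and its n_items argument (standing in for get_n_items()) are hypothetical. The non-negativity check is an assumption based on how Annoy-style indexes usually treat ids.

```python
from typing import Iterable, List, Optional


def allocate_ids(
    n_samples: int,
    ids: Optional[Iterable[int]] = None,
    *,
    start_id: Optional[int] = None,
    n_items: int = 0,
    check_unique_ids: bool = True,
) -> List[int]:
    """Hypothetical helper mirroring the documented id rules of add_items."""
    if ids is None:
        # No explicit ids: contiguous range from start_id, or from the
        # current item count when start_id is None.
        first = n_items if start_id is None else start_id
        return list(range(first, first + n_samples))

    ids_out = [int(i) for i in ids]
    if len(ids_out) != n_samples:
        raise ValueError("ids must have one entry per row of X")
    if any(i < 0 for i in ids_out):
        raise ValueError("ids must be non-negative")  # assumption
    if check_unique_ids and len(set(ids_out)) != len(ids_out):
        raise ValueError("ids must be unique")
    return ids_out


print(allocate_ids(3, n_items=5))     # [5, 6, 7]
print(allocate_ids(3, start_id=100))  # [100, 101, 102]
print(allocate_ids(2, ids=[7, 9]))    # [7, 9]
```

Because the default start is read at call time, two consecutive calls without ids or start_id would continue the same contiguous range.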

get_item_vectors(ids=None, *, dtype=numpy.float32, start=0, stop=None, n_rows=None, return_ids=False, validate_vector_len=True)

Fetch many vectors as a dense NumPy array.

Parameters:

ids (sequence of int or iterable of int, optional): Ids to fetch. If None, selects range(start, stop or n_items).
dtype (numpy dtype, default=numpy.float32): Output dtype.
start, stop (int, optional): Range bounds used when ids is None.
n_rows (int, optional): Required when ids is a non-sized iterable (e.g., a generator).
return_ids (bool, default=False): If True, also return the realized ids (int64) in row order.
validate_vector_len (bool, default=True): If True, verify that every fetched vector has length f.

Returns:

X (numpy.ndarray of shape (n_rows, f)): Dense matrix of vectors.
ids_out (numpy.ndarray of shape (n_rows,), optional): Returned when return_ids=True.

Raises:

ValueError: If the id selection is inconsistent or vectors have unexpected length.
TypeError: If ids is a non-sized iterable and n_rows is not provided.

Parameters:

ids (Sequence[int] | Iterable[int] | None)
dtype (Any)
start (int)
stop (int | None)
n_rows (int | None)
return_ids (bool)
validate_vector_len (bool)

Return type:

ndarray | tuple[ndarray, ndarray]

See also

to_numpy: Dense NumPy export alias.
iter_item_vectors: Streaming export without allocating a dense matrix.
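
The selection rules for ids, start, stop, and n_rows can be sketched as follows. The helper resolve_ids and its n_items argument are hypothetical. The sketch treats stop=None as "up to n_items" and treats an n_rows mismatch as an inconsistent selection; both are assumptions, not confirmed library behavior.

```python
from collections.abc import Sized
from typing import Iterable, List, Optional


def resolve_ids(
    ids: Optional[Iterable[int]] = None,
    *,
    n_items: int,
    start: int = 0,
    stop: Optional[int] = None,
    n_rows: Optional[int] = None,
) -> List[int]:
    """Hypothetical resolver for the documented id selection controls."""
    if ids is None:
        end = n_items if stop is None else stop
        if not 0 <= start <= end:
            raise ValueError("start/stop do not describe a valid range")
        return list(range(start, end))

    # A generator has no len(), so the output row count cannot be known
    # up front unless the caller supplies n_rows.
    if not isinstance(ids, Sized) and n_rows is None:
        raise TypeError("n_rows is required when ids is a non-sized iterable")

    out = [int(i) for i in ids]
    if n_rows is not None and len(out) != n_rows:
        raise ValueError("ids yielded a different count than n_rows")
    return out


print(resolve_ids(n_items=4))                             # [0, 1, 2, 3]
print(resolve_ids(n_items=4, start=1, stop=3))            # [1, 2]
print(resolve_ids((i for i in [4, 2]), n_items=5, n_rows=2))  # [4, 2]
```

This sketch materializes the ids as a list for simplicity. Knowing n_rows in advance is what would let an implementation size a dense output array before consuming a generator.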

iter_item_vectors(ids=None, *, start=0, stop=None, with_ids=True, dtype=None)

Iterate vectors without allocating a dense matrix.

Parameters:

ids, start, stop: Selection controls. See get_item_vectors.
with_ids (bool, default=True): If True, yield (id, vector). If False, yield vectors only.
dtype (numpy dtype, optional): If provided, cast output vectors to this dtype.

Yields:

(id, vector) or vector: Each vector is returned as a 1D NumPy array.

Parameters:

ids (Sequence[int] | Iterable[int] | None)
start (int)
stop (int | None)
with_ids (bool)
dtype (Any | None)

Return type:

Iterator[ndarray | tuple[int, ndarray]]

See also

get_item_vectors: Dense export.
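
A minimal sketch of the streaming pattern, using a toy stand-in for the backend. ToyIndex and iter_vectors are hypothetical names. The toy yields plain Python lists, whereas the documented method yields 1D NumPy arrays.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple, Union

Vector = List[float]


class ToyIndex:
    """Hypothetical stand-in for an Annoy-like backend (not the real class)."""

    def __init__(self, vectors: Dict[int, Vector]) -> None:
        self._vectors = vectors

    def get_n_items(self) -> int:
        return len(self._vectors)

    def get_item_vector(self, i: int) -> Vector:
        return self._vectors[i]


def iter_vectors(
    index: ToyIndex,
    ids: Optional[Iterable[int]] = None,
    *,
    start: int = 0,
    stop: Optional[int] = None,
    with_ids: bool = True,
) -> Iterator[Union[Vector, Tuple[int, Vector]]]:
    """Yield one vector at a time; no dense matrix is ever allocated."""
    if ids is None:
        end = index.get_n_items() if stop is None else stop
        ids = range(start, end)
    for i in ids:
        vec = index.get_item_vector(i)
        yield (i, vec) if with_ids else vec


index = ToyIndex({0: [1.0, 2.0], 1: [3.0, 4.0], 2: [5.0, 6.0]})

# Accumulate column sums while holding only one vector in memory.
sums = [0.0, 0.0]
for vec in iter_vectors(index, with_ids=False):
    sums = [s + v for s, v in zip(sums, vec)]
print(sums)  # [9.0, 12.0]

print(list(iter_vectors(index, start=1)))  # [(1, [3.0, 4.0]), (2, [5.0, 6.0])]
```

Streaming suits reductions such as sums or norms over indexes too large to export as a single dense matrix.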

to_numpy(ids=None, *, dtype=numpy.float32, start=0, stop=None, n_rows=None, validate_vector_len=True)

Export vectors to a dense NumPy array.

See also

get_item_vectors: Dense export with optional id output.
iter_item_vectors: Streaming export.
to_scipy_csr: Export as SciPy CSR.
to_pandas: Export as pandas DataFrame.

Notes

This is an alias of get_item_vectors with return_ids=False.

Parameters:

ids (Sequence[int] | Iterable[int] | None)
dtype (Any)
start (int)
stop (int | None)
n_rows (int | None)
validate_vector_len (bool)

Return type:

ndarray

to_pandas(ids=None, *, dtype=numpy.float32, start=0, stop=None, n_rows=None, id_location='index', id_name='id', columns=None, validate_vector_len=True)

Export vectors to a pandas DataFrame.

Parameters:

ids, start, stop, n_rows: Selection controls. See get_item_vectors.
dtype (numpy dtype, default=numpy.float32): Output dtype.
id_location ({'index', 'column', 'both', 'none'}, default='index'): Where to place ids in the output.
id_name (str, default='id'): Name used for the id column / index.
columns (sequence of str, optional): Column names for vector dimensions. If None, uses feature_names_in_ when it is present and its length matches f; otherwise uses feature_0..feature_{f-1}.
validate_vector_len (bool, default=True): If True, verify that every fetched vector has length f.

Returns:

df (pandas.DataFrame): DataFrame with shape (n_rows, f) plus optional id metadata.

Raises:

ImportError: If pandas is not installed.
ValueError: If id_location is invalid or columns length mismatches f.

Parameters:

ids (Sequence[int] | Iterable[int] | None)
dtype (Any)
start (int)
stop (int | None)
n_rows (int | None)
id_location (Literal['index', 'column', 'both', 'none'])
id_name (str)
columns (Sequence[str] | None)
validate_vector_len (bool)

Return type:

Any

See also

to_numpy: Dense NumPy export.
to_scipy_csr: Export as SciPy CSR.
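
The column-name fallback described for the columns parameter can be sketched as below. The helper resolve_columns is hypothetical, and feature_names_in is passed explicitly here rather than read from an attribute.

```python
from typing import List, Optional, Sequence


def resolve_columns(
    f: int,
    columns: Optional[Sequence[str]] = None,
    feature_names_in: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper mirroring the documented column-name fallback."""
    if columns is not None:
        cols = list(columns)
        if len(cols) != f:
            # Documented ValueError: columns length mismatches f.
            raise ValueError(f"expected {f} column names, got {len(cols)}")
        return cols
    # Stored feature names are used only when their length matches f.
    if feature_names_in is not None and len(feature_names_in) == f:
        return list(feature_names_in)
    return [f"feature_{j}" for j in range(f)]


print(resolve_columns(3))
# ['feature_0', 'feature_1', 'feature_2']
print(resolve_columns(2, feature_names_in=["x", "y"]))
# ['x', 'y']
print(resolve_columns(3, feature_names_in=["x", "y"]))
# ['feature_0', 'feature_1', 'feature_2']
```

Note the asymmetry: explicit columns of the wrong length raise, while stored feature names of the wrong length are skipped in favor of the generated defaults.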

to_scipy_csr(ids=None, *, dtype=numpy.float32, start=0, stop=None, n_rows=None, validate_vector_len=True)

Export vectors as a SciPy CSR matrix.

Returns:

X (scipy.sparse.csr_matrix): CSR matrix with shape (n_rows, f).

Raises:

ImportError: If SciPy is not installed.

Parameters:

ids (Sequence[int] | Iterable[int] | None)
dtype (Any)
start (int)
stop (int | None)
n_rows (int | None)
validate_vector_len (bool)

Return type:

Any

See also

to_numpy: Dense NumPy export.
to_pandas: Export as pandas DataFrame.
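
For readers unfamiliar with the CSR layout, this pure-Python sketch shows the three arrays a CSR matrix holds for a small set of dense rows. The function dense_to_csr is hypothetical and does not use SciPy. It drops zero entries, as SciPy does when building a csr_matrix from a dense array; how to_scipy_csr itself builds the matrix is not documented here.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(
    rows: Sequence[Sequence[float]],
) -> Tuple[List[float], List[int], List[int]]:
    """Return (data, indices, indptr) for dense rows, skipping zeros."""
    data: List[float] = []   # nonzero values, row by row
    indices: List[int] = []  # column index of each stored value
    indptr: List[int] = [0]  # row r spans data[indptr[r]:indptr[r + 1]]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                data.append(v)
                indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr


data, indices, indptr = dense_to_csr([[0.0, 1.5, 0.0], [2.0, 0.0, 3.0]])
print(data)     # [1.5, 2.0, 3.0]
print(indices)  # [1, 0, 2]
print(indptr)   # [0, 1, 3]
```

CSR saves memory only when many entries are zero; for typical dense embedding vectors, the dense export is likely the more compact choice.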
