NDArrayExportMixin

class scikitplot.annoy.NDArrayExportMixin

Export mixin for Annoy-like classes.

A class mixing this in MUST provide:

- get_item_vector(i: int) -> Sequence[float]
- get_n_items() -> int
- attribute/property: f (dimension)
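The contract above can be illustrated with a small stand-in class. This is a minimal sketch: `TinyIndex` and its `add_item` helper are hypothetical names invented for illustration, and raising `IndexError` for an unknown id is an assumption chosen to match the default `missing_exceptions` of `partition_existing_ids`.

```python
from typing import List, Sequence


class TinyIndex:
    """Hypothetical host class exposing the three members the mixin requires."""

    def __init__(self, f: int) -> None:
        self.f = f  # vector dimension, read by the mixin as attribute `f`
        self._rows: List[List[float]] = []

    def add_item(self, vector: Sequence[float]) -> int:
        # Illustrative helper, not part of the required interface.
        if len(vector) != self.f:
            raise ValueError(f"expected {self.f} values, got {len(vector)}")
        self._rows.append([float(x) for x in vector])
        return len(self._rows) - 1

    def get_item_vector(self, i: int) -> Sequence[float]:
        # Assumption: an unknown id is reported with IndexError.
        if i < 0 or i >= len(self._rows):
            raise IndexError(f"item {i} not found")
        return self._rows[i]

    def get_n_items(self) -> int:
        return len(self._rows)


index = TinyIndex(f=2)
index.add_item([0.0, 1.0])
index.add_item([2.0, 3.0])
```

In real use the class would also list NDArrayExportMixin among its bases; that import is left out here so the sketch runs without the library installed.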

iter_item_vectors(ids=None, *, start=0, stop=None, with_ids=True)

Iterate item vectors in a memory-safe way.

Parameters:
ids:

Explicit item ids. Must be a sized Sequence (for example a list or tuple), not a one-shot iterator.

start, stop:

Used only when ids is None.

with_ids:

If True, yield (id, vector); otherwise yield only the vector.

Yields:
(id, vector) or vector
Return type:

Iterator[Sequence[float] | tuple[int, Sequence[float]]]
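The documented semantics can be sketched as a plain generator. This is not the library's source: `iter_vectors_sketch` and `ListIndex` are hypothetical names, and treating `stop=None` as "up to get_n_items()" is an assumption.

```python
from typing import Iterator, List, Optional, Sequence, Tuple, Union

Vector = Sequence[float]


class ListIndex:
    """Hypothetical host class backed by a list of rows."""

    def __init__(self, rows: List[List[float]]) -> None:
        self._rows = rows
        self.f = len(rows[0]) if rows else 0

    def get_item_vector(self, i: int) -> Vector:
        return self._rows[i]

    def get_n_items(self) -> int:
        return len(self._rows)


def iter_vectors_sketch(
    index: ListIndex,
    ids: Optional[Sequence[int]] = None,
    *,
    start: int = 0,
    stop: Optional[int] = None,
    with_ids: bool = True,
) -> Iterator[Union[Vector, Tuple[int, Vector]]]:
    if ids is None:
        # Assumption: stop=None means "iterate up to get_n_items()".
        end = index.get_n_items() if stop is None else stop
        ids = range(start, end)
    for i in ids:
        # One vector is fetched at a time, so memory use stays flat.
        vec = index.get_item_vector(i)
        yield (i, vec) if with_ids else vec


demo = ListIndex([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
pairs = list(iter_vectors_sketch(demo, start=1))
only_vectors = list(iter_vectors_sketch(demo, ids=[2, 0], with_ids=False))
```

Here `pairs` holds the (id, vector) tuples for ids 1 and 2, while `only_vectors` holds the bare vectors for ids 2 and 0 in the order requested.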

partition_existing_ids(ids, *, missing_exceptions=(IndexError,))

Partition candidate ids into (existing, missing) using get_item_vector semantics.

Strict rules:

- Caller must provide a sized Sequence of ids.
- We only treat missing_exceptions as a “not found” signal.
- Any other exception is re-raised.

This avoids implicit assumptions about dense id ranges.

Return type:

tuple[list[int], list[int]]
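The strict rules above might be implemented roughly as follows. This is a sketch under stated assumptions: `partition_ids_sketch` and `SparseIndex` are hypothetical names, and rejecting non-Sequence input with `TypeError` is one possible way to enforce the "sized Sequence" rule, not necessarily what the library does.

```python
from collections.abc import Sequence
from typing import Dict, List, Tuple, Type


class SparseIndex:
    """Hypothetical host class whose ids are not a dense range."""

    f = 2

    def __init__(self, rows: Dict[int, List[float]]) -> None:
        self._rows = dict(rows)

    def get_item_vector(self, i: int) -> List[float]:
        if i not in self._rows:
            raise IndexError(f"item {i} not found")
        return self._rows[i]

    def get_n_items(self) -> int:
        return len(self._rows)


def partition_ids_sketch(
    index: SparseIndex,
    ids: Sequence,
    *,
    missing_exceptions: Tuple[Type[BaseException], ...] = (IndexError,),
) -> Tuple[List[int], List[int]]:
    # Assumption: non-Sequence input (e.g. a generator) is rejected up front.
    if not isinstance(ids, Sequence):
        raise TypeError("ids must be a sized Sequence")
    existing: List[int] = []
    missing: List[int] = []
    for i in ids:
        try:
            index.get_item_vector(i)
        except missing_exceptions:
            missing.append(i)  # only these exception types mean "not found"
        else:
            existing.append(i)
        # Any other exception type propagates to the caller unchanged.
    return existing, missing


sparse = SparseIndex({0: [0.0, 1.0], 5: [2.0, 3.0]})
existing, missing = partition_ids_sketch(sparse, [0, 3, 5])
```

With ids 0 and 5 stored, `existing` is `[0, 5]` and `missing` is `[3]`, even though the ids do not form a dense range.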

save_vectors_npy(path, ids=None, *, start=0, stop=None, dtype='float32', overwrite=True)

Save vectors into a .npy file using NumPy open_memmap.

This is the recommended path for very large indexes.

Returns:
path
Return type:

str
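The open_memmap approach can be sketched as below, assuming NumPy is installed. `save_npy_sketch` and `ListIndex` are hypothetical names, and the sketch omits the `ids`, `start`, `stop` and `overwrite` handling of the real method. `numpy.lib.format.open_memmap` creates the .npy file on disk with its final shape, so rows are written one at a time without holding the whole array in RAM.

```python
import os
import tempfile
from typing import List

import numpy as np
from numpy.lib.format import open_memmap


class ListIndex:
    """Hypothetical host class backed by a list of rows."""

    def __init__(self, rows: List[List[float]]) -> None:
        self._rows = rows
        self.f = len(rows[0])

    def get_item_vector(self, i: int) -> List[float]:
        return self._rows[i]

    def get_n_items(self) -> int:
        return len(self._rows)


def save_npy_sketch(index: ListIndex, path: str, *, dtype: str = "float32") -> str:
    n, f = index.get_n_items(), index.f
    # Allocate the .npy file on disk up front with its final shape.
    out = open_memmap(path, mode="w+", dtype=np.dtype(dtype), shape=(n, f))
    for i in range(n):
        out[i] = index.get_item_vector(i)  # write one row at a time
    out.flush()
    del out  # release the memory map before the file is reopened
    return path


demo = ListIndex([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
tmp_dir = tempfile.mkdtemp()
npy_path = save_npy_sketch(demo, os.path.join(tmp_dir, "vectors.npy"))
loaded = np.load(npy_path, mmap_mode="r")
```

Reading back with `mmap_mode="r"` keeps the load side lazy as well, which is usually what a very large export needs.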

to_csv(path, ids=None, *, start=0, stop=None, include_id=True, header=True, delimiter=',', float_format=None, columns=None, dtype='float32')

Stream vectors to CSV without building a full DataFrame.

For large exports this is safer than df.to_csv, because rows are written one at a time and the full table never has to fit in memory.

Notes

CSV for 1B rows will be extremely large and slow. Consider Parquet in the future.

Return type:

str
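Row-by-row streaming can be sketched with the standard library `csv` module. Several things here are assumptions rather than documented behaviour: `stream_csv_sketch` and `ListIndex` are hypothetical names, the sketch writes to an open file handle instead of a path, the `feature_<j>` column names are invented, and `float_format` is treated as a `%`-style format string.

```python
import csv
import io
from typing import List, Optional, TextIO


class ListIndex:
    """Hypothetical host class backed by a list of rows."""

    def __init__(self, rows: List[List[float]]) -> None:
        self._rows = rows
        self.f = len(rows[0])

    def get_item_vector(self, i: int) -> List[float]:
        return self._rows[i]

    def get_n_items(self) -> int:
        return len(self._rows)


def stream_csv_sketch(
    index: ListIndex,
    fh: TextIO,
    *,
    include_id: bool = True,
    header: bool = True,
    delimiter: str = ",",
    float_format: Optional[str] = None,
) -> None:
    writer = csv.writer(fh, delimiter=delimiter, lineterminator="\n")
    if header:
        # Assumption: invented column names, used for illustration only.
        cols = [f"feature_{j}" for j in range(index.f)]
        writer.writerow((["id"] if include_id else []) + cols)
    for i in range(index.get_n_items()):
        vec = index.get_item_vector(i)
        # Assumption: float_format is a %-style pattern such as "%.2f".
        cells = [float_format % x if float_format else repr(float(x)) for x in vec]
        writer.writerow(([i] if include_id else []) + cells)


demo = ListIndex([[0.0, 1.0], [2.0, 3.0]])
buffer = io.StringIO()
stream_csv_sketch(demo, buffer, float_format="%.2f")
csv_text = buffer.getvalue()
```

Only one row is held in memory at any point, which is the property that makes streaming preferable to building a DataFrame first.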

to_dataframe(ids=None, *, start=0, stop=None, include_id=True, columns=None, dtype='float32')

Materialize vectors into a Pandas DataFrame.

WARNING: Not suitable for huge indexes.
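A rough size estimate shows why the warning matters. The sketch below computes only the raw payload of a dense float32 table (4 bytes per value); a real DataFrame adds overhead for its index and any id column, so this is a lower bound. The item counts and the dimension of 128 are illustrative values, and `dense_bytes` is a hypothetical helper.

```python
def dense_bytes(n_items: int, f: int, itemsize: int = 4) -> int:
    """Raw bytes needed for n_items vectors of dimension f (float32 = 4 bytes)."""
    return n_items * f * itemsize


small = dense_bytes(1_000_000, 128)      # one million vectors
huge = dense_bytes(1_000_000_000, 128)   # one billion vectors

small_gb = small / 1e9
huge_gb = huge / 1e9
```

One million 128-dimensional vectors need about 0.5 GB, which fits comfortably in memory; one billion need about 512 GB, which is where the streaming or memory-mapped exports become the practical option.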

to_numpy(ids=None, *, start=0, stop=None, dtype='float32')

Materialize vectors into an in-memory NumPy array.

WARNING: Not suitable for huge indexes.
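In-memory materialization can be sketched as preallocating the array and filling it row by row, assuming NumPy is installed. `to_numpy_sketch` and `ListIndex` are hypothetical names, and treating `stop=None` as "up to get_n_items()" is an assumption.

```python
from typing import List, Optional, Sequence

import numpy as np


class ListIndex:
    """Hypothetical host class backed by a list of rows."""

    def __init__(self, rows: List[List[float]]) -> None:
        self._rows = rows
        self.f = len(rows[0])

    def get_item_vector(self, i: int) -> List[float]:
        return self._rows[i]

    def get_n_items(self) -> int:
        return len(self._rows)


def to_numpy_sketch(
    index: ListIndex,
    ids: Optional[Sequence[int]] = None,
    *,
    start: int = 0,
    stop: Optional[int] = None,
    dtype: str = "float32",
) -> np.ndarray:
    if ids is None:
        # Assumption: stop=None means "up to get_n_items()".
        end = index.get_n_items() if stop is None else stop
        ids = range(start, end)
    # The whole (len(ids), f) array is allocated in RAM at once.
    out = np.empty((len(ids), index.f), dtype=dtype)
    for row, i in enumerate(ids):
        out[row] = index.get_item_vector(i)
    return out


demo = ListIndex([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
full = to_numpy_sketch(demo)
subset = to_numpy_sketch(demo, ids=[2, 0])
```

Because the full array is allocated up front, this pattern only suits indexes whose vectors fit in available memory.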
