NDArrayExportMixin

class scikitplot.annoy.NDArrayExportMixin

Export mixin for Annoy-like classes.

A class mixing this in MUST provide:

- get_item_vector(i: int) -> Sequence[float]
- get_n_items() -> int
- attribute/property: f (dimension)
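The contract above can be illustrated with a small stand-in class. This is only a sketch: `ToyIndex` and its `add_item` method are hypothetical names invented for the example, and raising `IndexError` for unknown ids is an assumption chosen to match the default `missing_exceptions` documented further down.

```python
from typing import Sequence


class ToyIndex:
    """Hypothetical in-memory class that satisfies the documented contract."""

    def __init__(self, f: int) -> None:
        self.f = f  # vector dimension, exposed as the attribute `f`
        self._items = {}  # maps item id -> list of floats

    def add_item(self, i: int, vector: Sequence[float]) -> None:
        # Helper for the example only; not part of the required contract.
        if len(vector) != self.f:
            raise ValueError(f"expected {self.f} values, got {len(vector)}")
        self._items[i] = [float(x) for x in vector]

    def get_item_vector(self, i: int) -> Sequence[float]:
        try:
            return self._items[i]
        except KeyError:
            # Assumption: unknown ids are reported with IndexError.
            raise IndexError(f"item {i} not found") from None

    def get_n_items(self) -> int:
        return len(self._items)


index = ToyIndex(f=3)
index.add_item(0, [1.0, 2.0, 3.0])
index.add_item(1, [4.0, 5.0, 6.0])
```

A class like this should be usable as a base alongside the mixin, since the mixin only relies on the three members listed above; that pairing is not shown here because it depends on the installed package.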

iter_item_vectors(ids=None, *, start=0, stop=None, with_ids=True)

Iterate item vectors in a memory-safe way.

Parameters:

ids: Explicit item ids. Must be a sized Sequence for strictness.
start, stop: Used only when ids is None.
with_ids: If True yield (id, vector), else yield vector.

Yields:

(id, vector) or vector

Parameters:

ids (Sequence[int] | Iterable[int] | None)
start (int)
stop (int | None)
with_ids (bool)

Return type:

Iterator[Sequence[float] | tuple[int, Sequence[float]]]
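The iteration semantics described above could look roughly like the generator below. It is a sketch, not the library's implementation: `iter_vectors_sketch` and `ToyIndex` are hypothetical names, and the assumption that `stop=None` means "up to `get_n_items()`" is inferred from the parameter description rather than confirmed by the source.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in for an Annoy-like index."""

    def __init__(self, rows):
        self._rows = rows
        self.f = len(rows[0])

    def get_item_vector(self, i):
        return self._rows[i]

    def get_n_items(self):
        return len(self._rows)


def iter_vectors_sketch(index, ids=None, *, start=0, stop=None, with_ids=True):
    """Yield one vector at a time so only a single row is held in memory."""
    if ids is None:
        # Assumption: without explicit ids, walk the range [start, stop),
        # where stop defaults to the number of items in the index.
        end = index.get_n_items() if stop is None else stop
        ids = range(start, end)
    for i in ids:
        vector = index.get_item_vector(i)
        yield (i, vector) if with_ids else vector


index = ToyIndex([[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]])
pairs = list(iter_vectors_sketch(index, start=1))
only_vectors = list(iter_vectors_sketch(index, ids=[2, 0], with_ids=False))
```

Because the function is a generator, nothing is read from the index until the caller starts consuming it, which is what keeps memory use flat for large id ranges.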

partition_existing_ids(ids, *, missing_exceptions=(IndexError,))

Partition candidate ids into (existing, missing) using get_item_vector semantics.

Strict rules:

- Caller must provide a sized Sequence of ids.
- We only treat missing_exceptions as a “not found” signal.
- Any other exception is re-raised.

This avoids implicit assumptions about dense id ranges.

Parameters:

ids (Sequence[int])
missing_exceptions (tuple[type[Exception], ...])

Return type:

tuple[list[int], list[int]]
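The strict rules above can be sketched as a probe loop around `get_item_vector`. The names `partition_ids_sketch` and `SparseToyIndex` are hypothetical, and rejecting non-Sequence input with `TypeError` is an assumption; the source only says a sized Sequence is required.

```python
from collections.abc import Sequence


class SparseToyIndex:
    """Hypothetical index whose ids are not a dense 0..n-1 range."""

    def __init__(self, items):
        self._items = dict(items)

    def get_item_vector(self, i):
        if i not in self._items:
            raise IndexError(f"item {i} not found")
        return self._items[i]


def partition_ids_sketch(index, ids, *, missing_exceptions=(IndexError,)):
    """Split ids into (existing, missing) by probing get_item_vector."""
    if not isinstance(ids, Sequence):
        # Assumption: the exact exception type is not stated in the source.
        raise TypeError("ids must be a sized Sequence")
    existing, missing = [], []
    for i in ids:
        try:
            index.get_item_vector(i)
        except missing_exceptions:
            missing.append(i)  # only these exceptions mean "not found"
        else:
            existing.append(i)
    # Any exception outside missing_exceptions propagates to the caller.
    return existing, missing


index = SparseToyIndex({0: [0.0, 0.1], 5: [5.0, 5.1]})
existing, missing = partition_ids_sketch(index, [0, 1, 5, 7])
```

Probing each id individually is what lets the method work on sparse id sets, at the cost of one lookup per candidate.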

save_vectors_npy(path, ids=None, *, start=0, stop=None, dtype='float32', overwrite=True)

Save vectors into a .npy file using NumPy open_memmap.

This is the recommended path for very large indexes.

Returns:

path

Parameters:

path (str)
ids (Sequence[int] | Iterable[int] | None)
start (int)
stop (int | None)
dtype (str)
overwrite (bool)

Return type:

str
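The general technique of writing rows straight into a memory-mapped `.npy` file can be sketched with `numpy.lib.format.open_memmap`. This is an illustration under assumptions, not the method's actual body: `save_npy_sketch` and `ToyIndex` are hypothetical names, the output shape `(len(ids), f)` is assumed, and raising `FileExistsError` when `overwrite` is False is a guess at reasonable behaviour.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap


class ToyIndex:
    """Hypothetical in-memory stand-in for an Annoy-like index."""

    def __init__(self, rows):
        self._rows = rows
        self.f = len(rows[0])

    def get_item_vector(self, i):
        return self._rows[i]

    def get_n_items(self):
        return len(self._rows)


def save_npy_sketch(index, path, ids, *, dtype="float32", overwrite=True):
    """Write one row per id into a memory-mapped .npy file."""
    if not overwrite and os.path.exists(path):
        # Assumption: the real method's behaviour here is not documented.
        raise FileExistsError(path)
    out = open_memmap(path, mode="w+", dtype=np.dtype(dtype),
                      shape=(len(ids), index.f))
    for row, i in enumerate(ids):
        out[row] = index.get_item_vector(i)  # one row in memory at a time
    out.flush()
    del out  # release the memory map so the file handle is closed
    return path


index = ToyIndex([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
with tempfile.TemporaryDirectory() as tmp:
    written = save_npy_sketch(index, os.path.join(tmp, "vectors.npy"), [0, 1, 2])
    loaded = np.load(written)  # read back fully so the temp dir can be removed
```

For a file too large to read back in one piece, `np.load(path, mmap_mode="r")` opens the same file lazily.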

to_csv(path, ids=None, *, start=0, stop=None, include_id=True, header=True, delimiter=',', float_format=None, columns=None, dtype='float32')

Stream vectors to CSV without building a full DataFrame.

This is safer than DataFrame.to_csv for large exports because rows are written as they are read, so the full table is never held in memory.

Notes

CSV for 1B rows will be extremely large and slow. Consider Parquet in the future.

Parameters:

path (str)
ids (Sequence[int] | Iterable[int] | None)
start (int)
stop (int | None)
include_id (bool)
header (bool)
delimiter (str)
float_format (str | None)
columns (list[str] | None)
dtype (str)

Return type:

str
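Streaming rows to CSV without an intermediate DataFrame can be sketched with the standard `csv` module. Several details here are assumptions rather than documented behaviour: `write_csv_sketch` and `ToyIndex` are hypothetical names, the default column names `id` and `feature_<j>` are invented, `float_format` is treated as a printf-style pattern such as `"%.2f"`, and the sketch writes to an open file object instead of a path to keep the example self-contained.

```python
import csv
import io


class ToyIndex:
    """Hypothetical in-memory stand-in for an Annoy-like index."""

    def __init__(self, rows):
        self._rows = rows
        self.f = len(rows[0])

    def get_item_vector(self, i):
        return self._rows[i]

    def get_n_items(self):
        return len(self._rows)


def write_csv_sketch(index, fh, ids, *, include_id=True, header=True,
                     delimiter=",", float_format=None, columns=None):
    """Write one CSV row per id; only the current row is held in memory."""
    writer = csv.writer(fh, delimiter=delimiter, lineterminator="\n")
    if header:
        # Assumption: these default column names are made up for the sketch.
        names = columns or [f"feature_{j}" for j in range(index.f)]
        writer.writerow((["id"] if include_id else []) + list(names))
    for i in ids:
        vector = index.get_item_vector(i)
        if float_format:
            cells = [float_format % x for x in vector]  # printf-style pattern
        else:
            cells = list(vector)
        writer.writerow(([i] if include_id else []) + cells)


index = ToyIndex([[1.0, 2.5], [3.0, 4.25]])
buffer = io.StringIO()
write_csv_sketch(index, buffer, range(index.get_n_items()), float_format="%.2f")
csv_text = buffer.getvalue()
```

With a real file, the same function would be called inside `with open(path, "w", newline="") as fh:`, as the `csv` module documentation recommends.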

to_dataframe(ids=None, *, start=0, stop=None, include_id=True, columns=None, dtype='float32')

Materialize vectors into a Pandas DataFrame.

WARNING: Not suitable for huge indexes.

Parameters:

ids (Sequence[int] | Iterable[int] | None)
start (int)
stop (int | None)
include_id (bool)
columns (list[str] | None)
dtype (str)

to_numpy(ids=None, *, start=0, stop=None, dtype='float32')

Materialize vectors into an in-memory NumPy array.

WARNING: Not suitable for huge indexes.

Parameters:

ids (Sequence[int] | Iterable[int] | None)
start (int)
stop (int | None)
dtype (str)
