NDArrayExportMixin
- class scikitplot.annoy.NDArrayExportMixin
Export mixin for Annoy-like classes.
A class mixing this in MUST provide:
- get_item_vector(i: int) -> Sequence[float]
- get_n_items() -> int
- attribute/property: f (dimension)
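As a rough illustration of that contract, the sketch below defines a tiny in-memory class exposing the three required members. `TinyIndex` and its `add_item` helper are hypothetical names invented for this example; they are not part of scikitplot or Annoy, and a real host class would normally wrap an actual index.

```python
from typing import Sequence


class TinyIndex:
    """Hypothetical stand-in exposing the three members the mixin expects."""

    def __init__(self, f: int) -> None:
        self.f = f  # vector dimension
        self._items = {}  # item id -> list of floats

    def add_item(self, i: int, vector: Sequence[float]) -> None:
        # Helper for this sketch only; not part of the required interface.
        if len(vector) != self.f:
            raise ValueError(f"expected {self.f} values, got {len(vector)}")
        self._items[i] = [float(x) for x in vector]

    def get_item_vector(self, i: int) -> Sequence[float]:
        # Unknown ids raise IndexError, matching the default
        # missing_exceptions of partition_existing_ids.
        if i not in self._items:
            raise IndexError(f"item {i} not found")
        return self._items[i]

    def get_n_items(self) -> int:
        return len(self._items)


index = TinyIndex(f=2)
index.add_item(0, [1.0, 2.0])
index.add_item(5, [3.0, 4.0])  # ids need not be dense in this stand-in
```

Presumably the mixin would then be combined with such a class through ordinary multiple inheritance, though the exact pattern depends on the host class.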
- iter_item_vectors(ids=None, *, start=0, stop=None, with_ids=True)
Iterate item vectors in a memory-safe way.
- Parameters:
- ids:
Explicit item ids. Must be a sized Sequence (for example a list or tuple), not a one-shot iterator.
- start, stop:
Used only when ids is None.
- with_ids:
If True, yield (id, vector) pairs; otherwise yield only the vector.
- Yields:
- (id, vector) or vector
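The parameters above can be sketched as a plain generator. This is not the library's implementation: `iter_vectors` and `DenseIndex` are hypothetical names, and the sketch assumes that `stop=None` means "up to `get_n_items()`" and that ids form a dense range when `ids` is None.

```python
class DenseIndex:
    """Hypothetical stand-in with dense ids 0..n-1."""

    def __init__(self, rows):
        self._rows = [[float(x) for x in row] for row in rows]
        self.f = len(self._rows[0])

    def get_item_vector(self, i):
        return self._rows[i]

    def get_n_items(self):
        return len(self._rows)


def iter_vectors(index, ids=None, *, start=0, stop=None, with_ids=True):
    """Yield one vector at a time instead of building a full array."""
    if ids is None:
        # Assumption: stop=None means "up to the number of items".
        end = index.get_n_items() if stop is None else stop
        ids = range(start, end)
    for i in ids:
        vec = index.get_item_vector(i)
        yield (i, vec) if with_ids else vec


index = DenseIndex([[1, 2], [3, 4], [5, 6]])
pairs = list(iter_vectors(index, start=1))
only_vectors = list(iter_vectors(index, ids=[2], with_ids=False))
```

Because only one vector is held at a time, peak memory stays roughly constant regardless of how many items are iterated.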
- partition_existing_ids(ids, *, missing_exceptions=(IndexError,))
Partition candidate ids into (existing, missing) using get_item_vector semantics.
Strict rules:
- Caller must provide a sized Sequence of ids.
- We only treat missing_exceptions as a “not found” signal.
- Any other exception is re-raised.
This avoids implicit assumptions about dense id ranges.
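The strict rules above might be sketched as follows. `partition_ids` and `SparseIndex` are hypothetical names used only for illustration, and the stand-in is assumed to raise IndexError for unknown ids; the actual method may differ in detail.

```python
class SparseIndex:
    """Hypothetical stand-in whose ids have gaps."""

    def __init__(self, items):
        self._items = dict(items)

    def get_item_vector(self, i):
        if not isinstance(i, int):
            raise TypeError("item id must be an int")
        if i not in self._items:
            raise IndexError(f"item {i} not found")
        return self._items[i]


def partition_ids(index, ids, missing_exceptions=(IndexError,)):
    """Split ids into (existing, missing) by probing get_item_vector."""
    existing, missing = [], []
    for i in ids:
        try:
            index.get_item_vector(i)
        except missing_exceptions:
            # Only the listed exception types count as "not found".
            missing.append(i)
        else:
            existing.append(i)
    # Any exception outside missing_exceptions propagates to the caller.
    return existing, missing


index = SparseIndex({0: [1.0], 3: [2.0]})
existing, missing = partition_ids(index, [0, 1, 2, 3])
```

Probing each id individually is what lets the partition work without assuming that ids run from 0 to n-1.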
- save_vectors_npy(path, ids=None, *, start=0, stop=None, dtype='float32', overwrite=True)
Save vectors into a .npy file using NumPy open_memmap.
This is the recommended path for very large indexes.
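To show why `open_memmap` suits large exports, here is a minimal sketch that fills a .npy file row by row. `save_rows_npy` is a hypothetical helper, not the mixin's method; it assumes the row count and dimension are known up front so the file can be allocated once.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap


def save_rows_npy(path, rows, n_rows, dim, dtype="float32"):
    """Write n_rows vectors of length dim to path, one row at a time."""
    # open_memmap writes the .npy header and maps the data region on disk,
    # so the full array never has to sit in RAM.
    out = open_memmap(path, mode="w+", dtype=np.dtype(dtype), shape=(n_rows, dim))
    for r, vec in enumerate(rows):
        out[r, :] = vec
    out.flush()
    del out  # release the memory map


# A generator stands in for vectors read from an index.
rows = ([float(i), float(i) + 0.5] for i in range(4))
path = os.path.join(tempfile.mkdtemp(), "vectors.npy")
save_rows_npy(path, rows, n_rows=4, dim=2)

# The result is a regular .npy file and can itself be memory-mapped on load.
loaded = np.load(path, mmap_mode="r")
```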
- to_csv(path, ids=None, *, start=0, stop=None, include_id=True, header=True, delimiter=',', float_format=None, columns=None, dtype='float32')
Stream vectors to CSV without building a full DataFrame.
For large exports this is safer than df.to_csv, because rows are written as they are read instead of being held in memory all at once.
Notes
CSV for 1B rows will be extremely large and slow. Consider Parquet in the future.
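The streaming approach described for to_csv can be approximated with the standard csv module. This is only a sketch: `stream_csv` is a hypothetical helper, the `feature_N` column names are invented, and `float_format` is assumed here to be a printf-style string such as "%.2f".

```python
import csv
import io


def stream_csv(fh, pairs, dim, *, include_id=True, header=True,
               delimiter=",", float_format=None):
    """Write (id, vector) pairs to fh one row at a time."""
    writer = csv.writer(fh, delimiter=delimiter, lineterminator="\n")
    if header:
        # Hypothetical column names; the real method accepts a columns argument.
        cols = [f"feature_{j}" for j in range(dim)]
        writer.writerow((["id"] if include_id else []) + cols)
    for item_id, vec in pairs:
        # Assumption: float_format is a printf-style string such as "%.2f".
        cells = [float_format % x for x in vec] if float_format else list(vec)
        writer.writerow(([item_id] if include_id else []) + cells)


buf = io.StringIO()
stream_csv(buf, [(0, [1.0, 2.5]), (1, [3.0, 4.0])], dim=2, float_format="%.2f")
csv_text = buf.getvalue()
```

Since each row is written as soon as it is produced, memory use does not grow with the number of rows, although the output file itself still does.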