PipelineResult

class scikitplot.corpus.PipelineResult(source, documents, output_path, n_read, n_omitted, n_embedded, elapsed_seconds, export_format)

Immutable summary of a single pipeline run.

Parameters:
source : str

Input source identifier (file path, URL, or batch label).

documents : list of CorpusDocument

All documents produced (after chunking, filtering, and optional embedding). Empty list if the source yielded no usable text.

output_path : pathlib.Path or None

Path to the exported file, or None when no export was requested (output_path=None in the pipeline call).

n_read : int

Total raw chunks yielded by the reader before filtering.

n_omitted : int

Chunks dropped by the filter.

n_embedded : int

Documents that received an embedding vector (0 when embedding is disabled).

elapsed_seconds : float

Wall-clock time for the entire run, in seconds.

export_format : ExportFormat or None

Format used for export, or None when no export was done.
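The fields above are enough to build a short run report. The following is a minimal sketch, assuming only the documented attribute names; summarize is a hypothetical helper that is not part of scikitplot, and a SimpleNamespace stands in for a real PipelineResult.

```python
from types import SimpleNamespace


def summarize(result):
    """Build a one-line summary from the documented fields (hypothetical helper)."""
    kept = len(result.documents)
    # Guard against division by zero for empty or near-instant runs.
    omit_rate = result.n_omitted / result.n_read if result.n_read else 0.0
    rate = kept / result.elapsed_seconds if result.elapsed_seconds > 0 else 0.0
    # output_path is None when no export was requested.
    exported = str(result.output_path) if result.output_path is not None else "not exported"
    return (
        f"{result.source}: kept {kept}/{result.n_read} chunks "
        f"({omit_rate:.1%} omitted), {result.n_embedded} embedded, "
        f"{rate:.1f} docs/s, {exported}"
    )


# Stand-in object with the documented attribute names (not a real PipelineResult).
demo = SimpleNamespace(
    source="report.pdf",
    documents=[object()] * 487,
    output_path=None,
    n_read=512,
    n_omitted=25,
    n_embedded=0,
    elapsed_seconds=3.14,
)
summary = summarize(demo)
```

Because the helper reads attributes only, it should work unchanged on an actual result object, provided the attribute names match this page.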


Notes

n_read - n_omitted == len(documents) is an invariant maintained by the pipeline.
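The invariant can be verified on any result after a run. A minimal sketch, assuming only the attribute names documented here; check_invariant is a hypothetical helper, and the SimpleNamespace objects are stand-ins rather than real PipelineResult instances.

```python
from types import SimpleNamespace


def check_invariant(result):
    """Return True when n_read - n_omitted == len(documents)."""
    return result.n_read - result.n_omitted == len(result.documents)


# Stand-in consistent with the invariant: 512 read, 25 omitted, 487 kept.
consistent = SimpleNamespace(n_read=512, n_omitted=25, documents=[object()] * 487)
# Stand-in that violates it: 10 read, 1 omitted, but no documents kept.
inconsistent = SimpleNamespace(n_read=10, n_omitted=1, documents=[])

holds = check_invariant(consistent)
violated = not check_invariant(inconsistent)
```

A check like this is mainly useful in tests or when results are constructed by hand instead of by the pipeline.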

Examples

>>> result.n_read
512
>>> result.elapsed_seconds
3.14
>>> len(result.documents)
487
documents: list[CorpusDocument]
elapsed_seconds: float
export_format: ExportFormat | None
property n_documents: int

Number of documents in the result.

n_embedded: int
n_omitted: int
n_read: int
output_path: Path | None
source: str
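This page describes the class as immutable and documents n_documents only as the number of documents in the result. The sketch below shows one plausible way such a type could behave, assuming a frozen dataclass and assuming n_documents equals len(documents); ResultSketch is a hypothetical stand-in, not the scikitplot implementation.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class ResultSketch:
    """Hypothetical stand-in with a single field; not the scikitplot class."""

    documents: list = field(default_factory=list)

    @property
    def n_documents(self) -> int:
        # Assumption: the property is simply the length of the documents list.
        return len(self.documents)


sketch = ResultSketch(documents=["a", "b", "c"])
count = sketch.n_documents

# A frozen dataclass rejects attribute reassignment after construction.
try:
    sketch.documents = []
    reassigned = True
except FrozenInstanceError:
    reassigned = False
```

Note that freezing a dataclass only blocks rebinding its attributes; the list held in documents can still be mutated in place, so callers who need a stable snapshot may want to copy it.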