BuildResult

class scikitplot.corpus.BuildResult(documents=<factory>, n_sources=0, n_raw=0, n_filtered=0, n_normalised=0, n_enriched=0, n_embedded=0, index=None, errors=<factory>)

Result of a corpus build operation.
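The `<factory>` defaults in the signature are how Python renders dataclass fields declared with `default_factory`. The sketch below assumes BuildResult is a dataclass (this page does not state it outright) and uses a hypothetical stand-in, `BuildResultSketch`, with only a few of the fields, to show that each instance gets its own fresh list:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class BuildResultSketch:
    """Hypothetical stand-in, not the scikitplot class."""

    # default_factory=list is what shows up as <factory> in the signature.
    documents: list = field(default_factory=list)
    n_sources: int = 0
    index: Any = None
    errors: list = field(default_factory=list)


a = BuildResultSketch()
b = BuildResultSketch()
a.documents.append("doc")

# Each instance owns a separate list, so b is unaffected by the append.
print(len(a.documents), len(b.documents))
```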

Parameters:
documents : list[CorpusDocument]

The processed documents.

n_sources : int

Number of source files/URLs processed.

n_raw : int

Total raw chunks before filtering.

n_filtered : int

Chunks removed by filtering.

n_normalised : int

Chunks that were text-normalised.

n_enriched : int

Chunks that were NLP-enriched.

n_embedded : int

Chunks that were embedded.

index : SimilarityIndex or None

Built similarity index (if build_index=True).

errors : list[tuple[str, Exception]]

(source_path, exception) pairs for failed sources.
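As a minimal sketch of how the `errors` pairs might be reported, the example below uses stand-in data shaped like the documented `(source_path, exception)` tuples. The paths, exception types, and the `format_errors` helper are hypothetical and not part of scikitplot:

```python
# Stand-in data shaped like BuildResult.errors; paths and messages are made up.
errors = [
    ("data/a.pdf", ValueError("unsupported encoding")),
    ("data/b.html", TimeoutError("fetch timed out")),
]


def format_errors(errors):
    """Turn (source_path, exception) pairs into one readable line each."""
    return [f"{path}: {type(exc).__name__}: {exc}" for path, exc in errors]


for line in format_errors(errors):
    print(line)
```

With a real result, the same loop would run over `result.errors`.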

Notes

User note: Access documents directly:

result = builder.build("./data/")
for doc in result.documents:
    print(doc.text[:80])

documents: list[Any]
errors: list[tuple[str, Exception]]
index: Any = None
property n_documents: int

Number of CorpusDocument instances in documents.

Returns:
int
n_embedded: int = 0
n_enriched: int = 0
n_filtered: int = 0
n_normalised: int = 0
n_raw: int = 0
n_sources: int = 0
property success_rate: float

Fraction of ingested sources that completed without error.

Returns:
float

Computed as (n_sources - len(errors)) / n_sources, a value in [0.0, 1.0]. Returns 1.0 when no sources were processed (n_sources is 0).
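The documented formula can be sketched as a standalone function. This is a hypothetical stand-in that mirrors the description above, including the zero-source case, and is not the library's own implementation:

```python
def success_rate(n_sources: int, n_errors: int) -> float:
    """Hypothetical stand-in for the documented success_rate formula."""
    # No sources processed: the documentation says the rate is 1.0.
    if n_sources == 0:
        return 1.0
    return (n_sources - n_errors) / n_sources


print(success_rate(4, 1))  # 0.75
print(success_rate(0, 0))  # 1.0
```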

summary()

Return a multi-line human-readable build summary.

Returns:
str

Multi-line string reporting the source, document, normalisation, enrichment, and embedding counts, plus any errors.