BuildResult
- class scikitplot.corpus.BuildResult(documents=<factory>, n_sources=0, n_raw=0, n_filtered=0, n_normalised=0, n_enriched=0, n_embedded=0, index=None, errors=<factory>)
Result of a corpus build operation.
- Parameters:
- documents : list[CorpusDocument]
The processed documents.
- n_sources : int
Number of source files/URLs processed.
- n_raw : int
Total raw chunks before filtering.
- n_filtered : int
Chunks removed by filtering.
- n_normalised : int
Chunks that were text-normalised.
- n_enriched : int
Chunks that were NLP-enriched.
- n_embedded : int
Chunks that were embedded.
- index : SimilarityIndex or None
Built similarity index if build_index=True; otherwise None.
- errors : list[tuple[str, Exception]]
(source_path, exception) pairs for sources that failed.
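The `<factory>` defaults in the signature are how Python's `dataclasses` module displays a `field(default_factory=...)` default. The following is a minimal sketch, assuming `BuildResult` is a plain dataclass with exactly the fields listed above; `BuildResultSketch` is a hypothetical stand-in, and `CorpusDocument` and `SimilarityIndex` are typed loosely as `Any` because they are not defined in this section. It shows what a freshly constructed, empty result looks like and why a factory default matters.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class BuildResultSketch:
    """Hypothetical stand-in mirroring the documented BuildResult fields."""

    documents: List[Any] = field(default_factory=list)  # CorpusDocument objects
    n_sources: int = 0
    n_raw: int = 0
    n_filtered: int = 0
    n_normalised: int = 0
    n_enriched: int = 0
    n_embedded: int = 0
    index: Optional[Any] = None  # SimilarityIndex when build_index=True
    errors: List[Tuple[str, Exception]] = field(default_factory=list)


a = BuildResultSketch()
b = BuildResultSketch()
a.errors.append(("./data/missing.txt", FileNotFoundError("missing.txt")))

# default_factory gives each instance its own list, so b is unaffected.
print(len(a.errors), len(b.errors))  # 1 0
print(a.index is None, a.n_raw)      # True 0
```

A shared mutable default (`errors=[]`) would leak failures from one build into every other result; the factory avoids that.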
Notes
User note: Access documents directly:

    result = builder.build("./data/")
    for doc in result.documents:
        print(doc.text[:80])
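Beyond `documents`, the counters and `errors` can be read the same way after a build. The sketch below is illustrative only: `summarise` is a hypothetical helper, not part of scikitplot, and a `types.SimpleNamespace` with the documented attribute names stands in for a real `BuildResult` (plain strings replace `CorpusDocument` objects), so it runs without the library installed.

```python
from types import SimpleNamespace


def summarise(result) -> str:
    """Hypothetical helper: one-line summary of a BuildResult-like object."""
    # errors holds (source_path, exception) pairs for failed sources.
    failed = ", ".join(
        f"{path} ({type(exc).__name__})" for path, exc in result.errors
    )
    return (
        f"{len(result.documents)} documents from {result.n_sources} sources; "
        f"{result.n_filtered} of {result.n_raw} raw chunks filtered; "
        f"index built: {result.index is not None}; "
        f"failed: {failed or 'none'}"
    )


# Stand-in using the documented attribute names and made-up values.
fake = SimpleNamespace(
    documents=["doc a", "doc b"],
    n_sources=3,
    n_raw=5,
    n_filtered=3,
    index=None,
    errors=[("./data/broken.pdf", ValueError("unreadable"))],
)
print(summarise(fake))
```

Checking `errors` after every build is worthwhile because the documented behaviour records failed sources as pairs rather than raising.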