AnyDownloader#
- class scikitplot.corpus.AnyDownloader(input_url, output_path=None, timeout=30.0, max_bytes=104857600, verify_ssl=True, block_private_ips=True, max_redirects=5, user_agent='Mozilla/5.0 (compatible; scikitplot-corpus/1.0; +https://github.com/scikit-plots/scikit-plots)', youtube_mode='transcript', youtube_language='en', youtube_include_auto=True, github_token=None, headers=None, max_retries=3, retry_backoff=1.0)[source]#
Auto-dispatching downloader with multi-URL and per-parameter list support.
Accepts one URL or a list of URLs. Every parameter accepts T | list[T] | None:
- None → use the parameter’s built-in default for every URL.
- Scalar → broadcast to every URL.
- list → applied element-wise; must be the same length as input_url.
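The broadcasting rules above can be sketched as a standalone helper. Note that `broadcast_param` is a hypothetical name for illustration, not part of the public API:

```python
from typing import TypeVar, Union

T = TypeVar("T")


def broadcast_param(
    value: Union[T, list[T], None], n_urls: int, default: T
) -> list[T]:
    """Expand a T | list[T] | None parameter to one value per URL (sketch)."""
    if value is None:
        # None: fall back to the built-in default for every URL.
        return [default] * n_urls
    if isinstance(value, list):
        # list: element-wise, so the length must match input_url.
        if len(value) != n_urls:
            raise ValueError(f"expected {n_urls} values, got {len(value)}")
        return value
    # Scalar: broadcast the same value to every URL.
    return [value] * n_urls
```

For example, `timeout=[120.0, 30.0]` with two URLs passes through element-wise, while `timeout=60.0` expands to `[60.0, 60.0]`.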
- Parameters:
- input_url : str or list[str]
One URL or a list of URLs. When a list is supplied, download returns list[DownloadResult].
- output_path : pathlib.Path or None, optional
Directory shared across all URLs. Default: None (temp dir).
- timeout : float or list[float] or None, optional
HTTP timeout in seconds. Default: 30.0.
- max_bytes : int or list[int] or None, optional
Download size cap in bytes. Default: 100 MB (104857600).
- verify_ssl : bool or list[bool] or None, optional
Verify TLS certificates. Default: True.
- block_private_ips : bool or list[bool] or None, optional
SSRF prevention. Default: True.
- max_redirects : int or list[int] or None, optional
Maximum HTTP redirects. Default: 5.
- user_agent : str or list[str] or None, optional
User-Agent header value. Default: scikitplot UA string.
- youtube_mode : str or list[str] or None, optional
Mode for YouTubeDownloader: "transcript", "audio", or "video". Default: "transcript".
- youtube_language : str or list[str] or None, optional
BCP-47 language code for transcript fetching. Default: "en".
- youtube_include_auto : bool or list[bool] or None, optional
Include auto-generated captions as fallback. Default: True.
- github_token : str or list[str or None] or None, optional
PAT for GitHubDownloader (private repos). Per-URL None allowed. Never logged. Default: None.
- headers : dict or list[dict or None] or None, optional
Extra HTTP headers for WebDownloader. Per-URL None allowed. Default: None.
- max_retries : int or list[int] or None, optional
Retry attempts for WebDownloader. Default: 3.
- retry_backoff : float or list[float] or None, optional
Exponential back-off base for WebDownloader. Default: 1.0.
Notes
Single vs batch:
# Single URL: returns DownloadResult
result = AnyDownloader("https://example.com/paper.pdf").download()

# Batch: returns list[DownloadResult]
results = AnyDownloader(
    [
        "https://example.com/paper.pdf",
        "https://github.com/org/repo/blob/main/data.csv",
        "https://www.youtube.com/watch?v=abc123",
    ]
).download()
Per-URL parameters:
dl = AnyDownloader(
    input_url=[
        "https://github.com/org/priv/blob/main/secret.csv",
        "https://example.com/public.pdf",
    ],
    github_token=["ghp_token", None],  # None = public, no token needed
    timeout=[120.0, 30.0],             # per-URL timeouts
    max_bytes=200 * 1024 * 1024,       # broadcast to all
)
results = dl.download()
Examples
Single URL:
>>> dl = AnyDownloader("https://example.com/report.pdf")
>>> isinstance(dl.download(), DownloadResult)
True
Batch:
>>> dl = AnyDownloader(
...     ["https://example.com/a.pdf", "https://example.com/b.pdf"],
...     timeout=60.0,
... )
>>> len(dl.download())
2
- cleanup()[source]#
Remove the temporary directory owned by this instance, if any.
Safe to call multiple times. If output_path was supplied at construction time (caller-owned), this method is a no-op.
- Return type:
None
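The ownership contract above (own the temp dir only when no output_path was supplied, and make cleanup idempotent) can be sketched in isolation. `TempDirOwner` is a hypothetical stand-in for illustration, not the library's class:

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class TempDirOwner:
    """Minimal sketch of the cleanup contract described above."""

    def __init__(self, output_path: Optional[Path] = None) -> None:
        if output_path is None:
            # No caller-supplied directory: this instance owns a temp dir.
            self._tmpdir: Optional[Path] = Path(tempfile.mkdtemp())
            self.output_path = self._tmpdir
        else:
            # Caller-owned directory: cleanup() must be a no-op.
            self._tmpdir = None
            self.output_path = output_path

    def cleanup(self) -> None:
        # Idempotent: a second call finds nothing left to delete.
        if self._tmpdir is not None and self._tmpdir.exists():
            shutil.rmtree(self._tmpdir)
```

Calling `cleanup()` twice is safe, and a caller-owned directory is never touched.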
- download()[source]#
Download one URL or all URLs and return the result(s).
- Returns:
- DownloadResult
When input_url was a single str.
- list[DownloadResult]
When input_url was a list[str]. Preserves input order.
Notes
Batch downloads are sequential. For parallel execution, call download_single per URL in your own thread/process pool.
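A minimal sketch of the thread-pool pattern this note suggests. The `fetch` function below is a stand-in so the example stays self-contained and offline; in real use its body would invoke the per-URL download call:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str) -> str:
    # Stand-in for a per-URL download (e.g. the download_single call
    # mentioned above); replace the body with the real invocation.
    return f"downloaded:{url}"


urls = [
    "https://example.com/a.pdf",
    "https://example.com/b.pdf",
    "https://example.com/c.pdf",
]

# executor.map preserves input order, matching the batch contract.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, urls))
```

Using `pool.map` (rather than collecting `as_completed` futures) keeps results aligned with the input URL order.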
- download_all()[source]#
Download all URLs and always return list[DownloadResult].
Normalises the return type so callers never need to branch on isinstance(result, list).
- Returns:
- list[DownloadResult]
One DownloadResult per URL, in input order.
Examples
>>> dl = AnyDownloader("https://example.com/doc.pdf")
>>> results = dl.download_all()
>>> len(results)
1