download_url
scikitplot.corpus.download_url(url, *, output_path=None, max_bytes=524288000, timeout=120, max_redirects=10, max_retries=3, retry_backoff=1.0, skip_ssrf_check=False)
Download a URL to a local file.
- Parameters:
- url : str
URL to download. Must be http:// or https://.
- output_path : str, Path, or None, optional
Directory to write the downloaded file into. If None, tempfile.gettempdir() is used. Default: None.
- max_bytes : int, optional
Maximum download size in bytes. Default: 524288000 (500 MiB).
- timeout : int, optional
HTTP timeout in seconds. Default: 120.
- max_redirects : int, optional
Maximum number of HTTP redirects to follow. Default: 10.
- max_retries : int, optional
Maximum retry attempts for transient HTTP errors (429, 500, 502, 503, 504). Each attempt waits retry_backoff * 2 ** attempt seconds before retrying. Set to 0 to disable retries. Default: 3.
- retry_backoff : float, optional
Base delay in seconds for exponential back-off. The actual wait before attempt n (0-indexed) is retry_backoff * 2 ** n seconds; see the sketch after this parameter list. Default: 1.0.
- skip_ssrf_check : bool, optional
Skip the SSRF prevention check. Use only for trusted internal URLs. Default: False.
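The back-off schedule follows directly from retry_backoff and max_retries. The snippet below is illustrative only: it reproduces the documented formula retry_backoff * 2 ** attempt in plain Python and is not the library's internal retry loop.

>>> retry_backoff, max_retries = 1.0, 3
>>> [retry_backoff * 2 ** attempt for attempt in range(max_retries)]
[1.0, 2.0, 4.0]
>>> [0.5 * 2 ** attempt for attempt in range(5)]  # retry_backoff=0.5, max_retries=5
[0.5, 1.0, 2.0, 4.0, 8.0]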
- Returns:
- pathlib.Path
Path to the downloaded file. The caller is responsible for cleanup.
- Raises:
- ValueError
If the URL is invalid, targets a private IP (SSRF), or the response exceeds max_bytes.
- urllib.error.URLError
If the download fails due to a network error and all retries are exhausted.
- TimeoutError
If the download does not complete within timeout seconds.
Notes
Security: The URL is validated against private IP ranges before connecting. This prevents SSRF attacks where an attacker’s URL redirects to an internal service.
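The SSRF check itself is internal to the function, but its intent can be illustrated. The sketch below is not the library's implementation; resolves_to_private is a hypothetical helper that resolves the hostname and tests the resulting addresses against Python's ipaddress private ranges.

>>> import ipaddress, socket
>>> from urllib.parse import urlparse
>>> def resolves_to_private(url):
...     host = urlparse(url).hostname
...     addrs = {info[4][0] for info in socket.getaddrinfo(host, None)}  # resolve hostname
...     return any(ipaddress.ip_address(addr).is_private for addr in addrs)
>>> resolves_to_private("http://127.0.0.1/internal")
True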
Deterministic filenames: The downloaded file uses a SHA-256 prefix of the URL as the filename stem, so repeated downloads of the same URL produce the same filename.
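Only the SHA-256 derivation is documented; the exact prefix length is an implementation detail, so the 16-character prefix below is a hypothetical choice used to show why the stem is deterministic.

>>> import hashlib
>>> stem = hashlib.sha256("https://example.com/report.pdf".encode()).hexdigest()[:16]
>>> len(stem)
16
>>> stem == hashlib.sha256("https://example.com/report.pdf".encode()).hexdigest()[:16]
True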
Retry policy: Only transient server-side errors trigger a retry (HTTP 429, 500, 502, 503, 504). Client errors (4xx except 429) and ValueError (SSRF, size exceeded) are not retried.
Examples
>>> path = download_url("https://example.com/report.pdf")
>>> path.suffix
'.pdf'
>>> path.exists()
True
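All documented exceptions can be caught together when a missing download is acceptable. The helper below is a user-side sketch (fetch_or_none is not part of scikitplot); it relies only on the signature and exception types documented on this page and needs network access to do anything useful.

>>> from urllib.error import URLError
>>> def fetch_or_none(url):
...     """Return the downloaded Path, or None if any documented failure occurs."""
...     try:
...         return download_url(url, max_bytes=10 * 1024 * 1024, timeout=30)
...     except (ValueError, URLError, TimeoutError):
...         return None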