download_url
scikitplot.corpus.download_url(url, *, output_path=None, max_bytes=524288000, timeout=120, max_redirects=10, max_retries=3, retry_backoff=1.0, skip_ssrf_check=False)
Download a URL to a local file.
- Parameters:
- url : str
URL to download. Must be http:// or https://.
- output_path : str, Path, or None, optional
Directory to write the downloaded file into. If None, tempfile.gettempdir() is used. Default: None.
- max_bytes : int, optional
Maximum download size in bytes. Default: 524288000 (500 MiB).
- timeout : int, optional
HTTP timeout in seconds. Default: 120.
- max_redirects : int, optional
Maximum number of HTTP redirects to follow. Default: 10.
- max_retries : int, optional
Maximum retry attempts for transient HTTP errors (429, 500, 502, 503, 504). Each attempt waits retry_backoff * 2 ** attempt seconds before retrying. Set to 0 to disable retries. Default: 3.
- retry_backoff : float, optional
Base delay in seconds for exponential back-off. The actual wait before attempt n (0-indexed) is retry_backoff * 2 ** n seconds; see the sketch after this parameter list. Default: 1.0.
- skip_ssrf_check : bool, optional
Skip the SSRF prevention check. Use only for trusted internal URLs. Default: False.
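The back-off schedule follows directly from retry_backoff and max_retries. The snippet below is illustrative only: it reproduces the documented formula retry_backoff * 2 ** attempt in plain Python and is not the library's internal retry loop.

>>> retry_backoff, max_retries = 1.0, 3
>>> [retry_backoff * 2 ** attempt for attempt in range(max_retries)]
[1.0, 2.0, 4.0]
>>> [0.5 * 2 ** attempt for attempt in range(5)]  # retry_backoff=0.5, max_retries=5
[0.5, 1.0, 2.0, 4.0, 8.0]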
- Returns:
- pathlib.Path
Path to the downloaded file. The caller is responsible for cleanup.
- Raises:
- ValueError
If the URL is invalid, targets a private IP (SSRF), or the response exceeds max_bytes.
- urllib.error.URLError
If the download fails due to a network error and all retries are exhausted.
- TimeoutError
If the download does not complete within timeout seconds.
Notes
Security: The URL is validated against private IP ranges before connecting. This prevents SSRF attacks where an attacker’s URL redirects to an internal service.
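The SSRF check itself is internal to the function, but its intent can be illustrated. The sketch below is not the library's implementation; resolves_to_private is a hypothetical helper that resolves the hostname and tests the resulting addresses against Python's ipaddress private ranges.

>>> import ipaddress, socket
>>> from urllib.parse import urlparse
>>> def resolves_to_private(url):
...     host = urlparse(url).hostname
...     addrs = {info[4][0] for info in socket.getaddrinfo(host, None)}  # resolve hostname
...     return any(ipaddress.ip_address(addr).is_private for addr in addrs)
>>> resolves_to_private("http://127.0.0.1/internal")
True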
Deterministic filenames: The downloaded file uses a SHA-256 prefix of the URL as the filename stem, so repeated downloads of the same URL produce the same filename.
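Only the SHA-256 derivation is documented; the exact prefix length is an implementation detail, so the 16-character prefix below is a hypothetical choice used to show why the stem is deterministic.

>>> import hashlib
>>> stem = hashlib.sha256("https://example.com/report.pdf".encode()).hexdigest()[:16]
>>> len(stem)
16
>>> stem == hashlib.sha256("https://example.com/report.pdf".encode()).hexdigest()[:16]
True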
Retry policy: Only transient server-side errors trigger a retry (HTTP 429, 500, 502, 503, 504). Client errors (4xx except 429) and ValueError (SSRF, size exceeded) are not retried.
Examples
>>> path = download_url("https://example.com/report.pdf")
>>> path.suffix
'.pdf'
>>> path.exists()
True
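All documented exceptions can be caught together when a missing download is acceptable. The helper below is a user-side sketch (fetch_or_none is not part of scikitplot); it relies only on the signature and exception types documented on this page and needs network access to do anything useful.

>>> from urllib.error import URLError
>>> def fetch_or_none(url):
...     """Return the downloaded Path, or None if any documented failure occurs."""
...     try:
...         return download_url(url, max_bytes=10 * 1024 * 1024, timeout=30)
...     except (ValueError, URLError, TimeoutError):
...         return None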