download_url

scikitplot.corpus.download_url(url, *, output_path=None, max_bytes=524288000, timeout=120, max_redirects=10, max_retries=3, retry_backoff=1.0, skip_ssrf_check=False)

Download a URL to a local file.

Parameters:
url : str

URL to download. Must be http:// or https://.

output_path : str, Path, or None, optional

Directory to write the downloaded file into. If None, tempfile.gettempdir() is used. Default: None.

max_bytes : int, optional

Maximum download size in bytes. Default: 524288000 (500 MiB).

timeout : int, optional

HTTP timeout in seconds. Default: 120.

max_redirects : int, optional

Maximum number of HTTP redirects to follow. Default: 10.

max_retries : int, optional

Maximum retry attempts for transient HTTP errors (429, 500, 502, 503, 504). Each attempt waits retry_backoff * 2 ** attempt seconds before retrying. Set to 0 to disable retries. Default: 3.

retry_backoff : float, optional

Base delay in seconds for exponential back-off. The actual wait before attempt n (0-indexed) is retry_backoff * 2 ** n seconds. Default: 1.0.

skip_ssrf_check : bool, optional

Skip the SSRF prevention check. Use only for trusted internal URLs. Default: False.

Returns:
pathlib.Path

Path to the downloaded file. The caller is responsible for cleanup.

Raises:
ValueError

If the URL is invalid, targets a private IP (SSRF), or the response exceeds max_bytes.

urllib.error.URLError

If the download fails due to a network error and all retries are exhausted.

TimeoutError

If the download exceeds timeout seconds.


Notes

Security: The URL is validated against private IP ranges before connecting. This prevents SSRF attacks where an attacker’s URL redirects to an internal service.
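A minimal sketch of how such a private-IP guard can work, assuming a hypothetical helper named `is_private_target` (not the library's actual implementation): resolve the hostname and reject any address in a private, loopback, or link-local range.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_private_target(url: str) -> bool:
    """Hypothetical SSRF guard: return True if the URL's host resolves
    to a private, loopback, or link-local address (or cannot be parsed)."""
    host = urlparse(url).hostname
    if host is None:
        return True  # unparseable URL: treat as unsafe
    for info in socket.getaddrinfo(host, None):
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return True
    return False
```

Note that a robust implementation must also re-validate after each redirect, since an attacker-controlled server can answer with a redirect to an internal address.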

Deterministic filenames: The downloaded file uses a SHA-256 prefix of the URL as the filename stem, so repeated downloads of the same URL produce the same filename.
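The naming scheme described above can be sketched as follows; `deterministic_name` and the 16-character digest prefix are illustrative assumptions, not the library's exact code. The original URL suffix is kept so that `path.suffix` still reflects the file type.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse

def deterministic_name(url: str, digest_len: int = 16) -> str:
    """Hypothetical sketch: derive a filename from a SHA-256 prefix of
    the URL, preserving the URL path's suffix (e.g. '.pdf')."""
    stem = hashlib.sha256(url.encode("utf-8")).hexdigest()[:digest_len]
    suffix = PurePosixPath(urlparse(url).path).suffix
    return stem + suffix
```

Because the stem depends only on the URL, downloading the same URL twice yields the same local filename, which makes caching and cleanup predictable.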

Retry policy: Only transient server-side errors trigger a retry (HTTP 429, 500, 502, 503, 504). Client errors (4xx except 429) and ValueError (SSRF, size exceeded) are not retried.
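The retry policy above can be sketched as a small wrapper; `fetch_with_retries` is a hypothetical helper written for illustration, assuming the documented back-off formula `retry_backoff * 2 ** attempt`.

```python
import time
import urllib.error

# Transient status codes that warrant a retry, per the docs above.
TRANSIENT = {429, 500, 502, 503, 504}

def fetch_with_retries(fetch, max_retries=3, retry_backoff=1.0):
    """Hypothetical sketch of the documented retry policy: retry only
    transient HTTP errors with exponential back-off; re-raise all other
    errors (including non-429 4xx and ValueError) immediately."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            if err.code not in TRANSIENT or attempt == max_retries:
                raise
            time.sleep(retry_backoff * 2 ** attempt)
```

With the defaults, a request that keeps failing waits 1, 2, and 4 seconds before its three retries, then re-raises the final error.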

Examples

>>> from scikitplot.corpus import download_url
>>> path = download_url("https://example.com/report.pdf")
>>> path.suffix
'.pdf'
>>> path.exists()
True