CustomDownloader

class scikitplot.corpus.CustomDownloader(input_url, output_path=None, timeout=30.0, max_bytes=104857600, verify_ssl=True, block_private_ips=True, max_redirects=5, user_agent='Mozilla/5.0 (compatible; scikitplot-corpus/1.0; +https://github.com/scikit-plots/scikit-plots)', handler=<object object>, handler_kwargs=None)

Wraps a user-supplied callable as a BaseDownloader.

Parameters:
input_url : str

HTTP/HTTPS URL passed through to handler.

handler : callable

Callable with the signature handler(input_url: str, output_path: Path, **kwargs) -> Path. It must write the content under output_path and return the path of the written file. Required: a TypeError is raised at construction if it is not supplied.

handler_kwargs : dict or None, optional

Extra keyword arguments forwarded to handler. Default: None.

output_path : pathlib.Path or None, optional

Directory for the downloaded file. Default: None, in which case a temporary directory is used.

timeout : float, optional

Forwarded to handler via handler_kwargs unless handler_kwargs already contains a timeout entry. Default: 30.0.

max_bytes : int, optional

Forwarded to handler via handler_kwargs unless handler_kwargs already contains a max_bytes entry. Default: 104857600 (100 MiB).

verify_ssl : bool, optional

Verify TLS certificates. Default: True.

block_private_ips : bool, optional

Block requests to private IP addresses (SSRF prevention). Default: True.
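To illustrate the kind of check that block_private_ips refers to, here is a minimal sketch of an SSRF guard built only on the standard library. The function name assert_public_url and its exact rules are assumptions made for this example; the library's own check may differ.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def assert_public_url(url: str) -> None:
    """Raise ValueError unless url is http(s) and resolves only to public addresses.

    Hypothetical helper for illustration; not part of scikitplot.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")

    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve host {parts.hostname!r}") from exc

    for info in infos:
        # Drop any IPv6 scope id ("fe80::1%eth0") before parsing the address.
        raw = info[4][0].split("%", 1)[0]
        addr = ipaddress.ip_address(raw)
        # is_global is False for loopback, private, link-local and similar ranges.
        if not addr.is_global:
            raise ValueError(f"{parts.hostname!r} resolves to non-public address {addr}")
```

A check like this resolves the host name once, so a real implementation would also need to make sure the connection uses the address that was validated.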

Raises:
TypeError

If handler is not supplied or not callable.
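The handler contract described above can be sketched as follows. This is one possible handler, assuming that timeout and max_bytes reach it as keyword arguments (as the parameter descriptions suggest); the name streaming_handler and the file name download.bin are hypothetical.

```python
from pathlib import Path
from urllib.request import urlopen


def streaming_handler(input_url: str, output_path: Path, **kwargs) -> Path:
    """Stream input_url into the directory output_path and return the file path."""
    # Assumption: these keys are forwarded by the downloader; fall back otherwise.
    timeout = kwargs.get("timeout", 30.0)
    max_bytes = kwargs.get("max_bytes", 100 * 1024 * 1024)

    output_path.mkdir(parents=True, exist_ok=True)
    target = output_path / "download.bin"  # hypothetical file name

    written = 0
    try:
        with urlopen(input_url, timeout=timeout) as response, target.open("wb") as fh:
            while True:
                chunk = response.read(64 * 1024)
                if not chunk:
                    break
                written += len(chunk)
                if written > max_bytes:
                    raise ValueError(f"response exceeds max_bytes={max_bytes}")
                fh.write(chunk)
    except Exception:
        # Do not leave a partial file behind on failure.
        target.unlink(missing_ok=True)
        raise
    return target
```

Such a function would be passed as handler=streaming_handler. Reading in chunks keeps memory use bounded and lets the size limit stop the transfer early.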

Parameters:
  • input_url (str)

  • output_path (Path | None)

  • timeout (float)

  • max_bytes (int)

  • verify_ssl (bool)

  • block_private_ips (bool)

  • max_redirects (int)

  • user_agent (str)

  • handler (object)

  • handler_kwargs (dict | None)

Examples

>>> from pathlib import Path
>>> from scikitplot.corpus import CustomDownloader
>>> def my_handler(input_url: str, output_path: Path, **kwargs) -> Path:
...     # Write the content into the directory given by output_path.
...     out = output_path / "file.txt"
...     out.write_text("content")
...     return out
>>> dl = CustomDownloader("https://example.com/f", handler=my_handler)
>>> result = dl.download()
block_private_ips: bool = True
cleanup()

Remove the temporary directory owned by this instance, if any.

Safe to call multiple times. If output_path was supplied at construction time (caller-owned), this method is a no-op.

Return type:

None
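The ownership rule behind cleanup() can be illustrated with a small stand-in class. TempDirOwner is hypothetical and is not the library's implementation; it only mirrors the documented behavior: an instance-owned temporary directory is removed, a caller-supplied directory is left alone, and repeated calls are harmless.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class TempDirOwner:
    """Hypothetical stand-in that mirrors the documented cleanup() semantics."""

    def __init__(self, output_path: Optional[Path] = None) -> None:
        self._owned_tmp: Optional[Path] = None
        if output_path is None:
            # No directory supplied: create one and remember that we own it.
            self._owned_tmp = Path(tempfile.mkdtemp(prefix="corpus-dl-"))
            output_path = self._owned_tmp
        self.output_path = output_path

    def cleanup(self) -> None:
        # Only remove a directory this instance created; never a caller-owned one.
        if self._owned_tmp is not None:
            shutil.rmtree(self._owned_tmp, ignore_errors=True)
            self._owned_tmp = None  # makes a second call a no-op
```

With the real class, calling cleanup() in a finally block after download() would follow the same pattern, once the downloaded file has been read or moved.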

download()

Invoke the user-supplied handler and return a DownloadResult.

Returns:
DownloadResult

Populated from the path returned by handler.

Raises:
ValueError

If SSRF check fails (block_private_ips=True).

TypeError

If handler returns something that cannot be coerced to Path.

FileNotFoundError

If the returned path does not exist.

Return type:

DownloadResult
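The TypeError and FileNotFoundError cases listed above amount to validating what the handler returned. A minimal sketch of such a validation step, assuming a hypothetical helper named coerce_handler_result (the library's internal logic is not shown in this page and may differ):

```python
import os
from pathlib import Path


def coerce_handler_result(value: object) -> Path:
    """Return value as a Path, mirroring the documented error cases.

    Hypothetical helper for illustration; not part of scikitplot.
    """
    # Accept str and os.PathLike objects (which includes pathlib.Path).
    if not isinstance(value, (str, os.PathLike)):
        raise TypeError(
            f"handler must return a path-like value, got {type(value).__name__}"
        )
    path = Path(value)
    if not path.exists():
        raise FileNotFoundError(f"handler returned a path that does not exist: {path}")
    return path
```

A handler that forgets its return statement yields None and would fail the first check, while a handler that returns a path without writing the file would fail the second.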

handler: object = <object object>
handler_kwargs: dict | None = None
input_url: str
max_bytes: int = 104857600
max_redirects: int = 5
output_path: Path | None = None
timeout: float = 30.0
user_agent: str = 'Mozilla/5.0 (compatible; scikitplot-corpus/1.0; +https://github.com/scikit-plots/scikit-plots)'
verify_ssl: bool = True