GoogleDriveDownloader

class scikitplot.corpus.GoogleDriveDownloader(input_url, output_path=None, timeout=30.0, max_bytes=104857600, verify_ssl=True, block_private_ips=True, max_redirects=5, user_agent='Mozilla/5.0 (compatible; scikitplot-corpus/1.0; +https://github.com/scikit-plots/scikit-plots)')

Google Drive share-link downloader.

Resolves any public Google Drive share URL to a direct download URL and streams the file to a local path. Handles the large-file virus-warning interstitial automatically.

Parameters:
input_url : str

Any supported Google Drive share URL. See the module docstring for the accepted forms.

output_path : pathlib.Path or None, optional

Directory in which to save the downloaded file. Default: None, in which case the instance creates and owns a temporary directory (see cleanup()).

timeout : float, optional

HTTP timeout in seconds. Default: 30.0.

max_bytes : int, optional

Download size cap in bytes. Default: 104857600 (100 MiB).

verify_ssl : bool, optional

Verify TLS certificates. Default: True.

block_private_ips : bool, optional

Block requests to private IP addresses as a safeguard against server-side request forgery (SSRF). Default: True.

max_redirects : int, optional

Maximum number of HTTP redirects to follow. Default: 5.

user_agent : str, optional

User-Agent header sent with HTTP requests. Default: the scikitplot-corpus string shown in the signature.

Raises:
ValueError

If the file ID cannot be extracted from input_url at construction.

Notes

Large-file bypass: When Google Drive serves a virus-scan warning page instead of the file, this class inspects the response body for the confirm= token, rebuilds the download URL with the token, and re-downloads transparently.
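
The bypass described above can be sketched roughly as follows. This is a minimal illustration, not the class's actual implementation: the helper names extract_confirm_token and add_confirm_token, and the regular expression used to find the token, are assumptions made for this example.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

# Assumed pattern: the token appears as confirm=<token> in a link or form action.
_CONFIRM_RE = re.compile(r"confirm=([0-9A-Za-z_\-]+)")


def extract_confirm_token(html: str) -> Optional[str]:
    """Return the first confirm= token found in the page body, or None."""
    match = _CONFIRM_RE.search(html)
    return match.group(1) if match else None


def add_confirm_token(download_url: str, token: str) -> str:
    """Rebuild the download URL with the confirm token added to its query."""
    parts = urlparse(download_url)
    query = parse_qs(parts.query)
    query["confirm"] = [token]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))
```

If no token is found, the first response is presumably the file itself and no second request is needed.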

Private files: Only publicly shared files are supported. Requests for files that require Google account authentication are rejected by Google’s servers with 403 Forbidden. OAuth2 support is a planned future extension.

Examples

>>> from scikitplot.corpus import GoogleDriveDownloader
>>> dl = GoogleDriveDownloader(
...     "https://drive.google.com/file/d/1abc-DEF_xyz/view?usp=sharing"
... )
>>> result = dl.download()
>>> result.suffix  # determined from Content-Disposition / Content-Type
'.pdf'

Already-direct URL form:

>>> dl = GoogleDriveDownloader(
...     "https://drive.google.com/uc?export=download&id=1abc-DEF_xyz"
... )
>>> result = dl.download()

block_private_ips: bool = True

cleanup()

Remove the temporary directory owned by this instance, if any.

Safe to call multiple times. If output_path was supplied at construction time (caller-owned), this method is a no-op.
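
The ownership rule above can be illustrated with a small stand-alone sketch. OwnedTempDir is a hypothetical helper written for this example; it only mirrors the documented behaviour (delete what the instance created, leave caller-supplied directories alone) and is not taken from the library's source.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class OwnedTempDir:
    """Hypothetical helper: tracks whether the output directory is ours to delete."""

    def __init__(self, output_path: Optional[Path] = None) -> None:
        # The directory is owned only when the caller did not supply one.
        self._owned = output_path is None
        self.path = Path(tempfile.mkdtemp()) if self._owned else Path(output_path)

    def cleanup(self) -> None:
        # Delete only a directory this instance created; ignore_errors makes
        # repeated calls harmless once the directory is gone.
        if self._owned:
            shutil.rmtree(self.path, ignore_errors=True)
```

In practice, calling cleanup() in a finally block keeps temporary downloads from accumulating when an error interrupts processing.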

Return type:

None

download()

Download the Google Drive file and return a DownloadResult.

Handles the large-file virus-warning interstitial by inspecting the first response for a confirm= token and re-issuing the request with the token if needed.

Returns:
DownloadResult

Populated result with local file path, extension, and source URL.

Raises:
ValueError

If the SSRF check fails, the download size exceeds max_bytes, or the file ID cannot be extracted.

requests.HTTPError

On HTTP 4xx/5xx errors from Google’s servers.

RuntimeError

If the confirm-bypass loop fails (unexpected response structure).

Return type:

DownloadResult
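
The two ValueError conditions tied to block_private_ips and max_bytes could be enforced along these lines. This is a sketch under stated assumptions: is_blocked_address and copy_with_cap are hypothetical names, the address check handles literal IPs only (a real check would also resolve hostnames), and the library's own checks may differ.

```python
import io
import ipaddress
from typing import BinaryIO


def is_blocked_address(host: str) -> bool:
    """Return True for literal IPs in private, loopback, or link-local ranges."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not a literal IP; hostnames are not resolved in this sketch
    return addr.is_private or addr.is_loopback or addr.is_link_local


def copy_with_cap(src: BinaryIO, dst: BinaryIO, max_bytes: int, chunk_size: int = 8192) -> int:
    """Copy src to dst in chunks, raising ValueError once max_bytes is exceeded."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            raise ValueError(f"download exceeds max_bytes={max_bytes}")
        dst.write(chunk)
```

Counting bytes while streaming, rather than trusting a Content-Length header, means the cap still holds when the server omits or misreports the size.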

input_url: str
max_bytes: int = 104857600
max_redirects: int = 5
output_path: Path | None = None
resolve_download_url()

Resolve the share URL to a direct Google Drive download URL.

Returns:
str

Direct download URL of the form https://drive.google.com/uc?export=download&id=FILE_ID.

Return type:

str

Examples

>>> dl = GoogleDriveDownloader("https://drive.google.com/file/d/1abc-DEF/view")
>>> dl.resolve_download_url()
'https://drive.google.com/uc?export=download&id=1abc-DEF'
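
One way to picture the resolution step is the sketch below. It covers only the two URL forms shown in the examples on this page (/file/d/FILE_ID/... and ?id=FILE_ID); the full set of accepted forms lives in the module docstring, and the helper names extract_file_id and to_direct_url are assumptions made for illustration.

```python
import re
from urllib.parse import parse_qs, urlparse

# Matches the /d/<FILE_ID> segment of a share link path.
_PATH_ID_RE = re.compile(r"/d/([0-9A-Za-z_\-]+)")


def extract_file_id(url: str) -> str:
    """Pull the file ID from the path or the id= query parameter."""
    parts = urlparse(url)
    match = _PATH_ID_RE.search(parts.path)
    if match:
        return match.group(1)
    ids = parse_qs(parts.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"cannot extract a Google Drive file ID from {url!r}")


def to_direct_url(url: str) -> str:
    """Build the direct download URL for the extracted file ID."""
    return f"https://drive.google.com/uc?export=download&id={extract_file_id(url)}"
```

Raising ValueError when no ID is found matches the error documented for the constructor.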
timeout: float = 30.0
user_agent: str = 'Mozilla/5.0 (compatible; scikitplot-corpus/1.0; +https://github.com/scikit-plots/scikit-plots)'
verify_ssl: bool = True