GoogleDriveDownloader
- class scikitplot.corpus.GoogleDriveDownloader(input_url, output_path=None, timeout=30.0, max_bytes=104857600, verify_ssl=True, block_private_ips=True, max_redirects=5, user_agent='Mozilla/5.0 (compatible; scikitplot-corpus/1.0; +https://github.com/scikit-plots/scikit-plots)')
Google Drive share-link downloader.
Resolves any public Google Drive share URL to a direct download URL and streams the file to a local path. Handles the large-file virus-warning interstitial automatically.
- Parameters:
- input_url : str
Any supported Google Drive share URL. See module docstring for accepted forms.
- output_path : pathlib.Path or None, optional
Directory for the downloaded file. Default: None (temp dir).
- timeout : float, optional
HTTP timeout in seconds. Default: 30.0.
- max_bytes : int, optional
Download size cap in bytes. Default: 104857600 (100 MiB).
- verify_ssl : bool, optional
Verify TLS certificates. Default: True.
- block_private_ips : bool, optional
Block requests to private IP addresses (SSRF prevention). Default: True.
- max_redirects : int, optional
Maximum number of HTTP redirects to follow. Default: 5.
- Raises:
- ValueError
If the file ID cannot be extracted from input_url at construction.
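The file-ID extraction that triggers this ValueError could look roughly like the sketch below. It is not the library's implementation: the helper name `extract_file_id` and the regular expression are hypothetical, and only the two URL forms shown in the examples on this page are covered (the module docstring may accept more).

```python
import re
from urllib.parse import parse_qs, urlparse

# Hypothetical pattern: matches the ID segment in ".../file/d/<ID>/..." paths.
_PATH_ID = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def extract_file_id(url: str) -> str:
    """Return the Drive file ID from a share URL, or raise ValueError."""
    parsed = urlparse(url)
    if parsed.hostname != "drive.google.com":
        raise ValueError(f"not a Google Drive URL: {url!r}")
    # Form 1: https://drive.google.com/file/d/<ID>/view
    match = _PATH_ID.search(parsed.path)
    if match:
        return match.group(1)
    # Form 2: https://drive.google.com/uc?export=download&id=<ID>
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"cannot extract file ID from {url!r}")
```

Failing at construction rather than at download time means a malformed URL is reported before any network request is made.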
Notes
Large-file bypass: When Google Drive serves a virus-scan warning page instead of the file, this class inspects the response body for the confirm= token, rebuilds the download URL with the token, and re-downloads transparently.
Private files: Only publicly shared files are supported. Files that require Google account authentication will raise 403 Forbidden from Google’s servers. OAuth2 support is a planned future extension.
Examples
>>> dl = GoogleDriveDownloader(
...     "https://drive.google.com/file/d/1abc-DEF_xyz/view?usp=sharing"
... )
>>> result = dl.download()
>>> result.suffix  # determined from Content-Disposition / Content-Type
'.pdf'
Already-direct URL form:
>>> dl = GoogleDriveDownloader(
...     "https://drive.google.com/uc?export=download&id=1abc-DEF_xyz"
... )
>>> result = dl.download()
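The large-file bypass described in the Notes can be sketched as two small steps: find a `confirm=` token in the warning page, then rebuild the download URL with it. The helper names, the token pattern, and the sample HTML used to exercise them are assumptions for illustration; the markup Google actually serves is not documented here and may differ.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Assumed token shape: "confirm=<token>" somewhere in the interstitial HTML.
_CONFIRM = re.compile(r"confirm=([0-9A-Za-z_-]+)")


def find_confirm_token(body: str) -> Optional[str]:
    """Return the confirm token found in a response body, or None."""
    match = _CONFIRM.search(body)
    return match.group(1) if match else None


def with_confirm(file_id: str, token: str) -> str:
    """Rebuild the direct download URL with the confirm token appended."""
    query = urlencode({"export": "download", "id": file_id, "confirm": token})
    return f"https://drive.google.com/uc?{query}"
```

A caller would only re-issue the request when `find_confirm_token` returns a token; a `None` result means the first response was already the file.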
- cleanup()
Remove the temporary directory owned by this instance, if any.
Safe to call multiple times. If output_path was supplied at construction time (caller-owned), this method is a no-op.
- Return type:
None
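The ownership rule for cleanup() can be illustrated with a small stand-in class. `TempDirOwner` and its attributes are hypothetical and exist only to show the pattern: a directory the instance created is removed, while a caller-supplied directory is left alone.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class TempDirOwner:
    """Hypothetical stand-in for the ownership rule described above."""

    def __init__(self, output_path: Optional[Path] = None) -> None:
        # The instance owns the directory only when it had to create one.
        self._owns_dir = output_path is None
        self.output_dir = Path(tempfile.mkdtemp()) if self._owns_dir else Path(output_path)

    def cleanup(self) -> None:
        if not self._owns_dir:
            return  # caller-owned directory: no-op
        # ignore_errors makes repeated calls safe once the directory is gone.
        shutil.rmtree(self.output_dir, ignore_errors=True)
```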
- download()
Download the Google Drive file and return a DownloadResult.
Handles the large-file virus-warning interstitial by inspecting the first response for a confirm= token and re-issuing the request with the token if needed.
- Returns:
- DownloadResult
Populated result with local file path, extension, and source URL.
- Raises:
- ValueError
If the SSRF check fails, the download size exceeds max_bytes, or the file ID cannot be extracted.
- requests.HTTPError
On HTTP 4xx/5xx errors from Google’s servers.
- RuntimeError
If the confirm-bypass loop fails (unexpected response structure).
- Return type:
DownloadResult
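Two of the ValueError conditions listed for download(), the SSRF check and the size cap, can be sketched with the standard library alone. Both helpers below are hypothetical: the real class may resolve hostnames, classify addresses, or count bytes differently. The sketch assumes the address has already been resolved to an IP literal and that the body arrives as an iterable of byte chunks.

```python
import ipaddress
from typing import Iterable


def check_public_ip(address: str) -> None:
    """Raise ValueError if the IP literal is not a public address."""
    ip = ipaddress.ip_address(address)
    if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
        raise ValueError(f"refusing non-public address: {address}")


def read_capped(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Collect streamed chunks, aborting once max_bytes is exceeded."""
    received = bytearray()
    for chunk in chunks:
        received.extend(chunk)
        if len(received) > max_bytes:
            raise ValueError(f"download exceeds max_bytes={max_bytes}")
    return bytes(received)
```

Checking the running total inside the loop stops the transfer as soon as the cap is crossed, instead of trusting a Content-Length header that may be absent or wrong.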
- resolve_download_url()
Resolve the share URL to a direct Google Drive download URL.
- Returns:
- str
Direct download URL with ?export=download&id=FILE_ID.
- Return type:
str
Examples
>>> dl = GoogleDriveDownloader("https://drive.google.com/file/d/1abc-DEF/view")
>>> dl.resolve_download_url()
'https://drive.google.com/uc?export=download&id=1abc-DEF'
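Once the file ID is known, building the direct URL shown in the example is a matter of query-string encoding. The function name `build_download_url` is hypothetical; this is a minimal sketch of the URL shape documented above, not the method's actual body.

```python
from urllib.parse import urlencode


def build_download_url(file_id: str) -> str:
    """Return the direct-download URL for a Drive file ID."""
    # urlencode escapes any characters that are unsafe in a query string.
    query = urlencode({"export": "download", "id": file_id})
    return f"https://drive.google.com/uc?{query}"
```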