YouTubeDownloader
- class scikitplot.corpus.YouTubeDownloader(input_url, output_path=None, timeout=30.0, max_bytes=104857600, verify_ssl=True, block_private_ips=True, max_redirects=5, user_agent='Mozilla/5.0 (compatible; scikitplot-corpus/1.0; +https://github.com/scikit-plots/scikit-plots)', mode='transcript', language='en', include_auto_generated=True)
YouTube content downloader.
Downloads a transcript, audio track, or video from a single YouTube video URL. The
mode parameter selects what is fetched.
- Parameters:
- input_url str
YouTube video URL. Accepted forms:
https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://www.youtube.com/shorts/VIDEO_ID
https://www.youtube.com/embed/VIDEO_ID
- mode {"transcript", "audio", "video"}, optional
What to download. Default: "transcript".
- language str, optional
BCP-47 language code for transcript fetching (e.g. "en", "fr", "de"). Falls back to auto-generated captions when the requested language is not available. Only used for mode="transcript". Default: "en".
- include_auto_generated bool, optional
When True, include auto-generated transcripts as a fallback when no human-reviewed captions exist. Default: True.
- output_path pathlib.Path or None, optional
Directory for the downloaded file. Default: None (temp dir).
- timeout float, optional
HTTP timeout in seconds (transcript fetch and yt-dlp). Default: 30.0.
- max_bytes int, optional
Download size cap in bytes (audio/video modes only; transcripts are always small). Default: 104857600 (100 MiB).
- Raises:
- ValueError
If the URL is not a recognised YouTube single-video URL.
- ValueError
If mode is not one of "transcript", "audio", "video".
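The accepted URL forms and the two ValueError cases above can be illustrated with a small stand-alone check. This is a minimal sketch using only the standard library, not the code scikitplot.corpus actually runs: extract_video_id and check_mode are hypothetical helper names, and restricting the scheme to https and the hosts to www.youtube.com and youtu.be is an assumption drawn from the four listed forms.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical helpers for illustration only; not part of scikitplot.corpus.
ALLOWED_MODES = ("transcript", "audio", "video")


def extract_video_id(url: str) -> str:
    """Return the video ID from one of the four documented URL forms."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    segments = [s for s in parsed.path.split("/") if s]

    # Assumption: only https URLs are accepted, as in the documented forms.
    if parsed.scheme == "https":
        # https://youtu.be/VIDEO_ID
        if host == "youtu.be" and len(segments) == 1:
            return segments[0]
        if host == "www.youtube.com":
            # https://www.youtube.com/watch?v=VIDEO_ID
            if parsed.path == "/watch":
                ids = parse_qs(parsed.query).get("v", [])
                if ids:
                    return ids[0]
            # https://www.youtube.com/shorts/VIDEO_ID and /embed/VIDEO_ID
            elif len(segments) == 2 and segments[0] in ("shorts", "embed"):
                return segments[1]

    # Channels, playlists, and anything else fall through to here.
    raise ValueError(f"not a recognised YouTube single-video URL: {url!r}")


def check_mode(mode: str) -> str:
    """Return mode unchanged, or raise ValueError for an unknown value."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}, got {mode!r}")
    return mode
```

A playlist URL such as https://www.youtube.com/playlist?list=... has no matching branch in this sketch, so it raises ValueError, which mirrors the single-video restriction described in the Notes.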
Notes
Transcript mode uses youtube-transcript-api (pip-installable, lightweight, no browser). It writes a plain .txt file where each caption segment is a line.
Audio/video modes require yt-dlp (pip install yt-dlp). They invoke yt-dlp programmatically via its Python API.
Channels and playlists are not supported; pass a single video URL.
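The one-segment-per-line layout of the transcript file can be pictured with the sketch below. It is illustrative only: write_transcript is a hypothetical helper, the segment shape (a mapping with a "text" key) and the VIDEO_ID.txt file name are assumptions, and collapsing whitespace inside a segment is a design choice made here so that each segment stays on exactly one line.

```python
from pathlib import Path
import tempfile


def write_transcript(segments, out_dir: Path, video_id: str) -> Path:
    """Write one caption segment per line to <out_dir>/<video_id>.txt.

    Hypothetical helper; `segments` is assumed to be an iterable of
    mappings that each carry the caption text under the "text" key.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{video_id}.txt"
    # Collapse internal newlines and runs of spaces so a segment
    # never spans more than one line of the output file.
    lines = [" ".join(seg["text"].split()) for seg in segments]
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out_path


# Made-up sample data, used only to exercise the helper.
sample_segments = [
    {"text": "Hello and welcome."},
    {"text": "Today we look at\ncorpus building."},
]

with tempfile.TemporaryDirectory() as tmp:
    path = write_transcript(sample_segments, Path(tmp), "VIDEO_ID")
    print(path.name)
    print(path.read_text(encoding="utf-8"), end="")
```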
SSRF prevention is always applied for audio/video modes (network calls made by yt-dlp go to YouTube CDN, which is public; the check is a defence-in-depth measure). Transcript mode makes its own HTTP calls which are also validated.
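A common way to implement this kind of SSRF check is to resolve the target host and reject any address that is not publicly routable. The sketch below shows that general technique with the standard library; it is not the validation scikitplot.corpus performs, is_public_ip and host_resolves_to_public_ips are hypothetical names, and any link to the block_private_ips constructor argument is an assumption.

```python
import ipaddress
import socket


def is_public_ip(address: str) -> bool:
    """Return True only for addresses that are publicly routable."""
    # Drop an IPv6 zone index such as "%eth0" before parsing.
    ip = ipaddress.ip_address(address.split("%", 1)[0])
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def host_resolves_to_public_ips(host: str, port: int = 443) -> bool:
    """Resolve `host` and require every resolved address to be public.

    Hypothetical helper: checking all addresses, not just the first,
    avoids accepting a host that maps to both public and internal IPs.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(is_public_ip(a) for a in addresses)
```

A check like this would typically run before each request and again after each redirect, since a redirect can point a public URL at an internal address.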
Examples
Transcript (default):
>>> dl = YouTubeDownloader("https://www.youtube.com/watch?v=rwPISgZcYIk")
>>> result = dl.download()
>>> result.suffix
'.txt'
Audio download:
>>> dl = YouTubeDownloader(
...     "https://youtu.be/rwPISgZcYIk",
...     mode="audio",
... )
>>> result = dl.download()
>>> result.suffix in (".mp3", ".m4a", ".webm")
True
- cleanup()
Remove the temporary directory owned by this instance, if any.
Safe to call multiple times. If output_path was supplied at construction time (caller-owned), this method is a no-op.
- Return type:
None
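The ownership rule described for cleanup() (remove only a directory the instance created, do nothing for a caller-supplied one, tolerate repeated calls) can be sketched with a small stand-in class. TempDirOwner is hypothetical and exists only to show the pattern; it is not how YouTubeDownloader is implemented.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class TempDirOwner:
    """Hypothetical stand-in that mimics the documented cleanup() rule."""

    def __init__(self, output_path: Optional[Path] = None) -> None:
        # The instance owns the directory only when it had to create one.
        self._owns_dir = output_path is None
        if self._owns_dir:
            self.output_dir = Path(tempfile.mkdtemp())
        else:
            self.output_dir = Path(output_path)

    def cleanup(self) -> None:
        if not self._owns_dir:
            return  # caller-owned directory: leave it untouched
        # ignore_errors makes a second call harmless once the
        # directory has already been removed.
        shutil.rmtree(self.output_dir, ignore_errors=True)
```

With a real downloader, a try/finally block around download() that calls cleanup() in the finally clause would give the same guarantee on error paths, provided the downloaded file has been read or copied first.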
- download()
Download the requested content and return a DownloadResult.
Dispatches to _download_transcript, _download_audio, or _download_video based on self.mode.
- Returns:
- DownloadResult
Populated result with the local file path, extension, and source URL.
- Raises:
- ImportError
If youtube-transcript-api (transcript mode) or yt-dlp (audio/video modes) is not installed.
- ValueError
If the transcript is not available for the given video/language.
- Return type: