classify_url
- scikitplot.corpus.classify_url(url)
Classify a URL into one of the known URLKind categories.
- Parameters:
- url (str)
Full URL string. Must start with http:// or https://.
- Returns:
- URLKind
Classification result.
- Raises:
- ValueError
If url does not start with http:// or https://.
Notes
Classification order matters. The check sequence is:
1. YouTube channel / handle (checked before video; the @Handle path would otherwise fall through to the web-page fallback).
2. YouTube playlist (/playlist?list=…).
3. YouTube single video (watch, shorts, embed, live, youtu.be). A watch?v=…&list=… URL is classified as a single video: the list= param is contextual, and the reader extracts only the v= video ID.
4. Google Drive share links.
5. GitHub blob URLs (must be checked before raw, since blob URLs are on github.com).
6. GitHub raw URLs.
7. Downloadable file (extension-based heuristic on the URL path).
8. Web page (default fallback).
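The check order described above can be illustrated with a small standalone sketch. This is not the library's implementation: `SketchKind`, `classify_sketch`, the member names `GITHUB_BLOB` and `GITHUB_RAW`, the host sets, the `/channel/` prefix, and the extension list are all assumptions made for illustration. Only the ordering of the checks follows the notes.

```python
from enum import Enum
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlparse


class SketchKind(Enum):
    """Hypothetical stand-in for URLKind (not the library's enum)."""
    YOUTUBE = "youtube"
    YOUTUBE_CHANNEL = "youtube_channel"
    YOUTUBE_PLAYLIST = "youtube_playlist"
    GOOGLE_DRIVE = "google_drive"
    GITHUB_BLOB = "github_blob"  # assumed member name
    GITHUB_RAW = "github_raw"    # assumed member name
    DOWNLOADABLE = "downloadable"
    WEB_PAGE = "web_page"


# Assumed values; the real heuristics may differ.
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}
_VIDEO_PREFIXES = ("/watch", "/shorts/", "/embed/", "/live/")
_DOWNLOAD_EXTS = {".pdf", ".zip", ".csv", ".docx", ".mp3", ".mp4"}


def classify_sketch(url: str) -> SketchKind:
    """Classify a URL using the documented check order (illustrative only)."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"url must start with http:// or https://, got {url!r}")

    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    path = parts.path

    if host in _YOUTUBE_HOSTS:
        # 1. Channel / handle first, so /@Handle/shorts is not read as a video.
        if path.startswith(("/@", "/channel/")):
            return SketchKind.YOUTUBE_CHANNEL
        # 2. Playlist page.
        if path.startswith("/playlist") and "list" in parse_qs(parts.query):
            return SketchKind.YOUTUBE_PLAYLIST
        # 3. Single video; a list= param on a watch URL is ignored here.
        if path.startswith(_VIDEO_PREFIXES):
            return SketchKind.YOUTUBE
    if host == "youtu.be":
        return SketchKind.YOUTUBE
    # 4. Google Drive share links.
    if host == "drive.google.com":
        return SketchKind.GOOGLE_DRIVE
    # 5. GitHub blob URLs live on github.com.
    if host == "github.com" and "/blob/" in path:
        return SketchKind.GITHUB_BLOB
    # 6. GitHub raw URLs (host is an assumption).
    if host == "raw.githubusercontent.com":
        return SketchKind.GITHUB_RAW
    # 7. Extension-based heuristic on the URL path.
    if PurePosixPath(path).suffix.lower() in _DOWNLOAD_EXTS:
        return SketchKind.DOWNLOADABLE
    # 8. Default fallback.
    return SketchKind.WEB_PAGE
```

In this sketch, moving the extension check ahead of the GitHub checks would classify a blob URL ending in `.pdf` as downloadable, which is one way to see why the sequence matters.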
Examples
>>> classify_url("https://youtu.be/rwPISgZcYIk")
<URLKind.YOUTUBE: 'youtube'>
>>> classify_url("https://www.youtube.com/watch?v=4nMSvDEYl1c")
<URLKind.YOUTUBE: 'youtube'>
>>> classify_url("https://www.youtube.com/shorts/-6hoqujlmfU")
<URLKind.YOUTUBE: 'youtube'>
>>> classify_url("https://www.youtube.com/watch?v=AAk3pi15Zn4&list=PLL4_zLP7J")
<URLKind.YOUTUBE: 'youtube'>
>>> classify_url("https://www.youtube.com/@WHO/videos")
<URLKind.YOUTUBE_CHANNEL: 'youtube_channel'>
>>> classify_url("https://www.youtube.com/@WHO/shorts")
<URLKind.YOUTUBE_CHANNEL: 'youtube_channel'>
>>> classify_url("https://www.youtube.com/playlist?list=PLL4_zLP7J")
<URLKind.YOUTUBE_PLAYLIST: 'youtube_playlist'>
>>> classify_url("https://example.com/report.pdf")
<URLKind.DOWNLOADABLE: 'downloadable'>
>>> classify_url("https://drive.google.com/file/d/abc123/view")
<URLKind.GOOGLE_DRIVE: 'google_drive'>
>>> classify_url("https://example.com/article")
<URLKind.WEB_PAGE: 'web_page'>
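The note that a watch URL carrying both v= and list= is treated as a single video can be illustrated with a short sketch of pulling out only the v= value. The helper name `extract_video_id` is hypothetical and is not claimed to be how the library's reader works; it uses only the standard `urllib.parse` module.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Hypothetical helper: return the v= value of a URL, ignoring list=."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    # parse_qs maps each key to a list of values; take the first v= if present.
    return values[0] if values else None
```

With this helper, a watch URL that also names a playlist yields only its video ID, while a pure playlist URL yields `None`.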