# Changelog

## 1.26.9 - 2024-03-23
### Extractors
#### Additions
- [artstation] support video clips ([#2566](https://github.com/mikf/gallery-dl/issues/2566), [#3309](https://github.com/mikf/gallery-dl/issues/3309), [#3911](https://github.com/mikf/gallery-dl/issues/3911))
- [artstation] support collections ([#146](https://github.com/mikf/gallery-dl/issues/146))
- [deviantart] recognize `deviantart.com/stash/…` URLs
- [idolcomplex] support new pool URLs
- [lensdump] recognize direct image links ([#5293](https://github.com/mikf/gallery-dl/issues/5293))
- [skeb] add extractor for followed users ([#5290](https://github.com/mikf/gallery-dl/issues/5290))
- [twitter] add `quotes` extractor ([#5262](https://github.com/mikf/gallery-dl/issues/5262))
- [wikimedia] support `azurlane.koumakan.jp` ([#5256](https://github.com/mikf/gallery-dl/issues/5256))
- [xvideos] support `/channels/` URLs ([#5244](https://github.com/mikf/gallery-dl/issues/5244))
#### Fixes
- [artstation] fix handling usernames with dashes in domain names ([#5224](https://github.com/mikf/gallery-dl/issues/5224))
- [bluesky] fix not spawning child extractors for followed users ([#5246](https://github.com/mikf/gallery-dl/issues/5246))
- [deviantart] handle CloudFront blocks ([#5363](https://github.com/mikf/gallery-dl/issues/5363))
- [deviantart:avatar] fix `index` for URLs without `?` ([#5276](https://github.com/mikf/gallery-dl/issues/5276))
- [deviantart:stash] fix `index` values ([#5335](https://github.com/mikf/gallery-dl/issues/5335))
- [gofile] fix extraction
- [hiperdex] update URL patterns & fix `manga` metadata ([#5340](https://github.com/mikf/gallery-dl/issues/5340))
- [idolcomplex] fix metadata extraction
- [imagefap] fix folder extraction ([#5333](https://github.com/mikf/gallery-dl/issues/5333))
- [instagram] make accessing `like_count` non-fatal ([#5218](https://github.com/mikf/gallery-dl/issues/5218))
- [mastodon] fix handling null `moved` account field ([#5321](https://github.com/mikf/gallery-dl/issues/5321))
- [naver] fix EUC-KR encoding issue in old image URLs ([#5126](https://github.com/mikf/gallery-dl/issues/5126))
- [nijie] increase default delay between requests ([#5221](https://github.com/mikf/gallery-dl/issues/5221))
- [nitter] ignore invalid Tweets ([#5253](https://github.com/mikf/gallery-dl/issues/5253))
- [pixiv:novel] fix text extraction ([#5285](https://github.com/mikf/gallery-dl/issues/5285), [#5309](https://github.com/mikf/gallery-dl/issues/5309))
- [skeb] retry 429 responses containing a `request_key` cookie ([#5210](https://github.com/mikf/gallery-dl/issues/5210))
- [warosu] fix crash for threads with deleted posts ([#5289](https://github.com/mikf/gallery-dl/issues/5289))
- [weibo] fix retweets ([#2825](https://github.com/mikf/gallery-dl/issues/2825), [#3874](https://github.com/mikf/gallery-dl/issues/3874), [#5263](https://github.com/mikf/gallery-dl/issues/5263))
- [weibo] fix `livephoto` filename extensions ([#5287](https://github.com/mikf/gallery-dl/issues/5287))
- [xvideos] fix galleries with more than 500 images ([#5244](https://github.com/mikf/gallery-dl/issues/5244))
#### Improvements
- [bluesky] improve API error messages
- [bluesky] handle posts with different `embed` structure
- [deviantart:avatar] ignore default avatars ([#5276](https://github.com/mikf/gallery-dl/issues/5276))
- [fapello] download full-sized images ([#5349](https://github.com/mikf/gallery-dl/issues/5349))
- [gelbooru:favorite] automatically detect returned post order ([#5220](https://github.com/mikf/gallery-dl/issues/5220))
- [imgur] fail downloads when redirected to `removed.png` ([#5308](https://github.com/mikf/gallery-dl/issues/5308))
- [instagram] raise proper error for missing `reels_media` ([#5257](https://github.com/mikf/gallery-dl/issues/5257))
- [instagram] change `posts are private` exception to a warning ([#5322](https://github.com/mikf/gallery-dl/issues/5322))
- [reddit] improve preview fallback formats ([#5296](https://github.com/mikf/gallery-dl/issues/5296), [#5315](https://github.com/mikf/gallery-dl/issues/5315))
- [steamgriddb] raise exception for deleted assets
- [twitter] handle "account is temporarily locked" errors ([#5300](https://github.com/mikf/gallery-dl/issues/5300))
- [weibo] rework pagination logic ([#4168](https://github.com/mikf/gallery-dl/issues/4168))
- [zerochan] fetch more posts by using the API ([#3669](https://github.com/mikf/gallery-dl/issues/3669))
#### Metadata
- [bluesky] add `instance` metadata field ([#4438](https://github.com/mikf/gallery-dl/issues/4438))
- [gelbooru:favorite] add `date_favorited` metadata field
- [imagefap] extract `folder` metadata ([#5270](https://github.com/mikf/gallery-dl/issues/5270))
- [instagram] default `likes` to `0` ([#5323](https://github.com/mikf/gallery-dl/issues/5323))
- [kemonoparty] add `revision_count` metadata field ([#5334](https://github.com/mikf/gallery-dl/issues/5334))
- [naver] unescape post `title` and `description`
- [pornhub:gif] extract `viewkey` and `timestamp` metadata ([#4463](https://github.com/mikf/gallery-dl/issues/4463))
- [redgifs] make `date` available for directories ([#5262](https://github.com/mikf/gallery-dl/issues/5262))
- [subscribestar] fix `date` metadata
- [twitter] add `birdwatch` metadata field ([#5317](https://github.com/mikf/gallery-dl/issues/5317))
- [twitter] add `protected` metadata field ([#5327](https://github.com/mikf/gallery-dl/issues/5327))
- [warosu] fix `board_name` metadata
#### Options
- [bluesky] add `reposts` option ([#4438](https://github.com/mikf/gallery-dl/issues/4438), [#5248](https://github.com/mikf/gallery-dl/issues/5248))
- [deviantart] add `comments-avatars` option ([#4995](https://github.com/mikf/gallery-dl/issues/4995))
- [deviantart] extend `metadata` option ([#5175](https://github.com/mikf/gallery-dl/issues/5175))
- [flickr] add `contexts` option ([#5324](https://github.com/mikf/gallery-dl/issues/5324))
- [gelbooru:favorite] add `order-posts` option ([#5220](https://github.com/mikf/gallery-dl/issues/5220))
- [kemonoparty] add `order-revisions` option ([#5334](https://github.com/mikf/gallery-dl/issues/5334))
- [vipergirls] add `like` option ([#4166](https://github.com/mikf/gallery-dl/issues/4166))
- [vipergirls] add `domain` option ([#4166](https://github.com/mikf/gallery-dl/issues/4166))
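These new extractor options are regular per-extractor settings. As a rough, hedged sketch, they could be enabled in a gallery-dl configuration file like this, assuming the usual `extractor.<category>.<option>` layout and that `reposts` and `like` are simple boolean switches; the values shown are illustrative, not documented defaults (JSON does not allow comments, so the assumptions are stated here instead):

```json
{
    "extractor": {
        "bluesky": {
            "reposts": true
        },
        "vipergirls": {
            "like": true
        }
    }
}
```

Consult the option documentation for the exact value types accepted by `order-posts`, `order-revisions`, `comments-avatars`, and `contexts`.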
### Downloaders
- [http] add MIME type and signature for `.mov` files ([#5287](https://github.com/mikf/gallery-dl/issues/5287))
### Docker
- build images from source instead of PyPI package
- build `linux/arm64` images ([#5227](https://github.com/mikf/gallery-dl/issues/5227))
- build images on every push to master
- tag images as `YYYY.MM.DD`
- tag the most recent build from master as `dev`
- tag the most recent release build as `latest`
- reduce image size ([#5097](https://github.com/mikf/gallery-dl/issues/5097))
### Miscellaneous
- [formatter] fix local DST datetime offsets for `:O`
- build Linux executable on Ubuntu 22.04 LTS ([#4184](https://github.com/mikf/gallery-dl/issues/4184))
- automatically create directories for logging files ([#5249](https://github.com/mikf/gallery-dl/issues/5249))

## 1.26.8 - 2024-02-17
### Extractors
#### Additions
- [bluesky] add support ([#4438](https://github.com/mikf/gallery-dl/issues/4438), [#4708](https://github.com/mikf/gallery-dl/issues/4708), [#4722](https://github.com/mikf/gallery-dl/issues/4722), [#5047](https://github.com/mikf/gallery-dl/issues/5047))
- [bunkr] support new domains ([#5114](https://github.com/mikf/gallery-dl/issues/5114), [#5130](https://github.com/mikf/gallery-dl/issues/5130), [#5134](https://github.com/mikf/gallery-dl/issues/5134))
- [fanbox] add `home` and `supporting` extractors ([#5138](https://github.com/mikf/gallery-dl/issues/5138))
- [imagechest] add `user` extractor ([#5143](https://github.com/mikf/gallery-dl/issues/5143))
- [imagetwist] add `gallery` extractor ([#5190](https://github.com/mikf/gallery-dl/issues/5190))
- [kemonoparty] add `posts` extractor ([#5194](https://github.com/mikf/gallery-dl/issues/5194), [#5198](https://github.com/mikf/gallery-dl/issues/5198))
- [twitter] support communities ([#4913](https://github.com/mikf/gallery-dl/issues/4913))
- [vsco] support spaces ([#5202](https://github.com/mikf/gallery-dl/issues/5202))
- [weibo] add `gifs` option ([#5183](https://github.com/mikf/gallery-dl/issues/5183))
- [wikimedia] support `www.pidgi.net` ([#5205](https://github.com/mikf/gallery-dl/issues/5205))
- [wikimedia] support `bulbapedia.bulbagarden.net` ([#5206](https://github.com/mikf/gallery-dl/issues/5206))
#### Fixes
- [archivedmoe] fix `thebarchive` WebM URLs ([#5116](https://github.com/mikf/gallery-dl/issues/5116))
- [batoto] fix crash when manga name or chapter contains a `-` ([#5200](https://github.com/mikf/gallery-dl/issues/5200))
- [bunkr] fix extraction ([#5088](https://github.com/mikf/gallery-dl/issues/5088), [#5151](https://github.com/mikf/gallery-dl/issues/5151), [#5153](https://github.com/mikf/gallery-dl/issues/5153))
- [gofile] update `website_token` extraction
- [idolcomplex] fix pagination for tags containing `:` ([#5184](https://github.com/mikf/gallery-dl/issues/5184))
- [kemonoparty] fix deleting file names when computing `revision_hash` ([#5103](https://github.com/mikf/gallery-dl/issues/5103))
- [luscious] fix IndexError for files without thumbnail ([#5122](https://github.com/mikf/gallery-dl/issues/5122), [#5124](https://github.com/mikf/gallery-dl/issues/5124), [#5182](https://github.com/mikf/gallery-dl/issues/5182))
- [naverwebtoon] fix `title` for comics with empty tags ([#5120](https://github.com/mikf/gallery-dl/issues/5120))
- [pinterest] fix section URLs for boards with `/`, `?`, or `#` in their name ([#5104](https://github.com/mikf/gallery-dl/issues/5104))
- [twitter] update query hashes
- [zerochan] fix skipping every other post
#### Improvements
- [deviantart] skip locked/blurred posts ([#4567](https://github.com/mikf/gallery-dl/issues/4567), [#5193](https://github.com/mikf/gallery-dl/issues/5193))
- [deviantart] implement downloading PNG versions of non-original images with `"quality": "png"` ([#4846](https://github.com/mikf/gallery-dl/issues/4846))
- [flickr] handle non-JSON errors ([#5131](https://github.com/mikf/gallery-dl/issues/5131))
- [idolcomplex] support alphanumeric post IDs ([#5171](https://github.com/mikf/gallery-dl/issues/5171))
- [kemonoparty] implement filtering duplicate revisions with `"revisions": "unique"` ([#5013](https://github.com/mikf/gallery-dl/issues/5013))
- [naverwebtoon] support `/webtoon/` paths for all comics ([#5123](https://github.com/mikf/gallery-dl/issues/5123))
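The two quoted settings above are plain extractor options. A minimal sketch of where they would go in a configuration file, assuming the standard `extractor.<category>` layout; the option names and values are taken verbatim from the entries above:

```json
{
    "extractor": {
        "deviantart": {
            "quality": "png"
        },
        "kemonoparty": {
            "revisions": "unique"
        }
    }
}
```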
#### Metadata
- [idolcomplex] extract `id_alnum` metadata ([#5171](https://github.com/mikf/gallery-dl/issues/5171))
- [pornpics] support multiple values for `channel` ([#5195](https://github.com/mikf/gallery-dl/issues/5195))
- [sankaku] add `id-format` option ([#5073](https://github.com/mikf/gallery-dl/issues/5073))
- [skeb] add `num` and `count` metadata fields ([#5187](https://github.com/mikf/gallery-dl/issues/5187))
### Downloaders
#### Fixes
- [http] remove `pyopenssl` import ([#5156](https://github.com/mikf/gallery-dl/issues/5156))
### Miscellaneous
- fix filename formatting silently failing under certain circumstances ([#5185](https://github.com/mikf/gallery-dl/issues/5185), [#5186](https://github.com/mikf/gallery-dl/issues/5186))

## 1.26.7 - 2024-01-21
### Extractors
#### Additions
- [2ch] add support ([#1009](https://github.com/mikf/gallery-dl/issues/1009), [#3540](https://github.com/mikf/gallery-dl/issues/3540), [#4444](https://github.com/mikf/gallery-dl/issues/4444))
- [deviantart:avatar] add `formats` option ([#4995](https://github.com/mikf/gallery-dl/issues/4995))
- [hatenablog] add support ([#5036](https://github.com/mikf/gallery-dl/issues/5036), [#5037](https://github.com/mikf/gallery-dl/issues/5037))
- [mangadex] add `list` extractor ([#5025](https://github.com/mikf/gallery-dl/issues/5025))
- [steamgriddb] add support ([#5033](https://github.com/mikf/gallery-dl/issues/5033), [#5041](https://github.com/mikf/gallery-dl/issues/5041))
- [wikimedia] add support ([#1443](https://github.com/mikf/gallery-dl/issues/1443), [#2906](https://github.com/mikf/gallery-dl/issues/2906), [#3660](https://github.com/mikf/gallery-dl/issues/3660), [#2340](https://github.com/mikf/gallery-dl/issues/2340))
- [wikimedia] support `fandom` wikis ([#2677](https://github.com/mikf/gallery-dl/issues/2677), [#3378](https://github.com/mikf/gallery-dl/issues/3378))
#### Fixes
- [blogger] fix `lh-*.googleusercontent.com` URLs ([#5091](https://github.com/mikf/gallery-dl/issues/5091))
- [bunkr] update domain ([#5088](https://github.com/mikf/gallery-dl/issues/5088))
- [deviantart] fix AttributeError for URLs without username ([#5065](https://github.com/mikf/gallery-dl/issues/5065))
- [deviantart] fix `KeyError: 'premium_folder_data'` ([#5063](https://github.com/mikf/gallery-dl/issues/5063))
- [deviantart:avatar] fix exception when `comments` are enabled ([#4995](https://github.com/mikf/gallery-dl/issues/4995))
- [fuskator] make metadata extraction non-fatal ([#5039](https://github.com/mikf/gallery-dl/issues/5039))
- [gelbooru] only log "Incomplete API response" for favorites ([#5045](https://github.com/mikf/gallery-dl/issues/5045))
- [giantessbooru] update domain
- [issuu] fix extraction
- [nijie] fix download URLs of single image posts ([#5049](https://github.com/mikf/gallery-dl/issues/5049))
- [patreon] fix `KeyError: 'name'` ([#5048](https://github.com/mikf/gallery-dl/issues/5048), [#5069](https://github.com/mikf/gallery-dl/issues/5069), [#5093](https://github.com/mikf/gallery-dl/issues/5093))
- [pixiv] update API headers ([#5029](https://github.com/mikf/gallery-dl/issues/5029))
- [realbooru] fix download URLs of older posts
- [twitter] revert to using `media` timeline by default ([#4953](https://github.com/mikf/gallery-dl/issues/4953))
- [vk] transform image URLs to non-blurred versions ([#5017](https://github.com/mikf/gallery-dl/issues/5017))
#### Improvements
- [batoto] support more mirror domains ([#5042](https://github.com/mikf/gallery-dl/issues/5042))
- [batoto] improve v2 manga URL pattern
- [gelbooru] support `all` tag and URLs with empty tags ([#5076](https://github.com/mikf/gallery-dl/issues/5076))
- [patreon] download `m3u8` manifests with ytdl
- [sankaku] support post URLs with alphanumeric IDs ([#5073](https://github.com/mikf/gallery-dl/issues/5073))
#### Metadata
- [batoto] improve `manga_id` extraction ([#5042](https://github.com/mikf/gallery-dl/issues/5042))
- [erome] fix `count` metadata
- [kemonoparty] add `revision_hash` metadata ([#4706](https://github.com/mikf/gallery-dl/issues/4706), [#4727](https://github.com/mikf/gallery-dl/issues/4727), [#5013](https://github.com/mikf/gallery-dl/issues/5013))
- [paheal] fix `source` metadata
- [webtoons] extract more metadata ([#5061](https://github.com/mikf/gallery-dl/issues/5061), [#5094](https://github.com/mikf/gallery-dl/issues/5094))
#### Removals
- [chevereto] remove `pixl.li`
- [hbrowse] remove module
- [nitter] remove `nitter.lacontrevoie.fr`

## 1.26.6 - 2024-01-06
### Extractors
#### Additions
- [batoto] add `chapter` and `manga` extractors ([#1434](https://github.com/mikf/gallery-dl/issues/1434), [#2111](https://github.com/mikf/gallery-dl/issues/2111), [#4979](https://github.com/mikf/gallery-dl/issues/4979))
- [deviantart] add `avatar` and `background` extractors ([#4995](https://github.com/mikf/gallery-dl/issues/4995))
- [poringa] add support ([#4675](https://github.com/mikf/gallery-dl/issues/4675), [#4962](https://github.com/mikf/gallery-dl/issues/4962))
- [szurubooru] support `snootbooru.com` ([#5023](https://github.com/mikf/gallery-dl/issues/5023))
- [zzup] add `gallery` extractor ([#4517](https://github.com/mikf/gallery-dl/issues/4517), [#4604](https://github.com/mikf/gallery-dl/issues/4604), [#4659](https://github.com/mikf/gallery-dl/issues/4659), [#4863](https://github.com/mikf/gallery-dl/issues/4863), [#5016](https://github.com/mikf/gallery-dl/issues/5016))
#### Fixes
- [gelbooru] fix `favorite` extractor ([#4903](https://github.com/mikf/gallery-dl/issues/4903))
- [idolcomplex] fix extraction & update URL patterns ([#5002](https://github.com/mikf/gallery-dl/issues/5002))
- [imagechest] fix loading more than 10 images in a gallery ([#4469](https://github.com/mikf/gallery-dl/issues/4469))
- [jpgfish] update domain
- [komikcast] fix `manga` extractor ([#5027](https://github.com/mikf/gallery-dl/issues/5027))
- [komikcast] update domain ([#5027](https://github.com/mikf/gallery-dl/issues/5027))
- [lynxchan] update `bbw-chan` domain ([#4970](https://github.com/mikf/gallery-dl/issues/4970))
- [manganelo] fix extraction & recognize `.to` TLDs ([#5005](https://github.com/mikf/gallery-dl/issues/5005))
- [paheal] restore `extension` metadata ([#4976](https://github.com/mikf/gallery-dl/issues/4976))
- [rule34us] add fallback for `video-cdn1` videos ([#4985](https://github.com/mikf/gallery-dl/issues/4985))
- [weibo] fix AttributeError in `user` extractor ([#5022](https://github.com/mikf/gallery-dl/issues/5022))
#### Improvements
- [gelbooru] show error for invalid API responses ([#4903](https://github.com/mikf/gallery-dl/issues/4903))
- [rule34] recognize URLs with `www` subdomain ([#4984](https://github.com/mikf/gallery-dl/issues/4984))
- [twitter] raise error for invalid `strategy` values ([#4953](https://github.com/mikf/gallery-dl/issues/4953))
#### Metadata
- [fanbox] add `metadata` option ([#4921](https://github.com/mikf/gallery-dl/issues/4921))
- [nijie] add `count` metadata ([#146](https://github.com/mikf/gallery-dl/issues/146))
- [pinterest] add `count` metadata ([#4981](https://github.com/mikf/gallery-dl/issues/4981))
### Miscellaneous
- fix and update zsh completion ([#4972](https://github.com/mikf/gallery-dl/issues/4972))
- fix `--cookies-from-browser` macOS Firefox profile path

## 1.26.5 - 2023-12-23
### Extractors
#### Additions
- [deviantart] add `intermediary` option ([#4955](https://github.com/mikf/gallery-dl/issues/4955))
- [inkbunny] add `unread` extractor ([#4934](https://github.com/mikf/gallery-dl/issues/4934))
- [mastodon] support non-numeric status IDs ([#4936](https://github.com/mikf/gallery-dl/issues/4936))
- [myhentaigallery] recognize `/g/` URLs ([#4920](https://github.com/mikf/gallery-dl/issues/4920))
- [postmill] add support ([#4917](https://github.com/mikf/gallery-dl/issues/4917), [#4919](https://github.com/mikf/gallery-dl/issues/4919))
- [shimmie2] support `rule34hentai.net` ([#861](https://github.com/mikf/gallery-dl/issues/861), [#4789](https://github.com/mikf/gallery-dl/issues/4789), [#4945](https://github.com/mikf/gallery-dl/issues/4945))
#### Fixes
- [deviantart] add workaround for integer `client-id` values ([#4924](https://github.com/mikf/gallery-dl/issues/4924))
- [exhentai] fix error for infinite `fallback-retries` ([#4911](https://github.com/mikf/gallery-dl/issues/4911))
- [inkbunny] stop pagination on empty results
- [patreon] fix bootstrap data extraction again ([#4904](https://github.com/mikf/gallery-dl/issues/4904))
- [tumblr] fix exception after waiting for rate limit ([#4916](https://github.com/mikf/gallery-dl/issues/4916))
#### Improvements
- [exhentai] output continuation URL when interrupted ([#4782](https://github.com/mikf/gallery-dl/issues/4782))
- [inkbunny] improve `/submissionsviewall.php` patterns ([#4934](https://github.com/mikf/gallery-dl/issues/4934))
- [tumblr] support infinite `fallback-retries`
- [twitter] default to `tweets` timeline when `replies` are enabled ([#4953](https://github.com/mikf/gallery-dl/issues/4953))
#### Metadata
- [danbooru] provide `tags` as list ([#4942](https://github.com/mikf/gallery-dl/issues/4942))
- [deviantart] set `is_original` for intermediary URLs to `false`
- [twitter] remove `date_liked` ([#3850](https://github.com/mikf/gallery-dl/issues/3850), [#4108](https://github.com/mikf/gallery-dl/issues/4108), [#4657](https://github.com/mikf/gallery-dl/issues/4657))
### Docker
- add Docker instructions to README ([#4850](https://github.com/mikf/gallery-dl/issues/4850))
- fix auto-generation of `latest` tags

## 1.26.4 - 2023-12-10
### Extractors
#### Additions
- [exhentai] add `fallback-retries` option ([#4792](https://github.com/mikf/gallery-dl/issues/4792))
- [urlgalleries] add `gallery` extractor ([#919](https://github.com/mikf/gallery-dl/issues/919), [#1184](https://github.com/mikf/gallery-dl/issues/1184), [#2905](https://github.com/mikf/gallery-dl/issues/2905), [#4886](https://github.com/mikf/gallery-dl/issues/4886))
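A hedged sketch of how `fallback-retries` might be configured, assuming it takes a retry count like the general `retries` option and that `-1` requests the "infinite" behaviour mentioned in the 1.26.5 entries above; both the nesting and the values are illustrative assumptions, not documented here:

```json
{
    "extractor": {
        "exhentai": {
            "fallback-retries": 2
        },
        "tumblr": {
            "fallback-retries": -1
        }
    }
}
```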
#### Fixes
- [nijie] fix image URLs of multi-image posts ([#4876](https://github.com/mikf/gallery-dl/issues/4876))
- [patreon] fix bootstrap data extraction ([#4904](https://github.com/mikf/gallery-dl/issues/4904), [#4906](https://github.com/mikf/gallery-dl/issues/4906))
- [twitter] fix `/media` timelines ([#4898](https://github.com/mikf/gallery-dl/issues/4898), [#4899](https://github.com/mikf/gallery-dl/issues/4899))
- [twitter] retry API requests when response contains incomplete results ([#4811](https://github.com/mikf/gallery-dl/issues/4811))
#### Improvements
- [exhentai] store more cookies when logging in with username & password ([#4881](https://github.com/mikf/gallery-dl/issues/4881))
- [twitter] generalize "Login Required" errors ([#4734](https://github.com/mikf/gallery-dl/issues/4734), [#4324](https://github.com/mikf/gallery-dl/issues/4324))
### Options
- add `-e/--error-file` command-line and `output.errorfile` config option ([#4732](https://github.com/mikf/gallery-dl/issues/4732))
### Miscellaneous
- automatically build and push Docker images
- prompt for passwords on login when necessary
- fix `util.dump_response()` to work with `bytes` header values

## 1.26.3 - 2023-11-27
### Extractors
#### Additions
- [behance] support `text` modules ([#4799](https://github.com/mikf/gallery-dl/issues/4799))
- [behance] add `modules` option ([#4799](https://github.com/mikf/gallery-dl/issues/4799))
- [blogger] support `www.micmicidol.club` ([#4759](https://github.com/mikf/gallery-dl/issues/4759))
- [erome] add `count` metadata ([#4812](https://github.com/mikf/gallery-dl/issues/4812))
- [exhentai] add `gp` option ([#4576](https://github.com/mikf/gallery-dl/issues/4576))
- [fapello] support `.su` TLD ([#4840](https://github.com/mikf/gallery-dl/issues/4840), [#4841](https://github.com/mikf/gallery-dl/issues/4841))
- [pixeldrain] add `file` and `album` extractors ([#4839](https://github.com/mikf/gallery-dl/issues/4839))
- [pixeldrain] add `api-key` option ([#4839](https://github.com/mikf/gallery-dl/issues/4839))
- [tmohentai] add `gallery` extractor ([#4808](https://github.com/mikf/gallery-dl/issues/4808), [#4832](https://github.com/mikf/gallery-dl/issues/4832))
#### Fixes
- [cyberdrop] update to site layout changes
- [exhentai] handle `Downloading … requires GP` errors ([#4576](https://github.com/mikf/gallery-dl/issues/4576), [#4763](https://github.com/mikf/gallery-dl/issues/4763))
- [exhentai] fix empty API URL with `"source": "hitomi"` ([#4829](https://github.com/mikf/gallery-dl/issues/4829))
- [hentaifoundry] check for and update expired sessions ([#4694](https://github.com/mikf/gallery-dl/issues/4694))
- [hiperdex] fix `manga` metadata
- [idolcomplex] update to site layout changes
- [imagefap] fix resolution of single images
- [instagram] fix exception on empty `video_versions` ([#4795](https://github.com/mikf/gallery-dl/issues/4795))
- [mangaread] fix extraction
- [mastodon] fix reblogs ([#4580](https://github.com/mikf/gallery-dl/issues/4580))
- [nitter] fix video extraction ([#4853](https://github.com/mikf/gallery-dl/issues/4853), [#4855](https://github.com/mikf/gallery-dl/issues/4855))
- [pornhub] fix `user` metadata for gifs
- [tumblr] fix `day` extractor
- [wallpapercave] fix extraction
- [warosu] fix file URLs
- [webtoons] fix pagination when receiving an HTTP redirect
- [xvideos] fix metadata extraction
- [zerochan] fix metadata extraction
#### Improvements
- [hentaicosplays] force `https://` for download URLs
- [oauth] warn when cache is enabled but not writeable ([#4771](https://github.com/mikf/gallery-dl/issues/4771))
- [sankaku] update URL patterns
- [twitter] ignore promoted Tweets ([#3894](https://github.com/mikf/gallery-dl/issues/3894), [#4790](https://github.com/mikf/gallery-dl/issues/4790))
- [weibo] detect redirects to login page ([#4773](https://github.com/mikf/gallery-dl/issues/4773))
#### Removals
- [foolslide] remove `powermanga.org`
### Downloaders
#### Changes
- [http] treat files not passing `filesize-min`/`-max` as skipped ([#4821](https://github.com/mikf/gallery-dl/issues/4821))
### Options
#### Additions
- add `metadata-extractor` option ([#4549](https://github.com/mikf/gallery-dl/issues/4549))
- support `metadata-*` names for `*-metadata` options (for example `url-metadata` is now also recognized as `metadata-url`)
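To illustrate the renamed options above: both spellings are meant to be interchangeable, so a configuration such as the following sketch should store each file's download URL in its metadata under the chosen key. `gdl_url` is just a hypothetical field name, and the `extractor` nesting is the usual gallery-dl layout:

```json
{
    "extractor": {
        "metadata-url": "gdl_url"
    }
}
```

The older `"url-metadata": "gdl_url"` spelling is still recognized.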
### CLI
#### Additions
- implement `-I/--input-file-comment` and `-x/--input-file-delete` options ([#4732](https://github.com/mikf/gallery-dl/issues/4732))
- add `--ugoira` as a general version of `--ugoira-conv` and co.
- add `--mtime` as a general version of `--mtime-from-date`
- add `--cbz`
#### Fixes
- allow `--mtime-from-date` to work with Weibo's metadata structure
### Miscellaneous
#### Additions
- add a simple Dockerfile ([#4831](https://github.com/mikf/gallery-dl/issues/4831))

## 1.26.2 - 2023-11-04
### Extractors
#### Additions
- [4archive] add `thread` and `board` extractors ([#1262](https://github.com/mikf/gallery-dl/issues/1262), [#2418](https://github.com/mikf/gallery-dl/issues/2418), [#4400](https://github.com/mikf/gallery-dl/issues/4400), [#4710](https://github.com/mikf/gallery-dl/issues/4710), [#4714](https://github.com/mikf/gallery-dl/issues/4714))
- [hitomi] recognize `imageset` gallery URLs ([#4756](https://github.com/mikf/gallery-dl/issues/4756))
- [kemonoparty] add `revision_index` metadata field ([#4727](https://github.com/mikf/gallery-dl/issues/4727))
- [misskey] support `misskey.design` ([#4713](https://github.com/mikf/gallery-dl/issues/4713))
- [reddit] support Reddit Mobile share links ([#4693](https://github.com/mikf/gallery-dl/issues/4693))
- [sankaku] support `/posts/` tag search URLs ([#4740](https://github.com/mikf/gallery-dl/issues/4740))
- [twitter] recognize `fixupx.com` URLs ([#4755](https://github.com/mikf/gallery-dl/issues/4755))
#### Fixes
- [exhentai] update to site layout changes ([#4730](https://github.com/mikf/gallery-dl/issues/4730), [#4754](https://github.com/mikf/gallery-dl/issues/4754))
- [exhentai] provide fallback URLs ([#1021](https://github.com/mikf/gallery-dl/issues/1021), [#4745](https://github.com/mikf/gallery-dl/issues/4745))
- [exhentai] disable `DH` ciphers to avoid `DH_KEY_TOO_SMALL` errors ([#1021](https://github.com/mikf/gallery-dl/issues/1021), [#4593](https://github.com/mikf/gallery-dl/issues/4593))
- [idolcomplex] disable sending Referer headers ([#4726](https://github.com/mikf/gallery-dl/issues/4726))
- [instagram] update API headers
- [kemonoparty] fix parsing of non-standard `date` values ([#4676](https://github.com/mikf/gallery-dl/issues/4676))
- [patreon] fix `campaign_id` extraction ([#4699](https://github.com/mikf/gallery-dl/issues/4699), [#4715](https://github.com/mikf/gallery-dl/issues/4715), [#4736](https://github.com/mikf/gallery-dl/issues/4736), [#4738](https://github.com/mikf/gallery-dl/issues/4738))
- [pixiv] load cookies for non-OAuth URLs ([#4760](https://github.com/mikf/gallery-dl/issues/4760))
- [twitter] fix avatars without `date` information ([#4696](https://github.com/mikf/gallery-dl/issues/4696))
- [twitter] restore truncated retweet texts ([#3430](https://github.com/mikf/gallery-dl/issues/3430), [#4690](https://github.com/mikf/gallery-dl/issues/4690))
- [weibo] fix Sina Visitor requests
#### Improvements
- [behance] unescape embed URLs ([#4742](https://github.com/mikf/gallery-dl/issues/4742))
- [fantia] simplify `tags` to a list of strings ([#4752](https://github.com/mikf/gallery-dl/issues/4752))
-
[kemonoparty] limit `title` length ([#4741](https://github.com/mikf/gallery-dl/issues/4741)) - [nijie] set 1-2s delay between requests to avoid 429 errors - [patreon] provide ways to manually specify a user's campaign_id - `https://www.patreon.com/id:12345` - `https://www.patreon.com/USER?c=12345` - `https://www.patreon.com/USER?campaign_id=12345` - [twitter] cache `user_by_…` results ([#4719](https://github.com/mikf/gallery-dl/issues/4719)) ### Post Processors #### Fixes - [metadata] ignore non-string tag values ([#4764](https://github.com/mikf/gallery-dl/issues/4764)) ### Miscellaneous #### Fixes - prevent crash when `stdout.line_buffering` is not defined ([#642](https://github.com/mikf/gallery-dl/issues/642)) ## 1.26.1 - 2023-10-21 ### Extractors #### Additions - [bunkr] add extractor for media URLs ([#4684](https://github.com/mikf/gallery-dl/issues/4684)) - [chevereto] add generic extractors for `chevereto` sites ([#4664](https://github.com/mikf/gallery-dl/issues/4664)) - `deltaporno.com` ([#1381](https://github.com/mikf/gallery-dl/issues/1381)) - `img.kiwi` - `jpgfish` - `pixl.li` ([#3179](https://github.com/mikf/gallery-dl/issues/3179), [#4357](https://github.com/mikf/gallery-dl/issues/4357)) - [deviantart] implement `"group": "skip"` ([#4630](https://github.com/mikf/gallery-dl/issues/4630)) - [fantia] add `content_count` and `content_num` metadata fields ([#4627](https://github.com/mikf/gallery-dl/issues/4627)) - [imgbb] add `displayname` and `user_id` metadata ([#4626](https://github.com/mikf/gallery-dl/issues/4626)) - [kemonoparty] support post revisions; add `revisions` option ([#4498](https://github.com/mikf/gallery-dl/issues/4498), [#4597](https://github.com/mikf/gallery-dl/issues/4597)) - [kemonoparty] support searches ([#3385](https://github.com/mikf/gallery-dl/issues/3385), [#4057](https://github.com/mikf/gallery-dl/issues/4057)) - [kemonoparty] support discord URLs with channel IDs ([#4662](https://github.com/mikf/gallery-dl/issues/4662)) - [moebooru] add `metadata` option ([#4646](https://github.com/mikf/gallery-dl/issues/4646)) - [newgrounds] support multi-image posts ([#4642](https://github.com/mikf/gallery-dl/issues/4642)) - [sankaku] support `/posts/` URLs ([#4688](https://github.com/mikf/gallery-dl/issues/4688)) - [twitter] add `sensitive` metadata field ([#4619](https://github.com/mikf/gallery-dl/issues/4619)) #### Fixes - [4chanarchives] disable Referer headers by default ([#4686](https://github.com/mikf/gallery-dl/issues/4686)) - [bunkr] fix `/d/` file URLs ([#4685](https://github.com/mikf/gallery-dl/issues/4685)) - [deviantart] expand nested comment replies ([#4653](https://github.com/mikf/gallery-dl/issues/4653)) - [deviantart] disable `jwt` ([#4652](https://github.com/mikf/gallery-dl/issues/4652)) - [hentaifoundry] fix `.swf` file downloads ([#4641](https://github.com/mikf/gallery-dl/issues/4641)) - [imgbb] fix `user` metadata extraction ([#4626](https://github.com/mikf/gallery-dl/issues/4626)) - [imgbb] update pagination end condition ([#4626](https://github.com/mikf/gallery-dl/issues/4626)) - [kemonoparty] update API endpoints ([#4676](https://github.com/mikf/gallery-dl/issues/4676), [#4677](https://github.com/mikf/gallery-dl/issues/4677)) - [patreon] update `campaign_id` path ([#4639](https://github.com/mikf/gallery-dl/issues/4639)) - [reddit] fix wrong previews ([#4649](https://github.com/mikf/gallery-dl/issues/4649)) - [redgifs] fix `niches` extraction ([#4666](https://github.com/mikf/gallery-dl/issues/4666), 
[#4667](https://github.com/mikf/gallery-dl/issues/4667)) - [twitter] fix crash due to missing `source` ([#4620](https://github.com/mikf/gallery-dl/issues/4620)) - [warosu] fix extraction ([#4634](https://github.com/mikf/gallery-dl/issues/4634)) ### Post Processors #### Additions - support `{_filename}`, `{_directory}`, and `{_path}` replacement fields for `--exec` ([#4633](https://github.com/mikf/gallery-dl/issues/4633)) ### Miscellaneous #### Improvements - avoid temporary copies with `--cookies-from-browser` by opening cookie databases in read-only mode ## 1.26.0 - 2023-10-03 - ### Extractors #### Additions - [behance] add `date` metadata field ([#4417](https://github.com/mikf/gallery-dl/issues/4417)) - [danbooru] support `booru.borvar.art` ([#4096](https://github.com/mikf/gallery-dl/issues/4096)) - [danbooru] support `donmai.moe` - [deviantart] add `is_original` metadata field ([#4559](https://github.com/mikf/gallery-dl/issues/4559)) - [e621] support `e6ai.net` ([#4320](https://github.com/mikf/gallery-dl/issues/4320)) - [exhentai] add `fav` option ([#4409](https://github.com/mikf/gallery-dl/issues/4409)) - [gelbooru_v02] support `xbooru.com` ([#4493](https://github.com/mikf/gallery-dl/issues/4493)) - [instagram] add `following` extractor ([#1848](https://github.com/mikf/gallery-dl/issues/1848)) - [pillowfort] support `/tagged/` URLs ([#4570](https://github.com/mikf/gallery-dl/issues/4570)) - [pornhub] add `gif` support ([#4463](https://github.com/mikf/gallery-dl/issues/4463)) - [reddit] add `previews` option ([#4322](https://github.com/mikf/gallery-dl/issues/4322)) - [redgifs] add `niches` extractor ([#4311](https://github.com/mikf/gallery-dl/issues/4311), [#4312](https://github.com/mikf/gallery-dl/issues/4312)) - [redgifs] support `order` parameter for user URLs ([#4583](https://github.com/mikf/gallery-dl/issues/4583)) - [twitter] add `user` extractor and `include` option ([#4275](https://github.com/mikf/gallery-dl/issues/4275)) - [twitter] add `tweet-endpoint` option ([#4307](https://github.com/mikf/gallery-dl/issues/4307)) - [twitter] add `date_original` metadata for retweets ([#4337](https://github.com/mikf/gallery-dl/issues/4337), [#4443](https://github.com/mikf/gallery-dl/issues/4443)) - [twitter] extract `source` metadata ([#4459](https://github.com/mikf/gallery-dl/issues/4459)) - [twitter] support `x.com` URLs ([#4452](https://github.com/mikf/gallery-dl/issues/4452)) #### Improvements - include `Referer` header in all HTTP requests ([#4490](https://github.com/mikf/gallery-dl/issues/4490), [#4518](https://github.com/mikf/gallery-dl/issues/4518)) (can be disabled with `referer` option) - [behance] show errors for mature content ([#4417](https://github.com/mikf/gallery-dl/issues/4417)) - [deviantart] re-add `quality` option and `/intermediary/` transform - [fantia] improve metadata extraction ([#4126](https://github.com/mikf/gallery-dl/issues/4126)) - [instagram] better error messages for invalid users ([#4606](https://github.com/mikf/gallery-dl/issues/4606)) - [mangadex] support multiple values for `lang` ([#4093](https://github.com/mikf/gallery-dl/issues/4093)) - [mastodon] support `/@USER/following` URLs ([#4608](https://github.com/mikf/gallery-dl/issues/4608)) - [moebooru] match search URLs with empty `tags` ([#4354](https://github.com/mikf/gallery-dl/issues/4354)) - [pillowfort] extract `b2_lg_url` media ([#4570](https://github.com/mikf/gallery-dl/issues/4570)) - [reddit] improve comment metadata ([#4482](https://github.com/mikf/gallery-dl/issues/4482)) - [reddit] ignore 
`/message/compose` URLs ([#4482](https://github.com/mikf/gallery-dl/issues/4482), [#4581](https://github.com/mikf/gallery-dl/issues/4581)) - [redgifs] provide `collection` metadata as separate field ([#4508](https://github.com/mikf/gallery-dl/issues/4508)) - [redgifs] match `gfycat` image URLs ([#4558](https://github.com/mikf/gallery-dl/issues/4558)) - [twitter] improve error messages for single Tweets ([#4369](https://github.com/mikf/gallery-dl/issues/4369)) #### Fixes - [acidimg] fix extraction - [architizer] fix extraction ([#4537](https://github.com/mikf/gallery-dl/issues/4537)) - [behance] fix and update `user` extractor ([#4417](https://github.com/mikf/gallery-dl/issues/4417)) - [behance] fix cookie usage ([#4417](https://github.com/mikf/gallery-dl/issues/4417)) - [behance] handle videos without `renditions` ([#4523](https://github.com/mikf/gallery-dl/issues/4523)) - [bunkr] fix media domain for `cdn9` ([#4386](https://github.com/mikf/gallery-dl/issues/4386), [#4412](https://github.com/mikf/gallery-dl/issues/4412)) - [bunkr] fix extracting `.wmv` files ([#4419](https://github.com/mikf/gallery-dl/issues/4419)) - [bunkr] fix media domain for `cdn-pizza.bunkr.ru` ([#4489](https://github.com/mikf/gallery-dl/issues/4489)) - [bunkr] fix extraction ([#4514](https://github.com/mikf/gallery-dl/issues/4514), [#4532](https://github.com/mikf/gallery-dl/issues/4532), [#4529](https://github.com/mikf/gallery-dl/issues/4529), [#4540](https://github.com/mikf/gallery-dl/issues/4540)) - [deviantart] fix full resolution URLs for non-downloadable images ([#293](https://github.com/mikf/gallery-dl/issues/293), [#4548](https://github.com/mikf/gallery-dl/issues/4548), [#4563](https://github.com/mikf/gallery-dl/issues/4563)) - [deviantart] fix shortened URLs ([#4316](https://github.com/mikf/gallery-dl/issues/4316)) - [deviantart] fix search ([#4384](https://github.com/mikf/gallery-dl/issues/4384)) - [deviantart] update Eclipse API endpoints ([#4553](https://github.com/mikf/gallery-dl/issues/4553), [#4615](https://github.com/mikf/gallery-dl/issues/4615)) - [deviantart] use private tokens for `is_mature` posts ([#4563](https://github.com/mikf/gallery-dl/issues/4563)) - [flickr] update default API credentials ([#4332](https://github.com/mikf/gallery-dl/issues/4332)) - [giantessbooru] fix extraction ([#4373](https://github.com/mikf/gallery-dl/issues/4373)) - [hiperdex] fix crash for titles containing Unicode characters ([#4325](https://github.com/mikf/gallery-dl/issues/4325)) - [hiperdex] fix `manga` metadata - [imagefap] fix pagination ([#3013](https://github.com/mikf/gallery-dl/issues/3013)) - [imagevenue] fix extraction ([#4473](https://github.com/mikf/gallery-dl/issues/4473)) - [instagram] fix private posts with long shortcodes ([#4362](https://github.com/mikf/gallery-dl/issues/4362)) - [instagram] fix video preview archive IDs ([#2135](https://github.com/mikf/gallery-dl/issues/2135), [#4455](https://github.com/mikf/gallery-dl/issues/4455)) - [instagram] handle exceptions due to missing media ([#4555](https://github.com/mikf/gallery-dl/issues/4555)) - [issuu] fix extraction ([#4420](https://github.com/mikf/gallery-dl/issues/4420)) - [jpgfish] update domain to `jpg1.su` ([#4494](https://github.com/mikf/gallery-dl/issues/4494)) - [kemonoparty] update `favorite` API endpoint ([#4522](https://github.com/mikf/gallery-dl/issues/4522)) - [lensdump] fix extraction ([#4352](https://github.com/mikf/gallery-dl/issues/4352)) - [mangakakalot] update domain - [reddit] fix `preview.redd.it` URLs 
([#4470](https://github.com/mikf/gallery-dl/issues/4470)) - [patreon] fix extraction ([#4547](https://github.com/mikf/gallery-dl/issues/4547)) - [pixiv] handle errors for private novels ([#4481](https://github.com/mikf/gallery-dl/issues/4481)) - [pornhub] fix extraction ([#4301](https://github.com/mikf/gallery-dl/issues/4301)) - [pururin] fix extraction ([#4375](https://github.com/mikf/gallery-dl/issues/4375)) - [subscribestar] fix preview detection ([#4468](https://github.com/mikf/gallery-dl/issues/4468)) - [twitter] fix crash on private user ([#4349](https://github.com/mikf/gallery-dl/issues/4349)) - [twitter] fix `TweetWithVisibilityResults` ([#4369](https://github.com/mikf/gallery-dl/issues/4369)) - [twitter] fix crash when `sortIndex` is undefined ([#4499](https://github.com/mikf/gallery-dl/issues/4499)) - [zerochan] fix `tags` extraction ([#4315](https://github.com/mikf/gallery-dl/issues/4315), [#4319](https://github.com/mikf/gallery-dl/issues/4319)) #### Removals - [gfycat] remove module - [shimmie2] remove `meme.museum` - ### Post Processors #### Changes - update `finalize` events - add `finalize-error` and `finalize-success` events that trigger depending on whether error(s) did or did not happen - change `finalize` to always trigger regardless of error status #### Additions - add `python` post processor - add `prepare-after` event ([#4083](https://github.com/mikf/gallery-dl/issues/4083)) - [ugoira] add `"framerate": "uniform"` ([#4421](https://github.com/mikf/gallery-dl/issues/4421)) #### Improvements - [ugoira] extend `ffmpeg-output` ([#4421](https://github.com/mikf/gallery-dl/issues/4421)) #### Fixes - [ugoira] restore `libx264-prevent-odd` ([#4407](https://github.com/mikf/gallery-dl/issues/4407)) - [ugoira] fix high frame rates ([#4421](https://github.com/mikf/gallery-dl/issues/4421)) - ### Downloaders #### Fixes - [http] close connection when file already exists ([#4403](https://github.com/mikf/gallery-dl/issues/4403)) - ### Options #### Additions - support `parent>child` categories for child extractor options, for example an `imgur` album from a `reddit` thread with `reddit>imgur` - implement `subconfigs` option ([#4440](https://github.com/mikf/gallery-dl/issues/4440)) - add `"ascii+"` as a special `path-restrict` value ([#4371](https://github.com/mikf/gallery-dl/issues/4371)) #### Removals - remove `pyopenssl` option - ### Tests #### Improvements - move extractor results into their own, separate files ([#4504](https://github.com/mikf/gallery-dl/issues/4504)) - include fallback URLs in content tests ([#3163](https://github.com/mikf/gallery-dl/issues/3163)) - various test method improvements - ### Miscellaneous #### Fixes - [formatter] use value of last alternative ([#4492](https://github.com/mikf/gallery-dl/issues/4492)) - fix imports when running `__main__.py` ([#4581](https://github.com/mikf/gallery-dl/issues/4581)) - fix symlink resolution in `__main__.py` - fix default Firefox user agent string ## 1.25.8 - 2023-07-15 ### Changes - update default User-Agent header to Firefox 115 ESR ### Additions - [gfycat] support `@me` user ([#3770](https://github.com/mikf/gallery-dl/issues/3770), [#4271](https://github.com/mikf/gallery-dl/issues/4271)) - [gfycat] implement login support ([#3770](https://github.com/mikf/gallery-dl/issues/3770), [#4271](https://github.com/mikf/gallery-dl/issues/4271)) - [reddit] notify users about registering an OAuth application ([#4292](https://github.com/mikf/gallery-dl/issues/4292)) - [twitter] add `ratelimit` option 
([#4251](https://github.com/mikf/gallery-dl/issues/4251)) - [twitter] use `TweetResultByRestId` endpoint that allows accessing single Tweets without login ([#4250](https://github.com/mikf/gallery-dl/issues/4250)) ### Fixes - [bunkr] use `.la` TLD for `media-files12` servers ([#4147](https://github.com/mikf/gallery-dl/issues/4147), [#4276](https://github.com/mikf/gallery-dl/issues/4276)) - [erome] ignore duplicate album IDs - [fantia] send `X-Requested-With` header ([#4273](https://github.com/mikf/gallery-dl/issues/4273)) - [gelbooru_v01] fix `source` metadata ([#4302](https://github.com/mikf/gallery-dl/issues/4302), [#4303](https://github.com/mikf/gallery-dl/issues/4303)) - [gelbooru_v01] update `vidyart` domain - [jpgfish] update domain to `jpeg.pet` - [mangaread] fix `tags` metadata extraction - [naverwebtoon] fix `comic` metadata extraction - [newgrounds] extract & pass auth token during login ([#4268](https://github.com/mikf/gallery-dl/issues/4268)) - [paheal] fix extraction ([#4262](https://github.com/mikf/gallery-dl/issues/4262), [#4293](https://github.com/mikf/gallery-dl/issues/4293)) - [paheal] unescape `source` - [philomena] fix `--range` ([#4288](https://github.com/mikf/gallery-dl/issues/4288)) - [philomena] handle `429 Too Many Requests` errors ([#4288](https://github.com/mikf/gallery-dl/issues/4288)) - [pornhub] set `accessAgeDisclaimerPH` cookie ([#4301](https://github.com/mikf/gallery-dl/issues/4301)) - [reddit] use 0.6s delay between API requests ([#4292](https://github.com/mikf/gallery-dl/issues/4292)) - [seiga] set `skip_fetish_warning` cookie ([#4242](https://github.com/mikf/gallery-dl/issues/4242)) - [slideshare] fix extraction - [twitter] fix `following` extractor not getting all users ([#4287](https://github.com/mikf/gallery-dl/issues/4287)) - [twitter] use GraphQL search endpoint by default ([#4264](https://github.com/mikf/gallery-dl/issues/4264)) - [twitter] do not treat missing `TimelineAddEntries` instruction as fatal ([#4278](https://github.com/mikf/gallery-dl/issues/4278)) - [weibo] fix cursor based pagination - [wikifeet] fix `tag` extraction ([#4289](https://github.com/mikf/gallery-dl/issues/4289), [#4291](https://github.com/mikf/gallery-dl/issues/4291)) ### Removals - [bcy] remove module - [lineblog] remove module ## 1.25.7 - 2023-07-02 ### Additions - [flickr] add 'exif' option - [flickr] add 'metadata' option ([#4227](https://github.com/mikf/gallery-dl/issues/4227)) - [mangapark] add 'source' option ([#3969](https://github.com/mikf/gallery-dl/issues/3969)) - [twitter] extend 'conversations' option ([#4211](https://github.com/mikf/gallery-dl/issues/4211)) ### Fixes - [furaffinity] improve 'description' HTML ([#4224](https://github.com/mikf/gallery-dl/issues/4224)) - [gelbooru_v01] fix '--range' ([#4167](https://github.com/mikf/gallery-dl/issues/4167)) - [hentaifox] fix titles containing '@' ([#4201](https://github.com/mikf/gallery-dl/issues/4201)) - [mangapark] update to v5 ([#3969](https://github.com/mikf/gallery-dl/issues/3969)) - [piczel] update API server address ([#4244](https://github.com/mikf/gallery-dl/issues/4244)) - [poipiku] improve error detection ([#4206](https://github.com/mikf/gallery-dl/issues/4206)) - [sankaku] improve warnings for unavailable posts - [senmanga] ensure download URLs have a scheme ([#4235](https://github.com/mikf/gallery-dl/issues/4235)) ## 1.25.6 - 2023-06-17 ### Additions - [blogger] download files from `lh*.googleusercontent.com` ([#4070](https://github.com/mikf/gallery-dl/issues/4070)) - [fantia] extract `plan` metadata 
([#2477](https://github.com/mikf/gallery-dl/issues/2477)) - [fantia] emit warning for non-visible content sections ([#4128](https://github.com/mikf/gallery-dl/issues/4128)) - [furaffinity] extract `favorite_id` metadata ([#4133](https://github.com/mikf/gallery-dl/issues/4133)) - [jschan] add generic extractors for jschan image boards ([#3447](https://github.com/mikf/gallery-dl/issues/3447)) - [kemonoparty] support `.su` TLDs ([#4139](https://github.com/mikf/gallery-dl/issues/4139)) - [pixiv:novel] add `novel-bookmark` extractor ([#4111](https://github.com/mikf/gallery-dl/issues/4111)) - [pixiv:novel] add `full-series` option ([#4111](https://github.com/mikf/gallery-dl/issues/4111)) - [postimage] add gallery support, update image extractor ([#3115](https://github.com/mikf/gallery-dl/issues/3115), [#4134](https://github.com/mikf/gallery-dl/issues/4134)) - [redgifs] support galleries ([#4021](https://github.com/mikf/gallery-dl/issues/4021)) - [twitter] extract `conversation_id` metadata ([#3839](https://github.com/mikf/gallery-dl/issues/3839)) - [vipergirls] add login support ([#4166](https://github.com/mikf/gallery-dl/issues/4166)) - [vipergirls] use API endpoints ([#4166](https://github.com/mikf/gallery-dl/issues/4166)) - [formatter] implement `H` conversion ([#4164](https://github.com/mikf/gallery-dl/issues/4164)) ### Fixes - [acidimg] fix extraction ([#4136](https://github.com/mikf/gallery-dl/issues/4136)) - [bunkr] update domain to bunkrr.su ([#4159](https://github.com/mikf/gallery-dl/issues/4159), [#4189](https://github.com/mikf/gallery-dl/issues/4189)) - [bunkr] fix video downloads - [fanbox] prevent exception due to missing embeds ([#4088](https://github.com/mikf/gallery-dl/issues/4088)) - [instagram] fix retrieving `/tagged` posts ([#4122](https://github.com/mikf/gallery-dl/issues/4122)) - [jpgfish] update domain to `jpg.pet` ([#4138](https://github.com/mikf/gallery-dl/issues/4138)) - [pixiv:novel] fix error with embeds extraction ([#4175](https://github.com/mikf/gallery-dl/issues/4175)) - [pornhub] improve redirect handling ([#4188](https://github.com/mikf/gallery-dl/issues/4188)) - [reddit] fix crash due to empty `crosspost_parent_lists` ([#4120](https://github.com/mikf/gallery-dl/issues/4120), [#4172](https://github.com/mikf/gallery-dl/issues/4172)) - [redgifs] update `search` URL pattern ([#4115](https://github.com/mikf/gallery-dl/issues/4115), [#4185](https://github.com/mikf/gallery-dl/issues/4185)) - [senmanga] fix and update ([#4160](https://github.com/mikf/gallery-dl/issues/4160)) - [twitter] use GraphQL API search endpoint ([#3942](https://github.com/mikf/gallery-dl/issues/3942)) - [wallhaven] improve HTTP error handling ([#4192](https://github.com/mikf/gallery-dl/issues/4192)) - [weibo] prevent fatal exception due to missing video data ([#4150](https://github.com/mikf/gallery-dl/issues/4150)) - [weibo] fix `.json` extension for some videos ## 1.25.5 - 2023-05-27 ### Additions - [8muses] add `parts` metadata field ([#3329](https://github.com/mikf/gallery-dl/issues/3329)) - [danbooru] add `date` metadata field ([#4047](https://github.com/mikf/gallery-dl/issues/4047)) - [e621] add `date` metadata field ([#4047](https://github.com/mikf/gallery-dl/issues/4047)) - [gofile] add basic password support ([#4056](https://github.com/mikf/gallery-dl/issues/4056)) - [imagechest] implement API support ([#4065](https://github.com/mikf/gallery-dl/issues/4065)) - [instagram] add `order-files` option ([#3993](https://github.com/mikf/gallery-dl/issues/3993), 
[#4017](https://github.com/mikf/gallery-dl/issues/4017)) - [instagram] add `order-posts` option ([#3993](https://github.com/mikf/gallery-dl/issues/3993), [#4017](https://github.com/mikf/gallery-dl/issues/4017)) - [instagram] add `metadata` option ([#3107](https://github.com/mikf/gallery-dl/issues/3107)) - [jpgfish] add `jpg.fishing` extractors ([#2657](https://github.com/mikf/gallery-dl/issues/2657), [#2719](https://github.com/mikf/gallery-dl/issues/2719)) - [lensdump] add `lensdump.com` extractors ([#2078](https://github.com/mikf/gallery-dl/issues/2078), [#4104](https://github.com/mikf/gallery-dl/issues/4104)) - [mangaread] add `mangaread.org` extractors ([#2425](https://github.com/mikf/gallery-dl/issues/2425), [#2781](https://github.com/mikf/gallery-dl/issues/2781)) - [misskey] add `favorite` extractor ([#3950](https://github.com/mikf/gallery-dl/issues/3950)) - [pixiv] add `novel` support ([#1241](https://github.com/mikf/gallery-dl/issues/1241), [#4044](https://github.com/mikf/gallery-dl/issues/4044)) - [reddit] support cross-posted media ([#887](https://github.com/mikf/gallery-dl/issues/887), [#3586](https://github.com/mikf/gallery-dl/issues/3586), [#3976](https://github.com/mikf/gallery-dl/issues/3976)) - [postprocessor:exec] support tilde expansion for `command` - [formatter] support slicing strings as bytes ([#4087](https://github.com/mikf/gallery-dl/issues/4087)) ### Fixes - [8muses] fix value of `album[url]` ([#3329](https://github.com/mikf/gallery-dl/issues/3329)) - [danbooru] refactor pagination logic ([#4002](https://github.com/mikf/gallery-dl/issues/4002)) - [fanbox] skip invalid posts ([#4088](https://github.com/mikf/gallery-dl/issues/4088)) - [gofile] automatically fetch `website-token` - [kemonoparty] fix kemono and coomer logins sharing the same cache ([#4098](https://github.com/mikf/gallery-dl/issues/4098)) - [newgrounds] add default delay between requests ([#4046](https://github.com/mikf/gallery-dl/issues/4046)) - [nsfwalbum] detect placeholder images - [poipiku] extract full `descriptions` ([#4066](https://github.com/mikf/gallery-dl/issues/4066)) - [tcbscans] update domain to `tcbscans.com` ([#4080](https://github.com/mikf/gallery-dl/issues/4080)) - [twitter] extract TwitPic URLs in text ([#3792](https://github.com/mikf/gallery-dl/issues/3792), [#3796](https://github.com/mikf/gallery-dl/issues/3796)) - [weibo] require numeric IDs to have length >= 10 ([#4059](https://github.com/mikf/gallery-dl/issues/4059)) - [ytdl] fix crash due to removed `no_color` attribute - [cookies] improve logging behavior ([#4050](https://github.com/mikf/gallery-dl/issues/4050)) ## 1.25.4 - 2023-05-07 ### Additions - [4chanarchives] add `thread` and `board` extractors ([#4012](https://github.com/mikf/gallery-dl/issues/4012)) - [foolfuuka] add `archive.palanq.win` - [imgur] add `favorite-folder` extractor ([#4016](https://github.com/mikf/gallery-dl/issues/4016)) - [mangadex] add `status` and `tags` metadata ([#4031](https://github.com/mikf/gallery-dl/issues/4031)) - allow selecting a domain with `--cookies-from-browser` - add `--cookies-export` command-line option - add `-C` as short option for `--cookies` - include exception type in config error messages ### Fixes - [exhentai] update sadpanda check - [imagechest] load all images when a "Load More" button is present ([#4028](https://github.com/mikf/gallery-dl/issues/4028)) - [imgur] fix bug causing some images/albums from user profiles and favorites to be ignored - [pinterest] update endpoint for related board pins - [pinterest] fix `pin.it` 
extractor - [ytdl] fix yt-dlp `--xff/--geo-bypass` tests ([#3989](https://github.com/mikf/gallery-dl/issues/3989)) ### Removals - [420chan] remove module - [foolfuuka] remove `archive.alice.al` and `tokyochronos.net` - [foolslide] remove `sensescans.com` - [nana] remove module ## 1.25.3 - 2023-04-30 ### Additions - [imagefap] extract `description` and `categories` metadata ([#3905](https://github.com/mikf/gallery-dl/issues/3905)) - [imxto] add `gallery` extractor ([#1289](https://github.com/mikf/gallery-dl/issues/1289)) - [itchio] add `game` extractor ([#3923](https://github.com/mikf/gallery-dl/issues/3923)) - [nitter] extract user IDs from encoded banner URLs - [pixiv] allow sorting search results by popularity ([#3970](https://github.com/mikf/gallery-dl/issues/3970)) - [reddit] match `preview.redd.it` URLs ([#3935](https://github.com/mikf/gallery-dl/issues/3935)) - [sankaku] support post URLs with MD5 hashes ([#3952](https://github.com/mikf/gallery-dl/issues/3952)) - [shimmie2] add generic extractors for Shimmie2 sites ([#3734](https://github.com/mikf/gallery-dl/issues/3734), [#943](https://github.com/mikf/gallery-dl/issues/943)) - [tumblr] add `day` extractor ([#3951](https://github.com/mikf/gallery-dl/issues/3951)) - [twitter] support `profile-conversation` entries ([#3938](https://github.com/mikf/gallery-dl/issues/3938)) - [vipergirls] add `thread` and `post` extractors ([#3812](https://github.com/mikf/gallery-dl/issues/3812), [#2720](https://github.com/mikf/gallery-dl/issues/2720), [#731](https://github.com/mikf/gallery-dl/issues/731)) - [downloader:http] add `consume-content` option ([#3748](https://github.com/mikf/gallery-dl/issues/3748)) ### Fixes - [2chen] update domain to sturdychan.help - [behance] fix extraction ([#3980](https://github.com/mikf/gallery-dl/issues/3980)) - [deviantart] retry downloads with private token ([#3941](https://github.com/mikf/gallery-dl/issues/3941)) - [imagefap] fix empty `tags` metadata - [manganelo] support arbitrary minor version separators ([#3972](https://github.com/mikf/gallery-dl/issues/3972)) - [nozomi] fix file URLs ([#3925](https://github.com/mikf/gallery-dl/issues/3925)) - [oauth] catch exceptions from `webbrowser.get()` ([#3947](https://github.com/mikf/gallery-dl/issues/3947)) - [pixiv] fix `pixivision` extraction - [reddit] ignore `id-max` value `"zik0zj"`/`2147483647` ([#3939](https://github.com/mikf/gallery-dl/issues/3939), [#3862](https://github.com/mikf/gallery-dl/issues/3862), [#3697](https://github.com/mikf/gallery-dl/issues/3697), [#3606](https://github.com/mikf/gallery-dl/issues/3606), [#3546](https://github.com/mikf/gallery-dl/issues/3546), [#3521](https://github.com/mikf/gallery-dl/issues/3521), [#3412](https://github.com/mikf/gallery-dl/issues/3412)) - [sankaku] sanitize `date:` tags ([#1790](https://github.com/mikf/gallery-dl/issues/1790)) - [tumblr] fix and update pagination logic ([#2191](https://github.com/mikf/gallery-dl/issues/2191)) - [twitter] fix `user` metadata when downloading quoted Tweets ([#3922](https://github.com/mikf/gallery-dl/issues/3922)) - [ytdl] fix crash due to `--geo-bypass` deprecation ([#3975](https://github.com/mikf/gallery-dl/issues/3975)) - [postprocessor:metadata] support putting keys in quotes - include more optional dependencies in executables ([#3907](https://github.com/mikf/gallery-dl/issues/3907)) ## 1.25.2 - 2023-04-15 ### Additions - [deviantart] add `public` option - [nitter] extract videos from `source` elements ([#3912](https://github.com/mikf/gallery-dl/issues/3912)) - [twitter] add 
`date_liked` and `date_bookmarked` metadata for liked and bookmarked Tweets ([#3816](https://github.com/mikf/gallery-dl/issues/3816)) - [urlshortener] add support for bit.ly & t.co ([#3841](https://github.com/mikf/gallery-dl/issues/3841)) - [downloader:http] add MIME type and signature for `.heic` files ([#3915](https://github.com/mikf/gallery-dl/issues/3915)) ### Fixes - [blogger] update regex to get the highest resolution URLs ([#3863](https://github.com/mikf/gallery-dl/issues/3863), [#3870](https://github.com/mikf/gallery-dl/issues/3870)) - [bunkr] update domain to `bunkr.la` ([#3813](https://github.com/mikf/gallery-dl/issues/3813), [#3877](https://github.com/mikf/gallery-dl/issues/3877)) - [deviantart] keep using private access tokens when requesting download URLs ([#3845](https://github.com/mikf/gallery-dl/issues/3845), [#3857](https://github.com/mikf/gallery-dl/issues/3857), [#3896](https://github.com/mikf/gallery-dl/issues/3896)) - [hentaifoundry] fix content filters ([#3887](https://github.com/mikf/gallery-dl/issues/3887)) - [hotleak] fix downloading of creators whose name starts with a category name ([#3871](https://github.com/mikf/gallery-dl/issues/3871)) - [imagechest] fix extraction ([#3914](https://github.com/mikf/gallery-dl/issues/3914)) - [realbooru] fix extraction ([#2530](https://github.com/mikf/gallery-dl/issues/2530)) - [sexcom] fix pagination ([#3906](https://github.com/mikf/gallery-dl/issues/3906)) - [sexcom] fix HD video extraction - [shopify] fix `collection` extractor ([#3866](https://github.com/mikf/gallery-dl/issues/3866), [#3868](https://github.com/mikf/gallery-dl/issues/3868)) - [twitter] update to bookmark timeline v2 ([#3859](https://github.com/mikf/gallery-dl/issues/3859), [#3854](https://github.com/mikf/gallery-dl/issues/3854)) - [twitter] warn about "withheld" Tweets and users ([#3864](https://github.com/mikf/gallery-dl/issues/3864)) ### Improvements - [danbooru] reduce number of API requests when fetching extended `metadata` - [deviantart:search] detect login redirects ([#3860](https://github.com/mikf/gallery-dl/issues/3860)) - [generic] write regular expressions without `x` flags - [mastodon] try to get account IDs without access token - [twitter] calculate `date` from Tweet IDs ## 1.25.1 - 2023-03-25 ### Additions - [nitter] support nitter.it ([#3819](https://github.com/mikf/gallery-dl/issues/3819)) - [twitter] add `hashtag` extractor ([#3783](https://github.com/mikf/gallery-dl/issues/3783)) - [twitter] support Tweet content with >280 characters - [formatter] support loading f-strings from template files ([#3800](https://github.com/mikf/gallery-dl/issues/3800)) - [formatter] support filesystem paths for `\fM` modules ([#3399](https://github.com/mikf/gallery-dl/issues/3399)) - [formatter] support putting keys in quotes (e.g. 
`user['name']`) ([#2559](https://github.com/mikf/gallery-dl/issues/2559)) - [postprocessor:metadata] add `skip` option ([#3786](https://github.com/mikf/gallery-dl/issues/3786)) ### Fixes - [output] set `errors=replace` for output streams ([#3765](https://github.com/mikf/gallery-dl/issues/3765)) - [gelbooru] extract favorites without needing cookies ([#3704](https://github.com/mikf/gallery-dl/issues/3704)) - [gelbooru] fix and improve `--range` for pools - [hiperdex] fix extraction ([#3768](https://github.com/mikf/gallery-dl/issues/3768)) - [naverwebtoon] fix extraction ([#3729](https://github.com/mikf/gallery-dl/issues/3729)) - [nitter] fix extraction for instances without user banners - [twitter] update API query hashes and parameters - [weibo] support `mix_media_info` entries ([#3793](https://github.com/mikf/gallery-dl/issues/3793)) - fix circular reference detection for `-K` ### Changes - update `globals` instead of overwriting the default ([#3773](https://github.com/mikf/gallery-dl/issues/3773)) ## 1.25.0 - 2023-03-11 ### Changes - [e621] split `e621` extractors from `danbooru` module ([#3425](https://github.com/mikf/gallery-dl/issues/3425)) - [deviantart] remove mature scraps warning ([#3691](https://github.com/mikf/gallery-dl/issues/3691)) - [deviantart] use `/collections/all` endpoint for favorites ([#3666](https://github.com/mikf/gallery-dl/issues/3666), [#3668](https://github.com/mikf/gallery-dl/issues/3668)) - [newgrounds] update default image and audio archive IDs to prevent ID overlap ([#3681](https://github.com/mikf/gallery-dl/issues/3681)) - rename `--ignore-config` to `--config-ignore` ### Extractors - [catbox] add `file` extractor ([#3570](https://github.com/mikf/gallery-dl/issues/3570)) - [deviantart] add `search` extractor ([#538](https://github.com/mikf/gallery-dl/issues/538), [#1264](https://github.com/mikf/gallery-dl/issues/1264), [#2954](https://github.com/mikf/gallery-dl/issues/2954), [#2970](https://github.com/mikf/gallery-dl/issues/2970), [#3577](https://github.com/mikf/gallery-dl/issues/3577)) - [deviantart] add `gallery-search` extractor ([#1695](https://github.com/mikf/gallery-dl/issues/1695)) - [deviantart] support `fxdeviantart.com` URLs ([#3740](https://github.com/mikf/gallery-dl/issues/3740)) - [e621] implement `notes` and `pools` metadata extraction ([#3425](https://github.com/mikf/gallery-dl/issues/3425)) - [gelbooru] add `favorite` extractor ([#3704](https://github.com/mikf/gallery-dl/issues/3704)) - [imagetwist] support `phun.imagetwist.com` and `imagehaha.com` domains ([#3622](https://github.com/mikf/gallery-dl/issues/3622)) - [instagram] add `user` metadata field ([#3107](https://github.com/mikf/gallery-dl/issues/3107)) - [manganelo] update and fix metadata extraction - [manganelo] support mobile-only chapters - [mangasee] extract `author` and `genre` metadata ([#3703](https://github.com/mikf/gallery-dl/issues/3703)) - [misskey] add `misskey` extractors ([#3717](https://github.com/mikf/gallery-dl/issues/3717)) - [pornpics] add `gallery` and `search` extractors ([#263](https://github.com/mikf/gallery-dl/issues/263), [#3544](https://github.com/mikf/gallery-dl/issues/3544), [#3654](https://github.com/mikf/gallery-dl/issues/3654)) - [redgifs] support v3 URLs ([#3588](https://github.com/mikf/gallery-dl/issues/3588), 
[#3589](https://github.com/mikf/gallery-dl/issues/3589)) - [redgifs] add `collection` extractors ([#3427](https://github.com/mikf/gallery-dl/issues/3427), [#3662](https://github.com/mikf/gallery-dl/issues/3662)) - [shopify] support ohpolly.com ([#440](https://github.com/mikf/gallery-dl/issues/440), [#3596](https://github.com/mikf/gallery-dl/issues/3596)) - [szurubooru] add `tag` and `post` extractors ([#3583](https://github.com/mikf/gallery-dl/issues/3583), [#3713](https://github.com/mikf/gallery-dl/issues/3713)) - [twitter] add `transform` option ### Options - [postprocessor:metadata] add `sort` and `separators` options - [postprocessor:exec] implement archive options ([#3584](https://github.com/mikf/gallery-dl/issues/3584)) - add `--config-create` command-line option ([#2333](https://github.com/mikf/gallery-dl/issues/2333)) - add `--config-toml` command-line option to load config files in TOML format - add `output.stdout`, `output.stdin`, and `output.stderr` options ([#1621](https://github.com/mikf/gallery-dl/issues/1621), [#2152](https://github.com/mikf/gallery-dl/issues/2152), [#2529](https://github.com/mikf/gallery-dl/issues/2529)) - add `hash_md5` and `hash_sha1` functions ([#3679](https://github.com/mikf/gallery-dl/issues/3679)) - implement `globals` option to enable defining custom functions for `eval` statements - implement `archive-pragma` option to use SQLite PRAGMA statements - implement `actions` to trigger events on logging messages ([#3338](https://github.com/mikf/gallery-dl/issues/3338), [#3630](https://github.com/mikf/gallery-dl/issues/3630)) - implement ability to load external extractor classes - `-X/--extractors` command-line options - `extractor.modules-sources` config option ### Fixes - [bunkr] fix extraction ([#3636](https://github.com/mikf/gallery-dl/issues/3636), [#3655](https://github.com/mikf/gallery-dl/issues/3655)) - [danbooru] send gallery-dl User-Agent ([#3665](https://github.com/mikf/gallery-dl/issues/3665)) - [deviantart] fix crash when handling deleted deviations in status updates ([#3656](https://github.com/mikf/gallery-dl/issues/3656)) - [fanbox] fix crash with missing images ([#3673](https://github.com/mikf/gallery-dl/issues/3673)) - [imagefap] update `gallery` URLs ([#3595](https://github.com/mikf/gallery-dl/issues/3595)) - [imagefap] fix infinite pagination loop ([#3594](https://github.com/mikf/gallery-dl/issues/3594)) - [imagefap] fix metadata extraction - [oauth] use default name for browsers without `name` attribute - [pinterest] unescape search terms ([#3621](https://github.com/mikf/gallery-dl/issues/3621)) - [pixiv] fix `--write-tags` for `"tags": "original"` ([#3675](https://github.com/mikf/gallery-dl/issues/3675)) - [poipiku] warn about incorrect passwords ([#3646](https://github.com/mikf/gallery-dl/issues/3646)) - [reddit] update `videos` option ([#3712](https://github.com/mikf/gallery-dl/issues/3712)) - [soundgasm] rewrite ([#3578](https://github.com/mikf/gallery-dl/issues/3578)) - [telegraph] fix extraction when images are not in
`<figure>` elements ([#3590](https://github.com/mikf/gallery-dl/issues/3590)) - [tumblr] raise more detailed errors for dashboard-only blogs ([#3628](https://github.com/mikf/gallery-dl/issues/3628)) - [twitter] fix some `original` retweets not downloading ([#3744](https://github.com/mikf/gallery-dl/issues/3744)) - [ytdl] fix `--parse-metadata` ([#3663](https://github.com/mikf/gallery-dl/issues/3663)) - [downloader:ytdl] prevent exception on empty results ### Improvements - [downloader:http] use `time.monotonic()` - [downloader:http] update `_http_retry` to accept a Python function ([#3569](https://github.com/mikf/gallery-dl/issues/3569)) - [postprocessor:metadata] speed up JSON encoding - replace `json.loads/dumps` with direct calls to `JSONDecoder.decode/JSONEncoder.encode` - improve `option.Formatter` performance ### Removals - [nitter] remove `nitter.pussthecat.org` ## 1.24.5 - 2023-01-28 ### Additions - [booru] add `url` option - [danbooru] extend `metadata` option ([#3505](https://github.com/mikf/gallery-dl/issues/3505)) - [deviantart] add extractor for status updates ([#3539](https://github.com/mikf/gallery-dl/issues/3539), [#3541](https://github.com/mikf/gallery-dl/issues/3541)) - [deviantart] add support for `/deviation/` and `fav.me` URLs ([#3558](https://github.com/mikf/gallery-dl/issues/3558), [#3560](https://github.com/mikf/gallery-dl/issues/3560)) - [kemonoparty] extract `hash` metadata for discord files ([#3531](https://github.com/mikf/gallery-dl/issues/3531)) - [lexica] add `search` extractor ([#3567](https://github.com/mikf/gallery-dl/issues/3567)) - [mastodon] add `num` and `count` metadata fields ([#3517](https://github.com/mikf/gallery-dl/issues/3517)) - [nudecollect] add `image` and `album` extractors ([#2430](https://github.com/mikf/gallery-dl/issues/2430), [#2818](https://github.com/mikf/gallery-dl/issues/2818), [#3575](https://github.com/mikf/gallery-dl/issues/3575)) - [wikifeet] add `gallery` extractor ([#519](https://github.com/mikf/gallery-dl/issues/519), [#3537](https://github.com/mikf/gallery-dl/issues/3537)) - [downloader:http] add signature checks for `.blend`, `.obj`, and `.clip` files ([#3535](https://github.com/mikf/gallery-dl/issues/3535)) - add `extractor.retry-codes` option - add `-O/--postprocessor-option` command-line option ([#3565](https://github.com/mikf/gallery-dl/issues/3565)) - improve `write-pages` output ### Fixes - [bunkr] fix downloading `.mkv` and `.ts` files ([#3571](https://github.com/mikf/gallery-dl/issues/3571)) - [fantia] send `X-CSRF-Token` headers ([#3576](https://github.com/mikf/gallery-dl/issues/3576)) - [generic] fix regex for non-src image URLs ([#3555](https://github.com/mikf/gallery-dl/issues/3555)) - [hiperdex] update domain ([#3572](https://github.com/mikf/gallery-dl/issues/3572)) - [hotleak] fix video URLs ([#3516](https://github.com/mikf/gallery-dl/issues/3516), [#3525](https://github.com/mikf/gallery-dl/issues/3525), [#3563](https://github.com/mikf/gallery-dl/issues/3563), [#3581](https://github.com/mikf/gallery-dl/issues/3581)) - [instagram] always show `cursor` value after errors ([#3440](https://github.com/mikf/gallery-dl/issues/3440)) - [instagram] update API domain, headers, and csrf token handling - [oauth] show `client-id`/`api-key` values ([#3518](https://github.com/mikf/gallery-dl/issues/3518)) - [philomena] match URLs with www subdomain - [sankaku] update URL pattern ([#3523](https://github.com/mikf/gallery-dl/issues/3523)) - [twitter] refresh guest tokens ([#3445](https://github.com/mikf/gallery-dl/issues/3445), 
[#3458](https://github.com/mikf/gallery-dl/issues/3458)) - [twitter] fix search pagination ([#3536](https://github.com/mikf/gallery-dl/issues/3536), [#3534](https://github.com/mikf/gallery-dl/issues/3534), [#3549](https://github.com/mikf/gallery-dl/issues/3549)) - [twitter] use `"browser": "firefox"` by default ([#3522](https://github.com/mikf/gallery-dl/issues/3522)) ## 1.24.4 - 2023-01-11 ### Additions - [downloader:http] add `validate` option ### Fixes - [kemonoparty] fix regression from commit 473bd380 ([#3519](https://github.com/mikf/gallery-dl/issues/3519)) ## 1.24.3 - 2023-01-10 ### Additions - [danbooru] extract `uploader` metadata ([#3457](https://github.com/mikf/gallery-dl/issues/3457)) - [deviantart] initial implementation of username & password login for `scraps` ([#1029](https://github.com/mikf/gallery-dl/issues/1029)) - [fanleaks] add `post` and `model` extractors ([#3468](https://github.com/mikf/gallery-dl/issues/3468), [#3474](https://github.com/mikf/gallery-dl/issues/3474)) - [imagefap] add `folder` extractor ([#3504](https://github.com/mikf/gallery-dl/issues/3504)) - [lynxchan] support `bbw-chan.nl` ([#3456](https://github.com/mikf/gallery-dl/issues/3456), [#3463](https://github.com/mikf/gallery-dl/issues/3463)) - [pinterest] support `All Pins` boards ([#2855](https://github.com/mikf/gallery-dl/issues/2855), [#3484](https://github.com/mikf/gallery-dl/issues/3484)) - [pinterest] add `domain` option ([#3484](https://github.com/mikf/gallery-dl/issues/3484)) - [pixiv] implement `metadata-bookmark` option ([#3417](https://github.com/mikf/gallery-dl/issues/3417)) - [tcbscans] add `chapter` and `manga` extractors ([#3189](https://github.com/mikf/gallery-dl/issues/3189)) - [twitter] implement `syndication=extended` ([#3483](https://github.com/mikf/gallery-dl/issues/3483)) - implement slice notation for `range` options ([#918](https://github.com/mikf/gallery-dl/issues/918), [#2865](https://github.com/mikf/gallery-dl/issues/2865)) - allow `filter` options to be a list of expressions ### Fixes - [behance] use delay between requests ([#2507](https://github.com/mikf/gallery-dl/issues/2507)) - [bunkr] fix URLs returned by API ([#3481](https://github.com/mikf/gallery-dl/issues/3481)) - [fanbox] return `imageMap` files in order ([#2718](https://github.com/mikf/gallery-dl/issues/2718)) - [imagefap] use delay between requests ([#1140](https://github.com/mikf/gallery-dl/issues/1140)) - [imagefap] warn about redirects to `/human-verification` ([#1140](https://github.com/mikf/gallery-dl/issues/1140)) - [kemonoparty] reject invalid/empty files ([#3510](https://github.com/mikf/gallery-dl/issues/3510)) - [myhentaigallery] handle whitespace before title tag ([#3503](https://github.com/mikf/gallery-dl/issues/3503)) - [poipiku] fix extraction for a different warning button style ([#3493](https://github.com/mikf/gallery-dl/issues/3493), [#3460](https://github.com/mikf/gallery-dl/issues/3460)) - [poipiku] warn about login requirements - [telegraph] fix file URLs ([#3506](https://github.com/mikf/gallery-dl/issues/3506)) - [twitter] fix crash when using `expand` and `syndication` ([#3473](https://github.com/mikf/gallery-dl/issues/3473)) - [twitter] apply tweet type checks before uniqueness check ([#3439](https://github.com/mikf/gallery-dl/issues/3439), [#3455](https://github.com/mikf/gallery-dl/issues/3455)) - [twitter] force `https://` for TwitPic URLs ([#3449](https://github.com/mikf/gallery-dl/issues/3449)) - [ytdl] adapt to yt-dlp changes - update and improve documentation 
([#3453](https://github.com/mikf/gallery-dl/issues/3453), [#3462](https://github.com/mikf/gallery-dl/issues/3462), [#3496](https://github.com/mikf/gallery-dl/issues/3496)) ## 1.24.2 - 2022-12-18 ### Additions - [2chen] support `.club` URLs ([#3406](https://github.com/mikf/gallery-dl/issues/3406)) - [deviantart] extract sta.sh URLs from `text_content` ([#3366](https://github.com/mikf/gallery-dl/issues/3366)) - [deviantart] add `/view` URL support ([#3367](https://github.com/mikf/gallery-dl/issues/3367)) - [e621] implement `threshold` option to control pagination ([#3413](https://github.com/mikf/gallery-dl/issues/3413)) - [fapello] add `post`, `user` and `path` extractors ([#3065](https://github.com/mikf/gallery-dl/issues/3065), [#3360](https://github.com/mikf/gallery-dl/issues/3360), [#3415](https://github.com/mikf/gallery-dl/issues/3415)) - [imgur] add support for imgur.io URLs ([#3419](https://github.com/mikf/gallery-dl/issues/3419)) - [lynxchan] add generic extractors for lynxchan imageboards ([#3389](https://github.com/mikf/gallery-dl/issues/3389), [#3394](https://github.com/mikf/gallery-dl/issues/3394)) - [mangafox] extract more metadata ([#3167](https://github.com/mikf/gallery-dl/issues/3167)) - [pixiv] extract `date_url` metadata ([#3405](https://github.com/mikf/gallery-dl/issues/3405)) - [soundgasm] add `audio` and `user` extractors ([#3384](https://github.com/mikf/gallery-dl/issues/3384), [#3388](https://github.com/mikf/gallery-dl/issues/3388)) - [webmshare] add `video` extractor ([#2410](https://github.com/mikf/gallery-dl/issues/2410)) - support Firefox containers for `--cookies-from-browser` ([#3346](https://github.com/mikf/gallery-dl/issues/3346)) ### Fixes - [2chen] fix file URLs - [bunkr] update domain ([#3391](https://github.com/mikf/gallery-dl/issues/3391)) - [exhentai] fix pagination - [imagetwist] fix extraction - [imgth] rewrite - [instagram] prevent post `date` overwriting file `date` ([#3392](https://github.com/mikf/gallery-dl/issues/3392)) - [khinsider] fix metadata extraction - [komikcast] update domain and fix extraction - [reddit] increase `id-max` default value ([#3397](https://github.com/mikf/gallery-dl/issues/3397)) - [seiga] raise error when redirected to login page ([#3401](https://github.com/mikf/gallery-dl/issues/3401)) - [sexcom] fix video URLs ([#3408](https://github.com/mikf/gallery-dl/issues/3408), [#3414](https://github.com/mikf/gallery-dl/issues/3414)) - [twitter] update `search` pagination ([#544](https://github.com/mikf/gallery-dl/issues/544)) - [warosu] fix and update - [zerochan] update for layout v3 - restore paths for archived files ([#3362](https://github.com/mikf/gallery-dl/issues/3362), [#3377](https://github.com/mikf/gallery-dl/issues/3377)) - use `util.NONE` as `keyword-default` default value ([#3334](https://github.com/mikf/gallery-dl/issues/3334)) ### Removals - [foolslide] remove `kireicake` - [kissgoddess] remove module ## 1.24.1 - 2022-12-04 ### Additions - [artstation] add `pro-first` option ([#3273](https://github.com/mikf/gallery-dl/issues/3273)) - [artstation] add `max-posts` option ([#3270](https://github.com/mikf/gallery-dl/issues/3270)) - [fapachi] add `post` and `user` extractors ([#3339](https://github.com/mikf/gallery-dl/issues/3339), [#3347](https://github.com/mikf/gallery-dl/issues/3347)) - [inkbunny] provide additional metadata ([#3274](https://github.com/mikf/gallery-dl/issues/3274)) - [nitter] add `retweets` option ([#3278](https://github.com/mikf/gallery-dl/issues/3278)) - [nitter] add `videos` option 
([#3279](https://github.com/mikf/gallery-dl/issues/3279)) - [nitter] support `/i/web/` and `/i/user/` URLs ([#3310](https://github.com/mikf/gallery-dl/issues/3310)) - [pixhost] add `gallery` support ([#3336](https://github.com/mikf/gallery-dl/issues/3336), [#3353](https://github.com/mikf/gallery-dl/issues/3353)) - [weibo] add `count` metadata field ([#3305](https://github.com/mikf/gallery-dl/issues/3305)) - [downloader:http] add `retry-codes` option ([#3313](https://github.com/mikf/gallery-dl/issues/3313)) - [formatter] implement `S` format specifier to sort lists ([#3266](https://github.com/mikf/gallery-dl/issues/3266)) - implement `version-metadata` option ([#3201](https://github.com/mikf/gallery-dl/issues/3201)) ### Fixes - [2chen] fix extraction ([#3354](https://github.com/mikf/gallery-dl/issues/3354), [#3356](https://github.com/mikf/gallery-dl/issues/3356)) - [bcy] fix JSONDecodeError ([#3321](https://github.com/mikf/gallery-dl/issues/3321)) - [bunkr] fix video downloads ([#3326](https://github.com/mikf/gallery-dl/issues/3326), [#3335](https://github.com/mikf/gallery-dl/issues/3335)) - [bunkr] use `media-files` servers for more file types - [itaku] remove `Extreme` rating ([#3285](https://github.com/mikf/gallery-dl/issues/3285), [#3287](https://github.com/mikf/gallery-dl/issues/3287)) - [hitomi] apply format check for every image ([#3280](https://github.com/mikf/gallery-dl/issues/3280)) - [hotleak] fix UnboundLocalError ([#3288](https://github.com/mikf/gallery-dl/issues/3288), [#3293](https://github.com/mikf/gallery-dl/issues/3293)) - [nitter] sanitize filenames ([#3294](https://github.com/mikf/gallery-dl/issues/3294)) - [nitter] retry downloads on 404 ([#3313](https://github.com/mikf/gallery-dl/issues/3313)) - [nitter] set `hlsPlayback` cookie - [patreon] fix `403 Forbidden` errors ([#3341](https://github.com/mikf/gallery-dl/issues/3341)) - [patreon] improve `campaign_id` extraction ([#3235](https://github.com/mikf/gallery-dl/issues/3235)) - [patreon] update API query parameters - [pixiv] preserve `tags` order ([#3266](https://github.com/mikf/gallery-dl/issues/3266)) - [reddit] use `dash_url` for videos ([#3258](https://github.com/mikf/gallery-dl/issues/3258), [#3306](https://github.com/mikf/gallery-dl/issues/3306)) - [twitter] fix error when using user IDs for suspended accounts - [weibo] fix bug with empty `playback_list` ([#3301](https://github.com/mikf/gallery-dl/issues/3301)) - [downloader:http] fix potential `ZeroDivisionError` ([#3328](https://github.com/mikf/gallery-dl/issues/3328)) ### Removals - [lolisafe] remove `zz.ht` ## 1.24.0 - 2022-11-20 ### Additions - [exhentai] add metadata to search results ([#3181](https://github.com/mikf/gallery-dl/issues/3181)) - [gelbooru_v02] implement `notes` extraction - [instagram] add `guide` extractor ([#3192](https://github.com/mikf/gallery-dl/issues/3192)) - [lolisafe] add support for xbunkr ([#3153](https://github.com/mikf/gallery-dl/issues/3153), [#3156](https://github.com/mikf/gallery-dl/issues/3156)) - [mastodon] add `instance_remote` metadata field ([#3119](https://github.com/mikf/gallery-dl/issues/3119)) - [nitter] add extractors for Nitter instances ([#2415](https://github.com/mikf/gallery-dl/issues/2415), [#2696](https://github.com/mikf/gallery-dl/issues/2696)) - [pixiv] add support for new daily AI rankings category ([#3214](https://github.com/mikf/gallery-dl/issues/3214), [#3221](https://github.com/mikf/gallery-dl/issues/3221)) - [twitter] add `avatar` and `background` extractors 
([#349](https://github.com/mikf/gallery-dl/issues/349), [#3023](https://github.com/mikf/gallery-dl/issues/3023)) - [uploadir] add support for `uploadir.com` ([#3162](https://github.com/mikf/gallery-dl/issues/3162)) - [wallhaven] add `user` extractor ([#3212](https://github.com/mikf/gallery-dl/issues/3212), [#3213](https://github.com/mikf/gallery-dl/issues/3213), [#3226](https://github.com/mikf/gallery-dl/issues/3226)) - [downloader:http] add `chunk-size` option ([#3143](https://github.com/mikf/gallery-dl/issues/3143)) - [downloader:http] add file signature check for `.mp4` files - [downloader:http] add file signature check and MIME type for `.avif` files - [postprocessor] implement `post-after` event ([#3117](https://github.com/mikf/gallery-dl/issues/3117)) - [postprocessor:metadata] implement `"mode": "jsonl"` - [postprocessor:metadata] add `open`, `encoding`, and `private` options - add `--chunk-size` command-line option ([#3143](https://github.com/mikf/gallery-dl/issues/3143)) - add `--user-agent` command-line option - implement `http-metadata` option - implement `"user-agent": "browser"` ([#2636](https://github.com/mikf/gallery-dl/issues/2636)) ### Changes - [deviantart] restore cookies warning for mature scraps ([#3129](https://github.com/mikf/gallery-dl/issues/3129)) - [instagram] use REST API for unauthenticated users by default - [downloader:http] increase default `chunk-size` to 32768 bytes ([#3143](https://github.com/mikf/gallery-dl/issues/3143)) - build Windows executables using py2exe's new `freeze()` API - build executables on GitHub Actions with Python 3.11 - reword error text for unsupported URLs ### Fixes - [exhentai] fix pagination ([#3181](https://github.com/mikf/gallery-dl/issues/3181)) - [khinsider] fix extraction ([#3215](https://github.com/mikf/gallery-dl/issues/3215), [#3219](https://github.com/mikf/gallery-dl/issues/3219)) - [realbooru] fix download URLs ([#2530](https://github.com/mikf/gallery-dl/issues/2530)) - [realbooru] fix `tags` extraction ([#2530](https://github.com/mikf/gallery-dl/issues/2530)) - [tumblr] fall back to `gifv` when possible ([#3095](https://github.com/mikf/gallery-dl/issues/3095), [#3159](https://github.com/mikf/gallery-dl/issues/3159)) - [twitter] fix login ([#3220](https://github.com/mikf/gallery-dl/issues/3220)) - [twitter] update URL for syndication API ([#3160](https://github.com/mikf/gallery-dl/issues/3160)) - [weibo] send `Referer` headers ([#3188](https://github.com/mikf/gallery-dl/issues/3188)) - [ytdl] update `parse_bytes` location ([#3256](https://github.com/mikf/gallery-dl/issues/3256)) ### Improvements - [imxto] extract additional metadata ([#3118](https://github.com/mikf/gallery-dl/issues/3118), [#3175](https://github.com/mikf/gallery-dl/issues/3175)) - [instagram] allow downloading avatars for private profiles ([#3255](https://github.com/mikf/gallery-dl/issues/3255)) - [pixiv] raise error for invalid search/ranking parameters ([#3214](https://github.com/mikf/gallery-dl/issues/3214)) - [twitter] update `bookmarks` pagination ([#3172](https://github.com/mikf/gallery-dl/issues/3172)) - [downloader:http] refactor file signature checks - [downloader:http] improve `-r/--limit-rate` accuracy ([#3143](https://github.com/mikf/gallery-dl/issues/3143)) - add loaded config files to debug output - improve `-K` output for lists ### Removals - [instagram] remove login support ([#3139](https://github.com/mikf/gallery-dl/issues/3139), [#3141](https://github.com/mikf/gallery-dl/issues/3141), 
[#3191](https://github.com/mikf/gallery-dl/issues/3191)) - [instagram] remove `channel` extractor - [ngomik] remove module ## 1.23.5 - 2022-10-30 ### Fixes - [instagram] fix AttributeError on user stories extraction ([#3123](https://github.com/mikf/gallery-dl/issues/3123)) ## 1.23.4 - 2022-10-29 ### Additions - [aibooru] add support for aibooru.online ([#3075](https://github.com/mikf/gallery-dl/issues/3075)) - [instagram] add 'avatar' extractor ([#929](https://github.com/mikf/gallery-dl/issues/929), [#1097](https://github.com/mikf/gallery-dl/issues/1097), [#2992](https://github.com/mikf/gallery-dl/issues/2992)) - [instagram] support 'instagram.com/s/' highlight URLs ([#3076](https://github.com/mikf/gallery-dl/issues/3076)) - [instagram] extract 'coauthors' metadata ([#3107](https://github.com/mikf/gallery-dl/issues/3107)) - [mangasee] add support for 'mangalife' ([#3086](https://github.com/mikf/gallery-dl/issues/3086)) - [mastodon] add 'bookmark' extractor ([#3109](https://github.com/mikf/gallery-dl/issues/3109)) - [mastodon] support cross-instance user references and '/web/' URLs ([#3109](https://github.com/mikf/gallery-dl/issues/3109)) - [moebooru] implement 'notes' extraction ([#3094](https://github.com/mikf/gallery-dl/issues/3094)) - [pixiv] extend 'metadata' option ([#3057](https://github.com/mikf/gallery-dl/issues/3057)) - [reactor] match 'best', 'new', 'all' URLs ([#3073](https://github.com/mikf/gallery-dl/issues/3073)) - [smugloli] add 'smugloli' extractors ([#3060](https://github.com/mikf/gallery-dl/issues/3060)) - [tumblr] add 'fallback-delay' and 'fallback-retries' options ([#2957](https://github.com/mikf/gallery-dl/issues/2957)) - [vichan] add generic extractors for vichan imageboards ### Fixes - [bcy] fix extraction ([#3103](https://github.com/mikf/gallery-dl/issues/3103)) - [gelbooru] support alternate parameter order in post URLs ([#2821](https://github.com/mikf/gallery-dl/issues/2821)) - [hentai2read] support minor versions in chapter URLs ([#3089](https://github.com/mikf/gallery-dl/issues/3089)) - [hentaihere] support minor versions in chapter URLs - [kemonoparty] fix 'dms' extraction ([#3106](https://github.com/mikf/gallery-dl/issues/3106)) - [kemonoparty] update pagination offset - [manganelo] update domain to 'chapmanganato.com' ([#3097](https://github.com/mikf/gallery-dl/issues/3097)) - [pixiv] use 'exact_match_for_tags' as default search mode ([#3092](https://github.com/mikf/gallery-dl/issues/3092)) - [redgifs] fix 'token' extraction ([#3080](https://github.com/mikf/gallery-dl/issues/3080), [#3081](https://github.com/mikf/gallery-dl/issues/3081)) - [skeb] fix extraction ([#3112](https://github.com/mikf/gallery-dl/issues/3112)) - improve compatibility of DownloadArchive ([#3078](https://github.com/mikf/gallery-dl/issues/3078)) ## 1.23.3 - 2022-10-15 ### Additions - [2chen] Add `2chen.moe` extractor ([#2707](https://github.com/mikf/gallery-dl/issues/2707)) - [8chan] add `thread` and `board` extractors ([#2938](https://github.com/mikf/gallery-dl/issues/2938)) - [deviantart] add `group` option ([#3018](https://github.com/mikf/gallery-dl/issues/3018)) - [fanbox] add `content` metadata field ([#3020](https://github.com/mikf/gallery-dl/issues/3020)) - [instagram] restore `cursor` functionality ([#2991](https://github.com/mikf/gallery-dl/issues/2991)) - [instagram] restore warnings for private profiles ([#3004](https://github.com/mikf/gallery-dl/issues/3004), [#3045](https://github.com/mikf/gallery-dl/issues/3045)) - [nana] add `nana` extractors 
([#2967](https://github.com/mikf/gallery-dl/issues/2967)) - [nijie] add `feed` and `followed` extractors ([#3048](https://github.com/mikf/gallery-dl/issues/3048)) - [tumblr] support `https://www.tumblr.com/BLOGNAME` URLs ([#3034](https://github.com/mikf/gallery-dl/issues/3034)) - [tumblr] add `offset` option - [vk] add `tagged` extractor ([#2997](https://github.com/mikf/gallery-dl/issues/2997)) - add `path-extended` option ([#3021](https://github.com/mikf/gallery-dl/issues/3021)) - emit debug logging messages before calling time.sleep() ([#2982](https://github.com/mikf/gallery-dl/issues/2982)) ### Changes - [postprocessor:metadata] assume `"mode": "custom"` when `format` is given ### Fixes - [artstation] skip missing projects ([#3016](https://github.com/mikf/gallery-dl/issues/3016)) - [danbooru] fix ugoira metadata extraction ([#3056](https://github.com/mikf/gallery-dl/issues/3056)) - [deviantart] fix `deviation` extraction ([#2981](https://github.com/mikf/gallery-dl/issues/2981)) - [hitomi] fall back to `webp` when selected format is not available ([#3030](https://github.com/mikf/gallery-dl/issues/3030)) - [imagefap] fix and improve folder extraction and gallery pagination ([#3013](https://github.com/mikf/gallery-dl/issues/3013)) - [instagram] fix login ([#3011](https://github.com/mikf/gallery-dl/issues/3011), [#3015](https://github.com/mikf/gallery-dl/issues/3015)) - [nozomi] fix extraction ([#3051](https://github.com/mikf/gallery-dl/issues/3051)) - [redgifs] fix extraction ([#3037](https://github.com/mikf/gallery-dl/issues/3037)) - [tumblr] sleep between fallback retries ([#2957](https://github.com/mikf/gallery-dl/issues/2957)) - [vk] unescape error messages - fix duplicated metadata bug with `-j` ([#3033](https://github.com/mikf/gallery-dl/issues/3033)) - fix bug when processing input file comments ([#2808](https://github.com/mikf/gallery-dl/issues/2808)) ## 1.23.2 - 2022-10-01 ### Additions - [artstation] support search filters ([#2970](https://github.com/mikf/gallery-dl/issues/2970)) - [blogger] add `label` and `query` metadata fields ([#2930](https://github.com/mikf/gallery-dl/issues/2930)) - [exhentai] add a slash to the end of gallery URLs ([#2947](https://github.com/mikf/gallery-dl/issues/2947)) - [instagram] add `count` metadata field ([#2979](https://github.com/mikf/gallery-dl/issues/2979)) - [instagram] add `api` option - [kemonoparty] add `count` metadata field ([#2952](https://github.com/mikf/gallery-dl/issues/2952)) - [mastodon] warn about moved accounts ([#2939](https://github.com/mikf/gallery-dl/issues/2939)) - [newgrounds] add `games` extractor ([#2955](https://github.com/mikf/gallery-dl/issues/2955)) - [newgrounds] extract `type` metadata - [pixiv] add `series` extractor ([#2964](https://github.com/mikf/gallery-dl/issues/2964)) - [sankaku] implement `refresh` option ([#2958](https://github.com/mikf/gallery-dl/issues/2958)) - [skeb] add `search` extractor and `filters` option ([#2945](https://github.com/mikf/gallery-dl/issues/2945)) ### Fixes - [deviantart] fix extraction ([#2981](https://github.com/mikf/gallery-dl/issues/2981), [#2983](https://github.com/mikf/gallery-dl/issues/2983)) - [fappic] fix extraction - [instagram] extract higher-resolution photos ([#2666](https://github.com/mikf/gallery-dl/issues/2666)) - [instagram] fix `username` and `fullname` metadata for saved posts ([#2911](https://github.com/mikf/gallery-dl/issues/2911)) - [instagram] update API headers - [kemonoparty] send `Referer` headers ([#2989](https://github.com/mikf/gallery-dl/issues/2989), 
[#2990](https://github.com/mikf/gallery-dl/issues/2990)) - [kemonoparty] restore `favorites` API endpoints ([#2994](https://github.com/mikf/gallery-dl/issues/2994)) - [myportfolio] use fallback when no images are found ([#2959](https://github.com/mikf/gallery-dl/issues/2959)) - [plurk] fix extraction ([#2977](https://github.com/mikf/gallery-dl/issues/2977)) - [sankaku] detect expired links ([#2958](https://github.com/mikf/gallery-dl/issues/2958)) - [tumblr] retry extraction of failed higher-resolution images ([#2957](https://github.com/mikf/gallery-dl/issues/2957)) ## 1.23.1 - 2022-09-18 ### Additions - [flickr] add support for `secure.flickr.com` URLs ([#2910](https://github.com/mikf/gallery-dl/issues/2910)) - [hotleak] add hotleak extractors ([#2890](https://github.com/mikf/gallery-dl/issues/2890), [#2909](https://github.com/mikf/gallery-dl/issues/2909)) - [instagram] add `highlight_title` and `date` metadata for highlight downloads ([#2879](https://github.com/mikf/gallery-dl/issues/2879)) - [paheal] add support for videos ([#2892](https://github.com/mikf/gallery-dl/issues/2892)) - [tumblr] fetch high-quality inline images ([#2877](https://github.com/mikf/gallery-dl/issues/2877)) - [tumblr] implement `ratelimit` option ([#2919](https://github.com/mikf/gallery-dl/issues/2919)) - [twitter] add general support for unified cards ([#2875](https://github.com/mikf/gallery-dl/issues/2875)) - [twitter] implement `cards-blacklist` option ([#2875](https://github.com/mikf/gallery-dl/issues/2875)) - [zerochan] add `metadata` option ([#2861](https://github.com/mikf/gallery-dl/issues/2861)) - [postprocessor:zip] implement `files` option ([#2872](https://github.com/mikf/gallery-dl/issues/2872)) ### Fixes - [bunkr] fix extraction ([#2903](https://github.com/mikf/gallery-dl/issues/2903)) - [bunkr] use `media-files` servers for `m4v` and `mov` downloads ([#2925](https://github.com/mikf/gallery-dl/issues/2925)) - [exhentai] improve 509.gif detection ([#2901](https://github.com/mikf/gallery-dl/issues/2901)) - [exhentai] guess extension for original files ([#2842](https://github.com/mikf/gallery-dl/issues/2842)) - [poipiku] use `img-org.poipiku.com` as image domain ([#2796](https://github.com/mikf/gallery-dl/issues/2796)) - [reddit] prevent exception with empty submission URLs ([#2913](https://github.com/mikf/gallery-dl/issues/2913)) - [redgifs] fix download URLs ([#2884](https://github.com/mikf/gallery-dl/issues/2884)) - [smugmug] update default API credentials ([#2881](https://github.com/mikf/gallery-dl/issues/2881)) - [twitter] provide proper `date` for syndication results ([#2920](https://github.com/mikf/gallery-dl/issues/2920)) - [twitter] fix new-style `/card_img/` URLs - remove all whitespace before comments after input file URLs ([#2808](https://github.com/mikf/gallery-dl/issues/2808)) ## 1.23.0 - 2022-08-28 ### Changes - [twitter] update `user` and `author` metadata fields - for URLs with a single username or ID like `https://twitter.com/USER` or a search with a single `from:` statement, `user` will now always refer to the user referenced in the URL. 
- for all other URLs like `https://twitter.com/i/bookmarks`, `user` and `author` refer to the same user - `author` will always refer to the original Tweet author - [twitter] update `quote_id` and `quote_by` metadata fields - `quote_id` is now non-zero for quoted Tweets and contains the Tweet ID of the quoting Tweet (was the other way round before) - `quote_by` is only defined for quoted Tweets like before, but now contains the screen name of the user quoting this Tweet - [skeb] improve archive IDs for thumbnails and article images ### Additions - [artstation] add `num` and `count` metadata fields ([#2764](https://github.com/mikf/gallery-dl/issues/2764)) - [catbox] add `album` extractor ([#2410](https://github.com/mikf/gallery-dl/issues/2410)) - [blogger] emit metadata for posts without files ([#2789](https://github.com/mikf/gallery-dl/issues/2789)) - [foolfuuka] update supported domains - [gelbooru] add support for `api_key` and `user_id` ([#2767](https://github.com/mikf/gallery-dl/issues/2767)) - [gelbooru] implement pagination for `pool` results ([#2853](https://github.com/mikf/gallery-dl/issues/2853)) - [instagram] add support for a user's saved collections ([#2769](https://github.com/mikf/gallery-dl/issues/2769)) - [instagram] provide `date` for directory format strings ([#2830](https://github.com/mikf/gallery-dl/issues/2830)) - [kemonoparty] add `favorites` option ([#2826](https://github.com/mikf/gallery-dl/issues/2826), [#2831](https://github.com/mikf/gallery-dl/issues/2831)) - [oauth] add `host` config option ([#2806](https://github.com/mikf/gallery-dl/issues/2806)) - [rule34] implement pagination for `pool` results ([#2853](https://github.com/mikf/gallery-dl/issues/2853)) - [skeb] add option to download `article` images ([#1031](https://github.com/mikf/gallery-dl/issues/1031)) - [tumblr] download higher-quality images ([#2761](https://github.com/mikf/gallery-dl/issues/2761)) - [tumblr] add `count` metadata field ([#2804](https://github.com/mikf/gallery-dl/issues/2804)) - [wallhaven] implement `metadata` option ([#2803](https://github.com/mikf/gallery-dl/issues/2803)) - [zerochan] add `tag` and `image` extractors ([#1434](https://github.com/mikf/gallery-dl/issues/1434)) - [zerochan] implement login with username & password ([#1434](https://github.com/mikf/gallery-dl/issues/1434)) - [postprocessor:metadata] implement `mode: modify` and `mode: delete` ([#2640](https://github.com/mikf/gallery-dl/issues/2640)) - [formatter] add `g` conversion for slugifying a string ([#2410](https://github.com/mikf/gallery-dl/issues/2410)) - [formatter] apply `:J` only to lists ([#2833](https://github.com/mikf/gallery-dl/issues/2833)) - implement `path-metadata` option ([#2734](https://github.com/mikf/gallery-dl/issues/2734)) - allow comments after input file URLs ([#2808](https://github.com/mikf/gallery-dl/issues/2808)) - add global `warnings` option to control `urllib3` warning behavior ([#2762](https://github.com/mikf/gallery-dl/issues/2762)) ### Fixes - [bunkr] fix extraction ([#2788](https://github.com/mikf/gallery-dl/issues/2788)) - [deviantart] use public access token for journals ([#2702](https://github.com/mikf/gallery-dl/issues/2702)) - [e621] fix extraction of `popular` posts - [fanbox] download cover images in original size ([#2784](https://github.com/mikf/gallery-dl/issues/2784)) - [mastodon] allow downloading without access token ([#2782](https://github.com/mikf/gallery-dl/issues/2782)) - [hitomi] update cache expiry time ([#2863](https://github.com/mikf/gallery-dl/issues/2863)) - [hitomi] 
fix error when number of tag results is a multiple of 25 ([#2870](https://github.com/mikf/gallery-dl/issues/2870)) - [mangahere] fix `page-reverse` option ([#2795](https://github.com/mikf/gallery-dl/issues/2795)) - [poipiku] fix posts with more than one image ([#2796](https://github.com/mikf/gallery-dl/issues/2796)) - [poipiku] update filter for static images ([#2796](https://github.com/mikf/gallery-dl/issues/2796)) - [slideshare] fix metadata extraction - [twitter] unescape `+` in search queries ([#2226](https://github.com/mikf/gallery-dl/issues/2226)) - [twitter] fall back to unfiltered search ([#2766](https://github.com/mikf/gallery-dl/issues/2766)) - [twitter] ignore invalid user entries ([#2850](https://github.com/mikf/gallery-dl/issues/2850)) - [vk] prevent exceptions for broken/invalid photos ([#2774](https://github.com/mikf/gallery-dl/issues/2774)) - [vsco] fix `collection` extraction - [weibo] prevent exception for missing `playback_list` ([#2792](https://github.com/mikf/gallery-dl/issues/2792)) - [weibo] prevent errors when paginating over album entries ([#2817](https://github.com/mikf/gallery-dl/issues/2817)) ## 1.22.4 - 2022-07-15 ### Additions - [instagram] add `pinned` metadata field ([#2752](https://github.com/mikf/gallery-dl/issues/2752)) - [itaku] categorize sections by group ([#1842](https://github.com/mikf/gallery-dl/issues/1842)) - [khinsider] extract `platform` metadata - [tumblr] support `/blog/view` URLs ([#2760](https://github.com/mikf/gallery-dl/issues/2760)) - [twitter] implement `strategy` option ([#2712](https://github.com/mikf/gallery-dl/issues/2712)) - [twitter] add `count` metadata field ([#2741](https://github.com/mikf/gallery-dl/issues/2741)) - [formatter] implement `O` format specifier ([#2736](https://github.com/mikf/gallery-dl/issues/2736)) - [postprocessor:mtime] add `value` option ([#2739](https://github.com/mikf/gallery-dl/issues/2739)) - add `--no-postprocessors` command-line option ([#2725](https://github.com/mikf/gallery-dl/issues/2725)) - implement `format-separator` option ([#2737](https://github.com/mikf/gallery-dl/issues/2737)) ### Changes - [pinterest] handle section pins with separate extractors ([#2684](https://github.com/mikf/gallery-dl/issues/2684)) - [postprocessor:ugoira] enable `mtime` by default ([#2714](https://github.com/mikf/gallery-dl/issues/2714)) ### Fixes - [bunkr] fix extraction ([#2732](https://github.com/mikf/gallery-dl/issues/2732)) - [hentaifoundry] fix metadata extraction - [itaku] fix user caching ([#1842](https://github.com/mikf/gallery-dl/issues/1842)) - [itaku] fix `date` parsing - [kemonoparty] ensure all files have an `extension` ([#2740](https://github.com/mikf/gallery-dl/issues/2740)) - [komikcast] update domain - [mangakakalot] update domain - [newgrounds] only attempt to login if necessary ([#2715](https://github.com/mikf/gallery-dl/issues/2715)) - [newgrounds] prevent exception on empty results ([#2727](https://github.com/mikf/gallery-dl/issues/2727)) - [nozomi] reduce memory consumption during searches ([#2754](https://github.com/mikf/gallery-dl/issues/2754)) - [pixiv] fix default `background` filenames - [sankaku] rewrite file URLs to s.sankakucomplex.com ([#2746](https://github.com/mikf/gallery-dl/issues/2746)) - [slideshare] fix `description` extraction - [twitter] ignore previously seen Tweets ([#2712](https://github.com/mikf/gallery-dl/issues/2712)) - [twitter] unescape HTML entities in `content` ([#2757](https://github.com/mikf/gallery-dl/issues/2757)) - [weibo] handle invalid or broken status objects - 
[postprocessor:zip] ensure target directory exists ([#2758](https://github.com/mikf/gallery-dl/issues/2758)) - make `brotli` an *optional* dependency ([#2716](https://github.com/mikf/gallery-dl/issues/2716)) - limit path length for `--write-pages` output on Windows ([#2733](https://github.com/mikf/gallery-dl/issues/2733)) ### Removals - [foolfuuka] remove archive.wakarimasen.moe ## 1.22.3 - 2022-06-28 ### Changes - [twitter] revert strategy changes for user URLs ([#2712](https://github.com/mikf/gallery-dl/issues/2712), [#2710](https://github.com/mikf/gallery-dl/issues/2710)) - update default User-Agent headers ## 1.22.2 - 2022-06-27 ### Additions - [cyberdrop] add fallback URLs ([#2668](https://github.com/mikf/gallery-dl/issues/2668)) - [horne] add support for horne.red ([#2700](https://github.com/mikf/gallery-dl/issues/2700)) - [itaku] add `gallery` and `image` extractors ([#1842](https://github.com/mikf/gallery-dl/issues/1842)) - [poipiku] add `user` and `post` extractors ([#1602](https://github.com/mikf/gallery-dl/issues/1602)) - [skeb] add `following` extractor ([#2698](https://github.com/mikf/gallery-dl/issues/2698)) - [twitter] implement `expand` option ([#2665](https://github.com/mikf/gallery-dl/issues/2665)) - [twitter] implement `csrf` option ([#2676](https://github.com/mikf/gallery-dl/issues/2676)) - [unsplash] add `collection_title` and `collection_id` metadata fields ([#2670](https://github.com/mikf/gallery-dl/issues/2670)) - [weibo] support `tabtype=video` listings ([#2601](https://github.com/mikf/gallery-dl/issues/2601)) - [formatter] implement slice operator as format specifier - support cygwin/BSD/etc for `--cookies-from-browser` ### Fixes - [instagram] improve metadata generated by `_parse_post_api()` ([#2695](https://github.com/mikf/gallery-dl/issues/2695), [#2660](https://github.com/mikf/gallery-dl/issues/2660)) - [instagram] fix `tag` extractor ([#2659](https://github.com/mikf/gallery-dl/issues/2659)) - [instagram] automatically invalidate expired login sessions - [twitter] fix pagination for conversation tweets - [twitter] improve `"replies": "self"` ([#2665](https://github.com/mikf/gallery-dl/issues/2665)) - [twitter] improve strategy for user URLs ([#2665](https://github.com/mikf/gallery-dl/issues/2665)) - [vk] take URLs from `*_src` entries ([#2535](https://github.com/mikf/gallery-dl/issues/2535)) - [weibo] fix URLs generated by `user` extractor ([#2601](https://github.com/mikf/gallery-dl/issues/2601)) - [weibo] fix retweets ([#2601](https://github.com/mikf/gallery-dl/issues/2601)) - [downloader:ytdl] update `_set_outtmpl()` ([#2692](https://github.com/mikf/gallery-dl/issues/2692)) - [formatter] fix `!j` conversion for non-serializable types ([#2624](https://github.com/mikf/gallery-dl/issues/2624)) - [snap] Fix missing libslang dependency ([#2655](https://github.com/mikf/gallery-dl/issues/2655)) ## 1.22.1 - 2022-06-04 ### Additions - [gfycat] add support for collections ([#2629](https://github.com/mikf/gallery-dl/issues/2629)) - [instagram] support specifying users by ID - [paheal] extract more metadata ([#2641](https://github.com/mikf/gallery-dl/issues/2641)) - [reddit] add `home` extractor ([#2614](https://github.com/mikf/gallery-dl/issues/2614)) - [weibo] support usernames in URLs ([#1662](https://github.com/mikf/gallery-dl/issues/1662)) - [weibo] support `livephoto` and `gif` files ([#2146](https://github.com/mikf/gallery-dl/issues/2146)) - [weibo] add support for several different `tabtype` listings ([#686](https://github.com/mikf/gallery-dl/issues/686), 
[#2601](https://github.com/mikf/gallery-dl/issues/2601)) - [postprocessor:metadata] write to stdout by setting filename to "-" ([#2624](https://github.com/mikf/gallery-dl/issues/2624)) - implement `output.ansi` option ([#2628](https://github.com/mikf/gallery-dl/issues/2628)) - support user-defined `output.mode` settings ([#2529](https://github.com/mikf/gallery-dl/issues/2529)) ### Changes - [readcomiconline] remove default `browser` setting ([#2625](https://github.com/mikf/gallery-dl/issues/2625)) - [weibo] switch to desktop API ([#2601](https://github.com/mikf/gallery-dl/issues/2601)) - fix command-line argument name of `--cookies-from-browser` ([#1606](https://github.com/mikf/gallery-dl/issues/1606), [#2630](https://github.com/mikf/gallery-dl/issues/2630)) ### Fixes - [bunkr] change domain to `app.bunkr.is` ([#2634](https://github.com/mikf/gallery-dl/issues/2634)) - [deviantart] fix folder listings with `"pagination": "manual"` ([#2488](https://github.com/mikf/gallery-dl/issues/2488)) - [gofile] fix 401 Unauthorized errors ([#2632](https://github.com/mikf/gallery-dl/issues/2632)) - [hypnohub] move to gelbooru_v02 instances ([#2631](https://github.com/mikf/gallery-dl/issues/2631)) - [instagram] fix and update extractors ([#2644](https://github.com/mikf/gallery-dl/issues/2644)) - [nozomi] remove slashes from search terms ([#2653](https://github.com/mikf/gallery-dl/issues/2653)) - [pixiv] include `.gif` in background fallback URLs ([#2495](https://github.com/mikf/gallery-dl/issues/2495)) - [sankaku] extend URL patterns ([#2647](https://github.com/mikf/gallery-dl/issues/2647)) - [subscribestar] fix `date` metadata ([#2642](https://github.com/mikf/gallery-dl/issues/2642)) ## 1.22.0 - 2022-05-25 ### Additions - [gelbooru_v01] add `favorite` extractor ([#2546](https://github.com/mikf/gallery-dl/issues/2546)) - [Instagram] add `tagged_users` to keywords for stories ([#2582](https://github.com/mikf/gallery-dl/issues/2582), [#2584](https://github.com/mikf/gallery-dl/issues/2584)) - [lolisafe] implement `domain` option ([#2575](https://github.com/mikf/gallery-dl/issues/2575)) - [naverwebtoon] support (best)challenge comics ([#2542](https://github.com/mikf/gallery-dl/issues/2542)) - [nijie] support /history_nuita.php listings ([#2541](https://github.com/mikf/gallery-dl/issues/2541)) - [pixiv] provide more data when `metadata` is enabled ([#2594](https://github.com/mikf/gallery-dl/issues/2594)) - [shopify] support several more sites by default ([#2089](https://github.com/mikf/gallery-dl/issues/2089)) - [twitter] extract alt texts as `description` ([#2617](https://github.com/mikf/gallery-dl/issues/2617)) - [twitter] recognize vxtwitter URLs ([#2621](https://github.com/mikf/gallery-dl/issues/2621)) - [weasyl] implement `metadata` option ([#2610](https://github.com/mikf/gallery-dl/issues/2610)) - implement `--cookies-from-browser` ([#1606](https://github.com/mikf/gallery-dl/issues/1606)) - implement `output.colors` options ([#2532](https://github.com/mikf/gallery-dl/issues/2532)) - implement string literals in replacement fields - support using extended format strings for archive keys ### Changes - [foolfuuka] match 4chan filenames ([#2577](https://github.com/mikf/gallery-dl/issues/2577)) - [pixiv] implement `include` option - provide `avatar`/`background` downloads as separate extractors ([#2495](https://github.com/mikf/gallery-dl/issues/2495)) - [twitter] use a better strategy for user URLs - [twitter] disable `cards` by default - delay directory creation 
([#2461](https://github.com/mikf/gallery-dl/issues/2461), [#2474](https://github.com/mikf/gallery-dl/issues/2474)) - flush writes to stdout/stderr ([#2529](https://github.com/mikf/gallery-dl/issues/2529)) - build executables on GitHub Actions with Python 3.10 ### Fixes - [artstation] use `"browser": "firefox"` by default ([#2527](https://github.com/mikf/gallery-dl/issues/2527)) - [imgur] prevent exception with empty albums ([#2557](https://github.com/mikf/gallery-dl/issues/2557)) - [instagram] report redirects to captcha challenges ([#2543](https://github.com/mikf/gallery-dl/issues/2543)) - [khinsider] fix metadata extraction ([#2611](https://github.com/mikf/gallery-dl/issues/2611)) - [mangafox] send Referer headers ([#2592](https://github.com/mikf/gallery-dl/issues/2592)) - [mangahere] send Referer headers ([#2592](https://github.com/mikf/gallery-dl/issues/2592)) - [mangasee] use randomly generated PHPSESSID cookie ([#2560](https://github.com/mikf/gallery-dl/issues/2560)) - [pixiv] make retrieving ugoira metadata non-fatal ([#2562](https://github.com/mikf/gallery-dl/issues/2562)) - [readcomiconline] update deobfuscation code ([#2481](https://github.com/mikf/gallery-dl/issues/2481)) - [realbooru] fix extraction ([#2530](https://github.com/mikf/gallery-dl/issues/2530)) - [vk] handle photos without width/height info ([#2535](https://github.com/mikf/gallery-dl/issues/2535)) - [vk] fix user ID extraction ([#2535](https://github.com/mikf/gallery-dl/issues/2535)) - [webtoons] extract real episode numbers ([#2591](https://github.com/mikf/gallery-dl/issues/2591)) - create missing directories for archive files ([#2597](https://github.com/mikf/gallery-dl/issues/2597)) - detect circular references with `-K` ([#2609](https://github.com/mikf/gallery-dl/issues/2609)) - replace "\f" in `--filename` arguments with a form feed character ([#2396](https://github.com/mikf/gallery-dl/issues/2396)) ### Removals - [gelbooru_v01] remove tlb.booru.org from supported domains ## 1.21.2 - 2022-04-27 ### Additions - [deviantart] implement `pagination` option ([#2488](https://github.com/mikf/gallery-dl/issues/2488)) - [pixiv] implement `background` option ([#623](https://github.com/mikf/gallery-dl/issues/623), [#1124](https://github.com/mikf/gallery-dl/issues/1124), [#2495](https://github.com/mikf/gallery-dl/issues/2495)) - [postprocessor:ugoira] report ffmpeg/mkvmerge errors ([#2487](https://github.com/mikf/gallery-dl/issues/2487)) ### Fixes - [cyberdrop] match cyberdrop.to URLs ([#2496](https://github.com/mikf/gallery-dl/issues/2496)) - [e621] fix 403 errors ([#2533](https://github.com/mikf/gallery-dl/issues/2533)) - [issuu] fix extraction ([#2483](https://github.com/mikf/gallery-dl/issues/2483)) - [mangadex] download from available chapters despite `externalUrl` ([#2503](https://github.com/mikf/gallery-dl/issues/2503)) - [photovogue] update domain and api endpoint ([#2494](https://github.com/mikf/gallery-dl/issues/2494)) - [sexcom] add fallback for empty files ([#2485](https://github.com/mikf/gallery-dl/issues/2485)) - [twitter] improve syndication video selection ([#2354](https://github.com/mikf/gallery-dl/issues/2354)) - [twitter] fix various syndication issues ([#2499](https://github.com/mikf/gallery-dl/issues/2499), [#2354](https://github.com/mikf/gallery-dl/issues/2354)) - [vk] fix extraction ([#2512](https://github.com/mikf/gallery-dl/issues/2512)) - [weibo] fix infinite retries for deleted accounts ([#2521](https://github.com/mikf/gallery-dl/issues/2521)) - [postprocessor:ugoira] use compatible paths with 
mkvmerge ([#2487](https://github.com/mikf/gallery-dl/issues/2487)) - [postprocessor:ugoira] do not auto-select the `image2` demuxer ([#2492](https://github.com/mikf/gallery-dl/issues/2492)) ## 1.21.1 - 2022-04-08 ### Additions - [gofile] add gofile.io extractor ([#2364](https://github.com/mikf/gallery-dl/issues/2364)) - [instagram] add `previews` option ([#2135](https://github.com/mikf/gallery-dl/issues/2135)) - [kemonoparty] add `duplicates` option ([#2440](https://github.com/mikf/gallery-dl/issues/2440)) - [pinterest] add extractor for created pins ([#2452](https://github.com/mikf/gallery-dl/issues/2452)) - [pinterest] support multiple files per pin ([#1619](https://github.com/mikf/gallery-dl/issues/1619), [#2452](https://github.com/mikf/gallery-dl/issues/2452)) - [telegraph] Add telegra.ph extractor ([#2312](https://github.com/mikf/gallery-dl/issues/2312)) - [twitter] add `syndication` option ([#2354](https://github.com/mikf/gallery-dl/issues/2354)) - [twitter] accept fxtwitter.com URLs ([#2484](https://github.com/mikf/gallery-dl/issues/2484)) - [downloader:http] support using an arbitrary method and sending POST data ([#2433](https://github.com/mikf/gallery-dl/issues/2433)) - [postprocessor:metadata] implement archive options ([#2421](https://github.com/mikf/gallery-dl/issues/2421)) - [postprocessor:ugoira] add `mtime` option ([#2307](https://github.com/mikf/gallery-dl/issues/2307)) - [postprocessor:ugoira] support setting timecodes with `mkvmerge` ([#1550](https://github.com/mikf/gallery-dl/issues/1550)) - [formatter] support evaluating f-string literals - add `--ugoira-conv-copy` command-line option ([#1550](https://github.com/mikf/gallery-dl/issues/1550)) - implement a `contains()` function for filter statements ([#2446](https://github.com/mikf/gallery-dl/issues/2446)) ### Fixes - [aryion] provide correct `date` metadata independent of DST - [furaffinity] fix search result pagination ([#2402](https://github.com/mikf/gallery-dl/issues/2402)) - [hitomi] update and fix metadata extraction ([#2444](https://github.com/mikf/gallery-dl/issues/2444)) - [kissgoddess] extract all images ([#2473](https://github.com/mikf/gallery-dl/issues/2473)) - [mangasee] unescape manga names ([#2454](https://github.com/mikf/gallery-dl/issues/2454)) - [newgrounds] update and fix pagination ([#2456](https://github.com/mikf/gallery-dl/issues/2456)) - [newgrounds] warn about age-restricted posts ([#2456](https://github.com/mikf/gallery-dl/issues/2456)) - [pinterest] do not force `m3u8_native` for video downloads ([#2436](https://github.com/mikf/gallery-dl/issues/2436)) - [twibooru] fix posts without `name` ([#2434](https://github.com/mikf/gallery-dl/issues/2434)) - [unsplash] replace dash with space in search API queries ([#2429](https://github.com/mikf/gallery-dl/issues/2429)) - [postprocessor:mtime] fix timestamps from datetime objects ([#2307](https://github.com/mikf/gallery-dl/issues/2307)) - fix yet another bug in `_check_cookies()` ([#2372](https://github.com/mikf/gallery-dl/issues/2372)) - fix loading/storing cookies without domain ## 1.21.0 - 2022-03-14 ### Additions - [fantia] add `num` enumeration index ([#2377](https://github.com/mikf/gallery-dl/issues/2377)) - [fantia] support "Blog Post" content ([#2381](https://github.com/mikf/gallery-dl/issues/2381)) - [imagebam] add support for /view/ paths ([#2378](https://github.com/mikf/gallery-dl/issues/2378)) - [kemonoparty] match beta.kemono.party URLs ([#2348](https://github.com/mikf/gallery-dl/issues/2348)) - [kissgoddess] add `gallery` and `model` 
extractors ([#1052](https://github.com/mikf/gallery-dl/issues/1052), [#2304](https://github.com/mikf/gallery-dl/issues/2304)) - [mememuseum] add `tag` and `post` extractors ([#2264](https://github.com/mikf/gallery-dl/issues/2264)) - [newgrounds] add `post_url` metadata field ([#2328](https://github.com/mikf/gallery-dl/issues/2328)) - [patreon] add `image_large` file type ([#2257](https://github.com/mikf/gallery-dl/issues/2257)) - [toyhouse] support `art` listings ([#1546](https://github.com/mikf/gallery-dl/issues/1546), [#2331](https://github.com/mikf/gallery-dl/issues/2331)) - [twibooru] add extractors for searches, galleries, and posts ([#2219](https://github.com/mikf/gallery-dl/issues/2219)) - [postprocessor:metadata] implement `mtime` option ([#2307](https://github.com/mikf/gallery-dl/issues/2307)) - [postprocessor:mtime] add `event` option ([#2307](https://github.com/mikf/gallery-dl/issues/2307)) - add fish shell completion ([#2363](https://github.com/mikf/gallery-dl/issues/2363)) - add `timedelta` class to global namespace in filter expressions ### Changes - [seiga] require authentication with `user_session` cookie ([#2372](https://github.com/mikf/gallery-dl/issues/2372)) - remove username & password login due to 2FA - refactor proxy support ([#2357](https://github.com/mikf/gallery-dl/issues/2357)) - allow gallery-dl proxy settings to overwrite environment proxies - allow specifying different proxies for data extraction and download ### Fixes - [bunkr] fix mp4 downloads ([#2239](https://github.com/mikf/gallery-dl/issues/2239)) - [fanbox] fetch data for each individual post ([#2388](https://github.com/mikf/gallery-dl/issues/2388)) - [hentaicosplays] send `Referer` header ([#2317](https://github.com/mikf/gallery-dl/issues/2317)) - [imagebam] set `nsfw_inter` cookie ([#2334](https://github.com/mikf/gallery-dl/issues/2334)) - [kemonoparty] limit default filename length ([#2373](https://github.com/mikf/gallery-dl/issues/2373)) - [mangadex] fix chapters without `translatedLanguage` ([#2352](https://github.com/mikf/gallery-dl/issues/2352)) - [newgrounds] fix video descriptions ([#2328](https://github.com/mikf/gallery-dl/issues/2328)) - [skeb] add `sent-requests` option ([#2322](https://github.com/mikf/gallery-dl/issues/2322), [#2330](https://github.com/mikf/gallery-dl/issues/2330)) - [slideshare] fix extraction - [subscribestar] unescape attachment URLs ([#2370](https://github.com/mikf/gallery-dl/issues/2370)) - [twitter] fix handling of 429 Too Many Requests responses ([#2339](https://github.com/mikf/gallery-dl/issues/2339)) - [twitter] warn about age-restricted Tweets ([#2354](https://github.com/mikf/gallery-dl/issues/2354)) - [twitter] handle Tweets with "softIntervention" entries - [twitter] update query hashes - fix another bug in `_check_cookies()` ([#2160](https://github.com/mikf/gallery-dl/issues/2160)) ## 1.20.5 - 2022-02-14 ### Additions - [furaffinity] add `layout` option ([#2277](https://github.com/mikf/gallery-dl/issues/2277)) - [lightroom] add Lightroom gallery extractor ([#2263](https://github.com/mikf/gallery-dl/issues/2263)) - [reddit] support standalone submissions on personal user pages ([#2301](https://github.com/mikf/gallery-dl/issues/2301)) - [redgifs] support i.redgifs.com URLs ([#2300](https://github.com/mikf/gallery-dl/issues/2300)) - [wallpapercave] add extractor for images and search results ([#2205](https://github.com/mikf/gallery-dl/issues/2205)) - add `signals-ignore` option ([#2296](https://github.com/mikf/gallery-dl/issues/2296)) ### Changes - [danbooru] 
merge `danbooru` and `e621` extractors - support `atfbooru` ([#2283](https://github.com/mikf/gallery-dl/issues/2283)) - remove support for old e621 tag search URLs ### Fixes - [furaffinity] improve new/old layout detection ([#2277](https://github.com/mikf/gallery-dl/issues/2277)) - [imgbox] fix ImgboxExtractor ([#2281](https://github.com/mikf/gallery-dl/issues/2281)) - [inkbunny] rename search parameters to their API equivalents - [kemonoparty] handle files without names ([#2276](https://github.com/mikf/gallery-dl/issues/2276)) - [twitter] fix extraction ([#2275](https://github.com/mikf/gallery-dl/issues/2275), [#2295](https://github.com/mikf/gallery-dl/issues/2295)) - [vk] fix infinite pagination loops ([#2297](https://github.com/mikf/gallery-dl/issues/2297)) - [downloader:ytdl] make `ImportError`s non-fatal ([#2273](https://github.com/mikf/gallery-dl/issues/2273)) ## 1.20.4 - 2022-02-06 ### Additions - [e621] add `favorite` extractor ([#2250](https://github.com/mikf/gallery-dl/issues/2250)) - [hitomi] add `format` option ([#2260](https://github.com/mikf/gallery-dl/issues/2260)) - [kohlchan] add Kohlchan extractors ([#2251](https://github.com/mikf/gallery-dl/issues/2251)) - [sexcom] add `pins` extractor ([#2265](https://github.com/mikf/gallery-dl/issues/2265)) - [twitter] add `warnings` option ([#2258](https://github.com/mikf/gallery-dl/issues/2258)) - add ability to disable TLS 1.2 ([#2243](https://github.com/mikf/gallery-dl/issues/2243)) - add examples for custom gelbooru instances ([#2262](https://github.com/mikf/gallery-dl/issues/2262)) ### Fixes - [bunkr] fix mp4 downloads ([#2239](https://github.com/mikf/gallery-dl/issues/2239)) - [gelbooru] improve and fix pagination ([#2230](https://github.com/mikf/gallery-dl/issues/2230), [#2232](https://github.com/mikf/gallery-dl/issues/2232)) - [hitomi] "fix" 403 errors ([#2260](https://github.com/mikf/gallery-dl/issues/2260)) - [kemonoparty] fix downloading smaller text files ([#2267](https://github.com/mikf/gallery-dl/issues/2267)) - [patreon] disable TLS 1.2 by default ([#2249](https://github.com/mikf/gallery-dl/issues/2249)) - [twitter] restore errors for protected timelines etc ([#2237](https://github.com/mikf/gallery-dl/issues/2237)) - [twitter] restore `logout` functionality ([#1719](https://github.com/mikf/gallery-dl/issues/1719)) - [twitter] provide fallback URLs for card images - [weibo] update pagination code ([#2244](https://github.com/mikf/gallery-dl/issues/2244)) ## 1.20.3 - 2022-01-26 ### Fixes - [kemonoparty] fix DMs extraction ([#2008](https://github.com/mikf/gallery-dl/issues/2008)) - [twitter] fix crash on Tweets with deleted quotes ([#2225](https://github.com/mikf/gallery-dl/issues/2225)) - [twitter] fix crash on suspended Tweets without `legacy` entry ([#2216](https://github.com/mikf/gallery-dl/issues/2216)) - [twitter] fix crash on unified cards without `type` - [twitter] prevent crash on invalid/deleted Retweets ([#2225](https://github.com/mikf/gallery-dl/issues/2225)) - [twitter] update query hashes ## 1.20.2 - 2022-01-24 ### Additions - [twitter] add `event` extractor (closes [#2109](https://github.com/mikf/gallery-dl/issues/2109)) - [twitter] support image_carousel_website unified cards - add `--source-address` command-line option ([#2206](https://github.com/mikf/gallery-dl/issues/2206)) - add environment variable syntax to formatting.md ([#2065](https://github.com/mikf/gallery-dl/issues/2065)) ### Changes - [twitter] changes to `cards` option - enable `cards` by default - require `cards` to be set to `"ytdl"` to 
invoke youtube-dl/yt-dlp on unsupported cards ### Fixes - [blogger] support new image domain ([#2204](https://github.com/mikf/gallery-dl/issues/2204)) - [gelbooru] improve video file detection ([#2188](https://github.com/mikf/gallery-dl/issues/2188)) - [hitomi] fix `tag` extraction ([#2189](https://github.com/mikf/gallery-dl/issues/2189)) - [instagram] fix highlights extraction ([#2197](https://github.com/mikf/gallery-dl/issues/2197)) - [mangadex] re-enable warning for external chapters ([#2193](https://github.com/mikf/gallery-dl/issues/2193)) - [newgrounds] set suitabilities filter before starting a search ([#2173](https://github.com/mikf/gallery-dl/issues/2173)) - [philomena] fix search parameter escaping ([#2215](https://github.com/mikf/gallery-dl/issues/2215)) - [reddit] allow downloading from quarantined subreddits ([#2180](https://github.com/mikf/gallery-dl/issues/2180)) - [sexcom] extend URL pattern ([#2220](https://github.com/mikf/gallery-dl/issues/2220)) - [twitter] update to GraphQL API ([#2212](https://github.com/mikf/gallery-dl/issues/2212)) ## 1.20.1 - 2022-01-08 ### Additions - [newgrounds] add `search` extractor ([#2161](https://github.com/mikf/gallery-dl/issues/2161)) ### Changes - restore `-d/--dest` functionality from before 1.20.0 ([#2148](https://github.com/mikf/gallery-dl/issues/2148)) - change short option for `--directory` to `-D` ### Fixes - [gelbooru] handle changed API response format ([#2157](https://github.com/mikf/gallery-dl/issues/2157)) - [hitomi] fix image URLs ([#2153](https://github.com/mikf/gallery-dl/issues/2153)) - [mangadex] fix extraction ([#2177](https://github.com/mikf/gallery-dl/issues/2177)) - [rule34] use `https://api.rule34.xxx` for API requests - fix cookie checks for patreon, fanbox, fantia - improve UNC path handling ([#2126](https://github.com/mikf/gallery-dl/issues/2126)) ## 1.20.0 - 2021-12-29 ### Additions - [500px] add `favorite` extractor ([#1927](https://github.com/mikf/gallery-dl/issues/1927)) - [exhentai] add `source` option - [fanbox] support pixiv redirects ([#2122](https://github.com/mikf/gallery-dl/issues/2122)) - [inkbunny] add `search` extractor ([#2094](https://github.com/mikf/gallery-dl/issues/2094)) - [kemonoparty] support coomer.party ([#2100](https://github.com/mikf/gallery-dl/issues/2100)) - [lolisafe] add generic album extractor for lolisafe/chibisafe instances ([#2038](https://github.com/mikf/gallery-dl/issues/2038), [#2105](https://github.com/mikf/gallery-dl/issues/2105)) - [rule34us] add `tag` and `post` extractors ([#1527](https://github.com/mikf/gallery-dl/issues/1527)) - add a generic extractor ([#735](https://github.com/mikf/gallery-dl/issues/735), [#683](https://github.com/mikf/gallery-dl/issues/683)) - add `-d/--directory` and `-f/--filename` command-line options - add `--sleep-request` and `--sleep-extractor` command-line options - allow specifying `sleep-*` options as string ### Changes - [cyberdrop] include file ID in default filenames - [hitomi] disable `metadata` by default - [kemonoparty] use `service` as subcategory ([#2147](https://github.com/mikf/gallery-dl/issues/2147)) - [kemonoparty] change default `files` order to `attachments,file,inline` ([#1991](https://github.com/mikf/gallery-dl/issues/1991)) - [output] write download progress indicator to stderr - [ytdl] prefer yt-dlp over youtube-dl ([#1850](https://github.com/mikf/gallery-dl/issues/1850), [#2028](https://github.com/mikf/gallery-dl/issues/2028)) - rename `--write-infojson` to `--write-info-json` ### Fixes - [500px] create directories per photo 
- [artstation] create directories per asset ([#2136](https://github.com/mikf/gallery-dl/issues/2136))
- [deviantart] use `/browse/newest` for most-recent searches ([#2096](https://github.com/mikf/gallery-dl/issues/2096))
- [hitomi] fix image URLs
- [instagram] fix error when PostPage data is not in GraphQL format ([#2037](https://github.com/mikf/gallery-dl/issues/2037))
- [instagram] match post URLs with usernames ([#2085](https://github.com/mikf/gallery-dl/issues/2085))
- [instagram] allow downloading specific stories ([#2088](https://github.com/mikf/gallery-dl/issues/2088))
- [furaffinity] warn when no session cookies were found
- [pixiv] respect date ranges in search URLs ([#2133](https://github.com/mikf/gallery-dl/issues/2133))
- [sexcom] fix and improve embed extraction ([#2145](https://github.com/mikf/gallery-dl/issues/2145))
- [tumblrgallery] fix extraction ([#2112](https://github.com/mikf/gallery-dl/issues/2112))
- [tumblrgallery] improve `id` extraction ([#2115](https://github.com/mikf/gallery-dl/issues/2115))
- [tumblrgallery] improve search pagination ([#2132](https://github.com/mikf/gallery-dl/issues/2132))
- [twitter] include `4096x4096` as a default image fallback ([#1881](https://github.com/mikf/gallery-dl/issues/1881), [#2107](https://github.com/mikf/gallery-dl/issues/2107))
- [ytdl] update argument parsing to latest yt-dlp changes ([#2124](https://github.com/mikf/gallery-dl/issues/2124))
- handle UNC paths ([#2113](https://github.com/mikf/gallery-dl/issues/2113))

## 1.19.3 - 2021-11-27
### Additions
- [dynastyscans] add `manga` extractor ([#2035](https://github.com/mikf/gallery-dl/issues/2035))
- [instagram] include user metadata for `tagged` downloads ([#2024](https://github.com/mikf/gallery-dl/issues/2024))
- [kemonoparty] implement `files` option ([#1991](https://github.com/mikf/gallery-dl/issues/1991))
- [kemonoparty] add `dms` option ([#2008](https://github.com/mikf/gallery-dl/issues/2008))
- [mangadex] always provide `artist`, `author`, and `group` metadata fields ([#2049](https://github.com/mikf/gallery-dl/issues/2049))
- [philomena] support furbooru.org ([#1995](https://github.com/mikf/gallery-dl/issues/1995))
- [reactor] support thatpervert.com ([#2029](https://github.com/mikf/gallery-dl/issues/2029))
- [shopify] support loungeunderwear.com ([#2053](https://github.com/mikf/gallery-dl/issues/2053))
- [skeb] add `thumbnails` option ([#2047](https://github.com/mikf/gallery-dl/issues/2047), [#2051](https://github.com/mikf/gallery-dl/issues/2051))
- [subscribestar] add `num` enumeration index ([#2040](https://github.com/mikf/gallery-dl/issues/2040))
- [subscribestar] emit metadata for posts without media ([#1569](https://github.com/mikf/gallery-dl/issues/1569))
- [ytdl] implement `cmdline-args` and `config-file` options to allow parsing ytdl command-line options ([#1680](https://github.com/mikf/gallery-dl/issues/1680))
- [formatter] implement `D` format specifier
- extend `blacklist`/`whitelist` syntax ([#2025](https://github.com/mikf/gallery-dl/issues/2025))
### Fixes
- [dynastyscans] provide `date` as datetime object ([#2050](https://github.com/mikf/gallery-dl/issues/2050))
- [exhentai] fix extraction for disowned galleries ([#2055](https://github.com/mikf/gallery-dl/issues/2055))
- [gelbooru] apply workaround for pagination limits
- [kemonoparty] skip duplicate files ([#2032](https://github.com/mikf/gallery-dl/issues/2032), [#1991](https://github.com/mikf/gallery-dl/issues/1991), [#1899](https://github.com/mikf/gallery-dl/issues/1899))
- [kemonoparty] provide `date`
metadata for gumroad ([#2007](https://github.com/mikf/gallery-dl/issues/2007)) - [mangoxo] fix metadata extraction - [twitter] distinguish between fatal & nonfatal errors ([#2020](https://github.com/mikf/gallery-dl/issues/2020)) - [twitter] fix extractor for direct image links ([#2030](https://github.com/mikf/gallery-dl/issues/2030)) - [webtoons] use download URLs that do not require a `Referer` header ([#2005](https://github.com/mikf/gallery-dl/issues/2005)) - [ytdl] improve error handling ([#1680](https://github.com/mikf/gallery-dl/issues/1680)) - [downloader:ytdl] prevent crash in `_progress_hook()` ([#1680](https://github.com/mikf/gallery-dl/issues/1680)) ### Removals - [seisoparty] remove module ## 1.19.2 - 2021-11-05 ### Additions - [kemonoparty] add `comments` option ([#1980](https://github.com/mikf/gallery-dl/issues/1980)) - [skeb] add `user` and `post` extractors ([#1031](https://github.com/mikf/gallery-dl/issues/1031), [#1971](https://github.com/mikf/gallery-dl/issues/1971)) - [twitter] add `pinned` option - support accessing environment variables and the current local datetime in format strings ([#1968](https://github.com/mikf/gallery-dl/issues/1968)) - add special type format strings to docs ([#1987](https://github.com/mikf/gallery-dl/issues/1987)) ### Fixes - [cyberdrop] fix video extraction ([#1993](https://github.com/mikf/gallery-dl/issues/1993)) - [deviantart] fix `index` values for stashed deviations - [gfycat] provide consistent `userName` values for `user` downloads ([#1962](https://github.com/mikf/gallery-dl/issues/1962)) - [gfycat] show warning when there are no available formats - [hitomi] fix image URLs ([#1975](https://github.com/mikf/gallery-dl/issues/1975), [#1982](https://github.com/mikf/gallery-dl/issues/1982), [#1988](https://github.com/mikf/gallery-dl/issues/1988)) - [instagram] update query hashes - [mangakakalot] update domain and fix extraction - [mangoxo] fix login and extraction - [reddit] prevent crash for galleries with no `media_metadata` ([#2001](https://github.com/mikf/gallery-dl/issues/2001)) - [redgifs] update to API v2 ([#1984](https://github.com/mikf/gallery-dl/issues/1984)) - fix calculating retry sleep times ([#1990](https://github.com/mikf/gallery-dl/issues/1990)) ## 1.19.1 - 2021-10-24 ### Additions - [inkbunny] add `following` extractor ([#515](https://github.com/mikf/gallery-dl/issues/515)) - [inkbunny] add `pool` extractor ([#1937](https://github.com/mikf/gallery-dl/issues/1937)) - [kemonoparty] add `discord` extractor ([#1827](https://github.com/mikf/gallery-dl/issues/1827), [#1940](https://github.com/mikf/gallery-dl/issues/1940)) - [nhentai] add `tag` extractor ([#1950](https://github.com/mikf/gallery-dl/issues/1950), [#1955](https://github.com/mikf/gallery-dl/issues/1955)) - [patreon] add `files` option ([#1935](https://github.com/mikf/gallery-dl/issues/1935)) - [picarto] add `gallery` extractor ([#1931](https://github.com/mikf/gallery-dl/issues/1931)) - [pixiv] add `sketch` extractor ([#1497](https://github.com/mikf/gallery-dl/issues/1497)) - [seisoparty] add `favorite` extractor ([#1906](https://github.com/mikf/gallery-dl/issues/1906)) - [twitter] add `size` option ([#1881](https://github.com/mikf/gallery-dl/issues/1881)) - [vk] add `album` extractor ([#474](https://github.com/mikf/gallery-dl/issues/474), [#1952](https://github.com/mikf/gallery-dl/issues/1952)) - [postprocessor:compare] add `equal` option ([#1592](https://github.com/mikf/gallery-dl/issues/1592)) ### Fixes - [cyberdrop] extract direct download URLs 
([#1943](https://github.com/mikf/gallery-dl/issues/1943)) - [deviantart] update `search` argument handling ([#1911](https://github.com/mikf/gallery-dl/issues/1911)) - [deviantart] full resolution for non-downloadable images ([#293](https://github.com/mikf/gallery-dl/issues/293)) - [furaffinity] unquote search queries ([#1958](https://github.com/mikf/gallery-dl/issues/1958)) - [inkbunny] match "long" URLs for pools and favorites ([#1937](https://github.com/mikf/gallery-dl/issues/1937)) - [kemonoparty] improve inline extraction ([#1899](https://github.com/mikf/gallery-dl/issues/1899)) - [mangadex] update parameter handling for API requests ([#1908](https://github.com/mikf/gallery-dl/issues/1908)) - [patreon] better filenames for `content` images ([#1954](https://github.com/mikf/gallery-dl/issues/1954)) - [redgifs][gfycat] provide fallback URLs ([#1962](https://github.com/mikf/gallery-dl/issues/1962)) - [downloader:ytdl] prevent crash in `_progress_hook()` - restore SOCKS support for Windows executables ## 1.19.0 - 2021-10-01 ### Additions - [aryion] add `tag` extractor ([#1849](https://github.com/mikf/gallery-dl/issues/1849)) - [desktopography] implement desktopography extractors ([#1740](https://github.com/mikf/gallery-dl/issues/1740)) - [deviantart] implement `auto-unwatch` option ([#1466](https://github.com/mikf/gallery-dl/issues/1466), [#1757](https://github.com/mikf/gallery-dl/issues/1757)) - [fantia] add `date` metadata field ([#1853](https://github.com/mikf/gallery-dl/issues/1853)) - [fappic] add `image` extractor ([#1898](https://github.com/mikf/gallery-dl/issues/1898)) - [gelbooru_v02] add `favorite` extractor ([#1834](https://github.com/mikf/gallery-dl/issues/1834)) - [kemonoparty] add `favorite` extractor ([#1824](https://github.com/mikf/gallery-dl/issues/1824)) - [kemonoparty] implement login with username & password ([#1824](https://github.com/mikf/gallery-dl/issues/1824)) - [mastodon] add `following` extractor ([#1891](https://github.com/mikf/gallery-dl/issues/1891)) - [mastodon] support specifying accounts by ID - [twitter] support `/with_replies` URLs ([#1833](https://github.com/mikf/gallery-dl/issues/1833)) - [twitter] add `quote_by` metadata field ([#1481](https://github.com/mikf/gallery-dl/issues/1481)) - [postprocessor:compare] extend `action` option ([#1592](https://github.com/mikf/gallery-dl/issues/1592)) - implement a download progress indicator ([#1519](https://github.com/mikf/gallery-dl/issues/1519)) - implement a `page-reverse` option ([#1854](https://github.com/mikf/gallery-dl/issues/1854)) - implement a way to specify extended format strings - allow specifying a minimum/maximum for `sleep-*` options ([#1835](https://github.com/mikf/gallery-dl/issues/1835)) - add a `--write-infojson` command-line option ### Changes - [cyberdrop] change directory name format ([#1871](https://github.com/mikf/gallery-dl/issues/1871)) - [instagram] update default delay to 6-12 seconds ([#1835](https://github.com/mikf/gallery-dl/issues/1835)) - [reddit] extend subcategory depending on input URL ([#1836](https://github.com/mikf/gallery-dl/issues/1836)) - move util.Formatter and util.PathFormat into their own modules ### Fixes - [artstation] use `/album/all` view for user portfolios ([#1826](https://github.com/mikf/gallery-dl/issues/1826)) - [aryion] update/improve pagination ([#1849](https://github.com/mikf/gallery-dl/issues/1849)) - [deviantart] fix bug with fetching premium content ([#1879](https://github.com/mikf/gallery-dl/issues/1879)) - [deviantart] update default archive_fmt for 
single deviations ([#1874](https://github.com/mikf/gallery-dl/issues/1874))
- [erome] send Referer header for file downloads ([#1829](https://github.com/mikf/gallery-dl/issues/1829))
- [hiperdex] fix extraction
- [kemonoparty] update file download URLs ([#1902](https://github.com/mikf/gallery-dl/issues/1902), [#1903](https://github.com/mikf/gallery-dl/issues/1903))
- [mangadex] fix extraction ([#1852](https://github.com/mikf/gallery-dl/issues/1852))
- [mangadex] fix retrieving chapters from "pornographic" titles ([#1908](https://github.com/mikf/gallery-dl/issues/1908))
- [nozomi] preserve case of search tags ([#1860](https://github.com/mikf/gallery-dl/issues/1860))
- [redgifs][gfycat] remove webtoken code ([#1907](https://github.com/mikf/gallery-dl/issues/1907))
- [twitter] ensure card entries have a `url` ([#1868](https://github.com/mikf/gallery-dl/issues/1868))
- implement a way to correctly shorten displayed filenames containing East Asian characters ([#1377](https://github.com/mikf/gallery-dl/issues/1377))

## 1.18.4 - 2021-09-04
### Additions
- [420chan] add `thread` and `board` extractors ([#1773](https://github.com/mikf/gallery-dl/issues/1773))
- [deviantart] add `tag` extractor ([#1803](https://github.com/mikf/gallery-dl/issues/1803))
- [deviantart] add `comments` option ([#1800](https://github.com/mikf/gallery-dl/issues/1800))
- [deviantart] implement an `auto-watch` option ([#1466](https://github.com/mikf/gallery-dl/issues/1466), [#1757](https://github.com/mikf/gallery-dl/issues/1757))
- [foolfuuka] add `gallery` extractor ([#1785](https://github.com/mikf/gallery-dl/issues/1785))
- [furaffinity] expand URL pattern for searches ([#1780](https://github.com/mikf/gallery-dl/issues/1780))
- [kemonoparty] automatically generate required DDoS-GUARD cookies ([#1779](https://github.com/mikf/gallery-dl/issues/1779))
- [nhentai] add `favorite` extractor ([#1814](https://github.com/mikf/gallery-dl/issues/1814))
- [shopify] support windsorstore.com ([#1793](https://github.com/mikf/gallery-dl/issues/1793))
- [twitter] add `url` to user objects ([#1787](https://github.com/mikf/gallery-dl/issues/1787), [#1532](https://github.com/mikf/gallery-dl/issues/1532))
- [twitter] expand t.co links in user descriptions ([#1787](https://github.com/mikf/gallery-dl/issues/1787), [#1532](https://github.com/mikf/gallery-dl/issues/1532))
- show a warning if an extractor doesn't yield any results ([#1428](https://github.com/mikf/gallery-dl/issues/1428), [#1759](https://github.com/mikf/gallery-dl/issues/1759))
- add a `j` format string conversion
- implement a `fallback` option ([#1770](https://github.com/mikf/gallery-dl/issues/1770))
- implement a `path-strip` option
### Changes
- [shopify] use API for product listings ([#1793](https://github.com/mikf/gallery-dl/issues/1793))
- update default User-Agent headers
### Fixes
- [deviantart] prevent exceptions for "empty" videos ([#1796](https://github.com/mikf/gallery-dl/issues/1796))
- [exhentai] improve image limits check ([#1808](https://github.com/mikf/gallery-dl/issues/1808))
- [inkbunny] fix extraction ([#1816](https://github.com/mikf/gallery-dl/issues/1816))
- [mangadex] prevent exceptions for manga without English title ([#1815](https://github.com/mikf/gallery-dl/issues/1815))
- [oauth] use defaults when config values are set to `null` ([#1778](https://github.com/mikf/gallery-dl/issues/1778))
- [pixiv] fix pixivision title extraction
- [reddit] delay RedditAPI initialization ([#1813](https://github.com/mikf/gallery-dl/issues/1813))
- [twitter] improve error
reporting ([#1759](https://github.com/mikf/gallery-dl/issues/1759)) - [twitter] fix issue when filtering quote tweets ([#1792](https://github.com/mikf/gallery-dl/issues/1792)) - [twitter] fix `logout` option ([#1719](https://github.com/mikf/gallery-dl/issues/1719)) ### Removals - [deviantart] remove the "you need session cookies to download mature scraps" warning ([#1777](https://github.com/mikf/gallery-dl/issues/1777), [#1776](https://github.com/mikf/gallery-dl/issues/1776)) - [foolslide] remove entry for kobato.hologfx.com ## 1.18.3 - 2021-08-13 ### Additions - [bbc] add `width` option ([#1706](https://github.com/mikf/gallery-dl/issues/1706)) - [danbooru] add `external` option ([#1747](https://github.com/mikf/gallery-dl/issues/1747)) - [furaffinity] add `external` option ([#1492](https://github.com/mikf/gallery-dl/issues/1492)) - [luscious] add `gif` option ([#1701](https://github.com/mikf/gallery-dl/issues/1701)) - [newgrounds] add `format` option ([#1729](https://github.com/mikf/gallery-dl/issues/1729)) - [reactor] add `gif` option ([#1701](https://github.com/mikf/gallery-dl/issues/1701)) - [twitter] warn about suspended accounts ([#1759](https://github.com/mikf/gallery-dl/issues/1759)) - [twitter] extend `replies` option ([#1254](https://github.com/mikf/gallery-dl/issues/1254)) - [twitter] add option to log out and retry when blocked ([#1719](https://github.com/mikf/gallery-dl/issues/1719)) - [wikieat] add `thread` and `board` extractors ([#1699](https://github.com/mikf/gallery-dl/issues/1699), [#1607](https://github.com/mikf/gallery-dl/issues/1607)) ### Changes - [instagram] increase default delay between HTTP requests from 5s to 8s ([#1732](https://github.com/mikf/gallery-dl/issues/1732)) ### Fixes - [bbc] improve image dimensions ([#1706](https://github.com/mikf/gallery-dl/issues/1706)) - [bbc] support multi-page gallery listings ([#1730](https://github.com/mikf/gallery-dl/issues/1730)) - [behance] fix `collection` extraction - [deviantart] get original files for GIF previews ([#1731](https://github.com/mikf/gallery-dl/issues/1731)) - [furaffinity] fix errors when using `category-transfer` ([#1274](https://github.com/mikf/gallery-dl/issues/1274)) - [hitomi] fix image URLs ([#1765](https://github.com/mikf/gallery-dl/issues/1765)) - [instagram] use custom User-Agent header for video downloads ([#1682](https://github.com/mikf/gallery-dl/issues/1682), [#1623](https://github.com/mikf/gallery-dl/issues/1623), [#1580](https://github.com/mikf/gallery-dl/issues/1580)) - [kemonoparty] fix username extraction ([#1750](https://github.com/mikf/gallery-dl/issues/1750)) - [kemonoparty] update file server domain ([#1764](https://github.com/mikf/gallery-dl/issues/1764)) - [newgrounds] fix errors when using `category-transfer` ([#1274](https://github.com/mikf/gallery-dl/issues/1274)) - [nsfwalbum] retry backend requests when extracting image URLs ([#1733](https://github.com/mikf/gallery-dl/issues/1733), [#1271](https://github.com/mikf/gallery-dl/issues/1271)) - [vk] prevent exception for empty/private profiles ([#1742](https://github.com/mikf/gallery-dl/issues/1742)) ## 1.18.2 - 2021-07-23 ### Additions - [bbc] add `gallery` and `programme` extractors ([#1706](https://github.com/mikf/gallery-dl/issues/1706)) - [comicvine] add extractor ([#1712](https://github.com/mikf/gallery-dl/issues/1712)) - [kemonoparty] add `max-posts` option ([#1674](https://github.com/mikf/gallery-dl/issues/1674)) - [kemonoparty] parse `o` query parameters ([#1674](https://github.com/mikf/gallery-dl/issues/1674)) - [mastodon] 
add `reblogs` and `replies` options ([#1669](https://github.com/mikf/gallery-dl/issues/1669)) - [pixiv] add extractor for `pixivision` articles ([#1672](https://github.com/mikf/gallery-dl/issues/1672)) - [ytdl] add experimental extractor for sites supported by youtube-dl ([#1680](https://github.com/mikf/gallery-dl/issues/1680), [#878](https://github.com/mikf/gallery-dl/issues/878)) - extend `parent-metadata` functionality ([#1687](https://github.com/mikf/gallery-dl/issues/1687), [#1651](https://github.com/mikf/gallery-dl/issues/1651), [#1364](https://github.com/mikf/gallery-dl/issues/1364)) - add `archive-prefix` option ([#1711](https://github.com/mikf/gallery-dl/issues/1711)) - add `url-metadata` option ([#1659](https://github.com/mikf/gallery-dl/issues/1659), [#1073](https://github.com/mikf/gallery-dl/issues/1073)) ### Changes - [kemonoparty] skip duplicated patreon files ([#1689](https://github.com/mikf/gallery-dl/issues/1689), [#1667](https://github.com/mikf/gallery-dl/issues/1667)) - [mangadex] use custom User-Agent header ([#1535](https://github.com/mikf/gallery-dl/issues/1535)) ### Fixes - [hitomi] fix image URLs ([#1679](https://github.com/mikf/gallery-dl/issues/1679)) - [imagevenue] fix extraction ([#1677](https://github.com/mikf/gallery-dl/issues/1677)) - [instagram] fix extraction of `/explore/tags/` posts ([#1666](https://github.com/mikf/gallery-dl/issues/1666)) - [moebooru] fix `tags` ending with a `+` when logged in ([#1702](https://github.com/mikf/gallery-dl/issues/1702)) - [naverwebtoon] fix comic extraction - [pururin] update domain and fix extraction - [vk] improve metadata extraction and URL pattern ([#1691](https://github.com/mikf/gallery-dl/issues/1691)) - [downloader:ytdl] fix `outtmpl` setting for yt-dlp ([#1680](https://github.com/mikf/gallery-dl/issues/1680)) ## 1.18.1 - 2021-07-04 ### Additions - [mangafox] add `manga` extractor ([#1633](https://github.com/mikf/gallery-dl/issues/1633)) - [mangasee] add `chapter` and `manga` extractors - [mastodon] implement `text-posts` option ([#1569](https://github.com/mikf/gallery-dl/issues/1569), [#1669](https://github.com/mikf/gallery-dl/issues/1669)) - [seisoparty] add `user` and `post` extractors ([#1635](https://github.com/mikf/gallery-dl/issues/1635)) - implement conditional directories ([#1394](https://github.com/mikf/gallery-dl/issues/1394)) - add `T` format string conversion ([#1646](https://github.com/mikf/gallery-dl/issues/1646)) - document format string syntax ### Changes - [twitter] set `retweet_id` for original retweets ([#1481](https://github.com/mikf/gallery-dl/issues/1481)) ### Fixes - [directlink] manually encode Referer URLs ([#1647](https://github.com/mikf/gallery-dl/issues/1647)) - [hiperdex] use domain from input URL - [kemonoparty] fix `username` extraction ([#1652](https://github.com/mikf/gallery-dl/issues/1652)) - [kemonoparty] warn about missing DDoS-GUARD cookies - [twitter] ensure guest tokens are returned as string ([#1665](https://github.com/mikf/gallery-dl/issues/1665)) - [webtoons] match arbitrary language codes ([#1643](https://github.com/mikf/gallery-dl/issues/1643)) - fix depth counter in UrlJob when specifying `-g` multiple times ## 1.18.0 - 2021-06-19 ### Additions - [foolfuuka] support `archive.wakarimasen.moe` ([#1595](https://github.com/mikf/gallery-dl/issues/1595)) - [mangadex] implement login with username & password ([#1535](https://github.com/mikf/gallery-dl/issues/1535)) - [mangadex] add extractor for a user's followed feed ([#1535](https://github.com/mikf/gallery-dl/issues/1535)) - 
[pixiv] support fetching privately followed users ([#1628](https://github.com/mikf/gallery-dl/issues/1628)) - implement conditional filenames ([#1394](https://github.com/mikf/gallery-dl/issues/1394)) - implement `filter` option for post processors ([#1460](https://github.com/mikf/gallery-dl/issues/1460)) - add `-T/--terminate` command-line option ([#1399](https://github.com/mikf/gallery-dl/issues/1399)) - add `-P/--postprocessor` command-line option ([#1583](https://github.com/mikf/gallery-dl/issues/1583)) ### Changes - [kemonoparty] update default filenames and archive IDs ([#1514](https://github.com/mikf/gallery-dl/issues/1514)) - [twitter] update default settings - change `retweets` and `quoted` options from `true` to `false` - change directory format for search results to the same as other extractors - require an argument for `--clear-cache` ### Fixes - [500px] update GraphQL queries - [furaffinity] improve metadata extraction ([#1630](https://github.com/mikf/gallery-dl/issues/1630)) - [hitomi] update image URL generation ([#1637](https://github.com/mikf/gallery-dl/issues/1637)) - [idolcomplex] improve and fix pagination ([#1594](https://github.com/mikf/gallery-dl/issues/1594), [#1601](https://github.com/mikf/gallery-dl/issues/1601)) - [instagram] fix login ([#1631](https://github.com/mikf/gallery-dl/issues/1631)) - [instagram] update query hashes - [mangadex] update to API v5 ([#1535](https://github.com/mikf/gallery-dl/issues/1535)) - [mangafox] improve URL pattern ([#1608](https://github.com/mikf/gallery-dl/issues/1608)) - [oauth] prevent exceptions when reporting errors ([#1603](https://github.com/mikf/gallery-dl/issues/1603)) - [philomena] fix tag escapes handling ([#1629](https://github.com/mikf/gallery-dl/issues/1629)) - [redgifs] update API server address ([#1632](https://github.com/mikf/gallery-dl/issues/1632)) - [sankaku] handle empty tags ([#1617](https://github.com/mikf/gallery-dl/issues/1617)) - [subscribestar] improve attachment filenames ([#1609](https://github.com/mikf/gallery-dl/issues/1609)) - [unsplash] update collections URL pattern ([#1627](https://github.com/mikf/gallery-dl/issues/1627)) - [postprocessor:metadata] handle dicts in `mode:tags` ([#1598](https://github.com/mikf/gallery-dl/issues/1598)) ## 1.17.5 - 2021-05-30 ### Additions - [kemonoparty] add `metadata` option ([#1548](https://github.com/mikf/gallery-dl/issues/1548)) - [kemonoparty] add `type` metadata field ([#1556](https://github.com/mikf/gallery-dl/issues/1556)) - [mangapark] recognize v2.mangapark URLs ([#1578](https://github.com/mikf/gallery-dl/issues/1578)) - [patreon] extract user-defined `tags` ([#1539](https://github.com/mikf/gallery-dl/issues/1539), [#1540](https://github.com/mikf/gallery-dl/issues/1540)) - [pillowfort] implement login with username & password ([#846](https://github.com/mikf/gallery-dl/issues/846)) - [pillowfort] add `inline` and `external` options ([#846](https://github.com/mikf/gallery-dl/issues/846)) - [pixiv] implement `max-posts` option ([#1558](https://github.com/mikf/gallery-dl/issues/1558)) - [pixiv] add `metadata` option ([#1551](https://github.com/mikf/gallery-dl/issues/1551)) - [twitter] add `text-tweets` option ([#570](https://github.com/mikf/gallery-dl/issues/570)) - [weibo] extend `retweets` option ([#1542](https://github.com/mikf/gallery-dl/issues/1542)) - [postprocessor:ugoira] support using the `image2` demuxer ([#1550](https://github.com/mikf/gallery-dl/issues/1550)) - [postprocessor:ugoira] add `repeat-last-frame` option 
([#1550](https://github.com/mikf/gallery-dl/issues/1550)) - support `XDG_CONFIG_HOME` ([#1545](https://github.com/mikf/gallery-dl/issues/1545)) - implement `parent-skip` and `"skip": "terminate"` options ([#1399](https://github.com/mikf/gallery-dl/issues/1399)) ### Changes - [twitter] resolve `t.co` URLs in `content` ([#1532](https://github.com/mikf/gallery-dl/issues/1532)) ### Fixes - [500px] update query hashes ([#1573](https://github.com/mikf/gallery-dl/issues/1573)) - [aryion] find text posts in `recursive=false` mode ([#1568](https://github.com/mikf/gallery-dl/issues/1568)) - [imagebam] fix extraction of NSFW images ([#1534](https://github.com/mikf/gallery-dl/issues/1534)) - [imgur] update URL patterns ([#1561](https://github.com/mikf/gallery-dl/issues/1561)) - [manganelo] update domain to `manganato.com` - [reactor] skip deleted/empty posts - [twitter] add missing retweet media entities ([#1555](https://github.com/mikf/gallery-dl/issues/1555)) - fix ISO 639-1 code for Japanese (`jp` -> `ja`) ## 1.17.4 - 2021-05-07 ### Additions - [gelbooru] add extractor for `/redirect.php` URLs ([#1530](https://github.com/mikf/gallery-dl/issues/1530)) - [inkbunny] add `favorite` extractor ([#1521](https://github.com/mikf/gallery-dl/issues/1521)) - add `output.skip` option - add an optional argument to `--clear-cache` to select which cache entries to remove ([#1230](https://github.com/mikf/gallery-dl/issues/1230)) ### Changes - [pixiv] update `translated-tags` option ([#1507](https://github.com/mikf/gallery-dl/issues/1507)) - rename to `tags` - accept `"japanese"`, `"translated"`, and `"original"` as values ### Fixes - [500px] update query hashes - [kemonoparty] fix download URLs ([#1514](https://github.com/mikf/gallery-dl/issues/1514)) - [imagebam] fix extraction - [instagram] update query hashes - [nozomi] update default archive-fmt for `tag` and `search` extractors ([#1529](https://github.com/mikf/gallery-dl/issues/1529)) - [pixiv] remove duplicate translated tags ([#1507](https://github.com/mikf/gallery-dl/issues/1507)) - [readcomiconline] change domain to `readcomiconline.li` ([#1517](https://github.com/mikf/gallery-dl/issues/1517)) - [sankaku] update invalid-token detection ([#1515](https://github.com/mikf/gallery-dl/issues/1515)) - fix crash when using `--no-download` with `--ugoira-conv` ([#1507](https://github.com/mikf/gallery-dl/issues/1507)) ## 1.17.3 - 2021-04-25 ### Additions - [danbooru] add option for extended metadata extraction ([#1458](https://github.com/mikf/gallery-dl/issues/1458)) - [fanbox] add extractors ([#1459](https://github.com/mikf/gallery-dl/issues/1459)) - [fantia] add extractors ([#1459](https://github.com/mikf/gallery-dl/issues/1459)) - [gelbooru] add an option to extract notes ([#1457](https://github.com/mikf/gallery-dl/issues/1457)) - [hentaicosplays] add extractor ([#907](https://github.com/mikf/gallery-dl/issues/907), [#1473](https://github.com/mikf/gallery-dl/issues/1473), [#1483](https://github.com/mikf/gallery-dl/issues/1483)) - [instagram] add extractor for `tagged` posts ([#1439](https://github.com/mikf/gallery-dl/issues/1439)) - [naverwebtoon] ignore non-comic images - [pixiv] also save untranslated tags when `translated-tags` is enabled ([#1501](https://github.com/mikf/gallery-dl/issues/1501)) - [shopify] support omgmiamiswimwear.com ([#1280](https://github.com/mikf/gallery-dl/issues/1280)) - implement `output.fallback` option - add archive format to InfoJob output ([#875](https://github.com/mikf/gallery-dl/issues/875)) - build executables with SOCKS proxy 
support ([#1424](https://github.com/mikf/gallery-dl/issues/1424)) ### Fixes - [500px] update query hashes - [8muses] fix JSON deobfuscation - [artstation] download `/4k/` images ([#1422](https://github.com/mikf/gallery-dl/issues/1422)) - [deviantart] fix pagination for Eclipse results ([#1444](https://github.com/mikf/gallery-dl/issues/1444)) - [deviantart] improve folder name matching ([#1451](https://github.com/mikf/gallery-dl/issues/1451)) - [erome] skip deleted albums ([#1447](https://github.com/mikf/gallery-dl/issues/1447)) - [exhentai] fix image limit detection ([#1437](https://github.com/mikf/gallery-dl/issues/1437)) - [exhentai] restore `limits` option ([#1487](https://github.com/mikf/gallery-dl/issues/1487)) - [gelbooru] fix tag category extraction ([#1455](https://github.com/mikf/gallery-dl/issues/1455)) - [instagram] update query hashes - [komikcast] fix extraction - [simplyhentai] fix extraction - [slideshare] fix extraction - [webtoons] update agegate/GDPR cookies ([#1431](https://github.com/mikf/gallery-dl/issues/1431)) - fix `category-transfer` option ### Removals - [yuki] remove module for yuki.la ## 1.17.2 - 2021-04-02 ### Additions - [deviantart] add support for posts from watched users ([#794](https://github.com/mikf/gallery-dl/issues/794)) - [manganelo] add `chapter` and `manga` extractors ([#1415](https://github.com/mikf/gallery-dl/issues/1415)) - [pinterest] add `search` extractor ([#1411](https://github.com/mikf/gallery-dl/issues/1411)) - [sankaku] add `tag_string` metadata field ([#1388](https://github.com/mikf/gallery-dl/issues/1388)) - [sankaku] add enumeration index for books ([#1388](https://github.com/mikf/gallery-dl/issues/1388)) - [tapas] add `series` and `episode` extractors ([#692](https://github.com/mikf/gallery-dl/issues/692)) - [tapas] implement login with username & password ([#692](https://github.com/mikf/gallery-dl/issues/692)) - [twitter] allow specifying a custom format for user results ([#1337](https://github.com/mikf/gallery-dl/issues/1337)) - [twitter] add extractor for direct image links ([#1417](https://github.com/mikf/gallery-dl/issues/1417)) - [vk] add support for albums ([#474](https://github.com/mikf/gallery-dl/issues/474)) ### Fixes - [aryion] unescape paths ([#1414](https://github.com/mikf/gallery-dl/issues/1414)) - [bcy] improve pagination - [deviantart] update `watch` URL pattern ([#794](https://github.com/mikf/gallery-dl/issues/794)) - [deviantart] fix arguments for search/popular results ([#1408](https://github.com/mikf/gallery-dl/issues/1408)) - [deviantart] use fallback for `/intermediary/` URLs - [exhentai] improve and simplify image limit checks - [komikcast] fix extraction - [pixiv] fix `favorite` URL pattern ([#1405](https://github.com/mikf/gallery-dl/issues/1405)) - [sankaku] simplify `pool` tags ([#1388](https://github.com/mikf/gallery-dl/issues/1388)) - [twitter] improve error message when trying to log in with 2FA ([#1409](https://github.com/mikf/gallery-dl/issues/1409)) - [twitter] don't use youtube-dl for cards when videos are disabled ([#1416](https://github.com/mikf/gallery-dl/issues/1416)) ## 1.17.1 - 2021-03-19 ### Additions - [architizer] add `project` and `firm` extractors ([#1369](https://github.com/mikf/gallery-dl/issues/1369)) - [deviantart] add `watch` extractor ([#794](https://github.com/mikf/gallery-dl/issues/794)) - [exhentai] support `/tag/` URLs ([#1363](https://github.com/mikf/gallery-dl/issues/1363)) - [gelbooru_v01] support `drawfriends.booru.org`, `vidyart.booru.org`, and `tlb.booru.org` by default - 
[nozomi] support `/index-N.html` URLs ([#1365](https://github.com/mikf/gallery-dl/issues/1365)) - [philomena] add generalized extractors for philomena sites ([#1379](https://github.com/mikf/gallery-dl/issues/1379)) - [philomena] support post URLs without `/images/` - [twitter] implement `users` option ([#1337](https://github.com/mikf/gallery-dl/issues/1337)) - implement `parent-metadata` option ([#1364](https://github.com/mikf/gallery-dl/issues/1364)) ### Changes - [deviantart] revert previous changes to `extra` option ([#1356](https://github.com/mikf/gallery-dl/issues/1356), [#1387](https://github.com/mikf/gallery-dl/issues/1387)) ### Fixes - [exhentai] improve favorites count extraction ([#1360](https://github.com/mikf/gallery-dl/issues/1360)) - [gelbooru] update domain for video downloads ([#1368](https://github.com/mikf/gallery-dl/issues/1368)) - [hentaifox] improve image and metadata extraction ([#1366](https://github.com/mikf/gallery-dl/issues/1366), [#1378](https://github.com/mikf/gallery-dl/issues/1378)) - [imgur] fix and improve rate limit handling ([#1386](https://github.com/mikf/gallery-dl/issues/1386)) - [weasyl] improve favorites URL pattern ([#1374](https://github.com/mikf/gallery-dl/issues/1374)) - use type check before applying `browser` option ([#1358](https://github.com/mikf/gallery-dl/issues/1358)) - ensure `-s/--simulate` always prints filenames ([#1360](https://github.com/mikf/gallery-dl/issues/1360)) ### Removals - [hentaicafe] remove module - [hentainexus] remove module - [mangareader] remove module - [mangastream] remove module ## 1.17.0 - 2021-03-05 ### Additions - [cyberdrop] add support for `https://cyberdrop.me/` ([#1328](https://github.com/mikf/gallery-dl/issues/1328)) - [exhentai] add `metadata` option; extract more metadata from gallery pages ([#1325](https://github.com/mikf/gallery-dl/issues/1325)) - [hentaicafe] add `search` and `tag` extractors ([#1345](https://github.com/mikf/gallery-dl/issues/1345)) - [hentainexus] add `original` option ([#1322](https://github.com/mikf/gallery-dl/issues/1322)) - [instagram] support `/user/reels/` URLs ([#1329](https://github.com/mikf/gallery-dl/issues/1329)) - [naverwebtoon] add support for `https://comic.naver.com/` ([#1331](https://github.com/mikf/gallery-dl/issues/1331)) - [pixiv] add `translated-tags` option ([#1354](https://github.com/mikf/gallery-dl/issues/1354)) - [tbib] add support for `https://tbib.org/` ([#473](https://github.com/mikf/gallery-dl/issues/473), [#1082](https://github.com/mikf/gallery-dl/issues/1082)) - [tumblrgallery] add support for `https://tumblrgallery.xyz/` ([#1298](https://github.com/mikf/gallery-dl/issues/1298)) - [twitter] add extractor for followed users ([#1337](https://github.com/mikf/gallery-dl/issues/1337)) - [twitter] add option to download all media from conversations ([#1319](https://github.com/mikf/gallery-dl/issues/1319)) - [wallhaven] add `collections` extractor ([#1351](https://github.com/mikf/gallery-dl/issues/1351)) - [snap] allow access to user's .netrc for site authentication ([#1352](https://github.com/mikf/gallery-dl/issues/1352)) - add extractors for Gelbooru v0.1 sites ([#234](https://github.com/mikf/gallery-dl/issues/234), [#426](https://github.com/mikf/gallery-dl/issues/426), [#473](https://github.com/mikf/gallery-dl/issues/473), [#767](https://github.com/mikf/gallery-dl/issues/767), [#1238](https://github.com/mikf/gallery-dl/issues/1238)) - add `-E/--extractor-info` command-line option ([#875](https://github.com/mikf/gallery-dl/issues/875)) - add GitHub Actions 
workflow for building standalone executables ([#1312](https://github.com/mikf/gallery-dl/issues/1312)) - add `browser` and `headers` options ([#1117](https://github.com/mikf/gallery-dl/issues/1117)) - add option to use different youtube-dl forks ([#1330](https://github.com/mikf/gallery-dl/issues/1330)) - support using multiple input files at once ([#1353](https://github.com/mikf/gallery-dl/issues/1353)) ### Changes - [deviantart] extend `extra` option to also download embedded DeviantArt posts. - [exhentai] rename metadata fields to match API results ([#1325](https://github.com/mikf/gallery-dl/issues/1325)) - [mangadex] use `api.mangadex.org` as default API server - [mastodon] cache OAuth tokens ([#616](https://github.com/mikf/gallery-dl/issues/616)) - replace `wait-min` and `wait-max` with `sleep-request` ### Fixes - [500px] skip unavailable photos ([#1335](https://github.com/mikf/gallery-dl/issues/1335)) - [komikcast] fix extraction - [readcomiconline] download high quality image versions ([#1347](https://github.com/mikf/gallery-dl/issues/1347)) - [twitter] update GraphQL endpoints - fix crash when `base-directory` is an empty string ([#1339](https://github.com/mikf/gallery-dl/issues/1339)) ### Removals - remove support for formerly deprecated options - remove `cloudflare` module ## 1.16.5 - 2021-02-14 ### Additions - [behance] support `video` modules ([#1282](https://github.com/mikf/gallery-dl/issues/1282)) - [erome] add `album`, `user`, and `search` extractors ([#409](https://github.com/mikf/gallery-dl/issues/409)) - [hentaifox] support searching by group ([#1294](https://github.com/mikf/gallery-dl/issues/1294)) - [imgclick] add `image` extractor ([#1307](https://github.com/mikf/gallery-dl/issues/1307)) - [kemonoparty] extract inline images ([#1286](https://github.com/mikf/gallery-dl/issues/1286)) - [kemonoparty] support URLs with non-numeric user and post IDs ([#1303](https://github.com/mikf/gallery-dl/issues/1303)) - [pillowfort] add `user` and `post` extractors ([#846](https://github.com/mikf/gallery-dl/issues/846)) ### Changes - [kemonoparty] include `service` in directories and archive keys - [pixiv] require a `refresh-token` to login ([#1304](https://github.com/mikf/gallery-dl/issues/1304)) - [snap] use `core18` as base ### Fixes - [500px] update query hashes - [deviantart] update parameters for `/browse/popular` ([#1267](https://github.com/mikf/gallery-dl/issues/1267)) - [deviantart] provide filename extension for original file downloads ([#1272](https://github.com/mikf/gallery-dl/issues/1272)) - [deviantart] fix `folders` option ([#1302](https://github.com/mikf/gallery-dl/issues/1302)) - [inkbunny] add `sid` parameter to private file downloads ([#1281](https://github.com/mikf/gallery-dl/issues/1281)) - [kemonoparty] fix absolute file URLs - [mangadex] revert to `https://mangadex.org/api/` and add `api-server` option ([#1310](https://github.com/mikf/gallery-dl/issues/1310)) - [nsfwalbum] use fallback for deleted content ([#1259](https://github.com/mikf/gallery-dl/issues/1259)) - [sankaku] update `invalid token` detection ([#1309](https://github.com/mikf/gallery-dl/issues/1309)) - [slideshare] fix extraction - [postprocessor:metadata] fix crash with `extension-format` ([#1285](https://github.com/mikf/gallery-dl/issues/1285)) ## 1.16.4 - 2021-01-23 ### Additions - [furaffinity] add `descriptions` option ([#1231](https://github.com/mikf/gallery-dl/issues/1231)) - [kemonoparty] add `user` and `post` extractors ([#1216](https://github.com/mikf/gallery-dl/issues/1216)) - [nozomi] add 
`num` enumeration index ([#1239](https://github.com/mikf/gallery-dl/issues/1239)) - [photovogue] added portfolio extractor ([#1253](https://github.com/mikf/gallery-dl/issues/1253)) - [twitter] match `/i/user/ID` URLs - [unsplash] add extractors ([#1197](https://github.com/mikf/gallery-dl/issues/1197)) - [vipr] add image extractor ([#1258](https://github.com/mikf/gallery-dl/issues/1258)) ### Changes - [derpibooru] use "Everything" filter by default ([#862](https://github.com/mikf/gallery-dl/issues/862)) ### Fixes - [derpibooru] update `date` parsing - [foolfuuka] stop search when results are exhausted ([#1174](https://github.com/mikf/gallery-dl/issues/1174)) - [instagram] fix regex for `/saved` URLs ([#1251](https://github.com/mikf/gallery-dl/issues/1251)) - [mangadex] update API URLs - [mangakakalot] fix extraction - [newgrounds] fix flash file extraction ([#1257](https://github.com/mikf/gallery-dl/issues/1257)) - [sankaku] simplify login process - [twitter] fix retries after hitting rate limit ## 1.16.3 - 2021-01-10 ### Fixes - fix crash when using a `dict` for `path-restrict` - [postprocessor:metadata] sanitize custom filenames ## 1.16.2 - 2021-01-09 ### Additions - [derpibooru] add `search` and `gallery` extractors ([#862](https://github.com/mikf/gallery-dl/issues/862)) - [foolfuuka] add `board` and `search` extractors ([#1044](https://github.com/mikf/gallery-dl/issues/1044), [#1174](https://github.com/mikf/gallery-dl/issues/1174)) - [gfycat] add `date` metadata field ([#1138](https://github.com/mikf/gallery-dl/issues/1138)) - [pinterest] add support for getting all boards of a user ([#1205](https://github.com/mikf/gallery-dl/issues/1205)) - [sankaku] add support for book searches ([#1204](https://github.com/mikf/gallery-dl/issues/1204)) - [twitter] fetch media from pinned tweets ([#1203](https://github.com/mikf/gallery-dl/issues/1203)) - [wikiart] add extractor for single paintings ([#1233](https://github.com/mikf/gallery-dl/issues/1233)) - [downloader:http] add MIME type and signature for `.ico` files ([#1211](https://github.com/mikf/gallery-dl/issues/1211)) - add `d` format string conversion for timestamp values - add `"ascii"` as a special `path-restrict` value ### Fixes - [hentainexus] fix extraction ([#1234](https://github.com/mikf/gallery-dl/issues/1234)) - [instagram] categorize single highlight URLs as `highlights` ([#1222](https://github.com/mikf/gallery-dl/issues/1222)) - [redgifs] fix search results - [twitter] fix login with username & password - [twitter] fetch tweets from `homeConversation` entries ## 1.16.1 - 2020-12-27 ### Additions - [instagram] add `include` option ([#1180](https://github.com/mikf/gallery-dl/issues/1180)) - [pinterest] implement video support ([#1189](https://github.com/mikf/gallery-dl/issues/1189)) - [sankaku] reimplement login support ([#1176](https://github.com/mikf/gallery-dl/issues/1176), [#1182](https://github.com/mikf/gallery-dl/issues/1182)) - [sankaku] add support for sankaku.app URLs ([#1193](https://github.com/mikf/gallery-dl/issues/1193)) ### Changes - [e621] return pool posts in order ([#1195](https://github.com/mikf/gallery-dl/issues/1195)) - [hentaicafe] prefer title of `/hc.fyi/` pages ([#1106](https://github.com/mikf/gallery-dl/issues/1106)) - [hentaicafe] simplify default filenames - [sankaku] normalize `created_at` metadata ([#1190](https://github.com/mikf/gallery-dl/issues/1190)) - [postprocessor:exec] do not add missing `{}` to command ([#1185](https://github.com/mikf/gallery-dl/issues/1185)) ### Fixes - [booru] improve error 
handling - [instagram] warn about private profiles ([#1187](https://github.com/mikf/gallery-dl/issues/1187)) - [keenspot] improve redirect handling - [mangadex] respect `chapter-reverse` settings ([#1194](https://github.com/mikf/gallery-dl/issues/1194)) - [pixiv] output debug message on failed login attempts ([#1192](https://github.com/mikf/gallery-dl/issues/1192)) - increase SQLite connection timeouts ([#1173](https://github.com/mikf/gallery-dl/issues/1173)) ### Removals - [mangapanda] remove module ## 1.16.0 - 2020-12-12 ### Additions - [booru] implement generalized extractors for `*booru` and `moebooru` sites - add support for sakugabooru.com ([#1136](https://github.com/mikf/gallery-dl/issues/1136)) - add support for lolibooru.moe ([#1050](https://github.com/mikf/gallery-dl/issues/1050)) - provide formattable `date` metadata fields ([#1138](https://github.com/mikf/gallery-dl/issues/1138)) - [postprocessor:metadata] add `event` and `filename` options ([#315](https://github.com/mikf/gallery-dl/issues/315), [#866](https://github.com/mikf/gallery-dl/issues/866), [#984](https://github.com/mikf/gallery-dl/issues/984)) - [postprocessor:exec] add `event` option ([#992](https://github.com/mikf/gallery-dl/issues/992)) ### Changes - [flickr] update default directories and improve metadata consistency ([#828](https://github.com/mikf/gallery-dl/issues/828)) - [sankaku] use API endpoints from `beta.sankakucomplex.com` - [downloader:http] improve filename extension handling ([#776](https://github.com/mikf/gallery-dl/issues/776)) - replace all JPEG filename extensions with `jpg` by default ### Fixes - [hentainexus] fix extraction ([#1166](https://github.com/mikf/gallery-dl/issues/1166)) - [instagram] rewrite ([#1113](https://github.com/mikf/gallery-dl/issues/1113), [#1122](https://github.com/mikf/gallery-dl/issues/1122), [#1128](https://github.com/mikf/gallery-dl/issues/1128), [#1130](https://github.com/mikf/gallery-dl/issues/1130), [#1149](https://github.com/mikf/gallery-dl/issues/1149)) - [mangadex] handle external chapters ([#1154](https://github.com/mikf/gallery-dl/issues/1154)) - [nozomi] handle empty `date` fields ([#1163](https://github.com/mikf/gallery-dl/issues/1163)) - [paheal] create directory for each post ([#1147](https://github.com/mikf/gallery-dl/issues/1147)) - [piczel] update API URLs - [twitter] update image URL format ([#1145](https://github.com/mikf/gallery-dl/issues/1145)) - [twitter] improve `x-csrf-token` header handling ([#1170](https://github.com/mikf/gallery-dl/issues/1170)) - [webtoons] update `ageGate` cookies ### Removals - [sankaku] remove login support ## 1.15.4 - 2020-11-27 ### Fixes - [2chan] skip external links - [hentainexus] fix extraction ([#1125](https://github.com/mikf/gallery-dl/issues/1125)) - [mangadex] switch to API v2 ([#1129](https://github.com/mikf/gallery-dl/issues/1129)) - [mangapanda] use http:// - [mangoxo] fix extraction - [reddit] skip invalid gallery items ([#1127](https://github.com/mikf/gallery-dl/issues/1127)) ## 1.15.3 - 2020-11-13 ### Additions - [sankakucomplex] extract videos and embeds ([#308](https://github.com/mikf/gallery-dl/issues/308)) - [twitter] add support for lists ([#1096](https://github.com/mikf/gallery-dl/issues/1096)) - [postprocessor:metadata] accept string-lists for `content-format` ([#1080](https://github.com/mikf/gallery-dl/issues/1080)) - implement `modules` and `extension-map` options ### Fixes - [500px] update query hashes - [8kun] fix file URLs of older posts ([#1101](https://github.com/mikf/gallery-dl/issues/1101)) - 
[exhentai] update image URL parsing ([#1094](https://github.com/mikf/gallery-dl/issues/1094)) - [hentaifoundry] update `YII_CSRF_TOKEN` cookie handling ([#1083](https://github.com/mikf/gallery-dl/issues/1083)) - [hentaifoundry] use scheme from input URLs ([#1095](https://github.com/mikf/gallery-dl/issues/1095)) - [mangoxo] fix metadata extraction - [paheal] fix extraction ([#1088](https://github.com/mikf/gallery-dl/issues/1088)) - collect post processors from `basecategory` entries ([#1084](https://github.com/mikf/gallery-dl/issues/1084)) ## 1.15.2 - 2020-10-24 ### Additions - [pinterest] implement login support ([#1055](https://github.com/mikf/gallery-dl/issues/1055)) - [reddit] add `date` metadata field ([#1068](https://github.com/mikf/gallery-dl/issues/1068)) - [seiga] add metadata for single image downloads ([#1063](https://github.com/mikf/gallery-dl/issues/1063)) - [twitter] support media from Cards ([#937](https://github.com/mikf/gallery-dl/issues/937), [#1005](https://github.com/mikf/gallery-dl/issues/1005)) - [weasyl] support api-key authentication ([#1057](https://github.com/mikf/gallery-dl/issues/1057)) - add a `t` format string conversion for trimming whitespace ([#1065](https://github.com/mikf/gallery-dl/issues/1065)) ### Fixes - [blogger] handle URLs with specified width/height ([#1061](https://github.com/mikf/gallery-dl/issues/1061)) - [fallenangels] fix extraction of `.5` chapters - [gelbooru] rewrite mp4 video URLs ([#1048](https://github.com/mikf/gallery-dl/issues/1048)) - [hitomi] fix image URLs and gallery URL pattern - [mangadex] unescape more metadata fields ([#1066](https://github.com/mikf/gallery-dl/issues/1066)) - [mangahere] ensure download URLs have a scheme ([#1070](https://github.com/mikf/gallery-dl/issues/1070)) - [mangakakalot] ignore "Go Home" buttons in chapter pages - [newgrounds] handle embeds without scheme ([#1033](https://github.com/mikf/gallery-dl/issues/1033)) - [newgrounds] provide fallback URLs for video downloads ([#1042](https://github.com/mikf/gallery-dl/issues/1042)) - [xhamster] fix user profile extraction ## 1.15.1 - 2020-10-11 ### Additions - [hentaicafe] add `manga_id` metadata field ([#1036](https://github.com/mikf/gallery-dl/issues/1036)) - [hentaifoundry] add support for stories ([#734](https://github.com/mikf/gallery-dl/issues/734)) - [hentaifoundry] add `include` option - [newgrounds] extract image embeds ([#1033](https://github.com/mikf/gallery-dl/issues/1033)) - [nijie] add `include` option ([#1018](https://github.com/mikf/gallery-dl/issues/1018)) - [reactor] match URLs without subdomain ([#1053](https://github.com/mikf/gallery-dl/issues/1053)) - [twitter] extend `retweets` option ([#1026](https://github.com/mikf/gallery-dl/issues/1026)) - [weasyl] add extractors ([#977](https://github.com/mikf/gallery-dl/issues/977)) ### Fixes - [500px] update query hashes - [behance] fix `collection` extraction - [newgrounds] fix video extraction ([#1042](https://github.com/mikf/gallery-dl/issues/1042)) - [twitter] improve twitpic extraction ([#1019](https://github.com/mikf/gallery-dl/issues/1019)) - [weibo] handle posts with more than 9 images ([#926](https://github.com/mikf/gallery-dl/issues/926)) - [xvideos] fix `title` extraction - fix crash when using `--download-archive` with `--no-skip` ([#1023](https://github.com/mikf/gallery-dl/issues/1023)) - fix issues with `blacklist`/`whitelist` defaults ([#1051](https://github.com/mikf/gallery-dl/issues/1051), [#1056](https://github.com/mikf/gallery-dl/issues/1056)) ### Removals - [kissmanga] remove 
module ## 1.15.0 - 2020-09-20 ### Additions - [deviantart] support watchers-only/paid deviations ([#995](https://github.com/mikf/gallery-dl/issues/995)) - [myhentaigallery] add gallery extractor ([#1001](https://github.com/mikf/gallery-dl/issues/1001)) - [twitter] support specifying users by ID ([#980](https://github.com/mikf/gallery-dl/issues/980)) - [twitter] support `/intent/user?user_id=…` URLs ([#980](https://github.com/mikf/gallery-dl/issues/980)) - add `--no-skip` command-line option ([#986](https://github.com/mikf/gallery-dl/issues/986)) - add `blacklist` and `whitelist` options ([#492](https://github.com/mikf/gallery-dl/issues/492), [#844](https://github.com/mikf/gallery-dl/issues/844)) - add `filesize-min` and `filesize-max` options ([#780](https://github.com/mikf/gallery-dl/issues/780)) - add `sleep-extractor` and `sleep-request` options ([#788](https://github.com/mikf/gallery-dl/issues/788)) - write skipped files to archive ([#550](https://github.com/mikf/gallery-dl/issues/550)) ### Changes - [exhentai] update wait time before original image downloads ([#978](https://github.com/mikf/gallery-dl/issues/978)) - [imgur] use new API endpoints for image/album data - [tumblr] create directories for each post ([#965](https://github.com/mikf/gallery-dl/issues/965)) - support format string replacement fields in download archive paths ([#985](https://github.com/mikf/gallery-dl/issues/985)) - reduce wait time growth rate for HTTP retries from exponential to linear ### Fixes - [500px] update query hash - [aryion] improve post ID extraction ([#981](https://github.com/mikf/gallery-dl/issues/981), [#982](https://github.com/mikf/gallery-dl/issues/982)) - [danbooru] handle posts without `id` ([#1004](https://github.com/mikf/gallery-dl/issues/1004)) - [furaffinity] update download URL extraction ([#988](https://github.com/mikf/gallery-dl/issues/988)) - [imgur] fix image/album detection for galleries - [postprocessor:zip] defer zip file creation ([#968](https://github.com/mikf/gallery-dl/issues/968)) ### Removals - [jaiminisbox] remove extractors - [worldthree] remove extractors ## 1.14.5 - 2020-08-30 ### Additions - [aryion] add username/password support ([#960](https://github.com/mikf/gallery-dl/issues/960)) - [exhentai] add ability to specify a custom image limit ([#940](https://github.com/mikf/gallery-dl/issues/940)) - [furaffinity] add `search` extractor ([#915](https://github.com/mikf/gallery-dl/issues/915)) - [imgur] add `search` and `tag` extractors ([#934](https://github.com/mikf/gallery-dl/issues/934)) ### Fixes - [500px] fix extraction and update URL patterns ([#956](https://github.com/mikf/gallery-dl/issues/956)) - [aryion] update folder mime type list ([#945](https://github.com/mikf/gallery-dl/issues/945)) - [gelbooru] fix extraction without API - [hentaihand] update to new site layout - [hitomi] fix redirect processing - [reddit] handle deleted galleries ([#953](https://github.com/mikf/gallery-dl/issues/953)) - [reddit] improve gallery extraction ([#955](https://github.com/mikf/gallery-dl/issues/955)) ## 1.14.4 - 2020-08-15 ### Additions - [blogger] add `search` extractor ([#925](https://github.com/mikf/gallery-dl/issues/925)) - [blogger] support searching posts by labels ([#925](https://github.com/mikf/gallery-dl/issues/925)) - [inkbunny] add `user` and `post` extractors ([#283](https://github.com/mikf/gallery-dl/issues/283)) - [instagram] support `/reel/` URLs - [pinterest] support `pinterest.co.uk` URLs ([#914](https://github.com/mikf/gallery-dl/issues/914)) - [reddit] support 
gallery posts ([#920](https://github.com/mikf/gallery-dl/issues/920)) - [subscribestar] extract attached media files ([#852](https://github.com/mikf/gallery-dl/issues/852)) ### Fixes - [blogger] improve error messages for missing posts/blogs ([#903](https://github.com/mikf/gallery-dl/issues/903)) - [exhentai] adjust image limit costs ([#940](https://github.com/mikf/gallery-dl/issues/940)) - [gfycat] skip malformed gfycat responses ([#902](https://github.com/mikf/gallery-dl/issues/902)) - [imgur] handle 403 overcapacity responses ([#910](https://github.com/mikf/gallery-dl/issues/910)) - [instagram] wait before GraphQL requests ([#901](https://github.com/mikf/gallery-dl/issues/901)) - [mangareader] fix extraction - [mangoxo] fix login - [pixnet] detect password-protected albums ([#177](https://github.com/mikf/gallery-dl/issues/177)) - [simplyhentai] fix `gallery_id` extraction - [subscribestar] update `date` parsing - [vsco] handle missing `description` fields - [xhamster] fix extraction ([#917](https://github.com/mikf/gallery-dl/issues/917)) - allow `parent-directory` to work recursively ([#905](https://github.com/mikf/gallery-dl/issues/905)) - skip external OAuth tests ([#908](https://github.com/mikf/gallery-dl/issues/908)) ### Removals - [bobx] remove module ## 1.14.3 - 2020-07-18 ### Additions - [8muses] support `comics.8muses.com` URLs - [artstation] add `following` extractor ([#888](https://github.com/mikf/gallery-dl/issues/888)) - [exhentai] add `domain` option ([#897](https://github.com/mikf/gallery-dl/issues/897)) - [gfycat] add `user` and `search` extractors - [imgur] support all `/t/...` URLs ([#880](https://github.com/mikf/gallery-dl/issues/880)) - [khinsider] add `format` option ([#840](https://github.com/mikf/gallery-dl/issues/840)) - [mangakakalot] add `manga` and `chapter` extractors ([#876](https://github.com/mikf/gallery-dl/issues/876)) - [redgifs] support `gifsdeliverynetwork.com` URLs ([#874](https://github.com/mikf/gallery-dl/issues/874)) - [subscribestar] add `user` and `post` extractors ([#852](https://github.com/mikf/gallery-dl/issues/852)) - [twitter] add support for nitter.net URLs ([#890](https://github.com/mikf/gallery-dl/issues/890)) - add Zsh completion script ([#150](https://github.com/mikf/gallery-dl/issues/150)) ### Fixes - [gfycat] retry 404'ed videos on redgifs.com ([#874](https://github.com/mikf/gallery-dl/issues/874)) - [newgrounds] fix favorites extraction - [patreon] yield images and attachments before post files ([#871](https://github.com/mikf/gallery-dl/issues/871)) - [reddit] fix AttributeError when using `recursion` ([#879](https://github.com/mikf/gallery-dl/issues/879)) - [twitter] raise proper exception if a user doesn't exist ([#891](https://github.com/mikf/gallery-dl/issues/891)) - defer directory creation ([#722](https://github.com/mikf/gallery-dl/issues/722)) - set pseudo extension for Metadata messages ([#865](https://github.com/mikf/gallery-dl/issues/865)) - prevent exception on Cloudflare challenges ([#868](https://github.com/mikf/gallery-dl/issues/868)) ## 1.14.2 - 2020-06-27 ### Additions - [artstation] add `date` metadata field ([#839](https://github.com/mikf/gallery-dl/issues/839)) - [mastodon] add `date` metadata field ([#839](https://github.com/mikf/gallery-dl/issues/839)) - [pinterest] add support for board sections ([#835](https://github.com/mikf/gallery-dl/issues/835)) - [twitter] add extractor for liked tweets ([#837](https://github.com/mikf/gallery-dl/issues/837)) - [twitter] add option to filter media from quoted tweets 
([#854](https://github.com/mikf/gallery-dl/issues/854)) - [weibo] add `date` metadata field to `status` objects ([#829](https://github.com/mikf/gallery-dl/issues/829)) ### Fixes - [aryion] fix user gallery extraction ([#832](https://github.com/mikf/gallery-dl/issues/832)) - [imgur] build directory paths for each file ([#842](https://github.com/mikf/gallery-dl/issues/842)) - [tumblr] prevent errors when using `reblogs=same-blog` ([#851](https://github.com/mikf/gallery-dl/issues/851)) - [twitter] always provide an `author` metadata field ([#831](https://github.com/mikf/gallery-dl/issues/831), [#833](https://github.com/mikf/gallery-dl/issues/833)) - [twitter] don't download video previews ([#833](https://github.com/mikf/gallery-dl/issues/833)) - [twitter] improve handling of deleted tweets ([#838](https://github.com/mikf/gallery-dl/issues/838)) - [twitter] fix search results ([#847](https://github.com/mikf/gallery-dl/issues/847)) - [twitter] improve handling of quoted tweets ([#854](https://github.com/mikf/gallery-dl/issues/854)) - fix config lookups when multiple locations are involved ([#843](https://github.com/mikf/gallery-dl/issues/843)) - improve output of `-K/--list-keywords` for parent extractors ([#825](https://github.com/mikf/gallery-dl/issues/825)) - call `flush()` after writing JSON in `DataJob()` ([#727](https://github.com/mikf/gallery-dl/issues/727)) ## 1.14.1 - 2020-06-12 ### Additions - [furaffinity] add `artist_url` metadata field ([#821](https://github.com/mikf/gallery-dl/issues/821)) - [redgifs] add `user` and `search` extractors ([#724](https://github.com/mikf/gallery-dl/issues/724)) ### Changes - [deviantart] extend `extra` option; also search journals for sta.sh links ([#712](https://github.com/mikf/gallery-dl/issues/712)) - [twitter] rewrite; use new interface ([#806](https://github.com/mikf/gallery-dl/issues/806), [#740](https://github.com/mikf/gallery-dl/issues/740)) ### Fixes - [kissmanga] work around CAPTCHAs ([#818](https://github.com/mikf/gallery-dl/issues/818)) - [nhentai] fix extraction ([#819](https://github.com/mikf/gallery-dl/issues/819)) - [webtoons] generalize comic extraction code ([#820](https://github.com/mikf/gallery-dl/issues/820)) ## 1.14.0 - 2020-05-31 ### Additions - [imagechest] add new extractor for imgchest.com ([#750](https://github.com/mikf/gallery-dl/issues/750)) - [instagram] add `post_url`, `tags`, `location`, `tagged_users` metadata ([#743](https://github.com/mikf/gallery-dl/issues/743)) - [redgifs] add image extractor ([#724](https://github.com/mikf/gallery-dl/issues/724)) - [webtoons] add new extractor for webtoons.com ([#761](https://github.com/mikf/gallery-dl/issues/761)) - implement `--write-pages` option ([#736](https://github.com/mikf/gallery-dl/issues/736)) - extend `path-restrict` option ([#662](https://github.com/mikf/gallery-dl/issues/662)) - implement `path-replace` option ([#662](https://github.com/mikf/gallery-dl/issues/662), [#755](https://github.com/mikf/gallery-dl/issues/755)) - make `path` and `keywords` available in logging messages ([#574](https://github.com/mikf/gallery-dl/issues/574), [#575](https://github.com/mikf/gallery-dl/issues/575)) ### Changes - [danbooru] change default value of `ugoira` to `false` - [downloader:ytdl] change default value of `forward-cookies` to `false` - [downloader:ytdl] fix file extensions when merging into `.mkv` ([#720](https://github.com/mikf/gallery-dl/issues/720)) - write OAuth tokens to cache ([#616](https://github.com/mikf/gallery-dl/issues/616)) - use `%APPDATA%\gallery-dl` for config 
files and cache on Windows - use `util.Formatter` for formatting logging messages - reuse HTTP connections from parent extractors ### Fixes - [deviantart] use private access tokens for Journals ([#738](https://github.com/mikf/gallery-dl/issues/738)) - [gelbooru] simplify and fix pool extraction - [imgur] fix extraction of animated images without `mp4` entry - [imgur] treat `/t/unmuted/` URLs as galleries - [instagram] fix login with username & password ([#756](https://github.com/mikf/gallery-dl/issues/756), [#771](https://github.com/mikf/gallery-dl/issues/771), [#797](https://github.com/mikf/gallery-dl/issues/797), [#803](https://github.com/mikf/gallery-dl/issues/803)) - [reddit] don't send OAuth headers for file downloads ([#729](https://github.com/mikf/gallery-dl/issues/729)) - fix/improve Cloudflare bypass code ([#728](https://github.com/mikf/gallery-dl/issues/728), [#757](https://github.com/mikf/gallery-dl/issues/757)) - reset filenames on empty file extensions ([#733](https://github.com/mikf/gallery-dl/issues/733)) ## 1.13.6 - 2020-05-02 ### Additions - [patreon] respect filters and sort order in query parameters ([#711](https://github.com/mikf/gallery-dl/issues/711)) - [speakerdeck] add a new extractor for speakerdeck.com ([#726](https://github.com/mikf/gallery-dl/issues/726)) - [twitter] add `replies` option ([#705](https://github.com/mikf/gallery-dl/issues/705)) - [weibo] add `videos` option - [downloader:http] add MIME types for `.psd` files ([#714](https://github.com/mikf/gallery-dl/issues/714)) ### Fixes - [artstation] improve embed extraction ([#720](https://github.com/mikf/gallery-dl/issues/720)) - [deviantart] limit API wait times ([#721](https://github.com/mikf/gallery-dl/issues/721)) - [newgrounds] fix URLs produced by the `following` extractor ([#684](https://github.com/mikf/gallery-dl/issues/684)) - [patreon] improve file hash extraction ([#713](https://github.com/mikf/gallery-dl/issues/713)) - [vsco] fix user gallery extraction - fix/improve Cloudflare bypass code ([#728](https://github.com/mikf/gallery-dl/issues/728)) ## 1.13.5 - 2020-04-27 ### Additions - [500px] recognize `web.500px.com` URLs - [aryion] support downloading from folders ([#694](https://github.com/mikf/gallery-dl/issues/694)) - [furaffinity] add extractor for followed users ([#515](https://github.com/mikf/gallery-dl/issues/515)) - [hitomi] add extractor for tag searches ([#697](https://github.com/mikf/gallery-dl/issues/697)) - [instagram] add `post_id` and `num` metadata fields ([#698](https://github.com/mikf/gallery-dl/issues/698)) - [newgrounds] add extractor for followed users ([#684](https://github.com/mikf/gallery-dl/issues/684)) - [patreon] recognize URLs with creator IDs ([#711](https://github.com/mikf/gallery-dl/issues/711)) - [twitter] add `reply` metadata field ([#705](https://github.com/mikf/gallery-dl/issues/705)) - [xhamster] recognize `xhamster.porncache.net` URLs ([#700](https://github.com/mikf/gallery-dl/issues/700)) ### Fixes - [gelbooru] improve post ID extraction in pool listings - [hitomi] fix extraction of galleries without tags - [jaiminisbox] update metadata decoding procedure ([#702](https://github.com/mikf/gallery-dl/issues/702)) - [mastodon] fix pagination ([#701](https://github.com/mikf/gallery-dl/issues/701)) - [mastodon] improve account searches ([#704](https://github.com/mikf/gallery-dl/issues/704)) - [patreon] fix hash extraction from download URLs ([#693](https://github.com/mikf/gallery-dl/issues/693)) - improve parameter extraction when solving Cloudflare challenges ## 
1.13.4 - 2020-04-12
### Additions
- [aryion] add `gallery` and `post` extractors ([#390](https://github.com/mikf/gallery-dl/issues/390), [#673](https://github.com/mikf/gallery-dl/issues/673))
- [deviantart] detect and handle folders in sta.sh listings ([#659](https://github.com/mikf/gallery-dl/issues/659))
- [hentainexus] add `circle`, `event`, and `title_conventional` metadata fields ([#661](https://github.com/mikf/gallery-dl/issues/661))
- [hiperdex] add `artist` extractor ([#606](https://github.com/mikf/gallery-dl/issues/606))
- [mastodon] add access tokens for `mastodon.social` and `baraag.net` ([#665](https://github.com/mikf/gallery-dl/issues/665))
### Changes
- [deviantart] retrieve *all* download URLs through the OAuth API
- automatically read config files in PyInstaller executable directories ([#682](https://github.com/mikf/gallery-dl/issues/682))
### Fixes
- [deviantart] handle "Request blocked" errors ([#655](https://github.com/mikf/gallery-dl/issues/655))
- [deviantart] improve JPEG quality replacement pattern
- [hiperdex] fix extraction
- [mastodon] handle API rate limits ([#665](https://github.com/mikf/gallery-dl/issues/665))
- [mastodon] update OAuth credentials for pawoo.net ([#665](https://github.com/mikf/gallery-dl/issues/665))
- [myportfolio] fix extraction of galleries without a title
- [piczel] fix extraction of single images
- [vsco] fix collection extraction
- [weibo] accept status URLs with non-numeric IDs ([#664](https://github.com/mikf/gallery-dl/issues/664))
## 1.13.3 - 2020-03-28
### Additions
- [instagram] add support for a user's saved media ([#644](https://github.com/mikf/gallery-dl/issues/644))
- [nozomi] support multiple images per post ([#646](https://github.com/mikf/gallery-dl/issues/646))
- [35photo] add `tag` extractor
### Changes
- [mangadex] transform timestamps from `date` fields to datetime objects
### Fixes
- [deviantart] handle decode errors for `extended_fetch` results ([#655](https://github.com/mikf/gallery-dl/issues/655))
- [e621] fix a bug in API rate limiting and improve pagination ([#651](https://github.com/mikf/gallery-dl/issues/651))
- [instagram] update pattern for user profile URLs
- [mangapark] fix metadata extraction
- [nozomi] sort search results ([#646](https://github.com/mikf/gallery-dl/issues/646))
- [piczel] fix extraction
- [twitter] fix typo in `x-twitter-auth-type` header ([#625](https://github.com/mikf/gallery-dl/issues/625))
- remove trailing dots from Windows directory names ([#647](https://github.com/mikf/gallery-dl/issues/647))
- fix crash with missing `stdout`/`stderr`/`stdin` handles ([#653](https://github.com/mikf/gallery-dl/issues/653))
## 1.13.2 - 2020-03-14
### Additions
- [furaffinity] extract more metadata
- [instagram] add `post_shortcode` metadata field ([#525](https://github.com/mikf/gallery-dl/issues/525))
- [kabeuchi] add extractor ([#561](https://github.com/mikf/gallery-dl/issues/561))
- [newgrounds] add extractor for favorited posts ([#394](https://github.com/mikf/gallery-dl/issues/394))
- [pixiv] implement `avatar` option ([#595](https://github.com/mikf/gallery-dl/issues/595), [#623](https://github.com/mikf/gallery-dl/issues/623))
- [twitter] add extractor for bookmarked Tweets ([#625](https://github.com/mikf/gallery-dl/issues/625))
### Fixes
- [bcy] reduce the number of HTTP requests during data extraction
- [e621] update to new interface ([#635](https://github.com/mikf/gallery-dl/issues/635))
- [exhentai] handle incomplete MIME types ([#632](https://github.com/mikf/gallery-dl/issues/632))
- [hitomi] improve
metadata extraction - [mangoxo] fix login - [newgrounds] improve error handling when extracting post data ## 1.13.1 - 2020-03-01 ### Additions - [hentaihand] add extractors ([#605](https://github.com/mikf/gallery-dl/issues/605)) - [hiperdex] add chapter and manga extractors ([#606](https://github.com/mikf/gallery-dl/issues/606)) - [oauth] implement option to write DeviantArt refresh-tokens to cache ([#616](https://github.com/mikf/gallery-dl/issues/616)) - [downloader:http] add more MIME types for `.bmp` and `.rar` files ([#621](https://github.com/mikf/gallery-dl/issues/621), [#628](https://github.com/mikf/gallery-dl/issues/628)) - warn about expired cookies ### Fixes - [bcy] fix partial image URLs ([#613](https://github.com/mikf/gallery-dl/issues/613)) - [danbooru] fix Ugoira downloads and metadata - [deviantart] check availability of `/intermediary/` URLs ([#609](https://github.com/mikf/gallery-dl/issues/609)) - [hitomi] follow multiple redirects & fix image URLs - [piczel] improve and update - [tumblr] replace `-` with ` ` in tag searches ([#611](https://github.com/mikf/gallery-dl/issues/611)) - [vsco] update gallery URL pattern - fix `--verbose` and `--quiet` command-line options ## 1.13.0 - 2020-02-16 ### Additions - Support for - `furaffinity` - https://www.furaffinity.net/ ([#284](https://github.com/mikf/gallery-dl/issues/284)) - `8kun` - https://8kun.top/ ([#582](https://github.com/mikf/gallery-dl/issues/582)) - `bcy` - https://bcy.net/ ([#592](https://github.com/mikf/gallery-dl/issues/592)) - [blogger] implement video extraction ([#587](https://github.com/mikf/gallery-dl/issues/587)) - [oauth] add option to specify port number used by local server ([#604](https://github.com/mikf/gallery-dl/issues/604)) - [pixiv] add `rating` metadata field ([#595](https://github.com/mikf/gallery-dl/issues/595)) - [pixiv] recognize tags at the end of new bookmark URLs - [reddit] add `videos` option - [weibo] use youtube-dl to download from m3u8 manifests - implement `parent-directory` option ([#551](https://github.com/mikf/gallery-dl/issues/551)) - extend filename formatting capabilities: - implement field name alternatives ([#525](https://github.com/mikf/gallery-dl/issues/525)) - allow multiple "special" format specifiers per replacement field ([#595](https://github.com/mikf/gallery-dl/issues/595)) - allow for numeric list and string indices ### Changes - [reddit] handle reddit-hosted images and videos natively ([#551](https://github.com/mikf/gallery-dl/issues/551)) - [twitter] change default value for `videos` to `true` ### Fixes - [cloudflare] unescape challenge URLs - [deviantart] fix video extraction from `extended_fetch` results - [hitomi] implement workaround for "broken" redirects - [khinsider] fix and improve metadata extraction - [patreon] filter duplicate files per post ([#590](https://github.com/mikf/gallery-dl/issues/590)) - [piczel] fix extraction - [pixiv] fix user IDs for bookmarks API calls ([#596](https://github.com/mikf/gallery-dl/issues/596)) - [sexcom] fix image URLs - [twitter] force old login page layout ([#584](https://github.com/mikf/gallery-dl/issues/584), [#598](https://github.com/mikf/gallery-dl/issues/598)) - [vsco] skip "invalid" entities - improve functions to load/save cookies.txt files ([#586](https://github.com/mikf/gallery-dl/issues/586)) ### Removals - [yaplog] remove module ## 1.12.3 - 2020-01-19 ### Additions - [hentaifoundry] extract more metadata ([#565](https://github.com/mikf/gallery-dl/issues/565)) - [twitter] add option to extract TwitPic embeds 
([#579](https://github.com/mikf/gallery-dl/issues/579)) - implement a post-processor module to compare file versions ([#530](https://github.com/mikf/gallery-dl/issues/530)) ### Fixes - [hitomi] update image URL generation - [mangadex] revert domain to `mangadex.org` - [pinterest] improve detection of invalid pin.it links - [pixiv] update URL patterns for user profiles and bookmarks ([#568](https://github.com/mikf/gallery-dl/issues/568)) - [twitter] Fix stop before real end ([#573](https://github.com/mikf/gallery-dl/issues/573)) - remove temp files before downloading from fallback URLs ### Removals - [erolord] remove extractor ## 1.12.2 - 2020-01-05 ### Additions - [deviantart] match new search/popular URLs ([#538](https://github.com/mikf/gallery-dl/issues/538)) - [deviantart] match `/favourites/all` URLs ([#555](https://github.com/mikf/gallery-dl/issues/555)) - [deviantart] add extractor for followed users ([#515](https://github.com/mikf/gallery-dl/issues/515)) - [pixiv] support listing followed users ([#515](https://github.com/mikf/gallery-dl/issues/515)) - [imagefap] handle beta.imagefap.com URLs ([#552](https://github.com/mikf/gallery-dl/issues/552)) - [postprocessor:metadata] add `directory` option ([#520](https://github.com/mikf/gallery-dl/issues/520)) ### Fixes - [artstation] fix search result pagination ([#537](https://github.com/mikf/gallery-dl/issues/537)) - [directlink] send Referer headers ([#536](https://github.com/mikf/gallery-dl/issues/536)) - [exhentai] restrict default directory name length ([#545](https://github.com/mikf/gallery-dl/issues/545)) - [mangadex] change domain to mangadex.cc ([#559](https://github.com/mikf/gallery-dl/issues/559)) - [mangahere] send `isAdult` cookies ([#556](https://github.com/mikf/gallery-dl/issues/556)) - [newgrounds] fix tags metadata extraction - [pixiv] retry after rate limit errors ([#535](https://github.com/mikf/gallery-dl/issues/535)) - [twitter] handle quoted tweets ([#526](https://github.com/mikf/gallery-dl/issues/526)) - [twitter] handle API rate limits ([#526](https://github.com/mikf/gallery-dl/issues/526)) - [twitter] fix URLs forwarded to youtube-dl ([#540](https://github.com/mikf/gallery-dl/issues/540)) - prevent infinite recursion when spawning new extractors ([#489](https://github.com/mikf/gallery-dl/issues/489)) - improve output of `--list-keywords` for "parent" extractors ([#548](https://github.com/mikf/gallery-dl/issues/548)) - provide fallback for SQLite versions with missing `WITHOUT ROWID` support ([#553](https://github.com/mikf/gallery-dl/issues/553)) ## 1.12.1 - 2019-12-22 ### Additions - [4chan] add extractor for entire boards ([#510](https://github.com/mikf/gallery-dl/issues/510)) - [realbooru] add extractors for pools, posts, and tag searches ([#514](https://github.com/mikf/gallery-dl/issues/514)) - [instagram] implement a `videos` option ([#521](https://github.com/mikf/gallery-dl/issues/521)) - [vsco] implement a `videos` option - [postprocessor:metadata] implement a `bypost` option for downloading the metadata of an entire post ([#511](https://github.com/mikf/gallery-dl/issues/511)) ### Changes - [reddit] change the default value for `comments` to `0` - [vsco] improve image resolutions - make filesystem-related errors during file downloads non-fatal ([#512](https://github.com/mikf/gallery-dl/issues/512)) ### Fixes - [foolslide] add fallback for chapter data extraction - [instagram] ignore errors during post-page extraction - [patreon] avoid errors when fetching user info 
([#508](https://github.com/mikf/gallery-dl/issues/508))
- [patreon] improve URL pattern for single posts
- [reddit] fix errors with `t1` submissions
- [vsco] fix user profile extraction … again
- [weibo] handle unavailable/deleted statuses
- [downloader:http] improve rate limit handling
- retain trailing zeroes in Cloudflare challenge answers
## 1.12.0 - 2019-12-08
### Additions
- [flickr] support 3k, 4k, 5k, and 6k photo sizes ([#472](https://github.com/mikf/gallery-dl/issues/472))
- [imgur] add extractor for subreddit links ([#500](https://github.com/mikf/gallery-dl/issues/500))
- [newgrounds] add extractors for `audio` listings and general `media` files ([#394](https://github.com/mikf/gallery-dl/issues/394))
- [newgrounds] implement login support ([#394](https://github.com/mikf/gallery-dl/issues/394))
- [postprocessor:metadata] implement an `extension-format` option ([#477](https://github.com/mikf/gallery-dl/issues/477))
- `--exec-after`
### Changes
- [deviantart] ensure consistent username capitalization ([#455](https://github.com/mikf/gallery-dl/issues/455))
- [directlink] split `{path}` into `{path}/{filename}.{extension}`
- [twitter] update metadata fields with user/author information
- [postprocessor:metadata] filter private entries & rename `format` to `content-format`
- Enable `cookies-update` by default
### Fixes
- [2chan] fix metadata extraction
- [behance] get images from 'media_collection' modules
- [bobx] fix image downloads by randomly generating session cookies ([#482](https://github.com/mikf/gallery-dl/issues/482))
- [deviantart] revert to getting download URLs from OAuth API calls ([#488](https://github.com/mikf/gallery-dl/issues/488))
- [deviantart] fix URL generation from '/extended_fetch' results ([#505](https://github.com/mikf/gallery-dl/issues/505))
- [flickr] adjust OAuth redirect URI ([#503](https://github.com/mikf/gallery-dl/issues/503))
- [hentaifox] fix extraction
- [imagefap] adapt to new image URL format
- [imgbb] fix an error in galleries without user info ([#471](https://github.com/mikf/gallery-dl/issues/471))
- [instagram] prevent errors with missing 'video_url' fields ([#479](https://github.com/mikf/gallery-dl/issues/479))
- [nijie] fix `date` parsing
- [pixiv] match new search URLs ([#507](https://github.com/mikf/gallery-dl/issues/507))
- [plurk] fix comment pagination
- [sexcom] send specific Referer headers when downloading videos
- [twitter] fix infinite loops ([#499](https://github.com/mikf/gallery-dl/issues/499))
- [vsco] fix user profile and collection extraction ([#480](https://github.com/mikf/gallery-dl/issues/480))
- Fix Cloudflare DDoS protection bypass
### Removals
- `--abort-on-skip`
## 1.11.1 - 2019-11-09
### Fixes
- Fix inclusion of bash completion and man pages in source distributions
## 1.11.0 - 2019-11-08
### Additions
- Support for
  - `blogger` - https://www.blogger.com/ ([#364](https://github.com/mikf/gallery-dl/issues/364))
  - `nozomi` - https://nozomi.la/ ([#388](https://github.com/mikf/gallery-dl/issues/388))
  - `issuu` - https://issuu.com/ ([#413](https://github.com/mikf/gallery-dl/issues/413))
  - `naver` - https://blog.naver.com/ ([#447](https://github.com/mikf/gallery-dl/issues/447))
- Extractor for `twitter` search results ([#448](https://github.com/mikf/gallery-dl/issues/448))
- Extractor for `deviantart` user profiles with configurable targets ([#377](https://github.com/mikf/gallery-dl/issues/377), [#419](https://github.com/mikf/gallery-dl/issues/419))
- `--ugoira-conv-lossless` ([#432](https://github.com/mikf/gallery-dl/issues/432))
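As a rough illustration of the `--ugoira-conv-lossless` option added above, a minimal command-line sketch (the pixiv URL is only a placeholder, and FFmpeg is assumed to be installed for the conversion step):

```sh
# Download a pixiv animation (ugoira) and convert it to lossless WebM.
# The artwork URL below is a placeholder; FFmpeg must be available on PATH.
gallery-dl --ugoira-conv-lossless "https://www.pixiv.net/en/artworks/12345678"
```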
- `cookies-update` option to allow updating cookies.txt files ([#445](https://github.com/mikf/gallery-dl/issues/445)) - Optional `cloudflare` and `video` installation targets ([#460](https://github.com/mikf/gallery-dl/issues/460)) - Allow executing commands with the `exec` post-processor after all files are downloaded ([#413](https://github.com/mikf/gallery-dl/issues/413), [#421](https://github.com/mikf/gallery-dl/issues/421)) ### Changes - Rewrite `imgur` using its public API ([#446](https://github.com/mikf/gallery-dl/issues/446)) - Rewrite `luscious` using GraphQL queries ([#457](https://github.com/mikf/gallery-dl/issues/457)) - Adjust default `nijie` filenames to match `pixiv` - Change enumeration index for gallery extractors from `page` to `num` - Return non-zero exit status when errors occurred - Forward proxy settings to youtube-dl downloader - Install bash completion script into `share/bash-completion/completions` ### Fixes - Adapt to new `instagram` page layout when logged in ([#391](https://github.com/mikf/gallery-dl/issues/391)) - Support protected `twitter` videos ([#452](https://github.com/mikf/gallery-dl/issues/452)) - Extend `hitomi` URL pattern and fix gallery extraction - Restore OAuth2 authentication error messages - Miscellaneous fixes for `patreon` ([#444](https://github.com/mikf/gallery-dl/issues/444)), `deviantart` ([#455](https://github.com/mikf/gallery-dl/issues/455)), `sexcom` ([#464](https://github.com/mikf/gallery-dl/issues/464)), `imgur` ([#467](https://github.com/mikf/gallery-dl/issues/467)), `simplyhentai` ## 1.10.6 - 2019-10-11 ### Additions - `--exec` command-line option to specify a command to run after each file download ([#421](https://github.com/mikf/gallery-dl/issues/421)) ### Changes - Include titles in `gfycat` default filenames ([#434](https://github.com/mikf/gallery-dl/issues/434)) ### Fixes - Fetch working download URLs for `deviantart` ([#436](https://github.com/mikf/gallery-dl/issues/436)) - Various fixes and improvements for `yaplog` blogs ([#443](https://github.com/mikf/gallery-dl/issues/443)) - Fix image URL generation for `hitomi` galleries - Miscellaneous fixes for `behance` and `xvideos` ## 1.10.5 - 2019-09-28 ### Additions - `instagram.highlights` option to include highlighted stories when downloading user profiles ([#329](https://github.com/mikf/gallery-dl/issues/329)) - Support for `/user/` URLs on `reddit` ([#350](https://github.com/mikf/gallery-dl/issues/350)) - Support for `imgur` user profiles and favorites ([#420](https://github.com/mikf/gallery-dl/issues/420)) - Additional metadata fields on `nijie`([#423](https://github.com/mikf/gallery-dl/issues/423)) ### Fixes - Improve handling of private `deviantart` artworks ([#414](https://github.com/mikf/gallery-dl/issues/414)) and 429 status codes ([#424](https://github.com/mikf/gallery-dl/issues/424)) - Prevent fatal errors when trying to open download-archive files ([#417](https://github.com/mikf/gallery-dl/issues/417)) - Detect and ignore unavailable videos on `weibo` ([#427](https://github.com/mikf/gallery-dl/issues/427)) - Update the `scope` of new `reddit` refresh-tokens ([#428](https://github.com/mikf/gallery-dl/issues/428)) - Fix inconsistencies with the `reddit.comments` option ([#429](https://github.com/mikf/gallery-dl/issues/429)) - Extend URL patterns for `hentaicafe` manga and `pixiv` artworks - Improve detection of unavailable albums on `luscious` and `imgbb` - Miscellaneous fixes for `tsumino` ## 1.10.4 - 2019-09-08 ### Additions - Support for - `lineblog` - 
https://www.lineblog.me/ ([#404](https://github.com/mikf/gallery-dl/issues/404)) - `fuskator` - https://fuskator.com/ ([#407](https://github.com/mikf/gallery-dl/issues/407)) - `ugoira` option for `danbooru` to download pre-rendered ugoira animations ([#406](https://github.com/mikf/gallery-dl/issues/406)) ### Fixes - Download the correct files from `twitter` replies ([#403](https://github.com/mikf/gallery-dl/issues/403)) - Prevent crash when trying to use unavailable downloader modules ([#405](https://github.com/mikf/gallery-dl/issues/405)) - Fix `pixiv` authentication ([#411](https://github.com/mikf/gallery-dl/issues/411)) - Improve `exhentai` image limit checks - Miscellaneous fixes for `hentaicafe`, `simplyhentai`, `tumblr` ## 1.10.3 - 2019-08-30 ### Additions - Provide `filename` metadata for all `deviantart` files ([#392](https://github.com/mikf/gallery-dl/issues/392), [#400](https://github.com/mikf/gallery-dl/issues/400)) - Implement a `ytdl.outtmpl` option to let youtube-dl handle filenames by itself ([#395](https://github.com/mikf/gallery-dl/issues/395)) - Support `seiga` mobile URLs ([#401](https://github.com/mikf/gallery-dl/issues/401)) ### Fixes - Extract more than the first 32 posts from `piczel` galleries ([#396](https://github.com/mikf/gallery-dl/issues/396)) - Fix filenames of archives created with `--zip` ([#397](https://github.com/mikf/gallery-dl/issues/397)) - Skip unavailable images and videos on `flickr` ([#398](https://github.com/mikf/gallery-dl/issues/398)) - Fix filesystem paths on Windows with Python 3.6 and lower ([#402](https://github.com/mikf/gallery-dl/issues/402)) ## 1.10.2 - 2019-08-23 ### Additions - Support for `instagram` stories and IGTV ([#371](https://github.com/mikf/gallery-dl/issues/371), [#373](https://github.com/mikf/gallery-dl/issues/373)) - Support for individual `imgbb` images ([#363](https://github.com/mikf/gallery-dl/issues/363)) - `deviantart.quality` option to set the JPEG compression quality for newer images ([#369](https://github.com/mikf/gallery-dl/issues/369)) - `enumerate` option for `extractor.skip` ([#306](https://github.com/mikf/gallery-dl/issues/306)) - `adjust-extensions` option to control filename extension adjustments - `path-remove` option to remove control characters etc. 
from filesystem paths
### Changes
- Rename `restrict-filenames` to `path-restrict`
- Adjust `pixiv` metadata and default filename format ([#366](https://github.com/mikf/gallery-dl/issues/366))
  - Set `filename` to `"{category}_{user[id]}_{id}{suffix}.{extension}"` to restore the old default
- Improve and optimize directory and filename generation
### Fixes
- Allow the `classify` post-processor to handle files with an unknown filename extension ([#138](https://github.com/mikf/gallery-dl/issues/138))
- Fix rate limit handling for OAuth APIs ([#368](https://github.com/mikf/gallery-dl/issues/368))
- Fix artwork and scraps extraction on `deviantart` ([#376](https://github.com/mikf/gallery-dl/issues/376), [#392](https://github.com/mikf/gallery-dl/issues/392))
- Distinguish between `imgur` album and gallery URLs ([#380](https://github.com/mikf/gallery-dl/issues/380))
- Prevent crash when using `--ugoira-conv` ([#382](https://github.com/mikf/gallery-dl/issues/382))
- Handle multi-image posts on `patreon` ([#383](https://github.com/mikf/gallery-dl/issues/383))
- Miscellaneous fixes for `*reactor`, `simplyhentai`
## 1.10.1 - 2019-08-02
### Fixes
- Use the correct domain for exhentai.org input URLs
## 1.10.0 - 2019-08-01
### Warning
- Prior to version 1.10.0, all cache files were created world-readable (mode `644`), leading to possible sensitive information disclosure on multi-user systems
- It is recommended to restrict access permissions of already existing files (`/tmp/.gallery-dl.cache`) with `chmod 600`
- Windows users should not be affected
### Additions
- Support for
  - `vsco` - https://vsco.co/ ([#331](https://github.com/mikf/gallery-dl/issues/331))
  - `imgbb` - https://imgbb.com/ ([#361](https://github.com/mikf/gallery-dl/issues/361))
  - `adultempire` - https://www.adultempire.com/ ([#340](https://github.com/mikf/gallery-dl/issues/340))
- `restrict-filenames` option to create Windows-compatible filenames on any platform ([#348](https://github.com/mikf/gallery-dl/issues/348))
- `forward-cookies` option to control cookie forwarding to youtube-dl ([#352](https://github.com/mikf/gallery-dl/issues/352))
### Changes
- The default cache file location on non-Windows systems is now
  - `$XDG_CACHE_HOME/gallery-dl/cache.sqlite3` or
  - `~/.cache/gallery-dl/cache.sqlite3`
- New cache files are created with mode `600`
- `exhentai` extractors will always use `e-hentai.org` as domain
### Fixes
- Better handling of `exhentai` image limits and errors ([#356](https://github.com/mikf/gallery-dl/issues/356), [#360](https://github.com/mikf/gallery-dl/issues/360))
- Try to prevent ZIP file corruption ([#355](https://github.com/mikf/gallery-dl/issues/355))
- Miscellaneous fixes for `behance`, `ngomik`
## 1.9.0 - 2019-07-19
### Additions
- Support for
  - `erolord` - http://erolord.com/ ([#326](https://github.com/mikf/gallery-dl/issues/326))
- Add login support for `instagram` ([#195](https://github.com/mikf/gallery-dl/issues/195))
- Add `--no-download` and `extractor.*.download` options to disable file downloads ([#220](https://github.com/mikf/gallery-dl/issues/220))
- Add `-A/--abort` to specify the number of consecutive download skips before aborting
- Interpret `-1` as infinite retries ([#300](https://github.com/mikf/gallery-dl/issues/300))
- Implement custom log message formats per log-level ([#304](https://github.com/mikf/gallery-dl/issues/304))
- Implement an `mtime` post-processor that sets file modification times according to metadata fields ([#332](https://github.com/mikf/gallery-dl/issues/332))
- Implement a `twitter.content`
option to enable tweet text extraction ([#333](https://github.com/mikf/gallery-dl/issues/333), [#338](https://github.com/mikf/gallery-dl/issues/338)) - Enable `date-min/-max/-format` options for `tumblr` ([#337](https://github.com/mikf/gallery-dl/issues/337)) ### Changes - Set file modification times according to their `Last-Modified` header when downloading ([#236](https://github.com/mikf/gallery-dl/issues/236), [#277](https://github.com/mikf/gallery-dl/issues/277)) - Use `--no-mtime` or `downloader.*.mtime` to disable this behavior - Duplicate download URLs are no longer silently ignored (controllable with `extractor.*.image-unique`) - Deprecate `--abort-on-skip` ### Fixes - Retry downloads on OpenSSL exceptions ([#324](https://github.com/mikf/gallery-dl/issues/324)) - Ignore unavailable pins on `sexcom` instead of raising an exception ([#325](https://github.com/mikf/gallery-dl/issues/325)) - Use Firefox's SSL/TLS ciphers to prevent Cloudflare CAPTCHAs ([#342](https://github.com/mikf/gallery-dl/issues/342)) - Improve folder name matching on `deviantart` ([#343](https://github.com/mikf/gallery-dl/issues/343)) - Forward cookies to `youtube-dl` to allow downloading private videos - Miscellaneous fixes for `35photo`, `500px`, `newgrounds`, `simplyhentai` ## 1.8.7 - 2019-06-28 ### Additions - Support for - `vanillarock` - https://vanilla-rock.com/ ([#254](https://github.com/mikf/gallery-dl/issues/254)) - `nsfwalbum` - https://nsfwalbum.com/ ([#287](https://github.com/mikf/gallery-dl/issues/287)) - `artist` and `tags` metadata for `hentaicafe` ([#238](https://github.com/mikf/gallery-dl/issues/238)) - `description` metadata for `instagram` ([#310](https://github.com/mikf/gallery-dl/issues/310)) - Format string option to replace a substring with another - `R//` ([#318](https://github.com/mikf/gallery-dl/issues/318)) ### Changes - Delete empty archives created by the `zip` post-processor ([#316](https://github.com/mikf/gallery-dl/issues/316)) ### Fixes - Handle `hitomi` Game CG galleries correctly ([#321](https://github.com/mikf/gallery-dl/issues/321)) - Miscellaneous fixes for `deviantart`, `hitomi`, `pururin`, `kissmanga`, `keenspot`, `mangoxo`, `imagefap` ## 1.8.6 - 2019-06-14 ### Additions - Support for - `slickpic` - https://www.slickpic.com/ ([#249](https://github.com/mikf/gallery-dl/issues/249)) - `xhamster` - https://xhamster.com/ ([#281](https://github.com/mikf/gallery-dl/issues/281)) - `pornhub` - https://www.pornhub.com/ ([#282](https://github.com/mikf/gallery-dl/issues/282)) - `8muses` - https://www.8muses.com/ ([#305](https://github.com/mikf/gallery-dl/issues/305)) - `extra` option for `deviantart` to download Sta.sh content linked in description texts ([#302](https://github.com/mikf/gallery-dl/issues/302)) ### Changes - Detect `directlink` URLs with upper case filename extensions ([#296](https://github.com/mikf/gallery-dl/issues/296)) ### Fixes - Improved error handling for `tumblr` API calls ([#297](https://github.com/mikf/gallery-dl/issues/297)) - Fixed extraction of `livedoor` blogs ([#301](https://github.com/mikf/gallery-dl/issues/301)) - Fixed extraction of special `deviantart` Sta.sh items ([#307](https://github.com/mikf/gallery-dl/issues/307)) - Fixed pagination for specific `keenspot` comics ## 1.8.5 - 2019-06-01 ### Additions - Support for - `keenspot` - http://keenspot.com/ ([#223](https://github.com/mikf/gallery-dl/issues/223)) - `sankakucomplex` - https://www.sankakucomplex.com ([#258](https://github.com/mikf/gallery-dl/issues/258)) - `folders` option for `deviantart` to 
add a list of containing folders to each file ([#276](https://github.com/mikf/gallery-dl/issues/276)) - `captcha` option for `kissmanga` and `readcomiconline` to control CAPTCHA handling ([#279](https://github.com/mikf/gallery-dl/issues/279)) - `filename` metadata for files downloaded with youtube-dl ([#291](https://github.com/mikf/gallery-dl/issues/291)) ### Changes - Adjust `wallhaven` extractors to new page layout: - use API and add `api-key` option - removed traditional login support - Provide original filenames for `patreon` downloads ([#268](https://github.com/mikf/gallery-dl/issues/268)) - Use e-hentai.org or exhentai.org depending on input URL ([#278](https://github.com/mikf/gallery-dl/issues/278)) ### Fixes - Fix pagination over `sankaku` popular listings ([#265](https://github.com/mikf/gallery-dl/issues/265)) - Fix folder and collection extraction on `deviantart` ([#271](https://github.com/mikf/gallery-dl/issues/271)) - Detect "AreYouHuman" redirects on `readcomiconline` ([#279](https://github.com/mikf/gallery-dl/issues/279)) - Miscellaneous fixes for `hentainexus`, `livedoor`, `ngomik` ## 1.8.4 - 2019-05-17 ### Additions - Support for - `patreon` - https://www.patreon.com/ ([#226](https://github.com/mikf/gallery-dl/issues/226)) - `hentainexus` - https://hentainexus.com/ ([#256](https://github.com/mikf/gallery-dl/issues/256)) - `date` metadata fields for `pixiv` ([#248](https://github.com/mikf/gallery-dl/issues/248)), `instagram` ([#250](https://github.com/mikf/gallery-dl/issues/250)), `exhentai`, and `newgrounds` ### Changes - Improved `flickr` metadata and video extraction ([#246](https://github.com/mikf/gallery-dl/issues/246)) ### Fixes - Download original GIF animations from `deviantart` ([#242](https://github.com/mikf/gallery-dl/issues/242)) - Ignore missing `edge_media_to_comment` fields on `instagram` ([#250](https://github.com/mikf/gallery-dl/issues/250)) - Fix serialization of `datetime` objects for `--write-metadata` ([#251](https://github.com/mikf/gallery-dl/issues/251), [#252](https://github.com/mikf/gallery-dl/issues/252)) - Allow multiple post-processor command-line options at once ([#253](https://github.com/mikf/gallery-dl/issues/253)) - Prevent crash on `booru` sites when no tags are available ([#259](https://github.com/mikf/gallery-dl/issues/259)) - Fix extraction on `instagram` after `rhx_gis` field removal ([#266](https://github.com/mikf/gallery-dl/issues/266)) - Avoid Cloudflare CAPTCHAs for Python interpreters built against OpenSSL < 1.1.1 - Miscellaneous fixes for `luscious` ## 1.8.3 - 2019-05-04 ### Additions - Support for - `plurk` - https://www.plurk.com/ ([#212](https://github.com/mikf/gallery-dl/issues/212)) - `sexcom` - https://www.sex.com/ ([#147](https://github.com/mikf/gallery-dl/issues/147)) - `--clear-cache` - `date` metadata fields for `deviantart`, `twitter`, and `tumblr` ([#224](https://github.com/mikf/gallery-dl/issues/224), [#232](https://github.com/mikf/gallery-dl/issues/232)) ### Changes - Standalone executables are now built using PyInstaller: - uses the latest CPython interpreter (Python 3.7.3) - available on several platforms (Windows, Linux, macOS) - includes the `certifi` CA bundle, `youtube-dl`, and `pyOpenSSL` on Windows ### Fixes - Patch `urllib3`'s default list of SSL/TLS ciphers to prevent Cloudflare CAPTCHAs ([#227](https://github.com/mikf/gallery-dl/issues/227)) (Windows users need to install `pyOpenSSL` for this to take effect) - Provide fallback URLs for `twitter` images ([#237](https://github.com/mikf/gallery-dl/issues/237)) 
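A minimal sketch of the Windows workaround mentioned in the `urllib3` cipher fix above; the PyPI package name is `pyOpenSSL`, and no specific version is implied here:

```sh
# Install pyOpenSSL so the patched SSL/TLS cipher list takes effect on Windows.
python -m pip install pyOpenSSL
```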
- Send `Referer` headers when downloading from `hitomi` ([#239](https://github.com/mikf/gallery-dl/issues/239)) - Updated login procedure on `mangoxo` ## 1.8.2 - 2019-04-12 ### Additions - Support for - `pixnet` - https://www.pixnet.net/ ([#177](https://github.com/mikf/gallery-dl/issues/177)) - `wikiart` - https://www.wikiart.org/ ([#179](https://github.com/mikf/gallery-dl/issues/179)) - `mangoxo` - https://www.mangoxo.com/ ([#184](https://github.com/mikf/gallery-dl/issues/184)) - `yaplog` - https://yaplog.jp/ ([#190](https://github.com/mikf/gallery-dl/issues/190)) - `livedoor` - http://blog.livedoor.jp/ ([#190](https://github.com/mikf/gallery-dl/issues/190)) - Login support for `mangoxo` ([#184](https://github.com/mikf/gallery-dl/issues/184)) and `twitter` ([#214](https://github.com/mikf/gallery-dl/issues/214)) ### Changes - Increased required `Requests` version to 2.11.0 ### Fixes - Improved image quality on `reactor` sites ([#210](https://github.com/mikf/gallery-dl/issues/210)) - Support `imagebam` galleries with more than 100 images ([#219](https://github.com/mikf/gallery-dl/issues/219)) - Updated Cloudflare bypass code ## 1.8.1 - 2019-03-29 ### Additions - Support for: - `35photo` - https://35photo.pro/ ([#162](https://github.com/mikf/gallery-dl/issues/162)) - `500px` - https://500px.com/ ([#185](https://github.com/mikf/gallery-dl/issues/185)) - `instagram` extractor for hashtags ([#202](https://github.com/mikf/gallery-dl/issues/202)) - Option to get more metadata on `deviantart` ([#189](https://github.com/mikf/gallery-dl/issues/189)) - Man pages and bash completion ([#150](https://github.com/mikf/gallery-dl/issues/150)) - Snap improvements ([#197](https://github.com/mikf/gallery-dl/issues/197), [#199](https://github.com/mikf/gallery-dl/issues/199), [#207](https://github.com/mikf/gallery-dl/issues/207)) ### Changes - Better FFmpeg arguments for `--ugoira-conv` - Adjusted metadata for `luscious` albums ### Fixes - Proper handling of `instagram` multi-image posts ([#178](https://github.com/mikf/gallery-dl/issues/178), [#201](https://github.com/mikf/gallery-dl/issues/201)) - Fixed `tumblr` avatar URLs when not using OAuth1.0 ([#193](https://github.com/mikf/gallery-dl/issues/193)) - Miscellaneous fixes for `exhentai`, `komikcast` ## 1.8.0 - 2019-03-15 ### Additions - Support for: - `weibo` - https://www.weibo.com/ - `pururin` - https://pururin.io/ ([#174](https://github.com/mikf/gallery-dl/issues/174)) - `fashionnova` - https://www.fashionnova.com/ ([#175](https://github.com/mikf/gallery-dl/issues/175)) - `shopify` sites in general ([#175](https://github.com/mikf/gallery-dl/issues/175)) - Snap packaging ([#169](https://github.com/mikf/gallery-dl/issues/169), [#170](https://github.com/mikf/gallery-dl/issues/170), [#187](https://github.com/mikf/gallery-dl/issues/187), [#188](https://github.com/mikf/gallery-dl/issues/188)) - Automatic Cloudflare DDoS protection bypass - Extractor and Job information for logging format strings - `dynastyscans` image and search extractors ([#163](https://github.com/mikf/gallery-dl/issues/163)) - `deviantart` scraps extractor ([#168](https://github.com/mikf/gallery-dl/issues/168)) - `artstation` extractor for artwork listings ([#172](https://github.com/mikf/gallery-dl/issues/172)) - `smugmug` video support and improved image format selection ([#183](https://github.com/mikf/gallery-dl/issues/183)) ### Changes - More metadata for `nhentai` galleries - Combined `myportfolio` extractors into one - Renamed `name` metadata field to `filename` and removed the original 
`filename` field - Simplified and improved internal data structures - Optimized creation of child extractors ### Fixes - Filter empty `tumblr` URLs ([#165](https://github.com/mikf/gallery-dl/issues/165)) - Filter ads and improve connection speed on `hentaifoundry` - Show proper error messages if `luscious` galleries are unavailable - Miscellaneous fixes for `mangahere`, `ngomik`, `simplyhentai`, `imgspice` ### Removals - `seaotterscans` ## 1.7.0 - 2019-02-05 - Added support for: - `photobucket` - http://photobucket.com/ ([#117](https://github.com/mikf/gallery-dl/issues/117)) - `hentaifox` - https://hentaifox.com/ ([#160](https://github.com/mikf/gallery-dl/issues/160)) - `tsumino` - https://www.tsumino.com/ ([#161](https://github.com/mikf/gallery-dl/issues/161)) - Added the ability to dynamically generate extractors based on a user's config file for - [`mastodon`](https://github.com/tootsuite/mastodon) instances ([#144](https://github.com/mikf/gallery-dl/issues/144)) - [`foolslide`](https://github.com/FoolCode/FoOlSlide) based sites - [`foolfuuka`](https://github.com/FoolCode/FoolFuuka) based archives - Added an extractor for `behance` collections ([#157](https://github.com/mikf/gallery-dl/issues/157)) - Added login support for `luscious` ([#159](https://github.com/mikf/gallery-dl/issues/159)) and `tsumino` ([#161](https://github.com/mikf/gallery-dl/issues/161)) - Added an option to stop downloading if the `exhentai` image limit is exceeded ([#141](https://github.com/mikf/gallery-dl/issues/141)) - Fixed extraction issues for `behance` and `mangapark` ## 1.6.3 - 2019-01-18 - Added `metadata` post-processor to write image metadata to an external file ([#135](https://github.com/mikf/gallery-dl/issues/135)) - Added option to reverse chapter order of manga extractors ([#149](https://github.com/mikf/gallery-dl/issues/149)) - Added authentication support for `danbooru` ([#151](https://github.com/mikf/gallery-dl/issues/151)) - Added tag metadata for `exhentai` and `hbrowse` galleries - Improved `*reactor` extractors ([#148](https://github.com/mikf/gallery-dl/issues/148)) - Fixed extraction issues for `nhentai` ([#156](https://github.com/mikf/gallery-dl/issues/156)), `pinterest`, `mangapark` ## 1.6.2 - 2019-01-01 - Added support for: - `instagram` - https://www.instagram.com/ ([#134](https://github.com/mikf/gallery-dl/issues/134)) - Added support for multiple items on sta.sh pages ([#113](https://github.com/mikf/gallery-dl/issues/113)) - Added option to download `tumblr` avatars ([#137](https://github.com/mikf/gallery-dl/issues/137)) - Changed defaults for visited post types and inline media on `tumblr` - Improved inline extraction of `tumblr` posts ([#133](https://github.com/mikf/gallery-dl/issues/133), [#137](https://github.com/mikf/gallery-dl/issues/137)) - Improved error handling and retry behavior of all API calls - Improved handling of missing fields in format strings ([#136](https://github.com/mikf/gallery-dl/issues/136)) - Fixed hash extraction for unusual `tumblr` URLs ([#129](https://github.com/mikf/gallery-dl/issues/129)) - Fixed image subdomains for `hitomi` galleries ([#142](https://github.com/mikf/gallery-dl/issues/142)) - Fixed and improved miscellaneous issues for `kissmanga` ([#20](https://github.com/mikf/gallery-dl/issues/20)), `luscious`, `mangapark`, `readcomiconline` ## 1.6.1 - 2018-11-28 - Added support for: - `joyreactor` - http://joyreactor.cc/ ([#114](https://github.com/mikf/gallery-dl/issues/114)) - `pornreactor` - http://pornreactor.cc/ 
([#114](https://github.com/mikf/gallery-dl/issues/114)) - `newgrounds` - https://www.newgrounds.com/ ([#119](https://github.com/mikf/gallery-dl/issues/119)) - Added extractor for search results on `luscious` ([#127](https://github.com/mikf/gallery-dl/issues/127)) - Fixed filenames of ZIP archives ([#126](https://github.com/mikf/gallery-dl/issues/126)) - Fixed extraction issues for `gfycat`, `hentaifoundry` ([#125](https://github.com/mikf/gallery-dl/issues/125)), `mangafox` ## 1.6.0 - 2018-11-17 - Added support for: - `wallhaven` - https://alpha.wallhaven.cc/ - `yuki` - https://yuki.la/ - Added youtube-dl integration and video downloads for `twitter` ([#99](https://github.com/mikf/gallery-dl/issues/99)), `behance`, `artstation` - Added per-extractor options for network connections (`retries`, `timeout`, `verify`) - Added a `--no-check-certificate` command-line option - Added ability to specify the number of skipped downloads before aborting/exiting ([#115](https://github.com/mikf/gallery-dl/issues/115)) - Added extractors for scraps, favorites, popular and recent images on `hentaifoundry` ([#110](https://github.com/mikf/gallery-dl/issues/110)) - Improved login procedure for `pixiv` to avoid unwanted emails on each new login - Improved album metadata and error handling for `flickr` ([#109](https://github.com/mikf/gallery-dl/issues/109)) - Updated default User-Agent string to Firefox 62 ([#122](https://github.com/mikf/gallery-dl/issues/122)) - Fixed `twitter` API response handling when logged in ([#123](https://github.com/mikf/gallery-dl/issues/123)) - Fixed issue when converting Ugoira using H.264 - Fixed miscellaneous issues for `2chan`, `deviantart`, `fallenangels`, `flickr`, `imagefap`, `pinterest`, `turboimagehost`, `warosu`, `yuki` ([#112](https://github.com/mikf/gallery-dl/issues/112)) ## 1.5.3 - 2018-09-14 - Added support for: - `hentaicafe` - https://hentai.cafe/ ([#101](https://github.com/mikf/gallery-dl/issues/101)) - `bobx` - http://www.bobx.com/dark/ - Added black-/whitelist options for post-processor modules - Added support for `tumblr` inline videos ([#102](https://github.com/mikf/gallery-dl/issues/102)) - Fixed extraction of `smugmug` albums without owner ([#100](https://github.com/mikf/gallery-dl/issues/100)) - Fixed issues when using default config values with `reddit` extractors ([#104](https://github.com/mikf/gallery-dl/issues/104)) - Fixed pagination for user favorites on `sankaku` ([#106](https://github.com/mikf/gallery-dl/issues/106)) - Fixed a crash when processing `deviantart` journals ([#108](https://github.com/mikf/gallery-dl/issues/108)) ## 1.5.2 - 2018-08-31 - Added support for `twitter` timelines ([#96](https://github.com/mikf/gallery-dl/issues/96)) - Added option to suppress FFmpeg output during ugoira conversions - Improved filename formatter performance - Improved inline image quality on `tumblr` ([#98](https://github.com/mikf/gallery-dl/issues/98)) - Fixed image URLs for newly released `mangadex` chapters - Fixed a smaller issue with `deviantart` journals - Replaced `subapics` with `ngomik` ## 1.5.1 - 2018-08-17 - Added support for: - `piczel` - https://piczel.tv/ - Added support for related pins on `pinterest` - Fixed accessing "offensive" galleries on `exhentai` ([#97](https://github.com/mikf/gallery-dl/issues/97)) - Fixed extraction issues for `mangadex`, `komikcast` and `behance` - Removed original-image functionality from `tumblr`, since "raw" images are no longer accessible ## 1.5.0 - 2018-08-03 - Added support for: - `behance` - 
https://www.behance.net/ - `myportfolio` - https://www.myportfolio.com/ ([#95](https://github.com/mikf/gallery-dl/issues/95)) - Added custom format string options to handle long strings ([#92](https://github.com/mikf/gallery-dl/issues/92), [#94](https://github.com/mikf/gallery-dl/issues/94)) - Slicing: `"{field[10:40]}"` - Replacement: `"{field:L40/too long/}"` - Improved frame rate handling for ugoira conversions - Improved private access token usage on `deviantart` - Fixed metadata extraction for some images on `nijie` - Fixed chapter extraction on `mangahere` - Removed `whatisthisimnotgoodwithcomputers` - Removed support for Python 3.3 ## 1.4.2 - 2018-07-06 - Added image-pool extractors for `safebooru` and `rule34` - Added option for extended tag information on `booru` sites ([#92](https://github.com/mikf/gallery-dl/issues/92)) - Added support for DeviantArt's new URL format - Added support for `mangapark` mirrors - Changed `imagefap` extractors to use HTTPS - Fixed crash when skipping downloads for files without known extension ## 1.4.1 - 2018-06-22 - Added an `ugoira` post-processor to convert `pixiv` animations to WebM - Added `--zip` and `--ugoira-conv` command-line options - Changed how ugoira frame information is handled - instead of being written to a separate file, it is now made available as metadata field of the ZIP archive - Fixed manga and chapter titles for `mangadex` - Fixed file deletion by post-processors ## 1.4.0 - 2018-06-08 - Added support for: - `simplyhentai` - https://www.simply-hentai.com/ ([#89](https://github.com/mikf/gallery-dl/issues/89)) - Added extractors for - `pixiv` search results and followed users - `deviantart` search results and popular listings - Added post-processors to perform actions on downloaded files - Added options to configure logging behavior - Added OAuth support for `smugmug` - Changed `pixiv` extractors to use the AppAPI - this breaks `favorite` archive IDs and changes some metadata fields - Changed the default filename format for `tumblr` and renamed `offset` to `num` - Fixed a possible UnicodeDecodeError during installation ([#86](https://github.com/mikf/gallery-dl/issues/86)) - Fixed extraction of `mangadex` manga with more than 100 chapters ([#84](https://github.com/mikf/gallery-dl/issues/84)) - Fixed miscellaneous issues for `imgur`, `reddit`, `komikcast`, `mangafox` and `imagebam` ## 1.3.5 - 2018-05-04 - Added support for: - `smugmug` - https://www.smugmug.com/ - Added title information for `mangadex` chapters - Improved the `pinterest` API implementation ([#83](https://github.com/mikf/gallery-dl/issues/83)) - Improved error handling for `deviantart` and `tumblr` - Removed `gomanga` and `puremashiro` ## 1.3.4 - 2018-04-20 - Added support for custom OAuth2 credentials for `pinterest` - Improved rate limit handling for `tumblr` extractors - Improved `hentaifoundry` extractors - Improved `imgur` URL patterns - Fixed miscellaneous extraction issues for `luscious` and `komikcast` - Removed `loveisover` and `spectrumnexus` ## 1.3.3 - 2018-04-06 - Added extractors for - `nhentai` search results - `exhentai` search results and favorites - `nijie` doujins and favorites - Improved metadata extraction for `exhentai` and `nijie` - Improved `tumblr` extractors by avoiding unnecessary API calls - Fixed Cloudflare DDoS protection bypass - Fixed errors when trying to print unencodable characters ## 1.3.2 - 2018-03-23 - Added extractors for `artstation` albums, challenges and search results - Improved URL and metadata extraction for `hitomi`and 
`nhentai` - Fixed page transitions for `danbooru` API results ([#82](https://github.com/mikf/gallery-dl/issues/82)) ## 1.3.1 - 2018-03-16 - Added support for: - `mangadex` - https://mangadex.org/ - `artstation` - https://www.artstation.com/ - Added Cloudflare DDoS protection bypass to `komikcast` extractors - Changed archive ID formats for `deviantart` folders and collections - Improved error handling for `deviantart` API calls - Removed `imgchili` and various smaller image hosts ## 1.3.0 - 2018-03-02 - Added `--proxy` to explicitly specify a proxy server ([#76](https://github.com/mikf/gallery-dl/issues/76)) - Added options to customize [archive ID formats](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorarchive-format) and [undefined replacement fields](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorkeywords-default) - Changed various archive ID formats to improve their behavior for favorites / bookmarks / etc. - Affected modules are `deviantart`, `flickr`, `tumblr`, `pixiv` and all …boorus - Improved `sankaku` and `idolcomplex` support by - respecting `page` and `next` URL parameters ([#79](https://github.com/mikf/gallery-dl/issues/79)) - bypassing the page-limit for unauthenticated users - Improved `directlink` metadata by properly unquoting it - Fixed `pixiv` ugoira extraction ([#78](https://github.com/mikf/gallery-dl/issues/78)) - Fixed miscellaneous extraction issues for `mangastream` and `tumblr` - Removed `yeet`, `chronos`, `coreimg`, `hosturimage`, `imageontime`, `img4ever`, `imgmaid`, `imgupload` ## 1.2.0 - 2018-02-16 - Added support for: - `paheal` - https://rule34.paheal.net/ ([#69](https://github.com/mikf/gallery-dl/issues/69)) - `komikcast` - https://komikcast.com/ ([#70](https://github.com/mikf/gallery-dl/issues/70)) - `subapics` - http://subapics.com/ ([#70](https://github.com/mikf/gallery-dl/issues/70)) - Added `--download-archive` to record downloaded files in an archive file - Added `--write-log` to write logging output to a file - Added a filetype check on download completion to fix incorrectly assigned filename extensions ([#63](https://github.com/mikf/gallery-dl/issues/63)) - Added the `tumblr:...` pseudo URI scheme to support custom domains for Tumblr blogs ([#71](https://github.com/mikf/gallery-dl/issues/71)) - Added fallback URLs for `tumblr` images ([#64](https://github.com/mikf/gallery-dl/issues/64)) - Added support for `reddit`-hosted images ([#68](https://github.com/mikf/gallery-dl/issues/68)) - Improved the input file format by allowing comments and per-URL options - Fixed OAuth 1.0 signature generation for Python 3.3 and 3.4 ([#75](https://github.com/mikf/gallery-dl/issues/75)) - Fixed smaller issues for `luscious`, `hentai2read`, `hentaihere` and `imgur` - Removed the `batoto` module ## 1.1.2 - 2018-01-12 - Added support for: - `puremashiro` - http://reader.puremashiro.moe/ ([#66](https://github.com/mikf/gallery-dl/issues/66)) - `idolcomplex` - https://idol.sankakucomplex.com/ - Added an option to filter reblogs on `tumblr` ([#61](https://github.com/mikf/gallery-dl/issues/61)) - Added OAuth user authentication for `tumblr` ([#65](https://github.com/mikf/gallery-dl/issues/65)) - Added support for `slideshare` mobile URLs ([#67](https://github.com/mikf/gallery-dl/issues/67)) - Improved pagination for various …booru sites to work around page limits - Fixed chapter information parsing for certain manga on `kissmanga` ([#58](https://github.com/mikf/gallery-dl/issues/58)) and `batoto` 
([#60](https://github.com/mikf/gallery-dl/issues/60)) ## 1.1.1 - 2017-12-22 - Added support for: - `slideshare` - https://www.slideshare.net/ ([#54](https://github.com/mikf/gallery-dl/issues/54)) - Added pool- and post-extractors for `sankaku` - Added OAuth user authentication for `deviantart` - Updated `luscious` to support `members.luscious.net` URLs ([#55](https://github.com/mikf/gallery-dl/issues/55)) - Updated `mangahere` to use their new domain name (mangahere.cc) and support mobile URLs - Updated `gelbooru` to not be restricted to the first 20,000 images ([#56](https://github.com/mikf/gallery-dl/issues/56)) - Fixed extraction issues for `nhentai` and `khinsider` ## 1.1.0 - 2017-12-08 - Added the ``-r/--limit-rate`` command-line option to set a maximum download rate - Added the ``--sleep`` command-line option to specify the number of seconds to sleep before each download - Updated `gelbooru` to no longer use their now disabled API - Fixed SWF extraction for `sankaku` ([#52](https://github.com/mikf/gallery-dl/issues/52)) - Fixed extraction issues for `hentai2read` and `khinsider` - Removed the deprecated `--images` and `--chapters` options - Removed the ``mangazuki`` module ## 1.0.2 - 2017-11-24 - Added an option to set a [custom user-agent string](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractoruser-agent) - Improved retry behavior for failed HTTP requests - Improved `seiga` by providing better metadata and getting more than the latest 200 images - Improved `tumblr` by adding support for [all post types](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractortumblrposts), scanning for [inline images](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractortumblrinline) and following [external links](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractortumblrexternal) ([#48](https://github.com/mikf/gallery-dl/issues/48)) - Fixed extraction issues for `hbrowse`, `khinsider` and `senmanga` ## 1.0.1 - 2017-11-10 - Added support for: - `xvideos` - https://www.xvideos.com/ ([#45](https://github.com/mikf/gallery-dl/issues/45)) - Fixed exception handling during file downloads which could lead to a premature exit - Fixed an issue with `tumblr` where not all images would be downloaded when using tags ([#48](https://github.com/mikf/gallery-dl/issues/48)) - Fixed extraction issues for `imgbox` ([#47](https://github.com/mikf/gallery-dl/issues/47)), `mangastream` ([#49](https://github.com/mikf/gallery-dl/issues/49)) and `mangahere` ## 1.0.0 - 2017-10-27 - Added support for: - `warosu` - https://warosu.org/ - `b4k` - https://arch.b4k.co/ - Added support for `pixiv` ranking lists - Added support for `booru` popular lists (`danbooru`, `e621`, `konachan`, `yandere`, `3dbooru`) - Added the `--cookies` command-line and [`cookies`](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorcookies) config option to load additional cookies - Added the `--filter` and `--chapter-filter` command-line options to select individual images or manga-chapters by their metadata using simple Python expressions ([#43](https://github.com/mikf/gallery-dl/issues/43)) - Added the [`verify`](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#downloaderhttpverify) config option to control certificate verification during file downloads - Added config options to overwrite internally used API credentials ([API Tokens & 
IDs](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#api-tokens-ids)) - Added `-K` as a shortcut for `--list-keywords` - Changed the `--images` and `--chapters` command-line options to `--range` and `--chapter-range` - Changed keyword names for various modules to make them accessible by `--filter`. In general minus signs have been replaced with underscores (e.g. `gallery-id` -> `gallery_id`). - Changed default filename formats for manga extractors to optionally use volume and title information - Improved the downloader modules to use [`.part` files](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#downloaderpart) and support resuming incomplete downloads ([#29](https://github.com/mikf/gallery-dl/issues/29)) - Improved `deviantart` by distinguishing between users and groups ([#26](https://github.com/mikf/gallery-dl/issues/26)), always using HTTPS, and always downloading full-sized original images - Improved `sankaku` by adding authentication support and fixing various other issues ([#44](https://github.com/mikf/gallery-dl/issues/44)) - Improved URL pattern for direct image links ([#30](https://github.com/mikf/gallery-dl/issues/30)) - Fixed an issue with `luscious` not getting original image URLs ([#33](https://github.com/mikf/gallery-dl/issues/33)) - Fixed various smaller issues for `batoto`, `hentai2read` ([#38](https://github.com/mikf/gallery-dl/issues/38)), `jaiminisbox`, `khinsider`, `kissmanga` ([#28](https://github.com/mikf/gallery-dl/issues/28), [#46](https://github.com/mikf/gallery-dl/issues/46)), `mangahere`, `pawoo`, `twitter` - Removed `kisscomic` and `yonkouprod` modules ## 0.9.1 - 2017-07-24 - Added support for: - `2chan` - https://www.2chan.net/ - `4plebs` - https://archive.4plebs.org/ - `archivedmoe` - https://archived.moe/ - `archiveofsins` - https://archiveofsins.com/ - `desuarchive` - https://desuarchive.org/ - `fireden` - https://boards.fireden.net/ - `loveisover` - https://archive.loveisover.me/ - `nyafuu` - https://archive.nyafuu.org/ - `rbt` - https://rbt.asia/ - `thebarchive` - https://thebarchive.com/ - `mangazuki` - https://mangazuki.co/ - Improved `reddit` to allow submission filtering by ID and human-readable dates - Improved `deviantart` to support group galleries and gallery folders ([#26](https://github.com/mikf/gallery-dl/issues/26)) - Changed `deviantart` to use better default path formats - Fixed extraction of larger `imgur` albums - Fixed some smaller issues for `pixiv`, `batoto` and `fallenangels` ## 0.9.0 - 2017-06-28 - Added support for: - `reddit` - https://www.reddit.com/ ([#15](https://github.com/mikf/gallery-dl/issues/15)) - `flickr` - https://www.flickr.com/ ([#16](https://github.com/mikf/gallery-dl/issues/16)) - `gfycat` - https://gfycat.com/ - Added support for direct image links - Added user authentication via [OAuth](https://github.com/mikf/gallery-dl#52oauth) for `reddit` and `flickr` - Added support for user authentication data from [`.netrc`](https://stackoverflow.com/tags/.netrc/info) files ([#22](https://github.com/mikf/gallery-dl/issues/22)) - Added a simple progress indicator for multiple URLs ([#19](https://github.com/mikf/gallery-dl/issues/19)) - Added the `--write-unsupported` command-line option to write unsupported URLs to a file - Added documentation for all available config options ([configuration.rst](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst)) - Improved `pixiv` to support tags for user downloads ([#17](https://github.com/mikf/gallery-dl/issues/17)) - Improved 
`pixiv` to support shortened and http://pixiv.me/... URLs ([#23](https://github.com/mikf/gallery-dl/issues/23)) - Improved `imgur` to properly handle `.gifv` images and provide better metadata - Fixed an issue with `kissmanga` where metadata parsing for some series failed ([#20](https://github.com/mikf/gallery-dl/issues/20)) - Fixed an issue with getting filename extensions from `Content-Type` response headers ## 0.8.4 - 2017-05-21 - Added the `--abort-on-skip` option to stop extraction if a download would be skipped - Improved the output format of the `--list-keywords` option - Updated `deviantart` to support all media types and journals - Updated `fallenangels` to support their [Vietnamese version](https://truyen.fascans.com/) - Fixed an issue with multiple tags on ...booru sites - Removed the `yomanga` module ## 0.8.3 - 2017-05-01 - Added support for https://pawoo.net/ - Added manga extractors for all [FoOlSlide](https://foolcode.github.io/FoOlSlide/)-based modules - Added the `-q/--quiet` and `-v/--verbose` options to control output verbosity - Added the `-j/--dump-json` option to dump extractor results in JSON format - Added the `--ignore-config` option - Updated the `exhentai` extractor to fall back to using the e-hentai version if no username is given - Updated `deviantart` to support sta.sh URLs - Fixed an issue with `kissmanga` which prevented image URLs from being decrypted properly (again) - Fixed an issue with `pixhost` where for an image inside an album it would always download the first image of that album ([#13](https://github.com/mikf/gallery-dl/issues/13)) - Removed the `mangashare` and `readcomics` modules ## 0.8.2 - 2017-04-10 - Fixed an issue in `kissmanga` which prevented image URLs from being decrypted properly ## 0.8.1 - 2017-04-09 - Added new extractors: - `kireicake` - https://reader.kireicake.com/ - `seaotterscans` - https://reader.seaotterscans.com/ - Added a favourites extractor for `deviantart` - Re-enabled the `kissmanga` module - Updated `nijie` to support multi-page image listings - Updated `mangastream` to support readms.net URLs - Updated `exhentai` to support e-hentai.org URLs - Updated `fallenangels` to support their new domain and site layout ## 0.8.0 - 2017-03-28 - Added logging support - Added the `-R/--retries` option to specify how often a download should be retried before giving up - Added the `--http-timeout` option to set a timeout for HTTP connections - Improved error handling/tolerance during HTTP file downloads ([#10](https://github.com/mikf/gallery-dl/issues/10)) - Improved option parsing and the help message from `-h/--help` - Changed the way configuration values are used by prioritizing top-level values - This allows for cmdline options like `-u/--username` to overwrite values set in configuration files - Fixed an issue with `imagefap.com` where incorrectly reported gallery sizes would cause the extractor to fail ([#9](https://github.com/mikf/gallery-dl/issues/9)) - Fixed an issue with `seiga.nicovideo.jp` where invalid characters in an API response caused the XML parser to fail - Fixed an issue with `seiga.nicovideo.jp` where the filename extension for the first image would be used for all others - Removed support for old configuration paths on Windows - Removed several modules: - `mangamint`: site is down - `whentai`: now requires account with VIP status for original images - `kissmanga`: encrypted image URLs (will be re-added later) ## 0.7.0 - 2017-03-06 - Added `--images` and `--chapters` options - Specifies which images (or chapters) 
to download through a comma-separated list of indices or index-ranges - Example: `--images -2,4,6-8,10-` will select images with index 1, 2, 4, 6, 7, 8 and 10 up to the last one - Changed the `-g`/`--get-urls` option - How often the -g option is given determines up to which level URLs are resolved. - See 3bca86618505c21628cd9c7179ce933a78d00ca2 - Changed several option keys: - `directory_fmt` -> `directory` - `filename_fmt` -> `filename` - `download-original` -> `original` - Improved [FoOlSlide](https://foolcode.github.io/FoOlSlide/)-based extractors - Fixed URL extraction for hentai2read - Fixed an issue with deviantart where the API access token wouldn't get refreshed ## 0.6.4 - 2017-02-13 - Added new extractors: - fallenangels (famatg.com) - Fixed url- and data-extraction for: - nhentai - mangamint - twitter - imagetwist - Disabled InsecureConnectionWarning when no certificates are available ## 0.6.3 - 2017-01-25 - Added new extractors: - gomanga - yomanga - mangafox - Fixed deviantart extractor failing - switched to using their API - Fixed an issue with SQLite on Python 3.6 - Automated test builds via Travis CI - Standalone executables for Windows ## 0.6.2 - 2017-01-05 - Added new extractors: - kisscomic - readcomics - yonkouprod - jaiminisbox - Added manga extractor to batoto-module - Added user extractor to seiga-module - Added `-i`/`--input-file` argument to allow local files and stdin as input (like wget) - Added basic support for `file://` URLs - this allows for the recursive extractor to be applied to local files: - `$ gallery-dl r:file://[path to file]` - Added a utility extractor to run unit test URLs - Updated luscious to deal with API changes - Fixed twitter to provide the original image URL - Minor fixes to hentaifoundry - Removed imgclick extractor ## 0.6.1 - 2016-11-30 - Added new extractors: - whentai - readcomiconline - sensescans, worldthree - imgmaid, imagevenue, img4ever, imgspot, imgtrial, pixhost - Added base class for extractors of [FoOlSlide](https://foolcode.github.io/FoOlSlide/)-based sites - Changed default paths for configuration files on Windows - old paths are still supported, but that will change in future versions - Fixed aborting downloads if a single one failed ([#5](https://github.com/mikf/gallery-dl/issues/5)) - Fixed cloudflare-bypass cache containing outdated cookies - Fixed image URLs for hitomi and 8chan - Updated deviantart to always provide the highest quality image - Updated README.rst - Removed doujinmode extractor ## 0.6.0 - 2016-10-08 - Added new extractors: - hentaihere - dokireader - twitter - rapidimg, picmaniac - Added support for finding filename extensions by Content-Type response header - Fixed filename/path issues on Windows ([#4](https://github.com/mikf/gallery-dl/issues/4)): - Enable path names with more than 260 characters - Remove trailing spaces in path segments - Updated Job class to automatically set category/subcategory keywords ## 0.5.2 - 2016-09-23 - Added new extractors: - pinterest - rule34 - dynastyscans - imagebam, coreimg, imgcandy, imgtrex - Added login capabilities for batoto - Added `--version` cmdline argument to print the current program version and exit - Added `--list-extractors` cmdline argument to print names of all extractor classes together with descriptions and example URLs - Added proper error messages if an image/user does not exist - Added unittests for every extractor ## 0.5.1 - 2016-08-22 - Added new extractors: - luscious - doujinmode - hentaibox - seiga - imagefap - Changed error
output to use stderr instead of stdout - Fixed broken pipes causing an exception-dump by catching BrokenPipeErrors ## 0.5.0 - 2016-07-25 ## 0.4.1 - 2015-12-03 - New modules (imagetwist, turboimagehost) - Manga-extractors: Download entire manga and not just single chapters - Generic extractor (provisional) - Better and configurable console output - Windows support ## 0.4.0 - 2015-11-26 ## 0.3.3 - 2015-11-10 ## 0.3.2 - 2015-11-04 ## 0.3.1 - 2015-10-30 ## 0.3.0 - 2015-10-05 ## 0.2.0 - 2015-06-28 ## 0.1.0 - 2015-05-27 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702659231.0 gallery_dl-1.26.9/LICENSE0000644000175000017500000004325414537102237013444 0ustar00mikemike GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. 
GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. 
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. 
Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. 
For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. 
You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1708353731.0 gallery_dl-1.26.9/MANIFEST.in0000644000175000017500000000010614564664303014171 0ustar00mikemikeinclude README.rst CHANGELOG.md LICENSE recursive-include docs *.conf ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1711211674.0158062 gallery_dl-1.26.9/PKG-INFO0000644000175000017500000003373714577602232013545 0ustar00mikemikeMetadata-Version: 2.1 Name: gallery_dl Version: 1.26.9 Summary: Command-line program to download image galleries and collections from several image hosting sites Home-page: https://github.com/mikf/gallery-dl Download-URL: https://github.com/mikf/gallery-dl/releases/latest Author: Mike Fährmann Author-email: mike_faehrmann@web.de Maintainer: Mike Fährmann Maintainer-email: mike_faehrmann@web.de License: GPLv2 Keywords: image gallery downloader crawler scraper Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Console Classifier: Intended Audience :: End Users/Desktop Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2) Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3 :: Only Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: 3.9 Classifier: Programming Language :: Python :: 3.10 Classifier: Programming Language :: Python :: 3.11 Classifier: Programming Language :: Python :: 3.12 Classifier: Programming Language :: Python :: Implementation :: CPython Classifier: Programming Language :: Python :: Implementation :: PyPy Classifier: Topic :: Internet :: WWW/HTTP Classifier: Topic :: Multimedia :: Graphics Classifier: Topic :: Utilities Requires-Python: >=3.4 License-File: LICENSE Requires-Dist: requests>=2.11.0 Provides-Extra: video Requires-Dist: youtube-dl; extra == "video" ========== gallery-dl ========== *gallery-dl* is a command-line program to download image galleries and collections from several image hosting sites (see `Supported Sites `__). It is a cross-platform tool with many `configuration options `__ and powerful `filenaming capabilities `__. |pypi| |build| .. 
contents:: Dependencies ============ - Python_ 3.4+ - Requests_ Optional -------- - FFmpeg_: Pixiv Ugoira conversion - yt-dlp_ or youtube-dl_: Video downloads - PySocks_: SOCKS proxy support - brotli_ or brotlicffi_: Brotli compression support - PyYAML_: YAML configuration file support - toml_: TOML configuration file support for Python<3.11 - SecretStorage_: GNOME keyring passwords for ``--cookies-from-browser`` Installation ============ Pip --- The stable releases of *gallery-dl* are distributed on PyPI_ and can be easily installed or upgraded using pip_: .. code:: bash python3 -m pip install -U gallery-dl Installing the latest dev version directly from GitHub can be done with pip_ as well: .. code:: bash python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz Note: Windows users should use :code:`py -3` instead of :code:`python3`. It is advised to use the latest version of pip_, including the essential packages :code:`setuptools` and :code:`wheel`. To ensure these packages are up-to-date, run .. code:: bash python3 -m pip install --upgrade pip setuptools wheel Standalone Executable --------------------- Prebuilt executable files with a Python interpreter and required Python packages included are available for - `Windows `__ (Requires `Microsoft Visual C++ Redistributable Package (x86) `__) - `Linux `__ Nightly Builds -------------- | Executables built from the latest commit can be found at | https://github.com/mikf/gallery-dl/actions/workflows/executables.yml Snap ---- Linux users that are using a distro that is supported by Snapd_ can install *gallery-dl* from the Snap Store: .. code:: bash snap install gallery-dl Chocolatey ---------- Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository: .. code:: powershell choco install gallery-dl Scoop ----- *gallery-dl* is also available in the Scoop_ "main" bucket for Windows users: .. code:: powershell scoop install gallery-dl Homebrew -------- For macOS or Linux users using Homebrew: .. code:: bash brew install gallery-dl MacPorts -------- For macOS users with MacPorts: .. code:: bash sudo port install gallery-dl Docker -------- Using the Dockerfile in the repository: .. code:: bash git clone https://github.com/mikf/gallery-dl.git cd gallery-dl/ docker build -t gallery-dl:latest . Pulling image from `Docker Hub `__: .. code:: bash docker pull mikf123/gallery-dl docker tag mikf123/gallery-dl gallery-dl Pulling image from `GitHub Container Registry `__: .. code:: bash docker pull ghcr.io/mikf/gallery-dl docker tag ghcr.io/mikf/gallery-dl gallery-dl To run the container you will probably want to attach some directories on the host so that the config file and downloads can persist across runs. Make sure to either download the example config file referenced in the repo and place it in the mounted volume location or touch an empty file there. If you gave the container a different tag or are using podman, make sure you adjust the image name in the following commands. Run ``docker image ls`` to check the name if you are not sure. The ``--rm`` flag will remove the container after every use, so you will always have a fresh environment for it to run. If you set up a CI/CD pipeline to auto-build the container, you can also add a ``--pull=newer`` flag so that, when you run it, Docker will check for a newer image and download it before running. ..
code:: bash docker run --rm -v $HOME/Downloads/:/gallery-dl/ -v $HOME/.config/gallery-dl/gallery-dl.conf:/etc/gallery-dl.conf -it gallery-dl:latest You can also add an alias to your shell for "gallery-dl" or create a simple bash script and drop it somewhere in your $PATH to act as a shim for this command. Usage ===== To use *gallery-dl* simply call it with the URLs you wish to download images from: .. code:: bash gallery-dl [OPTIONS]... URLS... Use :code:`gallery-dl --help` or see ``__ for a full list of all command-line options. Examples -------- Download images; in this case from danbooru via tag search for 'bonocho': .. code:: bash gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho" Get the direct URL of an image from a site supporting authentication with username & password: .. code:: bash gallery-dl -g -u "" -p "" "https://twitter.com/i/web/status/604341487988576256" Filter manga chapters by chapter number and language: .. code:: bash gallery-dl --chapter-filter "10 <= chapter < 20" -o "lang=fr" "https://mangadex.org/title/59793dd0-a2d8-41a2-9758-8197287a8539" | Search a remote resource for URLs and download images from them: | (URLs for which no extractor can be found will be silently ignored) .. code:: bash gallery-dl "r:https://pastebin.com/raw/FLwrCYsT" If a site's address is nonstandard for its extractor, you can prefix the URL with the extractor's name to force the use of a specific extractor: .. code:: bash gallery-dl "tumblr:https://sometumblrblog.example" Configuration ============= Configuration files for *gallery-dl* use a JSON-based file format. Documentation ------------- A list of all available configuration options and their descriptions can be found in ``__. | For a default configuration file with available options set to their default values, see ``__. | For a commented example with more involved settings and option usage, see ``__. Locations --------- *gallery-dl* searches for configuration files in the following places: Windows: * ``%APPDATA%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl.conf`` (``%USERPROFILE%`` usually refers to a user's home directory, i.e. ``C:\Users\\``) Linux, macOS, etc.: * ``/etc/gallery-dl.conf`` * ``${XDG_CONFIG_HOME}/gallery-dl/config.json`` * ``${HOME}/.config/gallery-dl/config.json`` * ``${HOME}/.gallery-dl.conf`` When run as `executable `__, *gallery-dl* will also look for a ``gallery-dl.conf`` file in the same directory as said executable. It is possible to use more than one configuration file at a time. In this case, any values from files after the first will get merged into the already loaded settings and potentially override previous ones. Authentication ============== Username & Password ------------------- Some extractors require you to provide valid login credentials in the form of a username & password pair. This is necessary for ``nijie`` and optional for ``aryion``, ``danbooru``, ``e621``, ``exhentai``, ``idolcomplex``, ``imgbb``, ``inkbunny``, ``mangadex``, ``mangoxo``, ``pillowfort``, ``sankaku``, ``subscribestar``, ``tapas``, ``tsumino``, ``twitter``, and ``zerochan``. You can set the necessary information in your `configuration file `__ .. code:: json { "extractor": { "twitter": { "username": "", "password": "" } } } or you can provide them directly via the :code:`-u/--username` and :code:`-p/--password` or via the :code:`-o/--option` command-line options .. 
code:: bash gallery-dl -u "" -p "" "URL" gallery-dl -o "username=" -o "password=" "URL" Cookies ------- For sites where login with username & password is not possible due to CAPTCHA or similar, or has not been implemented yet, you can use the cookies from a browser login session and input them into *gallery-dl*. This can be done via the `cookies `__ option in your configuration file by specifying - | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon | (e.g. `Get cookies.txt LOCALLY `__ for Chrome, `Export Cookies `__ for Firefox) - | a list of name-value pairs gathered from your browser's web developer tools | (in `Chrome `__, in `Firefox `__) - | the name of a browser to extract cookies from | (supported browsers are Chromium-based ones, Firefox, and Safari) For example: .. code:: json { "extractor": { "instagram": { "cookies": "$HOME/path/to/cookies.txt" }, "patreon": { "cookies": { "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a" } }, "twitter": { "cookies": ["firefox"] } } } | You can also specify a cookies.txt file with the :code:`--cookies` command-line option | or a browser to extract cookies from with :code:`--cookies-from-browser`: .. code:: bash gallery-dl --cookies "$HOME/path/to/cookies.txt" "URL" gallery-dl --cookies-from-browser firefox "URL" OAuth ----- *gallery-dl* supports user authentication via OAuth_ for some extractors. This is necessary for ``pixiv`` and optional for ``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``, and ``mastodon`` instances. Linking your account to *gallery-dl* grants it the ability to issue requests on your account's behalf and enables it to access resources which would otherwise be unavailable to a public user. To do so, start by invoking it with ``oauth:`` as an argument. For example: .. code:: bash gallery-dl oauth:flickr You will be sent to the site's authorization page and asked to grant read access to *gallery-dl*. Authorize it and you will be shown one or more "tokens", which should be added to your configuration file. To authenticate with a ``mastodon`` instance, run *gallery-dl* with ``oauth:mastodon:`` as argument. For example: .. code:: bash gallery-dl oauth:mastodon:pawoo.net gallery-dl oauth:mastodon:https://mastodon.social/ .. _Python: https://www.python.org/downloads/ .. _PyPI: https://pypi.org/ .. _pip: https://pip.pypa.io/en/stable/ .. _Requests: https://requests.readthedocs.io/en/master/ .. _FFmpeg: https://www.ffmpeg.org/ .. _yt-dlp: https://github.com/yt-dlp/yt-dlp .. _youtube-dl: https://ytdl-org.github.io/youtube-dl/ .. _PySocks: https://pypi.org/project/PySocks/ .. _brotli: https://github.com/google/brotli .. _brotlicffi: https://github.com/python-hyper/brotlicffi .. _PyYAML: https://pyyaml.org/ .. _toml: https://pypi.org/project/toml/ .. _SecretStorage: https://pypi.org/project/SecretStorage/ .. _Snapd: https://docs.snapcraft.io/installing-snapd .. _OAuth: https://en.wikipedia.org/wiki/OAuth .. _Chocolatey: https://chocolatey.org/install .. _Scoop: https://scoop.sh .. |pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg :target: https://pypi.org/project/gallery-dl/ .. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg :target: https://github.com/mikf/gallery-dl/actions .. 
|gitter| image:: https://badges.gitter.im/gallery-dl/main.svg :target: https://gitter.im/gallery-dl/main ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1711211672.0 gallery_dl-1.26.9/README.rst0000644000175000017500000002764214577602230014131 0ustar00mikemike========== gallery-dl ========== *gallery-dl* is a command-line program to download image galleries and collections from several image hosting sites (see `Supported Sites `__). It is a cross-platform tool with many `configuration options `__ and powerful `filenaming capabilities `__. |pypi| |build| .. contents:: Dependencies ============ - Python_ 3.4+ - Requests_ Optional -------- - FFmpeg_: Pixiv Ugoira conversion - yt-dlp_ or youtube-dl_: Video downloads - PySocks_: SOCKS proxy support - brotli_ or brotlicffi_: Brotli compression support - PyYAML_: YAML configuration file support - toml_: TOML configuration file support for Python<3.11 - SecretStorage_: GNOME keyring passwords for ``--cookies-from-browser`` Installation ============ Pip --- The stable releases of *gallery-dl* are distributed on PyPI_ and can be easily installed or upgraded using pip_: .. code:: bash python3 -m pip install -U gallery-dl Installing the latest dev version directly from GitHub can be done with pip_ as well: .. code:: bash python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz Note: Windows users should use :code:`py -3` instead of :code:`python3`. It is advised to use the latest version of pip_, including the essential packages :code:`setuptools` and :code:`wheel`. To ensure these packages are up-to-date, run .. code:: bash python3 -m pip install --upgrade pip setuptools wheel Standalone Executable --------------------- Prebuilt executable files with a Python interpreter and required Python packages included are available for - `Windows `__ (Requires `Microsoft Visual C++ Redistributable Package (x86) `__) - `Linux `__ Nightly Builds -------------- | Executables built from the latest commit can be found at | https://github.com/mikf/gallery-dl/actions/workflows/executables.yml Snap ---- Linux users that are using a distro that is supported by Snapd_ can install *gallery-dl* from the Snap Store: .. code:: bash snap install gallery-dl Chocolatey ---------- Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository: .. code:: powershell choco install gallery-dl Scoop ----- *gallery-dl* is also available in the Scoop_ "main" bucket for Windows users: .. code:: powershell scoop install gallery-dl Homebrew -------- For macOS or Linux users using Homebrew: .. code:: bash brew install gallery-dl MacPorts -------- For macOS users with MacPorts: .. code:: bash sudo port install gallery-dl Docker -------- Using the Dockerfile in the repository: .. code:: bash git clone https://github.com/mikf/gallery-dl.git cd gallery-dl/ docker build -t gallery-dl:latest . Pulling image from `Docker Hub `__: .. code:: bash docker pull mikf123/gallery-dl docker tag mikf123/gallery-dl gallery-dl Pulling image from `GitHub Container Registry `__: .. code:: bash docker pull ghcr.io/mikf/gallery-dl docker tag ghcr.io/mikf/gallery-dl gallery-dl To run the container you will probably want to attach some directories on the host so that the config file and downloads can persist across runs. Make sure to either download the example config file referenced in the repo and place it in the mounted volume location or touch an empty file there.
If you gave the container a different tag or are using podman, make sure you adjust the image name in the following commands. Run ``docker image ls`` to check the name if you are not sure. The ``--rm`` flag will remove the container after every use, so you will always have a fresh environment for it to run. If you set up a CI/CD pipeline to auto-build the container, you can also add a ``--pull=newer`` flag so that, when you run it, Docker will check for a newer image and download it before running. .. code:: bash docker run --rm -v $HOME/Downloads/:/gallery-dl/ -v $HOME/.config/gallery-dl/gallery-dl.conf:/etc/gallery-dl.conf -it gallery-dl:latest You can also add an alias to your shell for "gallery-dl" or create a simple bash script and drop it somewhere in your $PATH to act as a shim for this command. Usage ===== To use *gallery-dl* simply call it with the URLs you wish to download images from: .. code:: bash gallery-dl [OPTIONS]... URLS... Use :code:`gallery-dl --help` or see ``__ for a full list of all command-line options. Examples -------- Download images; in this case from danbooru via tag search for 'bonocho': .. code:: bash gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho" Get the direct URL of an image from a site supporting authentication with username & password: .. code:: bash gallery-dl -g -u "" -p "" "https://twitter.com/i/web/status/604341487988576256" Filter manga chapters by chapter number and language: .. code:: bash gallery-dl --chapter-filter "10 <= chapter < 20" -o "lang=fr" "https://mangadex.org/title/59793dd0-a2d8-41a2-9758-8197287a8539" | Search a remote resource for URLs and download images from them: | (URLs for which no extractor can be found will be silently ignored) .. code:: bash gallery-dl "r:https://pastebin.com/raw/FLwrCYsT" If a site's address is nonstandard for its extractor, you can prefix the URL with the extractor's name to force the use of a specific extractor: .. code:: bash gallery-dl "tumblr:https://sometumblrblog.example" Configuration ============= Configuration files for *gallery-dl* use a JSON-based file format. Documentation ------------- A list of all available configuration options and their descriptions can be found in ``__. | For a default configuration file with available options set to their default values, see ``__. | For a commented example with more involved settings and option usage, see ``__. Locations --------- *gallery-dl* searches for configuration files in the following places: Windows: * ``%APPDATA%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl.conf`` (``%USERPROFILE%`` usually refers to a user's home directory, i.e. ``C:\Users\\``) Linux, macOS, etc.: * ``/etc/gallery-dl.conf`` * ``${XDG_CONFIG_HOME}/gallery-dl/config.json`` * ``${HOME}/.config/gallery-dl/config.json`` * ``${HOME}/.gallery-dl.conf`` When run as `executable `__, *gallery-dl* will also look for a ``gallery-dl.conf`` file in the same directory as said executable. It is possible to use more than one configuration file at a time. In this case, any values from files after the first will get merged into the already loaded settings and potentially override previous ones. Authentication ============== Username & Password ------------------- Some extractors require you to provide valid login credentials in the form of a username & password pair.
This is necessary for ``nijie`` and optional for ``aryion``, ``danbooru``, ``e621``, ``exhentai``, ``idolcomplex``, ``imgbb``, ``inkbunny``, ``mangadex``, ``mangoxo``, ``pillowfort``, ``sankaku``, ``subscribestar``, ``tapas``, ``tsumino``, ``twitter``, and ``zerochan``. You can set the necessary information in your `configuration file `__ .. code:: json { "extractor": { "twitter": { "username": "", "password": "" } } } or you can provide them directly via the :code:`-u/--username` and :code:`-p/--password` or via the :code:`-o/--option` command-line options .. code:: bash gallery-dl -u "" -p "" "URL" gallery-dl -o "username=" -o "password=" "URL" Cookies ------- For sites where login with username & password is not possible due to CAPTCHA or similar, or has not been implemented yet, you can use the cookies from a browser login session and input them into *gallery-dl*. This can be done via the `cookies `__ option in your configuration file by specifying - | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon | (e.g. `Get cookies.txt LOCALLY `__ for Chrome, `Export Cookies `__ for Firefox) - | a list of name-value pairs gathered from your browser's web developer tools | (in `Chrome `__, in `Firefox `__) - | the name of a browser to extract cookies from | (supported browsers are Chromium-based ones, Firefox, and Safari) For example: .. code:: json { "extractor": { "instagram": { "cookies": "$HOME/path/to/cookies.txt" }, "patreon": { "cookies": { "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a" } }, "twitter": { "cookies": ["firefox"] } } } | You can also specify a cookies.txt file with the :code:`--cookies` command-line option | or a browser to extract cookies from with :code:`--cookies-from-browser`: .. code:: bash gallery-dl --cookies "$HOME/path/to/cookies.txt" "URL" gallery-dl --cookies-from-browser firefox "URL" OAuth ----- *gallery-dl* supports user authentication via OAuth_ for some extractors. This is necessary for ``pixiv`` and optional for ``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``, and ``mastodon`` instances. Linking your account to *gallery-dl* grants it the ability to issue requests on your account's behalf and enables it to access resources which would otherwise be unavailable to a public user. To do so, start by invoking it with ``oauth:`` as an argument. For example: .. code:: bash gallery-dl oauth:flickr You will be sent to the site's authorization page and asked to grant read access to *gallery-dl*. Authorize it and you will be shown one or more "tokens", which should be added to your configuration file. To authenticate with a ``mastodon`` instance, run *gallery-dl* with ``oauth:mastodon:`` as argument. For example: .. code:: bash gallery-dl oauth:mastodon:pawoo.net gallery-dl oauth:mastodon:https://mastodon.social/ .. _Python: https://www.python.org/downloads/ .. _PyPI: https://pypi.org/ .. _pip: https://pip.pypa.io/en/stable/ .. _Requests: https://requests.readthedocs.io/en/master/ .. _FFmpeg: https://www.ffmpeg.org/ .. _yt-dlp: https://github.com/yt-dlp/yt-dlp .. _youtube-dl: https://ytdl-org.github.io/youtube-dl/ .. _PySocks: https://pypi.org/project/PySocks/ .. _brotli: https://github.com/google/brotli .. _brotlicffi: https://github.com/python-hyper/brotlicffi .. _PyYAML: https://pyyaml.org/ .. _toml: https://pypi.org/project/toml/ .. _SecretStorage: https://pypi.org/project/SecretStorage/ .. _Snapd: https://docs.snapcraft.io/installing-snapd .. _OAuth: https://en.wikipedia.org/wiki/OAuth .. 
_Chocolatey: https://chocolatey.org/install .. _Scoop: https://scoop.sh .. |pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg :target: https://pypi.org/project/gallery-dl/ .. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg :target: https://github.com/mikf/gallery-dl/actions .. |gitter| image:: https://badges.gitter.im/gallery-dl/main.svg :target: https://gitter.im/gallery-dl/main ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1711211673.9758053 gallery_dl-1.26.9/data/0000755000175000017500000000000014577602232013344 5ustar00mikemike././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1711211673.9791386 gallery_dl-1.26.9/data/completion/0000755000175000017500000000000014577602232015515 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709688746.0 gallery_dl-1.26.9/data/completion/_gallery-dl0000644000175000017500000001500514571743652017642 0ustar00mikemike#compdef gallery-dl local curcontext="$curcontext" typeset -A opt_args local rc=1 _arguments -s -S \ {-h,--help}'[Print this help message and exit]' \ --version'[Print program version and exit]' \ {-f,--filename}'[Filename format string for downloaded files ('\''/O'\'' for "original" filenames)]':'' \ {-d,--destination}'[Target location for file downloads]':'' \ {-D,--directory}'[Exact location for file downloads]':'' \ {-X,--extractors}'[Load external extractors from PATH]':'' \ --proxy'[Use the specified proxy]':'' \ --source-address'[Client-side IP address to bind to]':'' \ --user-agent'[User-Agent request header]':'' \ --clear-cache'[Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything)]':'' \ {-i,--input-file}'[Download URLs found in FILE ('\''-'\'' for stdin). More than one --input-file can be specified]':'':_files \ {-I,--input-file-comment}'[Download URLs found in FILE. Comment them out after they were downloaded successfully.]':'':_files \ {-x,--input-file-delete}'[Download URLs found in FILE. Delete them after they were downloaded successfully.]':'':_files \ {-q,--quiet}'[Activate quiet mode]' \ {-v,--verbose}'[Print various debugging information]' \ {-g,--get-urls}'[Print URLs instead of downloading]' \ {-G,--resolve-urls}'[Print URLs instead of downloading; resolve intermediary URLs]' \ {-j,--dump-json}'[Print JSON information]' \ {-s,--simulate}'[Simulate data extraction; do not download anything]' \ {-E,--extractor-info}'[Print extractor defaults and settings]' \ {-K,--list-keywords}'[Print a list of available keywords and example values for the given URLs]' \ {-e,--error-file}'[Add input URLs which returned an error to FILE]':'':_files \ --list-modules'[Print a list of available extractor modules]' \ --list-extractors'[Print a list of extractor classes with description, (sub)category and example URL]' \ --write-log'[Write logging output to FILE]':'':_files \ --write-unsupported'[Write URLs, which get emitted by other extractors but cannot be handled, to FILE]':'':_files \ --write-pages'[Write downloaded intermediary pages to files in the current directory to debug problems]' \ {-r,--limit-rate}'[Maximum download rate (e.g. 500k or 2.5M)]':'' \ {-R,--retries}'[Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)]':'' \ --http-timeout'[Timeout for HTTP connections (default: 30.0)]':'' \ --sleep'[Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 
2.7 or 2.0-3.5)]':'' \ --sleep-request'[Number of seconds to wait between HTTP requests during data extraction]':'' \ --sleep-extractor'[Number of seconds to wait before starting data extraction for an input URL]':'' \ --filesize-min'[Do not download files smaller than SIZE (e.g. 500k or 2.5M)]':'' \ --filesize-max'[Do not download files larger than SIZE (e.g. 500k or 2.5M)]':'' \ --chunk-size'[Size of in-memory data chunks (default: 32k)]':'' \ --no-part'[Do not use .part files]' \ --no-skip'[Do not skip downloads; overwrite existing files]' \ --no-mtime'[Do not set file modification times according to Last-Modified HTTP response headers]' \ --no-download'[Do not download any files]' \ --no-postprocessors'[Do not run any post processors]' \ --no-check-certificate'[Disable HTTPS certificate validation]' \ {-o,--option}'[Additional options. Example: -o browser=firefox]':'' \ {-c,--config}'[Additional configuration files]':'':_files \ --config-yaml'[Additional configuration files in YAML format]':'':_files \ --config-toml'[Additional configuration files in TOML format]':'':_files \ --config-create'[Create a basic configuration file]' \ --config-ignore'[Do not read default configuration files]' \ {-u,--username}'[Username to login with]':'' \ {-p,--password}'[Password belonging to the given username]':'' \ --netrc'[Enable .netrc authentication data]' \ {-C,--cookies}'[File to load additional cookies from]':'':_files \ --cookies-export'[Export session cookies to FILE]':'':_files \ --cookies-from-browser'[Name of the browser to load cookies from, with optional domain prefixed with '\''/'\'', keyring name prefixed with '\''+'\'', profile prefixed with '\'':'\'', and container prefixed with '\''::'\'' ('\''none'\'' for no container)]':'' \ --download-archive'[Record all downloaded or skipped files in FILE and skip downloading any file already in it]':'':_files \ {-A,--abort}'[Stop current extractor run after N consecutive file downloads were skipped]':'' \ {-T,--terminate}'[Stop current and parent extractor runs after N consecutive file downloads were skipped]':'' \ --range'[Index range(s) specifying which files to download. These can be either a constant value, range, or slice (e.g. '\''5'\'', '\''8-20'\'', or '\''1:24:3'\'')]':'' \ --chapter-range'[Like '\''--range'\'', but applies to manga chapters and other delegated URLs]':'' \ --filter'[Python expression controlling which files to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by '\''-K'\''. Example: --filter "image_width >= 1000 and rating in ('\''s'\'', '\''q'\'')"]':'' \ --chapter-filter'[Like '\''--filter'\'', but applies to manga chapters and other delegated URLs]':'' \ {-P,--postprocessor}'[Activate the specified post processor]':'' \ {-O,--postprocessor-option}'[Additional post processor options]':'' \ --write-metadata'[Write metadata to separate JSON files]' \ --write-info-json'[Write gallery metadata to a info.json file]' \ --write-tags'[Write image tags to separate text files]' \ --zip'[Store downloaded files in a ZIP archive]' \ --cbz'[Store downloaded files in a CBZ archive]' \ --mtime'[Set file modification times according to metadata selected by NAME. Examples: '\''date'\'' or '\''status\[date\]'\'']':'' \ --ugoira'[Convert Pixiv Ugoira to FORMAT using FFmpeg. Supported formats are '\''webm'\'', '\''mp4'\'', '\''gif'\'', '\''vp8'\'', '\''vp9'\'', '\''vp9-lossless'\'', '\''copy'\''.]':'' \ --exec'[Execute CMD for each downloaded file. 
Supported replacement fields are {} or {_path}, {_directory}, {_filename}. Example: --exec "convert {} {}.png && rm {}"]':'' \ --exec-after'[Execute CMD after all files were downloaded. Example: --exec-after "cd {_directory} && convert * ../doc.pdf"]':'' && rc=0 return rc ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709688746.0 gallery_dl-1.26.9/data/completion/gallery-dl0000644000175000017500000000320514571743652017502 0ustar00mikemike_gallery_dl() { local cur prev COMPREPLY=() cur="${COMP_WORDS[COMP_CWORD]}" prev="${COMP_WORDS[COMP_CWORD-1]}" if [[ "${prev}" =~ ^(-i|--input-file|-I|--input-file-comment|-x|--input-file-delete|-e|--error-file|--write-log|--write-unsupported|-c|--config|--config-yaml|--config-toml|-C|--cookies|--cookies-export|--download-archive)$ ]]; then COMPREPLY=( $(compgen -f -- "${cur}") ) elif [[ "${prev}" =~ ^()$ ]]; then COMPREPLY=( $(compgen -d -- "${cur}") ) else COMPREPLY=( $(compgen -W "--help --version --filename --destination --directory --extractors --proxy --source-address --user-agent --clear-cache --input-file --input-file-comment --input-file-delete --quiet --verbose --get-urls --resolve-urls --dump-json --simulate --extractor-info --list-keywords --error-file --list-modules --list-extractors --write-log --write-unsupported --write-pages --limit-rate --retries --http-timeout --sleep --sleep-request --sleep-extractor --filesize-min --filesize-max --chunk-size --no-part --no-skip --no-mtime --no-download --no-postprocessors --no-check-certificate --option --config --config-yaml --config-toml --config-create --config-ignore --ignore-config --username --password --netrc --cookies --cookies-export --cookies-from-browser --download-archive --abort --terminate --range --chapter-range --filter --chapter-filter --postprocessor --postprocessor-option --write-metadata --write-info-json --write-infojson --write-tags --zip --cbz --mtime --mtime-from-date --ugoira --ugoira-conv --ugoira-conv-lossless --ugoira-conv-copy --exec --exec-after" -- "${cur}") ) fi } complete -F _gallery_dl gallery-dl ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709688746.0 gallery_dl-1.26.9/data/completion/gallery-dl.fish0000644000175000017500000002040014571743652020426 0ustar00mikemikecomplete -c gallery-dl -x complete -c gallery-dl -s 'h' -l 'help' -d 'Print this help message and exit' complete -c gallery-dl -l 'version' -d 'Print program version and exit' complete -c gallery-dl -x -s 'f' -l 'filename' -d 'Filename format string for downloaded files ("/O" for "original" filenames)' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'd' -l 'destination' -d 'Target location for file downloads' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'D' -l 'directory' -d 'Exact location for file downloads' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'X' -l 'extractors' -d 'Load external extractors from PATH' complete -c gallery-dl -x -l 'proxy' -d 'Use the specified proxy' complete -c gallery-dl -x -l 'source-address' -d 'Client-side IP address to bind to' complete -c gallery-dl -x -l 'user-agent' -d 'User-Agent request header' complete -c gallery-dl -x -l 'clear-cache' -d 'Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything)' complete -c gallery-dl -r -F -s 'i' -l 'input-file' -d 'Download URLs found in FILE ("-" for stdin). 
More than one --input-file can be specified' complete -c gallery-dl -r -F -s 'I' -l 'input-file-comment' -d 'Download URLs found in FILE. Comment them out after they were downloaded successfully.' complete -c gallery-dl -r -F -s 'x' -l 'input-file-delete' -d 'Download URLs found in FILE. Delete them after they were downloaded successfully.' complete -c gallery-dl -s 'q' -l 'quiet' -d 'Activate quiet mode' complete -c gallery-dl -s 'v' -l 'verbose' -d 'Print various debugging information' complete -c gallery-dl -s 'g' -l 'get-urls' -d 'Print URLs instead of downloading' complete -c gallery-dl -s 'G' -l 'resolve-urls' -d 'Print URLs instead of downloading; resolve intermediary URLs' complete -c gallery-dl -s 'j' -l 'dump-json' -d 'Print JSON information' complete -c gallery-dl -s 's' -l 'simulate' -d 'Simulate data extraction; do not download anything' complete -c gallery-dl -s 'E' -l 'extractor-info' -d 'Print extractor defaults and settings' complete -c gallery-dl -s 'K' -l 'list-keywords' -d 'Print a list of available keywords and example values for the given URLs' complete -c gallery-dl -r -F -s 'e' -l 'error-file' -d 'Add input URLs which returned an error to FILE' complete -c gallery-dl -l 'list-modules' -d 'Print a list of available extractor modules' complete -c gallery-dl -l 'list-extractors' -d 'Print a list of extractor classes with description, (sub)category and example URL' complete -c gallery-dl -r -F -l 'write-log' -d 'Write logging output to FILE' complete -c gallery-dl -r -F -l 'write-unsupported' -d 'Write URLs, which get emitted by other extractors but cannot be handled, to FILE' complete -c gallery-dl -l 'write-pages' -d 'Write downloaded intermediary pages to files in the current directory to debug problems' complete -c gallery-dl -x -s 'r' -l 'limit-rate' -d 'Maximum download rate (e.g. 500k or 2.5M)' complete -c gallery-dl -x -s 'R' -l 'retries' -d 'Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)' complete -c gallery-dl -x -l 'http-timeout' -d 'Timeout for HTTP connections (default: 30.0)' complete -c gallery-dl -x -l 'sleep' -d 'Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5)' complete -c gallery-dl -x -l 'sleep-request' -d 'Number of seconds to wait between HTTP requests during data extraction' complete -c gallery-dl -x -l 'sleep-extractor' -d 'Number of seconds to wait before starting data extraction for an input URL' complete -c gallery-dl -x -l 'filesize-min' -d 'Do not download files smaller than SIZE (e.g. 500k or 2.5M)' complete -c gallery-dl -x -l 'filesize-max' -d 'Do not download files larger than SIZE (e.g. 500k or 2.5M)' complete -c gallery-dl -x -l 'chunk-size' -d 'Size of in-memory data chunks (default: 32k)' complete -c gallery-dl -l 'no-part' -d 'Do not use .part files' complete -c gallery-dl -l 'no-skip' -d 'Do not skip downloads; overwrite existing files' complete -c gallery-dl -l 'no-mtime' -d 'Do not set file modification times according to Last-Modified HTTP response headers' complete -c gallery-dl -l 'no-download' -d 'Do not download any files' complete -c gallery-dl -l 'no-postprocessors' -d 'Do not run any post processors' complete -c gallery-dl -l 'no-check-certificate' -d 'Disable HTTPS certificate validation' complete -c gallery-dl -x -s 'o' -l 'option' -d 'Additional options. 
Example: -o browser=firefox' complete -c gallery-dl -r -F -s 'c' -l 'config' -d 'Additional configuration files' complete -c gallery-dl -r -F -l 'config-yaml' -d 'Additional configuration files in YAML format' complete -c gallery-dl -r -F -l 'config-toml' -d 'Additional configuration files in TOML format' complete -c gallery-dl -l 'config-create' -d 'Create a basic configuration file' complete -c gallery-dl -l 'config-ignore' -d 'Do not read default configuration files' complete -c gallery-dl -l 'ignore-config' -d '==SUPPRESS==' complete -c gallery-dl -x -s 'u' -l 'username' -d 'Username to login with' complete -c gallery-dl -x -s 'p' -l 'password' -d 'Password belonging to the given username' complete -c gallery-dl -l 'netrc' -d 'Enable .netrc authentication data' complete -c gallery-dl -r -F -s 'C' -l 'cookies' -d 'File to load additional cookies from' complete -c gallery-dl -r -F -l 'cookies-export' -d 'Export session cookies to FILE' complete -c gallery-dl -x -l 'cookies-from-browser' -d 'Name of the browser to load cookies from, with optional domain prefixed with "/", keyring name prefixed with "+", profile prefixed with ":", and container prefixed with "::" ("none" for no container)' complete -c gallery-dl -r -F -l 'download-archive' -d 'Record all downloaded or skipped files in FILE and skip downloading any file already in it' complete -c gallery-dl -x -s 'A' -l 'abort' -d 'Stop current extractor run after N consecutive file downloads were skipped' complete -c gallery-dl -x -s 'T' -l 'terminate' -d 'Stop current and parent extractor runs after N consecutive file downloads were skipped' complete -c gallery-dl -x -l 'range' -d 'Index range(s) specifying which files to download. These can be either a constant value, range, or slice (e.g. "5", "8-20", or "1:24:3")' complete -c gallery-dl -x -l 'chapter-range' -d 'Like "--range", but applies to manga chapters and other delegated URLs' complete -c gallery-dl -x -l 'filter' -d 'Python expression controlling which files to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by "-K". Example: --filter "image_width >= 1000 and rating in ("s", "q")"' complete -c gallery-dl -x -l 'chapter-filter' -d 'Like "--filter", but applies to manga chapters and other delegated URLs' complete -c gallery-dl -x -s 'P' -l 'postprocessor' -d 'Activate the specified post processor' complete -c gallery-dl -x -s 'O' -l 'postprocessor-option' -d 'Additional post processor options' complete -c gallery-dl -l 'write-metadata' -d 'Write metadata to separate JSON files' complete -c gallery-dl -l 'write-info-json' -d 'Write gallery metadata to a info.json file' complete -c gallery-dl -l 'write-infojson' -d '==SUPPRESS==' complete -c gallery-dl -l 'write-tags' -d 'Write image tags to separate text files' complete -c gallery-dl -l 'zip' -d 'Store downloaded files in a ZIP archive' complete -c gallery-dl -l 'cbz' -d 'Store downloaded files in a CBZ archive' complete -c gallery-dl -x -l 'mtime' -d 'Set file modification times according to metadata selected by NAME. Examples: "date" or "status[date]"' complete -c gallery-dl -l 'mtime-from-date' -d '==SUPPRESS==' complete -c gallery-dl -x -l 'ugoira' -d 'Convert Pixiv Ugoira to FORMAT using FFmpeg. Supported formats are "webm", "mp4", "gif", "vp8", "vp9", "vp9-lossless", "copy".' 
complete -c gallery-dl -l 'ugoira-conv' -d '==SUPPRESS==' complete -c gallery-dl -l 'ugoira-conv-lossless' -d '==SUPPRESS==' complete -c gallery-dl -l 'ugoira-conv-copy' -d '==SUPPRESS==' complete -c gallery-dl -x -l 'exec' -d 'Execute CMD for each downloaded file. Supported replacement fields are {} or {_path}, {_directory}, {_filename}. Example: --exec "convert {} {}.png && rm {}"' complete -c gallery-dl -x -l 'exec-after' -d 'Execute CMD after all files were downloaded. Example: --exec-after "cd {_directory} && convert * ../doc.pdf"' ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1711211673.9791386 gallery_dl-1.26.9/data/man/0000755000175000017500000000000014577602232014117 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1711211673.0 gallery_dl-1.26.9/data/man/gallery-dl.10000644000175000017500000002057414577602231016244 0ustar00mikemike.TH "GALLERY-DL" "1" "2024-03-23" "1.26.9" "gallery-dl Manual" .\" disable hyphenation .nh .SH NAME gallery-dl \- download image-galleries and -collections .SH SYNOPSIS .B gallery-dl [OPTION]... URL... .SH DESCRIPTION .B gallery-dl is a command-line program to download image-galleries and -collections from several image hosting sites. It is a cross-platform tool with many configuration options and powerful filenaming capabilities. .SH OPTIONS .TP .B "\-h, \-\-help" Print this help message and exit .TP .B "\-\-version" Print program version and exit .TP .B "\-f, \-\-filename" \f[I]FORMAT\f[] Filename format string for downloaded files ('/O' for "original" filenames) .TP .B "\-d, \-\-destination" \f[I]PATH\f[] Target location for file downloads .TP .B "\-D, \-\-directory" \f[I]PATH\f[] Exact location for file downloads .TP .B "\-X, \-\-extractors" \f[I]PATH\f[] Load external extractors from PATH .TP .B "\-\-proxy" \f[I]URL\f[] Use the specified proxy .TP .B "\-\-source\-address" \f[I]IP\f[] Client-side IP address to bind to .TP .B "\-\-user\-agent" \f[I]UA\f[] User-Agent request header .TP .B "\-\-clear\-cache" \f[I]MODULE\f[] Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything) .TP .B "\-i, \-\-input\-file" \f[I]FILE\f[] Download URLs found in FILE ('-' for stdin). More than one --input-file can be specified .TP .B "\-I, \-\-input\-file\-comment" \f[I]FILE\f[] Download URLs found in FILE. Comment them out after they were downloaded successfully. .TP .B "\-x, \-\-input\-file\-delete" \f[I]FILE\f[] Download URLs found in FILE. Delete them after they were downloaded successfully. 
.TP .B "\-q, \-\-quiet" Activate quiet mode .TP .B "\-v, \-\-verbose" Print various debugging information .TP .B "\-g, \-\-get\-urls" Print URLs instead of downloading .TP .B "\-G, \-\-resolve\-urls" Print URLs instead of downloading; resolve intermediary URLs .TP .B "\-j, \-\-dump\-json" Print JSON information .TP .B "\-s, \-\-simulate" Simulate data extraction; do not download anything .TP .B "\-E, \-\-extractor\-info" Print extractor defaults and settings .TP .B "\-K, \-\-list\-keywords" Print a list of available keywords and example values for the given URLs .TP .B "\-e, \-\-error\-file" \f[I]FILE\f[] Add input URLs which returned an error to FILE .TP .B "\-\-list\-modules" Print a list of available extractor modules .TP .B "\-\-list\-extractors" Print a list of extractor classes with description, (sub)category and example URL .TP .B "\-\-write\-log" \f[I]FILE\f[] Write logging output to FILE .TP .B "\-\-write\-unsupported" \f[I]FILE\f[] Write URLs, which get emitted by other extractors but cannot be handled, to FILE .TP .B "\-\-write\-pages" Write downloaded intermediary pages to files in the current directory to debug problems .TP .B "\-r, \-\-limit\-rate" \f[I]RATE\f[] Maximum download rate (e.g. 500k or 2.5M) .TP .B "\-R, \-\-retries" \f[I]N\f[] Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4) .TP .B "\-\-http\-timeout" \f[I]SECONDS\f[] Timeout for HTTP connections (default: 30.0) .TP .B "\-\-sleep" \f[I]SECONDS\f[] Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5) .TP .B "\-\-sleep\-request" \f[I]SECONDS\f[] Number of seconds to wait between HTTP requests during data extraction .TP .B "\-\-sleep\-extractor" \f[I]SECONDS\f[] Number of seconds to wait before starting data extraction for an input URL .TP .B "\-\-filesize\-min" \f[I]SIZE\f[] Do not download files smaller than SIZE (e.g. 500k or 2.5M) .TP .B "\-\-filesize\-max" \f[I]SIZE\f[] Do not download files larger than SIZE (e.g. 500k or 2.5M) .TP .B "\-\-chunk\-size" \f[I]SIZE\f[] Size of in-memory data chunks (default: 32k) .TP .B "\-\-no\-part" Do not use .part files .TP .B "\-\-no\-skip" Do not skip downloads; overwrite existing files .TP .B "\-\-no\-mtime" Do not set file modification times according to Last-Modified HTTP response headers .TP .B "\-\-no\-download" Do not download any files .TP .B "\-\-no\-postprocessors" Do not run any post processors .TP .B "\-\-no\-check\-certificate" Disable HTTPS certificate validation .TP .B "\-o, \-\-option" \f[I]KEY=VALUE\f[] Additional options. 
Example: -o browser=firefox .TP .B "\-c, \-\-config" \f[I]FILE\f[] Additional configuration files .TP .B "\-\-config\-yaml" \f[I]FILE\f[] Additional configuration files in YAML format .TP .B "\-\-config\-toml" \f[I]FILE\f[] Additional configuration files in TOML format .TP .B "\-\-config\-create" Create a basic configuration file .TP .B "\-\-config\-ignore" Do not read default configuration files .TP .B "\-u, \-\-username" \f[I]USER\f[] Username to login with .TP .B "\-p, \-\-password" \f[I]PASS\f[] Password belonging to the given username .TP .B "\-\-netrc" Enable .netrc authentication data .TP .B "\-C, \-\-cookies" \f[I]FILE\f[] File to load additional cookies from .TP .B "\-\-cookies\-export" \f[I]FILE\f[] Export session cookies to FILE .TP .B "\-\-cookies\-from\-browser" \f[I]BROWSER[/DOMAIN][+KEYRING][:PROFILE][::CONTAINER]\f[] Name of the browser to load cookies from, with optional domain prefixed with '/', keyring name prefixed with '+', profile prefixed with ':', and container prefixed with '::' ('none' for no container) .TP .B "\-\-download\-archive" \f[I]FILE\f[] Record all downloaded or skipped files in FILE and skip downloading any file already in it .TP .B "\-A, \-\-abort" \f[I]N\f[] Stop current extractor run after N consecutive file downloads were skipped .TP .B "\-T, \-\-terminate" \f[I]N\f[] Stop current and parent extractor runs after N consecutive file downloads were skipped .TP .B "\-\-range" \f[I]RANGE\f[] Index range(s) specifying which files to download. These can be either a constant value, range, or slice (e.g. '5', '8-20', or '1:24:3') .TP .B "\-\-chapter\-range" \f[I]RANGE\f[] Like '--range', but applies to manga chapters and other delegated URLs .TP .B "\-\-filter" \f[I]EXPR\f[] Python expression controlling which files to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by '-K'. Example: --filter "image_width >= 1000 and rating in ('s', 'q')" .TP .B "\-\-chapter\-filter" \f[I]EXPR\f[] Like '--filter', but applies to manga chapters and other delegated URLs .TP .B "\-P, \-\-postprocessor" \f[I]NAME\f[] Activate the specified post processor .TP .B "\-O, \-\-postprocessor\-option" \f[I]KEY=VALUE\f[] Additional post processor options .TP .B "\-\-write\-metadata" Write metadata to separate JSON files .TP .B "\-\-write\-info\-json" Write gallery metadata to a info.json file .TP .B "\-\-write\-tags" Write image tags to separate text files .TP .B "\-\-zip" Store downloaded files in a ZIP archive .TP .B "\-\-cbz" Store downloaded files in a CBZ archive .TP .B "\-\-mtime" \f[I]NAME\f[] Set file modification times according to metadata selected by NAME. Examples: 'date' or 'status[date]' .TP .B "\-\-ugoira" \f[I]FORMAT\f[] Convert Pixiv Ugoira to FORMAT using FFmpeg. Supported formats are 'webm', 'mp4', 'gif', 'vp8', 'vp9', 'vp9-lossless', 'copy'. .TP .B "\-\-exec" \f[I]CMD\f[] Execute CMD for each downloaded file. Supported replacement fields are {} or {_path}, {_directory}, {_filename}. Example: --exec "convert {} {}.png && rm {}" .TP .B "\-\-exec\-after" \f[I]CMD\f[] Execute CMD after all files were downloaded. Example: --exec-after "cd {_directory} && convert * ../doc.pdf" .SH EXAMPLES .TP gallery-dl \f[I]URL\f[] Download images from \f[I]URL\f[]. .TP gallery-dl -g -u -p \f[I]URL\f[] Print direct URLs from a site that requires authentication. .TP gallery-dl --filter 'type == "ugoira"' --range '2-4' \f[I]URL\f[] Apply filter and range expressions. 
This will only download the second, third, and fourth file where its type value is equal to "ugoira". .TP gallery-dl r:\f[I]URL\f[] Scan \f[I]URL\f[] for other URLs and invoke \f[B]gallery-dl\f[] on them. .TP gallery-dl oauth:\f[I]SITE\-NAME\f[] Gain OAuth authentication tokens for .IR deviantart , .IR flickr , .IR reddit , .IR smugmug ", and" .IR tumblr . .SH FILES .TP .I /etc/gallery-dl.conf The system wide configuration file. .TP .I ~/.config/gallery-dl/config.json Per user configuration file. .TP .I ~/.gallery-dl.conf Alternate per user configuration file. .SH BUGS https://github.com/mikf/gallery-dl/issues .SH AUTHORS Mike Fährmann .br and https://github.com/mikf/gallery-dl/graphs/contributors .SH "SEE ALSO" .BR gallery-dl.conf (5) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1711211673.0 gallery_dl-1.26.9/data/man/gallery-dl.conf.50000644000175000017500000042332014577602231017170 0ustar00mikemike.TH "GALLERY-DL.CONF" "5" "2024-03-23" "1.26.9" "gallery-dl Manual" .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .SH NAME gallery-dl.conf \- gallery-dl configuration file .SH DESCRIPTION gallery-dl will search for configuration files in the following places every time it is started, unless .B --ignore-config is specified: .PP .RS 4 .nf .I /etc/gallery-dl.conf .I $HOME/.config/gallery-dl/config.json .I $HOME/.gallery-dl.conf .fi .RE .PP It is also possible to specify additional configuration files with the .B -c/--config command-line option or to add further option values with .B -o/--option as = pairs, Configuration files are JSON-based and therefore don't allow any ordinary comments, but, since unused keys are simply ignored, it is possible to utilize those as makeshift comments by settings their values to arbitrary strings. .SH EXAMPLE { .RS 4 "base-directory": "/tmp/", .br "extractor": { .RS 4 "pixiv": { .RS 4 "directory": ["Pixiv", "Works", "{user[id]}"], .br "filename": "{id}{num}.{extension}", .br "username": "foo", .br "password": "bar" .RE }, .br "flickr": { .RS 4 "_comment": "OAuth keys for account 'foobar'", .br "access-token": "0123456789-0123456789abcdef", .br "access-token-secret": "fedcba9876543210" .RE } .RE }, .br "downloader": { .RS 4 "retries": 3, .br "timeout": 2.5 .RE } .RE } .SH EXTRACTOR OPTIONS .SS extractor.*.filename .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (condition -> \f[I]format string\f[]) .IP "Example:" 4 .. code:: json "{manga}_c{chapter}_{page:>03}.{extension}" .. code:: json { "extension == 'mp4'": "{id}_video.{extension}", "'nature' in title" : "{id}_{title}.{extension}", "" : "{id}_default.{extension}" } .IP "Description:" 4 A \f[I]format string\f[] to build filenames for downloaded files with. If this is an \f[I]object\f[], it must contain Python expressions mapping to the filename format strings to use. These expressions are evaluated in the order as specified in Python 3.6+ and in an undetermined order in Python 3.4 and 3.5. The available replacement keys depend on the extractor used. A list of keys for a specific one can be acquired by calling *gallery-dl* with the \f[I]-K\f[]/\f[I]--list-keywords\f[] command-line option. For example: .. 
code:: $ gallery-dl -K http://seiga.nicovideo.jp/seiga/im5977527 Keywords for directory names: category seiga subcategory image Keywords for filenames: category seiga extension None image-id 5977527 subcategory image Note: Even if the value of the \f[I]extension\f[] key is missing or \f[I]None\f[], it will be filled in later when the file download is starting. This key is therefore always available to provide a valid filename extension. .SS extractor.*.directory .IP "Type:" 6 .br * \f[I]list\f[] of \f[I]strings\f[] .br * \f[I]object\f[] (condition -> \f[I]format strings\f[]) .IP "Example:" 4 .. code:: json ["{category}", "{manga}", "c{chapter} - {title}"] .. code:: json { "'nature' in content": ["Nature Pictures"], "retweet_id != 0" : ["{category}", "{user[name]}", "Retweets"], "" : ["{category}", "{user[name]}"] } .IP "Description:" 4 A list of \f[I]format strings\f[] to build target directory paths with. If this is an \f[I]object\f[], it must contain Python expressions mapping to the list of format strings to use. Each individual string in such a list represents a single path segment, which will be joined together and appended to the \f[I]base-directory\f[] to form the complete target directory path. .SS extractor.*.base-directory .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"./gallery-dl/"\f[] .IP "Description:" 4 Directory path used as base for all download destinations. .SS extractor.*.parent-directory .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Use an extractor's current target directory as \f[I]base-directory\f[] for any spawned child extractors. .SS extractor.*.metadata-parent .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 If \f[I]true\f[], overwrite any metadata provided by a child extractor with its parent's. If this is a \f[I]string\f[], add a parent's metadata to its children's .br to a field named after said string. For example with \f[I]"parent-metadata": "_p_"\f[]: .br .. code:: json { "id": "child-id", "_p_": {"id": "parent-id"} } .SS extractor.*.parent-skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Share number of skipped downloads between parent and child extractors. .SS extractor.*.path-restrict .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (character -> replacement character(s)) .IP "Default:" 9 \f[I]"auto"\f[] .IP "Example:" 4 .br * "/!? (){}" .br * {" ": "_", "/": "-", "|": "-", ":": "_-_", "*": "_+_"} .IP "Description:" 4 A string of characters to be replaced with the value of .br \f[I]path-replace\f[] or an object mapping invalid/unwanted characters to their replacements .br for generated path segment names. 
.br Special values: .br * \f[I]"auto"\f[]: Use characters from \f[I]"unix"\f[] or \f[I]"windows"\f[] depending on the local operating system .br * \f[I]"unix"\f[]: \f[I]"/"\f[] .br * \f[I]"windows"\f[]: \f[I]"\\\\\\\\|/<>:\\"?*"\f[] .br * \f[I]"ascii"\f[]: \f[I]"^0-9A-Za-z_."\f[] (only ASCII digits, letters, underscores, and dots) .br * \f[I]"ascii+"\f[]: \f[I]"^0-9@-[\\\\]-{ #-)+-.;=!}~"\f[] (all ASCII characters except the ones not allowed by Windows) Implementation Detail: For \f[I]strings\f[] with length >= 2, this option uses a \f[I]Regular Expression Character Set\f[], meaning that: .br * using a caret \f[I]^\f[] as first character inverts the set .br * character ranges are supported (\f[I]0-9a-z\f[]) .br * \f[I]]\f[], \f[I]-\f[], and \f[I]\\\f[] need to be escaped as \f[I]\\\\]\f[], \f[I]\\\\-\f[], and \f[I]\\\\\\\\\f[] respectively to use them as literal characters .SS extractor.*.path-replace .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"_"\f[] .IP "Description:" 4 The replacement character(s) for \f[I]path-restrict\f[] .SS extractor.*.path-remove .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"\\u0000-\\u001f\\u007f"\f[] (ASCII control characters) .IP "Description:" 4 Set of characters to remove from generated path names. Note: In a string with 2 or more characters, \f[I][]^-\\\f[] need to be escaped with backslashes, e.g. \f[I]"\\\\[\\\\]"\f[] .SS extractor.*.path-strip .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Set of characters to remove from the end of generated path segment names using \f[I]str.rstrip()\f[] Special values: .br * \f[I]"auto"\f[]: Use characters from \f[I]"unix"\f[] or \f[I]"windows"\f[] depending on the local operating system .br * \f[I]"unix"\f[]: \f[I]""\f[] .br * \f[I]"windows"\f[]: \f[I]". "\f[] .SS extractor.*.path-extended .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 On Windows, use \f[I]extended-length paths\f[] prefixed with \f[I]\\\\?\\\f[] to work around the 260 characters path length limit. .SS extractor.*.extension-map .IP "Type:" 6 \f[I]object\f[] (extension -> replacement) .IP "Default:" 9 .. code:: json { "jpeg": "jpg", "jpe" : "jpg", "jfif": "jpg", "jif" : "jpg", "jfi" : "jpg" } .IP "Description:" 4 A JSON \f[I]object\f[] mapping filename extensions to their replacements. .SS extractor.*.skip .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the behavior when downloading files that have been downloaded before, i.e. a file with the same filename already exists or its ID is in a \f[I]download archive\f[]. .br * \f[I]true\f[]: Skip downloads .br * \f[I]false\f[]: Overwrite already existing files .br * \f[I]"abort"\f[]: Stop the current extractor run .br * \f[I]"abort:N"\f[]: Skip downloads and stop the current extractor run after \f[I]N\f[] consecutive skips .br * \f[I]"terminate"\f[]: Stop the current extractor run, including parent extractors .br * \f[I]"terminate:N"\f[]: Skip downloads and stop the current extractor run, including parent extractors, after \f[I]N\f[] consecutive skips .br * \f[I]"exit"\f[]: Exit the program altogether .br * \f[I]"exit:N"\f[]: Skip downloads and exit the program after \f[I]N\f[] consecutive skips .br * \f[I]"enumerate"\f[]: Add an enumeration index to the beginning of the filename extension (\f[I]file.1.ext\f[], \f[I]file.2.ext\f[], etc.) 
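For illustration, a minimal configuration sketch showing where the skip option described above can live, either for all extractors or for a single category; the chosen values ("abort:3", "enumerate") and the pixiv category are only examples:

.. code:: json

    {
        "extractor": {
            "skip": "abort:3",
            "pixiv": {
                "skip": "enumerate"
            }
        }
    }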
.SS extractor.*.sleep .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Number of seconds to sleep before each download. .SS extractor.*.sleep-extractor .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Number of seconds to sleep before handling an input URL, i.e. before starting a new extractor. .SS extractor.*.sleep-request .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Minimal time interval in seconds between each HTTP request during data extraction. .SS extractor.*.username & .password .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The username and password to use when attempting to log in to another site. Specifying username and password is required for .br * \f[I]nijie\f[] and optional for .br * \f[I]aibooru\f[] (*) .br * \f[I]aryion\f[] .br * \f[I]atfbooru\f[] (*) .br * \f[I]bluesky\f[] .br * \f[I]danbooru\f[] (*) .br * \f[I]e621\f[] (*) .br * \f[I]e926\f[] (*) .br * \f[I]exhentai\f[] .br * \f[I]idolcomplex\f[] .br * \f[I]imgbb\f[] .br * \f[I]inkbunny\f[] .br * \f[I]kemonoparty\f[] .br * \f[I]mangadex\f[] .br * \f[I]mangoxo\f[] .br * \f[I]pillowfort\f[] .br * \f[I]sankaku\f[] .br * \f[I]seisoparty\f[] .br * \f[I]subscribestar\f[] .br * \f[I]tapas\f[] .br * \f[I]tsumino\f[] .br * \f[I]twitter\f[] .br * \f[I]vipergirls\f[] .br * \f[I]zerochan\f[] These values can also be specified via the \f[I]-u/--username\f[] and \f[I]-p/--password\f[] command-line options or by using a \f[I].netrc\f[] file. (see Authentication_) (*) The password value for these sites should be the API key found in your user profile, not the actual account password. Note: Leave the \f[I]password\f[] value empty or undefined to get prompted for a passeword when performing a login (see \f[I]getpass()\f[]). .SS extractor.*.netrc .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Enable the use of \f[I].netrc\f[] authentication data. .SS extractor.*.cookies .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]object\f[] (name -> value) .br * \f[I]list\f[] .IP "Description:" 4 Source to read additional cookies from. This can be .br * The \f[I]Path\f[] to a Mozilla/Netscape format cookies.txt file .. code:: json "~/.local/share/cookies-instagram-com.txt" .br * An \f[I]object\f[] specifying cookies as name-value pairs .. code:: json { "cookie-name": "cookie-value", "sessionid" : "14313336321%3AsabDFvuASDnlpb%3A31", "isAdult" : "1" } .br * A \f[I]list\f[] with up to 5 entries specifying a browser profile. .br * The first entry is the browser name .br * The optional second entry is a profile name or an absolute path to a profile directory .br * The optional third entry is the keyring to retrieve passwords for decrypting cookies from .br * The optional fourth entry is a (Firefox) container name (\f[I]"none"\f[] for only cookies with no container) .br * The optional fifth entry is the domain to extract cookies for. Prefix it with a dot \f[I].\f[] to include cookies for subdomains. Has no effect when also specifying a container. .. code:: json ["firefox"] ["firefox", null, null, "Personal"] ["chromium", "Private", "kwallet", null, ".twitter.com"] .SS extractor.*.cookies-update .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]Path\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Export session cookies in cookies.txt format. .br * If this is a \f[I]Path\f[], write cookies to the given file path. 
.br * If this is \f[I]true\f[] and \f[I]extractor.*.cookies\f[] specifies the \f[I]Path\f[] of a valid cookies.txt file, update its contents. .SS extractor.*.proxy .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (scheme -> proxy) .IP "Example:" 4 .. code:: json "http://10.10.1.10:3128" .. code:: json { "http" : "http://10.10.1.10:3128", "https": "http://10.10.1.10:1080", "http://10.20.1.128": "http://10.10.1.10:5323" } .IP "Description:" 4 Proxy (or proxies) to be used for remote connections. .br * If this is a \f[I]string\f[], it is the proxy URL for all outgoing requests. .br * If this is an \f[I]object\f[], it is a scheme-to-proxy mapping to specify different proxy URLs for each scheme. It is also possible to set a proxy for a specific host by using \f[I]scheme://host\f[] as key. See \f[I]Requests' proxy documentation\f[] for more details. Note: If a proxy URLs does not include a scheme, \f[I]http://\f[] is assumed. .SS extractor.*.source-address .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] with 1 \f[I]string\f[] and 1 \f[I]integer\f[] as elements .IP "Example:" 4 .br * "192.168.178.20" .br * ["192.168.178.20", 8080] .IP "Description:" 4 Client-side IP address to bind to. Can be either a simple \f[I]string\f[] with just the local IP address .br or a \f[I]list\f[] with IP and explicit port number as elements. .br .SS extractor.*.user-agent .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0"\f[] .IP "Description:" 4 User-Agent header value to be used for HTTP requests. Setting this value to \f[I]"browser"\f[] will try to automatically detect and use the User-Agent used by the system's default browser. Note: This option has no effect on pixiv, e621, and mangadex extractors, as these need specific values to function correctly. .SS extractor.*.browser .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 .br * \f[I]"firefox"\f[] for \f[I]patreon\f[], \f[I]mangapark\f[], and \f[I]mangasee\f[] .br * \f[I]null\f[] everywhere else .IP "Example:" 4 .br * "chrome:macos" .IP "Description:" 4 Try to emulate a real browser (\f[I]firefox\f[] or \f[I]chrome\f[]) by using their default HTTP headers and TLS ciphers for HTTP requests. Optionally, the operating system used in the \f[I]User-Agent\f[] header can be specified after a \f[I]:\f[] (\f[I]windows\f[], \f[I]linux\f[], or \f[I]macos\f[]). Note: \f[I]requests\f[] and \f[I]urllib3\f[] only support HTTP/1.1, while a real browser would use HTTP/2. .SS extractor.*.referer .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Send \f[I]Referer\f[] headers with all outgoing HTTP requests. If this is a \f[I]string\f[], send it as Referer instead of the extractor's \f[I]root\f[] domain. .SS extractor.*.headers .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Default:" 9 .. code:: json { "User-Agent" : "", "Accept" : "*/*", "Accept-Language": "en-US,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Referer" : "" } .IP "Description:" 4 Additional \f[I]HTTP headers\f[] to be sent with each HTTP request, To disable sending a header, set its value to \f[I]null\f[]. .SS extractor.*.ciphers .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .. 
code:: json ["ECDHE-ECDSA-AES128-GCM-SHA256", "ECDHE-RSA-AES128-GCM-SHA256", "ECDHE-ECDSA-CHACHA20-POLY1305", "ECDHE-RSA-CHACHA20-POLY1305"] .IP "Description:" 4 List of TLS/SSL cipher suites in \f[I]OpenSSL cipher list format\f[] to be passed to \f[I]ssl.SSLContext.set_ciphers()\f[] .SS extractor.*.tls12 .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 .br * \f[I]true\f[] .br * \f[I]false\f[] for \f[I]patreon\f[], \f[I]pixiv:series\f[] .IP "Description:" 4 Allow selecting TLS 1.2 cipher suites. Can be disabled to alter TLS fingerprints and potentially bypass Cloudflare blocks. .SS extractor.*.keywords .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 {"type": "Pixel Art", "type_id": 123} .IP "Description:" 4 Additional name-value pairs to be added to each metadata dictionary. .SS extractor.*.keywords-default .IP "Type:" 6 any .IP "Default:" 9 \f[I]"None"\f[] .IP "Description:" 4 Default value used for missing or undefined keyword names in \f[I]format strings\f[]. .SS extractor.*.url-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert a file's download URL into its metadata dictionary as the given name. For example, setting this option to \f[I]"gdl_file_url"\f[] will cause a new metadata field with name \f[I]gdl_file_url\f[] to appear, which contains the current file's download URL. This can then be used in \f[I]filenames\f[], with a \f[I]metadata\f[] post processor, etc. .SS extractor.*.path-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert a reference to the current \f[I]PathFormat\f[] data structure into metadata dictionaries as the given name. For example, setting this option to \f[I]"gdl_path"\f[] would make it possible to access the current file's filename as \f[I]"{gdl_path.filename}"\f[]. .SS extractor.*.extractor-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert a reference to the current \f[I]Extractor\f[] object into metadata dictionaries as the given name. .SS extractor.*.http-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert an \f[I]object\f[] containing a file's HTTP headers and \f[I]filename\f[], \f[I]extension\f[], and \f[I]date\f[] parsed from them into metadata dictionaries as the given name. For example, setting this option to \f[I]"gdl_http"\f[] would make it possible to access the current file's \f[I]Last-Modified\f[] header as \f[I]"{gdl_http[Last-Modified]}"\f[] and its parsed form as \f[I]"{gdl_http[date]}"\f[]. .SS extractor.*.version-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert an \f[I]object\f[] containing gallery-dl's version info into metadata dictionaries as the given name. The content of the object is as follows: .. code:: json { "version" : "string", "is_executable" : "bool", "current_git_head": "string or null" } .SS extractor.*.category-transfer .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 Extractor-specific .IP "Description:" 4 Transfer an extractor's (sub)category values to all child extractors spawned by it, to let them inherit their parent's config options. .SS extractor.*.blacklist & .whitelist .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["oauth", "recursive", "test"]\f[] + current extractor category .IP "Example:" 4 ["imgur", "redgifs:user", "*:image"] .IP "Description:" 4 A list of extractor identifiers to ignore (or allow) when spawning child extractors for unknown URLs, e.g. from \f[I]reddit\f[] or \f[I]plurk\f[]. 
Each identifier can be .br * A category or basecategory name (\f[I]"imgur"\f[], \f[I]"mastodon"\f[]) .br * | A (base)category-subcategory pair, where both names are separated by a colon (\f[I]"redgifs:user"\f[]). Both names can be a * or left empty, matching all possible names (\f[I]"*:image"\f[], \f[I]":user"\f[]). .br Note: Any \f[I]blacklist\f[] setting will automatically include \f[I]"oauth"\f[], \f[I]"recursive"\f[], and \f[I]"test"\f[]. .SS extractor.*.archive .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "$HOME/.archives/{category}.sqlite3" .IP "Description:" 4 File to store IDs of downloaded files in. Downloads of files already recorded in this archive file will be \f[I]skipped\f[]. The resulting archive file is not a plain text file but an SQLite3 database, as either lookup operations are significantly faster or memory requirements are significantly lower when the amount of stored IDs gets reasonably large. Note: Archive files that do not already exist get generated automatically. Note: Archive paths support regular \f[I]format string\f[] replacements, but be aware that using external inputs for building local paths may pose a security risk. .SS extractor.*.archive-format .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "{id}_{offset}" .IP "Description:" 4 An alternative \f[I]format string\f[] to build archive IDs with. .SS extractor.*.archive-prefix .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"{category}"\f[] .IP "Description:" 4 Prefix for archive IDs. .SS extractor.*.archive-pragma .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["journal_mode=WAL", "synchronous=NORMAL"] .IP "Description:" 4 A list of SQLite \f[I]PRAGMA\f[] statements to run during archive initialization. See \f[I]\f[] for available \f[I]PRAGMA\f[] statements and further details. .SS extractor.*.postprocessors .IP "Type:" 6 \f[I]list\f[] of \f[I]Postprocessor Configuration\f[] objects .IP "Example:" 4 .. code:: json [ { "name": "zip" , "compression": "store" }, { "name": "exec", "command": ["/home/foobar/script", "{category}", "{image_id}"] } ] .IP "Description:" 4 A list of \f[I]post processors\f[] to be applied to each downloaded file in the specified order. Unlike other options, a \f[I]postprocessors\f[] setting at a deeper level .br does not override any \f[I]postprocessors\f[] setting at a lower level. Instead, all post processors from all applicable \f[I]postprocessors\f[] .br settings get combined into a single list. For example .br * an \f[I]mtime\f[] post processor at \f[I]extractor.postprocessors\f[], .br * a \f[I]zip\f[] post processor at \f[I]extractor.pixiv.postprocessors\f[], .br * and using \f[I]--exec\f[] will run all three post processors - \f[I]mtime\f[], \f[I]zip\f[], \f[I]exec\f[] - for each downloaded \f[I]pixiv\f[] file. .SS extractor.*.postprocessor-options .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 .. code:: json { "archive": null, "keep-files": true } .IP "Description:" 4 Additional \f[I]Postprocessor Options\f[] that get added to each individual \f[I]post processor object\f[] before initializing it and evaluating filters. .SS extractor.*.retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]4\f[] .IP "Description:" 4 Maximum number of times a failed HTTP request is retried before giving up, or \f[I]-1\f[] for infinite retries. 
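As a sketch of the retries option just described, set globally and overridden for one category; the values and the twitter category are arbitrary examples:

.. code:: json

    {
        "extractor": {
            "retries": -1,
            "twitter": {
                "retries": 2
            }
        }
    }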
.SS extractor.*.retry-codes .IP "Type:" 6 \f[I]list\f[] of \f[I]integers\f[] .IP "Example:" 4 [404, 429, 430] .IP "Description:" 4 Additional \f[I]HTTP response status codes\f[] to retry an HTTP request on. \f[I]2xx\f[] codes (success responses) and \f[I]3xx\f[] codes (redirection messages) will never be retried and always count as success, regardless of this option. \f[I]5xx\f[] codes (server error responses) will always be retried, regardless of this option. .SS extractor.*.timeout .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]30.0\f[] .IP "Description:" 4 Amount of time (in seconds) to wait for a successful connection and response from a remote server. This value gets internally used as the \f[I]timeout\f[] parameter for the \f[I]requests.request()\f[] method. .SS extractor.*.verify .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to verify SSL/TLS certificates for HTTPS requests. If this is a \f[I]string\f[], it must be the path to a CA bundle to use instead of the default certificates. This value gets internally used as the \f[I]verify\f[] parameter for the \f[I]requests.request()\f[] method. .SS extractor.*.download .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to download media files. Setting this to \f[I]false\f[] won't download any files, but all other functions (\f[I]postprocessors\f[], \f[I]download archive\f[], etc.) will be executed as normal. .SS extractor.*.fallback .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Use fallback download URLs when a download fails. .SS extractor.*.image-range .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Examples:" 4 .br * \f[I]"10-20"\f[] .br * \f[I]"-5, 10, 30-50, 100-"\f[] .br * \f[I]"10:21, 30:51:2, :5, 100:"\f[] .br * \f[I]["-5", "10", "30-50", "100-"]\f[] .IP "Description:" 4 Index range(s) selecting which files to download. These can be specified as .br * index: \f[I]3\f[] (file number 3) .br * range: \f[I]2-4\f[] (files 2, 3, and 4) .br * \f[I]slice\f[]: \f[I]3:8:2\f[] (files 3, 5, and 7) Arguments for range and slice notation are optional .br and will default to begin (\f[I]1\f[]) or end (\f[I]sys.maxsize\f[]) if omitted. For example \f[I]5-\f[], \f[I]5:\f[], and \f[I]5::\f[] all mean "Start at file number 5". .br Note: The index of the first file is \f[I]1\f[]. .SS extractor.*.chapter-range .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Like \f[I]image-range\f[], but applies to delegated URLs like manga chapters, etc. .SS extractor.*.image-filter .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Examples:" 4 .br * \f[I]"re.search(r'foo(bar)+', description)"\f[] .br * \f[I]["width >= 1200", "width/height > 1.2"]\f[] .IP "Description:" 4 Python expression controlling which files to download. A file only gets downloaded when *all* of the given expressions evaluate to \f[I]True\f[]. Available values are the filename-specific ones listed by \f[I]-K\f[] or \f[I]-j\f[]. .SS extractor.*.chapter-filter .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Examples:" 4 .br * \f[I]"lang == 'en'"\f[] .br * \f[I]["language == 'French'", "10 <= chapter < 20"]\f[] .IP "Description:" 4 Like \f[I]image-filter\f[], but applies to delegated URLs like manga chapters, etc. 
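The filter options above take bare Python expressions; as a sketch of where they would be placed in a configuration file, using the mangadex category and the expressions from the examples above purely for illustration:

.. code:: json

    {
        "extractor": {
            "mangadex": {
                "chapter-filter": "lang == 'en'",
                "image-filter": "width >= 1200"
            }
        }
    }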
.SS extractor.*.image-unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Ignore image URLs that have been encountered before during the current extractor run. .SS extractor.*.chapter-unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Like \f[I]image-unique\f[], but applies to delegated URLs like manga chapters, etc. .SS extractor.*.date-format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"%Y-%m-%dT%H:%M:%S"\f[] .IP "Description:" 4 Format string used to parse \f[I]string\f[] values of date-min and date-max. See \f[I]strptime\f[] for a list of formatting directives. Note: Despite its name, this option does **not** control how \f[I]{date}\f[] metadata fields are formatted. To use a different formatting for those values other than the default \f[I]%Y-%m-%d %H:%M:%S\f[], put \f[I]strptime\f[] formatting directives after a colon \f[I]:\f[], for example \f[I]{date:%Y%m%d}\f[]. .SS extractor.*.write-pages .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 During data extraction, write received HTTP request data to enumerated files in the current working directory. Special values: .br * \f[I]"all"\f[]: Include HTTP request and response headers. Hide \f[I]Authorization\f[], \f[I]Cookie\f[], and \f[I]Set-Cookie\f[] values. .br * \f[I]"ALL"\f[]: Include all HTTP request and response headers. .SH EXTRACTOR-SPECIFIC OPTIONS .SS extractor.artstation.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Try to follow external URLs of embedded players. .SS extractor.artstation.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Limit the number of posts/projects to download. .SS extractor.artstation.previews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download video previews. .SS extractor.artstation.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video clips. .SS extractor.artstation.search.pro-first .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable the "Show Studio and Pro member artwork first" checkbox when retrieving search results. .SS extractor.aryion.recursive .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the post extraction strategy. .br * \f[I]true\f[]: Start on users' main gallery pages and recursively descend into subfolders .br * \f[I]false\f[]: Get posts from "Latest Updates" pages .SS extractor.bbc.width .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]1920\f[] .IP "Description:" 4 Specifies the requested image width. This value must be divisble by 16 and gets rounded down otherwise. The maximum possible value appears to be \f[I]1920\f[]. .SS extractor.behance.modules .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["image", "video", "mediacollection", "embed"]\f[] .IP "Description:" 4 Selects which gallery modules to download from. Supported module types are \f[I]image\f[], \f[I]video\f[], \f[I]mediacollection\f[], \f[I]embed\f[], \f[I]text\f[]. 
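As an example of the behance.modules option described above, a configuration sketch restricting downloads to selected module types; the chosen modules are illustrative only:

.. code:: json

    {
        "extractor": {
            "behance": {
                "modules": ["image", "video"]
            }
        }
    }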
.SS extractor.blogger.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download embedded videos hosted on https://www.blogger.com/ .SS extractor.bluesky.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"media"\f[] .IP "Example:" 4 .br * "avatar,background,posts" .br * ["avatar", "background", "posts"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"posts"\f[], \f[I]"replies"\f[], \f[I]"media"\f[], \f[I]"likes"\f[], It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.bluesky.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "facets,user" .br * ["facets", "user"] .IP "Description:" 4 Extract additional metadata. .br * \f[I]facets\f[]: \f[I]hashtags\f[], \f[I]mentions\f[], and \f[I]uris\f[] .br * \f[I]user\f[]: detailed \f[I]user\f[] metadata for the user referenced in the input URL (See \f[I]app.bsky.actor.getProfile\f[]). .SS extractor.bluesky.post.depth .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Sets the maximum depth of returned reply posts. (See depth parameter of \f[I]app.bsky.feed.getPostThread\f[]) .SS extractor.bluesky.reposts .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Process reposts. .SS extractor.cyberdrop.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "cyberdrop.to" .IP "Description:" 4 Specifies the domain used by \f[I]cyberdrop\f[] regardless of input URL. Setting this option to \f[I]"auto"\f[] uses the same domain as a given input URL. .SS extractor.danbooru.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For unavailable or restricted posts, follow the \f[I]source\f[] and download from there if possible. .SS extractor.danbooru.ugoira .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls the download target for Ugoira posts. .br * \f[I]true\f[]: Original ZIP archives .br * \f[I]false\f[]: Converted video files .SS extractor.[Danbooru].metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * replacements,comments,ai_tags .br * ["replacements", "comments", "ai_tags"] .IP "Description:" 4 Extract additional metadata (notes, artist commentary, parent, children, uploader) It is possible to specify a custom list of metadata includes. See \f[I]available_includes\f[] for possible field names. \f[I]aibooru\f[] also supports \f[I]ai_metadata\f[]. Note: This requires 1 additional HTTP request per 200-post batch. .SS extractor.[Danbooru].threshold .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Stop paginating over API results if the length of a batch of returned posts is less than the specified number. Defaults to the per-page limit of the current instance, which is 200. Note: Changing this setting is normally not necessary. When the value is greater than the per-page limit, gallery-dl will stop after the first batch. The value cannot be less than 1. 
.SS extractor.derpibooru.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Derpibooru API Key\f[], to use your account's browsing settings and filters. .SS extractor.derpibooru.filter .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]56027\f[] (\f[I]Everything\f[] filter) .IP "Description:" 4 The content filter ID to use. Setting an explicit filter ID overrides any default filters and can be used to access 18+ content without \f[I]API Key\f[]. See \f[I]Filters\f[] for details. .SS extractor.deviantart.auto-watch .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Automatically watch users when encountering "Watchers-Only Deviations" (requires a \f[I]refresh-token\f[]). .SS extractor.deviantart.auto-unwatch .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 After watching a user through \f[I]auto-watch\f[], unwatch that user at the end of the current extractor run. .SS extractor.deviantart.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]comments\f[] metadata. .SS extractor.deviantart.comments-avatars .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download the avatar of each commenting user. Note: Enabling this option also enables deviantart.comments_. .SS extractor.deviantart.extra .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download extra Sta.sh resources from description texts and journals. Note: Enabling this option also enables deviantart.metadata_. .SS extractor.deviantart.flat .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Select the directory structure created by the Gallery- and Favorite-Extractors. .br * \f[I]true\f[]: Use a flat directory structure. .br * \f[I]false\f[]: Collect a list of all gallery-folders or favorites-collections and transfer any further work to other extractors (\f[I]folder\f[] or \f[I]collection\f[]), which will then create individual subdirectories for each of them. Note: Going through all gallery folders will not be able to fetch deviations which aren't in any folder. .SS extractor.deviantart.folders .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Provide a \f[I]folders\f[] metadata field that contains the names of all folders a deviation is present in. Note: Gathering this information requires a lot of API calls. Use with caution. .SS extractor.deviantart.group .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check whether the profile name in a given URL belongs to a group or a regular user. When disabled, assume every given profile name belongs to a regular user. Special values: .br * \f[I]"skip"\f[]: Skip groups .SS extractor.deviantart.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Example:" 4 .br * "favorite,journal,scraps" .br * ["favorite", "journal", "scraps"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"gallery"\f[], \f[I]"scraps"\f[], \f[I]"journal"\f[], \f[I]"favorite"\f[], \f[I]"status"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. 
.SS extractor.deviantart.intermediary .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 For older non-downloadable images, download a higher-quality \f[I]/intermediary/\f[] version. .SS extractor.deviantart.journals .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"html"\f[] .IP "Description:" 4 Selects the output format for textual content. This includes journals, literature and status updates. .br * \f[I]"html"\f[]: HTML with (roughly) the same layout as on DeviantArt. .br * \f[I]"text"\f[]: Plain text with image references and HTML tags removed. .br * \f[I]"none"\f[]: Don't download textual content. .SS extractor.deviantart.jwt .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Update \f[I]JSON Web Tokens\f[] (the \f[I]token\f[] URL parameter) of otherwise non-downloadable, low-resolution images to be able to download them in full resolution. Note: No longer functional as of 2023-10-11 .SS extractor.deviantart.mature .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable mature content. This option simply sets the \f[I]mature_content\f[] parameter for API calls to either \f[I]"true"\f[] or \f[I]"false"\f[] and does not do any other form of content filtering. .SS extractor.deviantart.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "stats,submission" .br * ["camera", "stats", "submission"] .IP "Description:" 4 Extract additional metadata for deviation objects. Provides \f[I]description\f[], \f[I]tags\f[], \f[I]license\f[], and \f[I]is_watching\f[] fields when enabled. It is possible to request extended metadata by specifying a list of .br * \f[I]camera\f[] : EXIF information (if available) .br * \f[I]stats\f[] : deviation statistics .br * \f[I]submission\f[] : submission information .br * \f[I]collection\f[] : favourited folder information (requires a \f[I]refresh token\f[]) .br * \f[I]gallery\f[] : gallery folder information (requires a \f[I]refresh token\f[]) Set this option to \f[I]"all"\f[] to request all extended metadata categories. See \f[I]/deviation/metadata\f[] for official documentation. .SS extractor.deviantart.original .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download original files if available. Setting this option to \f[I]"images"\f[] only downloads original files if they are images and falls back to preview versions for everything else (archives, etc.). .SS extractor.deviantart.pagination .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"api"\f[] .IP "Description:" 4 Controls when to stop paginating over API results. .br * \f[I]"api"\f[]: Trust the API and stop when \f[I]has_more\f[] is \f[I]false\f[]. .br * \f[I]"manual"\f[]: Disregard \f[I]has_more\f[] and only stop when a batch of results is empty. .SS extractor.deviantart.public .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Use a public access token for API requests. Disable this option to *force* using a private token for all requests when a \f[I]refresh token\f[] is provided. .SS extractor.deviantart.quality .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]100\f[] .IP "Description:" 4 JPEG quality level of images for which an original file download is not available. Set this to \f[I]"png"\f[] to download a PNG version of these images instead. 
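.PP
The deviantart output options above combine naturally. A hedged sketch that requests extended metadata, keeps original files only for images, and falls back to PNG for re-encoded images could be:
.PP
.nf
{
    "extractor": {
        "deviantart": {
            "journals": "text",
            "metadata": ["camera", "stats", "submission"],
            "original": "images",
            "quality": "png"
        }
    }
}
.fi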
.SS extractor.deviantart.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from \f[I]linking your DeviantArt account to gallery-dl\f[]. Using a \f[I]refresh-token\f[] allows you to access private or otherwise not publicly available deviations. Note: The \f[I]refresh-token\f[] becomes invalid \f[I]after 3 months\f[] or whenever your \f[I]cache file\f[] is deleted or cleared. .SS extractor.deviantart.wait-min .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Minimum wait time in seconds before API requests. .SS extractor.deviantart.avatar.formats .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["original.jpg", "big.jpg", "big.gif", ".png"] .IP "Description:" 4 Avatar URL formats to return. Each format is parsed as \f[I]SIZE.EXT\f[]. .br Leave \f[I]SIZE\f[] empty to download the regular, small avatar format. .br .SS extractor.[E621].metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "notes,pools" .br * ["notes", "pools"] .IP "Description:" 4 Extract additional metadata (notes, pool metadata) if available. Note: This requires 0-2 additional HTTP requests per post. .SS extractor.[E621].threshold .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Stop paginating over API results if the length of a batch of returned posts is less than the specified number. Defaults to the per-page limit of the current instance, which is 320. Note: Changing this setting is normally not necessary. When the value is greater than the per-page limit, gallery-dl will stop after the first batch. The value cannot be less than 1. .SS extractor.exhentai.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 .br * \f[I]"auto"\f[]: Use \f[I]e-hentai.org\f[] or \f[I]exhentai.org\f[] depending on the input URL .br * \f[I]"e-hentai.org"\f[]: Use \f[I]e-hentai.org\f[] for all URLs .br * \f[I]"exhentai.org"\f[]: Use \f[I]exhentai.org\f[] for all URLs .SS extractor.exhentai.fallback-retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]2\f[] .IP "Description:" 4 Number of times a failed image gets retried or \f[I]-1\f[] for infinite retries. .SS extractor.exhentai.fav .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "4" .IP "Description:" 4 After downloading a gallery, add it to your account's favorites as the given category number. Note: Set this to "favdel" to remove galleries from your favorites. Note: This will remove any Favorite Notes when applied to already favorited galleries. .SS extractor.exhentai.gp .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"resized"\f[] .IP "Description:" 4 Selects how to handle "you do not have enough GP" errors. .br * "resized": Continue downloading \f[I]non-original\f[] images. .br * "stop": Stop the current extractor run. .br * "wait": Wait for user input before retrying the current image. .SS extractor.exhentai.limits .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Sets a custom image download limit and stops extraction when it gets exceeded. .SS extractor.exhentai.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Load extended gallery metadata from the \f[I]API\f[]. Adds \f[I]archiver_key\f[], \f[I]posted\f[], and \f[I]torrents\f[]. 
Makes \f[I]date\f[] and \f[I]filesize\f[] more precise. .SS extractor.exhentai.original .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download full-sized original images if available. .SS extractor.exhentai.source .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Description:" 4 Selects an alternative source to download files from. .br * \f[I]"hitomi"\f[]: Download the corresponding gallery from \f[I]hitomi.la\f[] .SS extractor.fanbox.embeds .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control behavior on embedded content from external sites. .br * \f[I]true\f[]: Extract embed URLs and download them if supported (videos are not downloaded). .br * \f[I]"ytdl"\f[]: Like \f[I]true\f[], but let \f[I]youtube-dl\f[] handle video extraction and download for YouTube, Vimeo and SoundCloud embeds. .br * \f[I]false\f[]: Ignore embeds. .SS extractor.fanbox.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * user,plan .br * ["user", "plan"] .IP "Description:" 4 Extract \f[I]plan\f[] and extended \f[I]user\f[] metadata. .SS extractor.flickr.access-token & .access-token-secret .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]access_token\f[] and \f[I]access_token_secret\f[] values you get from \f[I]linking your Flickr account to gallery-dl\f[]. .SS extractor.flickr.contexts .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For each photo, return the albums and pools it belongs to as \f[I]set\f[] and \f[I]pool\f[] metadata. Note: This requires 1 additional API call per photo. See \f[I]flickr.photos.getAllContexts\f[] for details. .SS extractor.flickr.exif .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For each photo, return its EXIF/TIFF/GPS tags as \f[I]exif\f[] and \f[I]camera\f[] metadata. Note: This requires 1 additional API call per photo. See \f[I]flickr.photos.getExif\f[] for details. .SS extractor.flickr.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * license,last_update,machine_tags .br * ["license", "last_update", "machine_tags"] .IP "Description:" 4 Extract additional metadata (license, date_taken, original_format, last_update, geo, machine_tags, o_dims) It is possible to specify a custom list of metadata includes. See \f[I]the extras parameter\f[] in \f[I]Flickr's API docs\f[] for possible field names. .SS extractor.flickr.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract and download videos. .SS extractor.flickr.size-max .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Sets the maximum allowed size for downloaded images. .br * If this is an \f[I]integer\f[], it specifies the maximum image dimension (width and height) in pixels. .br * If this is a \f[I]string\f[], it should be one of Flickr's format specifiers (\f[I]"Original"\f[], \f[I]"Large"\f[], ... or \f[I]"o"\f[], \f[I]"k"\f[], \f[I]"h"\f[], \f[I]"l"\f[], ...) to use as an upper limit. .SS extractor.furaffinity.descriptions .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"text"\f[] .IP "Description:" 4 Controls the format of \f[I]description\f[] metadata fields. 
.br * \f[I]"text"\f[]: Plain text with HTML tags removed .br * \f[I]"html"\f[]: Raw HTML content .SS extractor.furaffinity.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow external URLs linked in descriptions. .SS extractor.furaffinity.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Example:" 4 .br * "scraps,favorite" .br * ["scraps", "favorite"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"gallery"\f[], \f[I]"scraps"\f[], \f[I]"favorite"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.furaffinity.layout .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Selects which site layout to expect when parsing posts. .br * \f[I]"auto"\f[]: Automatically differentiate between \f[I]"old"\f[] and \f[I]"new"\f[] .br * \f[I]"old"\f[]: Expect the *old* site layout .br * \f[I]"new"\f[]: Expect the *new* site layout .SS extractor.gelbooru.api-key & .user-id .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Values from the API Access Credentials section found at the bottom of your \f[I]Account Options\f[] page. .SS extractor.gelbooru.favorite.order-posts .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"desc"\f[] .IP "Description:" 4 Controls the order in which favorited posts are returned. .br * \f[I]"asc"\f[]: Ascending favorite date order (oldest first) .br * \f[I]"desc"\f[]: Descending favorite date order (newest first) .br * \f[I]"reverse"\f[]: Same as \f[I]"asc"\f[] .SS extractor.generic.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Match **all** URLs not otherwise supported by gallery-dl, even ones without a \f[I]generic:\f[] prefix. .SS extractor.gofile.api-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 API token value found at the bottom of your \f[I]profile page\f[]. If not set, a temporary guest token will be used. .SS extractor.gofile.website-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 API token value used during API requests. An invalid or not up-to-date value will result in \f[I]401 Unauthorized\f[] errors. Keeping this option unset will use an extra HTTP request to attempt to fetch the current value used by gofile. .SS extractor.gofile.recursive .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Recursively download files from subfolders. .SS extractor.hentaifoundry.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"pictures"\f[] .IP "Example:" 4 .br * "scraps,stories" .br * ["scraps", "stories"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"pictures"\f[], \f[I]"scraps"\f[], \f[I]"stories"\f[], \f[I]"favorite"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.hitomi.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"webp"\f[] .IP "Description:" 4 Selects which image format to download. Available formats are \f[I]"webp"\f[] and \f[I]"avif"\f[]. \f[I]"original"\f[] will try to download the original \f[I]jpg\f[] or \f[I]png\f[] versions, but is most likely going to fail with \f[I]403 Forbidden\f[] errors. 
.SS extractor.imagechest.access-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Your personal Image Chest access token. These tokens allow using the API instead of having to scrape HTML pages, providing more detailed metadata. (\f[I]date\f[], \f[I]description\f[], etc) See https://imgchest.com/docs/api/1.0/general/authorization for instructions on how to generate such a token. .SS extractor.imgur.client-id .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Custom Client ID value for API requests. .SS extractor.imgur.mp4 .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to choose the GIF or MP4 version of an animation. .br * \f[I]true\f[]: Follow Imgur's advice and choose MP4 if the \f[I]prefer_video\f[] flag in an image's metadata is set. .br * \f[I]false\f[]: Always choose GIF. .br * \f[I]"always"\f[]: Always choose MP4. .SS extractor.inkbunny.orderby .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"create_datetime"\f[] .IP "Description:" 4 Value of the \f[I]orderby\f[] parameter for submission searches. (See \f[I]API#Search\f[] for details) .SS extractor.instagram.api .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"rest"\f[] .IP "Description:" 4 Selects which API endpoints to use. .br * \f[I]"rest"\f[]: REST API - higher-resolution media .br * \f[I]"graphql"\f[]: GraphQL API - lower-resolution media .SS extractor.instagram.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"posts"\f[] .IP "Example:" 4 .br * "stories,highlights,posts" .br * ["stories", "highlights", "posts"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"posts"\f[], \f[I]"reels"\f[], \f[I]"tagged"\f[], \f[I]"stories"\f[], \f[I]"highlights"\f[], \f[I]"avatar"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.instagram.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Provide extended \f[I]user\f[] metadata even when referring to a user by ID, e.g. \f[I]instagram.com/id:12345678\f[]. Note: This metadata is always available when referring to a user by name, e.g. \f[I]instagram.com/USERNAME\f[]. .SS extractor.instagram.order-files .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"asc"\f[] .IP "Description:" 4 Controls the order in which files of each post are returned. .br * \f[I]"asc"\f[]: Same order as displayed in a post .br * \f[I]"desc"\f[]: Reverse order as displayed in a post .br * \f[I]"reverse"\f[]: Same as \f[I]"desc"\f[] Note: This option does *not* affect \f[I]{num}\f[]. To enumerate files in reverse order, use \f[I]count - num + 1\f[]. .SS extractor.instagram.order-posts .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"asc"\f[] .IP "Description:" 4 Controls the order in which posts are returned. .br * \f[I]"asc"\f[]: Same order as displayed .br * \f[I]"desc"\f[]: Reverse order as displayed .br * \f[I]"id"\f[] or \f[I]"id_asc"\f[]: Ascending order by ID .br * \f[I]"id_desc"\f[]: Descending order by ID .br * \f[I]"reverse"\f[]: Same as \f[I]"desc"\f[] Note: This option only affects \f[I]highlights\f[]. .SS extractor.instagram.previews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download video previews. .SS extractor.instagram.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. 
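.PP
A short sketch showing how the imgur and instagram options above fit into a configuration file (values are illustrative):
.PP
.nf
{
    "extractor": {
        "imgur": {
            "mp4": "always"
        },
        "instagram": {
            "include": ["stories", "highlights", "posts"],
            "order-files": "desc",
            "previews": false,
            "videos": true
        }
    }
}
.fi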
.SS extractor.itaku.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.kemonoparty.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]comments\f[] metadata. Note: This requires 1 additional HTTP request per post. .SS extractor.kemonoparty.duplicates .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls how to handle duplicate files in a post. .br * \f[I]true\f[]: Download duplicates .br * \f[I]false\f[]: Ignore duplicates .SS extractor.kemonoparty.dms .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract a user's direct messages as \f[I]dms\f[] metadata. .SS extractor.kemonoparty.favorites .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]artist\f[] .IP "Description:" 4 Determines the type of favorites to be downloaded. Available types are \f[I]artist\f[], and \f[I]post\f[]. .SS extractor.kemonoparty.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["attachments", "file", "inline"]\f[] .IP "Description:" 4 Determines the type and order of files to be downloaded. Available types are \f[I]file\f[], \f[I]attachments\f[], and \f[I]inline\f[]. .SS extractor.kemonoparty.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Limit the number of posts to download. .SS extractor.kemonoparty.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]username\f[] metadata. .SS extractor.kemonoparty.revisions .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract post revisions. Set this to \f[I]"unique"\f[] to filter out duplicate revisions. Note: This requires 1 additional HTTP request per post. .SS extractor.kemonoparty.order-revisions .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"desc"\f[] .IP "Description:" 4 Controls the order in which \f[I]revisions\f[] are returned. .br * \f[I]"asc"\f[]: Ascending order (oldest first) .br * \f[I]"desc"\f[]: Descending order (newest first) .br * \f[I]"reverse"\f[]: Same as \f[I]"asc"\f[] .SS extractor.khinsider.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"mp3"\f[] .IP "Description:" 4 The name of the preferred file format to download. Use \f[I]"all"\f[] to download all available formats, or a (comma-separated) list to select multiple formats. If the selected format is not available, the first in the list gets chosen (usually mp3). .SS extractor.lolisafe.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Specifies the domain used by a \f[I]lolisafe\f[] extractor regardless of input URL. Setting this option to \f[I]"auto"\f[] uses the same domain as a given input URL. .SS extractor.luscious.gif .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Format in which to download animated images. Use \f[I]true\f[] to download animated images as gifs and \f[I]false\f[] to download as mp4 videos. .SS extractor.mangadex.api-server .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"https://api.mangadex.org"\f[] .IP "Description:" 4 The server to use for API requests. .SS extractor.mangadex.api-parameters .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 {"order[updatedAt]": "desc"} .IP "Description:" 4 Additional query parameters to send when fetching manga chapters. 
(See \f[I]/manga/{id}/feed\f[] and \f[I]/user/follows/manga/feed\f[]) .SS extractor.mangadex.lang .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "en" .br * "fr,it" .br * ["fr", "it"] .IP "Description:" 4 \f[I]ISO 639-1\f[] language codes to filter chapters by. .SS extractor.mangadex.ratings .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["safe", "suggestive", "erotica", "pornographic"]\f[] .IP "Description:" 4 List of acceptable content ratings for returned chapters. .SS extractor.mangapark.source .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Example:" 4 .br * "koala:en" .br * 15150116 .IP "Description:" 4 Select chapter source and language for a manga. The general syntax is \f[I]"<source name>:<ISO 639-1 language code>"\f[]. .br Both are optional, meaning \f[I]"koala"\f[], \f[I]"koala:"\f[], \f[I]":en"\f[], .br or even just \f[I]":"\f[] are possible as well. Specifying the numeric \f[I]ID\f[] of a source is also supported. .SS extractor.[mastodon].access-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]access-token\f[] value you get from \f[I]linking your account to gallery-dl\f[]. Note: gallery-dl comes with built-in tokens for \f[I]mastodon.social\f[], \f[I]pawoo\f[] and \f[I]baraag\f[]. For other instances, you need to obtain an \f[I]access-token\f[] in order to use usernames in place of numerical user IDs. .SS extractor.[mastodon].reblogs .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from reblogged posts. .SS extractor.[mastodon].replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other posts. .SS extractor.[mastodon].text-posts .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also emit metadata for text-only posts without media content. .SS extractor.[misskey].access-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Your access token, necessary to fetch favorited notes. .SS extractor.[misskey].renotes .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from renoted notes. .SS extractor.[misskey].replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other notes. .SS extractor.[moebooru].pool.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract extended \f[I]pool\f[] metadata. Note: Not supported by all \f[I]moebooru\f[] instances. .SS extractor.newgrounds.flash .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download original Adobe Flash animations instead of pre-rendered videos. .SS extractor.newgrounds.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"original"\f[] .IP "Example:" 4 "720p" .IP "Description:" 4 Selects the preferred format for video downloads. If the selected format is not available, the next smaller one gets chosen. .SS extractor.newgrounds.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"art"\f[] .IP "Example:" 4 .br * "movies,audio" .br * ["movies", "audio"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"art"\f[], \f[I]"audio"\f[], \f[I]"games"\f[], \f[I]"movies"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately.
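.PP
As an example of the options above, the following sketch selects which kemonoparty file types to fetch, restricts mangadex chapters by language and rating, and picks khinsider formats (values chosen only for illustration):
.PP
.nf
{
    "extractor": {
        "kemonoparty": {
            "files": ["file", "attachments"],
            "revisions": "unique"
        },
        "khinsider": {
            "format": "flac,mp3"
        },
        "mangadex": {
            "lang": ["en"],
            "ratings": ["safe", "suggestive"]
        }
    }
}
.fi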
.SS extractor.nijie.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"illustration,doujin"\f[] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"illustration"\f[], \f[I]"doujin"\f[], \f[I]"favorite"\f[], \f[I]"nuita"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.nitter.quoted .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from quoted Tweets. .SS extractor.nitter.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from Retweets. .SS extractor.nitter.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. .br * \f[I]true\f[]: Download videos .br * \f[I]"ytdl"\f[]: Download videos using \f[I]youtube-dl\f[] .br * \f[I]false\f[]: Skip video Tweets .SS extractor.oauth.browser .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls how a user is directed to an OAuth authorization page. .br * \f[I]true\f[]: Use Python's \f[I]webbrowser.open()\f[] method to automatically open the URL in the user's default browser. .br * \f[I]false\f[]: Ask the user to copy & paste an URL from the terminal. .SS extractor.oauth.cache .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Store tokens received during OAuth authorizations in \f[I]cache\f[]. .SS extractor.oauth.host .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"localhost"\f[] .IP "Description:" 4 Host name / IP address to bind to during OAuth authorization. .SS extractor.oauth.port .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]6414\f[] .IP "Description:" 4 Port number to listen on during OAuth authorization. Note: All redirects will go to port \f[I]6414\f[], regardless of the port specified here. You'll have to manually adjust the port number in your browser's address bar when using a different port than the default. .SS extractor.paheal.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata (\f[I]source\f[], \f[I]uploader\f[]) Note: This requires 1 additional HTTP request per post. .SS extractor.patreon.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["images", "image_large", "attachments", "postfile", "content"]\f[] .IP "Description:" 4 Determines the type and order of files to be downloaded. Available types are \f[I]postfile\f[], \f[I]images\f[], \f[I]image_large\f[], \f[I]attachments\f[], and \f[I]content\f[]. .SS extractor.photobucket.subalbums .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download subalbums. .SS extractor.pillowfort.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow links to external sites, e.g. Twitter, .SS extractor.pillowfort.inline .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract inline images. .SS extractor.pillowfort.reblogs .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract media from reblogged posts. .SS extractor.pinterest.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Specifies the domain used by \f[I]pinterest\f[] extractors. 
Setting this option to \f[I]"auto"\f[] uses the same domain as a given input URL. .SS extractor.pinterest.sections .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include pins from board sections. .SS extractor.pinterest.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download from video pins. .SS extractor.pixeldrain.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Your account's \f[I]API key\f[] .SS extractor.pixiv.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"artworks"\f[] .IP "Example:" 4 .br * "avatar,background,artworks" .br * ["avatar", "background", "artworks"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"artworks"\f[], \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"favorite"\f[], \f[I]"novel-user"\f[], \f[I]"novel-bookmark"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.pixiv.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from running \f[I]gallery-dl oauth:pixiv\f[] (see OAuth_) or by using a third-party tool like \f[I]gppt\f[]. .SS extractor.pixiv.embeds .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download images embedded in novels. .SS extractor.pixiv.novel.full-series .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 When downloading a novel being part of a series, download all novels of that series. .SS extractor.pixiv.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch extended \f[I]user\f[] metadata. .SS extractor.pixiv.metadata-bookmark .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For works bookmarked by \f[I]your own account\f[], fetch bookmark tags as \f[I]tags_bookmark\f[] metadata. Note: This requires 1 additional API call per bookmarked post. .SS extractor.pixiv.work.related .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also download related artworks. .SS extractor.pixiv.tags .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"japanese"\f[] .IP "Description:" 4 Controls the \f[I]tags\f[] metadata field. .br * "japanese": List of Japanese tags .br * "translated": List of translated tags .br * "original": Unmodified list with both Japanese and translated tags .SS extractor.pixiv.ugoira .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download Pixiv's Ugoira animations or ignore them. These animations come as a \f[I].zip\f[] file containing all animation frames in JPEG format. Use an ugoira post processor to convert them to watchable videos. (Example__) .SS extractor.pixiv.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 When downloading galleries, this sets the maximum number of posts to get. A value of \f[I]0\f[] means no limit. .SS extractor.plurk.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also search Plurk comments for URLs. .SS extractor.[postmill].save-link-post-body .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Whether or not to save the body for link/image posts. 
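.PP
For instance, a sketch combining the pinterest and pixiv options above (example values only; a real setup would additionally provide the pixiv \f[I]refresh-token\f[]):
.PP
.nf
{
    "extractor": {
        "pinterest": {
            "domain": "auto",
            "sections": false
        },
        "pixiv": {
            "include": ["artworks", "avatar"],
            "tags": "translated",
            "ugoira": true,
            "metadata": true
        }
    }
}
.fi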
.SS extractor.reactor.gif .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Format in which to download animated images. Use \f[I]true\f[] to download animated images as gifs and \f[I]false\f[] to download as mp4 videos. .SS extractor.readcomiconline.captcha .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"stop"\f[] .IP "Description:" 4 Controls how to handle redirects to CAPTCHA pages. .br * \f[I]"stop"\f[]: Stop the current extractor run. .br * \f[I]"wait"\f[]: Ask the user to solve the CAPTCHA and wait. .SS extractor.readcomiconline.quality .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Sets the \f[I]quality\f[] query parameter of issue pages. (\f[I]"lq"\f[] or \f[I]"hq"\f[]) \f[I]"auto"\f[] uses the quality parameter of the input URL or \f[I]"hq"\f[] if not present. .SS extractor.reddit.comments .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 The value of the \f[I]limit\f[] parameter when loading a submission and its comments. This number (roughly) specifies the total amount of comments being retrieved with the first API call. Reddit's internal default and maximum values for this parameter appear to be 200 and 500 respectively. The value \f[I]0\f[] ignores all comments and significantly reduces the time required when scanning a subreddit. .SS extractor.reddit.morecomments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Retrieve additional comments by resolving the \f[I]more\f[] comment stubs in the base comment tree. Note: This requires 1 additional API call for every 100 extra comments. .SS extractor.reddit.date-min & .date-max .IP "Type:" 6 \f[I]Date\f[] .IP "Default:" 9 \f[I]0\f[] and \f[I]253402210800\f[] (timestamp of \f[I]datetime.max\f[]) .IP "Description:" 4 Ignore all submissions posted before/after this date. .SS extractor.reddit.id-min & .id-max .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "6kmzv2" .IP "Description:" 4 Ignore all submissions posted before/after the submission with this ID. .SS extractor.reddit.previews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 For failed downloads from external URLs / child extractors, download Reddit's preview image/video if available. .SS extractor.reddit.recursion .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Reddit extractors can recursively visit other submissions linked to in the initial set of submissions. This value sets the maximum recursion depth. Special values: .br * \f[I]0\f[]: Recursion is disabled .br * \f[I]-1\f[]: Infinite recursion (don't do this) .SS extractor.reddit.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from \f[I]linking your Reddit account to gallery-dl\f[]. Using a \f[I]refresh-token\f[] allows you to access private or otherwise not publicly available subreddits, given that your account is authorized to do so, but requests to the reddit API are going to be rate limited at 600 requests every 10 minutes/600 seconds. .SS extractor.reddit.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior.
.br * \f[I]true\f[]: Download videos and use \f[I]youtube-dl\f[] to handle HLS and DASH manifests .br * \f[I]"ytdl"\f[]: Download videos and let \f[I]youtube-dl\f[] handle all of video extraction and download .br * \f[I]"dash"\f[]: Extract DASH manifest URLs and use \f[I]youtube-dl\f[] to download and merge them. (*) .br * \f[I]false\f[]: Ignore videos (*) This saves 1 HTTP request per video and might potentially be able to download otherwise deleted videos, but it will not always get the best video quality available. .SS extractor.redgifs.format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["hd", "sd", "gif"]\f[] .IP "Description:" 4 List of names of the preferred animation format, which can be \f[I]"hd"\f[], \f[I]"sd"\f[], \f[I]"gif"\f[], \f[I]"thumbnail"\f[], \f[I]"vthumbnail"\f[], or \f[I]"poster"\f[]. If a selected format is not available, the next one in the list will be tried until an available format is found. If the format is given as \f[I]string\f[], it will be extended with \f[I]["hd", "sd", "gif"]\f[]. Use a list with one element to restrict it to only one possible format. .SS extractor.sankaku.id-format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"numeric"\f[] .IP "Description:" 4 Format of \f[I]id\f[] metadata fields. .br * \f[I]"alphanumeric"\f[] or \f[I]"alnum"\f[]: 11-character alphanumeric IDs (\f[I]y0abGlDOr2o\f[]) .br * \f[I]"numeric"\f[] or \f[I]"legacy"\f[]: numeric IDs (\f[I]360451\f[]) .SS extractor.sankaku.refresh .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Refresh download URLs before they expire. .SS extractor.sankakucomplex.embeds .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download video embeds from external sites. .SS extractor.sankakucomplex.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download videos. .SS extractor.skeb.article .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download article images. .SS extractor.skeb.sent-requests .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download sent requests. .SS extractor.skeb.thumbnails .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download thumbnails. .SS extractor.skeb.search.filters .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["genre:art", "genre:voice", "genre:novel", "genre:video", "genre:music", "genre:correction"]\f[] .IP "Example:" 4 "genre:music OR genre:voice" .IP "Description:" 4 Filters used during searches. .SS extractor.smugmug.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.steamgriddb.animated .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include animated assets when downloading from a list of assets. .SS extractor.steamgriddb.epilepsy .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include assets tagged with epilepsy when downloading from a list of assets. .SS extractor.steamgriddb.dimensions .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Examples:" 4 .br * \f[I]"1024x512,512x512"\f[] .br * \f[I]["460x215", "920x430"]\f[] .IP "Description:" 4 Only include assets that are in the specified dimensions. \f[I]all\f[] can be used to specify all dimensions. 
Valid values are: .br * Grids: \f[I]460x215\f[], \f[I]920x430\f[], \f[I]600x900\f[], \f[I]342x482\f[], \f[I]660x930\f[], \f[I]512x512\f[], \f[I]1024x1024\f[] .br * Heroes: \f[I]1920x620\f[], \f[I]3840x1240\f[], \f[I]1600x650\f[] .br * Logos: N/A (will be ignored) .br * Icons: \f[I]8x8\f[], \f[I]10x10\f[], \f[I]14x14\f[], \f[I]16x16\f[], \f[I]20x20\f[], \f[I]24x24\f[], \f[I]28x28\f[], \f[I]32x32\f[], \f[I]35x35\f[], \f[I]40x40\f[], \f[I]48x48\f[], \f[I]54x54\f[], \f[I]56x56\f[], \f[I]57x57\f[], \f[I]60x60\f[], \f[I]64x64\f[], \f[I]72x72\f[], \f[I]76x76\f[], \f[I]80x80\f[], \f[I]90x90\f[], \f[I]96x96\f[], \f[I]100x100\f[], \f[I]114x114\f[], \f[I]120x120\f[], \f[I]128x128\f[], \f[I]144x144\f[], \f[I]150x150\f[], \f[I]152x152\f[], \f[I]160x160\f[], \f[I]180x180\f[], \f[I]192x192\f[], \f[I]194x194\f[], \f[I]256x256\f[], \f[I]310x310\f[], \f[I]512x512\f[], \f[I]768x768\f[], \f[I]1024x1024\f[] .SS extractor.steamgriddb.file-types .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Examples:" 4 .br * \f[I]"png,jpeg"\f[] .br * \f[I]["jpeg", "webp"]\f[] .IP "Description:" 4 Only include assets that are in the specified file types. \f[I]all\f[] can be used to specify all file types. Valid values are: .br * Grids: \f[I]png\f[], \f[I]jpeg\f[], \f[I]jpg\f[], \f[I]webp\f[] .br * Heroes: \f[I]png\f[], \f[I]jpeg\f[], \f[I]jpg\f[], \f[I]webp\f[] .br * Logos: \f[I]png\f[], \f[I]webp\f[] .br * Icons: \f[I]png\f[], \f[I]ico\f[] .SS extractor.steamgriddb.download-fake-png .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download fake PNGs alongside the real file. .SS extractor.steamgriddb.humor .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include assets tagged with humor when downloading from a list of assets. .SS extractor.steamgriddb.languages .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Examples:" 4 .br * \f[I]"en,km"\f[] .br * \f[I]["fr", "it"]\f[] .IP "Description:" 4 Only include assets that are in the specified languages. \f[I]all\f[] can be used to specify all languages. Valid values are \f[I]ISO 639-1\f[] language codes. .SS extractor.steamgriddb.nsfw .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include assets tagged with adult content when downloading from a list of assets. .SS extractor.steamgriddb.sort .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]score_desc\f[] .IP "Description:" 4 Set the chosen sorting method when downloading from a list of assets. Can be one of: .br * \f[I]score_desc\f[] (Highest Score (Beta)) .br * \f[I]score_asc\f[] (Lowest Score (Beta)) .br * \f[I]score_old_desc\f[] (Highest Score (Old)) .br * \f[I]score_old_asc\f[] (Lowest Score (Old)) .br * \f[I]age_desc\f[] (Newest First) .br * \f[I]age_asc\f[] (Oldest First) .SS extractor.steamgriddb.static .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include static assets when downloading from a list of assets. .SS extractor.steamgriddb.styles .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]all\f[] .IP "Examples:" 4 .br * \f[I]white,black\f[] .br * \f[I]["no_logo", "white_logo"]\f[] .IP "Description:" 4 Only include assets that are in the specified styles. \f[I]all\f[] can be used to specify all styles. 
Valid values are: .br * Grids: \f[I]alternate\f[], \f[I]blurred\f[], \f[I]no_logo\f[], \f[I]material\f[], \f[I]white_logo\f[] .br * Heroes: \f[I]alternate\f[], \f[I]blurred\f[], \f[I]material\f[] .br * Logos: \f[I]official\f[], \f[I]white\f[], \f[I]black\f[], \f[I]custom\f[] .br * Icons: \f[I]official\f[], \f[I]custom\f[] .SS extractor.steamgriddb.untagged .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include untagged assets when downloading from a list of assets. .SS extractor.[szurubooru].username & .token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Username and login token of your account to access private resources. To generate a token, visit \f[I]/user/USERNAME/list-tokens\f[] and click \f[I]Create Token\f[]. .SS extractor.tumblr.avatar .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download blog avatars. .SS extractor.tumblr.date-min & .date-max .IP "Type:" 6 \f[I]Date\f[] .IP "Default:" 9 \f[I]0\f[] and \f[I]null\f[] .IP "Description:" 4 Ignore all posts published before/after this date. .SS extractor.tumblr.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow external URLs (e.g. from "Link" posts) and try to extract images from them. .SS extractor.tumblr.inline .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Search posts for inline images and videos. .SS extractor.tumblr.offset .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Custom \f[I]offset\f[] starting value when paginating over blog posts. Allows skipping over posts without having to waste API calls. .SS extractor.tumblr.original .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download full-resolution \f[I]photo\f[] and \f[I]inline\f[] images. For each photo with "maximum" resolution (width equal to 2048 or height equal to 3072) or each inline image, use an extra HTTP request to find the URL to its full-resolution version. .SS extractor.tumblr.ratelimit .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"abort"\f[] .IP "Description:" 4 Selects how to handle exceeding the daily API rate limit. .br * \f[I]"abort"\f[]: Raise an error and stop extraction .br * \f[I]"wait"\f[]: Wait until rate limit reset .SS extractor.tumblr.reblogs .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 .br * \f[I]true\f[]: Extract media from reblogged posts .br * \f[I]false\f[]: Skip reblogged posts .br * \f[I]"same-blog"\f[]: Skip reblogged posts unless the original post is from the same blog .SS extractor.tumblr.posts .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Example:" 4 .br * "video,audio,link" .br * ["video", "audio", "link"] .IP "Description:" 4 A (comma-separated) list of post types to extract images, etc. from. Possible types are \f[I]text\f[], \f[I]quote\f[], \f[I]link\f[], \f[I]answer\f[], \f[I]video\f[], \f[I]audio\f[], \f[I]photo\f[], \f[I]chat\f[]. It is possible to use \f[I]"all"\f[] instead of listing all types separately. .SS extractor.tumblr.fallback-delay .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]120.0\f[] .IP "Description:" 4 Number of seconds to wait between retries for fetching full-resolution images. 
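.PP
A combined sketch of the reddit and tumblr options described above (the numbers and post-type lists are arbitrary examples):
.PP
.nf
{
    "extractor": {
        "reddit": {
            "comments": 500,
            "morecomments": true,
            "videos": "ytdl"
        },
        "tumblr": {
            "posts": ["photo", "video"],
            "reblogs": "same-blog",
            "ratelimit": "wait"
        }
    }
}
.fi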
.SS extractor.tumblr.fallback-retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]2\f[] .IP "Description:" 4 Number of retries for fetching full-resolution images or \f[I]-1\f[] for infinite retries. .SS extractor.twibooru.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Twibooru API Key\f[], to use your account's browsing settings and filters. .SS extractor.twibooru.filter .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]2\f[] (\f[I]Everything\f[] filter) .IP "Description:" 4 The content filter ID to use. Setting an explicit filter ID overrides any default filters and can be used to access 18+ content without an \f[I]API Key\f[]. See \f[I]Filters\f[] for details. .SS extractor.twitter.ads .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from promoted Tweets. .SS extractor.twitter.cards .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls how to handle \f[I]Twitter Cards\f[]. .br * \f[I]false\f[]: Ignore cards .br * \f[I]true\f[]: Download image content from supported cards .br * \f[I]"ytdl"\f[]: Additionally download video content from unsupported cards using \f[I]youtube-dl\f[] .SS extractor.twitter.cards-blacklist .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["summary", "youtube.com", "player:twitch.tv"] .IP "Description:" 4 List of card types to ignore. Possible values are .br * card names .br * card domains .br * \f[I]<card name>:<domain>\f[] .SS extractor.twitter.conversations .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For input URLs pointing to a single Tweet, e.g. https://twitter.com/i/web/status/<TweetID>, fetch media from all Tweets and replies in this \f[I]conversation\f[]. If this option is equal to \f[I]"accessible"\f[], only download from conversation Tweets if the given initial Tweet is accessible. .SS extractor.twitter.csrf .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"cookies"\f[] .IP "Description:" 4 Controls how to handle Cross Site Request Forgery (CSRF) tokens. .br * \f[I]"auto"\f[]: Always auto-generate a token. .br * \f[I]"cookies"\f[]: Use token given by the \f[I]ct0\f[] cookie if present. .SS extractor.twitter.expand .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For each Tweet, return *all* Tweets from that initial Tweet's conversation or thread, i.e. *expand* all Twitter threads. Going through a timeline with this option enabled is essentially the same as running \f[I]gallery-dl https://twitter.com/i/web/status/<TweetID>\f[] with the \f[I]conversations\f[] option enabled for each Tweet in said timeline. Note: This requires at least 1 additional API call per initial Tweet. .SS extractor.twitter.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"timeline"\f[] .IP "Example:" 4 .br * "avatar,background,media" .br * ["avatar", "background", "media"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"timeline"\f[], \f[I]"tweets"\f[], \f[I]"media"\f[], \f[I]"replies"\f[], \f[I]"likes"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.twitter.transform .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Transform Tweet and User metadata into a simpler, uniform format.
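.PP
To see how these twitter options interact, consider the following sketch (illustrative values): video content from unsupported cards is fetched via youtube-dl, summary cards and YouTube embeds are skipped, and conversations are only expanded when the initial Tweet is accessible.
.PP
.nf
{
    "extractor": {
        "twitter": {
            "cards": "ytdl",
            "cards-blacklist": ["summary", "youtube.com"],
            "conversations": "accessible",
            "csrf": "cookies"
        }
    }
}
.fi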
.SS extractor.twitter.tweet-endpoint .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Selects the API endpoint used to retrieve single Tweets. .br * \f[I]"restid"\f[]: \f[I]/TweetResultByRestId\f[] - accessible to guest users .br * \f[I]"detail"\f[]: \f[I]/TweetDetail\f[] - more stable .br * \f[I]"auto"\f[]: \f[I]"detail"\f[] when logged in, \f[I]"restid"\f[] otherwise .SS extractor.twitter.size .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["orig", "4096x4096", "large", "medium", "small"]\f[] .IP "Description:" 4 The image version to download. Any entries after the first one will be used for potential \f[I]fallback\f[] URLs. Known available sizes are \f[I]4096x4096\f[], \f[I]orig\f[], \f[I]large\f[], \f[I]medium\f[], and \f[I]small\f[]. .SS extractor.twitter.logout .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Logout and retry as guest when access to another user's Tweets is blocked. .SS extractor.twitter.pinned .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from pinned Tweets. .SS extractor.twitter.quoted .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from quoted Tweets. If this option is enabled, gallery-dl will try to fetch a quoted (original) Tweet when it sees the Tweet which quotes it. .SS extractor.twitter.ratelimit .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"wait"\f[] .IP "Description:" 4 Selects how to handle exceeding the API rate limit. .br * \f[I]"abort"\f[]: Raise an error and stop extraction .br * \f[I]"wait"\f[]: Wait until rate limit reset .SS extractor.twitter.locked .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"abort"\f[] .IP "Description:" 4 Selects how to handle "account is temporarily locked" errors. .br * \f[I]"abort"\f[]: Raise an error and stop extraction .br * \f[I]"wait"\f[]: Wait until the account is unlocked and retry .SS extractor.twitter.replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other Tweets. If this value is \f[I]"self"\f[], only consider replies where reply and original Tweet are from the same user. Note: Twitter will automatically expand conversations if you use the \f[I]/with_replies\f[] timeline while logged in. For example, media from Tweets which the user replied to will also be downloaded. It is possible to exclude unwanted Tweets using \f[I]image-filter \f[]. .SS extractor.twitter.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from Retweets. If this value is \f[I]"original"\f[], metadata for these files will be taken from the original Tweets, not the Retweets. .SS extractor.twitter.timeline.strategy .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the strategy / tweet source used for timeline URLs (\f[I]https://twitter.com/USER/timeline\f[]). .br * \f[I]"tweets"\f[]: \f[I]/tweets\f[] timeline + search .br * \f[I]"media"\f[]: \f[I]/media\f[] timeline + search .br * \f[I]"with_replies"\f[]: \f[I]/with_replies\f[] timeline + search .br * \f[I]"auto"\f[]: \f[I]"tweets"\f[] or \f[I]"media"\f[], depending on \f[I]retweets\f[] and \f[I]text-tweets\f[] settings .SS extractor.twitter.text-tweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also emit metadata for text-only Tweets without media content. 
This only has an effect with a \f[I]metadata\f[] (or \f[I]exec\f[]) post processor with \f[I]"event": "post"\f[] and appropriate \f[I]filename\f[]. .SS extractor.twitter.twitpic .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]TwitPic\f[] embeds. .SS extractor.twitter.unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Ignore previously seen Tweets. .SS extractor.twitter.users .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"user"\f[] .IP "Example:" 4 "https://twitter.com/search?q=from:{legacy[screen_name]}" .IP "Description:" 4 Format string for user URLs generated from .br \f[I]following\f[] and \f[I]list-members\f[] queries, whose replacement field values come from Twitter \f[I]user\f[] objects .br (\f[I]Example\f[]) Special values: .br * \f[I]"user"\f[]: \f[I]https://twitter.com/i/user/{rest_id}\f[] .br * \f[I]"timeline"\f[]: \f[I]https://twitter.com/id:{rest_id}/timeline\f[] .br * \f[I]"tweets"\f[]: \f[I]https://twitter.com/id:{rest_id}/tweets\f[] .br * \f[I]"media"\f[]: \f[I]https://twitter.com/id:{rest_id}/media\f[] Note: To allow gallery-dl to follow custom URL formats, set the \f[I]blacklist\f[] for \f[I]twitter\f[] to a non-default value, e.g. an empty string \f[I]""\f[]. .SS extractor.twitter.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. .br * \f[I]true\f[]: Download videos .br * \f[I]"ytdl"\f[]: Download videos using \f[I]youtube-dl\f[] .br * \f[I]false\f[]: Skip video Tweets .SS extractor.unsplash.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"raw"\f[] .IP "Description:" 4 Name of the image format to download. Available formats are \f[I]"raw"\f[], \f[I]"full"\f[], \f[I]"regular"\f[], \f[I]"small"\f[], and \f[I]"thumb"\f[]. .SS extractor.vipergirls.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"vipergirls.to"\f[] .IP "Description:" 4 Specifies the domain used by \f[I]vipergirls\f[] extractors. For example \f[I]"viper.click"\f[] if the main domain is blocked or to bypass Cloudflare, .SS extractor.vipergirls.like .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Automatically like posts after downloading their images. Note: Requires \f[I]login\f[] or \f[I]cookies\f[] .SS extractor.vsco.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.wallhaven.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Wallhaven API Key\f[], to use your account's browsing settings and default filters when searching. See https://wallhaven.cc/help/api for more information. .SS extractor.wallhaven.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"uploads"\f[] .IP "Example:" 4 .br * "uploads,collections" .br * ["uploads", "collections"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"uploads"\f[], \f[I]"collections"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.wallhaven.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata (tags, uploader) Note: This requires 1 additional HTTP request per post. 
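.PP
A further sketch for the options above (example values): user URLs from following/list-members queries are emitted as timeline URLs, TwitPic embeds are extracted, and wallhaven profiles include both uploads and collections.
.PP
.nf
{
    "extractor": {
        "twitter": {
            "users": "timeline",
            "twitpic": true,
            "videos": "ytdl"
        },
        "unsplash": {
            "format": "full"
        },
        "wallhaven": {
            "include": ["uploads", "collections"],
            "metadata": true
        }
    }
}
.fi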
.SS extractor.weasyl.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Weasyl API Key\f[], to use your account's browsing settings and filters. .SS extractor.weasyl.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch extra submission metadata during gallery downloads. .br (\f[I]comments\f[], \f[I]description\f[], \f[I]favorites\f[], \f[I]folder_name\f[], .br \f[I]tags\f[], \f[I]views\f[]) Note: This requires 1 additional HTTP request per submission. .SS extractor.weibo.gifs .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download \f[I]gif\f[] files. Set this to \f[I]"video"\f[] to download GIFs as video files. .SS extractor.weibo.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"feed"\f[] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"home"\f[], \f[I]"feed"\f[], \f[I]"videos"\f[], \f[I]"newvideo"\f[], \f[I]"article"\f[], \f[I]"album"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.weibo.livephoto .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download \f[I]livephoto\f[] files. .SS extractor.weibo.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from retweeted posts. If this value is \f[I]"original"\f[], metadata for these files will be taken from the original posts, not the retweeted posts. .SS extractor.weibo.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.ytdl.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Match **all** URLs, even ones without a \f[I]ytdl:\f[] prefix. .SS extractor.ytdl.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 youtube-dl's default, currently \f[I]"bestvideo+bestaudio/best"\f[] .IP "Description:" 4 Video \f[I]format selection\f[] directly passed to youtube-dl. .SS extractor.ytdl.generic .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the use of youtube-dl's generic extractor. Set this option to \f[I]"force"\f[] for the same effect as youtube-dl's \f[I]--force-generic-extractor\f[]. .SS extractor.ytdl.logging .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Route youtube-dl's output through gallery-dl's logging system. Otherwise youtube-dl will write its output directly to stdout/stderr. Note: Set \f[I]quiet\f[] and \f[I]no_warnings\f[] in \f[I]extractor.ytdl.raw-options\f[] to \f[I]true\f[] to suppress all output. .SS extractor.ytdl.module .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Name of the youtube-dl Python module to import. Setting this to \f[I]null\f[] will try to import \f[I]"yt_dlp"\f[] followed by \f[I]"youtube_dl"\f[] as fallback. .SS extractor.ytdl.raw-options .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 {"quiet": true, "writesubtitles": true, "merge_output_format": "mkv"} .IP "Description:" 4 Additional options passed directly to the \f[I]YoutubeDL\f[] constructor. All available options can be found in \f[I]youtube-dl's docstrings\f[].
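.PP
The weibo and ytdl options above could be combined as follows (a sketch; \f[I]"yt_dlp"\f[] is simply one of the module names the option already documents):
.PP
.nf
{
    "extractor": {
        "weibo": {
            "retweets": "original",
            "gifs": "video"
        },
        "ytdl": {
            "module": "yt_dlp",
            "raw-options": {"quiet": true}
        }
    }
}
.fi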
.SS extractor.ytdl.cmdline-args .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "--quiet --write-sub --merge-output-format mkv" .br * ["--quiet", "--write-sub", "--merge-output-format", "mkv"] .IP "Description:" 4 Additional options specified as youtube-dl command-line arguments. .SS extractor.ytdl.config-file .IP "Type:" 6 \f[I]Path\f[] .IP "Example:" 4 "~/.config/youtube-dl/config" .IP "Description:" 4 Location of a youtube-dl configuration file to load options from. .SS extractor.zerochan.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata (date, md5, tags, ...) Note: This requires 1-2 additional HTTP requests per post. .SS extractor.zerochan.pagination .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"api"\f[] .IP "Description:" 4 Controls how to paginate over tag search results. .br * \f[I]"api"\f[]: Use the \f[I]JSON API\f[] (no \f[I]extension\f[] metadata) .br * \f[I]"html"\f[]: Parse HTML pages (limited to 100 pages * 24 posts) .SS extractor.[booru].tags .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Categorize tags by their respective types and provide them as \f[I]tags_<type>\f[] metadata fields. Note: This requires 1 additional HTTP request per post. .SS extractor.[booru].notes .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract overlay notes (position and text). Note: This requires 1 additional HTTP request per post. .SS extractor.[booru].url .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"file_url"\f[] .IP "Example:" 4 "preview_url" .IP "Description:" 4 Alternate field name to retrieve download URLs from. .SS extractor.[manga-extractor].chapter-reverse .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Reverse the order of chapter URLs extracted from manga pages. .br * \f[I]true\f[]: Start with the latest chapter .br * \f[I]false\f[]: Start with the first chapter .SS extractor.[manga-extractor].page-reverse .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download manga chapter pages in reverse order. .SH DOWNLOADER OPTIONS .SS downloader.*.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable/Disable this downloader module. .SS downloader.*.filesize-min & .filesize-max .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "32000", "500k", "2.5M" .IP "Description:" 4 Minimum/Maximum allowed file size in bytes. Any file smaller/larger than this limit will not be downloaded. Possible values are valid integer or floating-point numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[], \f[I]g\f[], \f[I]t\f[], or \f[I]p\f[]. These suffixes are case-insensitive. .SS downloader.*.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Use \f[I]Last-Modified\f[] HTTP response headers to set file modification times. .SS downloader.*.part .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the use of \f[I].part\f[] files during file downloads. .br * \f[I]true\f[]: Write downloaded data into \f[I].part\f[] files and rename them upon download completion. This mode additionally supports resuming incomplete downloads. .br * \f[I]false\f[]: Do not use \f[I].part\f[] files and write data directly into the actual output files.
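For example, a minimal \f[I]downloader\f[] block using the size suffixes described above might look like the following sketch (the limits are arbitrary illustrative values): .. code:: json { "downloader": { "filesize-min": "500k", "filesize-max": "2.5M", "part": true, "mtime": true } }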
.SS downloader.*.part-directory .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Alternate location for \f[I].part\f[] files. Missing directories will be created as needed. If this value is \f[I]null\f[], \f[I].part\f[] files are going to be stored alongside the actual output files. .SS downloader.*.progress .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]3.0\f[] .IP "Description:" 4 Number of seconds until a download progress indicator for the current download is displayed. Set this option to \f[I]null\f[] to disable this indicator. .SS downloader.*.rate .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "32000", "500k", "2.5M" .IP "Description:" 4 Maximum download rate in bytes per second. Possible values are valid integer or floating-point numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[], \f[I]g\f[], \f[I]t\f[], or \f[I]p\f[]. These suffixes are case-insensitive. .SS downloader.*.retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]extractor.*.retries\f[] .IP "Description:" 4 Maximum number of retries during file downloads, or \f[I]-1\f[] for infinite retries. .SS downloader.*.timeout .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]extractor.*.timeout\f[] .IP "Description:" 4 Connection timeout during file downloads. .SS downloader.*.verify .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]extractor.*.verify\f[] .IP "Description:" 4 Certificate validation during file downloads. .SS downloader.*.proxy .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (scheme -> proxy) .IP "Default:" 9 \f[I]extractor.*.proxy\f[] .IP "Description:" 4 Proxy server used for file downloads. Disable the use of a proxy for file downloads by explicitly setting this option to \f[I]null\f[]. .SS downloader.http.adjust-extensions .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check file headers of downloaded files and adjust their filename extensions if they do not match. For example, this will change the filename extension (\f[I]{extension}\f[]) of a file called \f[I]example.png\f[] from \f[I]png\f[] to \f[I]jpg\f[] when said file contains JPEG/JFIF data. .SS downloader.http.consume-content .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls the behavior when an HTTP response is considered unsuccessful. If the value is \f[I]true\f[], consume the response body. This avoids closing the connection and therefore improves connection reuse. If the value is \f[I]false\f[], immediately close the connection without reading the response. This can be useful if the server is known to send large bodies for error responses. .SS downloader.http.chunk-size .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]32768\f[] .IP "Example:" 4 "50k", "0.8M" .IP "Description:" 4 Number of bytes per downloaded chunk. Possible values are integer numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[], \f[I]g\f[], \f[I]t\f[], or \f[I]p\f[]. These suffixes are case-insensitive. .SS downloader.http.headers .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 {"Accept": "image/webp,*/*", "Referer": "https://example.org/"} .IP "Description:" 4 Additional HTTP headers to send when downloading files. .SS downloader.http.retry-codes .IP "Type:" 6 \f[I]list\f[] of \f[I]integers\f[] .IP "Default:" 9 \f[I]extractor.*.retry-codes\f[] .IP "Description:" 4 Additional \f[I]HTTP response status codes\f[] to retry a download on.
Codes \f[I]200\f[], \f[I]206\f[], and \f[I]416\f[] (when resuming a \f[I]partial\f[] download) will never be retried and always count as success, regardless of this option. \f[I]5xx\f[] codes (server error responses) will always be retried, regardless of this option. .SS downloader.http.validate .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check for invalid responses. Fail a download when a file does not pass instead of downloading a potentially broken file. .SS downloader.ytdl.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 youtube-dl's default, currently \f[I]"bestvideo+bestaudio/best"\f[] .IP "Description:" 4 Video \f[I]format selection \f[] directly passed to youtube-dl. .SS downloader.ytdl.forward-cookies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Forward cookies to youtube-dl. .SS downloader.ytdl.logging .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Route youtube-dl's output through gallery-dl's logging system. Otherwise youtube-dl will write its output directly to stdout/stderr. Note: Set \f[I]quiet\f[] and \f[I]no_warnings\f[] in \f[I]downloader.ytdl.raw-options\f[] to \f[I]true\f[] to suppress all output. .SS downloader.ytdl.module .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Name of the youtube-dl Python module to import. Setting this to \f[I]null\f[] will first try to import \f[I]"yt_dlp"\f[] and use \f[I]"youtube_dl"\f[] as fallback. .SS downloader.ytdl.outtmpl .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]Output Template\f[] used to generate filenames for files downloaded with youtube-dl. Special values: .br * \f[I]null\f[]: generate filenames with \f[I]extractor.*.filename\f[] .br * \f[I]"default"\f[]: use youtube-dl's default, currently \f[I]"%(title)s-%(id)s.%(ext)s"\f[] Note: An output template other than \f[I]null\f[] might cause unexpected results in combination with other options (e.g. \f[I]"skip": "enumerate"\f[]) .SS downloader.ytdl.raw-options .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 .. code:: json { "quiet": true, "writesubtitles": true, "merge_output_format": "mkv" } .IP "Description:" 4 Additional options passed directly to the \f[I]YoutubeDL\f[] constructor. All available options can be found in \f[I]youtube-dl's docstrings \f[]. .SS downloader.ytdl.cmdline-args .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "--quiet --write-sub --merge-output-format mkv" .br * ["--quiet", "--write-sub", "--merge-output-format", "mkv"] .IP "Description:" 4 Additional options specified as youtube-dl command-line arguments. .SS downloader.ytdl.config-file .IP "Type:" 6 \f[I]Path\f[] .IP "Example:" 4 "~/.config/youtube-dl/config" .IP "Description:" 4 Location of a youtube-dl configuration file to load options from. .SH OUTPUT OPTIONS .SS output.mode .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (key -> format string) .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the output string format and status indicators. .br * \f[I]"null"\f[]: No output .br * \f[I]"pipe"\f[]: Suitable for piping to other processes or files .br * \f[I]"terminal"\f[]: Suitable for the standard Windows console .br * \f[I]"color"\f[]: Suitable for terminals that understand ANSI escape codes and colors .br * \f[I]"auto"\f[]: \f[I]"terminal"\f[] on Windows with \f[I]output.ansi\f[] disabled, \f[I]"color"\f[] otherwise. 
It is possible to use custom output format strings .br by setting this option to an \f[I]object\f[] and specifying \f[I]start\f[], \f[I]success\f[], \f[I]skip\f[], \f[I]progress\f[], and \f[I]progress-total\f[]. .br For example, the following will replicate the same output as \f[I]mode: color\f[]: .. code:: json { "start" : "{}", "success": "\\r\\u001b[1;32m{}\\u001b[0m\\n", "skip" : "\\u001b[2m{}\\u001b[0m\\n", "progress" : "\\r{0:>7}B {1:>7}B/s ", "progress-total": "\\r{3:>3}% {0:>7}B {1:>7}B/s " } \f[I]start\f[], \f[I]success\f[], and \f[I]skip\f[] are used to output the current filename, where \f[I]{}\f[] or \f[I]{0}\f[] is replaced with said filename. If a given format string contains printable characters other than that, their number needs to be specified as \f[I][<number>, <format string>]\f[] to get the correct results for \f[I]output.shorten\f[]. For example .. code:: json "start" : [12, "Downloading {}"] \f[I]progress\f[] and \f[I]progress-total\f[] are used when displaying the .br \f[I]download progress indicator\f[], \f[I]progress\f[] when the total number of bytes to download is unknown, .br \f[I]progress-total\f[] otherwise. For these format strings .br * \f[I]{0}\f[] is number of bytes downloaded .br * \f[I]{1}\f[] is number of downloaded bytes per second .br * \f[I]{2}\f[] is total number of bytes .br * \f[I]{3}\f[] is percent of bytes downloaded to total bytes .SS output.stdout & .stdin & .stderr .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] .IP "Example:" 4 .. code:: json "utf-8" .. code:: json { "encoding": "utf-8", "errors": "replace", "line_buffering": true } .IP "Description:" 4 \f[I]Reconfigure\f[] a \f[I]standard stream\f[]. Possible options are .br * \f[I]encoding\f[] .br * \f[I]errors\f[] .br * \f[I]newline\f[] .br * \f[I]line_buffering\f[] .br * \f[I]write_through\f[] When this option is specified as a simple \f[I]string\f[], it is interpreted as \f[I]{"encoding": "<string>", "errors": "replace"}\f[] Note: \f[I]errors\f[] always defaults to \f[I]"replace"\f[] .SS output.shorten .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether the output strings should be shortened to fit on one console line. Set this option to \f[I]"eaw"\f[] to also work with east-asian characters with a display width greater than 1. .SS output.colors .IP "Type:" 6 \f[I]object\f[] (key -> ANSI color) .IP "Default:" 9 \f[I]{"success": "1;32", "skip": "2"}\f[] .IP "Description:" 4 Controls the \f[I]ANSI colors\f[] used with \f[I]mode: color\f[] for successfully downloaded or skipped files. .SS output.ansi .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 On Windows, enable ANSI escape sequences and colored output .br by setting the \f[I]ENABLE_VIRTUAL_TERMINAL_PROCESSING\f[] flag for stdout and stderr. .br .SS output.skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Show skipped file downloads. .SS output.fallback .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include fallback URLs in the output of \f[I]-g/--get-urls\f[]. .SS output.private .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Include private fields, i.e. fields whose name starts with an underscore, in the output of \f[I]-K/--list-keywords\f[] and \f[I]-j/--dump-json\f[].
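As a sketch of how these output options might be combined in a configuration file (the color codes are the documented defaults; \f[I]ansi\f[] only has an effect on Windows): .. code:: json { "output": { "shorten": "eaw", "ansi": true, "colors": { "success": "1;32", "skip": "2" } } }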
.SS output.progress .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the progress indicator when *gallery-dl* is run with multiple URLs as arguments. .br * \f[I]true\f[]: Show the default progress indicator (\f[I]"[{current}/{total}] {url}"\f[]) .br * \f[I]false\f[]: Do not show any progress indicator .br * Any \f[I]string\f[]: Show the progress indicator using this as a custom \f[I]format string\f[]. Possible replacement keys are \f[I]current\f[], \f[I]total\f[] and \f[I]url\f[]. .SS output.log .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Logging Configuration\f[] .IP "Default:" 9 \f[I]"[{name}][{levelname}] {message}"\f[] .IP "Description:" 4 Configuration for logging output to stderr. If this is a simple \f[I]string\f[], it specifies the format string for logging messages. .SS output.logfile .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]Logging Configuration\f[] .IP "Description:" 4 File to write logging output to. .SS output.unsupportedfile .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]Logging Configuration\f[] .IP "Description:" 4 File to write external URLs unsupported by *gallery-dl* to. The default format string here is \f[I]"{message}"\f[]. .SS output.errorfile .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]Logging Configuration\f[] .IP "Description:" 4 File to write input URLs which returned an error to. The default format string here is also \f[I]"{message}"\f[]. When combined with \f[I]-I\f[]/\f[I]--input-file-comment\f[] or \f[I]-x\f[]/\f[I]--input-file-delete\f[], this option will cause *all* input URLs from these files to be commented/deleted after processing them and not just successful ones. .SS output.num-to-str .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Convert numeric values (\f[I]integer\f[] or \f[I]float\f[]) to \f[I]string\f[] before outputting them as JSON. .SH POSTPROCESSOR OPTIONS .SS classify.mapping .IP "Type:" 6 \f[I]object\f[] (directory -> extensions) .IP "Default:" 9 .. code:: json { "Pictures": ["jpg", "jpeg", "png", "gif", "bmp", "svg", "webp"], "Video" : ["flv", "ogv", "avi", "mp4", "mpg", "mpeg", "3gp", "mkv", "webm", "vob", "wmv"], "Music" : ["mp3", "aac", "flac", "ogg", "wma", "m4a", "wav"], "Archives": ["zip", "rar", "7z", "tar", "gz", "bz2"] } .IP "Description:" 4 A mapping from directory names to filename extensions that should be stored in them. Files with an extension not listed will be ignored and stored in their default location. .SS compare.action .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"replace"\f[] .IP "Description:" 4 The action to take when files do **not** compare as equal. .br * \f[I]"replace"\f[]: Replace/Overwrite the old version with the new one .br * \f[I]"enumerate"\f[]: Add an enumeration index to the filename of the new version like \f[I]skip = "enumerate"\f[] .SS compare.equal .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"null"\f[] .IP "Description:" 4 The action to take when files do compare as equal. .br * \f[I]"abort:N"\f[]: Stop the current extractor run after \f[I]N\f[] consecutive files compared as equal. .br * \f[I]"terminate:N"\f[]: Stop the current extractor run, including parent extractors, after \f[I]N\f[] consecutive files compared as equal. .br * \f[I]"exit:N"\f[]: Exit the program after \f[I]N\f[] consecutive files compared as equal. .SS compare.shallow .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Only compare file sizes. Do not read and compare their content. 
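For illustration, the compare options above could appear inside a \f[I]Postprocessor Configuration\f[] object like the following sketch (the threshold in "abort:3" is an arbitrary example value): .. code:: json { "name": "compare", "action": "enumerate", "equal": "abort:3", "shallow": false }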
.SS exec.archive .IP "Type:" 6 \f[I]Path\f[] .IP "Description:" 4 File to store IDs of executed commands in, similar to \f[I]extractor.*.archive\f[]. \f[I]archive-format\f[], \f[I]archive-prefix\f[], and \f[I]archive-pragma\f[] options, akin to \f[I]extractor.*.archive-format\f[], \f[I]extractor.*.archive-prefix\f[], and \f[I]extractor.*.archive-pragma\f[], are supported as well. .SS exec.async .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls whether to wait for a subprocess to finish or to let it run asynchronously. .SS exec.command .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "convert {} {}.png && rm {}" .br * ["echo", "{user[account]}", "{id}"] .IP "Description:" 4 The command to run. .br * If this is a \f[I]string\f[], it will be executed using the system's shell, e.g. \f[I]/bin/sh\f[]. Any \f[I]{}\f[] will be replaced with the full path of a file or target directory, depending on \f[I]exec.event\f[] .br * If this is a \f[I]list\f[], the first element specifies the program name and any further elements its arguments. Each element of this list is treated as a \f[I]format string\f[] using the files' metadata as well as \f[I]{_path}\f[], \f[I]{_directory}\f[], and \f[I]{_filename}\f[]. .SS exec.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"after"\f[] .IP "Description:" 4 The event(s) for which \f[I]exec.command\f[] is run. See \f[I]metadata.event\f[] for a list of available events. .SS metadata.mode .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"json"\f[] .IP "Description:" 4 Selects how to process metadata. .br * \f[I]"json"\f[]: write metadata using \f[I]json.dump()\f[] .br * \f[I]"jsonl"\f[]: write metadata in \f[I]JSON Lines \f[] format .br * \f[I]"tags"\f[]: write \f[I]tags\f[] separated by newlines .br * \f[I]"custom"\f[]: write the result of applying \f[I]metadata.content-format\f[] to a file's metadata dictionary .br * \f[I]"modify"\f[]: add or modify metadata entries .br * \f[I]"delete"\f[]: remove metadata entries .SS metadata.filename .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "{id}.data.json" .IP "Description:" 4 A \f[I]format string\f[] to build the filenames for metadata files with. (see \f[I]extractor.filename\f[]) Using \f[I]"-"\f[] as filename will write all output to \f[I]stdout\f[]. If this option is set, \f[I]metadata.extension\f[] and \f[I]metadata.extension-format\f[] will be ignored. .SS metadata.directory .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"."\f[] .IP "Example:" 4 "metadata" .IP "Description:" 4 Directory where metadata files are stored in relative to the current target location for file downloads. .SS metadata.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"json"\f[] or \f[I]"txt"\f[] .IP "Description:" 4 Filename extension for metadata files that will be appended to the original file names. .SS metadata.extension-format .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 .br * "{extension}.json" .br * "json" .IP "Description:" 4 Custom format string to build filename extensions for metadata files with, which will replace the original filename extensions. Note: \f[I]metadata.extension\f[] is ignored if this option is set. 
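As an illustrative sketch, a \f[I]metadata\f[] post processor writing one JSON file per download into a separate subdirectory could be configured like this (the directory and filename values are examples, not defaults): .. code:: json { "name": "metadata", "mode": "json", "directory": "metadata", "filename": "{id}.data.json" }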
.SS metadata.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Example:" 4 .br * "prepare,file,after" .br * ["prepare-after", "skip"] .IP "Description:" 4 The event(s) for which metadata gets written to a file. Available events are: \f[I]init\f[] After post processor initialization and before the first file download \f[I]finalize\f[] On extractor shutdown, e.g. after all files were downloaded \f[I]finalize-success\f[] On extractor shutdown when no error occurred \f[I]finalize-error\f[] On extractor shutdown when at least one error occurred \f[I]prepare\f[] Before a file download \f[I]prepare-after\f[] Before a file download, but after building and checking file paths \f[I]file\f[] When completing a file download, but before it gets moved to its target location \f[I]after\f[] After a file got moved to its target location \f[I]skip\f[] When skipping a file download \f[I]post\f[] When starting to download all files of a post, e.g. a Tweet on Twitter or a post on Patreon. \f[I]post-after\f[] After downloading all files of a post .SS metadata.fields .IP "Type:" 6 .br * \f[I]list\f[] of \f[I]strings\f[] .br * \f[I]object\f[] (field name -> \f[I]format string\f[]) .IP "Example:" 4 .. code:: json ["blocked", "watching", "status[creator][name]"] .. code:: json { "blocked" : "***", "watching" : "\\fE 'yes' if watching else 'no'", "status[username]": "{status[creator][name]!l}" } .IP "Description:" 4 .br * \f[I]"mode": "delete"\f[]: A list of metadata field names to remove. .br * \f[I]"mode": "modify"\f[]: An object with metadata field names mapping to a \f[I]format string\f[] whose result is assigned to said field name. .SS metadata.content-format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "tags:\\n\\n{tags:J\\n}\\n" .br * ["tags:", "", "{tags:J\\n}"] .IP "Description:" 4 Custom format string to build the content of metadata files with. Note: Only applies for \f[I]"mode": "custom"\f[]. .SS metadata.ascii .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Escape all non-ASCII characters. See the \f[I]ensure_ascii\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[] and \f[I]"jsonl"\f[]. .SS metadata.indent .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]4\f[] .IP "Description:" 4 Indentation level of JSON output. See the \f[I]indent\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[]. .SS metadata.separators .IP "Type:" 6 \f[I]list\f[] with two \f[I]string\f[] elements .IP "Default:" 9 \f[I][", ", ": "]\f[] .IP "Description:" 4 \f[I]<item separator>\f[] - \f[I]<key separator>\f[] pair to separate JSON keys and values with. See the \f[I]separators\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[] and \f[I]"jsonl"\f[]. .SS metadata.sort .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Sort output by key. See the \f[I]sort_keys\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[] and \f[I]"jsonl"\f[]. .SS metadata.open .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"w"\f[] .IP "Description:" 4 The \f[I]mode\f[] in which metadata files get opened. For example, use \f[I]"a"\f[] to append to a file's content or \f[I]"w"\f[] to truncate it.
See the \f[I]mode\f[] argument of \f[I]open()\f[] for further details. .SS metadata.encoding .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"utf-8"\f[] .IP "Description:" 4 Name of the encoding used to encode a file's content. See the \f[I]encoding\f[] argument of \f[I]open()\f[] for further details. .SS metadata.private .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Include private fields, i.e. fields whose name starts with an underscore. .SS metadata.skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Do not overwrite already existing files. .SS metadata.archive .IP "Type:" 6 \f[I]Path\f[] .IP "Description:" 4 File to store IDs of generated metadata files in, similar to \f[I]extractor.*.archive\f[]. \f[I]archive-format\f[], \f[I]archive-prefix\f[], and \f[I]archive-pragma\f[] options, akin to \f[I]extractor.*.archive-format\f[], \f[I]extractor.*.archive-prefix\f[], and \f[I]extractor.*.archive-pragma\f[], are supported as well. .SS metadata.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Set modification times of generated metadata files according to the accompanying downloaded file. Enabling this option will only have an effect *if* there is actual \f[I]mtime\f[] metadata available, that is .br * after a file download (\f[I]"event": "file"\f[] (default), \f[I]"event": "after"\f[]) .br * when running *after* an \f[I]mtime\f[] post processor for the same \f[I]event\f[] For example, a \f[I]metadata\f[] post processor for \f[I]"event": "post"\f[] will *not* be able to set its file's modification time unless an \f[I]mtime\f[] post processor with \f[I]"event": "post"\f[] runs *before* it. .SS mtime.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 The event(s) for which \f[I]mtime.key\f[] or \f[I]mtime.value\f[] get evaluated. See \f[I]metadata.event\f[] for a list of available events. .SS mtime.key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"date"\f[] .IP "Description:" 4 Name of the metadata field whose value should be used. This value must be either a UNIX timestamp or a \f[I]datetime\f[] object. Note: This option gets ignored if \f[I]mtime.value\f[] is set. .SS mtime.value .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 .br * "{status[date]}" .br * "{content[0:6]:R22/2022/D%Y%m%d/}" .IP "Description:" 4 A \f[I]format string\f[] whose value should be used. The resulting value must be either a UNIX timestamp or a \f[I]datetime\f[] object. .SS python.archive .IP "Type:" 6 \f[I]Path\f[] .IP "Description:" 4 File to store IDs of called Python functions in, similar to \f[I]extractor.*.archive\f[]. \f[I]archive-format\f[], \f[I]archive-prefix\f[], and \f[I]archive-pragma\f[] options, akin to \f[I]extractor.*.archive-format\f[], \f[I]extractor.*.archive-prefix\f[], and \f[I]extractor.*.archive-pragma\f[], are supported as well. .SS python.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 The event(s) for which \f[I]python.function\f[] gets called. See \f[I]metadata.event\f[] for a list of available events. .SS python.function .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 .br * "my_module:generate_text" .br * "~/.local/share/gdl-utils.py:resize" .IP "Description:" 4 The Python function to call.
This function is specified as \f[I]<module>:<function name>\f[] and gets called with the current metadata dict as argument. \f[I]module\f[] is either an importable Python module name or the \f[I]Path\f[] to a .py file. .SS ugoira.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"webm"\f[] .IP "Description:" 4 Filename extension for the resulting video files. .SS ugoira.ffmpeg-args .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 ["-c:v", "libvpx-vp9", "-an", "-b:v", "2M"] .IP "Description:" 4 Additional FFmpeg command-line arguments. .SS ugoira.ffmpeg-demuxer .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]auto\f[] .IP "Description:" 4 FFmpeg demuxer to read and process input files with. Possible values are .br * "\f[I]concat\f[]" (inaccurate frame timecodes for non-uniform frame delays) .br * "\f[I]image2\f[]" (accurate timecodes, requires nanosecond file timestamps, i.e. no Windows or macOS) .br * "mkvmerge" (accurate timecodes, only WebM or MKV, requires \f[I]mkvmerge\f[]) "auto" will select mkvmerge if available and fall back to concat otherwise. .SS ugoira.ffmpeg-location .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"ffmpeg"\f[] .IP "Description:" 4 Location of the \f[I]ffmpeg\f[] (or \f[I]avconv\f[]) executable to use. .SS ugoira.mkvmerge-location .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"mkvmerge"\f[] .IP "Description:" 4 Location of the \f[I]mkvmerge\f[] executable for use with the \f[I]mkvmerge demuxer\f[]. .SS ugoira.ffmpeg-output .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]"error"\f[] .IP "Description:" 4 Controls FFmpeg output. .br * \f[I]true\f[]: Enable FFmpeg output .br * \f[I]false\f[]: Disable all FFmpeg output .br * any \f[I]string\f[]: Pass \f[I]-hide_banner\f[] and \f[I]-loglevel\f[] with this value as argument to FFmpeg .SS ugoira.ffmpeg-twopass .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Enable Two-Pass encoding. .SS ugoira.framerate .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the frame rate argument (\f[I]-r\f[]) for FFmpeg. .br * \f[I]"auto"\f[]: Automatically assign a fitting frame rate based on delays between frames. .br * \f[I]"uniform"\f[]: Like \f[I]auto\f[], but assign an explicit frame rate only to Ugoira with uniform frame delays. .br * any other \f[I]string\f[]: Use this value as argument for \f[I]-r\f[]. .br * \f[I]null\f[] or an empty \f[I]string\f[]: Don't set an explicit frame rate. .SS ugoira.keep-files .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Keep ZIP archives after conversion. .SS ugoira.libx264-prevent-odd .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Prevent \f[I]"width/height not divisible by 2"\f[] errors when using \f[I]libx264\f[] or \f[I]libx265\f[] encoders by applying a simple cropping filter. See this \f[I]Stack Overflow thread\f[] for more information. This option, when \f[I]libx264/5\f[] is used, automatically adds \f[I]["-vf", "crop=iw-mod(iw\\\\,2):ih-mod(ih\\\\,2)"]\f[] to the list of FFmpeg command-line arguments to reduce an odd width/height by 1 pixel and make them even. .SS ugoira.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Set modification times of generated ugoira animations.
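Putting several of the ugoira options above together, a hedged example of a \f[I]ugoira\f[] post-processor entry (the FFmpeg arguments are the example values shown for \f[I]ugoira.ffmpeg-args\f[]) could look like this: .. code:: json { "name": "ugoira", "extension": "webm", "ffmpeg-args": ["-c:v", "libvpx-vp9", "-an", "-b:v", "2M"], "ffmpeg-demuxer": "auto", "ffmpeg-twopass": false }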
.SS ugoira.repeat-last-frame .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Allow repeating the last frame when necessary to prevent it from only being displayed for a very short amount of time. .SS zip.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"zip"\f[] .IP "Description:" 4 Filename extension for the created ZIP archive. .SS zip.files .IP "Type:" 6 \f[I]list\f[] of \f[I]Path\f[] .IP "Example:" 4 ["info.json"] .IP "Description:" 4 List of extra files to be added to a ZIP archive. Note: Relative paths are relative to the current \f[I]download directory\f[]. .SS zip.keep-files .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Keep the actual files after writing them to a ZIP archive. .SS zip.mode .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"default"\f[] .IP "Description:" 4 .br * \f[I]"default"\f[]: Write the central directory file header once after everything is done or an exception is raised. .br * \f[I]"safe"\f[]: Update the central directory file header each time a file is stored in a ZIP archive. This greatly reduces the chance a ZIP archive gets corrupted in case the Python interpreter gets shut down unexpectedly (power outage, SIGKILL) but is also a lot slower. .SH MISCELLANEOUS OPTIONS .SS extractor.modules .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 The \f[I]modules\f[] list in \f[I]extractor/__init__.py\f[] .IP "Example:" 4 ["reddit", "danbooru", "mangadex"] .IP "Description:" 4 List of internal modules to load when searching for a suitable extractor class. Useful to reduce startup time and memory usage. .SS extractor.module-sources .IP "Type:" 6 \f[I]list\f[] of \f[I]Path\f[] instances .IP "Example:" 4 ["~/.config/gallery-dl/modules", null] .IP "Description:" 4 List of directories to load external extractor modules from. Any file in a specified directory with a \f[I].py\f[] filename extension gets \f[I]imported\f[] and searched for potential extractors, i.e. classes with a \f[I]pattern\f[] attribute. Note: \f[I]null\f[] references internal extractors defined in \f[I]extractor/__init__.py\f[] or by \f[I]extractor.modules\f[]. .SS globals .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]string\f[] .IP "Example:" 4 .br * "~/.local/share/gdl-globals.py" .br * "gdl-globals" .IP "Description:" 4 Path to or name of an .br \f[I]importable\f[] Python module, whose namespace, .br in addition to the \f[I]GLOBALS\f[] dict in \f[I]util.py\f[], gets used as \f[I]globals parameter\f[] for compiled Python expressions. .SS cache.file .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 .br * (\f[I]%APPDATA%\f[] or \f[I]"~"\f[]) + \f[I]"/gallery-dl/cache.sqlite3"\f[] on Windows .br * (\f[I]$XDG_CACHE_HOME\f[] or \f[I]"~/.cache"\f[]) + \f[I]"/gallery-dl/cache.sqlite3"\f[] on all other platforms .IP "Description:" 4 Path of the SQLite3 database used to cache login sessions, cookies and API tokens across gallery-dl invocations. Set this option to \f[I]null\f[] or an invalid path to disable this cache. .SS format-separator .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"/"\f[] .IP "Description:" 4 Character(s) used as argument separator in format string \f[I]format specifiers\f[]. 
For example, setting this option to \f[I]"#"\f[] would allow a replacement operation to be \f[I]Rold#new#\f[] instead of the default \f[I]Rold/new/\f[]. .SS signals-ignore .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["SIGTTOU", "SIGTTIN", "SIGTERM"] .IP "Description:" 4 The list of signal names to ignore, i.e. set \f[I]SIG_IGN\f[] as signal handler for. .SS subconfigs .IP "Type:" 6 \f[I]list\f[] of \f[I]Path\f[] .IP "Example:" 4 ["~/cfg-twitter.json", "~/cfg-reddit.json"] .IP "Description:" 4 Additional configuration files to load. .SS warnings .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"default"\f[] .IP "Description:" 4 The \f[I]Warnings Filter action\f[] used for (urllib3) warnings. .SH API TOKENS & IDS .SS extractor.deviantart.client-id & .client-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and visit DeviantArt's \f[I]Applications & Keys\f[] section .br * click "Register Application" .br * scroll to "OAuth2 Redirect URI Whitelist (Required)" and enter "https://mikf.github.io/gallery-dl/oauth-redirect.html" .br * scroll to the bottom and agree to the API License Agreement, Submission Policy, and Terms of Service. .br * click "Save" .br * copy \f[I]client_id\f[] and \f[I]client_secret\f[] of your new application and put them in your configuration file as \f[I]"client-id"\f[] and \f[I]"client-secret"\f[] .br * clear your \f[I]cache\f[] to delete any remaining \f[I]access-token\f[] entries. (\f[I]gallery-dl --clear-cache deviantart\f[]) .br * get a new \f[I]refresh-token\f[] for the new \f[I]client-id\f[] (\f[I]gallery-dl oauth:deviantart\f[]) .SS extractor.flickr.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and \f[I]Create an App\f[] in Flickr's \f[I]App Garden\f[] .br * click "APPLY FOR A NON-COMMERCIAL KEY" .br * fill out the form with a random name and description and click "SUBMIT" .br * copy \f[I]Key\f[] and \f[I]Secret\f[] and put them in your configuration file as \f[I]"api-key"\f[] and \f[I]"api-secret"\f[] .SS extractor.reddit.client-id & .user-agent .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and visit the \f[I]apps\f[] section of your account's preferences .br * click the "are you a developer? create an app..." button .br * fill out the form: .br * choose a name .br * select "installed app" .br * set \f[I]http://localhost:6414/\f[] as "redirect uri" .br * solve the "I'm not a robot" reCAPTCHA if needed .br * click "create app" .br * copy the client id (third line, under your application's name and "installed app") and put it in your configuration file as \f[I]"client-id"\f[] .br * use "\f[I]Python:<application name>:v1.0 (by /u/<username>)\f[]" as \f[I]user-agent\f[] and replace \f[I]<application name>\f[] and \f[I]<username>\f[] accordingly (see Reddit's \f[I]API access rules\f[]) .br * clear your \f[I]cache\f[] to delete any remaining \f[I]access-token\f[] entries.
(\f[I]gallery-dl --clear-cache reddit\f[]) .br * get a \f[I]refresh-token\f[] for the new \f[I]client-id\f[] (\f[I]gallery-dl oauth:reddit\f[]) .SS extractor.smugmug.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and \f[I]Apply for an API Key\f[] .br * use a random name and description, set "Type" to "Application", "Platform" to "All", and "Use" to "Non-Commercial" .br * fill out the two checkboxes at the bottom and click "Apply" .br * copy \f[I]API Key\f[] and \f[I]API Secret\f[] and put them in your configuration file as \f[I]"api-key"\f[] and \f[I]"api-secret"\f[] .SS extractor.tumblr.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and visit Tumblr's \f[I]Applications\f[] section .br * click "Register application" .br * fill out the form: use a random name and description, set https://example.org/ as "Application Website" and "Default callback URL" .br * solve Google's "I'm not a robot" challenge and click "Register" .br * click "Show secret key" (below "OAuth Consumer Key") .br * copy your \f[I]OAuth Consumer Key\f[] and \f[I]Secret Key\f[] and put them in your configuration file as \f[I]"api-key"\f[] and \f[I]"api-secret"\f[] .SH CUSTOM TYPES .SS Date .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Example:" 4 .br * "2019-01-01T00:00:00" .br * "2019" with "%Y" as \f[I]date-format\f[] .br * 1546297200 .IP "Description:" 4 A \f[I]Date\f[] value represents a specific point in time. .br * If given as \f[I]string\f[], it is parsed according to \f[I]date-format\f[]. .br * If given as \f[I]integer\f[], it is interpreted as UTC timestamp. .SS Duration .IP "Type:" 6 .br * \f[I]float\f[] .br * \f[I]list\f[] with 2 \f[I]floats\f[] .br * \f[I]string\f[] .IP "Example:" 4 .br * 2.85 .br * [1.5, 3.0] .br * "2.85", "1.5-3.0" .IP "Description:" 4 A \f[I]Duration\f[] represents a span of time in seconds. .br * If given as a single \f[I]float\f[], it will be used as that exact value. .br * If given as a \f[I]list\f[] with 2 floating-point numbers \f[I]a\f[] & \f[I]b\f[], it will be randomly chosen with uniform distribution such that \f[I]a <= N <= b\f[]. (see \f[I]random.uniform()\f[]) .br * If given as a \f[I]string\f[], it can either represent a single \f[I]float\f[] value (\f[I]"2.85"\f[]) or a range (\f[I]"1.5-3.0"\f[]). .SS Path .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "file.ext" .br * "~/path/to/file.ext" .br * "$HOME/path/to/file.ext" .br * ["$HOME", "path", "to", "file.ext"] .IP "Description:" 4 A \f[I]Path\f[] is a \f[I]string\f[] representing the location of a file or directory. Simple \f[I]tilde expansion\f[] and \f[I]environment variable expansion\f[] are supported. In Windows environments, backslashes (\f[I]"\\"\f[]) can, in addition to forward slashes (\f[I]"/"\f[]), be used as path separators. Because backslashes are JSON's escape character, they themselves have to be escaped. The path \f[I]C:\\path\\to\\file.ext\f[] therefore has to be written as \f[I]"C:\\\\path\\\\to\\\\file.ext"\f[] if you want to use backslashes. .SS Logging Configuration .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 .. code:: json { "format" : "{asctime} {name}: {message}", "format-date": "%H:%M:%S", "path" : "~/log.txt", "encoding" : "ascii" } .. code:: json { "level" : "debug", "format": { "debug" : "debug: {message}", "info" : "[{name}] {message}", "warning": "Warning: {message}", "error" : "ERROR: {message}" } } .IP "Description:" 4 Extended logging output configuration.
.br * format .br * General format string for logging messages or a dictionary with format strings for each loglevel. In addition to the default \f[I]LogRecord attributes\f[], it is also possible to access the current \f[I]extractor\f[], \f[I]job\f[], \f[I]path\f[], and keywords objects and their attributes, for example \f[I]"{extractor.url}"\f[], \f[I]"{path.filename}"\f[], \f[I]"{keywords.title}"\f[] .br * Default: \f[I]"[{name}][{levelname}] {message}"\f[] .br * format-date .br * Format string for \f[I]{asctime}\f[] fields in logging messages (see \f[I]strftime() directives\f[]) .br * Default: \f[I]"%Y-%m-%d %H:%M:%S"\f[] .br * level .br * Minimum logging message level (one of \f[I]"debug"\f[], \f[I]"info"\f[], \f[I]"warning"\f[], \f[I]"error"\f[], \f[I]"exception"\f[]) .br * Default: \f[I]"info"\f[] .br * path .br * \f[I]Path\f[] to the output file .br * mode .br * Mode in which the file is opened; use \f[I]"w"\f[] to truncate or \f[I]"a"\f[] to append (see \f[I]open()\f[]) .br * Default: \f[I]"w"\f[] .br * encoding .br * File encoding .br * Default: \f[I]"utf-8"\f[] Note: path, mode, and encoding are only applied when configuring logging output to a file. .SS Postprocessor Configuration .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 .. code:: json { "name": "mtime" } .. code:: json { "name" : "zip", "compression": "store", "extension" : "cbz", "filter" : "extension not in ('zip', 'rar')", "whitelist" : ["mangadex", "exhentai", "nhentai"] } .IP "Description:" 4 An \f[I]object\f[] containing a \f[I]"name"\f[] attribute specifying the post-processor type, as well as any of its \f[I]options\f[]. It is possible to set a \f[I]"filter"\f[] expression similar to \f[I]image-filter\f[] to only run a post-processor conditionally. It is also possible to set a \f[I]"whitelist"\f[] or \f[I]"blacklist"\f[] to only enable or disable a post-processor for the specified extractor categories. The available post-processor types are \f[I]classify\f[] Categorize files by filename extension \f[I]compare\f[] Compare versions of the same file and replace/enumerate them on mismatch .br (requires \f[I]downloader.*.part\f[] = \f[I]true\f[] and \f[I]extractor.*.skip\f[] = \f[I]false\f[]) .br \f[I]exec\f[] Execute external commands \f[I]metadata\f[] Write metadata to separate files \f[I]mtime\f[] Set file modification time according to its metadata \f[I]python\f[] Call Python functions \f[I]ugoira\f[] Convert Pixiv Ugoira to WebM using \f[I]FFmpeg\f[] \f[I]zip\f[] Store files in a ZIP archive .SH BUGS https://github.com/mikf/gallery-dl/issues .SH AUTHORS Mike Fährmann .br and https://github.com/mikf/gallery-dl/graphs/contributors .SH "SEE ALSO" .BR gallery-dl (1) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1711211673.9791386 gallery_dl-1.26.9/docs/0000755000175000017500000000000014577602232013363 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/docs/gallery-dl-example.conf0000644000175000017500000003460714571514301017722 0ustar00mikemike{ "extractor": { "base-directory": "~/gallery-dl/", "#": "set global archive file for all extractors", "archive": "~/gallery-dl/archive.sqlite3", "archive-pragma": ["journal_mode=WAL", "synchronous=NORMAL"], "#": "add two custom keywords into the metadata dictionary", "#": "these can be used to further refine your output directories or filenames", "keywords": {"bkey": "", "ckey": ""}, "#": "make sure that custom keywords are empty, i.e. 
they don't appear unless specified by the user", "keywords-default": "", "#": "replace invalid path characters with unicode alternatives", "path-restrict": { "\\": "⧹", "/" : "⧸", "|" : "│", ":" : "꞉", "*" : "∗", "?" : "？", "\"": "″", "<" : "﹤", ">" : "﹥" }, "#": "write tags for several *booru sites", "postprocessors": [ { "name": "metadata", "mode": "tags", "whitelist": ["danbooru", "moebooru", "sankaku"] } ], "pixiv": { "#": "override global archive path for pixiv", "archive": "~/gallery-dl/archive-pixiv.sqlite3", "#": "set custom directory and filename format strings for all pixiv downloads", "filename": "{id}{num}.{extension}", "directory": ["Pixiv", "Works", "{user[id]}"], "refresh-token": "aBcDeFgHiJkLmNoPqRsTuVwXyZ01234567890-FedC9", "#": "transform ugoira into lossless MKVs", "ugoira": true, "postprocessors": ["ugoira-copy"], "#": "use special settings for favorites and bookmarks", "favorite": { "directory": ["Pixiv", "Favorites", "{user[id]}"] }, "bookmark": { "directory": ["Pixiv", "My Bookmarks"], "refresh-token": "01234567890aBcDeFgHiJkLmNoPqRsTuVwXyZ-ZyxW1" } }, "danbooru": { "ugoira": true, "postprocessors": ["ugoira-webm"] }, "exhentai": { "#": "use cookies instead of logging in with username and password", "cookies": { "ipb_member_id": "12345", "ipb_pass_hash": "1234567890abcdef", "igneous" : "123456789", "hath_perks" : "m1.m2.m3.a-123456789a", "sk" : "n4m34tv3574m2c4e22c35zgeehiw", "sl" : "dm_2" }, "#": "wait 2 to 4.8 seconds between HTTP requests", "sleep-request": [2.0, 4.8], "filename": "{num:>04}_{name}.{extension}", "directory": ["{category!c}", "{title}"] }, "sankaku": { "#": "authentication with cookies is not possible for sankaku", "username": "user", "password": "#secret#" }, "furaffinity": { "#": "authentication with username and password is not possible due to CAPTCHA", "cookies": { "a": "01234567-89ab-cdef-fedc-ba9876543210", "b": "fedcba98-7654-3210-0123-456789abcdef" }, "descriptions": "html", "postprocessors": ["content"] }, "deviantart": { "#": "download 'gallery' and 'scraps' images for user profile URLs", "include": "gallery,scraps", "#": "use custom API credentials to avoid 429 errors", "client-id": "98765", "client-secret": "0123456789abcdef0123456789abcdef", "refresh-token": "0123456789abcdef0123456789abcdef01234567", "#": "put description texts into a separate directory", "metadata": true, "postprocessors": [ { "name": "metadata", "mode": "custom", "directory" : "Descriptions", "content-format" : "{description}\n", "extension-format": "descr.txt" } ] }, "kemonoparty": { "postprocessors": [ { "name": "metadata", "event": "post", "filename": "{id} {title}.txt", "#": "write text content and external URLs", "mode": "custom", "format": "{content}\n{embed[url]:?/\n/}", "#": "only write file if there is an external link present", "filter": "embed.get('url') or re.search(r'(?i)(gigafile|xgf|1drv|mediafire|mega|google|drive)', content)" } ] }, "flickr": { "access-token": "1234567890-abcdef", "access-token-secret": "1234567890abcdef", "size-max": 1920 }, "mangadex": { "#": "only download safe/suggestive chapters translated to English", "lang": "en", "ratings": ["safe", "suggestive"], "#": "put chapters into '.cbz' archives", "postprocessors": ["cbz"] }, "reddit": { "#": "only spawn child extractors for links to specific sites", "whitelist": ["imgur", "redgifs"], "#": "put files from child extractors into the reddit directory", "parent-directory": true, "#": "transfer metadata to any child extractor as '_reddit'", "parent-metadata": "_reddit" }, "imgur": { "#": 
"general imgur settings", "filename": "{id}.{extension}" }, "reddit>imgur": { "#": "special settings for imgur URLs found in reddit posts", "directory": [], "filename": "{_reddit[id]} {_reddit[title]} {id}.{extension}" }, "tumblr": { "posts" : "all", "external": false, "reblogs" : false, "inline" : true, "#": "use special settings when downloading liked posts", "likes": { "posts" : "video,photo,link", "external": true, "reblogs" : true } }, "twitter": { "#": "write text content for *all* tweets", "postprocessors": ["content"], "text-tweets": true }, "ytdl": { "#": "enable 'ytdl' extractor", "#": "i.e. invoke ytdl on all otherwise unsupported input URLs", "enabled": true, "#": "use yt-dlp instead of youtube-dl", "module": "yt_dlp", "#": "load ytdl options from config file", "config-file": "~/yt-dlp.conf" }, "mastodon": { "#": "add 'tabletop.social' as recognized mastodon instance", "#": "(run 'gallery-dl oauth:mastodon:tabletop.social to get an access token')", "tabletop.social": { "root": "https://tabletop.social", "access-token": "513a36c6..." }, "#": "set filename format strings for all 'mastodon' instances", "directory": ["mastodon", "{instance}", "{account[username]!l}"], "filename" : "{id}_{media[id]}.{extension}" }, "foolslide": { "#": "add two more foolslide instances", "otscans" : {"root": "https://otscans.com/foolslide"}, "helvetica": {"root": "https://helveticascans.com/r" } }, "foolfuuka": { "#": "add two other foolfuuka 4chan archives", "fireden-onion": {"root": "http://ydt6jy2ng3s3xg2e.onion"}, "scalearchive" : {"root": "https://archive.scaled.team" } }, "gelbooru_v01": { "#": "add a custom gelbooru_v01 instance", "#": "this is just an example, this specific instance is already included!", "allgirlbooru": {"root": "https://allgirl.booru.org"}, "#": "the following options are used for all gelbooru_v01 instances", "tag": { "directory": { "locals().get('bkey')": ["Booru", "AllGirlBooru", "Tags", "{bkey}", "{ckey}", "{search_tags}"], "" : ["Booru", "AllGirlBooru", "Tags", "_Unsorted", "{search_tags}"] } }, "post": { "directory": ["Booru", "AllGirlBooru", "Posts"] }, "archive": "~/gallery-dl/custom-archive-file-for-gelbooru_v01_instances.db", "filename": "{tags}_{id}_{md5}.{extension}", "sleep-request": [0, 1.2] }, "gelbooru_v02": { "#": "add a custom gelbooru_v02 instance", "#": "this is just an example, this specific instance is already included!", "tbib": { "root": "https://tbib.org", "#": "some sites have different domains for API access", "#": "use the 'api_root' option in addition to the 'root' setting here" } }, "tbib": { "#": "the following options are only used for TBIB", "#": "gelbooru_v02 has four subcategories at the moment, use custom directory settings for all of these", "tag": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Tags", "{bkey}", "{ckey}", "{search_tags}"], "" : ["Other Boorus", "TBIB", "Tags", "_Unsorted", "{search_tags}"] } }, "pool": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Pools", "{bkey}", "{ckey}", "{pool}"], "" : ["Other Boorus", "TBIB", "Pools", "_Unsorted", "{pool}"] } }, "favorite": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Favorites", "{bkey}", "{ckey}", "{favorite_id}"], "" : ["Other Boorus", "TBIB", "Favorites", "_Unsorted", "{favorite_id}"] } }, "post": { "directory": ["Other Boorus", "TBIB", "Posts"] }, "archive": "~/gallery-dl/custom-archive-file-for-TBIB.db", "filename": "{id}_{md5}.{extension}", "sleep-request": [0, 1.2] }, "urlshortener": { "tinyurl": {"root": 
"https://tinyurl.com"} } }, "downloader": { "#": "restrict download speed to 1 MB/s", "rate": "1M", "#": "show download progress indicator after 2 seconds", "progress": 2.0, "#": "retry failed downloads up to 3 times", "retries": 3, "#": "consider a download 'failed' after 8 seconds of inactivity", "timeout": 8.0, "#": "write '.part' files into a special directory", "part-directory": "/tmp/.download/", "#": "do not update file modification times", "mtime": false, "ytdl": { "#": "use yt-dlp instead of youtube-dl", "module": "yt_dlp" } }, "output": { "log": { "level": "info", "#": "use different ANSI colors for each log level", "format": { "debug" : "\u001b[0;37m{name}: {message}\u001b[0m", "info" : "\u001b[1;37m{name}: {message}\u001b[0m", "warning": "\u001b[1;33m{name}: {message}\u001b[0m", "error" : "\u001b[1;31m{name}: {message}\u001b[0m" } }, "#": "shorten filenames to fit into one terminal line", "#": "while also considering wider East-Asian characters", "shorten": "eaw", "#": "enable ANSI escape sequences on Windows", "ansi": true, "#": "write logging messages to a separate file", "logfile": { "path": "~/gallery-dl/log.txt", "mode": "w", "level": "debug" }, "#": "write unrecognized URLs to a separate file", "unsupportedfile": { "path": "~/gallery-dl/unsupported.txt", "mode": "a", "format": "{asctime} {message}", "format-date": "%Y-%m-%d-%H-%M-%S" } }, "postprocessor": { "#": "write 'content' metadata into separate files", "content": { "name" : "metadata", "#": "write data for every post instead of each individual file", "event": "post", "filename": "{post_id|tweet_id|id}.txt", "#": "write only the values for 'content' or 'description'", "mode" : "custom", "format": "{content|description}\n" }, "#": "put files into a '.cbz' archive", "cbz": { "name": "zip", "extension": "cbz" }, "#": "various ugoira post processor configurations to create different file formats", "ugoira-webm": { "name": "ugoira", "extension": "webm", "ffmpeg-args": ["-c:v", "libvpx-vp9", "-an", "-b:v", "0", "-crf", "30"], "ffmpeg-twopass": true, "ffmpeg-demuxer": "image2" }, "ugoira-mp4": { "name": "ugoira", "extension": "mp4", "ffmpeg-args": ["-c:v", "libx264", "-an", "-b:v", "4M", "-preset", "veryslow"], "ffmpeg-twopass": true, "libx264-prevent-odd": true }, "ugoira-gif": { "name": "ugoira", "extension": "gif", "ffmpeg-args": ["-filter_complex", "[0:v] split [a][b];[a] palettegen [p];[b][p] paletteuse"] }, "ugoira-copy": { "name": "ugoira", "extension": "mkv", "ffmpeg-args": ["-c", "copy"], "libx264-prevent-odd": false, "repeat-last-frame": false } }, "#": "use a custom cache file location", "cache": { "file": "~/gallery-dl/cache.sqlite3" } } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1711040587.0 gallery_dl-1.26.9/docs/gallery-dl.conf0000644000175000017500000002333114577064113016270 0ustar00mikemike{ "extractor": { "base-directory": "./gallery-dl/", "parent-directory": false, "postprocessors": null, "archive": null, "cookies": null, "cookies-update": true, "proxy": null, "skip": true, "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0", "retries": 4, "timeout": 30.0, "verify": true, "fallback": true, "sleep": 0, "sleep-request": 0, "sleep-extractor": 0, "path-restrict": "auto", "path-replace": "_", "path-remove": "\\u0000-\\u001f\\u007f", "path-strip": "auto", "path-extended": true, "extension-map": { "jpeg": "jpg", "jpe" : "jpg", "jfif": "jpg", "jif" : "jpg", "jfi" : "jpg" }, "artstation": { "external": false, "pro-first": true }, 
"aryion": { "username": null, "password": null, "recursive": true }, "bbc": { "width": 1920 }, "blogger": { "videos": true }, "cyberdrop": { "domain": null }, "danbooru": { "username": null, "password": null, "external": false, "metadata": false, "ugoira": false }, "derpibooru": { "api-key": null, "filter": 56027 }, "deviantart": { "client-id": null, "client-secret": null, "refresh-token": null, "auto-watch": false, "auto-unwatch": false, "comments": false, "extra": false, "flat": true, "folders": false, "group": true, "include": "gallery", "journals": "html", "jwt": false, "mature": true, "metadata": false, "original": true, "pagination": "api", "public": true, "quality": 100, "wait-min": 0 }, "e621": { "username": null, "password": null }, "exhentai": { "username": null, "password": null, "domain": "auto", "limits": true, "metadata": false, "original": true, "sleep-request": 5.0 }, "flickr": { "exif": false, "metadata": false, "size-max": null, "videos": true }, "furaffinity": { "descriptions": "text", "external": false, "include": "gallery", "layout": "auto" }, "gelbooru": { "api-key": null, "user-id": null }, "gofile": { "api-token": null, "website-token": null }, "hentaifoundry": { "include": "pictures" }, "hitomi": { "format": "webp", "metadata": false }, "idolcomplex": { "username": null, "password": null, "sleep-request": 5.0 }, "imagechest": { "access-token": null }, "imgbb": { "username": null, "password": null }, "imgur": { "mp4": true }, "inkbunny": { "username": null, "password": null, "orderby": "create_datetime" }, "instagram": { "api": "rest", "cookies": null, "include": "posts", "order-files": "asc", "order-posts": "asc", "previews": false, "sleep-request": [6.0, 12.0], "videos": true }, "khinsider": { "format": "mp3" }, "luscious": { "gif": false }, "mangadex": { "api-server": "https://api.mangadex.org", "api-parameters": null, "lang": null, "ratings": ["safe", "suggestive", "erotica", "pornographic"] }, "mangoxo": { "username": null, "password": null }, "misskey": { "access-token": null, "renotes": false, "replies": true }, "newgrounds": { "username": null, "password": null, "flash": true, "format": "original", "include": "art" }, "nijie": { "username": null, "password": null, "include": "illustration,doujin" }, "nitter": { "quoted": false, "retweets": false, "videos": true }, "oauth": { "browser": true, "cache": true, "host": "localhost", "port": 6414 }, "paheal": { "metadata": false }, "pillowfort": { "external": false, "inline": true, "reblogs": false }, "pinterest": { "domain": "auto", "sections": true, "videos": true }, "pixiv": { "refresh-token": null, "include": "artworks", "embeds": false, "metadata": false, "metadata-bookmark": false, "tags": "japanese", "ugoira": true }, "reactor": { "gif": false, "sleep-request": 5.0 }, "reddit": { "client-id": null, "user-agent": null, "refresh-token": null, "comments": 0, "morecomments": false, "date-min": 0, "date-max": 253402210800, "date-format": "%Y-%m-%dT%H:%M:%S", "id-min": null, "id-max": null, "recursion": 0, "videos": true }, "redgifs": { "format": ["hd", "sd", "gif"] }, "sankaku": { "username": null, "password": null, "refresh": false }, "sankakucomplex": { "embeds": false, "videos": true }, "skeb": { "article": false, "filters": null, "sent-requests": false, "thumbnails": false }, "smugmug": { "videos": true }, "seiga": { "username": null, "password": null }, "subscribestar": { "username": null, "password": null }, "tsumino": { "username": null, "password": null }, "tumblr": { "avatar": false, "external": false, 
"inline": true, "posts": "all", "offset": 0, "original": true, "reblogs": true }, "twitter": { "username": null, "password": null, "cards": false, "conversations": false, "pinned": false, "quoted": false, "replies": true, "retweets": false, "strategy": null, "text-tweets": false, "twitpic": false, "unique": true, "users": "user", "videos": true }, "unsplash": { "format": "raw" }, "vsco": { "videos": true }, "wallhaven": { "api-key": null, "metadata": false, "include": "uploads" }, "weasyl": { "api-key": null, "metadata": false }, "weibo": { "livephoto": true, "retweets": true, "videos": true }, "ytdl": { "enabled": false, "format": null, "generic": true, "logging": true, "module": null, "raw-options": null }, "zerochan": { "username": null, "password": null, "metadata": false }, "booru": { "tags": false, "notes": false } }, "downloader": { "filesize-min": null, "filesize-max": null, "mtime": true, "part": true, "part-directory": null, "progress": 3.0, "rate": null, "retries": 4, "timeout": 30.0, "verify": true, "http": { "adjust-extensions": true, "chunk-size": 32768, "headers": null, "validate": true }, "ytdl": { "format": null, "forward-cookies": false, "logging": true, "module": null, "outtmpl": null, "raw-options": null } }, "output": { "mode": "auto", "progress": true, "shorten": true, "ansi": false, "colors": { "success": "1;32", "skip" : "2" }, "skip": true, "log": "[{name}][{levelname}] {message}", "logfile": null, "unsupportedfile": null }, "netrc": false } ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1711211673.9824722 gallery_dl-1.26.9/gallery_dl/0000755000175000017500000000000014577602232014551 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1711040587.0 gallery_dl-1.26.9/gallery_dl/__init__.py0000644000175000017500000004246514577064113016675 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import sys import logging from . 
import version, config, option, output, extractor, job, util, exception __author__ = "Mike Fährmann" __copyright__ = "Copyright 2014-2023 Mike Fährmann" __license__ = "GPLv2" __maintainer__ = "Mike Fährmann" __email__ = "mike_faehrmann@web.de" __version__ = version.__version__ def main(): try: parser = option.build_parser() args = parser.parse_args() log = output.initialize_logging(args.loglevel) # configuration if args.config_load: config.load() if args.configs_json: config.load(args.configs_json, strict=True) if args.configs_yaml: import yaml config.load(args.configs_yaml, strict=True, loads=yaml.safe_load) if args.configs_toml: try: import tomllib as toml except ImportError: import toml config.load(args.configs_toml, strict=True, loads=toml.loads) if args.filename: filename = args.filename if filename == "/O": filename = "{filename}.{extension}" elif filename.startswith("\\f"): filename = "\f" + filename[2:] config.set((), "filename", filename) if args.directory is not None: config.set((), "base-directory", args.directory) config.set((), "directory", ()) if args.postprocessors: config.set((), "postprocessors", args.postprocessors) if args.abort: config.set((), "skip", "abort:" + str(args.abort)) if args.terminate: config.set((), "skip", "terminate:" + str(args.terminate)) if args.cookies_from_browser: browser, _, profile = args.cookies_from_browser.partition(":") browser, _, keyring = browser.partition("+") browser, _, domain = browser.partition("/") if profile.startswith(":"): container = profile[1:] profile = None else: profile, _, container = profile.partition("::") config.set((), "cookies", ( browser, profile, keyring, container, domain)) if args.options_pp: config.set((), "postprocessor-options", args.options_pp) for opts in args.options: config.set(*opts) output.configure_standard_streams() # signals signals = config.get((), "signals-ignore") if signals: import signal if isinstance(signals, str): signals = signals.split(",") for signal_name in signals: signal_num = getattr(signal, signal_name, None) if signal_num is None: log.warning("signal '%s' is not defined", signal_name) else: signal.signal(signal_num, signal.SIG_IGN) # enable ANSI escape sequences on Windows if util.WINDOWS and config.get(("output",), "ansi"): from ctypes import windll, wintypes, byref kernel32 = windll.kernel32 mode = wintypes.DWORD() for handle_id in (-11, -12): # stdout and stderr handle = kernel32.GetStdHandle(handle_id) kernel32.GetConsoleMode(handle, byref(mode)) if not mode.value & 0x4: mode.value |= 0x4 kernel32.SetConsoleMode(handle, mode) output.ANSI = True # format string separator separator = config.get((), "format-separator") if separator: from . 
import formatter formatter._SEPARATOR = separator # eval globals path = config.get((), "globals") if path: util.GLOBALS.update(util.import_file(path).__dict__) # loglevels output.configure_logging(args.loglevel) if args.loglevel >= logging.ERROR: config.set(("output",), "mode", "null") config.set(("downloader",), "progress", None) elif args.loglevel <= logging.DEBUG: import platform import requests extra = "" if util.EXECUTABLE: extra = " - Executable" else: git_head = util.git_head() if git_head: extra = " - Git HEAD: " + git_head log.debug("Version %s%s", __version__, extra) log.debug("Python %s - %s", platform.python_version(), platform.platform()) try: log.debug("requests %s - urllib3 %s", requests.__version__, requests.packages.urllib3.__version__) except AttributeError: pass log.debug("Configuration Files %s", config._files) # extractor modules modules = config.get(("extractor",), "modules") if modules is not None: if isinstance(modules, str): modules = modules.split(",") extractor.modules = modules # external modules if args.extractor_sources: sources = args.extractor_sources sources.append(None) else: sources = config.get(("extractor",), "module-sources") if sources: import os modules = [] for source in sources: if source: path = util.expand_path(source) try: files = os.listdir(path) modules.append(extractor._modules_path(path, files)) except Exception as exc: log.warning("Unable to load modules from %s (%s: %s)", path, exc.__class__.__name__, exc) else: modules.append(extractor._modules_internal()) if len(modules) > 1: import itertools extractor._module_iter = itertools.chain(*modules) elif not modules: extractor._module_iter = () else: extractor._module_iter = iter(modules[0]) if args.list_modules: extractor.modules.append("") sys.stdout.write("\n".join(extractor.modules)) elif args.list_extractors: write = sys.stdout.write fmt = ("{}{}\nCategory: {} - Subcategory: {}" "\nExample : {}\n\n").format for extr in extractor.extractors(): write(fmt( extr.__name__, "\n" + extr.__doc__ if extr.__doc__ else "", extr.category, extr.subcategory, extr.example, )) elif args.clear_cache: from . 
import cache log = logging.getLogger("cache") cnt = cache.clear(args.clear_cache) if cnt is None: log.error("Database file not available") else: log.info( "Deleted %d %s from '%s'", cnt, "entry" if cnt == 1 else "entries", cache._path(), ) elif args.config_init: return config.initialize() else: if not args.urls and not args.input_files: parser.error( "The following arguments are required: URL\n" "Use 'gallery-dl --help' to get a list of all options.") if args.list_urls: jobtype = job.UrlJob jobtype.maxdepth = args.list_urls if config.get(("output",), "fallback", True): jobtype.handle_url = \ staticmethod(jobtype.handle_url_fallback) else: jobtype = args.jobtype or job.DownloadJob input_manager = InputManager() input_manager.log = input_log = logging.getLogger("inputfile") # unsupported file logging handler handler = output.setup_logging_handler( "unsupportedfile", fmt="{message}") if handler: ulog = job.Job.ulog = logging.getLogger("unsupported") ulog.addHandler(handler) ulog.propagate = False # error file logging handler handler = output.setup_logging_handler( "errorfile", fmt="{message}", mode="a") if handler: elog = input_manager.err = logging.getLogger("errorfile") elog.addHandler(handler) elog.propagate = False # collect input URLs input_manager.add_list(args.urls) if args.input_files: for input_file, action in args.input_files: try: path = util.expand_path(input_file) input_manager.add_file(path, action) except Exception as exc: input_log.error(exc) return getattr(exc, "code", 128) pformat = config.get(("output",), "progress", True) if pformat and len(input_manager.urls) > 1 and \ args.loglevel < logging.ERROR: input_manager.progress(pformat) # process input URLs retval = 0 for url in input_manager: try: log.debug("Starting %s for '%s'", jobtype.__name__, url) if isinstance(url, ExtendedUrl): for opts in url.gconfig: config.set(*opts) with config.apply(url.lconfig): status = jobtype(url.value).run() else: status = jobtype(url).run() if status: retval |= status input_manager.error() else: input_manager.success() except exception.TerminateExtraction: pass except exception.RestartExtraction: log.debug("Restarting '%s'", url) continue except exception.NoExtractorError: log.error("Unsupported URL '%s'", url) retval |= 64 input_manager.error() input_manager.next() return retval except KeyboardInterrupt: raise SystemExit("\nKeyboardInterrupt") except BrokenPipeError: pass except OSError as exc: import errno if exc.errno != errno.EPIPE: raise return 1 class InputManager(): def __init__(self): self.urls = [] self.files = () self.log = self.err = None self._url = "" self._item = None self._index = 0 self._pformat = None def add_url(self, url): self.urls.append(url) def add_list(self, urls): self.urls += urls def add_file(self, path, action=None): """Process an input file. Lines starting with '#' and empty lines will be ignored. Lines starting with '-' will be interpreted as a key-value pair separated by an '='. where 'key' is a dot-separated option name and 'value' is a JSON-parsable string. These configuration options will be applied while processing the next URL only. Lines starting with '-G' are the same as above, except these options will be applied for *all* following URLs, i.e. they are Global. Everything else will be used as a potential URL. 
Example input file: # settings global options -G base-directory = "/tmp/" -G skip = false # setting local options for the next URL -filename="spaces_are_optional.jpg" -skip = true https://example.org/ # next URL uses default filename and 'skip' is false. https://example.com/index.htm # comment1 https://example.com/404.htm # comment2 """ if path == "-" and not action: try: lines = sys.stdin.readlines() except Exception: raise exception.InputFileError("stdin is not readable") path = None else: try: with open(path, encoding="utf-8") as fp: lines = fp.readlines() except Exception as exc: raise exception.InputFileError(str(exc)) if self.files: self.files[path] = lines else: self.files = {path: lines} if action == "c": action = self._action_comment elif action == "d": action = self._action_delete else: action = None gconf = [] lconf = [] indicies = [] strip_comment = None append = self.urls.append for n, line in enumerate(lines): line = line.strip() if not line or line[0] == "#": # empty line or comment continue elif line[0] == "-": # config spec if len(line) >= 2 and line[1] == "G": conf = gconf line = line[2:] else: conf = lconf line = line[1:] if action: indicies.append(n) key, sep, value = line.partition("=") if not sep: raise exception.InputFileError( "Invalid KEY=VALUE pair '%s' on line %s in %s", line, n+1, path) try: value = util.json_loads(value.strip()) except ValueError as exc: self.log.debug("%s: %s", exc.__class__.__name__, exc) raise exception.InputFileError( "Unable to parse '%s' on line %s in %s", value, n+1, path) key = key.strip().split(".") conf.append((key[:-1], key[-1], value)) else: # url if " #" in line or "\t#" in line: if strip_comment is None: import re strip_comment = re.compile(r"\s+#.*").sub line = strip_comment("", line) if gconf or lconf: url = ExtendedUrl(line, gconf, lconf) gconf = [] lconf = [] else: url = line if action: indicies.append(n) append((url, path, action, indicies)) indicies = [] else: append(url) def progress(self, pformat=True): if pformat is True: pformat = "[{current}/{total}] {url}\n" else: pformat += "\n" self._pformat = pformat.format_map def next(self): self._index += 1 def success(self): if self._item: self._rewrite() def error(self): if self.err: if self._item: url, path, action, indicies = self._item lines = self.files[path] out = "".join(lines[i] for i in indicies) if out and out[-1] == "\n": out = out[:-1] self._rewrite() else: out = str(self._url) self.err.info(out) def _rewrite(self): url, path, action, indicies = self._item lines = self.files[path] action(lines, indicies) try: with open(path, "w", encoding="utf-8") as fp: fp.writelines(lines) except Exception as exc: self.log.warning( "Unable to update '%s' (%s: %s)", path, exc.__class__.__name__, exc) @staticmethod def _action_comment(lines, indicies): for i in indicies: lines[i] = "# " + lines[i] @staticmethod def _action_delete(lines, indicies): for i in indicies: lines[i] = "" def __iter__(self): self._index = 0 return self def __next__(self): try: url = self.urls[self._index] except IndexError: raise StopIteration if isinstance(url, tuple): self._item = url url = url[0] else: self._item = None self._url = url if self._pformat: output.stderr_write(self._pformat({ "total" : len(self.urls), "current": self._index + 1, "url" : url, })) return url class ExtendedUrl(): """URL with attached config key-value pairs""" __slots__ = ("value", "gconfig", "lconfig") def __init__(self, url, gconf, lconf): self.value = url self.gconfig = gconf self.lconfig = lconf def __str__(self): return 
self.value ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/__main__.py0000644000175000017500000000105714571514301016637 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2017-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import sys if not __package__ and not hasattr(sys, "frozen"): import os.path path = os.path.realpath(os.path.abspath(__file__)) sys.path.insert(0, os.path.dirname(os.path.dirname(path))) import gallery_dl if __name__ == "__main__": raise SystemExit(gallery_dl.main()) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/actions.py0000644000175000017500000000450314571514301016556 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """ """ import re import logging import operator from . import util, exception def parse(actionspec): if isinstance(actionspec, dict): actionspec = actionspec.items() actions = {} actions[logging.DEBUG] = actions_d = [] actions[logging.INFO] = actions_i = [] actions[logging.WARNING] = actions_w = [] actions[logging.ERROR] = actions_e = [] for event, spec in actionspec: level, _, pattern = event.partition(":") type, _, args = spec.partition(" ") action = (re.compile(pattern).search, ACTIONS[type](args)) level = level.strip() if not level or level == "*": actions_d.append(action) actions_i.append(action) actions_w.append(action) actions_e.append(action) else: actions[_level_to_int(level)].append(action) return actions def _level_to_int(level): try: return logging._nameToLevel[level] except KeyError: return int(level) def action_print(opts): def _print(_): print(opts) return _print def action_status(opts): op, value = re.match(r"\s*([&|^=])=?\s*(\d+)", opts).groups() op = { "&": operator.and_, "|": operator.or_, "^": operator.xor, "=": lambda x, y: y, }[op] value = int(value) def _status(args): args["job"].status = op(args["job"].status, value) return _status def action_level(opts): level = _level_to_int(opts.lstrip(" ~=")) def _level(args): args["level"] = level return _level def action_wait(opts): def _wait(args): input("Press Enter to continue") return _wait def action_restart(opts): return util.raises(exception.RestartExtraction) def action_exit(opts): try: opts = int(opts) except ValueError: pass def _exit(args): raise SystemExit(opts) return _exit ACTIONS = { "print" : action_print, "status" : action_status, "level" : action_level, "restart": action_restart, "wait" : action_wait, "exit" : action_exit, } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1708353813.0 gallery_dl-1.26.9/gallery_dl/aes.py0000644000175000017500000005116714564664425015715 0ustar00mikemike# -*- coding: utf-8 -*- # This is a slightly modified version of yt-dlp's aes module. 
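#
# An illustrative usage sketch for the byte-level helpers defined below,
# assuming 'ciphertext', 'key' (16/24/32 bytes) and 'iv' (16 bytes) are
# placeholder bytes objects supplied by the caller:
#
#     from gallery_dl import aes
#     plaintext = aes.unpad_pkcs7(
#         aes.aes_cbc_decrypt_bytes(ciphertext, key, iv))
#
# Both the pycryptodome-backed and the pure-Python branches expose
# aes_cbc_decrypt_bytes() and aes_gcm_decrypt_and_verify_bytes() with the
# same signatures, so callers such as cookies.py need not care which
# implementation is active.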
# https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/aes.py import struct import binascii from math import ceil try: from Cryptodome.Cipher import AES as Cryptodome_AES except ImportError: try: from Crypto.Cipher import AES as Cryptodome_AES except ImportError: Cryptodome_AES = None if Cryptodome_AES: def aes_cbc_decrypt_bytes(data, key, iv): """Decrypt bytes with AES-CBC using pycryptodome""" return Cryptodome_AES.new( key, Cryptodome_AES.MODE_CBC, iv).decrypt(data) def aes_gcm_decrypt_and_verify_bytes(data, key, tag, nonce): """Decrypt bytes with AES-GCM using pycryptodome""" return Cryptodome_AES.new( key, Cryptodome_AES.MODE_GCM, nonce).decrypt_and_verify(data, tag) else: def aes_cbc_decrypt_bytes(data, key, iv): """Decrypt bytes with AES-CBC using native implementation""" return intlist_to_bytes(aes_cbc_decrypt( bytes_to_intlist(data), bytes_to_intlist(key), bytes_to_intlist(iv), )) def aes_gcm_decrypt_and_verify_bytes(data, key, tag, nonce): """Decrypt bytes with AES-GCM using native implementation""" return intlist_to_bytes(aes_gcm_decrypt_and_verify( bytes_to_intlist(data), bytes_to_intlist(key), bytes_to_intlist(tag), bytes_to_intlist(nonce), )) bytes_to_intlist = list def intlist_to_bytes(xs): if not xs: return b"" return struct.pack("%dB" % len(xs), *xs) def unpad_pkcs7(data): return data[:-data[-1]] BLOCK_SIZE_BYTES = 16 def aes_ecb_encrypt(data, key, iv=None): """ Encrypt with aes in ECB mode @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv Unused for this mode @returns {int[]} encrypted data """ expanded_key = key_expansion(key) block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) encrypted_data = [] for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] encrypted_data += aes_encrypt(block, expanded_key) encrypted_data = encrypted_data[:len(data)] return encrypted_data def aes_ecb_decrypt(data, key, iv=None): """ Decrypt with aes in ECB mode @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv Unused for this mode @returns {int[]} decrypted data """ expanded_key = key_expansion(key) block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) encrypted_data = [] for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] encrypted_data += aes_decrypt(block, expanded_key) encrypted_data = encrypted_data[:len(data)] return encrypted_data def aes_ctr_decrypt(data, key, iv): """ Decrypt with aes in counter mode @param {int[]} data cipher @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte initialization vector @returns {int[]} decrypted data """ return aes_ctr_encrypt(data, key, iv) def aes_ctr_encrypt(data, key, iv): """ Encrypt with aes in counter mode @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte initialization vector @returns {int[]} encrypted data """ expanded_key = key_expansion(key) block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) counter = iter_vector(iv) encrypted_data = [] for i in range(block_count): counter_block = next(counter) block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] block += [0] * (BLOCK_SIZE_BYTES - len(block)) cipher_counter_block = aes_encrypt(counter_block, expanded_key) encrypted_data += xor(block, cipher_counter_block) encrypted_data = encrypted_data[:len(data)] return encrypted_data def aes_cbc_decrypt(data, key, iv): """ Decrypt with aes in CBC mode @param {int[]} data cipher @param {int[]} key 
16/24/32-Byte cipher key @param {int[]} iv 16-Byte IV @returns {int[]} decrypted data """ expanded_key = key_expansion(key) block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) decrypted_data = [] previous_cipher_block = iv for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] block += [0] * (BLOCK_SIZE_BYTES - len(block)) decrypted_block = aes_decrypt(block, expanded_key) decrypted_data += xor(decrypted_block, previous_cipher_block) previous_cipher_block = block decrypted_data = decrypted_data[:len(data)] return decrypted_data def aes_cbc_encrypt(data, key, iv): """ Encrypt with aes in CBC mode. Using PKCS#7 padding @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte IV @returns {int[]} encrypted data """ expanded_key = key_expansion(key) block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) encrypted_data = [] previous_cipher_block = iv for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] remaining_length = BLOCK_SIZE_BYTES - len(block) block += [remaining_length] * remaining_length mixed_block = xor(block, previous_cipher_block) encrypted_block = aes_encrypt(mixed_block, expanded_key) encrypted_data += encrypted_block previous_cipher_block = encrypted_block return encrypted_data def aes_gcm_decrypt_and_verify(data, key, tag, nonce): """ Decrypt with aes in GBM mode and checks authenticity using tag @param {int[]} data cipher @param {int[]} key 16-Byte cipher key @param {int[]} tag authentication tag @param {int[]} nonce IV (recommended 12-Byte) @returns {int[]} decrypted data """ # XXX: check aes, gcm param hash_subkey = aes_encrypt([0] * BLOCK_SIZE_BYTES, key_expansion(key)) if len(nonce) == 12: j0 = nonce + [0, 0, 0, 1] else: fill = (BLOCK_SIZE_BYTES - (len(nonce) % BLOCK_SIZE_BYTES)) % \ BLOCK_SIZE_BYTES + 8 ghash_in = nonce + [0] * fill + bytes_to_intlist( (8 * len(nonce)).to_bytes(8, "big")) j0 = ghash(hash_subkey, ghash_in) # TODO: add nonce support to aes_ctr_decrypt # nonce_ctr = j0[:12] iv_ctr = inc(j0) decrypted_data = aes_ctr_decrypt( data, key, iv_ctr + [0] * (BLOCK_SIZE_BYTES - len(iv_ctr))) pad_len = len(data) // 16 * 16 s_tag = ghash( hash_subkey, data + [0] * (BLOCK_SIZE_BYTES - len(data) + pad_len) + # pad bytes_to_intlist( (0 * 8).to_bytes(8, "big") + # length of associated data ((len(data) * 8).to_bytes(8, "big")) # length of data ) ) if tag != aes_ctr_encrypt(s_tag, key, j0): raise ValueError("Mismatching authentication tag") return decrypted_data def aes_encrypt(data, expanded_key): """ Encrypt one block with aes @param {int[]} data 16-Byte state @param {int[]} expanded_key 176/208/240-Byte expanded key @returns {int[]} 16-Byte cipher """ rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1 data = xor(data, expanded_key[:BLOCK_SIZE_BYTES]) for i in range(1, rounds + 1): data = sub_bytes(data) data = shift_rows(data) if i != rounds: data = list(iter_mix_columns(data, MIX_COLUMN_MATRIX)) data = xor(data, expanded_key[ i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES]) return data def aes_decrypt(data, expanded_key): """ Decrypt one block with aes @param {int[]} data 16-Byte cipher @param {int[]} expanded_key 176/208/240-Byte expanded key @returns {int[]} 16-Byte state """ rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1 for i in range(rounds, 0, -1): data = xor(data, expanded_key[ i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES]) if i != rounds: data = list(iter_mix_columns(data, MIX_COLUMN_MATRIX_INV)) data = 
shift_rows_inv(data) data = sub_bytes_inv(data) data = xor(data, expanded_key[:BLOCK_SIZE_BYTES]) return data def aes_decrypt_text(data, password, key_size_bytes): """ Decrypt text - The first 8 Bytes of decoded 'data' are the 8 high Bytes of the counter - The cipher key is retrieved by encrypting the first 16 Byte of 'password' with the first 'key_size_bytes' Bytes from 'password' (if necessary filled with 0's) - Mode of operation is 'counter' @param {str} data Base64 encoded string @param {str,unicode} password Password (will be encoded with utf-8) @param {int} key_size_bytes Possible values: 16 for 128-Bit, 24 for 192-Bit, or 32 for 256-Bit @returns {str} Decrypted data """ NONCE_LENGTH_BYTES = 8 data = bytes_to_intlist(binascii.a2b_base64(data)) password = bytes_to_intlist(password.encode("utf-8")) key = password[:key_size_bytes] + [0] * (key_size_bytes - len(password)) key = aes_encrypt(key[:BLOCK_SIZE_BYTES], key_expansion(key)) * \ (key_size_bytes // BLOCK_SIZE_BYTES) nonce = data[:NONCE_LENGTH_BYTES] cipher = data[NONCE_LENGTH_BYTES:] return intlist_to_bytes(aes_ctr_decrypt( cipher, key, nonce + [0] * (BLOCK_SIZE_BYTES - NONCE_LENGTH_BYTES) )) RCON = ( 0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36, ) SBOX = ( 0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76, 0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0, 0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15, 0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75, 0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84, 0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF, 0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8, 0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2, 0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73, 0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB, 0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79, 0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08, 0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A, 0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E, 0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF, 0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16, ) SBOX_INV = ( 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38, 0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb, 0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87, 0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb, 0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d, 0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e, 0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2, 0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25, 0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16, 0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92, 0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda, 0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84, 0x90, 0xd8, 0xab, 0x00, 0x8c, 0xbc, 0xd3, 0x0a, 0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06, 0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02, 0xc1, 0xaf, 0xbd, 0x03, 
0x01, 0x13, 0x8a, 0x6b, 0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea, 0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73, 0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85, 0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e, 0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89, 0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b, 0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20, 0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4, 0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31, 0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f, 0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d, 0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef, 0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0, 0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61, 0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26, 0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d ) MIX_COLUMN_MATRIX = ( (0x2, 0x3, 0x1, 0x1), (0x1, 0x2, 0x3, 0x1), (0x1, 0x1, 0x2, 0x3), (0x3, 0x1, 0x1, 0x2), ) MIX_COLUMN_MATRIX_INV = ( (0xE, 0xB, 0xD, 0x9), (0x9, 0xE, 0xB, 0xD), (0xD, 0x9, 0xE, 0xB), (0xB, 0xD, 0x9, 0xE), ) RIJNDAEL_EXP_TABLE = ( 0x01, 0x03, 0x05, 0x0F, 0x11, 0x33, 0x55, 0xFF, 0x1A, 0x2E, 0x72, 0x96, 0xA1, 0xF8, 0x13, 0x35, 0x5F, 0xE1, 0x38, 0x48, 0xD8, 0x73, 0x95, 0xA4, 0xF7, 0x02, 0x06, 0x0A, 0x1E, 0x22, 0x66, 0xAA, 0xE5, 0x34, 0x5C, 0xE4, 0x37, 0x59, 0xEB, 0x26, 0x6A, 0xBE, 0xD9, 0x70, 0x90, 0xAB, 0xE6, 0x31, 0x53, 0xF5, 0x04, 0x0C, 0x14, 0x3C, 0x44, 0xCC, 0x4F, 0xD1, 0x68, 0xB8, 0xD3, 0x6E, 0xB2, 0xCD, 0x4C, 0xD4, 0x67, 0xA9, 0xE0, 0x3B, 0x4D, 0xD7, 0x62, 0xA6, 0xF1, 0x08, 0x18, 0x28, 0x78, 0x88, 0x83, 0x9E, 0xB9, 0xD0, 0x6B, 0xBD, 0xDC, 0x7F, 0x81, 0x98, 0xB3, 0xCE, 0x49, 0xDB, 0x76, 0x9A, 0xB5, 0xC4, 0x57, 0xF9, 0x10, 0x30, 0x50, 0xF0, 0x0B, 0x1D, 0x27, 0x69, 0xBB, 0xD6, 0x61, 0xA3, 0xFE, 0x19, 0x2B, 0x7D, 0x87, 0x92, 0xAD, 0xEC, 0x2F, 0x71, 0x93, 0xAE, 0xE9, 0x20, 0x60, 0xA0, 0xFB, 0x16, 0x3A, 0x4E, 0xD2, 0x6D, 0xB7, 0xC2, 0x5D, 0xE7, 0x32, 0x56, 0xFA, 0x15, 0x3F, 0x41, 0xC3, 0x5E, 0xE2, 0x3D, 0x47, 0xC9, 0x40, 0xC0, 0x5B, 0xED, 0x2C, 0x74, 0x9C, 0xBF, 0xDA, 0x75, 0x9F, 0xBA, 0xD5, 0x64, 0xAC, 0xEF, 0x2A, 0x7E, 0x82, 0x9D, 0xBC, 0xDF, 0x7A, 0x8E, 0x89, 0x80, 0x9B, 0xB6, 0xC1, 0x58, 0xE8, 0x23, 0x65, 0xAF, 0xEA, 0x25, 0x6F, 0xB1, 0xC8, 0x43, 0xC5, 0x54, 0xFC, 0x1F, 0x21, 0x63, 0xA5, 0xF4, 0x07, 0x09, 0x1B, 0x2D, 0x77, 0x99, 0xB0, 0xCB, 0x46, 0xCA, 0x45, 0xCF, 0x4A, 0xDE, 0x79, 0x8B, 0x86, 0x91, 0xA8, 0xE3, 0x3E, 0x42, 0xC6, 0x51, 0xF3, 0x0E, 0x12, 0x36, 0x5A, 0xEE, 0x29, 0x7B, 0x8D, 0x8C, 0x8F, 0x8A, 0x85, 0x94, 0xA7, 0xF2, 0x0D, 0x17, 0x39, 0x4B, 0xDD, 0x7C, 0x84, 0x97, 0xA2, 0xFD, 0x1C, 0x24, 0x6C, 0xB4, 0xC7, 0x52, 0xF6, 0x01, ) RIJNDAEL_LOG_TABLE = ( 0x00, 0x00, 0x19, 0x01, 0x32, 0x02, 0x1a, 0xc6, 0x4b, 0xc7, 0x1b, 0x68, 0x33, 0xee, 0xdf, 0x03, 0x64, 0x04, 0xe0, 0x0e, 0x34, 0x8d, 0x81, 0xef, 0x4c, 0x71, 0x08, 0xc8, 0xf8, 0x69, 0x1c, 0xc1, 0x7d, 0xc2, 0x1d, 0xb5, 0xf9, 0xb9, 0x27, 0x6a, 0x4d, 0xe4, 0xa6, 0x72, 0x9a, 0xc9, 0x09, 0x78, 0x65, 0x2f, 0x8a, 0x05, 0x21, 0x0f, 0xe1, 0x24, 0x12, 0xf0, 0x82, 0x45, 0x35, 0x93, 0xda, 0x8e, 0x96, 0x8f, 0xdb, 0xbd, 0x36, 0xd0, 0xce, 0x94, 0x13, 0x5c, 0xd2, 0xf1, 0x40, 0x46, 0x83, 0x38, 0x66, 0xdd, 0xfd, 0x30, 0xbf, 0x06, 0x8b, 0x62, 0xb3, 0x25, 0xe2, 0x98, 0x22, 0x88, 0x91, 0x10, 0x7e, 0x6e, 0x48, 0xc3, 0xa3, 0xb6, 0x1e, 0x42, 0x3a, 0x6b, 0x28, 0x54, 0xfa, 0x85, 0x3d, 0xba, 0x2b, 0x79, 0x0a, 0x15, 0x9b, 0x9f, 0x5e, 0xca, 0x4e, 0xd4, 0xac, 0xe5, 0xf3, 0x73, 0xa7, 0x57, 0xaf, 0x58, 0xa8, 0x50, 0xf4, 0xea, 0xd6, 0x74, 0x4f, 0xae, 0xe9, 0xd5, 0xe7, 0xe6, 0xad, 0xe8, 0x2c, 0xd7, 0x75, 0x7a, 0xeb, 0x16, 0x0b, 0xf5, 0x59, 0xcb, 0x5f, 0xb0, 0x9c, 0xa9, 
0x51, 0xa0, 0x7f, 0x0c, 0xf6, 0x6f, 0x17, 0xc4, 0x49, 0xec, 0xd8, 0x43, 0x1f, 0x2d, 0xa4, 0x76, 0x7b, 0xb7, 0xcc, 0xbb, 0x3e, 0x5a, 0xfb, 0x60, 0xb1, 0x86, 0x3b, 0x52, 0xa1, 0x6c, 0xaa, 0x55, 0x29, 0x9d, 0x97, 0xb2, 0x87, 0x90, 0x61, 0xbe, 0xdc, 0xfc, 0xbc, 0x95, 0xcf, 0xcd, 0x37, 0x3f, 0x5b, 0xd1, 0x53, 0x39, 0x84, 0x3c, 0x41, 0xa2, 0x6d, 0x47, 0x14, 0x2a, 0x9e, 0x5d, 0x56, 0xf2, 0xd3, 0xab, 0x44, 0x11, 0x92, 0xd9, 0x23, 0x20, 0x2e, 0x89, 0xb4, 0x7c, 0xb8, 0x26, 0x77, 0x99, 0xe3, 0xa5, 0x67, 0x4a, 0xed, 0xde, 0xc5, 0x31, 0xfe, 0x18, 0x0d, 0x63, 0x8c, 0x80, 0xc0, 0xf7, 0x70, 0x07, ) def key_expansion(data): """ Generate key schedule @param {int[]} data 16/24/32-Byte cipher key @returns {int[]} 176/208/240-Byte expanded key """ data = data[:] # copy rcon_iteration = 1 key_size_bytes = len(data) expanded_key_size_bytes = (key_size_bytes // 4 + 7) * BLOCK_SIZE_BYTES while len(data) < expanded_key_size_bytes: temp = data[-4:] temp = key_schedule_core(temp, rcon_iteration) rcon_iteration += 1 data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) for _ in range(3): temp = data[-4:] data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) if key_size_bytes == 32: temp = data[-4:] temp = sub_bytes(temp) data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) for _ in range(3 if key_size_bytes == 32 else 2 if key_size_bytes == 24 else 0): temp = data[-4:] data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) data = data[:expanded_key_size_bytes] return data def iter_vector(iv): while True: yield iv iv = inc(iv) def sub_bytes(data): return [SBOX[x] for x in data] def sub_bytes_inv(data): return [SBOX_INV[x] for x in data] def rotate(data): return data[1:] + [data[0]] def key_schedule_core(data, rcon_iteration): data = rotate(data) data = sub_bytes(data) data[0] = data[0] ^ RCON[rcon_iteration] return data def xor(data1, data2): return [x ^ y for x, y in zip(data1, data2)] def iter_mix_columns(data, matrix): for i in (0, 4, 8, 12): for row in matrix: mixed = 0 for j in range(4): if data[i:i + 4][j] == 0 or row[j] == 0: mixed ^= 0 else: mixed ^= RIJNDAEL_EXP_TABLE[ (RIJNDAEL_LOG_TABLE[data[i + j]] + RIJNDAEL_LOG_TABLE[row[j]]) % 0xFF ] yield mixed def shift_rows(data): return [ data[((column + row) & 0b11) * 4 + row] for column in range(4) for row in range(4) ] def shift_rows_inv(data): return [ data[((column - row) & 0b11) * 4 + row] for column in range(4) for row in range(4) ] def shift_block(data): data_shifted = [] bit = 0 for n in data: if bit: n |= 0x100 bit = n & 1 n >>= 1 data_shifted.append(n) return data_shifted def inc(data): data = data[:] # copy for i in range(len(data) - 1, -1, -1): if data[i] == 255: data[i] = 0 else: data[i] = data[i] + 1 break return data def block_product(block_x, block_y): # NIST SP 800-38D, Algorithm 1 if len(block_x) != BLOCK_SIZE_BYTES or len(block_y) != BLOCK_SIZE_BYTES: raise ValueError( "Length of blocks need to be %d bytes" % BLOCK_SIZE_BYTES) block_r = [0xE1] + [0] * (BLOCK_SIZE_BYTES - 1) block_v = block_y[:] block_z = [0] * BLOCK_SIZE_BYTES for i in block_x: for bit in range(7, -1, -1): if i & (1 << bit): block_z = xor(block_z, block_v) do_xor = block_v[-1] & 1 block_v = shift_block(block_v) if do_xor: block_v = xor(block_v, block_r) return block_z def ghash(subkey, data): # NIST SP 800-38D, Algorithm 2 if len(data) % BLOCK_SIZE_BYTES: raise ValueError( "Length of data should be %d bytes" % BLOCK_SIZE_BYTES) last_y = [0] * BLOCK_SIZE_BYTES for i in range(0, len(data), BLOCK_SIZE_BYTES): block = data[i: i + BLOCK_SIZE_BYTES] last_y = 
block_product(xor(last_y, block), subkey) return last_y ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709407240.0 gallery_dl-1.26.9/gallery_dl/cache.py0000644000175000017500000001451514570676010016172 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Decorators to keep function results in an in-memory and database cache""" import sqlite3 import pickle import time import os import functools from . import config, util class CacheDecorator(): """Simplified in-memory cache""" def __init__(self, func, keyarg): self.func = func self.cache = {} self.keyarg = keyarg def __get__(self, instance, cls): return functools.partial(self.__call__, instance) def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] try: value = self.cache[key] except KeyError: value = self.cache[key] = self.func(*args, **kwargs) return value def update(self, key, value): self.cache[key] = value def invalidate(self, key=""): try: del self.cache[key] except KeyError: pass class MemoryCacheDecorator(CacheDecorator): """In-memory cache""" def __init__(self, func, keyarg, maxage): CacheDecorator.__init__(self, func, keyarg) self.maxage = maxage def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] timestamp = int(time.time()) try: value, expires = self.cache[key] except KeyError: expires = 0 if expires <= timestamp: value = self.func(*args, **kwargs) expires = timestamp + self.maxage self.cache[key] = value, expires return value def update(self, key, value): self.cache[key] = value, int(time.time()) + self.maxage class DatabaseCacheDecorator(): """Database cache""" db = None _init = True def __init__(self, func, keyarg, maxage): self.key = "%s.%s" % (func.__module__, func.__name__) self.func = func self.cache = {} self.keyarg = keyarg self.maxage = maxage def __get__(self, obj, objtype): return functools.partial(self.__call__, obj) def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] timestamp = int(time.time()) # in-memory cache lookup try: value, expires = self.cache[key] if expires > timestamp: return value except KeyError: pass # database lookup fullkey = "%s-%s" % (self.key, key) with self.database() as db: cursor = db.cursor() try: cursor.execute("BEGIN EXCLUSIVE") except sqlite3.OperationalError: pass # Silently swallow exception - workaround for Python 3.6 cursor.execute( "SELECT value, expires FROM data WHERE key=? 
LIMIT 1", (fullkey,), ) result = cursor.fetchone() if result and result[1] > timestamp: value, expires = result value = pickle.loads(value) else: value = self.func(*args, **kwargs) expires = timestamp + self.maxage cursor.execute( "INSERT OR REPLACE INTO data VALUES (?,?,?)", (fullkey, pickle.dumps(value), expires), ) self.cache[key] = value, expires return value def update(self, key, value): expires = int(time.time()) + self.maxage self.cache[key] = value, expires with self.database() as db: db.execute( "INSERT OR REPLACE INTO data VALUES (?,?,?)", ("%s-%s" % (self.key, key), pickle.dumps(value), expires), ) def invalidate(self, key): try: del self.cache[key] except KeyError: pass with self.database() as db: db.execute( "DELETE FROM data WHERE key=?", ("%s-%s" % (self.key, key),), ) def database(self): if self._init: self.db.execute( "CREATE TABLE IF NOT EXISTS data " "(key TEXT PRIMARY KEY, value TEXT, expires INTEGER)" ) DatabaseCacheDecorator._init = False return self.db def memcache(maxage=None, keyarg=None): if maxage: def wrap(func): return MemoryCacheDecorator(func, keyarg, maxage) else: def wrap(func): return CacheDecorator(func, keyarg) return wrap def cache(maxage=3600, keyarg=None): def wrap(func): return DatabaseCacheDecorator(func, keyarg, maxage) return wrap def clear(module): """Delete database entries for 'module'""" db = DatabaseCacheDecorator.db if not db: return None rowcount = 0 cursor = db.cursor() try: if module == "ALL": cursor.execute("DELETE FROM data") else: cursor.execute( "DELETE FROM data " "WHERE key LIKE 'gallery_dl.extractor.' || ? || '.%'", (module.lower(),) ) except sqlite3.OperationalError: pass # database not initialized, cannot be modified, etc. else: rowcount = cursor.rowcount db.commit() if rowcount: cursor.execute("VACUUM") return rowcount def _path(): path = config.get(("cache",), "file", util.SENTINEL) if path is not util.SENTINEL: return util.expand_path(path) if util.WINDOWS: cachedir = os.environ.get("APPDATA", "~") else: cachedir = os.environ.get("XDG_CACHE_HOME", "~/.cache") cachedir = util.expand_path(os.path.join(cachedir, "gallery-dl")) os.makedirs(cachedir, exist_ok=True) return os.path.join(cachedir, "cache.sqlite3") def _init(): try: dbfile = _path() # restrict access permissions for new db files os.close(os.open(dbfile, os.O_CREAT | os.O_RDONLY, 0o600)) DatabaseCacheDecorator.db = sqlite3.connect( dbfile, timeout=60, check_same_thread=False) except (OSError, TypeError, sqlite3.OperationalError): global cache cache = memcache _init() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/config.py0000644000175000017500000001432714571514301016370 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Global configuration module""" import sys import os.path import logging from . 
import util log = logging.getLogger("config") # -------------------------------------------------------------------- # internals _config = {} _files = [] if util.WINDOWS: _default_configs = [ r"%APPDATA%\gallery-dl\config.json", r"%USERPROFILE%\gallery-dl\config.json", r"%USERPROFILE%\gallery-dl.conf", ] else: _default_configs = [ "/etc/gallery-dl.conf", "${XDG_CONFIG_HOME}/gallery-dl/config.json" if os.environ.get("XDG_CONFIG_HOME") else "${HOME}/.config/gallery-dl/config.json", "${HOME}/.gallery-dl.conf", ] if util.EXECUTABLE: # look for config file in PyInstaller executable directory (#682) _default_configs.append(os.path.join( os.path.dirname(sys.executable), "gallery-dl.conf", )) # -------------------------------------------------------------------- # public interface def initialize(): paths = list(map(util.expand_path, _default_configs)) for path in paths: if os.access(path, os.R_OK | os.W_OK): log.error("There is already a configuration file at '%s'", path) return 1 for path in paths: try: os.makedirs(os.path.dirname(path), exist_ok=True) with open(path, "x", encoding="utf-8") as fp: fp.write("""\ { "extractor": { }, "downloader": { }, "output": { }, "postprocessor": { } } """) break except OSError as exc: log.debug("%s: %s", exc.__class__.__name__, exc) else: log.error("Unable to create a new configuration file " "at any of the default paths") return 1 log.info("Created a basic configuration file at '%s'", path) return 0 def load(files=None, strict=False, loads=util.json_loads): """Load JSON configuration files""" for pathfmt in files or _default_configs: path = util.expand_path(pathfmt) try: with open(path, encoding="utf-8") as file: conf = loads(file.read()) except OSError as exc: if strict: log.error(exc) raise SystemExit(1) except Exception as exc: log.error("%s when loading '%s': %s", exc.__class__.__name__, path, exc) if strict: raise SystemExit(2) else: if not _config: _config.update(conf) else: util.combine_dict(_config, conf) _files.append(pathfmt) if "subconfigs" in conf: subconfigs = conf["subconfigs"] if subconfigs: if isinstance(subconfigs, str): subconfigs = (subconfigs,) load(subconfigs, strict, loads) def clear(): """Reset configuration to an empty state""" _config.clear() def get(path, key, default=None, conf=_config): """Get the value of property 'key' or a default value""" try: for p in path: conf = conf[p] return conf[key] except Exception: return default def interpolate(path, key, default=None, conf=_config): """Interpolate the value of 'key'""" if key in conf: return conf[key] try: for p in path: conf = conf[p] if key in conf: default = conf[key] except Exception: pass return default def interpolate_common(common, paths, key, default=None, conf=_config): """Interpolate the value of 'key' using multiple 'paths' along a 'common' ancestor """ if key in conf: return conf[key] # follow the common path try: for p in common: conf = conf[p] if key in conf: default = conf[key] except Exception: return default # try all paths until a value is found value = util.SENTINEL for path in paths: c = conf try: for p in path: c = c[p] if key in c: value = c[key] except Exception: pass if value is not util.SENTINEL: return value return default def accumulate(path, key, conf=_config): """Accumulate the values of 'key' along 'path'""" result = [] try: if key in conf: value = conf[key] if value: result.extend(value) for p in path: conf = conf[p] if key in conf: value = conf[key] if value: result[:0] = value except Exception: pass return result def set(path, key, value, 
conf=_config): """Set the value of property 'key' for this session""" for p in path: try: conf = conf[p] except KeyError: conf[p] = conf = {} conf[key] = value def setdefault(path, key, value, conf=_config): """Set the value of property 'key' if it doesn't exist""" for p in path: try: conf = conf[p] except KeyError: conf[p] = conf = {} return conf.setdefault(key, value) def unset(path, key, conf=_config): """Unset the value of property 'key'""" try: for p in path: conf = conf[p] del conf[key] except Exception: pass class apply(): """Context Manager: apply a collection of key-value pairs""" def __init__(self, kvlist): self.original = [] self.kvlist = kvlist def __enter__(self): for path, key, value in self.kvlist: self.original.append((path, key, get(path, key, util.SENTINEL))) set(path, key, value) def __exit__(self, etype, value, traceback): for path, key, value in self.original: if value is util.SENTINEL: unset(path, key) else: set(path, key, value) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/cookies.py0000644000175000017500000010775014571514301016562 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. # Adapted from yt-dlp's cookies module. # https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/cookies.py import binascii import contextlib import ctypes import logging import os import shutil import sqlite3 import struct import subprocess import sys import tempfile from hashlib import pbkdf2_hmac from http.cookiejar import Cookie from . import aes, text, util SUPPORTED_BROWSERS_CHROMIUM = { "brave", "chrome", "chromium", "edge", "opera", "vivaldi"} SUPPORTED_BROWSERS = SUPPORTED_BROWSERS_CHROMIUM | {"firefox", "safari"} logger = logging.getLogger("cookies") def load_cookies(cookiejar, browser_specification): browser_name, profile, keyring, container, domain = \ _parse_browser_specification(*browser_specification) if browser_name == "firefox": load_cookies_firefox(cookiejar, profile, container, domain) elif browser_name == "safari": load_cookies_safari(cookiejar, profile, domain) elif browser_name in SUPPORTED_BROWSERS_CHROMIUM: load_cookies_chrome(cookiejar, browser_name, profile, keyring, domain) else: raise ValueError("unknown browser '{}'".format(browser_name)) def load_cookies_firefox(cookiejar, profile=None, container=None, domain=None): path, container_id = _firefox_cookies_database(profile, container) with DatabaseConnection(path) as db: sql = ("SELECT name, value, host, path, isSecure, expiry " "FROM moz_cookies") parameters = () if container_id is False: sql += " WHERE NOT INSTR(originAttributes,'userContextId=')" elif container_id: sql += " WHERE originAttributes LIKE ? OR originAttributes LIKE ?" uid = "%userContextId={}".format(container_id) parameters = (uid, uid + "&%") elif domain: if domain[0] == ".": sql += " WHERE host == ? OR host LIKE ?" parameters = (domain[1:], "%" + domain) else: sql += " WHERE host == ? OR host == ?" parameters = (domain, "." 
+ domain) set_cookie = cookiejar.set_cookie for name, value, domain, path, secure, expires in db.execute( sql, parameters): set_cookie(Cookie( 0, name, value, None, False, domain, bool(domain), domain.startswith("."), path, bool(path), secure, expires, False, None, None, {}, )) _log_info("Extracted %s cookies from Firefox", len(cookiejar)) def load_cookies_safari(cookiejar, profile=None, domain=None): """Ref.: https://github.com/libyal/dtformats/blob /main/documentation/Safari%20Cookies.asciidoc - This data appears to be out of date but the important parts of the database structure is the same - There are a few bytes here and there which are skipped during parsing """ with _safari_cookies_database() as fp: data = fp.read() page_sizes, body_start = _safari_parse_cookies_header(data) p = DataParser(data[body_start:]) for page_size in page_sizes: _safari_parse_cookies_page(p.read_bytes(page_size), cookiejar) def load_cookies_chrome(cookiejar, browser_name, profile=None, keyring=None, domain=None): config = _get_chromium_based_browser_settings(browser_name) path = _chrome_cookies_database(profile, config) _log_debug("Extracting cookies from %s", path) with DatabaseConnection(path) as db: db.text_factory = bytes decryptor = get_cookie_decryptor( config["directory"], config["keyring"], keyring) if domain: if domain[0] == ".": condition = " WHERE host_key == ? OR host_key LIKE ?" parameters = (domain[1:], "%" + domain) else: condition = " WHERE host_key == ? OR host_key == ?" parameters = (domain, "." + domain) else: condition = "" parameters = () try: rows = db.execute( "SELECT host_key, name, value, encrypted_value, path, " "expires_utc, is_secure FROM cookies" + condition, parameters) except sqlite3.OperationalError: rows = db.execute( "SELECT host_key, name, value, encrypted_value, path, " "expires_utc, secure FROM cookies" + condition, parameters) set_cookie = cookiejar.set_cookie failed_cookies = 0 unencrypted_cookies = 0 for domain, name, value, enc_value, path, expires, secure in rows: if not value and enc_value: # encrypted value = decryptor.decrypt(enc_value) if value is None: failed_cookies += 1 continue else: value = value.decode() unencrypted_cookies += 1 domain = domain.decode() path = path.decode() name = name.decode() set_cookie(Cookie( 0, name, value, None, False, domain, bool(domain), domain.startswith("."), path, bool(path), secure, expires, False, None, None, {}, )) if failed_cookies > 0: failed_message = " ({} could not be decrypted)".format(failed_cookies) else: failed_message = "" _log_info("Extracted %s cookies from %s%s", len(cookiejar), browser_name.capitalize(), failed_message) counts = decryptor.cookie_counts counts["unencrypted"] = unencrypted_cookies _log_debug("Cookie version breakdown: %s", counts) # -------------------------------------------------------------------- # firefox def _firefox_cookies_database(profile=None, container=None): if not profile: search_root = _firefox_browser_directory() elif _is_path(profile): search_root = profile else: search_root = os.path.join(_firefox_browser_directory(), profile) path = _find_most_recently_used_file(search_root, "cookies.sqlite") if path is None: raise FileNotFoundError("Unable to find Firefox cookies database in " "{}".format(search_root)) _log_debug("Extracting cookies from %s", path) if container == "none": container_id = False _log_debug("Only loading cookies not belonging to any container") elif container: containers_path = os.path.join( os.path.dirname(path), "containers.json") try: with open(containers_path) 
as file: identities = util.json_loads(file.read())["identities"] except OSError: _log_error("Unable to read Firefox container database at '%s'", containers_path) raise except KeyError: identities = () for context in identities: if container == context.get("name") or container == text.extr( context.get("l10nID", ""), "userContext", ".label"): container_id = context["userContextId"] break else: raise ValueError("Unable to find Firefox container '{}'".format( container)) _log_debug("Only loading cookies from container '%s' (ID %s)", container, container_id) else: container_id = None return path, container_id def _firefox_browser_directory(): if sys.platform in ("win32", "cygwin"): return os.path.expandvars( r"%APPDATA%\Mozilla\Firefox\Profiles") if sys.platform == "darwin": return os.path.expanduser( "~/Library/Application Support/Firefox/Profiles") return os.path.expanduser("~/.mozilla/firefox") # -------------------------------------------------------------------- # safari def _safari_cookies_database(): try: path = os.path.expanduser("~/Library/Cookies/Cookies.binarycookies") return open(path, "rb") except FileNotFoundError: _log_debug("Trying secondary cookie location") path = os.path.expanduser("~/Library/Containers/com.apple.Safari/Data" "/Library/Cookies/Cookies.binarycookies") return open(path, "rb") def _safari_parse_cookies_header(data): p = DataParser(data) p.expect_bytes(b"cook", "database signature") number_of_pages = p.read_uint(big_endian=True) page_sizes = [p.read_uint(big_endian=True) for _ in range(number_of_pages)] return page_sizes, p.cursor def _safari_parse_cookies_page(data, cookiejar, domain=None): p = DataParser(data) p.expect_bytes(b"\x00\x00\x01\x00", "page signature") number_of_cookies = p.read_uint() record_offsets = [p.read_uint() for _ in range(number_of_cookies)] if number_of_cookies == 0: _log_debug("Cookies page of size %s has no cookies", len(data)) return p.skip_to(record_offsets[0], "unknown page header field") for i, record_offset in enumerate(record_offsets): p.skip_to(record_offset, "space between records") record_length = _safari_parse_cookies_record( data[record_offset:], cookiejar, domain) p.read_bytes(record_length) p.skip_to_end("space in between pages") def _safari_parse_cookies_record(data, cookiejar, host=None): p = DataParser(data) record_size = p.read_uint() p.skip(4, "unknown record field 1") flags = p.read_uint() is_secure = bool(flags & 0x0001) p.skip(4, "unknown record field 2") domain_offset = p.read_uint() name_offset = p.read_uint() path_offset = p.read_uint() value_offset = p.read_uint() p.skip(8, "unknown record field 3") expiration_date = _mac_absolute_time_to_posix(p.read_double()) _creation_date = _mac_absolute_time_to_posix(p.read_double()) # noqa: F841 try: p.skip_to(domain_offset) domain = p.read_cstring() if host: if host[0] == ".": if host[1:] != domain and not domain.endswith(host): return record_size else: if host != domain and ("." 
+ host) != domain: return record_size p.skip_to(name_offset) name = p.read_cstring() p.skip_to(path_offset) path = p.read_cstring() p.skip_to(value_offset) value = p.read_cstring() except UnicodeDecodeError: _log_warning("Failed to parse Safari cookie") return record_size p.skip_to(record_size, "space at the end of the record") cookiejar.set_cookie(Cookie( 0, name, value, None, False, domain, bool(domain), domain.startswith("."), path, bool(path), is_secure, expiration_date, False, None, None, {}, )) return record_size # -------------------------------------------------------------------- # chrome def _chrome_cookies_database(profile, config): if profile is None: search_root = config["directory"] elif _is_path(profile): search_root = profile config["directory"] = (os.path.dirname(profile) if config["profiles"] else profile) elif config["profiles"]: search_root = os.path.join(config["directory"], profile) else: _log_warning("%s does not support profiles", config["browser"]) search_root = config["directory"] path = _find_most_recently_used_file(search_root, "Cookies") if path is None: raise FileNotFoundError("Unable to find {} cookies database in " "'{}'".format(config["browser"], search_root)) return path def _get_chromium_based_browser_settings(browser_name): # https://chromium.googlesource.com/chromium # /src/+/HEAD/docs/user_data_dir.md join = os.path.join if sys.platform in ("win32", "cygwin"): appdata_local = os.path.expandvars("%LOCALAPPDATA%") appdata_roaming = os.path.expandvars("%APPDATA%") browser_dir = { "brave" : join(appdata_local, R"BraveSoftware\Brave-Browser\User Data"), "chrome" : join(appdata_local, R"Google\Chrome\User Data"), "chromium": join(appdata_local, R"Chromium\User Data"), "edge" : join(appdata_local, R"Microsoft\Edge\User Data"), "opera" : join(appdata_roaming, R"Opera Software\Opera Stable"), "vivaldi" : join(appdata_local, R"Vivaldi\User Data"), }[browser_name] elif sys.platform == "darwin": appdata = os.path.expanduser("~/Library/Application Support") browser_dir = { "brave" : join(appdata, "BraveSoftware/Brave-Browser"), "chrome" : join(appdata, "Google/Chrome"), "chromium": join(appdata, "Chromium"), "edge" : join(appdata, "Microsoft Edge"), "opera" : join(appdata, "com.operasoftware.Opera"), "vivaldi" : join(appdata, "Vivaldi"), }[browser_name] else: config = (os.environ.get("XDG_CONFIG_HOME") or os.path.expanduser("~/.config")) browser_dir = { "brave" : join(config, "BraveSoftware/Brave-Browser"), "chrome" : join(config, "google-chrome"), "chromium": join(config, "chromium"), "edge" : join(config, "microsoft-edge"), "opera" : join(config, "opera"), "vivaldi" : join(config, "vivaldi"), }[browser_name] # Linux keyring names can be determined by snooping on dbus # while opening the browser in KDE: # dbus-monitor "interface="org.kde.KWallet"" "type=method_return" keyring_name = { "brave" : "Brave", "chrome" : "Chrome", "chromium": "Chromium", "edge" : "Microsoft Edge" if sys.platform == "darwin" else "Chromium", "opera" : "Opera" if sys.platform == "darwin" else "Chromium", "vivaldi" : "Vivaldi" if sys.platform == "darwin" else "Chrome", }[browser_name] browsers_without_profiles = {"opera"} return { "browser" : browser_name, "directory": browser_dir, "keyring" : keyring_name, "profiles" : browser_name not in browsers_without_profiles } class ChromeCookieDecryptor: """ Overview: Linux: - cookies are either v10 or v11 - v10: AES-CBC encrypted with a fixed key - v11: AES-CBC encrypted with an OS protected key (keyring) - v11 keys can be stored in various places 
depending on the activate desktop environment [2] Mac: - cookies are either v10 or not v10 - v10: AES-CBC encrypted with an OS protected key (keyring) and more key derivation iterations than linux - not v10: "old data" stored as plaintext Windows: - cookies are either v10 or not v10 - v10: AES-GCM encrypted with a key which is encrypted with DPAPI - not v10: encrypted with DPAPI Sources: - [1] https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/ - [2] https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/key_storage_linux.cc - KeyStorageLinux::CreateService """ def decrypt(self, encrypted_value): raise NotImplementedError("Must be implemented by sub classes") @property def cookie_counts(self): raise NotImplementedError("Must be implemented by sub classes") def get_cookie_decryptor(browser_root, browser_keyring_name, keyring=None): if sys.platform in ("win32", "cygwin"): return WindowsChromeCookieDecryptor(browser_root) elif sys.platform == "darwin": return MacChromeCookieDecryptor(browser_keyring_name) else: return LinuxChromeCookieDecryptor(browser_keyring_name, keyring) class LinuxChromeCookieDecryptor(ChromeCookieDecryptor): def __init__(self, browser_keyring_name, keyring=None): self._v10_key = self.derive_key(b"peanuts") password = _get_linux_keyring_password(browser_keyring_name, keyring) self._v11_key = None if password is None else self.derive_key(password) self._cookie_counts = {"v10": 0, "v11": 0, "other": 0} @staticmethod def derive_key(password): # values from # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_linux.cc return pbkdf2_sha1(password, salt=b"saltysalt", iterations=1, key_length=16) @property def cookie_counts(self): return self._cookie_counts def decrypt(self, encrypted_value): version = encrypted_value[:3] ciphertext = encrypted_value[3:] if version == b"v10": self._cookie_counts["v10"] += 1 return _decrypt_aes_cbc(ciphertext, self._v10_key) elif version == b"v11": self._cookie_counts["v11"] += 1 if self._v11_key is None: _log_warning("Unable to decrypt v11 cookies: no key found") return None return _decrypt_aes_cbc(ciphertext, self._v11_key) else: self._cookie_counts["other"] += 1 return None class MacChromeCookieDecryptor(ChromeCookieDecryptor): def __init__(self, browser_keyring_name): password = _get_mac_keyring_password(browser_keyring_name) self._v10_key = None if password is None else self.derive_key(password) self._cookie_counts = {"v10": 0, "other": 0} @staticmethod def derive_key(password): # values from # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_mac.mm return pbkdf2_sha1(password, salt=b"saltysalt", iterations=1003, key_length=16) @property def cookie_counts(self): return self._cookie_counts def decrypt(self, encrypted_value): version = encrypted_value[:3] ciphertext = encrypted_value[3:] if version == b"v10": self._cookie_counts["v10"] += 1 if self._v10_key is None: _log_warning("Unable to decrypt v10 cookies: no key found") return None return _decrypt_aes_cbc(ciphertext, self._v10_key) else: self._cookie_counts["other"] += 1 # other prefixes are considered "old data", # which were stored as plaintext # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_mac.mm return encrypted_value class WindowsChromeCookieDecryptor(ChromeCookieDecryptor): def __init__(self, browser_root): self._v10_key = _get_windows_v10_key(browser_root) 
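# ------------------------------------------------------------------
# Illustrative sketch (not part of gallery-dl): the "v10" keys used by
# LinuxChromeCookieDecryptor and MacChromeCookieDecryptor above are
# plain PBKDF2-SHA1 derivations and can be reproduced with the standard
# library alone.  The constants mirror the derive_key() methods: Linux
# uses the fixed password b"peanuts" with 1 iteration, macOS uses the
# keychain password with 1003 iterations; both use the salt
# b"saltysalt" and a 16-byte key.
import hashlib
_example_linux_v10_key = hashlib.pbkdf2_hmac(
    "sha1", b"peanuts", b"saltysalt", 1, 16)
assert len(_example_linux_v10_key) == 16
# ------------------------------------------------------------------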
self._cookie_counts = {"v10": 0, "other": 0} @property def cookie_counts(self): return self._cookie_counts def decrypt(self, encrypted_value): version = encrypted_value[:3] ciphertext = encrypted_value[3:] if version == b"v10": self._cookie_counts["v10"] += 1 if self._v10_key is None: _log_warning("Unable to decrypt v10 cookies: no key found") return None # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_win.cc # kNonceLength nonce_length = 96 // 8 # boringssl # EVP_AEAD_AES_GCM_TAG_LEN authentication_tag_length = 16 raw_ciphertext = ciphertext nonce = raw_ciphertext[:nonce_length] ciphertext = raw_ciphertext[ nonce_length:-authentication_tag_length] authentication_tag = raw_ciphertext[-authentication_tag_length:] return _decrypt_aes_gcm( ciphertext, self._v10_key, nonce, authentication_tag) else: self._cookie_counts["other"] += 1 # any other prefix means the data is DPAPI encrypted # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_win.cc return _decrypt_windows_dpapi(encrypted_value).decode() # -------------------------------------------------------------------- # keyring def _choose_linux_keyring(): """ https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/key_storage_util_linux.cc SelectBackend """ desktop_environment = _get_linux_desktop_environment(os.environ) _log_debug("Detected desktop environment: %s", desktop_environment) if desktop_environment == DE_KDE: return KEYRING_KWALLET if desktop_environment == DE_OTHER: return KEYRING_BASICTEXT return KEYRING_GNOMEKEYRING def _get_kwallet_network_wallet(): """ The name of the wallet used to store network passwords. https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/kwallet_dbus.cc KWalletDBus::NetworkWallet which does a dbus call to the following function: https://api.kde.org/frameworks/kwallet/html/classKWallet_1_1Wallet.html Wallet::NetworkWallet """ default_wallet = "kdewallet" try: proc, stdout = Popen_communicate( "dbus-send", "--session", "--print-reply=literal", "--dest=org.kde.kwalletd5", "/modules/kwalletd5", "org.kde.KWallet.networkWallet" ) if proc.returncode != 0: _log_warning("Failed to read NetworkWallet") return default_wallet else: network_wallet = stdout.decode().strip() _log_debug("NetworkWallet = '%s'", network_wallet) return network_wallet except Exception as exc: _log_warning("Error while obtaining NetworkWallet (%s: %s)", exc.__class__.__name__, exc) return default_wallet def _get_kwallet_password(browser_keyring_name): _log_debug("Using kwallet-query to obtain password from kwallet") if shutil.which("kwallet-query") is None: _log_error( "kwallet-query command not found. KWallet and kwallet-query " "must be installed to read from KWallet. kwallet-query should be " "included in the kwallet package for your distribution") return b"" network_wallet = _get_kwallet_network_wallet() try: proc, stdout = Popen_communicate( "kwallet-query", "--read-password", browser_keyring_name + " Safe Storage", "--folder", browser_keyring_name + " Keys", network_wallet, ) if proc.returncode != 0: _log_error("kwallet-query failed with return code {}. " "Please consult the kwallet-query man page " "for details".format(proc.returncode)) return b"" if stdout.lower().startswith(b"failed to read"): _log_debug("Failed to read password from kwallet. 
" "Using empty string instead") # This sometimes occurs in KDE because chrome does not check # hasEntry and instead just tries to read the value (which # kwallet returns "") whereas kwallet-query checks hasEntry. # To verify this: # dbus-monitor "interface="org.kde.KWallet"" "type=method_return" # while starting chrome. # This may be a bug, as the intended behaviour is to generate a # random password and store it, but that doesn't matter here. return b"" else: if stdout[-1:] == b"\n": stdout = stdout[:-1] return stdout except Exception as exc: _log_warning("Error when running kwallet-query (%s: %s)", exc.__class__.__name__, exc) return b"" def _get_gnome_keyring_password(browser_keyring_name): try: import secretstorage except ImportError: _log_error("'secretstorage' Python package not available") return b"" # Gnome keyring does not seem to organise keys in the same way as KWallet, # using `dbus-monitor` during startup, it can be observed that chromium # lists all keys and presumably searches for its key in the list. # It appears that we must do the same. # https://github.com/jaraco/keyring/issues/556 with contextlib.closing(secretstorage.dbus_init()) as con: col = secretstorage.get_default_collection(con) label = browser_keyring_name + " Safe Storage" for item in col.get_all_items(): if item.get_label() == label: return item.get_secret() else: _log_error("Failed to read from GNOME keyring") return b"" def _get_linux_keyring_password(browser_keyring_name, keyring): # Note: chrome/chromium can be run with the following flags # to determine which keyring backend it has chosen to use # - chromium --enable-logging=stderr --v=1 2>&1 | grep key_storage_ # # Chromium supports --password-store= # so the automatic detection will not be sufficient in all cases. 
if not keyring: keyring = _choose_linux_keyring() _log_debug("Chosen keyring: %s", keyring) if keyring == KEYRING_KWALLET: return _get_kwallet_password(browser_keyring_name) elif keyring == KEYRING_GNOMEKEYRING: return _get_gnome_keyring_password(browser_keyring_name) elif keyring == KEYRING_BASICTEXT: # when basic text is chosen, all cookies are stored as v10 # so no keyring password is required return None assert False, "Unknown keyring " + keyring def _get_mac_keyring_password(browser_keyring_name): _log_debug("Using find-generic-password to obtain " "password from OSX keychain") try: proc, stdout = Popen_communicate( "security", "find-generic-password", "-w", # write password to stdout "-a", browser_keyring_name, # match "account" "-s", browser_keyring_name + " Safe Storage", # match "service" ) if stdout[-1:] == b"\n": stdout = stdout[:-1] return stdout except Exception as exc: _log_warning("Error when using find-generic-password (%s: %s)", exc.__class__.__name__, exc) return None def _get_windows_v10_key(browser_root): path = _find_most_recently_used_file(browser_root, "Local State") if path is None: _log_error("Unable to find Local State file") return None _log_debug("Found Local State file at '%s'", path) with open(path, encoding="utf-8") as file: data = util.json_loads(file.read()) try: base64_key = data["os_crypt"]["encrypted_key"] except KeyError: _log_error("Unable to find encrypted key in Local State") return None encrypted_key = binascii.a2b_base64(base64_key) prefix = b"DPAPI" if not encrypted_key.startswith(prefix): _log_error("Invalid Local State key") return None return _decrypt_windows_dpapi(encrypted_key[len(prefix):]) # -------------------------------------------------------------------- # utility class ParserError(Exception): pass class DataParser: def __init__(self, data): self.cursor = 0 self._data = data def read_bytes(self, num_bytes): if num_bytes < 0: raise ParserError("invalid read of {} bytes".format(num_bytes)) end = self.cursor + num_bytes if end > len(self._data): raise ParserError("reached end of input") data = self._data[self.cursor:end] self.cursor = end return data def expect_bytes(self, expected_value, message): value = self.read_bytes(len(expected_value)) if value != expected_value: raise ParserError("unexpected value: {} != {} ({})".format( value, expected_value, message)) def read_uint(self, big_endian=False): data_format = ">I" if big_endian else " 0: _log_debug("Skipping {} bytes ({}): {!r}".format( num_bytes, description, self.read_bytes(num_bytes))) elif num_bytes < 0: raise ParserError("Invalid skip of {} bytes".format(num_bytes)) def skip_to(self, offset, description="unknown"): self.skip(offset - self.cursor, description) def skip_to_end(self, description="unknown"): self.skip_to(len(self._data), description) class DatabaseConnection(): def __init__(self, path): self.path = path self.database = None self.directory = None def __enter__(self): try: # https://www.sqlite.org/uri.html#the_uri_path path = self.path.replace("?", "%3f").replace("#", "%23") if util.WINDOWS: path = "/" + os.path.abspath(path) uri = "file:{}?mode=ro&immutable=1".format(path) self.database = sqlite3.connect( uri, uri=True, isolation_level=None, check_same_thread=False) return self.database except Exception as exc: _log_debug("Falling back to temporary database copy (%s: %s)", exc.__class__.__name__, exc) try: self.directory = tempfile.TemporaryDirectory(prefix="gallery-dl-") path_copy = os.path.join(self.directory.name, "copy.sqlite") shutil.copyfile(self.path, 
path_copy) self.database = sqlite3.connect( path_copy, isolation_level=None, check_same_thread=False) return self.database except BaseException: if self.directory: self.directory.cleanup() raise def __exit__(self, exc, value, tb): self.database.close() if self.directory: self.directory.cleanup() def Popen_communicate(*args): proc = subprocess.Popen( args, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL) try: stdout, stderr = proc.communicate() except BaseException: # Including KeyboardInterrupt proc.kill() proc.wait() raise return proc, stdout """ https://chromium.googlesource.com/chromium/src/+/refs/heads /main/base/nix/xdg_util.h - DesktopEnvironment """ DE_OTHER = "other" DE_CINNAMON = "cinnamon" DE_GNOME = "gnome" DE_KDE = "kde" DE_PANTHEON = "pantheon" DE_UNITY = "unity" DE_XFCE = "xfce" """ https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/key_storage_util_linux.h - SelectedLinuxBackend """ KEYRING_KWALLET = "kwallet" KEYRING_GNOMEKEYRING = "gnomekeyring" KEYRING_BASICTEXT = "basictext" SUPPORTED_KEYRINGS = {"kwallet", "gnomekeyring", "basictext"} def _get_linux_desktop_environment(env): """ Ref: https://chromium.googlesource.com/chromium/src/+/refs/heads /main/base/nix/xdg_util.cc - GetDesktopEnvironment """ xdg_current_desktop = env.get("XDG_CURRENT_DESKTOP") desktop_session = env.get("DESKTOP_SESSION") if xdg_current_desktop: xdg_current_desktop = (xdg_current_desktop.partition(":")[0] .strip().lower()) if xdg_current_desktop == "unity": if desktop_session and "gnome-fallback" in desktop_session: return DE_GNOME else: return DE_UNITY elif xdg_current_desktop == "gnome": return DE_GNOME elif xdg_current_desktop == "x-cinnamon": return DE_CINNAMON elif xdg_current_desktop == "kde": return DE_KDE elif xdg_current_desktop == "pantheon": return DE_PANTHEON elif xdg_current_desktop == "xfce": return DE_XFCE if desktop_session: if desktop_session in ("mate", "gnome"): return DE_GNOME if "kde" in desktop_session: return DE_KDE if "xfce" in desktop_session: return DE_XFCE if "GNOME_DESKTOP_SESSION_ID" in env: return DE_GNOME if "KDE_FULL_SESSION" in env: return DE_KDE return DE_OTHER def _mac_absolute_time_to_posix(timestamp): # 978307200 is timestamp of 2001-01-01 00:00:00 return 978307200 + int(timestamp) def pbkdf2_sha1(password, salt, iterations, key_length): return pbkdf2_hmac("sha1", password, salt, iterations, key_length) def _decrypt_aes_cbc(ciphertext, key, initialization_vector=b" " * 16): try: return aes.unpad_pkcs7(aes.aes_cbc_decrypt_bytes( ciphertext, key, initialization_vector)).decode() except UnicodeDecodeError: _log_warning("Failed to decrypt cookie (AES-CBC Unicode)") except ValueError: _log_warning("Failed to decrypt cookie (AES-CBC)") return None def _decrypt_aes_gcm(ciphertext, key, nonce, authentication_tag): try: return aes.aes_gcm_decrypt_and_verify_bytes( ciphertext, key, authentication_tag, nonce).decode() except UnicodeDecodeError: _log_warning("Failed to decrypt cookie (AES-GCM Unicode)") except ValueError: _log_warning("Failed to decrypt cookie (AES-GCM MAC)") return None def _decrypt_windows_dpapi(ciphertext): """ References: - https://docs.microsoft.com/en-us/windows /win32/api/dpapi/nf-dpapi-cryptunprotectdata """ from ctypes.wintypes import DWORD class DATA_BLOB(ctypes.Structure): _fields_ = [("cbData", DWORD), ("pbData", ctypes.POINTER(ctypes.c_char))] buffer = ctypes.create_string_buffer(ciphertext) blob_in = DATA_BLOB(ctypes.sizeof(buffer), buffer) blob_out = DATA_BLOB() ret = 
ctypes.windll.crypt32.CryptUnprotectData( ctypes.byref(blob_in), # pDataIn None, # ppszDataDescr: human readable description of pDataIn None, # pOptionalEntropy: salt? None, # pvReserved: must be NULL None, # pPromptStruct: information about prompts to display 0, # dwFlags ctypes.byref(blob_out) # pDataOut ) if not ret: _log_warning("Failed to decrypt cookie (DPAPI)") return None result = ctypes.string_at(blob_out.pbData, blob_out.cbData) ctypes.windll.kernel32.LocalFree(blob_out.pbData) return result def _find_most_recently_used_file(root, filename): # if there are multiple browser profiles, take the most recently used one paths = [] for curr_root, dirs, files in os.walk(root): for file in files: if file == filename: paths.append(os.path.join(curr_root, file)) if not paths: return None return max(paths, key=lambda path: os.lstat(path).st_mtime) def _is_path(value): return os.path.sep in value def _parse_browser_specification( browser, profile=None, keyring=None, container=None, domain=None): browser = browser.lower() if browser not in SUPPORTED_BROWSERS: raise ValueError("Unsupported browser '{}'".format(browser)) if keyring and keyring not in SUPPORTED_KEYRINGS: raise ValueError("Unsupported keyring '{}'".format(keyring)) if profile and _is_path(profile): profile = os.path.expanduser(profile) return browser, profile, keyring, container, domain _log_cache = set() _log_debug = logger.debug _log_info = logger.info def _log_warning(msg, *args): if msg not in _log_cache: _log_cache.add(msg) logger.warning(msg, *args) def _log_error(msg, *args): if msg not in _log_cache: _log_cache.add(msg) logger.error(msg, *args) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1711211673.9824722 gallery_dl-1.26.9/gallery_dl/downloader/0000755000175000017500000000000014577602232016707 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1708353731.0 gallery_dl-1.26.9/gallery_dl/downloader/__init__.py0000644000175000017500000000176414564664303021033 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Downloader modules""" modules = [ "http", "text", "ytdl", ] def find(scheme): """Return downloader class suitable for handling the given scheme""" try: return _cache[scheme] except KeyError: pass cls = None if scheme == "https": scheme = "http" if scheme in modules: # prevent unwanted imports try: module = __import__(scheme, globals(), None, (), 1) except ImportError: pass else: cls = module.__downloader__ if scheme == "http": _cache["http"] = _cache["https"] = cls else: _cache[scheme] = cls return cls # -------------------------------------------------------------------- # internals _cache = {} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1708353813.0 gallery_dl-1.26.9/gallery_dl/downloader/common.py0000644000175000017500000000250114564664425020557 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Common classes and constants used by downloader modules.""" import os from .. 
import config, util class DownloaderBase(): """Base class for downloaders""" scheme = "" def __init__(self, job): self.out = job.out self.session = job.extractor.session self.part = self.config("part", True) self.partdir = self.config("part-directory") self.log = job.get_logger("downloader." + self.scheme) if self.partdir: self.partdir = util.expand_path(self.partdir) os.makedirs(self.partdir, exist_ok=True) proxies = self.config("proxy", util.SENTINEL) if proxies is util.SENTINEL: self.proxies = job.extractor._proxies else: self.proxies = util.build_proxy_map(proxies, self.log) def config(self, key, default=None): """Interpolate downloader config value for 'key'""" return config.interpolate(("downloader", self.scheme), key, default) def download(self, url, pathfmt): """Write data from 'url' into the file specified by 'pathfmt'""" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709823960.0 gallery_dl-1.26.9/gallery_dl/downloader/http.py0000644000175000017500000004126114572353730020245 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Downloader module for http:// and https:// URLs""" import time import mimetypes from requests.exceptions import RequestException, ConnectionError, Timeout from .common import DownloaderBase from .. import text, util from ssl import SSLError class HttpDownloader(DownloaderBase): scheme = "http" def __init__(self, job): DownloaderBase.__init__(self, job) extractor = job.extractor self.downloading = False self.adjust_extension = self.config("adjust-extensions", True) self.chunk_size = self.config("chunk-size", 32768) self.metadata = extractor.config("http-metadata") self.progress = self.config("progress", 3.0) self.validate = self.config("validate", True) self.headers = self.config("headers") self.minsize = self.config("filesize-min") self.maxsize = self.config("filesize-max") self.retries = self.config("retries", extractor._retries) self.retry_codes = self.config("retry-codes", extractor._retry_codes) self.timeout = self.config("timeout", extractor._timeout) self.verify = self.config("verify", extractor._verify) self.mtime = self.config("mtime", True) self.rate = self.config("rate") if not self.config("consume-content", False): # this resets the underlying TCP connection, and therefore # if the program makes another request to the same domain, # a new connection (either TLS or plain TCP) must be made self.release_conn = lambda resp: resp.close() if self.retries < 0: self.retries = float("inf") if self.minsize: minsize = text.parse_bytes(self.minsize) if not minsize: self.log.warning( "Invalid minimum file size (%r)", self.minsize) self.minsize = minsize if self.maxsize: maxsize = text.parse_bytes(self.maxsize) if not maxsize: self.log.warning( "Invalid maximum file size (%r)", self.maxsize) self.maxsize = maxsize if isinstance(self.chunk_size, str): chunk_size = text.parse_bytes(self.chunk_size) if not chunk_size: self.log.warning( "Invalid chunk size (%r)", self.chunk_size) chunk_size = 32768 self.chunk_size = chunk_size if self.rate: rate = text.parse_bytes(self.rate) if rate: if rate < self.chunk_size: self.chunk_size = rate self.rate = rate self.receive = self._receive_rate else: self.log.warning("Invalid rate limit (%r)", self.rate) if self.progress is not None: self.receive = self._receive_rate if self.progress 
< 0.0: self.progress = 0.0 def download(self, url, pathfmt): try: return self._download_impl(url, pathfmt) except Exception: print() raise finally: # remove file from incomplete downloads if self.downloading and not self.part: util.remove_file(pathfmt.temppath) def _download_impl(self, url, pathfmt): response = None tries = 0 msg = "" metadata = self.metadata kwdict = pathfmt.kwdict adjust_extension = kwdict.get( "_http_adjust_extension", self.adjust_extension) if self.part and not metadata: pathfmt.part_enable(self.partdir) while True: if tries: if response: self.release_conn(response) response = None self.log.warning("%s (%s/%s)", msg, tries, self.retries+1) if tries > self.retries: return False time.sleep(tries) tries += 1 file_header = None # collect HTTP headers headers = {"Accept": "*/*"} # file-specific headers extra = kwdict.get("_http_headers") if extra: headers.update(extra) # general headers if self.headers: headers.update(self.headers) # partial content file_size = pathfmt.part_size() if file_size: headers["Range"] = "bytes={}-".format(file_size) # connect to (remote) source try: response = self.session.request( kwdict.get("_http_method", "GET"), url, stream=True, headers=headers, data=kwdict.get("_http_data"), timeout=self.timeout, proxies=self.proxies, verify=self.verify, ) except (ConnectionError, Timeout) as exc: msg = str(exc) continue except Exception as exc: self.log.warning(exc) return False # check response code = response.status_code if code == 200: # OK offset = 0 size = response.headers.get("Content-Length") elif code == 206: # Partial Content offset = file_size size = response.headers["Content-Range"].rpartition("/")[2] elif code == 416 and file_size: # Requested Range Not Satisfiable break else: msg = "'{} {}' for '{}'".format(code, response.reason, url) if code in self.retry_codes or 500 <= code < 600: continue retry = kwdict.get("_http_retry") if retry and retry(response): continue self.release_conn(response) self.log.warning(msg) return False # check for invalid responses validate = kwdict.get("_http_validate") if validate and self.validate: try: result = validate(response) except Exception: self.release_conn(response) raise if isinstance(result, str): url = result tries -= 1 continue if not result: self.release_conn(response) self.log.warning("Invalid response") return False # check file size size = text.parse_int(size, None) if size is not None: if self.minsize and size < self.minsize: self.release_conn(response) self.log.warning( "File size smaller than allowed minimum (%s < %s)", size, self.minsize) pathfmt.temppath = "" return True if self.maxsize and size > self.maxsize: self.release_conn(response) self.log.warning( "File size larger than allowed maximum (%s > %s)", size, self.maxsize) pathfmt.temppath = "" return True build_path = False # set missing filename extension from MIME type if not pathfmt.extension: pathfmt.set_extension(self._find_extension(response)) build_path = True # set metadata from HTTP headers if metadata: kwdict[metadata] = util.extract_headers(response) build_path = True # build and check file path if build_path: pathfmt.build_path() if pathfmt.exists(): pathfmt.temppath = "" # release the connection back to pool by explicitly # calling .close() # see https://requests.readthedocs.io/en/latest/user # /advanced/#body-content-workflow # when the image size is on the order of megabytes, # re-establishing a TLS connection will typically be faster # than consuming the whole response response.close() return True if self.part and metadata: 
pathfmt.part_enable(self.partdir) metadata = False content = response.iter_content(self.chunk_size) # check filename extension against file header if adjust_extension and not offset and \ pathfmt.extension in SIGNATURE_CHECKS: try: file_header = next( content if response.raw.chunked else response.iter_content(16), b"") except (RequestException, SSLError) as exc: msg = str(exc) print() continue if self._adjust_extension(pathfmt, file_header) and \ pathfmt.exists(): pathfmt.temppath = "" response.close() return True # set open mode if not offset: mode = "w+b" if file_size: self.log.debug("Unable to resume partial download") else: mode = "r+b" self.log.debug("Resuming download at byte %d", offset) # download content self.downloading = True with pathfmt.open(mode) as fp: if file_header: fp.write(file_header) offset += len(file_header) elif offset: if adjust_extension and \ pathfmt.extension in SIGNATURE_CHECKS: self._adjust_extension(pathfmt, fp.read(16)) fp.seek(offset) self.out.start(pathfmt.path) try: self.receive(fp, content, size, offset) except (RequestException, SSLError) as exc: msg = str(exc) print() continue # check file size if size and fp.tell() < size: msg = "file size mismatch ({} < {})".format( fp.tell(), size) print() continue break self.downloading = False if self.mtime: kwdict.setdefault("_mtime", response.headers.get("Last-Modified")) else: kwdict["_mtime"] = None return True def release_conn(self, response): """Release connection back to pool by consuming response body""" try: for _ in response.iter_content(self.chunk_size): pass except (RequestException, SSLError) as exc: print() self.log.debug( "Unable to consume response body (%s: %s); " "closing the connection anyway", exc.__class__.__name__, exc) response.close() @staticmethod def receive(fp, content, bytes_total, bytes_start): write = fp.write for data in content: write(data) def _receive_rate(self, fp, content, bytes_total, bytes_start): rate = self.rate write = fp.write progress = self.progress bytes_downloaded = 0 time_start = time.monotonic() for data in content: time_elapsed = time.monotonic() - time_start bytes_downloaded += len(data) write(data) if progress is not None: if time_elapsed > progress: self.out.progress( bytes_total, bytes_start + bytes_downloaded, int(bytes_downloaded / time_elapsed), ) if rate: time_expected = bytes_downloaded / rate if time_expected > time_elapsed: time.sleep(time_expected - time_elapsed) def _find_extension(self, response): """Get filename extension from MIME type""" mtype = response.headers.get("Content-Type", "image/jpeg") mtype = mtype.partition(";")[0] if "/" not in mtype: mtype = "image/" + mtype if mtype in MIME_TYPES: return MIME_TYPES[mtype] ext = mimetypes.guess_extension(mtype, strict=False) if ext: return ext[1:] self.log.warning("Unknown MIME type '%s'", mtype) return "bin" @staticmethod def _adjust_extension(pathfmt, file_header): """Check filename extension against file header""" if not SIGNATURE_CHECKS[pathfmt.extension](file_header): for ext, check in SIGNATURE_CHECKS.items(): if check(file_header): pathfmt.set_extension(ext) pathfmt.build_path() return True return False MIME_TYPES = { "image/jpeg" : "jpg", "image/jpg" : "jpg", "image/png" : "png", "image/gif" : "gif", "image/bmp" : "bmp", "image/x-bmp" : "bmp", "image/x-ms-bmp": "bmp", "image/webp" : "webp", "image/avif" : "avif", "image/heic" : "heic", "image/heif" : "heif", "image/svg+xml" : "svg", "image/ico" : "ico", "image/icon" : "ico", "image/x-icon" : "ico", "image/vnd.microsoft.icon" : "ico", 
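# (Descriptive note: _find_extension() above consults this table
# first, then falls back to mimetypes.guess_extension(), and finally
# to a generic "bin".  The SIGNATURE_CHECKS table further below is
# what _adjust_extension() uses to correct a mismatched extension from
# the first 16 bytes of the downloaded file.)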
"image/x-photoshop" : "psd", "application/x-photoshop" : "psd", "image/vnd.adobe.photoshop": "psd", "video/webm": "webm", "video/ogg" : "ogg", "video/mp4" : "mp4", "video/quicktime": "mov", "audio/wav" : "wav", "audio/x-wav": "wav", "audio/webm" : "webm", "audio/ogg" : "ogg", "audio/mpeg" : "mp3", "application/zip" : "zip", "application/x-zip": "zip", "application/x-zip-compressed": "zip", "application/rar" : "rar", "application/x-rar": "rar", "application/x-rar-compressed": "rar", "application/x-7z-compressed" : "7z", "application/pdf" : "pdf", "application/x-pdf": "pdf", "application/x-shockwave-flash": "swf", "application/ogg": "ogg", # https://www.iana.org/assignments/media-types/model/obj "model/obj": "obj", "application/octet-stream": "bin", } # https://en.wikipedia.org/wiki/List_of_file_signatures SIGNATURE_CHECKS = { "jpg" : lambda s: s[0:3] == b"\xFF\xD8\xFF", "png" : lambda s: s[0:8] == b"\x89PNG\r\n\x1A\n", "gif" : lambda s: s[0:6] in (b"GIF87a", b"GIF89a"), "bmp" : lambda s: s[0:2] == b"BM", "webp": lambda s: (s[0:4] == b"RIFF" and s[8:12] == b"WEBP"), "avif": lambda s: s[4:11] == b"ftypavi" and s[11] in b"fs", "heic": lambda s: (s[4:10] == b"ftyphe" and s[10:12] in ( b"ic", b"im", b"is", b"ix", b"vc", b"vm", b"vs")), "svg" : lambda s: s[0:5] == b"= 0 else float("inf"), "socket_timeout": self.config("timeout", extractor._timeout), "nocheckcertificate": not self.config("verify", extractor._verify), "proxy": self.proxies.get("http") if self.proxies else None, } self.ytdl_instance = None self.forward_cookies = self.config("forward-cookies", False) self.progress = self.config("progress", 3.0) self.outtmpl = self.config("outtmpl") def download(self, url, pathfmt): kwdict = pathfmt.kwdict ytdl_instance = kwdict.pop("_ytdl_instance", None) if not ytdl_instance: ytdl_instance = self.ytdl_instance if not ytdl_instance: try: module = ytdl.import_module(self.config("module")) except ImportError as exc: self.log.error("Cannot import module '%s'", exc.name) self.log.debug("", exc_info=True) self.download = lambda u, p: False return False self.ytdl_instance = ytdl_instance = ytdl.construct_YoutubeDL( module, self, self.ytdl_opts) if self.outtmpl == "default": self.outtmpl = module.DEFAULT_OUTTMPL if self.forward_cookies: set_cookie = ytdl_instance.cookiejar.set_cookie for cookie in self.session.cookies: set_cookie(cookie) if self.progress is not None and not ytdl_instance._progress_hooks: ytdl_instance.add_progress_hook(self._progress_hook) info_dict = kwdict.pop("_ytdl_info_dict", None) if not info_dict: try: info_dict = ytdl_instance.extract_info(url[5:], download=False) except Exception: pass if not info_dict: return False if "entries" in info_dict: index = kwdict.get("_ytdl_index") if index is None: return self._download_playlist( ytdl_instance, pathfmt, info_dict) else: info_dict = info_dict["entries"][index] extra = kwdict.get("_ytdl_extra") if extra: info_dict.update(extra) return self._download_video(ytdl_instance, pathfmt, info_dict) def _download_video(self, ytdl_instance, pathfmt, info_dict): if "url" in info_dict: text.nameext_from_url(info_dict["url"], pathfmt.kwdict) formats = info_dict.get("requested_formats") if formats and not compatible_formats(formats): info_dict["ext"] = "mkv" if self.outtmpl: self._set_outtmpl(ytdl_instance, self.outtmpl) pathfmt.filename = filename = \ ytdl_instance.prepare_filename(info_dict) pathfmt.extension = info_dict["ext"] pathfmt.path = pathfmt.directory + filename pathfmt.realpath = pathfmt.temppath = ( pathfmt.realdirectory + filename) else: 
pathfmt.set_extension(info_dict["ext"]) pathfmt.build_path() if pathfmt.exists(): pathfmt.temppath = "" return True if self.part and self.partdir: pathfmt.temppath = os.path.join( self.partdir, pathfmt.filename) self._set_outtmpl(ytdl_instance, pathfmt.temppath.replace("%", "%%")) self.out.start(pathfmt.path) try: ytdl_instance.process_info(info_dict) except Exception: self.log.debug("Traceback", exc_info=True) return False return True def _download_playlist(self, ytdl_instance, pathfmt, info_dict): pathfmt.set_extension("%(playlist_index)s.%(ext)s") pathfmt.build_path() self._set_outtmpl(ytdl_instance, pathfmt.realpath) for entry in info_dict["entries"]: ytdl_instance.process_info(entry) return True def _progress_hook(self, info): if info["status"] == "downloading" and \ info["elapsed"] >= self.progress: total = info.get("total_bytes") or info.get("total_bytes_estimate") speed = info.get("speed") self.out.progress( None if total is None else int(total), info["downloaded_bytes"], int(speed) if speed else 0, ) @staticmethod def _set_outtmpl(ytdl_instance, outtmpl): try: ytdl_instance._parse_outtmpl except AttributeError: try: ytdl_instance.outtmpl_dict["default"] = outtmpl except AttributeError: ytdl_instance.params["outtmpl"] = outtmpl else: ytdl_instance.params["outtmpl"] = {"default": outtmpl} def compatible_formats(formats): """Returns True if 'formats' are compatible for merge""" video_ext = formats[0].get("ext") audio_ext = formats[1].get("ext") if video_ext == "webm" and audio_ext == "webm": return True exts = ("mp3", "mp4", "m4a", "m4p", "m4b", "m4r", "m4v", "ismv", "isma") return video_ext in exts and audio_ext in exts __downloader__ = YoutubeDLDownloader ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709680883.0 gallery_dl-1.26.9/gallery_dl/exception.py0000644000175000017500000000701314571724363017126 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
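# Illustration (not part of either module): compatible_formats() in
# downloader/ytdl.py above only looks at the "ext" of the requested
# video and audio formats --
#     compatible_formats([{"ext": "webm"}, {"ext": "webm"}])  # True
#     compatible_formats([{"ext": "mp4"},  {"ext": "m4a"}])   # True
#     compatible_formats([{"ext": "webm"}, {"ext": "m4a"}])   # False
# -- any incompatible pair is merged into an .mkv container instead.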
"""Exception classes used by gallery-dl Class Hierarchy: Exception +-- GalleryDLException +-- ExtractionError | +-- AuthenticationError | +-- AuthorizationError | +-- NotFoundError | +-- HttpError +-- FormatError | +-- FilenameFormatError | +-- DirectoryFormatError +-- FilterError +-- InputFileError +-- NoExtractorError +-- StopExtraction +-- TerminateExtraction +-- RestartExtraction """ class GalleryDLException(Exception): """Base class for GalleryDL exceptions""" default = None msgfmt = None code = 1 def __init__(self, message=None, fmt=True): if not message: message = self.default elif isinstance(message, Exception): message = "{}: {}".format(message.__class__.__name__, message) if self.msgfmt and fmt: message = self.msgfmt.format(message) Exception.__init__(self, message) class ExtractionError(GalleryDLException): """Base class for exceptions during information extraction""" class HttpError(ExtractionError): """HTTP request during data extraction failed""" default = "HTTP request failed" code = 4 def __init__(self, message, response=None): ExtractionError.__init__(self, message) self.response = response self.status = 0 if response is None else response.status_code class NotFoundError(ExtractionError): """Requested resource (gallery/image) could not be found""" msgfmt = "Requested {} could not be found" default = "resource (gallery/image)" code = 8 class AuthenticationError(ExtractionError): """Invalid or missing login credentials""" default = "Invalid or missing login credentials" code = 16 class AuthorizationError(ExtractionError): """Insufficient privileges to access a resource""" default = "Insufficient privileges to access the specified resource" code = 16 class FormatError(GalleryDLException): """Error while building output paths""" code = 32 class FilenameFormatError(FormatError): """Error while building output filenames""" msgfmt = "Applying filename format string failed ({})" class DirectoryFormatError(FormatError): """Error while building output directory paths""" msgfmt = "Applying directory format string failed ({})" class FilterError(GalleryDLException): """Error while evaluating a filter expression""" msgfmt = "Evaluating filter expression failed ({})" code = 32 class InputFileError(GalleryDLException): """Error when parsing input file""" code = 32 def __init__(self, message, *args): GalleryDLException.__init__( self, message % args if args else message) class NoExtractorError(GalleryDLException): """No extractor can handle the given URL""" code = 64 class StopExtraction(GalleryDLException): """Stop data extraction""" def __init__(self, message=None, *args): GalleryDLException.__init__(self) self.message = message % args if args else message self.code = 1 if message else 0 class TerminateExtraction(GalleryDLException): """Terminate data extraction""" code = 0 class RestartExtraction(GalleryDLException): """Restart data extraction""" code = 0 ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1711211674.0091393 gallery_dl-1.26.9/gallery_dl/extractor/0000755000175000017500000000000014577602232016564 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/2ch.py0000644000175000017500000000605614571514301017612 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
"""Extractors for https://2ch.hk/""" from .common import Extractor, Message from .. import text, util class _2chThreadExtractor(Extractor): """Extractor for 2ch threads""" category = "2ch" subcategory = "thread" root = "https://2ch.hk" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{tim}{filename:? //}.{extension}" archive_fmt = "{board}_{thread}_{tim}" pattern = r"(?:https?://)?2ch\.hk/([^/?#]+)/res/(\d+)" example = "https://2ch.hk/a/res/12345.html" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "{}/{}/res/{}.json".format(self.root, self.board, self.thread) posts = self.request(url).json()["threads"][0]["posts"] op = posts[0] title = op.get("subject") or text.remove_html(op["comment"]) thread = { "board" : self.board, "thread": self.thread, "title" : text.unescape(title)[:50], } yield Message.Directory, thread for post in posts: files = post.get("files") if files: post["post_name"] = post["name"] post["date"] = text.parse_timestamp(post["timestamp"]) del post["files"] del post["name"] for file in files: file.update(thread) file.update(post) file["filename"] = file["fullname"].rpartition(".")[0] file["tim"], _, file["extension"] = \ file["name"].rpartition(".") yield Message.Url, self.root + file["path"], file class _2chBoardExtractor(Extractor): """Extractor for 2ch boards""" category = "2ch" subcategory = "board" root = "https://2ch.hk" pattern = r"(?:https?://)?2ch\.hk/([^/?#]+)/?$" example = "https://2ch.hk/a/" def __init__(self, match): Extractor.__init__(self, match) self.board = match.group(1) def items(self): # index page url = "{}/{}/index.json".format(self.root, self.board) index = self.request(url).json() index["_extractor"] = _2chThreadExtractor for thread in index["threads"]: url = "{}/{}/res/{}.html".format( self.root, self.board, thread["thread_num"]) yield Message.Queue, url, index # pages 1..n for n in util.advance(index["pages"], 1): url = "{}/{}/{}.json".format(self.root, self.board, n) page = self.request(url).json() page["_extractor"] = _2chThreadExtractor for thread in page["threads"]: url = "{}/{}/res/{}.html".format( self.root, self.board, thread["thread_num"]) yield Message.Queue, url, page ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/2chan.py0000644000175000017500000000623114571514301020124 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.2chan.net/""" from .common import Extractor, Message from .. 
import text class _2chanThreadExtractor(Extractor): """Extractor for 2chan threads""" category = "2chan" subcategory = "thread" directory_fmt = ("{category}", "{board_name}", "{thread}") filename_fmt = "{tim}.{extension}" archive_fmt = "{board}_{thread}_{tim}" url_fmt = "https://{server}.2chan.net/{board}/src/{filename}" pattern = r"(?:https?://)?([\w-]+)\.2chan\.net/([^/?#]+)/res/(\d+)" example = "https://dec.2chan.net/12/res/12345.htm" def __init__(self, match): Extractor.__init__(self, match) self.server, self.board, self.thread = match.groups() def items(self): url = "https://{}.2chan.net/{}/res/{}.htm".format( self.server, self.board, self.thread) page = self.request(url).text data = self.metadata(page) yield Message.Directory, data for post in self.posts(page): if "filename" not in post: continue post.update(data) url = self.url_fmt.format_map(post) yield Message.Url, url, post def metadata(self, page): """Collect metadata for extractor-job""" title, _, boardname = text.extr( page, "", "").rpartition(" - ") return { "server": self.server, "title": title, "board": self.board, "board_name": boardname[:-4], "thread": self.thread, } def posts(self, page): """Build a list of all post-objects""" page = text.extr( page, '
') return [ self.parse(post) for post in page.split('') ] def parse(self, post): """Build post-object by extracting data from an HTML post""" data = self._extract_post(post) if data["name"]: data["name"] = data["name"].strip() path = text.extr(post, '' , '<'), ("name", 'class="cnm">' , '<'), ("now" , 'class="cnw">' , '<'), ("no" , 'class="cno">No.', '<'), (None , '', ''), ))[0] @staticmethod def _extract_image(post, data): text.extract_all(post, ( (None , '_blank', ''), ("filename", '>', '<'), ("fsize" , '(', ' '), ), 0, data) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/2chen.py0000644000175000017500000000640714571514301020135 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://sturdychan.help/""" from .common import Extractor, Message from .. import text BASE_PATTERN = r"(?:https?://)?(?:sturdychan.help|2chen\.(?:moe|club))" class _2chenThreadExtractor(Extractor): """Extractor for 2chen threads""" category = "2chen" subcategory = "thread" root = "https://sturdychan.help" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{time} {filename}.{extension}" archive_fmt = "{board}_{thread}_{hash}_{time}" pattern = BASE_PATTERN + r"/([^/?#]+)/(\d+)" example = "https://sturdychan.help/a/12345/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "{}/{}/{}".format(self.root, self.board, self.thread) page = self.request(url, encoding="utf-8", notfound="thread").text data = self.metadata(page) yield Message.Directory, data for post in self.posts(page): url = post["url"] if not url: continue if url[0] == "/": url = self.root + url post["url"] = url = url.partition("?")[0] post.update(data) post["time"] = text.parse_int(post["date"].timestamp()) yield Message.Url, url, text.nameext_from_url( post["filename"], post) def metadata(self, page): board, pos = text.extract(page, 'class="board">/', '/<') title = text.extract(page, "

", "

", pos)[0] return { "board" : board, "thread": self.thread, "title" : text.unescape(title), } def posts(self, page): """Return iterable with relevant posts""" return map(self.parse, text.extract_iter( page, 'class="glass media', '')) def parse(self, post): extr = text.extract_from(post) return { "name" : text.unescape(extr("", "")), "date" : text.parse_datetime( extr("")[2], "%d %b %Y (%a) %H:%M:%S" ), "no" : extr('href="#p', '"'), "url" : extr('
[^&#]+)") example = "http://behoimi.org/post?tags=TAG" def posts(self): params = {"tags": self.tags} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPoolExtractor(_3dbooruBase, moebooru.MoebooruPoolExtractor): """Extractor for image-pools from behoimi.org""" pattern = r"(?:https?://)?(?:www\.)?behoimi\.org/pool/show/(?P\d+)" example = "http://behoimi.org/pool/show/12345" def posts(self): params = {"tags": "pool:" + self.pool_id} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPostExtractor(_3dbooruBase, moebooru.MoebooruPostExtractor): """Extractor for single images from behoimi.org""" pattern = r"(?:https?://)?(?:www\.)?behoimi\.org/post/show/(?P\d+)" example = "http://behoimi.org/post/show/12345" def posts(self): params = {"tags": "id:" + self.post_id} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPopularExtractor( _3dbooruBase, moebooru.MoebooruPopularExtractor): """Extractor for popular images from behoimi.org""" pattern = (r"(?:https?://)?(?:www\.)?behoimi\.org" r"/post/popular_(?Pby_(?:day|week|month)|recent)" r"(?:\?(?P[^#]*))?") example = "http://behoimi.org/post/popular_by_month" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/4archive.py0000644000175000017500000000750114571514301020637 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://4archive.org/""" from .common import Extractor, Message from .. import text, util class _4archiveThreadExtractor(Extractor): """Extractor for 4archive threads""" category = "4archive" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{no} {filename}.{extension}" archive_fmt = "{board}_{thread}_{no}" root = "https://4archive.org" referer = False pattern = r"(?:https?://)?4archive\.org/board/([^/?#]+)/thread/(\d+)" example = "https://4archive.org/board/a/thread/12345/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "{}/board/{}/thread/{}".format( self.root, self.board, self.thread) page = self.request(url).text data = self.metadata(page) posts = self.posts(page) if not data["title"]: data["title"] = posts[0]["com"][:50] for post in posts: post.update(data) post["time"] = int(util.datetime_to_timestamp(post["date"])) yield Message.Directory, post if "url" in post: yield Message.Url, post["url"], text.nameext_from_url( post["filename"], post) def metadata(self, page): return { "board" : self.board, "thread": text.parse_int(self.thread), "title" : text.unescape(text.extr( page, 'class="subject">', "")) } def posts(self, page): return [ self.parse(post) for post in page.split('class="postContainer')[1:] ] @staticmethod def parse(post): extr = text.extract_from(post) data = { "name": extr('class="name">', ""), "date": text.parse_datetime( extr('class="dateTime postNum" >', "<").strip(), "%Y-%m-%d %H:%M:%S"), "no" : text.parse_int(extr('href="#p', '"')), } if 'class="file"' in post: extr('class="fileText"', ">File: ").strip()[1:], "size" : text.parse_bytes(extr(" (", ", ")[:-1]), "width" : text.parse_int(extr("", "x")), "height" : text.parse_int(extr("", "px")), }) extr("
", "
"))) return data class _4archiveBoardExtractor(Extractor): """Extractor for 4archive boards""" category = "4archive" subcategory = "board" root = "https://4archive.org" pattern = r"(?:https?://)?4archive\.org/board/([^/?#]+)(?:/(\d+))?/?$" example = "https://4archive.org/board/a/" def __init__(self, match): Extractor.__init__(self, match) self.board = match.group(1) self.num = text.parse_int(match.group(2), 1) def items(self): data = {"_extractor": _4archiveThreadExtractor} while True: url = "{}/board/{}/{}".format(self.root, self.board, self.num) page = self.request(url).text if 'class="thread"' not in page: return for thread in text.extract_iter(page, 'class="thread" id="t', '"'): url = "{}/board/{}/thread/{}".format( self.root, self.board, thread) yield Message.Queue, url, data self.num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/4chan.py0000644000175000017500000000500714571514301020126 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.4chan.org/""" from .common import Extractor, Message from .. import text class _4chanThreadExtractor(Extractor): """Extractor for 4chan threads""" category = "4chan" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{tim} {filename}.{extension}" archive_fmt = "{board}_{thread}_{tim}" pattern = (r"(?:https?://)?boards\.4chan(?:nel)?\.org" r"/([^/]+)/thread/(\d+)") example = "https://boards.4channel.org/a/thread/12345/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "https://a.4cdn.org/{}/thread/{}.json".format( self.board, self.thread) posts = self.request(url).json()["posts"] title = posts[0].get("sub") or text.remove_html(posts[0]["com"]) data = { "board" : self.board, "thread": self.thread, "title" : text.unescape(title)[:50], } yield Message.Directory, data for post in posts: if "filename" in post: post.update(data) post["extension"] = post["ext"][1:] post["filename"] = text.unescape(post["filename"]) url = "https://i.4cdn.org/{}/{}{}".format( post["board"], post["tim"], post["ext"]) yield Message.Url, url, post class _4chanBoardExtractor(Extractor): """Extractor for 4chan boards""" category = "4chan" subcategory = "board" pattern = r"(?:https?://)?boards\.4chan(?:nel)?\.org/([^/?#]+)/\d*$" example = "https://boards.4channel.org/a/" def __init__(self, match): Extractor.__init__(self, match) self.board = match.group(1) def items(self): url = "https://a.4cdn.org/{}/threads.json".format(self.board) threads = self.request(url).json() for page in threads: for thread in page["threads"]: url = "https://boards.4chan.org/{}/thread/{}/".format( self.board, thread["no"]) thread["page"] = page["page"] thread["_extractor"] = _4chanThreadExtractor yield Message.Queue, url, thread ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/4chanarchives.py0000644000175000017500000000770514571514301021662 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free 
Software Foundation. """Extractors for https://4chanarchives.com/""" from .common import Extractor, Message from .. import text class _4chanarchivesThreadExtractor(Extractor): """Extractor for threads on 4chanarchives.com""" category = "4chanarchives" subcategory = "thread" root = "https://4chanarchives.com" directory_fmt = ("{category}", "{board}", "{thread} - {title}") filename_fmt = "{no}-{filename}.{extension}" archive_fmt = "{board}_{thread}_{no}" referer = False pattern = r"(?:https?://)?4chanarchives\.com/board/([^/?#]+)/thread/(\d+)" example = "https://4chanarchives.com/board/a/thread/12345/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "{}/board/{}/thread/{}".format( self.root, self.board, self.thread) page = self.request(url).text data = self.metadata(page) posts = self.posts(page) if not data["title"]: data["title"] = text.unescape(text.remove_html( posts[0]["com"]))[:50] for post in posts: post.update(data) yield Message.Directory, post if "url" in post: yield Message.Url, post["url"], post def metadata(self, page): return { "board" : self.board, "thread" : self.thread, "title" : text.unescape(text.extr( page, 'property="og:title" content="', '"')), } def posts(self, page): """Build a list of all post objects""" return [self.parse(html) for html in text.extract_iter( page, 'id="pc', '')] def parse(self, html): """Build post object by extracting data from an HTML post""" post = self._extract_post(html) if ">File: <" in html: self._extract_file(html, post) post["extension"] = post["url"].rpartition(".")[2] return post @staticmethod def _extract_post(html): extr = text.extract_from(html) return { "no" : text.parse_int(extr('', '"')), "name": extr('class="name">', '<'), "time": extr('class="dateTime postNum" >', '<').rstrip(), "com" : text.unescape( html[html.find('")[2]), } @staticmethod def _extract_file(html, post): extr = text.extract_from(html, html.index(">File: <")) post["url"] = extr('href="', '"') post["filename"] = text.unquote(extr(">", "<").rpartition(".")[0]) post["fsize"] = extr("(", ", ") post["w"] = text.parse_int(extr("", "x")) post["h"] = text.parse_int(extr("", ")")) class _4chanarchivesBoardExtractor(Extractor): """Extractor for boards on 4chanarchives.com""" category = "4chanarchives" subcategory = "board" root = "https://4chanarchives.com" pattern = r"(?:https?://)?4chanarchives\.com/board/([^/?#]+)(?:/(\d+))?/?$" example = "https://4chanarchives.com/board/a/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.page = match.groups() def items(self): data = {"_extractor": _4chanarchivesThreadExtractor} pnum = text.parse_int(self.page, 1) needle = '''
board["pageCount"]: return url = "{}/{}/{}.json".format(self.root, self.board, page) threads = self.request(url).json()["threads"] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/8muses.py0000644000175000017500000000701714571514301020360 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://comics.8muses.com/""" from .common import Extractor, Message from .. import text, util class _8musesAlbumExtractor(Extractor): """Extractor for image albums on comics.8muses.com""" category = "8muses" subcategory = "album" directory_fmt = ("{category}", "{album[path]}") filename_fmt = "{page:>03}.{extension}" archive_fmt = "{hash}" root = "https://comics.8muses.com" pattern = (r"(?:https?://)?(?:comics\.|www\.)?8muses\.com" r"(/comics/album/[^?#]+)(\?[^#]+)?") example = "https://comics.8muses.com/comics/album/PATH/TITLE" def __init__(self, match): Extractor.__init__(self, match) self.path = match.group(1) self.params = match.group(2) or "" def items(self): url = self.root + self.path + self.params while True: data = self._unobfuscate(text.extr( self.request(url).text, 'id="ractive-public" type="text/plain">', '')) images = data.get("pictures") if images: count = len(images) album = self._make_album(data["album"]) yield Message.Directory, {"album": album, "count": count} for num, image in enumerate(images, 1): url = self.root + "/image/fl/" + image["publicUri"] img = { "url" : url, "page" : num, "hash" : image["publicUri"], "count" : count, "album" : album, "extension": "jpg", } yield Message.Url, url, img albums = data.get("albums") if albums: for album in albums: url = self.root + "/comics/album/" + album["permalink"] yield Message.Queue, url, { "url" : url, "name" : album["name"], "private" : album["isPrivate"], "_extractor": _8musesAlbumExtractor, } if data["page"] >= data["pages"]: return path, _, num = self.path.rstrip("/").rpartition("/") path = path if num.isdecimal() else self.path url = "{}{}/{}{}".format( self.root, path, data["page"] + 1, self.params) def _make_album(self, album): return { "id" : album["id"], "path" : album["path"], "parts" : album["path"].split("/"), "title" : album["name"], "private": album["isPrivate"], "url" : self.root + "/comics/album/" + album["permalink"], "parent" : text.parse_int(album["parentId"]), "views" : text.parse_int(album["numberViews"]), "likes" : text.parse_int(album["numberLikes"]), "date" : text.parse_datetime( album["updatedAt"], "%Y-%m-%dT%H:%M:%S.%fZ"), } @staticmethod def _unobfuscate(data): return util.json_loads("".join([ chr(33 + (ord(c) + 14) % 94) if "!" <= c <= "~" else c for c in text.unescape(data.strip("\t\n\r !")) ])) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709680883.0 gallery_dl-1.26.9/gallery_dl/extractor/__init__.py0000644000175000017500000001160714571724363020706 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
import sys import re modules = [ "2ch", "2chan", "2chen", "35photo", "3dbooru", "4chan", "4archive", "4chanarchives", "500px", "8chan", "8muses", "adultempire", "architizer", "artstation", "aryion", "batoto", "bbc", "behance", "blogger", "bluesky", "bunkr", "catbox", "chevereto", "comicvine", "cyberdrop", "danbooru", "desktopography", "deviantart", "dynastyscans", "e621", "erome", "exhentai", "fallenangels", "fanbox", "fanleaks", "fantia", "fapello", "fapachi", "flickr", "furaffinity", "fuskator", "gelbooru", "gelbooru_v01", "gelbooru_v02", "gofile", "hatenablog", "hentai2read", "hentaicosplays", "hentaifoundry", "hentaifox", "hentaihand", "hentaihere", "hiperdex", "hitomi", "hotleak", "idolcomplex", "imagebam", "imagechest", "imagefap", "imgbb", "imgbox", "imgth", "imgur", "inkbunny", "instagram", "issuu", "itaku", "itchio", "jschan", "kabeuchi", "keenspot", "kemonoparty", "khinsider", "komikcast", "lensdump", "lexica", "lightroom", "livedoor", "luscious", "lynxchan", "mangadex", "mangafox", "mangahere", "mangakakalot", "manganelo", "mangapark", "mangaread", "mangasee", "mangoxo", "misskey", "myhentaigallery", "myportfolio", "naver", "naverwebtoon", "newgrounds", "nhentai", "nijie", "nitter", "nozomi", "nsfwalbum", "paheal", "patreon", "philomena", "photobucket", "photovogue", "picarto", "piczel", "pillowfort", "pinterest", "pixeldrain", "pixiv", "pixnet", "plurk", "poipiku", "poringa", "pornhub", "pornpics", "postmill", "pururin", "reactor", "readcomiconline", "reddit", "redgifs", "rule34us", "sankaku", "sankakucomplex", "seiga", "senmanga", "sexcom", "shimmie2", "simplyhentai", "skeb", "slickpic", "slideshare", "smugmug", "soundgasm", "speakerdeck", "steamgriddb", "subscribestar", "szurubooru", "tapas", "tcbscans", "telegraph", "tmohentai", "toyhouse", "tsumino", "tumblr", "tumblrgallery", "twibooru", "twitter", "urlgalleries", "unsplash", "uploadir", "urlshortener", "vanillarock", "vichan", "vipergirls", "vk", "vsco", "wallhaven", "wallpapercave", "warosu", "weasyl", "webmshare", "webtoons", "weibo", "wikiart", "wikifeet", "wikimedia", "xhamster", "xvideos", "zerochan", "zzup", "booru", "moebooru", "foolfuuka", "foolslide", "mastodon", "shopify", "lolisafe", "imagehosts", "directlink", "recursive", "oauth", "ytdl", "generic", ] def find(url): """Find a suitable extractor for the given URL""" for cls in _list_classes(): match = cls.pattern.match(url) if match: return cls(match) return None def add(cls): """Add 'cls' to the list of available extractors""" cls.pattern = re.compile(cls.pattern) _cache.append(cls) return cls def add_module(module): """Add all extractors in 'module' to the list of available extractors""" classes = _get_classes(module) for cls in classes: cls.pattern = re.compile(cls.pattern) _cache.extend(classes) return classes def extractors(): """Yield all available extractor classes""" return sorted( _list_classes(), key=lambda x: x.__name__ ) # -------------------------------------------------------------------- # internals def _list_classes(): """Yield available extractor classes""" yield from _cache for module in _module_iter: yield from add_module(module) globals()["_list_classes"] = lambda : _cache def _modules_internal(): globals_ = globals() for module_name in modules: yield __import__(module_name, globals_, None, (), 1) def _modules_path(path, files): sys.path.insert(0, path) try: return [ __import__(name[:-3]) for name in files if name.endswith(".py") ] finally: del sys.path[0] def _get_classes(module): """Return a list of all extractor classes in a module""" 
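    # Keep only classes that define a 'pattern' attribute and that were
    # defined in this very module (the __module__ check), so extractor and
    # base classes imported from other modules are not registered here as well.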
return [ cls for cls in module.__dict__.values() if ( hasattr(cls, "pattern") and cls.__module__ == module.__name__ ) ] _cache = [] _module_iter = _modules_internal() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/adultempire.py0000644000175000017500000000341014571514301021440 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.adultempire.com/""" from .common import GalleryExtractor from .. import text class AdultempireGalleryExtractor(GalleryExtractor): """Extractor for image galleries from www.adultempire.com""" category = "adultempire" root = "https://www.adultempire.com" pattern = (r"(?:https?://)?(?:www\.)?adult(?:dvd)?empire\.com" r"(/(\d+)/gallery\.html)") example = "https://www.adultempire.com/12345/gallery.html" def __init__(self, match): GalleryExtractor.__init__(self, match) self.gallery_id = match.group(2) def metadata(self, page): extr = text.extract_from(page, page.index('
')) return { "gallery_id": text.parse_int(self.gallery_id), "title" : text.unescape(extr('title="', '"')), "studio" : extr(">studio", "<").strip(), "date" : text.parse_datetime(extr( ">released", "<").strip(), "%m/%d/%Y"), "actors" : sorted(text.split_html(extr( '
    ", "<").strip(), "type" : text.unescape(text.remove_html(extr( '
    Type
    ', 'STATUS
', 'YEAR', 'SIZE', '', '') .replace("
", "\n")), } def images(self, page): return [ (url, None) for url in text.extract_iter( page, 'property="og:image:secure_url" content="', "?") ] class ArchitizerFirmExtractor(Extractor): """Extractor for all projects of a firm""" category = "architizer" subcategory = "firm" root = "https://architizer.com" pattern = r"(?:https?://)?architizer\.com/firms/([^/?#]+)" example = "https://architizer.com/firms/NAME/" def __init__(self, match): Extractor.__init__(self, match) self.firm = match.group(1) def items(self): url = url = "{}/firms/{}/?requesting_merlin=pages".format( self.root, self.firm) page = self.request(url).text data = {"_extractor": ArchitizerProjectExtractor} for project in text.extract_iter(page, '
= data["total_count"]: return params["page"] += 1 def _init_csrf_token(self): url = self.root + "/api/v2/csrf_protection/token.json" headers = { "Accept" : "*/*", "Origin" : self.root, } return self.request( url, method="POST", headers=headers, json={}, ).json()["public_csrf_token"] @staticmethod def _no_cache(url, alphabet=(string.digits + string.ascii_letters)): """Cause a cache miss to prevent Cloudflare 'optimizations' Cloudflare's 'Polish' optimization strips image metadata and may even recompress an image as lossy JPEG. This can be prevented by causing a cache miss when requesting an image by adding a random dummy query parameter. Ref: https://github.com/r888888888/danbooru/issues/3528 https://danbooru.donmai.us/forum_topics/14952 """ param = "gallerydl_no_cache=" + util.bencode( random.getrandbits(64), alphabet) sep = "&" if "?" in url else "?" return url + sep + param class ArtstationUserExtractor(ArtstationExtractor): """Extractor for all projects of an artstation user""" subcategory = "user" pattern = (r"(?:https?://)?(?:(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)(?:/albums/all)?" r"|((?!www)[\w-]+)\.artstation\.com(?:/projects)?)/?$") example = "https://www.artstation.com/USER" def projects(self): url = "{}/users/{}/projects.json".format(self.root, self.user) params = {"album_id": "all"} return self._pagination(url, params) class ArtstationAlbumExtractor(ArtstationExtractor): """Extractor for all projects in an artstation album""" subcategory = "album" directory_fmt = ("{category}", "{userinfo[username]}", "Albums", "{album[id]} - {album[title]}") archive_fmt = "a_{album[id]}_{asset[id]}" pattern = (r"(?:https?://)?(?:(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)" r"|((?!www)[\w-]+)\.artstation\.com)/albums/(\d+)") example = "https://www.artstation.com/USER/albums/12345" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.album_id = text.parse_int(match.group(3)) def metadata(self): userinfo = self.get_user_info(self.user) album = None for album in userinfo["albums_with_community_projects"]: if album["id"] == self.album_id: break else: raise exception.NotFoundError("album") return { "userinfo": userinfo, "album": album } def projects(self): url = "{}/users/{}/projects.json".format(self.root, self.user) params = {"album_id": self.album_id} return self._pagination(url, params) class ArtstationLikesExtractor(ArtstationExtractor): """Extractor for liked projects of an artstation user""" subcategory = "likes" directory_fmt = ("{category}", "{userinfo[username]}", "Likes") archive_fmt = "f_{userinfo[id]}_{asset[id]}" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/likes") example = "https://www.artstation.com/USER/likes" def projects(self): url = "{}/users/{}/likes.json".format(self.root, self.user) return self._pagination(url) class ArtstationCollectionExtractor(ArtstationExtractor): """Extractor for an artstation collection""" subcategory = "collection" directory_fmt = ("{category}", "{user}", "{collection[id]} {collection[name]}") archive_fmt = "c_{collection[id]}_{asset[id]}" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/collections/(\d+)") example = "https://www.artstation.com/USER/collections/12345" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.collection_id = match.group(2) def metadata(self): url = "{}/collections/{}.json".format( self.root, self.collection_id) params = {"username": 
self.user} collection = self.request( url, params=params, notfound="collection").json() return {"collection": collection, "user": self.user} def projects(self): url = "{}/collections/{}/projects.json".format( self.root, self.collection_id) params = {"collection_id": self.collection_id} return self._pagination(url, params) class ArtstationCollectionsExtractor(ArtstationExtractor): """Extractor for an artstation user's collections""" subcategory = "collections" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/collections/?$") example = "https://www.artstation.com/USER/collections" def items(self): url = self.root + "/collections.json" params = {"username": self.user} for collection in self.request( url, params=params, notfound="collections").json(): url = "{}/{}/collections/{}".format( self.root, self.user, collection["id"]) collection["_extractor"] = ArtstationCollectionExtractor yield Message.Queue, url, collection class ArtstationChallengeExtractor(ArtstationExtractor): """Extractor for submissions of artstation challenges""" subcategory = "challenge" filename_fmt = "{submission_id}_{asset_id}_{filename}.{extension}" directory_fmt = ("{category}", "Challenges", "{challenge[id]} - {challenge[title]}") archive_fmt = "c_{challenge[id]}_{asset_id}" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/contests/[^/?#]+/challenges/(\d+)" r"/?(?:\?sorting=([a-z]+))?") example = "https://www.artstation.com/contests/NAME/challenges/12345" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.challenge_id = match.group(1) self.sorting = match.group(2) or "popular" def items(self): challenge_url = "{}/contests/_/challenges/{}.json".format( self.root, self.challenge_id) submission_url = "{}/contests/_/challenges/{}/submissions.json".format( self.root, self.challenge_id) update_url = "{}/contests/submission_updates.json".format( self.root) challenge = self.request(challenge_url).json() yield Message.Directory, {"challenge": challenge} params = {"sorting": self.sorting} for submission in self._pagination(submission_url, params): params = {"submission_id": submission["id"]} for update in self._pagination(update_url, params=params): del update["replies"] update["challenge"] = challenge for url in text.extract_iter( update["body_presentation_html"], ' href="', '"'): update["asset_id"] = self._id_from_url(url) text.nameext_from_url(url, update) yield Message.Url, self._no_cache(url), update @staticmethod def _id_from_url(url): """Get an image's submission ID from its URL""" parts = url.split("/") return text.parse_int("".join(parts[7:10])) class ArtstationSearchExtractor(ArtstationExtractor): """Extractor for artstation search results""" subcategory = "search" directory_fmt = ("{category}", "Searches", "{search[query]}") archive_fmt = "s_{search[query]}_{asset[id]}" pattern = (r"(?:https?://)?(?:\w+\.)?artstation\.com" r"/search/?\?([^#]+)") example = "https://www.artstation.com/search?query=QUERY" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.params = query = text.parse_query(match.group(1)) self.query = text.unquote(query.get("query") or query.get("q", "")) self.sorting = query.get("sort_by", "relevance").lower() self.tags = query.get("tags", "").split(",") def metadata(self): return {"search": { "query" : self.query, "sorting": self.sorting, "tags" : self.tags, }} def projects(self): filters = [] for key, value in self.params.items(): if key.endswith("_ids") or key == "tags": filters.append({ "field" : key, 
"method": "include", "value" : value.split(","), }) url = "{}/api/v2/search/projects.json".format(self.root) data = { "query" : self.query, "page" : None, "per_page" : 50, "sorting" : self.sorting, "pro_first" : ("1" if self.config("pro-first", True) else "0"), "filters" : filters, "additional_fields": (), } return self._pagination(url, json=data) class ArtstationArtworkExtractor(ArtstationExtractor): """Extractor for projects on artstation's artwork page""" subcategory = "artwork" directory_fmt = ("{category}", "Artworks", "{artwork[sorting]!c}") archive_fmt = "A_{asset[id]}" pattern = (r"(?:https?://)?(?:\w+\.)?artstation\.com" r"/artwork/?\?([^#]+)") example = "https://www.artstation.com/artwork?sorting=SORT" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.query = text.parse_query(match.group(1)) def metadata(self): return {"artwork": self.query} def projects(self): url = "{}/projects.json".format(self.root) return self._pagination(url, self.query.copy()) class ArtstationImageExtractor(ArtstationExtractor): """Extractor for images from a single artstation project""" subcategory = "image" pattern = (r"(?:https?://)?(?:" r"(?:[\w-]+\.)?artstation\.com/(?:artwork|projects|search)" r"|artstn\.co/p)/(\w+)") example = "https://www.artstation.com/artwork/abcde" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.project_id = match.group(1) self.assets = None def metadata(self): self.assets = list(ArtstationExtractor.get_project_assets( self, self.project_id)) try: self.user = self.assets[0]["user"]["username"] except IndexError: self.user = "" return ArtstationExtractor.metadata(self) def projects(self): return ({"hash_id": self.project_id},) def get_project_assets(self, project_id): return self.assets class ArtstationFollowingExtractor(ArtstationExtractor): """Extractor for a user's followed users""" subcategory = "following" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/following") example = "https://www.artstation.com/USER/following" def items(self): url = "{}/users/{}/following.json".format(self.root, self.user) for user in self._pagination(url): url = "{}/{}".format(self.root, user["username"]) user["_extractor"] = ArtstationUserExtractor yield Message.Queue, url, user ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/aryion.py0000644000175000017500000001714414571514301020437 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://aryion.com/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache from email.utils import parsedate_tz from datetime import datetime BASE_PATTERN = r"(?:https?://)?(?:www\.)?aryion\.com/g4" class AryionExtractor(Extractor): """Base class for aryion extractors""" category = "aryion" directory_fmt = ("{category}", "{user!l}", "{path:J - }") filename_fmt = "{id} {title}.{extension}" archive_fmt = "{id}" cookies_domain = ".aryion.com" cookies_names = ("phpbb3_rl7a3_sid",) root = "https://aryion.com" def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) self.recursive = True def login(self): if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: self.cookies_update(self._login_impl(username, password)) @cache(maxage=14*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/forum/ucp.php?mode=login" data = { "username": username, "password": password, "login": "Login", } response = self.request(url, method="POST", data=data) if b"You have been successfully logged in." not in response.content: raise exception.AuthenticationError() return {c: response.cookies[c] for c in self.cookies_names} def items(self): self.login() data = self.metadata() for post_id in self.posts(): post = self._parse_post(post_id) if post: if data: post.update(data) yield Message.Directory, post yield Message.Url, post["url"], post elif post is False and self.recursive: base = self.root + "/g4/view/" data = {"_extractor": AryionPostExtractor} for post_id in self._pagination_params(base + post_id): yield Message.Queue, base + post_id, data def posts(self): """Yield relevant post IDs""" def metadata(self): """Return general metadata""" def _pagination_params(self, url, params=None): if params is None: params = {"p": 1} else: params["p"] = text.parse_int(params.get("p"), 1) while True: page = self.request(url, params=params).text cnt = 0 for post_id in text.extract_iter( page, "class='gallery-item' id='", "'"): cnt += 1 yield post_id if cnt < 40: return params["p"] += 1 def _pagination_next(self, url): while True: page = self.request(url).text yield from text.extract_iter(page, "thumb' href='/g4/view/", "'") pos = page.find("Next >>") if pos < 0: return url = self.root + text.rextract(page, "href='", "'", pos)[0] def _parse_post(self, post_id): url = "{}/g4/data.php?id={}".format(self.root, post_id) with self.request(url, method="HEAD", fatal=False) as response: if response.status_code >= 400: self.log.warning( "Unable to fetch post %s ('%s %s')", post_id, response.status_code, response.reason) return None headers = response.headers # folder if headers["content-type"] in ( "application/x-folder", "application/x-comic-folder", "application/x-comic-folder-nomerge", ): return False # get filename from 'Content-Disposition' header cdis = headers["content-disposition"] fname, _, ext = text.extr(cdis, 'filename="', '"').rpartition(".") if not fname: fname, ext = ext, fname # get file size from 'Content-Length' header clen = headers.get("content-length") # fix 'Last-Modified' header lmod = headers["last-modified"] if lmod[22] != ":": lmod = "{}:{} GMT".format(lmod[:22], lmod[22:24]) post_url = "{}/g4/view/{}".format(self.root, post_id) extr = text.extract_from(self.request(post_url).text) title, _, artist = text.unescape(extr( "g4 :: ", "<")).rpartition(" by ") return { "id" : text.parse_int(post_id), "url" : url, "user" : self.user or artist, "title" : title, "artist": artist, "path" : 
text.split_html(extr( "cookiecrumb'>", '</span'))[4:-1:2], "date" : datetime(*parsedate_tz(lmod)[:6]), "size" : text.parse_int(clen), "views" : text.parse_int(extr("Views</b>:", "<").replace(",", "")), "width" : text.parse_int(extr("Resolution</b>:", "x")), "height": text.parse_int(extr("", "<")), "comments" : text.parse_int(extr("Comments</b>:", "<")), "favorites": text.parse_int(extr("Favorites</b>:", "<")), "tags" : text.split_html(extr("class='taglist'>", "</span>")), "description": text.unescape(text.remove_html(extr( "<p>", "</p>"), "", "")), "filename" : fname, "extension": ext, "_mtime" : lmod, } class AryionGalleryExtractor(AryionExtractor): """Extractor for a user's gallery on eka's portal""" subcategory = "gallery" categorytransfer = True pattern = BASE_PATTERN + r"/(?:gallery/|user/|latest.php\?name=)([^/?#]+)" example = "https://aryion.com/g4/gallery/USER" def __init__(self, match): AryionExtractor.__init__(self, match) self.offset = 0 def _init(self): self.recursive = self.config("recursive", True) def skip(self, num): if self.recursive: return 0 self.offset += num return num def posts(self): if self.recursive: url = "{}/g4/gallery/{}".format(self.root, self.user) return self._pagination_params(url) else: url = "{}/g4/latest.php?name={}".format(self.root, self.user) return util.advance(self._pagination_next(url), self.offset) class AryionTagExtractor(AryionExtractor): """Extractor for tag searches on eka's portal""" subcategory = "tag" directory_fmt = ("{category}", "tags", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/tags\.php\?([^#]+)" example = "https://aryion.com/g4/tags.php?tag=TAG" def _init(self): self.params = text.parse_query(self.user) self.user = None def metadata(self): return {"search_tags": self.params.get("tag")} def posts(self): url = self.root + "/g4/tags.php" return self._pagination_params(url, self.params) class AryionPostExtractor(AryionExtractor): """Extractor for individual posts on eka's portal""" subcategory = "post" pattern = BASE_PATTERN + r"/view/(\d+)" example = "https://aryion.com/g4/view/12345" def posts(self): post_id, self.user = self.user, None return (post_id,) ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709680883.0 
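# End of aryion.py. A note on the parsing strategy above: _parse_post() first
# sends a HEAD request to /g4/data.php?id=<post> and inspects the Content-Type
# header; "application/x-folder" and its comic-folder variants mark a folder
# whose contents are re-queued for recursive extraction, while anything else is
# treated as a downloadable file whose name, size and mtime are taken from the
# Content-Disposition, Content-Length and Last-Modified headers.
#
# Rough standalone illustration of the same check (an assumed sketch built on
# the 'requests' package, not part of gallery_dl):
#
#     import requests
#     r = requests.head("https://aryion.com/g4/data.php", params={"id": "12345"})
#     is_folder = r.headers.get("content-type") in (
#         "application/x-folder",
#         "application/x-comic-folder",
#         "application/x-comic-folder-nomerge",
#     )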
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/batoto.py����������������������������������������������������0000644�0001750�0001750�00000010607�14571724363�020436� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bato.to/""" from .common import Extractor, ChapterExtractor, MangaExtractor from .. import text, exception import re BASE_PATTERN = (r"(?:https?://)?(?:" r"(?:ba|d|h|m|w)to\.to|" r"(?:(?:manga|read)toto|batocomic|[xz]bato)\.(?:com|net|org)|" r"comiko\.(?:net|org)|" r"bat(?:otoo|o?two)\.com)") class BatotoBase(): """Base class for batoto extractors""" category = "batoto" root = "https://bato.to" def request(self, url, **kwargs): kwargs["encoding"] = "utf-8" return Extractor.request(self, url, **kwargs) class BatotoChapterExtractor(BatotoBase, ChapterExtractor): """Extractor for bato.to manga chapters""" pattern = BASE_PATTERN + r"/(?:title/[^/?#]+|chapter)/(\d+)" example = "https://bato.to/title/12345-MANGA/54321" def __init__(self, match): self.root = text.root_from_url(match.group(0)) self.chapter_id = match.group(1) url = "{}/title/0/{}".format(self.root, self.chapter_id) ChapterExtractor.__init__(self, match, url) def metadata(self, page): extr = text.extract_from(page) try: manga, info, _ = extr("<title>", "<").rsplit(" - ", 3) except ValueError: manga = info = None manga_id = text.extr( extr('rel="canonical" href="', '"'), "/title/", "/") if not manga: manga = extr('link-hover">', "<") info = text.remove_html(extr('link-hover">', "</")) match = re.match( r"(?:Volume\s+(\d+) )?" 
r"\w+\s+(\d+)(.*)", info) if match: volume, chapter, minor = match.groups() title = text.remove_html(extr( "selected>", "</option")).partition(" : ")[2] else: volume = chapter = 0 minor = "" title = info return { "manga" : text.unescape(manga), "manga_id" : text.parse_int(manga_id), "title" : text.unescape(title), "volume" : text.parse_int(volume), "chapter" : text.parse_int(chapter), "chapter_minor": minor, "chapter_id" : text.parse_int(self.chapter_id), "date" : text.parse_timestamp(extr(' time="', '"')[:-3]), } def images(self, page): images_container = text.extr(page, 'pageOpts', ':[0,0]}"') images_container = text.unescape(images_container) return [ (url, None) for url in text.extract_iter(images_container, r"\"", r"\"") ] class BatotoMangaExtractor(BatotoBase, MangaExtractor): """Extractor for bato.to manga""" reverse = False chapterclass = BatotoChapterExtractor pattern = (BASE_PATTERN + r"/(?:title/(\d+)[^/?#]*|series/(\d+)(?:/[^/?#]*)?)/?$") example = "https://bato.to/title/12345-MANGA/" def __init__(self, match): self.root = text.root_from_url(match.group(0)) self.manga_id = match.group(1) or match.group(2) url = "{}/title/{}".format(self.root, self.manga_id) MangaExtractor.__init__(self, match, url) def chapters(self, page): extr = text.extract_from(page) warning = extr(' class="alert alert-warning">', "</div><") if warning: raise exception.StopExtraction("'%s'", text.remove_html(warning)) data = { "manga_id": text.parse_int(self.manga_id), "manga" : text.unescape(extr( "<title>", "<").rpartition(" - ")[0]), } extr('<div data-hk="0-0-0-0"', "") results = [] while True: href = extr('<a href="/title/', '"') if not href: break chapter = href.rpartition("-ch_")[2] chapter, sep, minor = chapter.partition(".") data["chapter"] = text.parse_int(chapter) data["chapter_minor"] = sep + minor data["date"] = text.parse_datetime( extr('time="', '"'), "%Y-%m-%dT%H:%M:%S.%fZ") url = "{}/title/{}".format(self.root, href) results.append((url, data.copy())) return results �������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/bbc.py�������������������������������������������������������0000644�0001750�0001750�00000005636�14571514301�017667� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bbc.co.uk/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util BASE_PATTERN = r"(?:https?://)?(?:www\.)?bbc\.co\.uk(/programmes/" class BbcGalleryExtractor(GalleryExtractor): """Extractor for a programme gallery on bbc.co.uk""" category = "bbc" root = "https://www.bbc.co.uk" directory_fmt = ("{category}", "{path[0]}", "{path[1]}", "{path[2]}", "{path[3:]:J - /}") filename_fmt = "{num:>02}.{extension}" archive_fmt = "{programme}_{num}" pattern = BASE_PATTERN + r"[^/?#]+(?!/galleries)(?:/[^/?#]+)?)$" example = "https://www.bbc.co.uk/programmes/PATH" def metadata(self, page): data = util.json_loads(text.extr( page, '<script type="application/ld+json">', '</script>')) return { "programme": self.gallery_url.split("/")[4], "path": list(util.unique_sequence( element["name"] for element in data["itemListElement"] )), } def images(self, page): width = self.config("width") width = width - width % 16 if width else 1920 dimensions = "/{}xn/".format(width) return [ (src.replace("/320x180_b/", dimensions), {"_fallback": self._fallback_urls(src, width)}) for src in text.extract_iter(page, 'data-image-src="', '"') ] @staticmethod def _fallback_urls(src, max_width): front, _, back = src.partition("/320x180_b/") for width in (1920, 1600, 1280, 976): if width < max_width: yield "{}/{}xn/{}".format(front, width, back) class BbcProgrammeExtractor(Extractor): """Extractor for all galleries of a bbc programme""" category = "bbc" subcategory = "programme" root = "https://www.bbc.co.uk" pattern = BASE_PATTERN + r"[^/?#]+/galleries)(?:/?\?page=(\d+))?" 
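    # Note: BASE_PATTERN deliberately opens an unbalanced group "(/programmes/"
    # that each subclass pattern closes, so match.group(1) captures the whole
    # "/programmes/..." path, which items() below appends to self.root.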
example = "https://www.bbc.co.uk/programmes/ID/galleries" def __init__(self, match): Extractor.__init__(self, match) self.path, self.page = match.groups() def items(self): data = {"_extractor": BbcGalleryExtractor} params = {"page": text.parse_int(self.page, 1)} galleries_url = self.root + self.path while True: page = self.request(galleries_url, params=params).text for programme_id in text.extract_iter( page, '<a href="https://www.bbc.co.uk/programmes/', '"'): url = "https://www.bbc.co.uk/programmes/" + programme_id yield Message.Queue, url, data if 'rel="next"' not in page: return params["page"] += 1 ��������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/behance.py���������������������������������������������������0000644�0001750�0001750�00000035442�14571514301�020524� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.behance.net/""" from .common import Extractor, Message from .. 
import text, util, exception class BehanceExtractor(Extractor): """Base class for behance extractors""" category = "behance" root = "https://www.behance.net" request_interval = (2.0, 4.0) def _init(self): self._bcp = self.cookies.get("bcp", domain="www.behance.net") if not self._bcp: self._bcp = "4c34489d-914c-46cd-b44c-dfd0e661136d" self.cookies.set("bcp", self._bcp, domain="www.behance.net") def items(self): for gallery in self.galleries(): gallery["_extractor"] = BehanceGalleryExtractor yield Message.Queue, gallery["url"], self._update(gallery) def galleries(self): """Return all relevant gallery URLs""" def _request_graphql(self, endpoint, variables): url = self.root + "/v3/graphql" headers = { "Origin": self.root, "X-BCP" : self._bcp, "X-Requested-With": "XMLHttpRequest", } data = { "query" : GRAPHQL_QUERIES[endpoint], "variables": variables, } return self.request(url, method="POST", headers=headers, json=data).json()["data"] def _update(self, data): # compress data to simple lists if data["fields"] and isinstance(data["fields"][0], dict): data["fields"] = [ field.get("name") or field.get("label") for field in data["fields"] ] data["owners"] = [ owner.get("display_name") or owner.get("displayName") for owner in data["owners"] ] tags = data.get("tags") or () if tags and isinstance(tags[0], dict): tags = [tag["title"] for tag in tags] data["tags"] = tags data["date"] = text.parse_timestamp( data.get("publishedOn") or data.get("conceived_on") or 0) # backwards compatibility data["gallery_id"] = data["id"] data["title"] = data["name"] data["user"] = ", ".join(data["owners"]) return data class BehanceGalleryExtractor(BehanceExtractor): """Extractor for image galleries from www.behance.net""" subcategory = "gallery" directory_fmt = ("{category}", "{owners:J, }", "{id} {name}") filename_fmt = "{category}_{id}_{num:>02}.{extension}" archive_fmt = "{id}_{num}" pattern = r"(?:https?://)?(?:www\.)?behance\.net/gallery/(\d+)" example = "https://www.behance.net/gallery/12345/TITLE" def __init__(self, match): BehanceExtractor.__init__(self, match) self.gallery_id = match.group(1) def _init(self): BehanceExtractor._init(self) modules = self.config("modules") if modules: if isinstance(modules, str): modules = modules.split(",") self.modules = set(modules) else: self.modules = {"image", "video", "mediacollection", "embed"} def items(self): data = self.get_gallery_data() imgs = self.get_images(data) data["count"] = len(imgs) yield Message.Directory, data for data["num"], (url, module) in enumerate(imgs, 1): data["module"] = module data["extension"] = (module.get("extension") or text.ext_from_url(url)) yield Message.Url, url, data def get_gallery_data(self): """Collect gallery info dict""" url = "{}/gallery/{}/a".format(self.root, self.gallery_id) cookies = { "gki": '{"feature_project_view":false,' '"feature_discover_login_prompt":false,' '"feature_project_login_prompt":false}', "ilo0": "true", } page = self.request(url, cookies=cookies).text data = util.json_loads(text.extr( page, 'id="beconfig-store_state">', '</script>')) return self._update(data["project"]["project"]) def get_images(self, data): """Extract image results from an API response""" if not data["modules"]: access = data.get("matureAccess") if access == "logged-out": raise exception.AuthorizationError( "Mature content galleries require logged-in cookies") if access == "restricted-safe": raise exception.AuthorizationError( "Mature content blocked in account settings") if access and access != "allowed": raise exception.AuthorizationError() 
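            # An empty 'modules' list without one of the known matureAccess
            # values simply yields no files instead of raising.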
return () result = [] append = result.append for module in data["modules"]: mtype = module["__typename"][:-6].lower() if mtype not in self.modules: self.log.debug("Skipping '%s' module", mtype) continue if mtype == "image": url = module["imageSizes"]["size_original"]["url"] append((url, module)) elif mtype == "video": try: renditions = module["videoData"]["renditions"] except Exception: self.log.warning("No download URLs for video %s", module.get("id") or "???") continue try: url = [ r["url"] for r in renditions if text.ext_from_url(r["url"]) != "m3u8" ][-1] except Exception as exc: self.log.debug("%s: %s", exc.__class__.__name__, exc) url = "ytdl:" + renditions[-1]["url"] append((url, module)) elif mtype == "mediacollection": for component in module["components"]: for size in component["imageSizes"].values(): if size: parts = size["url"].split("/") parts[4] = "source" append(("/".join(parts), module)) break elif mtype == "embed": embed = module.get("originalEmbed") or module.get("fluidEmbed") if embed: embed = text.unescape(text.extr(embed, 'src="', '"')) module["extension"] = "mp4" append(("ytdl:" + embed, module)) elif mtype == "text": module["extension"] = "txt" append(("text:" + module["text"], module)) return result class BehanceUserExtractor(BehanceExtractor): """Extractor for a user's galleries from www.behance.net""" subcategory = "user" categorytransfer = True pattern = r"(?:https?://)?(?:www\.)?behance\.net/([^/?#]+)/?$" example = "https://www.behance.net/USER" def __init__(self, match): BehanceExtractor.__init__(self, match) self.user = match.group(1) def galleries(self): endpoint = "GetProfileProjects" variables = { "username": self.user, "after" : "MAo=", # "0" in base64 } while True: data = self._request_graphql(endpoint, variables) items = data["user"]["profileProjects"] yield from items["nodes"] if not items["pageInfo"]["hasNextPage"]: return variables["after"] = items["pageInfo"]["endCursor"] class BehanceCollectionExtractor(BehanceExtractor): """Extractor for a collection's galleries from www.behance.net""" subcategory = "collection" categorytransfer = True pattern = r"(?:https?://)?(?:www\.)?behance\.net/collection/(\d+)" example = "https://www.behance.net/collection/12345/TITLE" def __init__(self, match): BehanceExtractor.__init__(self, match) self.collection_id = match.group(1) def galleries(self): endpoint = "GetMoodboardItemsAndRecommendations" variables = { "afterItem": "MAo=", # "0" in base64 "firstItem": 40, "id" : int(self.collection_id), "shouldGetItems" : True, "shouldGetMoodboardFields": False, "shouldGetRecommendations": False, } while True: data = self._request_graphql(endpoint, variables) items = data["moodboard"]["items"] for node in items["nodes"]: yield node["entity"] if not items["pageInfo"]["hasNextPage"]: return variables["afterItem"] = items["pageInfo"]["endCursor"] GRAPHQL_QUERIES = { "GetProfileProjects": """\ query GetProfileProjects($username: String, $after: String) { user(username: $username) { profileProjects(first: 12, after: $after) { pageInfo { endCursor hasNextPage } nodes { __typename adminFlags { mature_lock privacy_lock dmca_lock flagged_lock privacy_violation_lock trademark_lock spam_lock eu_ip_lock } colors { r g b } covers { size_202 { url } size_404 { url } size_808 { url } } features { url name featuredOn ribbon { image image2x image3x } } fields { id label slug url } hasMatureContent id isFeatured isHiddenFromWorkTab isMatureReviewSubmitted isOwner isFounder isPinnedToSubscriptionOverview isPrivate linkedAssets { 
...sourceLinkFields } linkedAssetsCount sourceFiles { ...sourceFileFields } matureAccess modifiedOn name owners { ...OwnerFields images { size_50 { url } } } premium publishedOn stats { appreciations { all } views { all } comments { all } } slug tools { id title category categoryLabel categoryId approved url backgroundColor } url } } } } fragment sourceFileFields on SourceFile { __typename sourceFileId projectId userId title assetId renditionUrl mimeType size category licenseType unitAmount currency tier hidden extension hasUserPurchased } fragment sourceLinkFields on LinkedAsset { __typename name premium url category licenseType } fragment OwnerFields on User { displayName hasPremiumAccess id isFollowing isProfileOwner location locationUrl url username availabilityInfo { availabilityTimeline isAvailableFullTime isAvailableFreelance } } """, "GetMoodboardItemsAndRecommendations": """\ query GetMoodboardItemsAndRecommendations( $id: Int! $firstItem: Int! $afterItem: String $shouldGetRecommendations: Boolean! $shouldGetItems: Boolean! $shouldGetMoodboardFields: Boolean! ) { viewer @include(if: $shouldGetMoodboardFields) { isOptedOutOfRecommendations isAdmin } moodboard(id: $id) { ...moodboardFields @include(if: $shouldGetMoodboardFields) items(first: $firstItem, after: $afterItem) @include(if: $shouldGetItems) { pageInfo { endCursor hasNextPage } nodes { ...nodesFields } } recommendedItems(first: 80) @include(if: $shouldGetRecommendations) { nodes { ...nodesFields fetchSource } } } } fragment moodboardFields on Moodboard { id label privacy followerCount isFollowing projectCount url isOwner owners { ...OwnerFields images { size_50 { url } size_100 { url } size_115 { url } size_230 { url } size_138 { url } size_276 { url } } } } fragment projectFields on Project { __typename id isOwner publishedOn matureAccess hasMatureContent modifiedOn name url isPrivate slug license { license description id label url text images } fields { label } colors { r g b } owners { ...OwnerFields images { size_50 { url } size_100 { url } size_115 { url } size_230 { url } size_138 { url } size_276 { url } } } covers { size_original { url } size_max_808 { url } size_808 { url } size_404 { url } size_202 { url } size_230 { url } size_115 { url } } stats { views { all } appreciations { all } comments { all } } } fragment exifDataValueFields on exifDataValue { id label value searchValue } fragment nodesFields on MoodboardItem { id entityType width height flexWidth flexHeight images { size url } entity { ... on Project { ...projectFields } ... on ImageModule { project { ...projectFields } colors { r g b } exifData { lens { ...exifDataValueFields } software { ...exifDataValueFields } makeAndModel { ...exifDataValueFields } focalLength { ...exifDataValueFields } iso { ...exifDataValueFields } location { ...exifDataValueFields } flash { ...exifDataValueFields } exposureMode { ...exifDataValueFields } shutterSpeed { ...exifDataValueFields } aperture { ...exifDataValueFields } } } ... 
on MediaCollectionComponent { project { ...projectFields } } } } fragment OwnerFields on User { displayName hasPremiumAccess id isFollowing isProfileOwner location locationUrl url username availabilityInfo { availabilityTimeline isAvailableFullTime isAvailableFreelance } } """, } ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/blogger.py���������������������������������������������������0000644�0001750�0001750�00000014771�14571514301�020562� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Blogger blogs""" from .common import BaseExtractor, Message from .. 
import text, util import re class BloggerExtractor(BaseExtractor): """Base class for blogger extractors""" basecategory = "blogger" directory_fmt = ("blogger", "{blog[name]}", "{post[date]:%Y-%m-%d} {post[title]}") filename_fmt = "{num:>03}.{extension}" archive_fmt = "{post[id]}_{num}" def _init(self): self.api = BloggerAPI(self) self.blog = self.root.rpartition("/")[2] self.videos = self.config("videos", True) def items(self): blog = self.api.blog_by_url("http://" + self.blog) blog["pages"] = blog["pages"]["totalItems"] blog["posts"] = blog["posts"]["totalItems"] blog["date"] = text.parse_datetime(blog["published"]) del blog["selfLink"] sub = re.compile(r"(/|=)(?:[sw]\d+|w\d+-h\d+)(?=/|$)").sub findall_image = re.compile( r'src="(https?://(?:' r'blogger\.googleusercontent\.com/img|' r'lh\d+(?:-\w+)?\.googleusercontent\.com|' r'\d+\.bp\.blogspot\.com)/[^"]+)').findall findall_video = re.compile( r'src="(https?://www\.blogger\.com/video\.g\?token=[^"]+)').findall metadata = self.metadata() for post in self.posts(blog): content = post["content"] files = findall_image(content) for idx, url in enumerate(files): files[idx] = sub(r"\1s0", url).replace("http:", "https:", 1) if self.videos and 'id="BLOG_video-' in content: page = self.request(post["url"]).text for url in findall_video(page): page = self.request(url).text video_config = util.json_loads(text.extr( page, 'var VIDEO_CONFIG =', '\n')) files.append(max( video_config["streams"], key=lambda x: x["format_id"], )["play_url"]) post["author"] = post["author"]["displayName"] post["replies"] = post["replies"]["totalItems"] post["content"] = text.remove_html(content) post["date"] = text.parse_datetime(post["published"]) del post["selfLink"] del post["blog"] data = {"blog": blog, "post": post} if metadata: data.update(metadata) yield Message.Directory, data for data["num"], url in enumerate(files, 1): data["url"] = url yield Message.Url, url, text.nameext_from_url(url, data) def posts(self, blog): """Return an iterable with all relevant post objects""" def metadata(self): """Return additional metadata""" BASE_PATTERN = BloggerExtractor.update({ "blogspot": { "root": None, "pattern": r"[\w-]+\.blogspot\.com", }, "micmicidol": { "root": "https://www.micmicidol.club", "pattern": r"(?:www\.)?micmicidol\.club", }, }) class BloggerPostExtractor(BloggerExtractor): """Extractor for a single blog post""" subcategory = "post" pattern = BASE_PATTERN + r"(/\d\d\d\d/\d\d/[^/?#]+\.html)" example = "https://BLOG.blogspot.com/1970/01/TITLE.html" def __init__(self, match): BloggerExtractor.__init__(self, match) self.path = match.group(match.lastindex) def posts(self, blog): return (self.api.post_by_path(blog["id"], self.path),) class BloggerBlogExtractor(BloggerExtractor): """Extractor for an entire Blogger blog""" subcategory = "blog" pattern = BASE_PATTERN + r"/?$" example = "https://BLOG.blogspot.com/" def posts(self, blog): return self.api.blog_posts(blog["id"]) class BloggerSearchExtractor(BloggerExtractor): """Extractor for Blogger search resuls""" subcategory = "search" pattern = BASE_PATTERN + r"/search/?\?q=([^&#]+)" example = "https://BLOG.blogspot.com/search?q=QUERY" def __init__(self, match): BloggerExtractor.__init__(self, match) self.query = text.unquote(match.group(match.lastindex)) def posts(self, blog): return self.api.blog_search(blog["id"], self.query) def metadata(self): return {"query": self.query} class BloggerLabelExtractor(BloggerExtractor): """Extractor for Blogger posts by label""" subcategory = "label" pattern = BASE_PATTERN + 
r"/search/label/([^/?#]+)" example = "https://BLOG.blogspot.com/search/label/LABEL" def __init__(self, match): BloggerExtractor.__init__(self, match) self.label = text.unquote(match.group(match.lastindex)) def posts(self, blog): return self.api.blog_posts(blog["id"], self.label) def metadata(self): return {"label": self.label} class BloggerAPI(): """Minimal interface for the Blogger v3 API Ref: https://developers.google.com/blogger """ API_KEY = "AIzaSyCN9ax34oMMyM07g_M-5pjeDp_312eITK8" def __init__(self, extractor): self.extractor = extractor self.api_key = extractor.config("api-key", self.API_KEY) def blog_by_url(self, url): return self._call("blogs/byurl", {"url": url}, "blog") def blog_posts(self, blog_id, label=None): endpoint = "blogs/{}/posts".format(blog_id) params = {"labels": label} return self._pagination(endpoint, params) def blog_search(self, blog_id, query): endpoint = "blogs/{}/posts/search".format(blog_id) params = {"q": query} return self._pagination(endpoint, params) def post_by_path(self, blog_id, path): endpoint = "blogs/{}/posts/bypath".format(blog_id) return self._call(endpoint, {"path": path}, "post") def _call(self, endpoint, params, notfound=None): url = "https://www.googleapis.com/blogger/v3/" + endpoint params["key"] = self.api_key return self.extractor.request( url, params=params, notfound=notfound).json() def _pagination(self, endpoint, params): while True: data = self._call(endpoint, params) if "items" in data: yield from data["items"] if "nextPageToken" not in data: return params["pageToken"] = data["nextPageToken"] �������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1711040587.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/bluesky.py���������������������������������������������������0000644�0001750�0001750�00000036260�14577064113�020623� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2024 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bsky.app/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache, memcache BASE_PATTERN = r"(?:https?://)?bsky\.app" USER_PATTERN = BASE_PATTERN + r"/profile/([^/?#]+)" class BlueskyExtractor(Extractor): """Base class for bluesky extractors""" category = "bluesky" directory_fmt = ("{category}", "{author[handle]}") filename_fmt = "{createdAt[:19]}_{post_id}_{num}.{extension}" archive_fmt = "{filename}" root = "https://bsky.app" def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) def _init(self): meta = self.config("metadata") or () if meta: if isinstance(meta, str): meta = meta.replace(" ", "").split(",") elif not isinstance(meta, (list, tuple)): meta = ("user", "facets") self._metadata_user = ("user" in meta) self._metadata_facets = ("facets" in meta) self.api = BlueskyAPI(self) self._user = self._user_did = None self.instance = self.root.partition("://")[2] def items(self): for post in self.posts(): if "post" in post: post = post["post"] pid = post["uri"].rpartition("/")[2] if self._user_did and post["author"]["did"] != self._user_did: self.log.debug("Skipping %s (repost)", pid) continue post.update(post["record"]) del post["record"] images = () if "embed" in post: media = post["embed"] if "media" in media: media = media["media"] if "images" in media: images = media["images"] if self._metadata_facets: if "facets" in post: post["hashtags"] = tags = [] post["mentions"] = dids = [] post["uris"] = uris = [] for facet in post["facets"]: features = facet["features"][0] if "tag" in features: tags.append(features["tag"]) elif "did" in features: dids.append(features["did"]) elif "uri" in features: uris.append(features["uri"]) else: post["hashtags"] = post["mentions"] = post["uris"] = () if self._metadata_user: post["user"] = self._user or post["author"] post["instance"] = self.instance post["post_id"] = pid post["count"] = len(images) post["date"] = text.parse_datetime( post["createdAt"][:19], "%Y-%m-%dT%H:%M:%S") yield Message.Directory, post if not images: continue base = ("https://bsky.social/xrpc/com.atproto.sync.getBlob" "?did={}&cid=".format(post["author"]["did"])) post["num"] = 0 for file in images: post["num"] += 1 post["description"] = file["alt"] try: aspect = file["aspectRatio"] post["width"] = aspect["width"] post["height"] = aspect["height"] except KeyError: post["width"] = post["height"] = 0 image = file["image"] try: cid = image["ref"]["$link"] except KeyError: cid = image["cid"] post["filename"] = cid post["extension"] = image["mimeType"].rpartition("/")[2] yield Message.Url, base + cid, post def posts(self): return () def _make_post(self, actor, kind): did = self.api._did_from_actor(actor) profile = self.api.get_profile(did) if kind not in profile: return () cid = profile[kind].rpartition("/")[2].partition("@")[0] return ({ "post": { "embed": {"images": [{ "alt": kind, "image": { "$type" : "blob", "ref" : {"$link": cid}, "mimeType": "image/jpeg", "size" : 0, }, "aspectRatio": { "width" : 1000, "height": 1000, }, }]}, "author" : profile, "record" : (), "createdAt": "", "uri" : cid, }, },) class BlueskyUserExtractor(BlueskyExtractor): subcategory = "user" pattern = USER_PATTERN + r"$" example = "https://bsky.app/profile/HANDLE" def initialize(self): pass def items(self): base = "{}/profile/{}/".format(self.root, self.user) return self._dispatch_extractors(( (BlueskyAvatarExtractor , base + "avatar"), (BlueskyBackgroundExtractor, base + "banner"), (BlueskyPostsExtractor , base + "posts"), (BlueskyRepliesExtractor , base + "replies"), (BlueskyMediaExtractor , 
base + "media"), (BlueskyLikesExtractor , base + "likes"), ), ("media",)) class BlueskyPostsExtractor(BlueskyExtractor): subcategory = "posts" pattern = USER_PATTERN + r"/posts" example = "https://bsky.app/profile/HANDLE/posts" def posts(self): return self.api.get_author_feed(self.user, "posts_and_author_threads") class BlueskyRepliesExtractor(BlueskyExtractor): subcategory = "replies" pattern = USER_PATTERN + r"/replies" example = "https://bsky.app/profile/HANDLE/replies" def posts(self): return self.api.get_author_feed(self.user, "posts_with_replies") class BlueskyMediaExtractor(BlueskyExtractor): subcategory = "media" pattern = USER_PATTERN + r"/media" example = "https://bsky.app/profile/HANDLE/media" def posts(self): return self.api.get_author_feed(self.user, "posts_with_media") class BlueskyLikesExtractor(BlueskyExtractor): subcategory = "likes" pattern = USER_PATTERN + r"/likes" example = "https://bsky.app/profile/HANDLE/likes" def posts(self): return self.api.get_actor_likes(self.user) class BlueskyFeedExtractor(BlueskyExtractor): subcategory = "feed" pattern = USER_PATTERN + r"/feed/([^/?#]+)" example = "https://bsky.app/profile/HANDLE/feed/NAME" def __init__(self, match): BlueskyExtractor.__init__(self, match) self.feed = match.group(2) def posts(self): return self.api.get_feed(self.user, self.feed) class BlueskyListExtractor(BlueskyExtractor): subcategory = "list" pattern = USER_PATTERN + r"/lists/([^/?#]+)" example = "https://bsky.app/profile/HANDLE/lists/ID" def __init__(self, match): BlueskyExtractor.__init__(self, match) self.list = match.group(2) def posts(self): return self.api.get_list_feed(self.user, self.list) class BlueskyFollowingExtractor(BlueskyExtractor): subcategory = "following" pattern = USER_PATTERN + r"/follows" example = "https://bsky.app/profile/HANDLE/follows" def items(self): for user in self.api.get_follows(self.user): url = "https://bsky.app/profile/" + user["did"] user["_extractor"] = BlueskyUserExtractor yield Message.Queue, url, user class BlueskyPostExtractor(BlueskyExtractor): subcategory = "post" pattern = USER_PATTERN + r"/post/([^/?#]+)" example = "https://bsky.app/profile/HANDLE/post/ID" def __init__(self, match): BlueskyExtractor.__init__(self, match) self.post_id = match.group(2) def posts(self): return self.api.get_post_thread(self.user, self.post_id) class BlueskyAvatarExtractor(BlueskyExtractor): subcategory = "avatar" filename_fmt = "avatar_{post_id}.{extension}" pattern = USER_PATTERN + r"/avatar" example = "https://bsky.app/profile/HANDLE/avatar" def posts(self): return self._make_post(self.user, "avatar") class BlueskyBackgroundExtractor(BlueskyExtractor): subcategory = "background" filename_fmt = "background_{post_id}.{extension}" pattern = USER_PATTERN + r"/ba(?:nner|ckground)" example = "https://bsky.app/profile/HANDLE/banner" def posts(self): return self._make_post(self.user, "banner") class BlueskySearchExtractor(BlueskyExtractor): subcategory = "search" pattern = BASE_PATTERN + r"/search(?:/|\?q=)(.+)" example = "https://bsky.app/search?q=QUERY" def posts(self): return self.api.search_posts(self.user) class BlueskyAPI(): """Interface for the Bluesky API https://www.docs.bsky.app/docs/category/http-reference """ def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.headers = {"Accept": "application/json"} self.username, self.password = extractor._get_auth_info() if self.username: self.root = "https://bsky.social" else: self.root = "https://api.bsky.app" self.authenticate = util.noop def 
get_actor_likes(self, actor): endpoint = "app.bsky.feed.getActorLikes" params = { "actor": self._did_from_actor(actor), "limit": "100", } return self._pagination(endpoint, params) def get_author_feed(self, actor, filter="posts_and_author_threads"): endpoint = "app.bsky.feed.getAuthorFeed" params = { "actor" : self._did_from_actor(actor), "filter": filter, "limit" : "100", } return self._pagination(endpoint, params) def get_feed(self, actor, feed): endpoint = "app.bsky.feed.getFeed" params = { "feed" : "at://{}/app.bsky.feed.generator/{}".format( self._did_from_actor(actor, False), feed), "limit": "100", } return self._pagination(endpoint, params) def get_follows(self, actor): endpoint = "app.bsky.graph.getFollows" params = { "actor": self._did_from_actor(actor), "limit": "100", } return self._pagination(endpoint, params, "follows") def get_list_feed(self, actor, list): endpoint = "app.bsky.feed.getListFeed" params = { "list" : "at://{}/app.bsky.graph.list/{}".format( self._did_from_actor(actor, False), list), "limit": "100", } return self._pagination(endpoint, params) def get_post_thread(self, actor, post_id): endpoint = "app.bsky.feed.getPostThread" params = { "uri": "at://{}/app.bsky.feed.post/{}".format( self._did_from_actor(actor), post_id), "depth" : self.extractor.config("depth", "0"), "parentHeight": "0", } thread = self._call(endpoint, params)["thread"] if "replies" not in thread: return (thread,) index = 0 posts = [thread] while index < len(posts): post = posts[index] if "replies" in post: posts.extend(post["replies"]) index += 1 return posts @memcache(keyarg=1) def get_profile(self, did): endpoint = "app.bsky.actor.getProfile" params = {"actor": did} return self._call(endpoint, params) @memcache(keyarg=1) def resolve_handle(self, handle): endpoint = "com.atproto.identity.resolveHandle" params = {"handle": handle} return self._call(endpoint, params)["did"] def search_posts(self, query): endpoint = "app.bsky.feed.searchPosts" params = { "q" : query, "limit": "100", } return self._pagination(endpoint, params, "posts") def _did_from_actor(self, actor, user_did=True): if actor.startswith("did:"): did = actor else: did = self.resolve_handle(actor) extr = self.extractor if user_did and not extr.config("reposts", False): extr._user_did = did if extr._metadata_user: extr._user = self.get_profile(did) return did def authenticate(self): self.headers["Authorization"] = self._authenticate_impl(self.username) @cache(maxage=3600, keyarg=1) def _authenticate_impl(self, username): refresh_token = _refresh_token_cache(username) if refresh_token: self.log.info("Refreshing access token for %s", username) endpoint = "com.atproto.server.refreshSession" headers = {"Authorization": "Bearer " + refresh_token} data = None else: self.log.info("Logging in as %s", username) endpoint = "com.atproto.server.createSession" headers = None data = { "identifier": username, "password" : self.password, } url = "{}/xrpc/{}".format(self.root, endpoint) response = self.extractor.request( url, method="POST", headers=headers, json=data, fatal=None) data = response.json() if response.status_code != 200: self.log.debug("Server response: %s", data) raise exception.AuthenticationError('"{}: {}"'.format( data.get("error"), data.get("message"))) _refresh_token_cache.update(self.username, data["refreshJwt"]) return "Bearer " + data["accessJwt"] def _call(self, endpoint, params): url = "{}/xrpc/{}".format(self.root, endpoint) while True: self.authenticate() response = self.extractor.request( url, params=params, 
headers=self.headers, fatal=None) if response.status_code < 400: return response.json() if response.status_code == 429: until = response.headers.get("RateLimit-Reset") self.extractor.wait(until=until) continue try: data = response.json() msg = "API request failed ('{}: {}')".format( data["error"], data["message"]) except Exception: msg = "API request failed ({} {})".format( response.status_code, response.reason) self.extractor.log.debug("Server response: %s", response.text) raise exception.StopExtraction(msg) def _pagination(self, endpoint, params, key="feed"): while True: data = self._call(endpoint, params) yield from data[key] cursor = data.get("cursor") if not cursor: return params["cursor"] = cursor @cache(maxage=84*86400, keyarg=0) def _refresh_token_cache(username): return None ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1708354600.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/booru.py�����������������������������������������������������0000644�0001750�0001750�00000004530�14564666050�020272� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for *booru sites""" from .common import BaseExtractor, Message from .. 
import text import operator class BooruExtractor(BaseExtractor): """Base class for *booru extractors""" basecategory = "booru" filename_fmt = "{category}_{id}_{md5}.{extension}" page_start = 0 per_page = 100 def items(self): self.login() data = self.metadata() tags = self.config("tags", False) notes = self.config("notes", False) fetch_html = tags or notes url_key = self.config("url") if url_key: self._file_url = operator.itemgetter(url_key) for post in self.posts(): try: url = self._file_url(post) if url[0] == "/": url = self.root + url except (KeyError, TypeError): self.log.debug("Unable to fetch download URL for post %s " "(md5: %s)", post.get("id"), post.get("md5")) continue if fetch_html: html = self._html(post) if tags: self._tags(post, html) if notes: self._notes(post, html) text.nameext_from_url(url, post) post.update(data) self._prepare(post) yield Message.Directory, post yield Message.Url, url, post def skip(self, num): pages = num // self.per_page self.page_start += pages return pages * self.per_page def login(self): """Login and set necessary cookies""" def metadata(self): """Return a dict with general metadata""" return () def posts(self): """Return an iterable with post objects""" return () _file_url = operator.itemgetter("file_url") def _prepare(self, post): """Prepare a 'post's metadata""" def _html(self, post): """Return HTML content of a post""" def _tags(self, post, page): """Extract extended tag metadata""" def _notes(self, post, page): """Extract notes metadata""" ������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1711069832.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/bunkr.py�����������������������������������������������������0000644�0001750�0001750�00000005643�14577155210�020266� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bunkr.sk/""" from .lolisafe import LolisafeAlbumExtractor from .. 
import text BASE_PATTERN = ( r"(?:https?://)?(?:app\.)?(bunkr+" r"\.(?:s[kiu]|ru|la|is|to|ac|black|cat|media|red|site|ws))" ) LEGACY_DOMAINS = { "bunkr.ru", "bunkrr.ru", "bunkr.su", "bunkrr.su", "bunkr.la", "bunkr.is", "bunkr.to", } class BunkrAlbumExtractor(LolisafeAlbumExtractor): """Extractor for bunkr.sk albums""" category = "bunkr" root = "https://bunkr.sk" pattern = BASE_PATTERN + r"/a/([^/?#]+)" example = "https://bunkr.sk/a/ID" def __init__(self, match): LolisafeAlbumExtractor.__init__(self, match) domain = match.group(match.lastindex-1) if domain not in LEGACY_DOMAINS: self.root = "https://" + domain def fetch_album(self, album_id): # album metadata page = self.request(self.root + "/a/" + self.album_id).text info = text.split_html(text.extr( page, "<h1", "</div>").partition(">")[2]) count, _, size = info[1].split(None, 2) pos = page.index('class="grid-images') urls = list(text.extract_iter(page, '<a href="', '"', pos)) return self._extract_files(urls), { "album_id" : self.album_id, "album_name" : text.unescape(info[0]), "album_size" : size[1:-1], "count" : len(urls), } def _extract_files(self, urls): for url in urls: try: url = self._extract_file(text.unescape(url)) except Exception as exc: self.log.error("%s: %s", exc.__class__.__name__, exc) continue yield {"file": text.unescape(url)} def _extract_file(self, url): page = self.request(url).text return ( text.extr(page, '<source src="', '"') or text.extr(page, '<img src="', '"') or text.rextract(page, ' href="', '"', page.rindex("Download"))[0] ) class BunkrMediaExtractor(BunkrAlbumExtractor): """Extractor for bunkr.sk media links""" subcategory = "media" directory_fmt = ("{category}",) pattern = BASE_PATTERN + r"(/[vid]/[^/?#]+)" example = "https://bunkr.sk/v/FILENAME" def fetch_album(self, album_id): try: url = self._extract_file(self.root + self.album_id) except Exception as exc: self.log.error("%s: %s", exc.__class__.__name__, exc) return (), {} return ({"file": text.unescape(url)},), { "album_id" : "", "album_name" : "", "album_size" : -1, "description": "", "count" : 1, } ���������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/catbox.py����������������������������������������������������0000644�0001750�0001750�00000003515�14571514301�020413� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://catbox.moe/""" from .common import GalleryExtractor, Extractor, Message from .. import text class CatboxAlbumExtractor(GalleryExtractor): """Extractor for catbox albums""" category = "catbox" subcategory = "album" root = "https://catbox.moe" filename_fmt = "{filename}.{extension}" directory_fmt = ("{category}", "{album_name} ({album_id})") archive_fmt = "{album_id}_{filename}" pattern = r"(?:https?://)?(?:www\.)?catbox\.moe(/c/[^/?#]+)" example = "https://catbox.moe/c/ID" def metadata(self, page): extr = text.extract_from(page) return { "album_id" : self.gallery_url.rpartition("/")[2], "album_name" : text.unescape(extr("<h1>", "<")), "date" : text.parse_datetime(extr( "<p>Created ", "<"), "%B %d %Y"), "description": text.unescape(extr("<p>", "<")), } def images(self, page): return [ ("https://files.catbox.moe/" + path, None) for path in text.extract_iter( page, ">https://files.catbox.moe/", "<") ] class CatboxFileExtractor(Extractor): """Extractor for catbox files""" category = "catbox" subcategory = "file" archive_fmt = "{filename}" pattern = r"(?:https?://)?(?:files|litter|de)\.catbox\.moe/([^/?#]+)" example = "https://files.catbox.moe/NAME.EXT" def items(self): url = text.ensure_http_scheme(self.url) file = text.nameext_from_url(url, {"url": url}) yield Message.Directory, file yield Message.Url, url, file �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/chevereto.py�������������������������������������������������0000644�0001750�0001750�00000006352�14571514301�021121� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Chevereto galleries""" from .common import BaseExtractor, Message from .. import text class CheveretoExtractor(BaseExtractor): """Base class for chevereto extractors""" basecategory = "chevereto" directory_fmt = ("{category}", "{user}", "{album}",) archive_fmt = "{id}" def __init__(self, match): BaseExtractor.__init__(self, match) self.path = match.group(match.lastindex) def _pagination(self, url): while url: page = self.request(url).text for item in text.extract_iter( page, '<div class="list-item-image ', 'image-container'): yield text.extr(item, '<a href="', '"') url = text.extr(page, '<a data-pagination="next" href="', '" ><') BASE_PATTERN = CheveretoExtractor.update({ "jpgfish": { "root": "https://jpg4.su", "pattern": r"jpe?g\d?\.(?:su|pet|fish(?:ing)?|church)", }, "imgkiwi": { "root": "https://img.kiwi", "pattern": r"img\.kiwi", }, "deltaporno": { "root": "https://gallery.deltaporno.com", "pattern": r"gallery\.deltaporno\.com", }, }) class CheveretoImageExtractor(CheveretoExtractor): """Extractor for chevereto Images""" subcategory = "image" pattern = BASE_PATTERN + r"(/im(?:g|age)/[^/?#]+)" example = "https://jpg2.su/img/TITLE.ID" def items(self): url = self.root + self.path extr = text.extract_from(self.request(url).text) image = { "id" : self.path.rpartition(".")[2], "url" : extr('<meta property="og:image" content="', '"'), "album": text.extr(extr("Added to <a", "/a>"), ">", "<"), "user" : extr('username: "', '"'), } text.nameext_from_url(image["url"], image) yield Message.Directory, image yield Message.Url, image["url"], image class CheveretoAlbumExtractor(CheveretoExtractor): """Extractor for chevereto Albums""" subcategory = "album" pattern = BASE_PATTERN + r"(/a(?:lbum)?/[^/?#]+(?:/sub)?)" example = "https://jpg2.su/album/TITLE.ID" def items(self): url = self.root + self.path data = {"_extractor": CheveretoImageExtractor} if self.path.endswith("/sub"): albums = self._pagination(url) else: albums = (url,) for album in albums: for image in self._pagination(album): yield Message.Queue, image, data class CheveretoUserExtractor(CheveretoExtractor): """Extractor for chevereto Users""" subcategory = "user" pattern = BASE_PATTERN + r"(/(?!img|image|a(?:lbum)?)[^/?#]+(?:/albums)?)" example = "https://jpg2.su/USER" def items(self): url = self.root + self.path if self.path.endswith("/albums"): data = {"_extractor": CheveretoAlbumExtractor} else: data = {"_extractor": CheveretoImageExtractor} for url in self._pagination(url): yield Message.Queue, url, data ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/comicvine.py�������������������������������������������������0000644�0001750�0001750�00000004007�14571514301�021104� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://comicvine.gamespot.com/""" from .booru import BooruExtractor from .. import text import operator class ComicvineTagExtractor(BooruExtractor): """Extractor for a gallery on comicvine.gamespot.com""" category = "comicvine" subcategory = "tag" basecategory = "" root = "https://comicvine.gamespot.com" per_page = 1000 directory_fmt = ("{category}", "{tag}") filename_fmt = "{filename}.{extension}" archive_fmt = "{id}" pattern = (r"(?:https?://)?comicvine\.gamespot\.com" r"(/([^/?#]+)/(\d+-\d+)/images/.*)") example = "https://comicvine.gamespot.com/TAG/123-45/images/" def __init__(self, match): BooruExtractor.__init__(self, match) self.path, self.object_name, self.object_id = match.groups() def metadata(self): return {"tag": text.unquote(self.object_name)} def posts(self): url = self.root + "/js/image-data.json" params = { "images": text.extract( self.request(self.root + self.path).text, 'data-gallery-id="', '"')[0], "start" : self.page_start, "count" : self.per_page, "object": self.object_id, } while True: images = self.request(url, params=params).json()["images"] yield from images if len(images) < self.per_page: return params["start"] += self.per_page def skip(self, num): self.page_start = num return num _file_url = operator.itemgetter("original") @staticmethod def _prepare(post): post["date"] = text.parse_datetime( post["dateCreated"], "%a, %b %d %Y") post["tags"] = [tag["name"] for tag in post["tags"] if tag["name"]] 
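# ---------------------------------------------------------------------------
# Illustrative sketch (not part of gallery-dl): ComicvineTagExtractor.posts()
# above pages through "/js/image-data.json" by advancing a "start" offset in
# steps of "count" until the server returns a short page.  The standalone
# generator below reproduces that offset-pagination pattern with plain
# `requests`; the endpoint, the "images" response key, and the parameter
# names mirror the code above, while the helper itself is illustrative only.

import requests


def paginate_images(url, params, count=1000):
    """Yield items from an offset-paginated JSON endpoint.

    "start" is advanced by "count" after every full page; a page with
    fewer than "count" entries signals the end of the listing.
    """
    params = dict(params, start=0, count=count)
    while True:
        images = requests.get(url, params=params, timeout=30).json()["images"]
        yield from images
        if len(images) < count:
            return
        params["start"] += count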
# gallery_dl-1.26.9/gallery_dl/extractor/common.py
# -*- coding: utf-8 -*-
# Copyright 2014-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Common classes and constants used by extractor modules."""

import os
import re
import ssl
import time
import netrc
import queue
import logging
import datetime
import requests
import threading
from requests.adapters import HTTPAdapter
from .message import Message
from ..
import config, text, util, cache, exception class Extractor(): category = "" subcategory = "" basecategory = "" categorytransfer = False directory_fmt = ("{category}",) filename_fmt = "{filename}.{extension}" archive_fmt = "" root = "" cookies_domain = "" referer = True ciphers = None tls12 = True browser = None request_interval = 0.0 request_interval_min = 0.0 request_timestamp = 0.0 def __init__(self, match): self.log = logging.getLogger(self.category) self.url = match.string self._cfgpath = ("extractor", self.category, self.subcategory) self._parentdir = "" @classmethod def from_url(cls, url): if isinstance(cls.pattern, str): cls.pattern = re.compile(cls.pattern) match = cls.pattern.match(url) return cls(match) if match else None def __iter__(self): self.initialize() return self.items() def initialize(self): self._init_options() self._init_session() self._init_cookies() self._init() self.initialize = util.noop def finalize(self): pass def items(self): yield Message.Version, 1 def skip(self, num): return 0 def config(self, key, default=None): return config.interpolate(self._cfgpath, key, default) def config2(self, key, key2, default=None, sentinel=util.SENTINEL): value = self.config(key, sentinel) if value is not sentinel: return value return self.config(key2, default) def config_deprecated(self, key, deprecated, default=None, sentinel=util.SENTINEL, history=set()): value = self.config(deprecated, sentinel) if value is not sentinel: if deprecated not in history: history.add(deprecated) self.log.warning("'%s' is deprecated. Use '%s' instead.", deprecated, key) default = value value = self.config(key, sentinel) if value is not sentinel: return value return default def config_accumulate(self, key): return config.accumulate(self._cfgpath, key) def config_instance(self, key, default=None): return default def _config_shared(self, key, default=None): return config.interpolate_common( ("extractor",), self._cfgpath, key, default) def _config_shared_accumulate(self, key): first = True extr = ("extractor",) for path in self._cfgpath: if first: first = False values = config.accumulate(extr + path, key) else: conf = config.get(extr, path[0]) if conf: values[:0] = config.accumulate( (self.subcategory,), key, conf=conf) return values def request(self, url, method="GET", session=None, retries=None, retry_codes=None, encoding=None, fatal=True, notfound=None, **kwargs): if session is None: session = self.session if retries is None: retries = self._retries if retry_codes is None: retry_codes = self._retry_codes if "proxies" not in kwargs: kwargs["proxies"] = self._proxies if "timeout" not in kwargs: kwargs["timeout"] = self._timeout if "verify" not in kwargs: kwargs["verify"] = self._verify if "json" in kwargs: json = kwargs["json"] if json is not None: kwargs["data"] = util.json_dumps(json).encode() del kwargs["json"] headers = kwargs.get("headers") if headers: headers["Content-Type"] = "application/json" else: kwargs["headers"] = {"Content-Type": "application/json"} response = None tries = 1 if self._interval: seconds = (self._interval() - (time.time() - Extractor.request_timestamp)) if seconds > 0.0: self.sleep(seconds, "request") while True: try: response = session.request(method, url, **kwargs) except (requests.exceptions.ConnectionError, requests.exceptions.Timeout, requests.exceptions.ChunkedEncodingError, requests.exceptions.ContentDecodingError) as exc: msg = exc except (requests.exceptions.RequestException) as exc: raise exception.HttpError(exc) else: code = response.status_code if 
self._write_pages: self._dump_response(response) if 200 <= code < 400 or fatal is None and \ (400 <= code < 500) or not fatal and \ (400 <= code < 429 or 431 <= code < 500): if encoding: response.encoding = encoding return response if notfound and code == 404: raise exception.NotFoundError(notfound) msg = "'{} {}' for '{}'".format(code, response.reason, url) server = response.headers.get("Server") if server and server.startswith("cloudflare") and \ code in (403, 503): content = response.content if b"_cf_chl_opt" in content or b"jschl-answer" in content: self.log.warning("Cloudflare challenge") break if b'name="captcha-bypass"' in content: self.log.warning("Cloudflare CAPTCHA") break if code not in retry_codes and code < 500: break finally: Extractor.request_timestamp = time.time() self.log.debug("%s (%s/%s)", msg, tries, retries+1) if tries > retries: break if self._interval: seconds = self._interval() if seconds < tries: seconds = tries else: seconds = tries self.sleep(seconds, "retry") tries += 1 raise exception.HttpError(msg, response) def wait(self, seconds=None, until=None, adjust=1.0, reason="rate limit reset"): now = time.time() if seconds: seconds = float(seconds) until = now + seconds elif until: if isinstance(until, datetime.datetime): # convert to UTC timestamp until = util.datetime_to_timestamp(until) else: until = float(until) seconds = until - now else: raise ValueError("Either 'seconds' or 'until' is required") seconds += adjust if seconds <= 0.0: return if reason: t = datetime.datetime.fromtimestamp(until).time() isotime = "{:02}:{:02}:{:02}".format(t.hour, t.minute, t.second) self.log.info("Waiting until %s for %s.", isotime, reason) time.sleep(seconds) def sleep(self, seconds, reason): self.log.debug("Sleeping %.2f seconds (%s)", seconds, reason) time.sleep(seconds) def _get_auth_info(self): """Return authentication information as (username, password) tuple""" username = self.config("username") password = None if username: password = self.config("password") or util.LazyPrompt() elif self.config("netrc", False): try: info = netrc.netrc().authenticators(self.category) username, _, password = info except (OSError, netrc.NetrcParseError) as exc: self.log.error("netrc: %s", exc) except TypeError: self.log.warning("netrc: No authentication info") return username, password def _init(self): pass def _init_options(self): self._write_pages = self.config("write-pages", False) self._retry_codes = self.config("retry-codes") self._retries = self.config("retries", 4) self._timeout = self.config("timeout", 30) self._verify = self.config("verify", True) self._proxies = util.build_proxy_map(self.config("proxy"), self.log) self._interval = util.build_duration_func( self.config("sleep-request", self.request_interval), self.request_interval_min, ) if self._retries < 0: self._retries = float("inf") if not self._retry_codes: self._retry_codes = () def _init_session(self): self.session = session = requests.Session() headers = session.headers headers.clear() ssl_options = ssl_ciphers = 0 browser = self.config("browser") if browser is None: browser = self.browser if browser and isinstance(browser, str): browser, _, platform = browser.lower().partition(":") if not platform or platform == "auto": platform = ("Windows NT 10.0; Win64; x64" if util.WINDOWS else "X11; Linux x86_64") elif platform == "windows": platform = "Windows NT 10.0; Win64; x64" elif platform == "linux": platform = "X11; Linux x86_64" elif platform == "macos": platform = "Macintosh; Intel Mac OS X 11.5" if browser == "chrome": if 
platform.startswith("Macintosh"): platform = platform.replace(".", "_") + "_2" else: browser = "firefox" for key, value in HTTP_HEADERS[browser]: if value and "{}" in value: headers[key] = value.format(platform) else: headers[key] = value ssl_options |= (ssl.OP_NO_SSLv2 | ssl.OP_NO_SSLv3 | ssl.OP_NO_TLSv1 | ssl.OP_NO_TLSv1_1) ssl_ciphers = SSL_CIPHERS[browser] else: useragent = self.config("user-agent") if useragent is None: useragent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; " "rv:109.0) Gecko/20100101 Firefox/115.0") elif useragent == "browser": useragent = _browser_useragent() headers["User-Agent"] = useragent headers["Accept"] = "*/*" headers["Accept-Language"] = "en-US,en;q=0.5" ssl_ciphers = self.ciphers if BROTLI: headers["Accept-Encoding"] = "gzip, deflate, br" else: headers["Accept-Encoding"] = "gzip, deflate" referer = self.config("referer", self.referer) if referer: if isinstance(referer, str): headers["Referer"] = referer elif self.root: headers["Referer"] = self.root + "/" custom_headers = self.config("headers") if custom_headers: headers.update(custom_headers) custom_ciphers = self.config("ciphers") if custom_ciphers: if isinstance(custom_ciphers, list): ssl_ciphers = ":".join(custom_ciphers) else: ssl_ciphers = custom_ciphers source_address = self.config("source-address") if source_address: if isinstance(source_address, str): source_address = (source_address, 0) else: source_address = (source_address[0], source_address[1]) tls12 = self.config("tls12") if tls12 is None: tls12 = self.tls12 if not tls12: ssl_options |= ssl.OP_NO_TLSv1_2 self.log.debug("TLS 1.2 disabled.") adapter = _build_requests_adapter( ssl_options, ssl_ciphers, source_address) session.mount("https://", adapter) session.mount("http://", adapter) def _init_cookies(self): """Populate the session's cookiejar""" self.cookies = self.session.cookies self.cookies_file = None if self.cookies_domain is None: return cookies = self.config("cookies") if cookies: if isinstance(cookies, dict): self.cookies_update_dict(cookies, self.cookies_domain) elif isinstance(cookies, str): path = util.expand_path(cookies) try: with open(path) as fp: util.cookiestxt_load(fp, self.cookies) except Exception as exc: self.log.warning("cookies: %s", exc) else: self.log.debug("Loading cookies from '%s'", cookies) self.cookies_file = path elif isinstance(cookies, (list, tuple)): key = tuple(cookies) cookiejar = _browser_cookies.get(key) if cookiejar is None: from ..cookies import load_cookies cookiejar = self.cookies.__class__() try: load_cookies(cookiejar, cookies) except Exception as exc: self.log.warning("cookies: %s", exc) else: _browser_cookies[key] = cookiejar else: self.log.debug("Using cached cookies from %s", key) set_cookie = self.cookies.set_cookie for cookie in cookiejar: set_cookie(cookie) else: self.log.warning( "Expected 'dict', 'list', or 'str' value for 'cookies' " "option, got '%s' (%s)", cookies.__class__.__name__, cookies) def cookies_store(self): """Store the session's cookies in a cookies.txt file""" export = self.config("cookies-update", True) if not export: return if isinstance(export, str): path = util.expand_path(export) else: path = self.cookies_file if not path: return try: with open(path, "w") as fp: util.cookiestxt_store(fp, self.cookies) except OSError as exc: self.log.warning("cookies: %s", exc) def cookies_update(self, cookies, domain=""): """Update the session's cookiejar with 'cookies'""" if isinstance(cookies, dict): self.cookies_update_dict(cookies, domain or self.cookies_domain) else: set_cookie = 
self.cookies.set_cookie try: cookies = iter(cookies) except TypeError: set_cookie(cookies) else: for cookie in cookies: set_cookie(cookie) def cookies_update_dict(self, cookiedict, domain): """Update cookiejar with name-value pairs from a dict""" set_cookie = self.cookies.set for name, value in cookiedict.items(): set_cookie(name, value, domain=domain) def cookies_check(self, cookies_names, domain=None): """Check if all 'cookies_names' are in the session's cookiejar""" if not self.cookies: return False if domain is None: domain = self.cookies_domain names = set(cookies_names) now = time.time() for cookie in self.cookies: if cookie.name in names and ( not domain or cookie.domain == domain): if cookie.expires: diff = int(cookie.expires - now) if diff <= 0: self.log.warning( "Cookie '%s' has expired", cookie.name) continue elif diff <= 86400: hours = diff // 3600 self.log.warning( "Cookie '%s' will expire in less than %s hour%s", cookie.name, hours + 1, "s" if hours else "") names.discard(cookie.name) if not names: return True return False def _prepare_ddosguard_cookies(self): if not self.cookies.get("__ddg2", domain=self.cookies_domain): self.cookies.set( "__ddg2", util.generate_token(), domain=self.cookies_domain) def _cache(self, func, maxage, keyarg=None): # return cache.DatabaseCacheDecorator(func, maxage, keyarg) return cache.DatabaseCacheDecorator(func, keyarg, maxage) def _cache_memory(self, func, maxage=None, keyarg=None): return cache.Memcache() def _get_date_min_max(self, dmin=None, dmax=None): """Retrieve and parse 'date-min' and 'date-max' config values""" def get(key, default): ts = self.config(key, default) if isinstance(ts, str): try: ts = int(datetime.datetime.strptime(ts, fmt).timestamp()) except ValueError as exc: self.log.warning("Unable to parse '%s': %s", key, exc) ts = default return ts fmt = self.config("date-format", "%Y-%m-%dT%H:%M:%S") return get("date-min", dmin), get("date-max", dmax) def _dispatch_extractors(self, extractor_data, default=()): """ """ extractors = { data[0].subcategory: data for data in extractor_data } include = self.config("include", default) or () if include == "all": include = extractors elif isinstance(include, str): include = include.replace(" ", "").split(",") result = [(Message.Version, 1)] for category in include: try: extr, url = extractors[category] except KeyError: self.log.warning("Invalid include '%s'", category) else: result.append((Message.Queue, url, {"_extractor": extr})) return iter(result) @classmethod def _dump(cls, obj): util.dump_json(obj, ensure_ascii=False, indent=2) def _dump_response(self, response, history=True): """Write the response content to a .dump file in the current directory. 
The file name is derived from the response url, replacing special characters with "_" """ if history: for resp in response.history: self._dump_response(resp, False) if hasattr(Extractor, "_dump_index"): Extractor._dump_index += 1 else: Extractor._dump_index = 1 Extractor._dump_sanitize = re.compile(r"[\\\\|/<>:\"?*&=#]+").sub fname = "{:>02}_{}".format( Extractor._dump_index, Extractor._dump_sanitize('_', response.url), ) if util.WINDOWS: path = os.path.abspath(fname)[:255] else: path = fname[:251] try: with open(path + ".txt", 'wb') as fp: util.dump_response( response, fp, headers=(self._write_pages in ("all", "ALL")), hide_auth=(self._write_pages != "ALL") ) except Exception as e: self.log.warning("Failed to dump HTTP request (%s: %s)", e.__class__.__name__, e) class GalleryExtractor(Extractor): subcategory = "gallery" filename_fmt = "{category}_{gallery_id}_{num:>03}.{extension}" directory_fmt = ("{category}", "{gallery_id} {title}") archive_fmt = "{gallery_id}_{num}" enum = "num" def __init__(self, match, url=None): Extractor.__init__(self, match) self.gallery_url = self.root + match.group(1) if url is None else url def items(self): self.login() if self.gallery_url: page = self.request( self.gallery_url, notfound=self.subcategory).text else: page = None data = self.metadata(page) imgs = self.images(page) if "count" in data: if self.config("page-reverse"): images = util.enumerate_reversed(imgs, 1, data["count"]) else: images = zip( range(1, data["count"]+1), imgs, ) else: enum = enumerate try: data["count"] = len(imgs) except TypeError: pass else: if self.config("page-reverse"): enum = util.enumerate_reversed images = enum(imgs, 1) yield Message.Directory, data for data[self.enum], (url, imgdata) in images: if imgdata: data.update(imgdata) if "extension" not in imgdata: text.nameext_from_url(url, data) else: text.nameext_from_url(url, data) yield Message.Url, url, data def login(self): """Login and set necessary cookies""" def metadata(self, page): """Return a dict with general metadata""" def images(self, page): """Return a list of all (image-url, metadata)-tuples""" class ChapterExtractor(GalleryExtractor): subcategory = "chapter" directory_fmt = ( "{category}", "{manga}", "{volume:?v/ />02}c{chapter:>03}{chapter_minor:?//}{title:?: //}") filename_fmt = ( "{manga}_c{chapter:>03}{chapter_minor:?//}_{page:>03}.{extension}") archive_fmt = ( "{manga}_{chapter}{chapter_minor}_{page}") enum = "page" class MangaExtractor(Extractor): subcategory = "manga" categorytransfer = True chapterclass = None reverse = True def __init__(self, match, url=None): Extractor.__init__(self, match) self.manga_url = url or self.root + match.group(1) if self.config("chapter-reverse", False): self.reverse = not self.reverse def items(self): self.login() page = self.request(self.manga_url).text chapters = self.chapters(page) if self.reverse: chapters.reverse() for chapter, data in chapters: data["_extractor"] = self.chapterclass yield Message.Queue, chapter, data def login(self): """Login and set necessary cookies""" def chapters(self, page): """Return a list of all (chapter-url, metadata)-tuples""" class AsynchronousMixin(): """Run info extraction in a separate thread""" def __iter__(self): self.initialize() messages = queue.Queue(5) thread = threading.Thread( target=self.async_items, args=(messages,), daemon=True, ) thread.start() while True: msg = messages.get() if msg is None: thread.join() return if isinstance(msg, Exception): thread.join() raise msg yield msg messages.task_done() def async_items(self, 
messages): try: for msg in self.items(): messages.put(msg) except Exception as exc: messages.put(exc) messages.put(None) class BaseExtractor(Extractor): instances = () def __init__(self, match): if not self.category: self._init_category(match) Extractor.__init__(self, match) def _init_category(self, match): for index, group in enumerate(match.groups()): if group is not None: if index: self.category, self.root, info = self.instances[index-1] if not self.root: self.root = text.root_from_url(match.group(0)) self.config_instance = info.get else: self.root = group self.category = group.partition("://")[2] break @classmethod def update(cls, instances): extra_instances = config.get(("extractor",), cls.basecategory) if extra_instances: for category, info in extra_instances.items(): if isinstance(info, dict) and "root" in info: instances[category] = info pattern_list = [] instance_list = cls.instances = [] for category, info in instances.items(): root = info["root"] if root: root = root.rstrip("/") instance_list.append((category, root, info)) pattern = info.get("pattern") if not pattern: pattern = re.escape(root[root.index(":") + 3:]) pattern_list.append(pattern + "()") return ( r"(?:" + cls.basecategory + r":(https?://[^/?#]+)|" r"(?:https?://)?(?:" + "|".join(pattern_list) + r"))" ) class RequestsAdapter(HTTPAdapter): def __init__(self, ssl_context=None, source_address=None): self.ssl_context = ssl_context self.source_address = source_address HTTPAdapter.__init__(self) def init_poolmanager(self, *args, **kwargs): kwargs["ssl_context"] = self.ssl_context kwargs["source_address"] = self.source_address return HTTPAdapter.init_poolmanager(self, *args, **kwargs) def proxy_manager_for(self, *args, **kwargs): kwargs["ssl_context"] = self.ssl_context kwargs["source_address"] = self.source_address return HTTPAdapter.proxy_manager_for(self, *args, **kwargs) def _build_requests_adapter(ssl_options, ssl_ciphers, source_address): key = (ssl_options, ssl_ciphers, source_address) try: return _adapter_cache[key] except KeyError: pass if ssl_options or ssl_ciphers: ssl_context = ssl.create_default_context() if ssl_options: ssl_context.options |= ssl_options if ssl_ciphers: ssl_context.set_ecdh_curve("prime256v1") ssl_context.set_ciphers(ssl_ciphers) else: ssl_context = None adapter = _adapter_cache[key] = RequestsAdapter( ssl_context, source_address) return adapter @cache.cache(maxage=86400) def _browser_useragent(): """Get User-Agent header from default browser""" import webbrowser import socket server = socket.socket(socket.AF_INET, socket.SOCK_STREAM) server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) server.bind(("127.0.0.1", 6414)) server.listen(1) webbrowser.open("http://127.0.0.1:6414/user-agent") client = server.accept()[0] server.close() for line in client.recv(1024).split(b"\r\n"): key, _, value = line.partition(b":") if key.strip().lower() == b"user-agent": useragent = value.strip() break else: useragent = b"" client.send(b"HTTP/1.1 200 OK\r\n\r\n" + useragent) client.close() return useragent.decode() _adapter_cache = {} _browser_cookies = {} HTTP_HEADERS = { "firefox": ( ("User-Agent", "Mozilla/5.0 ({}; " "rv:109.0) Gecko/20100101 Firefox/115.0"), ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9," "image/avif,image/webp,*/*;q=0.8"), ("Accept-Language", "en-US,en;q=0.5"), ("Accept-Encoding", None), ("Referer", None), ("DNT", "1"), ("Connection", "keep-alive"), ("Upgrade-Insecure-Requests", "1"), ("Cookie", None), ("Sec-Fetch-Dest", "empty"), ("Sec-Fetch-Mode", "no-cors"), 
("Sec-Fetch-Site", "same-origin"), ("TE", "trailers"), ), "chrome": ( ("Connection", "keep-alive"), ("Upgrade-Insecure-Requests", "1"), ("User-Agent", "Mozilla/5.0 ({}) AppleWebKit/537.36 (KHTML, " "like Gecko) Chrome/111.0.0.0 Safari/537.36"), ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9," "image/avif,image/webp,image/apng,*/*;q=0.8," "application/signed-exchange;v=b3;q=0.7"), ("Referer", None), ("Sec-Fetch-Site", "same-origin"), ("Sec-Fetch-Mode", "no-cors"), ("Sec-Fetch-Dest", "empty"), ("Accept-Encoding", None), ("Accept-Language", "en-US,en;q=0.9"), ("cookie", None), ("content-length", None), ), } SSL_CIPHERS = { "firefox": ( "TLS_AES_128_GCM_SHA256:" "TLS_CHACHA20_POLY1305_SHA256:" "TLS_AES_256_GCM_SHA384:" "ECDHE-ECDSA-AES128-GCM-SHA256:" "ECDHE-RSA-AES128-GCM-SHA256:" "ECDHE-ECDSA-CHACHA20-POLY1305:" "ECDHE-RSA-CHACHA20-POLY1305:" "ECDHE-ECDSA-AES256-GCM-SHA384:" "ECDHE-RSA-AES256-GCM-SHA384:" "ECDHE-ECDSA-AES256-SHA:" "ECDHE-ECDSA-AES128-SHA:" "ECDHE-RSA-AES128-SHA:" "ECDHE-RSA-AES256-SHA:" "AES128-GCM-SHA256:" "AES256-GCM-SHA384:" "AES128-SHA:" "AES256-SHA" ), "chrome": ( "TLS_AES_128_GCM_SHA256:" "TLS_AES_256_GCM_SHA384:" "TLS_CHACHA20_POLY1305_SHA256:" "ECDHE-ECDSA-AES128-GCM-SHA256:" "ECDHE-RSA-AES128-GCM-SHA256:" "ECDHE-ECDSA-AES256-GCM-SHA384:" "ECDHE-RSA-AES256-GCM-SHA384:" "ECDHE-ECDSA-CHACHA20-POLY1305:" "ECDHE-RSA-CHACHA20-POLY1305:" "ECDHE-RSA-AES128-SHA:" "ECDHE-RSA-AES256-SHA:" "AES128-GCM-SHA256:" "AES256-GCM-SHA384:" "AES128-SHA:" "AES256-SHA" ), } urllib3 = requests.packages.urllib3 # detect brotli support try: BROTLI = urllib3.response.brotli is not None except AttributeError: BROTLI = False # set (urllib3) warnings filter action = config.get((), "warnings", "default") if action: try: import warnings warnings.simplefilter(action, urllib3.exceptions.HTTPWarning) except Exception: pass del action �����������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/cyberdrop.py�������������������������������������������������0000644�0001750�0001750�00000004167�14571514301�021130� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU 
General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://cyberdrop.me/""" from . import lolisafe from .common import Message from .. import text class CyberdropAlbumExtractor(lolisafe.LolisafeAlbumExtractor): category = "cyberdrop" root = "https://cyberdrop.me" pattern = r"(?:https?://)?(?:www\.)?cyberdrop\.(?:me|to)/a/([^/?#]+)" example = "https://cyberdrop.me/a/ID" def items(self): files, data = self.fetch_album(self.album_id) yield Message.Directory, data for data["num"], file in enumerate(files, 1): file.update(data) text.nameext_from_url(file["name"], file) file["name"], sep, file["id"] = file["filename"].rpartition("-") yield Message.Url, file["url"], file def fetch_album(self, album_id): url = "{}/a/{}".format(self.root, album_id) page = self.request(url).text extr = text.extract_from(page) desc = extr('property="og:description" content="', '"') if desc.startswith("A privacy-focused censorship-resistant file " "sharing platform free for everyone."): desc = "" extr('id="title"', "") album = { "album_id" : self.album_id, "album_name" : text.unescape(extr('title="', '"')), "album_size" : text.parse_bytes(extr( '<p class="title">', "B")), "date" : text.parse_datetime(extr( '<p class="title">', '<'), "%d.%m.%Y"), "description": text.unescape(text.unescape( # double desc.rpartition(" [R")[0])), } file_ids = list(text.extract_iter(page, 'id="file" href="/f/', '"')) album["count"] = len(file_ids) return self._extract_files(file_ids), album def _extract_files(self, file_ids): for file_id in file_ids: url = "{}/api/f/{}".format(self.root, file_id) yield self.request(url).json() ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/danbooru.py��������������������������������������������������0000644�0001750�0001750�00000023460�14571514301�020745� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://danbooru.donmai.us/ and other Danbooru instances""" from .common import BaseExtractor, Message from .. import text, util import datetime class DanbooruExtractor(BaseExtractor): """Base class for danbooru extractors""" basecategory = "Danbooru" filename_fmt = "{category}_{id}_{filename}.{extension}" page_limit = 1000 page_start = None per_page = 200 request_interval = (0.5, 1.5) def _init(self): self.ugoira = self.config("ugoira", False) self.external = self.config("external", False) self.includes = False threshold = self.config("threshold") if isinstance(threshold, int): self.threshold = 1 if threshold < 1 else threshold else: self.threshold = self.per_page username, api_key = self._get_auth_info() if username: self.log.debug("Using HTTP Basic Auth for user '%s'", username) self.session.auth = util.HTTPBasicAuth(username, api_key) def skip(self, num): pages = num // self.per_page if pages >= self.page_limit: pages = self.page_limit - 1 self.page_start = pages + 1 return pages * self.per_page def items(self): self.session.headers["User-Agent"] = util.USERAGENT includes = self.config("metadata") if includes: if isinstance(includes, (list, tuple)): includes = ",".join(includes) elif not isinstance(includes, str): includes = "artist_commentary,children,notes,parent,uploader" self.includes = includes + ",id" data = self.metadata() for post in self.posts(): try: url = post["file_url"] except KeyError: if self.external and post["source"]: post.update(data) yield Message.Directory, post yield Message.Queue, post["source"], post continue text.nameext_from_url(url, post) post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z") post["tags"] = ( post["tag_string"].split(" ") if post["tag_string"] else ()) post["tags_artist"] = ( post["tag_string_artist"].split(" ") if post["tag_string_artist"] else ()) post["tags_character"] = ( post["tag_string_character"].split(" ") if post["tag_string_character"] else ()) post["tags_copyright"] = ( post["tag_string_copyright"].split(" ") if post["tag_string_copyright"] else ()) post["tags_general"] = ( post["tag_string_general"].split(" ") if post["tag_string_general"] else ()) post["tags_meta"] = ( post["tag_string_meta"].split(" ") if post["tag_string_meta"] else ()) if post["extension"] == "zip": if self.ugoira: post["frames"] = self._ugoira_frames(post) post["_http_adjust_extension"] = False else: url = post["large_file_url"] post["extension"] = "webm" if url[0] == "/": url = self.root + url post.update(data) yield Message.Directory, post yield Message.Url, url, post def metadata(self): return () def posts(self): return () def _pagination(self, endpoint, params, prefix=None): url = self.root + endpoint params["limit"] = self.per_page params["page"] = self.page_start first = True while True: posts = self.request(url, params=params).json() if isinstance(posts, dict): posts = posts["posts"] if posts: if self.includes: 
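# Batch-fetch the extra metadata fields requested via the "metadata"
# option: one additional API call per page, restricted to the current
# post IDs through a "tags=id:..." query and to the wanted fields
# through "only=...", then merged back into each post by its "id".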
params_meta = { "only" : self.includes, "limit": len(posts), "tags" : "id:" + ",".join(str(p["id"]) for p in posts), } data = { meta["id"]: meta for meta in self.request( url, params=params_meta).json() } for post in posts: post.update(data[post["id"]]) if prefix == "a" and not first: posts.reverse() yield from posts if len(posts) < self.threshold: return if prefix: params["page"] = "{}{}".format(prefix, posts[-1]["id"]) elif params["page"]: params["page"] += 1 else: params["page"] = 2 first = False def _ugoira_frames(self, post): data = self.request("{}/posts/{}.json?only=media_metadata".format( self.root, post["id"]) ).json()["media_metadata"]["metadata"] ext = data["ZIP:ZipFileName"].rpartition(".")[2] fmt = ("{:>06}." + ext).format delays = data["Ugoira:FrameDelays"] return [{"file": fmt(index), "delay": delay} for index, delay in enumerate(delays)] BASE_PATTERN = DanbooruExtractor.update({ "danbooru": { "root": None, "pattern": r"(?:(?:danbooru|hijiribe|sonohara|safebooru)\.donmai\.us" r"|donmai\.moe)", }, "atfbooru": { "root": "https://booru.allthefallen.moe", "pattern": r"booru\.allthefallen\.moe", }, "aibooru": { "root": None, "pattern": r"(?:safe\.)?aibooru\.online", }, "booruvar": { "root": "https://booru.borvar.art", "pattern": r"booru\.borvar\.art", }, }) class DanbooruTagExtractor(DanbooruExtractor): """Extractor for danbooru posts from tag searches""" subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/posts\?(?:[^&#]*&)*tags=([^&#]*)" example = "https://danbooru.donmai.us/posts?tags=TAG" def __init__(self, match): DanbooruExtractor.__init__(self, match) tags = match.group(match.lastindex) self.tags = text.unquote(tags.replace("+", " ")) def metadata(self): return {"search_tags": self.tags} def posts(self): prefix = "b" for tag in self.tags.split(): if tag.startswith("order:"): if tag == "order:id" or tag == "order:id_asc": prefix = "a" elif tag == "order:id_desc": prefix = "b" else: prefix = None elif tag.startswith( ("id:", "md5", "ordfav:", "ordfavgroup:", "ordpool:")): prefix = None break return self._pagination("/posts.json", {"tags": self.tags}, prefix) class DanbooruPoolExtractor(DanbooruExtractor): """Extractor for posts from danbooru pools""" subcategory = "pool" directory_fmt = ("{category}", "pool", "{pool[id]} {pool[name]}") archive_fmt = "p_{pool[id]}_{id}" pattern = BASE_PATTERN + r"/pool(?:s|/show)/(\d+)" example = "https://danbooru.donmai.us/pools/12345" def __init__(self, match): DanbooruExtractor.__init__(self, match) self.pool_id = match.group(match.lastindex) def metadata(self): url = "{}/pools/{}.json".format(self.root, self.pool_id) pool = self.request(url).json() pool["name"] = pool["name"].replace("_", " ") self.post_ids = pool.pop("post_ids", ()) return {"pool": pool} def posts(self): params = {"tags": "pool:" + self.pool_id} return self._pagination("/posts.json", params, "b") class DanbooruPostExtractor(DanbooruExtractor): """Extractor for single danbooru posts""" subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/post(?:s|/show)/(\d+)" example = "https://danbooru.donmai.us/posts/12345" def __init__(self, match): DanbooruExtractor.__init__(self, match) self.post_id = match.group(match.lastindex) def posts(self): url = "{}/posts/{}.json".format(self.root, self.post_id) post = self.request(url).json() if self.includes: params = {"only": self.includes} post.update(self.request(url, params=params).json()) return (post,) class 
class DanbooruPopularExtractor(DanbooruExtractor):
    """Extractor for popular images from danbooru"""
    subcategory = "popular"
    directory_fmt = ("{category}", "popular", "{scale}", "{date}")
    archive_fmt = "P_{scale[0]}_{date}_{id}"
    pattern = BASE_PATTERN + r"/(?:explore/posts/)?popular(?:\?([^#]*))?"
    example = "https://danbooru.donmai.us/explore/posts/popular"

    def __init__(self, match):
        DanbooruExtractor.__init__(self, match)
        self.params = match.group(match.lastindex)

    def metadata(self):
        self.params = params = text.parse_query(self.params)
        scale = params.get("scale", "day")
        date = params.get("date") or datetime.date.today().isoformat()

        if scale == "week":
            date = datetime.date.fromisoformat(date)
            date = (date - datetime.timedelta(days=date.weekday())).isoformat()
        elif scale == "month":
            date = date[:-3]

        return {"date": date, "scale": scale}

    def posts(self):
        return self._pagination("/explore/posts/popular.json", self.params)
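The `_pagination` helper above pages either by plain page numbers or, when the search allows it, by ID cursors ("page=b<id>" to continue below the last seen post ID, "page=a<id>" to continue above it), which is how it gets past Danbooru's numeric page limit. A minimal standalone sketch of the cursor variant against the public Danbooru API, using the requests library directly; the function name and the example tag string are illustrative only, not part of gallery-dl:

import requests

def iterate_posts(tags, per_page=200):
    # Simplified version of the "b<id>" cursor pagination used by
    # DanbooruExtractor._pagination: always walk backwards from the
    # newest post instead of requesting numbered pages.
    url = "https://danbooru.donmai.us/posts.json"
    params = {"tags": tags, "limit": per_page}
    while True:
        posts = requests.get(url, params=params, timeout=30).json()
        if not posts:
            return
        yield from posts
        # only request posts with an ID lower than the last one seen
        params["page"] = "b{}".format(posts[-1]["id"])

# Usage (illustrative tag search):
# for post in iterate_posts("scenery rating:general"):
#     print(post["id"], post.get("file_url"))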
gallery_dl-1.26.9/gallery_dl/extractor/desktopography.py

# -*- coding: utf-8 -*-

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://desktopography.net/"""

from .common import Extractor, Message
from .. import text

BASE_PATTERN = r"(?:https?://)?desktopography\.net"


class DesktopographyExtractor(Extractor):
    """Base class for desktopography extractors"""
    category = "desktopography"
    archive_fmt = "{filename}"
    root = "https://desktopography.net"


class DesktopographySiteExtractor(DesktopographyExtractor):
    """Extractor for all desktopography exhibitions"""
    subcategory = "site"
    pattern = BASE_PATTERN + r"/$"
    example = "https://desktopography.net/"

    def items(self):
        page = self.request(self.root).text
        data = {"_extractor": DesktopographyExhibitionExtractor}

        for exhibition_year in text.extract_iter(
                page,
                '<a href="https://desktopography.net/exhibition-',
                '/">'):
            url = self.root + "/exhibition-" + exhibition_year + "/"
            yield Message.Queue, url, data


class DesktopographyExhibitionExtractor(DesktopographyExtractor):
    """Extractor for a yearly desktopography exhibition"""
    subcategory = "exhibition"
    pattern = BASE_PATTERN + r"/exhibition-([^/?#]+)/"
    example = "https://desktopography.net/exhibition-2020/"

    def __init__(self, match):
        DesktopographyExtractor.__init__(self, match)
        self.year = match.group(1)

    def items(self):
        url = "{}/exhibition-{}/".format(self.root, self.year)
        base_entry_url = "https://desktopography.net/portfolios/"
        page = self.request(url).text

        data = {
            "_extractor": DesktopographyEntryExtractor,
            "year": self.year,
        }

        for entry_url in text.extract_iter(
                page,
                '<a class="overlay-background" href="' + base_entry_url,
                '">'):
            url = base_entry_url + entry_url
            yield Message.Queue, url, data


class DesktopographyEntryExtractor(DesktopographyExtractor):
    """Extractor for all resolutions of a desktopography wallpaper"""
    subcategory = "entry"
    pattern = BASE_PATTERN + r"/portfolios/([\w-]+)"
    example = "https://desktopography.net/portfolios/NAME/"

    def __init__(self, match):
        DesktopographyExtractor.__init__(self, match)
        self.entry = match.group(1)

    def items(self):
        url = "{}/portfolios/{}".format(self.root, self.entry)
        page = self.request(url).text
        entry_data = {"entry": self.entry}
        yield Message.Directory, entry_data

        for image_data in text.extract_iter(
                page,
                '<a target="_blank" href="https://desktopography.net',
                '">'):

            path, _, filename = image_data.partition(
                '" class="wallpaper-button" download="')

            text.nameext_from_url(filename, entry_data)
            yield Message.Url, self.root + path, entry_data
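The extractors above and below all speak the same message protocol: items() yields Message.Directory and Message.Url tuples for files to download, plus Message.Queue tuples (optionally tagged with an "_extractor" class) that the job runner hands off to a child extractor. A rough sketch of a consumer for that stream, assuming gallery_dl is importable and that extractor.find() and Extractor.initialize() behave as in this release; the real logic lives in gallery_dl.job and also handles downloads, archives, and postprocessors:

from gallery_dl import extractor
from gallery_dl.extractor.common import Message

def walk(url, depth=0):
    # Follow an extractor's message stream and recurse into queued URLs.
    # Only a sketch of what gallery_dl's DownloadJob does with these messages.
    extr = extractor.find(url)
    if extr is None:
        print("no extractor matches", url)
        return
    extr.initialize()          # set up session/config before calling items()
    for msg in extr.items():
        kind = msg[0]
        if kind == Message.Directory:
            print("  " * depth + "directory:", msg[1].get("category"))
        elif kind == Message.Url:
            print("  " * depth + "file:", msg[1])
        elif kind == Message.Queue and depth < 2:
            walk(msg[1], depth + 1)

# walk("https://desktopography.net/")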
gallery_dl-1.26.9/gallery_dl/extractor/deviantart.py

# -*- coding: utf-8 -*-

# Copyright 2015-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://www.deviantart.com/"""

from .common import Extractor, Message
from .. import text, util, exception
from ..cache import cache, memcache
import collections
import itertools
import mimetypes
import binascii
import time
import re

BASE_PATTERN = (
    r"(?:https?://)?(?:"
    r"(?:www\.)?(?:fx)?deviantart\.com/(?!watch/)([\w-]+)|"
    r"(?!www\.)([\w-]+)\.(?:fx)?deviantart\.com)"
)

DEFAULT_AVATAR = "https://a.deviantart.net/avatars/default.gif"


class DeviantartExtractor(Extractor):
    """Base class for deviantart extractors"""
    category = "deviantart"
    root = "https://www.deviantart.com"
    directory_fmt = ("{category}", "{username}")
    filename_fmt = "{category}_{index}_{title}.{extension}"
    cookies_domain = None
    cookies_names = ("auth", "auth_secure", "userinfo")
    _last_request = 0

    def __init__(self, match):
        Extractor.__init__(self, match)
        self.user = (match.group(1) or match.group(2) or "").lower()
        self.offset = 0

    def _init(self):
        self.jwt = self.config("jwt", False)
        self.flat = self.config("flat", True)
        self.extra = self.config("extra", False)
        self.quality = self.config("quality", "100")
        self.original = self.config("original", True)
        self.intermediary = self.config("intermediary", True)
        self.comments_avatars = self.config("comments-avatars", False)
        self.comments = self.comments_avatars or self.config(
            "comments", False)

        self.api = DeviantartOAuthAPI(self)
        self.group = False
        self._premium_cache = {}

        unwatch = self.config("auto-unwatch")
        if unwatch:
            self.unwatch = []
            self.finalize = self._unwatch_premium
        else:
            self.unwatch = None

        if self.quality:
            if self.quality == "png":
                self.quality = "-fullview.png?"
self.quality_sub = re.compile(r"-fullview\.[a-z0-9]+\?").sub else: self.quality = ",q_{}".format(self.quality) self.quality_sub = re.compile(r",q_\d+").sub if self.original != "image": self._update_content = self._update_content_default else: self._update_content = self._update_content_image self.original = True journals = self.config("journals", "html") if journals == "html": self.commit_journal = self._commit_journal_html elif journals == "text": self.commit_journal = self._commit_journal_text else: self.commit_journal = None def request(self, url, **kwargs): if "fatal" not in kwargs: kwargs["fatal"] = False while True: response = Extractor.request(self, url, **kwargs) if response.status_code != 403 or \ b"Request blocked." not in response.content: return response self.wait(seconds=300, reason="CloudFront block") def skip(self, num): self.offset += num return num def login(self): if self.cookies_check(self.cookies_names): return True username, password = self._get_auth_info() if username: self.cookies_update(_login_impl(self, username, password)) return True def items(self): if self.user: group = self.config("group", True) if group: user = _user_details(self, self.user) if user: self.user = user["username"] self.group = False elif group == "skip": self.log.info("Skipping group '%s'", self.user) raise exception.StopExtraction() else: self.subcategory = "group-" + self.subcategory self.group = True for deviation in self.deviations(): if isinstance(deviation, tuple): url, data = deviation yield Message.Queue, url, data continue if deviation["is_deleted"]: # prevent crashing in case the deviation really is # deleted self.log.debug( "Skipping %s (deleted)", deviation["deviationid"]) continue tier_access = deviation.get("tier_access") if tier_access == "locked": self.log.debug( "Skipping %s (access locked)", deviation["deviationid"]) continue if "premium_folder_data" in deviation: data = self._fetch_premium(deviation) if not data: continue deviation.update(data) self.prepare(deviation) yield Message.Directory, deviation if "content" in deviation: content = self._extract_content(deviation) yield self.commit(deviation, content) elif deviation["is_downloadable"]: content = self.api.deviation_download(deviation["deviationid"]) deviation["is_original"] = True yield self.commit(deviation, content) if "videos" in deviation and deviation["videos"]: video = max(deviation["videos"], key=lambda x: text.parse_int(x["quality"][:-1])) deviation["is_original"] = False yield self.commit(deviation, video) if "flash" in deviation: deviation["is_original"] = True yield self.commit(deviation, deviation["flash"]) if self.commit_journal: if "excerpt" in deviation: journal = self.api.deviation_content( deviation["deviationid"]) elif "body" in deviation: journal = {"html": deviation.pop("body")} else: journal = None if journal: if self.extra: deviation["_journal"] = journal["html"] deviation["is_original"] = True yield self.commit_journal(deviation, journal) if self.comments_avatars: for comment in deviation["comments"]: user = comment["user"] name = user["username"].lower() if user["usericon"] == DEFAULT_AVATAR: self.log.debug( "Skipping avatar of '%s' (default)", name) continue _user_details.update(name, user) url = "{}/{}/avatar/".format(self.root, name) comment["_extractor"] = DeviantartAvatarExtractor yield Message.Queue, url, comment if not self.extra: continue # ref: https://www.deviantart.com # /developers/http/v1/20210526/object/editor_text # the value of "features" is a JSON string with forward # slashes 
escaped text_content = \ deviation["text_content"]["body"]["features"].replace( "\\/", "/") if "text_content" in deviation else None for txt in (text_content, deviation.get("description"), deviation.get("_journal")): if txt is None: continue for match in DeviantartStashExtractor.pattern.finditer(txt): url = text.ensure_http_scheme(match.group(0)) deviation["_extractor"] = DeviantartStashExtractor yield Message.Queue, url, deviation def deviations(self): """Return an iterable containing all relevant Deviation-objects""" def prepare(self, deviation): """Adjust the contents of a Deviation-object""" if "index" not in deviation: try: if deviation["url"].startswith(( "https://www.deviantart.com/stash/", "https://sta.sh", )): filename = deviation["content"]["src"].split("/")[5] deviation["index_base36"] = filename.partition("-")[0][1:] deviation["index"] = id_from_base36( deviation["index_base36"]) else: deviation["index"] = text.parse_int( deviation["url"].rpartition("-")[2]) except KeyError: deviation["index"] = 0 deviation["index_base36"] = "0" if "index_base36" not in deviation: deviation["index_base36"] = base36_from_id(deviation["index"]) if self.user: deviation["username"] = self.user deviation["_username"] = self.user.lower() else: deviation["username"] = deviation["author"]["username"] deviation["_username"] = deviation["username"].lower() deviation["da_category"] = deviation["category"] deviation["published_time"] = text.parse_int( deviation["published_time"]) deviation["date"] = text.parse_timestamp( deviation["published_time"]) if self.comments: deviation["comments"] = ( self._extract_comments(deviation["deviationid"], "deviation") if deviation["stats"]["comments"] else () ) # filename metadata sub = re.compile(r"\W").sub deviation["filename"] = "".join(( sub("_", deviation["title"].lower()), "_by_", sub("_", deviation["author"]["username"].lower()), "-d", deviation["index_base36"], )) @staticmethod def commit(deviation, target): url = target["src"] name = target.get("filename") or url target = target.copy() target["filename"] = deviation["filename"] deviation["target"] = target deviation["extension"] = target["extension"] = text.ext_from_url(name) if "is_original" not in deviation: deviation["is_original"] = ("/v1/" not in url) return Message.Url, url, deviation def _commit_journal_html(self, deviation, journal): title = text.escape(deviation["title"]) url = deviation["url"] thumbs = deviation.get("thumbs") or deviation.get("files") html = journal["html"] shadow = SHADOW_TEMPLATE.format_map(thumbs[0]) if thumbs else "" if "css" in journal: css, cls = journal["css"], "withskin" elif html.startswith("<style"): css, _, html = html.partition("</style>") css = css.partition(">")[2] cls = "withskin" else: css, cls = "", "journal-green" if html.find('<div class="boxtop journaltop">', 0, 250) != -1: needle = '<div class="boxtop journaltop">' header = HEADER_CUSTOM_TEMPLATE.format( title=title, url=url, date=deviation["date"], ) else: needle = '<div usr class="gr">' catlist = deviation["category_path"].split("/") categories = " / ".join( ('<span class="crumb"><a href="{}/{}/"><span>{}</span></a>' '</span>').format(self.root, cpath, cat.capitalize()) for cat, cpath in zip( catlist, itertools.accumulate(catlist, lambda t, c: t + "/" + c) ) ) username = deviation["author"]["username"] urlname = deviation.get("username") or username.lower() header = HEADER_TEMPLATE.format( title=title, url=url, userurl="{}/{}/".format(self.root, urlname), username=username, date=deviation["date"], 
categories=categories, ) if needle in html: html = html.replace(needle, header, 1) else: html = JOURNAL_TEMPLATE_HTML_EXTRA.format(header, html) html = JOURNAL_TEMPLATE_HTML.format( title=title, html=html, shadow=shadow, css=css, cls=cls) deviation["extension"] = "htm" return Message.Url, html, deviation @staticmethod def _commit_journal_text(deviation, journal): html = journal["html"] if html.startswith("<style"): html = html.partition("</style>")[2] head, _, tail = html.rpartition("<script") content = "\n".join( text.unescape(text.remove_html(txt)) for txt in (head or tail).split("<br />") ) txt = JOURNAL_TEMPLATE_TEXT.format( title=deviation["title"], username=deviation["author"]["username"], date=deviation["date"], content=content, ) deviation["extension"] = "txt" return Message.Url, txt, deviation def _extract_content(self, deviation): content = deviation["content"] if self.original and deviation["is_downloadable"]: self._update_content(deviation, content) return content if self.jwt: self._update_token(deviation, content) return content if content["src"].startswith("https://images-wixmp-"): if self.intermediary and deviation["index"] <= 790677560: # https://github.com/r888888888/danbooru/issues/4069 intermediary, count = re.subn( r"(/f/[^/]+/[^/]+)/v\d+/.*", r"/intermediary\1", content["src"], 1) if count: deviation["is_original"] = False deviation["_fallback"] = (content["src"],) content["src"] = intermediary if self.quality: content["src"] = self.quality_sub( self.quality, content["src"], 1) return content @staticmethod def _find_folder(folders, name, uuid): if uuid.isdecimal(): match = re.compile(name.replace( "-", r"[^a-z0-9]+") + "$", re.IGNORECASE).match for folder in folders: if match(folder["name"]): return folder else: for folder in folders: if folder["folderid"] == uuid: return folder raise exception.NotFoundError("folder") def _folder_urls(self, folders, category, extractor): base = "{}/{}/{}/".format(self.root, self.user, category) for folder in folders: folder["_extractor"] = extractor url = "{}{}/{}".format(base, folder["folderid"], folder["name"]) yield url, folder def _update_content_default(self, deviation, content): if "premium_folder_data" in deviation or deviation.get("is_mature"): public = False else: public = None data = self.api.deviation_download(deviation["deviationid"], public) content.update(data) deviation["is_original"] = True def _update_content_image(self, deviation, content): data = self.api.deviation_download(deviation["deviationid"]) url = data["src"].partition("?")[0] mtype = mimetypes.guess_type(url, False)[0] if mtype and mtype.startswith("image/"): content.update(data) deviation["is_original"] = True def _update_token(self, deviation, content): """Replace JWT to be able to remove width/height limits All credit goes to @Ironchest337 for discovering and implementing this method """ url, sep, _ = content["src"].partition("/v1/") if not sep: return # 'images-wixmp' returns 401 errors, but just 'wixmp' still works url = url.replace("//images-wixmp", "//wixmp", 1) # header = b'{"typ":"JWT","alg":"none"}' payload = ( b'{"sub":"urn:app:","iss":"urn:app:","obj":[[{"path":"/f/' + url.partition("/f/")[2].encode() + b'"}]],"aud":["urn:service:file.download"]}' ) deviation["_fallback"] = (content["src"],) deviation["is_original"] = True content["src"] = ( "{}?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJub25lIn0.{}.".format( url, # base64 of 'header' is precomputed as 'eyJ0eX...' 
# binascii.b2a_base64(header).rstrip(b"=\n").decode(), binascii.b2a_base64(payload).rstrip(b"=\n").decode()) ) def _extract_comments(self, target_id, target_type="deviation"): results = None comment_ids = [None] while comment_ids: comments = self.api.comments( target_id, target_type, comment_ids.pop()) if results: results.extend(comments) else: results = comments # parent comments, i.e. nodes with at least one child parents = {c["parentid"] for c in comments} # comments with more than one reply replies = {c["commentid"] for c in comments if c["replies"]} # add comment UUIDs with replies that are not parent to any node comment_ids.extend(replies - parents) return results def _limited_request(self, url, **kwargs): """Limits HTTP requests to one every 2 seconds""" diff = time.time() - DeviantartExtractor._last_request if diff < 2.0: self.sleep(2.0 - diff, "request") response = self.request(url, **kwargs) DeviantartExtractor._last_request = time.time() return response def _fetch_premium(self, deviation): try: return self._premium_cache[deviation["deviationid"]] except KeyError: pass if not self.api.refresh_token_key: self.log.warning( "Unable to access premium content (no refresh-token)") self._fetch_premium = lambda _: None return None dev = self.api.deviation(deviation["deviationid"], False) folder = deviation["premium_folder_data"] username = dev["author"]["username"] # premium_folder_data is no longer present when user has access (#5063) has_access = ("premium_folder_data" not in dev) or folder["has_access"] if not has_access and folder["type"] == "watchers" and \ self.config("auto-watch"): if self.unwatch is not None: self.unwatch.append(username) if self.api.user_friends_watch(username): has_access = True self.log.info( "Watching %s for premium folder access", username) else: self.log.warning( "Error when trying to watch %s. 
" "Try again with a new refresh-token", username) if has_access: self.log.info("Fetching premium folder data") else: self.log.warning("Unable to access premium content (type: %s)", folder["type"]) cache = self._premium_cache for dev in self.api.gallery( username, folder["gallery_id"], public=False): cache[dev["deviationid"]] = dev if has_access else None return cache[deviation["deviationid"]] def _unwatch_premium(self): for username in self.unwatch: self.log.info("Unwatching %s", username) self.api.user_friends_unwatch(username) def _eclipse_to_oauth(self, eclipse_api, deviations): for obj in deviations: deviation = obj["deviation"] if "deviation" in obj else obj deviation_uuid = eclipse_api.deviation_extended_fetch( deviation["deviationId"], deviation["author"]["username"], "journal" if deviation["isJournal"] else "art", )["deviation"]["extended"]["deviationUuid"] yield self.api.deviation(deviation_uuid) class DeviantartUserExtractor(DeviantartExtractor): """Extractor for an artist's user profile""" subcategory = "user" pattern = BASE_PATTERN + r"/?$" example = "https://www.deviantart.com/USER" def initialize(self): pass skip = Extractor.skip def items(self): base = "{}/{}/".format(self.root, self.user) return self._dispatch_extractors(( (DeviantartAvatarExtractor , base + "avatar"), (DeviantartBackgroundExtractor, base + "banner"), (DeviantartGalleryExtractor , base + "gallery"), (DeviantartScrapsExtractor , base + "gallery/scraps"), (DeviantartJournalExtractor , base + "posts"), (DeviantartStatusExtractor , base + "posts/statuses"), (DeviantartFavoriteExtractor , base + "favourites"), ), ("gallery",)) ############################################################################### # OAuth ####################################################################### class DeviantartGalleryExtractor(DeviantartExtractor): """Extractor for all deviations from an artist's gallery""" subcategory = "gallery" archive_fmt = "g_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/gallery(?:/all|/?\?catpath=)?/?$" example = "https://www.deviantart.com/USER/gallery/" def deviations(self): if self.flat and not self.group: return self.api.gallery_all(self.user, self.offset) folders = self.api.gallery_folders(self.user) return self._folder_urls(folders, "gallery", DeviantartFolderExtractor) class DeviantartAvatarExtractor(DeviantartExtractor): """Extractor for an artist's avatar""" subcategory = "avatar" archive_fmt = "a_{_username}_{index}" pattern = BASE_PATTERN + r"/avatar" example = "https://www.deviantart.com/USER/avatar/" def deviations(self): name = self.user.lower() user = _user_details(self, name) if not user: return () icon = user["usericon"] if icon == DEFAULT_AVATAR: self.log.debug("Skipping avatar of '%s' (default)", name) return () _, sep, index = icon.rpartition("?") if not sep: index = "0" formats = self.config("formats") if not formats: url = icon.replace("/avatars/", "/avatars-big/", 1) return (self._make_deviation(url, user, index, ""),) if isinstance(formats, str): formats = formats.replace(" ", "").split(",") results = [] for fmt in formats: fmt, _, ext = fmt.rpartition(".") if fmt: fmt = "-" + fmt url = "https://a.deviantart.net/avatars{}/{}/{}/{}.{}?{}".format( fmt, name[0], name[1], name, ext, index) results.append(self._make_deviation(url, user, index, fmt)) return results def _make_deviation(self, url, user, index, fmt): return { "author" : user, "category" : "avatar", "index" : text.parse_int(index), "is_deleted" : False, "is_downloadable": False, "published_time" : 0, 
"title" : "avatar" + fmt, "stats" : {"comments": 0}, "content" : {"src": url}, } class DeviantartBackgroundExtractor(DeviantartExtractor): """Extractor for an artist's banner""" subcategory = "background" archive_fmt = "b_{index}" pattern = BASE_PATTERN + r"/ba(?:nner|ckground)" example = "https://www.deviantart.com/USER/banner/" def deviations(self): try: return (self.api.user_profile(self.user.lower()) ["cover_deviation"]["cover_deviation"],) except Exception: return () class DeviantartFolderExtractor(DeviantartExtractor): """Extractor for deviations inside an artist's gallery folder""" subcategory = "folder" directory_fmt = ("{category}", "{username}", "{folder[title]}") archive_fmt = "F_{folder[uuid]}_{index}.{extension}" pattern = BASE_PATTERN + r"/gallery/([^/?#]+)/([^/?#]+)" example = "https://www.deviantart.com/USER/gallery/12345/TITLE" def __init__(self, match): DeviantartExtractor.__init__(self, match) self.folder = None self.folder_id = match.group(3) self.folder_name = match.group(4) def deviations(self): folders = self.api.gallery_folders(self.user) folder = self._find_folder(folders, self.folder_name, self.folder_id) self.folder = { "title": folder["name"], "uuid" : folder["folderid"], "index": self.folder_id, "owner": self.user, } return self.api.gallery(self.user, folder["folderid"], self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["folder"] = self.folder class DeviantartStashExtractor(DeviantartExtractor): """Extractor for sta.sh-ed deviations""" subcategory = "stash" archive_fmt = "{index}.{extension}" pattern = (r"(?:https?://)?(?:(?:www\.)?deviantart\.com/stash|sta\.sh)" r"/([a-z0-9]+)") example = "https://sta.sh/abcde" skip = Extractor.skip def __init__(self, match): DeviantartExtractor.__init__(self, match) self.user = None self.stash_id = match.group(1) def deviations(self, stash_id=None): if stash_id is None: stash_id = self.stash_id url = "https://sta.sh/" + stash_id page = self._limited_request(url).text if stash_id[0] == "0": uuid = text.extr(page, '//deviation/', '"') if uuid: deviation = self.api.deviation(uuid) deviation["index"] = text.parse_int(text.extr( page, '\\"deviationId\\":', ',')) yield deviation return for item in text.extract_iter( page, 'class="stash-thumb-container', '</div>'): url = text.extr(item, '<a href="', '"') if url: stash_id = url.rpartition("/")[2] else: stash_id = text.extr(item, 'gmi-stashid="', '"') stash_id = "2" + util.bencode(text.parse_int( stash_id), "0123456789abcdefghijklmnopqrstuvwxyz") if len(stash_id) > 2: yield from self.deviations(stash_id) class DeviantartFavoriteExtractor(DeviantartExtractor): """Extractor for an artist's favorites""" subcategory = "favorite" directory_fmt = ("{category}", "{username}", "Favourites") archive_fmt = "f_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/favourites(?:/all|/?\?catpath=)?/?$" example = "https://www.deviantart.com/USER/favourites/" def deviations(self): if self.flat: return self.api.collections_all(self.user, self.offset) folders = self.api.collections_folders(self.user) return self._folder_urls( folders, "favourites", DeviantartCollectionExtractor) class DeviantartCollectionExtractor(DeviantartExtractor): """Extractor for a single favorite collection""" subcategory = "collection" directory_fmt = ("{category}", "{username}", "Favourites", "{collection[title]}") archive_fmt = "C_{collection[uuid]}_{index}.{extension}" pattern = BASE_PATTERN + r"/favourites/([^/?#]+)/([^/?#]+)" example = 
"https://www.deviantart.com/USER/favourites/12345/TITLE" def __init__(self, match): DeviantartExtractor.__init__(self, match) self.collection = None self.collection_id = match.group(3) self.collection_name = match.group(4) def deviations(self): folders = self.api.collections_folders(self.user) folder = self._find_folder( folders, self.collection_name, self.collection_id) self.collection = { "title": folder["name"], "uuid" : folder["folderid"], "index": self.collection_id, "owner": self.user, } return self.api.collections(self.user, folder["folderid"], self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["collection"] = self.collection class DeviantartJournalExtractor(DeviantartExtractor): """Extractor for an artist's journals""" subcategory = "journal" directory_fmt = ("{category}", "{username}", "Journal") archive_fmt = "j_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/(?:posts(?:/journals)?|journal)/?(?:\?.*)?$" example = "https://www.deviantart.com/USER/posts/journals/" def deviations(self): return self.api.browse_user_journals(self.user, self.offset) class DeviantartStatusExtractor(DeviantartExtractor): """Extractor for an artist's status updates""" subcategory = "status" directory_fmt = ("{category}", "{username}", "Status") filename_fmt = "{category}_{index}_{title}_{date}.{extension}" archive_fmt = "S_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/posts/statuses" example = "https://www.deviantart.com/USER/posts/statuses/" def deviations(self): for status in self.api.user_statuses(self.user, self.offset): yield from self.status(status) def status(self, status): for item in status.get("items") or (): # do not trust is_share # shared deviations/statuses if "deviation" in item: yield item["deviation"].copy() if "status" in item: yield from self.status(item["status"].copy()) # assume is_deleted == true means necessary fields are missing if status["is_deleted"]: self.log.warning( "Skipping status %s (deleted)", status.get("statusid")) return yield status def prepare(self, deviation): if "deviationid" in deviation: return DeviantartExtractor.prepare(self, deviation) try: path = deviation["url"].split("/") deviation["index"] = text.parse_int(path[-1] or path[-2]) except KeyError: deviation["index"] = 0 if self.user: deviation["username"] = self.user deviation["_username"] = self.user.lower() else: deviation["username"] = deviation["author"]["username"] deviation["_username"] = deviation["username"].lower() deviation["date"] = dt = text.parse_datetime(deviation["ts"]) deviation["published_time"] = int(util.datetime_to_timestamp(dt)) deviation["da_category"] = "Status" deviation["category_path"] = "status" deviation["is_downloadable"] = False deviation["title"] = "Status Update" comments_count = deviation.pop("comments_count", 0) deviation["stats"] = {"comments": comments_count} if self.comments: deviation["comments"] = ( self._extract_comments(deviation["statusid"], "status") if comments_count else () ) class DeviantartPopularExtractor(DeviantartExtractor): """Extractor for popular deviations""" subcategory = "popular" directory_fmt = ("{category}", "Popular", "{popular[range]}", "{popular[search]}") archive_fmt = "P_{popular[range]}_{popular[search]}_{index}.{extension}" pattern = (r"(?:https?://)?www\.deviantart\.com/(?:" r"(?:deviations/?)?\?order=(popular-[^/?#]+)" r"|((?:[\w-]+/)*)(popular-[^/?#]+)" r")/?(?:\?([^#]*))?") example = "https://www.deviantart.com/popular-24-hours/" def __init__(self, match): 
DeviantartExtractor.__init__(self, match) self.user = "" trange1, path, trange2, query = match.groups() query = text.parse_query(query) self.search_term = query.get("q") trange = trange1 or trange2 or query.get("order", "") if trange.startswith("popular-"): trange = trange[8:] self.time_range = { "newest" : "now", "most-recent" : "now", "this-week" : "1week", "this-month" : "1month", "this-century": "alltime", "all-time" : "alltime", }.get(trange, "alltime") self.popular = { "search": self.search_term or "", "range" : trange or "all-time", "path" : path.strip("/") if path else "", } def deviations(self): if self.time_range == "now": return self.api.browse_newest(self.search_term, self.offset) return self.api.browse_popular( self.search_term, self.time_range, self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["popular"] = self.popular class DeviantartTagExtractor(DeviantartExtractor): """Extractor for deviations from tag searches""" subcategory = "tag" directory_fmt = ("{category}", "Tags", "{search_tags}") archive_fmt = "T_{search_tags}_{index}.{extension}" pattern = r"(?:https?://)?www\.deviantart\.com/tag/([^/?#]+)" example = "https://www.deviantart.com/tag/TAG" def __init__(self, match): DeviantartExtractor.__init__(self, match) self.tag = text.unquote(match.group(1)) def deviations(self): return self.api.browse_tags(self.tag, self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["search_tags"] = self.tag class DeviantartWatchExtractor(DeviantartExtractor): """Extractor for Deviations from watched users""" subcategory = "watch" pattern = (r"(?:https?://)?(?:www\.)?deviantart\.com" r"/(?:watch/deviations|notifications/watch)()()") example = "https://www.deviantart.com/watch/deviations" def deviations(self): return self.api.browse_deviantsyouwatch() class DeviantartWatchPostsExtractor(DeviantartExtractor): """Extractor for Posts from watched users""" subcategory = "watch-posts" pattern = r"(?:https?://)?(?:www\.)?deviantart\.com/watch/posts()()" example = "https://www.deviantart.com/watch/posts" def deviations(self): return self.api.browse_posts_deviantsyouwatch() ############################################################################### # Eclipse ##################################################################### class DeviantartDeviationExtractor(DeviantartExtractor): """Extractor for single deviations""" subcategory = "deviation" archive_fmt = "g_{_username}_{index}.{extension}" pattern = (BASE_PATTERN + r"/(art|journal)/(?:[^/?#]+-)?(\d+)" r"|(?:https?://)?(?:www\.)?(?:fx)?deviantart\.com/" r"(?:view/|deviation/|view(?:-full)?\.php/*\?(?:[^#]+&)?id=)" r"(\d+)" # bare deviation ID without slug r"|(?:https?://)?fav\.me/d([0-9a-z]+)") # base36 example = "https://www.deviantart.com/UsER/art/TITLE-12345" skip = Extractor.skip def __init__(self, match): DeviantartExtractor.__init__(self, match) self.type = match.group(3) self.deviation_id = \ match.group(4) or match.group(5) or id_from_base36(match.group(6)) def deviations(self): if self.user: url = "{}/{}/{}/{}".format( self.root, self.user, self.type or "art", self.deviation_id) else: url = "{}/view/{}/".format(self.root, self.deviation_id) uuid = text.extr(self._limited_request(url).text, '"deviationUuid\\":\\"', '\\') if not uuid: raise exception.NotFoundError("deviation") return (self.api.deviation(uuid),) class DeviantartScrapsExtractor(DeviantartExtractor): """Extractor for an artist's scraps""" subcategory = "scraps" directory_fmt = 
("{category}", "{username}", "Scraps") archive_fmt = "s_{_username}_{index}.{extension}" cookies_domain = ".deviantart.com" pattern = BASE_PATTERN + r"/gallery/(?:\?catpath=)?scraps\b" example = "https://www.deviantart.com/USER/gallery/scraps" def deviations(self): self.login() eclipse_api = DeviantartEclipseAPI(self) return self._eclipse_to_oauth( eclipse_api, eclipse_api.gallery_scraps(self.user, self.offset)) class DeviantartSearchExtractor(DeviantartExtractor): """Extractor for deviantart search results""" subcategory = "search" directory_fmt = ("{category}", "Search", "{search_tags}") archive_fmt = "Q_{search_tags}_{index}.{extension}" cookies_domain = ".deviantart.com" pattern = (r"(?:https?://)?www\.deviantart\.com" r"/search(?:/deviations)?/?\?([^#]+)") example = "https://www.deviantart.com/search?q=QUERY" skip = Extractor.skip def __init__(self, match): DeviantartExtractor.__init__(self, match) self.query = text.parse_query(self.user) self.search = self.query.get("q", "") self.user = "" def deviations(self): logged_in = self.login() eclipse_api = DeviantartEclipseAPI(self) search = (eclipse_api.search_deviations if logged_in else self._search_html) return self._eclipse_to_oauth(eclipse_api, search(self.query)) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["search_tags"] = self.search def _search_html(self, params): url = self.root + "/search" while True: response = self.request(url, params=params) if response.history and "/users/login" in response.url: raise exception.StopExtraction("HTTP redirect to login page") page = response.text for dev in DeviantartDeviationExtractor.pattern.findall( page)[2::3]: yield { "deviationId": dev[3], "author": {"username": dev[0]}, "isJournal": dev[2] == "journal", } cursor = text.extr(page, r'\"cursor\":\"', '\\',) if not cursor: return params["cursor"] = cursor class DeviantartGallerySearchExtractor(DeviantartExtractor): """Extractor for deviantart gallery searches""" subcategory = "gallery-search" archive_fmt = "g_{_username}_{index}.{extension}" cookies_domain = ".deviantart.com" pattern = BASE_PATTERN + r"/gallery/?\?(q=[^#]+)" example = "https://www.deviantart.com/USER/gallery?q=QUERY" def __init__(self, match): DeviantartExtractor.__init__(self, match) self.query = match.group(3) def deviations(self): self.login() eclipse_api = DeviantartEclipseAPI(self) query = text.parse_query(self.query) self.search = query["q"] return self._eclipse_to_oauth( eclipse_api, eclipse_api.galleries_search( self.user, self.search, self.offset, query.get("sort", "most-recent"), )) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["search_tags"] = self.search class DeviantartFollowingExtractor(DeviantartExtractor): """Extractor for user's watched users""" subcategory = "following" pattern = BASE_PATTERN + "/about#watching$" example = "https://www.deviantart.com/USER/about#watching" def items(self): eclipse_api = DeviantartEclipseAPI(self) for user in eclipse_api.user_watching(self.user, self.offset): url = "{}/{}".format(self.root, user["username"]) user["_extractor"] = DeviantartUserExtractor yield Message.Queue, url, user ############################################################################### # API Interfaces ############################################################## class DeviantartOAuthAPI(): """Interface for the DeviantArt OAuth API Ref: https://www.deviantart.com/developers/http/v1/20160316 """ CLIENT_ID = "5388" CLIENT_SECRET = "76b08c69cfb27f26d6161f9ab6d061a1" def 
__init__(self, extractor): self.extractor = extractor self.log = extractor.log self.headers = {"dA-minor-version": "20200519"} self._warn_429 = True self.delay = extractor.config("wait-min", 0) self.delay_min = max(2, self.delay) self.mature = extractor.config("mature", "true") if not isinstance(self.mature, str): self.mature = "true" if self.mature else "false" self.strategy = extractor.config("pagination") self.folders = extractor.config("folders", False) self.public = extractor.config("public", True) client_id = extractor.config("client-id") if client_id: self.client_id = str(client_id) self.client_secret = extractor.config("client-secret") else: self.client_id = self.CLIENT_ID self.client_secret = self.CLIENT_SECRET token = extractor.config("refresh-token") if token is None or token == "cache": token = "#" + self.client_id if not _refresh_token_cache(token): token = None self.refresh_token_key = token metadata = extractor.config("metadata", False) if not metadata: metadata = bool(extractor.extra) if metadata: self.metadata = True if isinstance(metadata, str): if metadata == "all": metadata = ("submission", "camera", "stats", "collection", "gallery") else: metadata = metadata.replace(" ", "").split(",") elif not isinstance(metadata, (list, tuple)): metadata = () self._metadata_params = {"mature_content": self.mature} self._metadata_public = None if metadata: # extended metadata self.limit = 10 for param in metadata: self._metadata_params["ext_" + param] = "1" if "ext_collection" in self._metadata_params or \ "ext_gallery" in self._metadata_params: if token: self._metadata_public = False else: self.log.error("'collection' and 'gallery' metadata " "require a refresh token") else: # base metadata self.limit = 50 else: self.metadata = False self.limit = None self.log.debug( "Using %s API credentials (client-id %s)", "default" if self.client_id == self.CLIENT_ID else "custom", self.client_id, ) def browse_deviantsyouwatch(self, offset=0): """Yield deviations from users you watch""" endpoint = "/browse/deviantsyouwatch" params = {"limit": 50, "offset": offset, "mature_content": self.mature} return self._pagination(endpoint, params, public=False) def browse_posts_deviantsyouwatch(self, offset=0): """Yield posts from users you watch""" endpoint = "/browse/posts/deviantsyouwatch" params = {"limit": 50, "offset": offset, "mature_content": self.mature} return self._pagination(endpoint, params, public=False, unpack=True) def browse_newest(self, query=None, offset=0): """Browse newest deviations""" endpoint = "/browse/newest" params = { "q" : query, "limit" : 120, "offset" : offset, "mature_content": self.mature, } return self._pagination(endpoint, params) def browse_popular(self, query=None, timerange=None, offset=0): """Yield popular deviations""" endpoint = "/browse/popular" params = { "q" : query, "limit" : 120, "timerange" : timerange, "offset" : offset, "mature_content": self.mature, } return self._pagination(endpoint, params) def browse_tags(self, tag, offset=0): """ Browse a tag """ endpoint = "/browse/tags" params = { "tag" : tag, "offset" : offset, "limit" : 50, "mature_content": self.mature, } return self._pagination(endpoint, params) def browse_user_journals(self, username, offset=0): """Yield all journal entries of a specific user""" endpoint = "/browse/user/journals" params = {"username": username, "offset": offset, "limit": 50, "mature_content": self.mature, "featured": "false"} return self._pagination(endpoint, params) def collections(self, username, folder_id, offset=0): """Yield 
all Deviation-objects contained in a collection folder""" endpoint = "/collections/" + folder_id params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) def collections_all(self, username, offset=0): """Yield all deviations in a user's collection""" endpoint = "/collections/all" params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) @memcache(keyarg=1) def collections_folders(self, username, offset=0): """Yield all collection folders of a specific user""" endpoint = "/collections/folders" params = {"username": username, "offset": offset, "limit": 50, "mature_content": self.mature} return self._pagination_list(endpoint, params) def comments(self, target_id, target_type="deviation", comment_id=None, offset=0): """Fetch comments posted on a target""" endpoint = "/comments/{}/{}".format(target_type, target_id) params = { "commentid" : comment_id, "maxdepth" : "5", "offset" : offset, "limit" : 50, "mature_content": self.mature, } return self._pagination_list(endpoint, params=params, key="thread") def deviation(self, deviation_id, public=None): """Query and return info about a single Deviation""" endpoint = "/deviation/" + deviation_id deviation = self._call(endpoint, public=public) if deviation.get("is_mature") and public is None and \ self.refresh_token_key: deviation = self._call(endpoint, public=False) if self.metadata: self._metadata((deviation,)) if self.folders: self._folders((deviation,)) return deviation def deviation_content(self, deviation_id, public=None): """Get extended content of a single Deviation""" endpoint = "/deviation/content" params = {"deviationid": deviation_id} content = self._call(endpoint, params=params, public=public) if public and content["html"].startswith( ' <span class=\"username-with-symbol'): if self.refresh_token_key: content = self._call(endpoint, params=params, public=False) else: self.log.warning("Private Journal") return content def deviation_download(self, deviation_id, public=None): """Get the original file download (if allowed)""" endpoint = "/deviation/download/" + deviation_id params = {"mature_content": self.mature} try: return self._call( endpoint, params=params, public=public, log=False) except Exception: if not self.refresh_token_key: raise return self._call(endpoint, params=params, public=False) def deviation_metadata(self, deviations): """ Fetch deviation metadata for a set of deviations""" endpoint = "/deviation/metadata?" 
+ "&".join( "deviationids[{}]={}".format(num, deviation["deviationid"]) for num, deviation in enumerate(deviations) ) return self._call( endpoint, params=self._metadata_params, public=self._metadata_public, )["metadata"] def gallery(self, username, folder_id, offset=0, extend=True, public=None): """Yield all Deviation-objects contained in a gallery folder""" endpoint = "/gallery/" + folder_id params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature, "mode": "newest"} return self._pagination(endpoint, params, extend, public) def gallery_all(self, username, offset=0): """Yield all Deviation-objects of a specific user""" endpoint = "/gallery/all" params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) @memcache(keyarg=1) def gallery_folders(self, username, offset=0): """Yield all gallery folders of a specific user""" endpoint = "/gallery/folders" params = {"username": username, "offset": offset, "limit": 50, "mature_content": self.mature} return self._pagination_list(endpoint, params) @memcache(keyarg=1) def user_profile(self, username): """Get user profile information""" endpoint = "/user/profile/" + username return self._call(endpoint, fatal=False) def user_statuses(self, username, offset=0): """Yield status updates of a specific user""" endpoint = "/user/statuses/" params = {"username": username, "offset": offset, "limit": 50} return self._pagination(endpoint, params) def user_friends_watch(self, username): """Watch a user""" endpoint = "/user/friends/watch/" + username data = { "watch[friend]" : "0", "watch[deviations]" : "0", "watch[journals]" : "0", "watch[forum_threads]": "0", "watch[critiques]" : "0", "watch[scraps]" : "0", "watch[activity]" : "0", "watch[collections]" : "0", "mature_content" : self.mature, } return self._call( endpoint, method="POST", data=data, public=False, fatal=False, ).get("success") def user_friends_unwatch(self, username): """Unwatch a user""" endpoint = "/user/friends/unwatch/" + username return self._call( endpoint, method="POST", public=False, fatal=False, ).get("success") def authenticate(self, refresh_token_key): """Authenticate the application by requesting an access token""" self.headers["Authorization"] = \ self._authenticate_impl(refresh_token_key) @cache(maxage=3600, keyarg=1) def _authenticate_impl(self, refresh_token_key): """Actual authenticate implementation""" url = "https://www.deviantart.com/oauth2/token" if refresh_token_key: self.log.info("Refreshing private access token") data = {"grant_type": "refresh_token", "refresh_token": _refresh_token_cache(refresh_token_key)} else: self.log.info("Requesting public access token") data = {"grant_type": "client_credentials"} auth = util.HTTPBasicAuth(self.client_id, self.client_secret) response = self.extractor.request( url, method="POST", data=data, auth=auth, fatal=False) data = response.json() if response.status_code != 200: self.log.debug("Server response: %s", data) raise exception.AuthenticationError('"{}" ({})'.format( data.get("error_description"), data.get("error"))) if refresh_token_key: _refresh_token_cache.update( refresh_token_key, data["refresh_token"]) return "Bearer " + data["access_token"] def _call(self, endpoint, fatal=True, log=True, public=None, **kwargs): """Call an API endpoint""" url = "https://www.deviantart.com/api/v1/oauth2" + endpoint kwargs["fatal"] = None if public is None: public = self.public while True: if self.delay: self.extractor.sleep(self.delay, 
"api") self.authenticate(None if public else self.refresh_token_key) kwargs["headers"] = self.headers response = self.extractor.request(url, **kwargs) try: data = response.json() except ValueError: self.log.error("Unable to parse API response") data = {} status = response.status_code if 200 <= status < 400: if self.delay > self.delay_min: self.delay -= 1 return data if not fatal and status != 429: return None error = data.get("error_description") if error == "User not found.": raise exception.NotFoundError("user or group") if error == "Deviation not downloadable.": raise exception.AuthorizationError() self.log.debug(response.text) msg = "API responded with {} {}".format( status, response.reason) if status == 429: if self.delay < 30: self.delay += 1 self.log.warning("%s. Using %ds delay.", msg, self.delay) if self._warn_429 and self.delay >= 3: self._warn_429 = False if self.client_id == self.CLIENT_ID: self.log.info( "Register your own OAuth application and use its " "credentials to prevent this error: " "https://github.com/mikf/gallery-dl/blob/master/do" "cs/configuration.rst#extractordeviantartclient-id" "--client-secret") else: if log: self.log.error(msg) return data def _switch_tokens(self, results, params): if len(results) < params["limit"]: return True if not self.extractor.jwt: for item in results: if item.get("is_mature"): return True return False def _pagination(self, endpoint, params, extend=True, public=None, unpack=False, key="results"): warn = True if public is None: public = self.public if self.limit and params["limit"] > self.limit: params["limit"] = (params["limit"] // self.limit) * self.limit while True: data = self._call(endpoint, params=params, public=public) try: results = data[key] except KeyError: self.log.error("Unexpected API response: %s", data) return if unpack: results = [item["journal"] for item in results if "journal" in item] if extend: if public and self._switch_tokens(results, params): if self.refresh_token_key: self.log.debug("Switching to private access token") public = False continue elif data["has_more"] and warn: warn = False self.log.warning( "Private or mature deviations detected! 
" "Run 'gallery-dl oauth:deviantart' and follow the " "instructions to be able to access them.") # "statusid" cannot be used instead if results and "deviationid" in results[0]: if self.metadata: self._metadata(results) if self.folders: self._folders(results) else: # attempt to fix "deleted" deviations for dev in self._shared_content(results): if not dev["is_deleted"]: continue patch = self._call( "/deviation/" + dev["deviationid"], fatal=False) if patch: dev.update(patch) yield from results if not data["has_more"] and ( self.strategy != "manual" or not results or not extend): return if "next_cursor" in data: params["offset"] = None params["cursor"] = data["next_cursor"] elif data["next_offset"] is not None: params["offset"] = data["next_offset"] params["cursor"] = None else: if params.get("offset") is None: return params["offset"] = int(params["offset"]) + len(results) @staticmethod def _shared_content(results): """Return an iterable of shared deviations in 'results'""" for result in results: for item in result.get("items") or (): if "deviation" in item: yield item["deviation"] def _pagination_list(self, endpoint, params, key="results"): result = [] result.extend(self._pagination(endpoint, params, False, key=key)) return result def _metadata(self, deviations): """Add extended metadata to each deviation object""" if len(deviations) <= self.limit: self._metadata_batch(deviations) else: n = self.limit for index in range(0, len(deviations), n): self._metadata_batch(deviations[index:index+n]) def _metadata_batch(self, deviations): """Fetch extended metadata for a single batch of deviations""" for deviation, metadata in zip( deviations, self.deviation_metadata(deviations)): deviation.update(metadata) deviation["tags"] = [t["tag_name"] for t in deviation["tags"]] def _folders(self, deviations): """Add a list of all containing folders to each deviation object""" for deviation in deviations: deviation["folders"] = self._folders_map( deviation["author"]["username"])[deviation["deviationid"]] @memcache(keyarg=1) def _folders_map(self, username): """Generate a deviation_id -> folders mapping for 'username'""" self.log.info("Collecting folder information for '%s'", username) folders = self.gallery_folders(username) # create 'folderid'-to-'folder' mapping fmap = { folder["folderid"]: folder for folder in folders } # add parent names to folders, but ignore "Featured" as parent featured = folders[0]["folderid"] done = False while not done: done = True for folder in folders: parent = folder["parent"] if not parent: pass elif parent == featured: folder["parent"] = None else: parent = fmap[parent] if parent["parent"]: done = False else: folder["name"] = parent["name"] + "/" + folder["name"] folder["parent"] = None # map deviationids to folder names dmap = collections.defaultdict(list) for folder in folders: for deviation in self.gallery( username, folder["folderid"], 0, False): dmap[deviation["deviationid"]].append(folder["name"]) return dmap class DeviantartEclipseAPI(): """Interface to the DeviantArt Eclipse API""" def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.request = self.extractor._limited_request self.csrf_token = None def deviation_extended_fetch(self, deviation_id, user, kind=None): endpoint = "/_puppy/dadeviation/init" params = { "deviationid" : deviation_id, "username" : user, "type" : kind, "include_session" : "false", "expand" : "deviation.related", "da_minor_version": "20230710", } return self._call(endpoint, params) def gallery_scraps(self, user, 
offset=0): endpoint = "/_puppy/dashared/gallection/contents" params = { "username" : user, "type" : "gallery", "offset" : offset, "limit" : 24, "scraps_folder": "true", } return self._pagination(endpoint, params) def galleries_search(self, user, query, offset=0, order="most-recent"): endpoint = "/_puppy/dashared/gallection/search" params = { "username": user, "type" : "gallery", "order" : order, "q" : query, "offset" : offset, "limit" : 24, } return self._pagination(endpoint, params) def search_deviations(self, params): endpoint = "/_puppy/dabrowse/search/deviations" return self._pagination(endpoint, params, key="deviations") def user_info(self, user, expand=False): endpoint = "/_puppy/dauserprofile/init/about" params = {"username": user} return self._call(endpoint, params) def user_watching(self, user, offset=0): gruserid, moduleid = self._ids_watching(user) endpoint = "/_puppy/gruser/module/watching" params = { "gruserid" : gruserid, "gruser_typeid": "4", "username" : user, "moduleid" : moduleid, "offset" : offset, "limit" : 24, } return self._pagination(endpoint, params) def _call(self, endpoint, params): url = "https://www.deviantart.com" + endpoint params["csrf_token"] = self.csrf_token or self._fetch_csrf_token() response = self.request(url, params=params, fatal=None) try: return response.json() except Exception: return {"error": response.text} def _pagination(self, endpoint, params, key="results"): limit = params.get("limit", 24) warn = True while True: data = self._call(endpoint, params) results = data.get(key) if results is None: return if len(results) < limit and warn and data.get("hasMore"): warn = False self.log.warning( "Private deviations detected! " "Provide login credentials or session cookies " "to be able to access them.") yield from results if not data.get("hasMore"): return if "nextCursor" in data: params["offset"] = None params["cursor"] = data["nextCursor"] elif "nextOffset" in data: params["offset"] = data["nextOffset"] params["cursor"] = None elif params.get("offset") is None: return else: params["offset"] = int(params["offset"]) + len(results) def _ids_watching(self, user): url = "{}/{}/about".format(self.extractor.root, user) page = self.request(url).text gruserid, pos = text.extract(page, ' data-userid="', '"') pos = page.find('\\"type\\":\\"watching\\"', pos) if pos < 0: raise exception.NotFoundError("module") moduleid = text.rextract(page, '\\"id\\":', ',', pos)[0].strip('" ') self._fetch_csrf_token(page) return gruserid, moduleid def _fetch_csrf_token(self, page=None): if page is None: page = self.request(self.extractor.root + "/").text self.csrf_token = token = text.extr( page, "window.__CSRF_TOKEN__ = '", "'") return token @memcache(keyarg=1) def _user_details(extr, name): try: return extr.api.user_profile(name)["user"] except Exception: return None @cache(maxage=36500*86400, keyarg=0) def _refresh_token_cache(token): if token and token[0] == "#": return None return token @cache(maxage=28*86400, keyarg=1) def _login_impl(extr, username, password): extr.log.info("Logging in as %s", username) url = "https://www.deviantart.com/users/login" page = extr.request(url).text data = {} for item in text.extract_iter(page, '<input type="hidden" name="', '"/>'): name, _, value = item.partition('" value="') data[name] = value challenge = data.get("challenge") if challenge and challenge != "0": extr.log.warning("Login requires solving a CAPTCHA") extr.log.debug(challenge) data["username"] = username data["password"] = password data["remember"] = "on" extr.sleep(2.0, 
"login") url = "https://www.deviantart.com/_sisu/do/signin" response = extr.request(url, method="POST", data=data) if not response.history: raise exception.AuthenticationError() return { cookie.name: cookie.value for cookie in extr.cookies } def id_from_base36(base36): return util.bdecode(base36, _ALPHABET) def base36_from_id(deviation_id): return util.bencode(int(deviation_id), _ALPHABET) _ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz" ############################################################################### # Journal Formats ############################################################# SHADOW_TEMPLATE = """ <span class="shadow"> <img src="{src}" class="smshadow" width="{width}" height="{height}"> </span> <br><br> """ HEADER_TEMPLATE = """<div usr class="gr"> <div class="metadata"> <h2><a href="{url}">{title}</a></h2> <ul> <li class="author"> by <span class="name"><span class="username-with-symbol u"> <a class="u regular username" href="{userurl}">{username}</a>\ <span class="user-symbol regular"></span></span></span>, <span>{date}</span> </li> <li class="category"> {categories} </li> </ul> </div> """ HEADER_CUSTOM_TEMPLATE = """<div class='boxtop journaltop'> <h2> <img src="https://st.deviantart.net/minish/gruzecontrol/icons/journal.gif\ ?2" style="vertical-align:middle" alt=""/> <a href="{url}">{title}</a> </h2> Journal Entry: <span>{date}</span> """ JOURNAL_TEMPLATE_HTML = """text:<!DOCTYPE html> <html> <head> <meta charset="utf-8"> <title>{title}
{shadow}
{html}
""" JOURNAL_TEMPLATE_HTML_EXTRA = """\
\
{}
{}
""" JOURNAL_TEMPLATE_TEXT = """text:{title} by {username}, {date} {content} """ ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/directlink.py0000644000175000017500000000265214571514301021264 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Direct link handling""" from .common import Extractor, Message from .. import text class DirectlinkExtractor(Extractor): """Extractor for direct links to images and other media files""" category = "directlink" filename_fmt = "{domain}/{path}/{filename}.{extension}" archive_fmt = filename_fmt pattern = (r"(?i)https?://(?P[^/?#]+)/(?P[^?#]+\." r"(?:jpe?g|jpe|png|gif|web[mp]|mp4|mkv|og[gmv]|opus))" r"(?:\?(?P[^#]*))?(?:#(?P.*))?$") example = "https://en.wikipedia.org/static/images/project-logos/enwiki.png" def __init__(self, match): Extractor.__init__(self, match) self.data = match.groupdict() def items(self): data = self.data for key, value in data.items(): if value: data[key] = text.unquote(value) data["path"], _, name = data["path"].rpartition("/") data["filename"], _, ext = name.rpartition(".") data["extension"] = ext.lower() data["_http_headers"] = { "Referer": self.url.encode("latin-1", "ignore")} yield Message.Directory, data yield Message.Url, self.url, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/dynastyscans.py0000644000175000017500000001072614571514301021660 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://dynasty-scans.com/""" from .common import ChapterExtractor, MangaExtractor, Extractor, Message from .. import text, util import re BASE_PATTERN = r"(?:https?://)?(?:www\.)?dynasty-scans\.com" class DynastyscansBase(): """Base class for dynastyscans extractors""" category = "dynastyscans" root = "https://dynasty-scans.com" def _parse_image_page(self, image_id): url = "{}/images/{}".format(self.root, image_id) extr = text.extract_from(self.request(url).text) date = extr("class='create_at'>", "") tags = extr("class='tags'>", "") src = extr("class='btn-group'>", "") url = extr(' src="', '"') src = text.extr(src, 'href="', '"') if "Source<" in src else "" return { "url" : self.root + url, "image_id": text.parse_int(image_id), "tags" : text.split_html(tags), "date" : text.remove_html(date), "source" : text.unescape(src), } class DynastyscansChapterExtractor(DynastyscansBase, ChapterExtractor): """Extractor for manga-chapters from dynasty-scans.com""" pattern = BASE_PATTERN + r"(/chapters/[^/?#]+)" example = "https://dynasty-scans.com/chapters/NAME" def metadata(self, page): extr = text.extract_from(page) match = re.match( (r"(?:]*>)?([^<]+)(?:
)?" # manga name r"(?: ch(\d+)([^:<]*))?" # chapter info r"(?:: (.+))?"), # title extr("
<h3 id='chapter-title'><b>
", ""), ) author = extr(" by ", "") group = extr('"icon-print"> ', '') return { "manga" : text.unescape(match.group(1)), "chapter" : text.parse_int(match.group(2)), "chapter_minor": match.group(3) or "", "title" : text.unescape(match.group(4) or ""), "author" : text.remove_html(author), "group" : (text.remove_html(group) or text.extr(group, ' alt="', '"')), "date" : text.parse_datetime(extr( '"icon-calendar"> ', '<'), "%b %d, %Y"), "lang" : "en", "language": "English", } def images(self, page): data = text.extr(page, "var pages = ", ";\n") return [ (self.root + img["image"], None) for img in util.json_loads(data) ] class DynastyscansMangaExtractor(DynastyscansBase, MangaExtractor): chapterclass = DynastyscansChapterExtractor reverse = False pattern = BASE_PATTERN + r"(/series/[^/?#]+)" example = "https://dynasty-scans.com/series/NAME" def chapters(self, page): return [ (self.root + path, {}) for path in text.extract_iter(page, '
\nPlease wait a few moments", 0, 600) < 0: return response self.sleep(5.0, "check") def _pagination(self, url, params): for params["page"] in itertools.count(1): page = self.request(url, params=params).text album_ids = EromeAlbumExtractor.pattern.findall(page)[::2] yield from album_ids if len(album_ids) < 36: return class EromeAlbumExtractor(EromeExtractor): """Extractor for albums on erome.com""" subcategory = "album" pattern = BASE_PATTERN + r"/a/(\w+)" example = "https://www.erome.com/a/ID" def albums(self): return (self.item,) class EromeUserExtractor(EromeExtractor): subcategory = "user" pattern = BASE_PATTERN + r"/(?!a/|search\?)([^/?#]+)" example = "https://www.erome.com/USER" def albums(self): url = "{}/{}".format(self.root, self.item) return self._pagination(url, {}) class EromeSearchExtractor(EromeExtractor): subcategory = "search" pattern = BASE_PATTERN + r"/search\?q=([^&#]+)" example = "https://www.erome.com/search?q=QUERY" def albums(self): url = self.root + "/search" params = {"q": text.unquote(self.item)} return self._pagination(url, params) @cache() def _cookie_cache(): return () ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/exhentai.py0000644000175000017500000004717114571514301020746 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://e-hentai.org/ and https://exhentai.org/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache import itertools import math BASE_PATTERN = r"(?:https?://)?(e[x-]|g\.e-)hentai\.org" class ExhentaiExtractor(Extractor): """Base class for exhentai extractors""" category = "exhentai" directory_fmt = ("{category}", "{gid} {title[:247]}") filename_fmt = "{gid}_{num:>04}_{image_token}_{filename}.{extension}" archive_fmt = "{gid}_{num}" cookies_domain = ".exhentai.org" cookies_names = ("ipb_member_id", "ipb_pass_hash") root = "https://exhentai.org" request_interval = (3.0, 6.0) ciphers = "DEFAULT:!DH" LIMIT = False def __init__(self, match): Extractor.__init__(self, match) self.version = match.group(1) def initialize(self): domain = self.config("domain", "auto") if domain == "auto": domain = ("ex" if self.version == "ex" else "e-") + "hentai.org" self.root = "https://" + domain self.api_url = self.root + "/api.php" self.cookies_domain = "." 
+ domain Extractor.initialize(self) if self.version != "ex": self.cookies.set("nw", "1", domain=self.cookies_domain) def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if response.history and response.headers.get("Content-Length") == "0": self.log.info("blank page") raise exception.AuthorizationError() return response def login(self): """Login and set necessary cookies""" if self.LIMIT: raise exception.StopExtraction("Image limit reached!") if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: return self.cookies_update(self._login_impl(username, password)) if self.version == "ex": self.log.info("No username or cookies given; using e-hentai.org") self.root = "https://e-hentai.org" self.cookies_domain = ".e-hentai.org" self.cookies.set("nw", "1", domain=self.cookies_domain) self.original = False self.limits = False @cache(maxage=90*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = "https://forums.e-hentai.org/index.php?act=Login&CODE=01" headers = { "Referer": "https://e-hentai.org/bounce_login.php?b=d&bt=1-1", } data = { "CookieDate": "1", "b": "d", "bt": "1-1", "UserName": username, "PassWord": password, "ipb_login_submit": "Login!", } self.cookies.clear() response = self.request(url, method="POST", headers=headers, data=data) if b"You are now logged in as:" not in response.content: raise exception.AuthenticationError() # collect more cookies url = self.root + "/favorites.php" response = self.request(url) if response.history: self.request(url) return self.cookies class ExhentaiGalleryExtractor(ExhentaiExtractor): """Extractor for image galleries from exhentai.org""" subcategory = "gallery" pattern = (BASE_PATTERN + r"(?:/g/(\d+)/([\da-f]{10})" r"|/s/([\da-f]{10})/(\d+)-(\d+))") example = "https://e-hentai.org/g/12345/67890abcde/" def __init__(self, match): ExhentaiExtractor.__init__(self, match) self.gallery_id = text.parse_int(match.group(2) or match.group(5)) self.gallery_token = match.group(3) self.image_token = match.group(4) self.image_num = text.parse_int(match.group(6), 1) self.key_start = None self.key_show = None self.key_next = None self.count = 0 self.data = None def _init(self): source = self.config("source") if source == "hitomi": self.items = self._items_hitomi limits = self.config("limits", False) if limits and limits.__class__ is int: self.limits = limits self._remaining = 0 else: self.limits = False self.fallback_retries = self.config("fallback-retries", 2) self.original = self.config("original", True) def finalize(self): if self.data: self.log.info("Use '%s/s/%s/%s-%s' as input URL " "to continue downloading from the current position", self.root, self.data["image_token"], self.gallery_id, self.data["num"]) def favorite(self, slot="0"): url = self.root + "/gallerypopups.php" params = { "gid": self.gallery_id, "t" : self.gallery_token, "act": "addfav", } data = { "favcat" : slot, "apply" : "Apply Changes", "update" : "1", } self.request(url, method="POST", params=params, data=data) def items(self): self.login() if self.gallery_token: gpage = self._gallery_page() self.image_token = text.extr(gpage, 'hentai.org/s/', '"') if not self.image_token: self.log.debug("Page content:\n%s", gpage) raise exception.StopExtraction( "Failed to extract initial image token") ipage = self._image_page() else: ipage = self._image_page() part = text.extr(ipage, 'hentai.org/g/', '"') if not part: self.log.debug("Page content:\n%s", ipage) raise 
exception.StopExtraction( "Failed to extract gallery token") self.gallery_token = part.split("/")[1] gpage = self._gallery_page() self.data = data = self.get_metadata(gpage) self.count = text.parse_int(data["filecount"]) yield Message.Directory, data images = itertools.chain( (self.image_from_page(ipage),), self.images_from_api()) for url, image in images: data.update(image) if self.limits: self._check_limits(data) if "/fullimg" in url: data["_http_validate"] = self._validate_response else: data["_http_validate"] = None yield Message.Url, url, data fav = self.config("fav") if fav is not None: self.favorite(fav) self.data = None def _items_hitomi(self): if self.config("metadata", False): data = self.metadata_from_api() data["date"] = text.parse_timestamp(data["posted"]) else: data = {} from .hitomi import HitomiGalleryExtractor url = "https://hitomi.la/galleries/{}.html".format(self.gallery_id) data["_extractor"] = HitomiGalleryExtractor yield Message.Queue, url, data def get_metadata(self, page): """Extract gallery metadata""" data = self.metadata_from_page(page) if self.config("metadata", False): data.update(self.metadata_from_api()) data["date"] = text.parse_timestamp(data["posted"]) return data def metadata_from_page(self, page): extr = text.extract_from(page) api_url = extr('var api_url = "', '"') if api_url: self.api_url = api_url data = { "gid" : self.gallery_id, "token" : self.gallery_token, "thumb" : extr("background:transparent url(", ")"), "title" : text.unescape(extr('
<h1 id="gn">', '</h1>')), "title_jpn" : text.unescape(extr('<h1 id="gj">', '</h1>
')), "_" : extr('
', '<'), "uploader" : extr('
<div id="gdn">', '</div>
'), "date" : text.parse_datetime(extr( '>Posted:
</td><td class="gdt2">', '</td>
'), "%Y-%m-%d %H:%M"), "parent" : extr( '>Parent:
', 'Visible:', '<'), "language" : extr('>Language:', ' '), "filesize" : text.parse_bytes(extr( '>File Size:', '<').rstrip("Bbi")), "filecount" : extr('>Length:', ' '), "favorites" : extr('id="favcount">', ' '), "rating" : extr(">Average: ", "<"), "torrentcount" : extr('>Torrent Download (', ')'), } if data["uploader"].startswith("<"): data["uploader"] = text.unescape(text.extr( data["uploader"], ">", "<")) f = data["favorites"][0] if f == "N": data["favorites"] = "0" elif f == "O": data["favorites"] = "1" data["lang"] = util.language_to_code(data["language"]) data["tags"] = [ text.unquote(tag.replace("+", " ")) for tag in text.extract_iter(page, 'hentai.org/tag/', '"') ] return data def metadata_from_api(self): data = { "method" : "gdata", "gidlist" : ((self.gallery_id, self.gallery_token),), "namespace": 1, } data = self.request(self.api_url, method="POST", json=data).json() if "error" in data: raise exception.StopExtraction(data["error"]) return data["gmetadata"][0] def image_from_page(self, page): """Get image url and data from webpage""" pos = page.index('
= 0: origurl, pos = text.rextract(i6, '"', '"', pos) url = text.unescape(origurl) data = self._parse_original_info(text.extract( i6, "ownload original", "<", pos)[0]) data["_fallback"] = self._fallback_original(nl, url) else: url = imgurl data = self._parse_image_info(url) data["_fallback"] = self._fallback_1280( nl, request["page"], imgkey) except IndexError: self.log.debug("Page content:\n%s", page) raise exception.StopExtraction( "Unable to parse image info for '%s'", url) data["num"] = request["page"] data["image_token"] = imgkey data["_url_1280"] = imgurl data["_nl"] = nl self._check_509(imgurl) yield url, text.nameext_from_url(url, data) request["imgkey"] = nextkey def _validate_response(self, response): if not response.history and response.headers.get( "content-type", "").startswith("text/html"): page = response.text self.log.warning("'%s'", page) if " requires GP" in page: gp = self.config("gp") if gp == "stop": raise exception.StopExtraction("Not enough GP") elif gp == "wait": input("Press ENTER to continue.") return response.url self.log.info("Falling back to non-original downloads") self.original = False return self.data["_url_1280"] self._report_limits() return True def _report_limits(self): ExhentaiExtractor.LIMIT = True raise exception.StopExtraction("Image limit reached!") def _check_limits(self, data): if not self._remaining or data["num"] % 25 == 0: self._update_limits() self._remaining -= data["cost"] if self._remaining <= 0: self._report_limits() def _check_509(self, url): # full 509.gif URLs # - https://exhentai.org/img/509.gif # - https://ehgt.org/g/509.gif if url.endswith(("hentai.org/img/509.gif", "ehgt.org/g/509.gif")): self.log.debug(url) self._report_limits() def _update_limits(self): url = "https://e-hentai.org/home.php" cookies = { cookie.name: cookie.value for cookie in self.cookies if cookie.domain == self.cookies_domain and cookie.name != "igneous" } page = self.request(url, cookies=cookies).text current = text.extr(page, "", "") self.log.debug("Image Limits: %s/%s", current, self.limits) self._remaining = self.limits - text.parse_int(current) def _gallery_page(self): url = "{}/g/{}/{}/".format( self.root, self.gallery_id, self.gallery_token) response = self.request(url, fatal=False) page = response.text if response.status_code == 404 and "Gallery Not Available" in page: raise exception.AuthorizationError() if page.startswith(("Key missing", "Gallery not found")): raise exception.NotFoundError("gallery") if "hentai.org/mpv/" in page: self.log.warning("Enabled Multi-Page Viewer is not supported") return page def _image_page(self): url = "{}/s/{}/{}-{}".format( self.root, self.image_token, self.gallery_id, self.image_num) page = self.request(url, fatal=False).text if page.startswith(("Invalid page", "Keep trying")): raise exception.NotFoundError("image page") return page def _fallback_original(self, nl, fullimg): url = "{}?nl={}".format(fullimg, nl) for _ in util.repeat(self.fallback_retries): yield url def _fallback_1280(self, nl, num, token=None): if not token: token = self.key_start for _ in util.repeat(self.fallback_retries): url = "{}/s/{}/{}-{}?nl={}".format( self.root, token, self.gallery_id, num, nl) page = self.request(url, fatal=False).text if page.startswith(("Invalid page", "Keep trying")): return url, data = self.image_from_page(page) yield url nl = data["_nl"] @staticmethod def _parse_image_info(url): for part in url.split("/")[4:]: try: _, size, width, height, _ = part.split("-") break except ValueError: pass else: size = width = height = 0 
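        # A regular (1280px sample) image URL contains a path segment of the
        # form "<token>-<size>-<width>-<height>-<ext>", which the loop above
        # unpacks; when no such segment is present, size and dimensions fall
        # back to 0. Every sample counts as a flat cost of 1 toward the image
        # limit (see _check_limits()), whereas originals are priced by
        # _parse_original_info() below at 1 point plus 1 per 0.1 MB.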
return { "cost" : 1, "size" : text.parse_int(size), "width" : text.parse_int(width), "height": text.parse_int(height), } @staticmethod def _parse_original_info(info): parts = info.lstrip().split(" ") size = text.parse_bytes(parts[3] + parts[4][0]) return { # 1 initial point + 1 per 0.1 MB "cost" : 1 + math.ceil(size / 100000), "size" : size, "width" : text.parse_int(parts[0]), "height": text.parse_int(parts[2]), } class ExhentaiSearchExtractor(ExhentaiExtractor): """Extractor for exhentai search results""" subcategory = "search" pattern = BASE_PATTERN + r"/(?:\?([^#]*)|tag/([^/?#]+))" example = "https://e-hentai.org/?f_search=QUERY" def __init__(self, match): ExhentaiExtractor.__init__(self, match) _, query, tag = match.groups() if tag: if "+" in tag: ns, _, tag = tag.rpartition(":") tag = '{}:"{}$"'.format(ns, tag.replace("+", " ")) else: tag += "$" self.params = {"f_search": tag, "page": 0} else: self.params = text.parse_query(query) if "next" not in self.params: self.params["page"] = text.parse_int(self.params.get("page")) def _init(self): self.search_url = self.root def items(self): self.login() data = {"_extractor": ExhentaiGalleryExtractor} search_url = self.search_url params = self.params while True: last = None page = self.request(search_url, params=params).text for gallery in ExhentaiGalleryExtractor.pattern.finditer(page): url = gallery.group(0) if url == last: continue last = url data["gallery_id"] = text.parse_int(gallery.group(2)) data["gallery_token"] = gallery.group(3) yield Message.Queue, url + "/", data next_url = text.extr(page, 'nexturl="', '"', None) if next_url is not None: if not next_url: return search_url = next_url params = None elif 'class="ptdd">><' in page or ">No hits found
</p>
" in page: return else: params["page"] += 1 class ExhentaiFavoriteExtractor(ExhentaiSearchExtractor): """Extractor for favorited exhentai galleries""" subcategory = "favorite" pattern = BASE_PATTERN + r"/favorites\.php(?:\?([^#]*)())?" example = "https://e-hentai.org/favorites.php" def _init(self): self.search_url = self.root + "/favorites.php" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/fallenangels.py0000644000175000017500000000565414571514301021574 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.fascans.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text, util class FallenangelsChapterExtractor(ChapterExtractor): """Extractor for manga chapters from fascans.com""" category = "fallenangels" pattern = (r"(?:https?://)?(manga|truyen)\.fascans\.com" r"/manga/([^/?#]+)/([^/?#]+)") example = "https://manga.fascans.com/manga/NAME/CHAPTER/" def __init__(self, match): self.version, self.manga, self.chapter = match.groups() url = "https://{}.fascans.com/manga/{}/{}/1".format( self.version, self.manga, self.chapter) ChapterExtractor.__init__(self, match, url) def metadata(self, page): extr = text.extract_from(page) lang = "vi" if self.version == "truyen" else "en" chapter, sep, minor = self.chapter.partition(".") return { "manga" : extr('name="description" content="', ' Chapter '), "title" : extr(': ', ' - Page 1'), "chapter" : chapter, "chapter_minor": sep + minor, "lang" : lang, "language": util.code_to_language(lang), } @staticmethod def images(page): return [ (img["page_image"], None) for img in util.json_loads( text.extr(page, "var pages = ", ";") ) ] class FallenangelsMangaExtractor(MangaExtractor): """Extractor for manga from fascans.com""" chapterclass = FallenangelsChapterExtractor category = "fallenangels" pattern = r"(?:https?://)?((manga|truyen)\.fascans\.com/manga/[^/]+)/?$" example = "https://manga.fascans.com/manga/NAME" def __init__(self, match): url = "https://" + match.group(1) self.lang = "vi" if match.group(2) == "truyen" else "en" MangaExtractor.__init__(self, match, url) def chapters(self, page): extr = text.extract_from(page) results = [] language = util.code_to_language(self.lang) while extr('
  • ', '<') title = extr('', '') manga, _, chapter = cha.rpartition(" ") chapter, dot, minor = chapter.partition(".") results.append((url, { "manga" : manga, "title" : text.unescape(title), "volume" : text.parse_int(vol), "chapter" : text.parse_int(chapter), "chapter_minor": dot + minor, "lang" : self.lang, "language": language, })) return results ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709680883.0 gallery_dl-1.26.9/gallery_dl/extractor/fanbox.py0000644000175000017500000003205414571724363020423 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.fanbox.cc/""" from .common import Extractor, Message from .. import text from ..cache import memcache import re BASE_PATTERN = r"(?:https?://)?(?:www\.)?fanbox\.cc" USER_PATTERN = ( r"(?:https?://)?(?:" r"(?!www\.)([\w-]+)\.fanbox\.cc|" r"(?:www\.)?fanbox\.cc/@([\w-]+))" ) class FanboxExtractor(Extractor): """Base class for Fanbox extractors""" category = "fanbox" root = "https://www.fanbox.cc" directory_fmt = ("{category}", "{creatorId}") filename_fmt = "{id}_{num}.{extension}" archive_fmt = "{id}_{num}" _warning = True def _init(self): self.headers = {"Origin": self.root} self.embeds = self.config("embeds", True) includes = self.config("metadata") if includes: if isinstance(includes, str): includes = includes.split(",") elif not isinstance(includes, (list, tuple)): includes = ("user", "plan") self._meta_user = ("user" in includes) self._meta_plan = ("plan" in includes) else: self._meta_user = self._meta_plan = False if self._warning: if not self.cookies_check(("FANBOXSESSID",)): self.log.warning("no 'FANBOXSESSID' cookie set") FanboxExtractor._warning = False def items(self): for content_body, post in self.posts(): yield Message.Directory, post yield from self._get_urls_from_post(content_body, post) def posts(self): """Return all relevant post objects""" def _pagination(self, url): while url: url = text.ensure_http_scheme(url) body = self.request(url, headers=self.headers).json()["body"] for item in body["items"]: try: yield self._get_post_data(item["id"]) except Exception as exc: self.log.warning("Skipping post %s (%s: %s)", item["id"], exc.__class__.__name__, exc) url = body["nextUrl"] def _get_post_data(self, post_id): """Fetch and process post data""" url = "https://api.fanbox.cc/post.info?postId="+post_id post = self.request(url, headers=self.headers).json()["body"] content_body = post.pop("body", None) if content_body: if "html" in content_body: post["html"] = content_body["html"] if post["type"] == "article": post["articleBody"] = content_body.copy() if "blocks" in content_body: content = [] # text content images = [] # image IDs in 'body' order append = content.append append_img = images.append for block in content_body["blocks"]: if "text" in block: append(block["text"]) if "links" in block: for link in block["links"]: append(link["url"]) if "imageId" in block: append_img(block["imageId"]) if images and "imageMap" in content_body: # reorder 'imageMap' (#2718) image_map = content_body["imageMap"] content_body["imageMap"] = { image_id: image_map[image_id] for image_id in images if image_id in image_map } post["content"] = "\n".join(content) post["date"] = text.parse_datetime(post["publishedDatetime"]) post["text"] = content_body.get("text") if content_body else None post["isCoverImage"] = False if 
self._meta_user: post["user"] = self._get_user_data(post["creatorId"]) if self._meta_plan: plans = self._get_plan_data(post["creatorId"]) post["plan"] = plans[post["feeRequired"]] return content_body, post @memcache(keyarg=1) def _get_user_data(self, creator_id): url = "https://api.fanbox.cc/creator.get" params = {"creatorId": creator_id} data = self.request(url, params=params, headers=self.headers).json() user = data["body"] user.update(user.pop("user")) return user @memcache(keyarg=1) def _get_plan_data(self, creator_id): url = "https://api.fanbox.cc/plan.listCreator" params = {"creatorId": creator_id} data = self.request(url, params=params, headers=self.headers).json() plans = {0: { "id" : "", "title" : "", "fee" : 0, "description" : "", "coverImageUrl" : "", "creatorId" : creator_id, "hasAdultContent": None, "paymentMethod" : None, }} for plan in data["body"]: del plan["user"] plans[plan["fee"]] = plan return plans def _get_urls_from_post(self, content_body, post): num = 0 cover_image = post.get("coverImageUrl") if cover_image: cover_image = re.sub("/c/[0-9a-z_]+", "", cover_image) final_post = post.copy() final_post["isCoverImage"] = True final_post["fileUrl"] = cover_image text.nameext_from_url(cover_image, final_post) final_post["num"] = num num += 1 yield Message.Url, cover_image, final_post if not content_body: return if "html" in content_body: html_urls = [] for href in text.extract_iter(content_body["html"], 'href="', '"'): if "fanbox.pixiv.net/images/entry" in href: html_urls.append(href) elif "downloads.fanbox.cc" in href: html_urls.append(href) for src in text.extract_iter(content_body["html"], 'data-src-original="', '"'): html_urls.append(src) for url in html_urls: final_post = post.copy() text.nameext_from_url(url, final_post) final_post["fileUrl"] = url final_post["num"] = num num += 1 yield Message.Url, url, final_post for group in ("images", "imageMap"): if group in content_body: for item in content_body[group]: if group == "imageMap": # imageMap is a dict with image objects as values item = content_body[group][item] final_post = post.copy() final_post["fileUrl"] = item["originalUrl"] text.nameext_from_url(item["originalUrl"], final_post) if "extension" in item: final_post["extension"] = item["extension"] final_post["fileId"] = item.get("id") final_post["width"] = item.get("width") final_post["height"] = item.get("height") final_post["num"] = num num += 1 yield Message.Url, item["originalUrl"], final_post for group in ("files", "fileMap"): if group in content_body: for item in content_body[group]: if group == "fileMap": # fileMap is a dict with file objects as values item = content_body[group][item] final_post = post.copy() final_post["fileUrl"] = item["url"] text.nameext_from_url(item["url"], final_post) if "extension" in item: final_post["extension"] = item["extension"] if "name" in item: final_post["filename"] = item["name"] final_post["fileId"] = item.get("id") final_post["num"] = num num += 1 yield Message.Url, item["url"], final_post if self.embeds: embeds_found = [] if "video" in content_body: embeds_found.append(content_body["video"]) embeds_found.extend(content_body.get("embedMap", {}).values()) for embed in embeds_found: # embed_result is (message type, url, metadata dict) embed_result = self._process_embed(post, embed) if not embed_result: continue embed_result[2]["num"] = num num += 1 yield embed_result def _process_embed(self, post, embed): final_post = post.copy() provider = embed["serviceProvider"] content_id = embed.get("videoId") or 
embed.get("contentId") prefix = "ytdl:" if self.embeds == "ytdl" else "" url = None is_video = False if provider == "soundcloud": url = prefix+"https://soundcloud.com/"+content_id is_video = True elif provider == "youtube": url = prefix+"https://youtube.com/watch?v="+content_id is_video = True elif provider == "vimeo": url = prefix+"https://vimeo.com/"+content_id is_video = True elif provider == "fanbox": # this is an old URL format that redirects # to a proper Fanbox URL url = "https://www.pixiv.net/fanbox/"+content_id # resolve redirect try: url = self.request(url, method="HEAD", allow_redirects=False).headers["location"] except Exception as exc: url = None self.log.warning("Unable to extract fanbox embed %s (%s: %s)", content_id, exc.__class__.__name__, exc) else: final_post["_extractor"] = FanboxPostExtractor elif provider == "twitter": url = "https://twitter.com/_/status/"+content_id elif provider == "google_forms": templ = "https://docs.google.com/forms/d/e/{}/viewform?usp=sf_link" url = templ.format(content_id) else: self.log.warning("service not recognized: {}".format(provider)) if url: final_post["embed"] = embed final_post["embedUrl"] = url text.nameext_from_url(url, final_post) msg_type = Message.Queue if is_video and self.embeds == "ytdl": msg_type = Message.Url return msg_type, url, final_post class FanboxCreatorExtractor(FanboxExtractor): """Extractor for a Fanbox creator's works""" subcategory = "creator" pattern = USER_PATTERN + r"(?:/posts)?/?$" example = "https://USER.fanbox.cc/" def __init__(self, match): FanboxExtractor.__init__(self, match) self.creator_id = match.group(1) or match.group(2) def posts(self): url = "https://api.fanbox.cc/post.listCreator?creatorId={}&limit=10" return self._pagination(url.format(self.creator_id)) class FanboxPostExtractor(FanboxExtractor): """Extractor for media from a single Fanbox post""" subcategory = "post" pattern = USER_PATTERN + r"/posts/(\d+)" example = "https://USER.fanbox.cc/posts/12345" def __init__(self, match): FanboxExtractor.__init__(self, match) self.post_id = match.group(3) def posts(self): return (self._get_post_data(self.post_id),) class FanboxHomeExtractor(FanboxExtractor): """Extractor for your Fanbox home feed""" subcategory = "home" pattern = BASE_PATTERN + r"/?$" example = "https://fanbox.cc/" def posts(self): url = "https://api.fanbox.cc/post.listHome?limit=10" return self._pagination(url) class FanboxSupportingExtractor(FanboxExtractor): """Extractor for your supported Fanbox users feed""" subcategory = "supporting" pattern = BASE_PATTERN + r"/home/supporting" example = "https://fanbox.cc/home/supporting" def posts(self): url = "https://api.fanbox.cc/post.listSupporting?limit=10" return self._pagination(url) class FanboxRedirectExtractor(Extractor): """Extractor for pixiv redirects to fanbox.cc""" category = "fanbox" subcategory = "redirect" pattern = r"(?:https?://)?(?:www\.)?pixiv\.net/fanbox/creator/(\d+)" example = "https://www.pixiv.net/fanbox/creator/12345" def __init__(self, match): Extractor.__init__(self, match) self.user_id = match.group(1) def items(self): url = "https://www.pixiv.net/fanbox/creator/" + self.user_id data = {"_extractor": FanboxCreatorExtractor} response = self.request( url, method="HEAD", allow_redirects=False, notfound="user") yield Message.Queue, response.headers["Location"], data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/fanleaks.py0000644000175000017500000000602214571514301020713 
0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fanleaks.club/""" from .common import Extractor, Message from .. import text class FanleaksExtractor(Extractor): """Base class for Fanleaks extractors""" category = "fanleaks" directory_fmt = ("{category}", "{model}") filename_fmt = "{model_id}_{id}.{extension}" archive_fmt = "{model_id}_{id}" root = "https://fanleaks.club" def __init__(self, match): Extractor.__init__(self, match) self.model_id = match.group(1) def extract_post(self, url): extr = text.extract_from(self.request(url, notfound="post").text) data = { "model_id": self.model_id, "model" : text.unescape(extr('text-lg">', "")), "id" : text.parse_int(self.id), "type" : extr('type="', '"')[:5] or "photo", } url = extr('src="', '"') yield Message.Directory, data yield Message.Url, url, text.nameext_from_url(url, data) class FanleaksPostExtractor(FanleaksExtractor): """Extractor for individual posts on fanleaks.club""" subcategory = "post" pattern = r"(?:https?://)?(?:www\.)?fanleaks\.club/([^/?#]+)/(\d+)" example = "https://fanleaks.club/MODEL/12345" def __init__(self, match): FanleaksExtractor.__init__(self, match) self.id = match.group(2) def items(self): url = "{}/{}/{}".format(self.root, self.model_id, self.id) return self.extract_post(url) class FanleaksModelExtractor(FanleaksExtractor): """Extractor for all posts from a fanleaks model""" subcategory = "model" pattern = (r"(?:https?://)?(?:www\.)?fanleaks\.club" r"/(?!latest/?$)([^/?#]+)/?$") example = "https://fanleaks.club/MODEL" def items(self): page_num = 1 page = self.request( self.root + "/" + self.model_id, notfound="model").text data = { "model_id": self.model_id, "model" : text.unescape(text.extr(page, 'mt-4">', "")), "type" : "photo", } page_url = text.extr(page, "url: '", "'") while True: page = self.request("{}{}".format(page_url, page_num)).text if not page: return for item in text.extract_iter(page, '"): self.id = id = text.extr(item, "/", '"') if "/icon-play.svg" in item: url = "{}/{}/{}".format(self.root, self.model_id, id) yield from self.extract_post(url) continue data["id"] = text.parse_int(id) url = text.extr(item, 'src="', '"').replace( "/thumbs/", "/", 1) yield Message.Directory, data yield Message.Url, url, text.nameext_from_url(url, data) page_num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/fantia.py0000644000175000017500000001553014571514301020375 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fantia.jp/""" from .common import Extractor, Message from .. 
import text, util class FantiaExtractor(Extractor): """Base class for Fantia extractors""" category = "fantia" root = "https://fantia.jp" directory_fmt = ("{category}", "{fanclub_id}") filename_fmt = "{post_id}_{file_id}.{extension}" archive_fmt = "{post_id}_{file_id}" _warning = True def _init(self): self.headers = { "Accept" : "application/json, text/plain, */*", "X-Requested-With": "XMLHttpRequest", } self._empty_plan = { "id" : 0, "price": 0, "limit": 0, "name" : "", "description": "", "thumb": self.root + "/images/fallback/plan/thumb_default.png", } if self._warning: if not self.cookies_check(("_session_id",)): self.log.warning("no '_session_id' cookie set") FantiaExtractor._warning = False def items(self): for post_id in self.posts(): post = self._get_post_data(post_id) post["num"] = 0 contents = self._get_post_contents(post) post["content_count"] = len(contents) post["content_num"] = 0 for content in contents: files = self._process_content(post, content) yield Message.Directory, post if content["visible_status"] != "visible": self.log.warning( "Unable to download '%s' files from " "%s#post-content-id-%s", content["visible_status"], post["post_url"], content["id"]) for file in files: post.update(file) post["num"] += 1 text.nameext_from_url( post["content_filename"] or file["file_url"], post) yield Message.Url, file["file_url"], post post["content_num"] += 1 def posts(self): """Return post IDs""" def _pagination(self, url): params = {"page": 1} while True: page = self.request(url, params=params).text self._csrf_token(page) post_id = None for post_id in text.extract_iter( page, 'class="link-block" href="/posts/', '"'): yield post_id if not post_id: return params["page"] += 1 def _csrf_token(self, page=None): if not page: page = self.request(self.root + "/").text self.headers["X-CSRF-Token"] = text.extr( page, 'name="csrf-token" content="', '"') def _get_post_data(self, post_id): """Fetch and process post data""" url = self.root+"/api/v1/posts/"+post_id resp = self.request(url, headers=self.headers).json()["post"] return { "post_id": resp["id"], "post_url": self.root + "/posts/" + str(resp["id"]), "post_title": resp["title"], "comment": resp["comment"], "rating": resp["rating"], "posted_at": resp["posted_at"], "date": text.parse_datetime( resp["posted_at"], "%a, %d %b %Y %H:%M:%S %z"), "fanclub_id": resp["fanclub"]["id"], "fanclub_user_id": resp["fanclub"]["user"]["id"], "fanclub_user_name": resp["fanclub"]["user"]["name"], "fanclub_name": resp["fanclub"]["name"], "fanclub_url": self.root+"/fanclubs/"+str(resp["fanclub"]["id"]), "tags": [t["name"] for t in resp["tags"]], "_data": resp, } def _get_post_contents(self, post): contents = post["_data"]["post_contents"] try: url = post["_data"]["thumb"]["original"] except Exception: pass else: contents.insert(0, { "id": "thumb", "title": "thumb", "category": "thumb", "download_uri": url, "visible_status": "visible", "plan": None, }) return contents def _process_content(self, post, content): post["content_category"] = content["category"] post["content_title"] = content["title"] post["content_filename"] = content.get("filename") or "" post["content_id"] = content["id"] post["content_comment"] = content.get("comment") or "" post["content_num"] += 1 post["plan"] = content["plan"] or self._empty_plan files = [] if "post_content_photos" in content: for photo in content["post_content_photos"]: files.append({"file_id" : photo["id"], "file_url": photo["url"]["original"]}) if "download_uri" in content: url = content["download_uri"] if url[0] == "/": 
url = self.root + url files.append({"file_id" : content["id"], "file_url": url}) if content["category"] == "blog" and "comment" in content: comment_json = util.json_loads(content["comment"]) blog_text = "" for op in comment_json.get("ops") or (): insert = op.get("insert") if isinstance(insert, str): blog_text += insert elif isinstance(insert, dict) and "fantiaImage" in insert: img = insert["fantiaImage"] files.append({"file_id" : img["id"], "file_url": self.root + img["original_url"]}) post["blogpost_text"] = blog_text else: post["blogpost_text"] = "" return files class FantiaCreatorExtractor(FantiaExtractor): """Extractor for a Fantia creator's works""" subcategory = "creator" pattern = r"(?:https?://)?(?:www\.)?fantia\.jp/fanclubs/(\d+)" example = "https://fantia.jp/fanclubs/12345" def __init__(self, match): FantiaExtractor.__init__(self, match) self.creator_id = match.group(1) def posts(self): url = "{}/fanclubs/{}/posts".format(self.root, self.creator_id) return self._pagination(url) class FantiaPostExtractor(FantiaExtractor): """Extractor for media from a single Fantia post""" subcategory = "post" pattern = r"(?:https?://)?(?:www\.)?fantia\.jp/posts/(\d+)" example = "https://fantia.jp/posts/12345" def __init__(self, match): FantiaExtractor.__init__(self, match) self.post_id = match.group(1) def posts(self): self._csrf_token() return (self.post_id,) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709680883.0 gallery_dl-1.26.9/gallery_dl/extractor/fapachi.py0000644000175000017500000000436414571724363020544 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fapachi.com/""" from .common import Extractor, Message from .. 
import text class FapachiPostExtractor(Extractor): """Extractor for individual posts on fapachi.com""" category = "fapachi" subcategory = "post" root = "https://fapachi.com" directory_fmt = ("{category}", "{user}") filename_fmt = "{user}_{id}.{extension}" archive_fmt = "{user}_{id}" pattern = (r"(?:https?://)?(?:www\.)?fapachi\.com" r"/(?!search/)([^/?#]+)/media/(\d+)") example = "https://fapachi.com/MODEL/media/12345" def __init__(self, match): Extractor.__init__(self, match) self.user, self.id = match.groups() def items(self): data = { "user": self.user, "id" : self.id, } page = self.request("{}/{}/media/{}".format( self.root, self.user, self.id)).text url = self.root + text.extr(page, 'd-block" src="', '"') yield Message.Directory, data yield Message.Url, url, text.nameext_from_url(url, data) class FapachiUserExtractor(Extractor): """Extractor for all posts from a fapachi user""" category = "fapachi" subcategory = "user" root = "https://fapachi.com" pattern = (r"(?:https?://)?(?:www\.)?fapachi\.com" r"/(?!search(?:/|$))([^/?#]+)(?:/page/(\d+))?$") example = "https://fapachi.com/MODEL" def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) self.num = text.parse_int(match.group(2), 1) def items(self): data = {"_extractor": FapachiPostExtractor} while True: page = self.request("{}/{}/page/{}".format( self.root, self.user, self.num)).text for post in text.extract_iter(page, 'model-media-prew">', ">"): path = text.extr(post, 'Next page' not in page: return self.num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1711041237.0 gallery_dl-1.26.9/gallery_dl/extractor/fapello.py0000644000175000017500000000767114577065325020601 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fapello.com/""" from .common import Extractor, Message from .. import text, exception BASE_PATTERN = r"(?:https?://)?(?:www\.)?fapello\.(?:com|su)" class FapelloPostExtractor(Extractor): """Extractor for individual posts on fapello.com""" category = "fapello" subcategory = "post" directory_fmt = ("{category}", "{model}") filename_fmt = "{model}_{id}.{extension}" archive_fmt = "{type}_{model}_{id}" pattern = BASE_PATTERN + r"/(?!search/|popular_videos/)([^/?#]+)/(\d+)" example = "https://fapello.com/MODEL/12345/" def __init__(self, match): Extractor.__init__(self, match) self.root = text.root_from_url(match.group(0)) self.model, self.id = match.groups() def items(self): url = "{}/{}/{}/".format(self.root, self.model, self.id) page = text.extr( self.request(url, allow_redirects=False).text, 'class="uk-align-center"', "
  • ", None) if page is None: raise exception.NotFoundError("post") data = { "model": self.model, "id" : text.parse_int(self.id), "type" : "video" if 'type="video' in page else "photo", "thumbnail": text.extr(page, 'poster="', '"'), } url = text.extr(page, 'src="', '"').replace( ".md", "").replace(".th", "") yield Message.Directory, data yield Message.Url, url, text.nameext_from_url(url, data) class FapelloModelExtractor(Extractor): """Extractor for all posts from a fapello model""" category = "fapello" subcategory = "model" pattern = (BASE_PATTERN + r"/(?!top-(?:likes|followers)|popular_videos" r"|videos|trending|search/?$)" r"([^/?#]+)/?$") example = "https://fapello.com/model/" def __init__(self, match): Extractor.__init__(self, match) self.root = text.root_from_url(match.group(0)) self.model = match.group(1) def items(self): num = 1 data = {"_extractor": FapelloPostExtractor} while True: url = "{}/ajax/model/{}/page-{}/".format( self.root, self.model, num) page = self.request(url).text if not page: return for url in text.extract_iter(page, '', ""): yield Message.Queue, text.extr(item, ' 0 and (int(size["width"]) > self.maxsize or int(size["height"]) > self.maxsize): del sizes[index:] break return sizes def photos_search(self, params): """Return a list of photos matching some criteria.""" return self._pagination("photos.search", params.copy()) def photosets_getInfo(self, photoset_id, user_id): """Gets information about a photoset.""" params = {"photoset_id": photoset_id, "user_id": user_id} photoset = self._call("photosets.getInfo", params)["photoset"] return self._clean_info(photoset) def photosets_getList(self, user_id): """Returns the photosets belonging to the specified user.""" params = {"user_id": user_id} return self._pagination_sets("photosets.getList", params) def photosets_getPhotos(self, photoset_id): """Get the list of photos in a set.""" params = {"photoset_id": photoset_id} return self._pagination("photosets.getPhotos", params, "photoset") def urls_lookupGroup(self, groupname): """Returns a group NSID, given the url to a group's page.""" params = {"url": "https://www.flickr.com/groups/" + groupname} group = self._call("urls.lookupGroup", params)["group"] return {"nsid": group["id"], "path_alias": groupname, "groupname": group["groupname"]["_content"]} def urls_lookupUser(self, username): """Returns a user NSID, given the url to a user's photos or profile.""" params = {"url": "https://www.flickr.com/photos/" + username} user = self._call("urls.lookupUser", params)["user"] return { "nsid" : user["id"], "username" : user["username"]["_content"], "path_alias": username, } def video_getStreamInfo(self, video_id, secret=None): """Returns all available video streams""" params = {"photo_id": video_id} if not secret: secret = self._call("photos.getInfo", params)["photo"]["secret"] params["secret"] = secret stream = self._call("video.getStreamInfo", params)["streams"]["stream"] return max(stream, key=lambda s: self.VIDEO_FORMATS.get(s["type"], 0)) def _call(self, method, params): params["method"] = "flickr." 
+ method params["format"] = "json" params["nojsoncallback"] = "1" if self.api_key: params["api_key"] = self.api_key response = self.request(self.API_URL, params=params) try: data = response.json() except ValueError: data = {"code": -1, "message": response.content} if "code" in data: msg = data.get("message") self.log.debug("Server response: %s", data) if data["code"] == 1: raise exception.NotFoundError(self.extractor.subcategory) elif data["code"] == 98: raise exception.AuthenticationError(msg) elif data["code"] == 99: raise exception.AuthorizationError(msg) raise exception.StopExtraction("API request failed: %s", msg) return data def _pagination(self, method, params, key="photos"): extras = ("description,date_upload,tags,views,media," "path_alias,owner_name,") includes = self.extractor.config("metadata") if includes: if isinstance(includes, (list, tuple)): includes = ",".join(includes) elif not isinstance(includes, str): includes = ("license,date_taken,original_format,last_update," "geo,machine_tags,o_dims") extras = extras + includes + "," extras += ",".join("url_" + fmt[0] for fmt in self.formats) params["extras"] = extras params["page"] = 1 while True: data = self._call(method, params)[key] yield from data["photo"] if params["page"] >= data["pages"]: return params["page"] += 1 def _pagination_sets(self, method, params): params["page"] = 1 while True: data = self._call(method, params)["photosets"] yield from data["photoset"] if params["page"] >= data["pages"]: return params["page"] += 1 def _extract_format(self, photo): photo["description"] = photo["description"]["_content"].strip() photo["views"] = text.parse_int(photo["views"]) photo["date"] = text.parse_timestamp(photo["dateupload"]) photo["tags"] = photo["tags"].split() if self.exif: photo.update(self.photos_getExif(photo["id"])) if self.contexts: photo.update(self.photos_getAllContexts(photo["id"])) photo["id"] = text.parse_int(photo["id"]) if "owner" in photo: photo["owner"] = { "nsid" : photo["owner"], "username" : photo["ownername"], "path_alias": photo["pathalias"], } else: photo["owner"] = self.extractor.user del photo["pathalias"] del photo["ownername"] if photo["media"] == "video" and self.videos: return self._extract_video(photo) for fmt, fmtname, fmtwidth in self.formats: key = "url_" + fmt if key in photo: photo["width"] = text.parse_int(photo["width_" + fmt]) photo["height"] = text.parse_int(photo["height_" + fmt]) if self.maxsize and (photo["width"] > self.maxsize or photo["height"] > self.maxsize): continue photo["url"] = photo[key] photo["label"] = fmtname # remove excess data keys = [ key for key in photo if key.startswith(("url_", "width_", "height_")) ] for key in keys: del photo[key] break else: self._extract_photo(photo) return photo def _extract_photo(self, photo): size = self.photos_getSizes(photo["id"])[-1] photo["url"] = size["source"] photo["label"] = size["label"] photo["width"] = text.parse_int(size["width"]) photo["height"] = text.parse_int(size["height"]) return photo def _extract_video(self, photo): stream = self.video_getStreamInfo(photo["id"], photo.get("secret")) photo["url"] = stream["_content"] photo["label"] = stream["type"] photo["width"] = photo["height"] = 0 return photo @staticmethod def _clean_info(info): info["title"] = info["title"]["_content"] info["description"] = info["description"]["_content"] return info ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 
gallery_dl-1.26.9/gallery_dl/extractor/foolfuuka.py0000644000175000017500000002001614571514301021121 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for FoolFuuka 4chan archives""" from .common import BaseExtractor, Message from .. import text import itertools class FoolfuukaExtractor(BaseExtractor): """Base extractor for FoolFuuka based boards/archives""" basecategory = "foolfuuka" filename_fmt = "{timestamp_ms} {filename_media}.{extension}" archive_fmt = "{board[shortname]}_{num}_{timestamp}" external = "default" def __init__(self, match): BaseExtractor.__init__(self, match) if self.category == "b4k": self.remote = self._remote_direct elif self.category == "archivedmoe": self.referer = False def items(self): yield Message.Directory, self.metadata() for post in self.posts(): media = post["media"] if not media: continue url = media["media_link"] if not url and "remote_media_link" in media: url = self.remote(media) if url.startswith("/"): url = self.root + url post["filename"], _, post["extension"] = \ media["media"].rpartition(".") post["filename_media"] = media["media_filename"].rpartition(".")[0] post["timestamp_ms"] = text.parse_int( media["media_orig"].rpartition(".")[0]) yield Message.Url, url, post def metadata(self): """Return general metadata""" def posts(self): """Return an iterable with all relevant posts""" def remote(self, media): """Resolve a remote media link""" page = self.request(media["remote_media_link"]).text url = text.extr(page, 'http-equiv="Refresh" content="0; url=', '"') if url.endswith(".webm") and \ url.startswith("https://thebarchive.com/"): return url[:-1] return url @staticmethod def _remote_direct(media): return media["remote_media_link"] BASE_PATTERN = FoolfuukaExtractor.update({ "4plebs": { "root": "https://archive.4plebs.org", "pattern": r"(?:archive\.)?4plebs\.org", }, "archivedmoe": { "root": "https://archived.moe", "pattern": r"archived\.moe", }, "archiveofsins": { "root": "https://archiveofsins.com", "pattern": r"(?:www\.)?archiveofsins\.com", }, "b4k": { "root": "https://arch.b4k.co", "pattern": r"arch\.b4k\.co", }, "desuarchive": { "root": "https://desuarchive.org", "pattern": r"desuarchive\.org", }, "fireden": { "root": "https://boards.fireden.net", "pattern": r"boards\.fireden\.net", }, "palanq": { "root": "https://archive.palanq.win", "pattern": r"archive\.palanq\.win", }, "rbt": { "root": "https://rbt.asia", "pattern": r"(?:rbt\.asia|(?:archive\.)?rebeccablacktech\.com)", }, "thebarchive": { "root": "https://thebarchive.com", "pattern": r"thebarchive\.com", }, }) class FoolfuukaThreadExtractor(FoolfuukaExtractor): """Base extractor for threads on FoolFuuka based boards/archives""" subcategory = "thread" directory_fmt = ("{category}", "{board[shortname]}", "{thread_num} {title|comment[:50]}") pattern = BASE_PATTERN + r"/([^/?#]+)/thread/(\d+)" example = "https://archived.moe/a/thread/12345/" def __init__(self, match): FoolfuukaExtractor.__init__(self, match) self.board = match.group(match.lastindex-1) self.thread = match.group(match.lastindex) self.data = None def metadata(self): url = self.root + "/_/api/chan/thread/" params = {"board": self.board, "num": self.thread} self.data = self.request(url, params=params).json()[self.thread] return self.data["op"] def posts(self): op = (self.data["op"],) posts = 
self.data.get("posts") if posts: posts = list(posts.values()) posts.sort(key=lambda p: p["timestamp"]) return itertools.chain(op, posts) return op class FoolfuukaBoardExtractor(FoolfuukaExtractor): """Base extractor for FoolFuuka based boards/archives""" subcategory = "board" pattern = BASE_PATTERN + r"/([^/?#]+)/\d*$" example = "https://archived.moe/a/" def __init__(self, match): FoolfuukaExtractor.__init__(self, match) self.board = match.group(match.lastindex) def items(self): index_base = "{}/_/api/chan/index/?board={}&page=".format( self.root, self.board) thread_base = "{}/{}/thread/".format(self.root, self.board) for page in itertools.count(1): with self.request(index_base + format(page)) as response: try: threads = response.json() except ValueError: threads = None if not threads: return for num, thread in threads.items(): thread["url"] = thread_base + format(num) thread["_extractor"] = FoolfuukaThreadExtractor yield Message.Queue, thread["url"], thread class FoolfuukaSearchExtractor(FoolfuukaExtractor): """Base extractor for search results on FoolFuuka based boards/archives""" subcategory = "search" directory_fmt = ("{category}", "search", "{search}") pattern = BASE_PATTERN + r"/([^/?#]+)/search((?:/[^/?#]+/[^/?#]+)+)" example = "https://archived.moe/_/search/text/QUERY/" request_interval = (0.5, 1.5) def __init__(self, match): FoolfuukaExtractor.__init__(self, match) self.params = params = {} args = match.group(match.lastindex).split("/") key = None for arg in args: if key: params[key] = text.unescape(arg) key = None else: key = arg board = match.group(match.lastindex-1) if board != "_": params["boards"] = board def metadata(self): return {"search": self.params.get("text", "")} def posts(self): url = self.root + "/_/api/chan/search/" params = self.params.copy() params["page"] = text.parse_int(params.get("page"), 1) if "filter" not in params: params["filter"] = "text" while True: try: data = self.request(url, params=params).json() except ValueError: return if isinstance(data, dict): if data.get("error"): return posts = data["0"]["posts"] elif isinstance(data, list): posts = data[0]["posts"] else: return yield from posts if len(posts) <= 3: return params["page"] += 1 class FoolfuukaGalleryExtractor(FoolfuukaExtractor): """Base extractor for FoolFuuka galleries""" subcategory = "gallery" directory_fmt = ("{category}", "{board}", "gallery") pattern = BASE_PATTERN + r"/([^/?#]+)/gallery(?:/(\d+))?" example = "https://archived.moe/a/gallery" def __init__(self, match): FoolfuukaExtractor.__init__(self, match) board = match.group(match.lastindex) if board.isdecimal(): self.board = match.group(match.lastindex-1) self.pages = (board,) else: self.board = board self.pages = map(format, itertools.count(1)) def metadata(self): return {"board": self.board} def posts(self): base = "{}/_/api/chan/gallery/?board={}&page=".format( self.root, self.board) for page in self.pages: with self.request(base + page) as response: posts = response.json() if not posts: return yield from posts ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/foolslide.py0000644000175000017500000001047114571514301021112 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
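# Note: every page request below is sent as a POST with {"adult": "true"}
# form data so that age-restricted chapters are returned, and
# parse_chapter_url() derives lang/language, volume, chapter and
# chapter_minor from the /read/<manga>/<lang>/<volume>/<chapter>[/<minor>]
# segments of a chapter URL (the title is taken from 'chapter_string').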
"""Extractors for FoOlSlide based sites""" from .common import BaseExtractor, Message from .. import text, util class FoolslideExtractor(BaseExtractor): """Base class for FoOlSlide extractors""" basecategory = "foolslide" def __init__(self, match): BaseExtractor.__init__(self, match) self.gallery_url = self.root + match.group(match.lastindex) def request(self, url): return BaseExtractor.request( self, url, encoding="utf-8", method="POST", data={"adult": "true"}) @staticmethod def parse_chapter_url(url, data): info = url.partition("/read/")[2].rstrip("/").split("/") lang = info[1].partition("-")[0] data["lang"] = lang data["language"] = util.code_to_language(lang) data["volume"] = text.parse_int(info[2]) data["chapter"] = text.parse_int(info[3]) data["chapter_minor"] = "." + info[4] if len(info) >= 5 else "" data["title"] = data["chapter_string"].partition(":")[2].strip() return data BASE_PATTERN = FoolslideExtractor.update({ }) class FoolslideChapterExtractor(FoolslideExtractor): """Base class for chapter extractors for FoOlSlide based sites""" subcategory = "chapter" directory_fmt = ("{category}", "{manga}", "{chapter_string}") filename_fmt = ( "{manga}_c{chapter:>03}{chapter_minor:?//}_{page:>03}.{extension}") archive_fmt = "{id}" pattern = BASE_PATTERN + r"(/read/[^/?#]+/[a-z-]+/\d+/\d+(?:/\d+)?)" example = "https://read.powermanga.org/read/MANGA/en/0/123/" def items(self): page = self.request(self.gallery_url).text data = self.metadata(page) imgs = self.images(page) data["count"] = len(imgs) data["chapter_id"] = text.parse_int(imgs[0]["chapter_id"]) yield Message.Directory, data enum = util.enumerate_reversed if self.config( "page-reverse") else enumerate for data["page"], image in enum(imgs, 1): try: url = image["url"] del image["url"] del image["chapter_id"] del image["thumb_url"] except KeyError: pass for key in ("height", "id", "size", "width"): image[key] = text.parse_int(image[key]) data.update(image) text.nameext_from_url(data["filename"], data) yield Message.Url, url, data def metadata(self, page): extr = text.extract_from(page) extr('

    ', '') return self.parse_chapter_url(self.gallery_url, { "manga" : text.unescape(extr('title="', '"')).strip(), "chapter_string": text.unescape(extr('title="', '"')), }) def images(self, page): return util.json_loads(text.extr(page, "var pages = ", ";")) class FoolslideMangaExtractor(FoolslideExtractor): """Base class for manga extractors for FoOlSlide based sites""" subcategory = "manga" categorytransfer = True pattern = BASE_PATTERN + r"(/series/[^/?#]+)" example = "https://read.powermanga.org/series/MANGA/" def items(self): page = self.request(self.gallery_url).text chapters = self.chapters(page) if not self.config("chapter-reverse", False): chapters.reverse() for chapter, data in chapters: data["_extractor"] = FoolslideChapterExtractor yield Message.Queue, chapter, data def chapters(self, page): extr = text.extract_from(page) manga = text.unescape(extr('

    ', '

    ')).strip() author = extr('Author: ', 'Artist: ', '
    ")) path = extr('href="//d', '"') if not path: self.log.warning( "Unable to download post %s (\"%s\")", post_id, text.remove_html( extr('System Message', '') or extr('System Message', '
    ') ) ) return None pi = text.parse_int rh = text.remove_html data = text.nameext_from_url(path, { "id" : pi(post_id), "url": "https://d" + path, }) if self._new_layout: data["tags"] = text.split_html(extr( 'class="tags-row">', '')) data["title"] = text.unescape(extr("

    ", "

    ")) data["artist"] = extr("", "<") data["_description"] = extr( 'class="submission-description user-submitted-links">', ' ') data["views"] = pi(rh(extr('class="views">', ''))) data["favorites"] = pi(rh(extr('class="favorites">', ''))) data["comments"] = pi(rh(extr('class="comments">', ''))) data["rating"] = rh(extr('class="rating">', '')) data["fa_category"] = rh(extr('>Category', '')) data["theme"] = rh(extr('>', '<')) data["species"] = rh(extr('>Species', '')) data["gender"] = rh(extr('>Gender', '')) data["width"] = pi(extr("", "x")) data["height"] = pi(extr("", "p")) else: # old site layout data["title"] = text.unescape(extr("

    ", "

    ")) data["artist"] = extr(">", "<") data["fa_category"] = extr("Category:", "<").strip() data["theme"] = extr("Theme:", "<").strip() data["species"] = extr("Species:", "<").strip() data["gender"] = extr("Gender:", "<").strip() data["favorites"] = pi(extr("Favorites:", "<")) data["comments"] = pi(extr("Comments:", "<")) data["views"] = pi(extr("Views:", "<")) data["width"] = pi(extr("Resolution:", "x")) data["height"] = pi(extr("", "<")) data["tags"] = text.split_html(extr( 'id="keywords">', ''))[::2] data["rating"] = extr('', ' ')
            data[', ' ') data["artist_url"] = data["artist"].replace("_", "").lower() data["user"] = self.user or data["artist_url"] data["date"] = text.parse_timestamp(data["filename"].partition(".")[0]) data["description"] = self._process_description(data["_description"]) return data @staticmethod def _process_description(description): return text.unescape(text.remove_html(description, "", "")) def _pagination(self, path): num = 1 while True: url = "{}/{}/{}/{}/".format( self.root, path, self.user, num) page = self.request(url).text post_id = None for post_id in text.extract_iter(page, 'id="sid-', '"'): yield post_id if not post_id: return num += 1 def _pagination_favorites(self): path = "/favorites/{}/".format(self.user) while path: page = self.request(self.root + path).text extr = text.extract_from(page) while True: post_id = extr('id="sid-', '"') if not post_id: break self._favorite_id = text.parse_int(extr('data-fav-id="', '"')) yield post_id path = text.extr(page, 'right" href="', '"') def _pagination_search(self, query): url = self.root + "/search/" data = { "page" : 1, "order-by" : "relevancy", "order-direction": "desc", "range" : "all", "range_from" : "", "range_to" : "", "rating-general" : "1", "rating-mature" : "1", "rating-adult" : "1", "type-art" : "1", "type-music" : "1", "type-flash" : "1", "type-story" : "1", "type-photo" : "1", "type-poetry" : "1", "mode" : "extended", } data.update(query) if "page" in query: data["page"] = text.parse_int(query["page"]) while True: page = self.request(url, method="POST", data=data).text post_id = None for post_id in text.extract_iter(page, 'id="sid-', '"'): yield post_id if not post_id: return if "next_page" in data: data["page"] += 1 else: data["next_page"] = "Next" class FuraffinityGalleryExtractor(FuraffinityExtractor): """Extractor for a furaffinity user's gallery""" subcategory = "gallery" pattern = BASE_PATTERN + r"/gallery/([^/?#]+)" example = "https://www.furaffinity.net/gallery/USER/" def posts(self): return self._pagination("gallery") class FuraffinityScrapsExtractor(FuraffinityExtractor): """Extractor for a furaffinity user's scraps""" subcategory = "scraps" directory_fmt = ("{category}", "{user!l}", "Scraps") pattern = BASE_PATTERN + r"/scraps/([^/?#]+)" example = "https://www.furaffinity.net/scraps/USER/" def posts(self): return self._pagination("scraps") class FuraffinityFavoriteExtractor(FuraffinityExtractor): """Extractor for a furaffinity user's favorites""" subcategory = "favorite" directory_fmt = ("{category}", "{user!l}", "Favorites") pattern = BASE_PATTERN + r"/favorites/([^/?#]+)" example = "https://www.furaffinity.net/favorites/USER/" def posts(self): return self._pagination_favorites() def _parse_post(self, post_id): post = FuraffinityExtractor._parse_post(self, post_id) if post: post["favorite_id"] = self._favorite_id return post class FuraffinitySearchExtractor(FuraffinityExtractor): """Extractor for furaffinity search results""" subcategory = "search" directory_fmt = ("{category}", "Search", "{search}") pattern = BASE_PATTERN + r"/search(?:/([^/?#]+))?/?[?&]([^#]+)" example = "https://www.furaffinity.net/search/?q=QUERY" def __init__(self, match): FuraffinityExtractor.__init__(self, match) self.query = text.parse_query(match.group(2)) if self.user and "q" not in self.query: self.query["q"] = text.unquote(self.user) def metadata(self): return {"search": self.query.get("q")} def posts(self): return self._pagination_search(self.query) class FuraffinityPostExtractor(FuraffinityExtractor): """Extractor for individual 
posts on furaffinity""" subcategory = "post" pattern = BASE_PATTERN + r"/(?:view|full)/(\d+)" example = "https://www.furaffinity.net/view/12345/" def posts(self): post_id = self.user self.user = None return (post_id,) class FuraffinityUserExtractor(FuraffinityExtractor): """Extractor for furaffinity user profiles""" subcategory = "user" cookies_domain = None pattern = BASE_PATTERN + r"/user/([^/?#]+)" example = "https://www.furaffinity.net/user/USER/" def initialize(self): pass skip = Extractor.skip def items(self): base = "{}/{{}}/{}/".format(self.root, self.user) return self._dispatch_extractors(( (FuraffinityGalleryExtractor , base.format("gallery")), (FuraffinityScrapsExtractor , base.format("scraps")), (FuraffinityFavoriteExtractor, base.format("favorites")), ), ("gallery",)) class FuraffinityFollowingExtractor(FuraffinityExtractor): """Extractor for a furaffinity user's watched users""" subcategory = "following" pattern = BASE_PATTERN + "/watchlist/by/([^/?#]+)" example = "https://www.furaffinity.net/watchlist/by/USER/" def items(self): url = "{}/watchlist/by/{}/".format(self.root, self.user) data = {"_extractor": FuraffinityUserExtractor} while True: page = self.request(url).text for path in text.extract_iter(page, '
    ", "").strip() title, _, gallery_id = title.rpartition("#") return { "gallery_id" : text.parse_int(gallery_id), "gallery_hash": self.gallery_hash, "title" : text.unescape(title[:-15]), "views" : data.get("hits"), "score" : data.get("rating"), "tags" : (data.get("tags") or "").split(","), } def images(self, page): return [ ("https:" + image["imageUrl"], image) for image in self.data["images"] ] class FuskatorSearchExtractor(Extractor): """Extractor for search results on fuskator.com""" category = "fuskator" subcategory = "search" root = "https://fuskator.com" pattern = r"(?:https?://)?fuskator\.com(/(?:search|page)/.+)" example = "https://fuskator.com/search/TAG/" def __init__(self, match): Extractor.__init__(self, match) self.path = match.group(1) def items(self): url = self.root + self.path data = {"_extractor": FuskatorGalleryExtractor} while True: page = self.request(url).text for path in text.extract_iter( page, 'class="pic_pad">', '>>><') if not pages: return url = self.root + text.rextract(pages, 'href="', '"')[0] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1711195157.0 gallery_dl-1.26.9/gallery_dl/extractor/gelbooru.py0000644000175000017500000002121314577542025020756 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://gelbooru.com/""" from .common import Extractor, Message from . import gelbooru_v02 from .. import text, exception import binascii BASE_PATTERN = r"(?:https?://)?(?:www\.)?gelbooru\.com/(?:index\.php)?\?" class GelbooruBase(): """Base class for gelbooru extractors""" category = "gelbooru" basecategory = "booru" root = "https://gelbooru.com" offset = 0 def _api_request(self, params, key="post", log=False): if "s" not in params: params["s"] = "post" params["api_key"] = self.api_key params["user_id"] = self.user_id url = self.root + "/index.php?page=dapi&q=index&json=1" data = self.request(url, params=params).json() if not key: return data try: posts = data[key] except KeyError: if log: self.log.error("Incomplete API response (missing '%s')", key) self.log.debug("%s", data) return [] if not isinstance(posts, list): return (posts,) return posts def _pagination(self, params): params["pid"] = self.page_start params["limit"] = self.per_page limit = self.per_page // 2 while True: posts = self._api_request(params) for post in posts: yield post if len(posts) < limit: return if "pid" in params: del params["pid"] params["tags"] = "{} id:<{}".format(self.tags, post["id"]) def _pagination_html(self, params): url = self.root + "/index.php" params["pid"] = self.offset data = {} while True: num_ids = 0 page = self.request(url, params=params).text for data["id"] in text.extract_iter(page, '" id="p', '"'): num_ids += 1 yield from self._api_request(data) if num_ids < self.per_page: return params["pid"] += self.per_page @staticmethod def _file_url(post): url = post["file_url"] if url.endswith((".webm", ".mp4")): md5 = post["md5"] path = "/images/{}/{}/{}.webm".format(md5[0:2], md5[2:4], md5) post["_fallback"] = GelbooruBase._video_fallback(path) url = "https://img3.gelbooru.com" + path return url @staticmethod def _video_fallback(path): yield "https://img2.gelbooru.com" + path yield "https://img1.gelbooru.com" + path def _notes(self, post, page): notes_data = text.extr(page, '
    ') if not notes_data: return post["notes"] = notes = [] extr = text.extract for note in text.extract_iter(notes_data, ''): notes.append({ "width" : int(extr(note, 'data-width="', '"')[0]), "height": int(extr(note, 'data-height="', '"')[0]), "x" : int(extr(note, 'data-x="', '"')[0]), "y" : int(extr(note, 'data-y="', '"')[0]), "body" : extr(note, 'data-body="', '"')[0], }) def _skip_offset(self, num): self.offset += num return num class GelbooruTagExtractor(GelbooruBase, gelbooru_v02.GelbooruV02TagExtractor): """Extractor for images from gelbooru.com based on search-tags""" pattern = BASE_PATTERN + r"page=post&s=list&tags=([^&#]*)" example = "https://gelbooru.com/index.php?page=post&s=list&tags=TAG" class GelbooruPoolExtractor(GelbooruBase, gelbooru_v02.GelbooruV02PoolExtractor): """Extractor for gelbooru pools""" per_page = 45 pattern = BASE_PATTERN + r"page=pool&s=show&id=(\d+)" example = "https://gelbooru.com/index.php?page=pool&s=show&id=12345" skip = GelbooruBase._skip_offset def metadata(self): url = self.root + "/index.php" self._params = { "page": "pool", "s" : "show", "id" : self.pool_id, } page = self.request(url, params=self._params).text name, pos = text.extract(page, "

    Now Viewing: ", "

    ") if not name: raise exception.NotFoundError("pool") return { "pool": text.parse_int(self.pool_id), "pool_name": text.unescape(name), } def posts(self): return self._pagination_html(self._params) class GelbooruFavoriteExtractor(GelbooruBase, gelbooru_v02.GelbooruV02FavoriteExtractor): """Extractor for gelbooru favorites""" per_page = 100 pattern = BASE_PATTERN + r"page=favorites&s=view&id=(\d+)" example = "https://gelbooru.com/index.php?page=favorites&s=view&id=12345" skip = GelbooruBase._skip_offset def posts(self): # get number of favorites params = { "s" : "favorite", "id" : self.favorite_id, "limit": "2", } data = self._api_request(params, None, True) count = data["@attributes"]["count"] self.log.debug("API reports %s favorite entries", count) favs = data["favorite"] try: order = 1 if favs[0]["id"] < favs[1]["id"] else -1 except LookupError as exc: self.log.debug( "Error when determining API favorite order (%s: %s)", exc.__class__.__name__, exc) order = -1 else: self.log.debug("API yields favorites in %sscending order", "a" if order > 0 else "de") order_favs = self.config("order-posts") if order_favs and order_favs[0] in ("r", "a"): self.log.debug("Returning them in reverse") order = -order if order < 0: return self._pagination(params, count) return self._pagination_reverse(params, count) def _pagination(self, params, count): if self.offset: pnum, skip = divmod(self.offset, self.per_page) else: pnum = skip = 0 params["pid"] = pnum params["limit"] = self.per_page while True: favs = self._api_request(params, "favorite") if not favs: return if skip: favs = favs[skip:] skip = 0 for fav in favs: for post in self._api_request({"id": fav["favorite"]}): post["date_favorited"] = text.parse_timestamp(fav["added"]) yield post params["pid"] += 1 def _pagination_reverse(self, params, count): pnum, last = divmod(count-1, self.per_page) if self.offset > last: # page number change self.offset -= last diff, self.offset = divmod(self.offset-1, self.per_page) pnum -= diff + 1 skip = self.offset params["pid"] = pnum params["limit"] = self.per_page while True: favs = self._api_request(params, "favorite") favs.reverse() if skip: favs = favs[skip:] skip = 0 for fav in favs: for post in self._api_request({"id": fav["favorite"]}): post["date_favorited"] = text.parse_timestamp(fav["added"]) yield post params["pid"] -= 1 if params["pid"] < 0: return class GelbooruPostExtractor(GelbooruBase, gelbooru_v02.GelbooruV02PostExtractor): """Extractor for single images from gelbooru.com""" pattern = (BASE_PATTERN + r"(?=(?:[^#]+&)?page=post(?:&|#|$))" r"(?=(?:[^#]+&)?s=view(?:&|#|$))" r"(?:[^#]+&)?id=(\d+)") example = "https://gelbooru.com/index.php?page=post&s=view&id=12345" class GelbooruRedirectExtractor(GelbooruBase, Extractor): subcategory = "redirect" pattern = (r"(?:https?://)?(?:www\.)?gelbooru\.com" r"/redirect\.php\?s=([^&#]+)") example = "https://gelbooru.com/redirect.php?s=BASE64" def __init__(self, match): Extractor.__init__(self, match) self.url_base64 = match.group(1) def items(self): url = text.ensure_http_scheme(binascii.a2b_base64( self.url_base64).decode()) data = {"_extractor": GelbooruPostExtractor} yield Message.Queue, url, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/gelbooru_v01.py0000644000175000017500000001071214571514301021434 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of 
the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Gelbooru Beta 0.1.11 sites""" from . import booru from .. import text class GelbooruV01Extractor(booru.BooruExtractor): basecategory = "gelbooru_v01" per_page = 20 def _parse_post(self, post_id): url = "{}/index.php?page=post&s=view&id={}".format( self.root, post_id) extr = text.extract_from(self.request(url).text) post = { "id" : post_id, "created_at": extr('Posted: ', ' <'), "uploader" : extr('By: ', ' <'), "width" : extr('Size: ', 'x'), "height" : extr('', ' <'), "source" : extr('Source: ', ' <'), "rating" : (extr('Rating: ', '<') or "?")[0].lower(), "score" : extr('Score: ', ' <'), "file_url" : extr('img', '<')), } post["md5"] = post["file_url"].rpartition("/")[2].partition(".")[0] post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%d %H:%M:%S") return post def skip(self, num): self.page_start += num return num def _pagination(self, url, begin, end): pid = self.page_start while True: page = self.request(url + str(pid)).text cnt = 0 for post_id in text.extract_iter(page, begin, end): yield self._parse_post(post_id) cnt += 1 if cnt < self.per_page: return pid += self.per_page BASE_PATTERN = GelbooruV01Extractor.update({ "thecollection": { "root": "https://the-collection.booru.org", "pattern": r"the-collection\.booru\.org", }, "illusioncardsbooru": { "root": "https://illusioncards.booru.org", "pattern": r"illusioncards\.booru\.org", }, "allgirlbooru": { "root": "https://allgirl.booru.org", "pattern": r"allgirl\.booru\.org", }, "drawfriends": { "root": "https://drawfriends.booru.org", "pattern": r"drawfriends\.booru\.org", }, "vidyart2": { "root": "https://vidyart2.booru.org", "pattern": r"vidyart2\.booru\.org", }, }) class GelbooruV01TagExtractor(GelbooruV01Extractor): subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/index\.php\?page=post&s=list&tags=([^&#]+)" example = "https://allgirl.booru.org/index.php?page=post&s=list&tags=TAG" def __init__(self, match): GelbooruV01Extractor.__init__(self, match) self.tags = match.group(match.lastindex) def metadata(self): return {"search_tags": text.unquote(self.tags.replace("+", " "))} def posts(self): url = "{}/index.php?page=post&s=list&tags={}&pid=".format( self.root, self.tags) return self._pagination(url, 'class="thumb">
    ')) if not tag_container: return tags = collections.defaultdict(list) pattern = re.compile( r"tag-type-([^\"' ]+).*?[?;]tags=([^\"'&]+)", re.S) for tag_type, tag_name in pattern.findall(tag_container): tags[tag_type].append(text.unquote(tag_name)) for key, value in tags.items(): post["tags_" + key] = " ".join(value) def _notes(self, post, page): note_container = text.extr(page, 'id="note-container"', "", ""))), }) def _file_url_realbooru(self, post): url = post["file_url"] md5 = post["md5"] if md5 not in post["preview_url"] or url.count("/") == 5: url = "{}/images/{}/{}/{}.{}".format( self.root, md5[0:2], md5[2:4], md5, url.rpartition(".")[2]) return url def _tags_realbooru(self, post, page): tag_container = text.extr(page, 'id="tagLink"', '') tags = collections.defaultdict(list) pattern = re.compile( r'Pool: ", "") if not name: raise exception.NotFoundError("pool") self.post_ids = text.extract_iter( page, 'class="thumb" id="p', '"', pos) return { "pool": text.parse_int(self.pool_id), "pool_name": text.unescape(name), } def posts(self): params = {} for params["id"] in util.advance(self.post_ids, self.page_start): for post in self._api_request(params): yield post.attrib def _posts_pages(self): return self._pagination_html({ "page": "pool", "s" : "show", "id" : self.pool_id, }) class GelbooruV02FavoriteExtractor(GelbooruV02Extractor): subcategory = "favorite" directory_fmt = ("{category}", "favorites", "{favorite_id}") archive_fmt = "f_{favorite_id}_{id}" per_page = 50 pattern = BASE_PATTERN + r"/index\.php\?page=favorites&s=view&id=(\d+)" example = "https://safebooru.org/index.php?page=favorites&s=view&id=12345" def __init__(self, match): GelbooruV02Extractor.__init__(self, match) self.favorite_id = match.group(match.lastindex) def metadata(self): return {"favorite_id": text.parse_int(self.favorite_id)} def posts(self): return self._pagination_html({ "page": "favorites", "s" : "view", "id" : self.favorite_id, }) class GelbooruV02PostExtractor(GelbooruV02Extractor): subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/index\.php\?page=post&s=view&id=(\d+)" example = "https://safebooru.org/index.php?page=post&s=view&id=12345" def __init__(self, match): GelbooruV02Extractor.__init__(self, match) self.post_id = match.group(match.lastindex) def posts(self): return self._pagination({"id": self.post_id}) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/generic.py0000644000175000017500000001765614571514301020562 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Generic information extractor""" from .common import Extractor, Message from .. import config, text import os.path import re class GenericExtractor(Extractor): """Extractor for images in a generic web page.""" category = "generic" directory_fmt = ("{category}", "{pageurl}") archive_fmt = "{imageurl}" # By default, the generic extractor is disabled # and the "g(eneric):" prefix in url is required. # If the extractor is enabled, make the prefix optional pattern = r"(?i)(?Pg(?:eneric)?:)" if config.get(("extractor", "generic"), "enabled"): pattern += r"?" # The generic extractor pattern should match (almost) any valid url # Based on: https://tools.ietf.org/html/rfc3986#appendix-B pattern += ( r"(?Phttps?://)?" 
# optional http(s) scheme r"(?P[-\w\.]+)" # required domain r"(?P/[^?#]*)?" # optional path r"(?:\?(?P[^#]*))?" # optional query r"(?:\#(?P.*))?" # optional fragment ) example = "generic:https://www.nongnu.org/lzip/" def __init__(self, match): Extractor.__init__(self, match) # Strip the "g(eneric):" prefix # and inform about "forced" or "fallback" mode if match.group('generic'): self.url = match.group(0).partition(":")[2] else: self.log.info("Falling back on generic information extractor.") self.url = match.group(0) # Make sure we have a scheme, or use https if match.group('scheme'): self.scheme = match.group('scheme') else: self.scheme = 'https://' self.url = self.scheme + self.url # Used to resolve relative image urls self.root = self.scheme + match.group('domain') def items(self): """Get page, extract metadata & images, yield them in suitable messages Adapted from common.GalleryExtractor.items() """ page = self.request(self.url).text data = self.metadata(page) imgs = self.images(page) try: data["count"] = len(imgs) except TypeError: pass images = enumerate(imgs, 1) yield Message.Directory, data for data["num"], (url, imgdata) in images: if imgdata: data.update(imgdata) if "extension" not in imgdata: text.nameext_from_url(url, data) else: text.nameext_from_url(url, data) yield Message.Url, url, data def metadata(self, page): """Extract generic webpage metadata, return them in a dict.""" data = {} data['pageurl'] = self.url data['title'] = text.extr(page, '', "") data['description'] = text.extr( page, ']*" # , ') if "page-archive" in attributes: yield from self._handle_partial_articles(extr) else: yield from self._handle_full_articles(extr) match = self._find_pager_url(page) url = text.unescape(match.group(1)) if match else None query = None def _handle_partial_articles(self, extr): while True: section = extr('
    ') if not attributes: break if "no-entry" in attributes: continue article = extr('', '') yield from self._handle_article(article) def _acceptable_query(self, key): return key == "page" or key in self.allowed_parameters class HatenablogEntryExtractor(HatenablogExtractor): """Extractor for a single entry URL""" subcategory = "entry" pattern = BASE_PATTERN + r"/entry/([^?#]+)" + QUERY_RE example = "https://BLOG.hatenablog.com/entry/PATH" def __init__(self, match): HatenablogExtractor.__init__(self, match) self.path = match.group(3) def items(self): url = "https://" + self.domain + "/entry/" + self.path page = self.request(url).text extr = text.extract_from(page) while True: attributes = extr('
    ') if "no-entry" in attributes: continue article = extr('', '
    ') return self._handle_article(article) class HatenablogHomeExtractor(HatenablogEntriesExtractor): """Extractor for a blog's home page""" subcategory = "home" pattern = BASE_PATTERN + r"(/?)" + QUERY_RE example = "https://BLOG.hatenablog.com" class HatenablogArchiveExtractor(HatenablogEntriesExtractor): """Extractor for a blog's archive page""" subcategory = "archive" pattern = (BASE_PATTERN + r"(/archive(?:/\d+(?:/\d+(?:/\d+)?)?" r"|/category/[^?#]+)?)" + QUERY_RE) example = "https://BLOG.hatenablog.com/archive/2024" class HatenablogSearchExtractor(HatenablogEntriesExtractor): """Extractor for a blog's search results""" subcategory = "search" pattern = BASE_PATTERN + r"(/search)" + QUERY_RE example = "https://BLOG.hatenablog.com/search?q=QUERY" allowed_parameters = ("q",) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/hentai2read.py0000644000175000017500000000704714571514301021325 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentai2read.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text, util import re class Hentai2readBase(): """Base class for hentai2read extractors""" category = "hentai2read" root = "https://hentai2read.com" class Hentai2readChapterExtractor(Hentai2readBase, ChapterExtractor): """Extractor for a single manga chapter from hentai2read.com""" archive_fmt = "{chapter_id}_{page}" pattern = r"(?:https?://)?(?:www\.)?hentai2read\.com(/[^/?#]+/([^/?#]+))" example = "https://hentai2read.com/TITLE/1/" def __init__(self, match): self.chapter = match.group(2) ChapterExtractor.__init__(self, match) def metadata(self, page): title, pos = text.extract(page, "", "") manga_id, pos = text.extract(page, 'data-mid="', '"', pos) chapter_id, pos = text.extract(page, 'data-cid="', '"', pos) chapter, sep, minor = self.chapter.partition(".") match = re.match(r"Reading (.+) \(([^)]+)\) Hentai(?: by (.+))? - " r"([^:]+): (.+) . 
Page 1 ", title) return { "manga": match.group(1), "manga_id": text.parse_int(manga_id), "chapter": text.parse_int(chapter), "chapter_minor": sep + minor, "chapter_id": text.parse_int(chapter_id), "type": match.group(2), "author": match.group(3), "title": match.group(5), "lang": "en", "language": "English", } def images(self, page): images = text.extract(page, "'images' : ", ",\n")[0] return [ ("https://hentaicdn.com/hentai" + part, None) for part in util.json_loads(images) ] class Hentai2readMangaExtractor(Hentai2readBase, MangaExtractor): """Extractor for hmanga from hentai2read.com""" chapterclass = Hentai2readChapterExtractor pattern = r"(?:https?://)?(?:www\.)?hentai2read\.com(/[^/?#]+)/?$" example = "https://hentai2read.com/TITLE/" def chapters(self, page): results = [] pos = page.find('itemscope itemtype="http://schema.org/Book') + 1 manga, pos = text.extract( page, '', '', pos) mtype, pos = text.extract( page, '[', ']', pos) manga_id = text.parse_int(text.extract( page, 'data-mid="', '"', pos)[0]) while True: chapter_id, pos = text.extract(page, ' data-cid="', '"', pos) if not chapter_id: return results _ , pos = text.extract(page, ' href="', '"', pos) url, pos = text.extract(page, ' href="', '"', pos) chapter, pos = text.extract(page, '>', '<', pos) chapter, _, title = text.unescape(chapter).strip().partition(" - ") chapter, sep, minor = chapter.partition(".") results.append((url, { "manga": manga, "manga_id": manga_id, "chapter": text.parse_int(chapter), "chapter_minor": sep + minor, "chapter_id": text.parse_int(chapter_id), "type": mtype, "title": title, "lang": "en", "language": "English", })) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/hentaicosplays.py0000644000175000017500000000325714571514301022164 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentai-cosplays.com/ (also works for hentai-img.com and porn-images-xxx.com)""" from .common import GalleryExtractor from .. import text class HentaicosplaysGalleryExtractor(GalleryExtractor): """Extractor for image galleries from hentai-cosplays.com, hentai-img.com, and porn-images-xxx.com""" category = "hentaicosplays" directory_fmt = ("{site}", "{title}") filename_fmt = "{filename}.{extension}" archive_fmt = "{title}_{filename}" pattern = r"((?:https?://)?(?:\w{2}\.)?" \ r"(hentai-cosplays|hentai-img|porn-images-xxx)\.com)/" \ r"(?:image|story)/([\w-]+)" example = "https://hentai-cosplays.com/image/TITLE/" def __init__(self, match): root, self.site, self.slug = match.groups() self.root = text.ensure_http_scheme(root) url = "{}/story/{}/".format(self.root, self.slug) GalleryExtractor.__init__(self, match, url) def _init(self): self.session.headers["Referer"] = self.gallery_url def metadata(self, page): title = text.extr(page, "", "") return { "title": text.unescape(title.rpartition(" Story Viewer - ")[0]), "slug" : self.slug, "site" : self.site, } def images(self, page): return [ (url.replace("http:", "https:", 1), None) for url in text.extract_iter( page, '', '<')), "artist" : text.unescape(extr('/profile">', '<')), "_body" : extr( '
    Description
    ', '
    ') .replace("\r\n", "\n"), "", "")), "ratings" : [text.unescape(r) for r in text.extract_iter(extr( "class='ratings_box'", ""), "title='", "'")], "date" : text.parse_datetime(extr("datetime='", "'")), "views" : text.parse_int(extr(">Views", "<")), "score" : text.parse_int(extr(">Vote Score", "<")), "media" : text.unescape(extr(">Media", "<").strip()), "tags" : text.split_html(extr( ">Tags ", "")), } body = data["_body"] if "", "").rpartition(">")[2]), "author" : text.unescape(extr('alt="', '"')), "date" : text.parse_datetime(extr( ">Updated<", "").rpartition(">")[2], "%B %d, %Y"), "status" : extr("class='indent'>", "<"), } for c in ("Chapters", "Words", "Comments", "Views", "Rating"): data[c.lower()] = text.parse_int(extr( ">" + c + ":", "<").replace(",", "")) data["description"] = text.unescape(extr( "class='storyDescript'>", ""), "title='", "'")] return text.nameext_from_url(data["src"], data) def _request_check(self, url, **kwargs): self.request = self._request_original # check for Enter button / front page # and update PHPSESSID and content filters if necessary response = self.request(url, **kwargs) content = response.content if len(content) < 5000 and \ b'
    ', '') class HentaifoundryStoryExtractor(HentaifoundryExtractor): """Extractor for a hentaifoundry story""" subcategory = "story" archive_fmt = "s_{index}" pattern = BASE_PATTERN + r"/stories/user/([^/?#]+)/(\d+)" example = "https://www.hentai-foundry.com/stories/user/USER/12345/TITLE" skip = Extractor.skip def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.index = match.group(3) def items(self): story_url = "{}/stories/user/{}/{}/x?enterAgree=1".format( self.root, self.user, self.index) story = self._parse_story(self.request(story_url).text) yield Message.Directory, story yield Message.Url, story["src"], story ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/hentaifox.py0000644000175000017500000000775514571514301021132 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentaifox.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util class HentaifoxBase(): """Base class for hentaifox extractors""" category = "hentaifox" root = "https://hentaifox.com" class HentaifoxGalleryExtractor(HentaifoxBase, GalleryExtractor): """Extractor for image galleries on hentaifox.com""" pattern = r"(?:https?://)?(?:www\.)?hentaifox\.com(/gallery/(\d+))" example = "https://hentaifox.com/gallery/12345/" def __init__(self, match): GalleryExtractor.__init__(self, match) self.gallery_id = match.group(2) @staticmethod def _split(txt): return [ text.remove_html(tag.partition(">")[2], "", "") for tag in text.extract_iter( txt, "class='tag_btn", "') yield { "url" : text.urljoin(self.root, url), "gallery_id": text.parse_int( url.strip("/").rpartition("/")[2]), "title" : text.unescape(title), "_extractor": HentaifoxGalleryExtractor, } pos = page.find(">Next<") url = text.rextract(page, "href=", ">", pos)[0] if pos == -1 or "/pag" not in url: return num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/hentaihand.py0000644000175000017500000000643414571514301021241 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentaihand.com/""" from .common import GalleryExtractor, Extractor, Message from .. 
import text, util class HentaihandGalleryExtractor(GalleryExtractor): """Extractor for image galleries on hentaihand.com""" category = "hentaihand" root = "https://hentaihand.com" pattern = r"(?:https?://)?(?:www\.)?hentaihand\.com/\w+/comic/([\w-]+)" example = "https://hentaihand.com/en/comic/TITLE" def __init__(self, match): self.slug = match.group(1) url = "{}/api/comics/{}".format(self.root, self.slug) GalleryExtractor.__init__(self, match, url) def metadata(self, page): info = util.json_loads(page) data = { "gallery_id" : text.parse_int(info["id"]), "title" : info["title"], "title_alt" : info["alternative_title"], "slug" : self.slug, "type" : info["category"]["name"], "language" : info["language"]["name"], "lang" : util.language_to_code(info["language"]["name"]), "tags" : [t["slug"] for t in info["tags"]], "date" : text.parse_datetime( info["uploaded_at"], "%Y-%m-%d"), } for key in ("artists", "authors", "groups", "characters", "relationships", "parodies"): data[key] = [v["name"] for v in info[key]] return data def images(self, _): info = self.request(self.gallery_url + "/images").json() return [(img["source_url"], img) for img in info["images"]] class HentaihandTagExtractor(Extractor): """Extractor for tag searches on hentaihand.com""" category = "hentaihand" subcategory = "tag" root = "https://hentaihand.com" pattern = (r"(?i)(?:https?://)?(?:www\.)?hentaihand\.com" r"/\w+/(parody|character|tag|artist|group|language" r"|category|relationship)/([^/?#]+)") example = "https://hentaihand.com/en/tag/TAG" def __init__(self, match): Extractor.__init__(self, match) self.type, self.key = match.groups() def items(self): if self.type[-1] == "y": tpl = self.type[:-1] + "ies" else: tpl = self.type + "s" url = "{}/api/{}/{}".format(self.root, tpl, self.key) tid = self.request(url, notfound=self.type).json()["id"] url = self.root + "/api/comics" params = { "per_page": "18", tpl : tid, "page" : 1, "q" : "", "sort" : "uploaded_at", "order" : "desc", "duration": "day", } while True: info = self.request(url, params=params).json() for gallery in info["data"]: gurl = "{}/en/comic/{}".format(self.root, gallery["slug"]) gallery["_extractor"] = HentaihandGalleryExtractor yield Message.Queue, gurl, gallery if params["page"] >= info["last_page"]: return params["page"] += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/hentaihere.py0000644000175000017500000000714514571514301021252 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentaihere.com/""" from .common import ChapterExtractor, MangaExtractor from .. 
import text, util import re class HentaihereBase(): """Base class for hentaihere extractors""" category = "hentaihere" root = "https://hentaihere.com" class HentaihereChapterExtractor(HentaihereBase, ChapterExtractor): """Extractor for a single manga chapter from hentaihere.com""" archive_fmt = "{chapter_id}_{page}" pattern = r"(?:https?://)?(?:www\.)?hentaihere\.com/m/S(\d+)/([^/?#]+)" example = "https://hentaihere.com/m/S12345/1/1/" def __init__(self, match): self.manga_id, self.chapter = match.groups() url = "{}/m/S{}/{}/1".format(self.root, self.manga_id, self.chapter) ChapterExtractor.__init__(self, match, url) def metadata(self, page): title = text.extr(page, "", "") chapter_id = text.extr(page, 'report/C', '"') chapter, sep, minor = self.chapter.partition(".") pattern = r"Page 1 \| (.+) \(([^)]+)\) - Chapter \d+: (.+) by (.+) at " match = re.match(pattern, title) return { "manga": match.group(1), "manga_id": text.parse_int(self.manga_id), "chapter": text.parse_int(chapter), "chapter_minor": sep + minor, "chapter_id": text.parse_int(chapter_id), "type": match.group(2), "title": match.group(3), "author": match.group(4), "lang": "en", "language": "English", } @staticmethod def images(page): images = text.extr(page, "var rff_imageList = ", ";") return [ ("https://hentaicdn.com/hentai" + part, None) for part in util.json_loads(images) ] class HentaihereMangaExtractor(HentaihereBase, MangaExtractor): """Extractor for hmanga from hentaihere.com""" chapterclass = HentaihereChapterExtractor pattern = r"(?:https?://)?(?:www\.)?hentaihere\.com(/m/S\d+)/?$" example = "https://hentaihere.com/m/S12345" def chapters(self, page): results = [] pos = page.find('itemscope itemtype="http://schema.org/Book') + 1 manga, pos = text.extract( page, '', '', pos) mtype, pos = text.extract( page, '[', ']', pos) manga_id = text.parse_int( self.manga_url.rstrip("/").rpartition("/")[2][1:]) while True: marker, pos = text.extract( page, '
  • ', '', pos) if marker is None: return results url, pos = text.extract(page, '\n', '<', pos) chapter_id, pos = text.extract(page, '/C', '"', pos) chapter, _, title = text.unescape(chapter).strip().partition(" - ") chapter, sep, minor = chapter.partition(".") results.append((url, { "manga_id": manga_id, "manga": manga, "chapter": text.parse_int(chapter), "chapter_minor": sep + minor, "chapter_id": text.parse_int(chapter_id), "type": mtype, "title": title, "lang": "en", "language": "English", })) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1710779776.0 gallery_dl-1.26.9/gallery_dl/extractor/hiperdex.py0000644000175000017500000001174314576066600020756 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hiperdex.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text from ..cache import memcache import re BASE_PATTERN = (r"((?:https?://)?(?:www\.)?" r"(?:1st)?hiperdex\d?\.(?:com|net|info))") class HiperdexBase(): """Base class for hiperdex extractors""" category = "hiperdex" root = "https://hiperdex.com" @memcache(keyarg=1) def manga_data(self, manga, page=None): if not page: url = "{}/mangas/{}/".format(self.root, manga) page = self.request(url).text extr = text.extract_from(page) return { "url" : text.unescape(extr( 'property="og:url" content="', '"')), "manga" : text.unescape(extr( ' property="name" title="', '"')), "score" : text.parse_float(extr( 'id="averagerate">', '<')), "author" : text.remove_html(extr( 'class="author-content">', '
  • ')), "artist" : text.remove_html(extr( 'class="artist-content">', '')), "genre" : text.split_html(extr( 'class="genres-content">', ''))[::2], "type" : extr( 'class="summary-content">', '<').strip(), "release": text.parse_int(text.remove_html(extr( 'class="summary-content">', ''))), "status" : extr( 'class="summary-content">', '<').strip(), "description": text.remove_html(text.unescape(extr( 'class="description-summary">', ''))), "language": "English", "lang" : "en", } def chapter_data(self, chapter): if chapter.startswith("chapter-"): chapter = chapter[8:] chapter, _, minor = chapter.partition("-") data = { "chapter" : text.parse_int(chapter), "chapter_minor": "." + minor if minor and minor != "end" else "", } data.update(self.manga_data(self.manga.lower())) return data class HiperdexChapterExtractor(HiperdexBase, ChapterExtractor): """Extractor for manga chapters from hiperdex.com""" pattern = BASE_PATTERN + r"(/mangas?/([^/?#]+)/([^/?#]+))" example = "https://hiperdex.com/mangas/MANGA/CHAPTER/" def __init__(self, match): root, path, self.manga, self.chapter = match.groups() self.root = text.ensure_http_scheme(root) ChapterExtractor.__init__(self, match, self.root + path + "/") def metadata(self, _): return self.chapter_data(self.chapter) def images(self, page): return [ (url.strip(), None) for url in re.findall( r'id="image-\d+"\s+(?:data-)?src="([^"]+)', page) ] class HiperdexMangaExtractor(HiperdexBase, MangaExtractor): """Extractor for manga from hiperdex.com""" chapterclass = HiperdexChapterExtractor pattern = BASE_PATTERN + r"(/mangas?/([^/?#]+))/?$" example = "https://hiperdex.com/mangas/MANGA/" def __init__(self, match): root, path, self.manga = match.groups() self.root = text.ensure_http_scheme(root) MangaExtractor.__init__(self, match, self.root + path + "/") def chapters(self, page): data = self.manga_data(self.manga, page) self.manga_url = url = data["url"] url = self.manga_url + "ajax/chapters/" headers = { "Accept": "*/*", "X-Requested-With": "XMLHttpRequest", "Origin": self.root, "Referer": "https://" + text.quote(self.manga_url[8:]), } html = self.request(url, method="POST", headers=headers).text results = [] for item in text.extract_iter( html, '
  • = total: return @memcache(maxage=1800) def _parse_gg(extr): page = extr.request("https://ltn.hitomi.la/gg.js").text m = {} keys = [] for match in re.finditer( r"case\s+(\d+):(?:\s*o\s*=\s*(\d+))?", page): key, value = match.groups() keys.append(int(key)) if value: value = int(value) for key in keys: m[key] = value keys.clear() for match in re.finditer( r"if\s+\(g\s*===?\s*(\d+)\)[\s{]*o\s*=\s*(\d+)", page): m[int(match.group(1))] = int(match.group(2)) d = re.search(r"(?:var\s|default:)\s*o\s*=\s*(\d+)", page) b = re.search(r"b:\s*[\"'](.+)[\"']", page) return m, b.group(1).strip("/"), int(d.group(1)) if d else 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/hotleak.py0000644000175000017500000001353514571514301020565 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hotleak.vip/""" from .common import Extractor, Message from .. import text, exception import binascii BASE_PATTERN = r"(?:https?://)?(?:www\.)?hotleak\.vip" class HotleakExtractor(Extractor): """Base class for hotleak extractors""" category = "hotleak" directory_fmt = ("{category}", "{creator}",) filename_fmt = "{creator}_{id}.{extension}" archive_fmt = "{type}_{creator}_{id}" root = "https://hotleak.vip" def items(self): for post in self.posts(): yield Message.Directory, post yield Message.Url, post["url"], post def posts(self): """Return an iterable containing relevant posts""" return () def _pagination(self, url, params): params = text.parse_query(params) params["page"] = text.parse_int(params.get("page"), 1) while True: page = self.request(url, params=params).text if "" not in page: return for item in text.extract_iter( page, '
    ', '
    ') data = { "id" : text.parse_int(self.id), "creator": self.creator, "type" : self.type, } if self.type == "photo": data["url"] = text.extr(page, 'data-src="', '"') text.nameext_from_url(data["url"], data) elif self.type == "video": data["url"] = "ytdl:" + decode_video_url(text.extr( text.unescape(page), '"src":"', '"')) text.nameext_from_url(data["url"], data) data["extension"] = "mp4" return (data,) class HotleakCreatorExtractor(HotleakExtractor): """Extractor for all posts from a hotleak creator""" subcategory = "creator" pattern = (BASE_PATTERN + r"/(?!(?:hot|creators|videos|photos)(?:$|/))" r"([^/?#]+)/?$") example = "https://hotleak.vip/MODEL" def __init__(self, match): HotleakExtractor.__init__(self, match) self.creator = match.group(1) def posts(self): url = "{}/{}".format(self.root, self.creator) return self._pagination(url) def _pagination(self, url): headers = {"X-Requested-With": "XMLHttpRequest"} params = {"page": 1} while True: try: response = self.request( url, headers=headers, params=params, notfound="creator") except exception.HttpError as exc: if exc.response.status_code == 429: self.wait( until=exc.response.headers.get("X-RateLimit-Reset")) continue raise posts = response.json() if not posts: return data = {"creator": self.creator} for post in posts: data["id"] = text.parse_int(post["id"]) if post["type"] == 0: data["type"] = "photo" data["url"] = self.root + "/storage/" + post["image"] text.nameext_from_url(data["url"], data) elif post["type"] == 1: data["type"] = "video" data["url"] = "ytdl:" + decode_video_url( post["stream_url_play"]) text.nameext_from_url(data["url"], data) data["extension"] = "mp4" yield data params["page"] += 1 class HotleakCategoryExtractor(HotleakExtractor): """Extractor for hotleak categories""" subcategory = "category" pattern = BASE_PATTERN + r"/(hot|creators|videos|photos)(?:/?\?([^#]+))?" example = "https://hotleak.vip/photos" def __init__(self, match): HotleakExtractor.__init__(self, match) self._category, self.params = match.groups() def items(self): url = "{}/{}".format(self.root, self._category) if self._category in ("hot", "creators"): data = {"_extractor": HotleakCreatorExtractor} elif self._category in ("videos", "photos"): data = {"_extractor": HotleakPostExtractor} for item in self._pagination(url, self.params): yield Message.Queue, item, data class HotleakSearchExtractor(HotleakExtractor): """Extractor for hotleak search results""" subcategory = "search" pattern = BASE_PATTERN + r"/search(?:/?\?([^#]+))" example = "https://hotleak.vip/search?search=QUERY" def __init__(self, match): HotleakExtractor.__init__(self, match) self.params = match.group(1) def items(self): data = {"_extractor": HotleakCreatorExtractor} for creator in self._pagination(self.root + "/search", self.params): yield Message.Queue, creator, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1711067806.0 gallery_dl-1.26.9/gallery_dl/extractor/idolcomplex.py0000644000175000017500000002173314577151236021466 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2018-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://idol.sankakucomplex.com/""" from .sankaku import SankakuExtractor from .common import Message from ..cache import cache from .. 
import text, util, exception import collections import re BASE_PATTERN = r"(?:https?://)?idol\.sankakucomplex\.com(?:/[a-z]{2})?" class IdolcomplexExtractor(SankakuExtractor): """Base class for idolcomplex extractors""" category = "idolcomplex" root = "https://idol.sankakucomplex.com" cookies_domain = "idol.sankakucomplex.com" cookies_names = ("_idolcomplex_session",) referer = False request_interval = (3.0, 6.0) def __init__(self, match): SankakuExtractor.__init__(self, match) self.logged_in = True self.start_page = 1 self.start_post = 0 def _init(self): self.find_pids = re.compile( r" href=[\"#]/\w\w/posts/(\w+)" ).findall self.find_tags = re.compile( r'tag-type-([^"]+)">\s*
    ]*?href="/[^?]*\?tags=([^"]+)' ).findall def items(self): self.login() data = self.metadata() for post_id in util.advance(self.post_ids(), self.start_post): post = self._extract_post(post_id) url = post["file_url"] post.update(data) text.nameext_from_url(url, post) yield Message.Directory, post yield Message.Url, url, post def skip(self, num): self.start_post += num return num def post_ids(self): """Return an iterable containing all relevant post ids""" def login(self): if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: return self.cookies_update(self._login_impl(username, password)) self.logged_in = False @cache(maxage=90*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/users/login" page = self.request(url).text headers = { "Referer": url, } url = self.root + (text.extr(page, '
    ") vcnt = extr('>Votes:', "<") pid = extr(">Post ID:", "<") created = extr(' title="', '"') file_url = extr('>Original:', 'id=') if file_url: file_url = extr(' href="', '"') width = extr(">", "x") height = extr("", " ") else: width = extr('') file_url = extr('Rating:", "') for tag_type, tag_name in self.find_tags(tags_html or ""): tags[tag_type].append(text.unquote(tag_name)) for key, value in tags.items(): data["tags_" + key] = " ".join(value) tags_list += value data["tags"] = " ".join(tags_list) return data class IdolcomplexTagExtractor(IdolcomplexExtractor): """Extractor for images from idol.sankakucomplex.com by search-tags""" subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/(?:posts/?)?\?([^#]*)" example = "https://idol.sankakucomplex.com/en/posts?tags=TAGS" per_page = 20 def __init__(self, match): IdolcomplexExtractor.__init__(self, match) query = text.parse_query(match.group(1)) self.tags = text.unquote(query.get("tags", "").replace("+", " ")) self.start_page = text.parse_int(query.get("page"), 1) self.next = text.parse_int(query.get("next"), 0) def skip(self, num): if self.next: self.start_post += num else: pages, posts = divmod(num, self.per_page) self.start_page += pages self.start_post += posts return num def metadata(self): if not self.next: max_page = 50 if self.logged_in else 25 if self.start_page > max_page: self.log.info("Traversing from page %d to page %d", max_page, self.start_page) self.start_post += self.per_page * (self.start_page - max_page) self.start_page = max_page tags = self.tags.split() if not self.logged_in and len(tags) > 4: raise exception.StopExtraction( "Non-members can only search up to 4 tags at once") return {"search_tags": " ".join(tags)} def post_ids(self): params = {"tags": self.tags} if self.next: params["next"] = self.next else: params["page"] = self.start_page while True: page = self.request(self.root, params=params, retries=10).text pos = ((page.find('id="more-popular-posts-link"') + 1) or (page.find('', '<').strip())} def images(self, page): findall = re.compile(r'' in page: url = "{}/p/{}/loadAll".format(self.root, self.gallery_id) headers = { "X-Requested-With": "XMLHttpRequest", "Origin" : self.root, "Referer" : self.gallery_url, } csrf_token = text.extr(page, 'name="csrf-token" content="', '"') data = {"_token": csrf_token} page += self.request( url, method="POST", headers=headers, data=data).text return [ (url, None) for url in text.extract_iter(page, 'data-url="', '"') ] def _metadata_api(self, page): post = self.api.post(self.gallery_id) post["date"] = text.parse_datetime( post["created"], "%Y-%m-%dT%H:%M:%S.%fZ") for img in post["images"]: img["date"] = text.parse_datetime( img["created"], "%Y-%m-%dT%H:%M:%S.%fZ") post["gallery_id"] = self.gallery_id post.pop("image_count", None) self._image_list = post.pop("images") return post def _images_api(self, page): return [ (img["link"], img) for img in self._image_list ] class ImagechestUserExtractor(Extractor): """Extractor for imgchest.com user profiles""" category = "imagechest" subcategory = "user" root = "https://imgchest.com" pattern = BASE_PATTERN + r"/u/([^/?#]+)" example = "https://imgchest.com/u/USER" def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) def items(self): url = self.root + "/api/posts" params = { "page" : 1, "sort" : "new", "tag" : "", "q" : "", "username": text.unquote(self.user), "nsfw" : "true", } while True: try: data = self.request(url, 
params=params).json()["data"] except (TypeError, KeyError): return for gallery in data: gallery["_extractor"] = ImagechestGalleryExtractor yield Message.Queue, gallery["link"], gallery params["page"] += 1 class ImagechestAPI(): """Interface for the Image Chest API https://imgchest.com/docs/api/1.0/general/overview """ root = "https://api.imgchest.com" def __init__(self, extractor, access_token): self.extractor = extractor self.headers = {"Authorization": "Bearer " + access_token} def file(self, file_id): endpoint = "/v1/file/" + file_id return self._call(endpoint) def post(self, post_id): endpoint = "/v1/post/" + post_id return self._call(endpoint) def user(self, username): endpoint = "/v1/user/" + username return self._call(endpoint) def _call(self, endpoint): url = self.root + endpoint while True: response = self.extractor.request( url, headers=self.headers, fatal=None, allow_redirects=False) if response.status_code < 300: return response.json()["data"] elif response.status_code < 400: raise exception.AuthenticationError("Invalid API access token") elif response.status_code == 429: self.extractor.wait(seconds=600) else: self.extractor.log.debug(response.text) raise exception.StopExtraction("API request failed") ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1710543112.0 gallery_dl-1.26.9/gallery_dl/extractor/imagefap.py0000644000175000017500000002101214575150410020675 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.imagefap.com/""" from .common import Extractor, Message from .. import text, util, exception BASE_PATTERN = r"(?:https?://)?(?:www\.|beta\.)?imagefap\.com" class ImagefapExtractor(Extractor): """Base class for imagefap extractors""" category = "imagefap" root = "https://www.imagefap.com" directory_fmt = ("{category}", "{gallery_id} {title}") filename_fmt = "{category}_{gallery_id}_{filename}.{extension}" archive_fmt = "{gallery_id}_{image_id}" request_interval = (2.0, 4.0) def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if response.history and response.url.endswith("/human-verification"): msg = text.extr(response.text, '
    ")[2].split()) raise exception.StopExtraction("'%s'", msg) self.log.warning("HTTP redirect to %s", response.url) return response class ImagefapGalleryExtractor(ImagefapExtractor): """Extractor for image galleries from imagefap.com""" subcategory = "gallery" pattern = BASE_PATTERN + r"/(?:gallery\.php\?gid=|gallery/|pictures/)(\d+)" example = "https://www.imagefap.com/gallery/12345" def __init__(self, match): ImagefapExtractor.__init__(self, match) self.gid = match.group(1) self.image_id = "" def items(self): url = "{}/gallery/{}".format(self.root, self.gid) page = self.request(url).text data = self.get_job_metadata(page) yield Message.Directory, data for url, image in self.get_images(): data.update(image) yield Message.Url, url, data def get_job_metadata(self, page): """Collect metadata for extractor-job""" extr = text.extract_from(page) data = { "gallery_id": text.parse_int(self.gid), "uploader": extr("porn picture gallery by ", " to see hottest"), "title": text.unescape(extr("", "<")), "description": text.unescape(extr( 'id="gdesc_text"', '<').partition(">")[2]), "categories": text.split_html(extr( 'id="cnt_cats"', '</div>'))[1::2], "tags": text.split_html(extr( 'id="cnt_tags"', '</div>'))[1::2], "count": text.parse_int(extr(' 1 of ', ' pics"')), } self.image_id = extr('id="img_ed_', '"') self._count = data["count"] return data def get_images(self): """Collect image-urls and -metadata""" url = "{}/photo/{}/".format(self.root, self.image_id) params = {"gid": self.gid, "idx": 0, "partial": "true"} headers = { "Content-Type": "application/x-www-form-urlencoded", "X-Requested-With": "XMLHttpRequest", "Referer": "{}?pgid=&gid={}&page=0".format(url, self.image_id) } num = 0 total = self._count while True: page = self.request(url, params=params, headers=headers).text cnt = 0 for image_url in text.extract_iter(page, '<a href="', '"'): num += 1 cnt += 1 data = text.nameext_from_url(image_url) data["num"] = num data["image_id"] = text.parse_int(data["filename"]) yield image_url, data if not cnt or cnt < 24 and num >= total: return params["idx"] += cnt class ImagefapImageExtractor(ImagefapExtractor): """Extractor for single images from imagefap.com""" subcategory = "image" pattern = BASE_PATTERN + r"/photo/(\d+)" example = "https://www.imagefap.com/photo/12345" def __init__(self, match): ImagefapExtractor.__init__(self, match) self.image_id = match.group(1) def items(self): url, data = self.get_image() yield Message.Directory, data yield Message.Url, url, data def get_image(self): url = "{}/photo/{}/".format(self.root, self.image_id) page = self.request(url).text url, pos = text.extract( page, 'original="', '"') info, pos = text.extract( page, '<script type="application/ld+json">', '</script>', pos) image_id, pos = text.extract( page, 'id="imageid_input" value="', '"', pos) gallery_id, pos = text.extract( page, 'id="galleryid_input" value="', '"', pos) info = util.json_loads(info) return url, text.nameext_from_url(url, { "title": text.unescape(info["name"]), "uploader": info["author"], "date": info["datePublished"], "width": text.parse_int(info["width"]), "height": text.parse_int(info["height"]), "gallery_id": text.parse_int(gallery_id), "image_id": text.parse_int(image_id), }) class ImagefapFolderExtractor(ImagefapExtractor): """Extractor for imagefap user folders""" subcategory = "folder" pattern = (BASE_PATTERN + r"/(?:organizer/|" r"(?:usergallery\.php\?user(id)?=([^&#]+)&" r"|profile/([^/?#]+)/galleries\?)folderid=)(\d+|-1)") example = "https://www.imagefap.com/organizer/12345" def 
__init__(self, match): ImagefapExtractor.__init__(self, match) self._id, user, profile, self.folder_id = match.groups() self.user = user or profile def items(self): for gallery_id, name, folder in self.galleries(self.folder_id): url = "{}/gallery/{}".format(self.root, gallery_id) data = { "gallery_id": gallery_id, "title" : text.unescape(name), "folder" : text.unescape(folder), "_extractor": ImagefapGalleryExtractor, } yield Message.Queue, url, data def galleries(self, folder_id): """Yield gallery IDs and titles of a folder""" if folder_id == "-1": folder_name = "Uncategorized" if self._id: url = "{}/usergallery.php?userid={}&folderid=-1".format( self.root, self.user) else: url = "{}/profile/{}/galleries?folderid=-1".format( self.root, self.user) else: folder_name = None url = "{}/organizer/{}/".format(self.root, folder_id) params = {"page": 0} extr = text.extract_from(self.request(url, params=params).text) if not folder_name: folder_name = extr("class'blk_galleries'><b>", "</b>") while True: cnt = 0 while True: gid = extr(' id="gid-', '"') if not gid: break yield gid, extr("<b>", "<"), folder_name cnt += 1 if cnt < 20: break params["page"] += 1 extr = text.extract_from(self.request(url, params=params).text) class ImagefapUserExtractor(ImagefapExtractor): """Extractor for an imagefap user profile""" subcategory = "user" pattern = (BASE_PATTERN + r"/(?:profile(?:\.php\?user=|/)([^/?#]+)(?:/galleries)?" r"|usergallery\.php\?userid=(\d+))(?:$|#)") example = "https://www.imagefap.com/profile/USER" def __init__(self, match): ImagefapExtractor.__init__(self, match) self.user, self.user_id = match.groups() def items(self): data = {"_extractor": ImagefapFolderExtractor} for folder_id in self.folders(): if folder_id == "-1": url = "{}/profile/{}/galleries?folderid=-1".format( self.root, self.user) else: url = "{}/organizer/{}/".format(self.root, folder_id) yield Message.Queue, url, data def folders(self): """Return a list of folder IDs of a user""" if self.user: url = "{}/profile/{}/galleries".format(self.root, self.user) else: url = "{}/usergallery.php?userid={}".format( self.root, self.user_id) response = self.request(url) self.user = response.url.split("/")[-2] folders = text.extr(response.text, ' id="tgl_all" value="', '"') return folders.rstrip("|").split("|") ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709680883.0 
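# ---------------------------------------------------------------------------
# Illustrative sketch (not part of gallery-dl): ImagefapFolderExtractor above
# pages through a folder by incrementing a "page" query parameter and stops
# once a page yields fewer than 20 gallery IDs.  The standalone function
# below shows that stop-on-short-page pattern, assuming the `requests`
# package is available; the organizer URL and the ' id="gid-' marker are
# taken from the extractor, the 20-per-page cutoff mirrors it, and the
# function name is purely hypothetical.
# ---------------------------------------------------------------------------
import re
import requests


def iter_folder_gallery_ids(folder_id, per_page=20):
    """Yield gallery IDs of an imagefap organizer folder, page by page."""
    url = "https://www.imagefap.com/organizer/{}/".format(folder_id)
    page_num = 0
    while True:
        html = requests.get(url, params={"page": page_num}, timeout=30).text
        ids = re.findall(r' id="gid-([^"]+)"', html)
        yield from ids
        if len(ids) < per_page:  # short page -> this was the last page
            return
        page_num += 1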
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/imagehosts.py������������������������������������������������0000644�0001750�0001750�00000026573�14571724363�021322� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2016-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Collection of extractors for various imagehosts""" from .common import Extractor, Message from .. import text, exception from ..cache import memcache from os.path import splitext class ImagehostImageExtractor(Extractor): """Base class for single-image extractors for various imagehosts""" basecategory = "imagehost" subcategory = "image" archive_fmt = "{token}" _https = True _params = None _cookies = None _encoding = None def __init__(self, match): Extractor.__init__(self, match) self.page_url = "http{}://{}".format( "s" if self._https else "", match.group(1)) self.token = match.group(2) if self._params == "simple": self._params = { "imgContinue": "Continue+to+image+...+", } elif self._params == "complex": self._params = { "op": "view", "id": self.token, "pre": "1", "adb": "1", "next": "Continue+to+image+...+", } def items(self): page = self.request( self.page_url, method=("POST" if self._params else "GET"), data=self._params, cookies=self._cookies, encoding=self._encoding, ).text url, filename = self.get_info(page) data = text.nameext_from_url(filename, {"token": self.token}) data.update(self.metadata(page)) if self._https and url.startswith("http:"): url = "https:" + url[5:] yield Message.Directory, data yield Message.Url, url, data def get_info(self, page): """Find image-url and string to get filename from""" def metadata(self, page): """Return additional metadata""" return () class ImxtoImageExtractor(ImagehostImageExtractor): """Extractor for single images from imx.to""" category = "imxto" pattern = (r"(?:https?://)?(?:www\.)?((?:imx\.to|img\.yt)" r"/(?:i/|img-)(\w+)(\.html)?)") example = "https://imx.to/i/ID" _params = "simple" _encoding = "utf-8" def __init__(self, match): ImagehostImageExtractor.__init__(self, match) if "/img-" in self.page_url: self.page_url = self.page_url.replace("img.yt", "imx.to") self.url_ext = True else: self.url_ext = False def get_info(self, page): url, pos = text.extract( page, '<div style="text-align:center;"><a href="', '"') if not url: raise exception.NotFoundError("image") filename, pos = text.extract(page, ' title="', '"', pos) if self.url_ext and filename: filename += splitext(url)[1] return url, filename or url def metadata(self, page): extr = text.extract_from(page, page.index("[ FILESIZE <")) size = extr(">", 
"</span>").replace(" ", "")[:-1] width, _, height = extr(">", " px</span>").partition("x") return { "size" : text.parse_bytes(size), "width" : text.parse_int(width), "height": text.parse_int(height), "hash" : extr(">", "</span>"), } class ImxtoGalleryExtractor(ImagehostImageExtractor): """Extractor for image galleries from imx.to""" category = "imxto" subcategory = "gallery" pattern = r"(?:https?://)?(?:www\.)?(imx\.to/g/([^/?#]+))" example = "https://imx.to/g/ID" def items(self): page = self.request(self.page_url).text title, pos = text.extract(page, '<div class="title', '<') data = { "_extractor": ImxtoImageExtractor, "title": text.unescape(title.partition(">")[2]).strip(), } for url in text.extract_iter(page, "<a href=", " ", pos): yield Message.Queue, url.strip("\"'"), data class AcidimgImageExtractor(ImagehostImageExtractor): """Extractor for single images from acidimg.cc""" category = "acidimg" pattern = r"(?:https?://)?((?:www\.)?acidimg\.cc/img-([a-z0-9]+)\.html)" example = "https://acidimg.cc/img-abc123.html" _params = "simple" _encoding = "utf-8" def get_info(self, page): url, pos = text.extract(page, "<img class='centred' src='", "'") if not url: url, pos = text.extract(page, '<img class="centred" src="', '"') if not url: raise exception.NotFoundError("image") filename, pos = text.extract(page, "alt='", "'", pos) if not filename: filename, pos = text.extract(page, 'alt="', '"', pos) return url, (filename + splitext(url)[1]) if filename else url class ImagevenueImageExtractor(ImagehostImageExtractor): """Extractor for single images from imagevenue.com""" category = "imagevenue" pattern = (r"(?:https?://)?((?:www|img\d+)\.imagevenue\.com" r"/([A-Z0-9]{8,10}|view/.*|img\.php\?.*))") example = "https://www.imagevenue.com/ME123456789" def get_info(self, page): pos = page.index('class="card-body') url, pos = text.extract(page, '<img src="', '"', pos) if url.endswith("/loader.svg"): url, pos = text.extract(page, '<img src="', '"', pos) filename, pos = text.extract(page, 'alt="', '"', pos) return url, text.unescape(filename) class ImagetwistImageExtractor(ImagehostImageExtractor): """Extractor for single images from imagetwist.com""" category = "imagetwist" pattern = (r"(?:https?://)?((?:www\.|phun\.)?" r"image(?:twist|haha)\.com/([a-z0-9]{12}))") example = "https://imagetwist.com/123456abcdef/NAME.EXT" @property @memcache(maxage=3*3600) def _cookies(self): return self.request(self.page_url).cookies def get_info(self, page): url , pos = text.extract(page, '<img src="', '"') filename, pos = text.extract(page, ' alt="', '"', pos) return url, filename class ImagetwistGalleryExtractor(ImagehostImageExtractor): """Extractor for galleries from imagetwist.com""" category = "imagetwist" subcategory = "gallery" pattern = (r"(?:https?://)?((?:www\.|phun\.)?" 
r"image(?:twist|haha)\.com/(p/[^/?#]+/\d+))") example = "https://imagetwist.com/p/USER/12345/NAME" def items(self): data = {"_extractor": ImagetwistImageExtractor} root = self.page_url[:self.page_url.find("/", 8)] page = self.request(self.page_url).text gallery = text.extr(page, 'class="gallerys', "</div") for path in text.extract_iter(gallery, ' href="', '"'): yield Message.Queue, root + path, data class ImgspiceImageExtractor(ImagehostImageExtractor): """Extractor for single images from imgspice.com""" category = "imgspice" pattern = r"(?:https?://)?((?:www\.)?imgspice\.com/([^/?#]+))" example = "https://imgspice.com/ID/NAME.EXT.html" def get_info(self, page): pos = page.find('id="imgpreview"') if pos < 0: raise exception.NotFoundError("image") url , pos = text.extract(page, 'src="', '"', pos) name, pos = text.extract(page, 'alt="', '"', pos) return url, text.unescape(name) class PixhostImageExtractor(ImagehostImageExtractor): """Extractor for single images from pixhost.to""" category = "pixhost" pattern = (r"(?:https?://)?((?:www\.)?pixhost\.(?:to|org)" r"/show/\d+/(\d+)_[^/?#]+)") example = "https://pixhost.to/show/123/12345_NAME.EXT" _cookies = {"pixhostads": "1", "pixhosttest": "1"} def get_info(self, page): url , pos = text.extract(page, "class=\"image-img\" src=\"", "\"") filename, pos = text.extract(page, "alt=\"", "\"", pos) return url, filename class PixhostGalleryExtractor(ImagehostImageExtractor): """Extractor for image galleries from pixhost.to""" category = "pixhost" subcategory = "gallery" pattern = (r"(?:https?://)?((?:www\.)?pixhost\.(?:to|org)" r"/gallery/([^/?#]+))") example = "https://pixhost.to/gallery/ID" def items(self): page = text.extr(self.request( self.page_url).text, 'class="images"', "</div>") data = {"_extractor": PixhostImageExtractor} for url in text.extract_iter(page, '<a href="', '"'): yield Message.Queue, url, data class PostimgImageExtractor(ImagehostImageExtractor): """Extractor for single images from postimages.org""" category = "postimg" pattern = (r"(?:https?://)?((?:www\.)?(?:postim(?:ages|g)|pixxxels)" r"\.(?:cc|org)/(?!gallery/)(?:image/)?([^/?#]+)/?)") example = "https://postimages.org/ID" def get_info(self, page): pos = page.index(' id="download"') url , pos = text.rextract(page, ' href="', '"', pos) filename, pos = text.extract(page, 'class="imagename">', '<', pos) return url, text.unescape(filename) class PostimgGalleryExtractor(ImagehostImageExtractor): """Extractor for images galleries from postimages.org""" category = "postimg" subcategory = "gallery" pattern = (r"(?:https?://)?((?:www\.)?(?:postim(?:ages|g)|pixxxels)" r"\.(?:cc|org)/gallery/([^/?#]+))") example = "https://postimages.org/gallery/ID" def items(self): page = self.request(self.page_url).text data = {"_extractor": PostimgImageExtractor} for url in text.extract_iter(page, ' class="thumb"><a href="', '"'): yield Message.Queue, url, data class TurboimagehostImageExtractor(ImagehostImageExtractor): """Extractor for single images from www.turboimagehost.com""" category = "turboimagehost" pattern = (r"(?:https?://)?((?:www\.)?turboimagehost\.com" r"/p/(\d+)/[^/?#]+\.html)") example = "https://www.turboimagehost.com/p/12345/NAME.EXT.html" def get_info(self, page): url = text.extract(page, 'src="', '"', page.index("<img "))[0] return url, url class ViprImageExtractor(ImagehostImageExtractor): """Extractor for single images from vipr.im""" category = "vipr" pattern = r"(?:https?://)?(vipr\.im/(\w+))" example = "https://vipr.im/abc123.html" def get_info(self, page): url = 
text.extr(page, '<img src="', '"') return url, url class ImgclickImageExtractor(ImagehostImageExtractor): """Extractor for single images from imgclick.net""" category = "imgclick" pattern = r"(?:https?://)?((?:www\.)?imgclick\.net/([^/?#]+))" example = "http://imgclick.net/abc123/NAME.EXT.html" _https = False _params = "complex" def get_info(self, page): url , pos = text.extract(page, '<br><img src="', '"') filename, pos = text.extract(page, 'alt="', '"', pos) return url, filename class FappicImageExtractor(ImagehostImageExtractor): """Extractor for single images from fappic.com""" category = "fappic" pattern = r"(?:https?://)?((?:www\.)?fappic\.com/(\w+)/[^/?#]+)" example = "https://fappic.com/abc123/NAME.EXT" def get_info(self, page): url , pos = text.extract(page, '<a href="#"><img src="', '"') filename, pos = text.extract(page, 'alt="', '"', pos) if filename.startswith("Porn-Picture-"): filename = filename[13:] return url, filename �������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/imgbb.py�����������������������������������������������������0000644�0001750�0001750�00000016767�14571514301�020230� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://imgbb.com/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache class ImgbbExtractor(Extractor): """Base class for imgbb extractors""" category = "imgbb" directory_fmt = ("{category}", "{user}") filename_fmt = "{title} {id}.{extension}" archive_fmt = "{id}" root = "https://imgbb.com" def __init__(self, match): Extractor.__init__(self, match) self.page_url = self.sort = None def items(self): self.login() url = self.page_url params = {"sort": self.sort} while True: response = self.request(url, params=params, allow_redirects=False) if response.status_code < 300: break url = response.headers["location"] if url.startswith(self.root): raise exception.NotFoundError(self.subcategory) page = response.text data = self.metadata(page) first = True for img in self.images(page): image = { "id" : img["url_viewer"].rpartition("/")[2], "user" : img["user"]["username"] if "user" in img else "", "title" : text.unescape(img["title"]), "url" : img["image"]["url"], "extension": img["image"]["extension"], "size" : text.parse_int(img["image"]["size"]), "width" : text.parse_int(img["width"]), "height" : text.parse_int(img["height"]), } image.update(data) if first: first = False yield Message.Directory, data yield Message.Url, image["url"], image def login(self): username, password = self._get_auth_info() if username: self.cookies_update(self._login_impl(username, password)) @cache(maxage=365*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/login" page = self.request(url).text token = text.extr(page, 'PF.obj.config.auth_token="', '"') headers = {"Referer": url} data = { "auth_token" : token, "login-subject": username, "password" : password, } response = self.request(url, method="POST", headers=headers, data=data) if not response.history: raise exception.AuthenticationError() return self.cookies def _extract_resource(self, page): return util.json_loads(text.extr( page, "CHV.obj.resource=", "};") + "}") def _extract_user(self, page): return self._extract_resource(page).get("user") or {} def _pagination(self, page, endpoint, params): data = None seek, pos = text.extract(page, 'data-seek="', '"') tokn, pos = text.extract(page, 'PF.obj.config.auth_token="', '"', pos) params["action"] = "list" params["list"] = "images" params["sort"] = self.sort params["seek"] = seek params["page"] = 2 params["auth_token"] = tokn while True: for img in text.extract_iter(page, "data-object='", "'"): yield util.json_loads(text.unquote(img)) if data: if not data["seekEnd"] or params["seek"] == data["seekEnd"]: return params["seek"] = data["seekEnd"] params["page"] += 1 elif not seek or 'class="pagination-next"' not in page: return data = self.request(endpoint, method="POST", data=params).json() page = data["html"] class ImgbbAlbumExtractor(ImgbbExtractor): """Extractor for albums on imgbb.com""" subcategory = "album" directory_fmt = ("{category}", "{user}", "{album_name} {album_id}") pattern = r"(?:https?://)?ibb\.co/album/([^/?#]+)/?(?:\?([^#]+))?" 
example = "https://ibb.co/album/ID" def __init__(self, match): ImgbbExtractor.__init__(self, match) self.album_name = None self.album_id = match.group(1) self.sort = text.parse_query(match.group(2)).get("sort", "date_desc") self.page_url = "https://ibb.co/album/" + self.album_id def metadata(self, page): album = text.extr(page, '"og:title" content="', '"') user = self._extract_user(page) return { "album_id" : self.album_id, "album_name" : text.unescape(album), "user" : user.get("username") or "", "user_id" : user.get("id") or "", "displayname": user.get("name") or "", } def images(self, page): url = text.extr(page, '"og:url" content="', '"') album_id = url.rpartition("/")[2].partition("?")[0] return self._pagination(page, "https://ibb.co/json", { "from" : "album", "albumid" : album_id, "params_hidden[list]" : "images", "params_hidden[from]" : "album", "params_hidden[albumid]": album_id, }) class ImgbbUserExtractor(ImgbbExtractor): """Extractor for user profiles in imgbb.com""" subcategory = "user" pattern = r"(?:https?://)?([\w-]+)\.imgbb\.com/?(?:\?([^#]+))?$" example = "https://USER.imgbb.com" def __init__(self, match): ImgbbExtractor.__init__(self, match) self.user = match.group(1) self.sort = text.parse_query(match.group(2)).get("sort", "date_desc") self.page_url = "https://{}.imgbb.com/".format(self.user) def metadata(self, page): user = self._extract_user(page) return { "user" : user.get("username") or self.user, "user_id" : user.get("id") or "", "displayname": user.get("name") or "", } def images(self, page): user = text.extr(page, '.obj.resource={"id":"', '"') return self._pagination(page, self.page_url + "json", { "from" : "user", "userid" : user, "params_hidden[userid]": user, "params_hidden[from]" : "user", }) class ImgbbImageExtractor(ImgbbExtractor): subcategory = "image" pattern = r"(?:https?://)?ibb\.co/(?!album/)([^/?#]+)" example = "https://ibb.co/ID" def __init__(self, match): ImgbbExtractor.__init__(self, match) self.image_id = match.group(1) def items(self): url = "https://ibb.co/" + self.image_id page = self.request(url).text extr = text.extract_from(page) user = self._extract_user(page) image = { "id" : self.image_id, "title" : text.unescape(extr( '"og:title" content="', ' hosted at ImgBB"')), "url" : extr('"og:image" content="', '"'), "width" : text.parse_int(extr('"og:image:width" content="', '"')), "height": text.parse_int(extr('"og:image:height" content="', '"')), "user" : user.get("username") or "", "user_id" : user.get("id") or "", "displayname": user.get("name") or "", } image["extension"] = text.ext_from_url(image["url"]) yield Message.Directory, image yield Message.Url, image["url"], image ���������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/imgbox.py����������������������������������������������������0000644�0001750�0001750�00000007046�14571514301�020423� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://imgbox.com/""" from .common import Extractor, Message, AsynchronousMixin from .. import text, exception import re class ImgboxExtractor(Extractor): """Base class for imgbox extractors""" category = "imgbox" root = "https://imgbox.com" def items(self): data = self.get_job_metadata() yield Message.Directory, data for image_key in self.get_image_keys(): imgpage = self.request(self.root + "/" + image_key).text imgdata = self.get_image_metadata(imgpage) if imgdata["filename"]: imgdata.update(data) imgdata["image_key"] = image_key text.nameext_from_url(imgdata["filename"], imgdata) yield Message.Url, self.get_image_url(imgpage), imgdata @staticmethod def get_job_metadata(): """Collect metadata for extractor-job""" return {} @staticmethod def get_image_keys(): """Return an iterable containing all image-keys""" return [] @staticmethod def get_image_metadata(page): """Collect metadata for a downloadable file""" return text.extract_all(page, ( ("num" , '</a>   ', ' of '), (None , 'class="image-container"', ''), ("filename" , ' title="', '"'), ))[0] @staticmethod def get_image_url(page): """Extract download-url""" return text.extr(page, 'property="og:image" content="', '"') class ImgboxGalleryExtractor(AsynchronousMixin, ImgboxExtractor): """Extractor for image galleries from imgbox.com""" subcategory = "gallery" directory_fmt = ("{category}", "{title} - {gallery_key}") filename_fmt = "{num:>03}-{filename}.{extension}" archive_fmt = "{gallery_key}_{image_key}" pattern = r"(?:https?://)?(?:www\.)?imgbox\.com/g/([A-Za-z0-9]{10})" example = "https://imgbox.com/g/12345abcde" def __init__(self, match): ImgboxExtractor.__init__(self, match) self.gallery_key = match.group(1) self.image_keys = [] def get_job_metadata(self): page = self.request(self.root + "/g/" + self.gallery_key).text if "The specified gallery could not be found." 
in page: raise exception.NotFoundError("gallery") self.image_keys = re.findall(r'<a href="/([^"]+)"><img alt="', page) title = text.extr(page, "<h1>", "</h1>") title, _, count = title.rpartition(" - ") return { "gallery_key": self.gallery_key, "title": text.unescape(title), "count": count[:-7], } def get_image_keys(self): return self.image_keys class ImgboxImageExtractor(ImgboxExtractor): """Extractor for single images from imgbox.com""" subcategory = "image" archive_fmt = "{image_key}" pattern = r"(?:https?://)?(?:www\.)?imgbox\.com/([A-Za-z0-9]{8})" example = "https://imgbox.com/1234abcd" def __init__(self, match): ImgboxExtractor.__init__(self, match) self.image_key = match.group(1) def get_image_keys(self): return (self.image_key,) @staticmethod def get_image_metadata(page): data = ImgboxExtractor.get_image_metadata(page) if not data["filename"]: raise exception.NotFoundError("image") return data ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/imgth.py�����������������������������������������������������0000644�0001750�0001750�00000003654�14571514301�020247� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://imgth.com/""" from .common import GalleryExtractor from .. 
import text class ImgthGalleryExtractor(GalleryExtractor): """Extractor for image galleries from imgth.com""" category = "imgth" root = "https://imgth.com" pattern = r"(?:https?://)?(?:www\.)?imgth\.com/gallery/(\d+)" example = "https://imgth.com/gallery/123/TITLE" def __init__(self, match): self.gallery_id = gid = match.group(1) url = "{}/gallery/{}/g/".format(self.root, gid) GalleryExtractor.__init__(self, match, url) def metadata(self, page): extr = text.extract_from(page) return { "gallery_id": text.parse_int(self.gallery_id), "title": text.unescape(extr("<h1>", "</h1>")), "count": text.parse_int(extr( "total of images in this gallery: ", " ")), "date" : text.parse_datetime( extr("created on ", " by <") .replace("th, ", " ", 1).replace("nd, ", " ", 1) .replace("st, ", " ", 1), "%B %d %Y at %H:%M"), "user" : text.unescape(extr(">", "<")), } def images(self, page): pnum = 0 while True: thumbs = text.extr(page, '<ul class="thumbnails">', '</ul>') for url in text.extract_iter(thumbs, '<img src="', '"'): path = url.partition("/thumbs/")[2] yield ("{}/images/{}".format(self.root, path), None) if '<li class="next">' not in page: return pnum += 1 url = "{}/gallery/{}/g/page/{}".format( self.root, self.gallery_id, pnum) page = self.request(url).text ������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1710018045.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/imgur.py�����������������������������������������������������0000644�0001750�0001750�00000022367�14573146775�020306� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://imgur.com/""" from .common import Extractor, Message from .. 
import text, exception BASE_PATTERN = r"(?:https?://)?(?:www\.|[im]\.)?imgur\.(?:com|io)" class ImgurExtractor(Extractor): """Base class for imgur extractors""" category = "imgur" root = "https://imgur.com" def __init__(self, match): Extractor.__init__(self, match) self.key = match.group(1) def _init(self): self.api = ImgurAPI(self) self.mp4 = self.config("mp4", True) def _prepare(self, image): image.update(image["metadata"]) del image["metadata"] if image["ext"] == "jpeg": image["ext"] = "jpg" elif image["is_animated"] and self.mp4 and image["ext"] == "gif": image["ext"] = "mp4" image["url"] = url = "https://i.imgur.com/{}.{}".format( image["id"], image["ext"]) image["date"] = text.parse_datetime(image["created_at"]) image["_http_validate"] = self._validate text.nameext_from_url(url, image) return url def _validate(self, response): return (not response.history or not response.url.endswith("/removed.png")) def _items_queue(self, items): album_ex = ImgurAlbumExtractor image_ex = ImgurImageExtractor for item in items: if item["is_album"]: url = "https://imgur.com/a/" + item["id"] item["_extractor"] = album_ex else: url = "https://imgur.com/" + item["id"] item["_extractor"] = image_ex yield Message.Queue, url, item class ImgurImageExtractor(ImgurExtractor): """Extractor for individual images on imgur.com""" subcategory = "image" filename_fmt = "{category}_{id}{title:?_//}.{extension}" archive_fmt = "{id}" pattern = (BASE_PATTERN + r"/(?!gallery|search)" r"(?:r/\w+/)?(\w{7}|\w{5})[sbtmlh]?") example = "https://imgur.com/abcdefg" def items(self): image = self.api.image(self.key) try: del image["ad_url"] del image["ad_type"] except KeyError: pass image.update(image["media"][0]) del image["media"] url = self._prepare(image) yield Message.Directory, image yield Message.Url, url, image class ImgurAlbumExtractor(ImgurExtractor): """Extractor for imgur albums""" subcategory = "album" directory_fmt = ("{category}", "{album[id]}{album[title]:? 
- //}") filename_fmt = "{category}_{album[id]}_{num:>03}_{id}.{extension}" archive_fmt = "{album[id]}_{id}" pattern = BASE_PATTERN + r"/a/(\w{7}|\w{5})" example = "https://imgur.com/a/abcde" def items(self): album = self.api.album(self.key) try: images = album["media"] except KeyError: return del album["media"] count = len(images) album["date"] = text.parse_datetime(album["created_at"]) try: del album["ad_url"] del album["ad_type"] except KeyError: pass for num, image in enumerate(images, 1): url = self._prepare(image) image["num"] = num image["count"] = count image["album"] = album yield Message.Directory, image yield Message.Url, url, image class ImgurGalleryExtractor(ImgurExtractor): """Extractor for imgur galleries""" subcategory = "gallery" pattern = BASE_PATTERN + r"/(?:gallery|t/\w+)/(\w{7}|\w{5})" example = "https://imgur.com/gallery/abcde" def items(self): if self.api.gallery(self.key)["is_album"]: url = "{}/a/{}".format(self.root, self.key) extr = ImgurAlbumExtractor else: url = "{}/{}".format(self.root, self.key) extr = ImgurImageExtractor yield Message.Queue, url, {"_extractor": extr} class ImgurUserExtractor(ImgurExtractor): """Extractor for all images posted by a user""" subcategory = "user" pattern = BASE_PATTERN + r"/user/([^/?#]+)(?:/posts|/submitted)?/?$" example = "https://imgur.com/user/USER" def items(self): return self._items_queue(self.api.account_submissions(self.key)) class ImgurFavoriteExtractor(ImgurExtractor): """Extractor for a user's favorites""" subcategory = "favorite" pattern = BASE_PATTERN + r"/user/([^/?#]+)/favorites/?$" example = "https://imgur.com/user/USER/favorites" def items(self): return self._items_queue(self.api.account_favorites(self.key)) class ImgurFavoriteFolderExtractor(ImgurExtractor): """Extractor for a user's favorites folder""" subcategory = "favorite-folder" pattern = BASE_PATTERN + r"/user/([^/?#]+)/favorites/folder/(\d+)" example = "https://imgur.com/user/USER/favorites/folder/12345/TITLE" def __init__(self, match): ImgurExtractor.__init__(self, match) self.folder_id = match.group(2) def items(self): return self._items_queue(self.api.account_favorites_folder( self.key, self.folder_id)) class ImgurSubredditExtractor(ImgurExtractor): """Extractor for a subreddits's imgur links""" subcategory = "subreddit" pattern = BASE_PATTERN + r"/r/([^/?#]+)/?$" example = "https://imgur.com/r/SUBREDDIT" def items(self): return self._items_queue(self.api.gallery_subreddit(self.key)) class ImgurTagExtractor(ImgurExtractor): """Extractor for imgur tag searches""" subcategory = "tag" pattern = BASE_PATTERN + r"/t/([^/?#]+)$" example = "https://imgur.com/t/TAG" def items(self): return self._items_queue(self.api.gallery_tag(self.key)) class ImgurSearchExtractor(ImgurExtractor): """Extractor for imgur search results""" subcategory = "search" pattern = BASE_PATTERN + r"/search(?:/[^?#]+)?/?\?q=([^&#]+)" example = "https://imgur.com/search?q=UERY" def items(self): key = text.unquote(self.key.replace("+", " ")) return self._items_queue(self.api.gallery_search(key)) class ImgurAPI(): """Interface for the Imgur API Ref: https://apidocs.imgur.com/ """ def __init__(self, extractor): self.extractor = extractor self.client_id = extractor.config("client-id") or "546c25a59c58ad7" self.headers = {"Authorization": "Client-ID " + self.client_id} def account_favorites(self, account): endpoint = "/3/account/{}/gallery_favorites".format(account) return self._pagination(endpoint) def account_favorites_folder(self, account, folder_id): endpoint = 
"/3/account/{}/folders/{}/favorites".format( account, folder_id) return self._pagination_v2(endpoint) def gallery_search(self, query): endpoint = "/3/gallery/search" params = {"q": query} return self._pagination(endpoint, params) def account_submissions(self, account): endpoint = "/3/account/{}/submissions".format(account) return self._pagination(endpoint) def gallery_subreddit(self, subreddit): endpoint = "/3/gallery/r/{}".format(subreddit) return self._pagination(endpoint) def gallery_tag(self, tag): endpoint = "/3/gallery/t/{}".format(tag) return self._pagination(endpoint, key="items") def image(self, image_hash): endpoint = "/post/v1/media/" + image_hash params = {"include": "media,tags,account"} return self._call(endpoint, params) def album(self, album_hash): endpoint = "/post/v1/albums/" + album_hash params = {"include": "media,tags,account"} return self._call(endpoint, params) def gallery(self, gallery_hash): endpoint = "/post/v1/posts/" + gallery_hash return self._call(endpoint) def _call(self, endpoint, params=None, headers=None): while True: try: return self.extractor.request( "https://api.imgur.com" + endpoint, params=params, headers=(headers or self.headers), ).json() except exception.HttpError as exc: if exc.status not in (403, 429) or \ b"capacity" not in exc.response.content: raise self.extractor.wait(seconds=600) def _pagination(self, endpoint, params=None, key=None): num = 0 while True: data = self._call("{}/{}".format(endpoint, num), params)["data"] if key: data = data[key] if not data: return yield from data num += 1 def _pagination_v2(self, endpoint, params=None, key=None): if params is None: params = {} params["client_id"] = self.client_id params["page"] = 0 params["sort"] = "newest" headers = {"Origin": "https://imgur.com"} while True: data = self._call(endpoint, params, headers)["data"] if not data: return yield from data params["page"] += 1 �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/inkbunny.py��������������������������������������������������0000644�0001750�0001750�00000030301�14571514301�020761� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2020-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://inkbunny.net/""" from .common import Extractor, Message from .. import text, exception from ..cache import cache BASE_PATTERN = r"(?:https?://)?(?:www\.)?inkbunny\.net" class InkbunnyExtractor(Extractor): """Base class for inkbunny extractors""" category = "inkbunny" directory_fmt = ("{category}", "{username!l}") filename_fmt = "{submission_id} {file_id} {title}.{extension}" archive_fmt = "{file_id}" root = "https://inkbunny.net" def _init(self): self.api = InkbunnyAPI(self) def items(self): self.api.authenticate() metadata = self.metadata() to_bool = ("deleted", "favorite", "friends_only", "guest_block", "hidden", "public", "scraps") for post in self.posts(): post.update(metadata) post["date"] = text.parse_datetime( post["create_datetime"] + "00", "%Y-%m-%d %H:%M:%S.%f%z") post["tags"] = [kw["keyword_name"] for kw in post["keywords"]] post["ratings"] = [r["name"] for r in post["ratings"]] files = post["files"] for key in to_bool: if key in post: post[key] = (post[key] == "t") del post["keywords"] del post["files"] yield Message.Directory, post for post["num"], file in enumerate(files, 1): post.update(file) post["deleted"] = (file["deleted"] == "t") post["date"] = text.parse_datetime( file["create_datetime"] + "00", "%Y-%m-%d %H:%M:%S.%f%z") text.nameext_from_url(file["file_name"], post) url = file["file_url_full"] if "/private_files/" in url: url += "?sid=" + self.api.session_id yield Message.Url, url, post def posts(self): return () def metadata(self): return () class InkbunnyUserExtractor(InkbunnyExtractor): """Extractor for inkbunny user profiles""" subcategory = "user" pattern = BASE_PATTERN + r"/(?!s/)(gallery/|scraps/)?(\w+)(?:$|[/?#])" example = "https://inkbunny.net/USER" def __init__(self, match): kind, self.user = match.groups() if not kind: self.scraps = None elif kind[0] == "g": self.subcategory = "gallery" self.scraps = "no" else: self.subcategory = "scraps" self.scraps = "only" InkbunnyExtractor.__init__(self, match) def posts(self): orderby = self.config("orderby") params = { "username": self.user, "scraps" : self.scraps, "orderby" : orderby, } if orderby and orderby.startswith("unread_"): params["unread_submissions"] = "yes" return self.api.search(params) class InkbunnyPoolExtractor(InkbunnyExtractor): """Extractor for inkbunny pools""" subcategory = "pool" pattern = (BASE_PATTERN + r"/(?:" r"poolview_process\.php\?pool_id=(\d+)|" r"submissionsviewall\.php" r"\?((?:[^#]+&)?mode=pool(?:&[^#]+)?))") example = "https://inkbunny.net/poolview_process.php?pool_id=12345" def __init__(self, match): InkbunnyExtractor.__init__(self, match) pid = match.group(1) if pid: self.pool_id = pid self.orderby = "pool_order" else: params = text.parse_query(match.group(2)) self.pool_id = params.get("pool_id") self.orderby = params.get("orderby", "pool_order") def metadata(self): return {"pool_id": self.pool_id} def posts(self): params = { "pool_id": self.pool_id, "orderby": 
self.orderby, } return self.api.search(params) class InkbunnyFavoriteExtractor(InkbunnyExtractor): """Extractor for inkbunny user favorites""" subcategory = "favorite" pattern = (BASE_PATTERN + r"/(?:" r"userfavorites_process\.php\?favs_user_id=(\d+)|" r"submissionsviewall\.php" r"\?((?:[^#]+&)?mode=userfavs(?:&[^#]+)?))") example = ("https://inkbunny.net/userfavorites_process.php" "?favs_user_id=12345") def __init__(self, match): InkbunnyExtractor.__init__(self, match) uid = match.group(1) if uid: self.user_id = uid self.orderby = self.config("orderby", "fav_datetime") else: params = text.parse_query(match.group(2)) self.user_id = params.get("user_id") self.orderby = params.get("orderby", "fav_datetime") def metadata(self): return {"favs_user_id": self.user_id} def posts(self): params = { "favs_user_id": self.user_id, "orderby" : self.orderby, } if self.orderby and self.orderby.startswith("unread_"): params["unread_submissions"] = "yes" return self.api.search(params) class InkbunnyUnreadExtractor(InkbunnyExtractor): """Extractor for unread inkbunny submissions""" subcategory = "unread" pattern = (BASE_PATTERN + r"/submissionsviewall\.php" r"\?((?:[^#]+&)?mode=unreadsubs(?:&[^#]+)?)") example = ("https://inkbunny.net/submissionsviewall.php" "?text=&mode=unreadsubs&type=") def __init__(self, match): InkbunnyExtractor.__init__(self, match) self.params = text.parse_query(match.group(1)) def posts(self): params = self.params.copy() params.pop("rid", None) params.pop("mode", None) params["unread_submissions"] = "yes" return self.api.search(params) class InkbunnySearchExtractor(InkbunnyExtractor): """Extractor for inkbunny search results""" subcategory = "search" pattern = (BASE_PATTERN + r"/submissionsviewall\.php" r"\?((?:[^#]+&)?mode=search(?:&[^#]+)?)") example = ("https://inkbunny.net/submissionsviewall.php" "?text=TAG&mode=search&type=") def __init__(self, match): InkbunnyExtractor.__init__(self, match) self.params = text.parse_query(match.group(1)) def metadata(self): return {"search": self.params} def posts(self): params = self.params.copy() pop = params.pop pop("rid", None) params["string_join_type"] = pop("stringtype", None) params["dayslimit"] = pop("days", None) params["username"] = pop("artist", None) favsby = pop("favsby", None) if favsby: # get user_id from user profile url = "{}/{}".format(self.root, favsby) page = self.request(url).text user_id = text.extr(page, "?user_id=", "'") params["favs_user_id"] = user_id.partition("&")[0] return self.api.search(params) class InkbunnyFollowingExtractor(InkbunnyExtractor): """Extractor for inkbunny user watches""" subcategory = "following" pattern = (BASE_PATTERN + r"/(?:" r"watchlist_process\.php\?mode=watching&user_id=(\d+)|" r"usersviewall\.php" r"\?((?:[^#]+&)?mode=watching(?:&[^#]+)?))") example = ("https://inkbunny.net/watchlist_process.php" "?mode=watching&user_id=12345") def __init__(self, match): InkbunnyExtractor.__init__(self, match) self.user_id = match.group(1) or \ text.parse_query(match.group(2)).get("user_id") def items(self): url = self.root + "/watchlist_process.php" params = {"mode": "watching", "user_id": self.user_id} with self.request(url, params=params) as response: url, _, params = response.url.partition("?") page = response.text params = text.parse_query(params) params["page"] = text.parse_int(params.get("page"), 1) data = {"_extractor": InkbunnyUserExtractor} while True: cnt = 0 for user in text.extract_iter( page, '<a class="widget_userNameSmall" href="', '"', page.index('id="changethumboriginal_form"')): cnt += 
1 yield Message.Queue, self.root + user, data if cnt < 20: return params["page"] += 1 page = self.request(url, params=params).text class InkbunnyPostExtractor(InkbunnyExtractor): """Extractor for individual Inkbunny posts""" subcategory = "post" pattern = BASE_PATTERN + r"/s/(\d+)" example = "https://inkbunny.net/s/12345" def __init__(self, match): InkbunnyExtractor.__init__(self, match) self.submission_id = match.group(1) def posts(self): submissions = self.api.detail(({"submission_id": self.submission_id},)) if submissions[0] is None: raise exception.NotFoundError("submission") return submissions class InkbunnyAPI(): """Interface for the Inkunny API Ref: https://wiki.inkbunny.net/wiki/API """ def __init__(self, extractor): self.extractor = extractor self.session_id = None def detail(self, submissions): """Get full details about submissions with the given IDs""" ids = { sub["submission_id"]: idx for idx, sub in enumerate(submissions) } params = { "submission_ids": ",".join(ids), "show_description": "yes", } submissions = [None] * len(ids) for sub in self._call("submissions", params)["submissions"]: submissions[ids[sub["submission_id"]]] = sub return submissions def search(self, params): """Perform a search""" return self._pagination_search(params) def set_allowed_ratings(self, nudity=True, sexual=True, violence=True, strong_violence=True): """Change allowed submission ratings""" params = { "tag[2]": "yes" if nudity else "no", "tag[3]": "yes" if violence else "no", "tag[4]": "yes" if sexual else "no", "tag[5]": "yes" if strong_violence else "no", } self._call("userrating", params) def authenticate(self, invalidate=False): username, password = self.extractor._get_auth_info() if invalidate: _authenticate_impl.invalidate(username or "guest") if username: self.session_id = _authenticate_impl(self, username, password) else: self.session_id = _authenticate_impl(self, "guest", "") self.set_allowed_ratings() def _call(self, endpoint, params): url = "https://inkbunny.net/api_" + endpoint + ".php" params["sid"] = self.session_id data = self.extractor.request(url, params=params).json() if "error_code" in data: if str(data["error_code"]) == "2": self.authenticate(invalidate=True) return self._call(endpoint, params) raise exception.StopExtraction(data.get("error_message")) return data def _pagination_search(self, params): params["page"] = 1 params["get_rid"] = "yes" params["submission_ids_only"] = "yes" while True: data = self._call("search", params) if not data["submissions"]: return yield from self.detail(data["submissions"]) if data["page"] >= data["pages_count"]: return if "get_rid" in params: del params["get_rid"] params["rid"] = data["rid"] params["page"] += 1 @cache(maxage=365*86400, keyarg=1) def _authenticate_impl(api, username, password): api.extractor.log.info("Logging in as %s", username) url = "https://inkbunny.net/api_login.php" data = {"username": username, "password": password} data = api.extractor.request(url, method="POST", data=data).json() if "sid" not in data: raise exception.AuthenticationError(data.get("error_message")) return data["sid"] 
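# ---------------------------------------------------------------------------
# Illustrative sketch (not part of gallery-dl): InkbunnyAPI above keeps a
# session id ("sid") and, when the API reports error_code 2 (expired
# session), logs in again and repeats the call.  The small class below shows
# that retry-on-expired-session pattern in isolation, assuming the `requests`
# package; the api_login.php / api_<endpoint>.php URLs and the error code
# are taken from the code above, the class name is hypothetical.
# ---------------------------------------------------------------------------
import requests


class InkbunnySessionSketch:
    def __init__(self, username="guest", password=""):
        self.username = username
        self.password = password
        self.sid = None

    def login(self):
        data = requests.post("https://inkbunny.net/api_login.php", data={
            "username": self.username, "password": self.password,
        }, timeout=30).json()
        if "sid" not in data:
            raise ValueError(data.get("error_message"))
        self.sid = data["sid"]

    def call(self, endpoint, params):
        if self.sid is None:
            self.login()
        params["sid"] = self.sid
        data = requests.get("https://inkbunny.net/api_" + endpoint + ".php",
                            params=params, timeout=30).json()
        if str(data.get("error_code")) == "2":  # session expired
            self.login()                        # renew sid and retry the call
            return self.call(endpoint, params)
        return data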
gallery_dl-1.26.9/gallery_dl/extractor/instagram.py

# -*- coding: utf-8 -*-

# Copyright 2018-2020 Leonardo Taccari
# Copyright 2018-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://www.instagram.com/"""

from .common import Extractor, Message
from ..
import text, util, exception from ..cache import cache, memcache import binascii import json import re BASE_PATTERN = r"(?:https?://)?(?:www\.)?instagram\.com" USER_PATTERN = BASE_PATTERN + r"/(?!(?:p|tv|reel|explore|stories)/)([^/?#]+)" class InstagramExtractor(Extractor): """Base class for instagram extractors""" category = "instagram" directory_fmt = ("{category}", "{username}") filename_fmt = "{sidecar_media_id:?/_/}{media_id}.{extension}" archive_fmt = "{media_id}" root = "https://www.instagram.com" cookies_domain = ".instagram.com" cookies_names = ("sessionid",) request_interval = (6.0, 12.0) def __init__(self, match): Extractor.__init__(self, match) self.item = match.group(1) def _init(self): self.www_claim = "0" self.csrf_token = util.generate_token() self._find_tags = re.compile(r"#\w+").findall self._logged_in = True self._cursor = None self._user = None self.cookies.set( "csrftoken", self.csrf_token, domain=self.cookies_domain) if self.config("api") == "graphql": self.api = InstagramGraphqlAPI(self) else: self.api = InstagramRestAPI(self) def items(self): self.login() data = self.metadata() videos = self.config("videos", True) previews = self.config("previews", False) video_headers = {"User-Agent": "Mozilla/5.0"} order = self.config("order-files") reverse = order[0] in ("r", "d") if order else False for post in self.posts(): if "__typename" in post: post = self._parse_post_graphql(post) else: post = self._parse_post_rest(post) if self._user: post["user"] = self._user post.update(data) files = post.pop("_files") post["count"] = len(files) yield Message.Directory, post if "date" in post: del post["date"] if reverse: files.reverse() for file in files: file.update(post) url = file.get("video_url") if url: if videos: file["_http_headers"] = video_headers text.nameext_from_url(url, file) yield Message.Url, url, file if previews: file["media_id"] += "p" else: continue url = file["display_url"] yield Message.Url, url, text.nameext_from_url(url, file) def metadata(self): return () def posts(self): return () def finalize(self): if self._cursor: self.log.info("Use '-o cursor=%s' to continue downloading " "from the current position", self._cursor) def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if response.history: url = response.url if "/accounts/login/" in url: page = "login" elif "/challenge/" in url: page = "challenge" else: page = None if page: raise exception.StopExtraction("HTTP redirect to %s page (%s)", page, url.partition("?")[0]) www_claim = response.headers.get("x-ig-set-www-claim") if www_claim is not None: self.www_claim = www_claim csrf_token = response.cookies.get("csrftoken") if csrf_token: self.csrf_token = csrf_token return response def login(self): if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: return self.cookies_update(_login_impl(self, username, password)) self._logged_in = False def _parse_post_rest(self, post): if "items" in post: # story or highlight items = post["items"] reel_id = str(post["id"]).rpartition(":")[2] data = { "expires": text.parse_timestamp(post.get("expiring_at")), "post_id": reel_id, "post_shortcode": shortcode_from_id(reel_id), } if "title" in post: data["highlight_title"] = post["title"] if "created_at" in post: data["date"] = text.parse_timestamp(post.get("created_at")) else: # regular image/video post data = { "post_id" : post["pk"], "post_shortcode": post["code"], "likes": post.get("like_count", 0), "pinned": post.get("timeline_pinned_user_ids", 
()), "date": text.parse_timestamp(post.get("taken_at")), } caption = post["caption"] data["description"] = caption["text"] if caption else "" tags = self._find_tags(data["description"]) if tags: data["tags"] = sorted(set(tags)) location = post.get("location") if location: slug = location["short_name"].replace(" ", "-").lower() data["location_id"] = location["pk"] data["location_slug"] = slug data["location_url"] = "{}/explore/locations/{}/{}/".format( self.root, location["pk"], slug) coauthors = post.get("coauthor_producers") if coauthors: data["coauthors"] = [ {"id" : user["pk"], "username" : user["username"], "full_name": user["full_name"]} for user in coauthors ] if "carousel_media" in post: items = post["carousel_media"] data["sidecar_media_id"] = data["post_id"] data["sidecar_shortcode"] = data["post_shortcode"] else: items = (post,) owner = post["user"] data["owner_id"] = owner["pk"] data["username"] = owner.get("username") data["fullname"] = owner.get("full_name") data["post_url"] = "{}/p/{}/".format(self.root, data["post_shortcode"]) data["_files"] = files = [] for num, item in enumerate(items, 1): try: image = item["image_versions2"]["candidates"][0] except Exception: self.log.warning("Missing media in post %s", data["post_shortcode"]) continue video_versions = item.get("video_versions") if video_versions: video = max( video_versions, key=lambda x: (x["width"], x["height"], x["type"]), ) media = video else: video = None media = image media = { "num" : num, "date" : text.parse_timestamp(item.get("taken_at") or media.get("taken_at") or post.get("taken_at")), "media_id" : item["pk"], "shortcode" : (item.get("code") or shortcode_from_id(item["pk"])), "display_url": image["url"], "video_url" : video["url"] if video else None, "width" : media["width"], "height" : media["height"], } if "expiring_at" in item: media["expires"] = text.parse_timestamp(post["expiring_at"]) self._extract_tagged_users(item, media) files.append(media) return data def _parse_post_graphql(self, post): typename = post["__typename"] if self._logged_in: if post.get("is_video") and "video_url" not in post: post = self.api.media(post["id"])[0] elif typename == "GraphSidecar" and \ "edge_sidecar_to_children" not in post: post = self.api.media(post["id"])[0] pinned = post.get("pinned_for_users", ()) if pinned: for index, user in enumerate(pinned): pinned[index] = int(user["id"]) owner = post["owner"] data = { "typename" : typename, "date" : text.parse_timestamp(post["taken_at_timestamp"]), "likes" : post["edge_media_preview_like"]["count"], "pinned" : pinned, "owner_id" : owner["id"], "username" : owner.get("username"), "fullname" : owner.get("full_name"), "post_id" : post["id"], "post_shortcode": post["shortcode"], "post_url" : "{}/p/{}/".format(self.root, post["shortcode"]), "description": text.parse_unicode_escapes("\n".join( edge["node"]["text"] for edge in post["edge_media_to_caption"]["edges"] )), } tags = self._find_tags(data["description"]) if tags: data["tags"] = sorted(set(tags)) location = post.get("location") if location: data["location_id"] = location["id"] data["location_slug"] = location["slug"] data["location_url"] = "{}/explore/locations/{}/{}/".format( self.root, location["id"], location["slug"]) coauthors = post.get("coauthor_producers") if coauthors: data["coauthors"] = [ {"id" : user["id"], "username": user["username"]} for user in coauthors ] data["_files"] = files = [] if "edge_sidecar_to_children" in post: for num, edge in enumerate( post["edge_sidecar_to_children"]["edges"], 1): node = 
edge["node"] dimensions = node["dimensions"] media = { "num": num, "media_id" : node["id"], "shortcode" : (node.get("shortcode") or shortcode_from_id(node["id"])), "display_url": node["display_url"], "video_url" : node.get("video_url"), "width" : dimensions["width"], "height" : dimensions["height"], "sidecar_media_id" : post["id"], "sidecar_shortcode": post["shortcode"], } self._extract_tagged_users(node, media) files.append(media) else: dimensions = post["dimensions"] media = { "media_id" : post["id"], "shortcode" : post["shortcode"], "display_url": post["display_url"], "video_url" : post.get("video_url"), "width" : dimensions["width"], "height" : dimensions["height"], } self._extract_tagged_users(post, media) files.append(media) return data @staticmethod def _extract_tagged_users(src, dest): dest["tagged_users"] = tagged_users = [] edges = src.get("edge_media_to_tagged_user") if edges: for edge in edges["edges"]: user = edge["node"]["user"] tagged_users.append({"id" : user["id"], "username" : user["username"], "full_name": user["full_name"]}) usertags = src.get("usertags") if usertags: for tag in usertags["in"]: user = tag["user"] tagged_users.append({"id" : user["pk"], "username" : user["username"], "full_name": user["full_name"]}) mentions = src.get("reel_mentions") if mentions: for mention in mentions: user = mention["user"] tagged_users.append({"id" : user.get("pk"), "username" : user["username"], "full_name": user["full_name"]}) stickers = src.get("story_bloks_stickers") if stickers: for sticker in stickers: sticker = sticker["bloks_sticker"] if sticker["bloks_sticker_type"] == "mention": user = sticker["sticker_data"]["ig_mention"] tagged_users.append({"id" : user["account_id"], "username" : user["username"], "full_name": user["full_name"]}) def _init_cursor(self): return self.config("cursor") or None def _update_cursor(self, cursor): self.log.debug("Cursor: %s", cursor) self._cursor = cursor return cursor def _assign_user(self, user): self._user = user for key, old in ( ("count_media" , "edge_owner_to_timeline_media"), ("count_video" , "edge_felix_video_timeline"), ("count_saved" , "edge_saved_media"), ("count_mutual" , "edge_mutual_followed_by"), ("count_follow" , "edge_follow"), ("count_followed" , "edge_followed_by"), ("count_collection", "edge_media_collections")): try: user[key] = user.pop(old)["count"] except Exception: user[key] = 0 class InstagramUserExtractor(InstagramExtractor): """Extractor for an Instagram user profile""" subcategory = "user" pattern = USER_PATTERN + r"/?(?:$|[?#])" example = "https://www.instagram.com/USER/" def initialize(self): pass def finalize(self): pass def items(self): base = "{}/{}/".format(self.root, self.item) stories = "{}/stories/{}/".format(self.root, self.item) return self._dispatch_extractors(( (InstagramAvatarExtractor , base + "avatar/"), (InstagramStoriesExtractor , stories), (InstagramHighlightsExtractor, base + "highlights/"), (InstagramPostsExtractor , base + "posts/"), (InstagramReelsExtractor , base + "reels/"), (InstagramTaggedExtractor , base + "tagged/"), ), ("posts",)) class InstagramPostsExtractor(InstagramExtractor): """Extractor for an Instagram user's posts""" subcategory = "posts" pattern = USER_PATTERN + r"/posts" example = "https://www.instagram.com/USER/posts/" def posts(self): uid = self.api.user_id(self.item) return self.api.user_feed(uid) class InstagramReelsExtractor(InstagramExtractor): """Extractor for an Instagram user's reels""" subcategory = "reels" pattern = USER_PATTERN + r"/reels" example = 
"https://www.instagram.com/USER/reels/" def posts(self): uid = self.api.user_id(self.item) return self.api.user_clips(uid) class InstagramTaggedExtractor(InstagramExtractor): """Extractor for an Instagram user's tagged posts""" subcategory = "tagged" pattern = USER_PATTERN + r"/tagged" example = "https://www.instagram.com/USER/tagged/" def metadata(self): if self.item.startswith("id:"): self.user_id = self.item[3:] return {"tagged_owner_id": self.user_id} self.user_id = self.api.user_id(self.item) user = self.api.user_by_name(self.item) return { "tagged_owner_id" : user["id"], "tagged_username" : user["username"], "tagged_full_name": user["full_name"], } def posts(self): return self.api.user_tagged(self.user_id) class InstagramGuideExtractor(InstagramExtractor): """Extractor for an Instagram guide""" subcategory = "guide" pattern = USER_PATTERN + r"/guide/[^/?#]+/(\d+)" example = "https://www.instagram.com/USER/guide/NAME/12345" def __init__(self, match): InstagramExtractor.__init__(self, match) self.guide_id = match.group(2) def metadata(self): return {"guide": self.api.guide(self.guide_id)} def posts(self): return self.api.guide_media(self.guide_id) class InstagramSavedExtractor(InstagramExtractor): """Extractor for an Instagram user's saved media""" subcategory = "saved" pattern = USER_PATTERN + r"/saved(?:/all-posts)?/?$" example = "https://www.instagram.com/USER/saved/" def posts(self): return self.api.user_saved() class InstagramCollectionExtractor(InstagramExtractor): """Extractor for Instagram collection""" subcategory = "collection" pattern = USER_PATTERN + r"/saved/([^/?#]+)/([^/?#]+)" example = "https://www.instagram.com/USER/saved/COLLECTION/12345" def __init__(self, match): InstagramExtractor.__init__(self, match) self.user, self.collection_name, self.collection_id = match.groups() def metadata(self): return { "collection_id" : self.collection_id, "collection_name": text.unescape(self.collection_name), } def posts(self): return self.api.user_collection(self.collection_id) class InstagramStoriesExtractor(InstagramExtractor): """Extractor for Instagram stories""" subcategory = "stories" pattern = (r"(?:https?://)?(?:www\.)?instagram\.com" r"/s(?:tories/(?:highlights/(\d+)|([^/?#]+)(?:/(\d+))?)" r"|/(aGlnaGxpZ2h0[^?#]+)(?:\?story_media_id=(\d+))?)") example = "https://www.instagram.com/stories/USER/" def __init__(self, match): h1, self.user, m1, h2, m2 = match.groups() if self.user: self.highlight_id = None else: self.subcategory = InstagramHighlightsExtractor.subcategory self.highlight_id = ("highlight:" + h1 if h1 else binascii.a2b_base64(h2).decode()) self.media_id = m1 or m2 InstagramExtractor.__init__(self, match) def posts(self): reel_id = self.highlight_id or self.api.user_id(self.user) reels = self.api.reels_media(reel_id) if self.media_id and reels: reel = reels[0] for item in reel["items"]: if item["pk"] == self.media_id: reel["items"] = (item,) break else: raise exception.NotFoundError("story") return reels class InstagramHighlightsExtractor(InstagramExtractor): """Extractor for an Instagram user's story highlights""" subcategory = "highlights" pattern = USER_PATTERN + r"/highlights" example = "https://www.instagram.com/USER/highlights/" def posts(self): uid = self.api.user_id(self.item) return self.api.highlights_media(uid) class InstagramFollowingExtractor(InstagramExtractor): """Extractor for an Instagram user's followed users""" subcategory = "following" pattern = USER_PATTERN + r"/following" example = "https://www.instagram.com/USER/following/" def items(self): 
uid = self.api.user_id(self.item) for user in self.api.user_following(uid): user["_extractor"] = InstagramUserExtractor url = "{}/{}".format(self.root, user["username"]) yield Message.Queue, url, user class InstagramTagExtractor(InstagramExtractor): """Extractor for Instagram tags""" subcategory = "tag" directory_fmt = ("{category}", "{subcategory}", "{tag}") pattern = BASE_PATTERN + r"/explore/tags/([^/?#]+)" example = "https://www.instagram.com/explore/tags/TAG/" def metadata(self): return {"tag": text.unquote(self.item)} def posts(self): return self.api.tags_media(self.item) class InstagramAvatarExtractor(InstagramExtractor): """Extractor for an Instagram user's avatar""" subcategory = "avatar" pattern = USER_PATTERN + r"/avatar" example = "https://www.instagram.com/USER/avatar/" def posts(self): if self._logged_in: user_id = self.api.user_id(self.item, check_private=False) user = self.api.user_by_id(user_id) avatar = (user.get("hd_profile_pic_url_info") or user["hd_profile_pic_versions"][-1]) else: user = self.item if user.startswith("id:"): user = self.api.user_by_id(user[3:]) else: user = self.api.user_by_name(user) user["pk"] = user["id"] url = user.get("profile_pic_url_hd") or user["profile_pic_url"] avatar = {"url": url, "width": 0, "height": 0} pk = user.get("profile_pic_id") if pk: pk = pk.partition("_")[0] code = shortcode_from_id(pk) else: pk = code = "avatar:" + str(user["pk"]) return ({ "pk" : pk, "code" : code, "user" : user, "caption" : None, "like_count": 0, "image_versions2": {"candidates": (avatar,)}, },) class InstagramPostExtractor(InstagramExtractor): """Extractor for an Instagram post""" subcategory = "post" pattern = (r"(?:https?://)?(?:www\.)?instagram\.com" r"/(?:[^/?#]+/)?(?:p|tv|reel)/([^/?#]+)") example = "https://www.instagram.com/p/abcdefg/" def posts(self): return self.api.media(self.item) class InstagramRestAPI(): def __init__(self, extractor): self.extractor = extractor def guide(self, guide_id): endpoint = "/v1/guides/web_info/" params = {"guide_id": guide_id} return self._call(endpoint, params=params) def guide_media(self, guide_id): endpoint = "/v1/guides/guide/{}/".format(guide_id) return self._pagination_guides(endpoint) def highlights_media(self, user_id, chunk_size=5): reel_ids = [hl["id"] for hl in self.highlights_tray(user_id)] order = self.extractor.config("order-posts") if order: if order in ("desc", "reverse"): reel_ids.reverse() elif order in ("id", "id_asc"): reel_ids.sort(key=lambda r: int(r[10:])) elif order == "id_desc": reel_ids.sort(key=lambda r: int(r[10:]), reverse=True) elif order != "asc": self.extractor.log.warning("Unknown posts order '%s'", order) for offset in range(0, len(reel_ids), chunk_size): yield from self.reels_media( reel_ids[offset : offset+chunk_size]) def highlights_tray(self, user_id): endpoint = "/v1/highlights/{}/highlights_tray/".format(user_id) return self._call(endpoint)["tray"] def media(self, shortcode): if len(shortcode) > 28: shortcode = shortcode[:-28] endpoint = "/v1/media/{}/info/".format(id_from_shortcode(shortcode)) return self._pagination(endpoint) def reels_media(self, reel_ids): endpoint = "/v1/feed/reels_media/" params = {"reel_ids": reel_ids} try: return self._call(endpoint, params=params)["reels_media"] except KeyError: raise exception.AuthorizationError("Login required") def tags_media(self, tag): for section in self.tags_sections(tag): for media in section["layout_content"]["medias"]: yield media["media"] def tags_sections(self, tag): endpoint = "/v1/tags/{}/sections/".format(tag) data = { 
"include_persistent": "0", "max_id" : None, "page" : None, "surface": "grid", "tab" : "recent", } return self._pagination_sections(endpoint, data) @memcache(keyarg=1) def user_by_name(self, screen_name): endpoint = "/v1/users/web_profile_info/" params = {"username": screen_name} return self._call( endpoint, params=params, notfound="user")["data"]["user"] @memcache(keyarg=1) def user_by_id(self, user_id): endpoint = "/v1/users/{}/info/".format(user_id) return self._call(endpoint)["user"] def user_id(self, screen_name, check_private=True): if screen_name.startswith("id:"): if self.extractor.config("metadata"): self.extractor._user = self.user_by_id(screen_name[3:]) return screen_name[3:] user = self.user_by_name(screen_name) if user is None: raise exception.AuthorizationError( "Login required to access this profile") if check_private and user["is_private"] and \ not user["followed_by_viewer"]: name = user["username"] s = "" if name.endswith("s") else "s" self.extractor.log.warning("%s'%s posts are private", name, s) self.extractor._assign_user(user) return user["id"] def user_clips(self, user_id): endpoint = "/v1/clips/user/" data = { "target_user_id": user_id, "page_size": "50", "max_id": None, "include_feed_video": "true", } return self._pagination_post(endpoint, data) def user_collection(self, collection_id): endpoint = "/v1/feed/collection/{}/posts/".format(collection_id) params = {"count": 50} return self._pagination(endpoint, params, media=True) def user_feed(self, user_id): endpoint = "/v1/feed/user/{}/".format(user_id) params = {"count": 30} return self._pagination(endpoint, params) def user_following(self, user_id): endpoint = "/v1/friendships/{}/following/".format(user_id) params = {"count": 12} return self._pagination_following(endpoint, params) def user_saved(self): endpoint = "/v1/feed/saved/posts/" params = {"count": 50} return self._pagination(endpoint, params, media=True) def user_tagged(self, user_id): endpoint = "/v1/usertags/{}/feed/".format(user_id) params = {"count": 20} return self._pagination(endpoint, params) def _call(self, endpoint, **kwargs): extr = self.extractor url = "https://www.instagram.com/api" + endpoint kwargs["headers"] = { "Accept" : "*/*", "X-CSRFToken" : extr.csrf_token, "X-IG-App-ID" : "936619743392459", "X-ASBD-ID" : "129477", "X-IG-WWW-Claim" : extr.www_claim, "X-Requested-With": "XMLHttpRequest", "Connection" : "keep-alive", "Referer" : extr.root + "/", "Sec-Fetch-Dest" : "empty", "Sec-Fetch-Mode" : "cors", "Sec-Fetch-Site" : "same-origin", } return extr.request(url, **kwargs).json() def _pagination(self, endpoint, params=None, media=False): if params is None: params = {} extr = self.extractor params["max_id"] = extr._init_cursor() while True: data = self._call(endpoint, params=params) if media: for item in data["items"]: yield item["media"] else: yield from data["items"] if not data.get("more_available"): return extr._update_cursor(None) params["max_id"] = extr._update_cursor(data["next_max_id"]) def _pagination_post(self, endpoint, params): extr = self.extractor params["max_id"] = extr._init_cursor() while True: data = self._call(endpoint, method="POST", data=params) for item in data["items"]: yield item["media"] info = data["paging_info"] if not info.get("more_available"): return extr._update_cursor(None) params["max_id"] = extr._update_cursor(info["max_id"]) def _pagination_sections(self, endpoint, params): extr = self.extractor params["max_id"] = extr._init_cursor() while True: info = self._call(endpoint, method="POST", data=params) yield from 
info["sections"] if not info.get("more_available"): return extr._update_cursor(None) params["page"] = info["next_page"] params["max_id"] = extr._update_cursor(info["next_max_id"]) def _pagination_guides(self, endpoint): extr = self.extractor params = {"max_id": extr._init_cursor()} while True: data = self._call(endpoint, params=params) for item in data["items"]: yield from item["media_items"] if "next_max_id" not in data: return extr._update_cursor(None) params["max_id"] = extr._update_cursor(data["next_max_id"]) def _pagination_following(self, endpoint, params): extr = self.extractor params["max_id"] = text.parse_int(extr._init_cursor()) while True: data = self._call(endpoint, params=params) yield from data["users"] if len(data["users"]) < params["count"]: return extr._update_cursor(None) params["max_id"] = extr._update_cursor( params["max_id"] + params["count"]) class InstagramGraphqlAPI(): def __init__(self, extractor): self.extractor = extractor self.user_collection = self.user_saved = self.reels_media = \ self.highlights_media = self.guide = self.guide_media = \ self._unsupported self._json_dumps = json.JSONEncoder(separators=(",", ":")).encode api = InstagramRestAPI(extractor) self.user_by_name = api.user_by_name self.user_by_id = api.user_by_id self.user_id = api.user_id @staticmethod def _unsupported(_=None): raise exception.StopExtraction("Unsupported with GraphQL API") def highlights_tray(self, user_id): query_hash = "d4d88dc1500312af6f937f7b804c68c3" variables = { "user_id": user_id, "include_chaining": False, "include_reel": False, "include_suggested_users": False, "include_logged_out_extras": True, "include_highlight_reels": True, "include_live_status": False, } edges = (self._call(query_hash, variables)["user"] ["edge_highlight_reels"]["edges"]) return [edge["node"] for edge in edges] def media(self, shortcode): query_hash = "9f8827793ef34641b2fb195d4d41151c" variables = { "shortcode": shortcode, "child_comment_count": 3, "fetch_comment_count": 40, "parent_comment_count": 24, "has_threaded_comments": True, } media = self._call(query_hash, variables).get("shortcode_media") return (media,) if media else () def tags_media(self, tag): query_hash = "9b498c08113f1e09617a1703c22b2f32" variables = {"tag_name": text.unescape(tag), "first": 50} return self._pagination(query_hash, variables, "hashtag", "edge_hashtag_to_media") def user_clips(self, user_id): query_hash = "bc78b344a68ed16dd5d7f264681c4c76" variables = {"id": user_id, "first": 50} return self._pagination(query_hash, variables) def user_feed(self, user_id): query_hash = "69cba40317214236af40e7efa697781d" variables = {"id": user_id, "first": 50} return self._pagination(query_hash, variables) def user_tagged(self, user_id): query_hash = "be13233562af2d229b008d2976b998b5" variables = {"id": user_id, "first": 50} return self._pagination(query_hash, variables) def _call(self, query_hash, variables): extr = self.extractor url = "https://www.instagram.com/graphql/query/" params = { "query_hash": query_hash, "variables" : self._json_dumps(variables), } headers = { "Accept" : "*/*", "X-CSRFToken" : extr.csrf_token, "X-Instagram-AJAX": "1006267176", "X-IG-App-ID" : "936619743392459", "X-ASBD-ID" : "198387", "X-IG-WWW-Claim" : extr.www_claim, "X-Requested-With": "XMLHttpRequest", "Referer" : extr.root + "/", } return extr.request(url, params=params, headers=headers).json()["data"] def _pagination(self, query_hash, variables, key_data="user", key_edge=None): extr = self.extractor variables["after"] = extr._init_cursor() while True: 
            data = self._call(query_hash, variables)[key_data]
            data = data[key_edge] if key_edge else next(iter(data.values()))

            for edge in data["edges"]:
                yield edge["node"]

            info = data["page_info"]
            if not info["has_next_page"]:
                return extr._update_cursor(None)
            elif not data["edges"]:
                # 'self' is the API object here; the extractor (and its
                # 'item' attribute) is reachable through 'extr'
                s = "" if extr.item.endswith("s") else "s"
                raise exception.StopExtraction(
                    "%s'%s posts are private", extr.item, s)

            variables["after"] = extr._update_cursor(info["end_cursor"])


@cache(maxage=90*86400, keyarg=1)
def _login_impl(extr, username, password):
    extr.log.error("Login with username & password is no longer supported. "
                   "Use browser cookies instead.")
    return {}


def id_from_shortcode(shortcode):
    return util.bdecode(shortcode, _ALPHABET)


def shortcode_from_id(post_id):
    return util.bencode(int(post_id), _ALPHABET)


_ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             "abcdefghijklmnopqrstuvwxyz"
             "0123456789-_")


gallery_dl-1.26.9/gallery_dl/extractor/issuu.py

# -*- coding: utf-8 -*-
# Copyright 2019-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://issuu.com/"""

from .common import GalleryExtractor, Extractor, Message
from ..
import text, util class IssuuBase(): """Base class for issuu extractors""" category = "issuu" root = "https://issuu.com" class IssuuPublicationExtractor(IssuuBase, GalleryExtractor): """Extractor for a single publication""" subcategory = "publication" directory_fmt = ("{category}", "{document[username]}", "{document[date]:%Y-%m-%d} {document[title]}") filename_fmt = "{num:>03}.{extension}" archive_fmt = "{document[publicationId]}_{num}" pattern = r"(?:https?://)?issuu\.com(/[^/?#]+/docs/[^/?#]+)" example = "https://issuu.com/issuu/docs/TITLE/" def metadata(self, page): pos = page.rindex('id="initial-data"') data = util.json_loads(text.rextract( page, '<script data-json="', '"', pos)[0].replace(""", '"')) doc = data["initialDocumentData"]["document"] doc["date"] = text.parse_datetime( doc["originalPublishDateInISOString"], "%Y-%m-%dT%H:%M:%S.%fZ") self._cnt = text.parse_int(doc["pageCount"]) self._tpl = "https://{}/{}-{}/jpg/page_{{}}.jpg".format( data["config"]["hosts"]["image"], doc["revisionId"], doc["publicationId"], ) return {"document": doc} def images(self, page): fmt = self._tpl.format return [(fmt(i), None) for i in range(1, self._cnt + 1)] class IssuuUserExtractor(IssuuBase, Extractor): """Extractor for all publications of a user/publisher""" subcategory = "user" pattern = r"(?:https?://)?issuu\.com/([^/?#]+)/?$" example = "https://issuu.com/USER" def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) def items(self): url = "{}/call/profile/v1/documents/{}".format(self.root, self.user) params = {"offset": 0, "limit": "25"} while True: data = self.request(url, params=params).json() for publication in data["items"]: publication["url"] = "{}/{}/docs/{}".format( self.root, self.user, publication["uri"]) publication["_extractor"] = IssuuPublicationExtractor yield Message.Queue, publication["url"], publication if not data["hasMore"]: return params["offset"] += data["limit"] ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/itaku.py�����������������������������������������������������0000644�0001750�0001750�00000010105�14571514301�020241� 
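# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the archived source): how the
# IssuuPublicationExtractor above turns the parsed 'initial-data' document
# metadata into per-page image URLs via its `_tpl` format string. The host,
# revision id, publication id, and page count below are hypothetical
# placeholder values.

def issuu_page_urls(image_host, revision_id, publication_id, page_count):
    """Build one JPEG URL per page, mirroring IssuuPublicationExtractor."""
    tpl = "https://{}/{}-{}/jpg/page_{{}}.jpg".format(
        image_host, revision_id, publication_id)
    return [tpl.format(i) for i in range(1, page_count + 1)]

# Example with placeholder values:
#   issuu_page_urls("image.issuu.example", "230101123456", "pubid123", 2)
#   -> ['https://image.issuu.example/230101123456-pubid123/jpg/page_1.jpg',
#       'https://image.issuu.example/230101123456-pubid123/jpg/page_2.jpg']
# ---------------------------------------------------------------------------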
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://itaku.ee/""" from .common import Extractor, Message from ..cache import memcache from .. import text BASE_PATTERN = r"(?:https?://)?itaku\.ee" class ItakuExtractor(Extractor): """Base class for itaku extractors""" category = "itaku" root = "https://itaku.ee" directory_fmt = ("{category}", "{owner_username}") filename_fmt = ("{id}{title:? //}.{extension}") archive_fmt = "{id}" request_interval = (0.5, 1.5) def __init__(self, match): Extractor.__init__(self, match) self.item = match.group(1) def _init(self): self.api = ItakuAPI(self) self.videos = self.config("videos", True) def items(self): for post in self.posts(): post["date"] = text.parse_datetime( post["date_added"], "%Y-%m-%dT%H:%M:%S.%fZ") for category, tags in post.pop("categorized_tags").items(): post["tags_" + category.lower()] = [t["name"] for t in tags] post["tags"] = [t["name"] for t in post["tags"]] sections = [] for s in post["sections"]: group = s["group"] if group: sections.append(group["title"] + "/" + s["title"]) else: sections.append(s["title"]) post["sections"] = sections if post["video"] and self.videos: url = post["video"]["video"] else: url = post["image"] yield Message.Directory, post yield Message.Url, url, text.nameext_from_url(url, post) class ItakuGalleryExtractor(ItakuExtractor): """Extractor for posts from an itaku user gallery""" subcategory = "gallery" pattern = BASE_PATTERN + r"/profile/([^/?#]+)/gallery" example = "https://itaku.ee/profile/USER/gallery" def posts(self): return self.api.galleries_images(self.item) class ItakuImageExtractor(ItakuExtractor): subcategory = "image" pattern = BASE_PATTERN + r"/images/(\d+)" example = "https://itaku.ee/images/12345" def posts(self): return (self.api.image(self.item),) class ItakuAPI(): def __init__(self, extractor): self.extractor = extractor self.root = extractor.root + "/api" self.headers = { "Accept": "application/json, text/plain, */*", } def galleries_images(self, username, section=None): endpoint = "/galleries/images/" params = { "cursor" : None, "owner" : self.user(username)["owner"], "section" : section, "date_range": "", "maturity_rating": ("SFW", "Questionable", "NSFW"), "ordering" : "-date_added", "page" : "1", "page_size" : "30", "visibility": ("PUBLIC", "PROFILE_ONLY"), } return self._pagination(endpoint, params, self.image) def image(self, image_id): endpoint = "/galleries/images/{}/".format(image_id) return self._call(endpoint) @memcache(keyarg=1) def user(self, username): return self._call("/user_profiles/{}/".format(username)) def _call(self, endpoint, params=None): if not endpoint.startswith("http"): endpoint = self.root + endpoint response = self.extractor.request( endpoint, params=params, headers=self.headers) return response.json() def _pagination(self, endpoint, params, extend): data = self._call(endpoint, params) while True: if extend: for result in data["results"]: yield extend(result["id"]) else: yield from data["results"] 
            url_next = data["links"].get("next")
            if not url_next:
                return
            data = self._call(url_next)


gallery_dl-1.26.9/gallery_dl/extractor/itchio.py

# -*- coding: utf-8 -*-
# Copyright 2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://itch.io/"""

from .common import Extractor, Message
from ..
import text class ItchioGameExtractor(Extractor): """Extractor for itch.io games""" category = "itchio" subcategory = "game" root = "https://itch.io" directory_fmt = ("{category}", "{user[name]}") filename_fmt = "{game[title]} ({id}).{extension}" archive_fmt = "{id}" pattern = r"(?:https?://)?(\w+).itch\.io/([\w-]+)" example = "https://USER.itch.io/GAME" def __init__(self, match): self.user, self.slug = match.groups() Extractor.__init__(self, match) def items(self): game_url = "https://{}.itch.io/{}".format(self.user, self.slug) page = self.request(game_url).text params = { "source": "view_game", "as_props": "1", "after_download_lightbox": "true", } headers = { "Referer": game_url, "X-Requested-With": "XMLHttpRequest", "Origin": "https://{}.itch.io".format(self.user), } data = { "csrf_token": text.unquote(self.cookies["itchio_token"]), } for upload_id in text.extract_iter(page, 'data-upload_id="', '"'): file_url = "{}/file/{}".format(game_url, upload_id) info = self.request(file_url, method="POST", params=params, headers=headers, data=data).json() game = info["lightbox"]["game"] user = info["lightbox"]["user"] game["url"] = game_url user.pop("follow_button", None) game = {"game": game, "user": user, "id": upload_id} url = info["url"] yield Message.Directory, game yield Message.Url, url, text.nameext_from_url(url, game) �././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/jschan.py����������������������������������������������������0000644�0001750�0001750�00000005317�14571514301�020403� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for jschan Imageboards""" from .common import BaseExtractor, Message from .. 
import text import itertools class JschanExtractor(BaseExtractor): basecategory = "jschan" BASE_PATTERN = JschanExtractor.update({ "94chan": { "root": "https://94chan.org", "pattern": r"94chan\.org" } }) class JschanThreadExtractor(JschanExtractor): """Extractor for jschan threads""" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{threadId} {subject|nomarkup[:50]}") filename_fmt = "{postId}{num:?-//} {filename}.{extension}" archive_fmt = "{board}_{postId}_{num}" pattern = BASE_PATTERN + r"/([^/?#]+)/thread/(\d+)\.html" example = "https://94chan.org/a/thread/12345.html" def __init__(self, match): JschanExtractor.__init__(self, match) index = match.lastindex self.board = match.group(index-1) self.thread = match.group(index) def items(self): url = "{}/{}/thread/{}.json".format( self.root, self.board, self.thread) thread = self.request(url).json() thread["threadId"] = thread["postId"] posts = thread.pop("replies", ()) yield Message.Directory, thread for post in itertools.chain((thread,), posts): files = post.pop("files", ()) if files: thread.update(post) thread["count"] = len(files) for num, file in enumerate(files): url = self.root + "/file/" + file["filename"] file.update(thread) file["num"] = num file["siteFilename"] = file["filename"] text.nameext_from_url(file["originalFilename"], file) yield Message.Url, url, file class JschanBoardExtractor(JschanExtractor): """Extractor for jschan boards""" subcategory = "board" pattern = (BASE_PATTERN + r"/([^/?#]+)" r"(?:/index\.html|/catalog\.html|/\d+\.html|/?$)") example = "https://94chan.org/a/" def __init__(self, match): JschanExtractor.__init__(self, match) self.board = match.group(match.lastindex) def items(self): url = "{}/{}/catalog.json".format(self.root, self.board) for thread in self.request(url).json(): url = "{}/{}/thread/{}.html".format( self.root, self.board, thread["postId"]) thread["_extractor"] = JschanThreadExtractor yield Message.Queue, url, thread �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/kabeuchi.py��������������������������������������������������0000644�0001750�0001750�00000005415�14571514301�020707� 
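# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the archived source): how the
# JschanThreadExtractor above maps a thread's JSON (fetched from
# "{root}/{board}/thread/{id}.json") to downloadable file URLs. The sample
# thread data below is hypothetical.

def jschan_file_urls(root, thread):
    """Yield (url, original_filename) for every file in a thread."""
    for post in [thread] + list(thread.get("replies", ())):
        for file in post.get("files", ()):
            # files are served under their hashed site filename;
            # originalFilename is only used to name the local copy
            yield root + "/file/" + file["filename"], file["originalFilename"]

# Example with placeholder data:
#   thread = {"postId": 12345,
#             "files": [{"filename": "abcd1234.png",
#                        "originalFilename": "drawing.png"}]}
#   list(jschan_file_urls("https://94chan.org", thread))
#   -> [('https://94chan.org/file/abcd1234.png', 'drawing.png')]
# ---------------------------------------------------------------------------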
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2020-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://kabe-uchiroom.com/""" from .common import Extractor, Message from .. import text, exception class KabeuchiUserExtractor(Extractor): """Extractor for all posts of a user on kabe-uchiroom.com""" category = "kabeuchi" subcategory = "user" directory_fmt = ("{category}", "{twitter_user_id} {twitter_id}") filename_fmt = "{id}_{num:>02}{title:?_//}.{extension}" archive_fmt = "{id}_{num}" root = "https://kabe-uchiroom.com" pattern = r"(?:https?://)?kabe-uchiroom\.com/mypage/?\?id=(\d+)" example = "https://kabe-uchiroom.com/mypage/?id=12345" def __init__(self, match): Extractor.__init__(self, match) self.user_id = match.group(1) def items(self): base = "{}/accounts/upfile/{}/{}/".format( self.root, self.user_id[-1], self.user_id) keys = ("image1", "image2", "image3", "image4", "image5", "image6") for post in self.posts(): if post.get("is_ad") or not post["image1"]: continue post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%d %H:%M:%S") yield Message.Directory, post for key in keys: name = post[key] if not name: break url = base + name post["num"] = ord(key[-1]) - 48 yield Message.Url, url, text.nameext_from_url(name, post) def posts(self): url = "{}/mypage/?id={}".format(self.root, self.user_id) response = self.request(url) if response.history and response.url == self.root + "/": raise exception.NotFoundError("user") target_id = text.extr(response.text, 'user_friend_id = "', '"') return self._pagination(target_id) def _pagination(self, target_id): url = "{}/get_posts.php".format(self.root) data = { "user_id" : "0", "target_id" : target_id, "type" : "uploads", "sort_type" : "0", "category_id": "all", "latest_post": "", "page_num" : 0, } while True: info = self.request(url, method="POST", data=data).json() datas = info["datas"] if not datas or not isinstance(datas, list): return yield from datas last_id = datas[-1]["id"] if last_id == info["last_data"]: return data["latest_post"] = last_id data["page_num"] += 1 ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 
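# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the archived source): how the
# KabeuchiUserExtractor above builds full image URLs from a post's
# image1..image6 fields, stopping at the first empty slot. The user id and
# post data below are hypothetical placeholders.

def kabeuchi_image_urls(root, user_id, post):
    base = "{}/accounts/upfile/{}/{}/".format(root, user_id[-1], user_id)
    urls = []
    for key in ("image1", "image2", "image3", "image4", "image5", "image6"):
        name = post.get(key)
        if not name:
            break
        urls.append(base + name)
    return urls

# Example with placeholder data:
#   kabeuchi_image_urls("https://kabe-uchiroom.com", "12345",
#                       {"image1": "a.jpg", "image2": "b.jpg", "image3": ""})
#   -> ['https://kabe-uchiroom.com/accounts/upfile/5/12345/a.jpg',
#       'https://kabe-uchiroom.com/accounts/upfile/5/12345/b.jpg']
# ---------------------------------------------------------------------------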
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/keenspot.py��������������������������������������������������0000644�0001750�0001750�00000011154�14571514301�020761� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for http://www.keenspot.com/""" from .common import Extractor, Message from .. import text class KeenspotComicExtractor(Extractor): """Extractor for webcomics from keenspot.com""" category = "keenspot" subcategory = "comic" directory_fmt = ("{category}", "{comic}") filename_fmt = "{filename}.{extension}" archive_fmt = "{comic}_{filename}" pattern = r"(?:https?://)?(?!www\.|forums\.)([\w-]+)\.keenspot\.com(/.+)?" example = "http://COMIC.keenspot.com/" def __init__(self, match): Extractor.__init__(self, match) self.comic = match.group(1).lower() self.path = match.group(2) self.root = "http://" + self.comic + ".keenspot.com" self._needle = "" self._image = 'class="ksc"' self._next = self._next_needle def items(self): data = {"comic": self.comic} yield Message.Directory, data with self.request(self.root + "/") as response: if response.history: url = response.request.url self.root = url[:url.index("/", 8)] page = response.text del response url = self._first(page) if self.path: url = self.root + self.path prev = None ilen = len(self._image) while url and url != prev: prev = url page = self.request(text.urljoin(self.root, url)).text pos = 0 while True: pos = page.find(self._image, pos) if pos < 0: break img, pos = text.extract(page, 'src="', '"', pos + ilen) if img.endswith(".js"): continue if img[0] == "/": img = self.root + img elif "youtube.com/" in img: img = "ytdl:" + img yield Message.Url, img, text.nameext_from_url(img, data) url = self._next(page) def _first(self, page): if self.comic == "brawlinthefamily": self._next = self._next_brawl self._image = '<div id="comic">' return "http://brawlinthefamily.keenspot.com/comic/theshowdown/" url = text.extr(page, '<link rel="first" href="', '"') if url: if self.comic == "porcelain": self._needle = 'id="porArchivetop_"' else: self._next = self._next_link return url pos = page.find('id="first_day1"') if pos >= 0: self._next = self._next_id return text.rextract(page, 'href="', '"', pos)[0] pos = page.find('>FIRST PAGE<') if pos >= 0: if self.comic == "lastblood": self._next = self._next_lastblood self._image = '<div id="comic">' else: self._next = self._next_id return text.rextract(page, 'href="', '"', pos)[0] pos = page.find('<div id="kscomicpart"') if pos >= 0: self._needle = '<a href="/archive.html' return 
text.extract(page, 'href="', '"', pos)[0] pos = page.find('>First Comic<') # twokinds if pos >= 0: self._image = '</header>' self._needle = 'class="navarchive"' return text.rextract(page, 'href="', '"', pos)[0] pos = page.find('id="flip_FirstDay"') # flipside if pos >= 0: self._image = 'class="flip_Pages ksc"' self._needle = 'id="flip_ArcButton"' return text.rextract(page, 'href="', '"', pos)[0] self.log.error("Unrecognized page layout") return None def _next_needle(self, page): pos = page.index(self._needle) + len(self._needle) return text.extract(page, 'href="', '"', pos)[0] @staticmethod def _next_link(page): return text.extr(page, '<link rel="next" href="', '"') @staticmethod def _next_id(page): pos = page.find('id="next_') return text.rextract(page, 'href="', '"', pos)[0] if pos >= 0 else None @staticmethod def _next_lastblood(page): pos = page.index("link rel='next'") return text.extract(page, "href='", "'", pos)[0] @staticmethod def _next_brawl(page): pos = page.index("comic-nav-next") url = text.rextract(page, 'href="', '"', pos)[0] return None if "?random" in url else url ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1711073098.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/kemonoparty.py�����������������������������������������������0000644�0001750�0001750�00000043771�14577163512�021525� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://kemono.su/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache, memcache import itertools import json import re BASE_PATTERN = r"(?:https?://)?(?:www\.|beta\.)?(kemono|coomer)\.(su|party)" USER_PATTERN = BASE_PATTERN + r"/([^/?#]+)/user/([^/?#]+)" HASH_PATTERN = r"/[0-9a-f]{2}/[0-9a-f]{2}/([0-9a-f]{64})" class KemonopartyExtractor(Extractor): """Base class for kemonoparty extractors""" category = "kemonoparty" root = "https://kemono.su" directory_fmt = ("{category}", "{service}", "{user}") filename_fmt = "{id}_{title[:180]}_{num:>02}_{filename[:180]}.{extension}" archive_fmt = "{service}_{user}_{id}_{num}" cookies_domain = ".kemono.su" def __init__(self, match): domain = match.group(1) tld = match.group(2) self.category = domain + "party" self.root = text.root_from_url(match.group(0)) self.cookies_domain = ".{}.{}".format(domain, tld) Extractor.__init__(self, match) def _init(self): self.revisions = self.config("revisions") if self.revisions: self.revisions_unique = (self.revisions == "unique") order = self.config("order-revisions") self.revisions_reverse = order[0] in ("r", "a") if order else False self._prepare_ddosguard_cookies() self._find_inline = re.compile( r'src="(?:https?://(?:kemono|coomer)\.(?:su|party))?(/inline/[^"]+' r'|/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{64}\.[^"]+)').findall self._json_dumps = json.JSONEncoder( ensure_ascii=False, check_circular=False, sort_keys=True, separators=(",", ":")).encode def items(self): find_hash = re.compile(HASH_PATTERN).match generators = self._build_file_generators(self.config("files")) duplicates = self.config("duplicates") comments = self.config("comments") username = dms = None # prevent files from being sent with gzip compression headers = {"Accept-Encoding": "identity"} if self.config("metadata"): username = text.unescape(text.extract( self.request(self.user_url).text, '<meta name="artist_name" content="', '"')[0]) if self.config("dms"): dms = True posts = self.posts() max_posts = self.config("max-posts") if max_posts: posts = itertools.islice(posts, max_posts) for post in posts: headers["Referer"] = "{}/{}/user/{}/post/{}".format( self.root, post["service"], post["user"], post["id"]) post["_http_headers"] = headers post["date"] = self._parse_datetime( post["published"] or post["added"]) if username: post["username"] = username if comments: post["comments"] = self._extract_comments(post) if dms is not None: if dms is True: dms = self._extract_dms(post) post["dms"] = dms files = [] hashes = set() for file in itertools.chain.from_iterable( g(post) for g in generators): url = file["path"] match = find_hash(url) if match: file["hash"] = hash = match.group(1) if not duplicates: if hash in hashes: self.log.debug("Skipping %s (duplicate)", url) continue hashes.add(hash) else: file["hash"] = "" files.append(file) post["count"] = len(files) yield Message.Directory, post for post["num"], file in enumerate(files, 1): post["_http_validate"] = None post["hash"] = file["hash"] post["type"] = file["type"] url = file["path"] text.nameext_from_url(file.get("name", url), post) ext = text.ext_from_url(url) if not post["extension"]: post["extension"] = ext elif ext == "txt" and post["extension"] != "txt": post["_http_validate"] = _validate if url[0] == "/": url = self.root + "/data" + url elif url.startswith(self.root): url = self.root + "/data" + url[20:] yield Message.Url, url, post def login(self): username, password = self._get_auth_info() if username: self.cookies_update(self._login_impl( (username, self.cookies_domain), password)) @cache(maxage=28*86400, 
keyarg=1) def _login_impl(self, username, password): username = username[0] self.log.info("Logging in as %s", username) url = self.root + "/account/login" data = {"username": username, "password": password} response = self.request(url, method="POST", data=data) if response.url.endswith("/account/login") and \ "Username or password is incorrect" in response.text: raise exception.AuthenticationError() return {c.name: c.value for c in response.history[0].cookies} def _file(self, post): file = post["file"] if not file: return () file["type"] = "file" return (file,) def _attachments(self, post): for attachment in post["attachments"]: attachment["type"] = "attachment" return post["attachments"] def _inline(self, post): for path in self._find_inline(post.get("content") or ""): yield {"path": path, "name": path, "type": "inline"} def _build_file_generators(self, filetypes): if filetypes is None: return (self._attachments, self._file, self._inline) genmap = { "file" : self._file, "attachments": self._attachments, "inline" : self._inline, } if isinstance(filetypes, str): filetypes = filetypes.split(",") return [genmap[ft] for ft in filetypes] def _extract_comments(self, post): url = "{}/{}/user/{}/post/{}".format( self.root, post["service"], post["user"], post["id"]) page = self.request(url).text comments = [] for comment in text.extract_iter(page, "<article", "</article>"): extr = text.extract_from(comment) cid = extr('id="', '"') comments.append({ "id" : cid, "user": extr('href="#' + cid + '"', '</').strip(" \n\r>"), "body": extr( '<section class="comment__body">', '</section>').strip(), "date": extr('datetime="', '"'), }) return comments def _extract_dms(self, post): url = "{}/{}/user/{}/dms".format( self.root, post["service"], post["user"]) page = self.request(url).text dms = [] for dm in text.extract_iter(page, "<article", "</article>"): footer = text.extr(dm, "<footer", "</footer>") dms.append({ "body": text.unescape(text.extr( dm, "<pre>", "</pre></", ).strip()), "date": text.extr(footer, 'Published: ', '\n'), }) return dms def _parse_datetime(self, date_string): if len(date_string) > 19: date_string = date_string[:19] return text.parse_datetime(date_string, "%Y-%m-%dT%H:%M:%S") @memcache(keyarg=1) def _discord_channels(self, server): url = "{}/api/v1/discord/channel/lookup/{}".format( self.root, server) return self.request(url).json() def _revisions_post(self, post, url): post["revision_id"] = 0 try: revs = self.request(url + "/revisions").json() except exception.HttpError: post["revision_hash"] = self._revision_hash(post) post["revision_index"] = 1 post["revision_count"] = 1 return (post,) revs.insert(0, post) for rev in revs: rev["revision_hash"] = self._revision_hash(rev) if self.revisions_unique: uniq = [] last = None for rev in revs: if last != rev["revision_hash"]: last = rev["revision_hash"] uniq.append(rev) revs = uniq cnt = idx = len(revs) for rev in revs: rev["revision_index"] = idx rev["revision_count"] = cnt idx -= 1 if self.revisions_reverse: revs.reverse() return revs def _revisions_all(self, url): revs = self.request(url + "/revisions").json() cnt = idx = len(revs) for rev in revs: rev["revision_hash"] = self._revision_hash(rev) rev["revision_index"] = idx rev["revision_count"] = cnt idx -= 1 if self.revisions_reverse: revs.reverse() return revs def _revision_hash(self, revision): rev = revision.copy() rev.pop("revision_id", None) rev.pop("added", None) rev.pop("next", None) rev.pop("prev", None) rev["file"] = rev["file"].copy() rev["file"].pop("name", None) rev["attachments"] 
= [a.copy() for a in rev["attachments"]] for a in rev["attachments"]: a.pop("name", None) return util.sha1(self._json_dumps(rev)) def _validate(response): return (response.headers["content-length"] != "9" or response.content != b"not found") class KemonopartyUserExtractor(KemonopartyExtractor): """Extractor for all posts from a kemono.su user listing""" subcategory = "user" pattern = USER_PATTERN + r"/?(?:\?([^#]+))?(?:$|[?#])" example = "https://kemono.su/SERVICE/user/12345" def __init__(self, match): _, _, service, user_id, self.query = match.groups() self.subcategory = service KemonopartyExtractor.__init__(self, match) self.api_url = "{}/api/v1/{}/user/{}".format( self.root, service, user_id) self.user_url = "{}/{}/user/{}".format(self.root, service, user_id) def posts(self): url = self.api_url params = text.parse_query(self.query) params["o"] = text.parse_int(params.get("o")) while True: posts = self.request(url, params=params).json() if self.revisions: for post in posts: post_url = "{}/api/v1/{}/user/{}/post/{}".format( self.root, post["service"], post["user"], post["id"]) yield from self._revisions_post(post, post_url) else: yield from posts if len(posts) < 50: break params["o"] += 50 class KemonopartyPostsExtractor(KemonopartyExtractor): """Extractor for kemono.su post listings""" subcategory = "posts" pattern = BASE_PATTERN + r"/posts(?:/?\?([^#]+))?" example = "https://kemono.su/posts" def __init__(self, match): KemonopartyExtractor.__init__(self, match) self.query = match.group(3) self.api_url = self.root + "/api/v1/posts" posts = KemonopartyUserExtractor.posts class KemonopartyPostExtractor(KemonopartyExtractor): """Extractor for a single kemono.su post""" subcategory = "post" pattern = USER_PATTERN + r"/post/([^/?#]+)(/revisions?(?:/(\d*))?)?" 
example = "https://kemono.su/SERVICE/user/12345/post/12345" def __init__(self, match): _, _, service, user_id, post_id, self.revision, self.revision_id = \ match.groups() self.subcategory = service KemonopartyExtractor.__init__(self, match) self.api_url = "{}/api/v1/{}/user/{}/post/{}".format( self.root, service, user_id, post_id) self.user_url = "{}/{}/user/{}".format(self.root, service, user_id) def posts(self): if not self.revision: post = self.request(self.api_url).json() if self.revisions: return self._revisions_post(post, self.api_url) return (post,) revs = self._revisions_all(self.api_url) if not self.revision_id: return revs for rev in revs: if str(rev["revision_id"]) == self.revision_id: return (rev,) raise exception.NotFoundError("revision") class KemonopartyDiscordExtractor(KemonopartyExtractor): """Extractor for kemono.su discord servers""" subcategory = "discord" directory_fmt = ("{category}", "discord", "{server}", "{channel_name|channel}") filename_fmt = "{id}_{num:>02}_{filename}.{extension}" archive_fmt = "discord_{server}_{id}_{num}" pattern = BASE_PATTERN + r"/discord/server/(\d+)(?:/channel/(\d+))?#(.*)" example = "https://kemono.su/discord/server/12345#CHANNEL" def __init__(self, match): KemonopartyExtractor.__init__(self, match) _, _, self.server, self.channel_id, self.channel = match.groups() self.channel_name = "" def items(self): self._prepare_ddosguard_cookies() if self.channel_id: self.channel_name = self.channel else: if self.channel.isdecimal() and len(self.channel) >= 16: key = "id" else: key = "name" for channel in self._discord_channels(self.server): if channel[key] == self.channel: break else: raise exception.NotFoundError("channel") self.channel_id = channel["id"] self.channel_name = channel["name"] find_inline = re.compile( r"https?://(?:cdn\.discordapp.com|media\.discordapp\.net)" r"(/[A-Za-z0-9-._~:/?#\[\]@!$&'()*+,;%=]+)").findall find_hash = re.compile(HASH_PATTERN).match posts = self.posts() max_posts = self.config("max-posts") if max_posts: posts = itertools.islice(posts, max_posts) for post in posts: files = [] append = files.append for attachment in post["attachments"]: match = find_hash(attachment["path"]) attachment["hash"] = match.group(1) if match else "" attachment["type"] = "attachment" append(attachment) for path in find_inline(post["content"] or ""): append({"path": "https://cdn.discordapp.com" + path, "name": path, "type": "inline", "hash": ""}) post["channel_name"] = self.channel_name post["date"] = self._parse_datetime(post["published"]) post["count"] = len(files) yield Message.Directory, post for post["num"], file in enumerate(files, 1): post["hash"] = file["hash"] post["type"] = file["type"] url = file["path"] text.nameext_from_url(file.get("name", url), post) if not post["extension"]: post["extension"] = text.ext_from_url(url) if url[0] == "/": url = self.root + "/data" + url elif url.startswith(self.root): url = self.root + "/data" + url[20:] yield Message.Url, url, post def posts(self): url = "{}/api/v1/discord/channel/{}".format( self.root, self.channel_id) params = {"o": 0} while True: posts = self.request(url, params=params).json() yield from posts if len(posts) < 150: break params["o"] += 150 class KemonopartyDiscordServerExtractor(KemonopartyExtractor): subcategory = "discord-server" pattern = BASE_PATTERN + r"/discord/server/(\d+)$" example = "https://kemono.su/discord/server/12345" def __init__(self, match): KemonopartyExtractor.__init__(self, match) self.server = match.group(3) def items(self): for channel in 
self._discord_channels(self.server):
            url = "{}/discord/server/{}/channel/{}#{}".format(
                self.root, self.server, channel["id"], channel["name"])
            channel["_extractor"] = KemonopartyDiscordExtractor
            yield Message.Queue, url, channel


class KemonopartyFavoriteExtractor(KemonopartyExtractor):
    """Extractor for kemono.su favorites"""
    subcategory = "favorite"
    pattern = BASE_PATTERN + r"/favorites(?:/?\?([^#]+))?"
    example = "https://kemono.su/favorites"

    def __init__(self, match):
        KemonopartyExtractor.__init__(self, match)
        self.favorites = (text.parse_query(match.group(3)).get("type") or
                          self.config("favorites") or
                          "artist")

    def items(self):
        self._prepare_ddosguard_cookies()
        self.login()

        if self.favorites == "artist":
            users = self.request(
                self.root + "/api/v1/account/favorites?type=artist").json()
            for user in users:
                user["_extractor"] = KemonopartyUserExtractor
                url = "{}/{}/user/{}".format(
                    self.root, user["service"], user["id"])
                yield Message.Queue, url, user

        elif self.favorites == "post":
            posts = self.request(
                self.root + "/api/v1/account/favorites?type=post").json()
            for post in posts:
                post["_extractor"] = KemonopartyPostExtractor
                url = "{}/{}/user/{}/post/{}".format(
                    self.root, post["service"], post["user"], post["id"])
                yield Message.Queue, url, post

gallery_dl-1.26.9/gallery_dl/extractor/khinsider.py

# -*- coding: utf-8 -*-

# Copyright 2016-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://downloads.khinsider.com/"""

from .common import Extractor, Message, AsynchronousMixin
from ..
import text, exception class KhinsiderSoundtrackExtractor(AsynchronousMixin, Extractor): """Extractor for soundtracks from khinsider.com""" category = "khinsider" subcategory = "soundtrack" root = "https://downloads.khinsider.com" directory_fmt = ("{category}", "{album[name]}") archive_fmt = "{filename}.{extension}" pattern = (r"(?:https?://)?downloads\.khinsider\.com" r"/game-soundtracks/album/([^/?#]+)") example = ("https://downloads.khinsider.com" "/game-soundtracks/album/TITLE") def __init__(self, match): Extractor.__init__(self, match) self.album = match.group(1) def items(self): url = self.root + "/game-soundtracks/album/" + self.album page = self.request(url, encoding="utf-8").text if "Download all songs at once:" not in page: raise exception.NotFoundError("soundtrack") data = self.metadata(page) yield Message.Directory, data for track in self.tracks(page): track.update(data) yield Message.Url, track["url"], track def metadata(self, page): extr = text.extract_from(page) return {"album": { "name" : text.unescape(extr("<h2>", "<")), "platform": extr("Platforms: <a", "<").rpartition(">")[2], "count": text.parse_int(extr("Number of Files: <b>", "<")), "size" : text.parse_bytes(extr("Total Filesize: <b>", "<")[:-1]), "date" : extr("Date Added: <b>", "<"), "type" : text.remove_html(extr("Album type: <b>", "</b>")), }} def tracks(self, page): fmt = self.config("format", ("mp3",)) if fmt and isinstance(fmt, str): if fmt == "all": fmt = None else: fmt = fmt.lower().split(",") page = text.extr(page, '<table id="songlist">', '</table>') for num, url in enumerate(text.extract_iter( page, '<td class="clickable-row"><a href="', '"'), 1): url = text.urljoin(self.root, url) page = self.request(url, encoding="utf-8").text track = first = None for url in text.extract_iter(page, '<p><a href="', '"'): track = text.nameext_from_url(url, {"num": num, "url": url}) if first is None: first = track if not fmt or track["extension"] in fmt: first = False yield track if first: yield first ���������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/komikcast.py�������������������������������������������������0000644�0001750�0001750�00000006473�14571514301�021126� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://komikcast.lol/""" from .common import ChapterExtractor, MangaExtractor from .. import text import re BASE_PATTERN = r"(?:https?://)?(?:www\.)?komikcast\.(?:lol|site|me|com)" class KomikcastBase(): """Base class for komikcast extractors""" category = "komikcast" root = "https://komikcast.lol" @staticmethod def parse_chapter_string(chapter_string, data=None): """Parse 'chapter_string' value and add its info to 'data'""" if not data: data = {} match = re.match( r"(?:(.*) Chapter )?0*(\d+)([^ ]*)(?: (?:- )?(.+))?", text.unescape(chapter_string), ) manga, chapter, data["chapter_minor"], title = match.groups() if manga: data["manga"] = manga.partition(" Chapter ")[0] if title and not title.lower().startswith("bahasa indonesia"): data["title"] = title.strip() else: data["title"] = "" data["chapter"] = text.parse_int(chapter) data["lang"] = "id" data["language"] = "Indonesian" return data class KomikcastChapterExtractor(KomikcastBase, ChapterExtractor): """Extractor for manga-chapters from komikcast.lol""" pattern = BASE_PATTERN + r"(/chapter/[^/?#]+/)" example = "https://komikcast.lol/chapter/TITLE/" def metadata(self, page): info = text.extr(page, "<title>", " - Komikcast<") return self.parse_chapter_string(info) @staticmethod def images(page): readerarea = text.extr( page, '<div class="main-reading-area', '</div') return [ (text.unescape(url), None) for url in re.findall(r"<img[^>]* src=[\"']([^\"']+)", readerarea) ] class KomikcastMangaExtractor(KomikcastBase, MangaExtractor): """Extractor for manga from komikcast.lol""" chapterclass = KomikcastChapterExtractor pattern = BASE_PATTERN + r"(/(?:komik/)?[^/?#]+)/?$" example = "https://komikcast.lol/komik/TITLE" def chapters(self, page): results = [] data = self.metadata(page) for item in text.extract_iter( page, '<a class="chapter-link-item" href="', '</a'): url, _, chapter = item.rpartition('">Chapter') chapter, sep, minor = chapter.strip().partition(".") data["chapter"] = text.parse_int(chapter) data["chapter_minor"] = sep + minor results.append((url, data.copy())) return results @staticmethod def metadata(page): """Return a dict with general metadata""" manga , pos = text.extract(page, "<title>" , " - Komikcast<") genres, pos = text.extract( page, 'class="komik_info-content-genre">', "</span>", pos) author, pos = text.extract(page, ">Author:", "</span>", pos) mtype , pos = text.extract(page, ">Type:" , "</span>", pos) return { "manga": text.unescape(manga), "genres": text.split_html(genres), "author": text.remove_html(author), "type": text.remove_html(mtype), } �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1710004386.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/lensdump.py��������������������������������������������������0000644�0001750�0001750�00000011327�14573114242�020764� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://lensdump.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util BASE_PATTERN = r"(?:https?://)?lensdump\.com" class LensdumpBase(): """Base class for lensdump extractors""" category = "lensdump" root = "https://lensdump.com" def nodes(self, page=None): if page is None: page = self.request(self.url).text # go through all pages starting from the oldest page_url = text.urljoin(self.root, text.extr( text.extr(page, ' id="list-most-oldest-link"', '>'), 'href="', '"')) while page_url is not None: if page_url == self.url: current_page = page else: current_page = self.request(page_url).text for node in text.extract_iter( current_page, ' class="list-item ', '>'): yield node # find url of next page page_url = text.extr( text.extr(current_page, ' data-pagination="next"', '>'), 'href="', '"') if page_url is not None and len(page_url) > 0: page_url = text.urljoin(self.root, page_url) else: page_url = None class LensdumpAlbumExtractor(LensdumpBase, GalleryExtractor): subcategory = "album" pattern = BASE_PATTERN + r"/(?:((?!\w+/albums|a/|i/)\w+)|a/(\w+))" example = "https://lensdump.com/a/ID" def __init__(self, match): GalleryExtractor.__init__(self, match, match.string) self.gallery_id = match.group(1) or match.group(2) def metadata(self, page): return { "gallery_id": self.gallery_id, "title": text.unescape(text.extr( page, 'property="og:title" content="', '"').strip()) } def images(self, page): for node in self.nodes(page): # get urls and filenames of images in current page json_data = util.json_loads(text.unquote( text.extr(node, "data-object='", "'") or text.extr(node, 'data-object="', '"'))) image_id = json_data.get('name') image_url = json_data.get('url') image_title = json_data.get('title') if image_title is not None: image_title = text.unescape(image_title) yield (image_url, { 'id': image_id, 'url': image_url, 'title': image_title, 'name': json_data.get('filename'), 'filename': 
image_id,
                'extension': json_data.get('extension'),
                'height': text.parse_int(json_data.get('height')),
                'width': text.parse_int(json_data.get('width')),
            })


class LensdumpAlbumsExtractor(LensdumpBase, Extractor):
    """Extractor for album list from lensdump.com"""
    subcategory = "albums"
    pattern = BASE_PATTERN + r"/\w+/albums"
    example = "https://lensdump.com/USER/albums"

    def items(self):
        for node in self.nodes():
            album_url = text.urljoin(self.root, text.extr(
                node, 'data-url-short="', '"'))
            yield Message.Queue, album_url, {
                "_extractor": LensdumpAlbumExtractor}


class LensdumpImageExtractor(LensdumpBase, Extractor):
    """Extractor for individual images on lensdump.com"""
    subcategory = "image"
    filename_fmt = "{category}_{id}{title:?_//}.{extension}"
    directory_fmt = ("{category}",)
    archive_fmt = "{id}"
    pattern = r"(?:https?://)?(?:(?:i\d?\.)?lensdump\.com|\w\.l3n\.co)/i/(\w+)"
    example = "https://lensdump.com/i/ID"

    def __init__(self, match):
        Extractor.__init__(self, match)
        self.key = match.group(1)

    def items(self):
        url = "{}/i/{}".format(self.root, self.key)
        extr = text.extract_from(self.request(url).text)
        data = {
            "id"    : self.key,
            "title" : text.unescape(extr(
                'property="og:title" content="', '"')),
            "url"   : extr(
                'property="og:image" content="', '"'),
            "width" : text.parse_int(extr(
                'property="image:width" content="', '"')),
            "height": text.parse_int(extr(
                'property="image:height" content="', '"')),
            "date"  : text.parse_datetime(extr(
                '<span title="', '"'), "%Y-%m-%d %H:%M:%S"),
        }
        text.nameext_from_url(data["url"], data)
        yield Message.Directory, data
        yield Message.Url, data["url"], data

gallery_dl-1.26.9/gallery_dl/extractor/lexica.py

# -*- coding: utf-8 -*-

# Copyright 2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://lexica.art/"""

from .common import Extractor, Message
from .. import text


class LexicaSearchExtractor(Extractor):
    """Extractor for lexica.art search results"""
    category = "lexica"
    subcategory = "search"
    root = "https://lexica.art"
    directory_fmt = ("{category}", "{search_tags}")
    archive_fmt = "{id}"
    pattern = r"(?:https?://)?lexica\.art/?\?q=([^&#]+)"
    example = "https://lexica.art/?q=QUERY"

    def __init__(self, match):
        Extractor.__init__(self, match)
        self.query = match.group(1)
        self.text = text.unquote(self.query).replace("+", " ")

    def items(self):
        base = ("https://lexica-serve-encoded-images2.sharif.workers.dev"
                "/full_jpg/")
        tags = self.text

        for image in self.posts():
            image["filename"] = image["id"]
            image["extension"] = "jpg"
            image["search_tags"] = tags
            yield Message.Directory, image
            yield Message.Url, base + image["id"], image

    def posts(self):
        url = self.root + "/api/infinite-prompts"
        headers = {
            "Accept" : "application/json, text/plain, */*",
            "Referer": "{}/?q={}".format(self.root, self.query),
        }
        json = {
            "text"      : self.text,
            "searchMode": "images",
            "source"    : "search",
            "cursor"    : 0,
            "model"     : "lexica-aperture-v2",
        }

        while True:
            data = self.request(
                url, method="POST", headers=headers, json=json).json()

            prompts = {
                prompt["id"]: prompt
                for prompt in data["prompts"]
            }
            for image in data["images"]:
                image["prompt"] = prompts[image["promptid"]]
                del image["promptid"]
                yield image

            cursor = data.get("nextCursor")
            if not cursor:
                return
            json["cursor"] = cursor

gallery_dl-1.26.9/gallery_dl/extractor/lightroom.py

# -*- coding: utf-8 -*-

# This program is free software; you can redistribute it and/or modify
# it under
the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://lightroom.adobe.com/""" from .common import Extractor, Message from .. import text, util class LightroomGalleryExtractor(Extractor): """Extractor for an image gallery on lightroom.adobe.com""" category = "lightroom" subcategory = "gallery" directory_fmt = ("{category}", "{user}", "{title}") filename_fmt = "{num:>04}_{id}.{extension}" archive_fmt = "{id}" pattern = r"(?:https?://)?lightroom\.adobe\.com/shares/([0-9a-f]+)" example = "https://lightroom.adobe.com/shares/0123456789abcdef" def __init__(self, match): Extractor.__init__(self, match) self.href = match.group(1) def items(self): # Get config url = "https://lightroom.adobe.com/shares/" + self.href response = self.request(url) album = util.json_loads( text.extr(response.text, "albumAttributes: ", "\n") ) images = self.images(album) for img in images: url = img["url"] yield Message.Directory, img yield Message.Url, url, text.nameext_from_url(url, img) def metadata(self, album): payload = album["payload"] story = payload.get("story") or {} return { "gallery_id": self.href, "user": story.get("author", ""), "title": story.get("title", payload["name"]), } def images(self, album): album_md = self.metadata(album) base_url = album["base"] next_url = album["links"]["/rels/space_album_images_videos"]["href"] num = 1 while next_url: url = base_url + next_url page = self.request(url).text # skip 1st line as it's a JS loop data = util.json_loads(page[page.index("\n") + 1:]) base_url = data["base"] for res in data["resources"]: img_url, img_size = None, 0 for key, value in res["asset"]["links"].items(): if not key.startswith("/rels/rendition_type/"): continue size = text.parse_int(key.split("/")[-1]) if size > img_size: img_size = size img_url = value["href"] if img_url: img = { "id": res["asset"]["id"], "num": num, "url": base_url + img_url, } img.update(album_md) yield img num += 1 try: next_url = data["links"]["next"]["href"] except KeyError: next_url = None ��������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.26.9/gallery_dl/extractor/livedoor.py��������������������������������������������������0000644�0001750�0001750�00000010067�14571514301�020756� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for http://blog.livedoor.jp/""" from .common import Extractor, Message from .. import text class LivedoorExtractor(Extractor): """Base class for livedoor extractors""" category = "livedoor" root = "http://blog.livedoor.jp" filename_fmt = "{post[id]}_{post[title]}_{num:>02}.{extension}" directory_fmt = ("{category}", "{post[user]}") archive_fmt = "{post[id]}_{hash}" def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) def items(self): for post in self.posts(): images = self._images(post) if images: yield Message.Directory, {"post": post} for image in images: yield Message.Url, image["url"], image def posts(self): """Return an iterable with post objects""" def _load(self, data, body): extr = text.extract_from(data) tags = text.extr(body, 'class="article-tags">', '</dl>') about = extr('rdf:about="', '"') return { "id" : text.parse_int( about.rpartition("/")[2].partition(".")[0]), "title" : text.unescape(extr('dc:title="', '"')), "categories" : extr('dc:subject="', '"').partition(",")[::2], "description": extr('dc:description="', '"'), "date" : text.parse_datetime(extr('dc:date="', '"')), "tags" : text.split_html(tags)[1:] if tags else [], "user" : self.user, "body" : body, } def _images(self, post): imgs = [] body = post.pop("body") for num, img in enumerate(text.extract_iter(body, "<img ", ">"), 1): src = text.extr(img, 'src="', '"') alt = text.extr(img, 'alt="', '"') if not src: continue if "://livedoor.blogimg.jp/" in src: url = src.replace("http:", "https:", 1).replace("-s.", ".") else: url = text.urljoin(self.root, src) name, _, ext = url.rpartition("/")[2].rpartition(".") imgs.append({ "url" : url, "num" : num, "hash" : name, "filename" : alt or name, "extension": ext, "post" : post, }) return imgs class LivedoorBlogExtractor(LivedoorExtractor): """Extractor for a user's blog on blog.livedoor.jp""" subcategory = "blog" pattern = r"(?:https?://)?blog\.livedoor\.jp/(\w+)/?(?:$|[?#])" example = "http://blog.livedoor.jp/USER/" def posts(self): url = "{}/{}".format(self.root, self.user) while url: extr = text.extract_from(self.request(url).text) while True: data = extr('<rdf:RDF', '</rdf:RDF>') if not data: break body = extr('class="article-body-inner">', 'class="article-footer">') yield self._load(data, body) url = extr('<a rel="next" href="', '"') class LivedoorPostExtractor(LivedoorExtractor): """Extractor for images from a blog post on blog.livedoor.jp""" subcategory = "post" pattern = r"(?:https?://)?blog\.livedoor\.jp/(\w+)/archives/(\d+)" example = "http://blog.livedoor.jp/USER/archives/12345.html" def __init__(self, match): LivedoorExtractor.__init__(self, match) self.post_id = match.group(2) def posts(self): url = "{}/{}/archives/{}.html".format( self.root, self.user, self.post_id) extr = text.extract_from(self.request(url).text) data = extr('<rdf:RDF', '</rdf:RDF>') body = extr('class="article-body-inner">', 'class="article-footer">') return 
(self._load(data, body),)

gallery_dl-1.26.9/gallery_dl/extractor/lolisafe.py

# -*- coding: utf-8 -*-

# Copyright 2021-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for lolisafe/chibisafe instances"""

from .common import BaseExtractor, Message
from ..
import text


class LolisafeExtractor(BaseExtractor):
    """Base class for lolisafe extractors"""
    basecategory = "lolisafe"
    directory_fmt = ("{category}", "{album_name} ({album_id})")
    archive_fmt = "{album_id}_{id}"


BASE_PATTERN = LolisafeExtractor.update({
    "xbunkr": {
        "root": "https://xbunkr.com",
        "pattern": r"xbunkr\.com",
    },
})


class LolisafeAlbumExtractor(LolisafeExtractor):
    subcategory = "album"
    pattern = BASE_PATTERN + "/a/([^/?#]+)"
    example = "https://xbunkr.com/a/ID"

    def __init__(self, match):
        LolisafeExtractor.__init__(self, match)
        self.album_id = match.group(match.lastindex)

    def _init(self):
        domain = self.config("domain")
        if domain == "auto":
            self.root = text.root_from_url(self.url)
        elif domain:
            self.root = text.ensure_http_scheme(domain)

    def items(self):
        files, data = self.fetch_album(self.album_id)

        yield Message.Directory, data
        for data["num"], file in enumerate(files, 1):
            url = file["file"]
            file.update(data)
            text.nameext_from_url(url, file)
            file["name"], sep, file["id"] = file["filename"].rpartition("-")
            yield Message.Url, url, file

    def fetch_album(self, album_id):
        url = "{}/api/album/get/{}".format(self.root, album_id)
        data = self.request(url).json()

        return data["files"], {
            "album_id"  : self.album_id,
            "album_name": text.unescape(data["title"]),
            "count"     : data["count"],
        }

gallery_dl-1.26.9/gallery_dl/extractor/luscious.py

# -*- coding: utf-8 -*-

# Copyright 2016-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://members.luscious.net/"""

from .common import Extractor, Message
from ..
import text, exception class LusciousExtractor(Extractor): """Base class for luscious extractors""" category = "luscious" cookies_domain = ".luscious.net" root = "https://members.luscious.net" def _graphql(self, op, variables, query): data = { "id" : 1, "operationName": op, "query" : query, "variables" : variables, } response = self.request( "{}/graphql/nobatch/?operationName={}".format(self.root, op), method="POST", json=data, fatal=False, ) if response.status_code >= 400: self.log.debug("Server response: %s", response.text) raise exception.StopExtraction( "GraphQL query failed ('%s %s')", response.status_code, response.reason) return response.json()["data"] class LusciousAlbumExtractor(LusciousExtractor): """Extractor for image albums from luscious.net""" subcategory = "album" filename_fmt = "{category}_{album[id]}_{num:>03}.{extension}" directory_fmt = ("{category}", "{album[id]} {album[title]}") archive_fmt = "{album[id]}_{id}" pattern = (r"(?:https?://)?(?:www\.|members\.)?luscious\.net" r"/(?:albums|pictures/c/[^/?#]+/album)/[^/?#]+_(\d+)") example = "https://luscious.net/albums/TITLE_12345/" def __init__(self, match): LusciousExtractor.__init__(self, match) self.album_id = match.group(1) def _init(self): self.gif = self.config("gif", False) def items(self): album = self.metadata() yield Message.Directory, {"album": album} for num, image in enumerate(self.images(), 1): image["num"] = num image["album"] = album try: image["thumbnail"] = image.pop("thumbnails")[0]["url"] except LookupError: image["thumbnail"] = "" image["tags"] = [item["text"] for item in image["tags"]] image["date"] = text.parse_timestamp(image["created"]) image["id"] = text.parse_int(image["id"]) url = (image["url_to_original"] or image["url_to_video"] if self.gif else image["url_to_video"] or image["url_to_original"]) yield Message.Url, url, text.nameext_from_url(url, image) def metadata(self): variables = { "id": self.album_id, } query = """ query AlbumGet($id: ID!) { album { get(id: $id) { ... on Album { ...AlbumStandard } ... on MutationError { errors { code message } } } } } fragment AlbumStandard on Album { __typename id title labels description created modified like_status number_of_favorites rating status marked_for_deletion marked_for_processing number_of_pictures number_of_animated_pictures slug is_manga url download_url permissions cover { width height size url } created_by { id name display_name user_title avatar { url size } url } content { id title url } language { id title url } tags { id category text url count } genres { id title slug url } audiences { id title url url } last_viewed_picture { id position url } } """ album = self._graphql("AlbumGet", variables, query)["album"]["get"] if "errors" in album: raise exception.NotFoundError("album") album["audiences"] = [item["title"] for item in album["audiences"]] album["genres"] = [item["title"] for item in album["genres"]] album["tags"] = [item["text"] for item in album["tags"]] album["cover"] = album["cover"]["url"] album["content"] = album["content"]["title"] album["language"] = album["language"]["title"].partition(" ")[0] album["created_by"] = album["created_by"]["display_name"] album["id"] = text.parse_int(album["id"]) album["date"] = text.parse_timestamp(album["created"]) return album def images(self): variables = { "input": { "filters": [{ "name" : "album_id", "value": self.album_id, }], "display": "position", "page" : 1, }, } query = """ query AlbumListOwnPictures($input: PictureListInput!) 
{ picture { list(input: $input) { info { ...FacetCollectionInfo } items { ...PictureStandardWithoutAlbum } } } } fragment FacetCollectionInfo on FacetCollectionInfo { page has_next_page has_previous_page total_items total_pages items_per_page url_complete url_filters_only } fragment PictureStandardWithoutAlbum on Picture { __typename id title created like_status number_of_comments number_of_favorites status width height resolution aspect_ratio url_to_original url_to_video is_animated position tags { id category text url } permissions url thumbnails { width height size url } } """ while True: data = self._graphql("AlbumListOwnPictures", variables, query) yield from data["picture"]["list"]["items"] if not data["picture"]["list"]["info"]["has_next_page"]: return variables["input"]["page"] += 1 class LusciousSearchExtractor(LusciousExtractor): """Extractor for album searches on luscious.net""" subcategory = "search" pattern = (r"(?:https?://)?(?:www\.|members\.)?luscious\.net" r"/albums/list/?(?:\?([^#]+))?") example = "https://luscious.net/albums/list/?tagged=TAG" def __init__(self, match): LusciousExtractor.__init__(self, match) self.query = match.group(1) def items(self): query = text.parse_query(self.query) display = query.pop("display", "date_newest") page = query.pop("page", None) variables = { "input": { "display": display, "filters": [{"name": n, "value": v} for n, v in query.items()], "page": text.parse_int(page, 1), }, } query = """ query AlbumListWithPeek($input: AlbumListInput!) { album { list(input: $input) { info { ...FacetCollectionInfo } items { ...AlbumMinimal peek_thumbnails { width height size url } } } } } fragment FacetCollectionInfo on FacetCollectionInfo { page has_next_page has_previous_page total_items total_pages items_per_page url_complete url_filters_only } fragment AlbumMinimal on Album { __typename id title labels description created modified number_of_favorites number_of_pictures slug is_manga url download_url cover { width height size url } content { id title url } language { id title url } tags { id category text url count } genres { id title slug url } audiences { id title url } } """ while True: data = self._graphql("AlbumListWithPeek", variables, query) for album in data["album"]["list"]["items"]: album["url"] = self.root + album["url"] album["_extractor"] = LusciousAlbumExtractor yield Message.Queue, album["url"], album if not data["album"]["list"]["info"]["has_next_page"]: return variables["input"]["page"] += 1 �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1709611201.0 
gallery_dl-1.26.9/gallery_dl/extractor/lynxchan.py

# -*- coding: utf-8 -*-

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for LynxChan Imageboards"""

from .common import BaseExtractor, Message
from .. import text
import itertools


class LynxchanExtractor(BaseExtractor):
    """Base class for LynxChan extractors"""
    basecategory = "lynxchan"


BASE_PATTERN = LynxchanExtractor.update({
    "bbw-chan": {
        "root": "https://bbw-chan.link",
        "pattern": r"bbw-chan\.(?:link|nl)",
    },
    "kohlchan": {
        "root": "https://kohlchan.net",
        "pattern": r"kohlchan\.net",
    },
    "endchan": {
        "root": None,
        "pattern": r"endchan\.(?:org|net|gg)",
    },
})


class LynxchanThreadExtractor(LynxchanExtractor):
    """Extractor for LynxChan threads"""
    subcategory = "thread"
    directory_fmt = ("{category}", "{boardUri}",
                     "{threadId} {subject|message[:50]}")
    filename_fmt = "{postId}{num:?-//} {filename}.{extension}"
    archive_fmt = "{boardUri}_{postId}_{num}"
    pattern = BASE_PATTERN + r"/([^/?#]+)/res/(\d+)"
    example = "https://endchan.org/a/res/12345.html"

    def __init__(self, match):
        LynxchanExtractor.__init__(self, match)
        index = match.lastindex
        self.board = match.group(index-1)
        self.thread = match.group(index)

    def items(self):
        url = "{}/{}/res/{}.json".format(self.root, self.board, self.thread)
        thread = self.request(url).json()
        thread["postId"] = thread["threadId"]
        posts = thread.pop("posts", ())

        yield Message.Directory, thread
        for post in itertools.chain((thread,), posts):
            files = post.pop("files", ())
            if files:
                thread.update(post)
                for num, file in enumerate(files):
                    file.update(thread)
                    file["num"] = num
                    url = self.root + file["path"]
                    text.nameext_from_url(file["originalName"], file)
                    yield Message.Url, url, file


class LynxchanBoardExtractor(LynxchanExtractor):
    """Extractor for LynxChan boards"""
    subcategory = "board"
    pattern = BASE_PATTERN + r"/([^/?#]+)(?:/index|/catalog|/\d+|/?$)"
    example = "https://endchan.org/a/"

    def __init__(self, match):
        LynxchanExtractor.__init__(self, match)
        self.board = match.group(match.lastindex)

    def items(self):
        url = "{}/{}/catalog.json".format(self.root, self.board)
        for thread in self.request(url).json():
            url = "{}/{}/res/{}.html".format(
                self.root, self.board, thread["threadId"])
            thread["_extractor"] = LynxchanThreadExtractor
            yield Message.Queue, url, thread
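The thread walk in LynxchanThreadExtractor.items() above can be tried outside of gallery-dl: a LynxChan board serves each thread as JSON at "<root>/<board>/res/<thread>.json", and the extractor simply chains the opening post with its replies and emits one entry per attached file. Below is a minimal standalone sketch of that walk, using only the Python standard library; it is not part of gallery-dl, and the root, board, and thread values in the usage comment are placeholders, not taken from the source above.

import itertools
import json
from urllib.request import urlopen


def lynxchan_thread_files(root, board, thread):
    """Yield (url, post_id, num) for every file attached to a LynxChan thread."""
    with urlopen("{}/{}/res/{}.json".format(root, board, thread)) as resp:
        data = json.loads(resp.read().decode())
    # The opening post carries "threadId"; mirror the extractor and reuse it as "postId".
    data["postId"] = data["threadId"]
    posts = data.pop("posts", ())
    for post in itertools.chain((data,), posts):
        for num, file in enumerate(post.get("files") or ()):
            # "path" is site-relative, so prefix the board's root URL.
            yield root + file["path"], post["postId"], num


# Usage (placeholder values):
# for url, post_id, num in lynxchan_thread_files("https://endchan.org", "a", "12345"):
#     print(post_id, num, url)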
gallery_dl-1.26.9/gallery_dl/extractor/mangadex.py

# -*- coding: utf-8 -*-

# Copyright 2018-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://mangadex.org/"""

from .common import Extractor, Message
from ..
import text, util, exception from ..cache import cache, memcache from collections import defaultdict BASE_PATTERN = r"(?:https?://)?(?:www\.)?mangadex\.(?:org|cc)" class MangadexExtractor(Extractor): """Base class for mangadex extractors""" category = "mangadex" directory_fmt = ( "{category}", "{manga}", "{volume:?v/ />02}c{chapter:>03}{chapter_minor}{title:?: //}") filename_fmt = ( "{manga}_c{chapter:>03}{chapter_minor}_{page:>03}.{extension}") archive_fmt = "{chapter_id}_{page}" root = "https://mangadex.org" _cache = {} def __init__(self, match): Extractor.__init__(self, match) self.uuid = match.group(1) def _init(self): self.session.headers["User-Agent"] = util.USERAGENT self.api = MangadexAPI(self) def items(self): for chapter in self.chapters(): uuid = chapter["id"] data = self._transform(chapter) data["_extractor"] = MangadexChapterExtractor self._cache[uuid] = data yield Message.Queue, self.root + "/chapter/" + uuid, data def _transform(self, chapter): relationships = defaultdict(list) for item in chapter["relationships"]: relationships[item["type"]].append(item) manga = self.api.manga(relationships["manga"][0]["id"]) for item in manga["relationships"]: relationships[item["type"]].append(item) cattributes = chapter["attributes"] mattributes = manga["attributes"] lang = cattributes.get("translatedLanguage") if lang: lang = lang.partition("-")[0] if cattributes["chapter"]: chnum, sep, minor = cattributes["chapter"].partition(".") else: chnum, sep, minor = 0, "", "" data = { "manga" : (mattributes["title"].get("en") or next(iter(mattributes["title"].values()))), "manga_id": manga["id"], "title" : cattributes["title"], "volume" : text.parse_int(cattributes["volume"]), "chapter" : text.parse_int(chnum), "chapter_minor": sep + minor, "chapter_id": chapter["id"], "date" : text.parse_datetime(cattributes["publishAt"]), "lang" : lang, "language": util.code_to_language(lang), "count" : cattributes["pages"], "_external_url": cattributes.get("externalUrl"), } data["artist"] = [artist["attributes"]["name"] for artist in relationships["artist"]] data["author"] = [author["attributes"]["name"] for author in relationships["author"]] data["group"] = [group["attributes"]["name"] for group in relationships["scanlation_group"]] data["status"] = mattributes["status"] data["tags"] = [tag["attributes"]["name"]["en"] for tag in mattributes["tags"]] return data class MangadexChapterExtractor(MangadexExtractor): """Extractor for manga-chapters from mangadex.org""" subcategory = "chapter" pattern = BASE_PATTERN + r"/chapter/([0-9a-f-]+)" example = ("https://mangadex.org/chapter" "/01234567-89ab-cdef-0123-456789abcdef") def items(self): try: data = self._cache.pop(self.uuid) except KeyError: chapter = self.api.chapter(self.uuid) data = self._transform(chapter) if data.get("_external_url") and not data["count"]: raise exception.StopExtraction( "Chapter %s%s is not available on MangaDex and can instead be " "read on the official publisher's website at %s.", data["chapter"], data["chapter_minor"], data["_external_url"]) yield Message.Directory, data server = self.api.athome_server(self.uuid) chapter = server["chapter"] base = "{}/data/{}/".format(server["baseUrl"], chapter["hash"]) enum = util.enumerate_reversed if self.config( "page-reverse") else enumerate for data["page"], page in enum(chapter["data"], 1): text.nameext_from_url(page, data) yield Message.Url, base + page, data class MangadexMangaExtractor(MangadexExtractor): """Extractor for manga from mangadex.org""" subcategory = "manga" pattern = BASE_PATTERN 
+ r"/(?:title|manga)/(?!feed$)([0-9a-f-]+)" example = ("https://mangadex.org/title" "/01234567-89ab-cdef-0123-456789abcdef") def chapters(self): return self.api.manga_feed(self.uuid) class MangadexFeedExtractor(MangadexExtractor): """Extractor for chapters from your Followed Feed""" subcategory = "feed" pattern = BASE_PATTERN + r"/title/feed$()" example = "https://mangadex.org/title/feed" def chapters(self): return self.api.user_follows_manga_feed() class MangadexListExtractor(MangadexExtractor): """Extractor for mangadex lists""" subcategory = "list" pattern = (BASE_PATTERN + r"/list/([0-9a-f-]+)(?:/[^/?#]*)?(?:\?tab=(\w+))?") example = ("https://mangadex.org/list" "/01234567-89ab-cdef-0123-456789abcdef/NAME") def __init__(self, match): MangadexExtractor.__init__(self, match) if match.group(2) == "feed": self.subcategory = "list-feed" else: self.items = self._items_titles def chapters(self): return self.api.list_feed(self.uuid) def _items_titles(self): data = {"_extractor": MangadexMangaExtractor} for item in self.api.list(self.uuid)["relationships"]: if item["type"] == "manga": url = "{}/title/{}".format(self.root, item["id"]) yield Message.Queue, url, data class MangadexAPI(): """Interface for the MangaDex API v5 https://api.mangadex.org/docs/ """ def __init__(self, extr): self.extractor = extr self.headers = {} self.username, self.password = extr._get_auth_info() if not self.username: self.authenticate = util.noop server = extr.config("api-server") self.root = ("https://api.mangadex.org" if server is None else text.ensure_http_scheme(server).rstrip("/")) def athome_server(self, uuid): return self._call("/at-home/server/" + uuid) def chapter(self, uuid): params = {"includes[]": ("scanlation_group",)} return self._call("/chapter/" + uuid, params)["data"] def list(self, uuid): return self._call("/list/" + uuid)["data"] def list_feed(self, uuid): return self._pagination("/list/" + uuid + "/feed") @memcache(keyarg=1) def manga(self, uuid): params = {"includes[]": ("artist", "author")} return self._call("/manga/" + uuid, params)["data"] def manga_feed(self, uuid): order = "desc" if self.extractor.config("chapter-reverse") else "asc" params = { "order[volume]" : order, "order[chapter]": order, } return self._pagination("/manga/" + uuid + "/feed", params) def user_follows_manga_feed(self): params = {"order[publishAt]": "desc"} return self._pagination("/user/follows/manga/feed", params) def authenticate(self): self.headers["Authorization"] = \ self._authenticate_impl(self.username, self.password) @cache(maxage=900, keyarg=1) def _authenticate_impl(self, username, password): refresh_token = _refresh_token_cache(username) if refresh_token: self.extractor.log.info("Refreshing access token") url = self.root + "/auth/refresh" data = {"token": refresh_token} else: self.extractor.log.info("Logging in as %s", username) url = self.root + "/auth/login" data = {"username": username, "password": password} data = self.extractor.request( url, method="POST", json=data, fatal=None).json() if data.get("result") != "ok": raise exception.AuthenticationError() if refresh_token != data["token"]["refresh"]: _refresh_token_cache.update(username, data["token"]["refresh"]) return "Bearer " + data["token"]["session"] def _call(self, endpoint, params=None): url = self.root + endpoint while True: self.authenticate() response = self.extractor.request( url, params=params, headers=self.headers, fatal=None) if response.status_code < 400: return response.json() if response.status_code == 429: until = 
response.headers.get("X-RateLimit-Retry-After")
                self.extractor.wait(until=until)
                continue

            msg = ", ".join('{title}: {detail}'.format_map(error)
                            for error in response.json()["errors"])
            raise exception.StopExtraction(
                "%s %s (%s)", response.status_code, response.reason, msg)

    def _pagination(self, endpoint, params=None):
        if params is None:
            params = {}

        config = self.extractor.config
        ratings = config("ratings")
        if ratings is None:
            ratings = ("safe", "suggestive", "erotica", "pornographic")

        lang = config("lang")
        if isinstance(lang, str) and "," in lang:
            lang = lang.split(",")

        params["contentRating[]"] = ratings
        params["translatedLanguage[]"] = lang
        params["includes[]"] = ("scanlation_group",)
        params["offset"] = 0

        api_params = config("api-parameters")
        if api_params:
            params.update(api_params)

        while True:
            data = self._call(endpoint, params)
            yield from data["data"]

            params["offset"] = data["offset"] + data["limit"]
            if params["offset"] >= data["total"]:
                return


@cache(maxage=28*86400, keyarg=0)
def _refresh_token_cache(username):
    return None

gallery_dl-1.26.9/gallery_dl/extractor/mangafox.py

# -*- coding: utf-8 -*-

# Copyright 2017-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://fanfox.net/"""

from .common import ChapterExtractor, MangaExtractor
from ..
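# Illustrative sketch of the offset-based paging used by MangadexAPI._pagination
# above: advance "offset" by the server-reported "limit" until it reaches
# "total". Standalone and simplified (plain requests, no auth or config);
# not part of the gallery_dl source.
import requests

def mangadex_manga_feed(manga_uuid):
    """Yield every chapter object of a manga, one API page at a time."""
    url = "https://api.mangadex.org/manga/" + manga_uuid + "/feed"
    params = {"offset": 0}
    while True:
        data = requests.get(url, params=params, timeout=30).json()
        yield from data["data"]
        # advance by the page size the server actually used
        params["offset"] = data["offset"] + data["limit"]
        if params["offset"] >= data["total"]:
            return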
import text BASE_PATTERN = r"(?:https?://)?(?:www\.|m\.)?(?:fanfox\.net|mangafox\.me)" class MangafoxChapterExtractor(ChapterExtractor): """Extractor for manga chapters from fanfox.net""" category = "mangafox" root = "https://m.fanfox.net" pattern = BASE_PATTERN + \ r"(/manga/[^/?#]+/((?:v([^/?#]+)/)?c(\d+)([^/?#]*)))" example = "https://fanfox.net/manga/TITLE/v01/c001/1.html" def __init__(self, match): base, self.cstr, self.volume, self.chapter, self.minor = match.groups() self.urlbase = self.root + base ChapterExtractor.__init__(self, match, self.urlbase + "/1.html") def metadata(self, page): manga, pos = text.extract(page, "<title>", "") count, pos = text.extract( page, ">", "<", page.find("", pos) - 20) sid , pos = text.extract(page, "var series_id =", ";", pos) cid , pos = text.extract(page, "var chapter_id =", ";", pos) return { "manga" : text.unescape(manga), "volume" : text.parse_int(self.volume), "chapter" : text.parse_int(self.chapter), "chapter_minor" : self.minor or "", "chapter_string": self.cstr, "count" : text.parse_int(count), "sid" : text.parse_int(sid), "cid" : text.parse_int(cid), } def images(self, page): pnum = 1 while True: url, pos = text.extract(page, '', '

    ') author = extr('

    Author(s):', '

    ') extr('
    ', '') genres, _, summary = text.extr( page, '
    ', '' ).partition('
    ') data = { "manga" : text.unescape(manga), "author" : text.remove_html(author), "description": text.unescape(text.remove_html(summary)), "tags" : text.split_html(genres), "lang" : "en", "language" : "English", } while True: url = "https://" + extr('', ''), "%b %d, %Y"), } chapter.update(data) results.append((url, chapter)) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/mangahere.py0000644000175000017500000001077114571514301021064 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.mangahere.cc/""" from .common import ChapterExtractor, MangaExtractor from .. import text import re class MangahereBase(): """Base class for mangahere extractors""" category = "mangahere" root = "https://www.mangahere.cc" root_mobile = "https://m.mangahere.cc" url_fmt = root_mobile + "/manga/{}/{}.html" class MangahereChapterExtractor(MangahereBase, ChapterExtractor): """Extractor for manga-chapters from mangahere.cc""" pattern = (r"(?:https?://)?(?:www\.|m\.)?mangahere\.c[co]/manga/" r"([^/]+(?:/v0*(\d+))?/c([^/?#]+))") example = "https://www.mangahere.cc/manga/TITLE/c001/1.html" def __init__(self, match): self.part, self.volume, self.chapter = match.groups() url = self.url_fmt.format(self.part, 1) ChapterExtractor.__init__(self, match, url) def _init(self): self.session.headers["Referer"] = self.root_mobile + "/" def metadata(self, page): pos = page.index("") count , pos = text.extract(page, ">", "<", pos - 20) manga_id , pos = text.extract(page, "series_id = ", ";", pos) chapter_id, pos = text.extract(page, "chapter_id = ", ";", pos) manga , pos = text.extract(page, '"name":"', '"', pos) chapter, dot, minor = self.chapter.partition(".") return { "manga": text.unescape(manga), "manga_id": text.parse_int(manga_id), "title": self._get_title(), "volume": text.parse_int(self.volume), "chapter": text.parse_int(chapter), "chapter_minor": dot + minor, "chapter_id": text.parse_int(chapter_id), "count": text.parse_int(count), "lang": "en", "language": "English", } def images(self, page): pnum = 1 while True: url, pos = text.extract(page, '', '<', pos) date, pos = text.extract(page, 'class="title2">', '<', pos) match = re.match( r"(?:Vol\.0*(\d+) )?Ch\.0*(\d+)(\S*)(?: - (.*))?", info) if match: volume, chapter, minor, title = match.groups() else: chapter, _, minor = url[:-1].rpartition("/c")[2].partition(".") minor = "." + minor volume = 0 title = "" results.append((text.urljoin(self.root, url), { "manga": manga, "title": text.unescape(title) if title else "", "volume": text.parse_int(volume), "chapter": text.parse_int(chapter), "chapter_minor": minor, "date": date, "lang": "en", "language": "English", })) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/mangakakalot.py0000644000175000017500000000664214571514301021571 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020 Jake Mannens # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://mangakakalot.tv/""" from .common import ChapterExtractor, MangaExtractor from .. 
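# Illustrative example of the chapter-label regex used in
# MangahereMangaExtractor.chapters() above. Standalone sketch, not part of
# this module; the pattern is copied verbatim from the code above.
import re

_CHAPTER_RE = re.compile(r"(?:Vol\.0*(\d+) )?Ch\.0*(\d+)(\S*)(?: - (.*))?")

def split_chapter_label(info):
    """Split a label like 'Vol.02 Ch.013.5 - A Title' into its parts."""
    match = _CHAPTER_RE.match(info)
    return match.groups() if match else None

# split_chapter_label("Vol.02 Ch.013.5 - A Title")
# -> ("2", "13", ".5", "A Title")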
import text import re BASE_PATTERN = r"(?:https?://)?(?:ww[\dw]?\.)?mangakakalot\.tv" class MangakakalotBase(): """Base class for mangakakalot extractors""" category = "mangakakalot" root = "https://ww6.mangakakalot.tv" class MangakakalotChapterExtractor(MangakakalotBase, ChapterExtractor): """Extractor for manga chapters from mangakakalot.tv""" pattern = BASE_PATTERN + r"(/chapter/[^/?#]+/chapter[_-][^/?#]+)" example = "https://ww6.mangakakalot.tv/chapter/manga-ID/chapter-01" def __init__(self, match): self.path = match.group(1) ChapterExtractor.__init__(self, match, self.root + self.path) def metadata(self, page): _ , pos = text.extract(page, '', '<') manga , pos = text.extract(page, '', '<', pos) info , pos = text.extract(page, '', '<', pos) author, pos = text.extract(page, '. Author:', ' already has ', pos) match = re.match( r"(?:[Vv]ol\. *(\d+) )?" r"[Cc]hapter *([^:]*)" r"(?:: *(.+))?", info) volume, chapter, title = match.groups() if match else ("", "", info) chapter, sep, minor = chapter.partition(".") return { "manga" : text.unescape(manga), "title" : text.unescape(title) if title else "", "author" : text.unescape(author).strip() if author else "", "volume" : text.parse_int(volume), "chapter" : text.parse_int(chapter), "chapter_minor": sep + minor, "lang" : "en", "language" : "English", } def images(self, page): return [ (url, None) for url in text.extract_iter(page, '", "<") author, pos = text.extract(page, "
  • Author(s) :", "", pos) data["author"] = text.remove_html(author) results = [] for chapter in text.extract_iter(page, '
    ', '
    '): url, pos = text.extract(chapter, '', '', pos) data["title"] = title.partition(": ")[2] data["date"] , pos = text.extract( chapter, '") manga = extr('title="', '"') info = extr('title="', '"') author = extr("- Author(s) : ", "

    ") return self._parse_chapter( info, text.unescape(manga), text.unescape(author)) def images(self, page): page = text.extr( page, 'class="container-chapter-reader', 'class="container') return [ (url, None) for url in text.extract_iter(page, '", "<")) author = text.remove_html(extr("Author(s) :", "")) extr('class="row-content-chapter', '') while True: url = extr('class="chapter-name text-nowrap" href="', '"') if not url: return results info = extr(">", "<") date = extr('class="chapter-time text-nowrap" title="', '"') append((url, self._parse_chapter(info, manga, author, date))) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/mangapark.py0000644000175000017500000002162114571514301021072 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://mangapark.net/""" from .common import ChapterExtractor, Extractor, Message from .. import text, util, exception import re BASE_PATTERN = r"(?:https?://)?(?:www\.)?mangapark\.(?:net|com|org|io|me)" class MangaparkBase(): """Base class for mangapark extractors""" category = "mangapark" _match_title = None def _parse_chapter_title(self, title): if not self._match_title: MangaparkBase._match_title = re.compile( r"(?i)" r"(?:vol(?:\.|ume)?\s*(\d+)\s*)?" r"ch(?:\.|apter)?\s*(\d+)([^\s:]*)" r"(?:\s*:\s*(.*))?" ).match match = self._match_title(title) return match.groups() if match else (0, 0, "", "") class MangaparkChapterExtractor(MangaparkBase, ChapterExtractor): """Extractor for manga-chapters from mangapark.net""" pattern = BASE_PATTERN + r"/title/[^/?#]+/(\d+)" example = "https://mangapark.net/title/MANGA/12345-en-ch.01" def __init__(self, match): self.root = text.root_from_url(match.group(0)) url = "{}/title/_/{}".format(self.root, match.group(1)) ChapterExtractor.__init__(self, match, url) def metadata(self, page): data = util.json_loads(text.extr( page, 'id="__NEXT_DATA__" type="application/json">', '<')) chapter = (data["props"]["pageProps"]["dehydratedState"] ["queries"][0]["state"]["data"]["data"]) manga = chapter["comicNode"]["data"] source = chapter["sourceNode"]["data"] self._urls = chapter["imageSet"]["httpLis"] self._params = chapter["imageSet"]["wordLis"] vol, ch, minor, title = self._parse_chapter_title(chapter["dname"]) return { "manga" : manga["name"], "manga_id" : manga["id"], "artist" : source["artists"], "author" : source["authors"], "genre" : source["genres"], "volume" : text.parse_int(vol), "chapter" : text.parse_int(ch), "chapter_minor": minor, "chapter_id": chapter["id"], "title" : chapter["title"] or title or "", "lang" : chapter["lang"], "language" : util.code_to_language(chapter["lang"]), "source" : source["srcTitle"], "source_id" : source["id"], "date" : text.parse_timestamp(chapter["dateCreate"] // 1000), } def images(self, page): return [ (url + "?" 
+ params, None) for url, params in zip(self._urls, self._params) ] class MangaparkMangaExtractor(MangaparkBase, Extractor): """Extractor for manga from mangapark.net""" subcategory = "manga" pattern = BASE_PATTERN + r"/title/(\d+)(?:-[^/?#]*)?/?$" example = "https://mangapark.net/title/12345-MANGA" def __init__(self, match): self.root = text.root_from_url(match.group(0)) self.manga_id = int(match.group(1)) Extractor.__init__(self, match) def items(self): for chapter in self.chapters(): chapter = chapter["data"] url = self.root + chapter["urlPath"] vol, ch, minor, title = self._parse_chapter_title(chapter["dname"]) data = { "manga_id" : self.manga_id, "volume" : text.parse_int(vol), "chapter" : text.parse_int(ch), "chapter_minor": minor, "chapter_id": chapter["id"], "title" : chapter["title"] or title or "", "lang" : chapter["lang"], "language" : util.code_to_language(chapter["lang"]), "source" : chapter["srcTitle"], "source_id" : chapter["sourceId"], "date" : text.parse_timestamp( chapter["dateCreate"] // 1000), "_extractor": MangaparkChapterExtractor, } yield Message.Queue, url, data def chapters(self): source = self.config("source") if not source: return self.chapters_all() source_id = self._select_source(source) self.log.debug("Requesting chapters for source_id %s", source_id) return self.chapters_source(source_id) def chapters_all(self): pnum = 0 variables = { "select": { "comicId": self.manga_id, "range" : None, "isAsc" : not self.config("chapter-reverse"), } } while True: data = self._request_graphql( "get_content_comicChapterRangeList", variables) for item in data["items"]: yield from item["chapterNodes"] if not pnum: pager = data["pager"] pnum += 1 try: variables["select"]["range"] = pager[pnum] except IndexError: return def chapters_source(self, source_id): variables = { "sourceId": source_id, } chapters = self._request_graphql( "get_content_source_chapterList", variables) if self.config("chapter-reverse"): chapters.reverse() return chapters def _select_source(self, source): if isinstance(source, int): return source group, _, lang = source.partition(":") group = group.lower() variables = { "comicId" : self.manga_id, "dbStatuss" : ["normal"], "haveChapter": True, } for item in self._request_graphql( "get_content_comic_sources", variables): data = item["data"] if (not group or data["srcTitle"].lower() == group) and ( not lang or data["lang"] == lang): return data["id"] raise exception.StopExtraction( "'%s' does not match any available source", source) def _request_graphql(self, opname, variables): url = self.root + "/apo/" data = { "query" : QUERIES[opname], "variables" : util.json_dumps(variables), "operationName": opname, } return self.request( url, method="POST", json=data).json()["data"][opname] QUERIES = { "get_content_comicChapterRangeList": """ query get_content_comicChapterRangeList($select: Content_ComicChapterRangeList_Select) { get_content_comicChapterRangeList( select: $select ) { reqRange{x y} missing pager {x y} items{ serial chapterNodes { id data { id sourceId dbStatus isNormal isHidden isDeleted isFinal dateCreate datePublic dateModify lang volume serial dname title urlPath srcTitle srcColor count_images stat_count_post_child stat_count_post_reply stat_count_views_login stat_count_views_guest userId userNode { id data { id name uniq avatarUrl urlPath verified deleted banned dateCreate dateOnline stat_count_chapters_normal stat_count_chapters_others is_adm is_mod is_vip is_upr } } disqusId } sser_read } } } } """, "get_content_source_chapterList": """ query 
get_content_source_chapterList($sourceId: Int!) { get_content_source_chapterList( sourceId: $sourceId ) { id data { id sourceId dbStatus isNormal isHidden isDeleted isFinal dateCreate datePublic dateModify lang volume serial dname title urlPath srcTitle srcColor count_images stat_count_post_child stat_count_post_reply stat_count_views_login stat_count_views_guest userId userNode { id data { id name uniq avatarUrl urlPath verified deleted banned dateCreate dateOnline stat_count_chapters_normal stat_count_chapters_others is_adm is_mod is_vip is_upr } } disqusId } } } """, "get_content_comic_sources": """ query get_content_comic_sources($comicId: Int!, $dbStatuss: [String] = [], $userId: Int, $haveChapter: Boolean, $sortFor: String) { get_content_comic_sources( comicId: $comicId dbStatuss: $dbStatuss userId: $userId haveChapter: $haveChapter sortFor: $sortFor ) { id data{ id dbStatus isNormal isHidden isDeleted lang name altNames authors artists release genres summary{code} extraInfo{code} urlCover600 urlCover300 urlCoverOri srcTitle srcColor chapterCount chapterNode_last { id data { dateCreate datePublic dateModify volume serial dname title urlPath userNode { id data {uniq name} } } } } } } """, } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/mangaread.py0000644000175000017500000001005414571514301021046 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://mangaread.org/""" from .common import ChapterExtractor, MangaExtractor from .. import text, exception import re class MangareadBase(): """Base class for Mangaread extractors""" category = "mangaread" root = "https://www.mangaread.org" @staticmethod def parse_chapter_string(chapter_string, data): match = re.match( r"(?:(.+)\s*-\s*)?[Cc]hapter\s*(\d+)(\.\d+)?(?:\s*-\s*(.+))?", text.unescape(chapter_string).strip()) manga, chapter, minor, title = match.groups() manga = manga.strip() if manga else "" data["manga"] = data.pop("manga", manga) data["chapter"] = text.parse_int(chapter) data["chapter_minor"] = minor or "" data["title"] = title or "" data["lang"] = "en" data["language"] = "English" class MangareadChapterExtractor(MangareadBase, ChapterExtractor): """Extractor for manga-chapters from mangaread.org""" pattern = (r"(?:https?://)?(?:www\.)?mangaread\.org" r"(/manga/[^/?#]+/[^/?#]+)") example = "https://www.mangaread.org/manga/MANGA/chapter-01/" def metadata(self, page): tags = text.extr(page, 'class="wp-manga-tags-list">', '
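# Illustrative sketch of the GraphQL POST performed by
# MangaparkMangaExtractor._request_graphql above. Standalone and simplified
# (plain requests); "query" would be one of the QUERIES entries listed above.
# Not part of the gallery_dl source.
import json
import requests

def mangapark_graphql(root, opname, query, variables):
    """Send one GraphQL operation to MangaPark's /apo/ endpoint."""
    payload = {
        "query"        : query,                  # full query string (see QUERIES)
        "variables"    : json.dumps(variables),  # serialized, as in the code above
        "operationName": opname,
    }
    response = requests.post(root + "/apo/", json=payload, timeout=30)
    return response.json()["data"][opname]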
  • ') data = {"tags": list(text.split_html(tags)[::2])} info = text.extr(page, '

    ', "

    ") if not info: raise exception.NotFoundError("chapter") self.parse_chapter_string(info, data) return data def images(self, page): page = text.extr( page, '
    ', '
    "): url , pos = text.extract(chapter, '", "", pos) self.parse_chapter_string(info, data) result.append((url, data.copy())) return result def metadata(self, page): extr = text.extract_from(text.extr( page, 'class="summary_content">', 'class="manga-action"')) return { "manga" : text.extr(page, "

    ", "

    ").strip(), "description": text.unescape(text.remove_html(text.extract( page, ">", "
    ", page.index("summary__content"))[0])), "rating" : text.parse_float( extr('total_votes">', "").strip()), "manga_alt" : text.remove_html( extr("Alternative \n
    ", "
    ")).split("; "), "author" : list(text.extract_iter( extr('class="author-content">', "
    "), '"tag">', "")), "artist" : list(text.extract_iter( extr('class="artist-content">', ""), '"tag">', "")), "genres" : list(text.extract_iter( extr('class="genres-content">', ""), '"tag">', "")), "type" : text.remove_html( extr("Type \n", "")), "release" : text.parse_int(text.remove_html( extr("Release \n", ""))), "status" : text.remove_html( extr("Status \n", "")), } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/mangasee.py0000644000175000017500000001014214571514301020705 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://mangasee123.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text, util class MangaseeBase(): category = "mangasee" browser = "firefox" root = "https://mangasee123.com" @staticmethod def _transform_chapter(data): chapter = data["Chapter"] return { "title" : data["ChapterName"] or "", "index" : chapter[0], "chapter" : int(chapter[1:-1]), "chapter_minor": "" if chapter[-1] == "0" else "." + chapter[-1], "chapter_string": chapter, "lang" : "en", "language": "English", "date" : text.parse_datetime( data["Date"], "%Y-%m-%d %H:%M:%S"), } class MangaseeChapterExtractor(MangaseeBase, ChapterExtractor): pattern = (r"(?:https?://)?(mangasee123|manga4life)\.com" r"(/read-online/[^/?#]+\.html)") example = "https://mangasee123.com/read-online/MANGA-chapter-1-page-1.html" def __init__(self, match): if match.group(1) == "manga4life": self.category = "mangalife" self.root = "https://manga4life.com" ChapterExtractor.__init__(self, match, self.root + match.group(2)) def _init(self): self.session.headers["Referer"] = self.gallery_url domain = self.root.rpartition("/")[2] cookies = self.cookies if not cookies.get("PHPSESSID", domain=domain): cookies.set("PHPSESSID", util.generate_token(13), domain=domain) def metadata(self, page): extr = text.extract_from(page) author = util.json_loads(extr('"author":', '],') + "]") genre = util.json_loads(extr('"genre":', '],') + "]") self.chapter = data = util.json_loads(extr("vm.CurChapter =", ";\r\n")) self.domain = extr('vm.CurPathName = "', '"') self.slug = extr('vm.IndexName = "', '"') data = self._transform_chapter(data) data["manga"] = text.unescape(extr('vm.SeriesName = "', '"')) data["author"] = author data["genre"] = genre return data def images(self, page): chapter = self.chapter["Chapter"][1:] if chapter[-1] == "0": chapter = chapter[:-1] else: chapter = chapter[:-1] + "." 
+ chapter[-1] base = "https://{}/manga/{}/".format(self.domain, self.slug) if self.chapter["Directory"]: base += self.chapter["Directory"] + "/" base += chapter + "-" return [ ("{}{:>03}.png".format(base, i), None) for i in range(1, int(self.chapter["Page"]) + 1) ] class MangaseeMangaExtractor(MangaseeBase, MangaExtractor): chapterclass = MangaseeChapterExtractor pattern = r"(?:https?://)?(mangasee123|manga4life)\.com(/manga/[^/?#]+)" example = "https://mangasee123.com/manga/MANGA" def __init__(self, match): if match.group(1) == "manga4life": self.category = "mangalife" self.root = "https://manga4life.com" MangaExtractor.__init__(self, match, self.root + match.group(2)) def chapters(self, page): extr = text.extract_from(page) author = util.json_loads(extr('"author":', '],') + "]") genre = util.json_loads(extr('"genre":', '],') + "]") slug = extr('vm.IndexName = "', '"') chapters = util.json_loads(extr("vm.Chapters = ", ";\r\n")) result = [] for data in map(self._transform_chapter, chapters): url = "{}/read-online/{}-chapter-{}{}".format( self.root, slug, data["chapter"], data["chapter_minor"]) if data["index"] != "1": url += "-index-" + data["index"] url += "-page-1.html" data["manga"] = slug data["author"] = author data["genre"] = genre result.append((url, data)) return result ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/mangoxo.py0000644000175000017500000001321514571514301020601 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.mangoxo.com/""" from .common import Extractor, Message from .. 
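# Illustrative sketch of the encoded "Chapter" value handled by
# MangaseeBase._transform_chapter above: the first digit is an index, the
# middle digits are the chapter number, and the last digit is the decimal
# part. Standalone example only, not part of this module.
def decode_mangasee_chapter(code):
    return {
        "index"        : code[0],
        "chapter"      : int(code[1:-1]),
        "chapter_minor": "" if code[-1] == "0" else "." + code[-1],
    }

# decode_mangasee_chapter("100135")
# -> {"index": "1", "chapter": 13, "chapter_minor": ".5"}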
import text, exception from ..cache import cache import hashlib import time class MangoxoExtractor(Extractor): """Base class for mangoxo extractors""" category = "mangoxo" root = "https://www.mangoxo.com" cookies_domain = "www.mangoxo.com" cookies_names = ("SESSION",) _warning = True def login(self): username, password = self._get_auth_info() if username: self.cookies_update(self._login_impl(username, password)) elif MangoxoExtractor._warning: MangoxoExtractor._warning = False self.log.warning("Unauthenticated users cannot see " "more than 5 images per album") @cache(maxage=3*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/login" page = self.request(url).text token = text.extr(page, 'id="loginToken" value="', '"') url = self.root + "/api/login" headers = { "X-Requested-With": "XMLHttpRequest", "Referer": self.root + "/login", } data = self._sign_by_md5(username, password, token) response = self.request(url, method="POST", headers=headers, data=data) data = response.json() if str(data.get("result")) != "1": raise exception.AuthenticationError(data.get("msg")) return {"SESSION": self.cookies.get("SESSION")} @staticmethod def _sign_by_md5(username, password, token): # https://dns.mangoxo.com/libs/plugins/phoenix-ui/js/phoenix-ui.js params = [ ("username" , username), ("password" , password), ("token" , token), ("timestamp", str(int(time.time()))), ] query = "&".join("=".join(item) for item in sorted(params)) query += "&secretKey=340836904" sign = hashlib.md5(query.encode()).hexdigest() params.append(("sign", sign.upper())) return params @staticmethod def _total_pages(page): return text.parse_int(text.extract(page, "total :", ",")[0]) class MangoxoAlbumExtractor(MangoxoExtractor): """Extractor for albums on mangoxo.com""" subcategory = "album" filename_fmt = "{album[id]}_{num:>03}.{extension}" directory_fmt = ("{category}", "{channel[name]}", "{album[name]}") archive_fmt = "{album[id]}_{num}" pattern = r"(?:https?://)?(?:www\.)?mangoxo\.com/album/(\w+)" example = "https://www.mangoxo.com/album/ID" def __init__(self, match): MangoxoExtractor.__init__(self, match) self.album_id = match.group(1) def items(self): self.login() url = "{}/album/{}/".format(self.root, self.album_id) page = self.request(url).text data = self.metadata(page) imgs = self.images(url, page) yield Message.Directory, data data["extension"] = None for data["num"], path in enumerate(imgs, 1): data["id"] = text.parse_int(text.extr(path, "=", "&")) url = self.root + "/external/" + path.rpartition("url=")[2] yield Message.Url, url, text.nameext_from_url(url, data) def metadata(self, page): """Return general metadata""" extr = text.extract_from(page) title = extr('', '', '<') date = extr('class="fa fa-calendar">', '<') descr = extr('
    ', '
    ') return { "channel": { "id": cid, "name": text.unescape(cname), "cover": cover, }, "album": { "id": self.album_id, "name": text.unescape(title), "date": text.parse_datetime(date.strip(), "%Y.%m.%d %H:%M"), "description": text.unescape(descr), }, "count": text.parse_int(count), } def images(self, url, page): """Generator; Yields all image URLs""" total = self._total_pages(page) num = 1 while True: yield from text.extract_iter( page, 'class="lightgallery-item" href="', '"') if num >= total: return num += 1 page = self.request(url + str(num)).text class MangoxoChannelExtractor(MangoxoExtractor): """Extractor for all albums on a mangoxo channel""" subcategory = "channel" pattern = r"(?:https?://)?(?:www\.)?mangoxo\.com/(\w+)/album" example = "https://www.mangoxo.com/USER/album" def __init__(self, match): MangoxoExtractor.__init__(self, match) self.user = match.group(1) def items(self): self.login() num = total = 1 url = "{}/{}/album/".format(self.root, self.user) data = {"_extractor": MangoxoAlbumExtractor} while True: page = self.request(url + str(num)).text for album in text.extract_iter( page, '= total: return num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1710265644.0 gallery_dl-1.26.9/gallery_dl/extractor/mastodon.py0000644000175000017500000002330414574112454020763 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Mastodon instances""" from .common import BaseExtractor, Message from .. import text, exception from ..cache import cache class MastodonExtractor(BaseExtractor): """Base class for mastodon extractors""" basecategory = "mastodon" directory_fmt = ("mastodon", "{instance}", "{account[username]}") filename_fmt = "{category}_{id}_{media[id]}.{extension}" archive_fmt = "{media[id]}" cookies_domain = None def __init__(self, match): BaseExtractor.__init__(self, match) self.item = match.group(match.lastindex) def _init(self): self.instance = self.root.partition("://")[2] self.reblogs = self.config("reblogs", False) self.replies = self.config("replies", True) def items(self): for status in self.statuses(): if self._check_moved: self._check_moved(status["account"]) if not self.reblogs and status["reblog"]: self.log.debug("Skipping %s (reblog)", status["id"]) continue if not self.replies and status["in_reply_to_id"]: self.log.debug("Skipping %s (reply)", status["id"]) continue attachments = status["media_attachments"] del status["media_attachments"] if status["reblog"]: attachments.extend(status["reblog"]["media_attachments"]) status["instance"] = self.instance acct = status["account"]["acct"] status["instance_remote"] = \ acct.rpartition("@")[2] if "@" in acct else None status["count"] = len(attachments) status["tags"] = [tag["name"] for tag in status["tags"]] status["date"] = text.parse_datetime( status["created_at"][:19], "%Y-%m-%dT%H:%M:%S") yield Message.Directory, status for status["num"], media in enumerate(attachments, 1): status["media"] = media url = media["url"] yield Message.Url, url, text.nameext_from_url(url, status) def statuses(self): """Return an iterable containing all relevant Status objects""" return () def _check_moved(self, account): self._check_moved = None # Certain fediverse software (such as Iceshrimp and Sharkey) have a # null account "moved" field instead of not having it outright. 
# To handle this, check if the "moved" value is truthy instead # if only it exists. if account.get("moved"): self.log.warning("Account '%s' moved to '%s'", account["acct"], account["moved"]["acct"]) BASE_PATTERN = MastodonExtractor.update({ "mastodon.social": { "root" : "https://mastodon.social", "pattern" : r"mastodon\.social", "access-token" : "Y06R36SMvuXXN5_wiPKFAEFiQaMSQg0o_hGgc86Jj48", "client-id" : "dBSHdpsnOUZgxOnjKSQrWEPakO3ctM7HmsyoOd4FcRo", "client-secret": "DdrODTHs_XoeOsNVXnILTMabtdpWrWOAtrmw91wU1zI", }, "pawoo": { "root" : "https://pawoo.net", "pattern" : r"pawoo\.net", "access-token" : "c12c9d275050bce0dc92169a28db09d7" "0d62d0a75a8525953098c167eacd3668", "client-id" : "978a25f843ec01e53d09be2c290cd75c" "782bc3b7fdbd7ea4164b9f3c3780c8ff", "client-secret": "9208e3d4a7997032cf4f1b0e12e5df38" "8428ef1fadb446dcfeb4f5ed6872d97b", }, "baraag": { "root" : "https://baraag.net", "pattern" : r"baraag\.net", "access-token" : "53P1Mdigf4EJMH-RmeFOOSM9gdSDztmrAYFgabOKKE0", "client-id" : "czxx2qilLElYHQ_sm-lO8yXuGwOHxLX9RYYaD0-nq1o", "client-secret": "haMaFdMBgK_-BIxufakmI2gFgkYjqmgXGEO2tB-R2xY", } }) + "(?:/web)?" class MastodonUserExtractor(MastodonExtractor): """Extractor for all images of an account/user""" subcategory = "user" pattern = BASE_PATTERN + r"/(?:@|users/)([^/?#]+)(?:/media)?/?$" example = "https://mastodon.social/@USER" def statuses(self): api = MastodonAPI(self) return api.account_statuses( api.account_id_by_username(self.item), only_media=( not self.reblogs and not self.config("text-posts", False) ), exclude_replies=not self.replies, ) class MastodonBookmarkExtractor(MastodonExtractor): """Extractor for mastodon bookmarks""" subcategory = "bookmark" pattern = BASE_PATTERN + r"/bookmarks" example = "https://mastodon.social/bookmarks" def statuses(self): return MastodonAPI(self).account_bookmarks() class MastodonFollowingExtractor(MastodonExtractor): """Extractor for followed mastodon users""" subcategory = "following" pattern = BASE_PATTERN + r"/(?:@|users/)([^/?#]+)/following" example = "https://mastodon.social/@USER/following" def items(self): api = MastodonAPI(self) account_id = api.account_id_by_username(self.item) for account in api.account_following(account_id): account["_extractor"] = MastodonUserExtractor yield Message.Queue, account["url"], account class MastodonStatusExtractor(MastodonExtractor): """Extractor for images from a status""" subcategory = "status" pattern = BASE_PATTERN + r"/@[^/?#]+/(?!following)([^/?#]+)" example = "https://mastodon.social/@USER/12345" def statuses(self): return (MastodonAPI(self).status(self.item),) class MastodonAPI(): """Minimal interface for the Mastodon API https://docs.joinmastodon.org/ https://github.com/tootsuite/mastodon """ def __init__(self, extractor): self.root = extractor.root self.extractor = extractor access_token = extractor.config("access-token") if access_token is None or access_token == "cache": access_token = _access_token_cache(extractor.instance) if not access_token: access_token = extractor.config_instance("access-token") if access_token: self.headers = {"Authorization": "Bearer " + access_token} else: self.headers = None def account_id_by_username(self, username): if username.startswith("id:"): return username[3:] try: return self.account_lookup(username)["id"] except Exception: # fall back to account search pass if "@" in username: handle = "@" + username else: handle = "@{}@{}".format(username, self.extractor.instance) for account in self.account_search(handle, 1): if account["acct"] == username: 
self.extractor._check_moved(account) return account["id"] raise exception.NotFoundError("account") def account_bookmarks(self): endpoint = "/v1/bookmarks" return self._pagination(endpoint, None) def account_following(self, account_id): endpoint = "/v1/accounts/{}/following".format(account_id) return self._pagination(endpoint, None) def account_lookup(self, username): endpoint = "/v1/accounts/lookup" params = {"acct": username} return self._call(endpoint, params).json() def account_search(self, query, limit=40): """Search for accounts""" endpoint = "/v1/accounts/search" params = {"q": query, "limit": limit} return self._call(endpoint, params).json() def account_statuses(self, account_id, only_media=True, exclude_replies=False): """Fetch an account's statuses""" endpoint = "/v1/accounts/{}/statuses".format(account_id) params = {"only_media" : "1" if only_media else "0", "exclude_replies": "1" if exclude_replies else "0"} return self._pagination(endpoint, params) def status(self, status_id): """Fetch a status""" endpoint = "/v1/statuses/" + status_id return self._call(endpoint).json() def _call(self, endpoint, params=None): if endpoint.startswith("http"): url = endpoint else: url = self.root + "/api" + endpoint while True: response = self.extractor.request( url, params=params, headers=self.headers, fatal=None) code = response.status_code if code < 400: return response if code == 401: raise exception.StopExtraction( "Invalid or missing access token.\n" "Run 'gallery-dl oauth:mastodon:%s' to obtain one.", self.extractor.instance) if code == 404: raise exception.NotFoundError() if code == 429: self.extractor.wait(until=text.parse_datetime( response.headers["x-ratelimit-reset"], "%Y-%m-%dT%H:%M:%S.%fZ", )) continue raise exception.StopExtraction(response.json().get("error")) def _pagination(self, endpoint, params): url = endpoint while url: response = self._call(url, params) yield from response.json() url = response.links.get("next") if not url: return url = url["url"] params = None @cache(maxage=36500*86400, keyarg=0) def _access_token_cache(instance): return None ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1708353731.0 gallery_dl-1.26.9/gallery_dl/extractor/message.py0000644000175000017500000000345314564664303020572 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. class Message(): """Enum for message identifiers Extractors yield their results as message-tuples, where the first element is one of the following identifiers. This message-identifier determines the type and meaning of the other elements in such a tuple. 
- Message.Version: - Message protocol version (currently always '1') - 2nd element specifies the version of all following messages as integer - Message.Directory: - Sets the target directory for all following images - 2nd element is a dictionary containing general metadata - Message.Url: - Image URL and its metadata - 2nd element is the URL as a string - 3rd element is a dictionary with image-specific metadata - Message.Headers: # obsolete - HTTP headers to use while downloading - 2nd element is a dictionary with header-name and -value pairs - Message.Cookies: # obsolete - Cookies to use while downloading - 2nd element is a dictionary with cookie-name and -value pairs - Message.Queue: - (External) URL that should be handled by another extractor - 2nd element is the (external) URL as a string - 3rd element is a dictionary containing URL-specific metadata - Message.Urllist: # obsolete - Same as Message.Url, but its 2nd element is a list of multiple URLs - The additional URLs serve as a fallback if the primary one fails """ Version = 1 Directory = 2 Url = 3 # Headers = 4 # Cookies = 5 Queue = 6 # Urllist = 7 # Metadata = 8 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/misskey.py0000644000175000017500000001405214571514301020615 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Misskey instances""" from .common import BaseExtractor, Message from .. import text, exception class MisskeyExtractor(BaseExtractor): """Base class for Misskey extractors""" basecategory = "misskey" directory_fmt = ("misskey", "{instance}", "{user[username]}") filename_fmt = "{category}_{id}_{file[id]}.{extension}" archive_fmt = "{id}_{file[id]}" def __init__(self, match): BaseExtractor.__init__(self, match) self.item = match.group(match.lastindex) def _init(self): self.api = MisskeyAPI(self) self.instance = self.root.rpartition("://")[2] self.renotes = self.config("renotes", False) self.replies = self.config("replies", True) def items(self): for note in self.notes(): if "note" in note: note = note["note"] files = note.pop("files") or [] renote = note.get("renote") if renote: if not self.renotes: self.log.debug("Skipping %s (renote)", note["id"]) continue files.extend(renote.get("files") or ()) reply = note.get("reply") if reply: if not self.replies: self.log.debug("Skipping %s (reply)", note["id"]) continue files.extend(reply.get("files") or ()) note["instance"] = self.instance note["instance_remote"] = note["user"]["host"] note["count"] = len(files) note["date"] = text.parse_datetime( note["createdAt"], "%Y-%m-%dT%H:%M:%S.%f%z") yield Message.Directory, note for note["num"], file in enumerate(files, 1): file["date"] = text.parse_datetime( file["createdAt"], "%Y-%m-%dT%H:%M:%S.%f%z") note["file"] = file url = file["url"] yield Message.Url, url, text.nameext_from_url(url, note) def notes(self): """Return an iterable containing all relevant Note objects""" return () BASE_PATTERN = MisskeyExtractor.update({ "misskey.io": { "root": "https://misskey.io", "pattern": r"misskey\.io", }, "misskey.design": { "root": "https://misskey.design", "pattern": r"misskey\.design", }, "lesbian.energy": { "root": "https://lesbian.energy", "pattern": r"lesbian\.energy", }, "sushi.ski": { "root": "https://sushi.ski", "pattern": r"sushi\.ski", }, }) class 
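# Illustrative sketch of the message protocol documented above: an
# extractor's items() yields identifier-tagged tuples. Hypothetical minimal
# generator (relies only on the Message class defined above); not part of
# the gallery_dl source.
def example_items():
    directory = {"title": "Example gallery"}
    yield Message.Directory, directory

    for num, url in enumerate(("https://example.org/1.jpg",
                               "https://example.org/2.jpg"), 1):
        yield Message.Url, url, {"title": "Example gallery", "num": num}

    # hand an external link to whichever extractor matches it
    yield Message.Queue, "https://example.org/other/album", {}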
MisskeyUserExtractor(MisskeyExtractor): """Extractor for all images of a Misskey user""" subcategory = "user" pattern = BASE_PATTERN + r"/@([^/?#]+)/?$" example = "https://misskey.io/@USER" def notes(self): return self.api.users_notes(self.api.user_id_by_username(self.item)) class MisskeyFollowingExtractor(MisskeyExtractor): """Extractor for followed Misskey users""" subcategory = "following" pattern = BASE_PATTERN + r"/@([^/?#]+)/following" example = "https://misskey.io/@USER/following" def items(self): user_id = self.api.user_id_by_username(self.item) for user in self.api.users_following(user_id): user = user["followee"] url = self.root + "/@" + user["username"] host = user["host"] if host is not None: url += "@" + host user["_extractor"] = MisskeyUserExtractor yield Message.Queue, url, user class MisskeyNoteExtractor(MisskeyExtractor): """Extractor for images from a Note""" subcategory = "note" pattern = BASE_PATTERN + r"/notes/(\w+)" example = "https://misskey.io/notes/98765" def notes(self): return (self.api.notes_show(self.item),) class MisskeyFavoriteExtractor(MisskeyExtractor): """Extractor for favorited notes""" subcategory = "favorite" pattern = BASE_PATTERN + r"/(?:my|api/i)/favorites" example = "https://misskey.io/my/favorites" def notes(self): return self.api.i_favorites() class MisskeyAPI(): """Interface for Misskey API https://github.com/misskey-dev/misskey https://misskey-hub.net/en/docs/api/ https://misskey-hub.net/docs/api/endpoints.html """ def __init__(self, extractor): self.root = extractor.root self.extractor = extractor self.headers = {"Content-Type": "application/json"} self.access_token = extractor.config("access-token") def user_id_by_username(self, username): endpoint = "/users/show" data = {"username": username} if "@" in username: data["username"], _, data["host"] = username.partition("@") return self._call(endpoint, data)["id"] def users_following(self, user_id): endpoint = "/users/following" data = {"userId": user_id} return self._pagination(endpoint, data) def users_notes(self, user_id): endpoint = "/users/notes" data = {"userId": user_id} return self._pagination(endpoint, data) def notes_show(self, note_id): endpoint = "/notes/show" data = {"noteId": note_id} return self._call(endpoint, data) def i_favorites(self): endpoint = "/i/favorites" if not self.access_token: raise exception.AuthenticationError() data = {"i": self.access_token} return self._pagination(endpoint, data) def _call(self, endpoint, data): url = self.root + "/api" + endpoint return self.extractor.request( url, method="POST", headers=self.headers, json=data).json() def _pagination(self, endpoint, data): data["limit"] = 100 while True: notes = self._call(endpoint, data) if not notes: return yield from notes data["untilId"] = notes[-1]["id"] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/moebooru.py0000644000175000017500000001405014571514301020756 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Moebooru based sites""" from .booru import BooruExtractor from .. 
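# Illustrative sketch of the cursor-style paging used by
# MisskeyAPI._pagination above: each request passes the ID of the last note
# already seen as "untilId". Standalone and simplified (plain requests,
# no access token); not part of the gallery_dl source.
import requests

def misskey_user_notes(root, user_id):
    """Yield a user's notes, newest first, 100 per request."""
    payload = {"userId": user_id, "limit": 100}
    while True:
        notes = requests.post(root + "/api/users/notes",
                              json=payload, timeout=30).json()
        if not notes:
            return
        yield from notes
        payload["untilId"] = notes[-1]["id"]  # continue below the oldest note seen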
import text import collections import datetime import re class MoebooruExtractor(BooruExtractor): """Base class for Moebooru extractors""" basecategory = "moebooru" filename_fmt = "{category}_{id}_{md5}.{extension}" page_start = 1 @staticmethod def _prepare(post): post["date"] = text.parse_timestamp(post["created_at"]) def _html(self, post): return self.request("{}/post/show/{}".format( self.root, post["id"])).text def _tags(self, post, page): tag_container = text.extr(page, '")), "genres" : text.split_html(extr( "", "")), "tags" : text.split_html(extr( "", "")), "uploader" : text.remove_html(extr( "", "")), "language" : extr(" ", "\n"), } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/toyhouse.py0000644000175000017500000000743114571514301021013 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://toyhou.se/""" from .common import Extractor, Message from .. import text, util BASE_PATTERN = r"(?:https?://)?(?:www\.)?toyhou\.se" class ToyhouseExtractor(Extractor): """Base class for toyhouse extractors""" category = "toyhouse" root = "https://toyhou.se" directory_fmt = ("{category}", "{user|artists!S}") archive_fmt = "{id}" def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) self.offset = 0 def items(self): metadata = self.metadata() for post in util.advance(self.posts(), self.offset): if metadata: post.update(metadata) text.nameext_from_url(post["url"], post) post["id"], _, post["hash"] = post["filename"].partition("_") yield Message.Directory, post yield Message.Url, post["url"], post def posts(self): return () def metadata(self): return None def skip(self, num): self.offset += num return num def _parse_post(self, post, needle='\n
    ', '<'), "%d %b %Y, %I:%M:%S %p"), "artists": [ text.remove_html(artist) for artist in extr( '
    ', '
    \n
    ').split( '
    ') ], "characters": text.split_html(extr( '
    ', ''): cnt += 1 yield self._parse_post(post) if cnt == 0 and params["page"] == 1: token, pos = text.extract( page, '', '
    ')), "date" : text.parse_datetime( extr('id="Uploaded">', '
    ').strip(), "%Y %B %d"), "rating" : text.parse_float(extr( 'id="Rating">', '').partition(" ")[0]), "type" : text.remove_html(extr('id="Category">' , '')), "collection": text.remove_html(extr('id="Collection">', '')), "group" : text.split_html(extr('id="Group">' , '')), "artist" : text.split_html(extr('id="Artist">' , '')), "parody" : text.split_html(extr('id="Parody">' , '')), "characters": text.split_html(extr('id="Character">' , '')), "tags" : text.split_html(extr('id="Tag">' , '')), "language" : "English", "lang" : "en", } def images(self, page): url = "{}/Read/Index/{}?page=1".format(self.root, self.gallery_id) headers = {"Referer": self.gallery_url} response = self.request(url, headers=headers, fatal=False) if "/Auth/" in response.url: raise exception.StopExtraction( "Failed to get gallery JSON data. Visit '%s' in a browser " "and solve the CAPTCHA to continue.", response.url) page = response.text tpl, pos = text.extract(page, 'data-cdn="', '"') cnt, pos = text.extract(page, '> of ', '<', pos) base, _, params = text.unescape(tpl).partition("[PAGE]") return [ (base + str(i) + params, None) for i in range(1, text.parse_int(cnt)+1) ] class TsuminoSearchExtractor(TsuminoBase, Extractor): """Extractor for search results on tsumino.com""" subcategory = "search" pattern = r"(?i)(?:https?://)?(?:www\.)?tsumino\.com/(?:Books/?)?#(.+)" example = "https://www.tsumino.com/Books#QUERY" def __init__(self, match): Extractor.__init__(self, match) self.query = match.group(1) def items(self): for gallery in self.galleries(): url = "{}/entry/{}".format(self.root, gallery["id"]) gallery["_extractor"] = TsuminoGalleryExtractor yield Message.Queue, url, gallery def galleries(self): """Return all gallery results matching 'self.query'""" url = "{}/Search/Operate?type=Book".format(self.root) headers = { "Referer": "{}/".format(self.root), "X-Requested-With": "XMLHttpRequest", } data = { "PageNumber": 1, "Text": "", "Sort": "Newest", "List": "0", "Length": "0", "MinimumRating": "0", "ExcludeList": "0", "CompletelyExcludeHated": "false", } data.update(self._parse(self.query)) while True: info = self.request( url, method="POST", headers=headers, data=data).json() for gallery in info["data"]: yield gallery["entry"] if info["pageNumber"] >= info["pageCount"]: return data["PageNumber"] += 1 def _parse(self, query): try: if query.startswith("?"): return self._parse_simple(query) return self._parse_jsurl(query) except Exception as exc: raise exception.StopExtraction( "Invalid search query '%s' (%s)", query, exc) @staticmethod def _parse_simple(query): """Parse search query with format '?=value>'""" key, _, value = query.partition("=") tag_types = { "Tag": "1", "Category": "2", "Collection": "3", "Group": "4", "Artist": "5", "Parody": "6", "Character": "7", "Uploader": "100", } return { "Tags[0][Type]": tag_types[key[1:].capitalize()], "Tags[0][Text]": text.unquote(value).replace("+", " "), "Tags[0][Exclude]": "false", } @staticmethod def _parse_jsurl(data): """Parse search query in JSURL format Nested lists and dicts are handled in a special way to deal with the way Tsumino expects its parameters -> expand(...) 
Example: ~(name~'John*20Doe~age~42~children~(~'Mary~'Bill)) Ref: https://github.com/Sage/jsurl """ if not data: return {} i = 0 imax = len(data) def eat(expected): nonlocal i if data[i] != expected: error = "bad JSURL syntax: expected '{}', got {}".format( expected, data[i]) raise ValueError(error) i += 1 def decode(): nonlocal i beg = i result = "" while i < imax: ch = data[i] if ch not in "~)*!": i += 1 elif ch == "*": if beg < i: result += data[beg:i] if data[i + 1] == "*": result += chr(int(data[i+2:i+6], 16)) i += 6 else: result += chr(int(data[i+1:i+3], 16)) i += 3 beg = i elif ch == "!": if beg < i: result += data[beg:i] result += "$" i += 1 beg = i else: break return result + data[beg:i] def parse_one(): nonlocal i eat('~') result = "" ch = data[i] if ch == "(": i += 1 if data[i] == "~": result = [] if data[i+1] == ")": i += 1 else: result.append(parse_one()) while data[i] == "~": result.append(parse_one()) else: result = {} if data[i] != ")": while True: key = decode() value = parse_one() for ekey, evalue in expand(key, value): result[ekey] = evalue if data[i] != "~": break i += 1 eat(")") elif ch == "'": i += 1 result = decode() else: beg = i i += 1 while i < imax and data[i] not in "~)": i += 1 sub = data[beg:i] if ch in "0123456789-": fval = float(sub) ival = int(fval) result = ival if ival == fval else fval else: if sub not in ("true", "false", "null"): raise ValueError("bad value keyword: " + sub) result = sub return result def expand(key, value): if isinstance(value, list): for index, cvalue in enumerate(value): ckey = "{}[{}]".format(key, index) yield from expand(ckey, cvalue) elif isinstance(value, dict): for ckey, cvalue in value.items(): ckey = "{}[{}]".format(key, ckey) yield from expand(ckey, cvalue) else: yield key, value return parse_one() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/tumblr.py0000644000175000017500000004256514571514301020450 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.tumblr.com/""" from .common import Extractor, Message from .. import text, util, oauth, exception from datetime import datetime, date, timedelta import re BASE_PATTERN = ( r"(?:tumblr:(?:https?://)?([^/]+)|" r"(?:https?://)?" 
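# Worked example of the expand() helper defined inside _parse_jsurl above,
# which flattens nested lists/dicts into the bracketed parameter names
# Tsumino expects. Standalone copy for demonstration only.
def expand(key, value):
    if isinstance(value, list):
        for index, cvalue in enumerate(value):
            yield from expand("{}[{}]".format(key, index), cvalue)
    elif isinstance(value, dict):
        for ckey, cvalue in value.items():
            yield from expand("{}[{}]".format(key, ckey), cvalue)
    else:
        yield key, value

# dict(expand("Tags", [{"Type": "1", "Text": "full color", "Exclude": "false"}]))
# -> {"Tags[0][Type]": "1", "Tags[0][Text]": "full color", "Tags[0][Exclude]": "false"}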
r"(?:www\.tumblr\.com/(?:blog/(?:view/)?)?([\w-]+)|" r"([\w-]+\.tumblr\.com)))" ) POST_TYPES = frozenset(( "text", "quote", "link", "answer", "video", "audio", "photo", "chat")) class TumblrExtractor(Extractor): """Base class for tumblr extractors""" category = "tumblr" directory_fmt = ("{category}", "{blog_name}") filename_fmt = "{category}_{blog_name}_{id}_{num:>02}.{extension}" archive_fmt = "{id}_{num}" cookies_domain = None def __init__(self, match): Extractor.__init__(self, match) name = match.group(2) if name: self.blog = name + ".tumblr.com" else: self.blog = match.group(1) or match.group(3) def _init(self): self.api = TumblrAPI(self) self.types = self._setup_posttypes() self.avatar = self.config("avatar", False) self.inline = self.config("inline", True) self.reblogs = self.config("reblogs", True) self.external = self.config("external", False) self.original = self.config("original", True) self.fallback_delay = self.config("fallback-delay", 120.0) self.fallback_retries = self.config("fallback-retries", 2) if len(self.types) == 1: self.api.posts_type = next(iter(self.types)) elif not self.types: self.log.warning("no valid post types selected") if self.reblogs == "same-blog": self._skip_reblog = self._skip_reblog_same_blog self.date_min, self.api.before = self._get_date_min_max(0, None) def items(self): blog = None # pre-compile regular expressions self._sub_video = re.compile( r"https?://((?:vt|vtt|ve)(?:\.media)?\.tumblr\.com" r"/tumblr_[^_]+)_\d+\.([0-9a-z]+)").sub if self.inline: self._sub_image = re.compile( r"https?://(\d+\.media\.tumblr\.com(?:/[0-9a-f]+)?" r"/tumblr(?:_inline)?_[^_]+)_\d+\.([0-9a-z]+)").sub self._subn_orig_image = re.compile(r"/s\d+x\d+/").subn _findall_image = re.compile(' post["timestamp"]: return if post["type"] not in self.types: continue if not blog: blog = self.api.info(self.blog) blog["uuid"] = self.blog if self.avatar: url = self.api.avatar(self.blog) yield Message.Directory, {"blog": blog} yield self._prepare_avatar(url, post.copy(), blog) reblog = "reblogged_from_id" in post if reblog and self._skip_reblog(post): continue post["reblogged"] = reblog if "trail" in post: del post["trail"] post["blog"] = blog post["date"] = text.parse_timestamp(post["timestamp"]) posts = [] if "photos" in post: # type "photo" or "link" photos = post["photos"] del post["photos"] for photo in photos: post["photo"] = photo best_photo = photo["original_size"] for alt_photo in photo["alt_sizes"]: if (alt_photo["height"] > best_photo["height"] or alt_photo["width"] > best_photo["width"]): best_photo = alt_photo photo.update(best_photo) if self.original and "/s2048x3072/" in photo["url"] and ( photo["width"] == 2048 or photo["height"] == 3072): photo["url"], fb = self._original_photo(photo["url"]) if fb: post["_fallback"] = self._original_image_fallback( photo["url"], post["id"]) del photo["original_size"] del photo["alt_sizes"] posts.append( self._prepare_image(photo["url"], post.copy())) del post["photo"] post.pop("_fallback", None) url = post.get("audio_url") # type "audio" if url and url.startswith("https://a.tumblr.com/"): posts.append(self._prepare(url, post.copy())) url = post.get("video_url") # type "video" if url: posts.append(self._prepare( self._original_video(url), post.copy())) if self.inline and "reblog" in post: # inline media # only "chat" posts are missing a "reblog" key in their # API response, but they can't contain images/videos anyway body = post["reblog"]["comment"] + post["reblog"]["tree_html"] for url in _findall_image(body): url, fb = 
self._original_inline_image(url) if fb: post["_fallback"] = self._original_image_fallback( url, post["id"]) posts.append(self._prepare_image(url, post.copy())) post.pop("_fallback", None) for url in _findall_video(body): url = self._original_video(url) posts.append(self._prepare(url, post.copy())) if self.external: # external links url = post.get("permalink_url") or post.get("url") if url: post["extension"] = None posts.append((Message.Queue, url, post.copy())) del post["extension"] post["count"] = len(posts) yield Message.Directory, post for num, (msg, url, post) in enumerate(posts, 1): post["num"] = num post["count"] = len(posts) yield msg, url, post def posts(self): """Return an iterable containing all relevant posts""" def _setup_posttypes(self): types = self.config("posts", "all") if types == "all": return POST_TYPES elif not types: return frozenset() else: if isinstance(types, str): types = types.split(",") types = frozenset(types) invalid = types - POST_TYPES if invalid: types = types & POST_TYPES self.log.warning("Invalid post types: '%s'", "', '".join(sorted(invalid))) return types @staticmethod def _prepare(url, post): text.nameext_from_url(url, post) post["hash"] = post["filename"].partition("_")[2] return Message.Url, url, post @staticmethod def _prepare_image(url, post): text.nameext_from_url(url, post) # try ".gifv" (#3095) # it's unknown whether all gifs in this case are actually webps # incorrect extensions will be corrected by 'adjust-extensions' if post["extension"] == "gif": post["_fallback"] = (url + "v",) post["_http_headers"] = {"Accept": # copied from chrome 106 "image/avif,image/webp,image/apng," "image/svg+xml,image/*,*/*;q=0.8"} parts = post["filename"].split("_") try: post["hash"] = parts[1] if parts[1] != "inline" else parts[2] except IndexError: # filename doesn't follow the usual pattern (#129) post["hash"] = post["filename"] return Message.Url, url, post @staticmethod def _prepare_avatar(url, post, blog): text.nameext_from_url(url, post) post["num"] = post["count"] = 1 post["blog"] = blog post["reblogged"] = False post["type"] = post["id"] = post["hash"] = "avatar" return Message.Url, url, post def _skip_reblog(self, _): return not self.reblogs def _skip_reblog_same_blog(self, post): return self.blog != post.get("reblogged_root_uuid") def _original_photo(self, url): resized = url.replace("/s2048x3072/", "/s99999x99999/", 1) return self._update_image_token(resized) def _original_inline_image(self, url): if self.original: resized, n = self._subn_orig_image("/s99999x99999/", url, 1) if n: return self._update_image_token(resized) return self._sub_image(r"https://\1_1280.\2", url), False def _original_video(self, url): return self._sub_video(r"https://\1.\2", url) def _update_image_token(self, resized): headers = {"Accept": "text/html,*/*;q=0.8"} try: response = self.request(resized, headers=headers) except Exception: return resized, True else: updated = text.extr(response.text, '" src="', '"') return updated, (resized == updated) def _original_image_fallback(self, url, post_id): for _ in util.repeat(self.fallback_retries): self.sleep(self.fallback_delay, "image token") yield self._update_image_token(url)[0] self.log.warning("Unable to fetch higher-resolution " "version of %s (%s)", url, post_id) class TumblrUserExtractor(TumblrExtractor): """Extractor for a Tumblr user's posts""" subcategory = "user" pattern = BASE_PATTERN + r"(?:/page/\d+|/archive)?/?$" example = "https://www.tumblr.com/BLOG" def posts(self): return self.api.posts(self.blog, {}) class 
TumblrPostExtractor(TumblrExtractor): """Extractor for a single Tumblr post""" subcategory = "post" pattern = BASE_PATTERN + r"/(?:post/|image/)?(\d+)" example = "https://www.tumblr.com/BLOG/12345" def __init__(self, match): TumblrExtractor.__init__(self, match) self.post_id = match.group(4) self.reblogs = True self.date_min = 0 def posts(self): return self.api.posts(self.blog, {"id": self.post_id}) @staticmethod def _setup_posttypes(): return POST_TYPES class TumblrTagExtractor(TumblrExtractor): """Extractor for Tumblr user's posts by tag""" subcategory = "tag" pattern = BASE_PATTERN + r"/tagged/([^/?#]+)" example = "https://www.tumblr.com/BLOG/tagged/TAG" def __init__(self, match): TumblrExtractor.__init__(self, match) self.tag = text.unquote(match.group(4).replace("-", " ")) def posts(self): return self.api.posts(self.blog, {"tag": self.tag}) class TumblrDayExtractor(TumblrExtractor): """Extractor for Tumblr user's posts by day""" subcategory = "day" pattern = BASE_PATTERN + r"/day/(\d\d\d\d/\d\d/\d\d)" example = "https://www.tumblr.com/BLOG/day/1970/01/01" def __init__(self, match): TumblrExtractor.__init__(self, match) year, month, day = match.group(4).split("/") self.ordinal = date(int(year), int(month), int(day)).toordinal() def _init(self): TumblrExtractor._init(self) self.date_min = ( # 719163 == date(1970, 1, 1).toordinal() (self.ordinal - 719163) * 86400) self.api.before = self.date_min + 86400 def posts(self): return self.api.posts(self.blog, {}) class TumblrLikesExtractor(TumblrExtractor): """Extractor for a Tumblr user's liked posts""" subcategory = "likes" directory_fmt = ("{category}", "{blog_name}", "likes") archive_fmt = "f_{blog[name]}_{id}_{num}" pattern = BASE_PATTERN + r"/likes" example = "https://www.tumblr.com/BLOG/likes" def posts(self): return self.api.likes(self.blog) class TumblrAPI(oauth.OAuth1API): """Interface for the Tumblr API v2 https://github.com/tumblr/docs/blob/master/api.md """ ROOT = "https://api.tumblr.com" API_KEY = "O3hU2tMi5e4Qs5t3vezEi6L0qRORJ5y9oUpSGsrWu8iA3UCc3B" API_SECRET = "sFdsK3PDdP2QpYMRAoq0oDnw0sFS24XigXmdfnaeNZpJpqAn03" BLOG_CACHE = {} def __init__(self, extractor): oauth.OAuth1API.__init__(self, extractor) self.posts_type = self.before = None def info(self, blog): """Return general information about a blog""" try: return self.BLOG_CACHE[blog] except KeyError: endpoint = "/v2/blog/{}/info".format(blog) params = {"api_key": self.api_key} if self.api_key else None self.BLOG_CACHE[blog] = blog = self._call(endpoint, params)["blog"] return blog def avatar(self, blog, size="512"): """Retrieve a blog avatar""" if self.api_key: return "{}/v2/blog/{}/avatar/{}?api_key={}".format( self.ROOT, blog, size, self.api_key) endpoint = "/v2/blog/{}/avatar".format(blog) params = {"size": size} return self._call( endpoint, params, allow_redirects=False)["avatar_url"] def posts(self, blog, params): """Retrieve published posts""" params["offset"] = self.extractor.config("offset") params["limit"] = "50" params["reblog_info"] = "true" params["type"] = self.posts_type params["before"] = self.before if self.before and params["offset"]: self.log.warning("'offset' and 'date-max' cannot be used together") return self._pagination(blog, "/posts", params, cache=True) def likes(self, blog): """Retrieve liked posts""" params = {"limit": "50", "before": self.before} return self._pagination(blog, "/likes", params, key="liked_posts") def _call(self, endpoint, params, **kwargs): url = self.ROOT + endpoint kwargs["params"] = params while True: response = self.request(url, 
**kwargs) try: data = response.json() except ValueError: data = response.text status = response.status_code else: status = data["meta"]["status"] if 200 <= status < 400: return data["response"] self.log.debug(data) if status == 403: raise exception.AuthorizationError() elif status == 404: try: error = data["errors"][0]["detail"] board = ("only viewable within the Tumblr dashboard" in error) except Exception: board = False if board: self.log.info("Run 'gallery-dl oauth:tumblr' " "to access dashboard-only blogs") raise exception.AuthorizationError(error) raise exception.NotFoundError("user or post") elif status == 429: # daily rate limit if response.headers.get("x-ratelimit-perday-remaining") == "0": self.log.info("Daily API rate limit exceeded") reset = response.headers.get("x-ratelimit-perday-reset") api_key = self.api_key or self.session.auth.consumer_key if api_key == self.API_KEY: self.log.info( "Register your own OAuth application and use its " "credentials to prevent this error: https://githu" "b.com/mikf/gallery-dl/blob/master/docs/configurat" "ion.rst#extractortumblrapi-key--api-secret") if self.extractor.config("ratelimit") == "wait": self.extractor.wait(seconds=reset) continue t = (datetime.now() + timedelta(0, float(reset))).time() raise exception.StopExtraction( "Aborting - Rate limit will reset at %s", "{:02}:{:02}:{:02}".format(t.hour, t.minute, t.second)) # hourly rate limit reset = response.headers.get("x-ratelimit-perhour-reset") if reset: self.log.info("Hourly API rate limit exceeded") self.extractor.wait(seconds=reset) continue raise exception.StopExtraction(data) def _pagination(self, blog, endpoint, params, key="posts", cache=False): endpoint = "/v2/blog/{}{}".format(blog, endpoint) if self.api_key: params["api_key"] = self.api_key while True: data = self._call(endpoint, params) if cache: self.BLOG_CACHE[blog] = data["blog"] cache = False yield from data[key] try: endpoint = data["_links"]["next"]["href"] except KeyError: return params = None if self.api_key: endpoint += "&api_key=" + self.api_key ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/tumblrgallery.py0000644000175000017500000001072414571514301022020 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://tumblrgallery.xyz/""" from .common import GalleryExtractor from .. import text BASE_PATTERN = r"(?:https?://)?tumblrgallery\.xyz" class TumblrgalleryExtractor(GalleryExtractor): """Base class for tumblrgallery extractors""" category = "tumblrgallery" filename_fmt = "{category}_{gallery_id}_{num:>03}_{id}.{extension}" directory_fmt = ("{category}", "{gallery_id} {title}") root = "https://tumblrgallery.xyz" @staticmethod def _urls_from_page(page): return text.extract_iter( page, '
    ", "")), "gallery_id": self.gallery_id, } def images(self, _): page_num = 1 while True: url = "{}/tumblrblog/gallery/{}/{}.html".format( self.root, self.gallery_id, page_num) response = self.request(url, allow_redirects=False, fatal=False) if response.status_code >= 300: return for url in self._urls_from_page(response.text): yield url, self._data_from_url(url) page_num += 1 class TumblrgalleryPostExtractor(TumblrgalleryExtractor): """Extractor for Posts on tumblrgallery.xyz""" subcategory = "post" pattern = BASE_PATTERN + r"(/post/(\d+)\.html)" example = "https://tumblrgallery.xyz/post/12345.html" def __init__(self, match): TumblrgalleryExtractor.__init__(self, match) self.gallery_id = text.parse_int(match.group(2)) def metadata(self, page): return { "title" : text.remove_html( text.unescape(text.extr(page, "", "")) ).replace("_", "-"), "gallery_id": self.gallery_id, } def images(self, page): for url in self._urls_from_page(page): yield url, self._data_from_url(url) class TumblrgallerySearchExtractor(TumblrgalleryExtractor): """Extractor for Search result on tumblrgallery.xyz""" subcategory = "search" filename_fmt = "{category}_{num:>03}_{gallery_id}_{id}_{title}.{extension}" directory_fmt = ("{category}", "{search_term}") pattern = BASE_PATTERN + r"(/s\.php\?q=([^&#]+))" example = "https://tumblrgallery.xyz/s.php?q=QUERY" def __init__(self, match): TumblrgalleryExtractor.__init__(self, match) self.search_term = match.group(2) def metadata(self, page): return { "search_term": self.search_term, } def images(self, _): page_url = "s.php?q=" + self.search_term while True: page = self.request(self.root + "/" + page_url).text for gallery_id in text.extract_iter( page, '
    ", "") )).replace("_", "-") yield url, data next_url = text.extr( page, ' = 400: continue url = text.extr(response.text, 'name="twitter:image" value="', '"') if url: files.append({"url": url}) def _transform_tweet(self, tweet): if "author" in tweet: author = tweet["author"] elif "core" in tweet: author = tweet["core"]["user_results"]["result"] else: author = tweet["user"] author = self._transform_user(author) if "legacy" in tweet: legacy = tweet["legacy"] else: legacy = tweet tget = legacy.get tweet_id = int(legacy["id_str"]) if tweet_id >= 300000000000000: date = text.parse_timestamp( ((tweet_id >> 22) + 1288834974657) // 1000) else: try: date = text.parse_datetime( legacy["created_at"], "%a %b %d %H:%M:%S %z %Y") except Exception: date = util.NONE tdata = { "tweet_id" : tweet_id, "retweet_id" : text.parse_int( tget("retweeted_status_id_str")), "quote_id" : text.parse_int( tget("quoted_by_id_str")), "reply_id" : text.parse_int( tget("in_reply_to_status_id_str")), "conversation_id": text.parse_int( tget("conversation_id_str")), "date" : date, "author" : author, "user" : self._user or author, "lang" : legacy["lang"], "source" : text.extr(tweet["source"], ">", "<"), "sensitive" : tget("possibly_sensitive"), "favorite_count": tget("favorite_count"), "quote_count" : tget("quote_count"), "reply_count" : tget("reply_count"), "retweet_count" : tget("retweet_count"), } if "note_tweet" in tweet: note = tweet["note_tweet"]["note_tweet_results"]["result"] content = note["text"] entities = note["entity_set"] else: content = tget("full_text") or tget("text") or "" entities = legacy["entities"] hashtags = entities.get("hashtags") if hashtags: tdata["hashtags"] = [t["text"] for t in hashtags] mentions = entities.get("user_mentions") if mentions: tdata["mentions"] = [{ "id": text.parse_int(u["id_str"]), "name": u["screen_name"], "nick": u["name"], } for u in mentions] content = text.unescape(content) urls = entities.get("urls") if urls: for url in urls: content = content.replace(url["url"], url["expanded_url"]) txt, _, tco = content.rpartition(" ") tdata["content"] = txt if tco.startswith("https://t.co/") else content if "birdwatch_pivot" in tweet: tdata["birdwatch"] = tweet["birdwatch_pivot"]["subtitle"]["text"] if "in_reply_to_screen_name" in legacy: tdata["reply_to"] = legacy["in_reply_to_screen_name"] if "quoted_by" in legacy: tdata["quote_by"] = legacy["quoted_by"] if tdata["retweet_id"]: tdata["content"] = "RT @{}: {}".format( author["name"], tdata["content"]) tdata["date_original"] = text.parse_timestamp( ((tdata["retweet_id"] >> 22) + 1288834974657) // 1000) return tdata def _transform_user(self, user): try: uid = user.get("rest_id") or user["id_str"] except KeyError: # private/invalid user (#4349) return {} try: return self._user_cache[uid] except KeyError: pass if "legacy" in user: user = user["legacy"] uget = user.get if uget("withheld_scope"): self.log.warning("'%s'", uget("description")) entities = user["entities"] self._user_cache[uid] = udata = { "id" : text.parse_int(uid), "name" : user["screen_name"], "nick" : user["name"], "location" : uget("location"), "date" : text.parse_datetime( uget("created_at"), "%a %b %d %H:%M:%S %z %Y"), "verified" : uget("verified", False), "protected" : uget("protected", False), "profile_banner" : uget("profile_banner_url", ""), "profile_image" : uget( "profile_image_url_https", "").replace("_normal.", "."), "favourites_count": uget("favourites_count"), "followers_count" : uget("followers_count"), "friends_count" : uget("friends_count"), "listed_count" : 
uget("listed_count"), "media_count" : uget("media_count"), "statuses_count" : uget("statuses_count"), } descr = user["description"] urls = entities["description"].get("urls") if urls: for url in urls: descr = descr.replace(url["url"], url["expanded_url"]) udata["description"] = descr if "url" in entities: url = entities["url"]["urls"][0] udata["url"] = url.get("expanded_url") or url.get("url") return udata def _assign_user(self, user): self._user_obj = user self._user = self._transform_user(user) def _users_result(self, users): userfmt = self.config("users") if not userfmt or userfmt == "user": cls = TwitterUserExtractor fmt = (self.root + "/i/user/{rest_id}").format_map elif userfmt == "timeline": cls = TwitterTimelineExtractor fmt = (self.root + "/id:{rest_id}/timeline").format_map elif userfmt == "media": cls = TwitterMediaExtractor fmt = (self.root + "/id:{rest_id}/media").format_map elif userfmt == "tweets": cls = TwitterTweetsExtractor fmt = (self.root + "/id:{rest_id}/tweets").format_map else: cls = None fmt = userfmt.format_map for user in users: user["_extractor"] = cls yield Message.Queue, fmt(user), user def _expand_tweets(self, tweets): seen = set() for tweet in tweets: obj = tweet["legacy"] if "legacy" in tweet else tweet cid = obj.get("conversation_id_str") if not cid: tid = obj["id_str"] self.log.warning( "Unable to expand %s (no 'conversation_id')", tid) continue if cid in seen: self.log.debug( "Skipping expansion of %s (previously seen)", cid) continue seen.add(cid) try: yield from self.api.tweet_detail(cid) except Exception: yield tweet def _make_tweet(self, user, url, id_str): return { "id_str": id_str, "lang": None, "user": user, "source": "><", "entities": {}, "extended_entities": { "media": [ { "original_info": {}, "media_url": url, }, ], }, } def metadata(self): """Return general metadata""" return {} def tweets(self): """Yield all relevant tweet objects""" def login(self): if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: self.cookies_update(_login_impl(self, username, password)) class TwitterUserExtractor(TwitterExtractor): """Extractor for a Twitter user""" subcategory = "user" pattern = (BASE_PATTERN + r"/(?!search)(?:([^/?#]+)/?(?:$|[?#])" r"|i(?:/user/|ntent/user\?user_id=)(\d+))") example = "https://twitter.com/USER" def __init__(self, match): TwitterExtractor.__init__(self, match) user_id = match.group(2) if user_id: self.user = "id:" + user_id def initialize(self): pass def items(self): base = "{}/{}/".format(self.root, self.user) return self._dispatch_extractors(( (TwitterAvatarExtractor , base + "photo"), (TwitterBackgroundExtractor, base + "header_photo"), (TwitterTimelineExtractor , base + "timeline"), (TwitterTweetsExtractor , base + "tweets"), (TwitterMediaExtractor , base + "media"), (TwitterRepliesExtractor , base + "with_replies"), (TwitterLikesExtractor , base + "likes"), ), ("timeline",)) class TwitterTimelineExtractor(TwitterExtractor): """Extractor for a Twitter user timeline""" subcategory = "timeline" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/timeline(?!\w)" example = "https://twitter.com/USER/timeline" def tweets(self): # yield initial batch of (media) tweets tweet = None for tweet in self._select_tweet_source()(self.user): yield tweet if tweet is None: return # build search query query = "from:{} max_id:{}".format( self._user["name"], tweet["rest_id"]) if self.retweets: query += " include:retweets include:nativeretweets" if not self.textonly: # try to search for media-only tweets 
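            # (media attachments count as links in Twitter's search index,
            #  so adding 'filter:links' is likely to narrow the results
            #  down to Tweets that actually carry media)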
tweet = None for tweet in self.api.search_timeline(query + " filter:links"): yield tweet if tweet is not None: return # yield unfiltered search results yield from self.api.search_timeline(query) def _select_tweet_source(self): strategy = self.config("strategy") if strategy is None or strategy == "auto": if self.retweets or self.textonly: return self.api.user_tweets else: return self.api.user_media if strategy == "tweets": return self.api.user_tweets if strategy == "media": return self.api.user_media if strategy == "with_replies": return self.api.user_tweets_and_replies raise exception.StopExtraction("Invalid strategy '%s'", strategy) class TwitterTweetsExtractor(TwitterExtractor): """Extractor for Tweets from a user's Tweets timeline""" subcategory = "tweets" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/tweets(?!\w)" example = "https://twitter.com/USER/tweets" def tweets(self): return self.api.user_tweets(self.user) class TwitterRepliesExtractor(TwitterExtractor): """Extractor for Tweets from a user's timeline including replies""" subcategory = "replies" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/with_replies(?!\w)" example = "https://twitter.com/USER/with_replies" def tweets(self): return self.api.user_tweets_and_replies(self.user) class TwitterMediaExtractor(TwitterExtractor): """Extractor for Tweets from a user's Media timeline""" subcategory = "media" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/media(?!\w)" example = "https://twitter.com/USER/media" def tweets(self): return self.api.user_media(self.user) class TwitterLikesExtractor(TwitterExtractor): """Extractor for liked tweets""" subcategory = "likes" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/likes(?!\w)" example = "https://twitter.com/USER/likes" def metadata(self): return {"user_likes": self.user} def tweets(self): return self.api.user_likes(self.user) class TwitterBookmarkExtractor(TwitterExtractor): """Extractor for bookmarked tweets""" subcategory = "bookmark" pattern = BASE_PATTERN + r"/i/bookmarks()" example = "https://twitter.com/i/bookmarks" def tweets(self): return self.api.user_bookmarks() def _transform_tweet(self, tweet): tdata = TwitterExtractor._transform_tweet(self, tweet) tdata["date_bookmarked"] = text.parse_timestamp( (int(tweet["sortIndex"] or 0) >> 20) // 1000) return tdata class TwitterListExtractor(TwitterExtractor): """Extractor for Twitter lists""" subcategory = "list" pattern = BASE_PATTERN + r"/i/lists/(\d+)/?$" example = "https://twitter.com/i/lists/12345" def tweets(self): return self.api.list_latest_tweets_timeline(self.user) class TwitterListMembersExtractor(TwitterExtractor): """Extractor for members of a Twitter list""" subcategory = "list-members" pattern = BASE_PATTERN + r"/i/lists/(\d+)/members" example = "https://twitter.com/i/lists/12345/members" def items(self): self.login() return self._users_result(TwitterAPI(self).list_members(self.user)) class TwitterFollowingExtractor(TwitterExtractor): """Extractor for followed users""" subcategory = "following" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/following(?!\w)" example = "https://twitter.com/USER/following" def items(self): self.login() return self._users_result(TwitterAPI(self).user_following(self.user)) class TwitterSearchExtractor(TwitterExtractor): """Extractor for Twitter search results""" subcategory = "search" pattern = BASE_PATTERN + r"/search/?\?(?:[^&#]+&)*q=([^&#]+)" example = "https://twitter.com/search?q=QUERY" def metadata(self): return {"search": text.unquote(self.user)} def tweets(self): query = 
text.unquote(self.user.replace("+", " ")) user = None for item in query.split(): item = item.strip("()") if item.startswith("from:"): if user: user = None break else: user = item[5:] if user is not None: try: self._assign_user(self.api.user_by_screen_name(user)) except KeyError: pass return self.api.search_timeline(query) class TwitterHashtagExtractor(TwitterExtractor): """Extractor for Twitter hashtags""" subcategory = "hashtag" pattern = BASE_PATTERN + r"/hashtag/([^/?#]+)" example = "https://twitter.com/hashtag/NAME" def items(self): url = "{}/search?q=%23{}".format(self.root, self.user) data = {"_extractor": TwitterSearchExtractor} yield Message.Queue, url, data class TwitterCommunityExtractor(TwitterExtractor): """Extractor for a Twitter community""" subcategory = "community" pattern = BASE_PATTERN + r"/i/communities/(\d+)" example = "https://twitter.com/i/communities/12345" def tweets(self): if self.textonly: return self.api.community_tweets_timeline(self.user) return self.api.community_media_timeline(self.user) class TwitterCommunitiesExtractor(TwitterExtractor): """Extractor for followed Twitter communities""" subcategory = "communities" pattern = BASE_PATTERN + r"/([^/?#]+)/communities/?$" example = "https://twitter.com/i/communities" def tweets(self): return self.api.communities_main_page_timeline(self.user) class TwitterEventExtractor(TwitterExtractor): """Extractor for Tweets from a Twitter Event""" subcategory = "event" directory_fmt = ("{category}", "Events", "{event[id]} {event[short_title]}") pattern = BASE_PATTERN + r"/i/events/(\d+)" example = "https://twitter.com/i/events/12345" def metadata(self): return {"event": self.api.live_event(self.user)} def tweets(self): return self.api.live_event_timeline(self.user) class TwitterTweetExtractor(TwitterExtractor): """Extractor for individual tweets""" subcategory = "tweet" pattern = BASE_PATTERN + r"/([^/?#]+|i/web)/status/(\d+)/?$" example = "https://twitter.com/USER/status/12345" def __init__(self, match): TwitterExtractor.__init__(self, match) self.tweet_id = match.group(2) def tweets(self): conversations = self.config("conversations") if conversations: self._accessible = (conversations == "accessible") return self._tweets_conversation(self.tweet_id) endpoint = self.config("tweet-endpoint") if endpoint == "detail" or endpoint in (None, "auto") and \ self.api.headers["x-twitter-auth-type"]: return self._tweets_detail(self.tweet_id) return self._tweets_single(self.tweet_id) def _tweets_single(self, tweet_id): tweet = self.api.tweet_result_by_rest_id(tweet_id) try: self._assign_user(tweet["core"]["user_results"]["result"]) except KeyError: raise exception.StopExtraction( "'%s'", tweet.get("reason") or "Unavailable") yield tweet if not self.quoted: return while True: tweet_id = tweet["legacy"].get("quoted_status_id_str") if not tweet_id: break tweet = self.api.tweet_result_by_rest_id(tweet_id) tweet["legacy"]["quoted_by_id_str"] = tweet_id yield tweet def _tweets_detail(self, tweet_id): tweets = [] for tweet in self.api.tweet_detail(tweet_id): if tweet["rest_id"] == tweet_id or \ tweet.get("_retweet_id_str") == tweet_id: if self._user_obj is None: self._assign_user(tweet["core"]["user_results"]["result"]) tweets.append(tweet) tweet_id = tweet["legacy"].get("quoted_status_id_str") if not tweet_id: break return tweets def _tweets_conversation(self, tweet_id): tweets = self.api.tweet_detail(tweet_id) buffer = [] for tweet in tweets: buffer.append(tweet) if tweet["rest_id"] == tweet_id or \ tweet.get("_retweet_id_str") == tweet_id: 
self._assign_user(tweet["core"]["user_results"]["result"]) break else: # initial Tweet not accessible if self._accessible: return () return buffer return itertools.chain(buffer, tweets) class TwitterQuotesExtractor(TwitterExtractor): """Extractor for quotes of a Tweet""" subcategory = "quotes" pattern = BASE_PATTERN + r"/(?:[^/?#]+|i/web)/status/(\d+)/quotes" example = "https://twitter.com/USER/status/12345/quotes" def items(self): url = "{}/search?q=quoted_tweet_id:{}".format(self.root, self.user) data = {"_extractor": TwitterSearchExtractor} yield Message.Queue, url, data class TwitterAvatarExtractor(TwitterExtractor): subcategory = "avatar" filename_fmt = "avatar {date}.{extension}" archive_fmt = "AV_{user[id]}_{date}" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/photo" example = "https://twitter.com/USER/photo" def tweets(self): self.api._user_id_by_screen_name(self.user) user = self._user_obj url = user["legacy"]["profile_image_url_https"] if url == ("https://abs.twimg.com/sticky" "/default_profile_images/default_profile_normal.png"): return () url = url.replace("_normal.", ".") id_str = url.rsplit("/", 2)[1] return (self._make_tweet(user, url, id_str),) class TwitterBackgroundExtractor(TwitterExtractor): subcategory = "background" filename_fmt = "background {date}.{extension}" archive_fmt = "BG_{user[id]}_{date}" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/header_photo" example = "https://twitter.com/USER/header_photo" def tweets(self): self.api._user_id_by_screen_name(self.user) user = self._user_obj try: url = user["legacy"]["profile_banner_url"] _, timestamp = url.rsplit("/", 1) except (KeyError, ValueError): return () id_str = str((int(timestamp) * 1000 - 1288834974657) << 22) return (self._make_tweet(user, url, id_str),) class TwitterImageExtractor(Extractor): category = "twitter" subcategory = "image" pattern = r"https?://pbs\.twimg\.com/media/([\w-]+)(?:\?format=|\.)(\w+)" example = "https://pbs.twimg.com/media/ABCDE?format=jpg&name=orig" def __init__(self, match): Extractor.__init__(self, match) self.id, self.fmt = match.groups() TwitterExtractor._init_sizes(self) def items(self): base = "https://pbs.twimg.com/media/{}?format={}&name=".format( self.id, self.fmt) data = { "filename": self.id, "extension": self.fmt, "_fallback": TwitterExtractor._image_fallback(self, base), } yield Message.Directory, data yield Message.Url, base + self._size_image, data class TwitterAPI(): def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.root = "https://twitter.com/i/api" self._nsfw_warning = True self._json_dumps = json.JSONEncoder(separators=(",", ":")).encode cookies = extractor.cookies cookies_domain = extractor.cookies_domain csrf = extractor.config("csrf") if csrf is None or csrf == "cookies": csrf_token = cookies.get("ct0", domain=cookies_domain) else: csrf_token = None if not csrf_token: csrf_token = util.generate_token() cookies.set("ct0", csrf_token, domain=cookies_domain) auth_token = cookies.get("auth_token", domain=cookies_domain) self.headers = { "Accept": "*/*", "Referer": "https://twitter.com/", "content-type": "application/json", "x-guest-token": None, "x-twitter-auth-type": "OAuth2Session" if auth_token else None, "x-csrf-token": csrf_token, "x-twitter-client-language": "en", "x-twitter-active-user": "yes", "Sec-Fetch-Dest": "empty", "Sec-Fetch-Mode": "cors", "Sec-Fetch-Site": "same-origin", "authorization": "Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejR" "COuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu" "4FA33AGWWjCpTnA", } 
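        # default query parameters for the remaining non-GraphQL endpoints
        # (the v1.1/v2 live-event calls below); GraphQL endpoints instead
        # send JSON-encoded 'variables' and 'features' parameters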
self.params = { "include_profile_interstitial_type": "1", "include_blocking": "1", "include_blocked_by": "1", "include_followed_by": "1", "include_want_retweets": "1", "include_mute_edge": "1", "include_can_dm": "1", "include_can_media_tag": "1", "include_ext_has_nft_avatar": "1", "include_ext_is_blue_verified": "1", "include_ext_verified_type": "1", "skip_status": "1", "cards_platform": "Web-12", "include_cards": "1", "include_ext_alt_text": "true", "include_ext_limited_action_results": "false", "include_quote_count": "true", "include_reply_count": "1", "tweet_mode": "extended", "include_ext_collab_control": "true", "include_ext_views": "true", "include_entities": "true", "include_user_entities": "true", "include_ext_media_color": "true", "include_ext_media_availability": "true", "include_ext_sensitive_media_warning": "true", "include_ext_trusted_friends_metadata": "true", "send_error_codes": "true", "simple_quoted_tweet": "true", "q": None, "count": "100", "query_source": None, "cursor": None, "pc": None, "spelling_corrections": None, "include_ext_edit_control": "true", "ext": "mediaStats,highlightedLabel,hasNftAvatar,voiceInfo," "enrichments,superFollowMetadata,unmentionInfo,editControl," "collab_control,vibe", } self.features = { "hidden_profile_likes_enabled": True, "hidden_profile_subscriptions_enabled": True, "responsive_web_graphql_exclude_directive_enabled": True, "verified_phone_label_enabled": False, "highlights_tweets_tab_ui_enabled": True, "responsive_web_twitter_article_notes_tab_enabled": True, "creator_subscriptions_tweet_preview_api_enabled": True, "responsive_web_graphql_" "skip_user_profile_image_extensions_enabled": False, "responsive_web_graphql_timeline_navigation_enabled": True, } self.features_pagination = { "responsive_web_graphql_exclude_directive_enabled": True, "verified_phone_label_enabled": False, "creator_subscriptions_tweet_preview_api_enabled": True, "responsive_web_graphql_timeline_navigation_enabled": True, "responsive_web_graphql_skip_user_profile_" "image_extensions_enabled": False, "c9s_tweet_anatomy_moderator_badge_enabled": True, "tweetypie_unmention_optimization_enabled": True, "responsive_web_edit_tweet_api_enabled": True, "graphql_is_translatable_rweb_tweet_is_translatable_enabled": True, "view_counts_everywhere_api_enabled": True, "longform_notetweets_consumption_enabled": True, "responsive_web_twitter_article_tweet_consumption_enabled": True, "tweet_awards_web_tipping_enabled": False, "freedom_of_speech_not_reach_fetch_enabled": True, "standardized_nudges_misinfo": True, "tweet_with_visibility_results_prefer_gql_" "limited_actions_policy_enabled": True, "rweb_video_timestamps_enabled": True, "longform_notetweets_rich_text_read_enabled": True, "longform_notetweets_inline_media_enabled": True, "responsive_web_media_download_video_enabled": True, "responsive_web_enhance_cards_enabled": False, } def tweet_result_by_rest_id(self, tweet_id): endpoint = "/graphql/MWY3AO9_I3rcP_L2A4FR4A/TweetResultByRestId" variables = { "tweetId": tweet_id, "withCommunity": False, "includePromotedContent": False, "withVoice": False, } params = { "variables": self._json_dumps(variables), "features" : self._json_dumps(self.features_pagination), } tweet = self._call(endpoint, params)["data"]["tweetResult"]["result"] if "tweet" in tweet: tweet = tweet["tweet"] if tweet.get("__typename") == "TweetUnavailable": reason = tweet.get("reason") if reason == "NsfwLoggedOut": raise exception.AuthorizationError("NSFW Tweet") if reason == "Protected": raise 
exception.AuthorizationError("Protected Tweet") raise exception.StopExtraction("Tweet unavailable ('%s')", reason) return tweet def tweet_detail(self, tweet_id): endpoint = "/graphql/B9_KmbkLhXt6jRwGjJrweg/TweetDetail" variables = { "focalTweetId": tweet_id, "referrer": "profile", "with_rux_injections": False, "includePromotedContent": False, "withCommunity": True, "withQuickPromoteEligibilityTweetFields": True, "withBirdwatchNotes": True, "withVoice": True, "withV2Timeline": True, } return self._pagination_tweets( endpoint, variables, ("threaded_conversation_with_injections_v2",)) def user_tweets(self, screen_name): endpoint = "/graphql/5ICa5d9-AitXZrIA3H-4MQ/UserTweets" variables = { "userId": self._user_id_by_screen_name(screen_name), "count": 100, "includePromotedContent": False, "withQuickPromoteEligibilityTweetFields": True, "withVoice": True, "withV2Timeline": True, } return self._pagination_tweets(endpoint, variables) def user_tweets_and_replies(self, screen_name): endpoint = "/graphql/UtLStR_BnYUGD7Q453UXQg/UserTweetsAndReplies" variables = { "userId": self._user_id_by_screen_name(screen_name), "count": 100, "includePromotedContent": False, "withCommunity": True, "withVoice": True, "withV2Timeline": True, } return self._pagination_tweets(endpoint, variables) def user_media(self, screen_name): endpoint = "/graphql/tO4LMUYAZbR4T0SqQ85aAw/UserMedia" variables = { "userId": self._user_id_by_screen_name(screen_name), "count": 100, "includePromotedContent": False, "withClientEventToken": False, "withBirdwatchNotes": False, "withVoice": True, "withV2Timeline": True, } return self._pagination_tweets(endpoint, variables) def user_likes(self, screen_name): endpoint = "/graphql/9s8V6sUI8fZLDiN-REkAxA/Likes" variables = { "userId": self._user_id_by_screen_name(screen_name), "count": 100, "includePromotedContent": False, "withClientEventToken": False, "withBirdwatchNotes": False, "withVoice": True, "withV2Timeline": True, } return self._pagination_tweets(endpoint, variables) def user_bookmarks(self): endpoint = "/graphql/cQxQgX8MJYjWwC0dxpyfYg/Bookmarks" variables = { "count": 100, "includePromotedContent": False, } features = self.features_pagination.copy() features["graphql_timeline_v2_bookmark_timeline"] = True return self._pagination_tweets( endpoint, variables, ("bookmark_timeline_v2", "timeline"), False, features=features) def list_latest_tweets_timeline(self, list_id): endpoint = "/graphql/HjsWc-nwwHKYwHenbHm-tw/ListLatestTweetsTimeline" variables = { "listId": list_id, "count": 100, } return self._pagination_tweets( endpoint, variables, ("list", "tweets_timeline", "timeline")) def search_timeline(self, query): endpoint = "/graphql/fZK7JipRHWtiZsTodhsTfQ/SearchTimeline" variables = { "rawQuery": query, "count": 100, "querySource": "", "product": "Latest", } return self._pagination_tweets( endpoint, variables, ("search_by_raw_query", "search_timeline", "timeline")) def community_tweets_timeline(self, community_id): endpoint = "/graphql/7B2AdxSuC-Er8qUr3Plm_w/CommunityTweetsTimeline" variables = { "communityId": community_id, "count": 100, "displayLocation": "Community", "rankingMode": "Recency", "withCommunity": True, } return self._pagination_tweets( endpoint, variables, ("communityResults", "result", "ranked_community_timeline", "timeline")) def community_media_timeline(self, community_id): endpoint = "/graphql/qAGUldfcIoMv5KyAyVLYog/CommunityMediaTimeline" variables = { "communityId": community_id, "count": 100, "withCommunity": True, } return self._pagination_tweets( endpoint, 
variables, ("communityResults", "result", "community_media_timeline", "timeline")) def communities_main_page_timeline(self, screen_name): endpoint = ("/graphql/GtOhw2mstITBepTRppL6Uw" "/CommunitiesMainPageTimeline") variables = { "count": 100, "withCommunity": True, } return self._pagination_tweets( endpoint, variables, ("viewer", "communities_timeline", "timeline")) def live_event_timeline(self, event_id): endpoint = "/2/live_event/timeline/{}.json".format(event_id) params = self.params.copy() params["timeline_id"] = "recap" params["urt"] = "true" params["get_annotations"] = "true" return self._pagination_legacy(endpoint, params) def live_event(self, event_id): endpoint = "/1.1/live_event/1/{}/timeline.json".format(event_id) params = self.params.copy() params["count"] = "0" params["urt"] = "true" return (self._call(endpoint, params) ["twitter_objects"]["live_events"][event_id]) def list_members(self, list_id): endpoint = "/graphql/BQp2IEYkgxuSxqbTAr1e1g/ListMembers" variables = { "listId": list_id, "count": 100, "withSafetyModeUserFields": True, } return self._pagination_users( endpoint, variables, ("list", "members_timeline", "timeline")) def user_following(self, screen_name): endpoint = "/graphql/PAnE9toEjRfE-4tozRcsfw/Following" variables = { "userId": self._user_id_by_screen_name(screen_name), "count": 100, "includePromotedContent": False, } return self._pagination_users(endpoint, variables) @memcache(keyarg=1) def user_by_rest_id(self, rest_id): endpoint = "/graphql/tD8zKvQzwY3kdx5yz6YmOw/UserByRestId" features = self.features params = { "variables": self._json_dumps({ "userId": rest_id, "withSafetyModeUserFields": True, }), "features": self._json_dumps(features), } return self._call(endpoint, params)["data"]["user"]["result"] @memcache(keyarg=1) def user_by_screen_name(self, screen_name): endpoint = "/graphql/k5XapwcSikNsEsILW5FvgA/UserByScreenName" features = self.features.copy() features["subscriptions_verification_info_" "is_identity_verified_enabled"] = True features["subscriptions_verification_info_" "verified_since_enabled"] = True params = { "variables": self._json_dumps({ "screen_name": screen_name, "withSafetyModeUserFields": True, }), "features": self._json_dumps(features), } return self._call(endpoint, params)["data"]["user"]["result"] def _user_id_by_screen_name(self, screen_name): user = () try: if screen_name.startswith("id:"): user = self.user_by_rest_id(screen_name[3:]) else: user = self.user_by_screen_name(screen_name) self.extractor._assign_user(user) return user["rest_id"] except KeyError: if "unavailable_message" in user: raise exception.NotFoundError("{} ({})".format( user["unavailable_message"].get("text"), user.get("reason")), False) else: raise exception.NotFoundError("user") @cache(maxage=3600) def _guest_token(self): endpoint = "/1.1/guest/activate.json" self.log.info("Requesting guest token") return str(self._call( endpoint, None, "POST", False, "https://api.twitter.com", )["guest_token"]) def _authenticate_guest(self): guest_token = self._guest_token() if guest_token != self.headers["x-guest-token"]: self.headers["x-guest-token"] = guest_token self.extractor.cookies.set( "gt", guest_token, domain=self.extractor.cookies_domain) def _call(self, endpoint, params, method="GET", auth=True, root=None): url = (root or self.root) + endpoint while True: if not self.headers["x-twitter-auth-type"] and auth: self._authenticate_guest() response = self.extractor.request( url, method=method, params=params, headers=self.headers, fatal=None) # update 'x-csrf-token' header 
(#1170) csrf_token = response.cookies.get("ct0") if csrf_token: self.headers["x-csrf-token"] = csrf_token if response.status_code < 400: data = response.json() errors = data.get("errors") if not errors: return data retry = False for error in errors: msg = error.get("message") or "Unspecified" self.log.debug("API error: '%s'", msg) if "this account is temporarily locked" in msg: msg = "Account temporarily locked" if self.extractor.config("locked") != "wait": raise exception.AuthorizationError(msg) self.log.warning("%s. Press ENTER to retry.", msg) try: input() except (EOFError, OSError): pass retry = True elif msg.lower().startswith("timeout"): retry = True if not retry: return data elif self.headers["x-twitter-auth-type"]: self.log.debug("Retrying API request") continue # fall through to "Login Required" response.status_code = 404 if response.status_code == 429: # rate limit exceeded if self.extractor.config("ratelimit") == "abort": raise exception.StopExtraction("Rate limit exceeded") until = response.headers.get("x-rate-limit-reset") seconds = None if until else 60 self.extractor.wait(until=until, seconds=seconds) continue if response.status_code in (403, 404) and \ not self.headers["x-twitter-auth-type"]: raise exception.AuthorizationError("Login required") # error try: data = response.json() errors = ", ".join(e["message"] for e in data["errors"]) except ValueError: errors = response.text except Exception: errors = data.get("errors", "") raise exception.StopExtraction( "%s %s (%s)", response.status_code, response.reason, errors) def _pagination_legacy(self, endpoint, params): original_retweets = (self.extractor.retweets == "original") bottom = ("cursor-bottom-", "sq-cursor-bottom") while True: data = self._call(endpoint, params) instructions = data["timeline"]["instructions"] if not instructions: return tweets = data["globalObjects"]["tweets"] users = data["globalObjects"]["users"] tweet_id = cursor = None tweet_ids = [] entries = () # process instructions for instr in instructions: if "addEntries" in instr: entries = instr["addEntries"]["entries"] elif "replaceEntry" in instr: entry = instr["replaceEntry"]["entry"] if entry["entryId"].startswith(bottom): cursor = (entry["content"]["operation"] ["cursor"]["value"]) # collect tweet IDs and cursor value for entry in entries: entry_startswith = entry["entryId"].startswith if entry_startswith(("tweet-", "sq-I-t-")): tweet_ids.append( entry["content"]["item"]["content"]["tweet"]["id"]) elif entry_startswith("homeConversation-"): tweet_ids.extend( entry["content"]["timelineModule"]["metadata"] ["conversationMetadata"]["allTweetIds"][::-1]) elif entry_startswith(bottom): cursor = entry["content"]["operation"]["cursor"] if not cursor.get("stopOnEmptyResponse", True): # keep going even if there are no tweets tweet_id = True cursor = cursor["value"] elif entry_startswith("conversationThread-"): tweet_ids.extend( item["entryId"][6:] for item in entry["content"]["timelineModule"]["items"] if item["entryId"].startswith("tweet-") ) # process tweets for tweet_id in tweet_ids: try: tweet = tweets[tweet_id] except KeyError: self.log.debug("Skipping %s (deleted)", tweet_id) continue if "retweeted_status_id_str" in tweet: retweet = tweets.get(tweet["retweeted_status_id_str"]) if original_retweets: if not retweet: continue retweet["retweeted_status_id_str"] = retweet["id_str"] retweet["_retweet_id_str"] = tweet["id_str"] tweet = retweet elif retweet: tweet["author"] = users[retweet["user_id_str"]] if "extended_entities" in retweet and \ 
"extended_entities" not in tweet: tweet["extended_entities"] = \ retweet["extended_entities"] tweet["user"] = users[tweet["user_id_str"]] yield tweet if "quoted_status_id_str" in tweet: quoted = tweets.get(tweet["quoted_status_id_str"]) if quoted: quoted = quoted.copy() quoted["author"] = users[quoted["user_id_str"]] quoted["quoted_by"] = tweet["user"]["screen_name"] quoted["quoted_by_id_str"] = tweet["id_str"] yield quoted # stop on empty response if not cursor or (not tweets and not tweet_id): return params["cursor"] = cursor def _pagination_tweets(self, endpoint, variables, path=None, stop_tweets=True, features=None): extr = self.extractor original_retweets = (extr.retweets == "original") pinned_tweet = extr.pinned params = {"variables": None} if features is None: features = self.features_pagination if features: params["features"] = self._json_dumps(features) while True: params["variables"] = self._json_dumps(variables) data = self._call(endpoint, params)["data"] try: if path is None: instructions = (data["user"]["result"]["timeline_v2"] ["timeline"]["instructions"]) else: instructions = data for key in path: instructions = instructions[key] instructions = instructions["instructions"] cursor = None entries = None for instr in instructions: instr_type = instr.get("type") if instr_type == "TimelineAddEntries": if entries: entries.extend(instr["entries"]) else: entries = instr["entries"] elif instr_type == "TimelineAddToModule": entries = instr["moduleItems"] elif instr_type == "TimelineReplaceEntry": entry = instr["entry"] if entry["entryId"].startswith("cursor-bottom-"): cursor = entry["content"]["value"] if entries is None: if not cursor: return entries = () except LookupError: extr.log.debug(data) user = extr._user_obj if user: user = user["legacy"] if user.get("blocked_by"): if self.headers["x-twitter-auth-type"] and \ extr.config("logout"): extr.cookies_file = None del extr.cookies["auth_token"] self.headers["x-twitter-auth-type"] = None extr.log.info("Retrying API request as guest") continue raise exception.AuthorizationError( "{} blocked your account".format( user["screen_name"])) elif user.get("protected"): raise exception.AuthorizationError( "{}'s Tweets are protected".format( user["screen_name"])) raise exception.StopExtraction( "Unable to retrieve Tweets from this timeline") tweets = [] tweet = None if pinned_tweet: pinned_tweet = False if instructions[-1]["type"] == "TimelinePinEntry": tweets.append(instructions[-1]["entry"]) for entry in entries: esw = entry["entryId"].startswith if esw("tweet-"): tweets.append(entry) elif esw(("profile-grid-", "communities-grid-")): if "content" in entry: tweets.extend(entry["content"]["items"]) else: tweets.append(entry) elif esw(("homeConversation-", "profile-conversation-", "conversationthread-")): tweets.extend(entry["content"]["items"]) elif esw("tombstone-"): item = entry["content"]["itemContent"] item["tweet_results"] = \ {"result": {"tombstone": item["tombstoneInfo"]}} tweets.append(entry) elif esw("cursor-bottom-"): cursor = entry["content"] if "itemContent" in cursor: cursor = cursor["itemContent"] if not cursor.get("stopOnEmptyResponse", True): # keep going even if there are no tweets tweet = True cursor = cursor.get("value") for entry in tweets: try: item = ((entry.get("content") or entry["item"]) ["itemContent"]) if "promotedMetadata" in item and not extr.ads: extr.log.debug( "Skipping %s (ad)", (entry.get("entryId") or "").rpartition("-")[2]) continue tweet = item["tweet_results"]["result"] if "tombstone" in tweet: tweet = 
self._process_tombstone( entry, tweet["tombstone"]) if not tweet: continue if "tweet" in tweet: tweet = tweet["tweet"] legacy = tweet["legacy"] tweet["sortIndex"] = entry.get("sortIndex") except KeyError: extr.log.debug( "Skipping %s (deleted)", (entry.get("entryId") or "").rpartition("-")[2]) continue if "retweeted_status_result" in legacy: retweet = legacy["retweeted_status_result"]["result"] if "tweet" in retweet: retweet = retweet["tweet"] if original_retweets: try: retweet["legacy"]["retweeted_status_id_str"] = \ retweet["rest_id"] retweet["_retweet_id_str"] = tweet["rest_id"] tweet = retweet except KeyError: continue else: try: legacy["retweeted_status_id_str"] = \ retweet["rest_id"] tweet["author"] = \ retweet["core"]["user_results"]["result"] rtlegacy = retweet["legacy"] if "note_tweet" in retweet: tweet["note_tweet"] = retweet["note_tweet"] if "extended_entities" in rtlegacy and \ "extended_entities" not in legacy: legacy["extended_entities"] = \ rtlegacy["extended_entities"] if "withheld_scope" in rtlegacy and \ "withheld_scope" not in legacy: legacy["withheld_scope"] = \ rtlegacy["withheld_scope"] legacy["full_text"] = rtlegacy["full_text"] except KeyError: pass yield tweet if "quoted_status_result" in tweet: try: quoted = tweet["quoted_status_result"]["result"] quoted["legacy"]["quoted_by"] = ( tweet["core"]["user_results"]["result"] ["legacy"]["screen_name"]) quoted["legacy"]["quoted_by_id_str"] = tweet["rest_id"] quoted["sortIndex"] = entry.get("sortIndex") yield quoted except KeyError: extr.log.debug( "Skipping quote of %s (deleted)", tweet.get("rest_id")) continue if stop_tweets and not tweet: return if not cursor or cursor == variables.get("cursor"): return variables["cursor"] = cursor def _pagination_users(self, endpoint, variables, path=None): params = { "variables": None, "features" : self._json_dumps(self.features_pagination), } while True: cursor = entry = None params["variables"] = self._json_dumps(variables) data = self._call(endpoint, params)["data"] try: if path is None: instructions = (data["user"]["result"]["timeline"] ["timeline"]["instructions"]) else: for key in path: data = data[key] instructions = data["instructions"] except KeyError: return for instr in instructions: if instr["type"] == "TimelineAddEntries": for entry in instr["entries"]: if entry["entryId"].startswith("user-"): try: user = (entry["content"]["itemContent"] ["user_results"]["result"]) except KeyError: pass else: if "rest_id" in user: yield user elif entry["entryId"].startswith("cursor-bottom-"): cursor = entry["content"]["value"] if not cursor or cursor.startswith(("-1|", "0|")) or not entry: return variables["cursor"] = cursor def _process_tombstone(self, entry, tombstone): text = (tombstone.get("richText") or tombstone["text"])["text"] tweet_id = entry["entryId"].rpartition("-")[2] if text.startswith("Age-restricted"): if self._nsfw_warning: self._nsfw_warning = False self.log.warning('"%s"', text) self.log.debug("Skipping %s ('%s')", tweet_id, text) @cache(maxage=365*86400, keyarg=1) def _login_impl(extr, username, password): import re import random if re.fullmatch(r"[\w.%+-]+@[\w.-]+\.\w{2,}", username): extr.log.warning( "Login with email is no longer possible. 
" "You need to provide your username or phone number instead.") def process(response): try: data = response.json() except ValueError: data = {"errors": ({"message": "Invalid response"},)} else: if response.status_code < 400: return data["flow_token"] errors = [] for error in data.get("errors") or (): msg = error.get("message") errors.append('"{}"'.format(msg) if msg else "Unknown error") extr.log.debug(response.text) raise exception.AuthenticationError(", ".join(errors)) extr.cookies.clear() api = TwitterAPI(extr) api._authenticate_guest() headers = api.headers extr.log.info("Logging in as %s", username) # init data = { "input_flow_data": { "flow_context": { "debug_overrides": {}, "start_location": {"location": "unknown"}, }, }, "subtask_versions": { "action_list": 2, "alert_dialog": 1, "app_download_cta": 1, "check_logged_in_account": 1, "choice_selection": 3, "contacts_live_sync_permission_prompt": 0, "cta": 7, "email_verification": 2, "end_flow": 1, "enter_date": 1, "enter_email": 2, "enter_password": 5, "enter_phone": 2, "enter_recaptcha": 1, "enter_text": 5, "enter_username": 2, "generic_urt": 3, "in_app_notification": 1, "interest_picker": 3, "js_instrumentation": 1, "menu_dialog": 1, "notifications_permission_prompt": 2, "open_account": 2, "open_home_timeline": 1, "open_link": 1, "phone_verification": 4, "privacy_options": 1, "security_key": 3, "select_avatar": 4, "select_banner": 2, "settings_list": 7, "show_code": 1, "sign_up": 2, "sign_up_review": 4, "tweet_selection_urt": 1, "update_users": 1, "upload_media": 1, "user_recommendations_list": 4, "user_recommendations_urt": 1, "wait_spinner": 3, "web_modal": 1, }, } url = "https://api.twitter.com/1.1/onboarding/task.json?flow_name=login" response = extr.request(url, method="POST", headers=headers, json=data) data = { "flow_token": process(response), "subtask_inputs": [ { "subtask_id": "LoginJsInstrumentationSubtask", "js_instrumentation": { "response": "{}", "link": "next_link", }, }, ], } url = "https://api.twitter.com/1.1/onboarding/task.json" response = extr.request( url, method="POST", headers=headers, json=data, fatal=None) # username data = { "flow_token": process(response), "subtask_inputs": [ { "subtask_id": "LoginEnterUserIdentifierSSO", "settings_list": { "setting_responses": [ { "key": "user_identifier", "response_data": { "text_data": {"result": username}, }, }, ], "link": "next_link", }, }, ], } # url = "https://api.twitter.com/1.1/onboarding/task.json" extr.sleep(random.uniform(2.0, 4.0), "login (username)") response = extr.request( url, method="POST", headers=headers, json=data, fatal=None) # password data = { "flow_token": process(response), "subtask_inputs": [ { "subtask_id": "LoginEnterPassword", "enter_password": { "password": password, "link": "next_link", }, }, ], } # url = "https://api.twitter.com/1.1/onboarding/task.json" extr.sleep(random.uniform(2.0, 4.0), "login (password)") response = extr.request( url, method="POST", headers=headers, json=data, fatal=None) # account duplication check ? 
    data = {
        "flow_token": process(response),
        "subtask_inputs": [
            {
                "subtask_id": "AccountDuplicationCheck",
                "check_logged_in_account": {
                    "link": "AccountDuplicationCheck_false",
                },
            },
        ],
    }
    # url = "https://api.twitter.com/1.1/onboarding/task.json"
    response = extr.request(
        url, method="POST", headers=headers, json=data, fatal=None)
    process(response)

    return {
        cookie.name: cookie.value
        for cookie in extr.cookies
    }


# gallery_dl-1.26.9/gallery_dl/extractor/unsplash.py

# -*- coding: utf-8 -*-

# Copyright 2021-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://unsplash.com/"""

from .common import Extractor, Message
from .. import text, util

BASE_PATTERN = r"(?:https?://)?unsplash\.com"


class UnsplashExtractor(Extractor):
    """Base class for unsplash extractors"""
    category = "unsplash"
    directory_fmt = ("{category}", "{user[username]}")
    filename_fmt = "{id}.{extension}"
    archive_fmt = "{id}"
    root = "https://unsplash.com"
    page_start = 1
    per_page = 20

    def __init__(self, match):
        Extractor.__init__(self, match)
        self.item = match.group(1)

    def items(self):
        fmt = self.config("format") or "raw"
        metadata = self.metadata()
        for photo in self.photos():
            util.delete_items(
                photo, ("current_user_collections", "related_collections"))
            url = photo["urls"][fmt]
            text.nameext_from_url(url, photo)

            if metadata:
                photo.update(metadata)
            photo["extension"] = "jpg"
            photo["date"] = text.parse_datetime(photo["created_at"])
            if "tags" in photo:
                photo["tags"] = [t["title"] for t in photo["tags"]]

            yield Message.Directory, photo
            yield Message.Url, url, photo

    @staticmethod
    def metadata():
        return None

    def skip(self, num):
        pages = num // self.per_page
        self.page_start += pages
        return pages * self.per_page

    def _pagination(self, url, params, results=False):
        params["per_page"] = self.per_page
        params["page"] = self.page_start

        while True:
            photos = self.request(url, params=params).json()
            if results:
                photos = photos["results"]
            yield from photos

            if len(photos) < self.per_page:
                return
            params["page"] += 1


class UnsplashImageExtractor(UnsplashExtractor):
    """Extractor for a single unsplash photo"""
    subcategory = "image"
    pattern = BASE_PATTERN + r"/photos/([^/?#]+)"
    example = "https://unsplash.com/photos/ID"

    def photos(self):
        url = "{}/napi/photos/{}".format(self.root, self.item)
        return (self.request(url).json(),)


class UnsplashUserExtractor(UnsplashExtractor):
    """Extractor for all photos of an unsplash user"""
    subcategory = "user"
    pattern = BASE_PATTERN + r"/@(\w+)/?$"
    example = "https://unsplash.com/@USER"

    def photos(self):
        url = "{}/napi/users/{}/photos".format(self.root, self.item)
        params = {"order_by": "latest"}
        return self._pagination(url, params)


class UnsplashFavoriteExtractor(UnsplashExtractor):
    """Extractor for all likes of an unsplash user"""
    subcategory = "favorite"
    pattern = BASE_PATTERN + r"/@(\w+)/likes"
    example = "https://unsplash.com/@USER/likes"

    def photos(self):
        url = "{}/napi/users/{}/likes".format(self.root, self.item)
        params = {"order_by": "latest"}
        return self._pagination(url, params)


class UnsplashCollectionExtractor(UnsplashExtractor):
    """Extractor for an unsplash collection"""
    subcategory = "collection"
    pattern = BASE_PATTERN + r"/collections/([^/?#]+)(?:/([^/?#]+))?"
    example = "https://unsplash.com/collections/12345/TITLE"

    def __init__(self, match):
        UnsplashExtractor.__init__(self, match)
        self.title = match.group(2) or ""

    def metadata(self):
        return {"collection_id": self.item, "collection_title": self.title}

    def photos(self):
        url = "{}/napi/collections/{}/photos".format(self.root, self.item)
        params = {"order_by": "latest"}
        return self._pagination(url, params)


class UnsplashSearchExtractor(UnsplashExtractor):
    """Extractor for unsplash search results"""
    subcategory = "search"
    pattern = BASE_PATTERN + r"/s/photos/([^/?#]+)(?:\?([^#]+))?"
    example = "https://unsplash.com/s/photos/QUERY"

    def __init__(self, match):
        UnsplashExtractor.__init__(self, match)
        self.query = match.group(2)

    def photos(self):
        url = self.root + "/napi/search/photos"
        params = {"query": text.unquote(self.item.replace('-', ' '))}
        if self.query:
            params.update(text.parse_query(self.query))
        return self._pagination(url, params, True)


# gallery_dl-1.26.9/gallery_dl/extractor/uploadir.py

# -*- coding: utf-8 -*-

# Copyright 2022-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://uploadir.com/"""

from .common import Extractor, Message
from .. import text


class UploadirFileExtractor(Extractor):
    """Extractor for uploadir files"""
    category = "uploadir"
    subcategory = "file"
    root = "https://uploadir.com"
    filename_fmt = "{filename} ({id}).{extension}"
    archive_fmt = "{id}"
    pattern = r"(?:https?://)?uploadir\.com/(?:user/)?u(?:ploads)?/([^/?#]+)"
    example = "https://uploadir.com/u/ID"

    def __init__(self, match):
        Extractor.__init__(self, match)
        self.file_id = match.group(1)

    def items(self):
        url = "{}/u/{}".format(self.root, self.file_id)
        response = self.request(url, method="HEAD", allow_redirects=False)

        if 300 <= response.status_code < 400:
            url = response.headers["Location"]
            extr = text.extract_from(self.request(url).text)
            name = text.unescape(extr("", "").strip())
            url = self.root + extr('class="form" action="', '"')
            token = extr('name="authenticity_token" value="', '"')
            data = text.nameext_from_url(name, {
                "_http_method": "POST",
                "_http_data" : {
                    "authenticity_token": token,
                    "upload_id": self.file_id,
                },
            })
        else:
            hcd = response.headers.get("Content-Disposition")
            name = (hcd.partition("filename*=UTF-8''")[2] or
                    text.extr(hcd, 'filename="', '"'))
            data = text.nameext_from_url(name)

        data["id"] = self.file_id
        yield Message.Directory, data
        yield Message.Url, url, data


# gallery_dl-1.26.9/gallery_dl/extractor/urlgalleries.py

# -*- coding: utf-8 -*-

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://urlgalleries.net/"""

from .common import GalleryExtractor, Message
from .. import text


class UrlgalleriesGalleryExtractor(GalleryExtractor):
    """Base class for Urlgalleries extractors"""
    category = "urlgalleries"
    root = "urlgalleries.net"
    request_interval = (0.5, 1.0)
    pattern = r"(?:https?://)(?:(\w+)\.)?urlgalleries\.net/(?:[\w-]+-)?(\d+)"
    example = "https://blog.urlgalleries.net/gallery-12345/TITLE"

    def __init__(self, match):
        self.blog, self.gallery_id = match.groups()
        url = "https://{}.urlgalleries.net/porn-gallery-{}/?a=10000".format(
            self.blog, self.gallery_id)
        GalleryExtractor.__init__(self, match, url)

    def items(self):
        page = self.request(self.gallery_url).text
        imgs = self.images(page)
        data = self.metadata(page)
        data["count"] = len(imgs)
        del page

        root = "https://{}.urlgalleries.net".format(self.blog)
        yield Message.Directory, data
        for data["num"], img in enumerate(imgs, 1):
            response = self.request(
                root + img, method="HEAD", allow_redirects=False)
            yield Message.Queue, response.headers["Location"], data

    def metadata(self, page):
        extr = text.extract_from(page)
        return {
            "gallery_id": self.gallery_id,
            "_site": extr(' title="', '"'),  # site name
            "blog" : text.unescape(extr(' title="', '"')),
            "_rprt": extr(' title="', '"'),  # report button
            "title": text.unescape(extr(' title="', '"').strip()),
            "date" : text.parse_datetime(
                extr(" images in gallery | ", "<"), "%B %d, %Y %H:%M"),
        }

    def images(self, page):
        imgs = text.extr(page, 'id="wtf"', "
    ") return list(text.extract_iter(imgs, " href='", "'")) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/urlshortener.py0000644000175000017500000000306414571514301021666 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for general-purpose URL shorteners""" from .common import BaseExtractor, Message from .. import exception class UrlshortenerExtractor(BaseExtractor): """Base class for URL shortener extractors""" basecategory = "urlshortener" BASE_PATTERN = UrlshortenerExtractor.update({ "bitly": { "root": "https://bit.ly", "pattern": r"bit\.ly", }, "tco": { # t.co sends 'http-equiv="refresh"' (200) when using browser UA "headers": {"User-Agent": None}, "root": "https://t.co", "pattern": r"t\.co", }, }) class UrlshortenerLinkExtractor(UrlshortenerExtractor): """Extractor for general-purpose URL shorteners""" subcategory = "link" pattern = BASE_PATTERN + r"/([^/?#]+)" example = "https://bit.ly/abcde" def __init__(self, match): UrlshortenerExtractor.__init__(self, match) self.id = match.group(match.lastindex) def _init(self): self.headers = self.config_instance("headers") def items(self): response = self.request( "{}/{}".format(self.root, self.id), headers=self.headers, method="HEAD", allow_redirects=False, notfound="URL") try: yield Message.Queue, response.headers["location"], {} except KeyError: raise exception.StopExtraction("Unable to resolve short URL") ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/vanillarock.py0000644000175000017500000000520514571514301021436 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://vanilla-rock.com/""" from .common import Extractor, Message from .. import text class VanillarockExtractor(Extractor): """Base class for vanillarock extractors""" category = "vanillarock" root = "https://vanilla-rock.com" def __init__(self, match): Extractor.__init__(self, match) self.path = match.group(1) class VanillarockPostExtractor(VanillarockExtractor): """Extractor for blogposts on vanilla-rock.com""" subcategory = "post" directory_fmt = ("{category}", "{path}") filename_fmt = "{num:>02}.{extension}" archive_fmt = "{filename}" pattern = (r"(?:https?://)?(?:www\.)?vanilla-rock\.com" r"(/(?!category/|tag/)[^/?#]+)/?$") example = "https://vanilla-rock.com/TITLE" def items(self): extr = text.extract_from(self.request(self.root + self.path).text) name = extr('

    ', "<") imgs = [] while True: img = extr('
    ', '
    ') if not img: break imgs.append(text.extr(img, 'href="', '"')) data = { "count": len(imgs), "title": text.unescape(name), "path" : self.path.strip("/"), "date" : text.parse_datetime(extr( '
    ', '
    '), "%Y-%m-%d %H:%M"), "tags" : text.split_html(extr( '
    ', '
    '))[::2], } yield Message.Directory, data for data["num"], url in enumerate(imgs, 1): yield Message.Url, url, text.nameext_from_url(url, data) class VanillarockTagExtractor(VanillarockExtractor): """Extractor for vanillarock blog posts by tag or category""" subcategory = "tag" pattern = (r"(?:https?://)?(?:www\.)?vanilla-rock\.com" r"(/(?:tag|category)/[^?#]+)") example = "https://vanilla-rock.com/tag/TAG" def items(self): url = self.root + self.path data = {"_extractor": VanillarockPostExtractor} while url: extr = text.extract_from(self.request(url).text) while True: post = extr('

    ', '

    ') if not post: break yield Message.Queue, text.extr(post, 'href="', '"'), data url = text.unescape(extr('class="next page-numbers" href="', '"')) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/vichan.py0000644000175000017500000000727414571514301020411 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for vichan imageboards""" from .common import BaseExtractor, Message from .. import text class VichanExtractor(BaseExtractor): """Base class for vichan extractors""" basecategory = "vichan" BASE_PATTERN = VichanExtractor.update({ "8kun": { "root": "https://8kun.top", "pattern": r"8kun\.top", }, "wikieat": { "root": "https://wikieat.club", "pattern": r"wikieat\.club", }, "smugloli": { "root": None, "pattern": r"smuglo(?:\.li|li\.net)", }, }) class VichanThreadExtractor(VichanExtractor): """Extractor for vichan threads""" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{time}{num:?-//} {filename}.{extension}" archive_fmt = "{board}_{thread}_{tim}" pattern = BASE_PATTERN + r"/([^/?#]+)/res/(\d+)" example = "https://8kun.top/a/res/12345.html" def __init__(self, match): VichanExtractor.__init__(self, match) index = match.lastindex self.board = match.group(index-1) self.thread = match.group(index) def items(self): url = "{}/{}/res/{}.json".format(self.root, self.board, self.thread) posts = self.request(url).json()["posts"] title = posts[0].get("sub") or text.remove_html(posts[0]["com"]) process = (self._process_8kun if self.category == "8kun" else self._process) data = { "board" : self.board, "thread": self.thread, "title" : text.unescape(title)[:50], "num" : 0, } yield Message.Directory, data for post in posts: if "filename" in post: yield process(post, data) if "extra_files" in post: for post["num"], filedata in enumerate( post["extra_files"], 1): yield process(post, filedata) def _process(self, post, data): post.update(data) post["extension"] = post["ext"][1:] post["url"] = "{}/{}/src/{}{}".format( self.root, post["board"], post["tim"], post["ext"]) return Message.Url, post["url"], post @staticmethod def _process_8kun(post, data): post.update(data) post["extension"] = post["ext"][1:] tim = post["tim"] if len(tim) > 16: post["url"] = "https://media.128ducks.com/file_store/{}{}".format( tim, post["ext"]) else: post["url"] = "https://media.128ducks.com/{}/src/{}{}".format( post["board"], tim, post["ext"]) return Message.Url, post["url"], post class VichanBoardExtractor(VichanExtractor): """Extractor for vichan boards""" subcategory = "board" pattern = BASE_PATTERN + r"/([^/?#]+)(?:/index|/catalog|/\d+|/?$)" example = "https://8kun.top/a/" def __init__(self, match): VichanExtractor.__init__(self, match) self.board = match.group(match.lastindex) def items(self): url = "{}/{}/threads.json".format(self.root, self.board) threads = self.request(url).json() for page in threads: for thread in page["threads"]: url = "{}/{}/res/{}.html".format( self.root, self.board, thread["no"]) thread["page"] = page["page"] thread["_extractor"] = VichanThreadExtractor yield Message.Queue, url, thread ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1710779776.0 
gallery_dl-1.26.9/gallery_dl/extractor/vipergirls.py0000644000175000017500000000773314576066600021340 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://vipergirls.to/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache from xml.etree import ElementTree BASE_PATTERN = r"(?:https?://)?(?:www\.)?vipergirls\.to" class VipergirlsExtractor(Extractor): """Base class for vipergirls extractors""" category = "vipergirls" root = "https://vipergirls.to" request_interval = 0.5 request_interval_min = 0.2 cookies_domain = ".vipergirls.to" cookies_names = ("vg_userid", "vg_password") def _init(self): domain = self.config("domain") if domain: self.root = text.ensure_http_scheme(domain) def items(self): self.login() posts = self.posts() like = self.config("like") if like: user_hash = posts[0].get("hash") if len(user_hash) < 16: self.log.warning("Login required to like posts") like = False posts = posts.iter("post") if self.page: util.advance(posts, (text.parse_int(self.page[5:]) - 1) * 15) for post in posts: data = post.attrib data["thread_id"] = self.thread_id yield Message.Directory, data image = None for image in post: yield Message.Queue, image.attrib["main_url"], data if image is not None and like: self.like(post, user_hash) def login(self): if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: self.cookies_update(self._login_impl(username, password)) @cache(maxage=90*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = "{}/login.php?do=login".format(self.root) data = { "vb_login_username": username, "vb_login_password": password, "do" : "login", "cookieuser" : "1", } response = self.request(url, method="POST", data=data) if not response.cookies.get("vg_password"): raise exception.AuthenticationError() return {cookie.name: cookie.value for cookie in response.cookies} def like(self, post, user_hash): url = self.root + "/post_thanks.php" params = { "do" : "post_thanks_add", "p" : post.get("id"), "securitytoken": user_hash, } with self.request(url, params=params, allow_redirects=False): pass class VipergirlsThreadExtractor(VipergirlsExtractor): """Extractor for vipergirls threads""" subcategory = "thread" pattern = BASE_PATTERN + r"/threads/(\d+)(?:-[^/?#]+)?(/page\d+)?$" example = "https://vipergirls.to/threads/12345-TITLE" def __init__(self, match): VipergirlsExtractor.__init__(self, match) self.thread_id, self.page = match.groups() def posts(self): url = "{}/vr.php?t={}".format(self.root, self.thread_id) return ElementTree.fromstring(self.request(url).text) class VipergirlsPostExtractor(VipergirlsExtractor): """Extractor for vipergirls posts""" subcategory = "post" pattern = (BASE_PATTERN + r"/threads/(\d+)(?:-[^/?#]+)?\?p=\d+[^#]*#post(\d+)") example = "https://vipergirls.to/threads/12345-TITLE?p=23456#post23456" def __init__(self, match): VipergirlsExtractor.__init__(self, match) self.thread_id, self.post_id = match.groups() self.page = 0 def posts(self): url = "{}/vr.php?p={}".format(self.root, self.post_id) return ElementTree.fromstring(self.request(url).text) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 
gallery_dl-1.26.9/gallery_dl/extractor/vk.py0000644000175000017500000001334014571514301017550 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://vk.com/""" from .common import Extractor, Message from .. import text, exception import re BASE_PATTERN = r"(?:https://)?(?:www\.|m\.)?vk\.com" class VkExtractor(Extractor): """Base class for vk extractors""" category = "vk" directory_fmt = ("{category}", "{user[name]|user[id]}") filename_fmt = "{id}.{extension}" archive_fmt = "{id}" root = "https://vk.com" request_interval = (0.5, 1.5) def items(self): sub = re.compile(r"/imp[fg]/").sub sizes = "wzyxrqpo" data = self.metadata() yield Message.Directory, data for photo in self.photos(): for size in sizes: size += "_" if size in photo: break else: self.log.warning("no photo URL found (%s)", photo.get("id")) continue try: url = photo[size + "src"] except KeyError: self.log.warning("no photo URL found (%s)", photo.get("id")) continue photo["url"] = sub("/", url.partition("?")[0]) # photo["url"] = url photo["_fallback"] = (url,) try: _, photo["width"], photo["height"] = photo[size] except ValueError: # photo without width/height entries (#2535) photo["width"] = photo["height"] = 0 photo["id"] = photo["id"].rpartition("_")[2] photo.update(data) text.nameext_from_url(photo["url"], photo) yield Message.Url, photo["url"], photo def _pagination(self, photos_id): url = self.root + "/al_photos.php" headers = { "X-Requested-With": "XMLHttpRequest", "Origin" : self.root, "Referer" : self.root + "/" + photos_id, } data = { "act" : "show", "al" : "1", "direction": "1", "list" : photos_id, "offset" : 0, } while True: payload = self.request( url, method="POST", headers=headers, data=data, ).json()["payload"][1] if len(payload) < 4: self.log.debug(payload) raise exception.AuthorizationError( text.unescape(payload[0]) if payload[0] else None) total = payload[1] photos = payload[3] data["offset"] += len(photos) if data["offset"] >= total: # the last chunk of photos also contains the first few photos # again if 'total' is not a multiple of 10 extra = total - data["offset"] if extra: del photos[extra:] yield from photos return yield from photos class VkPhotosExtractor(VkExtractor): """Extractor for photos from a vk user""" subcategory = "photos" pattern = (BASE_PATTERN + r"/(?:" r"(?:albums|photos|id)(-?\d+)" r"|(?!(?:album|tag)-?\d+_?)([^/?#]+))") example = "https://vk.com/id12345" def __init__(self, match): VkExtractor.__init__(self, match) self.user_id, self.user_name = match.groups() def photos(self): return self._pagination("photos" + self.user_id) def metadata(self): if self.user_id: user_id = self.user_id prefix = "public" if user_id[0] == "-" else "id" url = "{}/{}{}".format(self.root, prefix, user_id.lstrip("-")) data = self._extract_profile(url) else: url = "{}/{}".format(self.root, self.user_name) data = self._extract_profile(url) self.user_id = data["user"]["id"] return data def _extract_profile(self, url): extr = text.extract_from(self.request(url).text) return {"user": { "name": text.unescape(extr( 'rel="canonical" href="https://vk.com/', '"')), "nick": text.unescape(extr( '

    ', "<")).replace(" ", " "), "info": text.unescape(text.remove_html(extr( '', '= meta["last_page"]: return params["page"] = meta["current_page"] + 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/wallpapercave.py0000644000175000017500000000260614571514301021761 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021 David Hoppenbrouwers # Copyright 2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://wallpapercave.com/""" from .common import Extractor, Message from .. import text class WallpapercaveImageExtractor(Extractor): """Extractor for images on wallpapercave.com""" category = "wallpapercave" subcategory = "image" root = "https://wallpapercave.com" pattern = r"(?:https?://)?(?:www\.)?wallpapercave\.com" example = "https://wallpapercave.com/w/wp12345" def items(self): page = self.request(text.ensure_http_scheme(self.url)).text path = None for path in text.extract_iter(page, 'class="download" href="', '"'): image = text.nameext_from_url(path) yield Message.Directory, image yield Message.Url, self.root + path, image if path is None: try: path = text.rextract( page, 'href="', '"', page.index('id="tdownload"'))[0] except Exception: pass else: image = text.nameext_from_url(path) yield Message.Directory, image yield Message.Url, self.root + path, image ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709771499.0 gallery_dl-1.26.9/gallery_dl/extractor/warosu.py0000644000175000017500000000670014572205353020457 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://warosu.org/""" from .common import Extractor, Message from .. import text class WarosuThreadExtractor(Extractor): """Extractor for threads on warosu.org""" category = "warosu" subcategory = "thread" root = "https://warosu.org" directory_fmt = ("{category}", "{board}", "{thread} - {title}") filename_fmt = "{tim}-{filename}.{extension}" archive_fmt = "{board}_{thread}_{tim}" pattern = r"(?:https?://)?(?:www\.)?warosu\.org/([^/]+)/thread/(\d+)" example = "https://warosu.org/a/thread/12345" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "{}/{}/thread/{}".format(self.root, self.board, self.thread) page = self.request(url).text data = self.metadata(page) posts = self.posts(page) if not data["title"]: data["title"] = text.unescape(text.remove_html( posts[0]["com"]))[:50] yield Message.Directory, data for post in posts: if "image" in post: for key in ("w", "h", "no", "time", "tim"): post[key] = text.parse_int(post[key]) post.update(data) yield Message.Url, post["image"], post def metadata(self, page): boardname = text.extr(page, "", "") title = text.unescape(text.extr(page, "class=filetitle>", "<")) return { "board" : self.board, "board_name": boardname.split(" - ")[1], "thread" : self.thread, "title" : title, } def posts(self, page): """Build a list of all post objects""" page = text.extr(page, "
    ") needle = "" return [self.parse(post) for post in page.split(needle)] def parse(self, post): """Build post object by extracting data from an HTML post""" data = self._extract_post(post) if " File:" in post and self._extract_image(post, data): part = data["image"].rpartition("/")[2] data["tim"], _, data["extension"] = part.partition(".") data["ext"] = "." + data["extension"] return data def _extract_post(self, post): extr = text.extract_from(post) return { "no" : extr("id=p", ">"), "name": extr("class=postername>", "<").strip(), "time": extr("class=posttime title=", "000>"), "now" : extr("", "<").strip(), "com" : text.unescape(text.remove_html(extr( "
    ", "
    ").strip())), } def _extract_image(self, post, data): extr = text.extract_from(post) data["fsize"] = extr(" File: ", ", ") data["w"] = extr("", "x") data["h"] = extr("", ", ") data["filename"] = text.unquote(extr( "", "<").rstrip().rpartition(".")[0]) extr("
    ", "") url = extr("") if url: if url[0] == "/": data["image"] = self.root + url else: data["image"] = url return True return False ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/weasyl.py0000644000175000017500000001555314571514301020444 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.weasyl.com/""" from .common import Extractor, Message from .. import text BASE_PATTERN = r"(?:https://)?(?:www\.)?weasyl.com/" class WeasylExtractor(Extractor): category = "weasyl" directory_fmt = ("{category}", "{owner_login}") filename_fmt = "{submitid} {title}.{extension}" archive_fmt = "{submitid}" root = "https://www.weasyl.com" @staticmethod def populate_submission(data): # Some submissions don't have content and can be skipped if "submission" in data["media"]: data["url"] = data["media"]["submission"][0]["url"] data["date"] = text.parse_datetime( data["posted_at"][:19], "%Y-%m-%dT%H:%M:%S") text.nameext_from_url(data["url"], data) return True return False def _init(self): self.session.headers['X-Weasyl-API-Key'] = self.config("api-key") def request_submission(self, submitid): return self.request( "{}/api/submissions/{}/view".format(self.root, submitid)).json() def retrieve_journal(self, journalid): data = self.request( "{}/api/journals/{}/view".format(self.root, journalid)).json() data["extension"] = "html" data["html"] = "text:" + data["content"] data["date"] = text.parse_datetime(data["posted_at"]) return data def submissions(self, owner_login, folderid=None): metadata = self.config("metadata") url = "{}/api/users/{}/gallery".format(self.root, owner_login) params = { "nextid" : None, "folderid": folderid, } while True: data = self.request(url, params=params).json() for submission in data["submissions"]: if metadata: submission = self.request_submission( submission["submitid"]) if self.populate_submission(submission): submission["folderid"] = folderid # Do any submissions have more than one url? If so # a urllist of the submission array urls would work. 
yield Message.Url, submission["url"], submission if not data["nextid"]: return params["nextid"] = data["nextid"] class WeasylSubmissionExtractor(WeasylExtractor): subcategory = "submission" pattern = BASE_PATTERN + r"(?:~[\w~-]+/submissions|submission)/(\d+)" example = "https://www.weasyl.com/~USER/submissions/12345/TITLE" def __init__(self, match): WeasylExtractor.__init__(self, match) self.submitid = match.group(1) def items(self): data = self.request_submission(self.submitid) if self.populate_submission(data): yield Message.Directory, data yield Message.Url, data["url"], data class WeasylSubmissionsExtractor(WeasylExtractor): subcategory = "submissions" pattern = BASE_PATTERN + r"(?:~|submissions/)([\w~-]+)/?$" example = "https://www.weasyl.com/submissions/USER" def __init__(self, match): WeasylExtractor.__init__(self, match) self.owner_login = match.group(1) def items(self): yield Message.Directory, {"owner_login": self.owner_login} yield from self.submissions(self.owner_login) class WeasylFolderExtractor(WeasylExtractor): subcategory = "folder" directory_fmt = ("{category}", "{owner_login}", "{folder_name}") pattern = BASE_PATTERN + r"submissions/([\w~-]+)\?folderid=(\d+)" example = "https://www.weasyl.com/submissions/USER?folderid=12345" def __init__(self, match): WeasylExtractor.__init__(self, match) self.owner_login, self.folderid = match.groups() def items(self): iter = self.submissions(self.owner_login, self.folderid) # Folder names are only on single submission api calls msg, url, data = next(iter) details = self.request_submission(data["submitid"]) yield Message.Directory, details yield msg, url, data yield from iter class WeasylJournalExtractor(WeasylExtractor): subcategory = "journal" filename_fmt = "{journalid} {title}.{extension}" archive_fmt = "{journalid}" pattern = BASE_PATTERN + r"journal/(\d+)" example = "https://www.weasyl.com/journal/12345" def __init__(self, match): WeasylExtractor.__init__(self, match) self.journalid = match.group(1) def items(self): data = self.retrieve_journal(self.journalid) yield Message.Directory, data yield Message.Url, data["html"], data class WeasylJournalsExtractor(WeasylExtractor): subcategory = "journals" filename_fmt = "{journalid} {title}.{extension}" archive_fmt = "{journalid}" pattern = BASE_PATTERN + r"journals/([\w~-]+)" example = "https://www.weasyl.com/journals/USER" def __init__(self, match): WeasylExtractor.__init__(self, match) self.owner_login = match.group(1) def items(self): yield Message.Directory, {"owner_login": self.owner_login} url = "{}/journals/{}".format(self.root, self.owner_login) page = self.request(url).text for journalid in text.extract_iter(page, 'href="/journal/', '/'): data = self.retrieve_journal(journalid) yield Message.Url, data["html"], data class WeasylFavoriteExtractor(WeasylExtractor): subcategory = "favorite" directory_fmt = ("{category}", "{owner_login}", "Favorites") pattern = BASE_PATTERN + r"favorites\?userid=(\d+)" example = "https://www.weasyl.com/favorites?userid=12345" def __init__(self, match): WeasylExtractor.__init__(self, match) self.userid = match.group(1) def items(self): owner_login = lastid = None url = self.root + "/favorites" params = { "userid" : self.userid, "feature": "submit", } while True: page = self.request(url, params=params).text pos = page.index('id="favorites-content"') if not owner_login: owner_login = text.extr(page, 'Added ", "<"), "%B %d, %Y"), "views": text.parse_int(extr('glyphicon-eye-open">
    ', '<')), "id" : self.video_id, "filename" : self.video_id, "extension": "webm", } if data["title"] == "webmshare": data["title"] = "" yield Message.Directory, data yield Message.Url, data["url"], data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/webtoons.py0000644000175000017500000001456314571514301021000 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020 Leonardo Taccari # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.webtoons.com/""" from .common import GalleryExtractor, Extractor, Message from .. import exception, text, util BASE_PATTERN = r"(?:https?://)?(?:www\.)?webtoons\.com/(([^/?#]+)" class WebtoonsBase(): category = "webtoons" root = "https://www.webtoons.com" cookies_domain = ".webtoons.com" def setup_agegate_cookies(self): self.cookies_update({ "atGDPR" : "AD_CONSENT", "needCCPA" : "false", "needCOPPA" : "false", "needGDPR" : "false", "pagGDPR" : "true", "ageGatePass": "true", }) def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if response.history and "/ageGate" in response.url: raise exception.StopExtraction( "HTTP redirect to age gate check ('%s')", response.request.url) return response class WebtoonsEpisodeExtractor(WebtoonsBase, GalleryExtractor): """Extractor for an episode on webtoons.com""" subcategory = "episode" directory_fmt = ("{category}", "{comic}") filename_fmt = "{episode_no}-{num:>02}.{extension}" archive_fmt = "{title_no}_{episode_no}_{num}" pattern = (BASE_PATTERN + r"/([^/?#]+)/([^/?#]+)/(?:[^/?#]+))" r"/viewer(?:\?([^#'\"]+))") example = ("https://www.webtoons.com/en/GENRE/TITLE/NAME/viewer" "?title_no=123&episode_no=12345") test = ( (("https://www.webtoons.com/en/comedy/safely-endangered" "/ep-572-earth/viewer?title_no=352&episode_no=572"), { "url": "55bec5d7c42aba19e3d0d56db25fdf0b0b13be38", "content": ("1748c7e82b6db910fa179f6dc7c4281b0f680fa7", "42055e44659f6ffc410b3fb6557346dfbb993df3", "49e1f2def04c6f7a6a3dacf245a1cd9abe77a6a9"), "count": 5, }), (("https://www.webtoons.com/en/challenge/punderworld" "/happy-earth-day-/viewer?title_no=312584&episode_no=40"), { "exception": exception.NotFoundError, "keyword": { "comic": "punderworld", "description": str, "episode": "36", "episode_no": "40", "genre": "challenge", "title": r"re:^Punderworld - .+", "title_no": "312584", }, }), ) def __init__(self, match): self.path, self.lang, self.genre, self.comic, self.query = \ match.groups() url = "{}/{}/viewer?{}".format(self.root, self.path, self.query) GalleryExtractor.__init__(self, match, url) def _init(self): self.setup_agegate_cookies() params = text.parse_query(self.query) self.title_no = params.get("title_no") self.episode_no = params.get("episode_no") def metadata(self, page): extr = text.extract_from(page) title = extr('', '<') episode_name = extr('

    #', '<') else: episode = "" if extr('
    ', '') else: username = author_name = "" return { "genre" : self.genre, "comic" : self.comic, "title_no" : self.title_no, "episode_no" : self.episode_no, "title" : text.unescape(title), "episode" : episode, "comic_name" : text.unescape(comic_name), "episode_name": text.unescape(episode_name), "username" : username, "author_name" : text.unescape(author_name), "description" : text.unescape(descr), "lang" : self.lang, "language" : util.code_to_language(self.lang), } @staticmethod def images(page): return [ (url.replace("://webtoon-phinf.", "://swebtoon-phinf."), None) for url in text.extract_iter( page, 'class="_images" data-url="', '"') ] class WebtoonsComicExtractor(WebtoonsBase, Extractor): """Extractor for an entire comic on webtoons.com""" subcategory = "comic" categorytransfer = True pattern = (BASE_PATTERN + r"/([^/?#]+)/([^/?#]+))" r"/list(?:\?([^#]+))") example = "https://www.webtoons.com/en/GENRE/TITLE/list?title_no=123" def __init__(self, match): Extractor.__init__(self, match) self.path, self.lang, self.genre, self.comic, self.query = \ match.groups() def _init(self): self.setup_agegate_cookies() params = text.parse_query(self.query) self.title_no = params.get("title_no") self.page_no = text.parse_int(params.get("page"), 1) def items(self): page = None data = {"_extractor": WebtoonsEpisodeExtractor} while True: path = "/{}/list?title_no={}&page={}".format( self.path, self.title_no, self.page_no) if page and path not in page: return response = self.request(self.root + path) if response.history: parts = response.url.split("/") self.path = "/".join(parts[3:-1]) page = response.text data["page"] = self.page_no for url in self.get_episode_urls(page): yield Message.Queue, url, data self.page_no += 1 @staticmethod def get_episode_urls(page): """Extract and return all episode urls in 'page'""" page = text.extr(page, 'id="_listUl"', '') return [ match.group(0) for match in WebtoonsEpisodeExtractor.pattern.finditer(page) ] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1710426054.0 gallery_dl-1.26.9/gallery_dl/extractor/weibo.py0000644000175000017500000003064614574603706020260 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.weibo.com/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache import random BASE_PATTERN = r"(?:https?://)?(?:www\.|m\.)?weibo\.c(?:om|n)" USER_PATTERN = BASE_PATTERN + r"/(?:(u|n|p(?:rofile)?)/)?([^/?#]+)(?:/home)?" 
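# A minimal, self-contained sketch (not part of the original module) showing
# what the USER_PATTERN regex above captures; the sample URLs and the helper
# name `_user_pattern_examples` are assumptions used purely for illustration.
def _user_pattern_examples():
    import re
    pattern = re.compile(USER_PATTERN)
    # numeric user id: prefix group is "u"
    uid = pattern.match("https://weibo.com/u/1234567890").groups()
    # -> ('u', '1234567890')
    # screen-name lookup: prefix group is "n"
    name = pattern.match("https://weibo.com/n/SCREEN_NAME").groups()
    # -> ('n', 'SCREEN_NAME')
    # custom profile path: no prefix, group 1 is None
    custom = pattern.match("https://weibo.com/custom_name").groups()
    # -> (None, 'custom_name')
    return uid, name, custom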
class WeiboExtractor(Extractor): category = "weibo" directory_fmt = ("{category}", "{user[screen_name]}") filename_fmt = "{status[id]}_{num:>02}.{extension}" archive_fmt = "{status[id]}_{num}" root = "https://weibo.com" request_interval = (1.0, 2.0) def __init__(self, match): Extractor.__init__(self, match) self._prefix, self.user = match.groups() def _init(self): self.livephoto = self.config("livephoto", True) self.retweets = self.config("retweets", False) self.videos = self.config("videos", True) self.gifs = self.config("gifs", True) self.gifs_video = (self.gifs == "video") cookies = _cookie_cache() if cookies is not None: self.cookies.update(cookies) def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if response.history: if "login.sina.com" in response.url: raise exception.StopExtraction( "HTTP redirect to login page (%s)", response.url.partition("?")[0]) if "passport.weibo.com" in response.url: self._sina_visitor_system(response) response = Extractor.request(self, url, **kwargs) return response def items(self): original_retweets = (self.retweets == "original") for status in self.statuses(): if "ori_mid" in status and not self.retweets: self.log.debug("Skipping %s (快转 retweet)", status["id"]) continue if "retweeted_status" in status: if not self.retweets: self.log.debug("Skipping %s (retweet)", status["id"]) continue # videos of the original post are in status # images of the original post are in status["retweeted_status"] files = [] self._extract_status(status, files) self._extract_status(status["retweeted_status"], files) if original_retweets: status = status["retweeted_status"] else: files = [] self._extract_status(status, files) status["date"] = text.parse_datetime( status["created_at"], "%a %b %d %H:%M:%S %z %Y") status["count"] = len(files) yield Message.Directory, status for num, file in enumerate(files, 1): if file["url"].startswith("http:"): file["url"] = "https:" + file["url"][5:] if "filename" not in file: text.nameext_from_url(file["url"], file) if file["extension"] == "json": file["extension"] = "mp4" file["status"] = status file["num"] = num yield Message.Url, file["url"], file def _extract_status(self, status, files): append = files.append if "mix_media_info" in status: for item in status["mix_media_info"]["items"]: type = item.get("type") if type == "video": if self.videos: append(self._extract_video(item["data"]["media_info"])) elif type == "pic": append(item["data"]["largest"].copy()) else: self.log.warning("Unknown media type '%s'", type) return pic_ids = status.get("pic_ids") if pic_ids: pics = status["pic_infos"] for pic_id in pic_ids: pic = pics[pic_id] pic_type = pic.get("type") if pic_type == "gif" and self.gifs: if self.gifs_video: append({"url": pic["video"]}) else: append(pic["largest"].copy()) elif pic_type == "livephoto" and self.livephoto: append(pic["largest"].copy()) file = {"url": pic["video"]} file["filename"], _, file["extension"] = \ pic["video"].rpartition("%2F")[2].rpartition(".") append(file) else: append(pic["largest"].copy()) if "page_info" in status: info = status["page_info"] if "media_info" in info and self.videos: append(self._extract_video(info["media_info"])) def _extract_video(self, info): try: media = max(info["playback_list"], key=lambda m: m["meta"]["quality_index"]) except Exception: return {"url": (info.get("stream_url_hd") or info.get("stream_url") or "")} else: return media["play_info"].copy() def _status_by_id(self, status_id): url = "{}/ajax/statuses/show?id={}".format(self.root, status_id) return 
self.request(url).json() def _user_id(self): if len(self.user) >= 10 and self.user.isdecimal(): return self.user[-10:] else: url = "{}/ajax/profile/info?{}={}".format( self.root, "screen_name" if self._prefix == "n" else "custom", self.user) return self.request(url).json()["data"]["user"]["idstr"] def _pagination(self, endpoint, params): url = self.root + "/ajax" + endpoint headers = { "X-Requested-With": "XMLHttpRequest", "X-XSRF-TOKEN": None, "Referer": "{}/u/{}".format(self.root, params["uid"]), } while True: response = self.request(url, params=params, headers=headers) headers["Accept"] = "application/json, text/plain, */*" headers["X-XSRF-TOKEN"] = response.cookies.get("XSRF-TOKEN") data = response.json() if not data.get("ok"): self.log.debug(response.content) if "since_id" not in params: # first iteration raise exception.StopExtraction( '"%s"', data.get("msg") or "unknown error") data = data["data"] statuses = data["list"] yield from statuses # videos, newvideo cursor = data.get("next_cursor") if cursor: if cursor == -1: return params["cursor"] = cursor continue # album since_id = data.get("since_id") if since_id: params["sinceid"] = data["since_id"] continue # home, article if "page" in params: if not statuses: return params["page"] += 1 continue # feed, last album page try: params["since_id"] = statuses[-1]["id"] - 1 except LookupError: return def _sina_visitor_system(self, response): self.log.info("Sina Visitor System") passport_url = "https://passport.weibo.com/visitor/genvisitor" headers = {"Referer": response.url} data = { "cb": "gen_callback", "fp": '{"os":"1","browser":"Gecko109,0,0,0","fonts":"undefined",' '"screenInfo":"1920*1080*24","plugins":""}', } page = Extractor.request( self, passport_url, method="POST", headers=headers, data=data).text data = util.json_loads(text.extr(page, "(", ");"))["data"] passport_url = "https://passport.weibo.com/visitor/visitor" params = { "a" : "incarnate", "t" : data["tid"], "w" : "3" if data.get("new_tid") else "2", "c" : "{:>03}".format(data.get("confidence") or 100), "gc" : "", "cb" : "cross_domain", "from" : "weibo", "_rand": random.random(), } response = Extractor.request(self, passport_url, params=params) _cookie_cache.update("", response.cookies) class WeiboUserExtractor(WeiboExtractor): """Extractor for weibo user profiles""" subcategory = "user" pattern = USER_PATTERN + r"(?:$|#)" example = "https://weibo.com/USER" def items(self): base = "{}/u/{}?tabtype=".format(self.root, self._user_id()) return self._dispatch_extractors(( (WeiboHomeExtractor , base + "home"), (WeiboFeedExtractor , base + "feed"), (WeiboVideosExtractor , base + "video"), (WeiboNewvideoExtractor, base + "newVideo"), (WeiboAlbumExtractor , base + "album"), ), ("feed",)) class WeiboHomeExtractor(WeiboExtractor): """Extractor for weibo 'home' listings""" subcategory = "home" pattern = USER_PATTERN + r"\?tabtype=home" example = "https://weibo.com/USER?tabtype=home" def statuses(self): endpoint = "/profile/myhot" params = {"uid": self._user_id(), "page": 1, "feature": "2"} return self._pagination(endpoint, params) class WeiboFeedExtractor(WeiboExtractor): """Extractor for weibo user feeds""" subcategory = "feed" pattern = USER_PATTERN + r"\?tabtype=feed" example = "https://weibo.com/USER?tabtype=feed" def statuses(self): endpoint = "/statuses/mymblog" params = {"uid": self._user_id(), "feature": "0"} return self._pagination(endpoint, params) class WeiboVideosExtractor(WeiboExtractor): """Extractor for weibo 'video' listings""" subcategory = "videos" pattern = 
USER_PATTERN + r"\?tabtype=video" example = "https://weibo.com/USER?tabtype=video" def statuses(self): endpoint = "/profile/getprofilevideolist" params = {"uid": self._user_id()} for status in self._pagination(endpoint, params): yield status["video_detail_vo"] class WeiboNewvideoExtractor(WeiboExtractor): """Extractor for weibo 'newVideo' listings""" subcategory = "newvideo" pattern = USER_PATTERN + r"\?tabtype=newVideo" example = "https://weibo.com/USER?tabtype=newVideo" def statuses(self): endpoint = "/profile/getWaterFallContent" params = {"uid": self._user_id()} return self._pagination(endpoint, params) class WeiboArticleExtractor(WeiboExtractor): """Extractor for weibo 'article' listings""" subcategory = "article" pattern = USER_PATTERN + r"\?tabtype=article" example = "https://weibo.com/USER?tabtype=article" def statuses(self): endpoint = "/statuses/mymblog" params = {"uid": self._user_id(), "page": 1, "feature": "10"} return self._pagination(endpoint, params) class WeiboAlbumExtractor(WeiboExtractor): """Extractor for weibo 'album' listings""" subcategory = "album" pattern = USER_PATTERN + r"\?tabtype=album" example = "https://weibo.com/USER?tabtype=album" def statuses(self): endpoint = "/profile/getImageWall" params = {"uid": self._user_id()} seen = set() for image in self._pagination(endpoint, params): mid = image["mid"] if mid not in seen: seen.add(mid) status = self._status_by_id(mid) if status.get("ok") != 1: self.log.debug("Skipping status %s (%s)", mid, status) else: yield status class WeiboStatusExtractor(WeiboExtractor): """Extractor for images from a status on weibo.cn""" subcategory = "status" pattern = BASE_PATTERN + r"/(detail|status|\d+)/(\w+)" example = "https://weibo.com/detail/12345" def statuses(self): status = self._status_by_id(self.user) if status.get("ok") != 1: self.log.debug(status) raise exception.NotFoundError("status") return (status,) @cache(maxage=365*86400) def _cookie_cache(): return None ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/wikiart.py0000644000175000017500000001155614571514301020611 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.wikiart.org/""" from .common import Extractor, Message from .. 
import text BASE_PATTERN = r"(?:https?://)?(?:www\.)?wikiart\.org/([a-z]+)" class WikiartExtractor(Extractor): """Base class for wikiart extractors""" category = "wikiart" filename_fmt = "{id}_{title}.{extension}" archive_fmt = "{id}" root = "https://www.wikiart.org" def __init__(self, match): Extractor.__init__(self, match) self.lang = match.group(1) def items(self): data = self.metadata() yield Message.Directory, data for painting in self.paintings(): url = painting["image"] painting.update(data) yield Message.Url, url, text.nameext_from_url(url, painting) def metadata(self): """Return a dict with general metadata""" def paintings(self): """Return an iterable containing all relevant 'painting' objects""" def _pagination(self, url, extra_params=None, key="Paintings", stop=False): headers = { "X-Requested-With": "XMLHttpRequest", "Referer": url, } params = { "json": "2", "layout": "new", "page": 1, "resultType": "masonry", } if extra_params: params.update(extra_params) while True: data = self.request(url, headers=headers, params=params).json() items = data.get(key) if not items: return yield from items if stop: return params["page"] += 1 class WikiartArtistExtractor(WikiartExtractor): """Extractor for an artist's paintings on wikiart.org""" subcategory = "artist" directory_fmt = ("{category}", "{artist[artistName]}") pattern = BASE_PATTERN + r"/(?!\w+-by-)([\w-]+)/?$" example = "https://www.wikiart.org/en/ARTIST" def __init__(self, match): WikiartExtractor.__init__(self, match) self.artist_name = match.group(2) self.artist = None def metadata(self): url = "{}/{}/{}?json=2".format(self.root, self.lang, self.artist_name) self.artist = self.request(url).json() return {"artist": self.artist} def paintings(self): url = "{}/{}/{}/mode/all-paintings".format( self.root, self.lang, self.artist_name) return self._pagination(url) class WikiartImageExtractor(WikiartArtistExtractor): """Extractor for individual paintings on wikiart.org""" subcategory = "image" pattern = BASE_PATTERN + r"/(?!(?:paintings|artists)-by-)([\w-]+)/([\w-]+)" example = "https://www.wikiart.org/en/ARTIST/TITLE" def __init__(self, match): WikiartArtistExtractor.__init__(self, match) self.title = match.group(3) def paintings(self): title, sep, year = self.title.rpartition("-") if not sep or not year.isdecimal(): title = self.title url = "{}/{}/Search/{} {}".format( self.root, self.lang, self.artist.get("artistName") or self.artist_name, title, ) return self._pagination(url, stop=True) class WikiartArtworksExtractor(WikiartExtractor): """Extractor for artwork collections on wikiart.org""" subcategory = "artworks" directory_fmt = ("{category}", "Artworks by {group!c}", "{type}") pattern = BASE_PATTERN + r"/paintings-by-([\w-]+)/([\w-]+)" example = "https://www.wikiart.org/en/paintings-by-GROUP/TYPE" def __init__(self, match): WikiartExtractor.__init__(self, match) self.group = match.group(2) self.type = match.group(3) def metadata(self): return {"group": self.group, "type": self.type} def paintings(self): url = "{}/{}/paintings-by-{}/{}".format( self.root, self.lang, self.group, self.type) return self._pagination(url) class WikiartArtistsExtractor(WikiartExtractor): """Extractor for artist collections on wikiart.org""" subcategory = "artists" pattern = (BASE_PATTERN + r"/artists-by-([\w-]+)/([\w-]+)") example = "https://www.wikiart.org/en/artists-by-GROUP/TYPE" def __init__(self, match): WikiartExtractor.__init__(self, match) self.group = match.group(2) self.type = match.group(3) def items(self): url = 
"{}/{}/App/Search/Artists-by-{}".format( self.root, self.lang, self.group) params = {"json": "3", "searchterm": self.type} for artist in self._pagination(url, params, "Artists"): artist["_extractor"] = WikiartArtistExtractor yield Message.Queue, self.root + artist["artistUrl"], artist ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/wikifeet.py0000644000175000017500000000454114571514301020742 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.wikifeet.com/""" from .common import GalleryExtractor from .. import text, util class WikifeetGalleryExtractor(GalleryExtractor): """Extractor for image galleries from wikifeet.com""" category = "wikifeet" directory_fmt = ("{category}", "{celebrity}") filename_fmt = "{category}_{celeb}_{pid}.{extension}" archive_fmt = "{type}_{celeb}_{pid}" pattern = (r"(?:https?://)(?:(?:www\.)?wikifeetx?|" r"men\.wikifeet)\.com/([^/?#]+)") example = "https://www.wikifeet.com/CELEB" def __init__(self, match): self.root = text.root_from_url(match.group(0)) if "wikifeetx.com" in self.root: self.category = "wikifeetx" self.type = "men" if "://men." in self.root else "women" self.celeb = match.group(1) GalleryExtractor.__init__(self, match, self.root + "/" + self.celeb) def metadata(self, page): extr = text.extract_from(page) return { "celeb" : self.celeb, "type" : self.type, "rating" : text.parse_float(extr('"ratingValue": "', '"')), "celebrity" : text.unescape(extr("times'>", "

    ")), "shoesize" : text.remove_html(extr("Shoe Size:", "edit")), "birthplace": text.remove_html(extr("Birthplace:", "edit")), "birthday" : text.parse_datetime(text.remove_html( extr("Birth Date:", "edit")), "%Y-%m-%d"), } def images(self, page): tagmap = { "C": "Close-up", "T": "Toenails", "N": "Nylons", "A": "Arches", "S": "Soles", "B": "Barefoot", } ufmt = "https://pics.wikifeet.com/" + self.celeb + "-Feet-{}.jpg" return [ (ufmt.format(data["pid"]), { "pid" : data["pid"], "width" : data["pw"], "height": data["ph"], "tags" : [ tagmap[tag] for tag in data["tags"] if tag in tagmap ], }) for data in util.json_loads(text.extr(page, "['gdata'] = ", ";")) ] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709680883.0 gallery_dl-1.26.9/gallery_dl/extractor/wikimedia.py0000644000175000017500000001237514571724363021115 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022 Ailothaen # Copyright 2024 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Wikimedia sites""" from .common import BaseExtractor, Message from .. import text class WikimediaExtractor(BaseExtractor): """Base class for wikimedia extractors""" basecategory = "wikimedia" filename_fmt = "{filename} ({sha1[:8]}).{extension}" directory_fmt = ("{category}", "{page}") archive_fmt = "{sha1}" request_interval = (1.0, 2.0) def __init__(self, match): BaseExtractor.__init__(self, match) path = match.group(match.lastindex) if self.category == "wikimedia": self.category = self.root.split(".")[-2] elif self.category == "fandom": self.category = \ "fandom-" + self.root.partition(".")[0].rpartition("/")[2] if path.startswith("wiki/"): path = path[5:] pre, sep, _ = path.partition(":") prefix = pre.lower() if sep else None self.title = path = text.unquote(path) if prefix: self.subcategory = prefix if prefix == "category": self.params = { "generator": "categorymembers", "gcmtitle" : path, "gcmtype" : "file", } elif prefix == "file": self.params = { "titles" : path, } else: self.params = { "generator": "images", "titles" : path, } def _init(self): api_path = self.config_instance("api-path") if api_path: if api_path[0] == "/": self.api_url = self.root + api_path else: self.api_url = api_path else: self.api_url = self.root + "/api.php" def items(self): for info in self._pagination(self.params): image = info["imageinfo"][0] image["metadata"] = { m["name"]: m["value"] for m in image["metadata"]} image["commonmetadata"] = { m["name"]: m["value"] for m in image["commonmetadata"]} filename = image["canonicaltitle"] image["filename"], _, image["extension"] = \ filename.partition(":")[2].rpartition(".") image["date"] = text.parse_datetime( image["timestamp"], "%Y-%m-%dT%H:%M:%SZ") image["page"] = self.title yield Message.Directory, image yield Message.Url, image["url"], image def _pagination(self, params): """ https://www.mediawiki.org/wiki/API:Query https://opendata.stackexchange.com/questions/13381 """ url = self.api_url params["action"] = "query" params["format"] = "json" params["prop"] = "imageinfo" params["iiprop"] = ( "timestamp|user|userid|comment|canonicaltitle|url|size|" "sha1|mime|metadata|commonmetadata|extmetadata|bitdepth" ) while True: data = self.request(url, params=params).json() try: pages = data["query"]["pages"] except KeyError: pass else: yield from pages.values() try: continuation = data["continue"] except KeyError: break 
params.update(continuation) BASE_PATTERN = WikimediaExtractor.update({ "wikimedia": { "root": None, "pattern": r"[a-z]{2,}\." r"wik(?:i(?:pedia|quote|books|source|news|versity|data" r"|voyage)|tionary)" r"\.org", "api-path": "/w/api.php", }, "wikispecies": { "root": "https://species.wikimedia.org", "pattern": r"species\.wikimedia\.org", "api-path": "/w/api.php", }, "wikimediacommons": { "root": "https://commons.wikimedia.org", "pattern": r"commons\.wikimedia\.org", "api-path": "/w/api.php", }, "mediawiki": { "root": "https://www.mediawiki.org", "pattern": r"(?:www\.)?mediawiki\.org", "api-path": "/w/api.php", }, "fandom": { "root": None, "pattern": r"[\w-]+\.fandom\.com", }, "mariowiki": { "root": "https://www.mariowiki.com", "pattern": r"(?:www\.)?mariowiki\.com", }, "bulbapedia": { "root": "https://bulbapedia.bulbagarden.net", "pattern": r"(?:bulbapedia|archives)\.bulbagarden\.net", "api-path": "/w/api.php", }, "pidgiwiki": { "root": "https://www.pidgi.net", "pattern": r"(?:www\.)?pidgi\.net", "api-path": "/wiki/api.php", }, "azurlanewiki": { "root": "https://azurlane.koumakan.jp", "pattern": r"azurlane\.koumakan\.jp", "api-path": "/w/api.php", }, }) class WikimediaArticleExtractor(WikimediaExtractor): """Extractor for wikimedia articles""" subcategory = "article" pattern = BASE_PATTERN + r"/(?!static/)([^?#]+)" example = "https://en.wikipedia.org/wiki/TITLE" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/xhamster.py0000644000175000017500000001032014571514301020756 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://xhamster.com/""" from .common import Extractor, Message from .. 
import text, util BASE_PATTERN = (r"(?:https?://)?((?:[\w-]+\.)?xhamster" r"(?:\d?\.(?:com|one|desi)|\.porncache\.net))") class XhamsterExtractor(Extractor): """Base class for xhamster extractors""" category = "xhamster" def __init__(self, match): Extractor.__init__(self, match) self.root = "https://" + match.group(1) class XhamsterGalleryExtractor(XhamsterExtractor): """Extractor for image galleries on xhamster.com""" subcategory = "gallery" directory_fmt = ("{category}", "{user[name]}", "{gallery[id]} {gallery[title]}") filename_fmt = "{num:>03}_{id}.{extension}" archive_fmt = "{id}" pattern = BASE_PATTERN + r"(/photos/gallery/[^/?#]+)" example = "https://xhamster.com/photos/gallery/12345" def __init__(self, match): XhamsterExtractor.__init__(self, match) self.path = match.group(2) self.data = None def items(self): data = self.metadata() yield Message.Directory, data for num, image in enumerate(self.images(), 1): url = image["imageURL"] image.update(data) image["num"] = num yield Message.Url, url, text.nameext_from_url(url, image) def metadata(self): self.data = self._data(self.root + self.path) user = self.data["authorModel"] imgs = self.data["photosGalleryModel"] return { "user": { "id" : text.parse_int(user["id"]), "url" : user["pageURL"], "name" : user["name"], "retired" : user["retired"], "verified" : user["verified"], "subscribers": user["subscribers"], }, "gallery": { "id" : text.parse_int(imgs["id"]), "tags" : [c["name"] for c in imgs["categories"]], "date" : text.parse_timestamp(imgs["created"]), "views" : text.parse_int(imgs["views"]), "likes" : text.parse_int(imgs["rating"]["likes"]), "dislikes" : text.parse_int(imgs["rating"]["dislikes"]), "title" : text.unescape(imgs["title"]), "description": text.unescape(imgs["description"]), "thumbnail" : imgs["thumbURL"], }, "count": text.parse_int(imgs["quantity"]), } def images(self): data = self.data self.data = None while True: for image in data["photosGalleryModel"]["photos"]: del image["modelName"] yield image pgntn = data["pagination"] if pgntn["active"] == pgntn["maxPage"]: return url = pgntn["pageLinkTemplate"][:-3] + str(pgntn["next"]) data = self._data(url) def _data(self, url): page = self.request(url).text return util.json_loads(text.extr( page, "window.initials=", "").rstrip("\n\r;")) class XhamsterUserExtractor(XhamsterExtractor): """Extractor for all galleries of an xhamster user""" subcategory = "user" pattern = BASE_PATTERN + r"/users/([^/?#]+)(?:/photos)?/?(?:$|[?#])" example = "https://xhamster.com/users/USER/photos" def __init__(self, match): XhamsterExtractor.__init__(self, match) self.user = match.group(2) def items(self): url = "{}/users/{}/photos".format(self.root, self.user) data = {"_extractor": XhamsterGalleryExtractor} while url: extr = text.extract_from(self.request(url).text) while True: url = extr('thumb-image-container role-pop" href="', '"') if not url: break yield Message.Queue, url, data url = extr('data-page="next" href="', '"') ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709680883.0 gallery_dl-1.26.9/gallery_dl/extractor/xvideos.py0000644000175000017500000001001414571724363020617 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.xvideos.com/""" from .common import GalleryExtractor, Extractor, Message from .. 
import text, util BASE_PATTERN = (r"(?:https?://)?(?:www\.)?xvideos\.com" r"/(?:profiles|(?:amateur-|model-)?channels)") class XvideosBase(): """Base class for xvideos extractors""" category = "xvideos" root = "https://www.xvideos.com" class XvideosGalleryExtractor(XvideosBase, GalleryExtractor): """Extractor for user profile galleries on xvideos.com""" subcategory = "gallery" directory_fmt = ("{category}", "{user[name]}", "{gallery[id]} {gallery[title]}") filename_fmt = "{category}_{gallery[id]}_{num:>03}.{extension}" archive_fmt = "{gallery[id]}_{num}" pattern = BASE_PATTERN + r"/([^/?#]+)/photos/(\d+)" example = "https://www.xvideos.com/profiles/USER/photos/12345" def __init__(self, match): self.user, self.gallery_id = match.groups() url = "{}/profiles/{}/photos/{}".format( self.root, self.user, self.gallery_id) GalleryExtractor.__init__(self, match, url) def metadata(self, page): extr = text.extract_from(page) user = { "id" : text.parse_int(extr('"id_user":', ',')), "display": extr('"display":"', '"'), "sex" : extr('"sex":"', '"'), "name" : self.user, } title = extr('"title":"', '"') user["description"] = extr( '', '').strip() tags = extr('Tagged:', '<').strip() return { "user": user, "gallery": { "id" : text.parse_int(self.gallery_id), "title": text.unescape(title), "tags" : text.unescape(tags).split(", ") if tags else [], }, } def images(self, page): results = [ (url, None) for url in text.extract_iter( page, '
    Next"))["data"] if not isinstance(data["galleries"], dict): return if "0" in data["galleries"]: del data["galleries"]["0"] galleries = [ { "id" : text.parse_int(gid), "title": text.unescape(gdata["title"]), "count": gdata["nb_pics"], "_extractor": XvideosGalleryExtractor, } for gid, gdata in data["galleries"].items() ] galleries.sort(key=lambda x: x["id"]) for gallery in galleries: url = "https://www.xvideos.com/profiles/{}/photos/{}".format( self.user, gallery["id"]) yield Message.Queue, url, gallery ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709611201.0 gallery_dl-1.26.9/gallery_dl/extractor/ytdl.py0000644000175000017500000001163314571514301020107 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for sites supported by youtube-dl""" from .common import Extractor, Message from .. import ytdl, config, exception class YoutubeDLExtractor(Extractor): """Generic extractor for youtube-dl supported URLs""" category = "ytdl" directory_fmt = ("{category}", "{subcategory}") filename_fmt = "{title}-{id}.{extension}" archive_fmt = "{extractor_key} {id}" pattern = r"ytdl:(.*)" example = "ytdl:https://www.youtube.com/watch?v=abcdefghijk" def __init__(self, match): # import main youtube_dl module ytdl_module = ytdl.import_module(config.get( ("extractor", "ytdl"), "module")) self.ytdl_module_name = ytdl_module.__name__ # find suitable youtube_dl extractor self.ytdl_url = url = match.group(1) generic = config.interpolate(("extractor", "ytdl"), "generic", True) if generic == "force": self.ytdl_ie_key = "Generic" self.force_generic_extractor = True else: for ie in ytdl_module.extractor.gen_extractor_classes(): if ie.suitable(url): self.ytdl_ie_key = ie.ie_key() break if not generic and self.ytdl_ie_key == "Generic": raise exception.NoExtractorError() self.force_generic_extractor = False # set subcategory to youtube_dl extractor's key self.subcategory = self.ytdl_ie_key Extractor.__init__(self, match) def items(self): # import subcategory module ytdl_module = ytdl.import_module( config.get(("extractor", "ytdl", self.subcategory), "module") or self.ytdl_module_name) self.log.debug("Using %s", ytdl_module) # construct YoutubeDL object extr_opts = { "extract_flat" : "in_playlist", "force_generic_extractor": self.force_generic_extractor, } user_opts = { "retries" : self._retries, "socket_timeout" : self._timeout, "nocheckcertificate" : not self._verify, } if self._proxies: user_opts["proxy"] = self._proxies.get("http") username, password = self._get_auth_info() if username: user_opts["username"], user_opts["password"] = username, password del username, password ytdl_instance = ytdl.construct_YoutubeDL( ytdl_module, self, user_opts, extr_opts) # transfer cookies to ytdl cookies = self.cookies if cookies: set_cookie = ytdl_instance.cookiejar.set_cookie for cookie in cookies: set_cookie(cookie) # extract youtube_dl info_dict try: info_dict = ytdl_instance._YoutubeDL__extract_info( self.ytdl_url, ytdl_instance.get_info_extractor(self.ytdl_ie_key), False, {}, True) except ytdl_module.utils.YoutubeDLError: raise exception.StopExtraction("Failed to extract video data") if not info_dict: return elif "entries" in info_dict: results = self._process_entries( ytdl_module, ytdl_instance, info_dict["entries"]) else: results = (info_dict,) # yield results for 
info_dict in results: info_dict["extension"] = None info_dict["_ytdl_info_dict"] = info_dict info_dict["_ytdl_instance"] = ytdl_instance url = "ytdl:" + (info_dict.get("url") or info_dict.get("webpage_url") or self.ytdl_url) yield Message.Directory, info_dict yield Message.Url, url, info_dict def _process_entries(self, ytdl_module, ytdl_instance, entries): for entry in entries: if not entry: continue elif entry.get("_type") in ("url", "url_transparent"): try: info_dict = ytdl_instance.extract_info( entry["url"], False, ie_key=entry.get("ie_key")) except ytdl_module.utils.YoutubeDLError: continue if not info_dict: continue elif "entries" in info_dict: yield from self._process_entries( ytdl_module, ytdl_instance, info_dict["entries"]) else: yield info_dict else: yield entry if config.get(("extractor", "ytdl"), "enabled"): # make 'ytdl:' prefix optional YoutubeDLExtractor.pattern = r"(?:ytdl:)?(.*)" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1709680883.0 gallery_dl-1.26.9/gallery_dl/extractor/zerochan.py0000644000175000017500000001604314571724363020757 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.zerochan.net/""" from .booru import BooruExtractor from ..cache import cache from .. import text, util, exception BASE_PATTERN = r"(?:https?://)?(?:www\.)?zerochan\.net" class ZerochanExtractor(BooruExtractor): """Base class for zerochan extractors""" category = "zerochan" root = "https://www.zerochan.net" filename_fmt = "{id}.{extension}" archive_fmt = "{id}" page_start = 1 per_page = 250 cookies_domain = ".zerochan.net" cookies_names = ("z_id", "z_hash") request_interval = (0.5, 1.5) def login(self): self._logged_in = True if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: return self.cookies_update(self._login_impl(username, password)) self._logged_in = False @cache(maxage=90*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/login" headers = { "Origin" : self.root, "Referer" : url, } data = { "ref" : "/", "name" : username, "password": password, "login" : "Login", } response = self.request(url, method="POST", headers=headers, data=data) if not response.history: raise exception.AuthenticationError() return response.cookies def _parse_entry_html(self, entry_id): url = "{}/{}".format(self.root, entry_id) extr = text.extract_from(self.request(url).text) data = { "id" : text.parse_int(entry_id), "author" : text.parse_unicode_escapes(extr(' "name": "', '"')), "file_url": extr('"contentUrl": "', '"'), "date" : text.parse_datetime(extr('"datePublished": "', '"')), "width" : text.parse_int(extr('"width": "', ' ')), "height" : text.parse_int(extr('"height": "', ' ')), "size" : text.parse_bytes(extr('"contentSize": "', 'B')), "path" : text.split_html(extr( 'class="breadcrumbs', ''))[2:], "uploader": extr('href="/user/', '"'), "tags" : extr('