././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1678564859.9818535 gallery_dl-1.25.0/0000755000175000017500000000000014403156774012424 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1678564859.0 gallery_dl-1.25.0/CHANGELOG.md0000644000175000017500000056775214403156773014262 0ustar00mikemike# Changelog ## 1.25.0 - 2023-03-11 ### Changes - [e621] split `e621` extractors from `danbooru` module ([#3425](https://github.com/mikf/gallery-dl/issues/3425)) - [deviantart] remove mature scraps warning ([#3691](https://github.com/mikf/gallery-dl/issues/3691)) - [deviantart] use `/collections/all` endpoint for favorites ([#3666](https://github.com/mikf/gallery-dl/issues/3666) ,#3668) - [newgrounds] update default image and audio archive IDs to prevent ID overlap ([#3681](https://github.com/mikf/gallery-dl/issues/3681)) - rename `--ignore-config` to `--config-ignore` ### Extractors - [catbox] add `file` extractor ([#3570](https://github.com/mikf/gallery-dl/issues/3570)) - [deviantart] add `search` extractor ([#538](https://github.com/mikf/gallery-dl/issues/538), [#1264](https://github.com/mikf/gallery-dl/issues/1264), [#2954](https://github.com/mikf/gallery-dl/issues/2954), [#2970](https://github.com/mikf/gallery-dl/issues/2970), [#3577](https://github.com/mikf/gallery-dl/issues/3577)) - [deviantart] add `gallery-search` extractor ([#1695](https://github.com/mikf/gallery-dl/issues/1695)) - [deviantart] support `fxdeviantart.com` URLs (##3740) - [e621] implement `notes` and `pools` metadata extraction ([#3425](https://github.com/mikf/gallery-dl/issues/3425)) - [gelbooru] add `favorite` extractor ([#3704](https://github.com/mikf/gallery-dl/issues/3704)) - [imagetwist] support `phun.imagetwist.com` and `imagehaha.com` domains ([#3622](https://github.com/mikf/gallery-dl/issues/3622)) - [instagram] add `user` metadata field ([#3107](https://github.com/mikf/gallery-dl/issues/3107)) - [manganelo] update and fix metadata extraction - [manganelo] support mobile-only chapters - [mangasee] extract `author` and `genre` metadata ([#3703](https://github.com/mikf/gallery-dl/issues/3703)) - [misskey] add `misskey` extractors ([#3717](https://github.com/mikf/gallery-dl/issues/3717)) - [pornpics] add `gallery` and `search` extractors ([#263](https://github.com/mikf/gallery-dl/issues/263), [#3544](https://github.com/mikf/gallery-dl/issues/3544), [#3654](https://github.com/mikf/gallery-dl/issues/3654)) - [redgifs] support v3 URLs ([#3588](https://github.com/mikf/gallery-dl/issues/3588). 
[#3589](https://github.com/mikf/gallery-dl/issues/3589)) - [redgifs] add `collection` extractors ([#3427](https://github.com/mikf/gallery-dl/issues/3427), [#3662](https://github.com/mikf/gallery-dl/issues/3662)) - [shopify] support ohpolly.com ([#440](https://github.com/mikf/gallery-dl/issues/440), [#3596](https://github.com/mikf/gallery-dl/issues/3596)) - [szurubooru] add `tag` and `post` extractors ([#3583](https://github.com/mikf/gallery-dl/issues/3583), [#3713](https://github.com/mikf/gallery-dl/issues/3713)) - [twitter] add `transform` option ### Options - [postprocessor:metadata] add `sort` and `separators` options - [postprocessor:exec] implement archive options ([#3584](https://github.com/mikf/gallery-dl/issues/3584)) - add `--config-create` command-line option ([#2333](https://github.com/mikf/gallery-dl/issues/2333)) - add `--config-toml` command-line option to load config files in TOML format - add `output.stdout`, `output.stdin`, and `output.stderr` options ([#1621](https://github.com/mikf/gallery-dl/issues/1621), [#2152](https://github.com/mikf/gallery-dl/issues/2152), [#2529](https://github.com/mikf/gallery-dl/issues/2529)) - add `hash_md5` and `hash_sha1` functions ([#3679](https://github.com/mikf/gallery-dl/issues/3679)) - implement `globals` option to enable defining custom functions for `eval` statements - implement `archive-pragma` option to use SQLite PRAGMA statements - implement `actions` to trigger events on logging messages ([#3338](https://github.com/mikf/gallery-dl/issues/3338), [#3630](https://github.com/mikf/gallery-dl/issues/3630)) - implement ability to load external extractor classes - `-X/--extractors` command-line options - `extractor.modules-sources` config option ### Fixes - [bunkr] fix extraction ([#3636](https://github.com/mikf/gallery-dl/issues/3636), [#3655](https://github.com/mikf/gallery-dl/issues/3655)) - [danbooru] send gallery-dl User-Agent ([#3665](https://github.com/mikf/gallery-dl/issues/3665)) - [deviantart] fix crash when handling deleted deviations in status updates ([#3656](https://github.com/mikf/gallery-dl/issues/3656)) - [fanbox] fix crash with missing images ([#3673](https://github.com/mikf/gallery-dl/issues/3673)) - [imagefap] update `gallery` URLs ([#3595](https://github.com/mikf/gallery-dl/issues/3595)) - [imagefap] fix infinite pagination loop ([#3594](https://github.com/mikf/gallery-dl/issues/3594)) - [imagefap] fix metadata extraction - [oauth] use default name for browsers without `name` attribute - [pinterest] unescape search terms ([#3621](https://github.com/mikf/gallery-dl/issues/3621)) - [pixiv] fix `--write-tags` for `"tags": "original"` ([#3675](https://github.com/mikf/gallery-dl/issues/3675)) - [poipiku] warn about incorrect passwords ([#3646](https://github.com/mikf/gallery-dl/issues/3646)) - [reddit] update `videos` option ([#3712](https://github.com/mikf/gallery-dl/issues/3712)) - [soundgasm] rewrite ([#3578](https://github.com/mikf/gallery-dl/issues/3578)) - [telegraph] fix extraction when images are not in `
` elements ([#3590](https://github.com/mikf/gallery-dl/issues/3590)) - [tumblr] raise more detailed errors for dashboard-only blogs ([#3628](https://github.com/mikf/gallery-dl/issues/3628)) - [twitter] fix some `original` retweets not downloading ([#3744](https://github.com/mikf/gallery-dl/issues/3744)) - [ytdl] fix `--parse-metadata` ([#3663](https://github.com/mikf/gallery-dl/issues/3663)) - [downloader:ytdl] prevent exception on empty results ### Improvements - [downloader:http] use `time.monotonic()` - [downloader:http] update `_http_retry` to accept a Python function ([#3569](https://github.com/mikf/gallery-dl/issues/3569)) - [postprocessor:metadata] speed up JSON encoding - replace `json.loads/dumps` with direct calls to `JSONDecoder.decode/JSONEncoder.encode` - improve `option.Formatter` performance ### Removals - [nitter] remove `nitter.pussthecat.org` ## 1.24.5 - 2023-01-28 ### Additions - [booru] add `url` option - [danbooru] extend `metadata` option ([#3505](https://github.com/mikf/gallery-dl/issues/3505)) - [deviantart] add extractor for status updates ([#3539](https://github.com/mikf/gallery-dl/issues/3539), [#3541](https://github.com/mikf/gallery-dl/issues/3541)) - [deviantart] add support for `/deviation/` and `fav.me` URLs ([#3558](https://github.com/mikf/gallery-dl/issues/3558), [#3560](https://github.com/mikf/gallery-dl/issues/3560)) - [kemonoparty] extract `hash` metadata for discord files ([#3531](https://github.com/mikf/gallery-dl/issues/3531)) - [lexica] add `search` extractor ([#3567](https://github.com/mikf/gallery-dl/issues/3567)) - [mastodon] add `num` and `count` metadata fields ([#3517](https://github.com/mikf/gallery-dl/issues/3517)) - [nudecollect] add `image` and `album` extractors ([#2430](https://github.com/mikf/gallery-dl/issues/2430), [#2818](https://github.com/mikf/gallery-dl/issues/2818), [#3575](https://github.com/mikf/gallery-dl/issues/3575)) - [wikifeet] add `gallery` extractor ([#519](https://github.com/mikf/gallery-dl/issues/519), [#3537](https://github.com/mikf/gallery-dl/issues/3537)) - [downloader:http] add signature checks for `.blend`, `.obj`, and `.clip` files ([#3535](https://github.com/mikf/gallery-dl/issues/3535)) - add `extractor.retry-codes` option - add `-O/--postprocessor-option` command-line option ([#3565](https://github.com/mikf/gallery-dl/issues/3565)) - improve `write-pages` output ### Fixes - [bunkr] fix downloading `.mkv` and `.ts` files ([#3571](https://github.com/mikf/gallery-dl/issues/3571)) - [fantia] send `X-CSRF-Token` headers ([#3576](https://github.com/mikf/gallery-dl/issues/3576)) - [generic] fix regex for non-src image URLs ([#3555](https://github.com/mikf/gallery-dl/issues/3555)) - [hiperdex] update domain ([#3572](https://github.com/mikf/gallery-dl/issues/3572)) - [hotleak] fix video URLs ([#3516](https://github.com/mikf/gallery-dl/issues/3516), [#3525](https://github.com/mikf/gallery-dl/issues/3525), [#3563](https://github.com/mikf/gallery-dl/issues/3563), [#3581](https://github.com/mikf/gallery-dl/issues/3581)) - [instagram] always show `cursor` value after errors ([#3440](https://github.com/mikf/gallery-dl/issues/3440)) - [instagram] update API domain, headers, and csrf token handling - [oauth] show `client-id`/`api-key` values ([#3518](https://github.com/mikf/gallery-dl/issues/3518)) - [philomena] match URLs with www subdomain - [sankaku] update URL pattern ([#3523](https://github.com/mikf/gallery-dl/issues/3523)) - [twitter] refresh guest tokens ([#3445](https://github.com/mikf/gallery-dl/issues/3445), 
[#3458](https://github.com/mikf/gallery-dl/issues/3458)) - [twitter] fix search pagination ([#3536](https://github.com/mikf/gallery-dl/issues/3536), [#3534](https://github.com/mikf/gallery-dl/issues/3534), [#3549](https://github.com/mikf/gallery-dl/issues/3549)) - [twitter] use `"browser": "firefox"` by default ([#3522](https://github.com/mikf/gallery-dl/issues/3522)) ## 1.24.4 - 2023-01-11 ### Additions - [downloader:http] add `validate` option ### Fixes - [kemonoparty] fix regression from commit 473bd380 ([#3519](https://github.com/mikf/gallery-dl/issues/3519)) ## 1.24.3 - 2023-01-10 ### Additions - [danbooru] extract `uploader` metadata ([#3457](https://github.com/mikf/gallery-dl/issues/3457)) - [deviantart] initial implementation of username & password login for `scraps` ([#1029](https://github.com/mikf/gallery-dl/issues/1029)) - [fanleaks] add `post` and `model` extractors ([#3468](https://github.com/mikf/gallery-dl/issues/3468), [#3474](https://github.com/mikf/gallery-dl/issues/3474)) - [imagefap] add `folder` extractor ([#3504](https://github.com/mikf/gallery-dl/issues/3504)) - [lynxchan] support `bbw-chan.nl` ([#3456](https://github.com/mikf/gallery-dl/issues/3456), [#3463](https://github.com/mikf/gallery-dl/issues/3463)) - [pinterest] support `All Pins` boards ([#2855](https://github.com/mikf/gallery-dl/issues/2855), [#3484](https://github.com/mikf/gallery-dl/issues/3484)) - [pinterest] add `domain` option ([#3484](https://github.com/mikf/gallery-dl/issues/3484)) - [pixiv] implement `metadata-bookmark` option ([#3417](https://github.com/mikf/gallery-dl/issues/3417)) - [tcbscans] add `chapter` and `manga` extractors ([#3189](https://github.com/mikf/gallery-dl/issues/3189)) - [twitter] implement `syndication=extended` ([#3483](https://github.com/mikf/gallery-dl/issues/3483)) - implement slice notation for `range` options ([#918](https://github.com/mikf/gallery-dl/issues/918), [#2865](https://github.com/mikf/gallery-dl/issues/2865)) - allow `filter` options to be a list of expressions ### Fixes - [behance] use delay between requests ([#2507](https://github.com/mikf/gallery-dl/issues/2507)) - [bunkr] fix URLs returned by API ([#3481](https://github.com/mikf/gallery-dl/issues/3481)) - [fanbox] return `imageMap` files in order ([#2718](https://github.com/mikf/gallery-dl/issues/2718)) - [imagefap] use delay between requests ([#1140](https://github.com/mikf/gallery-dl/issues/1140)) - [imagefap] warn about redirects to `/human-verification` ([#1140](https://github.com/mikf/gallery-dl/issues/1140)) - [kemonoparty] reject invalid/empty files ([#3510](https://github.com/mikf/gallery-dl/issues/3510)) - [myhentaigallery] handle whitespace before title tag ([#3503](https://github.com/mikf/gallery-dl/issues/3503)) - [poipiku] fix extraction for a different warning button style ([#3493](https://github.com/mikf/gallery-dl/issues/3493), [#3460](https://github.com/mikf/gallery-dl/issues/3460)) - [poipiku] warn about login requirements - [telegraph] fix file URLs ([#3506](https://github.com/mikf/gallery-dl/issues/3506)) - [twitter] fix crash when using `expand` and `syndication` ([#3473](https://github.com/mikf/gallery-dl/issues/3473)) - [twitter] apply tweet type checks before uniqueness check ([#3439](https://github.com/mikf/gallery-dl/issues/3439), [#3455](https://github.com/mikf/gallery-dl/issues/3455)) - [twitter] force `https://` for TwitPic URLs ([#3449](https://github.com/mikf/gallery-dl/issues/3449)) - [ytdl] adapt to yt-dlp changes - update and improve documentation 
([#3453](https://github.com/mikf/gallery-dl/issues/3453), [#3462](https://github.com/mikf/gallery-dl/issues/3462), [#3496](https://github.com/mikf/gallery-dl/issues/3496)) ## 1.24.2 - 2022-12-18 ### Additions - [2chen] support `.club` URLs ([#3406](https://github.com/mikf/gallery-dl/issues/3406)) - [deviantart] extract sta.sh URLs from `text_content` ([#3366](https://github.com/mikf/gallery-dl/issues/3366)) - [deviantart] add `/view` URL support ([#3367](https://github.com/mikf/gallery-dl/issues/3367)) - [e621] implement `threshold` option to control pagination ([#3413](https://github.com/mikf/gallery-dl/issues/3413)) - [fapello] add `post`, `user` and `path` extractors ([#3065](https://github.com/mikf/gallery-dl/issues/3065), [#3360](https://github.com/mikf/gallery-dl/issues/3360), [#3415](https://github.com/mikf/gallery-dl/issues/3415)) - [imgur] add support for imgur.io URLs ([#3419](https://github.com/mikf/gallery-dl/issues/3419)) - [lynxchan] add generic extractors for lynxchan imageboards ([#3389](https://github.com/mikf/gallery-dl/issues/3389), [#3394](https://github.com/mikf/gallery-dl/issues/3394)) - [mangafox] extract more metadata ([#3167](https://github.com/mikf/gallery-dl/issues/3167)) - [pixiv] extract `date_url` metadata ([#3405](https://github.com/mikf/gallery-dl/issues/3405)) - [soundgasm] add `audio` and `user` extractors ([#3384](https://github.com/mikf/gallery-dl/issues/3384), [#3388](https://github.com/mikf/gallery-dl/issues/3388)) - [webmshare] add `video` extractor ([#2410](https://github.com/mikf/gallery-dl/issues/2410)) - support Firefox containers for `--cookies-from-browser` ([#3346](https://github.com/mikf/gallery-dl/issues/3346)) ### Fixes - [2chen] fix file URLs - [bunkr] update domain ([#3391](https://github.com/mikf/gallery-dl/issues/3391)) - [exhentai] fix pagination - [imagetwist] fix extraction - [imgth] rewrite - [instagram] prevent post `date` overwriting file `date` ([#3392](https://github.com/mikf/gallery-dl/issues/3392)) - [khinsider] fix metadata extraction - [komikcast] update domain and fix extraction - [reddit] increase `id-max` default value ([#3397](https://github.com/mikf/gallery-dl/issues/3397)) - [seiga] raise error when redirected to login page ([#3401](https://github.com/mikf/gallery-dl/issues/3401)) - [sexcom] fix video URLs ([#3408](https://github.com/mikf/gallery-dl/issues/3408), [#3414](https://github.com/mikf/gallery-dl/issues/3414)) - [twitter] update `search` pagination ([#544](https://github.com/mikf/gallery-dl/issues/544)) - [warosu] fix and update - [zerochan] update for layout v3 - restore paths for archived files ([#3362](https://github.com/mikf/gallery-dl/issues/3362), [#3377](https://github.com/mikf/gallery-dl/issues/3377)) - use `util.NONE` as `keyword-default` default value ([#3334](https://github.com/mikf/gallery-dl/issues/3334)) ### Removals - [foolslide] remove `kireicake` - [kissgoddess] remove module ## 1.24.1 - 2022-12-04 ### Additions - [artstation] add `pro-first` option ([#3273](https://github.com/mikf/gallery-dl/issues/3273)) - [artstation] add `max-posts` option ([#3270](https://github.com/mikf/gallery-dl/issues/3270)) - [fapachi] add `post` and `user` extractors ([#3339](https://github.com/mikf/gallery-dl/issues/3339), [#3347](https://github.com/mikf/gallery-dl/issues/3347)) - [inkbunny] provide additional metadata ([#3274](https://github.com/mikf/gallery-dl/issues/3274)) - [nitter] add `retweets` option ([#3278](https://github.com/mikf/gallery-dl/issues/3278)) - [nitter] add `videos` option 
([#3279](https://github.com/mikf/gallery-dl/issues/3279)) - [nitter] support `/i/web/` and `/i/user/` URLs ([#3310](https://github.com/mikf/gallery-dl/issues/3310)) - [pixhost] add `gallery` support ([#3336](https://github.com/mikf/gallery-dl/issues/3336), [#3353](https://github.com/mikf/gallery-dl/issues/3353)) - [weibo] add `count` metadata field ([#3305](https://github.com/mikf/gallery-dl/issues/3305)) - [downloader:http] add `retry-codes` option ([#3313](https://github.com/mikf/gallery-dl/issues/3313)) - [formatter] implement `S` format specifier to sort lists ([#3266](https://github.com/mikf/gallery-dl/issues/3266)) - implement `version-metadata` option ([#3201](https://github.com/mikf/gallery-dl/issues/3201)) ### Fixes - [2chen] fix extraction ([#3354](https://github.com/mikf/gallery-dl/issues/3354), [#3356](https://github.com/mikf/gallery-dl/issues/3356)) - [bcy] fix JSONDecodeError ([#3321](https://github.com/mikf/gallery-dl/issues/3321)) - [bunkr] fix video downloads ([#3326](https://github.com/mikf/gallery-dl/issues/3326), [#3335](https://github.com/mikf/gallery-dl/issues/3335)) - [bunkr] use `media-files` servers for more file types - [itaku] remove `Extreme` rating ([#3285](https://github.com/mikf/gallery-dl/issues/3285), [#3287](https://github.com/mikf/gallery-dl/issues/3287)) - [hitomi] apply format check for every image ([#3280](https://github.com/mikf/gallery-dl/issues/3280)) - [hotleak] fix UnboundLocalError ([#3288](https://github.com/mikf/gallery-dl/issues/3288), [#3293](https://github.com/mikf/gallery-dl/issues/3293)) - [nitter] sanitize filenames ([#3294](https://github.com/mikf/gallery-dl/issues/3294)) - [nitter] retry downloads on 404 ([#3313](https://github.com/mikf/gallery-dl/issues/3313)) - [nitter] set `hlsPlayback` cookie - [patreon] fix `403 Forbidden` errors ([#3341](https://github.com/mikf/gallery-dl/issues/3341)) - [patreon] improve `campaign_id` extraction ([#3235](https://github.com/mikf/gallery-dl/issues/3235)) - [patreon] update API query parameters - [pixiv] preserve `tags` order ([#3266](https://github.com/mikf/gallery-dl/issues/3266)) - [reddit] use `dash_url` for videos ([#3258](https://github.com/mikf/gallery-dl/issues/3258), [#3306](https://github.com/mikf/gallery-dl/issues/3306)) - [twitter] fix error when using user IDs for suspended accounts - [weibo] fix bug with empty `playback_list` ([#3301](https://github.com/mikf/gallery-dl/issues/3301)) - [downloader:http] fix potential `ZeroDivisionError` ([#3328](https://github.com/mikf/gallery-dl/issues/3328)) ### Removals - [lolisafe] remove `zz.ht` ## 1.24.0 - 2022-11-20 ### Additions - [exhentai] add metadata to search results ([#3181](https://github.com/mikf/gallery-dl/issues/3181)) - [gelbooru_v02] implement `notes` extraction - [instagram] add `guide` extractor ([#3192](https://github.com/mikf/gallery-dl/issues/3192)) - [lolisafe] add support for xbunkr ([#3153](https://github.com/mikf/gallery-dl/issues/3153), [#3156](https://github.com/mikf/gallery-dl/issues/3156)) - [mastodon] add `instance_remote` metadata field ([#3119](https://github.com/mikf/gallery-dl/issues/3119)) - [nitter] add extractors for Nitter instances ([#2415](https://github.com/mikf/gallery-dl/issues/2415), [#2696](https://github.com/mikf/gallery-dl/issues/2696)) - [pixiv] add support for new daily AI rankings category ([#3214](https://github.com/mikf/gallery-dl/issues/3214), [#3221](https://github.com/mikf/gallery-dl/issues/3221)) - [twitter] add `avatar` and `background` extractors 
([#349](https://github.com/mikf/gallery-dl/issues/349), [#3023](https://github.com/mikf/gallery-dl/issues/3023)) - [uploadir] add support for `uploadir.com` ([#3162](https://github.com/mikf/gallery-dl/issues/3162)) - [wallhaven] add `user` extractor ([#3212](https://github.com/mikf/gallery-dl/issues/3212), [#3213](https://github.com/mikf/gallery-dl/issues/3213), [#3226](https://github.com/mikf/gallery-dl/issues/3226)) - [downloader:http] add `chunk-size` option ([#3143](https://github.com/mikf/gallery-dl/issues/3143)) - [downloader:http] add file signature check for `.mp4` files - [downloader:http] add file signature check and MIME type for `.avif` files - [postprocessor] implement `post-after` event ([#3117](https://github.com/mikf/gallery-dl/issues/3117)) - [postprocessor:metadata] implement `"mode": "jsonl"` - [postprocessor:metadata] add `open`, `encoding`, and `private` options - add `--chunk-size` command-line option ([#3143](https://github.com/mikf/gallery-dl/issues/3143)) - add `--user-agent` command-line option - implement `http-metadata` option - implement `"user-agent": "browser"` ([#2636](https://github.com/mikf/gallery-dl/issues/2636)) ### Changes - [deviantart] restore cookies warning for mature scraps ([#3129](https://github.com/mikf/gallery-dl/issues/3129)) - [instagram] use REST API for unauthenticated users by default - [downloader:http] increase default `chunk-size` to 32768 bytes ([#3143](https://github.com/mikf/gallery-dl/issues/3143)) - build Windows executables using py2exe's new `freeze()` API - build executables on GitHub Actions with Python 3.11 - reword error text for unsupported URLs ### Fixes - [exhentai] fix pagination ([#3181](https://github.com/mikf/gallery-dl/issues/3181)) - [khinsider] fix extraction ([#3215](https://github.com/mikf/gallery-dl/issues/3215), [#3219](https://github.com/mikf/gallery-dl/issues/3219)) - [realbooru] fix download URLs ([#2530](https://github.com/mikf/gallery-dl/issues/2530)) - [realbooru] fix `tags` extraction ([#2530](https://github.com/mikf/gallery-dl/issues/2530)) - [tumblr] fall back to `gifv` when possible ([#3095](https://github.com/mikf/gallery-dl/issues/3095), [#3159](https://github.com/mikf/gallery-dl/issues/3159)) - [twitter] fix login ([#3220](https://github.com/mikf/gallery-dl/issues/3220)) - [twitter] update URL for syndication API ([#3160](https://github.com/mikf/gallery-dl/issues/3160)) - [weibo] send `Referer` headers ([#3188](https://github.com/mikf/gallery-dl/issues/3188)) - [ytdl] update `parse_bytes` location ([#3256](https://github.com/mikf/gallery-dl/issues/3256)) ### Improvements - [imxto] extract additional metadata ([#3118](https://github.com/mikf/gallery-dl/issues/3118), [#3175](https://github.com/mikf/gallery-dl/issues/3175)) - [instagram] allow downloading avatars for private profiles ([#3255](https://github.com/mikf/gallery-dl/issues/3255)) - [pixiv] raise error for invalid search/ranking parameters ([#3214](https://github.com/mikf/gallery-dl/issues/3214)) - [twitter] update `bookmarks` pagination ([#3172](https://github.com/mikf/gallery-dl/issues/3172)) - [downloader:http] refactor file signature checks - [downloader:http] improve `-r/--limit-rate` accuracy ([#3143](https://github.com/mikf/gallery-dl/issues/3143)) - add loaded config files to debug output - improve `-K` output for lists ### Removals - [instagram] remove login support ([#3139](https://github.com/mikf/gallery-dl/issues/3139), [#3141](https://github.com/mikf/gallery-dl/issues/3141), 
[#3191](https://github.com/mikf/gallery-dl/issues/3191)) - [instagram] remove `channel` extractor - [ngomik] remove module ## 1.23.5 - 2022-10-30 ### Fixes - [instagram] fix AttributeError on user stories extraction ([#3123](https://github.com/mikf/gallery-dl/issues/3123)) ## 1.23.4 - 2022-10-29 ### Additions - [aibooru] add support for aibooru.online ([#3075](https://github.com/mikf/gallery-dl/issues/3075)) - [instagram] add 'avatar' extractor ([#929](https://github.com/mikf/gallery-dl/issues/929), [#1097](https://github.com/mikf/gallery-dl/issues/1097), [#2992](https://github.com/mikf/gallery-dl/issues/2992)) - [instagram] support 'instagram.com/s/' highlight URLs ([#3076](https://github.com/mikf/gallery-dl/issues/3076)) - [instagram] extract 'coauthors' metadata ([#3107](https://github.com/mikf/gallery-dl/issues/3107)) - [mangasee] add support for 'mangalife' ([#3086](https://github.com/mikf/gallery-dl/issues/3086)) - [mastodon] add 'bookmark' extractor ([#3109](https://github.com/mikf/gallery-dl/issues/3109)) - [mastodon] support cross-instance user references and '/web/' URLs ([#3109](https://github.com/mikf/gallery-dl/issues/3109)) - [moebooru] implement 'notes' extraction ([#3094](https://github.com/mikf/gallery-dl/issues/3094)) - [pixiv] extend 'metadata' option ([#3057](https://github.com/mikf/gallery-dl/issues/3057)) - [reactor] match 'best', 'new', 'all' URLs ([#3073](https://github.com/mikf/gallery-dl/issues/3073)) - [smugloli] add 'smugloli' extractors ([#3060](https://github.com/mikf/gallery-dl/issues/3060)) - [tumblr] add 'fallback-delay' and 'fallback-retries' options ([#2957](https://github.com/mikf/gallery-dl/issues/2957)) - [vichan] add generic extractors for vichan imageboards ### Fixes - [bcy] fix extraction ([#3103](https://github.com/mikf/gallery-dl/issues/3103)) - [gelbooru] support alternate parameter order in post URLs ([#2821](https://github.com/mikf/gallery-dl/issues/2821)) - [hentai2read] support minor versions in chapter URLs ([#3089](https://github.com/mikf/gallery-dl/issues/3089)) - [hentaihere] support minor versions in chapter URLs - [kemonoparty] fix 'dms' extraction ([#3106](https://github.com/mikf/gallery-dl/issues/3106)) - [kemonoparty] update pagination offset - [manganelo] update domain to 'chapmanganato.com' ([#3097](https://github.com/mikf/gallery-dl/issues/3097)) - [pixiv] use 'exact_match_for_tags' as default search mode ([#3092](https://github.com/mikf/gallery-dl/issues/3092)) - [redgifs] fix 'token' extraction ([#3080](https://github.com/mikf/gallery-dl/issues/3080), [#3081](https://github.com/mikf/gallery-dl/issues/3081)) - [skeb] fix extraction ([#3112](https://github.com/mikf/gallery-dl/issues/3112)) - improve compatibility of DownloadArchive ([#3078](https://github.com/mikf/gallery-dl/issues/3078)) ## 1.23.3 - 2022-10-15 ### Additions - [2chen] Add `2chen.moe` extractor ([#2707](https://github.com/mikf/gallery-dl/issues/2707)) - [8chan] add `thread` and `board` extractors ([#2938](https://github.com/mikf/gallery-dl/issues/2938)) - [deviantart] add `group` option ([#3018](https://github.com/mikf/gallery-dl/issues/3018)) - [fanbox] add `content` metadata field ([#3020](https://github.com/mikf/gallery-dl/issues/3020)) - [instagram] restore `cursor` functionality ([#2991](https://github.com/mikf/gallery-dl/issues/2991)) - [instagram] restore warnings for private profiles ([#3004](https://github.com/mikf/gallery-dl/issues/3004), [#3045](https://github.com/mikf/gallery-dl/issues/3045)) - [nana] add `nana` extractors 
([#2967](https://github.com/mikf/gallery-dl/issues/2967)) - [nijie] add `feed` and `followed` extractors ([#3048](https://github.com/mikf/gallery-dl/issues/3048)) - [tumblr] support `https://www.tumblr.com/BLOGNAME` URLs ([#3034](https://github.com/mikf/gallery-dl/issues/3034)) - [tumblr] add `offset` option - [vk] add `tagged` extractor ([#2997](https://github.com/mikf/gallery-dl/issues/2997)) - add `path-extended` option ([#3021](https://github.com/mikf/gallery-dl/issues/3021)) - emit debug logging messages before calling time.sleep() ([#2982](https://github.com/mikf/gallery-dl/issues/2982)) ### Changes - [postprocessor:metadata] assume `"mode": "custom"` when `format` is given ### Fixes - [artstation] skip missing projects ([#3016](https://github.com/mikf/gallery-dl/issues/3016)) - [danbooru] fix ugoira metadata extraction ([#3056](https://github.com/mikf/gallery-dl/issues/3056)) - [deviantart] fix `deviation` extraction ([#2981](https://github.com/mikf/gallery-dl/issues/2981)) - [hitomi] fall back to `webp` when selected format is not available ([#3030](https://github.com/mikf/gallery-dl/issues/3030)) - [imagefap] fix and improve folder extraction and gallery pagination ([#3013](https://github.com/mikf/gallery-dl/issues/3013)) - [instagram] fix login ([#3011](https://github.com/mikf/gallery-dl/issues/3011), [#3015](https://github.com/mikf/gallery-dl/issues/3015)) - [nozomi] fix extraction ([#3051](https://github.com/mikf/gallery-dl/issues/3051)) - [redgifs] fix extraction ([#3037](https://github.com/mikf/gallery-dl/issues/3037)) - [tumblr] sleep between fallback retries ([#2957](https://github.com/mikf/gallery-dl/issues/2957)) - [vk] unescape error messages - fix duplicated metadata bug with `-j` ([#3033](https://github.com/mikf/gallery-dl/issues/3033)) - fix bug when processing input file comments ([#2808](https://github.com/mikf/gallery-dl/issues/2808)) ## 1.23.2 - 2022-10-01 ### Additions - [artstation] support search filters ([#2970](https://github.com/mikf/gallery-dl/issues/2970)) - [blogger] add `label` and `query` metadata fields ([#2930](https://github.com/mikf/gallery-dl/issues/2930)) - [exhentai] add a slash to the end of gallery URLs ([#2947](https://github.com/mikf/gallery-dl/issues/2947)) - [instagram] add `count` metadata field ([#2979](https://github.com/mikf/gallery-dl/issues/2979)) - [instagram] add `api` option - [kemonoparty] add `count` metadata field ([#2952](https://github.com/mikf/gallery-dl/issues/2952)) - [mastodon] warn about moved accounts ([#2939](https://github.com/mikf/gallery-dl/issues/2939)) - [newgrounds] add `games` extractor ([#2955](https://github.com/mikf/gallery-dl/issues/2955)) - [newgrounds] extract `type` metadata - [pixiv] add `series` extractor ([#2964](https://github.com/mikf/gallery-dl/issues/2964)) - [sankaku] implement `refresh` option ([#2958](https://github.com/mikf/gallery-dl/issues/2958)) - [skeb] add `search` extractor and `filters` option ([#2945](https://github.com/mikf/gallery-dl/issues/2945)) ### Fixes - [deviantart] fix extraction ([#2981](https://github.com/mikf/gallery-dl/issues/2981), [#2983](https://github.com/mikf/gallery-dl/issues/2983)) - [fappic] fix extraction - [instagram] extract higher-resolution photos ([#2666](https://github.com/mikf/gallery-dl/issues/2666)) - [instagram] fix `username` and `fullname` metadata for saved posts ([#2911](https://github.com/mikf/gallery-dl/issues/2911)) - [instagram] update API headers - [kemonoparty] send `Referer` headers ([#2989](https://github.com/mikf/gallery-dl/issues/2989), 
[#2990](https://github.com/mikf/gallery-dl/issues/2990)) - [kemonoparty] restore `favorites` API endpoints ([#2994](https://github.com/mikf/gallery-dl/issues/2994)) - [myportfolio] use fallback when no images are found ([#2959](https://github.com/mikf/gallery-dl/issues/2959)) - [plurk] fix extraction ([#2977](https://github.com/mikf/gallery-dl/issues/2977)) - [sankaku] detect expired links ([#2958](https://github.com/mikf/gallery-dl/issues/2958)) - [tumblr] retry extraction of failed higher-resolution images ([#2957](https://github.com/mikf/gallery-dl/issues/2957)) ## 1.23.1 - 2022-09-18 ### Additions - [flickr] add support for `secure.flickr.com` URLs ([#2910](https://github.com/mikf/gallery-dl/issues/2910)) - [hotleak] add hotleak extractors ([#2890](https://github.com/mikf/gallery-dl/issues/2890), [#2909](https://github.com/mikf/gallery-dl/issues/2909)) - [instagram] add `highlight_title` and `date` metadata for highlight downloads ([#2879](https://github.com/mikf/gallery-dl/issues/2879)) - [paheal] add support for videos ([#2892](https://github.com/mikf/gallery-dl/issues/2892)) - [tumblr] fetch high-quality inline images ([#2877](https://github.com/mikf/gallery-dl/issues/2877)) - [tumblr] implement `ratelimit` option ([#2919](https://github.com/mikf/gallery-dl/issues/2919)) - [twitter] add general support for unified cards ([#2875](https://github.com/mikf/gallery-dl/issues/2875)) - [twitter] implement `cards-blacklist` option ([#2875](https://github.com/mikf/gallery-dl/issues/2875)) - [zerochan] add `metadata` option ([#2861](https://github.com/mikf/gallery-dl/issues/2861)) - [postprocessor:zip] implement `files` option ([#2872](https://github.com/mikf/gallery-dl/issues/2872)) ### Fixes - [bunkr] fix extraction ([#2903](https://github.com/mikf/gallery-dl/issues/2903)) - [bunkr] use `media-files` servers for `m4v` and `mov` downloads ([#2925](https://github.com/mikf/gallery-dl/issues/2925)) - [exhentai] improve 509.gif detection ([#2901](https://github.com/mikf/gallery-dl/issues/2901)) - [exhentai] guess extension for original files ([#2842](https://github.com/mikf/gallery-dl/issues/2842)) - [poipiku] use `img-org.poipiku.com` as image domain ([#2796](https://github.com/mikf/gallery-dl/issues/2796)) - [reddit] prevent exception with empty submission URLs ([#2913](https://github.com/mikf/gallery-dl/issues/2913)) - [redgifs] fix download URLs ([#2884](https://github.com/mikf/gallery-dl/issues/2884)) - [smugmug] update default API credentials ([#2881](https://github.com/mikf/gallery-dl/issues/2881)) - [twitter] provide proper `date` for syndication results ([#2920](https://github.com/mikf/gallery-dl/issues/2920)) - [twitter] fix new-style `/card_img/` URLs - remove all whitespace before comments after input file URLs ([#2808](https://github.com/mikf/gallery-dl/issues/2808)) ## 1.23.0 - 2022-08-28 ### Changes - [twitter] update `user` and `author` metdata fields - for URLs with a single username or ID like `https://twitter.com/USER` or a search with a single `from:` statement, `user` will now always refer to the user referenced in the URL. 
- for all other URLs like `https://twitter.com/i/bookmarks`, `user` and `author` refer to the same user - `author` will always refer to the original Tweet author - [twitter] update `quote_id` and `quote_by` metadata fields - `quote_id` is now non-zero for quoted Tweets and contains the Tweet ID of the quotng Tweet (was the other way round before) - `quote_by` is only defined for quoted Tweets like before, but now contains the screen name of the user quoting this Tweet - [skeb] improve archive IDs for thumbnails and article images ### Additions - [artstation] add `num` and `count` metadata fields ([#2764](https://github.com/mikf/gallery-dl/issues/2764)) - [catbox] add `album` extractor ([#2410](https://github.com/mikf/gallery-dl/issues/2410)) - [blogger] emit metadata for posts without files ([#2789](https://github.com/mikf/gallery-dl/issues/2789)) - [foolfuuka] update supported domains - [gelbooru] add support for `api_key` and `user_id` ([#2767](https://github.com/mikf/gallery-dl/issues/2767)) - [gelbooru] implement pagination for `pool` results ([#2853](https://github.com/mikf/gallery-dl/issues/2853)) - [instagram] add support for a user's saved collections ([#2769](https://github.com/mikf/gallery-dl/issues/2769)) - [instagram] provide `date` for directory format strings ([#2830](https://github.com/mikf/gallery-dl/issues/2830)) - [kemonoparty] add `favorites` option ([#2826](https://github.com/mikf/gallery-dl/issues/2826), [#2831](https://github.com/mikf/gallery-dl/issues/2831)) - [oauth] add `host` config option ([#2806](https://github.com/mikf/gallery-dl/issues/2806)) - [rule34] implement pagination for `pool` results ([#2853](https://github.com/mikf/gallery-dl/issues/2853)) - [skeb] add option to download `article` images ([#1031](https://github.com/mikf/gallery-dl/issues/1031)) - [tumblr] download higher-quality images ([#2761](https://github.com/mikf/gallery-dl/issues/2761)) - [tumblr] add `count` metadata field ([#2804](https://github.com/mikf/gallery-dl/issues/2804)) - [wallhaven] implement `metadata` option ([#2803](https://github.com/mikf/gallery-dl/issues/2803)) - [zerochan] add `tag` and `image` extractors ([#1434](https://github.com/mikf/gallery-dl/issues/1434)) - [zerochan] implement login with username & password ([#1434](https://github.com/mikf/gallery-dl/issues/1434)) - [postprocessor:metadata] implement `mode: modify` and `mode: delete` ([#2640](https://github.com/mikf/gallery-dl/issues/2640)) - [formatter] add `g` conversion for slugifying a string ([#2410](https://github.com/mikf/gallery-dl/issues/2410)) - [formatter] apply `:J` only to lists ([#2833](https://github.com/mikf/gallery-dl/issues/2833)) - implement `path-metadata` option ([#2734](https://github.com/mikf/gallery-dl/issues/2734)) - allow comments after input file URLs ([#2808](https://github.com/mikf/gallery-dl/issues/2808)) - add global `warnings` option to control `urllib3` warning behavior ([#2762](https://github.com/mikf/gallery-dl/issues/2762)) ### Fixes - [bunkr] fix extraction ([#2788](https://github.com/mikf/gallery-dl/issues/2788)) - [deviantart] use public access token for journals ([#2702](https://github.com/mikf/gallery-dl/issues/2702)) - [e621] fix extraction of `popular` posts - [fanbox] download cover images in original size ([#2784](https://github.com/mikf/gallery-dl/issues/2784)) - [mastodon] allow downloading without access token ([#2782](https://github.com/mikf/gallery-dl/issues/2782)) - [hitomi] update cache expiry time ([#2863](https://github.com/mikf/gallery-dl/issues/2863)) - [hitomi] 
fix error when number of tag results is a multiple of 25 ([#2870](https://github.com/mikf/gallery-dl/issues/2870)) - [mangahere] fix `page-reverse` option ([#2795](https://github.com/mikf/gallery-dl/issues/2795)) - [poipiku] fix posts with more than one image ([#2796](https://github.com/mikf/gallery-dl/issues/2796)) - [poipiku] update filter for static images ([#2796](https://github.com/mikf/gallery-dl/issues/2796)) - [slideshare] fix metadata extraction - [twitter] unescape `+` in search queries ([#2226](https://github.com/mikf/gallery-dl/issues/2226)) - [twitter] fall back to unfiltered search ([#2766](https://github.com/mikf/gallery-dl/issues/2766)) - [twitter] ignore invalid user entries ([#2850](https://github.com/mikf/gallery-dl/issues/2850)) - [vk] prevent exceptions for broken/invalid photos ([#2774](https://github.com/mikf/gallery-dl/issues/2774)) - [vsco] fix `collection` extraction - [weibo] prevent exception for missing `playback_list` ([#2792](https://github.com/mikf/gallery-dl/issues/2792)) - [weibo] prevent errors when paginating over album entries ([#2817](https://github.com/mikf/gallery-dl/issues/2817)) ## 1.22.4 - 2022-07-15 ### Additions - [instagram] add `pinned` metadata field ([#2752](https://github.com/mikf/gallery-dl/issues/2752)) - [itaku] categorize sections by group ([#1842](https://github.com/mikf/gallery-dl/issues/1842)) - [khinsider] extract `platform` metadata - [tumblr] support `/blog/view` URLs ([#2760](https://github.com/mikf/gallery-dl/issues/2760)) - [twitter] implement `strategy` option ([#2712](https://github.com/mikf/gallery-dl/issues/2712)) - [twitter] add `count` metadata field ([#2741](https://github.com/mikf/gallery-dl/issues/2741)) - [formatter] implement `O` format specifier ([#2736](https://github.com/mikf/gallery-dl/issues/2736)) - [postprocessor:mtime] add `value` option ([#2739](https://github.com/mikf/gallery-dl/issues/2739)) - add `--no-postprocessors` command-line option ([#2725](https://github.com/mikf/gallery-dl/issues/2725)) - implement `format-separator` option ([#2737](https://github.com/mikf/gallery-dl/issues/2737)) ### Changes - [pinterest] handle section pins with separate extractors ([#2684](https://github.com/mikf/gallery-dl/issues/2684)) - [postprocessor:ugoira] enable `mtime` by default ([#2714](https://github.com/mikf/gallery-dl/issues/2714)) ### Fixes - [bunkr] fix extraction ([#2732](https://github.com/mikf/gallery-dl/issues/2732)) - [hentaifoundry] fix metadata extraction - [itaku] fix user caching ([#1842](https://github.com/mikf/gallery-dl/issues/1842)) - [itaku] fix `date` parsing - [kemonoparty] ensure all files have an `extension` ([#2740](https://github.com/mikf/gallery-dl/issues/2740)) - [komikcast] update domain - [mangakakalot] update domain - [newgrounds] only attempt to login if necessary ([#2715](https://github.com/mikf/gallery-dl/issues/2715)) - [newgrounds] prevent exception on empty results ([#2727](https://github.com/mikf/gallery-dl/issues/2727)) - [nozomi] reduce memory consumption during searches ([#2754](https://github.com/mikf/gallery-dl/issues/2754)) - [pixiv] fix default `background` filenames - [sankaku] rewrite file URLs to s.sankakucomplex.com ([#2746](https://github.com/mikf/gallery-dl/issues/2746)) - [slideshare] fix `description` extraction - [twitter] ignore previously seen Tweets ([#2712](https://github.com/mikf/gallery-dl/issues/2712)) - [twitter] unescape HTML entities in `content` ([#2757](https://github.com/mikf/gallery-dl/issues/2757)) - [weibo] handle invalid or broken status objects - 
[postprocessor:zip] ensure target directory exists ([#2758](https://github.com/mikf/gallery-dl/issues/2758)) - make `brotli` an *optional* dependency ([#2716](https://github.com/mikf/gallery-dl/issues/2716)) - limit path length for `--write-pages` output on Windows ([#2733](https://github.com/mikf/gallery-dl/issues/2733)) ### Removals - [foolfuuka] remove archive.wakarimasen.moe ## 1.22.3 - 2022-06-28 ### Changes - [twitter] revert strategy changes for user URLs ([#2712](https://github.com/mikf/gallery-dl/issues/2712), [#2710](https://github.com/mikf/gallery-dl/issues/2710)) - update default User-Agent headers ## 1.22.2 - 2022-06-27 ### Additions - [cyberdrop] add fallback URLs ([#2668](https://github.com/mikf/gallery-dl/issues/2668)) - [horne] add support for horne.red ([#2700](https://github.com/mikf/gallery-dl/issues/2700)) - [itaku] add `gallery` and `image` extractors ([#1842](https://github.com/mikf/gallery-dl/issues/1842)) - [poipiku] add `user` and `post` extractors ([#1602](https://github.com/mikf/gallery-dl/issues/1602)) - [skeb] add `following` extractor ([#2698](https://github.com/mikf/gallery-dl/issues/2698)) - [twitter] implement `expand` option ([#2665](https://github.com/mikf/gallery-dl/issues/2665)) - [twitter] implement `csrf` option ([#2676](https://github.com/mikf/gallery-dl/issues/2676)) - [unsplash] add `collection_title` and `collection_id` metadata fields ([#2670](https://github.com/mikf/gallery-dl/issues/2670)) - [weibo] support `tabtype=video` listings ([#2601](https://github.com/mikf/gallery-dl/issues/2601)) - [formatter] implement slice operator as format specifier - support cygwin/BSD/etc for `--cookies-from-browser` ### Fixes - [instagram] improve metadata generated by `_parse_post_api()` ([#2695](https://github.com/mikf/gallery-dl/issues/2695), [#2660](https://github.com/mikf/gallery-dl/issues/2660)) - [instagram} fix `tag` extractor ([#2659](https://github.com/mikf/gallery-dl/issues/2659)) - [instagram] automatically invalidate expired login sessions - [twitter] fix pagination for conversion tweets - [twitter] improve `"replies": "self"` ([#2665](https://github.com/mikf/gallery-dl/issues/2665)) - [twitter] improve strategy for user URLs ([#2665](https://github.com/mikf/gallery-dl/issues/2665)) - [vk] take URLs from `*_src` entries ([#2535](https://github.com/mikf/gallery-dl/issues/2535)) - [weibo] fix URLs generated by `user` extractor ([#2601](https://github.com/mikf/gallery-dl/issues/2601)) - [weibo] fix retweets ([#2601](https://github.com/mikf/gallery-dl/issues/2601)) - [downloader:ytdl] update `_set_outtmpl()` ([#2692](https://github.com/mikf/gallery-dl/issues/2692)) - [formatter] fix `!j` conversion for non-serializable types ([#2624](https://github.com/mikf/gallery-dl/issues/2624)) - [snap] Fix missing libslang dependency ([#2655](https://github.com/mikf/gallery-dl/issues/2655)) ## 1.22.1 - 2022-06-04 ### Additions - [gfycat] add support for collections ([#2629](https://github.com/mikf/gallery-dl/issues/2629)) - [instagram] support specifying users by ID - [paheal] extract more metadata ([#2641](https://github.com/mikf/gallery-dl/issues/2641)) - [reddit] add `home` extractor ([#2614](https://github.com/mikf/gallery-dl/issues/2614)) - [weibo] support usernames in URLs ([#1662](https://github.com/mikf/gallery-dl/issues/1662)) - [weibo] support `livephoto` and `gif` files ([#2146](https://github.com/mikf/gallery-dl/issues/2146)) - [weibo] add support for several different `tabtype` listings ([#686](https://github.com/mikf/gallery-dl/issues/686), 
[#2601](https://github.com/mikf/gallery-dl/issues/2601)) - [postprocessor:metadata] write to stdout by setting filename to "-" ([#2624](https://github.com/mikf/gallery-dl/issues/2624)) - implement `output.ansi` option ([#2628](https://github.com/mikf/gallery-dl/issues/2628)) - support user-defined `output.mode` settings ([#2529](https://github.com/mikf/gallery-dl/issues/2529)) ### Changes - [readcomiconline] remove default `browser` setting ([#2625](https://github.com/mikf/gallery-dl/issues/2625)) - [weibo] switch to desktop API ([#2601](https://github.com/mikf/gallery-dl/issues/2601)) - fix command-line argument name of `--cookies-from-browser` ([#1606](https://github.com/mikf/gallery-dl/issues/1606), [#2630](https://github.com/mikf/gallery-dl/issues/2630)) ### Fixes - [bunkr] change domain to `app.bunkr.is` ([#2634](https://github.com/mikf/gallery-dl/issues/2634)) - [deviantart] fix folder listings with `"pagination": "manual"` ([#2488](https://github.com/mikf/gallery-dl/issues/2488)) - [gofile] fix 401 Unauthorized errors ([#2632](https://github.com/mikf/gallery-dl/issues/2632)) - [hypnohub] move to gelbooru_v02 instances ([#2631](https://github.com/mikf/gallery-dl/issues/2631)) - [instagram] fix and update extractors ([#2644](https://github.com/mikf/gallery-dl/issues/2644)) - [nozomi] remove slashes from search terms ([#2653](https://github.com/mikf/gallery-dl/issues/2653)) - [pixiv] include `.gif` in background fallback URLs ([#2495](https://github.com/mikf/gallery-dl/issues/2495)) - [sankaku] extend URL patterns ([#2647](https://github.com/mikf/gallery-dl/issues/2647)) - [subscribestar] fix `date` metadata ([#2642](https://github.com/mikf/gallery-dl/issues/2642)) ## 1.22.0 - 2022-05-25 ### Additions - [gelbooru_v01] add `favorite` extractor ([#2546](https://github.com/mikf/gallery-dl/issues/2546)) - [Instagram] add `tagged_users` to keywords for stories ([#2582](https://github.com/mikf/gallery-dl/issues/2582), [#2584](https://github.com/mikf/gallery-dl/issues/2584)) - [lolisafe] implement `domain` option ([#2575](https://github.com/mikf/gallery-dl/issues/2575)) - [naverwebtoon] support (best)challenge comics ([#2542](https://github.com/mikf/gallery-dl/issues/2542)) - [nijie] support /history_nuita.php listings ([#2541](https://github.com/mikf/gallery-dl/issues/2541)) - [pixiv] provide more data when `metadata` is enabled ([#2594](https://github.com/mikf/gallery-dl/issues/2594)) - [shopify] support several more sites by default ([#2089](https://github.com/mikf/gallery-dl/issues/2089)) - [twitter] extract alt texts as `description` ([#2617](https://github.com/mikf/gallery-dl/issues/2617)) - [twitter] recognize vxtwitter URLs ([#2621](https://github.com/mikf/gallery-dl/issues/2621)) - [weasyl] implement `metadata` option ([#2610](https://github.com/mikf/gallery-dl/issues/2610)) - implement `--cookies-from-browser` ([#1606](https://github.com/mikf/gallery-dl/issues/1606)) - implement `output.colors` options ([#2532](https://github.com/mikf/gallery-dl/issues/2532)) - implement string literals in replacement fields - support using extended format strings for archive keys ### Changes - [foolfuuka] match 4chan filenames ([#2577](https://github.com/mikf/gallery-dl/issues/2577)) - [pixiv] implement `include` option - provide `avatar`/`background` downloads as separate extractors ([#2495](https://github.com/mikf/gallery-dl/issues/2495)) - [twitter] use a better strategy for user URLs - [twitter] disable `cards` by default - delay directory creation 
([#2461](https://github.com/mikf/gallery-dl/issues/2461), [#2474](https://github.com/mikf/gallery-dl/issues/2474)) - flush writes to stdout/stderr ([#2529](https://github.com/mikf/gallery-dl/issues/2529)) - build executables on GitHub Actions with Python 3.10 ### Fixes - [artstation] use `"browser": "firefox"` by default ([#2527](https://github.com/mikf/gallery-dl/issues/2527)) - [imgur] prevent exception with empty albums ([#2557](https://github.com/mikf/gallery-dl/issues/2557)) - [instagram] report redirects to captcha challenges ([#2543](https://github.com/mikf/gallery-dl/issues/2543)) - [khinsider] fix metadata extraction ([#2611](https://github.com/mikf/gallery-dl/issues/2611)) - [mangafox] send Referer headers ([#2592](https://github.com/mikf/gallery-dl/issues/2592)) - [mangahere] send Referer headers ([#2592](https://github.com/mikf/gallery-dl/issues/2592)) - [mangasee] use randomly generated PHPSESSID cookie ([#2560](https://github.com/mikf/gallery-dl/issues/2560)) - [pixiv] make retrieving ugoira metadata non-fatal ([#2562](https://github.com/mikf/gallery-dl/issues/2562)) - [readcomiconline] update deobfuscation code ([#2481](https://github.com/mikf/gallery-dl/issues/2481)) - [realbooru] fix extraction ([#2530](https://github.com/mikf/gallery-dl/issues/2530)) - [vk] handle photos without width/height info ([#2535](https://github.com/mikf/gallery-dl/issues/2535)) - [vk] fix user ID extraction ([#2535](https://github.com/mikf/gallery-dl/issues/2535)) - [webtoons] extract real episode numbers ([#2591](https://github.com/mikf/gallery-dl/issues/2591)) - create missing directories for archive files ([#2597](https://github.com/mikf/gallery-dl/issues/2597)) - detect circular references with `-K` ([#2609](https://github.com/mikf/gallery-dl/issues/2609)) - replace "\f" in `--filename` arguments with a form feed character ([#2396](https://github.com/mikf/gallery-dl/issues/2396)) ### Removals - [gelbooru_v01] remove tlb.booru.org from supported domains ## 1.21.2 - 2022-04-27 ### Additions - [deviantart] implement `pagination` option ([#2488](https://github.com/mikf/gallery-dl/issues/2488)) - [pixiv] implement `background` option ([#623](https://github.com/mikf/gallery-dl/issues/623), [#1124](https://github.com/mikf/gallery-dl/issues/1124), [#2495](https://github.com/mikf/gallery-dl/issues/2495)) - [postprocessor:ugoira] report ffmpeg/mkvmerge errors ([#2487](https://github.com/mikf/gallery-dl/issues/2487)) ### Fixes - [cyberdrop] match cyberdrop.to URLs ([#2496](https://github.com/mikf/gallery-dl/issues/2496)) - [e621] fix 403 errors ([#2533](https://github.com/mikf/gallery-dl/issues/2533)) - [issuu] fix extraction ([#2483](https://github.com/mikf/gallery-dl/issues/2483)) - [mangadex] download from available chapters despite `externalUrl` ([#2503](https://github.com/mikf/gallery-dl/issues/2503)) - [photovogue] update domain and api endpoint ([#2494](https://github.com/mikf/gallery-dl/issues/2494)) - [sexcom] add fallback for empty files ([#2485](https://github.com/mikf/gallery-dl/issues/2485)) - [twitter] improve syndication video selection ([#2354](https://github.com/mikf/gallery-dl/issues/2354)) - [twitter] fix various syndication issues ([#2499](https://github.com/mikf/gallery-dl/issues/2499), [#2354](https://github.com/mikf/gallery-dl/issues/2354)) - [vk] fix extraction ([#2512](https://github.com/mikf/gallery-dl/issues/2512)) - [weibo] fix infinite retries for deleted accounts ([#2521](https://github.com/mikf/gallery-dl/issues/2521)) - [postprocessor:ugoira] use compatible paths with 
mkvmerge ([#2487](https://github.com/mikf/gallery-dl/issues/2487)) - [postprocessor:ugoira] do not auto-select the `image2` demuxer ([#2492](https://github.com/mikf/gallery-dl/issues/2492)) ## 1.21.1 - 2022-04-08 ### Additions - [gofile] add gofile.io extractor ([#2364](https://github.com/mikf/gallery-dl/issues/2364)) - [instagram] add `previews` option ([#2135](https://github.com/mikf/gallery-dl/issues/2135)) - [kemonoparty] add `duplicates` option ([#2440](https://github.com/mikf/gallery-dl/issues/2440)) - [pinterest] add extractor for created pins ([#2452](https://github.com/mikf/gallery-dl/issues/2452)) - [pinterest] support multiple files per pin ([#1619](https://github.com/mikf/gallery-dl/issues/1619), [#2452](https://github.com/mikf/gallery-dl/issues/2452)) - [telegraph] Add telegra.ph extractor ([#2312](https://github.com/mikf/gallery-dl/issues/2312)) - [twitter] add `syndication` option ([#2354](https://github.com/mikf/gallery-dl/issues/2354)) - [twitter] accept fxtwitter.com URLs ([#2484](https://github.com/mikf/gallery-dl/issues/2484)) - [downloader:http] support using an arbitrary method and sending POST data ([#2433](https://github.com/mikf/gallery-dl/issues/2433)) - [postprocessor:metadata] implement archive options ([#2421](https://github.com/mikf/gallery-dl/issues/2421)) - [postprocessor:ugoira] add `mtime` option ([#2307](https://github.com/mikf/gallery-dl/issues/2307)) - [postprocessor:ugoira] support setting timecodes with `mkvmerge` ([#1550](https://github.com/mikf/gallery-dl/issues/1550)) - [formatter] support evaluating f-string literals - add `--ugoira-conv-copy` command-line option ([#1550](https://github.com/mikf/gallery-dl/issues/1550)) - implement a `contains()` function for filter statements ([#2446](https://github.com/mikf/gallery-dl/issues/2446)) ### Fixes - [aryion] provide correct `date` metadata independent of DST - [furaffinity] fix search result pagination ([#2402](https://github.com/mikf/gallery-dl/issues/2402)) - [hitomi] update and fix metadata extraction ([#2444](https://github.com/mikf/gallery-dl/issues/2444)) - [kissgoddess] extract all images ([#2473](https://github.com/mikf/gallery-dl/issues/2473)) - [mangasee] unescape manga names ([#2454](https://github.com/mikf/gallery-dl/issues/2454)) - [newgrounds] update and fix pagination ([#2456](https://github.com/mikf/gallery-dl/issues/2456)) - [newgrounds] warn about age-restricted posts ([#2456](https://github.com/mikf/gallery-dl/issues/2456)) - [pinterest] do not force `m3u8_native` for video downloads ([#2436](https://github.com/mikf/gallery-dl/issues/2436)) - [twibooru] fix posts without `name` ([#2434](https://github.com/mikf/gallery-dl/issues/2434)) - [unsplash] replace dash with space in search API queries ([#2429](https://github.com/mikf/gallery-dl/issues/2429)) - [postprocessor:mtime] fix timestamps from datetime objects ([#2307](https://github.com/mikf/gallery-dl/issues/2307)) - fix yet another bug in `_check_cookies()` ([#2372](https://github.com/mikf/gallery-dl/issues/2372)) - fix loading/storing cookies without domain ## 1.21.0 - 2022-03-14 ### Additions - [fantia] add `num` enumeration index ([#2377](https://github.com/mikf/gallery-dl/issues/2377)) - [fantia] support "Blog Post" content ([#2381](https://github.com/mikf/gallery-dl/issues/2381)) - [imagebam] add support for /view/ paths ([#2378](https://github.com/mikf/gallery-dl/issues/2378)) - [kemonoparty] match beta.kemono.party URLs ([#2348](https://github.com/mikf/gallery-dl/issues/2348)) - [kissgoddess] add `gallery` and `model` 
extractors ([#1052](https://github.com/mikf/gallery-dl/issues/1052), [#2304](https://github.com/mikf/gallery-dl/issues/2304)) - [mememuseum] add `tag` and `post` extractors ([#2264](https://github.com/mikf/gallery-dl/issues/2264)) - [newgrounds] add `post_url` metadata field ([#2328](https://github.com/mikf/gallery-dl/issues/2328)) - [patreon] add `image_large` file type ([#2257](https://github.com/mikf/gallery-dl/issues/2257)) - [toyhouse] support `art` listings ([#1546](https://github.com/mikf/gallery-dl/issues/1546), [#2331](https://github.com/mikf/gallery-dl/issues/2331)) - [twibooru] add extractors for searches, galleries, and posts ([#2219](https://github.com/mikf/gallery-dl/issues/2219)) - [postprocessor:metadata] implement `mtime` option ([#2307](https://github.com/mikf/gallery-dl/issues/2307)) - [postprocessor:mtime] add `event` option ([#2307](https://github.com/mikf/gallery-dl/issues/2307)) - add fish shell completion ([#2363](https://github.com/mikf/gallery-dl/issues/2363)) - add `timedelta` class to global namespace in filter expressions ### Changes - [seiga] require authentication with `user_session` cookie ([#2372](https://github.com/mikf/gallery-dl/issues/2372)) - remove username & password login due to 2FA - refactor proxy support ([#2357](https://github.com/mikf/gallery-dl/issues/2357)) - allow gallery-dl proxy settings to overwrite environment proxies - allow specifying different proxies for data extraction and download ### Fixes - [bunkr] fix mp4 downloads ([#2239](https://github.com/mikf/gallery-dl/issues/2239)) - [fanbox] fetch data for each individual post ([#2388](https://github.com/mikf/gallery-dl/issues/2388)) - [hentaicosplays] send `Referer` header ([#2317](https://github.com/mikf/gallery-dl/issues/2317)) - [imagebam] set `nsfw_inter` cookie ([#2334](https://github.com/mikf/gallery-dl/issues/2334)) - [kemonoparty] limit default filename length ([#2373](https://github.com/mikf/gallery-dl/issues/2373)) - [mangadex] fix chapters without `translatedLanguage` ([#2352](https://github.com/mikf/gallery-dl/issues/2352)) - [newgrounds] fix video descriptions ([#2328](https://github.com/mikf/gallery-dl/issues/2328)) - [skeb] add `sent-requests` option ([#2322](https://github.com/mikf/gallery-dl/issues/2322), [#2330](https://github.com/mikf/gallery-dl/issues/2330)) - [slideshare] fix extraction - [subscribestar] unescape attachment URLs ([#2370](https://github.com/mikf/gallery-dl/issues/2370)) - [twitter] fix handling of 429 Too Many Requests responses ([#2339](https://github.com/mikf/gallery-dl/issues/2339)) - [twitter] warn about age-restricted Tweets ([#2354](https://github.com/mikf/gallery-dl/issues/2354)) - [twitter] handle Tweets with "softIntervention" entries - [twitter] update query hashes - fix another bug in `_check_cookies()` ([#2160](https://github.com/mikf/gallery-dl/issues/2160)) ## 1.20.5 - 2022-02-14 ### Additions - [furaffinity] add `layout` option ([#2277](https://github.com/mikf/gallery-dl/issues/2277)) - [lightroom] add Lightroom gallery extractor ([#2263](https://github.com/mikf/gallery-dl/issues/2263)) - [reddit] support standalone submissions on personal user pages ([#2301](https://github.com/mikf/gallery-dl/issues/2301)) - [redgifs] support i.redgifs.com URLs ([#2300](https://github.com/mikf/gallery-dl/issues/2300)) - [wallpapercave] add extractor for images and search results ([#2205](https://github.com/mikf/gallery-dl/issues/2205)) - add `signals-ignore` option ([#2296](https://github.com/mikf/gallery-dl/issues/2296)) ### Changes - [danbooru] 
merge `danbooru` and `e621` extractors - support `atfbooru` ([#2283](https://github.com/mikf/gallery-dl/issues/2283)) - remove support for old e621 tag search URLs ### Fixes - [furaffinity] improve new/old layout detection ([#2277](https://github.com/mikf/gallery-dl/issues/2277)) - [imgbox] fix ImgboxExtractor ([#2281](https://github.com/mikf/gallery-dl/issues/2281)) - [inkbunny] rename search parameters to their API equivalents - [kemonoparty] handle files without names ([#2276](https://github.com/mikf/gallery-dl/issues/2276)) - [twitter] fix extraction ([#2275](https://github.com/mikf/gallery-dl/issues/2275), [#2295](https://github.com/mikf/gallery-dl/issues/2295)) - [vk] fix infinite pagination loops ([#2297](https://github.com/mikf/gallery-dl/issues/2297)) - [downloader:ytdl] make `ImportError`s non-fatal ([#2273](https://github.com/mikf/gallery-dl/issues/2273)) ## 1.20.4 - 2022-02-06 ### Additions - [e621] add `favorite` extractor ([#2250](https://github.com/mikf/gallery-dl/issues/2250)) - [hitomi] add `format` option ([#2260](https://github.com/mikf/gallery-dl/issues/2260)) - [kohlchan] add Kohlchan extractors ([#2251](https://github.com/mikf/gallery-dl/issues/2251)) - [sexcom] add `pins` extractor ([#2265](https://github.com/mikf/gallery-dl/issues/2265)) - [twitter] add `warnings` option ([#2258](https://github.com/mikf/gallery-dl/issues/2258)) - add ability to disable TLS 1.2 ([#2243](https://github.com/mikf/gallery-dl/issues/2243)) - add examples for custom gelbooru instances ([#2262](https://github.com/mikf/gallery-dl/issues/2262)) ### Fixes - [bunkr] fix mp4 downloads ([#2239](https://github.com/mikf/gallery-dl/issues/2239)) - [gelbooru] improve and fix pagination ([#2230](https://github.com/mikf/gallery-dl/issues/2230), [#2232](https://github.com/mikf/gallery-dl/issues/2232)) - [hitomi] "fix" 403 errors ([#2260](https://github.com/mikf/gallery-dl/issues/2260)) - [kemonoparty] fix downloading smaller text files ([#2267](https://github.com/mikf/gallery-dl/issues/2267)) - [patreon] disable TLS 1.2 by default ([#2249](https://github.com/mikf/gallery-dl/issues/2249)) - [twitter] restore errors for protected timelines etc ([#2237](https://github.com/mikf/gallery-dl/issues/2237)) - [twitter] restore `logout` functionality ([#1719](https://github.com/mikf/gallery-dl/issues/1719)) - [twitter] provide fallback URLs for card images - [weibo] update pagination code ([#2244](https://github.com/mikf/gallery-dl/issues/2244)) ## 1.20.3 - 2022-01-26 ### Fixes - [kemonoparty] fix DMs extraction ([#2008](https://github.com/mikf/gallery-dl/issues/2008)) - [twitter] fix crash on Tweets with deleted quotes ([#2225](https://github.com/mikf/gallery-dl/issues/2225)) - [twitter] fix crash on suspended Tweets without `legacy` entry ([#2216](https://github.com/mikf/gallery-dl/issues/2216)) - [twitter] fix crash on unified cards without `type` - [twitter] prevent crash on invalid/deleted Retweets ([#2225](https://github.com/mikf/gallery-dl/issues/2225)) - [twitter] update query hashes ## 1.20.2 - 2022-01-24 ### Additions - [twitter] add `event` extractor (closes [#2109](https://github.com/mikf/gallery-dl/issues/2109)) - [twitter] support image_carousel_website unified cards - add `--source-address` command-line option ([#2206](https://github.com/mikf/gallery-dl/issues/2206)) - add environment variable syntax to formatting.md ([#2065](https://github.com/mikf/gallery-dl/issues/2065)) ### Changes - [twitter] changes to `cards` option - enable `cards` by default - require `cards` to be set to `"ytdl"` to 
invoke youtube-dl/yt-dlp on unsupported cards ### Fixes - [blogger] support new image domain ([#2204](https://github.com/mikf/gallery-dl/issues/2204)) - [gelbooru] improve video file detection ([#2188](https://github.com/mikf/gallery-dl/issues/2188)) - [hitomi] fix `tag` extraction ([#2189](https://github.com/mikf/gallery-dl/issues/2189)) - [instagram] fix highlights extraction ([#2197](https://github.com/mikf/gallery-dl/issues/2197)) - [mangadex] re-enable warning for external chapters ([#2193](https://github.com/mikf/gallery-dl/issues/2193)) - [newgrounds] set suitabilities filter before starting a search ([#2173](https://github.com/mikf/gallery-dl/issues/2173)) - [philomena] fix search parameter escaping ([#2215](https://github.com/mikf/gallery-dl/issues/2215)) - [reddit] allow downloading from quarantined subreddits ([#2180](https://github.com/mikf/gallery-dl/issues/2180)) - [sexcom] extend URL pattern ([#2220](https://github.com/mikf/gallery-dl/issues/2220)) - [twitter] update to GraphQL API ([#2212](https://github.com/mikf/gallery-dl/issues/2212)) ## 1.20.1 - 2022-01-08 ### Additions - [newgrounds] add `search` extractor ([#2161](https://github.com/mikf/gallery-dl/issues/2161)) ### Changes - restore `-d/--dest` functionality from before 1.20.0 ([#2148](https://github.com/mikf/gallery-dl/issues/2148)) - change short option for `--directory` to `-D` ### Fixes - [gelbooru] handle changed API response format ([#2157](https://github.com/mikf/gallery-dl/issues/2157)) - [hitomi] fix image URLs ([#2153](https://github.com/mikf/gallery-dl/issues/2153)) - [mangadex] fix extraction ([#2177](https://github.com/mikf/gallery-dl/issues/2177)) - [rule34] use `https://api.rule34.xxx` for API requests - fix cookie checks for patreon, fanbox, fantia - improve UNC path handling ([#2126](https://github.com/mikf/gallery-dl/issues/2126)) ## 1.20.0 - 2021-12-29 ### Additions - [500px] add `favorite` extractor ([#1927](https://github.com/mikf/gallery-dl/issues/1927)) - [exhentai] add `source` option - [fanbox] support pixiv redirects ([#2122](https://github.com/mikf/gallery-dl/issues/2122)) - [inkbunny] add `search` extractor ([#2094](https://github.com/mikf/gallery-dl/issues/2094)) - [kemonoparty] support coomer.party ([#2100](https://github.com/mikf/gallery-dl/issues/2100)) - [lolisafe] add generic album extractor for lolisafe/chibisafe instances ([#2038](https://github.com/mikf/gallery-dl/issues/2038), [#2105](https://github.com/mikf/gallery-dl/issues/2105)) - [rule34us] add `tag` and `post` extractors ([#1527](https://github.com/mikf/gallery-dl/issues/1527)) - add a generic extractor ([#735](https://github.com/mikf/gallery-dl/issues/735), [#683](https://github.com/mikf/gallery-dl/issues/683)) - add `-d/--directory` and `-f/--filename` command-line options - add `--sleep-request` and `--sleep-extractor` command-line options - allow specifying `sleep-*` options as string ### Changes - [cyberdrop] include file ID in default filenames - [hitomi] disable `metadata` by default - [kemonoparty] use `service` as subcategory ([#2147](https://github.com/mikf/gallery-dl/issues/2147)) - [kemonoparty] change default `files` order to `attachments,file,inline` ([#1991](https://github.com/mikf/gallery-dl/issues/1991)) - [output] write download progress indicator to stderr - [ytdl] prefer yt-dlp over youtube-dl ([#1850](https://github.com/mikf/gallery-dl/issues/1850), [#2028](https://github.com/mikf/gallery-dl/issues/2028)) - rename `--write-infojson` to `--write-info-json` ### Fixes - [500px] create directories per photo 
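A minimal sketch of how the `sleep-*` settings added in 1.20.0 above (string values plus the new `--sleep-request`/`--sleep-extractor` command-line options, and the minimum/maximum form introduced in 1.19.0) might look in a config file; the values and range forms shown here are illustrative only, and the exact accepted syntax is described in the configuration documentation:

```json
{
    "extractor": {
        "sleep-request": "1.0-2.0",
        "instagram": {
            "sleep-request": [6.0, 12.0]
        }
    }
}
```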
- [artstation] create directories per asset ([#2136](https://github.com/mikf/gallery-dl/issues/2136)) - [deviantart] use `/browse/newest` for most-recent searches ([#2096](https://github.com/mikf/gallery-dl/issues/2096)) - [hitomi] fix image URLs - [instagram] fix error when PostPage data is not in GraphQL format ([#2037](https://github.com/mikf/gallery-dl/issues/2037)) - [instagram] match post URLs with usernames ([#2085](https://github.com/mikf/gallery-dl/issues/2085)) - [instagram] allow downloading specific stories ([#2088](https://github.com/mikf/gallery-dl/issues/2088)) - [furaffinity] warn when no session cookies were found - [pixiv] respect date ranges in search URLs ([#2133](https://github.com/mikf/gallery-dl/issues/2133)) - [sexcom] fix and improve embed extraction ([#2145](https://github.com/mikf/gallery-dl/issues/2145)) - [tumblrgallery] fix extraction ([#2112](https://github.com/mikf/gallery-dl/issues/2112)) - [tumblrgallery] improve `id` extraction ([#2115](https://github.com/mikf/gallery-dl/issues/2115)) - [tumblrgallery] improve search pagination ([#2132](https://github.com/mikf/gallery-dl/issues/2132)) - [twitter] include `4096x4096` as a default image fallback ([#1881](https://github.com/mikf/gallery-dl/issues/1881), [#2107](https://github.com/mikf/gallery-dl/issues/2107)) - [ytdl] update argument parsing to latest yt-dlp changes ([#2124](https://github.com/mikf/gallery-dl/issues/2124)) - handle UNC paths ([#2113](https://github.com/mikf/gallery-dl/issues/2113)) ## 1.19.3 - 2021-11-27 ### Additions - [dynastyscans] add `manga` extractor ([#2035](https://github.com/mikf/gallery-dl/issues/2035)) - [instagram] include user metadata for `tagged` downloads ([#2024](https://github.com/mikf/gallery-dl/issues/2024)) - [kemonoparty] implement `files` option ([#1991](https://github.com/mikf/gallery-dl/issues/1991)) - [kemonoparty] add `dms` option ([#2008](https://github.com/mikf/gallery-dl/issues/2008)) - [mangadex] always provide `artist`, `author`, and `group` metadata fields ([#2049](https://github.com/mikf/gallery-dl/issues/2049)) - [philomena] support furbooru.org ([#1995](https://github.com/mikf/gallery-dl/issues/1995)) - [reactor] support thatpervert.com ([#2029](https://github.com/mikf/gallery-dl/issues/2029)) - [shopify] support loungeunderwear.com ([#2053](https://github.com/mikf/gallery-dl/issues/2053)) - [skeb] add `thumbnails` option ([#2047](https://github.com/mikf/gallery-dl/issues/2047), [#2051](https://github.com/mikf/gallery-dl/issues/2051)) - [subscribestar] add `num` enumeration index ([#2040](https://github.com/mikf/gallery-dl/issues/2040)) - [subscribestar] emit metadata for posts without media ([#1569](https://github.com/mikf/gallery-dl/issues/1569)) - [ytdl] implement `cmdline-args` and `config-file` options to allow parsing ytdl command-line options ([#1680](https://github.com/mikf/gallery-dl/issues/1680)) - [formatter] implement `D` format specifier - extend `blacklist`/`whitelist` syntax ([#2025](https://github.com/mikf/gallery-dl/issues/2025)) ### Fixes - [dynastyscans] provide `date` as datetime object ([#2050](https://github.com/mikf/gallery-dl/issues/2050)) - [exhentai] fix extraction for disowned galleries ([#2055](https://github.com/mikf/gallery-dl/issues/2055)) - [gelbooru] apply workaround for pagination limits - [kemonoparty] skip duplicate files ([#2032](https://github.com/mikf/gallery-dl/issues/2032), [#1991](https://github.com/mikf/gallery-dl/issues/1991), [#1899](https://github.com/mikf/gallery-dl/issues/1899)) - [kemonoparty] provide `date`
metadata for gumroad ([#2007](https://github.com/mikf/gallery-dl/issues/2007)) - [mangoxo] fix metadata extraction - [twitter] distinguish between fatal & nonfatal errors ([#2020](https://github.com/mikf/gallery-dl/issues/2020)) - [twitter] fix extractor for direct image links ([#2030](https://github.com/mikf/gallery-dl/issues/2030)) - [webtoons] use download URLs that do not require a `Referer` header ([#2005](https://github.com/mikf/gallery-dl/issues/2005)) - [ytdl] improve error handling ([#1680](https://github.com/mikf/gallery-dl/issues/1680)) - [downloader:ytdl] prevent crash in `_progress_hook()` ([#1680](https://github.com/mikf/gallery-dl/issues/1680)) ### Removals - [seisoparty] remove module ## 1.19.2 - 2021-11-05 ### Additions - [kemonoparty] add `comments` option ([#1980](https://github.com/mikf/gallery-dl/issues/1980)) - [skeb] add `user` and `post` extractors ([#1031](https://github.com/mikf/gallery-dl/issues/1031), [#1971](https://github.com/mikf/gallery-dl/issues/1971)) - [twitter] add `pinned` option - support accessing environment variables and the current local datetime in format strings ([#1968](https://github.com/mikf/gallery-dl/issues/1968)) - add special type format strings to docs ([#1987](https://github.com/mikf/gallery-dl/issues/1987)) ### Fixes - [cyberdrop] fix video extraction ([#1993](https://github.com/mikf/gallery-dl/issues/1993)) - [deviantart] fix `index` values for stashed deviations - [gfycat] provide consistent `userName` values for `user` downloads ([#1962](https://github.com/mikf/gallery-dl/issues/1962)) - [gfycat] show warning when there are no available formats - [hitomi] fix image URLs ([#1975](https://github.com/mikf/gallery-dl/issues/1975), [#1982](https://github.com/mikf/gallery-dl/issues/1982), [#1988](https://github.com/mikf/gallery-dl/issues/1988)) - [instagram] update query hashes - [mangakakalot] update domain and fix extraction - [mangoxo] fix login and extraction - [reddit] prevent crash for galleries with no `media_metadata` ([#2001](https://github.com/mikf/gallery-dl/issues/2001)) - [redgifs] update to API v2 ([#1984](https://github.com/mikf/gallery-dl/issues/1984)) - fix calculating retry sleep times ([#1990](https://github.com/mikf/gallery-dl/issues/1990)) ## 1.19.1 - 2021-10-24 ### Additions - [inkbunny] add `following` extractor ([#515](https://github.com/mikf/gallery-dl/issues/515)) - [inkbunny] add `pool` extractor ([#1937](https://github.com/mikf/gallery-dl/issues/1937)) - [kemonoparty] add `discord` extractor ([#1827](https://github.com/mikf/gallery-dl/issues/1827), [#1940](https://github.com/mikf/gallery-dl/issues/1940)) - [nhentai] add `tag` extractor ([#1950](https://github.com/mikf/gallery-dl/issues/1950), [#1955](https://github.com/mikf/gallery-dl/issues/1955)) - [patreon] add `files` option ([#1935](https://github.com/mikf/gallery-dl/issues/1935)) - [picarto] add `gallery` extractor ([#1931](https://github.com/mikf/gallery-dl/issues/1931)) - [pixiv] add `sketch` extractor ([#1497](https://github.com/mikf/gallery-dl/issues/1497)) - [seisoparty] add `favorite` extractor ([#1906](https://github.com/mikf/gallery-dl/issues/1906)) - [twitter] add `size` option ([#1881](https://github.com/mikf/gallery-dl/issues/1881)) - [vk] add `album` extractor ([#474](https://github.com/mikf/gallery-dl/issues/474), [#1952](https://github.com/mikf/gallery-dl/issues/1952)) - [postprocessor:compare] add `equal` option ([#1592](https://github.com/mikf/gallery-dl/issues/1592)) ### Fixes - [cyberdrop] extract direct download URLs 
([#1943](https://github.com/mikf/gallery-dl/issues/1943)) - [deviantart] update `search` argument handling ([#1911](https://github.com/mikf/gallery-dl/issues/1911)) - [deviantart] full resolution for non-downloadable images ([#293](https://github.com/mikf/gallery-dl/issues/293)) - [furaffinity] unquote search queries ([#1958](https://github.com/mikf/gallery-dl/issues/1958)) - [inkbunny] match "long" URLs for pools and favorites ([#1937](https://github.com/mikf/gallery-dl/issues/1937)) - [kemonoparty] improve inline extraction ([#1899](https://github.com/mikf/gallery-dl/issues/1899)) - [mangadex] update parameter handling for API requests ([#1908](https://github.com/mikf/gallery-dl/issues/1908)) - [patreon] better filenames for `content` images ([#1954](https://github.com/mikf/gallery-dl/issues/1954)) - [redgifs][gfycat] provide fallback URLs ([#1962](https://github.com/mikf/gallery-dl/issues/1962)) - [downloader:ytdl] prevent crash in `_progress_hook()` - restore SOCKS support for Windows executables ## 1.19.0 - 2021-10-01 ### Additions - [aryion] add `tag` extractor ([#1849](https://github.com/mikf/gallery-dl/issues/1849)) - [desktopography] implement desktopography extractors ([#1740](https://github.com/mikf/gallery-dl/issues/1740)) - [deviantart] implement `auto-unwatch` option ([#1466](https://github.com/mikf/gallery-dl/issues/1466), [#1757](https://github.com/mikf/gallery-dl/issues/1757)) - [fantia] add `date` metadata field ([#1853](https://github.com/mikf/gallery-dl/issues/1853)) - [fappic] add `image` extractor ([#1898](https://github.com/mikf/gallery-dl/issues/1898)) - [gelbooru_v02] add `favorite` extractor ([#1834](https://github.com/mikf/gallery-dl/issues/1834)) - [kemonoparty] add `favorite` extractor ([#1824](https://github.com/mikf/gallery-dl/issues/1824)) - [kemonoparty] implement login with username & password ([#1824](https://github.com/mikf/gallery-dl/issues/1824)) - [mastodon] add `following` extractor ([#1891](https://github.com/mikf/gallery-dl/issues/1891)) - [mastodon] support specifying accounts by ID - [twitter] support `/with_replies` URLs ([#1833](https://github.com/mikf/gallery-dl/issues/1833)) - [twitter] add `quote_by` metadata field ([#1481](https://github.com/mikf/gallery-dl/issues/1481)) - [postprocessor:compare] extend `action` option ([#1592](https://github.com/mikf/gallery-dl/issues/1592)) - implement a download progress indicator ([#1519](https://github.com/mikf/gallery-dl/issues/1519)) - implement a `page-reverse` option ([#1854](https://github.com/mikf/gallery-dl/issues/1854)) - implement a way to specify extended format strings - allow specifying a minimum/maximum for `sleep-*` options ([#1835](https://github.com/mikf/gallery-dl/issues/1835)) - add a `--write-infojson` command-line option ### Changes - [cyberdrop] change directory name format ([#1871](https://github.com/mikf/gallery-dl/issues/1871)) - [instagram] update default delay to 6-12 seconds ([#1835](https://github.com/mikf/gallery-dl/issues/1835)) - [reddit] extend subcategory depending on input URL ([#1836](https://github.com/mikf/gallery-dl/issues/1836)) - move util.Formatter and util.PathFormat into their own modules ### Fixes - [artstation] use `/album/all` view for user portfolios ([#1826](https://github.com/mikf/gallery-dl/issues/1826)) - [aryion] update/improve pagination ([#1849](https://github.com/mikf/gallery-dl/issues/1849)) - [deviantart] fix bug with fetching premium content ([#1879](https://github.com/mikf/gallery-dl/issues/1879)) - [deviantart] update default archive_fmt for 
single deviations ([#1874](https://github.com/mikf/gallery-dl/issues/1874)) - [erome] send Referer header for file downloads ([#1829](https://github.com/mikf/gallery-dl/issues/1829)) - [hiperdex] fix extraction - [kemonoparty] update file download URLs ([#1902](https://github.com/mikf/gallery-dl/issues/1902), [#1903](https://github.com/mikf/gallery-dl/issues/1903)) - [mangadex] fix extraction ([#1852](https://github.com/mikf/gallery-dl/issues/1852)) - [mangadex] fix retrieving chapters from "pornographic" titles ([#1908](https://github.com/mikf/gallery-dl/issues/1908)) - [nozomi] preserve case of search tags ([#1860](https://github.com/mikf/gallery-dl/issues/1860)) - [redgifs][gfycat] remove webtoken code ([#1907](https://github.com/mikf/gallery-dl/issues/1907)) - [twitter] ensure card entries have a `url` ([#1868](https://github.com/mikf/gallery-dl/issues/1868)) - implement a way to correctly shorten displayed filenames containing East Asian characters ([#1377](https://github.com/mikf/gallery-dl/issues/1377)) ## 1.18.4 - 2021-09-04 ### Additions - [420chan] add `thread` and `board` extractors ([#1773](https://github.com/mikf/gallery-dl/issues/1773)) - [deviantart] add `tag` extractor ([#1803](https://github.com/mikf/gallery-dl/issues/1803)) - [deviantart] add `comments` option ([#1800](https://github.com/mikf/gallery-dl/issues/1800)) - [deviantart] implement an `auto-watch` option ([#1466](https://github.com/mikf/gallery-dl/issues/1466), [#1757](https://github.com/mikf/gallery-dl/issues/1757)) - [foolfuuka] add `gallery` extractor ([#1785](https://github.com/mikf/gallery-dl/issues/1785)) - [furaffinity] expand URL pattern for searches ([#1780](https://github.com/mikf/gallery-dl/issues/1780)) - [kemonoparty] automatically generate required DDoS-GUARD cookies ([#1779](https://github.com/mikf/gallery-dl/issues/1779)) - [nhentai] add `favorite` extractor ([#1814](https://github.com/mikf/gallery-dl/issues/1814)) - [shopify] support windsorstore.com ([#1793](https://github.com/mikf/gallery-dl/issues/1793)) - [twitter] add `url` to user objects ([#1787](https://github.com/mikf/gallery-dl/issues/1787), [#1532](https://github.com/mikf/gallery-dl/issues/1532)) - [twitter] expand t.co links in user descriptions ([#1787](https://github.com/mikf/gallery-dl/issues/1787), [#1532](https://github.com/mikf/gallery-dl/issues/1532)) - show a warning if an extractor doesn't yield any results ([#1428](https://github.com/mikf/gallery-dl/issues/1428), [#1759](https://github.com/mikf/gallery-dl/issues/1759)) - add a `j` format string conversion - implement a `fallback` option ([#1770](https://github.com/mikf/gallery-dl/issues/1770)) - implement a `path-strip` option ### Changes - [shopify] use API for product listings ([#1793](https://github.com/mikf/gallery-dl/issues/1793)) - update default User-Agent headers ### Fixes - [deviantart] prevent exceptions for "empty" videos ([#1796](https://github.com/mikf/gallery-dl/issues/1796)) - [exhentai] improve image limits check ([#1808](https://github.com/mikf/gallery-dl/issues/1808)) - [inkbunny] fix extraction ([#1816](https://github.com/mikf/gallery-dl/issues/1816)) - [mangadex] prevent exceptions for manga without English title ([#1815](https://github.com/mikf/gallery-dl/issues/1815)) - [oauth] use defaults when config values are set to `null` ([#1778](https://github.com/mikf/gallery-dl/issues/1778)) - [pixiv] fix pixivision title extraction - [reddit] delay RedditAPI initialization ([#1813](https://github.com/mikf/gallery-dl/issues/1813)) - [twitter] improve error
reporting ([#1759](https://github.com/mikf/gallery-dl/issues/1759)) - [twitter] fix issue when filtering quote tweets ([#1792](https://github.com/mikf/gallery-dl/issues/1792)) - [twitter] fix `logout` option ([#1719](https://github.com/mikf/gallery-dl/issues/1719)) ### Removals - [deviantart] remove the "you need session cookies to download mature scraps" warning ([#1777](https://github.com/mikf/gallery-dl/issues/1777), [#1776](https://github.com/mikf/gallery-dl/issues/1776)) - [foolslide] remove entry for kobato.hologfx.com ## 1.18.3 - 2021-08-13 ### Additions - [bbc] add `width` option ([#1706](https://github.com/mikf/gallery-dl/issues/1706)) - [danbooru] add `external` option ([#1747](https://github.com/mikf/gallery-dl/issues/1747)) - [furaffinity] add `external` option ([#1492](https://github.com/mikf/gallery-dl/issues/1492)) - [luscious] add `gif` option ([#1701](https://github.com/mikf/gallery-dl/issues/1701)) - [newgrounds] add `format` option ([#1729](https://github.com/mikf/gallery-dl/issues/1729)) - [reactor] add `gif` option ([#1701](https://github.com/mikf/gallery-dl/issues/1701)) - [twitter] warn about suspended accounts ([#1759](https://github.com/mikf/gallery-dl/issues/1759)) - [twitter] extend `replies` option ([#1254](https://github.com/mikf/gallery-dl/issues/1254)) - [twitter] add option to log out and retry when blocked ([#1719](https://github.com/mikf/gallery-dl/issues/1719)) - [wikieat] add `thread` and `board` extractors ([#1699](https://github.com/mikf/gallery-dl/issues/1699), [#1607](https://github.com/mikf/gallery-dl/issues/1607)) ### Changes - [instagram] increase default delay between HTTP requests from 5s to 8s ([#1732](https://github.com/mikf/gallery-dl/issues/1732)) ### Fixes - [bbc] improve image dimensions ([#1706](https://github.com/mikf/gallery-dl/issues/1706)) - [bbc] support multi-page gallery listings ([#1730](https://github.com/mikf/gallery-dl/issues/1730)) - [behance] fix `collection` extraction - [deviantart] get original files for GIF previews ([#1731](https://github.com/mikf/gallery-dl/issues/1731)) - [furaffinity] fix errors when using `category-transfer` ([#1274](https://github.com/mikf/gallery-dl/issues/1274)) - [hitomi] fix image URLs ([#1765](https://github.com/mikf/gallery-dl/issues/1765)) - [instagram] use custom User-Agent header for video downloads ([#1682](https://github.com/mikf/gallery-dl/issues/1682), [#1623](https://github.com/mikf/gallery-dl/issues/1623), [#1580](https://github.com/mikf/gallery-dl/issues/1580)) - [kemonoparty] fix username extraction ([#1750](https://github.com/mikf/gallery-dl/issues/1750)) - [kemonoparty] update file server domain ([#1764](https://github.com/mikf/gallery-dl/issues/1764)) - [newgrounds] fix errors when using `category-transfer` ([#1274](https://github.com/mikf/gallery-dl/issues/1274)) - [nsfwalbum] retry backend requests when extracting image URLs ([#1733](https://github.com/mikf/gallery-dl/issues/1733), [#1271](https://github.com/mikf/gallery-dl/issues/1271)) - [vk] prevent exception for empty/private profiles ([#1742](https://github.com/mikf/gallery-dl/issues/1742)) ## 1.18.2 - 2021-07-23 ### Additions - [bbc] add `gallery` and `programme` extractors ([#1706](https://github.com/mikf/gallery-dl/issues/1706)) - [comicvine] add extractor ([#1712](https://github.com/mikf/gallery-dl/issues/1712)) - [kemonoparty] add `max-posts` option ([#1674](https://github.com/mikf/gallery-dl/issues/1674)) - [kemonoparty] parse `o` query parameters ([#1674](https://github.com/mikf/gallery-dl/issues/1674)) - [mastodon] 
add `reblogs` and `replies` options ([#1669](https://github.com/mikf/gallery-dl/issues/1669)) - [pixiv] add extractor for `pixivision` articles ([#1672](https://github.com/mikf/gallery-dl/issues/1672)) - [ytdl] add experimental extractor for sites supported by youtube-dl ([#1680](https://github.com/mikf/gallery-dl/issues/1680), [#878](https://github.com/mikf/gallery-dl/issues/878)) - extend `parent-metadata` functionality ([#1687](https://github.com/mikf/gallery-dl/issues/1687), [#1651](https://github.com/mikf/gallery-dl/issues/1651), [#1364](https://github.com/mikf/gallery-dl/issues/1364)) - add `archive-prefix` option ([#1711](https://github.com/mikf/gallery-dl/issues/1711)) - add `url-metadata` option ([#1659](https://github.com/mikf/gallery-dl/issues/1659), [#1073](https://github.com/mikf/gallery-dl/issues/1073)) ### Changes - [kemonoparty] skip duplicated patreon files ([#1689](https://github.com/mikf/gallery-dl/issues/1689), [#1667](https://github.com/mikf/gallery-dl/issues/1667)) - [mangadex] use custom User-Agent header ([#1535](https://github.com/mikf/gallery-dl/issues/1535)) ### Fixes - [hitomi] fix image URLs ([#1679](https://github.com/mikf/gallery-dl/issues/1679)) - [imagevenue] fix extraction ([#1677](https://github.com/mikf/gallery-dl/issues/1677)) - [instagram] fix extraction of `/explore/tags/` posts ([#1666](https://github.com/mikf/gallery-dl/issues/1666)) - [moebooru] fix `tags` ending with a `+` when logged in ([#1702](https://github.com/mikf/gallery-dl/issues/1702)) - [naverwebtoon] fix comic extraction - [pururin] update domain and fix extraction - [vk] improve metadata extraction and URL pattern ([#1691](https://github.com/mikf/gallery-dl/issues/1691)) - [downloader:ytdl] fix `outtmpl` setting for yt-dlp ([#1680](https://github.com/mikf/gallery-dl/issues/1680)) ## 1.18.1 - 2021-07-04 ### Additions - [mangafox] add `manga` extractor ([#1633](https://github.com/mikf/gallery-dl/issues/1633)) - [mangasee] add `chapter` and `manga` extractors - [mastodon] implement `text-posts` option ([#1569](https://github.com/mikf/gallery-dl/issues/1569), [#1669](https://github.com/mikf/gallery-dl/issues/1669)) - [seisoparty] add `user` and `post` extractors ([#1635](https://github.com/mikf/gallery-dl/issues/1635)) - implement conditional directories ([#1394](https://github.com/mikf/gallery-dl/issues/1394)) - add `T` format string conversion ([#1646](https://github.com/mikf/gallery-dl/issues/1646)) - document format string syntax ### Changes - [twitter] set `retweet_id` for original retweets ([#1481](https://github.com/mikf/gallery-dl/issues/1481)) ### Fixes - [directlink] manually encode Referer URLs ([#1647](https://github.com/mikf/gallery-dl/issues/1647)) - [hiperdex] use domain from input URL - [kemonoparty] fix `username` extraction ([#1652](https://github.com/mikf/gallery-dl/issues/1652)) - [kemonoparty] warn about missing DDoS-GUARD cookies - [twitter] ensure guest tokens are returned as string ([#1665](https://github.com/mikf/gallery-dl/issues/1665)) - [webtoons] match arbitrary language codes ([#1643](https://github.com/mikf/gallery-dl/issues/1643)) - fix depth counter in UrlJob when specifying `-g` multiple times ## 1.18.0 - 2021-06-19 ### Additions - [foolfuuka] support `archive.wakarimasen.moe` ([#1595](https://github.com/mikf/gallery-dl/issues/1595)) - [mangadex] implement login with username & password ([#1535](https://github.com/mikf/gallery-dl/issues/1535)) - [mangadex] add extractor for a user's followed feed ([#1535](https://github.com/mikf/gallery-dl/issues/1535)) - 
[pixiv] support fetching privately followed users ([#1628](https://github.com/mikf/gallery-dl/issues/1628)) - implement conditional filenames ([#1394](https://github.com/mikf/gallery-dl/issues/1394)) - implement `filter` option for post processors ([#1460](https://github.com/mikf/gallery-dl/issues/1460)) - add `-T/--terminate` command-line option ([#1399](https://github.com/mikf/gallery-dl/issues/1399)) - add `-P/--postprocessor` command-line option ([#1583](https://github.com/mikf/gallery-dl/issues/1583)) ### Changes - [kemonoparty] update default filenames and archive IDs ([#1514](https://github.com/mikf/gallery-dl/issues/1514)) - [twitter] update default settings - change `retweets` and `quoted` options from `true` to `false` - change directory format for search results to the same as other extractors - require an argument for `--clear-cache` ### Fixes - [500px] update GraphQL queries - [furaffinity] improve metadata extraction ([#1630](https://github.com/mikf/gallery-dl/issues/1630)) - [hitomi] update image URL generation ([#1637](https://github.com/mikf/gallery-dl/issues/1637)) - [idolcomplex] improve and fix pagination ([#1594](https://github.com/mikf/gallery-dl/issues/1594), [#1601](https://github.com/mikf/gallery-dl/issues/1601)) - [instagram] fix login ([#1631](https://github.com/mikf/gallery-dl/issues/1631)) - [instagram] update query hashes - [mangadex] update to API v5 ([#1535](https://github.com/mikf/gallery-dl/issues/1535)) - [mangafox] improve URL pattern ([#1608](https://github.com/mikf/gallery-dl/issues/1608)) - [oauth] prevent exceptions when reporting errors ([#1603](https://github.com/mikf/gallery-dl/issues/1603)) - [philomena] fix tag escapes handling ([#1629](https://github.com/mikf/gallery-dl/issues/1629)) - [redgifs] update API server address ([#1632](https://github.com/mikf/gallery-dl/issues/1632)) - [sankaku] handle empty tags ([#1617](https://github.com/mikf/gallery-dl/issues/1617)) - [subscribestar] improve attachment filenames ([#1609](https://github.com/mikf/gallery-dl/issues/1609)) - [unsplash] update collections URL pattern ([#1627](https://github.com/mikf/gallery-dl/issues/1627)) - [postprocessor:metadata] handle dicts in `mode:tags` ([#1598](https://github.com/mikf/gallery-dl/issues/1598)) ## 1.17.5 - 2021-05-30 ### Additions - [kemonoparty] add `metadata` option ([#1548](https://github.com/mikf/gallery-dl/issues/1548)) - [kemonoparty] add `type` metadata field ([#1556](https://github.com/mikf/gallery-dl/issues/1556)) - [mangapark] recognize v2.mangapark URLs ([#1578](https://github.com/mikf/gallery-dl/issues/1578)) - [patreon] extract user-defined `tags` ([#1539](https://github.com/mikf/gallery-dl/issues/1539), [#1540](https://github.com/mikf/gallery-dl/issues/1540)) - [pillowfort] implement login with username & password ([#846](https://github.com/mikf/gallery-dl/issues/846)) - [pillowfort] add `inline` and `external` options ([#846](https://github.com/mikf/gallery-dl/issues/846)) - [pixiv] implement `max-posts` option ([#1558](https://github.com/mikf/gallery-dl/issues/1558)) - [pixiv] add `metadata` option ([#1551](https://github.com/mikf/gallery-dl/issues/1551)) - [twitter] add `text-tweets` option ([#570](https://github.com/mikf/gallery-dl/issues/570)) - [weibo] extend `retweets` option ([#1542](https://github.com/mikf/gallery-dl/issues/1542)) - [postprocessor:ugoira] support using the `image2` demuxer ([#1550](https://github.com/mikf/gallery-dl/issues/1550)) - [postprocessor:ugoira] add `repeat-last-frame` option 
([#1550](https://github.com/mikf/gallery-dl/issues/1550)) - support `XDG_CONFIG_HOME` ([#1545](https://github.com/mikf/gallery-dl/issues/1545)) - implement `parent-skip` and `"skip": "terminate"` options ([#1399](https://github.com/mikf/gallery-dl/issues/1399)) ### Changes - [twitter] resolve `t.co` URLs in `content` ([#1532](https://github.com/mikf/gallery-dl/issues/1532)) ### Fixes - [500px] update query hashes ([#1573](https://github.com/mikf/gallery-dl/issues/1573)) - [aryion] find text posts in `recursive=false` mode ([#1568](https://github.com/mikf/gallery-dl/issues/1568)) - [imagebam] fix extraction of NSFW images ([#1534](https://github.com/mikf/gallery-dl/issues/1534)) - [imgur] update URL patterns ([#1561](https://github.com/mikf/gallery-dl/issues/1561)) - [manganelo] update domain to `manganato.com` - [reactor] skip deleted/empty posts - [twitter] add missing retweet media entities ([#1555](https://github.com/mikf/gallery-dl/issues/1555)) - fix ISO 639-1 code for Japanese (`jp` -> `ja`) ## 1.17.4 - 2021-05-07 ### Additions - [gelbooru] add extractor for `/redirect.php` URLs ([#1530](https://github.com/mikf/gallery-dl/issues/1530)) - [inkbunny] add `favorite` extractor ([#1521](https://github.com/mikf/gallery-dl/issues/1521)) - add `output.skip` option - add an optional argument to `--clear-cache` to select which cache entries to remove ([#1230](https://github.com/mikf/gallery-dl/issues/1230)) ### Changes - [pixiv] update `translated-tags` option ([#1507](https://github.com/mikf/gallery-dl/issues/1507)) - rename to `tags` - accept `"japanese"`, `"translated"`, and `"original"` as values ### Fixes - [500px] update query hashes - [kemonoparty] fix download URLs ([#1514](https://github.com/mikf/gallery-dl/issues/1514)) - [imagebam] fix extraction - [instagram] update query hashes - [nozomi] update default archive-fmt for `tag` and `search` extractors ([#1529](https://github.com/mikf/gallery-dl/issues/1529)) - [pixiv] remove duplicate translated tags ([#1507](https://github.com/mikf/gallery-dl/issues/1507)) - [readcomiconline] change domain to `readcomiconline.li` ([#1517](https://github.com/mikf/gallery-dl/issues/1517)) - [sankaku] update invalid-token detection ([#1515](https://github.com/mikf/gallery-dl/issues/1515)) - fix crash when using `--no-download` with `--ugoira-conv` ([#1507](https://github.com/mikf/gallery-dl/issues/1507)) ## 1.17.3 - 2021-04-25 ### Additions - [danbooru] add option for extended metadata extraction ([#1458](https://github.com/mikf/gallery-dl/issues/1458)) - [fanbox] add extractors ([#1459](https://github.com/mikf/gallery-dl/issues/1459)) - [fantia] add extractors ([#1459](https://github.com/mikf/gallery-dl/issues/1459)) - [gelbooru] add an option to extract notes ([#1457](https://github.com/mikf/gallery-dl/issues/1457)) - [hentaicosplays] add extractor ([#907](https://github.com/mikf/gallery-dl/issues/907), [#1473](https://github.com/mikf/gallery-dl/issues/1473), [#1483](https://github.com/mikf/gallery-dl/issues/1483)) - [instagram] add extractor for `tagged` posts ([#1439](https://github.com/mikf/gallery-dl/issues/1439)) - [naverwebtoon] ignore non-comic images - [pixiv] also save untranslated tags when `translated-tags` is enabled ([#1501](https://github.com/mikf/gallery-dl/issues/1501)) - [shopify] support omgmiamiswimwear.com ([#1280](https://github.com/mikf/gallery-dl/issues/1280)) - implement `output.fallback` option - add archive format to InfoJob output ([#875](https://github.com/mikf/gallery-dl/issues/875)) - build executables with SOCKS proxy 
support ([#1424](https://github.com/mikf/gallery-dl/issues/1424)) ### Fixes - [500px] update query hashes - [8muses] fix JSON deobfuscation - [artstation] download `/4k/` images ([#1422](https://github.com/mikf/gallery-dl/issues/1422)) - [deviantart] fix pagination for Eclipse results ([#1444](https://github.com/mikf/gallery-dl/issues/1444)) - [deviantart] improve folder name matching ([#1451](https://github.com/mikf/gallery-dl/issues/1451)) - [erome] skip deleted albums ([#1447](https://github.com/mikf/gallery-dl/issues/1447)) - [exhentai] fix image limit detection ([#1437](https://github.com/mikf/gallery-dl/issues/1437)) - [exhentai] restore `limits` option ([#1487](https://github.com/mikf/gallery-dl/issues/1487)) - [gelbooru] fix tag category extraction ([#1455](https://github.com/mikf/gallery-dl/issues/1455)) - [instagram] update query hashes - [komikcast] fix extraction - [simplyhentai] fix extraction - [slideshare] fix extraction - [webtoons] update agegate/GDPR cookies ([#1431](https://github.com/mikf/gallery-dl/issues/1431)) - fix `category-transfer` option ### Removals - [yuki] remove module for yuki.la ## 1.17.2 - 2021-04-02 ### Additions - [deviantart] add support for posts from watched users ([#794](https://github.com/mikf/gallery-dl/issues/794)) - [manganelo] add `chapter` and `manga` extractors ([#1415](https://github.com/mikf/gallery-dl/issues/1415)) - [pinterest] add `search` extractor ([#1411](https://github.com/mikf/gallery-dl/issues/1411)) - [sankaku] add `tag_string` metadata field ([#1388](https://github.com/mikf/gallery-dl/issues/1388)) - [sankaku] add enumeration index for books ([#1388](https://github.com/mikf/gallery-dl/issues/1388)) - [tapas] add `series` and `episode` extractors ([#692](https://github.com/mikf/gallery-dl/issues/692)) - [tapas] implement login with username & password ([#692](https://github.com/mikf/gallery-dl/issues/692)) - [twitter] allow specifying a custom format for user results ([#1337](https://github.com/mikf/gallery-dl/issues/1337)) - [twitter] add extractor for direct image links ([#1417](https://github.com/mikf/gallery-dl/issues/1417)) - [vk] add support for albums ([#474](https://github.com/mikf/gallery-dl/issues/474)) ### Fixes - [aryion] unescape paths ([#1414](https://github.com/mikf/gallery-dl/issues/1414)) - [bcy] improve pagination - [deviantart] update `watch` URL pattern ([#794](https://github.com/mikf/gallery-dl/issues/794)) - [deviantart] fix arguments for search/popular results ([#1408](https://github.com/mikf/gallery-dl/issues/1408)) - [deviantart] use fallback for `/intermediary/` URLs - [exhentai] improve and simplify image limit checks - [komikcast] fix extraction - [pixiv] fix `favorite` URL pattern ([#1405](https://github.com/mikf/gallery-dl/issues/1405)) - [sankaku] simplify `pool` tags ([#1388](https://github.com/mikf/gallery-dl/issues/1388)) - [twitter] improve error message when trying to log in with 2FA ([#1409](https://github.com/mikf/gallery-dl/issues/1409)) - [twitter] don't use youtube-dl for cards when videos are disabled ([#1416](https://github.com/mikf/gallery-dl/issues/1416)) ## 1.17.1 - 2021-03-19 ### Additions - [architizer] add `project` and `firm` extractors ([#1369](https://github.com/mikf/gallery-dl/issues/1369)) - [deviantart] add `watch` extractor ([#794](https://github.com/mikf/gallery-dl/issues/794)) - [exhentai] support `/tag/` URLs ([#1363](https://github.com/mikf/gallery-dl/issues/1363)) - [gelbooru_v01] support `drawfriends.booru.org`, `vidyart.booru.org`, and `tlb.booru.org` by default - 
[nozomi] support `/index-N.html` URLs ([#1365](https://github.com/mikf/gallery-dl/issues/1365)) - [philomena] add generalized extractors for philomena sites ([#1379](https://github.com/mikf/gallery-dl/issues/1379)) - [philomena] support post URLs without `/images/` - [twitter] implement `users` option ([#1337](https://github.com/mikf/gallery-dl/issues/1337)) - implement `parent-metadata` option ([#1364](https://github.com/mikf/gallery-dl/issues/1364)) ### Changes - [deviantart] revert previous changes to `extra` option ([#1356](https://github.com/mikf/gallery-dl/issues/1356), [#1387](https://github.com/mikf/gallery-dl/issues/1387)) ### Fixes - [exhentai] improve favorites count extraction ([#1360](https://github.com/mikf/gallery-dl/issues/1360)) - [gelbooru] update domain for video downloads ([#1368](https://github.com/mikf/gallery-dl/issues/1368)) - [hentaifox] improve image and metadata extraction ([#1366](https://github.com/mikf/gallery-dl/issues/1366), [#1378](https://github.com/mikf/gallery-dl/issues/1378)) - [imgur] fix and improve rate limit handling ([#1386](https://github.com/mikf/gallery-dl/issues/1386)) - [weasyl] improve favorites URL pattern ([#1374](https://github.com/mikf/gallery-dl/issues/1374)) - use type check before applying `browser` option ([#1358](https://github.com/mikf/gallery-dl/issues/1358)) - ensure `-s/--simulate` always prints filenames ([#1360](https://github.com/mikf/gallery-dl/issues/1360)) ### Removals - [hentaicafe] remove module - [hentainexus] remove module - [mangareader] remove module - [mangastream] remove module ## 1.17.0 - 2021-03-05 ### Additions - [cyberdrop] add support for `https://cyberdrop.me/` ([#1328](https://github.com/mikf/gallery-dl/issues/1328)) - [exhentai] add `metadata` option; extract more metadata from gallery pages ([#1325](https://github.com/mikf/gallery-dl/issues/1325)) - [hentaicafe] add `search` and `tag` extractors ([#1345](https://github.com/mikf/gallery-dl/issues/1345)) - [hentainexus] add `original` option ([#1322](https://github.com/mikf/gallery-dl/issues/1322)) - [instagram] support `/user/reels/` URLs ([#1329](https://github.com/mikf/gallery-dl/issues/1329)) - [naverwebtoon] add support for `https://comic.naver.com/` ([#1331](https://github.com/mikf/gallery-dl/issues/1331)) - [pixiv] add `translated-tags` option ([#1354](https://github.com/mikf/gallery-dl/issues/1354)) - [tbib] add support for `https://tbib.org/` ([#473](https://github.com/mikf/gallery-dl/issues/473), [#1082](https://github.com/mikf/gallery-dl/issues/1082)) - [tumblrgallery] add support for `https://tumblrgallery.xyz/` ([#1298](https://github.com/mikf/gallery-dl/issues/1298)) - [twitter] add extractor for followed users ([#1337](https://github.com/mikf/gallery-dl/issues/1337)) - [twitter] add option to download all media from conversations ([#1319](https://github.com/mikf/gallery-dl/issues/1319)) - [wallhaven] add `collections` extractor ([#1351](https://github.com/mikf/gallery-dl/issues/1351)) - [snap] allow access to user's .netrc for site authentication ([#1352](https://github.com/mikf/gallery-dl/issues/1352)) - add extractors for Gelbooru v0.1 sites ([#234](https://github.com/mikf/gallery-dl/issues/234), [#426](https://github.com/mikf/gallery-dl/issues/426), [#473](https://github.com/mikf/gallery-dl/issues/473), [#767](https://github.com/mikf/gallery-dl/issues/767), [#1238](https://github.com/mikf/gallery-dl/issues/1238)) - add `-E/--extractor-info` command-line option ([#875](https://github.com/mikf/gallery-dl/issues/875)) - add GitHub Actions 
workflow for building standalone executables ([#1312](https://github.com/mikf/gallery-dl/issues/1312)) - add `browser` and `headers` options ([#1117](https://github.com/mikf/gallery-dl/issues/1117)) - add option to use different youtube-dl forks ([#1330](https://github.com/mikf/gallery-dl/issues/1330)) - support using multiple input files at once ([#1353](https://github.com/mikf/gallery-dl/issues/1353)) ### Changes - [deviantart] extend `extra` option to also download embedded DeviantArt posts. - [exhentai] rename metadata fields to match API results ([#1325](https://github.com/mikf/gallery-dl/issues/1325)) - [mangadex] use `api.mangadex.org` as default API server - [mastodon] cache OAuth tokens ([#616](https://github.com/mikf/gallery-dl/issues/616)) - replace `wait-min` and `wait-max` with `sleep-request` ### Fixes - [500px] skip unavailable photos ([#1335](https://github.com/mikf/gallery-dl/issues/1335)) - [komikcast] fix extraction - [readcomiconline] download high quality image versions ([#1347](https://github.com/mikf/gallery-dl/issues/1347)) - [twitter] update GraphQL endpoints - fix crash when `base-directory` is an empty string ([#1339](https://github.com/mikf/gallery-dl/issues/1339)) ### Removals - remove support for formerly deprecated options - remove `cloudflare` module ## 1.16.5 - 2021-02-14 ### Additions - [behance] support `video` modules ([#1282](https://github.com/mikf/gallery-dl/issues/1282)) - [erome] add `album`, `user`, and `search` extractors ([#409](https://github.com/mikf/gallery-dl/issues/409)) - [hentaifox] support searching by group ([#1294](https://github.com/mikf/gallery-dl/issues/1294)) - [imgclick] add `image` extractor ([#1307](https://github.com/mikf/gallery-dl/issues/1307)) - [kemonoparty] extract inline images ([#1286](https://github.com/mikf/gallery-dl/issues/1286)) - [kemonoparty] support URLs with non-numeric user and post IDs ([#1303](https://github.com/mikf/gallery-dl/issues/1303)) - [pillowfort] add `user` and `post` extractors ([#846](https://github.com/mikf/gallery-dl/issues/846)) ### Changes - [kemonoparty] include `service` in directories and archive keys - [pixiv] require a `refresh-token` to login ([#1304](https://github.com/mikf/gallery-dl/issues/1304)) - [snap] use `core18` as base ### Fixes - [500px] update query hashes - [deviantart] update parameters for `/browse/popular` ([#1267](https://github.com/mikf/gallery-dl/issues/1267)) - [deviantart] provide filename extension for original file downloads ([#1272](https://github.com/mikf/gallery-dl/issues/1272)) - [deviantart] fix `folders` option ([#1302](https://github.com/mikf/gallery-dl/issues/1302)) - [inkbunny] add `sid` parameter to private file downloads ([#1281](https://github.com/mikf/gallery-dl/issues/1281)) - [kemonoparty] fix absolute file URLs - [mangadex] revert to `https://mangadex.org/api/` and add `api-server` option ([#1310](https://github.com/mikf/gallery-dl/issues/1310)) - [nsfwalbum] use fallback for deleted content ([#1259](https://github.com/mikf/gallery-dl/issues/1259)) - [sankaku] update `invalid token` detection ([#1309](https://github.com/mikf/gallery-dl/issues/1309)) - [slideshare] fix extraction - [postprocessor:metadata] fix crash with `extension-format` ([#1285](https://github.com/mikf/gallery-dl/issues/1285)) ## 1.16.4 - 2021-01-23 ### Additions - [furaffinity] add `descriptions` option ([#1231](https://github.com/mikf/gallery-dl/issues/1231)) - [kemonoparty] add `user` and `post` extractors ([#1216](https://github.com/mikf/gallery-dl/issues/1216)) - [nozomi] add 
`num` enumeration index ([#1239](https://github.com/mikf/gallery-dl/issues/1239)) - [photovogue] added portfolio extractor ([#1253](https://github.com/mikf/gallery-dl/issues/1253)) - [twitter] match `/i/user/ID` URLs - [unsplash] add extractors ([#1197](https://github.com/mikf/gallery-dl/issues/1197)) - [vipr] add image extractor ([#1258](https://github.com/mikf/gallery-dl/issues/1258)) ### Changes - [derpibooru] use "Everything" filter by default ([#862](https://github.com/mikf/gallery-dl/issues/862)) ### Fixes - [derpibooru] update `date` parsing - [foolfuuka] stop search when results are exhausted ([#1174](https://github.com/mikf/gallery-dl/issues/1174)) - [instagram] fix regex for `/saved` URLs ([#1251](https://github.com/mikf/gallery-dl/issues/1251)) - [mangadex] update API URLs - [mangakakalot] fix extraction - [newgrounds] fix flash file extraction ([#1257](https://github.com/mikf/gallery-dl/issues/1257)) - [sankaku] simplify login process - [twitter] fix retries after hitting rate limit ## 1.16.3 - 2021-01-10 ### Fixes - fix crash when using a `dict` for `path-restrict` - [postprocessor:metadata] sanitize custom filenames ## 1.16.2 - 2021-01-09 ### Additions - [derpibooru] add `search` and `gallery` extractors ([#862](https://github.com/mikf/gallery-dl/issues/862)) - [foolfuuka] add `board` and `search` extractors ([#1044](https://github.com/mikf/gallery-dl/issues/1044), [#1174](https://github.com/mikf/gallery-dl/issues/1174)) - [gfycat] add `date` metadata field ([#1138](https://github.com/mikf/gallery-dl/issues/1138)) - [pinterest] add support for getting all boards of a user ([#1205](https://github.com/mikf/gallery-dl/issues/1205)) - [sankaku] add support for book searches ([#1204](https://github.com/mikf/gallery-dl/issues/1204)) - [twitter] fetch media from pinned tweets ([#1203](https://github.com/mikf/gallery-dl/issues/1203)) - [wikiart] add extractor for single paintings ([#1233](https://github.com/mikf/gallery-dl/issues/1233)) - [downloader:http] add MIME type and signature for `.ico` files ([#1211](https://github.com/mikf/gallery-dl/issues/1211)) - add `d` format string conversion for timestamp values - add `"ascii"` as a special `path-restrict` value ### Fixes - [hentainexus] fix extraction ([#1234](https://github.com/mikf/gallery-dl/issues/1234)) - [instagram] categorize single highlight URLs as `highlights` ([#1222](https://github.com/mikf/gallery-dl/issues/1222)) - [redgifs] fix search results - [twitter] fix login with username & password - [twitter] fetch tweets from `homeConversation` entries ## 1.16.1 - 2020-12-27 ### Additions - [instagram] add `include` option ([#1180](https://github.com/mikf/gallery-dl/issues/1180)) - [pinterest] implement video support ([#1189](https://github.com/mikf/gallery-dl/issues/1189)) - [sankaku] reimplement login support ([#1176](https://github.com/mikf/gallery-dl/issues/1176), [#1182](https://github.com/mikf/gallery-dl/issues/1182)) - [sankaku] add support for sankaku.app URLs ([#1193](https://github.com/mikf/gallery-dl/issues/1193)) ### Changes - [e621] return pool posts in order ([#1195](https://github.com/mikf/gallery-dl/issues/1195)) - [hentaicafe] prefer title of `/hc.fyi/` pages ([#1106](https://github.com/mikf/gallery-dl/issues/1106)) - [hentaicafe] simplify default filenames - [sankaku] normalize `created_at` metadata ([#1190](https://github.com/mikf/gallery-dl/issues/1190)) - [postprocessor:exec] do not add missing `{}` to command ([#1185](https://github.com/mikf/gallery-dl/issues/1185)) ### Fixes - [booru] improve error 
handling - [instagram] warn about private profiles ([#1187](https://github.com/mikf/gallery-dl/issues/1187)) - [keenspot] improve redirect handling - [mangadex] respect `chapter-reverse` settings ([#1194](https://github.com/mikf/gallery-dl/issues/1194)) - [pixiv] output debug message on failed login attempts ([#1192](https://github.com/mikf/gallery-dl/issues/1192)) - increase SQLite connection timeouts ([#1173](https://github.com/mikf/gallery-dl/issues/1173)) ### Removals - [mangapanda] remove module ## 1.16.0 - 2020-12-12 ### Additions - [booru] implement generalized extractors for `*booru` and `moebooru` sites - add support for sakugabooru.com ([#1136](https://github.com/mikf/gallery-dl/issues/1136)) - add support for lolibooru.moe ([#1050](https://github.com/mikf/gallery-dl/issues/1050)) - provide formattable `date` metadata fields ([#1138](https://github.com/mikf/gallery-dl/issues/1138)) - [postprocessor:metadata] add `event` and `filename` options ([#315](https://github.com/mikf/gallery-dl/issues/315), [#866](https://github.com/mikf/gallery-dl/issues/866), [#984](https://github.com/mikf/gallery-dl/issues/984)) - [postprocessor:exec] add `event` option ([#992](https://github.com/mikf/gallery-dl/issues/992)) ### Changes - [flickr] update default directories and improve metadata consistency ([#828](https://github.com/mikf/gallery-dl/issues/828)) - [sankaku] use API endpoints from `beta.sankakucomplex.com` - [downloader:http] improve filename extension handling ([#776](https://github.com/mikf/gallery-dl/issues/776)) - replace all JPEG filename extensions with `jpg` by default ### Fixes - [hentainexus] fix extraction ([#1166](https://github.com/mikf/gallery-dl/issues/1166)) - [instagram] rewrite ([#1113](https://github.com/mikf/gallery-dl/issues/1113), [#1122](https://github.com/mikf/gallery-dl/issues/1122), [#1128](https://github.com/mikf/gallery-dl/issues/1128), [#1130](https://github.com/mikf/gallery-dl/issues/1130), [#1149](https://github.com/mikf/gallery-dl/issues/1149)) - [mangadex] handle external chapters ([#1154](https://github.com/mikf/gallery-dl/issues/1154)) - [nozomi] handle empty `date` fields ([#1163](https://github.com/mikf/gallery-dl/issues/1163)) - [paheal] create directory for each post ([#1147](https://github.com/mikf/gallery-dl/issues/1147)) - [piczel] update API URLs - [twitter] update image URL format ([#1145](https://github.com/mikf/gallery-dl/issues/1145)) - [twitter] improve `x-csrf-token` header handling ([#1170](https://github.com/mikf/gallery-dl/issues/1170)) - [webtoons] update `ageGate` cookies ### Removals - [sankaku] remove login support ## 1.15.4 - 2020-11-27 ### Fixes - [2chan] skip external links - [hentainexus] fix extraction ([#1125](https://github.com/mikf/gallery-dl/issues/1125)) - [mangadex] switch to API v2 ([#1129](https://github.com/mikf/gallery-dl/issues/1129)) - [mangapanda] use http:// - [mangoxo] fix extraction - [reddit] skip invalid gallery items ([#1127](https://github.com/mikf/gallery-dl/issues/1127)) ## 1.15.3 - 2020-11-13 ### Additions - [sankakucomplex] extract videos and embeds ([#308](https://github.com/mikf/gallery-dl/issues/308)) - [twitter] add support for lists ([#1096](https://github.com/mikf/gallery-dl/issues/1096)) - [postprocessor:metadata] accept string-lists for `content-format` ([#1080](https://github.com/mikf/gallery-dl/issues/1080)) - implement `modules` and `extension-map` options ### Fixes - [500px] update query hashes - [8kun] fix file URLs of older posts ([#1101](https://github.com/mikf/gallery-dl/issues/1101)) - 
[exhentai] update image URL parsing ([#1094](https://github.com/mikf/gallery-dl/issues/1094)) - [hentaifoundry] update `YII_CSRF_TOKEN` cookie handling ([#1083](https://github.com/mikf/gallery-dl/issues/1083)) - [hentaifoundry] use scheme from input URLs ([#1095](https://github.com/mikf/gallery-dl/issues/1095)) - [mangoxo] fix metadata extraction - [paheal] fix extraction ([#1088](https://github.com/mikf/gallery-dl/issues/1088)) - collect post processors from `basecategory` entries ([#1084](https://github.com/mikf/gallery-dl/issues/1084)) ## 1.15.2 - 2020-10-24 ### Additions - [pinterest] implement login support ([#1055](https://github.com/mikf/gallery-dl/issues/1055)) - [reddit] add `date` metadata field ([#1068](https://github.com/mikf/gallery-dl/issues/1068)) - [seiga] add metadata for single image downloads ([#1063](https://github.com/mikf/gallery-dl/issues/1063)) - [twitter] support media from Cards ([#937](https://github.com/mikf/gallery-dl/issues/937), [#1005](https://github.com/mikf/gallery-dl/issues/1005)) - [weasyl] support api-key authentication ([#1057](https://github.com/mikf/gallery-dl/issues/1057)) - add a `t` format string conversion for trimming whitespace ([#1065](https://github.com/mikf/gallery-dl/issues/1065)) ### Fixes - [blogger] handle URLs with specified width/height ([#1061](https://github.com/mikf/gallery-dl/issues/1061)) - [fallenangels] fix extraction of `.5` chapters - [gelbooru] rewrite mp4 video URLs ([#1048](https://github.com/mikf/gallery-dl/issues/1048)) - [hitomi] fix image URLs and gallery URL pattern - [mangadex] unescape more metadata fields ([#1066](https://github.com/mikf/gallery-dl/issues/1066)) - [mangahere] ensure download URLs have a scheme ([#1070](https://github.com/mikf/gallery-dl/issues/1070)) - [mangakakalot] ignore "Go Home" buttons in chapter pages - [newgrounds] handle embeds without scheme ([#1033](https://github.com/mikf/gallery-dl/issues/1033)) - [newgrounds] provide fallback URLs for video downloads ([#1042](https://github.com/mikf/gallery-dl/issues/1042)) - [xhamster] fix user profile extraction ## 1.15.1 - 2020-10-11 ### Additions - [hentaicafe] add `manga_id` metadata field ([#1036](https://github.com/mikf/gallery-dl/issues/1036)) - [hentaifoundry] add support for stories ([#734](https://github.com/mikf/gallery-dl/issues/734)) - [hentaifoundry] add `include` option - [newgrounds] extract image embeds ([#1033](https://github.com/mikf/gallery-dl/issues/1033)) - [nijie] add `include` option ([#1018](https://github.com/mikf/gallery-dl/issues/1018)) - [reactor] match URLs without subdomain ([#1053](https://github.com/mikf/gallery-dl/issues/1053)) - [twitter] extend `retweets` option ([#1026](https://github.com/mikf/gallery-dl/issues/1026)) - [weasyl] add extractors ([#977](https://github.com/mikf/gallery-dl/issues/977)) ### Fixes - [500px] update query hashes - [behance] fix `collection` extraction - [newgrounds] fix video extraction ([#1042](https://github.com/mikf/gallery-dl/issues/1042)) - [twitter] improve twitpic extraction ([#1019](https://github.com/mikf/gallery-dl/issues/1019)) - [weibo] handle posts with more than 9 images ([#926](https://github.com/mikf/gallery-dl/issues/926)) - [xvideos] fix `title` extraction - fix crash when using `--download-archive` with `--no-skip` ([#1023](https://github.com/mikf/gallery-dl/issues/1023)) - fix issues with `blacklist`/`whitelist` defaults ([#1051](https://github.com/mikf/gallery-dl/issues/1051), [#1056](https://github.com/mikf/gallery-dl/issues/1056)) ### Removals - [kissmanga] remove 
module ## 1.15.0 - 2020-09-20 ### Additions - [deviantart] support watchers-only/paid deviations ([#995](https://github.com/mikf/gallery-dl/issues/995)) - [myhentaigallery] add gallery extractor ([#1001](https://github.com/mikf/gallery-dl/issues/1001)) - [twitter] support specifying users by ID ([#980](https://github.com/mikf/gallery-dl/issues/980)) - [twitter] support `/intent/user?user_id=…` URLs ([#980](https://github.com/mikf/gallery-dl/issues/980)) - add `--no-skip` command-line option ([#986](https://github.com/mikf/gallery-dl/issues/986)) - add `blacklist` and `whitelist` options ([#492](https://github.com/mikf/gallery-dl/issues/492), [#844](https://github.com/mikf/gallery-dl/issues/844)) - add `filesize-min` and `filesize-max` options ([#780](https://github.com/mikf/gallery-dl/issues/780)) - add `sleep-extractor` and `sleep-request` options ([#788](https://github.com/mikf/gallery-dl/issues/788)) - write skipped files to archive ([#550](https://github.com/mikf/gallery-dl/issues/550)) ### Changes - [exhentai] update wait time before original image downloads ([#978](https://github.com/mikf/gallery-dl/issues/978)) - [imgur] use new API endpoints for image/album data - [tumblr] create directories for each post ([#965](https://github.com/mikf/gallery-dl/issues/965)) - support format string replacement fields in download archive paths ([#985](https://github.com/mikf/gallery-dl/issues/985)) - reduce wait time growth rate for HTTP retries from exponential to linear ### Fixes - [500px] update query hash - [aryion] improve post ID extraction ([#981](https://github.com/mikf/gallery-dl/issues/981), [#982](https://github.com/mikf/gallery-dl/issues/982)) - [danbooru] handle posts without `id` ([#1004](https://github.com/mikf/gallery-dl/issues/1004)) - [furaffinity] update download URL extraction ([#988](https://github.com/mikf/gallery-dl/issues/988)) - [imgur] fix image/album detection for galleries - [postprocessor:zip] defer zip file creation ([#968](https://github.com/mikf/gallery-dl/issues/968)) ### Removals - [jaiminisbox] remove extractors - [worldthree] remove extractors ## 1.14.5 - 2020-08-30 ### Additions - [aryion] add username/password support ([#960](https://github.com/mikf/gallery-dl/issues/960)) - [exhentai] add ability to specify a custom image limit ([#940](https://github.com/mikf/gallery-dl/issues/940)) - [furaffinity] add `search` extractor ([#915](https://github.com/mikf/gallery-dl/issues/915)) - [imgur] add `search` and `tag` extractors ([#934](https://github.com/mikf/gallery-dl/issues/934)) ### Fixes - [500px] fix extraction and update URL patterns ([#956](https://github.com/mikf/gallery-dl/issues/956)) - [aryion] update folder mime type list ([#945](https://github.com/mikf/gallery-dl/issues/945)) - [gelbooru] fix extraction without API - [hentaihand] update to new site layout - [hitomi] fix redirect processing - [reddit] handle deleted galleries ([#953](https://github.com/mikf/gallery-dl/issues/953)) - [reddit] improve gallery extraction ([#955](https://github.com/mikf/gallery-dl/issues/955)) ## 1.14.4 - 2020-08-15 ### Additions - [blogger] add `search` extractor ([#925](https://github.com/mikf/gallery-dl/issues/925)) - [blogger] support searching posts by labels ([#925](https://github.com/mikf/gallery-dl/issues/925)) - [inkbunny] add `user` and `post` extractors ([#283](https://github.com/mikf/gallery-dl/issues/283)) - [instagram] support `/reel/` URLs - [pinterest] support `pinterest.co.uk` URLs ([#914](https://github.com/mikf/gallery-dl/issues/914)) - [reddit] support 
gallery posts ([#920](https://github.com/mikf/gallery-dl/issues/920)) - [subscribestar] extract attached media files ([#852](https://github.com/mikf/gallery-dl/issues/852)) ### Fixes - [blogger] improve error messages for missing posts/blogs ([#903](https://github.com/mikf/gallery-dl/issues/903)) - [exhentai] adjust image limit costs ([#940](https://github.com/mikf/gallery-dl/issues/940)) - [gfycat] skip malformed gfycat responses ([#902](https://github.com/mikf/gallery-dl/issues/902)) - [imgur] handle 403 overcapacity responses ([#910](https://github.com/mikf/gallery-dl/issues/910)) - [instagram] wait before GraphQL requests ([#901](https://github.com/mikf/gallery-dl/issues/901)) - [mangareader] fix extraction - [mangoxo] fix login - [pixnet] detect password-protected albums ([#177](https://github.com/mikf/gallery-dl/issues/177)) - [simplyhentai] fix `gallery_id` extraction - [subscribestar] update `date` parsing - [vsco] handle missing `description` fields - [xhamster] fix extraction ([#917](https://github.com/mikf/gallery-dl/issues/917)) - allow `parent-directory` to work recursively ([#905](https://github.com/mikf/gallery-dl/issues/905)) - skip external OAuth tests ([#908](https://github.com/mikf/gallery-dl/issues/908)) ### Removals - [bobx] remove module ## 1.14.3 - 2020-07-18 ### Additions - [8muses] support `comics.8muses.com` URLs - [artstation] add `following` extractor ([#888](https://github.com/mikf/gallery-dl/issues/888)) - [exhentai] add `domain` option ([#897](https://github.com/mikf/gallery-dl/issues/897)) - [gfycat] add `user` and `search` extractors - [imgur] support all `/t/...` URLs ([#880](https://github.com/mikf/gallery-dl/issues/880)) - [khinsider] add `format` option ([#840](https://github.com/mikf/gallery-dl/issues/840)) - [mangakakalot] add `manga` and `chapter` extractors ([#876](https://github.com/mikf/gallery-dl/issues/876)) - [redgifs] support `gifsdeliverynetwork.com` URLs ([#874](https://github.com/mikf/gallery-dl/issues/874)) - [subscribestar] add `user` and `post` extractors ([#852](https://github.com/mikf/gallery-dl/issues/852)) - [twitter] add support for nitter.net URLs ([#890](https://github.com/mikf/gallery-dl/issues/890)) - add Zsh completion script ([#150](https://github.com/mikf/gallery-dl/issues/150)) ### Fixes - [gfycat] retry 404'ed videos on redgifs.com ([#874](https://github.com/mikf/gallery-dl/issues/874)) - [newgrounds] fix favorites extraction - [patreon] yield images and attachments before post files ([#871](https://github.com/mikf/gallery-dl/issues/871)) - [reddit] fix AttributeError when using `recursion` ([#879](https://github.com/mikf/gallery-dl/issues/879)) - [twitter] raise proper exception if a user doesn't exist ([#891](https://github.com/mikf/gallery-dl/issues/891)) - defer directory creation ([#722](https://github.com/mikf/gallery-dl/issues/722)) - set pseudo extension for Metadata messages ([#865](https://github.com/mikf/gallery-dl/issues/865)) - prevent exception on Cloudflare challenges ([#868](https://github.com/mikf/gallery-dl/issues/868)) ## 1.14.2 - 2020-06-27 ### Additions - [artstation] add `date` metadata field ([#839](https://github.com/mikf/gallery-dl/issues/839)) - [mastodon] add `date` metadata field ([#839](https://github.com/mikf/gallery-dl/issues/839)) - [pinterest] add support for board sections ([#835](https://github.com/mikf/gallery-dl/issues/835)) - [twitter] add extractor for liked tweets ([#837](https://github.com/mikf/gallery-dl/issues/837)) - [twitter] add option to filter media from quoted tweets 
([#854](https://github.com/mikf/gallery-dl/issues/854)) - [weibo] add `date` metadata field to `status` objects ([#829](https://github.com/mikf/gallery-dl/issues/829)) ### Fixes - [aryion] fix user gallery extraction ([#832](https://github.com/mikf/gallery-dl/issues/832)) - [imgur] build directory paths for each file ([#842](https://github.com/mikf/gallery-dl/issues/842)) - [tumblr] prevent errors when using `reblogs=same-blog` ([#851](https://github.com/mikf/gallery-dl/issues/851)) - [twitter] always provide an `author` metadata field ([#831](https://github.com/mikf/gallery-dl/issues/831), [#833](https://github.com/mikf/gallery-dl/issues/833)) - [twitter] don't download video previews ([#833](https://github.com/mikf/gallery-dl/issues/833)) - [twitter] improve handling of deleted tweets ([#838](https://github.com/mikf/gallery-dl/issues/838)) - [twitter] fix search results ([#847](https://github.com/mikf/gallery-dl/issues/847)) - [twitter] improve handling of quoted tweets ([#854](https://github.com/mikf/gallery-dl/issues/854)) - fix config lookups when multiple locations are involved ([#843](https://github.com/mikf/gallery-dl/issues/843)) - improve output of `-K/--list-keywords` for parent extractors ([#825](https://github.com/mikf/gallery-dl/issues/825)) - call `flush()` after writing JSON in `DataJob()` ([#727](https://github.com/mikf/gallery-dl/issues/727)) ## 1.14.1 - 2020-06-12 ### Additions - [furaffinity] add `artist_url` metadata field ([#821](https://github.com/mikf/gallery-dl/issues/821)) - [redgifs] add `user` and `search` extractors ([#724](https://github.com/mikf/gallery-dl/issues/724)) ### Changes - [deviantart] extend `extra` option; also search journals for sta.sh links ([#712](https://github.com/mikf/gallery-dl/issues/712)) - [twitter] rewrite; use new interface ([#806](https://github.com/mikf/gallery-dl/issues/806), [#740](https://github.com/mikf/gallery-dl/issues/740)) ### Fixes - [kissmanga] work around CAPTCHAs ([#818](https://github.com/mikf/gallery-dl/issues/818)) - [nhentai] fix extraction ([#819](https://github.com/mikf/gallery-dl/issues/819)) - [webtoons] generalize comic extraction code ([#820](https://github.com/mikf/gallery-dl/issues/820)) ## 1.14.0 - 2020-05-31 ### Additions - [imagechest] add new extractor for imgchest.com ([#750](https://github.com/mikf/gallery-dl/issues/750)) - [instagram] add `post_url`, `tags`, `location`, `tagged_users` metadata ([#743](https://github.com/mikf/gallery-dl/issues/743)) - [redgifs] add image extractor ([#724](https://github.com/mikf/gallery-dl/issues/724)) - [webtoons] add new extractor for webtoons.com ([#761](https://github.com/mikf/gallery-dl/issues/761)) - implement `--write-pages` option ([#736](https://github.com/mikf/gallery-dl/issues/736)) - extend `path-restrict` option ([#662](https://github.com/mikf/gallery-dl/issues/662)) - implement `path-replace` option ([#662](https://github.com/mikf/gallery-dl/issues/662), [#755](https://github.com/mikf/gallery-dl/issues/755)) - make `path` and `keywords` available in logging messages ([#574](https://github.com/mikf/gallery-dl/issues/574), [#575](https://github.com/mikf/gallery-dl/issues/575)) ### Changes - [danbooru] change default value of `ugoira` to `false` - [downloader:ytdl] change default value of `forward-cookies` to `false` - [downloader:ytdl] fix file extensions when merging into `.mkv` ([#720](https://github.com/mikf/gallery-dl/issues/720)) - write OAuth tokens to cache ([#616](https://github.com/mikf/gallery-dl/issues/616)) - use `%APPDATA%\gallery-dl` for config 
files and cache on Windows - use `util.Formatter` for formatting logging messages - reuse HTTP connections from parent extractors ### Fixes - [deviantart] use private access tokens for Journals ([#738](https://github.com/mikf/gallery-dl/issues/738)) - [gelbooru] simplify and fix pool extraction - [imgur] fix extraction of animated images without `mp4` entry - [imgur] treat `/t/unmuted/` URLs as galleries - [instagram] fix login with username & password ([#756](https://github.com/mikf/gallery-dl/issues/756), [#771](https://github.com/mikf/gallery-dl/issues/771), [#797](https://github.com/mikf/gallery-dl/issues/797), [#803](https://github.com/mikf/gallery-dl/issues/803)) - [reddit] don't send OAuth headers for file downloads ([#729](https://github.com/mikf/gallery-dl/issues/729)) - fix/improve Cloudflare bypass code ([#728](https://github.com/mikf/gallery-dl/issues/728), [#757](https://github.com/mikf/gallery-dl/issues/757)) - reset filenames on empty file extensions ([#733](https://github.com/mikf/gallery-dl/issues/733)) ## 1.13.6 - 2020-05-02 ### Additions - [patreon] respect filters and sort order in query parameters ([#711](https://github.com/mikf/gallery-dl/issues/711)) - [speakerdeck] add a new extractor for speakerdeck.com ([#726](https://github.com/mikf/gallery-dl/issues/726)) - [twitter] add `replies` option ([#705](https://github.com/mikf/gallery-dl/issues/705)) - [weibo] add `videos` option - [downloader:http] add MIME types for `.psd` files ([#714](https://github.com/mikf/gallery-dl/issues/714)) ### Fixes - [artstation] improve embed extraction ([#720](https://github.com/mikf/gallery-dl/issues/720)) - [deviantart] limit API wait times ([#721](https://github.com/mikf/gallery-dl/issues/721)) - [newgrounds] fix URLs produced by the `following` extractor ([#684](https://github.com/mikf/gallery-dl/issues/684)) - [patreon] improve file hash extraction ([#713](https://github.com/mikf/gallery-dl/issues/713)) - [vsco] fix user gallery extraction - fix/improve Cloudflare bypass code ([#728](https://github.com/mikf/gallery-dl/issues/728)) ## 1.13.5 - 2020-04-27 ### Additions - [500px] recognize `web.500px.com` URLs - [aryion] support downloading from folders ([#694](https://github.com/mikf/gallery-dl/issues/694)) - [furaffinity] add extractor for followed users ([#515](https://github.com/mikf/gallery-dl/issues/515)) - [hitomi] add extractor for tag searches ([#697](https://github.com/mikf/gallery-dl/issues/697)) - [instagram] add `post_id` and `num` metadata fields ([#698](https://github.com/mikf/gallery-dl/issues/698)) - [newgrounds] add extractor for followed users ([#684](https://github.com/mikf/gallery-dl/issues/684)) - [patreon] recognize URLs with creator IDs ([#711](https://github.com/mikf/gallery-dl/issues/711)) - [twitter] add `reply` metadata field ([#705](https://github.com/mikf/gallery-dl/issues/705)) - [xhamster] recognize `xhamster.porncache.net` URLs ([#700](https://github.com/mikf/gallery-dl/issues/700)) ### Fixes - [gelbooru] improve post ID extraction in pool listings - [hitomi] fix extraction of galleries without tags - [jaiminisbox] update metadata decoding procedure ([#702](https://github.com/mikf/gallery-dl/issues/702)) - [mastodon] fix pagination ([#701](https://github.com/mikf/gallery-dl/issues/701)) - [mastodon] improve account searches ([#704](https://github.com/mikf/gallery-dl/issues/704)) - [patreon] fix hash extraction from download URLs ([#693](https://github.com/mikf/gallery-dl/issues/693)) - improve parameter extraction when solving Cloudflare challenges ## 
1.13.4 - 2020-04-12 ### Additions - [aryion] add `gallery` and `post` extractors ([#390](https://github.com/mikf/gallery-dl/issues/390), [#673](https://github.com/mikf/gallery-dl/issues/673)) - [deviantart] detect and handle folders in sta.sh listings ([#659](https://github.com/mikf/gallery-dl/issues/659)) - [hentainexus] add `circle`, `event`, and `title_conventional` metadata fields ([#661](https://github.com/mikf/gallery-dl/issues/661)) - [hiperdex] add `artist` extractor ([#606](https://github.com/mikf/gallery-dl/issues/606)) - [mastodon] add access tokens for `mastodon.social` and `baraag.net` ([#665](https://github.com/mikf/gallery-dl/issues/665)) ### Changes - [deviantart] retrieve *all* download URLs through the OAuth API - automatically read config files in PyInstaller executable directories ([#682](https://github.com/mikf/gallery-dl/issues/682)) ### Fixes - [deviantart] handle "Request blocked" errors ([#655](https://github.com/mikf/gallery-dl/issues/655)) - [deviantart] improve JPEG quality replacement pattern - [hiperdex] fix extraction - [mastodon] handle API rate limits ([#665](https://github.com/mikf/gallery-dl/issues/665)) - [mastodon] update OAuth credentials for pawoo.net ([#665](https://github.com/mikf/gallery-dl/issues/665)) - [myportfolio] fix extraction of galleries without title - [piczel] fix extraction of single images - [vsco] fix collection extraction - [weibo] accept status URLs with non-numeric IDs ([#664](https://github.com/mikf/gallery-dl/issues/664)) ## 1.13.3 - 2020-03-28 ### Additions - [instagram] Add support for user's saved medias ([#644](https://github.com/mikf/gallery-dl/issues/644)) - [nozomi] support multiple images per post ([#646](https://github.com/mikf/gallery-dl/issues/646)) - [35photo] add `tag` extractor ### Changes - [mangadex] transform timestamps from `date` fields to datetime objects ### Fixes - [deviantart] handle decode errors for `extended_fetch` results ([#655](https://github.com/mikf/gallery-dl/issues/655)) - [e621] fix bug in API rate limiting and improve pagination ([#651](https://github.com/mikf/gallery-dl/issues/651)) - [instagram] update pattern for user profile URLs - [mangapark] fix metadata extraction - [nozomi] sort search results ([#646](https://github.com/mikf/gallery-dl/issues/646)) - [piczel] fix extraction - [twitter] fix typo in `x-twitter-auth-type` header ([#625](https://github.com/mikf/gallery-dl/issues/625)) - remove trailing dots from Windows directory names ([#647](https://github.com/mikf/gallery-dl/issues/647)) - fix crash with missing `stdout`/`stderr`/`stdin` handles ([#653](https://github.com/mikf/gallery-dl/issues/653)) ## 1.13.2 - 2020-03-14 ### Additions - [furaffinity] extract more metadata - [instagram] add `post_shortcode` metadata field ([#525](https://github.com/mikf/gallery-dl/issues/525)) - [kabeuchi] add extractor ([#561](https://github.com/mikf/gallery-dl/issues/561)) - [newgrounds] add extractor for favorited posts ([#394](https://github.com/mikf/gallery-dl/issues/394)) - [pixiv] implement `avatar` option ([#595](https://github.com/mikf/gallery-dl/issues/595), [#623](https://github.com/mikf/gallery-dl/issues/623)) - [twitter] add extractor for bookmarked Tweets ([#625](https://github.com/mikf/gallery-dl/issues/625)) ### Fixes - [bcy] reduce number of HTTP requests during data extraction - [e621] update to new interface ([#635](https://github.com/mikf/gallery-dl/issues/635)) - [exhentai] handle incomplete MIME types ([#632](https://github.com/mikf/gallery-dl/issues/632)) - [hitomi] improve 
metadata extraction - [mangoxo] fix login - [newgrounds] improve error handling when extracting post data ## 1.13.1 - 2020-03-01 ### Additions - [hentaihand] add extractors ([#605](https://github.com/mikf/gallery-dl/issues/605)) - [hiperdex] add chapter and manga extractors ([#606](https://github.com/mikf/gallery-dl/issues/606)) - [oauth] implement option to write DeviantArt refresh-tokens to cache ([#616](https://github.com/mikf/gallery-dl/issues/616)) - [downloader:http] add more MIME types for `.bmp` and `.rar` files ([#621](https://github.com/mikf/gallery-dl/issues/621), [#628](https://github.com/mikf/gallery-dl/issues/628)) - warn about expired cookies ### Fixes - [bcy] fix partial image URLs ([#613](https://github.com/mikf/gallery-dl/issues/613)) - [danbooru] fix Ugoira downloads and metadata - [deviantart] check availability of `/intermediary/` URLs ([#609](https://github.com/mikf/gallery-dl/issues/609)) - [hitomi] follow multiple redirects & fix image URLs - [piczel] improve and update - [tumblr] replace `-` with ` ` in tag searches ([#611](https://github.com/mikf/gallery-dl/issues/611)) - [vsco] update gallery URL pattern - fix `--verbose` and `--quiet` command-line options ## 1.13.0 - 2020-02-16 ### Additions - Support for - `furaffinity` - https://www.furaffinity.net/ ([#284](https://github.com/mikf/gallery-dl/issues/284)) - `8kun` - https://8kun.top/ ([#582](https://github.com/mikf/gallery-dl/issues/582)) - `bcy` - https://bcy.net/ ([#592](https://github.com/mikf/gallery-dl/issues/592)) - [blogger] implement video extraction ([#587](https://github.com/mikf/gallery-dl/issues/587)) - [oauth] add option to specify port number used by local server ([#604](https://github.com/mikf/gallery-dl/issues/604)) - [pixiv] add `rating` metadata field ([#595](https://github.com/mikf/gallery-dl/issues/595)) - [pixiv] recognize tags at the end of new bookmark URLs - [reddit] add `videos` option - [weibo] use youtube-dl to download from m3u8 manifests - implement `parent-directory` option ([#551](https://github.com/mikf/gallery-dl/issues/551)) - extend filename formatting capabilities: - implement field name alternatives ([#525](https://github.com/mikf/gallery-dl/issues/525)) - allow multiple "special" format specifiers per replacement field ([#595](https://github.com/mikf/gallery-dl/issues/595)) - allow for numeric list and string indices ### Changes - [reddit] handle reddit-hosted images and videos natively ([#551](https://github.com/mikf/gallery-dl/issues/551)) - [twitter] change default value for `videos` to `true` ### Fixes - [cloudflare] unescape challenge URLs - [deviantart] fix video extraction from `extended_fetch` results - [hitomi] implement workaround for "broken" redirects - [khinsider] fix and improve metadata extraction - [patreon] filter duplicate files per post ([#590](https://github.com/mikf/gallery-dl/issues/590)) - [piczel] fix extraction - [pixiv] fix user IDs for bookmarks API calls ([#596](https://github.com/mikf/gallery-dl/issues/596)) - [sexcom] fix image URLs - [twitter] force old login page layout ([#584](https://github.com/mikf/gallery-dl/issues/584), [#598](https://github.com/mikf/gallery-dl/issues/598)) - [vsco] skip "invalid" entities - improve functions to load/save cookies.txt files ([#586](https://github.com/mikf/gallery-dl/issues/586)) ### Removals - [yaplog] remove module ## 1.12.3 - 2020-01-19 ### Additions - [hentaifoundry] extract more metadata ([#565](https://github.com/mikf/gallery-dl/issues/565)) - [twitter] add option to extract TwitPic embeds 
([#579](https://github.com/mikf/gallery-dl/issues/579)) - implement a post-processor module to compare file versions ([#530](https://github.com/mikf/gallery-dl/issues/530)) ### Fixes - [hitomi] update image URL generation - [mangadex] revert domain to `mangadex.org` - [pinterest] improve detection of invalid pin.it links - [pixiv] update URL patterns for user profiles and bookmarks ([#568](https://github.com/mikf/gallery-dl/issues/568)) - [twitter] Fix stop before real end ([#573](https://github.com/mikf/gallery-dl/issues/573)) - remove temp files before downloading from fallback URLs ### Removals - [erolord] remove extractor ## 1.12.2 - 2020-01-05 ### Additions - [deviantart] match new search/popular URLs ([#538](https://github.com/mikf/gallery-dl/issues/538)) - [deviantart] match `/favourites/all` URLs ([#555](https://github.com/mikf/gallery-dl/issues/555)) - [deviantart] add extractor for followed users ([#515](https://github.com/mikf/gallery-dl/issues/515)) - [pixiv] support listing followed users ([#515](https://github.com/mikf/gallery-dl/issues/515)) - [imagefap] handle beta.imagefap.com URLs ([#552](https://github.com/mikf/gallery-dl/issues/552)) - [postprocessor:metadata] add `directory` option ([#520](https://github.com/mikf/gallery-dl/issues/520)) ### Fixes - [artstation] fix search result pagination ([#537](https://github.com/mikf/gallery-dl/issues/537)) - [directlink] send Referer headers ([#536](https://github.com/mikf/gallery-dl/issues/536)) - [exhentai] restrict default directory name length ([#545](https://github.com/mikf/gallery-dl/issues/545)) - [mangadex] change domain to mangadex.cc ([#559](https://github.com/mikf/gallery-dl/issues/559)) - [mangahere] send `isAdult` cookies ([#556](https://github.com/mikf/gallery-dl/issues/556)) - [newgrounds] fix tags metadata extraction - [pixiv] retry after rate limit errors ([#535](https://github.com/mikf/gallery-dl/issues/535)) - [twitter] handle quoted tweets ([#526](https://github.com/mikf/gallery-dl/issues/526)) - [twitter] handle API rate limits ([#526](https://github.com/mikf/gallery-dl/issues/526)) - [twitter] fix URLs forwarded to youtube-dl ([#540](https://github.com/mikf/gallery-dl/issues/540)) - prevent infinite recursion when spawning new extractors ([#489](https://github.com/mikf/gallery-dl/issues/489)) - improve output of `--list-keywords` for "parent" extractors ([#548](https://github.com/mikf/gallery-dl/issues/548)) - provide fallback for SQLite versions with missing `WITHOUT ROWID` support ([#553](https://github.com/mikf/gallery-dl/issues/553)) ## 1.12.1 - 2019-12-22 ### Additions - [4chan] add extractor for entire boards ([#510](https://github.com/mikf/gallery-dl/issues/510)) - [realbooru] add extractors for pools, posts, and tag searches ([#514](https://github.com/mikf/gallery-dl/issues/514)) - [instagram] implement a `videos` option ([#521](https://github.com/mikf/gallery-dl/issues/521)) - [vsco] implement a `videos` option - [postprocessor:metadata] implement a `bypost` option for downloading the metadata of an entire post ([#511](https://github.com/mikf/gallery-dl/issues/511)) ### Changes - [reddit] change the default value for `comments` to `0` - [vsco] improve image resolutions - make filesystem-related errors during file downloads non-fatal ([#512](https://github.com/mikf/gallery-dl/issues/512)) ### Fixes - [foolslide] add fallback for chapter data extraction - [instagram] ignore errors during post-page extraction - [patreon] avoid errors when fetching user info 
([#508](https://github.com/mikf/gallery-dl/issues/508)) - [patreon] improve URL pattern for single posts - [reddit] fix errors with `t1` submissions - [vsco] fix user profile extraction … again - [weibo] handle unavailable/deleted statuses - [downloader:http] improve rate limit handling - retain trailing zeroes in Cloudflare challenge answers ## 1.12.0 - 2019-12-08 ### Additions - [flickr] support 3k, 4k, 5k, and 6k photo sizes ([#472](https://github.com/mikf/gallery-dl/issues/472)) - [imgur] add extractor for subreddit links ([#500](https://github.com/mikf/gallery-dl/issues/500)) - [newgrounds] add extractors for `audio` listings and general `media` files ([#394](https://github.com/mikf/gallery-dl/issues/394)) - [newgrounds] implement login support ([#394](https://github.com/mikf/gallery-dl/issues/394)) - [postprocessor:metadata] implement an `extension-format` option ([#477](https://github.com/mikf/gallery-dl/issues/477)) - `--exec-after` ### Changes - [deviantart] ensure consistent username capitalization ([#455](https://github.com/mikf/gallery-dl/issues/455)) - [directlink] split `{path}` into `{path}/{filename}.{extension}` - [twitter] update metadata fields with user/author information - [postprocessor:metadata] filter private entries & rename `format` to `content-format` - Enable `cookies-update` by default ### Fixes - [2chan] fix metadata extraction - [behance] get images from 'media_collection' modules - [bobx] fix image downloads by randomly generating session cookies ([#482](https://github.com/mikf/gallery-dl/issues/482)) - [deviantart] revert to getting download URLs from OAuth API calls ([#488](https://github.com/mikf/gallery-dl/issues/488)) - [deviantart] fix URL generation from '/extended_fetch' results ([#505](https://github.com/mikf/gallery-dl/issues/505)) - [flickr] adjust OAuth redirect URI ([#503](https://github.com/mikf/gallery-dl/issues/503)) - [hentaifox] fix extraction - [imagefap] adapt to new image URL format - [imgbb] fix error in galleries without user info ([#471](https://github.com/mikf/gallery-dl/issues/471)) - [instagram] prevent errors with missing 'video_url' fields ([#479](https://github.com/mikf/gallery-dl/issues/479)) - [nijie] fix `date` parsing - [pixiv] match new search URLs ([#507](https://github.com/mikf/gallery-dl/issues/507)) - [plurk] fix comment pagination - [sexcom] send specific Referer headers when downloading videos - [twitter] fix infinite loops ([#499](https://github.com/mikf/gallery-dl/issues/499)) - [vsco] fix user profile and collection extraction ([#480](https://github.com/mikf/gallery-dl/issues/480)) - Fix Cloudflare DDoS protection bypass ### Removals - `--abort-on-skip` ## 1.11.1 - 2019-11-09 ### Fixes - Fix inclusion of bash completion and man pages in source distributions ## 1.11.0 - 2019-11-08 ### Additions - Support for - `blogger` - https://www.blogger.com/ ([#364](https://github.com/mikf/gallery-dl/issues/364)) - `nozomi` - https://nozomi.la/ ([#388](https://github.com/mikf/gallery-dl/issues/388)) - `issuu` - https://issuu.com/ ([#413](https://github.com/mikf/gallery-dl/issues/413)) - `naver` - https://blog.naver.com/ ([#447](https://github.com/mikf/gallery-dl/issues/447)) - Extractor for `twitter` search results ([#448](https://github.com/mikf/gallery-dl/issues/448)) - Extractor for `deviantart` user profiles with configurable targets ([#377](https://github.com/mikf/gallery-dl/issues/377), [#419](https://github.com/mikf/gallery-dl/issues/419)) - `--ugoira-conv-lossless` ([#432](https://github.com/mikf/gallery-dl/issues/432)) (see the example below) 
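As an illustration of the command-line flags mentioned above (`--ugoira-conv-lossless` from 1.11.0 and `--exec-after` from 1.12.0), a minimal usage sketch; the pixiv URL and the archive name are placeholders, and `./gallery-dl/` is only the default download directory:

```sh
# Sketch only: the URL and archive name are placeholders; adjust for your setup.

# download an ugoira post and convert the animation losslessly (requires FFmpeg)
gallery-dl --ugoira-conv-lossless "https://www.pixiv.net/en/artworks/12345678"

# run a single command once after *all* files have finished downloading
gallery-dl --exec-after "zip -r backup.zip gallery-dl/" "https://www.pixiv.net/en/artworks/12345678"
```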
- `cookies-update` option to allow updating cookies.txt files ([#445](https://github.com/mikf/gallery-dl/issues/445)) - Optional `cloudflare` and `video` installation targets ([#460](https://github.com/mikf/gallery-dl/issues/460)) - Allow executing commands with the `exec` post-processor after all files are downloaded ([#413](https://github.com/mikf/gallery-dl/issues/413), [#421](https://github.com/mikf/gallery-dl/issues/421)) ### Changes - Rewrite `imgur` using its public API ([#446](https://github.com/mikf/gallery-dl/issues/446)) - Rewrite `luscious` using GraphQL queries ([#457](https://github.com/mikf/gallery-dl/issues/457)) - Adjust default `nijie` filenames to match `pixiv` - Change enumeration index for gallery extractors from `page` to `num` - Return non-zero exit status when errors occurred - Forward proxy settings to youtube-dl downloader - Install bash completion script into `share/bash-completion/completions` ### Fixes - Adapt to new `instagram` page layout when logged in ([#391](https://github.com/mikf/gallery-dl/issues/391)) - Support protected `twitter` videos ([#452](https://github.com/mikf/gallery-dl/issues/452)) - Extend `hitomi` URL pattern and fix gallery extraction - Restore OAuth2 authentication error messages - Miscellaneous fixes for `patreon` ([#444](https://github.com/mikf/gallery-dl/issues/444)), `deviantart` ([#455](https://github.com/mikf/gallery-dl/issues/455)), `sexcom` ([#464](https://github.com/mikf/gallery-dl/issues/464)), `imgur` ([#467](https://github.com/mikf/gallery-dl/issues/467)), `simplyhentai` ## 1.10.6 - 2019-10-11 ### Additions - `--exec` command-line option to specify a command to run after each file download ([#421](https://github.com/mikf/gallery-dl/issues/421)) ### Changes - Include titles in `gfycat` default filenames ([#434](https://github.com/mikf/gallery-dl/issues/434)) ### Fixes - Fetch working download URLs for `deviantart` ([#436](https://github.com/mikf/gallery-dl/issues/436)) - Various fixes and improvements for `yaplog` blogs ([#443](https://github.com/mikf/gallery-dl/issues/443)) - Fix image URL generation for `hitomi` galleries - Miscellaneous fixes for `behance` and `xvideos` ## 1.10.5 - 2019-09-28 ### Additions - `instagram.highlights` option to include highlighted stories when downloading user profiles ([#329](https://github.com/mikf/gallery-dl/issues/329)) - Support for `/user/` URLs on `reddit` ([#350](https://github.com/mikf/gallery-dl/issues/350)) - Support for `imgur` user profiles and favorites ([#420](https://github.com/mikf/gallery-dl/issues/420)) - Additional metadata fields on `nijie`([#423](https://github.com/mikf/gallery-dl/issues/423)) ### Fixes - Improve handling of private `deviantart` artworks ([#414](https://github.com/mikf/gallery-dl/issues/414)) and 429 status codes ([#424](https://github.com/mikf/gallery-dl/issues/424)) - Prevent fatal errors when trying to open download-archive files ([#417](https://github.com/mikf/gallery-dl/issues/417)) - Detect and ignore unavailable videos on `weibo` ([#427](https://github.com/mikf/gallery-dl/issues/427)) - Update the `scope` of new `reddit` refresh-tokens ([#428](https://github.com/mikf/gallery-dl/issues/428)) - Fix inconsistencies with the `reddit.comments` option ([#429](https://github.com/mikf/gallery-dl/issues/429)) - Extend URL patterns for `hentaicafe` manga and `pixiv` artworks - Improve detection of unavailable albums on `luscious` and `imgbb` - Miscellaneous fixes for `tsumino` ## 1.10.4 - 2019-09-08 ### Additions - Support for - `lineblog` - 
https://www.lineblog.me/ ([#404](https://github.com/mikf/gallery-dl/issues/404)) - `fuskator` - https://fuskator.com/ ([#407](https://github.com/mikf/gallery-dl/issues/407)) - `ugoira` option for `danbooru` to download pre-rendered ugoira animations ([#406](https://github.com/mikf/gallery-dl/issues/406)) ### Fixes - Download the correct files from `twitter` replies ([#403](https://github.com/mikf/gallery-dl/issues/403)) - Prevent crash when trying to use unavailable downloader modules ([#405](https://github.com/mikf/gallery-dl/issues/405)) - Fix `pixiv` authentication ([#411](https://github.com/mikf/gallery-dl/issues/411)) - Improve `exhentai` image limit checks - Miscellaneous fixes for `hentaicafe`, `simplyhentai`, `tumblr` ## 1.10.3 - 2019-08-30 ### Additions - Provide `filename` metadata for all `deviantart` files ([#392](https://github.com/mikf/gallery-dl/issues/392), [#400](https://github.com/mikf/gallery-dl/issues/400)) - Implement a `ytdl.outtmpl` option to let youtube-dl handle filenames by itself ([#395](https://github.com/mikf/gallery-dl/issues/395)) - Support `seiga` mobile URLs ([#401](https://github.com/mikf/gallery-dl/issues/401)) ### Fixes - Extract more than the first 32 posts from `piczel` galleries ([#396](https://github.com/mikf/gallery-dl/issues/396)) - Fix filenames of archives created with `--zip` ([#397](https://github.com/mikf/gallery-dl/issues/397)) - Skip unavailable images and videos on `flickr` ([#398](https://github.com/mikf/gallery-dl/issues/398)) - Fix filesystem paths on Windows with Python 3.6 and lower ([#402](https://github.com/mikf/gallery-dl/issues/402)) ## 1.10.2 - 2019-08-23 ### Additions - Support for `instagram` stories and IGTV ([#371](https://github.com/mikf/gallery-dl/issues/371), [#373](https://github.com/mikf/gallery-dl/issues/373)) - Support for individual `imgbb` images ([#363](https://github.com/mikf/gallery-dl/issues/363)) - `deviantart.quality` option to set the JPEG compression quality for newer images ([#369](https://github.com/mikf/gallery-dl/issues/369)) - `enumerate` option for `extractor.skip` ([#306](https://github.com/mikf/gallery-dl/issues/306)) - `adjust-extensions` option to control filename extension adjustments - `path-remove` option to remove control characters etc. 
from filesystem paths ### Changes - Rename `restrict-filenames` to `path-restrict` - Adjust `pixiv` metadata and default filename format ([#366](https://github.com/mikf/gallery-dl/issues/366)) - Set `filename` to `"{category}_{user[id]}_{id}{suffix}.{extension}"` to restore the old default - Improve and optimize directory and filename generation ### Fixes - Allow the `classify` post-processor to handle files with unknown filename extension ([#138](https://github.com/mikf/gallery-dl/issues/138)) - Fix rate limit handling for OAuth APIs ([#368](https://github.com/mikf/gallery-dl/issues/368)) - Fix artwork and scraps extraction on `deviantart` ([#376](https://github.com/mikf/gallery-dl/issues/376), [#392](https://github.com/mikf/gallery-dl/issues/392)) - Distinguish between `imgur` album and gallery URLs ([#380](https://github.com/mikf/gallery-dl/issues/380)) - Prevent crash when using `--ugoira-conv` ([#382](https://github.com/mikf/gallery-dl/issues/382)) - Handle multi-image posts on `patreon` ([#383](https://github.com/mikf/gallery-dl/issues/383)) - Miscellaneous fixes for `*reactor`, `simplyhentai` ## 1.10.1 - 2019-08-02 ### Fixes - Use the correct domain for exhentai.org input URLs ## 1.10.0 - 2019-08-01 ### Warning - Prior to version 1.10.0 all cache files were created world-readable (mode `644`) leading to possible sensitive information disclosure on multi-user systems - It is recommended to restrict access permissions of already existing files (`/tmp/.gallery-dl.cache`) with `chmod 600` - Windows users should not be affected ### Additions - Support for - `vsco` - https://vsco.co/ ([#331](https://github.com/mikf/gallery-dl/issues/331)) - `imgbb` - https://imgbb.com/ ([#361](https://github.com/mikf/gallery-dl/issues/361)) - `adultempire` - https://www.adultempire.com/ ([#340](https://github.com/mikf/gallery-dl/issues/340)) - `restrict-filenames` option to create Windows-compatible filenames on any platform ([#348](https://github.com/mikf/gallery-dl/issues/348)) - `forward-cookies` option to control cookie forwarding to youtube-dl ([#352](https://github.com/mikf/gallery-dl/issues/352)) ### Changes - The default cache file location on non-Windows systems is now - `$XDG_CACHE_HOME/gallery-dl/cache.sqlite3` or - `~/.cache/gallery-dl/cache.sqlite3` - New cache files are created with mode `600` - `exhentai` extractors will always use `e-hentai.org` as domain ### Fixes - Better handling of `exhentai` image limits and errors ([#356](https://github.com/mikf/gallery-dl/issues/356), [#360](https://github.com/mikf/gallery-dl/issues/360)) - Try to prevent ZIP file corruption ([#355](https://github.com/mikf/gallery-dl/issues/355)) - Miscellaneous fixes for `behance`, `ngomik` ## 1.9.0 - 2019-07-19 ### Additions - Support for - `erolord` - http://erolord.com/ ([#326](https://github.com/mikf/gallery-dl/issues/326)) - Add login support for `instagram` ([#195](https://github.com/mikf/gallery-dl/issues/195)) - Add `--no-download` and `extractor.*.download` options to disable file downloads ([#220](https://github.com/mikf/gallery-dl/issues/220)) - Add `-A/--abort` to specify the number of consecutive download skips before aborting - Interpret `-1` as infinite retries ([#300](https://github.com/mikf/gallery-dl/issues/300)) - Implement custom log message formats per log-level ([#304](https://github.com/mikf/gallery-dl/issues/304)) - Implement an `mtime` post-processor that sets file modification times according to metadata fields ([#332](https://github.com/mikf/gallery-dl/issues/332)) - Implement a `twitter.content` 
option to enable tweet text extraction ([#333](https://github.com/mikf/gallery-dl/issues/333), [#338](https://github.com/mikf/gallery-dl/issues/338)) - Enable `date-min/-max/-format` options for `tumblr` ([#337](https://github.com/mikf/gallery-dl/issues/337)) ### Changes - Set file modification times according to their `Last-Modified` header when downloading ([#236](https://github.com/mikf/gallery-dl/issues/236), [#277](https://github.com/mikf/gallery-dl/issues/277)) - Use `--no-mtime` or `downloader.*.mtime` to disable this behavior - Duplicate download URLs are no longer silently ignored (controllable with `extractor.*.image-unique`) - Deprecate `--abort-on-skip` ### Fixes - Retry downloads on OpenSSL exceptions ([#324](https://github.com/mikf/gallery-dl/issues/324)) - Ignore unavailable pins on `sexcom` instead of raising an exception ([#325](https://github.com/mikf/gallery-dl/issues/325)) - Use Firefox's SSL/TLS ciphers to prevent Cloudflare CAPTCHAs ([#342](https://github.com/mikf/gallery-dl/issues/342)) - Improve folder name matching on `deviantart` ([#343](https://github.com/mikf/gallery-dl/issues/343)) - Forward cookies to `youtube-dl` to allow downloading private videos - Miscellaneous fixes for `35photo`, `500px`, `newgrounds`, `simplyhentai` ## 1.8.7 - 2019-06-28 ### Additions - Support for - `vanillarock` - https://vanilla-rock.com/ ([#254](https://github.com/mikf/gallery-dl/issues/254)) - `nsfwalbum` - https://nsfwalbum.com/ ([#287](https://github.com/mikf/gallery-dl/issues/287)) - `artist` and `tags` metadata for `hentaicafe` ([#238](https://github.com/mikf/gallery-dl/issues/238)) - `description` metadata for `instagram` ([#310](https://github.com/mikf/gallery-dl/issues/310)) - Format string option to replace a substring with another - `R//` ([#318](https://github.com/mikf/gallery-dl/issues/318)) ### Changes - Delete empty archives created by the `zip` post-processor ([#316](https://github.com/mikf/gallery-dl/issues/316)) ### Fixes - Handle `hitomi` Game CG galleries correctly ([#321](https://github.com/mikf/gallery-dl/issues/321)) - Miscellaneous fixes for `deviantart`, `hitomi`, `pururin`, `kissmanga`, `keenspot`, `mangoxo`, `imagefap` ## 1.8.6 - 2019-06-14 ### Additions - Support for - `slickpic` - https://www.slickpic.com/ ([#249](https://github.com/mikf/gallery-dl/issues/249)) - `xhamster` - https://xhamster.com/ ([#281](https://github.com/mikf/gallery-dl/issues/281)) - `pornhub` - https://www.pornhub.com/ ([#282](https://github.com/mikf/gallery-dl/issues/282)) - `8muses` - https://www.8muses.com/ ([#305](https://github.com/mikf/gallery-dl/issues/305)) - `extra` option for `deviantart` to download Sta.sh content linked in description texts ([#302](https://github.com/mikf/gallery-dl/issues/302)) ### Changes - Detect `directlink` URLs with upper case filename extensions ([#296](https://github.com/mikf/gallery-dl/issues/296)) ### Fixes - Improved error handling for `tumblr` API calls ([#297](https://github.com/mikf/gallery-dl/issues/297)) - Fixed extraction of `livedoor` blogs ([#301](https://github.com/mikf/gallery-dl/issues/301)) - Fixed extraction of special `deviantart` Sta.sh items ([#307](https://github.com/mikf/gallery-dl/issues/307)) - Fixed pagination for specific `keenspot` comics ## 1.8.5 - 2019-06-01 ### Additions - Support for - `keenspot` - http://keenspot.com/ ([#223](https://github.com/mikf/gallery-dl/issues/223)) - `sankakucomplex` - https://www.sankakucomplex.com ([#258](https://github.com/mikf/gallery-dl/issues/258)) - `folders` option for `deviantart` to 
add a list of containing folders to each file ([#276](https://github.com/mikf/gallery-dl/issues/276)) - `captcha` option for `kissmanga` and `readcomiconline` to control CAPTCHA handling ([#279](https://github.com/mikf/gallery-dl/issues/279)) - `filename` metadata for files downloaded with youtube-dl ([#291](https://github.com/mikf/gallery-dl/issues/291)) ### Changes - Adjust `wallhaven` extractors to new page layout: - use API and add `api-key` option - removed traditional login support - Provide original filenames for `patreon` downloads ([#268](https://github.com/mikf/gallery-dl/issues/268)) - Use e-hentai.org or exhentai.org depending on input URL ([#278](https://github.com/mikf/gallery-dl/issues/278)) ### Fixes - Fix pagination over `sankaku` popular listings ([#265](https://github.com/mikf/gallery-dl/issues/265)) - Fix folder and collection extraction on `deviantart` ([#271](https://github.com/mikf/gallery-dl/issues/271)) - Detect "AreYouHuman" redirects on `readcomiconline` ([#279](https://github.com/mikf/gallery-dl/issues/279)) - Miscellaneous fixes for `hentainexus`, `livedoor`, `ngomik` ## 1.8.4 - 2019-05-17 ### Additions - Support for - `patreon` - https://www.patreon.com/ ([#226](https://github.com/mikf/gallery-dl/issues/226)) - `hentainexus` - https://hentainexus.com/ ([#256](https://github.com/mikf/gallery-dl/issues/256)) - `date` metadata fields for `pixiv` ([#248](https://github.com/mikf/gallery-dl/issues/248)), `instagram` ([#250](https://github.com/mikf/gallery-dl/issues/250)), `exhentai`, and `newgrounds` ### Changes - Improved `flickr` metadata and video extraction ([#246](https://github.com/mikf/gallery-dl/issues/246)) ### Fixes - Download original GIF animations from `deviantart` ([#242](https://github.com/mikf/gallery-dl/issues/242)) - Ignore missing `edge_media_to_comment` fields on `instagram` ([#250](https://github.com/mikf/gallery-dl/issues/250)) - Fix serialization of `datetime` objects for `--write-metadata` ([#251](https://github.com/mikf/gallery-dl/issues/251), [#252](https://github.com/mikf/gallery-dl/issues/252)) - Allow multiple post-processor command-line options at once ([#253](https://github.com/mikf/gallery-dl/issues/253)) - Prevent crash on `booru` sites when no tags are available ([#259](https://github.com/mikf/gallery-dl/issues/259)) - Fix extraction on `instagram` after `rhx_gis` field removal ([#266](https://github.com/mikf/gallery-dl/issues/266)) - Avoid Cloudflare CAPTCHAs for Python interpreters built against OpenSSL < 1.1.1 - Miscellaneous fixes for `luscious` ## 1.8.3 - 2019-05-04 ### Additions - Support for - `plurk` - https://www.plurk.com/ ([#212](https://github.com/mikf/gallery-dl/issues/212)) - `sexcom` - https://www.sex.com/ ([#147](https://github.com/mikf/gallery-dl/issues/147)) - `--clear-cache` - `date` metadata fields for `deviantart`, `twitter`, and `tumblr` ([#224](https://github.com/mikf/gallery-dl/issues/224), [#232](https://github.com/mikf/gallery-dl/issues/232)) ### Changes - Standalone executables are now built using PyInstaller: - uses the latest CPython interpreter (Python 3.7.3) - available on several platforms (Windows, Linux, macOS) - includes the `certifi` CA bundle, `youtube-dl`, and `pyOpenSSL` on Windows ### Fixes - Patch `urllib3`'s default list of SSL/TLS ciphers to prevent Cloudflare CAPTCHAs ([#227](https://github.com/mikf/gallery-dl/issues/227)) (Windows users need to install `pyOpenSSL` for this to take effect) - Provide fallback URLs for `twitter` images ([#237](https://github.com/mikf/gallery-dl/issues/237)) 
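The extractor options named in the 1.8.5/1.8.6 entries above (`folders` and `extra` for `deviantart`, `api-key` for `wallhaven`) are plain config-file switches; a minimal configuration sketch, assuming the default `~/.config/gallery-dl/config.json` location, with illustrative values only (note that this overwrites any existing config file):

```sh
# Sketch only: "your-api-key-here" is a placeholder, and the option values are examples.
mkdir -p ~/.config/gallery-dl
cat > ~/.config/gallery-dl/config.json <<'EOF'
{
    "extractor": {
        "deviantart": {
            "folders": true,
            "extra": true
        },
        "wallhaven": {
            "api-key": "your-api-key-here"
        }
    }
}
EOF
```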
- Send `Referer` headers when downloading from `hitomi` ([#239](https://github.com/mikf/gallery-dl/issues/239)) - Updated login procedure on `mangoxo` ## 1.8.2 - 2019-04-12 ### Additions - Support for - `pixnet` - https://www.pixnet.net/ ([#177](https://github.com/mikf/gallery-dl/issues/177)) - `wikiart` - https://www.wikiart.org/ ([#179](https://github.com/mikf/gallery-dl/issues/179)) - `mangoxo` - https://www.mangoxo.com/ ([#184](https://github.com/mikf/gallery-dl/issues/184)) - `yaplog` - https://yaplog.jp/ ([#190](https://github.com/mikf/gallery-dl/issues/190)) - `livedoor` - http://blog.livedoor.jp/ ([#190](https://github.com/mikf/gallery-dl/issues/190)) - Login support for `mangoxo` ([#184](https://github.com/mikf/gallery-dl/issues/184)) and `twitter` ([#214](https://github.com/mikf/gallery-dl/issues/214)) ### Changes - Increased required `Requests` version to 2.11.0 ### Fixes - Improved image quality on `reactor` sites ([#210](https://github.com/mikf/gallery-dl/issues/210)) - Support `imagebam` galleries with more than 100 images ([#219](https://github.com/mikf/gallery-dl/issues/219)) - Updated Cloudflare bypass code ## 1.8.1 - 2019-03-29 ### Additions - Support for: - `35photo` - https://35photo.pro/ ([#162](https://github.com/mikf/gallery-dl/issues/162)) - `500px` - https://500px.com/ ([#185](https://github.com/mikf/gallery-dl/issues/185)) - `instagram` extractor for hashtags ([#202](https://github.com/mikf/gallery-dl/issues/202)) - Option to get more metadata on `deviantart` ([#189](https://github.com/mikf/gallery-dl/issues/189)) - Man pages and bash completion ([#150](https://github.com/mikf/gallery-dl/issues/150)) - Snap improvements ([#197](https://github.com/mikf/gallery-dl/issues/197), [#199](https://github.com/mikf/gallery-dl/issues/199), [#207](https://github.com/mikf/gallery-dl/issues/207)) ### Changes - Better FFmpeg arguments for `--ugoira-conv` - Adjusted metadata for `luscious` albums ### Fixes - Proper handling of `instagram` multi-image posts ([#178](https://github.com/mikf/gallery-dl/issues/178), [#201](https://github.com/mikf/gallery-dl/issues/201)) - Fixed `tumblr` avatar URLs when not using OAuth1.0 ([#193](https://github.com/mikf/gallery-dl/issues/193)) - Miscellaneous fixes for `exhentai`, `komikcast` ## 1.8.0 - 2019-03-15 ### Additions - Support for: - `weibo` - https://www.weibo.com/ - `pururin` - https://pururin.io/ ([#174](https://github.com/mikf/gallery-dl/issues/174)) - `fashionnova` - https://www.fashionnova.com/ ([#175](https://github.com/mikf/gallery-dl/issues/175)) - `shopify` sites in general ([#175](https://github.com/mikf/gallery-dl/issues/175)) - Snap packaging ([#169](https://github.com/mikf/gallery-dl/issues/169), [#170](https://github.com/mikf/gallery-dl/issues/170), [#187](https://github.com/mikf/gallery-dl/issues/187), [#188](https://github.com/mikf/gallery-dl/issues/188)) - Automatic Cloudflare DDoS protection bypass - Extractor and Job information for logging format strings - `dynastyscans` image and search extractors ([#163](https://github.com/mikf/gallery-dl/issues/163)) - `deviantart` scraps extractor ([#168](https://github.com/mikf/gallery-dl/issues/168)) - `artstation` extractor for artwork listings ([#172](https://github.com/mikf/gallery-dl/issues/172)) - `smugmug` video support and improved image format selection ([#183](https://github.com/mikf/gallery-dl/issues/183)) ### Changes - More metadata for `nhentai` galleries - Combined `myportfolio` extractors into one - Renamed `name` metadata field to `filename` and removed the original 
`filename` field - Simplified and improved internal data structures - Optimized creation of child extractors ### Fixes - Filter empty `tumblr` URLs ([#165](https://github.com/mikf/gallery-dl/issues/165)) - Filter ads and improve connection speed on `hentaifoundry` - Show proper error messages if `luscious` galleries are unavailable - Miscellaneous fixes for `mangahere`, `ngomik`, `simplyhentai`, `imgspice` ### Removals - `seaotterscans` ## 1.7.0 - 2019-02-05 - Added support for: - `photobucket` - http://photobucket.com/ ([#117](https://github.com/mikf/gallery-dl/issues/117)) - `hentaifox` - https://hentaifox.com/ ([#160](https://github.com/mikf/gallery-dl/issues/160)) - `tsumino` - https://www.tsumino.com/ ([#161](https://github.com/mikf/gallery-dl/issues/161)) - Added the ability to dynamically generate extractors based on a user's config file for - [`mastodon`](https://github.com/tootsuite/mastodon) instances ([#144](https://github.com/mikf/gallery-dl/issues/144)) - [`foolslide`](https://github.com/FoolCode/FoOlSlide) based sites - [`foolfuuka`](https://github.com/FoolCode/FoolFuuka) based archives - Added an extractor for `behance` collections ([#157](https://github.com/mikf/gallery-dl/issues/157)) - Added login support for `luscious` ([#159](https://github.com/mikf/gallery-dl/issues/159)) and `tsumino` ([#161](https://github.com/mikf/gallery-dl/issues/161)) - Added an option to stop downloading if the `exhentai` image limit is exceeded ([#141](https://github.com/mikf/gallery-dl/issues/141)) - Fixed extraction issues for `behance` and `mangapark` ## 1.6.3 - 2019-01-18 - Added `metadata` post-processor to write image metadata to an external file ([#135](https://github.com/mikf/gallery-dl/issues/135)) - Added option to reverse chapter order of manga extractors ([#149](https://github.com/mikf/gallery-dl/issues/149)) - Added authentication support for `danbooru` ([#151](https://github.com/mikf/gallery-dl/issues/151)) - Added tag metadata for `exhentai` and `hbrowse` galleries - Improved `*reactor` extractors ([#148](https://github.com/mikf/gallery-dl/issues/148)) - Fixed extraction issues for `nhentai` ([#156](https://github.com/mikf/gallery-dl/issues/156)), `pinterest`, `mangapark` ## 1.6.2 - 2019-01-01 - Added support for: - `instagram` - https://www.instagram.com/ ([#134](https://github.com/mikf/gallery-dl/issues/134)) - Added support for multiple items on sta.sh pages ([#113](https://github.com/mikf/gallery-dl/issues/113)) - Added option to download `tumblr` avatars ([#137](https://github.com/mikf/gallery-dl/issues/137)) - Changed defaults for visited post types and inline media on `tumblr` - Improved inline extraction of `tumblr` posts ([#133](https://github.com/mikf/gallery-dl/issues/133), [#137](https://github.com/mikf/gallery-dl/issues/137)) - Improved error handling and retry behavior of all API calls - Improved handling of missing fields in format strings ([#136](https://github.com/mikf/gallery-dl/issues/136)) - Fixed hash extraction for unusual `tumblr` URLs ([#129](https://github.com/mikf/gallery-dl/issues/129)) - Fixed image subdomains for `hitomi` galleries ([#142](https://github.com/mikf/gallery-dl/issues/142)) - Fixed and improved miscellaneous issues for `kissmanga` ([#20](https://github.com/mikf/gallery-dl/issues/20)), `luscious`, `mangapark`, `readcomiconline` ## 1.6.1 - 2018-11-28 - Added support for: - `joyreactor` - http://joyreactor.cc/ ([#114](https://github.com/mikf/gallery-dl/issues/114)) - `pornreactor` - http://pornreactor.cc/ 
([#114](https://github.com/mikf/gallery-dl/issues/114)) - `newgrounds` - https://www.newgrounds.com/ ([#119](https://github.com/mikf/gallery-dl/issues/119)) - Added extractor for search results on `luscious` ([#127](https://github.com/mikf/gallery-dl/issues/127)) - Fixed filenames of ZIP archives ([#126](https://github.com/mikf/gallery-dl/issues/126)) - Fixed extraction issues for `gfycat`, `hentaifoundry` ([#125](https://github.com/mikf/gallery-dl/issues/125)), `mangafox` ## 1.6.0 - 2018-11-17 - Added support for: - `wallhaven` - https://alpha.wallhaven.cc/ - `yuki` - https://yuki.la/ - Added youtube-dl integration and video downloads for `twitter` ([#99](https://github.com/mikf/gallery-dl/issues/99)), `behance`, `artstation` - Added per-extractor options for network connections (`retries`, `timeout`, `verify`) - Added a `--no-check-certificate` command-line option - Added ability to specify the number of skipped downloads before aborting/exiting ([#115](https://github.com/mikf/gallery-dl/issues/115)) - Added extractors for scraps, favorites, popular and recent images on `hentaifoundry` ([#110](https://github.com/mikf/gallery-dl/issues/110)) - Improved login procedure for `pixiv` to avoid unwanted emails on each new login - Improved album metadata and error handling for `flickr` ([#109](https://github.com/mikf/gallery-dl/issues/109)) - Updated default User-Agent string to Firefox 62 ([#122](https://github.com/mikf/gallery-dl/issues/122)) - Fixed `twitter` API response handling when logged in ([#123](https://github.com/mikf/gallery-dl/issues/123)) - Fixed issue when converting Ugoira using H.264 - Fixed miscellaneous issues for `2chan`, `deviantart`, `fallenangels`, `flickr`, `imagefap`, `pinterest`, `turboimagehost`, `warosu`, `yuki` ([#112](https://github.com/mikf/gallery-dl/issues/112)) ## 1.5.3 - 2018-09-14 - Added support for: - `hentaicafe` - https://hentai.cafe/ ([#101](https://github.com/mikf/gallery-dl/issues/101)) - `bobx` - http://www.bobx.com/dark/ - Added black-/whitelist options for post-processor modules - Added support for `tumblr` inline videos ([#102](https://github.com/mikf/gallery-dl/issues/102)) - Fixed extraction of `smugmug` albums without owner ([#100](https://github.com/mikf/gallery-dl/issues/100)) - Fixed issues when using default config values with `reddit` extractors ([#104](https://github.com/mikf/gallery-dl/issues/104)) - Fixed pagination for user favorites on `sankaku` ([#106](https://github.com/mikf/gallery-dl/issues/106)) - Fixed a crash when processing `deviantart` journals ([#108](https://github.com/mikf/gallery-dl/issues/108)) ## 1.5.2 - 2018-08-31 - Added support for `twitter` timelines ([#96](https://github.com/mikf/gallery-dl/issues/96)) - Added option to suppress FFmpeg output during ugoira conversions - Improved filename formatter performance - Improved inline image quality on `tumblr` ([#98](https://github.com/mikf/gallery-dl/issues/98)) - Fixed image URLs for newly released `mangadex` chapters - Fixed a smaller issue with `deviantart` journals - Replaced `subapics` with `ngomik` ## 1.5.1 - 2018-08-17 - Added support for: - `piczel` - https://piczel.tv/ - Added support for related pins on `pinterest` - Fixed accessing "offensive" galleries on `exhentai` ([#97](https://github.com/mikf/gallery-dl/issues/97)) - Fixed extraction issues for `mangadex`, `komikcast` and `behance` - Removed original-image functionality from `tumblr`, since "raw" images are no longer accessible ## 1.5.0 - 2018-08-03 - Added support for: - `behance` - 
https://www.behance.net/ - `myportfolio` - https://www.myportfolio.com/ ([#95](https://github.com/mikf/gallery-dl/issues/95)) - Added custom format string options to handle long strings ([#92](https://github.com/mikf/gallery-dl/issues/92), [#94](https://github.com/mikf/gallery-dl/issues/94)) - Slicing: `"{field[10:40]}"` - Replacement: `"{field:L40/too long/}"` - Improved frame rate handling for ugoira conversions - Improved private access token usage on `deviantart` - Fixed metadata extraction for some images on `nijie` - Fixed chapter extraction on `mangahere` - Removed `whatisthisimnotgoodwithcomputers` - Removed support for Python 3.3 ## 1.4.2 - 2018-07-06 - Added image-pool extractors for `safebooru` and `rule34` - Added option for extended tag information on `booru` sites ([#92](https://github.com/mikf/gallery-dl/issues/92)) - Added support for DeviantArt's new URL format - Added support for `mangapark` mirrors - Changed `imagefap` extractors to use HTTPS - Fixed crash when skipping downloads for files without known extension ## 1.4.1 - 2018-06-22 - Added an `ugoira` post-processor to convert `pixiv` animations to WebM - Added `--zip` and `--ugoira-conv` command-line options - Changed how ugoira frame information is handled - instead of being written to a separate file, it is now made available as metadata field of the ZIP archive - Fixed manga and chapter titles for `mangadex` - Fixed file deletion by post-processors ## 1.4.0 - 2018-06-08 - Added support for: - `simplyhentai` - https://www.simply-hentai.com/ ([#89](https://github.com/mikf/gallery-dl/issues/89)) - Added extractors for - `pixiv` search results and followed users - `deviantart` search results and popular listings - Added post-processors to perform actions on downloaded files - Added options to configure logging behavior - Added OAuth support for `smugmug` - Changed `pixiv` extractors to use the AppAPI - this breaks `favorite` archive IDs and changes some metadata fields - Changed the default filename format for `tumblr` and renamed `offset` to `num` - Fixed a possible UnicodeDecodeError during installation ([#86](https://github.com/mikf/gallery-dl/issues/86)) - Fixed extraction of `mangadex` manga with more than 100 chapters ([#84](https://github.com/mikf/gallery-dl/issues/84)) - Fixed miscellaneous issues for `imgur`, `reddit`, `komikcast`, `mangafox` and `imagebam` ## 1.3.5 - 2018-05-04 - Added support for: - `smugmug` - https://www.smugmug.com/ - Added title information for `mangadex` chapters - Improved the `pinterest` API implementation ([#83](https://github.com/mikf/gallery-dl/issues/83)) - Improved error handling for `deviantart` and `tumblr` - Removed `gomanga` and `puremashiro` ## 1.3.4 - 2018-04-20 - Added support for custom OAuth2 credentials for `pinterest` - Improved rate limit handling for `tumblr` extractors - Improved `hentaifoundry` extractors - Improved `imgur` URL patterns - Fixed miscellaneous extraction issues for `luscious` and `komikcast` - Removed `loveisover` and `spectrumnexus` ## 1.3.3 - 2018-04-06 - Added extractors for - `nhentai` search results - `exhentai` search results and favorites - `nijie` doujins and favorites - Improved metadata extraction for `exhentai` and `nijie` - Improved `tumblr` extractors by avoiding unnecessary API calls - Fixed Cloudflare DDoS protection bypass - Fixed errors when trying to print unencodable characters ## 1.3.2 - 2018-03-23 - Added extractors for `artstation` albums, challenges and search results - Improved URL and metadata extraction for `hitomi`and 
`nhentai` - Fixed page transitions for `danbooru` API results ([#82](https://github.com/mikf/gallery-dl/issues/82)) ## 1.3.1 - 2018-03-16 - Added support for: - `mangadex` - https://mangadex.org/ - `artstation` - https://www.artstation.com/ - Added Cloudflare DDoS protection bypass to `komikcast` extractors - Changed archive ID formats for `deviantart` folders and collections - Improved error handling for `deviantart` API calls - Removed `imgchili` and various smaller image hosts ## 1.3.0 - 2018-03-02 - Added `--proxy` to explicitly specify a proxy server ([#76](https://github.com/mikf/gallery-dl/issues/76)) - Added options to customize [archive ID formats](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorarchive-format) and [undefined replacement fields](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorkeywords-default) - Changed various archive ID formats to improve their behavior for favorites / bookmarks / etc. - Affected modules are `deviantart`, `flickr`, `tumblr`, `pixiv` and all …boorus - Improved `sankaku` and `idolcomplex` support by - respecting `page` and `next` URL parameters ([#79](https://github.com/mikf/gallery-dl/issues/79)) - bypassing the page-limit for unauthenticated users - Improved `directlink` metadata by properly unquoting it - Fixed `pixiv` ugoira extraction ([#78](https://github.com/mikf/gallery-dl/issues/78)) - Fixed miscellaneous extraction issues for `mangastream` and `tumblr` - Removed `yeet`, `chronos`, `coreimg`, `hosturimage`, `imageontime`, `img4ever`, `imgmaid`, `imgupload` ## 1.2.0 - 2018-02-16 - Added support for: - `paheal` - https://rule34.paheal.net/ ([#69](https://github.com/mikf/gallery-dl/issues/69)) - `komikcast` - https://komikcast.com/ ([#70](https://github.com/mikf/gallery-dl/issues/70)) - `subapics` - http://subapics.com/ ([#70](https://github.com/mikf/gallery-dl/issues/70)) - Added `--download-archive` to record downloaded files in an archive file - Added `--write-log` to write logging output to a file - Added a filetype check on download completion to fix incorrectly assigned filename extensions ([#63](https://github.com/mikf/gallery-dl/issues/63)) - Added the `tumblr:...` pseudo URI scheme to support custom domains for Tumblr blogs ([#71](https://github.com/mikf/gallery-dl/issues/71)) - Added fallback URLs for `tumblr` images ([#64](https://github.com/mikf/gallery-dl/issues/64)) - Added support for `reddit`-hosted images ([#68](https://github.com/mikf/gallery-dl/issues/68)) - Improved the input file format by allowing comments and per-URL options - Fixed OAuth 1.0 signature generation for Python 3.3 and 3.4 ([#75](https://github.com/mikf/gallery-dl/issues/75)) - Fixed smaller issues for `luscious`, `hentai2read`, `hentaihere` and `imgur` - Removed the `batoto` module ## 1.1.2 - 2018-01-12 - Added support for: - `puremashiro` - http://reader.puremashiro.moe/ ([#66](https://github.com/mikf/gallery-dl/issues/66)) - `idolcomplex` - https://idol.sankakucomplex.com/ - Added an option to filter reblogs on `tumblr` ([#61](https://github.com/mikf/gallery-dl/issues/61)) - Added OAuth user authentication for `tumblr` ([#65](https://github.com/mikf/gallery-dl/issues/65)) - Added support for `slideshare` mobile URLs ([#67](https://github.com/mikf/gallery-dl/issues/67)) - Improved pagination for various …booru sites to work around page limits - Fixed chapter information parsing for certain manga on `kissmanga` ([#58](https://github.com/mikf/gallery-dl/issues/58)) and `batoto` 
([#60](https://github.com/mikf/gallery-dl/issues/60)) ## 1.1.1 - 2017-12-22 - Added support for: - `slideshare` - https://www.slideshare.net/ ([#54](https://github.com/mikf/gallery-dl/issues/54)) - Added pool- and post-extractors for `sankaku` - Added OAuth user authentication for `deviantart` - Updated `luscious` to support `members.luscious.net` URLs ([#55](https://github.com/mikf/gallery-dl/issues/55)) - Updated `mangahere` to use their new domain name (mangahere.cc) and support mobile URLs - Updated `gelbooru` to not be restricted to the first 20,000 images ([#56](https://github.com/mikf/gallery-dl/issues/56)) - Fixed extraction issues for `nhentai` and `khinsider` ## 1.1.0 - 2017-12-08 - Added the ``-r/--limit-rate`` command-line option to set a maximum download rate - Added the ``--sleep`` command-line option to specify the number of seconds to sleep before each download - Updated `gelbooru` to no longer use their now disabled API - Fixed SWF extraction for `sankaku` ([#52](https://github.com/mikf/gallery-dl/issues/52)) - Fixed extraction issues for `hentai2read` and `khinsider` - Removed the deprecated `--images` and `--chapters` options - Removed the ``mangazuki`` module ## 1.0.2 - 2017-11-24 - Added an option to set a [custom user-agent string](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractoruser-agent) - Improved retry behavior for failed HTTP requests - Improved `seiga` by providing better metadata and getting more than the latest 200 images - Improved `tumblr` by adding support for [all post types](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractortumblrposts), scanning for [inline images](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractortumblrinline) and following [external links](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractortumblrexternal) ([#48](https://github.com/mikf/gallery-dl/issues/48)) - Fixed extraction issues for `hbrowse`, `khinsider` and `senmanga` ## 1.0.1 - 2017-11-10 - Added support for: - `xvideos` - https://www.xvideos.com/ ([#45](https://github.com/mikf/gallery-dl/issues/45)) - Fixed exception handling during file downloads which could lead to a premature exit - Fixed an issue with `tumblr` where not all images would be downloaded when using tags ([#48](https://github.com/mikf/gallery-dl/issues/48)) - Fixed extraction issues for `imgbox` ([#47](https://github.com/mikf/gallery-dl/issues/47)), `mangastream` ([#49](https://github.com/mikf/gallery-dl/issues/49)) and `mangahere` ## 1.0.0 - 2017-10-27 - Added support for: - `warosu` - https://warosu.org/ - `b4k` - https://arch.b4k.co/ - Added support for `pixiv` ranking lists - Added support for `booru` popular lists (`danbooru`, `e621`, `konachan`, `yandere`, `3dbooru`) - Added the `--cookies` command-line and [`cookies`](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorcookies) config option to load additional cookies - Added the `--filter` and `--chapter-filter` command-line options to select individual images or manga-chapters by their metadata using simple Python expressions ([#43](https://github.com/mikf/gallery-dl/issues/43)) - Added the [`verify`](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#downloaderhttpverify) config option to control certificate verification during file downloads - Added config options to overwrite internally used API credentials ([API Tokens & 
IDs](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#api-tokens-ids)) - Added `-K` as a shortcut for `--list-keywords` - Changed the `--images` and `--chapters` command-line options to `--range` and `--chapter-range` - Changed keyword names for various modules to make them accessible by `--filter`. In general minus signs have been replaced with underscores (e.g. `gallery-id` -> `gallery_id`). - Changed default filename formats for manga extractors to optionally use volume and title information - Improved the downloader modules to use [`.part` files](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#downloaderpart) and support resuming incomplete downloads ([#29](https://github.com/mikf/gallery-dl/issues/29)) - Improved `deviantart` by distinguishing between users and groups ([#26](https://github.com/mikf/gallery-dl/issues/26)), always using HTTPS, and always downloading full-sized original images - Improved `sankaku` by adding authentication support and fixing various other issues ([#44](https://github.com/mikf/gallery-dl/issues/44)) - Improved URL pattern for direct image links ([#30](https://github.com/mikf/gallery-dl/issues/30)) - Fixed an issue with `luscious` not getting original image URLs ([#33](https://github.com/mikf/gallery-dl/issues/33)) - Fixed various smaller issues for `batoto`, `hentai2read` ([#38](https://github.com/mikf/gallery-dl/issues/38)), `jaiminisbox`, `khinsider`, `kissmanga` ([#28](https://github.com/mikf/gallery-dl/issues/28), [#46](https://github.com/mikf/gallery-dl/issues/46)), `mangahere`, `pawoo`, `twitter` - Removed `kisscomic` and `yonkouprod` modules ## 0.9.1 - 2017-07-24 - Added support for: - `2chan` - https://www.2chan.net/ - `4plebs` - https://archive.4plebs.org/ - `archivedmoe` - https://archived.moe/ - `archiveofsins` - https://archiveofsins.com/ - `desuarchive` - https://desuarchive.org/ - `fireden` - https://boards.fireden.net/ - `loveisover` - https://archive.loveisover.me/ - `nyafuu` - https://archive.nyafuu.org/ - `rbt` - https://rbt.asia/ - `thebarchive` - https://thebarchive.com/ - `mangazuki` - https://mangazuki.co/ - Improved `reddit` to allow submission filtering by ID and human-readable dates - Improved `deviantart` to support group galleries and gallery folders ([#26](https://github.com/mikf/gallery-dl/issues/26)) - Changed `deviantart` to use better default path formats - Fixed extraction of larger `imgur` albums - Fixed some smaller issues for `pixiv`, `batoto` and `fallenangels` ## 0.9.0 - 2017-06-28 - Added support for: - `reddit` - https://www.reddit.com/ ([#15](https://github.com/mikf/gallery-dl/issues/15)) - `flickr` - https://www.flickr.com/ ([#16](https://github.com/mikf/gallery-dl/issues/16)) - `gfycat` - https://gfycat.com/ - Added support for direct image links - Added user authentication via [OAuth](https://github.com/mikf/gallery-dl#52oauth) for `reddit` and `flickr` - Added support for user authentication data from [`.netrc`](https://stackoverflow.com/tags/.netrc/info) files ([#22](https://github.com/mikf/gallery-dl/issues/22)) - Added a simple progress indicator for multiple URLs ([#19](https://github.com/mikf/gallery-dl/issues/19)) - Added the `--write-unsupported` command-line option to write unsupported URLs to a file - Added documentation for all available config options ([configuration.rst](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst)) - Improved `pixiv` to support tags for user downloads ([#17](https://github.com/mikf/gallery-dl/issues/17)) - Improved 
`pixiv` to support shortened and http://pixiv.me/... URLs ([#23](https://github.com/mikf/gallery-dl/issues/23)) - Improved `imgur` to properly handle `.gifv` images and provide better metadata - Fixed an issue with `kissmanga` where metadata parsing for some series failed ([#20](https://github.com/mikf/gallery-dl/issues/20)) - Fixed an issue with getting filename extensions from `Content-Type` response headers ## 0.8.4 - 2017-05-21 - Added the `--abort-on-skip` option to stop extraction if a download would be skipped - Improved the output format of the `--list-keywords` option - Updated `deviantart` to support all media types and journals - Updated `fallenangels` to support their [Vietnamese version](https://truyen.fascans.com/) - Fixed an issue with multiple tags on ...booru sites - Removed the `yomanga` module ## 0.8.3 - 2017-05-01 - Added support for https://pawoo.net/ - Added manga extractors for all [FoOlSlide](https://foolcode.github.io/FoOlSlide/)-based modules - Added the `-q/--quiet` and `-v/--verbose` options to control output verbosity - Added the `-j/--dump-json` option to dump extractor results in JSON format - Added the `--ignore-config` option - Updated the `exhentai` extractor to fall back to using the e-hentai version if no username is given - Updated `deviantart` to support sta.sh URLs - Fixed an issue with `kissmanga` which prevented image URLs from being decrypted properly (again) - Fixed an issue with `pixhost` where for an image inside an album it would always download the first image of that album ([#13](https://github.com/mikf/gallery-dl/issues/13)) - Removed the `mangashare` and `readcomics` modules ## 0.8.2 - 2017-04-10 - Fixed an issue in `kissmanga` which prevented image URLs from being decrypted properly ## 0.8.1 - 2017-04-09 - Added new extractors: - `kireicake` - https://reader.kireicake.com/ - `seaotterscans` - https://reader.seaotterscans.com/ - Added a favourites extractor for `deviantart` - Re-enabled the `kissmanga` module - Updated `nijie` to support multi-page image listings - Updated `mangastream` to support readms.net URLs - Updated `exhentai` to support e-hentai.org URLs - Updated `fallenangels` to support their new domain and site layout ## 0.8.0 - 2017-03-28 - Added logging support - Added the `-R/--retries` option to specify how often a download should be retried before giving up - Added the `--http-timeout` option to set a timeout for HTTP connections - Improved error handling/tolerance during HTTP file downloads ([#10](https://github.com/mikf/gallery-dl/issues/10)) - Improved option parsing and the help message from `-h/--help` - Changed the way configuration values are used by prioritizing top-level values - This allows for cmdline options like `-u/--username` to overwrite values set in configuration files - Fixed an issue with `imagefap.com` where incorrectly reported gallery sizes would cause the extractor to fail ([#9](https://github.com/mikf/gallery-dl/issues/9)) - Fixed an issue with `seiga.nicovideo.jp` where invalid characters in an API response caused the XML parser to fail - Fixed an issue with `seiga.nicovideo.jp` where the filename extension for the first image would be used for all others - Removed support for old configuration paths on Windows - Removed several modules: - `mangamint`: site is down - `whentai`: now requires account with VIP status for original images - `kissmanga`: encrypted image URLs (will be re-added later) ## 0.7.0 - 2017-03-06 - Added `--images` and `--chapters` options - Specifies which images (or chapters) 
to download through a comma-separated list of indices or index-ranges - Example: `--images -2,4,6-8,10-` will select images with index 1, 2, 4, 6, 7, 8 and 10 up to the last one - Changed the `-g`/`--get-urls` option - The number of times the -g option is given now determines up to which level URLs are resolved. - See 3bca86618505c21628cd9c7179ce933a78d00ca2 - Changed several option keys: - `directory_fmt` -> `directory` - `filename_fmt` -> `filename` - `download-original` -> `original` - Improved [FoOlSlide](https://foolcode.github.io/FoOlSlide/)-based extractors - Fixed URL extraction for hentai2read - Fixed an issue with deviantart where the API access token wouldn't get refreshed ## 0.6.4 - 2017-02-13 - Added new extractors: - fallenangels (famatg.com) - Fixed URL and data extraction for: - nhentai - mangamint - twitter - imagetwist - Disabled InsecureConnectionWarning when no certificates are available ## 0.6.3 - 2017-01-25 - Added new extractors: - gomanga - yomanga - mangafox - Fixed deviantart extractor failing - switched to using their API - Fixed an issue with SQLite on Python 3.6 - Automated test builds via Travis CI - Standalone executables for Windows ## 0.6.2 - 2017-01-05 - Added new extractors: - kisscomic - readcomics - yonkouprod - jaiminisbox - Added manga extractor to batoto-module - Added user extractor to seiga-module - Added `-i`/`--input-file` argument to allow local files and stdin as input (like wget) - Added basic support for `file://` URLs - this allows for the recursive extractor to be applied to local files: - `$ gallery-dl r:file://[path to file]` - Added a utility extractor to run unit test URLs - Updated luscious to deal with API changes - Fixed twitter to provide the original image URL - Minor fixes to hentaifoundry - Removed imgclick extractor ## 0.6.1 - 2016-11-30 - Added new extractors: - whentai - readcomiconline - sensescans, worldthree - imgmaid, imagevenue, img4ever, imgspot, imgtrial, pixhost - Added base class for extractors of [FoOlSlide](https://foolcode.github.io/FoOlSlide/)-based sites - Changed default paths for configuration files on Windows - old paths are still supported, but that will change in future versions - Fixed aborting downloads if a single one failed ([#5](https://github.com/mikf/gallery-dl/issues/5)) - Fixed cloudflare-bypass cache containing outdated cookies - Fixed image URLs for hitomi and 8chan - Updated deviantart to always provide the highest quality image - Updated README.rst - Removed doujinmode extractor ## 0.6.0 - 2016-10-08 - Added new extractors: - hentaihere - dokireader - twitter - rapidimg, picmaniac - Added support to find filename extensions by Content-Type response header - Fixed filename/path issues on Windows ([#4](https://github.com/mikf/gallery-dl/issues/4)): - Enable path names with more than 260 characters - Remove trailing spaces in path segments - Updated Job class to automatically set category/subcategory keywords ## 0.5.2 - 2016-09-23 - Added new extractors: - pinterest - rule34 - dynastyscans - imagebam, coreimg, imgcandy, imgtrex - Added login capabilities for batoto - Added `--version` cmdline argument to print the current program version and exit - Added `--list-extractors` cmdline argument to print names of all extractor classes together with descriptions and example URLs - Added proper error messages if an image/user does not exist - Added unittests for every extractor ## 0.5.1 - 2016-08-22 - Added new extractors: - luscious - doujinmode - hentaibox - seiga - imagefap - Changed error
output to use stderr instead of stdout - Fixed broken pipes causing an exception-dump by catching BrokenPipeErrors ## 0.5.0 - 2016-07-25 ## 0.4.1 - 2015-12-03 - New modules (imagetwist, turboimagehost) - Manga-extractors: Download entire manga and not just single chapters - Generic extractor (provisional) - Better and configurable console output - Windows support ## 0.4.0 - 2015-11-26 ## 0.3.3 - 2015-11-10 ## 0.3.2 - 2015-11-04 ## 0.3.1 - 2015-10-30 ## 0.3.0 - 2015-10-05 ## 0.2.0 - 2015-06-28 ## 0.1.0 - 2015-05-27 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1513530019.0 gallery_dl-1.25.0/LICENSE0000644000175000017500000004325413215521243013424 0ustar00mikemike GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. 
GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. 
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. 
Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. 
For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. 
You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1618779451.0 gallery_dl-1.25.0/MANIFEST.in0000644000175000017500000000010614037116473014152 0ustar00mikemikeinclude README.rst CHANGELOG.md LICENSE recursive-include docs *.conf ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1678564859.9818535 gallery_dl-1.25.0/PKG-INFO0000644000175000017500000002725214403156774013531 0ustar00mikemikeMetadata-Version: 2.1 Name: gallery_dl Version: 1.25.0 Summary: Command-line program to download image galleries and collections from several image hosting sites Home-page: https://github.com/mikf/gallery-dl Download-URL: https://github.com/mikf/gallery-dl/releases/latest Author: Mike Fährmann Author-email: mike_faehrmann@web.de Maintainer: Mike Fährmann Maintainer-email: mike_faehrmann@web.de License: GPLv2 Keywords: image gallery downloader crawler scraper Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Console Classifier: Intended Audience :: End Users/Desktop Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2) Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3 :: Only Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: 3.9 Classifier: Programming Language :: Python :: 3.10 Classifier: Programming Language :: Python :: 3.11 Classifier: Programming Language :: Python :: Implementation :: CPython Classifier: Programming Language :: Python :: Implementation :: PyPy Classifier: Topic :: Internet :: WWW/HTTP Classifier: Topic :: Multimedia :: Graphics Classifier: Topic :: Utilities Requires-Python: >=3.4 Provides-Extra: video License-File: LICENSE ========== gallery-dl ========== *gallery-dl* is a command-line program to download image galleries and collections from several image hosting sites (see `Supported Sites `__). It is a cross-platform tool with many `configuration options `__ and powerful `filenaming capabilities `__. |pypi| |build| .. contents:: Dependencies ============ - Python_ 3.4+ - Requests_ Optional -------- - FFmpeg_: Pixiv Ugoira conversion - yt-dlp_ or youtube-dl_: Video downloads - PySocks_: SOCKS proxy support - brotli_ or brotlicffi_: Brotli compression support Installation ============ Pip --- The stable releases of *gallery-dl* are distributed on PyPI_ and can be easily installed or upgraded using pip_: .. 
code:: bash python3 -m pip install -U gallery-dl Installing the latest dev version directly from GitHub can be done with pip_ as well: .. code:: bash python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz Note: Windows users should use :code:`py -3` instead of :code:`python3`. It is advised to use the latest version of pip_, including the essential packages :code:`setuptools` and :code:`wheel`. To ensure these packages are up-to-date, run .. code:: bash python3 -m pip install --upgrade pip setuptools wheel Standalone Executable --------------------- Prebuilt executable files with a Python interpreter and required Python packages included are available for - `Windows `__ (Requires `Microsoft Visual C++ Redistributable Package (x86) `__) - `Linux `__ Nightly Builds -------------- | Executables build from the latest commit can be found at | https://github.com/mikf/gallery-dl/actions/workflows/executables.yml Snap ---- Linux users that are using a distro that is supported by Snapd_ can install *gallery-dl* from the Snap Store: .. code:: bash snap install gallery-dl Chocolatey ---------- Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository: .. code:: powershell choco install gallery-dl Scoop ----- *gallery-dl* is also available in the Scoop_ "main" bucket for Windows users: .. code:: powershell scoop install gallery-dl Homebrew -------- For macOS or Linux users using Homebrew: .. code:: bash brew install gallery-dl Usage ===== To use *gallery-dl* simply call it with the URLs you wish to download images from: .. code:: bash gallery-dl [OPTIONS]... URLS... Use :code:`gallery-dl --help` or see ``__ for a full list of all command-line options. Examples -------- Download images; in this case from danbooru via tag search for 'bonocho': .. code:: bash gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho" Get the direct URL of an image from a site supporting authentication with username & password: .. code:: bash gallery-dl -g -u "" -p "" "https://twitter.com/i/web/status/604341487988576256" Filter manga chapters by chapter number and language: .. code:: bash gallery-dl --chapter-filter "10 <= chapter < 20" -o "lang=fr" "https://mangadex.org/title/59793dd0-a2d8-41a2-9758-8197287a8539" | Search a remote resource for URLs and download images from them: | (URLs for which no extractor can be found will be silently ignored) .. code:: bash gallery-dl "r:https://pastebin.com/raw/FLwrCYsT" If a site's address is nonstandard for its extractor, you can prefix the URL with the extractor's name to force the use of a specific extractor: .. code:: bash gallery-dl "tumblr:https://sometumblrblog.example" Configuration ============= Configuration files for *gallery-dl* use a JSON-based file format. Documentation ------------- A list of all available configuration options and their descriptions can be found in ``__. | For a default configuration file with available options set to their default values, see ``__. | For a commented example with more involved settings and option usage, see ``__. Locations --------- *gallery-dl* searches for configuration files in the following places: Windows: * ``%APPDATA%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl.conf`` (``%USERPROFILE%`` usually refers to a user's home directory, i.e. 
``C:\Users\\``) Linux, macOS, etc.: * ``/etc/gallery-dl.conf`` * ``${XDG_CONFIG_HOME}/gallery-dl/config.json`` * ``${HOME}/.config/gallery-dl/config.json`` * ``${HOME}/.gallery-dl.conf`` When run as `executable `__, *gallery-dl* will also look for a ``gallery-dl.conf`` file in the same directory as said executable. It is possible to use more than one configuration file at a time. In this case, any values from files after the first will get merged into the already loaded settings and potentially override previous ones. Authentication ============== Username & Password ------------------- Some extractors require you to provide valid login credentials in the form of a username & password pair. This is necessary for ``nijie`` and optional for ``aryion``, ``danbooru``, ``e621``, ``exhentai``, ``idolcomplex``, ``imgbb``, ``inkbunny``, ``mangadex``, ``mangoxo``, ``pillowfort``, ``sankaku``, ``subscribestar``, ``tapas``, ``tsumino``, ``twitter``, and ``zerochan``. You can set the necessary information in your `configuration file `__ .. code:: json { "extractor": { "twitter": { "username": "", "password": "" } } } or you can provide them directly via the :code:`-u/--username` and :code:`-p/--password` or via the :code:`-o/--option` command-line options .. code:: bash gallery-dl -u "" -p "" "URL" gallery-dl -o "username=" -o "password=" "URL" Cookies ------- For sites where login with username & password is not possible due to CAPTCHA or similar, or has not been implemented yet, you can use the cookies from a browser login session and input them into *gallery-dl*. This can be done via the `cookies `__ option in your configuration file by specifying - | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon | (e.g. `Export Cookies `__ for Firefox) - | a list of name-value pairs gathered from your browser's web developer tools | (in `Chrome `__, in `Firefox `__) - | the name of a browser to extract cookies from | (supported browsers are Chromium-based ones, Firefox, and Safari) For example: .. code:: json { "extractor": { "instagram": { "cookies": "$HOME/path/to/cookies.txt" }, "patreon": { "cookies": { "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a" } }, "twitter": { "cookies": ["firefox"] } } } | You can also specify a cookies.txt file with the :code:`--cookies` command-line option | or a browser to extract cookies from with :code:`--cookies-from-browser`: .. code:: bash gallery-dl --cookies "$HOME/path/to/cookies.txt" "URL" gallery-dl --cookies-from-browser firefox "URL" OAuth ----- *gallery-dl* supports user authentication via OAuth_ for some extractors. This is necessary for ``pixiv`` and optional for ``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``, and ``mastodon`` instances. Linking your account to *gallery-dl* grants it the ability to issue requests on your account's behalf and enables it to access resources which would otherwise be unavailable to a public user. To do so, start by invoking it with ``oauth:`` as an argument. For example: .. code:: bash gallery-dl oauth:flickr You will be sent to the site's authorization page and asked to grant read access to *gallery-dl*. Authorize it and you will be shown one or more "tokens", which should be added to your configuration file. To authenticate with a ``mastodon`` instance, run *gallery-dl* with ``oauth:mastodon:`` as argument. For example: .. code:: bash gallery-dl oauth:mastodon:pawoo.net gallery-dl oauth:mastodon:https://mastodon.social/ .. _Python: https://www.python.org/downloads/ .. 
_PyPI: https://pypi.org/ .. _pip: https://pip.pypa.io/en/stable/ .. _Requests: https://requests.readthedocs.io/en/master/ .. _FFmpeg: https://www.ffmpeg.org/ .. _yt-dlp: https://github.com/yt-dlp/yt-dlp .. _youtube-dl: https://ytdl-org.github.io/youtube-dl/ .. _PySocks: https://pypi.org/project/PySocks/ .. _brotli: https://github.com/google/brotli .. _brotlicffi: https://github.com/python-hyper/brotlicffi .. _Snapd: https://docs.snapcraft.io/installing-snapd .. _OAuth: https://en.wikipedia.org/wiki/OAuth .. _Chocolatey: https://chocolatey.org/install .. _Scoop: https://scoop.sh .. |pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg :target: https://pypi.org/project/gallery-dl/ .. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg :target: https://github.com/mikf/gallery-dl/actions .. |gitter| image:: https://badges.gitter.im/gallery-dl/main.svg :target: https://gitter.im/gallery-dl/main ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1678564859.0 gallery_dl-1.25.0/README.rst0000644000175000017500000002335414403156773014121 0ustar00mikemike========== gallery-dl ========== *gallery-dl* is a command-line program to download image galleries and collections from several image hosting sites (see `Supported Sites `__). It is a cross-platform tool with many `configuration options `__ and powerful `filenaming capabilities `__. |pypi| |build| .. contents:: Dependencies ============ - Python_ 3.4+ - Requests_ Optional -------- - FFmpeg_: Pixiv Ugoira conversion - yt-dlp_ or youtube-dl_: Video downloads - PySocks_: SOCKS proxy support - brotli_ or brotlicffi_: Brotli compression support Installation ============ Pip --- The stable releases of *gallery-dl* are distributed on PyPI_ and can be easily installed or upgraded using pip_: .. code:: bash python3 -m pip install -U gallery-dl Installing the latest dev version directly from GitHub can be done with pip_ as well: .. code:: bash python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz Note: Windows users should use :code:`py -3` instead of :code:`python3`. It is advised to use the latest version of pip_, including the essential packages :code:`setuptools` and :code:`wheel`. To ensure these packages are up-to-date, run .. code:: bash python3 -m pip install --upgrade pip setuptools wheel Standalone Executable --------------------- Prebuilt executable files with a Python interpreter and required Python packages included are available for - `Windows `__ (Requires `Microsoft Visual C++ Redistributable Package (x86) `__) - `Linux `__ Nightly Builds -------------- | Executables build from the latest commit can be found at | https://github.com/mikf/gallery-dl/actions/workflows/executables.yml Snap ---- Linux users that are using a distro that is supported by Snapd_ can install *gallery-dl* from the Snap Store: .. code:: bash snap install gallery-dl Chocolatey ---------- Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository: .. code:: powershell choco install gallery-dl Scoop ----- *gallery-dl* is also available in the Scoop_ "main" bucket for Windows users: .. code:: powershell scoop install gallery-dl Homebrew -------- For macOS or Linux users using Homebrew: .. code:: bash brew install gallery-dl Usage ===== To use *gallery-dl* simply call it with the URLs you wish to download images from: .. code:: bash gallery-dl [OPTIONS]... URLS... 
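Command-line options can be freely combined with the target URLs. As a minimal sketch (the URL and paths below are placeholders, not real targets), the following invocation limits the download rate, records finished downloads in an archive file, writes metadata to JSON files, and stores everything in a fixed directory:

.. code:: bash

    gallery-dl --limit-rate 1M \
               --download-archive "$HOME/gallery-dl/archive.sqlite3" \
               --write-metadata \
               -D "$HOME/Pictures/example" \
               "https://example.org/gallery/12345"

Because the archive file records every downloaded or skipped file, re-running the same command later skips anything already listed in it.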
Use :code:`gallery-dl --help` or see ``__ for a full list of all command-line options. Examples -------- Download images; in this case from danbooru via tag search for 'bonocho': .. code:: bash gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho" Get the direct URL of an image from a site supporting authentication with username & password: .. code:: bash gallery-dl -g -u "" -p "" "https://twitter.com/i/web/status/604341487988576256" Filter manga chapters by chapter number and language: .. code:: bash gallery-dl --chapter-filter "10 <= chapter < 20" -o "lang=fr" "https://mangadex.org/title/59793dd0-a2d8-41a2-9758-8197287a8539" | Search a remote resource for URLs and download images from them: | (URLs for which no extractor can be found will be silently ignored) .. code:: bash gallery-dl "r:https://pastebin.com/raw/FLwrCYsT" If a site's address is nonstandard for its extractor, you can prefix the URL with the extractor's name to force the use of a specific extractor: .. code:: bash gallery-dl "tumblr:https://sometumblrblog.example" Configuration ============= Configuration files for *gallery-dl* use a JSON-based file format. Documentation ------------- A list of all available configuration options and their descriptions can be found in ``__. | For a default configuration file with available options set to their default values, see ``__. | For a commented example with more involved settings and option usage, see ``__. Locations --------- *gallery-dl* searches for configuration files in the following places: Windows: * ``%APPDATA%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl.conf`` (``%USERPROFILE%`` usually refers to a user's home directory, i.e. ``C:\Users\\``) Linux, macOS, etc.: * ``/etc/gallery-dl.conf`` * ``${XDG_CONFIG_HOME}/gallery-dl/config.json`` * ``${HOME}/.config/gallery-dl/config.json`` * ``${HOME}/.gallery-dl.conf`` When run as `executable `__, *gallery-dl* will also look for a ``gallery-dl.conf`` file in the same directory as said executable. It is possible to use more than one configuration file at a time. In this case, any values from files after the first will get merged into the already loaded settings and potentially override previous ones. Authentication ============== Username & Password ------------------- Some extractors require you to provide valid login credentials in the form of a username & password pair. This is necessary for ``nijie`` and optional for ``aryion``, ``danbooru``, ``e621``, ``exhentai``, ``idolcomplex``, ``imgbb``, ``inkbunny``, ``mangadex``, ``mangoxo``, ``pillowfort``, ``sankaku``, ``subscribestar``, ``tapas``, ``tsumino``, ``twitter``, and ``zerochan``. You can set the necessary information in your `configuration file `__ .. code:: json { "extractor": { "twitter": { "username": "", "password": "" } } } or you can provide them directly via the :code:`-u/--username` and :code:`-p/--password` or via the :code:`-o/--option` command-line options .. code:: bash gallery-dl -u "" -p "" "URL" gallery-dl -o "username=" -o "password=" "URL" Cookies ------- For sites where login with username & password is not possible due to CAPTCHA or similar, or has not been implemented yet, you can use the cookies from a browser login session and input them into *gallery-dl*. This can be done via the `cookies `__ option in your configuration file by specifying - | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon | (e.g. 
`Export Cookies `__ for Firefox) - | a list of name-value pairs gathered from your browser's web developer tools | (in `Chrome `__, in `Firefox `__) - | the name of a browser to extract cookies from | (supported browsers are Chromium-based ones, Firefox, and Safari) For example: .. code:: json { "extractor": { "instagram": { "cookies": "$HOME/path/to/cookies.txt" }, "patreon": { "cookies": { "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a" } }, "twitter": { "cookies": ["firefox"] } } } | You can also specify a cookies.txt file with the :code:`--cookies` command-line option | or a browser to extract cookies from with :code:`--cookies-from-browser`: .. code:: bash gallery-dl --cookies "$HOME/path/to/cookies.txt" "URL" gallery-dl --cookies-from-browser firefox "URL" OAuth ----- *gallery-dl* supports user authentication via OAuth_ for some extractors. This is necessary for ``pixiv`` and optional for ``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``, and ``mastodon`` instances. Linking your account to *gallery-dl* grants it the ability to issue requests on your account's behalf and enables it to access resources which would otherwise be unavailable to a public user. To do so, start by invoking it with ``oauth:`` as an argument. For example: .. code:: bash gallery-dl oauth:flickr You will be sent to the site's authorization page and asked to grant read access to *gallery-dl*. Authorize it and you will be shown one or more "tokens", which should be added to your configuration file. To authenticate with a ``mastodon`` instance, run *gallery-dl* with ``oauth:mastodon:`` as argument. For example: .. code:: bash gallery-dl oauth:mastodon:pawoo.net gallery-dl oauth:mastodon:https://mastodon.social/ .. _Python: https://www.python.org/downloads/ .. _PyPI: https://pypi.org/ .. _pip: https://pip.pypa.io/en/stable/ .. _Requests: https://requests.readthedocs.io/en/master/ .. _FFmpeg: https://www.ffmpeg.org/ .. _yt-dlp: https://github.com/yt-dlp/yt-dlp .. _youtube-dl: https://ytdl-org.github.io/youtube-dl/ .. _PySocks: https://pypi.org/project/PySocks/ .. _brotli: https://github.com/google/brotli .. _brotlicffi: https://github.com/python-hyper/brotlicffi .. _Snapd: https://docs.snapcraft.io/installing-snapd .. _OAuth: https://en.wikipedia.org/wiki/OAuth .. _Chocolatey: https://chocolatey.org/install .. _Scoop: https://scoop.sh .. |pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg :target: https://pypi.org/project/gallery-dl/ .. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg :target: https://github.com/mikf/gallery-dl/actions .. 
|gitter| image:: https://badges.gitter.im/gallery-dl/main.svg :target: https://gitter.im/gallery-dl/main ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1678564859.9451864 gallery_dl-1.25.0/data/0000755000175000017500000000000014403156774013335 5ustar00mikemike././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1678564859.9485197 gallery_dl-1.25.0/data/completion/0000755000175000017500000000000014403156774015506 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1678401408.0 gallery_dl-1.25.0/data/completion/_gallery-dl0000644000175000017500000001363414402457600017622 0ustar00mikemike#compdef gallery-dl local curcontext="$curcontext" typeset -A opt_args local rc=1 _arguments -C -S \ {-h,--help}'[Print this help message and exit]' \ --version'[Print program version and exit]' \ {-i,--input-file}'[Download URLs found in FILE ("-" for stdin). More than one --input-file can be specified]':'':_files \ {-f,--filename}'[Filename format string for downloaded files ("/O" for "original" filenames)]':'' \ {-d,--destination}'[Target location for file downloads]':'' \ {-D,--directory}'[Exact location for file downloads]':'' \ {-X,--extractors}'[Load external extractors from PATH]':'' \ --proxy'[Use the specified proxy]':'' \ --source-address'[Client-side IP address to bind to]':'' \ --user-agent'[User-Agent request header]':'' \ --clear-cache'[Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything)]':'' \ --cookies'[File to load additional cookies from]':'':_files \ --cookies-from-browser'[Name of the browser to load cookies from, with optional keyring name prefixed with "+", profile prefixed with ":", and container prefixed with "::" ("none" for no container)]':'' \ {-q,--quiet}'[Activate quiet mode]' \ {-v,--verbose}'[Print various debugging information]' \ {-g,--get-urls}'[Print URLs instead of downloading]' \ {-G,--resolve-urls}'[Print URLs instead of downloading; resolve intermediary URLs]' \ {-j,--dump-json}'[Print JSON information]' \ {-s,--simulate}'[Simulate data extraction; do not download anything]' \ {-E,--extractor-info}'[Print extractor defaults and settings]' \ {-K,--list-keywords}'[Print a list of available keywords and example values for the given URLs]' \ --list-modules'[Print a list of available extractor modules]' \ --list-extractors'[Print a list of extractor classes with description, (sub)category and example URL]' \ --write-log'[Write logging output to FILE]':'':_files \ --write-unsupported'[Write URLs, which get emitted by other extractors but cannot be handled, to FILE]':'':_files \ --write-pages'[Write downloaded intermediary pages to files in the current directory to debug problems]' \ {-r,--limit-rate}'[Maximum download rate (e.g. 500k or 2.5M)]':'' \ {-R,--retries}'[Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)]':'' \ --http-timeout'[Timeout for HTTP connections (default: 30.0)]':'' \ --sleep'[Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5)]':'' \ --sleep-request'[Number of seconds to wait between HTTP requests during data extraction]':'' \ --sleep-extractor'[Number of seconds to wait before starting data extraction for an input URL]':'' \ --filesize-min'[Do not download files smaller than SIZE (e.g. 500k or 2.5M)]':'' \ --filesize-max'[Do not download files larger than SIZE (e.g. 
500k or 2.5M)]':'' \ --chunk-size'[Size of in-memory data chunks (default: 32k)]':'' \ --no-part'[Do not use .part files]' \ --no-skip'[Do not skip downloads; overwrite existing files]' \ --no-mtime'[Do not set file modification times according to Last-Modified HTTP response headers]' \ --no-download'[Do not download any files]' \ --no-postprocessors'[Do not run any post processors]' \ --no-check-certificate'[Disable HTTPS certificate validation]' \ {-o,--option}'[Additional options. Example: -o browser=firefox]':'' \ {-c,--config}'[Additional configuration files]':'':_files \ --config-yaml'[Additional configuration files in YAML format]':'':_files \ --config-toml'[Additional configuration files in TOML format]':'':_files \ --config-create'[Create a basic configuration file]' \ --config-ignore'[Do not read default configuration files]' \ --ignore-config'[==SUPPRESS==]' \ {-u,--username}'[Username to login with]':'' \ {-p,--password}'[Password belonging to the given username]':'' \ --netrc'[Enable .netrc authentication data]' \ --download-archive'[Record all downloaded or skipped files in FILE and skip downloading any file already in it]':'':_files \ {-A,--abort}'[Stop current extractor run after N consecutive file downloads were skipped]':'' \ {-T,--terminate}'[Stop current and parent extractor runs after N consecutive file downloads were skipped]':'' \ --range'[Index range(s) specifying which files to download. These can be either a constant value, range, or slice (e.g. "5", "8-20", or "1:24:3")]':'' \ --chapter-range'[Like "--range", but applies to manga chapters and other delegated URLs]':'' \ --filter'[Python expression controlling which files to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by "-K". Example: --filter "image_width >= 1000 and rating in ("s", "q")"]':'' \ --chapter-filter'[Like "--filter", but applies to manga chapters and other delegated URLs]':'' \ --zip'[Store downloaded files in a ZIP archive]' \ --ugoira-conv'[Convert Pixiv Ugoira to WebM (requires FFmpeg)]' \ --ugoira-conv-lossless'[Convert Pixiv Ugoira to WebM in VP9 lossless mode]' \ --ugoira-conv-copy'[Convert Pixiv Ugoira to MKV without re-encoding any frames]' \ --write-metadata'[Write metadata to separate JSON files]' \ --write-info-json'[Write gallery metadata to a info.json file]' \ --write-infojson'[==SUPPRESS==]' \ --write-tags'[Write image tags to separate text files]' \ --mtime-from-date'[Set file modification times according to "date" metadata]' \ --exec'[Execute CMD for each downloaded file. Example: --exec "convert {} {}.png && rm {}"]':'' \ --exec-after'[Execute CMD after all files were downloaded successfully. 
Example: --exec-after "cd {} && convert * ../doc.pdf"]':'' \ {-P,--postprocessor}'[Activate the specified post processor]':'' \ {-O,--postprocessor-option}'[Additional "=" post processor options]':'' && rc=0 return rc ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1678401407.0 gallery_dl-1.25.0/data/completion/gallery-dl0000644000175000017500000000272414402457577017476 0ustar00mikemike_gallery_dl() { local cur prev COMPREPLY=() cur="${COMP_WORDS[COMP_CWORD]}" prev="${COMP_WORDS[COMP_CWORD-1]}" if [[ "${prev}" =~ ^(-i|--input-file|--cookies|--write-log|--write-unsupported|-c|--config|--config-yaml|--config-toml|--download-archive)$ ]]; then COMPREPLY=( $(compgen -f -- "${cur}") ) elif [[ "${prev}" =~ ^()$ ]]; then COMPREPLY=( $(compgen -d -- "${cur}") ) else COMPREPLY=( $(compgen -W "--help --version --input-file --filename --destination --directory --extractors --proxy --source-address --user-agent --clear-cache --cookies --cookies-from-browser --quiet --verbose --get-urls --resolve-urls --dump-json --simulate --extractor-info --list-keywords --list-modules --list-extractors --write-log --write-unsupported --write-pages --limit-rate --retries --http-timeout --sleep --sleep-request --sleep-extractor --filesize-min --filesize-max --chunk-size --no-part --no-skip --no-mtime --no-download --no-postprocessors --no-check-certificate --option --config --config-yaml --config-toml --config-create --config-ignore --ignore-config --username --password --netrc --download-archive --abort --terminate --range --chapter-range --filter --chapter-filter --zip --ugoira-conv --ugoira-conv-lossless --ugoira-conv-copy --write-metadata --write-info-json --write-infojson --write-tags --mtime-from-date --exec --exec-after --postprocessor --postprocessor-option" -- "${cur}") ) fi } complete -F _gallery_dl gallery-dl ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1678401408.0 gallery_dl-1.25.0/data/completion/gallery-dl.fish0000644000175000017500000001673114402457600020414 0ustar00mikemikecomplete -c gallery-dl -x complete -c gallery-dl -s 'h' -l 'help' -d 'Print this help message and exit' complete -c gallery-dl -l 'version' -d 'Print program version and exit' complete -c gallery-dl -r -F -s 'i' -l 'input-file' -d 'Download URLs found in FILE ("-" for stdin). More than one --input-file can be specified' complete -c gallery-dl -x -s 'f' -l 'filename' -d 'Filename format string for downloaded files ("/O" for "original" filenames)' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'd' -l 'destination' -d 'Target location for file downloads' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'D' -l 'directory' -d 'Exact location for file downloads' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'X' -l 'extractors' -d 'Load external extractors from PATH' complete -c gallery-dl -x -l 'proxy' -d 'Use the specified proxy' complete -c gallery-dl -x -l 'source-address' -d 'Client-side IP address to bind to' complete -c gallery-dl -x -l 'user-agent' -d 'User-Agent request header' complete -c gallery-dl -x -l 'clear-cache' -d 'Delete cached login sessions, cookies, etc. 
for MODULE (ALL to delete everything)' complete -c gallery-dl -r -F -l 'cookies' -d 'File to load additional cookies from' complete -c gallery-dl -x -l 'cookies-from-browser' -d 'Name of the browser to load cookies from, with optional keyring name prefixed with "+", profile prefixed with ":", and container prefixed with "::" ("none" for no container)' complete -c gallery-dl -s 'q' -l 'quiet' -d 'Activate quiet mode' complete -c gallery-dl -s 'v' -l 'verbose' -d 'Print various debugging information' complete -c gallery-dl -s 'g' -l 'get-urls' -d 'Print URLs instead of downloading' complete -c gallery-dl -s 'G' -l 'resolve-urls' -d 'Print URLs instead of downloading; resolve intermediary URLs' complete -c gallery-dl -s 'j' -l 'dump-json' -d 'Print JSON information' complete -c gallery-dl -s 's' -l 'simulate' -d 'Simulate data extraction; do not download anything' complete -c gallery-dl -s 'E' -l 'extractor-info' -d 'Print extractor defaults and settings' complete -c gallery-dl -s 'K' -l 'list-keywords' -d 'Print a list of available keywords and example values for the given URLs' complete -c gallery-dl -l 'list-modules' -d 'Print a list of available extractor modules' complete -c gallery-dl -l 'list-extractors' -d 'Print a list of extractor classes with description, (sub)category and example URL' complete -c gallery-dl -r -F -l 'write-log' -d 'Write logging output to FILE' complete -c gallery-dl -r -F -l 'write-unsupported' -d 'Write URLs, which get emitted by other extractors but cannot be handled, to FILE' complete -c gallery-dl -l 'write-pages' -d 'Write downloaded intermediary pages to files in the current directory to debug problems' complete -c gallery-dl -x -s 'r' -l 'limit-rate' -d 'Maximum download rate (e.g. 500k or 2.5M)' complete -c gallery-dl -x -s 'R' -l 'retries' -d 'Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)' complete -c gallery-dl -x -l 'http-timeout' -d 'Timeout for HTTP connections (default: 30.0)' complete -c gallery-dl -x -l 'sleep' -d 'Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5)' complete -c gallery-dl -x -l 'sleep-request' -d 'Number of seconds to wait between HTTP requests during data extraction' complete -c gallery-dl -x -l 'sleep-extractor' -d 'Number of seconds to wait before starting data extraction for an input URL' complete -c gallery-dl -x -l 'filesize-min' -d 'Do not download files smaller than SIZE (e.g. 500k or 2.5M)' complete -c gallery-dl -x -l 'filesize-max' -d 'Do not download files larger than SIZE (e.g. 500k or 2.5M)' complete -c gallery-dl -x -l 'chunk-size' -d 'Size of in-memory data chunks (default: 32k)' complete -c gallery-dl -l 'no-part' -d 'Do not use .part files' complete -c gallery-dl -l 'no-skip' -d 'Do not skip downloads; overwrite existing files' complete -c gallery-dl -l 'no-mtime' -d 'Do not set file modification times according to Last-Modified HTTP response headers' complete -c gallery-dl -l 'no-download' -d 'Do not download any files' complete -c gallery-dl -l 'no-postprocessors' -d 'Do not run any post processors' complete -c gallery-dl -l 'no-check-certificate' -d 'Disable HTTPS certificate validation' complete -c gallery-dl -x -s 'o' -l 'option' -d 'Additional options. 
Example: -o browser=firefox' complete -c gallery-dl -r -F -s 'c' -l 'config' -d 'Additional configuration files' complete -c gallery-dl -r -F -l 'config-yaml' -d 'Additional configuration files in YAML format' complete -c gallery-dl -r -F -l 'config-toml' -d 'Additional configuration files in TOML format' complete -c gallery-dl -l 'config-create' -d 'Create a basic configuration file' complete -c gallery-dl -l 'config-ignore' -d 'Do not read default configuration files' complete -c gallery-dl -l 'ignore-config' -d '==SUPPRESS==' complete -c gallery-dl -x -s 'u' -l 'username' -d 'Username to login with' complete -c gallery-dl -x -s 'p' -l 'password' -d 'Password belonging to the given username' complete -c gallery-dl -l 'netrc' -d 'Enable .netrc authentication data' complete -c gallery-dl -r -F -l 'download-archive' -d 'Record all downloaded or skipped files in FILE and skip downloading any file already in it' complete -c gallery-dl -x -s 'A' -l 'abort' -d 'Stop current extractor run after N consecutive file downloads were skipped' complete -c gallery-dl -x -s 'T' -l 'terminate' -d 'Stop current and parent extractor runs after N consecutive file downloads were skipped' complete -c gallery-dl -x -l 'range' -d 'Index range(s) specifying which files to download. These can be either a constant value, range, or slice (e.g. "5", "8-20", or "1:24:3")' complete -c gallery-dl -x -l 'chapter-range' -d 'Like "--range", but applies to manga chapters and other delegated URLs' complete -c gallery-dl -x -l 'filter' -d 'Python expression controlling which files to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by "-K". Example: --filter "image_width >= 1000 and rating in ("s", "q")"' complete -c gallery-dl -x -l 'chapter-filter' -d 'Like "--filter", but applies to manga chapters and other delegated URLs' complete -c gallery-dl -l 'zip' -d 'Store downloaded files in a ZIP archive' complete -c gallery-dl -l 'ugoira-conv' -d 'Convert Pixiv Ugoira to WebM (requires FFmpeg)' complete -c gallery-dl -l 'ugoira-conv-lossless' -d 'Convert Pixiv Ugoira to WebM in VP9 lossless mode' complete -c gallery-dl -l 'ugoira-conv-copy' -d 'Convert Pixiv Ugoira to MKV without re-encoding any frames' complete -c gallery-dl -l 'write-metadata' -d 'Write metadata to separate JSON files' complete -c gallery-dl -l 'write-info-json' -d 'Write gallery metadata to a info.json file' complete -c gallery-dl -l 'write-infojson' -d '==SUPPRESS==' complete -c gallery-dl -l 'write-tags' -d 'Write image tags to separate text files' complete -c gallery-dl -l 'mtime-from-date' -d 'Set file modification times according to "date" metadata' complete -c gallery-dl -x -l 'exec' -d 'Execute CMD for each downloaded file. Example: --exec "convert {} {}.png && rm {}"' complete -c gallery-dl -x -l 'exec-after' -d 'Execute CMD after all files were downloaded successfully. 
Example: --exec-after "cd {} && convert * ../doc.pdf"' complete -c gallery-dl -x -s 'P' -l 'postprocessor' -d 'Activate the specified post processor' complete -c gallery-dl -x -s 'O' -l 'postprocessor-option' -d 'Additional "=" post processor options' ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1678564859.9485197 gallery_dl-1.25.0/data/man/0000755000175000017500000000000014403156774014110 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1678564859.0 gallery_dl-1.25.0/data/man/gallery-dl.10000644000175000017500000001752014403156773016232 0ustar00mikemike.TH "GALLERY-DL" "1" "2023-03-11" "1.25.0" "gallery-dl Manual" .\" disable hyphenation .nh .SH NAME gallery-dl \- download image-galleries and -collections .SH SYNOPSIS .B gallery-dl [OPTION]... URL... .SH DESCRIPTION .B gallery-dl is a command-line program to download image-galleries and -collections from several image hosting sites. It is a cross-platform tool with many configuration options and powerful filenaming capabilities. .SH OPTIONS .TP .B "\-h, \-\-help" Print this help message and exit .TP .B "\-\-version" Print program version and exit .TP .B "\-i, \-\-input\-file" \f[I]FILE\f[] Download URLs found in FILE ('-' for stdin). More than one --input-file can be specified .TP .B "\-f, \-\-filename" \f[I]FORMAT\f[] Filename format string for downloaded files ('/O' for "original" filenames) .TP .B "\-d, \-\-destination" \f[I]PATH\f[] Target location for file downloads .TP .B "\-D, \-\-directory" \f[I]PATH\f[] Exact location for file downloads .TP .B "\-X, \-\-extractors" \f[I]PATH\f[] Load external extractors from PATH .TP .B "\-\-proxy" \f[I]URL\f[] Use the specified proxy .TP .B "\-\-source\-address" \f[I]IP\f[] Client-side IP address to bind to .TP .B "\-\-user\-agent" \f[I]UA\f[] User-Agent request header .TP .B "\-\-clear\-cache" \f[I]MODULE\f[] Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything) .TP .B "\-\-cookies" \f[I]FILE\f[] File to load additional cookies from .TP .B "\-\-cookies\-from\-browser" \f[I]BROWSER[+KEYRING][:PROFILE][::CONTAINER]\f[] Name of the browser to load cookies from, with optional keyring name prefixed with '+', profile prefixed with ':', and container prefixed with '::' ('none' for no container) .TP .B "\-q, \-\-quiet" Activate quiet mode .TP .B "\-v, \-\-verbose" Print various debugging information .TP .B "\-g, \-\-get\-urls" Print URLs instead of downloading .TP .B "\-G, \-\-resolve\-urls" Print URLs instead of downloading; resolve intermediary URLs .TP .B "\-j, \-\-dump\-json" Print JSON information .TP .B "\-s, \-\-simulate" Simulate data extraction; do not download anything .TP .B "\-E, \-\-extractor\-info" Print extractor defaults and settings .TP .B "\-K, \-\-list\-keywords" Print a list of available keywords and example values for the given URLs .TP .B "\-\-list\-modules" Print a list of available extractor modules .TP .B "\-\-list\-extractors" Print a list of extractor classes with description, (sub)category and example URL .TP .B "\-\-write\-log" \f[I]FILE\f[] Write logging output to FILE .TP .B "\-\-write\-unsupported" \f[I]FILE\f[] Write URLs, which get emitted by other extractors but cannot be handled, to FILE .TP .B "\-\-write\-pages" Write downloaded intermediary pages to files in the current directory to debug problems .TP .B "\-r, \-\-limit\-rate" \f[I]RATE\f[] Maximum download rate (e.g. 
500k or 2.5M) .TP .B "\-R, \-\-retries" \f[I]N\f[] Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4) .TP .B "\-\-http\-timeout" \f[I]SECONDS\f[] Timeout for HTTP connections (default: 30.0) .TP .B "\-\-sleep" \f[I]SECONDS\f[] Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5) .TP .B "\-\-sleep\-request" \f[I]SECONDS\f[] Number of seconds to wait between HTTP requests during data extraction .TP .B "\-\-sleep\-extractor" \f[I]SECONDS\f[] Number of seconds to wait before starting data extraction for an input URL .TP .B "\-\-filesize\-min" \f[I]SIZE\f[] Do not download files smaller than SIZE (e.g. 500k or 2.5M) .TP .B "\-\-filesize\-max" \f[I]SIZE\f[] Do not download files larger than SIZE (e.g. 500k or 2.5M) .TP .B "\-\-chunk\-size" \f[I]SIZE\f[] Size of in-memory data chunks (default: 32k) .TP .B "\-\-no\-part" Do not use .part files .TP .B "\-\-no\-skip" Do not skip downloads; overwrite existing files .TP .B "\-\-no\-mtime" Do not set file modification times according to Last-Modified HTTP response headers .TP .B "\-\-no\-download" Do not download any files .TP .B "\-\-no\-postprocessors" Do not run any post processors .TP .B "\-\-no\-check\-certificate" Disable HTTPS certificate validation .TP .B "\-o, \-\-option" \f[I]KEY=VALUE\f[] Additional options. Example: -o browser=firefox .TP .B "\-c, \-\-config" \f[I]FILE\f[] Additional configuration files .TP .B "\-\-config\-yaml" \f[I]FILE\f[] Additional configuration files in YAML format .TP .B "\-\-config\-toml" \f[I]FILE\f[] Additional configuration files in TOML format .TP .B "\-\-config\-create" Create a basic configuration file .TP .B "\-\-config\-ignore" Do not read default configuration files .TP .B "\-u, \-\-username" \f[I]USER\f[] Username to login with .TP .B "\-p, \-\-password" \f[I]PASS\f[] Password belonging to the given username .TP .B "\-\-netrc" Enable .netrc authentication data .TP .B "\-\-download\-archive" \f[I]FILE\f[] Record all downloaded or skipped files in FILE and skip downloading any file already in it .TP .B "\-A, \-\-abort" \f[I]N\f[] Stop current extractor run after N consecutive file downloads were skipped .TP .B "\-T, \-\-terminate" \f[I]N\f[] Stop current and parent extractor runs after N consecutive file downloads were skipped .TP .B "\-\-range" \f[I]RANGE\f[] Index range(s) specifying which files to download. These can be either a constant value, range, or slice (e.g. '5', '8-20', or '1:24:3') .TP .B "\-\-chapter\-range" \f[I]RANGE\f[] Like '--range', but applies to manga chapters and other delegated URLs .TP .B "\-\-filter" \f[I]EXPR\f[] Python expression controlling which files to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by '-K'. 
Example: --filter "image_width >= 1000 and rating in ('s', 'q')" .TP .B "\-\-chapter\-filter" \f[I]EXPR\f[] Like '--filter', but applies to manga chapters and other delegated URLs .TP .B "\-\-zip" Store downloaded files in a ZIP archive .TP .B "\-\-ugoira\-conv" Convert Pixiv Ugoira to WebM (requires FFmpeg) .TP .B "\-\-ugoira\-conv\-lossless" Convert Pixiv Ugoira to WebM in VP9 lossless mode .TP .B "\-\-ugoira\-conv\-copy" Convert Pixiv Ugoira to MKV without re-encoding any frames .TP .B "\-\-write\-metadata" Write metadata to separate JSON files .TP .B "\-\-write\-info\-json" Write gallery metadata to an info.json file .TP .B "\-\-write\-tags" Write image tags to separate text files .TP .B "\-\-mtime\-from\-date" Set file modification times according to 'date' metadata .TP .B "\-\-exec" \f[I]CMD\f[] Execute CMD for each downloaded file. Example: --exec "convert {} {}.png && rm {}" .TP .B "\-\-exec\-after" \f[I]CMD\f[] Execute CMD after all files were downloaded successfully. Example: --exec-after "cd {} && convert * ../doc.pdf" .TP .B "\-P, \-\-postprocessor" \f[I]NAME\f[] Activate the specified post processor .TP .B "\-O, \-\-postprocessor\-option" \f[I]OPT\f[] Additional 'key=value' post processor options .SH EXAMPLES .TP gallery-dl \f[I]URL\f[] Download images from \f[I]URL\f[]. .TP gallery-dl -g -u \f[I]USER\f[] -p \f[I]PASS\f[] \f[I]URL\f[] Print direct URLs from a site that requires authentication. .TP gallery-dl --filter 'type == "ugoira"' --range '2-4' \f[I]URL\f[] Apply filter and range expressions. This will only download the second, third, and fourth file whose type value is equal to "ugoira". .TP gallery-dl r:\f[I]URL\f[] Scan \f[I]URL\f[] for other URLs and invoke \f[B]gallery-dl\f[] on them. .TP gallery-dl oauth:\f[I]SITE\-NAME\f[] Gain OAuth authentication tokens for .IR deviantart , .IR flickr , .IR reddit , .IR smugmug ", and" .IR tumblr . .SH FILES .TP .I /etc/gallery-dl.conf The system wide configuration file. .TP .I ~/.config/gallery-dl/config.json Per user configuration file. .TP .I ~/.gallery-dl.conf Alternate per user configuration file. .SH BUGS https://github.com/mikf/gallery-dl/issues .SH AUTHORS Mike Fährmann .br and https://github.com/mikf/gallery-dl/graphs/contributors .SH "SEE ALSO" .BR gallery-dl.conf (5)
.TH "GALLERY-DL.CONF" "5" "2023-03-11" "1.25.0" "gallery-dl Manual" .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .SH NAME gallery-dl.conf \- gallery-dl configuration file .SH DESCRIPTION gallery-dl will search for configuration files in the following places every time it is started, unless .B --config-ignore is specified: .PP .RS 4 .nf .I /etc/gallery-dl.conf .I $HOME/.config/gallery-dl/config.json .I $HOME/.gallery-dl.conf .fi .RE .PP It is also possible to specify additional configuration files with the .B -c/--config command-line option or to add further option values with .B -o/--option as key=value pairs. Configuration files are JSON-based and therefore don't allow any ordinary comments, but, since unused keys are simply ignored, it is possible to utilize those as makeshift comments by setting their values to arbitrary strings.
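For instance, a makeshift comment of this kind could look as follows (a minimal sketch; the key name "#comment" is arbitrary and carries no special meaning, and the \f[I]_comment\f[] key in the EXAMPLE section below uses the same technique):
.br
.. code:: json

{
    "#comment": "personal settings for gallery-dl",
    "base-directory": "/tmp/"
}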
.SH EXAMPLE { .RS 4 "base-directory": "/tmp/", .br "extractor": { .RS 4 "pixiv": { .RS 4 "directory": ["Pixiv", "Works", "{user[id]}"], .br "filename": "{id}{num}.{extension}", .br "username": "foo", .br "password": "bar" .RE }, .br "flickr": { .RS 4 "_comment": "OAuth keys for account 'foobar'", .br "access-token": "0123456789-0123456789abcdef", .br "access-token-secret": "fedcba9876543210" .RE } .RE }, .br "downloader": { .RS 4 "retries": 3, .br "timeout": 2.5 .RE } .RE } .SH EXTRACTOR OPTIONS .SS extractor.*.filename .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (condition -> \f[I]format string\f[]) .IP "Example:" 4 .. code:: json "{manga}_c{chapter}_{page:>03}.{extension}" .. code:: json { "extension == 'mp4'": "{id}_video.{extension}", "'nature' in title" : "{id}_{title}.{extension}", "" : "{id}_default.{extension}" } .IP "Description:" 4 A \f[I]format string\f[] to build filenames for downloaded files with. If this is an \f[I]object\f[], it must contain Python expressions mapping to the filename format strings to use. These expressions are evaluated in the order as specified in Python 3.6+ and in an undetermined order in Python 3.4 and 3.5. The available replacement keys depend on the extractor used. A list of keys for a specific one can be acquired by calling *gallery-dl* with the \f[I]-K\f[]/\f[I]--list-keywords\f[] command-line option. For example: .. code:: $ gallery-dl -K http://seiga.nicovideo.jp/seiga/im5977527 Keywords for directory names: category seiga subcategory image Keywords for filenames: category seiga extension None image-id 5977527 subcategory image Note: Even if the value of the \f[I]extension\f[] key is missing or \f[I]None\f[], it will be filled in later when the file download is starting. This key is therefore always available to provide a valid filename extension. .SS extractor.*.directory .IP "Type:" 6 .br * \f[I]list\f[] of \f[I]strings\f[] .br * \f[I]object\f[] (condition -> \f[I]format strings\f[]) .IP "Example:" 4 .. code:: json ["{category}", "{manga}", "c{chapter} - {title}"] .. code:: json { "'nature' in content": ["Nature Pictures"], "retweet_id != 0" : ["{category}", "{user[name]}", "Retweets"], "" : ["{category}", "{user[name]}"] } .IP "Description:" 4 A list of \f[I]format strings\f[] to build target directory paths with. If this is an \f[I]object\f[], it must contain Python expressions mapping to the list of format strings to use. Each individual string in such a list represents a single path segment, which will be joined together and appended to the \f[I]base-directory\f[] to form the complete target directory path. .SS extractor.*.base-directory .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"./gallery-dl/"\f[] .IP "Description:" 4 Directory path used as base for all download destinations. .SS extractor.*.parent-directory .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Use an extractor's current target directory as \f[I]base-directory\f[] for any spawned child extractors. .SS extractor.*.parent-metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 If \f[I]true\f[], overwrite any metadata provided by a child extractor with its parent's. If this is a \f[I]string\f[], add a parent's metadata to its children's .br to a field named after said string. For example with \f[I]"parent-metadata": "_p_"\f[]: .br .. 
code:: json { "id": "child-id", "_p_": {"id": "parent-id"} } .SS extractor.*.parent-skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Share number of skipped downloads between parent and child extractors. .SS extractor.*.path-restrict .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (character -> replacement character(s)) .IP "Default:" 9 \f[I]"auto"\f[] .IP "Example:" 4 .br * "/!? (){}" .br * {" ": "_", "/": "-", "|": "-", ":": "-", "*": "+"} .IP "Description:" 4 A string of characters to be replaced with the value of .br \f[I]path-replace\f[] or an object mapping invalid/unwanted characters to their replacements .br for generated path segment names. .br Special values: .br * \f[I]"auto"\f[]: Use characters from \f[I]"unix"\f[] or \f[I]"windows"\f[] depending on the local operating system .br * \f[I]"unix"\f[]: \f[I]"/"\f[] .br * \f[I]"windows"\f[]: \f[I]"\\\\\\\\|/<>:\\"?*"\f[] .br * \f[I]"ascii"\f[]: \f[I]"^0-9A-Za-z_."\f[] Note: In a string with 2 or more characters, \f[I][]^-\\\f[] need to be escaped with backslashes, e.g. \f[I]"\\\\[\\\\]"\f[] .SS extractor.*.path-replace .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"_"\f[] .IP "Description:" 4 The replacement character(s) for \f[I]path-restrict\f[] .SS extractor.*.path-remove .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"\\u0000-\\u001f\\u007f"\f[] (ASCII control characters) .IP "Description:" 4 Set of characters to remove from generated path names. Note: In a string with 2 or more characters, \f[I][]^-\\\f[] need to be escaped with backslashes, e.g. \f[I]"\\\\[\\\\]"\f[] .SS extractor.*.path-strip .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Set of characters to remove from the end of generated path segment names using \f[I]str.rstrip()\f[] Special values: .br * \f[I]"auto"\f[]: Use characters from \f[I]"unix"\f[] or \f[I]"windows"\f[] depending on the local operating system .br * \f[I]"unix"\f[]: \f[I]""\f[] .br * \f[I]"windows"\f[]: \f[I]". "\f[] .SS extractor.*.path-extended .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 On Windows, use \f[I]extended-length paths\f[] prefixed with \f[I]\\\\?\\\f[] to work around the 260 characters path length limit. .SS extractor.*.extension-map .IP "Type:" 6 \f[I]object\f[] (extension -> replacement) .IP "Default:" 9 .. code:: json { "jpeg": "jpg", "jpe" : "jpg", "jfif": "jpg", "jif" : "jpg", "jfi" : "jpg" } .IP "Description:" 4 A JSON \f[I]object\f[] mapping filename extensions to their replacements. .SS extractor.*.skip .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the behavior when downloading files that have been downloaded before, i.e. a file with the same filename already exists or its ID is in a \f[I]download archive\f[]. 
.br * \f[I]true\f[]: Skip downloads .br * \f[I]false\f[]: Overwrite already existing files .br * \f[I]"abort"\f[]: Stop the current extractor run .br * \f[I]"abort:N"\f[]: Skip downloads and stop the current extractor run after \f[I]N\f[] consecutive skips .br * \f[I]"terminate"\f[]: Stop the current extractor run, including parent extractors .br * \f[I]"terminate:N"\f[]: Skip downloads and stop the current extractor run, including parent extractors, after \f[I]N\f[] consecutive skips .br * \f[I]"exit"\f[]: Exit the program altogether .br * \f[I]"exit:N"\f[]: Skip downloads and exit the program after \f[I]N\f[] consecutive skips .br * \f[I]"enumerate"\f[]: Add an enumeration index to the beginning of the filename extension (\f[I]file.1.ext\f[], \f[I]file.2.ext\f[], etc.) .SS extractor.*.sleep .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Number of seconds to sleep before each download. .SS extractor.*.sleep-extractor .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Number of seconds to sleep before handling an input URL, i.e. before starting a new extractor. .SS extractor.*.sleep-request .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Minimal time interval in seconds between each HTTP request during data extraction. .SS extractor.*.username & .password .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The username and password to use when attempting to log in to another site. Specifying a username and password is required for .br * \f[I]nijie\f[] and optional for .br * \f[I]aibooru\f[] (*) .br * \f[I]aryion\f[] .br * \f[I]atfbooru\f[] (*) .br * \f[I]danbooru\f[] (*) .br * \f[I]e621\f[] (*) .br * \f[I]e926\f[] (*) .br * \f[I]exhentai\f[] .br * \f[I]idolcomplex\f[] .br * \f[I]imgbb\f[] .br * \f[I]inkbunny\f[] .br * \f[I]kemonoparty\f[] .br * \f[I]mangadex\f[] .br * \f[I]mangoxo\f[] .br * \f[I]pillowfort\f[] .br * \f[I]sankaku\f[] .br * \f[I]seisoparty\f[] .br * \f[I]subscribestar\f[] .br * \f[I]tapas\f[] .br * \f[I]tsumino\f[] .br * \f[I]twitter\f[] .br * \f[I]zerochan\f[] These values can also be specified via the \f[I]-u/--username\f[] and \f[I]-p/--password\f[] command-line options or by using a \f[I].netrc\f[] file. (see Authentication_) (*) The password value for these sites should be the API key found in your user profile, not the actual account password. .SS extractor.*.netrc .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Enable the use of \f[I].netrc\f[] authentication data. .SS extractor.*.cookies .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]object\f[] (name -> value) .br * \f[I]list\f[] .IP "Description:" 4 Source to read additional cookies from. This can be .br * The \f[I]Path\f[] to a Mozilla/Netscape format cookies.txt file .. code:: json "~/.local/share/cookies-instagram-com.txt" .br * An \f[I]object\f[] specifying cookies as name-value pairs .. code:: json { "cookie-name": "cookie-value", "sessionid" : "14313336321%3AsabDFvuASDnlpb%3A31", "isAdult" : "1" } .br * A \f[I]list\f[] with up to 4 entries specifying a browser profile. .br * The first entry is the browser name .br * The optional second entry is a profile name or an absolute path to a profile directory .br * The optional third entry is the keyring to retrieve passwords for decrypting cookies from .br * The optional fourth entry is a (Firefox) container name (\f[I]"none"\f[] for only cookies with no container) .. 
code:: json ["firefox"] ["firefox", null, null, "Personal"] ["chromium", "Private", "kwallet"] .SS extractor.*.cookies-update .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 If \f[I]extractor.*.cookies\f[] specifies the \f[I]Path\f[] of a cookies.txt file and it can be opened and parsed without errors, update its contents with cookies received during data extraction. .SS extractor.*.proxy .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (scheme -> proxy) .IP "Example:" 4 .. code:: json "http://10.10.1.10:3128" .. code:: json { "http" : "http://10.10.1.10:3128", "https": "http://10.10.1.10:1080", "http://10.20.1.128": "http://10.10.1.10:5323" } .IP "Description:" 4 Proxy (or proxies) to be used for remote connections. .br * If this is a \f[I]string\f[], it is the proxy URL for all outgoing requests. .br * If this is an \f[I]object\f[], it is a scheme-to-proxy mapping to specify different proxy URLs for each scheme. It is also possible to set a proxy for a specific host by using \f[I]scheme://host\f[] as key. See \f[I]Requests' proxy documentation\f[] for more details. Note: If a proxy URLs does not include a scheme, \f[I]http://\f[] is assumed. .SS extractor.*.source-address .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] with 1 \f[I]string\f[] and 1 \f[I]integer\f[] as elements .IP "Example:" 4 .br * "192.168.178.20" .br * ["192.168.178.20", 8080] .IP "Description:" 4 Client-side IP address to bind to. Can be either a simple \f[I]string\f[] with just the local IP address .br or a \f[I]list\f[] with IP and explicit port number as elements. .br .SS extractor.*.user-agent .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0"\f[] .IP "Description:" 4 User-Agent header value to be used for HTTP requests. Setting this value to \f[I]"browser"\f[] will try to automatically detect and use the User-Agent used by the system's default browser. Note: This option has no effect on pixiv, e621, and mangadex extractors, as these need specific values to function correctly. .SS extractor.*.browser .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 .br * \f[I]"firefox"\f[] for \f[I]patreon\f[], \f[I]mangapark\f[], and \f[I]mangasee\f[] .br * \f[I]null\f[] everywhere else .IP "Example:" 4 .br * "chrome:macos" .IP "Description:" 4 Try to emulate a real browser (\f[I]firefox\f[] or \f[I]chrome\f[]) by using their default HTTP headers and TLS ciphers for HTTP requests. Optionally, the operating system used in the \f[I]User-Agent\f[] header can be specified after a \f[I]:\f[] (\f[I]windows\f[], \f[I]linux\f[], or \f[I]macos\f[]). Note: \f[I]requests\f[] and \f[I]urllib3\f[] only support HTTP/1.1, while a real browser would use HTTP/2. .SS extractor.*.headers .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Default:" 9 .. code:: json { "User-Agent" : "", "Accept" : "*/*", "Accept-Language": "en-US,en;q=0.5", "Accept-Encoding": "gzip, deflate" } .IP "Description:" 4 Additional \f[I]HTTP headers\f[] to be sent with each HTTP request, To disable sending a header, set its value to \f[I]null\f[]. .SS extractor.*.ciphers .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .. 
code:: json ["ECDHE-ECDSA-AES128-GCM-SHA256", "ECDHE-RSA-AES128-GCM-SHA256", "ECDHE-ECDSA-CHACHA20-POLY1305", "ECDHE-RSA-CHACHA20-POLY1305"] .IP "Description:" 4 List of TLS/SSL cipher suites in \f[I]OpenSSL cipher list format\f[] to be passed to \f[I]ssl.SSLContext.set_ciphers()\f[] .SS extractor.*.keywords .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 {"type": "Pixel Art", "type_id": 123} .IP "Description:" 4 Additional name-value pairs to be added to each metadata dictionary. .SS extractor.*.keywords-default .IP "Type:" 6 any .IP "Default:" 9 \f[I]"None"\f[] .IP "Description:" 4 Default value used for missing or undefined keyword names in \f[I]format strings\f[]. .SS extractor.*.url-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Insert a file's download URL into its metadata dictionary as the given name. For example, setting this option to \f[I]"gdl_file_url"\f[] will cause a new metadata field with name \f[I]gdl_file_url\f[] to appear, which contains the current file's download URL. This can then be used in \f[I]filenames\f[], with a \f[I]metadata\f[] post processor, etc. .SS extractor.*.path-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Insert a reference to the current \f[I]PathFormat\f[] data structure into metadata dictionaries as the given name. For example, setting this option to \f[I]"gdl_path"\f[] would make it possible to access the current file's filename as \f[I]"{gdl_path.filename}"\f[]. .SS extractor.*.http-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Insert an \f[I]object\f[] containing a file's HTTP headers and \f[I]filename\f[], \f[I]extension\f[], and \f[I]date\f[] parsed from them into metadata dictionaries as the given name. For example, setting this option to \f[I]"gdl_http"\f[] would make it possible to access the current file's \f[I]Last-Modified\f[] header as \f[I]"{gdl_http[Last-Modified]}"\f[] and its parsed form as \f[I]"{gdl_http[date]}"\f[]. .SS extractor.*.version-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Insert an \f[I]object\f[] containing gallery-dl's version info into metadata dictionaries as the given name. The content of the object is as follows: .. code:: json { "version" : "string", "is_executable" : "bool", "current_git_head": "string or null" } .SS extractor.*.category-transfer .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 Extractor-specific .IP "Description:" 4 Transfer an extractor's (sub)category values to all child extractors spawned by it, to let them inherit their parent's config options. .SS extractor.*.blacklist & .whitelist .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["oauth", "recursive", "test"]\f[] + current extractor category .IP "Example:" 4 ["imgur", "gfycat:user", "*:image"] .IP "Description:" 4 A list of extractor identifiers to ignore (or allow) when spawning child extractors for unknown URLs, e.g. from \f[I]reddit\f[] or \f[I]plurk\f[]. Each identifier can be .br * A category or basecategory name (\f[I]"imgur"\f[], \f[I]"mastodon"\f[]) .br * | A (base)category-subcategory pair, where both names are separated by a colon (\f[I]"gfycat:user"\f[]). Both names can be a * or left empty, matching all possible names (\f[I]"*:image"\f[], \f[I]":user"\f[]). .br Note: Any \f[I]blacklist\f[] setting will automatically include \f[I]"oauth"\f[], \f[I]"recursive"\f[], and \f[I]"test"\f[]. 
.SS extractor.*.archive .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "$HOME/.archives/{category}.sqlite3" .IP "Description:" 4 File to store IDs of downloaded files in. Downloads of files already recorded in this archive file will be \f[I]skipped\f[]. The resulting archive file is not a plain text file but an SQLite3 database, as either lookup operations are significantly faster or memory requirements are significantly lower when the amount of stored IDs gets reasonably large. Note: Archive files that do not already exist get generated automatically. Note: Archive paths support regular \f[I]format string\f[] replacements, but be aware that using external inputs for building local paths may pose a security risk. .SS extractor.*.archive-format .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "{id}_{offset}" .IP "Description:" 4 An alternative \f[I]format string\f[] to build archive IDs with. .SS extractor.*.archive-prefix .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"{category}"\f[] .IP "Description:" 4 Prefix for archive IDs. .SS extractor.*.archive-pragma .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["journal_mode=WAL", "synchronous=NORMAL"] .IP "Description:" 4 A list of SQLite \f[I]PRAGMA\f[] statements to run during archive initialization. See \f[I]\f[] for available \f[I]PRAGMA\f[] statements and further details. .SS extractor.*.postprocessors .IP "Type:" 6 \f[I]list\f[] of \f[I]Postprocessor Configuration\f[] objects .IP "Example:" 4 .. code:: json [ { "name": "zip" , "compression": "store" }, { "name": "exec", "command": ["/home/foobar/script", "{category}", "{image_id}"] } ] .IP "Description:" 4 A list of \f[I]post processors\f[] to be applied to each downloaded file in the specified order. Unlike other options, a \f[I]postprocessors\f[] setting at a deeper level .br does not override any \f[I]postprocessors\f[] setting at a lower level. Instead, all post processors from all applicable \f[I]postprocessors\f[] .br settings get combined into a single list. For example .br * an \f[I]mtime\f[] post processor at \f[I]extractor.postprocessors\f[], .br * a \f[I]zip\f[] post processor at \f[I]extractor.pixiv.postprocessors\f[], .br * and using \f[I]--exec\f[] will run all three post processors - \f[I]mtime\f[], \f[I]zip\f[], \f[I]exec\f[] - for each downloaded \f[I]pixiv\f[] file. .SS extractor.*.postprocessor-options .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 .. code:: json { "archive": null, "keep-files": true } .IP "Description:" 4 Additional \f[I]Postprocessor Options\f[] that get added to each individual \f[I]post processor object\f[] before initializing it and evaluating filters. .SS extractor.*.retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]4\f[] .IP "Description:" 4 Maximum number of times a failed HTTP request is retried before giving up, or \f[I]-1\f[] for infinite retries. .SS extractor.*.retry-codes .IP "Type:" 6 \f[I]list\f[] of \f[I]integers\f[] .IP "Example:" 4 [404, 429, 430] .IP "Description:" 4 Additional \f[I]HTTP response status codes\f[] to retry an HTTP request on. \f[I]2xx\f[] codes (success responses) and \f[I]3xx\f[] codes (redirection messages) will never be retried and always count as success, regardless of this option. \f[I]5xx\f[] codes (server error responses) will always be retried, regardless of this option. 
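As a sketch, applying the example codes above to all extractors could look like this (the codes themselves are illustrative, not recommendations):
.br
.. code:: json

{
    "extractor": {
        "retry-codes": [404, 429, 430]
    }
}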
.SS extractor.*.timeout .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]30.0\f[] .IP "Description:" 4 Amount of time (in seconds) to wait for a successful connection and response from a remote server. This value gets internally used as the \f[I]timeout\f[] parameter for the \f[I]requests.request()\f[] method. .SS extractor.*.verify .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to verify SSL/TLS certificates for HTTPS requests. If this is a \f[I]string\f[], it must be the path to a CA bundle to use instead of the default certificates. This value gets internally used as the \f[I]verify\f[] parameter for the \f[I]requests.request()\f[] method. .SS extractor.*.download .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to download media files. Setting this to \f[I]false\f[] won't download any files, but all other functions (\f[I]postprocessors\f[], \f[I]download archive\f[], etc.) will be executed as normal. .SS extractor.*.fallback .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Use fallback download URLs when a download fails. .SS extractor.*.image-range .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Examples:" 4 .br * \f[I]"10-20"\f[] .br * \f[I]"-5, 10, 30-50, 100-"\f[] .br * \f[I]"10:21, 30:51:2, :5, 100:"\f[] .br * \f[I]["-5", "10", "30-50", "100-"]\f[] .IP "Description:" 4 Index range(s) selecting which files to download. These can be specified as .br * index: \f[I]3\f[] (file number 3) .br * range: \f[I]2-4\f[] (files 2, 3, and 4) .br * \f[I]slice\f[]: \f[I]3:8:2\f[] (files 3, 5, and 7) Arguments for range and slice notation are optional .br and will default to begin (\f[I]1\f[]) or end (\f[I]sys.maxsize\f[]) if omitted. For example \f[I]5-\f[], \f[I]5:\f[], and \f[I]5::\f[] all mean "Start at file number 5". .br Note: The index of the first file is \f[I]1\f[]. .SS extractor.*.chapter-range .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Like \f[I]image-range\f[], but applies to delegated URLs like manga chapters, etc. .SS extractor.*.image-filter .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Examples:" 4 .br * \f[I]"re.search(r'foo(bar)+', description)"\f[] .br * \f[I]["width >= 1200", "width/height > 1.2"]\f[] .IP "Description:" 4 Python expression controlling which files to download. A file only gets downloaded when *all* of the given expressions evaluate to \f[I]True\f[]. Available values are the filename-specific ones listed by \f[I]-K\f[] or \f[I]-j\f[]. .SS extractor.*.chapter-filter .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Examples:" 4 .br * \f[I]"lang == 'en'"\f[] .br * \f[I]["language == 'French'", "10 <= chapter < 20"]\f[] .IP "Description:" 4 Like \f[I]image-filter\f[], but applies to delegated URLs like manga chapters, etc. .SS extractor.*.image-unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Ignore image URLs that have been encountered before during the current extractor run. .SS extractor.*.chapter-unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Like \f[I]image-unique\f[], but applies to delegated URLs like manga chapters, etc. .SS extractor.*.date-format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"%Y-%m-%dT%H:%M:%S"\f[] .IP "Description:" 4 Format string used to parse \f[I]string\f[] values of date-min and date-max. 
See \f[I]strptime\f[] for a list of formatting directives. Note: Despite its name, this option does **not** control how \f[I]{date}\f[] metadata fields are formatted. To use a different formatting for those values other than the default \f[I]%Y-%m-%d %H:%M:%S\f[], put \f[I]strptime\f[] formatting directives after a colon \f[I]:\f[], for example \f[I]{date:%Y%m%d}\f[]. .SH EXTRACTOR-SPECIFIC OPTIONS .SS extractor.artstation.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Try to follow external URLs of embedded players. .SS extractor.artstation.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Limit the number of posts/projects to download. .SS extractor.artstation.search.pro-first .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable the "Show Studio and Pro member artwork first" checkbox when retrieving search results. .SS extractor.aryion.recursive .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the post extraction strategy. .br * \f[I]true\f[]: Start on users' main gallery pages and recursively descend into subfolders .br * \f[I]false\f[]: Get posts from "Latest Updates" pages .SS extractor.bbc.width .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]1920\f[] .IP "Description:" 4 Specifies the requested image width. This value must be divisible by 16 and gets rounded down otherwise. The maximum possible value appears to be \f[I]1920\f[]. .SS extractor.blogger.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download embedded videos hosted on https://www.blogger.com/ .SS extractor.cyberdrop.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "cyberdrop.to" .IP "Description:" 4 Specifies the domain used by \f[I]cyberdrop\f[] regardless of input URL. Setting this option to \f[I]"auto"\f[] uses the same domain as a given input URL. .SS extractor.danbooru.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For unavailable or restricted posts, follow the \f[I]source\f[] and download from there if possible. .SS extractor.danbooru.ugoira .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls the download target for Ugoira posts. .br * \f[I]true\f[]: Original ZIP archives .br * \f[I]false\f[]: Converted video files .SS extractor.[Danbooru].metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * replacements,comments,ai_tags .br * ["replacements", "comments", "ai_tags"] .IP "Description:" 4 Extract additional metadata (notes, artist commentary, parent, children, uploader). It is possible to specify a custom list of metadata includes. See \f[I]available_includes\f[] for possible field names. \f[I]aibooru\f[] also supports \f[I]ai_metadata\f[]. Note: This requires 1 additional HTTP request per post. .SS extractor.[Danbooru].threshold .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Stop paginating over API results if the length of a batch of returned posts is less than the specified number. Defaults to the per-page limit of the current instance, which is 200. Note: Changing this setting is normally not necessary. When the value is greater than the per-page limit, gallery-dl will stop after the first batch.
The value cannot be less than 1. .SS extractor.derpibooru.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Derpibooru API Key\f[], to use your account's browsing settings and filters. .SS extractor.derpibooru.filter .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]56027\f[] (\f[I]Everything\f[] filter) .IP "Description:" 4 The content filter ID to use. Setting an explicit filter ID overrides any default filters and can be used to access 18+ content without \f[I]API Key\f[]. See \f[I]Filters\f[] for details. .SS extractor.deviantart.auto-watch .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Automatically watch users when encountering "Watchers-Only Deviations" (requires a \f[I]refresh-token\f[]). .SS extractor.deviantart.auto-unwatch .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 After watching a user through \f[I]auto-watch\f[], unwatch that user at the end of the current extractor run. .SS extractor.deviantart.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]comments\f[] metadata. .SS extractor.deviantart.extra .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download extra Sta.sh resources from description texts and journals. Note: Enabling this option also enables deviantart.metadata_. .SS extractor.deviantart.flat .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Select the directory structure created by the Gallery- and Favorite-Extractors. .br * \f[I]true\f[]: Use a flat directory structure. .br * \f[I]false\f[]: Collect a list of all gallery-folders or favorites-collections and transfer any further work to other extractors (\f[I]folder\f[] or \f[I]collection\f[]), which will then create individual subdirectories for each of them. Note: Going through all gallery folders will not be able to fetch deviations which aren't in any folder. .SS extractor.deviantart.folders .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Provide a \f[I]folders\f[] metadata field that contains the names of all folders a deviation is present in. Note: Gathering this information requires a lot of API calls. Use with caution. .SS extractor.deviantart.group .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check whether the profile name in a given URL belongs to a group or a regular user. .SS extractor.deviantart.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Example:" 4 .br * "favorite,journal,scraps" .br * ["favorite", "journal", "scraps"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"gallery"\f[], \f[I]"scraps"\f[], \f[I]"journal"\f[], \f[I]"favorite"\f[], \f[I]"status"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.deviantart.journals .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"html"\f[] .IP "Description:" 4 Selects the output format for textual content. This includes journals, literature and status updates. .br * \f[I]"html"\f[]: HTML with (roughly) the same layout as on DeviantArt. .br * \f[I]"text"\f[]: Plain text with image references and HTML tags removed. .br * \f[I]"none"\f[]: Don't download textual content. 
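A combined sketch of some of the deviantart options described above (the chosen values are illustrative, not defaults):
.br
.. code:: json

{
    "extractor": {
        "deviantart": {
            "include": ["gallery", "scraps", "journal"],
            "journals": "text",
            "flat": false
        }
    }
}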
.SS extractor.deviantart.mature .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable mature content. This option simply sets the \f[I]mature_content\f[] parameter for API calls to either \f[I]"true"\f[] or \f[I]"false"\f[] and does not do any other form of content filtering. .SS extractor.deviantart.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Request extended metadata for deviation objects to additionally provide \f[I]description\f[], \f[I]tags\f[], \f[I]license\f[] and \f[I]is_watching\f[] fields. .SS extractor.deviantart.original .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download original files if available. Setting this option to \f[I]"images"\f[] only downloads original files if they are images and falls back to preview versions for everything else (archives, etc.). .SS extractor.deviantart.pagination .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"api"\f[] .IP "Description:" 4 Controls when to stop paginating over API results. .br * \f[I]"api"\f[]: Trust the API and stop when \f[I]has_more\f[] is \f[I]false\f[]. .br * \f[I]"manual"\f[]: Disregard \f[I]has_more\f[] and only stop when a batch of results is empty. .SS extractor.deviantart.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from \f[I]linking your DeviantArt account to gallery-dl\f[]. Using a \f[I]refresh-token\f[] allows you to access private or otherwise not publicly available deviations. Note: The \f[I]refresh-token\f[] becomes invalid \f[I]after 3 months\f[] or whenever your \f[I]cache file\f[] is deleted or cleared. .SS extractor.deviantart.wait-min .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Minimum wait time in seconds before API requests. .SS extractor.[E621].metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * notes,pools .br * ["notes", "pools" .IP "Description:" 4 Extract additional metadata (notes, pool metadata) if available. Note: This requires 0-2 additional HTTP requests per post. .SS extractor.[E621].threshold .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Stop paginating over API results if the length of a batch of returned posts is less than the specified number. Defaults to the per-page limit of the current instance, which is 320. Note: Changing this setting is normally not necessary. When the value is greater than the per-page limit, gallery-dl will stop after the first batch. The value cannot be less than 1. .SS extractor.exhentai.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 .br * \f[I]"auto"\f[]: Use \f[I]e-hentai.org\f[] or \f[I]exhentai.org\f[] depending on the input URL .br * \f[I]"e-hentai.org"\f[]: Use \f[I]e-hentai.org\f[] for all URLs .br * \f[I]"exhentai.org"\f[]: Use \f[I]exhentai.org\f[] for all URLs .SS extractor.exhentai.limits .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Sets a custom image download limit and stops extraction when it gets exceeded. .SS extractor.exhentai.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Load extended gallery metadata from the \f[I]API\f[]. 
Adds \f[I]archiver_key\f[], \f[I]posted\f[], and \f[I]torrents\f[]. Makes \f[I]date\f[] and \f[I]filesize\f[] more precise. .SS extractor.exhentai.original .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download full-sized original images if available. .SS extractor.exhentai.source .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Description:" 4 Selects an alternative source to download files from. .br * \f[I]"hitomi"\f[]: Download the corresponding gallery from \f[I]hitomi.la\f[] .SS extractor.fanbox.embeds .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control behavior on embedded content from external sites. .br * \f[I]true\f[]: Extract embed URLs and download them if supported (videos are not downloaded). .br * \f[I]"ytdl"\f[]: Like \f[I]true\f[], but let \f[I]youtube-dl\f[] handle video extraction and download for YouTube, Vimeo and SoundCloud embeds. .br * \f[I]false\f[]: Ignore embeds. .SS extractor.flickr.access-token & .access-token-secret .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]access_token\f[] and \f[I]access_token_secret\f[] values you get from \f[I]linking your Flickr account to gallery-dl\f[]. .SS extractor.flickr.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract and download videos. .SS extractor.flickr.size-max .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Sets the maximum allowed size for downloaded images. .br * If this is an \f[I]integer\f[], it specifies the maximum image dimension (width and height) in pixels. .br * If this is a \f[I]string\f[], it should be one of Flickr's format specifiers (\f[I]"Original"\f[], \f[I]"Large"\f[], ... or \f[I]"o"\f[], \f[I]"k"\f[], \f[I]"h"\f[], \f[I]"l"\f[], ...) to use as an upper limit. .SS extractor.furaffinity.descriptions .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"text"\f[] .IP "Description:" 4 Controls the format of \f[I]description\f[] metadata fields. .br * \f[I]"text"\f[]: Plain text with HTML tags removed .br * \f[I]"html"\f[]: Raw HTML content .SS extractor.furaffinity.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow external URLs linked in descriptions. .SS extractor.furaffinity.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Example:" 4 .br * "scraps,favorite" .br * ["scraps", "favorite"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"gallery"\f[], \f[I]"scraps"\f[], \f[I]"favorite"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.furaffinity.layout .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Selects which site layout to expect when parsing posts. .br * \f[I]"auto"\f[]: Automatically differentiate between \f[I]"old"\f[] and \f[I]"new"\f[] .br * \f[I]"old"\f[]: Expect the *old* site layout .br * \f[I]"new"\f[]: Expect the *new* site layout .SS extractor.gelbooru.api-key & .user-id .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Values from the API Access Credentials section found at the bottom of your \f[I]Account Options\f[] page. 
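For example (a minimal sketch; both values are placeholders to be replaced with the credentials from your own Account Options page):
.br
.. code:: json

{
    "extractor": {
        "gelbooru": {
            "api-key": "your-api-key",
            "user-id": "your-user-id"
        }
    }
}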
.SS extractor.generic.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Match **all** URLs not otherwise supported by gallery-dl, even ones without a \f[I]generic:\f[] prefix. .SS extractor.gfycat.format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["mp4", "webm", "mobile", "gif"]\f[] .IP "Description:" 4 List of names of the preferred animation format, which can be \f[I]"mp4"\f[], \f[I]"webm"\f[], \f[I]"mobile"\f[], \f[I]"gif"\f[], or \f[I]"webp"\f[]. If a selected format is not available, the next one in the list will be tried until an available format is found. If the format is given as \f[I]string\f[], it will be extended with \f[I]["mp4", "webm", "mobile", "gif"]\f[]. Use a list with one element to restrict it to only one possible format. .SS extractor.gofile.api-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 API token value found at the bottom of your \f[I]profile page\f[]. If not set, a temporary guest token will be used. .SS extractor.gofile.website-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"12345"\f[] .IP "Description:" 4 API token value used during API requests. A not up-to-date value will result in \f[I]401 Unauthorized\f[] errors. Setting this value to \f[I]null\f[] will do an extra HTTP request to fetch the current value used by gofile. .SS extractor.gofile.recursive .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Recursively download files from subfolders. .SS extractor.hentaifoundry.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"pictures"\f[] .IP "Example:" 4 .br * "scraps,stories" .br * ["scraps", "stories"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"pictures"\f[], \f[I]"scraps"\f[], \f[I]"stories"\f[], \f[I]"favorite"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.hitomi.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"webp"\f[] .IP "Description:" 4 Selects which image format to download. Available formats are \f[I]"webp"\f[] and \f[I]"avif"\f[]. \f[I]"original"\f[] will try to download the original \f[I]jpg\f[] or \f[I]png\f[] versions, but is most likely going to fail with \f[I]403 Forbidden\f[] errors. .SS extractor.imgur.mp4 .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to choose the GIF or MP4 version of an animation. .br * \f[I]true\f[]: Follow Imgur's advice and choose MP4 if the \f[I]prefer_video\f[] flag in an image's metadata is set. .br * \f[I]false\f[]: Always choose GIF. .br * \f[I]"always"\f[]: Always choose MP4. .SS extractor.inkbunny.orderby .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"create_datetime"\f[] .IP "Description:" 4 Value of the \f[I]orderby\f[] parameter for submission searches. (See \f[I]API#Search\f[] for details) .SS extractor.instagram.api .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"rest"\f[] .IP "Description:" 4 Selects which API endpoints to use. 
.br * \f[I]"rest"\f[]: REST API - higher-resolution media .br * \f[I]"graphql"\f[]: GraphQL API - lower-resolution media .SS extractor.instagram.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"posts"\f[] .IP "Example:" 4 .br * "stories,highlights,posts" .br * ["stories", "highlights", "posts"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"posts"\f[], \f[I]"reels"\f[], \f[I]"tagged"\f[], \f[I]"stories"\f[], \f[I]"highlights"\f[], \f[I]"avatar"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.instagram.previews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download video previews. .SS extractor.instagram.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.itaku.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.kemonoparty.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]comments\f[] metadata. Note: This requires 1 additional HTTP request per post. .SS extractor.kemonoparty.duplicates .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls how to handle duplicate files in a post. .br * \f[I]true\f[]: Download duplicates .br * \f[I]false\f[]: Ignore duplicates .SS extractor.kemonoparty.dms .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract a user's direct messages as \f[I]dms\f[] metadata. .SS extractor.kemonoparty.favorites .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]artist\f[] .IP "Description:" 4 Determines the type of favorites to be downloaded. Available types are \f[I]artist\f[], and \f[I]post\f[]. .SS extractor.kemonoparty.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["attachments", "file", "inline"]\f[] .IP "Description:" 4 Determines the type and order of files to be downloaded. Available types are \f[I]file\f[], \f[I]attachments\f[], and \f[I]inline\f[]. .SS extractor.kemonoparty.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Limit the number of posts to download. .SS extractor.kemonoparty.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]username\f[] metadata .SS extractor.khinsider.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"mp3"\f[] .IP "Description:" 4 The name of the preferred file format to download. Use \f[I]"all"\f[] to download all available formats, or a (comma-separated) list to select multiple formats. If the selected format is not available, the first in the list gets chosen (usually mp3). .SS extractor.lolisafe.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Specifies the domain used by a \f[I]lolisafe\f[] extractor regardless of input URL. Setting this option to \f[I]"auto"\f[] uses the same domain as a given input URL. .SS extractor.luscious.gif .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Format in which to download animated images. Use \f[I]true\f[] to download animated images as gifs and \f[I]false\f[] to download as mp4 videos. 
.SS extractor.mangadex.api-server .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"https://api.mangadex.org"\f[] .IP "Description:" 4 The server to use for API requests. .SS extractor.mangadex.api-parameters .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 {"order[updatedAt]": "desc"} .IP "Description:" 4 Additional query parameters to send when fetching manga chapters. (See \f[I]/manga/{id}/feed\f[] and \f[I]/user/follows/manga/feed\f[]) .SS extractor.mangadex.lang .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "en" .IP "Description:" 4 \f[I]ISO 639-1\f[] language code to filter chapters by. .SS extractor.mangadex.ratings .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["safe", "suggestive", "erotica", "pornographic"]\f[] .IP "Description:" 4 List of acceptable content ratings for returned chapters. .SS extractor.[mastodon].access-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]access-token\f[] value you get from \f[I]linking your account to gallery-dl\f[]. Note: gallery-dl comes with built-in tokens for \f[I]mastodon.social\f[], \f[I]pawoo\f[] and \f[I]baraag\f[]. For other instances, you need to obtain an \f[I]access-token\f[] in order to use usernames in place of numerical user IDs. .SS extractor.[mastodon].reblogs .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from reblogged posts. .SS extractor.[mastodon].replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other posts. .SS extractor.[mastodon].text-posts .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also emit metadata for text-only posts without media content. .SS extractor.[misskey].renotes .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from renoted notes. .SS extractor.[misskey].replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other notes. .SS extractor.nana.favkey .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Nana Favorite Key\f[], used to access your favorite archives. .SS extractor.newgrounds.flash .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download original Adobe Flash animations instead of pre-rendered videos. .SS extractor.newgrounds.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"original"\f[] .IP "Example:" 4 "720p" .IP "Description:" 4 Selects the preferred format for video downloads. If the selected format is not available, the next smaller one gets chosen. .SS extractor.newgrounds.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"art"\f[] .IP "Example:" 4 .br * "movies,audio" .br * ["movies", "audio"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"art"\f[], \f[I]"audio"\f[], \f[I]"games"\f[], \f[I]"movies"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.nijie.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"illustration,doujin"\f[] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"illustration"\f[], \f[I]"doujin"\f[], \f[I]"favorite"\f[], \f[I]"nuita"\f[]. 
It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.nitter.quoted .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from quoted Tweets. .SS extractor.nitter.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from Retweets. .SS extractor.nitter.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. .br * \f[I]true\f[]: Download videos .br * \f[I]"ytdl"\f[]: Download videos using \f[I]youtube-dl\f[] .br * \f[I]false\f[]: Skip video Tweets .SS extractor.oauth.browser .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls how a user is directed to an OAuth authorization page. .br * \f[I]true\f[]: Use Python's \f[I]webbrowser.open()\f[] method to automatically open the URL in the user's default browser. .br * \f[I]false\f[]: Ask the user to copy & paste an URL from the terminal. .SS extractor.oauth.cache .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Store tokens received during OAuth authorizations in \f[I]cache\f[]. .SS extractor.oauth.host .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"localhost"\f[] .IP "Description:" 4 Host name / IP address to bind to during OAuth authorization. .SS extractor.oauth.port .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]6414\f[] .IP "Description:" 4 Port number to listen on during OAuth authorization. Note: All redirects will go to port \f[I]6414\f[], regardless of the port specified here. You'll have to manually adjust the port number in your browser's address bar when using a different port than the default. .SS extractor.paheal.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata (\f[I]source\f[], \f[I]uploader\f[]) Note: This requires 1 additional HTTP request per post. .SS extractor.patreon.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["images", "image_large", "attachments", "postfile", "content"]\f[] .IP "Description:" 4 Determines the type and order of files to be downloaded. Available types are \f[I]postfile\f[], \f[I]images\f[], \f[I]image_large\f[], \f[I]attachments\f[], and \f[I]content\f[]. .SS extractor.photobucket.subalbums .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download subalbums. .SS extractor.pillowfort.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow links to external sites, e.g. Twitter, .SS extractor.pillowfort.inline .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract inline images. .SS extractor.pillowfort.reblogs .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract media from reblogged posts. .SS extractor.pinterest.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Specifies the domain used by \f[I]pinterest\f[] extractors. Setting this option to \f[I]"auto"\f[] uses the same domain as a given input URL. .SS extractor.pinterest.sections .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include pins from board sections. .SS extractor.pinterest.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download from video pins. 
.SS extractor.pixiv.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"artworks"\f[] .IP "Example:" 4 .br * "avatar,background,artworks" .br * ["avatar", "background", "artworks"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"artworks"\f[], \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"favorite"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.pixiv.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from running \f[I]gallery-dl oauth:pixiv\f[] (see OAuth_) or by using a third-party tool like \f[I]gppt\f[]. .SS extractor.pixiv.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch extended \f[I]user\f[] metadata. .SS extractor.pixiv.metadata-bookmark .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For works bookmarked by \f[I]your own account\f[], fetch bookmark tags as \f[I]tags_bookmark\f[] metadata. Note: This requires 1 additional API call per bookmarked post. .SS extractor.pixiv.work.related .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also download related artworks. .SS extractor.pixiv.tags .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"japanese"\f[] .IP "Description:" 4 Controls the \f[I]tags\f[] metadata field. .br * "japanese": List of Japanese tags .br * "translated": List of translated tags .br * "original": Unmodified list with both Japanese and translated tags .SS extractor.pixiv.ugoira .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download Pixiv's Ugoira animations or ignore them. These animations come as a \f[I].zip\f[] file containing all animation frames in JPEG format. Use an ugoira post processor to convert them to watchable videos. (Example__) .SS extractor.pixiv.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 When downloading galleries, this sets the maximum number of posts to get. A value of \f[I]0\f[] means no limit. .SS extractor.plurk.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also search Plurk comments for URLs. .SS extractor.reactor.gif .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Format in which to download animated images. Use \f[I]true\f[] to download animated images as gifs and \f[I]false\f[] to download as mp4 videos. .SS extractor.readcomiconline.captcha .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"stop"\f[] .IP "Description:" 4 Controls how to handle redirects to CAPTCHA pages. .br * \f[I]"stop\f[]: Stop the current extractor run. .br * \f[I]"wait\f[]: Ask the user to solve the CAPTCHA and wait. .SS extractor.readcomiconline.quality .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Sets the \f[I]quality\f[] query parameter of issue pages. (\f[I]"lq"\f[] or \f[I]"hq"\f[]) \f[I]"auto"\f[] uses the quality parameter of the input URL or \f[I]"hq"\f[] if not present. .SS extractor.reddit.comments .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 The value of the \f[I]limit\f[] parameter when loading a submission and its comments. This number (roughly) specifies the total amount of comments being retrieved with the first API call. 
Reddit's internal default and maximum values for this parameter appear to be 200 and 500 respectively. The value \f[I]0\f[] ignores all comments and significantly reduces the time required when scanning a subreddit. .SS extractor.reddit.morecomments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Retrieve additional comments by resolving the \f[I]more\f[] comment stubs in the base comment tree. Note: This requires 1 additional API call for every 100 extra comments. .SS extractor.reddit.date-min & .date-max .IP "Type:" 6 \f[I]Date\f[] .IP "Default:" 9 \f[I]0\f[] and \f[I]253402210800\f[] (timestamp of \f[I]datetime.max\f[]) .IP "Description:" 4 Ignore all submissions posted before/after this date. .SS extractor.reddit.id-min & .id-max .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "6kmzv2" .IP "Description:" 4 Ignore all submissions posted before/after the submission with this ID. .SS extractor.reddit.recursion .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Reddit extractors can recursively visit other submissions linked to in the initial set of submissions. This value sets the maximum recursion depth. Special values: .br * \f[I]0\f[]: Recursion is disabled .br * \f[I]-1\f[]: Infinite recursion (don't do this) .SS extractor.reddit.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from \f[I]linking your Reddit account to gallery-dl\f[]. Using a \f[I]refresh-token\f[] allows you to access private or otherwise not publicly available subreddits, given that your account is authorized to do so, but requests to the reddit API are going to be rate limited at 600 requests every 10 minutes/600 seconds. .SS extractor.reddit.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. .br * \f[I]true\f[]: Download videos and use \f[I]youtube-dl\f[] to handle HLS and DASH manifests .br * \f[I]"ytdl"\f[]: Download videos and let \f[I]youtube-dl\f[] handle all of video extraction and download .br * \f[I]"dash"\f[]: Extract DASH manifest URLs and use \f[I]youtube-dl\f[] to download and merge them. (*) .br * \f[I]false\f[]: Ignore videos (*) This saves 1 HTTP request per video and might potentially be able to download otherwise deleted videos, but it will not always get the best video quality available. .SS extractor.redgifs.format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["hd", "sd", "gif"]\f[] .IP "Description:" 4 List of names of the preferred animation format, which can be \f[I]"hd"\f[], \f[I]"sd"\f[], \f[I]"gif"\f[], \f[I]"vthumbnail"\f[], \f[I]"thumbnail"\f[], or \f[I]"poster"\f[]. If a selected format is not available, the next one in the list will be tried until an available format is found. If the format is given as \f[I]string\f[], it will be extended with \f[I]["hd", "sd", "gif"]\f[]. Use a list with one element to restrict it to only one possible format. .SS extractor.sankaku.refresh .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Refresh download URLs before they expire. .SS extractor.sankakucomplex.embeds .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download video embeds from external sites. .SS extractor.sankakucomplex.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download videos.
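Putting the \f[I]reddit\f[] and \f[I]redgifs\f[] options described above together, a configuration sketch that skips comments, restricts submissions to a date range, and prefers SD animations could look like the following. This assumes that \f[I]Date\f[] values also accept ISO-style date strings in addition to UNIX timestamps; the dates are placeholders:
.. code:: json

{
    "extractor": {
        "reddit": {
            "comments": 0,
            "date-min": "2022-01-01",
            "date-max": "2023-01-01",
            "videos": "dash"
        },
        "redgifs": {
            "format": ["sd", "gif"]
        }
    }
}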
.SS extractor.skeb.article .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download article images. .SS extractor.skeb.sent-requests .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download sent requests. .SS extractor.skeb.thumbnails .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download thumbnails. .SS extractor.skeb.search.filters .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["genre:art", "genre:voice", "genre:novel", "genre:video", "genre:music", "genre:correction"]\f[] .IP "Example:" 4 "genre:music OR genre:voice" .IP "Description:" 4 Filters used during searches. .SS extractor.smugmug.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.[szurubooru].username & .token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Username and login token of your account to access private resources. To generate a token, visit \f[I]/user/USERNAME/list-tokens\f[] and click \f[I]Create Token\f[]. .SS extractor.tumblr.avatar .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download blog avatars. .SS extractor.tumblr.date-min & .date-max .IP "Type:" 6 \f[I]Date\f[] .IP "Default:" 9 \f[I]0\f[] and \f[I]null\f[] .IP "Description:" 4 Ignore all posts published before/after this date. .SS extractor.tumblr.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow external URLs (e.g. from "Link" posts) and try to extract images from them. .SS extractor.tumblr.inline .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Search posts for inline images and videos. .SS extractor.tumblr.offset .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Custom \f[I]offset\f[] starting value when paginating over blog posts. Allows skipping over posts without having to waste API calls. .SS extractor.tumblr.original .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download full-resolution \f[I]photo\f[] and \f[I]inline\f[] images. For each photo with "maximum" resolution (width equal to 2048 or height equal to 3072) or each inline image, use an extra HTTP request to find the URL to its full-resolution version. .SS extractor.tumblr.ratelimit .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"abort"\f[] .IP "Description:" 4 Selects how to handle exceeding the daily API rate limit. .br * \f[I]"abort"\f[]: Raise an error and stop extraction .br * \f[I]"wait"\f[]: Wait until rate limit reset .SS extractor.tumblr.reblogs .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 .br * \f[I]true\f[]: Extract media from reblogged posts .br * \f[I]false\f[]: Skip reblogged posts .br * \f[I]"same-blog"\f[]: Skip reblogged posts unless the original post is from the same blog .SS extractor.tumblr.posts .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Example:" 4 .br * "video,audio,link" .br * ["video", "audio", "link"] .IP "Description:" 4 A (comma-separated) list of post types to extract images, etc. from. Possible types are \f[I]text\f[], \f[I]quote\f[], \f[I]link\f[], \f[I]answer\f[], \f[I]video\f[], \f[I]audio\f[], \f[I]photo\f[], \f[I]chat\f[]. It is possible to use \f[I]"all"\f[] instead of listing all types separately. 
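As an illustrative sketch of the \f[I]tumblr\f[] options above, the following restricts extraction to photo and video posts, skips reblogs from other blogs, and waits out API rate limits:
.. code:: json

{
    "extractor": {
        "tumblr": {
            "posts": ["photo", "video"],
            "reblogs": "same-blog",
            "external": false,
            "ratelimit": "wait"
        }
    }
}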
.SS extractor.tumblr.fallback-delay .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]120.0\f[] .IP "Description:" 4 Number of seconds to wait between retries for fetching full-resolution images. .SS extractor.tumblr.fallback-retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]2\f[] .IP "Description:" 4 Number of retries for fetching full-resolution images. .SS extractor.twibooru.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Twibooru API Key\f[], to use your account's browsing settings and filters. .SS extractor.twibooru.filter .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]2\f[] (\f[I]Everything\f[] filter) .IP "Description:" 4 The content filter ID to use. Setting an explicit filter ID overrides any default filters and can be used to access 18+ content without \f[I]API Key\f[]. See \f[I]Filters\f[] for details. .SS extractor.twitter.cards .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls how to handle \f[I]Twitter Cards\f[]. .br * \f[I]false\f[]: Ignore cards .br * \f[I]true\f[]: Download image content from supported cards .br * \f[I]"ytdl"\f[]: Additionally download video content from unsupported cards using \f[I]youtube-dl\f[] .SS extractor.twitter.cards-blacklist .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["summary", "youtube.com", "player:twitch.tv"] .IP "Description:" 4 List of card types to ignore. Possible values are .br * card names .br * card domains .br * \f[I]:\f[] .SS extractor.twitter.conversations .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For input URLs pointing to a single Tweet, e.g. https://twitter.com/i/web/status/, fetch media from all Tweets and replies in this \f[I]conversation \f[] or thread. .SS extractor.twitter.csrf .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"cookies"\f[] .IP "Description:" 4 Controls how to handle Cross Site Request Forgery (CSRF) tokens. .br * \f[I]"auto"\f[]: Always auto-generate a token. .br * \f[I]"cookies"\f[]: Use token given by the \f[I]ct0\f[] cookie if present. .SS extractor.twitter.expand .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For each Tweet, return *all* Tweets from that initial Tweet's conversation or thread, i.e. *expand* all Twitter threads. Going through a timeline with this option enabled is essentially the same as running \f[I]gallery-dl https://twitter.com/i/web/status/\f[] with enabled \f[I]conversations\f[] option for each Tweet in said timeline. Note: This requires at least 1 additional API call per initial Tweet. Age-restricted replies cannot be expanded when using the \f[I]syndication\f[] API. .SS extractor.twitter.transform .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Transform Tweet and User metadata into a simpler, uniform format. .SS extractor.twitter.size .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["orig", "4096x4096", "large", "medium", "small"]\f[] .IP "Description:" 4 The image version to download. Any entries after the first one will be used for potential \f[I]fallback\f[] URLs. Known available sizes are \f[I]4096x4096\f[], \f[I]orig\f[], \f[I]large\f[], \f[I]medium\f[], and \f[I]small\f[]. .SS extractor.twitter.syndication .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls how to retrieve age-restricted content when not logged in. 
.br * \f[I]false\f[]: Skip age-restricted Tweets. .br * \f[I]true\f[]: Download using Twitter's syndication API. .br * \f[I]"extended"\f[]: Try to fetch Tweet metadata using the normal API in addition to the syndication API. This requires additional HTTP requests in some cases (e.g. when \f[I]retweets\f[] are enabled). Note: This does not apply to search results (including \f[I]timeline strategies\f[]). To retrieve such content from search results, you must log in and disable "Hide sensitive content" in your \f[I]search settings \f[]. .SS extractor.twitter.logout .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Logout and retry as guest when access to another user's Tweets is blocked. .SS extractor.twitter.pinned .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from pinned Tweets. .SS extractor.twitter.quoted .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from quoted Tweets. If this option is enabled, gallery-dl will try to fetch a quoted (original) Tweet when it sees the Tweet which quotes it. .SS extractor.twitter.replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other Tweets. If this value is \f[I]"self"\f[], only consider replies where reply and original Tweet are from the same user. Note: Twitter will automatically expand conversations if you use the \f[I]/with_replies\f[] timeline while logged in. For example, media from Tweets which the user replied to will also be downloaded. It is possible to exclude unwanted Tweets using \f[I]image-filter \f[]. .SS extractor.twitter.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from Retweets. If this value is \f[I]"original"\f[], metadata for these files will be taken from the original Tweets, not the Retweets. .SS extractor.twitter.timeline.strategy .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the strategy / tweet source used for user URLs (\f[I]https://twitter.com/USER\f[]). .br * \f[I]"tweets"\f[]: \f[I]/tweets\f[] timeline + search .br * \f[I]"media"\f[]: \f[I]/media\f[] timeline + search .br * \f[I]"with_replies"\f[]: \f[I]/with_replies\f[] timeline + search .br * \f[I]"auto"\f[]: \f[I]"tweets"\f[] or \f[I]"media"\f[], depending on \f[I]retweets\f[] and \f[I]text-tweets\f[] settings .SS extractor.twitter.text-tweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also emit metadata for text-only Tweets without media content. This only has an effect with a \f[I]metadata\f[] (or \f[I]exec\f[]) post processor with \f[I]"event": "post"\f[] and appropriate \f[I]filename\f[]. .SS extractor.twitter.twitpic .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]TwitPic\f[] embeds. .SS extractor.twitter.unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Ignore previously seen Tweets. 
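As noted for \f[I]text-tweets\f[] above, that option only has an effect together with a post processor bound to the \f[I]post\f[] event. A minimal sketch is shown below; the \f[I]{tweet_id}\f[] filename field is an assumption about the metadata the twitter extractor provides and may need to be adjusted:
.. code:: json

{
    "extractor": {
        "twitter": {
            "text-tweets": true,
            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "post",
                    "filename": "{tweet_id}.json"
                }
            ]
        }
    }
}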
.SS extractor.twitter.users .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"timeline"\f[] .IP "Example:" 4 "https://twitter.com/search?q=from:{legacy[screen_name]}" .IP "Description:" 4 Format string for user URLs generated from .br \f[I]following\f[] and \f[I]list-members\f[] queries, whose replacement field values come from Twitter \f[I]user\f[] objects .br (\f[I]Example\f[]) Special values: .br * \f[I]"timeline"\f[]: \f[I]https://twitter.com/i/user/{rest_id}\f[] .br * \f[I]"tweets"\f[]: \f[I]https://twitter.com/id:{rest_id}/tweets\f[] .br * \f[I]"media"\f[]: \f[I]https://twitter.com/id:{rest_id}/media\f[] Note: To allow gallery-dl to follow custom URL formats, set the \f[I]blacklist\f[] for \f[I]twitter\f[] to a non-default value, e.g. an empty string \f[I]""\f[]. .SS extractor.twitter.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. .br * \f[I]true\f[]: Download videos .br * \f[I]"ytdl"\f[]: Download videos using \f[I]youtube-dl\f[] .br * \f[I]false\f[]: Skip video Tweets .SS extractor.unsplash.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"raw"\f[] .IP "Description:" 4 Name of the image format to download. Available formats are \f[I]"raw"\f[], \f[I]"full"\f[], \f[I]"regular"\f[], \f[I]"small"\f[], and \f[I]"thumb"\f[]. .SS extractor.vsco.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.wallhaven.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Wallhaven API Key\f[], to use your account's browsing settings and default filters when searching. See https://wallhaven.cc/help/api for more information. .SS extractor.wallhaven.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"uploads"\f[] .IP "Example:" 4 .br * "uploads,collections" .br * ["uploads", "collections"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"uploads"\f[], \f[I]"collections"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.wallhaven.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata (tags, uploader) Note: This requires 1 additional HTTP request per post. .SS extractor.weasyl.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Weasyl API Key\f[], to use your account's browsing settings and filters. .SS extractor.weasyl.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch extra submission metadata during gallery downloads. .br (\f[I]comments\f[], \f[I]description\f[], \f[I]favorites\f[], \f[I]folder_name\f[], .br \f[I]tags\f[], \f[I]views\f[]) Note: This requires 1 additional HTTP request per submission. .SS extractor.weibo.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"feed"\f[] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"home"\f[], \f[I]"feed"\f[], \f[I]"videos"\f[], \f[I]"newvideo"\f[], \f[I]"article"\f[], \f[I]"album"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. 
.SS extractor.weibo.livephoto .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download \f[I]livephoto\f[] files. .SS extractor.weibo.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from retweeted posts. If this value is \f[I]"original"\f[], metadata for these files will be taken from the original posts, not the retweeted posts. .SS extractor.weibo.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.ytdl.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Match **all** URLs, even ones without a \f[I]ytdl:\f[] prefix. .SS extractor.ytdl.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 youtube-dl's default, currently \f[I]"bestvideo+bestaudio/best"\f[] .IP "Description:" 4 Video \f[I]format selection \f[] directly passed to youtube-dl. .SS extractor.ytdl.generic .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the use of youtube-dl's generic extractor. Set this option to \f[I]"force"\f[] for the same effect as youtube-dl's \f[I]--force-generic-extractor\f[]. .SS extractor.ytdl.logging .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Route youtube-dl's output through gallery-dl's logging system. Otherwise youtube-dl will write its output directly to stdout/stderr. Note: Set \f[I]quiet\f[] and \f[I]no_warnings\f[] in \f[I]extractor.ytdl.raw-options\f[] to \f[I]true\f[] to suppress all output. .SS extractor.ytdl.module .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Name of the youtube-dl Python module to import. Setting this to \f[I]null\f[] will try to import \f[I]"yt_dlp"\f[] followed by \f[I]"youtube_dl"\f[] as fallback. .SS extractor.ytdl.raw-options .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 .. code:: json { "quiet": true, "writesubtitles": true, "merge_output_format": "mkv" } .IP "Description:" 4 Additional options passed directly to the \f[I]YoutubeDL\f[] constructor. All available options can be found in \f[I]youtube-dl's docstrings \f[]. .SS extractor.ytdl.cmdline-args .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "--quiet --write-sub --merge-output-format mkv" .br * ["--quiet", "--write-sub", "--merge-output-format", "mkv"] .IP "Description:" 4 Additional options specified as youtube-dl command-line arguments. .SS extractor.ytdl.config-file .IP "Type:" 6 \f[I]Path\f[] .IP "Example:" 4 "~/.config/youtube-dl/config" .IP "Description:" 4 Location of a youtube-dl configuration file to load options from. .SS extractor.zerochan.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata (date, md5, tags, ...) Note: This requires 1-2 additional HTTP requests per post. .SS extractor.[booru].tags .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Categorize tags by their respective types and provide them as \f[I]tags_\f[] metadata fields. Note: This requires 1 additional HTTP request per post. .SS extractor.[booru].notes .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract overlay notes (position and text). Note: This requires 1 additional HTTP request per post. 
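Since \f[I][booru]\f[] stands for any booru category rather than a literal name, these options are set under a concrete extractor. A small sketch, using \f[I]danbooru\f[] purely as an example category:
.. code:: json

{
    "extractor": {
        "danbooru": {
            "tags": true,
            "notes": true
        }
    }
}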
.SS extractor.[booru].url .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"file_url"\f[] .IP "Example:" 4 "preview_url" .IP "Description:" 4 Alternate field name to retrieve download URLs from. .SS extractor.[manga-extractor].chapter-reverse .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Reverse the order of chapter URLs extracted from manga pages. .br * \f[I]true\f[]: Start with the latest chapter .br * \f[I]false\f[]: Start with the first chapter .SS extractor.[manga-extractor].page-reverse .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download manga chapter pages in reverse order. .SH DOWNLOADER OPTIONS .SS downloader.*.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable/Disable this downloader module. .SS downloader.*.filesize-min & .filesize-max .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "32000", "500k", "2.5M" .IP "Description:" 4 Minimum/Maximum allowed file size in bytes. Any file smaller/larger than this limit will not be downloaded. Possible values are valid integer or floating-point numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[]. \f[I]g\f[], \f[I]t\f[], or \f[I]p\f[]. These suffixes are case-insensitive. .SS downloader.*.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Use \f[I]Last-Modified\f[] HTTP response headers to set file modification times. .SS downloader.*.part .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the use of \f[I].part\f[] files during file downloads. .br * \f[I]true\f[]: Write downloaded data into \f[I].part\f[] files and rename them upon download completion. This mode additionally supports resuming incomplete downloads. .br * \f[I]false\f[]: Do not use \f[I].part\f[] files and write data directly into the actual output files. .SS downloader.*.part-directory .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Alternate location for \f[I].part\f[] files. Missing directories will be created as needed. If this value is \f[I]null\f[], \f[I].part\f[] files are going to be stored alongside the actual output files. .SS downloader.*.progress .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]3.0\f[] .IP "Description:" 4 Number of seconds until a download progress indicator for the current download is displayed. Set this option to \f[I]null\f[] to disable this indicator. .SS downloader.*.rate .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "32000", "500k", "2.5M" .IP "Description:" 4 Maximum download rate in bytes per second. Possible values are valid integer or floating-point numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[]. \f[I]g\f[], \f[I]t\f[], or \f[I]p\f[]. These suffixes are case-insensitive. .SS downloader.*.retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]extractor.*.retries\f[] .IP "Description:" 4 Maximum number of retries during file downloads, or \f[I]-1\f[] for infinite retries. .SS downloader.*.timeout .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]extractor.*.timeout\f[] .IP "Description:" 4 Connection timeout during file downloads. .SS downloader.*.verify .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]extractor.*.verify\f[] .IP "Description:" 4 Certificate validation during file downloads. 
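As a sketch, the generic \f[I]downloader.*\f[] options above can be placed under a concrete downloader module, for example the \f[I]http\f[] downloader; the values are arbitrary examples:
.. code:: json

{
    "downloader": {
        "http": {
            "rate": "500k",
            "filesize-max": "20M",
            "retries": 3,
            "mtime": false
        }
    }
}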
.SS downloader.*.proxy .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (scheme -> proxy) .IP "Default:" 9 \f[I]extractor.*.proxy\f[] .IP "Description:" 4 Proxy server used for file downloads. Disable the use of a proxy for file downloads by explicitly setting this option to \f[I]null\f[]. .SS downloader.http.adjust-extensions .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check file headers of downloaded files and adjust their filename extensions if they do not match. For example, this will change the filename extension (\f[I]{extension}\f[]) of a file called \f[I]example.png\f[] from \f[I]png\f[] to \f[I]jpg\f[] when said file contains JPEG/JFIF data. .SS downloader.http.chunk-size .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]32768\f[] .IP "Example:" 4 "50k", "0.8M" .IP "Description:" 4 Number of bytes per downloaded chunk. Possible values are integer numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[]. \f[I]g\f[], \f[I]t\f[], or \f[I]p\f[]. These suffixes are case-insensitive. .SS downloader.http.headers .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 {"Accept": "image/webp,*/*", "Referer": "https://example.org/"} .IP "Description:" 4 Additional HTTP headers to send when downloading files, .SS downloader.http.retry-codes .IP "Type:" 6 \f[I]list\f[] of \f[I]integers\f[] .IP "Default:" 9 \f[I]extractor.*.retry-codes\f[] .IP "Description:" 4 Additional \f[I]HTTP response status codes\f[] to retry a download on. Codes \f[I]200\f[], \f[I]206\f[], and \f[I]416\f[] (when resuming a \f[I]partial\f[] download) will never be retried and always count as success, regardless of this option. \f[I]5xx\f[] codes (server error responses) will always be retried, regardless of this option. .SS downloader.http.validate .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check for invalid responses. Fail a download when a file does not pass instead of downloading a potentially broken file. .SS downloader.ytdl.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 youtube-dl's default, currently \f[I]"bestvideo+bestaudio/best"\f[] .IP "Description:" 4 Video \f[I]format selection \f[] directly passed to youtube-dl. .SS downloader.ytdl.forward-cookies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Forward cookies to youtube-dl. .SS downloader.ytdl.logging .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Route youtube-dl's output through gallery-dl's logging system. Otherwise youtube-dl will write its output directly to stdout/stderr. Note: Set \f[I]quiet\f[] and \f[I]no_warnings\f[] in \f[I]downloader.ytdl.raw-options\f[] to \f[I]true\f[] to suppress all output. .SS downloader.ytdl.module .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Name of the youtube-dl Python module to import. Setting this to \f[I]null\f[] will first try to import \f[I]"yt_dlp"\f[] and use \f[I]"youtube_dl"\f[] as fallback. .SS downloader.ytdl.outtmpl .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]Output Template\f[] used to generate filenames for files downloaded with youtube-dl. 
Special values: .br * \f[I]null\f[]: generate filenames with \f[I]extractor.*.filename\f[] .br * \f[I]"default"\f[]: use youtube-dl's default, currently \f[I]"%(title)s-%(id)s.%(ext)s"\f[] Note: An output template other than \f[I]null\f[] might cause unexpected results in combination with other options (e.g. \f[I]"skip": "enumerate"\f[]) .SS downloader.ytdl.raw-options .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 .. code:: json { "quiet": true, "writesubtitles": true, "merge_output_format": "mkv" } .IP "Description:" 4 Additional options passed directly to the \f[I]YoutubeDL\f[] constructor. All available options can be found in \f[I]youtube-dl's docstrings \f[]. .SS downloader.ytdl.cmdline-args .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "--quiet --write-sub --merge-output-format mkv" .br * ["--quiet", "--write-sub", "--merge-output-format", "mkv"] .IP "Description:" 4 Additional options specified as youtube-dl command-line arguments. .SS downloader.ytdl.config-file .IP "Type:" 6 \f[I]Path\f[] .IP "Example:" 4 "~/.config/youtube-dl/config" .IP "Description:" 4 Location of a youtube-dl configuration file to load options from. .SH OUTPUT OPTIONS .SS output.mode .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (key -> format string) .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the output string format and status indicators. .br * \f[I]"null"\f[]: No output .br * \f[I]"pipe"\f[]: Suitable for piping to other processes or files .br * \f[I]"terminal"\f[]: Suitable for the standard Windows console .br * \f[I]"color"\f[]: Suitable for terminals that understand ANSI escape codes and colors .br * \f[I]"auto"\f[]: \f[I]"terminal"\f[] on Windows with \f[I]output.ansi\f[] disabled, \f[I]"color"\f[] otherwise. It is possible to use custom output format strings .br by setting this option to an \f[I]object\f[] and specifying \f[I]start\f[], \f[I]success\f[], \f[I]skip\f[], \f[I]progress\f[], and \f[I]progress-total\f[]. .br For example, the following will replicate the same output as \f[I]mode: color\f[]: .. code:: json { "start" : "{}", "success": "\\r\\u001b[1;32m{}\\u001b[0m\\n", "skip" : "\\u001b[2m{}\\u001b[0m\\n", "progress" : "\\r{0:>7}B {1:>7}B/s ", "progress-total": "\\r{3:>3}% {0:>7}B {1:>7}B/s " } \f[I]start\f[], \f[I]success\f[], and \f[I]skip\f[] are used to output the current filename, where \f[I]{}\f[] or \f[I]{0}\f[] is replaced with said filename. If a given format string contains printable characters other than that, their number needs to be specified as \f[I][, ]\f[] to get the correct results for \f[I]output.shorten\f[]. For example .. code:: json "start" : [12, "Downloading {}"] \f[I]progress\f[] and \f[I]progress-total\f[] are used when displaying the .br \f[I]download progress indicator\f[], \f[I]progress\f[] when the total number of bytes to download is unknown, .br \f[I]progress-total\f[] otherwise. For these format strings .br * \f[I]{0}\f[] is number of bytes downloaded .br * \f[I]{1}\f[] is number of downloaded bytes per second .br * \f[I]{2}\f[] is total number of bytes .br * \f[I]{3}\f[] is percent of bytes downloaded to total bytes .SS output.stdout & .stdin & .stderr .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] .IP "Example:" 4 .. code:: json "utf-8" .. code:: json { "encoding": "utf-8", "errors": "replace", "line_buffering": true } .IP "Description:" 4 \f[I]Reconfigure\f[] a \f[I]standard stream\f[]. 
Possible options are .br * \f[I]encoding\f[] .br * \f[I]errors\f[] .br * \f[I]newline\f[] .br * \f[I]line_buffering\f[] .br * \f[I]write_through\f[] When this option is specified as a simple \f[I]string\f[], it is interpreted as \f[I]{"encoding": "", "errors": "replace"}\f[] Note: \f[I]errors\f[] always defaults to \f[I]"replace"\f[] .SS output.shorten .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether the output strings should be shortened to fit on one console line. Set this option to \f[I]"eaw"\f[] to also work with east-asian characters with a display width greater than 1. .SS output.colors .IP "Type:" 6 \f[I]object\f[] (key -> ANSI color) .IP "Default:" 9 \f[I]{"success": "1;32", "skip": "2"}\f[] .IP "Description:" 4 Controls the \f[I]ANSI colors\f[] used with \f[I]mode: color\f[] for successfully downloaded or skipped files. .SS output.ansi .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 On Windows, enable ANSI escape sequences and colored output .br by setting the \f[I]ENABLE_VIRTUAL_TERMINAL_PROCESSING\f[] flag for stdout and stderr. .br .SS output.skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Show skipped file downloads. .SS output.fallback .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include fallback URLs in the output of \f[I]-g/--get-urls\f[]. .SS output.private .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Include private fields, i.e. fields whose name starts with an underscore, in the output of \f[I]-K/--list-keywords\f[] and \f[I]-j/--dump-json\f[]. .SS output.progress .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the progress indicator when *gallery-dl* is run with multiple URLs as arguments. .br * \f[I]true\f[]: Show the default progress indicator (\f[I]"[{current}/{total}] {url}"\f[]) .br * \f[I]false\f[]: Do not show any progress indicator .br * Any \f[I]string\f[]: Show the progress indicator using this as a custom \f[I]format string\f[]. Possible replacement keys are \f[I]current\f[], \f[I]total\f[] and \f[I]url\f[]. .SS output.log .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Logging Configuration\f[] .IP "Default:" 9 \f[I]"[{name}][{levelname}] {message}"\f[] .IP "Description:" 4 Configuration for logging output to stderr. If this is a simple \f[I]string\f[], it specifies the format string for logging messages. .SS output.logfile .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]Logging Configuration\f[] .IP "Description:" 4 File to write logging output to. .SS output.unsupportedfile .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]Logging Configuration\f[] .IP "Description:" 4 File to write external URLs unsupported by *gallery-dl* to. The default format string here is \f[I]"{message}"\f[]. .SS output.num-to-str .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Convert numeric values (\f[I]integer\f[] or \f[I]float\f[]) to \f[I]string\f[] before outputting them as JSON. .SH POSTPROCESSOR OPTIONS .SS classify.mapping .IP "Type:" 6 \f[I]object\f[] (directory -> extensions) .IP "Default:" 9 .. 
code:: json { "Pictures": ["jpg", "jpeg", "png", "gif", "bmp", "svg", "webp"], "Video" : ["flv", "ogv", "avi", "mp4", "mpg", "mpeg", "3gp", "mkv", "webm", "vob", "wmv"], "Music" : ["mp3", "aac", "flac", "ogg", "wma", "m4a", "wav"], "Archives": ["zip", "rar", "7z", "tar", "gz", "bz2"] } .IP "Description:" 4 A mapping from directory names to filename extensions that should be stored in them. Files with an extension not listed will be ignored and stored in their default location. .SS compare.action .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"replace"\f[] .IP "Description:" 4 The action to take when files do **not** compare as equal. .br * \f[I]"replace"\f[]: Replace/Overwrite the old version with the new one .br * \f[I]"enumerate"\f[]: Add an enumeration index to the filename of the new version like \f[I]skip = "enumerate"\f[] .SS compare.equal .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"null"\f[] .IP "Description:" 4 The action to take when files do compare as equal. .br * \f[I]"abort:N"\f[]: Stop the current extractor run after \f[I]N\f[] consecutive files compared as equal. .br * \f[I]"terminate:N"\f[]: Stop the current extractor run, including parent extractors, after \f[I]N\f[] consecutive files compared as equal. .br * \f[I]"exit:N"\f[]: Exit the program after \f[I]N\f[] consecutive files compared as equal. .SS compare.shallow .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Only compare file sizes. Do not read and compare their content. .SS exec.archive .IP "Type:" 6 \f[I]Path\f[] .IP "Description:" 4 File to store IDs of executed commands in, similar to \f[I]extractor.*.archive\f[]. \f[I]archive-format\f[], \f[I]archive-prefix\f[], and \f[I]archive-pragma\f[] options, akin to \f[I]extractor.*.archive-format\f[], \f[I]extractor.*.archive-prefix\f[], and \f[I]extractor.*.archive-pragma\f[], are supported as well. .SS exec.async .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls whether to wait for a subprocess to finish or to let it run asynchronously. .SS exec.command .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "convert {} {}.png && rm {}" .br * ["echo", "{user[account]}", "{id}"] .IP "Description:" 4 The command to run. .br * If this is a \f[I]string\f[], it will be executed using the system's shell, e.g. \f[I]/bin/sh\f[]. Any \f[I]{}\f[] will be replaced with the full path of a file or target directory, depending on \f[I]exec.event\f[] .br * If this is a \f[I]list\f[], the first element specifies the program name and any further elements its arguments. Each element of this list is treated as a \f[I]format string\f[] using the files' metadata as well as \f[I]{_path}\f[], \f[I]{_directory}\f[], and \f[I]{_filename}\f[]. .SS exec.event .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"after"\f[] .IP "Description:" 4 The event for which \f[I]exec.command\f[] is run. See \f[I]metadata.event\f[] for a list of available events. .SS metadata.mode .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"json"\f[] .IP "Description:" 4 Selects how to process metadata. 
.br * \f[I]"json"\f[]: write metadata using \f[I]json.dump()\f[] .br * \f[I]"jsonl"\f[]: write metadata in \f[I]JSON Lines \f[] format .br * \f[I]"tags"\f[]: write \f[I]tags\f[] separated by newlines .br * \f[I]"custom"\f[]: write the result of applying \f[I]metadata.content-format\f[] to a file's metadata dictionary .br * \f[I]"modify"\f[]: add or modify metadata entries .br * \f[I]"delete"\f[]: remove metadata entries .SS metadata.filename .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "{id}.data.json" .IP "Description:" 4 A \f[I]format string\f[] to build the filenames for metadata files with. (see \f[I]extractor.filename\f[]) Using \f[I]"-"\f[] as filename will write all output to \f[I]stdout\f[]. If this option is set, \f[I]metadata.extension\f[] and \f[I]metadata.extension-format\f[] will be ignored. .SS metadata.directory .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"."\f[] .IP "Example:" 4 "metadata" .IP "Description:" 4 Directory where metadata files are stored in relative to the current target location for file downloads. .SS metadata.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"json"\f[] or \f[I]"txt"\f[] .IP "Description:" 4 Filename extension for metadata files that will be appended to the original file names. .SS metadata.extension-format .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 .br * "{extension}.json" .br * "json" .IP "Description:" 4 Custom format string to build filename extensions for metadata files with, which will replace the original filename extensions. Note: \f[I]metadata.extension\f[] is ignored if this option is set. .SS metadata.event .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 The event for which metadata gets written to a file. The available events are: \f[I]init\f[] After post processor initialization and before the first file download \f[I]finalize\f[] On extractor shutdown, e.g. after all files were downloaded \f[I]prepare\f[] Before a file download \f[I]file\f[] When completing a file download, but before it gets moved to its target location \f[I]after\f[] After a file got moved to its target location \f[I]skip\f[] When skipping a file download \f[I]post\f[] When starting to download all files of a post, e.g. a Tweet on Twitter or a post on Patreon. \f[I]post-after\f[] After downloading all files of a post .SS metadata.fields .IP "Type:" 6 .br * \f[I]list\f[] of \f[I]strings\f[] .br * \f[I]object\f[] (field name -> \f[I]format string\f[]) .IP "Example:" 4 .. code:: json ["blocked", "watching", "status[creator][name]"] .. code:: json { "blocked" : "***", "watching" : "\\fE 'yes' if watching else 'no'", "status[username]": "{status[creator][name]!l}" } .IP "Description:" 4 .br * \f[I]"mode": "delete"\f[]: A list of metadata field names to remove. .br * \f[I]"mode": "modify"\f[]: An object with metadata field names mapping to a \f[I]format string\f[] whose result is assigned to said field name. .SS metadata.content-format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "tags:\\n\\n{tags:J\\n}\\n" .br * ["tags:", "", "{tags:J\\n}"] .IP "Description:" 4 Custom format string to build the content of metadata files with. Note: Only applies for \f[I]"mode": "custom"\f[]. .SS metadata.ascii .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Escape all non-ASCII characters. See the \f[I]ensure_ascii\f[] argument of \f[I]json.dump()\f[] for further details. 
Note: Only applies for \f[I]"mode": "json"\f[] and \f[I]"jsonl"\f[]. .SS metadata.indent .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]4\f[] .IP "Description:" 4 Indentation level of JSON output. See the \f[I]indent\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[]. .SS metadata.separators .IP "Type:" 6 \f[I]list\f[] with two \f[I]string\f[] elements .IP "Default:" 9 \f[I][", ", ": "]\f[] .IP "Description:" 4 \f[I]item separator\f[] - \f[I]key separator\f[] pair to separate JSON keys and values with. See the \f[I]separators\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[] and \f[I]"jsonl"\f[]. .SS metadata.sort .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Sort output by key. See the \f[I]sort_keys\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[] and \f[I]"jsonl"\f[]. .SS metadata.open .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"w"\f[] .IP "Description:" 4 The \f[I]mode\f[] in which metadata files get opened. For example, use \f[I]"a"\f[] to append to a file's content or \f[I]"w"\f[] to truncate it. See the \f[I]mode\f[] argument of \f[I]open()\f[] for further details. .SS metadata.encoding .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"utf-8"\f[] .IP "Description:" 4 Name of the encoding used to encode a file's content. See the \f[I]encoding\f[] argument of \f[I]open()\f[] for further details. .SS metadata.private .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Include private fields, i.e. fields whose name starts with an underscore. .SS metadata.archive .IP "Type:" 6 \f[I]Path\f[] .IP "Description:" 4 File to store IDs of generated metadata files in, similar to \f[I]extractor.*.archive\f[]. \f[I]archive-format\f[], \f[I]archive-prefix\f[], and \f[I]archive-pragma\f[] options, akin to \f[I]extractor.*.archive-format\f[], \f[I]extractor.*.archive-prefix\f[], and \f[I]extractor.*.archive-pragma\f[], are supported as well. .SS metadata.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Set modification times of generated metadata files according to the accompanying downloaded file. Enabling this option will only have an effect *if* there is actual \f[I]mtime\f[] metadata available, that is .br * after a file download (\f[I]"event": "file"\f[] (default), \f[I]"event": "after"\f[]) .br * when running *after* an \f[I]mtime\f[] post processor for the same \f[I]event\f[] For example, a \f[I]metadata\f[] post processor for \f[I]"event": "post"\f[] will *not* be able to set its file's modification time unless an \f[I]mtime\f[] post processor with \f[I]"event": "post"\f[] runs *before* it. .SS mtime.event .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 See \f[I]metadata.event\f[] .SS mtime.key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"date"\f[] .IP "Description:" 4 Name of the metadata field whose value should be used. This value must either be a UNIX timestamp or a \f[I]datetime\f[] object. Note: This option gets ignored if \f[I]mtime.value\f[] is set. .SS mtime.value .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 .br * "{status[date]}" .br * "{content[0:6]:R22/2022/D%Y%m%d/}" .IP "Description:" 4 A \f[I]format string\f[] whose value should be used. The resulting value must either be a UNIX timestamp or a \f[I]datetime\f[] object.
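To illustrate the ordering requirement described under \f[I]metadata.mtime\f[], the following sketch places an \f[I]mtime\f[] post processor before a \f[I]metadata\f[] post processor for the same event, assuming post processors run in the order they are listed:
.. code:: json

{
    "extractor": {
        "postprocessors": [
            {
                "name": "mtime",
                "event": "post",
                "key": "date"
            },
            {
                "name": "metadata",
                "event": "post",
                "mtime": true,
                "directory": "metadata"
            }
        ]
    }
}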
.SS ugoira.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"webm"\f[] .IP "Description:" 4 Filename extension for the resulting video files. .SS ugoira.ffmpeg-args .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 ["-c:v", "libvpx-vp9", "-an", "-b:v", "2M"] .IP "Description:" 4 Additional FFmpeg command-line arguments. .SS ugoira.ffmpeg-demuxer .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]auto\f[] .IP "Description:" 4 FFmpeg demuxer to read and process input files with. Possible values are .br * "\f[I]concat\f[]" (inaccurate frame timecodes for non-uniform frame delays) .br * "\f[I]image2\f[]" (accurate timecodes, requires nanosecond file timestamps, i.e. no Windows or macOS) .br * "mkvmerge" (accurate timecodes, only WebM or MKV, requires \f[I]mkvmerge\f[]) "auto" will select mkvmerge if available and fall back to concat otherwise. .SS ugoira.ffmpeg-location .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"ffmpeg"\f[] .IP "Description:" 4 Location of the \f[I]ffmpeg\f[] (or \f[I]avconv\f[]) executable to use. .SS ugoira.mkvmerge-location .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"mkvmerge"\f[] .IP "Description:" 4 Location of the \f[I]mkvmerge\f[] executable for use with the \f[I]mkvmerge demuxer\f[]. .SS ugoira.ffmpeg-output .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Show FFmpeg output. .SS ugoira.ffmpeg-twopass .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Enable Two-Pass encoding. .SS ugoira.framerate .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the frame rate argument (\f[I]-r\f[]) for FFmpeg .br * \f[I]"auto"\f[]: Automatically assign a fitting frame rate based on delays between frames. .br * any other \f[I]string\f[]: Use this value as argument for \f[I]-r\f[]. .br * \f[I]null\f[] or an empty \f[I]string\f[]: Don't set an explicit frame rate. .SS ugoira.keep-files .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Keep ZIP archives after conversion. .SS ugoira.libx264-prevent-odd .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Prevent \f[I]"width/height not divisible by 2"\f[] errors when using \f[I]libx264\f[] or \f[I]libx265\f[] encoders by applying a simple cropping filter. See this \f[I]Stack Overflow thread\f[] for more information. This option, when \f[I]libx264/5\f[] is used, automatically adds \f[I]["-vf", "crop=iw-mod(iw\\\\,2):ih-mod(ih\\\\,2)"]\f[] to the list of FFmpeg command-line arguments to reduce an odd width/height by 1 pixel and make them even. .SS ugoira.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Set modification times of generated ugoira animations. .SS ugoira.repeat-last-frame .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Allow repeating the last frame when necessary to prevent it from only being displayed for a very short amount of time. .SS zip.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"zip"\f[] .IP "Description:" 4 Filename extension for the created ZIP archive. .SS zip.files .IP "Type:" 6 \f[I]list\f[] of \f[I]Path\f[] .IP "Example:" 4 ["info.json"] .IP "Description:" 4 List of extra files to be added to a ZIP archive. Note: Relative paths are relative to the current \f[I]download directory\f[].
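As a rough sketch (the encoder arguments are illustrative, not a recommendation), the \f[I]ugoira\f[] options above and the \f[I]zip\f[] options in this section might be combined like this:
.. code:: json
  {
      "postprocessors": [
          {
              "name": "ugoira",
              "extension": "webm",
              "ffmpeg-args": ["-c:v", "libvpx-vp9", "-an", "-b:v", "2M"],
              "framerate": "auto",
              "mtime": true
          },
          {
              "name": "zip",
              "extension": "cbz",
              "files": ["info.json"]
          }
      ]
  }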
.SS zip.keep-files .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Keep the actual files after writing them to a ZIP archive. .SS zip.mode .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"default"\f[] .IP "Description:" 4 .br * \f[I]"default"\f[]: Write the central directory file header once after everything is done or an exception is raised. .br * \f[I]"safe"\f[]: Update the central directory file header each time a file is stored in a ZIP archive. This greatly reduces the chance a ZIP archive gets corrupted in case the Python interpreter gets shut down unexpectedly (power outage, SIGKILL) but is also a lot slower. .SH MISCELLANEOUS OPTIONS .SS extractor.modules .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 The \f[I]modules\f[] list in \f[I]extractor/__init__.py\f[] .IP "Example:" 4 ["reddit", "danbooru", "mangadex"] .IP "Description:" 4 List of internal modules to load when searching for a suitable extractor class. Useful to reduce startup time and memory usage. .SS extractor.module-sources .IP "Type:" 6 \f[I]list\f[] of \f[I]Path\f[] instances .IP "Example:" 4 ["~/.config/gallery-dl/modules", null] .IP "Description:" 4 List of directories to load external extractor modules from. Any file in a specified directory with a \f[I].py\f[] filename extension gets \f[I]imported\f[] and searched for potential extractors, i.e. classes with a \f[I]pattern\f[] attribute. Note: \f[I]null\f[] references internal extractors defined in \f[I]extractor/__init__.py\f[] or by \f[I]extractor.modules\f[]. .SS globals .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]string\f[] .IP "Example:" 4 .br * "~/.local/share/gdl-globals.py" .br * "gdl-globals" .IP "Default:" 9 The \f[I]GLOBALS\f[] dict in \f[I]util.py\f[] .IP "Description:" 4 Path to or name of an \f[I]importable\f[] Python module whose namespace gets used as an alternative \f[I]globals parameter\f[] for compiled Python expressions. .SS cache.file .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 .br * (\f[I]%APPDATA%\f[] or \f[I]"~"\f[]) + \f[I]"/gallery-dl/cache.sqlite3"\f[] on Windows .br * (\f[I]$XDG_CACHE_HOME\f[] or \f[I]"~/.cache"\f[]) + \f[I]"/gallery-dl/cache.sqlite3"\f[] on all other platforms .IP "Description:" 4 Path of the SQLite3 database used to cache login sessions, cookies and API tokens across gallery-dl invocations. Set this option to \f[I]null\f[] or an invalid path to disable this cache. .SS format-separator .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"/"\f[] .IP "Description:" 4 Character(s) used as argument separator in format string \f[I]format specifiers\f[]. For example, setting this option to \f[I]"#"\f[] would allow a replacement operation to be \f[I]Rold#new#\f[] instead of the default \f[I]Rold/new/\f[] .SS signals-ignore .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["SIGTTOU", "SIGTTIN", "SIGTERM"] .IP "Description:" 4 The list of signal names to ignore, i.e. set \f[I]SIG_IGN\f[] as signal handler for. .SS warnings .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"default"\f[] .IP "Description:" 4 The \f[I]Warnings Filter action\f[] used for (urllib3) warnings. .SS pyopenssl .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Use \f[I]pyOpenSSL\f[]-backed SSL-support. 
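For illustration, a hedged sketch of how these miscellaneous options might appear together in a configuration file (all paths and values are placeholders):
.. code:: json
  {
      "extractor": {
          "modules": ["reddit", "danbooru", "mangadex"],
          "module-sources": ["~/.config/gallery-dl/modules", null]
      },
      "cache": {
          "file": "~/gallery-dl/cache.sqlite3"
      },
      "format-separator": "#",
      "signals-ignore": ["SIGTTOU", "SIGTTIN"],
      "warnings": "ignore"
  }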
.SH API TOKENS & IDS .SS extractor.deviantart.client-id & .client-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and visit DeviantArt's \f[I]Applications & Keys\f[] section .br * click "Register Application" .br * scroll to "OAuth2 Redirect URI Whitelist (Required)" and enter "https://mikf.github.io/gallery-dl/oauth-redirect.html" .br * scroll to the bottom and agree to the API License Agreement, Submission Policy, and Terms of Service. .br * click "Save" .br * copy \f[I]client_id\f[] and \f[I]client_secret\f[] of your new application and put them in your configuration file as \f[I]"client-id"\f[] and \f[I]"client-secret"\f[] .br * clear your \f[I]cache\f[] to delete any remaining \f[I]access-token\f[] entries. (\f[I]gallery-dl --clear-cache deviantart\f[]) .br * get a new \f[I]refresh-token\f[] for the new \f[I]client-id\f[] (\f[I]gallery-dl oauth:deviantart\f[]) .SS extractor.flickr.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and \f[I]Create an App\f[] in Flickr's \f[I]App Garden\f[] .br * click "APPLY FOR A NON-COMMERCIAL KEY" .br * fill out the form with a random name and description and click "SUBMIT" .br * copy \f[I]Key\f[] and \f[I]Secret\f[] and put them in your configuration file as \f[I]"api-key"\f[] and \f[I]"api-secret"\f[] .SS extractor.reddit.client-id & .user-agent .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and visit the \f[I]apps\f[] section of your account's preferences .br * click the "are you a developer? create an app..." button .br * fill out the form, choose "installed app", preferably set "http://localhost:6414/" as "redirect uri" and finally click "create app" .br * copy the client id (third line, under your application's name and "installed app") and put it in your configuration file as \f[I]"client-id"\f[] .br * use "\f[I]Python:<application name>:v1.0 (by /u/<username>)\f[]" as \f[I]user-agent\f[] and replace \f[I]<application name>\f[] and \f[I]<username>\f[] accordingly (see Reddit's \f[I]API access rules\f[]) .SS extractor.smugmug.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and \f[I]Apply for an API Key\f[] .br * use a random name and description, set "Type" to "Application", "Platform" to "All", and "Use" to "Non-Commercial" .br * fill out the two checkboxes at the bottom and click "Apply" .br * copy \f[I]API Key\f[] and \f[I]API Secret\f[] and put them in your configuration file as \f[I]"api-key"\f[] and \f[I]"api-secret"\f[] .SS extractor.tumblr.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and visit Tumblr's \f[I]Applications\f[] section .br * click "Register application" .br * fill out the form: use a random name and description, set https://example.org/ as "Application Website" and "Default callback URL" .br * solve Google's "I'm not a robot" challenge and click "Register" .br * click "Show secret key" (below "OAuth Consumer Key") .br * copy your \f[I]OAuth Consumer Key\f[] and \f[I]Secret Key\f[] and put them in your configuration file as \f[I]"api-key"\f[] and \f[I]"api-secret"\f[] .SH CUSTOM TYPES .SS Date .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Example:" 4 .br * "2019-01-01T00:00:00" .br * "2019" with "%Y" as \f[I]date-format\f[] .br * 1546297200 .IP "Description:" 4 A \f[I]Date\f[] value represents a specific point in time. .br * If given as \f[I]string\f[], it is parsed according to \f[I]date-format\f[]. .br * If given as \f[I]integer\f[], it is interpreted as UTC timestamp.
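For example (the values below are placeholders, not working credentials), the keys obtained above go into the corresponding extractor section of the configuration file, and a \f[I]Date\f[]-valued option may be given either as a string or as a timestamp:
.. code:: json
  {
      "extractor": {
          "deviantart": {
              "client-id": "12345",
              "client-secret": "0123456789abcdef0123456789abcdef"
          },
          "reddit": {
              "client-id": "aBcDeFgHiJkLmN",
              "user-agent": "Python:gallery-dl-example:v1.0 (by /u/username)",
              "date-min": "2023-01-01T00:00:00",
              "date-max": 1678492800
          }
      }
  }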
.SS Duration .IP "Type:" 6 .br * \f[I]float\f[] .br * \f[I]list\f[] with 2 \f[I]floats\f[] .br * \f[I]string\f[] .IP "Example:" 4 .br * 2.85 .br * [1.5, 3.0] .br * "2.85", "1.5-3.0" .IP "Description:" 4 A \f[I]Duration\f[] represents a span of time in seconds. .br * If given as a single \f[I]float\f[], it will be used as that exact value. .br * If given as a \f[I]list\f[] with 2 floating-point numbers \f[I]a\f[] & \f[I]b\f[] , it will be randomly chosen with uniform distribution such that \f[I]a <= N <= b\f[]. (see \f[I]random.uniform()\f[]) .br * If given as a \f[I]string\f[], it can either represent a single \f[I]float\f[] value (\f[I]"2.85"\f[]) or a range (\f[I]"1.5-3.0"\f[]). .SS Path .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "file.ext" .br * "~/path/to/file.ext" .br * "$HOME/path/to/file.ext" .br * ["$HOME", "path", "to", "file.ext"] .IP "Description:" 4 A \f[I]Path\f[] is a \f[I]string\f[] representing the location of a file or directory. Simple \f[I]tilde expansion\f[] and \f[I]environment variable expansion\f[] is supported. In Windows environments, backslashes (\f[I]"\\"\f[]) can, in addition to forward slashes (\f[I]"/"\f[]), be used as path separators. Because backslashes are JSON's escape character, they themselves have to be escaped. The path \f[I]C:\\path\\to\\file.ext\f[] has therefore to be written as \f[I]"C:\\\\path\\\\to\\\\file.ext"\f[] if you want to use backslashes. .SS Logging Configuration .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 .. code:: json { "format" : "{asctime} {name}: {message}", "format-date": "%H:%M:%S", "path" : "~/log.txt", "encoding" : "ascii" } .. code:: json { "level" : "debug", "format": { "debug" : "debug: {message}", "info" : "[{name}] {message}", "warning": "Warning: {message}", "error" : "ERROR: {message}" } } .IP "Description:" 4 Extended logging output configuration. .br * format .br * General format string for logging messages or a dictionary with format strings for each loglevel. In addition to the default \f[I]LogRecord attributes\f[], it is also possible to access the current \f[I]extractor\f[], \f[I]job\f[], \f[I]path\f[], and keywords objects and their attributes, for example \f[I]"{extractor.url}"\f[], \f[I]"{path.filename}"\f[], \f[I]"{keywords.title}"\f[] .br * Default: \f[I]"[{name}][{levelname}] {message}"\f[] .br * format-date .br * Format string for \f[I]{asctime}\f[] fields in logging messages (see \f[I]strftime() directives\f[]) .br * Default: \f[I]"%Y-%m-%d %H:%M:%S"\f[] .br * level .br * Minimum logging message level (one of \f[I]"debug"\f[], \f[I]"info"\f[], \f[I]"warning"\f[], \f[I]"error"\f[], \f[I]"exception"\f[]) .br * Default: \f[I]"info"\f[] .br * path .br * \f[I]Path\f[] to the output file .br * mode .br * Mode in which the file is opened; use \f[I]"w"\f[] to truncate or \f[I]"a"\f[] to append (see \f[I]open()\f[]) .br * Default: \f[I]"w"\f[] .br * encoding .br * File encoding .br * Default: \f[I]"utf-8"\f[] Note: path, mode, and encoding are only applied when configuring logging output to a file. .SS Postprocessor Configuration .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 .. code:: json { "name": "mtime" } .. code:: json { "name" : "zip", "compression": "store", "extension" : "cbz", "filter" : "extension not in ('zip', 'rar')", "whitelist" : ["mangadex", "exhentai", "nhentai"] } .IP "Description:" 4 An \f[I]object\f[] containing a \f[I]"name"\f[] attribute specifying the post-processor type, as well as any of its \f[I]options\f[]. 
It is possible to set a \f[I]"filter"\f[] expression similar to \f[I]image-filter\f[] to only run a post-processor conditionally. It is also possible to set a \f[I]"whitelist"\f[] or \f[I]"blacklist"\f[] to only enable or disable a post-processor for the specified extractor categories. The available post-processor types are \f[I]classify\f[] Categorize files by filename extension \f[I]compare\f[] Compare versions of the same file and replace/enumerate them on mismatch .br (requires \f[I]downloader.*.part\f[] = \f[I]true\f[] and \f[I]extractor.*.skip\f[] = \f[I]false\f[]) .br \f[I]exec\f[] Execute external commands \f[I]metadata\f[] Write metadata to separate files \f[I]mtime\f[] Set file modification time according to its metadata \f[I]ugoira\f[] Convert Pixiv Ugoira to WebM using \f[I]FFmpeg\f[] \f[I]zip\f[] Store files in a ZIP archive .SH BUGS https://github.com/mikf/gallery-dl/issues .SH AUTHORS Mike Fährmann .br and https://github.com/mikf/gallery-dl/graphs/contributors .SH "SEE ALSO" .BR gallery-dl (1) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1678564859.9485197 gallery_dl-1.25.0/docs/0000755000175000017500000000000014403156774013354 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675691565.0 gallery_dl-1.25.0/docs/gallery-dl-example.conf0000644000175000017500000003457414370203055017711 0ustar00mikemike{ "extractor": { "base-directory": "~/gallery-dl/", "#": "set global archive file for all extractors", "archive": "~/gallery-dl/archive.sqlite3", "archive-pragma": ["journal_mode=WAL", "synchronous=NORMAL"], "#": "add two custom keywords into the metadata dictionary", "#": "these can be used to further refine your output directories or filenames", "keywords": {"bkey": "", "ckey": ""}, "#": "make sure that custom keywords are empty, i.e. they don't appear unless specified by the user", "keywords-default": "", "#": "replace invalid path characters with unicode alternatives", "path-restrict": { "\\": "⧹", "/" : "⧸", "|" : "│", ":" : "꞉", "*" : "∗", "?" 
: "？", "\"": "″", "<" : "﹤", ">" : "﹥" }, "#": "write tags for several *booru sites", "postprocessors": [ { "name": "metadata", "mode": "tags", "whitelist": ["danbooru", "moebooru", "sankaku"] } ], "pixiv": { "#": "override global archive path for pixiv", "archive": "~/gallery-dl/archive-pixiv.sqlite3", "#": "set custom directory and filename format strings for all pixiv downloads", "filename": "{id}{num}.{extension}", "directory": ["Pixiv", "Works", "{user[id]}"], "refresh-token": "aBcDeFgHiJkLmNoPqRsTuVwXyZ01234567890-FedC9", "#": "transform ugoira into lossless MKVs", "ugoira": true, "postprocessors": ["ugoira-copy"], "#": "use special settings for favorites and bookmarks", "favorite": { "directory": ["Pixiv", "Favorites", "{user[id]}"] }, "bookmark": { "directory": ["Pixiv", "My Bookmarks"], "refresh-token": "01234567890aBcDeFgHiJkLmNoPqRsTuVwXyZ-ZyxW1" } }, "danbooru": { "ugoira": true, "postprocessors": ["ugoira-webm"] }, "exhentai": { "#": "use cookies instead of logging in with username and password", "cookies": { "ipb_member_id": "12345", "ipb_pass_hash": "1234567890abcdef", "igneous" : "123456789", "hath_perks" : "m1.m2.m3.a-123456789a", "sk" : "n4m34tv3574m2c4e22c35zgeehiw", "sl" : "dm_2" }, "#": "wait 2 to 4.8 seconds between HTTP requests", "sleep-request": [2.0, 4.8], "filename": "{num:>04}_{name}.{extension}", "directory": ["{category!c}", "{title}"] }, "sankaku": { "#": "authentication with cookies is not possible for sankaku", "username": "user", "password": "#secret#" }, "furaffinity": { "#": "authentication with username and password is not possible due to CAPTCHA", "cookies": { "a": "01234567-89ab-cdef-fedc-ba9876543210", "b": "fedcba98-7654-3210-0123-456789abcdef" }, "descriptions": "html", "postprocessors": ["content"] }, "deviantart": { "#": "download 'gallery' and 'scraps' images for user profile URLs", "include": "gallery,scraps", "#": "use custom API credentials to avoid 429 errors", "client-id": "98765", "client-secret": "0123456789abcdef0123456789abcdef", "refresh-token": "0123456789abcdef0123456789abcdef01234567", "#": "put description texts into a separate directory", "metadata": true, "postprocessors": [ { "name": "metadata", "mode": "custom", "directory" : "Descriptions", "content-format" : "{description}\n", "extension-format": "descr.txt" } ] }, "kemonoparty": { "postprocessors": [ { "name": "metadata", "event": "post", "filename": "{id} {title}.txt", "#": "write text content and external URLs", "mode": "custom", "format": "{content}\n{embed[url]:?/\n/}", "#": "only write file if there is an external link present", "filter": "embed.get('url') or re.search(r'(?i)(gigafile|xgf|1drv|mediafire|mega|google|drive)', content)" } ] }, "flickr": { "access-token": "1234567890-abcdef", "access-token-secret": "1234567890abcdef", "size-max": 1920 }, "mangadex": { "#": "only download safe/suggestive chapters translated to English", "lang": "en", "ratings": ["safe", "suggestive"], "#": "put chapters into '.cbz' archives", "postprocessors": ["cbz"] }, "reddit": { "#": "only spawn child extractors for links to specific sites", "whitelist": ["imgur", "redgifs", "gfycat"], "#": "put files from child extractors into the reddit directory", "parent-directory": true, "#": "transfer metadata to any child extractor as '_reddit'", "parent-metadata": "_reddit" }, "imgur": { "#": "use different directory and filename formats when coming from a reddit post", "directory": { "'_reddit' in locals()": [] }, "filename": { "'_reddit' in locals()": "{_reddit[id]} {id}.{extension}", "" : 
"{id}.{extension}" } }, "tumblr": { "posts" : "all", "external": false, "reblogs" : false, "inline" : true, "#": "use special settings when downloading liked posts", "likes": { "posts" : "video,photo,link", "external": true, "reblogs" : true } }, "twitter": { "#": "write text content for *all* tweets", "postprocessors": ["content"], "text-tweets": true }, "ytdl": { "#": "enable 'ytdl' extractor", "#": "i.e. invoke ytdl on all otherwise unsupported input URLs", "enabled": true, "#": "use yt-dlp instead of youtube-dl", "module": "yt_dlp", "#": "load ytdl options from config file", "config-file": "~/yt-dlp.conf" }, "mastodon": { "#": "add 'tabletop.social' as recognized mastodon instance", "#": "(run 'gallery-dl oauth:mastodon:tabletop.social to get an access token')", "tabletop.social": { "root": "https://tabletop.social", "access-token": "513a36c6..." }, "#": "set filename format strings for all 'mastodon' instances", "directory": ["mastodon", "{instance}", "{account[username]!l}"], "filename" : "{id}_{media[id]}.{extension}" }, "foolslide": { "#": "add two more foolslide instances", "otscans" : {"root": "https://otscans.com/foolslide"}, "helvetica": {"root": "https://helveticascans.com/r" } }, "foolfuuka": { "#": "add two other foolfuuka 4chan archives", "fireden-onion": {"root": "http://ydt6jy2ng3s3xg2e.onion"}, "scalearchive" : {"root": "https://archive.scaled.team" } }, "gelbooru_v01": { "#": "add a custom gelbooru_v01 instance", "#": "this is just an example, this specific instance is already included!", "allgirlbooru": {"root": "https://allgirl.booru.org"}, "#": "the following options are used for all gelbooru_v01 instances", "tag": { "directory": { "locals().get('bkey')": ["Booru", "AllGirlBooru", "Tags", "{bkey}", "{ckey}", "{search_tags}"], "" : ["Booru", "AllGirlBooru", "Tags", "_Unsorted", "{search_tags}"] } }, "post": { "directory": ["Booru", "AllGirlBooru", "Posts"] }, "archive": "~/gallery-dl/custom-archive-file-for-gelbooru_v01_instances.db", "filename": "{tags}_{id}_{md5}.{extension}", "sleep-request": [0, 1.2] }, "gelbooru_v02": { "#": "add a custom gelbooru_v02 instance", "#": "this is just an example, this specific instance is already included!", "tbib": { "root": "https://tbib.org", "#": "some sites have different domains for API access", "#": "use the 'api_root' option in addition to the 'root' setting here" } }, "tbib": { "#": "the following options are only used for TBIB", "#": "gelbooru_v02 has four subcategories at the moment, use custom directory settings for all of these", "tag": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Tags", "{bkey}", "{ckey}", "{search_tags}"], "" : ["Other Boorus", "TBIB", "Tags", "_Unsorted", "{search_tags}"] } }, "pool": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Pools", "{bkey}", "{ckey}", "{pool}"], "" : ["Other Boorus", "TBIB", "Pools", "_Unsorted", "{pool}"] } }, "favorite": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Favorites", "{bkey}", "{ckey}", "{favorite_id}"], "" : ["Other Boorus", "TBIB", "Favorites", "_Unsorted", "{favorite_id}"] } }, "post": { "directory": ["Other Boorus", "TBIB", "Posts"] }, "archive": "~/gallery-dl/custom-archive-file-for-TBIB.db", "filename": "{id}_{md5}.{extension}", "sleep-request": [0, 1.2] } }, "downloader": { "#": "restrict download speed to 1 MB/s", "rate": "1M", "#": "show download progress indicator after 2 seconds", "progress": 2.0, "#": "retry failed downloads up to 3 times", "retries": 3, "#": "consider a download 'failed' 
after 8 seconds of inactivity", "timeout": 8.0, "#": "write '.part' files into a special directory", "part-directory": "/tmp/.download/", "#": "do not update file modification times", "mtime": false, "ytdl": { "#": "use yt-dlp instead of youtube-dl", "module": "yt_dlp" } }, "output": { "log": { "level": "info", "#": "use different ANSI colors for each log level", "format": { "debug" : "\u001b[0;37m{name}: {message}\u001b[0m", "info" : "\u001b[1;37m{name}: {message}\u001b[0m", "warning": "\u001b[1;33m{name}: {message}\u001b[0m", "error" : "\u001b[1;31m{name}: {message}\u001b[0m" } }, "#": "shorten filenames to fit into one terminal line", "#": "while also considering wider East-Asian characters", "shorten": "eaw", "#": "enable ANSI escape sequences on Windows", "ansi": true, "#": "write logging messages to a separate file", "logfile": { "path": "~/gallery-dl/log.txt", "mode": "w", "level": "debug" }, "#": "write unrecognized URLs to a separate file", "unsupportedfile": { "path": "~/gallery-dl/unsupported.txt", "mode": "a", "format": "{asctime} {message}", "format-date": "%Y-%m-%d-%H-%M-%S" } }, "postprocessor": { "#": "write 'content' metadata into separate files", "content": { "name" : "metadata", "#": "write data for every post instead of each individual file", "event": "post", "filename": "{post_id|tweet_id|id}.txt", "#": "write only the values for 'content' or 'description'", "mode" : "custom", "format": "{content|description}\n" }, "#": "put files into a '.cbz' archive", "cbz": { "name": "zip", "extension": "cbz" }, "#": "various ugoira post processor configurations to create different file formats", "ugoira-webm": { "name": "ugoira", "extension": "webm", "ffmpeg-args": ["-c:v", "libvpx-vp9", "-an", "-b:v", "0", "-crf", "30"], "ffmpeg-twopass": true, "ffmpeg-demuxer": "image2" }, "ugoira-mp4": { "name": "ugoira", "extension": "mp4", "ffmpeg-args": ["-c:v", "libx264", "-an", "-b:v", "4M", "-preset", "veryslow"], "ffmpeg-twopass": true, "libx264-prevent-odd": true }, "ugoira-gif": { "name": "ugoira", "extension": "gif", "ffmpeg-args": ["-filter_complex", "[0:v] split [a][b];[a] palettegen [p];[b][p] paletteuse"] }, "ugoira-copy": { "name": "ugoira", "extension": "mkv", "ffmpeg-args": ["-c", "copy"], "libx264-prevent-odd": false, "repeat-last-frame": false } }, "#": "use a custom cache file location", "cache": { "file": "~/gallery-dl/cache.sqlite3" } } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1678397180.0 gallery_dl-1.25.0/docs/gallery-dl.conf0000644000175000017500000002254514402447374016265 0ustar00mikemike{ "extractor": { "base-directory": "./gallery-dl/", "parent-directory": false, "postprocessors": null, "archive": null, "cookies": null, "cookies-update": true, "proxy": null, "skip": true, "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0", "retries": 4, "timeout": 30.0, "verify": true, "fallback": true, "sleep": 0, "sleep-request": 0, "sleep-extractor": 0, "path-restrict": "auto", "path-replace": "_", "path-remove": "\\u0000-\\u001f\\u007f", "path-strip": "auto", "path-extended": true, "extension-map": { "jpeg": "jpg", "jpe" : "jpg", "jfif": "jpg", "jif" : "jpg", "jfi" : "jpg" }, "artstation": { "external": false, "pro-first": true }, "aryion": { "username": null, "password": null, "recursive": true }, "bbc": { "width": 1920 }, "blogger": { "videos": true }, "cyberdrop": { "domain": null }, "danbooru": { "username": null, "password": null, "external": false, "metadata": false, "ugoira": false }, 
"derpibooru": { "api-key": null, "filter": 56027 }, "deviantart": { "client-id": null, "client-secret": null, "auto-watch": false, "auto-unwatch": false, "comments": false, "extra": false, "flat": true, "folders": false, "group": true, "include": "gallery", "journals": "html", "mature": true, "metadata": false, "original": true, "wait-min": 0 }, "e621": { "username": null, "password": null }, "exhentai": { "username": null, "password": null, "domain": "auto", "limits": true, "metadata": false, "original": true, "sleep-request": 5.0 }, "flickr": { "videos": true, "size-max": null }, "furaffinity": { "descriptions": "text", "external": false, "include": "gallery", "layout": "auto" }, "gelbooru": { "api-key": null, "user-id": null }, "gfycat": { "format": ["mp4", "webm", "mobile", "gif"] }, "gofile": { "api-token": null, "website-token": "12345" }, "hentaifoundry": { "include": "pictures" }, "hitomi": { "format": "webp", "metadata": false }, "idolcomplex": { "username": null, "password": null, "sleep-request": 5.0 }, "imgbb": { "username": null, "password": null }, "imgur": { "mp4": true }, "inkbunny": { "username": null, "password": null, "orderby": "create_datetime" }, "instagram": { "api": "rest", "cookies": null, "include": "posts", "sleep-request": [6.0, 12.0], "videos": true }, "khinsider": { "format": "mp3" }, "luscious": { "gif": false }, "mangadex": { "api-server": "https://api.mangadex.org", "api-parameters": null, "lang": null, "ratings": ["safe", "suggestive", "erotica", "pornographic"] }, "mangoxo": { "username": null, "password": null }, "misskey": { "renotes": false, "replies": true }, "newgrounds": { "username": null, "password": null, "flash": true, "format": "original", "include": "art" }, "nana": { "favkey": null }, "nijie": { "username": null, "password": null, "include": "illustration,doujin" }, "nitter": { "quoted": false, "retweets": false, "videos": true }, "oauth": { "browser": true, "cache": true, "host": "localhost", "port": 6414 }, "paheal": { "metadata": false }, "pillowfort": { "external": false, "inline": true, "reblogs": false }, "pinterest": { "domain": "auto", "sections": true, "videos": true }, "pixiv": { "refresh-token": null, "include": "artworks", "metadata": false, "metadata-bookmark": false, "tags": "japanese", "ugoira": true }, "reactor": { "gif": false, "sleep-request": 5.0 }, "reddit": { "comments": 0, "morecomments": false, "date-min": 0, "date-max": 253402210800, "date-format": "%Y-%m-%dT%H:%M:%S", "id-min": null, "id-max": null, "recursion": 0, "videos": true }, "redgifs": { "format": ["hd", "sd", "gif"] }, "sankaku": { "username": null, "password": null, "refresh": false }, "sankakucomplex": { "embeds": false, "videos": true }, "skeb": { "article": false, "filters": null, "sent-requests": false, "thumbnails": false }, "smugmug": { "videos": true }, "seiga": { "username": null, "password": null }, "subscribestar": { "username": null, "password": null }, "tsumino": { "username": null, "password": null }, "tumblr": { "avatar": false, "external": false, "inline": true, "posts": "all", "offset": 0, "original": true, "reblogs": true }, "twitter": { "username": null, "password": null, "cards": false, "conversations": false, "pinned": false, "quoted": false, "replies": true, "retweets": false, "strategy": null, "text-tweets": false, "twitpic": false, "unique": true, "users": "timeline", "videos": true }, "unsplash": { "format": "raw" }, "vsco": { "videos": true }, "wallhaven": { "api-key": null, "metadata": false, "include": "uploads" }, "weasyl": { 
"api-key": null, "metadata": false }, "weibo": { "livephoto": true, "retweets": true, "videos": true }, "ytdl": { "enabled": false, "format": null, "generic": true, "logging": true, "module": null, "raw-options": null }, "zerochan": { "username": null, "password": null, "metadata": false }, "booru": { "tags": false, "notes": false } }, "downloader": { "filesize-min": null, "filesize-max": null, "mtime": true, "part": true, "part-directory": null, "progress": 3.0, "rate": null, "retries": 4, "timeout": 30.0, "verify": true, "http": { "adjust-extensions": true, "chunk-size": 32768, "headers": null, "validate": true }, "ytdl": { "format": null, "forward-cookies": false, "logging": true, "module": null, "outtmpl": null, "raw-options": null } }, "output": { "mode": "auto", "progress": true, "shorten": true, "ansi": false, "colors": { "success": "1;32", "skip" : "2" }, "skip": true, "log": "[{name}][{levelname}] {message}", "logfile": null, "unsupportedfile": null }, "netrc": false } ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1678564859.9485197 gallery_dl-1.25.0/gallery_dl/0000755000175000017500000000000014403156774014542 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1677790111.0 gallery_dl-1.25.0/gallery_dl/__init__.py0000644000175000017500000002541314400205637016646 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import sys import logging from . import version, config, option, output, extractor, job, util, exception __author__ = "Mike Fährmann" __copyright__ = "Copyright 2014-2023 Mike Fährmann" __license__ = "GPLv2" __maintainer__ = "Mike Fährmann" __email__ = "mike_faehrmann@web.de" __version__ = version.__version__ def progress(urls, pformat): """Wrapper around urls to output a simple progress indicator""" if pformat is True: pformat = "[{current}/{total}] {url}\n" else: pformat += "\n" pinfo = {"total": len(urls)} for pinfo["current"], pinfo["url"] in enumerate(urls, 1): output.stderr_write(pformat.format_map(pinfo)) yield pinfo["url"] def main(): try: parser = option.build_parser() args = parser.parse_args() log = output.initialize_logging(args.loglevel) # configuration if args.config_load: config.load() if args.configs_json: config.load(args.configs_json, strict=True) if args.configs_yaml: import yaml config.load(args.configs_yaml, strict=True, load=yaml.safe_load) if args.configs_toml: try: import tomllib as toml except ImportError: import toml config.load(args.configs_toml, strict=True, load=toml.loads) if args.filename: filename = args.filename if filename == "/O": filename = "{filename}.{extension}" elif filename.startswith("\\f"): filename = "\f" + filename[2:] config.set((), "filename", filename) if args.directory: config.set((), "base-directory", args.directory) config.set((), "directory", ()) if args.postprocessors: config.set((), "postprocessors", args.postprocessors) if args.abort: config.set((), "skip", "abort:" + str(args.abort)) if args.terminate: config.set((), "skip", "terminate:" + str(args.terminate)) if args.cookies_from_browser: browser, _, profile = args.cookies_from_browser.partition(":") browser, _, keyring = browser.partition("+") if profile.startswith(":"): container = profile[1:] profile = None else: profile, _, container = profile.partition("::") config.set((), 
"cookies", (browser, profile, keyring, container)) if args.options_pp: config.set((), "postprocessor-options", args.options_pp) for opts in args.options: config.set(*opts) output.configure_standard_streams() # signals signals = config.get((), "signals-ignore") if signals: import signal if isinstance(signals, str): signals = signals.split(",") for signal_name in signals: signal_num = getattr(signal, signal_name, None) if signal_num is None: log.warning("signal '%s' is not defined", signal_name) else: signal.signal(signal_num, signal.SIG_IGN) # enable ANSI escape sequences on Windows if util.WINDOWS and config.get(("output",), "ansi"): from ctypes import windll, wintypes, byref kernel32 = windll.kernel32 mode = wintypes.DWORD() for handle_id in (-11, -12): # stdout and stderr handle = kernel32.GetStdHandle(handle_id) kernel32.GetConsoleMode(handle, byref(mode)) if not mode.value & 0x4: mode.value |= 0x4 kernel32.SetConsoleMode(handle, mode) output.ANSI = True # format string separator separator = config.get((), "format-separator") if separator: from . import formatter formatter._SEPARATOR = separator # eval globals path = config.get((), "globals") if path: util.GLOBALS = util.import_file(path).__dict__ # loglevels output.configure_logging(args.loglevel) if args.loglevel >= logging.ERROR: config.set(("output",), "mode", "null") elif args.loglevel <= logging.DEBUG: import platform import requests extra = "" if util.EXECUTABLE: extra = " - Executable" else: git_head = util.git_head() if git_head: extra = " - Git HEAD: " + git_head log.debug("Version %s%s", __version__, extra) log.debug("Python %s - %s", platform.python_version(), platform.platform()) try: log.debug("requests %s - urllib3 %s", requests.__version__, requests.packages.urllib3.__version__) except AttributeError: pass log.debug("Configuration Files %s", config._files) # extractor modules modules = config.get(("extractor",), "modules") if modules is not None: if isinstance(modules, str): modules = modules.split(",") extractor.modules = modules # external modules if args.extractor_sources: sources = args.extractor_sources sources.append(None) else: sources = config.get(("extractor",), "module-sources") if sources: import os modules = [] for source in sources: if source: path = util.expand_path(source) try: files = os.listdir(path) modules.append(extractor._modules_path(path, files)) except Exception as exc: log.warning("Unable to load modules from %s (%s: %s)", path, exc.__class__.__name__, exc) else: modules.append(extractor._modules_internal()) if len(modules) > 1: import itertools extractor._module_iter = itertools.chain(*modules) elif not modules: extractor._module_iter = () else: extractor._module_iter = iter(modules[0]) if args.list_modules: extractor.modules.append("") sys.stdout.write("\n".join(extractor.modules)) elif args.list_extractors: write = sys.stdout.write fmt = "{}\n{}\nCategory: {} - Subcategory: {}{}\n\n".format for extr in extractor.extractors(): if not extr.__doc__: continue test = next(extr._get_tests(), None) write(fmt( extr.__name__, extr.__doc__, extr.category, extr.subcategory, "\nExample : " + test[0] if test else "", )) elif args.clear_cache: from . 
import cache log = logging.getLogger("cache") cnt = cache.clear(args.clear_cache) if cnt is None: log.error("Database file not available") else: log.info( "Deleted %d %s from '%s'", cnt, "entry" if cnt == 1 else "entries", cache._path(), ) elif args.config_init: return config.initialize() else: if not args.urls and not args.inputfiles: parser.error( "The following arguments are required: URL\n" "Use 'gallery-dl --help' to get a list of all options.") if args.list_urls: jobtype = job.UrlJob jobtype.maxdepth = args.list_urls if config.get(("output",), "fallback", True): jobtype.handle_url = \ staticmethod(jobtype.handle_url_fallback) else: jobtype = args.jobtype or job.DownloadJob urls = args.urls if args.inputfiles: for inputfile in args.inputfiles: try: if inputfile == "-": if sys.stdin: urls += util.parse_inputfile(sys.stdin, log) else: log.warning( "input file: stdin is not readable") else: with open(inputfile, encoding="utf-8") as file: urls += util.parse_inputfile(file, log) except OSError as exc: log.warning("input file: %s", exc) # unsupported file logging handler handler = output.setup_logging_handler( "unsupportedfile", fmt="{message}") if handler: ulog = logging.getLogger("unsupported") ulog.addHandler(handler) ulog.propagate = False job.Job.ulog = ulog pformat = config.get(("output",), "progress", True) if pformat and len(urls) > 1 and args.loglevel < logging.ERROR: urls = progress(urls, pformat) else: urls = iter(urls) retval = 0 url = next(urls, None) while url is not None: try: log.debug("Starting %s for '%s'", jobtype.__name__, url) if isinstance(url, util.ExtendedUrl): for opts in url.gconfig: config.set(*opts) with config.apply(url.lconfig): retval |= jobtype(url.value).run() else: retval |= jobtype(url).run() except exception.TerminateExtraction: pass except exception.RestartExtraction: log.debug("Restarting '%s'", url) continue except exception.NoExtractorError: log.error("Unsupported URL '%s'", url) retval |= 64 url = next(urls, None) return retval except KeyboardInterrupt: sys.exit("\nKeyboardInterrupt") except BrokenPipeError: pass except OSError as exc: import errno if exc.errno != errno.EPIPE: raise return 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1609520004.0 gallery_dl-1.25.0/gallery_dl/__main__.py0000644000175000017500000000105313773651604016634 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2017-2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import sys if __package__ is None and not hasattr(sys, "frozen"): import os.path path = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) sys.path.insert(0, os.path.realpath(path)) import gallery_dl if __name__ == "__main__": sys.exit(gallery_dl.main()) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1678482482.0 gallery_dl-1.25.0/gallery_dl/actions.py0000644000175000017500000000450614402716062016550 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """ """ import re import sys import logging import operator from . 
import util, exception def parse(actionspec): if isinstance(actionspec, dict): actionspec = actionspec.items() actions = {} actions[logging.DEBUG] = actions_d = [] actions[logging.INFO] = actions_i = [] actions[logging.WARNING] = actions_w = [] actions[logging.ERROR] = actions_e = [] for event, spec in actionspec: level, _, pattern = event.partition(":") type, _, args = spec.partition(" ") action = (re.compile(pattern).search, ACTIONS[type](args)) level = level.strip() if not level or level == "*": actions_d.append(action) actions_i.append(action) actions_w.append(action) actions_e.append(action) else: actions[_level_to_int(level)].append(action) return actions def _level_to_int(level): try: return logging._nameToLevel[level] except KeyError: return int(level) def action_print(opts): def _print(_): print(opts) return _print def action_status(opts): op, value = re.match(r"\s*([&|^=])=?\s*(\d+)", opts).groups() op = { "&": operator.and_, "|": operator.or_, "^": operator.xor, "=": lambda x, y: y, }[op] value = int(value) def _status(args): args["job"].status = op(args["job"].status, value) return _status def action_level(opts): level = _level_to_int(opts.lstrip(" ~=")) def _level(args): args["level"] = level return _level def action_wait(opts): def _wait(args): input("Press Enter to continue") return _wait def action_restart(opts): return util.raises(exception.RestartExtraction) def action_exit(opts): try: opts = int(opts) except ValueError: pass def _exit(args): sys.exit(opts) return _exit ACTIONS = { "print" : action_print, "status" : action_status, "level" : action_level, "restart": action_restart, "wait" : action_wait, "exit" : action_exit, } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1653908283.0 gallery_dl-1.25.0/gallery_dl/aes.py0000644000175000017500000005116714245121473015666 0ustar00mikemike# -*- coding: utf-8 -*- # This is a slightly modified version of yt-dlp's aes module. 
# https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/aes.py import struct import binascii from math import ceil try: from Cryptodome.Cipher import AES as Cryptodome_AES except ImportError: try: from Crypto.Cipher import AES as Cryptodome_AES except ImportError: Cryptodome_AES = None if Cryptodome_AES: def aes_cbc_decrypt_bytes(data, key, iv): """Decrypt bytes with AES-CBC using pycryptodome""" return Cryptodome_AES.new( key, Cryptodome_AES.MODE_CBC, iv).decrypt(data) def aes_gcm_decrypt_and_verify_bytes(data, key, tag, nonce): """Decrypt bytes with AES-GCM using pycryptodome""" return Cryptodome_AES.new( key, Cryptodome_AES.MODE_GCM, nonce).decrypt_and_verify(data, tag) else: def aes_cbc_decrypt_bytes(data, key, iv): """Decrypt bytes with AES-CBC using native implementation""" return intlist_to_bytes(aes_cbc_decrypt( bytes_to_intlist(data), bytes_to_intlist(key), bytes_to_intlist(iv), )) def aes_gcm_decrypt_and_verify_bytes(data, key, tag, nonce): """Decrypt bytes with AES-GCM using native implementation""" return intlist_to_bytes(aes_gcm_decrypt_and_verify( bytes_to_intlist(data), bytes_to_intlist(key), bytes_to_intlist(tag), bytes_to_intlist(nonce), )) bytes_to_intlist = list def intlist_to_bytes(xs): if not xs: return b"" return struct.pack("%dB" % len(xs), *xs) def unpad_pkcs7(data): return data[:-data[-1]] BLOCK_SIZE_BYTES = 16 def aes_ecb_encrypt(data, key, iv=None): """ Encrypt with aes in ECB mode @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv Unused for this mode @returns {int[]} encrypted data """ expanded_key = key_expansion(key) block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) encrypted_data = [] for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] encrypted_data += aes_encrypt(block, expanded_key) encrypted_data = encrypted_data[:len(data)] return encrypted_data def aes_ecb_decrypt(data, key, iv=None): """ Decrypt with aes in ECB mode @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv Unused for this mode @returns {int[]} decrypted data """ expanded_key = key_expansion(key) block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) encrypted_data = [] for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] encrypted_data += aes_decrypt(block, expanded_key) encrypted_data = encrypted_data[:len(data)] return encrypted_data def aes_ctr_decrypt(data, key, iv): """ Decrypt with aes in counter mode @param {int[]} data cipher @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte initialization vector @returns {int[]} decrypted data """ return aes_ctr_encrypt(data, key, iv) def aes_ctr_encrypt(data, key, iv): """ Encrypt with aes in counter mode @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte initialization vector @returns {int[]} encrypted data """ expanded_key = key_expansion(key) block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) counter = iter_vector(iv) encrypted_data = [] for i in range(block_count): counter_block = next(counter) block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] block += [0] * (BLOCK_SIZE_BYTES - len(block)) cipher_counter_block = aes_encrypt(counter_block, expanded_key) encrypted_data += xor(block, cipher_counter_block) encrypted_data = encrypted_data[:len(data)] return encrypted_data def aes_cbc_decrypt(data, key, iv): """ Decrypt with aes in CBC mode @param {int[]} data cipher @param {int[]} key 
16/24/32-Byte cipher key @param {int[]} iv 16-Byte IV @returns {int[]} decrypted data """ expanded_key = key_expansion(key) block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) decrypted_data = [] previous_cipher_block = iv for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] block += [0] * (BLOCK_SIZE_BYTES - len(block)) decrypted_block = aes_decrypt(block, expanded_key) decrypted_data += xor(decrypted_block, previous_cipher_block) previous_cipher_block = block decrypted_data = decrypted_data[:len(data)] return decrypted_data def aes_cbc_encrypt(data, key, iv): """ Encrypt with aes in CBC mode. Using PKCS#7 padding @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte IV @returns {int[]} encrypted data """ expanded_key = key_expansion(key) block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) encrypted_data = [] previous_cipher_block = iv for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] remaining_length = BLOCK_SIZE_BYTES - len(block) block += [remaining_length] * remaining_length mixed_block = xor(block, previous_cipher_block) encrypted_block = aes_encrypt(mixed_block, expanded_key) encrypted_data += encrypted_block previous_cipher_block = encrypted_block return encrypted_data def aes_gcm_decrypt_and_verify(data, key, tag, nonce): """ Decrypt with aes in GBM mode and checks authenticity using tag @param {int[]} data cipher @param {int[]} key 16-Byte cipher key @param {int[]} tag authentication tag @param {int[]} nonce IV (recommended 12-Byte) @returns {int[]} decrypted data """ # XXX: check aes, gcm param hash_subkey = aes_encrypt([0] * BLOCK_SIZE_BYTES, key_expansion(key)) if len(nonce) == 12: j0 = nonce + [0, 0, 0, 1] else: fill = (BLOCK_SIZE_BYTES - (len(nonce) % BLOCK_SIZE_BYTES)) % \ BLOCK_SIZE_BYTES + 8 ghash_in = nonce + [0] * fill + bytes_to_intlist( (8 * len(nonce)).to_bytes(8, "big")) j0 = ghash(hash_subkey, ghash_in) # TODO: add nonce support to aes_ctr_decrypt # nonce_ctr = j0[:12] iv_ctr = inc(j0) decrypted_data = aes_ctr_decrypt( data, key, iv_ctr + [0] * (BLOCK_SIZE_BYTES - len(iv_ctr))) pad_len = len(data) // 16 * 16 s_tag = ghash( hash_subkey, data + [0] * (BLOCK_SIZE_BYTES - len(data) + pad_len) + # pad bytes_to_intlist( (0 * 8).to_bytes(8, "big") + # length of associated data ((len(data) * 8).to_bytes(8, "big")) # length of data ) ) if tag != aes_ctr_encrypt(s_tag, key, j0): raise ValueError("Mismatching authentication tag") return decrypted_data def aes_encrypt(data, expanded_key): """ Encrypt one block with aes @param {int[]} data 16-Byte state @param {int[]} expanded_key 176/208/240-Byte expanded key @returns {int[]} 16-Byte cipher """ rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1 data = xor(data, expanded_key[:BLOCK_SIZE_BYTES]) for i in range(1, rounds + 1): data = sub_bytes(data) data = shift_rows(data) if i != rounds: data = list(iter_mix_columns(data, MIX_COLUMN_MATRIX)) data = xor(data, expanded_key[ i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES]) return data def aes_decrypt(data, expanded_key): """ Decrypt one block with aes @param {int[]} data 16-Byte cipher @param {int[]} expanded_key 176/208/240-Byte expanded key @returns {int[]} 16-Byte state """ rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1 for i in range(rounds, 0, -1): data = xor(data, expanded_key[ i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES]) if i != rounds: data = list(iter_mix_columns(data, MIX_COLUMN_MATRIX_INV)) data = 
shift_rows_inv(data) data = sub_bytes_inv(data) data = xor(data, expanded_key[:BLOCK_SIZE_BYTES]) return data def aes_decrypt_text(data, password, key_size_bytes): """ Decrypt text - The first 8 Bytes of decoded 'data' are the 8 high Bytes of the counter - The cipher key is retrieved by encrypting the first 16 Byte of 'password' with the first 'key_size_bytes' Bytes from 'password' (if necessary filled with 0's) - Mode of operation is 'counter' @param {str} data Base64 encoded string @param {str,unicode} password Password (will be encoded with utf-8) @param {int} key_size_bytes Possible values: 16 for 128-Bit, 24 for 192-Bit, or 32 for 256-Bit @returns {str} Decrypted data """ NONCE_LENGTH_BYTES = 8 data = bytes_to_intlist(binascii.a2b_base64(data)) password = bytes_to_intlist(password.encode("utf-8")) key = password[:key_size_bytes] + [0] * (key_size_bytes - len(password)) key = aes_encrypt(key[:BLOCK_SIZE_BYTES], key_expansion(key)) * \ (key_size_bytes // BLOCK_SIZE_BYTES) nonce = data[:NONCE_LENGTH_BYTES] cipher = data[NONCE_LENGTH_BYTES:] return intlist_to_bytes(aes_ctr_decrypt( cipher, key, nonce + [0] * (BLOCK_SIZE_BYTES - NONCE_LENGTH_BYTES) )) RCON = ( 0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36, ) SBOX = ( 0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76, 0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0, 0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15, 0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75, 0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84, 0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF, 0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8, 0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2, 0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73, 0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB, 0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79, 0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08, 0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A, 0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E, 0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF, 0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16, ) SBOX_INV = ( 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38, 0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb, 0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87, 0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb, 0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d, 0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e, 0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2, 0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25, 0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16, 0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92, 0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda, 0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84, 0x90, 0xd8, 0xab, 0x00, 0x8c, 0xbc, 0xd3, 0x0a, 0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06, 0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02, 0xc1, 0xaf, 0xbd, 0x03, 
0x01, 0x13, 0x8a, 0x6b, 0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea, 0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73, 0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85, 0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e, 0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89, 0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b, 0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20, 0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4, 0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31, 0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f, 0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d, 0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef, 0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0, 0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61, 0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26, 0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d ) MIX_COLUMN_MATRIX = ( (0x2, 0x3, 0x1, 0x1), (0x1, 0x2, 0x3, 0x1), (0x1, 0x1, 0x2, 0x3), (0x3, 0x1, 0x1, 0x2), ) MIX_COLUMN_MATRIX_INV = ( (0xE, 0xB, 0xD, 0x9), (0x9, 0xE, 0xB, 0xD), (0xD, 0x9, 0xE, 0xB), (0xB, 0xD, 0x9, 0xE), ) RIJNDAEL_EXP_TABLE = ( 0x01, 0x03, 0x05, 0x0F, 0x11, 0x33, 0x55, 0xFF, 0x1A, 0x2E, 0x72, 0x96, 0xA1, 0xF8, 0x13, 0x35, 0x5F, 0xE1, 0x38, 0x48, 0xD8, 0x73, 0x95, 0xA4, 0xF7, 0x02, 0x06, 0x0A, 0x1E, 0x22, 0x66, 0xAA, 0xE5, 0x34, 0x5C, 0xE4, 0x37, 0x59, 0xEB, 0x26, 0x6A, 0xBE, 0xD9, 0x70, 0x90, 0xAB, 0xE6, 0x31, 0x53, 0xF5, 0x04, 0x0C, 0x14, 0x3C, 0x44, 0xCC, 0x4F, 0xD1, 0x68, 0xB8, 0xD3, 0x6E, 0xB2, 0xCD, 0x4C, 0xD4, 0x67, 0xA9, 0xE0, 0x3B, 0x4D, 0xD7, 0x62, 0xA6, 0xF1, 0x08, 0x18, 0x28, 0x78, 0x88, 0x83, 0x9E, 0xB9, 0xD0, 0x6B, 0xBD, 0xDC, 0x7F, 0x81, 0x98, 0xB3, 0xCE, 0x49, 0xDB, 0x76, 0x9A, 0xB5, 0xC4, 0x57, 0xF9, 0x10, 0x30, 0x50, 0xF0, 0x0B, 0x1D, 0x27, 0x69, 0xBB, 0xD6, 0x61, 0xA3, 0xFE, 0x19, 0x2B, 0x7D, 0x87, 0x92, 0xAD, 0xEC, 0x2F, 0x71, 0x93, 0xAE, 0xE9, 0x20, 0x60, 0xA0, 0xFB, 0x16, 0x3A, 0x4E, 0xD2, 0x6D, 0xB7, 0xC2, 0x5D, 0xE7, 0x32, 0x56, 0xFA, 0x15, 0x3F, 0x41, 0xC3, 0x5E, 0xE2, 0x3D, 0x47, 0xC9, 0x40, 0xC0, 0x5B, 0xED, 0x2C, 0x74, 0x9C, 0xBF, 0xDA, 0x75, 0x9F, 0xBA, 0xD5, 0x64, 0xAC, 0xEF, 0x2A, 0x7E, 0x82, 0x9D, 0xBC, 0xDF, 0x7A, 0x8E, 0x89, 0x80, 0x9B, 0xB6, 0xC1, 0x58, 0xE8, 0x23, 0x65, 0xAF, 0xEA, 0x25, 0x6F, 0xB1, 0xC8, 0x43, 0xC5, 0x54, 0xFC, 0x1F, 0x21, 0x63, 0xA5, 0xF4, 0x07, 0x09, 0x1B, 0x2D, 0x77, 0x99, 0xB0, 0xCB, 0x46, 0xCA, 0x45, 0xCF, 0x4A, 0xDE, 0x79, 0x8B, 0x86, 0x91, 0xA8, 0xE3, 0x3E, 0x42, 0xC6, 0x51, 0xF3, 0x0E, 0x12, 0x36, 0x5A, 0xEE, 0x29, 0x7B, 0x8D, 0x8C, 0x8F, 0x8A, 0x85, 0x94, 0xA7, 0xF2, 0x0D, 0x17, 0x39, 0x4B, 0xDD, 0x7C, 0x84, 0x97, 0xA2, 0xFD, 0x1C, 0x24, 0x6C, 0xB4, 0xC7, 0x52, 0xF6, 0x01, ) RIJNDAEL_LOG_TABLE = ( 0x00, 0x00, 0x19, 0x01, 0x32, 0x02, 0x1a, 0xc6, 0x4b, 0xc7, 0x1b, 0x68, 0x33, 0xee, 0xdf, 0x03, 0x64, 0x04, 0xe0, 0x0e, 0x34, 0x8d, 0x81, 0xef, 0x4c, 0x71, 0x08, 0xc8, 0xf8, 0x69, 0x1c, 0xc1, 0x7d, 0xc2, 0x1d, 0xb5, 0xf9, 0xb9, 0x27, 0x6a, 0x4d, 0xe4, 0xa6, 0x72, 0x9a, 0xc9, 0x09, 0x78, 0x65, 0x2f, 0x8a, 0x05, 0x21, 0x0f, 0xe1, 0x24, 0x12, 0xf0, 0x82, 0x45, 0x35, 0x93, 0xda, 0x8e, 0x96, 0x8f, 0xdb, 0xbd, 0x36, 0xd0, 0xce, 0x94, 0x13, 0x5c, 0xd2, 0xf1, 0x40, 0x46, 0x83, 0x38, 0x66, 0xdd, 0xfd, 0x30, 0xbf, 0x06, 0x8b, 0x62, 0xb3, 0x25, 0xe2, 0x98, 0x22, 0x88, 0x91, 0x10, 0x7e, 0x6e, 0x48, 0xc3, 0xa3, 0xb6, 0x1e, 0x42, 0x3a, 0x6b, 0x28, 0x54, 0xfa, 0x85, 0x3d, 0xba, 0x2b, 0x79, 0x0a, 0x15, 0x9b, 0x9f, 0x5e, 0xca, 0x4e, 0xd4, 0xac, 0xe5, 0xf3, 0x73, 0xa7, 0x57, 0xaf, 0x58, 0xa8, 0x50, 0xf4, 0xea, 0xd6, 0x74, 0x4f, 0xae, 0xe9, 0xd5, 0xe7, 0xe6, 0xad, 0xe8, 0x2c, 0xd7, 0x75, 0x7a, 0xeb, 0x16, 0x0b, 0xf5, 0x59, 0xcb, 0x5f, 0xb0, 0x9c, 0xa9, 
0x51, 0xa0, 0x7f, 0x0c, 0xf6, 0x6f, 0x17, 0xc4, 0x49, 0xec, 0xd8, 0x43, 0x1f, 0x2d, 0xa4, 0x76, 0x7b, 0xb7, 0xcc, 0xbb, 0x3e, 0x5a, 0xfb, 0x60, 0xb1, 0x86, 0x3b, 0x52, 0xa1, 0x6c, 0xaa, 0x55, 0x29, 0x9d, 0x97, 0xb2, 0x87, 0x90, 0x61, 0xbe, 0xdc, 0xfc, 0xbc, 0x95, 0xcf, 0xcd, 0x37, 0x3f, 0x5b, 0xd1, 0x53, 0x39, 0x84, 0x3c, 0x41, 0xa2, 0x6d, 0x47, 0x14, 0x2a, 0x9e, 0x5d, 0x56, 0xf2, 0xd3, 0xab, 0x44, 0x11, 0x92, 0xd9, 0x23, 0x20, 0x2e, 0x89, 0xb4, 0x7c, 0xb8, 0x26, 0x77, 0x99, 0xe3, 0xa5, 0x67, 0x4a, 0xed, 0xde, 0xc5, 0x31, 0xfe, 0x18, 0x0d, 0x63, 0x8c, 0x80, 0xc0, 0xf7, 0x70, 0x07, ) def key_expansion(data): """ Generate key schedule @param {int[]} data 16/24/32-Byte cipher key @returns {int[]} 176/208/240-Byte expanded key """ data = data[:] # copy rcon_iteration = 1 key_size_bytes = len(data) expanded_key_size_bytes = (key_size_bytes // 4 + 7) * BLOCK_SIZE_BYTES while len(data) < expanded_key_size_bytes: temp = data[-4:] temp = key_schedule_core(temp, rcon_iteration) rcon_iteration += 1 data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) for _ in range(3): temp = data[-4:] data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) if key_size_bytes == 32: temp = data[-4:] temp = sub_bytes(temp) data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) for _ in range(3 if key_size_bytes == 32 else 2 if key_size_bytes == 24 else 0): temp = data[-4:] data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) data = data[:expanded_key_size_bytes] return data def iter_vector(iv): while True: yield iv iv = inc(iv) def sub_bytes(data): return [SBOX[x] for x in data] def sub_bytes_inv(data): return [SBOX_INV[x] for x in data] def rotate(data): return data[1:] + [data[0]] def key_schedule_core(data, rcon_iteration): data = rotate(data) data = sub_bytes(data) data[0] = data[0] ^ RCON[rcon_iteration] return data def xor(data1, data2): return [x ^ y for x, y in zip(data1, data2)] def iter_mix_columns(data, matrix): for i in (0, 4, 8, 12): for row in matrix: mixed = 0 for j in range(4): if data[i:i + 4][j] == 0 or row[j] == 0: mixed ^= 0 else: mixed ^= RIJNDAEL_EXP_TABLE[ (RIJNDAEL_LOG_TABLE[data[i + j]] + RIJNDAEL_LOG_TABLE[row[j]]) % 0xFF ] yield mixed def shift_rows(data): return [ data[((column + row) & 0b11) * 4 + row] for column in range(4) for row in range(4) ] def shift_rows_inv(data): return [ data[((column - row) & 0b11) * 4 + row] for column in range(4) for row in range(4) ] def shift_block(data): data_shifted = [] bit = 0 for n in data: if bit: n |= 0x100 bit = n & 1 n >>= 1 data_shifted.append(n) return data_shifted def inc(data): data = data[:] # copy for i in range(len(data) - 1, -1, -1): if data[i] == 255: data[i] = 0 else: data[i] = data[i] + 1 break return data def block_product(block_x, block_y): # NIST SP 800-38D, Algorithm 1 if len(block_x) != BLOCK_SIZE_BYTES or len(block_y) != BLOCK_SIZE_BYTES: raise ValueError( "Length of blocks need to be %d bytes" % BLOCK_SIZE_BYTES) block_r = [0xE1] + [0] * (BLOCK_SIZE_BYTES - 1) block_v = block_y[:] block_z = [0] * BLOCK_SIZE_BYTES for i in block_x: for bit in range(7, -1, -1): if i & (1 << bit): block_z = xor(block_z, block_v) do_xor = block_v[-1] & 1 block_v = shift_block(block_v) if do_xor: block_v = xor(block_v, block_r) return block_z def ghash(subkey, data): # NIST SP 800-38D, Algorithm 2 if len(data) % BLOCK_SIZE_BYTES: raise ValueError( "Length of data should be %d bytes" % BLOCK_SIZE_BYTES) last_y = [0] * BLOCK_SIZE_BYTES for i in range(0, len(data), BLOCK_SIZE_BYTES): block = data[i: i + BLOCK_SIZE_BYTES] last_y = 
block_product(xor(last_y, block), subkey) return last_y ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675954109.0 gallery_dl-1.25.0/gallery_dl/cache.py0000644000175000017500000001451514371203675016162 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Decorators to keep function results in an in-memory and database cache""" import sqlite3 import pickle import time import os import functools from . import config, util class CacheDecorator(): """Simplified in-memory cache""" def __init__(self, func, keyarg): self.func = func self.cache = {} self.keyarg = keyarg def __get__(self, instance, cls): return functools.partial(self.__call__, instance) def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] try: value = self.cache[key] except KeyError: value = self.cache[key] = self.func(*args, **kwargs) return value def update(self, key, value): self.cache[key] = value def invalidate(self, key=""): try: del self.cache[key] except KeyError: pass class MemoryCacheDecorator(CacheDecorator): """In-memory cache""" def __init__(self, func, keyarg, maxage): CacheDecorator.__init__(self, func, keyarg) self.maxage = maxage def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] timestamp = int(time.time()) try: value, expires = self.cache[key] except KeyError: expires = 0 if expires <= timestamp: value = self.func(*args, **kwargs) expires = timestamp + self.maxage self.cache[key] = value, expires return value def update(self, key, value): self.cache[key] = value, int(time.time()) + self.maxage class DatabaseCacheDecorator(): """Database cache""" db = None _init = True def __init__(self, func, keyarg, maxage): self.key = "%s.%s" % (func.__module__, func.__name__) self.func = func self.cache = {} self.keyarg = keyarg self.maxage = maxage def __get__(self, obj, objtype): return functools.partial(self.__call__, obj) def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] timestamp = int(time.time()) # in-memory cache lookup try: value, expires = self.cache[key] if expires > timestamp: return value except KeyError: pass # database lookup fullkey = "%s-%s" % (self.key, key) with self.database() as db: cursor = db.cursor() try: cursor.execute("BEGIN EXCLUSIVE") except sqlite3.OperationalError: pass # Silently swallow exception - workaround for Python 3.6 cursor.execute( "SELECT value, expires FROM data WHERE key=? 
LIMIT 1", (fullkey,), ) result = cursor.fetchone() if result and result[1] > timestamp: value, expires = result value = pickle.loads(value) else: value = self.func(*args, **kwargs) expires = timestamp + self.maxage cursor.execute( "INSERT OR REPLACE INTO data VALUES (?,?,?)", (fullkey, pickle.dumps(value), expires), ) self.cache[key] = value, expires return value def update(self, key, value): expires = int(time.time()) + self.maxage self.cache[key] = value, expires with self.database() as db: db.execute( "INSERT OR REPLACE INTO data VALUES (?,?,?)", ("%s-%s" % (self.key, key), pickle.dumps(value), expires), ) def invalidate(self, key): try: del self.cache[key] except KeyError: pass with self.database() as db: db.execute( "DELETE FROM data WHERE key=?", ("%s-%s" % (self.key, key),), ) def database(self): if self._init: self.db.execute( "CREATE TABLE IF NOT EXISTS data " "(key TEXT PRIMARY KEY, value TEXT, expires INTEGER)" ) DatabaseCacheDecorator._init = False return self.db def memcache(maxage=None, keyarg=None): if maxage: def wrap(func): return MemoryCacheDecorator(func, keyarg, maxage) else: def wrap(func): return CacheDecorator(func, keyarg) return wrap def cache(maxage=3600, keyarg=None): def wrap(func): return DatabaseCacheDecorator(func, keyarg, maxage) return wrap def clear(module): """Delete database entries for 'module'""" db = DatabaseCacheDecorator.db if not db: return None rowcount = 0 cursor = db.cursor() try: if module == "ALL": cursor.execute("DELETE FROM data") else: cursor.execute( "DELETE FROM data " "WHERE key LIKE 'gallery_dl.extractor.' || ? || '.%'", (module.lower(),) ) except sqlite3.OperationalError: pass # database not initialized, cannot be modified, etc. else: rowcount = cursor.rowcount db.commit() if rowcount: cursor.execute("VACUUM") return rowcount def _path(): path = config.get(("cache",), "file", util.SENTINEL) if path is not util.SENTINEL: return util.expand_path(path) if util.WINDOWS: cachedir = os.environ.get("APPDATA", "~") else: cachedir = os.environ.get("XDG_CACHE_HOME", "~/.cache") cachedir = util.expand_path(os.path.join(cachedir, "gallery-dl")) os.makedirs(cachedir, exist_ok=True) return os.path.join(cachedir, "cache.sqlite3") def _init(): try: dbfile = _path() # restrict access permissions for new db files os.close(os.open(dbfile, os.O_CREAT | os.O_RDONLY, 0o600)) DatabaseCacheDecorator.db = sqlite3.connect( dbfile, timeout=60, check_same_thread=False) except (OSError, TypeError, sqlite3.OperationalError): global cache cache = memcache _init() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1677790111.0 gallery_dl-1.25.0/gallery_dl/config.py0000644000175000017500000001363614400205637016360 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Global configuration module""" import sys import os.path import logging from . 
import util log = logging.getLogger("config") # -------------------------------------------------------------------- # internals _config = {} _files = [] if util.WINDOWS: _default_configs = [ r"%APPDATA%\gallery-dl\config.json", r"%USERPROFILE%\gallery-dl\config.json", r"%USERPROFILE%\gallery-dl.conf", ] else: _default_configs = [ "/etc/gallery-dl.conf", "${XDG_CONFIG_HOME}/gallery-dl/config.json" if os.environ.get("XDG_CONFIG_HOME") else "${HOME}/.config/gallery-dl/config.json", "${HOME}/.gallery-dl.conf", ] if util.EXECUTABLE: # look for config file in PyInstaller executable directory (#682) _default_configs.append(os.path.join( os.path.dirname(sys.executable), "gallery-dl.conf", )) # -------------------------------------------------------------------- # public interface def initialize(): paths = list(map(util.expand_path, _default_configs)) for path in paths: if os.access(path, os.R_OK | os.W_OK): log.error("There is already a configuration file at '%s'", path) return 1 for path in paths: try: os.makedirs(os.path.dirname(path), exist_ok=True) with open(path, "x", encoding="utf-8") as fp: fp.write("""\ { "extractor": { }, "downloader": { }, "output": { }, "postprocessor": { } } """) break except OSError as exc: log.debug("%s: %s", exc.__class__.__name__, exc) else: log.error("Unable to create a new configuration file " "at any of the default paths") return 1 log.info("Created a basic configuration file at '%s'", path) return 0 def load(files=None, strict=False, load=util.json_loads): """Load JSON configuration files""" for pathfmt in files or _default_configs: path = util.expand_path(pathfmt) try: with open(path, encoding="utf-8") as file: conf = load(file.read()) except OSError as exc: if strict: log.error(exc) sys.exit(1) except Exception as exc: log.warning("Could not parse '%s': %s", path, exc) if strict: sys.exit(2) else: if not _config: _config.update(conf) else: util.combine_dict(_config, conf) _files.append(pathfmt) def clear(): """Reset configuration to an empty state""" _config.clear() def get(path, key, default=None, *, conf=_config): """Get the value of property 'key' or a default value""" try: for p in path: conf = conf[p] return conf[key] except Exception: return default def interpolate(path, key, default=None, *, conf=_config): """Interpolate the value of 'key'""" if key in conf: return conf[key] try: for p in path: conf = conf[p] if key in conf: default = conf[key] except Exception: pass return default def interpolate_common(common, paths, key, default=None, *, conf=_config): """Interpolate the value of 'key' using multiple 'paths' along a 'common' ancestor """ if key in conf: return conf[key] # follow the common path try: for p in common: conf = conf[p] if key in conf: default = conf[key] except Exception: return default # try all paths until a value is found value = util.SENTINEL for path in paths: c = conf try: for p in path: c = c[p] if key in c: value = c[key] except Exception: pass if value is not util.SENTINEL: return value return default def accumulate(path, key, *, conf=_config): """Accumulate the values of 'key' along 'path'""" result = [] try: if key in conf: value = conf[key] if value: result.extend(value) for p in path: conf = conf[p] if key in conf: value = conf[key] if value: result[:0] = value except Exception: pass return result def set(path, key, value, *, conf=_config): """Set the value of property 'key' for this session""" for p in path: try: conf = conf[p] except KeyError: conf[p] = conf = {} conf[key] = value def setdefault(path, key, value, *, 
conf=_config): """Set the value of property 'key' if it doesn't exist""" for p in path: try: conf = conf[p] except KeyError: conf[p] = conf = {} return conf.setdefault(key, value) def unset(path, key, *, conf=_config): """Unset the value of property 'key'""" try: for p in path: conf = conf[p] del conf[key] except Exception: pass class apply(): """Context Manager: apply a collection of key-value pairs""" def __init__(self, kvlist): self.original = [] self.kvlist = kvlist def __enter__(self): for path, key, value in self.kvlist: self.original.append((path, key, get(path, key, util.SENTINEL))) set(path, key, value) def __exit__(self, etype, value, traceback): for path, key, value in self.original: if value is util.SENTINEL: unset(path, key) else: set(path, key, value) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1676045045.0 gallery_dl-1.25.0/gallery_dl/cookies.py0000644000175000017500000010437114371465365016561 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. # Adapted from yt-dlp's cookies module. # https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/cookies.py import binascii import contextlib import ctypes import logging import os import shutil import sqlite3 import struct import subprocess import sys import tempfile from datetime import datetime, timedelta, timezone from hashlib import pbkdf2_hmac from http.cookiejar import Cookie from . import aes, text, util SUPPORTED_BROWSERS_CHROMIUM = { "brave", "chrome", "chromium", "edge", "opera", "vivaldi"} SUPPORTED_BROWSERS = SUPPORTED_BROWSERS_CHROMIUM | {"firefox", "safari"} logger = logging.getLogger("cookies") def load_cookies(cookiejar, browser_specification): browser_name, profile, keyring, container = \ _parse_browser_specification(*browser_specification) if browser_name == "firefox": load_cookies_firefox(cookiejar, profile, container) elif browser_name == "safari": load_cookies_safari(cookiejar, profile) elif browser_name in SUPPORTED_BROWSERS_CHROMIUM: load_cookies_chrome(cookiejar, browser_name, profile, keyring) else: raise ValueError("unknown browser '{}'".format(browser_name)) def load_cookies_firefox(cookiejar, profile=None, container=None): path, container_id = _firefox_cookies_database(profile, container) with DatabaseCopy(path) as db: sql = ("SELECT name, value, host, path, isSecure, expiry " "FROM moz_cookies") parameters = () if container_id is False: sql += " WHERE NOT INSTR(originAttributes,'userContextId=')" elif container_id: sql += " WHERE originAttributes LIKE ? OR originAttributes LIKE ?" 
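# Descriptive note: the container id is stored inside each cookie row's
# originAttributes string, so the two placeholders bound just below match it
# either at the very end of that string ("%userContextId=<id>") or followed by
# further attributes ("%userContextId=<id>&%").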
uid = "%userContextId={}".format(container_id) parameters = (uid, uid + "&%") set_cookie = cookiejar.set_cookie for name, value, domain, path, secure, expires in db.execute( sql, parameters): set_cookie(Cookie( 0, name, value, None, False, domain, bool(domain), domain.startswith("."), path, bool(path), secure, expires, False, None, None, {}, )) def load_cookies_safari(cookiejar, profile=None): """Ref.: https://github.com/libyal/dtformats/blob /main/documentation/Safari%20Cookies.asciidoc - This data appears to be out of date but the important parts of the database structure is the same - There are a few bytes here and there which are skipped during parsing """ with _safari_cookies_database() as fp: data = fp.read() page_sizes, body_start = _safari_parse_cookies_header(data) p = DataParser(data[body_start:]) for page_size in page_sizes: _safari_parse_cookies_page(p.read_bytes(page_size), cookiejar) def load_cookies_chrome(cookiejar, browser_name, profile, keyring): config = _get_chromium_based_browser_settings(browser_name) path = _chrome_cookies_database(profile, config) logger.debug("Extracting cookies from %s", path) with DatabaseCopy(path) as db: db.text_factory = bytes decryptor = get_cookie_decryptor( config["directory"], config["keyring"], keyring=keyring) try: rows = db.execute( "SELECT host_key, name, value, encrypted_value, path, " "expires_utc, is_secure FROM cookies") except sqlite3.OperationalError: rows = db.execute( "SELECT host_key, name, value, encrypted_value, path, " "expires_utc, secure FROM cookies") set_cookie = cookiejar.set_cookie failed_cookies = unencrypted_cookies = 0 for domain, name, value, enc_value, path, expires, secure in rows: if not value and enc_value: # encrypted value = decryptor.decrypt(enc_value) if value is None: failed_cookies += 1 continue else: value = value.decode() unencrypted_cookies += 1 domain = domain.decode() path = path.decode() name = name.decode() set_cookie(Cookie( 0, name, value, None, False, domain, bool(domain), domain.startswith("."), path, bool(path), secure, expires, False, None, None, {}, )) if failed_cookies > 0: failed_message = " ({} could not be decrypted)".format(failed_cookies) else: failed_message = "" logger.info("Extracted %s cookies from %s%s", len(cookiejar), browser_name, failed_message) counts = decryptor.cookie_counts.copy() counts["unencrypted"] = unencrypted_cookies logger.debug("cookie version breakdown: %s", counts) # -------------------------------------------------------------------- # firefox def _firefox_cookies_database(profile=None, container=None): if not profile: search_root = _firefox_browser_directory() elif _is_path(profile): search_root = profile else: search_root = os.path.join(_firefox_browser_directory(), profile) path = _find_most_recently_used_file(search_root, "cookies.sqlite") if path is None: raise FileNotFoundError("Unable to find Firefox cookies database in " "{}".format(search_root)) logger.debug("Extracting cookies from %s", path) if container == "none": container_id = False logger.debug("Only loading cookies not belonging to any container") elif container: containers_path = os.path.join( os.path.dirname(path), "containers.json") try: with open(containers_path) as file: identities = util.json_loads(file.read())["identities"] except OSError: logger.error("Unable to read Firefox container database at %s", containers_path) raise except KeyError: identities = () for context in identities: if container == context.get("name") or container == text.extr( context.get("l10nID", ""), "userContext", 
".label"): container_id = context["userContextId"] break else: raise ValueError("Unable to find Firefox container {}".format( container)) logger.debug("Only loading cookies from container '%s' (ID %s)", container, container_id) else: container_id = None return path, container_id def _firefox_browser_directory(): if sys.platform in ("win32", "cygwin"): return os.path.expandvars(r"%APPDATA%\Mozilla\Firefox\Profiles") if sys.platform == "darwin": return os.path.expanduser("~/Library/Application Support/Firefox") return os.path.expanduser("~/.mozilla/firefox") # -------------------------------------------------------------------- # safari def _safari_cookies_database(): try: path = os.path.expanduser("~/Library/Cookies/Cookies.binarycookies") return open(path, "rb") except FileNotFoundError: logger.debug("Trying secondary cookie location") path = os.path.expanduser("~/Library/Containers/com.apple.Safari/Data" "/Library/Cookies/Cookies.binarycookies") return open(path, "rb") def _safari_parse_cookies_header(data): p = DataParser(data) p.expect_bytes(b"cook", "database signature") number_of_pages = p.read_uint(big_endian=True) page_sizes = [p.read_uint(big_endian=True) for _ in range(number_of_pages)] return page_sizes, p.cursor def _safari_parse_cookies_page(data, jar): p = DataParser(data) p.expect_bytes(b"\x00\x00\x01\x00", "page signature") number_of_cookies = p.read_uint() record_offsets = [p.read_uint() for _ in range(number_of_cookies)] if number_of_cookies == 0: logger.debug("a cookies page of size %s has no cookies", len(data)) return p.skip_to(record_offsets[0], "unknown page header field") for i, record_offset in enumerate(record_offsets): p.skip_to(record_offset, "space between records") record_length = _safari_parse_cookies_record( data[record_offset:], jar) p.read_bytes(record_length) p.skip_to_end("space in between pages") def _safari_parse_cookies_record(data, cookiejar): p = DataParser(data) record_size = p.read_uint() p.skip(4, "unknown record field 1") flags = p.read_uint() is_secure = bool(flags & 0x0001) p.skip(4, "unknown record field 2") domain_offset = p.read_uint() name_offset = p.read_uint() path_offset = p.read_uint() value_offset = p.read_uint() p.skip(8, "unknown record field 3") expiration_date = _mac_absolute_time_to_posix(p.read_double()) _creation_date = _mac_absolute_time_to_posix(p.read_double()) # noqa: F841 try: p.skip_to(domain_offset) domain = p.read_cstring() p.skip_to(name_offset) name = p.read_cstring() p.skip_to(path_offset) path = p.read_cstring() p.skip_to(value_offset) value = p.read_cstring() except UnicodeDecodeError: logger.warning("failed to parse Safari cookie " "because UTF-8 decoding failed") return record_size p.skip_to(record_size, "space at the end of the record") cookiejar.set_cookie(Cookie( 0, name, value, None, False, domain, bool(domain), domain.startswith("."), path, bool(path), is_secure, expiration_date, False, None, None, {}, )) return record_size # -------------------------------------------------------------------- # chrome def _chrome_cookies_database(profile, config): if profile is None: search_root = config["directory"] elif _is_path(profile): search_root = profile config["directory"] = (os.path.dirname(profile) if config["profiles"] else profile) elif config["profiles"]: search_root = os.path.join(config["directory"], profile) else: logger.warning("%s does not support profiles", config["browser"]) search_root = config["directory"] path = _find_most_recently_used_file(search_root, "Cookies") if path is None: raise 
FileNotFoundError("Unable to find {} cookies database in " "'{}'".format(config["browser"], search_root)) return path def _get_chromium_based_browser_settings(browser_name): # https://chromium.googlesource.com/chromium # /src/+/HEAD/docs/user_data_dir.md join = os.path.join if sys.platform in ("win32", "cygwin"): appdata_local = os.path.expandvars("%LOCALAPPDATA%") appdata_roaming = os.path.expandvars("%APPDATA%") browser_dir = { "brave" : join(appdata_local, R"BraveSoftware\Brave-Browser\User Data"), "chrome" : join(appdata_local, R"Google\Chrome\User Data"), "chromium": join(appdata_local, R"Chromium\User Data"), "edge" : join(appdata_local, R"Microsoft\Edge\User Data"), "opera" : join(appdata_roaming, R"Opera Software\Opera Stable"), "vivaldi" : join(appdata_local, R"Vivaldi\User Data"), }[browser_name] elif sys.platform == "darwin": appdata = os.path.expanduser("~/Library/Application Support") browser_dir = { "brave" : join(appdata, "BraveSoftware/Brave-Browser"), "chrome" : join(appdata, "Google/Chrome"), "chromium": join(appdata, "Chromium"), "edge" : join(appdata, "Microsoft Edge"), "opera" : join(appdata, "com.operasoftware.Opera"), "vivaldi" : join(appdata, "Vivaldi"), }[browser_name] else: config = (os.environ.get("XDG_CONFIG_HOME") or os.path.expanduser("~/.config")) browser_dir = { "brave" : join(config, "BraveSoftware/Brave-Browser"), "chrome" : join(config, "google-chrome"), "chromium": join(config, "chromium"), "edge" : join(config, "microsoft-edge"), "opera" : join(config, "opera"), "vivaldi" : join(config, "vivaldi"), }[browser_name] # Linux keyring names can be determined by snooping on dbus # while opening the browser in KDE: # dbus-monitor "interface="org.kde.KWallet"" "type=method_return" keyring_name = { "brave" : "Brave", "chrome" : "Chrome", "chromium": "Chromium", "edge" : "Microsoft Edge" if sys.platform == "darwin" else "Chromium", "opera" : "Opera" if sys.platform == "darwin" else "Chromium", "vivaldi" : "Vivaldi" if sys.platform == "darwin" else "Chrome", }[browser_name] browsers_without_profiles = {"opera"} return { "browser" : browser_name, "directory": browser_dir, "keyring" : keyring_name, "profiles" : browser_name not in browsers_without_profiles } class ChromeCookieDecryptor: """ Overview: Linux: - cookies are either v10 or v11 - v10: AES-CBC encrypted with a fixed key - v11: AES-CBC encrypted with an OS protected key (keyring) - v11 keys can be stored in various places depending on the activate desktop environment [2] Mac: - cookies are either v10 or not v10 - v10: AES-CBC encrypted with an OS protected key (keyring) and more key derivation iterations than linux - not v10: "old data" stored as plaintext Windows: - cookies are either v10 or not v10 - v10: AES-GCM encrypted with a key which is encrypted with DPAPI - not v10: encrypted with DPAPI Sources: - [1] https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/ - [2] https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/key_storage_linux.cc - KeyStorageLinux::CreateService """ def decrypt(self, encrypted_value): raise NotImplementedError("Must be implemented by sub classes") @property def cookie_counts(self): raise NotImplementedError("Must be implemented by sub classes") def get_cookie_decryptor(browser_root, browser_keyring_name, *, keyring=None): if sys.platform in ("win32", "cygwin"): return WindowsChromeCookieDecryptor(browser_root) elif sys.platform == "darwin": return MacChromeCookieDecryptor(browser_keyring_name) else: return 
LinuxChromeCookieDecryptor( browser_keyring_name, keyring=keyring) class LinuxChromeCookieDecryptor(ChromeCookieDecryptor): def __init__(self, browser_keyring_name, *, keyring=None): self._v10_key = self.derive_key(b"peanuts") password = _get_linux_keyring_password(browser_keyring_name, keyring) self._v11_key = None if password is None else self.derive_key(password) self._cookie_counts = {"v10": 0, "v11": 0, "other": 0} @staticmethod def derive_key(password): # values from # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_linux.cc return pbkdf2_sha1(password, salt=b"saltysalt", iterations=1, key_length=16) @property def cookie_counts(self): return self._cookie_counts def decrypt(self, encrypted_value): version = encrypted_value[:3] ciphertext = encrypted_value[3:] if version == b"v10": self._cookie_counts["v10"] += 1 return _decrypt_aes_cbc(ciphertext, self._v10_key) elif version == b"v11": self._cookie_counts["v11"] += 1 if self._v11_key is None: logger.warning("cannot decrypt v11 cookies: no key found") return None return _decrypt_aes_cbc(ciphertext, self._v11_key) else: self._cookie_counts["other"] += 1 return None class MacChromeCookieDecryptor(ChromeCookieDecryptor): def __init__(self, browser_keyring_name): password = _get_mac_keyring_password(browser_keyring_name) self._v10_key = None if password is None else self.derive_key(password) self._cookie_counts = {"v10": 0, "other": 0} @staticmethod def derive_key(password): # values from # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_mac.mm return pbkdf2_sha1(password, salt=b"saltysalt", iterations=1003, key_length=16) @property def cookie_counts(self): return self._cookie_counts def decrypt(self, encrypted_value): version = encrypted_value[:3] ciphertext = encrypted_value[3:] if version == b"v10": self._cookie_counts["v10"] += 1 if self._v10_key is None: logger.warning("cannot decrypt v10 cookies: no key found") return None return _decrypt_aes_cbc(ciphertext, self._v10_key) else: self._cookie_counts["other"] += 1 # other prefixes are considered "old data", # which were stored as plaintext # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_mac.mm return encrypted_value class WindowsChromeCookieDecryptor(ChromeCookieDecryptor): def __init__(self, browser_root): self._v10_key = _get_windows_v10_key(browser_root) self._cookie_counts = {"v10": 0, "other": 0} @property def cookie_counts(self): return self._cookie_counts def decrypt(self, encrypted_value): version = encrypted_value[:3] ciphertext = encrypted_value[3:] if version == b"v10": self._cookie_counts["v10"] += 1 if self._v10_key is None: logger.warning("cannot decrypt v10 cookies: no key found") return None # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_win.cc # kNonceLength nonce_length = 96 // 8 # boringssl # EVP_AEAD_AES_GCM_TAG_LEN authentication_tag_length = 16 raw_ciphertext = ciphertext nonce = raw_ciphertext[:nonce_length] ciphertext = raw_ciphertext[ nonce_length:-authentication_tag_length] authentication_tag = raw_ciphertext[-authentication_tag_length:] return _decrypt_aes_gcm( ciphertext, self._v10_key, nonce, authentication_tag) else: self._cookie_counts["other"] += 1 # any other prefix means the data is DPAPI encrypted # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_win.cc return 
_decrypt_windows_dpapi(encrypted_value).decode() # -------------------------------------------------------------------- # keyring def _choose_linux_keyring(): """ https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/key_storage_util_linux.cc SelectBackend """ desktop_environment = _get_linux_desktop_environment(os.environ) logger.debug("Detected desktop environment: %s", desktop_environment) if desktop_environment == DE_KDE: return KEYRING_KWALLET if desktop_environment == DE_OTHER: return KEYRING_BASICTEXT return KEYRING_GNOMEKEYRING def _get_kwallet_network_wallet(): """ The name of the wallet used to store network passwords. https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/kwallet_dbus.cc KWalletDBus::NetworkWallet which does a dbus call to the following function: https://api.kde.org/frameworks/kwallet/html/classKWallet_1_1Wallet.html Wallet::NetworkWallet """ default_wallet = "kdewallet" try: proc, stdout = Popen_communicate( "dbus-send", "--session", "--print-reply=literal", "--dest=org.kde.kwalletd5", "/modules/kwalletd5", "org.kde.KWallet.networkWallet" ) if proc.returncode != 0: logger.warning("failed to read NetworkWallet") return default_wallet else: network_wallet = stdout.decode().strip() logger.debug("NetworkWallet = '%s'", network_wallet) return network_wallet except Exception as exc: logger.warning("exception while obtaining NetworkWallet (%s: %s)", exc.__class__.__name__, exc) return default_wallet def _get_kwallet_password(browser_keyring_name): logger.debug("using kwallet-query to obtain password from kwallet") if shutil.which("kwallet-query") is None: logger.error( "kwallet-query command not found. KWallet and kwallet-query " "must be installed to read from KWallet. kwallet-query should be " "included in the kwallet package for your distribution") return b"" network_wallet = _get_kwallet_network_wallet() try: proc, stdout = Popen_communicate( "kwallet-query", "--read-password", browser_keyring_name + " Safe Storage", "--folder", browser_keyring_name + " Keys", network_wallet, ) if proc.returncode != 0: logger.error("kwallet-query failed with return code {}. " "Please consult the kwallet-query man page " "for details".format(proc.returncode)) return b"" if stdout.lower().startswith(b"failed to read"): logger.debug("Failed to read password from kwallet. " "Using empty string instead") # This sometimes occurs in KDE because chrome does not check # hasEntry and instead just tries to read the value (which # kwallet returns "") whereas kwallet-query checks hasEntry. # To verify this: # dbus-monitor "interface="org.kde.KWallet"" "type=method_return" # while starting chrome. # This may be a bug, as the intended behaviour is to generate a # random password and store it, but that doesn't matter here. return b"" else: logger.debug("password found") if stdout[-1:] == b"\n": stdout = stdout[:-1] return stdout except Exception as exc: logger.warning("exception running kwallet-query (%s: %s)", exc.__class__.__name__, exc) return b"" def _get_gnome_keyring_password(browser_keyring_name): try: import secretstorage except ImportError: logger.error("secretstorage not available") return b"" # Gnome keyring does not seem to organise keys in the same way as KWallet, # using `dbus-monitor` during startup, it can be observed that chromium # lists all keys and presumably searches for its key in the list. # It appears that we must do the same. 
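# Descriptive note: the secret looked up below is labelled
# "<keyring name> Safe Storage" (e.g. "Chrome Safe Storage" or
# "Chromium Safe Storage"), using the keyring names defined in
# _get_chromium_based_browser_settings() above.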
# https://github.com/jaraco/keyring/issues/556 with contextlib.closing(secretstorage.dbus_init()) as con: col = secretstorage.get_default_collection(con) label = browser_keyring_name + " Safe Storage" for item in col.get_all_items(): if item.get_label() == label: return item.get_secret() else: logger.error("failed to read from keyring") return b"" def _get_linux_keyring_password(browser_keyring_name, keyring): # Note: chrome/chromium can be run with the following flags # to determine which keyring backend it has chosen to use # - chromium --enable-logging=stderr --v=1 2>&1 | grep key_storage_ # # Chromium supports --password-store= # so the automatic detection will not be sufficient in all cases. if not keyring: keyring = _choose_linux_keyring() logger.debug("Chosen keyring: %s", keyring) if keyring == KEYRING_KWALLET: return _get_kwallet_password(browser_keyring_name) elif keyring == KEYRING_GNOMEKEYRING: return _get_gnome_keyring_password(browser_keyring_name) elif keyring == KEYRING_BASICTEXT: # when basic text is chosen, all cookies are stored as v10 # so no keyring password is required return None assert False, "Unknown keyring " + keyring def _get_mac_keyring_password(browser_keyring_name): logger.debug("using find-generic-password to obtain " "password from OSX keychain") try: proc, stdout = Popen_communicate( "security", "find-generic-password", "-w", # write password to stdout "-a", browser_keyring_name, # match "account" "-s", browser_keyring_name + " Safe Storage", # match "service" ) if stdout[-1:] == b"\n": stdout = stdout[:-1] return stdout except Exception as exc: logger.warning("exception running find-generic-password (%s: %s)", exc.__class__.__name__, exc) return None def _get_windows_v10_key(browser_root): path = _find_most_recently_used_file(browser_root, "Local State") if path is None: logger.error("could not find local state file") return None logger.debug("Found local state file at '%s'", path) with open(path, encoding="utf-8") as file: data = util.json_loads(file.read()) try: base64_key = data["os_crypt"]["encrypted_key"] except KeyError: logger.error("no encrypted key in Local State") return None encrypted_key = binascii.a2b_base64(base64_key) prefix = b"DPAPI" if not encrypted_key.startswith(prefix): logger.error("invalid key") return None return _decrypt_windows_dpapi(encrypted_key[len(prefix):]) # -------------------------------------------------------------------- # utility class ParserError(Exception): pass class DataParser: def __init__(self, data): self.cursor = 0 self._data = data def read_bytes(self, num_bytes): if num_bytes < 0: raise ParserError("invalid read of {} bytes".format(num_bytes)) end = self.cursor + num_bytes if end > len(self._data): raise ParserError("reached end of input") data = self._data[self.cursor:end] self.cursor = end return data def expect_bytes(self, expected_value, message): value = self.read_bytes(len(expected_value)) if value != expected_value: raise ParserError("unexpected value: {} != {} ({})".format( value, expected_value, message)) def read_uint(self, big_endian=False): data_format = ">I" if big_endian else " 0: logger.debug("skipping {} bytes ({}): {!r}".format( num_bytes, description, self.read_bytes(num_bytes))) elif num_bytes < 0: raise ParserError("invalid skip of {} bytes".format(num_bytes)) def skip_to(self, offset, description="unknown"): self.skip(offset - self.cursor, description) def skip_to_end(self, description="unknown"): self.skip_to(len(self._data), description) class DatabaseCopy(): def __init__(self, path): 
self.path = path self.database = None self.directory = None def __enter__(self): try: self.directory = tempfile.TemporaryDirectory(prefix="gallery-dl-") path_copy = os.path.join(self.directory.name, "copy.sqlite") shutil.copyfile(self.path, path_copy) self.database = db = sqlite3.connect( path_copy, isolation_level=None, check_same_thread=False) return db except BaseException: if self.directory: self.directory.cleanup() raise def __exit__(self, exc, value, tb): self.database.close() self.directory.cleanup() def Popen_communicate(*args): proc = subprocess.Popen( args, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL) try: stdout, stderr = proc.communicate() except BaseException: # Including KeyboardInterrupt proc.kill() proc.wait() raise return proc, stdout """ https://chromium.googlesource.com/chromium/src/+/refs/heads /main/base/nix/xdg_util.h - DesktopEnvironment """ DE_OTHER = "other" DE_CINNAMON = "cinnamon" DE_GNOME = "gnome" DE_KDE = "kde" DE_PANTHEON = "pantheon" DE_UNITY = "unity" DE_XFCE = "xfce" """ https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/key_storage_util_linux.h - SelectedLinuxBackend """ KEYRING_KWALLET = "kwallet" KEYRING_GNOMEKEYRING = "gnomekeyring" KEYRING_BASICTEXT = "basictext" SUPPORTED_KEYRINGS = {"kwallet", "gnomekeyring", "basictext"} def _get_linux_desktop_environment(env): """ Ref: https://chromium.googlesource.com/chromium/src/+/refs/heads /main/base/nix/xdg_util.cc - GetDesktopEnvironment """ xdg_current_desktop = env.get("XDG_CURRENT_DESKTOP") desktop_session = env.get("DESKTOP_SESSION") if xdg_current_desktop: xdg_current_desktop = (xdg_current_desktop.partition(":")[0] .strip().lower()) if xdg_current_desktop == "unity": if desktop_session and "gnome-fallback" in desktop_session: return DE_GNOME else: return DE_UNITY elif xdg_current_desktop == "gnome": return DE_GNOME elif xdg_current_desktop == "x-cinnamon": return DE_CINNAMON elif xdg_current_desktop == "kde": return DE_KDE elif xdg_current_desktop == "pantheon": return DE_PANTHEON elif xdg_current_desktop == "xfce": return DE_XFCE if desktop_session: if desktop_session in ("mate", "gnome"): return DE_GNOME if "kde" in desktop_session: return DE_KDE if "xfce" in desktop_session: return DE_XFCE if "GNOME_DESKTOP_SESSION_ID" in env: return DE_GNOME if "KDE_FULL_SESSION" in env: return DE_KDE return DE_OTHER def _mac_absolute_time_to_posix(timestamp): return int((datetime(2001, 1, 1, 0, 0, tzinfo=timezone.utc) + timedelta(seconds=timestamp)).timestamp()) def pbkdf2_sha1(password, salt, iterations, key_length): return pbkdf2_hmac("sha1", password, salt, iterations, key_length) def _decrypt_aes_cbc(ciphertext, key, initialization_vector=b" " * 16): plaintext = aes.unpad_pkcs7( aes.aes_cbc_decrypt_bytes(ciphertext, key, initialization_vector)) try: return plaintext.decode() except UnicodeDecodeError: logger.warning("failed to decrypt cookie (AES-CBC) because UTF-8 " "decoding failed. Possibly the key is wrong?") return None def _decrypt_aes_gcm(ciphertext, key, nonce, authentication_tag): try: plaintext = aes.aes_gcm_decrypt_and_verify_bytes( ciphertext, key, authentication_tag, nonce) except ValueError: logger.warning("failed to decrypt cookie (AES-GCM) because MAC check " "failed. Possibly the key is wrong?") return None try: return plaintext.decode() except UnicodeDecodeError: logger.warning("failed to decrypt cookie (AES-GCM) because UTF-8 " "decoding failed. 
Possibly the key is wrong?") return None def _decrypt_windows_dpapi(ciphertext): """ References: - https://docs.microsoft.com/en-us/windows /win32/api/dpapi/nf-dpapi-cryptunprotectdata """ from ctypes.wintypes import DWORD class DATA_BLOB(ctypes.Structure): _fields_ = [("cbData", DWORD), ("pbData", ctypes.POINTER(ctypes.c_char))] buffer = ctypes.create_string_buffer(ciphertext) blob_in = DATA_BLOB(ctypes.sizeof(buffer), buffer) blob_out = DATA_BLOB() ret = ctypes.windll.crypt32.CryptUnprotectData( ctypes.byref(blob_in), # pDataIn None, # ppszDataDescr: human readable description of pDataIn None, # pOptionalEntropy: salt? None, # pvReserved: must be NULL None, # pPromptStruct: information about prompts to display 0, # dwFlags ctypes.byref(blob_out) # pDataOut ) if not ret: logger.warning("failed to decrypt with DPAPI") return None result = ctypes.string_at(blob_out.pbData, blob_out.cbData) ctypes.windll.kernel32.LocalFree(blob_out.pbData) return result def _find_most_recently_used_file(root, filename): # if there are multiple browser profiles, take the most recently used one paths = [] for curr_root, dirs, files in os.walk(root): for file in files: if file == filename: paths.append(os.path.join(curr_root, file)) if not paths: return None return max(paths, key=lambda path: os.lstat(path).st_mtime) def _is_path(value): return os.path.sep in value def _parse_browser_specification( browser, profile=None, keyring=None, container=None): browser = browser.lower() if browser not in SUPPORTED_BROWSERS: raise ValueError("unsupported browser '{}'".format(browser)) if keyring and keyring not in SUPPORTED_KEYRINGS: raise ValueError("unsupported keyring '{}'".format(keyring)) if profile and _is_path(profile): profile = os.path.expanduser(profile) return browser, profile, keyring, container ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1678564859.951853 gallery_dl-1.25.0/gallery_dl/downloader/0000755000175000017500000000000014403156774016700 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1618779451.0 gallery_dl-1.25.0/gallery_dl/downloader/__init__.py0000644000175000017500000000176414037116473021014 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Downloader modules""" modules = [ "http", "text", "ytdl", ] def find(scheme): """Return downloader class suitable for handling the given scheme""" try: return _cache[scheme] except KeyError: pass cls = None if scheme == "https": scheme = "http" if scheme in modules: # prevent unwanted imports try: module = __import__(scheme, globals(), None, (), 1) except ImportError: pass else: cls = module.__downloader__ if scheme == "http": _cache["http"] = _cache["https"] = cls else: _cache[scheme] = cls return cls # -------------------------------------------------------------------- # internals _cache = {} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1653657633.0 gallery_dl-1.25.0/gallery_dl/downloader/common.py0000644000175000017500000000250114244150041020520 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
"""Common classes and constants used by downloader modules.""" import os from .. import config, util class DownloaderBase(): """Base class for downloaders""" scheme = "" def __init__(self, job): self.out = job.out self.session = job.extractor.session self.part = self.config("part", True) self.partdir = self.config("part-directory") self.log = job.get_logger("downloader." + self.scheme) if self.partdir: self.partdir = util.expand_path(self.partdir) os.makedirs(self.partdir, exist_ok=True) proxies = self.config("proxy", util.SENTINEL) if proxies is util.SENTINEL: self.proxies = job.extractor._proxies else: self.proxies = util.build_proxy_map(proxies, self.log) def config(self, key, default=None): """Interpolate downloader config value for 'key'""" return config.interpolate(("downloader", self.scheme), key, default) def download(self, url, pathfmt): """Write data from 'url' into the file specified by 'pathfmt'""" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1678400890.0 gallery_dl-1.25.0/gallery_dl/downloader/http.py0000644000175000017500000003553614402456572020243 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Downloader module for http:// and https:// URLs""" import time import mimetypes from requests.exceptions import RequestException, ConnectionError, Timeout from .common import DownloaderBase from .. import text, util from ssl import SSLError try: from OpenSSL.SSL import Error as OpenSSLError except ImportError: OpenSSLError = SSLError class HttpDownloader(DownloaderBase): scheme = "http" def __init__(self, job): DownloaderBase.__init__(self, job) extractor = job.extractor self.downloading = False self.adjust_extension = self.config("adjust-extensions", True) self.chunk_size = self.config("chunk-size", 32768) self.metadata = extractor.config("http-metadata") self.progress = self.config("progress", 3.0) self.validate = self.config("validate", True) self.headers = self.config("headers") self.minsize = self.config("filesize-min") self.maxsize = self.config("filesize-max") self.retries = self.config("retries", extractor._retries) self.retry_codes = self.config("retry-codes", extractor._retry_codes) self.timeout = self.config("timeout", extractor._timeout) self.verify = self.config("verify", extractor._verify) self.mtime = self.config("mtime", True) self.rate = self.config("rate") if self.retries < 0: self.retries = float("inf") if self.minsize: minsize = text.parse_bytes(self.minsize) if not minsize: self.log.warning( "Invalid minimum file size (%r)", self.minsize) self.minsize = minsize if self.maxsize: maxsize = text.parse_bytes(self.maxsize) if not maxsize: self.log.warning( "Invalid maximum file size (%r)", self.maxsize) self.maxsize = maxsize if isinstance(self.chunk_size, str): chunk_size = text.parse_bytes(self.chunk_size) if not chunk_size: self.log.warning( "Invalid chunk size (%r)", self.chunk_size) chunk_size = 32768 self.chunk_size = chunk_size if self.rate: rate = text.parse_bytes(self.rate) if rate: if rate < self.chunk_size: self.chunk_size = rate self.rate = rate self.receive = self._receive_rate else: self.log.warning("Invalid rate limit (%r)", self.rate) if self.progress is not None: self.receive = self._receive_rate if self.progress < 0.0: self.progress = 0.0 def download(self, url, pathfmt): try: return 
self._download_impl(url, pathfmt) except Exception: print() raise finally: # remove file from incomplete downloads if self.downloading and not self.part: util.remove_file(pathfmt.temppath) def _download_impl(self, url, pathfmt): response = None tries = 0 msg = "" metadata = self.metadata kwdict = pathfmt.kwdict adjust_extension = kwdict.get( "_http_adjust_extension", self.adjust_extension) if self.part and not metadata: pathfmt.part_enable(self.partdir) while True: if tries: if response: response.close() response = None self.log.warning("%s (%s/%s)", msg, tries, self.retries+1) if tries > self.retries: return False time.sleep(tries) tries += 1 file_header = None # collect HTTP headers headers = {"Accept": "*/*"} # file-specific headers extra = kwdict.get("_http_headers") if extra: headers.update(extra) # general headers if self.headers: headers.update(self.headers) # partial content file_size = pathfmt.part_size() if file_size: headers["Range"] = "bytes={}-".format(file_size) # connect to (remote) source try: response = self.session.request( kwdict.get("_http_method", "GET"), url, stream=True, headers=headers, data=kwdict.get("_http_data"), timeout=self.timeout, proxies=self.proxies, verify=self.verify, ) except (ConnectionError, Timeout) as exc: msg = str(exc) continue except Exception as exc: self.log.warning(exc) return False # check response code = response.status_code if code == 200: # OK offset = 0 size = response.headers.get("Content-Length") elif code == 206: # Partial Content offset = file_size size = response.headers["Content-Range"].rpartition("/")[2] elif code == 416 and file_size: # Requested Range Not Satisfiable break else: msg = "'{} {}' for '{}'".format(code, response.reason, url) if code in self.retry_codes or 500 <= code < 600: continue retry = kwdict.get("_http_retry") if retry and retry(response): continue self.log.warning(msg) return False # check for invalid responses validate = kwdict.get("_http_validate") if validate and self.validate: result = validate(response) if isinstance(result, str): url = result tries -= 1 continue if not result: self.log.warning("Invalid response") return False # check file size size = text.parse_int(size, None) if size is not None: if self.minsize and size < self.minsize: self.log.warning( "File size smaller than allowed minimum (%s < %s)", size, self.minsize) return False if self.maxsize and size > self.maxsize: self.log.warning( "File size larger than allowed maximum (%s > %s)", size, self.maxsize) return False build_path = False # set missing filename extension from MIME type if not pathfmt.extension: pathfmt.set_extension(self._find_extension(response)) build_path = True # set metadata from HTTP headers if metadata: kwdict[metadata] = util.extract_headers(response) build_path = True # build and check file path if build_path: pathfmt.build_path() if pathfmt.exists(): pathfmt.temppath = "" return True if self.part and metadata: pathfmt.part_enable(self.partdir) metadata = False content = response.iter_content(self.chunk_size) # check filename extension against file header if adjust_extension and not offset and \ pathfmt.extension in SIGNATURE_CHECKS: try: file_header = next( content if response.raw.chunked else response.iter_content(16), b"") except (RequestException, SSLError, OpenSSLError) as exc: msg = str(exc) print() continue if self._adjust_extension(pathfmt, file_header) and \ pathfmt.exists(): pathfmt.temppath = "" return True # set open mode if not offset: mode = "w+b" if file_size: self.log.debug("Unable to resume partial 
download") else: mode = "r+b" self.log.debug("Resuming download at byte %d", offset) # download content self.downloading = True with pathfmt.open(mode) as fp: if file_header: fp.write(file_header) offset += len(file_header) elif offset: if adjust_extension and \ pathfmt.extension in SIGNATURE_CHECKS: self._adjust_extension(pathfmt, fp.read(16)) fp.seek(offset) self.out.start(pathfmt.path) try: self.receive(fp, content, size, offset) except (RequestException, SSLError, OpenSSLError) as exc: msg = str(exc) print() continue # check file size if size and fp.tell() < size: msg = "file size mismatch ({} < {})".format( fp.tell(), size) print() continue break self.downloading = False if self.mtime: kwdict.setdefault("_mtime", response.headers.get("Last-Modified")) else: kwdict["_mtime"] = None return True @staticmethod def receive(fp, content, bytes_total, bytes_start): write = fp.write for data in content: write(data) def _receive_rate(self, fp, content, bytes_total, bytes_start): rate = self.rate write = fp.write progress = self.progress bytes_downloaded = 0 time_start = time.monotonic() for data in content: time_elapsed = time.monotonic() - time_start bytes_downloaded += len(data) write(data) if progress is not None: if time_elapsed > progress: self.out.progress( bytes_total, bytes_start + bytes_downloaded, int(bytes_downloaded / time_elapsed), ) if rate: time_expected = bytes_downloaded / rate if time_expected > time_elapsed: time.sleep(time_expected - time_elapsed) def _find_extension(self, response): """Get filename extension from MIME type""" mtype = response.headers.get("Content-Type", "image/jpeg") mtype = mtype.partition(";")[0] if "/" not in mtype: mtype = "image/" + mtype if mtype in MIME_TYPES: return MIME_TYPES[mtype] ext = mimetypes.guess_extension(mtype, strict=False) if ext: return ext[1:] self.log.warning("Unknown MIME type '%s'", mtype) return "bin" @staticmethod def _adjust_extension(pathfmt, file_header): """Check filename extension against file header""" if not SIGNATURE_CHECKS[pathfmt.extension](file_header): for ext, check in SIGNATURE_CHECKS.items(): if check(file_header): pathfmt.set_extension(ext) pathfmt.build_path() return True return False MIME_TYPES = { "image/jpeg" : "jpg", "image/jpg" : "jpg", "image/png" : "png", "image/gif" : "gif", "image/bmp" : "bmp", "image/x-bmp" : "bmp", "image/x-ms-bmp": "bmp", "image/webp" : "webp", "image/avif" : "avif", "image/svg+xml" : "svg", "image/ico" : "ico", "image/icon" : "ico", "image/x-icon" : "ico", "image/vnd.microsoft.icon" : "ico", "image/x-photoshop" : "psd", "application/x-photoshop" : "psd", "image/vnd.adobe.photoshop": "psd", "video/webm": "webm", "video/ogg" : "ogg", "video/mp4" : "mp4", "audio/wav" : "wav", "audio/x-wav": "wav", "audio/webm" : "webm", "audio/ogg" : "ogg", "audio/mpeg" : "mp3", "application/zip" : "zip", "application/x-zip": "zip", "application/x-zip-compressed": "zip", "application/rar" : "rar", "application/x-rar": "rar", "application/x-rar-compressed": "rar", "application/x-7z-compressed" : "7z", "application/pdf" : "pdf", "application/x-pdf": "pdf", "application/x-shockwave-flash": "swf", "application/ogg": "ogg", # https://www.iana.org/assignments/media-types/model/obj "model/obj": "obj", "application/octet-stream": "bin", } # https://en.wikipedia.org/wiki/List_of_file_signatures SIGNATURE_CHECKS = { "jpg" : lambda s: s[0:3] == b"\xFF\xD8\xFF", "png" : lambda s: s[0:8] == b"\x89PNG\r\n\x1A\n", "gif" : lambda s: s[0:6] in (b"GIF87a", b"GIF89a"), "bmp" : lambda s: s[0:2] == b"BM", "webp": lambda s: 
(s[0:4] == b"RIFF" and s[8:12] == b"WEBP"), "avif": lambda s: s[4:11] == b"ftypavi" and s[11] in b"fs", "svg" : lambda s: s[0:5] == b"= 0 else float("inf"), "socket_timeout": self.config("timeout", extractor._timeout), "nocheckcertificate": not self.config("verify", extractor._verify), "proxy": self.proxies.get("http") if self.proxies else None, } self.ytdl_instance = None self.forward_cookies = self.config("forward-cookies", False) self.progress = self.config("progress", 3.0) self.outtmpl = self.config("outtmpl") def download(self, url, pathfmt): kwdict = pathfmt.kwdict ytdl_instance = kwdict.pop("_ytdl_instance", None) if not ytdl_instance: ytdl_instance = self.ytdl_instance if not ytdl_instance: try: module = ytdl.import_module(self.config("module")) except ImportError as exc: self.log.error("Cannot import module '%s'", exc.name) self.log.debug("", exc_info=True) self.download = lambda u, p: False return False self.ytdl_instance = ytdl_instance = ytdl.construct_YoutubeDL( module, self, self.ytdl_opts) if self.outtmpl == "default": self.outtmpl = module.DEFAULT_OUTTMPL if self.forward_cookies: set_cookie = ytdl_instance.cookiejar.set_cookie for cookie in self.session.cookies: set_cookie(cookie) if self.progress is not None and not ytdl_instance._progress_hooks: ytdl_instance.add_progress_hook(self._progress_hook) info_dict = kwdict.pop("_ytdl_info_dict", None) if not info_dict: try: info_dict = ytdl_instance.extract_info(url[5:], download=False) except Exception: pass if not info_dict: return False if "entries" in info_dict: index = kwdict.get("_ytdl_index") if index is None: return self._download_playlist( ytdl_instance, pathfmt, info_dict) else: info_dict = info_dict["entries"][index] extra = kwdict.get("_ytdl_extra") if extra: info_dict.update(extra) return self._download_video(ytdl_instance, pathfmt, info_dict) def _download_video(self, ytdl_instance, pathfmt, info_dict): if "url" in info_dict: text.nameext_from_url(info_dict["url"], pathfmt.kwdict) formats = info_dict.get("requested_formats") if formats and not compatible_formats(formats): info_dict["ext"] = "mkv" if self.outtmpl: self._set_outtmpl(ytdl_instance, self.outtmpl) pathfmt.filename = filename = \ ytdl_instance.prepare_filename(info_dict) pathfmt.extension = info_dict["ext"] pathfmt.path = pathfmt.directory + filename pathfmt.realpath = pathfmt.temppath = ( pathfmt.realdirectory + filename) else: pathfmt.set_extension(info_dict["ext"]) pathfmt.build_path() if pathfmt.exists(): pathfmt.temppath = "" return True if self.part and self.partdir: pathfmt.temppath = os.path.join( self.partdir, pathfmt.filename) self._set_outtmpl(ytdl_instance, pathfmt.temppath.replace("%", "%%")) self.out.start(pathfmt.path) try: ytdl_instance.process_info(info_dict) except Exception: self.log.debug("Traceback", exc_info=True) return False return True def _download_playlist(self, ytdl_instance, pathfmt, info_dict): pathfmt.set_extension("%(playlist_index)s.%(ext)s") pathfmt.build_path() self._set_outtmpl(ytdl_instance, pathfmt.realpath) for entry in info_dict["entries"]: ytdl_instance.process_info(entry) return True def _progress_hook(self, info): if info["status"] == "downloading" and \ info["elapsed"] >= self.progress: total = info.get("total_bytes") or info.get("total_bytes_estimate") speed = info.get("speed") self.out.progress( None if total is None else int(total), info["downloaded_bytes"], int(speed) if speed else 0, ) @staticmethod def _set_outtmpl(ytdl_instance, outtmpl): try: ytdl_instance._parse_outtmpl except AttributeError: try: 
ytdl_instance.outtmpl_dict["default"] = outtmpl except AttributeError: ytdl_instance.params["outtmpl"] = outtmpl else: ytdl_instance.params["outtmpl"] = {"default": outtmpl} def compatible_formats(formats): """Returns True if 'formats' are compatible for merge""" video_ext = formats[0].get("ext") audio_ext = formats[1].get("ext") if video_ext == "webm" and audio_ext == "webm": return True exts = ("mp3", "mp4", "m4a", "m4p", "m4b", "m4r", "m4v", "ismv", "isma") return video_ext in exts and audio_ext in exts __downloader__ = YoutubeDLDownloader ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1676132154.0 gallery_dl-1.25.0/gallery_dl/exception.py0000644000175000017500000000640314371737472017121 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Exception classes used by gallery-dl Class Hierarchy: Exception +-- GalleryDLException +-- ExtractionError | +-- AuthenticationError | +-- AuthorizationError | +-- NotFoundError | +-- HttpError +-- FormatError | +-- FilenameFormatError | +-- DirectoryFormatError +-- FilterError +-- NoExtractorError +-- StopExtraction +-- TerminateExtraction +-- RestartExtraction """ class GalleryDLException(Exception): """Base class for GalleryDL exceptions""" default = None msgfmt = None code = 1 def __init__(self, message=None, fmt=True): if not message: message = self.default elif isinstance(message, Exception): message = "{}: {}".format(message.__class__.__name__, message) if self.msgfmt and fmt: message = self.msgfmt.format(message) Exception.__init__(self, message) class ExtractionError(GalleryDLException): """Base class for exceptions during information extraction""" class HttpError(ExtractionError): """HTTP request during data extraction failed""" default = "HTTP request failed" code = 4 def __init__(self, message, response=None): ExtractionError.__init__(self, message) self.response = response self.status = response.status_code if response else 0 class NotFoundError(ExtractionError): """Requested resource (gallery/image) could not be found""" msgfmt = "Requested {} could not be found" default = "resource (gallery/image)" code = 8 class AuthenticationError(ExtractionError): """Invalid or missing login credentials""" default = "Invalid or missing login credentials" code = 16 class AuthorizationError(ExtractionError): """Insufficient privileges to access a resource""" default = "Insufficient privileges to access the specified resource" code = 16 class FormatError(GalleryDLException): """Error while building output paths""" code = 32 class FilenameFormatError(FormatError): """Error while building output filenames""" msgfmt = "Applying filename format string failed ({})" class DirectoryFormatError(FormatError): """Error while building output directory paths""" msgfmt = "Applying directory format string failed ({})" class FilterError(GalleryDLException): """Error while evaluating a filter expression""" msgfmt = "Evaluating filter expression failed ({})" code = 32 class NoExtractorError(GalleryDLException): """No extractor can handle the given URL""" code = 64 class StopExtraction(GalleryDLException): """Stop data extraction""" def __init__(self, message=None, *args): GalleryDLException.__init__(self) self.message = message % args if args else message self.code = 1 if message else 0 class 
TerminateExtraction(GalleryDLException): """Terminate data extraction""" code = 0 class RestartExtraction(GalleryDLException): """Restart data extraction""" code = 0 ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1678564859.9751868 gallery_dl-1.25.0/gallery_dl/extractor/0000755000175000017500000000000014403156774016555 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/2chan.py0000644000175000017500000000732514363473075020132 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.2chan.net/""" from .common import Extractor, Message from .. import text class _2chanThreadExtractor(Extractor): """Extractor for 2chan threads""" category = "2chan" subcategory = "thread" directory_fmt = ("{category}", "{board_name}", "{thread}") filename_fmt = "{tim}.{extension}" archive_fmt = "{board}_{thread}_{tim}" url_fmt = "https://{server}.2chan.net/{board}/src/{filename}" pattern = r"(?:https?://)?([\w-]+)\.2chan\.net/([^/]+)/res/(\d+)" test = ("https://dec.2chan.net/70/res/14565.htm", { "pattern": r"https://dec\.2chan\.net/70/src/\d{13}\.jpg", "count": ">= 3", "keyword": { "board": "70", "board_name": "新板提案", "com": str, "fsize": r"re:\d+", "name": "名無し", "no": r"re:1[45]\d\d\d", "now": r"re:22/../..\(.\)..:..:..", "post": "無題", "server": "dec", "thread": "14565", "tim": r"re:^\d{13}$", "time": r"re:^\d{10}$", "title": "ヒロアカ板" }, }) def __init__(self, match): Extractor.__init__(self, match) self.server, self.board, self.thread = match.groups() def items(self): url = "https://{}.2chan.net/{}/res/{}.htm".format( self.server, self.board, self.thread) page = self.request(url).text data = self.metadata(page) yield Message.Directory, data for post in self.posts(page): if "filename" not in post: continue post.update(data) url = self.url_fmt.format_map(post) yield Message.Url, url, post def metadata(self, page): """Collect metadata for extractor-job""" title, _, boardname = text.extr( page, "", "").rpartition(" - ") return { "server": self.server, "title": title, "board": self.board, "board_name": boardname[:-4], "thread": self.thread, } def posts(self, page): """Build a list of all post-objects""" page = text.extr( page, '
') return [ self.parse(post) for post in page.split('') ] def parse(self, post): """Build post-object by extracting data from an HTML post""" data = self._extract_post(post) if data["name"]: data["name"] = data["name"].strip() path = text.extr(post, '' , '<'), ("name", 'class="cnm">' , '<'), ("now" , 'class="cnw">' , '<'), ("no" , 'class="cno">No.', '<'), (None , '', ''), ))[0] @staticmethod def _extract_image(post, data): text.extract_all(post, ( (None , '_blank', ''), ("filename", '>', '<'), ("fsize" , '(', ' '), ), 0, data) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/2chen.py0000644000175000017500000000730014363473075020127 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://2chen.moe/""" from .common import Extractor, Message from .. import text class _2chenThreadExtractor(Extractor): """Extractor for 2chen threads""" category = "2chen" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{time} {filename}.{extension}" archive_fmt = "{board}_{thread}_{hash}_{time}" pattern = r"(?:https?://)?2chen\.(?:moe|club)/([^/?#]+)/(\d+)" test = ( ("https://2chen.moe/tv/496715", { "pattern": r"https://2chen\.su/assets/images/src/\w{40}\.\w+$", "count": ">= 179", }), ("https://2chen.club/tv/1", { "count": 5, }), # 404 ("https://2chen.moe/jp/303786"), ) def __init__(self, match): Extractor.__init__(self, match) self.root = text.root_from_url(match.group(0)) self.board, self.thread = match.groups() def items(self): url = "{}/{}/{}".format(self.root, self.board, self.thread) page = self.request(url, encoding="utf-8", notfound="thread").text data = self.metadata(page) yield Message.Directory, data for post in self.posts(page): url = post["url"] if not url: continue if url[0] == "/": url = self.root + url post["url"] = url = url.partition("?")[0] post.update(data) post["time"] = text.parse_int(post["date"].timestamp()) yield Message.Url, url, text.nameext_from_url( post["filename"], post) def metadata(self, page): board, pos = text.extract(page, 'class="board">/', '/<') title = text.extract(page, "

", "

", pos)[0] return { "board" : board, "thread": self.thread, "title" : text.unescape(title), } def posts(self, page): """Return iterable with relevant posts""" return map(self.parse, text.extract_iter( page, 'class="glass media', '')) def parse(self, post): extr = text.extract_from(post) return { "name" : text.unescape(extr("", "")), "date" : text.parse_datetime( extr("")[2], "%d %b %Y (%a) %H:%M:%S" ), "no" : extr('href="#p', '"'), "url" : extr('
= 33", }), ("https://en.35photo.pro/liya"), ("https://ru.35photo.pro/liya"), ) def __init__(self, match): _35photoExtractor.__init__(self, match) self.user = match.group(1) self.user_id = 0 def metadata(self): url = "{}/{}/".format(self.root, self.user) page = self.request(url).text self.user_id = text.parse_int(text.extr(page, "/user_", ".xml")) return { "user": self.user, "user_id": self.user_id, } def photos(self): return self._pagination({ "page": "photoUser", "user_id": self.user_id, }) class _35photoTagExtractor(_35photoExtractor): """Extractor for all photos from a tag listing""" subcategory = "tag" directory_fmt = ("{category}", "Tags", "{search_tag}") archive_fmt = "t{search_tag}_{id}_{num}" pattern = r"(?:https?://)?(?:[a-z]+\.)?35photo\.pro/tags/([^/?#]+)" test = ("https://35photo.pro/tags/landscape/", { "range": "1-25", "count": 25, "archive": False, }) def __init__(self, match): _35photoExtractor.__init__(self, match) self.tag = match.group(1) def metadata(self): return {"search_tag": text.unquote(self.tag).lower()} def photos(self): num = 1 while True: url = "{}/tags/{}/list_{}/".format(self.root, self.tag, num) page = self.request(url).text prev = None for photo_id in text.extract_iter(page, "35photo.pro/photo_", "/"): if photo_id != prev: prev = photo_id yield photo_id if not prev: return num += 1 class _35photoGenreExtractor(_35photoExtractor): """Extractor for images of a specific genre on 35photo.pro""" subcategory = "genre" directory_fmt = ("{category}", "Genre", "{genre}") archive_fmt = "g{genre_id}_{id}_{num}" pattern = r"(?:https?://)?(?:[a-z]+\.)?35photo\.pro/genre_(\d+)(/new/)?" test = ("https://35photo.pro/genre_109/",) def __init__(self, match): _35photoExtractor.__init__(self, match) self.genre_id, self.new = match.groups() self.photo_ids = None def metadata(self): url = "{}/genre_{}{}".format(self.root, self.genre_id, self.new or "/") page = self.request(url).text self.photo_ids = self._photo_ids(text.extr( page, ' class="photo', '\n')) return { "genre": text.extr(page, " genre - ", ". "), "genre_id": text.parse_int(self.genre_id), } def photos(self): if not self.photo_ids: return () return self._pagination({ "page": "genre", "community_id": self.genre_id, "photo_rating": "0" if self.new else "50", "lastId": self.photo_ids[-1], }, self.photo_ids) class _35photoImageExtractor(_35photoExtractor): """Extractor for individual images from 35photo.pro""" subcategory = "image" pattern = r"(?:https?://)?(?:[a-z]+\.)?35photo\.pro/photo_(\d+)" test = ("https://35photo.pro/photo_753340/", { "count": 1, "keyword": { "url" : r"re:https://35photo\.pro/photos_main/.*\.jpg", "id" : 753340, "title" : "Winter walk", "description": str, "tags" : list, "views" : int, "favorites" : int, "score" : int, "type" : 0, "date" : "15 авг, 2014", "user" : "liya", "user_id" : 20415, "user_name" : "Liya Mirzaeva", }, }) def __init__(self, match): _35photoExtractor.__init__(self, match) self.photo_id = match.group(1) def photos(self): return (self.photo_id,) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1641689335.0 gallery_dl-1.25.0/gallery_dl/extractor/3dbooru.py0000644000175000017500000000600514166430367020504 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for http://behoimi.org/""" from . 
import moebooru class _3dbooruBase(): """Base class for 3dbooru extractors""" category = "3dbooru" basecategory = "booru" root = "http://behoimi.org" def __init__(self, match): super().__init__(match) self.session.headers.update({ "Referer": "http://behoimi.org/post/show/", "Accept-Encoding": "identity", }) class _3dbooruTagExtractor(_3dbooruBase, moebooru.MoebooruTagExtractor): """Extractor for images from behoimi.org based on search-tags""" pattern = (r"(?:https?://)?(?:www\.)?behoimi\.org/post" r"(?:/(?:index)?)?\?tags=(?P[^&#]+)") test = ("http://behoimi.org/post?tags=himekawa_azuru+dress", { "url": "ecb30c6aaaf8a6ff8f55255737a9840832a483c1", "content": "11cbda40c287e026c1ce4ca430810f761f2d0b2a", }) def posts(self): params = {"tags": self.tags} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPoolExtractor(_3dbooruBase, moebooru.MoebooruPoolExtractor): """Extractor for image-pools from behoimi.org""" pattern = r"(?:https?://)?(?:www\.)?behoimi\.org/pool/show/(?P\d+)" test = ("http://behoimi.org/pool/show/27", { "url": "da75d2d1475449d5ef0c266cb612683b110a30f2", "content": "fd5b37c5c6c2de4b4d6f1facffdefa1e28176554", }) def posts(self): params = {"tags": "pool:" + self.pool_id} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPostExtractor(_3dbooruBase, moebooru.MoebooruPostExtractor): """Extractor for single images from behoimi.org""" pattern = r"(?:https?://)?(?:www\.)?behoimi\.org/post/show/(?P\d+)" test = ("http://behoimi.org/post/show/140852", { "url": "ce874ea26f01d6c94795f3cc3aaaaa9bc325f2f6", "content": "26549d55b82aa9a6c1686b96af8bfcfa50805cd4", "options": (("tags", True),), "keyword": { "tags_character": "furude_rika", "tags_copyright": "higurashi_no_naku_koro_ni", "tags_model": "himekawa_azuru", "tags_general": str, }, }) def posts(self): params = {"tags": "id:" + self.post_id} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPopularExtractor( _3dbooruBase, moebooru.MoebooruPopularExtractor): """Extractor for popular images from behoimi.org""" pattern = (r"(?:https?://)?(?:www\.)?behoimi\.org" r"/post/popular_(?Pby_(?:day|week|month)|recent)" r"(?:\?(?P[^#]*))?") test = ("http://behoimi.org/post/popular_by_month?month=2&year=2013", { "pattern": r"http://behoimi\.org/data/../../[0-9a-f]{32}\.jpg", "count": 20, }) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.25.0/gallery_dl/extractor/420chan.py0000644000175000017500000000512014176336637020271 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
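# ---------------------------------------------------------------------
# Editor's aside (not part of the gallery-dl source): _3dbooruBase above
# is a plain mix-in that is combined with the concrete moebooru extractor
# classes through multiple inheritance, so its __init__ can call
# super().__init__(match) and rely on Python's MRO to reach the real
# extractor base class.  A minimal, self-contained sketch of that pattern;
# every name below is made up for illustration only:
class _SiteBase():
    """Mix-in carrying site-specific settings"""
    category = "example"
    root = "http://example.org"

    def __init__(self, match):
        super().__init__(match)              # continues along the MRO
        self.headers = {"Referer": self.root + "/post/show/"}


class _GenericTagExtractor():
    """Stand-in for a generic per-site extractor base class"""
    def __init__(self, match):
        self.match = match


class _SiteTagExtractor(_SiteBase, _GenericTagExtractor):
    """MRO: _SiteTagExtractor -> _SiteBase -> _GenericTagExtractor"""


extractor = _SiteTagExtractor("dummy-match")
assert extractor.match == "dummy-match"
assert extractor.root == "http://example.org"
# ---------------------------------------------------------------------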
"""Extractors for https://420chan.org/""" from .common import Extractor, Message class _420chanThreadExtractor(Extractor): """Extractor for 420chan threads""" category = "420chan" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{thread} {title}") archive_fmt = "{board}_{thread}_{filename}" pattern = r"(?:https?://)?boards\.420chan\.org/([^/?#]+)/thread/(\d+)" test = ("https://boards.420chan.org/ani/thread/33251/chow-chows", { "pattern": r"https://boards\.420chan\.org/ani/src/\d+\.jpg", "content": "b07c803b0da78de159709da923e54e883c100934", "count": 2, }) def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "https://api.420chan.org/{}/res/{}.json".format( self.board, self.thread) posts = self.request(url).json()["posts"] data = { "board" : self.board, "thread": self.thread, "title" : posts[0].get("sub") or posts[0]["com"][:50], } yield Message.Directory, data for post in posts: if "filename" in post: post.update(data) post["extension"] = post["ext"][1:] url = "https://boards.420chan.org/{}/src/{}{}".format( post["board"], post["filename"], post["ext"]) yield Message.Url, url, post class _420chanBoardExtractor(Extractor): """Extractor for 420chan boards""" category = "420chan" subcategory = "board" pattern = r"(?:https?://)?boards\.420chan\.org/([^/?#]+)/\d*$" test = ("https://boards.420chan.org/po/", { "pattern": _420chanThreadExtractor.pattern, "count": ">= 100", }) def __init__(self, match): Extractor.__init__(self, match) self.board = match.group(1) def items(self): url = "https://api.420chan.org/{}/threads.json".format(self.board) threads = self.request(url).json() for page in threads: for thread in page["threads"]: url = "https://boards.420chan.org/{}/thread/{}/".format( self.board, thread["no"]) thread["page"] = page["page"] thread["_extractor"] = _420chanThreadExtractor yield Message.Queue, url, thread ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1673713123.0 gallery_dl-1.25.0/gallery_dl/extractor/4chan.py0000644000175000017500000000601414360552743020123 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.4chan.org/""" from .common import Extractor, Message from .. 
import text class _4chanThreadExtractor(Extractor): """Extractor for 4chan threads""" category = "4chan" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{tim} {filename}.{extension}" archive_fmt = "{board}_{thread}_{tim}" pattern = (r"(?:https?://)?boards\.4chan(?:nel)?\.org" r"/([^/]+)/thread/(\d+)") test = ( ("https://boards.4chan.org/tg/thread/15396072/", { "url": "39082ad166161966d7ba8e37f2173a824eb540f0", "keyword": "7ae2f4049adf0d2f835eb91b6b26b7f4ec882e0a", "content": "20b7b51afa51c9c31a0020a0737b889532c8d7ec", }), ("https://boards.4channel.org/tg/thread/15396072/", { "url": "39082ad166161966d7ba8e37f2173a824eb540f0", "keyword": "7ae2f4049adf0d2f835eb91b6b26b7f4ec882e0a", }), ) def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "https://a.4cdn.org/{}/thread/{}.json".format( self.board, self.thread) posts = self.request(url).json()["posts"] title = posts[0].get("sub") or text.remove_html(posts[0]["com"]) data = { "board" : self.board, "thread": self.thread, "title" : text.unescape(title)[:50], } yield Message.Directory, data for post in posts: if "filename" in post: post.update(data) post["extension"] = post["ext"][1:] post["filename"] = text.unescape(post["filename"]) url = "https://i.4cdn.org/{}/{}{}".format( post["board"], post["tim"], post["ext"]) yield Message.Url, url, post class _4chanBoardExtractor(Extractor): """Extractor for 4chan boards""" category = "4chan" subcategory = "board" pattern = r"(?:https?://)?boards\.4chan(?:nel)?\.org/([^/?#]+)/\d*$" test = ("https://boards.4channel.org/po/", { "pattern": _4chanThreadExtractor.pattern, "count": ">= 100", }) def __init__(self, match): Extractor.__init__(self, match) self.board = match.group(1) def items(self): url = "https://a.4cdn.org/{}/threads.json".format(self.board) threads = self.request(url).json() for page in threads: for thread in page["threads"]: url = "https://boards.4chan.org/{}/thread/{}/".format( self.board, thread["no"]) thread["page"] = page["page"] thread["_extractor"] = _4chanThreadExtractor yield Message.Queue, url, thread ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675953929.0 gallery_dl-1.25.0/gallery_dl/extractor/500px.py0000644000175000017500000004370514371203411017775 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://500px.com/""" from .common import Extractor, Message from .. 
import util BASE_PATTERN = r"(?:https?://)?(?:web\.)?500px\.com" class _500pxExtractor(Extractor): """Base class for 500px extractors""" category = "500px" directory_fmt = ("{category}", "{user[username]}") filename_fmt = "{id}_{name}.{extension}" archive_fmt = "{id}" root = "https://500px.com" cookiedomain = ".500px.com" def __init__(self, match): Extractor.__init__(self, match) self.session.headers["Referer"] = self.root + "/" def items(self): data = self.metadata() for photo in self.photos(): url = photo["images"][-1]["url"] photo["extension"] = photo["image_format"] if data: photo.update(data) yield Message.Directory, photo yield Message.Url, url, photo def metadata(self): """Returns general metadata""" def photos(self): """Returns an iterable containing all relevant photo IDs""" def _extend(self, edges): """Extend photos with additional metadata and higher resolution URLs""" ids = [str(edge["node"]["legacyId"]) for edge in edges] url = "https://api.500px.com/v1/photos" params = { "expanded_user_info" : "true", "include_tags" : "true", "include_geo" : "true", "include_equipment_info": "true", "vendor_photos" : "true", "include_licensing" : "true", "include_releases" : "true", "liked_by" : "1", "following_sample" : "100", "image_size" : "4096", "ids" : ",".join(ids), } photos = self._request_api(url, params)["photos"] return [ photos[pid] for pid in ids if pid in photos or self.log.warning("Unable to fetch photo %s", pid) ] def _request_api(self, url, params): headers = { "Origin": self.root, "x-csrf-token": self.session.cookies.get( "x-csrf-token", domain=".500px.com"), } return self.request(url, headers=headers, params=params).json() def _request_graphql(self, opname, variables): url = "https://api.500px.com/graphql" headers = { "x-csrf-token": self.session.cookies.get( "x-csrf-token", domain=".500px.com"), } data = { "operationName": opname, "variables" : util.json_dumps(variables), "query" : QUERIES[opname], } return self.request( url, method="POST", headers=headers, json=data).json()["data"] class _500pxUserExtractor(_500pxExtractor): """Extractor for photos from a user's photostream on 500px.com""" subcategory = "user" pattern = BASE_PATTERN + r"/(?!photo/|liked)(?:p/)?([^/?#]+)/?(?:$|[?#])" test = ( ("https://500px.com/p/light_expression_photography", { "pattern": r"https?://drscdn.500px.org/photo/\d+/m%3D4096/v2", "range": "1-99", "count": 99, }), ("https://500px.com/light_expression_photography"), ("https://web.500px.com/light_expression_photography"), ) def __init__(self, match): _500pxExtractor.__init__(self, match) self.user = match.group(1) def photos(self): variables = {"username": self.user, "pageSize": 20} photos = self._request_graphql( "OtherPhotosQuery", variables, )["user"]["photos"] while True: yield from self._extend(photos["edges"]) if not photos["pageInfo"]["hasNextPage"]: return variables["cursor"] = photos["pageInfo"]["endCursor"] photos = self._request_graphql( "OtherPhotosPaginationContainerQuery", variables, )["userByUsername"]["photos"] class _500pxGalleryExtractor(_500pxExtractor): """Extractor for photo galleries on 500px.com""" subcategory = "gallery" directory_fmt = ("{category}", "{user[username]}", "{gallery[name]}") pattern = (BASE_PATTERN + r"/(?!photo/)(?:p/)?" 
r"([^/?#]+)/galleries/([^/?#]+)") test = ( ("https://500px.com/p/fashvamp/galleries/lera", { "url": "002dc81dee5b4a655f0e31ad8349e8903b296df6", "count": 3, "keyword": { "gallery": dict, "user": dict, }, }), ("https://500px.com/fashvamp/galleries/lera"), ) def __init__(self, match): _500pxExtractor.__init__(self, match) self.user_name, self.gallery_name = match.groups() self.user_id = self._photos = None def metadata(self): user = self._request_graphql( "ProfileRendererQuery", {"username": self.user_name}, )["profile"] self.user_id = str(user["legacyId"]) variables = { "galleryOwnerLegacyId": self.user_id, "ownerLegacyId" : self.user_id, "slug" : self.gallery_name, "token" : None, "pageSize" : 20, } gallery = self._request_graphql( "GalleriesDetailQueryRendererQuery", variables, )["gallery"] self._photos = gallery["photos"] del gallery["photos"] return { "gallery": gallery, "user" : user, } def photos(self): photos = self._photos variables = { "ownerLegacyId": self.user_id, "slug" : self.gallery_name, "token" : None, "pageSize" : 20, } while True: yield from self._extend(photos["edges"]) if not photos["pageInfo"]["hasNextPage"]: return variables["cursor"] = photos["pageInfo"]["endCursor"] photos = self._request_graphql( "GalleriesDetailPaginationContainerQuery", variables, )["galleryByOwnerIdAndSlugOrToken"]["photos"] class _500pxFavoriteExtractor(_500pxExtractor): """Extractor for favorite 500px photos""" subcategory = "favorite" pattern = BASE_PATTERN + r"/liked/?$" test = ("https://500px.com/liked",) def photos(self): variables = {"pageSize": 20} photos = self._request_graphql( "LikedPhotosQueryRendererQuery", variables, )["likedPhotos"] while True: yield from self._extend(photos["edges"]) if not photos["pageInfo"]["hasNextPage"]: return variables["cursor"] = photos["pageInfo"]["endCursor"] photos = self._request_graphql( "LikedPhotosPaginationContainerQuery", variables, )["likedPhotos"] class _500pxImageExtractor(_500pxExtractor): """Extractor for individual images from 500px.com""" subcategory = "image" pattern = BASE_PATTERN + r"/photo/(\d+)" test = ("https://500px.com/photo/222049255/queen-of-coasts", { "url": "fbdf7df39325cae02f5688e9f92935b0e7113315", "count": 1, "keyword": { "camera": "Canon EOS 600D", "camera_info": dict, "comments": list, "comments_count": int, "created_at": "2017-08-01T08:40:05+00:00", "description": str, "editored_by": None, "editors_choice": False, "extension": "jpg", "feature": "popular", "feature_date": "2017-08-01T09:58:28+00:00", "focal_length": "208", "height": 3111, "id": 222049255, "image_format": "jpg", "image_url": list, "images": list, "iso": "100", "lens": "EF-S55-250mm f/4-5.6 IS II", "lens_info": dict, "liked": None, "location": None, "location_details": dict, "name": "Queen Of Coasts", "nsfw": False, "privacy": False, "profile": True, "rating": float, "status": 1, "tags": list, "taken_at": "2017-05-04T17:36:51+00:00", "times_viewed": int, "url": "/photo/222049255/Queen-Of-Coasts-by-Alice-Nabieva", "user": dict, "user_id": 12847235, "votes_count": int, "watermark": True, "width": 4637, }, }) def __init__(self, match): _500pxExtractor.__init__(self, match) self.photo_id = match.group(1) def photos(self): edges = ({"node": {"legacyId": self.photo_id}},) return self._extend(edges) QUERIES = { "OtherPhotosQuery": """\ query OtherPhotosQuery($username: String!, $pageSize: Int) { user: userByUsername(username: $username) { ...OtherPhotosPaginationContainer_user_RlXb8 id } } fragment OtherPhotosPaginationContainer_user_RlXb8 on User { photos(first: 
$pageSize, privacy: PROFILE, sort: ID_DESC) { edges { node { id legacyId canonicalPath width height name isLikedByMe notSafeForWork photographer: uploader { id legacyId username displayName canonicalPath followedByUsers { isFollowedByMe } } images(sizes: [33, 35]) { size url jpegUrl webpUrl id } __typename } cursor } totalCount pageInfo { endCursor hasNextPage } } } """, "OtherPhotosPaginationContainerQuery": """\ query OtherPhotosPaginationContainerQuery($username: String!, $pageSize: Int, $cursor: String) { userByUsername(username: $username) { ...OtherPhotosPaginationContainer_user_3e6UuE id } } fragment OtherPhotosPaginationContainer_user_3e6UuE on User { photos(first: $pageSize, after: $cursor, privacy: PROFILE, sort: ID_DESC) { edges { node { id legacyId canonicalPath width height name isLikedByMe notSafeForWork photographer: uploader { id legacyId username displayName canonicalPath followedByUsers { isFollowedByMe } } images(sizes: [33, 35]) { size url jpegUrl webpUrl id } __typename } cursor } totalCount pageInfo { endCursor hasNextPage } } } """, "ProfileRendererQuery": """\ query ProfileRendererQuery($username: String!) { profile: userByUsername(username: $username) { id legacyId userType: type username firstName displayName registeredAt canonicalPath avatar { ...ProfileAvatar_avatar id } userProfile { firstname lastname state country city about id } socialMedia { website twitter instagram facebook id } coverPhotoUrl followedByUsers { totalCount isFollowedByMe } followingUsers { totalCount } membership { expiryDate membershipTier: tier photoUploadQuota refreshPhotoUploadQuotaAt paymentStatus id } profileTabs { tabs { name visible } } ...EditCover_cover photoStats { likeCount viewCount } photos(privacy: PROFILE) { totalCount } licensingPhotos(status: ACCEPTED) { totalCount } portfolio { id status userDisabled } } } fragment EditCover_cover on User { coverPhotoUrl } fragment ProfileAvatar_avatar on UserAvatar { images(sizes: [MEDIUM, LARGE]) { size url id } } """, "GalleriesDetailQueryRendererQuery": """\ query GalleriesDetailQueryRendererQuery($galleryOwnerLegacyId: ID!, $ownerLegacyId: String, $slug: String, $token: String, $pageSize: Int, $gallerySize: Int) { galleries(galleryOwnerLegacyId: $galleryOwnerLegacyId, first: $gallerySize) { edges { node { legacyId description name privacy canonicalPath notSafeForWork buttonName externalUrl cover { images(sizes: [35, 33]) { size webpUrl jpegUrl id } id } photos { totalCount } id } } } gallery: galleryByOwnerIdAndSlugOrToken(ownerLegacyId: $ownerLegacyId, slug: $slug, token: $token) { ...GalleriesDetailPaginationContainer_gallery_RlXb8 id } } fragment GalleriesDetailPaginationContainer_gallery_RlXb8 on Gallery { id legacyId name privacy notSafeForWork ownPhotosOnly canonicalPath publicSlug lastPublishedAt photosAddedSinceLastPublished reportStatus creator { legacyId id } cover { images(sizes: [33, 32, 36, 2048]) { url size webpUrl id } id } description externalUrl buttonName photos(first: $pageSize) { totalCount edges { cursor node { id legacyId canonicalPath name description category uploadedAt location width height isLikedByMe photographer: uploader { id legacyId username displayName canonicalPath avatar { images(sizes: SMALL) { url id } id } followedByUsers { totalCount isFollowedByMe } } images(sizes: [33, 32]) { size url webpUrl id } __typename } } pageInfo { endCursor hasNextPage } } } """, "GalleriesDetailPaginationContainerQuery": """\ query GalleriesDetailPaginationContainerQuery($ownerLegacyId: String, $slug: String, $token: 
String, $pageSize: Int, $cursor: String) { galleryByOwnerIdAndSlugOrToken(ownerLegacyId: $ownerLegacyId, slug: $slug, token: $token) { ...GalleriesDetailPaginationContainer_gallery_3e6UuE id } } fragment GalleriesDetailPaginationContainer_gallery_3e6UuE on Gallery { id legacyId name privacy notSafeForWork ownPhotosOnly canonicalPath publicSlug lastPublishedAt photosAddedSinceLastPublished reportStatus creator { legacyId id } cover { images(sizes: [33, 32, 36, 2048]) { url size webpUrl id } id } description externalUrl buttonName photos(first: $pageSize, after: $cursor) { totalCount edges { cursor node { id legacyId canonicalPath name description category uploadedAt location width height isLikedByMe photographer: uploader { id legacyId username displayName canonicalPath avatar { images(sizes: SMALL) { url id } id } followedByUsers { totalCount isFollowedByMe } } images(sizes: [33, 32]) { size url webpUrl id } __typename } } pageInfo { endCursor hasNextPage } } } """, "LikedPhotosQueryRendererQuery": """\ query LikedPhotosQueryRendererQuery($pageSize: Int) { ...LikedPhotosPaginationContainer_query_RlXb8 } fragment LikedPhotosPaginationContainer_query_RlXb8 on Query { likedPhotos(first: $pageSize) { edges { node { id legacyId canonicalPath name description category uploadedAt location width height isLikedByMe notSafeForWork tags photographer: uploader { id legacyId username displayName canonicalPath avatar { images { url id } id } followedByUsers { totalCount isFollowedByMe } } images(sizes: [33, 35]) { size url jpegUrl webpUrl id } __typename } cursor } pageInfo { endCursor hasNextPage } } } """, "LikedPhotosPaginationContainerQuery": """\ query LikedPhotosPaginationContainerQuery($cursor: String, $pageSize: Int) { ...LikedPhotosPaginationContainer_query_3e6UuE } fragment LikedPhotosPaginationContainer_query_3e6UuE on Query { likedPhotos(first: $pageSize, after: $cursor) { edges { node { id legacyId canonicalPath name description category uploadedAt location width height isLikedByMe notSafeForWork tags photographer: uploader { id legacyId username displayName canonicalPath avatar { images { url id } id } followedByUsers { totalCount isFollowedByMe } } images(sizes: [33, 35]) { size url jpegUrl webpUrl id } __typename } cursor } pageInfo { endCursor hasNextPage } } } """, } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/8chan.py0000644000175000017500000001370014363473075020132 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://8chan.moe/""" from .common import Extractor, Message from .. import text from ..cache import memcache from datetime import datetime, timedelta import itertools BASE_PATTERN = r"(?:https?://)?8chan\.(moe|se|cc)" class _8chanExtractor(Extractor): """Base class for 8chan extractors""" category = "8chan" root = "https://8chan.moe" def __init__(self, match): self.root = "https://8chan." 
+ match.group(1) Extractor.__init__(self, match) @memcache() def _prepare_cookies(self): # fetch captcha cookies # (necessary to download without getting interrupted) now = datetime.utcnow() url = self.root + "/captcha.js" params = {"d": now.strftime("%a %b %d %Y %H:%M:%S GMT+0000 (UTC)")} self.request(url, params=params).content # adjust cookies # - remove 'expires' timestamp # - move 'captchaexpiration' value forward by 1 month) domain = self.root.rpartition("/")[2] for cookie in self.session.cookies: if cookie.domain.endswith(domain): cookie.expires = None if cookie.name == "captchaexpiration": cookie.value = (now + timedelta(30, 300)).strftime( "%a, %d %b %Y %H:%M:%S GMT") return self.session.cookies class _8chanThreadExtractor(_8chanExtractor): """Extractor for 8chan threads""" subcategory = "thread" directory_fmt = ("{category}", "{boardUri}", "{threadId} {subject[:50]}") filename_fmt = "{postId}{num:?-//} {filename[:200]}.{extension}" archive_fmt = "{boardUri}_{postId}_{num}" pattern = BASE_PATTERN + r"/([^/?#]+)/res/(\d+)" test = ( ("https://8chan.moe/vhs/res/4.html", { "pattern": r"https://8chan\.moe/\.media/[0-9a-f]{64}\.\w+$", "count": 14, "keyword": { "archived": False, "autoSage": False, "boardDescription": "Film and Cinema", "boardMarkdown": None, "boardName": "Movies", "boardUri": "vhs", "creation": r"re:\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{3}Z", "cyclic": False, "email": None, "id": "re:^[0-9a-f]{6}$", "locked": False, "markdown": str, "maxFileCount": 5, "maxFileSize": "32.00 MB", "maxMessageLength": 8001, "message": str, "mime": str, "name": "Anonymous", "num": int, "originalName": str, "path": r"re:/.media/[0-9a-f]{64}\.\w+$", "pinned": False, "postId": int, "signedRole": None, "size": int, "threadId": 4, "thumb": r"re:/.media/t_[0-9a-f]{64}$", "uniquePosters": 9, "usesCustomCss": True, "usesCustomJs": False, "?wsPort": 8880, "?wssPort": 2087, }, }), ("https://8chan.se/vhs/res/4.html"), ("https://8chan.cc/vhs/res/4.html"), ) def __init__(self, match): _8chanExtractor.__init__(self, match) _, self.board, self.thread = match.groups() def items(self): # fetch thread data url = "{}/{}/res/{}.".format(self.root, self.board, self.thread) self.session.headers["Referer"] = url + "html" thread = self.request(url + "json").json() thread["postId"] = thread["threadId"] thread["_http_headers"] = {"Referer": url + "html"} try: self.session.cookies = self._prepare_cookies() except Exception as exc: self.log.debug("Failed to fetch captcha cookies: %s: %s", exc.__class__.__name__, exc, exc_info=True) # download files posts = thread.pop("posts", ()) yield Message.Directory, thread for post in itertools.chain((thread,), posts): files = post.pop("files", ()) if not files: continue thread.update(post) for num, file in enumerate(files): file.update(thread) file["num"] = num text.nameext_from_url(file["originalName"], file) yield Message.Url, self.root + file["path"], file class _8chanBoardExtractor(_8chanExtractor): """Extractor for 8chan boards""" subcategory = "board" pattern = BASE_PATTERN + r"/([^/?#]+)/(?:(\d+)\.html)?$" test = ( ("https://8chan.moe/vhs/"), ("https://8chan.moe/vhs/2.html", { "pattern": _8chanThreadExtractor.pattern, "count": 23, }), ("https://8chan.se/vhs/"), ("https://8chan.cc/vhs/"), ) def __init__(self, match): _8chanExtractor.__init__(self, match) _, self.board, self.page = match.groups() self.session.headers["Referer"] = self.root + "/" def items(self): page = text.parse_int(self.page, 1) url = "{}/{}/{}.json".format(self.root, self.board, page) board = 
self.request(url).json() threads = board["threads"] while True: for thread in threads: thread["_extractor"] = _8chanThreadExtractor url = "{}/{}/res/{}.html".format( self.root, self.board, thread["threadId"]) yield Message.Queue, url, thread page += 1 if page > board["pageCount"]: return url = "{}/{}/{}.json".format(self.root, self.board, page) threads = self.request(url).json()["threads"] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675797623.0 gallery_dl-1.25.0/gallery_dl/extractor/8muses.py0000644000175000017500000001161714370522167020355 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://comics.8muses.com/""" from .common import Extractor, Message from .. import text, util class _8musesAlbumExtractor(Extractor): """Extractor for image albums on comics.8muses.com""" category = "8muses" subcategory = "album" directory_fmt = ("{category}", "{album[path]}") filename_fmt = "{page:>03}.{extension}" archive_fmt = "{hash}" root = "https://comics.8muses.com" pattern = (r"(?:https?://)?(?:comics\.|www\.)?8muses\.com" r"(/comics/album/[^?#]+)(\?[^#]+)?") test = ( ("https://comics.8muses.com/comics/album/Fakku-Comics/mogg/Liar", { "url": "6286ac33087c236c5a7e51f8a9d4e4d5548212d4", "pattern": r"https://comics.8muses.com/image/fl/[\w-]+", "keyword": { "url" : str, "hash" : str, "page" : int, "count": 6, "album": { "id" : 10467, "title" : "Liar", "path" : "Fakku Comics/mogg/Liar", "private": False, "url" : str, "parent" : 10464, "views" : int, "likes" : int, "date" : "dt:2018-07-10 00:00:00", }, }, }), ("https://www.8muses.com/comics/album/Fakku-Comics/santa", { "count": ">= 3", "pattern": pattern, "keyword": { "url" : str, "name" : str, "private": False, }, }), # custom sorting ("https://www.8muses.com/comics/album/Fakku-Comics/11?sort=az", { "count": ">= 70", "keyword": {"name": r"re:^[R-Zr-z]"}, }), # non-ASCII characters (("https://comics.8muses.com/comics/album/Various-Authors/Chessire88" "/From-Trainers-to-Pokmons"), { "count": 2, "keyword": {"name": "re:From Trainers to Pokémons"}, }), ) def __init__(self, match): Extractor.__init__(self, match) self.path = match.group(1) self.params = match.group(2) or "" def items(self): url = self.root + self.path + self.params while True: data = self._unobfuscate(text.extr( self.request(url).text, 'id="ractive-public" type="text/plain">', '')) images = data.get("pictures") if images: count = len(images) album = self._make_album(data["album"]) yield Message.Directory, {"album": album, "count": count} for num, image in enumerate(images, 1): url = self.root + "/image/fl/" + image["publicUri"] img = { "url" : url, "page" : num, "hash" : image["publicUri"], "count" : count, "album" : album, "extension": "jpg", } yield Message.Url, url, img albums = data.get("albums") if albums: for album in albums: url = self.root + "/comics/album/" + album["permalink"] yield Message.Queue, url, { "url" : url, "name" : album["name"], "private" : album["isPrivate"], "_extractor": _8musesAlbumExtractor, } if data["page"] >= data["pages"]: return path, _, num = self.path.rstrip("/").rpartition("/") path = path if num.isdecimal() else self.path url = "{}{}/{}{}".format( self.root, path, data["page"] + 1, self.params) def _make_album(self, album): return { "id" : album["id"], "path" : album["path"], "title" : 
album["name"], "private": album["isPrivate"], "url" : self.root + album["permalink"], "parent" : text.parse_int(album["parentId"]), "views" : text.parse_int(album["numberViews"]), "likes" : text.parse_int(album["numberLikes"]), "date" : text.parse_datetime( album["updatedAt"], "%Y-%m-%dT%H:%M:%S.%fZ"), } @staticmethod def _unobfuscate(data): return util.json_loads("".join([ chr(33 + (ord(c) + 14) % 94) if "!" <= c <= "~" else c for c in text.unescape(data.strip("\t\n\r !")) ])) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1677790111.0 gallery_dl-1.25.0/gallery_dl/extractor/__init__.py0000644000175000017500000001124214400205637020654 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import sys import re modules = [ "2chan", "2chen", "35photo", "3dbooru", "420chan", "4chan", "500px", "8chan", "8muses", "adultempire", "architizer", "artstation", "aryion", "bbc", "bcy", "behance", "blogger", "bunkr", "catbox", "comicvine", "cyberdrop", "danbooru", "desktopography", "deviantart", "dynastyscans", "e621", "erome", "exhentai", "fallenangels", "fanbox", "fanleaks", "fantia", "fapello", "fapachi", "flickr", "furaffinity", "fuskator", "gelbooru", "gelbooru_v01", "gelbooru_v02", "gfycat", "gofile", "hbrowse", "hentai2read", "hentaicosplays", "hentaifoundry", "hentaifox", "hentaihand", "hentaihere", "hiperdex", "hitomi", "hotleak", "idolcomplex", "imagebam", "imagechest", "imagefap", "imgbb", "imgbox", "imgth", "imgur", "inkbunny", "instagram", "issuu", "itaku", "kabeuchi", "keenspot", "kemonoparty", "khinsider", "komikcast", "lexica", "lightroom", "lineblog", "livedoor", "luscious", "lynxchan", "mangadex", "mangafox", "mangahere", "mangakakalot", "manganelo", "mangapark", "mangasee", "mangoxo", "mememuseum", "misskey", "myhentaigallery", "myportfolio", "nana", "naver", "naverwebtoon", "newgrounds", "nhentai", "nijie", "nitter", "nozomi", "nsfwalbum", "nudecollect", "paheal", "patreon", "philomena", "photobucket", "photovogue", "picarto", "piczel", "pillowfort", "pinterest", "pixiv", "pixnet", "plurk", "poipiku", "pornhub", "pornpics", "pururin", "reactor", "readcomiconline", "reddit", "redgifs", "rule34us", "sankaku", "sankakucomplex", "seiga", "senmanga", "sexcom", "simplyhentai", "skeb", "slickpic", "slideshare", "smugmug", "soundgasm", "speakerdeck", "subscribestar", "szurubooru", "tapas", "tcbscans", "telegraph", "toyhouse", "tsumino", "tumblr", "tumblrgallery", "twibooru", "twitter", "unsplash", "uploadir", "vanillarock", "vichan", "vk", "vsco", "wallhaven", "wallpapercave", "warosu", "weasyl", "webmshare", "webtoons", "weibo", "wikiart", "wikifeet", "xhamster", "xvideos", "zerochan", "booru", "moebooru", "foolfuuka", "foolslide", "mastodon", "shopify", "lolisafe", "imagehosts", "directlink", "recursive", "oauth", "test", "ytdl", "generic", ] def find(url): """Find a suitable extractor for the given URL""" for cls in _list_classes(): match = cls.pattern.match(url) if match: return cls(match) return None def add(cls): """Add 'cls' to the list of available extractors""" cls.pattern = re.compile(cls.pattern) _cache.append(cls) return cls def add_module(module): """Add all extractors in 'module' to the list of available extractors""" classes = _get_classes(module) for cls in classes: cls.pattern = re.compile(cls.pattern) _cache.extend(classes) return 
classes def extractors(): """Yield all available extractor classes""" return sorted( _list_classes(), key=lambda x: x.__name__ ) # -------------------------------------------------------------------- # internals def _list_classes(): """Yield available extractor classes""" yield from _cache for module in _module_iter: yield from add_module(module) globals()["_list_classes"] = lambda : _cache def _modules_internal(): globals_ = globals() for module_name in modules: yield __import__(module_name, globals_, None, (), 1) def _modules_path(path, files): sys.path.insert(0, path) try: return [ __import__(name[:-3]) for name in files if name.endswith(".py") ] finally: del sys.path[0] def _get_classes(module): """Return a list of all extractor classes in a module""" return [ cls for cls in module.__dict__.values() if ( hasattr(cls, "pattern") and cls.__module__ == module.__name__ ) ] _cache = [] _module_iter = _modules_internal() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1671837296.0 gallery_dl-1.25.0/gallery_dl/extractor/adultempire.py0000644000175000017500000000421414351433160021430 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.adultempire.com/""" from .common import GalleryExtractor from .. import text class AdultempireGalleryExtractor(GalleryExtractor): """Extractor for image galleries from www.adultempire.com""" category = "adultempire" root = "https://www.adultempire.com" pattern = (r"(?:https?://)?(?:www\.)?adult(?:dvd)?empire\.com" r"(/(\d+)/gallery\.html)") test = ( ("https://www.adultempire.com/5998/gallery.html", { "range": "1", "keyword": "5b3266e69801db0d78c22181da23bc102886e027", "content": "5c6beb31e5e3cdc90ee5910d5c30f9aaec977b9e", }), ("https://www.adultdvdempire.com/5683/gallery.html", { "url": "b12cd1a65cae8019d837505adb4d6a2c1ed4d70d", "keyword": "8d448d79c4ac5f5b10a3019d5b5129ddb43655e5", }), ) def __init__(self, match): GalleryExtractor.__init__(self, match) self.gallery_id = match.group(2) def metadata(self, page): extr = text.extract_from(page, page.index('
')) return { "gallery_id": text.parse_int(self.gallery_id), "title" : text.unescape(extr('title="', '"')), "studio" : extr(">studio", "<").strip(), "date" : text.parse_datetime(extr( ">released", "<").strip(), "%m/%d/%Y"), "actors" : sorted(text.split_html(extr( '
    ", "<").strip(), "type" : text.unescape(text.remove_html(extr( '
    Type
    ', 'STATUS
', 'YEAR', 'SIZE', '', '') .replace("
", "\n")), } def images(self, page): return [ (url, None) for url in text.extract_iter( page, "property='og:image:secure_url' content='", "?") ] class ArchitizerFirmExtractor(Extractor): """Extractor for all projects of a firm""" category = "architizer" subcategory = "firm" root = "https://architizer.com" pattern = r"(?:https?://)?architizer\.com/firms/([^/?#]+)" test = ("https://architizer.com/firms/olson-kundig/", { "pattern": ArchitizerProjectExtractor.pattern, "count": ">= 90", }) def __init__(self, match): Extractor.__init__(self, match) self.firm = match.group(1) def items(self): url = url = "{}/firms/{}/?requesting_merlin=pages".format( self.root, self.firm) page = self.request(url).text data = {"_extractor": ArchitizerProjectExtractor} for project in text.extract_iter(page, '
= data["total_count"]: return params["page"] += 1 def _init_csrf_token(self): url = self.root + "/api/v2/csrf_protection/token.json" headers = { "Accept" : "*/*", "Origin" : self.root, "Referer": self.root + "/", } return self.request( url, method="POST", headers=headers, json={}, ).json()["public_csrf_token"] @staticmethod def _no_cache(url, alphabet=(string.digits + string.ascii_letters)): """Cause a cache miss to prevent Cloudflare 'optimizations' Cloudflare's 'Polish' optimization strips image metadata and may even recompress an image as lossy JPEG. This can be prevented by causing a cache miss when requesting an image by adding a random dummy query parameter. Ref: https://github.com/r888888888/danbooru/issues/3528 https://danbooru.donmai.us/forum_topics/14952 """ param = "gallerydl_no_cache=" + util.bencode( random.getrandbits(64), alphabet) sep = "&" if "?" in url else "?" return url + sep + param class ArtstationUserExtractor(ArtstationExtractor): """Extractor for all projects of an artstation user""" subcategory = "user" pattern = (r"(?:https?://)?(?:(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)(?:/albums/all)?" r"|((?!www)\w+)\.artstation\.com(?:/projects)?)/?$") test = ( ("https://www.artstation.com/sungchoi/", { "pattern": r"https://\w+\.artstation\.com/p/assets/images" r"/images/\d+/\d+/\d+/(4k|large|medium|small)/[^/]+", "range": "1-10", "count": ">= 10", }), ("https://www.artstation.com/sungchoi/albums/all/"), ("https://sungchoi.artstation.com/"), ("https://sungchoi.artstation.com/projects/"), ) def projects(self): url = "{}/users/{}/projects.json".format(self.root, self.user) params = {"album_id": "all"} return self._pagination(url, params) class ArtstationAlbumExtractor(ArtstationExtractor): """Extractor for all projects in an artstation album""" subcategory = "album" directory_fmt = ("{category}", "{userinfo[username]}", "Albums", "{album[id]} - {album[title]}") archive_fmt = "a_{album[id]}_{asset[id]}" pattern = (r"(?:https?://)?(?:(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)" r"|((?!www)\w+)\.artstation\.com)/albums/(\d+)") test = ( ("https://www.artstation.com/huimeiye/albums/770899", { "count": 2, }), ("https://www.artstation.com/huimeiye/albums/770898", { "exception": exception.NotFoundError, }), ("https://huimeiye.artstation.com/albums/770899"), ) def __init__(self, match): ArtstationExtractor.__init__(self, match) self.album_id = text.parse_int(match.group(3)) def metadata(self): userinfo = self.get_user_info(self.user) album = None for album in userinfo["albums_with_community_projects"]: if album["id"] == self.album_id: break else: raise exception.NotFoundError("album") return { "userinfo": userinfo, "album": album } def projects(self): url = "{}/users/{}/projects.json".format(self.root, self.user) params = {"album_id": self.album_id} return self._pagination(url, params) class ArtstationLikesExtractor(ArtstationExtractor): """Extractor for liked projects of an artstation user""" subcategory = "likes" directory_fmt = ("{category}", "{userinfo[username]}", "Likes") archive_fmt = "f_{userinfo[id]}_{asset[id]}" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/likes/?") test = ( ("https://www.artstation.com/mikf/likes", { "pattern": r"https://\w+\.artstation\.com/p/assets/images" r"/images/\d+/\d+/\d+/(4k|large|medium|small)/[^/]+", "count": 6, }), # no likes ("https://www.artstation.com/sungchoi/likes", { "count": 0, }), ) def projects(self): url = 
"{}/users/{}/likes.json".format(self.root, self.user) return self._pagination(url) class ArtstationChallengeExtractor(ArtstationExtractor): """Extractor for submissions of artstation challenges""" subcategory = "challenge" filename_fmt = "{submission_id}_{asset_id}_{filename}.{extension}" directory_fmt = ("{category}", "Challenges", "{challenge[id]} - {challenge[title]}") archive_fmt = "c_{challenge[id]}_{asset_id}" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/contests/[^/?#]+/challenges/(\d+)" r"/?(?:\?sorting=([a-z]+))?") test = ( ("https://www.artstation.com/contests/thu-2017/challenges/20"), (("https://www.artstation.com/contests/beyond-human" "/challenges/23?sorting=winners"), { "range": "1-30", "count": 30, }), ) def __init__(self, match): ArtstationExtractor.__init__(self, match) self.challenge_id = match.group(1) self.sorting = match.group(2) or "popular" def items(self): challenge_url = "{}/contests/_/challenges/{}.json".format( self.root, self.challenge_id) submission_url = "{}/contests/_/challenges/{}/submissions.json".format( self.root, self.challenge_id) update_url = "{}/contests/submission_updates.json".format( self.root) challenge = self.request(challenge_url).json() yield Message.Directory, {"challenge": challenge} params = {"sorting": self.sorting} for submission in self._pagination(submission_url, params): params = {"submission_id": submission["id"]} for update in self._pagination(update_url, params=params): del update["replies"] update["challenge"] = challenge for url in text.extract_iter( update["body_presentation_html"], ' href="', '"'): update["asset_id"] = self._id_from_url(url) text.nameext_from_url(url, update) yield Message.Url, self._no_cache(url), update @staticmethod def _id_from_url(url): """Get an image's submission ID from its URL""" parts = url.split("/") return text.parse_int("".join(parts[7:10])) class ArtstationSearchExtractor(ArtstationExtractor): """Extractor for artstation search results""" subcategory = "search" directory_fmt = ("{category}", "Searches", "{search[query]}") archive_fmt = "s_{search[query]}_{asset[id]}" pattern = (r"(?:https?://)?(?:\w+\.)?artstation\.com" r"/search/?\?([^#]+)") test = ("https://www.artstation.com/search?query=ancient&sort_by=rank", { "range": "1-20", "count": 20, }) def __init__(self, match): ArtstationExtractor.__init__(self, match) self.params = query = text.parse_query(match.group(1)) self.query = text.unquote(query.get("query") or query.get("q", "")) self.sorting = query.get("sort_by", "relevance").lower() self.tags = query.get("tags", "").split(",") def metadata(self): return {"search": { "query" : self.query, "sorting": self.sorting, "tags" : self.tags, }} def projects(self): filters = [] for key, value in self.params.items(): if key.endswith("_ids") or key == "tags": filters.append({ "field" : key, "method": "include", "value" : value.split(","), }) url = "{}/api/v2/search/projects.json".format(self.root) data = { "query" : self.query, "page" : None, "per_page" : 50, "sorting" : self.sorting, "pro_first" : ("1" if self.config("pro-first", True) else "0"), "filters" : filters, "additional_fields": (), } return self._pagination(url, json=data) class ArtstationArtworkExtractor(ArtstationExtractor): """Extractor for projects on artstation's artwork page""" subcategory = "artwork" directory_fmt = ("{category}", "Artworks", "{artwork[sorting]!c}") archive_fmt = "A_{asset[id]}" pattern = (r"(?:https?://)?(?:\w+\.)?artstation\.com" r"/artwork/?\?([^#]+)") test = 
("https://www.artstation.com/artwork?sorting=latest", { "range": "1-20", "count": 20, }) def __init__(self, match): ArtstationExtractor.__init__(self, match) self.query = text.parse_query(match.group(1)) def metadata(self): return {"artwork": self.query} def projects(self): url = "{}/projects.json".format(self.root) return self._pagination(url, self.query.copy()) class ArtstationImageExtractor(ArtstationExtractor): """Extractor for images from a single artstation project""" subcategory = "image" pattern = (r"(?:https?://)?(?:" r"(?:\w+\.)?artstation\.com/(?:artwork|projects|search)" r"|artstn\.co/p)/(\w+)") test = ( ("https://www.artstation.com/artwork/LQVJr", { "pattern": r"https?://\w+\.artstation\.com/p/assets" r"/images/images/008/760/279/4k/.+", "content": "7b113871465fdc09d127adfdc2767d51cf45a7e9", # SHA1 hash without _no_cache() # "content": "44b80f9af36d40efc5a2668cdd11d36d6793bae9", }), # multiple images per project ("https://www.artstation.com/artwork/Db3dy", { "count": 4, }), # embedded youtube video ("https://www.artstation.com/artwork/g4WPK", { "range": "2", "options": (("external", True),), "pattern": "ytdl:https://www.youtube.com/embed/JNFfJtwwrU0", }), # 404 (#3016) ("https://www.artstation.com/artwork/3q3mXB", { "count": 0, }), # alternate URL patterns ("https://sungchoi.artstation.com/projects/LQVJr"), ("https://artstn.co/p/LQVJr"), ) def __init__(self, match): ArtstationExtractor.__init__(self, match) self.project_id = match.group(1) self.assets = None def metadata(self): self.assets = list(ArtstationExtractor.get_project_assets( self, self.project_id)) try: self.user = self.assets[0]["user"]["username"] except IndexError: self.user = "" return ArtstationExtractor.metadata(self) def projects(self): return ({"hash_id": self.project_id},) def get_project_assets(self, project_id): return self.assets class ArtstationFollowingExtractor(ArtstationExtractor): """Extractor for a user's followed users""" subcategory = "following" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/following") test = ("https://www.artstation.com/sungchoi/following", { "pattern": ArtstationUserExtractor.pattern, "count": ">= 50", }) def items(self): url = "{}/users/{}/following.json".format(self.root, self.user) for user in self._pagination(url): url = "{}/{}".format(self.root, user["username"]) user["_extractor"] = ArtstationUserExtractor yield Message.Queue, url, user ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/aryion.py0000644000175000017500000002214114363473075020431 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://aryion.com/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache from email.utils import parsedate_tz from datetime import datetime BASE_PATTERN = r"(?:https?://)?(?:www\.)?aryion\.com/g4" class AryionExtractor(Extractor): """Base class for aryion extractors""" category = "aryion" directory_fmt = ("{category}", "{user!l}", "{path:J - }") filename_fmt = "{id} {title}.{extension}" archive_fmt = "{id}" cookiedomain = ".aryion.com" cookienames = ("phpbb3_rl7a3_sid",) root = "https://aryion.com" def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) self.recursive = True def login(self): if self._check_cookies(self.cookienames): return username, password = self._get_auth_info() if username: self._update_cookies(self._login_impl(username, password)) @cache(maxage=14*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/forum/ucp.php?mode=login" data = { "username": username, "password": password, "login": "Login", } response = self.request(url, method="POST", data=data) if b"You have been successfully logged in." not in response.content: raise exception.AuthenticationError() return {c: response.cookies[c] for c in self.cookienames} def items(self): self.login() data = self.metadata() for post_id in self.posts(): post = self._parse_post(post_id) if post: if data: post.update(data) yield Message.Directory, post yield Message.Url, post["url"], post elif post is False and self.recursive: base = self.root + "/g4/view/" data = {"_extractor": AryionPostExtractor} for post_id in self._pagination_params(base + post_id): yield Message.Queue, base + post_id, data def posts(self): """Yield relevant post IDs""" def metadata(self): """Return general metadata""" def _pagination_params(self, url, params=None): if params is None: params = {"p": 1} else: params["p"] = text.parse_int(params.get("p"), 1) while True: page = self.request(url, params=params).text cnt = 0 for post_id in text.extract_iter( page, "class='gallery-item' id='", "'"): cnt += 1 yield post_id if cnt < 40: return params["p"] += 1 def _pagination_next(self, url): while True: page = self.request(url).text yield from text.extract_iter(page, "thumb' href='/g4/view/", "'") pos = page.find("Next >>") if pos < 0: return url = self.root + text.rextract(page, "href='", "'", pos)[0] def _parse_post(self, post_id): url = "{}/g4/data.php?id={}".format(self.root, post_id) with self.request(url, method="HEAD", fatal=False) as response: if response.status_code >= 400: self.log.warning( "Unable to fetch post %s ('%s %s')", post_id, response.status_code, response.reason) return None headers = response.headers # folder if headers["content-type"] in ( "application/x-folder", "application/x-comic-folder", "application/x-comic-folder-nomerge", ): return False # get filename from 'Content-Disposition' header cdis = headers["content-disposition"] fname, _, ext = text.extr(cdis, 'filename="', '"').rpartition(".") if not fname: fname, ext = ext, fname # get file size from 'Content-Length' header clen = headers.get("content-length") # fix 'Last-Modified' header lmod = headers["last-modified"] if lmod[22] != ":": lmod = "{}:{} GMT".format(lmod[:22], lmod[22:24]) post_url = "{}/g4/view/{}".format(self.root, post_id) extr = text.extract_from(self.request(post_url).text) title, _, artist = text.unescape(extr( "g4 :: ", "<")).rpartition(" by ") return { "id" : text.parse_int(post_id), "url" : url, "user" : self.user or artist, "title" : title, "artist": artist, "path" : 
text.split_html(extr( "cookiecrumb'>", '</span'))[4:-1:2], "date" : datetime(*parsedate_tz(lmod)[:6]), "size" : text.parse_int(clen), "views" : text.parse_int(extr("Views</b>:", "<").replace(",", "")), "width" : text.parse_int(extr("Resolution</b>:", "x")), "height": text.parse_int(extr("", "<")), "comments" : text.parse_int(extr("Comments</b>:", "<")), "favorites": text.parse_int(extr("Favorites</b>:", "<")), "tags" : text.split_html(extr("class='taglist'>", "</span>")), "description": text.unescape(text.remove_html(extr( "<p>", "</p>"), "", "")), "filename" : fname, "extension": ext, "_mtime" : lmod, } class AryionGalleryExtractor(AryionExtractor): """Extractor for a user's gallery on eka's portal""" subcategory = "gallery" categorytransfer = True pattern = BASE_PATTERN + r"/(?:gallery/|user/|latest.php\?name=)([^/?#]+)" test = ( ("https://aryion.com/g4/gallery/jameshoward", { "options": (("recursive", False),), "pattern": r"https://aryion\.com/g4/data\.php\?id=\d+$", "range": "48-52", "count": 5, }), ("https://aryion.com/g4/user/jameshoward"), ("https://aryion.com/g4/latest.php?name=jameshoward"), ) def __init__(self, match): AryionExtractor.__init__(self, match) self.recursive = self.config("recursive", True) self.offset = 0 def skip(self, num): if self.recursive: return 0 self.offset += num return num def posts(self): if self.recursive: url = "{}/g4/gallery/{}".format(self.root, self.user) return self._pagination_params(url) else: url = "{}/g4/latest.php?name={}".format(self.root, self.user) return util.advance(self._pagination_next(url), self.offset) class AryionTagExtractor(AryionExtractor): """Extractor for tag searches on eka's portal""" subcategory = "tag" directory_fmt = ("{category}", "tags", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/tags\.php\?([^#]+)" test = ("https://aryion.com/g4/tags.php?tag=star+wars&p=19", { "count": ">= 5", }) def metadata(self): self.params = text.parse_query(self.user) self.user = None return {"search_tags": self.params.get("tag")} def posts(self): url = self.root + "/g4/tags.php" return self._pagination_params(url, self.params) class AryionPostExtractor(AryionExtractor): """Extractor for individual posts on eka's portal""" subcategory = "post" pattern = BASE_PATTERN + r"/view/(\d+)" test = ( ("https://aryion.com/g4/view/510079", { "url": "f233286fa5558c07ae500f7f2d5cb0799881450e", "keyword": { "artist" : "jameshoward", "user" : "jameshoward", "filename" : "jameshoward-510079-subscribestar_150", "extension": "jpg", "id" : 510079, "width" : 1665, "height" : 1619, "size" : 784239, "title" : "I'm on subscribestar now too!", "description": r"re:Doesn't hurt to have a backup, right\?", "tags" : ["Non-Vore", "subscribestar"], "date" : "dt:2019-02-16 19:30:34", "path" : [], "views" : int, "favorites": int, "comments" : int, "_mtime" : "Sat, 16 Feb 2019 19:30:34 GMT", }, }), # x-folder (#694) ("https://aryion.com/g4/view/588928", { "pattern": pattern, "count": ">= 8", }), # x-comic-folder (#945) ("https://aryion.com/g4/view/537379", { "pattern": pattern, "count": 2, }), ) def posts(self): post_id, self.user = self.user, None return (post_id,) 
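# Illustrative sketch (standalone, not part of aryion.py): how the
# 'Last-Modified' normalization in AryionExtractor._parse_post above can be
# reproduced with only the standard library. The sample header value is
# hypothetical.
from datetime import datetime
from email.utils import parsedate_tz

def last_modified_to_datetime(lmod):
    # Re-insert the missing colon before the seconds field (mirroring the
    # lmod[22] check in _parse_post), then parse the RFC 2822 date.
    if lmod[22] != ":":
        lmod = "{}:{} GMT".format(lmod[:22], lmod[22:24])
    return datetime(*parsedate_tz(lmod)[:6])

print(last_modified_to_datetime("Sat, 16 Feb 2019 19:30:34 GMT"))
# -> 2019-02-16 19:30:34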
gallery_dl-1.25.0/gallery_dl/extractor/bbc.py
# -*- coding: utf-8 -*-

# Copyright 2021-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://bbc.co.uk/"""

from .common import GalleryExtractor, Extractor, Message
from ..
import text, util BASE_PATTERN = r"(?:https?://)?(?:www\.)?bbc\.co\.uk(/programmes/" class BbcGalleryExtractor(GalleryExtractor): """Extractor for a programme gallery on bbc.co.uk""" category = "bbc" root = "https://www.bbc.co.uk" directory_fmt = ("{category}", "{path[0]}", "{path[1]}", "{path[2]}", "{path[3:]:J - /}") filename_fmt = "{num:>02}.{extension}" archive_fmt = "{programme}_{num}" pattern = BASE_PATTERN + r"[^/?#]+(?!/galleries)(?:/[^/?#]+)?)$" test = ( ("https://www.bbc.co.uk/programmes/p084qtzs/p085g9kg", { "pattern": r"https://ichef\.bbci\.co\.uk" r"/images/ic/1920xn/\w+\.jpg", "count": 37, "keyword": { "programme": "p084qtzs", "path": ["BBC One", "Doctor Who", "The Timeless Children"], }, }), ("https://www.bbc.co.uk/programmes/p084qtzs"), ) def metadata(self, page): data = util.json_loads(text.extr( page, '<script type="application/ld+json">', '</script>')) return { "programme": self.gallery_url.split("/")[4], "path": list(util.unique_sequence( element["name"] for element in data["itemListElement"] )), } def images(self, page): width = self.config("width") width = width - width % 16 if width else 1920 dimensions = "/{}xn/".format(width) return [ (src.replace("/320x180_b/", dimensions), {"_fallback": self._fallback_urls(src, width)}) for src in text.extract_iter(page, 'data-image-src="', '"') ] @staticmethod def _fallback_urls(src, max_width): front, _, back = src.partition("/320x180_b/") for width in (1920, 1600, 1280, 976): if width < max_width: yield "{}/{}xn/{}".format(front, width, back) class BbcProgrammeExtractor(Extractor): """Extractor for all galleries of a bbc programme""" category = "bbc" subcategory = "programme" root = "https://www.bbc.co.uk" pattern = BASE_PATTERN + r"[^/?#]+/galleries)(?:/?\?page=(\d+))?" test = ( ("https://www.bbc.co.uk/programmes/b006q2x0/galleries", { "pattern": BbcGalleryExtractor.pattern, "range": "1-50", "count": ">= 50", }), ("https://www.bbc.co.uk/programmes/b006q2x0/galleries?page=40", { "pattern": BbcGalleryExtractor.pattern, "count": ">= 100", }), ) def __init__(self, match): Extractor.__init__(self, match) self.path, self.page = match.groups() def items(self): data = {"_extractor": BbcGalleryExtractor} params = {"page": text.parse_int(self.page, 1)} galleries_url = self.root + self.path while True: page = self.request(galleries_url, params=params).text for programme_id in text.extract_iter( page, '<a href="https://www.bbc.co.uk/programmes/', '"'): url = "https://www.bbc.co.uk/programmes/" + programme_id yield Message.Queue, url, data if 'rel="next"' not in page: return params["page"] += 1 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1675805277.0 
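# Illustrative sketch (standalone, not part of bbc.py): the width rewrite
# that BbcGalleryExtractor.images() above applies to thumbnail URLs. The
# rounding down to a multiple of 16 mirrors the 'width - width % 16' step
# in images(); the sample image path is hypothetical.
def rescale(src, width=None):
    width = width - width % 16 if width else 1920
    return src.replace("/320x180_b/", "/{}xn/".format(width))

src = "https://ichef.bbci.co.uk/images/ic/320x180_b/p085abcd.jpg"
print(rescale(src, 1000))
# -> https://ichef.bbci.co.uk/images/ic/992xn/p085abcd.jpg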
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/bcy.py�������������������������������������������������������0000644�0001750�0001750�00000016074�14370541135�017704� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2020-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bcy.net/""" from .common import Extractor, Message from .. import text, util, exception import re class BcyExtractor(Extractor): """Base class for bcy extractors""" category = "bcy" directory_fmt = ("{category}", "{user[id]} {user[name]}") filename_fmt = "{post[id]} {id}.{extension}" archive_fmt = "{post[id]}_{id}" root = "https://bcy.net" def __init__(self, match): Extractor.__init__(self, match) self.item_id = match.group(1) self.session.headers["Referer"] = self.root + "/" def items(self): sub = re.compile(r"^https?://p\d+-bcy" r"(?:-sign\.bcyimg\.com|\.byteimg\.com/img)" r"/banciyuan").sub iroot = "https://img-bcy-qn.pstatp.com" noop = self.config("noop") for post in self.posts(): if not post["image_list"]: continue multi = None tags = post.get("post_tags") or () data = { "user": { "id" : post["uid"], "name" : post["uname"], "avatar" : sub(iroot, post["avatar"].partition("~")[0]), }, "post": { "id" : text.parse_int(post["item_id"]), "tags" : [t["tag_name"] for t in tags], "date" : text.parse_timestamp(post["ctime"]), "parody" : post["work"], "content": post["plain"], "likes" : post["like_count"], "shares" : post["share_count"], "replies": post["reply_count"], }, } yield Message.Directory, data for data["num"], image in enumerate(post["image_list"], 1): data["id"] = image["mid"] data["width"] = image["w"] data["height"] = image["h"] url = image["path"].partition("~")[0] text.nameext_from_url(url, data) # full-resolution image without watermark if data["extension"]: if not url.startswith(iroot): url = sub(iroot, url) data["filter"] = "" yield Message.Url, url, data # watermarked image & low quality noop filter else: if multi is None: multi = self._data_from_post( post["item_id"])["post_data"]["multi"] image = multi[data["num"] - 1] if image["origin"]: data["filter"] = "watermark" yield Message.Url, image["origin"], data if noop: data["extension"] = "" data["filter"] = "noop" yield Message.Url, image["original_path"], data def posts(self): """Returns an iterable with all relevant 'post' objects""" def _data_from_post(self, post_id): url = "{}/item/detail/{}".format(self.root, post_id) page = self.request(url, notfound="post").text data = (text.extr(page, 'JSON.parse("', '");') .replace('\\\\u002F', '/') .replace('\\"', '"')) try: return 
util.json_loads(data)["detail"] except ValueError: return util.json_loads(data.replace('\\"', '"'))["detail"] class BcyUserExtractor(BcyExtractor): """Extractor for user timelines""" subcategory = "user" pattern = r"(?:https?://)?bcy\.net/u/(\d+)" test = ( ("https://bcy.net/u/1933712", { "pattern": r"https://img-bcy-qn.pstatp.com/\w+/\d+/post/\w+/.+jpg", "count": ">= 20", }), ("https://bcy.net/u/109282764041", { "pattern": r"https://p\d-bcy-sign\.bcyimg\.com/banciyuan/[0-9a-f]+" r"~tplv-bcyx-yuan-logo-v1:.+\.image", "range": "1-25", "count": 25, }), ) def posts(self): url = self.root + "/apiv3/user/selfPosts" params = {"uid": self.item_id, "since": None} while True: data = self.request(url, params=params).json() try: items = data["data"]["items"] except KeyError: return if not items: return for item in items: yield item["item_detail"] params["since"] = item["since"] class BcyPostExtractor(BcyExtractor): """Extractor for individual posts""" subcategory = "post" pattern = r"(?:https?://)?bcy\.net/item/detail/(\d+)" test = ( ("https://bcy.net/item/detail/6355835481002893070", { "url": "301202375e61fd6e0e2e35de6c3ac9f74885dec3", "count": 1, "keyword": { "user": { "id" : 1933712, "name" : "wukloo", "avatar" : "re:https://img-bcy-qn.pstatp.com/Public/", }, "post": { "id" : 6355835481002893070, "tags" : list, "date" : "dt:2016-11-22 08:47:46", "parody" : "东方PROJECT", "content": "re:根据微博的建议稍微做了点修改", "likes" : int, "shares" : int, "replies": int, }, "id": 8330182, "num": 1, "width" : 3000, "height": 1687, "filename": "712e0780b09011e696f973c3d1568337", "extension": "jpg", }, }), # only watermarked images available ("https://bcy.net/item/detail/6950136331708144648", { "pattern": r"https://p\d-bcy-sign\.bcyimg\.com/banciyuan/[0-9a-f]+" r"~tplv-bcyx-yuan-logo-v1:.+\.image", "count": 10, "keyword": {"filter": "watermark"}, }), # deleted ("https://bcy.net/item/detail/6780546160802143237", { "exception": exception.NotFoundError, "count": 0, }), # only visible to logged in users ("https://bcy.net/item/detail/6747523535150783495", { "count": 0, }), # JSON decode error (#3321) ("https://bcy.net/item/detail/7166939271872388110", { "count": 0, }), ) def posts(self): try: data = self._data_from_post(self.item_id) except KeyError: return () post = data["post_data"] post["image_list"] = post["multi"] post["plain"] = text.parse_unicode_escapes(post["plain"]) post.update(data["detail_user"]) return (post,) ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1675805685.0 
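# Illustrative sketch (standalone, not part of bcy.py): the unescaping that
# BcyExtractor._data_from_post above performs on the 'JSON.parse("...")'
# payload embedded in an item page. 'raw' is a hypothetical stand-in for
# the extracted text.
import json

raw = r'{\"detail\": {\"post_data\": {\"plain\": \"a\\u002Fb\"}}}'
data = raw.replace('\\\\u002F', '/').replace('\\"', '"')
print(json.loads(data)["detail"]["post_data"]["plain"])
# -> a/b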
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/behance.py���������������������������������������������������0000644�0001750�0001750�00000026574�14370541765�020533� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.behance.net/""" from .common import Extractor, Message from .. import text, util class BehanceExtractor(Extractor): """Base class for behance extractors""" category = "behance" root = "https://www.behance.net" request_interval = (2.0, 4.0) def items(self): for gallery in self.galleries(): gallery["_extractor"] = BehanceGalleryExtractor yield Message.Queue, gallery["url"], self._update(gallery) def galleries(self): """Return all relevant gallery URLs""" @staticmethod def _update(data): # compress data to simple lists if data["fields"] and isinstance(data["fields"][0], dict): data["fields"] = [ field.get("name") or field.get("label") for field in data["fields"] ] data["owners"] = [ owner.get("display_name") or owner.get("displayName") for owner in data["owners"] ] tags = data.get("tags") or () if tags and isinstance(tags[0], dict): tags = [tag["title"] for tag in tags] data["tags"] = tags # backwards compatibility data["gallery_id"] = data["id"] data["title"] = data["name"] data["user"] = ", ".join(data["owners"]) return data class BehanceGalleryExtractor(BehanceExtractor): """Extractor for image galleries from www.behance.net""" subcategory = "gallery" directory_fmt = ("{category}", "{owners:J, }", "{id} {name}") filename_fmt = "{category}_{id}_{num:>02}.{extension}" archive_fmt = "{id}_{num}" pattern = r"(?:https?://)?(?:www\.)?behance\.net/gallery/(\d+)" test = ( ("https://www.behance.net/gallery/17386197/A-Short-Story", { "count": 2, "url": "ab79bd3bef8d3ae48e6ac74fd995c1dfaec1b7d2", "keyword": { "id": 17386197, "name": 're:"Hi". 
A short story about the important things ', "owners": ["Place Studio", "Julio César Velazquez"], "fields": ["Animation", "Character Design", "Directing"], "tags": list, "module": dict, }, }), ("https://www.behance.net/gallery/21324767/Nevada-City", { "count": 6, "url": "0258fe194fe7d828d6f2c7f6086a9a0a4140db1d", "keyword": {"owners": ["Alex Strohl"]}, }), # 'media_collection' modules ("https://www.behance.net/gallery/88276087/Audi-R8-RWD", { "count": 20, "url": "6bebff0d37f85349f9ad28bd8b76fd66627c1e2f", }), # 'video' modules (#1282) ("https://www.behance.net/gallery/101185577/COLCCI", { "pattern": r"ytdl:https://cdn-prod-ccv\.adobe\.com/", "count": 3, }), ) def __init__(self, match): BehanceExtractor.__init__(self, match) self.gallery_id = match.group(1) def items(self): data = self.get_gallery_data() imgs = self.get_images(data) data["count"] = len(imgs) yield Message.Directory, data for data["num"], (url, module) in enumerate(imgs, 1): data["module"] = module data["extension"] = text.ext_from_url(url) yield Message.Url, url, data def get_gallery_data(self): """Collect gallery info dict""" url = "{}/gallery/{}/a".format(self.root, self.gallery_id) cookies = { "_evidon_consent_cookie": '{"consent_date":"2019-01-31T09:41:15.132Z"}', "bcp": "4c34489d-914c-46cd-b44c-dfd0e661136d", "gk_suid": "66981391", "gki": '{"feature_project_view":false,' '"feature_discover_login_prompt":false,' '"feature_project_login_prompt":false}', "ilo0": "true", } page = self.request(url, cookies=cookies).text data = util.json_loads(text.extr( page, 'id="beconfig-store_state">', '</script>')) return self._update(data["project"]["project"]) def get_images(self, data): """Extract image results from an API response""" result = [] append = result.append for module in data["modules"]: mtype = module["type"] if mtype == "image": url = module["sizes"]["original"] append((url, module)) elif mtype == "video": page = self.request(module["src"]).text url = text.extr(page, '<source src="', '"') if text.ext_from_url(url) == "m3u8": url = "ytdl:" + url append((url, module)) elif mtype == "media_collection": for component in module["components"]: url = component["sizes"]["source"] append((url, module)) elif mtype == "embed": embed = module.get("original_embed") or module.get("embed") if embed: append(("ytdl:" + text.extr(embed, 'src="', '"'), module)) return result class BehanceUserExtractor(BehanceExtractor): """Extractor for a user's galleries from www.behance.net""" subcategory = "user" categorytransfer = True pattern = r"(?:https?://)?(?:www\.)?behance\.net/([^/?#]+)/?$" test = ("https://www.behance.net/alexstrohl", { "count": ">= 8", "pattern": BehanceGalleryExtractor.pattern, }) def __init__(self, match): BehanceExtractor.__init__(self, match) self.user = match.group(1) def galleries(self): url = "{}/{}/projects".format(self.root, self.user) params = {"offset": 0} headers = {"X-Requested-With": "XMLHttpRequest"} while True: data = self.request(url, params=params, headers=headers).json() work = data["profile"]["activeSection"]["work"] yield from work["projects"] if not work["hasMore"]: return params["offset"] += len(work["projects"]) class BehanceCollectionExtractor(BehanceExtractor): """Extractor for a collection's galleries from www.behance.net""" subcategory = "collection" categorytransfer = True pattern = r"(?:https?://)?(?:www\.)?behance\.net/collection/(\d+)" test = ("https://www.behance.net/collection/71340149/inspiration", { "count": ">= 145", "pattern": BehanceGalleryExtractor.pattern, }) def __init__(self, match): 
BehanceExtractor.__init__(self, match) self.collection_id = match.group(1) def galleries(self): url = self.root + "/v3/graphql" headers = { "Origin" : self.root, "Referer": self.root + "/collection/" + self.collection_id, "X-BCP" : "4c34489d-914c-46cd-b44c-dfd0e661136d", "X-NewRelic-ID" : "VgUFVldbGwsFU1BRDwUBVw==", "X-Requested-With": "XMLHttpRequest", } cookies = { "bcp" : "4c34489d-914c-46cd-b44c-dfd0e661136d", "gk_suid": "66981391", "ilo0" : "true", } query = """ query GetMoodboardItemsAndRecommendations( $id: Int! $firstItem: Int! $afterItem: String $shouldGetRecommendations: Boolean! $shouldGetItems: Boolean! $shouldGetMoodboardFields: Boolean! ) { viewer @include(if: $shouldGetMoodboardFields) { isOptedOutOfRecommendations isAdmin } moodboard(id: $id) { ...moodboardFields @include(if: $shouldGetMoodboardFields) items(first: $firstItem, after: $afterItem) @include(if: $shouldGetItems) { pageInfo { endCursor hasNextPage } nodes { ...nodesFields } } recommendedItems(first: 80) @include(if: $shouldGetRecommendations) { nodes { ...nodesFields fetchSource } } } } fragment moodboardFields on Moodboard { id label privacy followerCount isFollowing projectCount url isOwner owners { id displayName url firstName location locationUrl isFollowing images { size_50 { url } size_100 { url } size_115 { url } size_230 { url } size_138 { url } size_276 { url } } } } fragment projectFields on Project { id isOwner publishedOn matureAccess hasMatureContent modifiedOn name url isPrivate slug license { license description id label url text images } fields { label } colors { r g b } owners { url displayName id location locationUrl isProfileOwner isFollowing images { size_50 { url } size_100 { url } size_115 { url } size_230 { url } size_138 { url } size_276 { url } } } covers { size_original { url } size_max_808 { url } size_808 { url } size_404 { url } size_202 { url } size_230 { url } size_115 { url } } stats { views { all } appreciations { all } comments { all } } } fragment exifDataValueFields on exifDataValue { id label value searchValue } fragment nodesFields on MoodboardItem { id entityType width height flexWidth flexHeight images { size url } entity { ... on Project { ...projectFields } ... on ImageModule { project { ...projectFields } colors { r g b } exifData { lens { ...exifDataValueFields } software { ...exifDataValueFields } makeAndModel { ...exifDataValueFields } focalLength { ...exifDataValueFields } iso { ...exifDataValueFields } location { ...exifDataValueFields } flash { ...exifDataValueFields } exposureMode { ...exifDataValueFields } shutterSpeed { ...exifDataValueFields } aperture { ...exifDataValueFields } } } ... 
on MediaCollectionComponent { project { ...projectFields } } } } """ variables = { "afterItem": "MAo=", "firstItem": 40, "id" : int(self.collection_id), "shouldGetItems" : True, "shouldGetMoodboardFields": False, "shouldGetRecommendations": False, } data = {"query": query, "variables": variables} while True: items = self.request( url, method="POST", headers=headers, cookies=cookies, json=data, ).json()["data"]["moodboard"]["items"] for node in items["nodes"]: yield node["entity"] if not items["pageInfo"]["hasNextPage"]: return variables["afterItem"] = items["pageInfo"]["endCursor"] ������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1675805668.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/blogger.py���������������������������������������������������0000644�0001750�0001750�00000022167�14370541744�020556� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Blogger blogs""" from .common import Extractor, Message from .. 
import text, util import re BASE_PATTERN = ( r"(?:blogger:(?:https?://)?([^/]+)|" r"(?:https?://)?([\w-]+\.blogspot\.com))") class BloggerExtractor(Extractor): """Base class for blogger extractors""" category = "blogger" directory_fmt = ("{category}", "{blog[name]}", "{post[date]:%Y-%m-%d} {post[title]}") filename_fmt = "{num:>03}.{extension}" archive_fmt = "{post[id]}_{num}" root = "https://www.blogger.com" def __init__(self, match): Extractor.__init__(self, match) self.videos = self.config("videos", True) self.blog = match.group(1) or match.group(2) self.api = BloggerAPI(self) def items(self): blog = self.api.blog_by_url("http://" + self.blog) blog["pages"] = blog["pages"]["totalItems"] blog["posts"] = blog["posts"]["totalItems"] blog["date"] = text.parse_datetime(blog["published"]) del blog["selfLink"] sub = re.compile(r"(/|=)(?:s\d+|w\d+-h\d+)(?=/|$)").sub findall_image = re.compile( r'src="(https?://(?:' r'blogger\.googleusercontent\.com/img|' r'\d+\.bp\.blogspot\.com)/[^"]+)').findall findall_video = re.compile( r'src="(https?://www\.blogger\.com/video\.g\?token=[^"]+)').findall metadata = self.metadata() for post in self.posts(blog): content = post["content"] files = findall_image(content) for idx, url in enumerate(files): files[idx] = sub(r"\1s0", url).replace("http:", "https:", 1) if self.videos and 'id="BLOG_video-' in content: page = self.request(post["url"]).text for url in findall_video(page): page = self.request(url).text video_config = util.json_loads(text.extr( page, 'var VIDEO_CONFIG =', '\n')) files.append(max( video_config["streams"], key=lambda x: x["format_id"], )["play_url"]) post["author"] = post["author"]["displayName"] post["replies"] = post["replies"]["totalItems"] post["content"] = text.remove_html(content) post["date"] = text.parse_datetime(post["published"]) del post["selfLink"] del post["blog"] data = {"blog": blog, "post": post} if metadata: data.update(metadata) yield Message.Directory, data for data["num"], url in enumerate(files, 1): data["url"] = url yield Message.Url, url, text.nameext_from_url(url, data) def posts(self, blog): """Return an iterable with all relevant post objects""" def metadata(self): """Return additional metadata""" class BloggerPostExtractor(BloggerExtractor): """Extractor for a single blog post""" subcategory = "post" pattern = BASE_PATTERN + r"(/\d{4}/\d\d/[^/?#]+\.html)" test = ( ("https://julianbphotography.blogspot.com/2010/12/moon-rise.html", { "url": "9928429fb62f712eb4de80f53625eccecc614aae", "pattern": r"https://3.bp.blogspot.com/.*/s0/Icy-Moonrise-.*.jpg", "keyword": { "blog": { "date" : "dt:2010-11-21 18:19:42", "description": "", "id" : "5623928067739466034", "kind" : "blogger#blog", "locale" : dict, "name" : "Julian Bunker Photography", "pages" : int, "posts" : int, "published" : "2010-11-21T10:19:42-08:00", "updated" : str, "url" : "http://julianbphotography.blogspot.com/", }, "post": { "author" : "Julian Bunker", "content" : str, "date" : "dt:2010-12-26 01:08:00", "etag" : str, "id" : "6955139236418998998", "kind" : "blogger#post", "published" : "2010-12-25T17:08:00-08:00", "replies" : "0", "title" : "Moon Rise", "updated" : "2011-12-06T05:21:24-08:00", "url" : "re:.+/2010/12/moon-rise.html$", }, "num": int, "url": str, }, }), ("blogger:http://www.julianbunker.com/2010/12/moon-rise.html"), # video (#587) (("http://cfnmscenesinmovies.blogspot.com/2011/11/" "cfnm-scene-jenna-fischer-in-office.html"), { "pattern": r"https://.+\.googlevideo\.com/videoplayback", }), # image URLs with width/height (#1061) # 
("https://aaaninja.blogspot.com/2020/08/altera-boob-press-2.html", { # "pattern": r"https://1.bp.blogspot.com/.+/s0/altera_.+png", # }), # new image domain (#2204) (("https://randomthingsthroughmyletterbox.blogspot.com/2022/01" "/bitter-flowers-by-gunnar-staalesen-blog.html"), { "pattern": r"https://blogger.googleusercontent.com/img/a/.+=s0$", "count": 8, }), ) def __init__(self, match): BloggerExtractor.__init__(self, match) self.path = match.group(3) def posts(self, blog): return (self.api.post_by_path(blog["id"], self.path),) class BloggerBlogExtractor(BloggerExtractor): """Extractor for an entire Blogger blog""" subcategory = "blog" pattern = BASE_PATTERN + r"/?$" test = ( ("https://julianbphotography.blogspot.com/", { "range": "1-25", "count": 25, "pattern": r"https://\d\.bp\.blogspot\.com/.*/s0/[^.]+\.jpg", }), ("blogger:https://www.kefblog.com.ng/", { "range": "1-25", "count": 25, }), ) def posts(self, blog): return self.api.blog_posts(blog["id"]) class BloggerSearchExtractor(BloggerExtractor): """Extractor for Blogger search resuls""" subcategory = "search" pattern = BASE_PATTERN + r"/search/?\?q=([^&#]+)" test = ( ("https://julianbphotography.blogspot.com/search?q=400mm", { "count": "< 10", "keyword": {"query": "400mm"}, }), ) def __init__(self, match): BloggerExtractor.__init__(self, match) self.query = text.unquote(match.group(3)) def posts(self, blog): return self.api.blog_search(blog["id"], self.query) def metadata(self): return {"query": self.query} class BloggerLabelExtractor(BloggerExtractor): """Extractor for Blogger posts by label""" subcategory = "label" pattern = BASE_PATTERN + r"/search/label/([^/?#]+)" test = ( ("https://dmmagazine.blogspot.com/search/label/D%26D", { "range": "1-25", "count": 25, "keyword": {"label": "D&D"}, }), ) def __init__(self, match): BloggerExtractor.__init__(self, match) self.label = text.unquote(match.group(3)) def posts(self, blog): return self.api.blog_posts(blog["id"], self.label) def metadata(self): return {"label": self.label} class BloggerAPI(): """Minimal interface for the Blogger v3 API Ref: https://developers.google.com/blogger """ API_KEY = "AIzaSyCN9ax34oMMyM07g_M-5pjeDp_312eITK8" def __init__(self, extractor): self.extractor = extractor self.api_key = extractor.config("api-key", self.API_KEY) def blog_by_url(self, url): return self._call("blogs/byurl", {"url": url}, "blog") def blog_posts(self, blog_id, label=None): endpoint = "blogs/{}/posts".format(blog_id) params = {"labels": label} return self._pagination(endpoint, params) def blog_search(self, blog_id, query): endpoint = "blogs/{}/posts/search".format(blog_id) params = {"q": query} return self._pagination(endpoint, params) def post_by_path(self, blog_id, path): endpoint = "blogs/{}/posts/bypath".format(blog_id) return self._call(endpoint, {"path": path}, "post") def _call(self, endpoint, params, notfound=None): url = "https://www.googleapis.com/blogger/v3/" + endpoint params["key"] = self.api_key return self.extractor.request( url, params=params, notfound=notfound).json() def _pagination(self, endpoint, params): while True: data = self._call(endpoint, params) if "items" in data: yield from data["items"] if "nextPageToken" not in data: return params["pageToken"] = data["nextPageToken"] 
gallery_dl-1.25.0/gallery_dl/extractor/booru.py
# -*- coding: utf-8 -*-

# Copyright 2015-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for *booru sites"""

from .common import BaseExtractor, Message
from ..
import text import operator class BooruExtractor(BaseExtractor): """Base class for *booru extractors""" basecategory = "booru" filename_fmt = "{category}_{id}_{md5}.{extension}" page_start = 0 per_page = 100 def items(self): self.login() data = self.metadata() tags = self.config("tags", False) notes = self.config("notes", False) fetch_html = tags or notes url_key = self.config("url") if url_key: self._file_url = operator.itemgetter(url_key) for post in self.posts(): try: url = self._file_url(post) if url[0] == "/": url = self.root + url except (KeyError, TypeError): self.log.debug("Unable to fetch download URL for post %s " "(md5: %s)", post.get("id"), post.get("md5")) continue if fetch_html: html = self._html(post) if tags: self._tags(post, html) if notes: self._notes(post, html) text.nameext_from_url(url, post) post.update(data) self._prepare(post) yield Message.Directory, post yield Message.Url, url, post def skip(self, num): pages = num // self.per_page self.page_start += pages return pages * self.per_page def login(self): """Login and set necessary cookies""" def metadata(self): """Return a dict with general metadata""" return () def posts(self): """Return an iterable with post objects""" return () _file_url = operator.itemgetter("file_url") def _prepare(self, post): """Prepare a 'post's metadata""" def _html(self, post): """Return HTML content of a post""" def _tags(self, post, page): """Extract extended tag metadata""" def _notes(self, post, page): """Extract notes metadata""" ������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1676722307.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/bunkr.py�����������������������������������������������������0000644�0001750�0001750�00000006757�14374140203�020252� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bunkr.su/""" from .lolisafe import LolisafeAlbumExtractor from .. 
import text class BunkrAlbumExtractor(LolisafeAlbumExtractor): """Extractor for bunkr.su albums""" category = "bunkr" root = "https://bunkr.su" pattern = r"(?:https?://)?(?:app\.)?bunkr\.(?:[sr]u|is|to)/a/([^/?#]+)" test = ( ("https://bunkr.su/a/Lktg9Keq", { "pattern": r"https://cdn\.bunkr\.ru/test-テスト-\"&>-QjgneIQv\.png", "content": "0c8768055e4e20e7c7259608b67799171b691140", "keyword": { "album_id": "Lktg9Keq", "album_name": 'test テスト "&>', "count": 1, "filename": 'test-テスト-"&>-QjgneIQv', "id": "QjgneIQv", "name": 'test-テスト-"&>', "num": int, }, }), # mp4 (#2239) ("https://app.bunkr.ru/a/ptRHaCn2", { "pattern": r"https://media-files\.bunkr\.ru/_-RnHoW69L\.mp4", "content": "80e61d1dbc5896ae7ef9a28734c747b28b320471", }), # cdn4 ("https://bunkr.is/a/iXTTc1o2", { "pattern": r"https://(cdn|media-files)4\.bunkr\.ru/", "content": "da29aae371b7adc8c5ef8e6991b66b69823791e8", "keyword": { "album_id": "iXTTc1o2", "album_name": "test2", "album_size": "691.1 KB", "count": 2, "description": "072022", "filename": "re:video-wFO9FtxG|image-sZrQUeOx", "id": "re:wFO9FtxG|sZrQUeOx", "name": "re:video|image", "num": int, }, }), ("https://bunkr.to/a/Lktg9Keq"), ) def fetch_album(self, album_id): # album metadata page = self.request(self.root + "/a/" + self.album_id).text info = text.split_html(text.extr( page, "<h1", "</div>").partition(">")[2]) count, _, size = info[1].split(None, 2) # files cdn = None files = [] append = files.append headers = {"Referer": self.root.replace("://", "://stream.", 1) + "/"} pos = page.index('class="grid-images') for url in text.extract_iter(page, '<a href="', '"', pos): if url.startswith("/"): if not cdn: # fetch cdn root from download page durl = "{}/d/{}".format(self.root, url[3:]) cdn = text.extr(self.request( durl).text, 'link.href = "', '"') cdn = cdn[:cdn.index("/", 8)] url = cdn + url[2:] url = text.unescape(url) if url.endswith((".mp4", ".m4v", ".mov", ".webm", ".mkv", ".ts", ".zip", ".rar", ".7z")): append({"file": url.replace("://cdn", "://media-files", 1), "_http_headers": headers}) else: append({"file": url}) return files, { "album_id" : self.album_id, "album_name" : text.unescape(info[0]), "album_size" : size[1:-1], "description": text.unescape(info[2]) if len(info) > 2 else "", "count" : len(files), } �����������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1676722307.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/catbox.py����������������������������������������������������0000644�0001750�0001750�00000005225�14374140203�020376� 
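# Illustrative sketch (standalone, not part of bunkr.py): how fetch_album()
# above routes video and archive files through the media-files host while
# leaving other URLs untouched. The sample URLs are hypothetical.
MEDIA_EXTS = (".mp4", ".m4v", ".mov", ".webm", ".mkv", ".ts",
              ".zip", ".rar", ".7z")

def file_entry(url):
    if url.endswith(MEDIA_EXTS):
        return {"file": url.replace("://cdn", "://media-files", 1)}
    return {"file": url}

print(file_entry("https://cdn4.bunkr.ru/video-abcdef.mp4"))
print(file_entry("https://cdn4.bunkr.ru/image-abcdef.png"))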
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://catbox.moe/""" from .common import GalleryExtractor, Extractor, Message from .. import text class CatboxAlbumExtractor(GalleryExtractor): """Extractor for catbox albums""" category = "catbox" subcategory = "album" root = "https://catbox.moe" filename_fmt = "{filename}.{extension}" directory_fmt = ("{category}", "{album_name} ({album_id})") archive_fmt = "{album_id}_{filename}" pattern = r"(?:https?://)?(?:www\.)?catbox\.moe(/c/[^/?#]+)" test = ( ("https://catbox.moe/c/1igcbe", { "url": "35866a88c29462814f103bc22ec031eaeb380f8a", "content": "70ddb9de3872e2d17cc27e48e6bf395e5c8c0b32", "pattern": r"https://files\.catbox\.moe/\w+\.\w{3}$", "count": 3, "keyword": { "album_id": "1igcbe", "album_name": "test", "date": "dt:2022-08-18 00:00:00", "description": "album test &>", }, }), ("https://www.catbox.moe/c/cd90s1"), ("https://catbox.moe/c/w7tm47#"), ) def metadata(self, page): extr = text.extract_from(page) return { "album_id" : self.gallery_url.rpartition("/")[2], "album_name" : text.unescape(extr("<h1>", "<")), "date" : text.parse_datetime(extr( "<p>Created ", "<"), "%B %d %Y"), "description": text.unescape(extr("<p>", "<")), } def images(self, page): return [ ("https://files.catbox.moe/" + path, None) for path in text.extract_iter( page, ">https://files.catbox.moe/", "<") ] class CatboxFileExtractor(Extractor): """Extractor for catbox files""" category = "catbox" subcategory = "file" archive_fmt = "{filename}" pattern = r"(?:https?://)?(?:files|litter|de)\.catbox\.moe/([^/?#]+)" test = ( ("https://files.catbox.moe/8ih3y7.png", { "pattern": r"^https://files\.catbox\.moe/8ih3y7\.png$", "content": "0c8768055e4e20e7c7259608b67799171b691140", "count": 1, }), ("https://litter.catbox.moe/t8v3n9.png"), ("https://de.catbox.moe/bjdmz1.jpg"), ) def items(self): url = text.ensure_http_scheme(self.url) file = text.nameext_from_url(url, {"url": url}) yield Message.Directory, file yield Message.Url, url, file ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 
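# Illustrative sketch (standalone, not part of catbox.py): roughly what the
# text.nameext_from_url() call in CatboxFileExtractor.items() above derives
# from a file URL. The helper shown here is a simplified stand-in, and the
# sample URL is hypothetical.
from urllib.parse import urlsplit

def nameext_from_url(url):
    name = urlsplit(url).path.rpartition("/")[2]
    filename, _, extension = name.rpartition(".")
    return {"url": url,
            "filename": filename or extension,
            "extension": extension if filename else ""}

print(nameext_from_url("https://files.catbox.moe/8ih3y7.png"))
# -> {'url': '...', 'filename': '8ih3y7', 'extension': 'png'}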
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/comicvine.py�������������������������������������������������0000644�0001750�0001750�00000004737�14176336637�021123� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://comicvine.gamespot.com/""" from .booru import BooruExtractor from .. import text import operator class ComicvineTagExtractor(BooruExtractor): """Extractor for a gallery on comicvine.gamespot.com""" category = "comicvine" subcategory = "tag" basecategory = "" root = "https://comicvine.gamespot.com" per_page = 1000 directory_fmt = ("{category}", "{tag}") filename_fmt = "{filename}.{extension}" archive_fmt = "{id}" pattern = (r"(?:https?://)?comicvine\.gamespot\.com" r"(/([^/?#]+)/(\d+-\d+)/images/.*)") test = ( ("https://comicvine.gamespot.com/jock/4040-5653/images/", { "pattern": r"https://comicvine\.gamespot\.com/a/uploads" r"/original/\d+/\d+/\d+-.+\.(jpe?g|png)", "count": ">= 140", }), (("https://comicvine.gamespot.com/batman/4005-1699" "/images/?tag=Fan%20Art%20%26%20Cosplay"), { "pattern": r"https://comicvine\.gamespot\.com/a/uploads" r"/original/\d+/\d+/\d+-.+", "count": ">= 450", }), ) def __init__(self, match): BooruExtractor.__init__(self, match) self.path, self.object_name, self.object_id = match.groups() def metadata(self): return {"tag": text.unquote(self.object_name)} def posts(self): url = self.root + "/js/image-data.json" params = { "images": text.extract( self.request(self.root + self.path).text, 'data-gallery-id="', '"')[0], "start" : self.page_start, "count" : self.per_page, "object": self.object_id, } while True: images = self.request(url, params=params).json()["images"] yield from images if len(images) < self.per_page: return params["start"] += self.per_page def skip(self, num): self.page_start = num return num _file_url = operator.itemgetter("original") @staticmethod def _prepare(post): post["date"] = text.parse_datetime( post["dateCreated"], "%a, %b %d %Y") post["tags"] = [tag["name"] for tag in post["tags"] if tag["name"]] ���������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
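# Illustrative sketch (standalone, not part of comicvine.py): the date
# conversion performed by ComicvineTagExtractor._prepare() above.
# datetime.strptime stands in for the text.parse_datetime helper, and the
# 'dateCreated' value is hypothetical but uses the same format string.
from datetime import datetime

post = {"dateCreated": "Wed, Mar 1 2023"}
post["date"] = datetime.strptime(post["dateCreated"], "%a, %b %d %Y")
print(post["date"])
# -> 2023-03-01 00:00:00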
gallery_dl-1.25.0/gallery_dl/extractor/common.py
# -*- coding: utf-8 -*-

# Copyright 2014-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Common classes and constants used by extractor modules."""

import os
import re
import ssl
import time
import netrc
import queue
import logging
import datetime
import requests
import threading
from requests.adapters import HTTPAdapter
from .message import Message
from ..
import config, text, util, cache, exception class Extractor(): category = "" subcategory = "" basecategory = "" categorytransfer = False directory_fmt = ("{category}",) filename_fmt = "{filename}.{extension}" archive_fmt = "" cookiedomain = "" browser = None root = "" test = None finalize = None request_interval = 0.0 request_interval_min = 0.0 request_timestamp = 0.0 tls12 = True def __init__(self, match): self.log = logging.getLogger(self.category) self.url = match.string if self.basecategory: self.config = self._config_shared self.config_accumulate = self._config_shared_accumulate self._cfgpath = ("extractor", self.category, self.subcategory) self._parentdir = "" self._write_pages = self.config("write-pages", False) self._retry_codes = self.config("retry-codes") self._retries = self.config("retries", 4) self._timeout = self.config("timeout", 30) self._verify = self.config("verify", True) self._proxies = util.build_proxy_map(self.config("proxy"), self.log) self._interval = util.build_duration_func( self.config("sleep-request", self.request_interval), self.request_interval_min, ) if self._retries < 0: self._retries = float("inf") if not self._retry_codes: self._retry_codes = () self._init_session() self._init_cookies() @classmethod def from_url(cls, url): if isinstance(cls.pattern, str): cls.pattern = re.compile(cls.pattern) match = cls.pattern.match(url) return cls(match) if match else None def __iter__(self): return self.items() def items(self): yield Message.Version, 1 def skip(self, num): return 0 def config(self, key, default=None): return config.interpolate(self._cfgpath, key, default) def config_accumulate(self, key): return config.accumulate(self._cfgpath, key) def _config_shared(self, key, default=None): return config.interpolate_common(("extractor",), ( (self.category, self.subcategory), (self.basecategory, self.subcategory), ), key, default) def _config_shared_accumulate(self, key): values = config.accumulate(self._cfgpath, key) conf = config.get(("extractor",), self.basecategory) if conf: values[:0] = config.accumulate((self.subcategory,), key, conf=conf) return values def request(self, url, *, method="GET", session=None, retries=None, retry_codes=None, encoding=None, fatal=True, notfound=None, **kwargs): if session is None: session = self.session if retries is None: retries = self._retries if retry_codes is None: retry_codes = self._retry_codes if "proxies" not in kwargs: kwargs["proxies"] = self._proxies if "timeout" not in kwargs: kwargs["timeout"] = self._timeout if "verify" not in kwargs: kwargs["verify"] = self._verify response = None tries = 1 if self._interval: seconds = (self._interval() - (time.time() - Extractor.request_timestamp)) if seconds > 0.0: self.sleep(seconds, "request") while True: try: response = session.request(method, url, **kwargs) except (requests.exceptions.ConnectionError, requests.exceptions.Timeout, requests.exceptions.ChunkedEncodingError, requests.exceptions.ContentDecodingError) as exc: msg = exc except (requests.exceptions.RequestException) as exc: raise exception.HttpError(exc) else: code = response.status_code if self._write_pages: self._dump_response(response) if 200 <= code < 400 or fatal is None and \ (400 <= code < 500) or not fatal and \ (400 <= code < 429 or 431 <= code < 500): if encoding: response.encoding = encoding return response if notfound and code == 404: raise exception.NotFoundError(notfound) msg = "'{} {}' for '{}'".format(code, response.reason, url) server = response.headers.get("Server") if server and 
server.startswith("cloudflare") and \ code in (403, 503): content = response.content if b"_cf_chl_opt" in content or b"jschl-answer" in content: self.log.warning("Cloudflare challenge") break if b'name="captcha-bypass"' in content: self.log.warning("Cloudflare CAPTCHA") break if code not in retry_codes and code < 500: break finally: Extractor.request_timestamp = time.time() self.log.debug("%s (%s/%s)", msg, tries, retries+1) if tries > retries: break self.sleep( max(tries, self._interval()) if self._interval else tries, "retry") tries += 1 raise exception.HttpError(msg, response) def wait(self, *, seconds=None, until=None, adjust=1.0, reason="rate limit reset"): now = time.time() if seconds: seconds = float(seconds) until = now + seconds elif until: if isinstance(until, datetime.datetime): # convert to UTC timestamp until = util.datetime_to_timestamp(until) else: until = float(until) seconds = until - now else: raise ValueError("Either 'seconds' or 'until' is required") seconds += adjust if seconds <= 0.0: return if reason: t = datetime.datetime.fromtimestamp(until).time() isotime = "{:02}:{:02}:{:02}".format(t.hour, t.minute, t.second) self.log.info("Waiting until %s for %s.", isotime, reason) time.sleep(seconds) def sleep(self, seconds, reason): self.log.debug("Sleeping %.2f seconds (%s)", seconds, reason) time.sleep(seconds) def _get_auth_info(self): """Return authentication information as (username, password) tuple""" username = self.config("username") password = None if username: password = self.config("password") elif self.config("netrc", False): try: info = netrc.netrc().authenticators(self.category) username, _, password = info except (OSError, netrc.NetrcParseError) as exc: self.log.error("netrc: %s", exc) except TypeError: self.log.warning("netrc: No authentication info") return username, password def _init_session(self): self.session = session = requests.Session() headers = session.headers headers.clear() ssl_options = ssl_ciphers = 0 browser = self.config("browser") if browser is None: browser = self.browser if browser and isinstance(browser, str): browser, _, platform = browser.lower().partition(":") if not platform or platform == "auto": platform = ("Windows NT 10.0; Win64; x64" if util.WINDOWS else "X11; Linux x86_64") elif platform == "windows": platform = "Windows NT 10.0; Win64; x64" elif platform == "linux": platform = "X11; Linux x86_64" elif platform == "macos": platform = "Macintosh; Intel Mac OS X 11.5" if browser == "chrome": if platform.startswith("Macintosh"): platform = platform.replace(".", "_") + "_2" else: browser = "firefox" for key, value in HTTP_HEADERS[browser]: if value and "{}" in value: headers[key] = value.format(platform) else: headers[key] = value ssl_options |= (ssl.OP_NO_SSLv2 | ssl.OP_NO_SSLv3 | ssl.OP_NO_TLSv1 | ssl.OP_NO_TLSv1_1) ssl_ciphers = SSL_CIPHERS[browser] else: useragent = self.config("user-agent") if useragent is None: useragent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; " "rv:102.0) Gecko/20100101 Firefox/102.0") elif useragent == "browser": useragent = _browser_useragent() headers["User-Agent"] = useragent headers["Accept"] = "*/*" headers["Accept-Language"] = "en-US,en;q=0.5" if BROTLI: headers["Accept-Encoding"] = "gzip, deflate, br" else: headers["Accept-Encoding"] = "gzip, deflate" custom_headers = self.config("headers") if custom_headers: headers.update(custom_headers) custom_ciphers = self.config("ciphers") if custom_ciphers: if isinstance(custom_ciphers, list): ssl_ciphers = ":".join(custom_ciphers) else: ssl_ciphers = 
custom_ciphers source_address = self.config("source-address") if source_address: if isinstance(source_address, str): source_address = (source_address, 0) else: source_address = (source_address[0], source_address[1]) tls12 = self.config("tls12") if tls12 is None: tls12 = self.tls12 if not tls12: ssl_options |= ssl.OP_NO_TLSv1_2 self.log.debug("TLS 1.2 disabled.") adapter = _build_requests_adapter( ssl_options, ssl_ciphers, source_address) session.mount("https://", adapter) session.mount("http://", adapter) def _init_cookies(self): """Populate the session's cookiejar""" self._cookiefile = None self._cookiejar = self.session.cookies if self.cookiedomain is None: return cookies = self.config("cookies") if cookies: if isinstance(cookies, dict): self._update_cookies_dict(cookies, self.cookiedomain) elif isinstance(cookies, str): cookiefile = util.expand_path(cookies) try: with open(cookiefile) as fp: util.cookiestxt_load(fp, self._cookiejar) except Exception as exc: self.log.warning("cookies: %s", exc) else: self.log.debug("Loading cookies from '%s'", cookies) self._cookiefile = cookiefile elif isinstance(cookies, (list, tuple)): key = tuple(cookies) cookiejar = _browser_cookies.get(key) if cookiejar is None: from ..cookies import load_cookies cookiejar = self._cookiejar.__class__() try: load_cookies(cookiejar, cookies) except Exception as exc: self.log.warning("cookies: %s", exc) else: _browser_cookies[key] = cookiejar else: self.log.debug("Using cached cookies from %s", key) setcookie = self._cookiejar.set_cookie for cookie in cookiejar: setcookie(cookie) else: self.log.warning( "Expected 'dict', 'list', or 'str' value for 'cookies' " "option, got '%s' (%s)", cookies.__class__.__name__, cookies) def _store_cookies(self): """Store the session's cookiejar in a cookies.txt file""" if self._cookiefile and self.config("cookies-update", True): try: with open(self._cookiefile, "w") as fp: util.cookiestxt_store(fp, self._cookiejar) except OSError as exc: self.log.warning("cookies: %s", exc) def _update_cookies(self, cookies, *, domain=""): """Update the session's cookiejar with 'cookies'""" if isinstance(cookies, dict): self._update_cookies_dict(cookies, domain or self.cookiedomain) else: setcookie = self._cookiejar.set_cookie try: cookies = iter(cookies) except TypeError: setcookie(cookies) else: for cookie in cookies: setcookie(cookie) def _update_cookies_dict(self, cookiedict, domain): """Update cookiejar with name-value pairs from a dict""" setcookie = self._cookiejar.set for name, value in cookiedict.items(): setcookie(name, value, domain=domain) def _check_cookies(self, cookienames, *, domain=None): """Check if all 'cookienames' are in the session's cookiejar""" if not self._cookiejar: return False if domain is None: domain = self.cookiedomain names = set(cookienames) now = time.time() for cookie in self._cookiejar: if cookie.name in names and ( not domain or cookie.domain == domain): if cookie.expires: diff = int(cookie.expires - now) if diff <= 0: self.log.warning( "Cookie '%s' has expired", cookie.name) continue elif diff <= 86400: hours = diff // 3600 self.log.warning( "Cookie '%s' will expire in less than %s hour%s", cookie.name, hours + 1, "s" if hours else "") names.discard(cookie.name) if not names: return True return False def _prepare_ddosguard_cookies(self): if not self._cookiejar.get("__ddg2", domain=self.cookiedomain): self._cookiejar.set( "__ddg2", util.generate_token(), domain=self.cookiedomain) def _get_date_min_max(self, dmin=None, dmax=None): """Retrieve and parse 'date-min' 
and 'date-max' config values""" def get(key, default): ts = self.config(key, default) if isinstance(ts, str): try: ts = int(datetime.datetime.strptime(ts, fmt).timestamp()) except ValueError as exc: self.log.warning("Unable to parse '%s': %s", key, exc) ts = default return ts fmt = self.config("date-format", "%Y-%m-%dT%H:%M:%S") return get("date-min", dmin), get("date-max", dmax) def _dispatch_extractors(self, extractor_data, default=()): """ """ extractors = { data[0].subcategory: data for data in extractor_data } include = self.config("include", default) or () if include == "all": include = extractors elif isinstance(include, str): include = include.split(",") result = [(Message.Version, 1)] for category in include: if category in extractors: extr, url = extractors[category] result.append((Message.Queue, url, {"_extractor": extr})) return iter(result) @classmethod def _get_tests(cls): """Yield an extractor's test cases as (URL, RESULTS) tuples""" tests = cls.test if not tests: return if len(tests) == 2 and (not tests[1] or isinstance(tests[1], dict)): tests = (tests,) for test in tests: if isinstance(test, str): test = (test, None) yield test def _dump_response(self, response, history=True): """Write the response content to a .dump file in the current directory. The file name is derived from the response url, replacing special characters with "_" """ if history: for resp in response.history: self._dump_response(resp, False) if hasattr(Extractor, "_dump_index"): Extractor._dump_index += 1 else: Extractor._dump_index = 1 Extractor._dump_sanitize = re.compile(r"[\\\\|/<>:\"?*&=#]+").sub fname = "{:>02}_{}".format( Extractor._dump_index, Extractor._dump_sanitize('_', response.url), ) if util.WINDOWS: path = os.path.abspath(fname)[:255] else: path = fname[:251] try: with open(path + ".txt", 'wb') as fp: util.dump_response( response, fp, headers=(self._write_pages in ("all", "ALL")), hide_auth=(self._write_pages != "ALL") ) except Exception as e: self.log.warning("Failed to dump HTTP request (%s: %s)", e.__class__.__name__, e) class GalleryExtractor(Extractor): subcategory = "gallery" filename_fmt = "{category}_{gallery_id}_{num:>03}.{extension}" directory_fmt = ("{category}", "{gallery_id} {title}") archive_fmt = "{gallery_id}_{num}" enum = "num" def __init__(self, match, url=None): Extractor.__init__(self, match) self.gallery_url = self.root + match.group(1) if url is None else url def items(self): self.login() page = self.request(self.gallery_url, notfound=self.subcategory).text data = self.metadata(page) imgs = self.images(page) if "count" in data: if self.config("page-reverse"): images = util.enumerate_reversed(imgs, 1, data["count"]) else: images = zip( range(1, data["count"]+1), imgs, ) else: enum = enumerate try: data["count"] = len(imgs) except TypeError: pass else: if self.config("page-reverse"): enum = util.enumerate_reversed images = enum(imgs, 1) yield Message.Directory, data for data[self.enum], (url, imgdata) in images: if imgdata: data.update(imgdata) if "extension" not in imgdata: text.nameext_from_url(url, data) else: text.nameext_from_url(url, data) yield Message.Url, url, data def login(self): """Login and set necessary cookies""" def metadata(self, page): """Return a dict with general metadata""" def images(self, page): """Return a list of all (image-url, metadata)-tuples""" class ChapterExtractor(GalleryExtractor): subcategory = "chapter" directory_fmt = ( "{category}", "{manga}", "{volume:?v/ />02}c{chapter:>03}{chapter_minor:?//}{title:?: //}") filename_fmt = ( 
"{manga}_c{chapter:>03}{chapter_minor:?//}_{page:>03}.{extension}") archive_fmt = ( "{manga}_{chapter}{chapter_minor}_{page}") enum = "page" class MangaExtractor(Extractor): subcategory = "manga" categorytransfer = True chapterclass = None reverse = True def __init__(self, match, url=None): Extractor.__init__(self, match) self.manga_url = url or self.root + match.group(1) if self.config("chapter-reverse", False): self.reverse = not self.reverse def items(self): self.login() page = self.request(self.manga_url).text chapters = self.chapters(page) if self.reverse: chapters.reverse() for chapter, data in chapters: data["_extractor"] = self.chapterclass yield Message.Queue, chapter, data def login(self): """Login and set necessary cookies""" def chapters(self, page): """Return a list of all (chapter-url, metadata)-tuples""" class AsynchronousMixin(): """Run info extraction in a separate thread""" def __iter__(self): messages = queue.Queue(5) thread = threading.Thread( target=self.async_items, args=(messages,), daemon=True, ) thread.start() while True: msg = messages.get() if msg is None: thread.join() return if isinstance(msg, Exception): thread.join() raise msg yield msg messages.task_done() def async_items(self, messages): try: for msg in self.items(): messages.put(msg) except Exception as exc: messages.put(exc) messages.put(None) class BaseExtractor(Extractor): instances = () def __init__(self, match): if not self.category: self._init_category(match) Extractor.__init__(self, match) def _init_category(self, match): for index, group in enumerate(match.groups()): if group is not None: if index: self.category, self.root = self.instances[index-1] if not self.root: self.root = text.root_from_url(match.group(0)) else: self.root = group self.category = group.partition("://")[2] break @classmethod def update(cls, instances): extra_instances = config.get(("extractor",), cls.basecategory) if extra_instances: for category, info in extra_instances.items(): if isinstance(info, dict) and "root" in info: instances[category] = info pattern_list = [] instance_list = cls.instances = [] for category, info in instances.items(): root = info["root"] if root: root = root.rstrip("/") instance_list.append((category, root)) pattern = info.get("pattern") if not pattern: pattern = re.escape(root[root.index(":") + 3:]) pattern_list.append(pattern + "()") return ( r"(?:" + cls.basecategory + r":(https?://[^/?#]+)|" r"(?:https?://)?(?:" + "|".join(pattern_list) + r"))" ) class RequestsAdapter(HTTPAdapter): def __init__(self, ssl_context=None, source_address=None): self.ssl_context = ssl_context self.source_address = source_address HTTPAdapter.__init__(self) def init_poolmanager(self, *args, **kwargs): kwargs["ssl_context"] = self.ssl_context kwargs["source_address"] = self.source_address return HTTPAdapter.init_poolmanager(self, *args, **kwargs) def proxy_manager_for(self, *args, **kwargs): kwargs["ssl_context"] = self.ssl_context kwargs["source_address"] = self.source_address return HTTPAdapter.proxy_manager_for(self, *args, **kwargs) def _build_requests_adapter(ssl_options, ssl_ciphers, source_address): key = (ssl_options, ssl_ciphers, source_address) try: return _adapter_cache[key] except KeyError: pass if ssl_options or ssl_ciphers: ssl_context = ssl.create_default_context() if ssl_options: ssl_context.options |= ssl_options if ssl_ciphers: ssl_context.set_ecdh_curve("prime256v1") ssl_context.set_ciphers(ssl_ciphers) else: ssl_context = None adapter = _adapter_cache[key] = RequestsAdapter( ssl_context, source_address) 
return adapter @cache.cache(maxage=86400) def _browser_useragent(): """Get User-Agent header from default browser""" import webbrowser import socket server = socket.socket(socket.AF_INET, socket.SOCK_STREAM) server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) server.bind(("127.0.0.1", 6414)) server.listen(1) webbrowser.open("http://127.0.0.1:6414/user-agent") client = server.accept()[0] server.close() for line in client.recv(1024).split(b"\r\n"): key, _, value = line.partition(b":") if key.strip().lower() == b"user-agent": useragent = value.strip() break else: useragent = b"" client.send(b"HTTP/1.1 200 OK\r\n\r\n" + useragent) client.close() return useragent.decode() _adapter_cache = {} _browser_cookies = {} HTTP_HEADERS = { "firefox": ( ("User-Agent", "Mozilla/5.0 ({}; rv:102.0) " "Gecko/20100101 Firefox/102.0"), ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9," "image/avif,image/webp,*/*;q=0.8"), ("Accept-Language", "en-US,en;q=0.5"), ("Accept-Encoding", None), ("Referer", None), ("DNT", "1"), ("Connection", "keep-alive"), ("Upgrade-Insecure-Requests", "1"), ("Cookie", None), ("Sec-Fetch-Dest", "empty"), ("Sec-Fetch-Mode", "no-cors"), ("Sec-Fetch-Site", "same-origin"), ("TE", "trailers"), ), "chrome": ( ("Connection", "keep-alive"), ("Upgrade-Insecure-Requests", "1"), ("User-Agent", "Mozilla/5.0 ({}) AppleWebKit/537.36 (KHTML, " "like Gecko) Chrome/111.0.0.0 Safari/537.36"), ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9," "image/avif,image/webp,image/apng,*/*;q=0.8," "application/signed-exchange;v=b3;q=0.7"), ("Referer", None), ("Sec-Fetch-Site", "same-origin"), ("Sec-Fetch-Mode", "no-cors"), ("Sec-Fetch-Dest", "empty"), ("Accept-Encoding", None), ("Accept-Language", "en-US,en;q=0.9"), ("cookie", None), ("content-length", None), ), } SSL_CIPHERS = { "firefox": ( "TLS_AES_128_GCM_SHA256:" "TLS_CHACHA20_POLY1305_SHA256:" "TLS_AES_256_GCM_SHA384:" "ECDHE-ECDSA-AES128-GCM-SHA256:" "ECDHE-RSA-AES128-GCM-SHA256:" "ECDHE-ECDSA-CHACHA20-POLY1305:" "ECDHE-RSA-CHACHA20-POLY1305:" "ECDHE-ECDSA-AES256-GCM-SHA384:" "ECDHE-RSA-AES256-GCM-SHA384:" "ECDHE-ECDSA-AES256-SHA:" "ECDHE-ECDSA-AES128-SHA:" "ECDHE-RSA-AES128-SHA:" "ECDHE-RSA-AES256-SHA:" "AES128-GCM-SHA256:" "AES256-GCM-SHA384:" "AES128-SHA:" "AES256-SHA" ), "chrome": ( "TLS_AES_128_GCM_SHA256:" "TLS_AES_256_GCM_SHA384:" "TLS_CHACHA20_POLY1305_SHA256:" "ECDHE-ECDSA-AES128-GCM-SHA256:" "ECDHE-RSA-AES128-GCM-SHA256:" "ECDHE-ECDSA-AES256-GCM-SHA384:" "ECDHE-RSA-AES256-GCM-SHA384:" "ECDHE-ECDSA-CHACHA20-POLY1305:" "ECDHE-RSA-CHACHA20-POLY1305:" "ECDHE-RSA-AES128-SHA:" "ECDHE-RSA-AES256-SHA:" "AES128-GCM-SHA256:" "AES256-GCM-SHA384:" "AES128-SHA:" "AES256-SHA" ), } urllib3 = requests.packages.urllib3 # detect brotli support try: BROTLI = urllib3.response.brotli is not None except AttributeError: BROTLI = False # set (urllib3) warnings filter action = config.get((), "warnings", "default") if action: try: import warnings warnings.simplefilter(action, urllib3.exceptions.HTTPWarning) except Exception: pass del action # Undo automatic pyOpenSSL injection by requests pyopenssl = config.get((), "pyopenssl", False) if not pyopenssl: try: from requests.packages.urllib3.contrib import pyopenssl # noqa pyopenssl.extract_from_urllib3() except ImportError: pass del pyopenssl 
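The common.py module above ends here. As a reading aid, the following is a minimal sketch (not part of gallery-dl) of how a site module plugs into the GalleryExtractor machinery defined above: only a URL pattern plus metadata() and images() need to be supplied, while items() in the base class handles enumeration, directory/file messages, and filename handling. The category name, URL pattern, and HTML markers below are invented for illustration.

# Hypothetical site module built on GalleryExtractor; "examplegallery",
# the URL pattern, and the HTML markers are assumptions for illustration.
from .common import GalleryExtractor
from .. import text


class ExampleGalleryExtractor(GalleryExtractor):
    category = "examplegallery"
    root = "https://gallery.example.org"
    # GalleryExtractor.__init__ builds self.gallery_url as root + match.group(1)
    pattern = r"(?:https?://)?gallery\.example\.org(/album/\d+)"

    def metadata(self, page):
        # values returned here are merged into every file's keyword dict
        return {
            "gallery_id": text.parse_int(
                self.gallery_url.rpartition("/")[2]),
            "title": text.unescape(text.extr(page, "<h1>", "</h1>")),
        }

    def images(self, page):
        # (image-url, per-image-metadata) tuples, as items() expects;
        # None means "no extra metadata for this file"
        return [
            (url, None)
            for url in text.extract_iter(page, '<img data-src="', '"')
        ]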
gallery_dl-1.25.0/gallery_dl/extractor/cyberdrop.py

# -*- coding: utf-8 -*-

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://cyberdrop.me/"""

from . import lolisafe
from .. import text


class CyberdropAlbumExtractor(lolisafe.LolisafeAlbumExtractor):
    category = "cyberdrop"
    root = "https://cyberdrop.me"
    pattern = r"(?:https?://)?(?:www\.)?cyberdrop\.(?:me|to)/a/([^/?#]+)"
    test = (
        # images
        ("https://cyberdrop.me/a/keKRjm4t", {
            "pattern": r"https://fs-\d+\.cyberdrop\.to/.*\.(jpg|png|webp)$",
            "keyword": {
                "album_id": "keKRjm4t",
                "album_name": "Fate (SFW)",
                "album_size": 150069254,
                "count": 62,
                "date": "dt:2020-06-18 13:14:20",
                "description": "",
                "id": r"re:\w{8}",
            },
        }),
        # videos
        ("https://cyberdrop.to/a/l8gIAXVD", {
            "pattern": r"https://fs-\d+\.cyberdrop\.to/.*\.mp4$",
            "count": 31,
            "keyword": {
                "album_id": "l8gIAXVD",
                "album_name": "Achelois17 videos",
                "album_size": 652037121,
                "date": "dt:2020-06-16 15:40:44",
            },
        }),
    )

    def fetch_album(self, album_id):
        url = self.root + "/a/" + self.album_id
        extr = text.extract_from(self.request(url).text)

        files = []
        append = files.append
        while True:
            url = text.unescape(extr('id="file" href="', '"'))
            if not url:
                break
            append({"file": url,
                    "_fallback": (self.root + url[url.find("/", 8):],)})

        return files, {
            "album_id"   : self.album_id,
            "album_name" : extr("name: '", "'"),
            "date"       : text.parse_timestamp(extr("timestamp: ", ",")),
            "album_size" : text.parse_int(extr("totalSize: ", ",")),
            "description": extr("description: `", "`"),
            "count"      : len(files),
        }


gallery_dl-1.25.0/gallery_dl/extractor/danbooru.py

# -*- coding: utf-8 -*-

# Copyright 2014-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
"""Extractors for https://danbooru.donmai.us/ and other Danbooru instances""" from .common import BaseExtractor, Message from .. import text, util import datetime class DanbooruExtractor(BaseExtractor): """Base class for danbooru extractors""" basecategory = "Danbooru" filename_fmt = "{category}_{id}_{filename}.{extension}" page_limit = 1000 page_start = None per_page = 200 request_interval = 1.0 def __init__(self, match): BaseExtractor.__init__(self, match) self.ugoira = self.config("ugoira", False) self.external = self.config("external", False) threshold = self.config("threshold") if isinstance(threshold, int): self.threshold = 1 if threshold < 1 else threshold else: self.threshold = self.per_page username, api_key = self._get_auth_info() if username: self.log.debug("Using HTTP Basic Auth for user '%s'", username) self.session.auth = (username, api_key) def skip(self, num): pages = num // self.per_page if pages >= self.page_limit: pages = self.page_limit - 1 self.page_start = pages + 1 return pages * self.per_page def items(self): self.session.headers["User-Agent"] = util.USERAGENT includes = self.config("metadata") if includes: if isinstance(includes, (list, tuple)): includes = ",".join(includes) elif not isinstance(includes, str): includes = "artist_commentary,children,notes,parent,uploader" data = self.metadata() for post in self.posts(): try: url = post["file_url"] except KeyError: if self.external and post["source"]: post.update(data) yield Message.Directory, post yield Message.Queue, post["source"], post continue text.nameext_from_url(url, post) if post["extension"] == "zip": if self.ugoira: post["frames"] = self._ugoira_frames(post) post["_http_adjust_extension"] = False else: url = post["large_file_url"] post["extension"] = "webm" if includes: meta_url = "{}/posts/{}.json?only={}".format( self.root, post["id"], includes) post.update(self.request(meta_url).json()) if url[0] == "/": url = self.root + url post.update(data) yield Message.Directory, post yield Message.Url, url, post def metadata(self): return () def posts(self): return () def _pagination(self, endpoint, params, pages=False): url = self.root + endpoint params["limit"] = self.per_page params["page"] = self.page_start while True: posts = self.request(url, params=params).json() if "posts" in posts: posts = posts["posts"] yield from posts if len(posts) < self.threshold: return if pages: params["page"] += 1 else: for post in reversed(posts): if "id" in post: params["page"] = "b{}".format(post["id"]) break else: return def _ugoira_frames(self, post): data = self.request("{}/posts/{}.json?only=media_metadata".format( self.root, post["id"]) ).json()["media_metadata"]["metadata"] ext = data["ZIP:ZipFileName"].rpartition(".")[2] fmt = ("{:>06}." 
+ ext).format delays = data["Ugoira:FrameDelays"] return [{"file": fmt(index), "delay": delay} for index, delay in enumerate(delays)] BASE_PATTERN = DanbooruExtractor.update({ "danbooru": { "root": None, "pattern": r"(?:danbooru|hijiribe|sonohara|safebooru)\.donmai\.us", }, "atfbooru": { "root": "https://booru.allthefallen.moe", "pattern": r"booru\.allthefallen\.moe", }, "aibooru": { "root": None, "pattern": r"(?:safe.)?aibooru\.online", } }) class DanbooruTagExtractor(DanbooruExtractor): """Extractor for danbooru posts from tag searches""" subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/posts\?(?:[^&#]*&)*tags=([^&#]*)" test = ( ("https://danbooru.donmai.us/posts?tags=bonocho", { "content": "b196fb9f1668109d7774a0a82efea3ffdda07746", }), # test page transitions ("https://danbooru.donmai.us/posts?tags=mushishi", { "count": ">= 300", }), # 'external' option (#1747) ("https://danbooru.donmai.us/posts?tags=pixiv_id%3A1476533", { "options": (("external", True),), "pattern": r"https://i\.pximg\.net/img-original/img" r"/2008/08/28/02/35/48/1476533_p0\.jpg", }), ("https://booru.allthefallen.moe/posts?tags=yume_shokunin", { "count": 12, }), ("https://aibooru.online/posts?tags=center_frills&z=1", { "pattern": r"https://aibooru\.online/data/original" r"/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{32}\.\w+", "count": ">= 3", }), ("https://hijiribe.donmai.us/posts?tags=bonocho"), ("https://sonohara.donmai.us/posts?tags=bonocho"), ("https://safebooru.donmai.us/posts?tags=bonocho"), ("https://safe.aibooru.online/posts?tags=center_frills"), ) def __init__(self, match): DanbooruExtractor.__init__(self, match) tags = match.group(match.lastindex) self.tags = text.unquote(tags.replace("+", " ")) def metadata(self): return {"search_tags": self.tags} def posts(self): return self._pagination("/posts.json", {"tags": self.tags}) class DanbooruPoolExtractor(DanbooruExtractor): """Extractor for posts from danbooru pools""" subcategory = "pool" directory_fmt = ("{category}", "pool", "{pool[id]} {pool[name]}") archive_fmt = "p_{pool[id]}_{id}" pattern = BASE_PATTERN + r"/pool(?:s|/show)/(\d+)" test = ( ("https://danbooru.donmai.us/pools/7659", { "content": "b16bab12bea5f7ea9e0a836bf8045f280e113d99", }), ("https://booru.allthefallen.moe/pools/9", { "url": "902549ffcdb00fe033c3f63e12bc3cb95c5fd8d5", "count": 6, }), ("https://aibooru.online/pools/1"), ("https://danbooru.donmai.us/pool/show/7659"), ) def __init__(self, match): DanbooruExtractor.__init__(self, match) self.pool_id = match.group(match.lastindex) def metadata(self): url = "{}/pools/{}.json".format(self.root, self.pool_id) pool = self.request(url).json() pool["name"] = pool["name"].replace("_", " ") self.post_ids = pool.pop("post_ids", ()) return {"pool": pool} def posts(self): params = {"tags": "pool:" + self.pool_id} return self._pagination("/posts.json", params) class DanbooruPostExtractor(DanbooruExtractor): """Extractor for single danbooru posts""" subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/post(?:s|/show)/(\d+)" test = ( ("https://danbooru.donmai.us/posts/294929", { "content": "5e255713cbf0a8e0801dc423563c34d896bb9229", }), ("https://danbooru.donmai.us/posts/3613024", { "pattern": r"https?://.+\.zip$", "options": (("ugoira", True),) }), ("https://booru.allthefallen.moe/posts/22", { "content": "21dda68e1d7e0a554078e62923f537d8e895cac8", }), ("https://aibooru.online/posts/1", { "content": "54d548743cd67799a62c77cbae97cfa0fec1b7e9", }), 
("https://danbooru.donmai.us/post/show/294929"), ) def __init__(self, match): DanbooruExtractor.__init__(self, match) self.post_id = match.group(match.lastindex) def posts(self): url = "{}/posts/{}.json".format(self.root, self.post_id) return (self.request(url).json(),) class DanbooruPopularExtractor(DanbooruExtractor): """Extractor for popular images from danbooru""" subcategory = "popular" directory_fmt = ("{category}", "popular", "{scale}", "{date}") archive_fmt = "P_{scale[0]}_{date}_{id}" pattern = BASE_PATTERN + r"/(?:explore/posts/)?popular(?:\?([^#]*))?" test = ( ("https://danbooru.donmai.us/explore/posts/popular"), (("https://danbooru.donmai.us/explore/posts/popular" "?date=2013-06-06&scale=week"), { "range": "1-120", "count": 120, }), ("https://booru.allthefallen.moe/explore/posts/popular"), ("https://aibooru.online/explore/posts/popular"), ) def __init__(self, match): DanbooruExtractor.__init__(self, match) self.params = match.group(match.lastindex) def metadata(self): self.params = params = text.parse_query(self.params) scale = params.get("scale", "day") date = params.get("date") or datetime.date.today().isoformat() if scale == "week": date = datetime.date.fromisoformat(date) date = (date - datetime.timedelta(days=date.weekday())).isoformat() elif scale == "month": date = date[:-3] return {"date": date, "scale": scale} def posts(self): if self.page_start is None: self.page_start = 1 return self._pagination( "/explore/posts/popular.json", self.params, True) �������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1651573353.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/desktopography.py��������������������������������������������0000644�0001750�0001750�00000006114�14234201151�022152� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://desktopography.net/""" from .common import Extractor, Message from .. 
import text

BASE_PATTERN = r"(?:https?://)?desktopography\.net"


class DesktopographyExtractor(Extractor):
    """Base class for desktopography extractors"""
    category = "desktopography"
    archive_fmt = "{filename}"
    root = "https://desktopography.net"


class DesktopographySiteExtractor(DesktopographyExtractor):
    """Extractor for all desktopography exhibitions"""
    subcategory = "site"
    pattern = BASE_PATTERN + r"/$"
    test = ("https://desktopography.net/",)

    def items(self):
        page = self.request(self.root).text
        data = {"_extractor": DesktopographyExhibitionExtractor}

        for exhibition_year in text.extract_iter(
                page,
                '<a href="https://desktopography.net/exhibition-', '/">'):
            url = self.root + "/exhibition-" + exhibition_year + "/"
            yield Message.Queue, url, data


class DesktopographyExhibitionExtractor(DesktopographyExtractor):
    """Extractor for a yearly desktopography exhibition"""
    subcategory = "exhibition"
    pattern = BASE_PATTERN + r"/exhibition-([^/?#]+)/"
    test = ("https://desktopography.net/exhibition-2020/",)

    def __init__(self, match):
        DesktopographyExtractor.__init__(self, match)
        self.year = match.group(1)

    def items(self):
        url = "{}/exhibition-{}/".format(self.root, self.year)
        base_entry_url = "https://desktopography.net/portfolios/"
        page = self.request(url).text

        data = {
            "_extractor": DesktopographyEntryExtractor,
            "year": self.year,
        }

        for entry_url in text.extract_iter(
                page,
                '<a class="overlay-background" href="' + base_entry_url,
                '">'):
            url = base_entry_url + entry_url
            yield Message.Queue, url, data


class DesktopographyEntryExtractor(DesktopographyExtractor):
    """Extractor for all resolutions of a desktopography wallpaper"""
    subcategory = "entry"
    pattern = BASE_PATTERN + r"/portfolios/([\w-]+)"
    test = ("https://desktopography.net/portfolios/new-era/",)

    def __init__(self, match):
        DesktopographyExtractor.__init__(self, match)
        self.entry = match.group(1)

    def items(self):
        url = "{}/portfolios/{}".format(self.root, self.entry)
        page = self.request(url).text
        entry_data = {"entry": self.entry}
        yield Message.Directory, entry_data

        for image_data in text.extract_iter(
                page,
                '<a target="_blank" href="https://desktopography.net',
                '">'):
            path, _, filename = image_data.partition(
                '" class="wallpaper-button" download="')
            text.nameext_from_url(filename, entry_data)
            yield Message.Url, self.root + path, entry_data
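The site- and exhibition-level extractors above never download files themselves; they only queue child URLs and tell the job which extractor class should handle each one via the "_extractor" key, so the URL does not have to be matched against every registered pattern again. Below is a condensed sketch of that hand-off; "ExampleSiteExtractor", "ExampleAlbumExtractor", and their URLs are hypothetical.

# Condensed sketch of the Message.Queue / "_extractor" dispatch pattern;
# class names, category, and URLs are assumptions for illustration.
from .common import Extractor, Message
from .. import text


class ExampleAlbumExtractor(Extractor):
    category = "examplesite"
    subcategory = "album"
    root = "https://site.example.org"
    pattern = r"(?:https?://)?site\.example\.org/album/([^/?#]+)"
    # a real child extractor would implement items() and yield Message.Url


class ExampleSiteExtractor(Extractor):
    category = "examplesite"
    subcategory = "site"
    root = "https://site.example.org"
    pattern = r"(?:https?://)?site\.example\.org/$"

    def items(self):
        page = self.request(self.root).text
        # the receiving job instantiates ExampleAlbumExtractor directly
        # instead of re-matching each queued URL against all patterns
        data = {"_extractor": ExampleAlbumExtractor}
        for album_id in text.extract_iter(page, '<a href="/album/', '"'):
            yield Message.Queue, self.root + "/album/" + album_id, data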
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/deviantart.py������������������������������������������������0000644�0001750�0001750�00000220267�14402447374�021277� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.deviantart.com/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache, memcache import collections import itertools import mimetypes import binascii import time import re BASE_PATTERN = ( r"(?:https?://)?(?:" r"(?:www\.)?(?:fx)?deviantart\.com/(?!watch/)([\w-]+)|" r"(?!www\.)([\w-]+)\.(?:fx)?deviantart\.com)" ) class DeviantartExtractor(Extractor): """Base class for deviantart extractors""" category = "deviantart" root = "https://www.deviantart.com" directory_fmt = ("{category}", "{username}") filename_fmt = "{category}_{index}_{title}.{extension}" cookiedomain = None cookienames = ("auth", "auth_secure", "userinfo") _last_request = 0 def __init__(self, match): Extractor.__init__(self, match) self.flat = self.config("flat", True) self.extra = self.config("extra", False) self.original = self.config("original", True) self.comments = self.config("comments", False) self.user = match.group(1) or match.group(2) self.group = False self.offset = 0 self.api = None unwatch = self.config("auto-unwatch") if unwatch: self.unwatch = [] self.finalize = self._unwatch_premium else: self.unwatch = None if self.original != "image": self._update_content = self._update_content_default else: self._update_content = self._update_content_image self.original = True self._premium_cache = {} self.commit_journal = { "html": self._commit_journal_html, "text": self._commit_journal_text, }.get(self.config("journals", "html")) def skip(self, num): self.offset += num return num def login(self): if not self._check_cookies(self.cookienames): username, password = self._get_auth_info() if not username: return False self._update_cookies(_login_impl(self, username, password)) return True def items(self): self.api = DeviantartOAuthAPI(self) if self.user and self.config("group", True): profile = self.api.user_profile(self.user) self.group = not profile if self.group: self.subcategory = "group-" + self.subcategory self.user = self.user.lower() else: self.user = profile["user"]["username"] for deviation in self.deviations(): if isinstance(deviation, tuple): url, data = deviation yield Message.Queue, url, data continue if deviation["is_deleted"]: # prevent crashing in case the deviation really is # deleted self.log.debug( 
"Skipping %s (deleted)", deviation["deviationid"]) continue if "premium_folder_data" in deviation: data = self._fetch_premium(deviation) if not data: continue deviation.update(data) self.prepare(deviation) yield Message.Directory, deviation if "content" in deviation: content = deviation["content"] if self.original and deviation["is_downloadable"]: self._update_content(deviation, content) else: self._update_token(deviation, content) yield self.commit(deviation, content) elif deviation["is_downloadable"]: content = self.api.deviation_download(deviation["deviationid"]) yield self.commit(deviation, content) if "videos" in deviation and deviation["videos"]: video = max(deviation["videos"], key=lambda x: text.parse_int(x["quality"][:-1])) yield self.commit(deviation, video) if "flash" in deviation: yield self.commit(deviation, deviation["flash"]) if self.commit_journal: if "excerpt" in deviation: journal = self.api.deviation_content( deviation["deviationid"]) elif "body" in deviation: journal = {"html": deviation.pop("body")} else: journal = None if journal: if self.extra: deviation["_journal"] = journal["html"] yield self.commit_journal(deviation, journal) if not self.extra: continue # ref: https://www.deviantart.com # /developers/http/v1/20210526/object/editor_text # the value of "features" is a JSON string with forward # slashes escaped text_content = \ deviation["text_content"]["body"]["features"].replace( "\\/", "/") if "text_content" in deviation else None for txt in (text_content, deviation.get("description"), deviation.get("_journal")): if txt is None: continue for match in DeviantartStashExtractor.pattern.finditer(txt): url = text.ensure_http_scheme(match.group(0)) deviation["_extractor"] = DeviantartStashExtractor yield Message.Queue, url, deviation def deviations(self): """Return an iterable containing all relevant Deviation-objects""" def prepare(self, deviation): """Adjust the contents of a Deviation-object""" if "index" not in deviation: try: if deviation["url"].startswith("https://sta.sh"): filename = deviation["content"]["src"].split("/")[5] deviation["index_base36"] = filename.partition("-")[0][1:] deviation["index"] = id_from_base36( deviation["index_base36"]) else: deviation["index"] = text.parse_int( deviation["url"].rpartition("-")[2]) except KeyError: deviation["index"] = 0 deviation["index_base36"] = "0" if "index_base36" not in deviation: deviation["index_base36"] = base36_from_id(deviation["index"]) if self.user: deviation["username"] = self.user deviation["_username"] = self.user.lower() else: deviation["username"] = deviation["author"]["username"] deviation["_username"] = deviation["username"].lower() deviation["da_category"] = deviation["category"] deviation["published_time"] = text.parse_int( deviation["published_time"]) deviation["date"] = text.parse_timestamp( deviation["published_time"]) if self.comments: deviation["comments"] = ( self.api.comments(deviation["deviationid"], target="deviation") if deviation["stats"]["comments"] else () ) # filename metadata sub = re.compile(r"\W").sub deviation["filename"] = "".join(( sub("_", deviation["title"].lower()), "_by_", sub("_", deviation["author"]["username"].lower()), "-d", deviation["index_base36"], )) @staticmethod def commit(deviation, target): url = target["src"] name = target.get("filename") or url target = target.copy() target["filename"] = deviation["filename"] deviation["target"] = target deviation["extension"] = target["extension"] = text.ext_from_url(name) return Message.Url, url, deviation def 
_commit_journal_html(self, deviation, journal): title = text.escape(deviation["title"]) url = deviation["url"] thumbs = deviation.get("thumbs") or deviation.get("files") html = journal["html"] shadow = SHADOW_TEMPLATE.format_map(thumbs[0]) if thumbs else "" if "css" in journal: css, cls = journal["css"], "withskin" elif html.startswith("<style"): css, _, html = html.partition("</style>") css = css.partition(">")[2] cls = "withskin" else: css, cls = "", "journal-green" if html.find('<div class="boxtop journaltop">', 0, 250) != -1: needle = '<div class="boxtop journaltop">' header = HEADER_CUSTOM_TEMPLATE.format( title=title, url=url, date=deviation["date"], ) else: needle = '<div usr class="gr">' catlist = deviation["category_path"].split("/") categories = " / ".join( ('<span class="crumb"><a href="{}/{}/"><span>{}</span></a>' '</span>').format(self.root, cpath, cat.capitalize()) for cat, cpath in zip( catlist, itertools.accumulate(catlist, lambda t, c: t + "/" + c) ) ) username = deviation["author"]["username"] urlname = deviation.get("username") or username.lower() header = HEADER_TEMPLATE.format( title=title, url=url, userurl="{}/{}/".format(self.root, urlname), username=username, date=deviation["date"], categories=categories, ) if needle in html: html = html.replace(needle, header, 1) else: html = JOURNAL_TEMPLATE_HTML_EXTRA.format(header, html) html = JOURNAL_TEMPLATE_HTML.format( title=title, html=html, shadow=shadow, css=css, cls=cls) deviation["extension"] = "htm" return Message.Url, html, deviation @staticmethod def _commit_journal_text(deviation, journal): html = journal["html"] if html.startswith("<style"): html = html.partition("</style>")[2] head, _, tail = html.rpartition("<script") content = "\n".join( text.unescape(text.remove_html(txt)) for txt in (head or tail).split("<br />") ) txt = JOURNAL_TEMPLATE_TEXT.format( title=deviation["title"], username=deviation["author"]["username"], date=deviation["date"], content=content, ) deviation["extension"] = "txt" return Message.Url, txt, deviation @staticmethod def _find_folder(folders, name, uuid): if uuid.isdecimal(): match = re.compile(name.replace( "-", r"[^a-z0-9]+") + "$", re.IGNORECASE).match for folder in folders: if match(folder["name"]): return folder else: for folder in folders: if folder["folderid"] == uuid: return folder raise exception.NotFoundError("folder") def _folder_urls(self, folders, category, extractor): base = "{}/{}/{}/".format(self.root, self.user, category) for folder in folders: folder["_extractor"] = extractor url = "{}{}/{}".format(base, folder["folderid"], folder["name"]) yield url, folder def _update_content_default(self, deviation, content): public = "premium_folder_data" not in deviation data = self.api.deviation_download(deviation["deviationid"], public) content.update(data) def _update_content_image(self, deviation, content): data = self.api.deviation_download(deviation["deviationid"]) url = data["src"].partition("?")[0] mtype = mimetypes.guess_type(url, False)[0] if mtype and mtype.startswith("image/"): content.update(data) def _update_token(self, deviation, content): """Replace JWT to be able to remove width/height limits All credit goes to @Ironchest337 for discovering and implementing this method """ url, sep, _ = content["src"].partition("/v1/") if not sep: return # header = b'{"typ":"JWT","alg":"none"}' payload = ( b'{"sub":"urn:app:","iss":"urn:app:","obj":[[{"path":"/f/' + url.partition("/f/")[2].encode() + b'"}]],"aud":["urn:service:file.download"]}' ) deviation["_fallback"] = 
(content["src"],) content["src"] = ( "{}?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJub25lIn0.{}.".format( url, # base64 of 'header' is precomputed as 'eyJ0eX...' # binascii.a2b_base64(header).rstrip(b"=\n").decode(), binascii.b2a_base64(payload).rstrip(b"=\n").decode()) ) def _limited_request(self, url, **kwargs): """Limits HTTP requests to one every 2 seconds""" kwargs["fatal"] = None diff = time.time() - DeviantartExtractor._last_request if diff < 2.0: self.sleep(2.0 - diff, "request") while True: response = self.request(url, **kwargs) if response.status_code != 403 or \ b"Request blocked." not in response.content: DeviantartExtractor._last_request = time.time() return response self.wait(seconds=180) def _fetch_premium(self, deviation): try: return self._premium_cache[deviation["deviationid"]] except KeyError: pass if not self.api.refresh_token_key: self.log.warning( "Unable to access premium content (no refresh-token)") self._fetch_premium = lambda _: None return None dev = self.api.deviation(deviation["deviationid"], False) folder = dev["premium_folder_data"] username = dev["author"]["username"] has_access = folder["has_access"] if not has_access and folder["type"] == "watchers" and \ self.config("auto-watch"): if self.unwatch is not None: self.unwatch.append(username) if self.api.user_friends_watch(username): has_access = True self.log.info( "Watching %s for premium folder access", username) else: self.log.warning( "Error when trying to watch %s. " "Try again with a new refresh-token", username) if has_access: self.log.info("Fetching premium folder data") else: self.log.warning("Unable to access premium content (type: %s)", folder["type"]) cache = self._premium_cache for dev in self.api.gallery( username, folder["gallery_id"], public=False): cache[dev["deviationid"]] = dev if has_access else None return cache[deviation["deviationid"]] def _unwatch_premium(self): for username in self.unwatch: self.log.info("Unwatching %s", username) self.api.user_friends_unwatch(username) def _eclipse_to_oauth(self, eclipse_api, deviations): for obj in deviations: deviation = obj["deviation"] if "deviation" in obj else obj deviation_uuid = eclipse_api.deviation_extended_fetch( deviation["deviationId"], deviation["author"]["username"], "journal" if deviation["isJournal"] else "art", )["deviation"]["extended"]["deviationUuid"] yield self.api.deviation(deviation_uuid) class DeviantartUserExtractor(DeviantartExtractor): """Extractor for an artist's user profile""" subcategory = "user" pattern = BASE_PATTERN + r"/?$" test = ( ("https://www.deviantart.com/shimoda7", { "pattern": r"/shimoda7/gallery$", }), ("https://www.deviantart.com/shimoda7", { "options": (("include", "all"),), "pattern": r"/shimoda7/" r"(gallery(/scraps)?|posts(/statuses)?|favourites)$", "count": 5, }), ("https://shimoda7.deviantart.com/"), ) def items(self): base = "{}/{}/".format(self.root, self.user) return self._dispatch_extractors(( (DeviantartGalleryExtractor , base + "gallery"), (DeviantartScrapsExtractor , base + "gallery/scraps"), (DeviantartJournalExtractor , base + "posts"), (DeviantartStatusExtractor , base + "posts/statuses"), (DeviantartFavoriteExtractor, base + "favourites"), ), ("gallery",)) ############################################################################### # OAuth ####################################################################### class DeviantartGalleryExtractor(DeviantartExtractor): """Extractor for all deviations from an artist's gallery""" subcategory = "gallery" archive_fmt = 
"g_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/gallery(?:/all|/?\?catpath=)?/?$" test = ( ("https://www.deviantart.com/shimoda7/gallery/", { "pattern": r"https://(images-)?wixmp-[^.]+\.wixmp\.com" r"/f/.+/.+\.(jpg|png)\?token=.+", "count": ">= 30", "keyword": { "allows_comments": bool, "author": { "type": "regular", "usericon": str, "userid": "9AE51FC7-0278-806C-3FFF-F4961ABF9E2B", "username": "shimoda7", }, "category_path": str, "content": { "filesize": int, "height": int, "src": str, "transparency": bool, "width": int, }, "da_category": str, "date": "type:datetime", "deviationid": str, "?download_filesize": int, "extension": str, "index": int, "is_deleted": bool, "is_downloadable": bool, "is_favourited": bool, "is_mature": bool, "preview": { "height": int, "src": str, "transparency": bool, "width": int, }, "published_time": int, "stats": { "comments": int, "favourites": int, }, "target": dict, "thumbs": list, "title": str, "url": r"re:https://www.deviantart.com/shimoda7/art/[^/]+-\d+", "username": "shimoda7", }, }), # group ("https://www.deviantart.com/yakuzafc/gallery", { "pattern": r"https://www.deviantart.com/yakuzafc/gallery" r"/\w{8}-\w{4}-\w{4}-\w{4}-\w{12}/", "count": ">= 15", }), # 'folders' option (#276) ("https://www.deviantart.com/justatest235723/gallery", { "count": 3, "options": (("metadata", 1), ("folders", 1), ("original", 0)), "keyword": { "description": str, "folders": list, "is_watching": bool, "license": str, "tags": list, }, }), ("https://www.deviantart.com/shimoda8/gallery/", { "exception": exception.NotFoundError, }), ("https://www.deviantart.com/shimoda7/gallery"), ("https://www.deviantart.com/shimoda7/gallery/all"), ("https://www.deviantart.com/shimoda7/gallery/?catpath=/"), ("https://shimoda7.deviantart.com/gallery/"), ("https://shimoda7.deviantart.com/gallery/all/"), ("https://shimoda7.deviantart.com/gallery/?catpath=/"), ) def deviations(self): if self.flat and not self.group: return self.api.gallery_all(self.user, self.offset) folders = self.api.gallery_folders(self.user) return self._folder_urls(folders, "gallery", DeviantartFolderExtractor) class DeviantartFolderExtractor(DeviantartExtractor): """Extractor for deviations inside an artist's gallery folder""" subcategory = "folder" directory_fmt = ("{category}", "{username}", "{folder[title]}") archive_fmt = "F_{folder[uuid]}_{index}.{extension}" pattern = BASE_PATTERN + r"/gallery/([^/?#]+)/([^/?#]+)" test = ( # user ("https://www.deviantart.com/shimoda7/gallery/722019/Miscellaneous", { "count": 5, "options": (("original", False),), }), # group ("https://www.deviantart.com/yakuzafc/gallery/37412168/Crafts", { "count": ">= 4", "options": (("original", False),), }), # uuid (("https://www.deviantart.com/shimoda7/gallery" "/B38E3C6A-2029-6B45-757B-3C8D3422AD1A/misc"), { "count": 5, "options": (("original", False),), }), # name starts with '_', special characters (#1451) (("https://www.deviantart.com/justatest235723" "/gallery/69302698/-test-b-c-d-e-f-"), { "count": 1, "options": (("original", False),), }), ("https://shimoda7.deviantart.com/gallery/722019/Miscellaneous"), ("https://yakuzafc.deviantart.com/gallery/37412168/Crafts"), ) def __init__(self, match): DeviantartExtractor.__init__(self, match) self.folder = None self.folder_id = match.group(3) self.folder_name = match.group(4) def deviations(self): folders = self.api.gallery_folders(self.user) folder = self._find_folder(folders, self.folder_name, self.folder_id) self.folder = { "title": folder["name"], "uuid" : folder["folderid"], "index": 
self.folder_id, "owner": self.user, } return self.api.gallery(self.user, folder["folderid"], self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["folder"] = self.folder class DeviantartStashExtractor(DeviantartExtractor): """Extractor for sta.sh-ed deviations""" subcategory = "stash" archive_fmt = "{index}.{extension}" pattern = r"(?:https?://)?sta\.sh/([a-z0-9]+)" test = ( ("https://sta.sh/022c83odnaxc", { "pattern": r"https://wixmp-[^.]+\.wixmp\.com" r"/f/.+/.+\.png\?token=.+", "content": "057eb2f2861f6c8a96876b13cca1a4b7a408c11f", "count": 1, }), # multiple stash items ("https://sta.sh/21jf51j7pzl2", { "options": (("original", False),), "count": 4, }), # downloadable, but no "content" field (#307) ("https://sta.sh/024t4coz16mi", { "pattern": r"https://wixmp-[^.]+\.wixmp\.com" r"/f/.+/.+\.rar\?token=.+", "count": 1, }), # mixed folders and images (#659) ("https://sta.sh/215twi387vfj", { "options": (("original", False),), "count": 4, }), ("https://sta.sh/abcdefghijkl", { "count": 0, }), ) skip = Extractor.skip def __init__(self, match): DeviantartExtractor.__init__(self, match) self.user = None self.stash_id = match.group(1) def deviations(self, stash_id=None): if stash_id is None: stash_id = self.stash_id url = "https://sta.sh/" + stash_id page = self._limited_request(url).text if stash_id[0] == "0": uuid = text.extr(page, '//deviation/', '"') if uuid: deviation = self.api.deviation(uuid) deviation["index"] = text.parse_int(text.extr( page, 'gmi-deviationid="', '"')) yield deviation return for item in text.extract_iter( page, 'class="stash-thumb-container', '</div>'): url = text.extr(item, '<a href="', '"') if url: stash_id = url.rpartition("/")[2] else: stash_id = text.extr(item, 'gmi-stashid="', '"') stash_id = "2" + util.bencode(text.parse_int( stash_id), "0123456789abcdefghijklmnopqrstuvwxyz") if len(stash_id) > 2: yield from self.deviations(stash_id) class DeviantartFavoriteExtractor(DeviantartExtractor): """Extractor for an artist's favorites""" subcategory = "favorite" directory_fmt = ("{category}", "{username}", "Favourites") archive_fmt = "f_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/favourites(?:/all|/?\?catpath=)?/?$" test = ( ("https://www.deviantart.com/h3813067/favourites/", { "options": (("metadata", True), ("flat", False)), # issue #271 "count": 1, }), ("https://www.deviantart.com/h3813067/favourites/", { "content": "6a7c74dc823ebbd457bdd9b3c2838a6ee728091e", }), ("https://www.deviantart.com/h3813067/favourites/all"), ("https://www.deviantart.com/h3813067/favourites/?catpath=/"), ("https://h3813067.deviantart.com/favourites/"), ("https://h3813067.deviantart.com/favourites/all"), ("https://h3813067.deviantart.com/favourites/?catpath=/"), ) def deviations(self): if self.flat: return self.api.collections_all(self.user, self.offset) folders = self.api.collections_folders(self.user) return self._folder_urls( folders, "favourites", DeviantartCollectionExtractor) class DeviantartCollectionExtractor(DeviantartExtractor): """Extractor for a single favorite collection""" subcategory = "collection" directory_fmt = ("{category}", "{username}", "Favourites", "{collection[title]}") archive_fmt = "C_{collection[uuid]}_{index}.{extension}" pattern = BASE_PATTERN + r"/favourites/([^/?#]+)/([^/?#]+)" test = ( (("https://www.deviantart.com/pencilshadings/favourites" "/70595441/3D-Favorites"), { "count": ">= 15", "options": (("original", False),), }), (("https://www.deviantart.com/pencilshadings/favourites" 
"/F050486B-CB62-3C66-87FB-1105A7F6379F/3D Favorites"), { "count": ">= 15", "options": (("original", False),), }), ("https://pencilshadings.deviantart.com" "/favourites/70595441/3D-Favorites"), ) def __init__(self, match): DeviantartExtractor.__init__(self, match) self.collection = None self.collection_id = match.group(3) self.collection_name = match.group(4) def deviations(self): folders = self.api.collections_folders(self.user) folder = self._find_folder( folders, self.collection_name, self.collection_id) self.collection = { "title": folder["name"], "uuid" : folder["folderid"], "index": self.collection_id, "owner": self.user, } return self.api.collections(self.user, folder["folderid"], self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["collection"] = self.collection class DeviantartJournalExtractor(DeviantartExtractor): """Extractor for an artist's journals""" subcategory = "journal" directory_fmt = ("{category}", "{username}", "Journal") archive_fmt = "j_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/(?:posts(?:/journals)?|journal)/?(?:\?.*)?$" test = ( ("https://www.deviantart.com/angrywhitewanker/posts/journals/", { "url": "38db2a0d3a587a7e0f9dba7ff7d274610ebefe44", }), ("https://www.deviantart.com/angrywhitewanker/posts/journals/", { "url": "b2a8e74d275664b1a4acee0fca0a6fd33298571e", "options": (("journals", "text"),), }), ("https://www.deviantart.com/angrywhitewanker/posts/journals/", { "count": 0, "options": (("journals", "none"),), }), ("https://www.deviantart.com/shimoda7/posts/"), ("https://www.deviantart.com/shimoda7/journal/"), ("https://www.deviantart.com/shimoda7/journal/?catpath=/"), ("https://shimoda7.deviantart.com/journal/"), ("https://shimoda7.deviantart.com/journal/?catpath=/"), ) def deviations(self): return self.api.browse_user_journals(self.user, self.offset) class DeviantartStatusExtractor(DeviantartExtractor): """Extractor for an artist's status updates""" subcategory = "status" directory_fmt = ("{category}", "{username}", "Status") filename_fmt = "{category}_{index}_{title}_{date}.{extension}" archive_fmt = "S_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/posts/statuses" test = ( ("https://www.deviantart.com/t1na/posts/statuses", { "count": 0, }), ("https://www.deviantart.com/justgalym/posts/statuses", { "count": 4, "url": "bf4c44c0c60ff2648a880f4c3723464ad3e7d074", }), # shared deviation ("https://www.deviantart.com/justgalym/posts/statuses", { "options": (("journals", "none"),), "count": 1, "pattern": r"https://images-wixmp-\w+\.wixmp\.com/f" r"/[^/]+/[^.]+\.jpg\?token=", }), # shared sta.sh item ("https://www.deviantart.com/vanillaghosties/posts/statuses", { "options": (("journals", "none"), ("original", False)), "range": "5-", "count": 1, "keyword": { "index" : int, "index_base36": "re:^[0-9a-z]+$", "url" : "re:^https://sta.sh", }, }), # "deleted" deviations in 'items' ("https://www.deviantart.com/AndrejSKalin/posts/statuses", { "options": (("journals", "none"), ("original", 0), ("image-filter", "deviationid[:8] == '147C8B03'")), "count": 2, "archive": False, "keyword": {"deviationid": "147C8B03-7D34-AE93-9241-FA3C6DBBC655"} }), ("https://www.deviantart.com/justgalym/posts/statuses", { "options": (("journals", "text"),), "url": "c8744f7f733a3029116607b826321233c5ca452d", }), ) def deviations(self): for status in self.api.user_statuses(self.user, self.offset): yield from self.status(status) def status(self, status): for item in status.get("items") or (): # do not trust is_share # 
shared deviations/statuses if "deviation" in item: yield item["deviation"].copy() if "status" in item: yield from self.status(item["status"].copy()) # assume is_deleted == true means necessary fields are missing if status["is_deleted"]: self.log.warning( "Skipping status %s (deleted)", status.get("statusid")) return yield status def prepare(self, deviation): if "deviationid" in deviation: return DeviantartExtractor.prepare(self, deviation) try: path = deviation["url"].split("/") deviation["index"] = text.parse_int(path[-1] or path[-2]) except KeyError: deviation["index"] = 0 if self.user: deviation["username"] = self.user deviation["_username"] = self.user.lower() else: deviation["username"] = deviation["author"]["username"] deviation["_username"] = deviation["username"].lower() deviation["date"] = dt = text.parse_datetime(deviation["ts"]) deviation["published_time"] = int(util.datetime_to_timestamp(dt)) deviation["da_category"] = "Status" deviation["category_path"] = "status" deviation["is_downloadable"] = False deviation["title"] = "Status Update" comments_count = deviation.pop("comments_count", 0) deviation["stats"] = {"comments": comments_count} if self.comments: deviation["comments"] = ( self.api.comments(deviation["statusid"], target="status") if comments_count else () ) class DeviantartPopularExtractor(DeviantartExtractor): """Extractor for popular deviations""" subcategory = "popular" directory_fmt = ("{category}", "Popular", "{popular[range]}", "{popular[search]}") archive_fmt = "P_{popular[range]}_{popular[search]}_{index}.{extension}" pattern = (r"(?:https?://)?www\.deviantart\.com/(?:" r"(?:deviations/?)?\?order=(popular-[^/?#]+)" r"|((?:[\w-]+/)*)(popular-[^/?#]+)" r")/?(?:\?([^#]*))?") test = ( ("https://www.deviantart.com/?order=popular-all-time", { "options": (("original", False),), "range": "1-30", "count": 30, }), ("https://www.deviantart.com/popular-24-hours/?q=tree+house", { "options": (("original", False),), "range": "1-30", "count": 30, }), ("https://www.deviantart.com/artisan/popular-all-time/?q=tree"), ) def __init__(self, match): DeviantartExtractor.__init__(self, match) self.user = "" trange1, path, trange2, query = match.groups() query = text.parse_query(query) self.search_term = query.get("q") trange = trange1 or trange2 or query.get("order", "") if trange.startswith("popular-"): trange = trange[8:] self.time_range = { "newest" : "now", "most-recent" : "now", "this-week" : "1week", "this-month" : "1month", "this-century": "alltime", "all-time" : "alltime", }.get(trange, "alltime") self.popular = { "search": self.search_term or "", "range" : trange or "all-time", "path" : path.strip("/") if path else "", } def deviations(self): if self.time_range == "now": return self.api.browse_newest(self.search_term, self.offset) return self.api.browse_popular( self.search_term, self.time_range, self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["popular"] = self.popular class DeviantartTagExtractor(DeviantartExtractor): """Extractor for deviations from tag searches""" subcategory = "tag" directory_fmt = ("{category}", "Tags", "{search_tags}") archive_fmt = "T_{search_tags}_{index}.{extension}" pattern = r"(?:https?://)?www\.deviantart\.com/tag/([^/?#]+)" test = ("https://www.deviantart.com/tag/nature", { "options": (("original", False),), "range": "1-30", "count": 30, }) def __init__(self, match): DeviantartExtractor.__init__(self, match) self.tag = text.unquote(match.group(1)) def deviations(self): return 
self.api.browse_tags(self.tag, self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["search_tags"] = self.tag class DeviantartWatchExtractor(DeviantartExtractor): """Extractor for Deviations from watched users""" subcategory = "watch" pattern = (r"(?:https?://)?(?:www\.)?deviantart\.com" r"/(?:watch/deviations|notifications/watch)()()") test = ( ("https://www.deviantart.com/watch/deviations"), ("https://www.deviantart.com/notifications/watch"), ) def deviations(self): return self.api.browse_deviantsyouwatch() class DeviantartWatchPostsExtractor(DeviantartExtractor): """Extractor for Posts from watched users""" subcategory = "watch-posts" pattern = r"(?:https?://)?(?:www\.)?deviantart\.com/watch/posts()()" test = ("https://www.deviantart.com/watch/posts",) def deviations(self): return self.api.browse_posts_deviantsyouwatch() ############################################################################### # Eclipse ##################################################################### class DeviantartDeviationExtractor(DeviantartExtractor): """Extractor for single deviations""" subcategory = "deviation" archive_fmt = "g_{_username}_{index}.{extension}" pattern = (BASE_PATTERN + r"/(art|journal)/(?:[^/?#]+-)?(\d+)" r"|(?:https?://)?(?:www\.)?(?:fx)?deviantart\.com/" r"(?:view/|deviation/|view(?:-full)?\.php/*\?(?:[^#]+&)?id=)" r"(\d+)" # bare deviation ID without slug r"|(?:https?://)?fav\.me/d([0-9a-z]+)") # base36 test = ( (("https://www.deviantart.com/shimoda7/art/For-the-sake-10073852"), { "options": (("original", 0),), "content": "6a7c74dc823ebbd457bdd9b3c2838a6ee728091e", }), ("https://www.deviantart.com/zzz/art/zzz-1234567890", { "exception": exception.NotFoundError, }), (("https://www.deviantart.com/myria-moon/art/Aime-Moi-261986576"), { "options": (("comments", True),), "keyword": {"comments": list}, "pattern": r"https://wixmp-[^.]+\.wixmp\.com" r"/f/.+/.+\.jpg\?token=.+", }), # wixmp URL rewrite (("https://www.deviantart.com/citizenfresh/art/Hverarond-789295466"), { "pattern": (r"https://images-wixmp-\w+\.wixmp\.com/f" r"/[^/]+/[^.]+\.jpg\?token="), }), # GIF (#242) (("https://www.deviantart.com/skatergators/art/COM-Moni-781571783"), { "pattern": r"https://wixmp-\w+\.wixmp\.com/f/03fd2413-efe9-4e5c-" r"8734-2b72605b3fbb/dcxbsnb-1bbf0b38-42af-4070-8878-" r"f30961955bec\.gif\?token=ey...", }), # Flash animation with GIF preview (#1731) ("https://www.deviantart.com/yuumei/art/Flash-Comic-214724929", { "pattern": r"https://wixmp-[^.]+\.wixmp\.com" r"/f/.+/.+\.swf\?token=.+", "keyword": { "filename": "flash_comic_tutorial_by_yuumei-d3juatd", "extension": "swf", }, }), # sta.sh URLs from description (#302) (("https://www.deviantart.com/uotapo/art/INANAKI-Memo-590297498"), { "options": (("extra", 1), ("original", 0)), "pattern": DeviantartStashExtractor.pattern, "range": "2-", "count": 4, }), # sta.sh URL from deviation["text_content"]["body"]["features"] (("https://www.deviantart.com" "/cimar-wildehopps/art/Honorary-Vixen-859809305"), { "options": (("extra", 1),), "pattern": ("text:<!DOCTYPE html>\n|" + DeviantartStashExtractor.pattern), "count": 2, }), # journal ("https://www.deviantart.com/shimoda7/journal/ARTility-583755752", { "url": "d34b2c9f873423e665a1b8ced20fcb75951694a3", "pattern": "text:<!DOCTYPE html>\n", }), # journal-like post with isJournal == False (#419) ("https://www.deviantart.com/gliitchlord/art/brashstrokes-812942668", { "url": "e2e0044bd255304412179b6118536dbd9bb3bb0e", "pattern": "text:<!DOCTYPE html>\n", }), # /view/ URLs 
("https://deviantart.com/view/904858796/", { "content": "8770ec40ad1c1d60f6b602b16301d124f612948f", }), ("http://www.deviantart.com/view/890672057", { "content": "1497e13d925caeb13a250cd666b779a640209236", }), ("https://www.deviantart.com/view/706871727", { "content": "3f62ae0c2fca2294ac28e41888ea06bb37c22c65", }), ("https://www.deviantart.com/view/1", { "exception": exception.NotFoundError, }), # /deviation/ (#3558) ("https://www.deviantart.com/deviation/817215762"), # fav.me (#3558) ("https://fav.me/ddijrpu", { "count": 1, }), ("https://fav.me/dddd", { "exception": exception.NotFoundError, }), # old-style URLs ("https://shimoda7.deviantart.com" "/art/For-the-sake-of-a-memory-10073852"), ("https://myria-moon.deviantart.com" "/art/Aime-Moi-part-en-vadrouille-261986576"), ("https://zzz.deviantart.com/art/zzz-1234567890"), # old /view/ URLs from the Wayback Machine ("https://www.deviantart.com/view.php?id=14864502"), ("http://www.deviantart.com/view-full.php?id=100842"), ("https://www.fxdeviantart.com/zzz/art/zzz-1234567890"), ("https://www.fxdeviantart.com/view/1234567890"), ) skip = Extractor.skip def __init__(self, match): DeviantartExtractor.__init__(self, match) self.type = match.group(3) self.deviation_id = \ match.group(4) or match.group(5) or id_from_base36(match.group(6)) def deviations(self): url = "{}/{}/{}/{}".format( self.root, self.user or "u", self.type or "art", self.deviation_id) uuid = text.extract(self._limited_request(url).text, '"deviationUuid\\":\\"', '\\')[0] if not uuid: raise exception.NotFoundError("deviation") return (self.api.deviation(uuid),) class DeviantartScrapsExtractor(DeviantartExtractor): """Extractor for an artist's scraps""" subcategory = "scraps" directory_fmt = ("{category}", "{username}", "Scraps") archive_fmt = "s_{_username}_{index}.{extension}" cookiedomain = ".deviantart.com" pattern = BASE_PATTERN + r"/gallery/(?:\?catpath=)?scraps\b" test = ( ("https://www.deviantart.com/shimoda7/gallery/scraps", { "count": 12, }), ("https://www.deviantart.com/shimoda7/gallery/?catpath=scraps"), ("https://shimoda7.deviantart.com/gallery/?catpath=scraps"), ) def deviations(self): self.login() eclipse_api = DeviantartEclipseAPI(self) return self._eclipse_to_oauth( eclipse_api, eclipse_api.gallery_scraps(self.user, self.offset)) class DeviantartSearchExtractor(DeviantartExtractor): """Extractor for deviantart search results""" subcategory = "search" directory_fmt = ("{category}", "Search", "{search_tags}") archive_fmt = "Q_{search_tags}_{index}.{extension}" cookiedomain = ".deviantart.com" pattern = (r"(?:https?://)?www\.deviantart\.com" r"/search(?:/deviations)?/?\?([^#]+)") test = ( ("https://www.deviantart.com/search?q=tree"), ("https://www.deviantart.com/search/deviations?order=popular-1-week"), ) skip = Extractor.skip def __init__(self, match): DeviantartExtractor.__init__(self, match) self.query = text.parse_query(self.user) self.search = self.query.get("q", "") self.user = "" def deviations(self): logged_in = self.login() eclipse_api = DeviantartEclipseAPI(self) search = (eclipse_api.search_deviations if logged_in else self._search_html) return self._eclipse_to_oauth(eclipse_api, search(self.query)) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["search_tags"] = self.search def _search_html(self, params): url = self.root + "/search" deviation = { "deviationId": None, "author": {"username": "u"}, "isJournal": False, } while True: page = self.request(url, params=params).text items , pos = text.rextract(page, 
r'\"items\":[', ']') cursor, pos = text.extract(page, r'\"cursor\":\"', '\\', pos) for deviation_id in items.split(","): deviation["deviationId"] = deviation_id yield deviation if not cursor: return params["cursor"] = cursor class DeviantartGallerySearchExtractor(DeviantartExtractor): """Extractor for deviantart gallery searches""" subcategory = "gallery-search" archive_fmt = "g_{_username}_{index}.{extension}" cookiedomain = ".deviantart.com" pattern = BASE_PATTERN + r"/gallery/?\?(q=[^#]+)" test = ( ("https://www.deviantart.com/shimoda7/gallery?q=memory", { "options": (("original", 0),), "content": "6a7c74dc823ebbd457bdd9b3c2838a6ee728091e", }), ("https://www.deviantart.com/shimoda7/gallery?q=memory&sort=popular"), ) def __init__(self, match): DeviantartExtractor.__init__(self, match) self.query = match.group(3) def deviations(self): self.login() eclipse_api = DeviantartEclipseAPI(self) info = eclipse_api.user_info(self.user) query = text.parse_query(self.query) self.search = query["q"] return self._eclipse_to_oauth( eclipse_api, eclipse_api.galleries_search( info["user"]["userId"], self.search, self.offset, query.get("sort", "most-recent"), )) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["search_tags"] = self.search class DeviantartFollowingExtractor(DeviantartExtractor): """Extractor for user's watched users""" subcategory = "following" pattern = BASE_PATTERN + "/about#watching$" test = ("https://www.deviantart.com/shimoda7/about#watching", { "pattern": DeviantartUserExtractor.pattern, "range": "1-50", "count": 50, }) def items(self): eclipse_api = DeviantartEclipseAPI(self) for user in eclipse_api.user_watching(self.user, self.offset): url = "{}/{}".format(self.root, user["username"]) user["_extractor"] = DeviantartUserExtractor yield Message.Queue, url, user ############################################################################### # API Interfaces ############################################################## class DeviantartOAuthAPI(): """Interface for the DeviantArt OAuth API Ref: https://www.deviantart.com/developers/http/v1/20160316 """ CLIENT_ID = "5388" CLIENT_SECRET = "76b08c69cfb27f26d6161f9ab6d061a1" def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.headers = {"dA-minor-version": "20200519"} self._warn_429 = True self.delay = extractor.config("wait-min", 0) self.delay_min = max(2, self.delay) self.mature = extractor.config("mature", "true") if not isinstance(self.mature, str): self.mature = "true" if self.mature else "false" self.folders = extractor.config("folders", False) self.metadata = extractor.extra or extractor.config("metadata", False) self.strategy = extractor.config("pagination") self.client_id = extractor.config("client-id") if self.client_id: self.client_secret = extractor.config("client-secret") else: self.client_id = self.CLIENT_ID self.client_secret = self.CLIENT_SECRET token = extractor.config("refresh-token") if token is None or token == "cache": token = "#" + str(self.client_id) if not _refresh_token_cache(token): token = None self.refresh_token_key = token self.log.debug( "Using %s API credentials (client-id %s)", "default" if self.client_id == self.CLIENT_ID else "custom", self.client_id, ) def browse_deviantsyouwatch(self, offset=0): """Yield deviations from users you watch""" endpoint = "/browse/deviantsyouwatch" params = {"limit": "50", "offset": offset, "mature_content": self.mature} return self._pagination(endpoint, params, public=False) def 
browse_posts_deviantsyouwatch(self, offset=0): """Yield posts from users you watch""" endpoint = "/browse/posts/deviantsyouwatch" params = {"limit": "50", "offset": offset, "mature_content": self.mature} return self._pagination(endpoint, params, public=False, unpack=True) def browse_newest(self, query=None, offset=0): """Browse newest deviations""" endpoint = "/browse/newest" params = { "q" : query, "limit" : 50 if self.metadata else 120, "offset" : offset, "mature_content": self.mature, } return self._pagination(endpoint, params) def browse_popular(self, query=None, timerange=None, offset=0): """Yield popular deviations""" endpoint = "/browse/popular" params = { "q" : query, "limit" : 50 if self.metadata else 120, "timerange" : timerange, "offset" : offset, "mature_content": self.mature, } return self._pagination(endpoint, params) def browse_tags(self, tag, offset=0): """ Browse a tag """ endpoint = "/browse/tags" params = { "tag" : tag, "offset" : offset, "limit" : 50, "mature_content": self.mature, } return self._pagination(endpoint, params) def browse_user_journals(self, username, offset=0): """Yield all journal entries of a specific user""" endpoint = "/browse/user/journals" params = {"username": username, "offset": offset, "limit": 50, "mature_content": self.mature, "featured": "false"} return self._pagination(endpoint, params) def collections(self, username, folder_id, offset=0): """Yield all Deviation-objects contained in a collection folder""" endpoint = "/collections/" + folder_id params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) def collections_all(self, username, offset=0): """Yield all deviations in a user's collection""" endpoint = "/collections/all" params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) @memcache(keyarg=1) def collections_folders(self, username, offset=0): """Yield all collection folders of a specific user""" endpoint = "/collections/folders" params = {"username": username, "offset": offset, "limit": 50, "mature_content": self.mature} return self._pagination_list(endpoint, params) def comments(self, id, target, offset=0): """Fetch comments posted on a target""" endpoint = "/comments/{}/{}".format(target, id) params = {"maxdepth": "5", "offset": offset, "limit": 50, "mature_content": self.mature} return self._pagination_list(endpoint, params=params, key="thread") def deviation(self, deviation_id, public=True): """Query and return info about a single Deviation""" endpoint = "/deviation/" + deviation_id deviation = self._call(endpoint, public=public) if self.metadata: self._metadata((deviation,)) if self.folders: self._folders((deviation,)) return deviation def deviation_content(self, deviation_id, public=True): """Get extended content of a single Deviation""" endpoint = "/deviation/content" params = {"deviationid": deviation_id} content = self._call(endpoint, params=params, public=public) if public and content["html"].startswith( ' <span class=\"username-with-symbol'): if self.refresh_token_key: content = self._call(endpoint, params=params, public=False) else: self.log.warning("Private Journal") return content def deviation_download(self, deviation_id, public=True): """Get the original file download (if allowed)""" endpoint = "/deviation/download/" + deviation_id params = {"mature_content": self.mature} return self._call(endpoint, params=params, public=public) def deviation_metadata(self, 
deviations): """ Fetch deviation metadata for a set of deviations""" endpoint = "/deviation/metadata?" + "&".join( "deviationids[{}]={}".format(num, deviation["deviationid"]) for num, deviation in enumerate(deviations) ) params = {"mature_content": self.mature} return self._call(endpoint, params=params)["metadata"] def gallery(self, username, folder_id, offset=0, extend=True, public=True): """Yield all Deviation-objects contained in a gallery folder""" endpoint = "/gallery/" + folder_id params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature, "mode": "newest"} return self._pagination(endpoint, params, extend, public) def gallery_all(self, username, offset=0): """Yield all Deviation-objects of a specific user""" endpoint = "/gallery/all" params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) @memcache(keyarg=1) def gallery_folders(self, username, offset=0): """Yield all gallery folders of a specific user""" endpoint = "/gallery/folders" params = {"username": username, "offset": offset, "limit": 50, "mature_content": self.mature} return self._pagination_list(endpoint, params) @memcache(keyarg=1) def user_profile(self, username): """Get user profile information""" endpoint = "/user/profile/" + username return self._call(endpoint, fatal=False) def user_statuses(self, username, offset=0): """Yield status updates of a specific user""" endpoint = "/user/statuses/" params = {"username": username, "offset": offset, "limit": 50} return self._pagination(endpoint, params) def user_friends_watch(self, username): """Watch a user""" endpoint = "/user/friends/watch/" + username data = { "watch[friend]" : "0", "watch[deviations]" : "0", "watch[journals]" : "0", "watch[forum_threads]": "0", "watch[critiques]" : "0", "watch[scraps]" : "0", "watch[activity]" : "0", "watch[collections]" : "0", "mature_content" : self.mature, } return self._call( endpoint, method="POST", data=data, public=False, fatal=False, ).get("success") def user_friends_unwatch(self, username): """Unwatch a user""" endpoint = "/user/friends/unwatch/" + username return self._call( endpoint, method="POST", public=False, fatal=False, ).get("success") def authenticate(self, refresh_token_key): """Authenticate the application by requesting an access token""" self.headers["Authorization"] = \ self._authenticate_impl(refresh_token_key) @cache(maxage=3600, keyarg=1) def _authenticate_impl(self, refresh_token_key): """Actual authenticate implementation""" url = "https://www.deviantart.com/oauth2/token" if refresh_token_key: self.log.info("Refreshing private access token") data = {"grant_type": "refresh_token", "refresh_token": _refresh_token_cache(refresh_token_key)} else: self.log.info("Requesting public access token") data = {"grant_type": "client_credentials"} auth = (self.client_id, self.client_secret) response = self.extractor.request( url, method="POST", data=data, auth=auth, fatal=False) data = response.json() if response.status_code != 200: self.log.debug("Server response: %s", data) raise exception.AuthenticationError('"{}" ({})'.format( data.get("error_description"), data.get("error"))) if refresh_token_key: _refresh_token_cache.update( refresh_token_key, data["refresh_token"]) return "Bearer " + data["access_token"] def _call(self, endpoint, fatal=True, public=True, **kwargs): """Call an API endpoint""" url = "https://www.deviantart.com/api/v1/oauth2" + endpoint kwargs["fatal"] = None while True: if self.delay: 
self.extractor.sleep(self.delay, "api") self.authenticate(None if public else self.refresh_token_key) kwargs["headers"] = self.headers response = self.extractor.request(url, **kwargs) data = response.json() status = response.status_code if 200 <= status < 400: if self.delay > self.delay_min: self.delay -= 1 return data if not fatal and status != 429: return None if data.get("error_description") == "User not found.": raise exception.NotFoundError("user or group") self.log.debug(response.text) msg = "API responded with {} {}".format( status, response.reason) if status == 429: if self.delay < 30: self.delay += 1 self.log.warning("%s. Using %ds delay.", msg, self.delay) if self._warn_429 and self.delay >= 3: self._warn_429 = False if self.client_id == self.CLIENT_ID: self.log.info( "Register your own OAuth application and use its " "credentials to prevent this error: " "https://github.com/mikf/gallery-dl/blob/master/do" "cs/configuration.rst#extractordeviantartclient-id" "--client-secret") else: self.log.error(msg) return data def _pagination(self, endpoint, params, extend=True, public=True, unpack=False, key="results"): warn = True while True: data = self._call(endpoint, params=params, public=public) if key not in data: self.log.error("Unexpected API response: %s", data) return results = data[key] if unpack: results = [item["journal"] for item in results if "journal" in item] if extend: if public and len(results) < params["limit"]: if self.refresh_token_key: self.log.debug("Switching to private access token") public = False continue elif data["has_more"] and warn: warn = False self.log.warning( "Private deviations detected! Run 'gallery-dl " "oauth:deviantart' and follow the instructions to " "be able to access them.") # "statusid" cannot be used instead if results and "deviationid" in results[0]: if self.metadata: self._metadata(results) if self.folders: self._folders(results) else: # attempt to fix "deleted" deviations for dev in self._shared_content(results): if not dev["is_deleted"]: continue patch = self._call( "/deviation/" + dev["deviationid"], fatal=False) if patch: dev.update(patch) yield from results if not data["has_more"] and ( self.strategy != "manual" or not results or not extend): return if "next_cursor" in data: params["offset"] = None params["cursor"] = data["next_cursor"] elif data["next_offset"] is not None: params["offset"] = data["next_offset"] params["cursor"] = None else: if params.get("offset") is None: return params["offset"] = int(params["offset"]) + len(results) @staticmethod def _shared_content(results): """Return an iterable of shared deviations in 'results'""" for result in results: for item in result.get("items") or (): if "deviation" in item: yield item["deviation"] def _pagination_list(self, endpoint, params, key="results"): result = [] result.extend(self._pagination(endpoint, params, False, key=key)) return result def _metadata(self, deviations): """Add extended metadata to each deviation object""" for deviation, metadata in zip( deviations, self.deviation_metadata(deviations)): deviation.update(metadata) deviation["tags"] = [t["tag_name"] for t in deviation["tags"]] def _folders(self, deviations): """Add a list of all containing folders to each deviation object""" for deviation in deviations: deviation["folders"] = self._folders_map( deviation["author"]["username"])[deviation["deviationid"]] @memcache(keyarg=1) def _folders_map(self, username): """Generate a deviation_id -> folders mapping for 'username'""" self.log.info("Collecting folder information for 
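# Condensed, standalone sketch of the continuation logic in _pagination()
# above: prefer a cursor when the API supplies one, otherwise fall back to
# numeric offsets. 'call_api' stands in for _call(); illustrative only.
def paginate(call_api, endpoint, params):
    while True:
        data = call_api(endpoint, params=params)
        results = data.get("results") or []
        yield from results
        if not data.get("has_more"):
            return
        if "next_cursor" in data:
            params["offset"] = None
            params["cursor"] = data["next_cursor"]
        elif data.get("next_offset") is not None:
            params["offset"] = data["next_offset"]
            params["cursor"] = None
        else:
            params["offset"] = int(params.get("offset") or 0) + len(results)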
'%s'", username) folders = self.gallery_folders(username) # create 'folderid'-to-'folder' mapping fmap = { folder["folderid"]: folder for folder in folders } # add parent names to folders, but ignore "Featured" as parent featured = folders[0]["folderid"] done = False while not done: done = True for folder in folders: parent = folder["parent"] if not parent: pass elif parent == featured: folder["parent"] = None else: parent = fmap[parent] if parent["parent"]: done = False else: folder["name"] = parent["name"] + "/" + folder["name"] folder["parent"] = None # map deviationids to folder names dmap = collections.defaultdict(list) for folder in folders: for deviation in self.gallery( username, folder["folderid"], 0, False): dmap[deviation["deviationid"]].append(folder["name"]) return dmap class DeviantartEclipseAPI(): """Interface to the DeviantArt Eclipse API""" def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.request = self.extractor._limited_request self.csrf_token = None def deviation_extended_fetch(self, deviation_id, user=None, kind=None): endpoint = "/da-browse/shared_api/deviation/extended_fetch" params = { "deviationid" : deviation_id, "username" : user, "type" : kind, "include_session": "false", } return self._call(endpoint, params) def gallery_scraps(self, user, offset=None): endpoint = "/da-user-profile/api/gallery/contents" params = { "username" : user, "offset" : offset, "limit" : 24, "scraps_folder": "true", } return self._pagination(endpoint, params) def galleries_search(self, user_id, query, offset=None, order="most-recent"): endpoint = "/shared_api/galleries/search" params = { "userid": user_id, "order" : order, "q" : query, "offset": offset, "limit" : 24, } return self._pagination(endpoint, params) def search_deviations(self, params): endpoint = "/da-browse/api/networkbar/search/deviations" return self._pagination(endpoint, params, key="deviations") def user_info(self, user, expand=False): endpoint = "/shared_api/user/info" params = {"username": user} if expand: params["expand"] = "user.stats,user.profile,user.watch" return self._call(endpoint, params) def user_watching(self, user, offset=None): endpoint = "/da-user-profile/api/module/watching" params = { "username": user, "moduleid": self._module_id_watching(user), "offset" : offset, "limit" : 24, } return self._pagination(endpoint, params) def _call(self, endpoint, params): url = "https://www.deviantart.com/_napi" + endpoint headers = {"Referer": "https://www.deviantart.com/"} params["csrf_token"] = self.csrf_token or self._fetch_csrf_token() response = self.request( url, params=params, headers=headers, fatal=None) if response.status_code == 404: raise exception.StopExtraction( "Your account must use the Eclipse interface.") try: return response.json() except Exception: return {"error": response.text} def _pagination(self, endpoint, params, key="results"): limit = params.get("limit", 24) warn = True while True: data = self._call(endpoint, params) results = data.get(key) if results is None: return if len(results) < limit and warn and data.get("hasMore"): warn = False self.log.warning( "Private deviations detected! 
" "Provide login credentials or session cookies " "to be able to access them.") yield from results if not data.get("hasMore"): return if "nextCursor" in data: params["offset"] = None params["cursor"] = data["nextCursor"] elif "nextOffset" in data: params["offset"] = data["nextOffset"] params["cursor"] = None elif params.get("offset") is None: return else: params["offset"] = int(params["offset"]) + len(results) def _module_id_watching(self, user): url = "{}/{}/about".format(self.extractor.root, user) page = self.request(url).text pos = page.find('\\"type\\":\\"watching\\"') if pos < 0: raise exception.NotFoundError("module") self._fetch_csrf_token(page) return text.rextract(page, '\\"id\\":', ',', pos)[0].strip('" ') def _fetch_csrf_token(self, page=None): if page is None: page = self.request(self.extractor.root + "/").text self.csrf_token = token = text.extr( page, "window.__CSRF_TOKEN__ = '", "'") return token @cache(maxage=100*365*86400, keyarg=0) def _refresh_token_cache(token): if token and token[0] == "#": return None return token @cache(maxage=28*86400, keyarg=1) def _login_impl(extr, username, password): extr.log.info("Logging in as %s", username) url = "https://www.deviantart.com/users/login" page = extr.request(url).text data = {} for item in text.extract_iter(page, '<input type="hidden" name="', '"/>'): name, _, value = item.partition('" value="') data[name] = value challenge = data.get("challenge") if challenge and challenge != "0": extr.log.warning("Login requires solving a CAPTCHA") extr.log.debug(challenge) data["username"] = username data["password"] = password data["remember"] = "on" extr.sleep(2.0, "login") url = "https://www.deviantart.com/_sisu/do/signin" response = extr.request(url, method="POST", data=data) if not response.history: raise exception.AuthenticationError() return { cookie.name: cookie.value for cookie in extr.session.cookies } def id_from_base36(base36): return util.bdecode(base36, _ALPHABET) def base36_from_id(deviation_id): return util.bencode(int(deviation_id), _ALPHABET) _ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz" ############################################################################### # Journal Formats ############################################################# SHADOW_TEMPLATE = """ <span class="shadow"> <img src="{src}" class="smshadow" width="{width}" height="{height}"> </span> <br><br> """ HEADER_TEMPLATE = """<div usr class="gr"> <div class="metadata"> <h2><a href="{url}">{title}</a></h2> <ul> <li class="author"> by <span class="name"><span class="username-with-symbol u"> <a class="u regular username" href="{userurl}">{username}</a>\ <span class="user-symbol regular"></span></span></span>, <span>{date}</span> </li> <li class="category"> {categories} </li> </ul> </div> """ HEADER_CUSTOM_TEMPLATE = """<div class='boxtop journaltop'> <h2> <img src="https://st.deviantart.net/minish/gruzecontrol/icons/journal.gif\ ?2" style="vertical-align:middle" alt=""/> <a href="{url}">{title}</a> </h2> Journal Entry: <span>{date}</span> """ JOURNAL_TEMPLATE_HTML = """text:<!DOCTYPE html> <html> <head> <meta charset="utf-8"> <title>{title}
{shadow}
{html}
""" JOURNAL_TEMPLATE_HTML_EXTRA = """\
\
{}
{}
""" JOURNAL_TEMPLATE_TEXT = """text:{title} by {username}, {date} {content} """ ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1678397180.0 gallery_dl-1.25.0/gallery_dl/extractor/directlink.py0000644000175000017500000000537714402447374021271 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Direct link handling""" from .common import Extractor, Message from .. import text class DirectlinkExtractor(Extractor): """Extractor for direct links to images and other media files""" category = "directlink" filename_fmt = "{domain}/{path}/{filename}.{extension}" archive_fmt = filename_fmt pattern = (r"(?i)https?://(?P[^/?#]+)/(?P[^?#]+\." r"(?:jpe?g|jpe|png|gif|web[mp]|mp4|mkv|og[gmv]|opus))" r"(?:\?(?P[^#]*))?(?:#(?P.*))?$") test = ( (("https://en.wikipedia.org/static/images/project-logos/enwiki.png"), { "url": "18c5d00077332e98e53be9fed2ee4be66154b88d", "keyword": "105770a3f4393618ab7b811b731b22663b5d3794", }), # empty path (("https://example.org/file.webm"), { "url": "2d807ed7059d1b532f1bb71dc24b510b80ff943f", "keyword": "29dad729c40fb09349f83edafa498dba1297464a", }), # more complex example ("https://example.org/path/to/file.webm?que=1?&ry=2/#fragment", { "url": "6fb1061390f8aada3db01cb24b51797c7ee42b31", "keyword": "3d7abc31d45ba324e59bc599c3b4862452d5f29c", }), # percent-encoded characters ("https://example.org/%27%3C%23/%23%3E%27.jpg?key=%3C%26%3E", { "url": "2627e8140727fdf743f86fe18f69f99a052c9718", "keyword": "831790fddda081bdddd14f96985ab02dc5b5341f", }), # upper case file extension (#296) ("https://post-phinf.pstatic.net/MjAxOTA1MjlfMTQ4/MDAxNTU5MTI2NjcyNTkw" ".JUzkGb4V6dj9DXjLclrOoqR64uDxHFUO5KDriRdKpGwg.88mCtd4iT1NHlpVKSCaUpP" "mZPiDgT8hmQdQ5K_gYyu0g.JPEG/2.JPG"), # internationalized domain name ("https://räksmörgås.josefsson.org/raksmorgas.jpg", { "url": "a65667f670b194afbd1e3ea5e7a78938d36747da", "keyword": "fd5037fe86eebd4764e176cbaf318caec0f700be", }), ) def __init__(self, match): Extractor.__init__(self, match) self.data = match.groupdict() def items(self): data = self.data for key, value in data.items(): if value: data[key] = text.unquote(value) data["path"], _, name = data["path"].rpartition("/") data["filename"], _, ext = name.rpartition(".") data["extension"] = ext.lower() data["_http_headers"] = { "Referer": self.url.encode("latin-1", "ignore")} yield Message.Directory, data yield Message.Url, self.url, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675805818.0 gallery_dl-1.25.0/gallery_dl/extractor/dynastyscans.py0000644000175000017500000001263614370542172021654 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://dynasty-scans.com/""" from .common import ChapterExtractor, MangaExtractor, Extractor, Message from .. 
import text, util import re BASE_PATTERN = r"(?:https?://)?(?:www\.)?dynasty-scans\.com" class DynastyscansBase(): """Base class for dynastyscans extractors""" category = "dynastyscans" root = "https://dynasty-scans.com" def _parse_image_page(self, image_id): url = "{}/images/{}".format(self.root, image_id) extr = text.extract_from(self.request(url).text) date = extr("class='create_at'>", "") tags = extr("class='tags'>", "") src = extr("class='btn-group'>", "") url = extr(' src="', '"') src = text.extr(src, 'href="', '"') if "Source<" in src else "" return { "url" : self.root + url, "image_id": text.parse_int(image_id), "tags" : text.split_html(tags), "date" : text.remove_html(date), "source" : text.unescape(src), } class DynastyscansChapterExtractor(DynastyscansBase, ChapterExtractor): """Extractor for manga-chapters from dynasty-scans.com""" pattern = BASE_PATTERN + r"(/chapters/[^/?#]+)" test = ( (("http://dynasty-scans.com/chapters/" "hitoribocchi_no_oo_seikatsu_ch33"), { "url": "dce64e8c504118f1ab4135c00245ea12413896cb", "keyword": "b67599703c27316a2fe4f11c3232130a1904e032", }), (("http://dynasty-scans.com/chapters/" "new_game_the_spinoff_special_13"), { "url": "dbe5bbb74da2edcfb1832895a484e2a40bc8b538", "keyword": "6b674eb3a274999153f6be044973b195008ced2f", }), ) def metadata(self, page): extr = text.extract_from(page) match = re.match( (r"(?:]*>)?([^<]+)(?:
)?" # manga name r"(?: ch(\d+)([^:<]*))?" # chapter info r"(?:: (.+))?"), # title extr("

", ""), ) author = extr(" by ", "") group = extr('"icon-print"> ', '') return { "manga" : text.unescape(match.group(1)), "chapter" : text.parse_int(match.group(2)), "chapter_minor": match.group(3) or "", "title" : text.unescape(match.group(4) or ""), "author" : text.remove_html(author), "group" : (text.remove_html(group) or text.extr(group, ' alt="', '"')), "date" : text.parse_datetime(extr( '"icon-calendar"> ', '<'), "%b %d, %Y"), "lang" : "en", "language": "English", } def images(self, page): data = text.extr(page, "var pages = ", ";\n") return [ (self.root + img["image"], None) for img in util.json_loads(data) ] class DynastyscansMangaExtractor(DynastyscansBase, MangaExtractor): chapterclass = DynastyscansChapterExtractor reverse = False pattern = BASE_PATTERN + r"(/series/[^/?#]+)" test = ("https://dynasty-scans.com/series/hitoribocchi_no_oo_seikatsu", { "pattern": DynastyscansChapterExtractor.pattern, "count": ">= 100", }) def chapters(self, page): return [ (self.root + path, {}) for path in text.extract_iter(page, '
\n= 70", }), ("https://e926.net/explore/posts/popular"), (("https://e926.net/explore/posts/popular" "?date=2019-06-01&scale=month"), { "pattern": r"https://static\d.e926.net/data/../../[0-9a-f]+", "count": ">= 70", }), ) def posts(self): if self.page_start is None: self.page_start = 1 return self._pagination("/popular.json", self.params, True) class E621FavoriteExtractor(E621Extractor): """Extractor for e621 favorites""" subcategory = "favorite" directory_fmt = ("{category}", "Favorites", "{user_id}") archive_fmt = "f_{user_id}_{id}" pattern = BASE_PATTERN + r"/favorites(?:\?([^#]*))?" test = ( ("https://e621.net/favorites"), ("https://e621.net/favorites?page=2&user_id=53275", { "pattern": r"https://static\d.e621.net/data/../../[0-9a-f]+", "count": "> 260", }), ("https://e926.net/favorites"), ("https://e926.net/favorites?page=2&user_id=53275", { "pattern": r"https://static\d.e926.net/data/../../[0-9a-f]+", "count": "> 260", }), ) def __init__(self, match): E621Extractor.__init__(self, match) self.query = text.parse_query(match.group(match.lastindex)) def metadata(self): return {"user_id": self.query.get("user_id", "")} def posts(self): if self.page_start is None: self.page_start = 1 return self._pagination("/favorites.json", self.query, True) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1677790111.0 gallery_dl-1.25.0/gallery_dl/extractor/erome.py0000644000175000017500000001117514400205637020231 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.erome.com/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache import itertools BASE_PATTERN = r"(?:https?://)?(?:www\.)?erome\.com" class EromeExtractor(Extractor): category = "erome" directory_fmt = ("{category}", "{user}") filename_fmt = "{album_id} {title} {num:>02}.{extension}" archive_fmt = "{album_id}_{num}" root = "https://www.erome.com" def __init__(self, match): Extractor.__init__(self, match) self.item = match.group(1) self.__cookies = True def items(self): for album_id in self.albums(): url = "{}/a/{}".format(self.root, album_id) try: page = self.request(url).text except exception.HttpError as exc: self.log.warning( "Unable to fetch album '%s' (%s)", album_id, exc) continue title, pos = text.extract( page, 'property="og:title" content="', '"') pos = page.index('
Please wait a few moments", 0, 600) < 0: return response self.sleep(5.0, "check") def _pagination(self, url, params): for params["page"] in itertools.count(1): page = self.request(url, params=params).text album_ids = EromeAlbumExtractor.pattern.findall(page) yield from album_ids if len(album_ids) < 36: return class EromeAlbumExtractor(EromeExtractor): """Extractor for albums on erome.com""" subcategory = "album" pattern = BASE_PATTERN + r"/a/(\w+)" test = ( ("https://www.erome.com/a/NQgdlWvk", { "pattern": r"https://v\d+\.erome\.com/\d+" r"/NQgdlWvk/j7jlzmYB_480p\.mp4", "count": 1, "keyword": { "album_id": "NQgdlWvk", "num": 1, "title": "porn", "user": "yYgWBZw8o8qsMzM", }, }), ("https://www.erome.com/a/TdbZ4ogi", { "pattern": r"https://s\d+\.erome\.com/\d+/TdbZ4ogi/\w+", "count": 6, "keyword": { "album_id": "TdbZ4ogi", "num": int, "title": "82e78cfbb461ad87198f927fcb1fda9a1efac9ff.", "user": "yYgWBZw8o8qsMzM", }, }), ) def albums(self): return (self.item,) class EromeUserExtractor(EromeExtractor): subcategory = "user" pattern = BASE_PATTERN + r"/(?!a/|search\?)([^/?#]+)" test = ("https://www.erome.com/yYgWBZw8o8qsMzM", { "range": "1-25", "count": 25, }) def albums(self): url = "{}/{}".format(self.root, self.item) return self._pagination(url, {}) class EromeSearchExtractor(EromeExtractor): subcategory = "search" pattern = BASE_PATTERN + r"/search\?q=([^&#]+)" test = ("https://www.erome.com/search?q=cute", { "range": "1-25", "count": 25, }) def albums(self): url = self.root + "/search" params = {"q": text.unquote(self.item)} return self._pagination(url, params) @cache() def _cookie_cache(): return () ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/exhentai.py0000644000175000017500000005001014363473075020731 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://e-hentai.org/ and https://exhentai.org/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache import itertools import math BASE_PATTERN = r"(?:https?://)?(e[x-]|g\.e-)hentai\.org" class ExhentaiExtractor(Extractor): """Base class for exhentai extractors""" category = "exhentai" directory_fmt = ("{category}", "{gid} {title[:247]}") filename_fmt = ( "{gid}_{num:>04}_{image_token}_{filename}.{extension}") archive_fmt = "{gid}_{num}" cookienames = ("ipb_member_id", "ipb_pass_hash") cookiedomain = ".exhentai.org" root = "https://exhentai.org" request_interval = 5.0 LIMIT = False def __init__(self, match): # allow calling 'self.config()' before 'Extractor.__init__()' self._cfgpath = ("extractor", self.category, self.subcategory) version = match.group(1) domain = self.config("domain", "auto") if domain == "auto": domain = ("ex" if version == "ex" else "e-") + "hentai.org" self.root = "https://" + domain self.cookiedomain = "." 
+ domain Extractor.__init__(self, match) self.original = self.config("original", True) limits = self.config("limits", False) if limits and limits.__class__ is int: self.limits = limits self._remaining = 0 else: self.limits = False self.session.headers["Referer"] = self.root + "/" if version != "ex": self.session.cookies.set("nw", "1", domain=self.cookiedomain) def request(self, *args, **kwargs): response = Extractor.request(self, *args, **kwargs) if self._is_sadpanda(response): self.log.info("sadpanda.jpg") raise exception.AuthorizationError() return response def login(self): """Login and set necessary cookies""" if self.LIMIT: raise exception.StopExtraction("Image limit reached!") if self._check_cookies(self.cookienames): return username, password = self._get_auth_info() if username: self._update_cookies(self._login_impl(username, password)) else: self.log.info("no username given; using e-hentai.org") self.root = "https://e-hentai.org" self.original = False self.limits = False self.session.cookies["nw"] = "1" @cache(maxage=90*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = "https://forums.e-hentai.org/index.php?act=Login&CODE=01" headers = { "Referer": "https://e-hentai.org/bounce_login.php?b=d&bt=1-1", } data = { "CookieDate": "1", "b": "d", "bt": "1-1", "UserName": username, "PassWord": password, "ipb_login_submit": "Login!", } response = self.request(url, method="POST", headers=headers, data=data) if b"You are now logged in as:" not in response.content: raise exception.AuthenticationError() return {c: response.cookies[c] for c in self.cookienames} @staticmethod def _is_sadpanda(response): """Return True if the response object contains a sad panda""" return ( response.headers.get("Content-Length") == "9615" and "sadpanda.jpg" in response.headers.get("Content-Disposition", "") ) class ExhentaiGalleryExtractor(ExhentaiExtractor): """Extractor for image galleries from exhentai.org""" subcategory = "gallery" pattern = (BASE_PATTERN + r"(?:/g/(\d+)/([\da-f]{10})" r"|/s/([\da-f]{10})/(\d+)-(\d+))") test = ( ("https://exhentai.org/g/1200119/d55c44d3d0/", { "options": (("original", False),), "keyword": { "cost": int, "date": "dt:2018-03-18 20:14:00", "eh_category": "Non-H", "expunged": False, "favorites": r"re:^[12]\d$", "filecount": "4", "filesize": 1488978, "gid": 1200119, "height": int, "image_token": "re:[0-9a-f]{10}", "lang": "ja", "language": "Japanese", "parent": "", "rating": r"re:\d\.\d+", "size": int, "tags": [ "parody:komi-san wa komyushou desu.", "character:shouko komi", "group:seventh lowlife", "other:sample", ], "thumb": "https://exhentai.org/t/ce/0a/ce0a5bcb583229a9b07c0f8" "3bcb1630ab1350640-624622-736-1036-jpg_250.jpg", "title": "C93 [Seventh_Lowlife] Komi-san ha Tokidoki Daitan de" "su (Komi-san wa Komyushou desu) [Sample]", "title_jpn": "(C93) [Comiketjack (わ!)] 古見さんは、時々大胆" "です。 (古見さんは、コミュ症です。) [見本]", "token": "d55c44d3d0", "torrentcount": "0", "uploader": "klorpa", "width": int, }, "content": ("2c68cff8a7ca540a78c36fdbf5fbae0260484f87", "e9891a4c017ed0bb734cd1efba5cd03f594d31ff"), }), ("https://exhentai.org/g/960461/4f0e369d82/", { "exception": exception.NotFoundError, }), ("http://exhentai.org/g/962698/7f02358e00/", { "exception": exception.AuthorizationError, }), ("https://exhentai.org/s/f68367b4c8/1200119-3", { "options": (("original", False),), "count": 2, }), ("https://e-hentai.org/s/f68367b4c8/1200119-3", { "options": (("original", False),), "count": 2, }), ("https://g.e-hentai.org/g/1200119/d55c44d3d0/"), 
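# Small sketch of the check behind _is_sadpanda() above: exhentai answers
# unauthorized requests with a fixed 9615-byte placeholder image whose
# Content-Disposition names "sadpanda.jpg", so the headers alone identify it.
def looks_like_sadpanda(headers):
    return (headers.get("Content-Length") == "9615"
            and "sadpanda.jpg" in headers.get("Content-Disposition", ""))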
) def __init__(self, match): ExhentaiExtractor.__init__(self, match) self.key = {} self.count = 0 self.gallery_id = text.parse_int(match.group(2) or match.group(5)) self.gallery_token = match.group(3) self.image_token = match.group(4) self.image_num = text.parse_int(match.group(6), 1) source = self.config("source") if source == "hitomi": self.items = self._items_hitomi def items(self): self.login() if self.gallery_token: gpage = self._gallery_page() self.image_token = text.extr(gpage, 'hentai.org/s/', '"') if not self.image_token: self.log.error("Failed to extract initial image token") self.log.debug("Page content:\n%s", gpage) return ipage = self._image_page() else: ipage = self._image_page() part = text.extr(ipage, 'hentai.org/g/', '"') if not part: self.log.error("Failed to extract gallery token") self.log.debug("Page content:\n%s", ipage) return self.gallery_token = part.split("/")[1] gpage = self._gallery_page() data = self.get_metadata(gpage) self.count = text.parse_int(data["filecount"]) yield Message.Directory, data def _validate_response(response): # declared inside 'items()' to be able to access 'data' if not response.history and response.headers.get( "content-type", "").startswith("text/html"): self._report_limits(data) return True images = itertools.chain( (self.image_from_page(ipage),), self.images_from_api()) for url, image in images: data.update(image) if self.limits: self._check_limits(data) if "/fullimg.php" in url: data["_http_validate"] = _validate_response else: data["_http_validate"] = None yield Message.Url, url, data def _items_hitomi(self): if self.config("metadata", False): data = self.metadata_from_api() data["date"] = text.parse_timestamp(data["posted"]) else: data = {} from .hitomi import HitomiGalleryExtractor url = "https://hitomi.la/galleries/{}.html".format(self.gallery_id) data["_extractor"] = HitomiGalleryExtractor yield Message.Queue, url, data def get_metadata(self, page): """Extract gallery metadata""" data = self.metadata_from_page(page) if self.config("metadata", False): data.update(self.metadata_from_api()) data["date"] = text.parse_timestamp(data["posted"]) return data def metadata_from_page(self, page): extr = text.extract_from(page) data = { "gid" : self.gallery_id, "token" : self.gallery_token, "thumb" : extr("background:transparent url(", ")"), "title" : text.unescape(extr('

', '

')), "title_jpn" : text.unescape(extr('

', '

')), "_" : extr('
', '<'), "uploader" : extr('
', '
'), "date" : text.parse_datetime(extr( '>Posted:

'), "%Y-%m-%d %H:%M"), "parent" : extr( '>Parent:
', 'Visible:', '<'), "language" : extr('>Language:', ' '), "filesize" : text.parse_bytes(extr( '>File Size:', '<').rstrip("Bb")), "filecount" : extr('>Length:', ' '), "favorites" : extr('id="favcount">', ' '), "rating" : extr(">Average: ", "<"), "torrentcount" : extr('>Torrent Download (', ')'), } if data["uploader"].startswith("<"): data["uploader"] = text.unescape(text.extr( data["uploader"], ">", "<")) f = data["favorites"][0] if f == "N": data["favorites"] = "0" elif f == "O": data["favorites"] = "1" data["lang"] = util.language_to_code(data["language"]) data["tags"] = [ text.unquote(tag.replace("+", " ")) for tag in text.extract_iter(page, 'hentai.org/tag/', '"') ] return data def metadata_from_api(self): url = self.root + "/api.php" data = { "method": "gdata", "gidlist": ((self.gallery_id, self.gallery_token),), "namespace": 1, } data = self.request(url, method="POST", json=data).json() if "error" in data: raise exception.StopExtraction(data["error"]) return data["gmetadata"][0] def image_from_page(self, page): """Get image url and data from webpage""" pos = page.index('
", "") self.log.debug("Image Limits: %s/%s", current, self.limits) self._remaining = self.limits - text.parse_int(current) def _gallery_page(self): url = "{}/g/{}/{}/".format( self.root, self.gallery_id, self.gallery_token) response = self.request(url, fatal=False) page = response.text if response.status_code == 404 and "Gallery Not Available" in page: raise exception.AuthorizationError() if page.startswith(("Key missing", "Gallery not found")): raise exception.NotFoundError("gallery") if "hentai.org/mpv/" in page: self.log.warning("Enabled Multi-Page Viewer is not supported") return page def _image_page(self): url = "{}/s/{}/{}-{}".format( self.root, self.image_token, self.gallery_id, self.image_num) page = self.request(url, fatal=False).text if page.startswith(("Invalid page", "Keep trying")): raise exception.NotFoundError("image page") return page @staticmethod def _parse_image_info(url): for part in url.split("/")[4:]: try: _, size, width, height, _ = part.split("-") break except ValueError: pass else: size = width = height = 0 return { "cost" : 1, "size" : text.parse_int(size), "width" : text.parse_int(width), "height": text.parse_int(height), } @staticmethod def _parse_original_info(info): parts = info.lstrip().split(" ") size = text.parse_bytes(parts[3] + parts[4][0]) return { # 1 initial point + 1 per 0.1 MB "cost" : 1 + math.ceil(size / 100000), "size" : size, "width" : text.parse_int(parts[0]), "height": text.parse_int(parts[2]), } class ExhentaiSearchExtractor(ExhentaiExtractor): """Extractor for exhentai search results""" subcategory = "search" pattern = BASE_PATTERN + r"/(?:\?([^#]*)|tag/([^/?#]+))" test = ( ("https://e-hentai.org/?f_search=touhou"), ("https://exhentai.org/?f_cats=767&f_search=touhou"), ("https://exhentai.org/tag/parody:touhou+project"), (("https://exhentai.org/?f_doujinshi=0&f_manga=0&f_artistcg=0" "&f_gamecg=0&f_western=0&f_non-h=1&f_imageset=0&f_cosplay=0" "&f_asianporn=0&f_misc=0&f_search=touhou&f_apply=Apply+Filter"), { "pattern": ExhentaiGalleryExtractor.pattern, "range": "1-30", "count": 30, "keyword": { "gallery_id": int, "gallery_token": r"re:^[0-9a-f]{10}$" }, }), ) def __init__(self, match): ExhentaiExtractor.__init__(self, match) self.search_url = self.root _, query, tag = match.groups() if tag: if "+" in tag: ns, _, tag = tag.rpartition(":") tag = '{}:"{}$"'.format(ns, tag.replace("+", " ")) else: tag += "$" self.params = {"f_search": tag, "page": 0} else: self.params = text.parse_query(query) if "next" not in self.params: self.params["page"] = text.parse_int(self.params.get("page")) def items(self): self.login() data = {"_extractor": ExhentaiGalleryExtractor} search_url = self.search_url params = self.params while True: last = None page = self.request(search_url, params=params).text for gallery in ExhentaiGalleryExtractor.pattern.finditer(page): url = gallery.group(0) if url == last: continue last = url data["gallery_id"] = text.parse_int(gallery.group(2)) data["gallery_token"] = gallery.group(3) yield Message.Queue, url + "/", data next_url = text.extr(page, 'nexturl="', '"', None) if next_url is not None: if not next_url: return search_url = next_url params = None elif 'class="ptdd">><' in page or ">No hits found

" in page: return else: params["page"] += 1 class ExhentaiFavoriteExtractor(ExhentaiSearchExtractor): """Extractor for favorited exhentai galleries""" subcategory = "favorite" pattern = BASE_PATTERN + r"/favorites\.php(?:\?([^#]*)())?" test = ( ("https://e-hentai.org/favorites.php", { "count": 1, "pattern": r"https?://e-hentai\.org/g/1200119/d55c44d3d0" }), ("https://exhentai.org/favorites.php?favcat=1&f_search=touhou" "&f_apply=Search+Favorites"), ) def __init__(self, match): ExhentaiSearchExtractor.__init__(self, match) self.search_url = self.root + "/favorites.php" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675805918.0 gallery_dl-1.25.0/gallery_dl/extractor/fallenangels.py0000644000175000017500000000756414370542336021572 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.fascans.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text, util class FallenangelsChapterExtractor(ChapterExtractor): """Extractor for manga-chapters from fascans.com""" category = "fallenangels" pattern = (r"(?:https?://)?(manga|truyen)\.fascans\.com" r"/manga/([^/?#]+)/([^/?#]+)") test = ( ("https://manga.fascans.com/manga/chronos-ruler/20/1", { "url": "4604a7914566cc2da0ff789aa178e2d1c8c241e3", "keyword": "2dfcc50020e32cd207be88e2a8fac0933e36bdfb", }), ("http://truyen.fascans.com/manga/hungry-marie/8", { "url": "1f923d9cb337d5e7bbf4323719881794a951c6ae", "keyword": "2bdb7334c0e3eceb9946ffd3132df679b4a94f6a", }), ("http://manga.fascans.com/manga/rakudai-kishi-no-eiyuutan/19.5", { "url": "273f6863966c83ea79ad5846a2866e08067d3f0e", "keyword": "d1065685bfe0054c4ff2a0f20acb089de4cec253", }), ) def __init__(self, match): self.version, self.manga, self.chapter = match.groups() url = "https://{}.fascans.com/manga/{}/{}/1".format( self.version, self.manga, self.chapter) ChapterExtractor.__init__(self, match, url) def metadata(self, page): extr = text.extract_from(page) lang = "vi" if self.version == "truyen" else "en" chapter, sep, minor = self.chapter.partition(".") return { "manga" : extr('name="description" content="', ' Chapter '), "title" : extr(': ', ' - Page 1'), "chapter" : chapter, "chapter_minor": sep + minor, "lang" : lang, "language": util.code_to_language(lang), } @staticmethod def images(page): return [ (img["page_image"], None) for img in util.json_loads( text.extr(page, "var pages = ", ";") ) ] class FallenangelsMangaExtractor(MangaExtractor): """Extractor for manga from fascans.com""" chapterclass = FallenangelsChapterExtractor category = "fallenangels" pattern = r"(?:https?://)?((manga|truyen)\.fascans\.com/manga/[^/]+)/?$" test = ( ("https://manga.fascans.com/manga/chronos-ruler", { "url": "eea07dd50f5bc4903aa09e2cc3e45c7241c9a9c2", "keyword": "c414249525d4c74ad83498b3c59a813557e59d7e", }), ("https://truyen.fascans.com/manga/rakudai-kishi-no-eiyuutan", { "url": "51a731a6b82d5eb7a335fbae6b02d06aeb2ab07b", "keyword": "2d2a2a5d9ea5925eb9a47bb13d848967f3af086c", }), ) def __init__(self, match): url = "https://" + match.group(1) self.lang = "vi" if match.group(2) == "truyen" else "en" MangaExtractor.__init__(self, match, url) def chapters(self, page): extr = text.extract_from(page) results = [] language = util.code_to_language(self.lang) while extr('
  • ', '<') title = extr('', '') manga, _, chapter = cha.rpartition(" ") chapter, dot, minor = chapter.partition(".") results.append((url, { "manga" : manga, "title" : text.unescape(title), "volume" : text.parse_int(vol), "chapter" : text.parse_int(chapter), "chapter_minor": dot + minor, "lang" : self.lang, "language": language, })) return results ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1677790111.0 gallery_dl-1.25.0/gallery_dl/extractor/fanbox.py0000644000175000017500000003127014400205637020375 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.fanbox.cc/""" import re from .common import Extractor, Message from .. import text BASE_PATTERN = ( r"(?:https?://)?(?:" r"(?!www\.)([\w-]+)\.fanbox\.cc|" r"(?:www\.)?fanbox\.cc/@([\w-]+))" ) class FanboxExtractor(Extractor): """Base class for Fanbox extractors""" category = "fanbox" root = "https://www.fanbox.cc" directory_fmt = ("{category}", "{creatorId}") filename_fmt = "{id}_{num}.{extension}" archive_fmt = "{id}_{num}" _warning = True def __init__(self, match): Extractor.__init__(self, match) self.embeds = self.config("embeds", True) def items(self): if self._warning: if not self._check_cookies(("FANBOXSESSID",)): self.log.warning("no 'FANBOXSESSID' cookie set") FanboxExtractor._warning = False for content_body, post in self.posts(): yield Message.Directory, post yield from self._get_urls_from_post(content_body, post) def posts(self): """Return all relevant post objects""" def _pagination(self, url): headers = {"Origin": self.root} while url: url = text.ensure_http_scheme(url) body = self.request(url, headers=headers).json()["body"] for item in body["items"]: yield self._get_post_data(item["id"]) url = body["nextUrl"] def _get_post_data(self, post_id): """Fetch and process post data""" headers = {"Origin": self.root} url = "https://api.fanbox.cc/post.info?postId="+post_id post = self.request(url, headers=headers).json()["body"] content_body = post.pop("body", None) if content_body: if "html" in content_body: post["html"] = content_body["html"] if post["type"] == "article": post["articleBody"] = content_body.copy() if "blocks" in content_body: content = [] # text content images = [] # image IDs in 'body' order append = content.append append_img = images.append for block in content_body["blocks"]: if "text" in block: append(block["text"]) if "links" in block: for link in block["links"]: append(link["url"]) if "imageId" in block: append_img(block["imageId"]) if images and "imageMap" in content_body: # reorder 'imageMap' (#2718) image_map = content_body["imageMap"] content_body["imageMap"] = { image_id: image_map[image_id] for image_id in images if image_id in image_map } post["content"] = "\n".join(content) post["date"] = text.parse_datetime(post["publishedDatetime"]) post["text"] = content_body.get("text") if content_body else None post["isCoverImage"] = False return content_body, post def _get_urls_from_post(self, content_body, post): num = 0 cover_image = post.get("coverImageUrl") if cover_image: cover_image = re.sub("/c/[0-9a-z_]+", "", cover_image) final_post = post.copy() final_post["isCoverImage"] = True final_post["fileUrl"] = cover_image text.nameext_from_url(cover_image, final_post) final_post["num"] = num num += 1 yield Message.Url, cover_image, final_post if not content_body: return if "html" 
in content_body: html_urls = [] for href in text.extract_iter(content_body["html"], 'href="', '"'): if "fanbox.pixiv.net/images/entry" in href: html_urls.append(href) elif "downloads.fanbox.cc" in href: html_urls.append(href) for src in text.extract_iter(content_body["html"], 'data-src-original="', '"'): html_urls.append(src) for url in html_urls: final_post = post.copy() text.nameext_from_url(url, final_post) final_post["fileUrl"] = url final_post["num"] = num num += 1 yield Message.Url, url, final_post for group in ("images", "imageMap"): if group in content_body: for item in content_body[group]: if group == "imageMap": # imageMap is a dict with image objects as values item = content_body[group][item] final_post = post.copy() final_post["fileUrl"] = item["originalUrl"] text.nameext_from_url(item["originalUrl"], final_post) if "extension" in item: final_post["extension"] = item["extension"] final_post["fileId"] = item.get("id") final_post["width"] = item.get("width") final_post["height"] = item.get("height") final_post["num"] = num num += 1 yield Message.Url, item["originalUrl"], final_post for group in ("files", "fileMap"): if group in content_body: for item in content_body[group]: if group == "fileMap": # fileMap is a dict with file objects as values item = content_body[group][item] final_post = post.copy() final_post["fileUrl"] = item["url"] text.nameext_from_url(item["url"], final_post) if "extension" in item: final_post["extension"] = item["extension"] if "name" in item: final_post["filename"] = item["name"] final_post["fileId"] = item.get("id") final_post["num"] = num num += 1 yield Message.Url, item["url"], final_post if self.embeds: embeds_found = [] if "video" in content_body: embeds_found.append(content_body["video"]) embeds_found.extend(content_body.get("embedMap", {}).values()) for embed in embeds_found: # embed_result is (message type, url, metadata dict) embed_result = self._process_embed(post, embed) if not embed_result: continue embed_result[2]["num"] = num num += 1 yield embed_result def _process_embed(self, post, embed): final_post = post.copy() provider = embed["serviceProvider"] content_id = embed.get("videoId") or embed.get("contentId") prefix = "ytdl:" if self.embeds == "ytdl" else "" url = None is_video = False if provider == "soundcloud": url = prefix+"https://soundcloud.com/"+content_id is_video = True elif provider == "youtube": url = prefix+"https://youtube.com/watch?v="+content_id is_video = True elif provider == "vimeo": url = prefix+"https://vimeo.com/"+content_id is_video = True elif provider == "fanbox": # this is an old URL format that redirects # to a proper Fanbox URL url = "https://www.pixiv.net/fanbox/"+content_id # resolve redirect response = self.request(url, method="HEAD", allow_redirects=False) url = response.headers["Location"] final_post["_extractor"] = FanboxPostExtractor elif provider == "twitter": url = "https://twitter.com/_/status/"+content_id elif provider == "google_forms": templ = "https://docs.google.com/forms/d/e/{}/viewform?usp=sf_link" url = templ.format(content_id) else: self.log.warning("service not recognized: {}".format(provider)) if url: final_post["embed"] = embed final_post["embedUrl"] = url text.nameext_from_url(url, final_post) msg_type = Message.Queue if is_video and self.embeds == "ytdl": msg_type = Message.Url return msg_type, url, final_post class FanboxCreatorExtractor(FanboxExtractor): """Extractor for a Fanbox creator's works""" subcategory = "creator" pattern = BASE_PATTERN + r"(?:/posts)?/?$" test = ( 
("https://xub.fanbox.cc", { "range": "1-15", "count": ">= 15", "keyword": { "creatorId" : "xub", "tags" : list, "title" : str, }, }), ("https://xub.fanbox.cc/posts"), ("https://www.fanbox.cc/@xub/"), ("https://www.fanbox.cc/@xub/posts"), ) def __init__(self, match): FanboxExtractor.__init__(self, match) self.creator_id = match.group(1) or match.group(2) def posts(self): url = "https://api.fanbox.cc/post.listCreator?creatorId={}&limit=10" return self._pagination(url.format(self.creator_id)) class FanboxPostExtractor(FanboxExtractor): """Extractor for media from a single Fanbox post""" subcategory = "post" pattern = BASE_PATTERN + r"/posts/(\d+)" test = ( ("https://www.fanbox.cc/@xub/posts/1910054", { "count": 3, "keyword": { "title": "えま★おうがすと", "tags": list, "hasAdultContent": True, "isCoverImage": False }, }), # entry post type, image embedded in html of the post ("https://nekoworks.fanbox.cc/posts/915", { "count": 2, "keyword": { "title": "【SAYORI FAN CLUB】お届け内容", "tags": list, "html": str, "hasAdultContent": True }, }), # article post type, imageMap, 2 twitter embeds, fanbox embed ("https://steelwire.fanbox.cc/posts/285502", { "options": (("embeds", True),), "count": 10, "keyword": { "title": "イラスト+SS|義足の炭鉱少年が義足を見せてくれるだけ 【全体公開版】", "tags": list, "articleBody": dict, "hasAdultContent": True }, }), # 'content' metadata (#3020) ("https://www.fanbox.cc/@official-en/posts/4326303", { "keyword": { "content": r"re:(?s)^Greetings from FANBOX.\n \nAs of Monday, " r"September 5th, 2022, we are happy to announce " r"the start of the FANBOX hashtag event " r"#MySetupTour ! \nAbout the event\nTo join this " r"event .+ \nPlease check this page for further " r"details regarding the Privacy & Terms.\n" r"https://fanbox.pixiv.help/.+/10184952456601\n\n\n" r"Thank you for your continued support of FANBOX.$", }, }), # imageMap file order (#2718) ("https://mochirong.fanbox.cc/posts/3746116", { "url": "c92ddd06f2efc4a5fe30ec67e21544f79a5c4062", }), ) def __init__(self, match): FanboxExtractor.__init__(self, match) self.post_id = match.group(3) def posts(self): return (self._get_post_data(self.post_id),) class FanboxRedirectExtractor(Extractor): """Extractor for pixiv redirects to fanbox.cc""" category = "fanbox" subcategory = "redirect" pattern = r"(?:https?://)?(?:www\.)?pixiv\.net/fanbox/creator/(\d+)" test = ("https://www.pixiv.net/fanbox/creator/52336352", { "pattern": FanboxCreatorExtractor.pattern, }) def __init__(self, match): Extractor.__init__(self, match) self.user_id = match.group(1) def items(self): url = "https://www.pixiv.net/fanbox/creator/" + self.user_id data = {"_extractor": FanboxCreatorExtractor} response = self.request( url, method="HEAD", allow_redirects=False, notfound="user") yield Message.Queue, response.headers["Location"], data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/fanleaks.py0000644000175000017500000001064314363473075020720 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fanleaks.club/""" from .common import Extractor, Message from .. 
import text, exception class FanleaksExtractor(Extractor): """Base class for Fanleaks extractors""" category = "fanleaks" directory_fmt = ("{category}", "{model}") filename_fmt = "{model_id}_{id}.{extension}" archive_fmt = "{model_id}_{id}" root = "https://fanleaks.club" def __init__(self, match): Extractor.__init__(self, match) self.model_id = match.group(1) def extract_post(self, url): extr = text.extract_from(self.request(url, notfound="post").text) data = { "model_id": self.model_id, "model" : text.unescape(extr('text-lg">', "")), "id" : text.parse_int(self.id), "type" : extr('type="', '"')[:5] or "photo", } url = extr('src="', '"') yield Message.Directory, data yield Message.Url, url, text.nameext_from_url(url, data) class FanleaksPostExtractor(FanleaksExtractor): """Extractor for individual posts on fanleak.club""" subcategory = "post" pattern = r"(?:https?://)?(?:www\.)?fanleaks\.club/([^/?#]+)/(\d+)" test = ( ("https://fanleaks.club/selti/880", { "pattern": (r"https://fanleaks\.club//models" r"/selti/images/selti_0880\.jpg"), "keyword": { "model_id": "selti", "model" : "Selti", "id" : 880, "type" : "photo", }, }), ("https://fanleaks.club/daisy-keech/1038", { "pattern": (r"https://fanleaks\.club//models" r"/daisy-keech/videos/daisy-keech_1038\.mp4"), "keyword": { "model_id": "daisy-keech", "model" : "Daisy Keech", "id" : 1038, "type" : "video", }, }), ("https://fanleaks.club/hannahowo/000", { "exception": exception.NotFoundError, }), ) def __init__(self, match): FanleaksExtractor.__init__(self, match) self.id = match.group(2) def items(self): url = "{}/{}/{}".format(self.root, self.model_id, self.id) return self.extract_post(url) class FanleaksModelExtractor(FanleaksExtractor): """Extractor for all posts from a fanleaks model""" subcategory = "model" pattern = (r"(?:https?://)?(?:www\.)?fanleaks\.club" r"/(?!latest/?$)([^/?#]+)/?$") test = ( ("https://fanleaks.club/hannahowo", { "pattern": (r"https://fanleaks\.club//models" r"/hannahowo/(images|videos)/hannahowo_\d+\.\w+"), "range" : "1-100", "count" : 100, }), ("https://fanleaks.club/belle-delphine", { "pattern": (r"https://fanleaks\.club//models" r"/belle-delphine/(images|videos)" r"/belle-delphine_\d+\.\w+"), "range" : "1-100", "count" : 100, }), ("https://fanleaks.club/daisy-keech"), ) def items(self): page_num = 1 page = self.request( self.root + "/" + self.model_id, notfound="model").text data = { "model_id": self.model_id, "model" : text.unescape( text.extr(page, 'mt-4">', "")), "type" : "photo", } page_url = text.extr(page, "url: '", "'") while True: page = self.request("{}{}".format(page_url, page_num)).text if not page: return for item in text.extract_iter(page, '"): self.id = id = text.extr(item, "/", '"') if "/icon-play.svg" in item: url = "{}/{}/{}".format(self.root, self.model_id, id) yield from self.extract_post(url) continue data["id"] = text.parse_int(id) url = text.extr(item, 'src="', '"').replace( "/thumbs/", "/", 1) yield Message.Directory, data yield Message.Url, url, text.nameext_from_url(url, data) page_num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675804677.0 gallery_dl-1.25.0/gallery_dl/extractor/fantia.py0000644000175000017500000001466114370540005020364 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fantia.jp/""" from .common import Extractor, Message from .. 
import text, util class FantiaExtractor(Extractor): """Base class for Fantia extractors""" category = "fantia" root = "https://fantia.jp" directory_fmt = ("{category}", "{fanclub_id}") filename_fmt = "{post_id}_{file_id}.{extension}" archive_fmt = "{post_id}_{file_id}" _warning = True def items(self): self.headers = { "Accept" : "application/json, text/plain, */*", "Referer": self.root, } if self._warning: if not self._check_cookies(("_session_id",)): self.log.warning("no '_session_id' cookie set") FantiaExtractor._warning = False for post_id in self.posts(): full_response, post = self._get_post_data(post_id) yield Message.Directory, post post["num"] = 0 for url, url_data in self._get_urls_from_post(full_response, post): post["num"] += 1 fname = url_data["content_filename"] or url text.nameext_from_url(fname, url_data) url_data["file_url"] = url yield Message.Url, url, url_data def posts(self): """Return post IDs""" def _pagination(self, url): params = {"page": 1} headers = self.headers while True: page = self.request(url, params=params, headers=headers).text self._csrf_token(page) post_id = None for post_id in text.extract_iter( page, 'class="link-block" href="/posts/', '"'): yield post_id if not post_id: return params["page"] += 1 def _csrf_token(self, page=None): if not page: page = self.request(self.root + "/").text self.headers["X-CSRF-Token"] = text.extr( page, 'name="csrf-token" content="', '"') def _get_post_data(self, post_id): """Fetch and process post data""" url = self.root+"/api/v1/posts/"+post_id resp = self.request(url, headers=self.headers).json()["post"] post = { "post_id": resp["id"], "post_url": self.root + "/posts/" + str(resp["id"]), "post_title": resp["title"], "comment": resp["comment"], "rating": resp["rating"], "posted_at": resp["posted_at"], "date": text.parse_datetime( resp["posted_at"], "%a, %d %b %Y %H:%M:%S %z"), "fanclub_id": resp["fanclub"]["id"], "fanclub_user_id": resp["fanclub"]["user"]["id"], "fanclub_user_name": resp["fanclub"]["user"]["name"], "fanclub_name": resp["fanclub"]["name"], "fanclub_url": self.root+"/fanclubs/"+str(resp["fanclub"]["id"]), "tags": resp["tags"] } return resp, post def _get_urls_from_post(self, resp, post): """Extract individual URL data from the response""" if "thumb" in resp and resp["thumb"] and "original" in resp["thumb"]: post["content_filename"] = "" post["content_category"] = "thumb" post["file_id"] = "thumb" yield resp["thumb"]["original"], post for content in resp["post_contents"]: post["content_category"] = content["category"] post["content_title"] = content["title"] post["content_filename"] = content.get("filename", "") post["content_id"] = content["id"] if "comment" in content: post["content_comment"] = content["comment"] if "post_content_photos" in content: for photo in content["post_content_photos"]: post["file_id"] = photo["id"] yield photo["url"]["original"], post if "download_uri" in content: post["file_id"] = content["id"] yield self.root+"/"+content["download_uri"], post if content["category"] == "blog" and "comment" in content: comment_json = util.json_loads(content["comment"]) ops = comment_json.get("ops", ()) # collect blogpost text first blog_text = "" for op in ops: insert = op.get("insert") if isinstance(insert, str): blog_text += insert post["blogpost_text"] = blog_text # collect images for op in ops: insert = op.get("insert") if isinstance(insert, dict) and "fantiaImage" in insert: img = insert["fantiaImage"] post["file_id"] = img["id"] yield "https://fantia.jp" + img["original_url"], post class 
FantiaCreatorExtractor(FantiaExtractor): """Extractor for a Fantia creator's works""" subcategory = "creator" pattern = r"(?:https?://)?(?:www\.)?fantia\.jp/fanclubs/(\d+)" test = ( ("https://fantia.jp/fanclubs/6939", { "range": "1-25", "count": ">= 25", "keyword": { "fanclub_user_id" : 52152, "tags" : list, "title" : str, }, }), ) def __init__(self, match): FantiaExtractor.__init__(self, match) self.creator_id = match.group(1) def posts(self): url = "{}/fanclubs/{}/posts".format(self.root, self.creator_id) return self._pagination(url) class FantiaPostExtractor(FantiaExtractor): """Extractor for media from a single Fantia post""" subcategory = "post" pattern = r"(?:https?://)?(?:www\.)?fantia\.jp/posts/(\d+)" test = ( ("https://fantia.jp/posts/508363", { "count": 6, "keyword": { "post_title": "zunda逆バニーでおしりコッショリ", "tags": list, "rating": "adult", "post_id": 508363 }, }), ) def __init__(self, match): FantiaExtractor.__init__(self, match) self.post_id = match.group(1) def posts(self): self._csrf_token() return (self.post_id,) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/fapachi.py0000644000175000017500000000534614363473075020533 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fapachi.com/""" from .common import Extractor, Message from .. import text class FapachiPostExtractor(Extractor): """Extractor for individual posts on fapachi.com""" category = "fapachi" subcategory = "post" directory_fmt = ("{category}", "{user}") filename_fmt = "{user}_{id}.{extension}" archive_fmt = "{user}_{id}" pattern = (r"(?:https?://)?(?:www\.)?fapachi\.com" r"/(?!search/)([^/?#]+)/media/(\d+)") root = "https://fapachi.com" test = ( # NSFW ("https://fapachi.com/sonson/media/0082", { "pattern": (r"https://fapachi\.com/models/s/o/" r"sonson/1/full/sonson_0082\.jpeg"), "keyword": { "user": "sonson", "id" : "0082", }, }), # NSFW ("https://fapachi.com/ferxiita/media/0159"), ) def __init__(self, match): Extractor.__init__(self, match) self.user, self.id = match.groups() def items(self): data = { "user": self.user, "id" : self.id, } page = self.request("{}/{}/media/{}".format( self.root, self.user, self.id)).text url = self.root + text.extr(page, 'd-block" src="', '"') yield Message.Directory, data yield Message.Url, url, text.nameext_from_url(url, data) class FapachiUserExtractor(Extractor): """Extractor for all posts from a fapachi user""" category = "fapachi" subcategory = "user" pattern = (r"(?:https?://)?(?:www\.)?fapachi\.com" r"/(?!search(?:/|$))([^/?#]+)(?:/page/(\d+))?$") root = "https://fapachi.com" test = ( ("https://fapachi.com/sonson", { "pattern": FapachiPostExtractor.pattern, "range" : "1-50", "count" : 50, }), ("https://fapachi.com/ferxiita/page/3"), ) def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) self.num = text.parse_int(match.group(2), 1) def items(self): data = {"_extractor": FapachiPostExtractor} while True: page = self.request("{}/{}/page/{}".format( self.root, self.user, self.num)).text for post in text.extract_iter(page, 'model-media-prew">', ">"): url = self.root + text.extr(post, 'Next page' not in page: return self.num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 
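# Hedged, standalone sketch — not part of this archive. The FapachiUserExtractor
# above pages through "/<user>/page/<n>" listings and stops once the "Next page"
# marker disappears from the HTML. The same loop reduced to plain `requests`
# (the helper name and example usage are illustrative assumptions):
#
#   import requests
#
#   def fapachi_user_pages(user, start=1):
#       """Yield raw listing-page HTML for a fapachi.com user."""
#       num = start
#       while True:
#           url = "https://fapachi.com/{}/page/{}".format(user, num)
#           page = requests.get(url, timeout=30).text
#           yield page
#           if 'Next page' not in page:   # same stop condition as the extractor
#               return
#           num += 1
#
#   # usage: for html in fapachi_user_pages("sonson"): ...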
gallery_dl-1.25.0/gallery_dl/extractor/fapello.py0000644000175000017500000001216214363473075020554 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fapello.com/""" from .common import Extractor, Message from .. import text, exception class FapelloPostExtractor(Extractor): """Extractor for individual posts on fapello.com""" category = "fapello" subcategory = "post" directory_fmt = ("{category}", "{model}") filename_fmt = "{model}_{id}.{extension}" archive_fmt = "{type}_{model}_{id}" pattern = (r"(?:https?://)?(?:www\.)?fapello\.com" r"/(?!search/|popular_videos/)([^/?#]+)/(\d+)") test = ( ("https://fapello.com/carrykey/530/", { "pattern": (r"https://fapello\.com/content/c/a" r"/carrykey/1000/carrykey_0530\.jpg"), "keyword": { "model": "carrykey", "id" : 530, "type" : "photo", "thumbnail": "", }, }), ("https://fapello.com/vladislava-661/693/", { "pattern": (r"https://cdn\.fapello\.com/content/v/l" r"/vladislava-661/1000/vladislava-661_0693\.mp4"), "keyword": { "model": "vladislava-661", "id" : 693, "type" : "video", "thumbnail": ("https://fapello.com/content/v/l" "/vladislava-661/1000/vladislava-661_0693.jpg"), }, }), ("https://fapello.com/carrykey/000/", { "exception": exception.NotFoundError, }), ) def __init__(self, match): Extractor.__init__(self, match) self.model, self.id = match.groups() def items(self): url = "https://fapello.com/{}/{}/".format(self.model, self.id) page = text.extr( self.request(url, allow_redirects=False).text, 'class="uk-align-center"', "
  • ", None) if page is None: raise exception.NotFoundError("post") data = { "model": self.model, "id" : text.parse_int(self.id), "type" : "video" if 'type="video' in page else "photo", "thumbnail": text.extr(page, 'poster="', '"'), } url = text.extr(page, 'src="', '"') yield Message.Directory, data yield Message.Url, url, text.nameext_from_url(url, data) class FapelloModelExtractor(Extractor): """Extractor for all posts from a fapello model""" category = "fapello" subcategory = "model" pattern = (r"(?:https?://)?(?:www\.)?fapello\.com" r"/(?!top-(?:likes|followers)|popular_videos" r"|videos|trending|search/?$)" r"([^/?#]+)/?$") test = ( ("https://fapello.com/hyoon/", { "pattern": FapelloPostExtractor.pattern, "range" : "1-50", "count" : 50, }), ("https://fapello.com/kobaebeefboo/"), ) def __init__(self, match): Extractor.__init__(self, match) self.model = match.group(1) def items(self): num = 1 data = {"_extractor": FapelloPostExtractor} while True: url = "https://fapello.com/ajax/model/{}/page-{}/".format( self.model, num) page = self.request(url).text if not page: return for url in text.extract_iter(page, '', ""): yield Message.Queue, text.extr(item, '= 10", }) def __init__(self, match): FlickrExtractor.__init__(self, match) self.gallery_id = match.group(2) def metadata(self): data = FlickrExtractor.metadata(self) data["gallery"] = self.api.galleries_getInfo(self.gallery_id) return data def photos(self): return self.api.galleries_getPhotos(self.gallery_id) class FlickrGroupExtractor(FlickrExtractor): """Extractor for group pools from flickr.com""" subcategory = "group" directory_fmt = ("{category}", "Groups", "{group[groupname]}") archive_fmt = "G_{group[nsid]}_{id}" pattern = BASE_PATTERN + r"/groups/([^/?#]+)" test = ("https://www.flickr.com/groups/bird_headshots/", { "pattern": FlickrImageExtractor.pattern, "count": "> 150", }) def metadata(self): self.group = self.api.urls_lookupGroup(self.item_id) return {"group": self.group} def photos(self): return self.api.groups_pools_getPhotos(self.group["nsid"]) class FlickrUserExtractor(FlickrExtractor): """Extractor for the photostream of a flickr user""" subcategory = "user" archive_fmt = "u_{user[nsid]}_{id}" pattern = BASE_PATTERN + r"/photos/([^/?#]+)/?$" test = ("https://www.flickr.com/photos/shona_s/", { "pattern": FlickrImageExtractor.pattern, "count": 28, }) def photos(self): return self.api.people_getPhotos(self.user["nsid"]) class FlickrFavoriteExtractor(FlickrExtractor): """Extractor for favorite photos of a flickr user""" subcategory = "favorite" directory_fmt = ("{category}", "{user[username]}", "Favorites") archive_fmt = "f_{user[nsid]}_{id}" pattern = BASE_PATTERN + r"/photos/([^/?#]+)/favorites" test = ("https://www.flickr.com/photos/shona_s/favorites", { "pattern": FlickrImageExtractor.pattern, "count": 4, }) def photos(self): return self.api.favorites_getList(self.user["nsid"]) class FlickrSearchExtractor(FlickrExtractor): """Extractor for flickr photos based on search results""" subcategory = "search" directory_fmt = ("{category}", "Search", "{search[text]}") archive_fmt = "s_{search}_{id}" pattern = BASE_PATTERN + r"/search/?\?([^#]+)" test = ( ("https://flickr.com/search/?text=mountain"), ("https://flickr.com/search/?text=tree%20cloud%20house" "&color_codes=4&styles=minimalism"), ) def __init__(self, match): FlickrExtractor.__init__(self, match) self.search = text.parse_query(match.group(1)) if "text" not in self.search: self.search["text"] = "" def metadata(self): return {"search": self.search} def photos(self): return 
self.api.photos_search(self.search) class FlickrAPI(oauth.OAuth1API): """Minimal interface for the flickr API https://www.flickr.com/services/api/ """ API_URL = "https://api.flickr.com/services/rest/" API_KEY = "ac4fd7aa98585b9eee1ba761c209de68" API_SECRET = "3adb0f568dc68393" FORMATS = [ ("o" , "Original" , None), ("6k", "X-Large 6K" , 6144), ("5k", "X-Large 5K" , 5120), ("4k", "X-Large 4K" , 4096), ("3k", "X-Large 3K" , 3072), ("k" , "Large 2048" , 2048), ("h" , "Large 1600" , 1600), ("l" , "Large" , 1024), ("c" , "Medium 800" , 800), ("z" , "Medium 640" , 640), ("m" , "Medium" , 500), ("n" , "Small 320" , 320), ("s" , "Small" , 240), ("q" , "Large Square", 150), ("t" , "Thumbnail" , 100), ("s" , "Square" , 75), ] VIDEO_FORMATS = { "orig" : 9, "1080p" : 8, "720p" : 7, "360p" : 6, "288p" : 5, "700" : 4, "300" : 3, "100" : 2, "appletv" : 1, "iphone_wifi": 0, } def __init__(self, extractor): oauth.OAuth1API.__init__(self, extractor) self.videos = extractor.config("videos", True) self.maxsize = extractor.config("size-max") if isinstance(self.maxsize, str): for fmt, fmtname, fmtwidth in self.FORMATS: if self.maxsize == fmt or self.maxsize == fmtname: self.maxsize = fmtwidth break else: self.maxsize = None extractor.log.warning( "Could not match '%s' to any format", self.maxsize) if self.maxsize: self.formats = [fmt for fmt in self.FORMATS if not fmt[2] or fmt[2] <= self.maxsize] else: self.formats = self.FORMATS self.formats = self.formats[:8] def favorites_getList(self, user_id): """Returns a list of the user's favorite photos.""" params = {"user_id": user_id} return self._pagination("favorites.getList", params) def galleries_getInfo(self, gallery_id): """Gets information about a gallery.""" params = {"gallery_id": gallery_id} gallery = self._call("galleries.getInfo", params)["gallery"] return self._clean_info(gallery) def galleries_getPhotos(self, gallery_id): """Return the list of photos for a gallery.""" params = {"gallery_id": gallery_id} return self._pagination("galleries.getPhotos", params) def groups_pools_getPhotos(self, group_id): """Returns a list of pool photos for a given group.""" params = {"group_id": group_id} return self._pagination("groups.pools.getPhotos", params) def people_getPhotos(self, user_id): """Return photos from the given user's photostream.""" params = {"user_id": user_id} return self._pagination("people.getPhotos", params) def photos_getInfo(self, photo_id): """Get information about a photo.""" params = {"photo_id": photo_id} return self._call("photos.getInfo", params)["photo"] def photos_getSizes(self, photo_id): """Returns the available sizes for a photo.""" params = {"photo_id": photo_id} sizes = self._call("photos.getSizes", params)["sizes"]["size"] if self.maxsize: for index, size in enumerate(sizes): if index > 0 and (int(size["width"]) > self.maxsize or int(size["height"]) > self.maxsize): del sizes[index:] break return sizes def photos_search(self, params): """Return a list of photos matching some criteria.""" return self._pagination("photos.search", params.copy()) def photosets_getInfo(self, photoset_id, user_id): """Gets information about a photoset.""" params = {"photoset_id": photoset_id, "user_id": user_id} photoset = self._call("photosets.getInfo", params)["photoset"] return self._clean_info(photoset) def photosets_getList(self, user_id): """Returns the photosets belonging to the specified user.""" params = {"user_id": user_id} return self._pagination_sets("photosets.getList", params) def photosets_getPhotos(self, photoset_id): """Get the list of 
photos in a set.""" params = {"photoset_id": photoset_id} return self._pagination("photosets.getPhotos", params, "photoset") def urls_lookupGroup(self, groupname): """Returns a group NSID, given the url to a group's page.""" params = {"url": "https://www.flickr.com/groups/" + groupname} group = self._call("urls.lookupGroup", params)["group"] return {"nsid": group["id"], "path_alias": groupname, "groupname": group["groupname"]["_content"]} def urls_lookupUser(self, username): """Returns a user NSID, given the url to a user's photos or profile.""" params = {"url": "https://www.flickr.com/photos/" + username} user = self._call("urls.lookupUser", params)["user"] return { "nsid" : user["id"], "username" : user["username"]["_content"], "path_alias": username, } def video_getStreamInfo(self, video_id, secret=None): """Returns all available video streams""" params = {"photo_id": video_id} if not secret: secret = self._call("photos.getInfo", params)["photo"]["secret"] params["secret"] = secret stream = self._call("video.getStreamInfo", params)["streams"]["stream"] return max(stream, key=lambda s: self.VIDEO_FORMATS.get(s["type"], 0)) def _call(self, method, params): params["method"] = "flickr." + method params["format"] = "json" params["nojsoncallback"] = "1" if self.api_key: params["api_key"] = self.api_key data = self.request(self.API_URL, params=params).json() if "code" in data: msg = data.get("message") self.log.debug("Server response: %s", data) if data["code"] == 1: raise exception.NotFoundError(self.extractor.subcategory) elif data["code"] == 98: raise exception.AuthenticationError(msg) elif data["code"] == 99: raise exception.AuthorizationError(msg) raise exception.StopExtraction("API request failed: %s", msg) return data def _pagination(self, method, params, key="photos"): params["extras"] = ("description,date_upload,tags,views,media," "path_alias,owner_name,") params["extras"] += ",".join("url_" + fmt[0] for fmt in self.formats) params["page"] = 1 while True: data = self._call(method, params)[key] yield from data["photo"] if params["page"] >= data["pages"]: return params["page"] += 1 def _pagination_sets(self, method, params): params["page"] = 1 while True: data = self._call(method, params)["photosets"] yield from data["photoset"] if params["page"] >= data["pages"]: return params["page"] += 1 def _extract_format(self, photo): photo["description"] = photo["description"]["_content"].strip() photo["views"] = text.parse_int(photo["views"]) photo["date"] = text.parse_timestamp(photo["dateupload"]) photo["tags"] = photo["tags"].split() photo["id"] = text.parse_int(photo["id"]) if "owner" in photo: photo["owner"] = { "nsid" : photo["owner"], "username" : photo["ownername"], "path_alias": photo["pathalias"], } else: photo["owner"] = self.extractor.user del photo["pathalias"] del photo["ownername"] if photo["media"] == "video" and self.videos: return self._extract_video(photo) for fmt, fmtname, fmtwidth in self.formats: key = "url_" + fmt if key in photo: photo["width"] = text.parse_int(photo["width_" + fmt]) photo["height"] = text.parse_int(photo["height_" + fmt]) if self.maxsize and (photo["width"] > self.maxsize or photo["height"] > self.maxsize): continue photo["url"] = photo[key] photo["label"] = fmtname # remove excess data keys = [ key for key in photo if key.startswith(("url_", "width_", "height_")) ] for key in keys: del photo[key] break else: self._extract_photo(photo) return photo def _extract_photo(self, photo): size = self.photos_getSizes(photo["id"])[-1] photo["url"] = size["source"] 
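        # Added note (not in the original source): photos_getSizes() above is
        # assumed to return Flickr's size list in ascending order and already
        # trims entries larger than the configured "size-max", so taking the
        # last element picks the largest size still within the limit; its
        # source URL, label, and dimensions are then copied onto the photo dict.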
photo["label"] = size["label"] photo["width"] = text.parse_int(size["width"]) photo["height"] = text.parse_int(size["height"]) return photo def _extract_video(self, photo): stream = self.video_getStreamInfo(photo["id"], photo.get("secret")) photo["url"] = stream["_content"] photo["label"] = stream["type"] photo["width"] = photo["height"] = 0 return photo @staticmethod def _clean_info(info): info["title"] = info["title"]["_content"] info["description"] = info["description"]["_content"] return info ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/foolfuuka.py0000644000175000017500000002601614363473075021130 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for FoolFuuka 4chan archives""" from .common import BaseExtractor, Message from .. import text import itertools class FoolfuukaExtractor(BaseExtractor): """Base extractor for FoolFuuka based boards/archives""" basecategory = "foolfuuka" filename_fmt = "{timestamp_ms} {filename_media}.{extension}" archive_fmt = "{board[shortname]}_{num}_{timestamp}" external = "default" def __init__(self, match): BaseExtractor.__init__(self, match) self.session.headers["Referer"] = self.root if self.category == "b4k": self.remote = self._remote_direct def items(self): yield Message.Directory, self.metadata() for post in self.posts(): media = post["media"] if not media: continue url = media["media_link"] if not url and "remote_media_link" in media: url = self.remote(media) if url.startswith("/"): url = self.root + url post["filename"], _, post["extension"] = \ media["media"].rpartition(".") post["filename_media"] = media["media_filename"].rpartition(".")[0] post["timestamp_ms"] = text.parse_int( media["media_orig"].rpartition(".")[0]) yield Message.Url, url, post def metadata(self): """Return general metadata""" def posts(self): """Return an iterable with all relevant posts""" def remote(self, media): """Resolve a remote media link""" needle = '= 5 else "" data["title"] = data["chapter_string"].partition(":")[2].strip() return data BASE_PATTERN = FoolslideExtractor.update({ "powermanga": { "root": "https://read.powermanga.org", "pattern": r"read(?:er)?\.powermanga\.org", }, "sensescans": { "root": "https://sensescans.com/reader", "pattern": r"(?:(?:www\.)?sensescans\.com/reader" r"|reader\.sensescans\.com)", }, }) class FoolslideChapterExtractor(FoolslideExtractor): """Base class for chapter extractors for FoOlSlide based sites""" subcategory = "chapter" directory_fmt = ("{category}", "{manga}", "{chapter_string}") filename_fmt = ( "{manga}_c{chapter:>03}{chapter_minor:?//}_{page:>03}.{extension}") archive_fmt = "{id}" pattern = BASE_PATTERN + r"(/read/[^/?#]+/[a-z-]+/\d+/\d+(?:/\d+)?)" test = ( (("https://read.powermanga.org" "/read/one_piece_digital_colour_comics/en/0/75/"), { "url": "854c5817f8f767e1bccd05fa9d58ffb5a4b09384", "keyword": "a60c42f2634b7387899299d411ff494ed0ad6dbe", }), ("https://sensescans.com/reader/read/ao_no_orchestra/en/0/26/", { "url": "bbd428dc578f5055e9f86ad635b510386cd317cd", "keyword": "083ef6f8831c84127fe4096fa340a249be9d1424", }), ("https://reader.sensescans.com/read/ao_no_orchestra/en/0/26/"), ) def items(self): page = self.request(self.gallery_url).text data = self.metadata(page) imgs = self.images(page) data["count"] = 
len(imgs) data["chapter_id"] = text.parse_int(imgs[0]["chapter_id"]) yield Message.Directory, data enum = util.enumerate_reversed if self.config( "page-reverse") else enumerate for data["page"], image in enum(imgs, 1): try: url = image["url"] del image["url"] del image["chapter_id"] del image["thumb_url"] except KeyError: pass for key in ("height", "id", "size", "width"): image[key] = text.parse_int(image[key]) data.update(image) text.nameext_from_url(data["filename"], data) yield Message.Url, url, data def metadata(self, page): extr = text.extract_from(page) extr('

    ', '') return self.parse_chapter_url(self.gallery_url, { "manga" : text.unescape(extr('title="', '"')).strip(), "chapter_string": text.unescape(extr('title="', '"')), }) def images(self, page): return util.json_loads(text.extr(page, "var pages = ", ";")) class FoolslideMangaExtractor(FoolslideExtractor): """Base class for manga extractors for FoOlSlide based sites""" subcategory = "manga" categorytransfer = True pattern = BASE_PATTERN + r"(/series/[^/?#]+)" test = ( (("https://read.powermanga.org" "/series/one_piece_digital_colour_comics/"), { "count": ">= 1", "keyword": { "chapter": int, "chapter_minor": str, "chapter_string": str, "group": "PowerManga", "lang": "en", "language": "English", "manga": "One Piece Digital Colour Comics", "title": str, "volume": int, }, }), ("https://sensescans.com/reader/series/yotsubato/", { "count": ">= 3", }), ) def items(self): page = self.request(self.gallery_url).text chapters = self.chapters(page) if not self.config("chapter-reverse", False): chapters.reverse() for chapter, data in chapters: data["_extractor"] = FoolslideChapterExtractor yield Message.Queue, chapter, data def chapters(self, page): extr = text.extract_from(page) manga = text.unescape(extr('

    ', '

    ')).strip() author = extr('Author: ', 'Artist: ', '
    ")) path = extr('href="//d', '"') if not path: self.log.warning( "Unable to download post %s (\"%s\")", post_id, text.remove_html( extr('System Message', '') or extr('System Message', '
    ') ) ) return None pi = text.parse_int rh = text.remove_html data = text.nameext_from_url(path, { "id" : pi(post_id), "url": "https://d" + path, }) if self._new_layout: data["tags"] = text.split_html(extr( 'class="tags-row">', '')) data["title"] = text.unescape(extr("

    ", "

    ")) data["artist"] = extr("", "<") data["_description"] = extr('class="section-body">', '') data["views"] = pi(rh(extr('class="views">', ''))) data["favorites"] = pi(rh(extr('class="favorites">', ''))) data["comments"] = pi(rh(extr('class="comments">', ''))) data["rating"] = rh(extr('class="rating">', '')) data["fa_category"] = rh(extr('>Category', '')) data["theme"] = rh(extr('>', '<')) data["species"] = rh(extr('>Species', '')) data["gender"] = rh(extr('>Gender', '')) data["width"] = pi(extr("", "x")) data["height"] = pi(extr("", "p")) else: # old site layout data["title"] = text.unescape(extr("

    ", "

    ")) data["artist"] = extr(">", "<") data["fa_category"] = extr("Category:", "<").strip() data["theme"] = extr("Theme:", "<").strip() data["species"] = extr("Species:", "<").strip() data["gender"] = extr("Gender:", "<").strip() data["favorites"] = pi(extr("Favorites:", "<")) data["comments"] = pi(extr("Comments:", "<")) data["views"] = pi(extr("Views:", "<")) data["width"] = pi(extr("Resolution:", "x")) data["height"] = pi(extr("", "<")) data["tags"] = text.split_html(extr( 'id="keywords">', ''))[::2] data["rating"] = extr('', ' ')
            data[", "") data["artist_url"] = data["artist"].replace("_", "").lower() data["user"] = self.user or data["artist_url"] data["date"] = text.parse_timestamp(data["filename"].partition(".")[0]) data["description"] = self._process_description(data["_description"]) return data @staticmethod def _process_description(description): return text.unescape(text.remove_html(description, "", "")) def _pagination(self, path): num = 1 while True: url = "{}/{}/{}/{}/".format( self.root, path, self.user, num) page = self.request(url).text post_id = None for post_id in text.extract_iter(page, 'id="sid-', '"'): yield post_id if not post_id: return num += 1 def _pagination_favorites(self): path = "/favorites/{}/".format(self.user) while path: page = self.request(self.root + path).text yield from text.extract_iter(page, 'id="sid-', '"') path = text.extr(page, 'right" href="', '"') def _pagination_search(self, query): url = self.root + "/search/" data = { "page" : 1, "order-by" : "relevancy", "order-direction": "desc", "range" : "all", "range_from" : "", "range_to" : "", "rating-general" : "1", "rating-mature" : "1", "rating-adult" : "1", "type-art" : "1", "type-music" : "1", "type-flash" : "1", "type-story" : "1", "type-photo" : "1", "type-poetry" : "1", "mode" : "extended", } data.update(query) if "page" in query: data["page"] = text.parse_int(query["page"]) while True: page = self.request(url, method="POST", data=data).text post_id = None for post_id in text.extract_iter(page, 'id="sid-', '"'): yield post_id if not post_id: return if "next_page" in data: data["page"] += 1 else: data["next_page"] = "Next" class FuraffinityGalleryExtractor(FuraffinityExtractor): """Extractor for a furaffinity user's gallery""" subcategory = "gallery" pattern = BASE_PATTERN + r"/gallery/([^/?#]+)" test = ("https://www.furaffinity.net/gallery/mirlinthloth/", { "pattern": r"https://d\d?\.f(uraffinity|acdn)\.net" r"/art/mirlinthloth/\d+/\d+.\w+\.\w+", "range": "45-50", "count": 6, }) def posts(self): return self._pagination("gallery") class FuraffinityScrapsExtractor(FuraffinityExtractor): """Extractor for a furaffinity user's scraps""" subcategory = "scraps" directory_fmt = ("{category}", "{user!l}", "Scraps") pattern = BASE_PATTERN + r"/scraps/([^/?#]+)" test = ("https://www.furaffinity.net/scraps/mirlinthloth/", { "pattern": r"https://d\d?\.f(uraffinity|acdn)\.net" r"/art/[^/]+(/stories)?/\d+/\d+.\w+.", "count": ">= 3", }) def posts(self): return self._pagination("scraps") class FuraffinityFavoriteExtractor(FuraffinityExtractor): """Extractor for a furaffinity user's favorites""" subcategory = "favorite" directory_fmt = ("{category}", "{user!l}", "Favorites") pattern = BASE_PATTERN + r"/favorites/([^/?#]+)" test = ("https://www.furaffinity.net/favorites/mirlinthloth/", { "pattern": r"https://d\d?\.f(uraffinity|acdn)\.net" r"/art/[^/]+/\d+/\d+.\w+\.\w+", "range": "45-50", "count": 6, }) def posts(self): return self._pagination_favorites() class FuraffinitySearchExtractor(FuraffinityExtractor): """Extractor for furaffinity search results""" subcategory = "search" directory_fmt = ("{category}", "Search", "{search}") pattern = BASE_PATTERN + r"/search(?:/([^/?#]+))?/?[?&]([^#]+)" test = ( ("https://www.furaffinity.net/search/?q=cute", { "pattern": r"https://d\d?\.f(uraffinity|acdn)\.net" r"/art/[^/]+/\d+/\d+.\w+\.\w+", "range": "45-50", "count": 6, }), # first page of search results (#2402) ("https://www.furaffinity.net/search/?q=leaf&range=1day", { "range": "1-3", "count": 3, }), ) def __init__(self, match): 
FuraffinityExtractor.__init__(self, match) self.query = text.parse_query(match.group(2)) if self.user and "q" not in self.query: self.query["q"] = text.unquote(self.user) def metadata(self): return {"search": self.query.get("q")} def posts(self): return self._pagination_search(self.query) class FuraffinityPostExtractor(FuraffinityExtractor): """Extractor for individual posts on furaffinity""" subcategory = "post" pattern = BASE_PATTERN + r"/(?:view|full)/(\d+)" test = ( ("https://www.furaffinity.net/view/21835115/", { "pattern": r"https://d\d*\.f(uraffinity|acdn)\.net/(download/)?art" r"/mirlinthloth/music/1488278723/1480267446.mirlinthlot" r"h_dj_fennmink_-_bude_s_4_ever\.mp3", "keyword": { "artist" : "mirlinthloth", "artist_url" : "mirlinthloth", "date" : "dt:2016-11-27 17:24:06", "description": "A Song made playing the game Cosmic DJ.", "extension" : "mp3", "filename" : r"re:\d+\.\w+_dj_fennmink_-_bude_s_4_ever", "id" : 21835115, "tags" : list, "title" : "Bude's 4 Ever", "url" : r"re:https://d\d?\.f(uraffinity|acdn)\.net/art", "user" : "mirlinthloth", "views" : int, "favorites" : int, "comments" : int, "rating" : "General", "fa_category": "Music", "theme" : "All", "species" : "Unspecified / Any", "gender" : "Any", "width" : 120, "height" : 120, }, }), # 'external' option (#1492) ("https://www.furaffinity.net/view/42166511/", { "options": (("external", True),), "pattern": r"https://d\d*\.f(uraffinity|acdn)\.net/" r"|http://www\.postybirb\.com", "count": 2, }), # no tags (#2277) ("https://www.furaffinity.net/view/45331225/", { "keyword": { "artist": "Kota_Remminders", "artist_url": "kotaremminders", "date": "dt:2022-01-03 17:49:33", "fa_category": "Adoptables", "filename": "1641232173.kotaremminders_chidopts1", "gender": "Any", "height": 905, "id": 45331225, "rating": "General", "species": "Unspecified / Any", "tags": [], "theme": "All", "title": "REMINDER", "width": 1280, }, }), ("https://furaffinity.net/view/21835115/"), ("https://sfw.furaffinity.net/view/21835115/"), ("https://www.furaffinity.net/full/21835115/"), ) def posts(self): post_id = self.user self.user = None return (post_id,) class FuraffinityUserExtractor(FuraffinityExtractor): """Extractor for furaffinity user profiles""" subcategory = "user" cookiedomain = None pattern = BASE_PATTERN + r"/user/([^/?#]+)" test = ( ("https://www.furaffinity.net/user/mirlinthloth/", { "pattern": r"/gallery/mirlinthloth/$", }), ("https://www.furaffinity.net/user/mirlinthloth/", { "options": (("include", "all"),), "pattern": r"/(gallery|scraps|favorites)/mirlinthloth/$", "count": 3, }), ) def items(self): base = "{}/{{}}/{}/".format(self.root, self.user) return self._dispatch_extractors(( (FuraffinityGalleryExtractor , base.format("gallery")), (FuraffinityScrapsExtractor , base.format("scraps")), (FuraffinityFavoriteExtractor, base.format("favorites")), ), ("gallery",)) class FuraffinityFollowingExtractor(FuraffinityExtractor): """Extractor for a furaffinity user's watched users""" subcategory = "following" pattern = BASE_PATTERN + "/watchlist/by/([^/?#]+)" test = ("https://www.furaffinity.net/watchlist/by/mirlinthloth/", { "pattern": FuraffinityUserExtractor.pattern, "range": "176-225", "count": 50, }) def items(self): url = "{}/watchlist/by/{}/".format(self.root, self.user) data = {"_extractor": FuraffinityUserExtractor} while True: page = self.request(url).text for path in text.extract_iter(page, '
    ", "").strip() title, _, gallery_id = title.rpartition("#") return { "gallery_id" : text.parse_int(gallery_id), "gallery_hash": self.gallery_hash, "title" : text.unescape(title[:-15]), "views" : data["hits"], "score" : data["rating"], "tags" : data["tags"].split(","), "count" : len(data["images"]), } def images(self, page): for image in self.data["images"]: yield "https:" + image["imageUrl"], image class FuskatorSearchExtractor(Extractor): """Extractor for search results on fuskator.com""" category = "fuskator" subcategory = "search" root = "https://fuskator.com" pattern = r"(?:https?://)?fuskator\.com(/(?:search|page)/.+)" test = ( ("https://fuskator.com/search/red_swimsuit/", { "pattern": FuskatorGalleryExtractor.pattern, "count": ">= 40", }), ("https://fuskator.com/page/3/swimsuit/quality/"), ) def __init__(self, match): Extractor.__init__(self, match) self.path = match.group(1) def items(self): url = self.root + self.path data = {"_extractor": FuskatorGalleryExtractor} while True: page = self.request(url).text for path in text.extract_iter( page, 'class="pic_pad">', '>>><') if not pages: return url = self.root + text.rextract(pages, 'href="', '"')[0] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1677790111.0 gallery_dl-1.25.0/gallery_dl/extractor/gelbooru.py0000644000175000017500000002024314400205637020734 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://gelbooru.com/""" from .common import Extractor, Message from . import gelbooru_v02 from .. import text, exception import binascii BASE_PATTERN = r"(?:https?://)?(?:www\.)?gelbooru\.com/(?:index\.php)?\?" class GelbooruBase(): """Base class for gelbooru extractors""" category = "gelbooru" basecategory = "booru" root = "https://gelbooru.com" def _api_request(self, params): params["api_key"] = self.api_key params["user_id"] = self.user_id url = self.root + "/index.php?page=dapi&s=post&q=index&json=1" data = self.request(url, params=params).json() if "post" not in data: return () posts = data["post"] if not isinstance(posts, list): return (posts,) return posts def _pagination(self, params): params["pid"] = self.page_start params["limit"] = self.per_page limit = self.per_page // 2 while True: posts = self._api_request(params) for post in posts: yield post if len(posts) < limit: return if "pid" in params: del params["pid"] params["tags"] = "{} id:<{}".format(self.tags, post["id"]) def _pagination_html(self, params): url = self.root + "/index.php" params["pid"] = self.page_start * self.per_page data = {} while True: num_ids = 0 page = self.request(url, params=params).text for data["id"] in text.extract_iter(page, '" id="p', '"'): num_ids += 1 yield from self._api_request(data) if num_ids < self.per_page: return params["pid"] += self.per_page @staticmethod def _file_url(post): url = post["file_url"] if url.endswith((".webm", ".mp4")): md5 = post["md5"] path = "/images/{}/{}/{}.webm".format(md5[0:2], md5[2:4], md5) post["_fallback"] = GelbooruBase._video_fallback(path) url = "https://img3.gelbooru.com" + path return url @staticmethod def _video_fallback(path): yield "https://img2.gelbooru.com" + path yield "https://img1.gelbooru.com" + path def _notes(self, post, page): notes_data = text.extr(page, '
    ') if not notes_data: return post["notes"] = notes = [] extr = text.extract for note in text.extract_iter(notes_data, ''): notes.append({ "width" : int(extr(note, 'data-width="', '"')[0]), "height": int(extr(note, 'data-height="', '"')[0]), "x" : int(extr(note, 'data-x="', '"')[0]), "y" : int(extr(note, 'data-y="', '"')[0]), "body" : extr(note, 'data-body="', '"')[0], }) class GelbooruTagExtractor(GelbooruBase, gelbooru_v02.GelbooruV02TagExtractor): """Extractor for images from gelbooru.com based on search-tags""" pattern = BASE_PATTERN + r"page=post&s=list&tags=([^&#]+)" test = ( ("https://gelbooru.com/index.php?page=post&s=list&tags=bonocho", { "count": 5, }), ("https://gelbooru.com/index.php?page=post&s=list&tags=meiya_neon", { "range": "196-204", "url": "845a61aa1f90fb4ced841e8b7e62098be2e967bf", "pattern": r"https://img\d\.gelbooru\.com" r"/images/../../[0-9a-f]{32}\.jpg", "count": 9, }), ) class GelbooruPoolExtractor(GelbooruBase, gelbooru_v02.GelbooruV02PoolExtractor): """Extractor for gelbooru pools""" per_page = 45 pattern = BASE_PATTERN + r"page=pool&s=show&id=(\d+)" test = ( ("https://gelbooru.com/index.php?page=pool&s=show&id=761", { "count": 6, }), ) def metadata(self): url = self.root + "/index.php" self._params = { "page": "pool", "s" : "show", "id" : self.pool_id, "pid" : self.page_start, } page = self.request(url, params=self._params).text name, pos = text.extract(page, "

    Now Viewing: ", "

    ") if not name: raise exception.NotFoundError("pool") return { "pool": text.parse_int(self.pool_id), "pool_name": text.unescape(name), } def posts(self): return self._pagination_html(self._params) class GelbooruFavoriteExtractor(GelbooruBase, gelbooru_v02.GelbooruV02FavoriteExtractor): pattern = BASE_PATTERN + r"page=favorites&s=view&id=(\d+)" test = ("https://gelbooru.com/index.php?page=favorites&s=view&id=12345",) class GelbooruPostExtractor(GelbooruBase, gelbooru_v02.GelbooruV02PostExtractor): """Extractor for single images from gelbooru.com""" pattern = (BASE_PATTERN + r"(?=(?:[^#]+&)?page=post(?:&|#|$))" r"(?=(?:[^#]+&)?s=view(?:&|#|$))" r"(?:[^#]+&)?id=(\d+)") test = ( ("https://gelbooru.com/index.php?page=post&s=view&id=313638", { "content": "5e255713cbf0a8e0801dc423563c34d896bb9229", "count": 1, }), ("https://gelbooru.com/index.php?page=post&s=view&id=313638"), ("https://gelbooru.com/index.php?s=view&page=post&id=313638"), ("https://gelbooru.com/index.php?page=post&id=313638&s=view"), ("https://gelbooru.com/index.php?s=view&id=313638&page=post"), ("https://gelbooru.com/index.php?id=313638&page=post&s=view"), ("https://gelbooru.com/index.php?id=313638&s=view&page=post"), ("https://gelbooru.com/index.php?page=post&s=view&id=6018318", { "options": (("tags", True),), "content": "977caf22f27c72a5d07ea4d4d9719acdab810991", "keyword": { "tags_artist": "kirisaki_shuusei", "tags_character": str, "tags_copyright": "vocaloid", "tags_general": str, "tags_metadata": str, }, }), # video ("https://gelbooru.com/index.php?page=post&s=view&id=5938076", { "content": "6360452fa8c2f0c1137749e81471238564df832a", "pattern": r"https://img\d\.gelbooru\.com/images" r"/22/61/226111273615049235b001b381707bd0\.webm", }), # notes ("https://gelbooru.com/index.php?page=post&s=view&id=5997331", { "options": (("notes", True),), "keyword": { "notes": [ { "body": "Look over this way when you talk~", "height": 553, "width": 246, "x": 35, "y": 72, }, { "body": "Hey~\nAre you listening~?", "height": 557, "width": 246, "x": 1233, "y": 109, }, ], }, }), ) class GelbooruRedirectExtractor(GelbooruBase, Extractor): subcategory = "redirect" pattern = (r"(?:https?://)?(?:www\.)?gelbooru\.com" r"/redirect\.php\?s=([^&#]+)") test = (("https://gelbooru.com/redirect.php?s=Ly9nZWxib29ydS5jb20vaW5kZXgu" "cGhwP3BhZ2U9cG9zdCZzPXZpZXcmaWQ9MTgzMDA0Ng=="), { "pattern": r"https://gelbooru.com/index.php" r"\?page=post&s=view&id=1830046" }) def __init__(self, match): Extractor.__init__(self, match) self.redirect_url = text.ensure_http_scheme( binascii.a2b_base64(match.group(1)).decode()) def items(self): data = {"_extractor": GelbooruPostExtractor} yield Message.Queue, self.redirect_url, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675106081.0 gallery_dl-1.25.0/gallery_dl/extractor/gelbooru_v01.py0000644000175000017500000001565614366013441021437 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Gelbooru Beta 0.1.11 sites""" from . import booru from .. 
import text class GelbooruV01Extractor(booru.BooruExtractor): basecategory = "gelbooru_v01" per_page = 20 def _parse_post(self, post_id): url = "{}/index.php?page=post&s=view&id={}".format( self.root, post_id) page = self.request(url).text post = text.extract_all(page, ( ("created_at", 'Posted: ', ' <'), ("uploader" , 'By: ', ' <'), ("width" , 'Size: ', 'x'), ("height" , '', ' <'), ("source" , 'Source:
    ', '<'), ))[0] post["id"] = post_id post["md5"] = post["file_url"].rpartition("/")[2].partition(".")[0] post["rating"] = (post["rating"] or "?")[0].lower() post["tags"] = text.unescape(post["tags"]) post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%d %H:%M:%S") return post def _pagination(self, url, begin, end): pid = self.page_start while True: page = self.request(url + str(pid)).text cnt = 0 for post_id in text.extract_iter(page, begin, end): yield self._parse_post(post_id) cnt += 1 if cnt < self.per_page: return pid += self.per_page BASE_PATTERN = GelbooruV01Extractor.update({ "thecollection": { "root": "https://the-collection.booru.org", "pattern": r"the-collection\.booru\.org", }, "illusioncardsbooru": { "root": "https://illusioncards.booru.org", "pattern": r"illusioncards\.booru\.org", }, "allgirlbooru": { "root": "https://allgirl.booru.org", "pattern": r"allgirl\.booru\.org", }, "drawfriends": { "root": "https://drawfriends.booru.org", "pattern": r"drawfriends\.booru\.org", }, "vidyart": { "root": "https://vidyart.booru.org", "pattern": r"vidyart\.booru\.org", }, }) class GelbooruV01TagExtractor(GelbooruV01Extractor): subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/index\.php\?page=post&s=list&tags=([^&#]+)" test = ( (("https://the-collection.booru.org" "/index.php?page=post&s=list&tags=parody"), { "range": "1-25", "count": 25, }), (("https://illusioncards.booru.org" "/index.php?page=post&s=list&tags=koikatsu"), { "range": "1-25", "count": 25, }), ("https://allgirl.booru.org/index.php?page=post&s=list&tags=dress", { "range": "1-25", "count": 25, }), ("https://drawfriends.booru.org/index.php?page=post&s=list&tags=all"), ("https://vidyart.booru.org/index.php?page=post&s=list&tags=all"), ) def __init__(self, match): GelbooruV01Extractor.__init__(self, match) self.tags = match.group(match.lastindex) def metadata(self): return {"search_tags": text.unquote(self.tags.replace("+", " "))} def posts(self): url = "{}/index.php?page=post&s=list&tags={}&pid=".format( self.root, self.tags) return self._pagination(url, 'class="thumb">')) if not tag_container: return tags = collections.defaultdict(list) pattern = re.compile( r"tag-type-([^\"' ]+).*?[?;]tags=([^\"'&]+)", re.S) for tag_type, tag_name in pattern.findall(tag_container): tags[tag_type].append(text.unquote(tag_name)) for key, value in tags.items(): post["tags_" + key] = " ".join(value) def _notes(self, post, page): note_container = text.extr(page, 'id="note-container"', "", ""))), }) def _file_url_realbooru(self, post): url = post["file_url"] md5 = post["md5"] if md5 not in post["preview_url"] or url.count("/") == 5: url = "{}/images/{}/{}/{}.{}".format( self.root, md5[0:2], md5[2:4], md5, url.rpartition(".")[2]) return url def _tags_realbooru(self, post, page): tag_container = text.extr(page, 'id="tagLink"', '') tags = collections.defaultdict(list) pattern = re.compile( r'= 64", }), ("https://tbib.org/index.php?page=post&s=list&tags=yuyaiyaui", { "count": ">= 120", }), ("https://hypnohub.net/index.php?page=post&s=list&tags=gonoike_biwa", { "url": "fe662b86d38c331fcac9c62af100167d404937dc", }), ) def __init__(self, match): GelbooruV02Extractor.__init__(self, match) tags = match.group(match.lastindex) self.tags = text.unquote(tags.replace("+", " ")) def metadata(self): return {"search_tags": self.tags} def posts(self): return self._pagination({"tags": self.tags}) class GelbooruV02PoolExtractor(GelbooruV02Extractor): subcategory = "pool" 
directory_fmt = ("{category}", "pool", "{pool}") archive_fmt = "p_{pool}_{id}" pattern = BASE_PATTERN + r"/index\.php\?page=pool&s=show&id=(\d+)" test = ( ("https://rule34.xxx/index.php?page=pool&s=show&id=179", { "count": 3, }), ("https://safebooru.org/index.php?page=pool&s=show&id=11", { "count": 5, }), ("https://realbooru.com/index.php?page=pool&s=show&id=1", { "count": 3, }), ("https://hypnohub.net/index.php?page=pool&s=show&id=61", { "url": "d314826280073441a2da609f70ee814d1f4b9407", "count": 3, }), ) def __init__(self, match): GelbooruV02Extractor.__init__(self, match) self.pool_id = match.group(match.lastindex) if self.category == "rule34": self.posts = self._posts_pages self.per_page = 45 else: self.post_ids = () def skip(self, num): self.page_start += num return num def metadata(self): url = "{}/index.php?page=pool&s=show&id={}".format( self.root, self.pool_id) page = self.request(url).text name, pos = text.extract(page, "

    Pool: ", "

    ") if not name: raise exception.NotFoundError("pool") self.post_ids = text.extract_iter( page, 'class="thumb" id="p', '"', pos) return { "pool": text.parse_int(self.pool_id), "pool_name": text.unescape(name), } def posts(self): params = {} for params["id"] in util.advance(self.post_ids, self.page_start): for post in self._api_request(params): yield post.attrib def _posts_pages(self): return self._pagination_html({ "page": "pool", "s" : "show", "id" : self.pool_id, }) class GelbooruV02FavoriteExtractor(GelbooruV02Extractor): subcategory = "favorite" directory_fmt = ("{category}", "favorites", "{favorite_id}") archive_fmt = "f_{favorite_id}_{id}" per_page = 50 pattern = BASE_PATTERN + r"/index\.php\?page=favorites&s=view&id=(\d+)" test = ( ("https://rule34.xxx/index.php?page=favorites&s=view&id=1030218", { "count": 3, }), ("https://safebooru.org/index.php?page=favorites&s=view&id=17567", { "count": 2, }), ("https://realbooru.com/index.php?page=favorites&s=view&id=274", { "count": 2, }), ("https://tbib.org/index.php?page=favorites&s=view&id=7881", { "count": 3, }), ("https://hypnohub.net/index.php?page=favorites&s=view&id=43546", { "count": 3, }), ) def __init__(self, match): GelbooruV02Extractor.__init__(self, match) self.favorite_id = match.group(match.lastindex) def metadata(self): return {"favorite_id": text.parse_int(self.favorite_id)} def posts(self): return self._pagination_html({ "page": "favorites", "s" : "view", "id" : self.favorite_id, }) class GelbooruV02PostExtractor(GelbooruV02Extractor): subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/index\.php\?page=post&s=view&id=(\d+)" test = ( ("https://rule34.xxx/index.php?page=post&s=view&id=863", { "pattern": r"https://api-cdn\.rule34\.xxx/images" r"/1/6aafbdb3e22f3f3b412ea2cf53321317a37063f3\.jpg", "content": ("a43f418aa350039af0d11cae501396a33bbe2201", "67b516295950867e1c1ab6bc13b35d3b762ed2a3"), "options": (("tags", True), ("notes", True)), "keyword": { "tags_artist": "reverse_noise yamu_(reverse_noise)", "tags_character": "hong_meiling", "tags_copyright": "touhou", "tags_general": str, "tags_metadata": "censored translated", "notes": [ { "body": "It feels angry, I'm losing myself... 
" "It won't calm down!", "height": 65, "id": 93586, "width": 116, "x": 22, "y": 333, }, { "body": "REPUTATION OF RAGE", "height": 272, "id": 93587, "width": 199, "x": 78, "y": 442, }, ], }, }), ("https://hypnohub.net/index.php?page=post&s=view&id=1439", { "pattern": r"https://hypnohub\.net/images" r"/90/24/90245c3c5250c2a8173255d3923a010b\.jpg", "content": "5987c5d2354f22e5fa9b7ee7ce4a6f7beb8b2b71", "options": (("tags", True), ("notes", True)), "keyword": { "tags_artist": "brokenteapot", "tags_character": "hsien-ko", "tags_copyright": "capcom darkstalkers", "tags_general": str, "tags_metadata": "dialogue text translated", "notes": [ { "body": "Master Master Master " "Master Master Master", "height": 83, "id": 10577, "width": 129, "x": 259, "y": 20, }, { "body": "Response Response Response " "Response Response Response", "height": 86, "id": 10578, "width": 125, "x": 126, "y": 20, }, { "body": "Obedience Obedience Obedience " "Obedience Obedience Obedience", "height": 80, "id": 10579, "width": 98, "x": 20, "y": 20, }, ], }, }), ("https://safebooru.org/index.php?page=post&s=view&id=1169132", { "url": "cf05e37a3c62b2d55788e2080b8eabedb00f999b", "content": "93b293b27dabd198afafabbaf87c49863ac82f27", "options": (("tags", True),), "keyword": { "tags_artist": "kawanakajima", "tags_character": "heath_ledger ronald_mcdonald the_joker", "tags_copyright": "dc_comics mcdonald's the_dark_knight", "tags_general": str, }, }), ("https://realbooru.com/index.php?page=post&s=view&id=668483", { "pattern": r"https://realbooru\.com/images/dc/b5" r"/dcb5c0ce9ec0bf74a6930608985f4719\.jpeg", "content": "7f5873ce3b6cd295ea2e81fcb49583098ea9c8da", "options": (("tags", True),), "keyword": { "tags_general": "1girl blonde blonde_hair blue_eyes cute " "female female_only looking_at_viewer smile " "solo solo_female teeth", "tags_model": "jennifer_lawrence", }, }), ("https://tbib.org/index.php?page=post&s=view&id=9233957", { "url": "5a6ebe07bfff8e6d27f7c30b5480f27abcb577d2", "content": "1c3831b6fbaa4686e3c79035b5d98460b1c85c43", }), ) def __init__(self, match): GelbooruV02Extractor.__init__(self, match) self.post_id = match.group(match.lastindex) def posts(self): return self._pagination({"id": self.post_id}) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1678553031.0 gallery_dl-1.25.0/gallery_dl/extractor/generic.py0000644000175000017500000002132414403127707020537 0ustar00mikemike# -*- coding: utf-8 -*- """Extractor for images in a generic web page.""" from .common import Extractor, Message from .. import config, text import re import os.path class GenericExtractor(Extractor): """Extractor for images in a generic web page.""" category = "generic" directory_fmt = ("{category}", "{pageurl}") archive_fmt = "{imageurl}" # By default, the generic extractor is disabled # and the "g(eneric):" prefix in url is required. # If the extractor is enabled, make the prefix optional pattern = r"(?ix)(?Pg(?:eneric)?:)" if config.get(("extractor", "generic"), "enabled"): pattern += r"?" # The generic extractor pattern should match (almost) any valid url # Based on: https://tools.ietf.org/html/rfc3986#appendix-B pattern += r""" (?Phttps?://)? # optional http(s) scheme (?P[-\w\.]+) # required domain (?P/[^?#]*)? # optional path (?:\?(?P[^#]*))? # optional query (?:\#(?P.*))? 
# optional fragment """ test = ( ("generic:https://www.nongnu.org/lzip/", { "count": 1, "content": "40be5c77773d3e91db6e1c5df720ee30afb62368", "keyword": { "description": "Lossless data compressor", "imageurl": "https://www.nongnu.org/lzip/lzip.png", "keywords": "lzip, clzip, plzip, lzlib, LZMA, bzip2, " "gzip, data compression, GNU, free software", "pageurl": "https://www.nongnu.org/lzip/", }, }), # internationalized domain name ("generic:https://räksmörgås.josefsson.org/", { "count": 2, "pattern": "^https://räksmörgås.josefsson.org/", }), ("generic:https://en.wikipedia.org/Main_Page"), ("generic:https://example.org/path/to/file?que=1?&ry=2/#fragment"), ("generic:https://example.org/%27%3C%23/%23%3E%27.htm?key=%3C%26%3E"), ) def __init__(self, match): """Init.""" Extractor.__init__(self, match) # Strip the "g(eneric):" prefix # and inform about "forced" or "fallback" mode if match.group('generic'): self.log.info("Forcing use of generic information extractor.") self.url = match.group(0).partition(":")[2] else: self.log.info("Falling back on generic information extractor.") self.url = match.group(0) # Make sure we have a scheme, or use https if match.group('scheme'): self.scheme = match.group('scheme') else: self.scheme = 'https://' self.url = self.scheme + self.url # Used to resolve relative image urls self.root = self.scheme + match.group('domain') def items(self): """Get page, extract metadata & images, yield them in suitable messages Adapted from common.GalleryExtractor.items() """ page = self.request(self.url).text data = self.metadata(page) imgs = self.images(page) try: data["count"] = len(imgs) except TypeError: pass images = enumerate(imgs, 1) yield Message.Version, 1 yield Message.Directory, data for data["num"], (url, imgdata) in images: if imgdata: data.update(imgdata) if "extension" not in imgdata: text.nameext_from_url(url, data) else: text.nameext_from_url(url, data) yield Message.Url, url, data def metadata(self, page): """Extract generic webpage metadata, return them in a dict.""" data = {} data['pageurl'] = self.url data['title'] = text.extr(page, '', "") data['description'] = text.extr( page, ',
    View ', '<', pos) data["chapter"] = text.parse_int(url.rpartition("/")[2][1:]) data["title"] = title results.append((text.urljoin(self.root, url), data.copy())) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675806009.0 gallery_dl-1.25.0/gallery_dl/extractor/hentai2read.py0000644000175000017500000001241014370542471021307 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentai2read.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text, util import re class Hentai2readBase(): """Base class for hentai2read extractors""" category = "hentai2read" root = "https://hentai2read.com" class Hentai2readChapterExtractor(Hentai2readBase, ChapterExtractor): """Extractor for a single manga chapter from hentai2read.com""" archive_fmt = "{chapter_id}_{page}" pattern = r"(?:https?://)?(?:www\.)?hentai2read\.com(/[^/?#]+/([^/?#]+))" test = ( ("https://hentai2read.com/amazon_elixir/1/", { "url": "964b942cf492b3a129d2fe2608abfc475bc99e71", "keyword": "85645b02d34aa11b3deb6dadd7536863476e1bad", }), ("https://hentai2read.com/popuni_kei_joshi_panic/2.5/", { "pattern": r"https://hentaicdn\.com/hentai" r"/13088/2\.5y/ccdn00\d+\.jpg", "count": 36, "keyword": { "author": "Kurisu", "chapter": 2, "chapter_id": 75152, "chapter_minor": ".5", "count": 36, "lang": "en", "language": "English", "manga": "Popuni Kei Joshi Panic!", "manga_id": 13088, "page": int, "title": "Popuni Kei Joshi Panic! 2.5", "type": "Original", }, }), ) def __init__(self, match): self.chapter = match.group(2) ChapterExtractor.__init__(self, match) def metadata(self, page): title, pos = text.extract(page, "", "") manga_id, pos = text.extract(page, 'data-mid="', '"', pos) chapter_id, pos = text.extract(page, 'data-cid="', '"', pos) chapter, sep, minor = self.chapter.partition(".") match = re.match(r"Reading (.+) \(([^)]+)\) Hentai(?: by (.+))? - " r"([^:]+): (.+) . 
Page 1 ", title) return { "manga": match.group(1), "manga_id": text.parse_int(manga_id), "chapter": text.parse_int(chapter), "chapter_minor": sep + minor, "chapter_id": text.parse_int(chapter_id), "type": match.group(2), "author": match.group(3), "title": match.group(5), "lang": "en", "language": "English", } def images(self, page): images = text.extract(page, "'images' : ", ",\n")[0] return [ ("https://hentaicdn.com/hentai" + part, None) for part in util.json_loads(images) ] class Hentai2readMangaExtractor(Hentai2readBase, MangaExtractor): """Extractor for hmanga from hentai2read.com""" chapterclass = Hentai2readChapterExtractor pattern = r"(?:https?://)?(?:www\.)?hentai2read\.com(/[^/?#]+)/?$" test = ( ("https://hentai2read.com/amazon_elixir/", { "url": "273073752d418ec887d7f7211e42b832e8c403ba", "keyword": "5c1b712258e78e120907121d3987c71f834d13e1", }), ("https://hentai2read.com/oshikage_riot/", { "url": "6595f920a3088a15c2819c502862d45f8eb6bea6", "keyword": "a2e9724acb221040d4b29bf9aa8cb75b2240d8af", }), ("https://hentai2read.com/popuni_kei_joshi_panic/", { "pattern": Hentai2readChapterExtractor.pattern, "range": "2-3", "keyword": { "chapter": int, "chapter_id": int, "chapter_minor": ".5", "lang": "en", "language": "English", "manga": "Popuni Kei Joshi Panic!", "manga_id": 13088, "title": str, "type": "Original", }, }), ) def chapters(self, page): results = [] pos = page.find('itemscope itemtype="http://schema.org/Book') + 1 manga, pos = text.extract( page, '', '', pos) mtype, pos = text.extract( page, '[', ']', pos) manga_id = text.parse_int(text.extract( page, 'data-mid="', '"', pos)[0]) while True: chapter_id, pos = text.extract(page, ' data-cid="', '"', pos) if not chapter_id: return results _ , pos = text.extract(page, ' href="', '"', pos) url, pos = text.extract(page, ' href="', '"', pos) chapter, pos = text.extract(page, '>', '<', pos) chapter, _, title = text.unescape(chapter).strip().partition(" - ") chapter, sep, minor = chapter.partition(".") results.append((url, { "manga": manga, "manga_id": manga_id, "chapter": text.parse_int(chapter), "chapter_minor": sep + minor, "chapter_id": text.parse_int(chapter_id), "type": mtype, "title": title, "lang": "en", "language": "English", })) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/hentaicosplays.py0000644000175000017500000000524214363473075022161 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentai-cosplays.com/ (also works for hentai-img.com and porn-images-xxx.com)""" from .common import GalleryExtractor from .. import text class HentaicosplaysGalleryExtractor(GalleryExtractor): """Extractor for image galleries from hentai-cosplays.com, hentai-img.com, and porn-images-xxx.com""" category = "hentaicosplays" directory_fmt = ("{site}", "{title}") filename_fmt = "{filename}.{extension}" archive_fmt = "{title}_{filename}" pattern = r"((?:https?://)?(?:\w{2}\.)?" 
\ r"(hentai-cosplays|hentai-img|porn-images-xxx)\.com)/" \ r"(?:image|story)/([\w-]+)" test = ( ("https://hentai-cosplays.com/image/---devilism--tide-kurihara-/", { "pattern": r"https://static\d?.hentai-cosplays.com/upload/" r"\d+/\d+/\d+/\d+.jpg$", "keyword": { "count": 18, "site": "hentai-cosplays", "slug": "---devilism--tide-kurihara-", "title": "艦 こ れ-devilism の tide Kurihara 憂", }, }), ("https://fr.porn-images-xxx.com/image/enako-enako-24/", { "pattern": r"https://static\d?.porn-images-xxx.com/upload/" r"\d+/\d+/\d+/\d+.jpg$", "keyword": { "count": 11, "site": "porn-images-xxx", "title": str, }, }), ("https://ja.hentai-img.com/image/hollow-cora-502/", { "pattern": r"https://static\d?.hentai-img.com/upload/" r"\d+/\d+/\d+/\d+.jpg$", "keyword": { "count": 2, "site": "hentai-img", "title": str, }, }), ) def __init__(self, match): root, self.site, self.slug = match.groups() self.root = text.ensure_http_scheme(root) url = "{}/story/{}/".format(self.root, self.slug) GalleryExtractor.__init__(self, match, url) self.session.headers["Referer"] = url def metadata(self, page): title = text.extr(page, "", "") return { "title": text.unescape(title.rpartition(" Story Viewer - ")[0]), "slug" : self.slug, "site" : self.site, } def images(self, page): return [ (url, None) for url in text.extract_iter( page, '', '<')), "artist" : text.unescape(extr('/profile">', '<')), "width" : text.parse_int(extr('width="', '"')), "height" : text.parse_int(extr('height="', '"')), "index" : text.parse_int(path.rsplit("/", 2)[1]), "src" : text.urljoin(self.root, text.unescape(extr( 'src="', '"'))), "description": text.unescape(text.remove_html(extr( '>Description', '
    ') .replace("\r\n", "\n"), "", "")), "ratings" : [text.unescape(r) for r in text.extract_iter(extr( "class='ratings_box'", ""), "title='", "'")], "date" : text.parse_datetime(extr("datetime='", "'")), "views" : text.parse_int(extr(">Views", "<")), "score" : text.parse_int(extr(">Vote Score", "<")), "media" : text.unescape(extr(">Media", "<").strip()), "tags" : text.split_html(extr( ">Tags ", "")), } return text.nameext_from_url(data["src"], data) def _parse_story(self, html): """Collect url and metadata for a story""" extr = text.extract_from(html) data = { "user" : self.user, "title" : text.unescape(extr( "
    ", "").rpartition(">")[2]), "author" : text.unescape(extr('alt="', '"')), "date" : text.parse_datetime(extr( ">Updated<", "").rpartition(">")[2], "%B %d, %Y"), "status" : extr("class='indent'>", "<"), } for c in ("Chapters", "Words", "Comments", "Views", "Rating"): data[c.lower()] = text.parse_int(extr( ">" + c + ":", "<").replace(",", "")) data["description"] = text.unescape(extr( "class='storyDescript'>", ""), "title='", "'")] return text.nameext_from_url(data["src"], data) def _init_site_filters(self): """Set site-internal filters to show all images""" url = self.root + "/?enterAgree=1" self.request(url, method="HEAD") csrf_token = self.session.cookies.get( "YII_CSRF_TOKEN", domain=self.cookiedomain) if not csrf_token: self.log.warning("Unable to update site content filters") return url = self.root + "/site/filters" data = { "rating_nudity" : "3", "rating_violence" : "3", "rating_profanity": "3", "rating_racism" : "3", "rating_sex" : "3", "rating_spoilers" : "3", "rating_yaoi" : "1", "rating_yuri" : "1", "rating_teen" : "1", "rating_guro" : "1", "rating_furry" : "1", "rating_beast" : "1", "rating_male" : "1", "rating_female" : "1", "rating_futa" : "1", "rating_other" : "1", "rating_scat" : "1", "rating_incest" : "1", "rating_rape" : "1", "filter_media" : "A", "filter_order" : "date_new", "filter_type" : "0", "YII_CSRF_TOKEN" : text.unquote(text.extr( csrf_token, "%22", "%22")), } self.request(url, method="POST", data=data) class HentaifoundryUserExtractor(HentaifoundryExtractor): """Extractor for a hentaifoundry user profile""" subcategory = "user" pattern = BASE_PATTERN + r"/user/([^/?#]+)/profile" test = ("https://www.hentai-foundry.com/user/Tenpura/profile",) def items(self): root = self.root user = "/user/" + self.user return self._dispatch_extractors(( (HentaifoundryPicturesExtractor , root + "/pictures" + user), (HentaifoundryScrapsExtractor, root + "/pictures" + user + "/scraps"), (HentaifoundryStoriesExtractor, root + "/stories" + user), (HentaifoundryFavoriteExtractor, root + user + "/faves/pictures"), ), ("pictures",)) class HentaifoundryPicturesExtractor(HentaifoundryExtractor): """Extractor for all pictures of a hentaifoundry user""" subcategory = "pictures" pattern = BASE_PATTERN + r"/pictures/user/([^/?#]+)(?:/page/(\d+))?/?$" test = ( ("https://www.hentai-foundry.com/pictures/user/Tenpura", { "url": "ebbc981a85073745e3ca64a0f2ab31fab967fc28", }), ("https://www.hentai-foundry.com/pictures/user/Tenpura/page/3"), ) def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.page_url = "{}/pictures/user/{}".format(self.root, self.user) class HentaifoundryScrapsExtractor(HentaifoundryExtractor): """Extractor for scraps of a hentaifoundry user""" subcategory = "scraps" directory_fmt = ("{category}", "{user}", "Scraps") pattern = BASE_PATTERN + r"/pictures/user/([^/?#]+)/scraps" test = ( ("https://www.hentai-foundry.com/pictures/user/Evulchibi/scraps", { "url": "7cd9c6ec6258c4ab8c44991f7731be82337492a7", }), ("https://www.hentai-foundry.com" "/pictures/user/Evulchibi/scraps/page/3"), ) def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.page_url = "{}/pictures/user/{}/scraps".format( self.root, self.user) class HentaifoundryFavoriteExtractor(HentaifoundryExtractor): """Extractor for favorite images of a hentaifoundry user""" subcategory = "favorite" directory_fmt = ("{category}", "{user}", "Favorites") archive_fmt = "f_{user}_{index}" pattern = BASE_PATTERN + r"/user/([^/?#]+)/faves/pictures" test = ( 
("https://www.hentai-foundry.com/user/Tenpura/faves/pictures", { "url": "56f9ae2e89fe855e9fe1da9b81e5ec6212b0320b", }), ("https://www.hentai-foundry.com" "/user/Tenpura/faves/pictures/page/3"), ) def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.page_url = "{}/user/{}/faves/pictures".format( self.root, self.user) class HentaifoundryRecentExtractor(HentaifoundryExtractor): """Extractor for 'Recent Pictures' on hentaifoundry.com""" subcategory = "recent" directory_fmt = ("{category}", "Recent Pictures", "{date}") archive_fmt = "r_{index}" pattern = BASE_PATTERN + r"/pictures/recent/(\d\d\d\d-\d\d-\d\d)" test = ("https://www.hentai-foundry.com/pictures/recent/2018-09-20", { "pattern": r"https://pictures.hentai-foundry.com/[^/]/[^/?#]+/\d+/", "range": "20-30", }) def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.page_url = "{}/pictures/recent/{}".format(self.root, self.user) def metadata(self): return {"date": self.user} class HentaifoundryPopularExtractor(HentaifoundryExtractor): """Extractor for popular images on hentaifoundry.com""" subcategory = "popular" directory_fmt = ("{category}", "Popular Pictures") archive_fmt = "p_{index}" pattern = BASE_PATTERN + r"/pictures/popular()" test = ("https://www.hentai-foundry.com/pictures/popular", { "pattern": r"https://pictures.hentai-foundry.com/[^/]/[^/?#]+/\d+/", "range": "20-30", }) def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.page_url = self.root + "/pictures/popular" class HentaifoundryImageExtractor(HentaifoundryExtractor): """Extractor for a single image from hentaifoundry.com""" subcategory = "image" pattern = (r"(https?://)?(?:www\.|pictures\.)?hentai-foundry\.com" r"/(?:pictures/user|[^/?#])/([^/?#]+)/(\d+)") test = ( (("https://www.hentai-foundry.com" "/pictures/user/Tenpura/407501/shimakaze"), { "url": "fbf2fd74906738094e2575d2728e8dc3de18a8a3", "content": "91bf01497c39254b6dfb234a18e8f01629c77fd1", "keyword": { "artist" : "Tenpura", "date" : "dt:2016-02-22 14:41:19", "description": "Thank you!", "height" : 700, "index" : 407501, "media" : "Other digital art", "ratings": ["Sexual content", "Contains female nudity"], "score" : int, "tags" : ["collection", "kancolle", "kantai", "shimakaze"], "title" : "shimakaze", "user" : "Tenpura", "views" : int, "width" : 495, }, }), ("http://www.hentai-foundry.com/pictures/user/Tenpura/407501/", { "pattern": "http://pictures.hentai-foundry.com/t/Tenpura/407501/", }), ("https://www.hentai-foundry.com/pictures/user/Tenpura/407501/"), ("https://pictures.hentai-foundry.com" "/t/Tenpura/407501/Tenpura-407501-shimakaze.png"), ) skip = Extractor.skip def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.index = match.group(3) def items(self): post_url = "{}/pictures/user/{}/{}/?enterAgree=1".format( self.root, self.user, self.index) image = self._parse_post(post_url) image["user"] = self.user yield Message.Directory, image yield Message.Url, image["src"], image class HentaifoundryStoriesExtractor(HentaifoundryExtractor): """Extractor for stories of a hentaifoundry user""" subcategory = "stories" archive_fmt = "s_{index}" pattern = BASE_PATTERN + r"/stories/user/([^/?#]+)(?:/page/(\d+))?/?$" test = ("https://www.hentai-foundry.com/stories/user/SnowWolf35", { "count": ">= 35", "keyword": { "author" : "SnowWolf35", "chapters" : int, "comments" : int, "date" : "type:datetime", "description": str, "index" : int, "rating" : int, "ratings" : list, "status" : "re:(Inc|C)omplete", "title" : str, "user" : 
"SnowWolf35", "views" : int, "words" : int, }, }) def items(self): self._init_site_filters() for story_html in util.advance(self.stories(), self.start_post): story = self._parse_story(story_html) yield Message.Directory, story yield Message.Url, story["src"], story def stories(self): url = "{}/stories/user/{}".format(self.root, self.user) return self._pagination(url, '
    ', '') class HentaifoundryStoryExtractor(HentaifoundryExtractor): """Extractor for a hentaifoundry story""" subcategory = "story" archive_fmt = "s_{index}" pattern = BASE_PATTERN + r"/stories/user/([^/?#]+)/(\d+)" test = (("https://www.hentai-foundry.com/stories/user/SnowWolf35" "/26416/Overwatch-High-Chapter-Voting-Location"), { "url": "5a67cfa8c3bf7634c8af8485dd07c1ea74ee0ae8", "keyword": {"title": "Overwatch High Chapter Voting Location"}, }) skip = Extractor.skip def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.index = match.group(3) def items(self): story_url = "{}/stories/user/{}/{}/x?enterAgree=1".format( self.root, self.user, self.index) story = self._parse_story(self.request(story_url).text) yield Message.Directory, story yield Message.Url, story["src"], story ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675805990.0 gallery_dl-1.25.0/gallery_dl/extractor/hentaifox.py0000644000175000017500000001256514370542446021123 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentaifox.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util class HentaifoxBase(): """Base class for hentaifox extractors""" category = "hentaifox" root = "https://hentaifox.com" class HentaifoxGalleryExtractor(HentaifoxBase, GalleryExtractor): """Extractor for image galleries on hentaifox.com""" pattern = r"(?:https?://)?(?:www\.)?hentaifox\.com(/gallery/(\d+))" test = ( ("https://hentaifox.com/gallery/56622/", { "pattern": r"https://i\d*\.hentaifox\.com/\d+/\d+/\d+\.jpg", "keyword": "bcd6b67284f378e5cc30b89b761140e3e60fcd92", "count": 24, }), # 'split_tag' element (#1378) ("https://hentaifox.com/gallery/630/", { "keyword": { "artist": ["beti", "betty", "magi", "mimikaki"], "characters": [ "aerith gainsborough", "tifa lockhart", "yuffie kisaragi" ], "count": 32, "gallery_id": 630, "group": ["cu-little2"], "parody": ["darkstalkers | vampire", "final fantasy vii"], "tags": ["femdom", "fingering", "masturbation", "yuri"], "title": "Cu-Little Bakanya~", "type": "doujinshi", }, }), ) def __init__(self, match): GalleryExtractor.__init__(self, match) self.gallery_id = match.group(2) @staticmethod def _split(txt): return [ text.remove_html(tag.partition(">")[2], "", "") for tag in text.extract_iter( txt, "class='tag_btn", "= 60", "keyword": { "url" : str, "gallery_id": int, "title" : str, }, }), ) def __init__(self, match): Extractor.__init__(self, match) self.path = match.group(1) def items(self): for gallery in self.galleries(): yield Message.Queue, gallery["url"], gallery def galleries(self): num = 1 while True: url = "{}{}/pag/{}/".format(self.root, self.path, num) page = self.request(url).text for info in text.extract_iter( page, 'class="g_title">') yield { "url" : text.urljoin(self.root, url), "gallery_id": text.parse_int( url.strip("/").rpartition("/")[2]), "title" : text.unescape(title), "_extractor": HentaifoxGalleryExtractor, } pos = page.find(">Next<") url = text.rextract(page, "href=", ">", pos)[0] if pos == -1 or "/pag" not in url: return num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675806057.0 gallery_dl-1.25.0/gallery_dl/extractor/hentaihand.py0000644000175000017500000001070114370542551021224 0ustar00mikemike# -*- coding: 
utf-8 -*- # Copyright 2020-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentaihand.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util class HentaihandGalleryExtractor(GalleryExtractor): """Extractor for image galleries on hentaihand.com""" category = "hentaihand" root = "https://hentaihand.com" pattern = r"(?:https?://)?(?:www\.)?hentaihand\.com/\w+/comic/([\w-]+)" test = ( (("https://hentaihand.com/en/comic/c75-takumi-na-muchi-choudenji-hou-" "no-aishi-kata-how-to-love-a-super-electromagnetic-gun-toaru-kagaku-" "no-railgun-english"), { "pattern": r"https://cdn.hentaihand.com/.*/images/37387/\d+.jpg$", "count": 50, "keyword": { "artists" : ["Takumi Na Muchi"], "date" : "dt:2014-06-28 00:00:00", "gallery_id": 37387, "lang" : "en", "language" : "English", "parodies" : ["Toaru Kagaku No Railgun"], "relationships": list, "tags" : list, "title" : r"re:\(C75\) \[Takumi na Muchi\] Choudenji Hou ", "title_alt" : r"re:\(C75\) \[たくみなむち\] 超電磁砲のあいしかた", "type" : "Doujinshi", }, }), ) def __init__(self, match): self.slug = match.group(1) url = "{}/api/comics/{}".format(self.root, self.slug) GalleryExtractor.__init__(self, match, url) def metadata(self, page): info = util.json_loads(page) data = { "gallery_id" : text.parse_int(info["id"]), "title" : info["title"], "title_alt" : info["alternative_title"], "slug" : self.slug, "type" : info["category"]["name"], "language" : info["language"]["name"], "lang" : util.language_to_code(info["language"]["name"]), "tags" : [t["slug"] for t in info["tags"]], "date" : text.parse_datetime( info["uploaded_at"], "%Y-%m-%d"), } for key in ("artists", "authors", "groups", "characters", "relationships", "parodies"): data[key] = [v["name"] for v in info[key]] return data def images(self, _): info = self.request(self.gallery_url + "/images").json() return [(img["source_url"], img) for img in info["images"]] class HentaihandTagExtractor(Extractor): """Extractor for tag searches on hentaihand.com""" category = "hentaihand" subcategory = "tag" root = "https://hentaihand.com" pattern = (r"(?i)(?:https?://)?(?:www\.)?hentaihand\.com" r"/\w+/(parody|character|tag|artist|group|language" r"|category|relationship)/([^/?#]+)") test = ( ("https://hentaihand.com/en/artist/takumi-na-muchi", { "pattern": HentaihandGalleryExtractor.pattern, "count": ">= 6", }), ("https://hentaihand.com/en/tag/full-color"), ("https://hentaihand.com/fr/language/japanese"), ("https://hentaihand.com/zh/category/manga"), ) def __init__(self, match): Extractor.__init__(self, match) self.type, self.key = match.groups() def items(self): if self.type[-1] == "y": tpl = self.type[:-1] + "ies" else: tpl = self.type + "s" url = "{}/api/{}/{}".format(self.root, tpl, self.key) tid = self.request(url, notfound=self.type).json()["id"] url = self.root + "/api/comics" params = { "per_page": "18", tpl : tid, "page" : 1, "q" : "", "sort" : "uploaded_at", "order" : "desc", "duration": "day", } while True: info = self.request(url, params=params).json() for gallery in info["data"]: gurl = "{}/en/comic/{}".format(self.root, gallery["slug"]) gallery["_extractor"] = HentaihandGalleryExtractor yield Message.Queue, gurl, gallery if params["page"] >= info["last_page"]: return params["page"] += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675806043.0 
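# --- Illustrative aside (not part of hentaihand.py) -----------------------------
# A minimal standalone sketch of the tag-search flow implemented by
# HentaihandTagExtractor above, using the plain "requests" library instead of
# the extractor framework. hentaihand_tag_galleries() is a hypothetical helper
# written only to show the resolve-id-then-paginate pattern.
import requests

def hentaihand_tag_galleries(tag_type, key, root="https://hentaihand.com"):
    # pluralize the tag type the same way the extractor does ("parody" -> "parodies")
    plural = tag_type[:-1] + "ies" if tag_type.endswith("y") else tag_type + "s"
    tid = requests.get("{}/api/{}/{}".format(root, plural, key)).json()["id"]

    params = {"per_page": "18", plural: tid, "page": 1, "q": "",
              "sort": "uploaded_at", "order": "desc", "duration": "day"}
    while True:
        info = requests.get(root + "/api/comics", params=params).json()
        for gallery in info["data"]:
            yield "{}/en/comic/{}".format(root, gallery["slug"])
        if params["page"] >= info["last_page"]:
            return
        params["page"] += 1
# --------------------------------------------------------------------------------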
gallery_dl-1.25.0/gallery_dl/extractor/hentaihere.py0000644000175000017500000001223214370542533021236 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentaihere.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text, util import re class HentaihereBase(): """Base class for hentaihere extractors""" category = "hentaihere" root = "https://hentaihere.com" class HentaihereChapterExtractor(HentaihereBase, ChapterExtractor): """Extractor for a single manga chapter from hentaihere.com""" archive_fmt = "{chapter_id}_{page}" pattern = r"(?:https?://)?(?:www\.)?hentaihere\.com/m/S(\d+)/([^/?#]+)" test = ( ("https://hentaihere.com/m/S13812/1/1/", { "url": "964b942cf492b3a129d2fe2608abfc475bc99e71", "keyword": "0207d20eea3a15d2a8d1496755bdfa49de7cfa9d", }), ("https://hentaihere.com/m/S23048/1.5/1/", { "pattern": r"https://hentaicdn\.com/hentai" r"/23048/1\.5/ccdn00\d+\.jpg", "count": 32, "keyword": { "author": "Shinozuka Yuuji", "chapter": 1, "chapter_id": 80186, "chapter_minor": ".5", "count": 32, "lang": "en", "language": "English", "manga": "High School Slut's Love Consultation", "manga_id": 23048, "page": int, "title": "High School Slut's Love Consultation + " "Girlfriend [Full Color]", "type": "Original", }, }), ) def __init__(self, match): self.manga_id, self.chapter = match.groups() url = "{}/m/S{}/{}/1".format(self.root, self.manga_id, self.chapter) ChapterExtractor.__init__(self, match, url) def metadata(self, page): title = text.extr(page, "", "") chapter_id = text.extr(page, 'report/C', '"') chapter, sep, minor = self.chapter.partition(".") pattern = r"Page 1 \| (.+) \(([^)]+)\) - Chapter \d+: (.+) by (.+) at " match = re.match(pattern, title) return { "manga": match.group(1), "manga_id": text.parse_int(self.manga_id), "chapter": text.parse_int(chapter), "chapter_minor": sep + minor, "chapter_id": text.parse_int(chapter_id), "type": match.group(2), "title": match.group(3), "author": match.group(4), "lang": "en", "language": "English", } @staticmethod def images(page): images = text.extr(page, "var rff_imageList = ", ";") return [ ("https://hentaicdn.com/hentai" + part, None) for part in util.json_loads(images) ] class HentaihereMangaExtractor(HentaihereBase, MangaExtractor): """Extractor for hmanga from hentaihere.com""" chapterclass = HentaihereChapterExtractor pattern = r"(?:https?://)?(?:www\.)?hentaihere\.com(/m/S\d+)/?$" test = ( ("https://hentaihere.com/m/S13812", { "url": "d1ba6e28bb2162e844f8559c2b2725ba0a093559", "keyword": "5c1b712258e78e120907121d3987c71f834d13e1", }), ("https://hentaihere.com/m/S7608", { "url": "6c5239758dc93f6b1b4175922836c10391b174f7", "keyword": { "chapter": int, "chapter_id": int, "chapter_minor": "", "lang": "en", "language": "English", "manga": "Oshikake Riot", "manga_id": 7608, "title": r"re:Oshikake Riot( \d+)?", "type": "Original", }, }), ) def chapters(self, page): results = [] pos = page.find('itemscope itemtype="http://schema.org/Book') + 1 manga, pos = text.extract( page, '', '', pos) mtype, pos = text.extract( page, '[', ']', pos) manga_id = text.parse_int( self.manga_url.rstrip("/").rpartition("/")[2][1:]) while True: marker, pos = text.extract( page, '
  • ', '', pos) if marker is None: return results url, pos = text.extract(page, '\n', '<', pos) chapter_id, pos = text.extract(page, '/C', '"', pos) chapter, _, title = text.unescape(chapter).strip().partition(" - ") chapter, sep, minor = chapter.partition(".") results.append((url, { "manga_id": manga_id, "manga": manga, "chapter": text.parse_int(chapter), "chapter_minor": sep + minor, "chapter_id": text.parse_int(chapter_id), "type": mtype, "title": title, "lang": "en", "language": "English", })) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674918287.0 gallery_dl-1.25.0/gallery_dl/extractor/hiperdex.py0000644000175000017500000001603614365234617020745 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://1sthiperdex.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text from ..cache import memcache import re BASE_PATTERN = (r"((?:https?://)?(?:www\.)?" r"(?:1st)?hiperdex\d?\.(?:com|net|info))") class HiperdexBase(): """Base class for hiperdex extractors""" category = "hiperdex" root = "https://1sthiperdex.com" @memcache(keyarg=1) def manga_data(self, manga, page=None): if not page: url = "{}/manga/{}/".format(self.root, manga) page = self.request(url).text extr = text.extract_from(page) return { "manga" : text.unescape(extr( "", "<").rpartition("&")[0].strip()), "score" : text.parse_float(extr( 'id="averagerate">', '<')), "author" : text.remove_html(extr( 'class="author-content">', '</div>')), "artist" : text.remove_html(extr( 'class="artist-content">', '</div>')), "genre" : text.split_html(extr( 'class="genres-content">', '</div>'))[::2], "type" : extr( 'class="summary-content">', '<').strip(), "release": text.parse_int(text.remove_html(extr( 'class="summary-content">', '</div>'))), "status" : extr( 'class="summary-content">', '<').strip(), "description": text.remove_html(text.unescape(extr( 'class="description-summary">', '</div>'))), "language": "English", "lang" : "en", } def chapter_data(self, chapter): if chapter.startswith("chapter-"): chapter = chapter[8:] chapter, _, minor = chapter.partition("-") data = { "chapter" : text.parse_int(chapter), "chapter_minor": "." 
+ minor if minor and minor != "end" else "", } data.update(self.manga_data(self.manga.lower())) return data class HiperdexChapterExtractor(HiperdexBase, ChapterExtractor): """Extractor for manga chapters from 1sthiperdex.com""" pattern = BASE_PATTERN + r"(/manga/([^/?#]+)/([^/?#]+))" test = ( ("https://1sthiperdex.com/manga/domestic-na-kanojo/154-5/", { "pattern": r"https://(1st)?hiperdex\d?.(com|net|info)" r"/wp-content/uploads/WP-manga/data" r"/manga_\w+/[0-9a-f]{32}/\d+\.webp", "count": 9, "keyword": { "artist" : "Sasuga Kei", "author" : "Sasuga Kei", "chapter": 154, "chapter_minor": ".5", "description": "re:Natsuo Fujii is in love with his teacher, ", "genre" : list, "manga" : "Domestic na Kanojo", "release": 2014, "score" : float, "type" : "Manga", }, }), ("https://hiperdex.com/manga/domestic-na-kanojo/154-5/"), ("https://hiperdex2.com/manga/domestic-na-kanojo/154-5/"), ("https://hiperdex.net/manga/domestic-na-kanojo/154-5/"), ("https://hiperdex.info/manga/domestic-na-kanojo/154-5/"), ) def __init__(self, match): root, path, self.manga, self.chapter = match.groups() self.root = text.ensure_http_scheme(root) ChapterExtractor.__init__(self, match, self.root + path + "/") def metadata(self, _): return self.chapter_data(self.chapter) def images(self, page): return [ (url.strip(), None) for url in re.findall( r'id="image-\d+"\s+(?:data-)?src="([^"]+)', page) ] class HiperdexMangaExtractor(HiperdexBase, MangaExtractor): """Extractor for manga from 1sthiperdex.com""" chapterclass = HiperdexChapterExtractor pattern = BASE_PATTERN + r"(/manga/([^/?#]+))/?$" test = ( ("https://1sthiperdex.com/manga/youre-not-that-special/", { "count": 51, "pattern": HiperdexChapterExtractor.pattern, "keyword": { "artist" : "Bolp", "author" : "Abyo4", "chapter": int, "chapter_minor": "", "description": "re:I didn’t think much of the creepy girl in ", "genre" : list, "manga" : "You’re Not That Special!", "release": 2019, "score" : float, "status" : "Completed", "type" : "Manhwa", }, }), ("https://hiperdex.com/manga/youre-not-that-special/"), ("https://hiperdex2.com/manga/youre-not-that-special/"), ("https://hiperdex.net/manga/youre-not-that-special/"), ("https://hiperdex.info/manga/youre-not-that-special/"), ) def __init__(self, match): root, path, self.manga = match.groups() self.root = text.ensure_http_scheme(root) MangaExtractor.__init__(self, match, self.root + path + "/") def chapters(self, page): self.manga_data(self.manga, page) results = [] shortlink = text.extr(page, "rel='shortlink' href='", "'") data = { "action" : "manga_get_reading_nav", "manga" : shortlink.rpartition("=")[2], "chapter" : "", "volume_id": "", "style" : "list", "type" : "manga", } url = self.root + "/wp-admin/admin-ajax.php" page = self.request(url, method="POST", data=data).text for url in text.extract_iter(page, 'data-redirect="', '"'): chapter = url.rpartition("/")[2] results.append((url, self.chapter_data(chapter))) return results class HiperdexArtistExtractor(HiperdexBase, MangaExtractor): """Extractor for an artists's manga on hiperdex.com""" subcategory = "artist" categorytransfer = False chapterclass = HiperdexMangaExtractor reverse = False pattern = BASE_PATTERN + r"(/manga-a(?:rtist|uthor)/(?:[^/?#]+))" test = ( ("https://1sthiperdex.com/manga-artist/beck-ho-an/"), ("https://hiperdex.net/manga-artist/beck-ho-an/"), ("https://hiperdex2.com/manga-artist/beck-ho-an/"), ("https://hiperdex.info/manga-artist/beck-ho-an/"), ("https://hiperdex.com/manga-author/viagra/", { "pattern": HiperdexMangaExtractor.pattern, "count": ">= 6", 
}), ) def __init__(self, match): self.root = text.ensure_http_scheme(match.group(1)) MangaExtractor.__init__(self, match, self.root + match.group(2) + "/") def chapters(self, page): results = [] for info in text.extract_iter(page, 'id="manga-item-', '<img'): url = text.extr(info, 'href="', '"') results.append((url, {})) return results ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1675805369.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/hitomi.py����������������������������������������������������0000644�0001750�0001750�00000017306�14370541271�020420� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hitomi.la/""" from .common import GalleryExtractor, Extractor, Message from .nozomi import decode_nozomi from ..cache import memcache from .. 
import text, util import string import re class HitomiGalleryExtractor(GalleryExtractor): """Extractor for image galleries from hitomi.la""" category = "hitomi" root = "https://hitomi.la" pattern = (r"(?:https?://)?hitomi\.la" r"/(?:manga|doujinshi|cg|gamecg|galleries|reader)" r"/(?:[^/?#]+-)?(\d+)") test = ( ("https://hitomi.la/galleries/867789.html", { "pattern": r"https://[a-c]a\.hitomi\.la/webp/\d+/\d+" r"/[0-9a-f]{64}\.webp", "keyword": "86af5371f38117a07407f11af689bdd460b09710", "count": 16, }), # download test ("https://hitomi.la/galleries/1401410.html", { "range": "1", "content": "d75d5a3d1302a48469016b20e53c26b714d17745", }), # Game CG with scenes (#321) ("https://hitomi.la/galleries/733697.html", { "count": 210, }), # fallback for galleries only available through /reader/ URLs ("https://hitomi.la/galleries/1045954.html", { "count": 1413, }), # gallery with "broken" redirect ("https://hitomi.la/cg/scathacha-sama-okuchi-ecchi-1291900.html", { "count": 10, "options": (("format", "original"),), "pattern": r"https://[a-c]b\.hitomi\.la/images/\d+/\d+" r"/[0-9a-f]{64}\.jpg", }), # no tags ("https://hitomi.la/cg/1615823.html", { "count": 22, "options": (("format", "avif"),), "pattern": r"https://[a-c]a\.hitomi\.la/avif/\d+/\d+" r"/[0-9a-f]{64}\.avif", }), ("https://hitomi.la/manga/amazon-no-hiyaku-867789.html"), ("https://hitomi.la/manga/867789.html"), ("https://hitomi.la/doujinshi/867789.html"), ("https://hitomi.la/cg/867789.html"), ("https://hitomi.la/gamecg/867789.html"), ("https://hitomi.la/reader/867789.html"), ) def __init__(self, match): gid = match.group(1) url = "https://ltn.hitomi.la/galleries/{}.js".format(gid) GalleryExtractor.__init__(self, match, url) self.info = None self.session.headers["Referer"] = "{}/reader/{}.html".format( self.root, gid) def metadata(self, page): self.info = info = util.json_loads(page.partition("=")[2]) iget = info.get language = iget("language") if language: language = language.capitalize() date = iget("date") if date: date += ":00" tags = [] for tinfo in iget("tags") or (): tag = string.capwords(tinfo["tag"]) if tinfo.get("female"): tag += " ♀" elif tinfo.get("male"): tag += " ♂" tags.append(tag) return { "gallery_id": text.parse_int(info["id"]), "title" : info["title"], "type" : info["type"].capitalize(), "language" : language, "lang" : util.language_to_code(language), "date" : text.parse_datetime(date, "%Y-%m-%d %H:%M:%S%z"), "tags" : tags, "artist" : [o["artist"] for o in iget("artists") or ()], "group" : [o["group"] for o in iget("groups") or ()], "parody" : [o["parody"] for o in iget("parodys") or ()], "characters": [o["character"] for o in iget("characters") or ()] } def images(self, _): # see https://ltn.hitomi.la/gg.js gg_m, gg_b, gg_default = _parse_gg(self) fmt = self.config("format") or "webp" if fmt == "original": subdomain, path, ext, check = "b", "images", None, False else: subdomain, path, ext, check = "a", fmt, fmt, (fmt != "webp") result = [] for image in self.info["files"]: if check: if image.get("has" + fmt): path = ext = fmt else: path = ext = "webp" ihash = image["hash"] idata = text.nameext_from_url(image["name"]) if ext: idata["extension"] = ext # see https://ltn.hitomi.la/common.js inum = int(ihash[-1] + ihash[-3:-1], 16) url = "https://{}{}.hitomi.la/{}/{}/{}/{}.{}".format( chr(97 + gg_m.get(inum, gg_default)), subdomain, path, gg_b, inum, ihash, idata["extension"], ) result.append((url, idata)) return result class HitomiTagExtractor(Extractor): """Extractor for galleries from tag searches on hitomi.la""" category = 
"hitomi" subcategory = "tag" root = "https://hitomi.la" pattern = (r"(?:https?://)?hitomi\.la/" r"(tag|artist|group|series|type|character)/" r"([^/?#]+)\.html") test = ( ("https://hitomi.la/tag/screenshots-japanese.html", { "pattern": HitomiGalleryExtractor.pattern, "count": ">= 35", }), ("https://hitomi.la/artist/a1-all-1.html"), ("https://hitomi.la/group/initial%2Dg-all-1.html"), ("https://hitomi.la/series/amnesia-all-1.html"), ("https://hitomi.la/type/doujinshi-all-1.html"), ("https://hitomi.la/character/a2-all-1.html"), ) def __init__(self, match): Extractor.__init__(self, match) self.type, self.tag = match.groups() tag, _, num = self.tag.rpartition("-") if num.isdecimal(): self.tag = tag def items(self): data = {"_extractor": HitomiGalleryExtractor} nozomi_url = "https://ltn.hitomi.la/{}/{}.nozomi".format( self.type, self.tag) headers = { "Origin": self.root, "Cache-Control": "max-age=0", } offset = 0 total = None while True: headers["Referer"] = "{}/{}/{}.html?page={}".format( self.root, self.type, self.tag, offset // 100 + 1) headers["Range"] = "bytes={}-{}".format(offset, offset+99) response = self.request(nozomi_url, headers=headers) for gallery_id in decode_nozomi(response.content): gallery_url = "{}/galleries/{}.html".format( self.root, gallery_id) yield Message.Queue, gallery_url, data offset += 100 if total is None: total = text.parse_int( response.headers["content-range"].rpartition("/")[2]) if offset >= total: return @memcache(maxage=1800) def _parse_gg(extr): page = extr.request("https://ltn.hitomi.la/gg.js").text m = {} keys = [] for match in re.finditer( r"case\s+(\d+):(?:\s*o\s*=\s*(\d+))?", page): key, value = match.groups() keys.append(int(key)) if value: value = int(value) for key in keys: m[key] = value keys.clear() for match in re.finditer( r"if\s+\(g\s*===?\s*(\d+)\)[\s{]*o\s*=\s*(\d+)", page): m[int(match.group(1))] = int(match.group(2)) d = re.search(r"(?:var\s|default:)\s*o\s*=\s*(\d+)", page) b = re.search(r"b:\s*[\"'](.+)[\"']", page) return m, b.group(1).strip("/"), int(d.group(1)) if d else 1 ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674918287.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/hotleak.py���������������������������������������������������0000644�0001750�0001750�00000017043�14365234617�020563� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hotleak.vip/""" from .common import Extractor, Message from .. import text, exception import binascii BASE_PATTERN = r"(?:https?://)?(?:www\.)?hotleak\.vip" class HotleakExtractor(Extractor): """Base class for hotleak extractors""" category = "hotleak" directory_fmt = ("{category}", "{creator}",) filename_fmt = "{creator}_{id}.{extension}" archive_fmt = "{type}_{creator}_{id}" root = "https://hotleak.vip" def __init__(self, match): Extractor.__init__(self, match) self.session.headers["Referer"] = self.root def items(self): for post in self.posts(): yield Message.Directory, post yield Message.Url, post["url"], post def posts(self): """Return an iterable containing relevant posts""" return () def _pagination(self, url, params): params = text.parse_query(params) params["page"] = text.parse_int(params.get("page"), 1) while True: page = self.request(url, params=params).text if "</article>" not in page: return for item in text.extract_iter( page, '<article class="movie-item', '</article>'): yield text.extr(item, '<a href="', '"') params["page"] += 1 def decode_video_url(url): # cut first and last 16 characters, reverse, base64 decode return binascii.a2b_base64(url[-17:15:-1]).decode() class HotleakPostExtractor(HotleakExtractor): """Extractor for individual posts on hotleak""" subcategory = "post" pattern = (BASE_PATTERN + r"/(?!hot|creators|videos|photos)" r"([^/]+)/(photo|video)/(\d+)") test = ( ("https://hotleak.vip/kaiyakawaii/photo/1617145", { "pattern": r"https://hotleak\.vip/storage/images/3625" r"/1617145/fefdd5988dfcf6b98cc9e11616018868\.jpg", "keyword": { "id": 1617145, "creator": "kaiyakawaii", "type": "photo", "filename": "fefdd5988dfcf6b98cc9e11616018868", "extension": "jpg", }, }), ("https://hotleak.vip/lilmochidoll/video/1625538", { "pattern": r"ytdl:https://cdn8-leak\.camhdxx\.com" r"/1661/1625538/index\.m3u8", "keyword": { "id": 1625538, "creator": "lilmochidoll", "type": "video", "filename": "index", "extension": "mp4", }, }), ) def __init__(self, match): HotleakExtractor.__init__(self, match) self.creator, self.type, self.id = match.groups() def posts(self): url = "{}/{}/{}/{}".format( self.root, self.creator, self.type, self.id) page = self.request(url).text page = text.extr( page, '<div class="movie-image thumb">', '</article>') data = { "id" : text.parse_int(self.id), "creator": self.creator, "type" : self.type, } if self.type == "photo": data["url"] = text.extr(page, 'data-src="', '"') text.nameext_from_url(data["url"], data) elif self.type == "video": data["url"] = "ytdl:" + decode_video_url(text.extr( text.unescape(page), '"src":"', '"')) text.nameext_from_url(data["url"], data) data["extension"] = "mp4" return (data,) class HotleakCreatorExtractor(HotleakExtractor): """Extractor for all posts from a hotleak creator""" subcategory = "creator" pattern = BASE_PATTERN + r"/(?!hot|creators|videos|photos)([^/?#]+)/?$" test = ( ("https://hotleak.vip/kaiyakawaii", { "range": "1-200", "count": 200, 
}), ("https://hotleak.vip/stellaviolet", { "count": "> 600" }), ("https://hotleak.vip/doesnotexist", { "exception": exception.NotFoundError, }), ) def __init__(self, match): HotleakExtractor.__init__(self, match) self.creator = match.group(1) def posts(self): url = "{}/{}".format(self.root, self.creator) return self._pagination(url) def _pagination(self, url): headers = {"X-Requested-With": "XMLHttpRequest"} params = {"page": 1} while True: try: response = self.request( url, headers=headers, params=params, notfound="creator") except exception.HttpError as exc: if exc.response.status_code == 429: self.wait( until=exc.response.headers.get("X-RateLimit-Reset")) continue raise posts = response.json() if not posts: return data = {"creator": self.creator} for post in posts: data["id"] = text.parse_int(post["id"]) if post["type"] == 0: data["type"] = "photo" data["url"] = self.root + "/storage/" + post["image"] text.nameext_from_url(data["url"], data) elif post["type"] == 1: data["type"] = "video" data["url"] = "ytdl:" + decode_video_url( post["stream_url_play"]) text.nameext_from_url(data["url"], data) data["extension"] = "mp4" yield data params["page"] += 1 class HotleakCategoryExtractor(HotleakExtractor): """Extractor for hotleak categories""" subcategory = "category" pattern = BASE_PATTERN + r"/(hot|creators|videos|photos)(?:/?\?([^#]+))?" test = ( ("https://hotleak.vip/photos", { "pattern": HotleakPostExtractor.pattern, "range": "1-50", "count": 50, }), ("https://hotleak.vip/videos"), ("https://hotleak.vip/creators", { "pattern": HotleakCreatorExtractor.pattern, "range": "1-50", "count": 50, }), ("https://hotleak.vip/hot"), ) def __init__(self, match): HotleakExtractor.__init__(self, match) self._category, self.params = match.groups() def items(self): url = "{}/{}".format(self.root, self._category) if self._category in ("hot", "creators"): data = {"_extractor": HotleakCreatorExtractor} elif self._category in ("videos", "photos"): data = {"_extractor": HotleakPostExtractor} for item in self._pagination(url, self.params): yield Message.Queue, item, data class HotleakSearchExtractor(HotleakExtractor): """Extractor for hotleak search results""" subcategory = "search" pattern = BASE_PATTERN + r"/search(?:/?\?([^#]+))" test = ( ("https://hotleak.vip/search?search=gallery-dl", { "count": 0, }), ("https://hotleak.vip/search?search=hannah", { "count": "> 30", }), ) def __init__(self, match): HotleakExtractor.__init__(self, match) self.params = match.group(1) def items(self): data = {"_extractor": HotleakCreatorExtractor} for creator in self._pagination(self.root + "/search", self.params): yield Message.Queue, creator, data ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/idolcomplex.py�����������������������������������������������0000644�0001750�0001750�00000022274�14363473075�021456� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://idol.sankakucomplex.com/""" from .sankaku import SankakuExtractor from .common import Message from ..cache import cache from .. 
import text, util, exception import collections import re class IdolcomplexExtractor(SankakuExtractor): """Base class for idolcomplex extractors""" category = "idolcomplex" cookienames = ("login", "pass_hash") cookiedomain = "idol.sankakucomplex.com" root = "https://" + cookiedomain request_interval = 5.0 def __init__(self, match): SankakuExtractor.__init__(self, match) self.logged_in = True self.start_page = 1 self.start_post = 0 self.extags = self.config("tags", False) def items(self): self.login() data = self.metadata() for post_id in util.advance(self.post_ids(), self.start_post): post = self._parse_post(post_id) url = post["file_url"] post.update(data) text.nameext_from_url(url, post) yield Message.Directory, post yield Message.Url, url, post def skip(self, num): self.start_post += num return num def post_ids(self): """Return an iterable containing all relevant post ids""" def login(self): if self._check_cookies(self.cookienames): return username, password = self._get_auth_info() if username: cookies = self._login_impl(username, password) self._update_cookies(cookies) else: self.logged_in = False @cache(maxage=90*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/user/authenticate" data = { "url" : "", "user[name]" : username, "user[password]": password, "commit" : "Login", } response = self.request(url, method="POST", data=data) if not response.history or response.url != self.root + "/user/home": raise exception.AuthenticationError() cookies = response.history[0].cookies return {c: cookies[c] for c in self.cookienames} def _parse_post(self, post_id): """Extract metadata of a single post""" url = self.root + "/post/show/" + post_id page = self.request(url, retries=10).text extr = text.extract tags , pos = extr(page, "<title>", " | ") vavg , pos = extr(page, "itemprop=ratingValue>", "<", pos) vcnt , pos = extr(page, "itemprop=reviewCount>", "<", pos) _ , pos = extr(page, "Posted: <", "", pos) created, pos = extr(page, ' title="', '"', pos) rating = extr(page, "<li>Rating: ", "<", pos)[0] file_url, pos = extr(page, '<li>Original: <a href="', '"', pos) if file_url: width , pos = extr(page, '>', 'x', pos) height, pos = extr(page, '', ' ', pos) else: width , pos = extr(page, '<object width=', ' ', pos) height, pos = extr(page, 'height=', '>', pos) file_url = extr(page, '<embed src="', '"', pos)[0] data = { "id": text.parse_int(post_id), "md5": file_url.rpartition("/")[2].partition(".")[0], "tags": text.unescape(tags), "vote_average": text.parse_float(vavg), "vote_count": text.parse_int(vcnt), "created_at": created, "rating": (rating or "?")[0].lower(), "file_url": "https:" + text.unescape(file_url), "width": text.parse_int(width), "height": text.parse_int(height), } if self.extags: tags = collections.defaultdict(list) tags_html = text.extr(page, '<ul id=tag-sidebar>', '</ul>') pattern = re.compile(r'tag-type-([^>]+)><a href="/\?tags=([^"]+)') for tag_type, tag_name in pattern.findall(tags_html or ""): tags[tag_type].append(text.unquote(tag_name)) for key, value in tags.items(): data["tags_" + key] = " ".join(value) return data class IdolcomplexTagExtractor(IdolcomplexExtractor): """Extractor for images from idol.sankakucomplex.com by search-tags""" subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = r"(?:https?://)?idol\.sankakucomplex\.com/\?([^#]*)" test = ( ("https://idol.sankakucomplex.com/?tags=lyumos", { "count": 5, "range": "18-22", "pattern": 
r"https://is\.sankakucomplex\.com/data/[^/]{2}/[^/]{2}" r"/[^/]{32}\.\w+\?e=\d+&m=[^&#]+", }), ("https://idol.sankakucomplex.com/?tags=order:favcount", { "count": 5, "range": "18-22", }), ("https://idol.sankakucomplex.com" "/?tags=lyumos+wreath&page=3&next=694215"), ) per_page = 20 def __init__(self, match): IdolcomplexExtractor.__init__(self, match) query = text.parse_query(match.group(1)) self.tags = text.unquote(query.get("tags", "").replace("+", " ")) self.start_page = text.parse_int(query.get("page"), 1) self.next = text.parse_int(query.get("next"), 0) def skip(self, num): if self.next: self.start_post += num else: pages, posts = divmod(num, self.per_page) self.start_page += pages self.start_post += posts return num def metadata(self): if not self.next: max_page = 50 if self.logged_in else 25 if self.start_page > max_page: self.log.info("Traversing from page %d to page %d", max_page, self.start_page) self.start_post += self.per_page * (self.start_page - max_page) self.start_page = max_page tags = self.tags.split() if not self.logged_in and len(tags) > 4: raise exception.StopExtraction( "Non-members can only search up to 4 tags at once") return {"search_tags": " ".join(tags)} def post_ids(self): params = {"tags": self.tags} if self.next: params["next"] = self.next else: params["page"] = self.start_page while True: page = self.request(self.root, params=params, retries=10).text pos = page.find("<div id=more-popular-posts-link>") + 1 yield from text.extract_iter(page, '" id=p', '>', pos) next_url = text.extract(page, 'next-page-url="', '"', pos)[0] if not next_url: return next_params = text.parse_query(text.unescape( next_url).lstrip("?/")) if "next" in next_params: # stop if the same "next" value occurs twice in a row (#265) if "next" in params and params["next"] == next_params["next"]: return next_params["page"] = "2" params = next_params class IdolcomplexPoolExtractor(IdolcomplexExtractor): """Extractor for image-pools from idol.sankakucomplex.com""" subcategory = "pool" directory_fmt = ("{category}", "pool", "{pool}") archive_fmt = "p_{pool}_{id}" pattern = r"(?:https?://)?idol\.sankakucomplex\.com/pool/show/(\d+)" test = ("https://idol.sankakucomplex.com/pool/show/145", { "count": 3, }) per_page = 24 def __init__(self, match): IdolcomplexExtractor.__init__(self, match) self.pool_id = match.group(1) def skip(self, num): pages, posts = divmod(num, self.per_page) self.start_page += pages self.start_post += posts return num def metadata(self): return {"pool": self.pool_id} def post_ids(self): url = self.root + "/pool/show/" + self.pool_id params = {"page": self.start_page} while True: page = self.request(url, params=params, retries=10).text ids = list(text.extract_iter(page, '" id=p', '>')) yield from ids if len(ids) < self.per_page: return params["page"] += 1 class IdolcomplexPostExtractor(IdolcomplexExtractor): """Extractor for single images from idol.sankakucomplex.com""" subcategory = "post" archive_fmt = "{id}" pattern = r"(?:https?://)?idol\.sankakucomplex\.com/post/show/(\d+)" test = ("https://idol.sankakucomplex.com/post/show/694215", { "content": "694ec2491240787d75bf5d0c75d0082b53a85afd", "options": (("tags", True),), "keyword": { "tags_character": "shani_(the_witcher)", "tags_copyright": "the_witcher", "tags_idol": str, "tags_medium": str, "tags_general": str, }, }) def __init__(self, match): IdolcomplexExtractor.__init__(self, match) self.post_id = match.group(1) def post_ids(self): return (self.post_id,) 
gallery_dl-1.25.0/gallery_dl/extractor/imagebam.py

# -*- coding: utf-8 -*-

# Copyright 2014-2022 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://www.imagebam.com/"""

from .common import Extractor, Message
from ..
import text, exception import re class ImagebamExtractor(Extractor): """Base class for imagebam extractors""" category = "imagebam" root = "https://www.imagebam.com" def __init__(self, match): Extractor.__init__(self, match) self.path = match.group(1) self.session.cookies.set("nsfw_inter", "1", domain="www.imagebam.com") def _parse_image_page(self, path): page = self.request(self.root + path).text url, pos = text.extract(page, '<img src="https://images', '"') filename = text.unescape(text.extract(page, 'alt="', '"', pos)[0]) data = { "url" : "https://images" + url, "image_key": path.rpartition("/")[2], } data["filename"], _, data["extension"] = filename.rpartition(".") return data class ImagebamGalleryExtractor(ImagebamExtractor): """Extractor for imagebam galleries""" subcategory = "gallery" directory_fmt = ("{category}", "{title} {gallery_key}") filename_fmt = "{num:>03} {filename}.{extension}" archive_fmt = "{gallery_key}_{image_key}" pattern = (r"(?:https?://)?(?:www\.)?imagebam\.com" r"(/(?:gallery/|view/G)[a-zA-Z0-9]+)") test = ( ("https://www.imagebam.com/gallery/adz2y0f9574bjpmonaismyrhtjgvey4o", { "url": "76d976788ae2757ac81694736b07b72356f5c4c8", "keyword": "b048478b1bbba3072a7fa9fcc40630b3efad1f6c", "content": "596e6bfa157f2c7169805d50075c2986549973a8", }), ("http://www.imagebam.com/gallery/op9dwcklwdrrguibnkoe7jxgvig30o5p", { # more than 100 images; see issue #219 "count": 107, "url": "32ae6fe5dc3e4ca73ff6252e522d16473595d1d1", }), ("http://www.imagebam.com/gallery/gsl8teckymt4vbvx1stjkyk37j70va2c", { "exception": exception.HttpError, }), # /view/ path (#2378) ("https://www.imagebam.com/view/GA3MT1", { "url": "35018ce1e00a2d2825a33d3cd37857edaf804919", "keyword": "3a9f98178f73694c527890c0d7ca9a92b46987ba", }), ) def items(self): page = self.request(self.root + self.path).text images = self.images(page) images.reverse() data = self.metadata(page) data["count"] = len(images) data["gallery_key"] = self.path.rpartition("/")[2] yield Message.Directory, data for data["num"], path in enumerate(images, 1): image = self._parse_image_page(path) image.update(data) yield Message.Url, image["url"], image @staticmethod def metadata(page): return {"title": text.unescape(text.extr( page, 'id="gallery-name">', '<').strip())} def images(self, page): findall = re.compile(r'<a href="https://www\.imagebam\.com' r'(/(?:image/|view/M)[a-zA-Z0-9]+)').findall paths = [] while True: paths += findall(page) pos = page.find('rel="next" aria-label="Next') if pos > 0: url = text.rextract(page, 'href="', '"', pos)[0] if url: page = self.request(url).text continue return paths class ImagebamImageExtractor(ImagebamExtractor): """Extractor for single imagebam images""" subcategory = "image" archive_fmt = "{image_key}" pattern = (r"(?:https?://)?(?:\w+\.)?imagebam\.com" r"(/(?:image/|view/M|(?:[0-9a-f]{2}/){3})[a-zA-Z0-9]+)") test = ( ("https://www.imagebam.com/image/94d56c502511890", { "url": "5e9ba3b1451f8ded0ae3a1b84402888893915d4a", "keyword": "2a4380d4b57554ff793898c2d6ec60987c86d1a1", "content": "0c8768055e4e20e7c7259608b67799171b691140", }), ("http://images3.imagebam.com/1d/8c/44/94d56c502511890.png"), # NSFW (#1534) ("https://www.imagebam.com/image/0850951366904951", { "url": "d37297b17ed1615b4311c8ed511e50ce46e4c748", }), # /view/ path (#2378) ("https://www.imagebam.com/view/ME8JOQP", { "url": "4dca72bbe61a0360185cf4ab2bed8265b49565b8", "keyword": "15a494c02fd30846b41b42a26117aedde30e4ceb", "content": "f81008666b17a42d8834c4749b910e1dc10a6e83", }), ) def items(self): path = self.path if path[3] == "/": 
path = ("/view/" if path[10] == "M" else "/image/") + path[10:] image = self._parse_image_page(path) yield Message.Directory, image yield Message.Url, image["url"], image ���������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/imagechest.py������������������������������������������������0000644�0001750�0001750�00000003035�14363473075�021242� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2020 Leonid "Bepis" Pavel # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract images from galleries at https://imgchest.com/""" from .common import GalleryExtractor from .. import text, exception class ImagechestGalleryExtractor(GalleryExtractor): """Extractor for image galleries from imgchest.com""" category = "imagechest" root = "https://imgchest.com" pattern = r"(?:https?://)?(?:www\.)?imgchest\.com/p/([A-Za-z0-9]{11})" test = ( ("https://imgchest.com/p/3na7kr3by8d", { "url": "f095b4f78c051e5a94e7c663814d1e8d4c93c1f7", "content": "076959e65be30249a2c651fbe6090dc30ba85193", "count": 3 }), ) def __init__(self, match): self.gallery_id = match.group(1) url = self.root + "/p/" + self.gallery_id GalleryExtractor.__init__(self, match, url) def metadata(self, page): if "Sorry, but the page you requested could not be found." 
            raise exception.NotFoundError("gallery")

        return {
            "gallery_id": self.gallery_id,
            "title": text.unescape(text.extr(
                page, 'property="og:title" content="', '"').strip())
        }

    def images(self, page):
        return [
            (url, None)
            for url in text.extract_iter(
                page, 'property="og:image" content="', '"')
        ]


gallery_dl-1.25.0/gallery_dl/extractor/imagefap.py

# -*- coding: utf-8 -*-

# Copyright 2016-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://www.imagefap.com/"""

from .common import Extractor, Message
from ..
import text, util, exception BASE_PATTERN = r"(?:https?://)?(?:www\.|beta\.)?imagefap\.com" class ImagefapExtractor(Extractor): """Base class for imagefap extractors""" category = "imagefap" root = "https://www.imagefap.com" directory_fmt = ("{category}", "{gallery_id} {title}") filename_fmt = "{category}_{gallery_id}_{filename}.{extension}" archive_fmt = "{gallery_id}_{image_id}" request_interval = (2.0, 4.0) def __init__(self, match): Extractor.__init__(self, match) self.session.headers["Referer"] = self.root def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if response.history and response.url.endswith("/human-verification"): msg = text.extr(response.text, '<div class="mt-4', '<') if msg: msg = " ".join(msg.partition(">")[2].split()) raise exception.StopExtraction("'%s'", msg) self.log.warning("HTTP redirect to %s", response.url) return response class ImagefapGalleryExtractor(ImagefapExtractor): """Extractor for image galleries from imagefap.com""" subcategory = "gallery" pattern = BASE_PATTERN + r"/(?:gallery\.php\?gid=|gallery/|pictures/)(\d+)" test = ( ("https://www.imagefap.com/gallery/7102714", { "pattern": r"https://cdnh?\.imagefap\.com" r"/images/full/\d+/\d+/\d+\.jpg", "keyword": "2ba96e84c2952c4750e9fa94a3f2b1f965cec2f3", "content": "694a0a57385980a6f90fbc296cadcd6c11ba2dab", }), ("https://www.imagefap.com/gallery/7876223", { "pattern": r"https://cdnh?\.imagefap\.com" r"/images/full/\d+/\d+/\d+\.jpg", "keyword": { "count": 44, "gallery_id": 7876223, "image_id": int, "num": int, "tags": ["big ass", "panties", "horny", "pussy", "exposed", "outdoor"], "title": "Kelsi Monroe in lingerie", "uploader": "BdRachel", }, "count": 44, }), ("https://www.imagefap.com/pictures/7102714"), ("https://www.imagefap.com/gallery.php?gid=7102714"), ("https://beta.imagefap.com/gallery.php?gid=7102714"), ) def __init__(self, match): ImagefapExtractor.__init__(self, match) self.gid = match.group(1) self.image_id = "" def items(self): url = "{}/gallery/{}".format(self.root, self.gid) page = self.request(url).text data = self.get_job_metadata(page) yield Message.Directory, data for url, image in self.get_images(): data.update(image) yield Message.Url, url, data def get_job_metadata(self, page): """Collect metadata for extractor-job""" extr = text.extract_from(page) data = { "gallery_id": text.parse_int(self.gid), "tags": extr('name="keywords" content="', '"').split(", "), "uploader": extr("porn picture gallery by ", " to see hottest"), "title": text.unescape(extr("<title>", "<")), "count": text.parse_int(extr(' 1 of ', ' pics"')), } self.image_id = extr('id="img_ed_', '"') self._count = data["count"] return data def get_images(self): """Collect image-urls and -metadata""" url = "{}/photo/{}/".format(self.root, self.image_id) params = {"gid": self.gid, "idx": 0, "partial": "true"} headers = { "Content-Type": "application/x-www-form-urlencoded", "X-Requested-With": "XMLHttpRequest", "Referer": "{}?pgid=&gid={}&page=0".format(url, self.image_id) } num = 0 total = self._count while True: page = self.request(url, params=params, headers=headers).text cnt = 0 for image_url in text.extract_iter(page, '<a href="', '"'): num += 1 cnt += 1 data = text.nameext_from_url(image_url) data["num"] = num data["image_id"] = text.parse_int(data["filename"]) yield image_url, data if not cnt or cnt < 24 and num >= total: return params["idx"] += cnt class ImagefapImageExtractor(ImagefapExtractor): """Extractor for single images from imagefap.com""" subcategory = "image" pattern = BASE_PATTERN + 
r"/photo/(\d+)" test = ( ("https://www.imagefap.com/photo/1962981893", { "pattern": r"https://cdnh?\.imagefap\.com" r"/images/full/65/196/1962981893\.jpg", "keyword": { "date": "21/08/2014", "gallery_id": 7876223, "height": 1600, "image_id": 1962981893, "title": "Kelsi Monroe in lingerie", "uploader": "BdRachel", "width": 1066, }, }), ("https://beta.imagefap.com/photo/1962981893"), ) def __init__(self, match): ImagefapExtractor.__init__(self, match) self.image_id = match.group(1) def items(self): url, data = self.get_image() yield Message.Directory, data yield Message.Url, url, data def get_image(self): url = "{}/photo/{}/".format(self.root, self.image_id) page = self.request(url).text info, pos = text.extract( page, '<script type="application/ld+json">', '</script>') image_id, pos = text.extract( page, 'id="imageid_input" value="', '"', pos) gallery_id, pos = text.extract( page, 'id="galleryid_input" value="', '"', pos) info = util.json_loads(info) url = info["contentUrl"] return url, text.nameext_from_url(url, { "title": text.unescape(info["name"]), "uploader": info["author"], "date": info["datePublished"], "width": text.parse_int(info["width"]), "height": text.parse_int(info["height"]), "gallery_id": text.parse_int(gallery_id), "image_id": text.parse_int(image_id), }) class ImagefapFolderExtractor(ImagefapExtractor): """Extractor for imagefap user folders""" subcategory = "folder" pattern = (BASE_PATTERN + r"/(?:organizer/|" r"(?:usergallery\.php\?user(id)?=([^&#]+)&" r"|profile/([^/?#]+)/galleries\?)folderid=)(\d+|-1)") test = ( ("https://www.imagefap.com/organizer/409758", { "pattern": r"https://www\.imagefap\.com/gallery/7876223", "url": "37822523e6e4a56feb9dea35653760c86b44ff89", "count": 1, }), (("https://www.imagefap.com/usergallery.php" "?userid=1981976&folderid=409758"), { "url": "37822523e6e4a56feb9dea35653760c86b44ff89", }), (("https://www.imagefap.com/usergallery.php" "?user=BdRachel&folderid=409758"), { "url": "37822523e6e4a56feb9dea35653760c86b44ff89", }), ("https://www.imagefap.com/profile/BdRachel/galleries?folderid=-1", { "pattern": ImagefapGalleryExtractor.pattern, "range": "1-40", }), (("https://www.imagefap.com/usergallery.php" "?userid=1981976&folderid=-1"), { "pattern": ImagefapGalleryExtractor.pattern, "range": "1-40", }), (("https://www.imagefap.com/usergallery.php" "?user=BdRachel&folderid=-1"), { "pattern": ImagefapGalleryExtractor.pattern, "range": "1-40", }), ) def __init__(self, match): ImagefapExtractor.__init__(self, match) self._id, user, profile, self.folder_id = match.groups() self.user = user or profile def items(self): for gallery_id, name in self.galleries(self.folder_id): url = "{}/gallery/{}".format(self.root, gallery_id) data = { "gallery_id": gallery_id, "title" : text.unescape(name), "_extractor": ImagefapGalleryExtractor, } yield Message.Queue, url, data def galleries(self, folder_id): """Yield gallery IDs and titles of a folder""" if folder_id == "-1": if self._id: url = "{}/usergallery.php?userid={}&folderid=-1".format( self.root, self.user) else: url = "{}/profile/{}/galleries?folderid=-1".format( self.root, self.user) else: url = "{}/organizer/{}/".format(self.root, folder_id) params = {"page": 0} while True: extr = text.extract_from(self.request(url, params=params).text) cnt = 0 while True: gid = extr('<a href="/gallery/', '"') if not gid: break yield gid, extr("<b>", "<") cnt += 1 if cnt < 25: break params["page"] += 1 class ImagefapUserExtractor(ImagefapExtractor): """Extractor for an imagefap user profile""" subcategory = "user" pattern 
= (BASE_PATTERN +
               r"/(?:profile(?:\.php\?user=|/)([^/?#]+)(?:/galleries)?"
               r"|usergallery\.php\?userid=(\d+))(?:$|#)")
    test = (
        ("https://www.imagefap.com/profile/BdRachel", {
            "pattern": ImagefapFolderExtractor.pattern,
            "count": ">= 18",
        }),
        ("https://www.imagefap.com/usergallery.php?userid=1862791", {
            "pattern": r"https://www\.imagefap\.com"
                       r"/profile/LucyRae/galleries\?folderid=-1",
            "count": 1,
        }),
        ("https://www.imagefap.com/profile/BdRachel/galleries"),
        ("https://www.imagefap.com/profile.php?user=BdRachel"),
        ("https://beta.imagefap.com/profile.php?user=BdRachel"),
    )

    def __init__(self, match):
        ImagefapExtractor.__init__(self, match)
        self.user, self.user_id = match.groups()

    def items(self):
        data = {"_extractor": ImagefapFolderExtractor}
        for folder_id in self.folders():
            if folder_id == "-1":
                url = "{}/profile/{}/galleries?folderid=-1".format(
                    self.root, self.user)
            else:
                url = "{}/organizer/{}/".format(self.root, folder_id)
            yield Message.Queue, url, data

    def folders(self):
        """Return a list of folder IDs of a user"""
        if self.user:
            url = "{}/profile/{}/galleries".format(self.root, self.user)
        else:
            url = "{}/usergallery.php?userid={}".format(
                self.root, self.user_id)

        response = self.request(url)
        self.user = response.url.split("/")[-2]
        folders = text.extr(response.text, ' id="tgl_all" value="', '"')
        return folders.rstrip("|").split("|")


gallery_dl-1.25.0/gallery_dl/extractor/imagehosts.py

# -*- coding: utf-8 -*-

# Copyright 2016-2022 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Collection of extractors for various imagehosts"""

from .common import Extractor, Message
from ..
import text, exception from ..cache import memcache from os.path import splitext class ImagehostImageExtractor(Extractor): """Base class for single-image extractors for various imagehosts""" basecategory = "imagehost" subcategory = "image" archive_fmt = "{token}" https = True params = None cookies = None encoding = None def __init__(self, match): Extractor.__init__(self, match) self.page_url = "http{}://{}".format( "s" if self.https else "", match.group(1)) self.token = match.group(2) if self.params == "simple": self.params = { "imgContinue": "Continue+to+image+...+", } elif self.params == "complex": self.params = { "op": "view", "id": self.token, "pre": "1", "adb": "1", "next": "Continue+to+image+...+", } def items(self): page = self.request( self.page_url, method=("POST" if self.params else "GET"), data=self.params, cookies=self.cookies, encoding=self.encoding, ).text url, filename = self.get_info(page) data = text.nameext_from_url(filename, {"token": self.token}) data.update(self.metadata(page)) if self.https and url.startswith("http:"): url = "https:" + url[5:] yield Message.Directory, data yield Message.Url, url, data def get_info(self, page): """Find image-url and string to get filename from""" def metadata(self, page): """Return additional metadata""" return () class ImxtoImageExtractor(ImagehostImageExtractor): """Extractor for single images from imx.to""" category = "imxto" pattern = (r"(?:https?://)?(?:www\.)?((?:imx\.to|img\.yt)" r"/(?:i/|img-)(\w+)(\.html)?)") test = ( ("https://imx.to/i/1qdeva", { # new-style URL "url": "ab2173088a6cdef631d7a47dec4a5da1c6a00130", "content": "0c8768055e4e20e7c7259608b67799171b691140", "keyword": { "size" : 18, "width" : 64, "height": 32, "hash" : "94d56c599223c59f3feb71ea603484d1", }, }), ("https://imx.to/img-57a2050547b97.html", { # old-style URL "url": "a83fe6ef1909a318c4d49fcf2caf62f36c3f9204", "content": "54592f2635674c25677c6872db3709d343cdf92f", "keyword": { "size" : 5284, "width" : 320, "height": 160, "hash" : "40da6aaa7b8c42b18ef74309bbc713fc", }, }), ("https://img.yt/img-57a2050547b97.html", { # img.yt domain "url": "a83fe6ef1909a318c4d49fcf2caf62f36c3f9204", }), ("https://imx.to/img-57a2050547b98.html", { "exception": exception.NotFoundError, }), ) params = "simple" encoding = "utf-8" def __init__(self, match): ImagehostImageExtractor.__init__(self, match) if "/img-" in self.page_url: self.page_url = self.page_url.replace("img.yt", "imx.to") self.url_ext = True else: self.url_ext = False def get_info(self, page): url, pos = text.extract( page, '<div style="text-align:center;"><a href="', '"') if not url: raise exception.NotFoundError("image") filename, pos = text.extract(page, ' title="', '"', pos) if self.url_ext and filename: filename += splitext(url)[1] return url, filename or url def metadata(self, page): extr = text.extract_from(page, page.index("[ FILESIZE <")) size = extr(">", "</span>").replace(" ", "")[:-1] width, _, height = extr(">", " px</span>").partition("x") return { "size" : text.parse_bytes(size), "width" : text.parse_int(width), "height": text.parse_int(height), "hash" : extr(">", "</span>"), } class AcidimgImageExtractor(ImagehostImageExtractor): """Extractor for single images from acidimg.cc""" category = "acidimg" pattern = r"(?:https?://)?((?:www\.)?acidimg\.cc/img-([a-z0-9]+)\.html)" test = ("https://acidimg.cc/img-5acb6b9de4640.html", { "url": "f132a630006e8d84f52d59555191ed82b3b64c04", "keyword": "a8bb9ab8b2f6844071945d31f8c6e04724051f37", "content": "0c8768055e4e20e7c7259608b67799171b691140", }) params = 
"simple" encoding = "utf-8" def get_info(self, page): url, pos = text.extract(page, "<img class='centred' src='", "'") if not url: raise exception.NotFoundError("image") filename, pos = text.extract(page, " alt='", "'", pos) return url, (filename + splitext(url)[1]) if filename else url class ImagevenueImageExtractor(ImagehostImageExtractor): """Extractor for single images from imagevenue.com""" category = "imagevenue" pattern = (r"(?:https?://)?((?:www|img\d+)\.imagevenue\.com" r"/([A-Z0-9]{8,10}|view/.*|img\.php\?.*))") test = ( ("https://www.imagevenue.com/ME13LS07", { "pattern": r"https://cdn-images\.imagevenue\.com" r"/10/ac/05/ME13LS07_o\.png", "keyword": "ae15d6e3b2095f019eee84cd896700cd34b09c36", "content": "cfaa8def53ed1a575e0c665c9d6d8cf2aac7a0ee", }), (("https://www.imagevenue.com/view/o?i=92518_13732377" "annakarina424200712535AM_122_486lo.jpg&h=img150&l=loc486"), { "url": "8bf0254e29250d8f5026c0105bbdda3ee3d84980", }), (("http://img28116.imagevenue.com/img.php" "?image=th_52709_test_122_64lo.jpg"), { "url": "f98e3091df7f48a05fb60fbd86f789fc5ec56331", }), ) def get_info(self, page): pos = page.index('class="card-body') url, pos = text.extract(page, '<img src="', '"', pos) filename, pos = text.extract(page, 'alt="', '"', pos) return url, text.unescape(filename) class ImagetwistImageExtractor(ImagehostImageExtractor): """Extractor for single images from imagetwist.com""" category = "imagetwist" pattern = (r"(?:https?://)?((?:www\.|phun\.)?" r"image(?:twist|haha)\.com/([a-z0-9]{12}))") test = ( ("https://imagetwist.com/f1i2s4vhvbrq/test.png", { "url": "8d5e168c0bee30211f821c6f3b2116e419d42671", "keyword": "d1060a4c2e3b73b83044e20681712c0ffdd6cfef", "content": "0c8768055e4e20e7c7259608b67799171b691140", }), ("https://www.imagetwist.com/f1i2s4vhvbrq/test.png"), ("https://phun.imagetwist.com/f1i2s4vhvbrq/test.png"), ("https://imagehaha.com/f1i2s4vhvbrq/test.png"), ("https://www.imagehaha.com/f1i2s4vhvbrq/test.png"), ) @property @memcache(maxage=3*3600) def cookies(self): return self.request(self.page_url).cookies def get_info(self, page): url , pos = text.extract(page, '<img src="', '"') filename, pos = text.extract(page, ' alt="', '"', pos) return url, filename class ImgspiceImageExtractor(ImagehostImageExtractor): """Extractor for single images from imgspice.com""" category = "imgspice" pattern = r"(?:https?://)?((?:www\.)?imgspice\.com/([^/?#]+))" test = ("https://imgspice.com/nwfwtpyog50y/test.png.html", { "url": "b8c30a8f51ee1012959a4cfd46197fabf14de984", "keyword": "100e310a19a2fa22d87e1bbc427ecb9f6501e0c0", "content": "0c8768055e4e20e7c7259608b67799171b691140", }) def get_info(self, page): pos = page.find('id="imgpreview"') if pos < 0: raise exception.NotFoundError("image") url , pos = text.extract(page, 'src="', '"', pos) name, pos = text.extract(page, 'alt="', '"', pos) return url, text.unescape(name) class PixhostImageExtractor(ImagehostImageExtractor): """Extractor for single images from pixhost.to""" category = "pixhost" pattern = (r"(?:https?://)?((?:www\.)?pixhost\.(?:to|org)" r"/show/\d+/(\d+)_[^/?#]+)") test = ("http://pixhost.to/show/190/130327671_test-.png", { "url": "4e5470dcf6513944773044d40d883221bbc46cff", "keyword": "3bad6d59db42a5ebbd7842c2307e1c3ebd35e6b0", "content": "0c8768055e4e20e7c7259608b67799171b691140", }) cookies = {"pixhostads": "1", "pixhosttest": "1"} def get_info(self, page): url , pos = text.extract(page, "class=\"image-img\" src=\"", "\"") filename, pos = text.extract(page, "alt=\"", "\"", pos) return url, filename class 
PixhostGalleryExtractor(ImagehostImageExtractor): """Extractor for image galleries from pixhost.to""" category = "pixhost" subcategory = "gallery" pattern = (r"(?:https?://)?((?:www\.)?pixhost\.(?:to|org)" r"/gallery/([^/?#]+))") test = ("https://pixhost.to/gallery/jSMFq", { "pattern": PixhostImageExtractor.pattern, "count": 3, }) def items(self): page = text.extr(self.request( self.page_url).text, 'class="images"', "</div>") data = {"_extractor": PixhostImageExtractor} for url in text.extract_iter(page, '<a href="', '"'): yield Message.Queue, url, data class PostimgImageExtractor(ImagehostImageExtractor): """Extractor for single images from postimages.org""" category = "postimg" pattern = (r"(?:https?://)?((?:www\.)?(?:postimg|pixxxels)\.(?:cc|org)" r"/(?:image/)?([^/?#]+)/?)") test = ("https://postimg.cc/Wtn2b3hC", { "url": "0794cfda9b8951a8ac3aa692472484200254ab86", "keyword": "2d05808d04e4e83e33200db83521af06e3147a84", "content": "cfaa8def53ed1a575e0c665c9d6d8cf2aac7a0ee", }) def get_info(self, page): url , pos = text.extract(page, 'id="main-image" src="', '"') filename, pos = text.extract(page, 'class="imagename">', '<', pos) return url, text.unescape(filename) class TurboimagehostImageExtractor(ImagehostImageExtractor): """Extractor for single images from www.turboimagehost.com""" category = "turboimagehost" pattern = (r"(?:https?://)?((?:www\.)?turboimagehost\.com" r"/p/(\d+)/[^/?#]+\.html)") test = ("https://www.turboimagehost.com/p/39078423/test--.png.html", { "url": "b94de43612318771ced924cb5085976f13b3b90e", "keyword": "704757ca8825f51cec516ec44c1e627c1f2058ca", "content": "0c8768055e4e20e7c7259608b67799171b691140", }) def get_info(self, page): url = text.extract(page, 'src="', '"', page.index("<img "))[0] return url, url class ViprImageExtractor(ImagehostImageExtractor): """Extractor for single images from vipr.im""" category = "vipr" pattern = r"(?:https?://)?(vipr\.im/(\w+))" test = ("https://vipr.im/kcd5jcuhgs3v.html", { "url": "88f6a3ecbf3356a11ae0868b518c60800e070202", "keyword": "c432e8a1836b0d97045195b745731c2b1bb0e771", }) def get_info(self, page): url = text.extr(page, '<img src="', '"') return url, url class ImgclickImageExtractor(ImagehostImageExtractor): """Extractor for single images from imgclick.net""" category = "imgclick" pattern = r"(?:https?://)?((?:www\.)?imgclick\.net/([^/?#]+))" test = ("http://imgclick.net/4tbrre1oxew9/test-_-_.png.html", { "url": "140dcb250a325f2d26b2d918c18b8ac6a2a0f6ab", "keyword": "6895256143eab955622fc149aa367777a8815ba3", "content": "0c8768055e4e20e7c7259608b67799171b691140", }) https = False params = "complex" def get_info(self, page): url , pos = text.extract(page, '<br><img src="', '"') filename, pos = text.extract(page, 'alt="', '"', pos) return url, filename class FappicImageExtractor(ImagehostImageExtractor): """Extractor for single images from fappic.com""" category = "fappic" pattern = r"(?:https?://)?((?:www\.)?fappic\.com/(\w+)/[^/?#]+)" test = ("https://www.fappic.com/98wxqcklyh8k/test.png", { "pattern": r"https://img\d+\.fappic\.com/img/\w+/test\.png", "keyword": "433b1d310b0ff12ad8a71ac7b9d8ba3f8cd1e898", "content": "0c8768055e4e20e7c7259608b67799171b691140", }) def get_info(self, page): url , pos = text.extract(page, '<a href="#"><img src="', '"') filename, pos = text.extract(page, 'alt="', '"', pos) if filename.startswith("Porn-Picture-"): filename = filename[13:] return url, filename 
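# Hedged sketch of how a new single-image host would plug into the
# ImagehostImageExtractor base class above: a subclass supplies a category,
# a URL pattern whose groups are (page, token), and get_info(); the base
# class handles the request and filename handling. The host name, pattern,
# and HTML markers below are hypothetical illustrations only and do not
# correspond to a real supported site; `text` and the base class come from
# the module above.
class ExamplehostImageExtractor(ImagehostImageExtractor):
    """Illustrative extractor for a hypothetical single-image host"""
    category = "examplehost"
    pattern = r"(?:https?://)?((?:www\.)?examplehost\.invalid/img/(\w+))"

    def get_info(self, page):
        # locate the direct image URL and a string to derive the filename from
        url, pos = text.extract(page, '<img id="main" src="', '"')
        filename, pos = text.extract(page, ' alt="', '"', pos)
        return url, filename or url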
gallery_dl-1.25.0/gallery_dl/extractor/imgbb.py

# -*- coding: utf-8 -*-

# Copyright 2019 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://imgbb.com/"""

from .common import Extractor, Message
from ..
import text, util, exception from ..cache import cache class ImgbbExtractor(Extractor): """Base class for imgbb extractors""" category = "imgbb" directory_fmt = ("{category}", "{user}") filename_fmt = "{title} {id}.{extension}" archive_fmt = "{id}" root = "https://imgbb.com" def __init__(self, match): Extractor.__init__(self, match) self.page_url = self.sort = None def items(self): self.login() url = self.page_url params = {"sort": self.sort} while True: response = self.request(url, params=params, allow_redirects=False) if response.status_code < 300: break url = response.headers["location"] if url.startswith(self.root): raise exception.NotFoundError(self.subcategory) page = response.text data = self.metadata(page) first = True for img in self.images(page): image = { "id" : img["url_viewer"].rpartition("/")[2], "user" : img["user"]["username"] if "user" in img else "", "title" : text.unescape(img["title"]), "url" : img["image"]["url"], "extension": img["image"]["extension"], "size" : text.parse_int(img["image"]["size"]), "width" : text.parse_int(img["width"]), "height" : text.parse_int(img["height"]), } image.update(data) if first: first = False yield Message.Directory, data yield Message.Url, image["url"], image def login(self): username, password = self._get_auth_info() if username: self._update_cookies(self._login_impl(username, password)) @cache(maxage=360*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/login" page = self.request(url).text token = text.extr(page, 'PF.obj.config.auth_token="', '"') headers = {"Referer": url} data = { "auth_token" : token, "login-subject": username, "password" : password, } response = self.request(url, method="POST", headers=headers, data=data) if not response.history: raise exception.AuthenticationError() return self.session.cookies def _pagination(self, page, endpoint, params): data = None seek, pos = text.extract(page, 'data-seek="', '"') tokn, pos = text.extract(page, 'PF.obj.config.auth_token="', '"', pos) params["action"] = "list" params["list"] = "images" params["sort"] = self.sort params["seek"] = seek params["page"] = 2 params["auth_token"] = tokn while True: for img in text.extract_iter(page, "data-object='", "'"): yield util.json_loads(text.unquote(img)) if data: if params["seek"] == data["seekEnd"]: return params["seek"] = data["seekEnd"] params["page"] += 1 elif not seek or 'class="pagination-next"' not in page: return data = self.request(endpoint, method="POST", data=params).json() page = data["html"] class ImgbbAlbumExtractor(ImgbbExtractor): """Extractor for albums on imgbb.com""" subcategory = "album" directory_fmt = ("{category}", "{user}", "{album_name} {album_id}") pattern = r"(?:https?://)?ibb\.co/album/([^/?#]+)/?(?:\?([^#]+))?" 
test = ( ("https://ibb.co/album/i5PggF", { "range": "1-80", "url": "70afec9fcc3a6de62a6b644b487d892d8d47cf1a", "keyword": "569e1d88ebdd27655387559cdf1cd526a3e1ab69", }), ("https://ibb.co/album/i5PggF?sort=title_asc", { "range": "1-80", "url": "afdf5fc95d8e09d77e8f44312f3e9b843987bb5a", "keyword": "f090e14d0e5f7868595082b2c95da1309c84872d", }), # no user data (#471) ("https://ibb.co/album/kYKpwF", { "url": "ac0abcfcb89f4df6adc2f7e4ff872f3b03ef1bc7", "keyword": {"user": ""}, }), # private ("https://ibb.co/album/hqgWrF", { "exception": exception.HttpError, }), ) def __init__(self, match): ImgbbExtractor.__init__(self, match) self.album_name = None self.album_id = match.group(1) self.sort = text.parse_query(match.group(2)).get("sort", "date_desc") self.page_url = "https://ibb.co/album/" + self.album_id def metadata(self, page): album, pos = text.extract(page, '"og:title" content="', '"') user , pos = text.extract(page, 'rel="author">', '<', pos) return { "album_id" : self.album_id, "album_name": text.unescape(album), "user" : user.lower() if user else "", } def images(self, page): url = text.extr(page, '"og:url" content="', '"') album_id = url.rpartition("/")[2].partition("?")[0] return self._pagination(page, "https://ibb.co/json", { "from" : "album", "albumid" : album_id, "params_hidden[list]" : "images", "params_hidden[from]" : "album", "params_hidden[albumid]": album_id, }) class ImgbbUserExtractor(ImgbbExtractor): """Extractor for user profiles in imgbb.com""" subcategory = "user" pattern = r"(?:https?://)?([\w-]+)\.imgbb\.com/?(?:\?([^#]+))?$" test = ("https://folkie.imgbb.com", { "range": "1-80", "pattern": r"https?://i\.ibb\.co/\w+/[^/?#]+", }) def __init__(self, match): ImgbbExtractor.__init__(self, match) self.user = match.group(1) self.sort = text.parse_query(match.group(2)).get("sort", "date_desc") self.page_url = "https://{}.imgbb.com/".format(self.user) def metadata(self, page): return {"user": self.user} def images(self, page): user = text.extr(page, '.obj.resource={"id":"', '"') return self._pagination(page, self.page_url + "json", { "from" : "user", "userid" : user, "params_hidden[userid]": user, "params_hidden[from]" : "user", }) class ImgbbImageExtractor(ImgbbExtractor): subcategory = "image" pattern = r"(?:https?://)?ibb\.co/(?!album/)([^/?#]+)" test = ("https://ibb.co/fUqh5b", { "pattern": r"https://i\.ibb\.co/g3kvx80/Arundel-Ireeman-5\.jpg", "content": "c5a0965178a8b357acd8aa39660092918c63795e", "keyword": { "id" : "fUqh5b", "title" : "Arundel Ireeman 5", "url" : "https://i.ibb.co/g3kvx80/Arundel-Ireeman-5.jpg", "width" : 960, "height": 719, "user" : "folkie", "extension": "jpg", }, }) def __init__(self, match): ImgbbExtractor.__init__(self, match) self.image_id = match.group(1) def items(self): url = "https://ibb.co/" + self.image_id extr = text.extract_from(self.request(url).text) image = { "id" : self.image_id, "title" : text.unescape(extr('"og:title" content="', '"')), "url" : extr('"og:image" content="', '"'), "width" : text.parse_int(extr('"og:image:width" content="', '"')), "height": text.parse_int(extr('"og:image:height" content="', '"')), "user" : extr('rel="author">', '<').lower(), } image["extension"] = text.ext_from_url(image["url"]) yield Message.Directory, image yield Message.Url, image["url"], image 
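# Minimal sketch (illustration only, not part of the classes above) of how
# the imgbb extractors derive their "sort" value from the query string of a
# matched URL, as done in ImgbbAlbumExtractor.__init__ and
# ImgbbUserExtractor.__init__; the URL below is a made-up example.
# `text` and ImgbbAlbumExtractor are defined in the module above.
import re

match = re.match(ImgbbAlbumExtractor.pattern,
                 "https://ibb.co/album/i5PggF?sort=title_asc")
album_id = match.group(1)                       # "i5PggF"
sort = text.parse_query(match.group(2)).get("sort", "date_desc")
# sort == "title_asc"; without a query string the default "date_desc" is used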
gallery_dl-1.25.0/gallery_dl/extractor/imgbox.py

# -*- coding: utf-8 -*-

# Copyright 2014-2019 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extract images from galleries at https://imgbox.com/"""

from .common import Extractor, Message, AsynchronousMixin
from ..
import text, exception import re class ImgboxExtractor(Extractor): """Base class for imgbox extractors""" category = "imgbox" root = "https://imgbox.com" def items(self): data = self.get_job_metadata() yield Message.Directory, data for image_key in self.get_image_keys(): imgpage = self.request(self.root + "/" + image_key).text imgdata = self.get_image_metadata(imgpage) if imgdata["filename"]: imgdata.update(data) imgdata["image_key"] = image_key text.nameext_from_url(imgdata["filename"], imgdata) yield Message.Url, self.get_image_url(imgpage), imgdata @staticmethod def get_job_metadata(): """Collect metadata for extractor-job""" return {} @staticmethod def get_image_keys(): """Return an iterable containing all image-keys""" return [] @staticmethod def get_image_metadata(page): """Collect metadata for a downloadable file""" return text.extract_all(page, ( ("num" , '</a>   ', ' of '), (None , 'class="image-container"', ''), ("filename" , ' title="', '"'), ))[0] @staticmethod def get_image_url(page): """Extract download-url""" return text.extr(page, 'property="og:image" content="', '"') class ImgboxGalleryExtractor(AsynchronousMixin, ImgboxExtractor): """Extractor for image galleries from imgbox.com""" subcategory = "gallery" directory_fmt = ("{category}", "{title} - {gallery_key}") filename_fmt = "{num:>03}-{filename}.{extension}" archive_fmt = "{gallery_key}_{image_key}" pattern = r"(?:https?://)?(?:www\.)?imgbox\.com/g/([A-Za-z0-9]{10})" test = ( ("https://imgbox.com/g/JaX5V5HX7g", { "url": "da4f15b161461119ee78841d4b8e8d054d95f906", "keyword": "4b1e62820ac2c6205b7ad0b6322cc8e00dbe1b0c", "content": "d20307dc8511ac24d688859c55abf2e2cc2dd3cc", }), ("https://imgbox.com/g/cUGEkRbdZZ", { "url": "76506a3aab175c456910851f66227e90484ca9f7", "keyword": "fb0427b87983197849fb2887905e758f3e50cb6e", }), ("https://imgbox.com/g/JaX5V5HX7h", { "exception": exception.NotFoundError, }), ) def __init__(self, match): ImgboxExtractor.__init__(self, match) self.gallery_key = match.group(1) self.image_keys = [] def get_job_metadata(self): page = self.request(self.root + "/g/" + self.gallery_key).text if "The specified gallery could not be found." 
in page:
            raise exception.NotFoundError("gallery")
        self.image_keys = re.findall(r'<a href="/([^"]+)"><img alt="', page)

        title = text.extr(page, "<h1>", "</h1>")
        title, _, count = title.rpartition(" - ")
        return {
            "gallery_key": self.gallery_key,
            "title": text.unescape(title),
            "count": count[:-7],
        }

    def get_image_keys(self):
        return self.image_keys


class ImgboxImageExtractor(ImgboxExtractor):
    """Extractor for single images from imgbox.com"""
    subcategory = "image"
    archive_fmt = "{image_key}"
    pattern = r"(?:https?://)?(?:www\.)?imgbox\.com/([A-Za-z0-9]{8})"
    test = (
        ("https://imgbox.com/qHhw7lpG", {
            "url": "ee9cdea6c48ad0161c1b5f81f6b0c9110997038c",
            "keyword": "dfc72310026b45f3feb4f9cada20c79b2575e1af",
            "content": "0c8768055e4e20e7c7259608b67799171b691140",
        }),
        ("https://imgbox.com/qHhw7lpH", {
            "exception": exception.NotFoundError,
        }),
    )

    def __init__(self, match):
        ImgboxExtractor.__init__(self, match)
        self.image_key = match.group(1)

    def get_image_keys(self):
        return (self.image_key,)

    @staticmethod
    def get_image_metadata(page):
        data = ImgboxExtractor.get_image_metadata(page)
        if not data["filename"]:
            raise exception.NotFoundError("image")
        return data


gallery_dl-1.25.0/gallery_dl/extractor/imgth.py

# -*- coding: utf-8 -*-

# Copyright 2015-2022 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://imgth.com/"""

from .common import GalleryExtractor
from .. import text
class ImgthGalleryExtractor(GalleryExtractor):
    """Extractor for image galleries from imgth.com"""
    category = "imgth"
    root = "https://imgth.com"
    pattern = r"(?:https?://)?(?:www\.)?imgth\.com/gallery/(\d+)"
    test = (
        ("https://imgth.com/gallery/37/wallpaper-anime", {
            "url": "4ae1d281ca2b48952cf5cca57e9914402ad72748",
            "pattern": r"https://imgth\.com/images/2009/11/25"
                       r"/wallpaper-anime_\w+\.jpg",
            "keyword": {
                "count": 12,
                "date": "dt:2009-11-25 18:21:00",
                "extension": "jpg",
                "filename": r"re:wallpaper-anime_\w+",
                "gallery_id": 37,
                "num": int,
                "title": "Wallpaper anime",
                "user": "celebrities",
            },
        }),
        ("https://www.imgth.com/gallery/37/wallpaper-anime"),
    )

    def __init__(self, match):
        self.gallery_id = gid = match.group(1)
        url = "{}/gallery/{}/g/".format(self.root, gid)
        GalleryExtractor.__init__(self, match, url)

    def metadata(self, page):
        extr = text.extract_from(page)
        return {
            "gallery_id": text.parse_int(self.gallery_id),
            "title": text.unescape(extr("<h1>", "</h1>")),
            "count": text.parse_int(extr(
                "total of images in this gallery: ", " ")),
            "date" : text.parse_datetime(
                extr("created on ", " by <")
                .replace("th, ", " ", 1).replace("nd, ", " ", 1)
                .replace("st, ", " ", 1), "%B %d %Y at %H:%M"),
            "user" : text.unescape(extr(">", "<")),
        }

    def images(self, page):
        pnum = 0

        while True:
            thumbs = text.extr(page, '<ul class="thumbnails">', '</ul>')
            for url in text.extract_iter(thumbs, '<img src="', '"'):
                path = url.partition("/thumbs/")[2]
                yield ("{}/images/{}".format(self.root, path), None)

            if '<li class="next">' not in page:
                return

            pnum += 1
            url = "{}/gallery/{}/g/page/{}".format(
                self.root, self.gallery_id, pnum)
            page = self.request(url).text


gallery_dl-1.25.0/gallery_dl/extractor/imgur.py
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2015-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://imgur.com/""" from .common import Extractor, Message from .. import text, exception BASE_PATTERN = r"(?:https?://)?(?:www\.|[im]\.)?imgur\.(?:com|io)" class ImgurExtractor(Extractor): """Base class for imgur extractors""" category = "imgur" root = "https://imgur.com" def __init__(self, match): Extractor.__init__(self, match) self.api = ImgurAPI(self) self.key = match.group(1) self.mp4 = self.config("mp4", True) def _prepare(self, image): image.update(image["metadata"]) del image["metadata"] if image["ext"] == "jpeg": image["ext"] = "jpg" elif image["is_animated"] and self.mp4 and image["ext"] == "gif": image["ext"] = "mp4" image["url"] = url = "https://i.imgur.com/{}.{}".format( image["id"], image["ext"]) image["date"] = text.parse_datetime(image["created_at"]) text.nameext_from_url(url, image) return url def _items_queue(self, items): album_ex = ImgurAlbumExtractor image_ex = ImgurImageExtractor for item in items: item["_extractor"] = album_ex if item["is_album"] else image_ex yield Message.Queue, item["link"], item class ImgurImageExtractor(ImgurExtractor): """Extractor for individual images on imgur.com""" subcategory = "image" filename_fmt = "{category}_{id}{title:?_//}.{extension}" archive_fmt = "{id}" pattern = (BASE_PATTERN + r"/(?!gallery|search)" r"(?:r/\w+/)?(\w{7}|\w{5})[sbtmlh]?") test = ( ("https://imgur.com/21yMxCS", { "url": "6f2dcfb86815bdd72808c313e5f715610bc7b9b2", "content": "0c8768055e4e20e7c7259608b67799171b691140", "keyword": { "account_id" : 0, "comment_count" : int, "cover_id" : "21yMxCS", "date" : "dt:2016-11-10 14:24:35", "description" : "", "downvote_count": int, "duration" : 0, "ext" : "png", "favorite" : False, "favorite_count": 0, "has_sound" : False, "height" : 32, "id" : "21yMxCS", "image_count" : 1, "in_most_viral" : False, "is_ad" : False, "is_album" : False, "is_animated" : False, "is_looping" : False, "is_mature" : False, "is_pending" : False, "mime_type" : "image/png", "name" : "test-テスト", "point_count" : int, "privacy" : "", "score" : int, "size" : 182, "title" : "Test", "upvote_count" : int, "url" : "https://i.imgur.com/21yMxCS.png", "view_count" : int, "width" : 64, }, }), ("http://imgur.com/0gybAXR", { # gifv/mp4 video "url": "a2220eb265a55b0c95e0d3d721ec7665460e3fd7", "content": "a3c080e43f58f55243ab830569ba02309d59abfc", }), ("https://imgur.com/XFfsmuC", { # missing title in API response (#467) "keyword": {"title": "Tears are a natural response to irritants"}, }), ("https://imgur.com/1Nily2P", { # animated png "pattern": "https://i.imgur.com/1Nily2P.png", }), ("https://imgur.com/zzzzzzz", { # not found "exception": exception.HttpError, }), ("https://m.imgur.com/r/Celebs/iHJ7tsM"), ("https://www.imgur.com/21yMxCS"), # www ("https://m.imgur.com/21yMxCS"), # mobile ("https://imgur.com/zxaY6"), # 5 character key ("https://imgur.io/zxaY6"), # .io ("https://i.imgur.com/21yMxCS.png"), # direct link ("https://i.imgur.io/21yMxCS.png"), # 
direct link .io ("https://i.imgur.com/21yMxCSh.png"), # direct link thumbnail ("https://i.imgur.com/zxaY6.gif"), # direct link (short) ("https://i.imgur.com/zxaY6s.gif"), # direct link (short; thumb) ) def items(self): image = self.api.image(self.key) try: del image["ad_url"] del image["ad_type"] except KeyError: pass image.update(image["media"][0]) del image["media"] url = self._prepare(image) yield Message.Directory, image yield Message.Url, url, image class ImgurAlbumExtractor(ImgurExtractor): """Extractor for imgur albums""" subcategory = "album" directory_fmt = ("{category}", "{album[id]}{album[title]:? - //}") filename_fmt = "{category}_{album[id]}_{num:>03}_{id}.{extension}" archive_fmt = "{album[id]}_{id}" pattern = BASE_PATTERN + r"/a/(\w{7}|\w{5})" test = ( ("https://imgur.com/a/TcBmP", { "url": "ce3552f550a5b5316bd9c7ae02e21e39f30c0563", "keyword": { "album": { "account_id" : 0, "comment_count" : int, "cover_id" : "693j2Kr", "date" : "dt:2015-10-09 10:37:50", "description" : "", "downvote_count": 0, "favorite" : False, "favorite_count": 0, "id" : "TcBmP", "image_count" : 19, "in_most_viral" : False, "is_ad" : False, "is_album" : True, "is_mature" : False, "is_pending" : False, "privacy" : "private", "score" : int, "title" : "138", "upvote_count" : int, "url" : "https://imgur.com/a/TcBmP", "view_count" : int, "virality" : int, }, "account_id" : 0, "count" : 19, "date" : "type:datetime", "description": "", "ext" : "jpg", "has_sound" : False, "height" : int, "id" : str, "is_animated": False, "is_looping" : False, "mime_type" : "image/jpeg", "name" : str, "num" : int, "size" : int, "title" : str, "type" : "image", "updated_at" : None, "url" : str, "width" : int, }, }), ("https://imgur.com/a/eD9CT", { # large album "url": "de748c181a04d18bef1de9d4f4866ef0a06d632b", }), ("https://imgur.com/a/RhJXhVT/all", { # 7 character album hash "url": "695ef0c950023362a0163ee5041796300db76674", }), ("https://imgur.com/a/TcBmQ", { "exception": exception.HttpError, }), ("https://imgur.com/a/pjOnJA0", { # empty, no 'media' (#2557) "count": 0, }), ("https://www.imgur.com/a/TcBmP"), # www ("https://imgur.io/a/TcBmP"), # .io ("https://m.imgur.com/a/TcBmP"), # mobile ) def items(self): album = self.api.album(self.key) try: images = album["media"] except KeyError: return del album["media"] count = len(images) album["date"] = text.parse_datetime(album["created_at"]) try: del album["ad_url"] del album["ad_type"] except KeyError: pass for num, image in enumerate(images, 1): url = self._prepare(image) image["num"] = num image["count"] = count image["album"] = album yield Message.Directory, image yield Message.Url, url, image class ImgurGalleryExtractor(ImgurExtractor): """Extractor for imgur galleries""" subcategory = "gallery" pattern = BASE_PATTERN + r"/(?:gallery|t/\w+)/(\w{7}|\w{5})" test = ( ("https://imgur.com/gallery/zf2fIms", { # non-album gallery (#380) "pattern": "https://imgur.com/zf2fIms", }), ("https://imgur.com/gallery/eD9CT", { "pattern": "https://imgur.com/a/eD9CT", }), ("https://imgur.com/t/unmuted/26sEhNr"), ("https://imgur.com/t/cat/qSB8NbN"), ("https://imgur.io/t/cat/qSB8NbN"), # .io ) def items(self): if self.api.gallery(self.key)["is_album"]: url = "{}/a/{}".format(self.root, self.key) extr = ImgurAlbumExtractor else: url = "{}/{}".format(self.root, self.key) extr = ImgurImageExtractor yield Message.Queue, url, {"_extractor": extr} class ImgurUserExtractor(ImgurExtractor): """Extractor for all images posted by a user""" subcategory = "user" pattern = BASE_PATTERN + 
r"/user/([^/?#]+)(?:/posts|/submitted)?/?$" test = ( ("https://imgur.com/user/Miguenzo", { "range": "1-100", "count": 100, "pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+", }), ("https://imgur.com/user/Miguenzo/posts"), ("https://imgur.com/user/Miguenzo/submitted"), ) def items(self): return self._items_queue(self.api.account_submissions(self.key)) class ImgurFavoriteExtractor(ImgurExtractor): """Extractor for a user's favorites""" subcategory = "favorite" pattern = BASE_PATTERN + r"/user/([^/?#]+)/favorites" test = ("https://imgur.com/user/Miguenzo/favorites", { "range": "1-100", "count": 100, "pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+", }) def items(self): return self._items_queue(self.api.account_favorites(self.key)) class ImgurSubredditExtractor(ImgurExtractor): """Extractor for a subreddits's imgur links""" subcategory = "subreddit" pattern = BASE_PATTERN + r"/r/([^/?#]+)/?$" test = ("https://imgur.com/r/pics", { "range": "1-100", "count": 100, "pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+", }) def items(self): return self._items_queue(self.api.gallery_subreddit(self.key)) class ImgurTagExtractor(ImgurExtractor): """Extractor for imgur tag searches""" subcategory = "tag" pattern = BASE_PATTERN + r"/t/([^/?#]+)$" test = ("https://imgur.com/t/animals", { "range": "1-100", "count": 100, "pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+", }) def items(self): return self._items_queue(self.api.gallery_tag(self.key)) class ImgurSearchExtractor(ImgurExtractor): """Extractor for imgur search results""" subcategory = "search" pattern = BASE_PATTERN + r"/search(?:/[^?#]+)?/?\?q=([^&#]+)" test = ("https://imgur.com/search?q=cute+cat", { "range": "1-100", "count": 100, "pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+", }) def items(self): key = text.unquote(self.key.replace("+", " ")) return self._items_queue(self.api.gallery_search(key)) class ImgurAPI(): """Interface for the Imgur API Ref: https://apidocs.imgur.com/ """ def __init__(self, extractor): self.extractor = extractor self.headers = { "Authorization": "Client-ID " + extractor.config( "client-id", "546c25a59c58ad7"), } def account_favorites(self, account): endpoint = "/3/account/{}/gallery_favorites".format(account) return self._pagination(endpoint) def gallery_search(self, query): endpoint = "/3/gallery/search" params = {"q": query} return self._pagination(endpoint, params) def account_submissions(self, account): endpoint = "/3/account/{}/submissions".format(account) return self._pagination(endpoint) def gallery_subreddit(self, subreddit): endpoint = "/3/gallery/r/{}".format(subreddit) return self._pagination(endpoint) def gallery_tag(self, tag): endpoint = "/3/gallery/t/{}".format(tag) return self._pagination(endpoint, key="items") def image(self, image_hash): endpoint = "/post/v1/media/" + image_hash params = {"include": "media,tags,account"} return self._call(endpoint, params) def album(self, album_hash): endpoint = "/post/v1/albums/" + album_hash params = {"include": "media,tags,account"} return self._call(endpoint, params) def gallery(self, gallery_hash): endpoint = "/post/v1/posts/" + gallery_hash return self._call(endpoint) def _call(self, endpoint, params=None): while True: try: return self.extractor.request( "https://api.imgur.com" + endpoint, params=params, headers=self.headers, ).json() except exception.HttpError as exc: if exc.status not in (403, 429) or \ b"capacity" not in exc.response.content: raise self.extractor.wait(seconds=600) def _pagination(self, endpoint, params=None, 
key=None): num = 0 while True: data = self._call("{}/{}".format(endpoint, num), params)["data"] if key: data = data[key] if not data: return yield from data num += 1 ����������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/inkbunny.py��������������������������������������������������0000644�0001750�0001750�00000035431�14363473075�020773� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2020-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://inkbunny.net/""" from .common import Extractor, Message from .. 
import text, exception from ..cache import cache BASE_PATTERN = r"(?:https?://)?(?:www\.)?inkbunny\.net" class InkbunnyExtractor(Extractor): """Base class for inkbunny extractors""" category = "inkbunny" directory_fmt = ("{category}", "{username!l}") filename_fmt = "{submission_id} {file_id} {title}.{extension}" archive_fmt = "{file_id}" root = "https://inkbunny.net" def __init__(self, match): Extractor.__init__(self, match) self.api = InkbunnyAPI(self) def items(self): self.api.authenticate() metadata = self.metadata() to_bool = ("deleted", "favorite", "friends_only", "guest_block", "hidden", "public", "scraps") for post in self.posts(): post.update(metadata) post["date"] = text.parse_datetime( post["create_datetime"] + "00", "%Y-%m-%d %H:%M:%S.%f%z") post["tags"] = [kw["keyword_name"] for kw in post["keywords"]] post["ratings"] = [r["name"] for r in post["ratings"]] files = post["files"] for key in to_bool: if key in post: post[key] = (post[key] == "t") del post["keywords"] del post["files"] yield Message.Directory, post for post["num"], file in enumerate(files, 1): post.update(file) post["deleted"] = (file["deleted"] == "t") post["date"] = text.parse_datetime( file["create_datetime"] + "00", "%Y-%m-%d %H:%M:%S.%f%z") text.nameext_from_url(file["file_name"], post) url = file["file_url_full"] if "/private_files/" in url: url += "?sid=" + self.api.session_id yield Message.Url, url, post def posts(self): return () def metadata(self): return () class InkbunnyUserExtractor(InkbunnyExtractor): """Extractor for inkbunny user profiles""" subcategory = "user" pattern = BASE_PATTERN + r"/(?!s/)(gallery/|scraps/)?(\w+)(?:$|[/?#])" test = ( ("https://inkbunny.net/soina", { "pattern": r"https://[\w.]+\.metapix\.net/files/full" r"/\d+/\d+_soina_.+", "range": "20-50", "keyword": { "date" : "type:datetime", "deleted" : bool, "file_id" : "re:[0-9]+", "filename" : r"re:[0-9]+_soina_\w+", "full_file_md5": "re:[0-9a-f]{32}", "mimetype" : str, "submission_id": "re:[0-9]+", "user_id" : "20969", "comments_count" : "re:[0-9]+", "deleted" : bool, "favorite" : bool, "favorites_count": "re:[0-9]+", "friends_only" : bool, "guest_block" : bool, "hidden" : bool, "pagecount" : "re:[0-9]+", "pools" : list, "pools_count" : int, "public" : bool, "rating_id" : "re:[0-9]+", "rating_name" : str, "ratings" : list, "scraps" : bool, "tags" : list, "title" : str, "type_name" : str, "username" : "soina", "views" : str, }, }), ("https://inkbunny.net/gallery/soina", { "range": "1-25", "keyword": {"scraps": False}, }), ("https://inkbunny.net/scraps/soina", { "range": "1-25", "keyword": {"scraps": True}, }), ) def __init__(self, match): kind, self.user = match.groups() if not kind: self.scraps = None elif kind[0] == "g": self.subcategory = "gallery" self.scraps = "no" else: self.subcategory = "scraps" self.scraps = "only" InkbunnyExtractor.__init__(self, match) def posts(self): orderby = self.config("orderby") params = { "username": self.user, "scraps" : self.scraps, "orderby" : orderby, } if orderby and orderby.startswith("unread_"): params["unread_submissions"] = "yes" return self.api.search(params) class InkbunnyPoolExtractor(InkbunnyExtractor): """Extractor for inkbunny pools""" subcategory = "pool" pattern = (BASE_PATTERN + r"/(?:" r"poolview_process\.php\?pool_id=(\d+)|" r"submissionsviewall\.php\?([^#]+&mode=pool&[^#]+))") test = ( ("https://inkbunny.net/poolview_process.php?pool_id=28985", { "count": 9, "keyword": {"pool_id": "28985"}, }), ("https://inkbunny.net/submissionsviewall.php?rid=ffffffffff" 
"&mode=pool&pool_id=28985&page=1&orderby=pool_order&random=no"), ) def __init__(self, match): InkbunnyExtractor.__init__(self, match) pid = match.group(1) if pid: self.pool_id = pid self.orderby = "pool_order" else: params = text.parse_query(match.group(2)) self.pool_id = params.get("pool_id") self.orderby = params.get("orderby", "pool_order") def metadata(self): return {"pool_id": self.pool_id} def posts(self): params = { "pool_id": self.pool_id, "orderby": self.orderby, } return self.api.search(params) class InkbunnyFavoriteExtractor(InkbunnyExtractor): """Extractor for inkbunny user favorites""" subcategory = "favorite" pattern = (BASE_PATTERN + r"/(?:" r"userfavorites_process\.php\?favs_user_id=(\d+)|" r"submissionsviewall\.php\?([^#]+&mode=userfavs&[^#]+))") test = ( ("https://inkbunny.net/userfavorites_process.php?favs_user_id=20969", { "pattern": r"https://[\w.]+\.metapix\.net/files/full" r"/\d+/\d+_\w+_.+", "range": "20-50", "keyword": {"favs_user_id": "20969"}, }), ("https://inkbunny.net/submissionsviewall.php?rid=ffffffffff" "&mode=userfavs&random=no&orderby=fav_datetime&page=1&user_id=20969"), ) def __init__(self, match): InkbunnyExtractor.__init__(self, match) uid = match.group(1) if uid: self.user_id = uid self.orderby = self.config("orderby", "fav_datetime") else: params = text.parse_query(match.group(2)) self.user_id = params.get("user_id") self.orderby = params.get("orderby", "fav_datetime") def metadata(self): return {"favs_user_id": self.user_id} def posts(self): params = { "favs_user_id": self.user_id, "orderby" : self.orderby, } if self.orderby and self.orderby.startswith("unread_"): params["unread_submissions"] = "yes" return self.api.search(params) class InkbunnySearchExtractor(InkbunnyExtractor): """Extractor for inkbunny search results""" subcategory = "search" pattern = (BASE_PATTERN + r"/submissionsviewall\.php\?([^#]+&mode=search&[^#]+)") test = (("https://inkbunny.net/submissionsviewall.php?rid=ffffffffff" "&mode=search&page=1&orderby=create_datetime&text=cute" "&stringtype=and&keywords=yes&title=yes&description=no&artist=" "&favsby=&type=&days=&keyword_id=&user_id=&random=&md5="), { "range": "1-10", "count": 10, "keyword": { "search": { "rid": "ffffffffff", "mode": "search", "page": "1", "orderby": "create_datetime", "text": "cute", "stringtype": "and", "keywords": "yes", "title": "yes", "description": "no", }, }, }) def __init__(self, match): InkbunnyExtractor.__init__(self, match) self.params = text.parse_query(match.group(1)) def metadata(self): return {"search": self.params} def posts(self): params = self.params.copy() pop = params.pop pop("rid", None) params["string_join_type"] = pop("stringtype", None) params["dayslimit"] = pop("days", None) params["username"] = pop("artist", None) favsby = pop("favsby", None) if favsby: # get user_id from user profile url = "{}/{}".format(self.root, favsby) page = self.request(url).text user_id = text.extr(page, "?user_id=", "'") params["favs_user_id"] = user_id.partition("&")[0] return self.api.search(params) class InkbunnyFollowingExtractor(InkbunnyExtractor): """Extractor for inkbunny user watches""" subcategory = "following" pattern = (BASE_PATTERN + r"/(?:" r"watchlist_process\.php\?mode=watching&user_id=(\d+)|" r"usersviewall\.php\?([^#]+&mode=watching&[^#]+))") test = ( (("https://inkbunny.net/watchlist_process.php" "?mode=watching&user_id=20969"), { "pattern": InkbunnyUserExtractor.pattern, "count": ">= 90", }), ("https://inkbunny.net/usersviewall.php?rid=ffffffffff" 
"&mode=watching&page=1&user_id=20969&orderby=added&namesonly="), ) def __init__(self, match): InkbunnyExtractor.__init__(self, match) self.user_id = match.group(1) or \ text.parse_query(match.group(2)).get("user_id") def items(self): url = self.root + "/watchlist_process.php" params = {"mode": "watching", "user_id": self.user_id} with self.request(url, params=params) as response: url, _, params = response.url.partition("?") page = response.text params = text.parse_query(params) params["page"] = text.parse_int(params.get("page"), 1) data = {"_extractor": InkbunnyUserExtractor} while True: cnt = 0 for user in text.extract_iter( page, '<a class="widget_userNameSmall" href="', '"', page.index('id="changethumboriginal_form"')): cnt += 1 yield Message.Queue, self.root + user, data if cnt < 20: return params["page"] += 1 page = self.request(url, params=params).text class InkbunnyPostExtractor(InkbunnyExtractor): """Extractor for individual Inkbunny posts""" subcategory = "post" pattern = BASE_PATTERN + r"/s/(\d+)" test = ( ("https://inkbunny.net/s/1829715", { "pattern": r"https://[\w.]+\.metapix\.net/files/full" r"/2626/2626843_soina_dscn2296\.jpg", "content": "cf69d8dddf0822a12b4eef1f4b2258bd600b36c8", }), ("https://inkbunny.net/s/2044094", { "count": 4, }), ) def __init__(self, match): InkbunnyExtractor.__init__(self, match) self.submission_id = match.group(1) def posts(self): submissions = self.api.detail(({"submission_id": self.submission_id},)) if submissions[0] is None: raise exception.NotFoundError("submission") return submissions class InkbunnyAPI(): """Interface for the Inkunny API Ref: https://wiki.inkbunny.net/wiki/API """ def __init__(self, extractor): self.extractor = extractor self.session_id = None def detail(self, submissions): """Get full details about submissions with the given IDs""" ids = { sub["submission_id"]: idx for idx, sub in enumerate(submissions) } params = { "submission_ids": ",".join(ids), "show_description": "yes", } submissions = [None] * len(ids) for sub in self._call("submissions", params)["submissions"]: submissions[ids[sub["submission_id"]]] = sub return submissions def search(self, params): """Perform a search""" return self._pagination_search(params) def set_allowed_ratings(self, nudity=True, sexual=True, violence=True, strong_violence=True): """Change allowed submission ratings""" params = { "tag[2]": "yes" if nudity else "no", "tag[3]": "yes" if violence else "no", "tag[4]": "yes" if sexual else "no", "tag[5]": "yes" if strong_violence else "no", } self._call("userrating", params) def authenticate(self, invalidate=False): username, password = self.extractor._get_auth_info() if invalidate: _authenticate_impl.invalidate(username or "guest") if username: self.session_id = _authenticate_impl(self, username, password) else: self.session_id = _authenticate_impl(self, "guest", "") self.set_allowed_ratings() def _call(self, endpoint, params): url = "https://inkbunny.net/api_" + endpoint + ".php" params["sid"] = self.session_id data = self.extractor.request(url, params=params).json() if "error_code" in data: if str(data["error_code"]) == "2": self.authenticate(invalidate=True) return self._call(endpoint, params) raise exception.StopExtraction(data.get("error_message")) return data def _pagination_search(self, params): params["page"] = 1 params["get_rid"] = "yes" params["submission_ids_only"] = "yes" while True: data = self._call("search", params) yield from self.detail(data["submissions"]) if data["page"] >= data["pages_count"]: return if "get_rid" in params: del 
params["get_rid"] params["rid"] = data["rid"] params["page"] += 1 @cache(maxage=360*24*3600, keyarg=1) def _authenticate_impl(api, username, password): api.extractor.log.info("Logging in as %s", username) url = "https://inkbunny.net/api_login.php" data = {"username": username, "password": password} data = api.extractor.request(url, method="POST", data=data).json() if "sid" not in data: raise exception.AuthenticationError(data.get("error_message")) return data["sid"] ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1675691565.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/instagram.py�������������������������������������������������0000644�0001750�0001750�00000111113�14370203055�021076� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2020 Leonardo Taccari # Copyright 2018-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.instagram.com/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache, memcache import binascii import json import re BASE_PATTERN = r"(?:https?://)?(?:www\.)?instagram\.com" USER_PATTERN = BASE_PATTERN + r"/(?!(?:p|tv|reel|explore|stories)/)([^/?#]+)" class InstagramExtractor(Extractor): """Base class for instagram extractors""" category = "instagram" directory_fmt = ("{category}", "{username}") filename_fmt = "{sidecar_media_id:?/_/}{media_id}.{extension}" archive_fmt = "{media_id}" root = "https://www.instagram.com" cookiedomain = ".instagram.com" cookienames = ("sessionid",) request_interval = (6.0, 12.0) def __init__(self, match): Extractor.__init__(self, match) self.item = match.group(1) self.api = None self.www_claim = "0" self.csrf_token = util.generate_token() self._logged_in = True self._find_tags = re.compile(r"#\w+").findall self._cursor = None self._user = None def items(self): self.login() if self.config("api") == "graphql": self.api = InstagramGraphqlAPI(self) else: self.api = InstagramRestAPI(self) data = self.metadata() videos = self.config("videos", True) previews = self.config("previews", False) video_headers = {"User-Agent": "Mozilla/5.0"} for post in self.posts(): if "__typename" in post: post = self._parse_post_graphql(post) else: post = self._parse_post_rest(post) if self._user: post["user"] = self._user post.update(data) files = post.pop("_files") post["count"] = len(files) yield Message.Directory, post if "date" in post: del post["date"] for file in files: file.update(post) url = file.get("video_url") if url: if videos: file["_http_headers"] = video_headers text.nameext_from_url(url, file) yield Message.Url, url, file if not previews: continue url = file["display_url"] yield Message.Url, url, text.nameext_from_url(url, file) def metadata(self): return () def posts(self): return () def finalize(self): if self._cursor: self.log.info("Use '-o cursor=%s' to continue downloading " "from the current position", self._cursor) def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if response.history: url = response.url if "/accounts/login/" in url: page = "login" elif "/challenge/" in url: page = "challenge" else: page = None if page: raise exception.StopExtraction("HTTP redirect to %s page (%s)", page, url.partition("?")[0]) www_claim = response.headers.get("x-ig-set-www-claim") if www_claim is not None: self.www_claim = www_claim csrf_token = response.cookies.get("csrftoken") if csrf_token: self.csrf_token = csrf_token return response def login(self): if not self._check_cookies(self.cookienames): username, password = self._get_auth_info() if username: self._update_cookies(_login_impl(self, username, password)) else: self._logged_in = False self.session.cookies.set( "csrftoken", self.csrf_token, domain=self.cookiedomain) def _parse_post_rest(self, post): if "items" in post: # story or highlight items = post["items"] reel_id = str(post["id"]).rpartition(":")[2] data = { "expires": text.parse_timestamp(post.get("expiring_at")), "post_id": reel_id, "post_shortcode": shortcode_from_id(reel_id), } if "title" in post: data["highlight_title"] = post["title"] if "created_at" in post: data["date"] = text.parse_timestamp(post.get("created_at")) else: # regular image/video post data = { "post_id" : post["pk"], "post_shortcode": post["code"], "likes": post["like_count"], "pinned": post.get("timeline_pinned_user_ids", ()), "date": text.parse_timestamp(post.get("taken_at")), } caption = post["caption"] data["description"] = caption["text"] if caption else "" tags = 
self._find_tags(data["description"]) if tags: data["tags"] = sorted(set(tags)) location = post.get("location") if location: slug = location["short_name"].replace(" ", "-").lower() data["location_id"] = location["pk"] data["location_slug"] = slug data["location_url"] = "{}/explore/locations/{}/{}/".format( self.root, location["pk"], slug) coauthors = post.get("coauthor_producers") if coauthors: data["coauthors"] = [ {"id" : user["pk"], "username" : user["username"], "full_name": user["full_name"]} for user in coauthors ] if "carousel_media" in post: items = post["carousel_media"] data["sidecar_media_id"] = data["post_id"] data["sidecar_shortcode"] = data["post_shortcode"] else: items = (post,) owner = post["user"] data["owner_id"] = owner["pk"] data["username"] = owner.get("username") data["fullname"] = owner.get("full_name") data["post_url"] = "{}/p/{}/".format(self.root, data["post_shortcode"]) data["_files"] = files = [] for num, item in enumerate(items, 1): image = item["image_versions2"]["candidates"][0] if "video_versions" in item: video = max( item["video_versions"], key=lambda x: (x["width"], x["height"], x["type"]), ) media = video else: video = None media = image media = { "num" : num, "date" : text.parse_timestamp(item.get("taken_at") or media.get("taken_at") or post.get("taken_at")), "media_id" : item["pk"], "shortcode" : (item.get("code") or shortcode_from_id(item["pk"])), "display_url": image["url"], "video_url" : video["url"] if video else None, "width" : media["width"], "height" : media["height"], } if "expiring_at" in item: media["expires"] = text.parse_timestamp(post["expiring_at"]) self._extract_tagged_users(item, media) files.append(media) return data def _parse_post_graphql(self, post): typename = post["__typename"] if self._logged_in: if post.get("is_video") and "video_url" not in post: post = self.api.media(post["id"])[0] elif typename == "GraphSidecar" and \ "edge_sidecar_to_children" not in post: post = self.api.media(post["id"])[0] pinned = post.get("pinned_for_users", ()) if pinned: for index, user in enumerate(pinned): pinned[index] = int(user["id"]) owner = post["owner"] data = { "typename" : typename, "date" : text.parse_timestamp(post["taken_at_timestamp"]), "likes" : post["edge_media_preview_like"]["count"], "pinned" : pinned, "owner_id" : owner["id"], "username" : owner.get("username"), "fullname" : owner.get("full_name"), "post_id" : post["id"], "post_shortcode": post["shortcode"], "post_url" : "{}/p/{}/".format(self.root, post["shortcode"]), "description": text.parse_unicode_escapes("\n".join( edge["node"]["text"] for edge in post["edge_media_to_caption"]["edges"] )), } tags = self._find_tags(data["description"]) if tags: data["tags"] = sorted(set(tags)) location = post.get("location") if location: data["location_id"] = location["id"] data["location_slug"] = location["slug"] data["location_url"] = "{}/explore/locations/{}/{}/".format( self.root, location["id"], location["slug"]) coauthors = post.get("coauthor_producers") if coauthors: data["coauthors"] = [ {"id" : user["id"], "username": user["username"]} for user in coauthors ] data["_files"] = files = [] if "edge_sidecar_to_children" in post: for num, edge in enumerate( post["edge_sidecar_to_children"]["edges"], 1): node = edge["node"] dimensions = node["dimensions"] media = { "num": num, "media_id" : node["id"], "shortcode" : (node.get("shortcode") or shortcode_from_id(node["id"])), "display_url": node["display_url"], "video_url" : node.get("video_url"), "width" : dimensions["width"], "height" : 
dimensions["height"], "sidecar_media_id" : post["id"], "sidecar_shortcode": post["shortcode"], } self._extract_tagged_users(node, media) files.append(media) else: dimensions = post["dimensions"] media = { "media_id" : post["id"], "shortcode" : post["shortcode"], "display_url": post["display_url"], "video_url" : post.get("video_url"), "width" : dimensions["width"], "height" : dimensions["height"], } self._extract_tagged_users(post, media) files.append(media) return data @staticmethod def _extract_tagged_users(src, dest): dest["tagged_users"] = tagged_users = [] edges = src.get("edge_media_to_tagged_user") if edges: for edge in edges["edges"]: user = edge["node"]["user"] tagged_users.append({"id" : user["id"], "username" : user["username"], "full_name": user["full_name"]}) usertags = src.get("usertags") if usertags: for tag in usertags["in"]: user = tag["user"] tagged_users.append({"id" : user["pk"], "username" : user["username"], "full_name": user["full_name"]}) mentions = src.get("reel_mentions") if mentions: for mention in mentions: user = mention["user"] tagged_users.append({"id" : user.get("pk"), "username" : user["username"], "full_name": user["full_name"]}) stickers = src.get("story_bloks_stickers") if stickers: for sticker in stickers: sticker = sticker["bloks_sticker"] if sticker["bloks_sticker_type"] == "mention": user = sticker["sticker_data"]["ig_mention"] tagged_users.append({"id" : user["account_id"], "username" : user["username"], "full_name": user["full_name"]}) def _init_cursor(self): return self.config("cursor") or None def _update_cursor(self, cursor): self.log.debug("Cursor: %s", cursor) self._cursor = cursor return cursor def _assign_user(self, user): self._user = user for key, old in ( ("count_media" , "edge_owner_to_timeline_media"), ("count_video" , "edge_felix_video_timeline"), ("count_saved" , "edge_saved_media"), ("count_mutual" , "edge_mutual_followed_by"), ("count_follow" , "edge_follow"), ("count_followed" , "edge_followed_by"), ("count_collection", "edge_media_collections")): try: user[key] = user.pop(old)["count"] except Exception: user[key] = 0 class InstagramUserExtractor(InstagramExtractor): """Extractor for an Instagram user profile""" subcategory = "user" pattern = USER_PATTERN + r"/?(?:$|[?#])" test = ( ("https://www.instagram.com/instagram/"), ("https://www.instagram.com/instagram/?hl=en"), ("https://www.instagram.com/id:25025320/"), ) def items(self): base = "{}/{}/".format(self.root, self.item) stories = "{}/stories/{}/".format(self.root, self.item) return self._dispatch_extractors(( (InstagramAvatarExtractor , base + "avatar/"), (InstagramStoriesExtractor , stories), (InstagramHighlightsExtractor, base + "highlights/"), (InstagramPostsExtractor , base + "posts/"), (InstagramReelsExtractor , base + "reels/"), (InstagramTaggedExtractor , base + "tagged/"), ), ("posts",)) class InstagramPostsExtractor(InstagramExtractor): """Extractor for an Instagram user's posts""" subcategory = "posts" pattern = USER_PATTERN + r"/posts" test = ("https://www.instagram.com/instagram/posts/", { "range": "1-16", "count": ">= 16", }) def posts(self): uid = self.api.user_id(self.item) return self.api.user_feed(uid) class InstagramReelsExtractor(InstagramExtractor): """Extractor for an Instagram user's reels""" subcategory = "reels" pattern = USER_PATTERN + r"/reels" test = ("https://www.instagram.com/instagram/reels/", { "range": "40-60", "count": ">= 20", }) def posts(self): uid = self.api.user_id(self.item) return self.api.user_clips(uid) class 
InstagramTaggedExtractor(InstagramExtractor): """Extractor for an Instagram user's tagged posts""" subcategory = "tagged" pattern = USER_PATTERN + r"/tagged" test = ("https://www.instagram.com/instagram/tagged/", { "range": "1-16", "count": ">= 16", "keyword": { "tagged_owner_id" : "25025320", "tagged_username" : "instagram", "tagged_full_name": "Instagram", }, }) def metadata(self): if self.item.startswith("id:"): self.user_id = self.item[3:] return {"tagged_owner_id": self.user_id} self.user_id = self.api.user_id(self.item) user = self.api.user_by_name(self.item) return { "tagged_owner_id" : user["id"], "tagged_username" : user["username"], "tagged_full_name": user["full_name"], } def posts(self): return self.api.user_tagged(self.user_id) class InstagramGuideExtractor(InstagramExtractor): """Extractor for an Instagram guide""" subcategory = "guide" pattern = USER_PATTERN + r"/guide/[^/?#]+/(\d+)" test = (("https://www.instagram.com/kadakaofficial/guide" "/knit-i-need-collection/18131821684305217/"), { "range": "1-16", "count": ">= 16", }) def __init__(self, match): InstagramExtractor.__init__(self, match) self.guide_id = match.group(2) def metadata(self): return {"guide": self.api.guide(self.guide_id)} def posts(self): return self.api.guide_media(self.guide_id) class InstagramSavedExtractor(InstagramExtractor): """Extractor for an Instagram user's saved media""" subcategory = "saved" pattern = USER_PATTERN + r"/saved(?:/all-posts)?/?$" test = ( ("https://www.instagram.com/instagram/saved/"), ("https://www.instagram.com/instagram/saved/all-posts/"), ) def posts(self): return self.api.user_saved() class InstagramCollectionExtractor(InstagramExtractor): """Extractor for Instagram collection""" subcategory = "collection" pattern = USER_PATTERN + r"/saved/([^/?#]+)/([^/?#]+)" test = ( "https://www.instagram.com/instagram/saved/collection_name/123456789/", ) def __init__(self, match): InstagramExtractor.__init__(self, match) self.user, self.collection_name, self.collection_id = match.groups() def metadata(self): return { "collection_id" : self.collection_id, "collection_name": text.unescape(self.collection_name), } def posts(self): return self.api.user_collection(self.collection_id) class InstagramStoriesExtractor(InstagramExtractor): """Extractor for Instagram stories""" subcategory = "stories" pattern = (r"(?:https?://)?(?:www\.)?instagram\.com" r"/s(?:tories/(?:highlights/(\d+)|([^/?#]+)(?:/(\d+))?)" r"|/(aGlnaGxpZ2h0[^?#]+)(?:\?story_media_id=(\d+))?)") test = ( ("https://www.instagram.com/stories/instagram/"), ("https://www.instagram.com/stories/highlights/18042509488170095/"), ("https://instagram.com/stories/geekmig/2724343156064789461"), ("https://www.instagram.com/s/aGlnaGxpZ2h0OjE4MDQyNTA5NDg4MTcwMDk1"), ("https://www.instagram.com/s/aGlnaGxpZ2h0OjE4MDQyNTA5NDg4MTcwMDk1" "?story_media_id=2724343156064789461"), ) def __init__(self, match): h1, self.user, m1, h2, m2 = match.groups() if self.user: self.highlight_id = None else: self.subcategory = InstagramHighlightsExtractor.subcategory self.highlight_id = ("highlight:" + h1 if h1 else binascii.a2b_base64(h2).decode()) self.media_id = m1 or m2 InstagramExtractor.__init__(self, match) def posts(self): reel_id = self.highlight_id or self.api.user_id(self.user) reels = self.api.reels_media(reel_id) if self.media_id and reels: reel = reels[0] for item in reel["items"]: if item["pk"] == self.media_id: reel["items"] = (item,) break else: raise exception.NotFoundError("story") return reels class InstagramHighlightsExtractor(InstagramExtractor): 
"""Extractor for an Instagram user's story highlights""" subcategory = "highlights" pattern = USER_PATTERN + r"/highlights" test = ("https://www.instagram.com/instagram/highlights",) def posts(self): uid = self.api.user_id(self.item) return self.api.highlights_media(uid) class InstagramTagExtractor(InstagramExtractor): """Extractor for Instagram tags""" subcategory = "tag" directory_fmt = ("{category}", "{subcategory}", "{tag}") pattern = BASE_PATTERN + r"/explore/tags/([^/?#]+)" test = ("https://www.instagram.com/explore/tags/instagram/", { "range": "1-16", "count": ">= 16", }) def metadata(self): return {"tag": text.unquote(self.item)} def posts(self): return self.api.tags_media(self.item) class InstagramAvatarExtractor(InstagramExtractor): """Extractor for an Instagram user's avatar""" subcategory = "avatar" pattern = USER_PATTERN + r"/avatar" test = ("https://www.instagram.com/instagram/avatar", { "pattern": r"https://instagram\.[\w.-]+\.fbcdn\.net/v/t51\.2885-19" r"/281440578_1088265838702675_6233856337905829714_n\.jpg", }) def posts(self): if self._logged_in: user_id = self.api.user_id(self.item, check_private=False) user = self.api.user_by_id(user_id) avatar = (user.get("hd_profile_pic_url_info") or user["hd_profile_pic_versions"][-1]) else: user = self.item if user.startswith("id:"): user = self.api.user_by_id(user[3:]) else: user = self.api.user_by_name(user) user["pk"] = user["id"] url = user.get("profile_pic_url_hd") or user["profile_pic_url"] avatar = {"url": url, "width": 0, "height": 0} pk = user.get("profile_pic_id") if pk: pk = pk.partition("_")[0] code = shortcode_from_id(pk) else: pk = code = "avatar:" + str(user["pk"]) return ({ "pk" : pk, "code" : code, "user" : user, "caption" : None, "like_count": 0, "image_versions2": {"candidates": (avatar,)}, },) class InstagramPostExtractor(InstagramExtractor): """Extractor for an Instagram post""" subcategory = "post" pattern = (r"(?:https?://)?(?:www\.)?instagram\.com" r"/(?:[^/?#]+/)?(?:p|tv|reel)/([^/?#]+)") test = ( # GraphImage ("https://www.instagram.com/p/BqvsDleB3lV/", { "pattern": r"https://[^/]+\.(cdninstagram\.com|fbcdn\.net)" r"/v(p/[0-9a-f]+/[0-9A-F]+)?/t51.2885-15/e35" r"/44877605_725955034447492_3123079845831750529_n.jpg", "keyword": { "date": "dt:2018-11-29 01:04:04", "description": str, "height": int, "likes": int, "location_id": "214424288", "location_slug": "hong-kong", "location_url": "re:/explore/locations/214424288/hong-kong/", "media_id": "1922949326347663701", "shortcode": "BqvsDleB3lV", "post_id": "1922949326347663701", "post_shortcode": "BqvsDleB3lV", "post_url": "https://www.instagram.com/p/BqvsDleB3lV/", "tags": ["#WHPsquares"], "typename": "GraphImage", "username": "instagram", "width": int, } }), # GraphSidecar ("https://www.instagram.com/p/BoHk1haB5tM/", { "count": 5, "keyword": { "sidecar_media_id": "1875629777499953996", "sidecar_shortcode": "BoHk1haB5tM", "post_id": "1875629777499953996", "post_shortcode": "BoHk1haB5tM", "post_url": "https://www.instagram.com/p/BoHk1haB5tM/", "num": int, "likes": int, "username": "instagram", } }), # GraphVideo ("https://www.instagram.com/p/Bqxp0VSBgJg/", { "pattern": r"/46840863_726311431074534_7805566102611403091_n\.mp4", "keyword": { "date": "dt:2018-11-29 19:23:58", "description": str, "height": int, "likes": int, "media_id": "1923502432034620000", "post_url": "https://www.instagram.com/p/Bqxp0VSBgJg/", "shortcode": "Bqxp0VSBgJg", "tags": ["#ASMR"], "typename": "GraphVideo", "username": "instagram", "width": int, } }), # GraphVideo (IGTV) 
("https://www.instagram.com/tv/BkQjCfsBIzi/", { "pattern": r"/10000000_597132547321814_702169244961988209_n\.mp4", "keyword": { "date": "dt:2018-06-20 19:51:32", "description": str, "height": int, "likes": int, "media_id": "1806097553666903266", "post_url": "https://www.instagram.com/p/BkQjCfsBIzi/", "shortcode": "BkQjCfsBIzi", "typename": "GraphVideo", "username": "instagram", "width": int, } }), # GraphSidecar with 2 embedded GraphVideo objects ("https://www.instagram.com/p/BtOvDOfhvRr/", { "count": 2, "keyword": { "post_url": "https://www.instagram.com/p/BtOvDOfhvRr/", "sidecar_media_id": "1967717017113261163", "sidecar_shortcode": "BtOvDOfhvRr", "video_url": str, } }), # GraphImage with tagged user ("https://www.instagram.com/p/B_2lf3qAd3y/", { "keyword": { "tagged_users": [{ "id" : "1246468638", "username" : "kaaymbl", "full_name": "Call Me Kay", }] } }), # URL with username (#2085) ("https://www.instagram.com/dm/p/CW042g7B9CY/"), ("https://www.instagram.com/reel/CDg_6Y1pxWu/"), ) def posts(self): return self.api.media(id_from_shortcode(self.item)) class InstagramRestAPI(): def __init__(self, extractor): self.extractor = extractor def guide(self, guide_id): endpoint = "/v1/guides/web_info/" params = {"guide_id": guide_id} return self._call(endpoint, params=params) def guide_media(self, guide_id): endpoint = "/v1/guides/guide/{}/".format(guide_id) return self._pagination_guides(endpoint) def highlights_media(self, user_id): chunk_size = 5 reel_ids = [hl["id"] for hl in self.highlights_tray(user_id)] for offset in range(0, len(reel_ids), chunk_size): yield from self.reels_media( reel_ids[offset : offset+chunk_size]) def highlights_tray(self, user_id): endpoint = "/v1/highlights/{}/highlights_tray/".format(user_id) return self._call(endpoint)["tray"] def media(self, post_id): endpoint = "/v1/media/{}/info/".format(post_id) return self._pagination(endpoint) def reels_media(self, reel_ids): endpoint = "/v1/feed/reels_media/" params = {"reel_ids": reel_ids} return self._call(endpoint, params=params)["reels_media"] def tags_media(self, tag): for section in self.tags_sections(tag): for media in section["layout_content"]["medias"]: yield media["media"] def tags_sections(self, tag): endpoint = "/v1/tags/{}/sections/".format(tag) data = { "include_persistent": "0", "max_id" : None, "page" : None, "surface": "grid", "tab" : "recent", } return self._pagination_sections(endpoint, data) @memcache(keyarg=1) def user_by_name(self, screen_name): endpoint = "/v1/users/web_profile_info/" params = {"username": screen_name} return self._call(endpoint, params=params)["data"]["user"] def user_by_id(self, user_id): endpoint = "/v1/users/{}/info/".format(user_id) return self._call(endpoint)["user"] def user_id(self, screen_name, check_private=True): if screen_name.startswith("id:"): return screen_name[3:] user = self.user_by_name(screen_name) if user is None: raise exception.AuthorizationError( "Login required to access this profile") if check_private and user["is_private"] and \ not user["followed_by_viewer"]: name = user["username"] s = "" if name.endswith("s") else "s" raise exception.StopExtraction("%s'%s posts are private", name, s) self.extractor._assign_user(user) return user["id"] def user_clips(self, user_id): endpoint = "/v1/clips/user/" data = { "target_user_id": user_id, "page_size": "50", "max_id": None, "include_feed_video": "true", } return self._pagination_post(endpoint, data) def user_collection(self, collection_id): endpoint = "/v1/feed/collection/{}/posts/".format(collection_id) params = 
{"count": 50} return self._pagination(endpoint, params, media=True) def user_feed(self, user_id): endpoint = "/v1/feed/user/{}/".format(user_id) params = {"count": 30} return self._pagination(endpoint, params) def user_saved(self): endpoint = "/v1/feed/saved/posts/" params = {"count": 50} return self._pagination(endpoint, params, media=True) def user_tagged(self, user_id): endpoint = "/v1/usertags/{}/feed/".format(user_id) params = {"count": 50} return self._pagination(endpoint, params) def _call(self, endpoint, **kwargs): extr = self.extractor url = "https://www.instagram.com/api" + endpoint kwargs["headers"] = { "Accept" : "*/*", "X-CSRFToken" : extr.csrf_token, "X-Instagram-AJAX": "1006242110", "X-IG-App-ID" : "936619743392459", "X-ASBD-ID" : "198387", "X-IG-WWW-Claim" : extr.www_claim, "X-Requested-With": "XMLHttpRequest", "Alt-Used" : "www.instagram.com", "Referer" : extr.root + "/", } return extr.request(url, **kwargs).json() def _pagination(self, endpoint, params=None, media=False): if params is None: params = {} extr = self.extractor params["max_id"] = extr._init_cursor() while True: data = self._call(endpoint, params=params) if media: for item in data["items"]: yield item["media"] else: yield from data["items"] if not data.get("more_available"): return extr._update_cursor(None) params["max_id"] = extr._update_cursor(data["next_max_id"]) def _pagination_post(self, endpoint, params): extr = self.extractor params["max_id"] = extr._init_cursor() while True: data = self._call(endpoint, method="POST", data=params) for item in data["items"]: yield item["media"] info = data["paging_info"] if not info.get("more_available"): return extr._update_cursor(None) params["max_id"] = extr._update_cursor(info["max_id"]) def _pagination_sections(self, endpoint, params): extr = self.extractor params["max_id"] = extr._init_cursor() while True: info = self._call(endpoint, method="POST", data=params) yield from info["sections"] if not info.get("more_available"): return extr._update_cursor(None) params["page"] = info["next_page"] params["max_id"] = extr._update_cursor(info["next_max_id"]) def _pagination_guides(self, endpoint): extr = self.extractor params = {"max_id": extr._init_cursor()} while True: data = self._call(endpoint, params=params) for item in data["items"]: yield from item["media_items"] if "next_max_id" not in data: return extr._update_cursor(None) params["max_id"] = extr._update_cursor(data["next_max_id"]) class InstagramGraphqlAPI(): def __init__(self, extractor): self.extractor = extractor self.user_collection = self.user_saved = self.reels_media = \ self.highlights_media = self.guide = self.guide_media = \ self._unsupported self._json_dumps = json.JSONEncoder(separators=(",", ":")).encode api = InstagramRestAPI(extractor) self.user_by_name = api.user_by_name self.user_by_id = api.user_by_id self.user_id = api.user_id @staticmethod def _unsupported(_=None): raise exception.StopExtraction("Unsupported with GraphQL API") def highlights_tray(self, user_id): query_hash = "d4d88dc1500312af6f937f7b804c68c3" variables = { "user_id": user_id, "include_chaining": False, "include_reel": False, "include_suggested_users": False, "include_logged_out_extras": True, "include_highlight_reels": True, "include_live_status": False, } edges = (self._call(query_hash, variables)["user"] ["edge_highlight_reels"]["edges"]) return [edge["node"] for edge in edges] def media(self, post_id): query_hash = "9f8827793ef34641b2fb195d4d41151c" variables = { "shortcode": shortcode_from_id(post_id), "child_comment_count": 
3, "fetch_comment_count": 40, "parent_comment_count": 24, "has_threaded_comments": True, } media = self._call(query_hash, variables).get("shortcode_media") return (media,) if media else () def tags_media(self, tag): query_hash = "9b498c08113f1e09617a1703c22b2f32" variables = {"tag_name": text.unescape(tag), "first": 50} return self._pagination(query_hash, variables, "hashtag", "edge_hashtag_to_media") def user_clips(self, user_id): query_hash = "bc78b344a68ed16dd5d7f264681c4c76" variables = {"id": user_id, "first": 50} return self._pagination(query_hash, variables) def user_feed(self, user_id): query_hash = "69cba40317214236af40e7efa697781d" variables = {"id": user_id, "first": 50} return self._pagination(query_hash, variables) def user_tagged(self, user_id): query_hash = "be13233562af2d229b008d2976b998b5" variables = {"id": user_id, "first": 50} return self._pagination(query_hash, variables) def _call(self, query_hash, variables): extr = self.extractor url = "https://www.instagram.com/graphql/query/" params = { "query_hash": query_hash, "variables" : self._json_dumps(variables), } headers = { "Accept" : "*/*", "X-CSRFToken" : extr.csrf_token, "X-Instagram-AJAX": "1006267176", "X-IG-App-ID" : "936619743392459", "X-ASBD-ID" : "198387", "X-IG-WWW-Claim" : extr.www_claim, "X-Requested-With": "XMLHttpRequest", "Referer" : extr.root + "/", } return extr.request(url, params=params, headers=headers).json()["data"] def _pagination(self, query_hash, variables, key_data="user", key_edge=None): extr = self.extractor variables["after"] = extr._init_cursor() while True: data = self._call(query_hash, variables)[key_data] data = data[key_edge] if key_edge else next(iter(data.values())) for edge in data["edges"]: yield edge["node"] info = data["page_info"] if not info["has_next_page"]: return extr._update_cursor(None) elif not data["edges"]: s = "" if self.item.endswith("s") else "s" raise exception.StopExtraction( "%s'%s posts are private", self.item, s) variables["after"] = extr._update_cursor(info["end_cursor"]) @cache(maxage=90*24*3600, keyarg=1) def _login_impl(extr, username, password): extr.log.error("Login with username & password is no longer supported. 
" "Use browser cookies instead.") return {} def id_from_shortcode(shortcode): return util.bdecode(shortcode, _ALPHABET) def shortcode_from_id(post_id): return util.bencode(int(post_id), _ALPHABET) _ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ" "abcdefghijklmnopqrstuvwxyz" "0123456789-_") �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1675805194.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/issuu.py�����������������������������������������������������0000644�0001750�0001750�00000007204�14370541012�020264� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://issuu.com/""" from .common import GalleryExtractor, Extractor, Message from .. 
import text, util class IssuuBase(): """Base class for issuu extractors""" category = "issuu" root = "https://issuu.com" class IssuuPublicationExtractor(IssuuBase, GalleryExtractor): """Extractor for a single publication""" subcategory = "publication" directory_fmt = ("{category}", "{document[username]}", "{document[date]:%Y-%m-%d} {document[title]}") filename_fmt = "{num:>03}.{extension}" archive_fmt = "{document[publicationId]}_{num}" pattern = r"(?:https?://)?issuu\.com(/[^/?#]+/docs/[^/?#]+)" test = ("https://issuu.com/issuu/docs/motions-1-2019/", { "pattern": r"https://image.isu.pub/190916155301-\w+/jpg/page_\d+.jpg", "count" : 36, "keyword": { "document": { "access" : "PUBLIC", "contentRating" : { "isAdsafe" : True, "isExplicit": False, "isReviewed": True, }, "date" : "dt:2019-09-16 00:00:00", "description" : "re:Motions, the brand new publication by I", "documentName" : "motions-1-2019", "downloadable" : False, "pageCount" : 36, "publicationId" : "d99ec95935f15091b040cb8060f05510", "title" : "Motions by Issuu - Issue 1", "username" : "issuu", }, "extension": "jpg", "filename" : r"re:page_\d+", "num" : int, }, }) def metadata(self, page): data = util.json_loads(text.extr( page, '<script data-json="', '"').replace(""", '"')) doc = data["initialDocumentData"]["document"] doc["date"] = text.parse_datetime( doc["originalPublishDateInISOString"], "%Y-%m-%dT%H:%M:%S.%fZ") self._cnt = text.parse_int(doc["pageCount"]) self._tpl = "https://{}/{}-{}/jpg/page_{{}}.jpg".format( data["config"]["hosts"]["image"], doc["revisionId"], doc["publicationId"], ) return {"document": doc} def images(self, page): fmt = self._tpl.format return [(fmt(i), None) for i in range(1, self._cnt + 1)] class IssuuUserExtractor(IssuuBase, Extractor): """Extractor for all publications of a user/publisher""" subcategory = "user" pattern = r"(?:https?://)?issuu\.com/([^/?#]+)/?$" test = ("https://issuu.com/issuu", { "pattern": IssuuPublicationExtractor.pattern, "count" : "> 25", }) def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) def items(self): url = "{}/call/profile/v1/documents/{}".format(self.root, self.user) params = {"offset": 0, "limit": "25"} while True: data = self.request(url, params=params).json() for publication in data["items"]: publication["url"] = "{}/{}/docs/{}".format( self.root, self.user, publication["uri"]) publication["_extractor"] = IssuuPublicationExtractor yield Message.Queue, publication["url"], publication if not data["hasMore"]: return params["offset"] += data["limit"] ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 
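# The IssuuUserExtractor above pages through a publisher's documents via
# issuu's "/call/profile/v1/documents/<user>" endpoint: each response carries
# an "items" list, a "hasMore" flag, and the page size as "limit", and the
# next request advances "offset" by that amount.  A minimal standalone sketch
# of the same loop (assumption: a plain `requests` call with default headers
# is accepted here; the real extractor goes through gallery-dl's session and
# rate limiting):
import requests


def issuu_user_documents(user):
    """Yield publication entries for an issuu user/publisher."""
    url = "https://issuu.com/call/profile/v1/documents/" + user
    params = {"offset": 0, "limit": "25"}
    while True:
        data = requests.get(url, params=params, timeout=30).json()
        yield from data["items"]           # one dict per publication
        if not data["hasMore"]:
            return
        params["offset"] += data["limit"]  # advance by server-reported size


# for doc in issuu_user_documents("issuu"):
#     print(doc["uri"])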
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/itaku.py�����������������������������������������������������0000644�0001750�0001750�00000015126�14363473075�020252� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://itaku.ee/""" from .common import Extractor, Message from ..cache import memcache from .. import text BASE_PATTERN = r"(?:https?://)?itaku\.ee" class ItakuExtractor(Extractor): """Base class for itaku extractors""" category = "itaku" root = "https://itaku.ee" directory_fmt = ("{category}", "{owner_username}") filename_fmt = ("{id}{title:? //}.{extension}") archive_fmt = "{id}" request_interval = (0.5, 1.5) def __init__(self, match): Extractor.__init__(self, match) self.api = ItakuAPI(self) self.item = match.group(1) self.videos = self.config("videos", True) def items(self): for post in self.posts(): post["date"] = text.parse_datetime( post["date_added"], "%Y-%m-%dT%H:%M:%S.%fZ") for category, tags in post.pop("categorized_tags").items(): post["tags_" + category.lower()] = [t["name"] for t in tags] post["tags"] = [t["name"] for t in post["tags"]] sections = [] for s in post["sections"]: group = s["group"] if group: sections.append(group["title"] + "/" + s["title"]) else: sections.append(s["title"]) post["sections"] = sections if post["video"] and self.videos: url = post["video"]["video"] else: url = post["image"] yield Message.Directory, post yield Message.Url, url, text.nameext_from_url(url, post) class ItakuGalleryExtractor(ItakuExtractor): """Extractor for posts from an itaku user gallery""" subcategory = "gallery" pattern = BASE_PATTERN + r"/profile/([^/?#]+)/gallery" test = ("https://itaku.ee/profile/piku/gallery", { "pattern": r"https://d1wmr8tlk3viaj\.cloudfront\.net/gallery_imgs" r"/[^/?#]+\.(jpg|png|gif)", "range": "1-10", "count": 10, }) def posts(self): return self.api.galleries_images(self.item) class ItakuImageExtractor(ItakuExtractor): subcategory = "image" pattern = BASE_PATTERN + r"/images/(\d+)" test = ( ("https://itaku.ee/images/100471", { "pattern": r"https://d1wmr8tlk3viaj\.cloudfront\.net/gallery_imgs" r"/220504_oUNIAFT\.png", "count": 1, "keyword": { "already_pinned": None, "blacklisted": { "blacklisted_tags": [], "is_blacklisted": False }, "can_reshare": True, "date": "dt:2022-05-05 19:21:17", "date_added": "2022-05-05T19:21:17.674148Z", "date_edited": "2022-05-25T14:37:46.220612Z", "description": "sketch from drawpile", "extension": "png", "filename": "220504_oUNIAFT", "hotness_score": float, "id": 100471, "image": 
"https://d1wmr8tlk3viaj.cloudfront.net/gallery_imgs" "/220504_oUNIAFT.png", "image_xl": "https://d1wmr8tlk3viaj.cloudfront.net" "/gallery_imgs/220504_oUNIAFT/lg.jpg", "liked_by_you": False, "maturity_rating": "SFW", "num_comments": int, "num_likes": int, "num_reshares": int, "obj_tags": 136446, "owner": 16775, "owner_avatar": "https://d1wmr8tlk3viaj.cloudfront.net" "/profile_pics/av2022r_vKYVywc/md.jpg", "owner_displayname": "Piku", "owner_username": "piku", "reshared_by_you": False, "sections": ["Fanart/Miku"], "tags": list, "tags_character": ["hatsune_miku"], "tags_copyright": ["vocaloid"], "tags_general" : ["twintails", "green_hair", "flag", "gloves", "green_eyes", "female", "racing_miku"], "title": "Racing Miku 2022 Ver.", "too_mature": False, "uncompressed_filesize": "0.62", "video": None, "visibility": "PUBLIC", }, }), # video ("https://itaku.ee/images/19465", { "pattern": r"https://d1wmr8tlk3viaj\.cloudfront\.net/gallery_vids" r"/sleepy_af_OY5GHWw\.mp4", }), ) def posts(self): return (self.api.image(self.item),) class ItakuAPI(): def __init__(self, extractor): self.extractor = extractor self.root = extractor.root + "/api" self.headers = { "Accept": "application/json, text/plain, */*", "Referer": extractor.root + "/", } def galleries_images(self, username, section=None): endpoint = "/galleries/images/" params = { "cursor" : None, "owner" : self.user(username)["owner"], "section" : section, "date_range": "", "maturity_rating": ("SFW", "Questionable", "NSFW"), "ordering" : "-date_added", "page" : "1", "page_size" : "30", "visibility": ("PUBLIC", "PROFILE_ONLY"), } return self._pagination(endpoint, params, self.image) def image(self, image_id): endpoint = "/galleries/images/{}/".format(image_id) return self._call(endpoint) @memcache(keyarg=1) def user(self, username): return self._call("/user_profiles/{}/".format(username)) def _call(self, endpoint, params=None): if not endpoint.startswith("http"): endpoint = self.root + endpoint response = self.extractor.request( endpoint, params=params, headers=self.headers) return response.json() def _pagination(self, endpoint, params, extend): data = self._call(endpoint, params) while True: if extend: for result in data["results"]: yield extend(result["id"]) else: yield from data["results"] url_next = data["links"].get("next") if not url_next: return data = self._call(url_next) ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/kabeuchi.py��������������������������������������������������0000644�0001750�0001750�00000006117�14363473075�020710� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://kabe-uchiroom.com/""" from .common import Extractor, Message from .. import text, exception class KabeuchiUserExtractor(Extractor): """Extractor for all posts of a user on kabe-uchiroom.com""" category = "kabeuchi" subcategory = "user" directory_fmt = ("{category}", "{twitter_user_id} {twitter_id}") filename_fmt = "{id}_{num:>02}{title:?_//}.{extension}" archive_fmt = "{id}_{num}" root = "https://kabe-uchiroom.com" pattern = r"(?:https?://)?kabe-uchiroom\.com/mypage/?\?id=(\d+)" test = ( ("https://kabe-uchiroom.com/mypage/?id=919865303848255493", { "pattern": (r"https://kabe-uchiroom\.com/accounts/upfile/3/" r"919865303848255493/\w+\.jpe?g"), "count": ">= 24", }), ("https://kabe-uchiroom.com/mypage/?id=123456789", { "exception": exception.NotFoundError, }), ) def __init__(self, match): Extractor.__init__(self, match) self.user_id = match.group(1) def items(self): base = "{}/accounts/upfile/{}/{}/".format( self.root, self.user_id[-1], self.user_id) keys = ("image1", "image2", "image3", "image4", "image5", "image6") for post in self.posts(): if post.get("is_ad") or not post["image1"]: continue post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%d %H:%M:%S") yield Message.Directory, post for key in keys: name = post[key] if not name: break url = base + name post["num"] = ord(key[-1]) - 48 yield Message.Url, url, text.nameext_from_url(name, post) def posts(self): url = "{}/mypage/?id={}".format(self.root, self.user_id) response = self.request(url) if response.history and response.url == self.root + "/": raise exception.NotFoundError("user") target_id = text.extr(response.text, 'user_friend_id = "', '"') return self._pagination(target_id) def _pagination(self, target_id): url = "{}/get_posts.php".format(self.root) data = { "user_id" : "0", "target_id" : target_id, "type" : "uploads", "sort_type" : "0", "category_id": "all", "latest_post": "", "page_num" : 0, } while True: info = self.request(url, method="POST", data=data).json() datas = info["datas"] if not datas or not isinstance(datas, list): return yield from datas last_id = datas[-1]["id"] if last_id == info["last_data"]: return data["latest_post"] = last_id data["page_num"] += 1 
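# A standalone sketch of the per-post file numbering used by
# KabeuchiUserExtractor.items() above: each post carries up to six filename
# fields "image1".."image6", iteration stops at the first empty slot, and the
# digit suffix becomes the "num" metadata value.  The sample data is made up.

IMAGE_KEYS = ("image1", "image2", "image3", "image4", "image5", "image6")


def kabeuchi_files(post, base):
    """Yield (num, url) pairs for the filled image slots of one post."""
    for key in IMAGE_KEYS:
        name = post.get(key)
        if not name:
            break
        yield int(key[-1]), base + name


if __name__ == "__main__":
    sample_post = {"image1": "a.jpg", "image2": "b.jpg", "image3": ""}
    for num, url in kabeuchi_files(
            sample_post, "https://kabe-uchiroom.com/accounts/upfile/3/123/"):
        print(num, url)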
gallery_dl-1.25.0/gallery_dl/extractor/keenspot.py
# -*- coding: utf-8 -*-
# Copyright 2019 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for http://www.keenspot.com/"""

from .common import Extractor, Message
from .. import text


class KeenspotComicExtractor(Extractor):
    """Extractor for webcomics from keenspot.com"""
    category = "keenspot"
    subcategory = "comic"
    directory_fmt = ("{category}", "{comic}")
    filename_fmt = "{filename}.{extension}"
    archive_fmt = "{comic}_{filename}"
    pattern = r"(?:https?://)?(?!www\.|forums\.)([\w-]+)\.keenspot\.com(/.+)?"
test = ( ("http://marksmen.keenspot.com/", { # link "range": "1-3", "url": "83bcf029103bf8bc865a1988afa4aaeb23709ba6", }), ("http://barkercomic.keenspot.com/", { # id "range": "1-3", "url": "c4080926db18d00bac641fdd708393b7d61379e6", }), ("http://crowscare.keenspot.com/", { # id v2 "range": "1-3", "url": "a00e66a133dd39005777317da90cef921466fcaa" }), ("http://supernovas.keenspot.com/", { # ks "range": "1-3", "url": "de21b12887ef31ff82edccbc09d112e3885c3aab" }), ("http://twokinds.keenspot.com/comic/1066/", { # "random" access "range": "1-3", "url": "6a784e11370abfb343dcad9adbb7718f9b7be350", }) ) def __init__(self, match): Extractor.__init__(self, match) self.comic = match.group(1).lower() self.path = match.group(2) self.root = "http://" + self.comic + ".keenspot.com" self._needle = "" self._image = 'class="ksc"' self._next = self._next_needle def items(self): data = {"comic": self.comic} yield Message.Directory, data with self.request(self.root + "/") as response: if response.history: url = response.request.url self.root = url[:url.index("/", 8)] page = response.text del response url = self._first(page) if self.path: url = self.root + self.path prev = None ilen = len(self._image) while url and url != prev: prev = url page = self.request(text.urljoin(self.root, url)).text pos = 0 while True: pos = page.find(self._image, pos) if pos < 0: break img, pos = text.extract(page, 'src="', '"', pos + ilen) if img.endswith(".js"): continue if img[0] == "/": img = self.root + img elif "youtube.com/" in img: img = "ytdl:" + img yield Message.Url, img, text.nameext_from_url(img, data) url = self._next(page) def _first(self, page): if self.comic == "brawlinthefamily": self._next = self._next_brawl self._image = '<div id="comic">' return "http://brawlinthefamily.keenspot.com/comic/theshowdown/" url = text.extr(page, '<link rel="first" href="', '"') if url: if self.comic == "porcelain": self._needle = 'id="porArchivetop_"' else: self._next = self._next_link return url pos = page.find('id="first_day1"') if pos >= 0: self._next = self._next_id return text.rextract(page, 'href="', '"', pos)[0] pos = page.find('>FIRST PAGE<') if pos >= 0: if self.comic == "lastblood": self._next = self._next_lastblood self._image = '<div id="comic">' else: self._next = self._next_id return text.rextract(page, 'href="', '"', pos)[0] pos = page.find('<div id="kscomicpart"') if pos >= 0: self._needle = '<a href="/archive.html' return text.extract(page, 'href="', '"', pos)[0] pos = page.find('>First Comic<') # twokinds if pos >= 0: self._image = '</header>' self._needle = 'class="navarchive"' return text.rextract(page, 'href="', '"', pos)[0] pos = page.find('id="flip_FirstDay"') # flipside if pos >= 0: self._image = 'class="flip_Pages ksc"' self._needle = 'id="flip_ArcButton"' return text.rextract(page, 'href="', '"', pos)[0] self.log.error("Unrecognized page layout") return None def _next_needle(self, page): pos = page.index(self._needle) + len(self._needle) return text.extract(page, 'href="', '"', pos)[0] @staticmethod def _next_link(page): return text.extr(page, '<link rel="next" href="', '"') @staticmethod def _next_id(page): pos = page.find('id="next_') return text.rextract(page, 'href="', '"', pos)[0] if pos >= 0 else None @staticmethod def _next_lastblood(page): pos = page.index("link rel='next'") return text.extract(page, "href='", "'", pos)[0] @staticmethod def _next_brawl(page): pos = page.index("comic-nav-next") url = text.rextract(page, 'href="', '"', pos)[0] return None if "?random" in url else url 
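# A minimal stdlib sketch of the simplest traversal case handled by
# KeenspotComicExtractor above (comics exposing <link rel="first"/"next">):
# start at the first page and keep following rel="next" until it disappears
# or repeats.  The start URL would be a real comic page; nothing here is
# specific to one comic.

import re
import urllib.request


def follow_next(start_url, limit=25):
    """Yield page URLs by following <link rel="next" href="..."> tags."""
    url, seen = start_url, set()
    while url and url not in seen and len(seen) < limit:
        seen.add(url)
        yield url
        page = urllib.request.urlopen(url).read().decode("utf-8", "replace")
        match = re.search(r'<link rel="next" href="([^"]+)"', page)
        url = match.group(1) if match else None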
gallery_dl-1.25.0/gallery_dl/extractor/kemonoparty.py
# -*- coding: utf-8 -*-
# Copyright 2021-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://kemono.party/"""

from .common import Extractor, Message
from ..
import text, exception from ..cache import cache import itertools import re BASE_PATTERN = r"(?:https?://)?(?:www\.|beta\.)?(kemono|coomer)\.party" USER_PATTERN = BASE_PATTERN + r"/([^/?#]+)/user/([^/?#]+)" HASH_PATTERN = r"/[0-9a-f]{2}/[0-9a-f]{2}/([0-9a-f]{64})" class KemonopartyExtractor(Extractor): """Base class for kemonoparty extractors""" category = "kemonoparty" root = "https://kemono.party" directory_fmt = ("{category}", "{service}", "{user}") filename_fmt = "{id}_{title}_{num:>02}_{filename[:180]}.{extension}" archive_fmt = "{service}_{user}_{id}_{num}" cookiedomain = ".kemono.party" def __init__(self, match): if match.group(1) == "coomer": self.category = "coomerparty" self.cookiedomain = ".coomer.party" self.root = text.root_from_url(match.group(0)) Extractor.__init__(self, match) self.session.headers["Referer"] = self.root + "/" def items(self): self._prepare_ddosguard_cookies() self._find_inline = re.compile( r'src="(?:https?://(?:kemono|coomer)\.party)?(/inline/[^"]+' r'|/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{64}\.[^"]+)').findall find_hash = re.compile(HASH_PATTERN).match generators = self._build_file_generators(self.config("files")) duplicates = self.config("duplicates") comments = self.config("comments") username = dms = None # prevent files from being sent with gzip compression headers = {"Accept-Encoding": "identity"} if self.config("metadata"): username = text.unescape(text.extract( self.request(self.user_url).text, '<meta name="artist_name" content="', '"')[0]) if self.config("dms"): dms = True posts = self.posts() max_posts = self.config("max-posts") if max_posts: posts = itertools.islice(posts, max_posts) for post in posts: headers["Referer"] = "{}/{}/user/{}/post/{}".format( self.root, post["service"], post["user"], post["id"]) post["_http_headers"] = headers post["date"] = text.parse_datetime( post["published"] or post["added"], "%a, %d %b %Y %H:%M:%S %Z") if username: post["username"] = username if comments: post["comments"] = self._extract_comments(post) if dms is not None: if dms is True: dms = self._extract_dms(post) post["dms"] = dms files = [] hashes = set() for file in itertools.chain.from_iterable( g(post) for g in generators): url = file["path"] match = find_hash(url) if match: file["hash"] = hash = match.group(1) if not duplicates: if hash in hashes: self.log.debug("Skipping %s (duplicate)", url) continue hashes.add(hash) else: file["hash"] = "" files.append(file) post["count"] = len(files) yield Message.Directory, post for post["num"], file in enumerate(files, 1): post["_http_validate"] = None post["hash"] = file["hash"] post["type"] = file["type"] url = file["path"] text.nameext_from_url(file.get("name", url), post) ext = text.ext_from_url(url) if not post["extension"]: post["extension"] = ext elif ext == "txt" and post["extension"] != "txt": post["_http_validate"] = _validate if url[0] == "/": url = self.root + "/data" + url elif url.startswith(self.root): url = self.root + "/data" + url[20:] yield Message.Url, url, post def login(self): username, password = self._get_auth_info() if username: self._update_cookies(self._login_impl(username, password)) @cache(maxage=28*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/account/login" data = {"username": username, "password": password} response = self.request(url, method="POST", data=data) if response.url.endswith("/account/login") and \ "Username or password is incorrect" in response.text: raise exception.AuthenticationError() return 
{c.name: c.value for c in response.history[0].cookies} def _file(self, post): file = post["file"] if not file: return () file["type"] = "file" return (file,) def _attachments(self, post): for attachment in post["attachments"]: attachment["type"] = "attachment" return post["attachments"] def _inline(self, post): for path in self._find_inline(post["content"] or ""): yield {"path": path, "name": path, "type": "inline"} def _build_file_generators(self, filetypes): if filetypes is None: return (self._attachments, self._file, self._inline) genmap = { "file" : self._file, "attachments": self._attachments, "inline" : self._inline, } if isinstance(filetypes, str): filetypes = filetypes.split(",") return [genmap[ft] for ft in filetypes] def _extract_comments(self, post): url = "{}/{}/user/{}/post/{}".format( self.root, post["service"], post["user"], post["id"]) page = self.request(url).text comments = [] for comment in text.extract_iter(page, "<article", "</article>"): extr = text.extract_from(comment) cid = extr('id="', '"') comments.append({ "id" : cid, "user": extr('href="#' + cid + '"', '</').strip(" \n\r>"), "body": extr( '<section class="comment__body">', '</section>').strip(), "date": extr('datetime="', '"'), }) return comments def _extract_dms(self, post): url = "{}/{}/user/{}/dms".format( self.root, post["service"], post["user"]) page = self.request(url).text dms = [] for dm in text.extract_iter(page, "<article", "</article>"): dms.append({ "body": text.unescape(text.extract( dm, "<pre>", "</pre></", )[0].strip()), "date": text.extr(dm, 'datetime="', '"'), }) return dms def _validate(response): return (response.headers["content-length"] != "9" or response.content != b"not found") class KemonopartyUserExtractor(KemonopartyExtractor): """Extractor for all posts from a kemono.party user listing""" subcategory = "user" pattern = USER_PATTERN + r"/?(?:\?o=(\d+))?(?:$|[?#])" test = ( ("https://kemono.party/fanbox/user/6993449", { "range": "1-25", "count": 25, }), # 'max-posts' option, 'o' query parameter (#1674) ("https://kemono.party/patreon/user/881792?o=150", { "options": (("max-posts", 25),), "count": "< 100", }), ("https://kemono.party/subscribestar/user/alcorart"), ) def __init__(self, match): _, service, user_id, offset = match.groups() self.subcategory = service KemonopartyExtractor.__init__(self, match) self.api_url = "{}/api/{}/user/{}".format(self.root, service, user_id) self.user_url = "{}/{}/user/{}".format(self.root, service, user_id) self.offset = text.parse_int(offset) def posts(self): url = self.api_url params = {"o": self.offset} while True: posts = self.request(url, params=params).json() yield from posts cnt = len(posts) if cnt < 25: return params["o"] += cnt class KemonopartyPostExtractor(KemonopartyExtractor): """Extractor for a single kemono.party post""" subcategory = "post" pattern = USER_PATTERN + r"/post/([^/?#]+)" test = ( ("https://kemono.party/fanbox/user/6993449/post/506575", { "pattern": r"https://kemono.party/data/21/0f" r"/210f35388e28bbcf756db18dd516e2d82ce75[0-9a-f]+\.jpg", "content": "900949cefc97ab8dc1979cc3664785aac5ba70dd", "keyword": { "added": "Wed, 06 May 2020 20:28:02 GMT", "content": str, "count": 1, "date": "dt:2019-08-11 02:09:04", "edited": None, "embed": dict, "extension": "jpeg", "filename": "P058kDFYus7DbqAkGlfWTlOr", "hash": "210f35388e28bbcf756db18dd516e2d8" "2ce758e0d32881eeee76d43e1716d382", "id": "506575", "num": 1, "published": "Sun, 11 Aug 2019 02:09:04 GMT", "service": "fanbox", "shared_file": False, "subcategory": "fanbox", "title": 
"c96取り置き", "type": "file", "user": "6993449", }, }), # inline image (#1286) ("https://kemono.party/fanbox/user/7356311/post/802343", { "pattern": r"https://kemono\.party/data/47/b5/47b5c014ecdcfabdf2c8" r"5eec53f1133a76336997ae8596f332e97d956a460ad2\.jpg", "keyword": {"hash": "47b5c014ecdcfabdf2c85eec53f1133a" "76336997ae8596f332e97d956a460ad2"}, }), # kemono.party -> data.kemono.party ("https://kemono.party/gumroad/user/trylsc/post/IURjT", { "pattern": r"https://kemono\.party/data/(" r"a4/7b/a47bfe938d8c1682eef06e885927484cd8df1b.+\.jpg|" r"c6/04/c6048f5067fd9dbfa7a8be565ac194efdfb6e4.+\.zip)", }), # username (#1548, #1652) ("https://kemono.party/gumroad/user/3252870377455/post/aJnAH", { "options": (("metadata", True),), "keyword": {"username": "Kudalyn's Creations"}, }), # skip patreon duplicates ("https://kemono.party/patreon/user/4158582/post/32099982", { "count": 2, }), # allow duplicates (#2440) ("https://kemono.party/patreon/user/4158582/post/32099982", { "options": (("duplicates", True),), "count": 3, }), # DMs (#2008) ("https://kemono.party/patreon/user/34134344/post/38129255", { "options": (("dms", True),), "keyword": {"dms": [{ "body": r"re:Hi! Thank you very much for supporting the work I" r" did in May. Here's your reward pack! I hope you fin" r"d something you enjoy in it. :\)\n\nhttps://www.medi" r"afire.com/file/\w+/Set13_tier_2.zip/file", "date": "2021-07-31 02:47:51.327865", }]}, }), # coomer.party (#2100) ("https://coomer.party/onlyfans/user/alinity/post/125962203", { "pattern": r"https://coomer\.party/data/7d/3f/7d3fd9804583dc224968" r"c0591163ec91794552b04f00a6c2f42a15b68231d5a8\.jpg", }), # invalid file (#3510) ("https://kemono.party/patreon/user/19623797/post/29035449", { "pattern": r"907ba78b4545338d3539683e63ecb51c" r"f51c10adc9dabd86e92bd52339f298b9\.txt", "content": "da39a3ee5e6b4b0d3255bfef95601890afd80709", # empty }), ("https://kemono.party/subscribestar/user/alcorart/post/184330"), ("https://www.kemono.party/subscribestar/user/alcorart/post/184330"), ("https://beta.kemono.party/subscribestar/user/alcorart/post/184330"), ) def __init__(self, match): _, service, user_id, post_id = match.groups() self.subcategory = service KemonopartyExtractor.__init__(self, match) self.api_url = "{}/api/{}/user/{}/post/{}".format( self.root, service, user_id, post_id) self.user_url = "{}/{}/user/{}".format(self.root, service, user_id) def posts(self): posts = self.request(self.api_url).json() return (posts[0],) if len(posts) > 1 else posts class KemonopartyDiscordExtractor(KemonopartyExtractor): """Extractor for kemono.party discord servers""" subcategory = "discord" directory_fmt = ("{category}", "discord", "{server}", "{channel_name|channel}") filename_fmt = "{id}_{num:>02}_{filename}.{extension}" archive_fmt = "discord_{server}_{id}_{num}" pattern = BASE_PATTERN + r"/discord/server/(\d+)(?:/channel/(\d+))?#(.*)" test = ( (("https://kemono.party/discord" "/server/488668827274444803#finish-work"), { "count": 4, "keyword": {"channel_name": "finish-work"}, }), (("https://kemono.party/discord" "/server/256559665620451329/channel/462437519519383555#"), { "pattern": r"https://kemono\.party/data/(" r"e3/77/e377e3525164559484ace2e64425b0cec1db08.*\.png|" r"51/45/51453640a5e0a4d23fbf57fb85390f9c5ec154.*\.gif)", "keyword": {"hash": "re:e377e3525164559484ace2e64425b0cec1db08" "|51453640a5e0a4d23fbf57fb85390f9c5ec154"}, "count": ">= 2", }), # 'inline' files (("https://kemono.party/discord" "/server/315262215055736843/channel/315262215055736843#general"), { "pattern": 
r"https://cdn\.discordapp\.com/attachments/\d+/\d+/.+$", "options": (("image-filter", "type == 'inline'"),), "keyword": {"hash": ""}, "range": "1-5", }), ) def __init__(self, match): KemonopartyExtractor.__init__(self, match) _, self.server, self.channel, self.channel_name = match.groups() def items(self): self._prepare_ddosguard_cookies() find_inline = re.compile( r"https?://(?:cdn\.discordapp.com|media\.discordapp\.net)" r"(/[A-Za-z0-9-._~:/?#\[\]@!$&'()*+,;%=]+)").findall find_hash = re.compile(HASH_PATTERN).match posts = self.posts() max_posts = self.config("max-posts") if max_posts: posts = itertools.islice(posts, max_posts) for post in posts: files = [] append = files.append for attachment in post["attachments"]: match = find_hash(attachment["path"]) attachment["hash"] = match.group(1) if match else "" attachment["type"] = "attachment" append(attachment) for path in find_inline(post["content"] or ""): append({"path": "https://cdn.discordapp.com" + path, "name": path, "type": "inline", "hash": ""}) post["channel_name"] = self.channel_name post["date"] = text.parse_datetime( post["published"], "%a, %d %b %Y %H:%M:%S %Z") post["count"] = len(files) yield Message.Directory, post for post["num"], file in enumerate(files, 1): post["hash"] = file["hash"] post["type"] = file["type"] url = file["path"] text.nameext_from_url(file.get("name", url), post) if not post["extension"]: post["extension"] = text.ext_from_url(url) if url[0] == "/": url = self.root + "/data" + url elif url.startswith(self.root): url = self.root + "/data" + url[20:] yield Message.Url, url, post def posts(self): if self.channel is None: url = "{}/api/discord/channels/lookup?q={}".format( self.root, self.server) for channel in self.request(url).json(): if channel["name"] == self.channel_name: self.channel = channel["id"] break else: raise exception.NotFoundError("channel") url = "{}/api/discord/channel/{}".format(self.root, self.channel) params = {"skip": 0} while True: posts = self.request(url, params=params).json() yield from posts cnt = len(posts) if cnt < 25: break params["skip"] += cnt class KemonopartyDiscordServerExtractor(KemonopartyExtractor): subcategory = "discord-server" pattern = BASE_PATTERN + r"/discord/server/(\d+)$" test = ("https://kemono.party/discord/server/488668827274444803", { "pattern": KemonopartyDiscordExtractor.pattern, "count": 13, }) def __init__(self, match): KemonopartyExtractor.__init__(self, match) self.server = match.group(2) def items(self): url = "{}/api/discord/channels/lookup?q={}".format( self.root, self.server) channels = self.request(url).json() for channel in channels: url = "{}/discord/server/{}/channel/{}#{}".format( self.root, self.server, channel["id"], channel["name"]) channel["_extractor"] = KemonopartyDiscordExtractor yield Message.Queue, url, channel class KemonopartyFavoriteExtractor(KemonopartyExtractor): """Extractor for kemono.party favorites""" subcategory = "favorite" pattern = BASE_PATTERN + r"/favorites(?:/?\?([^#]+))?" 
test = ( ("https://kemono.party/favorites", { "pattern": KemonopartyUserExtractor.pattern, "url": "f4b5b796979bcba824af84206578c79101c7f0e1", "count": 3, }), ("https://kemono.party/favorites?type=post", { "pattern": KemonopartyPostExtractor.pattern, "url": "ecfccf5f0d50b8d14caa7bbdcf071de5c1e5b90f", "count": 3, }), ) def __init__(self, match): KemonopartyExtractor.__init__(self, match) self.favorites = (text.parse_query(match.group(2)).get("type") or self.config("favorites") or "artist") def items(self): self._prepare_ddosguard_cookies() self.login() if self.favorites == "artist": users = self.request( self.root + "/api/favorites?type=artist").json() for user in users: user["_extractor"] = KemonopartyUserExtractor url = "{}/{}/user/{}".format( self.root, user["service"], user["id"]) yield Message.Queue, url, user elif self.favorites == "post": posts = self.request( self.root + "/api/favorites?type=post").json() for post in posts: post["_extractor"] = KemonopartyPostExtractor url = "{}/{}/user/{}/post/{}".format( self.root, post["service"], post["user"], post["id"]) yield Message.Queue, url, post ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/khinsider.py�������������������������������������������������0000644�0001750�0001750�00000006744�14363473075�021123� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2016-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://downloads.khinsider.com/""" from .common import Extractor, Message, AsynchronousMixin from .. 
import text, exception class KhinsiderSoundtrackExtractor(AsynchronousMixin, Extractor): """Extractor for soundtracks from khinsider.com""" category = "khinsider" subcategory = "soundtrack" directory_fmt = ("{category}", "{album[name]}") archive_fmt = "{filename}.{extension}" pattern = (r"(?:https?://)?downloads\.khinsider\.com" r"/game-soundtracks/album/([^/?#]+)") root = "https://downloads.khinsider.com" test = (("https://downloads.khinsider.com" "/game-soundtracks/album/horizon-riders-wii"), { "pattern": r"https?://vgm(site|downloads)\.com" r"/soundtracks/horizon-riders-wii/[^/]+" r"/Horizon%20Riders%20Wii%20-%20Full%20Soundtrack\.mp3", "keyword": { "album": { "count": 1, "date": "Sep 18th, 2016", "name": "Horizon Riders", "platform": "Wii", "size": 26214400, "type": "Gamerip", }, "extension": "mp3", "filename": "Horizon Riders Wii - Full Soundtrack", }, "count": 1, }) def __init__(self, match): Extractor.__init__(self, match) self.album = match.group(1) def items(self): url = self.root + "/game-soundtracks/album/" + self.album page = self.request(url, encoding="utf-8").text if "Download all songs at once:" not in page: raise exception.NotFoundError("soundtrack") data = self.metadata(page) yield Message.Directory, data for track in self.tracks(page): track.update(data) yield Message.Url, track["url"], track def metadata(self, page): extr = text.extract_from(page) return {"album": { "name" : text.unescape(extr("<h2>", "<")), "platform": extr("Platforms: <a", "<").rpartition(">")[2], "count": text.parse_int(extr("Number of Files: <b>", "<")), "size" : text.parse_bytes(extr("Total Filesize: <b>", "<")[:-1]), "date" : extr("Date Added: <b>", "<"), "type" : text.remove_html(extr("Album type: <b>", "</b>")), }} def tracks(self, page): fmt = self.config("format", ("mp3",)) if fmt and isinstance(fmt, str): if fmt == "all": fmt = None else: fmt = fmt.lower().split(",") page = text.extr(page, '<table id="songlist">', '</table>') for num, url in enumerate(text.extract_iter( page, '<td class="clickable-row"><a href="', '"'), 1): url = text.urljoin(self.root, url) page = self.request(url, encoding="utf-8").text track = first = None for url in text.extract_iter(page, '<p><a href="', '"'): track = text.nameext_from_url(url, {"num": num, "url": url}) if first is None: first = track if not fmt or track["extension"] in fmt: first = False yield track if first: yield first ����������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 
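# A minimal sketch of the per-track format selection in
# KhinsiderSoundtrackExtractor.tracks() above: each song page links one file
# per encoding, tracks whose extension matches the configured formats are
# kept, and the first link serves as fallback when nothing matches ("all"
# disables the filter).  The URLs below are placeholders.

def pick_tracks(candidates, fmt=("mp3",)):
    """candidates: (url, extension) pairs for one song; return URLs to keep."""
    if not candidates:
        return []
    if not fmt or fmt == "all":
        return [url for url, _ in candidates]
    chosen = [url for url, ext in candidates if ext in fmt]
    return chosen or [candidates[0][0]]


if __name__ == "__main__":
    song = [("https://example.org/track.mp3", "mp3"),
            ("https://example.org/track.flac", "flac")]
    print(pick_tracks(song))                 # ['https://example.org/track.mp3']
    print(pick_tracks(song, fmt=("flac",)))  # ['https://example.org/track.flac']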
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/komikcast.py�������������������������������������������������0000644�0001750�0001750�00000007606�14363473075�021126� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://komikcast.site/""" from .common import ChapterExtractor, MangaExtractor from .. import text import re BASE_PATTERN = r"(?:https?://)?(?:www\.)?komikcast\.(?:site|me|com)" class KomikcastBase(): """Base class for komikcast extractors""" category = "komikcast" root = "https://komikcast.site" @staticmethod def parse_chapter_string(chapter_string, data=None): """Parse 'chapter_string' value and add its info to 'data'""" if not data: data = {} match = re.match( r"(?:(.*) Chapter )?0*(\d+)([^ ]*)(?: (?:- )?(.+))?", text.unescape(chapter_string), ) manga, chapter, data["chapter_minor"], title = match.groups() if manga: data["manga"] = manga.partition(" Chapter ")[0] if title and not title.lower().startswith("bahasa indonesia"): data["title"] = title.strip() else: data["title"] = "" data["chapter"] = text.parse_int(chapter) data["lang"] = "id" data["language"] = "Indonesian" return data class KomikcastChapterExtractor(KomikcastBase, ChapterExtractor): """Extractor for manga-chapters from komikcast.site""" pattern = BASE_PATTERN + r"(/chapter/[^/?#]+/)" test = ( (("https://komikcast.site/chapter" "/apotheosis-chapter-02-2-bahasa-indonesia/"), { "url": "f6b43fbc027697749b3ea1c14931c83f878d7936", "keyword": "f3938e1aff9ad1f302f52447e9781b21f6da26d4", }), (("https://komikcast.me/chapter" "/soul-land-ii-chapter-300-1-bahasa-indonesia/"), { "url": "efd00a9bd95461272d51990d7bc54b79ff3ff2e6", "keyword": "cb646cfed3d45105bd645ab38b2e9f7d8c436436", }), ) def metadata(self, page): info = text.extr(page, "<title>", " - Komikcast<") return self.parse_chapter_string(info) @staticmethod def images(page): readerarea = text.extr( page, '<div class="main-reading-area', '</div') return [ (text.unescape(url), None) for url in re.findall(r"<img[^>]* src=[\"']([^\"']+)", readerarea) ] class KomikcastMangaExtractor(KomikcastBase, MangaExtractor): """Extractor for manga from komikcast.site""" chapterclass = KomikcastChapterExtractor pattern = BASE_PATTERN + r"(/(?:komik/)?[^/?#]+)/?$" test = ( ("https://komikcast.site/komik/090-eko-to-issho/", { "url": "19d3d50d532e84be6280a3d61ff0fd0ca04dd6b4", "keyword": "837a7e96867344ff59d840771c04c20dc46c0ab1", }), ("https://komikcast.me/tonari-no-kashiwagi-san/"), ) def chapters(self, page): results = [] data = self.metadata(page) 
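# A standalone sketch of the chapter-string parsing performed by
# KomikcastBase.parse_chapter_string above, using the same regular expression;
# html.unescape stands in for text.unescape.

import html
import re

CHAPTER_RE = re.compile(r"(?:(.*) Chapter )?0*(\d+)([^ ]*)(?: (?:- )?(.+))?")


def parse_chapter(chapter_string):
    manga, chapter, minor, title = CHAPTER_RE.match(
        html.unescape(chapter_string)).groups()
    if title and title.lower().startswith("bahasa indonesia"):
        title = ""
    return {
        "manga"        : manga.partition(" Chapter ")[0] if manga else "",
        "chapter"      : int(chapter),
        "chapter_minor": minor,
        "title"        : (title or "").strip(),
    }


if __name__ == "__main__":
    print(parse_chapter("Apotheosis Chapter 02.2 Bahasa Indonesia"))
    # {'manga': 'Apotheosis', 'chapter': 2, 'chapter_minor': '.2', 'title': ''}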
for item in text.extract_iter( page, '<a class="chapter-link-item" href="', '</a'): url, _, chapter_string = item.rpartition('">Chapter ') self.parse_chapter_string(chapter_string, data) results.append((url, data.copy())) return results @staticmethod def metadata(page): """Return a dict with general metadata""" manga , pos = text.extract(page, "<title>" , " - Komikcast<") genres, pos = text.extract( page, 'class="komik_info-content-genre">', "</span>", pos) author, pos = text.extract(page, ">Author:", "</span>", pos) mtype , pos = text.extract(page, ">Type:" , "</span>", pos) return { "manga": text.unescape(manga), "genres": text.split_html(genres), "author": text.remove_html(author), "type": text.remove_html(mtype), } ��������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674918287.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/lexica.py����������������������������������������������������0000644�0001750�0001750�00000006341�14365234617�020400� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://lexica.art/""" from .common import Extractor, Message from .. import text class LexicaSearchExtractor(Extractor): """Extractor for lexica.art search results""" category = "lexica" subcategory = "search" root = "https://lexica.art" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "{id}" pattern = r"(?:https?://)?lexica\.art/?\?q=([^&#]+)" test = ( ("https://lexica.art/?q=tree", { "pattern": r"https://lexica-serve-encoded-images2\.sharif\." 
r"workers.dev/full_jpg/[0-9a-f-]{36}$", "range": "1-80", "count": 80, "keyword": { "height": int, "id": str, "upscaled_height": int, "upscaled_width": int, "userid": str, "width": int, "prompt": { "c": int, "grid": bool, "height": int, "id": str, "images": list, "initImage": None, "initImageStrength": None, "model": "lexica-aperture-v2", "negativePrompt": str, "prompt": str, "seed": str, "timestamp": r"re:\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d.\d\d\dZ", "width": int, }, }, }), ) def __init__(self, match): Extractor.__init__(self, match) self.query = match.group(1) self.text = text.unquote(self.query).replace("+", " ") def items(self): base = ("https://lexica-serve-encoded-images2.sharif.workers.dev" "/full_jpg/") tags = self.text for image in self.posts(): image["filename"] = image["id"] image["extension"] = "jpg" image["search_tags"] = tags yield Message.Directory, image yield Message.Url, base + image["id"], image def posts(self): url = self.root + "/api/infinite-prompts" headers = { "Accept" : "application/json, text/plain, */*", "Referer": "{}/?q={}".format(self.root, self.query), } json = { "text" : self.text, "searchMode": "images", "source" : "search", "cursor" : 0, "model" : "lexica-aperture-v2", } while True: data = self.request( url, method="POST", headers=headers, json=json).json() prompts = { prompt["id"]: prompt for prompt in data["prompts"] } for image in data["images"]: image["prompt"] = prompts[image["promptid"]] del image["promptid"] yield image cursor = data.get("nextCursor") if not cursor: return json["cursor"] = cursor �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1675805577.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/lightroom.py�������������������������������������������������0000644�0001750�0001750�00000006541�14370541611�021130� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published 
by the Free Software Foundation. """Extractors for https://lightroom.adobe.com/""" from .common import Extractor, Message from .. import text, util class LightroomGalleryExtractor(Extractor): """Extractor for an image gallery on lightroom.adobe.com""" category = "lightroom" subcategory = "gallery" directory_fmt = ("{category}", "{user}", "{title}") filename_fmt = "{num:>04}_{id}.{extension}" archive_fmt = "{id}" pattern = r"(?:https?://)?lightroom\.adobe\.com/shares/([0-9a-f]+)" test = ( (("https://lightroom.adobe.com/shares/" "0c9cce2033f24d24975423fe616368bf"), { "keyword": { "title": "Sterne und Nachtphotos", "user": "Christian Schrang", }, "count": ">= 55", }), (("https://lightroom.adobe.com/shares/" "7ba68ad5a97e48608d2e6c57e6082813"), { "keyword": { "title": "HEBFC Snr/Res v Brighton", "user": "", }, "count": ">= 180", }), ) def __init__(self, match): Extractor.__init__(self, match) self.href = match.group(1) def items(self): # Get config url = "https://lightroom.adobe.com/shares/" + self.href response = self.request(url) album = util.json_loads( text.extr(response.text, "albumAttributes: ", "\n") ) images = self.images(album) for img in images: url = img["url"] yield Message.Directory, img yield Message.Url, url, text.nameext_from_url(url, img) def metadata(self, album): payload = album["payload"] story = payload.get("story") or {} return { "gallery_id": self.href, "user": story.get("author", ""), "title": story.get("title", payload["name"]), } def images(self, album): album_md = self.metadata(album) base_url = album["base"] next_url = album["links"]["/rels/space_album_images_videos"]["href"] num = 1 while next_url: url = base_url + next_url page = self.request(url).text # skip 1st line as it's a JS loop data = util.json_loads(page[page.index("\n") + 1:]) base_url = data["base"] for res in data["resources"]: img_url, img_size = None, 0 for key, value in res["asset"]["links"].items(): if not key.startswith("/rels/rendition_type/"): continue size = text.parse_int(key.split("/")[-1]) if size > img_size: img_size = size img_url = value["href"] if img_url: img = { "id": res["asset"]["id"], "num": num, "url": base_url + img_url, } img.update(album_md) yield img num += 1 try: next_url = data["links"]["next"]["href"] except KeyError: next_url = None ���������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 
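# A standalone sketch of the rendition selection in
# LightroomGalleryExtractor.images() above: every asset lists links keyed by
# "/rels/rendition_type/<size>" and the largest size wins.  The links dict
# below is a made-up example.

def largest_rendition(links):
    """Return the href of the biggest rendition, or None."""
    best_url, best_size = None, 0
    for key, value in links.items():
        if not key.startswith("/rels/rendition_type/"):
            continue
        try:
            size = int(key.rsplit("/", 1)[1])
        except ValueError:
            continue
        if size > best_size:
            best_size, best_url = size, value["href"]
    return best_url


if __name__ == "__main__":
    sample_links = {
        "/rels/rendition_type/640" : {"href": "renditions/640"},
        "/rels/rendition_type/2048": {"href": "renditions/2048"},
        "self"                     : {"href": "asset"},
    }
    print(largest_rendition(sample_links))  # -> renditions/2048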
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/lineblog.py��������������������������������������������������0000644�0001750�0001750�00000004452�14363473075�020730� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.lineblog.me/""" from .livedoor import LivedoorBlogExtractor, LivedoorPostExtractor from .. import text class LineblogBase(): """Base class for lineblog extractors""" category = "lineblog" root = "https://lineblog.me" def _images(self, post): imgs = [] body = post.pop("body") for num, img in enumerate(text.extract_iter(body, "<img ", ">"), 1): src = text.extr(img, 'src="', '"') alt = text.extr(img, 'alt="', '"') if not src: continue if src.startswith("https://obs.line-scdn.") and src.count("/") > 3: src = src.rpartition("/")[0] imgs.append(text.nameext_from_url(alt or src, { "url" : src, "num" : num, "hash": src.rpartition("/")[2], "post": post, })) return imgs class LineblogBlogExtractor(LineblogBase, LivedoorBlogExtractor): """Extractor for a user's blog on lineblog.me""" pattern = r"(?:https?://)?lineblog\.me/(\w+)/?(?:$|[?#])" test = ("https://lineblog.me/mamoru_miyano/", { "range": "1-20", "count": 20, "pattern": r"https://obs.line-scdn.net/[\w-]+$", "keyword": { "post": { "categories" : tuple, "date" : "type:datetime", "description": str, "id" : int, "tags" : list, "title" : str, "user" : "mamoru_miyano" }, "filename": str, "hash" : r"re:\w{32,}", "num" : int, }, }) class LineblogPostExtractor(LineblogBase, LivedoorPostExtractor): """Extractor for blog posts on lineblog.me""" pattern = r"(?:https?://)?lineblog\.me/(\w+)/archives/(\d+)" test = ("https://lineblog.me/mamoru_miyano/archives/1919150.html", { "url": "24afeb4044c554f80c374b52bf8109c6f1c0c757", "keyword": "76a38e2c0074926bd3362f66f9fc0e6c41591dcb", }) ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 
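# A minimal sketch of the thumbnail-to-original URL rewrite in
# LineblogBase._images() above: obs.line-scdn.net image sources carry a
# trailing size segment that is dropped to obtain the full-size file.
# The sample URL is fabricated.

def full_size(src):
    if src.startswith("https://obs.line-scdn.") and src.count("/") > 3:
        return src.rpartition("/")[0]
    return src


if __name__ == "__main__":
    print(full_size("https://obs.line-scdn.net/abcdef-ghijkl/small"))
    # -> https://obs.line-scdn.net/abcdef-ghijkl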
mtime=1674475069.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/livedoor.py��������������������������������������������������0000644�0001750�0001750�00000012646�14363473075�020764� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for http://blog.livedoor.jp/""" from .common import Extractor, Message from .. import text class LivedoorExtractor(Extractor): """Base class for livedoor extractors""" category = "livedoor" root = "http://blog.livedoor.jp" filename_fmt = "{post[id]}_{post[title]}_{num:>02}.{extension}" directory_fmt = ("{category}", "{post[user]}") archive_fmt = "{post[id]}_{hash}" def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) def items(self): for post in self.posts(): images = self._images(post) if images: yield Message.Directory, {"post": post} for image in images: yield Message.Url, image["url"], image def posts(self): """Return an iterable with post objects""" def _load(self, data, body): extr = text.extract_from(data) tags = text.extr(body, 'class="article-tags">', '</dl>') about = extr('rdf:about="', '"') return { "id" : text.parse_int( about.rpartition("/")[2].partition(".")[0]), "title" : text.unescape(extr('dc:title="', '"')), "categories" : extr('dc:subject="', '"').partition(",")[::2], "description": extr('dc:description="', '"'), "date" : text.parse_datetime(extr('dc:date="', '"')), "tags" : text.split_html(tags)[1:] if tags else [], "user" : self.user, "body" : body, } def _images(self, post): imgs = [] body = post.pop("body") for num, img in enumerate(text.extract_iter(body, "<img ", ">"), 1): src = text.extr(img, 'src="', '"') alt = text.extr(img, 'alt="', '"') if not src: continue if "://livedoor.blogimg.jp/" in src: url = src.replace("http:", "https:", 1).replace("-s.", ".") else: url = text.urljoin(self.root, src) name, _, ext = url.rpartition("/")[2].rpartition(".") imgs.append({ "url" : url, "num" : num, "hash" : name, "filename" : alt or name, "extension": ext, "post" : post, }) return imgs class LivedoorBlogExtractor(LivedoorExtractor): """Extractor for a user's blog on blog.livedoor.jp""" subcategory = "blog" pattern = r"(?:https?://)?blog\.livedoor\.jp/(\w+)/?(?:$|[?#])" test = ( ("http://blog.livedoor.jp/zatsu_ke/", { "range": "1-50", "count": 50, "archive": False, "pattern": r"https?://livedoor.blogimg.jp/\w+/imgs/\w/\w/\w+\.\w+", "keyword": { "post": { "categories" : tuple, "date" : "type:datetime", "description": str, "id" : int, "tags" : list, "title" : str, "user" : "zatsu_ke" }, 
"filename": str, "hash" : r"re:\w{4,}", "num" : int, }, }), ("http://blog.livedoor.jp/uotapo/", { "range": "1-5", "count": 5, }), ) def posts(self): url = "{}/{}".format(self.root, self.user) while url: extr = text.extract_from(self.request(url).text) while True: data = extr('<rdf:RDF', '</rdf:RDF>') if not data: break body = extr('class="article-body-inner">', 'class="article-footer">') yield self._load(data, body) url = extr('<a rel="next" href="', '"') class LivedoorPostExtractor(LivedoorExtractor): """Extractor for images from a blog post on blog.livedoor.jp""" subcategory = "post" pattern = r"(?:https?://)?blog\.livedoor\.jp/(\w+)/archives/(\d+)" test = ( ("http://blog.livedoor.jp/zatsu_ke/archives/51493859.html", { "url": "9ca3bbba62722c8155be79ad7fc47be409e4a7a2", "keyword": "1f5b558492e0734f638b760f70bfc0b65c5a97b9", }), ("http://blog.livedoor.jp/amaumauma/archives/7835811.html", { "url": "204bbd6a9db4969c50e0923855aeede04f2e4a62", "keyword": "05821c7141360e6057ef2d382b046f28326a799d", }), ("http://blog.livedoor.jp/uotapo/archives/1050616939.html", { "url": "4b5ab144b7309eb870d9c08f8853d1abee9946d2", "keyword": "84fbf6e4eef16675013d6333039a7cfcb22c2d50", }), ) def __init__(self, match): LivedoorExtractor.__init__(self, match) self.post_id = match.group(2) def posts(self): url = "{}/{}/archives/{}.html".format( self.root, self.user, self.post_id) extr = text.extract_from(self.request(url).text) data = extr('<rdf:RDF', '</rdf:RDF>') body = extr('class="article-body-inner">', 'class="article-footer">') return (self._load(data, body),) ������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/lolisafe.py��������������������������������������������������0000644�0001750�0001750�00000004305�14363473075�020730� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for lolisafe/chibisafe instances""" from .common import BaseExtractor, Message from .. 
import text class LolisafeExtractor(BaseExtractor): """Base class for lolisafe extractors""" basecategory = "lolisafe" directory_fmt = ("{category}", "{album_name} ({album_id})") archive_fmt = "{album_id}_{id}" BASE_PATTERN = LolisafeExtractor.update({ "xbunkr": { "root": "https://xbunkr.com", "pattern": r"xbunkr\.com", }, }) class LolisafeAlbumExtractor(LolisafeExtractor): subcategory = "album" pattern = BASE_PATTERN + "/a/([^/?#]+)" test = ( ("https://xbunkr.com/a/TA0bu3F4", { "pattern": r"https://media\.xbunkr\.com/[^.]+\.\w+", "count": 861, "keyword": { "album_id": "TA0bu3F4", "album_name": "Hannahowo Onlyfans Photos", } }), ("https://xbunkr.com/a/GNQc2I5d"), ) def __init__(self, match): LolisafeExtractor.__init__(self, match) self.album_id = match.group(match.lastindex) domain = self.config("domain") if domain == "auto": self.root = text.root_from_url(match.group(0)) elif domain: self.root = text.ensure_http_scheme(domain) def items(self): files, data = self.fetch_album(self.album_id) yield Message.Directory, data for data["num"], file in enumerate(files, 1): url = file["file"] file.update(data) text.nameext_from_url(url, file) file["name"], sep, file["id"] = file["filename"].rpartition("-") yield Message.Url, url, file def fetch_album(self, album_id): url = "{}/api/album/get/{}".format(self.root, album_id) data = self.request(url).json() return data["files"], { "album_id" : self.album_id, "album_name": text.unescape(data["title"]), "count" : data["count"], } ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/luscious.py��������������������������������������������������0000644�0001750�0001750�00000027473�14363473075�021013� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2016-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published 
by the Free Software Foundation. """Extractors for https://members.luscious.net/""" from .common import Extractor, Message from .. import text, exception class LusciousExtractor(Extractor): """Base class for luscious extractors""" category = "luscious" cookiedomain = ".luscious.net" root = "https://members.luscious.net" def _graphql(self, op, variables, query): data = { "id" : 1, "operationName": op, "query" : query, "variables" : variables, } response = self.request( "{}/graphql/nobatch/?operationName={}".format(self.root, op), method="POST", json=data, fatal=False, ) if response.status_code >= 400: self.log.debug("Server response: %s", response.text) raise exception.StopExtraction( "GraphQL query failed ('%s %s')", response.status_code, response.reason) return response.json()["data"] class LusciousAlbumExtractor(LusciousExtractor): """Extractor for image albums from luscious.net""" subcategory = "album" filename_fmt = "{category}_{album[id]}_{num:>03}.{extension}" directory_fmt = ("{category}", "{album[id]} {album[title]}") archive_fmt = "{album[id]}_{id}" pattern = (r"(?:https?://)?(?:www\.|members\.)?luscious\.net" r"/(?:albums|pictures/c/[^/?#]+/album)/[^/?#]+_(\d+)") test = ( ("https://luscious.net/albums/okinami-no-koigokoro_277031/", { "pattern": r"https://storage\.bhs\.cloud\.ovh\.net/v1/AUTH_\w+" r"/images/NTRshouldbeillegal/277031" r"/luscious_net_\d+_\d+\.jpg$", # "content": "b3a747a6464509440bd0ff6d1267e6959f8d6ff3", "keyword": { "album": { "__typename" : "Album", "audiences" : list, "content" : "Hentai", "cover" : "re:https://\\w+.luscious.net/.+/277031/", "created" : 1479625853, "created_by" : "NTRshouldbeillegal", "date" : "dt:2016-11-20 07:10:53", "description" : "Enjoy.", "download_url": "re:/download/(r/)?824778/277031/", "genres" : list, "id" : 277031, "is_manga" : True, "labels" : list, "language" : "English", "like_status" : "none", "modified" : int, "permissions" : list, "rating" : float, "slug" : "okinami-no-koigokoro", "status" : None, "tags" : list, "title" : "Okinami no Koigokoro", "url" : "/albums/okinami-no-koigokoro_277031/", "marked_for_deletion": False, "marked_for_processing": False, "number_of_animated_pictures": 0, "number_of_favorites": int, "number_of_pictures": 18, }, "aspect_ratio": r"re:\d+:\d+", "category" : "luscious", "created" : int, "date" : "type:datetime", "height" : int, "id" : int, "is_animated" : False, "like_status" : "none", "position" : int, "resolution" : r"re:\d+x\d+", "status" : None, "tags" : list, "thumbnail" : str, "title" : str, "width" : int, "number_of_comments": int, "number_of_favorites": int, }, }), ("https://luscious.net/albums/not-found_277035/", { "exception": exception.NotFoundError, }), ("https://members.luscious.net/albums/login-required_323871/", { "count": 64, }), ("https://www.luscious.net/albums/okinami_277031/"), ("https://members.luscious.net/albums/okinami_277031/"), ("https://luscious.net/pictures/c/video_game_manga/album" "/okinami-no-koigokoro_277031/sorted/position/id/16528978/@_1"), ) def __init__(self, match): LusciousExtractor.__init__(self, match) self.album_id = match.group(1) self.gif = self.config("gif", False) def items(self): album = self.metadata() yield Message.Directory, {"album": album} for num, image in enumerate(self.images(), 1): image["num"] = num image["album"] = album image["thumbnail"] = image.pop("thumbnails")[0]["url"] image["tags"] = [item["text"] for item in image["tags"]] image["date"] = text.parse_timestamp(image["created"]) image["id"] = text.parse_int(image["id"]) url = 
(image["url_to_original"] or image["url_to_video"] if self.gif else image["url_to_video"] or image["url_to_original"]) yield Message.Url, url, text.nameext_from_url(url, image) def metadata(self): variables = { "id": self.album_id, } query = """ query AlbumGet($id: ID!) { album { get(id: $id) { ... on Album { ...AlbumStandard } ... on MutationError { errors { code message } } } } } fragment AlbumStandard on Album { __typename id title labels description created modified like_status number_of_favorites rating status marked_for_deletion marked_for_processing number_of_pictures number_of_animated_pictures slug is_manga url download_url permissions cover { width height size url } created_by { id name display_name user_title avatar { url size } url } content { id title url } language { id title url } tags { id category text url count } genres { id title slug url } audiences { id title url url } last_viewed_picture { id position url } } """ album = self._graphql("AlbumGet", variables, query)["album"]["get"] if "errors" in album: raise exception.NotFoundError("album") album["audiences"] = [item["title"] for item in album["audiences"]] album["genres"] = [item["title"] for item in album["genres"]] album["tags"] = [item["text"] for item in album["tags"]] album["cover"] = album["cover"]["url"] album["content"] = album["content"]["title"] album["language"] = album["language"]["title"].partition(" ")[0] album["created_by"] = album["created_by"]["display_name"] album["id"] = text.parse_int(album["id"]) album["date"] = text.parse_timestamp(album["created"]) return album def images(self): variables = { "input": { "filters": [{ "name" : "album_id", "value": self.album_id, }], "display": "position", "page" : 1, }, } query = """ query AlbumListOwnPictures($input: PictureListInput!) { picture { list(input: $input) { info { ...FacetCollectionInfo } items { ...PictureStandardWithoutAlbum } } } } fragment FacetCollectionInfo on FacetCollectionInfo { page has_next_page has_previous_page total_items total_pages items_per_page url_complete url_filters_only } fragment PictureStandardWithoutAlbum on Picture { __typename id title created like_status number_of_comments number_of_favorites status width height resolution aspect_ratio url_to_original url_to_video is_animated position tags { id category text url } permissions url thumbnails { width height size url } } """ while True: data = self._graphql("AlbumListOwnPictures", variables, query) yield from data["picture"]["list"]["items"] if not data["picture"]["list"]["info"]["has_next_page"]: return variables["input"]["page"] += 1 class LusciousSearchExtractor(LusciousExtractor): """Extractor for album searches on luscious.net""" subcategory = "search" pattern = (r"(?:https?://)?(?:www\.|members\.)?luscious\.net" r"/albums/list/?(?:\?([^#]+))?") test = ( ("https://members.luscious.net/albums/list/"), ("https://members.luscious.net/albums/list/" "?display=date_newest&language_ids=%2B1&tagged=+full_color&page=1", { "pattern": LusciousAlbumExtractor.pattern, "range": "41-60", "count": 20, }), ) def __init__(self, match): LusciousExtractor.__init__(self, match) self.query = match.group(1) def items(self): query = text.parse_query(self.query) display = query.pop("display", "date_newest") page = query.pop("page", None) variables = { "input": { "display": display, "filters": [{"name": n, "value": v} for n, v in query.items()], "page": text.parse_int(page, 1), }, } query = """ query AlbumListWithPeek($input: AlbumListInput!) 
{ album { list(input: $input) { info { ...FacetCollectionInfo } items { ...AlbumMinimal peek_thumbnails { width height size url } } } } } fragment FacetCollectionInfo on FacetCollectionInfo { page has_next_page has_previous_page total_items total_pages items_per_page url_complete url_filters_only } fragment AlbumMinimal on Album { __typename id title labels description created modified number_of_favorites number_of_pictures slug is_manga url download_url cover { width height size url } content { id title url } language { id title url } tags { id category text url count } genres { id title slug url } audiences { id title url } } """ while True: data = self._graphql("AlbumListWithPeek", variables, query) for album in data["album"]["list"]["items"]: album["url"] = self.root + album["url"] album["_extractor"] = LusciousAlbumExtractor yield Message.Queue, album["url"], album if not data["album"]["list"]["info"]["has_next_page"]: return variables["input"]["page"] += 1 �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/lynxchan.py��������������������������������������������������0000644�0001750�0001750�00000007700�14363473075�020760� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for LynxChan Imageboards""" from .common import BaseExtractor, Message from .. 
import text import itertools class LynxchanExtractor(BaseExtractor): """Base class for LynxChan extractors""" basecategory = "lynxchan" BASE_PATTERN = LynxchanExtractor.update({ "bbw-chan": { "root": "https://bbw-chan.nl", "pattern": r"bbw-chan\.nl", }, "kohlchan": { "root": "https://kohlchan.net", "pattern": r"kohlchan\.net", }, "endchan": { "root": None, "pattern": r"endchan\.(?:org|net|gg)", }, }) class LynxchanThreadExtractor(LynxchanExtractor): """Extractor for LynxChan threads""" subcategory = "thread" directory_fmt = ("{category}", "{boardUri}", "{threadId} {subject|message[:50]}") filename_fmt = "{postId}{num:?-//} {filename}.{extension}" archive_fmt = "{boardUri}_{postId}_{num}" pattern = BASE_PATTERN + r"/([^/?#]+)/res/(\d+)" test = ( ("https://bbw-chan.nl/bbwdraw/res/499.html", { "pattern": r"https://bbw-chan\.nl/\.media/[0-9a-f]{64}(\.\w+)?$", "count": ">= 352", }), ("https://bbw-chan.nl/bbwdraw/res/489.html"), ("https://kohlchan.net/a/res/4594.html", { "pattern": r"https://kohlchan\.net/\.media/[0-9a-f]{64}(\.\w+)?$", "count": ">= 80", }), ("https://endchan.org/yuri/res/193483.html", { "pattern": r"https://endchan\.org/\.media/[^.]+(\.\w+)?$", "count" : ">= 19", }), ("https://endchan.org/yuri/res/33621.html"), ) def __init__(self, match): LynxchanExtractor.__init__(self, match) index = match.lastindex self.board = match.group(index-1) self.thread = match.group(index) def items(self): url = "{}/{}/res/{}.json".format(self.root, self.board, self.thread) thread = self.request(url).json() thread["postId"] = thread["threadId"] posts = thread.pop("posts", ()) yield Message.Directory, thread for post in itertools.chain((thread,), posts): files = post.pop("files", ()) if files: thread.update(post) for num, file in enumerate(files): file.update(thread) file["num"] = num url = self.root + file["path"] text.nameext_from_url(file["originalName"], file) yield Message.Url, url, file class LynxchanBoardExtractor(LynxchanExtractor): """Extractor for LynxChan boards""" subcategory = "board" pattern = BASE_PATTERN + r"/([^/?#]+)(?:/index|/catalog|/\d+|/?$)" test = ( ("https://bbw-chan.nl/bbwdraw/", { "pattern": LynxchanThreadExtractor.pattern, "count": ">= 148", }), ("https://bbw-chan.nl/bbwdraw/2.html"), ("https://kohlchan.net/a/", { "pattern": LynxchanThreadExtractor.pattern, "count": ">= 100", }), ("https://kohlchan.net/a/2.html"), ("https://kohlchan.net/a/catalog.html"), ("https://endchan.org/yuri/", { "pattern": LynxchanThreadExtractor.pattern, "count" : ">= 9", }), ("https://endchan.org/yuri/catalog.html"), ) def __init__(self, match): LynxchanExtractor.__init__(self, match) self.board = match.group(match.lastindex) def items(self): url = "{}/{}/catalog.json".format(self.root, self.board) for thread in self.request(url).json(): url = "{}/{}/res/{}.html".format( self.root, self.board, thread["threadId"]) thread["_extractor"] = LynxchanThreadExtractor yield Message.Queue, url, thread ����������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1677790111.0 
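As a reading aid for the LynxChan extractors above, here is a hedged, standalone sketch of the two JSON endpoints they consume: "/<board>/catalog.json" for the thread list and "/<board>/res/<id>.json" for a single thread whose posts carry a "files" list with "path" and "originalName" fields. It is not part of gallery-dl; it assumes the requests package, and the root/board values are placeholders borrowed from the tests above.

# Minimal sketch, not gallery-dl code: walk a LynxChan board via its JSON API
# and yield (file URL, original filename) pairs, mirroring the logic of
# LynxchanBoardExtractor.items() and LynxchanThreadExtractor.items() above.
import requests

def lynxchan_file_urls(root="https://kohlchan.net", board="a", limit=1):
    catalog = requests.get("{}/{}/catalog.json".format(root, board)).json()
    for entry in catalog[:limit]:  # only the first few threads, for brevity
        thread = requests.get("{}/{}/res/{}.json".format(
            root, board, entry["threadId"])).json()
        # the opening post is the thread object itself; replies are in "posts"
        for post in [thread] + thread.get("posts", []):
            for file in post.get("files", ()):
                yield root + file["path"], file["originalName"]

for url, name in lynxchan_file_urls():
    print(name, "->", url)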
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/mangadex.py��������������������������������������������������0000644�0001750�0001750�00000025104�14400205637�020703� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://mangadex.org/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache, memcache from collections import defaultdict BASE_PATTERN = r"(?:https?://)?(?:www\.)?mangadex\.(?:org|cc)" class MangadexExtractor(Extractor): """Base class for mangadex extractors""" category = "mangadex" directory_fmt = ( "{category}", "{manga}", "{volume:?v/ />02}c{chapter:>03}{chapter_minor}{title:?: //}") filename_fmt = ( "{manga}_c{chapter:>03}{chapter_minor}_{page:>03}.{extension}") archive_fmt = "{chapter_id}_{page}" root = "https://mangadex.org" _cache = {} def __init__(self, match): Extractor.__init__(self, match) self.session.headers["User-Agent"] = util.USERAGENT self.api = MangadexAPI(self) self.uuid = match.group(1) def items(self): for chapter in self.chapters(): uuid = chapter["id"] data = self._transform(chapter) data["_extractor"] = MangadexChapterExtractor self._cache[uuid] = data yield Message.Queue, self.root + "/chapter/" + uuid, data def _transform(self, chapter): relationships = defaultdict(list) for item in chapter["relationships"]: relationships[item["type"]].append(item) manga = self.api.manga(relationships["manga"][0]["id"]) for item in manga["relationships"]: relationships[item["type"]].append(item) cattributes = chapter["attributes"] mattributes = manga["attributes"] lang = cattributes.get("translatedLanguage") if lang: lang = lang.partition("-")[0] if cattributes["chapter"]: chnum, sep, minor = cattributes["chapter"].partition(".") else: chnum, sep, minor = 0, "", "" data = { "manga" : (mattributes["title"].get("en") or next(iter(mattributes["title"].values()))), "manga_id": manga["id"], "title" : cattributes["title"], "volume" : text.parse_int(cattributes["volume"]), "chapter" : text.parse_int(chnum), "chapter_minor": sep + minor, "chapter_id": chapter["id"], "date" : text.parse_datetime(cattributes["publishAt"]), "lang" : lang, "language": util.code_to_language(lang), "count" : cattributes["pages"], "_external_url": cattributes.get("externalUrl"), } data["artist"] = [artist["attributes"]["name"] for artist in relationships["artist"]] data["author"] = [author["attributes"]["name"] for author in relationships["author"]] data["group"] = [group["attributes"]["name"] for group in 
relationships["scanlation_group"]] return data class MangadexChapterExtractor(MangadexExtractor): """Extractor for manga-chapters from mangadex.org""" subcategory = "chapter" pattern = BASE_PATTERN + r"/chapter/([0-9a-f-]+)" test = ( ("https://mangadex.org/chapter/f946ac53-0b71-4b5d-aeb2-7931b13c4aaa", { "keyword": "86fb262cf767dac6d965cd904ad499adba466404", # "content": "50383a4c15124682057b197d40261641a98db514", }), # oneshot ("https://mangadex.org/chapter/61a88817-9c29-4281-bdf1-77b3c1be9831", { "count": 64, "keyword": "6abcbe1e24eeb1049dc931958853cd767ee483fb", }), # MANGA Plus (#1154) ("https://mangadex.org/chapter/74149a55-e7c4-44ea-8a37-98e879c1096f", { "exception": exception.StopExtraction, }), # 'externalUrl', but still downloadable (#2503) ("https://mangadex.org/chapter/364728a4-6909-4164-9eea-6b56354f7c78", { "count": 0, # 404 }), ) def items(self): try: data = self._cache.pop(self.uuid) except KeyError: chapter = self.api.chapter(self.uuid) data = self._transform(chapter) if data.get("_external_url") and not data["count"]: raise exception.StopExtraction( "Chapter %s%s is not available on MangaDex and can instead be " "read on the official publisher's website at %s.", data["chapter"], data["chapter_minor"], data["_external_url"]) yield Message.Directory, data server = self.api.athome_server(self.uuid) chapter = server["chapter"] base = "{}/data/{}/".format(server["baseUrl"], chapter["hash"]) enum = util.enumerate_reversed if self.config( "page-reverse") else enumerate for data["page"], page in enum(chapter["data"], 1): text.nameext_from_url(page, data) yield Message.Url, base + page, data class MangadexMangaExtractor(MangadexExtractor): """Extractor for manga from mangadex.org""" subcategory = "manga" pattern = BASE_PATTERN + r"/(?:title|manga)/(?!feed$)([0-9a-f-]+)" test = ( ("https://mangadex.org/title/f90c4398-8aad-4f51-8a1f-024ca09fdcbc", { "keyword": { "manga" : "Souten no Koumori", "manga_id": "f90c4398-8aad-4f51-8a1f-024ca09fdcbc", "title" : "re:One[Ss]hot", "volume" : 0, "chapter" : 0, "chapter_minor": "", "chapter_id": str, "date" : "type:datetime", "lang" : str, "language": str, "artist" : ["Arakawa Hiromu"], "author" : ["Arakawa Hiromu"], }, }), ("https://mangadex.cc/manga/d0c88e3b-ea64-4e07-9841-c1d2ac982f4a/", { "options": (("lang", "en"),), "count": ">= 100", }), ("https://mangadex.org/title/7c1e2742-a086-4fd3-a3be-701fd6cf0be9", { "count": 1, }), ("https://mangadex.org/title/584ef094-b2ab-40ce-962c-bce341fb9d10", { "count": ">= 20", }) ) def chapters(self): return self.api.manga_feed(self.uuid) class MangadexFeedExtractor(MangadexExtractor): """Extractor for chapters from your Followed Feed""" subcategory = "feed" pattern = BASE_PATTERN + r"/title/feed$()" test = ("https://mangadex.org/title/feed",) def chapters(self): return self.api.user_follows_manga_feed() class MangadexAPI(): """Interface for the MangaDex API v5""" def __init__(self, extr): self.extractor = extr self.headers = {} self.username, self.password = self.extractor._get_auth_info() if not self.username: self.authenticate = util.noop server = extr.config("api-server") self.root = ("https://api.mangadex.org" if server is None else text.ensure_http_scheme(server).rstrip("/")) def athome_server(self, uuid): return self._call("/at-home/server/" + uuid) def chapter(self, uuid): params = {"includes[]": ("scanlation_group",)} return self._call("/chapter/" + uuid, params)["data"] @memcache(keyarg=1) def manga(self, uuid): params = {"includes[]": ("artist", "author")} return self._call("/manga/" + uuid, 
params)["data"] def manga_feed(self, uuid): order = "desc" if self.extractor.config("chapter-reverse") else "asc" params = { "order[volume]" : order, "order[chapter]": order, } return self._pagination("/manga/" + uuid + "/feed", params) def user_follows_manga_feed(self): params = {"order[publishAt]": "desc"} return self._pagination("/user/follows/manga/feed", params) def authenticate(self): self.headers["Authorization"] = \ self._authenticate_impl(self.username, self.password) @cache(maxage=900, keyarg=1) def _authenticate_impl(self, username, password): refresh_token = _refresh_token_cache(username) if refresh_token: self.extractor.log.info("Refreshing access token") url = self.root + "/auth/refresh" data = {"token": refresh_token} else: self.extractor.log.info("Logging in as %s", username) url = self.root + "/auth/login" data = {"username": username, "password": password} data = self.extractor.request( url, method="POST", json=data, fatal=None).json() if data.get("result") != "ok": raise exception.AuthenticationError() if refresh_token != data["token"]["refresh"]: _refresh_token_cache.update(username, data["token"]["refresh"]) return "Bearer " + data["token"]["session"] def _call(self, endpoint, params=None): url = self.root + endpoint while True: self.authenticate() response = self.extractor.request( url, params=params, headers=self.headers, fatal=None) if response.status_code < 400: return response.json() if response.status_code == 429: until = response.headers.get("X-RateLimit-Retry-After") self.extractor.wait(until=until) continue msg = ", ".join('{title}: {detail}'.format_map(error) for error in response.json()["errors"]) raise exception.StopExtraction( "%s %s (%s)", response.status_code, response.reason, msg) def _pagination(self, endpoint, params=None): if params is None: params = {} config = self.extractor.config ratings = config("ratings") if ratings is None: ratings = ("safe", "suggestive", "erotica", "pornographic") params["contentRating[]"] = ratings params["includes[]"] = ("scanlation_group",) params["translatedLanguage[]"] = config("lang") params["offset"] = 0 api_params = config("api-parameters") if api_params: params.update(api_params) while True: data = self._call(endpoint, params) yield from data["data"] params["offset"] = data["offset"] + data["limit"] if params["offset"] >= data["total"]: return @cache(maxage=28*24*3600, keyarg=0) def _refresh_token_cache(username): return None ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/mangafox.py��������������������������������������������������0000644�0001750�0001750�00000012747�14363473075�020743� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2017-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fanfox.net/""" from .common import ChapterExtractor, MangaExtractor from .. import text BASE_PATTERN = r"(?:https?://)?(?:www\.|m\.)?(?:fanfox\.net|mangafox\.me)" class MangafoxChapterExtractor(ChapterExtractor): """Extractor for manga chapters from fanfox.net""" category = "mangafox" root = "https://m.fanfox.net" pattern = BASE_PATTERN + \ r"(/manga/[^/?#]+/((?:v([^/?#]+)/)?c(\d+)([^/?#]*)))" test = ( ("http://fanfox.net/manga/kidou_keisatsu_patlabor/v05/c006.2/1.html", { "keyword": "5661dab258d42d09d98f194f7172fb9851a49766", "content": "5c50c252dcf12ffecf68801f4db8a2167265f66c", }), ("http://mangafox.me/manga/kidou_keisatsu_patlabor/v05/c006.2/"), ("http://fanfox.net/manga/black_clover/vTBD/c295/1.html"), ) def __init__(self, match): base, self.cstr, self.volume, self.chapter, self.minor = match.groups() self.urlbase = self.root + base ChapterExtractor.__init__(self, match, self.urlbase + "/1.html") self.session.headers["Referer"] = self.root + "/" def metadata(self, page): manga, pos = text.extract(page, "<title>", "") count, pos = text.extract( page, ">", "<", page.find("", pos) - 20) sid , pos = text.extract(page, "var series_id =", ";", pos) cid , pos = text.extract(page, "var chapter_id =", ";", pos) return { "manga" : text.unescape(manga), "volume" : text.parse_int(self.volume), "chapter" : text.parse_int(self.chapter), "chapter_minor" : self.minor or "", "chapter_string": self.cstr, "count" : text.parse_int(count), "sid" : text.parse_int(sid), "cid" : text.parse_int(cid), } def images(self, page): pnum = 1 while True: url, pos = text.extract(page, '=60", "keyword": { "author": "HIROYUKI", "chapter": int, "chapter_minor": r"re:^(\.\d+)?$", "chapter_string": r"re:(v\d+/)?c\d+", "date": "type:datetime", "description": "High school boy Naoya gets a confession from M" "omi, a cute and friendly girl. However, Naoya " "already has a girlfriend, Seki... but Momi is " "too good a catch to let go. Momi and Nagoya's " "goal becomes clear: convince Seki to accept be" "ing an item with the two of them. 
Will she bud" "ge?", "lang": "en", "language": "English", "manga": "Kanojo mo Kanojo", "tags": ["Comedy", "Romance", "School Life", "Shounen"], "volume": int, }, }), ("https://mangafox.me/manga/shangri_la_frontier", { "pattern": MangafoxChapterExtractor.pattern, "count": ">=45", }), ("https://m.fanfox.net/manga/sentai_daishikkaku"), ) def chapters(self, page): results = [] chapter_match = MangafoxChapterExtractor.pattern.match extr = text.extract_from(page) manga = extr('

    ', '

    ') author = extr('

    Author(s):', '

    ') extr('
    ', '') genres, _, summary = text.extr( page, '
    ', '' ).partition('
    ') data = { "manga" : text.unescape(manga), "author" : text.remove_html(author), "description": text.unescape(text.remove_html(summary)), "tags" : text.split_html(genres), "lang" : "en", "language" : "English", } while True: url = "https://" + extr('', ''), "%b %d, %Y"), } chapter.update(data) results.append((url, chapter)) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/mangahere.py0000644000175000017500000001324614363473075021065 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.mangahere.cc/""" from .common import ChapterExtractor, MangaExtractor from .. import text import re class MangahereBase(): """Base class for mangahere extractors""" category = "mangahere" root = "https://www.mangahere.cc" root_mobile = "https://m.mangahere.cc" url_fmt = root_mobile + "/manga/{}/{}.html" class MangahereChapterExtractor(MangahereBase, ChapterExtractor): """Extractor for manga-chapters from mangahere.cc""" pattern = (r"(?:https?://)?(?:www\.|m\.)?mangahere\.c[co]/manga/" r"([^/]+(?:/v0*(\d+))?/c([^/?#]+))") test = ( ("https://www.mangahere.cc/manga/dongguo_xiaojie/c004.2/", { "keyword": "7c98d7b50a47e6757b089aa875a53aa970cac66f", "content": "708d475f06893b88549cbd30df1e3f9428f2c884", }), # URLs without HTTP scheme (#1070) ("https://www.mangahere.cc/manga/beastars/c196/1.html", { "pattern": "https://zjcdn.mangahere.org/.*", }), ("http://www.mangahere.co/manga/dongguo_xiaojie/c003.2/"), ("http://m.mangahere.co/manga/dongguo_xiaojie/c003.2/"), ) def __init__(self, match): self.part, self.volume, self.chapter = match.groups() url = self.url_fmt.format(self.part, 1) ChapterExtractor.__init__(self, match, url) self.session.headers["Referer"] = self.root_mobile + "/" def metadata(self, page): pos = page.index("") count , pos = text.extract(page, ">", "<", pos - 20) manga_id , pos = text.extract(page, "series_id = ", ";", pos) chapter_id, pos = text.extract(page, "chapter_id = ", ";", pos) manga , pos = text.extract(page, '"name":"', '"', pos) chapter, dot, minor = self.chapter.partition(".") return { "manga": text.unescape(manga), "manga_id": text.parse_int(manga_id), "title": self._get_title(), "volume": text.parse_int(self.volume), "chapter": text.parse_int(chapter), "chapter_minor": dot + minor, "chapter_id": text.parse_int(chapter_id), "count": text.parse_int(count), "lang": "en", "language": "English", } def images(self, page): pnum = 1 while True: url, pos = text.extract(page, '= 50", }), ("https://www.mangahere.co/manga/aria/"), ("https://m.mangahere.co/manga/aria/"), ) def __init__(self, match): MangaExtractor.__init__(self, match) self.session.cookies.set("isAdult", "1", domain="www.mangahere.cc") def chapters(self, page): results = [] manga, pos = text.extract(page, '', '<', pos) date, pos = text.extract(page, 'class="title2">', '<', pos) match = re.match( r"(?:Vol\.0*(\d+) )?Ch\.0*(\d+)(\S*)(?: - (.*))?", info) if match: volume, chapter, minor, title = match.groups() else: chapter, _, minor = url[:-1].rpartition("/c")[2].partition(".") minor = "." 
+ minor volume = 0 title = "" results.append((text.urljoin(self.root, url), { "manga": manga, "title": text.unescape(title) if title else "", "volume": text.parse_int(volume), "chapter": text.parse_int(chapter), "chapter_minor": minor, "date": date, "lang": "en", "language": "English", })) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/mangakakalot.py0000644000175000017500000000776014363473075021574 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020 Jake Mannens # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://mangakakalot.tv/""" from .common import ChapterExtractor, MangaExtractor from .. import text import re BASE_PATTERN = r"(?:https?://)?(?:ww[\dw]?\.)?mangakakalot\.tv" class MangakakalotBase(): """Base class for mangakakalot extractors""" category = "mangakakalot" root = "https://ww3.mangakakalot.tv" class MangakakalotChapterExtractor(MangakakalotBase, ChapterExtractor): """Extractor for manga chapters from mangakakalot.tv""" pattern = BASE_PATTERN + r"(/chapter/[^/?#]+/chapter[_-][^/?#]+)" test = ( ("https://ww3.mangakakalot.tv/chapter/manga-jk986845/chapter-34.2", { "pattern": r"https://cm\.blazefast\.co" r"/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{32}\.jpg", "keyword": "0f1586ff52f0f9cbbb25306ae64ab718f8a6a633", "count": 9, }), ("https://mangakakalot.tv/chapter" "/hatarakanai_futari_the_jobless_siblings/chapter_20.1"), ) def __init__(self, match): self.path = match.group(1) ChapterExtractor.__init__(self, match, self.root + self.path) self.session.headers['Referer'] = self.root def metadata(self, page): _ , pos = text.extract(page, '', '<') manga , pos = text.extract(page, '', '<', pos) info , pos = text.extract(page, '', '<', pos) author, pos = text.extract(page, '. Author:', ' already has ', pos) match = re.match( r"(?:[Vv]ol\. *(\d+) )?" r"[Cc]hapter *([^:]*)" r"(?:: *(.+))?", info) volume, chapter, title = match.groups() if match else ("", "", info) chapter, sep, minor = chapter.partition(".") return { "manga" : text.unescape(manga), "title" : text.unescape(title) if title else "", "author" : text.unescape(author).strip() if author else "", "volume" : text.parse_int(volume), "chapter" : text.parse_int(chapter), "chapter_minor": sep + minor, "lang" : "en", "language" : "English", } def images(self, page): return [ (url, None) for url in text.extract_iter(page, '= 30", }), ("https://mangakakalot.tv/manga/lk921810"), ) def chapters(self, page): data = {"lang": "en", "language": "English"} data["manga"], pos = text.extract(page, "

    ", "<") author, pos = text.extract(page, "
  • Author(s) :", "", pos) data["author"] = text.remove_html(author) results = [] for chapter in text.extract_iter(page, '
    ', '
    '): url, pos = text.extract(chapter, '', '', pos) data["title"] = title.partition(": ")[2] data["date"] , pos = text.extract( chapter, '") manga = extr('title="', '"') info = extr('title="', '"') author = extr("- Author(s) : ", "

    ") return self._parse_chapter( info, text.unescape(manga), text.unescape(author)) def images(self, page): page = text.extr( page, 'class="container-chapter-reader', '\n= 25", }), ("https://m.manganelo.com/manga-ti107776", { "pattern": ManganeloChapterExtractor.pattern, "count": ">= 12", }), ("https://readmanganato.com/manga-gn983696"), ("https://manganelo.com/manga/read_otome_no_teikoku"), ("https://manganelo.com/manga/ol921234/"), ) def chapters(self, page): results = [] append = results.append extr = text.extract_from(page) manga = text.unescape(extr("

    ", "<")) author = text.remove_html(extr("Author(s) :", "")) extr('class="row-content-chapter', '') while True: url = extr('class="chapter-name text-nowrap" href="', '"') if not url: return results info = extr(">", "<") date = extr('class="chapter-time text-nowrap" title="', '"') append((url, self._parse_chapter(info, manga, author, date))) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675805551.0 gallery_dl-1.25.0/gallery_dl/extractor/mangapark.py0000644000175000017500000001377214370541557021102 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://mangapark.net/""" from .common import ChapterExtractor, MangaExtractor from .. import text, util, exception import re class MangaparkBase(): """Base class for mangapark extractors""" category = "mangapark" root_fmt = "https://v2.mangapark.{}" browser = "firefox" @staticmethod def parse_chapter_path(path, data): """Get volume/chapter information from url-path of a chapter""" data["volume"], data["chapter_minor"] = 0, "" for part in path.split("/")[1:]: key, value = part[0], part[1:] if key == "c": chapter, dot, minor = value.partition(".") data["chapter"] = text.parse_int(chapter) data["chapter_minor"] = dot + minor elif key == "i": data["chapter_id"] = text.parse_int(value) elif key == "v": data["volume"] = text.parse_int(value) elif key == "s": data["stream"] = text.parse_int(value) elif key == "e": data["chapter_minor"] = "v" + value @staticmethod def parse_chapter_title(title, data): match = re.search(r"(?i)(?:vol(?:ume)?[ .]*(\d+) )?" r"ch(?:apter)?[ .]*(\d+)(\.\w+)?", title) if match: vol, ch, data["chapter_minor"] = match.groups() data["volume"] = text.parse_int(vol) data["chapter"] = text.parse_int(ch) class MangaparkChapterExtractor(MangaparkBase, ChapterExtractor): """Extractor for manga-chapters from mangapark.net""" pattern = (r"(?:https?://)?(?:www\.|v2\.)?mangapark\.(me|net|com)" r"/manga/([^?#]+/i\d+)") test = ( ("https://mangapark.net/manga/gosu/i811653/c055/1", { "count": 50, "keyword": "db1ed9af4f972756a25dbfa5af69a8f155b043ff", }), (("https://mangapark.net/manga" "/ad-astra-per-aspera-hata-kenjirou/i662051/c001.2/1"), { "count": 40, "keyword": "2bb3a8f426383ea13f17ff5582f3070d096d30ac", }), (("https://mangapark.net/manga" "/gekkan-shoujo-nozaki-kun/i2067426/v7/c70/1"), { "count": 15, "keyword": "edc14993c4752cee3a76e09b2f024d40d854bfd1", }), ("https://mangapark.me/manga/gosu/i811615/c55/1"), ("https://mangapark.com/manga/gosu/i811615/c55/1"), ) def __init__(self, match): tld, self.path = match.groups() self.root = self.root_fmt.format(tld) url = "{}/manga/{}?zoom=2".format(self.root, self.path) ChapterExtractor.__init__(self, match, url) def metadata(self, page): data = text.extract_all(page, ( ("manga_id" , "var _manga_id = '", "'"), ("chapter_id", "var _book_id = '", "'"), ("stream" , "var _stream = '", "'"), ("path" , "var _book_link = '", "'"), ("manga" , "

    ", "

    "), ("title" , "", "<"), ), values={"lang": "en", "language": "English"})[0] if not data["path"]: raise exception.NotFoundError("chapter") self.parse_chapter_path(data["path"], data) if "chapter" not in data: self.parse_chapter_title(data["title"], data) data["manga"], _, data["type"] = data["manga"].rpartition(" ") data["manga"] = text.unescape(data["manga"]) data["title"] = data["title"].partition(": ")[2] for key in ("manga_id", "chapter_id", "stream"): data[key] = text.parse_int(data[key]) return data def images(self, page): data = util.json_loads(text.extr(page, "var _load_pages =", ";")) return [ (text.urljoin(self.root, item["u"]), { "width": text.parse_int(item["w"]), "height": text.parse_int(item["h"]), }) for item in data ] class MangaparkMangaExtractor(MangaparkBase, MangaExtractor): """Extractor for manga from mangapark.net""" chapterclass = MangaparkChapterExtractor pattern = (r"(?:https?://)?(?:www\.|v2\.)?mangapark\.(me|net|com)" r"(/manga/[^/?#]+)/?$") test = ( ("https://mangapark.net/manga/aria", { "url": "51c6d82aed5c3c78e0d3f980b09a998e6a2a83ee", "keyword": "cabc60cf2efa82749d27ac92c495945961e4b73c", }), ("https://mangapark.me/manga/aria"), ("https://mangapark.com/manga/aria"), ) def __init__(self, match): self.root = self.root_fmt.format(match.group(1)) MangaExtractor.__init__(self, match, self.root + match.group(2)) def chapters(self, page): results = [] data = {"lang": "en", "language": "English"} data["manga"] = text.unescape( text.extr(page, '', ' Manga - ')) for stream in page.split('<div id="stream_')[1:]: data["stream"] = text.parse_int(text.extr(stream, '', '"')) for chapter in text.extract_iter(stream, '<li ', '</li>'): path , pos = text.extract(chapter, 'href="', '"') title1, pos = text.extract(chapter, '>', '<', pos) title2, pos = text.extract(chapter, '>: </span>', '<', pos) count , pos = text.extract(chapter, ' of ', ' ', pos) self.parse_chapter_path(path[8:], data) if "chapter" not in data: self.parse_chapter_title(title1, data) if title2: data["title"] = title2.strip() else: data["title"] = title1.partition(":")[2].strip() data["count"] = text.parse_int(count) results.append((self.root + path, data.copy())) data.pop("chapter", None) return results ������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1677790111.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/mangasee.py��������������������������������������������������0000644�0001750�0001750�00000016315�14400205637�020703� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://mangasee123.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text, util class MangaseeBase(): category = "mangasee" browser = "firefox" root = "https://mangasee123.com" @staticmethod def _transform_chapter(data): chapter = data["Chapter"] return { "title" : data["ChapterName"] or "", "index" : chapter[0], "chapter" : int(chapter[1:-1]), "chapter_minor": "" if chapter[-1] == "0" else "." + chapter[-1], "chapter_string": chapter, "lang" : "en", "language": "English", "date" : text.parse_datetime( data["Date"], "%Y-%m-%d %H:%M:%S"), } class MangaseeChapterExtractor(MangaseeBase, ChapterExtractor): pattern = (r"(?:https?://)?(mangasee123|manga4life)\.com" r"(/read-online/[^/?#]+\.html)") test = ( (("https://mangasee123.com/read-online" "/Tokyo-Innocent-chapter-4.5-page-1.html"), { "pattern": r"https://[^/]+/manga/Tokyo-Innocent/0004\.5-00\d\.png", "count": 8, "keyword": { "author": ["NARUMI Naru"], "chapter": 4, "chapter_minor": ".5", "chapter_string": "100045", "count": 8, "date": "dt:2020-01-20 21:52:53", "extension": "png", "filename": r"re:0004\.5-00\d", "genre": ["Comedy", "Fantasy", "Harem", "Romance", "Shounen", "Supernatural"], "index": "1", "lang": "en", "language": "English", "manga": "Tokyo Innocent", "page": int, "title": "", }, }), (("https://manga4life.com/read-online" "/One-Piece-chapter-1063-page-1.html"), { "pattern": r"https://[^/]+/manga/One-Piece/1063-0\d\d\.png", "count": 13, "keyword": { "author": ["ODA Eiichiro"], "chapter": 1063, "chapter_minor": "", "chapter_string": "110630", "count": 13, "date": "dt:2022-10-16 17:32:54", "extension": "png", "filename": r"re:1063-0\d\d", "genre": ["Action", "Adventure", "Comedy", "Drama", "Fantasy", "Shounen"], "index": "1", "lang": "en", "language": "English", "manga": "One Piece", "page": int, "title": "", }, }), ) def __init__(self, match): if match.group(1) == "manga4life": self.category = "mangalife" self.root = "https://manga4life.com" ChapterExtractor.__init__(self, match, self.root + match.group(2)) self.session.headers["Referer"] = self.gallery_url domain = self.root.rpartition("/")[2] cookies = self.session.cookies if not cookies.get("PHPSESSID", domain=domain): cookies.set("PHPSESSID", util.generate_token(13), domain=domain) def metadata(self, page): extr = text.extract_from(page) author = util.json_loads(extr('"author":', '],') + "]") genre = util.json_loads(extr('"genre":', '],') + "]") self.chapter = data = util.json_loads(extr("vm.CurChapter =", ";\r\n")) self.domain = extr('vm.CurPathName = "', '"') self.slug = extr('vm.IndexName = "', '"') data = self._transform_chapter(data) data["manga"] = text.unescape(extr('vm.SeriesName = "', '"')) data["author"] = author data["genre"] = genre return data def images(self, page): chapter = self.chapter["Chapter"][1:] if chapter[-1] == "0": chapter = chapter[:-1] else: chapter = chapter[:-1] + "." 
+ chapter[-1] base = "https://{}/manga/{}/".format(self.domain, self.slug) if self.chapter["Directory"]: base += self.chapter["Directory"] + "/" base += chapter + "-" return [ ("{}{:>03}.png".format(base, i), None) for i in range(1, int(self.chapter["Page"]) + 1) ] class MangaseeMangaExtractor(MangaseeBase, MangaExtractor): chapterclass = MangaseeChapterExtractor pattern = r"(?:https?://)?(mangasee123|manga4life)\.com(/manga/[^/?#]+)" test = ( (("https://mangasee123.com/manga" "/Nakamura-Koedo-To-Daizu-Keisuke-Wa-Umaku-Ikanai"), { "pattern": MangaseeChapterExtractor.pattern, "count": ">= 17", "keyword": { "author": ["TAKASE Masaya"], "chapter": int, "chapter_minor": r"re:^|\.5$", "chapter_string": r"re:100\d\d\d", "date": "type:datetime", "genre": ["Comedy", "Romance", "School Life", "Shounen", "Slice of Life"], "index": "1", "lang": "en", "language": "English", "manga": "Nakamura-Koedo-To-Daizu-Keisuke-Wa-Umaku-Ikanai", "title": "", }, }), ("https://manga4life.com/manga/Ano-Musume-Ni-Kiss-To-Shirayuri-O", { "pattern": MangaseeChapterExtractor.pattern, "count": ">= 50", "keyword": { "author": ["Canno"], "chapter": int, "chapter_minor": r"re:^|\.5$", "chapter_string": r"re:100\d\d\d", "date": "type:datetime", "genre": ["Comedy", "Romance", "School Life", "Seinen", "Shoujo Ai"], "index": "1", "lang": "en", "language": "English", "manga": "Ano-Musume-Ni-Kiss-To-Shirayuri-O", "title": "" }, }), ) def __init__(self, match): if match.group(1) == "manga4life": self.category = "mangalife" self.root = "https://manga4life.com" MangaExtractor.__init__(self, match, self.root + match.group(2)) def chapters(self, page): extr = text.extract_from(page) author = util.json_loads(extr('"author":', '],') + "]") genre = util.json_loads(extr('"genre":', '],') + "]") slug = extr('vm.IndexName = "', '"') chapters = util.json_loads(extr("vm.Chapters = ", ";\r\n")) result = [] for data in map(self._transform_chapter, chapters): url = "{}/read-online/{}-chapter-{}{}".format( self.root, slug, data["chapter"], data["chapter_minor"]) if data["index"] != "1": url += "-index-" + data["index"] url += "-page-1.html" data["manga"] = slug data["author"] = author data["genre"] = genre result.append((url, data)) return result �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 
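For clarity, a worked example of how MangaseeBase._transform_chapter above decodes the site's packed "Chapter" strings; the sample value "100045" is taken from the Tokyo Innocent test case above.

# "100045" -> index "1", chapter 4, minor ".5" (i.e. chapter 4.5):
# first digit = index, middle digits = chapter number,
# last digit = optional decimal part ("0" means none).
chapter = "100045"
index = chapter[0]                                        # "1"
number = int(chapter[1:-1])                               # 4
minor = "" if chapter[-1] == "0" else "." + chapter[-1]   # ".5"
print(index, number, minor)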
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/mangoxo.py���������������������������������������������������0000644�0001750�0001750�00000014444�14363473075�020607� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.mangoxo.com/""" from .common import Extractor, Message from .. import text, exception from ..cache import cache import hashlib import time class MangoxoExtractor(Extractor): """Base class for mangoxo extractors""" category = "mangoxo" root = "https://www.mangoxo.com" cookiedomain = "www.mangoxo.com" cookienames = ("SESSION",) _warning = True def login(self): username, password = self._get_auth_info() if username: self._update_cookies(self._login_impl(username, password)) elif MangoxoExtractor._warning: MangoxoExtractor._warning = False self.log.warning("Unauthenticated users cannot see " "more than 5 images per album") @cache(maxage=3*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/login" page = self.request(url).text token = text.extr(page, 'id="loginToken" value="', '"') url = self.root + "/api/login" headers = { "X-Requested-With": "XMLHttpRequest", "Referer": self.root + "/login", } data = self._sign_by_md5(username, password, token) response = self.request(url, method="POST", headers=headers, data=data) data = response.json() if str(data.get("result")) != "1": raise exception.AuthenticationError(data.get("msg")) return {"SESSION": self.session.cookies.get("SESSION")} @staticmethod def _sign_by_md5(username, password, token): # https://dns.mangoxo.com/libs/plugins/phoenix-ui/js/phoenix-ui.js params = [ ("username" , username), ("password" , password), ("token" , token), ("timestamp", str(int(time.time()))), ] query = "&".join("=".join(item) for item in sorted(params)) query += "&secretKey=340836904" sign = hashlib.md5(query.encode()).hexdigest() params.append(("sign", sign.upper())) return params @staticmethod def _total_pages(page): return text.parse_int(text.extract(page, "total :", ",")[0]) class MangoxoAlbumExtractor(MangoxoExtractor): """Extractor for albums on mangoxo.com""" subcategory = "album" filename_fmt = "{album[id]}_{num:>03}.{extension}" directory_fmt = ("{category}", "{channel[name]}", "{album[name]}") archive_fmt = "{album[id]}_{num}" pattern = r"(?:https?://)?(?:www\.)?mangoxo\.com/album/(\w+)" test = ("https://www.mangoxo.com/album/lzVOv1Q9", { "url": "ad921fe62663b06e7d73997f7d00646cab7bdd0d", "keyword": { "channel": { "id": "gaxO16d8", 
"name": "Phoenix", "cover": str, }, "album": { "id": "lzVOv1Q9", "name": "re:池永康晟 Ikenaga Yasunari 透出古朴", "date": "dt:2019-03-22 14:42:00", "description": str, }, "id": int, "num": int, "count": 65, }, }) def __init__(self, match): MangoxoExtractor.__init__(self, match) self.album_id = match.group(1) def items(self): self.login() url = "{}/album/{}/".format(self.root, self.album_id) page = self.request(url).text data = self.metadata(page) imgs = self.images(url, page) yield Message.Directory, data data["extension"] = None for data["num"], path in enumerate(imgs, 1): data["id"] = text.parse_int(text.extr(path, "=", "&")) url = self.root + "/external/" + path.rpartition("url=")[2] yield Message.Url, url, text.nameext_from_url(url, data) def metadata(self, page): """Return general metadata""" extr = text.extract_from(page) title = extr('<img id="cover-img" alt="', '"') cid = extr('href="https://www.mangoxo.com/user/', '"') cname = extr('<img alt="', '"') cover = extr(' src="', '"') count = extr('id="pic-count">', '<') date = extr('class="fa fa-calendar"></i>', '<') descr = extr('<pre>', '</pre>') return { "channel": { "id": cid, "name": text.unescape(cname), "cover": cover, }, "album": { "id": self.album_id, "name": text.unescape(title), "date": text.parse_datetime(date.strip(), "%Y.%m.%d %H:%M"), "description": text.unescape(descr), }, "count": text.parse_int(count), } def images(self, url, page): """Generator; Yields all image URLs""" total = self._total_pages(page) num = 1 while True: yield from text.extract_iter( page, 'class="lightgallery-item" href="', '"') if num >= total: return num += 1 page = self.request(url + str(num)).text class MangoxoChannelExtractor(MangoxoExtractor): """Extractor for all albums on a mangoxo channel""" subcategory = "channel" pattern = r"(?:https?://)?(?:www\.)?mangoxo\.com/(\w+)/album" test = ("https://www.mangoxo.com/phoenix/album", { "pattern": MangoxoAlbumExtractor.pattern, "range": "1-30", "count": "> 20", }) def __init__(self, match): MangoxoExtractor.__init__(self, match) self.user = match.group(1) def items(self): self.login() num = total = 1 url = "{}/{}/album/".format(self.root, self.user) data = {"_extractor": MangoxoAlbumExtractor} while True: page = self.request(url + str(num)).text for album in text.extract_iter( page, '<a class="link black" href="', '"'): yield Message.Queue, album, data if num == 1: total = self._total_pages(page) if num >= total: return num += 1 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674479010.0 
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/mastodon.py��������������������������������������������������0000644�0001750�0001750�00000025202�14363502642�020746� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Mastodon instances""" from .common import BaseExtractor, Message from .. import text, exception from ..cache import cache class MastodonExtractor(BaseExtractor): """Base class for mastodon extractors""" basecategory = "mastodon" directory_fmt = ("mastodon", "{instance}", "{account[username]}") filename_fmt = "{category}_{id}_{media[id]}.{extension}" archive_fmt = "{media[id]}" cookiedomain = None def __init__(self, match): BaseExtractor.__init__(self, match) self.instance = self.root.partition("://")[2] self.item = match.group(match.lastindex) self.reblogs = self.config("reblogs", False) self.replies = self.config("replies", True) def items(self): for status in self.statuses(): if self._check_moved: self._check_moved(status["account"]) if not self.reblogs and status["reblog"]: self.log.debug("Skipping %s (reblog)", status["id"]) continue if not self.replies and status["in_reply_to_id"]: self.log.debug("Skipping %s (reply)", status["id"]) continue attachments = status["media_attachments"] del status["media_attachments"] status["instance"] = self.instance acct = status["account"]["acct"] status["instance_remote"] = \ acct.rpartition("@")[2] if "@" in acct else None status["count"] = len(attachments) status["tags"] = [tag["name"] for tag in status["tags"]] status["date"] = text.parse_datetime( status["created_at"][:19], "%Y-%m-%dT%H:%M:%S") yield Message.Directory, status for status["num"], media in enumerate(attachments, 1): status["media"] = media url = media["url"] yield Message.Url, url, text.nameext_from_url(url, status) def statuses(self): """Return an iterable containing all relevant Status objects""" return () def _check_moved(self, account): self._check_moved = None if "moved" in account: self.log.warning("Account '%s' moved to '%s'", account["acct"], account["moved"]["acct"]) INSTANCES = { "mastodon.social": { "root" : "https://mastodon.social", "pattern" : r"mastodon\.social", "access-token" : "Y06R36SMvuXXN5_wiPKFAEFiQaMSQg0o_hGgc86Jj48", "client-id" : "dBSHdpsnOUZgxOnjKSQrWEPakO3ctM7HmsyoOd4FcRo", "client-secret": "DdrODTHs_XoeOsNVXnILTMabtdpWrWOAtrmw91wU1zI", }, "pawoo": { "root" : "https://pawoo.net", "pattern" : r"pawoo\.net", "access-token" : "c12c9d275050bce0dc92169a28db09d7" "0d62d0a75a8525953098c167eacd3668", "client-id" : 
"978a25f843ec01e53d09be2c290cd75c" "782bc3b7fdbd7ea4164b9f3c3780c8ff", "client-secret": "9208e3d4a7997032cf4f1b0e12e5df38" "8428ef1fadb446dcfeb4f5ed6872d97b", }, "baraag": { "root" : "https://baraag.net", "pattern" : r"baraag\.net", "access-token" : "53P1Mdigf4EJMH-RmeFOOSM9gdSDztmrAYFgabOKKE0", "client-id" : "czxx2qilLElYHQ_sm-lO8yXuGwOHxLX9RYYaD0-nq1o", "client-secret": "haMaFdMBgK_-BIxufakmI2gFgkYjqmgXGEO2tB-R2xY", } } BASE_PATTERN = MastodonExtractor.update(INSTANCES) + "(?:/web)?" class MastodonUserExtractor(MastodonExtractor): """Extractor for all images of an account/user""" subcategory = "user" pattern = BASE_PATTERN + r"/(?:@|users/)([^/?#]+)(?:/media)?/?$" test = ( ("https://mastodon.social/@jk", { "pattern": r"https://files.mastodon.social/media_attachments" r"/files/(\d+/){3,}original/\w+", "range": "1-60", "count": 60, }), ("https://pawoo.net/@yoru_nine/", { "range": "1-60", "count": 60, }), ("https://baraag.net/@pumpkinnsfw"), ("https://mastodon.social/@yoru_nine@pawoo.net", { "pattern": r"https://mastodon\.social/media_proxy/\d+/original", "range": "1-10", "count": 10, }), ("https://mastodon.social/@id:10843"), ("https://mastodon.social/users/id:10843"), ("https://mastodon.social/users/jk"), ("https://mastodon.social/users/yoru_nine@pawoo.net"), ("https://mastodon.social/web/@jk"), ) def statuses(self): api = MastodonAPI(self) return api.account_statuses( api.account_id_by_username(self.item), only_media=not self.config("text-posts", False), exclude_replies=not self.replies, ) class MastodonBookmarkExtractor(MastodonExtractor): """Extractor for mastodon bookmarks""" subcategory = "bookmark" pattern = BASE_PATTERN + r"/bookmarks" test = ( ("https://mastodon.social/bookmarks"), ("https://pawoo.net/bookmarks"), ("https://baraag.net/bookmarks"), ) def statuses(self): return MastodonAPI(self).account_bookmarks() class MastodonFollowingExtractor(MastodonExtractor): """Extractor for followed mastodon users""" subcategory = "following" pattern = BASE_PATTERN + r"/users/([^/?#]+)/following" test = ( ("https://mastodon.social/users/0x4f/following", { "extractor": False, "count": ">= 20", }), ("https://mastodon.social/users/id:10843/following"), ("https://pawoo.net/users/yoru_nine/following"), ("https://baraag.net/users/pumpkinnsfw/following"), ) def items(self): api = MastodonAPI(self) account_id = api.account_id_by_username(self.item) for account in api.account_following(account_id): account["_extractor"] = MastodonUserExtractor yield Message.Queue, account["url"], account class MastodonStatusExtractor(MastodonExtractor): """Extractor for images from a status""" subcategory = "status" pattern = BASE_PATTERN + r"/@[^/?#]+/(\d+)" test = ( ("https://mastodon.social/@jk/103794036899778366", { "count": 4, "keyword": { "count": 4, "num": int, }, }), ("https://pawoo.net/@yoru_nine/105038878897832922", { "content": "b52e807f8ab548d6f896b09218ece01eba83987a", }), ("https://baraag.net/@pumpkinnsfw/104364170556898443", { "content": "67748c1b828c58ad60d0fe5729b59fb29c872244", }), ) def statuses(self): return (MastodonAPI(self).status(self.item),) class MastodonAPI(): """Minimal interface for the Mastodon API https://docs.joinmastodon.org/ https://github.com/tootsuite/mastodon """ def __init__(self, extractor): self.root = extractor.root self.extractor = extractor access_token = extractor.config("access-token") if access_token is None or access_token == "cache": access_token = _access_token_cache(extractor.instance) if not access_token: try: access_token = 
INSTANCES[extractor.category]["access-token"] except (KeyError, TypeError): pass if access_token: self.headers = {"Authorization": "Bearer " + access_token} else: self.headers = None def account_id_by_username(self, username): if username.startswith("id:"): return username[3:] if "@" in username: handle = "@" + username else: handle = "@{}@{}".format(username, self.extractor.instance) for account in self.account_search(handle, 1): if account["acct"] == username: self.extractor._check_moved(account) return account["id"] raise exception.NotFoundError("account") def account_bookmarks(self): endpoint = "/v1/bookmarks" return self._pagination(endpoint, None) def account_following(self, account_id): endpoint = "/v1/accounts/{}/following".format(account_id) return self._pagination(endpoint, None) def account_search(self, query, limit=40): """Search for accounts""" endpoint = "/v1/accounts/search" params = {"q": query, "limit": limit} return self._call(endpoint, params).json() def account_statuses(self, account_id, only_media=True, exclude_replies=False): """Fetch an account's statuses""" endpoint = "/v1/accounts/{}/statuses".format(account_id) params = {"only_media" : "1" if only_media else "0", "exclude_replies": "1" if exclude_replies else "0"} return self._pagination(endpoint, params) def status(self, status_id): """Fetch a status""" endpoint = "/v1/statuses/" + status_id return self._call(endpoint).json() def _call(self, endpoint, params=None): if endpoint.startswith("http"): url = endpoint else: url = self.root + "/api" + endpoint while True: response = self.extractor.request( url, params=params, headers=self.headers, fatal=None) code = response.status_code if code < 400: return response if code == 401: raise exception.StopExtraction( "Invalid or missing access token.\n" "Run 'gallery-dl oauth:mastodon:%s' to obtain one.", self.extractor.instance) if code == 404: raise exception.NotFoundError() if code == 429: self.extractor.wait(until=text.parse_datetime( response.headers["x-ratelimit-reset"], "%Y-%m-%dT%H:%M:%S.%fZ", )) continue raise exception.StopExtraction(response.json().get("error")) def _pagination(self, endpoint, params): url = endpoint while url: response = self._call(url, params) yield from response.json() url = response.links.get("next") if not url: return url = url["url"] params = None @cache(maxage=100*365*24*3600, keyarg=0) def _access_token_cache(instance): return None ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1651573353.0 
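# ---------------------------------------------------------------------------
# Editor's sketch (not part of the original archive): the pagination strategy
# used by MastodonAPI._pagination above, reduced to a standalone function.
# It follows the rel="next" URL from the HTTP "Link" header, which the
# requests library exposes as response.links. The instance URL, account id,
# and token in the example call are placeholders.
# ---------------------------------------------------------------------------
import requests


def paginate_statuses(root, account_id, access_token=None):
    """Yield every status of an account, following Link-header pagination."""
    url = "{}/api/v1/accounts/{}/statuses".format(root, account_id)
    params = {"only_media": "1"}
    headers = {"Authorization": "Bearer " + access_token} if access_token else {}

    while url:
        response = requests.get(url, params=params, headers=headers, timeout=30)
        response.raise_for_status()
        yield from response.json()

        next_link = response.links.get("next")  # parsed "Link" header
        url = next_link["url"] if next_link else None
        params = None  # the "next" URL already contains the query string


if __name__ == "__main__":
    for status in paginate_statuses("https://mastodon.social", "1"):
        print(status["id"], status["created_at"])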
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/mememuseum.py������������������������������������������������0000644�0001750�0001750�00000010035�14234201151�021263� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://meme.museum/""" from .common import Extractor, Message from .. import text class MememuseumExtractor(Extractor): """Base class for meme.museum extractors""" basecategory = "booru" category = "mememuseum" filename_fmt = "{category}_{id}_{md5}.{extension}" archive_fmt = "{id}" root = "https://meme.museum" def items(self): data = self.metadata() for post in self.posts(): url = post["file_url"] for key in ("id", "width", "height"): post[key] = text.parse_int(post[key]) post["tags"] = text.unquote(post["tags"]) post.update(data) yield Message.Directory, post yield Message.Url, url, text.nameext_from_url(url, post) def metadata(self): """Return general metadata""" return () def posts(self): """Return an iterable containing data of all relevant posts""" return () class MememuseumTagExtractor(MememuseumExtractor): """Extractor for images from meme.museum by search-tags""" subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") pattern = r"(?:https?://)?meme\.museum/post/list/([^/?#]+)" test = ("https://meme.museum/post/list/animated/1", { "pattern": r"https://meme\.museum/_images/\w+/\d+%20-%20", "count": ">= 30" }) per_page = 25 def __init__(self, match): MememuseumExtractor.__init__(self, match) self.tags = text.unquote(match.group(1)) def metadata(self): return {"search_tags": self.tags} def posts(self): pnum = 1 while True: url = "{}/post/list/{}/{}".format(self.root, self.tags, pnum) extr = text.extract_from(self.request(url).text) while True: mime = extr("data-mime='", "'") if not mime: break pid = extr("data-post-id='", "'") tags, dimensions, size = extr("title='", "'").split(" // ") md5 = extr("/_thumbs/", "/") width, _, height = dimensions.partition("x") yield { "file_url": "{}/_images/{}/{}%20-%20{}.{}".format( self.root, md5, pid, text.quote(tags), mime.rpartition("/")[2]), "id": pid, "md5": md5, "tags": tags, "width": width, "height": height, "size": text.parse_bytes(size[:-1]), } if not extr(">Next<", ">"): return pnum += 1 class MememuseumPostExtractor(MememuseumExtractor): """Extractor for single images from meme.museum""" subcategory = "post" pattern = r"(?:https?://)?meme\.museum/post/view/(\d+)" test = ("https://meme.museum/post/view/10243", { "pattern": r"https://meme\.museum/_images/105febebcd5ca791ee332adc4997" 
r"1f78/10243%20-%20g%20beard%20open_source%20richard_stallm" r"an%20stallman%20tagme%20text\.jpg", "keyword": "3c8009251480cf17248c08b2b194dc0c4d59580e", "content": "45565f3f141fc960a8ae1168b80e718a494c52d2", }) def __init__(self, match): MememuseumExtractor.__init__(self, match) self.post_id = match.group(1) def posts(self): url = "{}/post/view/{}".format(self.root, self.post_id) extr = text.extract_from(self.request(url).text) return ({ "id" : self.post_id, "tags" : extr(": ", "<"), "md5" : extr("/_thumbs/", "/"), "file_url": self.root + extr("id='main_image' src='", "'"), "width" : extr("data-width=", " ").strip("'\""), "height" : extr("data-height=", " ").strip("'\""), "size" : 0, },) ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1639190302.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/message.py���������������������������������������������������0000644�0001750�0001750�00000003453�14155007436�020552� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2015-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. class Message(): """Enum for message identifiers Extractors yield their results as message-tuples, where the first element is one of the following identifiers. This message-identifier determines the type and meaning of the other elements in such a tuple. 
    - Message.Version:
      - Message protocol version (currently always '1')
      - 2nd element specifies the version of all following messages as integer

    - Message.Directory:
      - Sets the target directory for all following images
      - 2nd element is a dictionary containing general metadata

    - Message.Url:
      - Image URL and its metadata
      - 2nd element is the URL as a string
      - 3rd element is a dictionary with image-specific metadata

    - Message.Headers:  # obsolete
      - HTTP headers to use while downloading
      - 2nd element is a dictionary with header-name and -value pairs

    - Message.Cookies:  # obsolete
      - Cookies to use while downloading
      - 2nd element is a dictionary with cookie-name and -value pairs

    - Message.Queue:
      - (External) URL that should be handled by another extractor
      - 2nd element is the (external) URL as a string
      - 3rd element is a dictionary containing URL-specific metadata

    - Message.Urllist:  # obsolete
      - Same as Message.Url, but its 2nd element is a list of multiple URLs
      - The additional URLs serve as a fallback if the primary one fails
    """
    Version = 1
    Directory = 2
    Url = 3
    # Headers = 4
    # Cookies = 5
    Queue = 6
    # Urllist = 7
    # Metadata = 8


gallery_dl-1.25.0/gallery_dl/extractor/misskey.py

# -*- coding: utf-8 -*-

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for Misskey instances"""

from .common import BaseExtractor, Message
from ..
import text class MisskeyExtractor(BaseExtractor): """Base class for Misskey extractors""" basecategory = "misskey" directory_fmt = ("misskey", "{instance}", "{user[username]}") filename_fmt = "{category}_{id}_{file[id]}.{extension}" archive_fmt = "{id}_{file[id]}" def __init__(self, match): BaseExtractor.__init__(self, match) self.api = MisskeyAPI(self) self.instance = self.root.rpartition("://")[2] self.item = match.group(match.lastindex) self.renotes = self.config("renotes", False) self.replies = self.config("replies", True) def items(self): for note in self.notes(): files = note.pop("files") or [] renote = note.get("renote") if renote: if not self.renotes: self.log.debug("Skipping %s (renote)", note["id"]) continue files.extend(renote.get("files") or ()) reply = note.get("reply") if reply: if not self.replies: self.log.debug("Skipping %s (reply)", note["id"]) continue files.extend(reply.get("files") or ()) note["instance"] = self.instance note["instance_remote"] = note["user"]["host"] note["count"] = len(files) note["date"] = text.parse_datetime( note["createdAt"], "%Y-%m-%dT%H:%M:%S.%f%z") yield Message.Directory, note for note["num"], file in enumerate(files, 1): file["date"] = text.parse_datetime( file["createdAt"], "%Y-%m-%dT%H:%M:%S.%f%z") note["file"] = file url = file["url"] yield Message.Url, url, text.nameext_from_url(url, note) def notes(self): """Return an iterable containing all relevant Note objects""" return () BASE_PATTERN = MisskeyExtractor.update({ "misskey.io": { "root": "https://misskey.io", "pattern": r"misskey\.io", }, "lesbian.energy": { "root": "https://lesbian.energy", "pattern": r"lesbian\.energy" }, "sushi.ski": { "root": "https://sushi.ski", "pattern": r"sushi\.ski", }, }) class MisskeyUserExtractor(MisskeyExtractor): """Extractor for all images of a Misskey user""" subcategory = "user" pattern = BASE_PATTERN + r"/@([^/?#]+)/?$" test = ( ("https://misskey.io/@lithla", { "pattern": r"https://s\d+\.arkjp\.net/misskey/[\w-]+\.\w+", "range": "1-50", "count": 50, }), ("https://misskey.io/@blooddj@pawoo.net", { "range": "1-50", "count": 50, }), ("https://lesbian.energy/@rerorero", { "pattern": r"https://lesbian.energy/files/\w+", "range": "1-50", "count": 50, }), ("https://lesbian.energy/@nano@mk.yopo.work"), ("https://sushi.ski/@ui@misskey.04.si"), ) def notes(self): return self.api.users_notes(self.api.user_id_by_username(self.item)) class MisskeyFollowingExtractor(MisskeyExtractor): """Extractor for followed Misskey users""" subcategory = "following" pattern = BASE_PATTERN + r"/@([^/?#]+)/following" test = ( ("https://misskey.io/@blooddj@pawoo.net/following", { "extractor": False, "count": ">= 6", }), ("https://sushi.ski/@hatusimo_sigure/following"), ) def items(self): user_id = self.api.user_id_by_username(self.item) for user in self.api.users_following(user_id): user = user["followee"] url = self.root + "/@" + user["username"] host = user["host"] if host is not None: url += "@" + host user["_extractor"] = MisskeyUserExtractor yield Message.Queue, url, user class MisskeyNoteExtractor(MisskeyExtractor): """Extractor for images from a Note""" subcategory = "note" pattern = BASE_PATTERN + r"/notes/(\w+)" test = ( ("https://misskey.io/notes/9bhqfo835v", { "pattern": r"https://s\d+\.arkjp\.net/misskey/[\w-]+\.\w+", "count": 4, }), ("https://misskey.io/notes/9brq7z1re6"), ("https://sushi.ski/notes/9bm3x4ksqw", { "pattern": r"https://media\.sushi\.ski/files/[\w-]+\.png", "count": 1, }), ("https://lesbian.energy/notes/995ig09wqy", { "count": 1, }), 
("https://lesbian.energy/notes/96ynd9w5kc"), ) def notes(self): return (self.api.notes_show(self.item),) class MisskeyAPI(): """Interface for Misskey API https://github.com/misskey-dev/misskey https://misskey-hub.net/en/docs/api/ https://misskey-hub.net/docs/api/endpoints.html """ def __init__(self, extractor): self.root = extractor.root self.extractor = extractor self.headers = {"Content-Type": "application/json"} def user_id_by_username(self, username): endpoint = "/users/show" data = {"username": username} if "@" in username: data["username"], _, data["host"] = username.partition("@") return self._call(endpoint, data)["id"] def users_following(self, user_id): endpoint = "/users/following" data = {"userId": user_id} return self._pagination(endpoint, data) def users_notes(self, user_id): endpoint = "/users/notes" data = {"userId": user_id} return self._pagination(endpoint, data) def notes_show(self, note_id): endpoint = "/notes/show" data = {"noteId": note_id} return self._call(endpoint, data) def _call(self, endpoint, data): url = self.root + "/api" + endpoint return self.extractor.request( url, method="POST", headers=self.headers, json=data).json() def _pagination(self, endpoint, data): data["limit"] = 100 while True: notes = self._call(endpoint, data) if not notes: return yield from notes data["untilId"] = notes[-1]["id"] ������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/moebooru.py��������������������������������������������������0000644�0001750�0001750�00000021747�14363473075�020772� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2020-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Moebooru based sites""" from .booru import BooruExtractor from .. 
import text import collections import datetime import re class MoebooruExtractor(BooruExtractor): """Base class for Moebooru extractors""" basecategory = "moebooru" filename_fmt = "{category}_{id}_{md5}.{extension}" page_start = 1 @staticmethod def _prepare(post): post["date"] = text.parse_timestamp(post["created_at"]) def _html(self, post): return self.request("{}/post/show/{}".format( self.root, post["id"])).text def _tags(self, post, page): tag_container = text.extr(page, '<ul id="tag-', '</ul>') if not tag_container: return tags = collections.defaultdict(list) pattern = re.compile(r"tag-type-([^\"' ]+).*?[?;]tags=([^\"'+]+)") for tag_type, tag_name in pattern.findall(tag_container): tags[tag_type].append(text.unquote(tag_name)) for key, value in tags.items(): post["tags_" + key] = " ".join(value) def _notes(self, post, page): note_container = text.extr(page, 'id="note-container"', "<img ") if not note_container: return post["notes"] = notes = [] for note in note_container.split('class="note-box"')[1:]: extr = text.extract_from(note) notes.append({ "width" : int(extr("width:", "p")), "height": int(extr("height:", "p")), "y" : int(extr("top:", "p")), "x" : int(extr("left:", "p")), "id" : int(extr('id="note-body-', '"')), "body" : text.unescape(text.remove_html(extr(">", "</div>"))), }) def _pagination(self, url, params): params["page"] = self.page_start params["limit"] = self.per_page while True: posts = self.request(url, params=params).json() yield from posts if len(posts) < self.per_page: return params["page"] += 1 BASE_PATTERN = MoebooruExtractor.update({ "yandere": { "root": "https://yande.re", "pattern": r"yande\.re", }, "konachan": { "root": "https://konachan.com", "pattern": r"konachan\.(?:com|net)", }, "sakugabooru": { "root": "https://www.sakugabooru.com", "pattern": r"(?:www\.)?sakugabooru\.com", }, "lolibooru": { "root": "https://lolibooru.moe", "pattern": r"lolibooru\.moe", }, }) class MoebooruPostExtractor(MoebooruExtractor): subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/post/show/(\d+)" test = ( ("https://yande.re/post/show/51824", { "content": "59201811c728096b2d95ce6896fd0009235fe683", "options": (("tags", True),), "keyword": { "tags_artist": "sasaki_tamaru", "tags_circle": "softhouse_chara", "tags_copyright": "ouzoku", "tags_general": str, }, }), ("https://konachan.com/post/show/205189", { "content": "674e75a753df82f5ad80803f575818b8e46e4b65", "options": (("tags", True),), "keyword": { "tags_artist": "patata", "tags_character": "clownpiece", "tags_copyright": "touhou", "tags_general": str, }, }), ("https://yande.re/post/show/993156", { "content": "fed722bd90f48de41ec163692befc701056e2b1e", "options": (("notes", True),), "keyword": { "notes": [ { "id": 7096, "x" : 90, "y" : 626, "width" : 283, "height": 529, "body" : "Please keep this as a secret for me!!", }, { "id": 7095, "x" : 900, "y" : 438, "width" : 314, "height": 588, "body" : "The facts that I love playing games", }, ], }, }), ("https://lolibooru.moe/post/show/281305/", { "content": "a331430223ffc5b23c31649102e7d49f52489b57", "options": (("notes", True),), "keyword": { "notes": list, }, }), ("https://konachan.net/post/show/205189"), ("https://www.sakugabooru.com/post/show/125570"), ("https://lolibooru.moe/post/show/287835"), ) def __init__(self, match): MoebooruExtractor.__init__(self, match) self.post_id = match.group(match.lastindex) def posts(self): params = {"tags": "id:" + self.post_id} return self.request(self.root + "/post.json", params=params).json() class 
MoebooruTagExtractor(MoebooruExtractor): subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/post\?(?:[^&#]*&)*tags=([^&#]+)" test = ( ("https://yande.re/post?tags=ouzoku+armor", { "content": "59201811c728096b2d95ce6896fd0009235fe683", }), ("https://konachan.com/post?tags=patata", { "content": "838cfb815e31f48160855435655ddf7bfc4ecb8d", }), ("https://konachan.net/post?tags=patata"), ("https://www.sakugabooru.com/post?tags=nichijou"), ("https://lolibooru.moe/post?tags=ruu_%28tksymkw%29"), ) def __init__(self, match): MoebooruExtractor.__init__(self, match) tags = match.group(match.lastindex) self.tags = text.unquote(tags.replace("+", " ")) def metadata(self): return {"search_tags": self.tags} def posts(self): params = {"tags": self.tags} return self._pagination(self.root + "/post.json", params) class MoebooruPoolExtractor(MoebooruExtractor): subcategory = "pool" directory_fmt = ("{category}", "pool", "{pool}") archive_fmt = "p_{pool}_{id}" pattern = BASE_PATTERN + r"/pool/show/(\d+)" test = ( ("https://yande.re/pool/show/318", { "content": "2a35b9d6edecce11cc2918c6dce4de2198342b68", }), ("https://konachan.com/pool/show/95", { "content": "cf0546e38a93c2c510a478f8744e60687b7a8426", }), ("https://konachan.net/pool/show/95"), ("https://www.sakugabooru.com/pool/show/54"), ("https://lolibooru.moe/pool/show/239"), ) def __init__(self, match): MoebooruExtractor.__init__(self, match) self.pool_id = match.group(match.lastindex) def metadata(self): return {"pool": text.parse_int(self.pool_id)} def posts(self): params = {"tags": "pool:" + self.pool_id} return self._pagination(self.root + "/post.json", params) class MoebooruPopularExtractor(MoebooruExtractor): subcategory = "popular" directory_fmt = ("{category}", "popular", "{scale}", "{date}") archive_fmt = "P_{scale[0]}_{date}_{id}" pattern = BASE_PATTERN + \ r"/post/popular_(by_(?:day|week|month)|recent)(?:\?([^#]*))?" 
test = ( ("https://yande.re/post/popular_by_month?month=6&year=2014", { "count": 40, }), ("https://yande.re/post/popular_recent"), ("https://konachan.com/post/popular_by_month?month=11&year=2010", { "count": 20, }), ("https://konachan.com/post/popular_recent"), ("https://konachan.net/post/popular_recent"), ("https://www.sakugabooru.com/post/popular_recent"), ("https://lolibooru.moe/post/popular_recent"), ) def __init__(self, match): MoebooruExtractor.__init__(self, match) self.scale = match.group(match.lastindex-1) self.query = match.group(match.lastindex) def metadata(self): self.params = params = text.parse_query(self.query) if "year" in params: date = "{:>04}-{:>02}-{:>02}".format( params["year"], params.get("month", "01"), params.get("day", "01"), ) else: date = datetime.date.today().isoformat() scale = self.scale if scale.startswith("by_"): scale = scale[3:] if scale == "week": date = datetime.date.fromisoformat(date) date = (date - datetime.timedelta(days=date.weekday())).isoformat() elif scale == "month": date = date[:-3] return {"date": date, "scale": scale} def posts(self): url = "{}/post/popular_{}.json".format(self.root, self.scale) return self.request(url, params=self.params).json() �������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1677967926.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/myhentaigallery.py�������������������������������������������0000644�0001750�0001750�00000005024�14400741066�022315� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://myhentaigallery.com/""" from .common import GalleryExtractor from .. 
import text, exception class MyhentaigalleryGalleryExtractor(GalleryExtractor): """Extractor for image galleries from myhentaigallery.com""" category = "myhentaigallery" directory_fmt = ("{category}", "{gallery_id} {artist:?[/] /J, }{title}") pattern = (r"(?:https?://)?myhentaigallery\.com" r"/gallery/(?:thumbnails|show)/(\d+)") test = ( ("https://myhentaigallery.com/gallery/thumbnails/16247", { "pattern": r"https://images.myhentaicomics\.com/imagesgallery" r"/images/[^/]+/original/\d+\.jpg", "keyword": { "artist" : list, "count" : 11, "gallery_id": 16247, "group" : list, "parodies" : list, "tags" : ["Giantess"], "title" : "Attack Of The 50ft Woman 1", }, }), ("https://myhentaigallery.com/gallery/show/16247/1"), ) root = "https://myhentaigallery.com" def __init__(self, match): self.gallery_id = match.group(1) url = "{}/gallery/thumbnails/{}".format(self.root, self.gallery_id) GalleryExtractor.__init__(self, match, url) self.session.headers["Referer"] = url def metadata(self, page): extr = text.extract_from(page) split = text.split_html title = extr('<div class="comic-description">\n', '</h1>').lstrip() if title.startswith("<h1>"): title = title[len("<h1>"):] if not title: raise exception.NotFoundError("gallery") return { "title" : text.unescape(title), "gallery_id": text.parse_int(self.gallery_id), "tags" : split(extr('<div>\nCategories:', '</div>')), "artist" : split(extr('<div>\nArtists:' , '</div>')), "group" : split(extr('<div>\nGroups:' , '</div>')), "parodies" : split(extr('<div>\nParodies:' , '</div>')), } def images(self, page): return [ (text.unescape(text.extr(url, 'src="', '"')).replace( "/thumbnail/", "/original/"), None) for url in text.extract_iter(page, 'class="comic-thumb"', '</div>') ] ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1674475069.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/myportfolio.py�����������������������������������������������0000644�0001750�0001750�00000007720�14363473075�021521� 
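# ---------------------------------------------------------------------------
# Editor's sketch (not part of the original archive): the thumbnail-to-original
# URL rewrite used by MyhentaigalleryGalleryExtractor.images above, as a tiny
# standalone helper. The sample URL below is invented for illustration only.
# ---------------------------------------------------------------------------
def thumbnail_to_original(url):
    """Map a gallery thumbnail URL to its full-resolution counterpart."""
    return url.replace("/thumbnail/", "/original/")


if __name__ == "__main__":
    sample = ("https://images.myhentaicomics.com/imagesgallery"
              "/images/example-gallery/thumbnail/1.jpg")
    print(thumbnail_to_original(sample))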
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract images from https://www.myportfolio.com/""" from .common import Extractor, Message from .. import text, exception class MyportfolioGalleryExtractor(Extractor): """Extractor for an image gallery on www.myportfolio.com""" category = "myportfolio" subcategory = "gallery" directory_fmt = ("{category}", "{user}", "{title}") filename_fmt = "{num:>02}.{extension}" archive_fmt = "{user}_{filename}" pattern = (r"(?:myportfolio:(?:https?://)?([^/]+)|" r"(?:https?://)?([\w-]+\.myportfolio\.com))" r"(/[^/?&#]+)?") test = ( ("https://andrewling.myportfolio.com/volvo-xc-90-hybrid", { "url": "acea0690c76db0e5cf267648cefd86e921bc3499", "keyword": "6ac6befe2ee0af921d24cf1dd4a4ed71be06db6d", }), ("https://andrewling.myportfolio.com/", { "pattern": r"https://andrewling\.myportfolio\.com/[^/?#+]+$", "count": ">= 6", }), ("https://stevenilousphotography.myportfolio.com/society", { "exception": exception.NotFoundError, }), # custom domain ("myportfolio:https://tooco.com.ar/6-of-diamonds-paradise-bird", { "count": 3, }), ("myportfolio:https://tooco.com.ar/", { "pattern": pattern, "count": ">= 40", }), ) def __init__(self, match): Extractor.__init__(self, match) domain1, domain2, self.path = match.groups() self.domain = domain1 or domain2 self.prefix = "myportfolio:" if domain1 else "" def items(self): url = "https://" + self.domain + (self.path or "") response = self.request(url) if response.history and response.url.endswith(".adobe.com/missing"): raise exception.NotFoundError() page = response.text projects = text.extr( page, '<section class="project-covers', '</section>') if projects: data = {"_extractor": MyportfolioGalleryExtractor} base = self.prefix + "https://" + self.domain for path in text.extract_iter(projects, ' href="', '"'): yield Message.Queue, base + path, data else: data = self.metadata(page) imgs = self.images(page) data["count"] = len(imgs) yield Message.Directory, data for data["num"], url in enumerate(imgs, 1): yield Message.Url, url, text.nameext_from_url(url, data) @staticmethod def metadata(page): """Collect general image metadata""" # og:title contains data as "<user> - <title>", but both # <user> and <title> can contain a "-" as well, so we get the title # from somewhere else and cut that amount from the og:title content extr = text.extract_from(page) user = extr('property="og:title" content="', '"') or \ extr('property=og:title content="', '"') descr = extr('property="og:description" content="', '"') or \ extr('property=og:description content="', '"') title = extr('<h1 ', '</h1>') if title: title = title.partition(">")[2] user = user[:-len(title)-3] elif user: user, _, title = user.partition(" - ") else: raise exception.NotFoundError() return { "user": text.unescape(user), "title": text.unescape(title), "description": text.unescape(descr), } @staticmethod def images(page): """Extract and return a list of all image-urls""" return ( list(text.extract_iter(page, 'js-lightbox" data-src="', 
'"')) or list(text.extract_iter(page, 'data-src="', '"')) ) ������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1675804288.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/nana.py������������������������������������������������������0000644�0001750�0001750�00000007671�14370537200�020045� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://nana.my.id/""" from .common import GalleryExtractor, Extractor, Message from .. 
import text, util, exception class NanaGalleryExtractor(GalleryExtractor): """Extractor for image galleries from nana.my.id""" category = "nana" directory_fmt = ("{category}", "{title}") pattern = r"(?:https?://)?nana\.my\.id/reader/([^/?#]+)" test = ( (("https://nana.my.id/reader/" "059f7de55a4297413bfbd432ce7d6e724dd42bae"), { "pattern": r"https://nana\.my\.id/reader/" r"\w+/image/page\?path=.*\.\w+", "title" : "Everybody Loves Shion", "artist" : "fuzui", "tags" : list, "count" : 29, }), (("https://nana.my.id/reader/" "77c8712b67013e427923573379f5bafcc0c72e46"), { "pattern": r"https://nana\.my\.id/reader/" r"\w+/image/page\?path=.*\.\w+", "title" : "Lovey-Dovey With an Otaku-Friendly Gyaru", "artist" : "Sueyuu", "tags" : ["Sueyuu"], "count" : 58, }), ) def __init__(self, match): self.gallery_id = match.group(1) url = "https://nana.my.id/reader/" + self.gallery_id GalleryExtractor.__init__(self, match, url) def metadata(self, page): title = text.unescape( text.extr(page, '</a>  ', '</div>')) artist = text.unescape(text.extr( page, '<title>', ''))[len(title):-10] tags = text.extr(page, 'Reader.tags = "', '"') return { "gallery_id": self.gallery_id, "title" : title, "artist" : artist[4:] if artist.startswith(" by ") else "", "tags" : tags.split(", ") if tags else (), "lang" : "en", "language" : "English", } def images(self, page): data = util.json_loads(text.extr(page, "Reader.pages = ", ".pages")) return [ ("https://nana.my.id" + image, None) for image in data["pages"] ] class NanaSearchExtractor(Extractor): """Extractor for nana search results""" category = "nana" subcategory = "search" pattern = r"(?:https?://)?nana\.my\.id(?:/?\?([^#]+))" test = ( ('https://nana.my.id/?q=+"elf"&sort=desc', { "pattern": NanaGalleryExtractor.pattern, "range": "1-100", "count": 100, }), ("https://nana.my.id/?q=favorites%3A", { "pattern": NanaGalleryExtractor.pattern, "count": ">= 2", }), ) def __init__(self, match): Extractor.__init__(self, match) self.params = text.parse_query(match.group(1)) self.params["p"] = text.parse_int(self.params.get("p"), 1) self.params["q"] = self.params.get("q") or "" def items(self): if "favorites:" in self.params["q"]: favkey = self.config("favkey") if not favkey: raise exception.AuthenticationError( "'Favorite key' not provided. " "Please see 'https://nana.my.id/tutorial'") self.session.cookies.set("favkey", favkey, domain="nana.my.id") data = {"_extractor": NanaGalleryExtractor} while True: try: page = self.request( "https://nana.my.id", params=self.params).text except exception.HttpError: return for gallery in text.extract_iter( page, '
    ', '
    '): url = "https://nana.my.id" + text.extr( gallery, '', '<') or extr('_postAddDate">', '<'), "%Y. %m. %d. %H:%M") return data def images(self, page): return [ (url.replace("://post", "://blog", 1).partition("?")[0], None) for url in text.extract_iter(page, 'data-lazy-src="', '"') ] class NaverBlogExtractor(NaverBase, Extractor): """Extractor for a user's blog on blog.naver.com""" subcategory = "blog" categorytransfer = True pattern = (r"(?:https?://)?blog\.naver\.com/" r"(?:PostList.nhn\?(?:[^&#]+&)*blogId=([^&#]+)|(\w+)/?$)") test = ( ("https://blog.naver.com/gukjung", { "pattern": NaverPostExtractor.pattern, "count": 12, "range": "1-12", }), ("https://blog.naver.com/PostList.nhn?blogId=gukjung", { "pattern": NaverPostExtractor.pattern, "count": 12, "range": "1-12", }), ) def __init__(self, match): Extractor.__init__(self, match) self.blog_id = match.group(1) or match.group(2) def items(self): # fetch first post number url = "{}/PostList.nhn?blogId={}".format(self.root, self.blog_id) post_num = text.extract( self.request(url).text, 'gnFirstLogNo = "', '"', )[0] # setup params for API calls url = "{}/PostViewBottomTitleListAsync.nhn".format(self.root) params = { "blogId" : self.blog_id, "logNo" : post_num or "0", "viewDate" : "", "categoryNo" : "", "parentCategoryNo" : "", "showNextPage" : "true", "showPreviousPage" : "false", "sortDateInMilli" : "", "isThumbnailViewType": "false", "countPerPage" : "", } # loop over all posts while True: data = self.request(url, params=params).json() for post in data["postList"]: post["url"] = "{}/PostView.nhn?blogId={}&logNo={}".format( self.root, self.blog_id, post["logNo"]) post["_extractor"] = NaverPostExtractor yield Message.Queue, post["url"], post if not data["hasNextPage"]: return params["logNo"] = data["nextIndexLogNo"] params["sortDateInMilli"] = data["nextIndexSortDate"] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/naverwebtoon.py0000644000175000017500000001137614363473075021651 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021 Seonghyeon Cho # Copyright 2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://comic.naver.com/""" from .common import GalleryExtractor, Extractor, Message from .. 
import text import re BASE_PATTERN = (r"(?:https?://)?comic\.naver\.com" r"/(webtoon|challenge|bestChallenge)") class NaverwebtoonBase(): """Base class for naver webtoon extractors""" category = "naverwebtoon" root = "https://comic.naver.com" class NaverwebtoonEpisodeExtractor(NaverwebtoonBase, GalleryExtractor): subcategory = "episode" directory_fmt = ("{category}", "{comic}") filename_fmt = "{episode:>03}-{num:>02}.{extension}" archive_fmt = "{title_id}_{episode}_{num}" pattern = BASE_PATTERN + r"/detail(?:\.nhn)?\?([^#]+)" test = ( (("https://comic.naver.com/webtoon/detail" "?titleId=26458&no=1&weekday=tue"), { "url": "47a956ba8c7a837213d5985f50c569fcff986f75", "content": "3806b6e8befbb1920048de9888dfce6220f69a60", "count": 14 }), (("https://comic.naver.com/challenge/detail" "?titleId=765124&no=1"), { "pattern": r"https://image-comic\.pstatic\.net/nas" r"/user_contents_data/challenge_comic/2021/01/19" r"/342586/upload_7149856273586337846\.jpeg", "count": 1, }), (("https://comic.naver.com/bestChallenge/detail.nhn" "?titleId=771467&no=3"), { "pattern": r"https://image-comic\.pstatic\.net/nas" r"/user_contents_data/challenge_comic/2021/04/28" r"/345534/upload_3617293622396203109\.jpeg", "count": 1, }), ) def __init__(self, match): path, query = match.groups() url = "{}/{}/detail?{}".format(self.root, path, query) GalleryExtractor.__init__(self, match, url) query = text.parse_query(query) self.title_id = query.get("titleId") self.episode = query.get("no") def metadata(self, page): extr = text.extract_from(page) return { "title_id": self.title_id, "episode" : self.episode, "title" : extr('property="og:title" content="', '"'), "comic" : extr('

    ', '', '').strip().split("/"), "description": extr('

    ', '

    '), "genre" : extr('', ''), "date" : extr('
    ', '
    '), } @staticmethod def images(page): view_area = text.extr(page, 'id="comic_view_area"', '

  • ') return [ (url, None) for url in text.extract_iter(view_area, '= 12", }), ) def __init__(self, match): Extractor.__init__(self, match) self.path, query = match.groups() query = text.parse_query(query) self.title_id = query.get("titleId") self.page_no = text.parse_int(query.get("page"), 1) def items(self): url = "{}/{}/list".format(self.root, self.path) params = {"titleId": self.title_id, "page": self.page_no} data = {"_extractor": NaverwebtoonEpisodeExtractor} while True: page = self.request(url, params=params).text data["page"] = self.page_no for episode_url in self.get_episode_urls(page): yield Message.Queue, episode_url, data if 'class="next"' not in page: return params["page"] += 1 def get_episode_urls(self, page): """Extract and return all episode urls in page""" return [ self.root + path for path in re.findall( r'= 400: return {} page = response.text pos = page.find('id="adults_only"') if pos >= 0: msg = text.extract(page, 'class="highlight">', '<', pos)[0] self.log.warning('"%s"', msg) extr = text.extract_from(page) data = extract_data(extr, post_url) data["_comment"] = extr( 'id="author_comments"', '
    ').partition(">")[2] data["comment"] = text.unescape(text.remove_html( data["_comment"], "", "")) data["favorites"] = text.parse_int(extr( 'id="faves_load">', '<').replace(",", "")) data["score"] = text.parse_float(extr('id="score_number">', '<')) data["tags"] = text.split_html(extr('
    ', '
    ')) data["artist"] = [ text.extr(user, '//', '.') for user in text.extract_iter(page, '
    ', '>') ] data["tags"].sort() data["user"] = self.user or data["artist"][0] data["post_url"] = post_url return data @staticmethod def _extract_image_data(extr, url): full = text.extract_from(util.json_loads(extr( '"full_image_text":', '});'))) data = { "title" : text.unescape(extr('"og:title" content="', '"')), "description": text.unescape(extr(':description" content="', '"')), "type" : extr('og:type" content="', '"'), "_type" : "i", "date" : text.parse_datetime(extr( 'itemprop="datePublished" content="', '"')), "rating" : extr('class="rated-', '"'), "url" : full('src="', '"'), "width" : text.parse_int(full('width="', '"')), "height" : text.parse_int(full('height="', '"')), } index = data["url"].rpartition("/")[2].partition("_")[0] data["index"] = text.parse_int(index) data["_index"] = index return data @staticmethod def _extract_audio_data(extr, url): index = url.split("/")[5] return { "title" : text.unescape(extr('"og:title" content="', '"')), "description": text.unescape(extr(':description" content="', '"')), "type" : extr('og:type" content="', '"'), "_type" : "a", "date" : text.parse_datetime(extr( 'itemprop="datePublished" content="', '"')), "url" : extr('{"url":"', '"').replace("\\/", "/"), "index" : text.parse_int(index), "_index" : index, "rating" : "", } def _extract_media_data(self, extr, url): index = url.split("/")[5] title = extr('"og:title" content="', '"') type = extr('og:type" content="', '"') descr = extr('"og:description" content="', '"') src = extr('{"url":"', '"') if src: src = src.replace("\\/", "/") fallback = () date = text.parse_datetime(extr( 'itemprop="datePublished" content="', '"')) else: url = self.root + "/portal/video/" + index headers = { "Accept": "application/json, text/javascript, */*; q=0.01", "X-Requested-With": "XMLHttpRequest", "Referer": self.root, } sources = self.request(url, headers=headers).json()["sources"] if self.format is True: src = sources["360p"][0]["src"].replace(".360p.", ".") formats = sources else: formats = [] for fmt, src in sources.items(): width = text.parse_int(fmt.rstrip("p")) if width <= self.format: formats.append((width, src)) if formats: formats.sort(reverse=True) src, formats = formats[0][1][0]["src"], formats[1:] else: src = "" fallback = self._video_fallback(formats) date = text.parse_timestamp(src.rpartition("?")[2]) return { "title" : text.unescape(title), "url" : src, "date" : date, "type" : type, "_type" : "", "description": text.unescape(descr or extr( 'itemprop="description" content="', '"')), "rating" : extr('class="rated-', '"'), "index" : text.parse_int(index), "_index" : index, "_fallback" : fallback, } @staticmethod def _video_fallback(formats): if isinstance(formats, dict): formats = list(formats.items()) formats.sort(key=lambda fmt: text.parse_int(fmt[0].rstrip("p")), reverse=True) for fmt in formats: yield fmt[1][0]["src"] def _pagination(self, kind): url = "{}/{}".format(self.user_root, kind) params = { "page": 1, "isAjaxRequest": "1", } headers = { "Referer": url, "X-Requested-With": "XMLHttpRequest", } while True: with self.request( url, params=params, headers=headers, fatal=False) as response: try: data = response.json() except ValueError: return if not data: return if "errors" in data: msg = ", ".join(text.unescape(e) for e in data["errors"]) raise exception.StopExtraction(msg) items = data.get("items") if not items: return for year, items in items.items(): for item in items: page_url = text.extr(item, 'href="', '"') if page_url[0] == "/": page_url = self.root + page_url yield page_url more = 
data.get("load_more") if not more or len(more) < 8: return params["page"] += 1 class NewgroundsImageExtractor(NewgroundsExtractor): """Extractor for a single image from newgrounds.com""" subcategory = "image" pattern = (r"(?:https?://)?(?:" r"(?:www\.)?newgrounds\.com/art/view/([^/?#]+)/[^/?#]+" r"|art\.ngfiles\.com/images/\d+/\d+_([^_]+)_([^.]+))") test = ( ("https://www.newgrounds.com/art/view/tomfulp/ryu-is-hawt", { "url": "57f182bcbbf2612690c3a54f16ffa1da5105245e", "content": "8f395e08333eb2457ba8d8b715238f8910221365", "keyword": { "artist" : ["tomfulp"], "comment" : "re:Consider this the bottom threshold for ", "date" : "dt:2009-06-04 14:44:05", "description": "re:Consider this the bottom threshold for ", "favorites" : int, "filename" : "94_tomfulp_ryu-is-hawt", "height" : 476, "index" : 94, "rating" : "e", "score" : float, "tags" : ["ryu", "streetfighter"], "title" : "Ryu is Hawt", "type" : "article", "user" : "tomfulp", "width" : 447, }, }), ("https://art.ngfiles.com/images/0/94_tomfulp_ryu-is-hawt.gif", { "url": "57f182bcbbf2612690c3a54f16ffa1da5105245e", }), ("https://www.newgrounds.com/art/view/sailoryon/yon-dream-buster", { "url": "84eec95e663041a80630df72719f231e157e5f5d", "count": 2, }), # "adult" rated (#2456) ("https://www.newgrounds.com/art/view/kekiiro/red", { "options": (("username", None),), "count": 1, }), ) def __init__(self, match): NewgroundsExtractor.__init__(self, match) if match.group(2): self.user = match.group(2) self.post_url = "https://www.newgrounds.com/art/view/{}/{}".format( self.user, match.group(3)) else: self.post_url = text.ensure_http_scheme(match.group(0)) def posts(self): return (self.post_url,) class NewgroundsMediaExtractor(NewgroundsExtractor): """Extractor for a media file from newgrounds.com""" subcategory = "media" pattern = (r"(?:https?://)?(?:www\.)?newgrounds\.com" r"(/(?:portal/view|audio/listen)/\d+)") test = ( ("https://www.newgrounds.com/portal/view/595355", { "pattern": r"https://uploads\.ungrounded\.net/alternate/564000" r"/564957_alternate_31\.mp4\?1359712249", "keyword": { "artist" : ["kickinthehead", "danpaladin", "tomfulp"], "comment" : "re:My fan trailer for Alien Hominid HD!", "date" : "dt:2013-02-01 09:50:49", "description": "Fan trailer for Alien Hominid HD!", "favorites" : int, "filename" : "564957_alternate_31", "index" : 595355, "rating" : "e", "score" : float, "tags" : ["alienhominid", "trailer"], "title" : "Alien Hominid Fan Trailer", "type" : "movie", "user" : "kickinthehead", }, }), ("https://www.newgrounds.com/audio/listen/609768", { "url": "f4c5490ae559a3b05e46821bb7ee834f93a43c95", "keyword": { "artist" : ["zj", "tomfulp"], "comment" : "re:RECORDED 12-09-2014\n\nFrom The ZJ \"Late ", "date" : "dt:2015-02-23 19:31:59", "description": "From The ZJ Report Show!", "favorites" : int, "index" : 609768, "rating" : "", "score" : float, "tags" : ["fulp", "interview", "tom", "zj"], "title" : "ZJ Interviews Tom Fulp!", "type" : "music.song", "user" : "zj", }, }), # flash animation (#1257) ("https://www.newgrounds.com/portal/view/161181/format/flash", { "pattern": r"https://uploads\.ungrounded\.net/161000" r"/161181_ddautta_mask__550x281_\.swf\?f1081628129", "keyword": {"type": "movie"}, }), # format selection (#1729) ("https://www.newgrounds.com/portal/view/758545", { "options": (("format", "720p"),), "pattern": r"https://uploads\.ungrounded\.net/alternate/1482000" r"/1482860_alternate_102516\.720p\.mp4\?\d+", }), # "adult" rated (#2456) ("https://www.newgrounds.com/portal/view/717744", { "options": (("username", None),), 
"count": 1, }), # flash game ("https://www.newgrounds.com/portal/view/829032", { "pattern": r"https://uploads\.ungrounded\.net/829000" r"/829032_picovsbeardx\.swf\?f1641968445", "range": "1", "keyword": { "artist" : [ "dungeonation", "carpetbakery", "animalspeakandrews", "bill", "chipollo", "dylz49", "gappyshamp", "pinktophat", "rad", "shapeshiftingblob", "tomfulp", "voicesbycorey", "psychogoldfish", ], "comment" : "re:The children are expendable. Take out the ", "date" : "dt:2022-01-10 23:00:57", "description": "Bloodshed in The Big House that Blew...again!", "favorites" : int, "index" : 829032, "post_url" : "https://www.newgrounds.com/portal/view/829032", "rating" : "m", "score" : float, "tags" : [ "assassin", "boyfriend", "darnell", "nene", "pico", "picos-school", ], "title" : "PICO VS BEAR DX", "type" : "game", "url" : "https://uploads.ungrounded.net/829000" "/829032_picovsbeardx.swf?f1641968445", }, }), ) def __init__(self, match): NewgroundsExtractor.__init__(self, match) self.user = "" self.post_url = self.root + match.group(1) def posts(self): return (self.post_url,) class NewgroundsArtExtractor(NewgroundsExtractor): """Extractor for all images of a newgrounds user""" subcategory = _path = "art" pattern = r"(?:https?://)?([\w-]+)\.newgrounds\.com/art/?$" test = ("https://tomfulp.newgrounds.com/art", { "pattern": NewgroundsImageExtractor.pattern, "count": ">= 3", }) class NewgroundsAudioExtractor(NewgroundsExtractor): """Extractor for all audio submissions of a newgrounds user""" subcategory = _path = "audio" pattern = r"(?:https?://)?([\w-]+)\.newgrounds\.com/audio/?$" test = ("https://tomfulp.newgrounds.com/audio", { "pattern": r"https://audio.ngfiles.com/\d+/\d+_.+\.mp3", "count": ">= 4", }) class NewgroundsMoviesExtractor(NewgroundsExtractor): """Extractor for all movies of a newgrounds user""" subcategory = _path = "movies" pattern = r"(?:https?://)?([\w-]+)\.newgrounds\.com/movies/?$" test = ("https://tomfulp.newgrounds.com/movies", { "pattern": r"https://uploads.ungrounded.net(/alternate)?/\d+/\d+_.+", "range": "1-10", "count": 10, }) class NewgroundsGamesExtractor(NewgroundsExtractor): """Extractor for a newgrounds user's games""" subcategory = _path = "games" pattern = r"(?:https?://)?([\w-]+)\.newgrounds\.com/games/?$" test = ("https://tomfulp.newgrounds.com/games", { "pattern": r"https://uploads.ungrounded.net(/alternate)?/\d+/\d+_.+", "range": "1-10", "count": 10, }) class NewgroundsUserExtractor(NewgroundsExtractor): """Extractor for a newgrounds user profile""" subcategory = "user" pattern = r"(?:https?://)?([\w-]+)\.newgrounds\.com/?$" test = ( ("https://tomfulp.newgrounds.com", { "pattern": "https://tomfulp.newgrounds.com/art$", }), ("https://tomfulp.newgrounds.com", { "options": (("include", "all"),), "pattern": "https://tomfulp.newgrounds.com/(art|audio|movies)$", "count": 3, }), ) def items(self): base = self.user_root + "/" return self._dispatch_extractors(( (NewgroundsArtExtractor , base + "art"), (NewgroundsAudioExtractor , base + "audio"), (NewgroundsGamesExtractor , base + "games"), (NewgroundsMoviesExtractor, base + "movies"), ), ("art",)) class NewgroundsFavoriteExtractor(NewgroundsExtractor): """Extractor for posts favorited by a newgrounds user""" subcategory = "favorite" directory_fmt = ("{category}", "{user}", "Favorites") pattern = (r"(?:https?://)?([\w-]+)\.newgrounds\.com" r"/favorites(?!/following)(?:/(art|audio|movies))?/?") test = ( ("https://tomfulp.newgrounds.com/favorites/art", { "range": "1-10", "count": ">= 10", }), 
("https://tomfulp.newgrounds.com/favorites/audio"), ("https://tomfulp.newgrounds.com/favorites/movies"), ("https://tomfulp.newgrounds.com/favorites/"), ) def __init__(self, match): NewgroundsExtractor.__init__(self, match) self.kind = match.group(2) def posts(self): if self.kind: return self._pagination(self.kind) return itertools.chain.from_iterable( self._pagination(k) for k in ("art", "audio", "movies") ) def _pagination(self, kind): url = "{}/favorites/{}".format(self.user_root, kind) params = { "page": 1, "isAjaxRequest": "1", } headers = { "Referer": url, "X-Requested-With": "XMLHttpRequest", } while True: response = self.request(url, params=params, headers=headers) if response.history: return data = response.json() favs = self._extract_favorites(data.get("component") or "") yield from favs if len(favs) < 24: return params["page"] += 1 def _extract_favorites(self, page): return [ self.root + path for path in text.extract_iter( page, 'href="https://www.newgrounds.com', '"') ] class NewgroundsFollowingExtractor(NewgroundsFavoriteExtractor): """Extractor for a newgrounds user's favorited users""" subcategory = "following" pattern = r"(?:https?://)?([\w-]+)\.newgrounds\.com/favorites/(following)" test = ("https://tomfulp.newgrounds.com/favorites/following", { "pattern": NewgroundsUserExtractor.pattern, "range": "76-125", "count": 50, }) def items(self): data = {"_extractor": NewgroundsUserExtractor} for url in self._pagination(self.kind): yield Message.Queue, url, data @staticmethod def _extract_favorites(page): return [ text.ensure_http_scheme(user.rpartition('"')[2]) for user in text.extract_iter(page, 'class="item-user', '">= 400: continue page = response.text data = self._extract_data(page) data["image_id"] = text.parse_int(image_id) if self.user_name: data["user_id"] = self.user_id data["user_name"] = self.user_name else: data["user_id"] = data["artist_id"] data["user_name"] = data["artist_name"] yield Message.Directory, data for image in self._extract_images(page): image.update(data) if not image["extension"]: image["extension"] = "jpg" yield Message.Url, image["url"], image def image_ids(self): """Collect all relevant image-ids""" @staticmethod def _extract_data(page): """Extract image metadata from 'page'""" extr = text.extract_from(page) keywords = text.unescape(extr( 'name="keywords" content="', '" />')).split(",") data = { "title" : keywords[0].strip(), "description": text.unescape(extr( '"description": "', '"').replace("&", "&")), "date" : text.parse_datetime(extr( '"datePublished": "', '"'), "%a %b %d %H:%M:%S %Y", 9), "artist_id" : text.parse_int(extr('/members.php?id=', '"')), "artist_name": keywords[1], "tags" : keywords[2:-1], } return data @staticmethod def _extract_data_horne(page): """Extract image metadata from 'page'""" extr = text.extract_from(page) keywords = text.unescape(extr( 'name="keywords" content="', '" />')).split(",") data = { "title" : keywords[0].strip(), "description": text.unescape(extr( 'property="og:description" content="', '"')), "artist_id" : text.parse_int(extr('members.php?id=', '"')), "artist_name": keywords[1], "tags" : keywords[2:-1], "date" : text.parse_datetime(extr( "itemprop='datePublished' content=", "<").rpartition(">")[2], "%Y-%m-%d %H:%M:%S", 9), } return data @staticmethod def _extract_images(page): """Extract image URLs from 'page'""" images = text.extract_iter(page, "/view_popup.php", "") for num, image in enumerate(images): src = text.extr(image, 'src="', '"') if not src: continue url = ("https:" + src).replace("/__rs_l120x120/", 
"/") yield text.nameext_from_url(url, { "num": num, "url": url, }) @staticmethod def _extract_user_name(page): return text.unescape(text.extr(page, "
    ", "<")) def login(self): """Login and obtain session cookies""" if not self._check_cookies(self.cookienames): username, password = self._get_auth_info() self._update_cookies(self._login_impl(username, password)) @cache(maxage=90*24*3600, keyarg=1) def _login_impl(self, username, password): if not username or not password: raise exception.AuthenticationError( "Username and password required") self.log.info("Logging in as %s", username) url = "{}/login_int.php".format(self.root) data = {"email": username, "password": password, "save": "on"} response = self.request(url, method="POST", data=data) if "/login.php" in response.text: raise exception.AuthenticationError() return self.session.cookies def _pagination(self, path): url = "{}/{}.php".format(self.root, path) params = {"id": self.user_id, "p": 1} while True: page = self.request(url, params=params, notfound="artist").text if self.user_name is None: self.user_name = self._extract_user_name(page) yield from text.extract_iter(page, 'illust_id="', '"') if '= 400: return None user = response.json()["data"] attr = user["attributes"] attr["id"] = user["id"] attr["date"] = text.parse_datetime( attr["created"], "%Y-%m-%dT%H:%M:%S.%f%z") return attr def _filename(self, url): """Fetch filename from an URL's Content-Disposition header""" response = self.request(url, method="HEAD", fatal=False) cd = response.headers.get("Content-Disposition") return text.extr(cd, 'filename="', '"') @staticmethod def _filehash(url): """Extract MD5 hash from a download URL""" parts = url.partition("?")[0].split("/") parts.reverse() for part in parts: if len(part) == 32: return part return "" @staticmethod def _build_url(endpoint, query): return ( "https://www.patreon.com/api/" + endpoint + "?include=campaign,access_rules,attachments,audio,images,media," "native_video_insights,poll.choices," "poll.current_user_responses.user," "poll.current_user_responses.choice," "poll.current_user_responses.poll," "user,user_defined_tags,ti_checks" "&fields[campaign]=currency,show_audio_post_download_links," "avatar_photo_url,avatar_photo_image_urls,earnings_visibility," "is_nsfw,is_monthly,name,url" "&fields[post]=change_visibility_at,comment_count,commenter_count," "content,current_user_can_comment,current_user_can_delete," "current_user_can_view,current_user_has_liked,embed,image," "insights_last_updated_at,is_paid,like_count,meta_image_url," "min_cents_pledged_to_view,post_file,post_metadata,published_at," "patreon_url,post_type,pledge_url,preview_asset_type,thumbnail," "thumbnail_url,teaser_text,title,upgrade_url,url," "was_posted_by_campaign_owner,has_ti_violation,moderation_status," "post_level_suspension_removal_date,pls_one_liners_by_category," "video_preview,view_count" "&fields[post_tag]=tag_type,value" "&fields[user]=image_url,full_name,url" "&fields[access_rule]=access_rule_type,amount_cents" "&fields[media]=id,image_urls,download_url,metadata,file_name" "&fields[native_video_insights]=average_view_duration," "average_view_pct,has_preview,id,last_updated_at,num_views," "preview_views,video_duration" + query + "&json-api-version=1.0" ) def _build_file_generators(self, filetypes): if filetypes is None: return (self._images, self._image_large, self._attachments, self._postfile, self._content) genmap = { "images" : self._images, "image_large": self._image_large, "attachments": self._attachments, "postfile" : self._postfile, "content" : self._content, } if isinstance(filetypes, str): filetypes = filetypes.split(",") return [genmap[ft] for ft in filetypes] def 
_extract_bootstrap(self, page): return util.json_loads(text.extr( page, "window.patreon.bootstrap,", "\n});") + "}") class PatreonCreatorExtractor(PatreonExtractor): """Extractor for a creator's works""" subcategory = "creator" pattern = (r"(?:https?://)?(?:www\.)?patreon\.com" r"/(?!(?:home|join|posts|login|signup)(?:$|[/?#]))" r"([^/?#]+)(?:/posts)?/?(?:\?([^#]+))?") test = ( ("https://www.patreon.com/koveliana", { "range": "1-25", "count": ">= 25", "keyword": { "attachments" : list, "comment_count": int, "content" : str, "creator" : dict, "date" : "type:datetime", "id" : int, "images" : list, "like_count" : int, "post_type" : str, "published_at" : str, "title" : str, }, }), ("https://www.patreon.com/koveliana/posts?filters[month]=2020-3", { "count": 1, "keyword": {"date": "dt:2020-03-30 21:21:44"}, }), ("https://www.patreon.com/kovelianot", { "exception": exception.NotFoundError, }), ("https://www.patreon.com/user?u=2931440"), ("https://www.patreon.com/user/posts/?u=2931440"), ) def __init__(self, match): PatreonExtractor.__init__(self, match) self.creator, self.query = match.groups() def posts(self): query = text.parse_query(self.query) creator_id = query.get("u") if creator_id: url = "{}/user/posts?u={}".format(self.root, creator_id) else: url = "{}/{}/posts".format(self.root, self.creator) page = self.request(url, notfound="creator").text try: data = self._extract_bootstrap(page) campaign_id = data["creator"]["data"]["id"] except (KeyError, ValueError): raise exception.NotFoundError("creator") filters = "".join( "&filter[{}={}".format(key[8:], text.escape(value)) for key, value in query.items() if key.startswith("filters[") ) url = self._build_url("posts", ( "&filter[campaign_id]=" + campaign_id + "&filter[contains_exclusive_posts]=true" "&filter[is_draft]=false" + filters + "&sort=" + query.get("sort", "-published_at") )) return self._pagination(url) class PatreonUserExtractor(PatreonExtractor): """Extractor for media from creators supported by you""" subcategory = "user" pattern = r"(?:https?://)?(?:www\.)?patreon\.com/home$" test = ("https://www.patreon.com/home",) def posts(self): url = self._build_url("stream", ( "&page[cursor]=null" "&filter[is_following]=true" "&json-api-use-default-includes=false" )) return self._pagination(url) class PatreonPostExtractor(PatreonExtractor): """Extractor for media from a single post""" subcategory = "post" pattern = r"(?:https?://)?(?:www\.)?patreon\.com/posts/([^/?#]+)" test = ( # postfile + attachments ("https://www.patreon.com/posts/precious-metal-23563293", { "count": 4, }), # postfile + content ("https://www.patreon.com/posts/56127163", { "count": 3, "keyword": {"filename": r"re:^(?!1).+$"}, }), # tags (#1539) ("https://www.patreon.com/posts/free-post-12497641", { "keyword": {"tags": ["AWMedia"]}, }), ("https://www.patreon.com/posts/not-found-123", { "exception": exception.NotFoundError, }), ) def __init__(self, match): PatreonExtractor.__init__(self, match) self.slug = match.group(1) def posts(self): url = "{}/posts/{}".format(self.root, self.slug) page = self.request(url, notfound="post").text post = self._extract_bootstrap(page)["post"] included = self._transform(post["included"]) return (self._process(post["data"], included),) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674918287.0 gallery_dl-1.25.0/gallery_dl/extractor/philomena.py0000644000175000017500000002034714365234617021111 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can 
redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Philomena sites""" from .booru import BooruExtractor from .. import text, exception import operator class PhilomenaExtractor(BooruExtractor): """Base class for philomena extractors""" basecategory = "philomena" filename_fmt = "{filename}.{extension}" archive_fmt = "{id}" request_interval = 1.0 per_page = 50 _file_url = operator.itemgetter("view_url") @staticmethod def _prepare(post): post["date"] = text.parse_datetime(post["created_at"]) def _pagination(self, url, params): params["page"] = 1 params["per_page"] = self.per_page api_key = self.config("api-key") if api_key: params["key"] = api_key filter_id = self.config("filter") if filter_id: params["filter_id"] = filter_id elif not api_key: try: params["filter_id"] = INSTANCES[self.category]["filter_id"] except (KeyError, TypeError): params["filter_id"] = "2" while True: data = self.request(url, params=params).json() yield from data["images"] if len(data["images"]) < self.per_page: return params["page"] += 1 INSTANCES = { "derpibooru": { "root": "https://derpibooru.org", "pattern": r"(?:www\.)?derpibooru\.org", "filter_id": "56027", }, "ponybooru": { "root": "https://ponybooru.org", "pattern": r"(?:www\.)?ponybooru\.org", "filter_id": "2", }, "furbooru": { "root": "https://furbooru.org", "pattern": r"furbooru\.org", "filter_id": "2", }, } BASE_PATTERN = PhilomenaExtractor.update(INSTANCES) class PhilomenaPostExtractor(PhilomenaExtractor): """Extractor for single posts on a Philomena booru""" subcategory = "post" pattern = BASE_PATTERN + r"/(?:images/)?(\d+)" test = ( ("https://derpibooru.org/images/1", { "content": "88449eeb0c4fa5d3583d0b794f6bc1d70bf7f889", "count": 1, "keyword": { "animated": False, "aspect_ratio": 1.0, "comment_count": int, "created_at": "2012-01-02T03:12:33Z", "date": "dt:2012-01-02 03:12:33", "deletion_reason": None, "description": "", "downvotes": int, "duplicate_of": None, "duration": 0.04, "extension": "png", "faves": int, "first_seen_at": "2012-01-02T03:12:33Z", "format": "png", "height": 900, "hidden_from_users": False, "id": 1, "mime_type": "image/png", "name": "1__safe_fluttershy_solo_cloud_happy_flying_upvotes+ga" "lore_artist-colon-speccysy_get_sunshine", "orig_sha512_hash": None, "processed": True, "representations": dict, "score": int, "sha512_hash": "f16c98e2848c2f1bfff3985e8f1a54375cc49f78125391" "aeb80534ce011ead14e3e452a5c4bc98a66f56bdfcd07e" "f7800663b994f3f343c572da5ecc22a9660f", "size": 860914, "source_url": "https://www.deviantart.com/speccysy/art" "/Afternoon-Flight-215193985", "spoilered": False, "tag_count": int, "tag_ids": list, "tags": list, "thumbnails_generated": True, "updated_at": r"re:\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\dZ", "uploader": "Clover the Clever", "uploader_id": 211188, "upvotes": int, "view_url": str, "width": 900, "wilson_score": float, }, }), ("https://derpibooru.org/1"), ("https://www.derpibooru.org/1"), ("https://www.derpibooru.org/images/1"), ("https://ponybooru.org/images/1", { "content": "bca26f58fafd791fe07adcd2a28efd7751824605", }), ("https://www.ponybooru.org/images/1"), ("https://furbooru.org/images/1", { "content": "9eaa1e1b32fa0f16520912257dbefaff238d5fd2", }), ) def __init__(self, match): PhilomenaExtractor.__init__(self, match) self.image_id = match.group(match.lastindex) def posts(self): url = self.root + "/api/v1/json/images/" + self.image_id return (self.request(url).json()["image"],) class 
PhilomenaSearchExtractor(PhilomenaExtractor): """Extractor for Philomena search results""" subcategory = "search" directory_fmt = ("{category}", "{search_tags}") pattern = BASE_PATTERN + r"/(?:search/?\?([^#]+)|tags/([^/?#]+))" test = ( ("https://derpibooru.org/search?q=cute", { "range": "40-60", "count": 21, }), ("https://derpibooru.org/tags/cute", { "range": "40-60", "count": 21, }), (("https://derpibooru.org/tags/" "artist-colon--dash-_-fwslash--fwslash-%255Bkorroki%255D_aternak"), { "count": ">= 2", }), ("https://ponybooru.org/search?q=cute", { "range": "40-60", "count": 21, }), ("https://furbooru.org/search?q=cute", { "range": "40-60", "count": 21, }), ) def __init__(self, match): PhilomenaExtractor.__init__(self, match) groups = match.groups() if groups[-1]: q = groups[-1].replace("+", " ") for old, new in ( ("-colon-" , ":"), ("-dash-" , "-"), ("-dot-" , "."), ("-plus-" , "+"), ("-fwslash-", "/"), ("-bwslash-", "\\"), ): if old in q: q = q.replace(old, new) self.params = {"q": text.unquote(text.unquote(q))} else: self.params = text.parse_query(groups[-2]) def metadata(self): return {"search_tags": self.params.get("q", "")} def posts(self): url = self.root + "/api/v1/json/search/images" return self._pagination(url, self.params) class PhilomenaGalleryExtractor(PhilomenaExtractor): """Extractor for Philomena galleries""" subcategory = "gallery" directory_fmt = ("{category}", "galleries", "{gallery[id]} {gallery[title]}") pattern = BASE_PATTERN + r"/galleries/(\d+)" test = ( ("https://derpibooru.org/galleries/1", { "pattern": r"https://derpicdn\.net/img/view/\d+/\d+/\d+/\d+[^/]+$", "keyword": { "gallery": { "description": "Indexes start at 1 :P", "id": 1, "spoiler_warning": "", "thumbnail_id": 1, "title": "The Very First Gallery", "user": "DeliciousBlackInk", "user_id": 365446, }, }, }), ("https://ponybooru.org/galleries/27", { "count": ">= 24", }), ("https://furbooru.org/galleries/27", { "count": ">= 13", }), ) def __init__(self, match): PhilomenaExtractor.__init__(self, match) self.gallery_id = match.group(match.lastindex) def metadata(self): url = self.root + "/api/v1/json/search/galleries" params = {"q": "id:" + self.gallery_id} galleries = self.request(url, params=params).json()["galleries"] if not galleries: raise exception.NotFoundError("gallery") return {"gallery": galleries[0]} def posts(self): gallery_id = "gallery_id:" + self.gallery_id url = self.root + "/api/v1/json/search/images" params = {"sd": "desc", "sf": gallery_id, "q": gallery_id} return self._pagination(url, params) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1677790111.0 gallery_dl-1.25.0/gallery_dl/extractor/photobucket.py0000644000175000017500000001473314400205637021454 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract images from https://photobucket.com/""" from .common import Extractor, Message from .. 
import text, exception import binascii import json class PhotobucketAlbumExtractor(Extractor): """Extractor for albums on photobucket.com""" category = "photobucket" subcategory = "album" directory_fmt = ("{category}", "{username}", "{location}") filename_fmt = "{offset:>03}{pictureId:?_//}_{titleOrFilename}.{extension}" archive_fmt = "{id}" pattern = (r"(?:https?://)?((?:[\w-]+\.)?photobucket\.com)" r"/user/[^/?&#]+/library(?:/[^?&#]*)?") test = ( ("https://s369.photobucket.com/user/CrpyLrkr/library", { "pattern": r"https?://[oi]+\d+.photobucket.com/albums/oo139/", "count": ">= 50" }), # subalbums of main "directory" ("https://s271.photobucket.com/user/lakerfanryan/library/", { "options": (("image-filter", "False"),), "pattern": pattern, "count": 1, }), # subalbums of subalbum without images ("https://s271.photobucket.com/user/lakerfanryan/library/Basketball", { "pattern": pattern, "count": ">= 9", }), # private (missing JSON data) ("https://s1277.photobucket.com/user/sinisterkat44/library/", { "count": 0, }), ("https://s1110.photobucket.com/user/chndrmhn100/library/" "Chandu%20is%20the%20King?sort=3&page=1"), ) def __init__(self, match): Extractor.__init__(self, match) self.album_path = "" self.root = "https://" + match.group(1) self.session.headers["Referer"] = self.url def items(self): for image in self.images(): image["titleOrFilename"] = text.unescape(image["titleOrFilename"]) image["title"] = text.unescape(image["title"]) image["extension"] = image["ext"] yield Message.Directory, image yield Message.Url, image["fullsizeUrl"], image if self.config("subalbums", True): for album in self.subalbums(): album["_extractor"] = PhotobucketAlbumExtractor yield Message.Queue, album["url"], album def images(self): """Yield all images of the current album""" url = self.url params = {"sort": "3", "page": 1} while True: page = self.request(url, params=params).text json_data = text.extract(page, "collectionData:", ",\n")[0] if not json_data: msg = text.extr(page, 'libraryPrivacyBlock">', "
    ") msg = ' ("{}")'.format(text.remove_html(msg)) if msg else "" self.log.error("Unable to get JSON data%s", msg) return data = json.loads(json_data) yield from data["items"]["objects"] if data["total"] <= data["offset"] + data["pageSize"]: self.album_path = data["currentAlbumPath"] return params["page"] += 1 def subalbums(self): """Return all subalbum objects""" url = self.root + "/component/Albums-SubalbumList" params = { "albumPath": self.album_path, "fetchSubAlbumsOnly": "true", "deferCollapsed": "true", "json": "1", } data = self.request(url, params=params).json() return data["body"].get("subAlbums", ()) class PhotobucketImageExtractor(Extractor): """Extractor for individual images from photobucket.com""" category = "photobucket" subcategory = "image" directory_fmt = ("{category}", "{username}") filename_fmt = "{pictureId:?/_/}{titleOrFilename}.{extension}" archive_fmt = "{username}_{id}" pattern = (r"(?:https?://)?(?:[\w-]+\.)?photobucket\.com" r"(?:/gallery/user/([^/?&#]+)/media/([^/?&#]+)" r"|/user/([^/?&#]+)/media/[^?&#]+\.html)") test = ( (("https://s271.photobucket.com/user/lakerfanryan" "/media/Untitled-3-1.jpg.html"), { "url": "3b647deeaffc184cc48c89945f67574559c9051f", "keyword": "69732741b2b351db7ecaa77ace2fdb39f08ca5a3", }), (("https://s271.photobucket.com/user/lakerfanryan" "/media/IsotopeswBros.jpg.html?sort=3&o=2"), { "url": "12c1890c09c9cdb8a88fba7eec13f324796a8d7b", "keyword": "61200a223df6c06f45ac3d30c88b3f5b048ce9a8", }), ) def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) or match.group(3) self.media_id = match.group(2) self.session.headers["Referer"] = self.url def items(self): url = "https://photobucket.com/galleryd/search.php" params = {"userName": self.user, "searchTerm": "", "ref": ""} if self.media_id: params["mediaId"] = self.media_id else: params["url"] = self.url # retry API call up to 5 times, since it can randomly fail tries = 0 while tries < 5: data = self.request(url, method="POST", params=params).json() image = data["mediaDocuments"] if "message" not in image: break # success tries += 1 self.log.debug(image["message"]) else: raise exception.StopExtraction(image["message"]) # adjust metadata entries to be at least somewhat similar # to what the 'album' extractor provides if "media" in image: image = image["media"][image["mediaIndex"]] image["albumView"] = data["mediaDocuments"]["albumView"] image["username"] = image["ownerId"] else: image["fileUrl"] = image.pop("imageUrl") image.setdefault("title", "") image.setdefault("description", "") name, _, ext = image["fileUrl"].rpartition("/")[2].rpartition(".") image["ext"] = image["extension"] = ext image["titleOrFilename"] = image["title"] or name image["tags"] = image.pop("clarifaiTagList", []) mtype, _, mid = binascii.a2b_base64(image["id"]).partition(b":") image["pictureId"] = mid.decode() if mtype == b"mediaId" else "" yield Message.Directory, image yield Message.Url, image["fileUrl"], image ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1653908283.0 gallery_dl-1.25.0/gallery_dl/extractor/photovogue.py0000644000175000017500000000544014245121473021321 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.vogue.com/photovogue/""" from .common import Extractor, Message from .. 
import text BASE_PATTERN = r"(?:https?://)?(?:www\.)?vogue\.com/photovogue" class PhotovogueUserExtractor(Extractor): category = "photovogue" subcategory = "user" directory_fmt = ("{category}", "{photographer[id]} {photographer[name]}") filename_fmt = "{id} {title}.{extension}" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/photographers/(\d+)" test = ( ("https://www.vogue.com/photovogue/photographers/221252"), ("https://vogue.com/photovogue/photographers/221252", { "pattern": r"https://images.vogue.it/Photovogue/[^/]+_gallery.jpg", "keyword": { "date": "type:datetime", "favorite_count": int, "favorited": list, "id": int, "image_id": str, "is_favorite": False, "orientation": "re:portrait|landscape", "photographer": { "biography": "Born in 1995. Live in Bologna.", "city": "Bologna", "country_id": 106, "favoritedCount": int, "id": 221252, "isGold": bool, "isPro": bool, "latitude": str, "longitude": str, "name": "Arianna Mattarozzi", "user_id": "38cb0601-4a85-453c-b7dc-7650a037f2ab", "websites": list, }, "photographer_id": 221252, "tags": list, "title": str, }, }), ) def __init__(self, match): Extractor.__init__(self, match) self.user_id = match.group(1) def items(self): for photo in self.photos(): url = photo["gallery_image"] photo["title"] = photo["title"].strip() photo["date"] = text.parse_datetime( photo["date"], "%Y-%m-%dT%H:%M:%S.%f%z") yield Message.Directory, photo yield Message.Url, url, text.nameext_from_url(url, photo) def photos(self): url = "https://api.vogue.com/production/photos" params = { "count": "50", "order_by": "DESC", "page": 0, "photographer_id": self.user_id, } while True: data = self.request(url, params=params).json() yield from data["items"] if not data["has_next"]: break params["page"] += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1651573353.0 gallery_dl-1.25.0/gallery_dl/extractor/picarto.py0000644000175000017500000000473114234201151020553 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://picarto.tv/""" from .common import Extractor, Message from .. 
import text class PicartoGalleryExtractor(Extractor): """Extractor for picarto galleries""" category = "picarto" subcategory = "gallery" root = "https://picarto.tv" directory_fmt = ("{category}", "{channel[name]}") filename_fmt = "{id} {title}.{extension}" archive_fmt = "{id}" pattern = r"(?:https?://)?picarto\.tv/([^/?#]+)/gallery" test = ("https://picarto.tv/fnook/gallery/default/", { "pattern": r"https://images\.picarto\.tv/gallery/\d/\d\d/\d+/artwork" r"/[0-9a-f-]+/large-[0-9a-f]+\.(jpg|png|gif)", "count": ">= 7", "keyword": {"date": "type:datetime"}, }) def __init__(self, match): Extractor.__init__(self, match) self.username = match.group(1) def items(self): for post in self.posts(): post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%d %H:%M:%S") variations = post.pop("variations", ()) yield Message.Directory, post image = post["default_image"] if not image: continue url = "https://images.picarto.tv/gallery/" + image["name"] text.nameext_from_url(url, post) yield Message.Url, url, post for variation in variations: post.update(variation) image = post["default_image"] url = "https://images.picarto.tv/gallery/" + image["name"] text.nameext_from_url(url, post) yield Message.Url, url, post def posts(self): url = "https://ptvintern.picarto.tv/api/channel-gallery" params = { "first": "30", "page": 1, "filter_params[album_id]": "", "filter_params[channel_name]": self.username, "filter_params[q]": "", "filter_params[visibility]": "", "order_by[field]": "published_at", "order_by[order]": "DESC", } while True: posts = self.request(url, params=params).json() if not posts: return yield from posts params["page"] += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/piczel.py0000644000175000017500000001117414363473075020422 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2018-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://piczel.tv/""" from .common import Extractor, Message from .. 
import text class PiczelExtractor(Extractor): """Base class for piczel extractors""" category = "piczel" directory_fmt = ("{category}", "{user[username]}") filename_fmt = "{category}_{id}_{title}_{num:>02}.{extension}" archive_fmt = "{id}_{num}" root = "https://piczel.tv" api_root = "https://tombstone.piczel.tv" def items(self): for post in self.posts(): post["tags"] = [t["title"] for t in post["tags"] if t["title"]] post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z") if post["multi"]: images = post["images"] del post["images"] yield Message.Directory, post for post["num"], image in enumerate(images): if "id" in image: del image["id"] post.update(image) url = post["image"]["url"] yield Message.Url, url, text.nameext_from_url(url, post) else: yield Message.Directory, post post["num"] = 0 url = post["image"]["url"] yield Message.Url, url, text.nameext_from_url(url, post) def posts(self): """Return an iterable with all relevant post objects""" def _pagination(self, url, folder_id=None): params = { "from_id" : None, "folder_id": folder_id, } while True: data = self.request(url, params=params).json() if not data: return params["from_id"] = data[-1]["id"] for post in data: if not folder_id or folder_id == post["folder_id"]: yield post class PiczelUserExtractor(PiczelExtractor): """Extractor for all images from a user's gallery""" subcategory = "user" pattern = r"(?:https?://)?(?:www\.)?piczel\.tv/gallery/([^/?#]+)/?$" test = ("https://piczel.tv/gallery/Bikupan", { "range": "1-100", "count": ">= 100", }) def __init__(self, match): PiczelExtractor.__init__(self, match) self.user = match.group(1) def posts(self): url = "{}/api/users/{}/gallery".format(self.api_root, self.user) return self._pagination(url) class PiczelFolderExtractor(PiczelExtractor): """Extractor for images inside a user's folder""" subcategory = "folder" directory_fmt = ("{category}", "{user[username]}", "{folder[name]}") archive_fmt = "f{folder[id]}_{id}_{num}" pattern = (r"(?:https?://)?(?:www\.)?piczel\.tv" r"/gallery/(?!image)([^/?#]+)/(\d+)") test = ("https://piczel.tv/gallery/Lulena/1114", { "count": ">= 4", }) def __init__(self, match): PiczelExtractor.__init__(self, match) self.user, self.folder_id = match.groups() def posts(self): url = "{}/api/users/{}/gallery".format(self.api_root, self.user) return self._pagination(url, int(self.folder_id)) class PiczelImageExtractor(PiczelExtractor): """Extractor for individual images""" subcategory = "image" pattern = r"(?:https?://)?(?:www\.)?piczel\.tv/gallery/image/(\d+)" test = ("https://piczel.tv/gallery/image/7807", { "pattern": r"https://(\w+\.)?piczel\.tv/static/uploads/gallery_image" r"/32920/image/7807/1532236438-Lulena\.png", "content": "df9a053a24234474a19bce2b7e27e0dec23bff87", "keyword": { "created_at": "2018-07-22T05:13:58.000Z", "date": "dt:2018-07-22 05:13:58", "description": None, "extension": "png", "favorites_count": int, "folder_id": 1113, "id": 7807, "is_flash": False, "is_video": False, "multi": False, "nsfw": False, "num": 0, "password_protected": False, "tags": ["fanart", "commission", "altair", "recreators"], "title": "Altair", "user": dict, "views": int, }, }) def __init__(self, match): PiczelExtractor.__init__(self, match) self.image_id = match.group(1) def posts(self): url = "{}/api/gallery/{}".format(self.api_root, self.image_id) return (self.request(url).json(),) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 
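# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original source): the Piczel extractor
# above paginates with a "from_id" cursor instead of page numbers -- each
# request passes the id of the last item already received and stops on an
# empty response.  A minimal standalone version of that scheme, assuming the
# public gallery endpoint used by PiczelUserExtractor above and the
# third-party "requests" library, might look like this:

import requests


def piczel_gallery(user, folder_id=None):
    """Yield gallery posts for 'user', newest first (sketch only)."""
    url = "https://tombstone.piczel.tv/api/users/{}/gallery".format(user)
    params = {"from_id": None, "folder_id": folder_id}
    while True:
        data = requests.get(url, params=params).json()
        if not data:                         # empty list -> no more posts
            return
        params["from_id"] = data[-1]["id"]   # continue after the last id seen
        for post in data:
            # when a folder is requested, keep only posts from that folder
            if not folder_id or folder_id == post["folder_id"]:
                yield post
# ---------------------------------------------------------------------------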
gallery_dl-1.25.0/gallery_dl/extractor/pillowfort.py0000644000175000017500000001622114363473075021333 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.pillowfort.social/""" from .common import Extractor, Message from ..cache import cache from .. import text, exception import re BASE_PATTERN = r"(?:https?://)?www\.pillowfort\.social" class PillowfortExtractor(Extractor): """Base class for pillowfort extractors""" category = "pillowfort" root = "https://www.pillowfort.social" directory_fmt = ("{category}", "{username}") filename_fmt = ("{post_id} {title|original_post[title]:?/ /}" "{num:>02}.{extension}") archive_fmt = "{id}" cookiedomain = "www.pillowfort.social" def __init__(self, match): Extractor.__init__(self, match) self.item = match.group(1) def items(self): self.login() inline = self.config("inline", True) reblogs = self.config("reblogs", False) external = self.config("external", False) if inline: inline = re.compile(r'src="(https://img\d+\.pillowfort\.social' r'/posts/[^"]+)').findall for post in self.posts(): if "original_post" in post and not reblogs: continue files = post.pop("media") if inline: for url in inline(post["content"]): files.append({"url": url}) post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z") post["post_id"] = post.pop("id") yield Message.Directory, post post["num"] = 0 for file in files: url = file["url"] if not url: continue if file.get("embed_code"): if not external: continue msgtype = Message.Queue else: post["num"] += 1 msgtype = Message.Url post.update(file) text.nameext_from_url(url, post) post["hash"], _, post["filename"] = \ post["filename"].partition("_") if "id" not in file: post["id"] = post["hash"] if "created_at" in file: post["date"] = text.parse_datetime( file["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z") yield msgtype, url, post def login(self): cget = self.session.cookies.get if cget("_Pf_new_session", domain=self.cookiedomain) \ or cget("remember_user_token", domain=self.cookiedomain): return username, password = self._get_auth_info() if username: cookies = self._login_impl(username, password) self._update_cookies(cookies) @cache(maxage=14*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = "https://www.pillowfort.social/users/sign_in" page = self.request(url).text auth = text.extr(page, 'name="authenticity_token" value="', '"') headers = {"Origin": self.root, "Referer": url} data = { "utf8" : "✓", "authenticity_token": auth, "user[email]" : username, "user[password]" : password, "user[remember_me]" : "1", } response = self.request(url, method="POST", headers=headers, data=data) if not response.history: raise exception.AuthenticationError() return { cookie.name: cookie.value for cookie in response.history[0].cookies } class PillowfortPostExtractor(PillowfortExtractor): """Extractor for a single pillowfort post""" subcategory = "post" pattern = BASE_PATTERN + r"/posts/(\d+)" test = ( ("https://www.pillowfort.social/posts/27510", { "pattern": r"https://img\d+\.pillowfort\.social" r"/posts/\w+_out\d+\.png", "count": 4, "keyword": { "avatar_url": str, "col": 0, "commentable": True, "comments_count": int, "community_id": None, "content": str, "created_at": str, "date": "type:datetime", "deleted": None, "deleted_at": None, 
"deleted_by_mod": None, "deleted_for_flag_id": None, "embed_code": None, "id": int, "last_activity": str, "last_activity_elapsed": str, "last_edited_at": str, "likes_count": int, "media_type": "picture", "nsfw": False, "num": int, "original_post_id": None, "original_post_user_id": None, "picture_content_type": None, "picture_file_name": None, "picture_file_size": None, "picture_updated_at": None, "post_id": 27510, "post_type": "picture", "privacy": "public", "reblog_copy_info": list, "rebloggable": True, "reblogged_from_post_id": None, "reblogged_from_user_id": None, "reblogs_count": int, "row": int, "small_image_url": None, "tags": list, "time_elapsed": str, "timestamp": str, "title": "What is Pillowfort.social?", "updated_at": str, "url": r"re:https://img3.pillowfort.social/posts/.*\.png", "user_id": 5, "username": "Staff" }, }), ("https://www.pillowfort.social/posts/1557500", { "options": (("external", True), ("inline", False)), "pattern": r"https://twitter\.com/Aliciawitdaart/status" r"/1282862493841457152", }), ("https://www.pillowfort.social/posts/1672518", { "options": (("inline", True),), "count": 3, }), ) def posts(self): url = "{}/posts/{}/json/".format(self.root, self.item) return (self.request(url).json(),) class PillowfortUserExtractor(PillowfortExtractor): """Extractor for all posts of a pillowfort user""" subcategory = "user" pattern = BASE_PATTERN + r"/(?!posts/)([^/?#]+)" test = ("https://www.pillowfort.social/Pome", { "pattern": r"https://img\d+\.pillowfort\.social/posts/", "range": "1-15", "count": 15, }) def posts(self): url = "{}/{}/json/".format(self.root, self.item) params = {"p": 1} while True: posts = self.request(url, params=params).json()["posts"] yield from posts if len(posts) < 20: return params["p"] += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1676722307.0 gallery_dl-1.25.0/gallery_dl/extractor/pinterest.py0000644000175000017500000004651714374140203021144 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.pinterest.com/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache import itertools BASE_PATTERN = r"(?:https?://)?(?:\w+\.)?pinterest\.[\w.]+" class PinterestExtractor(Extractor): """Base class for pinterest extractors""" category = "pinterest" filename_fmt = "{category}_{id}{media_id:?_//}.{extension}" archive_fmt = "{id}{media_id}" root = "https://www.pinterest.com" def __init__(self, match): Extractor.__init__(self, match) domain = self.config("domain") if not domain or domain == "auto" : self.root = text.root_from_url(match.group(0)) else: self.root = text.ensure_http_scheme(domain) self.api = PinterestAPI(self) def items(self): self.api.login() data = self.metadata() videos = self.config("videos", True) yield Message.Directory, data for pin in self.pins(): if isinstance(pin, tuple): url, data = pin yield Message.Queue, url, data continue pin.update(data) carousel_data = pin.get("carousel_data") if carousel_data: for num, slot in enumerate(carousel_data["carousel_slots"], 1): slot["media_id"] = slot.pop("id") pin.update(slot) pin["num"] = num size, image = next(iter(slot["images"].items())) url = image["url"].replace("/" + size + "/", "/originals/") yield Message.Url, url, text.nameext_from_url(url, pin) else: try: media = self._media_from_pin(pin) except Exception: self.log.debug("Unable to fetch download URL for pin %s", pin.get("id")) continue if videos or media.get("duration") is None: pin.update(media) pin["num"] = 0 pin["media_id"] = "" url = media["url"] text.nameext_from_url(url, pin) if pin["extension"] == "m3u8": url = "ytdl:" + url pin["extension"] = "mp4" yield Message.Url, url, pin def metadata(self): """Return general metadata""" def pins(self): """Return all relevant pin objects""" @staticmethod def _media_from_pin(pin): videos = pin.get("videos") if videos: video_formats = videos["video_list"] for fmt in ("V_HLSV4", "V_HLSV3_WEB", "V_HLSV3_MOBILE"): if fmt in video_formats: media = video_formats[fmt] break else: media = max(video_formats.values(), key=lambda x: x.get("width", 0)) if "V_720P" in video_formats: media["_fallback"] = (video_formats["V_720P"]["url"],) return media return pin["images"]["orig"] class PinterestPinExtractor(PinterestExtractor): """Extractor for images from a single pin from pinterest.com""" subcategory = "pin" pattern = BASE_PATTERN + r"/pin/([^/?#&]+)(?!.*#related$)" test = ( ("https://www.pinterest.com/pin/858146903966145189/", { "url": "afb3c26719e3a530bb0e871c480882a801a4e8a5", "content": ("4c435a66f6bb82bb681db2ecc888f76cf6c5f9ca", "d3e24bc9f7af585e8c23b9136956bd45a4d9b947"), }), # video pin (#1189) ("https://www.pinterest.com/pin/422564377542934214/", { "pattern": r"https://v\.pinimg\.com/videos/mc/hls/d7/22/ff" r"/d722ff00ab2352981b89974b37909de8.m3u8", }), ("https://www.pinterest.com/pin/858146903966145188/", { "exception": exception.NotFoundError, }), ) def __init__(self, match): PinterestExtractor.__init__(self, match) self.pin_id = match.group(1) self.pin = None def metadata(self): self.pin = self.api.pin(self.pin_id) return self.pin def pins(self): return (self.pin,) class PinterestBoardExtractor(PinterestExtractor): """Extractor for images from a board from pinterest.com""" subcategory = "board" directory_fmt = ("{category}", "{board[owner][username]}", "{board[name]}") archive_fmt = "{board[id]}_{id}" pattern = (BASE_PATTERN + r"/(?!pin/)([^/?#&]+)" "/(?!_saved|_created|pins/)([^/?#&]+)/?$") test = ( ("https://www.pinterest.com/g1952849/test-/", { "pattern": r"https://i\.pinimg\.com/originals/", "count": 2, }), # board with sections (#835) 
("https://www.pinterest.com/g1952849/stuff/", { "options": (("sections", True),), "count": 4, }), # secret board (#1055) ("https://www.pinterest.de/g1952849/secret/", { "count": 2, }), ("https://www.pinterest.com/g1952848/test/", { "exception": exception.GalleryDLException, }), # .co.uk TLD (#914) ("https://www.pinterest.co.uk/hextra7519/based-animals/"), ) def __init__(self, match): PinterestExtractor.__init__(self, match) self.user = text.unquote(match.group(1)) self.board_name = text.unquote(match.group(2)) self.board = None def metadata(self): self.board = self.api.board(self.user, self.board_name) return {"board": self.board} def pins(self): board = self.board pins = self.api.board_pins(board["id"]) if board["section_count"] and self.config("sections", True): base = "{}/{}/{}/id:".format( self.root, board["owner"]["username"], board["name"]) data = {"_extractor": PinterestSectionExtractor} sections = [(base + section["id"], data) for section in self.api.board_sections(board["id"])] pins = itertools.chain(pins, sections) return pins class PinterestUserExtractor(PinterestExtractor): """Extractor for a user's boards""" subcategory = "user" pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)(?:/_saved)?/?$" test = ( ("https://www.pinterest.com/g1952849/", { "pattern": PinterestBoardExtractor.pattern, "count": ">= 2", }), ("https://www.pinterest.com/g1952849/_saved/"), ) def __init__(self, match): PinterestExtractor.__init__(self, match) self.user = text.unquote(match.group(1)) def items(self): for board in self.api.boards(self.user): url = board.get("url") if url: board["_extractor"] = PinterestBoardExtractor yield Message.Queue, self.root + url, board class PinterestAllpinsExtractor(PinterestExtractor): """Extractor for a user's 'All Pins' feed""" subcategory = "allpins" directory_fmt = ("{category}", "{user}") pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)/pins/?$" test = ("https://www.pinterest.com/g1952849/pins/", { "pattern": r"https://i\.pinimg\.com/originals/[0-9a-f]{2}" r"/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{32}\.\w{3}", "count": 7, }) def __init__(self, match): PinterestExtractor.__init__(self, match) self.user = text.unquote(match.group(1)) def metadata(self): return {"user": self.user} def pins(self): return self.api.user_pins(self.user) class PinterestCreatedExtractor(PinterestExtractor): """Extractor for a user's created pins""" subcategory = "created" directory_fmt = ("{category}", "{user}") pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)/_created/?$" test = ("https://www.pinterest.de/digitalmomblog/_created/", { "pattern": r"https://i\.pinimg\.com/originals/[0-9a-f]{2}" r"/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{32}\.jpg", "count": 10, "range": "1-10", }) def __init__(self, match): PinterestExtractor.__init__(self, match) self.user = text.unquote(match.group(1)) def metadata(self): return {"user": self.user} def pins(self): return self.api.user_activity_pins(self.user) class PinterestSectionExtractor(PinterestExtractor): """Extractor for board sections on pinterest.com""" subcategory = "section" directory_fmt = ("{category}", "{board[owner][username]}", "{board[name]}", "{section[title]}") archive_fmt = "{board[id]}_{id}" pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)/([^/?#&]+)/([^/?#&]+)" test = ("https://www.pinterest.com/g1952849/stuff/section", { "count": 2, }) def __init__(self, match): PinterestExtractor.__init__(self, match) self.user = text.unquote(match.group(1)) self.board_slug = text.unquote(match.group(2)) self.section_slug = text.unquote(match.group(3)) self.section = None 
def metadata(self): if self.section_slug.startswith("id:"): section = self.section = self.api.board_section( self.section_slug[3:]) else: section = self.section = self.api.board_section_by_name( self.user, self.board_slug, self.section_slug) section.pop("preview_pins", None) return {"board": section.pop("board"), "section": section} def pins(self): return self.api.board_section_pins(self.section["id"]) class PinterestSearchExtractor(PinterestExtractor): """Extractor for Pinterest search results""" subcategory = "search" directory_fmt = ("{category}", "Search", "{search}") pattern = BASE_PATTERN + r"/search/pins/?\?q=([^&#]+)" test = ("https://www.pinterest.com/search/pins/?q=nature", { "range": "1-50", "count": ">= 50", }) def __init__(self, match): PinterestExtractor.__init__(self, match) self.search = text.unquote(match.group(1)) def metadata(self): return {"search": self.search} def pins(self): return self.api.search(self.search) class PinterestRelatedPinExtractor(PinterestPinExtractor): """Extractor for related pins of another pin from pinterest.com""" subcategory = "related-pin" directory_fmt = ("{category}", "related {original_pin[id]}") pattern = BASE_PATTERN + r"/pin/([^/?#&]+).*#related$" test = ("https://www.pinterest.com/pin/858146903966145189/#related", { "range": "31-70", "count": 40, "archive": False, }) def metadata(self): return {"original_pin": self.api.pin(self.pin_id)} def pins(self): return self.api.pin_related(self.pin_id) class PinterestRelatedBoardExtractor(PinterestBoardExtractor): """Extractor for related pins of a board from pinterest.com""" subcategory = "related-board" directory_fmt = ("{category}", "{board[owner][username]}", "{board[name]}", "related") pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)/([^/?#&]+)/?#related$" test = ("https://www.pinterest.com/g1952849/test-/#related", { "range": "31-70", "count": 40, "archive": False, }) def pins(self): return self.api.board_related(self.board["id"]) class PinterestPinitExtractor(PinterestExtractor): """Extractor for images from a pin.it URL""" subcategory = "pinit" pattern = r"(?:https?://)?pin\.it/([^/?#&]+)" test = ( ("https://pin.it/Hvt8hgT", { "url": "8daad8558382c68f0868bdbd17d05205184632fa", }), ("https://pin.it/Hvt8hgS", { "exception": exception.NotFoundError, }), ) def __init__(self, match): PinterestExtractor.__init__(self, match) self.shortened_id = match.group(1) def items(self): url = "https://api.pinterest.com/url_shortener/{}/redirect".format( self.shortened_id) response = self.request(url, method="HEAD", allow_redirects=False) location = response.headers.get("Location") if not location or not PinterestPinExtractor.pattern.match(location): raise exception.NotFoundError("pin") yield Message.Queue, location, {"_extractor": PinterestPinExtractor} class PinterestAPI(): """Minimal interface for the Pinterest Web API For a better and more complete implementation in PHP, see - https://github.com/seregazhuk/php-pinterest-bot """ def __init__(self, extractor): csrf_token = util.generate_token() self.extractor = extractor self.root = extractor.root self.cookies = {"csrftoken": csrf_token} self.headers = { "Accept" : "application/json, text/javascript, " "*/*, q=0.01", "Accept-Language" : "en-US,en;q=0.5", "Referer" : self.root + "/", "X-Requested-With" : "XMLHttpRequest", "X-APP-VERSION" : "0c4af40", "X-CSRFToken" : csrf_token, "X-Pinterest-AppState": "active", "Origin" : self.root, } def pin(self, pin_id): """Query information about a pin""" options = {"id": pin_id, "field_set_key": "detailed"} return 
self._call("Pin", options)["resource_response"]["data"] def pin_related(self, pin_id): """Yield related pins of another pin""" options = {"pin": pin_id, "add_vase": True, "pins_only": True} return self._pagination("RelatedPinFeed", options) def board(self, user, board_name): """Query information about a board""" options = {"slug": board_name, "username": user, "field_set_key": "detailed"} return self._call("Board", options)["resource_response"]["data"] def boards(self, user): """Yield all boards from 'user'""" options = { "sort" : "last_pinned_to", "field_set_key" : "profile_grid_item", "filter_stories" : False, "username" : user, "page_size" : 25, "include_archived": True, } return self._pagination("Boards", options) def board_pins(self, board_id): """Yield all pins of a specific board""" options = {"board_id": board_id} return self._pagination("BoardFeed", options) def board_section(self, section_id): """Yield a specific board section""" options = {"section_id": section_id} return self._call("BoardSection", options)["resource_response"]["data"] def board_section_by_name(self, user, board_slug, section_slug): """Yield a board section by name""" options = {"board_slug": board_slug, "section_slug": section_slug, "username": user} return self._call("BoardSection", options)["resource_response"]["data"] def board_sections(self, board_id): """Yield all sections of a specific board""" options = {"board_id": board_id} return self._pagination("BoardSections", options) def board_section_pins(self, section_id): """Yield all pins from a board section""" options = {"section_id": section_id} return self._pagination("BoardSectionPins", options) def board_related(self, board_id): """Yield related pins of a specific board""" options = {"board_id": board_id, "add_vase": True} return self._pagination("BoardRelatedPixieFeed", options) def user_pins(self, user): """Yield all pins from 'user'""" options = { "is_own_profile_pins": False, "username" : user, "field_set_key" : "grid_item", "pin_filter" : None, } return self._pagination("UserPins", options) def user_activity_pins(self, user): """Yield pins created by 'user'""" options = { "exclude_add_pin_rep": True, "field_set_key" : "grid_item", "is_own_profile_pins": False, "username" : user, } return self._pagination("UserActivityPins", options) def search(self, query): """Yield pins from searches""" options = {"query": query, "scope": "pins", "rs": "typed"} return self._pagination("BaseSearch", options) def login(self): """Login and obtain session cookies""" username, password = self.extractor._get_auth_info() if username: self.cookies.update(self._login_impl(username, password)) @cache(maxage=180*24*3600, keyarg=1) def _login_impl(self, username, password): self.extractor.log.info("Logging in as %s", username) url = self.root + "/resource/UserSessionResource/create/" options = { "username_or_email": username, "password" : password, } data = { "data" : util.json_dumps({"options": options}), "source_url": "", } try: response = self.extractor.request( url, method="POST", headers=self.headers, cookies=self.cookies, data=data) resource = response.json()["resource_response"] except (exception.HttpError, ValueError, KeyError): raise exception.AuthenticationError() if resource["status"] != "success": raise exception.AuthenticationError() return { cookie.name: cookie.value for cookie in response.cookies } def _call(self, resource, options): url = "{}/resource/{}Resource/get/".format(self.root, resource) params = { "data" : util.json_dumps({"options": options}), 
"source_url": "", } response = self.extractor.request( url, params=params, headers=self.headers, cookies=self.cookies, fatal=False) try: data = response.json() except ValueError: data = {} if response.history: self.root = text.root_from_url(response.url) if response.status_code < 400: return data if response.status_code == 404: resource = self.extractor.subcategory.rpartition("-")[2] raise exception.NotFoundError(resource) self.extractor.log.debug("Server response: %s", response.text) raise exception.StopExtraction("API request failed") def _pagination(self, resource, options): while True: data = self._call(resource, options) results = data["resource_response"]["data"] if isinstance(results, dict): results = results["results"] yield from results try: bookmarks = data["resource"]["options"]["bookmarks"] if (not bookmarks or bookmarks[0] == "-end-" or bookmarks[0].startswith("Y2JOb25lO")): return options["bookmarks"] = bookmarks except KeyError: return ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/pixiv.py0000644000175000017500000010520514363473075020272 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.pixiv.net/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache, memcache from datetime import datetime, timedelta import itertools import hashlib class PixivExtractor(Extractor): """Base class for pixiv extractors""" category = "pixiv" root = "https://www.pixiv.net" directory_fmt = ("{category}", "{user[id]} {user[account]}") filename_fmt = "{id}_p{num}.{extension}" archive_fmt = "{id}{suffix}.{extension}" cookiedomain = None def __init__(self, match): Extractor.__init__(self, match) self.api = PixivAppAPI(self) self.load_ugoira = self.config("ugoira", True) self.max_posts = self.config("max-posts", 0) def items(self): tags = self.config("tags", "japanese") if tags == "original": transform_tags = None elif tags == "translated": def transform_tags(work): work["tags"] = list(dict.fromkeys( tag["translated_name"] or tag["name"] for tag in work["tags"])) else: def transform_tags(work): work["tags"] = [tag["name"] for tag in work["tags"]] ratings = {0: "General", 1: "R-18", 2: "R-18G"} meta_user = self.config("metadata") meta_bookmark = self.config("metadata-bookmark") metadata = self.metadata() works = self.works() if self.max_posts: works = itertools.islice(works, self.max_posts) for work in works: if not work["user"]["id"]: continue meta_single_page = work["meta_single_page"] meta_pages = work["meta_pages"] del work["meta_single_page"] del work["image_urls"] del work["meta_pages"] if meta_user: work.update(self.api.user_detail(work["user"]["id"])) if meta_bookmark and work["is_bookmarked"]: detail = self.api.illust_bookmark_detail(work["id"]) work["tags_bookmark"] = [tag["name"] for tag in detail["tags"] if tag["is_registered"]] if transform_tags: transform_tags(work) work["num"] = 0 work["date"] = text.parse_datetime(work["create_date"]) work["rating"] = ratings.get(work["x_restrict"]) work["suffix"] = "" work.update(metadata) yield Message.Directory, work if work["type"] == "ugoira": if not self.load_ugoira: continue try: ugoira = self.api.ugoira_metadata(work["id"]) except exception.StopExtraction as exc: 
self.log.warning( "Unable to retrieve Ugoira metatdata (%s - %s)", work.get("id"), exc.message) continue url = ugoira["zip_urls"]["medium"].replace( "_ugoira600x600", "_ugoira1920x1080") work["frames"] = ugoira["frames"] work["date_url"] = self._date_from_url(url) work["_http_adjust_extension"] = False yield Message.Url, url, text.nameext_from_url(url, work) elif work["page_count"] == 1: url = meta_single_page["original_image_url"] work["date_url"] = self._date_from_url(url) yield Message.Url, url, text.nameext_from_url(url, work) else: for work["num"], img in enumerate(meta_pages): url = img["image_urls"]["original"] work["date_url"] = self._date_from_url(url) work["suffix"] = "_p{:02}".format(work["num"]) yield Message.Url, url, text.nameext_from_url(url, work) @staticmethod def _date_from_url(url, offset=timedelta(hours=9)): try: _, _, _, _, _, y, m, d, H, M, S, _ = url.split("/") return datetime( int(y), int(m), int(d), int(H), int(M), int(S)) - offset except Exception: return None @staticmethod def _make_work(kind, url, user): p = url.split("/") return { "create_date" : "{}-{}-{}T{}:{}:{}+09:00".format( p[5], p[6], p[7], p[8], p[9], p[10]) if len(p) > 9 else None, "height" : 0, "id" : kind, "image_urls" : None, "meta_pages" : (), "meta_single_page": {"original_image_url": url}, "page_count" : 1, "sanity_level" : 0, "tags" : (), "title" : kind, "type" : kind, "user" : user, "width" : 0, "x_restrict" : 0, } def works(self): """Return an iterable containing all relevant 'work' objects""" def metadata(self): """Collect metadata for extractor job""" return {} class PixivUserExtractor(PixivExtractor): """Extractor for a pixiv user profile""" subcategory = "user" pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net/(?:" r"(?:en/)?u(?:sers)?/|member\.php\?id=|(?:mypage\.php)?#id=" r")(\d+)(?:$|[?#])") test = ( ("https://www.pixiv.net/en/users/173530"), ("https://www.pixiv.net/u/173530"), ("https://www.pixiv.net/member.php?id=173530"), ("https://www.pixiv.net/mypage.php#id=173530"), ("https://www.pixiv.net/#id=173530"), ) def __init__(self, match): PixivExtractor.__init__(self, match) self.user_id = match.group(1) def items(self): base = "{}/users/{}/".format(self.root, self.user_id) return self._dispatch_extractors(( (PixivAvatarExtractor , base + "avatar"), (PixivBackgroundExtractor, base + "background"), (PixivArtworksExtractor , base + "artworks"), (PixivFavoriteExtractor , base + "bookmarks/artworks"), ), ("artworks",)) class PixivArtworksExtractor(PixivExtractor): """Extractor for artworks of a pixiv user""" subcategory = "artworks" pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net/(?:" r"(?:en/)?users/(\d+)/(?:artworks|illustrations|manga)" r"(?:/([^/?#]+))?/?(?:$|[?#])" r"|member_illust\.php\?id=(\d+)(?:&([^#]+))?)") test = ( ("https://www.pixiv.net/en/users/173530/artworks", { "url": "852c31ad83b6840bacbce824d85f2a997889efb7", }), # illusts with specific tag (("https://www.pixiv.net/en/users/173530/artworks" "/%E6%89%8B%E3%81%B6%E3%82%8D"), { "url": "25b1cd81153a8ff82eec440dd9f20a4a22079658", }), (("https://www.pixiv.net/member_illust.php?id=173530" "&tag=%E6%89%8B%E3%81%B6%E3%82%8D"), { "url": "25b1cd81153a8ff82eec440dd9f20a4a22079658", }), # deleted account ("http://www.pixiv.net/member_illust.php?id=173531", { "options": (("metadata", True),), "exception": exception.NotFoundError, }), ("https://www.pixiv.net/en/users/173530/manga"), ("https://www.pixiv.net/en/users/173530/illustrations"), ("https://www.pixiv.net/member_illust.php?id=173530"), 
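# --- illustrative sketch (not part of gallery-dl) ---------------------------
# _date_from_url() above recovers the upload timestamp from the image URL
# itself: original pximg URLs embed it as .../img/YYYY/MM/DD/HH/MM/SS/<id>...,
# expressed in Japan Standard Time, so subtracting nine hours yields UTC.
# A standalone version of the same idea, using only the standard library.
from datetime import datetime, timedelta

def date_from_pximg_url(url, offset=timedelta(hours=9)):
    parts = url.split("/")
    try:
        y, m, d, H, M, S = parts[5:11]
        return datetime(int(y), int(m), int(d),
                        int(H), int(M), int(S)) - offset
    except (IndexError, ValueError):
        return None

# date_from_pximg_url("https://i.pximg.net/img-original"
#                     "/img/2017/04/25/07/33/29/62568267_p0.png")
# -> datetime(2017, 4, 24, 22, 33, 29)
# -----------------------------------------------------------------------------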
("https://touch.pixiv.net/member_illust.php?id=173530"), ) def __init__(self, match): PixivExtractor.__init__(self, match) u1, t1, u2, t2 = match.groups() if t1: t1 = text.unquote(t1) elif t2: t2 = text.parse_query(t2).get("tag") self.user_id = u1 or u2 self.tag = t1 or t2 def metadata(self): if self.config("metadata"): self.api.user_detail(self.user_id) return {} def works(self): works = self.api.user_illusts(self.user_id) if self.tag: tag = self.tag.lower() works = ( work for work in works if tag in [t["name"].lower() for t in work["tags"]] ) return works class PixivAvatarExtractor(PixivExtractor): """Extractor for pixiv avatars""" subcategory = "avatar" filename_fmt = "avatar{date:?_//%Y-%m-%d}.{extension}" archive_fmt = "avatar_{user[id]}_{date}" pattern = (r"(?:https?://)?(?:www\.)?pixiv\.net" r"/(?:en/)?users/(\d+)/avatar") test = ("https://www.pixiv.net/en/users/173530/avatar", { "content": "4e57544480cc2036ea9608103e8f024fa737fe66", }) def __init__(self, match): PixivExtractor.__init__(self, match) self.user_id = match.group(1) def works(self): user = self.api.user_detail(self.user_id)["user"] url = user["profile_image_urls"]["medium"].replace("_170.", ".") return (self._make_work("avatar", url, user),) class PixivBackgroundExtractor(PixivExtractor): """Extractor for pixiv background banners""" subcategory = "background" filename_fmt = "background{date:?_//%Y-%m-%d}.{extension}" archive_fmt = "background_{user[id]}_{date}" pattern = (r"(?:https?://)?(?:www\.)?pixiv\.net" r"/(?:en/)?users/(\d+)/background") test = ("https://www.pixiv.net/en/users/194921/background", { "pattern": r"https://i\.pximg\.net/background/img/2021/01/30/16/12/02" r"/194921_af1f71e557a42f499213d4b9eaccc0f8\.jpg", }) def __init__(self, match): PixivExtractor.__init__(self, match) self.user_id = match.group(1) def works(self): detail = self.api.user_detail(self.user_id) url = detail["profile"]["background_image_url"] if not url: return () if "/c/" in url: parts = url.split("/") del parts[3:5] url = "/".join(parts) url = url.replace("_master1200.", ".") work = self._make_work("background", url, detail["user"]) if url.endswith(".jpg"): url = url[:-4] work["_fallback"] = (url + ".png", url + ".gif") return (work,) class PixivMeExtractor(PixivExtractor): """Extractor for pixiv.me URLs""" subcategory = "me" pattern = r"(?:https?://)?pixiv\.me/([^/?#]+)" test = ( ("https://pixiv.me/del_shannon", { "url": "29c295ce75150177e6b0a09089a949804c708fbf", }), ("https://pixiv.me/del_shanno", { "exception": exception.NotFoundError, }), ) def __init__(self, match): PixivExtractor.__init__(self, match) self.account = match.group(1) def items(self): url = "https://pixiv.me/" + self.account data = {"_extractor": PixivUserExtractor} response = self.request( url, method="HEAD", allow_redirects=False, notfound="user") yield Message.Queue, response.headers["Location"], data class PixivWorkExtractor(PixivExtractor): """Extractor for a single pixiv work/illustration""" subcategory = "work" pattern = (r"(?:https?://)?(?:(?:www\.|touch\.)?pixiv\.net" r"/(?:(?:en/)?artworks/" r"|member_illust\.php\?(?:[^&]+&)*illust_id=)(\d+)" r"|(?:i(?:\d+\.pixiv|\.pximg)\.net" r"/(?:(?:.*/)?img-[^/]+/img/\d{4}(?:/\d\d){5}|img\d+/img/[^/]+)" r"|img\d*\.pixiv\.net/img/[^/]+|(?:www\.)?pixiv\.net/i)/(\d+))") test = ( ("https://www.pixiv.net/artworks/966412", { "url": "90c1715b07b0d1aad300bce256a0bc71f42540ba", "content": "69a8edfb717400d1c2e146ab2b30d2c235440c5a", "keyword": { "date" : "dt:2008-06-12 15:29:13", "date_url": "dt:2008-06-12 15:29:13", }, }), 
(("http://www.pixiv.net/member_illust.php" "?mode=medium&illust_id=966411"), { "exception": exception.NotFoundError, }), # ugoira (("https://www.pixiv.net/member_illust.php" "?mode=medium&illust_id=66806629"), { "url": "7267695a985c4db8759bebcf8d21dbdd2d2317ef", "keyword": { "frames" : list, "date" : "dt:2018-01-14 15:06:08", "date_url": "dt:2018-01-15 04:24:48", }, }), # related works (#1237) ("https://www.pixiv.net/artworks/966412", { "options": (("related", True),), "range": "1-10", "count": ">= 10", }), ("https://www.pixiv.net/en/artworks/966412"), ("http://www.pixiv.net/member_illust.php?mode=medium&illust_id=96641"), ("http://i1.pixiv.net/c/600x600/img-master" "/img/2008/06/13/00/29/13/966412_p0_master1200.jpg"), ("https://i.pximg.net/img-original" "/img/2017/04/25/07/33/29/62568267_p0.png"), ("https://www.pixiv.net/i/966412"), ("http://img.pixiv.net/img/soundcross/42626136.jpg"), ("http://i2.pixiv.net/img76/img/snailrin/42672235.jpg"), ) def __init__(self, match): PixivExtractor.__init__(self, match) self.illust_id = match.group(1) or match.group(2) def works(self): works = (self.api.illust_detail(self.illust_id),) if self.config("related", False): related = self.api.illust_related(self.illust_id) works = itertools.chain(works, related) return works class PixivFavoriteExtractor(PixivExtractor): """Extractor for all favorites/bookmarks of a pixiv-user""" subcategory = "favorite" directory_fmt = ("{category}", "bookmarks", "{user_bookmark[id]} {user_bookmark[account]}") archive_fmt = "f_{user_bookmark[id]}_{id}{num}.{extension}" pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net/(?:(?:en/)?" r"users/(\d+)/(bookmarks/artworks|following)(?:/([^/?#]+))?" r"|bookmark\.php)(?:\?([^#]*))?") test = ( ("https://www.pixiv.net/en/users/173530/bookmarks/artworks", { "url": "85a3104eaaaf003c7b3947117ca2f1f0b1cfc949", }), ("https://www.pixiv.net/bookmark.php?id=173530", { "url": "85a3104eaaaf003c7b3947117ca2f1f0b1cfc949", }), # bookmarks with specific tag (("https://www.pixiv.net/en/users/3137110" "/bookmarks/artworks/%E3%81%AF%E3%82%93%E3%82%82%E3%82%93"), { "url": "379b28275f786d946e01f721e54afe346c148a8c", }), # bookmarks with specific tag (legacy url) (("https://www.pixiv.net/bookmark.php?id=3137110" "&tag=%E3%81%AF%E3%82%93%E3%82%82%E3%82%93&p=1"), { "url": "379b28275f786d946e01f721e54afe346c148a8c", }), # own bookmarks ("https://www.pixiv.net/bookmark.php", { "url": "90c1715b07b0d1aad300bce256a0bc71f42540ba", "keyword": {"tags_bookmark": ["47", "hitman"]}, "options": (("metadata-bookmark", True),), }), # own bookmarks with tag (#596) ("https://www.pixiv.net/bookmark.php?tag=foobar", { "count": 0, }), # followed users (#515) ("https://www.pixiv.net/en/users/173530/following", { "pattern": PixivUserExtractor.pattern, "count": ">= 12", }), # followed users (legacy url) (#515) ("https://www.pixiv.net/bookmark.php?id=173530&type=user", { "pattern": PixivUserExtractor.pattern, "count": ">= 12", }), # touch URLs ("https://touch.pixiv.net/bookmark.php?id=173530"), ("https://touch.pixiv.net/bookmark.php"), ) def __init__(self, match): uid, kind, self.tag, query = match.groups() query = text.parse_query(query) if not uid: uid = query.get("id") if not uid: self.subcategory = "bookmark" if kind == "following" or query.get("type") == "user": self.subcategory = "following" self.items = self._items_following PixivExtractor.__init__(self, match) self.query = query self.user_id = uid def works(self): tag = None if "tag" in self.query: tag = text.unquote(self.query["tag"]) elif self.tag: tag = 
text.unquote(self.tag) restrict = "public" if self.query.get("rest") == "hide": restrict = "private" return self.api.user_bookmarks_illust(self.user_id, tag, restrict) def metadata(self): if self.user_id: user = self.api.user_detail(self.user_id)["user"] else: self.api.login() user = self.api.user self.user_id = user["id"] return {"user_bookmark": user} def _items_following(self): restrict = "public" if self.query.get("rest") == "hide": restrict = "private" for preview in self.api.user_following(self.user_id, restrict): user = preview["user"] user["_extractor"] = PixivUserExtractor url = "https://www.pixiv.net/users/{}".format(user["id"]) yield Message.Queue, url, user class PixivRankingExtractor(PixivExtractor): """Extractor for pixiv ranking pages""" subcategory = "ranking" archive_fmt = "r_{ranking[mode]}_{ranking[date]}_{id}{num}.{extension}" directory_fmt = ("{category}", "rankings", "{ranking[mode]}", "{ranking[date]}") pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net" r"/ranking\.php(?:\?([^#]*))?") test = ( ("https://www.pixiv.net/ranking.php?mode=daily&date=20170818"), ("https://www.pixiv.net/ranking.php"), ("https://touch.pixiv.net/ranking.php"), ("https://www.pixiv.net/ranking.php?mode=unknown", { "exception": exception.StopExtraction, }), ) def __init__(self, match): PixivExtractor.__init__(self, match) self.query = match.group(1) self.mode = self.date = None def works(self): return self.api.illust_ranking(self.mode, self.date) def metadata(self): query = text.parse_query(self.query) mode = query.get("mode", "daily").lower() mode_map = { "daily": "day", "daily_r18": "day_r18", "daily_ai": "day_ai", "daily_r18_ai": "day_r18_ai", "weekly": "week", "weekly_r18": "week_r18", "monthly": "month", "male": "day_male", "male_r18": "day_male_r18", "female": "day_female", "female_r18": "day_female_r18", "original": "week_original", "rookie": "week_rookie", "r18g": "week_r18g", } try: self.mode = mode = mode_map[mode] except KeyError: raise exception.StopExtraction("Invalid mode '%s'", mode) date = query.get("date") if date: if len(date) == 8 and date.isdecimal(): date = "{}-{}-{}".format(date[0:4], date[4:6], date[6:8]) else: self.log.warning("invalid date '%s'", date) date = None if not date: date = (datetime.utcnow() - timedelta(days=1)).strftime("%Y-%m-%d") self.date = date return {"ranking": { "mode": mode, "date": self.date, }} class PixivSearchExtractor(PixivExtractor): """Extractor for pixiv search results""" subcategory = "search" archive_fmt = "s_{search[word]}_{id}{num}.{extension}" directory_fmt = ("{category}", "search", "{search[word]}") pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net" r"/(?:(?:en/)?tags/([^/?#]+)(?:/[^/?#]+)?/?" 
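# --- illustrative sketch (not part of gallery-dl) ---------------------------
# The ranking extractor below translates website query parameters into
# App-API arguments: the 'mode' value is mapped through mode_map
# (e.g. "daily" -> "day", "weekly_r18" -> "week_r18"), and an eight-digit
# 'date' such as 20170818 is reformatted as 2017-08-18, falling back to
# yesterday in UTC when absent or invalid.  The date handling in isolation:
from datetime import datetime, timedelta

def normalize_ranking_date(date=None):
    if date and len(date) == 8 and date.isdecimal():
        return "{}-{}-{}".format(date[0:4], date[4:6], date[6:8])
    return (datetime.utcnow() - timedelta(days=1)).strftime("%Y-%m-%d")

# normalize_ranking_date("20170818")  ->  "2017-08-18"
# -----------------------------------------------------------------------------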
r"|search\.php)(?:\?([^#]+))?") test = ( ("https://www.pixiv.net/en/tags/Original", { "range": "1-10", "count": 10, }), ("https://pixiv.net/en/tags/foo/artworks?order=week&s_mode=s_tag", { "exception": exception.StopExtraction, }), ("https://pixiv.net/en/tags/foo/artworks?order=date&s_mode=tag", { "exception": exception.StopExtraction, }), ("https://www.pixiv.net/search.php?s_mode=s_tag&name=Original", { "exception": exception.StopExtraction, }), ("https://www.pixiv.net/en/tags/foo/artworks?order=date&s_mode=s_tag"), ("https://www.pixiv.net/search.php?s_mode=s_tag&word=Original"), ("https://touch.pixiv.net/search.php?word=Original"), ) def __init__(self, match): PixivExtractor.__init__(self, match) self.word, self.query = match.groups() self.sort = self.target = None def works(self): return self.api.search_illust( self.word, self.sort, self.target, date_start=self.date_start, date_end=self.date_end) def metadata(self): query = text.parse_query(self.query) if self.word: self.word = text.unquote(self.word) else: try: self.word = query["word"] except KeyError: raise exception.StopExtraction("Missing search term") sort = query.get("order", "date_d") sort_map = { "date": "date_asc", "date_d": "date_desc", } try: self.sort = sort = sort_map[sort] except KeyError: raise exception.StopExtraction("Invalid search order '%s'", sort) target = query.get("s_mode", "s_tag_full") target_map = { "s_tag": "partial_match_for_tags", "s_tag_full": "exact_match_for_tags", "s_tc": "title_and_caption", } try: self.target = target = target_map[target] except KeyError: raise exception.StopExtraction("Invalid search mode '%s'", target) self.date_start = query.get("scd") self.date_end = query.get("ecd") return {"search": { "word": self.word, "sort": self.sort, "target": self.target, "date_start": self.date_start, "date_end": self.date_end, }} class PixivFollowExtractor(PixivExtractor): """Extractor for new illustrations from your followed artists""" subcategory = "follow" archive_fmt = "F_{user_follow[id]}_{id}{num}.{extension}" directory_fmt = ("{category}", "following") pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net" r"/bookmark_new_illust\.php") test = ( ("https://www.pixiv.net/bookmark_new_illust.php"), ("https://touch.pixiv.net/bookmark_new_illust.php"), ) def works(self): return self.api.illust_follow() def metadata(self): self.api.login() return {"user_follow": self.api.user} class PixivPixivisionExtractor(PixivExtractor): """Extractor for illustrations from a pixivision article""" subcategory = "pixivision" directory_fmt = ("{category}", "pixivision", "{pixivision_id} {pixivision_title}") archive_fmt = "V{pixivision_id}_{id}{suffix}.{extension}" pattern = r"(?:https?://)?(?:www\.)?pixivision\.net/(?:en/)?a/(\d+)" test = ( ("https://www.pixivision.net/en/a/2791"), ("https://pixivision.net/a/2791", { "count": 7, "keyword": { "pixivision_id": "2791", "pixivision_title": "What's your favorite music? 
Editor’s " "picks featuring: “CD Covers”!", }, }), ) def __init__(self, match): PixivExtractor.__init__(self, match) self.pixivision_id = match.group(1) def works(self): return ( self.api.illust_detail(illust_id) for illust_id in util.unique_sequence(text.extract_iter( self.page, '', '<') return { "pixivision_id" : self.pixivision_id, "pixivision_title": text.unescape(title), } class PixivSeriesExtractor(PixivExtractor): """Extractor for illustrations from a Pixiv series""" subcategory = "series" directory_fmt = ("{category}", "{user[id]} {user[account]}", "{series[id]} {series[title]}") filename_fmt = "{num_series:>03}_{id}_p{num}.{extension}" pattern = (r"(?:https?://)?(?:www\.)?pixiv\.net" r"/user/(\d+)/series/(\d+)") test = ("https://www.pixiv.net/user/10509347/series/21859", { "range": "1-10", "count": 10, "keyword": { "num_series": int, "series": { "canonical": "https://www.pixiv.net/user/10509347" "/series/21859", "description": str, "ogp": dict, "title": "先輩がうざい後輩の話", "total": int, "twitter": dict, }, }, }) def __init__(self, match): PixivExtractor.__init__(self, match) self.user_id, self.series_id = match.groups() def works(self): url = self.root + "/ajax/series/" + self.series_id params = {"p": 1} headers = { "Accept": "application/json", "Referer": "{}/user/{}/series/{}".format( self.root, self.user_id, self.series_id), "Alt-Used": "www.pixiv.net", } while True: data = self.request(url, params=params, headers=headers).json() body = data["body"] page = body["page"] series = body["extraData"]["meta"] series["id"] = self.series_id series["total"] = page["total"] series["title"] = text.extr(series["title"], '"', '"') for info in page["series"]: work = self.api.illust_detail(info["workId"]) work["num_series"] = info["order"] work["series"] = series yield work if len(page["series"]) < 10: return params["p"] += 1 class PixivSketchExtractor(Extractor): """Extractor for user pages on sketch.pixiv.net""" category = "pixiv" subcategory = "sketch" directory_fmt = ("{category}", "sketch", "{user[unique_name]}") filename_fmt = "{post_id} {id}.{extension}" archive_fmt = "S{user[id]}_{id}" root = "https://sketch.pixiv.net" cookiedomain = ".pixiv.net" pattern = r"(?:https?://)?sketch\.pixiv\.net/@([^/?#]+)" test = ("https://sketch.pixiv.net/@nicoby", { "pattern": r"https://img\-sketch\.pixiv\.net/uploads/medium" r"/file/\d+/\d+\.(jpg|png)", "count": ">= 35", }) def __init__(self, match): Extractor.__init__(self, match) self.username = match.group(1) def items(self): headers = {"Referer": "{}/@{}".format(self.root, self.username)} for post in self.posts(): media = post["media"] post["post_id"] = post["id"] post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z") util.delete_items(post, ("id", "media", "_links")) yield Message.Directory, post post["_http_headers"] = headers for photo in media: original = photo["photo"]["original"] post["id"] = photo["id"] post["width"] = original["width"] post["height"] = original["height"] url = original["url"] text.nameext_from_url(url, post) yield Message.Url, url, post def posts(self): url = "{}/api/walls/@{}/posts/public.json".format( self.root, self.username) headers = { "Accept": "application/vnd.sketch-v4+json", "X-Requested-With": "{}/@{}".format(self.root, self.username), "Referer": self.root + "/", } while True: data = self.request(url, headers=headers).json() yield from data["data"]["items"] next_url = data["_links"].get("next") if not next_url: return url = self.root + next_url["href"] class PixivAppAPI(): """Minimal interface 
for the Pixiv App API for mobile devices For a more complete implementation or documentation, see - https://github.com/upbit/pixivpy - https://gist.github.com/ZipFile/3ba99b47162c23f8aea5d5942bb557b1 """ CLIENT_ID = "MOBrBDS8blbauoSck0ZfDbtuzpyT" CLIENT_SECRET = "lsACyCD94FhDUtGTXi3QzcFE2uU1hqtDaKeqrdwj" HASH_SECRET = ("28c1fdd170a5204386cb1313c7077b34" "f83e4aaf4aa829ce78c231e05b0bae2c") def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.username = extractor._get_auth_info()[0] self.user = None extractor.session.headers.update({ "App-OS" : "ios", "App-OS-Version": "13.1.2", "App-Version" : "7.7.6", "User-Agent" : "PixivIOSApp/7.7.6 (iOS 13.1.2; iPhone11,8)", "Referer" : "https://app-api.pixiv.net/", }) self.client_id = extractor.config( "client-id", self.CLIENT_ID) self.client_secret = extractor.config( "client-secret", self.CLIENT_SECRET) token = extractor.config("refresh-token") if token is None or token == "cache": token = _refresh_token_cache(self.username) self.refresh_token = token def login(self): """Login and gain an access token""" self.user, auth = self._login_impl(self.username) self.extractor.session.headers["Authorization"] = auth @cache(maxage=3600, keyarg=1) def _login_impl(self, username): if not self.refresh_token: raise exception.AuthenticationError( "'refresh-token' required.\n" "Run `gallery-dl oauth:pixiv` to get one.") self.log.info("Refreshing access token") url = "https://oauth.secure.pixiv.net/auth/token" data = { "client_id" : self.client_id, "client_secret" : self.client_secret, "grant_type" : "refresh_token", "refresh_token" : self.refresh_token, "get_secure_url": "1", } time = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S+00:00") headers = { "X-Client-Time": time, "X-Client-Hash": hashlib.md5( (time + self.HASH_SECRET).encode()).hexdigest(), } response = self.extractor.request( url, method="POST", headers=headers, data=data, fatal=False) if response.status_code >= 400: self.log.debug(response.text) raise exception.AuthenticationError("Invalid refresh token") data = response.json()["response"] return data["user"], "Bearer " + data["access_token"] def illust_detail(self, illust_id): params = {"illust_id": illust_id} return self._call("/v1/illust/detail", params)["illust"] def illust_bookmark_detail(self, illust_id): params = {"illust_id": illust_id} return self._call( "/v2/illust/bookmark/detail", params)["bookmark_detail"] def illust_follow(self, restrict="all"): params = {"restrict": restrict} return self._pagination("/v2/illust/follow", params) def illust_ranking(self, mode="day", date=None): params = {"mode": mode, "date": date} return self._pagination("/v1/illust/ranking", params) def illust_related(self, illust_id): params = {"illust_id": illust_id} return self._pagination("/v2/illust/related", params) def search_illust(self, word, sort=None, target=None, duration=None, date_start=None, date_end=None): params = {"word": word, "search_target": target, "sort": sort, "duration": duration, "start_date": date_start, "end_date": date_end} return self._pagination("/v1/search/illust", params) def user_bookmarks_illust(self, user_id, tag=None, restrict="public"): """Return illusts bookmarked by a user""" params = {"user_id": user_id, "tag": tag, "restrict": restrict} return self._pagination("/v1/user/bookmarks/illust", params) def user_bookmark_tags_illust(self, user_id, restrict="public"): """Return bookmark tags defined by a user""" params = {"user_id": user_id, "restrict": restrict} return self._pagination( 
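# --- illustrative sketch (not part of gallery-dl) ---------------------------
# _login_impl() above exchanges a stored refresh token for a short-lived
# access token.  Besides the usual OAuth form fields, Pixiv's token endpoint
# expects an X-Client-Time header and an X-Client-Hash header, the latter
# being the MD5 digest of that timestamp concatenated with a fixed secret.
# A minimal standalone sketch using 'requests'; client_id, client_secret and
# hash_secret correspond to the CLIENT_ID, CLIENT_SECRET and HASH_SECRET
# constants defined on this class.
import hashlib
from datetime import datetime
import requests

def refresh_access_token(refresh_token, client_id, client_secret, hash_secret):
    now = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S+00:00")
    headers = {
        "X-Client-Time": now,
        "X-Client-Hash": hashlib.md5(
            (now + hash_secret).encode()).hexdigest(),
        "User-Agent": "PixivIOSApp/7.7.6 (iOS 13.1.2; iPhone11,8)",
    }
    data = {
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "get_secure_url": "1",
    }
    response = requests.post(
        "https://oauth.secure.pixiv.net/auth/token",
        headers=headers, data=data)
    response.raise_for_status()
    result = response.json()["response"]
    return result["user"], "Bearer " + result["access_token"]
# -----------------------------------------------------------------------------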
"/v1/user/bookmark-tags/illust", params, "bookmark_tags") @memcache(keyarg=1) def user_detail(self, user_id): params = {"user_id": user_id} return self._call("/v1/user/detail", params) def user_following(self, user_id, restrict="public"): params = {"user_id": user_id, "restrict": restrict} return self._pagination("/v1/user/following", params, "user_previews") def user_illusts(self, user_id): params = {"user_id": user_id} return self._pagination("/v1/user/illusts", params) def ugoira_metadata(self, illust_id): params = {"illust_id": illust_id} return self._call("/v1/ugoira/metadata", params)["ugoira_metadata"] def _call(self, endpoint, params=None): url = "https://app-api.pixiv.net" + endpoint while True: self.login() response = self.extractor.request(url, params=params, fatal=False) data = response.json() if "error" not in data: return data self.log.debug(data) if response.status_code == 404: raise exception.NotFoundError() error = data["error"] if "rate limit" in (error.get("message") or "").lower(): self.extractor.wait(seconds=300) continue raise exception.StopExtraction("API request failed: %s", error) def _pagination(self, endpoint, params, key="illusts"): while True: data = self._call(endpoint, params) yield from data[key] if not data["next_url"]: return query = data["next_url"].rpartition("?")[2] params = text.parse_query(query) @cache(maxage=10*365*24*3600, keyarg=0) def _refresh_token_cache(username): return None ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/pixnet.py0000644000175000017500000001524614363473075020447 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.pixnet.net/""" from .common import Extractor, Message from .. 
import text, exception BASE_PATTERN = r"(?:https?://)?(?!www\.)([\w-]+)\.pixnet.net" class PixnetExtractor(Extractor): """Base class for pixnet extractors""" category = "pixnet" filename_fmt = "{num:>03}_{id}.{extension}" archive_fmt = "{id}" url_fmt = "" def __init__(self, match): Extractor.__init__(self, match) self.blog, self.item_id = match.groups() self.root = "https://{}.pixnet.net".format(self.blog) def items(self): url = self.url_fmt.format(self.root, self.item_id) page = self.request(url, encoding="utf-8").text user = text.extr(page, '') if pnext is None and 'name="albumpass">' in page: raise exception.StopExtraction( "Album %s is password-protected.", self.item_id) if "href" not in pnext: return url = self.root + text.extr(pnext, 'href="', '"') page = self.request(url, encoding="utf-8").text class PixnetImageExtractor(PixnetExtractor): """Extractor for a single photo from pixnet.net""" subcategory = "image" filename_fmt = "{id}.{extension}" directory_fmt = ("{category}", "{blog}") pattern = BASE_PATTERN + r"/album/photo/(\d+)" test = ("https://albertayu773.pixnet.net/album/photo/159443828", { "url": "156564c422138914c9fa5b42191677b45c414af4", "keyword": "19971bcd056dfef5593f4328a723a9602be0f087", "content": "0e097bdf49e76dd9b9d57a016b08b16fa6a33280", }) def items(self): url = "https://api.pixnet.cc/oembed" params = { "url": "https://{}.pixnet.net/album/photo/{}".format( self.blog, self.item_id), "format": "json", } data = self.request(url, params=params).json() data["id"] = text.parse_int( data["url"].rpartition("/")[2].partition("-")[0]) data["filename"], _, data["extension"] = data["title"].rpartition(".") data["blog"] = self.blog data["user"] = data.pop("author_name") yield Message.Directory, data yield Message.Url, data["url"], data class PixnetSetExtractor(PixnetExtractor): """Extractor for images from a pixnet set""" subcategory = "set" url_fmt = "{}/album/set/{}" directory_fmt = ("{category}", "{blog}", "{folder_id} {folder_title}", "{set_id} {set_title}") pattern = BASE_PATTERN + r"/album/set/(\d+)" test = ( ("https://albertayu773.pixnet.net/album/set/15078995", { "url": "6535712801af47af51110542f4938a7cef44557f", "keyword": "bf25d59e5b0959cb1f53e7fd2e2a25f2f67e5925", }), ("https://anrine910070.pixnet.net/album/set/5917493", { "url": "b3eb6431aea0bcf5003432a4a0f3a3232084fc13", "keyword": "bf7004faa1cea18cf9bd856f0955a69be51b1ec6", }), ("https://sky92100.pixnet.net/album/set/17492544", { "count": 0, # password-protected }), ) def items(self): url = self.url_fmt.format(self.root, self.item_id) page = self.request(url, encoding="utf-8").text data = self.metadata(page) yield Message.Directory, data for num, info in enumerate(self._pagination(page), 1): url, pos = text.extract(info, ' href="', '"') src, pos = text.extract(info, ' src="', '"', pos) alt, pos = text.extract(info, ' alt="', '"', pos) photo = { "id": text.parse_int(url.rpartition("/")[2].partition("#")[0]), "url": src.replace("_s.", "."), "num": num, "filename": alt, "extension": src.rpartition(".")[2], } photo.update(data) yield Message.Url, photo["url"], photo def metadata(self, page): user , pos = text.extract(page, '', '<', pos) sid , pos = text.extract(page, '/set/', '"', pos) sname, pos = text.extract(page, '>', '<', pos) return { "blog": self.blog, "user": user.rpartition(" (")[0], "folder_id" : text.parse_int(fid, ""), "folder_title": text.unescape(fname).strip(), "set_id" : text.parse_int(sid), "set_title" : text.unescape(sname), } class PixnetFolderExtractor(PixnetExtractor): """Extractor for all sets in 
a pixnet folder""" subcategory = "folder" url_fmt = "{}/album/folder/{}" pattern = BASE_PATTERN + r"/album/folder/(\d+)" test = ("https://albertayu773.pixnet.net/album/folder/1405768", { "pattern": PixnetSetExtractor.pattern, "count": ">= 15", }) class PixnetUserExtractor(PixnetExtractor): """Extractor for all sets and folders of a pixnet user""" subcategory = "user" url_fmt = "{}{}/album/list" pattern = BASE_PATTERN + r"()(?:/blog|/album(?:/list)?)?/?(?:$|[?#])" test = ( ("https://albertayu773.pixnet.net/"), ("https://albertayu773.pixnet.net/blog"), ("https://albertayu773.pixnet.net/album"), ("https://albertayu773.pixnet.net/album/list", { "pattern": PixnetFolderExtractor.pattern, "count": ">= 30", }), ("https://anrine910070.pixnet.net/album/list", { "pattern": PixnetSetExtractor.pattern, "count": ">= 14", }), ) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1677790111.0 gallery_dl-1.25.0/gallery_dl/extractor/plurk.py0000644000175000017500000001037714400205637020262 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.plurk.com/""" from .common import Extractor, Message from .. import text, util, exception import datetime import re class PlurkExtractor(Extractor): """Base class for plurk extractors""" category = "plurk" root = "https://www.plurk.com" request_interval = 1.0 def items(self): urls = self._urls_ex if self.config("comments", False) else self._urls for plurk in self.plurks(): for url in urls(plurk): yield Message.Queue, url, plurk def plurks(self): """Return an iterable with all relevant 'plurk' objects""" @staticmethod def _urls(obj): """Extract URLs from a 'plurk' object""" return text.extract_iter(obj["content"], ' href="', '"') def _urls_ex(self, plurk): """Extract URLs from a 'plurk' and its comments""" yield from self._urls(plurk) for comment in self._comments(plurk): yield from self._urls(comment) def _comments(self, plurk): """Return an iterable with a 'plurk's comments""" url = "https://www.plurk.com/Responses/get" data = {"plurk_id": plurk["id"], "count": "200"} headers = { "Origin": self.root, "Referer": self.root, "X-Requested-With": "XMLHttpRequest", } while True: info = self.request( url, method="POST", headers=headers, data=data).json() yield from info["responses"] if not info["has_newer"]: return elif info["has_newer"] < 200: del data["count"] data["from_response_id"] = info["responses"][-1]["id"] + 1 @staticmethod def _load(data): if not data: raise exception.NotFoundError("user") return util.json_loads(re.sub(r"new Date\(([^)]+)\)", r"\1", data)) class PlurkTimelineExtractor(PlurkExtractor): """Extractor for URLs from all posts in a Plurk timeline""" subcategory = "timeline" pattern = r"(?:https?://)?(?:www\.)?plurk\.com/(?!p/)(\w+)/?(?:$|[?#])" test = ("https://www.plurk.com/plurkapi", { "pattern": r"https?://.+", "count": ">= 23" }) def __init__(self, match): PlurkExtractor.__init__(self, match) self.user = match.group(1) def plurks(self): url = "{}/{}".format(self.root, self.user) page = self.request(url).text user_id, pos = text.extract(page, '"page_user": {"id":', ',') plurks = self._load(text.extract(page, "_PLURKS = ", ";\n", pos)[0]) headers = {"Referer": url, "X-Requested-With": "XMLHttpRequest"} data = {"user_id": user_id.strip()} url = "https://www.plurk.com/TimeLine/getPlurks" 
while plurks: yield from plurks offset = datetime.datetime.strptime( plurks[-1]["posted"], "%a, %d %b %Y %H:%M:%S %Z") data["offset"] = offset.strftime("%Y-%m-%dT%H:%M:%S.000Z") response = self.request( url, method="POST", headers=headers, data=data) plurks = response.json()["plurks"] class PlurkPostExtractor(PlurkExtractor): """Extractor for URLs from a Plurk post""" subcategory = "post" pattern = r"(?:https?://)?(?:www\.)?plurk\.com/p/(\w+)" test = ( ("https://www.plurk.com/p/i701j1", { "url": "2115f208564591b8748525c2807a84596aaaaa5f", "count": 3, }), ("https://www.plurk.com/p/i701j1", { "options": (("comments", True),), "count": ">= 210", }), ) def __init__(self, match): PlurkExtractor.__init__(self, match) self.plurk_id = match.group(1) def plurks(self): url = "{}/p/{}".format(self.root, self.plurk_id) page = self.request(url).text user, pos = text.extract(page, " GLOBAL = ", "\n") data, pos = text.extract(page, "plurk = ", ";\n", pos) data = self._load(data) data["user"] = self._load(user)["page_user"] return (data,) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1676124591.0 gallery_dl-1.25.0/gallery_dl/extractor/poipiku.py0000644000175000017500000001470614371720657020620 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://poipiku.com/""" from .common import Extractor, Message from .. import text BASE_PATTERN = r"(?:https?://)?poipiku\.com" class PoipikuExtractor(Extractor): """Base class for poipiku extractors""" category = "poipiku" root = "https://poipiku.com" directory_fmt = ("{category}", "{user_id} {user_name}") filename_fmt = "{post_id}_{num}.{extension}" archive_fmt = "{post_id}_{num}" request_interval = (0.5, 1.5) def items(self): password = self.config("password", "") for post_url in self.posts(): parts = post_url.split("/") if post_url[0] == "/": post_url = self.root + post_url page = self.request(post_url).text extr = text.extract_from(page) post = { "post_category": extr("[", "]"), "count" : extr("(", " "), "post_id" : parts[-1].partition(".")[0], "user_id" : parts[-2], "user_name" : text.unescape(extr( '<h2 class="UserInfoUserName">', '</').rpartition(">")[2]), "description": text.unescape(extr( 'class="IllustItemDesc" >', '<')), "_http_headers": {"Referer": post_url}, } yield Message.Directory, post post["num"] = 0 while True: thumb = extr('class="IllustItemThumbImg" src="', '"') if not thumb: break elif thumb.startswith(("//img.poipiku.com/img/", "/img/")): continue post["num"] += 1 url = text.ensure_http_scheme(thumb[:-8]).replace( "//img.", "//img-org.", 1) yield Message.Url, url, text.nameext_from_url(url, post) if not extr(' show all(+', '<'): continue url = self.root + "/f/ShowAppendFileF.jsp" headers = { "Accept" : "application/json, text/javascript, */*; q=0.01", "X-Requested-With": "XMLHttpRequest", "Origin" : self.root, "Referer": post_url, } data = { "UID": post["user_id"], "IID": post["post_id"], "PAS": password, "MD" : "0", "TWF": "-1", } page = self.request( url, method="POST", headers=headers, data=data).json()["html"] if page.startswith(("You need to", "Password is incorrect")): self.log.warning("'%s'", page) for thumb in text.extract_iter( page, 'class="IllustItemThumbImg" src="', '"'): post["num"] += 1 url = text.ensure_http_scheme(thumb[:-8]).replace( "//img.", "//img-org.", 1) yield 
Message.Url, url, text.nameext_from_url(url, post) class PoipikuUserExtractor(PoipikuExtractor): """Extractor for posts from a poipiku user""" subcategory = "user" pattern = (BASE_PATTERN + r"/(?:IllustListPcV\.jsp\?PG=(\d+)&ID=)?" r"(\d+)/?(?:$|[?&#])") test = ( ("https://poipiku.com/25049/", { "pattern": r"https://img-org\.poipiku\.com/user_img\d+/000025049" r"/\d+_\w+\.(jpe?g|png)$", "range": "1-10", "count": 10, }), ("https://poipiku.com/IllustListPcV.jsp?PG=1&ID=25049&KWD=") ) def __init__(self, match): PoipikuExtractor.__init__(self, match) self._page, self.user_id = match.groups() def posts(self): url = self.root + "/IllustListPcV.jsp" params = { "PG" : text.parse_int(self._page, 0), "ID" : self.user_id, "KWD": "", } while True: page = self.request(url, params=params).text cnt = 0 for path in text.extract_iter( page, 'class="IllustInfo" href="', '"'): yield path cnt += 1 if cnt < 48: return params["PG"] += 1 class PoipikuPostExtractor(PoipikuExtractor): """Extractor for a poipiku post""" subcategory = "post" pattern = BASE_PATTERN + r"/(\d+)/(\d+)" test = ( ("https://poipiku.com/25049/5864576.html", { "pattern": r"https://img-org\.poipiku\.com/user_img\d+/000025049" r"/005864576_EWN1Y65gQ\.png$", "keyword": { "count": "1", "description": "", "extension": "png", "filename": "005864576_EWN1Y65gQ", "num": 1, "post_category": "DOODLE", "post_id": "5864576", "user_id": "25049", "user_name": "ユキウサギ", }, }), ("https://poipiku.com/2166245/6411749.html", { "pattern": r"https://img-org\.poipiku\.com/user_img\d+/002166245" r"/006411749_\w+\.jpeg$", "count": 4, "keyword": { "count": "4", "description": "絵茶の産物ネタバレあるやつ", "num": int, "post_category": "SPOILER", "post_id": "6411749", "user_id": "2166245", "user_name": "wadahito", }, }), # different warning button style ("https://poipiku.com/3572553/5776587.html", { "pattern": r"https://img-org\.poipiku.com/user_img\d+/003572553" r"/005776587_(\d+_)?\w+\.jpeg$", "count": 3, "keyword": { "count": "3", "description": "ORANGE OASISボスネタバレ", "num": int, "post_category": "SPOILER", "post_id": "5776587", "user_id": "3572553", "user_name": "nagakun", }, }), ) def __init__(self, match): PoipikuExtractor.__init__(self, match) self.user_id, self.post_id = match.groups() def posts(self): return ("/{}/{}.html".format(self.user_id, self.post_id),) ����������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1651573353.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.25.0/gallery_dl/extractor/pornhub.py���������������������������������������������������0000644�0001750�0001750�00000012427�14234201151�020570� 
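# --- illustrative sketch (not part of gallery-dl) ---------------------------
# Poipiku posts can hide additional images behind a "show all" button (and
# optionally a password).  The extractor above reveals them by POSTing the
# user ID, post ID and password to /f/ShowAppendFileF.jsp; the JSON answer
# carries an 'html' fragment with more thumbnails, whose URLs are then
# rewritten from the //img. thumbnail host to //img-org. originals.
# A minimal standalone sketch using 'requests':
import requests

def fetch_appended_html(user_id, post_id, password=""):
    root = "https://poipiku.com"
    response = requests.post(
        root + "/f/ShowAppendFileF.jsp",
        headers={
            "Accept": "application/json, text/javascript, */*; q=0.01",
            "X-Requested-With": "XMLHttpRequest",
            "Origin": root,
            "Referer": "{}/{}/{}.html".format(root, user_id, post_id),
        },
        data={
            "UID": user_id,
            "IID": post_id,
            "PAS": password,
            "MD": "0",
            "TWF": "-1",
        },
    )
    return response.json()["html"]
# -----------------------------------------------------------------------------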
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.pornhub.com/""" from .common import Extractor, Message from .. import text, exception BASE_PATTERN = r"(?:https?://)?(?:[\w-]+\.)?pornhub\.com" class PornhubExtractor(Extractor): """Base class for pornhub extractors""" category = "pornhub" root = "https://www.pornhub.com" class PornhubGalleryExtractor(PornhubExtractor): """Extractor for image galleries on pornhub.com""" subcategory = "gallery" directory_fmt = ("{category}", "{user}", "{gallery[id]} {gallery[title]}") filename_fmt = "{num:>03}_{id}.{extension}" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/album/(\d+)" test = ( ("https://www.pornhub.com/album/19289801", { "pattern": r"https://\w+.phncdn.com/pics/albums/\d+/\d+/\d+/\d+/", "count": ">= 300", "keyword": { "id" : int, "num" : int, "score" : int, "views" : int, "caption": str, "user" : "Danika Mori", "gallery": { "id" : 19289801, "score": int, "views": int, "tags" : list, "title": "Danika Mori Best Moments", }, }, }), ("https://www.pornhub.com/album/69040172", { "exception": exception.AuthorizationError, }), ) def __init__(self, match): PornhubExtractor.__init__(self, match) self.gallery_id = match.group(1) self._first = None def items(self): data = self.metadata() yield Message.Directory, data for num, image in enumerate(self.images(), 1): url = image["url"] image.update(data) image["num"] = num yield Message.Url, url, text.nameext_from_url(url, image) def metadata(self): url = "{}/album/{}".format( self.root, self.gallery_id) extr = text.extract_from(self.request(url).text) title = extr("<title>", "") score = extr('
    ', '<') tags = extr('
    = 6", }), ("https://www.pornhub.com/users/flyings0l0/"), ("https://www.pornhub.com/users/flyings0l0/photos/public"), ("https://www.pornhub.com/users/flyings0l0/photos/private"), ("https://www.pornhub.com/users/flyings0l0/photos/favorites"), ("https://www.pornhub.com/model/bossgirl/photos"), ) def __init__(self, match): PornhubExtractor.__init__(self, match) self.type, self.user, self.cat = match.groups() def items(self): url = "{}/{}/{}/photos/{}/ajax".format( self.root, self.type, self.user, self.cat or "public") params = {"page": 1} headers = { "Referer": url[:-5], "X-Requested-With": "XMLHttpRequest", } data = {"_extractor": PornhubGalleryExtractor} while True: page = self.request( url, method="POST", headers=headers, params=params).text if not page: return for gid in text.extract_iter(page, 'id="albumphoto', '"'): yield Message.Queue, self.root + "/album/" + gid, data params["page"] += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1676756233.0 gallery_dl-1.25.0/gallery_dl/extractor/pornpics.py0000644000175000017500000001363414374242411020762 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.pornpics.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text BASE_PATTERN = r"(?:https?://)?(?:www\.)?pornpics\.com(?:/\w\w)?" class PornpicsExtractor(Extractor): """Base class for pornpics extractors""" category = "pornpics" root = "https://www.pornpics.com" request_interval = (0.5, 1.5) def __init__(self, match): super().__init__(match) self.item = match.group(1) self.session.headers["Referer"] = self.root def items(self): for gallery in self.galleries(): gallery["_extractor"] = PornpicsGalleryExtractor yield Message.Queue, gallery["g_url"], gallery def _pagination(self, url, params=None): if params is None: # fetch first 20 galleries from HTML # since '"offset": 0' does not return a JSON response page = self.request(url).text for path in text.extract_iter( page, 'class="rel-link" href="', '"'): yield {"g_url": self.root + path} del page params = {"offset": 20} limit = params["limit"] = 20 headers = { "Accept": "application/json, text/javascript, */*; q=0.01", "Referer": url if params["offset"] else self.root + "/", "X-Requested-With": "XMLHttpRequest", } while True: galleries = self.request( url, params=params, headers=headers).json() yield from galleries if len(galleries) < limit: return params["offset"] += limit class PornpicsGalleryExtractor(PornpicsExtractor, GalleryExtractor): """Extractor for pornpics galleries""" pattern = BASE_PATTERN + r"(/galleries/(?:[^/?#]+-)?(\d+))" test = ( (("https://www.pornpics.com/galleries/british-beauty-danielle-flashes-" "hot-breasts-ass-and-snatch-in-the-forest-62610699/"), { "pattern": r"https://cdni\.pornpics\.com/1280/7/160/62610699" r"/62610699_\d+_[0-9a-f]{4}\.jpg", "keyword": { "categories": ["MILF", "Amateur", "Sexy", "Outdoor"], "channel": "FTV MILFs", "count": 17, "gallery_id": 62610699, "models": ["Danielle"], "num": int, "slug": "british-beauty-danielle-flashes-" "hot-breasts-ass-and-snatch-in-the-forest", "tags": ["Amateur MILF", "Sexy MILF"], "title": "British beauty Danielle flashes " "hot breasts, ass and snatch in the forest", "views": int, }, }), ("https://pornpics.com/es/galleries/62610699", { "keyword": { "slug": 
"british-beauty-danielle-flashes-" "hot-breasts-ass-and-snatch-in-the-forest", }, }), ) def __init__(self, match): PornpicsExtractor.__init__(self, match) self.gallery_id = match.group(2) items = GalleryExtractor.items def metadata(self, page): extr = text.extract_from(page) return { "gallery_id": text.parse_int(self.gallery_id), "slug" : extr("/galleries/", "/").rpartition("-")[0], "title" : text.unescape(extr("

    ", "<")), "channel" : extr('>Channel:', '').rpartition(">")[2], "models" : text.split_html(extr( ">Models:", 'Categories:", 'Tags List:", '

    ')), "views" : text.parse_int(extr(">Views:", "<").replace(",", "")), } def images(self, page): return [ (url, None) for url in text.extract_iter(page, "class='rel-link' href='", "'") ] class PornpicsTagExtractor(PornpicsExtractor): """Extractor for galleries from pornpics tag searches""" subcategory = "tag" pattern = BASE_PATTERN + r"/tags/([^/?#]+)" test = ( ("https://www.pornpics.com/tags/summer-dress/", { "pattern": PornpicsGalleryExtractor.pattern, "range": "1-50", "count": 50, }), ("https://pornpics.com/fr/tags/summer-dress"), ) def galleries(self): url = "{}/tags/{}/".format(self.root, self.item) return self._pagination(url) class PornpicsSearchExtractor(PornpicsExtractor): """Extractor for galleries from pornpics search results""" subcategory = "search" pattern = BASE_PATTERN + r"/(?:\?q=|pornstars/|channels/)([^/&#]+)" test = ( ("https://www.pornpics.com/?q=nature", { "pattern": PornpicsGalleryExtractor.pattern, "range": "1-50", "count": 50, }), ("https://www.pornpics.com/channels/femjoy/", { "pattern": PornpicsGalleryExtractor.pattern, "range": "1-50", "count": 50, }), ("https://www.pornpics.com/pornstars/emma-brown/", { "pattern": PornpicsGalleryExtractor.pattern, "range": "1-50", "count": 50, }), ("https://pornpics.com/jp/?q=nature"), ("https://pornpics.com/it/channels/femjoy"), ("https://pornpics.com/pt/pornstars/emma-brown"), ) def galleries(self): url = self.root + "/search/srch.php" params = { "q" : self.item.replace("-", " "), "lang" : "en", "offset": 0, } return self._pagination(url, params) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675803597.0 gallery_dl-1.25.0/gallery_dl/extractor/pururin.py0000644000175000017500000000755414370535715020645 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://pururin.to/""" from .common import GalleryExtractor from .. 
import text, util import binascii class PururinGalleryExtractor(GalleryExtractor): """Extractor for image galleries on pururin.io""" category = "pururin" pattern = r"(?:https?://)?(?:www\.)?pururin\.[ti]o/(?:gallery|read)/(\d+)" test = ( ("https://pururin.to/gallery/38661/iowant-2", { "pattern": r"https://cdn.pururin.[ti]o/\w+" r"/images/data/\d+/\d+\.jpg", "keyword": { "title" : "re:I ?owant 2!!", "title_en" : "re:I ?owant 2!!", "title_jp" : "", "gallery_id": 38661, "count" : 19, "artist" : ["Shoda Norihiro"], "group" : ["Obsidian Order"], "parody" : ["Kantai Collection"], "characters": ["Iowa", "Teitoku"], "tags" : list, "type" : "Doujinshi", "collection": "I owant you!", "convention": "C92", "rating" : float, "uploader" : "demo", "scanlator" : "mrwayne", "lang" : "en", "language" : "English", } }), ("https://pururin.to/gallery/7661/unisis-team-vanilla", { "count": 17, }), ("https://pururin.io/gallery/38661/iowant-2"), ) root = "https://pururin.to" def __init__(self, match): self.gallery_id = match.group(1) url = "{}/gallery/{}/x".format(self.root, self.gallery_id) GalleryExtractor.__init__(self, match, url) self._ext = "" self._cnt = 0 def metadata(self, page): extr = text.extract_from(page) def _lst(key, e=extr): return [ text.unescape(item) for item in text.extract_iter(e(key, ""), 'title="', '"') ] def _str(key, e=extr): return text.unescape(text.extract( e(key, ""), 'title="', '"')[0] or "") url = "{}/read/{}/01/x".format(self.root, self.gallery_id) page = self.request(url).text info = util.json_loads(binascii.a2b_base64(text.extr( page, 'Artist"), "group" : _lst("Circle"), "parody" : _lst("Parody"), "tags" : _lst("Contents"), "type" : _str("Category"), "characters": _lst("Character"), "collection": _str("Collection"), "language" : _str("Language"), "scanlator" : _str("Scanlator"), "convention": _str("Convention"), "uploader" : text.remove_html(extr("Uploader", "")), "rating" : text.parse_float(extr(" :rating='" , "'")), } data["lang"] = util.language_to_code(data["language"]) return data def images(self, _): ufmt = "https://cdn.pururin.to/assets/images/data/{}/{{}}.{}".format( self.gallery_id, self._ext) return [(ufmt.format(num), None) for num in range(1, self._cnt + 1)] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675803953.0 gallery_dl-1.25.0/gallery_dl/extractor/reactor.py0000644000175000017500000002542514370536461020574 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Generic extractors for *reactor sites""" from .common import BaseExtractor, Message from .. 
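# --- illustrative sketch (not part of gallery-dl) ---------------------------
# Pururin's reader page embeds the gallery information as a base64-encoded
# JSON blob inside an HTML attribute; the metadata() method above cuts that
# attribute value out of the page and decodes it with binascii before
# parsing.  The decoding step in isolation, using only the standard library;
# the attribute value passed in would come from the page, as shown above.
import binascii
import json

def decode_gallery_blob(blob):
    return json.loads(binascii.a2b_base64(blob))

# info = decode_gallery_blob(encoded_attribute_value)
# info["id"], info["title"], ...
# -----------------------------------------------------------------------------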
import text, util import urllib.parse class ReactorExtractor(BaseExtractor): """Base class for *reactor.cc extractors""" basecategory = "reactor" filename_fmt = "{post_id}_{num:>02}{title[:100]:?_//}.{extension}" archive_fmt = "{post_id}_{num}" request_interval = 5.0 def __init__(self, match): BaseExtractor.__init__(self, match) url = text.ensure_http_scheme(match.group(0), "http://") pos = url.index("/", 10) self.root, self.path = url[:pos], url[pos:] self.session.headers["Referer"] = self.root self.gif = self.config("gif", False) if self.category == "reactor": # set category based on domain name netloc = urllib.parse.urlsplit(self.root).netloc self.category = netloc.rpartition(".")[0] def items(self): data = self.metadata() yield Message.Directory, data for post in self.posts(): for image in self._parse_post(post): url = image["url"] image.update(data) yield Message.Url, url, text.nameext_from_url(url, image) def metadata(self): """Collect metadata for extractor-job""" return {} def posts(self): """Return all relevant post-objects""" return self._pagination(self.root + self.path) def _pagination(self, url): while True: response = self.request(url) if response.history: # sometimes there is a redirect from # the last page of a listing (.../tag//1) # to the first page (.../tag/) # which could cause an endless loop cnt_old = response.history[0].url.count("/") cnt_new = response.url.count("/") if cnt_old == 5 and cnt_new == 4: return page = response.text yield from text.extract_iter( page, '
    ', '
    ') try: pos = page.index("class='next'") pos = page.rindex("class='current'", 0, pos) url = self.root + text.extract(page, "href='", "'", pos)[0] except (ValueError, TypeError): return def _parse_post(self, post): post, _, script = post.partition('").rstrip("\n\r;")) class XhamsterUserExtractor(XhamsterExtractor): """Extractor for all galleries of an xhamster user""" subcategory = "user" pattern = BASE_PATTERN + r"/users/([^/?#]+)(?:/photos)?/?(?:$|[?#])" test = ( ("https://xhamster.com/users/goldenpalomino/photos", { "pattern": XhamsterGalleryExtractor.pattern, "count": 50, "range": "1-50", }), ("https://xhamster.com/users/nickname68"), ) def __init__(self, match): XhamsterExtractor.__init__(self, match) self.user = match.group(2) def items(self): url = "{}/users/{}/photos".format(self.root, self.user) data = {"_extractor": XhamsterGalleryExtractor} while url: extr = text.extract_from(self.request(url).text) while True: url = extr('thumb-image-container role-pop" href="', '"') if not url: break yield Message.Queue, url, data url = extr('data-page="next" href="', '"') ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1675804813.0 gallery_dl-1.25.0/gallery_dl/extractor/xvideos.py0000644000175000017500000001145214370540215020601 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.xvideos.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util class XvideosBase(): """Base class for xvideos extractors""" category = "xvideos" root = "https://www.xvideos.com" class XvideosGalleryExtractor(XvideosBase, GalleryExtractor): """Extractor for user profile galleries on xvideos.com""" subcategory = "gallery" directory_fmt = ("{category}", "{user[name]}", "{gallery[id]} {gallery[title]}") filename_fmt = "{category}_{gallery[id]}_{num:>03}.{extension}" archive_fmt = "{gallery[id]}_{num}" pattern = (r"(?:https?://)?(?:www\.)?xvideos\.com" r"/(?:profiles|amateur-channels|model-channels)" r"/([^/?#]+)/photos/(\d+)") test = ( ("https://www.xvideos.com/profiles/pervertedcouple/photos/751031", { "count": 8, "pattern": r"https://profile-pics-cdn\d+\.xvideos-cdn\.com" r"/[^/]+\,\d+/videos/profiles/galleries/84/ca/37" r"/pervertedcouple/gal751031/pic_\d+_big\.jpg", "keyword": { "gallery": { "id" : 751031, "title": "Random Stuff", "tags" : list, }, "user": { "id" : 20245371, "name" : "pervertedcouple", "display" : "Pervertedcouple", "sex" : "Woman", "description": str, }, }, }), ("https://www.xvideos.com/amateur-channels/pervertedcouple/photos/12"), ("https://www.xvideos.com/model-channels/pervertedcouple/photos/12"), ) def __init__(self, match): self.user, self.gallery_id = match.groups() url = "{}/profiles/{}/photos/{}".format( self.root, self.user, self.gallery_id) GalleryExtractor.__init__(self, match, url) def metadata(self, page): extr = text.extract_from(page) title = extr('"title":"', '"') user = { "id" : text.parse_int(extr('"id_user":', ',')), "display": extr('"display":"', '"'), "sex" : extr('"sex":"', '"'), "name" : self.user, } user["description"] = extr( '', '').strip() tags = extr('Tagged:', '<').strip() return { "user": user, "gallery": { "id" : text.parse_int(self.gallery_id), "title": text.unescape(title), "tags" : text.unescape(tags).split(", ") if tags else [], }, } @staticmethod def 
images(page): """Return a list of all image urls for this gallery""" return [ (url, None) for url in text.extract_iter( page, '"))["data"] if not isinstance(data["galleries"], dict): return if "0" in data["galleries"]: del data["galleries"]["0"] galleries = [ { "id" : text.parse_int(gid), "title": text.unescape(gdata["title"]), "count": gdata["nb_pics"], "_extractor": XvideosGalleryExtractor, } for gid, gdata in data["galleries"].items() ] galleries.sort(key=lambda x: x["id"]) for gallery in galleries: url = "https://www.xvideos.com/profiles/{}/photos/{}".format( self.user, gallery["id"]) yield Message.Queue, url, gallery ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1653657633.0 gallery_dl-1.25.0/gallery_dl/extractor/ytdl.py0000644000175000017500000001165614244150041020074 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for sites supported by youtube-dl""" from .common import Extractor, Message from .. import ytdl, config, exception class YoutubeDLExtractor(Extractor): """Generic extractor for youtube-dl supported URLs""" category = "ytdl" directory_fmt = ("{category}", "{subcategory}") filename_fmt = "{title}-{id}.{extension}" archive_fmt = "{extractor_key} {id}" pattern = r"ytdl:(.*)" test = ("ytdl:https://www.youtube.com/watch?v=BaW_jenozKc&t=1s&end=9",) def __init__(self, match): # import main youtube_dl module ytdl_module = ytdl.import_module(config.get( ("extractor", "ytdl"), "module")) self.ytdl_module_name = ytdl_module.__name__ # find suitable youtube_dl extractor self.ytdl_url = url = match.group(1) generic = config.interpolate(("extractor", "ytdl"), "generic", True) if generic == "force": self.ytdl_ie_key = "Generic" self.force_generic_extractor = True else: for ie in ytdl_module.extractor.gen_extractor_classes(): if ie.suitable(url): self.ytdl_ie_key = ie.ie_key() break if not generic and self.ytdl_ie_key == "Generic": raise exception.NoExtractorError() self.force_generic_extractor = False # set subcategory to youtube_dl extractor's key self.subcategory = self.ytdl_ie_key Extractor.__init__(self, match) def items(self): # import subcategory module ytdl_module = ytdl.import_module( config.get(("extractor", "ytdl", self.subcategory), "module") or self.ytdl_module_name) self.log.debug("Using %s", ytdl_module) # construct YoutubeDL object extr_opts = { "extract_flat" : "in_playlist", "force_generic_extractor": self.force_generic_extractor, } user_opts = { "retries" : self._retries, "socket_timeout" : self._timeout, "nocheckcertificate" : not self._verify, } if self._proxies: user_opts["proxy"] = self._proxies.get("http") username, password = self._get_auth_info() if username: user_opts["username"], user_opts["password"] = username, password del username, password ytdl_instance = ytdl.construct_YoutubeDL( ytdl_module, self, user_opts, extr_opts) # transfer cookies to ytdl cookies = self.session.cookies if cookies: set_cookie = ytdl_instance.cookiejar.set_cookie for cookie in cookies: set_cookie(cookie) # extract youtube_dl info_dict try: info_dict = ytdl_instance._YoutubeDL__extract_info( self.ytdl_url, ytdl_instance.get_info_extractor(self.ytdl_ie_key), False, {}, True) except ytdl_module.utils.YoutubeDLError: raise exception.StopExtraction("Failed to extract video data") if not info_dict: return elif "entries" in 
info_dict: results = self._process_entries( ytdl_module, ytdl_instance, info_dict["entries"]) else: results = (info_dict,) # yield results for info_dict in results: info_dict["extension"] = None info_dict["_ytdl_info_dict"] = info_dict info_dict["_ytdl_instance"] = ytdl_instance url = "ytdl:" + (info_dict.get("url") or info_dict.get("webpage_url") or self.ytdl_url) yield Message.Directory, info_dict yield Message.Url, url, info_dict def _process_entries(self, ytdl_module, ytdl_instance, entries): for entry in entries: if not entry: continue elif entry.get("_type") in ("url", "url_transparent"): try: info_dict = ytdl_instance.extract_info( entry["url"], False, ie_key=entry.get("ie_key")) except ytdl_module.utils.YoutubeDLError: continue if not info_dict: continue elif "entries" in info_dict: yield from self._process_entries( ytdl_module, ytdl_instance, info_dict["entries"]) else: yield info_dict else: yield entry if config.get(("extractor", "ytdl"), "enabled"): # make 'ytdl:' prefix optional YoutubeDLExtractor.pattern = r"(?:ytdl:)?(.*)" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1674475069.0 gallery_dl-1.25.0/gallery_dl/extractor/zerochan.py0000644000175000017500000001702214363473075020743 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.zerochan.net/""" from .booru import BooruExtractor from ..cache import cache from .. import text, exception BASE_PATTERN = r"(?:https?://)?(?:www\.)?zerochan\.net" class ZerochanExtractor(BooruExtractor): """Base class for zerochan extractors""" category = "zerochan" root = "https://www.zerochan.net" filename_fmt = "{id}.{extension}" archive_fmt = "{id}" cookiedomain = ".zerochan.net" cookienames = ("z_id", "z_hash") def login(self): self._logged_in = True if not self._check_cookies(self.cookienames): username, password = self._get_auth_info() if username: self._update_cookies(self._login_impl(username, password)) else: self._logged_in = False @cache(maxage=90*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/login" headers = { "Origin" : self.root, "Referer" : url, } data = { "ref" : "/", "name" : username, "password": password, "login" : "Login", } response = self.request(url, method="POST", headers=headers, data=data) if not response.history: raise exception.AuthenticationError() return response.cookies def _parse_entry_html(self, entry_id): url = "{}/{}".format(self.root, entry_id) extr = text.extract_from(self.request(url).text) data = { "id" : text.parse_int(entry_id), "author" : extr('"author": "', '"'), "file_url": extr('"contentUrl": "', '"'), "date" : text.parse_datetime(extr('"datePublished": "', '"')), "width" : text.parse_int(extr('"width": "', ' ')), "height" : text.parse_int(extr('"height": "', ' ')), "size" : text.parse_bytes(extr('"contentSize": "', 'B')), "path" : text.split_html(extr( 'class="breadcrumbs', '
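# --- illustrative sketch (not part of gallery-dl) ---------------------------
# _parse_entry_html() above reads its fields ("author", "contentUrl",
# "datePublished", "width", "height", "contentSize", ...) by scanning the
# entry page for quoted keys, which appear to belong to an embedded
# schema.org JSON-LD block.  A standalone sketch that instead parses such a
# block with the json module; the exact script-tag markup is an assumption,
# not confirmed by the code above.
import json

def parse_json_ld(page):
    marker = '"application/ld+json"'          # assumed script-tag attribute
    start = page.index(">", page.index(marker)) + 1
    end = page.index("</script>", start)
    return json.loads(page[start:end])

# data = parse_json_ld(html)
# data.get("contentUrl"), data.get("datePublished"), ...
# -----------------------------------------------------------------------------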

    '))[2:], "uploader": extr('href="/user/', '"'), "tags" : extr('