././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1649443808.1847284 gallery_dl-1.21.1/0000755000175000017500000000000014224101740012401 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1649443807.0 gallery_dl-1.21.1/CHANGELOG.md0000644000175000017500000042433414224101737014232 0ustar00mikemike# Changelog ## 1.21.1 - 2022-04-08 ### Additions - [gofile] add gofile.io extractor ([#2364](https://github.com/mikf/gallery-dl/issues/2364)) - [instagram] add `previews` option ([#2135](https://github.com/mikf/gallery-dl/issues/2135)) - [kemonoparty] add `duplicates` option ([#2440](https://github.com/mikf/gallery-dl/issues/2440)) - [pinterest] add extractor for created pins ([#2452](https://github.com/mikf/gallery-dl/issues/2452)) - [pinterest] support multiple files per pin ([#1619](https://github.com/mikf/gallery-dl/issues/1619), [#2452](https://github.com/mikf/gallery-dl/issues/2452)) - [telegraph] Add telegra.ph extractor ([#2312](https://github.com/mikf/gallery-dl/issues/2312)) - [twitter] add `syndication` option ([#2354](https://github.com/mikf/gallery-dl/issues/2354)) - [twitter] accept fxtwitter.com URLs ([#2484](https://github.com/mikf/gallery-dl/issues/2484)) - [downloader:http] support using an arbitrary method and sending POST data ([#2433](https://github.com/mikf/gallery-dl/issues/2433)) - [postprocessor:metadata] implement archive options ([#2421](https://github.com/mikf/gallery-dl/issues/2421)) - [postprocessor:ugoira] add `mtime` option ([#2307](https://github.com/mikf/gallery-dl/issues/2307)) - [postprocessor:ugoira] support setting timecodes with `mkvmerge` ([#1550](https://github.com/mikf/gallery-dl/issues/1550)) - [formatter] support evaluating f-string literals - add `--ugoira-conv-copy` command-line option ([#1550](https://github.com/mikf/gallery-dl/issues/1550)) - implement a `contains()` function for filter statements ([#2446](https://github.com/mikf/gallery-dl/issues/2446)) ### Fixes - [aryion] provide correct `date` metadata independent of DST - [furaffinity] fix search result pagination ([#2402](https://github.com/mikf/gallery-dl/issues/2402)) - [hitomi] update and fix metadata extraction ([#2444](https://github.com/mikf/gallery-dl/issues/2444)) - [kissgoddess] extract all images ([#2473](https://github.com/mikf/gallery-dl/issues/2473)) - [mangasee] unescape manga names ([#2454](https://github.com/mikf/gallery-dl/issues/2454)) - [newgrounds] update and fix pagination ([#2456](https://github.com/mikf/gallery-dl/issues/2456)) - [newgrounds] warn about age-restricted posts ([#2456](https://github.com/mikf/gallery-dl/issues/2456)) - [pinterest] do not force `m3u8_native` for video downloads ([#2436](https://github.com/mikf/gallery-dl/issues/2436)) - [twibooru] fix posts without `name` ([#2434](https://github.com/mikf/gallery-dl/issues/2434)) - [unsplash] replace dash with space in search API queries ([#2429](https://github.com/mikf/gallery-dl/issues/2429)) - [postprocessor:mtime] fix timestamps from datetime objects ([#2307](https://github.com/mikf/gallery-dl/issues/2307)) - fix yet another bug in `_check_cookies()` ([#2372](https://github.com/mikf/gallery-dl/issues/2372)) - fix loading/storing cookies without domain ## 1.21.0 - 2022-03-14 ### Additions - [fantia] add `num` enumeration index ([#2377](https://github.com/mikf/gallery-dl/issues/2377)) - [fantia] support "Blog Post" content ([#2381](https://github.com/mikf/gallery-dl/issues/2381)) - [imagebam] add support for /view/ paths 
([#2378](https://github.com/mikf/gallery-dl/issues/2378)) - [kemonoparty] match beta.kemono.party URLs ([#2348](https://github.com/mikf/gallery-dl/issues/2348)) - [kissgoddess] add `gallery` and `model` extractors ([#1052](https://github.com/mikf/gallery-dl/issues/1052), [#2304](https://github.com/mikf/gallery-dl/issues/2304)) - [mememuseum] add `tag` and `post` extractors ([#2264](https://github.com/mikf/gallery-dl/issues/2264)) - [newgrounds] add `post_url` metadata field ([#2328](https://github.com/mikf/gallery-dl/issues/2328)) - [patreon] add `image_large` file type ([#2257](https://github.com/mikf/gallery-dl/issues/2257)) - [toyhouse] support `art` listings ([#1546](https://github.com/mikf/gallery-dl/issues/1546), [#2331](https://github.com/mikf/gallery-dl/issues/2331)) - [twibooru] add extractors for searches, galleries, and posts ([#2219](https://github.com/mikf/gallery-dl/issues/2219)) - [postprocessor:metadata] implement `mtime` option ([#2307](https://github.com/mikf/gallery-dl/issues/2307)) - [postprocessor:mtime] add `event` option ([#2307](https://github.com/mikf/gallery-dl/issues/2307)) - add fish shell completion ([#2363](https://github.com/mikf/gallery-dl/issues/2363)) - add `timedelta` class to global namespace in filter expressions ### Changes - [seiga] require authentication with `user_session` cookie ([#2372](https://github.com/mikf/gallery-dl/issues/2372)) - remove username & password login due to 2FA - refactor proxy support ([#2357](https://github.com/mikf/gallery-dl/issues/2357)) - allow gallery-dl proxy settings to overwrite environment proxies - allow specifying different proxies for data extraction and download ### Fixes - [bunkr] fix mp4 downloads ([#2239](https://github.com/mikf/gallery-dl/issues/2239)) - [fanbox] fetch data for each individual post ([#2388](https://github.com/mikf/gallery-dl/issues/2388)) - [hentaicosplays] send `Referer` header ([#2317](https://github.com/mikf/gallery-dl/issues/2317)) - [imagebam] set `nsfw_inter` cookie ([#2334](https://github.com/mikf/gallery-dl/issues/2334)) - [kemonoparty] limit default filename length ([#2373](https://github.com/mikf/gallery-dl/issues/2373)) - [mangadex] fix chapters without `translatedLanguage` ([#2352](https://github.com/mikf/gallery-dl/issues/2352)) - [newgrounds] fix video descriptions ([#2328](https://github.com/mikf/gallery-dl/issues/2328)) - [skeb] add `sent-requests` option ([#2322](https://github.com/mikf/gallery-dl/issues/2322), [#2330](https://github.com/mikf/gallery-dl/issues/2330)) - [slideshare] fix extraction - [subscribestar] unescape attachment URLs ([#2370](https://github.com/mikf/gallery-dl/issues/2370)) - [twitter] fix handling of 429 Too Many Requests responses ([#2339](https://github.com/mikf/gallery-dl/issues/2339)) - [twitter] warn about age-restricted Tweets ([#2354](https://github.com/mikf/gallery-dl/issues/2354)) - [twitter] handle Tweets with "softIntervention" entries - [twitter] update query hashes - fix another bug in `_check_cookies()` ([#2160](https://github.com/mikf/gallery-dl/issues/2160)) ## 1.20.5 - 2022-02-14 ### Additions - [furaffinity] add `layout` option ([#2277](https://github.com/mikf/gallery-dl/issues/2277)) - [lightroom] add Lightroom gallery extractor ([#2263](https://github.com/mikf/gallery-dl/issues/2263)) - [reddit] support standalone submissions on personal user pages ([#2301](https://github.com/mikf/gallery-dl/issues/2301)) - [redgifs] support i.redgifs.com URLs ([#2300](https://github.com/mikf/gallery-dl/issues/2300)) - [wallpapercave] add extractor 
for images and search results ([#2205](https://github.com/mikf/gallery-dl/issues/2205)) - add `signals-ignore` option ([#2296](https://github.com/mikf/gallery-dl/issues/2296)) ### Changes - [danbooru] merge `danbooru` and `e621` extractors - support `atfbooru` ([#2283](https://github.com/mikf/gallery-dl/issues/2283)) - remove support for old e621 tag search URLs ### Fixes - [furaffinity] improve new/old layout detection ([#2277](https://github.com/mikf/gallery-dl/issues/2277)) - [imgbox] fix ImgboxExtractor ([#2281](https://github.com/mikf/gallery-dl/issues/2281)) - [inkbunny] rename search parameters to their API equivalents - [kemonoparty] handle files without names ([#2276](https://github.com/mikf/gallery-dl/issues/2276)) - [twitter] fix extraction ([#2275](https://github.com/mikf/gallery-dl/issues/2275), [#2295](https://github.com/mikf/gallery-dl/issues/2295)) - [vk] fix infinite pagination loops ([#2297](https://github.com/mikf/gallery-dl/issues/2297)) - [downloader:ytdl] make `ImportError`s non-fatal ([#2273](https://github.com/mikf/gallery-dl/issues/2273)) ## 1.20.4 - 2022-02-06 ### Additions - [e621] add `favorite` extractor ([#2250](https://github.com/mikf/gallery-dl/issues/2250)) - [hitomi] add `format` option ([#2260](https://github.com/mikf/gallery-dl/issues/2260)) - [kohlchan] add Kohlchan extractors ([#2251](https://github.com/mikf/gallery-dl/issues/2251)) - [sexcom] add `pins` extractor ([#2265](https://github.com/mikf/gallery-dl/issues/2265)) - [twitter] add `warnings` option ([#2258](https://github.com/mikf/gallery-dl/issues/2258)) - add ability to disable TLS 1.2 ([#2243](https://github.com/mikf/gallery-dl/issues/2243)) - add examples for custom gelbooru instances ([#2262](https://github.com/mikf/gallery-dl/issues/2262)) ### Fixes - [bunkr] fix mp4 downloads ([#2239](https://github.com/mikf/gallery-dl/issues/2239)) - [gelbooru] improve and fix pagination ([#2230](https://github.com/mikf/gallery-dl/issues/2230), [#2232](https://github.com/mikf/gallery-dl/issues/2232)) - [hitomi] "fix" 403 errors ([#2260](https://github.com/mikf/gallery-dl/issues/2260)) - [kemonoparty] fix downloading smaller text files ([#2267](https://github.com/mikf/gallery-dl/issues/2267)) - [patreon] disable TLS 1.2 by default ([#2249](https://github.com/mikf/gallery-dl/issues/2249)) - [twitter] restore errors for protected timelines etc ([#2237](https://github.com/mikf/gallery-dl/issues/2237)) - [twitter] restore `logout` functionality ([#1719](https://github.com/mikf/gallery-dl/issues/1719)) - [twitter] provide fallback URLs for card images - [weibo] update pagination code ([#2244](https://github.com/mikf/gallery-dl/issues/2244)) ## 1.20.3 - 2022-01-26 ### Fixes - [kemonoparty] fix DMs extraction ([#2008](https://github.com/mikf/gallery-dl/issues/2008)) - [twitter] fix crash on Tweets with deleted quotes ([#2225](https://github.com/mikf/gallery-dl/issues/2225)) - [twitter] fix crash on suspended Tweets without `legacy` entry ([#2216](https://github.com/mikf/gallery-dl/issues/2216)) - [twitter] fix crash on unified cards without `type` - [twitter] prevent crash on invalid/deleted Retweets ([#2225](https://github.com/mikf/gallery-dl/issues/2225)) - [twitter] update query hashes ## 1.20.2 - 2022-01-24 ### Additions - [twitter] add `event` extractor (closes [#2109](https://github.com/mikf/gallery-dl/issues/2109)) - [twitter] support image_carousel_website unified cards - add `--source-address` command-line option ([#2206](https://github.com/mikf/gallery-dl/issues/2206)) - add environment variable syntax 
to formatting.md ([#2065](https://github.com/mikf/gallery-dl/issues/2065)) ### Changes - [twitter] changes to `cards` option - enable `cards` by default - require `cards` to be set to `"ytdl"` to invoke youtube-dl/yt-dlp on unsupported cards ### Fixes - [blogger] support new image domain ([#2204](https://github.com/mikf/gallery-dl/issues/2204)) - [gelbooru] improve video file detection ([#2188](https://github.com/mikf/gallery-dl/issues/2188)) - [hitomi] fix `tag` extraction ([#2189](https://github.com/mikf/gallery-dl/issues/2189)) - [instagram] fix highlights extraction ([#2197](https://github.com/mikf/gallery-dl/issues/2197)) - [mangadex] re-enable warning for external chapters ([#2193](https://github.com/mikf/gallery-dl/issues/2193)) - [newgrounds] set suitabilities filter before starting a search ([#2173](https://github.com/mikf/gallery-dl/issues/2173)) - [philomena] fix search parameter escaping ([#2215](https://github.com/mikf/gallery-dl/issues/2215)) - [reddit] allow downloading from quarantined subreddits ([#2180](https://github.com/mikf/gallery-dl/issues/2180)) - [sexcom] extend URL pattern ([#2220](https://github.com/mikf/gallery-dl/issues/2220)) - [twitter] update to GraphQL API ([#2212](https://github.com/mikf/gallery-dl/issues/2212)) ## 1.20.1 - 2022-01-08 ### Additions - [newgrounds] add `search` extractor ([#2161](https://github.com/mikf/gallery-dl/issues/2161)) ### Changes - restore `-d/--dest` functionality from before 1.20.0 ([#2148](https://github.com/mikf/gallery-dl/issues/2148)) - change short option for `--directory` to `-D` ### Fixes - [gelbooru] handle changed API response format ([#2157](https://github.com/mikf/gallery-dl/issues/2157)) - [hitomi] fix image URLs ([#2153](https://github.com/mikf/gallery-dl/issues/2153)) - [mangadex] fix extraction ([#2177](https://github.com/mikf/gallery-dl/issues/2177)) - [rule34] use `https://api.rule34.xxx` for API requests - fix cookie checks for patreon, fanbox, fantia - improve UNC path handling ([#2126](https://github.com/mikf/gallery-dl/issues/2126)) ## 1.20.0 - 2021-12-29 ### Additions - [500px] add `favorite` extractor ([#1927](https://github.com/mikf/gallery-dl/issues/1927)) - [exhentai] add `source` option - [fanbox] support pixiv redirects ([#2122](https://github.com/mikf/gallery-dl/issues/2122)) - [inkbunny] add `search` extractor ([#2094](https://github.com/mikf/gallery-dl/issues/2094)) - [kemonoparty] support coomer.party ([#2100](https://github.com/mikf/gallery-dl/issues/2100)) - [lolisafe] add generic album extractor for lolisafe/chibisafe instances ([#2038](https://github.com/mikf/gallery-dl/issues/2038), [#2105](https://github.com/mikf/gallery-dl/issues/2105)) - [rule34us] add `tag` and `post` extractors ([#1527](https://github.com/mikf/gallery-dl/issues/1527)) - add a generic extractor ([#735](https://github.com/mikf/gallery-dl/issues/735), [#683](https://github.com/mikf/gallery-dl/issues/683)) - add `-d/--directory` and `-f/--filename` command-line options - add `--sleep-request` and `--sleep-extractor` command-line options - allow specifying `sleep-*` options as string ### Changes - [cyberdrop] include file ID in default filenames - [hitomi] disable `metadata` by default - [kemonoparty] use `service` as subcategory ([#2147](https://github.com/mikf/gallery-dl/issues/2147)) - [kemonoparty] change default `files` order to `attachments,file,inline` ([#1991](https://github.com/mikf/gallery-dl/issues/1991)) - [output] write download progress indicator to stderr - [ytdl] prefer yt-dlp over youtube-dl 
([#1850](https://github.com/mikf/gallery-dl/issues/1850), [#2028](https://github.com/mikf/gallery-dl/issues/2028)) - rename `--write-infojson` to `--write-info-json` ### Fixes - [500px] create directories per photo - [artstation] create directories per asset ([#2136](https://github.com/mikf/gallery-dl/issues/2136)) - [deviantart] use `/browse/newest` for most-recent searches ([#2096](https://github.com/mikf/gallery-dl/issues/2096)) - [hitomi] fix image URLs - [instagram] fix error when PostPage data is not in GraphQL format ([#2037](https://github.com/mikf/gallery-dl/issues/2037)) - [instagran] match post URLs with usernames ([#2085](https://github.com/mikf/gallery-dl/issues/2085)) - [instagram] allow downloading specific stories ([#2088](https://github.com/mikf/gallery-dl/issues/2088)) - [furaffinity] warn when no session cookies were found - [pixiv] respect date ranges in search URLs ([#2133](https://github.com/mikf/gallery-dl/issues/2133)) - [sexcom] fix and improve embed extraction ([#2145](https://github.com/mikf/gallery-dl/issues/2145)) - [tumblrgallery] fix extraction ([#2112](https://github.com/mikf/gallery-dl/issues/2112)) - [tumblrgallery] improve `id` extraction ([#2115](https://github.com/mikf/gallery-dl/issues/2115)) - [tumblrgallery] improve search pagination ([#2132](https://github.com/mikf/gallery-dl/issues/2132)) - [twitter] include `4096x4096` as a default image fallback ([#1881](https://github.com/mikf/gallery-dl/issues/1881), [#2107](https://github.com/mikf/gallery-dl/issues/2107)) - [ytdl] update argument parsing to latest yt-dlp changes ([#2124](https://github.com/mikf/gallery-dl/issues/2124)) - handle UNC paths ([#2113](https://github.com/mikf/gallery-dl/issues/2113)) ## 1.19.3 - 2021-11-27 ### Additions - [dynastyscans] add `manga` extractor ([#2035](https://github.com/mikf/gallery-dl/issues/2035)) - [instagram] include user metadata for `tagged` downloads ([#2024](https://github.com/mikf/gallery-dl/issues/2024)) - [kemonoparty] implement `files` option ([#1991](https://github.com/mikf/gallery-dl/issues/1991)) - [kemonoparty] add `dms` option ([#2008](https://github.com/mikf/gallery-dl/issues/2008)) - [mangadex] always provide `artist`, `author`, and `group` metadata fields ([#2049](https://github.com/mikf/gallery-dl/issues/2049)) - [philomena] support furbooru.org ([#1995](https://github.com/mikf/gallery-dl/issues/1995)) - [reactor] support thatpervert.com ([#2029](https://github.com/mikf/gallery-dl/issues/2029)) - [shopify] support loungeunderwear.com ([#2053](https://github.com/mikf/gallery-dl/issues/2053)) - [skeb] add `thumbnails` option ([#2047](https://github.com/mikf/gallery-dl/issues/2047), [#2051](https://github.com/mikf/gallery-dl/issues/2051)) - [subscribestar] add `num` enumeration index ([#2040](https://github.com/mikf/gallery-dl/issues/2040)) - [subscribestar] emit metadata for posts without media ([#1569](https://github.com/mikf/gallery-dl/issues/1569)) - [ytdl] implement `cmdline-args` and `config-file` options to allow parsing ytdl command-line options ([#1680](https://github.com/mikf/gallery-dl/issues/1680)) - [formatter] implement `D` format specifier - extend `blacklist`/`whitelist` syntax ([#2025](https://github.com/mikf/gallery-dl/issues/2025)) ### Fixes - [dynastyscans] provide `date` as datetime object ([#2050](https://github.com/mikf/gallery-dl/issues/2050)) - [exhentai] fix extraction for disowned galleries ([#2055](https://github.com/mikf/gallery-dl/issues/2055)) - [gelbooru] apply workaround for pagination limits - [kemonoparty] skip 
duplicate files ([#2032](https://github.com/mikf/gallery-dl/issues/2032), [#1991](https://github.com/mikf/gallery-dl/issues/1991), [#1899](https://github.com/mikf/gallery-dl/issues/1899)) - [kemonoparty] provide `date` metadata for gumroad ([#2007](https://github.com/mikf/gallery-dl/issues/2007)) - [mangoxo] fix metadata extraction - [twitter] distinguish between fatal & nonfatal errors ([#2020](https://github.com/mikf/gallery-dl/issues/2020)) - [twitter] fix extractor for direct image links ([#2030](https://github.com/mikf/gallery-dl/issues/2030)) - [webtoons] use download URLs that do not require a `Referer` header ([#2005](https://github.com/mikf/gallery-dl/issues/2005)) - [ytdl] improve error handling ([#1680](https://github.com/mikf/gallery-dl/issues/1680)) - [downloader:ytdl] prevent crash in `_progress_hook()` ([#1680](https://github.com/mikf/gallery-dl/issues/1680)) ### Removals - [seisoparty] remove module ## 1.19.2 - 2021-11-05 ### Additions - [kemonoparty] add `comments` option ([#1980](https://github.com/mikf/gallery-dl/issues/1980)) - [skeb] add `user` and `post` extractors ([#1031](https://github.com/mikf/gallery-dl/issues/1031), [#1971](https://github.com/mikf/gallery-dl/issues/1971)) - [twitter] add `pinned` option - support accessing environment variables and the current local datetime in format strings ([#1968](https://github.com/mikf/gallery-dl/issues/1968)) - add special type format strings to docs ([#1987](https://github.com/mikf/gallery-dl/issues/1987)) ### Fixes - [cyberdrop] fix video extraction ([#1993](https://github.com/mikf/gallery-dl/issues/1993)) - [deviantart] fix `index` values for stashed deviations - [gfycat] provide consistent `userName` values for `user` downloads ([#1962](https://github.com/mikf/gallery-dl/issues/1962)) - [gfycat] show warning when there are no available formats - [hitomi] fix image URLs ([#1975](https://github.com/mikf/gallery-dl/issues/1975), [#1982](https://github.com/mikf/gallery-dl/issues/1982), [#1988](https://github.com/mikf/gallery-dl/issues/1988)) - [instagram] update query hashes - [mangakakalot] update domain and fix extraction - [mangoxo] fix login and extraction - [reddit] prevent crash for galleries with no `media_metadata` ([#2001](https://github.com/mikf/gallery-dl/issues/2001)) - [redgifs] update to API v2 ([#1984](https://github.com/mikf/gallery-dl/issues/1984)) - fix calculating retry sleep times ([#1990](https://github.com/mikf/gallery-dl/issues/1990)) ## 1.19.1 - 2021-10-24 ### Additions - [inkbunny] add `following` extractor ([#515](https://github.com/mikf/gallery-dl/issues/515)) - [inkbunny] add `pool` extractor ([#1937](https://github.com/mikf/gallery-dl/issues/1937)) - [kemonoparty] add `discord` extractor ([#1827](https://github.com/mikf/gallery-dl/issues/1827), [#1940](https://github.com/mikf/gallery-dl/issues/1940)) - [nhentai] add `tag` extractor ([#1950](https://github.com/mikf/gallery-dl/issues/1950), [#1955](https://github.com/mikf/gallery-dl/issues/1955)) - [patreon] add `files` option ([#1935](https://github.com/mikf/gallery-dl/issues/1935)) - [picarto] add `gallery` extractor ([#1931](https://github.com/mikf/gallery-dl/issues/1931)) - [pixiv] add `sketch` extractor ([#1497](https://github.com/mikf/gallery-dl/issues/1497)) - [seisoparty] add `favorite` extractor ([#1906](https://github.com/mikf/gallery-dl/issues/1906)) - [twitter] add `size` option ([#1881](https://github.com/mikf/gallery-dl/issues/1881)) - [vk] add `album` extractor ([#474](https://github.com/mikf/gallery-dl/issues/474), 
[#1952](https://github.com/mikf/gallery-dl/issues/1952)) - [postprocessor:compare] add `equal` option ([#1592](https://github.com/mikf/gallery-dl/issues/1592)) ### Fixes - [cyberdrop] extract direct download URLs ([#1943](https://github.com/mikf/gallery-dl/issues/1943)) - [deviantart] update `search` argument handling ([#1911](https://github.com/mikf/gallery-dl/issues/1911)) - [deviantart] full resolution for non-downloadable images ([#293](https://github.com/mikf/gallery-dl/issues/293)) - [furaffinity] unquote search queries ([#1958](https://github.com/mikf/gallery-dl/issues/1958)) - [inkbunny] match "long" URLs for pools and favorites ([#1937](https://github.com/mikf/gallery-dl/issues/1937)) - [kemonoparty] improve inline extraction ([#1899](https://github.com/mikf/gallery-dl/issues/1899)) - [mangadex] update parameter handling for API requests ([#1908](https://github.com/mikf/gallery-dl/issues/1908)) - [patreon] better filenames for `content` images ([#1954](https://github.com/mikf/gallery-dl/issues/1954)) - [redgifs][gfycat] provide fallback URLs ([#1962](https://github.com/mikf/gallery-dl/issues/1962)) - [downloader:ytdl] prevent crash in `_progress_hook()` - restore SOCKS support for Windows executables ## 1.19.0 - 2021-10-01 ### Additions - [aryion] add `tag` extractor ([#1849](https://github.com/mikf/gallery-dl/issues/1849)) - [desktopography] implement desktopography extractors ([#1740](https://github.com/mikf/gallery-dl/issues/1740)) - [deviantart] implement `auto-unwatch` option ([#1466](https://github.com/mikf/gallery-dl/issues/1466), [#1757](https://github.com/mikf/gallery-dl/issues/1757)) - [fantia] add `date` metadata field ([#1853](https://github.com/mikf/gallery-dl/issues/1853)) - [fappic] add `image` extractor ([#1898](https://github.com/mikf/gallery-dl/issues/1898)) - [gelbooru_v02] add `favorite` extractor ([#1834](https://github.com/mikf/gallery-dl/issues/1834)) - [kemonoparty] add `favorite` extractor ([#1824](https://github.com/mikf/gallery-dl/issues/1824)) - [kemonoparty] implement login with username & password ([#1824](https://github.com/mikf/gallery-dl/issues/1824)) - [mastodon] add `following` extractor ([#1891](https://github.com/mikf/gallery-dl/issues/1891)) - [mastodon] support specifying accounts by ID - [twitter] support `/with_replies` URLs ([#1833](https://github.com/mikf/gallery-dl/issues/1833)) - [twitter] add `quote_by` metadata field ([#1481](https://github.com/mikf/gallery-dl/issues/1481)) - [postprocessor:compare] extend `action` option ([#1592](https://github.com/mikf/gallery-dl/issues/1592)) - implement a download progress indicator ([#1519](https://github.com/mikf/gallery-dl/issues/1519)) - implement a `page-reverse` option ([#1854](https://github.com/mikf/gallery-dl/issues/1854)) - implement a way to specify extended format strings - allow specifying a minimum/maximum for `sleep-*` options ([#1835](https://github.com/mikf/gallery-dl/issues/1835)) - add a `--write-infojson` command-line option ### Changes - [cyberdrop] change directory name format ([#1871](https://github.com/mikf/gallery-dl/issues/1871)) - [instagram] update default delay to 6-12 seconds ([#1835](https://github.com/mikf/gallery-dl/issues/1835)) - [reddit] extend subcategory depending on input URL ([#1836](https://github.com/mikf/gallery-dl/issues/1836)) - move util.Formatter and util.PathFormat into their own modules ### Fixes - [artstation] use `/album/all` view for user portfolios ([#1826](https://github.com/mikf/gallery-dl/issues/1826)) - [aryion] update/improve pagination 
([#1849](https://github.com/mikf/gallery-dl/issues/1849)) - [deviantart] fix bug with fetching premium content ([#1879](https://github.com/mikf/gallery-dl/issues/1879)) - [deviantart] update default archive_fmt for single deviations ([#1874](https://github.com/mikf/gallery-dl/issues/1874)) - [erome] send Referer header for file downloads ([#1829](https://github.com/mikf/gallery-dl/issues/1829)) - [hiperdex] fix extraction - [kemonoparty] update file download URLs ([#1902](https://github.com/mikf/gallery-dl/issues/1902), [#1903](https://github.com/mikf/gallery-dl/issues/1903)) - [mangadex] fix extraction ([#1852](https://github.com/mikf/gallery-dl/issues/1852)) - [mangadex] fix retrieving chapters from "pornographic" titles ([#1908](https://github.com/mikf/gallery-dl/issues/1908)) - [nozomi] preserve case of search tags ([#1860](https://github.com/mikf/gallery-dl/issues/1860)) - [redgifs][gfycat] remove webtoken code ([#1907](https://github.com/mikf/gallery-dl/issues/1907)) - [twitter] ensure card entries have a `url` ([#1868](https://github.com/mikf/gallery-dl/issues/1868)) - implement a way to correctly shorten displayed filenames containing east-asian characters ([#1377](https://github.com/mikf/gallery-dl/issues/1377)) ## 1.18.4 - 2021-09-04 ### Additions - [420chan] add `thread` and `board` extractors ([#1773](https://github.com/mikf/gallery-dl/issues/1773)) - [deviantart] add `tag` extractor ([#1803](https://github.com/mikf/gallery-dl/issues/1803)) - [deviantart] add `comments` option ([#1800](https://github.com/mikf/gallery-dl/issues/1800)) - [deviantart] implement a `auto-watch` option ([#1466](https://github.com/mikf/gallery-dl/issues/1466), [#1757](https://github.com/mikf/gallery-dl/issues/1757)) - [foolfuuka] add `gallery` extractor ([#1785](https://github.com/mikf/gallery-dl/issues/1785)) - [furaffinity] expand URL pattern for searches ([#1780](https://github.com/mikf/gallery-dl/issues/1780)) - [kemonoparty] automatically generate required DDoS-GUARD cookies ([#1779](https://github.com/mikf/gallery-dl/issues/1779)) - [nhentai] add `favorite` extractor ([#1814](https://github.com/mikf/gallery-dl/issues/1814)) - [shopify] support windsorstore.com ([#1793](https://github.com/mikf/gallery-dl/issues/1793)) - [twitter] add `url` to user objects ([#1787](https://github.com/mikf/gallery-dl/issues/1787), [#1532](https://github.com/mikf/gallery-dl/issues/1532)) - [twitter] expand t.co links in user descriptions ([#1787](https://github.com/mikf/gallery-dl/issues/1787), [#1532](https://github.com/mikf/gallery-dl/issues/1532)) - show a warning if an extractor doesn`t yield any results ([#1428](https://github.com/mikf/gallery-dl/issues/1428), [#1759](https://github.com/mikf/gallery-dl/issues/1759)) - add a `j` format string conversion - implement a `fallback` option ([#1770](https://github.com/mikf/gallery-dl/issues/1770)) - implement a `path-strip` option ### Changes - [shopify] use API for product listings ([#1793](https://github.com/mikf/gallery-dl/issues/1793)) - update default User-Agent headers ### Fixes - [deviantart] prevent exceptions for "empty" videos ([#1796](https://github.com/mikf/gallery-dl/issues/1796)) - [exhentai] improve image limits check ([#1808](https://github.com/mikf/gallery-dl/issues/1808)) - [inkbunny] fix extraction ([#1816](https://github.com/mikf/gallery-dl/issues/1816)) - [mangadex] prevent exceptions for manga without English title ([#1815](https://github.com/mikf/gallery-dl/issues/1815)) - [oauth] use defaults when config values are set to `null` 
([#1778](https://github.com/mikf/gallery-dl/issues/1778)) - [pixiv] fix pixivision title extraction - [reddit] delay RedditAPI initialization ([#1813](https://github.com/mikf/gallery-dl/issues/1813)) - [twitter] improve error reporting ([#1759](https://github.com/mikf/gallery-dl/issues/1759)) - [twitter] fix issue when filtering quote tweets ([#1792](https://github.com/mikf/gallery-dl/issues/1792)) - [twitter] fix `logout` option ([#1719](https://github.com/mikf/gallery-dl/issues/1719)) ### Removals - [deviantart] remove the "you need session cookies to download mature scraps" warning ([#1777](https://github.com/mikf/gallery-dl/issues/1777), [#1776](https://github.com/mikf/gallery-dl/issues/1776)) - [foolslide] remove entry for kobato.hologfx.com ## 1.18.3 - 2021-08-13 ### Additions - [bbc] add `width` option ([#1706](https://github.com/mikf/gallery-dl/issues/1706)) - [danbooru] add `external` option ([#1747](https://github.com/mikf/gallery-dl/issues/1747)) - [furaffinity] add `external` option ([#1492](https://github.com/mikf/gallery-dl/issues/1492)) - [luscious] add `gif` option ([#1701](https://github.com/mikf/gallery-dl/issues/1701)) - [newgrounds] add `format` option ([#1729](https://github.com/mikf/gallery-dl/issues/1729)) - [reactor] add `gif` option ([#1701](https://github.com/mikf/gallery-dl/issues/1701)) - [twitter] warn about suspended accounts ([#1759](https://github.com/mikf/gallery-dl/issues/1759)) - [twitter] extend `replies` option ([#1254](https://github.com/mikf/gallery-dl/issues/1254)) - [twitter] add option to log out and retry when blocked ([#1719](https://github.com/mikf/gallery-dl/issues/1719)) - [wikieat] add `thread` and `board` extractors ([#1699](https://github.com/mikf/gallery-dl/issues/1699), [#1607](https://github.com/mikf/gallery-dl/issues/1607)) ### Changes - [instagram] increase default delay between HTTP requests from 5s to 8s ([#1732](https://github.com/mikf/gallery-dl/issues/1732)) ### Fixes - [bbc] improve image dimensions ([#1706](https://github.com/mikf/gallery-dl/issues/1706)) - [bbc] support multi-page gallery listings ([#1730](https://github.com/mikf/gallery-dl/issues/1730)) - [behance] fix `collection` extraction - [deviantart] get original files for GIF previews ([#1731](https://github.com/mikf/gallery-dl/issues/1731)) - [furaffinity] fix errors when using `category-transfer` ([#1274](https://github.com/mikf/gallery-dl/issues/1274)) - [hitomi] fix image URLs ([#1765](https://github.com/mikf/gallery-dl/issues/1765)) - [instagram] use custom User-Agent header for video downloads ([#1682](https://github.com/mikf/gallery-dl/issues/1682), [#1623](https://github.com/mikf/gallery-dl/issues/1623), [#1580](https://github.com/mikf/gallery-dl/issues/1580)) - [kemonoparty] fix username extraction ([#1750](https://github.com/mikf/gallery-dl/issues/1750)) - [kemonoparty] update file server domain ([#1764](https://github.com/mikf/gallery-dl/issues/1764)) - [newgrounds] fix errors when using `category-transfer` ([#1274](https://github.com/mikf/gallery-dl/issues/1274)) - [nsfwalbum] retry backend requests when extracting image URLs ([#1733](https://github.com/mikf/gallery-dl/issues/1733), [#1271](https://github.com/mikf/gallery-dl/issues/1271)) - [vk] prevent exception for empty/private profiles ([#1742](https://github.com/mikf/gallery-dl/issues/1742)) ## 1.18.2 - 2021-07-23 ### Additions - [bbc] add `gallery` and `programme` extractors ([#1706](https://github.com/mikf/gallery-dl/issues/1706)) - [comicvine] add extractor 
([#1712](https://github.com/mikf/gallery-dl/issues/1712)) - [kemonoparty] add `max-posts` option ([#1674](https://github.com/mikf/gallery-dl/issues/1674)) - [kemonoparty] parse `o` query parameters ([#1674](https://github.com/mikf/gallery-dl/issues/1674)) - [mastodon] add `reblogs` and `replies` options ([#1669](https://github.com/mikf/gallery-dl/issues/1669)) - [pixiv] add extractor for `pixivision` articles ([#1672](https://github.com/mikf/gallery-dl/issues/1672)) - [ytdl] add experimental extractor for sites supported by youtube-dl ([#1680](https://github.com/mikf/gallery-dl/issues/1680), [#878](https://github.com/mikf/gallery-dl/issues/878)) - extend `parent-metadata` functionality ([#1687](https://github.com/mikf/gallery-dl/issues/1687), [#1651](https://github.com/mikf/gallery-dl/issues/1651), [#1364](https://github.com/mikf/gallery-dl/issues/1364)) - add `archive-prefix` option ([#1711](https://github.com/mikf/gallery-dl/issues/1711)) - add `url-metadata` option ([#1659](https://github.com/mikf/gallery-dl/issues/1659), [#1073](https://github.com/mikf/gallery-dl/issues/1073)) ### Changes - [kemonoparty] skip duplicated patreon files ([#1689](https://github.com/mikf/gallery-dl/issues/1689), [#1667](https://github.com/mikf/gallery-dl/issues/1667)) - [mangadex] use custom User-Agent header ([#1535](https://github.com/mikf/gallery-dl/issues/1535)) ### Fixes - [hitomi] fix image URLs ([#1679](https://github.com/mikf/gallery-dl/issues/1679)) - [imagevenue] fix extraction ([#1677](https://github.com/mikf/gallery-dl/issues/1677)) - [instagram] fix extraction of `/explore/tags/` posts ([#1666](https://github.com/mikf/gallery-dl/issues/1666)) - [moebooru] fix `tags` ending with a `+` when logged in ([#1702](https://github.com/mikf/gallery-dl/issues/1702)) - [naverwebtoon] fix comic extraction - [pururin] update domain and fix extraction - [vk] improve metadata extraction and URL pattern ([#1691](https://github.com/mikf/gallery-dl/issues/1691)) - [downloader:ytdl] fix `outtmpl` setting for yt-dlp ([#1680](https://github.com/mikf/gallery-dl/issues/1680)) ## 1.18.1 - 2021-07-04 ### Additions - [mangafox] add `manga` extractor ([#1633](https://github.com/mikf/gallery-dl/issues/1633)) - [mangasee] add `chapter` and `manga` extractors - [mastodon] implement `text-posts` option ([#1569](https://github.com/mikf/gallery-dl/issues/1569), [#1669](https://github.com/mikf/gallery-dl/issues/1669)) - [seisoparty] add `user` and `post` extractors ([#1635](https://github.com/mikf/gallery-dl/issues/1635)) - implement conditional directories ([#1394](https://github.com/mikf/gallery-dl/issues/1394)) - add `T` format string conversion ([#1646](https://github.com/mikf/gallery-dl/issues/1646)) - document format string syntax ### Changes - [twitter] set `retweet_id` for original retweets ([#1481](https://github.com/mikf/gallery-dl/issues/1481)) ### Fixes - [directlink] manually encode Referer URLs ([#1647](https://github.com/mikf/gallery-dl/issues/1647)) - [hiperdex] use domain from input URL - [kemonoparty] fix `username` extraction ([#1652](https://github.com/mikf/gallery-dl/issues/1652)) - [kemonoparty] warn about missing DDoS-GUARD cookies - [twitter] ensure guest tokens are returned as string ([#1665](https://github.com/mikf/gallery-dl/issues/1665)) - [webtoons] match arbitrary language codes ([#1643](https://github.com/mikf/gallery-dl/issues/1643)) - fix depth counter in UrlJob when specifying `-g` multiple times ## 1.18.0 - 2021-06-19 ### Additions - [foolfuuka] support `archive.wakarimasen.moe` 
([#1595](https://github.com/mikf/gallery-dl/issues/1595)) - [mangadex] implement login with username & password ([#1535](https://github.com/mikf/gallery-dl/issues/1535)) - [mangadex] add extractor for a user's followed feed ([#1535](https://github.com/mikf/gallery-dl/issues/1535)) - [pixiv] support fetching privately followed users ([#1628](https://github.com/mikf/gallery-dl/issues/1628)) - implement conditional filenames ([#1394](https://github.com/mikf/gallery-dl/issues/1394)) - implement `filter` option for post processors ([#1460](https://github.com/mikf/gallery-dl/issues/1460)) - add `-T/--terminate` command-line option ([#1399](https://github.com/mikf/gallery-dl/issues/1399)) - add `-P/--postprocessor` command-line option ([#1583](https://github.com/mikf/gallery-dl/issues/1583)) ### Changes - [kemonoparty] update default filenames and archive IDs ([#1514](https://github.com/mikf/gallery-dl/issues/1514)) - [twitter] update default settings - change `retweets` and `quoted` options from `true` to `false` - change directory format for search results to the same as other extractors - require an argument for `--clear-cache` ### Fixes - [500px] update GraphQL queries - [furaffinity] improve metadata extraction ([#1630](https://github.com/mikf/gallery-dl/issues/1630)) - [hitomi] update image URL generation ([#1637](https://github.com/mikf/gallery-dl/issues/1637)) - [idolcomplex] improve and fix pagination ([#1594](https://github.com/mikf/gallery-dl/issues/1594), [#1601](https://github.com/mikf/gallery-dl/issues/1601)) - [instagram] fix login ([#1631](https://github.com/mikf/gallery-dl/issues/1631)) - [instagram] update query hashes - [mangadex] update to API v5 ([#1535](https://github.com/mikf/gallery-dl/issues/1535)) - [mangafox] improve URL pattern ([#1608](https://github.com/mikf/gallery-dl/issues/1608)) - [oauth] prevent exceptions when reporting errors ([#1603](https://github.com/mikf/gallery-dl/issues/1603)) - [philomena] fix tag escapes handling ([#1629](https://github.com/mikf/gallery-dl/issues/1629)) - [redgifs] update API server address ([#1632](https://github.com/mikf/gallery-dl/issues/1632)) - [sankaku] handle empty tags ([#1617](https://github.com/mikf/gallery-dl/issues/1617)) - [subscribestar] improve attachment filenames ([#1609](https://github.com/mikf/gallery-dl/issues/1609)) - [unsplash] update collections URL pattern ([#1627](https://github.com/mikf/gallery-dl/issues/1627)) - [postprocessor:metadata] handle dicts in `mode:tags` ([#1598](https://github.com/mikf/gallery-dl/issues/1598)) ## 1.17.5 - 2021-05-30 ### Additions - [kemonoparty] add `metadata` option ([#1548](https://github.com/mikf/gallery-dl/issues/1548)) - [kemonoparty] add `type` metadata field ([#1556](https://github.com/mikf/gallery-dl/issues/1556)) - [mangapark] recognize v2.mangapark URLs ([#1578](https://github.com/mikf/gallery-dl/issues/1578)) - [patreon] extract user-defined `tags` ([#1539](https://github.com/mikf/gallery-dl/issues/1539), [#1540](https://github.com/mikf/gallery-dl/issues/1540)) - [pillowfort] implement login with username & password ([#846](https://github.com/mikf/gallery-dl/issues/846)) - [pillowfort] add `inline` and `external` options ([#846](https://github.com/mikf/gallery-dl/issues/846)) - [pixiv] implement `max-posts` option ([#1558](https://github.com/mikf/gallery-dl/issues/1558)) - [pixiv] add `metadata` option ([#1551](https://github.com/mikf/gallery-dl/issues/1551)) - [twitter] add `text-tweets` option ([#570](https://github.com/mikf/gallery-dl/issues/570)) - [weibo] extend 
`retweets` option ([#1542](https://github.com/mikf/gallery-dl/issues/1542)) - [postprocessor:ugoira] support using the `image2` demuxer ([#1550](https://github.com/mikf/gallery-dl/issues/1550)) - [postprocessor:ugoira] add `repeat-last-frame` option ([#1550](https://github.com/mikf/gallery-dl/issues/1550)) - support `XDG_CONFIG_HOME` ([#1545](https://github.com/mikf/gallery-dl/issues/1545)) - implement `parent-skip` and `"skip": "terminate"` options ([#1399](https://github.com/mikf/gallery-dl/issues/1399)) ### Changes - [twitter] resolve `t.co` URLs in `content` ([#1532](https://github.com/mikf/gallery-dl/issues/1532)) ### Fixes - [500px] update query hashes ([#1573](https://github.com/mikf/gallery-dl/issues/1573)) - [aryion] find text posts in `recursive=false` mode ([#1568](https://github.com/mikf/gallery-dl/issues/1568)) - [imagebam] fix extraction of NSFW images ([#1534](https://github.com/mikf/gallery-dl/issues/1534)) - [imgur] update URL patterns ([#1561](https://github.com/mikf/gallery-dl/issues/1561)) - [manganelo] update domain to `manganato.com` - [reactor] skip deleted/empty posts - [twitter] add missing retweet media entities ([#1555](https://github.com/mikf/gallery-dl/issues/1555)) - fix ISO 639-1 code for Japanese (`jp` -> `ja`) ## 1.17.4 - 2021-05-07 ### Additions - [gelbooru] add extractor for `/redirect.php` URLs ([#1530](https://github.com/mikf/gallery-dl/issues/1530)) - [inkbunny] add `favorite` extractor ([#1521](https://github.com/mikf/gallery-dl/issues/1521)) - add `output.skip` option - add an optional argument to `--clear-cache` to select which cache entries to remove ([#1230](https://github.com/mikf/gallery-dl/issues/1230)) ### Changes - [pixiv] update `translated-tags` option ([#1507](https://github.com/mikf/gallery-dl/issues/1507)) - rename to `tags` - accept `"japanese"`, `"translated"`, and `"original"` as values ### Fixes - [500px] update query hashes - [kemonoparty] fix download URLs ([#1514](https://github.com/mikf/gallery-dl/issues/1514)) - [imagebam] fix extraction - [instagram] update query hashes - [nozomi] update default archive-fmt for `tag` and `search` extractors ([#1529](https://github.com/mikf/gallery-dl/issues/1529)) - [pixiv] remove duplicate translated tags ([#1507](https://github.com/mikf/gallery-dl/issues/1507)) - [readcomiconline] change domain to `readcomiconline.li` ([#1517](https://github.com/mikf/gallery-dl/issues/1517)) - [sankaku] update invalid-token detection ([#1515](https://github.com/mikf/gallery-dl/issues/1515)) - fix crash when using `--no-download` with `--ugoira-conv` ([#1507](https://github.com/mikf/gallery-dl/issues/1507)) ## 1.17.3 - 2021-04-25 ### Additions - [danbooru] add option for extended metadata extraction ([#1458](https://github.com/mikf/gallery-dl/issues/1458)) - [fanbox] add extractors ([#1459](https://github.com/mikf/gallery-dl/issues/1459)) - [fantia] add extractors ([#1459](https://github.com/mikf/gallery-dl/issues/1459)) - [gelbooru] add an option to extract notes ([#1457](https://github.com/mikf/gallery-dl/issues/1457)) - [hentaicosplays] add extractor ([#907](https://github.com/mikf/gallery-dl/issues/907), [#1473](https://github.com/mikf/gallery-dl/issues/1473), [#1483](https://github.com/mikf/gallery-dl/issues/1483)) - [instagram] add extractor for `tagged` posts ([#1439](https://github.com/mikf/gallery-dl/issues/1439)) - [naverwebtoon] ignore non-comic images - [pixiv] also save untranslated tags when `translated-tags` is enabled ([#1501](https://github.com/mikf/gallery-dl/issues/1501)) - [shopify] support 
omgmiamiswimwear.com ([#1280](https://github.com/mikf/gallery-dl/issues/1280)) - implement `output.fallback` option - add archive format to InfoJob output ([#875](https://github.com/mikf/gallery-dl/issues/875)) - build executables with SOCKS proxy support ([#1424](https://github.com/mikf/gallery-dl/issues/1424)) ### Fixes - [500px] update query hashes - [8muses] fix JSON deobfuscation - [artstation] download `/4k/` images ([#1422](https://github.com/mikf/gallery-dl/issues/1422)) - [deviantart] fix pagination for Eclipse results ([#1444](https://github.com/mikf/gallery-dl/issues/1444)) - [deviantart] improve folder name matching ([#1451](https://github.com/mikf/gallery-dl/issues/1451)) - [erome] skip deleted albums ([#1447](https://github.com/mikf/gallery-dl/issues/1447)) - [exhentai] fix image limit detection ([#1437](https://github.com/mikf/gallery-dl/issues/1437)) - [exhentai] restore `limits` option ([#1487](https://github.com/mikf/gallery-dl/issues/1487)) - [gelbooru] fix tag category extraction ([#1455](https://github.com/mikf/gallery-dl/issues/1455)) - [instagram] update query hashes - [komikcast] fix extraction - [simplyhentai] fix extraction - [slideshare] fix extraction - [webtoons] update agegate/GDPR cookies ([#1431](https://github.com/mikf/gallery-dl/issues/1431)) - fix `category-transfer` option ### Removals - [yuki] remove module for yuki.la ## 1.17.2 - 2021-04-02 ### Additions - [deviantart] add support for posts from watched users ([#794](https://github.com/mikf/gallery-dl/issues/794)) - [manganelo] add `chapter` and `manga` extractors ([#1415](https://github.com/mikf/gallery-dl/issues/1415)) - [pinterest] add `search` extractor ([#1411](https://github.com/mikf/gallery-dl/issues/1411)) - [sankaku] add `tag_string` metadata field ([#1388](https://github.com/mikf/gallery-dl/issues/1388)) - [sankaku] add enumeration index for books ([#1388](https://github.com/mikf/gallery-dl/issues/1388)) - [tapas] add `series` and `episode` extractors ([#692](https://github.com/mikf/gallery-dl/issues/692)) - [tapas] implement login with username & password ([#692](https://github.com/mikf/gallery-dl/issues/692)) - [twitter] allow specifying a custom format for user results ([#1337](https://github.com/mikf/gallery-dl/issues/1337)) - [twitter] add extractor for direct image links ([#1417](https://github.com/mikf/gallery-dl/issues/1417)) - [vk] add support for albums ([#474](https://github.com/mikf/gallery-dl/issues/474)) ### Fixes - [aryion] unescape paths ([#1414](https://github.com/mikf/gallery-dl/issues/1414)) - [bcy] improve pagination - [deviantart] update `watch` URL pattern ([#794](https://github.com/mikf/gallery-dl/issues/794)) - [deviantart] fix arguments for search/popular results ([#1408](https://github.com/mikf/gallery-dl/issues/1408)) - [deviantart] use fallback for `/intermediary/` URLs - [exhentai] improve and simplify image limit checks - [komikcast] fix extraction - [pixiv] fix `favorite` URL pattern ([#1405](https://github.com/mikf/gallery-dl/issues/1405)) - [sankaku] simplify `pool` tags ([#1388](https://github.com/mikf/gallery-dl/issues/1388)) - [twitter] improve error message when trying to log in with 2FA ([#1409](https://github.com/mikf/gallery-dl/issues/1409)) - [twitter] don't use youtube-dl for cards when videos are disabled ([#1416](https://github.com/mikf/gallery-dl/issues/1416)) ## 1.17.1 - 2021-03-19 ### Additions - [architizer] add `project` and `firm` extractors ([#1369](https://github.com/mikf/gallery-dl/issues/1369)) - [deviantart] add `watch` extractor 
([#794](https://github.com/mikf/gallery-dl/issues/794)) - [exhentai] support `/tag/` URLs ([#1363](https://github.com/mikf/gallery-dl/issues/1363)) - [gelbooru_v01] support `drawfriends.booru.org`, `vidyart.booru.org`, and `tlb.booru.org` by default - [nozomi] support `/index-N.html` URLs ([#1365](https://github.com/mikf/gallery-dl/issues/1365)) - [philomena] add generalized extractors for philomena sites ([#1379](https://github.com/mikf/gallery-dl/issues/1379)) - [philomena] support post URLs without `/images/` - [twitter] implement `users` option ([#1337](https://github.com/mikf/gallery-dl/issues/1337)) - implement `parent-metadata` option ([#1364](https://github.com/mikf/gallery-dl/issues/1364)) ### Changes - [deviantart] revert previous changes to `extra` option ([#1356](https://github.com/mikf/gallery-dl/issues/1356), [#1387](https://github.com/mikf/gallery-dl/issues/1387)) ### Fixes - [exhentai] improve favorites count extraction ([#1360](https://github.com/mikf/gallery-dl/issues/1360)) - [gelbooru] update domain for video downloads ([#1368](https://github.com/mikf/gallery-dl/issues/1368)) - [hentaifox] improve image and metadata extraction ([#1366](https://github.com/mikf/gallery-dl/issues/1366), [#1378](https://github.com/mikf/gallery-dl/issues/1378)) - [imgur] fix and improve rate limit handling ([#1386](https://github.com/mikf/gallery-dl/issues/1386)) - [weasyl] improve favorites URL pattern ([#1374](https://github.com/mikf/gallery-dl/issues/1374)) - use type check before applying `browser` option ([#1358](https://github.com/mikf/gallery-dl/issues/1358)) - ensure `-s/--simulate` always prints filenames ([#1360](https://github.com/mikf/gallery-dl/issues/1360)) ### Removals - [hentaicafe] remove module - [hentainexus] remove module - [mangareader] remove module - [mangastream] remove module ## 1.17.0 - 2021-03-05 ### Additions - [cyberdrop] add support for `https://cyberdrop.me/` ([#1328](https://github.com/mikf/gallery-dl/issues/1328)) - [exhentai] add `metadata` option; extract more metadata from gallery pages ([#1325](https://github.com/mikf/gallery-dl/issues/1325)) - [hentaicafe] add `search` and `tag` extractors ([#1345](https://github.com/mikf/gallery-dl/issues/1345)) - [hentainexus] add `original` option ([#1322](https://github.com/mikf/gallery-dl/issues/1322)) - [instagram] support `/user/reels/` URLs ([#1329](https://github.com/mikf/gallery-dl/issues/1329)) - [naverwebtoon] add support for `https://comic.naver.com/` ([#1331](https://github.com/mikf/gallery-dl/issues/1331)) - [pixiv] add `translated-tags` option ([#1354](https://github.com/mikf/gallery-dl/issues/1354)) - [tbib] add support for `https://tbib.org/` ([#473](https://github.com/mikf/gallery-dl/issues/473), [#1082](https://github.com/mikf/gallery-dl/issues/1082)) - [tumblrgallery] add support for `https://tumblrgallery.xyz/` ([#1298](https://github.com/mikf/gallery-dl/issues/1298)) - [twitter] add extractor for followed users ([#1337](https://github.com/mikf/gallery-dl/issues/1337)) - [twitter] add option to download all media from conversations ([#1319](https://github.com/mikf/gallery-dl/issues/1319)) - [wallhaven] add `collections` extractor ([#1351](https://github.com/mikf/gallery-dl/issues/1351)) - [snap] allow access to user's .netrc for site authentication ([#1352](https://github.com/mikf/gallery-dl/issues/1352)) - add extractors for Gelbooru v0.1 sites ([#234](https://github.com/mikf/gallery-dl/issues/234), [#426](https://github.com/mikf/gallery-dl/issues/426), 
[#473](https://github.com/mikf/gallery-dl/issues/473), [#767](https://github.com/mikf/gallery-dl/issues/767), [#1238](https://github.com/mikf/gallery-dl/issues/1238)) - add `-E/--extractor-info` command-line option ([#875](https://github.com/mikf/gallery-dl/issues/875)) - add GitHub Actions workflow for building standalone executables ([#1312](https://github.com/mikf/gallery-dl/issues/1312)) - add `browser` and `headers` options ([#1117](https://github.com/mikf/gallery-dl/issues/1117)) - add option to use different youtube-dl forks ([#1330](https://github.com/mikf/gallery-dl/issues/1330)) - support using multiple input files at once ([#1353](https://github.com/mikf/gallery-dl/issues/1353)) ### Changes - [deviantart] extend `extra` option to also download embedded DeviantArt posts. - [exhentai] rename metadata fields to match API results ([#1325](https://github.com/mikf/gallery-dl/issues/1325)) - [mangadex] use `api.mangadex.org` as default API server - [mastodon] cache OAuth tokens ([#616](https://github.com/mikf/gallery-dl/issues/616)) - replace `wait-min` and `wait-max` with `sleep-request` ### Fixes - [500px] skip unavailable photos ([#1335](https://github.com/mikf/gallery-dl/issues/1335)) - [komikcast] fix extraction - [readcomiconline] download high quality image versions ([#1347](https://github.com/mikf/gallery-dl/issues/1347)) - [twitter] update GraphQL endpoints - fix crash when `base-directory` is an empty string ([#1339](https://github.com/mikf/gallery-dl/issues/1339)) ### Removals - remove support for formerly deprecated options - remove `cloudflare` module ## 1.16.5 - 2021-02-14 ### Additions - [behance] support `video` modules ([#1282](https://github.com/mikf/gallery-dl/issues/1282)) - [erome] add `album`, `user`, and `search` extractors ([#409](https://github.com/mikf/gallery-dl/issues/409)) - [hentaifox] support searching by group ([#1294](https://github.com/mikf/gallery-dl/issues/1294)) - [imgclick] add `image` extractor ([#1307](https://github.com/mikf/gallery-dl/issues/1307)) - [kemonoparty] extract inline images ([#1286](https://github.com/mikf/gallery-dl/issues/1286)) - [kemonoparty] support URLs with non-numeric user and post IDs ([#1303](https://github.com/mikf/gallery-dl/issues/1303)) - [pillowfort] add `user` and `post` extractors ([#846](https://github.com/mikf/gallery-dl/issues/846)) ### Changes - [kemonoparty] include `service` in directories and archive keys - [pixiv] require a `refresh-token` to login ([#1304](https://github.com/mikf/gallery-dl/issues/1304)) - [snap] use `core18` as base ### Fixes - [500px] update query hashes - [deviantart] update parameters for `/browse/popular` ([#1267](https://github.com/mikf/gallery-dl/issues/1267)) - [deviantart] provide filename extension for original file downloads ([#1272](https://github.com/mikf/gallery-dl/issues/1272)) - [deviantart] fix `folders` option ([#1302](https://github.com/mikf/gallery-dl/issues/1302)) - [inkbunny] add `sid` parameter to private file downloads ([#1281](https://github.com/mikf/gallery-dl/issues/1281)) - [kemonoparty] fix absolute file URLs - [mangadex] revert to `https://mangadex.org/api/` and add `api-server` option ([#1310](https://github.com/mikf/gallery-dl/issues/1310)) - [nsfwalbum] use fallback for deleted content ([#1259](https://github.com/mikf/gallery-dl/issues/1259)) - [sankaku] update `invalid token` detection ([#1309](https://github.com/mikf/gallery-dl/issues/1309)) - [slideshare] fix extraction - [postprocessor:metadata] fix crash with `extension-format` 
([#1285](https://github.com/mikf/gallery-dl/issues/1285)) ## 1.16.4 - 2021-01-23 ### Additions - [furaffinity] add `descriptions` option ([#1231](https://github.com/mikf/gallery-dl/issues/1231)) - [kemonoparty] add `user` and `post` extractors ([#1216](https://github.com/mikf/gallery-dl/issues/1216)) - [nozomi] add `num` enumeration index ([#1239](https://github.com/mikf/gallery-dl/issues/1239)) - [photovogue] added portfolio extractor ([#1253](https://github.com/mikf/gallery-dl/issues/1253)) - [twitter] match `/i/user/ID` URLs - [unsplash] add extractors ([#1197](https://github.com/mikf/gallery-dl/issues/1197)) - [vipr] add image extractor ([#1258](https://github.com/mikf/gallery-dl/issues/1258)) ### Changes - [derpibooru] use "Everything" filter by default ([#862](https://github.com/mikf/gallery-dl/issues/862)) ### Fixes - [derpibooru] update `date` parsing - [foolfuuka] stop search when results are exhausted ([#1174](https://github.com/mikf/gallery-dl/issues/1174)) - [instagram] fix regex for `/saved` URLs ([#1251](https://github.com/mikf/gallery-dl/issues/1251)) - [mangadex] update API URLs - [mangakakalot] fix extraction - [newgrounds] fix flash file extraction ([#1257](https://github.com/mikf/gallery-dl/issues/1257)) - [sankaku] simplify login process - [twitter] fix retries after hitting rate limit ## 1.16.3 - 2021-01-10 ### Fixes - fix crash when using a `dict` for `path-restrict` - [postprocessor:metadata] sanitize custom filenames ## 1.16.2 - 2021-01-09 ### Additions - [derpibooru] add `search` and `gallery` extractors ([#862](https://github.com/mikf/gallery-dl/issues/862)) - [foolfuuka] add `board` and `search` extractors ([#1044](https://github.com/mikf/gallery-dl/issues/1044), [#1174](https://github.com/mikf/gallery-dl/issues/1174)) - [gfycat] add `date` metadata field ([#1138](https://github.com/mikf/gallery-dl/issues/1138)) - [pinterest] add support for getting all boards of a user ([#1205](https://github.com/mikf/gallery-dl/issues/1205)) - [sankaku] add support for book searches ([#1204](https://github.com/mikf/gallery-dl/issues/1204)) - [twitter] fetch media from pinned tweets ([#1203](https://github.com/mikf/gallery-dl/issues/1203)) - [wikiart] add extractor for single paintings ([#1233](https://github.com/mikf/gallery-dl/issues/1233)) - [downloader:http] add MIME type and signature for `.ico` files ([#1211](https://github.com/mikf/gallery-dl/issues/1211)) - add `d` format string conversion for timestamp values - add `"ascii"` as a special `path-restrict` value ### Fixes - [hentainexus] fix extraction ([#1234](https://github.com/mikf/gallery-dl/issues/1234)) - [instagram] categorize single highlight URLs as `highlights` ([#1222](https://github.com/mikf/gallery-dl/issues/1222)) - [redgifs] fix search results - [twitter] fix login with username & password - [twitter] fetch tweets from `homeConversation` entries ## 1.16.1 - 2020-12-27 ### Additions - [instagram] add `include` option ([#1180](https://github.com/mikf/gallery-dl/issues/1180)) - [pinterest] implement video support ([#1189](https://github.com/mikf/gallery-dl/issues/1189)) - [sankaku] reimplement login support ([#1176](https://github.com/mikf/gallery-dl/issues/1176), [#1182](https://github.com/mikf/gallery-dl/issues/1182)) - [sankaku] add support for sankaku.app URLs ([#1193](https://github.com/mikf/gallery-dl/issues/1193)) ### Changes - [e621] return pool posts in order ([#1195](https://github.com/mikf/gallery-dl/issues/1195)) - [hentaicafe] prefer title of `/hc.fyi/` pages 
([#1106](https://github.com/mikf/gallery-dl/issues/1106)) - [hentaicafe] simplify default filenames - [sankaku] normalize `created_at` metadata ([#1190](https://github.com/mikf/gallery-dl/issues/1190)) - [postprocessor:exec] do not add missing `{}` to command ([#1185](https://github.com/mikf/gallery-dl/issues/1185)) ### Fixes - [booru] improve error handling - [instagram] warn about private profiles ([#1187](https://github.com/mikf/gallery-dl/issues/1187)) - [keenspot] improve redirect handling - [mangadex] respect `chapter-reverse` settings ([#1194](https://github.com/mikf/gallery-dl/issues/1194)) - [pixiv] output debug message on failed login attempts ([#1192](https://github.com/mikf/gallery-dl/issues/1192)) - increase SQLite connection timeouts ([#1173](https://github.com/mikf/gallery-dl/issues/1173)) ### Removals - [mangapanda] remove module ## 1.16.0 - 2020-12-12 ### Additions - [booru] implement generalized extractors for `*booru` and `moebooru` sites - add support for sakugabooru.com ([#1136](https://github.com/mikf/gallery-dl/issues/1136)) - add support for lolibooru.moe ([#1050](https://github.com/mikf/gallery-dl/issues/1050)) - provide formattable `date` metadata fields ([#1138](https://github.com/mikf/gallery-dl/issues/1138)) - [postprocessor:metadata] add `event` and `filename` options ([#315](https://github.com/mikf/gallery-dl/issues/315), [#866](https://github.com/mikf/gallery-dl/issues/866), [#984](https://github.com/mikf/gallery-dl/issues/984)) - [postprocessor:exec] add `event` option ([#992](https://github.com/mikf/gallery-dl/issues/992)) ### Changes - [flickr] update default directories and improve metadata consistency ([#828](https://github.com/mikf/gallery-dl/issues/828)) - [sankaku] use API endpoints from `beta.sankakucomplex.com` - [downloader:http] improve filename extension handling ([#776](https://github.com/mikf/gallery-dl/issues/776)) - replace all JPEG filename extensions with `jpg` by default ### Fixes - [hentainexus] fix extraction ([#1166](https://github.com/mikf/gallery-dl/issues/1166)) - [instagram] rewrite ([#1113](https://github.com/mikf/gallery-dl/issues/1113), [#1122](https://github.com/mikf/gallery-dl/issues/1122), [#1128](https://github.com/mikf/gallery-dl/issues/1128), [#1130](https://github.com/mikf/gallery-dl/issues/1130), [#1149](https://github.com/mikf/gallery-dl/issues/1149)) - [mangadex] handle external chapters ([#1154](https://github.com/mikf/gallery-dl/issues/1154)) - [nozomi] handle empty `date` fields ([#1163](https://github.com/mikf/gallery-dl/issues/1163)) - [paheal] create directory for each post ([#1147](https://github.com/mikf/gallery-dl/issues/1147)) - [piczel] update API URLs - [twitter] update image URL format ([#1145](https://github.com/mikf/gallery-dl/issues/1145)) - [twitter] improve `x-csrf-token` header handling ([#1170](https://github.com/mikf/gallery-dl/issues/1170)) - [webtoons] update `ageGate` cookies ### Removals - [sankaku] remove login support ## 1.15.4 - 2020-11-27 ### Fixes - [2chan] skip external links - [hentainexus] fix extraction ([#1125](https://github.com/mikf/gallery-dl/issues/1125)) - [mangadex] switch to API v2 ([#1129](https://github.com/mikf/gallery-dl/issues/1129)) - [mangapanda] use http:// - [mangoxo] fix extraction - [reddit] skip invalid gallery items ([#1127](https://github.com/mikf/gallery-dl/issues/1127)) ## 1.15.3 - 2020-11-13 ### Additions - [sankakucomplex] extract videos and embeds ([#308](https://github.com/mikf/gallery-dl/issues/308)) - [twitter] add support for lists 
- [postprocessor:metadata] accept string-lists for `content-format` ([#1080](https://github.com/mikf/gallery-dl/issues/1080))
- implement `modules` and `extension-map` options
### Fixes
- [500px] update query hashes
- [8kun] fix file URLs of older posts ([#1101](https://github.com/mikf/gallery-dl/issues/1101))
- [exhentai] update image URL parsing ([#1094](https://github.com/mikf/gallery-dl/issues/1094))
- [hentaifoundry] update `YII_CSRF_TOKEN` cookie handling ([#1083](https://github.com/mikf/gallery-dl/issues/1083))
- [hentaifoundry] use scheme from input URLs ([#1095](https://github.com/mikf/gallery-dl/issues/1095))
- [mangoxo] fix metadata extraction
- [paheal] fix extraction ([#1088](https://github.com/mikf/gallery-dl/issues/1088))
- collect post processors from `basecategory` entries ([#1084](https://github.com/mikf/gallery-dl/issues/1084))

## 1.15.2 - 2020-10-24
### Additions
- [pinterest] implement login support ([#1055](https://github.com/mikf/gallery-dl/issues/1055))
- [reddit] add `date` metadata field ([#1068](https://github.com/mikf/gallery-dl/issues/1068))
- [seiga] add metadata for single image downloads ([#1063](https://github.com/mikf/gallery-dl/issues/1063))
- [twitter] support media from Cards ([#937](https://github.com/mikf/gallery-dl/issues/937), [#1005](https://github.com/mikf/gallery-dl/issues/1005))
- [weasyl] support api-key authentication ([#1057](https://github.com/mikf/gallery-dl/issues/1057))
- add a `t` format string conversion for trimming whitespace ([#1065](https://github.com/mikf/gallery-dl/issues/1065))
### Fixes
- [blogger] handle URLs with specified width/height ([#1061](https://github.com/mikf/gallery-dl/issues/1061))
- [fallenangels] fix extraction of `.5` chapters
- [gelbooru] rewrite mp4 video URLs ([#1048](https://github.com/mikf/gallery-dl/issues/1048))
- [hitomi] fix image URLs and gallery URL pattern
- [mangadex] unescape more metadata fields ([#1066](https://github.com/mikf/gallery-dl/issues/1066))
- [mangahere] ensure download URLs have a scheme ([#1070](https://github.com/mikf/gallery-dl/issues/1070))
- [mangakakalot] ignore "Go Home" buttons in chapter pages
- [newgrounds] handle embeds without scheme ([#1033](https://github.com/mikf/gallery-dl/issues/1033))
- [newgrounds] provide fallback URLs for video downloads ([#1042](https://github.com/mikf/gallery-dl/issues/1042))
- [xhamster] fix user profile extraction

## 1.15.1 - 2020-10-11
### Additions
- [hentaicafe] add `manga_id` metadata field ([#1036](https://github.com/mikf/gallery-dl/issues/1036))
- [hentaifoundry] add support for stories ([#734](https://github.com/mikf/gallery-dl/issues/734))
- [hentaifoundry] add `include` option
- [newgrounds] extract image embeds ([#1033](https://github.com/mikf/gallery-dl/issues/1033))
- [nijie] add `include` option ([#1018](https://github.com/mikf/gallery-dl/issues/1018))
- [reactor] match URLs without subdomain ([#1053](https://github.com/mikf/gallery-dl/issues/1053))
- [twitter] extend `retweets` option ([#1026](https://github.com/mikf/gallery-dl/issues/1026))
- [weasyl] add extractors ([#977](https://github.com/mikf/gallery-dl/issues/977))
### Fixes
- [500px] update query hashes
- [behance] fix `collection` extraction
- [newgrounds] fix video extraction ([#1042](https://github.com/mikf/gallery-dl/issues/1042))
- [twitter] improve twitpic extraction ([#1019](https://github.com/mikf/gallery-dl/issues/1019))
- [weibo] handle posts with more than 9 images ([#926](https://github.com/mikf/gallery-dl/issues/926))
- [xvideos] fix `title` extraction
- fix crash when using `--download-archive` with `--no-skip` ([#1023](https://github.com/mikf/gallery-dl/issues/1023))
- fix issues with `blacklist`/`whitelist` defaults ([#1051](https://github.com/mikf/gallery-dl/issues/1051), [#1056](https://github.com/mikf/gallery-dl/issues/1056))
### Removals
- [kissmanga] remove module

## 1.15.0 - 2020-09-20
### Additions
- [deviantart] support watchers-only/paid deviations ([#995](https://github.com/mikf/gallery-dl/issues/995))
- [myhentaigallery] add gallery extractor ([#1001](https://github.com/mikf/gallery-dl/issues/1001))
- [twitter] support specifying users by ID ([#980](https://github.com/mikf/gallery-dl/issues/980))
- [twitter] support `/intent/user?user_id=…` URLs ([#980](https://github.com/mikf/gallery-dl/issues/980))
- add `--no-skip` command-line option ([#986](https://github.com/mikf/gallery-dl/issues/986))
- add `blacklist` and `whitelist` options ([#492](https://github.com/mikf/gallery-dl/issues/492), [#844](https://github.com/mikf/gallery-dl/issues/844))
- add `filesize-min` and `filesize-max` options ([#780](https://github.com/mikf/gallery-dl/issues/780))
- add `sleep-extractor` and `sleep-request` options ([#788](https://github.com/mikf/gallery-dl/issues/788)) (see the config sketch after this list)
- write skipped files to archive ([#550](https://github.com/mikf/gallery-dl/issues/550))
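The new `filesize-min`/`filesize-max` and `sleep-extractor`/`sleep-request` settings are ordinary extractor options. A minimal configuration sketch combining them is shown below; the option names come from the entries above, while the surrounding `extractor` object and the example values are illustrative assumptions, not defaults:

```json
{
    "extractor": {
        "filesize-min": "10k",
        "filesize-max": "50M",
        "sleep-request": 1.5,
        "sleep-extractor": 0
    }
}
```

Like most extractor options, they can also be scoped to a single site (for example under `extractor.twitter`) instead of being set globally.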
### Changes
- [exhentai] update wait time before original image downloads ([#978](https://github.com/mikf/gallery-dl/issues/978))
- [imgur] use new API endpoints for image/album data
- [tumblr] create directories for each post ([#965](https://github.com/mikf/gallery-dl/issues/965))
- support format string replacement fields in download archive paths ([#985](https://github.com/mikf/gallery-dl/issues/985))
- reduce wait time growth rate for HTTP retries from exponential to linear
### Fixes
- [500px] update query hash
- [aryion] improve post ID extraction ([#981](https://github.com/mikf/gallery-dl/issues/981), [#982](https://github.com/mikf/gallery-dl/issues/982))
- [danbooru] handle posts without `id` ([#1004](https://github.com/mikf/gallery-dl/issues/1004))
- [furaffinity] update download URL extraction ([#988](https://github.com/mikf/gallery-dl/issues/988))
- [imgur] fix image/album detection for galleries
- [postprocessor:zip] defer zip file creation ([#968](https://github.com/mikf/gallery-dl/issues/968))
### Removals
- [jaiminisbox] remove extractors
- [worldthree] remove extractors

## 1.14.5 - 2020-08-30
### Additions
- [aryion] add username/password support ([#960](https://github.com/mikf/gallery-dl/issues/960))
- [exhentai] add ability to specify a custom image limit ([#940](https://github.com/mikf/gallery-dl/issues/940))
- [furaffinity] add `search` extractor ([#915](https://github.com/mikf/gallery-dl/issues/915))
- [imgur] add `search` and `tag` extractors ([#934](https://github.com/mikf/gallery-dl/issues/934))
### Fixes
- [500px] fix extraction and update URL patterns ([#956](https://github.com/mikf/gallery-dl/issues/956))
- [aryion] update folder mime type list ([#945](https://github.com/mikf/gallery-dl/issues/945))
- [gelbooru] fix extraction without API
- [hentaihand] update to new site layout
- [hitomi] fix redirect processing
- [reddit] handle deleted galleries ([#953](https://github.com/mikf/gallery-dl/issues/953))
- [reddit] improve gallery extraction ([#955](https://github.com/mikf/gallery-dl/issues/955))

## 1.14.4 - 2020-08-15
### Additions
- [blogger] add `search` extractor ([#925](https://github.com/mikf/gallery-dl/issues/925))
- [blogger] support searching posts by labels ([#925](https://github.com/mikf/gallery-dl/issues/925))
- [inkbunny] add `user` and `post` extractors ([#283](https://github.com/mikf/gallery-dl/issues/283))
- [instagram] support `/reel/` URLs
- [pinterest] support `pinterest.co.uk` URLs ([#914](https://github.com/mikf/gallery-dl/issues/914))
- [reddit] support gallery posts ([#920](https://github.com/mikf/gallery-dl/issues/920))
- [subscribestar] extract attached media files ([#852](https://github.com/mikf/gallery-dl/issues/852))
### Fixes
- [blogger] improve error messages for missing posts/blogs ([#903](https://github.com/mikf/gallery-dl/issues/903))
- [exhentai] adjust image limit costs ([#940](https://github.com/mikf/gallery-dl/issues/940))
- [gfycat] skip malformed gfycat responses ([#902](https://github.com/mikf/gallery-dl/issues/902))
- [imgur] handle 403 overcapacity responses ([#910](https://github.com/mikf/gallery-dl/issues/910))
- [instagram] wait before GraphQL requests ([#901](https://github.com/mikf/gallery-dl/issues/901))
- [mangareader] fix extraction
- [mangoxo] fix login
- [pixnet] detect password-protected albums ([#177](https://github.com/mikf/gallery-dl/issues/177))
- [simplyhentai] fix `gallery_id` extraction
- [subscribestar] update `date` parsing
- [vsco] handle missing `description` fields
- [xhamster] fix extraction ([#917](https://github.com/mikf/gallery-dl/issues/917))
- allow `parent-directory` to work recursively ([#905](https://github.com/mikf/gallery-dl/issues/905))
- skip external OAuth tests ([#908](https://github.com/mikf/gallery-dl/issues/908))
### Removals
- [bobx] remove module

## 1.14.3 - 2020-07-18
### Additions
- [8muses] support `comics.8muses.com` URLs
- [artstation] add `following` extractor ([#888](https://github.com/mikf/gallery-dl/issues/888))
- [exhentai] add `domain` option ([#897](https://github.com/mikf/gallery-dl/issues/897))
- [gfycat] add `user` and `search` extractors
- [imgur] support all `/t/...` URLs ([#880](https://github.com/mikf/gallery-dl/issues/880))
- [khinsider] add `format` option ([#840](https://github.com/mikf/gallery-dl/issues/840))
- [mangakakalot] add `manga` and `chapter` extractors ([#876](https://github.com/mikf/gallery-dl/issues/876))
- [redgifs] support `gifsdeliverynetwork.com` URLs ([#874](https://github.com/mikf/gallery-dl/issues/874))
- [subscribestar] add `user` and `post` extractors ([#852](https://github.com/mikf/gallery-dl/issues/852))
- [twitter] add support for nitter.net URLs ([#890](https://github.com/mikf/gallery-dl/issues/890))
- add Zsh completion script ([#150](https://github.com/mikf/gallery-dl/issues/150))
### Fixes
- [gfycat] retry 404'ed videos on redgifs.com ([#874](https://github.com/mikf/gallery-dl/issues/874))
- [newgrounds] fix favorites extraction
- [patreon] yield images and attachments before post files ([#871](https://github.com/mikf/gallery-dl/issues/871))
- [reddit] fix AttributeError when using `recursion` ([#879](https://github.com/mikf/gallery-dl/issues/879))
- [twitter] raise proper exception if a user doesn't exist ([#891](https://github.com/mikf/gallery-dl/issues/891))
- defer directory creation ([#722](https://github.com/mikf/gallery-dl/issues/722))
- set pseudo extension for Metadata messages ([#865](https://github.com/mikf/gallery-dl/issues/865))
- prevent exception on Cloudflare challenges ([#868](https://github.com/mikf/gallery-dl/issues/868))

## 1.14.2 - 2020-06-27
### Additions
- [artstation] add `date` metadata field ([#839](https://github.com/mikf/gallery-dl/issues/839))
- [mastodon] add `date` metadata field ([#839](https://github.com/mikf/gallery-dl/issues/839))
- [pinterest] add support for board sections ([#835](https://github.com/mikf/gallery-dl/issues/835))
- [twitter] add extractor for liked tweets ([#837](https://github.com/mikf/gallery-dl/issues/837))
- [twitter] add option to filter media from quoted tweets ([#854](https://github.com/mikf/gallery-dl/issues/854))
- [weibo] add `date` metadata field to `status` objects ([#829](https://github.com/mikf/gallery-dl/issues/829))
### Fixes
- [aryion] fix user gallery extraction ([#832](https://github.com/mikf/gallery-dl/issues/832))
- [imgur] build directory paths for each file ([#842](https://github.com/mikf/gallery-dl/issues/842))
- [tumblr] prevent errors when using `reblogs=same-blog` ([#851](https://github.com/mikf/gallery-dl/issues/851))
- [twitter] always provide an `author` metadata field ([#831](https://github.com/mikf/gallery-dl/issues/831), [#833](https://github.com/mikf/gallery-dl/issues/833))
- [twitter] don't download video previews ([#833](https://github.com/mikf/gallery-dl/issues/833))
- [twitter] improve handling of deleted tweets ([#838](https://github.com/mikf/gallery-dl/issues/838))
- [twitter] fix search results ([#847](https://github.com/mikf/gallery-dl/issues/847))
- [twitter] improve handling of quoted tweets ([#854](https://github.com/mikf/gallery-dl/issues/854))
- fix config lookups when multiple locations are involved ([#843](https://github.com/mikf/gallery-dl/issues/843))
- improve output of `-K/--list-keywords` for parent extractors ([#825](https://github.com/mikf/gallery-dl/issues/825))
- call `flush()` after writing JSON in `DataJob()` ([#727](https://github.com/mikf/gallery-dl/issues/727))

## 1.14.1 - 2020-06-12
### Additions
- [furaffinity] add `artist_url` metadata field ([#821](https://github.com/mikf/gallery-dl/issues/821))
- [redgifs] add `user` and `search` extractors ([#724](https://github.com/mikf/gallery-dl/issues/724))
### Changes
- [deviantart] extend `extra` option; also search journals for sta.sh links ([#712](https://github.com/mikf/gallery-dl/issues/712))
- [twitter] rewrite; use new interface ([#806](https://github.com/mikf/gallery-dl/issues/806), [#740](https://github.com/mikf/gallery-dl/issues/740))
### Fixes
- [kissmanga] work around CAPTCHAs ([#818](https://github.com/mikf/gallery-dl/issues/818))
- [nhentai] fix extraction ([#819](https://github.com/mikf/gallery-dl/issues/819))
- [webtoons] generalize comic extraction code ([#820](https://github.com/mikf/gallery-dl/issues/820))

## 1.14.0 - 2020-05-31
### Additions
- [imagechest] add new extractor for imgchest.com ([#750](https://github.com/mikf/gallery-dl/issues/750))
- [instagram] add `post_url`, `tags`, `location`, `tagged_users` metadata ([#743](https://github.com/mikf/gallery-dl/issues/743))
- [redgifs] add image extractor ([#724](https://github.com/mikf/gallery-dl/issues/724))
- [webtoons] add new extractor for webtoons.com ([#761](https://github.com/mikf/gallery-dl/issues/761))
- implement `--write-pages` option ([#736](https://github.com/mikf/gallery-dl/issues/736))
- extend `path-restrict` option ([#662](https://github.com/mikf/gallery-dl/issues/662))
- implement `path-replace` option ([#662](https://github.com/mikf/gallery-dl/issues/662), [#755](https://github.com/mikf/gallery-dl/issues/755)) (see the sketch after this list)
- make `path` and `keywords` available in logging messages ([#574](https://github.com/mikf/gallery-dl/issues/574), [#575](https://github.com/mikf/gallery-dl/issues/575))
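`path-restrict` and `path-replace` work as a pair: the former selects which characters are not allowed in generated paths, the latter what they are replaced with. A minimal configuration sketch is shown below; the character set and the replacement string are illustrative assumptions, not the defaults:

```json
{
    "extractor": {
        "path-restrict": "<>:?*",
        "path-replace": "-"
    }
}
```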
### Changes
- [danbooru] change default value of `ugoira` to `false`
- [downloader:ytdl] change default value of `forward-cookies` to `false`
- [downloader:ytdl] fix file extensions when merging into `.mkv` ([#720](https://github.com/mikf/gallery-dl/issues/720))
- write OAuth tokens to cache ([#616](https://github.com/mikf/gallery-dl/issues/616))
- use `%APPDATA%\gallery-dl` for config files and cache on Windows
- use `util.Formatter` for formatting logging messages
- reuse HTTP connections from parent extractors
### Fixes
- [deviantart] use private access tokens for Journals ([#738](https://github.com/mikf/gallery-dl/issues/738))
- [gelbooru] simplify and fix pool extraction
- [imgur] fix extraction of animated images without `mp4` entry
- [imgur] treat `/t/unmuted/` URLs as galleries
- [instagram] fix login with username & password ([#756](https://github.com/mikf/gallery-dl/issues/756), [#771](https://github.com/mikf/gallery-dl/issues/771), [#797](https://github.com/mikf/gallery-dl/issues/797), [#803](https://github.com/mikf/gallery-dl/issues/803))
- [reddit] don't send OAuth headers for file downloads ([#729](https://github.com/mikf/gallery-dl/issues/729))
- fix/improve Cloudflare bypass code ([#728](https://github.com/mikf/gallery-dl/issues/728), [#757](https://github.com/mikf/gallery-dl/issues/757))
- reset filenames on empty file extensions ([#733](https://github.com/mikf/gallery-dl/issues/733))

## 1.13.6 - 2020-05-02
### Additions
- [patreon] respect filters and sort order in query parameters ([#711](https://github.com/mikf/gallery-dl/issues/711))
- [speakerdeck] add a new extractor for speakerdeck.com ([#726](https://github.com/mikf/gallery-dl/issues/726))
- [twitter] add `replies` option ([#705](https://github.com/mikf/gallery-dl/issues/705))
- [weibo] add `videos` option
- [downloader:http] add MIME types for `.psd` files ([#714](https://github.com/mikf/gallery-dl/issues/714))
### Fixes
- [artstation] improve embed extraction ([#720](https://github.com/mikf/gallery-dl/issues/720))
- [deviantart] limit API wait times ([#721](https://github.com/mikf/gallery-dl/issues/721))
- [newgrounds] fix URLs produced by the `following` extractor ([#684](https://github.com/mikf/gallery-dl/issues/684))
- [patreon] improve file hash extraction ([#713](https://github.com/mikf/gallery-dl/issues/713))
- [vsco] fix user gallery extraction
- fix/improve Cloudflare bypass code ([#728](https://github.com/mikf/gallery-dl/issues/728))

## 1.13.5 - 2020-04-27
### Additions
- [500px] recognize `web.500px.com` URLs
- [aryion] support downloading from folders ([#694](https://github.com/mikf/gallery-dl/issues/694))
- [furaffinity] add extractor for followed users ([#515](https://github.com/mikf/gallery-dl/issues/515))
- [hitomi] add extractor for tag searches ([#697](https://github.com/mikf/gallery-dl/issues/697))
- [instagram] add `post_id` and `num` metadata fields ([#698](https://github.com/mikf/gallery-dl/issues/698))
- [newgrounds] add extractor for followed users ([#684](https://github.com/mikf/gallery-dl/issues/684))
- [patreon] recognize URLs with creator IDs ([#711](https://github.com/mikf/gallery-dl/issues/711))
- [twitter] add `reply` metadata field ([#705](https://github.com/mikf/gallery-dl/issues/705))
- [xhamster] recognize `xhamster.porncache.net` URLs ([#700](https://github.com/mikf/gallery-dl/issues/700))
### Fixes
- [gelbooru] improve post ID extraction in pool listings
- [hitomi] fix extraction of galleries without tags
- [jaiminisbox] update metadata decoding procedure ([#702](https://github.com/mikf/gallery-dl/issues/702))
- [mastodon] fix pagination ([#701](https://github.com/mikf/gallery-dl/issues/701))
- [mastodon] improve account searches ([#704](https://github.com/mikf/gallery-dl/issues/704))
- [patreon] fix hash extraction from download URLs ([#693](https://github.com/mikf/gallery-dl/issues/693))
- improve parameter extraction when solving Cloudflare challenges

## 1.13.4 - 2020-04-12
### Additions
- [aryion] add `gallery` and `post` extractors ([#390](https://github.com/mikf/gallery-dl/issues/390), [#673](https://github.com/mikf/gallery-dl/issues/673))
- [deviantart] detect and handle folders in sta.sh listings ([#659](https://github.com/mikf/gallery-dl/issues/659))
- [hentainexus] add `circle`, `event`, and `title_conventional` metadata fields ([#661](https://github.com/mikf/gallery-dl/issues/661))
- [hiperdex] add `artist` extractor ([#606](https://github.com/mikf/gallery-dl/issues/606))
- [mastodon] add access tokens for `mastodon.social` and `baraag.net` ([#665](https://github.com/mikf/gallery-dl/issues/665))
### Changes
- [deviantart] retrieve *all* download URLs through the OAuth API
- automatically read config files in PyInstaller executable directories ([#682](https://github.com/mikf/gallery-dl/issues/682))
### Fixes
- [deviantart] handle "Request blocked" errors ([#655](https://github.com/mikf/gallery-dl/issues/655))
- [deviantart] improve JPEG quality replacement pattern
- [hiperdex] fix extraction
- [mastodon] handle API rate limits ([#665](https://github.com/mikf/gallery-dl/issues/665))
- [mastodon] update OAuth credentials for pawoo.net ([#665](https://github.com/mikf/gallery-dl/issues/665))
- [myportfolio] fix extraction of galleries without title
- [piczel] fix extraction of single images
- [vsco] fix collection extraction
- [weibo] accept status URLs with non-numeric IDs ([#664](https://github.com/mikf/gallery-dl/issues/664))

## 1.13.3 - 2020-03-28
### Additions
- [instagram] add support for a user's saved media ([#644](https://github.com/mikf/gallery-dl/issues/644))
- [nozomi] support multiple images per post ([#646](https://github.com/mikf/gallery-dl/issues/646))
- [35photo] add `tag` extractor
### Changes
- [mangadex] transform timestamps from `date` fields to datetime objects
### Fixes
- [deviantart] handle decode errors for `extended_fetch` results ([#655](https://github.com/mikf/gallery-dl/issues/655))
- [e621] fix bug in API rate limiting and improve pagination ([#651](https://github.com/mikf/gallery-dl/issues/651))
- [instagram] update pattern for user profile URLs
- [mangapark] fix metadata extraction
- [nozomi] sort search results ([#646](https://github.com/mikf/gallery-dl/issues/646))
- [piczel] fix extraction
- [twitter] fix typo in `x-twitter-auth-type` header ([#625](https://github.com/mikf/gallery-dl/issues/625))
- remove trailing dots from Windows directory names ([#647](https://github.com/mikf/gallery-dl/issues/647))
- fix crash with missing `stdout`/`stderr`/`stdin` handles ([#653](https://github.com/mikf/gallery-dl/issues/653))

## 1.13.2 - 2020-03-14
### Additions
- [furaffinity] extract more metadata
- [instagram] add `post_shortcode` metadata field ([#525](https://github.com/mikf/gallery-dl/issues/525))
- [kabeuchi] add extractor ([#561](https://github.com/mikf/gallery-dl/issues/561))
- [newgrounds] add extractor for favorited posts ([#394](https://github.com/mikf/gallery-dl/issues/394))
- [pixiv] implement `avatar` option ([#595](https://github.com/mikf/gallery-dl/issues/595), [#623](https://github.com/mikf/gallery-dl/issues/623))
- [twitter] add extractor for bookmarked Tweets ([#625](https://github.com/mikf/gallery-dl/issues/625))
### Fixes
- [bcy] reduce number of HTTP requests during data extraction
- [e621] update to new interface ([#635](https://github.com/mikf/gallery-dl/issues/635))
- [exhentai] handle incomplete MIME types ([#632](https://github.com/mikf/gallery-dl/issues/632))
- [hitomi] improve metadata extraction
- [mangoxo] fix login
- [newgrounds] improve error handling when extracting post data

## 1.13.1 - 2020-03-01
### Additions
- [hentaihand] add extractors ([#605](https://github.com/mikf/gallery-dl/issues/605))
- [hiperdex] add chapter and manga extractors ([#606](https://github.com/mikf/gallery-dl/issues/606))
- [oauth] implement option to write DeviantArt refresh-tokens to cache ([#616](https://github.com/mikf/gallery-dl/issues/616))
- [downloader:http] add more MIME types for `.bmp` and `.rar` files ([#621](https://github.com/mikf/gallery-dl/issues/621), [#628](https://github.com/mikf/gallery-dl/issues/628))
- warn about expired cookies
### Fixes
- [bcy] fix partial image URLs ([#613](https://github.com/mikf/gallery-dl/issues/613))
- [danbooru] fix Ugoira downloads and metadata
- [deviantart] check availability of `/intermediary/` URLs ([#609](https://github.com/mikf/gallery-dl/issues/609))
- [hitomi] follow multiple redirects & fix image URLs
- [piczel] improve and update
- [tumblr] replace `-` with ` ` in tag searches ([#611](https://github.com/mikf/gallery-dl/issues/611))
- [vsco] update gallery URL pattern
- fix `--verbose` and `--quiet` command-line options

## 1.13.0 - 2020-02-16
### Additions
- Support for
  - `furaffinity` - https://www.furaffinity.net/ ([#284](https://github.com/mikf/gallery-dl/issues/284))
  - `8kun` - https://8kun.top/ ([#582](https://github.com/mikf/gallery-dl/issues/582))
  - `bcy` - https://bcy.net/ ([#592](https://github.com/mikf/gallery-dl/issues/592))
- [blogger] implement video extraction ([#587](https://github.com/mikf/gallery-dl/issues/587))
- [oauth] add option to specify port number used by local server ([#604](https://github.com/mikf/gallery-dl/issues/604))
- [pixiv] add `rating` metadata field ([#595](https://github.com/mikf/gallery-dl/issues/595))
- [pixiv] recognize tags at the end of new bookmark URLs
- [reddit] add `videos` option
- [weibo] use youtube-dl to download from m3u8 manifests
- implement `parent-directory` option ([#551](https://github.com/mikf/gallery-dl/issues/551))
- extend filename formatting capabilities:
  - implement field name alternatives ([#525](https://github.com/mikf/gallery-dl/issues/525))
  - allow multiple "special" format specifiers per replacement field ([#595](https://github.com/mikf/gallery-dl/issues/595))
  - allow for numeric list and string indices
### Changes
- [reddit] handle reddit-hosted images and videos natively ([#551](https://github.com/mikf/gallery-dl/issues/551))
- [twitter] change default value for `videos` to `true`
### Fixes
- [cloudflare] unescape challenge URLs
- [deviantart] fix video extraction from `extended_fetch` results
- [hitomi] implement workaround for "broken" redirects
- [khinsider] fix and improve metadata extraction
- [patreon] filter duplicate files per post ([#590](https://github.com/mikf/gallery-dl/issues/590))
- [piczel] fix extraction
- [pixiv] fix user IDs for bookmarks API calls ([#596](https://github.com/mikf/gallery-dl/issues/596))
- [sexcom] fix image URLs
- [twitter] force old login page layout ([#584](https://github.com/mikf/gallery-dl/issues/584), [#598](https://github.com/mikf/gallery-dl/issues/598))
- [vsco] skip "invalid" entities
- improve functions to load/save cookies.txt files ([#586](https://github.com/mikf/gallery-dl/issues/586))
### Removals
- [yaplog] remove module

## 1.12.3 - 2020-01-19
### Additions
- [hentaifoundry] extract more metadata ([#565](https://github.com/mikf/gallery-dl/issues/565))
- [twitter] add option to extract TwitPic embeds ([#579](https://github.com/mikf/gallery-dl/issues/579))
- implement a post-processor module to compare file versions ([#530](https://github.com/mikf/gallery-dl/issues/530))
### Fixes
- [hitomi] update image URL generation
- [mangadex] revert domain to `mangadex.org`
- [pinterest] improve detection of invalid pin.it links
- [pixiv] update URL patterns for user profiles and bookmarks ([#568](https://github.com/mikf/gallery-dl/issues/568))
- [twitter] fix stop before real end ([#573](https://github.com/mikf/gallery-dl/issues/573))
- remove temp files before downloading from fallback URLs
### Removals
- [erolord] remove extractor

## 1.12.2 - 2020-01-05
### Additions
- [deviantart] match new search/popular URLs ([#538](https://github.com/mikf/gallery-dl/issues/538))
- [deviantart] match `/favourites/all` URLs ([#555](https://github.com/mikf/gallery-dl/issues/555))
- [deviantart] add extractor for followed users ([#515](https://github.com/mikf/gallery-dl/issues/515))
- [pixiv] support listing followed users ([#515](https://github.com/mikf/gallery-dl/issues/515))
- [imagefap] handle beta.imagefap.com URLs ([#552](https://github.com/mikf/gallery-dl/issues/552))
- [postprocessor:metadata] add `directory` option ([#520](https://github.com/mikf/gallery-dl/issues/520))
### Fixes
- [artstation] fix search result pagination ([#537](https://github.com/mikf/gallery-dl/issues/537))
- [directlink] send Referer headers ([#536](https://github.com/mikf/gallery-dl/issues/536))
- [exhentai] restrict default directory name length ([#545](https://github.com/mikf/gallery-dl/issues/545))
- [mangadex] change domain to mangadex.cc ([#559](https://github.com/mikf/gallery-dl/issues/559))
- [mangahere] send `isAdult` cookies ([#556](https://github.com/mikf/gallery-dl/issues/556))
- [newgrounds] fix tags metadata extraction
- [pixiv] retry after rate limit errors ([#535](https://github.com/mikf/gallery-dl/issues/535))
- [twitter] handle quoted tweets ([#526](https://github.com/mikf/gallery-dl/issues/526))
- [twitter] handle API rate limits ([#526](https://github.com/mikf/gallery-dl/issues/526))
- [twitter] fix URLs forwarded to youtube-dl ([#540](https://github.com/mikf/gallery-dl/issues/540))
- prevent infinite recursion when spawning new extractors ([#489](https://github.com/mikf/gallery-dl/issues/489))
- improve output of `--list-keywords` for "parent" extractors ([#548](https://github.com/mikf/gallery-dl/issues/548))
- provide fallback for SQLite versions with missing `WITHOUT ROWID` support ([#553](https://github.com/mikf/gallery-dl/issues/553))

## 1.12.1 - 2019-12-22
### Additions
- [4chan] add extractor for entire boards ([#510](https://github.com/mikf/gallery-dl/issues/510))
- [realbooru] add extractors for pools, posts, and tag searches ([#514](https://github.com/mikf/gallery-dl/issues/514))
- [instagram] implement a `videos` option ([#521](https://github.com/mikf/gallery-dl/issues/521))
- [vsco] implement a `videos` option
- [postprocessor:metadata] implement a `bypost` option for downloading the metadata of an entire post
([#511](https://github.com/mikf/gallery-dl/issues/511)) ### Changes - [reddit] change the default value for `comments` to `0` - [vsco] improve image resolutions - make filesystem-related errors during file downloads non-fatal ([#512](https://github.com/mikf/gallery-dl/issues/512)) ### Fixes - [foolslide] add fallback for chapter data extraction - [instagram] ignore errors during post-page extraction - [patreon] avoid errors when fetching user info ([#508](https://github.com/mikf/gallery-dl/issues/508)) - [patreon] improve URL pattern for single posts - [reddit] fix errors with `t1` submissions - [vsco] fix user profile extraction … again - [weibo] handle unavailable/deleted statuses - [downloader:http] improve rate limit handling - retain trailing zeroes in Cloudflare challenge answers ## 1.12.0 - 2019-12-08 ### Additions - [flickr] support 3k, 4k, 5k, and 6k photo sizes ([#472](https://github.com/mikf/gallery-dl/issues/472)) - [imgur] add extractor for subreddit links ([#500](https://github.com/mikf/gallery-dl/issues/500)) - [newgrounds] add extractors for `audio` listings and general `media` files ([#394](https://github.com/mikf/gallery-dl/issues/394)) - [newgrounds] implement login support ([#394](https://github.com/mikf/gallery-dl/issues/394)) - [postprocessor:metadata] implement a `extension-format` option ([#477](https://github.com/mikf/gallery-dl/issues/477)) - `--exec-after` ### Changes - [deviantart] ensure consistent username capitalization ([#455](https://github.com/mikf/gallery-dl/issues/455)) - [directlink] split `{path}` into `{path}/{filename}.{extension}` - [twitter] update metadata fields with user/author information - [postprocessor:metadata] filter private entries & rename `format` to `content-format` - Enable `cookies-update` by default ### Fixes - [2chan] fix metadata extraction - [behance] get images from 'media_collection' modules - [bobx] fix image downloads by randomly generating session cookies ([#482](https://github.com/mikf/gallery-dl/issues/482)) - [deviantart] revert to getting download URLs from OAuth API calls ([#488](https://github.com/mikf/gallery-dl/issues/488)) - [deviantart] fix URL generation from '/extended_fetch' results ([#505](https://github.com/mikf/gallery-dl/issues/505)) - [flickr] adjust OAuth redirect URI ([#503](https://github.com/mikf/gallery-dl/issues/503)) - [hentaifox] fix extraction - [imagefap] adapt to new image URL format - [imgbb] fix error in galleries without user info ([#471](https://github.com/mikf/gallery-dl/issues/471)) - [instagram] prevent errors with missing 'video_url' fields ([#479](https://github.com/mikf/gallery-dl/issues/479)) - [nijie] fix `date` parsing - [pixiv] match new search URLs ([#507](https://github.com/mikf/gallery-dl/issues/507)) - [plurk] fix comment pagination - [sexcom] send specific Referer headers when downloading videos - [twitter] fix infinite loops ([#499](https://github.com/mikf/gallery-dl/issues/499)) - [vsco] fix user profile and collection extraction ([#480](https://github.com/mikf/gallery-dl/issues/480)) - Fix Cloudflare DDoS protection bypass ### Removals - `--abort-on-skip` ## 1.11.1 - 2019-11-09 ### Fixes - Fix inclusion of bash completion and man pages in source distributions ## 1.11.0 - 2019-11-08 ### Additions - Support for - `blogger` - https://www.blogger.com/ ([#364](https://github.com/mikf/gallery-dl/issues/364)) - `nozomi` - https://nozomi.la/ ([#388](https://github.com/mikf/gallery-dl/issues/388)) - `issuu` - https://issuu.com/ ([#413](https://github.com/mikf/gallery-dl/issues/413)) 
- `naver` - https://blog.naver.com/ ([#447](https://github.com/mikf/gallery-dl/issues/447)) - Extractor for `twitter` search results ([#448](https://github.com/mikf/gallery-dl/issues/448)) - Extractor for `deviantart` user profiles with configurable targets ([#377](https://github.com/mikf/gallery-dl/issues/377), [#419](https://github.com/mikf/gallery-dl/issues/419)) - `--ugoira-conv-lossless` ([#432](https://github.com/mikf/gallery-dl/issues/432)) - `cookies-update` option to allow updating cookies.txt files ([#445](https://github.com/mikf/gallery-dl/issues/445)) - Optional `cloudflare` and `video` installation targets ([#460](https://github.com/mikf/gallery-dl/issues/460)) - Allow executing commands with the `exec` post-processor after all files are downloaded ([#413](https://github.com/mikf/gallery-dl/issues/413), [#421](https://github.com/mikf/gallery-dl/issues/421)) ### Changes - Rewrite `imgur` using its public API ([#446](https://github.com/mikf/gallery-dl/issues/446)) - Rewrite `luscious` using GraphQL queries ([#457](https://github.com/mikf/gallery-dl/issues/457)) - Adjust default `nijie` filenames to match `pixiv` - Change enumeration index for gallery extractors from `page` to `num` - Return non-zero exit status when errors occurred - Forward proxy settings to youtube-dl downloader - Install bash completion script into `share/bash-completion/completions` ### Fixes - Adapt to new `instagram` page layout when logged in ([#391](https://github.com/mikf/gallery-dl/issues/391)) - Support protected `twitter` videos ([#452](https://github.com/mikf/gallery-dl/issues/452)) - Extend `hitomi` URL pattern and fix gallery extraction - Restore OAuth2 authentication error messages - Miscellaneous fixes for `patreon` ([#444](https://github.com/mikf/gallery-dl/issues/444)), `deviantart` ([#455](https://github.com/mikf/gallery-dl/issues/455)), `sexcom` ([#464](https://github.com/mikf/gallery-dl/issues/464)), `imgur` ([#467](https://github.com/mikf/gallery-dl/issues/467)), `simplyhentai` ## 1.10.6 - 2019-10-11 ### Additions - `--exec` command-line option to specify a command to run after each file download ([#421](https://github.com/mikf/gallery-dl/issues/421)) ### Changes - Include titles in `gfycat` default filenames ([#434](https://github.com/mikf/gallery-dl/issues/434)) ### Fixes - Fetch working download URLs for `deviantart` ([#436](https://github.com/mikf/gallery-dl/issues/436)) - Various fixes and improvements for `yaplog` blogs ([#443](https://github.com/mikf/gallery-dl/issues/443)) - Fix image URL generation for `hitomi` galleries - Miscellaneous fixes for `behance` and `xvideos` ## 1.10.5 - 2019-09-28 ### Additions - `instagram.highlights` option to include highlighted stories when downloading user profiles ([#329](https://github.com/mikf/gallery-dl/issues/329)) - Support for `/user/` URLs on `reddit` ([#350](https://github.com/mikf/gallery-dl/issues/350)) - Support for `imgur` user profiles and favorites ([#420](https://github.com/mikf/gallery-dl/issues/420)) - Additional metadata fields on `nijie`([#423](https://github.com/mikf/gallery-dl/issues/423)) ### Fixes - Improve handling of private `deviantart` artworks ([#414](https://github.com/mikf/gallery-dl/issues/414)) and 429 status codes ([#424](https://github.com/mikf/gallery-dl/issues/424)) - Prevent fatal errors when trying to open download-archive files ([#417](https://github.com/mikf/gallery-dl/issues/417)) - Detect and ignore unavailable videos on `weibo` ([#427](https://github.com/mikf/gallery-dl/issues/427)) - Update the `scope` 
of new `reddit` refresh-tokens ([#428](https://github.com/mikf/gallery-dl/issues/428)) - Fix inconsistencies with the `reddit.comments` option ([#429](https://github.com/mikf/gallery-dl/issues/429)) - Extend URL patterns for `hentaicafe` manga and `pixiv` artworks - Improve detection of unavailable albums on `luscious` and `imgbb` - Miscellaneous fixes for `tsumino` ## 1.10.4 - 2019-09-08 ### Additions - Support for - `lineblog` - https://www.lineblog.me/ ([#404](https://github.com/mikf/gallery-dl/issues/404)) - `fuskator` - https://fuskator.com/ ([#407](https://github.com/mikf/gallery-dl/issues/407)) - `ugoira` option for `danbooru` to download pre-rendered ugoira animations ([#406](https://github.com/mikf/gallery-dl/issues/406)) ### Fixes - Download the correct files from `twitter` replies ([#403](https://github.com/mikf/gallery-dl/issues/403)) - Prevent crash when trying to use unavailable downloader modules ([#405](https://github.com/mikf/gallery-dl/issues/405)) - Fix `pixiv` authentication ([#411](https://github.com/mikf/gallery-dl/issues/411)) - Improve `exhentai` image limit checks - Miscellaneous fixes for `hentaicafe`, `simplyhentai`, `tumblr` ## 1.10.3 - 2019-08-30 ### Additions - Provide `filename` metadata for all `deviantart` files ([#392](https://github.com/mikf/gallery-dl/issues/392), [#400](https://github.com/mikf/gallery-dl/issues/400)) - Implement a `ytdl.outtmpl` option to let youtube-dl handle filenames by itself ([#395](https://github.com/mikf/gallery-dl/issues/395)) - Support `seiga` mobile URLs ([#401](https://github.com/mikf/gallery-dl/issues/401)) ### Fixes - Extract more than the first 32 posts from `piczel` galleries ([#396](https://github.com/mikf/gallery-dl/issues/396)) - Fix filenames of archives created with `--zip` ([#397](https://github.com/mikf/gallery-dl/issues/397)) - Skip unavailable images and videos on `flickr` ([#398](https://github.com/mikf/gallery-dl/issues/398)) - Fix filesystem paths on Windows with Python 3.6 and lower ([#402](https://github.com/mikf/gallery-dl/issues/402)) ## 1.10.2 - 2019-08-23 ### Additions - Support for `instagram` stories and IGTV ([#371](https://github.com/mikf/gallery-dl/issues/371), [#373](https://github.com/mikf/gallery-dl/issues/373)) - Support for individual `imgbb` images ([#363](https://github.com/mikf/gallery-dl/issues/363)) - `deviantart.quality` option to set the JPEG compression quality for newer images ([#369](https://github.com/mikf/gallery-dl/issues/369)) - `enumerate` option for `extractor.skip` ([#306](https://github.com/mikf/gallery-dl/issues/306)) - `adjust-extensions` option to control filename extension adjustments - `path-remove` option to remove control characters etc. 
from filesystem paths ### Changes - Rename `restrict-filenames` to `path-restrict` - Adjust `pixiv` metadata and default filename format ([#366](https://github.com/mikf/gallery-dl/issues/366)) - Set `filename` to `"{category}_{user[id]}_{id}{suffix}.{extension}"` to restore the old default - Improve and optimize directory and filename generation ### Fixes - Allow the `classify` post-processor to handle files with unknown filename extension ([#138](https://github.com/mikf/gallery-dl/issues/138)) - Fix rate limit handling for OAuth APIs ([#368](https://github.com/mikf/gallery-dl/issues/368)) - Fix artwork and scraps extraction on `deviantart` ([#376](https://github.com/mikf/gallery-dl/issues/376), [#392](https://github.com/mikf/gallery-dl/issues/392)) - Distinguish between `imgur` album and gallery URLs ([#380](https://github.com/mikf/gallery-dl/issues/380)) - Prevent crash when using `--ugoira-conv` ([#382](https://github.com/mikf/gallery-dl/issues/382)) - Handle multi-image posts on `patreon` ([#383](https://github.com/mikf/gallery-dl/issues/383)) - Miscellaneous fixes for `*reactor`, `simplyhentai` ## 1.10.1 - 2019-08-02 ### Fixes - Use the correct domain for exhentai.org input URLs ## 1.10.0 - 2019-08-01 ### Warning - Prior to version 1.10.0 all cache files were created world readable (mode `644`) leading to possible sensitive information disclosure on multi-user systems - It is recommended to restrict access permissions of already existing files (`/tmp/.gallery-dl.cache`) with `chmod 600` - Windows users should not be affected ### Additions - Support for - `vsco` - https://vsco.co/ ([#331](https://github.com/mikf/gallery-dl/issues/331)) - `imgbb` - https://imgbb.com/ ([#361](https://github.com/mikf/gallery-dl/issues/361)) - `adultempire` - https://www.adultempire.com/ ([#340](https://github.com/mikf/gallery-dl/issues/340)) - `restrict-filenames` option to create Windows-compatible filenames on any platform ([#348](https://github.com/mikf/gallery-dl/issues/348)) - `forward-cookies` option to control cookie forwarding to youtube-dl ([#352](https://github.com/mikf/gallery-dl/issues/352)) ### Changes - The default cache file location on non-Windows systems is now - `$XDG_CACHE_HOME/gallery-dl/cache.sqlite3` or - `~/.cache/gallery-dl/cache.sqlite3` - New cache files are created with mode `600` - `exhentai` extractors will always use `e-hentai.org` as domain ### Fixes - Better handling of `exhentai` image limits and errors ([#356](https://github.com/mikf/gallery-dl/issues/356), [#360](https://github.com/mikf/gallery-dl/issues/360)) - Try to prevent ZIP file corruption ([#355](https://github.com/mikf/gallery-dl/issues/355)) - Miscellaneous fixes for `behance`, `ngomik` ## 1.9.0 - 2019-07-19 ### Additions - Support for - `erolord` - http://erolord.com/ ([#326](https://github.com/mikf/gallery-dl/issues/326)) - Add login support for `instagram` ([#195](https://github.com/mikf/gallery-dl/issues/195)) - Add `--no-download` and `extractor.*.download` disable file downloads ([#220](https://github.com/mikf/gallery-dl/issues/220)) - Add `-A/--abort` to specify the number of consecutive download skips before aborting - Interpret `-1` as infinite retries ([#300](https://github.com/mikf/gallery-dl/issues/300)) - Implement custom log message formats per log-level ([#304](https://github.com/mikf/gallery-dl/issues/304)) - Implement an `mtime` post-processor that sets file modification times according to metadata fields ([#332](https://github.com/mikf/gallery-dl/issues/332)) - Implement a `twitter.content` 
option to enable tweet text extraction ([#333](https://github.com/mikf/gallery-dl/issues/333), [#338](https://github.com/mikf/gallery-dl/issues/338)) - Enable `date-min/-max/-format` options for `tumblr` ([#337](https://github.com/mikf/gallery-dl/issues/337)) ### Changes - Set file modification times according to their `Last-Modified` header when downloading ([#236](https://github.com/mikf/gallery-dl/issues/236), [#277](https://github.com/mikf/gallery-dl/issues/277)) - Use `--no-mtime` or `downloader.*.mtime` to disable this behavior - Duplicate download URLs are no longer silently ignored (controllable with `extractor.*.image-unique`) - Deprecate `--abort-on-skip` ### Fixes - Retry downloads on OpenSSL exceptions ([#324](https://github.com/mikf/gallery-dl/issues/324)) - Ignore unavailable pins on `sexcom` instead of raising an exception ([#325](https://github.com/mikf/gallery-dl/issues/325)) - Use Firefox's SSL/TLS ciphers to prevent Cloudflare CAPTCHAs ([#342](https://github.com/mikf/gallery-dl/issues/342)) - Improve folder name matching on `deviantart` ([#343](https://github.com/mikf/gallery-dl/issues/343)) - Forward cookies to `youtube-dl` to allow downloading private videos - Miscellaneous fixes for `35photo`, `500px`, `newgrounds`, `simplyhentai` ## 1.8.7 - 2019-06-28 ### Additions - Support for - `vanillarock` - https://vanilla-rock.com/ ([#254](https://github.com/mikf/gallery-dl/issues/254)) - `nsfwalbum` - https://nsfwalbum.com/ ([#287](https://github.com/mikf/gallery-dl/issues/287)) - `artist` and `tags` metadata for `hentaicafe` ([#238](https://github.com/mikf/gallery-dl/issues/238)) - `description` metadata for `instagram` ([#310](https://github.com/mikf/gallery-dl/issues/310)) - Format string option to replace a substring with another - `R//` ([#318](https://github.com/mikf/gallery-dl/issues/318)) ### Changes - Delete empty archives created by the `zip` post-processor ([#316](https://github.com/mikf/gallery-dl/issues/316)) ### Fixes - Handle `hitomi` Game CG galleries correctly ([#321](https://github.com/mikf/gallery-dl/issues/321)) - Miscellaneous fixes for `deviantart`, `hitomi`, `pururin`, `kissmanga`, `keenspot`, `mangoxo`, `imagefap` ## 1.8.6 - 2019-06-14 ### Additions - Support for - `slickpic` - https://www.slickpic.com/ ([#249](https://github.com/mikf/gallery-dl/issues/249)) - `xhamster` - https://xhamster.com/ ([#281](https://github.com/mikf/gallery-dl/issues/281)) - `pornhub` - https://www.pornhub.com/ ([#282](https://github.com/mikf/gallery-dl/issues/282)) - `8muses` - https://www.8muses.com/ ([#305](https://github.com/mikf/gallery-dl/issues/305)) - `extra` option for `deviantart` to download Sta.sh content linked in description texts ([#302](https://github.com/mikf/gallery-dl/issues/302)) ### Changes - Detect `directlink` URLs with upper case filename extensions ([#296](https://github.com/mikf/gallery-dl/issues/296)) ### Fixes - Improved error handling for `tumblr` API calls ([#297](https://github.com/mikf/gallery-dl/issues/297)) - Fixed extraction of `livedoor` blogs ([#301](https://github.com/mikf/gallery-dl/issues/301)) - Fixed extraction of special `deviantart` Sta.sh items ([#307](https://github.com/mikf/gallery-dl/issues/307)) - Fixed pagination for specific `keenspot` comics ## 1.8.5 - 2019-06-01 ### Additions - Support for - `keenspot` - http://keenspot.com/ ([#223](https://github.com/mikf/gallery-dl/issues/223)) - `sankakucomplex` - https://www.sankakucomplex.com ([#258](https://github.com/mikf/gallery-dl/issues/258)) - `folders` option for `deviantart` to 
add a list of containing folders to each file ([#276](https://github.com/mikf/gallery-dl/issues/276)) - `captcha` option for `kissmanga` and `readcomiconline` to control CAPTCHA handling ([#279](https://github.com/mikf/gallery-dl/issues/279)) - `filename` metadata for files downloaded with youtube-dl ([#291](https://github.com/mikf/gallery-dl/issues/291)) ### Changes - Adjust `wallhaven` extractors to new page layout: - use API and add `api-key` option - removed traditional login support - Provide original filenames for `patreon` downloads ([#268](https://github.com/mikf/gallery-dl/issues/268)) - Use e-hentai.org or exhentai.org depending on input URL ([#278](https://github.com/mikf/gallery-dl/issues/278)) ### Fixes - Fix pagination over `sankaku` popular listings ([#265](https://github.com/mikf/gallery-dl/issues/265)) - Fix folder and collection extraction on `deviantart` ([#271](https://github.com/mikf/gallery-dl/issues/271)) - Detect "AreYouHuman" redirects on `readcomiconline` ([#279](https://github.com/mikf/gallery-dl/issues/279)) - Miscellaneous fixes for `hentainexus`, `livedoor`, `ngomik` ## 1.8.4 - 2019-05-17 ### Additions - Support for - `patreon` - https://www.patreon.com/ ([#226](https://github.com/mikf/gallery-dl/issues/226)) - `hentainexus` - https://hentainexus.com/ ([#256](https://github.com/mikf/gallery-dl/issues/256)) - `date` metadata fields for `pixiv` ([#248](https://github.com/mikf/gallery-dl/issues/248)), `instagram` ([#250](https://github.com/mikf/gallery-dl/issues/250)), `exhentai`, and `newgrounds` ### Changes - Improved `flickr` metadata and video extraction ([#246](https://github.com/mikf/gallery-dl/issues/246)) ### Fixes - Download original GIF animations from `deviantart` ([#242](https://github.com/mikf/gallery-dl/issues/242)) - Ignore missing `edge_media_to_comment` fields on `instagram` ([#250](https://github.com/mikf/gallery-dl/issues/250)) - Fix serialization of `datetime` objects for `--write-metadata` ([#251](https://github.com/mikf/gallery-dl/issues/251), [#252](https://github.com/mikf/gallery-dl/issues/252)) - Allow multiple post-processor command-line options at once ([#253](https://github.com/mikf/gallery-dl/issues/253)) - Prevent crash on `booru` sites when no tags are available ([#259](https://github.com/mikf/gallery-dl/issues/259)) - Fix extraction on `instagram` after `rhx_gis` field removal ([#266](https://github.com/mikf/gallery-dl/issues/266)) - Avoid Cloudflare CAPTCHAs for Python interpreters built against OpenSSL < 1.1.1 - Miscellaneous fixes for `luscious` ## 1.8.3 - 2019-05-04 ### Additions - Support for - `plurk` - https://www.plurk.com/ ([#212](https://github.com/mikf/gallery-dl/issues/212)) - `sexcom` - https://www.sex.com/ ([#147](https://github.com/mikf/gallery-dl/issues/147)) - `--clear-cache` - `date` metadata fields for `deviantart`, `twitter`, and `tumblr` ([#224](https://github.com/mikf/gallery-dl/issues/224), [#232](https://github.com/mikf/gallery-dl/issues/232)) ### Changes - Standalone executables are now built using PyInstaller: - uses the latest CPython interpreter (Python 3.7.3) - available on several platforms (Windows, Linux, macOS) - includes the `certifi` CA bundle, `youtube-dl`, and `pyOpenSSL` on Windows ### Fixes - Patch `urllib3`'s default list of SSL/TLS ciphers to prevent Cloudflare CAPTCHAs ([#227](https://github.com/mikf/gallery-dl/issues/227)) (Windows users need to install `pyOpenSSL` for this to take effect) - Provide fallback URLs for `twitter` images ([#237](https://github.com/mikf/gallery-dl/issues/237)) 
- Send `Referer` headers when downloading from `hitomi` ([#239](https://github.com/mikf/gallery-dl/issues/239)) - Updated login procedure on `mangoxo` ## 1.8.2 - 2019-04-12 ### Additions - Support for - `pixnet` - https://www.pixnet.net/ ([#177](https://github.com/mikf/gallery-dl/issues/177)) - `wikiart` - https://www.wikiart.org/ ([#179](https://github.com/mikf/gallery-dl/issues/179)) - `mangoxo` - https://www.mangoxo.com/ ([#184](https://github.com/mikf/gallery-dl/issues/184)) - `yaplog` - https://yaplog.jp/ ([#190](https://github.com/mikf/gallery-dl/issues/190)) - `livedoor` - http://blog.livedoor.jp/ ([#190](https://github.com/mikf/gallery-dl/issues/190)) - Login support for `mangoxo` ([#184](https://github.com/mikf/gallery-dl/issues/184)) and `twitter` ([#214](https://github.com/mikf/gallery-dl/issues/214)) ### Changes - Increased required `Requests` version to 2.11.0 ### Fixes - Improved image quality on `reactor` sites ([#210](https://github.com/mikf/gallery-dl/issues/210)) - Support `imagebam` galleries with more than 100 images ([#219](https://github.com/mikf/gallery-dl/issues/219)) - Updated Cloudflare bypass code ## 1.8.1 - 2019-03-29 ### Additions - Support for: - `35photo` - https://35photo.pro/ ([#162](https://github.com/mikf/gallery-dl/issues/162)) - `500px` - https://500px.com/ ([#185](https://github.com/mikf/gallery-dl/issues/185)) - `instagram` extractor for hashtags ([#202](https://github.com/mikf/gallery-dl/issues/202)) - Option to get more metadata on `deviantart` ([#189](https://github.com/mikf/gallery-dl/issues/189)) - Man pages and bash completion ([#150](https://github.com/mikf/gallery-dl/issues/150)) - Snap improvements ([#197](https://github.com/mikf/gallery-dl/issues/197), [#199](https://github.com/mikf/gallery-dl/issues/199), [#207](https://github.com/mikf/gallery-dl/issues/207)) ### Changes - Better FFmpeg arguments for `--ugoira-conv` - Adjusted metadata for `luscious` albums ### Fixes - Proper handling of `instagram` multi-image posts ([#178](https://github.com/mikf/gallery-dl/issues/178), [#201](https://github.com/mikf/gallery-dl/issues/201)) - Fixed `tumblr` avatar URLs when not using OAuth1.0 ([#193](https://github.com/mikf/gallery-dl/issues/193)) - Miscellaneous fixes for `exhentai`, `komikcast` ## 1.8.0 - 2019-03-15 ### Additions - Support for: - `weibo` - https://www.weibo.com/ - `pururin` - https://pururin.io/ ([#174](https://github.com/mikf/gallery-dl/issues/174)) - `fashionnova` - https://www.fashionnova.com/ ([#175](https://github.com/mikf/gallery-dl/issues/175)) - `shopify` sites in general ([#175](https://github.com/mikf/gallery-dl/issues/175)) - Snap packaging ([#169](https://github.com/mikf/gallery-dl/issues/169), [#170](https://github.com/mikf/gallery-dl/issues/170), [#187](https://github.com/mikf/gallery-dl/issues/187), [#188](https://github.com/mikf/gallery-dl/issues/188)) - Automatic Cloudflare DDoS protection bypass - Extractor and Job information for logging format strings - `dynastyscans` image and search extractors ([#163](https://github.com/mikf/gallery-dl/issues/163)) - `deviantart` scraps extractor ([#168](https://github.com/mikf/gallery-dl/issues/168)) - `artstation` extractor for artwork listings ([#172](https://github.com/mikf/gallery-dl/issues/172)) - `smugmug` video support and improved image format selection ([#183](https://github.com/mikf/gallery-dl/issues/183)) ### Changes - More metadata for `nhentai` galleries - Combined `myportfolio` extractors into one - Renamed `name` metadata field to `filename` and removed the original 
`filename` field - Simplified and improved internal data structures - Optimized creation of child extractors ### Fixes - Filter empty `tumblr` URLs ([#165](https://github.com/mikf/gallery-dl/issues/165)) - Filter ads and improve connection speed on `hentaifoundry` - Show proper error messages if `luscious` galleries are unavailable - Miscellaneous fixes for `mangahere`, `ngomik`, `simplyhentai`, `imgspice` ### Removals - `seaotterscans` ## 1.7.0 - 2019-02-05 - Added support for: - `photobucket` - http://photobucket.com/ ([#117](https://github.com/mikf/gallery-dl/issues/117)) - `hentaifox` - https://hentaifox.com/ ([#160](https://github.com/mikf/gallery-dl/issues/160)) - `tsumino` - https://www.tsumino.com/ ([#161](https://github.com/mikf/gallery-dl/issues/161)) - Added the ability to dynamically generate extractors based on a user's config file for - [`mastodon`](https://github.com/tootsuite/mastodon) instances ([#144](https://github.com/mikf/gallery-dl/issues/144)) - [`foolslide`](https://github.com/FoolCode/FoOlSlide) based sites - [`foolfuuka`](https://github.com/FoolCode/FoolFuuka) based archives - Added an extractor for `behance` collections ([#157](https://github.com/mikf/gallery-dl/issues/157)) - Added login support for `luscious` ([#159](https://github.com/mikf/gallery-dl/issues/159)) and `tsumino` ([#161](https://github.com/mikf/gallery-dl/issues/161)) - Added an option to stop downloading if the `exhentai` image limit is exceeded ([#141](https://github.com/mikf/gallery-dl/issues/141)) - Fixed extraction issues for `behance` and `mangapark` ## 1.6.3 - 2019-01-18 - Added `metadata` post-processor to write image metadata to an external file ([#135](https://github.com/mikf/gallery-dl/issues/135)) - Added option to reverse chapter order of manga extractors ([#149](https://github.com/mikf/gallery-dl/issues/149)) - Added authentication support for `danbooru` ([#151](https://github.com/mikf/gallery-dl/issues/151)) - Added tag metadata for `exhentai` and `hbrowse` galleries - Improved `*reactor` extractors ([#148](https://github.com/mikf/gallery-dl/issues/148)) - Fixed extraction issues for `nhentai` ([#156](https://github.com/mikf/gallery-dl/issues/156)), `pinterest`, `mangapark` ## 1.6.2 - 2019-01-01 - Added support for: - `instagram` - https://www.instagram.com/ ([#134](https://github.com/mikf/gallery-dl/issues/134)) - Added support for multiple items on sta.sh pages ([#113](https://github.com/mikf/gallery-dl/issues/113)) - Added option to download `tumblr` avatars ([#137](https://github.com/mikf/gallery-dl/issues/137)) - Changed defaults for visited post types and inline media on `tumblr` - Improved inline extraction of `tumblr` posts ([#133](https://github.com/mikf/gallery-dl/issues/133), [#137](https://github.com/mikf/gallery-dl/issues/137)) - Improved error handling and retry behavior of all API calls - Improved handling of missing fields in format strings ([#136](https://github.com/mikf/gallery-dl/issues/136)) - Fixed hash extraction for unusual `tumblr` URLs ([#129](https://github.com/mikf/gallery-dl/issues/129)) - Fixed image subdomains for `hitomi` galleries ([#142](https://github.com/mikf/gallery-dl/issues/142)) - Fixed and improved miscellaneous issues for `kissmanga` ([#20](https://github.com/mikf/gallery-dl/issues/20)), `luscious`, `mangapark`, `readcomiconline` ## 1.6.1 - 2018-11-28 - Added support for: - `joyreactor` - http://joyreactor.cc/ ([#114](https://github.com/mikf/gallery-dl/issues/114)) - `pornreactor` - http://pornreactor.cc/ 
([#114](https://github.com/mikf/gallery-dl/issues/114)) - `newgrounds` - https://www.newgrounds.com/ ([#119](https://github.com/mikf/gallery-dl/issues/119)) - Added extractor for search results on `luscious` ([#127](https://github.com/mikf/gallery-dl/issues/127)) - Fixed filenames of ZIP archives ([#126](https://github.com/mikf/gallery-dl/issues/126)) - Fixed extraction issues for `gfycat`, `hentaifoundry` ([#125](https://github.com/mikf/gallery-dl/issues/125)), `mangafox` ## 1.6.0 - 2018-11-17 - Added support for: - `wallhaven` - https://alpha.wallhaven.cc/ - `yuki` - https://yuki.la/ - Added youtube-dl integration and video downloads for `twitter` ([#99](https://github.com/mikf/gallery-dl/issues/99)), `behance`, `artstation` - Added per-extractor options for network connections (`retries`, `timeout`, `verify`) - Added a `--no-check-certificate` command-line option - Added ability to specify the number of skipped downloads before aborting/exiting ([#115](https://github.com/mikf/gallery-dl/issues/115)) - Added extractors for scraps, favorites, popular and recent images on `hentaifoundry` ([#110](https://github.com/mikf/gallery-dl/issues/110)) - Improved login procedure for `pixiv` to avoid unwanted emails on each new login - Improved album metadata and error handling for `flickr` ([#109](https://github.com/mikf/gallery-dl/issues/109)) - Updated default User-Agent string to Firefox 62 ([#122](https://github.com/mikf/gallery-dl/issues/122)) - Fixed `twitter` API response handling when logged in ([#123](https://github.com/mikf/gallery-dl/issues/123)) - Fixed issue when converting Ugoira using H.264 - Fixed miscellaneous issues for `2chan`, `deviantart`, `fallenangels`, `flickr`, `imagefap`, `pinterest`, `turboimagehost`, `warosu`, `yuki` ([#112](https://github.com/mikf/gallery-dl/issues/112)) ## 1.5.3 - 2018-09-14 - Added support for: - `hentaicafe` - https://hentai.cafe/ ([#101](https://github.com/mikf/gallery-dl/issues/101)) - `bobx` - http://www.bobx.com/dark/ - Added black-/whitelist options for post-processor modules - Added support for `tumblr` inline videos ([#102](https://github.com/mikf/gallery-dl/issues/102)) - Fixed extraction of `smugmug` albums without owner ([#100](https://github.com/mikf/gallery-dl/issues/100)) - Fixed issues when using default config values with `reddit` extractors ([#104](https://github.com/mikf/gallery-dl/issues/104)) - Fixed pagination for user favorites on `sankaku` ([#106](https://github.com/mikf/gallery-dl/issues/106)) - Fixed a crash when processing `deviantart` journals ([#108](https://github.com/mikf/gallery-dl/issues/108)) ## 1.5.2 - 2018-08-31 - Added support for `twitter` timelines ([#96](https://github.com/mikf/gallery-dl/issues/96)) - Added option to suppress FFmpeg output during ugoira conversions - Improved filename formatter performance - Improved inline image quality on `tumblr` ([#98](https://github.com/mikf/gallery-dl/issues/98)) - Fixed image URLs for newly released `mangadex` chapters - Fixed a smaller issue with `deviantart` journals - Replaced `subapics` with `ngomik` ## 1.5.1 - 2018-08-17 - Added support for: - `piczel` - https://piczel.tv/ - Added support for related pins on `pinterest` - Fixed accessing "offensive" galleries on `exhentai` ([#97](https://github.com/mikf/gallery-dl/issues/97)) - Fixed extraction issues for `mangadex`, `komikcast` and `behance` - Removed original-image functionality from `tumblr`, since "raw" images are no longer accessible ## 1.5.0 - 2018-08-03 - Added support for: - `behance` - 
https://www.behance.net/ - `myportfolio` - https://www.myportfolio.com/ ([#95](https://github.com/mikf/gallery-dl/issues/95)) - Added custom format string options to handle long strings ([#92](https://github.com/mikf/gallery-dl/issues/92), [#94](https://github.com/mikf/gallery-dl/issues/94)) - Slicing: `"{field[10:40]}"` - Replacement: `"{field:L40/too long/}"` - Improved frame rate handling for ugoira conversions - Improved private access token usage on `deviantart` - Fixed metadata extraction for some images on `nijie` - Fixed chapter extraction on `mangahere` - Removed `whatisthisimnotgoodwithcomputers` - Removed support for Python 3.3 ## 1.4.2 - 2018-07-06 - Added image-pool extractors for `safebooru` and `rule34` - Added option for extended tag information on `booru` sites ([#92](https://github.com/mikf/gallery-dl/issues/92)) - Added support for DeviantArt's new URL format - Added support for `mangapark` mirrors - Changed `imagefap` extractors to use HTTPS - Fixed crash when skipping downloads for files without known extension ## 1.4.1 - 2018-06-22 - Added an `ugoira` post-processor to convert `pixiv` animations to WebM - Added `--zip` and `--ugoira-conv` command-line options - Changed how ugoira frame information is handled - instead of being written to a separate file, it is now made available as metadata field of the ZIP archive - Fixed manga and chapter titles for `mangadex` - Fixed file deletion by post-processors ## 1.4.0 - 2018-06-08 - Added support for: - `simplyhentai` - https://www.simply-hentai.com/ ([#89](https://github.com/mikf/gallery-dl/issues/89)) - Added extractors for - `pixiv` search results and followed users - `deviantart` search results and popular listings - Added post-processors to perform actions on downloaded files - Added options to configure logging behavior - Added OAuth support for `smugmug` - Changed `pixiv` extractors to use the AppAPI - this breaks `favorite` archive IDs and changes some metadata fields - Changed the default filename format for `tumblr` and renamed `offset` to `num` - Fixed a possible UnicodeDecodeError during installation ([#86](https://github.com/mikf/gallery-dl/issues/86)) - Fixed extraction of `mangadex` manga with more than 100 chapters ([#84](https://github.com/mikf/gallery-dl/issues/84)) - Fixed miscellaneous issues for `imgur`, `reddit`, `komikcast`, `mangafox` and `imagebam` ## 1.3.5 - 2018-05-04 - Added support for: - `smugmug` - https://www.smugmug.com/ - Added title information for `mangadex` chapters - Improved the `pinterest` API implementation ([#83](https://github.com/mikf/gallery-dl/issues/83)) - Improved error handling for `deviantart` and `tumblr` - Removed `gomanga` and `puremashiro` ## 1.3.4 - 2018-04-20 - Added support for custom OAuth2 credentials for `pinterest` - Improved rate limit handling for `tumblr` extractors - Improved `hentaifoundry` extractors - Improved `imgur` URL patterns - Fixed miscellaneous extraction issues for `luscious` and `komikcast` - Removed `loveisover` and `spectrumnexus` ## 1.3.3 - 2018-04-06 - Added extractors for - `nhentai` search results - `exhentai` search results and favorites - `nijie` doujins and favorites - Improved metadata extraction for `exhentai` and `nijie` - Improved `tumblr` extractors by avoiding unnecessary API calls - Fixed Cloudflare DDoS protection bypass - Fixed errors when trying to print unencodable characters ## 1.3.2 - 2018-03-23 - Added extractors for `artstation` albums, challenges and search results - Improved URL and metadata extraction for `hitomi`and 
`nhentai` - Fixed page transitions for `danbooru` API results ([#82](https://github.com/mikf/gallery-dl/issues/82)) ## 1.3.1 - 2018-03-16 - Added support for: - `mangadex` - https://mangadex.org/ - `artstation` - https://www.artstation.com/ - Added Cloudflare DDoS protection bypass to `komikcast` extractors - Changed archive ID formats for `deviantart` folders and collections - Improved error handling for `deviantart` API calls - Removed `imgchili` and various smaller image hosts ## 1.3.0 - 2018-03-02 - Added `--proxy` to explicitly specify a proxy server ([#76](https://github.com/mikf/gallery-dl/issues/76)) - Added options to customize [archive ID formats](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorarchive-format) and [undefined replacement fields](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorkeywords-default) - Changed various archive ID formats to improve their behavior for favorites / bookmarks / etc. - Affected modules are `deviantart`, `flickr`, `tumblr`, `pixiv` and all …boorus - Improved `sankaku` and `idolcomplex` support by - respecting `page` and `next` URL parameters ([#79](https://github.com/mikf/gallery-dl/issues/79)) - bypassing the page-limit for unauthenticated users - Improved `directlink` metadata by properly unquoting it - Fixed `pixiv` ugoira extraction ([#78](https://github.com/mikf/gallery-dl/issues/78)) - Fixed miscellaneous extraction issues for `mangastream` and `tumblr` - Removed `yeet`, `chronos`, `coreimg`, `hosturimage`, `imageontime`, `img4ever`, `imgmaid`, `imgupload` ## 1.2.0 - 2018-02-16 - Added support for: - `paheal` - https://rule34.paheal.net/ ([#69](https://github.com/mikf/gallery-dl/issues/69)) - `komikcast` - https://komikcast.com/ ([#70](https://github.com/mikf/gallery-dl/issues/70)) - `subapics` - http://subapics.com/ ([#70](https://github.com/mikf/gallery-dl/issues/70)) - Added `--download-archive` to record downloaded files in an archive file - Added `--write-log` to write logging output to a file - Added a filetype check on download completion to fix incorrectly assigned filename extensions ([#63](https://github.com/mikf/gallery-dl/issues/63)) - Added the `tumblr:...` pseudo URI scheme to support custom domains for Tumblr blogs ([#71](https://github.com/mikf/gallery-dl/issues/71)) - Added fallback URLs for `tumblr` images ([#64](https://github.com/mikf/gallery-dl/issues/64)) - Added support for `reddit`-hosted images ([#68](https://github.com/mikf/gallery-dl/issues/68)) - Improved the input file format by allowing comments and per-URL options - Fixed OAuth 1.0 signature generation for Python 3.3 and 3.4 ([#75](https://github.com/mikf/gallery-dl/issues/75)) - Fixed smaller issues for `luscious`, `hentai2read`, `hentaihere` and `imgur` - Removed the `batoto` module ## 1.1.2 - 2018-01-12 - Added support for: - `puremashiro` - http://reader.puremashiro.moe/ ([#66](https://github.com/mikf/gallery-dl/issues/66)) - `idolcomplex` - https://idol.sankakucomplex.com/ - Added an option to filter reblogs on `tumblr` ([#61](https://github.com/mikf/gallery-dl/issues/61)) - Added OAuth user authentication for `tumblr` ([#65](https://github.com/mikf/gallery-dl/issues/65)) - Added support for `slideshare` mobile URLs ([#67](https://github.com/mikf/gallery-dl/issues/67)) - Improved pagination for various …booru sites to work around page limits - Fixed chapter information parsing for certain manga on `kissmanga` ([#58](https://github.com/mikf/gallery-dl/issues/58)) and `batoto` 
([#60](https://github.com/mikf/gallery-dl/issues/60)) ## 1.1.1 - 2017-12-22 - Added support for: - `slideshare` - https://www.slideshare.net/ ([#54](https://github.com/mikf/gallery-dl/issues/54)) - Added pool- and post-extractors for `sankaku` - Added OAuth user authentication for `deviantart` - Updated `luscious` to support `members.luscious.net` URLs ([#55](https://github.com/mikf/gallery-dl/issues/55)) - Updated `mangahere` to use their new domain name (mangahere.cc) and support mobile URLs - Updated `gelbooru` to not be restricted to the first 20,000 images ([#56](https://github.com/mikf/gallery-dl/issues/56)) - Fixed extraction issues for `nhentai` and `khinsider` ## 1.1.0 - 2017-12-08 - Added the ``-r/--limit-rate`` command-line option to set a maximum download rate - Added the ``--sleep`` command-line option to specify the number of seconds to sleep before each download - Updated `gelbooru` to no longer use their now disabled API - Fixed SWF extraction for `sankaku` ([#52](https://github.com/mikf/gallery-dl/issues/52)) - Fixed extraction issues for `hentai2read` and `khinsider` - Removed the deprecated `--images` and `--chapters` options - Removed the ``mangazuki`` module ## 1.0.2 - 2017-11-24 - Added an option to set a [custom user-agent string](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractoruser-agent) - Improved retry behavior for failed HTTP requests - Improved `seiga` by providing better metadata and getting more than the latest 200 images - Improved `tumblr` by adding support for [all post types](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractortumblrposts), scanning for [inline images](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractortumblrinline) and following [external links](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractortumblrexternal) ([#48](https://github.com/mikf/gallery-dl/issues/48)) - Fixed extraction issues for `hbrowse`, `khinsider` and `senmanga` ## 1.0.1 - 2017-11-10 - Added support for: - `xvideos` - https://www.xvideos.com/ ([#45](https://github.com/mikf/gallery-dl/issues/45)) - Fixed exception handling during file downloads which could lead to a premature exit - Fixed an issue with `tumblr` where not all images would be downloaded when using tags ([#48](https://github.com/mikf/gallery-dl/issues/48)) - Fixed extraction issues for `imgbox` ([#47](https://github.com/mikf/gallery-dl/issues/47)), `mangastream` ([#49](https://github.com/mikf/gallery-dl/issues/49)) and `mangahere` ## 1.0.0 - 2017-10-27 - Added support for: - `warosu` - https://warosu.org/ - `b4k` - https://arch.b4k.co/ - Added support for `pixiv` ranking lists - Added support for `booru` popular lists (`danbooru`, `e621`, `konachan`, `yandere`, `3dbooru`) - Added the `--cookies` command-line and [`cookies`](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorcookies) config option to load additional cookies - Added the `--filter` and `--chapter-filter` command-line options to select individual images or manga-chapters by their metadata using simple Python expressions ([#43](https://github.com/mikf/gallery-dl/issues/43)) - Added the [`verify`](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#downloaderhttpverify) config option to control certificate verification during file downloads - Added config options to overwrite internally used API credentials ([API Tokens & 
IDs](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#api-tokens-ids)) - Added `-K` as a shortcut for `--list-keywords` - Changed the `--images` and `--chapters` command-line options to `--range` and `--chapter-range` - Changed keyword names for various modules to make them accessible by `--filter`. In general minus signs have been replaced with underscores (e.g. `gallery-id` -> `gallery_id`). - Changed default filename formats for manga extractors to optionally use volume and title information - Improved the downloader modules to use [`.part` files](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#downloaderpart) and support resuming incomplete downloads ([#29](https://github.com/mikf/gallery-dl/issues/29)) - Improved `deviantart` by distinguishing between users and groups ([#26](https://github.com/mikf/gallery-dl/issues/26)), always using HTTPS, and always downloading full-sized original images - Improved `sankaku` by adding authentication support and fixing various other issues ([#44](https://github.com/mikf/gallery-dl/issues/44)) - Improved URL pattern for direct image links ([#30](https://github.com/mikf/gallery-dl/issues/30)) - Fixed an issue with `luscious` not getting original image URLs ([#33](https://github.com/mikf/gallery-dl/issues/33)) - Fixed various smaller issues for `batoto`, `hentai2read` ([#38](https://github.com/mikf/gallery-dl/issues/38)), `jaiminisbox`, `khinsider`, `kissmanga` ([#28](https://github.com/mikf/gallery-dl/issues/28), [#46](https://github.com/mikf/gallery-dl/issues/46)), `mangahere`, `pawoo`, `twitter` - Removed `kisscomic` and `yonkouprod` modules ## 0.9.1 - 2017-07-24 - Added support for: - `2chan` - https://www.2chan.net/ - `4plebs` - https://archive.4plebs.org/ - `archivedmoe` - https://archived.moe/ - `archiveofsins` - https://archiveofsins.com/ - `desuarchive` - https://desuarchive.org/ - `fireden` - https://boards.fireden.net/ - `loveisover` - https://archive.loveisover.me/ - `nyafuu` - https://archive.nyafuu.org/ - `rbt` - https://rbt.asia/ - `thebarchive` - https://thebarchive.com/ - `mangazuki` - https://mangazuki.co/ - Improved `reddit` to allow submission filtering by ID and human-readable dates - Improved `deviantart` to support group galleries and gallery folders ([#26](https://github.com/mikf/gallery-dl/issues/26)) - Changed `deviantart` to use better default path formats - Fixed extraction of larger `imgur` albums - Fixed some smaller issues for `pixiv`, `batoto` and `fallenangels` ## 0.9.0 - 2017-06-28 - Added support for: - `reddit` - https://www.reddit.com/ ([#15](https://github.com/mikf/gallery-dl/issues/15)) - `flickr` - https://www.flickr.com/ ([#16](https://github.com/mikf/gallery-dl/issues/16)) - `gfycat` - https://gfycat.com/ - Added support for direct image links - Added user authentication via [OAuth](https://github.com/mikf/gallery-dl#52oauth) for `reddit` and `flickr` - Added support for user authentication data from [`.netrc`](https://stackoverflow.com/tags/.netrc/info) files ([#22](https://github.com/mikf/gallery-dl/issues/22)) - Added a simple progress indicator for multiple URLs ([#19](https://github.com/mikf/gallery-dl/issues/19)) - Added the `--write-unsupported` command-line option to write unsupported URLs to a file - Added documentation for all available config options ([configuration.rst](https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst)) - Improved `pixiv` to support tags for user downloads ([#17](https://github.com/mikf/gallery-dl/issues/17)) - Improved 
`pixiv` to support shortened and http://pixiv.me/... URLs ([#23](https://github.com/mikf/gallery-dl/issues/23)) - Improved `imgur` to properly handle `.gifv` images and provide better metadata - Fixed an issue with `kissmanga` where metadata parsing for some series failed ([#20](https://github.com/mikf/gallery-dl/issues/20)) - Fixed an issue with getting filename extensions from `Content-Type` response headers ## 0.8.4 - 2017-05-21 - Added the `--abort-on-skip` option to stop extraction if a download would be skipped - Improved the output format of the `--list-keywords` option - Updated `deviantart` to support all media types and journals - Updated `fallenangels` to support their [Vietnamese version](https://truyen.fascans.com/) - Fixed an issue with multiple tags on ...booru sites - Removed the `yomanga` module ## 0.8.3 - 2017-05-01 - Added support for https://pawoo.net/ - Added manga extractors for all [FoOlSlide](https://foolcode.github.io/FoOlSlide/)-based modules - Added the `-q/--quiet` and `-v/--verbose` options to control output verbosity - Added the `-j/--dump-json` option to dump extractor results in JSON format - Added the `--ignore-config` option - Updated the `exhentai` extractor to fall back to using the e-hentai version if no username is given - Updated `deviantart` to support sta.sh URLs - Fixed an issue with `kissmanga` which prevented image URLs from being decrypted properly (again) - Fixed an issue with `pixhost` where for an image inside an album it would always download the first image of that album ([#13](https://github.com/mikf/gallery-dl/issues/13)) - Removed the `mangashare` and `readcomics` modules ## 0.8.2 - 2017-04-10 - Fixed an issue in `kissmanga` which prevented image URLs from being decrypted properly ## 0.8.1 - 2017-04-09 - Added new extractors: - `kireicake` - https://reader.kireicake.com/ - `seaotterscans` - https://reader.seaotterscans.com/ - Added a favourites extractor for `deviantart` - Re-enabled the `kissmanga` module - Updated `nijie` to support multi-page image listings - Updated `mangastream` to support readms.net URLs - Updated `exhentai` to support e-hentai.org URLs - Updated `fallenangels` to support their new domain and site layout ## 0.8.0 - 2017-03-28 - Added logging support - Added the `-R/--retries` option to specify how often a download should be retried before giving up - Added the `--http-timeout` option to set a timeout for HTTP connections - Improved error handling/tolerance during HTTP file downloads ([#10](https://github.com/mikf/gallery-dl/issues/10)) - Improved option parsing and the help message from `-h/--help` - Changed the way configuration values are used by prioritizing top-level values - This allows for cmdline options like `-u/--username` to overwrite values set in configuration files - Fixed an issue with `imagefap.com` where incorrectly reported gallery sizes would cause the extractor to fail ([#9](https://github.com/mikf/gallery-dl/issues/9)) - Fixed an issue with `seiga.nicovideo.jp` where invalid characters in an API response caused the XML parser to fail - Fixed an issue with `seiga.nicovideo.jp` where the filename extension for the first image would be used for all others - Removed support for old configuration paths on Windows - Removed several modules: - `mangamint`: site is down - `whentai`: now requires account with VIP status for original images - `kissmanga`: encrypted image URLs (will be re-added later) ## 0.7.0 - 2017-03-06 - Added `--images` and `--chapters` options - Specifies which images (or chapters) 
to download through a comma-separated list of indices or index-ranges - Example: `--images -2,4,6-8,10-` will select images with indices 1, 2, 4, 6, 7, 8, and 10 up to the last one - Changed the `-g`/`--get-urls` option - The number of times the -g option is given now determines the level up to which URLs are resolved. - See 3bca86618505c21628cd9c7179ce933a78d00ca2 - Changed several option keys: - `directory_fmt` -> `directory` - `filename_fmt` -> `filename` - `download-original` -> `original` - Improved [FoOlSlide](https://foolcode.github.io/FoOlSlide/)-based extractors - Fixed URL extraction for hentai2read - Fixed an issue with deviantart, where the API access token wouldn't get refreshed ## 0.6.4 - 2017-02-13 - Added new extractors: - fallenangels (famatg.com) - Fixed URL and data extraction for: - nhentai - mangamint - twitter - imagetwist - Disabled InsecureConnectionWarning when no certificates are available ## 0.6.3 - 2017-01-25 - Added new extractors: - gomanga - yomanga - mangafox - Fixed deviantart extractor failing - switched to using their API - Fixed an issue with SQLite on Python 3.6 - Automated test builds via Travis CI - Standalone executables for Windows ## 0.6.2 - 2017-01-05 - Added new extractors: - kisscomic - readcomics - yonkouprod - jaiminisbox - Added manga extractor to batoto-module - Added user extractor to seiga-module - Added `-i`/`--input-file` argument to allow local files and stdin as input (like wget) - Added basic support for `file://` URLs - this allows for the recursive extractor to be applied to local files: - `$ gallery-dl r:file://[path to file]` - Added a utility extractor to run unit test URLs - Updated luscious to deal with API changes - Fixed twitter to provide the original image URL - Minor fixes to hentaifoundry - Removed imgclick extractor ## 0.6.1 - 2016-11-30 - Added new extractors: - whentai - readcomiconline - sensescans, worldthree - imgmaid, imagevenue, img4ever, imgspot, imgtrial, pixhost - Added base class for extractors of [FoOlSlide](https://foolcode.github.io/FoOlSlide/)-based sites - Changed default paths for configuration files on Windows - old paths are still supported, but that will change in future versions - Fixed aborting downloads if a single one failed ([#5](https://github.com/mikf/gallery-dl/issues/5)) - Fixed cloudflare-bypass cache containing outdated cookies - Fixed image URLs for hitomi and 8chan - Updated deviantart to always provide the highest quality image - Updated README.rst - Removed doujinmode extractor ## 0.6.0 - 2016-10-08 - Added new extractors: - hentaihere - dokireader - twitter - rapidimg, picmaniac - Added support to find filename extensions by Content-Type response header - Fixed filename/path issues on Windows ([#4](https://github.com/mikf/gallery-dl/issues/4)): - Enable path names with more than 260 characters - Remove trailing spaces in path segments - Updated Job class to automatically set category/subcategory keywords ## 0.5.2 - 2016-09-23 - Added new extractors: - pinterest - rule34 - dynastyscans - imagebam, coreimg, imgcandy, imgtrex - Added login capabilities for batoto - Added `--version` cmdline argument to print the current program version and exit - Added `--list-extractors` cmdline argument to print names of all extractor classes together with descriptions and example URLs - Added proper error messages if an image/user does not exist - Added unittests for every extractor ## 0.5.1 - 2016-08-22 - Added new extractors: - luscious - doujinmode - hentaibox - seiga - imagefap - Changed error
output to use stderr instead of stdout - Fixed broken pipes causing an exception-dump by catching BrokenPipeErrors ## 0.5.0 - 2016-07-25 ## 0.4.1 - 2015-12-03 - New modules (imagetwist, turboimagehost) - Manga-extractors: Download entire manga and not just single chapters - Generic extractor (provisional) - Better and configurable console output - Windows support ## 0.4.0 - 2015-11-26 ## 0.3.3 - 2015-11-10 ## 0.3.2 - 2015-11-04 ## 0.3.1 - 2015-10-30 ## 0.3.0 - 2015-10-05 ## 0.2.0 - 2015-06-28 ## 0.1.0 - 2015-05-27 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1513530019.0 gallery_dl-1.21.1/LICENSE0000644000175000017500000004325413215521243013421 0ustar00mikemike GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. 
GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. 
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. 
Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. 
For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. 
You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1618779451.0 gallery_dl-1.21.1/MANIFEST.in0000644000175000017500000000010614037116473014147 0ustar00mikemikeinclude README.rst CHANGELOG.md LICENSE recursive-include docs *.conf ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1649443808.1847284 gallery_dl-1.21.1/PKG-INFO0000644000175000017500000002540714224101740013506 0ustar00mikemikeMetadata-Version: 2.1 Name: gallery_dl Version: 1.21.1 Summary: Command-line program to download image galleries and collections from several image hosting sites Home-page: https://github.com/mikf/gallery-dl Author: Mike Fährmann Author-email: mike_faehrmann@web.de Maintainer: Mike Fährmann Maintainer-email: mike_faehrmann@web.de License: GPLv2 Download-URL: https://github.com/mikf/gallery-dl/releases/latest Keywords: image gallery downloader crawler scraper Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Console Classifier: Intended Audience :: End Users/Desktop Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2) Classifier: Operating System :: Microsoft :: Windows Classifier: Operating System :: POSIX Classifier: Operating System :: MacOS Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: 3.9 Classifier: Programming Language :: Python :: 3 :: Only Classifier: Topic :: Internet :: WWW/HTTP Classifier: Topic :: Multimedia :: Graphics Classifier: Topic :: Utilities Requires-Python: >=3.4 Provides-Extra: video License-File: LICENSE ========== gallery-dl ========== *gallery-dl* is a command-line program to download image galleries and collections from several image hosting sites (see `Supported Sites`_). It is a cross-platform tool with many configuration options and powerful `filenaming capabilities `_. |pypi| |build| |gitter| .. contents:: Dependencies ============ - Python_ 3.4+ - Requests_ Optional -------- - FFmpeg_: Pixiv Ugoira to WebM conversion - yt-dlp_ or youtube-dl_: Video downloads - PySocks_: SOCKS proxy support Installation ============ Pip --- The stable releases of *gallery-dl* are distributed on PyPI_ and can be easily installed or upgraded using pip_: .. code:: bash python3 -m pip install -U gallery-dl Installing the latest dev version directly from GitHub can be done with pip_ as well: .. code:: bash python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz Note: Windows users should use :code:`py -3` instead of :code:`python3`. 
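For example, on Windows the install command shown above would then typically become (a minimal equivalent sketch, assuming the ``py`` launcher mentioned in the note is available):

.. code:: powershell

    py -3 -m pip install -U gallery-dl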
It is advised to use the latest version of pip_, including the essential packages :code:`setuptools` and :code:`wheel`. To ensure these packages are up-to-date, run .. code:: bash python3 -m pip install --upgrade pip setuptools wheel Standalone Executable --------------------- Prebuilt executable files with a Python interpreter and required Python packages included are available for - `Windows `__ - `Linux `__ | Executables build from the latest commit can be found at | https://github.com/mikf/gallery-dl/actions/workflows/executables.yml Snap ---- Linux users that are using a distro that is supported by Snapd_ can install *gallery-dl* from the Snap Store: .. code:: bash snap install gallery-dl Chocolatey ---------- Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository: .. code:: powershell choco install gallery-dl Scoop ----- *gallery-dl* is also available in the Scoop_ "main" bucket for Windows users: .. code:: powershell scoop install gallery-dl Usage ===== To use *gallery-dl* simply call it with the URLs you wish to download images from: .. code:: bash gallery-dl [OPTION]... URL... See also :code:`gallery-dl --help`. Examples -------- Download images; in this case from danbooru via tag search for 'bonocho': .. code:: bash gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho" Get the direct URL of an image from a site supporting authentication with username & password: .. code:: bash gallery-dl -g -u "" -p "" "https://twitter.com/i/web/status/604341487988576256" Filter manga chapters by language and chapter number: .. code:: bash gallery-dl --chapter-filter "lang == 'fr' and 10 <= chapter < 20" "https://mangadex.org/title/2354/" | Search a remote resource for URLs and download images from them: | (URLs for which no extractor can be found will be silently ignored) .. code:: bash gallery-dl "r:https://pastebin.com/raw/FLwrCYsT" If a site's address is nonstandard for its extractor, you can prefix the URL with the extractor's name to force the use of a specific extractor: .. code:: bash gallery-dl "tumblr:https://sometumblrblog.example" Configuration ============= Configuration files for *gallery-dl* use a JSON-based file format. | For a (more or less) complete example with options set to their default values, see gallery-dl.conf_. | For a configuration file example with more involved settings and options, see gallery-dl-example.conf_. | A list of all available configuration options and their descriptions can be found in configuration.rst_. | *gallery-dl* searches for configuration files in the following places: Windows: * ``%APPDATA%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl.conf`` (``%USERPROFILE%`` usually refers to the user's home directory, i.e. ``C:\Users\\``) Linux, macOS, etc.: * ``/etc/gallery-dl.conf`` * ``${XDG_CONFIG_HOME}/gallery-dl/config.json`` * ``${HOME}/.config/gallery-dl/config.json`` * ``${HOME}/.gallery-dl.conf`` Values in later configuration files will override previous ones. Command line options will override all related settings in the configuration file(s), e.g. using ``--write-metadata`` will enable writing metadata using the default values for all ``postprocessors.metadata.*`` settings, overriding any specific settings in configuration files. Authentication ============== Username & Password ------------------- Some extractors require you to provide valid login credentials in the form of a username & password pair. 
This is necessary for ``nijie`` and optional for ``aryion``, ``danbooru``, ``e621``, ``exhentai``, ``idolcomplex``, ``imgbb``, ``inkbunny``, ``instagram``, ``mangadex``, ``mangoxo``, ``pillowfort``, ``sankaku``, ``subscribestar``, ``tapas``, ``tsumino``, and ``twitter``. You can set the necessary information in your configuration file (cf. gallery-dl.conf_) .. code:: json { "extractor": { "twitter": { "username": "", "password": "" } } } or you can provide them directly via the :code:`-u/--username` and :code:`-p/--password` or via the :code:`-o/--option` command-line options .. code:: bash gallery-dl -u -p URL gallery-dl -o username= -o password= URL Cookies ------- For sites where login with username & password is not possible due to CAPTCHA or similar, or has not been implemented yet, you can use the cookies from a browser login session and input them into *gallery-dl*. This can be done via the `cookies `__ option in your configuration file by specifying - | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon | (e.g. `Get cookies.txt `__ for Chrome, `Export Cookies `__ for Firefox) - | a list of name-value pairs gathered from your browser's web developer tools | (in `Chrome `__, in `Firefox `__) For example: .. code:: json { "extractor": { "instagram": { "cookies": "$HOME/path/to/cookies.txt" }, "patreon": { "cookies": { "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a" } } } } You can also specify a cookies.txt file with the :code:`--cookies` command-line option: .. code:: bash gallery-dl --cookies "$HOME/path/to/cookies.txt" URL OAuth ----- *gallery-dl* supports user authentication via OAuth_ for ``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``, and ``mastodon`` instances. This is mostly optional, but grants *gallery-dl* the ability to issue requests on your account's behalf and enables it to access resources which would otherwise be unavailable to a public user. To link your account to *gallery-dl*, start by invoking it with ``oauth:`` as an argument. For example: .. code:: bash gallery-dl oauth:flickr You will be sent to the site's authorization page and asked to grant read access to *gallery-dl*. Authorize it and you will be shown one or more "tokens", which should be added to your configuration file. To authenticate with a ``mastodon`` instance, run *gallery-dl* with ``oauth:mastodon:`` as argument. For example: .. code:: bash gallery-dl oauth:mastodon:pawoo.net gallery-dl oauth:mastodon:https://mastodon.social/ .. _gallery-dl.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl.conf .. _gallery-dl-example.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl-example.conf .. _configuration.rst: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst .. _Supported Sites: https://github.com/mikf/gallery-dl/blob/master/docs/supportedsites.md .. _Formatting: https://github.com/mikf/gallery-dl/blob/master/docs/formatting.md .. _Python: https://www.python.org/downloads/ .. _PyPI: https://pypi.org/ .. _pip: https://pip.pypa.io/en/stable/ .. _Requests: https://requests.readthedocs.io/en/master/ .. _FFmpeg: https://www.ffmpeg.org/ .. _yt-dlp: https://github.com/yt-dlp/yt-dlp .. _youtube-dl: https://ytdl-org.github.io/youtube-dl/ .. _PySocks: https://pypi.org/project/PySocks/ .. _pyOpenSSL: https://pyopenssl.org/ .. _Snapd: https://docs.snapcraft.io/installing-snapd .. _OAuth: https://en.wikipedia.org/wiki/OAuth .. _Chocolatey: https://chocolatey.org/install .. _Scoop: https://scoop.sh .. 
|pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg :target: https://pypi.org/project/gallery-dl/ .. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg :target: https://github.com/mikf/gallery-dl/actions .. |gitter| image:: https://badges.gitter.im/gallery-dl/main.svg :target: https://gitter.im/gallery-dl/main ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1649443807.0 gallery_dl-1.21.1/README.rst0000644000175000017500000002265114224101737014104 0ustar00mikemike========== gallery-dl ========== *gallery-dl* is a command-line program to download image galleries and collections from several image hosting sites (see `Supported Sites`_). It is a cross-platform tool with many configuration options and powerful `filenaming capabilities `_. |pypi| |build| |gitter| .. contents:: Dependencies ============ - Python_ 3.4+ - Requests_ Optional -------- - FFmpeg_: Pixiv Ugoira to WebM conversion - yt-dlp_ or youtube-dl_: Video downloads - PySocks_: SOCKS proxy support Installation ============ Pip --- The stable releases of *gallery-dl* are distributed on PyPI_ and can be easily installed or upgraded using pip_: .. code:: bash python3 -m pip install -U gallery-dl Installing the latest dev version directly from GitHub can be done with pip_ as well: .. code:: bash python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz Note: Windows users should use :code:`py -3` instead of :code:`python3`. It is advised to use the latest version of pip_, including the essential packages :code:`setuptools` and :code:`wheel`. To ensure these packages are up-to-date, run .. code:: bash python3 -m pip install --upgrade pip setuptools wheel Standalone Executable --------------------- Prebuilt executable files with a Python interpreter and required Python packages included are available for - `Windows `__ - `Linux `__ | Executables build from the latest commit can be found at | https://github.com/mikf/gallery-dl/actions/workflows/executables.yml Snap ---- Linux users that are using a distro that is supported by Snapd_ can install *gallery-dl* from the Snap Store: .. code:: bash snap install gallery-dl Chocolatey ---------- Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository: .. code:: powershell choco install gallery-dl Scoop ----- *gallery-dl* is also available in the Scoop_ "main" bucket for Windows users: .. code:: powershell scoop install gallery-dl Usage ===== To use *gallery-dl* simply call it with the URLs you wish to download images from: .. code:: bash gallery-dl [OPTION]... URL... See also :code:`gallery-dl --help`. Examples -------- Download images; in this case from danbooru via tag search for 'bonocho': .. code:: bash gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho" Get the direct URL of an image from a site supporting authentication with username & password: .. code:: bash gallery-dl -g -u "" -p "" "https://twitter.com/i/web/status/604341487988576256" Filter manga chapters by language and chapter number: .. code:: bash gallery-dl --chapter-filter "lang == 'fr' and 10 <= chapter < 20" "https://mangadex.org/title/2354/" | Search a remote resource for URLs and download images from them: | (URLs for which no extractor can be found will be silently ignored) .. 
code:: bash gallery-dl "r:https://pastebin.com/raw/FLwrCYsT" If a site's address is nonstandard for its extractor, you can prefix the URL with the extractor's name to force the use of a specific extractor: .. code:: bash gallery-dl "tumblr:https://sometumblrblog.example" Configuration ============= Configuration files for *gallery-dl* use a JSON-based file format. | For a (more or less) complete example with options set to their default values, see gallery-dl.conf_. | For a configuration file example with more involved settings and options, see gallery-dl-example.conf_. | A list of all available configuration options and their descriptions can be found in configuration.rst_. | *gallery-dl* searches for configuration files in the following places: Windows: * ``%APPDATA%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl.conf`` (``%USERPROFILE%`` usually refers to the user's home directory, i.e. ``C:\Users\\``) Linux, macOS, etc.: * ``/etc/gallery-dl.conf`` * ``${XDG_CONFIG_HOME}/gallery-dl/config.json`` * ``${HOME}/.config/gallery-dl/config.json`` * ``${HOME}/.gallery-dl.conf`` Values in later configuration files will override previous ones. Command line options will override all related settings in the configuration file(s), e.g. using ``--write-metadata`` will enable writing metadata using the default values for all ``postprocessors.metadata.*`` settings, overriding any specific settings in configuration files. Authentication ============== Username & Password ------------------- Some extractors require you to provide valid login credentials in the form of a username & password pair. This is necessary for ``nijie`` and optional for ``aryion``, ``danbooru``, ``e621``, ``exhentai``, ``idolcomplex``, ``imgbb``, ``inkbunny``, ``instagram``, ``mangadex``, ``mangoxo``, ``pillowfort``, ``sankaku``, ``subscribestar``, ``tapas``, ``tsumino``, and ``twitter``. You can set the necessary information in your configuration file (cf. gallery-dl.conf_) .. code:: json { "extractor": { "twitter": { "username": "", "password": "" } } } or you can provide them directly via the :code:`-u/--username` and :code:`-p/--password` or via the :code:`-o/--option` command-line options .. code:: bash gallery-dl -u -p URL gallery-dl -o username= -o password= URL Cookies ------- For sites where login with username & password is not possible due to CAPTCHA or similar, or has not been implemented yet, you can use the cookies from a browser login session and input them into *gallery-dl*. This can be done via the `cookies `__ option in your configuration file by specifying - | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon | (e.g. `Get cookies.txt `__ for Chrome, `Export Cookies `__ for Firefox) - | a list of name-value pairs gathered from your browser's web developer tools | (in `Chrome `__, in `Firefox `__) For example: .. code:: json { "extractor": { "instagram": { "cookies": "$HOME/path/to/cookies.txt" }, "patreon": { "cookies": { "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a" } } } } You can also specify a cookies.txt file with the :code:`--cookies` command-line option: .. code:: bash gallery-dl --cookies "$HOME/path/to/cookies.txt" URL OAuth ----- *gallery-dl* supports user authentication via OAuth_ for ``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``, and ``mastodon`` instances. 
This is mostly optional, but grants *gallery-dl* the ability to issue requests on your account's behalf and enables it to access resources which would otherwise be unavailable to a public user. To link your account to *gallery-dl*, start by invoking it with ``oauth:`` as an argument. For example: .. code:: bash gallery-dl oauth:flickr You will be sent to the site's authorization page and asked to grant read access to *gallery-dl*. Authorize it and you will be shown one or more "tokens", which should be added to your configuration file. To authenticate with a ``mastodon`` instance, run *gallery-dl* with ``oauth:mastodon:`` as argument. For example: .. code:: bash gallery-dl oauth:mastodon:pawoo.net gallery-dl oauth:mastodon:https://mastodon.social/ .. _gallery-dl.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl.conf .. _gallery-dl-example.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl-example.conf .. _configuration.rst: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst .. _Supported Sites: https://github.com/mikf/gallery-dl/blob/master/docs/supportedsites.md .. _Formatting: https://github.com/mikf/gallery-dl/blob/master/docs/formatting.md .. _Python: https://www.python.org/downloads/ .. _PyPI: https://pypi.org/ .. _pip: https://pip.pypa.io/en/stable/ .. _Requests: https://requests.readthedocs.io/en/master/ .. _FFmpeg: https://www.ffmpeg.org/ .. _yt-dlp: https://github.com/yt-dlp/yt-dlp .. _youtube-dl: https://ytdl-org.github.io/youtube-dl/ .. _PySocks: https://pypi.org/project/PySocks/ .. _pyOpenSSL: https://pyopenssl.org/ .. _Snapd: https://docs.snapcraft.io/installing-snapd .. _OAuth: https://en.wikipedia.org/wiki/OAuth .. _Chocolatey: https://chocolatey.org/install .. _Scoop: https://scoop.sh .. |pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg :target: https://pypi.org/project/gallery-dl/ .. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg :target: https://github.com/mikf/gallery-dl/actions .. |gitter| image:: https://badges.gitter.im/gallery-dl/main.svg :target: https://gitter.im/gallery-dl/main ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1649443808.1580613 gallery_dl-1.21.1/data/0000755000175000017500000000000014224101740013312 5ustar00mikemike././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1649443808.1613946 gallery_dl-1.21.1/data/completion/0000755000175000017500000000000014224101740015463 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648825168.0 gallery_dl-1.21.1/data/completion/_gallery-dl0000644000175000017500000001211714221611520017602 0ustar00mikemike#compdef gallery-dl local curcontext="$curcontext" typeset -A opt_args local rc=1 _arguments -C -S \ {-h,--help}'[Print this help message and exit]' \ --version'[Print program version and exit]' \ {-i,--input-file}'[Download URLs found in FILE ("-" for stdin). More than one --input-file can be specified]':'':_files \ {-d,--destination}'[Target location for file downloads]':'' \ {-D,--directory}'[Exact location for file downloads]':'' \ {-f,--filename}'[Filename format string for downloaded files ("/O" for "original" filenames)]':'' \ --cookies'[File to load additional cookies from]':'':_files \ --proxy'[Use the specified proxy]':'' \ --source-address'[Client-side IP address to bind to]':'' \ --clear-cache'[Delete cached login sessions, cookies, etc. 
for MODULE (ALL to delete everything)]':'' \ {-q,--quiet}'[Activate quiet mode]' \ {-v,--verbose}'[Print various debugging information]' \ {-g,--get-urls}'[Print URLs instead of downloading]' \ {-G,--resolve-urls}'[Print URLs instead of downloading; resolve intermediary URLs]' \ {-j,--dump-json}'[Print JSON information]' \ {-s,--simulate}'[Simulate data extraction; do not download anything]' \ {-E,--extractor-info}'[Print extractor defaults and settings]' \ {-K,--list-keywords}'[Print a list of available keywords and example values for the given URLs]' \ --list-modules'[Print a list of available extractor modules]' \ --list-extractors'[Print a list of extractor classes with description, (sub)category and example URL]' \ --write-log'[Write logging output to FILE]':'':_files \ --write-unsupported'[Write URLs, which get emitted by other extractors but cannot be handled, to FILE]':'':_files \ --write-pages'[Write downloaded intermediary pages to files in the current directory to debug problems]' \ {-r,--limit-rate}'[Maximum download rate (e.g. 500k or 2.5M)]':'' \ {-R,--retries}'[Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)]':'' \ --http-timeout'[Timeout for HTTP connections (default: 30.0)]':'' \ --sleep'[Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5)]':'' \ --sleep-request'[Number of seconds to wait between HTTP requests during data extraction]':'' \ --sleep-extractor'[Number of seconds to wait before starting data extraction for an input URL]':'' \ --filesize-min'[Do not download files smaller than SIZE (e.g. 500k or 2.5M)]':'' \ --filesize-max'[Do not download files larger than SIZE (e.g. 500k or 2.5M)]':'' \ --no-part'[Do not use .part files]' \ --no-skip'[Do not skip downloads; overwrite existing files]' \ --no-mtime'[Do not set file modification times according to Last-Modified HTTP response headers]' \ --no-download'[Do not download any files]' \ --no-check-certificate'[Disable HTTPS certificate validation]' \ {-c,--config}'[Additional configuration files]':'':_files \ --config-yaml'[==SUPPRESS==]':'':_files \ {-o,--option}'[Additional "=" option values]':'' \ --ignore-config'[Do not read the default configuration files]' \ {-u,--username}'[Username to login with]':'' \ {-p,--password}'[Password belonging to the given username]':'' \ --netrc'[Enable .netrc authentication data]' \ --download-archive'[Record all downloaded files in the archive file and skip downloading any file already in it]':'':_files \ {-A,--abort}'[Stop current extractor run after N consecutive file downloads were skipped]':'' \ {-T,--terminate}'[Stop current and parent extractor runs after N consecutive file downloads were skipped]':'' \ --range'[Index-range(s) specifying which images to download. For example "5-10" or "1,3-5,10-"]':'' \ --chapter-range'[Like "--range", but applies to manga-chapters and other delegated URLs]':'' \ --filter'[Python expression controlling which images to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by "-K". 
Example: --filter "image_width >= 1000 and rating in ("s", "q")"]':'' \ --chapter-filter'[Like "--filter", but applies to manga-chapters and other delegated URLs]':'' \ --zip'[Store downloaded files in a ZIP archive]' \ --ugoira-conv'[Convert Pixiv Ugoira to WebM (requires FFmpeg)]' \ --ugoira-conv-lossless'[Convert Pixiv Ugoira to WebM in VP9 lossless mode]' \ --ugoira-conv-copy'[Convert Pixiv Ugoira to MKV without re-encoding any frames]' \ --write-metadata'[Write metadata to separate JSON files]' \ --write-info-json'[Write gallery metadata to a info.json file]' \ --write-infojson'[==SUPPRESS==]' \ --write-tags'[Write image tags to separate text files]' \ --mtime-from-date'[Set file modification times according to "date" metadata]' \ --exec'[Execute CMD for each downloaded file. Example: --exec "convert {} {}.png && rm {}"]':'' \ --exec-after'[Execute CMD after all files were downloaded successfully. Example: --exec-after "cd {} && convert * ../doc.pdf"]':'' \ {-P,--postprocessor}'[Activate the specified post processor]':'' && rc=0 return rc ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648825167.0 gallery_dl-1.21.1/data/completion/gallery-dl0000644000175000017500000000245714221611517017457 0ustar00mikemike_gallery_dl() { local cur prev COMPREPLY=() cur="${COMP_WORDS[COMP_CWORD]}" prev="${COMP_WORDS[COMP_CWORD-1]}" if [[ "${prev}" =~ ^(-i|--input-file|--cookies|--write-log|--write-unsupported|-c|--config|--config-yaml|--download-archive)$ ]]; then COMPREPLY=( $(compgen -f -- "${cur}") ) elif [[ "${prev}" =~ ^()$ ]]; then COMPREPLY=( $(compgen -d -- "${cur}") ) else COMPREPLY=( $(compgen -W "--help --version --input-file --destination --directory --filename --cookies --proxy --source-address --clear-cache --quiet --verbose --get-urls --resolve-urls --dump-json --simulate --extractor-info --list-keywords --list-modules --list-extractors --write-log --write-unsupported --write-pages --limit-rate --retries --http-timeout --sleep --sleep-request --sleep-extractor --filesize-min --filesize-max --no-part --no-skip --no-mtime --no-download --no-check-certificate --config --config-yaml --option --ignore-config --username --password --netrc --download-archive --abort --terminate --range --chapter-range --filter --chapter-filter --zip --ugoira-conv --ugoira-conv-lossless --ugoira-conv-copy --write-metadata --write-info-json --write-infojson --write-tags --mtime-from-date --exec --exec-after --postprocessor" -- "${cur}") ) fi } complete -F _gallery_dl gallery-dl ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648825168.0 gallery_dl-1.21.1/data/completion/gallery-dl.fish0000644000175000017500000001471214221611520020376 0ustar00mikemikecomplete -c gallery-dl -x complete -c gallery-dl -s 'h' -l 'help' -d 'Print this help message and exit' complete -c gallery-dl -l 'version' -d 'Print program version and exit' complete -c gallery-dl -r -F -s 'i' -l 'input-file' -d 'Download URLs found in FILE ("-" for stdin). 
More than one --input-file can be specified' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'd' -l 'destination' -d 'Target location for file downloads' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'D' -l 'directory' -d 'Exact location for file downloads' complete -c gallery-dl -x -s 'f' -l 'filename' -d 'Filename format string for downloaded files ("/O" for "original" filenames)' complete -c gallery-dl -r -F -l 'cookies' -d 'File to load additional cookies from' complete -c gallery-dl -x -l 'proxy' -d 'Use the specified proxy' complete -c gallery-dl -x -l 'source-address' -d 'Client-side IP address to bind to' complete -c gallery-dl -x -l 'clear-cache' -d 'Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything)' complete -c gallery-dl -s 'q' -l 'quiet' -d 'Activate quiet mode' complete -c gallery-dl -s 'v' -l 'verbose' -d 'Print various debugging information' complete -c gallery-dl -s 'g' -l 'get-urls' -d 'Print URLs instead of downloading' complete -c gallery-dl -s 'G' -l 'resolve-urls' -d 'Print URLs instead of downloading; resolve intermediary URLs' complete -c gallery-dl -s 'j' -l 'dump-json' -d 'Print JSON information' complete -c gallery-dl -s 's' -l 'simulate' -d 'Simulate data extraction; do not download anything' complete -c gallery-dl -s 'E' -l 'extractor-info' -d 'Print extractor defaults and settings' complete -c gallery-dl -s 'K' -l 'list-keywords' -d 'Print a list of available keywords and example values for the given URLs' complete -c gallery-dl -l 'list-modules' -d 'Print a list of available extractor modules' complete -c gallery-dl -l 'list-extractors' -d 'Print a list of extractor classes with description, (sub)category and example URL' complete -c gallery-dl -r -F -l 'write-log' -d 'Write logging output to FILE' complete -c gallery-dl -r -F -l 'write-unsupported' -d 'Write URLs, which get emitted by other extractors but cannot be handled, to FILE' complete -c gallery-dl -l 'write-pages' -d 'Write downloaded intermediary pages to files in the current directory to debug problems' complete -c gallery-dl -x -s 'r' -l 'limit-rate' -d 'Maximum download rate (e.g. 500k or 2.5M)' complete -c gallery-dl -x -s 'R' -l 'retries' -d 'Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)' complete -c gallery-dl -x -l 'http-timeout' -d 'Timeout for HTTP connections (default: 30.0)' complete -c gallery-dl -x -l 'sleep' -d 'Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5)' complete -c gallery-dl -x -l 'sleep-request' -d 'Number of seconds to wait between HTTP requests during data extraction' complete -c gallery-dl -x -l 'sleep-extractor' -d 'Number of seconds to wait before starting data extraction for an input URL' complete -c gallery-dl -x -l 'filesize-min' -d 'Do not download files smaller than SIZE (e.g. 500k or 2.5M)' complete -c gallery-dl -x -l 'filesize-max' -d 'Do not download files larger than SIZE (e.g. 
500k or 2.5M)' complete -c gallery-dl -l 'no-part' -d 'Do not use .part files' complete -c gallery-dl -l 'no-skip' -d 'Do not skip downloads; overwrite existing files' complete -c gallery-dl -l 'no-mtime' -d 'Do not set file modification times according to Last-Modified HTTP response headers' complete -c gallery-dl -l 'no-download' -d 'Do not download any files' complete -c gallery-dl -l 'no-check-certificate' -d 'Disable HTTPS certificate validation' complete -c gallery-dl -r -F -s 'c' -l 'config' -d 'Additional configuration files' complete -c gallery-dl -r -F -l 'config-yaml' -d '==SUPPRESS==' complete -c gallery-dl -x -s 'o' -l 'option' -d 'Additional "=" option values' complete -c gallery-dl -l 'ignore-config' -d 'Do not read the default configuration files' complete -c gallery-dl -x -s 'u' -l 'username' -d 'Username to login with' complete -c gallery-dl -x -s 'p' -l 'password' -d 'Password belonging to the given username' complete -c gallery-dl -l 'netrc' -d 'Enable .netrc authentication data' complete -c gallery-dl -r -F -l 'download-archive' -d 'Record all downloaded files in the archive file and skip downloading any file already in it' complete -c gallery-dl -x -s 'A' -l 'abort' -d 'Stop current extractor run after N consecutive file downloads were skipped' complete -c gallery-dl -x -s 'T' -l 'terminate' -d 'Stop current and parent extractor runs after N consecutive file downloads were skipped' complete -c gallery-dl -x -l 'range' -d 'Index-range(s) specifying which images to download. For example "5-10" or "1,3-5,10-"' complete -c gallery-dl -x -l 'chapter-range' -d 'Like "--range", but applies to manga-chapters and other delegated URLs' complete -c gallery-dl -x -l 'filter' -d 'Python expression controlling which images to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by "-K". Example: --filter "image_width >= 1000 and rating in ("s", "q")"' complete -c gallery-dl -x -l 'chapter-filter' -d 'Like "--filter", but applies to manga-chapters and other delegated URLs' complete -c gallery-dl -l 'zip' -d 'Store downloaded files in a ZIP archive' complete -c gallery-dl -l 'ugoira-conv' -d 'Convert Pixiv Ugoira to WebM (requires FFmpeg)' complete -c gallery-dl -l 'ugoira-conv-lossless' -d 'Convert Pixiv Ugoira to WebM in VP9 lossless mode' complete -c gallery-dl -l 'ugoira-conv-copy' -d 'Convert Pixiv Ugoira to MKV without re-encoding any frames' complete -c gallery-dl -l 'write-metadata' -d 'Write metadata to separate JSON files' complete -c gallery-dl -l 'write-info-json' -d 'Write gallery metadata to a info.json file' complete -c gallery-dl -l 'write-infojson' -d '==SUPPRESS==' complete -c gallery-dl -l 'write-tags' -d 'Write image tags to separate text files' complete -c gallery-dl -l 'mtime-from-date' -d 'Set file modification times according to "date" metadata' complete -c gallery-dl -x -l 'exec' -d 'Execute CMD for each downloaded file. Example: --exec "convert {} {}.png && rm {}"' complete -c gallery-dl -x -l 'exec-after' -d 'Execute CMD after all files were downloaded successfully. 
Example: --exec-after "cd {} && convert * ../doc.pdf"' complete -c gallery-dl -x -s 'P' -l 'postprocessor' -d 'Activate the specified post processor' ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1649443808.1613946 gallery_dl-1.21.1/data/man/0000755000175000017500000000000014224101740014065 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1649443807.0 gallery_dl-1.21.1/data/man/gallery-dl.10000644000175000017500000001563714224101737016225 0ustar00mikemike.TH "GALLERY-DL" "1" "2022-04-08" "1.21.1" "gallery-dl Manual" .\" disable hyphenation .nh .SH NAME gallery-dl \- download image-galleries and -collections .SH SYNOPSIS .B gallery-dl [OPTION]... URL... .SH DESCRIPTION .B gallery-dl is a command-line program to download image-galleries and -collections from several image hosting sites. It is a cross-platform tool with many configuration options and powerful filenaming capabilities. .SH OPTIONS .TP .B "\-h, \-\-help" Print this help message and exit .TP .B "\-\-version" Print program version and exit .TP .B "\-i, \-\-input\-file" \f[I]FILE\f[] Download URLs found in FILE ('-' for stdin). More than one --input-file can be specified .TP .B "\-d, \-\-destination" \f[I]PATH\f[] Target location for file downloads .TP .B "\-D, \-\-directory" \f[I]PATH\f[] Exact location for file downloads .TP .B "\-f, \-\-filename" \f[I]FORMAT\f[] Filename format string for downloaded files ('/O' for "original" filenames) .TP .B "\-\-cookies" \f[I]FILE\f[] File to load additional cookies from .TP .B "\-\-proxy" \f[I]URL\f[] Use the specified proxy .TP .B "\-\-source\-address" \f[I]IP\f[] Client-side IP address to bind to .TP .B "\-\-clear\-cache" \f[I]MODULE\f[] Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything) .TP .B "\-q, \-\-quiet" Activate quiet mode .TP .B "\-v, \-\-verbose" Print various debugging information .TP .B "\-g, \-\-get\-urls" Print URLs instead of downloading .TP .B "\-G, \-\-resolve\-urls" Print URLs instead of downloading; resolve intermediary URLs .TP .B "\-j, \-\-dump\-json" Print JSON information .TP .B "\-s, \-\-simulate" Simulate data extraction; do not download anything .TP .B "\-E, \-\-extractor\-info" Print extractor defaults and settings .TP .B "\-K, \-\-list\-keywords" Print a list of available keywords and example values for the given URLs .TP .B "\-\-list\-modules" Print a list of available extractor modules .TP .B "\-\-list\-extractors" Print a list of extractor classes with description, (sub)category and example URL .TP .B "\-\-write\-log" \f[I]FILE\f[] Write logging output to FILE .TP .B "\-\-write\-unsupported" \f[I]FILE\f[] Write URLs, which get emitted by other extractors but cannot be handled, to FILE .TP .B "\-\-write\-pages" Write downloaded intermediary pages to files in the current directory to debug problems .TP .B "\-r, \-\-limit\-rate" \f[I]RATE\f[] Maximum download rate (e.g. 500k or 2.5M) .TP .B "\-R, \-\-retries" \f[I]N\f[] Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4) .TP .B "\-\-http\-timeout" \f[I]SECONDS\f[] Timeout for HTTP connections (default: 30.0) .TP .B "\-\-sleep" \f[I]SECONDS\f[] Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 
2.7 or 2.0-3.5) .TP .B "\-\-sleep\-request" \f[I]SECONDS\f[] Number of seconds to wait between HTTP requests during data extraction .TP .B "\-\-sleep\-extractor" \f[I]SECONDS\f[] Number of seconds to wait before starting data extraction for an input URL .TP .B "\-\-filesize\-min" \f[I]SIZE\f[] Do not download files smaller than SIZE (e.g. 500k or 2.5M) .TP .B "\-\-filesize\-max" \f[I]SIZE\f[] Do not download files larger than SIZE (e.g. 500k or 2.5M) .TP .B "\-\-no\-part" Do not use .part files .TP .B "\-\-no\-skip" Do not skip downloads; overwrite existing files .TP .B "\-\-no\-mtime" Do not set file modification times according to Last-Modified HTTP response headers .TP .B "\-\-no\-download" Do not download any files .TP .B "\-\-no\-check\-certificate" Disable HTTPS certificate validation .TP .B "\-c, \-\-config" \f[I]FILE\f[] Additional configuration files .TP .B "\-o, \-\-option" \f[I]OPT\f[] Additional '=' option values .TP .B "\-\-ignore\-config" Do not read the default configuration files .TP .B "\-u, \-\-username" \f[I]USER\f[] Username to login with .TP .B "\-p, \-\-password" \f[I]PASS\f[] Password belonging to the given username .TP .B "\-\-netrc" Enable .netrc authentication data .TP .B "\-\-download\-archive" \f[I]FILE\f[] Record all downloaded files in the archive file and skip downloading any file already in it .TP .B "\-A, \-\-abort" \f[I]N\f[] Stop current extractor run after N consecutive file downloads were skipped .TP .B "\-T, \-\-terminate" \f[I]N\f[] Stop current and parent extractor runs after N consecutive file downloads were skipped .TP .B "\-\-range" \f[I]RANGE\f[] Index-range(s) specifying which images to download. For example '5-10' or '1,3-5,10-' .TP .B "\-\-chapter\-range" \f[I]RANGE\f[] Like '--range', but applies to manga-chapters and other delegated URLs .TP .B "\-\-filter" \f[I]EXPR\f[] Python expression controlling which images to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by '-K'. Example: --filter "image_width >= 1000 and rating in ('s', 'q')" .TP .B "\-\-chapter\-filter" \f[I]EXPR\f[] Like '--filter', but applies to manga-chapters and other delegated URLs .TP .B "\-\-zip" Store downloaded files in a ZIP archive .TP .B "\-\-ugoira\-conv" Convert Pixiv Ugoira to WebM (requires FFmpeg) .TP .B "\-\-ugoira\-conv\-lossless" Convert Pixiv Ugoira to WebM in VP9 lossless mode .TP .B "\-\-ugoira\-conv\-copy" Convert Pixiv Ugoira to MKV without re-encoding any frames .TP .B "\-\-write\-metadata" Write metadata to separate JSON files .TP .B "\-\-write\-info\-json" Write gallery metadata to a info.json file .TP .B "\-\-write\-tags" Write image tags to separate text files .TP .B "\-\-mtime\-from\-date" Set file modification times according to 'date' metadata .TP .B "\-\-exec" \f[I]CMD\f[] Execute CMD for each downloaded file. Example: --exec 'convert {} {}.png && rm {}' .TP .B "\-\-exec\-after" \f[I]CMD\f[] Execute CMD after all files were downloaded successfully. Example: --exec-after 'cd {} && convert * ../doc.pdf' .TP .B "\-P, \-\-postprocessor" \f[I]NAME\f[] Activate the specified post processor .SH EXAMPLES .TP gallery-dl \f[I]URL\f[] Download images from \f[I]URL\f[]. .TP gallery-dl -g -u -p \f[I]URL\f[] Print direct URLs from a site that requires authentication. .TP gallery-dl --filter 'type == "ugoira"' --range '2-4' \f[I]URL\f[] Apply filter and range expressions. This will only download the second, third, and fourth file where its type value is equal to "ugoira". 
.TP gallery-dl r:\f[I]URL\f[] Scan \f[I]URL\f[] for other URLs and invoke \f[B]gallery-dl\f[] on them. .TP gallery-dl oauth:\f[I]SITE\-NAME\f[] Gain OAuth authentication tokens for .IR deviantart , .IR flickr , .IR reddit , .IR smugmug ", and" .IR tumblr . .SH FILES .TP .I /etc/gallery-dl.conf The system wide configuration file. .TP .I ~/.config/gallery-dl/config.json Per user configuration file. .TP .I ~/.gallery-dl.conf Alternate per user configuration file. .SH BUGS https://github.com/mikf/gallery-dl/issues .SH AUTHORS Mike Fährmann .br and https://github.com/mikf/gallery-dl/graphs/contributors .SH "SEE ALSO" .BR gallery-dl.conf (5) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1649443807.0 gallery_dl-1.21.1/data/man/gallery-dl.conf.50000644000175000017500000024706714224101737017161 0ustar00mikemike.TH "GALLERY-DL.CONF" "5" "2022-04-08" "1.21.1" "gallery-dl Manual" .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .SH NAME gallery-dl.conf \- gallery-dl configuration file .SH DESCRIPTION gallery-dl will search for configuration files in the following places every time it is started, unless .B --ignore-config is specified: .PP .RS 4 .nf .I /etc/gallery-dl.conf .I $HOME/.config/gallery-dl/config.json .I $HOME/.gallery-dl.conf .fi .RE .PP It is also possible to specify additional configuration files with the .B -c/--config command-line option or to add further option values with .B -o/--option as = pairs, Configuration files are JSON-based and therefore don't allow any ordinary comments, but, since unused keys are simply ignored, it is possible to utilize those as makeshift comments by settings their values to arbitrary strings. .SH EXAMPLE { .RS 4 "base-directory": "/tmp/", .br "extractor": { .RS 4 "pixiv": { .RS 4 "directory": ["Pixiv", "Works", "{user[id]}"], .br "filename": "{id}{num}.{extension}", .br "username": "foo", .br "password": "bar" .RE }, .br "flickr": { .RS 4 "_comment": "OAuth keys for account 'foobar'", .br "access-token": "0123456789-0123456789abcdef", .br "access-token-secret": "fedcba9876543210" .RE } .RE }, .br "downloader": { .RS 4 "retries": 3, .br "timeout": 2.5 .RE } .RE } .SH EXTRACTOR OPTIONS .SS extractor.*.filename .IP "Type:" 6 \f[I]string\f[] or \f[I]object\f[] .IP "Example:" 4 .br * .. code:: json "{manga}_c{chapter}_{page:>03}.{extension}" .br * .. code:: json { "extension == 'mp4'": "{id}_video.{extension}", "'nature' in title" : "{id}_{title}.{extension}", "" : "{id}_default.{extension}" } .IP "Description:" 4 A \f[I]format string\f[] to build filenames for downloaded files with. If this is an \f[I]object\f[], it must contain Python expressions mapping to the filename format strings to use. These expressions are evaluated in the order as specified in Python 3.6+ and in an undetermined order in Python 3.4 and 3.5. The available replacement keys depend on the extractor used. A list of keys for a specific one can be acquired by calling *gallery-dl* with the \f[I]-K\f[]/\f[I]--list-keywords\f[] command-line option. For example: .. code:: $ gallery-dl -K http://seiga.nicovideo.jp/seiga/im5977527 Keywords for directory names: category seiga subcategory image Keywords for filenames: category seiga extension None image-id 5977527 subcategory image Note: Even if the value of the \f[I]extension\f[] key is missing or \f[I]None\f[], it will be filled in later when the file download is starting. This key is therefore always available to provide a valid filename extension. 
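For illustration, a minimal configuration fragment might set a global filename format and override it for a single category (the \f[I]danbooru\f[] entry and its keys are merely examples; available keys depend on the extractor): .. code:: json { "extractor": { "filename": "{category}_{id}.{extension}", "danbooru": { "filename": "{id}_{md5}.{extension}" } } }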
.SS extractor.*.directory .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] or \f[I]object\f[] .IP "Example:" 4 .br * .. code:: json ["{category}", "{manga}", "c{chapter} - {title}"] .br * .. code:: json { "'nature' in content": ["Nature Pictures"], "retweet_id != 0" : ["{category}", "{user[name]}", "Retweets"], "" : ["{category}", "{user[name]}"] } .IP "Description:" 4 A list of \f[I]format strings\f[] to build target directory paths with. If this is an \f[I]object\f[], it must contain Python expressions mapping to the list of format strings to use. Each individual string in such a list represents a single path segment, which will be joined together and appended to the \f[I]base-directory\f[] to form the complete target directory path. .SS extractor.*.base-directory .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"./gallery-dl/"\f[] .IP "Description:" 4 Directory path used as base for all download destinations. .SS extractor.*.parent-directory .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Use an extractor's current target directory as \f[I]base-directory\f[] for any spawned child extractors. .SS extractor.*.parent-metadata .IP "Type:" 6 \f[I]bool\f[] or \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 If \f[I]true\f[], overwrite any metadata provided by a child extractor with its parent's. If this is a \f[I]string\f[], add a parent's metadata to its children's .br to a field named after said string. For example with \f[I]"parent-metadata": "_p_"\f[]: .br .. code:: json { "id": "child-id", "_p_": {"id": "parent-id"} } .SS extractor.*.parent-skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Share number of skipped downloads between parent and child extractors. .SS extractor.*.path-restrict .IP "Type:" 6 \f[I]string\f[] or \f[I]object\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Example:" 4 .br * "/!? (){}" .br * {" ": "_", "/": "-", "|": "-", ":": "-", "*": "+"} .IP "Description:" 4 A string of characters to be replaced with the value of .br \f[I]path-replace\f[] or an object mapping invalid/unwanted characters to their replacements .br for generated path segment names. .br Special values: .br * \f[I]"auto"\f[]: Use characters from \f[I]"unix"\f[] or \f[I]"windows"\f[] depending on the local operating system .br * \f[I]"unix"\f[]: \f[I]"/"\f[] .br * \f[I]"windows"\f[]: \f[I]"\\\\\\\\|/<>:\\"?*"\f[] .br * \f[I]"ascii"\f[]: \f[I]"^0-9A-Za-z_."\f[] Note: In a string with 2 or more characters, \f[I][]^-\\\f[] need to be escaped with backslashes, e.g. \f[I]"\\\\[\\\\]"\f[] .SS extractor.*.path-replace .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"_"\f[] .IP "Description:" 4 The replacement character(s) for \f[I]path-restrict\f[] .SS extractor.*.path-remove .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"\\u0000-\\u001f\\u007f"\f[] (ASCII control characters) .IP "Description:" 4 Set of characters to remove from generated path names. Note: In a string with 2 or more characters, \f[I][]^-\\\f[] need to be escaped with backslashes, e.g. \f[I]"\\\\[\\\\]"\f[] .SS extractor.*.path-strip .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Set of characters to remove from the end of generated path segment names using \f[I]str.rstrip()\f[] Special values: .br * \f[I]"auto"\f[]: Use characters from \f[I]"unix"\f[] or \f[I]"windows"\f[] depending on the local operating system .br * \f[I]"unix"\f[]: \f[I]""\f[] .br * \f[I]"windows"\f[]: \f[I]". 
"\f[] .SS extractor.*.extension-map .IP "Type:" 6 \f[I]object\f[] .IP "Default:" 9 .. code:: json { "jpeg": "jpg", "jpe" : "jpg", "jfif": "jpg", "jif" : "jpg", "jfi" : "jpg" } .IP "Description:" 4 A JSON \f[I]object\f[] mapping filename extensions to their replacements. .SS extractor.*.skip .IP "Type:" 6 \f[I]bool\f[] or \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the behavior when downloading files that have been downloaded before, i.e. a file with the same filename already exists or its ID is in a \f[I]download archive\f[]. .br * \f[I]true\f[]: Skip downloads .br * \f[I]false\f[]: Overwrite already existing files .br * \f[I]"abort"\f[]: Stop the current extractor run .br * \f[I]"abort:N"\f[]: Skip downloads and stop the current extractor run after \f[I]N\f[] consecutive skips .br * \f[I]"terminate"\f[]: Stop the current extractor run, including parent extractors .br * \f[I]"terminate:N"\f[]: Skip downloads and stop the current extractor run, including parent extractors, after \f[I]N\f[] consecutive skips .br * \f[I]"exit"\f[]: Exit the program altogether .br * \f[I]"exit:N"\f[]: Skip downloads and exit the program after \f[I]N\f[] consecutive skips .br * \f[I]"enumerate"\f[]: Add an enumeration index to the beginning of the filename extension (\f[I]file.1.ext\f[], \f[I]file.2.ext\f[], etc.) .SS extractor.*.sleep .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Number of seconds to sleep before each download. .SS extractor.*.sleep-extractor .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Number of seconds to sleep before handling an input URL, i.e. before starting a new extractor. .SS extractor.*.sleep-request .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Minimal time interval in seconds between each HTTP request during data extraction. .SS extractor.*.username & .password .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The username and password to use when attempting to log in to another site. Specifying a username and password is required for .br * \f[I]nijie\f[] and optional for .br * \f[I]aryion\f[] .br * \f[I]danbooru\f[] (*) .br * \f[I]e621\f[] (*) .br * \f[I]exhentai\f[] .br * \f[I]idolcomplex\f[] .br * \f[I]imgbb\f[] .br * \f[I]inkbunny\f[] .br * \f[I]instagram\f[] .br * \f[I]kemonoparty\f[] .br * \f[I]mangadex\f[] .br * \f[I]mangoxo\f[] .br * \f[I]pillowfort\f[] .br * \f[I]sankaku\f[] .br * \f[I]seisoparty\f[] .br * \f[I]subscribestar\f[] .br * \f[I]tapas\f[] .br * \f[I]tsumino\f[] .br * \f[I]twitter\f[] These values can also be specified via the \f[I]-u/--username\f[] and \f[I]-p/--password\f[] command-line options or by using a \f[I].netrc\f[] file. (see Authentication_) (*) The password value for \f[I]danbooru\f[] and \f[I]e621\f[] should be the API key found in your user profile, not the actual account password. .SS extractor.*.netrc .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Enable the use of \f[I].netrc\f[] authentication data. .SS extractor.*.cookies .IP "Type:" 6 \f[I]Path\f[] or \f[I]object\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Source to read additional cookies from. Either as .br * the \f[I]Path\f[] to a Mozilla/Netscape format cookies.txt file or .br * a JSON \f[I]object\f[] specifying cookies as a name-to-value mapping Example: .. 
code:: json { "cookie-name": "cookie-value", "sessionid" : "14313336321%3AsabDFvuASDnlpb%3A31", "isAdult" : "1" } .SS extractor.*.cookies-update .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 If \f[I]extractor.*.cookies\f[] specifies the \f[I]Path\f[] to a cookies.txt file and it can be opened and parsed without errors, update its contents with cookies received during data extraction. .SS extractor.*.proxy .IP "Type:" 6 \f[I]string\f[] or \f[I]object\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Proxy (or proxies) to be used for remote connections. .br * If this is a \f[I]string\f[], it is the proxy URL for all outgoing requests. .br * If this is an \f[I]object\f[], it is a scheme-to-proxy mapping to specify different proxy URLs for each scheme. It is also possible to set a proxy for a specific host by using \f[I]scheme://host\f[] as key. See \f[I]Requests' proxy documentation\f[] for more details. Example: .. code:: json { "http" : "http://10.10.1.10:3128", "https": "http://10.10.1.10:1080", "http://10.20.1.128": "http://10.10.1.10:5323" } Note: All proxy URLs should include a scheme, otherwise \f[I]http://\f[] is assumed. .SS extractor.*.source-address .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] with 1 \f[I]string\f[] and 1 \f[I]integer\f[] as elements .IP "Example:" 4 .br * "192.168.178.20" .br * ["192.168.178.20", 8080] .IP "Description:" 4 Client-side IP address to bind to. Can be either a simple \f[I]string\f[] with just the local IP address .br or a \f[I]list\f[] with IP and explicit port number as elements. .br .SS extractor.*.user-agent .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0"\f[] .IP "Description:" 4 User-Agent header value to be used for HTTP requests. Note: This option has no effect on pixiv extractors, as these need specific values to function correctly. .SS extractor.*.browser .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"firefox"\f[] for \f[I]patreon\f[], \f[I]null\f[] everywhere else .IP "Example:" 4 .br * "chrome:macos" .IP "Description:" 4 Try to emulate a real browser (\f[I]firefox\f[] or \f[I]chrome\f[]) by using their default HTTP headers and TLS ciphers for HTTP requests. Optionally, the operating system used in the \f[I]User-Agent\f[] header can be specified after a \f[I]:\f[] (\f[I]windows\f[], \f[I]linux\f[], or \f[I]macos\f[]). Note: \f[I]requests\f[] and \f[I]urllib3\f[] only support HTTP/1.1, while a real browser would use HTTP/2. .SS extractor.*.keywords .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 {"type": "Pixel Art", "type_id": 123} .IP "Description:" 4 Additional key-value pairs to be added to each metadata dictionary. .SS extractor.*.keywords-default .IP "Type:" 6 any .IP "Default:" 9 \f[I]"None"\f[] .IP "Description:" 4 Default value used for missing or undefined keyword names in \f[I]format strings\f[]. .SS extractor.*.url-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Insert a file's download URL into its metadata dictionary as the given name. For example, setting this option to \f[I]"gdl_file_url"\f[] will cause a new metadata field with name \f[I]gdl_file_url\f[] to appear, which contains the current file's download URL. This can then be used in \f[I]filenames\f[], with a \f[I]metadata\f[] post processor, etc. 
.SS extractor.*.category-transfer .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 Extractor-specific .IP "Description:" 4 Transfer an extractor's (sub)category values to all child extractors spawned by it, to let them inherit their parent's config options. .SS extractor.*.blacklist & .whitelist .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["oauth", "recursive", "test"]\f[] + current extractor category .IP "Example:" 4 ["imgur", "gfycat:user", "*:image"] .IP "Description:" 4 A list of extractor identifiers to ignore (or allow) when spawning child extractors for unknown URLs, e.g. from \f[I]reddit\f[] or \f[I]plurk\f[]. Each identifier can be .br * A category or basecategory name (\f[I]"imgur"\f[], \f[I]"mastodon"\f[]) .br * | A (base)category-subcategory pair, where both names are separated by a colon (\f[I]"gfycat:user"\f[]). Both names can be a * or left empty, matching all possible names (\f[I]"*:image"\f[], \f[I]":user"\f[]). .br Note: Any \f[I]blacklist\f[] setting will automatically include \f[I]"oauth"\f[], \f[I]"recursive"\f[], and \f[I]"test"\f[]. .SS extractor.*.archive .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "$HOME/.archives/{category}.sqlite3" .IP "Description:" 4 File to store IDs of downloaded files in. Downloads of files already recorded in this archive file will be \f[I]skipped\f[]. The resulting archive file is not a plain text file but an SQLite3 database, as either lookup operations are significantly faster or memory requirements are significantly lower when the amount of stored IDs gets reasonably large. Note: Archive files that do not already exist get generated automatically. Note: Archive paths support regular \f[I]format string\f[] replacements, but be aware that using external inputs for building local paths may pose a security risk. .SS extractor.*.archive-format .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "{id}_{offset}" .IP "Description:" 4 An alternative \f[I]format string\f[] to build archive IDs with. .SS extractor.*.archive-prefix .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"{category}"\f[] .IP "Description:" 4 Prefix for archive IDs. .SS extractor.*.postprocessors .IP "Type:" 6 \f[I]list\f[] of \f[I]Postprocessor Configuration\f[] objects .IP "Example:" 4 .. code:: json [ { "name": "zip" , "compression": "store" }, { "name": "exec", "command": ["/home/foobar/script", "{category}", "{image_id}"] } ] .IP "Description:" 4 A list of \f[I]post processors\f[] to be applied to each downloaded file in the specified order. Unlike other options, a \f[I]postprocessors\f[] setting at a deeper level .br does not override any \f[I]postprocessors\f[] setting at a lower level. Instead, all post processors from all applicable \f[I]postprocessors\f[] .br settings get combined into a single list. For example .br * an \f[I]mtime\f[] post processor at \f[I]extractor.postprocessors\f[], .br * a \f[I]zip\f[] post processor at \f[I]extractor.pixiv.postprocessors\f[], .br * and using \f[I]--exec\f[] will run all three post processors - \f[I]mtime\f[], \f[I]zip\f[], \f[I]exec\f[] - for each downloaded \f[I]pixiv\f[] file. .SS extractor.*.retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]4\f[] .IP "Description:" 4 Maximum number of times a failed HTTP request is retried before giving up, or \f[I]-1\f[] for infinite retries. 
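To illustrate the archive options above, a sketch of a per-category download archive configuration could look like this (the path and format string are only examples): .. code:: json { "extractor": { "archive": "$HOME/.archives/{category}.sqlite3", "archive-format": "{id}_{offset}" } }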
.SS extractor.*.timeout .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]30.0\f[] .IP "Description:" 4 Amount of time (in seconds) to wait for a successful connection and response from a remote server. This value gets internally used as the \f[I]timeout\f[] parameter for the \f[I]requests.request()\f[] method. .SS extractor.*.verify .IP "Type:" 6 \f[I]bool\f[] or \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to verify SSL/TLS certificates for HTTPS requests. If this is a \f[I]string\f[], it must be the path to a CA bundle to use instead of the default certificates. This value gets internally used as the \f[I]verify\f[] parameter for the \f[I]requests.request()\f[] method. .SS extractor.*.download .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to download media files. Setting this to \f[I]false\f[] won't download any files, but all other functions (\f[I]postprocessors\f[], \f[I]download archive\f[], etc.) will be executed as normal. .SS extractor.*.fallback .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Use fallback download URLs when a download fails. .SS extractor.*.image-range .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 .br * "10-20" .br * "-5, 10, 30-50, 100-" .IP "Description:" 4 Index-range(s) specifying which images to download. Note: The index of the first image is \f[I]1\f[]. .SS extractor.*.chapter-range .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Like \f[I]image-range\f[], but applies to delegated URLs like manga-chapters, etc. .SS extractor.*.image-filter .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 .br * "width >= 1200 and width/height > 1.2" .br * "re.search(r'foo(bar)+', description)" .IP "Description:" 4 Python expression controlling which files to download. Files for which the expression evaluates to \f[I]False\f[] are ignored. .br Available keys are the filename-specific ones listed by \f[I]-K\f[] or \f[I]-j\f[]. .br .SS extractor.*.chapter-filter .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 .br * "lang == 'en'" .br * "language == 'French' and 10 <= chapter < 20" .IP "Description:" 4 Like \f[I]image-filter\f[], but applies to delegated URLs like manga-chapters, etc. .SS extractor.*.image-unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Ignore image URLs that have been encountered before during the current extractor run. .SS extractor.*.chapter-unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Like \f[I]image-unique\f[], but applies to delegated URLs like manga-chapters, etc. .SS extractor.*.date-format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"%Y-%m-%dT%H:%M:%S"\f[] .IP "Description:" 4 Format string used to parse \f[I]string\f[] values of date-min and date-max. See \f[I]strptime\f[] for a list of formatting directives. .SH EXTRACTOR-SPECIFIC OPTIONS .SS extractor.artstation.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Try to follow external URLs of embedded players. .SS extractor.aryion.recursive .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the post extraction strategy. .br * \f[I]true\f[]: Start on users' main gallery pages and recursively descend into subfolders .br * \f[I]false\f[]: Get posts from "Latest Updates" pages .SS extractor.bbc.width .IP "Type:" 6 \f[I]int\f[] .IP "Default:" 9 \f[I]1920\f[] .IP "Description:" 4 Specifies the requested image width. 
This value must be divisble by 16 and gets rounded down otherwise. The maximum possible value appears to be \f[I]1920\f[]. .SS extractor.blogger.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download embedded videos hosted on https://www.blogger.com/ .SS extractor.danbooru.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For unavailable or restricted posts, follow the \f[I]source\f[] and download from there if possible. .SS extractor.danbooru.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata (notes, artist commentary, parent, children) Note: This requires 1 additional HTTP request for each post. .SS extractor.danbooru.ugoira .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls the download target for Ugoira posts. .br * \f[I]true\f[]: Original ZIP archives .br * \f[I]false\f[]: Converted video files .SS extractor.derpibooru.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Derpibooru API Key\f[], to use your account's browsing settings and filters. .SS extractor.derpibooru.filter .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]56027\f[] (\f[I]Everything\f[] filter) .IP "Description:" 4 The content filter ID to use. Setting an explicit filter ID overrides any default filters and can be used to access 18+ content without \f[I]API Key\f[]. See \f[I]Filters\f[] for details. .SS extractor.deviantart.auto-watch .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Automatically watch users when encountering "Watchers-Only Deviations" (requires a \f[I]refresh-token\f[]). .SS extractor.deviantart.auto-unwatch .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 After watching a user through \f[I]auto-watch\f[], unwatch that user at the end of the current extractor run. .SS extractor.deviantart.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]comments\f[] metadata. .SS extractor.deviantart.extra .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download extra Sta.sh resources from description texts and journals. Note: Enabling this option also enables deviantart.metadata_. .SS extractor.deviantart.flat .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Select the directory structure created by the Gallery- and Favorite-Extractors. .br * \f[I]true\f[]: Use a flat directory structure. .br * \f[I]false\f[]: Collect a list of all gallery-folders or favorites-collections and transfer any further work to other extractors (\f[I]folder\f[] or \f[I]collection\f[]), which will then create individual subdirectories for each of them. Note: Going through all gallery folders will not be able to fetch deviations which aren't in any folder. .SS extractor.deviantart.folders .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Provide a \f[I]folders\f[] metadata field that contains the names of all folders a deviation is present in. Note: Gathering this information requires a lot of API calls. Use with caution. 
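As a rough sketch, several of the DeviantArt options above might be combined in a configuration file like this (values are only examples): .. code:: json { "extractor": { "deviantart": { "flat": false, "comments": true, "folders": false } } }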
.SS extractor.deviantart.include .IP "Type:" 6 \f[I]string\f[] or \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Example:" 4 "favorite,journal,scraps" or ["favorite", "journal", "scraps"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"gallery"\f[], \f[I]"scraps"\f[], \f[I]"journal"\f[], \f[I]"favorite"\f[]. You can use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.deviantart.journals .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"html"\f[] .IP "Description:" 4 Selects the output format of journal entries. .br * \f[I]"html"\f[]: HTML with (roughly) the same layout as on DeviantArt. .br * \f[I]"text"\f[]: Plain text with image references and HTML tags removed. .br * \f[I]"none"\f[]: Don't download journals. .SS extractor.deviantart.mature .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable mature content. This option simply sets the \f[I]mature_content\f[] parameter for API calls to either \f[I]"true"\f[] or \f[I]"false"\f[] and does not do any other form of content filtering. .SS extractor.deviantart.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Request extended metadata for deviation objects to additionally provide \f[I]description\f[], \f[I]tags\f[], \f[I]license\f[] and \f[I]is_watching\f[] fields. .SS extractor.deviantart.original .IP "Type:" 6 \f[I]bool\f[] or \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download original files if available. Setting this option to \f[I]"images"\f[] only downloads original files if they are images and falls back to preview versions for everything else (archives, etc.). .SS extractor.deviantart.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from \f[I]linking your DeviantArt account to gallery-dl\f[]. Using a \f[I]refresh-token\f[] allows you to access private or otherwise not publicly available deviations. Note: The \f[I]refresh-token\f[] becomes invalid \f[I]after 3 months\f[] or whenever your \f[I]cache file\f[] is deleted or cleared. .SS extractor.deviantart.wait-min .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Minimum wait time in seconds before API requests. .SS extractor.exhentai.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 .br * \f[I]"auto"\f[]: Use \f[I]e-hentai.org\f[] or \f[I]exhentai.org\f[] depending on the input URL .br * \f[I]"e-hentai.org"\f[]: Use \f[I]e-hentai.org\f[] for all URLs .br * \f[I]"exhentai.org"\f[]: Use \f[I]exhentai.org\f[] for all URLs .SS extractor.exhentai.limits .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Sets a custom image download limit and stops extraction when it gets exceeded. .SS extractor.exhentai.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Load extended gallery metadata from the \f[I]API\f[]. Adds \f[I]archiver_key\f[], \f[I]posted\f[], and \f[I]torrents\f[]. Makes \f[I]date\f[] and \f[I]filesize\f[] more precise. .SS extractor.exhentai.original .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download full-sized original images if available. 
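A brief, illustrative \f[I]exhentai\f[] section using the options above might look like this (the limit value is only an example): .. code:: json { "extractor": { "exhentai": { "original": false, "limits": 5000, "metadata": true } } }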
.SS extractor.exhentai.source .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Description:" 4 Selects an alternative source to download files from. .br * \f[I]"hitomi"\f[]: Download the corresponding gallery from \f[I]hitomi.la\f[] .SS extractor.fanbox.embeds .IP "Type:" 6 \f[I]bool\f[] or \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control behavior on embedded content from external sites. .br * \f[I]true\f[]: Extract embed URLs and download them if supported (videos are not downloaded). .br * \f[I]"ytdl"\f[]: Like \f[I]true\f[], but let \f[I]youtube-dl\f[] handle video extraction and download for YouTube, Vimeo and SoundCloud embeds. .br * \f[I]false\f[]: Ignore embeds. .SS extractor.flickr.access-token & .access-token-secret .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]access_token\f[] and \f[I]access_token_secret\f[] values you get from \f[I]linking your Flickr account to gallery-dl\f[]. .SS extractor.flickr.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract and download videos. .SS extractor.flickr.size-max .IP "Type:" 6 \f[I]integer\f[] or \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Sets the maximum allowed size for downloaded images. .br * If this is an \f[I]integer\f[], it specifies the maximum image dimension (width and height) in pixels. .br * If this is a \f[I]string\f[], it should be one of Flickr's format specifiers (\f[I]"Original"\f[], \f[I]"Large"\f[], ... or \f[I]"o"\f[], \f[I]"k"\f[], \f[I]"h"\f[], \f[I]"l"\f[], ...) to use as an upper limit. .SS extractor.furaffinity.descriptions .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"text"\f[] .IP "Description:" 4 Controls the format of \f[I]description\f[] metadata fields. .br * \f[I]"text"\f[]: Plain text with HTML tags removed .br * \f[I]"html"\f[]: Raw HTML content .SS extractor.furaffinity.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow external URLs linked in descriptions. .SS extractor.furaffinity.include .IP "Type:" 6 \f[I]string\f[] or \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Example:" 4 "scraps,favorite" or ["scraps", "favorite"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"gallery"\f[], \f[I]"scraps"\f[], \f[I]"favorite"\f[]. You can use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.furaffinity.layout .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Selects which site layout to expect when parsing posts. .br * \f[I]"auto"\f[]: Automatically differentiate between \f[I]"old"\f[] and \f[I]"new"\f[] .br * \f[I]"old"\f[]: Expect the *old* site layout .br * \f[I]"new"\f[]: Expect the *new* site layout .SS extractor.generic.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Match **all** URLs not otherwise supported by gallery-dl, even ones without a \f[I]generic:\f[] prefix. .SS extractor.gfycat.format .IP "Type:" 6 .br * \f[I]list\f[] of \f[I]strings\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]["mp4", "webm", "mobile", "gif"]\f[] .IP "Description:" 4 List of names of the preferred animation format, which can be \f[I]"mp4"\f[], \f[I]"webm"\f[], \f[I]"mobile"\f[], \f[I]"gif"\f[], or \f[I]"webp"\f[]. 
If a selected format is not available, the next one in the list will be tried until an available format is found. If the format is given as \f[I]string\f[], it will be extended with \f[I]["mp4", "webm", "mobile", "gif"]\f[]. Use a list with one element to restrict it to only one possible format. .SS extractor.gofile.api-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 API token value found at the bottom of your \f[I]profile page\f[]. If not set, a temporary guest token will be used. .SS extractor.gofile.recursive .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Recursively download files from subfolders. .SS extractor.hentaifoundry.include .IP "Type:" 6 \f[I]string\f[] or \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"pictures"\f[] .IP "Example:" 4 "scraps,stories" or ["scraps", "stories"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"pictures"\f[], \f[I]"scraps"\f[], \f[I]"stories"\f[], \f[I]"favorite"\f[]. You can use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.hitomi.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"webp"\f[] .IP "Description:" 4 Selects which image format to download. Available formats are \f[I]"webp"\f[] and \f[I]"avif"\f[]. \f[I]"original"\f[] will try to download the original \f[I]jpg\f[] or \f[I]png\f[] versions, but is most likely going to fail with \f[I]403 Forbidden\f[] errors. .SS extractor.imgur.mp4 .IP "Type:" 6 \f[I]bool\f[] or \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to choose the GIF or MP4 version of an animation. .br * \f[I]true\f[]: Follow Imgur's advice and choose MP4 if the \f[I]prefer_video\f[] flag in an image's metadata is set. .br * \f[I]false\f[]: Always choose GIF. .br * \f[I]"always"\f[]: Always choose MP4. .SS extractor.inkbunny.orderby .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"create_datetime"\f[] .IP "Description:" 4 Value of the \f[I]orderby\f[] parameter for submission searches. (See \f[I]API#Search\f[] for details) .SS extractor.instagram.include .IP "Type:" 6 \f[I]string\f[] or \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"posts"\f[] .IP "Example:" 4 "stories,highlights,posts" or ["stories", "highlights", "posts"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"posts"\f[], \f[I]"reels"\f[], \f[I]"channel"\f[], \f[I]"tagged"\f[], \f[I]"stories"\f[], \f[I]"highlights"\f[]. You can use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.instagram.previews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download video previews. .SS extractor.instagram.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.kemonoparty.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]comments\f[] metadata. .SS extractor.kemonoparty.duplicates .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls how to handle duplicate files in a post. .br * \f[I]true\f[]: Download duplicates .br * \f[I]false\f[]: Ignore duplicates .SS extractor.kemonoparty.dms .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract a user's direct messages as \f[I]dms\f[] metadata. 
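For example, a \f[I]kemonoparty\f[] section enabling the options above could be sketched as (values are only illustrative): .. code:: json { "extractor": { "kemonoparty": { "comments": true, "duplicates": false, "dms": true } } }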
.SS extractor.kemonoparty.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["attachments", "file", "inline"]\f[] .IP "Description:" 4 Determines the type and order of files to be downloaded. Available types are \f[I]file\f[], \f[I]attachments\f[], and \f[I]inline\f[]. .SS extractor.kemonoparty.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Limit the number of posts to download. .SS extractor.kemonoparty.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]username\f[] metadata .SS extractor.khinsider.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"mp3"\f[] .IP "Description:" 4 The name of the preferred file format to download. Use \f[I]"all"\f[] to download all available formats, or a (comma-separated) list to select multiple formats. If the selected format is not available, the first in the list gets chosen (usually mp3). .SS extractor.luscious.gif .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Format in which to download animated images. Use \f[I]true\f[] to download animated images as gifs and \f[I]false\f[] to download as mp4 videos. .SS extractor.mangadex.api-server .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"https://api.mangadex.org"\f[] .IP "Description:" 4 The server to use for API requests. .SS extractor.mangadex.api-parameters .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 {"order[updatedAt]": "desc"} .IP "Description:" 4 Additional query parameters to send when fetching manga chapters. (See \f[I]/manga/{id}/feed\f[] and \f[I]/user/follows/manga/feed\f[]) .SS extractor.mangadex.lang .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "en" .IP "Description:" 4 \f[I]ISO 639-1\f[] language code to filter chapters by. .SS extractor.mangadex.ratings .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["safe", "suggestive", "erotica", "pornographic"]\f[] .IP "Description:" 4 List of acceptable content ratings for returned chapters. .SS extractor.mastodon.reblogs .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from reblogged posts. .SS extractor.mastodon.replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other posts. .SS extractor.mastodon.text-posts .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also emit metadata for text-only posts without media content. .SS extractor.newgrounds.flash .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download original Adobe Flash animations instead of pre-rendered videos. .SS extractor.newgrounds.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"original"\f[] .IP "Example:" 4 "720p" .IP "Description:" 4 Selects the preferred format for video downloads. If the selected format is not available, the next smaller one gets chosen. .SS extractor.newgrounds.include .IP "Type:" 6 \f[I]string\f[] or \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"art"\f[] .IP "Example:" 4 "movies,audio" or ["movies", "audio"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"art"\f[], \f[I]"audio"\f[], \f[I]"movies"\f[]. You can use \f[I]"all"\f[] instead of listing all values separately. 
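As an illustrative sketch, the \f[I]newgrounds\f[] options above might be set like this (the format and subcategory values are only examples): .. code:: json { "extractor": { "newgrounds": { "format": "720p", "include": ["art", "movies"] } } }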
.SS extractor.nijie.include .IP "Type:" 6 \f[I]string\f[] or \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"illustration,doujin"\f[] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"illustration"\f[], \f[I]"doujin"\f[], \f[I]"favorite"\f[]. You can use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.oauth.browser .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls how a user is directed to an OAuth authorization page. .br * \f[I]true\f[]: Use Python's \f[I]webbrowser.open()\f[] method to automatically open the URL in the user's default browser. .br * \f[I]false\f[]: Ask the user to copy & paste a URL from the terminal. .SS extractor.oauth.cache .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Store tokens received during OAuth authorizations in \f[I]cache\f[]. .SS extractor.oauth.port .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]6414\f[] .IP "Description:" 4 Port number to listen on during OAuth authorization. Note: All redirects will go to http://localhost:6414/, regardless of the port specified here. You'll have to manually adjust the port number in your browser's address bar when using a different port than the default. .SS extractor.patreon.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["images", "image_large", "attachments", "postfile", "content"]\f[] .IP "Description:" 4 Determines the type and order of files to be downloaded. Available types are \f[I]postfile\f[], \f[I]images\f[], \f[I]image_large\f[], \f[I]attachments\f[], and \f[I]content\f[]. .SS extractor.photobucket.subalbums .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download subalbums. .SS extractor.pillowfort.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow links to external sites, e.g. Twitter. .SS extractor.pillowfort.inline .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract inline images. .SS extractor.pillowfort.reblogs .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract media from reblogged posts. .SS extractor.pinterest.sections .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include pins from board sections. .SS extractor.pinterest.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download from video pins. .SS extractor.pixiv.user.avatar .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download user avatars. .SS extractor.pixiv.user.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch extended \f[I]user\f[] metadata. .SS extractor.pixiv.work.related .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also download related artworks. .SS extractor.pixiv.tags .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"japanese"\f[] .IP "Description:" 4 Controls the \f[I]tags\f[] metadata field. .br * "japanese": List of Japanese tags .br * "translated": List of translated tags .br * "original": Unmodified list with both Japanese and translated tags .SS extractor.pixiv.ugoira .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download Pixiv's Ugoira animations or ignore them. 
These animations come as a \f[I].zip\f[] file containing all animation frames in JPEG format. Use an ugoira post processor to convert them to watchable videos. (See the example configuration file for a sample ugoira post processor setup.) .SS extractor.pixiv.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 When downloading galleries, this sets the maximum number of posts to get. A value of \f[I]0\f[] means no limit. .SS extractor.plurk.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also search Plurk comments for URLs. .SS extractor.reactor.gif .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Format in which to download animated images. Use \f[I]true\f[] to download animated images as gifs and \f[I]false\f[] to download as mp4 videos. .SS extractor.readcomiconline.captcha .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"stop"\f[] .IP "Description:" 4 Controls how to handle redirects to CAPTCHA pages. .br * \f[I]"stop"\f[]: Stop the current extractor run. .br * \f[I]"wait"\f[]: Ask the user to solve the CAPTCHA and wait. .SS extractor.reddit.comments .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 The value of the \f[I]limit\f[] parameter when loading a submission and its comments. This number (roughly) specifies the total amount of comments being retrieved with the first API call. Reddit's internal default and maximum values for this parameter appear to be 200 and 500 respectively. The value \f[I]0\f[] ignores all comments and significantly reduces the time required when scanning a subreddit. .SS extractor.reddit.morecomments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Retrieve additional comments by resolving the \f[I]more\f[] comment stubs in the base comment tree. This requires 1 additional API call for every 100 extra comments. .SS extractor.reddit.date-min & .date-max .IP "Type:" 6 \f[I]Date\f[] .IP "Default:" 9 \f[I]0\f[] and \f[I]253402210800\f[] (timestamp of \f[I]datetime.max\f[]) .IP "Description:" 4 Ignore all submissions posted before/after this date. .SS extractor.reddit.id-min & .id-max .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "6kmzv2" .IP "Description:" 4 Ignore all submissions posted before/after the submission with this ID. .SS extractor.reddit.recursion .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Reddit extractors can recursively visit other submissions linked to in the initial set of submissions. This value sets the maximum recursion depth. Special values: .br * \f[I]0\f[]: Recursion is disabled .br * \f[I]-1\f[]: Infinite recursion (don't do this) .SS extractor.reddit.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from \f[I]linking your Reddit account to gallery-dl\f[]. Using a \f[I]refresh-token\f[] allows you to access private or otherwise not publicly available subreddits, given that your account is authorized to do so, but requests to the reddit API are going to be rate limited at 600 requests every 10 minutes/600 seconds. .SS extractor.reddit.videos .IP "Type:" 6 \f[I]bool\f[] or \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. 
.br * \f[I]true\f[]: Download videos and use \f[I]youtube-dl\f[] to handle HLS and DASH manifests .br * \f[I]"ytdl"\f[]: Download videos and let \f[I]youtube-dl\f[] handle all of video extraction and download .br * \f[I]false\f[]: Ignore videos .SS extractor.redgifs.format .IP "Type:" 6 .br * \f[I]list\f[] of \f[I]strings\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]["hd", "sd", "gif"]\f[] .IP "Description:" 4 List of names of the preferred animation format, which can be \f[I]"hd"\f[], \f[I]"sd"\f[], \f[I]"gif"\f[], \f[I]"vthumbnail"\f[], \f[I]"thumbnail"\f[], or \f[I]"poster"\f[]. If a selected format is not available, the next one in the list will be tried until an available format is found. If the format is given as \f[I]string\f[], it will be extended with \f[I]["hd", "sd", "gif"]\f[]. Use a list with one element to restrict it to only one possible format. .SS extractor.sankakucomplex.embeds .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download video embeds from external sites. .SS extractor.sankakucomplex.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download videos. .SS extractor.skeb.sent-requests .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download sent requests. .SS extractor.skeb.thumbnails .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download thumbnails. .SS extractor.smugmug.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.tumblr.avatar .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download blog avatars. .SS extractor.tumblr.date-min & .date-max .IP "Type:" 6 \f[I]Date\f[] .IP "Default:" 9 \f[I]0\f[] and \f[I]null\f[] .IP "Description:" 4 Ignore all posts published before/after this date. .SS extractor.tumblr.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow external URLs (e.g. from "Link" posts) and try to extract images from them. .SS extractor.tumblr.inline .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Search posts for inline images and videos. .SS extractor.tumblr.reblogs .IP "Type:" 6 \f[I]bool\f[] or \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 .br * \f[I]true\f[]: Extract media from reblogged posts .br * \f[I]false\f[]: Skip reblogged posts .br * \f[I]"same-blog"\f[]: Skip reblogged posts unless the original post is from the same blog .SS extractor.tumblr.posts .IP "Type:" 6 \f[I]string\f[] or \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Example:" 4 "video,audio,link" or ["video", "audio", "link"] .IP "Description:" 4 A (comma-separated) list of post types to extract images, etc. from. Possible types are \f[I]text\f[], \f[I]quote\f[], \f[I]link\f[], \f[I]answer\f[], \f[I]video\f[], \f[I]audio\f[], \f[I]photo\f[], \f[I]chat\f[]. You can use \f[I]"all"\f[] instead of listing all types separately. .SS extractor.twibooru.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Twibooru API Key\f[], to use your account's browsing settings and filters. .SS extractor.twibooru.filter .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]2\f[] (\f[I]Everything\f[] filter) .IP "Description:" 4 The content filter ID to use. Setting an explicit filter ID overrides any default filters and can be used to access 18+ content without \f[I]API Key\f[]. See \f[I]Filters\f[] for details. 
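Putting some of the tumblr options above together, a configuration entry might look like the following sketch (illustrative values, not defaults; the \f[I]date-min\f[] string is parsed as a \f[I]Date\f[] value as described under CUSTOM TYPES):
.. code:: json

{
    "extractor": {
        "tumblr": {
            "#": "illustrative values only",
            "posts": "photo,video",
            "reblogs": "same-blog",
            "external": false,
            "date-min": "2019-01-01T00:00:00"
        }
    }
}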
.SS extractor.twitter.cards .IP "Type:" 6 \f[I]bool\f[] or \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls how to handle \f[I]Twitter Cards\f[]. .br * \f[I]false\f[]: Ignore cards .br * \f[I]true\f[]: Download image content from supported cards .br * \f[I]"ytdl"\f[]: Additionally download video content from unsupported cards using \f[I]youtube-dl\f[] .SS extractor.twitter.conversations .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from all Tweets and replies in a \f[I]conversation \f[]. .SS extractor.twitter.size .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["orig", "4096x4096", "large", "medium", "small"]\f[] .IP "Description:" 4 The image version to download. Any entries after the first one will be used for potential \f[I]fallback\f[] URLs. Known available sizes are \f[I]4096x4096\f[], \f[I]orig\f[], \f[I]large\f[], \f[I]medium\f[], and \f[I]small\f[]. .SS extractor.twitter.syndication .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Retrieve age-restricted content using Twitter's syndication API. .SS extractor.twitter.logout .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Logout and retry as guest when access to another user's Tweets is blocked. .SS extractor.twitter.pinned .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from pinned Tweets. .SS extractor.twitter.quoted .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from quoted Tweets. .SS extractor.twitter.replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other Tweets. If this value is \f[I]"self"\f[], only consider replies where reply and original Tweet are from the same user. .SS extractor.twitter.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from Retweets. If this value is \f[I]"original"\f[], metadata for these files will be taken from the original Tweets, not the Retweets. .SS extractor.twitter.text-tweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also emit metadata for text-only Tweets without media content. This only has an effect with a \f[I]metadata\f[] (or \f[I]exec\f[]) post processor with \f[I]"event": "post"\f[] and appropriate \f[I]filename\f[]. .SS extractor.twitter.twitpic .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]TwitPic\f[] embeds. .SS extractor.twitter.users .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"timeline"\f[] .IP "Example:" 4 "https://twitter.com/search?q=from:{legacy[screen_name]}" .IP "Description:" 4 Format string for user URLs generated from .br \f[I]following\f[] and \f[I]list-members\f[] queries, whose replacement field values come from Twitter \f[I]user\f[] objects .br (\f[I]Example\f[]) Special values: .br * \f[I]"timeline"\f[]: \f[I]https://twitter.com/i/user/{rest_id}\f[] .br * \f[I]"media"\f[]: \f[I]https://twitter.com/id:{rest_id}/media\f[] Note: To allow gallery-dl to follow custom URL formats, set the \f[I]blacklist\f[] for \f[I]twitter\f[] to a non-default value, e.g. an empty string \f[I]""\f[]. .SS extractor.twitter.videos .IP "Type:" 6 \f[I]bool\f[] or \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. 
.br * \f[I]true\f[]: Download videos .br * \f[I]"ytdl"\f[]: Download videos using \f[I]youtube-dl\f[] .br * \f[I]false\f[]: Skip video Tweets .SS extractor.unsplash.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"raw"\f[] .IP "Description:" 4 Name of the image format to download. Available formats are \f[I]"raw"\f[], \f[I]"full"\f[], \f[I]"regular"\f[], \f[I]"small"\f[], and \f[I]"thumb"\f[]. .SS extractor.vsco.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.wallhaven.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Wallhaven API Key\f[], to use your account's browsing settings and default filters when searching. See https://wallhaven.cc/help/api for more information. .SS extractor.weasyl.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Weasyl API Key\f[], to use your account's browsing settings and filters. .SS extractor.weibo.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from retweeted posts. If this value is \f[I]"original"\f[], metadata for these files will be taken from the original posts, not the retweeted posts. .SS extractor.weibo.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.ytdl.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Match **all** URLs, even ones without a \f[I]ytdl:\f[] prefix. .SS extractor.ytdl.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 youtube-dl's default, currently \f[I]"bestvideo+bestaudio/best"\f[] .IP "Description:" 4 Video \f[I]format selection \f[] directly passed to youtube-dl. .SS extractor.ytdl.generic .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the use of youtube-dl's generic extractor. Set this option to \f[I]"force"\f[] for the same effect as youtube-dl's \f[I]--force-generic-extractor\f[]. .SS extractor.ytdl.logging .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Route youtube-dl's output through gallery-dl's logging system. Otherwise youtube-dl will write its output directly to stdout/stderr. Note: Set \f[I]quiet\f[] and \f[I]no_warnings\f[] in \f[I]extractor.ytdl.raw-options\f[] to \f[I]true\f[] to suppress all output. .SS extractor.ytdl.module .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Name of the youtube-dl Python module to import. Setting this to \f[I]null\f[] will try to import \f[I]"yt_dlp"\f[] followed by \f[I]"youtube_dl"\f[] as fallback. .SS extractor.ytdl.raw-options .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 .. code:: json { "quiet": true, "writesubtitles": true, "merge_output_format": "mkv" } .IP "Description:" 4 Additional options passed directly to the \f[I]YoutubeDL\f[] constructor. All available options can be found in \f[I]youtube-dl's docstrings \f[]. .SS extractor.ytdl.cmdline-args .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "--quiet --write-sub --merge-output-format mkv" .br * ["--quiet", "--write-sub", "--merge-output-format", "mkv"] .IP "Description:" 4 Additional options specified as youtube-dl command-line arguments. 
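For example, a configuration that enables the ytdl extractor with yt-dlp and passes extra command-line arguments (reusing the argument list from the example above) could look like this sketch:
.. code:: json

{
    "extractor": {
        "ytdl": {
            "#": "illustrative values only",
            "enabled": true,
            "module": "yt_dlp",
            "cmdline-args": ["--quiet", "--write-sub", "--merge-output-format", "mkv"]
        }
    }
}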
.SS extractor.ytdl.config-file .IP "Type:" 6 \f[I]Path\f[] .IP "Example:" 4 "~/.config/youtube-dl/config" .IP "Description:" 4 Location of a youtube-dl configuration file to load options from. .SS extractor.[booru].tags .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Categorize tags by their respective types and provide them as \f[I]tags_\f[] metadata fields. Note: This requires 1 additional HTTP request for each post. .SS extractor.[booru].notes .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract overlay notes (position and text). Note: This requires 1 additional HTTP request for each post. .SS extractor.[manga-extractor].chapter-reverse .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Reverse the order of chapter URLs extracted from manga pages. .br * \f[I]true\f[]: Start with the latest chapter .br * \f[I]false\f[]: Start with the first chapter .SS extractor.[manga-extractor].page-reverse .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download manga chapter pages in reverse order. .SH DOWNLOADER OPTIONS .SS downloader.*.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable/Disable this downloader module. .SS downloader.*.filesize-min & .filesize-max .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "32000", "500k", "2.5M" .IP "Description:" 4 Minimum/Maximum allowed file size in bytes. Any file smaller/larger than this limit will not be downloaded. Possible values are valid integer or floating-point numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[], \f[I]g\f[], \f[I]t\f[] or \f[I]p\f[]. These suffixes are case-insensitive. .SS downloader.*.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Use \f[I]Last-Modified\f[] HTTP response headers to set file modification times. .SS downloader.*.part .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the use of \f[I].part\f[] files during file downloads. .br * \f[I]true\f[]: Write downloaded data into \f[I].part\f[] files and rename them upon download completion. This mode additionally supports resuming incomplete downloads. .br * \f[I]false\f[]: Do not use \f[I].part\f[] files and write data directly into the actual output files. .SS downloader.*.part-directory .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Alternate location for \f[I].part\f[] files. Missing directories will be created as needed. If this value is \f[I]null\f[], \f[I].part\f[] files are going to be stored alongside the actual output files. .SS downloader.*.progress .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]3.0\f[] .IP "Description:" 4 Number of seconds until a download progress indicator for the current download is displayed. Set this option to \f[I]null\f[] to disable this indicator. .SS downloader.*.rate .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "32000", "500k", "2.5M" .IP "Description:" 4 Maximum download rate in bytes per second. Possible values are valid integer or floating-point numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[], \f[I]g\f[], \f[I]t\f[] or \f[I]p\f[]. These suffixes are case-insensitive. 
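As a sketch, a global downloader section limiting bandwidth and file size with the options above might be written as follows (the limits shown are arbitrary examples):
.. code:: json

{
    "downloader": {
        "#": "illustrative values only",
        "rate": "500k",
        "filesize-max": "20M",
        "part-directory": "/tmp/.download/",
        "progress": 5.0
    }
}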
.SS downloader.*.retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]extractor.*.retries\f[] .IP "Description:" 4 Maximum number of retries during file downloads, or \f[I]-1\f[] for infinite retries. .SS downloader.*.timeout .IP "Type:" 6 \f[I]float\f[] or \f[I]null\f[] .IP "Default:" 9 \f[I]extractor.*.timeout\f[] .IP "Description:" 4 Connection timeout during file downloads. .SS downloader.*.verify .IP "Type:" 6 \f[I]bool\f[] or \f[I]string\f[] .IP "Default:" 9 \f[I]extractor.*.verify\f[] .IP "Description:" 4 Certificate validation during file downloads. .SS downloader.*.proxy .IP "Type:" 6 \f[I]string\f[] or \f[I]object\f[] .IP "Default:" 9 \f[I]extractor.*.proxy\f[] .IP "Description:" 4 Proxy server used for file downloads. .br Disable the use of a proxy by explicitly setting this option to \f[I]null\f[]. .br .SS downloader.http.adjust-extensions .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check the file headers of \f[I]jpg\f[], \f[I]png\f[], and \f[I]gif\f[] files and adjust their filename extensions if they do not match. .SS downloader.http.headers .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 {"Accept": "image/webp,*/*", "Referer": "https://example.org/"} .IP "Description:" 4 Additional HTTP headers to send when downloading files. .SS downloader.ytdl.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 youtube-dl's default, currently \f[I]"bestvideo+bestaudio/best"\f[] .IP "Description:" 4 Video \f[I]format selection \f[] directly passed to youtube-dl. .SS downloader.ytdl.forward-cookies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Forward cookies to youtube-dl. .SS downloader.ytdl.logging .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Route youtube-dl's output through gallery-dl's logging system. Otherwise youtube-dl will write its output directly to stdout/stderr. Note: Set \f[I]quiet\f[] and \f[I]no_warnings\f[] in \f[I]downloader.ytdl.raw-options\f[] to \f[I]true\f[] to suppress all output. .SS downloader.ytdl.module .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Name of the youtube-dl Python module to import. Setting this to \f[I]null\f[] will first try to import \f[I]"yt_dlp"\f[] and use \f[I]"youtube_dl"\f[] as fallback. .SS downloader.ytdl.outtmpl .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]Output Template\f[] used to generate filenames for files downloaded with youtube-dl. Special values: .br * \f[I]null\f[]: generate filenames with \f[I]extractor.*.filename\f[] .br * \f[I]"default"\f[]: use youtube-dl's default, currently \f[I]"%(title)s-%(id)s.%(ext)s"\f[] Note: An output template other than \f[I]null\f[] might cause unexpected results in combination with other options (e.g. \f[I]"skip": "enumerate"\f[]) .SS downloader.ytdl.raw-options .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 .. code:: json { "quiet": true, "writesubtitles": true, "merge_output_format": "mkv" } .IP "Description:" 4 Additional options passed directly to the \f[I]YoutubeDL\f[] constructor. All available options can be found in \f[I]youtube-dl's docstrings \f[]. .SS downloader.ytdl.cmdline-args .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "--quiet --write-sub --merge-output-format mkv" .br * ["--quiet", "--write-sub", "--merge-output-format", "mkv"] .IP "Description:" 4 Additional options specified as youtube-dl command-line arguments. 
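For illustration, the ytdl downloader options above could be combined as in the following sketch, which assumes yt-dlp is installed and lets it generate filenames with its default output template:
.. code:: json

{
    "downloader": {
        "ytdl": {
            "#": "illustrative values only",
            "module": "yt_dlp",
            "forward-cookies": true,
            "outtmpl": "default"
        }
    }
}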
.SS downloader.ytdl.config-file .IP "Type:" 6 \f[I]Path\f[] .IP "Example:" 4 "~/.config/youtube-dl/config" .IP "Description:" 4 Location of a youtube-dl configuration file to load options from. .SH OUTPUT OPTIONS .SS output.mode .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the output string format and status indicators. .br * \f[I]"null"\f[]: No output .br * \f[I]"pipe"\f[]: Suitable for piping to other processes or files .br * \f[I]"terminal"\f[]: Suitable for the standard Windows console .br * \f[I]"color"\f[]: Suitable for terminals that understand ANSI escape codes and colors .br * \f[I]"auto"\f[]: Automatically choose the best suitable output mode .SS output.shorten .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether the output strings should be shortened to fit on one console line. Set this option to \f[I]"eaw"\f[] to also work with east-asian characters with a display width greater than 1. .SS output.skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Show skipped file downloads. .SS output.fallback .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include fallback URLs in the output of \f[I]-g/--get-urls\f[]. .SS output.private .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Include private fields, i.e. fields whose name starts with an underscore, in the output of \f[I]-K/--list-keywords\f[] and \f[I]-j/--dump-json\f[]. .SS output.progress .IP "Type:" 6 \f[I]bool\f[] or \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the progress indicator when *gallery-dl* is run with multiple URLs as arguments. .br * \f[I]true\f[]: Show the default progress indicator (\f[I]"[{current}/{total}] {url}"\f[]) .br * \f[I]false\f[]: Do not show any progress indicator .br * Any \f[I]string\f[]: Show the progress indicator using this as a custom \f[I]format string\f[]. Possible replacement keys are \f[I]current\f[], \f[I]total\f[] and \f[I]url\f[]. .SS output.log .IP "Type:" 6 \f[I]string\f[] or \f[I]Logging Configuration\f[] .IP "Default:" 9 \f[I]"[{name}][{levelname}] {message}"\f[] .IP "Description:" 4 Configuration for standard logging output to stderr. If this is a simple \f[I]string\f[], it specifies the format string for logging messages. .SS output.logfile .IP "Type:" 6 \f[I]Path\f[] or \f[I]Logging Configuration\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 File to write logging output to. .SS output.unsupportedfile .IP "Type:" 6 \f[I]Path\f[] or \f[I]Logging Configuration\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 File to write external URLs unsupported by *gallery-dl* to. The default format string here is \f[I]"{message}"\f[]. .SS output.num-to-str .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Convert numeric values (\f[I]integer\f[] or \f[I]float\f[]) to \f[I]string\f[] before outputting them as JSON. .SH POSTPROCESSOR OPTIONS .SS classify.mapping .IP "Type:" 6 \f[I]object\f[] .IP "Default:" 9 .. code:: json { "Pictures": ["jpg", "jpeg", "png", "gif", "bmp", "svg", "webp"], "Video" : ["flv", "ogv", "avi", "mp4", "mpg", "mpeg", "3gp", "mkv", "webm", "vob", "wmv"], "Music" : ["mp3", "aac", "flac", "ogg", "wma", "m4a", "wav"], "Archives": ["zip", "rar", "7z", "tar", "gz", "bz2"] } .IP "Description:" 4 A mapping from directory names to filename extensions that should be stored in them. 
Files with an extension not listed will be ignored and stored in their default location. .SS compare.action .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"replace"\f[] .IP "Description:" 4 The action to take when files do **not** compare as equal. .br * \f[I]"replace"\f[]: Replace/Overwrite the old version with the new one .br * \f[I]"enumerate"\f[]: Add an enumeration index to the filename of the new version like \f[I]skip = "enumerate"\f[] .SS compare.equal .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"null"\f[] .IP "Description:" 4 The action to take when files do compare as equal. .br * \f[I]"abort:N"\f[]: Stop the current extractor run after \f[I]N\f[] consecutive files compared as equal. .br * \f[I]"terminate:N"\f[]: Stop the current extractor run, including parent extractors, after \f[I]N\f[] consecutive files compared as equal. .br * \f[I]"exit:N"\f[]: Exit the program after \f[I]N\f[] consecutive files compared as equal. .SS compare.shallow .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Only compare file sizes. Do not read and compare their content. .SS exec.async .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls whether to wait for a subprocess to finish or to let it run asynchronously. .SS exec.command .IP "Type:" 6 \f[I]string\f[] or \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "convert {} {}.png && rm {}" .br * ["echo", "{user[account]}", "{id}"] .IP "Description:" 4 The command to run. .br * If this is a \f[I]string\f[], it will be executed using the system's shell, e.g. \f[I]/bin/sh\f[]. Any \f[I]{}\f[] will be replaced with the full path of a file or target directory, depending on \f[I]exec.event\f[] .br * If this is a \f[I]list\f[], the first element specifies the program name and any further elements its arguments. Each element of this list is treated as a \f[I]format string\f[] using the files' metadata as well as \f[I]{_path}\f[], \f[I]{_directory}\f[], and \f[I]{_filename}\f[]. .SS exec.event .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"after"\f[] .IP "Description:" 4 The event for which \f[I]exec.command\f[] is run. See \f[I]metadata.event\f[] for a list of available events. .SS metadata.mode .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"json"\f[] .IP "Description:" 4 Select how to write metadata. .br * \f[I]"json"\f[]: all metadata using \f[I]json.dump() \f[] .br * \f[I]"tags"\f[]: \f[I]tags\f[] separated by newlines .br * \f[I]"custom"\f[]: result of applying \f[I]metadata.content-format\f[] to a file's metadata dictionary .SS metadata.filename .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "{id}.data.json" .IP "Description:" 4 A \f[I]format string\f[] to build the filenames for metadata files with. (see \f[I]extractor.filename\f[]) If this option is set, \f[I]metadata.extension\f[] and \f[I]metadata.extension-format\f[] will be ignored. .SS metadata.directory .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"."\f[] .IP "Example:" 4 "metadata" .IP "Description:" 4 Directory where metadata files are stored in relative to the current target location for file downloads. .SS metadata.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"json"\f[] or \f[I]"txt"\f[] .IP "Description:" 4 Filename extension for metadata files that will be appended to the original file names. 
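A minimal metadata post-processor entry using the options above might look like the following sketch; it writes newline-separated tags into a \f[I]metadata\f[] subdirectory (values chosen for illustration only):
.. code:: json

{
    "#": "illustrative values only",
    "name": "metadata",
    "mode": "tags",
    "directory": "metadata",
    "extension": "txt"
}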
.SS metadata.extension-format .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 .br * "{extension}.json" .br * "json" .IP "Description:" 4 Custom format string to build filename extensions for metadata files with, which will replace the original filename extensions. Note: \f[I]metadata.extension\f[] is ignored if this option is set. .SS metadata.event .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 The event for which metadata gets written to a file. The available events are: \f[I]init\f[] After post processor initialization and before the first file download \f[I]finalize\f[] On extractor shutdown, e.g. after all files were downloaded \f[I]prepare\f[] Before a file download \f[I]file\f[] When completing a file download, but before it gets moved to its target location \f[I]after\f[] After a file got moved to its target location \f[I]skip\f[] When skipping a file download \f[I]post\f[] When starting to download all files of a post, e.g. a Tweet on Twitter or a post on Patreon. .SS metadata.content-format .IP "Type:" 6 \f[I]string\f[] or \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "tags:\\n\\n{tags:J\\n}\\n" .br * ["tags:", "", "{tags:J\\n}"] .IP "Description:" 4 Custom format string to build the content of metadata files with. Note: Only applies for \f[I]"mode": "custom"\f[]. .SS metadata.archive .IP "Type:" 6 \f[I]Path\f[] .IP "Description:" 4 File to store IDs of generated metadata files in, similar to \f[I]extractor.*.archive\f[]. \f[I]archive-format\f[] and \f[I]archive-prefix\f[] options, akin to \f[I]extractor.*.archive-format\f[] and \f[I]extractor.*.archive-prefix\f[], are supported as well. .SS metadata.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Set modification times of generated metadata files according to the accompanying downloaded file. Enabling this option will only have an effect *if* there is actual \f[I]mtime\f[] metadata available, that is .br * after a file download (\f[I]"event": "file"\f[] (default), \f[I]"event": "after"\f[]) .br * when running *after* an \f[I]mtime\f[] post processes for the same \f[I]event\f[] For example, a \f[I]metadata\f[] post processor for \f[I]"event": "post"\f[] will *not* be able to set its file's modification time unless an \f[I]mtime\f[] post processor with \f[I]"event": "post"\f[] runs *before* it. .SS mtime.event .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 See \f[I]metadata.event\f[] .SS mtime.key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"date"\f[] .IP "Description:" 4 Name of the metadata field whose value should be used. This value must either be a UNIX timestamp or a \f[I]datetime\f[] object. .SS ugoira.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"webm"\f[] .IP "Description:" 4 Filename extension for the resulting video files. .SS ugoira.ffmpeg-args .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 ["-c:v", "libvpx-vp9", "-an", "-b:v", "2M"] .IP "Description:" 4 Additional FFmpeg command-line arguments. .SS ugoira.ffmpeg-demuxer .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]auto\f[] .IP "Description:" 4 FFmpeg demuxer to read and process input files with. 
Possible values are .br * "\f[I]concat\f[]" (inaccurate frame timecodes) .br * "\f[I]image2\f[]" (accurate timecodes, not usable on Windows) .br * "mkvmerge" (accurate timecodes, only WebM or MKV, requires \f[I]mkvmerge\f[]) "auto" will select mkvmerge if possible and fall back to image2 or concat depending on the local operating system. .SS ugoira.ffmpeg-location .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"ffmpeg"\f[] .IP "Description:" 4 Location of the \f[I]ffmpeg\f[] (or \f[I]avconv\f[]) executable to use. .SS ugoira.mkvmerge-location .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"mkvmerge"\f[] .IP "Description:" 4 Location of the \f[I]mkvmerge\f[] executable for use with the \f[I]mkvmerge demuxer\f[]. .SS ugoira.ffmpeg-output .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Show FFmpeg output. .SS ugoira.ffmpeg-twopass .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Enable Two-Pass encoding. .SS ugoira.framerate .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the frame rate argument (\f[I]-r\f[]) for FFmpeg .br * \f[I]"auto"\f[]: Automatically assign a fitting frame rate based on delays between frames. .br * any other \f[I]string\f[]: Use this value as argument for \f[I]-r\f[]. .br * \f[I]null\f[] or an empty \f[I]string\f[]: Don't set an explicit frame rate. .SS ugoira.keep-files .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Keep ZIP archives after conversion. .SS ugoira.libx264-prevent-odd .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Prevent \f[I]"width/height not divisible by 2"\f[] errors when using \f[I]libx264\f[] or \f[I]libx265\f[] encoders by applying a simple cropping filter. See this \f[I]Stack Overflow thread\f[] for more information. This option, when \f[I]libx264/5\f[] is used, automatically adds \f[I]["-vf", "crop=iw-mod(iw\\\\,2):ih-mod(ih\\\\,2)"]\f[] to the list of FFmpeg command-line arguments to reduce an odd width/height by 1 pixel and make them even. .SS ugoira.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Set modification times of generated ugoira animations. .SS ugoira.repeat-last-frame .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Allow repeating the last frame when necessary to prevent it from only being displayed for a very short amount of time. .SS zip.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"zip"\f[] .IP "Description:" 4 Filename extension for the created ZIP archive. .SS zip.keep-files .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Keep the actual files after writing them to a ZIP archive. .SS zip.mode .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"default"\f[] .IP "Description:" 4 .br * \f[I]"default"\f[]: Write the central directory file header once after everything is done or an exception is raised. .br * \f[I]"safe"\f[]: Update the central directory file header each time a file is stored in a ZIP archive. This greatly reduces the chance a ZIP archive gets corrupted in case the Python interpreter gets shut down unexpectedly (power outage, SIGKILL) but is also a lot slower. 
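As an illustrative sketch, an ugoira post-processor entry built from the options above could look like this; the FFmpeg arguments are the example values shown earlier, not required settings. Such an entry would be referenced from an extractor's \f[I]postprocessors\f[] list, typically together with \f[I]extractor.pixiv.ugoira\f[] set to \f[I]true\f[].
.. code:: json

{
    "#": "illustrative values only",
    "name": "ugoira",
    "extension": "webm",
    "ffmpeg-args": ["-c:v", "libvpx-vp9", "-an", "-b:v", "2M"],
    "ffmpeg-twopass": true,
    "mtime": true
}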
.SH MISCELLANEOUS OPTIONS .SS extractor.modules .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 The \f[I]modules\f[] list in \f[I]extractor/__init__.py\f[] .IP "Example:" 4 ["reddit", "danbooru", "mangadex"] .IP "Description:" 4 The list of modules to load when searching for a suitable extractor class. Useful to reduce startup time and memory usage. .SS cache.file .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 .br * (\f[I]%APPDATA%\f[] or \f[I]"~"\f[]) + \f[I]"/gallery-dl/cache.sqlite3"\f[] on Windows .br * (\f[I]$XDG_CACHE_HOME\f[] or \f[I]"~/.cache"\f[]) + \f[I]"/gallery-dl/cache.sqlite3"\f[] on all other platforms .IP "Description:" 4 Path of the SQLite3 database used to cache login sessions, cookies and API tokens across gallery-dl invocations. Set this option to \f[I]null\f[] or an invalid path to disable this cache. .SS signals-ignore .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["SIGTTOU", "SIGTTIN", "SIGTERM"] .IP "Description:" 4 The list of signal names to ignore, i.e. set \f[I]SIG_IGN\f[] as signal handler for. .SS pyopenssl .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Use \f[I]pyOpenSSL\f[]-backed SSL-support. .SH API TOKENS & IDS .SS extractor.deviantart.client-id & .client-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * log in and visit DeviantArt's \f[I]Applications & Keys\f[] section .br * click "Register Application" .br * scroll to "OAuth2 Redirect URI Whitelist (Required)" and enter "https://mikf.github.io/gallery-dl/oauth-redirect.html" .br * scroll to the bottom and agree to the API License Agreement, Submission Policy, and Terms of Service .br * click "Save" .br * copy \f[I]client_id\f[] and \f[I]client_secret\f[] of your new application and put them in your configuration file as \f[I]"client-id"\f[] and \f[I]"client-secret"\f[] .br * clear your \f[I]cache\f[] to delete any remaining \f[I]access-token\f[] entries. (\f[I]gallery-dl --clear-cache deviantart\f[]) .br * get a new \f[I]refresh-token\f[] for the new \f[I]client-id\f[] (\f[I]gallery-dl oauth:deviantart\f[]) .SS extractor.flickr.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * log in and \f[I]Create an App\f[] in Flickr's \f[I]App Garden\f[] .br * click "APPLY FOR A NON-COMMERCIAL KEY" .br * fill out the form with a random name and description and click "SUBMIT" .br * copy \f[I]Key\f[] and \f[I]Secret\f[] and put them in your configuration file .SS extractor.reddit.client-id & .user-agent .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * log in and visit the \f[I]apps\f[] section of your account's preferences .br * click the "are you a developer? create an app..." 
button .br * fill out the form, choose "installed app", preferably set "http://localhost:6414/" as "redirect uri" and finally click "create app" .br * copy the client id (third line, under your application's name and "installed app") and put it in your configuration file .br * use "\f[I]Python:<application name>:v1.0 (by /u/<username>)\f[]" as user-agent and replace \f[I]<application name>\f[] and \f[I]<username>\f[] accordingly (see Reddit's \f[I]API access rules\f[]) .SS extractor.smugmug.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * log in and \f[I]Apply for an API Key\f[] .br * use a random name and description, set "Type" to "Application", "Platform" to "All", and "Use" to "Non-Commercial" .br * fill out the two checkboxes at the bottom and click "Apply" .br * copy \f[I]API Key\f[] and \f[I]API Secret\f[] and put them in your configuration file .SS extractor.tumblr.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * log in and visit Tumblr's \f[I]Applications\f[] section .br * click "Register application" .br * fill out the form: use a random name and description, set https://example.org/ as "Application Website" and "Default callback URL" .br * solve Google's "I'm not a robot" challenge and click "Register" .br * click "Show secret key" (below "OAuth Consumer Key") .br * copy your \f[I]OAuth Consumer Key\f[] and \f[I]Secret Key\f[] and put them in your configuration file .SH CUSTOM TYPES .SS Date .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Example:" 4 .br * "2019-01-01T00:00:00" .br * "2019" with "%Y" as \f[I]date-format\f[] .br * 1546297200 .IP "Description:" 4 A \f[I]Date\f[] value represents a specific point in time. .br * If given as \f[I]string\f[], it is parsed according to \f[I]date-format\f[]. .br * If given as \f[I]integer\f[], it is interpreted as UTC timestamp. .SS Duration .IP "Type:" 6 .br * \f[I]float\f[] .br * \f[I]list\f[] with 2 \f[I]floats\f[] .br * \f[I]string\f[] .IP "Example:" 4 .br * 2.85 .br * [1.5, 3.0] .br * "2.85", "1.5-3.0" .IP "Description:" 4 A \f[I]Duration\f[] represents a span of time in seconds. .br * If given as a single \f[I]float\f[], it will be used as that exact value. .br * If given as a \f[I]list\f[] with 2 floating-point numbers \f[I]a\f[] & \f[I]b\f[], it will be randomly chosen with uniform distribution such that \f[I]a <= N <= b\f[]. (see \f[I]random.uniform()\f[]) .br * If given as a \f[I]string\f[], it can either represent a single \f[I]float\f[] value (\f[I]"2.85"\f[]) or a range (\f[I]"1.5-3.0"\f[]). .SS Path .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "file.ext" .br * "~/path/to/file.ext" .br * "$HOME/path/to/file.ext" .br * ["$HOME", "path", "to", "file.ext"] .IP "Description:" 4 A \f[I]Path\f[] is a \f[I]string\f[] representing the location of a file or directory. Simple \f[I]tilde expansion\f[] and \f[I]environment variable expansion\f[] are supported. In Windows environments, backslashes (\f[I]"\\"\f[]) can, in addition to forward slashes (\f[I]"/"\f[]), be used as path separators. Because backslashes are JSON's escape character, they themselves have to be escaped. The path \f[I]C:\\path\\to\\file.ext\f[] therefore has to be written as \f[I]"C:\\\\path\\\\to\\\\file.ext"\f[] if you want to use backslashes. .SS Logging Configuration .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 .. code:: json { "format" : "{asctime} {name}: {message}", "format-date": "%H:%M:%S", "path" : "~/log.txt", "encoding" : "ascii" } .. 
code:: json { "level" : "debug", "format": { "debug" : "debug: {message}", "info" : "[{name}] {message}", "warning": "Warning: {message}", "error" : "ERROR: {message}" } } .IP "Description:" 4 Extended logging output configuration. .br * format .br * General format string for logging messages or a dictionary with format strings for each loglevel. In addition to the default \f[I]LogRecord attributes\f[], it is also possible to access the current \f[I]extractor\f[], \f[I]job\f[], \f[I]path\f[], and keywords objects and their attributes, for example \f[I]"{extractor.url}"\f[], \f[I]"{path.filename}"\f[], \f[I]"{keywords.title}"\f[] .br * Default: \f[I]"[{name}][{levelname}] {message}"\f[] .br * format-date .br * Format string for \f[I]{asctime}\f[] fields in logging messages (see \f[I]strftime() directives\f[]) .br * Default: \f[I]"%Y-%m-%d %H:%M:%S"\f[] .br * level .br * Minimum logging message level (one of \f[I]"debug"\f[], \f[I]"info"\f[], \f[I]"warning"\f[], \f[I]"error"\f[], \f[I]"exception"\f[]) .br * Default: \f[I]"info"\f[] .br * path .br * \f[I]Path\f[] to the output file .br * mode .br * Mode in which the file is opened; use \f[I]"w"\f[] to truncate or \f[I]"a"\f[] to append (see \f[I]open()\f[]) .br * Default: \f[I]"w"\f[] .br * encoding .br * File encoding .br * Default: \f[I]"utf-8"\f[] Note: path, mode, and encoding are only applied when configuring logging output to a file. .SS Postprocessor Configuration .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 .. code:: json { "name": "mtime" } .. code:: json { "name" : "zip", "compression": "store", "extension" : "cbz", "filter" : "extension not in ('zip', 'rar')", "whitelist" : ["mangadex", "exhentai", "nhentai"] } .IP "Description:" 4 An \f[I]object\f[] containing a \f[I]"name"\f[] attribute specifying the post-processor type, as well as any of its \f[I]options\f[]. It is possible to set a \f[I]"filter"\f[] expression similar to \f[I]image-filter\f[] to only run a post-processor conditionally. It is also possible set a \f[I]"whitelist"\f[] or \f[I]"blacklist"\f[] to only enable or disable a post-processor for the specified extractor categories. 
The available post-processor types are \f[I]classify\f[] Categorize files by filename extension \f[I]compare\f[] Compare versions of the same file and replace/enumerate them on mismatch .br (requires \f[I]downloader.*.part\f[] = \f[I]true\f[] and \f[I]extractor.*.skip\f[] = \f[I]false\f[]) .br \f[I]exec\f[] Execute external commands \f[I]metadata\f[] Write metadata to separate files \f[I]mtime\f[] Set file modification time according to its metadata \f[I]ugoira\f[] Convert Pixiv Ugoira to WebM using \f[I]FFmpeg\f[] \f[I]zip\f[] Store files in a ZIP archive .SH BUGS https://github.com/mikf/gallery-dl/issues .SH AUTHORS Mike Fährmann .br and https://github.com/mikf/gallery-dl/graphs/contributors .SH "SEE ALSO" .BR gallery-dl (1) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1649443808.1613946 gallery_dl-1.21.1/docs/0000755000175000017500000000000014224101740013331 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1646253139.0 gallery_dl-1.21.1/docs/gallery-dl-example.conf0000644000175000017500000003240514207752123017701 0ustar00mikemike{ "extractor": { "base-directory": "~/gallery-dl/", "#": "set global archive file for all extractors", "archive": "~/gallery-dl/archive.sqlite3", "#": "add two custom keywords into the metadata dictionary", "#": "these can be used to further refine your output directories or filenames", "keywords": {"bkey": "", "ckey": ""}, "#": "make sure that custom keywords are empty, i.e. they don't appear unless specified by the user", "keywords-default": "", "#": "replace invalid path characters with unicode alternatives", "path-restrict": { "\\": "⧹", "/" : "⧸", "|" : "│", ":" : "꞉", "*" : "∗", "?" : "?", "\"": "″", "<" : "﹤", ">" : "﹥" }, "#": "write tags for several *booru sites", "postprocessors": [ { "name": "metadata", "mode": "tags", "whitelist": ["danbooru", "moebooru", "sankaku"] } ], "pixiv": { "#": "override global archive setting for pixiv", "archive": "~/gallery-dl/archive-pixiv.sqlite3", "#": "set custom directory and filename format strings for all pixiv downloads", "filename": "{id}{num}.{extension}", "directory": ["Pixiv", "Works", "{user[id]}"], "refresh-token": "aBcDeFgHiJkLmNoPqRsTuVwXyZ01234567890-FedC9", "#": "transform ugoira into lossless MKVs", "ugoira": true, "postprocessors": ["ugoira-copy"], "#": "use special settings for favorites and bookmarks", "favorite": { "directory": ["Pixiv", "Favorites", "{user[id]}"] }, "bookmark": { "directory": ["Pixiv", "My Bookmarks"], "refresh-token": "01234567890aBcDeFgHiJkLmNoPqRsTuVwXyZ-ZyxW1" } }, "danbooru": { "ugoira": true, "postprocessors": ["ugoira-webm"] }, "exhentai": { "#": "use cookies instead of logging in with username and password", "cookies": { "ipb_member_id": "12345", "ipb_pass_hash": "1234567890abcdef", "igneous" : "123456789", "hath_perks" : "m1.m2.m3.a-123456789a", "sk" : "n4m34tv3574m2c4e22c35zgeehiw", "sl" : "dm_2" }, "#": "wait 2 to 4.8 seconds between HTTP requests", "sleep-request": [2.0, 4.8], "filename": "{num:>04}_{name}.{extension}", "directory": ["{category!c}", "{title}"] }, "sankaku": { "#": "authentication with cookies is not possible for sankaku", "username": "user", "password": "#secret#" }, "furaffinity": { "#": "authentication with username and password is not possible due to CAPTCHA", "cookies": { "a": "01234567-89ab-cdef-fedc-ba9876543210", "b": "fedcba98-7654-3210-0123-456789abcdef" }, "descriptions": "html", "postprocessors": ["content"] }, "deviantart": { "#": "download 'gallery' and 
'scraps' images for user profile URLs", "include": "gallery,scraps", "#": "use custom API credentials to avoid 429 errors", "client-id": "98765", "client-secret": "0123456789abcdef0123456789abcdef", "refresh-token": "0123456789abcdef0123456789abcdef01234567", "#": "put description texts into a separate directory", "metadata": true, "postprocessors": [ { "name": "metadata", "mode": "custom", "directory" : "Descriptions", "content-format" : "{description}\n", "extension-format": "descr.txt" } ] }, "flickr": { "access-token": "1234567890-abcdef", "access-token-secret": "1234567890abcdef", "size-max": 1920 }, "mangadex": { "#": "only download safe/suggestive chapters translated to English", "lang": "en", "ratings": ["safe", "suggestive"], "#": "put chapters into '.cbz' archives", "postprocessors": ["cbz"] }, "reddit": { "#": "only spawn child extractors for links to specific sites", "whitelist": ["imgur", "redgifs", "gfycat"], "#": "put files from child extractors into the reddit directory", "parent-directory": true, "#": "transfer metadata to any child extractor as '_reddit'", "parent-metadata": "_reddit" }, "imgur": { "#": "use different directory and filename formats when coming from a reddit post", "directory": { "'_reddit' in locals()": [] }, "filename": { "'_reddit' in locals()": "{_reddit[id]} {id}.{extension}", "" : "{id}.{extension}" } }, "tumblr": { "posts" : "all", "external": false, "reblogs" : false, "inline" : true, "#": "use special settings when downloading liked posts", "likes": { "posts" : "video,photo,link", "external": true, "reblogs" : true } }, "twitter": { "#": "write text content for *all* tweets", "postprocessors": ["content"], "text-tweets": true }, "mastodon": { "#": "add 'tabletop.social' as recognized mastodon instance", "#": "(run 'gallery-dl oauth:mastodon:tabletop.social to get an access token')", "tabletop.social": { "root": "https://tabletop.social", "access-token": "513a36c6..." 
}, "#": "set filename format strings for all 'mastodon' instances", "directory": ["mastodon", "{instance}", "{account[username]!l}"], "filename" : "{id}_{media[id]}.{extension}" }, "foolslide": { "#": "add two more foolslide instances", "otscans" : {"root": "https://otscans.com/foolslide"}, "helvetica": {"root": "https://helveticascans.com/r" } }, "foolfuuka": { "#": "add two other foolfuuka 4chan archives", "fireden-onion": {"root": "http://ydt6jy2ng3s3xg2e.onion"}, "scalearchive" : {"root": "https://archive.scaled.team" } }, "gelbooru_v01": { "#": "add a custom gelbooru_v01 instance", "#": "this is just an example, this specific instance is already included!", "allgirlbooru": {"root": "https://allgirl.booru.org"}, "#": "the following options are used for all gelbooru_v01 instances", "tag": { "directory": { "locals().get('bkey')": ["Booru", "AllGirlBooru", "Tags", "{bkey}", "{ckey}", "{search_tags}"], "" : ["Booru", "AllGirlBooru", "Tags", "_Unsorted", "{search_tags}"] } }, "post": { "directory": ["Booru", "AllGirlBooru", "Posts"] }, "archive": "~/gallery-dl/custom-archive-file-for-gelbooru_v01_instances.db", "filename": "{tags}_{id}_{md5}.{extension}", "sleep-request": [0, 1.2] }, "gelbooru_v02": { "#": "add a custom gelbooru_v02 instance", "#": "this is just an example, this specific instance is already included!", "tbib": { "root": "https://tbib.org", "#": "some sites have different domains for API access", "#": "use the 'api_root' option in addition to the 'root' setting here" } }, "tbib": { "#": "the following options are only used for TBIB", "#": "gelbooru_v02 has four subcategories at the moment, use custom directory settings for all of these", "tag": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Tags", "{bkey}", "{ckey}", "{search_tags}"], "" : ["Other Boorus", "TBIB", "Tags", "_Unsorted", "{search_tags}"] } }, "pool": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Pools", "{bkey}", "{ckey}", "{pool}"], "" : ["Other Boorus", "TBIB", "Pools", "_Unsorted", "{pool}"] } }, "favorite": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Favorites", "{bkey}", "{ckey}", "{favorite_id}"], "" : ["Other Boorus", "TBIB", "Favorites", "_Unsorted", "{favorite_id}"] } }, "post": { "directory": ["Other Boorus", "TBIB", "Posts"] }, "archive": "~/gallery-dl/custom-archive-file-for-TBIB.db", "filename": "{id}_{md5}.{extension}", "sleep-request": [0, 1.2] } }, "downloader": { "#": "restrict download speed to 1 MB/s", "rate": "1M", "#": "show download progress indicator after 2 seconds", "progress": 2.0, "#": "retry failed downloads up to 3 times", "retries": 3, "#": "consider a download 'failed' after 8 seconds of inactivity", "timeout": 8.0, "#": "write '.part' files into a special directory", "part-directory": "/tmp/.download/", "#": "do not update file modification times", "mtime": false, "ytdl": { "#": "use yt-dlp instead of youtube-dl", "module": "yt_dlp" } }, "output": { "log": { "level": "info", "#": "use different ANSI colors for each log level", "format": { "debug" : "\u001b[0;37m{name}: {message}\u001b[0m", "info" : "\u001b[1;37m{name}: {message}\u001b[0m", "warning": "\u001b[1;33m{name}: {message}\u001b[0m", "error" : "\u001b[1;31m{name}: {message}\u001b[0m" } }, "#": "shorten filenames to fit into one terminal line", "#": "while also considering wider East-Asian characters", "shorten": "eaw", "#": "write logging messages to a separate file", "logfile": { "path": "~/gallery-dl/log.txt", "mode": "w", "level": "debug" }, "#": "write 
unrecognized URLs to a separate file", "unsupportedfile": { "path": "~/gallery-dl/unsupported.txt", "mode": "a", "format": "{asctime} {message}", "format-date": "%Y-%m-%d-%H-%M-%S" } }, "postprocessor": { "#": "write 'content' metadata into separate files", "content": { "name" : "metadata", "#": "write data for every post instead of each individual file", "event": "post", "filename": "{post_id|tweet_id|id}.txt", "#": "write only the values for 'content' or 'description'", "mode" : "custom", "format": "{content|description}\n" }, "#": "put files into a '.cbz' archive", "cbz": { "name": "zip", "extension": "cbz" }, "#": "various ugoira post processor configurations to create different file formats", "ugoira-webm": { "name": "ugoira", "extension": "webm", "ffmpeg-args": ["-c:v", "libvpx-vp9", "-an", "-b:v", "0", "-crf", "30"], "ffmpeg-twopass": true, "ffmpeg-demuxer": "image2" }, "ugoira-mp4": { "name": "ugoira", "extension": "mp4", "ffmpeg-args": ["-c:v", "libx264", "-an", "-b:v", "4M", "-preset", "veryslow"], "ffmpeg-twopass": true, "libx264-prevent-odd": true }, "ugoira-gif": { "name": "ugoira", "extension": "gif", "ffmpeg-args": ["-filter_complex", "[0:v] split [a][b];[a] palettegen [p];[b][p] paletteuse"] }, "ugoira-copy": { "name": "ugoira", "extension": "mkv", "ffmpeg-args": ["-c", "copy"], "libx264-prevent-odd": false, "repeat-last-frame": false } }, "#": "use a custom cache file location", "cache": { "file": "~/gallery-dl/cache.sqlite3" } } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1646253139.0 gallery_dl-1.21.1/docs/gallery-dl.conf0000644000175000017500000001736414207752123016257 0ustar00mikemike{ "extractor": { "base-directory": "./gallery-dl/", "parent-directory": false, "postprocessors": null, "archive": null, "cookies": null, "cookies-update": true, "proxy": null, "skip": true, "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0", "retries": 4, "timeout": 30.0, "verify": true, "fallback": true, "sleep": 0, "sleep-request": 0, "sleep-extractor": 0, "path-restrict": "auto", "path-replace": "_", "path-remove": "\\u0000-\\u001f\\u007f", "path-strip": "auto", "extension-map": { "jpeg": "jpg", "jpe" : "jpg", "jfif": "jpg", "jif" : "jpg", "jfi" : "jpg" }, "artstation": { "external": false }, "aryion": { "username": null, "password": null, "recursive": true }, "bbc": { "width": 1920 }, "blogger": { "videos": true }, "danbooru": { "username": null, "password": null, "external": false, "metadata": false, "ugoira": false }, "derpibooru": { "api-key": null, "filter": 56027 }, "deviantart": { "client-id": null, "client-secret": null, "comments": false, "extra": false, "flat": true, "folders": false, "include": "gallery", "journals": "html", "mature": true, "metadata": false, "original": true, "wait-min": 0 }, "e621": { "username": null, "password": null }, "exhentai": { "username": null, "password": null, "domain": "auto", "limits": true, "metadata": false, "original": true, "sleep-request": 5.0 }, "flickr": { "videos": true, "size-max": null }, "furaffinity": { "descriptions": "text", "external": false, "include": "gallery", "layout": "auto" }, "gfycat": { "format": ["mp4", "webm", "mobile", "gif"] }, "hentaifoundry": { "include": "pictures" }, "hitomi": { "format": "webp", "metadata": false }, "idolcomplex": { "username": null, "password": null, "sleep-request": 5.0 }, "imgbb": { "username": null, "password": null }, "imgur": { "mp4": true }, "inkbunny": { "username": null, "password": null, "orderby": 
"create_datetime" }, "instagram": { "username": null, "password": null, "include": "posts", "sleep-request": 8.0, "videos": true }, "khinsider": { "format": "mp3" }, "luscious": { "gif": false }, "mangadex": { "api-server": "https://api.mangadex.org", "api-parameters": null, "lang": null, "ratings": ["safe", "suggestive", "erotica", "pornographic"] }, "mangoxo": { "username": null, "password": null }, "newgrounds": { "username": null, "password": null, "flash": true, "format": "original", "include": "art" }, "nijie": { "username": null, "password": null, "include": "illustration,doujin" }, "oauth": { "browser": true, "cache": true, "port": 6414 }, "pillowfort": { "external": false, "inline": true, "reblogs": false }, "pinterest": { "sections": true, "videos": true }, "pixiv": { "refresh-token": null, "avatar": false, "tags": "japanese", "ugoira": true }, "reactor": { "gif": false, "sleep-request": 5.0 }, "reddit": { "comments": 0, "morecomments": false, "date-min": 0, "date-max": 253402210800, "date-format": "%Y-%m-%dT%H:%M:%S", "id-min": "0", "id-max": "zik0zj", "recursion": 0, "videos": true }, "redgifs": { "format": ["hd", "sd", "gif"] }, "sankakucomplex": { "embeds": false, "videos": true }, "sankaku": { "username": null, "password": null }, "smugmug": { "videos": true }, "seiga": { "username": null, "password": null }, "subscribestar": { "username": null, "password": null }, "tsumino": { "username": null, "password": null }, "tumblr": { "avatar": false, "external": false, "inline": true, "posts": "all", "reblogs": true }, "twitter": { "username": null, "password": null, "cards": true, "conversations": false, "pinned": false, "quoted": false, "replies": true, "retweets": false, "text-tweets": false, "twitpic": false, "users": "timeline", "videos": true }, "unsplash": { "format": "raw" }, "vsco": { "videos": true }, "wallhaven": { "api-key": null }, "weasyl": { "api-key": null }, "weibo": { "retweets": true, "videos": true }, "ytdl": { "enabled": false, "format": null, "generic": true, "logging": true, "module": null, "raw-options": null }, "booru": { "tags": false, "notes": false } }, "downloader": { "filesize-min": null, "filesize-max": null, "mtime": true, "part": true, "part-directory": null, "progress": 3.0, "rate": null, "retries": 4, "timeout": 30.0, "verify": true, "http": { "adjust-extensions": true, "headers": null }, "ytdl": { "format": null, "forward-cookies": false, "logging": true, "module": null, "outtmpl": null, "raw-options": null } }, "output": { "mode": "auto", "progress": true, "shorten": true, "skip": true, "log": "[{name}][{levelname}] {message}", "logfile": null, "unsupportedfile": null }, "netrc": false } ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1649443808.1613946 gallery_dl-1.21.1/gallery_dl/0000755000175000017500000000000014224101740014517 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1646253139.0 gallery_dl-1.21.1/gallery_dl/__init__.py0000644000175000017500000002373414207752123016651 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import sys import json import logging from . 
import version, config, option, output, extractor, job, util, exception __author__ = "Mike Fährmann" __copyright__ = "Copyright 2014-2021 Mike Fährmann" __license__ = "GPLv2" __maintainer__ = "Mike Fährmann" __email__ = "mike_faehrmann@web.de" __version__ = version.__version__ def progress(urls, pformat): """Wrapper around urls to output a simple progress indicator""" if pformat is True: pformat = "[{current}/{total}] {url}" pinfo = {"total": len(urls)} for pinfo["current"], pinfo["url"] in enumerate(urls, 1): print(pformat.format_map(pinfo), file=sys.stderr) yield pinfo["url"] def parse_inputfile(file, log): """Filter and process strings from an input file. Lines starting with '#' and empty lines will be ignored. Lines starting with '-' will be interpreted as a key-value pair separated by an '='. where 'key' is a dot-separated option name and 'value' is a JSON-parsable value for it. These config options will be applied while processing the next URL. Lines starting with '-G' are the same as above, except these options will be valid for all following URLs, i.e. they are Global. Everything else will be used as potential URL. Example input file: # settings global options -G base-directory = "/tmp/" -G skip = false # setting local options for the next URL -filename="spaces_are_optional.jpg" -skip = true https://example.org/ # next URL uses default filename and 'skip' is false. https://example.com/index.htm """ gconf = [] lconf = [] for line in file: line = line.strip() if not line or line[0] == "#": # empty line or comment continue elif line[0] == "-": # config spec if len(line) >= 2 and line[1] == "G": conf = gconf line = line[2:] else: conf = lconf line = line[1:] key, sep, value = line.partition("=") if not sep: log.warning("input file: invalid = pair: %s", line) continue try: value = json.loads(value.strip()) except ValueError as exc: log.warning("input file: unable to parse '%s': %s", value, exc) continue key = key.strip().split(".") conf.append((key[:-1], key[-1], value)) else: # url if gconf or lconf: yield util.ExtendedUrl(line, gconf, lconf) gconf = [] lconf = [] else: yield line def main(): try: if sys.stdout and sys.stdout.encoding.lower() != "utf-8": output.replace_std_streams() parser = option.build_parser() args = parser.parse_args() log = output.initialize_logging(args.loglevel) # configuration if args.load_config: config.load() if args.cfgfiles: config.load(args.cfgfiles, strict=True) if args.yamlfiles: config.load(args.yamlfiles, strict=True, fmt="yaml") if args.filename: if args.filename == "/O": args.filename = "{filename}.{extension}" config.set((), "filename", args.filename) if args.directory: config.set((), "base-directory", args.directory) config.set((), "directory", ()) if args.postprocessors: config.set((), "postprocessors", args.postprocessors) if args.abort: config.set((), "skip", "abort:" + str(args.abort)) if args.terminate: config.set((), "skip", "terminate:" + str(args.terminate)) for opts in args.options: config.set(*opts) # signals signals = config.get((), "signals-ignore") if signals: import signal if isinstance(signals, str): signals = signals.split(",") for signal_name in signals: signal_num = getattr(signal, signal_name, None) if signal_num is None: log.warning("signal '%s' is not defined", signal_name) else: signal.signal(signal_num, signal.SIG_IGN) # extractor modules modules = config.get(("extractor",), "modules") if modules is not None: if isinstance(modules, str): modules = modules.split(",") extractor.modules = modules extractor._module_iter = 
iter(modules) # loglevels output.configure_logging(args.loglevel) if args.loglevel >= logging.ERROR: config.set(("output",), "mode", "null") elif args.loglevel <= logging.DEBUG: import platform import subprocess import os.path import requests extra = "" if getattr(sys, "frozen", False): extra = " - Executable" else: try: out, err = subprocess.Popen( ("git", "rev-parse", "--short", "HEAD"), stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=os.path.dirname(os.path.abspath(__file__)), ).communicate() if out and not err: extra = " - Git HEAD: " + out.decode().rstrip() except (OSError, subprocess.SubprocessError): pass log.debug("Version %s%s", __version__, extra) log.debug("Python %s - %s", platform.python_version(), platform.platform()) try: log.debug("requests %s - urllib3 %s", requests.__version__, requests.packages.urllib3.__version__) except AttributeError: pass if args.list_modules: for module_name in extractor.modules: print(module_name) elif args.list_extractors: for extr in extractor.extractors(): if not extr.__doc__: continue print(extr.__name__) print(extr.__doc__) print("Category:", extr.category, "- Subcategory:", extr.subcategory) test = next(extr._get_tests(), None) if test: print("Example :", test[0]) print() elif args.clear_cache: from . import cache log = logging.getLogger("cache") cnt = cache.clear(args.clear_cache) if cnt is None: log.error("Database file not available") else: log.info( "Deleted %d %s from '%s'", cnt, "entry" if cnt == 1 else "entries", cache._path(), ) else: if not args.urls and not args.inputfiles: parser.error( "The following arguments are required: URL\n" "Use 'gallery-dl --help' to get a list of all options.") if args.list_urls: jobtype = job.UrlJob jobtype.maxdepth = args.list_urls if config.get(("output",), "fallback", True): jobtype.handle_url = \ staticmethod(jobtype.handle_url_fallback) else: jobtype = args.jobtype or job.DownloadJob urls = args.urls if args.inputfiles: for inputfile in args.inputfiles: try: if inputfile == "-": if sys.stdin: urls += parse_inputfile(sys.stdin, log) else: log.warning("input file: stdin is not readable") else: with open(inputfile, encoding="utf-8") as file: urls += parse_inputfile(file, log) except OSError as exc: log.warning("input file: %s", exc) # unsupported file logging handler handler = output.setup_logging_handler( "unsupportedfile", fmt="{message}") if handler: ulog = logging.getLogger("unsupported") ulog.addHandler(handler) ulog.propagate = False job.Job.ulog = ulog pformat = config.get(("output",), "progress", True) if pformat and len(urls) > 1 and args.loglevel < logging.ERROR: urls = progress(urls, pformat) retval = 0 for url in urls: try: log.debug("Starting %s for '%s'", jobtype.__name__, url) if isinstance(url, util.ExtendedUrl): for opts in url.gconfig: config.set(*opts) with config.apply(url.lconfig): retval |= jobtype(url.value).run() else: retval |= jobtype(url).run() except exception.TerminateExtraction: pass except exception.NoExtractorError: log.error("No suitable extractor found for '%s'", url) retval |= 64 return retval except KeyboardInterrupt: sys.exit("\nKeyboardInterrupt") except BrokenPipeError: pass except OSError as exc: import errno if exc.errno != errno.EPIPE: raise return 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1609520004.0 gallery_dl-1.21.1/gallery_dl/__main__.py0000644000175000017500000000105313773651604016631 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2017-2019 Mike Fährmann # # This program is free 
software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import sys if __package__ is None and not hasattr(sys, "frozen"): import os.path path = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) sys.path.insert(0, os.path.realpath(path)) import gallery_dl if __name__ == "__main__": sys.exit(gallery_dl.main()) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/cache.py0000644000175000017500000001451514176336637016167 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Decorators to keep function results in an in-memory and database cache""" import sqlite3 import pickle import time import os import functools from . import config, util class CacheDecorator(): """Simplified in-memory cache""" def __init__(self, func, keyarg): self.func = func self.cache = {} self.keyarg = keyarg def __get__(self, instance, cls): return functools.partial(self.__call__, instance) def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] try: value = self.cache[key] except KeyError: value = self.cache[key] = self.func(*args, **kwargs) return value def update(self, key, value): self.cache[key] = value def invalidate(self, key=""): try: del self.cache[key] except KeyError: pass class MemoryCacheDecorator(CacheDecorator): """In-memory cache""" def __init__(self, func, keyarg, maxage): CacheDecorator.__init__(self, func, keyarg) self.maxage = maxage def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] timestamp = int(time.time()) try: value, expires = self.cache[key] except KeyError: expires = 0 if expires <= timestamp: value = self.func(*args, **kwargs) expires = timestamp + self.maxage self.cache[key] = value, expires return value def update(self, key, value): self.cache[key] = value, int(time.time()) + self.maxage class DatabaseCacheDecorator(): """Database cache""" db = None _init = True def __init__(self, func, keyarg, maxage): self.key = "%s.%s" % (func.__module__, func.__name__) self.func = func self.cache = {} self.keyarg = keyarg self.maxage = maxage def __get__(self, obj, objtype): return functools.partial(self.__call__, obj) def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] timestamp = int(time.time()) # in-memory cache lookup try: value, expires = self.cache[key] if expires > timestamp: return value except KeyError: pass # database lookup fullkey = "%s-%s" % (self.key, key) with self.database() as db: cursor = db.cursor() try: cursor.execute("BEGIN EXCLUSIVE") except sqlite3.OperationalError: pass # Silently swallow exception - workaround for Python 3.6 cursor.execute( "SELECT value, expires FROM data WHERE key=? 
LIMIT 1", (fullkey,), ) result = cursor.fetchone() if result and result[1] > timestamp: value, expires = result value = pickle.loads(value) else: value = self.func(*args, **kwargs) expires = timestamp + self.maxage cursor.execute( "INSERT OR REPLACE INTO data VALUES (?,?,?)", (fullkey, pickle.dumps(value), expires), ) self.cache[key] = value, expires return value def update(self, key, value): expires = int(time.time()) + self.maxage self.cache[key] = value, expires with self.database() as db: db.execute( "INSERT OR REPLACE INTO data VALUES (?,?,?)", ("%s-%s" % (self.key, key), pickle.dumps(value), expires), ) def invalidate(self, key): try: del self.cache[key] except KeyError: pass with self.database() as db: db.execute( "DELETE FROM data WHERE key=?", ("%s-%s" % (self.key, key),), ) def database(self): if self._init: self.db.execute( "CREATE TABLE IF NOT EXISTS data " "(key TEXT PRIMARY KEY, value TEXT, expires INTEGER)" ) DatabaseCacheDecorator._init = False return self.db def memcache(maxage=None, keyarg=None): if maxage: def wrap(func): return MemoryCacheDecorator(func, keyarg, maxage) else: def wrap(func): return CacheDecorator(func, keyarg) return wrap def cache(maxage=3600, keyarg=None): def wrap(func): return DatabaseCacheDecorator(func, keyarg, maxage) return wrap def clear(module): """Delete database entries for 'module'""" db = DatabaseCacheDecorator.db if not db: return None rowcount = 0 cursor = db.cursor() try: if module == "ALL": cursor.execute("DELETE FROM data") else: cursor.execute( "DELETE FROM data " "WHERE key LIKE 'gallery_dl.extractor.' || ? || '.%'", (module.lower(),) ) except sqlite3.OperationalError: pass # database not initialized, cannot be modified, etc. else: rowcount = cursor.rowcount db.commit() if rowcount: cursor.execute("VACUUM") return rowcount def _path(): path = config.get(("cache",), "file", util.SENTINEL) if path is not util.SENTINEL: return util.expand_path(path) if util.WINDOWS: cachedir = os.environ.get("APPDATA", "~") else: cachedir = os.environ.get("XDG_CACHE_HOME", "~/.cache") cachedir = util.expand_path(os.path.join(cachedir, "gallery-dl")) os.makedirs(cachedir, exist_ok=True) return os.path.join(cachedir, "cache.sqlite3") def _init(): try: dbfile = _path() # restrict access permissions for new db files os.close(os.open(dbfile, os.O_CREAT | os.O_RDONLY, 0o600)) DatabaseCacheDecorator.db = sqlite3.connect( dbfile, timeout=60, check_same_thread=False) except (OSError, TypeError, sqlite3.OperationalError): global cache cache = memcache _init() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1645832469.0 gallery_dl-1.21.1/gallery_dl/config.py0000644000175000017500000001240414206264425016351 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Global configuration module""" import sys import json import os.path import logging from . 
import util log = logging.getLogger("config") # -------------------------------------------------------------------- # internals _config = {} if util.WINDOWS: _default_configs = [ r"%APPDATA%\gallery-dl\config.json", r"%USERPROFILE%\gallery-dl\config.json", r"%USERPROFILE%\gallery-dl.conf", ] else: _default_configs = [ "/etc/gallery-dl.conf", "${XDG_CONFIG_HOME}/gallery-dl/config.json" if os.environ.get("XDG_CONFIG_HOME") else "${HOME}/.config/gallery-dl/config.json", "${HOME}/.gallery-dl.conf", ] if getattr(sys, "frozen", False): # look for config file in PyInstaller executable directory (#682) _default_configs.append(os.path.join( os.path.dirname(sys.executable), "gallery-dl.conf", )) # -------------------------------------------------------------------- # public interface def load(files=None, strict=False, fmt="json"): """Load JSON configuration files""" if fmt == "yaml": try: import yaml parsefunc = yaml.safe_load except ImportError: log.error("Could not import 'yaml' module") return else: parsefunc = json.load for path in files or _default_configs: path = util.expand_path(path) try: with open(path, encoding="utf-8") as file: confdict = parsefunc(file) except OSError as exc: if strict: log.error(exc) sys.exit(1) except Exception as exc: log.warning("Could not parse '%s': %s", path, exc) if strict: sys.exit(2) else: if not _config: _config.update(confdict) else: util.combine_dict(_config, confdict) def clear(): """Reset configuration to an empty state""" _config.clear() def get(path, key, default=None, *, conf=_config): """Get the value of property 'key' or a default value""" try: for p in path: conf = conf[p] return conf[key] except Exception: return default def interpolate(path, key, default=None, *, conf=_config): """Interpolate the value of 'key'""" if key in conf: return conf[key] try: for p in path: conf = conf[p] if key in conf: default = conf[key] except Exception: pass return default def interpolate_common(common, paths, key, default=None, *, conf=_config): """Interpolate the value of 'key' using multiple 'paths' along a 'common' ancestor """ if key in conf: return conf[key] # follow the common path try: for p in common: conf = conf[p] if key in conf: default = conf[key] except Exception: return default # try all paths until a value is found value = util.SENTINEL for path in paths: c = conf try: for p in path: c = c[p] if key in c: value = c[key] except Exception: pass if value is not util.SENTINEL: return value return default def accumulate(path, key, *, conf=_config): """Accumulate the values of 'key' along 'path'""" result = [] try: if key in conf: value = conf[key] if value: result.extend(value) for p in path: conf = conf[p] if key in conf: value = conf[key] if value: result[:0] = value except Exception: pass return result def set(path, key, value, *, conf=_config): """Set the value of property 'key' for this session""" for p in path: try: conf = conf[p] except KeyError: conf[p] = conf = {} conf[key] = value def setdefault(path, key, value, *, conf=_config): """Set the value of property 'key' if it doesn't exist""" for p in path: try: conf = conf[p] except KeyError: conf[p] = conf = {} return conf.setdefault(key, value) def unset(path, key, *, conf=_config): """Unset the value of property 'key'""" try: for p in path: conf = conf[p] del conf[key] except Exception: pass class apply(): """Context Manager: apply a collection of key-value pairs""" def __init__(self, kvlist): self.original = [] self.kvlist = kvlist def __enter__(self): for path, key, value in self.kvlist: 
self.original.append((path, key, get(path, key, util.SENTINEL))) set(path, key, value) def __exit__(self, etype, value, traceback): for path, key, value in self.original: if value is util.SENTINEL: unset(path, key) else: set(path, key, value) ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1649443808.164728 gallery_dl-1.21.1/gallery_dl/downloader/0000755000175000017500000000000014224101740016655 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1618779451.0 gallery_dl-1.21.1/gallery_dl/downloader/__init__.py0000644000175000017500000000176414037116473021011 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Downloader modules""" modules = [ "http", "text", "ytdl", ] def find(scheme): """Return downloader class suitable for handling the given scheme""" try: return _cache[scheme] except KeyError: pass cls = None if scheme == "https": scheme = "http" if scheme in modules: # prevent unwanted imports try: module = __import__(scheme, globals(), None, (), 1) except ImportError: pass else: cls = module.__downloader__ if scheme == "http": _cache["http"] = _cache["https"] = cls else: _cache[scheme] = cls return cls # -------------------------------------------------------------------- # internals _cache = {} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/downloader/common.py0000644000175000017500000000250114220623232020516 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Common classes and constants used by downloader modules.""" import os from .. import config, util class DownloaderBase(): """Base class for downloaders""" scheme = "" def __init__(self, job): self.out = job.out self.session = job.extractor.session self.part = self.config("part", True) self.partdir = self.config("part-directory") self.log = job.get_logger("downloader." + self.scheme) if self.partdir: self.partdir = util.expand_path(self.partdir) os.makedirs(self.partdir, exist_ok=True) proxies = self.config("proxy", util.SENTINEL) if proxies is util.SENTINEL: self.proxies = job.extractor._proxies else: self.proxies = util.build_proxy_map(proxies, self.log) def config(self, key, default=None): """Interpolate downloader config value for 'key'""" return config.interpolate(("downloader", self.scheme), key, default) def download(self, url, pathfmt): """Write data from 'url' into the file specified by 'pathfmt'""" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/downloader/http.py0000644000175000017500000003133114220623232020210 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Downloader module for http:// and https:// URLs""" import time import mimetypes from requests.exceptions import RequestException, ConnectionError, Timeout from .common import DownloaderBase from .. 
import text, util from ssl import SSLError try: from OpenSSL.SSL import Error as OpenSSLError except ImportError: OpenSSLError = SSLError class HttpDownloader(DownloaderBase): scheme = "http" def __init__(self, job): DownloaderBase.__init__(self, job) extractor = job.extractor self.chunk_size = 16384 self.downloading = False self.adjust_extension = self.config("adjust-extensions", True) self.progress = self.config("progress", 3.0) self.headers = self.config("headers") self.minsize = self.config("filesize-min") self.maxsize = self.config("filesize-max") self.retries = self.config("retries", extractor._retries) self.timeout = self.config("timeout", extractor._timeout) self.verify = self.config("verify", extractor._verify) self.mtime = self.config("mtime", True) self.rate = self.config("rate") if self.retries < 0: self.retries = float("inf") if self.minsize: minsize = text.parse_bytes(self.minsize) if not minsize: self.log.warning( "Invalid minimum file size (%r)", self.minsize) self.minsize = minsize if self.maxsize: maxsize = text.parse_bytes(self.maxsize) if not maxsize: self.log.warning( "Invalid maximum file size (%r)", self.maxsize) self.maxsize = maxsize if self.rate: rate = text.parse_bytes(self.rate) if rate: if rate < self.chunk_size: self.chunk_size = rate self.rate = rate self.receive = self._receive_rate else: self.log.warning("Invalid rate limit (%r)", self.rate) if self.progress is not None: self.receive = self._receive_rate def download(self, url, pathfmt): try: return self._download_impl(url, pathfmt) except Exception: print() raise finally: # remove file from incomplete downloads if self.downloading and not self.part: util.remove_file(pathfmt.temppath) def _download_impl(self, url, pathfmt): response = None tries = 0 msg = "" kwdict = pathfmt.kwdict adjust_extension = kwdict.get( "_http_adjust_extension", self.adjust_extension) if self.part: pathfmt.part_enable(self.partdir) while True: if tries: if response: response.close() response = None self.log.warning("%s (%s/%s)", msg, tries, self.retries+1) if tries > self.retries: return False time.sleep(tries) tries += 1 file_header = None # collect HTTP headers headers = {"Accept": "*/*"} # file-specific headers extra = kwdict.get("_http_headers") if extra: headers.update(extra) # general headers if self.headers: headers.update(self.headers) # partial content file_size = pathfmt.part_size() if file_size: headers["Range"] = "bytes={}-".format(file_size) # connect to (remote) source try: response = self.session.request( kwdict.get("_http_method", "GET"), url, stream=True, headers=headers, data=kwdict.get("_http_data"), timeout=self.timeout, proxies=self.proxies, verify=self.verify, ) except (ConnectionError, Timeout) as exc: msg = str(exc) continue except Exception as exc: self.log.warning(exc) return False # check response code = response.status_code if code == 200: # OK offset = 0 size = response.headers.get("Content-Length") elif code == 206: # Partial Content offset = file_size size = response.headers["Content-Range"].rpartition("/")[2] elif code == 416 and file_size: # Requested Range Not Satisfiable break else: msg = "'{} {}' for '{}'".format(code, response.reason, url) if code == 429 or 500 <= code < 600: # Server Error continue self.log.warning(msg) return False # check for invalid responses validate = kwdict.get("_http_validate") if validate: result = validate(response) if isinstance(result, str): url = result tries -= 1 continue if not result: self.log.warning("Invalid response") return False # set missing filename 
extension from MIME type if not pathfmt.extension: pathfmt.set_extension(self._find_extension(response)) if pathfmt.exists(): pathfmt.temppath = "" return True # check file size size = text.parse_int(size, None) if size is not None: if self.minsize and size < self.minsize: self.log.warning( "File size smaller than allowed minimum (%s < %s)", size, self.minsize) return False if self.maxsize and size > self.maxsize: self.log.warning( "File size larger than allowed maximum (%s > %s)", size, self.maxsize) return False content = response.iter_content(self.chunk_size) # check filename extension against file header if adjust_extension and not offset and \ pathfmt.extension in FILE_SIGNATURES: try: file_header = next( content if response.raw.chunked else response.iter_content(16), b"") except (RequestException, SSLError, OpenSSLError) as exc: msg = str(exc) print() continue if self._adjust_extension(pathfmt, file_header) and \ pathfmt.exists(): pathfmt.temppath = "" return True # set open mode if not offset: mode = "w+b" if file_size: self.log.debug("Unable to resume partial download") else: mode = "r+b" self.log.debug("Resuming download at byte %d", offset) # download content self.downloading = True with pathfmt.open(mode) as fp: if file_header: fp.write(file_header) offset += len(file_header) elif offset: if adjust_extension and \ pathfmt.extension in FILE_SIGNATURES: self._adjust_extension(pathfmt, fp.read(16)) fp.seek(offset) self.out.start(pathfmt.path) try: self.receive(fp, content, size, offset) except (RequestException, SSLError, OpenSSLError) as exc: msg = str(exc) print() continue # check file size if size and fp.tell() < size: msg = "file size mismatch ({} < {})".format( fp.tell(), size) print() continue break self.downloading = False if self.mtime: kwdict.setdefault("_mtime", response.headers.get("Last-Modified")) else: kwdict["_mtime"] = None return True @staticmethod def receive(fp, content, bytes_total, bytes_downloaded): write = fp.write for data in content: write(data) def _receive_rate(self, fp, content, bytes_total, bytes_downloaded): rate = self.rate progress = self.progress bytes_start = bytes_downloaded write = fp.write t1 = tstart = time.time() for data in content: write(data) t2 = time.time() # current time elapsed = t2 - t1 # elapsed time num_bytes = len(data) if progress is not None: bytes_downloaded += num_bytes tdiff = t2 - tstart if tdiff >= progress: self.out.progress( bytes_total, bytes_downloaded, int((bytes_downloaded - bytes_start) / tdiff), ) if rate: expected = num_bytes / rate # expected elapsed time if elapsed < expected: # sleep if less time elapsed than expected time.sleep(expected - elapsed) t2 = time.time() t1 = t2 def _find_extension(self, response): """Get filename extension from MIME type""" mtype = response.headers.get("Content-Type", "image/jpeg") mtype = mtype.partition(";")[0] if "/" not in mtype: mtype = "image/" + mtype if mtype in MIME_TYPES: return MIME_TYPES[mtype] ext = mimetypes.guess_extension(mtype, strict=False) if ext: return ext[1:] self.log.warning("Unknown MIME type '%s'", mtype) return "bin" @staticmethod def _adjust_extension(pathfmt, file_header): """Check filename extension against file header""" sig = FILE_SIGNATURES[pathfmt.extension] if not file_header.startswith(sig): for ext, sig in FILE_SIGNATURES.items(): if file_header.startswith(sig): pathfmt.set_extension(ext) return True return False MIME_TYPES = { "image/jpeg" : "jpg", "image/jpg" : "jpg", "image/png" : "png", "image/gif" : "gif", "image/bmp" : "bmp", "image/x-bmp" : 
"bmp", "image/x-ms-bmp": "bmp", "image/webp" : "webp", "image/svg+xml" : "svg", "image/ico" : "ico", "image/icon" : "ico", "image/x-icon" : "ico", "image/vnd.microsoft.icon" : "ico", "image/x-photoshop" : "psd", "application/x-photoshop" : "psd", "image/vnd.adobe.photoshop": "psd", "video/webm": "webm", "video/ogg" : "ogg", "video/mp4" : "mp4", "audio/wav" : "wav", "audio/x-wav": "wav", "audio/webm" : "webm", "audio/ogg" : "ogg", "audio/mpeg" : "mp3", "application/zip" : "zip", "application/x-zip": "zip", "application/x-zip-compressed": "zip", "application/rar" : "rar", "application/x-rar": "rar", "application/x-rar-compressed": "rar", "application/x-7z-compressed" : "7z", "application/pdf" : "pdf", "application/x-pdf": "pdf", "application/x-shockwave-flash": "swf", "application/ogg": "ogg", "application/octet-stream": "bin", } # https://en.wikipedia.org/wiki/List_of_file_signatures FILE_SIGNATURES = { "jpg" : b"\xFF\xD8\xFF", "png" : b"\x89PNG\r\n\x1A\n", "gif" : (b"GIF87a", b"GIF89a"), "bmp" : b"BM", "webp": b"RIFF", "svg" : b"= 0 else float("inf"), "socket_timeout": self.config("timeout", extractor._timeout), "nocheckcertificate": not self.config("verify", extractor._verify), "proxy": self.proxies.get("http") if self.proxies else None, } self.ytdl_instance = None self.forward_cookies = self.config("forward-cookies", False) self.progress = self.config("progress", 3.0) self.outtmpl = self.config("outtmpl") def download(self, url, pathfmt): kwdict = pathfmt.kwdict ytdl_instance = kwdict.pop("_ytdl_instance", None) if not ytdl_instance: ytdl_instance = self.ytdl_instance if not ytdl_instance: try: module = ytdl.import_module(self.config("module")) except ImportError as exc: self.log.error("Cannot import module '%s'", exc.name) self.log.debug("", exc_info=True) self.download = lambda u, p: False return False self.ytdl_instance = ytdl_instance = ytdl.construct_YoutubeDL( module, self, self.ytdl_opts) if self.outtmpl == "default": self.outtmpl = module.DEFAULT_OUTTMPL if self.forward_cookies: set_cookie = ytdl_instance.cookiejar.set_cookie for cookie in self.session.cookies: set_cookie(cookie) if self.progress is not None and not ytdl_instance._progress_hooks: ytdl_instance.add_progress_hook(self._progress_hook) info_dict = kwdict.pop("_ytdl_info_dict", None) if not info_dict: try: info_dict = ytdl_instance.extract_info(url[5:], download=False) except Exception: return False if "entries" in info_dict: index = kwdict.get("_ytdl_index") if index is None: return self._download_playlist( ytdl_instance, pathfmt, info_dict) else: info_dict = info_dict["entries"][index] extra = kwdict.get("_ytdl_extra") if extra: info_dict.update(extra) return self._download_video(ytdl_instance, pathfmt, info_dict) def _download_video(self, ytdl_instance, pathfmt, info_dict): if "url" in info_dict: text.nameext_from_url(info_dict["url"], pathfmt.kwdict) formats = info_dict.get("requested_formats") if formats and not compatible_formats(formats): info_dict["ext"] = "mkv" if self.outtmpl: self._set_outtmpl(ytdl_instance, self.outtmpl) pathfmt.filename = filename = \ ytdl_instance.prepare_filename(info_dict) pathfmt.extension = info_dict["ext"] pathfmt.path = pathfmt.directory + filename pathfmt.realpath = pathfmt.temppath = ( pathfmt.realdirectory + filename) else: pathfmt.set_extension(info_dict["ext"]) if pathfmt.exists(): pathfmt.temppath = "" return True if self.part and self.partdir: pathfmt.temppath = os.path.join( self.partdir, pathfmt.filename) self._set_outtmpl(ytdl_instance, pathfmt.temppath.replace("%", 
"%%")) self.out.start(pathfmt.path) try: ytdl_instance.process_info(info_dict) except Exception: self.log.debug("Traceback", exc_info=True) return False return True def _download_playlist(self, ytdl_instance, pathfmt, info_dict): pathfmt.set_extension("%(playlist_index)s.%(ext)s") self._set_outtmpl(ytdl_instance, pathfmt.realpath) for entry in info_dict["entries"]: ytdl_instance.process_info(entry) return True def _progress_hook(self, info): if info["status"] == "downloading" and \ info["elapsed"] >= self.progress: total = info.get("total_bytes") or info.get("total_bytes_estimate") speed = info.get("speed") self.out.progress( None if total is None else int(total), info["downloaded_bytes"], int(speed) if speed else 0, ) @staticmethod def _set_outtmpl(ytdl_instance, outtmpl): try: ytdl_instance.outtmpl_dict["default"] = outtmpl except AttributeError: ytdl_instance.params["outtmpl"] = outtmpl def compatible_formats(formats): """Returns True if 'formats' are compatible for merge""" video_ext = formats[0].get("ext") audio_ext = formats[1].get("ext") if video_ext == "webm" and audio_ext == "webm": return True exts = ("mp3", "mp4", "m4a", "m4p", "m4b", "m4r", "m4v", "ismv", "isma") return video_ext in exts and audio_ext in exts __downloader__ = YoutubeDLDownloader ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1646253139.0 gallery_dl-1.21.1/gallery_dl/exception.py0000644000175000017500000000621114207752123017077 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Exception classes used by gallery-dl Class Hierarchy: Exception +-- GalleryDLException +-- ExtractionError | +-- AuthenticationError | +-- AuthorizationError | +-- NotFoundError | +-- HttpError +-- FormatError | +-- FilenameFormatError | +-- DirectoryFormatError +-- FilterError +-- NoExtractorError +-- StopExtraction +-- TerminateExtraction """ class GalleryDLException(Exception): """Base class for GalleryDL exceptions""" default = None msgfmt = None code = 1 def __init__(self, message=None, fmt=True): if not message: message = self.default elif isinstance(message, Exception): message = "{}: {}".format(message.__class__.__name__, message) if self.msgfmt and fmt: message = self.msgfmt.format(message) Exception.__init__(self, message) class ExtractionError(GalleryDLException): """Base class for exceptions during information extraction""" class HttpError(ExtractionError): """HTTP request during data extraction failed""" default = "HTTP request failed" code = 4 def __init__(self, message, response=None): ExtractionError.__init__(self, message) self.response = response self.status = response.status_code if response else 0 class NotFoundError(ExtractionError): """Requested resource (gallery/image) could not be found""" msgfmt = "Requested {} could not be found" default = "resource (gallery/image)" code = 8 class AuthenticationError(ExtractionError): """Invalid or missing login credentials""" default = "Invalid or missing login credentials" code = 16 class AuthorizationError(ExtractionError): """Insufficient privileges to access a resource""" default = "Insufficient privileges to access the specified resource" code = 16 class FormatError(GalleryDLException): """Error while building output paths""" code = 32 class FilenameFormatError(FormatError): """Error while building output filenames""" msgfmt = 
"Applying filename format string failed ({})" class DirectoryFormatError(FormatError): """Error while building output directory paths""" msgfmt = "Applying directory format string failed ({})" class FilterError(GalleryDLException): """Error while evaluating a filter expression""" msgfmt = "Evaluating filter expression failed ({})" code = 32 class NoExtractorError(GalleryDLException): """No extractor can handle the given URL""" code = 64 class StopExtraction(GalleryDLException): """Stop data extraction""" def __init__(self, message=None, *args): GalleryDLException.__init__(self) self.message = message % args if args else message self.code = 1 if message else 0 class TerminateExtraction(GalleryDLException): """Terminate data extraction""" code = 0 ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1649443808.1780617 gallery_dl-1.21.1/gallery_dl/extractor/0000755000175000017500000000000014224101740016532 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/2chan.py0000644000175000017500000000651114176336637020127 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.2chan.net/""" from .common import Extractor, Message from .. import text class _2chanThreadExtractor(Extractor): """Extractor for images from threads on www.2chan.net""" category = "2chan" subcategory = "thread" directory_fmt = ("{category}", "{board_name}", "{thread}") filename_fmt = "{tim}.{extension}" archive_fmt = "{board}_{thread}_{tim}" url_fmt = "https://{server}.2chan.net/{board}/src/{filename}" pattern = r"(?:https?://)?([\w-]+)\.2chan\.net/([^/]+)/res/(\d+)" test = ("http://dec.2chan.net/70/res/4752.htm", { "url": "f49aa31340e9a3429226af24e19e01f5b819ca1f", "keyword": "44599c21b248e79692b2eb2da12699bd0ed5640a", }) def __init__(self, match): Extractor.__init__(self, match) self.server, self.board, self.thread = match.groups() def items(self): url = "https://{}.2chan.net/{}/res/{}.htm".format( self.server, self.board, self.thread) page = self.request(url).text data = self.metadata(page) yield Message.Directory, data for post in self.posts(page): if "filename" not in post: continue post.update(data) url = self.url_fmt.format_map(post) yield Message.Url, url, post def metadata(self, page): """Collect metadata for extractor-job""" title = text.extract(page, "", "")[0] title, _, boardname = title.rpartition(" - ") return { "server": self.server, "title": title, "board": self.board, "board_name": boardname[:-4], "thread": self.thread, } def posts(self, page): """Build a list of all post-objects""" page = text.extract( page, '
')[0] return [ self.parse(post) for post in page.split('') ] def parse(self, post): """Build post-object by extracting data from an HTML post""" data = self._extract_post(post) if data["name"]: data["name"] = data["name"].strip() path = text.extract(post, '' , '<'), ("name", 'class="cnm">' , '<'), ("now" , 'class="cnw">' , '<'), ("no" , 'class="cno">No.', '<'), (None , '', ''), ))[0] @staticmethod def _extract_image(post, data): text.extract_all(post, ( (None , '_blank', ''), ("filename", '>', '<'), ("fsize" , '(', ' '), ), 0, data) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/35photo.py0000644000175000017500000001737714176336637020451 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://35photo.pro/""" from .common import Extractor, Message from .. import text class _35photoExtractor(Extractor): category = "35photo" directory_fmt = ("{category}", "{user}") filename_fmt = "{id}{title:?_//}_{num:>02}.{extension}" archive_fmt = "{id}_{num}" root = "https://35photo.pro" def items(self): first = True data = self.metadata() for photo_id in self.photos(): for photo in self._photo_data(photo_id): photo.update(data) url = photo["url"] if first: first = False yield Message.Directory, photo yield Message.Url, url, text.nameext_from_url(url, photo) def metadata(self): """Returns general metadata""" return {} def photos(self): """Returns an iterable containing all relevant photo IDs""" def _pagination(self, params, extra_ids=None): url = "https://35photo.pro/show_block.php" headers = {"Referer": self.root, "X-Requested-With": "XMLHttpRequest"} params["type"] = "getNextPageData" if "lastId" not in params: params["lastId"] = "999999999" if extra_ids: yield from extra_ids while params["lastId"]: data = self.request(url, headers=headers, params=params).json() yield from self._photo_ids(data["data"]) params["lastId"] = data["lastId"] def _photo_data(self, photo_id): params = {"method": "photo.getData", "photoId": photo_id} data = self.request( "https://api.35photo.pro/", params=params).json()["data"][photo_id] info = { "url" : data["src"], "id" : data["photo_id"], "title" : data["photo_name"], "description": data["photo_desc"], "tags" : data["tags"] or [], "views" : data["photo_see"], "favorites" : data["photo_fav"], "score" : data["photo_rating"], "type" : data["photo_type"], "date" : data["timeAdd"], "user" : data["user_login"], "user_id" : data["user_id"], "user_name" : data["user_name"], } if "series" in data: for info["num"], photo in enumerate(data["series"], 1): info["url"] = photo["src"] info["id_series"] = text.parse_int(photo["id"]) info["title_series"] = photo["title"] or "" yield info.copy() else: info["num"] = 1 yield info @staticmethod def _photo_ids(page): """Extract unique photo IDs and return them as sorted list""" # searching for photo-id="..." 
doesn't always work (see unit tests) if not page: return () return sorted( set(text.extract_iter(page, "/photo_", "/")), key=text.parse_int, reverse=True, ) class _35photoUserExtractor(_35photoExtractor): """Extractor for all images of a user on 35photo.pro""" subcategory = "user" pattern = (r"(?:https?://)?(?:[a-z]+\.)?35photo\.pro" r"/(?!photo_|genre_|tags/|rating/)([^/?#]+)") test = ( ("https://35photo.pro/liya", { "pattern": r"https://([a-z][0-9]\.)?35photo\.pro" r"/photos_(main|series)/.*\.jpg", "count": 9, }), ("https://35photo.pro/suhoveev", { # last photo ID (1267028) isn't given as 'photo-id="" # there are only 23 photos without the last one "count": ">= 33", }), ("https://en.35photo.pro/liya"), ("https://ru.35photo.pro/liya"), ) def __init__(self, match): _35photoExtractor.__init__(self, match) self.user = match.group(1) self.user_id = 0 def metadata(self): url = "{}/{}/".format(self.root, self.user) page = self.request(url).text self.user_id = text.parse_int(text.extract(page, "/user_", ".xml")[0]) return { "user": self.user, "user_id": self.user_id, } def photos(self): return self._pagination({ "page": "photoUser", "user_id": self.user_id, }) class _35photoTagExtractor(_35photoExtractor): """Extractor for all photos from a tag listing""" subcategory = "tag" directory_fmt = ("{category}", "Tags", "{search_tag}") archive_fmt = "t{search_tag}_{id}_{num}" pattern = r"(?:https?://)?(?:[a-z]+\.)?35photo\.pro/tags/([^/?#]+)" test = ("https://35photo.pro/tags/landscape/", { "range": "1-25", "count": 25, }) def __init__(self, match): _35photoExtractor.__init__(self, match) self.tag = match.group(1) def metadata(self): return {"search_tag": text.unquote(self.tag).lower()} def photos(self): num = 1 while True: url = "{}/tags/{}/list_{}/".format(self.root, self.tag, num) page = self.request(url).text prev = None for photo_id in text.extract_iter(page, "35photo.pro/photo_", "/"): if photo_id != prev: prev = photo_id yield photo_id if not prev: return num += 1 class _35photoGenreExtractor(_35photoExtractor): """Extractor for images of a specific genre on 35photo.pro""" subcategory = "genre" directory_fmt = ("{category}", "Genre", "{genre}") archive_fmt = "g{genre_id}_{id}_{num}" pattern = r"(?:https?://)?(?:[a-z]+\.)?35photo\.pro/genre_(\d+)(/new/)?" test = ("https://35photo.pro/genre_109/",) def __init__(self, match): _35photoExtractor.__init__(self, match) self.genre_id, self.new = match.groups() self.photo_ids = None def metadata(self): url = "{}/genre_{}{}".format(self.root, self.genre_id, self.new or "/") page = self.request(url).text self.photo_ids = self._photo_ids(text.extract( page, ' class="photo', '\n')[0]) return { "genre": text.extract(page, " genre - ", ". 
")[0], "genre_id": text.parse_int(self.genre_id), } def photos(self): if not self.photo_ids: return () return self._pagination({ "page": "genre", "community_id": self.genre_id, "photo_rating": "0" if self.new else "50", "lastId": self.photo_ids[-1], }, self.photo_ids) class _35photoImageExtractor(_35photoExtractor): """Extractor for individual images from 35photo.pro""" subcategory = "image" pattern = r"(?:https?://)?(?:[a-z]+\.)?35photo\.pro/photo_(\d+)" test = ("https://35photo.pro/photo_753340/", { "count": 1, "keyword": { "url" : r"re:https://35photo\.pro/photos_main/.*\.jpg", "id" : 753340, "title" : "Winter walk", "description": str, "tags" : list, "views" : int, "favorites" : int, "score" : int, "type" : 0, "date" : "15 авг, 2014", "user" : "liya", "user_id" : 20415, "user_name" : "Liya Mirzaeva", }, }) def __init__(self, match): _35photoExtractor.__init__(self, match) self.photo_id = match.group(1) def photos(self): return (self.photo_id,) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1641689335.0 gallery_dl-1.21.1/gallery_dl/extractor/3dbooru.py0000644000175000017500000000600514166430367020501 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for http://behoimi.org/""" from . import moebooru class _3dbooruBase(): """Base class for 3dbooru extractors""" category = "3dbooru" basecategory = "booru" root = "http://behoimi.org" def __init__(self, match): super().__init__(match) self.session.headers.update({ "Referer": "http://behoimi.org/post/show/", "Accept-Encoding": "identity", }) class _3dbooruTagExtractor(_3dbooruBase, moebooru.MoebooruTagExtractor): """Extractor for images from behoimi.org based on search-tags""" pattern = (r"(?:https?://)?(?:www\.)?behoimi\.org/post" r"(?:/(?:index)?)?\?tags=(?P[^&#]+)") test = ("http://behoimi.org/post?tags=himekawa_azuru+dress", { "url": "ecb30c6aaaf8a6ff8f55255737a9840832a483c1", "content": "11cbda40c287e026c1ce4ca430810f761f2d0b2a", }) def posts(self): params = {"tags": self.tags} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPoolExtractor(_3dbooruBase, moebooru.MoebooruPoolExtractor): """Extractor for image-pools from behoimi.org""" pattern = r"(?:https?://)?(?:www\.)?behoimi\.org/pool/show/(?P\d+)" test = ("http://behoimi.org/pool/show/27", { "url": "da75d2d1475449d5ef0c266cb612683b110a30f2", "content": "fd5b37c5c6c2de4b4d6f1facffdefa1e28176554", }) def posts(self): params = {"tags": "pool:" + self.pool_id} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPostExtractor(_3dbooruBase, moebooru.MoebooruPostExtractor): """Extractor for single images from behoimi.org""" pattern = r"(?:https?://)?(?:www\.)?behoimi\.org/post/show/(?P\d+)" test = ("http://behoimi.org/post/show/140852", { "url": "ce874ea26f01d6c94795f3cc3aaaaa9bc325f2f6", "content": "26549d55b82aa9a6c1686b96af8bfcfa50805cd4", "options": (("tags", True),), "keyword": { "tags_character": "furude_rika", "tags_copyright": "higurashi_no_naku_koro_ni", "tags_model": "himekawa_azuru", "tags_general": str, }, }) def posts(self): params = {"tags": "id:" + self.post_id} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPopularExtractor( _3dbooruBase, moebooru.MoebooruPopularExtractor): """Extractor for popular images from behoimi.org""" pattern = 
(r"(?:https?://)?(?:www\.)?behoimi\.org" r"/post/popular_(?Pby_(?:day|week|month)|recent)" r"(?:\?(?P[^#]*))?") test = ("http://behoimi.org/post/popular_by_month?month=2&year=2013", { "pattern": r"http://behoimi\.org/data/../../[0-9a-f]{32}\.jpg", "count": 20, }) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/420chan.py0000644000175000017500000000512014176336637020266 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://420chan.org/""" from .common import Extractor, Message class _420chanThreadExtractor(Extractor): """Extractor for 420chan threads""" category = "420chan" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{thread} {title}") archive_fmt = "{board}_{thread}_{filename}" pattern = r"(?:https?://)?boards\.420chan\.org/([^/?#]+)/thread/(\d+)" test = ("https://boards.420chan.org/ani/thread/33251/chow-chows", { "pattern": r"https://boards\.420chan\.org/ani/src/\d+\.jpg", "content": "b07c803b0da78de159709da923e54e883c100934", "count": 2, }) def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "https://api.420chan.org/{}/res/{}.json".format( self.board, self.thread) posts = self.request(url).json()["posts"] data = { "board" : self.board, "thread": self.thread, "title" : posts[0].get("sub") or posts[0]["com"][:50], } yield Message.Directory, data for post in posts: if "filename" in post: post.update(data) post["extension"] = post["ext"][1:] url = "https://boards.420chan.org/{}/src/{}{}".format( post["board"], post["filename"], post["ext"]) yield Message.Url, url, post class _420chanBoardExtractor(Extractor): """Extractor for 420chan boards""" category = "420chan" subcategory = "board" pattern = r"(?:https?://)?boards\.420chan\.org/([^/?#]+)/\d*$" test = ("https://boards.420chan.org/po/", { "pattern": _420chanThreadExtractor.pattern, "count": ">= 100", }) def __init__(self, match): Extractor.__init__(self, match) self.board = match.group(1) def items(self): url = "https://api.420chan.org/{}/threads.json".format(self.board) threads = self.request(url).json() for page in threads: for thread in page["threads"]: url = "https://boards.420chan.org/{}/thread/{}/".format( self.board, thread["no"]) thread["page"] = page["page"] thread["_extractor"] = _420chanThreadExtractor yield Message.Queue, url, thread ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/4chan.py0000644000175000017500000000601414176336637020127 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.4chan.org/""" from .common import Extractor, Message from .. 
import text class _4chanThreadExtractor(Extractor): """Extractor for 4chan threads""" category = "4chan" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{tim} {filename}.{extension}" archive_fmt = "{board}_{thread}_{tim}" pattern = (r"(?:https?://)?boards\.4chan(?:nel)?\.org" r"/([^/]+)/thread/(\d+)") test = ( ("https://boards.4chan.org/tg/thread/15396072/", { "url": "39082ad166161966d7ba8e37f2173a824eb540f0", "keyword": "7ae2f4049adf0d2f835eb91b6b26b7f4ec882e0a", "content": "20b7b51afa51c9c31a0020a0737b889532c8d7ec", }), ("https://boards.4channel.org/tg/thread/15396072/", { "url": "39082ad166161966d7ba8e37f2173a824eb540f0", "keyword": "7ae2f4049adf0d2f835eb91b6b26b7f4ec882e0a", }), ) def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "https://a.4cdn.org/{}/thread/{}.json".format( self.board, self.thread) posts = self.request(url).json()["posts"] title = posts[0].get("sub") or text.remove_html(posts[0]["com"]) data = { "board" : self.board, "thread": self.thread, "title" : text.unescape(title)[:50], } yield Message.Directory, data for post in posts: if "filename" in post: post.update(data) post["extension"] = post["ext"][1:] post["filename"] = text.unescape(post["filename"]) url = "https://i.4cdn.org/{}/{}{}".format( post["board"], post["tim"], post["ext"]) yield Message.Url, url, post class _4chanBoardExtractor(Extractor): """Extractor for 4chan boards""" category = "4chan" subcategory = "board" pattern = r"(?:https?://)?boards\.4chan(?:nel)?\.org/([^/?#]+)/\d*$" test = ("https://boards.4channel.org/po/", { "pattern": _4chanThreadExtractor.pattern, "count": ">= 100", }) def __init__(self, match): Extractor.__init__(self, match) self.board = match.group(1) def items(self): url = "https://a.4cdn.org/{}/threads.json".format(self.board) threads = self.request(url).json() for page in threads: for thread in page["threads"]: url = "https://boards.4chan.org/{}/thread/{}/".format( self.board, thread["no"]) thread["page"] = page["page"] thread["_extractor"] = _4chanThreadExtractor yield Message.Queue, url, thread ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/500px.py0000644000175000017500000004367114176336637020020 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
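# --------------------------------------------------------------------
# Editor's aside: a minimal standalone sketch (not part of gallery-dl)
# of the thread-download flow that _4chanThreadExtractor above
# implements: fetch a thread's JSON from a.4cdn.org and build the
# direct file URLs on i.4cdn.org from each post's 'tim' and 'ext'
# fields. 'requests' is the only dependency; the board/thread defaults
# simply reuse the example thread from the extractor's test data above.
import requests

def list_thread_files(board="tg", thread="15396072"):
    """Yield (filename, url) pairs for every attachment in a thread."""
    api_url = "https://a.4cdn.org/{}/thread/{}.json".format(board, thread)
    posts = requests.get(api_url, timeout=30).json()["posts"]
    for post in posts:
        if "filename" in post:  # only posts that carry a file
            url = "https://i.4cdn.org/{}/{}{}".format(
                board, post["tim"], post["ext"])
            yield post["filename"] + post["ext"], url

if __name__ == "__main__":
    for name, url in list_thread_files():
        print(name, url)
# --------------------------------------------------------------------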
"""Extractors for https://500px.com/""" from .common import Extractor, Message import json BASE_PATTERN = r"(?:https?://)?(?:web\.)?500px\.com" class _500pxExtractor(Extractor): """Base class for 500px extractors""" category = "500px" directory_fmt = ("{category}", "{user[username]}") filename_fmt = "{id}_{name}.{extension}" archive_fmt = "{id}" root = "https://500px.com" cookiedomain = ".500px.com" def __init__(self, match): Extractor.__init__(self, match) self.session.headers["Referer"] = self.root + "/" def items(self): data = self.metadata() for photo in self.photos(): url = photo["images"][-1]["url"] photo["extension"] = photo["image_format"] if data: photo.update(data) yield Message.Directory, photo yield Message.Url, url, photo def metadata(self): """Returns general metadata""" def photos(self): """Returns an iterable containing all relevant photo IDs""" def _extend(self, edges): """Extend photos with additional metadata and higher resolution URLs""" ids = [str(edge["node"]["legacyId"]) for edge in edges] url = "https://api.500px.com/v1/photos" params = { "expanded_user_info" : "true", "include_tags" : "true", "include_geo" : "true", "include_equipment_info": "true", "vendor_photos" : "true", "include_licensing" : "true", "include_releases" : "true", "liked_by" : "1", "following_sample" : "100", "image_size" : "4096", "ids" : ",".join(ids), } photos = self._request_api(url, params)["photos"] return [ photos[pid] for pid in ids if pid in photos or self.log.warning("Unable to fetch photo %s", pid) ] def _request_api(self, url, params): headers = { "Origin": self.root, "x-csrf-token": self.session.cookies.get( "x-csrf-token", domain=".500px.com"), } return self.request(url, headers=headers, params=params).json() def _request_graphql(self, opname, variables): url = "https://api.500px.com/graphql" headers = { "x-csrf-token": self.session.cookies.get( "x-csrf-token", domain=".500px.com"), } data = { "operationName": opname, "variables" : json.dumps(variables), "query" : QUERIES[opname], } return self.request( url, method="POST", headers=headers, json=data).json()["data"] class _500pxUserExtractor(_500pxExtractor): """Extractor for photos from a user's photostream on 500px.com""" subcategory = "user" pattern = BASE_PATTERN + r"/(?!photo/|liked)(?:p/)?([^/?#]+)/?(?:$|[?#])" test = ( ("https://500px.com/p/light_expression_photography", { "pattern": r"https?://drscdn.500px.org/photo/\d+/m%3D4096/v2", "range": "1-99", "count": 99, }), ("https://500px.com/light_expression_photography"), ("https://web.500px.com/light_expression_photography"), ) def __init__(self, match): _500pxExtractor.__init__(self, match) self.user = match.group(1) def photos(self): variables = {"username": self.user, "pageSize": 20} photos = self._request_graphql( "OtherPhotosQuery", variables, )["user"]["photos"] while True: yield from self._extend(photos["edges"]) if not photos["pageInfo"]["hasNextPage"]: return variables["cursor"] = photos["pageInfo"]["endCursor"] photos = self._request_graphql( "OtherPhotosPaginationContainerQuery", variables, )["userByUsername"]["photos"] class _500pxGalleryExtractor(_500pxExtractor): """Extractor for photo galleries on 500px.com""" subcategory = "gallery" directory_fmt = ("{category}", "{user[username]}", "{gallery[name]}") pattern = (BASE_PATTERN + r"/(?!photo/)(?:p/)?" 
r"([^/?#]+)/galleries/([^/?#]+)") test = ( ("https://500px.com/p/fashvamp/galleries/lera", { "url": "002dc81dee5b4a655f0e31ad8349e8903b296df6", "count": 3, "keyword": { "gallery": dict, "user": dict, }, }), ("https://500px.com/fashvamp/galleries/lera"), ) def __init__(self, match): _500pxExtractor.__init__(self, match) self.user_name, self.gallery_name = match.groups() self.user_id = self._photos = None def metadata(self): user = self._request_graphql( "ProfileRendererQuery", {"username": self.user_name}, )["profile"] self.user_id = str(user["legacyId"]) variables = { "galleryOwnerLegacyId": self.user_id, "ownerLegacyId" : self.user_id, "slug" : self.gallery_name, "token" : None, "pageSize" : 20, } gallery = self._request_graphql( "GalleriesDetailQueryRendererQuery", variables, )["gallery"] self._photos = gallery["photos"] del gallery["photos"] return { "gallery": gallery, "user" : user, } def photos(self): photos = self._photos variables = { "ownerLegacyId": self.user_id, "slug" : self.gallery_name, "token" : None, "pageSize" : 20, } while True: yield from self._extend(photos["edges"]) if not photos["pageInfo"]["hasNextPage"]: return variables["cursor"] = photos["pageInfo"]["endCursor"] photos = self._request_graphql( "GalleriesDetailPaginationContainerQuery", variables, )["galleryByOwnerIdAndSlugOrToken"]["photos"] class _500pxFavoriteExtractor(_500pxExtractor): """Extractor for favorite 500px photos""" subcategory = "favorite" pattern = BASE_PATTERN + r"/liked/?$" test = ("https://500px.com/liked",) def photos(self): variables = {"pageSize": 20} photos = self._request_graphql( "LikedPhotosQueryRendererQuery", variables, )["likedPhotos"] while True: yield from self._extend(photos["edges"]) if not photos["pageInfo"]["hasNextPage"]: return variables["cursor"] = photos["pageInfo"]["endCursor"] photos = self._request_graphql( "LikedPhotosPaginationContainerQuery", variables, )["likedPhotos"] class _500pxImageExtractor(_500pxExtractor): """Extractor for individual images from 500px.com""" subcategory = "image" pattern = BASE_PATTERN + r"/photo/(\d+)" test = ("https://500px.com/photo/222049255/queen-of-coasts", { "url": "fbdf7df39325cae02f5688e9f92935b0e7113315", "count": 1, "keyword": { "camera": "Canon EOS 600D", "camera_info": dict, "comments": list, "comments_count": int, "created_at": "2017-08-01T08:40:05+00:00", "description": str, "editored_by": None, "editors_choice": False, "extension": "jpg", "feature": "popular", "feature_date": "2017-08-01T09:58:28+00:00", "focal_length": "208", "height": 3111, "id": 222049255, "image_format": "jpg", "image_url": list, "images": list, "iso": "100", "lens": "EF-S55-250mm f/4-5.6 IS II", "lens_info": dict, "liked": None, "location": None, "location_details": dict, "name": "Queen Of Coasts", "nsfw": False, "privacy": False, "profile": True, "rating": float, "status": 1, "tags": list, "taken_at": "2017-05-04T17:36:51+00:00", "times_viewed": int, "url": "/photo/222049255/Queen-Of-Coasts-by-Olesya-Nabieva", "user": dict, "user_id": 12847235, "votes_count": int, "watermark": True, "width": 4637, }, }) def __init__(self, match): _500pxExtractor.__init__(self, match) self.photo_id = match.group(1) def photos(self): edges = ({"node": {"legacyId": self.photo_id}},) return self._extend(edges) QUERIES = { "OtherPhotosQuery": """\ query OtherPhotosQuery($username: String!, $pageSize: Int) { user: userByUsername(username: $username) { ...OtherPhotosPaginationContainer_user_RlXb8 id } } fragment OtherPhotosPaginationContainer_user_RlXb8 on User { photos(first: 
$pageSize, privacy: PROFILE, sort: ID_DESC) { edges { node { id legacyId canonicalPath width height name isLikedByMe notSafeForWork photographer: uploader { id legacyId username displayName canonicalPath followedByUsers { isFollowedByMe } } images(sizes: [33, 35]) { size url jpegUrl webpUrl id } __typename } cursor } totalCount pageInfo { endCursor hasNextPage } } } """, "OtherPhotosPaginationContainerQuery": """\ query OtherPhotosPaginationContainerQuery($username: String!, $pageSize: Int, $cursor: String) { userByUsername(username: $username) { ...OtherPhotosPaginationContainer_user_3e6UuE id } } fragment OtherPhotosPaginationContainer_user_3e6UuE on User { photos(first: $pageSize, after: $cursor, privacy: PROFILE, sort: ID_DESC) { edges { node { id legacyId canonicalPath width height name isLikedByMe notSafeForWork photographer: uploader { id legacyId username displayName canonicalPath followedByUsers { isFollowedByMe } } images(sizes: [33, 35]) { size url jpegUrl webpUrl id } __typename } cursor } totalCount pageInfo { endCursor hasNextPage } } } """, "ProfileRendererQuery": """\ query ProfileRendererQuery($username: String!) { profile: userByUsername(username: $username) { id legacyId userType: type username firstName displayName registeredAt canonicalPath avatar { ...ProfileAvatar_avatar id } userProfile { firstname lastname state country city about id } socialMedia { website twitter instagram facebook id } coverPhotoUrl followedByUsers { totalCount isFollowedByMe } followingUsers { totalCount } membership { expiryDate membershipTier: tier photoUploadQuota refreshPhotoUploadQuotaAt paymentStatus id } profileTabs { tabs { name visible } } ...EditCover_cover photoStats { likeCount viewCount } photos(privacy: PROFILE) { totalCount } licensingPhotos(status: ACCEPTED) { totalCount } portfolio { id status userDisabled } } } fragment EditCover_cover on User { coverPhotoUrl } fragment ProfileAvatar_avatar on UserAvatar { images(sizes: [MEDIUM, LARGE]) { size url id } } """, "GalleriesDetailQueryRendererQuery": """\ query GalleriesDetailQueryRendererQuery($galleryOwnerLegacyId: ID!, $ownerLegacyId: String, $slug: String, $token: String, $pageSize: Int, $gallerySize: Int) { galleries(galleryOwnerLegacyId: $galleryOwnerLegacyId, first: $gallerySize) { edges { node { legacyId description name privacy canonicalPath notSafeForWork buttonName externalUrl cover { images(sizes: [35, 33]) { size webpUrl jpegUrl id } id } photos { totalCount } id } } } gallery: galleryByOwnerIdAndSlugOrToken(ownerLegacyId: $ownerLegacyId, slug: $slug, token: $token) { ...GalleriesDetailPaginationContainer_gallery_RlXb8 id } } fragment GalleriesDetailPaginationContainer_gallery_RlXb8 on Gallery { id legacyId name privacy notSafeForWork ownPhotosOnly canonicalPath publicSlug lastPublishedAt photosAddedSinceLastPublished reportStatus creator { legacyId id } cover { images(sizes: [33, 32, 36, 2048]) { url size webpUrl id } id } description externalUrl buttonName photos(first: $pageSize) { totalCount edges { cursor node { id legacyId canonicalPath name description category uploadedAt location width height isLikedByMe photographer: uploader { id legacyId username displayName canonicalPath avatar { images(sizes: SMALL) { url id } id } followedByUsers { totalCount isFollowedByMe } } images(sizes: [33, 32]) { size url webpUrl id } __typename } } pageInfo { endCursor hasNextPage } } } """, "GalleriesDetailPaginationContainerQuery": """\ query GalleriesDetailPaginationContainerQuery($ownerLegacyId: String, $slug: String, $token: 
String, $pageSize: Int, $cursor: String) { galleryByOwnerIdAndSlugOrToken(ownerLegacyId: $ownerLegacyId, slug: $slug, token: $token) { ...GalleriesDetailPaginationContainer_gallery_3e6UuE id } } fragment GalleriesDetailPaginationContainer_gallery_3e6UuE on Gallery { id legacyId name privacy notSafeForWork ownPhotosOnly canonicalPath publicSlug lastPublishedAt photosAddedSinceLastPublished reportStatus creator { legacyId id } cover { images(sizes: [33, 32, 36, 2048]) { url size webpUrl id } id } description externalUrl buttonName photos(first: $pageSize, after: $cursor) { totalCount edges { cursor node { id legacyId canonicalPath name description category uploadedAt location width height isLikedByMe photographer: uploader { id legacyId username displayName canonicalPath avatar { images(sizes: SMALL) { url id } id } followedByUsers { totalCount isFollowedByMe } } images(sizes: [33, 32]) { size url webpUrl id } __typename } } pageInfo { endCursor hasNextPage } } } """, "LikedPhotosQueryRendererQuery": """\ query LikedPhotosQueryRendererQuery($pageSize: Int) { ...LikedPhotosPaginationContainer_query_RlXb8 } fragment LikedPhotosPaginationContainer_query_RlXb8 on Query { likedPhotos(first: $pageSize) { edges { node { id legacyId canonicalPath name description category uploadedAt location width height isLikedByMe notSafeForWork tags photographer: uploader { id legacyId username displayName canonicalPath avatar { images { url id } id } followedByUsers { totalCount isFollowedByMe } } images(sizes: [33, 35]) { size url jpegUrl webpUrl id } __typename } cursor } pageInfo { endCursor hasNextPage } } } """, "LikedPhotosPaginationContainerQuery": """\ query LikedPhotosPaginationContainerQuery($cursor: String, $pageSize: Int) { ...LikedPhotosPaginationContainer_query_3e6UuE } fragment LikedPhotosPaginationContainer_query_3e6UuE on Query { likedPhotos(first: $pageSize, after: $cursor) { edges { node { id legacyId canonicalPath name description category uploadedAt location width height isLikedByMe notSafeForWork tags photographer: uploader { id legacyId username displayName canonicalPath avatar { images { url id } id } followedByUsers { totalCount isFollowedByMe } } images(sizes: [33, 35]) { size url jpegUrl webpUrl id } __typename } cursor } pageInfo { endCursor hasNextPage } } } """, } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/8kun.py0000644000175000017500000000643714176336637020030 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://8kun.top/""" from .common import Extractor, Message from .. 
import text class _8kunThreadExtractor(Extractor): """Extractor for 8kun threads""" category = "8kun" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{time}{num:?-//} {filename}.{extension}" archive_fmt = "{board}_{thread}_{tim}" pattern = r"(?:https?://)?8kun\.top/([^/]+)/res/(\d+)" test = ( ("https://8kun.top/test/res/65248.html", { "pattern": r"https://media\.8kun\.top/file_store/\w{64}\.\w+", "count": ">= 8", }), # old-style file URLs (#1101) ("https://8kun.top/d/res/13258.html", { "pattern": r"https://media\.8kun\.top/d/src/\d+(-\d)?\.\w+", "range": "1-20", }), ) def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "https://8kun.top/{}/res/{}.json".format(self.board, self.thread) posts = self.request(url).json()["posts"] title = posts[0].get("sub") or text.remove_html(posts[0]["com"]) process = self._process data = { "board" : self.board, "thread": self.thread, "title" : text.unescape(title)[:50], "num" : 0, } yield Message.Directory, data for post in posts: if "filename" in post: yield process(post, data) if "extra_files" in post: for post["num"], filedata in enumerate( post["extra_files"], 1): yield process(post, filedata) @staticmethod def _process(post, data): post.update(data) post["extension"] = post["ext"][1:] tim = post["tim"] url = ("https://media.8kun.top/" + ("file_store/" if len(tim) > 16 else post["board"] + "/src/") + tim + post["ext"]) return Message.Url, url, post class _8kunBoardExtractor(Extractor): """Extractor for 8kun boards""" category = "8kun" subcategory = "board" pattern = r"(?:https?://)?8kun\.top/([^/?#]+)/(?:index|\d+)\.html" test = ( ("https://8kun.top/v/index.html", { "pattern": _8kunThreadExtractor.pattern, "count": ">= 100", }), ("https://8kun.top/v/2.html"), ("https://8kun.top/v/index.html?PageSpeed=noscript"), ) def __init__(self, match): Extractor.__init__(self, match) self.board = match.group(1) def items(self): url = "https://8kun.top/{}/threads.json".format(self.board) threads = self.request(url).json() for page in threads: for thread in page["threads"]: url = "https://8kun.top/{}/res/{}.html".format( self.board, thread["no"]) thread["page"] = page["page"] thread["_extractor"] = _8kunThreadExtractor yield Message.Queue, url, thread ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/8muses.py0000644000175000017500000001162514176336637020362 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://comics.8muses.com/""" from .common import Extractor, Message from .. 
import text import json class _8musesAlbumExtractor(Extractor): """Extractor for image albums on comics.8muses.com""" category = "8muses" subcategory = "album" directory_fmt = ("{category}", "{album[path]}") filename_fmt = "{page:>03}.{extension}" archive_fmt = "{hash}" root = "https://comics.8muses.com" pattern = (r"(?:https?://)?(?:comics\.|www\.)?8muses\.com" r"(/comics/album/[^?#]+)(\?[^#]+)?") test = ( ("https://comics.8muses.com/comics/album/Fakku-Comics/mogg/Liar", { "url": "6286ac33087c236c5a7e51f8a9d4e4d5548212d4", "pattern": r"https://comics.8muses.com/image/fl/[\w-]+", "keyword": { "url" : str, "hash" : str, "page" : int, "count": 6, "album": { "id" : 10467, "title" : "Liar", "path" : "Fakku Comics/mogg/Liar", "private": False, "url" : str, "parent" : 10464, "views" : int, "likes" : int, "date" : "dt:2018-07-10 00:00:00", }, }, }), ("https://www.8muses.com/comics/album/Fakku-Comics/santa", { "count": ">= 3", "pattern": pattern, "keyword": { "url" : str, "name" : str, "private": False, }, }), # custom sorting ("https://www.8muses.com/comics/album/Fakku-Comics/9?sort=az", { "count": ">= 70", "keyword": {"name": r"re:^[R-Zr-z]"}, }), # non-ASCII characters (("https://comics.8muses.com/comics/album/Various-Authors/Chessire88" "/From-Trainers-to-Pokmons"), { "count": 2, "keyword": {"name": "re:From Trainers to Pokémons"}, }), ) def __init__(self, match): Extractor.__init__(self, match) self.path = match.group(1) self.params = match.group(2) or "" def items(self): url = self.root + self.path + self.params while True: data = self._unobfuscate(text.extract( self.request(url).text, 'id="ractive-public" type="text/plain">', '')[0]) images = data.get("pictures") if images: count = len(images) album = self._make_album(data["album"]) yield Message.Directory, {"album": album, "count": count} for num, image in enumerate(images, 1): url = self.root + "/image/fl/" + image["publicUri"] img = { "url" : url, "page" : num, "hash" : image["publicUri"], "count" : count, "album" : album, "extension": "jpg", } yield Message.Url, url, img albums = data.get("albums") if albums: for album in albums: url = self.root + "/comics/album/" + album["permalink"] yield Message.Queue, url, { "url" : url, "name" : album["name"], "private" : album["isPrivate"], "_extractor": _8musesAlbumExtractor, } if data["page"] >= data["pages"]: return path, _, num = self.path.rstrip("/").rpartition("/") path = path if num.isdecimal() else self.path url = "{}{}/{}{}".format( self.root, path, data["page"] + 1, self.params) def _make_album(self, album): return { "id" : album["id"], "path" : album["path"], "title" : album["name"], "private": album["isPrivate"], "url" : self.root + album["permalink"], "parent" : text.parse_int(album["parentId"]), "views" : text.parse_int(album["numberViews"]), "likes" : text.parse_int(album["numberLikes"]), "date" : text.parse_datetime( album["updatedAt"], "%Y-%m-%dT%H:%M:%S.%fZ"), } @staticmethod def _unobfuscate(data): return json.loads("".join([ chr(33 + (ord(c) + 14) % 94) if "!" <= c <= "~" else c for c in text.unescape(data.strip("\t\n\r !")) ])) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648568192.0 gallery_dl-1.21.1/gallery_dl/extractor/__init__.py0000644000175000017500000001007214220623600020643 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
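# ----------------------------------------------------------------------
# Illustrative note (not part of gallery-dl): the `_unobfuscate` helper
# in _8musesAlbumExtractor above is essentially ROT47 -- each printable
# ASCII character is shifted by 47 places within the '!'..'~' range
# ((ord(c) + 14) % 94 is the same as (ord(c) - 33 + 47) % 94), which
# makes the mapping its own inverse.  A minimal sketch of just that
# character mapping, without the whitespace stripping and JSON decoding:

def _rot47(s):
    """Shift printable ASCII by 47 positions; applying it twice is a no-op."""
    return "".join(
        chr(33 + (ord(c) + 14) % 94) if "!" <= c <= "~" else c
        for c in s
    )


assert _rot47(_rot47('{"album": {"id": 10467}}')) == '{"album": {"id": 10467}}'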
import re modules = [ "2chan", "35photo", "3dbooru", "420chan", "4chan", "500px", "8kun", "8muses", "adultempire", "architizer", "artstation", "aryion", "bbc", "bcy", "behance", "blogger", "comicvine", "cyberdrop", "danbooru", "desktopography", "deviantart", "dynastyscans", "erome", "exhentai", "fallenangels", "fanbox", "fantia", "flickr", "furaffinity", "fuskator", "gelbooru", "gelbooru_v01", "gelbooru_v02", "gfycat", "gofile", "hbrowse", "hentai2read", "hentaicosplays", "hentaifoundry", "hentaifox", "hentaihand", "hentaihere", "hiperdex", "hitomi", "idolcomplex", "imagebam", "imagechest", "imagefap", "imgbb", "imgbox", "imgth", "imgur", "inkbunny", "instagram", "issuu", "kabeuchi", "keenspot", "kemonoparty", "khinsider", "kissgoddess", "kohlchan", "komikcast", "lightroom", "lineblog", "livedoor", "luscious", "mangadex", "mangafox", "mangahere", "mangakakalot", "manganelo", "mangapark", "mangasee", "mangoxo", "mememuseum", "myhentaigallery", "myportfolio", "naver", "naverwebtoon", "newgrounds", "ngomik", "nhentai", "nijie", "nozomi", "nsfwalbum", "paheal", "patreon", "philomena", "photobucket", "photovogue", "picarto", "piczel", "pillowfort", "pinterest", "pixiv", "pixnet", "plurk", "pornhub", "pururin", "reactor", "readcomiconline", "reddit", "redgifs", "rule34us", "sankaku", "sankakucomplex", "seiga", "senmanga", "sexcom", "simplyhentai", "skeb", "slickpic", "slideshare", "smugmug", "speakerdeck", "subscribestar", "tapas", "telegraph", "toyhouse", "tsumino", "tumblr", "tumblrgallery", "twibooru", "twitter", "unsplash", "vanillarock", "vk", "vsco", "wallhaven", "wallpapercave", "warosu", "weasyl", "webtoons", "weibo", "wikiart", "wikieat", "xhamster", "xvideos", "booru", "moebooru", "foolfuuka", "foolslide", "mastodon", "shopify", "lolisafe", "imagehosts", "directlink", "recursive", "oauth", "test", "ytdl", "generic", ] def find(url): """Find a suitable extractor for the given URL""" for cls in _list_classes(): match = cls.pattern.match(url) if match: return cls(match) return None def add(cls): """Add 'cls' to the list of available extractors""" cls.pattern = re.compile(cls.pattern) _cache.append(cls) return cls def add_module(module): """Add all extractors in 'module' to the list of available extractors""" classes = _get_classes(module) for cls in classes: cls.pattern = re.compile(cls.pattern) _cache.extend(classes) return classes def extractors(): """Yield all available extractor classes""" return sorted( _list_classes(), key=lambda x: x.__name__ ) # -------------------------------------------------------------------- # internals _cache = [] _module_iter = iter(modules) def _list_classes(): """Yield all available extractor classes""" yield from _cache globals_ = globals() for module_name in _module_iter: module = __import__(module_name, globals_, None, (), 1) yield from add_module(module) globals_["_list_classes"] = lambda : _cache def _get_classes(module): """Return a list of all extractor classes in a module""" return [ cls for cls in module.__dict__.values() if ( hasattr(cls, "pattern") and cls.__module__ == module.__name__ ) ] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1639190302.0 gallery_dl-1.21.1/gallery_dl/extractor/adultempire.py0000644000175000017500000000421414155007436021432 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
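# ----------------------------------------------------------------------
# Illustrative usage note (not part of this file): the `find()` function
# in gallery_dl/extractor/__init__.py above lazily imports the modules
# listed in `modules`, compiles each class's `pattern`, and returns an
# instance of the first class whose pattern matches the given URL, e.g.
#
#     from gallery_dl import extractor
#     extr = extractor.find("https://boards.4chan.org/tg/thread/15396072/")
#     # -> a _4chanThreadExtractor instance, or None if nothing matches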
"""Extractors for https://www.adultempire.com/""" from .common import GalleryExtractor from .. import text class AdultempireGalleryExtractor(GalleryExtractor): """Extractor for image galleries from www.adultempire.com""" category = "adultempire" root = "https://www.adultempire.com" pattern = (r"(?:https?://)?(?:www\.)?adult(?:dvd)?empire\.com" r"(/(\d+)/gallery\.html)") test = ( ("https://www.adultempire.com/5998/gallery.html", { "range": "1", "keyword": "5b3266e69801db0d78c22181da23bc102886e027", "content": "5c6beb31e5e3cdc90ee5910d5c30f9aaec977b9e", }), ("https://www.adultdvdempire.com/5683/gallery.html", { "url": "b12cd1a65cae8019d837505adb4d6a2c1ed4d70d", "keyword": "8d448d79c4ac5f5b10a3019d5b5129ddb43655e5", }), ) def __init__(self, match): GalleryExtractor.__init__(self, match) self.gallery_id = match.group(2) def metadata(self, page): extr = text.extract_from(page, page.index('
')) return { "gallery_id": text.parse_int(self.gallery_id), "title" : text.unescape(extr('title="', '"')), "studio" : extr(">studio", "<").strip(), "date" : text.parse_datetime(extr( ">released", "<").strip(), "%m/%d/%Y"), "actors" : sorted(text.split_html(extr( '
    ", "<").strip(), "type" : text.unescape(text.remove_html(extr( '
    Type
    ', 'STATUS
', 'YEAR', 'SIZE', '', '') .replace("
", "\n")), } def images(self, page): return [ (url, None) for url in text.extract_iter( page, "property='og:image:secure_url' content='", "?") ] class ArchitizerFirmExtractor(Extractor): """Extractor for all projects of a firm""" category = "architizer" subcategory = "firm" root = "https://architizer.com" pattern = r"(?:https?://)?architizer\.com/firms/([^/?#]+)" test = ("https://architizer.com/firms/olson-kundig/", { "pattern": ArchitizerProjectExtractor.pattern, "count": ">= 90", }) def __init__(self, match): Extractor.__init__(self, match) self.firm = match.group(1) def items(self): url = url = "{}/firms/{}/?requesting_merlin=pages".format( self.root, self.firm) page = self.request(url).text data = {"_extractor": ArchitizerProjectExtractor} for project in text.extract_iter(page, '
= data["total_count"]: return params["page"] += 1 @staticmethod def _no_cache(url, alphabet=(string.digits + string.ascii_letters)): """Cause a cache miss to prevent Cloudflare 'optimizations' Cloudflare's 'Polish' optimization strips image metadata and may even recompress an image as lossy JPEG. This can be prevented by causing a cache miss when requesting an image by adding a random dummy query parameter. Ref: https://github.com/r888888888/danbooru/issues/3528 https://danbooru.donmai.us/forum_topics/14952 """ param = "gallerydl_no_cache=" + util.bencode( random.getrandbits(64), alphabet) sep = "&" if "?" in url else "?" return url + sep + param class ArtstationUserExtractor(ArtstationExtractor): """Extractor for all projects of an artstation user""" subcategory = "user" pattern = (r"(?:https?://)?(?:(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)(?:/albums/all)?" r"|((?!www)\w+)\.artstation\.com(?:/projects)?)/?$") test = ( ("https://www.artstation.com/gaerikim/", { "pattern": r"https://\w+\.artstation\.com/p/assets/images" r"/images/\d+/\d+/\d+/(4k|large|medium|small)/[^/]+", "count": ">= 6", }), ("https://www.artstation.com/gaerikim/albums/all/"), ("https://gaerikim.artstation.com/"), ("https://gaerikim.artstation.com/projects/"), ) def projects(self): url = "{}/users/{}/projects.json".format(self.root, self.user) params = {"album_id": "all"} return self._pagination(url, params) class ArtstationAlbumExtractor(ArtstationExtractor): """Extractor for all projects in an artstation album""" subcategory = "album" directory_fmt = ("{category}", "{userinfo[username]}", "Albums", "{album[id]} - {album[title]}") archive_fmt = "a_{album[id]}_{asset[id]}" pattern = (r"(?:https?://)?(?:(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)" r"|((?!www)\w+)\.artstation\.com)/albums/(\d+)") test = ( ("https://www.artstation.com/huimeiye/albums/770899", { "count": 2, }), ("https://www.artstation.com/huimeiye/albums/770898", { "exception": exception.NotFoundError, }), ("https://huimeiye.artstation.com/albums/770899"), ) def __init__(self, match): ArtstationExtractor.__init__(self, match) self.album_id = text.parse_int(match.group(3)) def metadata(self): userinfo = self.get_user_info(self.user) album = None for album in userinfo["albums_with_community_projects"]: if album["id"] == self.album_id: break else: raise exception.NotFoundError("album") return { "userinfo": userinfo, "album": album } def projects(self): url = "{}/users/{}/projects.json".format(self.root, self.user) params = {"album_id": self.album_id} return self._pagination(url, params) class ArtstationLikesExtractor(ArtstationExtractor): """Extractor for liked projects of an artstation user""" subcategory = "likes" directory_fmt = ("{category}", "{userinfo[username]}", "Likes") archive_fmt = "f_{userinfo[id]}_{asset[id]}" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/likes/?") test = ( ("https://www.artstation.com/mikf/likes", { "pattern": r"https://\w+\.artstation\.com/p/assets/images" r"/images/\d+/\d+/\d+/(4k|large|medium|small)/[^/]+", "count": 6, }), # no likes ("https://www.artstation.com/sungchoi/likes", { "count": 0, }), ) def projects(self): url = "{}/users/{}/likes.json".format(self.root, self.user) return self._pagination(url) class ArtstationChallengeExtractor(ArtstationExtractor): """Extractor for submissions of artstation challenges""" subcategory = "challenge" filename_fmt = "{submission_id}_{asset_id}_{filename}.{extension}" directory_fmt = 
("{category}", "Challenges", "{challenge[id]} - {challenge[title]}") archive_fmt = "c_{challenge[id]}_{asset_id}" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/contests/[^/?#]+/challenges/(\d+)" r"/?(?:\?sorting=([a-z]+))?") test = ( ("https://www.artstation.com/contests/thu-2017/challenges/20"), (("https://www.artstation.com/contests/beyond-human" "/challenges/23?sorting=winners"), { "range": "1-30", "count": 30, }), ) def __init__(self, match): ArtstationExtractor.__init__(self, match) self.challenge_id = match.group(1) self.sorting = match.group(2) or "popular" def items(self): challenge_url = "{}/contests/_/challenges/{}.json".format( self.root, self.challenge_id) submission_url = "{}/contests/_/challenges/{}/submissions.json".format( self.root, self.challenge_id) update_url = "{}/contests/submission_updates.json".format( self.root) challenge = self.request(challenge_url).json() yield Message.Directory, {"challenge": challenge} params = {"sorting": self.sorting} for submission in self._pagination(submission_url, params): params = {"submission_id": submission["id"]} for update in self._pagination(update_url, params=params): del update["replies"] update["challenge"] = challenge for url in text.extract_iter( update["body_presentation_html"], ' href="', '"'): update["asset_id"] = self._id_from_url(url) text.nameext_from_url(url, update) yield Message.Url, self._no_cache(url), update @staticmethod def _id_from_url(url): """Get an image's submission ID from its URL""" parts = url.split("/") return text.parse_int("".join(parts[7:10])) class ArtstationSearchExtractor(ArtstationExtractor): """Extractor for artstation search results""" subcategory = "search" directory_fmt = ("{category}", "Searches", "{search[query]}") archive_fmt = "s_{search[query]}_{asset[id]}" pattern = (r"(?:https?://)?(?:\w+\.)?artstation\.com" r"/search/?\?([^#]+)") test = ("https://www.artstation.com/search?q=ancient&sort_by=rank", { "range": "1-20", "count": 20, }) def __init__(self, match): ArtstationExtractor.__init__(self, match) query = text.parse_query(match.group(1)) self.query = query.get("q", "") self.sorting = query.get("sort_by", "rank").lower() def metadata(self): return {"search": { "query" : self.query, "sorting": self.sorting, }} def projects(self): url = "{}/api/v2/search/projects.json".format(self.root) return self._pagination(url, json={ "additional_fields": "[]", "filters" : "[]", "page" : None, "per_page" : "50", "pro_first" : "1", "query" : self.query, "sorting" : self.sorting, }) class ArtstationArtworkExtractor(ArtstationExtractor): """Extractor for projects on artstation's artwork page""" subcategory = "artwork" directory_fmt = ("{category}", "Artworks", "{artwork[sorting]!c}") archive_fmt = "A_{asset[id]}" pattern = (r"(?:https?://)?(?:\w+\.)?artstation\.com" r"/artwork/?\?([^#]+)") test = ("https://www.artstation.com/artwork?sorting=latest", { "range": "1-20", "count": 20, }) def __init__(self, match): ArtstationExtractor.__init__(self, match) self.query = text.parse_query(match.group(1)) def metadata(self): return {"artwork": self.query} def projects(self): url = "{}/projects.json".format(self.root) return self._pagination(url, self.query.copy()) class ArtstationImageExtractor(ArtstationExtractor): """Extractor for images from a single artstation project""" subcategory = "image" pattern = (r"(?:https?://)?(?:" r"(?:\w+\.)?artstation\.com/(?:artwork|projects|search)" r"|artstn\.co/p)/(\w+)") test = ( ("https://www.artstation.com/artwork/LQVJr", { "pattern": 
r"https?://\w+\.artstation\.com/p/assets" r"/images/images/008/760/279/4k/.+", "content": "7b113871465fdc09d127adfdc2767d51cf45a7e9", # SHA1 hash without _no_cache() # "content": "44b80f9af36d40efc5a2668cdd11d36d6793bae9", }), # multiple images per project ("https://www.artstation.com/artwork/Db3dy", { "count": 4, }), # embedded youtube video ("https://www.artstation.com/artwork/g4WPK", { "range": "2", "options": (("external", True),), "pattern": "ytdl:https://www.youtube.com/embed/JNFfJtwwrU0", }), # alternate URL patterns ("https://sungchoi.artstation.com/projects/LQVJr"), ("https://artstn.co/p/LQVJr"), ) def __init__(self, match): ArtstationExtractor.__init__(self, match) self.project_id = match.group(1) self.assets = None def metadata(self): self.assets = list(ArtstationExtractor.get_project_assets( self, self.project_id)) self.user = self.assets[0]["user"]["username"] return ArtstationExtractor.metadata(self) def projects(self): return ({"hash_id": self.project_id},) def get_project_assets(self, project_id): return self.assets class ArtstationFollowingExtractor(ArtstationExtractor): """Extractor for a user's followed users""" subcategory = "following" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/following") test = ("https://www.artstation.com/gaerikim/following", { "pattern": ArtstationUserExtractor.pattern, "count": ">= 50", }) def items(self): url = "{}/users/{}/following.json".format(self.root, self.user) for user in self._pagination(url): url = "{}/{}".format(self.root, user["username"]) user["_extractor"] = ArtstationUserExtractor yield Message.Queue, url, user ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/extractor/aryion.py0000644000175000017500000002217014220623232020410 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://aryion.com/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache from email.utils import parsedate_tz from datetime import datetime BASE_PATTERN = r"(?:https?://)?(?:www\.)?aryion\.com/g4" class AryionExtractor(Extractor): """Base class for aryion extractors""" category = "aryion" directory_fmt = ("{category}", "{user!l}", "{path:J - }") filename_fmt = "{id} {title}.{extension}" archive_fmt = "{id}" cookiedomain = ".aryion.com" cookienames = ("phpbb3_rl7a3_sid",) root = "https://aryion.com" def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) self.recursive = True def login(self): if self._check_cookies(self.cookienames): return username, password = self._get_auth_info() if username: self._update_cookies(self._login_impl(username, password)) @cache(maxage=14*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/forum/ucp.php?mode=login" data = { "username": username, "password": password, "login": "Login", } response = self.request(url, method="POST", data=data) if b"You have been successfully logged in." 
not in response.content: raise exception.AuthenticationError() return {c: response.cookies[c] for c in self.cookienames} def items(self): self.login() data = self.metadata() for post_id in self.posts(): post = self._parse_post(post_id) if post: if data: post.update(data) yield Message.Directory, post yield Message.Url, post["url"], post elif post is False and self.recursive: base = self.root + "/g4/view/" data = {"_extractor": AryionPostExtractor} for post_id in self._pagination_params(base + post_id): yield Message.Queue, base + post_id, data def posts(self): """Yield relevant post IDs""" def metadata(self): """Return general metadata""" def _pagination_params(self, url, params=None): if params is None: params = {"p": 1} else: params["p"] = text.parse_int(params.get("p"), 1) while True: page = self.request(url, params=params).text cnt = 0 for post_id in text.extract_iter( page, "class='gallery-item' id='", "'"): cnt += 1 yield post_id if cnt < 40: return params["p"] += 1 def _pagination_next(self, url): while True: page = self.request(url).text yield from text.extract_iter(page, "thumb' href='/g4/view/", "'") pos = page.find("Next >>") if pos < 0: return url = self.root + text.rextract(page, "href='", "'", pos)[0] def _parse_post(self, post_id): url = "{}/g4/data.php?id={}".format(self.root, post_id) with self.request(url, method="HEAD", fatal=False) as response: if response.status_code >= 400: self.log.warning( "Unable to fetch post %s ('%s %s')", post_id, response.status_code, response.reason) return None headers = response.headers # folder if headers["content-type"] in ( "application/x-folder", "application/x-comic-folder", "application/x-comic-folder-nomerge", ): return False # get filename from 'Content-Disposition' header cdis = headers["content-disposition"] fname, _, ext = text.extract( cdis, 'filename="', '"')[0].rpartition(".") if not fname: fname, ext = ext, fname # get file size from 'Content-Length' header clen = headers.get("content-length") # fix 'Last-Modified' header lmod = headers["last-modified"] if lmod[22] != ":": lmod = "{}:{} GMT".format(lmod[:22], lmod[22:24]) post_url = "{}/g4/view/{}".format(self.root, post_id) extr = text.extract_from(self.request(post_url).text) title, _, artist = text.unescape(extr( "g4 :: ", "<")).rpartition(" by ") return { "id" : text.parse_int(post_id), "url" : url, "user" : self.user or artist, "title" : title, "artist": artist, "path" : text.split_html(extr( "cookiecrumb'>", '</span'))[4:-1:2], "date" : datetime(*parsedate_tz(lmod)[:6]), "size" : text.parse_int(clen), "views" : text.parse_int(extr("Views</b>:", "<").replace(",", "")), "width" : text.parse_int(extr("Resolution</b>:", "x")), "height": text.parse_int(extr("", "<")), "comments" : text.parse_int(extr("Comments</b>:", "<")), "favorites": text.parse_int(extr("Favorites</b>:", "<")), "tags" : text.split_html(extr("class='taglist'>", "</span>")), "description": text.unescape(text.remove_html(extr( "<p>", "</p>"), "", "")), "filename" : fname, "extension": ext, "_mtime" : lmod, } class AryionGalleryExtractor(AryionExtractor): """Extractor for a user's gallery on eka's portal""" subcategory = "gallery" categorytransfer = True pattern = BASE_PATTERN + r"/(?:gallery/|user/|latest.php\?name=)([^/?#]+)" test = ( ("https://aryion.com/g4/gallery/jameshoward", { "options": (("recursive", False),), "pattern": r"https://aryion\.com/g4/data\.php\?id=\d+$", "range": "48-52", "count": 5, }), ("https://aryion.com/g4/user/jameshoward"), ("https://aryion.com/g4/latest.php?name=jameshoward"), ) 
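# Note on the 'recursive' option read in __init__ below: it selects between
# two listing modes -- the paginated /g4/gallery/<user> view, in which folder
# posts are queued as separate sub-extractors (see the base class's items()),
# and the flat /g4/latest.php?name=<user> feed, which allows skipping ahead
# by an item offset.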
def __init__(self, match): AryionExtractor.__init__(self, match) self.recursive = self.config("recursive", True) self.offset = 0 def skip(self, num): if self.recursive: return 0 self.offset += num return num def posts(self): if self.recursive: url = "{}/g4/gallery/{}".format(self.root, self.user) return self._pagination_params(url) else: url = "{}/g4/latest.php?name={}".format(self.root, self.user) return util.advance(self._pagination_next(url), self.offset) class AryionTagExtractor(AryionExtractor): """Extractor for tag searches on eka's portal""" subcategory = "tag" directory_fmt = ("{category}", "tags", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/tags\.php\?([^#]+)" test = ("https://aryion.com/g4/tags.php?tag=star+wars&p=19", { "count": ">= 5", }) def metadata(self): self.params = text.parse_query(self.user) self.user = None return {"search_tags": self.params.get("tag")} def posts(self): url = self.root + "/g4/tags.php" return self._pagination_params(url, self.params) class AryionPostExtractor(AryionExtractor): """Extractor for individual posts on eka's portal""" subcategory = "post" pattern = BASE_PATTERN + r"/view/(\d+)" test = ( ("https://aryion.com/g4/view/510079", { "url": "f233286fa5558c07ae500f7f2d5cb0799881450e", "keyword": { "artist" : "jameshoward", "user" : "jameshoward", "filename" : "jameshoward-510079-subscribestar_150", "extension": "jpg", "id" : 510079, "width" : 1665, "height" : 1619, "size" : 784239, "title" : "I'm on subscribestar now too!", "description": r"re:Doesn't hurt to have a backup, right\?", "tags" : ["Non-Vore", "subscribestar"], "date" : "dt:2019-02-16 19:30:34", "path" : [], "views" : int, "favorites": int, "comments" : int, "_mtime" : "Sat, 16 Feb 2019 19:30:34 GMT", }, }), # x-folder (#694) ("https://aryion.com/g4/view/588928", { "pattern": pattern, "count": ">= 8", }), # x-comic-folder (#945) ("https://aryion.com/g4/view/537379", { "pattern": pattern, "count": 2, }), ) def posts(self): post_id, self.user = self.user, None return (post_id,) ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/bbc.py�������������������������������������������������������0000644�0001750�0001750�00000007144�14176336637�017665� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bbc.co.uk/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util import json BASE_PATTERN = r"(?:https?://)?(?:www\.)?bbc\.co\.uk(/programmes/" class BbcGalleryExtractor(GalleryExtractor): """Extractor for a programme gallery on bbc.co.uk""" category = "bbc" root = "https://www.bbc.co.uk" directory_fmt = ("{category}", "{path[0]}", "{path[1]}", "{path[2]}", "{path[3:]:J - /}") filename_fmt = "{num:>02}.{extension}" archive_fmt = "{programme}_{num}" pattern = BASE_PATTERN + r"[^/?#]+(?!/galleries)(?:/[^/?#]+)?)$" test = ( ("https://www.bbc.co.uk/programmes/p084qtzs/p085g9kg", { "pattern": r"https://ichef\.bbci\.co\.uk" r"/images/ic/1920xn/\w+\.jpg", "count": 37, "keyword": { "programme": "p084qtzs", "path": ["BBC One", "Doctor Who", "The Timeless Children"], }, }), ("https://www.bbc.co.uk/programmes/p084qtzs"), ) def metadata(self, page): data = json.loads(text.extract( page, '<script type="application/ld+json">', '</script>')[0]) return { "programme": self.gallery_url.split("/")[4], "path": list(util.unique_sequence( element["name"] for element in data["itemListElement"] )), } def images(self, page): width = self.config("width") width = width - width % 16 if width else 1920 dimensions = "/{}xn/".format(width) return [ (src.replace("/320x180_b/", dimensions), {"_fallback": self._fallback_urls(src, width)}) for src in text.extract_iter(page, 'data-image-src="', '"') ] @staticmethod def _fallback_urls(src, max_width): front, _, back = src.partition("/320x180_b/") for width in (1920, 1600, 1280, 976): if width < max_width: yield "{}/{}xn/{}".format(front, width, back) class BbcProgrammeExtractor(Extractor): """Extractor for all galleries of a bbc programme""" category = "bbc" subcategory = "programme" root = "https://www.bbc.co.uk" pattern = BASE_PATTERN + r"[^/?#]+/galleries)(?:/?\?page=(\d+))?" 
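# Note: BASE_PATTERN above deliberately leaves a capture group open
# ("(/programmes/"); each subclass pattern closes it, so match.group(1)
# always carries the full "/programmes/..." path (self.path below).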
test = ( ("https://www.bbc.co.uk/programmes/b006q2x0/galleries", { "pattern": BbcGalleryExtractor.pattern, "range": "1-50", "count": ">= 50", }), ("https://www.bbc.co.uk/programmes/b006q2x0/galleries?page=40", { "pattern": BbcGalleryExtractor.pattern, "count": ">= 100", }), ) def __init__(self, match): Extractor.__init__(self, match) self.path, self.page = match.groups() def items(self): data = {"_extractor": BbcGalleryExtractor} params = {"page": text.parse_int(self.page, 1)} galleries_url = self.root + self.path while True: page = self.request(galleries_url, params=params).text for programme_id in text.extract_iter( page, '<a href="https://www.bbc.co.uk/programmes/', '"'): url = "https://www.bbc.co.uk/programmes/" + programme_id yield Message.Queue, url, data if 'rel="next"' not in page: return params["page"] += 1 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/bcy.py�������������������������������������������������������0000644�0001750�0001750�00000015340�14176336637�017711� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2020-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bcy.net/""" from .common import Extractor, Message from .. 
import text, exception import json import re class BcyExtractor(Extractor): """Base class for bcy extractors""" category = "bcy" directory_fmt = ("{category}", "{user[id]} {user[name]}") filename_fmt = "{post[id]} {id}.{extension}" archive_fmt = "{post[id]}_{id}" root = "https://bcy.net" def __init__(self, match): Extractor.__init__(self, match) self.item_id = match.group(1) def items(self): sub = re.compile(r"^https?://p\d+-bcy\.byteimg\.com/img/banciyuan").sub iroot = "https://img-bcy-qn.pstatp.com" noop = self.config("noop") for post in self.posts(): if not post["image_list"]: continue multi = None tags = post.get("post_tags") or () data = { "user": { "id" : post["uid"], "name" : post["uname"], "avatar" : sub(iroot, post["avatar"].partition("~")[0]), }, "post": { "id" : text.parse_int(post["item_id"]), "tags" : [t["tag_name"] for t in tags], "date" : text.parse_timestamp(post["ctime"]), "parody" : post["work"], "content": post["plain"], "likes" : post["like_count"], "shares" : post["share_count"], "replies": post["reply_count"], }, } yield Message.Directory, data for data["num"], image in enumerate(post["image_list"], 1): data["id"] = image["mid"] data["width"] = image["w"] data["height"] = image["h"] url = image["path"].partition("~")[0] text.nameext_from_url(url, data) if data["extension"]: if not url.startswith(iroot): url = sub(iroot, url) data["filter"] = "" yield Message.Url, url, data else: if not multi: if len(post["multi"]) < len(post["image_list"]): multi = self._data_from_post(post["item_id"]) multi = multi["post_data"]["multi"] else: multi = post["multi"] image = multi[data["num"] - 1] if image["origin"]: data["filter"] = "watermark" yield Message.Url, image["origin"], data if noop: data["extension"] = "" data["filter"] = "noop" yield Message.Url, image["original_path"], data def posts(self): """Returns an iterable with all relevant 'post' objects""" def _data_from_post(self, post_id): url = "{}/item/detail/{}".format(self.root, post_id) page = self.request(url, notfound="post").text return json.loads( text.extract(page, 'JSON.parse("', '");')[0] .replace('\\\\u002F', '/') .replace('\\"', '"') )["detail"] class BcyUserExtractor(BcyExtractor): """Extractor for user timelines""" subcategory = "user" pattern = r"(?:https?://)?bcy\.net/u/(\d+)" test = ( ("https://bcy.net/u/1933712", { "pattern": r"https://img-bcy-qn.pstatp.com/\w+/\d+/post/\w+/.+jpg", "count": ">= 20", }), ("https://bcy.net/u/109282764041", { "pattern": r"https://p\d-bcy.byteimg.com/img/banciyuan/[0-9a-f]+" r"~tplv-banciyuan-logo-v3:.+\.image", "range": "1-25", "count": 25, }), ) def posts(self): url = self.root + "/apiv3/user/selfPosts" params = {"uid": self.item_id, "since": None} while True: data = self.request(url, params=params).json() try: items = data["data"]["items"] except KeyError: return if not items: return for item in items: yield item["item_detail"] params["since"] = item["since"] class BcyPostExtractor(BcyExtractor): """Extractor for individual posts""" subcategory = "post" pattern = r"(?:https?://)?bcy\.net/item/detail/(\d+)" test = ( ("https://bcy.net/item/detail/6355835481002893070", { "url": "301202375e61fd6e0e2e35de6c3ac9f74885dec3", "count": 1, "keyword": { "user": { "id" : 1933712, "name" : "wukloo", "avatar" : "re:https://img-bcy-qn.pstatp.com/Public/", }, "post": { "id" : 6355835481002893070, "tags" : list, "date" : "dt:2016-11-22 08:47:46", "parody" : "东方PROJECT", "content": "re:根据微博的建议稍微做了点修改", "likes" : int, "shares" : int, "replies": int, }, "id": 8330182, "num": 1, "width" : 3000, 
"height": 1687, "filename": "712e0780b09011e696f973c3d1568337", "extension": "jpg", }, }), # only watermarked images available ("https://bcy.net/item/detail/6950136331708144648", { "pattern": r"https://p\d-bcy.byteimg.com/img/banciyuan/[0-9a-f]+" r"~tplv-banciyuan-logo-v3:.+\.image", "count": 8, "keyword": {"filter": "watermark"}, }), # deleted ("https://bcy.net/item/detail/6780546160802143236", { "exception": exception.NotFoundError, "count": 0, }), # only visible to logged in users ("https://bcy.net/item/detail/6747523535150783495", { "count": 0, }), ) def posts(self): try: data = self._data_from_post(self.item_id) except KeyError: return () post = data["post_data"] post["image_list"] = post["multi"] post["plain"] = text.parse_unicode_escapes(post["plain"]) post.update(data["detail_user"]) return (post,) ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1646253139.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/behance.py���������������������������������������������������0000644�0001750�0001750�00000026620�14207752123�020507� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract images from https://www.behance.net/""" from .common import Extractor, Message from .. 
import text import json class BehanceExtractor(Extractor): """Base class for behance extractors""" category = "behance" root = "https://www.behance.net" def items(self): for gallery in self.galleries(): gallery["_extractor"] = BehanceGalleryExtractor yield Message.Queue, gallery["url"], self._update(gallery) def galleries(self): """Return all relevant gallery URLs""" @staticmethod def _update(data): # compress data to simple lists if data["fields"] and isinstance(data["fields"][0], dict): data["fields"] = [ field.get("name") or field.get("label") for field in data["fields"] ] data["owners"] = [ owner.get("display_name") or owner.get("displayName") for owner in data["owners"] ] tags = data.get("tags") or () if tags and isinstance(tags[0], dict): tags = [tag["title"] for tag in tags] data["tags"] = tags # backwards compatibility data["gallery_id"] = data["id"] data["title"] = data["name"] data["user"] = ", ".join(data["owners"]) return data class BehanceGalleryExtractor(BehanceExtractor): """Extractor for image galleries from www.behance.net""" subcategory = "gallery" directory_fmt = ("{category}", "{owners:J, }", "{id} {name}") filename_fmt = "{category}_{id}_{num:>02}.{extension}" archive_fmt = "{id}_{num}" pattern = r"(?:https?://)?(?:www\.)?behance\.net/gallery/(\d+)" test = ( ("https://www.behance.net/gallery/17386197/A-Short-Story", { "count": 2, "url": "ab79bd3bef8d3ae48e6ac74fd995c1dfaec1b7d2", "keyword": { "id": 17386197, "name": 're:"Hi". A short story about the important things ', "owners": ["Place Studio", "Julio César Velazquez"], "fields": ["Animation", "Character Design", "Directing"], "tags": list, "module": dict, }, }), ("https://www.behance.net/gallery/21324767/Nevada-City", { "count": 6, "url": "0258fe194fe7d828d6f2c7f6086a9a0a4140db1d", "keyword": {"owners": ["Alex Strohl"]}, }), # 'media_collection' modules ("https://www.behance.net/gallery/88276087/Audi-R8-RWD", { "count": 20, "url": "6bebff0d37f85349f9ad28bd8b76fd66627c1e2f", }), # 'video' modules (#1282) ("https://www.behance.net/gallery/101185577/COLCCI", { "pattern": r"ytdl:https://cdn-prod-ccv\.adobe\.com/", "count": 3, }), ) def __init__(self, match): BehanceExtractor.__init__(self, match) self.gallery_id = match.group(1) def items(self): data = self.get_gallery_data() imgs = self.get_images(data) data["count"] = len(imgs) yield Message.Directory, data for data["num"], (url, module) in enumerate(imgs, 1): data["module"] = module data["extension"] = text.ext_from_url(url) yield Message.Url, url, data def get_gallery_data(self): """Collect gallery info dict""" url = "{}/gallery/{}/a".format(self.root, self.gallery_id) cookies = { "_evidon_consent_cookie": '{"consent_date":"2019-01-31T09:41:15.132Z"}', "bcp": "4c34489d-914c-46cd-b44c-dfd0e661136d", "gk_suid": "66981391", "gki": '{"feature_project_view":false,' '"feature_discover_login_prompt":false,' '"feature_project_login_prompt":false}', "ilo0": "true", } page = self.request(url, cookies=cookies).text data = json.loads(text.extract( page, 'id="beconfig-store_state">', '</script>')[0]) return self._update(data["project"]["project"]) def get_images(self, data): """Extract image results from an API response""" result = [] append = result.append for module in data["modules"]: mtype = module["type"] if mtype == "image": url = module["sizes"]["original"] append((url, module)) elif mtype == "video": page = self.request(module["src"]).text url = text.extract(page, '<source src="', '"')[0] if text.ext_from_url(url) == "m3u8": url = "ytdl:" + url append((url, module)) elif 
mtype == "media_collection": for component in module["components"]: url = component["sizes"]["source"] append((url, module)) elif mtype == "embed": embed = module.get("original_embed") or module.get("embed") if embed: url = "ytdl:" + text.extract(embed, 'src="', '"')[0] append((url, module)) return result class BehanceUserExtractor(BehanceExtractor): """Extractor for a user's galleries from www.behance.net""" subcategory = "user" categorytransfer = True pattern = r"(?:https?://)?(?:www\.)?behance\.net/([^/?#]+)/?$" test = ("https://www.behance.net/alexstrohl", { "count": ">= 8", "pattern": BehanceGalleryExtractor.pattern, }) def __init__(self, match): BehanceExtractor.__init__(self, match) self.user = match.group(1) def galleries(self): url = "{}/{}/projects".format(self.root, self.user) params = {"offset": 0} headers = {"X-Requested-With": "XMLHttpRequest"} while True: data = self.request(url, params=params, headers=headers).json() work = data["profile"]["activeSection"]["work"] yield from work["projects"] if not work["hasMore"]: return params["offset"] += len(work["projects"]) class BehanceCollectionExtractor(BehanceExtractor): """Extractor for a collection's galleries from www.behance.net""" subcategory = "collection" categorytransfer = True pattern = r"(?:https?://)?(?:www\.)?behance\.net/collection/(\d+)" test = ("https://www.behance.net/collection/71340149/inspiration", { "count": ">= 145", "pattern": BehanceGalleryExtractor.pattern, }) def __init__(self, match): BehanceExtractor.__init__(self, match) self.collection_id = match.group(1) def galleries(self): url = self.root + "/v3/graphql" headers = { "Origin" : self.root, "Referer": self.root + "/collection/" + self.collection_id, "X-BCP" : "4c34489d-914c-46cd-b44c-dfd0e661136d", "X-NewRelic-ID" : "VgUFVldbGwsFU1BRDwUBVw==", "X-Requested-With": "XMLHttpRequest", } cookies = { "bcp" : "4c34489d-914c-46cd-b44c-dfd0e661136d", "gk_suid": "66981391", "ilo0" : "true", } query = """ query GetMoodboardItemsAndRecommendations( $id: Int! $firstItem: Int! $afterItem: String $shouldGetRecommendations: Boolean! $shouldGetItems: Boolean! $shouldGetMoodboardFields: Boolean! 
) { viewer @include(if: $shouldGetMoodboardFields) { isOptedOutOfRecommendations isAdmin } moodboard(id: $id) { ...moodboardFields @include(if: $shouldGetMoodboardFields) items(first: $firstItem, after: $afterItem) @include(if: $shouldGetItems) { pageInfo { endCursor hasNextPage } nodes { ...nodesFields } } recommendedItems(first: 80) @include(if: $shouldGetRecommendations) { nodes { ...nodesFields fetchSource } } } } fragment moodboardFields on Moodboard { id label privacy followerCount isFollowing projectCount url isOwner owners { id displayName url firstName location locationUrl isFollowing images { size_50 { url } size_100 { url } size_115 { url } size_230 { url } size_138 { url } size_276 { url } } } } fragment projectFields on Project { id isOwner publishedOn matureAccess hasMatureContent modifiedOn name url isPrivate slug license { license description id label url text images } fields { label } colors { r g b } owners { url displayName id location locationUrl isProfileOwner isFollowing images { size_50 { url } size_100 { url } size_115 { url } size_230 { url } size_138 { url } size_276 { url } } } covers { size_original { url } size_max_808 { url } size_808 { url } size_404 { url } size_202 { url } size_230 { url } size_115 { url } } stats { views { all } appreciations { all } comments { all } } } fragment exifDataValueFields on exifDataValue { id label value searchValue } fragment nodesFields on MoodboardItem { id entityType width height flexWidth flexHeight images { size url } entity { ... on Project { ...projectFields } ... on ImageModule { project { ...projectFields } colors { r g b } exifData { lens { ...exifDataValueFields } software { ...exifDataValueFields } makeAndModel { ...exifDataValueFields } focalLength { ...exifDataValueFields } iso { ...exifDataValueFields } location { ...exifDataValueFields } flash { ...exifDataValueFields } exposureMode { ...exifDataValueFields } shutterSpeed { ...exifDataValueFields } aperture { ...exifDataValueFields } } } ... 
on MediaCollectionComponent {
        project { ...projectFields }
      }
    }
  }
"""
        variables = {
            "afterItem": "MAo=",
            "firstItem": 40,
            "id"       : int(self.collection_id),
            "shouldGetItems"          : True,
            "shouldGetMoodboardFields": False,
            "shouldGetRecommendations": False,
        }
        data = {"query": query, "variables": variables}

        while True:
            items = self.request(
                url, method="POST", headers=headers, cookies=cookies,
                json=data,
            ).json()["data"]["moodboard"]["items"]

            for node in items["nodes"]:
                yield node["entity"]

            if not items["pageInfo"]["hasNextPage"]:
                return
            variables["afterItem"] = items["pageInfo"]["endCursor"]


gallery_dl-1.21.1/gallery_dl/extractor/blogger.py

# -*- coding: utf-8 -*-

# Copyright 2019-2022 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for Blogger blogs"""

from .common import Extractor, Message
from ..
import text import json import re BASE_PATTERN = ( r"(?:blogger:(?:https?://)?([^/]+)|" r"(?:https?://)?([\w-]+\.blogspot\.com))") class BloggerExtractor(Extractor): """Base class for blogger extractors""" category = "blogger" directory_fmt = ("{category}", "{blog[name]}", "{post[date]:%Y-%m-%d} {post[title]}") filename_fmt = "{num:>03}.{extension}" archive_fmt = "{post[id]}_{num}" root = "https://www.blogger.com" def __init__(self, match): Extractor.__init__(self, match) self.videos = self.config("videos", True) self.blog = match.group(1) or match.group(2) self.api = BloggerAPI(self) def items(self): blog = self.api.blog_by_url("http://" + self.blog) blog["pages"] = blog["pages"]["totalItems"] blog["posts"] = blog["posts"]["totalItems"] blog["date"] = text.parse_datetime(blog["published"]) del blog["selfLink"] sub = re.compile(r"(/|=)(?:s\d+|w\d+-h\d+)(?=/|$)").sub findall_image = re.compile( r'src="(https?://(?:' r'blogger\.googleusercontent\.com/img|' r'\d+\.bp\.blogspot\.com)/[^"]+)').findall findall_video = re.compile( r'src="(https?://www\.blogger\.com/video\.g\?token=[^"]+)').findall for post in self.posts(blog): content = post["content"] files = findall_image(content) for idx, url in enumerate(files): files[idx] = sub(r"\1s0", url).replace("http:", "https:", 1) if self.videos and 'id="BLOG_video-' in content: page = self.request(post["url"]).text for url in findall_video(page): page = self.request(url).text video_config = json.loads(text.extract( page, 'var VIDEO_CONFIG =', '\n')[0]) files.append(max( video_config["streams"], key=lambda x: x["format_id"], )["play_url"]) if not files: continue post["author"] = post["author"]["displayName"] post["replies"] = post["replies"]["totalItems"] post["content"] = text.remove_html(content) post["date"] = text.parse_datetime(post["published"]) del post["selfLink"] del post["blog"] yield Message.Directory, {"blog": blog, "post": post} for num, url in enumerate(files, 1): yield Message.Url, url, text.nameext_from_url(url, { "blog": blog, "post": post, "url" : url, "num" : num, }) def posts(self, blog): """Return an iterable with all relevant post objects""" class BloggerPostExtractor(BloggerExtractor): """Extractor for a single blog post""" subcategory = "post" pattern = BASE_PATTERN + r"(/\d{4}/\d\d/[^/?#]+\.html)" test = ( ("https://julianbphotography.blogspot.com/2010/12/moon-rise.html", { "url": "9928429fb62f712eb4de80f53625eccecc614aae", "pattern": r"https://3.bp.blogspot.com/.*/s0/Icy-Moonrise-.*.jpg", "keyword": { "blog": { "date" : "dt:2010-11-21 18:19:42", "description": "", "id" : "5623928067739466034", "kind" : "blogger#blog", "locale" : dict, "name" : "Julian Bunker Photography", "pages" : int, "posts" : int, "published" : "2010-11-21T10:19:42-08:00", "updated" : str, "url" : "http://julianbphotography.blogspot.com/", }, "post": { "author" : "Julian Bunker", "content" : str, "date" : "dt:2010-12-26 01:08:00", "etag" : str, "id" : "6955139236418998998", "kind" : "blogger#post", "published" : "2010-12-25T17:08:00-08:00", "replies" : "0", "title" : "Moon Rise", "updated" : "2011-12-06T05:21:24-08:00", "url" : "re:.+/2010/12/moon-rise.html$", }, "num": int, "url": str, }, }), ("blogger:http://www.julianbunker.com/2010/12/moon-rise.html"), # video (#587) (("http://cfnmscenesinmovies.blogspot.com/2011/11/" "cfnm-scene-jenna-fischer-in-office.html"), { "pattern": r"https://.+\.googlevideo\.com/videoplayback", }), # image URLs with width/height (#1061) ("https://aaaninja.blogspot.com/2020/08/altera-boob-press-2.html", { "pattern": 
r"https://1.bp.blogspot.com/.+/s0/altera_.+png", }), # new image domain (#2204) (("https://randomthingsthroughmyletterbox.blogspot.com/2022/01" "/bitter-flowers-by-gunnar-staalesen-blog.html"), { "pattern": r"https://blogger.googleusercontent.com/img/a/.+=s0$", "count": 8, }), ) def __init__(self, match): BloggerExtractor.__init__(self, match) self.path = match.group(3) def posts(self, blog): return (self.api.post_by_path(blog["id"], self.path),) class BloggerBlogExtractor(BloggerExtractor): """Extractor for an entire Blogger blog""" subcategory = "blog" pattern = BASE_PATTERN + r"/?$" test = ( ("https://julianbphotography.blogspot.com/", { "range": "1-25", "count": 25, "pattern": r"https://\d\.bp\.blogspot\.com/.*/s0/[^.]+\.jpg", }), ("blogger:https://www.kefblog.com.ng/", { "range": "1-25", "count": 25, }), ) def posts(self, blog): return self.api.blog_posts(blog["id"]) class BloggerSearchExtractor(BloggerExtractor): """Extractor for search resuls and labels""" subcategory = "search" pattern = BASE_PATTERN + r"/search(?:/?\?q=([^/?#]+)|/label/([^/?#]+))" test = ( ("https://julianbphotography.blogspot.com/search?q=400mm", { "count": "< 10" }), ("https://dmmagazine.blogspot.com/search/label/D%26D", { "range": "1-25", "count": 25, }), ) def __init__(self, match): BloggerExtractor.__init__(self, match) query = match.group(3) if query: self.query, self.label = query, None else: self.query, self.label = None, match.group(4) def posts(self, blog): if self.query: return self.api.blog_search(blog["id"], text.unquote(self.query)) return self.api.blog_posts(blog["id"], text.unquote(self.label)) class BloggerAPI(): """Minimal interface for the Blogger v3 API Ref: https://developers.google.com/blogger """ API_KEY = "AIzaSyCN9ax34oMMyM07g_M-5pjeDp_312eITK8" def __init__(self, extractor): self.extractor = extractor self.api_key = extractor.config("api-key", self.API_KEY) def blog_by_url(self, url): return self._call("blogs/byurl", {"url": url}, "blog") def blog_posts(self, blog_id, label=None): endpoint = "blogs/{}/posts".format(blog_id) params = {"labels": label} return self._pagination(endpoint, params) def blog_search(self, blog_id, query): endpoint = "blogs/{}/posts/search".format(blog_id) params = {"q": query} return self._pagination(endpoint, params) def post_by_path(self, blog_id, path): endpoint = "blogs/{}/posts/bypath".format(blog_id) return self._call(endpoint, {"path": path}, "post") def _call(self, endpoint, params, notfound=None): url = "https://www.googleapis.com/blogger/v3/" + endpoint params["key"] = self.api_key return self.extractor.request( url, params=params, notfound=notfound).json() def _pagination(self, endpoint, params): while True: data = self._call(endpoint, params) if "items" in data: yield from data["items"] if "nextPageToken" not in data: return params["pageToken"] = data["nextPageToken"] ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 

gallery_dl-1.21.1/gallery_dl/extractor/booru.py

# -*- coding: utf-8 -*-

# Copyright 2015-2022 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for *booru sites"""

from .common import BaseExtractor, Message
from .. import text
import operator


class BooruExtractor(BaseExtractor):
    """Base class for *booru extractors"""
    basecategory = "booru"
    filename_fmt = "{category}_{id}_{md5}.{extension}"
    page_start = 0
    per_page = 100

    def items(self):
        self.login()
        data = self.metadata()
        tags = self.config("tags", False)
        notes = self.config("notes", False)

        for post in self.posts():
            try:
                url = self._file_url(post)
                if url[0] == "/":
                    url = self.root + url
            except (KeyError, TypeError):
                self.log.debug("Unable to fetch download URL for post %s "
                               "(md5: %s)", post.get("id"), post.get("md5"))
                continue

            page_html = None
            if tags:
                page_html = self._extended_tags(post)
            if notes:
                self._notes(post, page_html)

            text.nameext_from_url(url, post)
            post.update(data)
            self._prepare(post)

            yield Message.Directory, post
            yield Message.Url, url, post

    def skip(self, num):
        pages = num // self.per_page
        self.page_start += pages
        return pages * self.per_page

    def login(self):
        """Login and set necessary cookies"""

    def metadata(self):
        """Return a dict with general metadata"""
        return ()

    def posts(self):
        """Return an iterable with post objects"""
        return ()

    _file_url = operator.itemgetter("file_url")

    def _prepare(self, post):
        """Prepare the 'post's metadata"""

    def _extended_tags(self, post, page=None):
        """Generate extended tag information

        The return value of this function will be passed to the _notes
        function as the page parameter. This makes it possible to reuse
        the same HTML both for extracting tags and notes.
""" def _notes(self, post, page=None): """Generate information about notes""" ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/comicvine.py�������������������������������������������������0000644�0001750�0001750�00000004737�14176336637�021120� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://comicvine.gamespot.com/""" from .booru import BooruExtractor from .. 
import text
import operator


class ComicvineTagExtractor(BooruExtractor):
    """Extractor for a gallery on comicvine.gamespot.com"""
    category = "comicvine"
    subcategory = "tag"
    basecategory = ""
    root = "https://comicvine.gamespot.com"
    per_page = 1000
    directory_fmt = ("{category}", "{tag}")
    filename_fmt = "{filename}.{extension}"
    archive_fmt = "{id}"
    pattern = (r"(?:https?://)?comicvine\.gamespot\.com"
               r"(/([^/?#]+)/(\d+-\d+)/images/.*)")
    test = (
        ("https://comicvine.gamespot.com/jock/4040-5653/images/", {
            "pattern": r"https://comicvine\.gamespot\.com/a/uploads"
                       r"/original/\d+/\d+/\d+-.+\.(jpe?g|png)",
            "count": ">= 140",
        }),
        (("https://comicvine.gamespot.com/batman/4005-1699"
          "/images/?tag=Fan%20Art%20%26%20Cosplay"), {
            "pattern": r"https://comicvine\.gamespot\.com/a/uploads"
                       r"/original/\d+/\d+/\d+-.+",
            "count": ">= 450",
        }),
    )

    def __init__(self, match):
        BooruExtractor.__init__(self, match)
        self.path, self.object_name, self.object_id = match.groups()

    def metadata(self):
        return {"tag": text.unquote(self.object_name)}

    def posts(self):
        url = self.root + "/js/image-data.json"
        params = {
            "images": text.extract(
                self.request(self.root + self.path).text,
                'data-gallery-id="', '"')[0],
            "start" : self.page_start,
            "count" : self.per_page,
            "object": self.object_id,
        }

        while True:
            images = self.request(url, params=params).json()["images"]
            yield from images
            if len(images) < self.per_page:
                return
            params["start"] += self.per_page

    def skip(self, num):
        self.page_start = num
        return num

    _file_url = operator.itemgetter("original")

    @staticmethod
    def _prepare(post):
        post["date"] = text.parse_datetime(
            post["dateCreated"], "%a, %b %d %Y")
        post["tags"] = [tag["name"] for tag in post["tags"] if tag["name"]]


gallery_dl-1.21.1/gallery_dl/extractor/common.py

# -*- coding: utf-8 -*-

# Copyright 2014-2022 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
"""Common classes and constants used by extractor modules.""" import re import ssl import time import netrc import queue import logging import datetime import requests import threading from requests.adapters import HTTPAdapter from .message import Message from .. import config, text, util, exception class Extractor(): category = "" subcategory = "" basecategory = "" categorytransfer = False directory_fmt = ("{category}",) filename_fmt = "{filename}.{extension}" archive_fmt = "" cookiedomain = "" browser = None root = "" test = None request_interval = 0.0 request_interval_min = 0.0 request_timestamp = 0.0 tls12 = True def __init__(self, match): self.log = logging.getLogger(self.category) self.url = match.string self.finalize = None if self.basecategory: self.config = self._config_shared self.config_accumulate = self._config_shared_accumulate self._cfgpath = ("extractor", self.category, self.subcategory) self._parentdir = "" self._write_pages = self.config("write-pages", False) self._retries = self.config("retries", 4) self._timeout = self.config("timeout", 30) self._verify = self.config("verify", True) self._proxies = util.build_proxy_map(self.config("proxy"), self.log) self._interval = util.build_duration_func( self.config("sleep-request", self.request_interval), self.request_interval_min, ) if self._retries < 0: self._retries = float("inf") self._init_session() self._init_cookies() @classmethod def from_url(cls, url): if isinstance(cls.pattern, str): cls.pattern = re.compile(cls.pattern) match = cls.pattern.match(url) return cls(match) if match else None def __iter__(self): return self.items() def items(self): yield Message.Version, 1 def skip(self, num): return 0 def config(self, key, default=None): return config.interpolate(self._cfgpath, key, default) def config_accumulate(self, key): return config.accumulate(self._cfgpath, key) def _config_shared(self, key, default=None): return config.interpolate_common(("extractor",), ( (self.category, self.subcategory), (self.basecategory, self.subcategory), ), key, default) def _config_shared_accumulate(self, key): values = config.accumulate(self._cfgpath, key) conf = config.get(("extractor",), self.basecategory) if conf: values[:0] = config.accumulate((self.subcategory,), key, conf=conf) return values def request(self, url, *, method="GET", session=None, retries=None, encoding=None, fatal=True, notfound=None, **kwargs): if session is None: session = self.session if retries is None: retries = self._retries if "proxies" not in kwargs: kwargs["proxies"] = self._proxies if "timeout" not in kwargs: kwargs["timeout"] = self._timeout if "verify" not in kwargs: kwargs["verify"] = self._verify response = None tries = 1 if self._interval: seconds = (self._interval() - (time.time() - Extractor.request_timestamp)) if seconds > 0.0: self.log.debug("Sleeping for %.5s seconds", seconds) time.sleep(seconds) while True: try: response = session.request(method, url, **kwargs) except (requests.exceptions.ConnectionError, requests.exceptions.Timeout, requests.exceptions.ChunkedEncodingError, requests.exceptions.ContentDecodingError) as exc: msg = exc except (requests.exceptions.RequestException) as exc: raise exception.HttpError(exc) else: code = response.status_code if self._write_pages: self._dump_response(response) if 200 <= code < 400 or fatal is None and \ (400 <= code < 500) or not fatal and \ (400 <= code < 429 or 431 <= code < 500): if encoding: response.encoding = encoding return response if notfound and code == 404: raise exception.NotFoundError(notfound) 
msg = "'{} {}' for '{}'".format(code, response.reason, url) server = response.headers.get("Server") if server and server.startswith("cloudflare"): if code == 503 and \ b"jschl-answer" in response.content: self.log.warning("Cloudflare IUAM challenge") break if code == 403 and \ b'name="captcha-bypass"' in response.content: self.log.warning("Cloudflare CAPTCHA") break if code < 500 and code != 429 and code != 430: break finally: Extractor.request_timestamp = time.time() self.log.debug("%s (%s/%s)", msg, tries, retries+1) if tries > retries: break time.sleep( max(tries, self._interval()) if self._interval else tries) tries += 1 raise exception.HttpError(msg, response) def wait(self, *, seconds=None, until=None, adjust=1.0, reason="rate limit reset"): now = time.time() if seconds: seconds = float(seconds) until = now + seconds elif until: if isinstance(until, datetime.datetime): # convert to UTC timestamp until = util.datetime_to_timestamp(until) else: until = float(until) seconds = until - now else: raise ValueError("Either 'seconds' or 'until' is required") seconds += adjust if seconds <= 0.0: return if reason: t = datetime.datetime.fromtimestamp(until).time() isotime = "{:02}:{:02}:{:02}".format(t.hour, t.minute, t.second) self.log.info("Waiting until %s for %s.", isotime, reason) time.sleep(seconds) def _get_auth_info(self): """Return authentication information as (username, password) tuple""" username = self.config("username") password = None if username: password = self.config("password") elif self.config("netrc", False): try: info = netrc.netrc().authenticators(self.category) username, _, password = info except (OSError, netrc.NetrcParseError) as exc: self.log.error("netrc: %s", exc) except TypeError: self.log.warning("netrc: No authentication info") return username, password def _init_session(self): self.session = session = requests.Session() headers = session.headers headers.clear() ssl_options = ssl_ciphers = 0 browser = self.config("browser") or self.browser if browser and isinstance(browser, str): browser, _, platform = browser.lower().partition(":") if not platform or platform == "auto": platform = ("Windows NT 10.0; Win64; x64" if util.WINDOWS else "X11; Linux x86_64") elif platform == "windows": platform = "Windows NT 10.0; Win64; x64" elif platform == "linux": platform = "X11; Linux x86_64" elif platform == "macos": platform = "Macintosh; Intel Mac OS X 11.5" if browser == "chrome": if platform.startswith("Macintosh"): platform = platform.replace(".", "_") + "_2" else: browser = "firefox" for key, value in HTTP_HEADERS[browser]: if value and "{}" in value: headers[key] = value.format(platform) else: headers[key] = value ssl_options |= (ssl.OP_NO_SSLv2 | ssl.OP_NO_SSLv3 | ssl.OP_NO_TLSv1 | ssl.OP_NO_TLSv1_1) ssl_ciphers = SSL_CIPHERS[browser] else: headers["User-Agent"] = self.config("user-agent", ( "Mozilla/5.0 (Windows NT 10.0; Win64; x64; " "rv:91.0) Gecko/20100101 Firefox/91.0")) headers["Accept"] = "*/*" headers["Accept-Language"] = "en-US,en;q=0.5" headers["Accept-Encoding"] = "gzip, deflate" custom_headers = self.config("headers") if custom_headers: headers.update(custom_headers) custom_ciphers = self.config("ciphers") if custom_ciphers: if isinstance(custom_ciphers, list): ssl_ciphers = ":".join(custom_ciphers) else: ssl_ciphers = custom_ciphers source_address = self.config("source-address") if source_address: if isinstance(source_address, str): source_address = (source_address, 0) else: source_address = (source_address[0], source_address[1]) tls12 = self.config("tls12") 
if tls12 is None: tls12 = self.tls12 if not tls12: ssl_options |= ssl.OP_NO_TLSv1_2 self.log.debug("TLS 1.2 disabled.") adapter = _build_requests_adapter( ssl_options, ssl_ciphers, source_address) session.mount("https://", adapter) session.mount("http://", adapter) def _init_cookies(self): """Populate the session's cookiejar""" self._cookiefile = None self._cookiejar = self.session.cookies if self.cookiedomain is None: return cookies = self.config("cookies") if cookies: if isinstance(cookies, dict): self._update_cookies_dict(cookies, self.cookiedomain) elif isinstance(cookies, str): cookiefile = util.expand_path(cookies) try: with open(cookiefile) as fp: cookies = util.load_cookiestxt(fp) except Exception as exc: self.log.warning("cookies: %s", exc) else: self._update_cookies(cookies) self._cookiefile = cookiefile else: self.log.warning( "expected 'dict' or 'str' value for 'cookies' option, " "got '%s' (%s)", cookies.__class__.__name__, cookies) def _store_cookies(self): """Store the session's cookiejar in a cookies.txt file""" if self._cookiefile and self.config("cookies-update", True): try: with open(self._cookiefile, "w") as fp: util.save_cookiestxt(fp, self._cookiejar) except OSError as exc: self.log.warning("cookies: %s", exc) def _update_cookies(self, cookies, *, domain=""): """Update the session's cookiejar with 'cookies'""" if isinstance(cookies, dict): self._update_cookies_dict(cookies, domain or self.cookiedomain) else: setcookie = self._cookiejar.set_cookie try: cookies = iter(cookies) except TypeError: setcookie(cookies) else: for cookie in cookies: setcookie(cookie) def _update_cookies_dict(self, cookiedict, domain): """Update cookiejar with name-value pairs from a dict""" setcookie = self._cookiejar.set for name, value in cookiedict.items(): setcookie(name, value, domain=domain) def _check_cookies(self, cookienames, *, domain=None): """Check if all 'cookienames' are in the session's cookiejar""" if not self._cookiejar: return False if domain is None: domain = self.cookiedomain names = set(cookienames) now = time.time() for cookie in self._cookiejar: if cookie.name in names and ( not domain or cookie.domain == domain): if cookie.expires: diff = int(cookie.expires - now) if diff <= 0: self.log.warning( "Cookie '%s' has expired", cookie.name) continue elif diff <= 86400: hours = diff // 3600 self.log.warning( "Cookie '%s' will expire in less than %s hour%s", cookie.name, hours + 1, "s" if hours else "") names.discard(cookie.name) if not names: return True return False def _prepare_ddosguard_cookies(self): if not self._cookiejar.get("__ddg2", domain=self.cookiedomain): self._cookiejar.set( "__ddg2", util.generate_token(), domain=self.cookiedomain) def _get_date_min_max(self, dmin=None, dmax=None): """Retrieve and parse 'date-min' and 'date-max' config values""" def get(key, default): ts = self.config(key, default) if isinstance(ts, str): try: ts = int(datetime.datetime.strptime(ts, fmt).timestamp()) except ValueError as exc: self.log.warning("Unable to parse '%s': %s", key, exc) ts = default return ts fmt = self.config("date-format", "%Y-%m-%dT%H:%M:%S") return get("date-min", dmin), get("date-max", dmax) def _dispatch_extractors(self, extractor_data, default=()): """ """ extractors = { data[0].subcategory: data for data in extractor_data } include = self.config("include", default) or () if include == "all": include = extractors elif isinstance(include, str): include = include.split(",") result = [(Message.Version, 1)] for category in include: if category in extractors: extr, url 
= extractors[category] result.append((Message.Queue, url, {"_extractor": extr})) return iter(result) @classmethod def _get_tests(cls): """Yield an extractor's test cases as (URL, RESULTS) tuples""" tests = cls.test if not tests: return if len(tests) == 2 and (not tests[1] or isinstance(tests[1], dict)): tests = (tests,) for test in tests: if isinstance(test, str): test = (test, None) yield test def _dump_response(self, response, history=True): """Write the response content to a .dump file in the current directory. The file name is derived from the response url, replacing special characters with "_" """ if history: for resp in response.history: self._dump_response(resp, False) if hasattr(Extractor, "_dump_index"): Extractor._dump_index += 1 else: Extractor._dump_index = 1 Extractor._dump_sanitize = re.compile(r"[\\\\|/<>:\"?*&=#]+").sub fname = "{:>02}_{}".format( Extractor._dump_index, Extractor._dump_sanitize('_', response.url) )[:250] try: with open(fname + ".dump", 'wb') as fp: util.dump_response( response, fp, headers=(self._write_pages == "all")) except Exception as e: self.log.warning("Failed to dump HTTP request (%s: %s)", e.__class__.__name__, e) class GalleryExtractor(Extractor): subcategory = "gallery" filename_fmt = "{category}_{gallery_id}_{num:>03}.{extension}" directory_fmt = ("{category}", "{gallery_id} {title}") archive_fmt = "{gallery_id}_{num}" enum = "num" def __init__(self, match, url=None): Extractor.__init__(self, match) self.gallery_url = self.root + match.group(1) if url is None else url def items(self): self.login() page = self.request(self.gallery_url, notfound=self.subcategory).text data = self.metadata(page) imgs = self.images(page) if "count" in data: if self.config("page-reverse"): images = util.enumerate_reversed(imgs, 1, data["count"]) else: images = zip( range(1, data["count"]+1), imgs, ) else: enum = enumerate try: data["count"] = len(imgs) except TypeError: pass else: if self.config("page-reverse"): enum = util.enumerate_reversed images = enum(imgs, 1) yield Message.Directory, data for data[self.enum], (url, imgdata) in images: if imgdata: data.update(imgdata) if "extension" not in imgdata: text.nameext_from_url(url, data) else: text.nameext_from_url(url, data) yield Message.Url, url, data def login(self): """Login and set necessary cookies""" def metadata(self, page): """Return a dict with general metadata""" def images(self, page): """Return a list of all (image-url, metadata)-tuples""" class ChapterExtractor(GalleryExtractor): subcategory = "chapter" directory_fmt = ( "{category}", "{manga}", "{volume:?v/ />02}c{chapter:>03}{chapter_minor:?//}{title:?: //}") filename_fmt = ( "{manga}_c{chapter:>03}{chapter_minor:?//}_{page:>03}.{extension}") archive_fmt = ( "{manga}_{chapter}{chapter_minor}_{page}") enum = "page" class MangaExtractor(Extractor): subcategory = "manga" categorytransfer = True chapterclass = None reverse = True def __init__(self, match, url=None): Extractor.__init__(self, match) self.manga_url = url or self.root + match.group(1) if self.config("chapter-reverse", False): self.reverse = not self.reverse def items(self): self.login() page = self.request(self.manga_url).text chapters = self.chapters(page) if self.reverse: chapters.reverse() for chapter, data in chapters: data["_extractor"] = self.chapterclass yield Message.Queue, chapter, data def login(self): """Login and set necessary cookies""" def chapters(self, page): """Return a list of all (chapter-url, metadata)-tuples""" class AsynchronousMixin(): """Run info extraction in a separate 
thread""" def __iter__(self): messages = queue.Queue(5) thread = threading.Thread( target=self.async_items, args=(messages,), daemon=True, ) thread.start() while True: msg = messages.get() if msg is None: thread.join() return if isinstance(msg, Exception): thread.join() raise msg yield msg messages.task_done() def async_items(self, messages): try: for msg in self.items(): messages.put(msg) except Exception as exc: messages.put(exc) messages.put(None) class BaseExtractor(Extractor): instances = () def __init__(self, match): if not self.category: for index, group in enumerate(match.groups()): if group is not None: if index: self.category, self.root = self.instances[index-1] if not self.root: self.root = text.root_from_url(match.group(0)) else: self.root = group self.category = group.partition("://")[2] break Extractor.__init__(self, match) @classmethod def update(cls, instances): extra_instances = config.get(("extractor",), cls.basecategory) if extra_instances: for category, info in extra_instances.items(): if isinstance(info, dict) and "root" in info: instances[category] = info pattern_list = [] instance_list = cls.instances = [] for category, info in instances.items(): root = info["root"] if root: root = root.rstrip("/") instance_list.append((category, root)) pattern = info.get("pattern") if not pattern: pattern = re.escape(root[root.index(":") + 3:]) pattern_list.append(pattern + "()") return ( r"(?:" + cls.basecategory + r":(https?://[^/?#]+)|" r"(?:https?://)?(?:" + "|".join(pattern_list) + r"))" ) class RequestsAdapter(HTTPAdapter): def __init__(self, ssl_context=None, source_address=None): self.ssl_context = ssl_context self.source_address = source_address HTTPAdapter.__init__(self) def init_poolmanager(self, *args, **kwargs): kwargs["ssl_context"] = self.ssl_context kwargs["source_address"] = self.source_address return HTTPAdapter.init_poolmanager(self, *args, **kwargs) def proxy_manager_for(self, *args, **kwargs): kwargs["ssl_context"] = self.ssl_context kwargs["source_address"] = self.source_address return HTTPAdapter.proxy_manager_for(self, *args, **kwargs) def _build_requests_adapter(ssl_options, ssl_ciphers, source_address): key = (ssl_options, ssl_ciphers, source_address) try: return _adapter_cache[key] except KeyError: pass if ssl_options or ssl_ciphers: ssl_context = ssl.create_default_context() if ssl_options: ssl_context.options |= ssl_options if ssl_ciphers: ssl_context.set_ecdh_curve("prime256v1") ssl_context.set_ciphers(ssl_ciphers) else: ssl_context = None adapter = _adapter_cache[key] = RequestsAdapter( ssl_context, source_address) return adapter _adapter_cache = {} HTTP_HEADERS = { "firefox": ( ("User-Agent", "Mozilla/5.0 ({}; rv:91.0) " "Gecko/20100101 Firefox/91.0"), ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9," "image/avif,*/*;q=0.8"), ("Accept-Language", "en-US,en;q=0.5"), ("Accept-Encoding", "gzip, deflate"), ("Referer", None), ("Connection", "keep-alive"), ("Upgrade-Insecure-Requests", "1"), ("Cookie", None), ), "chrome": ( ("Upgrade-Insecure-Requests", "1"), ("User-Agent", "Mozilla/5.0 ({}) AppleWebKit/537.36 (KHTML, " "like Gecko) Chrome/92.0.4515.131 Safari/537.36"), ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9," "image/webp,image/apng,*/*;q=0.8"), ("Referer", None), ("Accept-Encoding", "gzip, deflate"), ("Accept-Language", "en-US,en;q=0.9"), ("Cookie", None), ), } SSL_CIPHERS = { "firefox": ( "TLS_AES_128_GCM_SHA256:" "TLS_CHACHA20_POLY1305_SHA256:" "TLS_AES_256_GCM_SHA384:" "ECDHE-ECDSA-AES128-GCM-SHA256:" 
"ECDHE-RSA-AES128-GCM-SHA256:" "ECDHE-ECDSA-CHACHA20-POLY1305:" "ECDHE-RSA-CHACHA20-POLY1305:" "ECDHE-ECDSA-AES256-GCM-SHA384:" "ECDHE-RSA-AES256-GCM-SHA384:" "ECDHE-ECDSA-AES256-SHA:" "ECDHE-ECDSA-AES128-SHA:" "ECDHE-RSA-AES128-SHA:" "ECDHE-RSA-AES256-SHA:" "AES128-GCM-SHA256:" "AES256-GCM-SHA384:" "AES128-SHA:" "AES256-SHA:" "DES-CBC3-SHA" ), "chrome": ( "TLS_AES_128_GCM_SHA256:" "TLS_AES_256_GCM_SHA384:" "TLS_CHACHA20_POLY1305_SHA256:" "ECDHE-ECDSA-AES128-GCM-SHA256:" "ECDHE-RSA-AES128-GCM-SHA256:" "ECDHE-ECDSA-AES256-GCM-SHA384:" "ECDHE-RSA-AES256-GCM-SHA384:" "ECDHE-ECDSA-CHACHA20-POLY1305:" "ECDHE-RSA-CHACHA20-POLY1305:" "ECDHE-RSA-AES128-SHA:" "ECDHE-RSA-AES256-SHA:" "AES128-GCM-SHA256:" "AES256-GCM-SHA384:" "AES128-SHA:" "AES256-SHA:" "DES-CBC3-SHA" ), } # Undo automatic pyOpenSSL injection by requests pyopenssl = config.get((), "pyopenssl", False) if not pyopenssl: try: from requests.packages.urllib3.contrib import pyopenssl # noqa pyopenssl.extract_from_urllib3() except ImportError: pass del pyopenssl ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/cyberdrop.py�������������������������������������������������0000644�0001750�0001750�00000004044�14176336637�021124� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://cyberdrop.me/""" from . import lolisafe from .. 
import text


class CyberdropAlbumExtractor(lolisafe.LolisafeAlbumExtractor):
    category = "cyberdrop"
    root = "https://cyberdrop.me"
    pattern = r"(?:https?://)?(?:www\.)?cyberdrop\.me/a/([^/?#]+)"
    test = (
        # images
        ("https://cyberdrop.me/a/keKRjm4t", {
            "pattern": r"https://fs-\d+\.cyberdrop\.to/.*\.(jpg|png|webp)$",
            "keyword": {
                "album_id": "keKRjm4t",
                "album_name": "Fate (SFW)",
                "album_size": 150069254,
                "count": 62,
                "date": "dt:2020-06-18 13:14:20",
                "description": "",
                "id": r"re:\w{8}",
            },
        }),
        # videos
        ("https://cyberdrop.me/a/l8gIAXVD", {
            "pattern": r"https://fs-\d+\.cyberdrop\.to/.*\.mp4$",
            "count": 31,
            "keyword": {
                "album_id": "l8gIAXVD",
                "album_name": "Achelois17 videos",
                "album_size": 652037121,
                "date": "dt:2020-06-16 15:40:44",
            },
        }),
    )

    def fetch_album(self, album_id):
        url = self.root + "/a/" + self.album_id
        extr = text.extract_from(self.request(url).text)

        files = []
        append = files.append
        while True:
            url = extr('id="file" href="', '"')
            if not url:
                break
            append({"file": text.unescape(url)})

        return files, {
            "album_id"   : self.album_id,
            "album_name" : extr("name: '", "'"),
            "date"       : text.parse_timestamp(extr("timestamp: ", ",")),
            "album_size" : text.parse_int(extr("totalSize: ", ",")),
            "description": extr("description: `", "`"),
            "count"      : len(files),
        }


gallery_dl-1.21.1/gallery_dl/extractor/danbooru.py

# -*- coding: utf-8 -*-

# Copyright 2014-2022 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
"""Extractors for https://danbooru.donmai.us/ and other Danbooru instances""" from .common import BaseExtractor, Message from .. import text import datetime class DanbooruExtractor(BaseExtractor): """Base class for danbooru extractors""" basecategory = "Danbooru" filename_fmt = "{category}_{id}_{filename}.{extension}" page_limit = 1000 page_start = None per_page = 200 def __init__(self, match): BaseExtractor.__init__(self, match) self.ugoira = self.config("ugoira", False) self.external = self.config("external", False) self.extended_metadata = self.config("metadata", False) username, api_key = self._get_auth_info() if username: self.log.debug("Using HTTP Basic Auth for user '%s'", username) self.session.auth = (username, api_key) instance = INSTANCES.get(self.category) or {} iget = instance.get self.headers = iget("headers") self.page_limit = iget("page-limit", 1000) self.page_start = iget("page-start") self.per_page = iget("per-page", 200) self.request_interval_min = iget("request-interval-min", 0.0) self._pools = iget("pools") def request(self, url, **kwargs): kwargs["headers"] = self.headers return BaseExtractor.request(self, url, **kwargs) def skip(self, num): pages = num // self.per_page if pages >= self.page_limit: pages = self.page_limit - 1 self.page_start = pages + 1 return pages * self.per_page def items(self): data = self.metadata() for post in self.posts(): file = post.get("file") if file: url = file["url"] if not url: md5 = file["md5"] url = file["url"] = ( "https://static1.{}/data/{}/{}/{}.{}".format( self.root[8:], md5[0:2], md5[2:4], md5, file["ext"] )) post["filename"] = file["md5"] post["extension"] = file["ext"] else: try: url = post["file_url"] except KeyError: if self.external and post["source"]: post.update(data) yield Message.Directory, post yield Message.Queue, post["source"], post continue text.nameext_from_url(url, post) if post["extension"] == "zip": if self.ugoira: post["frames"] = self.request( "{}/posts/{}.json?only=pixiv_ugoira_frame_data".format( self.root, post["id"]) ).json()["pixiv_ugoira_frame_data"]["data"] post["_http_adjust_extension"] = False else: url = post["large_file_url"] post["extension"] = "webm" if self.extended_metadata: template = ( "{}/posts/{}.json" "?only=artist_commentary,children,notes,parent" ) resp = self.request(template.format(self.root, post["id"])) post.update(resp.json()) post.update(data) yield Message.Directory, post yield Message.Url, url, post def metadata(self): return () def posts(self): return () def _pagination(self, endpoint, params, pagenum=False): url = self.root + endpoint params["limit"] = self.per_page params["page"] = self.page_start while True: posts = self.request(url, params=params).json() if "posts" in posts: posts = posts["posts"] yield from posts if len(posts) < self.per_page: return if pagenum: params["page"] += 1 else: for post in reversed(posts): if "id" in post: params["page"] = "b{}".format(post["id"]) break else: return INSTANCES = { "danbooru": { "root": None, "pattern": r"(?:danbooru|hijiribe|sonohara|safebooru)\.donmai\.us", }, "e621": { "root": None, "pattern": r"e(?:621|926)\.net", "headers": {"User-Agent": "gallery-dl/1.14.0 (by mikf)"}, "pools": "sort", "page-limit": 750, "per-page": 320, "request-interval-min": 1.0, }, "atfbooru": { "root": "https://booru.allthefallen.moe", "pattern": r"booru\.allthefallen\.moe", "page-limit": 5000, }, } BASE_PATTERN = DanbooruExtractor.update(INSTANCES) class DanbooruTagExtractor(DanbooruExtractor): """Extractor for danbooru posts from tag searches""" 
subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/posts\?(?:[^&#]*&)*tags=([^&#]*)" test = ( ("https://danbooru.donmai.us/posts?tags=bonocho", { "content": "b196fb9f1668109d7774a0a82efea3ffdda07746", }), # test page transitions ("https://danbooru.donmai.us/posts?tags=mushishi", { "count": ">= 300", }), # 'external' option (#1747) ("https://danbooru.donmai.us/posts?tags=pixiv_id%3A1476533", { "options": (("external", True),), "pattern": r"http://img16.pixiv.net/img/takaraakihito/1476533.jpg", }), ("https://e621.net/posts?tags=anry", { "url": "8021e5ea28d47c474c1ffc9bd44863c4d45700ba", "content": "501d1e5d922da20ee8ff9806f5ed3ce3a684fd58", }), ("https://booru.allthefallen.moe/posts?tags=yume_shokunin", { "count": 12, }), ("https://hijiribe.donmai.us/posts?tags=bonocho"), ("https://sonohara.donmai.us/posts?tags=bonocho"), ("https://safebooru.donmai.us/posts?tags=bonocho"), ("https://e926.net/posts?tags=anry"), ) def __init__(self, match): DanbooruExtractor.__init__(self, match) tags = match.group(match.lastindex) self.tags = text.unquote(tags.replace("+", " ")) def metadata(self): return {"search_tags": self.tags} def posts(self): return self._pagination("/posts.json", {"tags": self.tags}) class DanbooruPoolExtractor(DanbooruExtractor): """Extractor for posts from danbooru pools""" subcategory = "pool" directory_fmt = ("{category}", "pool", "{pool[id]} {pool[name]}") archive_fmt = "p_{pool[id]}_{id}" pattern = BASE_PATTERN + r"/pool(?:s|/show)/(\d+)" test = ( ("https://danbooru.donmai.us/pools/7659", { "content": "b16bab12bea5f7ea9e0a836bf8045f280e113d99", }), ("https://e621.net/pools/73", { "url": "1bd09a72715286a79eea3b7f09f51b3493eb579a", "content": "91abe5d5334425d9787811d7f06d34c77974cd22", }), ("https://booru.allthefallen.moe/pools/9", { "url": "902549ffcdb00fe033c3f63e12bc3cb95c5fd8d5", "count": 6, }), ("https://danbooru.donmai.us/pool/show/7659"), ("https://e621.net/pool/show/73"), ) def __init__(self, match): DanbooruExtractor.__init__(self, match) self.pool_id = match.group(match.lastindex) self.post_ids = () def metadata(self): url = "{}/pools/{}.json".format(self.root, self.pool_id) pool = self.request(url).json() pool["name"] = pool["name"].replace("_", " ") self.post_ids = pool.pop("post_ids", ()) return {"pool": pool} def posts(self): if self._pools == "sort": self.log.info("Fetching posts of pool %s", self.pool_id) id_to_post = { post["id"]: post for post in self._pagination( "/posts.json", {"tags": "pool:" + self.pool_id}) } posts = [] append = posts.append for num, pid in enumerate(self.post_ids, 1): if pid in id_to_post: post = id_to_post[pid] post["num"] = num append(post) else: self.log.warning("Post %s is unavailable", pid) return posts else: params = {"tags": "pool:" + self.pool_id} return self._pagination("/posts.json", params) class DanbooruPostExtractor(DanbooruExtractor): """Extractor for single danbooru posts""" subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/post(?:s|/show)/(\d+)" test = ( ("https://danbooru.donmai.us/posts/294929", { "content": "5e255713cbf0a8e0801dc423563c34d896bb9229", }), ("https://danbooru.donmai.us/posts/3613024", { "pattern": r"https?://.+\.zip$", "options": (("ugoira", True),) }), ("https://e621.net/posts/535", { "url": "f7f78b44c9b88f8f09caac080adc8d6d9fdaa529", "content": "66f46e96a893fba8e694c4e049b23c2acc9af462", }), ("https://booru.allthefallen.moe/posts/22", { "content": "21dda68e1d7e0a554078e62923f537d8e895cac8", }), 
("https://danbooru.donmai.us/post/show/294929"), ("https://e621.net/post/show/535"), ) def __init__(self, match): DanbooruExtractor.__init__(self, match) self.post_id = match.group(match.lastindex) def posts(self): url = "{}/posts/{}.json".format(self.root, self.post_id) post = self.request(url).json() return (post["post"] if "post" in post else post,) class DanbooruPopularExtractor(DanbooruExtractor): """Extractor for popular images from danbooru""" subcategory = "popular" directory_fmt = ("{category}", "popular", "{scale}", "{date}") archive_fmt = "P_{scale[0]}_{date}_{id}" pattern = BASE_PATTERN + r"/explore/posts/popular(?:\?([^#]*))?" test = ( ("https://danbooru.donmai.us/explore/posts/popular"), (("https://danbooru.donmai.us/explore/posts/popular" "?date=2013-06-06&scale=week"), { "range": "1-120", "count": 120, }), ("https://e621.net/explore/posts/popular"), (("https://e621.net/explore/posts/popular" "?date=2019-06-01&scale=month"), { "pattern": r"https://static\d.e621.net/data/../../[0-9a-f]+", "count": ">= 70", }), ("https://booru.allthefallen.moe/explore/posts/popular"), ) def __init__(self, match): DanbooruExtractor.__init__(self, match) self.params = match.group(match.lastindex) def metadata(self): self.params = params = text.parse_query(self.params) scale = params.get("scale", "day") date = params.get("date") or datetime.date.today().isoformat() if scale == "week": date = datetime.date.fromisoformat(date) date = (date - datetime.timedelta(days=date.weekday())).isoformat() elif scale == "month": date = date[:-3] return {"date": date, "scale": scale} def posts(self): if self.page_start is None: self.page_start = 1 return self._pagination( "/explore/posts/popular.json", self.params, True) class DanbooruFavoriteExtractor(DanbooruExtractor): """Extractor for e621 favorites""" subcategory = "favorite" directory_fmt = ("{category}", "Favorites", "{user_id}") archive_fmt = "f_{user_id}_{id}" pattern = BASE_PATTERN + r"/favorites(?:\?([^#]*))?" 
    test = (
        ("https://e621.net/favorites"),
        ("https://e621.net/favorites?page=2&user_id=53275", {
            "pattern": r"https://static\d.e621.net/data/../../[0-9a-f]+",
            "count": "> 260",
        }),
    )

    def __init__(self, match):
        DanbooruExtractor.__init__(self, match)
        self.query = text.parse_query(match.group(match.lastindex))

    def metadata(self):
        return {"user_id": self.query.get("user_id", "")}

    def posts(self):
        if self.page_start is None:
            self.page_start = 1
        return self._pagination("/favorites.json", self.query, True)


gallery_dl-1.21.1/gallery_dl/extractor/desktopography.py

# -*- coding: utf-8 -*-

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://desktopography.net/"""

from .common import Extractor, Message
from ..
import text

BASE_PATTERN = r"(?:https?://)?desktopography\.net"


class DesktopographyExtractor(Extractor):
    """Base class for desktopography extractors"""
    category = "desktopography"
    archive_fmt = "{filename}"
    root = "https://desktopography.net"


class DesktopographySiteExtractor(DesktopographyExtractor):
    """Extractor for all desktopography exhibitions"""
    subcategory = "site"
    pattern = BASE_PATTERN + r"/$"
    test = ("https://desktopography.net/",)

    def items(self):
        page = self.request(self.root).text
        data = {"_extractor": DesktopographyExhibitionExtractor}

        for exhibition_year in text.extract_iter(
                page,
                '<a href="https://desktopography.net/exhibition-',
                '/">'):
            url = self.root + "/exhibition-" + exhibition_year + "/"
            yield Message.Queue, url, data


class DesktopographyExhibitionExtractor(DesktopographyExtractor):
    """Extractor for a yearly desktopography exhibition"""
    subcategory = "exhibition"
    pattern = BASE_PATTERN + r"/exhibition-([^/?#]+)/"
    test = ("https://desktopography.net/exhibition-2020/",)

    def __init__(self, match):
        DesktopographyExtractor.__init__(self, match)
        self.year = match.group(1)

    def items(self):
        url = "{}/exhibition-{}/".format(self.root, self.year)
        base_entry_url = "https://desktopography.net/portfolios/"
        page = self.request(url).text

        data = {
            "_extractor": DesktopographyEntryExtractor,
            "year": self.year,
        }

        for entry_url in text.extract_iter(
                page,
                '<a class="overlay-background" href="' + base_entry_url,
                '">'):
            url = base_entry_url + entry_url
            yield Message.Queue, url, data


class DesktopographyEntryExtractor(DesktopographyExtractor):
    """Extractor for all resolutions of a desktopography wallpaper"""
    subcategory = "entry"
    pattern = BASE_PATTERN + r"/portfolios/([\w-]+)"
    test = ("https://desktopography.net/portfolios/new-era/",)

    def __init__(self, match):
        DesktopographyExtractor.__init__(self, match)
        self.entry = match.group(1)

    def items(self):
        url = "{}/portfolios/{}".format(self.root, self.entry)
        page = self.request(url).text
        entry_data = {"entry": self.entry}
        yield Message.Directory, entry_data

        for image_data in text.extract_iter(
                page,
                '<a target="_blank" href="https://desktopography.net',
                '">'):
            path, _, filename = image_data.partition(
                '" class="wallpaper-button" download="')
            text.nameext_from_url(filename, entry_data)
            yield Message.Url, self.root + path, entry_data
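
# --- illustrative usage sketch (not part of the original module) -----------
# A minimal example of driving the entry extractor above directly, assuming
# the gallery_dl package is installed. Extractor.from_url() (defined in
# common.py earlier in this archive) returns an instance when the URL
# matches 'pattern'; iterating it yields Message tuples, and the Message.Url
# entries carry one direct URL per wallpaper resolution.
if __name__ == "__main__":
    from gallery_dl.extractor import desktopography
    from gallery_dl.extractor.message import Message

    extr = desktopography.DesktopographyEntryExtractor.from_url(
        "https://desktopography.net/portfolios/new-era/")
    if extr is not None:
        for msg in extr:
            if msg[0] == Message.Url:
                print(msg[1])  # direct wallpaper URL
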
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/deviantart.py������������������������������������������������0000644�0001750�0001750�00000162675�14220623232�021267� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2015-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract images from https://www.deviantart.com/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache, memcache import collections import itertools import mimetypes import binascii import time import re BASE_PATTERN = ( r"(?:https?://)?(?:" r"(?:www\.)?deviantart\.com/(?!watch/)([\w-]+)|" r"(?!www\.)([\w-]+)\.deviantart\.com)" ) class DeviantartExtractor(Extractor): """Base class for deviantart extractors""" category = "deviantart" directory_fmt = ("{category}", "{username}") filename_fmt = "{category}_{index}_{title}.{extension}" cookiedomain = None root = "https://www.deviantart.com" _last_request = 0 def __init__(self, match): Extractor.__init__(self, match) self.offset = 0 self.flat = self.config("flat", True) self.extra = self.config("extra", False) self.original = self.config("original", True) self.comments = self.config("comments", False) self.user = match.group(1) or match.group(2) self.group = False self.api = None unwatch = self.config("auto-unwatch") if unwatch: self.unwatch = [] self.finalize = self._unwatch_premium else: self.unwatch = None if self.original != "image": self._update_content = self._update_content_default else: self._update_content = self._update_content_image self.original = True self._premium_cache = {} self.commit_journal = { "html": self._commit_journal_html, "text": self._commit_journal_text, }.get(self.config("journals", "html")) def skip(self, num): self.offset += num return num def items(self): self.api = DeviantartOAuthAPI(self) if self.user: profile = self.api.user_profile(self.user) self.group = not profile if self.group: self.subcategory = "group-" + self.subcategory self.user = self.user.lower() else: self.user = profile["user"]["username"] for deviation in self.deviations(): if isinstance(deviation, tuple): url, data = deviation yield Message.Queue, url, data continue if "premium_folder_data" in deviation: data = self._fetch_premium(deviation) if not data: continue deviation.update(data) self.prepare(deviation) yield Message.Directory, deviation if "content" in deviation: content = deviation["content"] if self.original and deviation["is_downloadable"]: self._update_content(deviation, content) else: self._update_token(deviation, content) yield self.commit(deviation, 
content) elif deviation["is_downloadable"]: content = self.api.deviation_download(deviation["deviationid"]) yield self.commit(deviation, content) if "videos" in deviation and deviation["videos"]: video = max(deviation["videos"], key=lambda x: text.parse_int(x["quality"][:-1])) yield self.commit(deviation, video) if "flash" in deviation: yield self.commit(deviation, deviation["flash"]) if "excerpt" in deviation and self.commit_journal: journal = self.api.deviation_content(deviation["deviationid"]) if self.extra: deviation["_journal"] = journal["html"] yield self.commit_journal(deviation, journal) if self.extra: txt = (deviation.get("description", "") + deviation.get("_journal", "")) for match in DeviantartStashExtractor.pattern.finditer(txt): url = text.ensure_http_scheme(match.group(0)) deviation["_extractor"] = DeviantartStashExtractor yield Message.Queue, url, deviation def deviations(self): """Return an iterable containing all relevant Deviation-objects""" def prepare(self, deviation): """Adjust the contents of a Deviation-object""" if "index" not in deviation: try: deviation["index"] = text.parse_int( deviation["url"].rpartition("-")[2]) except KeyError: deviation["index"] = 0 if self.user: deviation["username"] = self.user deviation["_username"] = self.user.lower() else: deviation["username"] = deviation["author"]["username"] deviation["_username"] = deviation["username"].lower() deviation["da_category"] = deviation["category"] deviation["published_time"] = text.parse_int( deviation["published_time"]) deviation["date"] = text.parse_timestamp( deviation["published_time"]) if self.comments: deviation["comments"] = ( self.api.comments_deviation(deviation["deviationid"]) if deviation["stats"]["comments"] else () ) # filename metadata alphabet = "0123456789abcdefghijklmnopqrstuvwxyz" deviation["index_base36"] = util.bencode(deviation["index"], alphabet) sub = re.compile(r"\W").sub deviation["filename"] = "".join(( sub("_", deviation["title"].lower()), "_by_", sub("_", deviation["author"]["username"].lower()), "-d", deviation["index_base36"], )) @staticmethod def commit(deviation, target): url = target["src"] name = target.get("filename") or url target = target.copy() target["filename"] = deviation["filename"] deviation["target"] = target deviation["extension"] = target["extension"] = text.ext_from_url(name) return Message.Url, url, deviation def _commit_journal_html(self, deviation, journal): title = text.escape(deviation["title"]) url = deviation["url"] thumbs = deviation.get("thumbs") or deviation.get("files") html = journal["html"] shadow = SHADOW_TEMPLATE.format_map(thumbs[0]) if thumbs else "" if "css" in journal: css, cls = journal["css"], "withskin" elif html.startswith("<style"): css, _, html = html.partition("</style>") css = css.partition(">")[2] cls = "withskin" else: css, cls = "", "journal-green" if html.find('<div class="boxtop journaltop">', 0, 250) != -1: needle = '<div class="boxtop journaltop">' header = HEADER_CUSTOM_TEMPLATE.format( title=title, url=url, date=deviation["date"], ) else: needle = '<div usr class="gr">' catlist = deviation["category_path"].split("/") categories = " / ".join( ('<span class="crumb"><a href="{}/{}/"><span>{}</span></a>' '</span>').format(self.root, cpath, cat.capitalize()) for cat, cpath in zip( catlist, itertools.accumulate(catlist, lambda t, c: t + "/" + c) ) ) username = deviation["author"]["username"] urlname = deviation.get("username") or username.lower() header = HEADER_TEMPLATE.format( title=title, url=url, 
userurl="{}/{}/".format(self.root, urlname), username=username, date=deviation["date"], categories=categories, ) if needle in html: html = html.replace(needle, header, 1) else: html = JOURNAL_TEMPLATE_HTML_EXTRA.format(header, html) html = JOURNAL_TEMPLATE_HTML.format( title=title, html=html, shadow=shadow, css=css, cls=cls) deviation["extension"] = "htm" return Message.Url, html, deviation @staticmethod def _commit_journal_text(deviation, journal): html = journal["html"] if html.startswith("<style"): html = html.partition("</style>")[2] content = "\n".join( text.unescape(text.remove_html(txt)) for txt in html.rpartition("<script")[0].split("<br />") ) txt = JOURNAL_TEMPLATE_TEXT.format( title=deviation["title"], username=deviation["author"]["username"], date=deviation["date"], content=content, ) deviation["extension"] = "txt" return Message.Url, txt, deviation @staticmethod def _find_folder(folders, name, uuid): if uuid.isdecimal(): match = re.compile(name.replace( "-", r"[^a-z0-9]+") + "$", re.IGNORECASE).match for folder in folders: if match(folder["name"]): return folder else: for folder in folders: if folder["folderid"] == uuid: return folder raise exception.NotFoundError("folder") def _folder_urls(self, folders, category, extractor): base = "{}/{}/{}/".format(self.root, self.user, category) for folder in folders: folder["_extractor"] = extractor url = "{}{}/{}".format(base, folder["folderid"], folder["name"]) yield url, folder def _update_content_default(self, deviation, content): public = "premium_folder_data" not in deviation data = self.api.deviation_download(deviation["deviationid"], public) content.update(data) def _update_content_image(self, deviation, content): data = self.api.deviation_download(deviation["deviationid"]) url = data["src"].partition("?")[0] mtype = mimetypes.guess_type(url, False)[0] if mtype and mtype.startswith("image/"): content.update(data) def _update_token(self, deviation, content): """Replace JWT to be able to remove width/height limits All credit goes to @Ironchest337 for discovering and implementing this method """ url, sep, _ = content["src"].partition("/v1/") if not sep: return # header = b'{"typ":"JWT","alg":"none"}' payload = ( b'{"sub":"urn:app:","iss":"urn:app:","obj":[[{"path":"/f/' + url.partition("/f/")[2].encode() + b'"}]],"aud":["urn:service:file.download"]}' ) deviation["_fallback"] = (content["src"],) content["src"] = ( "{}?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJub25lIn0.{}.".format( url, # base64 of 'header' is precomputed as 'eyJ0eX...' # binascii.a2b_base64(header).rstrip(b"=\n").decode(), binascii.b2a_base64(payload).rstrip(b"=\n").decode()) ) def _limited_request(self, url, **kwargs): """Limits HTTP requests to one every 2 seconds""" kwargs["fatal"] = None diff = time.time() - DeviantartExtractor._last_request if diff < 2.0: delay = 2.0 - diff self.log.debug("Sleeping %.2f seconds", delay) time.sleep(delay) while True: response = self.request(url, **kwargs) if response.status_code != 403 or \ b"Request blocked." 
not in response.content: DeviantartExtractor._last_request = time.time() return response self.wait(seconds=180) def _fetch_premium(self, deviation): try: return self._premium_cache[deviation["deviationid"]] except KeyError: pass if not self.api.refresh_token_key: self.log.warning( "Unable to access premium content (no refresh-token)") self._fetch_premium = lambda _: None return None dev = self.api.deviation(deviation["deviationid"], False) folder = dev["premium_folder_data"] username = dev["author"]["username"] has_access = folder["has_access"] if not has_access and folder["type"] == "watchers" and \ self.config("auto-watch"): if self.unwatch is not None: self.unwatch.append(username) if self.api.user_friends_watch(username): has_access = True self.log.info( "Watching %s for premium folder access", username) else: self.log.warning( "Error when trying to watch %s. " "Try again with a new refresh-token", username) if has_access: self.log.info("Fetching premium folder data") else: self.log.warning("Unable to access premium content (type: %s)", folder["type"]) cache = self._premium_cache for dev in self.api.gallery( username, folder["gallery_id"], public=False): cache[dev["deviationid"]] = dev if has_access else None return cache[deviation["deviationid"]] def _unwatch_premium(self): for username in self.unwatch: self.log.info("Unwatching %s", username) self.api.user_friends_unwatch(username) class DeviantartUserExtractor(DeviantartExtractor): """Extractor for an artist's user profile""" subcategory = "user" pattern = BASE_PATTERN + r"/?$" test = ( ("https://www.deviantart.com/shimoda7", { "pattern": r"/shimoda7/gallery$", }), ("https://www.deviantart.com/shimoda7", { "options": (("include", "all"),), "pattern": r"/shimoda7/(gallery(/scraps)?|posts|favourites)$", "count": 4, }), ("https://shimoda7.deviantart.com/"), ) def items(self): base = "{}/{}/".format(self.root, self.user) return self._dispatch_extractors(( (DeviantartGalleryExtractor , base + "gallery"), (DeviantartScrapsExtractor , base + "gallery/scraps"), (DeviantartJournalExtractor , base + "posts"), (DeviantartFavoriteExtractor, base + "favourites"), ), ("gallery",)) ############################################################################### # OAuth ####################################################################### class DeviantartGalleryExtractor(DeviantartExtractor): """Extractor for all deviations from an artist's gallery""" subcategory = "gallery" archive_fmt = "g_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/gallery(?:/all|/?\?catpath=)?/?$" test = ( ("https://www.deviantart.com/shimoda7/gallery/", { "pattern": r"https://(api-da\.wixmp\.com/_api/download/file" r"|images-wixmp-[^.]+.wixmp.com/f/.+/.+.jpg\?token=.+)", "count": ">= 30", "keyword": { "allows_comments": bool, "author": { "type": "regular", "usericon": str, "userid": "9AE51FC7-0278-806C-3FFF-F4961ABF9E2B", "username": "shimoda7", }, "category_path": str, "content": { "filesize": int, "height": int, "src": str, "transparency": bool, "width": int, }, "da_category": str, "date": "type:datetime", "deviationid": str, "?download_filesize": int, "extension": str, "index": int, "is_deleted": bool, "is_downloadable": bool, "is_favourited": bool, "is_mature": bool, "preview": { "height": int, "src": str, "transparency": bool, "width": int, }, "published_time": int, "stats": { "comments": int, "favourites": int, }, "target": dict, "thumbs": list, "title": str, "url": r"re:https://www.deviantart.com/shimoda7/art/[^/]+-\d+", "username": "shimoda7", }, }), 
# group ("https://www.deviantart.com/yakuzafc/gallery", { "pattern": r"https://www.deviantart.com/yakuzafc/gallery" r"/\w{8}-\w{4}-\w{4}-\w{4}-\w{12}/", "count": ">= 15", }), # 'folders' option (#276) ("https://www.deviantart.com/justatest235723/gallery", { "count": 3, "options": (("metadata", 1), ("folders", 1), ("original", 0)), "keyword": { "description": str, "folders": list, "is_watching": bool, "license": str, "tags": list, }, }), ("https://www.deviantart.com/shimoda8/gallery/", { "exception": exception.NotFoundError, }), ("https://www.deviantart.com/shimoda7/gallery"), ("https://www.deviantart.com/shimoda7/gallery/all"), ("https://www.deviantart.com/shimoda7/gallery/?catpath=/"), ("https://shimoda7.deviantart.com/gallery/"), ("https://shimoda7.deviantart.com/gallery/all/"), ("https://shimoda7.deviantart.com/gallery/?catpath=/"), ) def deviations(self): if self.flat and not self.group: return self.api.gallery_all(self.user, self.offset) folders = self.api.gallery_folders(self.user) return self._folder_urls(folders, "gallery", DeviantartFolderExtractor) class DeviantartFolderExtractor(DeviantartExtractor): """Extractor for deviations inside an artist's gallery folder""" subcategory = "folder" directory_fmt = ("{category}", "{username}", "{folder[title]}") archive_fmt = "F_{folder[uuid]}_{index}.{extension}" pattern = BASE_PATTERN + r"/gallery/([^/?#]+)/([^/?#]+)" test = ( # user ("https://www.deviantart.com/shimoda7/gallery/722019/Miscellaneous", { "count": 5, "options": (("original", False),), }), # group ("https://www.deviantart.com/yakuzafc/gallery/37412168/Crafts", { "count": ">= 4", "options": (("original", False),), }), # uuid (("https://www.deviantart.com/shimoda7/gallery" "/B38E3C6A-2029-6B45-757B-3C8D3422AD1A/misc"), { "count": 5, "options": (("original", False),), }), # name starts with '_', special characters (#1451) (("https://www.deviantart.com/justatest235723" "/gallery/69302698/-test-b-c-d-e-f-"), { "count": 1, "options": (("original", False),), }), ("https://shimoda7.deviantart.com/gallery/722019/Miscellaneous"), ("https://yakuzafc.deviantart.com/gallery/37412168/Crafts"), ) def __init__(self, match): DeviantartExtractor.__init__(self, match) self.folder = None self.folder_id = match.group(3) self.folder_name = match.group(4) def deviations(self): folders = self.api.gallery_folders(self.user) folder = self._find_folder(folders, self.folder_name, self.folder_id) self.folder = { "title": folder["name"], "uuid" : folder["folderid"], "index": self.folder_id, "owner": self.user, } return self.api.gallery(self.user, folder["folderid"], self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["folder"] = self.folder class DeviantartStashExtractor(DeviantartExtractor): """Extractor for sta.sh-ed deviations""" subcategory = "stash" archive_fmt = "{index}.{extension}" pattern = r"(?:https?://)?sta\.sh/([a-z0-9]+)" test = ( ("https://sta.sh/022c83odnaxc", { "pattern": r"https://api-da\.wixmp\.com/_api/download/file", "content": "057eb2f2861f6c8a96876b13cca1a4b7a408c11f", "count": 1, }), # multiple stash items ("https://sta.sh/21jf51j7pzl2", { "options": (("original", False),), "count": 4, }), # downloadable, but no "content" field (#307) ("https://sta.sh/024t4coz16mi", { "pattern": r"https://api-da\.wixmp\.com/_api/download/file", "count": 1, }), # mixed folders and images (#659) ("https://sta.sh/215twi387vfj", { "options": (("original", False),), "count": 4, }), ("https://sta.sh/abcdefghijkl", { "count": 0, }), ) skip = Extractor.skip def 
__init__(self, match): DeviantartExtractor.__init__(self, match) self.user = None self.stash_id = match.group(1) def deviations(self, stash_id=None): if stash_id is None: stash_id = self.stash_id url = "https://sta.sh/" + stash_id page = self._limited_request(url).text if stash_id[0] == "0": uuid = text.extract(page, '//deviation/', '"')[0] if uuid: deviation = self.api.deviation(uuid) deviation["index"] = text.parse_int(text.extract( page, 'gmi-deviationid="', '"')[0]) yield deviation return for item in text.extract_iter( page, 'class="stash-thumb-container', '</div>'): url = text.extract(item, '<a href="', '"')[0] if url: stash_id = url.rpartition("/")[2] else: stash_id = text.extract(item, 'gmi-stashid="', '"')[0] stash_id = "2" + util.bencode(text.parse_int( stash_id), "0123456789abcdefghijklmnopqrstuvwxyz") if len(stash_id) > 2: yield from self.deviations(stash_id) class DeviantartFavoriteExtractor(DeviantartExtractor): """Extractor for an artist's favorites""" subcategory = "favorite" directory_fmt = ("{category}", "{username}", "Favourites") archive_fmt = "f_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/favourites(?:/all|/?\?catpath=)?/?$" test = ( ("https://www.deviantart.com/h3813067/favourites/", { "options": (("metadata", True), ("flat", False)), # issue #271 "count": 1, }), ("https://www.deviantart.com/h3813067/favourites/", { "content": "6a7c74dc823ebbd457bdd9b3c2838a6ee728091e", }), ("https://www.deviantart.com/h3813067/favourites/all"), ("https://www.deviantart.com/h3813067/favourites/?catpath=/"), ("https://h3813067.deviantart.com/favourites/"), ("https://h3813067.deviantart.com/favourites/all"), ("https://h3813067.deviantart.com/favourites/?catpath=/"), ) def deviations(self): folders = self.api.collections_folders(self.user) if self.flat: deviations = itertools.chain.from_iterable( self.api.collections(self.user, folder["folderid"]) for folder in folders ) if self.offset: deviations = util.advance(deviations, self.offset) return deviations return self._folder_urls( folders, "favourites", DeviantartCollectionExtractor) class DeviantartCollectionExtractor(DeviantartExtractor): """Extractor for a single favorite collection""" subcategory = "collection" directory_fmt = ("{category}", "{username}", "Favourites", "{collection[title]}") archive_fmt = "C_{collection[uuid]}_{index}.{extension}" pattern = BASE_PATTERN + r"/favourites/([^/?#]+)/([^/?#]+)" test = ( (("https://www.deviantart.com/pencilshadings/favourites" "/70595441/3D-Favorites"), { "count": ">= 20", "options": (("original", False),), }), (("https://www.deviantart.com/pencilshadings/favourites" "/F050486B-CB62-3C66-87FB-1105A7F6379F/3D Favorites"), { "count": ">= 20", "options": (("original", False),), }), ("https://pencilshadings.deviantart.com" "/favourites/70595441/3D-Favorites"), ) def __init__(self, match): DeviantartExtractor.__init__(self, match) self.collection = None self.collection_id = match.group(3) self.collection_name = match.group(4) def deviations(self): folders = self.api.collections_folders(self.user) folder = self._find_folder( folders, self.collection_name, self.collection_id) self.collection = { "title": folder["name"], "uuid" : folder["folderid"], "index": self.collection_id, "owner": self.user, } return self.api.collections(self.user, folder["folderid"], self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["collection"] = self.collection class DeviantartJournalExtractor(DeviantartExtractor): """Extractor for an artist's journals""" 
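# Sketch: the "index_base36" filename component built in prepare() and the
# numeric sta.sh ids converted in DeviantartStashExtractor both rely on
# util.bencode() with a lowercase base-36 alphabet.  The helper below is a
# hypothetical re-implementation for illustration only, not the actual
# gallery_dl.util code.
def _base36_sketch(num, alphabet="0123456789abcdefghijklmnopqrstuvwxyz"):
    out = ""
    while num:
        num, digit = divmod(num, len(alphabet))
        out = alphabet[digit] + out
    return out or "0"

# _base36_sketch(214724929) == "3juatd", matching the
# "flash_comic_tutorial_by_yuumei-d3juatd" filename in the test data below.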
subcategory = "journal" directory_fmt = ("{category}", "{username}", "Journal") archive_fmt = "j_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/(?:posts(?:/journals)?|journal)/?(?:\?.*)?$" test = ( ("https://www.deviantart.com/angrywhitewanker/posts/journals/", { "url": "38db2a0d3a587a7e0f9dba7ff7d274610ebefe44", }), ("https://www.deviantart.com/angrywhitewanker/posts/journals/", { "url": "b2a8e74d275664b1a4acee0fca0a6fd33298571e", "options": (("journals", "text"),), }), ("https://www.deviantart.com/angrywhitewanker/posts/journals/", { "count": 0, "options": (("journals", "none"),), }), ("https://www.deviantart.com/shimoda7/posts/"), ("https://www.deviantart.com/shimoda7/journal/"), ("https://www.deviantart.com/shimoda7/journal/?catpath=/"), ("https://shimoda7.deviantart.com/journal/"), ("https://shimoda7.deviantart.com/journal/?catpath=/"), ) def deviations(self): return self.api.browse_user_journals(self.user, self.offset) class DeviantartPopularExtractor(DeviantartExtractor): """Extractor for popular deviations""" subcategory = "popular" directory_fmt = ("{category}", "Popular", "{popular[range]}", "{popular[search]}") archive_fmt = "P_{popular[range]}_{popular[search]}_{index}.{extension}" pattern = (r"(?:https?://)?www\.deviantart\.com/(?:" r"search(?:/deviations)?" r"|(?:deviations/?)?\?order=(popular-[^/?#]+)" r"|((?:[\w-]+/)*)(popular-[^/?#]+)" r")/?(?:\?([^#]*))?") test = ( ("https://www.deviantart.com/?order=popular-all-time", { "options": (("original", False),), "range": "1-30", "count": 30, }), ("https://www.deviantart.com/popular-24-hours/?q=tree+house", { "options": (("original", False),), "range": "1-30", "count": 30, }), ("https://www.deviantart.com/search?q=tree"), ("https://www.deviantart.com/search/deviations?order=popular-1-week"), ("https://www.deviantart.com/artisan/popular-all-time/?q=tree"), ) def __init__(self, match): DeviantartExtractor.__init__(self, match) self.user = "" trange1, path, trange2, query = match.groups() query = text.parse_query(query) self.search_term = query.get("q") trange = trange1 or trange2 or query.get("order", "") if trange.startswith("popular-"): trange = trange[8:] self.time_range = { "newest" : "now", "most-recent" : "now", "this-week" : "1week", "this-month" : "1month", "this-century": "alltime", "all-time" : "alltime", }.get(trange, "alltime") self.popular = { "search": self.search_term or "", "range" : trange or "all-time", "path" : path.strip("/") if path else "", } def deviations(self): if self.time_range == "now": return self.api.browse_newest(self.search_term, self.offset) return self.api.browse_popular( self.search_term, self.time_range, self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["popular"] = self.popular class DeviantartTagExtractor(DeviantartExtractor): """Extractor for deviations from tag searches""" subcategory = "tag" directory_fmt = ("{category}", "Tags", "{search_tags}") archive_fmt = "T_{search_tags}_{index}.{extension}" pattern = r"(?:https?://)?www\.deviantart\.com/tag/([^/?#]+)" test = ("https://www.deviantart.com/tag/nature", { "options": (("original", False),), "range": "1-30", "count": 30, }) def __init__(self, match): DeviantartExtractor.__init__(self, match) self.tag = text.unquote(match.group(1)) def deviations(self): return self.api.browse_tags(self.tag, self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["search_tags"] = self.tag class DeviantartWatchExtractor(DeviantartExtractor): """Extractor for 
Deviations from watched users""" subcategory = "watch" pattern = (r"(?:https?://)?(?:www\.)?deviantart\.com" r"/(?:watch/deviations|notifications/watch)()()") test = ( ("https://www.deviantart.com/watch/deviations"), ("https://www.deviantart.com/notifications/watch"), ) def deviations(self): return self.api.browse_deviantsyouwatch() class DeviantartWatchPostsExtractor(DeviantartExtractor): """Extractor for Posts from watched users""" subcategory = "watch-posts" pattern = r"(?:https?://)?(?:www\.)?deviantart\.com/watch/posts()()" test = ("https://www.deviantart.com/watch/posts",) def deviations(self): return self.api.browse_posts_deviantsyouwatch() ############################################################################### # Eclipse ##################################################################### class DeviantartDeviationExtractor(DeviantartExtractor): """Extractor for single deviations""" subcategory = "deviation" archive_fmt = "g_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/(art|journal)/(?:[^/?#]+-)?(\d+)" test = ( (("https://www.deviantart.com/shimoda7/art/For-the-sake-10073852"), { "options": (("original", 0),), "content": "6a7c74dc823ebbd457bdd9b3c2838a6ee728091e", }), ("https://www.deviantart.com/zzz/art/zzz-1234567890", { "exception": exception.NotFoundError, }), (("https://www.deviantart.com/myria-moon/art/Aime-Moi-261986576"), { "options": (("comments", True),), "pattern": r"https://api-da\.wixmp\.com/_api/download/file", "keyword": {"comments": list}, }), # wixmp URL rewrite (("https://www.deviantart.com/citizenfresh/art/Hverarond-789295466"), { "pattern": (r"https://images-wixmp-\w+\.wixmp\.com/f" r"/[^/]+/[^.]+\.jpg\?token="), }), # GIF (#242) (("https://www.deviantart.com/skatergators/art/COM-Moni-781571783"), { "pattern": (r"https://images-wixmp-\w+\.wixmp\.com" r"/f/[^/]+/[^.]+\.gif\?token="), }), # Flash animation with GIF preview (#1731) ("https://www.deviantart.com/yuumei/art/Flash-Comic-214724929", { "pattern": r"https://api-da\.wixmp\.com/_api/download" r"/file\?downloadToken=.+", "keyword": { "filename": "flash_comic_tutorial_by_yuumei-d3juatd", "extension": "swf", }, }), # sta.sh URLs from description (#302) (("https://www.deviantart.com/uotapo/art/INANAKI-Memo-590297498"), { "options": (("extra", 1), ("original", 0)), "pattern": DeviantartStashExtractor.pattern, "range": "2-", "count": 4, }), # video ("https://www.deviantart.com/chi-u/art/-VIDEO-Brushes-330774593", { "pattern": r"https://wixmp-.+wixmp.com/v/mp4/.+\.720p\.\w+.mp4", "keyword": { "filename": r"re:_video____brushes_\w+_by_chi_u-d5gxnb5", "extension": "mp4", "target": { "duration": 306, "filesize": 19367585, "quality": "720p", "src": str, }, } }), # journal ("https://www.deviantart.com/shimoda7/journal/ARTility-583755752", { "url": "d34b2c9f873423e665a1b8ced20fcb75951694a3", "pattern": "text:<!DOCTYPE html>\n", }), # journal-like post with isJournal == False (#419) ("https://www.deviantart.com/gliitchlord/art/brashstrokes-812942668", { "url": "e2e0044bd255304412179b6118536dbd9bb3bb0e", "pattern": "text:<!DOCTYPE html>\n", }), # old-style URLs ("https://shimoda7.deviantart.com" "/art/For-the-sake-of-a-memory-10073852"), ("https://myria-moon.deviantart.com" "/art/Aime-Moi-part-en-vadrouille-261986576"), ("https://zzz.deviantart.com/art/zzz-1234567890"), ) skip = Extractor.skip def __init__(self, match): DeviantartExtractor.__init__(self, match) self.type = match.group(3) self.deviation_id = match.group(4) def deviations(self): deviation = 
DeviantartEclipseAPI(self).deviation_extended_fetch( self.deviation_id, self.user, self.type) if "error" in deviation: raise exception.NotFoundError("deviation") return (self.api.deviation( deviation["deviation"]["extended"]["deviationUuid"]),) class DeviantartScrapsExtractor(DeviantartExtractor): """Extractor for an artist's scraps""" subcategory = "scraps" directory_fmt = ("{category}", "{username}", "Scraps") archive_fmt = "s_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/gallery/(?:\?catpath=)?scraps\b" test = ( ("https://www.deviantart.com/shimoda7/gallery/scraps", { "count": 12, }), ("https://www.deviantart.com/shimoda7/gallery/?catpath=scraps"), ("https://shimoda7.deviantart.com/gallery/?catpath=scraps"), ) cookiedomain = ".deviantart.com" cookienames = ("auth", "auth_secure", "userinfo") def deviations(self): eclipse_api = DeviantartEclipseAPI(self) for obj in eclipse_api.gallery_scraps(self.user, self.offset): deviation = obj["deviation"] deviation_uuid = eclipse_api.deviation_extended_fetch( deviation["deviationId"], deviation["author"]["username"], "journal" if deviation["isJournal"] else "art", )["deviation"]["extended"]["deviationUuid"] yield self.api.deviation(deviation_uuid) class DeviantartFollowingExtractor(DeviantartExtractor): """Extractor for user's watched users""" subcategory = "following" pattern = BASE_PATTERN + "/about#watching$" test = ("https://www.deviantart.com/shimoda7/about#watching", { "pattern": DeviantartUserExtractor.pattern, "range": "1-50", "count": 50, }) def items(self): eclipse_api = DeviantartEclipseAPI(self) for user in eclipse_api.user_watching(self.user, self.offset): url = "{}/{}".format(self.root, user["username"]) user["_extractor"] = DeviantartUserExtractor yield Message.Queue, url, user ############################################################################### # API Interfaces ############################################################## class DeviantartOAuthAPI(): """Interface for the DeviantArt OAuth API Ref: https://www.deviantart.com/developers/http/v1/20160316 """ CLIENT_ID = "5388" CLIENT_SECRET = "76b08c69cfb27f26d6161f9ab6d061a1" def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.headers = {"dA-minor-version": "20200519"} self._warn_429 = True self.delay = extractor.config("wait-min", 0) self.delay_min = max(2, self.delay) self.mature = extractor.config("mature", "true") if not isinstance(self.mature, str): self.mature = "true" if self.mature else "false" self.folders = extractor.config("folders", False) self.metadata = extractor.extra or extractor.config("metadata", False) self.client_id = extractor.config("client-id") if self.client_id: self.client_secret = extractor.config("client-secret") else: self.client_id = self.CLIENT_ID self.client_secret = self.CLIENT_SECRET token = extractor.config("refresh-token") if token is None or token == "cache": token = "#" + str(self.client_id) if not _refresh_token_cache(token): token = None self.refresh_token_key = token self.log.debug( "Using %s API credentials (client-id %s)", "default" if self.client_id == self.CLIENT_ID else "custom", self.client_id, ) def browse_deviantsyouwatch(self, offset=0): """Yield deviations from users you watch""" endpoint = "/browse/deviantsyouwatch" params = {"limit": "50", "offset": offset, "mature_content": self.mature} return self._pagination(endpoint, params, public=False) def browse_posts_deviantsyouwatch(self, offset=0): """Yield posts from users you watch""" endpoint = "/browse/posts/deviantsyouwatch" 
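# DeviantartOAuthAPI authenticates by POSTing to DeviantArt's token endpoint
# (see _authenticate_impl further down in this class).  A standalone sketch of
# the public client_credentials flow: fetch_public_token is a hypothetical
# helper, and it assumes the `requests` package plus a registered
# client id / client secret pair.
import requests

def fetch_public_token(client_id, client_secret):
    response = requests.post(
        "https://www.deviantart.com/oauth2/token",
        data={"grant_type": "client_credentials"},
        auth=(client_id, client_secret),
    )
    # the wrapper stores this string as its "Authorization" header value
    return "Bearer " + response.json()["access_token"]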
params = {"limit": "50", "offset": offset, "mature_content": self.mature} return self._pagination(endpoint, params, public=False, unpack=True) def browse_newest(self, query=None, offset=0): """Browse newest deviations""" endpoint = "/browse/newest" params = { "q" : query, "limit" : 50 if self.metadata else 120, "offset" : offset, "mature_content": self.mature, } return self._pagination(endpoint, params) def browse_popular(self, query=None, timerange=None, offset=0): """Yield popular deviations""" endpoint = "/browse/popular" params = { "q" : query, "limit" : 50 if self.metadata else 120, "timerange" : timerange, "offset" : offset, "mature_content": self.mature, } return self._pagination(endpoint, params) def browse_tags(self, tag, offset=0): """ Browse a tag """ endpoint = "/browse/tags" params = { "tag" : tag, "offset" : offset, "limit" : 50, "mature_content": self.mature, } return self._pagination(endpoint, params) def browse_user_journals(self, username, offset=0): """Yield all journal entries of a specific user""" endpoint = "/browse/user/journals" params = {"username": username, "offset": offset, "limit": 50, "mature_content": self.mature, "featured": "false"} return self._pagination(endpoint, params) def collections(self, username, folder_id, offset=0): """Yield all Deviation-objects contained in a collection folder""" endpoint = "/collections/" + folder_id params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) @memcache(keyarg=1) def collections_folders(self, username, offset=0): """Yield all collection folders of a specific user""" endpoint = "/collections/folders" params = {"username": username, "offset": offset, "limit": 50, "mature_content": self.mature} return self._pagination_list(endpoint, params) def comments_deviation(self, deviation_id, offset=0): """Fetch comments posted on a deviation""" endpoint = "/comments/deviation/" + deviation_id params = {"maxdepth": "5", "offset": offset, "limit": 50, "mature_content": self.mature} return self._pagination_list(endpoint, params=params, key="thread") def deviation(self, deviation_id, public=True): """Query and return info about a single Deviation""" endpoint = "/deviation/" + deviation_id deviation = self._call(endpoint, public=public) if self.metadata: self._metadata((deviation,)) if self.folders: self._folders((deviation,)) return deviation def deviation_content(self, deviation_id, public=False): """Get extended content of a single Deviation""" endpoint = "/deviation/content" params = {"deviationid": deviation_id} return self._call(endpoint, params=params, public=public) def deviation_download(self, deviation_id, public=True): """Get the original file download (if allowed)""" endpoint = "/deviation/download/" + deviation_id params = {"mature_content": self.mature} return self._call(endpoint, params=params, public=public) def deviation_metadata(self, deviations): """ Fetch deviation metadata for a set of deviations""" if not deviations: return [] endpoint = "/deviation/metadata?" 
+ "&".join( "deviationids[{}]={}".format(num, deviation["deviationid"]) for num, deviation in enumerate(deviations) ) params = {"mature_content": self.mature} return self._call(endpoint, params=params)["metadata"] def gallery(self, username, folder_id, offset=0, extend=True, public=True): """Yield all Deviation-objects contained in a gallery folder""" endpoint = "/gallery/" + folder_id params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature, "mode": "newest"} return self._pagination(endpoint, params, extend, public) def gallery_all(self, username, offset=0): """Yield all Deviation-objects of a specific user""" endpoint = "/gallery/all" params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) @memcache(keyarg=1) def gallery_folders(self, username, offset=0): """Yield all gallery folders of a specific user""" endpoint = "/gallery/folders" params = {"username": username, "offset": offset, "limit": 50, "mature_content": self.mature} return self._pagination_list(endpoint, params) @memcache(keyarg=1) def user_profile(self, username): """Get user profile information""" endpoint = "/user/profile/" + username return self._call(endpoint, fatal=False) def user_friends_watch(self, username): """Watch a user""" endpoint = "/user/friends/watch/" + username data = { "watch[friend]" : "0", "watch[deviations]" : "0", "watch[journals]" : "0", "watch[forum_threads]": "0", "watch[critiques]" : "0", "watch[scraps]" : "0", "watch[activity]" : "0", "watch[collections]" : "0", "mature_content" : self.mature, } return self._call( endpoint, method="POST", data=data, public=False, fatal=False, ).get("success") def user_friends_unwatch(self, username): """Unwatch a user""" endpoint = "/user/friends/unwatch/" + username return self._call( endpoint, method="POST", public=False, fatal=False, ).get("success") def authenticate(self, refresh_token_key): """Authenticate the application by requesting an access token""" self.headers["Authorization"] = \ self._authenticate_impl(refresh_token_key) @cache(maxage=3600, keyarg=1) def _authenticate_impl(self, refresh_token_key): """Actual authenticate implementation""" url = "https://www.deviantart.com/oauth2/token" if refresh_token_key: self.log.info("Refreshing private access token") data = {"grant_type": "refresh_token", "refresh_token": _refresh_token_cache(refresh_token_key)} else: self.log.info("Requesting public access token") data = {"grant_type": "client_credentials"} auth = (self.client_id, self.client_secret) response = self.extractor.request( url, method="POST", data=data, auth=auth, fatal=False) data = response.json() if response.status_code != 200: self.log.debug("Server response: %s", data) raise exception.AuthenticationError('"{}" ({})'.format( data.get("error_description"), data.get("error"))) if refresh_token_key: _refresh_token_cache.update( refresh_token_key, data["refresh_token"]) return "Bearer " + data["access_token"] def _call(self, endpoint, fatal=True, public=True, **kwargs): """Call an API endpoint""" url = "https://www.deviantart.com/api/v1/oauth2" + endpoint kwargs["fatal"] = None while True: if self.delay: time.sleep(self.delay) self.authenticate(None if public else self.refresh_token_key) kwargs["headers"] = self.headers response = self.extractor.request(url, **kwargs) data = response.json() status = response.status_code if 200 <= status < 400: if self.delay > self.delay_min: self.delay -= 1 return data if not fatal and status != 429: 
return None if data.get("error_description") == "User not found.": raise exception.NotFoundError("user or group") self.log.debug(response.text) msg = "API responded with {} {}".format( status, response.reason) if status == 429: if self.delay < 30: self.delay += 1 self.log.warning("%s. Using %ds delay.", msg, self.delay) if self._warn_429 and self.delay >= 3: self._warn_429 = False if self.client_id == self.CLIENT_ID: self.log.info( "Register your own OAuth application and use its " "credentials to prevent this error: " "https://github.com/mikf/gallery-dl/blob/master/do" "cs/configuration.rst#extractordeviantartclient-id" "--client-secret") else: self.log.error(msg) return data def _pagination(self, endpoint, params, extend=True, public=True, unpack=False, key="results"): warn = True while True: data = self._call(endpoint, params=params, public=public) if key not in data: self.log.error("Unexpected API response: %s", data) return results = data[key] if unpack: results = [item["journal"] for item in results if "journal" in item] if extend: if public and len(results) < params["limit"]: if self.refresh_token_key: self.log.debug("Switching to private access token") public = False continue elif data["has_more"] and warn: warn = False self.log.warning( "Private deviations detected! Run 'gallery-dl " "oauth:deviantart' and follow the instructions to " "be able to access them.") if self.metadata: self._metadata(results) if self.folders: self._folders(results) yield from results if not data["has_more"]: return if "next_cursor" in data: params["offset"] = None params["cursor"] = data["next_cursor"] else: params["offset"] = data["next_offset"] params["cursor"] = None def _pagination_list(self, endpoint, params, key="results"): result = [] result.extend(self._pagination(endpoint, params, False, key=key)) return result def _metadata(self, deviations): """Add extended metadata to each deviation object""" for deviation, metadata in zip( deviations, self.deviation_metadata(deviations)): deviation.update(metadata) deviation["tags"] = [t["tag_name"] for t in deviation["tags"]] def _folders(self, deviations): """Add a list of all containing folders to each deviation object""" for deviation in deviations: deviation["folders"] = self._folders_map( deviation["author"]["username"])[deviation["deviationid"]] @memcache(keyarg=1) def _folders_map(self, username): """Generate a deviation_id -> folders mapping for 'username'""" self.log.info("Collecting folder information for '%s'", username) folders = self.gallery_folders(username) # create 'folderid'-to-'folder' mapping fmap = { folder["folderid"]: folder for folder in folders } # add parent names to folders, but ignore "Featured" as parent featured = folders[0]["folderid"] done = False while not done: done = True for folder in folders: parent = folder["parent"] if not parent: pass elif parent == featured: folder["parent"] = None else: parent = fmap[parent] if parent["parent"]: done = False else: folder["name"] = parent["name"] + "/" + folder["name"] folder["parent"] = None # map deviationids to folder names dmap = collections.defaultdict(list) for folder in folders: for deviation in self.gallery( username, folder["folderid"], 0, False): dmap[deviation["deviationid"]].append(folder["name"]) return dmap class DeviantartEclipseAPI(): """Interface to the DeviantArt Eclipse API""" def __init__(self, extractor): self.extractor = extractor self.log = extractor.log def deviation_extended_fetch(self, deviation_id, user=None, kind=None): endpoint = 
"/da-browse/shared_api/deviation/extended_fetch" params = { "deviationid" : deviation_id, "username" : user, "type" : kind, "include_session": "false", } return self._call(endpoint, params) def gallery_scraps(self, user, offset=None): endpoint = "/da-user-profile/api/gallery/contents" params = { "username" : user, "offset" : offset, "limit" : 24, "scraps_folder": "true", } return self._pagination(endpoint, params) def user_watching(self, user, offset=None): endpoint = "/da-user-profile/api/module/watching" params = { "username": user, "moduleid": self._module_id_watching(user), "offset" : offset, "limit" : 24, } return self._pagination(endpoint, params) def _call(self, endpoint, params=None): url = "https://www.deviantart.com/_napi" + endpoint headers = {"Referer": "https://www.deviantart.com/"} response = self.extractor._limited_request( url, params=params, headers=headers, fatal=None) if response.status_code == 404: raise exception.StopExtraction( "Your account must use the Eclipse interface.") try: return response.json() except Exception: return {"error": response.text} def _pagination(self, endpoint, params): while True: data = self._call(endpoint, params) results = data.get("results") if results is None: return yield from results if not data.get("hasMore"): return next_offset = data.get("nextOffset") if next_offset: params["offset"] = next_offset else: params["offset"] += params["limit"] def _module_id_watching(self, user): url = "{}/{}/about".format(self.extractor.root, user) page = self.extractor._limited_request(url).text pos = page.find('\\"type\\":\\"watching\\"') if pos < 0: raise exception.NotFoundError("module") return text.rextract(page, '\\"id\\":', ',', pos)[0].strip('" ') @cache(maxage=100*365*24*3600, keyarg=0) def _refresh_token_cache(token): if token and token[0] == "#": return None return token ############################################################################### # Journal Formats ############################################################# SHADOW_TEMPLATE = """ <span class="shadow"> <img src="{src}" class="smshadow" width="{width}" height="{height}"> </span> <br><br> """ HEADER_TEMPLATE = """<div usr class="gr"> <div class="metadata"> <h2><a href="{url}">{title}</a></h2> <ul> <li class="author"> by <span class="name"><span class="username-with-symbol u"> <a class="u regular username" href="{userurl}">{username}</a>\ <span class="user-symbol regular"></span></span></span>, <span>{date}</span> </li> <li class="category"> {categories} </li> </ul> </div> """ HEADER_CUSTOM_TEMPLATE = """<div class='boxtop journaltop'> <h2> <img src="https://st.deviantart.net/minish/gruzecontrol/icons/journal.gif\ ?2" style="vertical-align:middle" alt=""/> <a href="{url}">{title}</a> </h2> Journal Entry: <span>{date}</span> """ JOURNAL_TEMPLATE_HTML = """text:<!DOCTYPE html> <html> <head> <meta charset="utf-8"> <title>{title}
{shadow}
{html}
""" JOURNAL_TEMPLATE_HTML_EXTRA = """\
\
{}
{}
""" JOURNAL_TEMPLATE_TEXT = """text:{title} by {username}, {date} {content} """ ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/directlink.py0000644000175000017500000000501014176336637021255 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Direct link handling""" from .common import Extractor, Message from .. import text class DirectlinkExtractor(Extractor): """Extractor for direct links to images and other media files""" category = "directlink" filename_fmt = "{domain}/{path}/{filename}.{extension}" archive_fmt = filename_fmt pattern = (r"(?i)https?://(?P[^/?#]+)/(?P[^?#]+\." r"(?:jpe?g|jpe|png|gif|web[mp]|mp4|mkv|og[gmv]|opus))" r"(?:\?(?P[^/?#]*))?(?:#(?P.*))?$") test = ( (("https://en.wikipedia.org/static/images/project-logos/enwiki.png"), { "url": "18c5d00077332e98e53be9fed2ee4be66154b88d", "keyword": "105770a3f4393618ab7b811b731b22663b5d3794", }), # empty path (("https://example.org/file.webm"), { "url": "2d807ed7059d1b532f1bb71dc24b510b80ff943f", "keyword": "29dad729c40fb09349f83edafa498dba1297464a", }), # more complex example ("https://example.org/path/to/file.webm?que=1&ry=2#fragment", { "url": "114b8f1415cc224b0f26488ccd4c2e7ce9136622", "keyword": "06014abd503e3b2b58aa286f9bdcefdd2ae336c0", }), # percent-encoded characters ("https://example.org/%27%3C%23/%23%3E%27.jpg?key=%3C%26%3E", { "url": "2627e8140727fdf743f86fe18f69f99a052c9718", "keyword": "831790fddda081bdddd14f96985ab02dc5b5341f", }), # upper case file extension (#296) ("https://post-phinf.pstatic.net/MjAxOTA1MjlfMTQ4/MDAxNTU5MTI2NjcyNTkw" ".JUzkGb4V6dj9DXjLclrOoqR64uDxHFUO5KDriRdKpGwg.88mCtd4iT1NHlpVKSCaUpP" "mZPiDgT8hmQdQ5K_gYyu0g.JPEG/2.JPG"), ) def __init__(self, match): Extractor.__init__(self, match) self.data = match.groupdict() def items(self): data = self.data for key, value in data.items(): if value: data[key] = text.unquote(value) data["path"], _, name = data["path"].rpartition("/") data["filename"], _, ext = name.rpartition(".") data["extension"] = ext.lower() data["_http_headers"] = { "Referer": self.url.encode("latin-1", "ignore")} yield Message.Directory, data yield Message.Url, self.url, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1646253139.0 gallery_dl-1.21.1/gallery_dl/extractor/dynastyscans.py0000644000175000017500000001266714207752123021653 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://dynasty-scans.com/""" from .common import ChapterExtractor, MangaExtractor, Extractor, Message from .. 
import text import json import re BASE_PATTERN = r"(?:https?://)?(?:www\.)?dynasty-scans\.com" class DynastyscansBase(): """Base class for dynastyscans extractors""" category = "dynastyscans" root = "https://dynasty-scans.com" def _parse_image_page(self, image_id): url = "{}/images/{}".format(self.root, image_id) extr = text.extract_from(self.request(url).text) date = extr("class='create_at'>", "") tags = extr("class='tags'>", "") src = extr("class='btn-group'>", "") url = extr(' src="', '"') src = text.extract(src, 'href="', '"')[0] if "Source<" in src else "" return { "url" : self.root + url, "image_id": text.parse_int(image_id), "tags" : text.split_html(tags), "date" : text.remove_html(date), "source" : text.unescape(src), } class DynastyscansChapterExtractor(DynastyscansBase, ChapterExtractor): """Extractor for manga-chapters from dynasty-scans.com""" pattern = BASE_PATTERN + r"(/chapters/[^/?#]+)" test = ( (("http://dynasty-scans.com/chapters/" "hitoribocchi_no_oo_seikatsu_ch33"), { "url": "dce64e8c504118f1ab4135c00245ea12413896cb", "keyword": "b67599703c27316a2fe4f11c3232130a1904e032", }), (("http://dynasty-scans.com/chapters/" "new_game_the_spinoff_special_13"), { "url": "dbe5bbb74da2edcfb1832895a484e2a40bc8b538", "keyword": "6b674eb3a274999153f6be044973b195008ced2f", }), ) def metadata(self, page): extr = text.extract_from(page) match = re.match( (r"(?:]*>)?([^<]+)(?:
)?" # manga name r"(?: ch(\d+)([^:<]*))?" # chapter info r"(?:: (.+))?"), # title extr("

", ""), ) author = extr(" by ", "") group = extr('"icon-print"> ', '') return { "manga" : text.unescape(match.group(1)), "chapter" : text.parse_int(match.group(2)), "chapter_minor": match.group(3) or "", "title" : text.unescape(match.group(4) or ""), "author" : text.remove_html(author), "group" : (text.remove_html(group) or text.extract(group, ' alt="', '"')[0] or ""), "date" : text.parse_datetime(extr( '"icon-calendar"> ', '<'), "%b %d, %Y"), "lang" : "en", "language": "English", } def images(self, page): data = text.extract(page, "var pages = ", ";\n")[0] return [ (self.root + img["image"], None) for img in json.loads(data) ] class DynastyscansMangaExtractor(DynastyscansBase, MangaExtractor): chapterclass = DynastyscansChapterExtractor reverse = False pattern = BASE_PATTERN + r"(/series/[^/?#]+)" test = ("https://dynasty-scans.com/series/hitoribocchi_no_oo_seikatsu", { "pattern": DynastyscansChapterExtractor.pattern, "count": ">= 100", }) def chapters(self, page): return [ (self.root + path, {}) for path in text.extract_iter(page, '
\nPlease wait a few moments", 0, 600) < 0: return response time.sleep(5) def _pagination(self, url, params): for params["page"] in itertools.count(1): page = self.request(url, params=params).text album_ids = EromeAlbumExtractor.pattern.findall(page) yield from album_ids if len(album_ids) < 36: return class EromeAlbumExtractor(EromeExtractor): """Extractor for albums on erome.com""" subcategory = "album" pattern = BASE_PATTERN + r"/a/(\w+)" test = ("https://www.erome.com/a/TyFMI7ik", { "pattern": r"https://s\d+\.erome\.com/\d+/TyFMI7ik/\w+", "count": 9, "keyword": { "album_id": "TyFMI7ik", "num": int, "title": "Ryan Ryans", "user": "xanub", }, }) def albums(self): return (self.item,) class EromeUserExtractor(EromeExtractor): subcategory = "user" pattern = BASE_PATTERN + r"/(?!a/|search\?)([^/?#]+)" test = ("https://www.erome.com/xanub", { "range": "1-25", "count": 25, }) def albums(self): url = "{}/{}".format(self.root, self.item) return self._pagination(url, {}) class EromeSearchExtractor(EromeExtractor): subcategory = "search" pattern = BASE_PATTERN + r"/search\?q=([^&#]+)" test = ("https://www.erome.com/search?q=cute", { "range": "1-25", "count": 25, }) def albums(self): url = self.root + "/search" params = {"q": text.unquote(self.item)} return self._pagination(url, params) @cache() def _cookie_cache(): return () ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1646795443.0 gallery_dl-1.21.1/gallery_dl/extractor/exhentai.py0000644000175000017500000004603414212015263020721 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://e-hentai.org/ and https://exhentai.org/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache import itertools import math BASE_PATTERN = r"(?:https?://)?(e[x-]|g\.e-)hentai\.org" class ExhentaiExtractor(Extractor): """Base class for exhentai extractors""" category = "exhentai" directory_fmt = ("{category}", "{gid} {title[:247]}") filename_fmt = ( "{gid}_{num:>04}_{image_token}_{filename}.{extension}") archive_fmt = "{gid}_{num}" cookienames = ("ipb_member_id", "ipb_pass_hash") cookiedomain = ".exhentai.org" root = "https://exhentai.org" request_interval = 5.0 LIMIT = False def __init__(self, match): # allow calling 'self.config()' before 'Extractor.__init__()' self._cfgpath = ("extractor", self.category, self.subcategory) version = match.group(1) domain = self.config("domain", "auto") if domain == "auto": domain = ("ex" if version == "ex" else "e-") + "hentai.org" self.root = "https://" + domain self.cookiedomain = "." 
+ domain Extractor.__init__(self, match) self.original = self.config("original", True) limits = self.config("limits", False) if limits and limits.__class__ is int: self.limits = limits self._remaining = 0 else: self.limits = False self.session.headers["Referer"] = self.root + "/" if version != "ex": self.session.cookies.set("nw", "1", domain=self.cookiedomain) def request(self, *args, **kwargs): response = Extractor.request(self, *args, **kwargs) if self._is_sadpanda(response): self.log.info("sadpanda.jpg") raise exception.AuthorizationError() return response def login(self): """Login and set necessary cookies""" if self.LIMIT: raise exception.StopExtraction("Image limit reached!") if self._check_cookies(self.cookienames): return username, password = self._get_auth_info() if username: self._update_cookies(self._login_impl(username, password)) else: self.log.info("no username given; using e-hentai.org") self.root = "https://e-hentai.org" self.original = False self.limits = False self.session.cookies["nw"] = "1" @cache(maxage=90*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = "https://forums.e-hentai.org/index.php?act=Login&CODE=01" headers = { "Referer": "https://e-hentai.org/bounce_login.php?b=d&bt=1-1", } data = { "CookieDate": "1", "b": "d", "bt": "1-1", "UserName": username, "PassWord": password, "ipb_login_submit": "Login!", } response = self.request(url, method="POST", headers=headers, data=data) if b"You are now logged in as:" not in response.content: raise exception.AuthenticationError() return {c: response.cookies[c] for c in self.cookienames} @staticmethod def _is_sadpanda(response): """Return True if the response object contains a sad panda""" return ( response.headers.get("Content-Length") == "9615" and "sadpanda.jpg" in response.headers.get("Content-Disposition", "") ) class ExhentaiGalleryExtractor(ExhentaiExtractor): """Extractor for image galleries from exhentai.org""" subcategory = "gallery" pattern = (BASE_PATTERN + r"(?:/g/(\d+)/([\da-f]{10})" r"|/s/([\da-f]{10})/(\d+)-(\d+))") test = ( ("https://exhentai.org/g/1200119/d55c44d3d0/", { "keyword": { "cost": int, "date": "dt:2018-03-18 20:15:00", "eh_category": "Non-H", "expunged": False, "favorites": "20", "filecount": "4", "filesize": 1488978, "gid": 1200119, "height": int, "image_token": "re:[0-9a-f]{10}", "lang": "ja", "language": "Japanese", "parent": "", "rating": r"re:\d\.\d+", "size": int, "tags": [ "parody:komi-san wa komyushou desu.", "character:shouko komi", "group:seventh lowlife", "other:sample", ], "thumb": "https://exhentai.org/t/ce/0a/ce0a5bcb583229a9b07c0f8" "3bcb1630ab1350640-624622-736-1036-jpg_250.jpg", "title": "C93 [Seventh_Lowlife] Komi-san ha Tokidoki Daitan de" "su (Komi-san wa Komyushou desu) [Sample]", "title_jpn": "(C93) [Comiketjack (わ!)] 古見さんは、時々大胆" "です。 (古見さんは、コミュ症です。) [見本]", "token": "d55c44d3d0", "torrentcount": "0", "uploader": "klorpa", "width": int, }, "content": "e9891a4c017ed0bb734cd1efba5cd03f594d31ff", }), ("https://exhentai.org/g/960461/4f0e369d82/", { "exception": exception.NotFoundError, }), ("http://exhentai.org/g/962698/7f02358e00/", { "exception": exception.AuthorizationError, }), ("https://exhentai.org/s/f68367b4c8/1200119-3", { "count": 2, }), ("https://e-hentai.org/s/f68367b4c8/1200119-3", { "count": 2, }), ("https://g.e-hentai.org/g/1200119/d55c44d3d0/"), ) def __init__(self, match): ExhentaiExtractor.__init__(self, match) self.key = {} self.count = 0 self.gallery_id = text.parse_int(match.group(2) or 
match.group(5)) self.gallery_token = match.group(3) self.image_token = match.group(4) self.image_num = text.parse_int(match.group(6), 1) source = self.config("source") if source == "hitomi": self.items = self._items_hitomi def items(self): self.login() if self.gallery_token: gpage = self._gallery_page() self.image_token = text.extract(gpage, 'hentai.org/s/', '"')[0] if not self.image_token: self.log.error("Failed to extract initial image token") self.log.debug("Page content:\n%s", gpage) return ipage = self._image_page() else: ipage = self._image_page() part = text.extract(ipage, 'hentai.org/g/', '"')[0] if not part: self.log.error("Failed to extract gallery token") self.log.debug("Page content:\n%s", ipage) return self.gallery_token = part.split("/")[1] gpage = self._gallery_page() data = self.get_metadata(gpage) self.count = text.parse_int(data["filecount"]) yield Message.Directory, data def _validate_response(response): # declared inside 'items()' to be able to access 'data' if not response.history and response.headers.get( "content-type", "").startswith("text/html"): self._report_limits(data) return True images = itertools.chain( (self.image_from_page(ipage),), self.images_from_api()) for url, image in images: data.update(image) if self.limits: self._check_limits(data) if "/fullimg.php" in url: data["extension"] = "" data["_http_validate"] = _validate_response else: data["_http_validate"] = None yield Message.Url, url, data def _items_hitomi(self): if self.config("metadata", False): data = self.metadata_from_api() data["date"] = text.parse_timestamp(data["posted"]) else: data = {} from .hitomi import HitomiGalleryExtractor url = "https://hitomi.la/galleries/{}.html".format(self.gallery_id) data["_extractor"] = HitomiGalleryExtractor yield Message.Queue, url, data def get_metadata(self, page): """Extract gallery metadata""" data = self.metadata_from_page(page) if self.config("metadata", False): data.update(self.metadata_from_api()) data["date"] = text.parse_timestamp(data["posted"]) return data def metadata_from_page(self, page): extr = text.extract_from(page) data = { "gid" : self.gallery_id, "token" : self.gallery_token, "thumb" : extr("background:transparent url(", ")"), "title" : text.unescape(extr('

', '

')), "title_jpn" : text.unescape(extr('

', '

')), "_" : extr('
', '<'), "uploader" : extr('
', '
'), "date" : text.parse_datetime(extr( '>Posted:

'), "%Y-%m-%d %H:%M"), "parent" : extr( '>Parent:
', 'Visible:', '<'), "language" : extr('>Language:', ' '), "filesize" : text.parse_bytes(extr( '>File Size:', '<').rstrip("Bb")), "filecount" : extr('>Length:', ' '), "favorites" : extr('id="favcount">', ' '), "rating" : extr(">Average: ", "<"), "torrentcount" : extr('>Torrent Download (', ')'), } if data["uploader"].startswith("<"): data["uploader"] = text.unescape(text.extract( data["uploader"], ">", "<")[0]) f = data["favorites"][0] if f == "N": data["favorites"] = "0" elif f == "O": data["favorites"] = "1" data["lang"] = util.language_to_code(data["language"]) data["tags"] = [ text.unquote(tag.replace("+", " ")) for tag in text.extract_iter(page, 'hentai.org/tag/', '"') ] return data def metadata_from_api(self): url = self.root + "/api.php" data = { "method": "gdata", "gidlist": ((self.gallery_id, self.gallery_token),), "namespace": 1, } data = self.request(url, method="POST", json=data).json() if "error" in data: raise exception.StopExtraction(data["error"]) return data["gmetadata"][0] def image_from_page(self, page): """Get image url and data from webpage""" pos = page.index('
", "")[0] self.log.debug("Image Limits: %s/%s", current, self.limits) self._remaining = self.limits - text.parse_int(current) def _gallery_page(self): url = "{}/g/{}/{}/".format( self.root, self.gallery_id, self.gallery_token) response = self.request(url, fatal=False) page = response.text if response.status_code == 404 and "Gallery Not Available" in page: raise exception.AuthorizationError() if page.startswith(("Key missing", "Gallery not found")): raise exception.NotFoundError("gallery") if "hentai.org/mpv/" in page: self.log.warning("Enabled Multi-Page Viewer is not supported") return page def _image_page(self): url = "{}/s/{}/{}-{}".format( self.root, self.image_token, self.gallery_id, self.image_num) page = self.request(url, fatal=False).text if page.startswith(("Invalid page", "Keep trying")): raise exception.NotFoundError("image page") return page @staticmethod def _parse_image_info(url): for part in url.split("/")[4:]: try: _, size, width, height, _ = part.split("-") break except ValueError: pass else: size = width = height = 0 return { "cost" : 1, "size" : text.parse_int(size), "width" : text.parse_int(width), "height": text.parse_int(height), } @staticmethod def _parse_original_info(info): parts = info.lstrip().split(" ") size = text.parse_bytes(parts[3] + parts[4][0]) return { # 1 initial point + 1 per 0.1 MB "cost" : 1 + math.ceil(size / 100000), "size" : size, "width" : text.parse_int(parts[0]), "height": text.parse_int(parts[2]), } class ExhentaiSearchExtractor(ExhentaiExtractor): """Extractor for exhentai search results""" subcategory = "search" pattern = BASE_PATTERN + r"/(?:\?([^#]*)|tag/([^/?#]+))" test = ( ("https://e-hentai.org/?f_search=touhou"), ("https://exhentai.org/?f_cats=767&f_search=touhou"), ("https://exhentai.org/tag/parody:touhou+project"), (("https://exhentai.org/?f_doujinshi=0&f_manga=0&f_artistcg=0" "&f_gamecg=0&f_western=0&f_non-h=1&f_imageset=0&f_cosplay=0" "&f_asianporn=0&f_misc=0&f_search=touhou&f_apply=Apply+Filter"), { "pattern": ExhentaiGalleryExtractor.pattern, "range": "1-30", "count": 30, }), ) def __init__(self, match): ExhentaiExtractor.__init__(self, match) self.search_url = self.root _, query, tag = match.groups() if tag: if "+" in tag: ns, _, tag = tag.rpartition(":") tag = '{}:"{}$"'.format(ns, tag.replace("+", " ")) else: tag += "$" self.params = {"f_search": tag, "page": 0} else: self.params = text.parse_query(query) self.params["page"] = text.parse_int(self.params.get("page")) def items(self): self.login() data = {"_extractor": ExhentaiGalleryExtractor} while True: last = None page = self.request(self.search_url, params=self.params).text for gallery in ExhentaiGalleryExtractor.pattern.finditer(page): url = gallery.group(0) if url == last: continue last = url yield Message.Queue, url, data if 'class="ptdd">><' in page or ">No hits found

" in page: return self.params["page"] += 1 class ExhentaiFavoriteExtractor(ExhentaiSearchExtractor): """Extractor for favorited exhentai galleries""" subcategory = "favorite" pattern = BASE_PATTERN + r"/favorites\.php(?:\?([^#]*)())?" test = ( ("https://e-hentai.org/favorites.php", { "count": 1, "pattern": r"https?://e-hentai\.org/g/1200119/d55c44d3d0" }), ("https://exhentai.org/favorites.php?favcat=1&f_search=touhou" "&f_apply=Search+Favorites"), ) def __init__(self, match): ExhentaiSearchExtractor.__init__(self, match) self.search_url = self.root + "/favorites.php" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1639190302.0 gallery_dl-1.21.1/gallery_dl/extractor/fallenangels.py0000644000175000017500000000761614155007436021563 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract manga-chapters from https://www.fascans.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text, util import json class FallenangelsChapterExtractor(ChapterExtractor): """Extractor for manga-chapters from fascans.com""" category = "fallenangels" pattern = (r"(?:https?://)?(manga|truyen)\.fascans\.com" r"/manga/([^/?#]+)/([^/?#]+)") test = ( ("https://manga.fascans.com/manga/chronos-ruler/20/1", { "url": "4604a7914566cc2da0ff789aa178e2d1c8c241e3", "keyword": "2dfcc50020e32cd207be88e2a8fac0933e36bdfb", }), ("http://truyen.fascans.com/manga/hungry-marie/8", { "url": "1f923d9cb337d5e7bbf4323719881794a951c6ae", "keyword": "2bdb7334c0e3eceb9946ffd3132df679b4a94f6a", }), ("http://manga.fascans.com/manga/rakudai-kishi-no-eiyuutan/19.5", { "url": "273f6863966c83ea79ad5846a2866e08067d3f0e", "keyword": "d1065685bfe0054c4ff2a0f20acb089de4cec253", }), ) def __init__(self, match): self.version, self.manga, self.chapter = match.groups() url = "https://{}.fascans.com/manga/{}/{}/1".format( self.version, self.manga, self.chapter) ChapterExtractor.__init__(self, match, url) def metadata(self, page): extr = text.extract_from(page) lang = "vi" if self.version == "truyen" else "en" chapter, sep, minor = self.chapter.partition(".") return { "manga" : extr('name="description" content="', ' Chapter '), "title" : extr(': ', ' - Page 1'), "chapter" : chapter, "chapter_minor": sep + minor, "lang" : lang, "language": util.code_to_language(lang), } @staticmethod def images(page): return [ (img["page_image"], None) for img in json.loads( text.extract(page, "var pages = ", ";")[0] ) ] class FallenangelsMangaExtractor(MangaExtractor): """Extractor for manga from fascans.com""" chapterclass = FallenangelsChapterExtractor category = "fallenangels" pattern = r"(?:https?://)?((manga|truyen)\.fascans\.com/manga/[^/]+)/?$" test = ( ("https://manga.fascans.com/manga/chronos-ruler", { "url": "eea07dd50f5bc4903aa09e2cc3e45c7241c9a9c2", "keyword": "c414249525d4c74ad83498b3c59a813557e59d7e", }), ("https://truyen.fascans.com/manga/rakudai-kishi-no-eiyuutan", { "url": "51a731a6b82d5eb7a335fbae6b02d06aeb2ab07b", "keyword": "2d2a2a5d9ea5925eb9a47bb13d848967f3af086c", }), ) def __init__(self, match): url = "https://" + match.group(1) self.lang = "vi" if match.group(2) == "truyen" else "en" MangaExtractor.__init__(self, match, url) def chapters(self, page): extr = text.extract_from(page) results = [] language = util.code_to_language(self.lang) while extr('
  • ', '<') title = extr('', '') manga, _, chapter = cha.rpartition(" ") chapter, dot, minor = chapter.partition(".") results.append((url, { "manga" : manga, "title" : text.unescape(title), "volume" : text.parse_int(vol), "chapter" : text.parse_int(chapter), "chapter_minor": dot + minor, "lang" : self.lang, "language": language, })) return results ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/extractor/fanbox.py0000644000175000017500000002526314220623232020372 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.fanbox.cc/""" from .common import Extractor, Message from .. import text BASE_PATTERN = ( r"(?:https?://)?(?:" r"(?!www\.)([\w-]+)\.fanbox\.cc|" r"(?:www\.)?fanbox\.cc/@([\w-]+))" ) class FanboxExtractor(Extractor): """Base class for Fanbox extractors""" category = "fanbox" root = "https://www.fanbox.cc" directory_fmt = ("{category}", "{creatorId}") filename_fmt = "{id}_{num}.{extension}" archive_fmt = "{id}_{num}" _warning = True def __init__(self, match): Extractor.__init__(self, match) self.embeds = self.config("embeds", True) def items(self): if self._warning: if not self._check_cookies(("FANBOXSESSID",)): self.log.warning("no 'FANBOXSESSID' cookie set") FanboxExtractor._warning = False for content_body, post in self.posts(): yield Message.Directory, post yield from self._get_urls_from_post(content_body, post) def posts(self): """Return all relevant post objects""" def _pagination(self, url): headers = {"Origin": self.root} while url: url = text.ensure_http_scheme(url) body = self.request(url, headers=headers).json()["body"] for item in body["items"]: yield self._get_post_data(item["id"]) url = body["nextUrl"] def _get_post_data(self, post_id): """Fetch and process post data""" headers = {"Origin": self.root} url = "https://api.fanbox.cc/post.info?postId="+post_id post = self.request(url, headers=headers).json()["body"] content_body = post.pop("body", None) if content_body: if "html" in content_body: post["html"] = content_body["html"] if post["type"] == "article": post["articleBody"] = content_body.copy() post["date"] = text.parse_datetime(post["publishedDatetime"]) post["text"] = content_body.get("text") if content_body else None post["isCoverImage"] = False return content_body, post def _get_urls_from_post(self, content_body, post): num = 0 cover_image = post.get("coverImageUrl") if cover_image: final_post = post.copy() final_post["isCoverImage"] = True final_post["fileUrl"] = cover_image text.nameext_from_url(cover_image, final_post) final_post["num"] = num num += 1 yield Message.Url, cover_image, final_post if not content_body: return if "html" in content_body: html_urls = [] for href in text.extract_iter(content_body["html"], 'href="', '"'): if "fanbox.pixiv.net/images/entry" in href: html_urls.append(href) elif "downloads.fanbox.cc" in href: html_urls.append(href) for src in text.extract_iter(content_body["html"], 'data-src-original="', '"'): html_urls.append(src) for url in html_urls: final_post = post.copy() text.nameext_from_url(url, final_post) final_post["fileUrl"] = url final_post["num"] = num num += 1 yield Message.Url, url, final_post for group in ("images", "imageMap"): if group in content_body: for item in content_body[group]: if group == "imageMap": # imageMap is a dict with image 
objects as values item = content_body[group][item] final_post = post.copy() final_post["fileUrl"] = item["originalUrl"] text.nameext_from_url(item["originalUrl"], final_post) if "extension" in item: final_post["extension"] = item["extension"] final_post["fileId"] = item.get("id") final_post["width"] = item.get("width") final_post["height"] = item.get("height") final_post["num"] = num num += 1 yield Message.Url, item["originalUrl"], final_post for group in ("files", "fileMap"): if group in content_body: for item in content_body[group]: if group == "fileMap": # fileMap is a dict with file objects as values item = content_body[group][item] final_post = post.copy() final_post["fileUrl"] = item["url"] text.nameext_from_url(item["url"], final_post) if "extension" in item: final_post["extension"] = item["extension"] if "name" in item: final_post["filename"] = item["name"] final_post["fileId"] = item.get("id") final_post["num"] = num num += 1 yield Message.Url, item["url"], final_post if self.embeds: embeds_found = [] if "video" in content_body: embeds_found.append(content_body["video"]) embeds_found.extend(content_body.get("embedMap", {}).values()) for embed in embeds_found: # embed_result is (message type, url, metadata dict) embed_result = self._process_embed(post, embed) if not embed_result: continue embed_result[2]["num"] = num num += 1 yield embed_result def _process_embed(self, post, embed): final_post = post.copy() provider = embed["serviceProvider"] content_id = embed.get("videoId") or embed.get("contentId") prefix = "ytdl:" if self.embeds == "ytdl" else "" url = None is_video = False if provider == "soundcloud": url = prefix+"https://soundcloud.com/"+content_id is_video = True elif provider == "youtube": url = prefix+"https://youtube.com/watch?v="+content_id is_video = True elif provider == "vimeo": url = prefix+"https://vimeo.com/"+content_id is_video = True elif provider == "fanbox": # this is an old URL format that redirects # to a proper Fanbox URL url = "https://www.pixiv.net/fanbox/"+content_id # resolve redirect response = self.request(url, method="HEAD", allow_redirects=False) url = response.headers["Location"] final_post["_extractor"] = FanboxPostExtractor elif provider == "twitter": url = "https://twitter.com/_/status/"+content_id elif provider == "google_forms": templ = "https://docs.google.com/forms/d/e/{}/viewform?usp=sf_link" url = templ.format(content_id) else: self.log.warning("service not recognized: {}".format(provider)) if url: final_post["embed"] = embed final_post["embedUrl"] = url text.nameext_from_url(url, final_post) msg_type = Message.Queue if is_video and self.embeds == "ytdl": msg_type = Message.Url return msg_type, url, final_post class FanboxCreatorExtractor(FanboxExtractor): """Extractor for a Fanbox creator's works""" subcategory = "creator" pattern = BASE_PATTERN + r"(?:/posts)?/?$" test = ( ("https://xub.fanbox.cc", { "range": "1-15", "count": ">= 15", "keyword": { "creatorId" : "xub", "tags" : list, "title" : str, }, }), ("https://xub.fanbox.cc/posts"), ("https://www.fanbox.cc/@xub/"), ("https://www.fanbox.cc/@xub/posts"), ) def __init__(self, match): FanboxExtractor.__init__(self, match) self.creator_id = match.group(1) or match.group(2) def posts(self): url = "https://api.fanbox.cc/post.listCreator?creatorId={}&limit=10" return self._pagination(url.format(self.creator_id)) class FanboxPostExtractor(FanboxExtractor): """Extractor for media from a single Fanbox post""" subcategory = "post" pattern = BASE_PATTERN + r"/posts/(\d+)" test = ( 
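# Illustrative sketch (not part of the extractor): the provider-to-URL
# mapping performed by _process_embed() above, reduced to its static cases.
# The real method additionally resolves legacy pixiv "fanbox" redirects and
# decides between queueing the URL and handing it directly to ytdl.
def embed_url_sketch(provider, content_id, use_ytdl=False):
    prefix = "ytdl:" if use_ytdl else ""
    if provider == "soundcloud":
        return prefix + "https://soundcloud.com/" + content_id
    if provider == "youtube":
        return prefix + "https://youtube.com/watch?v=" + content_id
    if provider == "vimeo":
        return prefix + "https://vimeo.com/" + content_id
    if provider == "twitter":
        return "https://twitter.com/_/status/" + content_id
    if provider == "google_forms":
        return ("https://docs.google.com/forms/d/e/{}/viewform?usp=sf_link"
                .format(content_id))
    return None   # unrecognized service -> the extractor logs a warning

# e.g. embed_url_sketch("youtube", "abc123") -> "https://youtube.com/watch?v=abc123"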
("https://www.fanbox.cc/@xub/posts/1910054", { "count": 3, "keyword": { "title": "えま★おうがすと", "tags": list, "hasAdultContent": True, "isCoverImage": False }, }), # entry post type, image embedded in html of the post ("https://nekoworks.fanbox.cc/posts/915", { "count": 2, "keyword": { "title": "【SAYORI FAN CLUB】お届け内容", "tags": list, "html": str, "hasAdultContent": True }, }), # article post type, imageMap, 2 twitter embeds, fanbox embed ("https://steelwire.fanbox.cc/posts/285502", { "options": (("embeds", True),), "count": 10, "keyword": { "title": "イラスト+SS|義足の炭鉱少年が義足を見せてくれるだけ 【全体公開版】", "tags": list, "articleBody": dict, "hasAdultContent": True }, }), ) def __init__(self, match): FanboxExtractor.__init__(self, match) self.post_id = match.group(3) def posts(self): return (self._get_post_data(self.post_id),) class FanboxRedirectExtractor(Extractor): """Extractor for pixiv redirects to fanbox.cc""" category = "fanbox" subcategory = "redirect" pattern = r"(?:https?://)?(?:www\.)?pixiv\.net/fanbox/creator/(\d+)" test = ("https://www.pixiv.net/fanbox/creator/52336352", { "pattern": FanboxCreatorExtractor.pattern, }) def __init__(self, match): Extractor.__init__(self, match) self.user_id = match.group(1) def items(self): url = "https://www.pixiv.net/fanbox/creator/" + self.user_id data = {"_extractor": FanboxCreatorExtractor} response = self.request( url, method="HEAD", allow_redirects=False, notfound="user") yield Message.Queue, response.headers["Location"], data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/extractor/fantia.py0000644000175000017500000001410714220623232020352 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fantia.jp/""" from .common import Extractor, Message from .. 
import text import json class FantiaExtractor(Extractor): """Base class for Fantia extractors""" category = "fantia" root = "https://fantia.jp" directory_fmt = ("{category}", "{fanclub_id}") filename_fmt = "{post_id}_{file_id}.{extension}" archive_fmt = "{post_id}_{file_id}" _warning = True def items(self): if self._warning: if not self._check_cookies(("_session_id",)): self.log.warning("no '_session_id' cookie set") FantiaExtractor._warning = False for post_id in self.posts(): full_response, post = self._get_post_data(post_id) yield Message.Directory, post post["num"] = 0 for url, url_data in self._get_urls_from_post(full_response, post): post["num"] += 1 fname = url_data["content_filename"] or url text.nameext_from_url(fname, url_data) url_data["file_url"] = url yield Message.Url, url, url_data def posts(self): """Return post IDs""" def _pagination(self, url): params = {"page": 1} headers = {"Referer": self.root} while True: page = self.request(url, params=params, headers=headers).text post_id = None for post_id in text.extract_iter( page, 'class="link-block" href="/posts/', '"'): yield post_id if not post_id: return params["page"] += 1 def _get_post_data(self, post_id): """Fetch and process post data""" headers = {"Referer": self.root} url = self.root+"/api/v1/posts/"+post_id resp = self.request(url, headers=headers).json()["post"] post = { "post_id": resp["id"], "post_url": self.root + "/posts/" + str(resp["id"]), "post_title": resp["title"], "comment": resp["comment"], "rating": resp["rating"], "posted_at": resp["posted_at"], "date": text.parse_datetime( resp["posted_at"], "%a, %d %b %Y %H:%M:%S %z"), "fanclub_id": resp["fanclub"]["id"], "fanclub_user_id": resp["fanclub"]["user"]["id"], "fanclub_user_name": resp["fanclub"]["user"]["name"], "fanclub_name": resp["fanclub"]["name"], "fanclub_url": self.root+"/fanclubs/"+str(resp["fanclub"]["id"]), "tags": resp["tags"] } return resp, post def _get_urls_from_post(self, resp, post): """Extract individual URL data from the response""" if "thumb" in resp and resp["thumb"] and "original" in resp["thumb"]: post["content_filename"] = "" post["content_category"] = "thumb" post["file_id"] = "thumb" yield resp["thumb"]["original"], post for content in resp["post_contents"]: post["content_category"] = content["category"] post["content_title"] = content["title"] post["content_filename"] = content.get("filename", "") post["content_id"] = content["id"] if "comment" in content: post["content_comment"] = content["comment"] if "post_content_photos" in content: for photo in content["post_content_photos"]: post["file_id"] = photo["id"] yield photo["url"]["original"], post if "download_uri" in content: post["file_id"] = content["id"] yield self.root+"/"+content["download_uri"], post if content["category"] == "blog" and "comment" in content: comment_json = json.loads(content["comment"]) ops = comment_json.get("ops", ()) # collect blogpost text first blog_text = "" for op in ops: insert = op.get("insert") if isinstance(insert, str): blog_text += insert post["blogpost_text"] = blog_text # collect images for op in ops: insert = op.get("insert") if isinstance(insert, dict) and "fantiaImage" in insert: img = insert["fantiaImage"] post["file_id"] = img["id"] yield "https://fantia.jp" + img["original_url"], post class FantiaCreatorExtractor(FantiaExtractor): """Extractor for a Fantia creator's works""" subcategory = "creator" pattern = r"(?:https?://)?(?:www\.)?fantia\.jp/fanclubs/(\d+)" test = ( ("https://fantia.jp/fanclubs/6939", { "range": "1-25", "count": ">= 
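# Illustrative sketch (not part of the extractor): how blog-type contents are
# handled above.  The "comment" field holds a JSON document whose "ops" list
# mixes plain-text inserts with dicts carrying a "fantiaImage" entry; the
# extractor concatenates the text and turns every image entry into a download
# URL.  The sample payload below is invented.
import json

sample = json.dumps({"ops": [
    {"insert": "hello "},
    {"insert": {"fantiaImage": {"id": 1, "original_url": "/posts/1/x.jpg"}}},
    {"insert": "world\n"},
]})

ops = json.loads(sample).get("ops", ())
blog_text = "".join(op["insert"] for op in ops
                    if isinstance(op.get("insert"), str))
image_urls = ["https://fantia.jp" + op["insert"]["fantiaImage"]["original_url"]
              for op in ops
              if isinstance(op.get("insert"), dict)
              and "fantiaImage" in op["insert"]]
# blog_text  -> "hello world\n"
# image_urls -> ["https://fantia.jp/posts/1/x.jpg"]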
25", "keyword": { "fanclub_user_id" : 52152, "tags" : list, "title" : str, }, }), ) def __init__(self, match): FantiaExtractor.__init__(self, match) self.creator_id = match.group(1) def posts(self): url = "{}/fanclubs/{}/posts".format(self.root, self.creator_id) return self._pagination(url) class FantiaPostExtractor(FantiaExtractor): """Extractor for media from a single Fantia post""" subcategory = "post" pattern = r"(?:https?://)?(?:www\.)?fantia\.jp/posts/(\d+)" test = ( ("https://fantia.jp/posts/508363", { "count": 6, "keyword": { "post_title": "zunda逆バニーでおしりコッショリ", "tags": list, "rating": "adult", "post_id": 508363 }, }), ) def __init__(self, match): FantiaExtractor.__init__(self, match) self.post_id = match.group(1) def posts(self): return (self.post_id,) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/flickr.py0000644000175000017500000004610214176336637020406 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.flickr.com/""" from .common import Extractor, Message from .. import text, oauth, util, exception class FlickrExtractor(Extractor): """Base class for flickr extractors""" category = "flickr" filename_fmt = "{category}_{id}.{extension}" directory_fmt = ("{category}", "{user[username]}") archive_fmt = "{id}" cookiedomain = None def __init__(self, match): Extractor.__init__(self, match) self.api = FlickrAPI(self) self.item_id = match.group(1) self.user = None def items(self): data = self.metadata() extract = self.api._extract_format for photo in self.photos(): try: photo = extract(photo) except Exception as exc: self.log.warning( "Skipping %s (%s)", photo["id"], exc.__class__.__name__) self.log.debug("", exc_info=True) else: photo.update(data) url = photo["url"] yield Message.Directory, photo yield Message.Url, url, text.nameext_from_url(url, photo) def metadata(self): """Return general metadata""" self.user = self.api.urls_lookupUser(self.item_id) return {"user": self.user} def photos(self): """Return an iterable with all relevant photo objects""" class FlickrImageExtractor(FlickrExtractor): """Extractor for individual images from flickr.com""" subcategory = "image" pattern = (r"(?:https?://)?(?:" r"(?:(?:www\.|m\.)?flickr\.com/photos/[^/]+/" r"|[\w-]+\.static\.?flickr\.com/(?:\d+/)+)(\d+)" r"|flic\.kr/p/([A-Za-z1-9]+))") test = ( ("https://www.flickr.com/photos/departingyyz/16089302239", { "pattern": pattern, "content": ("3133006c6d657fe54cf7d4c46b82abbcb0efaf9f", "0821a28ee46386e85b02b67cf2720063440a228c"), "keyword": { "comments": int, "description": str, "extension": "jpg", "filename": "16089302239_de18cd8017_b", "id": 16089302239, "height": 683, "label": "Large", "media": "photo", "url": str, "views": int, "width": 1024, }, }), ("https://www.flickr.com/photos/145617051@N08/46733161535", { "count": 1, "keyword": {"media": "video"}, }), ("http://c2.staticflickr.com/2/1475/24531000464_9a7503ae68_b.jpg", { "pattern": pattern}), ("https://farm2.static.flickr.com/1035/1188352415_cb139831d0.jpg", { "pattern": pattern}), ("https://flic.kr/p/FPVo9U", { "pattern": pattern}), ("https://www.flickr.com/photos/zzz/16089302238", { "exception": exception.NotFoundError}), ) def __init__(self, match): FlickrExtractor.__init__(self, match) if not self.item_id: alphabet = 
("123456789abcdefghijkmnopqrstu" "vwxyzABCDEFGHJKLMNPQRSTUVWXYZ") self.item_id = util.bdecode(match.group(2), alphabet) def items(self): photo = self.api.photos_getInfo(self.item_id) if photo["media"] == "video" and self.api.videos: self.api._extract_video(photo) else: self.api._extract_photo(photo) photo["user"] = photo["owner"] photo["title"] = photo["title"]["_content"] photo["comments"] = text.parse_int(photo["comments"]["_content"]) photo["description"] = photo["description"]["_content"] photo["tags"] = [t["raw"] for t in photo["tags"]["tag"]] photo["date"] = text.parse_timestamp(photo["dateuploaded"]) photo["views"] = text.parse_int(photo["views"]) photo["id"] = text.parse_int(photo["id"]) if "location" in photo: location = photo["location"] for key, value in location.items(): if isinstance(value, dict): location[key] = value["_content"] url = photo["url"] yield Message.Directory, photo yield Message.Url, url, text.nameext_from_url(url, photo) class FlickrAlbumExtractor(FlickrExtractor): """Extractor for photo albums from flickr.com""" subcategory = "album" directory_fmt = ("{category}", "{user[username]}", "Albums", "{album[id]} {album[title]}") archive_fmt = "a_{album[id]}_{id}" pattern = (r"(?:https?://)?(?:www\.)?flickr\.com/" r"photos/([^/]+)/(?:album|set)s(?:/(\d+))?") test = ( (("https://www.flickr.com/photos/shona_s/albums/72157633471741607"), { "pattern": FlickrImageExtractor.pattern, "count": 6, }), ("https://www.flickr.com/photos/shona_s/albums", { "pattern": pattern, "count": 2, }), ) def __init__(self, match): FlickrExtractor.__init__(self, match) self.album_id = match.group(2) def items(self): if self.album_id: return FlickrExtractor.items(self) return self._album_items() def _album_items(self): data = FlickrExtractor.metadata(self) data["_extractor"] = FlickrAlbumExtractor for album in self.api.photosets_getList(self.user["nsid"]): self.api._clean_info(album).update(data) url = "https://www.flickr.com/photos/{}/albums/{}".format( self.user["path_alias"], album["id"]) yield Message.Queue, url, album def metadata(self): data = FlickrExtractor.metadata(self) data["album"] = self.api.photosets_getInfo( self.album_id, self.user["nsid"]) return data def photos(self): return self.api.photosets_getPhotos(self.album_id) class FlickrGalleryExtractor(FlickrExtractor): """Extractor for photo galleries from flickr.com""" subcategory = "gallery" directory_fmt = ("{category}", "{user[username]}", "Galleries", "{gallery[gallery_id]} {gallery[title]}") archive_fmt = "g_{gallery[id]}_{id}" pattern = (r"(?:https?://)?(?:www\.)?flickr\.com/" r"photos/([^/]+)/galleries/(\d+)") test = (("https://www.flickr.com/photos/flickr/" "galleries/72157681572514792/"), { "pattern": FlickrImageExtractor.pattern, "count": ">= 10", }) def __init__(self, match): FlickrExtractor.__init__(self, match) self.gallery_id = match.group(2) def metadata(self): data = FlickrExtractor.metadata(self) data["gallery"] = self.api.galleries_getInfo(self.gallery_id) return data def photos(self): return self.api.galleries_getPhotos(self.gallery_id) class FlickrGroupExtractor(FlickrExtractor): """Extractor for group pools from flickr.com""" subcategory = "group" directory_fmt = ("{category}", "Groups", "{group[groupname]}") archive_fmt = "G_{group[nsid]}_{id}" pattern = r"(?:https?://)?(?:www\.)?flickr\.com/groups/([^/]+)" test = ("https://www.flickr.com/groups/bird_headshots/", { "pattern": FlickrImageExtractor.pattern, "count": "> 150", }) def metadata(self): self.group = self.api.urls_lookupGroup(self.item_id) return 
{"group": self.group} def photos(self): return self.api.groups_pools_getPhotos(self.group["nsid"]) class FlickrUserExtractor(FlickrExtractor): """Extractor for the photostream of a flickr user""" subcategory = "user" archive_fmt = "u_{user[nsid]}_{id}" pattern = r"(?:https?://)?(?:www\.)?flickr\.com/photos/([^/]+)/?$" test = ("https://www.flickr.com/photos/shona_s/", { "pattern": FlickrImageExtractor.pattern, "count": 28, }) def photos(self): return self.api.people_getPhotos(self.user["nsid"]) class FlickrFavoriteExtractor(FlickrExtractor): """Extractor for favorite photos of a flickr user""" subcategory = "favorite" directory_fmt = ("{category}", "{user[username]}", "Favorites") archive_fmt = "f_{user[nsid]}_{id}" pattern = r"(?:https?://)?(?:www\.)?flickr\.com/photos/([^/]+)/favorites" test = ("https://www.flickr.com/photos/shona_s/favorites", { "pattern": FlickrImageExtractor.pattern, "count": 4, }) def photos(self): return self.api.favorites_getList(self.user["nsid"]) class FlickrSearchExtractor(FlickrExtractor): """Extractor for flickr photos based on search results""" subcategory = "search" directory_fmt = ("{category}", "Search", "{search[text]}") archive_fmt = "s_{search}_{id}" pattern = r"(?:https?://)?(?:www\.)?flickr\.com/search/?\?([^#]+)" test = ( ("https://flickr.com/search/?text=mountain"), ("https://flickr.com/search/?text=tree%20cloud%20house" "&color_codes=4&styles=minimalism"), ) def __init__(self, match): FlickrExtractor.__init__(self, match) self.search = text.parse_query(match.group(1)) if "text" not in self.search: self.search["text"] = "" def metadata(self): return {"search": self.search} def photos(self): return self.api.photos_search(self.search) class FlickrAPI(oauth.OAuth1API): """Minimal interface for the flickr API""" API_URL = "https://api.flickr.com/services/rest/" API_KEY = "ac4fd7aa98585b9eee1ba761c209de68" API_SECRET = "3adb0f568dc68393" FORMATS = [ ("o" , "Original" , None), ("6k", "X-Large 6K" , 6144), ("5k", "X-Large 5K" , 5120), ("4k", "X-Large 4K" , 4096), ("3k", "X-Large 3K" , 3072), ("k" , "Large 2048" , 2048), ("h" , "Large 1600" , 1600), ("l" , "Large" , 1024), ("c" , "Medium 800" , 800), ("z" , "Medium 640" , 640), ("m" , "Medium" , 500), ("n" , "Small 320" , 320), ("s" , "Small" , 240), ("q" , "Large Square", 150), ("t" , "Thumbnail" , 100), ("s" , "Square" , 75), ] VIDEO_FORMATS = { "orig" : 9, "1080p" : 8, "720p" : 7, "360p" : 6, "288p" : 5, "700" : 4, "300" : 3, "100" : 2, "appletv" : 1, "iphone_wifi": 0, } def __init__(self, extractor): oauth.OAuth1API.__init__(self, extractor) self.videos = extractor.config("videos", True) self.maxsize = extractor.config("size-max") if isinstance(self.maxsize, str): for fmt, fmtname, fmtwidth in self.FORMATS: if self.maxsize == fmt or self.maxsize == fmtname: self.maxsize = fmtwidth break else: self.maxsize = None extractor.log.warning( "Could not match '%s' to any format", self.maxsize) if self.maxsize: self.formats = [fmt for fmt in self.FORMATS if not fmt[2] or fmt[2] <= self.maxsize] else: self.formats = self.FORMATS self.formats = self.formats[:8] def favorites_getList(self, user_id): """Returns a list of the user's favorite photos.""" params = {"user_id": user_id} return self._pagination("favorites.getList", params) def galleries_getInfo(self, gallery_id): """Gets information about a gallery.""" params = {"gallery_id": gallery_id} gallery = self._call("galleries.getInfo", params)["gallery"] return self._clean_info(gallery) def galleries_getPhotos(self, gallery_id): """Return the list of photos for a 
gallery.""" params = {"gallery_id": gallery_id} return self._pagination("galleries.getPhotos", params) def groups_pools_getPhotos(self, group_id): """Returns a list of pool photos for a given group.""" params = {"group_id": group_id} return self._pagination("groups.pools.getPhotos", params) def people_getPhotos(self, user_id): """Return photos from the given user's photostream.""" params = {"user_id": user_id} return self._pagination("people.getPhotos", params) def photos_getInfo(self, photo_id): """Get information about a photo.""" params = {"photo_id": photo_id} return self._call("photos.getInfo", params)["photo"] def photos_getSizes(self, photo_id): """Returns the available sizes for a photo.""" params = {"photo_id": photo_id} sizes = self._call("photos.getSizes", params)["sizes"]["size"] if self.maxsize: for index, size in enumerate(sizes): if index > 0 and (int(size["width"]) > self.maxsize or int(size["height"]) > self.maxsize): del sizes[index:] break return sizes def photos_search(self, params): """Return a list of photos matching some criteria.""" return self._pagination("photos.search", params.copy()) def photosets_getInfo(self, photoset_id, user_id): """Gets information about a photoset.""" params = {"photoset_id": photoset_id, "user_id": user_id} photoset = self._call("photosets.getInfo", params)["photoset"] return self._clean_info(photoset) def photosets_getList(self, user_id): """Returns the photosets belonging to the specified user.""" params = {"user_id": user_id} return self._pagination_sets("photosets.getList", params) def photosets_getPhotos(self, photoset_id): """Get the list of photos in a set.""" params = {"photoset_id": photoset_id} return self._pagination("photosets.getPhotos", params, "photoset") def urls_lookupGroup(self, groupname): """Returns a group NSID, given the url to a group's page.""" params = {"url": "https://www.flickr.com/groups/" + groupname} group = self._call("urls.lookupGroup", params)["group"] return {"nsid": group["id"], "path_alias": groupname, "groupname": group["groupname"]["_content"]} def urls_lookupUser(self, username): """Returns a user NSID, given the url to a user's photos or profile.""" params = {"url": "https://www.flickr.com/photos/" + username} user = self._call("urls.lookupUser", params)["user"] return { "nsid" : user["id"], "username" : user["username"]["_content"], "path_alias": username, } def video_getStreamInfo(self, video_id, secret=None): """Returns all available video streams""" params = {"photo_id": video_id} if not secret: secret = self._call("photos.getInfo", params)["photo"]["secret"] params["secret"] = secret stream = self._call("video.getStreamInfo", params)["streams"]["stream"] return max(stream, key=lambda s: self.VIDEO_FORMATS.get(s["type"], 0)) def _call(self, method, params): params["method"] = "flickr." 
+ method params["format"] = "json" params["nojsoncallback"] = "1" if self.api_key: params["api_key"] = self.api_key data = self.request(self.API_URL, params=params).json() if "code" in data: msg = data.get("message") self.log.debug("Server response: %s", data) if data["code"] == 1: raise exception.NotFoundError(self.extractor.subcategory) elif data["code"] == 98: raise exception.AuthenticationError(msg) elif data["code"] == 99: raise exception.AuthorizationError(msg) raise exception.StopExtraction("API request failed: %s", msg) return data def _pagination(self, method, params, key="photos"): params["extras"] = ("description,date_upload,tags,views,media," "path_alias,owner_name,") params["extras"] += ",".join("url_" + fmt[0] for fmt in self.formats) params["page"] = 1 while True: data = self._call(method, params)[key] yield from data["photo"] if params["page"] >= data["pages"]: return params["page"] += 1 def _pagination_sets(self, method, params): params["page"] = 1 while True: data = self._call(method, params)["photosets"] yield from data["photoset"] if params["page"] >= data["pages"]: return params["page"] += 1 def _extract_format(self, photo): photo["description"] = photo["description"]["_content"].strip() photo["views"] = text.parse_int(photo["views"]) photo["date"] = text.parse_timestamp(photo["dateupload"]) photo["tags"] = photo["tags"].split() photo["id"] = text.parse_int(photo["id"]) if "owner" in photo: photo["owner"] = { "nsid" : photo["owner"], "username" : photo["ownername"], "path_alias": photo["pathalias"], } else: photo["owner"] = self.extractor.user del photo["pathalias"] del photo["ownername"] if photo["media"] == "video" and self.videos: return self._extract_video(photo) for fmt, fmtname, fmtwidth in self.formats: key = "url_" + fmt if key in photo: photo["width"] = text.parse_int(photo["width_" + fmt]) photo["height"] = text.parse_int(photo["height_" + fmt]) if self.maxsize and (photo["width"] > self.maxsize or photo["height"] > self.maxsize): continue photo["url"] = photo[key] photo["label"] = fmtname # remove excess data keys = [ key for key in photo if key.startswith(("url_", "width_", "height_")) ] for key in keys: del photo[key] break else: self._extract_photo(photo) return photo def _extract_photo(self, photo): size = self.photos_getSizes(photo["id"])[-1] photo["url"] = size["source"] photo["label"] = size["label"] photo["width"] = text.parse_int(size["width"]) photo["height"] = text.parse_int(size["height"]) return photo def _extract_video(self, photo): stream = self.video_getStreamInfo(photo["id"], photo.get("secret")) photo["url"] = stream["_content"] photo["label"] = stream["type"] photo["width"] = photo["height"] = 0 return photo @staticmethod def _clean_info(info): info["title"] = info["title"]["_content"] info["description"] = info["description"]["_content"] return info ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1646253139.0 gallery_dl-1.21.1/gallery_dl/extractor/foolfuuka.py0000644000175000017500000002506714207752123021121 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for 4chan archives based on FoolFuuka""" from .common import BaseExtractor, Message from .. 
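# Illustrative sketch (not part of the extractor): the page-based loop used
# by FlickrAPI._pagination() above, with fetch_page() standing in for the
# signed API call.  The two-page result set is fabricated for demonstration.
def paginate(fetch_page, params):
    params = dict(params, page=1)
    while True:
        data = fetch_page(params)        # -> {"photo": [...], "pages": N}
        yield from data["photo"]
        if params["page"] >= data["pages"]:
            return
        params["page"] += 1

_pages = {1: {"photo": [{"id": 1}, {"id": 2}], "pages": 2},
          2: {"photo": [{"id": 3}], "pages": 2}}
assert [p["id"] for p in paginate(lambda q: _pages[q["page"]], {})] == [1, 2, 3]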
import text import itertools class FoolfuukaExtractor(BaseExtractor): """Base extractor for FoolFuuka based boards/archives""" basecategory = "foolfuuka" archive_fmt = "{board[shortname]}_{num}_{timestamp}" external = "default" def __init__(self, match): BaseExtractor.__init__(self, match) self.session.headers["Referer"] = self.root if self.category == "b4k": self.remote = self._remote_direct def items(self): yield Message.Directory, self.metadata() for post in self.posts(): media = post["media"] if not media: continue url = media["media_link"] if not url and "remote_media_link" in media: url = self.remote(media) if url.startswith("/"): url = self.root + url post["filename"], _, post["extension"] = \ media["media"].rpartition(".") yield Message.Url, url, post def metadata(self): """Return general metadata""" def posts(self): """Return an iterable with all relevant posts""" def remote(self, media): """Resolve a remote media link""" needle = '= 5 else "" data["title"] = data["chapter_string"].partition(":")[2].strip() return data BASE_PATTERN = FoolslideExtractor.update({ "kireicake": { "root": "https://reader.kireicake.com", }, "powermanga": { "root": "https://read.powermanga.org", "pattern": r"read(?:er)?\.powermanga\.org", }, "sensescans": { "root": "https://sensescans.com/reader", "pattern": r"(?:(?:www\.)?sensescans\.com/reader" r"|reader\.sensescans\.com)", }, }) class FoolslideChapterExtractor(FoolslideExtractor): """Base class for chapter extractors for FoOlSlide based sites""" subcategory = "chapter" directory_fmt = ("{category}", "{manga}", "{chapter_string}") filename_fmt = ( "{manga}_c{chapter:>03}{chapter_minor:?//}_{page:>03}.{extension}") archive_fmt = "{id}" pattern = BASE_PATTERN + r"(/read/[^/?#]+/[a-z-]+/\d+/\d+(?:/\d+)?)" test = ( ("https://reader.kireicake.com/read/wonderland/en/1/1/", { "url": "b2d36bc0bc67e4c461c3a4d6444a2fd339f5d07e", "keyword": "9f80947920a325e33aea7f5cd69ea669171903b6", }), (("https://read.powermanga.org" "/read/one_piece_digital_colour_comics/en/0/75/"), { "url": "854c5817f8f767e1bccd05fa9d58ffb5a4b09384", "keyword": "a60c42f2634b7387899299d411ff494ed0ad6dbe", }), ("https://sensescans.com/reader/read/ao_no_orchestra/en/0/26/", { "url": "bbd428dc578f5055e9f86ad635b510386cd317cd", "keyword": "083ef6f8831c84127fe4096fa340a249be9d1424", }), ("https://reader.sensescans.com/read/ao_no_orchestra/en/0/26/"), ) def items(self): page = self.request(self.gallery_url).text data = self.metadata(page) imgs = self.images(page) data["count"] = len(imgs) data["chapter_id"] = text.parse_int(imgs[0]["chapter_id"]) yield Message.Directory, data enum = util.enumerate_reversed if self.config( "page-reverse") else enumerate for data["page"], image in enum(imgs, 1): try: url = image["url"] del image["url"] del image["chapter_id"] del image["thumb_url"] except KeyError: pass for key in ("height", "id", "size", "width"): image[key] = text.parse_int(image[key]) data.update(image) text.nameext_from_url(data["filename"], data) yield Message.Url, url, data def metadata(self, page): extr = text.extract_from(page) extr('

    ', '') return self.parse_chapter_url(self.gallery_url, { "manga" : text.unescape(extr('title="', '"')).strip(), "chapter_string": text.unescape(extr('title="', '"')), }) def images(self, page): return json.loads(text.extract(page, "var pages = ", ";")[0]) class FoolslideMangaExtractor(FoolslideExtractor): """Base class for manga extractors for FoOlSlide based sites""" subcategory = "manga" categorytransfer = True pattern = BASE_PATTERN + r"(/series/[^/?#]+)" test = ( ("https://reader.kireicake.com/series/wonderland/", { "url": "d067b649af1cc88fa8c8b698fde04a10909fd169", "keyword": "268f43772fb239888ca5c5f6a4f65f99ffb3eefb", }), (("https://read.powermanga.org" "/series/one_piece_digital_colour_comics/"), { "count": ">= 1", "keyword": { "chapter": int, "chapter_minor": str, "chapter_string": str, "group": "PowerManga", "lang": "en", "language": "English", "manga": "One Piece Digital Colour Comics", "title": str, "volume": int, }, }), ("https://sensescans.com/reader/series/yotsubato/", { "count": ">= 3", }), ) def items(self): page = self.request(self.gallery_url).text chapters = self.chapters(page) if not self.config("chapter-reverse", False): chapters.reverse() for chapter, data in chapters: data["_extractor"] = FoolslideChapterExtractor yield Message.Queue, chapter, data def chapters(self, page): extr = text.extract_from(page) manga = text.unescape(extr('
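# Illustrative sketch (not part of the extractor): both the FoOlSlide readers
# above and fascans.com embed the page list as a JavaScript assignment
# ("var pages = [...];") in the reader HTML, which images() slices out and
# feeds to json.loads().  Minimal standalone version with invented markup:
import json

reader_html = 'x;\nvar pages = [{"page_image": "https://cdn.example/1.jpg"}];\n'
start = reader_html.index("var pages = ") + len("var pages = ")
end = reader_html.index(";", start)
pages = json.loads(reader_html[start:end])
# pages -> [{"page_image": "https://cdn.example/1.jpg"}]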

    ', '

    ')).strip() author = extr('Author: ', 'Artist: ', '
    ")) path = extr('href="//d', '"') if not path: self.log.warning( "Unable to download post %s (\"%s\")", post_id, text.remove_html( extr('System Message', '') or extr('System Message', '
                    ')
                )
            )
            return None

        pi = text.parse_int
        rh = text.remove_html
        data = text.nameext_from_url(path, {
            "id" : pi(post_id),
            "url": "https://d" + path,
        })

        if self._new_layout:
            data["tags"] = text.split_html(extr(
                'class="tags-row">', ''))
            data["title"] = text.unescape(extr("

    ", "

    ")) data["artist"] = extr("", "<") data["_description"] = extr('class="section-body">', '') data["views"] = pi(rh(extr('class="views">', ''))) data["favorites"] = pi(rh(extr('class="favorites">', ''))) data["comments"] = pi(rh(extr('class="comments">', ''))) data["rating"] = rh(extr('class="rating">', '')) data["fa_category"] = rh(extr('>Category', '')) data["theme"] = rh(extr('>', '<')) data["species"] = rh(extr('>Species', '')) data["gender"] = rh(extr('>Gender', '')) data["width"] = pi(extr("", "x")) data["height"] = pi(extr("", "p")) else: # old site layout data["title"] = text.unescape(extr("

    ", "

    ")) data["artist"] = extr(">", "<") data["fa_category"] = extr("Category:", "<").strip() data["theme"] = extr("Theme:", "<").strip() data["species"] = extr("Species:", "<").strip() data["gender"] = extr("Gender:", "<").strip() data["favorites"] = pi(extr("Favorites:", "<")) data["comments"] = pi(extr("Comments:", "<")) data["views"] = pi(extr("Views:", "<")) data["width"] = pi(extr("Resolution:", "x")) data["height"] = pi(extr("", "<")) data["tags"] = text.split_html(extr( 'id="keywords">', ''))[::2] data["rating"] = extr('', ' ')
            data[", "") data["artist_url"] = data["artist"].replace("_", "").lower() data["user"] = self.user or data["artist_url"] data["date"] = text.parse_timestamp(data["filename"].partition(".")[0]) data["description"] = self._process_description(data["_description"]) return data @staticmethod def _process_description(description): return text.unescape(text.remove_html(description, "", "")) def _pagination(self, path): num = 1 while True: url = "{}/{}/{}/{}/".format( self.root, path, self.user, num) page = self.request(url).text post_id = None for post_id in text.extract_iter(page, 'id="sid-', '"'): yield post_id if not post_id: return num += 1 def _pagination_favorites(self): path = "/favorites/{}/".format(self.user) while path: page = self.request(self.root + path).text yield from text.extract_iter(page, 'id="sid-', '"') path = text.extract(page, 'right" href="', '"')[0] def _pagination_search(self, query): url = self.root + "/search/" data = { "page" : 1, "order-by" : "relevancy", "order-direction": "desc", "range" : "all", "range_from" : "", "range_to" : "", "rating-general" : "1", "rating-mature" : "1", "rating-adult" : "1", "type-art" : "1", "type-music" : "1", "type-flash" : "1", "type-story" : "1", "type-photo" : "1", "type-poetry" : "1", "mode" : "extended", } data.update(query) if "page" in query: data["page"] = text.parse_int(query["page"]) while True: page = self.request(url, method="POST", data=data).text post_id = None for post_id in text.extract_iter(page, 'id="sid-', '"'): yield post_id if not post_id: return if "next_page" in data: data["page"] += 1 else: data["next_page"] = "Next" class FuraffinityGalleryExtractor(FuraffinityExtractor): """Extractor for a furaffinity user's gallery""" subcategory = "gallery" pattern = BASE_PATTERN + r"/gallery/([^/?#]+)" test = ("https://www.furaffinity.net/gallery/mirlinthloth/", { "pattern": r"https://d\d?\.f(uraffinity|acdn)\.net" r"/art/mirlinthloth/\d+/\d+.\w+\.\w+", "range": "45-50", "count": 6, }) def posts(self): return self._pagination("gallery") class FuraffinityScrapsExtractor(FuraffinityExtractor): """Extractor for a furaffinity user's scraps""" subcategory = "scraps" directory_fmt = ("{category}", "{user!l}", "Scraps") pattern = BASE_PATTERN + r"/scraps/([^/?#]+)" test = ("https://www.furaffinity.net/scraps/mirlinthloth/", { "pattern": r"https://d\d?\.f(uraffinity|acdn)\.net" r"/art/[^/]+(/stories)?/\d+/\d+.\w+.", "count": ">= 3", }) def posts(self): return self._pagination("scraps") class FuraffinityFavoriteExtractor(FuraffinityExtractor): """Extractor for a furaffinity user's favorites""" subcategory = "favorite" directory_fmt = ("{category}", "{user!l}", "Favorites") pattern = BASE_PATTERN + r"/favorites/([^/?#]+)" test = ("https://www.furaffinity.net/favorites/mirlinthloth/", { "pattern": r"https://d\d?\.f(uraffinity|acdn)\.net" r"/art/[^/]+/\d+/\d+.\w+\.\w+", "range": "45-50", "count": 6, }) def posts(self): return self._pagination_favorites() class FuraffinitySearchExtractor(FuraffinityExtractor): """Extractor for furaffinity search results""" subcategory = "search" directory_fmt = ("{category}", "Search", "{search}") pattern = BASE_PATTERN + r"/search(?:/([^/?#]+))?/?[?&]([^#]+)" test = ( ("https://www.furaffinity.net/search/?q=cute", { "pattern": r"https://d\d?\.f(uraffinity|acdn)\.net" r"/art/[^/]+/\d+/\d+.\w+\.\w+", "range": "45-50", "count": 6, }), # first page of search results (#2402) ("https://www.furaffinity.net/search/?q=leaf&range=1day", { "range": "1-3", "count": 3, }), ) def __init__(self, match): 
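# Illustrative sketch (not part of the extractor): the page-walking logic of
# _pagination() above.  Gallery pages are numbered 1, 2, 3, ... and every
# submission appears as an 'id="sid-..."' marker; the loop ends on the first
# page without one.  get_page() stands in for the real HTTP request.
import re

_SID = re.compile(r'id="sid-(\d+)"')

def walk_gallery(get_page):
    num = 1
    while True:
        ids = _SID.findall(get_page(num))
        if not ids:
            return
        yield from ids
        num += 1

_fake_pages = {1: 'id="sid-10" id="sid-11"', 2: 'id="sid-12"', 3: ""}
assert list(walk_gallery(lambda n: _fake_pages[n])) == ["10", "11", "12"]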
FuraffinityExtractor.__init__(self, match) self.query = text.parse_query(match.group(2)) if self.user and "q" not in self.query: self.query["q"] = text.unquote(self.user) def metadata(self): return {"search": self.query.get("q")} def posts(self): return self._pagination_search(self.query) class FuraffinityPostExtractor(FuraffinityExtractor): """Extractor for individual posts on furaffinity""" subcategory = "post" pattern = BASE_PATTERN + r"/(?:view|full)/(\d+)" test = ( ("https://www.furaffinity.net/view/21835115/", { "pattern": r"https://d\d*\.f(uraffinity|acdn)\.net/(download/)?art" r"/mirlinthloth/music/1488278723/1480267446.mirlinthlot" r"h_dj_fennmink_-_bude_s_4_ever\.mp3", "keyword": { "artist" : "mirlinthloth", "artist_url" : "mirlinthloth", "date" : "dt:2016-11-27 17:24:06", "description": "A Song made playing the game Cosmic DJ.", "extension" : "mp3", "filename" : r"re:\d+\.\w+_dj_fennmink_-_bude_s_4_ever", "id" : 21835115, "tags" : list, "title" : "Bude's 4 Ever", "url" : r"re:https://d\d?\.f(uraffinity|acdn)\.net/art", "user" : "mirlinthloth", "views" : int, "favorites" : int, "comments" : int, "rating" : "General", "fa_category": "Music", "theme" : "All", "species" : "Unspecified / Any", "gender" : "Any", "width" : 120, "height" : 120, }, }), # 'external' option (#1492) ("https://www.furaffinity.net/view/42166511/", { "options": (("external", True),), "pattern": r"https://d\d*\.f(uraffinity|acdn)\.net/" r"|http://www\.postybirb\.com", "count": 2, }), # no tags (#2277) ("https://www.furaffinity.net/view/45331225/", { "keyword": { "artist": "Kota_Remminders", "artist_url": "kotaremminders", "date": "dt:2022-01-03 17:49:33", "fa_category": "Adoptables", "filename": "1641232173.kotaremminders_chidopts1", "gender": "Any", "height": 905, "id": 45331225, "rating": "General", "species": "Unspecified / Any", "tags": [], "theme": "All", "title": "REMINDER", "width": 1280, }, }), ("https://furaffinity.net/view/21835115/"), ("https://sfw.furaffinity.net/view/21835115/"), ("https://www.furaffinity.net/full/21835115/"), ) def posts(self): post_id = self.user self.user = None return (post_id,) class FuraffinityUserExtractor(FuraffinityExtractor): """Extractor for furaffinity user profiles""" subcategory = "user" cookiedomain = None pattern = BASE_PATTERN + r"/user/([^/?#]+)" test = ( ("https://www.furaffinity.net/user/mirlinthloth/", { "pattern": r"/gallery/mirlinthloth/$", }), ("https://www.furaffinity.net/user/mirlinthloth/", { "options": (("include", "all"),), "pattern": r"/(gallery|scraps|favorites)/mirlinthloth/$", "count": 3, }), ) def items(self): base = "{}/{{}}/{}/".format(self.root, self.user) return self._dispatch_extractors(( (FuraffinityGalleryExtractor , base.format("gallery")), (FuraffinityScrapsExtractor , base.format("scraps")), (FuraffinityFavoriteExtractor, base.format("favorites")), ), ("gallery",)) class FuraffinityFollowingExtractor(FuraffinityExtractor): """Extractor for a furaffinity user's watched users""" subcategory = "following" pattern = BASE_PATTERN + "/watchlist/by/([^/?#]+)" test = ("https://www.furaffinity.net/watchlist/by/mirlinthloth/", { "pattern": FuraffinityUserExtractor.pattern, "range": "176-225", "count": 50, }) def items(self): url = "{}/watchlist/by/{}/".format(self.root, self.user) data = {"_extractor": FuraffinityUserExtractor} while True: page = self.request(url).text for path in text.extract_iter(page, '
    ", "")[0].strip() title, _, gallery_id = title.rpartition("#") return { "gallery_id" : text.parse_int(gallery_id), "gallery_hash": self.gallery_hash, "title" : text.unescape(title[:-15]), "views" : data["hits"], "score" : data["rating"], "tags" : data["tags"].split(","), "count" : len(data["images"]), } def images(self, page): for image in self.data["images"]: yield "https:" + image["imageUrl"], image class FuskatorSearchExtractor(Extractor): """Extractor for search results on fuskator.com""" category = "fuskator" subcategory = "search" root = "https://fuskator.com" pattern = r"(?:https?://)?fuskator\.com(/(?:search|page)/.+)" test = ( ("https://fuskator.com/search/red_swimsuit/", { "pattern": FuskatorGalleryExtractor.pattern, "count": ">= 40", }), ("https://fuskator.com/page/3/swimsuit/quality/"), ) def __init__(self, match): Extractor.__init__(self, match) self.path = match.group(1) def items(self): url = self.root + self.path data = {"_extractor": FuskatorGalleryExtractor} while True: page = self.request(url).text for path in text.extract_iter( page, 'class="pic_pad">', '>>><')[0] if not pages: return url = self.root + text.rextract(pages, 'href="', '"')[0] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1646253139.0 gallery_dl-1.21.1/gallery_dl/extractor/gelbooru.py0000644000175000017500000001472314207752123020741 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://gelbooru.com/""" from .common import Extractor, Message from . import gelbooru_v02 from .. import text, util, exception import binascii class GelbooruBase(): """Base class for gelbooru extractors""" category = "gelbooru" basecategory = "booru" root = "https://gelbooru.com" def _api_request(self, params): url = self.root + "/index.php?page=dapi&s=post&q=index&json=1" data = self.request(url, params=params).json() if "post" not in data: return () posts = data["post"] if not isinstance(posts, list): return (posts,) return posts def _pagination(self, params): params["pid"] = self.page_start params["limit"] = self.per_page limit = self.per_page // 2 while True: posts = self._api_request(params) for post in posts: yield post if len(posts) < limit: return if "pid" in params: del params["pid"] params["tags"] = "{} id:<{}".format(self.tags, post["id"]) @staticmethod def _file_url(post): url = post["file_url"] if url.endswith((".webm", ".mp4")): md5 = post["md5"] path = "/images/{}/{}/{}.webm".format(md5[0:2], md5[2:4], md5) post["_fallback"] = GelbooruBase._video_fallback(path) url = "https://img3.gelbooru.com" + path return url @staticmethod def _video_fallback(path): yield "https://img2.gelbooru.com" + path yield "https://img1.gelbooru.com" + path class GelbooruTagExtractor(GelbooruBase, gelbooru_v02.GelbooruV02TagExtractor): """Extractor for images from gelbooru.com based on search-tags""" pattern = (r"(?:https?://)?(?:www\.)?gelbooru\.com/(?:index\.php)?" 
r"\?page=post&s=list&tags=(?P[^&#]+)") test = ( ("https://gelbooru.com/index.php?page=post&s=list&tags=bonocho", { "count": 5, }), ("https://gelbooru.com/index.php?page=post&s=list&tags=meiya_neon", { "range": "196-204", "url": "845a61aa1f90fb4ced841e8b7e62098be2e967bf", "pattern": r"https://img\d\.gelbooru\.com" r"/images/../../[0-9a-f]{32}\.jpg", "count": 9, }), ) class GelbooruPoolExtractor(GelbooruBase, gelbooru_v02.GelbooruV02PoolExtractor): """Extractor for image-pools from gelbooru.com""" pattern = (r"(?:https?://)?(?:www\.)?gelbooru\.com/(?:index\.php)?" r"\?page=pool&s=show&id=(?P\d+)") test = ( ("https://gelbooru.com/index.php?page=pool&s=show&id=761", { "count": 6, }), ("https://gelbooru.com/index.php?page=pool&s=show&id=761", { "options": (("api", False),), "count": 6, }), ) def metadata(self): url = "{}/index.php?page=pool&s=show&id={}".format( self.root, self.pool_id) page = self.request(url).text name, pos = text.extract(page, "

    Now Viewing: ", "

    ") if not name: raise exception.NotFoundError("pool") self.post_ids = text.extract_iter(page, 'class="" id="p', '"', pos) return { "pool": text.parse_int(self.pool_id), "pool_name": text.unescape(name), } def posts(self): params = {} for params["id"] in util.advance(self.post_ids, self.page_start): yield from self._api_request(params) class GelbooruPostExtractor(GelbooruBase, gelbooru_v02.GelbooruV02PostExtractor): """Extractor for single images from gelbooru.com""" pattern = (r"(?:https?://)?(?:www\.)?gelbooru\.com/(?:index\.php)?" r"\?page=post&s=view&id=(?P\d+)") test = ( ("https://gelbooru.com/index.php?page=post&s=view&id=313638", { "content": "5e255713cbf0a8e0801dc423563c34d896bb9229", "count": 1, }), ("https://gelbooru.com/index.php?page=post&s=view&id=6018318", { "options": (("tags", True),), "content": "977caf22f27c72a5d07ea4d4d9719acdab810991", "keyword": { "tags_artist": "kirisaki_shuusei", "tags_character": str, "tags_copyright": "vocaloid", "tags_general": str, "tags_metadata": str, }, }), # video ("https://gelbooru.com/index.php?page=post&s=view&id=5938076", { "content": "6360452fa8c2f0c1137749e81471238564df832a", "pattern": r"https://img\d\.gelbooru\.com/images" r"/22/61/226111273615049235b001b381707bd0\.webm", }), # notes ("https://gelbooru.com/index.php?page=post&s=view&id=5997331", { "options": (("notes", True),), "keywords": { "notes": [ { "height": 553, "body": "Look over this way when you talk~", "width": 246, "x": 35, "y": 72 }, { "height": 557, "body": "Hey~\nAre you listening~?", "width": 246, "x": 1233, "y": 109 } ] } }), ) class GelbooruRedirectExtractor(GelbooruBase, Extractor): subcategory = "redirect" pattern = (r"(?:https?://)?(?:www\.)?gelbooru\.com" r"/redirect\.php\?s=([^&#]+)") test = (("https://gelbooru.com/redirect.php?s=Ly9nZWxib29ydS5jb20vaW5kZXgu" "cGhwP3BhZ2U9cG9zdCZzPXZpZXcmaWQ9MTgzMDA0Ng=="), { "pattern": r"https://gelbooru.com/index.php" r"\?page=post&s=view&id=1830046" }) def __init__(self, match): Extractor.__init__(self, match) self.redirect_url = text.ensure_http_scheme( binascii.a2b_base64(match.group(1)).decode()) def items(self): data = {"_extractor": GelbooruPostExtractor} yield Message.Queue, self.redirect_url, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/gelbooru_v01.py0000644000175000017500000001303214176336637021434 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Gelbooru v0.1 sites""" from . import booru from .. import text class GelbooruV01Extractor(booru.BooruExtractor): basecategory = "gelbooru_v01" per_page = 20 def _parse_post(self, post_id): url = "{}/index.php?page=post&s=view&id={}".format( self.root, post_id) page = self.request(url).text post = text.extract_all(page, ( ("created_at", 'Posted: ', ' <'), ("uploader" , 'By: ', ' <'), ("width" , 'Size: ', 'x'), ("height" , '', ' <'), ("source" , 'Source:
    ', '<'), ))[0] post["id"] = post_id post["md5"] = post["file_url"].rpartition("/")[2].partition(".")[0] post["rating"] = (post["rating"] or "?")[0].lower() post["tags"] = text.unescape(post["tags"]) post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%d %H:%M:%S") return post BASE_PATTERN = GelbooruV01Extractor.update({ "thecollection" : {"root": "https://the-collection.booru.org"}, "illusioncardsbooru": {"root": "https://illusioncards.booru.org"}, "allgirlbooru" : {"root": "https://allgirl.booru.org"}, "drawfriends" : {"root": "https://drawfriends.booru.org"}, "vidyart" : {"root": "https://vidyart.booru.org"}, "theloudbooru" : {"root": "https://tlb.booru.org"}, }) class GelbooruV01TagExtractor(GelbooruV01Extractor): subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/index\.php\?page=post&s=list&tags=([^&#]+)" test = ( (("https://the-collection.booru.org" "/index.php?page=post&s=list&tags=parody"), { "range": "1-25", "count": 25, }), (("https://illusioncards.booru.org" "/index.php?page=post&s=list&tags=koikatsu"), { "range": "1-25", "count": 25, }), ("https://allgirl.booru.org/index.php?page=post&s=list&tags=dress", { "range": "1-25", "count": 25, }), ("https://drawfriends.booru.org/index.php?page=post&s=list&tags=all"), ("https://vidyart.booru.org/index.php?page=post&s=list&tags=all"), ("https://tlb.booru.org/index.php?page=post&s=list&tags=all"), ) def __init__(self, match): GelbooruV01Extractor.__init__(self, match) self.tags = match.group(match.lastindex) def metadata(self): return {"search_tags": text.unquote(self.tags.replace("+", " "))} def posts(self): url = "{}/index.php?page=post&s=list&tags={}&pid=".format( self.root, self.tags) pid = self.page_start while True: page = self.request(url + str(pid)).text cnt = 0 for post_id in text.extract_iter( page, 'class="thumb">')[0] if html: tags = collections.defaultdict(list) pattern = re.compile( r"tag-type-([^\"' ]+).*?[?;]tags=([^\"'&]+)", re.S) for tag_type, tag_name in pattern.findall(html): tags[tag_type].append(text.unquote(tag_name)) for key, value in tags.items(): post["tags_" + key] = " ".join(value) return page def _notes(self, post, page=None): if not page: url = "{}/index.php?page=post&s=view&id={}".format( self.root, post["id"]) page = self.request(url).text notes = [] notes_data = text.extract(page, '
    ')[0] if not notes_data: return note_iter = text.extract_iter(notes_data, '') extr = text.extract for note_data in note_iter: note = { "width": int(extr(note_data, 'data-width="', '"')[0]), "height": int(extr(note_data, 'data-height="', '"')[0]), "x": int(extr(note_data, 'data-x="', '"')[0]), "y": int(extr(note_data, 'data-y="', '"')[0]), "body": extr(note_data, 'data-body="', '"')[0], } notes.append(note) post["notes"] = notes INSTANCES = { "realbooru": {"root": "https://realbooru.com"}, "rule34" : {"root": "https://rule34.xxx", "api_root": " https://api.rule34.xxx"}, "safebooru": {"root": "https://safebooru.org"}, "tbib" : {"root": "https://tbib.org"}, } BASE_PATTERN = GelbooruV02Extractor.update(INSTANCES) class GelbooruV02TagExtractor(GelbooruV02Extractor): subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/index\.php\?page=post&s=list&tags=([^&#]+)" test = ( ("https://rule34.xxx/index.php?page=post&s=list&tags=danraku", { "content": "622e80be3f496672c44aab5c47fbc6941c61bc79", "pattern": r"https?://.*rule34\.xxx/images/\d+/[0-9a-f]+\.jpg", "count": 2, }), ("https://safebooru.org/index.php?page=post&s=list&tags=bonocho", { "url": "17c61b386530cf4c30842c9f580d15ef1cd09586", "content": "e5ad4c5bf241b1def154958535bef6c2f6b733eb", }), ("https://realbooru.com/index.php?page=post&s=list&tags=wine", { "count": ">= 64", }), ("https://tbib.org/index.php?page=post&s=list&tags=yuyaiyaui", { "count": ">= 120", }), ) def __init__(self, match): GelbooruV02Extractor.__init__(self, match) tags = match.group(match.lastindex) self.tags = text.unquote(tags.replace("+", " ")) def metadata(self): return {"search_tags": self.tags} def posts(self): return self._pagination({"tags" : self.tags}) class GelbooruV02PoolExtractor(GelbooruV02Extractor): subcategory = "pool" directory_fmt = ("{category}", "pool", "{pool}") archive_fmt = "p_{pool}_{id}" pattern = BASE_PATTERN + r"/index\.php\?page=pool&s=show&id=(\d+)" test = ( ("https://rule34.xxx/index.php?page=pool&s=show&id=179", { "count": 3, }), ("https://safebooru.org/index.php?page=pool&s=show&id=11", { "count": 5, }), ("https://realbooru.com/index.php?page=pool&s=show&id=1", { "count": 3, }), ) def __init__(self, match): GelbooruV02Extractor.__init__(self, match) self.pool_id = match.group(match.lastindex) self.post_ids = () def skip(self, num): self.page_start += num return num def metadata(self): url = "{}/index.php?page=pool&s=show&id={}".format( self.root, self.pool_id) page = self.request(url).text name, pos = text.extract(page, "
    Pool: ", "
    ") if not name: raise exception.NotFoundError("pool") self.post_ids = text.extract_iter( page, 'class="thumb" id="p', '"', pos) return { "pool": text.parse_int(self.pool_id), "pool_name": text.unescape(name), } def posts(self): params = {} for params["id"] in util.advance(self.post_ids, self.page_start): for post in self._api_request(params): yield post.attrib class GelbooruV02FavoriteExtractor(GelbooruV02Extractor): subcategory = "favorite" directory_fmt = ("{category}", "favorites", "{favorite_id}") archive_fmt = "f_{favorite_id}_{id}" per_page = 50 pattern = BASE_PATTERN + r"/index\.php\?page=favorites&s=view&id=(\d+)" test = ( ("https://rule34.xxx/index.php?page=favorites&s=view&id=1030218", { "count": 3, }), ("https://safebooru.org/index.php?page=favorites&s=view&id=17567", { "count": 2, }), ("https://realbooru.com/index.php?page=favorites&s=view&id=274", { "count": 4, }), ("https://tbib.org/index.php?page=favorites&s=view&id=7881", { "count": 3, }), ) def __init__(self, match): GelbooruV02Extractor.__init__(self, match) self.favorite_id = match.group(match.lastindex) def metadata(self): return {"favorite_id": text.parse_int(self.favorite_id)} def posts(self): url = self.root + "/index.php" params = { "page": "favorites", "s" : "view", "id" : self.favorite_id, "pid" : self.page_start * self.per_page, } data = {} while True: num_ids = 0 page = self.request(url, params=params).text for data["id"] in text.extract_iter(page, '" id="p', '"'): num_ids += 1 for post in self._api_request(data): yield post.attrib if num_ids < self.per_page: return params["pid"] += self.per_page class GelbooruV02PostExtractor(GelbooruV02Extractor): subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/index\.php\?page=post&s=view&id=(\d+)" test = ( ("https://rule34.xxx/index.php?page=post&s=view&id=1995545", { "content": "97e4bbf86c3860be18de384d02d544251afe1d45", "options": (("tags", True),), "keyword": { "tags_artist": "danraku", "tags_character": "kashima_(kantai_collection)", "tags_copyright": "kantai_collection", "tags_general": str, "tags_metadata": str, }, }), ("https://safebooru.org/index.php?page=post&s=view&id=1169132", { "url": "cf05e37a3c62b2d55788e2080b8eabedb00f999b", "content": "93b293b27dabd198afafabbaf87c49863ac82f27", "options": (("tags", True),), "keyword": { "tags_artist": "kawanakajima", "tags_character": "heath_ledger ronald_mcdonald the_joker", "tags_copyright": "dc_comics mcdonald's the_dark_knight", "tags_general": str, }, }), ("https://realbooru.com/index.php?page=post&s=view&id=668483", { "url": "2421b5b0e15d5e20f9067090a8b0fd4114d3e7d9", "content": "7f5873ce3b6cd295ea2e81fcb49583098ea9c8da", }), ("https://tbib.org/index.php?page=post&s=view&id=9233957", { "url": "5a6ebe07bfff8e6d27f7c30b5480f27abcb577d2", "content": "1c3831b6fbaa4686e3c79035b5d98460b1c85c43", }), ) def __init__(self, match): GelbooruV02Extractor.__init__(self, match) self.post_id = match.group(match.lastindex) def posts(self): return self._pagination({"id": self.post_id}) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1645832469.0 gallery_dl-1.21.1/gallery_dl/extractor/generic.py0000644000175000017500000001761614206264425020545 0ustar00mikemike# -*- coding: utf-8 -*- """Extractor for images in a generic web page.""" from .common import Extractor, Message from .. 
import config, text import re import os.path class GenericExtractor(Extractor): """Extractor for images in a generic web page.""" category = "generic" directory_fmt = ("{category}", "{pageurl}") archive_fmt = "{imageurl}" # By default, the generic extractor is disabled # and the "g(eneric):" prefix in url is required. # If the extractor is enabled, make the prefix optional pattern = r"(?ix)(?Pg(?:eneric)?:)" if config.get(("extractor", "generic"), "enabled"): pattern += r"?" # The generic extractor pattern should match (almost) any valid url # Based on: https://tools.ietf.org/html/rfc3986#appendix-B pattern += r""" (?Phttps?://)? # optional http(s) scheme (?P[-\w\.]+) # required domain (?P/[^?&#]*)? # optional path (?:\?(?P[^/?#]*))? # optional query (?:\#(?P.*))?$ # optional fragment """ def __init__(self, match): """Init.""" Extractor.__init__(self, match) # Strip the "g(eneric):" prefix # and inform about "forced" or "fallback" mode if match.group('generic'): self.log.info("Forcing use of generic information extractor.") self.url = match.group(0).partition(":")[2] else: self.log.info("Falling back on generic information extractor.") self.url = match.group(0) # Make sure we have a scheme, or use https if match.group('scheme'): self.scheme = match.group('scheme') else: self.scheme = 'https://' self.url = self.scheme + self.url # Used to resolve relative image urls self.root = self.scheme + match.group('domain') def items(self): """Get page, extract metadata & images, yield them in suitable messages. Adapted from common.GalleryExtractor.items() """ page = self.request(self.url).text data = self.metadata(page) imgs = self.images(page) try: data["count"] = len(imgs) except TypeError: pass images = enumerate(imgs, 1) yield Message.Version, 1 yield Message.Directory, data for data["num"], (url, imgdata) in images: if imgdata: data.update(imgdata) if "extension" not in imgdata: text.nameext_from_url(url, data) else: text.nameext_from_url(url, data) yield Message.Url, url, data def metadata(self, page): """Extract generic webpage metadata, return them in a dict.""" data = {} data['pageurl'] = self.url data['title'] = text.extract(page, '', "")[0] or "" data['description'] = text.extract( page, ',
    View ', '<', pos) data["chapter"] = text.parse_int(url.rpartition("/")[2][1:]) data["title"] = title results.append((text.urljoin(self.root, url), data.copy())) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1639190302.0 gallery_dl-1.21.1/gallery_dl/extractor/hentai2read.py0000644000175000017500000000732614155007436021314 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract hentai-manga from https://hentai2read.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text import json import re class Hentai2readBase(): """Base class for hentai2read extractors""" category = "hentai2read" root = "https://hentai2read.com" class Hentai2readChapterExtractor(Hentai2readBase, ChapterExtractor): """Extractor for a single manga chapter from hentai2read.com""" archive_fmt = "{chapter_id}_{page}" pattern = r"(?:https?://)?(?:www\.)?hentai2read\.com(/[^/?#]+/(\d+))" test = ("https://hentai2read.com/amazon_elixir/1/", { "url": "964b942cf492b3a129d2fe2608abfc475bc99e71", "keyword": "ff84b8f751f0e4ee37717efc4332ff1db71951d9", }) def __init__(self, match): self.chapter = match.group(2) ChapterExtractor.__init__(self, match) def metadata(self, page): title, pos = text.extract(page, "", "") manga_id, pos = text.extract(page, 'data-mid="', '"', pos) chapter_id, pos = text.extract(page, 'data-cid="', '"', pos) match = re.match(r"Reading (.+) \(([^)]+)\) Hentai(?: by (.+))? - " r"(\d+): (.+) . Page 1 ", title) return { "manga": match.group(1), "manga_id": text.parse_int(manga_id), "chapter": text.parse_int(self.chapter), "chapter_id": text.parse_int(chapter_id), "type": match.group(2), "author": match.group(3), "title": match.group(5), "lang": "en", "language": "English", } @staticmethod def images(page): images = text.extract(page, "'images' : ", ",\n")[0] return [ ("https://hentaicdn.com/hentai" + part, None) for part in json.loads(images) ] class Hentai2readMangaExtractor(Hentai2readBase, MangaExtractor): """Extractor for hmanga from hentai2read.com""" chapterclass = Hentai2readChapterExtractor pattern = r"(?:https?://)?(?:www\.)?hentai2read\.com(/[^/?#]+)/?$" test = ( ("https://hentai2read.com/amazon_elixir/", { "url": "273073752d418ec887d7f7211e42b832e8c403ba", "keyword": "13c1ce7e15cbb941f01c843b0e89adc993d939ac", }), ("https://hentai2read.com/oshikage_riot/", { "url": "6595f920a3088a15c2819c502862d45f8eb6bea6", "keyword": "675c7b7a4fa52cf569c283553bd16b4200a5cd36", }), ) def chapters(self, page): results = [] manga, pos = text.extract( page, '', '') mtype, pos = text.extract( page, '[', ']', pos) manga_id = text.parse_int(text.extract( page, 'data-mid="', '"', pos)[0]) while True: chapter_id, pos = text.extract(page, ' data-cid="', '"', pos) if not chapter_id: return results _ , pos = text.extract(page, ' href="', '"', pos) url, pos = text.extract(page, ' href="', '"', pos) chapter, pos = text.extract(page, '>', '<', pos) chapter, _, title = text.unescape(chapter).strip().partition(" - ") results.append((url, { "manga_id": manga_id, "manga": manga, "type": mtype, "chapter_id": text.parse_int(chapter_id), "chapter": text.parse_int(chapter), "title": title, "lang": "en", "language": "English", })) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648487626.0 
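# ---------------------------------------------------------------------------
# Note added for orientation (not part of the original archive): the chapter
# and manga parsers above are built almost entirely on the text.extract() /
# text.extract_iter() helpers, which scan a string for a begin/end delimiter
# pair.  A rough, hedged sketch of how they behave -- the sample markup and
# variable names below are invented for illustration only:
#
#     from gallery_dl import text
#
#     sample = '<a data-cid="42" href="/c/42">Chapter 3 - Prologue</a>'
#     chapter_id, pos = text.extract(sample, 'data-cid="', '"')
#     label, _ = text.extract(sample, '">', '<', pos)
#     # chapter_id == "42"; label == "Chapter 3 - Prologue"
#     # text.extract() returns a (value, position) pair, and the returned
#     # position can be fed into the next call to keep scanning forward.
# ---------------------------------------------------------------------------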
gallery_dl-1.21.1/gallery_dl/extractor/hentaicosplays.py0000644000175000017500000000525014220366312022140 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentai-cosplays.com/ (also works for hentai-img.com and porn-images-xxx.com)""" from .common import GalleryExtractor from .. import text class HentaicosplaysGalleryExtractor(GalleryExtractor): """Extractor for image galleries from hentai-cosplays.com, hentai-img.com, and porn-images-xxx.com""" category = "hentaicosplays" directory_fmt = ("{site}", "{title}") filename_fmt = "{filename}.{extension}" archive_fmt = "{title}_{filename}" pattern = r"((?:https?://)?(?:\w{2}\.)?" \ r"(hentai-cosplays|hentai-img|porn-images-xxx)\.com)/" \ r"(?:image|story)/([\w-]+)" test = ( ("https://hentai-cosplays.com/image/---devilism--tide-kurihara-/", { "pattern": r"https://static\d?.hentai-cosplays.com/upload/" r"\d+/\d+/\d+/\d+.jpg$", "keyword": { "count": 18, "site": "hentai-cosplays", "slug": "---devilism--tide-kurihara-", "title": "艦 こ れ-devilism の tide Kurihara 憂", }, }), ("https://fr.porn-images-xxx.com/image/enako-enako-24/", { "pattern": r"https://static\d?.porn-images-xxx.com/upload/" r"\d+/\d+/\d+/\d+.jpg$", "keyword": { "count": 11, "site": "porn-images-xxx", "title": str, }, }), ("https://ja.hentai-img.com/image/hollow-cora-502/", { "pattern": r"https://static\d?.hentai-img.com/upload/" r"\d+/\d+/\d+/\d+.jpg$", "keyword": { "count": 2, "site": "hentai-img", "title": str, }, }), ) def __init__(self, match): root, self.site, self.slug = match.groups() self.root = text.ensure_http_scheme(root) url = "{}/story/{}/".format(self.root, self.slug) GalleryExtractor.__init__(self, match, url) self.session.headers["Referer"] = url def metadata(self, page): title = text.extract(page, "", "")[0] return { "title": text.unescape(title.rpartition(" Story Viewer - ")[0]), "slug" : self.slug, "site" : self.site, } def images(self, page): return [ (url, None) for url in text.extract_iter( page, '', '<')), "artist" : text.unescape(extr('/profile">', '<')), "width" : text.parse_int(extr('width="', '"')), "height" : text.parse_int(extr('height="', '"')), "index" : text.parse_int(path.rsplit("/", 2)[1]), "src" : text.urljoin(self.root, text.unescape(extr( 'src="', '"'))), "description": text.unescape(text.remove_html(extr( '>Description', '
    ') .replace("\r\n", "\n"), "", "")), "ratings" : [text.unescape(r) for r in text.extract_iter(extr( "class='ratings_box'", ""), "title='", "'")], "media" : text.unescape(extr("Media\t\t", "<")), "date" : text.parse_datetime(extr("datetime='", "'")), "views" : text.parse_int(extr("Views\t\t", "<")), "tags" : text.split_html(extr( "Keywords", ""))[::2], "score" : text.parse_int(extr('Score\t\t', '<')), } return text.nameext_from_url(data["src"], data) def _parse_story(self, html): """Collect url and metadata for a story""" extr = text.extract_from(html) data = { "user" : self.user, "title" : text.unescape(extr( "
    ", "").rpartition(">")[2]), "author" : text.unescape(extr('alt="', '"')), "date" : text.parse_datetime(extr( ">Updated<", "").rpartition(">")[2], "%B %d, %Y"), "status" : extr("class='indent'>", "<"), } for c in ("Chapters", "Words", "Comments", "Views", "Rating"): data[c.lower()] = text.parse_int(extr( ">" + c + ":", "<").replace(",", "")) data["description"] = text.unescape(extr( "class='storyDescript'>", ""), "title='", "'")] return text.nameext_from_url(data["src"], data) def _init_site_filters(self): """Set site-internal filters to show all images""" url = self.root + "/?enterAgree=1" self.request(url, method="HEAD") csrf_token = self.session.cookies.get( "YII_CSRF_TOKEN", domain=self.cookiedomain) if not csrf_token: self.log.warning("Unable to update site content filters") return url = self.root + "/site/filters" data = { "rating_nudity" : "3", "rating_violence" : "3", "rating_profanity": "3", "rating_racism" : "3", "rating_sex" : "3", "rating_spoilers" : "3", "rating_yaoi" : "1", "rating_yuri" : "1", "rating_teen" : "1", "rating_guro" : "1", "rating_furry" : "1", "rating_beast" : "1", "rating_male" : "1", "rating_female" : "1", "rating_futa" : "1", "rating_other" : "1", "rating_scat" : "1", "rating_incest" : "1", "rating_rape" : "1", "filter_media" : "A", "filter_order" : "date_new", "filter_type" : "0", "YII_CSRF_TOKEN" : text.unquote(text.extract( csrf_token, "%22", "%22")[0]), } self.request(url, method="POST", data=data) class HentaifoundryUserExtractor(HentaifoundryExtractor): """Extractor for a hentaifoundry user profile""" subcategory = "user" pattern = BASE_PATTERN + r"/user/([^/?#]+)/profile" test = ("https://www.hentai-foundry.com/user/Tenpura/profile",) def items(self): root = self.root user = "/user/" + self.user return self._dispatch_extractors(( (HentaifoundryPicturesExtractor , root + "/pictures" + user), (HentaifoundryScrapsExtractor, root + "/pictures" + user + "/scraps"), (HentaifoundryStoriesExtractor, root + "/stories" + user), (HentaifoundryFavoriteExtractor, root + user + "/faves/pictures"), ), ("pictures",)) class HentaifoundryPicturesExtractor(HentaifoundryExtractor): """Extractor for all pictures of a hentaifoundry user""" subcategory = "pictures" pattern = BASE_PATTERN + r"/pictures/user/([^/?#]+)(?:/page/(\d+))?/?$" test = ( ("https://www.hentai-foundry.com/pictures/user/Tenpura", { "url": "ebbc981a85073745e3ca64a0f2ab31fab967fc28", }), ("https://www.hentai-foundry.com/pictures/user/Tenpura/page/3"), ) def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.page_url = "{}/pictures/user/{}".format(self.root, self.user) class HentaifoundryScrapsExtractor(HentaifoundryExtractor): """Extractor for scraps of a hentaifoundry user""" subcategory = "scraps" directory_fmt = ("{category}", "{user}", "Scraps") pattern = BASE_PATTERN + r"/pictures/user/([^/?#]+)/scraps" test = ( ("https://www.hentai-foundry.com/pictures/user/Evulchibi/scraps", { "url": "7cd9c6ec6258c4ab8c44991f7731be82337492a7", }), ("https://www.hentai-foundry.com" "/pictures/user/Evulchibi/scraps/page/3"), ) def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.page_url = "{}/pictures/user/{}/scraps".format( self.root, self.user) class HentaifoundryFavoriteExtractor(HentaifoundryExtractor): """Extractor for favorite images of a hentaifoundry user""" subcategory = "favorite" directory_fmt = ("{category}", "{user}", "Favorites") archive_fmt = "f_{user}_{index}" pattern = BASE_PATTERN + r"/user/([^/?#]+)/faves/pictures" test = ( 
("https://www.hentai-foundry.com/user/Tenpura/faves/pictures", { "url": "56f9ae2e89fe855e9fe1da9b81e5ec6212b0320b", }), ("https://www.hentai-foundry.com" "/user/Tenpura/faves/pictures/page/3"), ) def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.page_url = "{}/user/{}/faves/pictures".format( self.root, self.user) class HentaifoundryRecentExtractor(HentaifoundryExtractor): """Extractor for 'Recent Pictures' on hentaifoundry.com""" subcategory = "recent" directory_fmt = ("{category}", "Recent Pictures", "{date}") archive_fmt = "r_{index}" pattern = BASE_PATTERN + r"/pictures/recent/(\d\d\d\d-\d\d-\d\d)" test = ("https://www.hentai-foundry.com/pictures/recent/2018-09-20", { "pattern": r"https://pictures.hentai-foundry.com/[^/]/[^/?#]+/\d+/", "range": "20-30", }) def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.page_url = "{}/pictures/recent/{}".format(self.root, self.user) def metadata(self): return {"date": self.user} class HentaifoundryPopularExtractor(HentaifoundryExtractor): """Extractor for popular images on hentaifoundry.com""" subcategory = "popular" directory_fmt = ("{category}", "Popular Pictures") archive_fmt = "p_{index}" pattern = BASE_PATTERN + r"/pictures/popular()" test = ("https://www.hentai-foundry.com/pictures/popular", { "pattern": r"https://pictures.hentai-foundry.com/[^/]/[^/?#]+/\d+/", "range": "20-30", }) def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.page_url = self.root + "/pictures/popular" class HentaifoundryImageExtractor(HentaifoundryExtractor): """Extractor for a single image from hentaifoundry.com""" subcategory = "image" pattern = (r"(https?://)?(?:www\.|pictures\.)?hentai-foundry\.com" r"/(?:pictures/user|[^/?#])/([^/?#]+)/(\d+)") test = ( (("https://www.hentai-foundry.com" "/pictures/user/Tenpura/407501/shimakaze"), { "url": "fbf2fd74906738094e2575d2728e8dc3de18a8a3", "content": "91bf01497c39254b6dfb234a18e8f01629c77fd1", "keyword": { "artist" : "Tenpura", "date" : "dt:2016-02-22 14:41:19", "description": "Thank you!", "height" : 700, "index" : 407501, "media" : "Other digital art", "ratings": ["Sexual content", "Contains female nudity"], "score" : int, "tags" : ["kancolle", "kantai", "collection", "shimakaze"], "title" : "shimakaze", "user" : "Tenpura", "views" : int, "width" : 495, }, }), ("http://www.hentai-foundry.com/pictures/user/Tenpura/407501/", { "pattern": "http://pictures.hentai-foundry.com/t/Tenpura/407501/", }), ("https://www.hentai-foundry.com/pictures/user/Tenpura/407501/"), ("https://pictures.hentai-foundry.com" "/t/Tenpura/407501/Tenpura-407501-shimakaze.png"), ) skip = Extractor.skip def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.index = match.group(3) def items(self): post_url = "{}/pictures/user/{}/{}/?enterAgree=1".format( self.root, self.user, self.index) image = self._parse_post(post_url) image["user"] = self.user yield Message.Directory, image yield Message.Url, image["src"], image class HentaifoundryStoriesExtractor(HentaifoundryExtractor): """Extractor for stories of a hentaifoundry user""" subcategory = "stories" archive_fmt = "s_{index}" pattern = BASE_PATTERN + r"/stories/user/([^/?#]+)(?:/page/(\d+))?/?$" test = ("https://www.hentai-foundry.com/stories/user/SnowWolf35", { "count": ">= 35", "keyword": { "author" : "SnowWolf35", "chapters" : int, "comments" : int, "date" : "type:datetime", "description": str, "index" : int, "rating" : int, "ratings" : list, "status" : "re:(Inc|C)omplete", "title" : str, "user" : 
"SnowWolf35", "views" : int, "words" : int, }, }) def items(self): self._init_site_filters() for story_html in util.advance(self.stories(), self.start_post): story = self._parse_story(story_html) yield Message.Directory, story yield Message.Url, story["src"], story def stories(self): url = "{}/stories/user/{}".format(self.root, self.user) return self._pagination(url, '
    ', '') class HentaifoundryStoryExtractor(HentaifoundryExtractor): """Extractor for a hentaifoundry story""" subcategory = "story" archive_fmt = "s_{index}" pattern = BASE_PATTERN + r"/stories/user/([^/?#]+)/(\d+)" test = (("https://www.hentai-foundry.com/stories/user/SnowWolf35" "/26416/Overwatch-High-Chapter-Voting-Location"), { "url": "5a67cfa8c3bf7634c8af8485dd07c1ea74ee0ae8", "keyword": {"title": "Overwatch High Chapter Voting Location"}, }) skip = Extractor.skip def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.index = match.group(3) def items(self): story_url = "{}/stories/user/{}/{}/x?enterAgree=1".format( self.root, self.user, self.index) story = self._parse_story(self.request(story_url).text) yield Message.Directory, story yield Message.Url, story["src"], story ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/hentaifox.py0000644000175000017500000001256614176336637021130 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentaifox.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text import json class HentaifoxBase(): """Base class for hentaifox extractors""" category = "hentaifox" root = "https://hentaifox.com" class HentaifoxGalleryExtractor(HentaifoxBase, GalleryExtractor): """Extractor for image galleries on hentaifox.com""" pattern = r"(?:https?://)?(?:www\.)?hentaifox\.com(/gallery/(\d+))" test = ( ("https://hentaifox.com/gallery/56622/", { "pattern": r"https://i\d*\.hentaifox\.com/\d+/\d+/\d+\.jpg", "keyword": "bcd6b67284f378e5cc30b89b761140e3e60fcd92", "count": 24, }), # 'split_tag' element (#1378) ("https://hentaifox.com/gallery/630/", { "keyword": { "artist": ["beti", "betty", "magi", "mimikaki"], "characters": [ "aerith gainsborough", "tifa lockhart", "yuffie kisaragi" ], "count": 32, "gallery_id": 630, "group": ["cu-little2"], "parody": ["darkstalkers | vampire", "final fantasy vii"], "tags": ["femdom", "fingering", "masturbation", "yuri"], "title": "Cu-Little Bakanya~", "type": "doujinshi", }, }), ) def __init__(self, match): GalleryExtractor.__init__(self, match) self.gallery_id = match.group(2) @staticmethod def _split(txt): return [ text.remove_html(tag.partition(">")[2], "", "") for tag in text.extract_iter( txt, "class='tag_btn", "= 60", "keyword": { "url" : str, "gallery_id": int, "title" : str, }, }), ) def __init__(self, match): Extractor.__init__(self, match) self.path = match.group(1) def items(self): for gallery in self.galleries(): yield Message.Queue, gallery["url"], gallery def galleries(self): num = 1 while True: url = "{}{}/pag/{}/".format(self.root, self.path, num) page = self.request(url).text for info in text.extract_iter( page, 'class="g_title">') yield { "url" : text.urljoin(self.root, url), "gallery_id": text.parse_int( url.strip("/").rpartition("/")[2]), "title" : text.unescape(title), "_extractor": HentaifoxGalleryExtractor, } pos = page.find(">Next<") url = text.rextract(page, "href=", ">", pos)[0] if pos == -1 or "/pag" not in url: return num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/hentaihand.py0000644000175000017500000001071214176336637021235 0ustar00mikemike# -*- 
coding: utf-8 -*- # Copyright 2020-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentaihand.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util import json class HentaihandGalleryExtractor(GalleryExtractor): """Extractor for image galleries on hentaihand.com""" category = "hentaihand" root = "https://hentaihand.com" pattern = r"(?:https?://)?(?:www\.)?hentaihand\.com/\w+/comic/([\w-]+)" test = ( (("https://hentaihand.com/en/comic/c75-takumi-na-muchi-choudenji-hou-" "no-aishi-kata-how-to-love-a-super-electromagnetic-gun-toaru-kagaku-" "no-railgun-english"), { "pattern": r"https://cdn.hentaihand.com/.*/images/360468/\d+.jpg$", "count": 50, "keyword": { "artists" : ["Takumi Na Muchi"], "date" : "dt:2014-06-28 00:00:00", "gallery_id": 360468, "lang" : "en", "language" : "English", "parodies" : ["Toaru Kagaku No Railgun"], "relationships": list, "tags" : list, "title" : r"re:\(C75\) \[Takumi na Muchi\] Choudenji Hou ", "title_alt" : r"re:\(C75\) \[たくみなむち\] 超電磁砲のあいしかた", "type" : "Doujinshi", }, }), ) def __init__(self, match): self.slug = match.group(1) url = "{}/api/comics/{}".format(self.root, self.slug) GalleryExtractor.__init__(self, match, url) def metadata(self, page): info = json.loads(page) data = { "gallery_id" : text.parse_int(info["id"]), "title" : info["title"], "title_alt" : info["alternative_title"], "slug" : self.slug, "type" : info["category"]["name"], "language" : info["language"]["name"], "lang" : util.language_to_code(info["language"]["name"]), "tags" : [t["slug"] for t in info["tags"]], "date" : text.parse_datetime( info["uploaded_at"], "%Y-%m-%d"), } for key in ("artists", "authors", "groups", "characters", "relationships", "parodies"): data[key] = [v["name"] for v in info[key]] return data def images(self, _): info = self.request(self.gallery_url + "/images").json() return [(img["source_url"], img) for img in info["images"]] class HentaihandTagExtractor(Extractor): """Extractor for tag searches on hentaihand.com""" category = "hentaihand" subcategory = "tag" root = "https://hentaihand.com" pattern = (r"(?i)(?:https?://)?(?:www\.)?hentaihand\.com" r"/\w+/(parody|character|tag|artist|group|language" r"|category|relationship)/([^/?#]+)") test = ( ("https://hentaihand.com/en/artist/takumi-na-muchi", { "pattern": HentaihandGalleryExtractor.pattern, "count": ">= 6", }), ("https://hentaihand.com/en/tag/full-color"), ("https://hentaihand.com/fr/language/japanese"), ("https://hentaihand.com/zh/category/manga"), ) def __init__(self, match): Extractor.__init__(self, match) self.type, self.key = match.groups() def items(self): if self.type[-1] == "y": tpl = self.type[:-1] + "ies" else: tpl = self.type + "s" url = "{}/api/{}/{}".format(self.root, tpl, self.key) tid = self.request(url, notfound=self.type).json()["id"] url = self.root + "/api/comics" params = { "per_page": "18", tpl : tid, "page" : 1, "q" : "", "sort" : "uploaded_at", "order" : "desc", "duration": "day", } while True: info = self.request(url, params=params).json() for gallery in info["data"]: gurl = "{}/en/comic/{}".format(self.root, gallery["slug"]) gallery["_extractor"] = HentaihandGalleryExtractor yield Message.Queue, gurl, gallery if params["page"] >= info["last_page"]: return params["page"] += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1639190302.0 
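# ---------------------------------------------------------------------------
# Note added for orientation (not part of the original archive): list-style
# extractors such as HentaihandTagExtractor above do not download anything
# themselves.  They walk the site's paginated JSON API (stopping once "page"
# reaches the reported "last_page") and emit Message.Queue entries whose
# metadata carries an "_extractor" hint, so each queued URL is handed straight
# to the named extractor class without another round of pattern matching.
# A minimal, hypothetical sketch of that delegation pattern:
#
#     def items(self):
#         data = {"_extractor": SomeGalleryExtractor}   # hypothetical class
#         for url in self._gallery_urls():              # hypothetical helper
#             yield Message.Queue, url, data
# ---------------------------------------------------------------------------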
gallery_dl-1.21.1/gallery_dl/extractor/hentaihere.py0000644000175000017500000000740314155007436021236 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract hentai-manga from https://hentaihere.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text import json import re class HentaihereBase(): """Base class for hentaihere extractors""" category = "hentaihere" root = "https://hentaihere.com" class HentaihereChapterExtractor(HentaihereBase, ChapterExtractor): """Extractor for a single manga chapter from hentaihere.com""" archive_fmt = "{chapter_id}_{page}" pattern = r"(?:https?://)?(?:www\.)?hentaihere\.com/m/S(\d+)/(\d+)" test = ("https://hentaihere.com/m/S13812/1/1/", { "url": "964b942cf492b3a129d2fe2608abfc475bc99e71", "keyword": "cbcee0c0eb178c4b87f06a834085784f8dddad24", }) def __init__(self, match): self.manga_id, self.chapter = match.groups() url = "{}/m/S{}/{}/1".format(self.root, self.manga_id, self.chapter) ChapterExtractor.__init__(self, match, url) def metadata(self, page): title = text.extract(page, "", "")[0] chapter_id = text.extract(page, 'report/C', '"')[0] pattern = r"Page 1 \| (.+) \(([^)]+)\) - Chapter \d+: (.+) by (.+) at " match = re.match(pattern, title) return { "manga": match.group(1), "manga_id": text.parse_int(self.manga_id), "chapter": text.parse_int(self.chapter), "chapter_id": text.parse_int(chapter_id), "type": match.group(2), "title": match.group(3), "author": match.group(4), "lang": "en", "language": "English", } @staticmethod def images(page): images = text.extract(page, "var rff_imageList = ", ";")[0] return [ ("https://hentaicdn.com/hentai" + part, None) for part in json.loads(images) ] class HentaihereMangaExtractor(HentaihereBase, MangaExtractor): """Extractor for hmanga from hentaihere.com""" chapterclass = HentaihereChapterExtractor pattern = r"(?:https?://)?(?:www\.)?hentaihere\.com(/m/S\d+)/?$" test = ( ("https://hentaihere.com/m/S13812", { "url": "d1ba6e28bb2162e844f8559c2b2725ba0a093559", "keyword": "13c1ce7e15cbb941f01c843b0e89adc993d939ac", }), ("https://hentaihere.com/m/S7608", { "url": "6c5239758dc93f6b1b4175922836c10391b174f7", "keyword": "675c7b7a4fa52cf569c283553bd16b4200a5cd36", }), ) def chapters(self, page): results = [] manga_id = text.parse_int( self.manga_url.rstrip("/").rpartition("/")[2][1:]) manga, pos = text.extract( page, '', '') mtype, pos = text.extract( page, '[', ']', pos) while True: marker, pos = text.extract( page, '
  • ', '', pos) if marker is None: return results url, pos = text.extract(page, '\n', '<', pos) chapter_id, pos = text.extract(page, '/C', '"', pos) chapter, _, title = text.unescape(chapter).strip().partition(" - ") results.append((url, { "manga_id": manga_id, "manga": manga, "type": mtype, "chapter_id": text.parse_int(chapter_id), "chapter": text.parse_int(chapter), "title": title, "lang": "en", "language": "English", })) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/hiperdex.py0000644000175000017500000001531414176336637020745 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hiperdex.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text from ..cache import memcache import re BASE_PATTERN = r"((?:https?://)?(?:www\.)?hiperdex\d?\.(?:com|net|info))" class HiperdexBase(): """Base class for hiperdex extractors""" category = "hiperdex" root = "https://hiperdex.com" @memcache(keyarg=1) def manga_data(self, manga, page=None): if not page: url = "{}/manga/{}/".format(self.root, manga) page = self.request(url).text extr = text.extract_from(page) return { "manga" : text.unescape(extr( "", "<").rpartition("&")[0].strip()), "score" : text.parse_float(extr( 'id="averagerate">', '<')), "author" : text.remove_html(extr( 'class="author-content">', '</div>')), "artist" : text.remove_html(extr( 'class="artist-content">', '</div>')), "genre" : text.split_html(extr( 'class="genres-content">', '</div>'))[::2], "type" : extr( 'class="summary-content">', '<').strip(), "release": text.parse_int(text.remove_html(extr( 'class="summary-content">', '</div>'))), "status" : extr( 'class="summary-content">', '<').strip(), "description": text.remove_html(text.unescape(extr( 'class="description-summary">', '</div>'))), "language": "English", "lang" : "en", } def chapter_data(self, chapter): chapter, _, minor = chapter.partition("-") data = { "chapter" : text.parse_int(chapter), "chapter_minor": "." 
+ minor if minor and minor != "end" else "", } data.update(self.manga_data(self.manga.lower())) return data class HiperdexChapterExtractor(HiperdexBase, ChapterExtractor): """Extractor for manga chapters from hiperdex.com""" pattern = BASE_PATTERN + r"(/manga/([^/?#]+)/([^/?#]+))" test = ( ("https://hiperdex.com/manga/domestic-na-kanojo/154-5/", { "pattern": r"https://hiperdex\d?.(com|net|info)/wp-content/uploads" r"/WP-manga/data/manga_\w+/[0-9a-f]{32}/\d+\.webp", "count": 9, "keyword": { "artist" : "Sasuga Kei", "author" : "Sasuga Kei", "chapter": 154, "chapter_minor": ".5", "description": "re:Natsuo Fujii is in love with his teacher, ", "genre" : list, "manga" : "Domestic na Kanojo", "release": 2014, "score" : float, "type" : "Manga", }, }), ("https://hiperdex2.com/manga/domestic-na-kanojo/154-5/"), ("https://hiperdex.net/manga/domestic-na-kanojo/154-5/"), ("https://hiperdex.info/manga/domestic-na-kanojo/154-5/"), ) def __init__(self, match): root, path, self.manga, self.chapter = match.groups() self.root = text.ensure_http_scheme(root) ChapterExtractor.__init__(self, match, self.root + path + "/") def metadata(self, _): return self.chapter_data(self.chapter) def images(self, page): return [ (url.strip(), None) for url in re.findall( r'id="image-\d+"\s+(?:data-)?src="([^"]+)', page) ] class HiperdexMangaExtractor(HiperdexBase, MangaExtractor): """Extractor for manga from hiperdex.com""" chapterclass = HiperdexChapterExtractor pattern = BASE_PATTERN + r"(/manga/([^/?#]+))/?$" test = ( ("https://hiperdex.com/manga/youre-not-that-special/", { "count": 51, "pattern": HiperdexChapterExtractor.pattern, "keyword": { "artist" : "Bolp", "author" : "Abyo4", "chapter": int, "chapter_minor": "", "description": "re:I didn’t think much of the creepy girl in ", "genre" : list, "manga" : "You’re Not That Special!", "release": 2019, "score" : float, "status" : "Completed", "type" : "Manhwa", }, }), ("https://hiperdex2.com/manga/youre-not-that-special/"), ("https://hiperdex.net/manga/youre-not-that-special/"), ("https://hiperdex.info/manga/youre-not-that-special/"), ) def __init__(self, match): root, path, self.manga = match.groups() self.root = text.ensure_http_scheme(root) MangaExtractor.__init__(self, match, self.root + path + "/") def chapters(self, page): self.manga_data(self.manga, page) results = [] shortlink = text.extract(page, "rel='shortlink' href='", "'")[0] data = { "action" : "manga_get_reading_nav", "manga" : shortlink.rpartition("=")[2], "chapter" : "", "volume_id": "", "style" : "list", "type" : "manga", } url = self.root + "/wp-admin/admin-ajax.php" page = self.request(url, method="POST", data=data).text for url in text.extract_iter(page, 'data-redirect="', '"'): chapter = url.rpartition("/")[2] results.append((url, self.chapter_data(chapter))) return results class HiperdexArtistExtractor(HiperdexBase, MangaExtractor): """Extractor for an artists's manga on hiperdex.com""" subcategory = "artist" categorytransfer = False chapterclass = HiperdexMangaExtractor reverse = False pattern = BASE_PATTERN + r"(/manga-a(?:rtist|uthor)/(?:[^/?#]+))" test = ( ("https://hiperdex.net/manga-artist/beck-ho-an/"), ("https://hiperdex2.com/manga-artist/beck-ho-an/"), ("https://hiperdex.info/manga-artist/beck-ho-an/"), ("https://hiperdex.com/manga-author/viagra/", { "pattern": HiperdexMangaExtractor.pattern, "count": ">= 6", }), ) def __init__(self, match): self.root = text.ensure_http_scheme(match.group(1)) MangaExtractor.__init__(self, match, self.root + match.group(2) + "/") def chapters(self, page): 
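        # Comment added for clarity (not in the original source): an artist /
        # author listing only contains links to manga, so this method simply
        # collects the URLs from the 'manga-item' entries and returns them
        # with empty metadata; chapterclass = HiperdexMangaExtractor (declared
        # above) then expands each returned URL into its chapters.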
results = [] for info in text.extract_iter(page, 'id="manga-item-', '<img'): url = text.extract(info, 'href="', '"')[0] results.append((url, {})) return results gallery_dl-1.21.1/gallery_dl/extractor/hitomi.py # -*- coding: utf-8 -*- # Copyright 2015-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hitomi.la/""" from .common import GalleryExtractor, Extractor, Message from .nozomi import decode_nozomi from ..cache import memcache from ..
import text, util import string import json import re class HitomiGalleryExtractor(GalleryExtractor): """Extractor for image galleries from hitomi.la""" category = "hitomi" root = "https://hitomi.la" pattern = (r"(?:https?://)?hitomi\.la" r"/(?:manga|doujinshi|cg|gamecg|galleries|reader)" r"/(?:[^/?#]+-)?(\d+)") test = ( ("https://hitomi.la/galleries/867789.html", { "pattern": r"https://[a-c]a\.hitomi\.la/webp/\d+/\d+" r"/[0-9a-f]{64}\.webp", "keyword": "86af5371f38117a07407f11af689bdd460b09710", "count": 16, }), # download test ("https://hitomi.la/galleries/1401410.html", { "range": "1", "content": "d75d5a3d1302a48469016b20e53c26b714d17745", }), # Game CG with scenes (#321) ("https://hitomi.la/galleries/733697.html", { "count": 210, }), # fallback for galleries only available through /reader/ URLs ("https://hitomi.la/galleries/1045954.html", { "count": 1413, }), # gallery with "broken" redirect ("https://hitomi.la/cg/scathacha-sama-okuchi-ecchi-1291900.html", { "count": 10, "options": (("format", "original"),), "pattern": r"https://[a-c]b\.hitomi\.la/images/\d+/\d+" r"/[0-9a-f]{64}\.jpg", }), # no tags ("https://hitomi.la/cg/1615823.html", { "count": 22, "options": (("format", "avif"),), "pattern": r"https://[a-c]a\.hitomi\.la/avif/\d+/\d+" r"/[0-9a-f]{64}\.avif", }), ("https://hitomi.la/manga/amazon-no-hiyaku-867789.html"), ("https://hitomi.la/manga/867789.html"), ("https://hitomi.la/doujinshi/867789.html"), ("https://hitomi.la/cg/867789.html"), ("https://hitomi.la/gamecg/867789.html"), ("https://hitomi.la/reader/867789.html"), ) def __init__(self, match): gid = match.group(1) url = "https://ltn.hitomi.la/galleries/{}.js".format(gid) GalleryExtractor.__init__(self, match, url) self.info = None self.session.headers["Referer"] = "{}/reader/{}.html".format( self.root, gid) def metadata(self, page): self.info = info = json.loads(page.partition("=")[2]) iget = info.get language = iget("language") if language: language = language.capitalize() date = iget("date") if date: date += ":00" tags = [] for tinfo in iget("tags") or (): tag = string.capwords(tinfo["tag"]) if tinfo.get("female"): tag += " ♀" elif tinfo.get("male"): tag += " ♂" tags.append(tag) return { "gallery_id": text.parse_int(info["id"]), "title" : info["title"], "type" : info["type"].capitalize(), "language" : language, "lang" : util.language_to_code(language), "date" : text.parse_datetime(date, "%Y-%m-%d %H:%M:%S%z"), "tags" : tags, "artist" : [o["artist"] for o in iget("artists") or ()], "group" : [o["group"] for o in iget("groups") or ()], "parody" : [o["parody"] for o in iget("parodys") or ()], "characters": [o["character"] for o in iget("characters") or ()] } def images(self, _): # see https://ltn.hitomi.la/gg.js gg_m, gg_b, gg_default = _parse_gg(self) fmt = self.config("format") or "webp" if fmt == "original": subdomain, fmt, ext = "b", "images", None else: subdomain, ext = "a", fmt result = [] for image in self.info["files"]: ihash = image["hash"] idata = text.nameext_from_url(image["name"]) if ext: idata["extension"] = ext # see https://ltn.hitomi.la/common.js inum = int(ihash[-1] + ihash[-3:-1], 16) url = "https://{}{}.hitomi.la/{}/{}/{}/{}.{}".format( chr(97 + gg_m.get(inum, gg_default)), subdomain, fmt, gg_b, inum, ihash, idata["extension"], ) result.append((url, idata)) return result class HitomiTagExtractor(Extractor): """Extractor for galleries from tag searches on hitomi.la""" category = "hitomi" subcategory = "tag" root = "https://hitomi.la" pattern = (r"(?:https?://)?hitomi\.la/" 
r"(tag|artist|group|series|type|character)/" r"([^/?#]+)\.html") test = ( ("https://hitomi.la/tag/screenshots-japanese.html", { "pattern": HitomiGalleryExtractor.pattern, "count": ">= 35", }), ("https://hitomi.la/artist/a1-all-1.html"), ("https://hitomi.la/group/initial%2Dg-all-1.html"), ("https://hitomi.la/series/amnesia-all-1.html"), ("https://hitomi.la/type/doujinshi-all-1.html"), ("https://hitomi.la/character/a2-all-1.html"), ) def __init__(self, match): Extractor.__init__(self, match) self.type, self.tag = match.groups() tag, _, num = self.tag.rpartition("-") if num.isdecimal(): self.tag = tag def items(self): data = {"_extractor": HitomiGalleryExtractor} nozomi_url = "https://ltn.hitomi.la/{}/{}.nozomi".format( self.type, self.tag) headers = { "Origin": self.root, "Cache-Control": "max-age=0", } offset = 0 while True: headers["Referer"] = "{}/{}/{}.html?page={}".format( self.root, self.type, self.tag, offset // 100 + 1) headers["Range"] = "bytes={}-{}".format(offset, offset+99) nozomi = self.request(nozomi_url, headers=headers).content for gallery_id in decode_nozomi(nozomi): gallery_url = "{}/galleries/{}.html".format( self.root, gallery_id) yield Message.Queue, gallery_url, data if len(nozomi) < 100: return offset += 100 @memcache() def _parse_gg(extr): page = extr.request("https://ltn.hitomi.la/gg.js").text m = {} keys = [] for match in re.finditer( r"case\s+(\d+):(?:\s*o\s*=\s*(\d+))?", page): key, value = match.groups() keys.append(int(key)) if value: value = int(value) for key in keys: m[key] = value keys.clear() for match in re.finditer( r"if\s+\(g\s*===?\s*(\d+)\)[\s{]*o\s*=\s*(\d+)", page): m[int(match.group(1))] = int(match.group(2)) d = re.search(r"(?:var\s|default:)\s*o\s*=\s*(\d+)", page) b = re.search(r"b:\s*[\"'](.+)[\"']", page) return m, b.group(1).strip("/"), int(d.group(1)) if d else 1 ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/idolcomplex.py�����������������������������������������������0000644�0001750�0001750�00000022302�14176336637�021447� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://idol.sankakucomplex.com/""" from .sankaku import SankakuExtractor from .common import Message from ..cache import cache from .. import text, util, exception import collections import re class IdolcomplexExtractor(SankakuExtractor): """Base class for idolcomplex extractors""" category = "idolcomplex" cookienames = ("login", "pass_hash") cookiedomain = "idol.sankakucomplex.com" root = "https://" + cookiedomain request_interval = 5.0 def __init__(self, match): SankakuExtractor.__init__(self, match) self.logged_in = True self.start_page = 1 self.start_post = 0 self.extags = self.config("tags", False) def items(self): self.login() data = self.metadata() for post_id in util.advance(self.post_ids(), self.start_post): post = self._parse_post(post_id) url = post["file_url"] post.update(data) text.nameext_from_url(url, post) yield Message.Directory, post yield Message.Url, url, post def skip(self, num): self.start_post += num return num def post_ids(self): """Return an iterable containing all relevant post ids""" def login(self): if self._check_cookies(self.cookienames): return username, password = self._get_auth_info() if username: cookies = self._login_impl(username, password) self._update_cookies(cookies) else: self.logged_in = False @cache(maxage=90*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/user/authenticate" data = { "url" : "", "user[name]" : username, "user[password]": password, "commit" : "Login", } response = self.request(url, method="POST", data=data) if not response.history or response.url != self.root + "/user/home": raise exception.AuthenticationError() cookies = response.history[0].cookies return {c: cookies[c] for c in self.cookienames} def _parse_post(self, post_id): """Extract metadata of a single post""" url = self.root + "/post/show/" + post_id page = self.request(url, retries=10).text extr = text.extract tags , pos = extr(page, "<title>", " | ") vavg , pos = extr(page, "itemprop=ratingValue>", "<", pos) vcnt , pos = extr(page, "itemprop=reviewCount>", "<", pos) _ , pos = extr(page, "Posted: <", "", pos) created, pos = extr(page, ' title="', '"', pos) rating = extr(page, "<li>Rating: ", "<", pos)[0] file_url, pos = extr(page, '<li>Original: <a href="', '"', pos) if file_url: width , pos = extr(page, '>', 'x', pos) height, pos = extr(page, '', ' ', pos) else: width , pos = extr(page, '<object width=', ' ', pos) height, pos = extr(page, 'height=', '>', pos) file_url = extr(page, '<embed src="', '"', pos)[0] data = { "id": text.parse_int(post_id), "md5": file_url.rpartition("/")[2].partition(".")[0], "tags": text.unescape(tags), "vote_average": text.parse_float(vavg), "vote_count": text.parse_int(vcnt), "created_at": created, "rating": (rating or "?")[0].lower(), "file_url": "https:" + text.unescape(file_url), "width": text.parse_int(width), "height": text.parse_int(height), } if 
self.extags: tags = collections.defaultdict(list) tags_html = text.extract(page, '<ul id=tag-sidebar>', '</ul>')[0] pattern = re.compile(r'tag-type-([^>]+)><a href="/\?tags=([^"]+)') for tag_type, tag_name in pattern.findall(tags_html or ""): tags[tag_type].append(text.unquote(tag_name)) for key, value in tags.items(): data["tags_" + key] = " ".join(value) return data class IdolcomplexTagExtractor(IdolcomplexExtractor): """Extractor for images from idol.sankakucomplex.com by search-tags""" subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = r"(?:https?://)?idol\.sankakucomplex\.com/\?([^#]*)" test = ( ("https://idol.sankakucomplex.com/?tags=lyumos", { "count": 5, "range": "18-22", "pattern": r"https://is\.sankakucomplex\.com/data/[^/]{2}/[^/]{2}" r"/[^/]{32}\.\w+\?e=\d+&m=[^&#]+", }), ("https://idol.sankakucomplex.com/?tags=order:favcount", { "count": 5, "range": "18-22", }), ("https://idol.sankakucomplex.com" "/?tags=lyumos+wreath&page=3&next=694215"), ) per_page = 20 def __init__(self, match): IdolcomplexExtractor.__init__(self, match) query = text.parse_query(match.group(1)) self.tags = text.unquote(query.get("tags", "").replace("+", " ")) self.start_page = text.parse_int(query.get("page"), 1) self.next = text.parse_int(query.get("next"), 0) def skip(self, num): if self.next: self.start_post += num else: pages, posts = divmod(num, self.per_page) self.start_page += pages self.start_post += posts return num def metadata(self): if not self.next: max_page = 50 if self.logged_in else 25 if self.start_page > max_page: self.log.info("Traversing from page %d to page %d", max_page, self.start_page) self.start_post += self.per_page * (self.start_page - max_page) self.start_page = max_page tags = self.tags.split() if not self.logged_in and len(tags) > 4: raise exception.StopExtraction( "Non-members can only search up to 4 tags at once") return {"search_tags": " ".join(tags)} def post_ids(self): params = {"tags": self.tags} if self.next: params["next"] = self.next else: params["page"] = self.start_page while True: page = self.request(self.root, params=params, retries=10).text pos = page.find("<div id=more-popular-posts-link>") + 1 yield from text.extract_iter(page, '" id=p', '>', pos) next_url = text.extract(page, 'next-page-url="', '"', pos)[0] if not next_url: return next_params = text.parse_query(text.unescape( next_url).lstrip("?/")) if "next" in next_params: # stop if the same "next" value occurs twice in a row (#265) if "next" in params and params["next"] == next_params["next"]: return next_params["page"] = "2" params = next_params class IdolcomplexPoolExtractor(IdolcomplexExtractor): """Extractor for image-pools from idol.sankakucomplex.com""" subcategory = "pool" directory_fmt = ("{category}", "pool", "{pool}") archive_fmt = "p_{pool}_{id}" pattern = r"(?:https?://)?idol\.sankakucomplex\.com/pool/show/(\d+)" test = ("https://idol.sankakucomplex.com/pool/show/145", { "count": 3, }) per_page = 24 def __init__(self, match): IdolcomplexExtractor.__init__(self, match) self.pool_id = match.group(1) def skip(self, num): pages, posts = divmod(num, self.per_page) self.start_page += pages self.start_post += posts return num def metadata(self): return {"pool": self.pool_id} def post_ids(self): url = self.root + "/pool/show/" + self.pool_id params = {"page": self.start_page} while True: page = self.request(url, params=params, retries=10).text ids = list(text.extract_iter(page, '" id=p', '>')) yield from ids if len(ids) < self.per_page: 
return params["page"] += 1 class IdolcomplexPostExtractor(IdolcomplexExtractor): """Extractor for single images from idol.sankakucomplex.com""" subcategory = "post" archive_fmt = "{id}" pattern = r"(?:https?://)?idol\.sankakucomplex\.com/post/show/(\d+)" test = ("https://idol.sankakucomplex.com/post/show/694215", { "content": "694ec2491240787d75bf5d0c75d0082b53a85afd", "options": (("tags", True),), "keyword": { "tags_character": "shani_(the_witcher)", "tags_copyright": "the_witcher", "tags_idol": str, "tags_medium": str, "tags_general": str, }, }) def __init__(self, match): IdolcomplexExtractor.__init__(self, match) self.post_id = match.group(1) def post_ids(self): return (self.post_id,) ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1648567962.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/imagebam.py��������������������������������������������������0000644�0001750�0001750�00000011621�14220623232�020650� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2014-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.imagebam.com/""" from .common import Extractor, Message from .. 
import text, exception import re class ImagebamExtractor(Extractor): """Base class for imagebam extractors""" category = "imagebam" root = "https://www.imagebam.com" def __init__(self, match): Extractor.__init__(self, match) self.path = match.group(1) self.session.cookies.set("nsfw_inter", "1", domain="www.imagebam.com") def _parse_image_page(self, path): page = self.request(self.root + path).text url, pos = text.extract(page, '<img src="https://images', '"') filename = text.unescape(text.extract(page, 'alt="', '"', pos)[0]) data = { "url" : "https://images" + url, "image_key": path.rpartition("/")[2], } data["filename"], _, data["extension"] = filename.rpartition(".") return data class ImagebamGalleryExtractor(ImagebamExtractor): """Extractor for imagebam galleries""" subcategory = "gallery" directory_fmt = ("{category}", "{title} {gallery_key}") filename_fmt = "{num:>03} {filename}.{extension}" archive_fmt = "{gallery_key}_{image_key}" pattern = (r"(?:https?://)?(?:www\.)?imagebam\.com" r"(/(?:gallery/|view/G)[a-zA-Z0-9]+)") test = ( ("https://www.imagebam.com/gallery/adz2y0f9574bjpmonaismyrhtjgvey4o", { "url": "76d976788ae2757ac81694736b07b72356f5c4c8", "keyword": "b048478b1bbba3072a7fa9fcc40630b3efad1f6c", "content": "596e6bfa157f2c7169805d50075c2986549973a8", }), ("http://www.imagebam.com/gallery/op9dwcklwdrrguibnkoe7jxgvig30o5p", { # more than 100 images; see issue #219 "count": 107, "url": "32ae6fe5dc3e4ca73ff6252e522d16473595d1d1", }), ("http://www.imagebam.com/gallery/gsl8teckymt4vbvx1stjkyk37j70va2c", { "exception": exception.HttpError, }), # /view/ path (#2378) ("https://www.imagebam.com/view/GA3MT1", { "url": "35018ce1e00a2d2825a33d3cd37857edaf804919", "keyword": "3a9f98178f73694c527890c0d7ca9a92b46987ba", }), ) def items(self): page = self.request(self.root + self.path).text images = self.images(page) images.reverse() data = self.metadata(page) data["count"] = len(images) data["gallery_key"] = self.path.rpartition("/")[2] yield Message.Directory, data for data["num"], path in enumerate(images, 1): image = self._parse_image_page(path) image.update(data) yield Message.Url, image["url"], image @staticmethod def metadata(page): return {"title": text.unescape(text.extract( page, 'id="gallery-name">', '<')[0].strip())} def images(self, page): findall = re.compile(r'<a href="https://www\.imagebam\.com' r'(/(?:image/|view/M)[a-zA-Z0-9]+)').findall paths = [] while True: paths += findall(page) pos = page.find('rel="next" aria-label="Next') if pos > 0: url = text.rextract(page, 'href="', '"', pos)[0] if url: page = self.request(url).text continue return paths class ImagebamImageExtractor(ImagebamExtractor): """Extractor for single imagebam images""" subcategory = "image" archive_fmt = "{image_key}" pattern = (r"(?:https?://)?(?:\w+\.)?imagebam\.com" r"(/(?:image/|view/M|(?:[0-9a-f]{2}/){3})[a-zA-Z0-9]+)") test = ( ("https://www.imagebam.com/image/94d56c502511890", { "url": "5e9ba3b1451f8ded0ae3a1b84402888893915d4a", "keyword": "2a4380d4b57554ff793898c2d6ec60987c86d1a1", "content": "0c8768055e4e20e7c7259608b67799171b691140", }), ("http://images3.imagebam.com/1d/8c/44/94d56c502511890.png"), # NSFW (#1534) ("https://www.imagebam.com/image/0850951366904951", { "url": "d37297b17ed1615b4311c8ed511e50ce46e4c748", }), # /view/ path (#2378) ("https://www.imagebam.com/view/ME8JOQP", { "url": "4dca72bbe61a0360185cf4ab2bed8265b49565b8", "keyword": "15a494c02fd30846b41b42a26117aedde30e4ceb", "content": "f81008666b17a42d8834c4749b910e1dc10a6e83", }), ) def items(self): path = self.path if path[3] == 
"/": path = ("/view/" if path[10] == "M" else "/image/") + path[10:] image = self._parse_image_page(path) yield Message.Directory, image yield Message.Url, image["url"], image ���������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1639190302.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/imagechest.py������������������������������������������������0000644�0001750�0001750�00000003043�14155007436�021227� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2020 Leonid "Bepis" Pavel # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract images from galleries at https://imgchest.com/""" from .common import GalleryExtractor from .. import text, exception class ImagechestGalleryExtractor(GalleryExtractor): """Extractor for image galleries from imgchest.com""" category = "imagechest" root = "https://imgchest.com" pattern = r"(?:https?://)?(?:www\.)?imgchest\.com/p/([A-Za-z0-9]{11})" test = ( ("https://imgchest.com/p/3na7kr3by8d", { "url": "f095b4f78c051e5a94e7c663814d1e8d4c93c1f7", "content": "076959e65be30249a2c651fbe6090dc30ba85193", "count": 3 }), ) def __init__(self, match): self.gallery_id = match.group(1) url = self.root + "/p/" + self.gallery_id GalleryExtractor.__init__(self, match, url) def metadata(self, page): if "Sorry, but the page you requested could not be found." 
        if "Sorry, but the page you requested could not be found." in page:
            raise exception.NotFoundError("gallery")

        return {
            "gallery_id": self.gallery_id,
            "title": text.unescape(text.extract(
                page, 'property="og:title" content="', '"')[0].strip())
        }

    def images(self, page):
        return [
            (url, None)
            for url in text.extract_iter(
                page, 'property="og:image" content="', '"')
        ]

gallery_dl-1.21.1/gallery_dl/extractor/imagefap.py

# -*- coding: utf-8 -*-

# Copyright 2016-2021 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://www.imagefap.com/"""

from .common import Extractor, Message
from ..
import text import json BASE_PATTERN = r"(?:https?://)?(?:www\.|beta\.)?imagefap\.com" class ImagefapExtractor(Extractor): """Base class for imagefap extractors""" category = "imagefap" directory_fmt = ("{category}", "{gallery_id} {title}") filename_fmt = "{category}_{gallery_id}_{filename}.{extension}" archive_fmt = "{gallery_id}_{image_id}" root = "https://www.imagefap.com" def __init__(self, match): Extractor.__init__(self, match) self.session.headers["Referer"] = self.root class ImagefapGalleryExtractor(ImagefapExtractor): """Extractor for image galleries from imagefap.com""" subcategory = "gallery" pattern = BASE_PATTERN + r"/(?:gallery\.php\?gid=|gallery/|pictures/)(\d+)" test = ( ("https://www.imagefap.com/pictures/7102714", { "pattern": r"https://cdnh\.imagefap\.com" r"/images/full/\d+/\d+/\d+\.jpg", "keyword": "2ba96e84c2952c4750e9fa94a3f2b1f965cec2f3", "content": "694a0a57385980a6f90fbc296cadcd6c11ba2dab", }), ("https://www.imagefap.com/gallery/5486966", { "pattern": r"https://cdnh\.imagefap\.com" r"/images/full/\d+/\d+/\d+\.jpg", "keyword": "3e24eace5b09639b881ebd393165862feb46adde", }), ("https://www.imagefap.com/gallery.php?gid=7102714"), ("https://beta.imagefap.com/gallery.php?gid=7102714"), ) def __init__(self, match): ImagefapExtractor.__init__(self, match) self.gid = match.group(1) self.image_id = "" def items(self): url = "{}/pictures/{}/".format(self.root, self.gid) page = self.request(url).text data = self.get_job_metadata(page) yield Message.Directory, data for url, image in self.get_images(): data.update(image) yield Message.Url, url, data def get_job_metadata(self, page): """Collect metadata for extractor-job""" descr, pos = text.extract( page, '<meta name="description" content="Browse ', '"') count, pos = text.extract(page, ' 1 of ', ' pics"', pos) self.image_id = text.extract(page, 'id="img_ed_', '"', pos)[0] title, _, descr = descr.partition(" porn picture gallery by ") uploader, _, tags = descr.partition(" to see hottest ") return { "gallery_id": text.parse_int(self.gid), "title": text.unescape(title), "uploader": uploader, "tags": tags[:-11].split(", "), "count": text.parse_int(count), } def get_images(self): """Collect image-urls and -metadata""" num = 0 url = "{}/photo/{}/".format(self.root, self.image_id) params = {"gid": self.gid, "idx": 0, "partial": "true"} while True: pos = 0 page = self.request(url, params=params).text for _ in range(24): imgurl, pos = text.extract(page, '<a href="', '"', pos) if not imgurl: return num += 1 data = text.nameext_from_url(imgurl) data["num"] = num data["image_id"] = text.parse_int(data["filename"]) yield imgurl, data params["idx"] += 24 class ImagefapImageExtractor(ImagefapExtractor): """Extractor for single images from imagefap.com""" subcategory = "image" pattern = BASE_PATTERN + r"/photo/(\d+)" test = ( ("https://www.imagefap.com/photo/1369341772/", { "pattern": r"https://cdnh\.imagefap\.com" r"/images/full/\d+/\d+/\d+\.jpg", "keyword": "8894e45f7262020d8d66ce59917315def1fc475b", }), ("https://beta.imagefap.com/photo/1369341772/"), ) def __init__(self, match): ImagefapExtractor.__init__(self, match) self.image_id = match.group(1) def items(self): url, data = self.get_image() yield Message.Directory, data yield Message.Url, url, data def get_image(self): url = "{}/photo/{}/".format(self.root, self.image_id) page = self.request(url).text info, pos = text.extract( page, '<script type="application/ld+json">', '</script>') image_id, pos = text.extract( page, 'id="imageid_input" value="', '"', pos) gallery_id, pos = 
text.extract( page, 'id="galleryid_input" value="', '"', pos) info = json.loads(info) url = info["contentUrl"] return url, text.nameext_from_url(url, { "title": text.unescape(info["name"]), "uploader": info["author"], "date": info["datePublished"], "width": text.parse_int(info["width"]), "height": text.parse_int(info["height"]), "gallery_id": text.parse_int(gallery_id), "image_id": text.parse_int(image_id), }) class ImagefapUserExtractor(ImagefapExtractor): """Extractor for all galleries from a user at imagefap.com""" subcategory = "user" categorytransfer = True pattern = (BASE_PATTERN + r"/(?:profile(?:\.php\?user=|/)([^/?#]+)" r"|usergallery\.php\?userid=(\d+))") test = ( ("https://www.imagefap.com/profile/LucyRae/galleries", { "url": "d941aa906f56a75972a7a5283030eb9a8d27a4fd", }), ("https://www.imagefap.com/usergallery.php?userid=1862791", { "url": "d941aa906f56a75972a7a5283030eb9a8d27a4fd", }), ("https://www.imagefap.com/profile.php?user=LucyRae"), ("https://beta.imagefap.com/profile.php?user=LucyRae"), ) def __init__(self, match): ImagefapExtractor.__init__(self, match) self.user, self.user_id = match.groups() def items(self): for gid, name in self.get_gallery_data(): url = "{}/gallery/{}".format(self.root, gid) data = { "gallery_id": text.parse_int(gid), "title": text.unescape(name), "_extractor": ImagefapGalleryExtractor, } yield Message.Queue, url, data def get_gallery_data(self): """Yield all gallery_ids of a specific user""" folders = self.get_gallery_folders() url = "{}/ajax_usergallery_folder.php".format(self.root) params = {"userid": self.user_id} for folder_id in folders: params["id"] = folder_id page = self.request(url, params=params).text pos = 0 while True: gid, pos = text.extract(page, '<a href="/gallery/', '"', pos) if not gid: break name, pos = text.extract(page, "<b>", "<", pos) yield gid, name def get_gallery_folders(self): """Create a list of all folder_ids of a specific user""" if self.user: url = "{}/profile/{}/galleries".format(self.root, self.user) else: url = "{}/usergallery.php?userid={}".format( self.root, self.user_id) page = self.request(url).text self.user_id, pos = text.extract(page, '?userid=', '"') folders, pos = text.extract(page, ' id="tgl_all" value="', '"', pos) return folders.split("|")[:-1] �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/imagehosts.py������������������������������������������������0000644�0001750�0001750�00000026073�14176336637�021304� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2016-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Collection of extractors for various imagehosts""" from .common import Extractor, Message from .. import text, exception from ..cache import memcache from os.path import splitext class ImagehostImageExtractor(Extractor): """Base class for single-image extractors for various imagehosts""" basecategory = "imagehost" subcategory = "image" archive_fmt = "{token}" https = True params = None cookies = None encoding = None def __init__(self, match): Extractor.__init__(self, match) self.page_url = "http{}://{}".format( "s" if self.https else "", match.group(1)) self.token = match.group(2) if self.params == "simple": self.params = { "imgContinue": "Continue+to+image+...+", } elif self.params == "complex": self.params = { "op": "view", "id": self.token, "pre": "1", "adb": "1", "next": "Continue+to+image+...+", } def items(self): page = self.request( self.page_url, method=("POST" if self.params else "GET"), data=self.params, cookies=self.cookies, encoding=self.encoding, ).text url, filename = self.get_info(page) data = text.nameext_from_url(filename, {"token": self.token}) if self.https and url.startswith("http:"): url = "https:" + url[5:] yield Message.Directory, data yield Message.Url, url, data def get_info(self, page): """Find image-url and string to get filename from""" class ImxtoImageExtractor(ImagehostImageExtractor): """Extractor for single images from imx.to""" category = "imxto" pattern = (r"(?:https?://)?(?:www\.)?((?:imx\.to|img\.yt)" r"/(?:i/|img-)(\w+)(\.html)?)") test = ( ("https://imx.to/i/1qdeva", { # new-style URL "url": "ab2173088a6cdef631d7a47dec4a5da1c6a00130", "keyword": "1153a986c939d7aed599905588f5c940048bc517", "content": "0c8768055e4e20e7c7259608b67799171b691140", }), ("https://imx.to/img-57a2050547b97.html", { # old-style URL "url": "a83fe6ef1909a318c4d49fcf2caf62f36c3f9204", "keyword": "fd2240aee77a21b8252d5b829a1f7e542f927f09", "content": "54592f2635674c25677c6872db3709d343cdf92f", }), ("https://img.yt/img-57a2050547b97.html", { # img.yt domain "url": "a83fe6ef1909a318c4d49fcf2caf62f36c3f9204", }), ("https://imx.to/img-57a2050547b98.html", { "exception": exception.NotFoundError, }), ) params = "simple" encoding = "utf-8" def __init__(self, match): ImagehostImageExtractor.__init__(self, match) if "/img-" in self.page_url: self.page_url = 
self.page_url.replace("img.yt", "imx.to") self.url_ext = True else: self.url_ext = False def get_info(self, page): url, pos = text.extract( page, '<div style="text-align:center;"><a href="', '"') if not url: raise exception.NotFoundError("image") filename, pos = text.extract(page, ' title="', '"', pos) if self.url_ext and filename: filename += splitext(url)[1] return url, filename or url class AcidimgImageExtractor(ImagehostImageExtractor): """Extractor for single images from acidimg.cc""" category = "acidimg" pattern = r"(?:https?://)?((?:www\.)?acidimg\.cc/img-([a-z0-9]+)\.html)" test = ("https://acidimg.cc/img-5acb6b9de4640.html", { "url": "f132a630006e8d84f52d59555191ed82b3b64c04", "keyword": "a8bb9ab8b2f6844071945d31f8c6e04724051f37", "content": "0c8768055e4e20e7c7259608b67799171b691140", }) params = "simple" encoding = "utf-8" def get_info(self, page): url, pos = text.extract(page, "<img class='centred' src='", "'") if not url: raise exception.NotFoundError("image") filename, pos = text.extract(page, " alt='", "'", pos) return url, (filename + splitext(url)[1]) if filename else url class ImagevenueImageExtractor(ImagehostImageExtractor): """Extractor for single images from imagevenue.com""" category = "imagevenue" pattern = (r"(?:https?://)?((?:www|img\d+)\.imagevenue\.com" r"/([A-Z0-9]{8,10}|view/.*|img\.php\?.*))") test = ( ("https://www.imagevenue.com/ME13LS07", { "pattern": r"https://cdn-images\.imagevenue\.com" r"/10/ac/05/ME13LS07_o\.png", "keyword": "ae15d6e3b2095f019eee84cd896700cd34b09c36", "content": "cfaa8def53ed1a575e0c665c9d6d8cf2aac7a0ee", }), (("https://www.imagevenue.com/view/o?i=92518_13732377" "annakarina424200712535AM_122_486lo.jpg&h=img150&l=loc486"), { "url": "8bf0254e29250d8f5026c0105bbdda3ee3d84980", }), (("http://img28116.imagevenue.com/img.php" "?image=th_52709_test_122_64lo.jpg"), { "url": "f98e3091df7f48a05fb60fbd86f789fc5ec56331", }), ) def get_info(self, page): pos = page.index('class="card-body') url, pos = text.extract(page, '<img src="', '"', pos) filename, pos = text.extract(page, 'alt="', '"', pos) return url, text.unescape(filename) class ImagetwistImageExtractor(ImagehostImageExtractor): """Extractor for single images from imagetwist.com""" category = "imagetwist" pattern = r"(?:https?://)?((?:www\.)?imagetwist\.com/([a-z0-9]{12}))" test = ("https://imagetwist.com/f1i2s4vhvbrq/test.png", { "url": "8d5e168c0bee30211f821c6f3b2116e419d42671", "keyword": "d1060a4c2e3b73b83044e20681712c0ffdd6cfef", "content": "0c8768055e4e20e7c7259608b67799171b691140", }) @property @memcache(maxage=3*3600) def cookies(self): return self.request(self.page_url).cookies def get_info(self, page): url , pos = text.extract(page, 'center;"><img src="', '"') filename, pos = text.extract(page, ' alt="', '"', pos) return url, filename class ImgspiceImageExtractor(ImagehostImageExtractor): """Extractor for single images from imgspice.com""" category = "imgspice" pattern = r"(?:https?://)?((?:www\.)?imgspice\.com/([^/?#]+))" test = ("https://imgspice.com/nwfwtpyog50y/test.png.html", { "url": "b8c30a8f51ee1012959a4cfd46197fabf14de984", "keyword": "100e310a19a2fa22d87e1bbc427ecb9f6501e0c0", "content": "0c8768055e4e20e7c7259608b67799171b691140", }) def get_info(self, page): pos = page.find('id="imgpreview"') if pos < 0: raise exception.NotFoundError("image") url , pos = text.extract(page, 'src="', '"', pos) name, pos = text.extract(page, 'alt="', '"', pos) return url, text.unescape(name) class PixhostImageExtractor(ImagehostImageExtractor): """Extractor for single images from 
pixhost.to""" category = "pixhost" pattern = (r"(?:https?://)?((?:www\.)?pixhost\.(?:to|org)" r"/show/\d+/(\d+)_[^/?#]+)") test = ("http://pixhost.to/show/190/130327671_test-.png", { "url": "4e5470dcf6513944773044d40d883221bbc46cff", "keyword": "3bad6d59db42a5ebbd7842c2307e1c3ebd35e6b0", "content": "0c8768055e4e20e7c7259608b67799171b691140", }) cookies = {"pixhostads": "1", "pixhosttest": "1"} def get_info(self, page): url , pos = text.extract(page, "class=\"image-img\" src=\"", "\"") filename, pos = text.extract(page, "alt=\"", "\"", pos) return url, filename class PostimgImageExtractor(ImagehostImageExtractor): """Extractor for single images from postimages.org""" category = "postimg" pattern = (r"(?:https?://)?((?:www\.)?(?:postimg|pixxxels)\.(?:cc|org)" r"/(?:image/)?([^/?#]+)/?)") test = ("https://postimg.cc/Wtn2b3hC", { "url": "0794cfda9b8951a8ac3aa692472484200254ab86", "keyword": "2d05808d04e4e83e33200db83521af06e3147a84", "content": "cfaa8def53ed1a575e0c665c9d6d8cf2aac7a0ee", }) def get_info(self, page): url , pos = text.extract(page, 'id="main-image" src="', '"') filename, pos = text.extract(page, 'class="imagename">', '<', pos) return url, text.unescape(filename) class TurboimagehostImageExtractor(ImagehostImageExtractor): """Extractor for single images from www.turboimagehost.com""" category = "turboimagehost" pattern = (r"(?:https?://)?((?:www\.)?turboimagehost\.com" r"/p/(\d+)/[^/?#]+\.html)") test = ("https://www.turboimagehost.com/p/39078423/test--.png.html", { "url": "b94de43612318771ced924cb5085976f13b3b90e", "keyword": "704757ca8825f51cec516ec44c1e627c1f2058ca", "content": "0c8768055e4e20e7c7259608b67799171b691140", }) def get_info(self, page): url = text.extract(page, 'src="', '"', page.index("<img "))[0] return url, url class ViprImageExtractor(ImagehostImageExtractor): """Extractor for single images from vipr.im""" category = "vipr" pattern = r"(?:https?://)?(vipr\.im/(\w+))" test = ("https://vipr.im/kcd5jcuhgs3v.html", { "url": "88f6a3ecbf3356a11ae0868b518c60800e070202", "keyword": "c432e8a1836b0d97045195b745731c2b1bb0e771", }) def get_info(self, page): url = text.extract(page, '<img src="', '"')[0] return url, url class ImgclickImageExtractor(ImagehostImageExtractor): """Extractor for single images from imgclick.net""" category = "imgclick" pattern = r"(?:https?://)?((?:www\.)?imgclick\.net/([^/?#]+))" test = ("http://imgclick.net/4tbrre1oxew9/test-_-_.png.html", { "url": "140dcb250a325f2d26b2d918c18b8ac6a2a0f6ab", "keyword": "6895256143eab955622fc149aa367777a8815ba3", "content": "0c8768055e4e20e7c7259608b67799171b691140", }) https = False params = "complex" def get_info(self, page): url , pos = text.extract(page, '<br><img src="', '"') filename, pos = text.extract(page, 'alt="', '"', pos) return url, filename class FappicImageExtractor(ImagehostImageExtractor): """Extractor for single images from fappic.com""" category = "fappic" pattern = r"(?:https?://)?((?:www\.)?fappic\.com/(\w+)/[^/?#]+)" test = ("https://www.fappic.com/98wxqcklyh8k/test.png", { "pattern": r"https://img\d+\.fappic\.com/img/\w+/test\.png", "keyword": "433b1d310b0ff12ad8a71ac7b9d8ba3f8cd1e898", "content": "0c8768055e4e20e7c7259608b67799171b691140", }) def get_info(self, page): url , pos = text.extract(page, '<a href="/?click"><img src="', '"') filename, pos = text.extract(page, 'alt="', '"', pos) if filename.startswith("Porn-Picture-"): filename = filename[13:] return url, filename 
gallery_dl-1.21.1/gallery_dl/extractor/imgbb.py

# -*- coding: utf-8 -*-

# Copyright 2019 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://imgbb.com/"""

from .common import Extractor, Message
from ..
import text, exception from ..cache import cache import json class ImgbbExtractor(Extractor): """Base class for imgbb extractors""" category = "imgbb" directory_fmt = ("{category}", "{user}") filename_fmt = "{title} {id}.{extension}" archive_fmt = "{id}" root = "https://imgbb.com" def __init__(self, match): Extractor.__init__(self, match) self.page_url = self.sort = None def items(self): self.login() url = self.page_url params = {"sort": self.sort} while True: response = self.request(url, params=params, allow_redirects=False) if response.status_code < 300: break url = response.headers["location"] if url.startswith(self.root): raise exception.NotFoundError(self.subcategory) page = response.text data = self.metadata(page) first = True for img in self.images(page): image = { "id" : img["url_viewer"].rpartition("/")[2], "user" : img["user"]["username"] if "user" in img else "", "title" : text.unescape(img["title"]), "url" : img["image"]["url"], "extension": img["image"]["extension"], "size" : text.parse_int(img["image"]["size"]), "width" : text.parse_int(img["width"]), "height" : text.parse_int(img["height"]), } image.update(data) if first: first = False yield Message.Directory, data yield Message.Url, image["url"], image def login(self): username, password = self._get_auth_info() if username: self._update_cookies(self._login_impl(username, password)) @cache(maxage=360*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/login" page = self.request(url).text token = text.extract(page, 'PF.obj.config.auth_token="', '"')[0] headers = {"Referer": url} data = { "auth_token" : token, "login-subject": username, "password" : password, } response = self.request(url, method="POST", headers=headers, data=data) if not response.history: raise exception.AuthenticationError() return self.session.cookies def _pagination(self, page, endpoint, params): data = None seek, pos = text.extract(page, 'data-seek="', '"') tokn, pos = text.extract(page, 'PF.obj.config.auth_token="', '"', pos) params["action"] = "list" params["list"] = "images" params["sort"] = self.sort params["seek"] = seek params["page"] = 2 params["auth_token"] = tokn while True: for img in text.extract_iter(page, "data-object='", "'"): yield json.loads(text.unquote(img)) if data: if params["seek"] == data["seekEnd"]: return params["seek"] = data["seekEnd"] params["page"] += 1 elif not seek or 'class="pagination-next"' not in page: return data = self.request(endpoint, method="POST", data=params).json() page = data["html"] class ImgbbAlbumExtractor(ImgbbExtractor): """Extractor for albums on imgbb.com""" subcategory = "album" directory_fmt = ("{category}", "{user}", "{album_name} {album_id}") pattern = r"(?:https?://)?ibb\.co/album/([^/?#]+)/?(?:\?([^#]+))?" 
test = ( ("https://ibb.co/album/i5PggF", { "range": "1-80", "url": "70afec9fcc3a6de62a6b644b487d892d8d47cf1a", "keyword": "569e1d88ebdd27655387559cdf1cd526a3e1ab69", }), ("https://ibb.co/album/i5PggF?sort=title_asc", { "range": "1-80", "url": "afdf5fc95d8e09d77e8f44312f3e9b843987bb5a", "keyword": "f090e14d0e5f7868595082b2c95da1309c84872d", }), # no user data (#471) ("https://ibb.co/album/kYKpwF", { "url": "ac0abcfcb89f4df6adc2f7e4ff872f3b03ef1bc7", "keyword": {"user": ""}, }), # private ("https://ibb.co/album/hqgWrF", { "exception": exception.HttpError, }), ) def __init__(self, match): ImgbbExtractor.__init__(self, match) self.album_name = None self.album_id = match.group(1) self.sort = text.parse_query(match.group(2)).get("sort", "date_desc") self.page_url = "https://ibb.co/album/" + self.album_id def metadata(self, page): album, pos = text.extract(page, '"og:title" content="', '"') user , pos = text.extract(page, 'rel="author">', '<', pos) return { "album_id" : self.album_id, "album_name": text.unescape(album), "user" : user.lower() if user else "", } def images(self, page): url = text.extract(page, '"og:url" content="', '"')[0] album_id = url.rpartition("/")[2].partition("?")[0] return self._pagination(page, "https://ibb.co/json", { "from" : "album", "albumid" : album_id, "params_hidden[list]" : "images", "params_hidden[from]" : "album", "params_hidden[albumid]": album_id, }) class ImgbbUserExtractor(ImgbbExtractor): """Extractor for user profiles in imgbb.com""" subcategory = "user" pattern = r"(?:https?://)?([\w-]+)\.imgbb\.com/?(?:\?([^#]+))?$" test = ("https://folkie.imgbb.com", { "range": "1-80", "pattern": r"https?://i\.ibb\.co/\w+/[^/?#]+", }) def __init__(self, match): ImgbbExtractor.__init__(self, match) self.user = match.group(1) self.sort = text.parse_query(match.group(2)).get("sort", "date_desc") self.page_url = "https://{}.imgbb.com/".format(self.user) def metadata(self, page): return {"user": self.user} def images(self, page): user = text.extract(page, '.obj.resource={"id":"', '"')[0] return self._pagination(page, self.page_url + "json", { "from" : "user", "userid" : user, "params_hidden[userid]": user, "params_hidden[from]" : "user", }) class ImgbbImageExtractor(ImgbbExtractor): subcategory = "image" pattern = r"(?:https?://)?ibb\.co/(?!album/)([^/?#]+)" test = ("https://ibb.co/fUqh5b", { "pattern": r"https://i\.ibb\.co/g3kvx80/Arundel-Ireeman-5\.jpg", "content": "c5a0965178a8b357acd8aa39660092918c63795e", "keyword": { "id" : "fUqh5b", "title" : "Arundel Ireeman 5", "url" : "https://i.ibb.co/g3kvx80/Arundel-Ireeman-5.jpg", "width" : 960, "height": 719, "user" : "folkie", "extension": "jpg", }, }) def __init__(self, match): ImgbbExtractor.__init__(self, match) self.image_id = match.group(1) def items(self): url = "https://ibb.co/" + self.image_id extr = text.extract_from(self.request(url).text) image = { "id" : self.image_id, "title" : text.unescape(extr('"og:title" content="', '"')), "url" : extr('"og:image" content="', '"'), "width" : text.parse_int(extr('"og:image:width" content="', '"')), "height": text.parse_int(extr('"og:image:height" content="', '"')), "user" : extr('rel="author">', '<').lower(), } image["extension"] = text.ext_from_url(image["url"]) yield Message.Directory, image yield Message.Url, image["url"], image 
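
ImgbbAlbumExtractor.__init__ above derives both the album id and the sort order from the two capture groups of its URL pattern. The short snippet below only exercises that regex against one of the URLs from the test data; it illustrates what the groups contain and is not part of the extractor itself.

import re

# Exercise ImgbbAlbumExtractor.pattern on a test URL from above; this only
# shows how the two capture groups are split up, it does not run the extractor.
pattern = r"(?:https?://)?ibb\.co/album/([^/?#]+)/?(?:\?([^#]+))?"
match = re.match(pattern, "https://ibb.co/album/i5PggF?sort=title_asc")
print(match.group(1))  # "i5PggF"          -> self.album_id
print(match.group(2))  # "sort=title_asc"  -> parsed for the "sort" parameter
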
gallery_dl-1.21.1/gallery_dl/extractor/imgbox.py

# -*- coding: utf-8 -*-

# Copyright 2014-2019 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extract images from galleries at https://imgbox.com/"""

from .common import Extractor, Message, AsynchronousMixin
from ..
import text, exception import re class ImgboxExtractor(Extractor): """Base class for imgbox extractors""" category = "imgbox" root = "https://imgbox.com" def items(self): data = self.get_job_metadata() yield Message.Directory, data for image_key in self.get_image_keys(): imgpage = self.request(self.root + "/" + image_key).text imgdata = self.get_image_metadata(imgpage) if imgdata["filename"]: imgdata.update(data) imgdata["image_key"] = image_key text.nameext_from_url(imgdata["filename"], imgdata) yield Message.Url, self.get_image_url(imgpage), imgdata @staticmethod def get_job_metadata(): """Collect metadata for extractor-job""" return {} @staticmethod def get_image_keys(): """Return an iterable containing all image-keys""" return [] @staticmethod def get_image_metadata(page): """Collect metadata for a downloadable file""" return text.extract_all(page, ( ("num" , '</a>   ', ' of '), (None , 'class="image-container"', ''), ("filename" , ' title="', '"'), ))[0] @staticmethod def get_image_url(page): """Extract download-url""" return text.extract(page, 'property="og:image" content="', '"')[0] class ImgboxGalleryExtractor(AsynchronousMixin, ImgboxExtractor): """Extractor for image galleries from imgbox.com""" subcategory = "gallery" directory_fmt = ("{category}", "{title} - {gallery_key}") filename_fmt = "{num:>03}-{filename}.{extension}" archive_fmt = "{gallery_key}_{image_key}" pattern = r"(?:https?://)?(?:www\.)?imgbox\.com/g/([A-Za-z0-9]{10})" test = ( ("https://imgbox.com/g/JaX5V5HX7g", { "url": "da4f15b161461119ee78841d4b8e8d054d95f906", "keyword": "4b1e62820ac2c6205b7ad0b6322cc8e00dbe1b0c", "content": "d20307dc8511ac24d688859c55abf2e2cc2dd3cc", }), ("https://imgbox.com/g/cUGEkRbdZZ", { "url": "76506a3aab175c456910851f66227e90484ca9f7", "keyword": "fb0427b87983197849fb2887905e758f3e50cb6e", }), ("https://imgbox.com/g/JaX5V5HX7h", { "exception": exception.NotFoundError, }), ) def __init__(self, match): ImgboxExtractor.__init__(self, match) self.gallery_key = match.group(1) self.image_keys = [] def get_job_metadata(self): page = self.request(self.root + "/g/" + self.gallery_key).text if "The specified gallery could not be found." 
in page:
            raise exception.NotFoundError("gallery")
        self.image_keys = re.findall(r'<a href="/([^"]+)"><img alt="', page)

        title = text.extract(page, "<h1>", "</h1>")[0]
        title, _, count = title.rpartition(" - ")
        return {
            "gallery_key": self.gallery_key,
            "title": text.unescape(title),
            "count": count[:-7],
        }

    def get_image_keys(self):
        return self.image_keys


class ImgboxImageExtractor(ImgboxExtractor):
    """Extractor for single images from imgbox.com"""
    subcategory = "image"
    archive_fmt = "{image_key}"
    pattern = r"(?:https?://)?(?:www\.)?imgbox\.com/([A-Za-z0-9]{8})"
    test = (
        ("https://imgbox.com/qHhw7lpG", {
            "url": "ee9cdea6c48ad0161c1b5f81f6b0c9110997038c",
            "keyword": "dfc72310026b45f3feb4f9cada20c79b2575e1af",
            "content": "0c8768055e4e20e7c7259608b67799171b691140",
        }),
        ("https://imgbox.com/qHhw7lpH", {
            "exception": exception.NotFoundError,
        }),
    )

    def __init__(self, match):
        ImgboxExtractor.__init__(self, match)
        self.image_key = match.group(1)

    def get_image_keys(self):
        return (self.image_key,)

    @staticmethod
    def get_image_metadata(page):
        data = ImgboxExtractor.get_image_metadata(page)
        if not data["filename"]:
            raise exception.NotFoundError("image")
        return data

gallery_dl-1.21.1/gallery_dl/extractor/imgth.py

# -*- coding: utf-8 -*-

# Copyright 2015-2019 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extract images from https://imgth.com/"""

from .common import Extractor, Message
from .. import text


class ImgthGalleryExtractor(Extractor):
    """Extractor for image galleries from imgth.com"""
    category = "imgth"
    subcategory = "gallery"
    directory_fmt = ("{category}", "{gallery_id} {title}")
    filename_fmt = "{category}_{gallery_id}_{num:>03}.{extension}"
    archive_fmt = "{gallery_id}_{num}"
    pattern = r"(?:https?://)?imgth\.com/gallery/(\d+)"
    test = ("http://imgth.com/gallery/37/wallpaper-anime", {
        "url": "4ae1d281ca2b48952cf5cca57e9914402ad72748",
        "keyword": "6f8c00d6849ea89d1a028764675ec1fe9dbd87e2",
    })

    def __init__(self, match):
        Extractor.__init__(self, match)
        self.gid = match.group(1)
        self.url_base = "https://imgth.com/gallery/" + self.gid + "/g/page/"

    def items(self):
        page = self.request(self.url_base + "0").text
        data = self.metadata(page)
        yield Message.Directory, data
        for data["num"], url in enumerate(self.images(page), 1):
            yield Message.Url, url, text.nameext_from_url(url, data)

    def images(self, page):
        """Yield all image urls for this gallery"""
        pnum = 0
        while True:
            thumbs = text.extract(page, '<ul class="thumbnails">', '</ul>')[0]
            for url in text.extract_iter(thumbs, '<img src="', '"'):
                yield "https://imgth.com/images" + url[24:]
            if '<li class="next">' not in page:
                return
            pnum += 1
            page = self.request(self.url_base + str(pnum)).text

    def metadata(self, page):
        """Collect metadata for extractor-job"""
        return text.extract_all(page, (
            ("title", '<h1>', '</h1>'),
            ("count", 'total of images in this gallery: ', ' '),
            ("date" , 'created on ', ' by <'),
            (None   , 'href="/users/', ''),
            ("user" , '>', '<'),
        ), values={"gallery_id": self.gid})[0]

gallery_dl-1.21.1/gallery_dl/extractor/imgur.py

# -*- coding: utf-8 -*-

# Copyright 2015-2021 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under
the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://imgur.com/""" from .common import Extractor, Message from .. import text, exception BASE_PATTERN = r"(?:https?://)?(?:www\.|[im]\.)?imgur\.com" class ImgurExtractor(Extractor): """Base class for imgur extractors""" category = "imgur" root = "https://imgur.com" def __init__(self, match): Extractor.__init__(self, match) self.api = ImgurAPI(self) self.key = match.group(1) self.mp4 = self.config("mp4", True) def _prepare(self, image): image.update(image["metadata"]) del image["metadata"] if image["ext"] == "jpeg": image["ext"] = "jpg" elif image["is_animated"] and self.mp4 and image["ext"] == "gif": image["ext"] = "mp4" image["url"] = url = "https://i.imgur.com/{}.{}".format( image["id"], image["ext"]) image["date"] = text.parse_datetime(image["created_at"]) text.nameext_from_url(url, image) return url def _items_queue(self, items): album_ex = ImgurAlbumExtractor image_ex = ImgurImageExtractor for item in items: item["_extractor"] = album_ex if item["is_album"] else image_ex yield Message.Queue, item["link"], item class ImgurImageExtractor(ImgurExtractor): """Extractor for individual images on imgur.com""" subcategory = "image" filename_fmt = "{category}_{id}{title:?_//}.{extension}" archive_fmt = "{id}" pattern = (BASE_PATTERN + r"/(?!gallery|search)" r"(?:r/\w+/)?(\w{7}|\w{5})[sbtmlh]?") test = ( ("https://imgur.com/21yMxCS", { "url": "6f2dcfb86815bdd72808c313e5f715610bc7b9b2", "content": "0c8768055e4e20e7c7259608b67799171b691140", "keyword": { "account_id" : 0, "comment_count" : int, "cover_id" : "21yMxCS", "date" : "dt:2016-11-10 14:24:35", "description" : "", "downvote_count": int, "duration" : 0, "ext" : "png", "favorite" : False, "favorite_count": 0, "has_sound" : False, "height" : 32, "id" : "21yMxCS", "image_count" : 1, "in_most_viral" : False, "is_ad" : False, "is_album" : False, "is_animated" : False, "is_looping" : False, "is_mature" : False, "is_pending" : False, "mime_type" : "image/png", "name" : "test-テスト", "point_count" : int, "privacy" : "", "score" : int, "size" : 182, "title" : "Test", "upvote_count" : int, "url" : "https://i.imgur.com/21yMxCS.png", "view_count" : int, "width" : 64, }, }), ("http://imgur.com/0gybAXR", { # gifv/mp4 video "url": "a2220eb265a55b0c95e0d3d721ec7665460e3fd7", "content": "a3c080e43f58f55243ab830569ba02309d59abfc", }), ("https://imgur.com/XFfsmuC", { # missing title in API response (#467) "keyword": {"title": "Tears are a natural response to irritants"}, }), ("https://imgur.com/1Nily2P", { # animated png "pattern": "https://i.imgur.com/1Nily2P.png", }), ("https://imgur.com/zzzzzzz", { # not found "exception": exception.HttpError, }), ("https://m.imgur.com/r/Celebs/iHJ7tsM"), ("https://www.imgur.com/21yMxCS"), # www ("https://m.imgur.com/21yMxCS"), # mobile ("https://imgur.com/zxaY6"), # 5 character key ("https://i.imgur.com/21yMxCS.png"), # direct link ("https://i.imgur.com/21yMxCSh.png"), # direct link thumbnail ("https://i.imgur.com/zxaY6.gif"), # direct link (short) ("https://i.imgur.com/zxaY6s.gif"), # direct link (short; thumb) ) def items(self): image = self.api.image(self.key) try: del image["ad_url"] del image["ad_type"] except KeyError: pass image.update(image["media"][0]) del image["media"] url = self._prepare(image) yield Message.Directory, image yield Message.Url, url, image class ImgurAlbumExtractor(ImgurExtractor): """Extractor for imgur albums""" subcategory = "album" directory_fmt = ("{category}", 
"{album[id]}{album[title]:? - //}") filename_fmt = "{category}_{album[id]}_{num:>03}_{id}.{extension}" archive_fmt = "{album[id]}_{id}" pattern = BASE_PATTERN + r"/a/(\w{7}|\w{5})" test = ( ("https://imgur.com/a/TcBmP", { "url": "ce3552f550a5b5316bd9c7ae02e21e39f30c0563", "keyword": { "album": { "account_id" : 0, "comment_count" : int, "cover_id" : "693j2Kr", "date" : "dt:2015-10-09 10:37:50", "description" : "", "downvote_count": 0, "favorite" : False, "favorite_count": 0, "id" : "TcBmP", "image_count" : 19, "in_most_viral" : False, "is_ad" : False, "is_album" : True, "is_mature" : False, "is_pending" : False, "privacy" : "private", "score" : int, "title" : "138", "upvote_count" : int, "url" : "https://imgur.com/a/TcBmP", "view_count" : int, "virality" : int, }, "account_id" : 0, "count" : 19, "date" : "type:datetime", "description": "", "ext" : "jpg", "has_sound" : False, "height" : int, "id" : str, "is_animated": False, "is_looping" : False, "mime_type" : "image/jpeg", "name" : str, "num" : int, "size" : int, "title" : str, "type" : "image", "updated_at" : None, "url" : str, "width" : int, }, }), ("https://imgur.com/a/eD9CT", { # large album "url": "de748c181a04d18bef1de9d4f4866ef0a06d632b", }), ("https://imgur.com/a/RhJXhVT/all", { # 7 character album hash "url": "695ef0c950023362a0163ee5041796300db76674", }), ("https://imgur.com/a/TcBmQ", { "exception": exception.HttpError, }), ("https://www.imgur.com/a/TcBmP"), # www ("https://m.imgur.com/a/TcBmP"), # mobile ) def items(self): album = self.api.album(self.key) album["date"] = text.parse_datetime(album["created_at"]) images = album["media"] del album["media"] count = len(images) try: del album["ad_url"] del album["ad_type"] except KeyError: pass for num, image in enumerate(images, 1): url = self._prepare(image) image["num"] = num image["count"] = count image["album"] = album yield Message.Directory, image yield Message.Url, url, image class ImgurGalleryExtractor(ImgurExtractor): """Extractor for imgur galleries""" subcategory = "gallery" pattern = BASE_PATTERN + r"/(?:gallery|t/\w+)/(\w{7}|\w{5})" test = ( ("https://imgur.com/gallery/zf2fIms", { # non-album gallery (#380) "pattern": "https://imgur.com/zf2fIms", }), ("https://imgur.com/gallery/eD9CT", { "pattern": "https://imgur.com/a/eD9CT", }), ("https://imgur.com/t/unmuted/26sEhNr"), ("https://imgur.com/t/cat/qSB8NbN"), ) def items(self): if self.api.gallery(self.key)["is_album"]: url = "{}/a/{}".format(self.root, self.key) extr = ImgurAlbumExtractor else: url = "{}/{}".format(self.root, self.key) extr = ImgurImageExtractor yield Message.Queue, url, {"_extractor": extr} class ImgurUserExtractor(ImgurExtractor): """Extractor for all images posted by a user""" subcategory = "user" pattern = BASE_PATTERN + r"/user/([^/?#]+)(?:/posts|/submitted)?/?$" test = ( ("https://imgur.com/user/Miguenzo", { "range": "1-100", "count": 100, "pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+", }), ("https://imgur.com/user/Miguenzo/posts"), ("https://imgur.com/user/Miguenzo/submitted"), ) def items(self): return self._items_queue(self.api.account_submissions(self.key)) class ImgurFavoriteExtractor(ImgurExtractor): """Extractor for a user's favorites""" subcategory = "favorite" pattern = BASE_PATTERN + r"/user/([^/?#]+)/favorites" test = ("https://imgur.com/user/Miguenzo/favorites", { "range": "1-100", "count": 100, "pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+", }) def items(self): return self._items_queue(self.api.account_favorites(self.key)) class ImgurSubredditExtractor(ImgurExtractor): 
"""Extractor for a subreddits's imgur links""" subcategory = "subreddit" pattern = BASE_PATTERN + r"/r/([^/?#]+)/?$" test = ("https://imgur.com/r/pics", { "range": "1-100", "count": 100, "pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+", }) def items(self): return self._items_queue(self.api.gallery_subreddit(self.key)) class ImgurTagExtractor(ImgurExtractor): """Extractor for imgur tag searches""" subcategory = "tag" pattern = BASE_PATTERN + r"/t/([^/?#]+)$" test = ("https://imgur.com/t/animals", { "range": "1-100", "count": 100, "pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+", }) def items(self): return self._items_queue(self.api.gallery_tag(self.key)) class ImgurSearchExtractor(ImgurExtractor): """Extractor for imgur search results""" subcategory = "search" pattern = BASE_PATTERN + r"/search(?:/[^?#]+)?/?\?q=([^&#]+)" test = ("https://imgur.com/search?q=cute+cat", { "range": "1-100", "count": 100, "pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+", }) def items(self): key = text.unquote(self.key.replace("+", " ")) return self._items_queue(self.api.gallery_search(key)) class ImgurAPI(): """Interface for the Imgur API Ref: https://apidocs.imgur.com/ """ def __init__(self, extractor): self.extractor = extractor self.headers = { "Authorization": "Client-ID " + extractor.config( "client-id", "546c25a59c58ad7"), } def account_favorites(self, account): endpoint = "/3/account/{}/gallery_favorites".format(account) return self._pagination(endpoint) def gallery_search(self, query): endpoint = "/3/gallery/search" params = {"q": query} return self._pagination(endpoint, params) def account_submissions(self, account): endpoint = "/3/account/{}/submissions".format(account) return self._pagination(endpoint) def gallery_subreddit(self, subreddit): endpoint = "/3/gallery/r/{}".format(subreddit) return self._pagination(endpoint) def gallery_tag(self, tag): endpoint = "/3/gallery/t/{}".format(tag) return self._pagination(endpoint, key="items") def image(self, image_hash): endpoint = "/post/v1/media/" + image_hash params = {"include": "media,tags,account"} return self._call(endpoint, params) def album(self, album_hash): endpoint = "/post/v1/albums/" + album_hash params = {"include": "media,tags,account"} return self._call(endpoint, params) def gallery(self, gallery_hash): endpoint = "/post/v1/posts/" + gallery_hash return self._call(endpoint) def _call(self, endpoint, params=None): while True: try: return self.extractor.request( "https://api.imgur.com" + endpoint, params=params, headers=self.headers, ).json() except exception.HttpError as exc: if exc.status not in (403, 429) or \ b"capacity" not in exc.response.content: raise self.extractor.wait(seconds=600) def _pagination(self, endpoint, params=None, key=None): num = 0 while True: data = self._call("{}/{}".format(endpoint, num), params)["data"] if key: data = data[key] if not data: return yield from data num += 1 �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1646253139.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/inkbunny.py��������������������������������������������������0000644�0001750�0001750�00000033720�14207752123�020756� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2020-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://inkbunny.net/""" from .common import Extractor, Message from .. import text, exception from ..cache import cache BASE_PATTERN = r"(?:https?://)?(?:www\.)?inkbunny\.net" class InkbunnyExtractor(Extractor): """Base class for inkbunny extractors""" category = "inkbunny" directory_fmt = ("{category}", "{username!l}") filename_fmt = "{submission_id} {file_id} {title}.{extension}" archive_fmt = "{file_id}" root = "https://inkbunny.net" def __init__(self, match): Extractor.__init__(self, match) self.api = InkbunnyAPI(self) def items(self): self.api.authenticate() to_bool = ("deleted", "favorite", "friends_only", "guest_block", "hidden", "public", "scraps") for post in self.posts(): post["date"] = text.parse_datetime( post["create_datetime"] + "00", "%Y-%m-%d %H:%M:%S.%f%z") post["tags"] = [kw["keyword_name"] for kw in post["keywords"]] post["ratings"] = [r["name"] for r in post["ratings"]] files = post["files"] for key in to_bool: if key in post: post[key] = (post[key] == "t") del post["keywords"] del post["files"] yield Message.Directory, post for post["num"], file in enumerate(files, 1): post.update(file) post["deleted"] = (file["deleted"] == "t") post["date"] = text.parse_datetime( file["create_datetime"] + "00", "%Y-%m-%d %H:%M:%S.%f%z") text.nameext_from_url(file["file_name"], post) url = file["file_url_full"] if "/private_files/" in url: url += "?sid=" + self.api.session_id yield Message.Url, url, post class InkbunnyUserExtractor(InkbunnyExtractor): """Extractor for inkbunny user profiles""" subcategory = "user" pattern = BASE_PATTERN + r"/(?!s/)(gallery/|scraps/)?(\w+)(?:$|[/?#])" test = ( ("https://inkbunny.net/soina", { "pattern": r"https://[\w.]+\.metapix\.net/files/full" r"/\d+/\d+_soina_.+", "range": "20-50", "keyword": { "date" : "type:datetime", "deleted" : bool, "file_id" : "re:[0-9]+", "filename" : r"re:[0-9]+_soina_\w+", "full_file_md5": "re:[0-9a-f]{32}", 
"mimetype" : str, "submission_id": "re:[0-9]+", "user_id" : "20969", "comments_count" : "re:[0-9]+", "deleted" : bool, "favorite" : bool, "favorites_count": "re:[0-9]+", "friends_only" : bool, "guest_block" : bool, "hidden" : bool, "pagecount" : "re:[0-9]+", "pools" : list, "pools_count" : int, "public" : bool, "rating_id" : "re:[0-9]+", "rating_name" : str, "ratings" : list, "scraps" : bool, "tags" : list, "title" : str, "type_name" : str, "username" : "soina", "views" : str, }, }), ("https://inkbunny.net/gallery/soina", { "range": "1-25", "keyword": {"scraps": False}, }), ("https://inkbunny.net/scraps/soina", { "range": "1-25", "keyword": {"scraps": True}, }), ) def __init__(self, match): kind, self.user = match.groups() if not kind: self.scraps = None elif kind[0] == "g": self.subcategory = "gallery" self.scraps = "no" else: self.subcategory = "scraps" self.scraps = "only" InkbunnyExtractor.__init__(self, match) def posts(self): orderby = self.config("orderby") params = { "username": self.user, "scraps" : self.scraps, "orderby" : orderby, } if orderby and orderby.startswith("unread_"): params["unread_submissions"] = "yes" return self.api.search(params) class InkbunnyPoolExtractor(InkbunnyExtractor): """Extractor for inkbunny pools""" subcategory = "pool" pattern = (BASE_PATTERN + r"/(?:" r"poolview_process\.php\?pool_id=(\d+)|" r"submissionsviewall\.php\?([^#]+&mode=pool&[^#]+))") test = ( ("https://inkbunny.net/poolview_process.php?pool_id=28985", { "count": 9, }), ("https://inkbunny.net/submissionsviewall.php?rid=ffffffffff" "&mode=pool&pool_id=28985&page=1&orderby=pool_order&random=no"), ) def __init__(self, match): InkbunnyExtractor.__init__(self, match) pid = match.group(1) if pid: self.pool_id = pid self.orderby = "pool_order" else: params = text.parse_query(match.group(2)) self.pool_id = params.get("pool_id") self.orderby = params.get("orderby", "pool_order") def posts(self): params = { "pool_id": self.pool_id, "orderby": self.orderby, } return self.api.search(params) class InkbunnyFavoriteExtractor(InkbunnyExtractor): """Extractor for inkbunny user favorites""" subcategory = "favorite" pattern = (BASE_PATTERN + r"/(?:" r"userfavorites_process\.php\?favs_user_id=(\d+)|" r"submissionsviewall\.php\?([^#]+&mode=userfavs&[^#]+))") test = ( ("https://inkbunny.net/userfavorites_process.php?favs_user_id=20969", { "pattern": r"https://[\w.]+\.metapix\.net/files/full" r"/\d+/\d+_\w+_.+", "range": "20-50", }), ("https://inkbunny.net/submissionsviewall.php?rid=ffffffffff" "&mode=userfavs&random=no&orderby=fav_datetime&page=1&user_id=20969"), ) def __init__(self, match): InkbunnyExtractor.__init__(self, match) uid = match.group(1) if uid: self.user_id = uid self.orderby = self.config("orderby", "fav_datetime") else: params = text.parse_query(match.group(2)) self.user_id = params.get("user_id") self.orderby = params.get("orderby", "fav_datetime") def posts(self): params = { "favs_user_id": self.user_id, "orderby" : self.orderby, } if self.orderby and self.orderby.startswith("unread_"): params["unread_submissions"] = "yes" return self.api.search(params) class InkbunnySearchExtractor(InkbunnyExtractor): """Extractor for inkbunny search results""" subcategory = "search" pattern = (BASE_PATTERN + r"/submissionsviewall\.php\?([^#]+&mode=search&[^#]+)") test = (("https://inkbunny.net/submissionsviewall.php?rid=ffffffffff" "&mode=search&page=1&orderby=create_datetime&text=cute" "&stringtype=and&keywords=yes&title=yes&description=no&artist=" "&favsby=&type=&days=&keyword_id=&user_id=&random=&md5="), 
{ "range": "1-10", "count": 10, }) def __init__(self, match): InkbunnyExtractor.__init__(self, match) self.query = match.group(1) def posts(self): params = text.parse_query(self.query) pop = params.pop pop("rid", None) params["string_join_type"] = pop("stringtype", None) params["dayslimit"] = pop("days", None) params["username"] = pop("artist", None) favsby = pop("favsby", None) if favsby: # get user_id from user profile url = "{}/{}".format(self.root, favsby) page = self.request(url).text user_id = text.extract(page, "?user_id=", "'")[0] params["favs_user_id"] = user_id.partition("&")[0] return self.api.search(params) class InkbunnyFollowingExtractor(InkbunnyExtractor): """Extractor for inkbunny user watches""" subcategory = "following" pattern = (BASE_PATTERN + r"/(?:" r"watchlist_process\.php\?mode=watching&user_id=(\d+)|" r"usersviewall\.php\?([^#]+&mode=watching&[^#]+))") test = ( (("https://inkbunny.net/watchlist_process.php" "?mode=watching&user_id=20969"), { "pattern": InkbunnyUserExtractor.pattern, "count": ">= 90", }), ("https://inkbunny.net/usersviewall.php?rid=ffffffffff" "&mode=watching&page=1&user_id=20969&orderby=added&namesonly="), ) def __init__(self, match): InkbunnyExtractor.__init__(self, match) self.user_id = match.group(1) or \ text.parse_query(match.group(2)).get("user_id") def items(self): url = self.root + "/watchlist_process.php" params = {"mode": "watching", "user_id": self.user_id} with self.request(url, params=params) as response: url, _, params = response.url.partition("?") page = response.text params = text.parse_query(params) params["page"] = text.parse_int(params.get("page"), 1) data = {"_extractor": InkbunnyUserExtractor} while True: cnt = 0 for user in text.extract_iter( page, '<a class="widget_userNameSmall" href="', '"', page.index('id="changethumboriginal_form"')): cnt += 1 yield Message.Queue, self.root + user, data if cnt < 20: return params["page"] += 1 page = self.request(url, params=params).text class InkbunnyPostExtractor(InkbunnyExtractor): """Extractor for individual Inkbunny posts""" subcategory = "post" pattern = BASE_PATTERN + r"/s/(\d+)" test = ( ("https://inkbunny.net/s/1829715", { "pattern": r"https://[\w.]+\.metapix\.net/files/full" r"/2626/2626843_soina_dscn2296\.jpg", "content": "cf69d8dddf0822a12b4eef1f4b2258bd600b36c8", }), ("https://inkbunny.net/s/2044094", { "count": 4, }), ) def __init__(self, match): InkbunnyExtractor.__init__(self, match) self.submission_id = match.group(1) def posts(self): submissions = self.api.detail(({"submission_id": self.submission_id},)) if submissions[0] is None: raise exception.NotFoundError("submission") return submissions class InkbunnyAPI(): """Interface for the Inkunny API Ref: https://wiki.inkbunny.net/wiki/API """ def __init__(self, extractor): self.extractor = extractor self.session_id = None def detail(self, submissions): """Get full details about submissions with the given IDs""" ids = { sub["submission_id"]: idx for idx, sub in enumerate(submissions) } params = { "submission_ids": ",".join(ids), "show_description": "yes", } submissions = [None] * len(ids) for sub in self._call("submissions", params)["submissions"]: submissions[ids[sub["submission_id"]]] = sub return submissions def search(self, params): """Perform a search""" return self._pagination_search(params) def set_allowed_ratings(self, nudity=True, sexual=True, violence=True, strong_violence=True): """Change allowed submission ratings""" params = { "tag[2]": "yes" if nudity else "no", "tag[3]": "yes" if violence else "no", "tag[4]": 
"yes" if sexual else "no", "tag[5]": "yes" if strong_violence else "no", } self._call("userrating", params) def authenticate(self, invalidate=False): username, password = self.extractor._get_auth_info() if invalidate: _authenticate_impl.invalidate(username or "guest") if username: self.session_id = _authenticate_impl(self, username, password) else: self.session_id = _authenticate_impl(self, "guest", "") self.set_allowed_ratings() def _call(self, endpoint, params): url = "https://inkbunny.net/api_" + endpoint + ".php" params["sid"] = self.session_id data = self.extractor.request(url, params=params).json() if "error_code" in data: if str(data["error_code"]) == "2": self.authenticate(invalidate=True) return self._call(endpoint, params) raise exception.StopExtraction(data.get("error_message")) return data def _pagination_search(self, params): params["page"] = 1 params["get_rid"] = "yes" params["submission_ids_only"] = "yes" while True: data = self._call("search", params) yield from self.detail(data["submissions"]) if data["page"] >= data["pages_count"]: return if "get_rid" in params: del params["get_rid"] params["rid"] = data["rid"] params["page"] += 1 @cache(maxage=360*24*3600, keyarg=1) def _authenticate_impl(api, username, password): api.extractor.log.info("Logging in as %s", username) url = "https://inkbunny.net/api_login.php" data = {"username": username, "password": password} data = api.extractor.request(url, method="POST", data=data).json() if "sid" not in data: raise exception.AuthenticationError(data.get("error_message")) return data["sid"] ������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1648567962.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/instagram.py�������������������������������������������������0000644�0001750�0001750�00000067514�14220623232�021107� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2020 Leonardo Taccari # Copyright 2018-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.instagram.com/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache import json import time import re BASE_PATTERN = r"(?:https?://)?(?:www\.)?instagram\.com" USER_PATTERN = BASE_PATTERN + r"/(?!(?:p|tv|reel|explore|stories)/)([^/?#]+)" class InstagramExtractor(Extractor): """Base class for instagram extractors""" category = "instagram" directory_fmt = ("{category}", "{username}") filename_fmt = "{sidecar_media_id:?/_/}{media_id}.{extension}" archive_fmt = "{media_id}" root = "https://www.instagram.com" cookiedomain = ".instagram.com" cookienames = ("sessionid",) request_interval = (6.0, 12.0) def __init__(self, match): Extractor.__init__(self, match) self.item = match.group(1) self.www_claim = "0" self.csrf_token = util.generate_token() self._find_tags = re.compile(r"#\w+").findall self._cursor = None def items(self): self.login() data = self.metadata() videos = self.config("videos", True) previews = self.config("previews", False) video_headers = {"User-Agent": "Mozilla/5.0"} for post in self.posts(): if "__typename" in post: post = self._parse_post_graphql(post) else: post = self._parse_post_api(post) post.update(data) files = post.pop("_files") yield Message.Directory, post for file in files: file.update(post) url = file.get("video_url") if url: if videos: file["_http_headers"] = video_headers text.nameext_from_url(url, file) yield Message.Url, url, file if not previews: continue url = file["display_url"] yield Message.Url, url, text.nameext_from_url(url, file) def metadata(self): return () def posts(self): return () def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if response.history and "/accounts/login/" in response.request.url: if self._cursor: self.log.info("Use '-o cursor=%s' to continue downloading " "from the current position", self._cursor) raise exception.StopExtraction( "HTTP redirect to login page (%s)", response.request.url) www_claim = response.headers.get("x-ig-set-www-claim") if www_claim is not None: self.www_claim = www_claim return response def _request_api(self, endpoint, **kwargs): url = "https://i.instagram.com/api" + endpoint kwargs["headers"] = { "X-CSRFToken" : self.csrf_token, "X-IG-App-ID" : "936619743392459", "X-IG-WWW-Claim": self.www_claim, } kwargs["cookies"] = { "csrftoken": self.csrf_token, } return self.request(url, **kwargs).json() def _request_graphql(self, query_hash, variables): url = self.root + "/graphql/query/" params = { "query_hash": query_hash, "variables" : json.dumps(variables), } headers = { "X-CSRFToken" : self.csrf_token, "X-IG-App-ID" : "936619743392459", "X-IG-WWW-Claim" : self.www_claim, "X-Requested-With": "XMLHttpRequest", } cookies = { "csrftoken": self.csrf_token, } return self.request( url, params=params, headers=headers, cookies=cookies, ).json()["data"] def login(self): if not self._check_cookies(self.cookienames): username, password = self._get_auth_info() if username: self._update_cookies(self._login_impl(username, password)) self.session.cookies.set( "csrftoken", self.csrf_token, domain=self.cookiedomain) @cache(maxage=360*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/accounts/login/" page = self.request(url).text headers = { "X-Web-Device-Id" : text.extract(page, '"device_id":"', '"')[0], "X-IG-App-ID" : "936619743392459", "X-ASBD-ID" : "437806", "X-IG-WWW-Claim" : "0", "X-Requested-With": "XMLHttpRequest", "Referer" : url, } url = self.root + "/data/shared_data/" data = self.request(url, headers=headers).json() 
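# The shared-data response fetched above supplies the CSRF token and rollout
# hash that the login POST below sends as its X-CSRFToken and
# X-Instagram-AJAX headers; the password itself is wrapped in the
# "#PWD_INSTAGRAM_BROWSER:0:<timestamp>:<password>" envelope.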
headers["X-CSRFToken"] = data["config"]["csrf_token"] headers["X-Instagram-AJAX"] = data["rollout_hash"] headers["Origin"] = self.root data = { "username" : username, "enc_password" : "#PWD_INSTAGRAM_BROWSER:0:{}:{}".format( int(time.time()), password), "queryParams" : "{}", "optIntoOneTap" : "false", "stopDeletionNonce" : "", "trustedDeviceRecords": "{}", } url = self.root + "/accounts/login/ajax/" response = self.request(url, method="POST", headers=headers, data=data) if not response.json().get("authenticated"): raise exception.AuthenticationError() cget = self.session.cookies.get return { name: cget(name) for name in ("sessionid", "mid", "ig_did") } def _parse_post_graphql(self, post): typename = post["__typename"] if post.get("is_video") and "video_url" not in post: url = "{}/tv/{}/".format(self.root, post["shortcode"]) post = self._extract_post_page(url) if "items" in post: return self._parse_post_api({"media": post["items"][0]}) post = post["graphql"]["shortcode_media"] elif typename == "GraphSidecar" and \ "edge_sidecar_to_children" not in post: url = "{}/p/{}/".format(self.root, post["shortcode"]) post = self._extract_post_page(url) if "items" in post: return self._parse_post_api({"media": post["items"][0]}) post = post["graphql"]["shortcode_media"] owner = post["owner"] data = { "typename" : typename, "date" : text.parse_timestamp(post["taken_at_timestamp"]), "likes" : post["edge_media_preview_like"]["count"], "owner_id" : owner["id"], "username" : owner.get("username"), "fullname" : owner.get("full_name"), "post_id" : post["id"], "post_shortcode": post["shortcode"], "post_url" : "{}/p/{}/".format(self.root, post["shortcode"]), "description": text.parse_unicode_escapes("\n".join( edge["node"]["text"] for edge in post["edge_media_to_caption"]["edges"] )), } tags = self._find_tags(data["description"]) if tags: data["tags"] = sorted(set(tags)) location = post.get("location") if location: data["location_id"] = location["id"] data["location_slug"] = location["slug"] data["location_url"] = "{}/explore/locations/{}/{}/".format( self.root, location["id"], location["slug"]) data["_files"] = files = [] if "edge_sidecar_to_children" in post: for num, edge in enumerate( post["edge_sidecar_to_children"]["edges"], 1): node = edge["node"] dimensions = node["dimensions"] media = { "num": num, "media_id" : node["id"], "shortcode" : (node.get("shortcode") or self._shortcode_from_id(node["id"])), "display_url": node["display_url"], "video_url" : node.get("video_url"), "width" : dimensions["width"], "height" : dimensions["height"], "sidecar_media_id" : post["id"], "sidecar_shortcode": post["shortcode"], } self._extract_tagged_users(node, media) files.append(media) else: dimensions = post["dimensions"] media = { "media_id" : post["id"], "shortcode" : post["shortcode"], "display_url": post["display_url"], "video_url" : post.get("video_url"), "width" : dimensions["width"], "height" : dimensions["height"], } self._extract_tagged_users(post, media) files.append(media) return data def _parse_post_api(self, post): if "media" in post: media = post["media"] owner = media["user"] data = { "post_id" : media["pk"], "post_shortcode": self._shortcode_from_id(media["pk"]), } if "carousel_media" in media: post["items"] = media["carousel_media"] data["sidecar_media_id"] = data["post_id"] data["sidecar_shortcode"] = data["post_shortcode"] else: post["items"] = (media,) else: reel_id = str(post["id"]).rpartition(":")[2] owner = post["user"] data = { "expires" : text.parse_timestamp(post.get("expiring_at")), "post_id" : 
reel_id, "post_shortcode": self._shortcode_from_id(reel_id), } data["owner_id"] = owner["pk"] data["username"] = owner.get("username") data["fullname"] = owner.get("full_name") data["_files"] = files = [] for num, item in enumerate(post["items"], 1): image = item["image_versions2"]["candidates"][0] if "video_versions" in item: video = max( item["video_versions"], key=lambda x: (x["width"], x["height"], x["type"]), ) media = video else: video = None media = image files.append({ "num" : num, "date" : text.parse_timestamp(item.get("taken_at") or media.get("taken_at")), "media_id" : item["pk"], "shortcode" : (item.get("code") or self._shortcode_from_id(item["pk"])), "display_url": image["url"], "video_url" : video["url"] if video else None, "width" : media["width"], "height" : media["height"], }) return data @staticmethod def _shortcode_from_id(post_id): return util.bencode( int(post_id), "ABCDEFGHIJKLMNOPQRSTUVWXYZ" "abcdefghijklmnopqrstuvwxyz" "0123456789-_") def _extract_tagged_users(self, src, dest): if "edge_media_to_tagged_user" not in src: return edges = src["edge_media_to_tagged_user"]["edges"] if edges: dest["tagged_users"] = tagged_users = [] for edge in edges: user = edge["node"]["user"] tagged_users.append({ "id" : user["id"], "username" : user["username"], "full_name": user["full_name"], }) def _extract_shared_data(self, url): page = self.request(url).text shared_data, pos = text.extract( page, "window._sharedData =", ";</script>") additional_data, pos = text.extract( page, "window.__additionalDataLoaded(", ");</script>", pos) data = json.loads(shared_data) if additional_data: next(iter(data["entry_data"].values()))[0] = \ json.loads(additional_data.partition(",")[2]) return data def _extract_profile_page(self, url): data = self._extract_shared_data(url)["entry_data"] if "HttpErrorPage" in data: raise exception.NotFoundError("user") return data["ProfilePage"][0]["graphql"]["user"] def _extract_post_page(self, url): data = self._extract_shared_data(url)["entry_data"] if "HttpErrorPage" in data: raise exception.NotFoundError("post") return data["PostPage"][0] def _get_edge_data(self, user, key): cursor = self.config("cursor") if cursor or not key: return { "edges" : (), "page_info": { "end_cursor" : cursor, "has_next_page": True, "_virtual" : True, }, } return user[key] def _pagination_graphql(self, query_hash, variables, data): while True: for edge in data["edges"]: yield edge["node"] info = data["page_info"] if not info["has_next_page"]: return elif not data["edges"] and "_virtual" not in info: s = "" if self.item.endswith("s") else "s" raise exception.StopExtraction( "%s'%s posts are private", self.item, s) variables["after"] = self._cursor = info["end_cursor"] self.log.debug("Cursor: %s", self._cursor) data = next(iter(self._request_graphql( query_hash, variables)["user"].values())) def _pagination_api(self, endpoint, params): while True: data = self._request_api(endpoint, method="POST", data=params) yield from data["items"] info = data["paging_info"] if not info["more_available"]: return params["max_id"] = info["max_id"] class InstagramUserExtractor(InstagramExtractor): """Extractor for an Instagram user profile""" subcategory = "user" pattern = USER_PATTERN + r"/?(?:$|[?#])" test = ( ("https://www.instagram.com/instagram/"), ("https://www.instagram.com/instagram/?hl=en"), ) def items(self): base = "{}/{}/".format(self.root, self.item) stories = "{}/stories/{}/".format(self.root, self.item) return self._dispatch_extractors(( (InstagramStoriesExtractor , stories), 
(InstagramHighlightsExtractor, base + "highlights/"), (InstagramPostsExtractor , base + "posts/"), (InstagramReelsExtractor , base + "reels/"), (InstagramChannelExtractor , base + "channel/"), (InstagramTaggedExtractor , base + "tagged/"), ), ("posts",)) class InstagramPostsExtractor(InstagramExtractor): """Extractor for ProfilePage posts""" subcategory = "posts" pattern = USER_PATTERN + r"/posts" test = ("https://www.instagram.com/instagram/posts/", { "range": "1-16", "count": ">= 16", }) def posts(self): url = "{}/{}/".format(self.root, self.item) user = self._extract_profile_page(url) query_hash = "8c2a529969ee035a5063f2fc8602a0fd" variables = {"id": user["id"], "first": 50} edge = self._get_edge_data(user, "edge_owner_to_timeline_media") return self._pagination_graphql(query_hash, variables, edge) class InstagramTaggedExtractor(InstagramExtractor): """Extractor for ProfilePage tagged posts""" subcategory = "tagged" pattern = USER_PATTERN + r"/tagged" test = ("https://www.instagram.com/instagram/tagged/", { "range": "1-16", "count": ">= 16", "keyword": { "tagged_owner_id" : "25025320", "tagged_username" : "instagram", "tagged_full_name": "Instagram", }, }) def metadata(self): url = "{}/{}/".format(self.root, self.item) self.user = user = self._extract_profile_page(url) return { "tagged_owner_id" : user["id"], "tagged_username" : user["username"], "tagged_full_name": user["full_name"], } def posts(self): query_hash = "be13233562af2d229b008d2976b998b5" variables = {"id": self.user["id"], "first": 50} edge = self._get_edge_data(self.user, None) return self._pagination_graphql(query_hash, variables, edge) class InstagramChannelExtractor(InstagramExtractor): """Extractor for ProfilePage channel""" subcategory = "channel" pattern = USER_PATTERN + r"/channel" test = ("https://www.instagram.com/instagram/channel/", { "range": "1-16", "count": ">= 16", }) def posts(self): url = "{}/{}/channel/".format(self.root, self.item) user = self._extract_profile_page(url) query_hash = "bc78b344a68ed16dd5d7f264681c4c76" variables = {"id": user["id"], "first": 50} edge = self._get_edge_data(user, "edge_felix_video_timeline") return self._pagination_graphql(query_hash, variables, edge) class InstagramSavedExtractor(InstagramExtractor): """Extractor for ProfilePage saved media""" subcategory = "saved" pattern = USER_PATTERN + r"/saved" test = ("https://www.instagram.com/instagram/saved/",) def posts(self): url = "{}/{}/saved/".format(self.root, self.item) user = self._extract_profile_page(url) query_hash = "2ce1d673055b99250e93b6f88f878fde" variables = {"id": user["id"], "first": 50} edge = self._get_edge_data(user, "edge_saved_media") return self._pagination_graphql(query_hash, variables, edge) class InstagramTagExtractor(InstagramExtractor): """Extractor for TagPage""" subcategory = "tag" directory_fmt = ("{category}", "{subcategory}", "{tag}") pattern = BASE_PATTERN + r"/explore/tags/([^/?#]+)" test = ("https://www.instagram.com/explore/tags/instagram/", { "range": "1-16", "count": ">= 16", }) def metadata(self): return {"tag": text.unquote(self.item)} def posts(self): url = "{}/explore/tags/{}/".format(self.root, self.item) page = self._extract_shared_data(url)["entry_data"]["TagPage"][0] if "data" in page: return self._pagination_sections(page["data"]["recent"]) hashtag = page["graphql"]["hashtag"] query_hash = "9b498c08113f1e09617a1703c22b2f32" variables = {"tag_name": hashtag["name"], "first": 50} edge = self._get_edge_data(hashtag, "edge_hashtag_to_media") return self._pagination_graphql(query_hash, 
variables, edge) def _pagination_sections(self, info): endpoint = "/v1/tags/instagram/sections/" data = { "include_persistent": "0", "max_id" : None, "page" : None, "surface": "grid", "tab" : "recent", } while True: for section in info["sections"]: yield from section["layout_content"]["medias"] if not info.get("more_available"): return data["max_id"] = info["next_max_id"] data["page"] = info["next_page"] info = self._request_api(endpoint, method="POST", data=data) def _pagination_graphql(self, query_hash, variables, data): while True: for edge in data["edges"]: yield edge["node"] info = data["page_info"] if not info["has_next_page"]: return variables["after"] = self._cursor = info["end_cursor"] self.log.debug("Cursor: %s", self._cursor) data = self._request_graphql( query_hash, variables)["hashtag"]["edge_hashtag_to_media"] class InstagramPostExtractor(InstagramExtractor): """Extractor for an Instagram post""" subcategory = "post" pattern = (r"(?:https?://)?(?:www\.)?instagram\.com" r"/(?:[^/?#]+/)?(?:p|tv|reel)/([^/?#]+)") test = ( # GraphImage ("https://www.instagram.com/p/BqvsDleB3lV/", { "pattern": r"https://[^/]+\.(cdninstagram\.com|fbcdn\.net)" r"/v(p/[0-9a-f]+/[0-9A-F]+)?/t51.2885-15/e35" r"/44877605_725955034447492_3123079845831750529_n.jpg", "keyword": { "date": "dt:2018-11-29 01:04:04", "description": str, "height": int, "likes": int, "location_id": "214424288", "location_slug": "hong-kong", "location_url": "re:/explore/locations/214424288/hong-kong/", "media_id": "1922949326347663701", "shortcode": "BqvsDleB3lV", "post_id": "1922949326347663701", "post_shortcode": "BqvsDleB3lV", "post_url": "https://www.instagram.com/p/BqvsDleB3lV/", "tags": ["#WHPsquares"], "typename": "GraphImage", "username": "instagram", "width": int, } }), # GraphSidecar ("https://www.instagram.com/p/BoHk1haB5tM/", { "count": 5, "keyword": { "sidecar_media_id": "1875629777499953996", "sidecar_shortcode": "BoHk1haB5tM", "post_id": "1875629777499953996", "post_shortcode": "BoHk1haB5tM", "post_url": "https://www.instagram.com/p/BoHk1haB5tM/", "num": int, "likes": int, "username": "instagram", } }), # GraphVideo ("https://www.instagram.com/p/Bqxp0VSBgJg/", { "pattern": r"/46840863_726311431074534_7805566102611403091_n\.mp4", "keyword": { "date": "dt:2018-11-29 19:23:58", "description": str, "height": int, "likes": int, "media_id": "1923502432034620000", "post_url": "https://www.instagram.com/p/Bqxp0VSBgJg/", "shortcode": "Bqxp0VSBgJg", "tags": ["#ASMR"], "typename": "GraphVideo", "username": "instagram", "width": int, } }), # GraphVideo (IGTV) ("https://www.instagram.com/tv/BkQjCfsBIzi/", { "pattern": r"/10000000_597132547321814_702169244961988209_n\.mp4", "keyword": { "date": "dt:2018-06-20 19:51:32", "description": str, "height": int, "likes": int, "media_id": "1806097553666903266", "post_url": "https://www.instagram.com/p/BkQjCfsBIzi/", "shortcode": "BkQjCfsBIzi", "typename": "GraphVideo", "username": "instagram", "width": int, } }), # GraphSidecar with 2 embedded GraphVideo objects ("https://www.instagram.com/p/BtOvDOfhvRr/", { "count": 2, "keyword": { "post_url": "https://www.instagram.com/p/BtOvDOfhvRr/", "sidecar_media_id": "1967717017113261163", "sidecar_shortcode": "BtOvDOfhvRr", "video_url": str, } }), # GraphImage with tagged user ("https://www.instagram.com/p/B_2lf3qAd3y/", { "keyword": { "tagged_users": [{ "id" : "1246468638", "username" : "kaaymbl", "full_name": "Call Me Kay", }] } }), # URL with username (#2085) ("https://www.instagram.com/dm/p/CW042g7B9CY/"), 
("https://www.instagram.com/reel/CDg_6Y1pxWu/"), ) def posts(self): query_hash = "2efa04f61586458cef44441f474eee7c" variables = { "shortcode" : self.item, "child_comment_count" : 3, "fetch_comment_count" : 40, "parent_comment_count" : 24, "has_threaded_comments": True, } data = self._request_graphql(query_hash, variables) media = data.get("shortcode_media") if not media: raise exception.NotFoundError("post") return (media,) class InstagramStoriesExtractor(InstagramExtractor): """Extractor for Instagram stories""" subcategory = "stories" pattern = (r"(?:https?://)?(?:www\.)?instagram\.com" r"/stories/(?:highlights/(\d+)|([^/?#]+)(?:/(\d+))?)") test = ( ("https://www.instagram.com/stories/instagram/"), ("https://www.instagram.com/stories/highlights/18042509488170095/"), ("https://instagram.com/stories/geekmig/2724343156064789461"), ) def __init__(self, match): self.highlight_id, self.user, self.media_id = match.groups() if self.highlight_id: self.subcategory = InstagramHighlightsExtractor.subcategory InstagramExtractor.__init__(self, match) def posts(self): if self.highlight_id: reel_id = "highlight:" + self.highlight_id else: url = "{}/stories/{}/".format(self.root, self.user) try: data = self._extract_shared_data(url)["entry_data"] user = data["StoriesPage"][0]["user"] except KeyError: return () reel_id = user["id"] endpoint = "/v1/feed/reels_media/" params = {"reel_ids": reel_id} reels = self._request_api(endpoint, params=params)["reels"] if self.media_id: reel = reels[reel_id] for item in reel["items"]: if item["pk"] == self.media_id: reel["items"] = (item,) break else: raise exception.NotFoundError("story") return reels.values() class InstagramHighlightsExtractor(InstagramExtractor): """Extractor for all Instagram story highlights of a user""" subcategory = "highlights" pattern = USER_PATTERN + r"/highlights" test = ("https://www.instagram.com/instagram/highlights",) def posts(self): url = "{}/{}/".format(self.root, self.item) user = self._extract_profile_page(url) endpoint = "/v1/highlights/{}/highlights_tray/".format(user["id"]) tray = self._request_api(endpoint)["tray"] reel_ids = [highlight["id"] for highlight in tray] # Anything above 30 responds with statuscode 400. # 30 can work, however, sometimes the API will respond with 560 or 500. 
chunk_size = 5 endpoint = "/v1/feed/reels_media/" for offset in range(0, len(reel_ids), chunk_size): chunk_ids = reel_ids[offset : offset+chunk_size] params = {"reel_ids": chunk_ids} reels = self._request_api(endpoint, params=params)["reels"] for reel_id in chunk_ids: yield reels[reel_id] class InstagramReelsExtractor(InstagramExtractor): """Extractor for an Instagram user's reels""" subcategory = "reels" pattern = USER_PATTERN + r"/reels" test = ("https://www.instagram.com/instagram/reels/", { "range": "40-60", "count": ">= 20", }) def posts(self): url = "{}/{}/".format(self.root, self.item) user = self._extract_profile_page(url) endpoint = "/v1/clips/user/" data = { "target_user_id": user["id"], "page_size" : "50", } return self._pagination_api(endpoint, data) ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/issuu.py�����������������������������������������������������0000644�0001750�0001750�00000007540�14176336637�020307� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://issuu.com/""" from .common import GalleryExtractor, Extractor, Message from .. 
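# A standalone sketch of the media-id -> shortcode conversion performed by
# InstagramExtractor._shortcode_from_id() above (util.bencode with a URL-safe
# base-64 alphabet): the numeric id is repeatedly divided by 64 and the
# remainders are mapped onto A-Z, a-z, 0-9, "-" and "_".
_IG_ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                "abcdefghijklmnopqrstuvwxyz"
                "0123456789-_")


def shortcode_from_id(post_id):
    num = int(post_id)
    shortcode = ""
    while num:
        num, digit = divmod(num, 64)
        shortcode = _IG_ALPHABET[digit] + shortcode
    return shortcode

# Matches the GraphImage test data above:
# shortcode_from_id("1922949326347663701") == "BqvsDleB3lV"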
import text, util import json class IssuuBase(): """Base class for issuu extractors""" category = "issuu" root = "https://issuu.com" class IssuuPublicationExtractor(IssuuBase, GalleryExtractor): """Extractor for a single publication""" subcategory = "publication" directory_fmt = ("{category}", "{document[userName]}", "{document[originalPublishDate]} {document[title]}") filename_fmt = "{num:>03}.{extension}" archive_fmt = "{document[id]}_{num}" pattern = r"(?:https?://)?issuu\.com(/[^/?#]+/docs/[^/?#]+)" test = ("https://issuu.com/issuu/docs/motions-1-2019/", { "pattern": r"https://image.isu.pub/190916155301-\w+/jpg/page_\d+.jpg", "count" : 36, "keyword": { "document": { "access" : "public", "articleStories": list, "contentRating" : dict, "date" : "dt:2019-09-16 00:00:00", "description" : "re:Motions, the brand new publication by I", "documentId" : r"re:\d+-d99ec95935f15091b040cb8060f05510", "documentName" : "motions-1-2019", "downloadState" : "NOT_AVAILABLE", "id" : r"re:\d+-d99ec95935f15091b040cb8060f05510", "isConverting" : False, "isQuarantined" : False, "lang" : "en", "language" : "English", "pageCount" : 36, "publicationId" : "d99ec95935f15091b040cb8060f05510", "title" : "Motions by Issuu - Issue 1", "userName" : "issuu", }, "extension": "jpg", "filename" : r"re:page_\d+", "num" : int, }, }) def metadata(self, page): data = json.loads(text.extract( page, 'window.__INITIAL_STATE__ =', ';\n')[0]) doc = data["document"] doc["lang"] = doc["language"] doc["language"] = util.code_to_language(doc["language"]) doc["date"] = text.parse_datetime( doc["originalPublishDate"], "%Y-%m-%d") self._cnt = text.parse_int(doc["pageCount"]) self._tpl = "https://{}/{}/jpg/page_{{}}.jpg".format( data["config"]["hosts"]["image"], doc["id"]) return {"document": doc} def images(self, page): fmt = self._tpl.format return [(fmt(i), None) for i in range(1, self._cnt + 1)] class IssuuUserExtractor(IssuuBase, Extractor): """Extractor for all publications of a user/publisher""" subcategory = "user" pattern = r"(?:https?://)?issuu\.com/([^/?#]+)/?$" test = ("https://issuu.com/issuu", { "pattern": IssuuPublicationExtractor.pattern, "count" : "> 25", }) def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) def items(self): url = "{}/call/profile/v1/documents/{}".format(self.root, self.user) params = {"offset": 0, "limit": "25"} while True: data = self.request(url, params=params).json() for publication in data["items"]: publication["url"] = "{}/{}/docs/{}".format( self.root, self.user, publication["uri"]) publication["_extractor"] = IssuuPublicationExtractor yield Message.Queue, publication["url"], publication if not data["hasMore"]: return params["offset"] += data["limit"] ����������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1639190302.0 
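# A minimal standalone sketch of how IssuuPublicationExtractor above derives
# its page URLs: the embedded window.__INITIAL_STATE__ JSON provides the
# image host and document id, and the pages are simply numbered
# "https://<image host>/<document id>/jpg/page_<n>.jpg".
import json


def issuu_page_urls(html):
    """Return the list of page image URLs for a publication's HTML page."""
    marker = "window.__INITIAL_STATE__ ="
    start = html.index(marker) + len(marker)
    end = html.index(";\n", start)
    state = json.loads(html[start:end])

    doc = state["document"]
    host = state["config"]["hosts"]["image"]  # e.g. image.isu.pub
    return [
        "https://{}/{}/jpg/page_{}.jpg".format(host, doc["id"], num)
        for num in range(1, int(doc["pageCount"]) + 1)
    ]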
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/kabeuchi.py��������������������������������������������������0000644�0001750�0001750�00000006125�14155007436�020675� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://kabe-uchiroom.com/""" from .common import Extractor, Message from .. import text, exception class KabeuchiUserExtractor(Extractor): """Extractor for all posts of a user on kabe-uchiroom.com""" category = "kabeuchi" subcategory = "user" directory_fmt = ("{category}", "{twitter_user_id} {twitter_id}") filename_fmt = "{id}_{num:>02}{title:?_//}.{extension}" archive_fmt = "{id}_{num}" root = "https://kabe-uchiroom.com" pattern = r"(?:https?://)?kabe-uchiroom\.com/mypage/?\?id=(\d+)" test = ( ("https://kabe-uchiroom.com/mypage/?id=919865303848255493", { "pattern": (r"https://kabe-uchiroom\.com/accounts/upfile/3/" r"919865303848255493/\w+\.jpe?g"), "count": ">= 24", }), ("https://kabe-uchiroom.com/mypage/?id=123456789", { "exception": exception.NotFoundError, }), ) def __init__(self, match): Extractor.__init__(self, match) self.user_id = match.group(1) def items(self): base = "{}/accounts/upfile/{}/{}/".format( self.root, self.user_id[-1], self.user_id) keys = ("image1", "image2", "image3", "image4", "image5", "image6") for post in self.posts(): if post.get("is_ad") or not post["image1"]: continue post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%d %H:%M:%S") yield Message.Directory, post for key in keys: name = post[key] if not name: break url = base + name post["num"] = ord(key[-1]) - 48 yield Message.Url, url, text.nameext_from_url(name, post) def posts(self): url = "{}/mypage/?id={}".format(self.root, self.user_id) response = self.request(url) if response.history and response.url == self.root + "/": raise exception.NotFoundError("user") target_id = text.extract(response.text, 'user_friend_id = "', '"')[0] return self._pagination(target_id) def _pagination(self, target_id): url = "{}/get_posts.php".format(self.root) data = { "user_id" : "0", "target_id" : target_id, "type" : "uploads", "sort_type" : "0", "category_id": "all", "latest_post": "", "page_num" : 0, } while True: info = self.request(url, method="POST", data=data).json() datas = info["datas"] if not datas or not isinstance(datas, list): return yield from datas last_id = datas[-1]["id"] if last_id == info["last_data"]: return data["latest_post"] = last_id data["page_num"] += 1 
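# A standalone sketch of the get_posts.php pagination used above, assuming a
# hypothetical `post_json(url, data)` helper for the POST request: each
# response carries a "datas" list plus a "last_data" id, and iteration stops
# once the last post of a page equals that id (no further pages).
def kabeuchi_posts(target_id, post_json):
    payload = {
        "user_id": "0", "target_id": target_id, "type": "uploads",
        "sort_type": "0", "category_id": "all",
        "latest_post": "", "page_num": 0,
    }
    url = "https://kabe-uchiroom.com/get_posts.php"
    while True:
        info = post_json(url, payload)
        posts = info["datas"]
        if not posts or not isinstance(posts, list):
            return
        yield from posts
        last_id = posts[-1]["id"]
        if last_id == info["last_data"]:
            return
        payload["latest_post"] = last_id
        payload["page_num"] += 1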
�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/keenspot.py��������������������������������������������������0000644�0001750�0001750�00000012570�14176336637�020766� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for http://www.keenspot.com/""" from .common import Extractor, Message from .. import text class KeenspotComicExtractor(Extractor): """Extractor for webcomics from keenspot.com""" category = "keenspot" subcategory = "comic" directory_fmt = ("{category}", "{comic}") filename_fmt = "{filename}.{extension}" archive_fmt = "{comic}_{filename}" pattern = r"(?:https?://)?(?!www\.|forums\.)([\w-]+)\.keenspot\.com(/.+)?" 
test = ( ("http://marksmen.keenspot.com/", { # link "range": "1-3", "url": "83bcf029103bf8bc865a1988afa4aaeb23709ba6", }), ("http://barkercomic.keenspot.com/", { # id "range": "1-3", "url": "c4080926db18d00bac641fdd708393b7d61379e6", }), ("http://crowscare.keenspot.com/", { # id v2 "range": "1-3", "url": "a00e66a133dd39005777317da90cef921466fcaa" }), ("http://supernovas.keenspot.com/", { # ks "range": "1-3", "url": "de21b12887ef31ff82edccbc09d112e3885c3aab" }), ("http://twokinds.keenspot.com/comic/1066/", { # "random" access "range": "1-3", "url": "6a784e11370abfb343dcad9adbb7718f9b7be350", }) ) def __init__(self, match): Extractor.__init__(self, match) self.comic = match.group(1).lower() self.path = match.group(2) self.root = "http://" + self.comic + ".keenspot.com" self._needle = "" self._image = 'class="ksc"' self._next = self._next_needle def items(self): data = {"comic": self.comic} yield Message.Directory, data with self.request(self.root + "/") as response: if response.history: url = response.request.url self.root = url[:url.index("/", 8)] page = response.text del response url = self._first(page) if self.path: url = self.root + self.path prev = None ilen = len(self._image) while url and url != prev: prev = url page = self.request(text.urljoin(self.root, url)).text pos = 0 while True: pos = page.find(self._image, pos) if pos < 0: break img, pos = text.extract(page, 'src="', '"', pos + ilen) if img.endswith(".js"): continue if img[0] == "/": img = self.root + img elif "youtube.com/" in img: img = "ytdl:" + img yield Message.Url, img, text.nameext_from_url(img, data) url = self._next(page) def _first(self, page): if self.comic == "brawlinthefamily": self._next = self._next_brawl self._image = '<div id="comic">' return "http://brawlinthefamily.keenspot.com/comic/theshowdown/" url = text.extract(page, '<link rel="first" href="', '"')[0] if url: if self.comic == "porcelain": self._needle = 'id="porArchivetop_"' else: self._next = self._next_link return url pos = page.find('id="first_day1"') if pos >= 0: self._next = self._next_id return text.rextract(page, 'href="', '"', pos)[0] pos = page.find('>FIRST PAGE<') if pos >= 0: if self.comic == "lastblood": self._next = self._next_lastblood self._image = '<div id="comic">' else: self._next = self._next_id return text.rextract(page, 'href="', '"', pos)[0] pos = page.find('<div id="kscomicpart"') if pos >= 0: self._needle = '<a href="/archive.html' return text.extract(page, 'href="', '"', pos)[0] pos = page.find('>First Comic<') # twokinds if pos >= 0: self._image = '</header>' self._needle = 'class="navarchive"' return text.rextract(page, 'href="', '"', pos)[0] pos = page.find('id="flip_FirstDay"') # flipside if pos >= 0: self._image = 'class="flip_Pages ksc"' self._needle = 'id="flip_ArcButton"' return text.rextract(page, 'href="', '"', pos)[0] self.log.error("Unrecognized page layout") return None def _next_needle(self, page): pos = page.index(self._needle) + len(self._needle) return text.extract(page, 'href="', '"', pos)[0] @staticmethod def _next_link(page): return text.extract(page, '<link rel="next" href="', '"')[0] @staticmethod def _next_id(page): pos = page.find('id="next_') return text.rextract(page, 'href="', '"', pos)[0] if pos >= 0 else None @staticmethod def _next_lastblood(page): pos = page.index("link rel='next'") return text.extract(page, "href='", "'", pos)[0] @staticmethod def _next_brawl(page): pos = page.index("comic-nav-next") url = text.rextract(page, 'href="', '"', pos)[0] return None if "?random" in url else url 
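# The extractor above reduces to this crawl loop: pick the first page via one
# of several site-specific heuristics, then follow "next" links until they
# stop advancing (url == prev).  A minimal sketch, with `get_page`,
# `find_images` and `find_next` as stand-ins for the heuristics chosen in
# _first():
def crawl_comic(first_url, get_page, find_images, find_next):
    url, prev = first_url, None
    while url and url != prev:
        prev = url
        page = get_page(url)
        yield from find_images(page)
        url = find_next(page)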
����������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1648567962.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/kemonoparty.py�����������������������������������������������0000644�0001750�0001750�00000041407�14220623232�021463� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://kemono.party/""" from .common import Extractor, Message from .. 
import text, exception from ..cache import cache import itertools import re BASE_PATTERN = r"(?:https?://)?(?:www\.|beta\.)?(kemono|coomer)\.party" USER_PATTERN = BASE_PATTERN + r"/([^/?#]+)/user/([^/?#]+)" class KemonopartyExtractor(Extractor): """Base class for kemonoparty extractors""" category = "kemonoparty" root = "https://kemono.party" directory_fmt = ("{category}", "{service}", "{user}") filename_fmt = "{id}_{title}_{num:>02}_{filename[:180]}.{extension}" archive_fmt = "{service}_{user}_{id}_{num}" cookiedomain = ".kemono.party" def __init__(self, match): if match.group(1) == "coomer": self.category = "coomerparty" self.cookiedomain = ".coomer.party" self.root = text.root_from_url(match.group(0)) Extractor.__init__(self, match) def items(self): self._prepare_ddosguard_cookies() self._find_inline = re.compile( r'src="(?:https?://(?:kemono|coomer)\.party)?(/inline/[^"]+' r'|/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{64}\.[^"]+)').findall find_hash = re.compile("/[0-9a-f]{2}/[0-9a-f]{2}/([0-9a-f]{64})").match generators = self._build_file_generators(self.config("files")) duplicates = self.config("duplicates") comments = self.config("comments") username = dms = None # prevent files to be sent with gzip compression headers = {"Accept-Encoding": "identity"} if self.config("metadata"): username = text.unescape(text.extract( self.request(self.user_url).text, '<meta name="artist_name" content="', '"')[0]) if self.config("dms"): dms = True posts = self.posts() max_posts = self.config("max-posts") if max_posts: posts = itertools.islice(posts, max_posts) for post in posts: post["date"] = text.parse_datetime( post["published"] or post["added"], "%a, %d %b %Y %H:%M:%S %Z") if username: post["username"] = username if comments: post["comments"] = self._extract_comments(post) if dms is not None: if dms is True: dms = self._extract_dms(post) post["dms"] = dms yield Message.Directory, post hashes = set() post["num"] = 0 for file in itertools.chain.from_iterable( g(post) for g in generators): url = file["path"] match = find_hash(url) if match: post["hash"] = hash = match.group(1) if hash in hashes and not duplicates: self.log.debug("Skipping %s (duplicate)", url) continue hashes.add(hash) else: post["hash"] = "" post["type"] = file["type"] post["num"] += 1 post["_http_headers"] = headers if url[0] == "/": url = self.root + "/data" + url elif url.startswith(self.root): url = self.root + "/data" + url[20:] text.nameext_from_url(file.get("name", url), post) yield Message.Url, url, post def login(self): username, password = self._get_auth_info() if username: self._update_cookies(self._login_impl(username, password)) @cache(maxage=28*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/account/login" data = {"username": username, "password": password} response = self.request(url, method="POST", data=data) if response.url.endswith("/account/login") and \ "Username or password is incorrect" in response.text: raise exception.AuthenticationError() return {c.name: c.value for c in response.history[0].cookies} def _file(self, post): file = post["file"] if not file: return () file["type"] = "file" return (file,) def _attachments(self, post): for attachment in post["attachments"]: attachment["type"] = "attachment" return post["attachments"] def _inline(self, post): for path in self._find_inline(post["content"] or ""): yield {"path": path, "name": path, "type": "inline"} def _build_file_generators(self, filetypes): if filetypes is None: return 
(self._attachments, self._file, self._inline) genmap = { "file" : self._file, "attachments": self._attachments, "inline" : self._inline, } if isinstance(filetypes, str): filetypes = filetypes.split(",") return [genmap[ft] for ft in filetypes] def _extract_comments(self, post): url = "{}/{}/user/{}/post/{}".format( self.root, post["service"], post["user"], post["id"]) page = self.request(url).text comments = [] for comment in text.extract_iter(page, "<article", "</article>"): extr = text.extract_from(comment) cid = extr('id="', '"') comments.append({ "id" : cid, "user": extr('href="#' + cid + '"', '</').strip(" \n\r>"), "body": extr( '<section class="comment__body">', '</section>').strip(), "date": extr('datetime="', '"'), }) return comments def _extract_dms(self, post): url = "{}/{}/user/{}/dms".format( self.root, post["service"], post["user"]) page = self.request(url).text dms = [] for dm in text.extract_iter(page, "<article", "</article>"): dms.append({ "body": text.unescape(text.extract( dm, '<pre>', '</pre></section>', )[0].strip()), "date": text.extract(dm, 'datetime="', '"')[0], }) return dms class KemonopartyUserExtractor(KemonopartyExtractor): """Extractor for all posts from a kemono.party user listing""" subcategory = "user" pattern = USER_PATTERN + r"/?(?:\?o=(\d+))?(?:$|[?#])" test = ( ("https://kemono.party/fanbox/user/6993449", { "range": "1-25", "count": 25, }), # 'max-posts' option, 'o' query parameter (#1674) ("https://kemono.party/patreon/user/881792?o=150", { "options": (("max-posts", 25),), "count": "< 100", }), ("https://kemono.party/subscribestar/user/alcorart"), ) def __init__(self, match): _, service, user_id, offset = match.groups() self.subcategory = service KemonopartyExtractor.__init__(self, match) self.api_url = "{}/api/{}/user/{}".format(self.root, service, user_id) self.user_url = "{}/{}/user/{}".format(self.root, service, user_id) self.offset = text.parse_int(offset) def posts(self): url = self.api_url params = {"o": self.offset} while True: posts = self.request(url, params=params).json() yield from posts if len(posts) < 25: return params["o"] += 25 class KemonopartyPostExtractor(KemonopartyExtractor): """Extractor for a single kemono.party post""" subcategory = "post" pattern = USER_PATTERN + r"/post/([^/?#]+)" test = ( ("https://kemono.party/fanbox/user/6993449/post/506575", { "pattern": r"https://kemono.party/data/21/0f" r"/210f35388e28bbcf756db18dd516e2d82ce75[0-9a-f]+\.jpg", "keyword": { "added": "Wed, 06 May 2020 20:28:02 GMT", "content": str, "date": "dt:2019-08-11 02:09:04", "edited": None, "embed": dict, "extension": "jpeg", "filename": "P058kDFYus7DbqAkGlfWTlOr", "hash": "210f35388e28bbcf756db18dd516e2d8" "2ce758e0d32881eeee76d43e1716d382", "id": "506575", "num": 1, "published": "Sun, 11 Aug 2019 02:09:04 GMT", "service": "fanbox", "shared_file": False, "subcategory": "fanbox", "title": "c96取り置き", "type": "file", "user": "6993449", }, }), # inline image (#1286) ("https://kemono.party/fanbox/user/7356311/post/802343", { "pattern": r"https://kemono\.party/data/47/b5/47b5c014ecdcfabdf2c8" r"5eec53f1133a76336997ae8596f332e97d956a460ad2\.jpg", "keyword": {"hash": "47b5c014ecdcfabdf2c85eec53f1133a" "76336997ae8596f332e97d956a460ad2"}, }), # kemono.party -> data.kemono.party ("https://kemono.party/gumroad/user/trylsc/post/IURjT", { "pattern": r"https://kemono\.party/data/(" r"a4/7b/a47bfe938d8c1682eef06e885927484cd8df1b.+\.jpg|" r"c6/04/c6048f5067fd9dbfa7a8be565ac194efdfb6e4.+\.zip)", }), # username (#1548, #1652) 
("https://kemono.party/gumroad/user/3252870377455/post/aJnAH", { "options": (("metadata", True),), "keyword": {"username": "Kudalyn's Creations"}, }), # skip patreon duplicates ("https://kemono.party/patreon/user/4158582/post/32099982", { "count": 2, }), # allow duplicates (#2440) ("https://kemono.party/patreon/user/4158582/post/32099982", { "options": (("duplicates", True),), "count": 3, }), # DMs (#2008) ("https://kemono.party/patreon/user/34134344/post/38129255", { "options": (("dms", True),), "keyword": {"dms": [{ "body": r"re:Hi! Thank you very much for supporting the work I" r" did in May. Here's your reward pack! I hope you fin" r"d something you enjoy in it. :\)\n\nhttps://www.medi" r"afire.com/file/\w+/Set13_tier_2.zip/file", "date": "2021-07-31 02:47:51.327865", }]}, }), # coomer.party (#2100) ("https://coomer.party/onlyfans/user/alinity/post/125962203", { "pattern": r"https://coomer\.party/data/7d/3f/7d3fd9804583dc224968" r"c0591163ec91794552b04f00a6c2f42a15b68231d5a8\.jpg", }), ("https://kemono.party/subscribestar/user/alcorart/post/184330"), ("https://www.kemono.party/subscribestar/user/alcorart/post/184330"), ("https://beta.kemono.party/subscribestar/user/alcorart/post/184330"), ) def __init__(self, match): _, service, user_id, post_id = match.groups() self.subcategory = service KemonopartyExtractor.__init__(self, match) self.api_url = "{}/api/{}/user/{}/post/{}".format( self.root, service, user_id, post_id) self.user_url = "{}/{}/user/{}".format(self.root, service, user_id) def posts(self): posts = self.request(self.api_url).json() return (posts[0],) if len(posts) > 1 else posts class KemonopartyDiscordExtractor(KemonopartyExtractor): """Extractor for kemono.party discord servers""" subcategory = "discord" directory_fmt = ("{category}", "discord", "{server}", "{channel_name|channel}") filename_fmt = "{id}_{num:>02}_{filename}.{extension}" archive_fmt = "discord_{server}_{id}_{num}" pattern = BASE_PATTERN + r"/discord/server/(\d+)(?:/channel/(\d+))?#(.*)" test = ( (("https://kemono.party/discord" "/server/488668827274444803#finish-work"), { "count": 4, "keyword": {"channel_name": "finish-work"}, }), (("https://kemono.party/discord" "/server/256559665620451329/channel/462437519519383555#"), { "pattern": r"https://kemono\.party/data/(" r"e3/77/e377e3525164559484ace2e64425b0cec1db08.*\.png|" r"51/45/51453640a5e0a4d23fbf57fb85390f9c5ec154.*\.gif)", "count": ">= 2", }), # 'inline' files (("https://kemono.party/discord" "/server/315262215055736843/channel/315262215055736843#general"), { "pattern": r"https://cdn\.discordapp\.com/attachments/\d+/\d+/.+$", "range": "1-5", "options": (("image-filter", "type == 'inline'"),), }), ) def __init__(self, match): KemonopartyExtractor.__init__(self, match) _, self.server, self.channel, self.channel_name = match.groups() def items(self): self._prepare_ddosguard_cookies() find_inline = re.compile( r"https?://(?:cdn\.discordapp.com|media\.discordapp\.net)" r"(/[A-Za-z0-9-._~:/?#\[\]@!$&'()*+,;%=]+)").findall posts = self.posts() max_posts = self.config("max-posts") if max_posts: posts = itertools.islice(posts, max_posts) for post in posts: files = [] append = files.append for attachment in post["attachments"]: attachment["type"] = "attachment" append(attachment) for path in find_inline(post["content"] or ""): append({"path": "https://cdn.discordapp.com" + path, "name": path, "type": "inline"}) post["channel_name"] = self.channel_name post["date"] = text.parse_datetime( post["published"], "%a, %d %b %Y %H:%M:%S %Z") yield Message.Directory, post 
for post["num"], file in enumerate(files, 1): post["type"] = file["type"] url = file["path"] if url[0] == "/": url = self.root + "/data" + url elif url.startswith(self.root): url = self.root + "/data" + url[20:] text.nameext_from_url(file["name"], post) yield Message.Url, url, post def posts(self): if self.channel is None: url = "{}/api/discord/channels/lookup?q={}".format( self.root, self.server) for channel in self.request(url).json(): if channel["name"] == self.channel_name: self.channel = channel["id"] break else: raise exception.NotFoundError("channel") url = "{}/api/discord/channel/{}".format(self.root, self.channel) params = {"skip": 0} while True: posts = self.request(url, params=params).json() yield from posts if len(posts) < 25: break params["skip"] += 25 class KemonopartyDiscordServerExtractor(KemonopartyExtractor): subcategory = "discord-server" pattern = BASE_PATTERN + r"/discord/server/(\d+)$" test = ("https://kemono.party/discord/server/488668827274444803", { "pattern": KemonopartyDiscordExtractor.pattern, "count": 13, }) def __init__(self, match): KemonopartyExtractor.__init__(self, match) self.server = match.group(2) def items(self): url = "{}/api/discord/channels/lookup?q={}".format( self.root, self.server) channels = self.request(url).json() for channel in channels: url = "{}/discord/server/{}/channel/{}#{}".format( self.root, self.server, channel["id"], channel["name"]) channel["_extractor"] = KemonopartyDiscordExtractor yield Message.Queue, url, channel class KemonopartyFavoriteExtractor(KemonopartyExtractor): """Extractor for kemono.party favorites""" subcategory = "favorite" pattern = BASE_PATTERN + r"/favorites" test = ("https://kemono.party/favorites", { "pattern": KemonopartyUserExtractor.pattern, "url": "f4b5b796979bcba824af84206578c79101c7f0e1", "count": 3, }) def items(self): self._prepare_ddosguard_cookies() self.login() users = self.request(self.root + "/api/favorites").json() for user in users: user["_extractor"] = KemonopartyUserExtractor url = "{}/{}/user/{}".format( self.root, user["service"], user["id"]) yield Message.Queue, url, user ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/khinsider.py�������������������������������������������������0000644�0001750�0001750�00000006153�14176336637�021116� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2016-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://downloads.khinsider.com/""" from .common import Extractor, Message, AsynchronousMixin from .. import text, exception class KhinsiderSoundtrackExtractor(AsynchronousMixin, Extractor): """Extractor for soundtracks from khinsider.com""" category = "khinsider" subcategory = "soundtrack" directory_fmt = ("{category}", "{album[name]}") archive_fmt = "{filename}.{extension}" pattern = (r"(?:https?://)?downloads\.khinsider\.com" r"/game-soundtracks/album/([^/?#]+)") root = "https://downloads.khinsider.com" test = (("https://downloads.khinsider.com" "/game-soundtracks/album/horizon-riders-wii"), { "pattern": r"https?://vgm(site|downloads).com" r"/soundtracks/horizon-riders-wii/[^/]+" r"/Horizon%20Riders%20Wii%20-%20Full%20Soundtrack.mp3", "keyword": "12ca70e0709ea15250e577ea388cf2b5b0c65630", }) def __init__(self, match): Extractor.__init__(self, match) self.album = match.group(1) def items(self): url = self.root + "/game-soundtracks/album/" + self.album page = self.request(url, encoding="utf-8").text if "Download all songs at once:" not in page: raise exception.NotFoundError("soundtrack") data = self.metadata(page) yield Message.Directory, data for track in self.tracks(page): track.update(data) yield Message.Url, track["url"], track def metadata(self, page): extr = text.extract_from(page) return {"album": { "name" : text.unescape(extr("Album name: <b>", "<")), "count": text.parse_int(extr("Number of Files: <b>", "<")), "size" : text.parse_bytes(extr("Total Filesize: <b>", "<")[:-1]), "date" : extr("Date added: <b>", "<"), "type" : extr("Album type: <b>", "<"), }} def tracks(self, page): fmt = self.config("format", ("mp3",)) if fmt and isinstance(fmt, str): if fmt == "all": fmt = None else: fmt = fmt.lower().split(",") page = text.extract(page, '<table id="songlist">', '</table>')[0] for num, url in enumerate(text.extract_iter( page, '<td class="clickable-row"><a href="', '"'), 1): url = text.urljoin(self.root, url) page = self.request(url, encoding="utf-8").text track = first = None for url in text.extract_iter( page, 'style="color: #21363f;" href="', '"'): track = text.nameext_from_url(url, {"num": num, "url": url}) if first is None: first = track if not fmt or track["extension"] in fmt: first = False yield track if first: yield first ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
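
# The khinsider tracks() method above honors a "format" option: for each song
# page it yields every download link whose extension matches one of the
# configured formats and falls back to the first link when nothing matches.
# A simplified sketch of that selection, using a plain extension check instead
# of text.nameext_from_url() (the URL list is a made-up example):

def select_tracks(urls, fmt=("mp3",)):
    """Yield URLs whose extension matches 'fmt'; fall back to the first URL."""
    first = None
    matched = False
    for url in urls:
        ext = url.rpartition(".")[2].lower()
        if first is None:
            first = url
        if not fmt or ext in fmt:
            matched = True
            yield url
    if first is not None and not matched:
        yield first

# list(select_tracks(["a.flac", "b.mp3"]))  # -> ["b.mp3"]
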
x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1649084538.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/kissgoddess.py�����������������������������������������������0000644�0001750�0001750�00000005371�14222604172�021441� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://kissgoddess.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text, exception class KissgoddessGalleryExtractor(GalleryExtractor): """Extractor for image galleries on kissgoddess.com""" category = "kissgoddess" root = "https://kissgoddess.com" pattern = r"(?:https?://)?(?:www\.)?kissgoddess\.com/album/(\d+)" test = ("https://kissgoddess.com/album/18285.html", { "pattern": r"https://pic\.kissgoddess\.com" r"/gallery/16473/18285/s/\d+\.jpg", "count": 19, "keyword": { "gallery_id": 18285, "title": "[Young Champion Extra] 2016.02 No.03 菜乃花 安枝瞳 葉月あや", }, }) def __init__(self, match): self.gallery_id = match.group(1) url = "{}/album/{}.html".format(self.root, self.gallery_id) GalleryExtractor.__init__(self, match, url) def metadata(self, page): return { "gallery_id": text.parse_int(self.gallery_id), "title" : text.extract( page, '<title>', "<")[0].rpartition(" | ")[0], } def images(self, page): pnum = 1 while page: for url in text.extract_iter(page, "<img src='", "'"): yield url, None for url in text.extract_iter(page, "<img data-original='", "'"): yield url, None pnum += 1 url = "{}/album/{}_{}.html".format( self.root, self.gallery_id, pnum) try: page = self.request(url).text except exception.HttpError: return class KissgoddessModelExtractor(Extractor): """Extractor for all galleries of a model on kissgoddess.com""" category = "kissgoddess" subcategory = "model" root = "https://kissgoddess.com" pattern = r"(?:https?://)?(?:www\.)?kissgoddess\.com/people/([^./?#]+)" test = ("https://kissgoddess.com/people/aya-hazuki.html", { "pattern": KissgoddessGalleryExtractor.pattern, "count": ">= 7", }) def __init__(self, match): Extractor.__init__(self, match) self.model = match.group(1) def items(self): url = "{}/people/{}.html".format(self.root, self.model) page = self.request(url).text data = {"_extractor": KissgoddessGalleryExtractor} for path in 
text.extract_iter(page, 'thumb"><a href="/album/', '"'):
            url = self.root + "/album/" + path
            yield Message.Queue, url, data


gallery_dl-1.21.1/gallery_dl/extractor/kohlchan.py

# -*- coding: utf-8 -*-
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://kohlchan.net/"""

from .common import Extractor, Message
from ..
import text import itertools class KohlchanThreadExtractor(Extractor): """Extractor for Kohlchan threads""" category = "kohlchan" subcategory = "thread" directory_fmt = ("{category}", "{boardUri}", "{threadId} {subject|message[:50]}") filename_fmt = "{postId}{num:?-//} {filename}.{extension}" archive_fmt = "{boardUri}_{postId}_{num}" pattern = r"(?:https?://)?kohlchan\.net/([^/?#]+)/res/(\d+)" test = ("https://kohlchan.net/a/res/4594.html", { "pattern": r"https://kohlchan\.net/\.media/[0-9a-f]{64}(\.\w+)?$", "count": ">= 80", }) def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "https://kohlchan.net/{}/res/{}.json".format( self.board, self.thread) thread = self.request(url).json() thread["postId"] = thread["threadId"] posts = thread.pop("posts") yield Message.Directory, thread for post in itertools.chain((thread,), posts): files = post.pop("files", ()) if files: thread.update(post) for num, file in enumerate(files): file.update(thread) file["num"] = num url = "https://kohlchan.net" + file["path"] text.nameext_from_url(file["originalName"], file) yield Message.Url, url, file class KohlchanBoardExtractor(Extractor): """Extractor for Kohlchan boards""" category = "kohlchan" subcategory = "board" pattern = (r"(?:https?://)?kohlchan\.net" r"/([^/?#]+)/(?:(?:catalog|\d+)\.html)?$") test = ( ("https://kohlchan.net/a/", { "pattern": KohlchanThreadExtractor.pattern, "count": ">= 100", }), ("https://kohlchan.net/a/2.html"), ("https://kohlchan.net/a/catalog.html"), ) def __init__(self, match): Extractor.__init__(self, match) self.board = match.group(1) def items(self): url = "https://kohlchan.net/{}/catalog.json".format(self.board) for thread in self.request(url).json(): url = "https://kohlchan.net/{}/res/{}.html".format( self.board, thread["threadId"]) thread["_extractor"] = KohlchanThreadExtractor yield Message.Queue, url, thread �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/komikcast.py�������������������������������������������������0000644�0001750�0001750�00000007657�14176336637�021135� 
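
# kohlchan.net serves plain JSON, so the extractors above only need
# "/<board>/res/<thread>.json" for a thread and "/<board>/catalog.json" for a
# board index.  A minimal sketch that collects the file URLs of one thread
# with requests (board and thread id are placeholder values):

import requests

def kohlchan_file_urls(board="a", thread="4594"):
    """Return all media URLs attached to a kohlchan thread."""
    url = "https://kohlchan.net/{}/res/{}.json".format(board, thread)
    data = requests.get(url).json()
    posts = [data] + data.get("posts", [])  # the thread object is the OP
    return ["https://kohlchan.net" + file["path"]
            for post in posts
            for file in post.get("files", ())]
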
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract manga-chapters and entire manga from https://komikcast.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text import re class KomikcastBase(): """Base class for komikcast extractors""" category = "komikcast" root = "https://komikcast.com" @staticmethod def parse_chapter_string(chapter_string, data=None): """Parse 'chapter_string' value and add its info to 'data'""" if not data: data = {} match = re.match( r"(?:(.*) Chapter )?0*(\d+)([^ ]*)(?: (?:- )?(.+))?", text.unescape(chapter_string), ) manga, chapter, data["chapter_minor"], title = match.groups() if manga: data["manga"] = manga.partition(" Chapter ")[0] if title and not title.lower().startswith("bahasa indonesia"): data["title"] = title.strip() else: data["title"] = "" data["chapter"] = text.parse_int(chapter) data["lang"] = "id" data["language"] = "Indonesian" return data class KomikcastChapterExtractor(KomikcastBase, ChapterExtractor): """Extractor for manga-chapters from komikcast.com""" pattern = r"(?:https?://)?(?:www\.)?komikcast\.com(/chapter/[^/?#]+/)" test = ( (("https://komikcast.com/chapter/" "apotheosis-chapter-02-2-bahasa-indonesia/"), { "url": "f6b43fbc027697749b3ea1c14931c83f878d7936", "keyword": "f3938e1aff9ad1f302f52447e9781b21f6da26d4", }), (("https://komikcast.com/chapter/" "solo-spell-caster-chapter-37-bahasa-indonesia/"), { "url": "c3d30de6c796ff6ff36eb86e2e6fa2f8add8e829", "keyword": "ed8a0ff73098776988bf66fb700381a2c748f910", }), ) def metadata(self, page): info = text.extract(page, "<title>", " – Komikcast<")[0] return self.parse_chapter_string(info) @staticmethod def images(page): readerarea = text.extract( page, '<div class="main-reading-area', '</div')[0] return [ (text.unescape(url), None) for url in re.findall(r"<img[^>]* src=[\"']([^\"']+)", readerarea) ] class KomikcastMangaExtractor(KomikcastBase, MangaExtractor): """Extractor for manga from komikcast.com""" chapterclass = KomikcastChapterExtractor pattern = (r"(?:https?://)?(?:www\.)?komikcast\.com" r"(/(?:komik/)?[^/?#]+)/?$") test = ( ("https://komikcast.com/komik/090-eko-to-issho/", { "url": "dc798d107697d1f2309b14ca24ca9dba30c6600f", "keyword": "837a7e96867344ff59d840771c04c20dc46c0ab1", }), ("https://komikcast.com/tonari-no-kashiwagi-san/"), ) def chapters(self, page): results = [] data = self.metadata(page) for item in text.extract_iter( page, '<a class="chapter-link-item" href="', '</a'): url, _, chapter_string = item.rpartition('">Chapter ') self.parse_chapter_string(chapter_string, data) results.append((url, data.copy())) return results @staticmethod def metadata(page): """Return a dict with general metadata""" manga , pos = text.extract(page, "<title>" , " – Komikcast<") genres, pos = text.extract( page, 'class="komik_info-content-genre">', "</span>", pos) author, pos = text.extract(page, ">Author:", "</span>", pos) mtype , pos = text.extract(page, ">Type:" , "</span>", pos) return { "manga": text.unescape(manga), 
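            # parse_chapter_string() above relies on the regex
            # r"(?:(.*) Chapter )?0*(\d+)([^ ]*)(?: (?:- )?(.+))?", which
            # splits a title such as "Apotheosis Chapter 02.2 Bahasa Indonesia"
            # into ('Apotheosis', '2', '.2', 'Bahasa Indonesia'); trailing
            # "Bahasa Indonesia" markers are then dropped from 'title'.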
"genres": text.split_html(genres), "author": text.remove_html(author), "type": text.remove_html(mtype), } ���������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1646253139.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/lightroom.py�������������������������������������������������0000644�0001750�0001750�00000006543�14207752123�021130� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://lightroom.adobe.com/""" from .common import Extractor, Message from .. 
import text import json class LightroomGalleryExtractor(Extractor): """Extractor for an image gallery on lightroom.adobe.com""" category = "lightroom" subcategory = "gallery" directory_fmt = ("{category}", "{user}", "{title}") filename_fmt = "{num:>04}_{id}.{extension}" archive_fmt = "{id}" pattern = r"(?:https?://)?lightroom\.adobe\.com/shares/([0-9a-f]+)" test = ( (("https://lightroom.adobe.com/shares/" "0c9cce2033f24d24975423fe616368bf"), { "keyword": { "title": "Sterne und Nachtphotos", "user": "Christian Schrang", }, "count": ">= 55", }), (("https://lightroom.adobe.com/shares/" "7ba68ad5a97e48608d2e6c57e6082813"), { "keyword": { "title": "HEBFC Snr/Res v Brighton", "user": "", }, "count": ">= 180", }), ) def __init__(self, match): Extractor.__init__(self, match) self.href = match.group(1) def items(self): # Get config url = "https://lightroom.adobe.com/shares/" + self.href response = self.request(url) album = json.loads( text.extract(response.text, "albumAttributes: ", "\n")[0] ) images = self.images(album) for img in images: url = img["url"] yield Message.Directory, img yield Message.Url, url, text.nameext_from_url(url, img) def metadata(self, album): payload = album["payload"] story = payload.get("story") or {} return { "gallery_id": self.href, "user": story.get("author", ""), "title": story.get("title", payload["name"]), } def images(self, album): album_md = self.metadata(album) base_url = album["base"] next_url = album["links"]["/rels/space_album_images_videos"]["href"] num = 1 while next_url: url = base_url + next_url page = self.request(url).text # skip 1st line as it's a JS loop data = json.loads(page[page.index("\n") + 1:]) base_url = data["base"] for res in data["resources"]: img_url, img_size = None, 0 for key, value in res["asset"]["links"].items(): if not key.startswith("/rels/rendition_type/"): continue size = text.parse_int(key.split("/")[-1]) if size > img_size: img_size = size img_url = value["href"] if img_url: img = { "id": res["asset"]["id"], "num": num, "url": base_url + img_url, } img.update(album_md) yield img num += 1 try: next_url = data["links"]["next"]["href"] except KeyError: next_url = None �������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1639190302.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/lineblog.py��������������������������������������������������0000644�0001750�0001750�00000004466�14155007436�020723� 
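
# For each Lightroom asset the extractor above scans the "links" mapping and
# keeps the "/rels/rendition_type/<size>" entry with the largest numeric
# suffix.  The same selection in isolation (the dict shape mirrors the API
# response; the example values are invented):

def largest_rendition(links):
    """Return the href of the highest-resolution rendition, or None."""
    best_size, best_href = 0, None
    for key, value in links.items():
        if not key.startswith("/rels/rendition_type/"):
            continue
        size = int(key.split("/")[-1])
        if size > best_size:
            best_size, best_href = size, value["href"]
    return best_href

# largest_rendition({"/rels/rendition_type/640": {"href": "small"},
#                    "/rels/rendition_type/2048": {"href": "large"}})
# -> "large"
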
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.lineblog.me/""" from .livedoor import LivedoorBlogExtractor, LivedoorPostExtractor from .. import text class LineblogBase(): """Base class for lineblog extractors""" category = "lineblog" root = "https://lineblog.me" def _images(self, post): imgs = [] body = post.pop("body") for num, img in enumerate(text.extract_iter(body, "<img ", ">"), 1): src = text.extract(img, 'src="', '"')[0] alt = text.extract(img, 'alt="', '"')[0] if not src: continue if src.startswith("https://obs.line-scdn.") and src.count("/") > 3: src = src.rpartition("/")[0] imgs.append(text.nameext_from_url(alt or src, { "url" : src, "num" : num, "hash": src.rpartition("/")[2], "post": post, })) return imgs class LineblogBlogExtractor(LineblogBase, LivedoorBlogExtractor): """Extractor for a user's blog on lineblog.me""" pattern = r"(?:https?://)?lineblog\.me/(\w+)/?(?:$|[?#])" test = ("https://lineblog.me/mamoru_miyano/", { "range": "1-20", "count": 20, "pattern": r"https://obs.line-scdn.net/[\w-]+$", "keyword": { "post": { "categories" : tuple, "date" : "type:datetime", "description": str, "id" : int, "tags" : list, "title" : str, "user" : "mamoru_miyano" }, "filename": str, "hash" : r"re:\w{32,}", "num" : int, }, }) class LineblogPostExtractor(LineblogBase, LivedoorPostExtractor): """Extractor for blog posts on lineblog.me""" pattern = r"(?:https?://)?lineblog\.me/(\w+)/archives/(\d+)" test = ("https://lineblog.me/mamoru_miyano/archives/1919150.html", { "url": "24afeb4044c554f80c374b52bf8109c6f1c0c757", "keyword": "76a38e2c0074926bd3362f66f9fc0e6c41591dcb", }) ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/livedoor.py��������������������������������������������������0000644�0001750�0001750�00000012670�14176336637�020762� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for http://blog.livedoor.jp/""" from .common import Extractor, Message from .. import text class LivedoorExtractor(Extractor): """Base class for livedoor extractors""" category = "livedoor" root = "http://blog.livedoor.jp" filename_fmt = "{post[id]}_{post[title]}_{num:>02}.{extension}" directory_fmt = ("{category}", "{post[user]}") archive_fmt = "{post[id]}_{hash}" def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) def items(self): for post in self.posts(): images = self._images(post) if images: yield Message.Directory, {"post": post} for image in images: yield Message.Url, image["url"], image def posts(self): """Return an iterable with post objects""" def _load(self, data, body): extr = text.extract_from(data) tags = text.extract(body, 'class="article-tags">', '</dl>')[0] about = extr('rdf:about="', '"') return { "id" : text.parse_int( about.rpartition("/")[2].partition(".")[0]), "title" : text.unescape(extr('dc:title="', '"')), "categories" : extr('dc:subject="', '"').partition(",")[::2], "description": extr('dc:description="', '"'), "date" : text.parse_datetime(extr('dc:date="', '"')), "tags" : text.split_html(tags)[1:] if tags else [], "user" : self.user, "body" : body, } def _images(self, post): imgs = [] body = post.pop("body") for num, img in enumerate(text.extract_iter(body, "<img ", ">"), 1): src = text.extract(img, 'src="', '"')[0] alt = text.extract(img, 'alt="', '"')[0] if not src: continue if "://livedoor.blogimg.jp/" in src: url = src.replace("http:", "https:", 1).replace("-s.", ".") else: url = text.urljoin(self.root, src) name, _, ext = url.rpartition("/")[2].rpartition(".") imgs.append({ "url" : url, "num" : num, "hash" : name, "filename" : alt or name, "extension": ext, "post" : post, }) return imgs class LivedoorBlogExtractor(LivedoorExtractor): """Extractor for a user's blog on blog.livedoor.jp""" subcategory = "blog" pattern = r"(?:https?://)?blog\.livedoor\.jp/(\w+)/?(?:$|[?#])" test = ( ("http://blog.livedoor.jp/zatsu_ke/", { "range": "1-50", "count": 50, "archive": False, "pattern": r"https?://livedoor.blogimg.jp/\w+/imgs/\w/\w/\w+\.\w+", "keyword": { "post": { "categories" : tuple, "date" : "type:datetime", "description": str, "id" : int, "tags" : list, "title" : str, "user" : "zatsu_ke" }, 
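                # _images() above rewrites livedoor.blogimg.jp thumbnails into
                # their full-size originals by switching to https and dropping
                # the "-s" size suffix, e.g. (illustrative URL):
                #
                #   "http://livedoor.blogimg.jp/zatsu_ke/imgs/a/b/abc-s.jpg"
                #   -> "https://livedoor.blogimg.jp/zatsu_ke/imgs/a/b/abc.jpg"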
"filename": str, "hash" : r"re:\w{4,}", "num" : int, }, }), ("http://blog.livedoor.jp/uotapo/", { "range": "1-5", "count": 5, }), ) def posts(self): url = "{}/{}".format(self.root, self.user) while url: extr = text.extract_from(self.request(url).text) while True: data = extr('<rdf:RDF', '</rdf:RDF>') if not data: break body = extr('class="article-body-inner">', 'class="article-footer">') yield self._load(data, body) url = extr('<a rel="next" href="', '"') class LivedoorPostExtractor(LivedoorExtractor): """Extractor for images from a blog post on blog.livedoor.jp""" subcategory = "post" pattern = r"(?:https?://)?blog\.livedoor\.jp/(\w+)/archives/(\d+)" test = ( ("http://blog.livedoor.jp/zatsu_ke/archives/51493859.html", { "url": "9ca3bbba62722c8155be79ad7fc47be409e4a7a2", "keyword": "1f5b558492e0734f638b760f70bfc0b65c5a97b9", }), ("http://blog.livedoor.jp/amaumauma/archives/7835811.html", { "url": "204bbd6a9db4969c50e0923855aeede04f2e4a62", "keyword": "05821c7141360e6057ef2d382b046f28326a799d", }), ("http://blog.livedoor.jp/uotapo/archives/1050616939.html", { "url": "4b5ab144b7309eb870d9c08f8853d1abee9946d2", "keyword": "84fbf6e4eef16675013d6333039a7cfcb22c2d50", }), ) def __init__(self, match): LivedoorExtractor.__init__(self, match) self.post_id = match.group(2) def posts(self): url = "{}/{}/archives/{}.html".format( self.root, self.user, self.post_id) extr = text.extract_from(self.request(url).text) data = extr('<rdf:RDF', '</rdf:RDF>') body = extr('class="article-body-inner">', 'class="article-footer">') return (self._load(data, body),) ������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1648487626.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/lolisafe.py��������������������������������������������������0000644�0001750�0001750�00000005627�14220366312�020720� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for lolisafe/chibisafe instances""" from .common import BaseExtractor, Message from .. 
import text class LolisafeExtractor(BaseExtractor): """Base class for lolisafe extractors""" basecategory = "lolisafe" directory_fmt = ("{category}", "{album_name} ({album_id})") archive_fmt = "{album_id}_{id}" BASE_PATTERN = LolisafeExtractor.update({ "bunkr": {"root": "https://bunkr.is", "pattern": r"bunkr\.(?:is|to)"}, "zzzz" : {"root": "https://zz.ht" , "pattern": r"zz\.(?:ht|fo)"}, }) class LolisafelbumExtractor(LolisafeExtractor): subcategory = "album" pattern = BASE_PATTERN + "/a/([^/?#]+)" test = ( ("https://bunkr.is/a/Lktg9Keq", { "pattern": r"https://cdn\.bunkr\.is/test-テスト-\"&>-QjgneIQv\.png", "content": "0c8768055e4e20e7c7259608b67799171b691140", "keyword": { "album_id": "Lktg9Keq", "album_name": 'test テスト "&>', "count": 1, "filename": 'test-テスト-"&>-QjgneIQv', "id": "QjgneIQv", "name": 'test-テスト-"&>', "num": int, }, }), # mp4 (#2239) ("https://bunkr.is/a/ptRHaCn2", { "pattern": r"https://media-files\.bunkr\.is/_-RnHoW69L\.mp4", "content": "80e61d1dbc5896ae7ef9a28734c747b28b320471", }), ("https://bunkr.to/a/Lktg9Keq"), ("https://zz.ht/a/lop7W6EZ", { "pattern": r"https://z\.zz\.fo/(4anuY|ih560)\.png", "count": 2, "keyword": { "album_id": "lop7W6EZ", "album_name": "ferris", }, }), ("https://zz.fo/a/lop7W6EZ"), ) def __init__(self, match): LolisafeExtractor.__init__(self, match) self.album_id = match.group(match.lastindex) def items(self): files, data = self.fetch_album(self.album_id) yield Message.Directory, data for data["num"], file in enumerate(files, 1): url = file["file"] text.nameext_from_url(url, data) data["name"], sep, data["id"] = data["filename"].rpartition("-") if data["extension"] == "mp4": url = url.replace( "//cdn.bunkr.is/", "//media-files.bunkr.is/", 1) yield Message.Url, url, data def fetch_album(self, album_id): url = "{}/api/album/get/{}".format(self.root, album_id) data = self.request(url).json() return data["files"], { "album_id" : self.album_id, "album_name": text.unescape(data["title"]), "count" : data["count"], } ���������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1646253139.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/luscious.py��������������������������������������������������0000644�0001750�0001750�00000027273�14207752123�020775� 
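
# lolisafe/chibisafe instances expose albums through "/api/album/get/<id>";
# the extractor above then splits each filename at its last dash to recover
# the "name" and "id" fields.  A compact sketch of that split (the sample
# filename is shortened from the test data above):

def split_lolisafe_filename(filename):
    """Split a "<name>-<id>" style lolisafe filename into its two parts."""
    name, _, file_id = filename.rpartition("-")
    return name, file_id

# split_lolisafe_filename("test-QjgneIQv")  # -> ("test", "QjgneIQv")
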
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2016-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://members.luscious.net/""" from .common import Extractor, Message from .. import text, exception class LusciousExtractor(Extractor): """Base class for luscious extractors""" category = "luscious" cookiedomain = ".luscious.net" root = "https://members.luscious.net" def _graphql(self, op, variables, query): data = { "id" : 1, "operationName": op, "query" : query, "variables" : variables, } response = self.request( "{}/graphql/nobatch/?operationName={}".format(self.root, op), method="POST", json=data, fatal=False, ) if response.status_code >= 400: self.log.debug("Server response: %s", response.text) raise exception.StopExtraction( "GraphQL query failed ('%s %s')", response.status_code, response.reason) return response.json()["data"] class LusciousAlbumExtractor(LusciousExtractor): """Extractor for image albums from luscious.net""" subcategory = "album" filename_fmt = "{category}_{album[id]}_{num:>03}.{extension}" directory_fmt = ("{category}", "{album[id]} {album[title]}") archive_fmt = "{album[id]}_{id}" pattern = (r"(?:https?://)?(?:www\.|members\.)?luscious\.net" r"/(?:albums|pictures/c/[^/?#]+/album)/[^/?#]+_(\d+)") test = ( ("https://luscious.net/albums/okinami-no-koigokoro_277031/", { "url": "7e4984a271a1072ac6483e4228a045895aff86f3", # "content": "b3a747a6464509440bd0ff6d1267e6959f8d6ff3", "keyword": { "album": { "__typename" : "Album", "audiences" : list, "content" : "Hentai", "cover" : "re:https://\\w+.luscious.net/.+/277031/", "created" : 1479625853, "created_by" : "NTRshouldbeillegal", "date" : "dt:2016-11-20 07:10:53", "description" : "Enjoy.", "download_url": "re:/download/(r/)?824778/277031/", "genres" : list, "id" : 277031, "is_manga" : True, "labels" : list, "language" : "English", "like_status" : "none", "modified" : int, "permissions" : list, "rating" : float, "slug" : "okinami-no-koigokoro", "status" : None, "tags" : list, "title" : "Okinami no Koigokoro", "url" : "/albums/okinami-no-koigokoro_277031/", "marked_for_deletion": False, "marked_for_processing": False, "number_of_animated_pictures": 0, "number_of_favorites": int, "number_of_pictures": 18, }, "aspect_ratio": r"re:\d+:\d+", "category" : "luscious", "created" : int, "date" : "type:datetime", "height" : int, "id" : int, "is_animated" : False, "like_status" : "none", "position" : int, "resolution" : r"re:\d+x\d+", "status" : None, "tags" : list, "thumbnail" : str, "title" : str, "width" : int, "number_of_comments": int, "number_of_favorites": int, }, }), ("https://luscious.net/albums/not-found_277035/", { "exception": exception.NotFoundError, }), ("https://members.luscious.net/albums/login-required_323871/", { "count": 64, }), ("https://www.luscious.net/albums/okinami_277031/"), ("https://members.luscious.net/albums/okinami_277031/"), ("https://luscious.net/pictures/c/video_game_manga/album" "/okinami-no-koigokoro_277031/sorted/position/id/16528978/@_1"), ) def __init__(self, match): 
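        # _graphql() above wraps every luscious.net API call in a POST to
        # /graphql/nobatch/?operationName=<op> whose JSON body carries the
        # query text and its variables, roughly (requests-based sketch with a
        # shortened query):
        #
        #   requests.post(
        #       "https://members.luscious.net/graphql/nobatch/"
        #       "?operationName=AlbumGet",
        #       json={"id": 1, "operationName": "AlbumGet",
        #             "query": "query AlbumGet($id: ID!) { ... }",
        #             "variables": {"id": "277031"}},
        #   ).json()["data"]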
LusciousExtractor.__init__(self, match) self.album_id = match.group(1) self.gif = self.config("gif", False) def items(self): album = self.metadata() yield Message.Directory, {"album": album} for num, image in enumerate(self.images(), 1): image["num"] = num image["album"] = album image["thumbnail"] = image.pop("thumbnails")[0]["url"] image["tags"] = [item["text"] for item in image["tags"]] image["date"] = text.parse_timestamp(image["created"]) image["id"] = text.parse_int(image["id"]) url = (image["url_to_original"] or image["url_to_video"] if self.gif else image["url_to_video"] or image["url_to_original"]) yield Message.Url, url, text.nameext_from_url(url, image) def metadata(self): variables = { "id": self.album_id, } query = """ query AlbumGet($id: ID!) { album { get(id: $id) { ... on Album { ...AlbumStandard } ... on MutationError { errors { code message } } } } } fragment AlbumStandard on Album { __typename id title labels description created modified like_status number_of_favorites rating status marked_for_deletion marked_for_processing number_of_pictures number_of_animated_pictures slug is_manga url download_url permissions cover { width height size url } created_by { id name display_name user_title avatar { url size } url } content { id title url } language { id title url } tags { id category text url count } genres { id title slug url } audiences { id title url url } last_viewed_picture { id position url } } """ album = self._graphql("AlbumGet", variables, query)["album"]["get"] if "errors" in album: raise exception.NotFoundError("album") album["audiences"] = [item["title"] for item in album["audiences"]] album["genres"] = [item["title"] for item in album["genres"]] album["tags"] = [item["text"] for item in album["tags"]] album["cover"] = album["cover"]["url"] album["content"] = album["content"]["title"] album["language"] = album["language"]["title"].partition(" ")[0] album["created_by"] = album["created_by"]["display_name"] album["id"] = text.parse_int(album["id"]) album["date"] = text.parse_timestamp(album["created"]) return album def images(self): variables = { "input": { "filters": [{ "name" : "album_id", "value": self.album_id, }], "display": "position", "page" : 1, }, } query = """ query AlbumListOwnPictures($input: PictureListInput!) 
{ picture { list(input: $input) { info { ...FacetCollectionInfo } items { ...PictureStandardWithoutAlbum } } } } fragment FacetCollectionInfo on FacetCollectionInfo { page has_next_page has_previous_page total_items total_pages items_per_page url_complete url_filters_only } fragment PictureStandardWithoutAlbum on Picture { __typename id title created like_status number_of_comments number_of_favorites status width height resolution aspect_ratio url_to_original url_to_video is_animated position tags { id category text url } permissions url thumbnails { width height size url } } """ while True: data = self._graphql("AlbumListOwnPictures", variables, query) yield from data["picture"]["list"]["items"] if not data["picture"]["list"]["info"]["has_next_page"]: return variables["input"]["page"] += 1 class LusciousSearchExtractor(LusciousExtractor): """Extractor for album searches on luscious.net""" subcategory = "search" pattern = (r"(?:https?://)?(?:www\.|members\.)?luscious\.net" r"/albums/list/?(?:\?([^#]+))?") test = ( ("https://members.luscious.net/albums/list/"), ("https://members.luscious.net/albums/list/" "?display=date_newest&language_ids=%2B1&tagged=+full_color&page=1", { "pattern": LusciousAlbumExtractor.pattern, "range": "41-60", "count": 20, }), ) def __init__(self, match): LusciousExtractor.__init__(self, match) self.query = match.group(1) def items(self): query = text.parse_query(self.query) display = query.pop("display", "date_newest") page = query.pop("page", None) variables = { "input": { "display": display, "filters": [{"name": n, "value": v} for n, v in query.items()], "page": text.parse_int(page, 1), }, } query = """ query AlbumListWithPeek($input: AlbumListInput!) { album { list(input: $input) { info { ...FacetCollectionInfo } items { ...AlbumMinimal peek_thumbnails { width height size url } } } } } fragment FacetCollectionInfo on FacetCollectionInfo { page has_next_page has_previous_page total_items total_pages items_per_page url_complete url_filters_only } fragment AlbumMinimal on Album { __typename id title labels description created modified number_of_favorites number_of_pictures slug is_manga url download_url cover { width height size url } content { id title url } language { id title url } tags { id category text url count } genres { id title slug url } audiences { id title url } } """ while True: data = self._graphql("AlbumListWithPeek", variables, query) for album in data["album"]["list"]["items"]: album["url"] = self.root + album["url"] album["_extractor"] = LusciousAlbumExtractor yield Message.Queue, album["url"], album if not data["album"]["list"]["info"]["has_next_page"]: return variables["input"]["page"] += 1 �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1648567962.0 
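
# Both luscious listings above paginate their GraphQL results by incrementing
# variables["input"]["page"] until info["has_next_page"] turns False.  The
# driving loop reduced to its core ('fetch_page' stands in for the _graphql
# call and is not a real helper in this module):

def iterate_graphql_items(fetch_page):
    """Yield items from a page-numbered GraphQL listing until exhausted."""
    page = 1
    while True:
        listing = fetch_page(page)  # -> {"items": [...], "info": {...}}
        yield from listing["items"]
        if not listing["info"]["has_next_page"]:
            return
        page += 1
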
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/mangadex.py��������������������������������������������������0000644�0001750�0001750�00000024734�14220623232�020703� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://mangadex.org/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache, memcache from ..version import __version__ from collections import defaultdict BASE_PATTERN = r"(?:https?://)?(?:www\.)?mangadex\.(?:org|cc)" class MangadexExtractor(Extractor): """Base class for mangadex extractors""" category = "mangadex" directory_fmt = ( "{category}", "{manga}", "{volume:?v/ />02}c{chapter:>03}{chapter_minor}{title:?: //}") filename_fmt = ( "{manga}_c{chapter:>03}{chapter_minor}_{page:>03}.{extension}") archive_fmt = "{chapter_id}_{page}" root = "https://mangadex.org" _cache = {} _headers = {"User-Agent": "gallery-dl/" + __version__} def __init__(self, match): Extractor.__init__(self, match) self.api = MangadexAPI(self) self.uuid = match.group(1) def items(self): for chapter in self.chapters(): uuid = chapter["id"] data = self._transform(chapter) data["_extractor"] = MangadexChapterExtractor self._cache[uuid] = data yield Message.Queue, self.root + "/chapter/" + uuid, data def _transform(self, chapter): relationships = defaultdict(list) for item in chapter["relationships"]: relationships[item["type"]].append(item) manga = self.api.manga(relationships["manga"][0]["id"]) for item in manga["relationships"]: relationships[item["type"]].append(item) cattributes = chapter["attributes"] mattributes = manga["attributes"] lang = cattributes.get("translatedLanguage") if lang: lang = lang.partition("-")[0] if cattributes["chapter"]: chnum, sep, minor = cattributes["chapter"].partition(".") else: chnum, sep, minor = 0, "", "" data = { "manga" : (mattributes["title"].get("en") or next(iter(mattributes["title"].values()))), "manga_id": manga["id"], "title" : cattributes["title"], "volume" : text.parse_int(cattributes["volume"]), "chapter" : text.parse_int(chnum), "chapter_minor": sep + minor, "chapter_id": chapter["id"], "date" : text.parse_datetime(cattributes["publishAt"]), "lang" : lang, "language": util.code_to_language(lang), "count" : cattributes["pages"], "_external_url": cattributes.get("externalUrl"), } data["artist"] = [artist["attributes"]["name"] for artist in relationships["artist"]] data["author"] = [author["attributes"]["name"] for author in relationships["author"]] data["group"] = 
[group["attributes"]["name"] for group in relationships["scanlation_group"]] return data class MangadexChapterExtractor(MangadexExtractor): """Extractor for manga-chapters from mangadex.org""" subcategory = "chapter" pattern = BASE_PATTERN + r"/chapter/([0-9a-f-]+)" test = ( ("https://mangadex.org/chapter/f946ac53-0b71-4b5d-aeb2-7931b13c4aaa", { "keyword": "86fb262cf767dac6d965cd904ad499adba466404", # "content": "50383a4c15124682057b197d40261641a98db514", }), # oneshot ("https://mangadex.org/chapter/61a88817-9c29-4281-bdf1-77b3c1be9831", { "count": 64, "keyword": "6abcbe1e24eeb1049dc931958853cd767ee483fb", }), # MANGA Plus (#1154) ("https://mangadex.org/chapter/8d50ed68-8298-4ac9-b63d-cb2aea143dd0", { "exception": exception.StopExtraction, }), ) def items(self): try: data = self._cache.pop(self.uuid) except KeyError: chapter = self.api.chapter(self.uuid) data = self._transform(chapter) if data.get("_external_url"): raise exception.StopExtraction( "Chapter %s%s is not available on MangaDex and can instead be " "read on the official publisher's website at %s.", data["chapter"], data["chapter_minor"], data["_external_url"]) yield Message.Directory, data data["_http_headers"] = self._headers server = self.api.athome_server(self.uuid) chapter = server["chapter"] base = "{}/data/{}/".format(server["baseUrl"], chapter["hash"]) enum = util.enumerate_reversed if self.config( "page-reverse") else enumerate for data["page"], page in enum(chapter["data"], 1): text.nameext_from_url(page, data) yield Message.Url, base + page, data class MangadexMangaExtractor(MangadexExtractor): """Extractor for manga from mangadex.org""" subcategory = "manga" pattern = BASE_PATTERN + r"/(?:title|manga)/(?!feed$)([0-9a-f-]+)" test = ( ("https://mangadex.org/title/f90c4398-8aad-4f51-8a1f-024ca09fdcbc", { "keyword": { "manga" : "Souten no Koumori", "manga_id": "f90c4398-8aad-4f51-8a1f-024ca09fdcbc", "title" : "re:One[Ss]hot", "volume" : 0, "chapter" : 0, "chapter_minor": "", "chapter_id": str, "date" : "type:datetime", "lang" : str, "language": str, "artist" : ["Arakawa Hiromu"], "author" : ["Arakawa Hiromu"], }, }), ("https://mangadex.cc/manga/d0c88e3b-ea64-4e07-9841-c1d2ac982f4a/", { "options": (("lang", "en"),), "count": ">= 100", }), ("https://mangadex.org/title/7c1e2742-a086-4fd3-a3be-701fd6cf0be9", { "count": 1, }), ("https://mangadex.org/title/584ef094-b2ab-40ce-962c-bce341fb9d10", { "count": ">= 20", }) ) def chapters(self): return self.api.manga_feed(self.uuid) class MangadexFeedExtractor(MangadexExtractor): """Extractor for chapters from your Followed Feed""" subcategory = "feed" pattern = BASE_PATTERN + r"/title/feed$()" test = ("https://mangadex.org/title/feed",) def chapters(self): return self.api.user_follows_manga_feed() class MangadexAPI(): """Interface for the MangaDex API v5""" def __init__(self, extr): self.extractor = extr self.headers = extr._headers.copy() self.username, self.password = self.extractor._get_auth_info() if not self.username: self.authenticate = util.noop server = extr.config("api-server") self.root = ("https://api.mangadex.org" if server is None else text.ensure_http_scheme(server).rstrip("/")) def athome_server(self, uuid): return self._call("/at-home/server/" + uuid) def chapter(self, uuid): params = {"includes[]": ("scanlation_group",)} return self._call("/chapter/" + uuid, params)["data"] @memcache(keyarg=1) def manga(self, uuid): params = {"includes[]": ("artist", "author")} return self._call("/manga/" + uuid, params)["data"] def manga_feed(self, uuid): order = "desc" if 
self.extractor.config("chapter-reverse") else "asc" params = { "order[volume]" : order, "order[chapter]": order, } return self._pagination("/manga/" + uuid + "/feed", params) def user_follows_manga_feed(self): params = {"order[publishAt]": "desc"} return self._pagination("/user/follows/manga/feed", params) def authenticate(self): self.headers["Authorization"] = \ self._authenticate_impl(self.username, self.password) @cache(maxage=900, keyarg=1) def _authenticate_impl(self, username, password): refresh_token = _refresh_token_cache(username) if refresh_token: self.extractor.log.info("Refreshing access token") url = self.root + "/auth/refresh" data = {"token": refresh_token} else: self.extractor.log.info("Logging in as %s", username) url = self.root + "/auth/login" data = {"username": username, "password": password} data = self.extractor.request( url, method="POST", json=data, fatal=None).json() if data.get("result") != "ok": raise exception.AuthenticationError() if refresh_token != data["token"]["refresh"]: _refresh_token_cache.update(username, data["token"]["refresh"]) return "Bearer " + data["token"]["session"] def _call(self, endpoint, params=None): url = self.root + endpoint while True: self.authenticate() response = self.extractor.request( url, params=params, headers=self.headers, fatal=None) if response.status_code < 400: return response.json() if response.status_code == 429: until = response.headers.get("X-RateLimit-Retry-After") self.extractor.wait(until=until) continue msg = ", ".join('{title}: {detail}'.format_map(error) for error in response.json()["errors"]) raise exception.StopExtraction( "%s %s (%s)", response.status_code, response.reason, msg) def _pagination(self, endpoint, params=None): if params is None: params = {} config = self.extractor.config ratings = config("ratings") if ratings is None: ratings = ("safe", "suggestive", "erotica", "pornographic") params["contentRating[]"] = ratings params["includes[]"] = ("scanlation_group",) params["translatedLanguage[]"] = config("lang") params["offset"] = 0 api_params = config("api-parameters") if api_params: params.update(api_params) while True: data = self._call(endpoint, params) yield from data["data"] params["offset"] = data["offset"] + data["limit"] if params["offset"] >= data["total"]: return @cache(maxage=28*24*3600, keyarg=0) def _refresh_token_cache(username): return None ������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/mangafox.py��������������������������������������������������0000644�0001750�0001750�00000010445�14176336637�020735� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2017-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fanfox.net/""" from .common import ChapterExtractor, MangaExtractor from .. import text import re BASE_PATTERN = r"(?:https?://)?(?:www\.|m\.)?(?:fanfox\.net|mangafox\.me)" class MangafoxChapterExtractor(ChapterExtractor): """Extractor for manga chapters from fanfox.net""" category = "mangafox" root = "https://m.fanfox.net" pattern = BASE_PATTERN + \ r"(/manga/[^/?#]+/((?:v([^/?#]+)/)?c(\d+)([^/?#]*)))" test = ( ("http://fanfox.net/manga/kidou_keisatsu_patlabor/v05/c006.2/1.html", { "keyword": "5661dab258d42d09d98f194f7172fb9851a49766", "content": "5c50c252dcf12ffecf68801f4db8a2167265f66c", }), ("http://mangafox.me/manga/kidou_keisatsu_patlabor/v05/c006.2/"), ("http://fanfox.net/manga/black_clover/vTBD/c295/1.html"), ) def __init__(self, match): base, self.cstr, self.volume, self.chapter, self.minor = match.groups() self.urlbase = self.root + base ChapterExtractor.__init__(self, match, self.urlbase + "/1.html") def metadata(self, page): manga, pos = text.extract(page, "<title>", "") count, pos = text.extract( page, ">", "<", page.find("", pos) - 20) sid , pos = text.extract(page, "var series_id =", ";", pos) cid , pos = text.extract(page, "var chapter_id =", ";", pos) return { "manga": text.unescape(manga), "volume": text.parse_int(self.volume), "chapter": text.parse_int(self.chapter), "chapter_minor": self.minor or "", "chapter_string": self.cstr, "count": text.parse_int(count), "sid": text.parse_int(sid), "cid": text.parse_int(cid), } def images(self, page): pnum = 1 while True: url, pos = text.extract(page, '=60", }), ("https://mangafox.me/manga/shangri_la_frontier", { "pattern": MangafoxChapterExtractor.pattern, "count": ">=45", }), ("https://m.fanfox.net/manga/sentai_daishikkaku"), ) def chapters(self, page): match_info = re.compile(r"Ch (\d+)(\S*)(?: (.*))?").match manga, pos = text.extract(page, '

    ', '

    ') author, pos = text.extract(page, '

    Author(s):', '

    ', pos) data = { "manga" : text.unescape(manga), "author" : text.remove_html(author), "lang" : "en", "language": "English", } results = [] pos = page.index('
    ') while True: url, pos = text.extract(page, '', '', '', pos) match = match_info(text.unescape(info)) if match: chapter, minor, title = match.groups() chapter_minor = minor else: chapter, _, minor = url[:-7].rpartition("/c")[2].partition(".") chapter_minor = "." + minor data["chapter"] = text.parse_int(chapter) data["chapter_minor"] = chapter_minor if minor else "" data["date"] = date results.append(("https://" + url, data.copy())) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1646253139.0 gallery_dl-1.21.1/gallery_dl/extractor/mangahere.py0000644000175000017500000001314514207752123021047 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.mangahere.cc/""" from .common import ChapterExtractor, MangaExtractor from .. import text import re class MangahereBase(): """Base class for mangahere extractors""" category = "mangahere" root = "https://www.mangahere.cc" mobile_root = "https://m.mangahere.cc" url_fmt = mobile_root + "/manga/{}/{}.html" class MangahereChapterExtractor(MangahereBase, ChapterExtractor): """Extractor for manga-chapters from mangahere.cc""" pattern = (r"(?:https?://)?(?:www\.|m\.)?mangahere\.c[co]/manga/" r"([^/]+(?:/v0*(\d+))?/c([^/?#]+))") test = ( ("https://www.mangahere.cc/manga/dongguo_xiaojie/c004.2/", { "keyword": "7c98d7b50a47e6757b089aa875a53aa970cac66f", "content": "708d475f06893b88549cbd30df1e3f9428f2c884", }), # URLs without HTTP scheme (#1070) ("https://www.mangahere.cc/manga/beastars/c196/1.html", { "pattern": "https://zjcdn.mangahere.org/.*", }), ("http://www.mangahere.co/manga/dongguo_xiaojie/c003.2/"), ("http://m.mangahere.co/manga/dongguo_xiaojie/c003.2/"), ) def __init__(self, match): self.part, self.volume, self.chapter = match.groups() url = self.url_fmt.format(self.part, 1) ChapterExtractor.__init__(self, match, url) def metadata(self, page): pos = page.index("") count , pos = text.extract(page, ">", "<", pos - 20) manga_id , pos = text.extract(page, "series_id = ", ";", pos) chapter_id, pos = text.extract(page, "chapter_id = ", ";", pos) manga , pos = text.extract(page, '"name":"', '"', pos) chapter, dot, minor = self.chapter.partition(".") return { "manga": text.unescape(manga), "manga_id": text.parse_int(manga_id), "title": self._get_title(), "volume": text.parse_int(self.volume), "chapter": text.parse_int(chapter), "chapter_minor": dot + minor, "chapter_id": text.parse_int(chapter_id), "count": text.parse_int(count), "lang": "en", "language": "English", } def images(self, page): pnum = 1 while True: url, pos = text.extract(page, '= 50", }), ("https://www.mangahere.co/manga/aria/"), ("https://m.mangahere.co/manga/aria/"), ) def __init__(self, match): MangaExtractor.__init__(self, match) self.session.cookies.set("isAdult", "1", domain="www.mangahere.cc") def chapters(self, page): results = [] manga, pos = text.extract(page, '', '<', pos) date, pos = text.extract(page, 'class="title2">', '<', pos) match = re.match( r"(?:Vol\.0*(\d+) )?Ch\.0*(\d+)(\S*)(?: - (.*))?", info) if match: volume, chapter, minor, title = match.groups() else: chapter, _, minor = url[:-1].rpartition("/c")[2].partition(".") minor = "." 
+ minor volume = 0 title = "" results.append((text.urljoin(self.root, url), { "manga": manga, "title": text.unescape(title) if title else "", "volume": text.parse_int(volume), "chapter": text.parse_int(chapter), "chapter_minor": minor, "date": date, "lang": "en", "language": "English", })) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/mangakakalot.py0000644000175000017500000001037514176336637021571 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020 Jake Mannens # Copyright 2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://mangakakalot.tv/""" from .common import ChapterExtractor, MangaExtractor from .. import text import re class MangakakalotBase(): """Base class for mangakakalot extractors""" category = "mangakakalot" root = "https://ww.mangakakalot.tv" class MangakakalotChapterExtractor(MangakakalotBase, ChapterExtractor): """Extractor for manga chapters from mangakakalot.tv""" pattern = (r"(?:https?://)?(?:www?\.)?mangakakalot\.tv" r"(/chapter/[^/?#]+/chapter[_-][^/?#]+)") test = ( ("https://ww.mangakakalot.tv/chapter/manga-hl984546/chapter-6", { "pattern": r"https://cm\.blazefast\.co" r"/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{32}\.jpg", "keyword": "e9646a76a210f1eb4a71b4134664814c99d65d48", "count": 14, }), (("https://mangakakalot.tv/chapter" "/hatarakanai_futari_the_jobless_siblings/chapter_20.1"), { "keyword": "14c430737ff600b26a3811815905f34dd6a6c8c6", "content": "b3eb1f139caef98d9dcd8ba6a5ee146a13deebc4", "count": 2, }), ) def __init__(self, match): self.path = match.group(1) ChapterExtractor.__init__(self, match, self.root + self.path) self.session.headers['Referer'] = self.root def metadata(self, page): _ , pos = text.extract(page, '', '<') manga , pos = text.extract(page, '', '<', pos) info , pos = text.extract(page, '', '<', pos) author, pos = text.extract(page, '. Author:', ' already has ', pos) match = re.match( r"(?:[Vv]ol\. *(\d+) )?" r"[Cc]hapter *([^:]*)" r"(?:: *(.+))?", info) volume, chapter, title = match.groups() if match else ("", "", info) chapter, sep, minor = chapter.partition(".") return { "manga" : text.unescape(manga), "title" : text.unescape(title) if title else "", "author" : text.unescape(author).strip() if author else "", "volume" : text.parse_int(volume), "chapter" : text.parse_int(chapter), "chapter_minor": sep + minor, "lang" : "en", "language" : "English", } def images(self, page): return [ (url, None) for url in text.extract_iter(page, '= 30", }), ) def chapters(self, page): data = {"lang": "en", "language": "English"} data["manga"], pos = text.extract(page, "

    ", "<") author, pos = text.extract(page, "
  • Author(s) :", "", pos) data["author"] = text.remove_html(author) results = [] for chapter in text.extract_iter(page, '
    ', '
    '): url, pos = text.extract(chapter, '', '', pos) data["title"] = title.partition(": ")[2] data["date"] , pos = text.extract( chapter, '') manga , pos = text.extract(page, '', pos) info , pos = text.extract(page, '', pos) author, pos = text.extract(page, '- Author(s) : ', '

    ', pos) manga, _ = text.extract(manga, '">', '<') info , _ = text.extract(info , '">', '<') match = re.match( r"(?:[Vv]ol\. *(\d+) )?" r"[Cc]hapter *([^:]*)" r"(?:: *(.+))?", info) volume, chapter, title = match.groups() if match else ("", "", info) chapter, sep, minor = chapter.partition(".") return { "manga" : text.unescape(manga), "title" : text.unescape(title) if title else "", "author" : text.unescape(author) if author else "", "volume" : text.parse_int(volume), "chapter" : text.parse_int(chapter), "chapter_minor": sep + minor, "lang" : "en", "language" : "English", } def images(self, page): page = text.extract( page, 'class="container-chapter-reader', '\n= 70", }), ("https://manganelo.com/manga/read_otome_no_teikoku", { "pattern": ManganeloChapterExtractor.pattern, "count": ">= 40", }), ("https://manganelo.com/manga/ol921234/"), ) def __init__(self, match): domain, path = match.groups() MangaExtractor.__init__(self, match, "https://" + domain + path) self.session.headers['Referer'] = self.root def chapters(self, page): results = [] data = self.parse_page(page, {"lang": "en", "language": "English"}) needle = 'class="chapter-name text-nowrap" href="' pos = page.index('
      ') while True: url, pos = text.extract(page, needle, '"', pos) if not url: return results data["title"], pos = text.extract(page, '>', '', pos) data["date"] , pos = text.extract( page, 'class="chapter-time text-nowrap" title="', '">', pos) chapter, sep, minor = url.rpartition("/chapter_")[2].partition(".") data["chapter"] = text.parse_int(chapter) data["chapter_minor"] = sep + minor results.append((url, data.copy())) @staticmethod def parse_page(page, data): """Parse metadata on 'page' and add it to 'data'""" text.extract_all(page, ( ("manga" , '

      ', '

      '), ('author' , 'Author(s) :', ''), ), values=data) data["author"] = text.remove_html(data["author"]) return data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/mangapark.py0000644000175000017500000001401514176336637021073 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://mangapark.net/""" from .common import ChapterExtractor, MangaExtractor from .. import text, exception import json import re class MangaparkBase(): """Base class for mangapark extractors""" category = "mangapark" root_fmt = "https://v2.mangapark.{}" browser = "firefox" @staticmethod def parse_chapter_path(path, data): """Get volume/chapter information from url-path of a chapter""" data["volume"], data["chapter_minor"] = 0, "" for part in path.split("/")[1:]: key, value = part[0], part[1:] if key == "c": chapter, dot, minor = value.partition(".") data["chapter"] = text.parse_int(chapter) data["chapter_minor"] = dot + minor elif key == "i": data["chapter_id"] = text.parse_int(value) elif key == "v": data["volume"] = text.parse_int(value) elif key == "s": data["stream"] = text.parse_int(value) elif key == "e": data["chapter_minor"] = "v" + value @staticmethod def parse_chapter_title(title, data): match = re.search(r"(?i)(?:vol(?:ume)?[ .]*(\d+) )?" r"ch(?:apter)?[ .]*(\d+)(\.\w+)?", title) if match: vol, ch, data["chapter_minor"] = match.groups() data["volume"] = text.parse_int(vol) data["chapter"] = text.parse_int(ch) class MangaparkChapterExtractor(MangaparkBase, ChapterExtractor): """Extractor for manga-chapters from mangapark.net""" pattern = (r"(?:https?://)?(?:www\.|v2\.)?mangapark\.(me|net|com)" r"/manga/([^?#]+/i\d+)") test = ( ("https://mangapark.net/manga/gosu/i811653/c055/1", { "count": 50, "keyword": "db1ed9af4f972756a25dbfa5af69a8f155b043ff", }), (("https://mangapark.net/manga" "/ad-astra-per-aspera-hata-kenjirou/i662051/c001.2/1"), { "count": 40, "keyword": "2bb3a8f426383ea13f17ff5582f3070d096d30ac", }), (("https://mangapark.net/manga" "/gekkan-shoujo-nozaki-kun/i2067426/v7/c70/1"), { "count": 15, "keyword": "edc14993c4752cee3a76e09b2f024d40d854bfd1", }), ("https://mangapark.me/manga/gosu/i811615/c55/1"), ("https://mangapark.com/manga/gosu/i811615/c55/1"), ) def __init__(self, match): tld, self.path = match.groups() self.root = self.root_fmt.format(tld) url = "{}/manga/{}?zoom=2".format(self.root, self.path) ChapterExtractor.__init__(self, match, url) def metadata(self, page): data = text.extract_all(page, ( ("manga_id" , "var _manga_id = '", "'"), ("chapter_id", "var _book_id = '", "'"), ("stream" , "var _stream = '", "'"), ("path" , "var _book_link = '", "'"), ("manga" , "

      ", "

      "), ("title" , "", "<"), ), values={"lang": "en", "language": "English"})[0] if not data["path"]: raise exception.NotFoundError("chapter") self.parse_chapter_path(data["path"], data) if "chapter" not in data: self.parse_chapter_title(data["title"], data) data["manga"], _, data["type"] = data["manga"].rpartition(" ") data["manga"] = text.unescape(data["manga"]) data["title"] = data["title"].partition(": ")[2] for key in ("manga_id", "chapter_id", "stream"): data[key] = text.parse_int(data[key]) return data def images(self, page): data = json.loads(text.extract(page, "var _load_pages =", ";")[0]) return [ (text.urljoin(self.root, item["u"]), { "width": text.parse_int(item["w"]), "height": text.parse_int(item["h"]), }) for item in data ] class MangaparkMangaExtractor(MangaparkBase, MangaExtractor): """Extractor for manga from mangapark.net""" chapterclass = MangaparkChapterExtractor pattern = (r"(?:https?://)?(?:www\.|v2\.)?mangapark\.(me|net|com)" r"(/manga/[^/?#]+)/?$") test = ( ("https://mangapark.net/manga/aria", { "url": "b8f7db2f581404753c4af37af66c049a41273b94", "keyword": "2c0d28efaf84fcfe62932b6931ef3c3987cd48c0", }), ("https://mangapark.me/manga/aria"), ("https://mangapark.com/manga/aria"), ) def __init__(self, match): self.root = self.root_fmt.format(match.group(1)) MangaExtractor.__init__(self, match, self.root + match.group(2)) def chapters(self, page): results = [] data = {"lang": "en", "language": "English"} data["manga"] = text.unescape( text.extract(page, '', ' Manga - ')[0]) for stream in page.split('<div id="stream_')[1:]: data["stream"] = text.parse_int(text.extract(stream, '', '"')[0]) for chapter in text.extract_iter(stream, '<li ', '</li>'): path , pos = text.extract(chapter, 'href="', '"') title1, pos = text.extract(chapter, '>', '<', pos) title2, pos = text.extract(chapter, '>: </span>', '<', pos) count , pos = text.extract(chapter, ' of ', ' ', pos) self.parse_chapter_path(path[8:], data) if "chapter" not in data: self.parse_chapter_title(title1, data) if title2: data["title"] = title2.strip() else: data["title"] = title1.partition(":")[2].strip() data["count"] = text.parse_int(count) results.append((self.root + path, data.copy())) data.pop("chapter", None) return results �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1648649898.0 
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/mangasee.py��������������������������������������������������0000644�0001750�0001750�00000007235�14221063252�020675� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://mangasee123.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text import json class MangaseeBase(): category = "mangasee" browser = "firefox" root = "https://mangasee123.com" @staticmethod def _transform_chapter(data): chapter = data["Chapter"] return { "title" : data["ChapterName"] or "", "index" : chapter[0], "chapter" : int(chapter[1:-1]), "chapter_minor": "" if chapter[-1] == "0" else "." + chapter[-1], "chapter_string": chapter, "lang" : "en", "language": "English", "date" : text.parse_datetime( data["Date"], "%Y-%m-%d %H:%M:%S"), } class MangaseeChapterExtractor(MangaseeBase, ChapterExtractor): pattern = r"(?:https?://)?mangasee123\.com(/read-online/[^/?#]+\.html)" test = (("https://mangasee123.com/read-online" "/Tokyo-Innocent-chapter-4.5-page-1.html"), { "pattern": r"https://[^/]+/manga/Tokyo-Innocent/0004\.5-00\d\.png", "count": 8, "keyword": { "chapter": 4, "chapter_minor": ".5", "chapter_string": "100045", "count": 8, "date": "dt:2020-01-20 21:52:53", "extension": "png", "filename": r"re:0004\.5-00\d", "index": "1", "lang": "en", "language": "English", "manga": "Tokyo Innocent", "page": int, "title": "", }, }) def metadata(self, page): extr = text.extract_from(page) self.chapter = data = json.loads(extr("vm.CurChapter =", ";\r\n")) self.domain = extr('vm.CurPathName = "', '"') self.slug = extr('vm.IndexName = "', '"') data = self._transform_chapter(data) data["manga"] = text.unescape(extr('vm.SeriesName = "', '"')) return data def images(self, page): chapter = self.chapter["Chapter"][1:] if chapter[-1] == "0": chapter = chapter[:-1] else: chapter = chapter[:-1] + "." 
+ chapter[-1] base = "https://{}/manga/{}/".format(self.domain, self.slug) if self.chapter["Directory"]: base += self.chapter["Directory"] + "/" base += chapter + "-" return [ ("{}{:>03}.png".format(base, i), None) for i in range(1, int(self.chapter["Page"]) + 1) ] class MangaseeMangaExtractor(MangaseeBase, MangaExtractor): chapterclass = MangaseeChapterExtractor pattern = r"(?:https?://)?mangasee123\.com(/manga/[^/?#]+)" test = (("https://mangasee123.com/manga" "/Nakamura-Koedo-To-Daizu-Keisuke-Wa-Umaku-Ikanai"), { "pattern": MangaseeChapterExtractor.pattern, "count": ">= 17", }) def chapters(self, page): slug, pos = text.extract(page, 'vm.IndexName = "', '"') chapters = json.loads(text.extract( page, "vm.Chapters = ", ";\r\n", pos)[0]) result = [] for data in map(self._transform_chapter, chapters): url = "{}/read-online/{}-chapter-{}{}".format( self.root, slug, data["chapter"], data["chapter_minor"]) if data["index"] != "1": url += "-index-" + data["index"] url += "-page-1.html" data["manga"] = slug result.append((url, data)) return result �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/mangoxo.py���������������������������������������������������0000644�0001750�0001750�00000014460�14176336637�020606� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.mangoxo.com/""" from .common import Extractor, Message from .. 
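# Illustrative sketch of the chapter encoding decoded by
# MangaseeBase._transform_chapter above: in the site's "Chapter" codes the
# first digit is an index, the last digit is the decimal part, and the
# digits in between are the chapter number. The code "100045" comes from
# the chapter test case above; the function name is made up for
# illustration.

def decode_mangasee_chapter(code):
    return {
        "index"        : code[0],
        "chapter"      : int(code[1:-1]),
        "chapter_minor": "" if code[-1] == "0" else "." + code[-1],
    }

print(decode_mangasee_chapter("100045"))
# {'index': '1', 'chapter': 4, 'chapter_minor': '.5'}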
import text, exception from ..cache import cache import hashlib import time class MangoxoExtractor(Extractor): """Base class for mangoxo extractors""" category = "mangoxo" root = "https://www.mangoxo.com" cookiedomain = "www.mangoxo.com" cookienames = ("SESSION",) _warning = True def login(self): username, password = self._get_auth_info() if username: self._update_cookies(self._login_impl(username, password)) elif MangoxoExtractor._warning: MangoxoExtractor._warning = False self.log.warning("Unauthenticated users cannot see " "more than 5 images per album") @cache(maxage=3*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/login" page = self.request(url).text token = text.extract(page, 'id="loginToken" value="', '"')[0] url = self.root + "/api/login" headers = { "X-Requested-With": "XMLHttpRequest", "Referer": self.root + "/login", } data = self._sign_by_md5(username, password, token) response = self.request(url, method="POST", headers=headers, data=data) data = response.json() if str(data.get("result")) != "1": raise exception.AuthenticationError(data.get("msg")) return {"SESSION": self.session.cookies.get("SESSION")} @staticmethod def _sign_by_md5(username, password, token): # https://dns.mangoxo.com/libs/plugins/phoenix-ui/js/phoenix-ui.js params = [ ("username" , username), ("password" , password), ("token" , token), ("timestamp", str(int(time.time()))), ] query = "&".join("=".join(item) for item in sorted(params)) query += "&secretKey=340836904" sign = hashlib.md5(query.encode()).hexdigest() params.append(("sign", sign.upper())) return params @staticmethod def _total_pages(page): return text.parse_int(text.extract(page, "total :", ",")[0]) class MangoxoAlbumExtractor(MangoxoExtractor): """Extractor for albums on mangoxo.com""" subcategory = "album" filename_fmt = "{album[id]}_{num:>03}.{extension}" directory_fmt = ("{category}", "{channel[name]}", "{album[name]}") archive_fmt = "{album[id]}_{num}" pattern = r"(?:https?://)?(?:www\.)?mangoxo\.com/album/(\w+)" test = ("https://www.mangoxo.com/album/lzVOv1Q9", { "url": "ad921fe62663b06e7d73997f7d00646cab7bdd0d", "keyword": { "channel": { "id": "gaxO16d8", "name": "Phoenix", "cover": str, }, "album": { "id": "lzVOv1Q9", "name": "re:池永康晟 Ikenaga Yasunari 透出古朴", "date": "dt:2019-03-22 14:42:00", "description": str, }, "id": int, "num": int, "count": 65, }, }) def __init__(self, match): MangoxoExtractor.__init__(self, match) self.album_id = match.group(1) def items(self): self.login() url = "{}/album/{}/".format(self.root, self.album_id) page = self.request(url).text data = self.metadata(page) imgs = self.images(url, page) yield Message.Directory, data data["extension"] = None for data["num"], path in enumerate(imgs, 1): data["id"] = text.parse_int(text.extract(path, "=", "&")[0]) url = self.root + "/external/" + path.rpartition("url=")[2] yield Message.Url, url, text.nameext_from_url(url, data) def metadata(self, page): """Return general metadata""" extr = text.extract_from(page) title = extr('<img id="cover-img" alt="', '"') cid = extr('href="https://www.mangoxo.com/user/', '"') cname = extr('<img alt="', '"') cover = extr(' src="', '"') count = extr('id="pic-count">', '<') date = extr('class="fa fa-calendar"></i>', '<') descr = extr('<pre>', '</pre>') return { "channel": { "id": cid, "name": text.unescape(cname), "cover": cover, }, "album": { "id": self.album_id, "name": text.unescape(title), "date": text.parse_datetime(date.strip(), "%Y.%m.%d %H:%M"), "description": 
text.unescape(descr), }, "count": text.parse_int(count), } def images(self, url, page): """Generator; Yields all image URLs""" total = self._total_pages(page) num = 1 while True: yield from text.extract_iter( page, 'class="lightgallery-item" href="', '"') if num >= total: return num += 1 page = self.request(url + str(num)).text class MangoxoChannelExtractor(MangoxoExtractor): """Extractor for all albums on a mangoxo channel""" subcategory = "channel" pattern = r"(?:https?://)?(?:www\.)?mangoxo\.com/(\w+)/album" test = ("https://www.mangoxo.com/phoenix/album", { "pattern": MangoxoAlbumExtractor.pattern, "range": "1-30", "count": "> 20", }) def __init__(self, match): MangoxoExtractor.__init__(self, match) self.user = match.group(1) def items(self): self.login() num = total = 1 url = "{}/{}/album/".format(self.root, self.user) data = {"_extractor": MangoxoAlbumExtractor} while True: page = self.request(url + str(num)).text for album in text.extract_iter( page, '<a class="link black" href="', '"'): yield Message.Queue, album, data if num == 1: total = self._total_pages(page) if num >= total: return num += 1 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/mastodon.py��������������������������������������������������0000644�0001750�0001750�00000021452�14176336637�020761� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Mastodon instances""" from .common import BaseExtractor, Message from .. 
import text, exception from ..cache import cache class MastodonExtractor(BaseExtractor): """Base class for mastodon extractors""" basecategory = "mastodon" directory_fmt = ("mastodon", "{instance}", "{account[username]}") filename_fmt = "{category}_{id}_{media[id]}.{extension}" archive_fmt = "{media[id]}" cookiedomain = None def __init__(self, match): BaseExtractor.__init__(self, match) self.instance = self.root.partition("://")[2] self.item = match.group(match.lastindex) self.reblogs = self.config("reblogs", False) self.replies = self.config("replies", True) def items(self): for status in self.statuses(): if not self.reblogs and status["reblog"]: self.log.debug("Skipping %s (reblog)", status["id"]) continue if not self.replies and status["in_reply_to_id"]: self.log.debug("Skipping %s (reply)", status["id"]) continue attachments = status["media_attachments"] del status["media_attachments"] status["instance"] = self.instance status["tags"] = [tag["name"] for tag in status["tags"]] status["date"] = text.parse_datetime( status["created_at"][:19], "%Y-%m-%dT%H:%M:%S") yield Message.Directory, status for media in attachments: status["media"] = media url = media["url"] yield Message.Url, url, text.nameext_from_url(url, status) def statuses(self): """Return an iterable containing all relevant Status objects""" return () INSTANCES = { "mastodon.social": { "root" : "https://mastodon.social", "access-token" : "Y06R36SMvuXXN5_wiPKFAEFiQaMSQg0o_hGgc86Jj48", "client-id" : "dBSHdpsnOUZgxOnjKSQrWEPakO3ctM7HmsyoOd4FcRo", "client-secret": "DdrODTHs_XoeOsNVXnILTMabtdpWrWOAtrmw91wU1zI", }, "pawoo": { "root" : "https://pawoo.net", "access-token" : "c12c9d275050bce0dc92169a28db09d7" "0d62d0a75a8525953098c167eacd3668", "client-id" : "978a25f843ec01e53d09be2c290cd75c" "782bc3b7fdbd7ea4164b9f3c3780c8ff", "client-secret": "9208e3d4a7997032cf4f1b0e12e5df38" "8428ef1fadb446dcfeb4f5ed6872d97b", }, "baraag": { "root" : "https://baraag.net", "access-token" : "53P1Mdigf4EJMH-RmeFOOSM9gdSDztmrAYFgabOKKE0", "client-id" : "czxx2qilLElYHQ_sm-lO8yXuGwOHxLX9RYYaD0-nq1o", "client-secret": "haMaFdMBgK_-BIxufakmI2gFgkYjqmgXGEO2tB-R2xY", } } BASE_PATTERN = MastodonExtractor.update(INSTANCES) class MastodonUserExtractor(MastodonExtractor): """Extractor for all images of an account/user""" subcategory = "user" pattern = BASE_PATTERN + r"/(?:@|users/)([^/?#]+)(?:/media)?/?$" test = ( ("https://mastodon.social/@jk", { "pattern": r"https://files.mastodon.social/media_attachments" r"/files/(\d+/){3,}original/\w+", "range": "1-60", "count": 60, }), ("https://pawoo.net/@yoru_nine/", { "range": "1-60", "count": 60, }), ("https://baraag.net/@pumpkinnsfw"), ("https://mastodon.social/@id:10843"), ("https://mastodon.social/users/id:10843"), ("https://mastodon.social/users/jk"), ) def statuses(self): api = MastodonAPI(self) return api.account_statuses( api.account_id_by_username(self.item), only_media=not self.config("text-posts", False), exclude_replies=not self.replies, ) class MastodonFollowingExtractor(MastodonExtractor): """Extractor for followed mastodon users""" subcategory = "following" pattern = BASE_PATTERN + r"/users/([^/?#]+)/following" test = ( ("https://mastodon.social/users/0x4f/following", { "extractor": False, "count": ">= 20", }), ("https://mastodon.social/users/id:10843/following"), ("https://pawoo.net/users/yoru_nine/following"), ("https://baraag.net/users/pumpkinnsfw/following"), ) def items(self): api = MastodonAPI(self) account_id = api.account_id_by_username(self.item) for account in api.account_following(account_id): 
account["_extractor"] = MastodonUserExtractor yield Message.Queue, account["url"], account class MastodonStatusExtractor(MastodonExtractor): """Extractor for images from a status""" subcategory = "status" pattern = BASE_PATTERN + r"/@[^/?#]+/(\d+)" test = ( ("https://mastodon.social/@jk/103794036899778366", { "count": 4, }), ("https://pawoo.net/@yoru_nine/105038878897832922", { "content": "b52e807f8ab548d6f896b09218ece01eba83987a", }), ("https://baraag.net/@pumpkinnsfw/104364170556898443", { "content": "67748c1b828c58ad60d0fe5729b59fb29c872244", }), ) def statuses(self): return (MastodonAPI(self).status(self.item),) class MastodonAPI(): """Minimal interface for the Mastodon API https://docs.joinmastodon.org/ https://github.com/tootsuite/mastodon """ def __init__(self, extractor): self.root = extractor.root self.extractor = extractor access_token = extractor.config("access-token") if access_token is None or access_token == "cache": access_token = _access_token_cache(extractor.instance) if not access_token: try: access_token = INSTANCES[extractor.category]["access-token"] except (KeyError, TypeError): raise exception.StopExtraction( "Missing access token.\n" "Run 'gallery-dl oauth:mastodon:%s' to obtain one.", extractor.instance) self.headers = {"Authorization": "Bearer " + access_token} def account_id_by_username(self, username): if username.startswith("id:"): return username[3:] handle = "@{}@{}".format(username, self.extractor.instance) for account in self.account_search(handle, 1): if account["username"] == username: return account["id"] raise exception.NotFoundError("account") def account_following(self, account_id): endpoint = "/v1/accounts/{}/following".format(account_id) return self._pagination(endpoint, None) def account_search(self, query, limit=40): """Search for accounts""" endpoint = "/v1/accounts/search" params = {"q": query, "limit": limit} return self._call(endpoint, params).json() def account_statuses(self, account_id, only_media=True, exclude_replies=False): """Fetch an account's statuses""" endpoint = "/v1/accounts/{}/statuses".format(account_id) params = {"only_media" : "1" if only_media else "0", "exclude_replies": "1" if exclude_replies else "0"} return self._pagination(endpoint, params) def status(self, status_id): """Fetch a status""" endpoint = "/v1/statuses/" + status_id return self._call(endpoint).json() def _call(self, endpoint, params=None): if endpoint.startswith("http"): url = endpoint else: url = self.root + "/api" + endpoint while True: response = self.extractor.request( url, params=params, headers=self.headers, fatal=None) code = response.status_code if code < 400: return response if code == 404: raise exception.NotFoundError() if code == 429: self.extractor.wait(until=text.parse_datetime( response.headers["x-ratelimit-reset"], "%Y-%m-%dT%H:%M:%S.%fZ", )) continue raise exception.StopExtraction(response.json().get("error")) def _pagination(self, endpoint, params): url = endpoint while url: response = self._call(url, params) yield from response.json() url = response.links.get("next") if not url: return url = url["url"] params = None @cache(maxage=100*365*24*3600, keyarg=0) def _access_token_cache(instance): return None 
gallery_dl-1.21.1/gallery_dl/extractor/mememuseum.py
# -*- coding: utf-8 -*-

# Copyright 2022 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://meme.museum/"""

from .common import Extractor, Message
from ..
import text class MememuseumExtractor(Extractor): """Base class for meme.museum extractors""" basecategory = "booru" category = "mememuseum" filename_fmt = "{category}_{id}_{md5}.{extension}" archive_fmt = "{id}" root = "https://meme.museum" def items(self): data = self.metadata() for post in self.posts(): url = post["file_url"] for key in ("id", "width", "height"): post[key] = text.parse_int(post[key]) post["tags"] = text.unquote(post["tags"]) post.update(data) yield Message.Directory, post yield Message.Url, url, text.nameext_from_url(url, post) def metadata(self): """Return general metadata""" return () def posts(self): """Return an iterable containing data of all relevant posts""" return () class MememuseumTagExtractor(MememuseumExtractor): """Extractor for images from meme.museum by search-tags""" subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") pattern = r"(?:https?://)?meme\.museum/post/list/([^/?#]+)" test = ("https://meme.museum/post/list/animated/1", { "pattern": r"https://meme\.museum/_images/\w+/\d+%20-%20", "count": ">= 30" }) per_page = 25 def __init__(self, match): MememuseumExtractor.__init__(self, match) self.tags = text.unquote(match.group(1)) def metadata(self): return {"search_tags": self.tags} def posts(self): pnum = 1 while True: url = "{}/post/list/{}/{}".format(self.root, self.tags, pnum) extr = text.extract_from(self.request(url).text) while True: mime = extr("data-mime='", "'") if not mime: break pid = extr("data-post-id='", "'") tags, dimensions, size = extr("title='", "'").split(" // ") md5 = extr("/_thumbs/", "/") width, _, height = dimensions.partition("x") yield { "file_url": "{}/_images/{}/{}%20-%20{}.{}".format( self.root, md5, pid, text.quote(tags), mime.rpartition("/")[2]), "id": pid, "md5": md5, "tags": tags, "width": width, "height": height, "size": text.parse_bytes(size[:-1]), } if not extr(">Next<", ">"): return pnum += 1 class MememuseumPostExtractor(MememuseumExtractor): """Extractor for single images from meme.museum""" subcategory = "post" pattern = r"(?:https?://)?meme\.museum/post/view/(\d+)" test = ("https://meme.museum/post/view/10243", { "pattern": r"https://meme\.museum/_images/105febebcd5ca791ee332adc4997" r"1f78/10243%20-%20g%20beard%20open_source%20richard_stallm" r"an%20stallman%20tagme%20text\.jpg", "keyword": "3c8009251480cf17248c08b2b194dc0c4d59580e", "content": "45565f3f141fc960a8ae1168b80e718a494c52d2", }) def __init__(self, match): MememuseumExtractor.__init__(self, match) self.post_id = match.group(1) def posts(self): url = "{}/post/view/{}".format(self.root, self.post_id) extr = text.extract_from(self.request(url).text) return ({ "id" : self.post_id, "tags" : extr(": ", "<"), "md5" : extr("/_thumbs/", "/"), "file_url": self.root + extr("id='main_image' src='", "'"), "width" : extr("data-width=", " ").strip("'\""), "height" : extr("data-height=", " ").strip("'\""), "size" : 0, },) ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
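# Illustrative sketch of how MememuseumTagExtractor.posts above takes apart
# a thumbnail's title attribute of the form "tags // WIDTHxHEIGHT // SIZE"
# and rebuilds the full-size image URL from md5, post id, tags, and mime
# type. The example values loosely follow the post test case above; the md5
# is a dummy and the helper name is made up for illustration.

from urllib.parse import quote

def parse_thumb(pid, md5, mime, title):
    tags, dimensions, size = title.split(" // ")
    width, _, height = dimensions.partition("x")
    url = "https://meme.museum/_images/{}/{}%20-%20{}.{}".format(
        md5, pid, quote(tags), mime.rpartition("/")[2])
    return {"id": pid, "tags": tags, "width": width, "height": height,
            "size": size, "file_url": url}

print(parse_thumb("10243", "0" * 32, "image/jpeg",
                  "tagme text // 640x480 // 57K"))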
gallery_dl-1.21.1/gallery_dl/extractor/message.py
# -*- coding: utf-8 -*-

# Copyright 2015-2021 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.


class Message():
    """Enum for message identifiers

    Extractors yield their results as message-tuples, where the first element
    is one of the following identifiers. This message-identifier determines
    the type and meaning of the other elements in such a tuple.

    - Message.Version:
     - Message protocol version (currently always '1')
     - 2nd element specifies the version of all following messages as integer

    - Message.Directory:
     - Sets the target directory for all following images
     - 2nd element is a dictionary containing general metadata

    - Message.Url:
     - Image URL and its metadata
     - 2nd element is the URL as a string
     - 3rd element is a dictionary with image-specific metadata

    - Message.Headers:  # obsolete
     - HTTP headers to use while downloading
     - 2nd element is a dictionary with header-name and -value pairs

    - Message.Cookies:  # obsolete
     - Cookies to use while downloading
     - 2nd element is a dictionary with cookie-name and -value pairs

    - Message.Queue:
     - (External) URL that should be handled by another extractor
     - 2nd element is the (external) URL as a string
     - 3rd element is a dictionary containing URL-specific metadata

    - Message.Urllist:  # obsolete
     - Same as Message.Url, but its 2nd element is a list of multiple URLs
     - The additional URLs serve as a fallback if the primary one fails
    """
    Version = 1
    Directory = 2
    Url = 3
    # Headers = 4
    # Cookies = 5
    Queue = 6
    # Urllist = 7
    # Metadata = 8


gallery_dl-1.21.1/gallery_dl/extractor/moebooru.py
# -*- coding: utf-8 -*-

# Copyright 2020-2021 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for Moebooru based sites"""

from .booru import BooruExtractor
from ..
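# Illustrative sketch of the message protocol documented in message.py
# above: an extractor yields (identifier, ...) tuples and a consumer
# dispatches on the first element. The toy URLs and metadata below are
# invented for demonstration.

from gallery_dl.extractor.message import Message

def toy_extractor():
    yield Message.Directory, {"category": "example", "title": "demo"}
    yield Message.Url, "https://example.org/1.jpg", {"num": 1}
    yield Message.Url, "https://example.org/2.jpg", {"num": 2}

for msg in toy_extractor():
    if msg[0] == Message.Directory:
        print("directory:", msg[1]["title"])
    elif msg[0] == Message.Url:
        print("download :", msg[1], msg[2])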
import text import collections import datetime import re class MoebooruExtractor(BooruExtractor): """Base class for Moebooru extractors""" basecategory = "moebooru" filename_fmt = "{category}_{id}_{md5}.{extension}" page_start = 1 @staticmethod def _prepare(post): post["date"] = text.parse_timestamp(post["created_at"]) def _extended_tags(self, post): url = "{}/post/show/{}".format(self.root, post["id"]) page = self.request(url).text html = text.extract(page, '<ul id="tag-', '</ul>')[0] if html: tags = collections.defaultdict(list) pattern = re.compile(r"tag-type-([^\"' ]+).*?[?;]tags=([^\"'+]+)") for tag_type, tag_name in pattern.findall(html): tags[tag_type].append(text.unquote(tag_name)) for key, value in tags.items(): post["tags_" + key] = " ".join(value) def _pagination(self, url, params): params["page"] = self.page_start params["limit"] = self.per_page while True: posts = self.request(url, params=params).json() yield from posts if len(posts) < self.per_page: return params["page"] += 1 BASE_PATTERN = MoebooruExtractor.update({ "yandere": { "root": "https://yande.re", }, "konachan": { "root": "https://konachan.com", "pattern": r"konachan\.(?:com|net)", }, "hypnohub": { "root": "https://hypnohub.net", }, "sakugabooru": { "root": "https://www.sakugabooru.com", "pattern": r"(?:www\.)?sakugabooru\.com", }, "lolibooru": { "root": "https://lolibooru.moe", }, }) class MoebooruPostExtractor(MoebooruExtractor): subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/post/show/(\d+)" test = ( ("https://yande.re/post/show/51824", { "content": "59201811c728096b2d95ce6896fd0009235fe683", "options": (("tags", True),), "keyword": { "tags_artist": "sasaki_tamaru", "tags_circle": "softhouse_chara", "tags_copyright": "ouzoku", "tags_general": str, }, }), ("https://konachan.com/post/show/205189", { "content": "674e75a753df82f5ad80803f575818b8e46e4b65", "options": (("tags", True),), "keyword": { "tags_artist": "patata", "tags_character": "clownpiece", "tags_copyright": "touhou", "tags_general": str, }, }), ("https://konachan.net/post/show/205189"), ("https://hypnohub.net/post/show/73964", { "content": "02d5f5a8396b621a6efc04c5f8ef1b7225dfc6ee", }), ("https://www.sakugabooru.com/post/show/125570"), ("https://lolibooru.moe/post/show/287835"), ) def __init__(self, match): MoebooruExtractor.__init__(self, match) self.post_id = match.group(match.lastindex) def posts(self): params = {"tags": "id:" + self.post_id} return self.request(self.root + "/post.json", params=params).json() class MoebooruTagExtractor(MoebooruExtractor): subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/post\?(?:[^&#]*&)*tags=([^&#]+)" test = ( ("https://yande.re/post?tags=ouzoku+armor", { "content": "59201811c728096b2d95ce6896fd0009235fe683", }), ("https://konachan.com/post?tags=patata", { "content": "838cfb815e31f48160855435655ddf7bfc4ecb8d", }), ("https://konachan.net/post?tags=patata"), ("https://hypnohub.net/post?tags=gonoike_biwa", { "url": "072330c34a1e773d0cafd00e64b8060d34b078b6", }), ("https://www.sakugabooru.com/post?tags=nichijou"), ("https://lolibooru.moe/post?tags=ruu_%28tksymkw%29"), ) def __init__(self, match): MoebooruExtractor.__init__(self, match) tags = match.group(match.lastindex) self.tags = text.unquote(tags.replace("+", " ")) def metadata(self): return {"search_tags": self.tags} def posts(self): params = {"tags": self.tags} return self._pagination(self.root + "/post.json", params) class 
MoebooruPoolExtractor(MoebooruExtractor): subcategory = "pool" directory_fmt = ("{category}", "pool", "{pool}") archive_fmt = "p_{pool}_{id}" pattern = BASE_PATTERN + r"/pool/show/(\d+)" test = ( ("https://yande.re/pool/show/318", { "content": "2a35b9d6edecce11cc2918c6dce4de2198342b68", }), ("https://konachan.com/pool/show/95", { "content": "cf0546e38a93c2c510a478f8744e60687b7a8426", }), ("https://konachan.net/pool/show/95"), ("https://hypnohub.net/pool/show/61", { "url": "fd74991c8729e77acd3c35eb6ddc4128ff445adf", }), ("https://www.sakugabooru.com/pool/show/54"), ("https://lolibooru.moe/pool/show/239"), ) def __init__(self, match): MoebooruExtractor.__init__(self, match) self.pool_id = match.group(match.lastindex) def metadata(self): return {"pool": text.parse_int(self.pool_id)} def posts(self): params = {"tags": "pool:" + self.pool_id} return self._pagination(self.root + "/post.json", params) class MoebooruPopularExtractor(MoebooruExtractor): subcategory = "popular" directory_fmt = ("{category}", "popular", "{scale}", "{date}") archive_fmt = "P_{scale[0]}_{date}_{id}" pattern = BASE_PATTERN + \ r"/post/popular_(by_(?:day|week|month)|recent)(?:\?([^#]*))?" test = ( ("https://yande.re/post/popular_by_month?month=6&year=2014", { "count": 40, }), ("https://yande.re/post/popular_recent"), ("https://konachan.com/post/popular_by_month?month=11&year=2010", { "count": 20, }), ("https://konachan.com/post/popular_recent"), ("https://konachan.net/post/popular_recent"), ("https://hypnohub.net/post/popular_by_month?month=6&year=2014", { "count": 20, }), ("https://hypnohub.net/post/popular_recent"), ("https://www.sakugabooru.com/post/popular_recent"), ("https://lolibooru.moe/post/popular_recent"), ) def __init__(self, match): MoebooruExtractor.__init__(self, match) self.scale = match.group(match.lastindex-1) self.query = match.group(match.lastindex) def metadata(self): self.params = params = text.parse_query(self.query) if "year" in params: date = "{:>04}-{:>02}-{:>02}".format( params["year"], params.get("month", "01"), params.get("day", "01"), ) else: date = datetime.date.today().isoformat() scale = self.scale if scale.startswith("by_"): scale = scale[3:] if scale == "week": date = datetime.date.fromisoformat(date) date = (date - datetime.timedelta(days=date.weekday())).isoformat() elif scale == "month": date = date[:-3] return {"date": date, "scale": scale} def posts(self): url = "{}/post/popular_{}.json".format(self.root, self.scale) return self.request(url, params=self.params).json() �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1641689335.0 
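# Illustrative sketch of the date normalization performed by
# MoebooruPopularExtractor.metadata above: "by_week" dates snap back to the
# Monday of that week, "by_month" dates are truncated to YYYY-MM. The
# sample date corresponds to the popular_by_month test URL above
# (month=6&year=2014); the helper name is made up for illustration.

import datetime

def normalize_popular_date(date_str, scale):
    if scale == "week":
        date = datetime.date.fromisoformat(date_str)
        return (date - datetime.timedelta(days=date.weekday())).isoformat()
    if scale == "month":
        return date_str[:-3]
    return date_str

print(normalize_popular_date("2014-06-01", "week"))   # -> "2014-05-26"
print(normalize_popular_date("2014-06-01", "month"))  # -> "2014-06"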
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/myhentaigallery.py�������������������������������������������0000644�0001750�0001750�00000004721�14166430367�022325� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract hentai-gallery from https://myhentaigallery.com/""" from .common import GalleryExtractor from .. import text, exception class MyhentaigalleryGalleryExtractor(GalleryExtractor): """Extractor for image galleries from myhentaigallery.com""" category = "myhentaigallery" directory_fmt = ("{category}", "{gallery_id} {artist:?[/] /J, }{title}") pattern = (r"(?:https?://)?myhentaigallery\.com" r"/gallery/(?:thumbnails|show)/(\d+)") test = ( ("https://myhentaigallery.com/gallery/thumbnails/16247", { "pattern": r"https://images.myhentaigrid.com/imagesgallery/images" r"/[^/]+/original/\d+\.jpg", "keyword": { "artist" : list, "count" : 11, "gallery_id": 16247, "group" : list, "parodies" : list, "tags" : ["Giantess"], "title" : "Attack Of The 50ft Woman 1", }, }), ("https://myhentaigallery.com/gallery/show/16247/1"), ) root = "https://myhentaigallery.com" def __init__(self, match): self.gallery_id = match.group(1) url = "{}/gallery/thumbnails/{}".format(self.root, self.gallery_id) GalleryExtractor.__init__(self, match, url) self.session.headers["Referer"] = url def metadata(self, page): extr = text.extract_from(page) split = text.split_html title = extr('<div class="comic-description">\n<h1>', '</h1>') if not title: raise exception.NotFoundError("gallery") return { "title" : text.unescape(title), "gallery_id": text.parse_int(self.gallery_id), "tags" : split(extr('<div>\nCategories:', '</div>')), "artist" : split(extr('<div>\nArtists:' , '</div>')), "group" : split(extr('<div>\nGroups:' , '</div>')), "parodies" : split(extr('<div>\nParodies:' , '</div>')), } def images(self, page): return [ (text.unescape(text.extract(url, 'src="', '"')[0]).replace( "/thumbnail/", "/original/"), None) for url in text.extract_iter(page, 'class="comic-thumb"', '</div>') ] �����������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 
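# Illustrative sketch of the extended format specifiers used in the
# directory_fmt of MyhentaigalleryGalleryExtractor above,
# "{gallery_id} {artist:?[/] /J, }{title}": "?<before>/<after>/" only emits
# its affixes when the value is truthy, and "J<separator>/" joins list
# elements. Exercising it through the module-level formatter.parse() helper
# and its format_map() method is an assumption here; the sample data follows
# the gallery test case above and the expected output is approximate.

from gallery_dl import formatter

fmt = formatter.parse("{gallery_id} {artist:?[/] /J, }{title}")
print(fmt.format_map({
    "gallery_id": 16247,
    "artist"    : ["Artist A", "Artist B"],
    "title"     : "Attack Of The 50ft Woman 1",
}))
# e.g. "16247 [Artist A, Artist B] Attack Of The 50ft Woman 1"
print(fmt.format_map({"gallery_id": 1, "artist": [], "title": "No Artist"}))
# e.g. "1 No Artist"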
mtime=1643756959.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/myportfolio.py�����������������������������������������������0000644�0001750�0001750�00000007576�14176336637�021533� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract images from https://www.myportfolio.com/""" from .common import Extractor, Message from .. import text, exception class MyportfolioGalleryExtractor(Extractor): """Extractor for an image gallery on www.myportfolio.com""" category = "myportfolio" subcategory = "gallery" directory_fmt = ("{category}", "{user}", "{title}") filename_fmt = "{num:>02}.{extension}" archive_fmt = "{user}_{filename}" pattern = (r"(?:myportfolio:(?:https?://)?([^/]+)|" r"(?:https?://)?([\w-]+\.myportfolio\.com))" r"(/[^/?&#]+)?") test = ( ("https://andrewling.myportfolio.com/volvo-xc-90-hybrid", { "url": "acea0690c76db0e5cf267648cefd86e921bc3499", "keyword": "6ac6befe2ee0af921d24cf1dd4a4ed71be06db6d", }), ("https://andrewling.myportfolio.com/", { "pattern": r"https://andrewling\.myportfolio\.com/[^/?#+]+$", "count": ">= 6", }), ("https://stevenilousphotography.myportfolio.com/society", { "exception": exception.NotFoundError, }), # custom domain ("myportfolio:https://tooco.com.ar/6-of-diamonds-paradise-bird", { "count": 3, }), ("myportfolio:https://tooco.com.ar/", { "pattern": pattern, "count": ">= 40", }), ) def __init__(self, match): Extractor.__init__(self, match) domain1, domain2, self.path = match.groups() self.domain = domain1 or domain2 self.prefix = "myportfolio:" if domain1 else "" def items(self): url = "https://" + self.domain + (self.path or "") response = self.request(url) if response.history and response.url.endswith(".adobe.com/missing"): raise exception.NotFoundError() page = response.text projects = text.extract( page, '<section class="project-covers', '</section>')[0] if projects: data = {"_extractor": MyportfolioGalleryExtractor} base = self.prefix + "https://" + self.domain for path in text.extract_iter(projects, ' href="', '"'): yield Message.Queue, base + path, data else: data = self.metadata(page) imgs = self.images(page) data["count"] = len(imgs) yield Message.Directory, data for data["num"], url in enumerate(imgs, 1): yield Message.Url, url, text.nameext_from_url(url, data) @staticmethod def metadata(page): """Collect general image metadata""" # og:title contains data as "<user> - <title>", but both # <user> and <title> can contain a "-" as well, so we get the title # from somewhere else and cut that amount from the og:title content extr = 
text.extract_from(page) user = extr('property="og:title" content="', '"') or \ extr('property=og:title content="', '"') descr = extr('property="og:description" content="', '"') or \ extr('property=og:description content="', '"') title = extr('<h1 ', '</h1>') if title: title = title.partition(">")[2] user = user[:-len(title)-3] elif user: user, _, title = user.partition(" - ") else: raise exception.NotFoundError() return { "user": text.unescape(user), "title": text.unescape(title), "description": text.unescape(descr), } @staticmethod def images(page): """Extract and return a list of all image-urls""" return list(text.extract_iter(page, 'js-lightbox" data-src="', '"')) ����������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/naver.py�����������������������������������������������������0000644�0001750�0001750�00000011525�14176336637�020250� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://blog.naver.com/""" from .common import GalleryExtractor, Extractor, Message from .. 
import text class NaverBase(): """Base class for naver extractors""" category = "naver" root = "https://blog.naver.com" class NaverPostExtractor(NaverBase, GalleryExtractor): """Extractor for blog posts on blog.naver.com""" subcategory = "post" filename_fmt = "{num:>03}.{extension}" directory_fmt = ("{category}", "{blog[user]} {blog[id]}", "{post[date]:%Y-%m-%d} {post[title]}") archive_fmt = "{blog[id]}_{post[num]}_{num}" pattern = (r"(?:https?://)?blog\.naver\.com/" r"(?:PostView\.nhn\?blogId=(\w+)&logNo=(\d+)|(\w+)/(\d+)/?$)") test = ( ("https://blog.naver.com/rlfqjxm0/221430673006", { "url": "6c694f3aced075ed5e9511f1e796d14cb26619cc", "keyword": "a6e23d19afbee86b37d6e7ad934650c379d2cb1e", }), (("https://blog.naver.com/PostView.nhn" "?blogId=rlfqjxm0&logNo=221430673006"), { "url": "6c694f3aced075ed5e9511f1e796d14cb26619cc", "keyword": "a6e23d19afbee86b37d6e7ad934650c379d2cb1e", }), ) def __init__(self, match): blog_id = match.group(1) if blog_id: self.blog_id = blog_id self.post_id = match.group(2) else: self.blog_id = match.group(3) self.post_id = match.group(4) url = "{}/PostView.nhn?blogId={}&logNo={}".format( self.root, self.blog_id, self.post_id) GalleryExtractor.__init__(self, match, url) def metadata(self, page): extr = text.extract_from(page) data = { "post": { "title" : extr('"og:title" content="', '"'), "description": extr('"og:description" content="', '"'), "num" : text.parse_int(self.post_id), }, "blog": { "id" : self.blog_id, "num" : text.parse_int(extr("var blogNo = '", "'")), "user" : extr("var nickName = '", "'"), }, } data["post"]["date"] = text.parse_datetime( extr('se_publishDate pcol2">', '<') or extr('_postAddDate">', '<'), "%Y. %m. %d. %H:%M") return data def images(self, page): return [ (url.replace("://post", "://blog", 1).partition("?")[0], None) for url in text.extract_iter(page, 'data-lazy-src="', '"') ] class NaverBlogExtractor(NaverBase, Extractor): """Extractor for a user's blog on blog.naver.com""" subcategory = "blog" categorytransfer = True pattern = (r"(?:https?://)?blog\.naver\.com/" r"(?:PostList.nhn\?(?:[^&#]+&)*blogId=([^&#]+)|(\w+)/?$)") test = ( ("https://blog.naver.com/gukjung", { "pattern": NaverPostExtractor.pattern, "count": 12, "range": "1-12", }), ("https://blog.naver.com/PostList.nhn?blogId=gukjung", { "pattern": NaverPostExtractor.pattern, "count": 12, "range": "1-12", }), ) def __init__(self, match): Extractor.__init__(self, match) self.blog_id = match.group(1) or match.group(2) def items(self): # fetch first post number url = "{}/PostList.nhn?blogId={}".format(self.root, self.blog_id) post_num = text.extract( self.request(url).text, 'gnFirstLogNo = "', '"', )[0] # setup params for API calls url = "{}/PostViewBottomTitleListAsync.nhn".format(self.root) params = { "blogId" : self.blog_id, "logNo" : post_num or "0", "viewDate" : "", "categoryNo" : "", "parentCategoryNo" : "", "showNextPage" : "true", "showPreviousPage" : "false", "sortDateInMilli" : "", "isThumbnailViewType": "false", "countPerPage" : "", } # loop over all posts while True: data = self.request(url, params=params).json() for post in data["postList"]: post["url"] = "{}/PostView.nhn?blogId={}&logNo={}".format( self.root, self.blog_id, post["logNo"]) post["_extractor"] = NaverPostExtractor yield Message.Queue, post["url"], post if not data["hasNextPage"]: return params["logNo"] = data["nextIndexLogNo"] params["sortDateInMilli"] = data["nextIndexSortDate"] 
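NaverBlogExtractor.items() above pages by cursor rather than by page number: it seeds logNo with the blog's first post id scraped from the HTML, then follows nextIndexLogNo / nextIndexSortDate until hasNextPage turns false. A distilled sketch of that loop using requests directly (parameter and field names are copied from the code above, but only a trimmed subset of the parameters is sent, and it is an assumption that the endpoint answers such bare requests):

import requests

def naver_blog_post_urls(blog_id, first_log_no="0"):
    # Cursor-style pagination over blog.naver.com's async post list.
    url = "https://blog.naver.com/PostViewBottomTitleListAsync.nhn"
    params = {
        "blogId": blog_id,
        "logNo": first_log_no,
        "showNextPage": "true",
        "showPreviousPage": "false",
        "sortDateInMilli": "",
        "countPerPage": "",
    }
    while True:
        data = requests.get(url, params=params, timeout=30).json()
        for post in data["postList"]:
            yield "https://blog.naver.com/PostView.nhn?blogId={}&logNo={}".format(
                blog_id, post["logNo"])
        if not data["hasNextPage"]:
            return
        # advance the cursor to the next batch
        params["logNo"] = data["nextIndexLogNo"]
        params["sortDateInMilli"] = data["nextIndexSortDate"]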
���������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1643756959.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/naverwebtoon.py����������������������������������������������0000644�0001750�0001750�00000007161�14176336637�021647� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021 Seonghyeon Cho # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://comic.naver.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text BASE_PATTERN = r"(?:https?://)?comic\.naver\.com/webtoon" class NaverwebtoonBase(): """Base class for naver webtoon extractors""" category = "naverwebtoon" root = "https://comic.naver.com" class NaverwebtoonEpisodeExtractor(NaverwebtoonBase, GalleryExtractor): subcategory = "episode" directory_fmt = ("{category}", "{comic}") filename_fmt = "{episode:>03}-{num:>02}.{extension}" archive_fmt = "{title_id}_{episode}_{num}" pattern = BASE_PATTERN + r"/detail\.nhn\?([^#]+)" test = ( (("https://comic.naver.com/webtoon/detail.nhn?" 
"titleId=26458&no=1&weekday=tue"), { "url": "47a956ba8c7a837213d5985f50c569fcff986f75", "content": "3806b6e8befbb1920048de9888dfce6220f69a60", "count": 14 }), ) def __init__(self, match): query = match.group(1) url = "{}/webtoon/detail.nhn?{}".format(self.root, query) GalleryExtractor.__init__(self, match, url) query = text.parse_query(query) self.title_id = query.get("titleId") self.episode = query.get("no") def metadata(self, page): extr = text.extract_from(page) return { "title_id": self.title_id, "episode" : self.episode, "title" : extr('property="og:title" content="', '"'), "comic" : extr('<h2>', '<span'), "authors" : extr('class="wrt_nm">', '</span>').strip().split("/"), "description": extr('<p class="txt">', '</p>'), "genre" : extr('<span class="genre">', '</span>'), "date" : extr('<dd class="date">', '</dd>'), } @staticmethod def images(page): view_area = text.extract(page, 'id="comic_view_area"', '</div>')[0] return [ (url, None) for url in text.extract_iter(view_area, '<img src="', '"') if "/static/" not in url ] class NaverwebtoonComicExtractor(NaverwebtoonBase, Extractor): subcategory = "comic" categorytransfer = True pattern = (BASE_PATTERN + r"/list\.nhn\?([^#]+)") test = ( ("https://comic.naver.com/webtoon/list.nhn?titleId=22073", { "pattern": NaverwebtoonEpisodeExtractor.pattern, "count": 32, }), ) def __init__(self, match): Extractor.__init__(self, match) query = text.parse_query(match.group(1)) self.title_id = query.get("titleId") self.page_no = text.parse_int(query.get("page"), 1) def items(self): url = self.root + "/webtoon/list.nhn" params = {"titleId": self.title_id, "page": self.page_no} data = {"_extractor": NaverwebtoonEpisodeExtractor} while True: page = self.request(url, params=params).text data["page"] = self.page_no for episode_url in self.get_episode_urls(page): yield Message.Queue, episode_url, data if 'class="next"' not in page: return params["page"] += 1 def get_episode_urls(self, page): """Extract and return all episode urls in page""" return [ self.root + "/webtoon/detail.nhn?" 
+ query for query in text.extract_iter( page, '<a href="/webtoon/detail?', '"') ][::2] ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1649326994.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/newgrounds.py������������������������������������������������0000644�0001750�0001750�00000052465�14223535622�021324� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.newgrounds.com/""" from .common import Extractor, Message from .. 
import text, exception from ..cache import cache import itertools import json class NewgroundsExtractor(Extractor): """Base class for newgrounds extractors""" category = "newgrounds" directory_fmt = ("{category}", "{artist[:10]:J, }") filename_fmt = "{category}_{_index}_{title}.{extension}" archive_fmt = "{_index}" root = "https://www.newgrounds.com" cookiedomain = ".newgrounds.com" cookienames = ("NG_GG_username", "vmk1du5I8m") def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) self.user_root = "https://{}.newgrounds.com".format(self.user) self.flash = self.config("flash", True) fmt = self.config("format", "original") self.format = (True if not fmt or fmt == "original" else fmt if isinstance(fmt, int) else text.parse_int(fmt.rstrip("p"))) def items(self): self.login() metadata = self.metadata() for post_url in self.posts(): try: post = self.extract_post(post_url) url = post.get("url") except Exception: self.log.debug("", exc_info=True) url = None if url: if metadata: post.update(metadata) yield Message.Directory, post yield Message.Url, url, text.nameext_from_url(url, post) for num, url in enumerate(text.extract_iter( post["_comment"], 'data-smartload-src="', '"'), 1): post["num"] = num post["_index"] = "{}_{:>02}".format(post["index"], num) url = text.ensure_http_scheme(url) yield Message.Url, url, text.nameext_from_url(url, post) else: self.log.warning( "Unable to get download URL for '%s'", post_url) def posts(self): """Return URLs of all relevant post pages""" return self._pagination(self._path) def metadata(self): """Return general metadata""" def login(self): username, password = self._get_auth_info() if username: self._update_cookies(self._login_impl(username, password)) @cache(maxage=360*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/passport/" page = self.request(url).text headers = {"Origin": self.root, "Referer": url} url = text.urljoin(self.root, text.extract(page, 'action="', '"')[0]) data = { "username": username, "password": password, "remember": "1", "login" : "1", } response = self.request(url, method="POST", headers=headers, data=data) if not response.history: raise exception.AuthenticationError() return { cookie.name: cookie.value for cookie in response.history[0].cookies if cookie.expires and cookie.domain == self.cookiedomain } def extract_post(self, post_url): url = post_url if "/art/view/" in post_url: extract_data = self._extract_image_data elif "/audio/listen/" in post_url: extract_data = self._extract_audio_data else: extract_data = self._extract_media_data if self.flash: url += "/format/flash" with self.request(url, fatal=False) as response: if response.status_code >= 400: return {} page = response.text pos = page.find('id="adults_only"') if pos >= 0: msg = text.extract(page, 'class="highlight">', '<', pos)[0] self.log.warning('"%s"', msg) extr = text.extract_from(page) data = extract_data(extr, post_url) data["_comment"] = extr( 'id="author_comments"', '</div>').partition(">")[2] data["comment"] = text.unescape(text.remove_html( data["_comment"], "", "")) data["favorites"] = text.parse_int(extr( 'id="faves_load">', '<').replace(",", "")) data["score"] = text.parse_float(extr('id="score_number">', '<')) data["tags"] = text.split_html(extr('<dd class="tags">', '</dd>')) data["artist"] = [ text.extract(user, '//', '.')[0] for user in text.extract_iter(page, '<div class="item-user">', '>') ] data["tags"].sort() data["user"] = self.user or 
data["artist"][0] data["post_url"] = post_url return data @staticmethod def _extract_image_data(extr, url): full = text.extract_from(json.loads(extr('"full_image_text":', '});'))) data = { "title" : text.unescape(extr('"og:title" content="', '"')), "description": text.unescape(extr(':description" content="', '"')), "date" : text.parse_datetime(extr( 'itemprop="datePublished" content="', '"')), "rating" : extr('class="rated-', '"'), "url" : full('src="', '"'), "width" : text.parse_int(full('width="', '"')), "height" : text.parse_int(full('height="', '"')), } index = data["url"].rpartition("/")[2].partition("_")[0] data["index"] = text.parse_int(index) data["_index"] = index return data @staticmethod def _extract_audio_data(extr, url): index = url.split("/")[5] return { "title" : text.unescape(extr('"og:title" content="', '"')), "description": text.unescape(extr(':description" content="', '"')), "date" : text.parse_datetime(extr( 'itemprop="datePublished" content="', '"')), "url" : extr('{"url":"', '"').replace("\\/", "/"), "index" : text.parse_int(index), "_index" : index, "rating" : "", } def _extract_media_data(self, extr, url): index = url.split("/")[5] title = extr('"og:title" content="', '"') descr = extr('"og:description" content="', '"') src = extr('{"url":"', '"') if src: src = src.replace("\\/", "/") fallback = () date = text.parse_datetime(extr( 'itemprop="datePublished" content="', '"')) else: url = self.root + "/portal/video/" + index headers = { "Accept": "application/json, text/javascript, */*; q=0.01", "X-Requested-With": "XMLHttpRequest", "Referer": self.root, } sources = self.request(url, headers=headers).json()["sources"] if self.format is True: src = sources["360p"][0]["src"].replace(".360p.", ".") formats = sources else: formats = [] for fmt, src in sources.items(): width = text.parse_int(fmt.rstrip("p")) if width <= self.format: formats.append((width, src)) if formats: formats.sort(reverse=True) src, formats = formats[0][1][0]["src"], formats[1:] else: src = "" fallback = self._video_fallback(formats) date = text.parse_timestamp(src.rpartition("?")[2]) return { "title" : text.unescape(title), "url" : src, "date" : date, "description": text.unescape(descr or extr( 'itemprop="description" content="', '"')), "rating" : extr('class="rated-', '"'), "index" : text.parse_int(index), "_index" : index, "_fallback" : fallback, } @staticmethod def _video_fallback(formats): if isinstance(formats, dict): formats = list(formats.items()) formats.sort(key=lambda fmt: text.parse_int(fmt[0].rstrip("p")), reverse=True) for fmt in formats: yield fmt[1][0]["src"] def _pagination(self, kind): url = "{}/{}".format(self.user_root, kind) params = { "page": 1, "isAjaxRequest": "1", } headers = { "Referer": url, "X-Requested-With": "XMLHttpRequest", } while True: with self.request( url, params=params, headers=headers, fatal=False) as response: try: data = response.json() except ValueError: return if not data: return if "errors" in data: msg = ", ".join(text.unescape(e) for e in data["errors"]) raise exception.StopExtraction(msg) for year, items in data["items"].items(): for item in items: page_url = text.extract(item, 'href="', '"')[0] if page_url[0] == "/": page_url = self.root + page_url yield page_url more = data.get("load_more") if not more or len(more) < 8: return params["page"] += 1 class NewgroundsImageExtractor(NewgroundsExtractor): """Extractor for a single image from newgrounds.com""" subcategory = "image" pattern = (r"(?:https?://)?(?:" 
r"(?:www\.)?newgrounds\.com/art/view/([^/?#]+)/[^/?#]+" r"|art\.ngfiles\.com/images/\d+/\d+_([^_]+)_([^.]+))") test = ( ("https://www.newgrounds.com/art/view/tomfulp/ryu-is-hawt", { "url": "57f182bcbbf2612690c3a54f16ffa1da5105245e", "content": "8f395e08333eb2457ba8d8b715238f8910221365", "keyword": { "artist" : ["tomfulp"], "comment" : "re:Consider this the bottom threshold for ", "date" : "dt:2009-06-04 14:44:05", "description": "re:Consider this the bottom threshold for ", "favorites" : int, "filename" : "94_tomfulp_ryu-is-hawt", "height" : 476, "index" : 94, "rating" : "e", "score" : float, "tags" : ["ryu", "streetfighter"], "title" : "Ryu is Hawt", "user" : "tomfulp", "width" : 447, }, }), ("https://art.ngfiles.com/images/0/94_tomfulp_ryu-is-hawt.gif", { "url": "57f182bcbbf2612690c3a54f16ffa1da5105245e", }), ("https://www.newgrounds.com/art/view/sailoryon/yon-dream-buster", { "url": "84eec95e663041a80630df72719f231e157e5f5d", "count": 2, }), # "adult" rated (#2456) ("https://www.newgrounds.com/art/view/kekiiro/red", { "options": (("username", None),), "count": 1, }), ) def __init__(self, match): NewgroundsExtractor.__init__(self, match) if match.group(2): self.user = match.group(2) self.post_url = "https://www.newgrounds.com/art/view/{}/{}".format( self.user, match.group(3)) else: self.post_url = text.ensure_http_scheme(match.group(0)) def posts(self): return (self.post_url,) class NewgroundsMediaExtractor(NewgroundsExtractor): """Extractor for a media file from newgrounds.com""" subcategory = "media" pattern = (r"(?:https?://)?(?:www\.)?newgrounds\.com" r"(/(?:portal/view|audio/listen)/\d+)") test = ( ("https://www.newgrounds.com/portal/view/595355", { "pattern": r"https://uploads\.ungrounded\.net/alternate/564000" r"/564957_alternate_31\.mp4\?1359712249", "keyword": { "artist" : ["kickinthehead", "danpaladin", "tomfulp"], "comment" : "re:My fan trailer for Alien Hominid HD!", "date" : "dt:2013-02-01 09:50:49", "description": "Fan trailer for Alien Hominid HD!", "favorites" : int, "filename" : "564957_alternate_31", "index" : 595355, "rating" : "e", "score" : float, "tags" : ["alienhominid", "trailer"], "title" : "Alien Hominid Fan Trailer", "user" : "kickinthehead", }, }), ("https://www.newgrounds.com/audio/listen/609768", { "url": "f4c5490ae559a3b05e46821bb7ee834f93a43c95", "keyword": { "artist" : ["zj", "tomfulp"], "comment" : "re:RECORDED 12-09-2014\n\nFrom The ZJ \"Late ", "date" : "dt:2015-02-23 19:31:59", "description": "From The ZJ Report Show!", "favorites" : int, "index" : 609768, "rating" : "", "score" : float, "tags" : ["fulp", "interview", "tom", "zj"], "title" : "ZJ Interviews Tom Fulp!", "user" : "zj", }, }), # flash animation (#1257) ("https://www.newgrounds.com/portal/view/161181/format/flash", { "pattern": r"https://uploads\.ungrounded\.net/161000" r"/161181_ddautta_mask__550x281_\.swf\?f1081628129", }), # format selection (#1729) ("https://www.newgrounds.com/portal/view/758545", { "options": (("format", "720p"),), "pattern": r"https://uploads\.ungrounded\.net/alternate/1482000" r"/1482860_alternate_102516\.720p\.mp4\?\d+", }), # "adult" rated (#2456) ("https://www.newgrounds.com/portal/view/717744", { "options": (("username", None),), "count": 1, }), ) def __init__(self, match): NewgroundsExtractor.__init__(self, match) self.user = "" self.post_url = self.root + match.group(1) def posts(self): return (self.post_url,) class NewgroundsArtExtractor(NewgroundsExtractor): """Extractor for all images of a newgrounds user""" subcategory = _path = "art" pattern = 
r"(?:https?://)?([\w-]+)\.newgrounds\.com/art/?$" test = ("https://tomfulp.newgrounds.com/art", { "pattern": NewgroundsImageExtractor.pattern, "count": ">= 3", }) class NewgroundsAudioExtractor(NewgroundsExtractor): """Extractor for all audio submissions of a newgrounds user""" subcategory = _path = "audio" pattern = r"(?:https?://)?([\w-]+)\.newgrounds\.com/audio/?$" test = ("https://tomfulp.newgrounds.com/audio", { "pattern": r"https://audio.ngfiles.com/\d+/\d+_.+\.mp3", "count": ">= 4", }) class NewgroundsMoviesExtractor(NewgroundsExtractor): """Extractor for all movies of a newgrounds user""" subcategory = _path = "movies" pattern = r"(?:https?://)?([\w-]+)\.newgrounds\.com/movies/?$" test = ("https://tomfulp.newgrounds.com/movies", { "pattern": r"https://uploads.ungrounded.net(/alternate)?/\d+/\d+_.+", "range": "1-10", "count": 10, }) class NewgroundsUserExtractor(NewgroundsExtractor): """Extractor for a newgrounds user profile""" subcategory = "user" pattern = r"(?:https?://)?([\w-]+)\.newgrounds\.com/?$" test = ( ("https://tomfulp.newgrounds.com", { "pattern": "https://tomfulp.newgrounds.com/art$", }), ("https://tomfulp.newgrounds.com", { "options": (("include", "all"),), "pattern": "https://tomfulp.newgrounds.com/(art|audio|movies)$", "count": 3, }), ) def items(self): base = self.user_root + "/" return self._dispatch_extractors(( (NewgroundsArtExtractor , base + "art"), (NewgroundsAudioExtractor , base + "audio"), (NewgroundsMoviesExtractor, base + "movies"), ), ("art",)) class NewgroundsFavoriteExtractor(NewgroundsExtractor): """Extractor for posts favorited by a newgrounds user""" subcategory = "favorite" directory_fmt = ("{category}", "{user}", "Favorites") pattern = (r"(?:https?://)?([\w-]+)\.newgrounds\.com" r"/favorites(?!/following)(?:/(art|audio|movies))?/?") test = ( ("https://tomfulp.newgrounds.com/favorites/art", { "range": "1-10", "count": ">= 10", }), ("https://tomfulp.newgrounds.com/favorites/audio"), ("https://tomfulp.newgrounds.com/favorites/movies"), ("https://tomfulp.newgrounds.com/favorites/"), ) def __init__(self, match): NewgroundsExtractor.__init__(self, match) self.kind = match.group(2) def posts(self): if self.kind: return self._pagination(self.kind) return itertools.chain.from_iterable( self._pagination(k) for k in ("art", "audio", "movies") ) def _pagination(self, kind): url = "{}/favorites/{}".format(self.user_root, kind) params = { "page": 1, "isAjaxRequest": "1", } headers = { "Referer": url, "X-Requested-With": "XMLHttpRequest", } while True: response = self.request(url, params=params, headers=headers) if response.history: return data = response.json() favs = self._extract_favorites(data.get("component") or "") yield from favs if len(favs) < 24: return params["page"] += 1 def _extract_favorites(self, page): return [ self.root + path for path in text.extract_iter( page, 'href="https://www.newgrounds.com', '"') ] class NewgroundsFollowingExtractor(NewgroundsFavoriteExtractor): """Extractor for a newgrounds user's favorited users""" subcategory = "following" pattern = r"(?:https?://)?([\w-]+)\.newgrounds\.com/favorites/(following)" test = ("https://tomfulp.newgrounds.com/favorites/following", { "pattern": NewgroundsUserExtractor.pattern, "range": "76-125", "count": 50, }) def items(self): data = {"_extractor": NewgroundsUserExtractor} for url in self._pagination(self.kind): yield Message.Queue, url, data @staticmethod def _extract_favorites(page): return [ text.ensure_http_scheme(user.rpartition('"')[2]) for user in text.extract_iter(page, 
'class="item-user', '"><img') ] class NewgroundsSearchExtractor(NewgroundsExtractor): """Extractor for newgrounds.com search reesults""" subcategory = "search" directory_fmt = ("{category}", "search", "{search_tags}") pattern = (r"(?:https?://)?(?:www\.)?newgrounds\.com" r"/search/conduct/([^/?#]+)/?\?([^#]+)") test = ( ("https://www.newgrounds.com/search/conduct/art?terms=tree", { "pattern": NewgroundsImageExtractor.pattern, "keyword": {"search_tags": "tree"}, "range": "1-10", "count": 10, }), ("https://www.newgrounds.com/search/conduct/movies?terms=tree", { "pattern": r"https://uploads.ungrounded.net(/alternate)?/\d+/\d+", "range": "1-10", "count": 10, }), ("https://www.newgrounds.com/search/conduct/audio?advanced=1" "&terms=tree+green+nature&match=tdtu&genre=5&suitabilities=e%2Cm"), ) def __init__(self, match): NewgroundsExtractor.__init__(self, match) self._path, query = match.groups() self.query = text.parse_query(query) def posts(self): suitabilities = self.query.get("suitabilities") if suitabilities: data = {"view_suitability_" + s: "on" for s in suitabilities.split(",")} self.request(self.root + "/suitabilities", method="POST", data=data) return self._pagination("/search/conduct/" + self._path, self.query) def metadata(self): return {"search_tags": self.query.get("terms", "")} def _pagination(self, path, params): url = self.root + path headers = { "Accept": "application/json, text/javascript, */*; q=0.01", "X-Requested-With": "XMLHttpRequest", "Referer": self.root, } params["inner"] = "1" params["page"] = 1 while True: data = self.request(url, params=params, headers=headers).json() post_url = None for post_url in text.extract_iter(data["content"], 'href="', '"'): if not post_url.startswith("/search/"): yield post_url if post_url is None: return params["page"] += 1 �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1639190302.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.21.1/gallery_dl/extractor/ngomik.py����������������������������������������������������0000644�0001750�0001750�00000003177�14155007436�020412� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- 
coding: utf-8 -*- # Copyright 2018-2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract manga-chapters and entire manga from http://ngomik.in/""" from .common import ChapterExtractor from .. import text import re class NgomikChapterExtractor(ChapterExtractor): """Extractor for manga-chapters from ngomik.in""" category = "ngomik" root = "http://ngomik.in" pattern = (r"(?:https?://)?(?:www\.)?ngomik\.in" r"(/[^/?#]+-chapter-[^/?#]+)") test = ( ("https://www.ngomik.in/14-sai-no-koi-chapter-1-6/", { "url": "8e67fdf751bbc79bc6f4dead7675008ddb8e32a4", "keyword": "204d177f09d438fd50c9c28d98c73289194640d8", }), ("https://ngomik.in/break-blade-chapter-26/", { "count": 34, }), ) def metadata(self, page): info = text.extract(page, '<title>', "")[0] manga, _, chapter = info.partition(" Chapter ") chapter, sep, minor = chapter.partition(" ")[0].partition(".") return { "manga": text.unescape(manga), "chapter": text.parse_int(chapter), "chapter_minor": sep + minor, "lang": "id", "language": "Indonesian", } @staticmethod def images(page): readerarea = text.extract(page, 'id="readerarea"', 'class="chnav"')[0] return [ (text.unescape(url), None) for url in re.findall(r"\ssrc=[\"']?([^\"' >]+)", readerarea) ] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/nhentai.py0000644000175000017500000001260314176336637020561 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://nhentai.net/""" from .common import GalleryExtractor, Extractor, Message from .. 
import text, util import collections import json class NhentaiGalleryExtractor(GalleryExtractor): """Extractor for image galleries from nhentai.net""" category = "nhentai" root = "https://nhentai.net" pattern = r"(?:https?://)?nhentai\.net/g/(\d+)" test = ("https://nhentai.net/g/147850/", { "url": "5179dbf0f96af44005a0ff705a0ad64ac26547d0", "keyword": { "title" : r"re:\[Morris\] Amazon no Hiyaku \| Amazon Elixir", "title_en" : str, "title_ja" : str, "gallery_id": 147850, "media_id" : 867789, "count" : 16, "date" : 1446050915, "scanlator" : "", "artist" : ["morris"], "group" : list, "parody" : list, "characters": list, "tags" : list, "type" : "manga", "lang" : "en", "language" : "English", "width" : int, "height" : int, }, }) def __init__(self, match): url = self.root + "/api/gallery/" + match.group(1) GalleryExtractor.__init__(self, match, url) def metadata(self, page): self.data = data = json.loads(page) title_en = data["title"].get("english", "") title_ja = data["title"].get("japanese", "") info = collections.defaultdict(list) for tag in data["tags"]: info[tag["type"]].append(tag["name"]) language = "" for language in info["language"]: if language != "translated": language = language.capitalize() break return { "title" : title_en or title_ja, "title_en" : title_en, "title_ja" : title_ja, "gallery_id": data["id"], "media_id" : text.parse_int(data["media_id"]), "date" : data["upload_date"], "scanlator" : data["scanlator"], "artist" : info["artist"], "group" : info["group"], "parody" : info["parody"], "characters": info["character"], "tags" : info["tag"], "type" : info["category"][0] if info["category"] else "", "lang" : util.language_to_code(language), "language" : language, } def images(self, _): ufmt = ("https://i.nhentai.net/galleries/" + self.data["media_id"] + "/{}.{}") extdict = {"j": "jpg", "p": "png", "g": "gif"} return [ (ufmt.format(num, extdict.get(img["t"], "jpg")), { "width": img["w"], "height": img["h"], }) for num, img in enumerate(self.data["images"]["pages"], 1) ] class NhentaiExtractor(Extractor): """Base class for nhentai extractors""" category = "nhentai" root = "https://nhentai.net" def __init__(self, match): Extractor.__init__(self, match) self.path, self.query = match.groups() def items(self): data = {"_extractor": NhentaiGalleryExtractor} for gallery_id in self._pagination(): url = "{}/g/{}/".format(self.root, gallery_id) yield Message.Queue, url, data def _pagination(self): url = self.root + self.path params = text.parse_query(self.query) params["page"] = text.parse_int(params.get("page"), 1) while True: page = self.request(url, params=params).text yield from text.extract_iter(page, 'href="/g/', '/') if 'class="next"' not in page: return params["page"] += 1 class NhentaiTagExtractor(NhentaiExtractor): """Extractor for nhentai tag searches""" subcategory = "tag" pattern = (r"(?:https?://)?nhentai\.net(" r"/(?:artist|category|character|group|language|parody|tag)" r"/[^/?#]+(?:/popular[^/?#]*)?/?)(?:\?([^#]+))?") test = ( ("https://nhentai.net/tag/sole-female/", { "pattern": NhentaiGalleryExtractor.pattern, "count": 30, "range": "1-30", }), ("https://nhentai.net/artist/itou-life/"), ("https://nhentai.net/group/itou-life/"), ("https://nhentai.net/parody/touhou-project/"), ("https://nhentai.net/character/patchouli-knowledge/popular"), ("https://nhentai.net/category/doujinshi/popular-today"), ("https://nhentai.net/language/english/popular-week"), ) class NhentaiSearchExtractor(NhentaiExtractor): """Extractor for nhentai search results""" subcategory = "search" pattern = 
r"(?:https?://)?nhentai\.net(/search/?)\?([^#]+)" test = ("https://nhentai.net/search/?q=touhou", { "pattern": NhentaiGalleryExtractor.pattern, "count": 30, "range": "1-30", }) class NhentaiFavoriteExtractor(NhentaiExtractor): """Extractor for nhentai favorites""" subcategory = "favorite" pattern = r"(?:https?://)?nhentai\.net(/favorites/?)(?:\?([^#]+))?" test = ("https://nhentai.net/favorites/",) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/nijie.py0000644000175000017500000002025314176336637020231 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract images from https://nijie.info/""" from .common import Extractor, Message, AsynchronousMixin from .. import text, exception from ..cache import cache BASE_PATTERN = r"(?:https?://)?(?:www\.)?nijie\.info" class NijieExtractor(AsynchronousMixin, Extractor): """Base class for nijie extractors""" category = "nijie" directory_fmt = ("{category}", "{user_id}") filename_fmt = "{image_id}_p{num}.{extension}" archive_fmt = "{image_id}_{num}" cookiedomain = "nijie.info" cookienames = ("nemail", "nlogin") root = "https://nijie.info" view_url = "https://nijie.info/view.php?id=" popup_url = "https://nijie.info/view_popup.php?id=" def __init__(self, match): Extractor.__init__(self, match) self.user_id = text.parse_int(match.group(1)) self.user_name = None self.session.headers["Referer"] = self.root + "/" def items(self): self.login() for image_id in self.image_ids(): response = self.request(self.view_url + image_id, fatal=False) if response.status_code >= 400: continue page = response.text data = self._extract_data(page) data["image_id"] = text.parse_int(image_id) yield Message.Directory, data for image in self._extract_images(page): image.update(data) if not image["extension"]: image["extension"] = "jpg" yield Message.Url, image["url"], image def image_ids(self): """Collect all relevant image-ids""" @staticmethod def _extract_data(page): """Extract image metadata from 'page'""" extr = text.extract_from(page) keywords = text.unescape(extr( 'name="keywords" content="', '" />')).split(",") data = { "title" : keywords[0].strip(), "description": text.unescape(extr( '"description": "', '"').replace("&", "&")), "date" : text.parse_datetime(extr( '"datePublished": "', '"') + "+0900", "%a %b %d %H:%M:%S %Y%z"), "artist_id" : text.parse_int(extr( '"sameAs": "https://nijie.info/members.php?id=', '"')), "artist_name": keywords[1], "tags" : keywords[2:-1], } data["user_id"] = data["artist_id"] data["user_name"] = data["artist_name"] return data @staticmethod def _extract_images(page): """Extract image URLs from 'page'""" images = text.extract_iter(page, '', '<')[0] or "") yield from text.extract_iter(page, 'illust_id="', '"') if '
  • = 400: return None user = response.json()["data"] attr = user["attributes"] attr["id"] = user["id"] attr["date"] = text.parse_datetime( attr["created"], "%Y-%m-%dT%H:%M:%S.%f%z") return attr def _filename(self, url): """Fetch filename from an URL's Content-Disposition header""" response = self.request(url, method="HEAD", fatal=False) cd = response.headers.get("Content-Disposition") return text.extract(cd, 'filename="', '"')[0] @staticmethod def _filehash(url): """Extract MD5 hash from a download URL""" parts = url.partition("?")[0].split("/") parts.reverse() for part in parts: if len(part) == 32: return part return "" @staticmethod def _build_url(endpoint, query): return ( "https://www.patreon.com/api/" + endpoint + "?include=user,images,attachments,user_defined_tags,campaign,poll." "choices,poll.current_user_responses.user,poll.current_user_respon" "ses.choice,poll.current_user_responses.poll,access_rules.tier.nul" "l" "&fields[post]=change_visibility_at,comment_count,content,current_" "user_can_delete,current_user_can_view,current_user_has_liked,embe" "d,image,is_paid,like_count,min_cents_pledged_to_view,post_file,pu" "blished_at,patron_count,patreon_url,post_type,pledge_url,thumbnai" "l_url,teaser_text,title,upgrade_url,url,was_posted_by_campaign_ow" "ner" "&fields[user]=image_url,full_name,url" "&fields[campaign]=avatar_photo_url,earnings_visibility,is_nsfw,is" "_monthly,name,url" "&fields[access_rule]=access_rule_type,amount_cents" + query + "&json-api-use-default-includes=false" "&json-api-version=1.0" ) def _build_file_generators(self, filetypes): if filetypes is None: return (self._images, self._image_large, self._attachments, self._postfile, self._content) genmap = { "images" : self._images, "image_large": self._image_large, "attachments": self._attachments, "postfile" : self._postfile, "content" : self._content, } if isinstance(filetypes, str): filetypes = filetypes.split(",") return [genmap[ft] for ft in filetypes] class PatreonCreatorExtractor(PatreonExtractor): """Extractor for a creator's works""" subcategory = "creator" pattern = (r"(?:https?://)?(?:www\.)?patreon\.com" r"/(?!(?:home|join|posts|login|signup)(?:$|[/?#]))" r"([^/?#]+)(?:/posts)?/?(?:\?([^#]+))?") test = ( ("https://www.patreon.com/koveliana", { "range": "1-25", "count": ">= 25", "keyword": { "attachments" : list, "comment_count": int, "content" : str, "creator" : dict, "date" : "type:datetime", "id" : int, "images" : list, "like_count" : int, "post_type" : str, "published_at" : str, "title" : str, }, }), ("https://www.patreon.com/koveliana/posts?filters[month]=2020-3", { "count": 1, "keyword": {"date": "dt:2020-03-30 21:21:44"}, }), ("https://www.patreon.com/kovelianot", { "exception": exception.NotFoundError, }), ("https://www.patreon.com/user?u=2931440"), ("https://www.patreon.com/user/posts/?u=2931440"), ) def __init__(self, match): PatreonExtractor.__init__(self, match) self.creator, self.query = match.groups() def posts(self): query = text.parse_query(self.query) creator_id = query.get("u") if creator_id: url = "{}/user/posts?u={}".format(self.root, creator_id) else: url = "{}/{}/posts".format(self.root, self.creator) page = self.request(url, notfound="creator").text campaign_id = text.extract(page, "/campaign/", "/")[0] if not campaign_id: raise exception.NotFoundError("creator") filters = "".join( "&filter[{}={}".format(key[8:], text.escape(value)) for key, value in query.items() if key.startswith("filters[") ) url = self._build_url("posts", ( "&sort=" + query.get("sort", "-published_at") + 
"&filter[is_draft]=false" "&filter[contains_exclusive_posts]=true" "&filter[campaign_id]=" + campaign_id + filters )) return self._pagination(url) class PatreonUserExtractor(PatreonExtractor): """Extractor for media from creators supported by you""" subcategory = "user" pattern = r"(?:https?://)?(?:www\.)?patreon\.com/home$" test = ("https://www.patreon.com/home",) def posts(self): url = self._build_url("stream", ( "&page[cursor]=null" "&filter[is_following]=true" )) return self._pagination(url) class PatreonPostExtractor(PatreonExtractor): """Extractor for media from a single post""" subcategory = "post" pattern = r"(?:https?://)?(?:www\.)?patreon\.com/posts/([^/?#]+)" test = ( # postfile + attachments ("https://www.patreon.com/posts/precious-metal-23563293", { "count": 4, }), # postfile + content ("https://www.patreon.com/posts/56127163", { "count": 3, "keyword": {"filename": r"re:^(?!1).+$"}, }), # tags (#1539) ("https://www.patreon.com/posts/free-post-12497641", { "keyword": {"tags": ["AWMedia"]}, }), ("https://www.patreon.com/posts/not-found-123", { "exception": exception.NotFoundError, }), ) def __init__(self, match): PatreonExtractor.__init__(self, match) self.slug = match.group(1) def posts(self): url = "{}/posts/{}".format(self.root, self.slug) page = self.request(url, notfound="post").text data = text.extract(page, "window.patreon.bootstrap,", "\n});")[0] post = json.loads(data + "}")["post"] included = self._transform(post["included"]) return (self._process(post["data"], included),) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1646253139.0 gallery_dl-1.21.1/gallery_dl/extractor/philomena.py0000644000175000017500000001776214207752123021105 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Philomena sites""" from .booru import BooruExtractor from .. 
import text, exception import operator class PhilomenaExtractor(BooruExtractor): """Base class for philomena extractors""" basecategory = "philomena" filename_fmt = "{filename}.{extension}" archive_fmt = "{id}" request_interval = 1.0 per_page = 50 _file_url = operator.itemgetter("view_url") @staticmethod def _prepare(post): post["date"] = text.parse_datetime(post["created_at"]) @staticmethod def _extended_tags(post): pass def _pagination(self, url, params): params["page"] = 1 params["per_page"] = self.per_page api_key = self.config("api-key") if api_key: params["key"] = api_key filter_id = self.config("filter") if filter_id: params["filter_id"] = filter_id elif not api_key: try: params["filter_id"] = INSTANCES[self.category]["filter_id"] except (KeyError, TypeError): params["filter_id"] = "2" while True: data = self.request(url, params=params).json() yield from data["images"] if len(data["images"]) < self.per_page: return params["page"] += 1 INSTANCES = { "derpibooru": {"root": "https://derpibooru.org", "filter_id": "56027"}, "ponybooru" : {"root": "https://ponybooru.org", "filter_id": "2"}, "furbooru" : {"root": "https://furbooru.org", "filter_id": "2"}, } BASE_PATTERN = PhilomenaExtractor.update(INSTANCES) class PhilomenaPostExtractor(PhilomenaExtractor): """Extractor for single posts on a Philomena booru""" subcategory = "post" pattern = BASE_PATTERN + r"/(?:images/)?(\d+)" test = ( ("https://derpibooru.org/images/1", { "content": "88449eeb0c4fa5d3583d0b794f6bc1d70bf7f889", "count": 1, "keyword": { "animated": False, "aspect_ratio": 1.0, "comment_count": int, "created_at": "2012-01-02T03:12:33Z", "date": "dt:2012-01-02 03:12:33", "deletion_reason": None, "description": "", "downvotes": int, "duplicate_of": None, "duration": 0.04, "extension": "png", "faves": int, "first_seen_at": "2012-01-02T03:12:33Z", "format": "png", "height": 900, "hidden_from_users": False, "id": 1, "mime_type": "image/png", "name": "1__safe_fluttershy_solo_cloud_happy_flying_upvotes+ga" "lore_artist-colon-speccysy_get_sunshine", "orig_sha512_hash": None, "processed": True, "representations": dict, "score": int, "sha512_hash": "f16c98e2848c2f1bfff3985e8f1a54375cc49f78125391" "aeb80534ce011ead14e3e452a5c4bc98a66f56bdfcd07e" "f7800663b994f3f343c572da5ecc22a9660f", "size": 860914, "source_url": "https://www.deviantart.com/speccysy/art" "/Afternoon-Flight-215193985", "spoilered": False, "tag_count": 42, "tag_ids": list, "tags": list, "thumbnails_generated": True, "updated_at": "2021-09-30T20:04:01Z", "uploader": "Clover the Clever", "uploader_id": 211188, "upvotes": int, "view_url": str, "width": 900, "wilson_score": float, }, }), ("https://derpibooru.org/1"), ("https://ponybooru.org/images/1", { "content": "bca26f58fafd791fe07adcd2a28efd7751824605", }), ("https://furbooru.org/images/1", { "content": "9eaa1e1b32fa0f16520912257dbefaff238d5fd2", }), ) def __init__(self, match): PhilomenaExtractor.__init__(self, match) self.image_id = match.group(match.lastindex) def posts(self): url = self.root + "/api/v1/json/images/" + self.image_id return (self.request(url).json()["image"],) class PhilomenaSearchExtractor(PhilomenaExtractor): """Extractor for Philomena search results""" subcategory = "search" directory_fmt = ("{category}", "{search_tags}") pattern = BASE_PATTERN + r"/(?:search/?\?([^#]+)|tags/([^/?#]+))" test = ( ("https://derpibooru.org/search?q=cute", { "range": "40-60", "count": 21, }), ("https://derpibooru.org/tags/cute", { "range": "40-60", "count": 21, }), (("https://derpibooru.org/tags/" 
"artist-colon--dash-_-fwslash--fwslash-%255Bkorroki%255D_aternak"), { "count": ">= 2", }), ("https://ponybooru.org/search?q=cute", { "range": "40-60", "count": 21, }), ("https://furbooru.org/search?q=cute", { "range": "40-60", "count": 21, }), ) def __init__(self, match): PhilomenaExtractor.__init__(self, match) groups = match.groups() if groups[-1]: q = groups[-1].replace("+", " ") for old, new in ( ("-colon-" , ":"), ("-dash-" , "-"), ("-dot-" , "."), ("-plus-" , "+"), ("-fwslash-", "/"), ("-bwslash-", "\\"), ): if old in q: q = q.replace(old, new) self.params = {"q": text.unquote(text.unquote(q))} else: self.params = text.parse_query(groups[-2]) def metadata(self): return {"search_tags": self.params.get("q", "")} def posts(self): url = self.root + "/api/v1/json/search/images" return self._pagination(url, self.params) class PhilomenaGalleryExtractor(PhilomenaExtractor): """Extractor for Philomena galleries""" subcategory = "gallery" directory_fmt = ("{category}", "galleries", "{gallery[id]} {gallery[title]}") pattern = BASE_PATTERN + r"/galleries/(\d+)" test = ( ("https://derpibooru.org/galleries/1", { "pattern": r"https://derpicdn\.net/img/view/\d+/\d+/\d+/\d+[^/]+$", "keyword": { "gallery": { "description": "Indexes start at 1 :P", "id": 1, "spoiler_warning": "", "thumbnail_id": 1, "title": "The Very First Gallery", "user": "DeliciousBlackInk", "user_id": 365446, }, }, }), ("https://ponybooru.org/galleries/27", { "count": ">= 24", }), ("https://furbooru.org/galleries/27", { "count": ">= 13", }), ) def __init__(self, match): PhilomenaExtractor.__init__(self, match) self.gallery_id = match.group(match.lastindex) def metadata(self): url = self.root + "/api/v1/json/search/galleries" params = {"q": "id:" + self.gallery_id} galleries = self.request(url, params=params).json()["galleries"] if not galleries: raise exception.NotFoundError("gallery") return {"gallery": galleries[0]} def posts(self): gallery_id = "gallery_id:" + self.gallery_id url = self.root + "/api/v1/json/search/images" params = {"sd": "desc", "sf": gallery_id, "q" : gallery_id} return self._pagination(url, params) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/photobucket.py0000644000175000017500000001473414176336637021471 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extract images from https://photobucket.com/""" from .common import Extractor, Message from .. 
import text, exception import base64 import json class PhotobucketAlbumExtractor(Extractor): """Extractor for albums on photobucket.com""" category = "photobucket" subcategory = "album" directory_fmt = ("{category}", "{username}", "{location}") filename_fmt = "{offset:>03}{pictureId:?_//}_{titleOrFilename}.{extension}" archive_fmt = "{id}" pattern = (r"(?:https?://)?((?:[\w-]+\.)?photobucket\.com)" r"/user/[^/?&#]+/library(?:/[^?&#]*)?") test = ( ("https://s369.photobucket.com/user/CrpyLrkr/library", { "pattern": r"https?://[oi]+\d+.photobucket.com/albums/oo139/", "count": ">= 50" }), # subalbums of main "directory" ("https://s271.photobucket.com/user/lakerfanryan/library/", { "options": (("image-filter", "False"),), "pattern": pattern, "count": 1, }), # subalbums of subalbum without images ("https://s271.photobucket.com/user/lakerfanryan/library/Basketball", { "pattern": pattern, "count": ">= 9", }), # private (missing JSON data) ("https://s1277.photobucket.com/user/sinisterkat44/library/", { "count": 0, }), ("https://s1110.photobucket.com/user/chndrmhn100/library/" "Chandu%20is%20the%20King?sort=3&page=1"), ) def __init__(self, match): Extractor.__init__(self, match) self.album_path = "" self.root = "https://" + match.group(1) self.session.headers["Referer"] = self.url def items(self): for image in self.images(): image["titleOrFilename"] = text.unescape(image["titleOrFilename"]) image["title"] = text.unescape(image["title"]) image["extension"] = image["ext"] yield Message.Directory, image yield Message.Url, image["fullsizeUrl"], image if self.config("subalbums", True): for album in self.subalbums(): album["_extractor"] = PhotobucketAlbumExtractor yield Message.Queue, album["url"], album def images(self): """Yield all images of the current album""" url = self.url params = {"sort": "3", "page": 1} while True: page = self.request(url, params=params).text json_data = text.extract(page, "collectionData:", ",\n")[0] if not json_data: msg = text.extract(page, 'libraryPrivacyBlock">', "")[0] msg = ' ("{}")'.format(text.remove_html(msg)) if msg else "" self.log.error("Unable to get JSON data%s", msg) return data = json.loads(json_data) yield from data["items"]["objects"] if data["total"] <= data["offset"] + data["pageSize"]: self.album_path = data["currentAlbumPath"] return params["page"] += 1 def subalbums(self): """Return all subalbum objects""" url = self.root + "/component/Albums-SubalbumList" params = { "albumPath": self.album_path, "fetchSubAlbumsOnly": "true", "deferCollapsed": "true", "json": "1", } data = self.request(url, params=params).json() return data["body"].get("subAlbums", ()) class PhotobucketImageExtractor(Extractor): """Extractor for individual images from photobucket.com""" category = "photobucket" subcategory = "image" directory_fmt = ("{category}", "{username}") filename_fmt = "{pictureId:?/_/}{titleOrFilename}.{extension}" archive_fmt = "{username}_{id}" pattern = (r"(?:https?://)?(?:[\w-]+\.)?photobucket\.com" r"(?:/gallery/user/([^/?&#]+)/media/([^/?&#]+)" r"|/user/([^/?&#]+)/media/[^?&#]+\.html)") test = ( (("https://s271.photobucket.com/user/lakerfanryan" "/media/Untitled-3-1.jpg.html"), { "url": "3b647deeaffc184cc48c89945f67574559c9051f", "keyword": "69732741b2b351db7ecaa77ace2fdb39f08ca5a3", }), (("https://s271.photobucket.com/user/lakerfanryan" "/media/IsotopeswBros.jpg.html?sort=3&o=2"), { "url": "12c1890c09c9cdb8a88fba7eec13f324796a8d7b", "keyword": "61200a223df6c06f45ac3d30c88b3f5b048ce9a8", }), ) def __init__(self, match): Extractor.__init__(self, match) 
self.user = match.group(1) or match.group(3) self.media_id = match.group(2) self.session.headers["Referer"] = self.url def items(self): url = "https://photobucket.com/galleryd/search.php" params = {"userName": self.user, "searchTerm": "", "ref": ""} if self.media_id: params["mediaId"] = self.media_id else: params["url"] = self.url # retry API call up to 5 times, since it can randomly fail tries = 0 while tries < 5: data = self.request(url, method="POST", params=params).json() image = data["mediaDocuments"] if "message" not in image: break # success tries += 1 self.log.debug(image["message"]) else: raise exception.StopExtraction(image["message"]) # adjust metadata entries to be at least somewhat similar # to what the 'album' extractor provides if "media" in image: image = image["media"][image["mediaIndex"]] image["albumView"] = data["mediaDocuments"]["albumView"] image["username"] = image["ownerId"] else: image["fileUrl"] = image.pop("imageUrl") image.setdefault("title", "") image.setdefault("description", "") name, _, ext = image["fileUrl"].rpartition("/")[2].rpartition(".") image["ext"] = image["extension"] = ext image["titleOrFilename"] = image["title"] or name image["tags"] = image.pop("clarifaiTagList", []) mtype, _, mid = base64.b64decode(image["id"]).partition(b":") image["pictureId"] = mid.decode() if mtype == b"mediaId" else "" yield Message.Directory, image yield Message.Url, image["fileUrl"], image ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1639190302.0 gallery_dl-1.21.1/gallery_dl/extractor/photovogue.py0000644000175000017500000000545214155007436021323 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.vogue.it/en/photovogue/""" from .common import Extractor, Message from .. import text BASE_PATTERN = r"(?:https?://)?(?:www\.)?vogue\.it/(?:en/)?photovogue" class PhotovogueUserExtractor(Extractor): category = "photovogue" subcategory = "user" directory_fmt = ("{category}", "{photographer[id]} {photographer[name]}") filename_fmt = "{id} {title}.{extension}" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/portfolio/?\?id=(\d+)" test = ( ("https://www.vogue.it/en/photovogue/portfolio/?id=221252"), ("https://vogue.it/photovogue/portfolio?id=221252", { "pattern": r"https://images.vogue.it/Photovogue/[^/]+_gallery.jpg", "keyword": { "date": "type:datetime", "favorite_count": int, "favorited": list, "id": int, "image_id": str, "is_favorite": False, "orientation": "re:portrait|landscape", "photographer": { "biography": "Born in 1995. 
Live in Bologna.", "city": "Bologna", "country_id": 106, "favoritedCount": int, "id": 221252, "isGold": bool, "isPro": bool, "latitude": str, "longitude": str, "name": "Arianna Mattarozzi", "user_id": "38cb0601-4a85-453c-b7dc-7650a037f2ab", "websites": list, }, "photographer_id": 221252, "tags": list, "title": str, }, }), ) def __init__(self, match): Extractor.__init__(self, match) self.user_id = match.group(1) def items(self): for photo in self.photos(): url = photo["gallery_image"] photo["title"] = photo["title"].strip() photo["date"] = text.parse_datetime( photo["date"], "%Y-%m-%dT%H:%M:%S.%f%z") yield Message.Directory, photo yield Message.Url, url, text.nameext_from_url(url, photo) def photos(self): url = "https://api.vogue.it/production/photos" params = { "count": "50", "order_by": "DESC", "page": 0, "photographer_id": self.user_id, } while True: data = self.request(url, params=params).json() yield from data["items"] if not data["has_next"]: break params["page"] += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/picarto.py0000644000175000017500000000473114176336637020577 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://picarto.tv/""" from .common import Extractor, Message from .. import text class PicartoGalleryExtractor(Extractor): """Extractor for picarto galleries""" category = "picarto" subcategory = "gallery" root = "https://picarto.tv" directory_fmt = ("{category}", "{channel[name]}") filename_fmt = "{id} {title}.{extension}" archive_fmt = "{id}" pattern = r"(?:https?://)?picarto\.tv/([^/?#]+)/gallery" test = ("https://picarto.tv/fnook/gallery/default/", { "pattern": r"https://images\.picarto\.tv/gallery/\d/\d\d/\d+/artwork" r"/[0-9a-f-]+/large-[0-9a-f]+\.(jpg|png|gif)", "count": ">= 7", "keyword": {"date": "type:datetime"}, }) def __init__(self, match): Extractor.__init__(self, match) self.username = match.group(1) def items(self): for post in self.posts(): post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%d %H:%M:%S") variations = post.pop("variations", ()) yield Message.Directory, post image = post["default_image"] if not image: continue url = "https://images.picarto.tv/gallery/" + image["name"] text.nameext_from_url(url, post) yield Message.Url, url, post for variation in variations: post.update(variation) image = post["default_image"] url = "https://images.picarto.tv/gallery/" + image["name"] text.nameext_from_url(url, post) yield Message.Url, url, post def posts(self): url = "https://ptvintern.picarto.tv/api/channel-gallery" params = { "first": "30", "page": 1, "filter_params[album_id]": "", "filter_params[channel_name]": self.username, "filter_params[q]": "", "filter_params[visibility]": "", "order_by[field]": "published_at", "order_by[order]": "DESC", } while True: posts = self.request(url, params=params).json() if not posts: return yield from posts params["page"] += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/piczel.py0000644000175000017500000001117214176336637020421 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2018-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU 
General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://piczel.tv/""" from .common import Extractor, Message from .. import text class PiczelExtractor(Extractor): """Base class for piczel extractors""" category = "piczel" directory_fmt = ("{category}", "{user[username]}") filename_fmt = "{category}_{id}_{title}_{num:>02}.{extension}" archive_fmt = "{id}_{num}" root = "https://piczel.tv" api_root = "https://tombstone.piczel.tv" def items(self): for post in self.posts(): post["tags"] = [t["title"] for t in post["tags"] if t["title"]] post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z") if post["multi"]: images = post["images"] del post["images"] yield Message.Directory, post for post["num"], image in enumerate(images): if "id" in image: del image["id"] post.update(image) url = post["image"]["url"] yield Message.Url, url, text.nameext_from_url(url, post) else: yield Message.Directory, post post["num"] = 0 url = post["image"]["url"] yield Message.Url, url, text.nameext_from_url(url, post) def posts(self): """Return an iterable with all relevant post objects""" def _pagination(self, url, folder_id=None): params = { "from_id" : None, "folder_id": folder_id, } while True: data = self.request(url, params=params).json() if not data: return params["from_id"] = data[-1]["id"] for post in data: if not folder_id or folder_id == post["folder_id"]: yield post class PiczelUserExtractor(PiczelExtractor): """Extractor for all images from a user's gallery""" subcategory = "user" pattern = r"(?:https?://)?(?:www\.)?piczel\.tv/gallery/([^/?#]+)/?$" test = ("https://piczel.tv/gallery/Bikupan", { "range": "1-100", "count": ">= 100", }) def __init__(self, match): PiczelExtractor.__init__(self, match) self.user = match.group(1) def posts(self): url = "{}/api/users/{}/gallery".format(self.api_root, self.user) return self._pagination(url) class PiczelFolderExtractor(PiczelExtractor): """Extractor for images inside a user's folder""" subcategory = "folder" directory_fmt = ("{category}", "{user[username]}", "{folder[name]}") archive_fmt = "f{folder[id]}_{id}_{num}" pattern = (r"(?:https?://)?(?:www\.)?piczel\.tv" r"/gallery/(?!image)([^/?#]+)/(\d+)") test = ("https://piczel.tv/gallery/Lulena/1114", { "count": ">= 4", }) def __init__(self, match): PiczelExtractor.__init__(self, match) self.user, self.folder_id = match.groups() def posts(self): url = "{}/api/users/{}/gallery".format(self.api_root, self.user) return self._pagination(url, int(self.folder_id)) class PiczelImageExtractor(PiczelExtractor): """Extractor for individual images""" subcategory = "image" pattern = r"(?:https?://)?(?:www\.)?piczel\.tv/gallery/image/(\d+)" test = ("https://piczel.tv/gallery/image/7807", { "pattern": r"https://(\w+\.)?piczel\.tv/static/uploads/gallery_image" r"/32920/image/7807/25737334-Lulena\.png", "content": "df9a053a24234474a19bce2b7e27e0dec23bff87", "keyword": { "created_at": "2018-07-22T05:13:58.000Z", "date": "dt:2018-07-22 05:13:58", "description": None, "extension": "png", "favorites_count": int, "folder_id": 1113, "id": 7807, "is_flash": False, "is_video": False, "multi": False, "nsfw": False, "num": 0, "password_protected": False, "tags": ["fanart", "commission", "altair", "recreators"], "title": "Altair", "user": dict, "views": int, }, }) def __init__(self, match): PiczelExtractor.__init__(self, match) self.image_id = match.group(1) def posts(self): url = "{}/api/gallery/{}".format(self.api_root, self.image_id) return 
(self.request(url).json(),) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1646253139.0 gallery_dl-1.21.1/gallery_dl/extractor/pillowfort.py0000644000175000017500000001622714207752123021325 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.pillowfort.social/""" from .common import Extractor, Message from ..cache import cache from .. import text, exception import re BASE_PATTERN = r"(?:https?://)?www\.pillowfort\.social" class PillowfortExtractor(Extractor): """Base class for pillowfort extractors""" category = "pillowfort" root = "https://www.pillowfort.social" directory_fmt = ("{category}", "{username}") filename_fmt = ("{post_id} {title|original_post[title]:?/ /}" "{num:>02}.{extension}") archive_fmt = "{id}" cookiedomain = "www.pillowfort.social" def __init__(self, match): Extractor.__init__(self, match) self.item = match.group(1) def items(self): self.login() inline = self.config("inline", True) reblogs = self.config("reblogs", False) external = self.config("external", False) if inline: inline = re.compile(r'src="(https://img\d+\.pillowfort\.social' r'/posts/[^"]+)').findall for post in self.posts(): if "original_post" in post and not reblogs: continue files = post.pop("media") if inline: for url in inline(post["content"]): files.append({"url": url}) post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z") post["post_id"] = post.pop("id") yield Message.Directory, post post["num"] = 0 for file in files: url = file["url"] if not url: continue if file.get("embed_code"): if not external: continue msgtype = Message.Queue else: post["num"] += 1 msgtype = Message.Url post.update(file) text.nameext_from_url(url, post) post["hash"], _, post["filename"] = \ post["filename"].partition("_") if "id" not in file: post["id"] = post["hash"] if "created_at" in file: post["date"] = text.parse_datetime( file["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z") yield msgtype, url, post def login(self): cget = self.session.cookies.get if cget("_Pf_new_session", domain=self.cookiedomain) \ or cget("remember_user_token", domain=self.cookiedomain): return username, password = self._get_auth_info() if username: cookies = self._login_impl(username, password) self._update_cookies(cookies) @cache(maxage=14*24*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = "https://www.pillowfort.social/users/sign_in" page = self.request(url).text auth = text.extract(page, 'name="authenticity_token" value="', '"')[0] headers = {"Origin": self.root, "Referer": url} data = { "utf8" : "✓", "authenticity_token": auth, "user[email]" : username, "user[password]" : password, "user[remember_me]" : "1", } response = self.request(url, method="POST", headers=headers, data=data) if not response.history: raise exception.AuthenticationError() return { cookie.name: cookie.value for cookie in response.history[0].cookies } class PillowfortPostExtractor(PillowfortExtractor): """Extractor for a single pillowfort post""" subcategory = "post" pattern = BASE_PATTERN + r"/posts/(\d+)" test = ( ("https://www.pillowfort.social/posts/27510", { "pattern": r"https://img\d+\.pillowfort\.social" r"/posts/\w+_out\d+\.png", "count": 4, "keyword": { "avatar_url": str, "col": 0, "commentable": True, "comments_count": 
int, "community_id": None, "content": str, "created_at": str, "date": "type:datetime", "deleted": None, "deleted_at": None, "deleted_by_mod": None, "deleted_for_flag_id": None, "embed_code": None, "id": int, "last_activity": str, "last_activity_elapsed": str, "last_edited_at": str, "likes_count": int, "media_type": "picture", "nsfw": False, "num": int, "original_post_id": None, "original_post_user_id": None, "picture_content_type": None, "picture_file_name": None, "picture_file_size": None, "picture_updated_at": None, "post_id": 27510, "post_type": "picture", "privacy": "public", "reblog_copy_info": list, "rebloggable": True, "reblogged_from_post_id": None, "reblogged_from_user_id": None, "reblogs_count": int, "row": int, "small_image_url": None, "tags": list, "time_elapsed": str, "timestamp": str, "title": "What is Pillowfort.social?", "updated_at": str, "url": r"re:https://img3.pillowfort.social/posts/.*\.png", "user_id": 5, "username": "Staff" }, }), ("https://www.pillowfort.social/posts/1557500", { "options": (("external", True), ("inline", False)), "pattern": r"https://twitter\.com/Aliciawitdaart/status" r"/1282862493841457152", }), ("https://www.pillowfort.social/posts/1672518", { "options": (("inline", True),), "count": 3, }), ) def posts(self): url = "{}/posts/{}/json/".format(self.root, self.item) return (self.request(url).json(),) class PillowfortUserExtractor(PillowfortExtractor): """Extractor for all posts of a pillowfort user""" subcategory = "user" pattern = BASE_PATTERN + r"/(?!posts/)([^/?#]+)" test = ("https://www.pillowfort.social/Pome", { "pattern": r"https://img\d+\.pillowfort\.social/posts/", "range": "1-15", "count": 15, }) def posts(self): url = "{}/{}/json/".format(self.root, self.item) params = {"p": 1} while True: posts = self.request(url, params=params).json()["posts"] yield from posts if len(posts) < 20: return params["p"] += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648847700.0 gallery_dl-1.21.1/gallery_dl/extractor/pinterest.py0000644000175000017500000004253714221665524021150 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.pinterest.com/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache import itertools import json BASE_PATTERN = r"(?:https?://)?(?:\w+\.)?pinterest\.[\w.]+" class PinterestExtractor(Extractor): """Base class for pinterest extractors""" category = "pinterest" filename_fmt = "{category}_{id}{media_id:?_//}.{extension}" archive_fmt = "{id}{media_id}" root = "https://www.pinterest.com" def __init__(self, match): Extractor.__init__(self, match) self.api = PinterestAPI(self) def items(self): self.api.login() data = self.metadata() videos = self.config("videos", True) yield Message.Directory, data for pin in self.pins(): pin.update(data) carousel_data = pin.get("carousel_data") if carousel_data: for num, slot in enumerate(carousel_data["carousel_slots"], 1): slot["media_id"] = slot.pop("id") pin.update(slot) pin["num"] = num size, image = next(iter(slot["images"].items())) url = image["url"].replace("/" + size + "/", "/originals/") yield Message.Url, url, text.nameext_from_url(url, pin) else: try: media = self._media_from_pin(pin) except Exception: self.log.debug("Unable to fetch download URL for pin %s", pin.get("id")) continue if videos or media.get("duration") is None: pin.update(media) pin["num"] = 0 pin["media_id"] = "" url = media["url"] text.nameext_from_url(url, pin) if pin["extension"] == "m3u8": url = "ytdl:" + url pin["extension"] = "mp4" yield Message.Url, url, pin def metadata(self): """Return general metadata""" def pins(self): """Return all relevant pin objects""" @staticmethod def _media_from_pin(pin): videos = pin.get("videos") if videos: video_formats = videos["video_list"] for fmt in ("V_HLSV4", "V_HLSV3_WEB", "V_HLSV3_MOBILE"): if fmt in video_formats: media = video_formats[fmt] break else: media = max(video_formats.values(), key=lambda x: x.get("width", 0)) if "V_720P" in video_formats: media["_fallback"] = (video_formats["V_720P"]["url"],) return media return pin["images"]["orig"] class PinterestPinExtractor(PinterestExtractor): """Extractor for images from a single pin from pinterest.com""" subcategory = "pin" pattern = BASE_PATTERN + r"/pin/([^/?#&]+)(?!.*#related$)" test = ( ("https://www.pinterest.com/pin/858146903966145189/", { "url": "afb3c26719e3a530bb0e871c480882a801a4e8a5", "content": ("4c435a66f6bb82bb681db2ecc888f76cf6c5f9ca", "d3e24bc9f7af585e8c23b9136956bd45a4d9b947"), }), # video pin (#1189) ("https://www.pinterest.com/pin/422564377542934214/", { "pattern": r"https://v\.pinimg\.com/videos/mc/hls/d7/22/ff" r"/d722ff00ab2352981b89974b37909de8.m3u8", }), ("https://www.pinterest.com/pin/858146903966145188/", { "exception": exception.NotFoundError, }), ) def __init__(self, match): PinterestExtractor.__init__(self, match) self.pin_id = match.group(1) self.pin = None def metadata(self): self.pin = self.api.pin(self.pin_id) return self.pin def pins(self): return (self.pin,) class PinterestBoardExtractor(PinterestExtractor): """Extractor for images from a board from pinterest.com""" subcategory = "board" directory_fmt = ("{category}", "{board[owner][username]}", "{board[name]}") archive_fmt = "{board[id]}_{id}" pattern = (BASE_PATTERN + r"/(?!pin/)([^/?#&]+)" "/(?!_saved|_created)([^/?#&]+)/?$") test = ( ("https://www.pinterest.com/g1952849/test-/", { "pattern": r"https://i\.pinimg\.com/originals/", "count": 2, }), # board with sections (#835) ("https://www.pinterest.com/g1952849/stuff/", { "options": (("sections", True),), "count": 5, }), # secret board (#1055) ("https://www.pinterest.de/g1952849/secret/", { "count": 2, }), ("https://www.pinterest.com/g1952848/test/", { 
"exception": exception.GalleryDLException, }), # .co.uk TLD (#914) ("https://www.pinterest.co.uk/hextra7519/based-animals/"), ) def __init__(self, match): PinterestExtractor.__init__(self, match) self.user = text.unquote(match.group(1)) self.board_name = text.unquote(match.group(2)) self.board = None def metadata(self): self.board = self.api.board(self.user, self.board_name) return {"board": self.board} def pins(self): board = self.board if board["section_count"] and self.config("sections", True): pins = [self.api.board_pins(board["id"])] for section in self.api.board_sections(board["id"]): pins.append(self.api.board_section_pins(section["id"])) return itertools.chain.from_iterable(pins) else: return self.api.board_pins(board["id"]) class PinterestUserExtractor(PinterestExtractor): """Extractor for a user's boards""" subcategory = "user" pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)(?:/_saved)?/?$" test = ( ("https://www.pinterest.de/g1952849/", { "pattern": PinterestBoardExtractor.pattern, "count": ">= 2", }), ("https://www.pinterest.de/g1952849/_saved/"), ) def __init__(self, match): PinterestExtractor.__init__(self, match) self.user = text.unquote(match.group(1)) def items(self): for board in self.api.boards(self.user): url = board.get("url") if url: board["_extractor"] = PinterestBoardExtractor yield Message.Queue, self.root + url, board class PinterestCreatedExtractor(PinterestExtractor): """Extractor for a user's created pins""" subcategory = "created" directory_fmt = ("{category}", "{user}") pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)/_created/?$" test = ("https://www.pinterest.com/amazon/_created", { "pattern": r"https://i\.pinimg\.com/originals/[0-9a-f]{2}" r"/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{32}\.jpg", "count": 10, }) def __init__(self, match): PinterestExtractor.__init__(self, match) self.user = text.unquote(match.group(1)) def metadata(self): return {"user": self.user} def pins(self): return self.api.user_activity_pins(self.user) class PinterestSectionExtractor(PinterestExtractor): """Extractor for board sections on pinterest.com""" subcategory = "section" directory_fmt = ("{category}", "{board[owner][username]}", "{board[name]}", "{section[title]}") archive_fmt = "{board[id]}_{id}" pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)/([^/?#&]+)/([^/?#&]+)" test = ("https://www.pinterest.com/g1952849/stuff/section", { "count": 2, }) def __init__(self, match): PinterestExtractor.__init__(self, match) self.user = text.unquote(match.group(1)) self.board_slug = text.unquote(match.group(2)) self.section_slug = text.unquote(match.group(3)) self.section = None def metadata(self): section = self.section = self.api.board_section( self.user, self.board_slug, self.section_slug) section.pop("preview_pins", None) return {"board": section.pop("board"), "section": section} def pins(self): return self.api.board_section_pins(self.section["id"]) class PinterestSearchExtractor(PinterestExtractor): """Extractor for Pinterest search results""" subcategory = "search" directory_fmt = ("{category}", "Search", "{search}") pattern = BASE_PATTERN + r"/search/pins/?\?q=([^&#]+)" test = ("https://www.pinterest.de/search/pins/?q=nature", { "range": "1-50", "count": ">= 50", }) def __init__(self, match): PinterestExtractor.__init__(self, match) self.search = match.group(1) def metadata(self): return {"search": self.search} def pins(self): return self.api.search(self.search) class PinterestRelatedPinExtractor(PinterestPinExtractor): """Extractor for related pins of another pin from pinterest.com""" subcategory 
= "related-pin" directory_fmt = ("{category}", "related {original_pin[id]}") pattern = BASE_PATTERN + r"/pin/([^/?#&]+).*#related$" test = ("https://www.pinterest.com/pin/858146903966145189/#related", { "range": "31-70", "count": 40, "archive": False, }) def metadata(self): return {"original_pin": self.api.pin(self.pin_id)} def pins(self): return self.api.pin_related(self.pin_id) class PinterestRelatedBoardExtractor(PinterestBoardExtractor): """Extractor for related pins of a board from pinterest.com""" subcategory = "related-board" directory_fmt = ("{category}", "{board[owner][username]}", "{board[name]}", "related") pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)/([^/?#&]+)/?#related$" test = ("https://www.pinterest.com/g1952849/test-/#related", { "range": "31-70", "count": 40, "archive": False, }) def pins(self): return self.api.board_related(self.board["id"]) class PinterestPinitExtractor(PinterestExtractor): """Extractor for images from a pin.it URL""" subcategory = "pinit" pattern = r"(?:https?://)?pin\.it/([^/?#&]+)" test = ( ("https://pin.it/Hvt8hgT", { "url": "8daad8558382c68f0868bdbd17d05205184632fa", }), ("https://pin.it/Hvt8hgS", { "exception": exception.NotFoundError, }), ) def __init__(self, match): PinterestExtractor.__init__(self, match) self.shortened_id = match.group(1) def items(self): url = "https://api.pinterest.com/url_shortener/{}/redirect".format( self.shortened_id) response = self.request(url, method="HEAD", allow_redirects=False) location = response.headers.get("Location") if not location or not PinterestPinExtractor.pattern.match(location): raise exception.NotFoundError("pin") yield Message.Queue, location, {"_extractor": PinterestPinExtractor} class PinterestAPI(): """Minimal interface for the Pinterest Web API For a better and more complete implementation in PHP, see - https://github.com/seregazhuk/php-pinterest-bot """ BASE_URL = "https://www.pinterest.com" HEADERS = { "Accept" : "application/json, text/javascript, " "*/*, q=0.01", "Accept-Language" : "en-US,en;q=0.5", "Referer" : BASE_URL + "/", "X-Requested-With" : "XMLHttpRequest", "X-APP-VERSION" : "31461e0", "X-CSRFToken" : None, "X-Pinterest-AppState": "active", "Origin" : BASE_URL, } def __init__(self, extractor): self.extractor = extractor csrf_token = util.generate_token() self.headers = self.HEADERS.copy() self.headers["X-CSRFToken"] = csrf_token self.cookies = {"csrftoken": csrf_token} def pin(self, pin_id): """Query information about a pin""" options = {"id": pin_id, "field_set_key": "detailed"} return self._call("Pin", options)["resource_response"]["data"] def pin_related(self, pin_id): """Yield related pins of another pin""" options = {"pin": pin_id, "add_vase": True, "pins_only": True} return self._pagination("RelatedPinFeed", options) def board(self, user, board_name): """Query information about a board""" options = {"slug": board_name, "username": user, "field_set_key": "detailed"} return self._call("Board", options)["resource_response"]["data"] def boards(self, user): """Yield all boards from 'user'""" options = { "sort" : "last_pinned_to", "field_set_key" : "profile_grid_item", "filter_stories" : False, "username" : user, "page_size" : 25, "include_archived": True, } return self._pagination("Boards", options) def board_pins(self, board_id): """Yield all pins of a specific board""" options = {"board_id": board_id} return self._pagination("BoardFeed", options) def board_section(self, user, board_slug, section_slug): """Yield a specific board section""" options = {"board_slug": board_slug, 
"section_slug": section_slug, "username": user} return self._call("BoardSection", options)["resource_response"]["data"] def board_sections(self, board_id): """Yield all sections of a specific board""" options = {"board_id": board_id} return self._pagination("BoardSections", options) def board_section_pins(self, section_id): """Yield all pins from a board section""" options = {"section_id": section_id} return self._pagination("BoardSectionPins", options) def board_related(self, board_id): """Yield related pins of a specific board""" options = {"board_id": board_id, "add_vase": True} return self._pagination("BoardRelatedPixieFeed", options) def user_activity_pins(self, user): """Yield pins created by 'user'""" options = { "exclude_add_pin_rep": True, "field_set_key" : "grid_item", "is_own_profile_pins": False, "username" : user, } return self._pagination("UserActivityPins", options) def search(self, query): """Yield pins from searches""" options = {"query": query, "scope": "pins", "rs": "typed"} return self._pagination("BaseSearch", options) def login(self): """Login and obtain session cookies""" username, password = self.extractor._get_auth_info() if username: self.cookies.update(self._login_impl(username, password)) @cache(maxage=180*24*3600, keyarg=1) def _login_impl(self, username, password): self.extractor.log.info("Logging in as %s", username) url = self.BASE_URL + "/resource/UserSessionResource/create/" options = { "username_or_email": username, "password" : password, } data = {"data": json.dumps({"options": options}), "source_url": ""} try: response = self.extractor.request( url, method="POST", headers=self.headers, cookies=self.cookies, data=data) resource = response.json()["resource_response"] except (exception.HttpError, ValueError, KeyError): raise exception.AuthenticationError() if resource["status"] != "success": raise exception.AuthenticationError() return { cookie.name: cookie.value for cookie in response.cookies } def _call(self, resource, options): url = "{}/resource/{}Resource/get/".format(self.BASE_URL, resource) params = {"data": json.dumps({"options": options}), "source_url": ""} response = self.extractor.request( url, params=params, headers=self.headers, cookies=self.cookies, fatal=False) try: data = response.json() except ValueError: data = {} if response.status_code < 400 and not response.history: return data if response.status_code == 404 or response.history: resource = self.extractor.subcategory.rpartition("-")[2] raise exception.NotFoundError(resource) self.extractor.log.debug("Server response: %s", response.text) raise exception.StopExtraction("API request failed") def _pagination(self, resource, options): while True: data = self._call(resource, options) results = data["resource_response"]["data"] if isinstance(results, dict): results = results["results"] yield from results try: bookmarks = data["resource"]["options"]["bookmarks"] if (not bookmarks or bookmarks[0] == "-end-" or bookmarks[0].startswith("Y2JOb25lO")): return options["bookmarks"] = bookmarks except KeyError: return ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/pixiv.py0000644000175000017500000006673314176336637020307 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
"""Extractors for https://www.pixiv.net/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache from datetime import datetime, timedelta import itertools import hashlib import time class PixivExtractor(Extractor): """Base class for pixiv extractors""" category = "pixiv" directory_fmt = ("{category}", "{user[id]} {user[account]}") filename_fmt = "{id}_p{num}.{extension}" archive_fmt = "{id}{suffix}.{extension}" cookiedomain = None def __init__(self, match): Extractor.__init__(self, match) self.api = PixivAppAPI(self) self.load_ugoira = self.config("ugoira", True) self.max_posts = self.config("max-posts", 0) def items(self): tags = self.config("tags", "japanese") if tags == "original": transform_tags = None elif tags == "translated": def transform_tags(work): work["tags"] = list(set( tag["translated_name"] or tag["name"] for tag in work["tags"])) else: def transform_tags(work): work["tags"] = [tag["name"] for tag in work["tags"]] ratings = {0: "General", 1: "R-18", 2: "R-18G"} metadata = self.metadata() works = self.works() if self.max_posts: works = itertools.islice(works, self.max_posts) for work in works: if not work["user"]["id"]: continue meta_single_page = work["meta_single_page"] meta_pages = work["meta_pages"] del work["meta_single_page"] del work["image_urls"] del work["meta_pages"] if transform_tags: transform_tags(work) work["num"] = 0 work["date"] = text.parse_datetime(work["create_date"]) work["rating"] = ratings.get(work["x_restrict"]) work["suffix"] = "" work.update(metadata) yield Message.Directory, work if work["type"] == "ugoira": if not self.load_ugoira: continue ugoira = self.api.ugoira_metadata(work["id"]) url = ugoira["zip_urls"]["medium"].replace( "_ugoira600x600", "_ugoira1920x1080") work["frames"] = ugoira["frames"] work["_http_adjust_extension"] = False yield Message.Url, url, text.nameext_from_url(url, work) elif work["page_count"] == 1: url = meta_single_page["original_image_url"] yield Message.Url, url, text.nameext_from_url(url, work) else: for work["num"], img in enumerate(meta_pages): url = img["image_urls"]["original"] work["suffix"] = "_p{:02}".format(work["num"]) yield Message.Url, url, text.nameext_from_url(url, work) def works(self): """Return an iterable containing all relevant 'work'-objects""" def metadata(self): """Collect metadata for extractor-job""" return {} class PixivUserExtractor(PixivExtractor): """Extractor for works of a pixiv-user""" subcategory = "user" pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net/(?:" r"(?:en/)?users/(\d+)(?:/(?:artworks|illustrations|manga)" r"(?:/([^/?#]+))?)?/?(?:$|[?#])" r"|member(?:_illust)?\.php\?id=(\d+)(?:&([^#]+))?" 
r"|(?:u(?:ser)?/|(?:mypage\.php)?#id=)(\d+))") test = ( ("https://www.pixiv.net/en/users/173530/artworks", { "url": "852c31ad83b6840bacbce824d85f2a997889efb7", }), # illusts with specific tag (("https://www.pixiv.net/en/users/173530/artworks" "/%E6%89%8B%E3%81%B6%E3%82%8D"), { "url": "25b1cd81153a8ff82eec440dd9f20a4a22079658", }), (("https://www.pixiv.net/member_illust.php?id=173530" "&tag=%E6%89%8B%E3%81%B6%E3%82%8D"), { "url": "25b1cd81153a8ff82eec440dd9f20a4a22079658", }), # avatar (#595, 623) ("https://www.pixiv.net/en/users/173530", { "options": (("avatar", True),), "content": "4e57544480cc2036ea9608103e8f024fa737fe66", "range": "1", }), # deleted account ("http://www.pixiv.net/member_illust.php?id=173531", { "options": (("metadata", True),), "exception": exception.NotFoundError, }), ("https://www.pixiv.net/en/users/173530"), ("https://www.pixiv.net/en/users/173530/manga"), ("https://www.pixiv.net/en/users/173530/illustrations"), ("https://www.pixiv.net/member_illust.php?id=173530"), ("https://www.pixiv.net/u/173530"), ("https://www.pixiv.net/user/173530"), ("https://www.pixiv.net/mypage.php#id=173530"), ("https://www.pixiv.net/#id=173530"), ("https://touch.pixiv.net/member_illust.php?id=173530"), ) def __init__(self, match): PixivExtractor.__init__(self, match) u1, t1, u2, t2, u3 = match.groups() if t1: t1 = text.unquote(t1) elif t2: t2 = text.parse_query(t2).get("tag") self.user_id = u1 or u2 or u3 self.tag = t1 or t2 def metadata(self): if self.config("metadata"): return {"user": self.api.user_detail(self.user_id)} return {} def works(self): works = self.api.user_illusts(self.user_id) if self.tag: tag = self.tag.lower() works = ( work for work in works if tag in [t["name"].lower() for t in work["tags"]] ) if self.config("avatar"): user = self.api.user_detail(self.user_id) url = user["profile_image_urls"]["medium"].replace("_170.", ".") avatar = { "create_date" : None, "height" : 0, "id" : "avatar", "image_urls" : None, "meta_pages" : (), "meta_single_page": {"original_image_url": url}, "page_count" : 1, "sanity_level" : 0, "tags" : (), "title" : "avatar", "type" : "avatar", "user" : user, "width" : 0, "x_restrict" : 0, } works = itertools.chain((avatar,), works) return works class PixivMeExtractor(PixivExtractor): """Extractor for pixiv.me URLs""" subcategory = "me" pattern = r"(?:https?://)?pixiv\.me/([^/?#]+)" test = ( ("https://pixiv.me/del_shannon", { "url": "29c295ce75150177e6b0a09089a949804c708fbf", }), ("https://pixiv.me/del_shanno", { "exception": exception.NotFoundError, }), ) def __init__(self, match): PixivExtractor.__init__(self, match) self.account = match.group(1) def items(self): url = "https://pixiv.me/" + self.account data = {"_extractor": PixivUserExtractor} response = self.request( url, method="HEAD", allow_redirects=False, notfound="user") yield Message.Queue, response.headers["Location"], data class PixivWorkExtractor(PixivExtractor): """Extractor for a single pixiv work/illustration""" subcategory = "work" pattern = (r"(?:https?://)?(?:(?:www\.|touch\.)?pixiv\.net" r"/(?:(?:en/)?artworks/" r"|member_illust\.php\?(?:[^&]+&)*illust_id=)(\d+)" r"|(?:i(?:\d+\.pixiv|\.pximg)\.net" r"/(?:(?:.*/)?img-[^/]+/img/\d{4}(?:/\d\d){5}|img\d+/img/[^/]+)" r"|img\d*\.pixiv\.net/img/[^/]+|(?:www\.)?pixiv\.net/i)/(\d+))") test = ( ("https://www.pixiv.net/artworks/966412", { "url": "90c1715b07b0d1aad300bce256a0bc71f42540ba", "content": "69a8edfb717400d1c2e146ab2b30d2c235440c5a", }), (("http://www.pixiv.net/member_illust.php" "?mode=medium&illust_id=966411"), { "exception": 
exception.NotFoundError, }), # ugoira (("https://www.pixiv.net/member_illust.php" "?mode=medium&illust_id=66806629"), { "url": "7267695a985c4db8759bebcf8d21dbdd2d2317ef", "keywords": {"frames": list}, }), # related works (#1237) ("https://www.pixiv.net/artworks/966412", { "options": (("related", True),), "range": "1-10", "count": ">= 10", }), ("https://www.pixiv.net/en/artworks/966412"), ("http://www.pixiv.net/member_illust.php?mode=medium&illust_id=96641"), ("http://i1.pixiv.net/c/600x600/img-master" "/img/2008/06/13/00/29/13/966412_p0_master1200.jpg"), ("https://i.pximg.net/img-original" "/img/2017/04/25/07/33/29/62568267_p0.png"), ("https://www.pixiv.net/i/966412"), ("http://img.pixiv.net/img/soundcross/42626136.jpg"), ("http://i2.pixiv.net/img76/img/snailrin/42672235.jpg"), ) def __init__(self, match): PixivExtractor.__init__(self, match) self.illust_id = match.group(1) or match.group(2) def works(self): works = (self.api.illust_detail(self.illust_id),) if self.config("related", False): related = self.api.illust_related(self.illust_id) works = itertools.chain(works, related) return works class PixivFavoriteExtractor(PixivExtractor): """Extractor for all favorites/bookmarks of a pixiv-user""" subcategory = "favorite" directory_fmt = ("{category}", "bookmarks", "{user_bookmark[id]} {user_bookmark[account]}") archive_fmt = "f_{user_bookmark[id]}_{id}{num}.{extension}" pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net/(?:(?:en/)?" r"users/(\d+)/(bookmarks/artworks|following)(?:/([^/?#]+))?" r"|bookmark\.php)(?:\?([^#]*))?") test = ( ("https://www.pixiv.net/en/users/173530/bookmarks/artworks", { "url": "e717eb511500f2fa3497aaee796a468ecf685cc4", }), ("https://www.pixiv.net/bookmark.php?id=173530", { "url": "e717eb511500f2fa3497aaee796a468ecf685cc4", }), # bookmarks with specific tag (("https://www.pixiv.net/en/users/3137110" "/bookmarks/artworks/%E3%81%AF%E3%82%93%E3%82%82%E3%82%93"), { "url": "379b28275f786d946e01f721e54afe346c148a8c", }), # bookmarks with specific tag (legacy url) (("https://www.pixiv.net/bookmark.php?id=3137110" "&tag=%E3%81%AF%E3%82%93%E3%82%82%E3%82%93&p=1"), { "url": "379b28275f786d946e01f721e54afe346c148a8c", }), # own bookmarks ("https://www.pixiv.net/bookmark.php", { "url": "90c1715b07b0d1aad300bce256a0bc71f42540ba", }), # own bookmarks with tag (#596) ("https://www.pixiv.net/bookmark.php?tag=foobar", { "count": 0, }), # followed users (#515) ("https://www.pixiv.net/en/users/173530/following", { "pattern": PixivUserExtractor.pattern, "count": ">= 12", }), # followed users (legacy url) (#515) ("https://www.pixiv.net/bookmark.php?id=173530&type=user", { "pattern": PixivUserExtractor.pattern, "count": ">= 12", }), # touch URLs ("https://touch.pixiv.net/bookmark.php?id=173530"), ("https://touch.pixiv.net/bookmark.php"), ) def __init__(self, match): uid, kind, self.tag, query = match.groups() query = text.parse_query(query) if not uid: uid = query.get("id") if not uid: self.subcategory = "bookmark" if kind == "following" or query.get("type") == "user": self.subcategory = "following" self.items = self._items_following PixivExtractor.__init__(self, match) self.query = query self.user_id = uid def works(self): tag = None if "tag" in self.query: tag = text.unquote(self.query["tag"]) elif self.tag: tag = text.unquote(self.tag) restrict = "public" if self.query.get("rest") == "hide": restrict = "private" return self.api.user_bookmarks_illust(self.user_id, tag, restrict) def metadata(self): if self.user_id: user = self.api.user_detail(self.user_id) else: self.api.login() 
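# no user ID in the URL means "own bookmarks": logging in stores the
# authenticated account in api.user, which supplies the missing ID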
user = self.api.user self.user_id = user["id"] return {"user_bookmark": user} def _items_following(self): restrict = "public" if self.query.get("rest") == "hide": restrict = "private" for preview in self.api.user_following(self.user_id, restrict): user = preview["user"] user["_extractor"] = PixivUserExtractor url = "https://www.pixiv.net/users/{}".format(user["id"]) yield Message.Queue, url, user class PixivRankingExtractor(PixivExtractor): """Extractor for pixiv ranking pages""" subcategory = "ranking" archive_fmt = "r_{ranking[mode]}_{ranking[date]}_{id}{num}.{extension}" directory_fmt = ("{category}", "rankings", "{ranking[mode]}", "{ranking[date]}") pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net" r"/ranking\.php(?:\?([^#]*))?") test = ( ("https://www.pixiv.net/ranking.php?mode=daily&date=20170818"), ("https://www.pixiv.net/ranking.php"), ("https://touch.pixiv.net/ranking.php"), ) def __init__(self, match): PixivExtractor.__init__(self, match) self.query = match.group(1) self.mode = self.date = None def works(self): return self.api.illust_ranking(self.mode, self.date) def metadata(self): query = text.parse_query(self.query) mode = query.get("mode", "daily").lower() mode_map = { "daily": "day", "daily_r18": "day_r18", "weekly": "week", "weekly_r18": "week_r18", "monthly": "month", "male": "day_male", "male_r18": "day_male_r18", "female": "day_female", "female_r18": "day_female_r18", "original": "week_original", "rookie": "week_rookie", "r18g": "week_r18g", } if mode not in mode_map: self.log.warning("invalid mode '%s'", mode) mode = "daily" self.mode = mode_map[mode] date = query.get("date") if date: if len(date) == 8 and date.isdecimal(): date = "{}-{}-{}".format(date[0:4], date[4:6], date[6:8]) else: self.log.warning("invalid date '%s'", date) date = None if not date: date = (datetime.utcnow() - timedelta(days=1)).strftime("%Y-%m-%d") self.date = date return {"ranking": { "mode": mode, "date": self.date, }} class PixivSearchExtractor(PixivExtractor): """Extractor for pixiv search results""" subcategory = "search" archive_fmt = "s_{search[word]}_{id}{num}.{extension}" directory_fmt = ("{category}", "search", "{search[word]}") pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net" r"/(?:(?:en/)?tags/([^/?#]+)(?:/[^/?#]+)?/?" 
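# legacy search URLs pass the term as a search.php?word=... query parameter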
r"|search\.php)(?:\?([^#]+))?") test = ( ("https://www.pixiv.net/en/tags/Original", { "range": "1-10", "count": 10, }), ("https://www.pixiv.net/en/tags/foo/artworks?order=date&s_mode=s_tag"), ("https://www.pixiv.net/search.php?s_mode=s_tag&word=Original"), ("https://touch.pixiv.net/search.php?word=Original"), ) def __init__(self, match): PixivExtractor.__init__(self, match) self.word, self.query = match.groups() self.sort = self.target = None def works(self): return self.api.search_illust( self.word, self.sort, self.target, date_start=self.date_start, date_end=self.date_end) def metadata(self): query = text.parse_query(self.query) if self.word: self.word = text.unquote(self.word) else: if "word" not in query: raise exception.StopExtraction("Missing search term") self.word = query["word"] sort = query.get("order", "date_d") sort_map = { "date": "date_asc", "date_d": "date_desc", } if sort not in sort_map: self.log.warning("invalid sort order '%s'", sort) sort = "date_d" self.sort = sort_map[sort] target = query.get("s_mode", "s_tag") target_map = { "s_tag": "partial_match_for_tags", "s_tag_full": "exact_match_for_tags", "s_tc": "title_and_caption", } if target not in target_map: self.log.warning("invalid search target '%s'", target) target = "s_tag" self.target = target_map[target] self.date_start = query.get("scd") self.date_end = query.get("ecd") return {"search": { "word": self.word, "sort": self.sort, "target": self.target, "date_start": self.date_start, "date_end": self.date_end, }} class PixivFollowExtractor(PixivExtractor): """Extractor for new illustrations from your followed artists""" subcategory = "follow" archive_fmt = "F_{user_follow[id]}_{id}{num}.{extension}" directory_fmt = ("{category}", "following") pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net" r"/bookmark_new_illust\.php") test = ( ("https://www.pixiv.net/bookmark_new_illust.php"), ("https://touch.pixiv.net/bookmark_new_illust.php"), ) def works(self): return self.api.illust_follow() def metadata(self): self.api.login() return {"user_follow": self.api.user} class PixivPixivisionExtractor(PixivExtractor): """Extractor for illustrations from a pixivision article""" subcategory = "pixivision" directory_fmt = ("{category}", "pixivision", "{pixivision_id} {pixivision_title}") archive_fmt = "V{pixivision_id}_{id}{suffix}.{extension}" pattern = r"(?:https?://)?(?:www\.)?pixivision\.net/(?:en/)?a/(\d+)" test = ( ("https://www.pixivision.net/en/a/2791"), ("https://pixivision.net/a/2791", { "count": 7, "keyword": { "pixivision_id": "2791", "pixivision_title": "What's your favorite music? 
Editor’s " "picks featuring: “CD Covers”!", }, }), ) def __init__(self, match): PixivExtractor.__init__(self, match) self.pixivision_id = match.group(1) def works(self): return ( self.api.illust_detail(illust_id) for illust_id in util.unique_sequence(text.extract_iter( self.page, '', '<')[0] return { "pixivision_id" : self.pixivision_id, "pixivision_title": text.unescape(title), } class PixivSketchExtractor(Extractor): """Extractor for user pages on sketch.pixiv.net""" category = "pixiv" subcategory = "sketch" directory_fmt = ("{category}", "sketch", "{user[unique_name]}") filename_fmt = "{post_id} {id}.{extension}" archive_fmt = "S{user[id]}_{id}" root = "https://sketch.pixiv.net" cookiedomain = ".pixiv.net" pattern = r"(?:https?://)?sketch\.pixiv\.net/@([^/?#]+)" test = ("https://sketch.pixiv.net/@nicoby", { "pattern": r"https://img\-sketch\.pixiv\.net/uploads/medium" r"/file/\d+/\d+\.(jpg|png)", "count": ">= 35", }) def __init__(self, match): Extractor.__init__(self, match) self.username = match.group(1) def items(self): headers = {"Referer": "{}/@{}".format(self.root, self.username)} for post in self.posts(): media = post["media"] post["post_id"] = post["id"] post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z") util.delete_items(post, ("id", "media", "_links")) yield Message.Directory, post post["_http_headers"] = headers for photo in media: original = photo["photo"]["original"] post["id"] = photo["id"] post["width"] = original["width"] post["height"] = original["height"] url = original["url"] text.nameext_from_url(url, post) yield Message.Url, url, post def posts(self): url = "{}/api/walls/@{}/posts/public.json".format( self.root, self.username) headers = { "Accept": "application/vnd.sketch-v4+json", "X-Requested-With": "{}/@{}".format(self.root, self.username), "Referer": self.root + "/", } while True: data = self.request(url, headers=headers).json() yield from data["data"]["items"] next_url = data["_links"].get("next") if not next_url: return url = self.root + next_url["href"] class PixivAppAPI(): """Minimal interface for the Pixiv App API for mobile devices For a more complete implementation or documentation, see - https://github.com/upbit/pixivpy - https://gist.github.com/ZipFile/3ba99b47162c23f8aea5d5942bb557b1 """ CLIENT_ID = "MOBrBDS8blbauoSck0ZfDbtuzpyT" CLIENT_SECRET = "lsACyCD94FhDUtGTXi3QzcFE2uU1hqtDaKeqrdwj" HASH_SECRET = ("28c1fdd170a5204386cb1313c7077b34" "f83e4aaf4aa829ce78c231e05b0bae2c") def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.username = extractor._get_auth_info()[0] self.user = None extractor.session.headers.update({ "App-OS" : "ios", "App-OS-Version": "13.1.2", "App-Version" : "7.7.6", "User-Agent" : "PixivIOSApp/7.7.6 (iOS 13.1.2; iPhone11,8)", "Referer" : "https://app-api.pixiv.net/", }) self.client_id = extractor.config( "client-id", self.CLIENT_ID) self.client_secret = extractor.config( "client-secret", self.CLIENT_SECRET) token = extractor.config("refresh-token") if token is None or token == "cache": token = _refresh_token_cache(self.username) self.refresh_token = token def login(self): """Login and gain an access token""" self.user, auth = self._login_impl(self.username) self.extractor.session.headers["Authorization"] = auth @cache(maxage=3600, keyarg=1) def _login_impl(self, username): if not self.refresh_token: raise exception.AuthenticationError( "'refresh-token' required.\n" "Run `gallery-dl oauth:pixiv` to get one.") self.log.info("Refreshing access token") url = 
"https://oauth.secure.pixiv.net/auth/token" data = { "client_id" : self.client_id, "client_secret" : self.client_secret, "grant_type" : "refresh_token", "refresh_token" : self.refresh_token, "get_secure_url": "1", } time = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S+00:00") headers = { "X-Client-Time": time, "X-Client-Hash": hashlib.md5( (time + self.HASH_SECRET).encode()).hexdigest(), } response = self.extractor.request( url, method="POST", headers=headers, data=data, fatal=False) if response.status_code >= 400: self.log.debug(response.text) raise exception.AuthenticationError("Invalid refresh token") data = response.json()["response"] return data["user"], "Bearer " + data["access_token"] def illust_detail(self, illust_id): params = {"illust_id": illust_id} return self._call("v1/illust/detail", params)["illust"] def illust_follow(self, restrict="all"): params = {"restrict": restrict} return self._pagination("v2/illust/follow", params) def illust_ranking(self, mode="day", date=None): params = {"mode": mode, "date": date} return self._pagination("v1/illust/ranking", params) def illust_related(self, illust_id): params = {"illust_id": illust_id} return self._pagination("v2/illust/related", params) def search_illust(self, word, sort=None, target=None, duration=None, date_start=None, date_end=None): params = {"word": word, "search_target": target, "sort": sort, "duration": duration, "start_date": date_start, "end_date": date_end} return self._pagination("v1/search/illust", params) def user_bookmarks_illust(self, user_id, tag=None, restrict="public"): params = {"user_id": user_id, "tag": tag, "restrict": restrict} return self._pagination("v1/user/bookmarks/illust", params) def user_detail(self, user_id): params = {"user_id": user_id} return self._call("v1/user/detail", params)["user"] def user_following(self, user_id, restrict="public"): params = {"user_id": user_id, "restrict": restrict} return self._pagination("v1/user/following", params, "user_previews") def user_illusts(self, user_id): params = {"user_id": user_id} return self._pagination("v1/user/illusts", params) def ugoira_metadata(self, illust_id): params = {"illust_id": illust_id} return self._call("v1/ugoira/metadata", params)["ugoira_metadata"] def _call(self, endpoint, params=None): url = "https://app-api.pixiv.net/" + endpoint self.login() response = self.extractor.request(url, params=params, fatal=False) data = response.json() if "error" in data: if response.status_code == 404: raise exception.NotFoundError() error = data["error"] if "rate limit" in (error.get("message") or "").lower(): self.log.info("Waiting two minutes for API rate limit reset.") time.sleep(120) return self._call(endpoint, params) raise exception.StopExtraction("API request failed: %s", error) return data def _pagination(self, endpoint, params, key="illusts"): while True: data = self._call(endpoint, params) yield from data[key] if not data["next_url"]: return query = data["next_url"].rpartition("?")[2] params = text.parse_query(query) @cache(maxage=10*365*24*3600, keyarg=0) def _refresh_token_cache(username): return None ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/pixnet.py0000644000175000017500000001527014176336637020445 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software 
Foundation. """Extractors for https://www.pixnet.net/""" from .common import Extractor, Message from .. import text, exception BASE_PATTERN = r"(?:https?://)?(?!www\.)([\w-]+)\.pixnet.net" class PixnetExtractor(Extractor): """Base class for pixnet extractors""" category = "pixnet" filename_fmt = "{num:>03}_{id}.{extension}" archive_fmt = "{id}" url_fmt = "" def __init__(self, match): Extractor.__init__(self, match) self.blog, self.item_id = match.groups() self.root = "https://{}.pixnet.net".format(self.blog) def items(self): url = self.url_fmt.format(self.root, self.item_id) page = self.request(url, encoding="utf-8").text user = text.extract(page, '')[0] if pnext is None and 'name="albumpass">' in page: raise exception.StopExtraction( "Album %s is password-protected.", self.item_id) if "href" not in pnext: return url = self.root + text.extract(pnext, 'href="', '"')[0] page = self.request(url, encoding="utf-8").text class PixnetImageExtractor(PixnetExtractor): """Extractor for a single photo from pixnet.net""" subcategory = "image" filename_fmt = "{id}.{extension}" directory_fmt = ("{category}", "{blog}") pattern = BASE_PATTERN + r"/album/photo/(\d+)" test = ("https://albertayu773.pixnet.net/album/photo/159443828", { "url": "156564c422138914c9fa5b42191677b45c414af4", "keyword": "19971bcd056dfef5593f4328a723a9602be0f087", "content": "0e097bdf49e76dd9b9d57a016b08b16fa6a33280", }) def items(self): url = "https://api.pixnet.cc/oembed" params = { "url": "https://{}.pixnet.net/album/photo/{}".format( self.blog, self.item_id), "format": "json", } data = self.request(url, params=params).json() data["id"] = text.parse_int( data["url"].rpartition("/")[2].partition("-")[0]) data["filename"], _, data["extension"] = data["title"].rpartition(".") data["blog"] = self.blog data["user"] = data.pop("author_name") yield Message.Directory, data yield Message.Url, data["url"], data class PixnetSetExtractor(PixnetExtractor): """Extractor for images from a pixnet set""" subcategory = "set" url_fmt = "{}/album/set/{}" directory_fmt = ("{category}", "{blog}", "{folder_id} {folder_title}", "{set_id} {set_title}") pattern = BASE_PATTERN + r"/album/set/(\d+)" test = ( ("https://albertayu773.pixnet.net/album/set/15078995", { "url": "6535712801af47af51110542f4938a7cef44557f", "keyword": "bf25d59e5b0959cb1f53e7fd2e2a25f2f67e5925", }), ("https://anrine910070.pixnet.net/album/set/5917493", { "url": "b3eb6431aea0bcf5003432a4a0f3a3232084fc13", "keyword": "bf7004faa1cea18cf9bd856f0955a69be51b1ec6", }), ("https://sky92100.pixnet.net/album/set/17492544", { "count": 0, # password-protected }), ) def items(self): url = self.url_fmt.format(self.root, self.item_id) page = self.request(url, encoding="utf-8").text data = self.metadata(page) yield Message.Directory, data for num, info in enumerate(self._pagination(page), 1): url, pos = text.extract(info, ' href="', '"') src, pos = text.extract(info, ' src="', '"', pos) alt, pos = text.extract(info, ' alt="', '"', pos) photo = { "id": text.parse_int(url.rpartition("/")[2].partition("#")[0]), "url": src.replace("_s.", "."), "num": num, "filename": alt, "extension": src.rpartition(".")[2], } photo.update(data) yield Message.Url, photo["url"], photo def metadata(self, page): user , pos = text.extract(page, '', '<', pos) sid , pos = text.extract(page, '/set/', '"', pos) sname, pos = text.extract(page, '>', '<', pos) return { "blog": self.blog, "user": user.rpartition(" (")[0], "folder_id" : text.parse_int(fid, ""), "folder_title": text.unescape(fname).strip(), "set_id" : 
text.parse_int(sid), "set_title" : text.unescape(sname), } class PixnetFolderExtractor(PixnetExtractor): """Extractor for all sets in a pixnet folder""" subcategory = "folder" url_fmt = "{}/album/folder/{}" pattern = BASE_PATTERN + r"/album/folder/(\d+)" test = ("https://albertayu773.pixnet.net/album/folder/1405768", { "pattern": PixnetSetExtractor.pattern, "count": ">= 15", }) class PixnetUserExtractor(PixnetExtractor): """Extractor for all sets and folders of a pixnet user""" subcategory = "user" url_fmt = "{}{}/album/list" pattern = BASE_PATTERN + r"()(?:/blog|/album(?:/list)?)?/?(?:$|[?#])" test = ( ("https://albertayu773.pixnet.net/"), ("https://albertayu773.pixnet.net/blog"), ("https://albertayu773.pixnet.net/album"), ("https://albertayu773.pixnet.net/album/list", { "pattern": PixnetFolderExtractor.pattern, "count": ">= 30", }), ("https://anrine910070.pixnet.net/album/list", { "pattern": PixnetSetExtractor.pattern, "count": ">= 14", }), ) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1639190302.0 gallery_dl-1.21.1/gallery_dl/extractor/plurk.py0000644000175000017500000001040214155007436020250 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.plurk.com/""" from .common import Extractor, Message from .. import text, exception import datetime import time import json import re class PlurkExtractor(Extractor): """Base class for plurk extractors""" category = "plurk" root = "https://www.plurk.com" def items(self): urls = self._urls_ex if self.config("comments", False) else self._urls for plurk in self.plurks(): for url in urls(plurk): yield Message.Queue, url, plurk def plurks(self): """Return an iterable with all relevant 'plurk' objects""" @staticmethod def _urls(obj): """Extract URLs from a 'plurk' object""" return text.extract_iter(obj["content"], ' href="', '"') def _urls_ex(self, plurk): """Extract URLs from a 'plurk' and its comments""" yield from self._urls(plurk) for comment in self._comments(plurk): yield from self._urls(comment) def _comments(self, plurk): """Return an iterable with a 'plurk's comments""" url = "https://www.plurk.com/Responses/get" data = {"plurk_id": plurk["id"], "count": "200"} headers = { "Origin": self.root, "Referer": self.root, "X-Requested-With": "XMLHttpRequest", } while True: info = self.request( url, method="POST", headers=headers, data=data).json() yield from info["responses"] if not info["has_newer"]: return elif info["has_newer"] < 200: del data["count"] time.sleep(1) data["from_response_id"] = info["responses"][-1]["id"] + 1 @staticmethod def _load(data): if not data: raise exception.NotFoundError("user") return json.loads(re.sub(r"new Date\(([^)]+)\)", r"\1", data)) class PlurkTimelineExtractor(PlurkExtractor): """Extractor for URLs from all posts in a Plurk timeline""" subcategory = "timeline" pattern = r"(?:https?://)?(?:www\.)?plurk\.com/(?!p/)(\w+)/?(?:$|[?#])" test = ("https://www.plurk.com/plurkapi", { "pattern": r"https?://.+", "count": ">= 23" }) def __init__(self, match): PlurkExtractor.__init__(self, match) self.user = match.group(1) def plurks(self): url = "{}/{}".format(self.root, self.user) page = self.request(url).text user_id, pos = text.extract(page, '"user_id":', ',') plurks = self._load(text.extract(page, "_PLURKS = ", ";\n", pos)[0]) headers = {"Referer": 
url, "X-Requested-With": "XMLHttpRequest"} data = {"user_id": user_id.strip()} url = "https://www.plurk.com/TimeLine/getPlurks" while plurks: yield from plurks offset = datetime.datetime.strptime( plurks[-1]["posted"], "%a, %d %b %Y %H:%M:%S %Z") data["offset"] = offset.strftime("%Y-%m-%dT%H:%M:%S.000Z") response = self.request( url, method="POST", headers=headers, data=data) plurks = response.json()["plurks"] class PlurkPostExtractor(PlurkExtractor): """Extractor for URLs from a Plurk post""" subcategory = "post" pattern = r"(?:https?://)?(?:www\.)?plurk\.com/p/(\w+)" test = ( ("https://www.plurk.com/p/i701j1", { "url": "2115f208564591b8748525c2807a84596aaaaa5f", "count": 3, }), ("https://www.plurk.com/p/i701j1", { "options": (("comments", True),), "count": ">= 210", }), ) def __init__(self, match): PlurkExtractor.__init__(self, match) self.plurk_id = match.group(1) def plurks(self): url = "{}/p/{}".format(self.root, self.plurk_id) page = self.request(url).text user, pos = text.extract(page, " GLOBAL = ", "\n") data, pos = text.extract(page, "plurk = ", ";\n", pos) data = self._load(data) data["user"] = self._load(user)["page_user"] return (data,) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/pornhub.py0000644000175000017500000001242714176336637020614 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.pornhub.com/""" from .common import Extractor, Message from .. import text, exception BASE_PATTERN = r"(?:https?://)?(?:[\w-]+\.)?pornhub\.com" class PornhubExtractor(Extractor): """Base class for pornhub extractors""" category = "pornhub" root = "https://www.pornhub.com" class PornhubGalleryExtractor(PornhubExtractor): """Extractor for image galleries on pornhub.com""" subcategory = "gallery" directory_fmt = ("{category}", "{user}", "{gallery[id]} {gallery[title]}") filename_fmt = "{num:>03}_{id}.{extension}" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/album/(\d+)" test = ( ("https://www.pornhub.com/album/19289801", { "pattern": r"https://\w+.phncdn.com/pics/albums/\d+/\d+/\d+/\d+/", "count": ">= 300", "keyword": { "id" : int, "num" : int, "score" : int, "views" : int, "caption": str, "user" : "Danika Mori", "gallery": { "id" : 19289801, "score": int, "views": int, "tags" : list, "title": "Danika Mori Best Moments", }, }, }), ("https://www.pornhub.com/album/69040172", { "exception": exception.AuthorizationError, }), ) def __init__(self, match): PornhubExtractor.__init__(self, match) self.gallery_id = match.group(1) self._first = None def items(self): data = self.metadata() yield Message.Directory, data for num, image in enumerate(self.images(), 1): url = image["url"] image.update(data) image["num"] = num yield Message.Url, url, text.nameext_from_url(url, image) def metadata(self): url = "{}/album/{}".format( self.root, self.gallery_id) extr = text.extract_from(self.request(url).text) title = extr("", "") score = extr('
    ', '<') tags = extr('
    = 6", }), ("https://www.pornhub.com/users/flyings0l0/"), ("https://www.pornhub.com/users/flyings0l0/photos/public"), ("https://www.pornhub.com/users/flyings0l0/photos/private"), ("https://www.pornhub.com/users/flyings0l0/photos/favorites"), ("https://www.pornhub.com/model/bossgirl/photos"), ) def __init__(self, match): PornhubExtractor.__init__(self, match) self.type, self.user, self.cat = match.groups() def items(self): url = "{}/{}/{}/photos/{}/ajax".format( self.root, self.type, self.user, self.cat or "public") params = {"page": 1} headers = { "Referer": url[:-5], "X-Requested-With": "XMLHttpRequest", } data = {"_extractor": PornhubGalleryExtractor} while True: page = self.request( url, method="POST", headers=headers, params=params).text if not page: return for gid in text.extract_iter(page, 'id="albumphoto', '"'): yield Message.Queue, self.root + "/album/" + gid, data params["page"] += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/pururin.py0000644000175000017500000000757114176336637020647 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://pururin.to/""" from .common import GalleryExtractor from .. import text, util import binascii import json class PururinGalleryExtractor(GalleryExtractor): """Extractor for image galleries on pururin.io""" category = "pururin" pattern = r"(?:https?://)?(?:www\.)?pururin\.[ti]o/(?:gallery|read)/(\d+)" test = ( ("https://pururin.to/gallery/38661/iowant-2", { "pattern": r"https://cdn.pururin.[ti]o/\w+" r"/images/data/\d+/\d+\.jpg", "keyword": { "title" : "re:I ?owant 2!!", "title_en" : "re:I ?owant 2!!", "title_jp" : "", "gallery_id": 38661, "count" : 19, "artist" : ["Shoda Norihiro"], "group" : ["Obsidian Order"], "parody" : ["Kantai Collection"], "characters": ["Iowa", "Teitoku"], "tags" : list, "type" : "Doujinshi", "collection": "I owant you!", "convention": "C92", "rating" : float, "uploader" : "demo", "scanlator" : "mrwayne", "lang" : "en", "language" : "English", } }), ("https://pururin.to/gallery/7661/unisis-team-vanilla", { "count": 17, }), ("https://pururin.io/gallery/38661/iowant-2"), ) root = "https://pururin.to" def __init__(self, match): self.gallery_id = match.group(1) url = "{}/gallery/{}/x".format(self.root, self.gallery_id) GalleryExtractor.__init__(self, match, url) self._ext = "" self._cnt = 0 def metadata(self, page): extr = text.extract_from(page) def _lst(key, e=extr): return [ text.unescape(item) for item in text.extract_iter(e(key, ""), 'title="', '"') ] def _str(key, e=extr): return text.unescape(text.extract( e(key, ""), 'title="', '"')[0] or "") url = "{}/read/{}/01/x".format(self.root, self.gallery_id) page = self.request(url).text info = json.loads(binascii.a2b_base64(text.extract( page, 'Artist"), "group" : _lst("Circle"), "parody" : _lst("Parody"), "tags" : _lst("Contents"), "type" : _str("Category"), "characters": _lst("Character"), "collection": _str("Collection"), "language" : _str("Language"), "scanlator" : _str("Scanlator"), "convention": _str("Convention"), "uploader" : text.remove_html(extr("Uploader", "")), "rating" : text.parse_float(extr(" :rating='" , "'")), } data["lang"] = util.language_to_code(data["language"]) return data def images(self, _): ufmt = 
"https://cdn.pururin.to/assets/images/data/{}/{{}}.{}".format( self.gallery_id, self._ext) return [(ufmt.format(num), None) for num in range(1, self._cnt + 1)] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1644453306.0 gallery_dl-1.21.1/gallery_dl/extractor/reactor.py0000644000175000017500000002516614201056672020565 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Generic extractors for *reactor sites""" from .common import BaseExtractor, Message from .. import text import urllib.parse import json class ReactorExtractor(BaseExtractor): """Base class for *reactor.cc extractors""" basecategory = "reactor" filename_fmt = "{post_id}_{num:>02}{title[:100]:?_//}.{extension}" archive_fmt = "{post_id}_{num}" request_interval = 5.0 def __init__(self, match): BaseExtractor.__init__(self, match) url = text.ensure_http_scheme(match.group(0), "http://") pos = url.index("/", 10) self.root, self.path = url[:pos], url[pos:] self.session.headers["Referer"] = self.root self.gif = self.config("gif", False) if self.category == "reactor": # set category based on domain name netloc = urllib.parse.urlsplit(self.root).netloc self.category = netloc.rpartition(".")[0] def items(self): data = self.metadata() yield Message.Directory, data for post in self.posts(): for image in self._parse_post(post): url = image["url"] image.update(data) yield Message.Url, url, text.nameext_from_url(url, image) def metadata(self): """Collect metadata for extractor-job""" return {} def posts(self): """Return all relevant post-objects""" return self._pagination(self.root + self.path) def _pagination(self, url): while True: response = self.request(url) if response.history: # sometimes there is a redirect from # the last page of a listing (.../tag//1) # to the first page (.../tag/) # which could cause an endless loop cnt_old = response.history[0].url.count("/") cnt_new = response.url.count("/") if cnt_old == 5 and cnt_new == 4: return page = response.text yield from text.extract_iter( page, '
    ', '
    ') try: pos = page.index("class='next'") pos = page.rindex("class='current'", 0, pos) url = self.root + text.extract(page, "href='", "'", pos)[0] except (ValueError, TypeError): return def _parse_post(self, post): post, _, script = post.partition('")[0].rstrip("\n\r;")) class XhamsterUserExtractor(XhamsterExtractor): """Extractor for all galleries of an xhamster user""" subcategory = "user" pattern = BASE_PATTERN + r"/users/([^/?#]+)(?:/photos)?/?(?:$|[?#])" test = ( ("https://xhamster.com/users/goldenpalomino/photos", { "pattern": XhamsterGalleryExtractor.pattern, "count": 50, "range": "1-50", }), ("https://xhamster.com/users/nickname68"), ) def __init__(self, match): XhamsterExtractor.__init__(self, match) self.user = match.group(2) def items(self): url = "{}/users/{}/photos".format(self.root, self.user) data = {"_extractor": XhamsterGalleryExtractor} while url: extr = text.extract_from(self.request(url).text) while True: url = extr('thumb-image-container role-pop" href="', '"') if not url: break yield Message.Queue, url, data url = extr('data-page="next" href="', '"') ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/extractor/xvideos.py0000644000175000017500000001146114176336637020615 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2019 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.xvideos.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text import json class XvideosBase(): """Base class for xvideos extractors""" category = "xvideos" root = "https://www.xvideos.com" class XvideosGalleryExtractor(XvideosBase, GalleryExtractor): """Extractor for user profile galleries on xvideos.com""" subcategory = "gallery" directory_fmt = ("{category}", "{user[name]}", "{gallery[id]} {gallery[title]}") filename_fmt = "{category}_{gallery[id]}_{num:>03}.{extension}" archive_fmt = "{gallery[id]}_{num}" pattern = (r"(?:https?://)?(?:www\.)?xvideos\.com" r"/(?:profiles|amateur-channels|model-channels)" r"/([^/?#]+)/photos/(\d+)") test = ( ("https://www.xvideos.com/profiles/pervertedcouple/photos/751031", { "count": 8, "pattern": r"https://profile-pics-cdn\d+\.xvideos-cdn\.com" r"/[^/]+\,\d+/videos/profiles/galleries/84/ca/37" r"/pervertedcouple/gal751031/pic_\d+_big\.jpg", "keyword": { "gallery": { "id" : 751031, "title": "Random Stuff", "tags" : list, }, "user": { "id" : 20245371, "name" : "pervertedcouple", "display" : "Pervertedcouple", "sex" : "Woman", "description": str, }, }, }), ("https://www.xvideos.com/amateur-channels/pervertedcouple/photos/12"), ("https://www.xvideos.com/model-channels/pervertedcouple/photos/12"), ) def __init__(self, match): self.user, self.gallery_id = match.groups() url = "{}/profiles/{}/photos/{}".format( self.root, self.user, self.gallery_id) GalleryExtractor.__init__(self, match, url) def metadata(self, page): extr = text.extract_from(page) title = extr('"title":"', '"') user = { "id" : text.parse_int(extr('"id_user":', ',')), "display": extr('"display":"', '"'), "sex" : extr('"sex":"', '"'), "name" : self.user, } user["description"] = extr( '', '').strip() tags = extr('Tagged:', '<').strip() return { "user": user, "gallery": { "id" : text.parse_int(self.gallery_id), "title": text.unescape(title), "tags" : text.unescape(tags).split(", ") if tags else [], }, } @staticmethod 
def images(page): """Return a list of all image urls for this gallery""" return [ (url, None) for url in text.extract_iter( page, '")[0])["data"] if not isinstance(data["galleries"], dict): return if "0" in data["galleries"]: del data["galleries"]["0"] galleries = [ { "id" : text.parse_int(gid), "title": text.unescape(gdata["title"]), "count": gdata["nb_pics"], "_extractor": XvideosGalleryExtractor, } for gid, gdata in data["galleries"].items() ] galleries.sort(key=lambda x: x["id"]) for gallery in galleries: url = "https://www.xvideos.com/profiles/{}/photos/{}".format( self.user, gallery["id"]) yield Message.Queue, url, gallery ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/extractor/ytdl.py0000644000175000017500000001165614220623232020072 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for sites supported by youtube-dl""" from .common import Extractor, Message from .. import ytdl, config, exception class YoutubeDLExtractor(Extractor): """Generic extractor for youtube-dl supported URLs""" category = "ytdl" directory_fmt = ("{category}", "{subcategory}") filename_fmt = "{title}-{id}.{extension}" archive_fmt = "{extractor_key} {id}" pattern = r"ytdl:(.*)" test = ("ytdl:https://www.youtube.com/watch?v=BaW_jenozKc&t=1s&end=9",) def __init__(self, match): # import main youtube_dl module ytdl_module = ytdl.import_module(config.get( ("extractor", "ytdl"), "module")) self.ytdl_module_name = ytdl_module.__name__ # find suitable youtube_dl extractor self.ytdl_url = url = match.group(1) generic = config.interpolate(("extractor", "ytdl"), "generic", True) if generic == "force": self.ytdl_ie_key = "Generic" self.force_generic_extractor = True else: for ie in ytdl_module.extractor.gen_extractor_classes(): if ie.suitable(url): self.ytdl_ie_key = ie.ie_key() break if not generic and self.ytdl_ie_key == "Generic": raise exception.NoExtractorError() self.force_generic_extractor = False # set subcategory to youtube_dl extractor's key self.subcategory = self.ytdl_ie_key Extractor.__init__(self, match) def items(self): # import subcategory module ytdl_module = ytdl.import_module( config.get(("extractor", "ytdl", self.subcategory), "module") or self.ytdl_module_name) self.log.debug("Using %s", ytdl_module) # construct YoutubeDL object extr_opts = { "extract_flat" : "in_playlist", "force_generic_extractor": self.force_generic_extractor, } user_opts = { "retries" : self._retries, "socket_timeout" : self._timeout, "nocheckcertificate" : not self._verify, } if self._proxies: user_opts["proxy"] = self._proxies.get("http") username, password = self._get_auth_info() if username: user_opts["username"], user_opts["password"] = username, password del username, password ytdl_instance = ytdl.construct_YoutubeDL( ytdl_module, self, user_opts, extr_opts) # transfer cookies to ytdl cookies = self.session.cookies if cookies: set_cookie = ytdl_instance.cookiejar.set_cookie for cookie in cookies: set_cookie(cookie) # extract youtube_dl info_dict try: info_dict = ytdl_instance._YoutubeDL__extract_info( self.ytdl_url, ytdl_instance.get_info_extractor(self.ytdl_ie_key), False, {}, True) except ytdl_module.utils.YoutubeDLError: raise exception.StopExtraction("Failed to extract video data") if not info_dict: return elif "entries" in 
info_dict: results = self._process_entries( ytdl_module, ytdl_instance, info_dict["entries"]) else: results = (info_dict,) # yield results for info_dict in results: info_dict["extension"] = None info_dict["_ytdl_info_dict"] = info_dict info_dict["_ytdl_instance"] = ytdl_instance url = "ytdl:" + (info_dict.get("url") or info_dict.get("webpage_url") or self.ytdl_url) yield Message.Directory, info_dict yield Message.Url, url, info_dict def _process_entries(self, ytdl_module, ytdl_instance, entries): for entry in entries: if not entry: continue elif entry.get("_type") in ("url", "url_transparent"): try: info_dict = ytdl_instance.extract_info( entry["url"], False, ie_key=entry.get("ie_key")) except ytdl_module.utils.YoutubeDLError: continue if not info_dict: continue elif "entries" in info_dict: yield from self._process_entries( ytdl_module, ytdl_instance, info_dict["entries"]) else: yield info_dict else: yield entry if config.get(("extractor", "ytdl"), "enabled"): # make 'ytdl:' prefix optional YoutubeDLExtractor.pattern = r"(?:ytdl:)?(.*)" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/formatter.py0000644000175000017500000002375114220623232017105 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """String formatters""" import os import json import string import _string import datetime import operator from . import text, util _CACHE = {} _CONVERSIONS = None _GLOBALS = { "_env": lambda: os.environ, "_now": datetime.datetime.now, } def parse(format_string, default=None): key = format_string, default try: return _CACHE[key] except KeyError: pass cls = StringFormatter if format_string.startswith("\f"): kind, _, format_string = format_string.partition(" ") kind = kind[1:] if kind == "T": cls = TemplateFormatter elif kind == "E": cls = ExpressionFormatter elif kind == "M": cls = ModuleFormatter elif kind == "F": cls = FStringFormatter formatter = _CACHE[key] = cls(format_string, default) return formatter class StringFormatter(): """Custom, extended version of string.Formatter This string formatter implementation is a mostly performance-optimized variant of the original string.Formatter class. Unnecessary features have been removed (positional arguments, unused argument check) and new formatting options have been added. Extra Conversions: - "l": calls str.lower on the target value - "u": calls str.upper - "c": calls str.capitalize - "C": calls string.capwords - "j". calls json.dumps - "t": calls str.strip - "d": calls text.parse_timestamp - "U": calls urllib.parse.unescape - "S": calls util.to_string() - "T": calls util.to_timestamü() - Example: {f!l} -> "example"; {f!u} -> "EXAMPLE" Extra Format Specifiers: - "?//": Adds and to the actual value if it evaluates to True. Otherwise the whole replacement field becomes an empty string. Example: {f:?-+/+-/} -> "-+Example+-" (if "f" contains "Example") -> "" (if "f" is None, 0, "") - "L//": Replaces the output with if its length (in characters) exceeds . Otherwise everything is left as is. 
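(Note, grounded in _parse_maxlen() further below: the length check runs after the remaining format spec has been applied to the value, so the comparison uses the already-formatted string.)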
Example: {f:L5/too long/} -> "foo" (if "f" is "foo") -> "too long" (if "f" is "foobar") - "J/": Joins elements of a list (or string) using Example: {f:J - /} -> "a - b - c" (if "f" is ["a", "b", "c"]) - "R//": Replaces all occurrences of with Example: {f:R /_/} -> "f_o_o_b_a_r" (if "f" is "f o o b a r") """ def __init__(self, format_string, default=None): self.default = default self.result = [] self.fields = [] for literal_text, field_name, format_spec, conv in \ _string.formatter_parser(format_string): if literal_text: self.result.append(literal_text) if field_name: self.fields.append(( len(self.result), self._field_access(field_name, format_spec, conv), )) self.result.append("") if len(self.result) == 1: if self.fields: self.format_map = self.fields[0][1] else: self.format_map = lambda _: format_string del self.result, self.fields def format_map(self, kwdict): """Apply 'kwdict' to the initial format_string and return its result""" result = self.result for index, func in self.fields: result[index] = func(kwdict) return "".join(result) def _field_access(self, field_name, format_spec, conversion): fmt = parse_format_spec(format_spec, conversion) if "|" in field_name: return self._apply_list([ parse_field_name(fn) for fn in field_name.split("|") ], fmt) else: key, funcs = parse_field_name(field_name) if key in _GLOBALS: return self._apply_globals(_GLOBALS[key], funcs, fmt) if funcs: return self._apply(key, funcs, fmt) return self._apply_simple(key, fmt) def _apply(self, key, funcs, fmt): def wrap(kwdict): try: obj = kwdict[key] for func in funcs: obj = func(obj) except Exception: obj = self.default return fmt(obj) return wrap def _apply_globals(self, gobj, funcs, fmt): def wrap(_): try: obj = gobj() for func in funcs: obj = func(obj) except Exception: obj = self.default return fmt(obj) return wrap def _apply_simple(self, key, fmt): def wrap(kwdict): return fmt(kwdict[key] if key in kwdict else self.default) return wrap def _apply_list(self, lst, fmt): def wrap(kwdict): for key, funcs in lst: try: obj = _GLOBALS[key]() if key in _GLOBALS else kwdict[key] for func in funcs: obj = func(obj) if obj: break except Exception: pass else: obj = self.default return fmt(obj) return wrap class TemplateFormatter(StringFormatter): """Read format_string from file""" def __init__(self, path, default=None): with open(util.expand_path(path)) as fp: format_string = fp.read() StringFormatter.__init__(self, format_string, default) class ExpressionFormatter(): """Generate text by evaluating a Python expression""" def __init__(self, expression, default=None): self.format_map = util.compile_expression(expression) class ModuleFormatter(): """Generate text by calling an external function""" def __init__(self, function_spec, default=None): module_name, _, function_name = function_spec.partition(":") module = __import__(module_name) self.format_map = getattr(module, function_name) class FStringFormatter(): """Generate text by evaluaring an f-string literal""" def __init__(self, fstring, default=None): self.format_map = util.compile_expression("f'''" + fstring + "'''") def parse_field_name(field_name): first, rest = _string.formatter_field_name_split(field_name) funcs = [] for is_attr, key in rest: if is_attr: func = operator.attrgetter else: func = operator.itemgetter try: if ":" in key: start, _, stop = key.partition(":") stop, _, step = stop.partition(":") start = int(start) if start else None stop = int(stop) if stop else None step = int(step) if step else None key = slice(start, stop, step) except TypeError: pass # 
key is an integer funcs.append(func(key)) return first, funcs def parse_format_spec(format_spec, conversion): fmt = build_format_func(format_spec) if not conversion: return fmt global _CONVERSIONS if _CONVERSIONS is None: _CONVERSIONS = { "l": str.lower, "u": str.upper, "c": str.capitalize, "C": string.capwords, "j": json.dumps, "t": str.strip, "T": util.datetime_to_timestamp_string, "d": text.parse_timestamp, "U": text.unescape, "S": util.to_string, "s": str, "r": repr, "a": ascii, } conversion = _CONVERSIONS[conversion] if fmt is format: return conversion else: def chain(obj): return fmt(conversion(obj)) return chain def build_format_func(format_spec): if format_spec: fmt = format_spec[0] if fmt == "?": return _parse_optional(format_spec) if fmt == "L": return _parse_maxlen(format_spec) if fmt == "J": return _parse_join(format_spec) if fmt == "R": return _parse_replace(format_spec) if fmt == "D": return _parse_datetime(format_spec) return _default_format(format_spec) return format def _parse_optional(format_spec): before, after, format_spec = format_spec.split("/", 2) before = before[1:] fmt = build_format_func(format_spec) def optional(obj): return before + fmt(obj) + after if obj else "" return optional def _parse_maxlen(format_spec): maxlen, replacement, format_spec = format_spec.split("/", 2) maxlen = text.parse_int(maxlen[1:]) fmt = build_format_func(format_spec) def mlen(obj): obj = fmt(obj) return obj if len(obj) <= maxlen else replacement return mlen def _parse_join(format_spec): separator, _, format_spec = format_spec.partition("/") separator = separator[1:] fmt = build_format_func(format_spec) def join(obj): return fmt(separator.join(obj)) return join def _parse_replace(format_spec): old, new, format_spec = format_spec.split("/", 2) old = old[1:] fmt = build_format_func(format_spec) def replace(obj): return fmt(obj.replace(old, new)) return replace def _parse_datetime(format_spec): dt_format, _, format_spec = format_spec.partition("/") dt_format = dt_format[1:] fmt = build_format_func(format_spec) def dt(obj): return fmt(text.parse_datetime(obj, dt_format)) return dt def _default_format(format_spec): def wrap(obj): return format(obj, format_spec) return wrap ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/job.py0000644000175000017500000005641414220623232015656 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import sys import json import time import errno import logging import functools import collections from . import extractor, downloader, postprocessor from . 
import config, text, util, path, formatter, output, exception from .extractor.message import Message class Job(): """Base class for Job-types""" ulog = None def __init__(self, extr, parent=None): if isinstance(extr, str): extr = extractor.find(extr) if not extr: raise exception.NoExtractorError() self.extractor = extr self.pathfmt = None self.kwdict = {} self.status = 0 self.url_key = extr.config("url-metadata") self._logger_extra = { "job" : self, "extractor": extr, "path" : output.PathfmtProxy(self), "keywords" : output.KwdictProxy(self), } extr.log = self._wrap_logger(extr.log) extr.log.debug("Using %s for '%s'", extr.__class__.__name__, extr.url) # data from parent job if parent: pextr = parent.extractor # transfer (sub)category if pextr.config("category-transfer", pextr.categorytransfer): extr._cfgpath = pextr._cfgpath extr.category = pextr.category extr.subcategory = pextr.subcategory # user-supplied metadata kwdict = extr.config("keywords") if kwdict: self.kwdict.update(kwdict) # predicates self.pred_url = self._prepare_predicates("image", True) self.pred_queue = self._prepare_predicates("chapter", False) def run(self): """Execute or run the job""" extractor = self.extractor log = extractor.log msg = None sleep = util.build_duration_func(extractor.config("sleep-extractor")) if sleep: time.sleep(sleep()) try: for msg in extractor: self.dispatch(msg) except exception.StopExtraction as exc: if exc.message: log.error(exc.message) self.status |= exc.code except exception.TerminateExtraction: raise except exception.GalleryDLException as exc: log.error("%s: %s", exc.__class__.__name__, exc) self.status |= exc.code except OSError as exc: log.error("Unable to download data: %s: %s", exc.__class__.__name__, exc) log.debug("", exc_info=True) self.status |= 128 except Exception as exc: log.error(("An unexpected error occurred: %s - %s. 
" "Please run gallery-dl again with the --verbose flag, " "copy its output and report this issue on " "https://github.com/mikf/gallery-dl/issues ."), exc.__class__.__name__, exc) log.debug("", exc_info=True) self.status |= 1 except BaseException: self.status |= 1 raise else: if msg is None: log.info("No results for %s", extractor.url) finally: self.handle_finalize() if extractor.finalize: extractor.finalize() return self.status def dispatch(self, msg): """Call the appropriate message handler""" if msg[0] == Message.Url: _, url, kwdict = msg if self.url_key: kwdict[self.url_key] = url if self.pred_url(url, kwdict): self.update_kwdict(kwdict) self.handle_url(url, kwdict) elif msg[0] == Message.Directory: self.update_kwdict(msg[1]) self.handle_directory(msg[1]) elif msg[0] == Message.Queue: _, url, kwdict = msg if self.url_key: kwdict[self.url_key] = url if self.pred_queue(url, kwdict): self.handle_queue(url, kwdict) def handle_url(self, url, kwdict): """Handle Message.Url""" def handle_directory(self, kwdict): """Handle Message.Directory""" def handle_queue(self, url, kwdict): """Handle Message.Queue""" def handle_finalize(self): """Handle job finalization""" def update_kwdict(self, kwdict): """Update 'kwdict' with additional metadata""" extr = self.extractor kwdict["category"] = extr.category kwdict["subcategory"] = extr.subcategory if self.kwdict: kwdict.update(self.kwdict) def _prepare_predicates(self, target, skip=True): predicates = [] if self.extractor.config(target + "-unique"): predicates.append(util.UniquePredicate()) pfilter = self.extractor.config(target + "-filter") if pfilter: try: pred = util.FilterPredicate(pfilter, target) except (SyntaxError, ValueError, TypeError) as exc: self.extractor.log.warning(exc) else: predicates.append(pred) prange = self.extractor.config(target + "-range") if prange: try: pred = util.RangePredicate(prange) except ValueError as exc: self.extractor.log.warning( "invalid %s range: %s", target, exc) else: if skip and pred.lower > 1 and not pfilter: pred.index += self.extractor.skip(pred.lower - 1) predicates.append(pred) return util.build_predicate(predicates) def get_logger(self, name): return self._wrap_logger(logging.getLogger(name)) def _wrap_logger(self, logger): return output.LoggerAdapter(logger, self._logger_extra) def _write_unsupported(self, url): if self.ulog: self.ulog.info(url) class DownloadJob(Job): """Download images into appropriate directory/filename locations""" def __init__(self, url, parent=None): Job.__init__(self, url, parent) self.log = self.get_logger("download") self.fallback = None self.archive = None self.sleep = None self.hooks = () self.downloaders = {} self.out = output.select() self.visited = parent.visited if parent else set() self._extractor_filter = None self._skipcnt = 0 def handle_url(self, url, kwdict): """Download the resource specified in 'url'""" hooks = self.hooks pathfmt = self.pathfmt archive = self.archive # prepare download pathfmt.set_filename(kwdict) if "prepare" in hooks: for callback in hooks["prepare"]: callback(pathfmt) if archive and archive.check(kwdict): pathfmt.fix_extension() self.handle_skip() return if pathfmt.exists(): if archive: archive.add(kwdict) self.handle_skip() return if self.sleep: time.sleep(self.sleep()) # download from URL if not self.download(url): # use fallback URLs if available/enabled fallback = kwdict.get("_fallback", ()) if self.fallback else () for num, url in enumerate(fallback, 1): util.remove_file(pathfmt.temppath) self.log.info("Trying fallback URL #%d", num) if 
self.download(url): break else: # download failed self.status |= 4 self.log.error("Failed to download %s", pathfmt.filename or url) return if not pathfmt.temppath: if archive: archive.add(kwdict) self.handle_skip() return # run post processors if "file" in hooks: for callback in hooks["file"]: callback(pathfmt) # download succeeded pathfmt.finalize() self.out.success(pathfmt.path, 0) self._skipcnt = 0 if archive: archive.add(kwdict) if "after" in hooks: for callback in hooks["after"]: callback(pathfmt) def handle_directory(self, kwdict): """Set and create the target directory for downloads""" if not self.pathfmt: self.initialize(kwdict) else: self.pathfmt.set_directory(kwdict) if "post" in self.hooks: for callback in self.hooks["post"]: callback(self.pathfmt) def handle_queue(self, url, kwdict): if url in self.visited: return self.visited.add(url) cls = kwdict.get("_extractor") if cls: extr = cls.from_url(url) else: extr = extractor.find(url) if extr: if self._extractor_filter is None: self._extractor_filter = self._build_extractor_filter() if not self._extractor_filter(extr): extr = None if extr: job = self.__class__(extr, self) pfmt = self.pathfmt pextr = self.extractor if pfmt and pextr.config("parent-directory"): extr._parentdir = pfmt.directory else: extr._parentdir = pextr._parentdir pmeta = pextr.config("parent-metadata") if pmeta: if isinstance(pmeta, str): data = self.kwdict.copy() if kwdict: data.update(kwdict) job.kwdict[pmeta] = data else: if self.kwdict: job.kwdict.update(self.kwdict) if kwdict: job.kwdict.update(kwdict) if pextr.config("parent-skip"): job._skipcnt = self._skipcnt self.status |= job.run() self._skipcnt = job._skipcnt else: self.status |= job.run() else: self._write_unsupported(url) def handle_finalize(self): pathfmt = self.pathfmt if self.archive: self.archive.close() if pathfmt: self.extractor._store_cookies() if "finalize" in self.hooks: status = self.status for callback in self.hooks["finalize"]: callback(pathfmt, status) def handle_skip(self): pathfmt = self.pathfmt self.out.skip(pathfmt.path) if "skip" in self.hooks: for callback in self.hooks["skip"]: callback(pathfmt) if self._skipexc: self._skipcnt += 1 if self._skipcnt >= self._skipmax: raise self._skipexc() def download(self, url): """Download 'url'""" scheme = url.partition(":")[0] downloader = self.get_downloader(scheme) if downloader: try: return downloader.download(url, self.pathfmt) except OSError as exc: if exc.errno == errno.ENOSPC: raise self.log.warning("%s: %s", exc.__class__.__name__, exc) return False self._write_unsupported(url) return False def get_downloader(self, scheme): """Return a downloader suitable for 'scheme'""" try: return self.downloaders[scheme] except KeyError: pass cls = downloader.find(scheme) if cls and config.get(("downloader", cls.scheme), "enabled", True): instance = cls(self) else: instance = None self.log.error("'%s:' URLs are not supported/enabled", scheme) if cls and cls.scheme == "http": self.downloaders["http"] = self.downloaders["https"] = instance else: self.downloaders[scheme] = instance return instance def initialize(self, kwdict=None): """Delayed initialization of PathFormat, etc.""" extr = self.extractor cfg = extr.config pathfmt = self.pathfmt = path.PathFormat(extr) if kwdict: pathfmt.set_directory(kwdict) self.sleep = util.build_duration_func(cfg("sleep")) self.fallback = cfg("fallback", True) if not cfg("download", True): # monkey-patch method to do nothing and always return True self.download = pathfmt.fix_extension archive = cfg("archive") if 
archive: archive = util.expand_path(archive) archive_format = (cfg("archive-prefix", extr.category) + cfg("archive-format", extr.archive_fmt)) try: if "{" in archive: archive = formatter.parse(archive).format_map(kwdict) self.archive = util.DownloadArchive(archive, archive_format) except Exception as exc: extr.log.warning( "Failed to open download archive at '%s' ('%s: %s')", archive, exc.__class__.__name__, exc) else: extr.log.debug("Using download archive '%s'", archive) skip = cfg("skip", True) if skip: self._skipexc = None if skip == "enumerate": pathfmt.check_file = pathfmt._enum_file elif isinstance(skip, str): skip, _, smax = skip.partition(":") if skip == "abort": self._skipexc = exception.StopExtraction elif skip == "terminate": self._skipexc = exception.TerminateExtraction elif skip == "exit": self._skipexc = sys.exit self._skipmax = text.parse_int(smax) else: # monkey-patch methods to always return False pathfmt.exists = lambda x=None: False if self.archive: self.archive.check = pathfmt.exists postprocessors = extr.config_accumulate("postprocessors") if postprocessors: self.hooks = collections.defaultdict(list) pp_log = self.get_logger("postprocessor") pp_list = [] pp_conf = config.get((), "postprocessor") or {} for pp_dict in postprocessors: if isinstance(pp_dict, str): pp_dict = pp_conf.get(pp_dict) or {"name": pp_dict} clist = pp_dict.get("whitelist") if clist is not None: negate = False else: clist = pp_dict.get("blacklist") negate = True if clist and not util.build_extractor_filter( clist, negate)(extr): continue name = pp_dict.get("name") pp_cls = postprocessor.find(name) if not pp_cls: pp_log.warning("module '%s' not found", name) continue try: pp_obj = pp_cls(self, pp_dict) except Exception as exc: pp_log.error("'%s' initialization failed: %s: %s", name, exc.__class__.__name__, exc) pp_log.debug("", exc_info=True) else: pp_list.append(pp_obj) if pp_list: extr.log.debug("Active postprocessor modules: %s", pp_list) if "init" in self.hooks: for callback in self.hooks["init"]: callback(pathfmt) def register_hooks(self, hooks, options=None): expr = options.get("filter") if options else None if expr: condition = util.compile_expression(expr) for hook, callback in hooks.items(): self.hooks[hook].append(functools.partial( self._call_hook, callback, condition)) else: for hook, callback in hooks.items(): self.hooks[hook].append(callback) @staticmethod def _call_hook(callback, condition, pathfmt): if condition(pathfmt.kwdict): callback(pathfmt) def _build_extractor_filter(self): clist = self.extractor.config("whitelist") if clist is not None: negate = False special = None else: clist = self.extractor.config("blacklist") negate = True special = util.SPECIAL_EXTRACTORS if clist is None: clist = (self.extractor.category,) return util.build_extractor_filter(clist, negate, special) class SimulationJob(DownloadJob): """Simulate the extraction process without downloading anything""" def handle_url(self, url, kwdict): if not kwdict["extension"]: kwdict["extension"] = "jpg" self.pathfmt.set_filename(kwdict) self.out.skip(self.pathfmt.path) if self.sleep: time.sleep(self.sleep()) if self.archive: self.archive.add(kwdict) def handle_directory(self, kwdict): if not self.pathfmt: self.initialize() class KeywordJob(Job): """Print available keywords""" def __init__(self, url, parent=None): Job.__init__(self, url, parent) self.private = config.get(("output",), "private") def handle_url(self, url, kwdict): print("\nKeywords for filenames and --filter:") print("------------------------------------") 
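        # Illustrative sketch only: assuming a hypothetical kwdict such as
        # {"id": 1, "tags": ["a", "b"], "user": {"name": "mike"}},
        # print_kwdict() below sorts the keys, recurses into nested dicts
        # using a "key[subkey]" prefix, and prints list elements on their
        # own "- " lines, producing roughly
        #   id
        #    1
        #   tags[]
        #    - a
        #    - b
        #   user[name]
        #    mike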
self.print_kwdict(kwdict) raise exception.StopExtraction() def handle_directory(self, kwdict): print("Keywords for directory names:") print("-----------------------------") self.print_kwdict(kwdict) def handle_queue(self, url, kwdict): extr = None if "_extractor" in kwdict: extr = kwdict["_extractor"].from_url(url) if not util.filter_dict(kwdict): self.extractor.log.info( "This extractor only spawns other extractors " "and does not provide any metadata on its own.") if extr: self.extractor.log.info( "Showing results for '%s' instead:\n", url) KeywordJob(extr, self).run() else: self.extractor.log.info( "Try 'gallery-dl -K \"%s\"' instead.", url) else: print("Keywords for --chapter-filter:") print("------------------------------") self.print_kwdict(kwdict) if extr or self.extractor.categorytransfer: print() KeywordJob(extr or url, self).run() raise exception.StopExtraction() def print_kwdict(self, kwdict, prefix=""): """Print key-value pairs in 'kwdict' with formatting""" suffix = "]" if prefix else "" for key, value in sorted(kwdict.items()): if key[0] == "_" and not self.private: continue key = prefix + key + suffix if isinstance(value, dict): self.print_kwdict(value, key + "[") elif isinstance(value, list): if value and isinstance(value[0], dict): self.print_kwdict(value[0], key + "[][") else: print(key, "[]", sep="") for val in value: print(" -", val) else: # string or number print(key, "\n ", value, sep="") class UrlJob(Job): """Print download urls""" maxdepth = 1 def __init__(self, url, parent=None, depth=1): Job.__init__(self, url, parent) self.depth = depth if depth >= self.maxdepth: self.handle_queue = self.handle_url @staticmethod def handle_url(url, _): print(url) @staticmethod def handle_url_fallback(url, kwdict): print(url) if "_fallback" in kwdict: for url in kwdict["_fallback"]: print("|", url) def handle_queue(self, url, kwdict): cls = kwdict.get("_extractor") if cls: extr = cls.from_url(url) else: extr = extractor.find(url) if extr: self.status |= self.__class__(extr, self, self.depth + 1).run() else: self._write_unsupported(url) class InfoJob(Job): """Print extractor defaults and settings""" def run(self): ex = self.extractor pm = self._print_multi pc = self._print_config if ex.basecategory: pm("Category / Subcategory / Basecategory", ex.category, ex.subcategory, ex.basecategory) else: pm("Category / Subcategory", ex.category, ex.subcategory) pc("Filename format", "filename", ex.filename_fmt) pc("Directory format", "directory", ex.directory_fmt) pc("Archive format", "archive-format", ex.archive_fmt) pc("Request interval", "sleep-request", ex.request_interval) return 0 def _print_multi(self, title, *values): print(title, "\n ", " / ".join(json.dumps(v) for v in values), sep="") def _print_config(self, title, optname, value): optval = self.extractor.config(optname, util.SENTINEL) if optval is not util.SENTINEL: print(title, "(custom):\n ", json.dumps(optval)) print(title, "(default):\n ", json.dumps(value)) elif value: print(title, "(default):\n ", json.dumps(value)) class DataJob(Job): """Collect extractor results and dump them""" def __init__(self, url, parent=None, file=sys.stdout, ensure_ascii=True): Job.__init__(self, url, parent) self.file = file self.data = [] self.ascii = config.get(("output",), "ascii", ensure_ascii) private = config.get(("output",), "private") self.filter = util.identity if private else util.filter_dict def run(self): sleep = util.build_duration_func( self.extractor.config("sleep-extractor")) if sleep: time.sleep(sleep()) # collect data try: for msg 
in self.extractor: self.dispatch(msg) except exception.StopExtraction: pass except Exception as exc: self.data.append((exc.__class__.__name__, str(exc))) except BaseException: pass # convert numbers to string if config.get(("output",), "num-to-str", False): for msg in self.data: util.transform_dict(msg[-1], util.number_to_string) # dump to 'file' try: util.dump_json(self.data, self.file, self.ascii, 2) self.file.flush() except Exception: pass return 0 def handle_url(self, url, kwdict): self.data.append((Message.Url, url, self.filter(kwdict))) def handle_directory(self, kwdict): self.data.append((Message.Directory, self.filter(kwdict))) def handle_queue(self, url, kwdict): self.data.append((Message.Queue, url, self.filter(kwdict))) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1641689335.0 gallery_dl-1.21.1/gallery_dl/oauth.py0000644000175000017500000001065714166430367016241 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2018-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """OAuth helper functions and classes""" import hmac import time import base64 import random import string import hashlib import urllib.parse import requests import requests.auth from . import text from .cache import cache def nonce(size, alphabet=string.ascii_letters): """Generate a nonce value with 'size' characters""" return "".join(random.choice(alphabet) for _ in range(size)) def quote(value, quote=urllib.parse.quote): """Quote 'value' according to the OAuth1.0 standard""" return quote(value, "~") def concat(*args): """Concatenate 'args' as expected by OAuth1.0""" return "&".join(quote(item) for item in args) class OAuth1Session(requests.Session): """Extension to requests.Session to support OAuth 1.0""" def __init__(self, consumer_key, consumer_secret, token=None, token_secret=None): requests.Session.__init__(self) self.auth = OAuth1Client( consumer_key, consumer_secret, token, token_secret, ) def rebuild_auth(self, prepared_request, response): if "Authorization" in prepared_request.headers: del prepared_request.headers["Authorization"] prepared_request.prepare_auth(self.auth) class OAuth1Client(requests.auth.AuthBase): """OAuth1.0a authentication""" def __init__(self, consumer_key, consumer_secret, token=None, token_secret=None): self.consumer_key = consumer_key self.consumer_secret = consumer_secret self.token = token self.token_secret = token_secret def __call__(self, request): oauth_params = [ ("oauth_consumer_key", self.consumer_key), ("oauth_nonce", nonce(16)), ("oauth_signature_method", "HMAC-SHA1"), ("oauth_timestamp", str(int(time.time()))), ("oauth_version", "1.0"), ] if self.token: oauth_params.append(("oauth_token", self.token)) signature = self.generate_signature(request, oauth_params) oauth_params.append(("oauth_signature", signature)) request.headers["Authorization"] = "OAuth " + ",".join( key + '="' + value + '"' for key, value in oauth_params) return request def generate_signature(self, request, params): """Generate 'oauth_signature' value""" url, _, query = request.url.partition("?") params = params.copy() for key, value in text.parse_query(query).items(): params.append((quote(key), quote(value))) params.sort() query = "&".join("=".join(item) for item in params) message = concat(request.method, url, query).encode() key = concat(self.consumer_secret, self.token_secret or "").encode() signature = 
hmac.new(key, message, hashlib.sha1).digest() return quote(base64.b64encode(signature).decode()) class OAuth1API(): """Base class for OAuth1.0 based API interfaces""" API_KEY = None API_SECRET = None def __init__(self, extractor): self.log = extractor.log self.extractor = extractor api_key = extractor.config("api-key", self.API_KEY) api_secret = extractor.config("api-secret", self.API_SECRET) token = extractor.config("access-token") token_secret = extractor.config("access-token-secret") key_type = "default" if api_key == self.API_KEY else "custom" if token is None or token == "cache": key = (extractor.category, api_key) token, token_secret = _token_cache(key) if api_key and api_secret and token and token_secret: self.log.debug("Using %s OAuth1.0 authentication", key_type) self.session = OAuth1Session( api_key, api_secret, token, token_secret) self.api_key = None else: self.log.debug("Using %s api_key authentication", key_type) self.session = extractor.session self.api_key = api_key def request(self, url, **kwargs): kwargs["fatal"] = None kwargs["session"] = self.session return self.extractor.request(url, **kwargs) @cache(maxage=100*365*24*3600, keyarg=0) def _token_cache(key): return None, None ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/option.py0000644000175000017500000004131714220623232016410 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Command line option parsing""" import argparse import logging import json import sys from . import job, version class ConfigAction(argparse.Action): """Set argparse results as config values""" def __call__(self, parser, namespace, values, option_string=None): namespace.options.append(((), self.dest, values)) class ConfigConstAction(argparse.Action): """Set argparse const values as config values""" def __call__(self, parser, namespace, values, option_string=None): namespace.options.append(((), self.dest, self.const)) class AppendCommandAction(argparse.Action): def __call__(self, parser, namespace, values, option_string=None): items = getattr(namespace, self.dest, None) or [] val = self.const.copy() val["command"] = values items.append(val) setattr(namespace, self.dest, items) class DeprecatedConfigConstAction(argparse.Action): """Set argparse const values as config values + deprecation warning""" def __call__(self, parser, namespace, values, option_string=None): print("warning: {} is deprecated. 
Use {} instead.".format( "/".join(self.option_strings), self.choices), file=sys.stderr) namespace.options.append(((), self.dest, self.const)) class ParseAction(argparse.Action): """Parse = options and set them as config values""" def __call__(self, parser, namespace, values, option_string=None): key, _, value = values.partition("=") try: value = json.loads(value) except ValueError: pass key = key.split(".") # splitting an empty string becomes [""] namespace.options.append((key[:-1], key[-1], value)) class Formatter(argparse.HelpFormatter): """Custom HelpFormatter class to customize help output""" def __init__(self, *args, **kwargs): super().__init__(max_help_position=50, *args, **kwargs) def _format_action_invocation(self, action): opts = action.option_strings[:] if opts: if action.nargs != 0: args_string = self._format_args(action, "ARG") opts[-1] += " " + args_string return ', '.join(opts) else: return self._metavar_formatter(action, action.dest)(1)[0] def build_parser(): """Build and configure an ArgumentParser object""" parser = argparse.ArgumentParser( usage="%(prog)s [OPTION]... URL...", formatter_class=Formatter, add_help=False, ) general = parser.add_argument_group("General Options") general.add_argument( "-h", "--help", action="help", help="Print this help message and exit", ) general.add_argument( "--version", action="version", version=version.__version__, help="Print program version and exit", ) general.add_argument( "-i", "--input-file", dest="inputfiles", metavar="FILE", action="append", help=("Download URLs found in FILE ('-' for stdin). " "More than one --input-file can be specified"), ) general.add_argument( "-d", "--destination", dest="base-directory", metavar="PATH", action=ConfigAction, help="Target location for file downloads", ) general.add_argument( "-D", "--directory", dest="directory", metavar="PATH", help="Exact location for file downloads", ) general.add_argument( "-f", "--filename", dest="filename", metavar="FORMAT", help=("Filename format string for downloaded files " "('/O' for \"original\" filenames)"), ) general.add_argument( "--cookies", dest="cookies", metavar="FILE", action=ConfigAction, help="File to load additional cookies from", ) general.add_argument( "--proxy", dest="proxy", metavar="URL", action=ConfigAction, help="Use the specified proxy", ) general.add_argument( "--source-address", dest="source-address", metavar="IP", action=ConfigAction, help="Client-side IP address to bind to", ) general.add_argument( "--clear-cache", dest="clear_cache", metavar="MODULE", help="Delete cached login sessions, cookies, etc. 
for MODULE " "(ALL to delete everything)", ) output = parser.add_argument_group("Output Options") output.add_argument( "-q", "--quiet", dest="loglevel", default=logging.INFO, action="store_const", const=logging.ERROR, help="Activate quiet mode", ) output.add_argument( "-v", "--verbose", dest="loglevel", action="store_const", const=logging.DEBUG, help="Print various debugging information", ) output.add_argument( "-g", "--get-urls", dest="list_urls", action="count", help="Print URLs instead of downloading", ) output.add_argument( "-G", "--resolve-urls", dest="list_urls", action="store_const", const=128, help="Print URLs instead of downloading; resolve intermediary URLs", ) output.add_argument( "-j", "--dump-json", dest="jobtype", action="store_const", const=job.DataJob, help="Print JSON information", ) output.add_argument( "-s", "--simulate", dest="jobtype", action="store_const", const=job.SimulationJob, help="Simulate data extraction; do not download anything", ) output.add_argument( "-E", "--extractor-info", dest="jobtype", action="store_const", const=job.InfoJob, help="Print extractor defaults and settings", ) output.add_argument( "-K", "--list-keywords", dest="jobtype", action="store_const", const=job.KeywordJob, help=("Print a list of available keywords and example values " "for the given URLs"), ) output.add_argument( "--list-modules", dest="list_modules", action="store_true", help="Print a list of available extractor modules", ) output.add_argument( "--list-extractors", dest="list_extractors", action="store_true", help=("Print a list of extractor classes " "with description, (sub)category and example URL"), ) output.add_argument( "--write-log", dest="logfile", metavar="FILE", action=ConfigAction, help="Write logging output to FILE", ) output.add_argument( "--write-unsupported", dest="unsupportedfile", metavar="FILE", action=ConfigAction, help=("Write URLs, which get emitted by other extractors but cannot " "be handled, to FILE"), ) output.add_argument( "--write-pages", dest="write-pages", nargs=0, action=ConfigConstAction, const=True, help=("Write downloaded intermediary pages to files " "in the current directory to debug problems"), ) downloader = parser.add_argument_group("Downloader Options") downloader.add_argument( "-r", "--limit-rate", dest="rate", metavar="RATE", action=ConfigAction, help="Maximum download rate (e.g. 500k or 2.5M)", ) downloader.add_argument( "-R", "--retries", dest="retries", metavar="N", type=int, action=ConfigAction, help=("Maximum number of retries for failed HTTP requests " "or -1 for infinite retries (default: 4)"), ) downloader.add_argument( "--http-timeout", dest="timeout", metavar="SECONDS", type=float, action=ConfigAction, help="Timeout for HTTP connections (default: 30.0)", ) downloader.add_argument( "--sleep", dest="sleep", metavar="SECONDS", action=ConfigAction, help=("Number of seconds to wait before each download. " "This can be either a constant value or a range " "(e.g. 2.7 or 2.0-3.5)"), ) downloader.add_argument( "--sleep-request", dest="sleep-request", metavar="SECONDS", action=ConfigAction, help=("Number of seconds to wait between HTTP requests " "during data extraction"), ) downloader.add_argument( "--sleep-extractor", dest="sleep-extractor", metavar="SECONDS", action=ConfigAction, help=("Number of seconds to wait before starting data extraction " "for an input URL"), ) downloader.add_argument( "--filesize-min", dest="filesize-min", metavar="SIZE", action=ConfigAction, help="Do not download files smaller than SIZE (e.g. 
500k or 2.5M)", ) downloader.add_argument( "--filesize-max", dest="filesize-max", metavar="SIZE", action=ConfigAction, help="Do not download files larger than SIZE (e.g. 500k or 2.5M)", ) downloader.add_argument( "--no-part", dest="part", nargs=0, action=ConfigConstAction, const=False, help="Do not use .part files", ) downloader.add_argument( "--no-skip", dest="skip", nargs=0, action=ConfigConstAction, const=False, help="Do not skip downloads; overwrite existing files", ) downloader.add_argument( "--no-mtime", dest="mtime", nargs=0, action=ConfigConstAction, const=False, help=("Do not set file modification times according to " "Last-Modified HTTP response headers") ) downloader.add_argument( "--no-download", dest="download", nargs=0, action=ConfigConstAction, const=False, help=("Do not download any files") ) downloader.add_argument( "--no-check-certificate", dest="verify", nargs=0, action=ConfigConstAction, const=False, help="Disable HTTPS certificate validation", ) configuration = parser.add_argument_group("Configuration Options") configuration.add_argument( "-c", "--config", dest="cfgfiles", metavar="FILE", action="append", help="Additional configuration files", ) configuration.add_argument( "--config-yaml", dest="yamlfiles", metavar="FILE", action="append", help=argparse.SUPPRESS, ) configuration.add_argument( "-o", "--option", dest="options", metavar="OPT", action=ParseAction, default=[], help="Additional '=' option values", ) configuration.add_argument( "--ignore-config", dest="load_config", action="store_false", help="Do not read the default configuration files", ) authentication = parser.add_argument_group("Authentication Options") authentication.add_argument( "-u", "--username", dest="username", metavar="USER", action=ConfigAction, help="Username to login with", ) authentication.add_argument( "-p", "--password", dest="password", metavar="PASS", action=ConfigAction, help="Password belonging to the given username", ) authentication.add_argument( "--netrc", dest="netrc", nargs=0, action=ConfigConstAction, const=True, help="Enable .netrc authentication data", ) selection = parser.add_argument_group("Selection Options") selection.add_argument( "--download-archive", dest="archive", metavar="FILE", action=ConfigAction, help=("Record all downloaded files in the archive file and " "skip downloading any file already in it"), ) selection.add_argument( "-A", "--abort", dest="abort", metavar="N", type=int, help=("Stop current extractor run " "after N consecutive file downloads were skipped"), ) selection.add_argument( "-T", "--terminate", dest="terminate", metavar="N", type=int, help=("Stop current and parent extractor runs " "after N consecutive file downloads were skipped"), ) selection.add_argument( "--range", dest="image-range", metavar="RANGE", action=ConfigAction, help=("Index-range(s) specifying which images to download. " "For example '5-10' or '1,3-5,10-'"), ) selection.add_argument( "--chapter-range", dest="chapter-range", metavar="RANGE", action=ConfigAction, help=("Like '--range', but applies to manga-chapters " "and other delegated URLs"), ) selection.add_argument( "--filter", dest="image-filter", metavar="EXPR", action=ConfigAction, help=("Python expression controlling which images to download. " "Files for which the expression evaluates to False are ignored. " "Available keys are the filename-specific ones listed by '-K'. 
" "Example: --filter \"image_width >= 1000 and " "rating in ('s', 'q')\""), ) selection.add_argument( "--chapter-filter", dest="chapter-filter", metavar="EXPR", action=ConfigAction, help=("Like '--filter', but applies to manga-chapters " "and other delegated URLs"), ) infojson = { "name" : "metadata", "event" : "init", "filename": "info.json", } postprocessor = parser.add_argument_group("Post-processing Options") postprocessor.add_argument( "--zip", dest="postprocessors", action="append_const", const="zip", help="Store downloaded files in a ZIP archive", ) postprocessor.add_argument( "--ugoira-conv", dest="postprocessors", action="append_const", const={ "name" : "ugoira", "ffmpeg-args" : ("-c:v", "libvpx", "-crf", "4", "-b:v", "5000k"), "ffmpeg-twopass": True, "whitelist" : ("pixiv", "danbooru"), }, help="Convert Pixiv Ugoira to WebM (requires FFmpeg)", ) postprocessor.add_argument( "--ugoira-conv-lossless", dest="postprocessors", action="append_const", const={ "name" : "ugoira", "ffmpeg-args" : ("-c:v", "libvpx-vp9", "-lossless", "1", "-pix_fmt", "yuv420p"), "ffmpeg-twopass": False, "whitelist" : ("pixiv", "danbooru"), }, help="Convert Pixiv Ugoira to WebM in VP9 lossless mode", ) postprocessor.add_argument( "--ugoira-conv-copy", dest="postprocessors", action="append_const", const={ "name" : "ugoira", "extension" : "mkv", "ffmpeg-args" : ("-c:v", "copy"), "ffmpeg-twopass" : False, "repeat-last-frame": False, "whitelist" : ("pixiv", "danbooru"), }, help="Convert Pixiv Ugoira to MKV without re-encoding any frames", ) postprocessor.add_argument( "--write-metadata", dest="postprocessors", action="append_const", const="metadata", help="Write metadata to separate JSON files", ) postprocessor.add_argument( "--write-info-json", dest="postprocessors", action="append_const", const=infojson, help="Write gallery metadata to a info.json file", ) postprocessor.add_argument( "--write-infojson", dest="postprocessors", action="append_const", const=infojson, help=argparse.SUPPRESS, ) postprocessor.add_argument( "--write-tags", dest="postprocessors", action="append_const", const={"name": "metadata", "mode": "tags"}, help="Write image tags to separate text files", ) postprocessor.add_argument( "--mtime-from-date", dest="postprocessors", action="append_const", const="mtime", help="Set file modification times according to 'date' metadata", ) postprocessor.add_argument( "--exec", dest="postprocessors", metavar="CMD", action=AppendCommandAction, const={"name": "exec"}, help=("Execute CMD for each downloaded file. " "Example: --exec 'convert {} {}.png && rm {}'"), ) postprocessor.add_argument( "--exec-after", dest="postprocessors", metavar="CMD", action=AppendCommandAction, const={ "name": "exec", "event": "finalize"}, help=("Execute CMD after all files were downloaded successfully. 
" "Example: --exec-after 'cd {} && convert * ../doc.pdf'"), ) postprocessor.add_argument( "-P", "--postprocessor", dest="postprocessors", metavar="NAME", action="append", help="Activate the specified post processor", ) parser.add_argument( "urls", metavar="URL", nargs="*", help=argparse.SUPPRESS, ) return parser ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1645832469.0 gallery_dl-1.21.1/gallery_dl/output.py0000644000175000017500000002653014206264425016451 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import os import sys import shutil import logging import unicodedata from . import config, util, formatter # -------------------------------------------------------------------- # Logging LOG_FORMAT = "[{name}][{levelname}] {message}" LOG_FORMAT_DATE = "%Y-%m-%d %H:%M:%S" LOG_LEVEL = logging.INFO class Logger(logging.Logger): """Custom logger that includes extra info in log records""" def makeRecord(self, name, level, fn, lno, msg, args, exc_info, func=None, extra=None, sinfo=None, factory=logging._logRecordFactory): rv = factory(name, level, fn, lno, msg, args, exc_info, func, sinfo) if extra: rv.__dict__.update(extra) return rv class LoggerAdapter(): """Trimmed-down version of logging.LoggingAdapter""" __slots__ = ("logger", "extra") def __init__(self, logger, extra): self.logger = logger self.extra = extra def debug(self, msg, *args, **kwargs): if self.logger.isEnabledFor(logging.DEBUG): kwargs["extra"] = self.extra self.logger._log(logging.DEBUG, msg, args, **kwargs) def info(self, msg, *args, **kwargs): if self.logger.isEnabledFor(logging.INFO): kwargs["extra"] = self.extra self.logger._log(logging.INFO, msg, args, **kwargs) def warning(self, msg, *args, **kwargs): if self.logger.isEnabledFor(logging.WARNING): kwargs["extra"] = self.extra self.logger._log(logging.WARNING, msg, args, **kwargs) def error(self, msg, *args, **kwargs): if self.logger.isEnabledFor(logging.ERROR): kwargs["extra"] = self.extra self.logger._log(logging.ERROR, msg, args, **kwargs) class PathfmtProxy(): __slots__ = ("job",) def __init__(self, job): self.job = job def __getattribute__(self, name): pathfmt = object.__getattribute__(self, "job").pathfmt return pathfmt.__dict__.get(name) if pathfmt else None class KwdictProxy(): __slots__ = ("job",) def __init__(self, job): self.job = job def __getattribute__(self, name): pathfmt = object.__getattribute__(self, "job").pathfmt return pathfmt.kwdict.get(name) if pathfmt else None class Formatter(logging.Formatter): """Custom formatter that supports different formats per loglevel""" def __init__(self, fmt, datefmt): if isinstance(fmt, dict): for key in ("debug", "info", "warning", "error"): value = fmt[key] if key in fmt else LOG_FORMAT fmt[key] = (formatter.parse(value).format_map, "{asctime" in value) else: if fmt == LOG_FORMAT: fmt = (fmt.format_map, False) else: fmt = (formatter.parse(fmt).format_map, "{asctime" in fmt) fmt = {"debug": fmt, "info": fmt, "warning": fmt, "error": fmt} self.formats = fmt self.datefmt = datefmt def format(self, record): record.message = record.getMessage() fmt, asctime = self.formats[record.levelname] if asctime: record.asctime = self.formatTime(record, self.datefmt) msg = fmt(record.__dict__) if record.exc_info and not record.exc_text: record.exc_text = self.formatException(record.exc_info) 
if record.exc_text: msg = msg + "\n" + record.exc_text if record.stack_info: msg = msg + "\n" + record.stack_info return msg def initialize_logging(loglevel): """Setup basic logging functionality before configfiles have been loaded""" # convert levelnames to lowercase for level in (10, 20, 30, 40, 50): name = logging.getLevelName(level) logging.addLevelName(level, name.lower()) # register custom Logging class logging.Logger.manager.setLoggerClass(Logger) # setup basic logging to stderr formatter = Formatter(LOG_FORMAT, LOG_FORMAT_DATE) handler = logging.StreamHandler() handler.setFormatter(formatter) handler.setLevel(loglevel) root = logging.getLogger() root.setLevel(logging.NOTSET) root.addHandler(handler) return logging.getLogger("gallery-dl") def configure_logging(loglevel): root = logging.getLogger() minlevel = loglevel # stream logging handler handler = root.handlers[0] opts = config.interpolate(("output",), "log") if opts: if isinstance(opts, str): opts = {"format": opts} if handler.level == LOG_LEVEL and "level" in opts: handler.setLevel(opts["level"]) if "format" in opts or "format-date" in opts: handler.setFormatter(Formatter( opts.get("format", LOG_FORMAT), opts.get("format-date", LOG_FORMAT_DATE), )) if minlevel > handler.level: minlevel = handler.level # file logging handler handler = setup_logging_handler("logfile", lvl=loglevel) if handler: root.addHandler(handler) if minlevel > handler.level: minlevel = handler.level root.setLevel(minlevel) def setup_logging_handler(key, fmt=LOG_FORMAT, lvl=LOG_LEVEL): """Setup a new logging handler""" opts = config.interpolate(("output",), key) if not opts: return None if not isinstance(opts, dict): opts = {"path": opts} path = opts.get("path") mode = opts.get("mode", "w") encoding = opts.get("encoding", "utf-8") try: path = util.expand_path(path) handler = logging.FileHandler(path, mode, encoding) except (OSError, ValueError) as exc: logging.getLogger("gallery-dl").warning( "%s: %s", key, exc) return None except TypeError as exc: logging.getLogger("gallery-dl").warning( "%s: missing or invalid path (%s)", key, exc) return None handler.setLevel(opts.get("level", lvl)) handler.setFormatter(Formatter( opts.get("format", fmt), opts.get("format-date", LOG_FORMAT_DATE), )) return handler # -------------------------------------------------------------------- # Utility functions def replace_std_streams(errors="replace"): """Replace standard streams and set their error handlers to 'errors'""" for name in ("stdout", "stdin", "stderr"): stream = getattr(sys, name) if stream: setattr(sys, name, stream.__class__( stream.buffer, errors=errors, newline=stream.newlines, line_buffering=stream.line_buffering, )) # -------------------------------------------------------------------- # Downloader output def select(): """Automatically select a suitable output class""" pdict = { "default": PipeOutput, "pipe": PipeOutput, "term": TerminalOutput, "terminal": TerminalOutput, "color": ColorOutput, "null": NullOutput, } omode = config.get(("output",), "mode", "auto").lower() if omode in pdict: output = pdict[omode]() elif omode == "auto": if hasattr(sys.stdout, "isatty") and sys.stdout.isatty(): output = ColorOutput() if ANSI else TerminalOutput() else: output = PipeOutput() else: raise Exception("invalid output mode: " + omode) if not config.get(("output",), "skip", True): output.skip = util.identity return output class NullOutput(): def start(self, path): """Print a message indicating the start of a download""" def skip(self, path): """Print a message indicating 
that a download has been skipped""" def success(self, path, tries): """Print a message indicating the completion of a download""" def progress(self, bytes_total, bytes_downloaded, bytes_per_second): """Display download progress""" class PipeOutput(NullOutput): def skip(self, path): stdout = sys.stdout stdout.write(CHAR_SKIP + path + "\n") stdout.flush() def success(self, path, tries): stdout = sys.stdout stdout.write(path + "\n") stdout.flush() class TerminalOutput(NullOutput): def __init__(self): shorten = config.get(("output",), "shorten", True) if shorten: func = shorten_string_eaw if shorten == "eaw" else shorten_string limit = shutil.get_terminal_size().columns - OFFSET sep = CHAR_ELLIPSIES self.shorten = lambda txt: func(txt, limit, sep) else: self.shorten = util.identity def start(self, path): stdout = sys.stdout stdout.write(self.shorten(" " + path)) stdout.flush() def skip(self, path): sys.stdout.write(self.shorten(CHAR_SKIP + path) + "\n") def success(self, path, tries): sys.stdout.write("\r" + self.shorten(CHAR_SUCCESS + path) + "\n") def progress(self, bytes_total, bytes_downloaded, bytes_per_second): bdl = util.format_value(bytes_downloaded) bps = util.format_value(bytes_per_second) if bytes_total is None: sys.stderr.write("\r{:>7}B {:>7}B/s ".format(bdl, bps)) else: sys.stderr.write("\r{:>3}% {:>7}B {:>7}B/s ".format( bytes_downloaded * 100 // bytes_total, bdl, bps)) class ColorOutput(TerminalOutput): def start(self, path): stdout = sys.stdout stdout.write(self.shorten(path)) stdout.flush() def skip(self, path): sys.stdout.write("\033[2m" + self.shorten(path) + "\033[0m\n") def success(self, path, tries): sys.stdout.write("\r\033[1;32m" + self.shorten(path) + "\033[0m\n") class EAWCache(dict): def __missing__(self, key): width = self[key] = \ 2 if unicodedata.east_asian_width(key) in "WF" else 1 return width def shorten_string(txt, limit, sep="…"): """Limit width of 'txt'; assume all characters have a width of 1""" if len(txt) <= limit: return txt limit -= len(sep) return txt[:limit // 2] + sep + txt[-((limit+1) // 2):] def shorten_string_eaw(txt, limit, sep="…", cache=EAWCache()): """Limit width of 'txt'; check for east-asian characters with width > 1""" char_widths = [cache[c] for c in txt] text_width = sum(char_widths) if text_width <= limit: # no shortening required return txt limit -= len(sep) if text_width == len(txt): # all characters have a width of 1 return txt[:limit // 2] + sep + txt[-((limit+1) // 2):] # wide characters left = 0 lwidth = limit // 2 while True: lwidth -= char_widths[left] if lwidth < 0: break left += 1 right = -1 rwidth = (limit+1) // 2 + (lwidth + char_widths[left]) while True: rwidth -= char_widths[right] if rwidth < 0: break right -= 1 return txt[:left] + sep + txt[right+1:] if util.WINDOWS: ANSI = os.environ.get("TERM") == "ANSI" OFFSET = 1 CHAR_SKIP = "# " CHAR_SUCCESS = "* " CHAR_ELLIPSIES = "..." else: ANSI = True OFFSET = 0 CHAR_SKIP = "# " CHAR_SUCCESS = "✔ " CHAR_ELLIPSIES = "…" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648487626.0 gallery_dl-1.21.1/gallery_dl/path.py0000644000175000017500000002550014220366312016033 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Filesystem path handling""" import os import re import shutil import functools from . 
import util, formatter, exception WINDOWS = util.WINDOWS class PathFormat(): EXTENSION_MAP = { "jpeg": "jpg", "jpe" : "jpg", "jfif": "jpg", "jif" : "jpg", "jfi" : "jpg", } def __init__(self, extractor): config = extractor.config kwdefault = config("keywords-default") filename_fmt = config("filename") try: if filename_fmt is None: filename_fmt = extractor.filename_fmt elif isinstance(filename_fmt, dict): self.filename_conditions = [ (util.compile_expression(expr), formatter.parse(fmt, kwdefault).format_map) for expr, fmt in filename_fmt.items() if expr ] self.build_filename = self.build_filename_conditional filename_fmt = filename_fmt.get("", extractor.filename_fmt) self.filename_formatter = formatter.parse( filename_fmt, kwdefault).format_map except Exception as exc: raise exception.FilenameFormatError(exc) directory_fmt = config("directory") try: if directory_fmt is None: directory_fmt = extractor.directory_fmt elif isinstance(directory_fmt, dict): self.directory_conditions = [ (util.compile_expression(expr), [ formatter.parse(fmt, kwdefault).format_map for fmt in fmts ]) for expr, fmts in directory_fmt.items() if expr ] self.build_directory = self.build_directory_conditional directory_fmt = directory_fmt.get("", extractor.directory_fmt) self.directory_formatters = [ formatter.parse(dirfmt, kwdefault).format_map for dirfmt in directory_fmt ] except Exception as exc: raise exception.DirectoryFormatError(exc) self.kwdict = {} self.directory = self.realdirectory = \ self.filename = self.extension = self.prefix = \ self.path = self.realpath = self.temppath = "" self.delete = self._create_directory = False extension_map = config("extension-map") if extension_map is None: extension_map = self.EXTENSION_MAP self.extension_map = extension_map.get restrict = config("path-restrict", "auto") replace = config("path-replace", "_") if restrict == "auto": restrict = "\\\\|/<>:\"?*" if WINDOWS else "/" elif restrict == "unix": restrict = "/" elif restrict == "windows": restrict = "\\\\|/<>:\"?*" elif restrict == "ascii": restrict = "^0-9A-Za-z_." self.clean_segment = self._build_cleanfunc(restrict, replace) remove = config("path-remove", "\x00-\x1f\x7f") self.clean_path = self._build_cleanfunc(remove, "") strip = config("path-strip", "auto") if strip == "auto": strip = ". " if WINDOWS else "" elif strip == "unix": strip = "" elif strip == "windows": strip = ". " self.strip = strip basedir = extractor._parentdir if not basedir: basedir = config("base-directory") sep = os.sep if basedir is None: basedir = "." + sep + "gallery-dl" + sep elif basedir: basedir = util.expand_path(basedir) altsep = os.altsep if altsep and altsep in basedir: basedir = basedir.replace(altsep, sep) if basedir[-1] != sep: basedir += sep basedir = self.clean_path(basedir) self.basedirectory = basedir @staticmethod def _build_cleanfunc(chars, repl): if not chars: return util.identity elif isinstance(chars, dict): def func(x, table=str.maketrans(chars)): return x.translate(table) elif len(chars) == 1: def func(x, c=chars, r=repl): return x.replace(c, r) else: return functools.partial( re.compile("[" + chars + "]").sub, repl) return func def open(self, mode="wb"): """Open file and return a corresponding file object""" return open(self.temppath, mode) def exists(self): """Return True if the file exists on disk""" if self.extension and os.path.exists(self.realpath): return self.check_file() return False @staticmethod def check_file(): return True def _enum_file(self): num = 1 try: while True: self.prefix = str(num) + "." 
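# A minimal sketch of the conditional "filename" setting parsed by
# PathFormat.__init__ above (illustrative; not part of the original file).
# A dict maps Python filter expressions to format strings, with the empty
# key "" acting as the fallback; checking a condition against a file's
# metadata dict is roughly equivalent to:
from gallery_dl import formatter, util  # assumes an installed gallery_dl

conditional_filename = {
    "extension == 'zip'": "{category}_{id}_archive.{extension}",
    ""                  : "{category}_{id}.{extension}",
}

kwdict = {"category": "example", "id": 12345, "extension": "zip"}
for expr, fmt in conditional_filename.items():
    if not expr or util.compile_expression(expr)(kwdict):
        print(formatter.parse(fmt).format_map(kwdict))  # example_12345_archive.zip
        break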
self.set_extension(self.extension, False) os.stat(self.realpath) # raises OSError if file doesn't exist num += 1 except OSError: pass return False def set_directory(self, kwdict): """Build directory path and create it if necessary""" self.kwdict = kwdict sep = os.sep segments = self.build_directory(kwdict) if segments: self.directory = directory = self.basedirectory + self.clean_path( sep.join(segments) + sep) else: self.directory = directory = self.basedirectory if WINDOWS: # Enable longer-than-260-character paths directory = os.path.abspath(directory) if directory.startswith("\\\\"): directory = "\\\\?\\UNC\\" + directory[2:] else: directory = "\\\\?\\" + directory # abspath() in Python 3.7+ removes trailing path separators (#402) if directory[-1] != sep: directory += sep self.realdirectory = directory self._create_directory = True def set_filename(self, kwdict): """Set general filename data""" self.kwdict = kwdict self.temppath = self.prefix = "" ext = kwdict["extension"] kwdict["extension"] = self.extension = self.extension_map(ext, ext) if self.extension: self.build_path() else: self.filename = "" def set_extension(self, extension, real=True): """Set filename extension""" extension = self.extension_map(extension, extension) if real: self.extension = extension self.kwdict["extension"] = self.prefix + extension self.build_path() def fix_extension(self, _=None): """Fix filenames without a given filename extension""" if not self.extension: self.set_extension("", False) if self.path[-1] == ".": self.path = self.path[:-1] self.temppath = self.realpath = self.realpath[:-1] return True def build_filename(self, kwdict): """Apply 'kwdict' to filename format string""" try: return self.clean_path(self.clean_segment( self.filename_formatter(kwdict))) except Exception as exc: raise exception.FilenameFormatError(exc) def build_filename_conditional(self, kwdict): try: for condition, fmt in self.filename_conditions: if condition(kwdict): break else: fmt = self.filename_formatter return self.clean_path(self.clean_segment(fmt(kwdict))) except Exception as exc: raise exception.FilenameFormatError(exc) def build_directory(self, kwdict): """Apply 'kwdict' to directory format strings""" segments = [] append = segments.append strip = self.strip try: for fmt in self.directory_formatters: segment = fmt(kwdict).strip() if strip: # remove trailing dots and spaces (#647) segment = segment.rstrip(strip) if segment: append(self.clean_segment(segment)) return segments except Exception as exc: raise exception.DirectoryFormatError(exc) def build_directory_conditional(self, kwdict): segments = [] append = segments.append strip = self.strip try: for condition, formatters in self.directory_conditions: if condition(kwdict): break else: formatters = self.directory_formatters for fmt in formatters: segment = fmt(kwdict).strip() if strip: segment = segment.rstrip(strip) if segment: append(self.clean_segment(segment)) return segments except Exception as exc: raise exception.DirectoryFormatError(exc) def build_path(self): """Combine directory and filename to full paths""" if self._create_directory: os.makedirs(self.realdirectory, exist_ok=True) self._create_directory = False self.filename = filename = self.build_filename(self.kwdict) self.path = self.directory + filename self.realpath = self.realdirectory + filename if not self.temppath: self.temppath = self.realpath def part_enable(self, part_directory=None): """Enable .part file usage""" if self.extension: self.temppath += ".part" else: self.set_extension("part", False) if 
part_directory: self.temppath = os.path.join( part_directory, os.path.basename(self.temppath), ) def part_size(self): """Return size of .part file""" try: return os.stat(self.temppath).st_size except OSError: pass return 0 def finalize(self): """Move tempfile to its target location""" if self.delete: self.delete = False os.unlink(self.temppath) return if self.temppath != self.realpath: # Move temp file to its actual location try: os.replace(self.temppath, self.realpath) except OSError: shutil.copyfile(self.temppath, self.realpath) os.unlink(self.temppath) mtime = self.kwdict.get("_mtime") if mtime: util.set_mtime(self.realpath, mtime) ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1649443808.181395 gallery_dl-1.21.1/gallery_dl/postprocessor/0000755000175000017500000000000014224101740017444 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1637943742.0 gallery_dl-1.21.1/gallery_dl/postprocessor/__init__.py0000644000175000017500000000163014150204676021567 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2018-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Post-processing modules""" modules = [ "classify", "compare", "exec", "metadata", "mtime", "ugoira", "zip", ] def find(name): """Return a postprocessor class with the given name""" try: return _cache[name] except KeyError: pass cls = None if name in modules: # prevent unwanted imports try: module = __import__(name, globals(), None, (), 1) except ImportError: pass else: cls = module.__postprocessor__ _cache[name] = cls return cls # -------------------------------------------------------------------- # internals _cache = {} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/postprocessor/classify.py0000644000175000017500000000353214176336637021663 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2018-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
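# Usage sketch for find() above (illustrative; not part of the original
# file): postprocessor modules are imported lazily by name, and the result,
# including a failed lookup, is cached for later calls.
from gallery_dl import postprocessor  # assumes an installed gallery_dl

print(postprocessor.find("metadata"))  # the MetadataPP class on success
print(postprocessor.find("unknown"))   # None; "unknown" is not in 'modules'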
"""Categorize files by file extension""" from .common import PostProcessor import os class ClassifyPP(PostProcessor): DEFAULT_MAPPING = { "Music" : ("mp3", "aac", "flac", "ogg", "wma", "m4a", "wav"), "Video" : ("flv", "ogv", "avi", "mp4", "mpg", "mpeg", "3gp", "mkv", "webm", "vob", "wmv"), "Pictures" : ("jpg", "jpeg", "png", "gif", "bmp", "svg", "webp"), "Archives" : ("zip", "rar", "7z", "tar", "gz", "bz2"), } def __init__(self, job, options): PostProcessor.__init__(self, job) mapping = options.get("mapping", self.DEFAULT_MAPPING) self.mapping = { ext: directory for directory, exts in mapping.items() for ext in exts } job.register_hooks( {"prepare": self.prepare, "file": self.move}, options) def prepare(self, pathfmt): ext = pathfmt.extension if ext in self.mapping: # set initial paths to enable download skips self._build_paths(pathfmt, self.mapping[ext]) def move(self, pathfmt): ext = pathfmt.extension if ext in self.mapping: # rebuild paths in case the filename extension changed path = self._build_paths(pathfmt, self.mapping[ext]) os.makedirs(path, exist_ok=True) @staticmethod def _build_paths(pathfmt, extra): path = pathfmt.realdirectory + extra pathfmt.realpath = path + os.sep + pathfmt.filename pathfmt.path = pathfmt.directory + extra + os.sep + pathfmt.filename return path __postprocessor__ = ClassifyPP ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1618779452.0 gallery_dl-1.21.1/gallery_dl/postprocessor/common.py0000644000175000017500000000111114037116474021314 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2018-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Common classes and constants used by postprocessor modules.""" class PostProcessor(): """Base class for postprocessors""" def __init__(self, job): name = self.__class__.__name__[:-2].lower() self.log = job.get_logger("postprocessor." + name) def __repr__(self): return self.__class__.__name__ ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/postprocessor/compare.py0000644000175000017500000000532014176336637021471 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Compare versions of the same file and replace/enumerate them on mismatch""" from .common import PostProcessor from .. 
import text, util, exception import sys import os class ComparePP(PostProcessor): def __init__(self, job, options): PostProcessor.__init__(self, job) if options.get("shallow"): self._compare = self._compare_size self._equal_exc = self._equal_cnt = 0 equal = options.get("equal") if equal: equal, _, emax = equal.partition(":") self._equal_max = text.parse_int(emax) if equal == "abort": self._equal_exc = exception.StopExtraction elif equal == "terminate": self._equal_exc = exception.TerminateExtraction elif equal == "exit": self._equal_exc = sys.exit job.register_hooks({"file": ( self.enumerate if options.get("action") == "enumerate" else self.replace )}, options) def replace(self, pathfmt): try: if self._compare(pathfmt.realpath, pathfmt.temppath): return self._equal(pathfmt) except OSError: pass self._equal_cnt = 0 def enumerate(self, pathfmt): num = 1 try: while not self._compare(pathfmt.realpath, pathfmt.temppath): pathfmt.prefix = str(num) + "." pathfmt.set_extension(pathfmt.extension, False) num += 1 return self._equal(pathfmt) except OSError: pass self._equal_cnt = 0 def _compare(self, f1, f2): return self._compare_size(f1, f2) and self._compare_content(f1, f2) @staticmethod def _compare_size(f1, f2): return os.stat(f1).st_size == os.stat(f2).st_size @staticmethod def _compare_content(f1, f2): size = 16384 with open(f1, "rb") as fp1, open(f2, "rb") as fp2: while True: buf1 = fp1.read(size) buf2 = fp2.read(size) if buf1 != buf2: return False if not buf1: return True def _equal(self, pathfmt): if self._equal_exc: self._equal_cnt += 1 if self._equal_cnt >= self._equal_max: util.remove_file(pathfmt.temppath) print() raise self._equal_exc() pathfmt.delete = True __postprocessor__ = ComparePP ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/gallery_dl/postprocessor/exec.py0000644000175000017500000000436514176336637020777 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2018-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Execute processes""" from .common import PostProcessor from .. 
import util, formatter import subprocess if util.WINDOWS: def quote(s): return '"' + s.replace('"', '\\"') + '"' else: from shlex import quote class ExecPP(PostProcessor): def __init__(self, job, options): PostProcessor.__init__(self, job) if options.get("async", False): self._exec = self._exec_async args = options["command"] if isinstance(args, str): self.args = args execute = self.exec_string else: self.args = [formatter.parse(arg) for arg in args] execute = self.exec_list events = options.get("event") if events is None: events = ("after",) elif isinstance(events, str): events = events.split(",") job.register_hooks({event: execute for event in events}, options) def exec_list(self, pathfmt, status=None): if status: return kwdict = pathfmt.kwdict kwdict["_directory"] = pathfmt.realdirectory kwdict["_filename"] = pathfmt.filename kwdict["_path"] = pathfmt.realpath args = [arg.format_map(kwdict) for arg in self.args] self._exec(args, False) def exec_string(self, pathfmt, status=None): if status: return if status is None and pathfmt.realpath: args = self.args.replace("{}", quote(pathfmt.realpath)) else: args = self.args.replace("{}", quote(pathfmt.realdirectory)) self._exec(args, True) def _exec(self, args, shell): self.log.debug("Running '%s'", args) retcode = subprocess.Popen(args, shell=shell).wait() if retcode: self.log.warning("'%s' returned with non-zero exit status (%d)", args, retcode) def _exec_async(self, args, shell): self.log.debug("Running '%s'", args) subprocess.Popen(args, shell=shell) __postprocessor__ = ExecPP ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/postprocessor/metadata.py0000644000175000017500000001241614220623232021603 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Write metadata to external files""" from .common import PostProcessor from .. 
import util, formatter import os class MetadataPP(PostProcessor): def __init__(self, job, options): PostProcessor.__init__(self, job) mode = options.get("mode", "json") if mode == "custom": self.write = self._write_custom cfmt = options.get("content-format") or options.get("format") if isinstance(cfmt, list): cfmt = "\n".join(cfmt) + "\n" self._content_fmt = formatter.parse(cfmt).format_map ext = "txt" elif mode == "tags": self.write = self._write_tags ext = "txt" else: self.write = self._write_json self.indent = options.get("indent", 4) self.ascii = options.get("ascii", False) ext = "json" directory = options.get("directory") if directory: self._directory = self._directory_custom sep = os.sep + (os.altsep or "") self._metadir = util.expand_path(directory).rstrip(sep) + os.sep filename = options.get("filename") extfmt = options.get("extension-format") if filename: self._filename = self._filename_custom self._filename_fmt = formatter.parse(filename).format_map elif extfmt: self._filename = self._filename_extfmt self._extension_fmt = formatter.parse(extfmt).format_map else: self.extension = options.get("extension", ext) events = options.get("event") if events is None: events = ("file",) elif isinstance(events, str): events = events.split(",") job.register_hooks({event: self.run for event in events}, options) archive = options.get("archive") if archive: extr = job.extractor archive = util.expand_path(archive) archive_format = ( options.get("archive-prefix", extr.category) + options.get("archive-format", "_MD_" + extr.archive_fmt)) try: if "{" in archive: archive = formatter.parse(archive).format_map( job.pathfmt.kwdict) self.archive = util.DownloadArchive( archive, archive_format, "_archive_metadata") except Exception as exc: self.log.warning( "Failed to open download archive at '%s' ('%s: %s')", archive, exc.__class__.__name__, exc) else: self.log.debug("Using download archive '%s'", archive) else: self.archive = None self.mtime = options.get("mtime") def run(self, pathfmt): archive = self.archive if archive and archive.check(pathfmt.kwdict): return directory = self._directory(pathfmt) path = directory + self._filename(pathfmt) try: with open(path, "w", encoding="utf-8") as fp: self.write(fp, pathfmt.kwdict) except FileNotFoundError: os.makedirs(directory, exist_ok=True) with open(path, "w", encoding="utf-8") as fp: self.write(fp, pathfmt.kwdict) if archive: archive.add(pathfmt.kwdict) if self.mtime: mtime = pathfmt.kwdict.get("_mtime") if mtime: util.set_mtime(path, mtime) def _directory(self, pathfmt): return pathfmt.realdirectory def _directory_custom(self, pathfmt): return os.path.join(pathfmt.realdirectory, self._metadir) def _filename(self, pathfmt): return (pathfmt.filename or "metadata") + "." 
+ self.extension def _filename_custom(self, pathfmt): return pathfmt.clean_path(pathfmt.clean_segment( self._filename_fmt(pathfmt.kwdict))) def _filename_extfmt(self, pathfmt): kwdict = pathfmt.kwdict ext = kwdict.get("extension") kwdict["extension"] = pathfmt.extension kwdict["extension"] = pathfmt.prefix + self._extension_fmt(kwdict) filename = pathfmt.build_filename(kwdict) kwdict["extension"] = ext return filename def _write_custom(self, fp, kwdict): fp.write(self._content_fmt(kwdict)) def _write_tags(self, fp, kwdict): tags = kwdict.get("tags") or kwdict.get("tag_string") if not tags: return if isinstance(tags, str): taglist = tags.split(", ") if len(taglist) < len(tags) / 16: taglist = tags.split(" ") tags = taglist elif isinstance(tags, dict): taglists = tags.values() tags = [] extend = tags.extend for taglist in taglists: extend(taglist) tags.sort() fp.write("\n".join(tags) + "\n") def _write_json(self, fp, kwdict): util.dump_json(util.filter_dict(kwdict), fp, self.ascii, self.indent) __postprocessor__ = MetadataPP ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/postprocessor/mtime.py0000644000175000017500000000206014220623232021130 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Use metadata as file modification time""" from .common import PostProcessor from .. import text, util from datetime import datetime class MtimePP(PostProcessor): def __init__(self, job, options): PostProcessor.__init__(self, job) self.key = options.get("key", "date") events = options.get("event") if events is None: events = ("file",) elif isinstance(events, str): events = events.split(",") job.register_hooks({event: self.run for event in events}, options) def run(self, pathfmt): mtime = pathfmt.kwdict.get(self.key) pathfmt.kwdict["_mtime"] = ( util.datetime_to_timestamp(mtime) if isinstance(mtime, datetime) else text.parse_int(mtime) ) __postprocessor__ = MtimePP ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/postprocessor/ugoira.py0000644000175000017500000002242414220623232021311 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2018-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Convert Pixiv Ugoira to WebM""" from .common import PostProcessor from .. 
import util import subprocess import tempfile import zipfile import shutil import os try: from math import gcd except ImportError: def gcd(a, b): while b: a, b = b, a % b return a class UgoiraPP(PostProcessor): def __init__(self, job, options): PostProcessor.__init__(self, job) self.extension = options.get("extension") or "webm" self.args = options.get("ffmpeg-args") or () self.twopass = options.get("ffmpeg-twopass", False) self.output = options.get("ffmpeg-output", True) self.delete = not options.get("keep-files", False) self.repeat = options.get("repeat-last-frame", True) self.mtime = options.get("mtime") ffmpeg = options.get("ffmpeg-location") self.ffmpeg = util.expand_path(ffmpeg) if ffmpeg else "ffmpeg" mkvmerge = options.get("mkvmerge-location") self.mkvmerge = util.expand_path(mkvmerge) if mkvmerge else "mkvmerge" demuxer = options.get("ffmpeg-demuxer") if demuxer is None or demuxer == "auto": if self.extension in ("webm", "mkv") and ( mkvmerge or shutil.which("mkvmerge")): demuxer = "mkvmerge" else: demuxer = "concat" if util.WINDOWS else "image2" if demuxer == "mkvmerge": self._process = self._process_mkvmerge self._finalize = self._finalize_mkvmerge elif demuxer == "image2": self._process = self._process_image2 self._finalize = None else: self._process = self._process_concat self._finalize = None self.log.debug("using %s demuxer", demuxer) rate = options.get("framerate", "auto") if rate != "auto": self.calculate_framerate = lambda _: (None, rate) if options.get("libx264-prevent-odd", True): # get last video-codec argument vcodec = None for index, arg in enumerate(self.args): arg, _, stream = arg.partition(":") if arg == "-vcodec" or arg in ("-c", "-codec") and ( not stream or stream.partition(":")[0] in ("v", "V")): vcodec = self.args[index + 1] # use filter when using libx264/5 self.prevent_odd = ( vcodec in ("libx264", "libx265") or not vcodec and self.extension.lower() in ("mp4", "mkv")) else: self.prevent_odd = False job.register_hooks( {"prepare": self.prepare, "file": self.convert}, options) def prepare(self, pathfmt): self._frames = None if pathfmt.extension != "zip": return if "frames" in pathfmt.kwdict: self._frames = pathfmt.kwdict["frames"] elif "pixiv_ugoira_frame_data" in pathfmt.kwdict: self._frames = pathfmt.kwdict["pixiv_ugoira_frame_data"]["data"] else: return if self.delete: pathfmt.set_extension(self.extension) def convert(self, pathfmt): if not self._frames: return with tempfile.TemporaryDirectory() as tempdir: # extract frames try: with zipfile.ZipFile(pathfmt.temppath) as zfile: zfile.extractall(tempdir) except FileNotFoundError: pathfmt.realpath = pathfmt.temppath return # process frames and collect command-line arguments pathfmt.set_extension(self.extension) args = self._process(pathfmt, tempdir) if self.args: args += self.args # invoke ffmpeg try: if self.twopass: if "-f" not in self.args: args += ("-f", self.extension) args += ("-passlogfile", tempdir + "/ffmpeg2pass", "-pass") self._exec(args + ["1", "-y", os.devnull]) self._exec(args + ["2", pathfmt.realpath]) else: args.append(pathfmt.realpath) self._exec(args) if self._finalize: self._finalize(pathfmt, tempdir) except OSError as exc: print() self.log.error("Unable to invoke FFmpeg (%s: %s)", exc.__class__.__name__, exc) pathfmt.realpath = pathfmt.temppath else: if self.mtime: mtime = pathfmt.kwdict.get("_mtime") if mtime: util.set_mtime(pathfmt.realpath, mtime) if self.delete: pathfmt.delete = True else: pathfmt.set_extension("zip") def _exec(self, args): self.log.debug(args) out = None if 
self.output else subprocess.DEVNULL return subprocess.Popen(args, stdout=out, stderr=out).wait() def _process_concat(self, pathfmt, tempdir): rate_in, rate_out = self.calculate_framerate(self._frames) args = [self.ffmpeg, "-f", "concat"] if rate_in: args += ("-r", str(rate_in)) args += ("-i", self._write_ffmpeg_concat(tempdir)) if rate_out: args += ("-r", str(rate_out)) return args def _process_image2(self, pathfmt, tempdir): tempdir += "/" frames = self._frames # add extra frame if necessary if self.repeat and not self._delay_is_uniform(frames): last = frames[-1] delay_gcd = self._delay_gcd(frames) if last["delay"] - delay_gcd > 0: last["delay"] -= delay_gcd self.log.debug("non-uniform delays; inserting extra frame") last_copy = last.copy() frames.append(last_copy) name, _, ext = last_copy["file"].rpartition(".") last_copy["file"] = "{:>06}.{}".format(int(name)+1, ext) shutil.copyfile(tempdir + last["file"], tempdir + last_copy["file"]) # adjust frame mtime values ts = 0 for frame in frames: os.utime(tempdir + frame["file"], ns=(ts, ts)) ts += frame["delay"] * 1000000 return [ self.ffmpeg, "-f", "image2", "-ts_from_file", "2", "-pattern_type", "sequence", "-i", "{}%06d.{}".format( tempdir.replace("%", "%%"), frame["file"].rpartition(".")[2] ), ] def _process_mkvmerge(self, pathfmt, tempdir): self._realpath = pathfmt.realpath pathfmt.realpath = tempdir + "/temp." + self.extension return [ self.ffmpeg, "-f", "image2", "-pattern_type", "sequence", "-i", "{}/%06d.{}".format( tempdir.replace("%", "%%"), self._frames[0]["file"].rpartition(".")[2] ), ] def _finalize_mkvmerge(self, pathfmt, tempdir): args = [ self.mkvmerge, "-o", self._realpath, "--timecodes", "0:" + self._write_mkvmerge_timecodes(tempdir), ] if self.extension == "webm": args.append("--webm") args += ("=", pathfmt.realpath) pathfmt.realpath = self._realpath self._exec(args) def _write_ffmpeg_concat(self, tempdir): content = ["ffconcat version 1.0"] append = content.append for frame in self._frames: append("file '{}'\nduration {}".format( frame["file"], frame["delay"] / 1000)) if self.repeat: append("file '{}'".format(frame["file"])) append("") ffconcat = tempdir + "/ffconcat.txt" with open(ffconcat, "w") as file: file.write("\n".join(content)) return ffconcat def _write_mkvmerge_timecodes(self, tempdir): content = ["# timecode format v2"] append = content.append delay_sum = 0 for frame in self._frames: append(str(delay_sum)) delay_sum += frame["delay"] append(str(delay_sum)) append("") timecodes = tempdir + "/timecodes.tc" with open(timecodes, "w") as file: file.write("\n".join(content)) return timecodes def calculate_framerate(self, frames): uniform = self._delay_is_uniform(frames) if uniform: return ("1000/{}".format(frames[0]["delay"]), None) return (None, "1000/{}".format(self._delay_gcd(frames))) @staticmethod def _delay_gcd(frames): result = frames[0]["delay"] for f in frames: result = gcd(result, f["delay"]) return result @staticmethod def _delay_is_uniform(frames): delay = frames[0]["delay"] for f in frames: if f["delay"] != delay: return False return True __postprocessor__ = UgoiraPP ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648649870.0 gallery_dl-1.21.1/gallery_dl/postprocessor/zip.py0000644000175000017500000000460514221063216020627 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2018-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software 
Foundation. """Store files in ZIP archives""" from .common import PostProcessor from .. import util import zipfile class ZipPP(PostProcessor): COMPRESSION_ALGORITHMS = { "store": zipfile.ZIP_STORED, "zip" : zipfile.ZIP_DEFLATED, "bzip2": zipfile.ZIP_BZIP2, "lzma" : zipfile.ZIP_LZMA, } def __init__(self, job, options): PostProcessor.__init__(self, job) self.delete = not options.get("keep-files", False) ext = "." + options.get("extension", "zip") algorithm = options.get("compression", "store") if algorithm not in self.COMPRESSION_ALGORITHMS: self.log.warning( "unknown compression algorithm '%s'; falling back to 'store'", algorithm) algorithm = "store" self.zfile = None self.path = job.pathfmt.realdirectory self.args = (self.path[:-1] + ext, "a", self.COMPRESSION_ALGORITHMS[algorithm], True) job.register_hooks({ "file": self.write_safe if options.get("mode") == "safe" else self.write, }, options) job.hooks["finalize"].append(self.finalize) def write(self, pathfmt, zfile=None): # 'NameToInfo' is not officially documented, but it's available # for all supported Python versions and using it directly is a lot # faster than calling getinfo() if zfile is None: if self.zfile is None: self.zfile = zipfile.ZipFile(*self.args) zfile = self.zfile if pathfmt.filename not in zfile.NameToInfo: zfile.write(pathfmt.temppath, pathfmt.filename) pathfmt.delete = self.delete def write_safe(self, pathfmt): with zipfile.ZipFile(*self.args) as zfile: self.write(pathfmt, zfile) def finalize(self, pathfmt, status): if self.zfile: self.zfile.close() if self.delete: util.remove_directory(self.path) if self.zfile and not self.zfile.NameToInfo: # remove empty zip archive util.remove_file(self.zfile.filename) __postprocessor__ = ZipPP ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/text.py0000644000175000017500000001615114220623232016062 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
"""Collection of functions that work on strings/text""" import re import html import datetime import urllib.parse HTML_RE = re.compile("<[^>]+>") def remove_html(txt, repl=" ", sep=" "): """Remove html-tags from a string""" try: txt = HTML_RE.sub(repl, txt) except TypeError: return "" if sep: return sep.join(txt.split()) return txt.strip() def split_html(txt): """Split input string by HTML tags""" try: return [ unescape(x).strip() for x in HTML_RE.split(txt) if x and not x.isspace() ] except TypeError: return [] def ensure_http_scheme(url, scheme="https://"): """Prepend 'scheme' to 'url' if it doesn't have one""" if url and not url.startswith(("https://", "http://")): return scheme + url.lstrip("/:") return url def root_from_url(url, scheme="https://"): """Extract scheme and domain from a URL""" if not url.startswith(("https://", "http://")): return scheme + url[:url.index("/")] return url[:url.index("/", 8)] def filename_from_url(url): """Extract the last part of an URL to use as a filename""" try: return url.partition("?")[0].rpartition("/")[2] except (TypeError, AttributeError): return "" def ext_from_url(url): """Extract the filename extension of an URL""" name, _, ext = filename_from_url(url).rpartition(".") return ext.lower() if name else "" def nameext_from_url(url, data=None): """Extract the last part of an URL and fill 'data' accordingly""" if data is None: data = {} filename = unquote(filename_from_url(url)) name, _, ext = filename.rpartition(".") if name and len(ext) <= 16: data["filename"], data["extension"] = name, ext.lower() else: data["filename"], data["extension"] = filename, "" return data def extract(txt, begin, end, pos=0): """Extract the text between 'begin' and 'end' from 'txt' Args: txt: String to search in begin: First string to be searched for end: Second string to be searched for after 'begin' pos: Starting position for searches in 'txt' Returns: The string between the two search-strings 'begin' and 'end' beginning with position 'pos' in 'txt' as well as the position after 'end'. 
If at least one of 'begin' or 'end' is not found, None and the original value of 'pos' is returned Examples: extract("abcde", "b", "d") -> "c" , 4 extract("abcde", "b", "d", 3) -> None, 3 """ try: first = txt.index(begin, pos) + len(begin) last = txt.index(end, first) return txt[first:last], last+len(end) except (ValueError, TypeError, AttributeError): return None, pos def rextract(txt, begin, end, pos=-1): try: lbeg = len(begin) first = txt.rindex(begin, 0, pos) last = txt.index(end, first + lbeg) return txt[first + lbeg:last], first except (ValueError, TypeError, AttributeError): return None, pos def extract_all(txt, rules, pos=0, values=None): """Calls extract for each rule and returns the result in a dict""" if values is None: values = {} for key, begin, end in rules: result, pos = extract(txt, begin, end, pos) if key: values[key] = result return values, pos def extract_iter(txt, begin, end, pos=0): """Yield values that would be returned by repeated calls of extract()""" index = txt.index lbeg = len(begin) lend = len(end) try: while True: first = index(begin, pos) + lbeg last = index(end, first) pos = last + lend yield txt[first:last] except (ValueError, TypeError, AttributeError): return def extract_from(txt, pos=0, default=""): """Returns a function object that extracts from 'txt'""" def extr(begin, end, index=txt.index, txt=txt): nonlocal pos try: first = index(begin, pos) + len(begin) last = index(end, first) pos = last + len(end) return txt[first:last] except (ValueError, TypeError, AttributeError): return default return extr def parse_unicode_escapes(txt): """Convert JSON Unicode escapes in 'txt' into actual characters""" if "\\u" in txt: return re.sub(r"\\u([0-9a-fA-F]{4})", _hex_to_char, txt) return txt def _hex_to_char(match): return chr(int(match.group(1), 16)) def parse_bytes(value, default=0, suffixes="bkmgtp"): """Convert a bytes-amount ("500k", "2.5M", ...) 
to int""" try: last = value[-1].lower() except (TypeError, LookupError): return default if last in suffixes: mul = 1024 ** suffixes.index(last) value = value[:-1] else: mul = 1 try: return round(float(value) * mul) except ValueError: return default def parse_int(value, default=0): """Convert 'value' to int""" if not value: return default try: return int(value) except (ValueError, TypeError): return default def parse_float(value, default=0.0): """Convert 'value' to float""" if not value: return default try: return float(value) except (ValueError, TypeError): return default def parse_query(qs): """Parse a query string into key-value pairs""" result = {} try: for key, value in urllib.parse.parse_qsl(qs): if key not in result: result[key] = value except AttributeError: pass return result def parse_timestamp(ts, default=None): """Create a datetime object from a unix timestamp""" try: return datetime.datetime.utcfromtimestamp(int(ts)) except (TypeError, ValueError, OverflowError): return default def parse_datetime(date_string, format="%Y-%m-%dT%H:%M:%S%z", utcoffset=0): """Create a datetime object by parsing 'date_string'""" try: if format.endswith("%z") and date_string[-3] == ":": # workaround for Python < 3.7: +00:00 -> +0000 ds = date_string[:-3] + date_string[-2:] else: ds = date_string d = datetime.datetime.strptime(ds, format) o = d.utcoffset() if o is not None: # convert to naive UTC d = d.replace(tzinfo=None, microsecond=0) - o else: if d.microsecond: d = d.replace(microsecond=0) if utcoffset: # apply manual UTC offset d += datetime.timedelta(0, utcoffset * -3600) return d except (TypeError, IndexError, KeyError): return None except (ValueError, OverflowError): return date_string urljoin = urllib.parse.urljoin quote = urllib.parse.quote unquote = urllib.parse.unquote escape = html.escape unescape = html.unescape ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1649421351.0 gallery_dl-1.21.1/gallery_dl/util.py0000644000175000017500000004555314224026047016070 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Utility functions and classes""" import re import os import sys import json import time import random import sqlite3 import binascii import datetime import functools import itertools import urllib.parse from http.cookiejar import Cookie from email.utils import mktime_tz, parsedate_tz from . 
import text, exception def bencode(num, alphabet="0123456789"): """Encode an integer into a base-N encoded string""" data = "" base = len(alphabet) while num: num, remainder = divmod(num, base) data = alphabet[remainder] + data return data def bdecode(data, alphabet="0123456789"): """Decode a base-N encoded string ( N = len(alphabet) )""" num = 0 base = len(alphabet) for c in data: num *= base num += alphabet.index(c) return num def advance(iterable, num): """"Advance 'iterable' by 'num' steps""" iterator = iter(iterable) next(itertools.islice(iterator, num, num), None) return iterator def unique(iterable): """Yield unique elements from 'iterable' while preserving order""" seen = set() add = seen.add for element in iterable: if element not in seen: add(element) yield element def unique_sequence(iterable): """Yield sequentially unique elements from 'iterable'""" last = None for element in iterable: if element != last: last = element yield element def contains(values, elements, separator=" "): """Returns True if at least one of 'elements' is contained in 'values'""" if isinstance(values, str): values = values.split(separator) if not isinstance(elements, (tuple, list)): return elements in values for e in elements: if e in values: return True return False def raises(cls): """Returns a function that raises 'cls' as exception""" def wrap(*args): raise cls(*args) return wrap def identity(x): """Returns its argument""" return x def true(_): """Always returns True""" return True def false(_): """Always returns False""" return False def noop(): """Does nothing""" def generate_token(size=16): """Generate a random token with hexadecimal digits""" data = random.getrandbits(size * 8).to_bytes(size, "big") return binascii.hexlify(data).decode() def format_value(value, suffixes="kMGTPEZY"): value = format(value) value_len = len(value) index = value_len - 4 if index >= 0: offset = (value_len - 1) % 3 + 1 return (value[:offset] + "." 
+ value[offset:offset+2] + suffixes[index // 3]) return value def combine_dict(a, b): """Recursively combine the contents of 'b' into 'a'""" for key, value in b.items(): if key in a and isinstance(value, dict) and isinstance(a[key], dict): combine_dict(a[key], value) else: a[key] = value return a def transform_dict(a, func): """Recursively apply 'func' to all values in 'a'""" for key, value in a.items(): if isinstance(value, dict): transform_dict(value, func) else: a[key] = func(value) def filter_dict(a): """Return a copy of 'a' without "private" entries""" return {k: v for k, v in a.items() if k[0] != "_"} def delete_items(obj, keys): """Remove all 'keys' from 'obj'""" for key in keys: if key in obj: del obj[key] def enumerate_reversed(iterable, start=0, length=None): """Enumerate 'iterable' and return its elements in reverse order""" start -= 1 if length is None: length = len(iterable) return zip( range(length - start, start, -1), reversed(iterable), ) def number_to_string(value, numbers=(int, float)): """Convert numbers (int, float) to string; Return everything else as is.""" return str(value) if value.__class__ in numbers else value def to_string(value): """str() with "better" defaults""" if not value: return "" if value.__class__ is list: try: return ", ".join(value) except Exception: return ", ".join(map(str, value)) return str(value) def datetime_to_timestamp(dt): """Convert naive UTC datetime to timestamp""" return (dt - EPOCH) / SECOND def datetime_to_timestamp_string(dt): """Convert naive UTC datetime to timestamp string""" try: return str((dt - EPOCH) // SECOND) except Exception: return "" def dump_json(obj, fp=sys.stdout, ensure_ascii=True, indent=4): """Serialize 'obj' as JSON and write it to 'fp'""" json.dump( obj, fp, ensure_ascii=ensure_ascii, indent=indent, default=str, sort_keys=True, ) fp.write("\n") def dump_response(response, fp, *, headers=False, content=True, hide_auth=True): """Write the contents of 'response' into a file-like object""" if headers: request = response.request req_headers = request.headers.copy() res_headers = response.headers.copy() outfmt = """\ {request.method} {request.url} Status: {response.status_code} {response.reason} Request Headers --------------- {request_headers} Response Headers ---------------- {response_headers} """ if hide_auth: authorization = req_headers.get("Authorization") if authorization: atype, sep, _ = authorization.partition(" ") req_headers["Authorization"] = atype + " ***" if sep else "***" cookie = req_headers.get("Cookie") if cookie: req_headers["Cookie"] = ";".join( c.partition("=")[0] + "=***" for c in cookie.split(";") ) set_cookie = res_headers.get("Set-Cookie") if set_cookie: res_headers["Set-Cookie"] = re.sub( r"(^|, )([^ =]+)=[^,;]*", r"\1\2=***", set_cookie, ) fp.write(outfmt.format( request=request, response=response, request_headers="\n".join( name + ": " + value for name, value in req_headers.items() ), response_headers="\n".join( name + ": " + value for name, value in res_headers.items() ), ).encode()) if content: if headers: fp.write(b"\nContent\n-------\n") fp.write(response.content) def expand_path(path): """Expand environment variables and tildes (~)""" if not path: return path if not isinstance(path, str): path = os.path.join(*path) return os.path.expandvars(os.path.expanduser(path)) def remove_file(path): try: os.unlink(path) except OSError: pass def remove_directory(path): try: os.rmdir(path) except OSError: pass def set_mtime(path, mtime): try: if isinstance(mtime, str): mtime = 
mktime_tz(parsedate_tz(mtime)) os.utime(path, (time.time(), mtime)) except Exception: pass def load_cookiestxt(fp): """Parse a Netscape cookies.txt file and return a list of its Cookies""" cookies = [] for line in fp: line = line.lstrip(" ") # strip '#HttpOnly_' if line.startswith("#HttpOnly_"): line = line[10:] # ignore empty lines and comments if not line or line[0] in ("#", "$", "\n"): continue # strip trailing '\n' if line[-1] == "\n": line = line[:-1] domain, domain_specified, path, secure, expires, name, value = \ line.split("\t") if not name: name = value value = None cookies.append(Cookie( 0, name, value, None, False, domain, domain_specified == "TRUE", domain.startswith("."), path, False, secure == "TRUE", None if expires == "0" or not expires else expires, False, None, None, {}, )) return cookies def save_cookiestxt(fp, cookies): """Write 'cookies' in Netscape cookies.txt format to 'fp'""" fp.write("# Netscape HTTP Cookie File\n\n") for cookie in cookies: if not cookie.domain: continue if cookie.value is None: name = "" value = cookie.name else: name = cookie.name value = cookie.value fp.write("\t".join(( cookie.domain, "TRUE" if cookie.domain.startswith(".") else "FALSE", cookie.path, "TRUE" if cookie.secure else "FALSE", "0" if cookie.expires is None else str(cookie.expires), name, value, )) + "\n") def code_to_language(code, default=None): """Map an ISO 639-1 language code to its actual name""" return CODES.get((code or "").lower(), default) def language_to_code(lang, default=None): """Map a language name to its ISO 639-1 code""" if lang is None: return default lang = lang.capitalize() for code, language in CODES.items(): if language == lang: return code return default CODES = { "ar": "Arabic", "bg": "Bulgarian", "ca": "Catalan", "cs": "Czech", "da": "Danish", "de": "German", "el": "Greek", "en": "English", "es": "Spanish", "fi": "Finnish", "fr": "French", "he": "Hebrew", "hu": "Hungarian", "id": "Indonesian", "it": "Italian", "ja": "Japanese", "ko": "Korean", "ms": "Malay", "nl": "Dutch", "no": "Norwegian", "pl": "Polish", "pt": "Portuguese", "ro": "Romanian", "ru": "Russian", "sv": "Swedish", "th": "Thai", "tr": "Turkish", "vi": "Vietnamese", "zh": "Chinese", } class UniversalNone(): """None-style object that supports more operations than None itself""" __slots__ = () def __getattribute__(self, _): return self def __getitem__(self, _): return self @staticmethod def __bool__(): return False @staticmethod def __str__(): return "None" __repr__ = __str__ NONE = UniversalNone() EPOCH = datetime.datetime(1970, 1, 1) SECOND = datetime.timedelta(0, 1) WINDOWS = (os.name == "nt") SENTINEL = object() SPECIAL_EXTRACTORS = {"oauth", "recursive", "test"} GLOBALS = { "contains" : contains, "parse_int": text.parse_int, "urlsplit" : urllib.parse.urlsplit, "datetime" : datetime.datetime, "timedelta": datetime.timedelta, "abort" : raises(exception.StopExtraction), "terminate": raises(exception.TerminateExtraction), "re" : re, } def compile_expression(expr, name="", globals=GLOBALS): code_object = compile(expr, name, "eval") return functools.partial(eval, code_object, globals) def build_duration_func(duration, min=0.0): if not duration: return None if isinstance(duration, str): lower, _, upper = duration.partition("-") lower = float(lower) else: try: lower, upper = duration except TypeError: lower, upper = duration, None if upper: upper = float(upper) return functools.partial( random.uniform, lower if lower > min else min, upper if upper > min else min, ) else: if lower < min: lower = min 
return lambda: lower def build_extractor_filter(categories, negate=True, special=None): """Build a function that takes an Extractor class as argument and returns True if that class is allowed by 'categories' """ if isinstance(categories, str): categories = categories.split(",") catset = set() # set of categories / basecategories subset = set() # set of subcategories catsub = [] # list of category-subcategory pairs for item in categories: category, _, subcategory = item.partition(":") if category and category != "*": if subcategory and subcategory != "*": catsub.append((category, subcategory)) else: catset.add(category) elif subcategory and subcategory != "*": subset.add(subcategory) if special: catset |= special elif not catset and not subset and not catsub: return true if negate else false tests = [] if negate: if catset: tests.append(lambda extr: extr.category not in catset and extr.basecategory not in catset) if subset: tests.append(lambda extr: extr.subcategory not in subset) else: if catset: tests.append(lambda extr: extr.category in catset or extr.basecategory in catset) if subset: tests.append(lambda extr: extr.subcategory in subset) if catsub: def test(extr): for category, subcategory in catsub: if category in (extr.category, extr.basecategory) and \ subcategory == extr.subcategory: return not negate return negate tests.append(test) if len(tests) == 1: return tests[0] if negate: return lambda extr: all(t(extr) for t in tests) else: return lambda extr: any(t(extr) for t in tests) def build_proxy_map(proxies, log=None): """Generate a proxy map""" if not proxies: return None if isinstance(proxies, str): if "://" not in proxies: proxies = "http://" + proxies.lstrip("/") return {"http": proxies, "https": proxies} if isinstance(proxies, dict): for scheme, proxy in proxies.items(): if "://" not in proxy: proxies[scheme] = "http://" + proxy.lstrip("/") return proxies if log: log.warning("invalid proxy specifier: %s", proxies) def build_predicate(predicates): if not predicates: return lambda url, kwdict: True elif len(predicates) == 1: return predicates[0] return functools.partial(chain_predicates, predicates) def chain_predicates(predicates, url, kwdict): for pred in predicates: if not pred(url, kwdict): return False return True class RangePredicate(): """Predicate; True if the current index is in the given range""" def __init__(self, rangespec): self.ranges = self.optimize_range(self.parse_range(rangespec)) self.index = 0 if self.ranges: self.lower, self.upper = self.ranges[0][0], self.ranges[-1][1] else: self.lower, self.upper = 0, 0 def __call__(self, url, _): self.index += 1 if self.index > self.upper: raise exception.StopExtraction() for lower, upper in self.ranges: if lower <= self.index <= upper: return True return False @staticmethod def parse_range(rangespec): """Parse an integer range string and return the resulting ranges Examples: parse_range("-2,4,6-8,10-") -> [(1,2), (4,4), (6,8), (10,INTMAX)] parse_range(" - 3 , 4- 4, 2-6") -> [(1,3), (4,4), (2,6)] """ ranges = [] for group in rangespec.split(","): if not group: continue first, sep, last = group.partition("-") if not sep: beg = end = int(first) else: beg = int(first) if first.strip() else 1 end = int(last) if last.strip() else sys.maxsize ranges.append((beg, end) if beg <= end else (end, beg)) return ranges @staticmethod def optimize_range(ranges): """Simplify/Combine a parsed list of ranges Examples: optimize_range([(2,4), (4,6), (5,8)]) -> [(2,8)] optimize_range([(1,1), (2,2), (3,6), (8,9))]) -> [(1,6), (8,9)] """ if 
len(ranges) <= 1: return ranges ranges.sort() riter = iter(ranges) result = [] beg, end = next(riter) for lower, upper in riter: if lower > end+1: result.append((beg, end)) beg, end = lower, upper elif upper > end: end = upper result.append((beg, end)) return result class UniquePredicate(): """Predicate; True if given URL has not been encountered before""" def __init__(self): self.urls = set() def __call__(self, url, _): if url.startswith("text:"): return True if url not in self.urls: self.urls.add(url) return True return False class FilterPredicate(): """Predicate; True if evaluating the given expression returns True""" def __init__(self, expr, target="image"): name = "<{} filter>".format(target) self.expr = compile_expression(expr, name) def __call__(self, _, kwdict): try: return self.expr(kwdict) except exception.GalleryDLException: raise except Exception as exc: raise exception.FilterError(exc) class ExtendedUrl(): """URL with attached config key-value pairs""" def __init__(self, url, gconf, lconf): self.value, self.gconfig, self.lconfig = url, gconf, lconf def __str__(self): return self.value class DownloadArchive(): def __init__(self, path, format_string, cache_key="_archive_key"): con = sqlite3.connect(path, timeout=60, check_same_thread=False) con.isolation_level = None self.close = con.close self.cursor = con.cursor() self.keygen = format_string.format_map self._cache_key = cache_key try: self.cursor.execute("CREATE TABLE IF NOT EXISTS archive " "(entry PRIMARY KEY) WITHOUT ROWID") except sqlite3.OperationalError: # fallback for missing WITHOUT ROWID support (#553) self.cursor.execute("CREATE TABLE IF NOT EXISTS archive " "(entry PRIMARY KEY)") def check(self, kwdict): """Return True if the item described by 'kwdict' exists in archive""" key = kwdict[self._cache_key] = self.keygen(kwdict) self.cursor.execute( "SELECT 1 FROM archive WHERE entry=? LIMIT 1", (key,)) return self.cursor.fetchone() def add(self, kwdict): """Add item described by 'kwdict' to archive""" key = kwdict.get(self._cache_key) or self.keygen(kwdict) self.cursor.execute( "INSERT OR IGNORE INTO archive VALUES (?)", (key,)) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1649443807.0 gallery_dl-1.21.1/gallery_dl/version.py0000644000175000017500000000042014224101737016560 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. __version__ = "1.21.1" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/gallery_dl/ytdl.py0000644000175000017500000005236014220623232016054 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Helpers for interacting with youtube-dl""" import re import shlex import itertools from . 
import text, util, exception def import_module(module_name): if module_name is None: try: return __import__("yt_dlp") except ImportError: return __import__("youtube_dl") return __import__(module_name.replace("-", "_")) def construct_YoutubeDL(module, obj, user_opts, system_opts=None): opts = argv = None config = obj.config cfg = config("config-file") if cfg: with open(util.expand_path(cfg)) as fp: contents = fp.read() argv = shlex.split(contents, comments=True) cmd = config("cmdline-args") if cmd: if isinstance(cmd, str): cmd = shlex.split(cmd) argv = (argv + cmd) if argv else cmd try: opts = parse_command_line(module, argv) if argv else user_opts except SystemExit: raise exception.StopExtraction("Invalid command-line option") if opts.get("format") is None: opts["format"] = config("format") if opts.get("nopart") is None: opts["nopart"] = not config("part", True) if opts.get("updatetime") is None: opts["updatetime"] = config("mtime", True) if opts.get("ratelimit") is None: opts["ratelimit"] = text.parse_bytes(config("rate"), None) if opts.get("min_filesize") is None: opts["min_filesize"] = text.parse_bytes(config("filesize-min"), None) if opts.get("max_filesize") is None: opts["max_filesize"] = text.parse_bytes(config("filesize-max"), None) raw_opts = config("raw-options") if raw_opts: opts.update(raw_opts) if config("logging", True): opts["logger"] = obj.log if system_opts: opts.update(system_opts) return module.YoutubeDL(opts) def parse_command_line(module, argv): parser, opts, args = module.parseOpts(argv) ytdlp = (module.__name__ == "yt_dlp") std_headers = module.std_headers parse_bytes = module.FileDownloader.parse_bytes # HTTP headers if opts.user_agent is not None: std_headers["User-Agent"] = opts.user_agent if opts.referer is not None: std_headers["Referer"] = opts.referer if opts.headers: if isinstance(opts.headers, dict): std_headers.update(opts.headers) else: for h in opts.headers: key, _, value = h.partition(":") std_headers[key] = value if opts.ratelimit is not None: opts.ratelimit = parse_bytes(opts.ratelimit) if getattr(opts, "throttledratelimit", None) is not None: opts.throttledratelimit = parse_bytes(opts.throttledratelimit) if opts.min_filesize is not None: opts.min_filesize = parse_bytes(opts.min_filesize) if opts.max_filesize is not None: opts.max_filesize = parse_bytes(opts.max_filesize) if opts.max_sleep_interval is None: opts.max_sleep_interval = opts.sleep_interval if getattr(opts, "overwrites", None): opts.continue_dl = False if opts.retries is not None: opts.retries = parse_retries(opts.retries) if getattr(opts, "file_access_retries", None) is not None: opts.file_access_retries = parse_retries(opts.file_access_retries) if opts.fragment_retries is not None: opts.fragment_retries = parse_retries(opts.fragment_retries) if getattr(opts, "extractor_retries", None) is not None: opts.extractor_retries = parse_retries(opts.extractor_retries) if opts.buffersize is not None: opts.buffersize = parse_bytes(opts.buffersize) if opts.http_chunk_size is not None: opts.http_chunk_size = parse_bytes(opts.http_chunk_size) if opts.extractaudio: opts.audioformat = opts.audioformat.lower() if opts.audioquality: opts.audioquality = opts.audioquality.strip("kK") if opts.recodevideo is not None: opts.recodevideo = opts.recodevideo.replace(" ", "") if getattr(opts, "remuxvideo", None) is not None: opts.remuxvideo = opts.remuxvideo.replace(" ", "") if getattr(opts, "wait_for_video", None) is not None: min_wait, _, max_wait = opts.wait_for_video.partition("-") opts.wait_for_video = 
(module.parse_duration(min_wait), module.parse_duration(max_wait)) if opts.date is not None: date = module.DateRange.day(opts.date) else: date = module.DateRange(opts.dateafter, opts.datebefore) compat_opts = getattr(opts, "compat_opts", ()) def _unused_compat_opt(name): if name not in compat_opts: return False compat_opts.discard(name) compat_opts.update(["*%s" % name]) return True def set_default_compat( compat_name, opt_name, default=True, remove_compat=True): attr = getattr(opts, opt_name, None) if compat_name in compat_opts: if attr is None: setattr(opts, opt_name, not default) return True else: if remove_compat: _unused_compat_opt(compat_name) return False elif attr is None: setattr(opts, opt_name, default) return None set_default_compat("abort-on-error", "ignoreerrors", "only_download") set_default_compat("no-playlist-metafiles", "allow_playlist_files") set_default_compat("no-clean-infojson", "clean_infojson") if "format-sort" in compat_opts: opts.format_sort.extend(module.InfoExtractor.FormatSort.ytdl_default) _video_multistreams_set = set_default_compat( "multistreams", "allow_multiple_video_streams", False, remove_compat=False) _audio_multistreams_set = set_default_compat( "multistreams", "allow_multiple_audio_streams", False, remove_compat=False) if _video_multistreams_set is False and _audio_multistreams_set is False: _unused_compat_opt("multistreams") if isinstance(opts.outtmpl, dict): outtmpl = opts.outtmpl outtmpl_default = outtmpl.get("default") else: opts.outtmpl = outtmpl = outtmpl_default = "" if "filename" in compat_opts: if outtmpl_default is None: outtmpl_default = outtmpl["default"] = "%(title)s-%(id)s.%(ext)s" else: _unused_compat_opt("filename") if opts.extractaudio and not opts.keepvideo and opts.format is None: opts.format = "bestaudio/best" if ytdlp: def metadataparser_actions(f): if isinstance(f, str): yield module.MetadataFromFieldPP.to_action(f) else: REPLACE = module.MetadataParserPP.Actions.REPLACE args = f[1:] for x in f[0].split(","): action = [REPLACE, x] action += args yield action if getattr(opts, "parse_metadata", None) is None: opts.parse_metadata = [] if opts.metafromtitle is not None: opts.parse_metadata.append("title:%s" % opts.metafromtitle) opts.metafromtitle = None opts.parse_metadata = list(itertools.chain.from_iterable(map( metadataparser_actions, opts.parse_metadata))) else: opts.parse_metadata = () download_archive_fn = module.expand_path(opts.download_archive) \ if opts.download_archive is not None else opts.download_archive if getattr(opts, "getcomments", None): opts.writeinfojson = True if getattr(opts, "no_sponsorblock", None): opts.sponsorblock_mark = set() opts.sponsorblock_remove = set() else: opts.sponsorblock_mark = \ getattr(opts, "sponsorblock_mark", None) or set() opts.sponsorblock_remove = \ getattr(opts, "sponsorblock_remove", None) or set() sponsorblock_query = opts.sponsorblock_mark | opts.sponsorblock_remove opts.remove_chapters = getattr(opts, "remove_chapters", None) or () # PostProcessors postprocessors = [] if opts.metafromtitle: postprocessors.append({ "key": "MetadataFromTitle", "titleformat": opts.metafromtitle, }) if getattr(opts, "add_postprocessors", None): postprocessors += list(opts.add_postprocessors) if sponsorblock_query: postprocessors.append({ "key": "SponsorBlock", "categories": sponsorblock_query, "api": opts.sponsorblock_api, "when": "pre_process", }) if opts.parse_metadata: postprocessors.append({ "key": "MetadataParser", "actions": opts.parse_metadata, "when": "pre_process", }) if 
opts.convertsubtitles: pp = {"key": "FFmpegSubtitlesConvertor", "format": opts.convertsubtitles} if ytdlp: pp["when"] = "before_dl" postprocessors.append(pp) if getattr(opts, "convertthumbnails", None): postprocessors.append({ "key": "FFmpegThumbnailsConvertor", "format": opts.convertthumbnails, "when": "before_dl", }) if getattr(opts, "exec_before_dl_cmd", None): postprocessors.append({ "key": "Exec", "exec_cmd": opts.exec_before_dl_cmd, "when": "before_dl", }) if opts.extractaudio: postprocessors.append({ "key": "FFmpegExtractAudio", "preferredcodec": opts.audioformat, "preferredquality": opts.audioquality, "nopostoverwrites": opts.nopostoverwrites, }) if getattr(opts, "remuxvideo", None): postprocessors.append({ "key": "FFmpegVideoRemuxer", "preferedformat": opts.remuxvideo, }) if opts.recodevideo: postprocessors.append({ "key": "FFmpegVideoConvertor", "preferedformat": opts.recodevideo, }) if opts.embedsubtitles: pp = {"key": "FFmpegEmbedSubtitle"} if ytdlp: pp["already_have_subtitle"] = ( opts.writesubtitles and "no-keep-subs" not in compat_opts) postprocessors.append(pp) if not opts.writeautomaticsub and "no-keep-subs" not in compat_opts: opts.writesubtitles = True if opts.allsubtitles and not opts.writeautomaticsub: opts.writesubtitles = True remove_chapters_patterns, remove_ranges = [], [] for regex in opts.remove_chapters: if regex.startswith("*"): dur = list(map(module.parse_duration, regex[1:].split("-"))) if len(dur) == 2 and all(t is not None for t in dur): remove_ranges.append(tuple(dur)) continue remove_chapters_patterns.append(re.compile(regex)) if opts.remove_chapters or sponsorblock_query: postprocessors.append({ "key": "ModifyChapters", "remove_chapters_patterns": remove_chapters_patterns, "remove_sponsor_segments": opts.sponsorblock_remove, "remove_ranges": remove_ranges, "sponsorblock_chapter_title": opts.sponsorblock_chapter_title, "force_keyframes": opts.force_keyframes_at_cuts, }) addchapters = getattr(opts, "addchapters", None) embed_infojson = getattr(opts, "embed_infojson", None) if opts.addmetadata or addchapters or embed_infojson: pp = {"key": "FFmpegMetadata"} if ytdlp: if embed_infojson is None: embed_infojson = "if_exists" pp["add_metadata"] = opts.addmetadata pp["add_chapters"] = addchapters pp["add_infojson"] = embed_infojson postprocessors.append(pp) if getattr(opts, "sponskrub", False) is not False: postprocessors.append({ "key": "SponSkrub", "path": opts.sponskrub_path, "args": opts.sponskrub_args, "cut": opts.sponskrub_cut, "force": opts.sponskrub_force, "ignoreerror": opts.sponskrub is None, "_from_cli": True, }) if opts.embedthumbnail: already_have_thumbnail = (opts.writethumbnail or getattr(opts, "write_all_thumbnails", False)) postprocessors.append({ "key": "EmbedThumbnail", "already_have_thumbnail": already_have_thumbnail, }) if not already_have_thumbnail: opts.writethumbnail = True if isinstance(opts.outtmpl, dict): opts.outtmpl["pl_thumbnail"] = "" if getattr(opts, "split_chapters", None): postprocessors.append({ "key": "FFmpegSplitChapters", "force_keyframes": opts.force_keyframes_at_cuts, }) if opts.xattrs: postprocessors.append({"key": "XAttrMetadata"}) if opts.exec_cmd: postprocessors.append({ "key": "Exec", "exec_cmd": opts.exec_cmd, "when": "after_move", }) match_filter = ( None if opts.match_filter is None else module.match_filter_func(opts.match_filter)) return { "usenetrc": opts.usenetrc, "netrc_location": getattr(opts, "netrc_location", None), "username": opts.username, "password": opts.password, "twofactor": opts.twofactor, 
"videopassword": opts.videopassword, "ap_mso": opts.ap_mso, "ap_username": opts.ap_username, "ap_password": opts.ap_password, "quiet": opts.quiet, "no_warnings": opts.no_warnings, "forceurl": opts.geturl, "forcetitle": opts.gettitle, "forceid": opts.getid, "forcethumbnail": opts.getthumbnail, "forcedescription": opts.getdescription, "forceduration": opts.getduration, "forcefilename": opts.getfilename, "forceformat": opts.getformat, "forceprint": getattr(opts, "forceprint", None) or (), "force_write_download_archive": getattr( opts, "force_write_download_archive", None), "simulate": opts.simulate, "skip_download": opts.skip_download, "format": opts.format, "allow_unplayable_formats": getattr( opts, "allow_unplayable_formats", None), "ignore_no_formats_error": getattr( opts, "ignore_no_formats_error", None), "format_sort": getattr( opts, "format_sort", None), "format_sort_force": getattr( opts, "format_sort_force", None), "allow_multiple_video_streams": opts.allow_multiple_video_streams, "allow_multiple_audio_streams": opts.allow_multiple_audio_streams, "check_formats": getattr( opts, "check_formats", None), "listformats": opts.listformats, "listformats_table": getattr( opts, "listformats_table", None), "outtmpl": opts.outtmpl, "outtmpl_na_placeholder": opts.outtmpl_na_placeholder, "paths": getattr(opts, "paths", None), "autonumber_size": opts.autonumber_size, "autonumber_start": opts.autonumber_start, "restrictfilenames": opts.restrictfilenames, "windowsfilenames": getattr(opts, "windowsfilenames", None), "ignoreerrors": opts.ignoreerrors, "force_generic_extractor": opts.force_generic_extractor, "ratelimit": opts.ratelimit, "throttledratelimit": getattr(opts, "throttledratelimit", None), "overwrites": getattr(opts, "overwrites", None), "retries": opts.retries, "file_access_retries": getattr(opts, "file_access_retries", None), "fragment_retries": opts.fragment_retries, "extractor_retries": getattr(opts, "extractor_retries", None), "skip_unavailable_fragments": opts.skip_unavailable_fragments, "keep_fragments": opts.keep_fragments, "concurrent_fragment_downloads": getattr( opts, "concurrent_fragment_downloads", None), "buffersize": opts.buffersize, "noresizebuffer": opts.noresizebuffer, "http_chunk_size": opts.http_chunk_size, "continuedl": opts.continue_dl, "noprogress": True if opts.noprogress is None else opts.noprogress, "playliststart": opts.playliststart, "playlistend": opts.playlistend, "playlistreverse": opts.playlist_reverse, "playlistrandom": opts.playlist_random, "noplaylist": opts.noplaylist, "logtostderr": outtmpl_default == "-", "consoletitle": opts.consoletitle, "nopart": opts.nopart, "updatetime": opts.updatetime, "writedescription": opts.writedescription, "writeannotations": opts.writeannotations, "writeinfojson": opts.writeinfojson, "allow_playlist_files": opts.allow_playlist_files, "clean_infojson": opts.clean_infojson, "getcomments": getattr(opts, "getcomments", None), "writethumbnail": opts.writethumbnail is True, "write_all_thumbnails": getattr(opts, "write_all_thumbnails", None) or opts.writethumbnail == "all", "writelink": getattr(opts, "writelink", None), "writeurllink": getattr(opts, "writeurllink", None), "writewebloclink": getattr(opts, "writewebloclink", None), "writedesktoplink": getattr(opts, "writedesktoplink", None), "writesubtitles": opts.writesubtitles, "writeautomaticsub": opts.writeautomaticsub, "allsubtitles": opts.allsubtitles, "listsubtitles": opts.listsubtitles, "subtitlesformat": opts.subtitlesformat, "subtitleslangs": opts.subtitleslangs, 
"matchtitle": module.decodeOption(opts.matchtitle), "rejecttitle": module.decodeOption(opts.rejecttitle), "max_downloads": opts.max_downloads, "prefer_free_formats": opts.prefer_free_formats, "trim_file_name": getattr(opts, "trim_file_name", None), "verbose": opts.verbose, "dump_intermediate_pages": opts.dump_intermediate_pages, "write_pages": opts.write_pages, "test": opts.test, "keepvideo": opts.keepvideo, "min_filesize": opts.min_filesize, "max_filesize": opts.max_filesize, "min_views": opts.min_views, "max_views": opts.max_views, "daterange": date, "cachedir": opts.cachedir, "youtube_print_sig_code": opts.youtube_print_sig_code, "age_limit": opts.age_limit, "download_archive": download_archive_fn, "break_on_existing": getattr(opts, "break_on_existing", None), "break_on_reject": getattr(opts, "break_on_reject", None), "break_per_url": getattr(opts, "break_per_url", None), "skip_playlist_after_errors": getattr( opts, "skip_playlist_after_errors", None), "cookiefile": opts.cookiefile, "cookiesfrombrowser": getattr(opts, "cookiesfrombrowser", None), "nocheckcertificate": opts.no_check_certificate, "prefer_insecure": opts.prefer_insecure, "proxy": opts.proxy, "socket_timeout": opts.socket_timeout, "bidi_workaround": opts.bidi_workaround, "debug_printtraffic": opts.debug_printtraffic, "prefer_ffmpeg": opts.prefer_ffmpeg, "include_ads": opts.include_ads, "default_search": opts.default_search, "dynamic_mpd": getattr(opts, "dynamic_mpd", None), "extractor_args": getattr(opts, "extractor_args", None), "youtube_include_dash_manifest": getattr( opts, "youtube_include_dash_manifest", None), "youtube_include_hls_manifest": getattr( opts, "youtube_include_hls_manifest", None), "encoding": opts.encoding, "extract_flat": opts.extract_flat, "live_from_start": getattr(opts, "live_from_start", None), "wait_for_video": getattr(opts, "wait_for_video", None), "mark_watched": opts.mark_watched, "merge_output_format": opts.merge_output_format, "postprocessors": postprocessors, "fixup": opts.fixup, "source_address": opts.source_address, "sleep_interval_requests": getattr( opts, "sleep_interval_requests", None), "sleep_interval": opts.sleep_interval, "max_sleep_interval": opts.max_sleep_interval, "sleep_interval_subtitles": getattr( opts, "sleep_interval_subtitles", None), "external_downloader": opts.external_downloader, "playlist_items": opts.playlist_items, "xattr_set_filesize": opts.xattr_set_filesize, "match_filter": match_filter, "no_color": opts.no_color, "ffmpeg_location": opts.ffmpeg_location, "hls_prefer_native": opts.hls_prefer_native, "hls_use_mpegts": opts.hls_use_mpegts, "hls_split_discontinuity": getattr( opts, "hls_split_discontinuity", None), "external_downloader_args": opts.external_downloader_args, "postprocessor_args": opts.postprocessor_args, "cn_verification_proxy": opts.cn_verification_proxy, "geo_verification_proxy": opts.geo_verification_proxy, "geo_bypass": opts.geo_bypass, "geo_bypass_country": opts.geo_bypass_country, "geo_bypass_ip_block": opts.geo_bypass_ip_block, "compat_opts": compat_opts, } def parse_retries(retries, name=""): if retries in ("inf", "infinite"): return float("inf") return int(retries) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1649443808.1613946 gallery_dl-1.21.1/gallery_dl.egg-info/0000755000175000017500000000000014224101740016211 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1649443807.0 
gallery_dl-1.21.1/gallery_dl.egg-info/PKG-INFO0000644000175000017500000002540714224101737017324 0ustar00mikemikeMetadata-Version: 2.1 Name: gallery-dl Version: 1.21.1 Summary: Command-line program to download image galleries and collections from several image hosting sites Home-page: https://github.com/mikf/gallery-dl Author: Mike Fährmann Author-email: mike_faehrmann@web.de Maintainer: Mike Fährmann Maintainer-email: mike_faehrmann@web.de License: GPLv2 Download-URL: https://github.com/mikf/gallery-dl/releases/latest Keywords: image gallery downloader crawler scraper Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Console Classifier: Intended Audience :: End Users/Desktop Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2) Classifier: Operating System :: Microsoft :: Windows Classifier: Operating System :: POSIX Classifier: Operating System :: MacOS Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: 3.9 Classifier: Programming Language :: Python :: 3 :: Only Classifier: Topic :: Internet :: WWW/HTTP Classifier: Topic :: Multimedia :: Graphics Classifier: Topic :: Utilities Requires-Python: >=3.4 Provides-Extra: video License-File: LICENSE ========== gallery-dl ========== *gallery-dl* is a command-line program to download image galleries and collections from several image hosting sites (see `Supported Sites`_). It is a cross-platform tool with many configuration options and powerful `filenaming capabilities `_. |pypi| |build| |gitter| .. contents:: Dependencies ============ - Python_ 3.4+ - Requests_ Optional -------- - FFmpeg_: Pixiv Ugoira to WebM conversion - yt-dlp_ or youtube-dl_: Video downloads - PySocks_: SOCKS proxy support Installation ============ Pip --- The stable releases of *gallery-dl* are distributed on PyPI_ and can be easily installed or upgraded using pip_: .. code:: bash python3 -m pip install -U gallery-dl Installing the latest dev version directly from GitHub can be done with pip_ as well: .. code:: bash python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz Note: Windows users should use :code:`py -3` instead of :code:`python3`. It is advised to use the latest version of pip_, including the essential packages :code:`setuptools` and :code:`wheel`. To ensure these packages are up-to-date, run .. code:: bash python3 -m pip install --upgrade pip setuptools wheel Standalone Executable --------------------- Prebuilt executable files with a Python interpreter and required Python packages included are available for - `Windows `__ - `Linux `__ | Executables build from the latest commit can be found at | https://github.com/mikf/gallery-dl/actions/workflows/executables.yml Snap ---- Linux users that are using a distro that is supported by Snapd_ can install *gallery-dl* from the Snap Store: .. code:: bash snap install gallery-dl Chocolatey ---------- Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository: .. code:: powershell choco install gallery-dl Scoop ----- *gallery-dl* is also available in the Scoop_ "main" bucket for Windows users: .. 
code:: powershell scoop install gallery-dl Usage ===== To use *gallery-dl* simply call it with the URLs you wish to download images from: .. code:: bash gallery-dl [OPTION]... URL... See also :code:`gallery-dl --help`. Examples -------- Download images; in this case from danbooru via tag search for 'bonocho': .. code:: bash gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho" Get the direct URL of an image from a site supporting authentication with username & password: .. code:: bash gallery-dl -g -u "" -p "" "https://twitter.com/i/web/status/604341487988576256" Filter manga chapters by language and chapter number: .. code:: bash gallery-dl --chapter-filter "lang == 'fr' and 10 <= chapter < 20" "https://mangadex.org/title/2354/" | Search a remote resource for URLs and download images from them: | (URLs for which no extractor can be found will be silently ignored) .. code:: bash gallery-dl "r:https://pastebin.com/raw/FLwrCYsT" If a site's address is nonstandard for its extractor, you can prefix the URL with the extractor's name to force the use of a specific extractor: .. code:: bash gallery-dl "tumblr:https://sometumblrblog.example" Configuration ============= Configuration files for *gallery-dl* use a JSON-based file format. | For a (more or less) complete example with options set to their default values, see gallery-dl.conf_. | For a configuration file example with more involved settings and options, see gallery-dl-example.conf_. | A list of all available configuration options and their descriptions can be found in configuration.rst_. | *gallery-dl* searches for configuration files in the following places: Windows: * ``%APPDATA%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl.conf`` (``%USERPROFILE%`` usually refers to the user's home directory, i.e. ``C:\Users\\``) Linux, macOS, etc.: * ``/etc/gallery-dl.conf`` * ``${XDG_CONFIG_HOME}/gallery-dl/config.json`` * ``${HOME}/.config/gallery-dl/config.json`` * ``${HOME}/.gallery-dl.conf`` Values in later configuration files will override previous ones. Command line options will override all related settings in the configuration file(s), e.g. using ``--write-metadata`` will enable writing metadata using the default values for all ``postprocessors.metadata.*`` settings, overriding any specific settings in configuration files. Authentication ============== Username & Password ------------------- Some extractors require you to provide valid login credentials in the form of a username & password pair. This is necessary for ``nijie`` and optional for ``aryion``, ``danbooru``, ``e621``, ``exhentai``, ``idolcomplex``, ``imgbb``, ``inkbunny``, ``instagram``, ``mangadex``, ``mangoxo``, ``pillowfort``, ``sankaku``, ``subscribestar``, ``tapas``, ``tsumino``, and ``twitter``. You can set the necessary information in your configuration file (cf. gallery-dl.conf_) .. code:: json { "extractor": { "twitter": { "username": "", "password": "" } } } or you can provide them directly via the :code:`-u/--username` and :code:`-p/--password` or via the :code:`-o/--option` command-line options .. code:: bash gallery-dl -u -p URL gallery-dl -o username= -o password= URL Cookies ------- For sites where login with username & password is not possible due to CAPTCHA or similar, or has not been implemented yet, you can use the cookies from a browser login session and input them into *gallery-dl*. 
This can be done via the `cookies `__ option in your configuration file by specifying - | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon | (e.g. `Get cookies.txt `__ for Chrome, `Export Cookies `__ for Firefox) - | a list of name-value pairs gathered from your browser's web developer tools | (in `Chrome `__, in `Firefox `__) For example: .. code:: json { "extractor": { "instagram": { "cookies": "$HOME/path/to/cookies.txt" }, "patreon": { "cookies": { "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a" } } } } You can also specify a cookies.txt file with the :code:`--cookies` command-line option: .. code:: bash gallery-dl --cookies "$HOME/path/to/cookies.txt" URL OAuth ----- *gallery-dl* supports user authentication via OAuth_ for ``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``, and ``mastodon`` instances. This is mostly optional, but grants *gallery-dl* the ability to issue requests on your account's behalf and enables it to access resources which would otherwise be unavailable to a public user. To link your account to *gallery-dl*, start by invoking it with ``oauth:`` as an argument. For example: .. code:: bash gallery-dl oauth:flickr You will be sent to the site's authorization page and asked to grant read access to *gallery-dl*. Authorize it and you will be shown one or more "tokens", which should be added to your configuration file. To authenticate with a ``mastodon`` instance, run *gallery-dl* with ``oauth:mastodon:`` as argument. For example: .. code:: bash gallery-dl oauth:mastodon:pawoo.net gallery-dl oauth:mastodon:https://mastodon.social/ .. _gallery-dl.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl.conf .. _gallery-dl-example.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl-example.conf .. _configuration.rst: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst .. _Supported Sites: https://github.com/mikf/gallery-dl/blob/master/docs/supportedsites.md .. _Formatting: https://github.com/mikf/gallery-dl/blob/master/docs/formatting.md .. _Python: https://www.python.org/downloads/ .. _PyPI: https://pypi.org/ .. _pip: https://pip.pypa.io/en/stable/ .. _Requests: https://requests.readthedocs.io/en/master/ .. _FFmpeg: https://www.ffmpeg.org/ .. _yt-dlp: https://github.com/yt-dlp/yt-dlp .. _youtube-dl: https://ytdl-org.github.io/youtube-dl/ .. _PySocks: https://pypi.org/project/PySocks/ .. _pyOpenSSL: https://pyopenssl.org/ .. _Snapd: https://docs.snapcraft.io/installing-snapd .. _OAuth: https://en.wikipedia.org/wiki/OAuth .. _Chocolatey: https://chocolatey.org/install .. _Scoop: https://scoop.sh .. |pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg :target: https://pypi.org/project/gallery-dl/ .. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg :target: https://github.com/mikf/gallery-dl/actions .. 
|gitter| image:: https://badges.gitter.im/gallery-dl/main.svg :target: https://gitter.im/gallery-dl/main ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1649443807.0 gallery_dl-1.21.1/gallery_dl.egg-info/SOURCES.txt0000644000175000017500000001474314224101737020114 0ustar00mikemikeCHANGELOG.md LICENSE MANIFEST.in README.rst setup.cfg setup.py data/completion/_gallery-dl data/completion/gallery-dl data/completion/gallery-dl.fish data/man/gallery-dl.1 data/man/gallery-dl.conf.5 docs/gallery-dl-example.conf docs/gallery-dl.conf gallery_dl/__init__.py gallery_dl/__main__.py gallery_dl/cache.py gallery_dl/config.py gallery_dl/exception.py gallery_dl/formatter.py gallery_dl/job.py gallery_dl/oauth.py gallery_dl/option.py gallery_dl/output.py gallery_dl/path.py gallery_dl/text.py gallery_dl/util.py gallery_dl/version.py gallery_dl/ytdl.py gallery_dl.egg-info/PKG-INFO gallery_dl.egg-info/SOURCES.txt gallery_dl.egg-info/dependency_links.txt gallery_dl.egg-info/entry_points.txt gallery_dl.egg-info/requires.txt gallery_dl.egg-info/top_level.txt gallery_dl/downloader/__init__.py gallery_dl/downloader/common.py gallery_dl/downloader/http.py gallery_dl/downloader/text.py gallery_dl/downloader/ytdl.py gallery_dl/extractor/2chan.py gallery_dl/extractor/35photo.py gallery_dl/extractor/3dbooru.py gallery_dl/extractor/420chan.py gallery_dl/extractor/4chan.py gallery_dl/extractor/500px.py gallery_dl/extractor/8kun.py gallery_dl/extractor/8muses.py gallery_dl/extractor/__init__.py gallery_dl/extractor/adultempire.py gallery_dl/extractor/architizer.py gallery_dl/extractor/artstation.py gallery_dl/extractor/aryion.py gallery_dl/extractor/bbc.py gallery_dl/extractor/bcy.py gallery_dl/extractor/behance.py gallery_dl/extractor/blogger.py gallery_dl/extractor/booru.py gallery_dl/extractor/comicvine.py gallery_dl/extractor/common.py gallery_dl/extractor/cyberdrop.py gallery_dl/extractor/danbooru.py gallery_dl/extractor/desktopography.py gallery_dl/extractor/deviantart.py gallery_dl/extractor/directlink.py gallery_dl/extractor/dynastyscans.py gallery_dl/extractor/erome.py gallery_dl/extractor/exhentai.py gallery_dl/extractor/fallenangels.py gallery_dl/extractor/fanbox.py gallery_dl/extractor/fantia.py gallery_dl/extractor/flickr.py gallery_dl/extractor/foolfuuka.py gallery_dl/extractor/foolslide.py gallery_dl/extractor/furaffinity.py gallery_dl/extractor/fuskator.py gallery_dl/extractor/gelbooru.py gallery_dl/extractor/gelbooru_v01.py gallery_dl/extractor/gelbooru_v02.py gallery_dl/extractor/generic.py gallery_dl/extractor/gfycat.py gallery_dl/extractor/gofile.py gallery_dl/extractor/hbrowse.py gallery_dl/extractor/hentai2read.py gallery_dl/extractor/hentaicosplays.py gallery_dl/extractor/hentaifoundry.py gallery_dl/extractor/hentaifox.py gallery_dl/extractor/hentaihand.py gallery_dl/extractor/hentaihere.py gallery_dl/extractor/hiperdex.py gallery_dl/extractor/hitomi.py gallery_dl/extractor/idolcomplex.py gallery_dl/extractor/imagebam.py gallery_dl/extractor/imagechest.py gallery_dl/extractor/imagefap.py gallery_dl/extractor/imagehosts.py gallery_dl/extractor/imgbb.py gallery_dl/extractor/imgbox.py gallery_dl/extractor/imgth.py gallery_dl/extractor/imgur.py gallery_dl/extractor/inkbunny.py gallery_dl/extractor/instagram.py gallery_dl/extractor/issuu.py gallery_dl/extractor/kabeuchi.py gallery_dl/extractor/keenspot.py gallery_dl/extractor/kemonoparty.py gallery_dl/extractor/khinsider.py gallery_dl/extractor/kissgoddess.py gallery_dl/extractor/kohlchan.py gallery_dl/extractor/komikcast.py 
gallery_dl/extractor/lightroom.py gallery_dl/extractor/lineblog.py gallery_dl/extractor/livedoor.py gallery_dl/extractor/lolisafe.py gallery_dl/extractor/luscious.py gallery_dl/extractor/mangadex.py gallery_dl/extractor/mangafox.py gallery_dl/extractor/mangahere.py gallery_dl/extractor/mangakakalot.py gallery_dl/extractor/manganelo.py gallery_dl/extractor/mangapark.py gallery_dl/extractor/mangasee.py gallery_dl/extractor/mangoxo.py gallery_dl/extractor/mastodon.py gallery_dl/extractor/mememuseum.py gallery_dl/extractor/message.py gallery_dl/extractor/moebooru.py gallery_dl/extractor/myhentaigallery.py gallery_dl/extractor/myportfolio.py gallery_dl/extractor/naver.py gallery_dl/extractor/naverwebtoon.py gallery_dl/extractor/newgrounds.py gallery_dl/extractor/ngomik.py gallery_dl/extractor/nhentai.py gallery_dl/extractor/nijie.py gallery_dl/extractor/nozomi.py gallery_dl/extractor/nsfwalbum.py gallery_dl/extractor/oauth.py gallery_dl/extractor/paheal.py gallery_dl/extractor/patreon.py gallery_dl/extractor/philomena.py gallery_dl/extractor/photobucket.py gallery_dl/extractor/photovogue.py gallery_dl/extractor/picarto.py gallery_dl/extractor/piczel.py gallery_dl/extractor/pillowfort.py gallery_dl/extractor/pinterest.py gallery_dl/extractor/pixiv.py gallery_dl/extractor/pixnet.py gallery_dl/extractor/plurk.py gallery_dl/extractor/pornhub.py gallery_dl/extractor/pururin.py gallery_dl/extractor/reactor.py gallery_dl/extractor/readcomiconline.py gallery_dl/extractor/recursive.py gallery_dl/extractor/reddit.py gallery_dl/extractor/redgifs.py gallery_dl/extractor/rule34us.py gallery_dl/extractor/sankaku.py gallery_dl/extractor/sankakucomplex.py gallery_dl/extractor/seiga.py gallery_dl/extractor/senmanga.py gallery_dl/extractor/sexcom.py gallery_dl/extractor/shopify.py gallery_dl/extractor/simplyhentai.py gallery_dl/extractor/skeb.py gallery_dl/extractor/slickpic.py gallery_dl/extractor/slideshare.py gallery_dl/extractor/smugmug.py gallery_dl/extractor/speakerdeck.py gallery_dl/extractor/subscribestar.py gallery_dl/extractor/tapas.py gallery_dl/extractor/telegraph.py gallery_dl/extractor/test.py gallery_dl/extractor/toyhouse.py gallery_dl/extractor/tsumino.py gallery_dl/extractor/tumblr.py gallery_dl/extractor/tumblrgallery.py gallery_dl/extractor/twibooru.py gallery_dl/extractor/twitter.py gallery_dl/extractor/unsplash.py gallery_dl/extractor/vanillarock.py gallery_dl/extractor/vk.py gallery_dl/extractor/vsco.py gallery_dl/extractor/wallhaven.py gallery_dl/extractor/wallpapercave.py gallery_dl/extractor/warosu.py gallery_dl/extractor/weasyl.py gallery_dl/extractor/webtoons.py gallery_dl/extractor/weibo.py gallery_dl/extractor/wikiart.py gallery_dl/extractor/wikieat.py gallery_dl/extractor/xhamster.py gallery_dl/extractor/xvideos.py gallery_dl/extractor/ytdl.py gallery_dl/postprocessor/__init__.py gallery_dl/postprocessor/classify.py gallery_dl/postprocessor/common.py gallery_dl/postprocessor/compare.py gallery_dl/postprocessor/exec.py gallery_dl/postprocessor/metadata.py gallery_dl/postprocessor/mtime.py gallery_dl/postprocessor/ugoira.py gallery_dl/postprocessor/zip.py test/test_cache.py test/test_config.py test/test_cookies.py test/test_downloader.py test/test_extractor.py test/test_formatter.py test/test_job.py test/test_oauth.py test/test_output.py test/test_postprocessor.py test/test_results.py test/test_text.py test/test_util.py././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1649443807.0 
gallery_dl-1.21.1/gallery_dl.egg-info/dependency_links.txt0000644000175000017500000000000114224101737022265 0ustar00mikemike ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1649443807.0 gallery_dl-1.21.1/gallery_dl.egg-info/entry_points.txt0000644000175000017500000000006014224101737021511 0ustar00mikemike[console_scripts] gallery-dl = gallery_dl:main ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1649443807.0 gallery_dl-1.21.1/gallery_dl.egg-info/requires.txt0000644000175000017500000000004514224101737020616 0ustar00mikemikerequests>=2.11.0 [video] youtube-dl ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1649443807.0 gallery_dl-1.21.1/gallery_dl.egg-info/top_level.txt0000644000175000017500000000001314224101737020743 0ustar00mikemikegallery_dl ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1649443808.1847284 gallery_dl-1.21.1/setup.cfg0000644000175000017500000000033014224101740014216 0ustar00mikemike[flake8] exclude = gallery_dl/__init__.py,gallery_dl/__main__.py,setup.py,build,scripts,archive ignore = E203,E226,W504 per-file-ignores = gallery_dl/extractor/500px.py: E501 [egg_info] tag_build = tag_date = 0 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/setup.py0000644000175000017500000001006414220623232014115 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- import re import sys import os.path import warnings from setuptools import setup def read(fname): path = os.path.join(os.path.dirname(__file__), fname) with open(path, encoding="utf-8") as file: return file.read() def check_file(fname): path = os.path.join(os.path.dirname(__file__), fname) if os.path.exists(path): return True warnings.warn( "Not including file '{}' since it is not present. 
" "Run 'make' to build all automatically generated files.".format(fname) ) return False # get version without importing the package VERSION = re.search( r'__version__\s*=\s*"([^"]+)"', read("gallery_dl/version.py"), ).group(1) FILES = [ (path, [f for f in files if check_file(f)]) for (path, files) in [ ("share/bash-completion/completions", ["data/completion/gallery-dl"]), ("share/zsh/site-functions" , ["data/completion/_gallery-dl"]), ("share/fish/vendor_completions.d" , ["data/completion/gallery-dl.fish"]), ("share/man/man1" , ["data/man/gallery-dl.1"]), ("share/man/man5" , ["data/man/gallery-dl.conf.5"]), ] ] DESCRIPTION = ("Command-line program to download image galleries and " "collections from several image hosting sites") LONG_DESCRIPTION = read("README.rst") if "py2exe" in sys.argv: try: import py2exe except ImportError: sys.exit("Error importing 'py2exe'") # py2exe dislikes version specifiers with a trailing '-dev' VERSION = VERSION.partition("-")[0] params = { "console": [{ "script" : "./gallery_dl/__main__.py", "dest_base" : "gallery-dl", "version" : VERSION, "description" : DESCRIPTION, "comments" : LONG_DESCRIPTION, "product_name" : "gallery-dl", "product_version": VERSION, }], "options": {"py2exe": { "bundle_files": 0, "compressed" : 1, "optimize" : 1, "dist_dir" : ".", "packages" : ["gallery_dl"], "includes" : ["youtube_dl"], "dll_excludes": ["w9xpopen.exe"], }}, "zipfile": None, } else: params = {} setup( name="gallery_dl", version=VERSION, description=DESCRIPTION, long_description=LONG_DESCRIPTION, url="https://github.com/mikf/gallery-dl", download_url="https://github.com/mikf/gallery-dl/releases/latest", author="Mike Fährmann", author_email="mike_faehrmann@web.de", maintainer="Mike Fährmann", maintainer_email="mike_faehrmann@web.de", license="GPLv2", python_requires=">=3.4", install_requires=[ "requests>=2.11.0", ], extras_require={ "video": [ "youtube-dl", ], }, packages=[ "gallery_dl", "gallery_dl.extractor", "gallery_dl.downloader", "gallery_dl.postprocessor", ], entry_points={ "console_scripts": [ "gallery-dl = gallery_dl:main", ], }, data_files=FILES, keywords="image gallery downloader crawler scraper", classifiers=[ "Development Status :: 5 - Production/Stable", "Environment :: Console", "Intended Audience :: End Users/Desktop", "License :: OSI Approved :: GNU General Public License v2 (GPLv2)", "Operating System :: Microsoft :: Windows", "Operating System :: POSIX", "Operating System :: MacOS", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.9", "Programming Language :: Python :: 3 :: Only", "Topic :: Internet :: WWW/HTTP", "Topic :: Multimedia :: Graphics", "Topic :: Utilities", ], test_suite="test", **params, ) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1649443808.1847284 gallery_dl-1.21.1/test/0000755000175000017500000000000014224101740013360 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/test/test_cache.py0000644000175000017500000001430414176336637016063 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
import os import sys import unittest from unittest.mock import patch import tempfile sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from gallery_dl import config, util # noqa E402 dbpath = tempfile.mkstemp()[1] config.set(("cache",), "file", dbpath) from gallery_dl import cache # noqa E402 cache._init() # def tearDownModule(): # util.remove_file(dbpath) class TestCache(unittest.TestCase): def test_decorator(self): @cache.memcache() def mc1(): pass @cache.memcache(maxage=10) def mc2(): pass @cache.cache() def dbc(): pass self.assertIsInstance(mc1, cache.CacheDecorator) self.assertIsInstance(mc2, cache.MemoryCacheDecorator) self.assertIsInstance(dbc, cache.DatabaseCacheDecorator) def test_keyarg_mem_simple(self): @cache.memcache(keyarg=2) def ka(a, b, c): return a+b+c self.assertEqual(ka(1, 1, 1), 3) self.assertEqual(ka(2, 2, 2), 6) self.assertEqual(ka(0, 0, 1), 3) self.assertEqual(ka(9, 9, 1), 3) self.assertEqual(ka(0, 0, 2), 6) self.assertEqual(ka(9, 9, 2), 6) def test_keyarg_mem(self): @cache.memcache(keyarg=2, maxage=10) def ka(a, b, c): return a+b+c self.assertEqual(ka(1, 1, 1), 3) self.assertEqual(ka(2, 2, 2), 6) self.assertEqual(ka(0, 0, 1), 3) self.assertEqual(ka(9, 9, 1), 3) self.assertEqual(ka(0, 0, 2), 6) self.assertEqual(ka(9, 9, 2), 6) def test_keyarg_db(self): @cache.cache(keyarg=2, maxage=10) def ka(a, b, c): return a+b+c self.assertEqual(ka(1, 1, 1), 3) self.assertEqual(ka(2, 2, 2), 6) self.assertEqual(ka(0, 0, 1), 3) self.assertEqual(ka(9, 9, 1), 3) self.assertEqual(ka(0, 0, 2), 6) self.assertEqual(ka(9, 9, 2), 6) def test_expires_mem(self): @cache.memcache(maxage=2) def ex(a, b, c): return a+b+c with patch("time.time") as tmock: tmock.return_value = 0.001 self.assertEqual(ex(1, 1, 1), 3) self.assertEqual(ex(2, 2, 2), 3) self.assertEqual(ex(3, 3, 3), 3) # value is still cached after 1 second tmock.return_value += 1.0 self.assertEqual(ex(3, 3, 3), 3) self.assertEqual(ex(2, 2, 2), 3) self.assertEqual(ex(1, 1, 1), 3) # new value after 'maxage' seconds tmock.return_value += 1.0 self.assertEqual(ex(3, 3, 3), 9) self.assertEqual(ex(2, 2, 2), 9) self.assertEqual(ex(1, 1, 1), 9) def test_expires_db(self): @cache.cache(maxage=2) def ex(a, b, c): return a+b+c with patch("time.time") as tmock: tmock.return_value = 0.999 self.assertEqual(ex(1, 1, 1), 3) self.assertEqual(ex(2, 2, 2), 3) self.assertEqual(ex(3, 3, 3), 3) # value is still cached after 1 second tmock.return_value += 1.0 self.assertEqual(ex(3, 3, 3), 3) self.assertEqual(ex(2, 2, 2), 3) self.assertEqual(ex(1, 1, 1), 3) # new value after 'maxage' seconds tmock.return_value += 1.0 self.assertEqual(ex(3, 3, 3), 9) self.assertEqual(ex(2, 2, 2), 9) self.assertEqual(ex(1, 1, 1), 9) def test_update_mem_simple(self): @cache.memcache(keyarg=0) def up(a, b, c): return a+b+c self.assertEqual(up(1, 1, 1), 3) up.update(1, 0) up.update(2, 9) self.assertEqual(up(1, 0, 0), 0) self.assertEqual(up(2, 0, 0), 9) def test_update_mem(self): @cache.memcache(keyarg=0, maxage=10) def up(a, b, c): return a+b+c self.assertEqual(up(1, 1, 1), 3) up.update(1, 0) up.update(2, 9) self.assertEqual(up(1, 0, 0), 0) self.assertEqual(up(2, 0, 0), 9) def test_update_db(self): @cache.cache(keyarg=0, maxage=10) def up(a, b, c): return a+b+c self.assertEqual(up(1, 1, 1), 3) up.update(1, 0) up.update(2, 9) self.assertEqual(up(1, 0, 0), 0) self.assertEqual(up(2, 0, 0), 9) def test_invalidate_mem_simple(self): @cache.memcache(keyarg=0) def inv(a, b, c): return a+b+c self.assertEqual(inv(1, 1, 1), 3) inv.invalidate(1) inv.invalidate(2) 
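        # after invalidation the cached results are gone, so the next calls
        # recompute their values from the new arguments instead of returning 3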
self.assertEqual(inv(1, 0, 0), 1) self.assertEqual(inv(2, 0, 0), 2) def test_invalidate_mem(self): @cache.memcache(keyarg=0, maxage=10) def inv(a, b, c): return a+b+c self.assertEqual(inv(1, 1, 1), 3) inv.invalidate(1) inv.invalidate(2) self.assertEqual(inv(1, 0, 0), 1) self.assertEqual(inv(2, 0, 0), 2) def test_invalidate_db(self): @cache.cache(keyarg=0, maxage=10) def inv(a, b, c): return a+b+c self.assertEqual(inv(1, 1, 1), 3) inv.invalidate(1) inv.invalidate(2) self.assertEqual(inv(1, 0, 0), 1) self.assertEqual(inv(2, 0, 0), 2) def test_database_read(self): @cache.cache(keyarg=0, maxage=10) def db(a, b, c): return a+b+c # initialize cache self.assertEqual(db(1, 1, 1), 3) db.update(2, 6) # check and clear the in-memory portion of said cache self.assertEqual(db.cache[1][0], 3) self.assertEqual(db.cache[2][0], 6) db.cache.clear() self.assertEqual(db.cache, {}) # fetch results from database self.assertEqual(db(1, 0, 0), 3) self.assertEqual(db(2, 0, 0), 6) # check in-memory cache updates self.assertEqual(db.cache[1][0], 3) self.assertEqual(db.cache[2][0], 6) if __name__ == '__main__': unittest.main() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1645832452.0 gallery_dl-1.21.1/test/test_config.py0000644000175000017500000001753114206264404016254 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2015-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import os import sys import unittest import json import tempfile ROOTDIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) sys.path.insert(0, ROOTDIR) from gallery_dl import config # noqa E402 class TestConfig(unittest.TestCase): def setUp(self): config.set(() , "a", 1) config.set(("b",) , "a", 2) config.set(("b", "b"), "a", 3) config.set(("b",) , "c", "text") config.set(("b", "b"), "c", [8, 9]) def tearDown(self): config.clear() def test_get(self): self.assertEqual(config.get(() , "a") , 1) self.assertEqual(config.get(("b",) , "a") , 2) self.assertEqual(config.get(("b", "b"), "a") , 3) self.assertEqual(config.get(() , "c") , None) self.assertEqual(config.get(("b",) , "c") , "text") self.assertEqual(config.get(("b", "b"), "c") , [8, 9]) self.assertEqual(config.get(("a",) , "g") , None) self.assertEqual(config.get(("a", "a"), "g") , None) self.assertEqual(config.get(("e", "f"), "g") , None) self.assertEqual(config.get(("e", "f"), "g", 4), 4) def test_interpolate(self): self.assertEqual(config.interpolate(() , "a"), 1) self.assertEqual(config.interpolate(("b",) , "a"), 1) self.assertEqual(config.interpolate(("b", "b"), "a"), 1) self.assertEqual(config.interpolate(() , "c"), None) self.assertEqual(config.interpolate(("b",) , "c"), "text") self.assertEqual(config.interpolate(("b", "b"), "c"), [8, 9]) self.assertEqual(config.interpolate(("a",) , "g") , None) self.assertEqual(config.interpolate(("a", "a"), "g") , None) self.assertEqual(config.interpolate(("e", "f"), "g") , None) self.assertEqual(config.interpolate(("e", "f"), "g", 4), 4) self.assertEqual(config.interpolate(("b",), "d", 1) , 1) self.assertEqual(config.interpolate(("d",), "d", 1) , 1) config.set(() , "d", 2) self.assertEqual(config.interpolate(("b",), "d", 1) , 2) self.assertEqual(config.interpolate(("d",), "d", 1) , 2) config.set(("b",), "d", 3) self.assertEqual(config.interpolate(("b",), "d", 1) , 2) self.assertEqual(config.interpolate(("d",), "d", 1) , 2) def 
test_interpolate_common(self): def lookup(): return config.interpolate_common( ("Z1", "Z2"), ( ("A1", "A2"), ("B1",), ("C1", "C2", "C3"), ), "KEY", "DEFAULT", ) def test(path, value, expected=None): config.set(path, "KEY", value) self.assertEqual(lookup(), expected or value) self.assertEqual(lookup(), "DEFAULT") test(("Z1",), 1) test(("Z1", "Z2"), 2) test(("Z1", "Z2", "C1"), 3) test(("Z1", "Z2", "C1", "C2"), 4) test(("Z1", "Z2", "C1", "C2", "C3"), 5) test(("Z1", "Z2", "B1"), 6) test(("Z1", "Z2", "A1"), 7) test(("Z1", "Z2", "A1", "A2"), 8) test(("Z1", "A1", "A2"), 999, 8) test(("Z1", "Z2", "A1", "A2", "A3"), 999, 8) test((), 9) def test_accumulate(self): self.assertEqual(config.accumulate((), "l"), []) config.set(() , "l", [5, 6]) config.set(("c",) , "l", [3, 4]) config.set(("c", "c"), "l", [1, 2]) self.assertEqual( config.accumulate((), "l") , [5, 6]) self.assertEqual( config.accumulate(("c",), "l") , [3, 4, 5, 6]) self.assertEqual( config.accumulate(("c", "c"), "l"), [1, 2, 3, 4, 5, 6]) config.set(("c",), "l", None) config.unset(("c", "c"), "l") self.assertEqual( config.accumulate((), "l") , [5, 6]) self.assertEqual( config.accumulate(("c",), "l") , [5, 6]) self.assertEqual( config.accumulate(("c", "c"), "l"), [5, 6]) def test_set(self): config.set(() , "c", [1, 2, 3]) config.set(("b",) , "c", [1, 2, 3]) config.set(("e", "f"), "g", value=234) self.assertEqual(config.get(() , "c"), [1, 2, 3]) self.assertEqual(config.get(("b",) , "c"), [1, 2, 3]) self.assertEqual(config.get(("e", "f"), "g"), 234) def test_setdefault(self): config.setdefault(() , "c", [1, 2, 3]) config.setdefault(("b",) , "c", [1, 2, 3]) config.setdefault(("e", "f"), "g", value=234) self.assertEqual(config.get(() , "c"), [1, 2, 3]) self.assertEqual(config.get(("b",) , "c"), "text") self.assertEqual(config.get(("e", "f"), "g"), 234) def test_unset(self): config.unset(() , "a") config.unset(("b",), "c") config.unset(("a",), "d") config.unset(("b",), "d") config.unset(("c",), "d") self.assertEqual(config.get(() , "a"), None) self.assertEqual(config.get(("b",), "a"), 2) self.assertEqual(config.get(("b",), "c"), None) self.assertEqual(config.get(("a",), "d"), None) self.assertEqual(config.get(("b",), "d"), None) self.assertEqual(config.get(("c",), "d"), None) def test_apply(self): options = ( (("b",) , "c", [1, 2, 3]), (("e", "f"), "g", 234), ) self.assertEqual(config.get(("b",) , "c"), "text") self.assertEqual(config.get(("e", "f"), "g"), None) with config.apply(options): self.assertEqual(config.get(("b",) , "c"), [1, 2, 3]) self.assertEqual(config.get(("e", "f"), "g"), 234) self.assertEqual(config.get(("b",) , "c"), "text") self.assertEqual(config.get(("e", "f"), "g"), None) def test_load(self): with tempfile.TemporaryDirectory() as base: path1 = os.path.join(base, "cfg1") with open(path1, "w") as file: file.write('{"a": 1, "b": {"a": 2, "c": "text"}}') path2 = os.path.join(base, "cfg2") with open(path2, "w") as file: file.write('{"a": 7, "b": {"a": 8, "e": "foo"}}') config.clear() config.load((path1,)) self.assertEqual(config.get(() , "a"), 1) self.assertEqual(config.get(("b",), "a"), 2) self.assertEqual(config.get(("b",), "c"), "text") config.load((path2,)) self.assertEqual(config.get(() , "a"), 7) self.assertEqual(config.get(("b",), "a"), 8) self.assertEqual(config.get(("b",), "c"), "text") self.assertEqual(config.get(("b",), "e"), "foo") config.clear() config.load((path1, path2)) self.assertEqual(config.get(() , "a"), 7) self.assertEqual(config.get(("b",), "a"), 8) self.assertEqual(config.get(("b",), "c"), "text") 
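        # keys present in only one of the two files survive the combined load:
        # "c" comes from cfg1 (checked above), "e" from cfg2 (checked below)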
self.assertEqual(config.get(("b",), "e"), "foo") class TestConfigFiles(unittest.TestCase): def test_default_config(self): cfg = self._load("gallery-dl.conf") self.assertIsInstance(cfg, dict) self.assertTrue(cfg) def test_example_config(self): cfg = self._load("gallery-dl-example.conf") self.assertIsInstance(cfg, dict) self.assertTrue(cfg) @staticmethod def _load(name): path = os.path.join(ROOTDIR, "docs", name) try: with open(path) as fp: return json.load(fp) except FileNotFoundError: raise unittest.SkipTest(path + " not available") if __name__ == '__main__': unittest.main() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/test/test_cookies.py0000644000175000017500000001641314220623232016433 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2017-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import os import sys import unittest from unittest import mock import time import logging import tempfile from os.path import join sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from gallery_dl import config, extractor # noqa E402 class TestCookiejar(unittest.TestCase): @classmethod def setUpClass(cls): cls.path = tempfile.TemporaryDirectory() cls.cookiefile = join(cls.path.name, "cookies.txt") with open(cls.cookiefile, "w") as file: file.write("""# HTTP Cookie File .example.org\tTRUE\t/\tFALSE\t253402210800\tNAME\tVALUE """) cls.invalid_cookiefile = join(cls.path.name, "invalid.txt") with open(cls.invalid_cookiefile, "w") as file: file.write("""# asd .example.org\tTRUE/FALSE\t253402210800\tNAME\tVALUE """) @classmethod def tearDownClass(cls): cls.path.cleanup() config.clear() def test_cookiefile(self): config.set((), "cookies", self.cookiefile) cookies = extractor.find("test:").session.cookies self.assertEqual(len(cookies), 1) cookie = next(iter(cookies)) self.assertEqual(cookie.domain, ".example.org") self.assertEqual(cookie.path , "/") self.assertEqual(cookie.name , "NAME") self.assertEqual(cookie.value , "VALUE") def test_invalid_cookiefile(self): self._test_warning(self.invalid_cookiefile, ValueError) def test_invalid_filename(self): self._test_warning(join(self.path.name, "nothing"), FileNotFoundError) def _test_warning(self, filename, exc): config.set((), "cookies", filename) log = logging.getLogger("test") with mock.patch.object(log, "warning") as mock_warning: cookies = extractor.find("test:").session.cookies self.assertEqual(len(cookies), 0) self.assertEqual(mock_warning.call_count, 1) self.assertEqual(mock_warning.call_args[0][0], "cookies: %s") self.assertIsInstance(mock_warning.call_args[0][1], exc) class TestCookiedict(unittest.TestCase): def setUp(self): self.cdict = {"NAME1": "VALUE1", "NAME2": "VALUE2"} config.set((), "cookies", self.cdict) def tearDown(self): config.clear() def test_dict(self): cookies = extractor.find("test:").session.cookies self.assertEqual(len(cookies), len(self.cdict)) self.assertEqual(sorted(cookies.keys()), sorted(self.cdict.keys())) self.assertEqual(sorted(cookies.values()), sorted(self.cdict.values())) def test_domain(self): for category in ["exhentai", "idolcomplex", "nijie"]: extr = _get_extractor(category) cookies = extr.session.cookies for key in self.cdict: self.assertTrue(key in cookies) for c in cookies: self.assertEqual(c.domain, extr.cookiedomain) class 
TestCookieLogin(unittest.TestCase): def tearDown(self): config.clear() def test_cookie_login(self): extr_cookies = { "exhentai" : ("ipb_member_id", "ipb_pass_hash"), "idolcomplex": ("login", "pass_hash"), "nijie" : ("nemail", "nlogin"), } for category, cookienames in extr_cookies.items(): cookies = {name: "value" for name in cookienames} config.set((), "cookies", cookies) extr = _get_extractor(category) with mock.patch.object(extr, "_login_impl") as mock_login: extr.login() mock_login.assert_not_called() class TestCookieUtils(unittest.TestCase): def test_check_cookies(self): extr = extractor.find("test:") self.assertFalse(extr._cookiejar, "empty") self.assertFalse(extr.cookiedomain, "empty") # always returns False when checking for empty cookie list self.assertFalse(extr._check_cookies(())) self.assertFalse(extr._check_cookies(("a",))) self.assertFalse(extr._check_cookies(("a", "b"))) self.assertFalse(extr._check_cookies(("a", "b", "c"))) extr._cookiejar.set("a", "1") self.assertTrue(extr._check_cookies(("a",))) self.assertFalse(extr._check_cookies(("a", "b"))) self.assertFalse(extr._check_cookies(("a", "b", "c"))) extr._cookiejar.set("b", "2") self.assertTrue(extr._check_cookies(("a",))) self.assertTrue(extr._check_cookies(("a", "b"))) self.assertFalse(extr._check_cookies(("a", "b", "c"))) def test_check_cookies_domain(self): extr = extractor.find("test:") self.assertFalse(extr._cookiejar, "empty") extr.cookiedomain = ".example.org" self.assertFalse(extr._check_cookies(("a",))) self.assertFalse(extr._check_cookies(("a", "b"))) extr._cookiejar.set("a", "1") self.assertFalse(extr._check_cookies(("a",))) extr._cookiejar.set("a", "1", domain=extr.cookiedomain) self.assertTrue(extr._check_cookies(("a",))) extr._cookiejar.set("a", "1", domain="www" + extr.cookiedomain) self.assertEqual(len(extr._cookiejar), 3) self.assertTrue(extr._check_cookies(("a",))) extr._cookiejar.set("b", "2", domain=extr.cookiedomain) extr._cookiejar.set("c", "3", domain=extr.cookiedomain) self.assertTrue(extr._check_cookies(("a", "b", "c"))) def test_check_cookies_expires(self): extr = extractor.find("test:") self.assertFalse(extr._cookiejar, "empty") self.assertFalse(extr.cookiedomain, "empty") now = int(time.time()) log = logging.getLogger("test") extr._cookiejar.set("a", "1", expires=now-100) with mock.patch.object(log, "warning") as mw: self.assertFalse(extr._check_cookies(("a",))) self.assertEqual(mw.call_count, 1) self.assertEqual(mw.call_args[0], ("Cookie '%s' has expired", "a")) extr._cookiejar.set("a", "1", expires=now+100) with mock.patch.object(log, "warning") as mw: self.assertTrue(extr._check_cookies(("a",))) self.assertEqual(mw.call_count, 1) self.assertEqual(mw.call_args[0], ( "Cookie '%s' will expire in less than %s hour%s", "a", 1, "")) extr._cookiejar.set("a", "1", expires=now+100+7200) with mock.patch.object(log, "warning") as mw: self.assertTrue(extr._check_cookies(("a",))) self.assertEqual(mw.call_count, 1) self.assertEqual(mw.call_args[0], ( "Cookie '%s' will expire in less than %s hour%s", "a", 3, "s")) extr._cookiejar.set("a", "1", expires=now+100+24*3600) with mock.patch.object(log, "warning") as mw: self.assertTrue(extr._check_cookies(("a",))) self.assertEqual(mw.call_count, 0) def _get_extractor(category): for extr in extractor.extractors(): if extr.category == category and hasattr(extr, "_login_impl"): url = next(extr._get_tests())[0] return extr.from_url(url) if __name__ == "__main__": unittest.main() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 
mtime=1643756959.0 gallery_dl-1.21.1/test/test_downloader.py0000644000175000017500000002232614176336637017161 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2018-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import os import sys import unittest from unittest.mock import Mock, MagicMock, patch import re import base64 import logging import os.path import tempfile import threading import http.server sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from gallery_dl import downloader, extractor, output, config, path # noqa E402 class MockDownloaderModule(Mock): __downloader__ = "mock" class FakeJob(): def __init__(self): self.extractor = extractor.find("test:") self.pathfmt = path.PathFormat(self.extractor) self.out = output.NullOutput() self.get_logger = logging.getLogger class TestDownloaderModule(unittest.TestCase): @classmethod def setUpClass(cls): # allow import of ytdl downloader module without youtube_dl installed sys.modules["youtube_dl"] = MagicMock() @classmethod def tearDownClass(cls): del sys.modules["youtube_dl"] def tearDown(self): downloader._cache.clear() def test_find(self): cls = downloader.find("http") self.assertEqual(cls.__name__, "HttpDownloader") self.assertEqual(cls.scheme , "http") cls = downloader.find("https") self.assertEqual(cls.__name__, "HttpDownloader") self.assertEqual(cls.scheme , "http") cls = downloader.find("text") self.assertEqual(cls.__name__, "TextDownloader") self.assertEqual(cls.scheme , "text") cls = downloader.find("ytdl") self.assertEqual(cls.__name__, "YoutubeDLDownloader") self.assertEqual(cls.scheme , "ytdl") self.assertEqual(downloader.find("ftp"), None) self.assertEqual(downloader.find("foo"), None) self.assertEqual(downloader.find(1234) , None) self.assertEqual(downloader.find(None) , None) @patch("builtins.__import__") def test_cache(self, import_module): import_module.return_value = MockDownloaderModule() downloader.find("http") downloader.find("text") downloader.find("ytdl") self.assertEqual(import_module.call_count, 3) downloader.find("http") downloader.find("text") downloader.find("ytdl") self.assertEqual(import_module.call_count, 3) @patch("builtins.__import__") def test_cache_http(self, import_module): import_module.return_value = MockDownloaderModule() downloader.find("http") downloader.find("https") self.assertEqual(import_module.call_count, 1) @patch("builtins.__import__") def test_cache_https(self, import_module): import_module.return_value = MockDownloaderModule() downloader.find("https") downloader.find("http") self.assertEqual(import_module.call_count, 1) class TestDownloaderBase(unittest.TestCase): @classmethod def setUpClass(cls): cls.dir = tempfile.TemporaryDirectory() cls.fnum = 0 config.set((), "base-directory", cls.dir.name) cls.job = FakeJob() @classmethod def tearDownClass(cls): cls.dir.cleanup() config.clear() @classmethod def _prepare_destination(cls, content=None, part=True, extension=None): name = "file-{}".format(cls.fnum) cls.fnum += 1 kwdict = { "category" : "test", "subcategory": "test", "filename" : name, "extension" : extension, } pathfmt = cls.job.pathfmt pathfmt.set_directory(kwdict) pathfmt.set_filename(kwdict) if content: mode = "w" + ("b" if isinstance(content, bytes) else "") with pathfmt.open(mode) as file: file.write(content) return pathfmt def _run_test(self, url, input, output, extension, 
expected_extension=None): pathfmt = self._prepare_destination(input, extension=extension) success = self.downloader.download(url, pathfmt) # test successful download self.assertTrue(success, "downloading '{}' failed".format(url)) # test content mode = "r" + ("b" if isinstance(output, bytes) else "") with pathfmt.open(mode) as file: content = file.read() self.assertEqual(content, output) # test filename extension self.assertEqual( pathfmt.extension, expected_extension, ) self.assertEqual( os.path.splitext(pathfmt.realpath)[1][1:], expected_extension, ) class TestHTTPDownloader(TestDownloaderBase): @classmethod def setUpClass(cls): TestDownloaderBase.setUpClass() cls.downloader = downloader.find("http")(cls.job) port = 8088 cls.address = "http://127.0.0.1:{}".format(port) cls._jpg = cls.address + "/image.jpg" cls._png = cls.address + "/image.png" cls._gif = cls.address + "/image.gif" server = http.server.HTTPServer(("", port), HttpRequestHandler) threading.Thread(target=server.serve_forever, daemon=True).start() def tearDown(self): self.downloader.minsize = self.downloader.maxsize = None def test_http_download(self): self._run_test(self._jpg, None, DATA_JPG, "jpg", "jpg") self._run_test(self._png, None, DATA_PNG, "png", "png") self._run_test(self._gif, None, DATA_GIF, "gif", "gif") def test_http_offset(self): self._run_test(self._jpg, DATA_JPG[:123], DATA_JPG, "jpg", "jpg") self._run_test(self._png, DATA_PNG[:12] , DATA_PNG, "png", "png") self._run_test(self._gif, DATA_GIF[:1] , DATA_GIF, "gif", "gif") def test_http_extension(self): self._run_test(self._jpg, None, DATA_JPG, None, "jpg") self._run_test(self._png, None, DATA_PNG, None, "png") self._run_test(self._gif, None, DATA_GIF, None, "gif") def test_http_adjust_extension(self): self._run_test(self._jpg, None, DATA_JPG, "png", "jpg") self._run_test(self._png, None, DATA_PNG, "gif", "png") self._run_test(self._gif, None, DATA_GIF, "jpg", "gif") def test_http_filesize_min(self): pathfmt = self._prepare_destination(None, extension=None) self.downloader.minsize = 100 with self.assertLogs(self.downloader.log, "WARNING"): success = self.downloader.download(self._gif, pathfmt) self.assertFalse(success) def test_http_filesize_max(self): pathfmt = self._prepare_destination(None, extension=None) self.downloader.maxsize = 100 with self.assertLogs(self.downloader.log, "WARNING"): success = self.downloader.download(self._jpg, pathfmt) self.assertFalse(success) class TestTextDownloader(TestDownloaderBase): @classmethod def setUpClass(cls): TestDownloaderBase.setUpClass() cls.downloader = downloader.find("text")(cls.job) def test_text_download(self): self._run_test("text:foobar", None, "foobar", "txt", "txt") def test_text_offset(self): self._run_test("text:foobar", "foo", "foobar", "txt", "txt") def test_text_empty(self): self._run_test("text:", None, "", "txt", "txt") class HttpRequestHandler(http.server.BaseHTTPRequestHandler): def do_GET(self): if self.path == "/image.jpg": content_type = "image/jpeg" output = DATA_JPG elif self.path == "/image.png": content_type = "image/png" output = DATA_PNG elif self.path == "/image.gif": content_type = "image/gif" output = DATA_GIF else: self.send_response(404) self.wfile.write(self.path.encode()) return headers = { "Content-Type": content_type, "Content-Length": len(output), } if "Range" in self.headers: status = 206 match = re.match(r"bytes=(\d+)-", self.headers["Range"]) start = int(match.group(1)) headers["Content-Range"] = "bytes {}-{}/{}".format( start, len(output)-1, len(output)) output = output[start:] 
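# without a Range header the handler falls through to a plain 200 below and
# serves the full payload; the 206/Content-Range branch above is what the
# test_http_offset cases exercise, since they pre-seed a partial file and
# expect the HTTP downloader to resume the transfer with a Range request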
else: status = 200 self.send_response(status) for key, value in headers.items(): self.send_header(key, value) self.end_headers() self.wfile.write(output) DATA_JPG = base64.standard_b64decode(""" /9j/4AAQSkZJRgABAQEASABIAAD/2wBD AAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEB AQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEB AQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEB AQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEB AQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEB AQEBAQEBAQEBAQEBAQH/wAARCAABAAED AREAAhEBAxEB/8QAFAABAAAAAAAAAAAA AAAAAAAACv/EABQQAQAAAAAAAAAAAAAA AAAAAAD/xAAUAQEAAAAAAAAAAAAAAAAA AAAA/8QAFBEBAAAAAAAAAAAAAAAAAAAA AP/aAAwDAQACEQMRAD8AfwD/2Q==""") DATA_PNG = base64.standard_b64decode(""" iVBORw0KGgoAAAANSUhEUgAAAAEAAAAB CAAAAAA6fptVAAAACklEQVQIHWP4DwAB AQEANl9ngAAAAABJRU5ErkJggg==""") DATA_GIF = base64.standard_b64decode(""" R0lGODdhAQABAIAAAP///////ywAAAAA AQABAAACAkQBADs=""") if __name__ == "__main__": unittest.main() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/test/test_extractor.py0000644000175000017500000002135614176336637017040 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2018-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import os import sys import unittest from unittest.mock import patch import time import string from datetime import datetime, timedelta sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from gallery_dl import extractor # noqa E402 from gallery_dl.extractor import mastodon # noqa E402 from gallery_dl.extractor.common import Extractor, Message # noqa E402 from gallery_dl.extractor.directlink import DirectlinkExtractor # noqa E402 _list_classes = extractor._list_classes class FakeExtractor(Extractor): category = "fake" subcategory = "test" pattern = "fake:" def items(self): yield Message.Version, 1 yield Message.Url, "text:foobar", {} class TestExtractorModule(unittest.TestCase): VALID_URIS = ( "https://example.org/file.jpg", "tumblr:foobar", "oauth:flickr", "test:pixiv:", "recursive:https://example.org/document.html", ) def setUp(self): extractor._cache.clear() extractor._module_iter = iter(extractor.modules) extractor._list_classes = _list_classes def test_find(self): for uri in self.VALID_URIS: result = extractor.find(uri) self.assertIsInstance(result, Extractor, uri) for not_found in ("", "/tmp/file.ext"): self.assertIsNone(extractor.find(not_found)) for invalid in (None, [], {}, 123, b"test:"): with self.assertRaises(TypeError): extractor.find(invalid) def test_add(self): uri = "fake:foobar" self.assertIsNone(extractor.find(uri)) extractor.add(FakeExtractor) self.assertIsInstance(extractor.find(uri), FakeExtractor) def test_add_module(self): uri = "fake:foobar" self.assertIsNone(extractor.find(uri)) classes = extractor.add_module(sys.modules[__name__]) self.assertEqual(len(classes), 1) self.assertEqual(classes[0].pattern, FakeExtractor.pattern) self.assertEqual(classes[0], FakeExtractor) self.assertIsInstance(extractor.find(uri), FakeExtractor) def test_from_url(self): for uri in self.VALID_URIS: cls = extractor.find(uri).__class__ extr = cls.from_url(uri) self.assertIs(type(extr), cls) self.assertIsInstance(extr, Extractor) for not_found in ("", "/tmp/file.ext"): self.assertIsNone(FakeExtractor.from_url(not_found)) for invalid in (None, [], {}, 123, b"test:"): with self.assertRaises(TypeError): FakeExtractor.from_url(invalid) def 
test_unique_pattern_matches(self): test_urls = [] # collect testcase URLs for extr in extractor.extractors(): for testcase in extr._get_tests(): test_urls.append((testcase[0], extr)) # iterate over all testcase URLs for url, extr1 in test_urls: matches = [] # ... and apply all regex patterns to each one for extr2 in extractor._cache: # skip DirectlinkExtractor pattern if it isn't tested if extr1 != DirectlinkExtractor and \ extr2 == DirectlinkExtractor: continue match = extr2.pattern.match(url) if match: matches.append(match) # fail if more or less than 1 match happened if len(matches) > 1: msg = "'{}' gets matched by more than one pattern:".format(url) for match in matches: msg += "\n- " msg += match.re.pattern self.fail(msg) if len(matches) < 1: msg = "'{}' isn't matched by any pattern".format(url) self.fail(msg) def test_docstrings(self): """ensure docstring uniqueness""" for extr1 in extractor.extractors(): for extr2 in extractor.extractors(): if extr1 != extr2 and extr1.__doc__ and extr2.__doc__: self.assertNotEqual( extr1.__doc__, extr2.__doc__, "{} <-> {}".format(extr1, extr2), ) def test_names(self): """Ensure extractor classes are named CategorySubcategoryExtractor""" def capitalize(c): if "-" in c: return string.capwords(c.replace("-", " ")).replace(" ", "") return c.capitalize() for extr in extractor.extractors(): if extr.category not in ("", "oauth", "ytdl"): expected = "{}{}Extractor".format( capitalize(extr.category), capitalize(extr.subcategory), ) if expected[0].isdigit(): expected = "_" + expected self.assertEqual(expected, extr.__name__) class TestExtractorWait(unittest.TestCase): def test_wait_seconds(self): extr = extractor.find("test:") seconds = 5 until = time.time() + seconds with patch("time.sleep") as sleep, patch.object(extr, "log") as log: extr.wait(seconds=seconds) sleep.assert_called_once_with(6.0) calls = log.info.mock_calls self.assertEqual(len(calls), 1) self._assert_isotime(calls[0][1][1], until) def test_wait_until(self): extr = extractor.find("test:") until = time.time() + 5 with patch("time.sleep") as sleep, patch.object(extr, "log") as log: extr.wait(until=until) calls = sleep.mock_calls self.assertEqual(len(calls), 1) self.assertAlmostEqual(calls[0][1][0], 6.0, places=1) calls = log.info.mock_calls self.assertEqual(len(calls), 1) self._assert_isotime(calls[0][1][1], until) def test_wait_until_datetime(self): extr = extractor.find("test:") until = datetime.utcnow() + timedelta(seconds=5) until_local = datetime.now() + timedelta(seconds=5) with patch("time.sleep") as sleep, patch.object(extr, "log") as log: extr.wait(until=until) calls = sleep.mock_calls self.assertEqual(len(calls), 1) self.assertAlmostEqual(calls[0][1][0], 6.0, places=1) calls = log.info.mock_calls self.assertEqual(len(calls), 1) self._assert_isotime(calls[0][1][1], until_local) def _assert_isotime(self, output, until): if not isinstance(until, datetime): until = datetime.fromtimestamp(until) o = self._isotime_to_seconds(output) u = self._isotime_to_seconds(until.time().isoformat()[:8]) self.assertLess(o-u, 1.0) @staticmethod def _isotime_to_seconds(isotime): parts = isotime.split(":") return int(parts[0]) * 3600 + int(parts[1]) * 60 + int(parts[2]) class TextExtractorOAuth(unittest.TestCase): def test_oauth1(self): for category in ("flickr", "smugmug", "tumblr"): extr = extractor.find("oauth:" + category) with patch.object(extr, "_oauth1_authorization_flow") as m: for msg in extr: pass self.assertEqual(len(m.mock_calls), 1) def test_oauth2(self): for category in ("deviantart", 
"reddit"): extr = extractor.find("oauth:" + category) with patch.object(extr, "_oauth2_authorization_code_grant") as m: for msg in extr: pass self.assertEqual(len(m.mock_calls), 1) def test_oauth2_mastodon(self): extr = extractor.find("oauth:mastodon:pawoo.net") with patch.object(extr, "_oauth2_authorization_code_grant") as m, \ patch.object(extr, "_register") as r: for msg in extr: pass self.assertEqual(len(r.mock_calls), 0) self.assertEqual(len(m.mock_calls), 1) def test_oauth2_mastodon_unknown(self): extr = extractor.find("oauth:mastodon:example.com") with patch.object(extr, "_oauth2_authorization_code_grant") as m, \ patch.object(extr, "_register") as r: r.return_value = { "client-id" : "foo", "client-secret": "bar", } for msg in extr: pass self.assertEqual(len(r.mock_calls), 1) self.assertEqual(len(m.mock_calls), 1) if __name__ == "__main__": unittest.main() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/test/test_formatter.py0000644000175000017500000002504314220623232017001 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import os import sys import unittest import datetime import tempfile sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from gallery_dl import formatter # noqa E402 class TestFormatter(unittest.TestCase): kwdict = { "a": "hElLo wOrLd", "b": "äöü", "d": {"a": "foo", "b": 0, "c": None}, "l": ["a", "b", "c"], "n": None, "s": " \n\r\tSPACE ", "u": "'< / >'", "t": 1262304000, "dt": datetime.datetime(2010, 1, 1), "ds": "2010-01-01T01:00:00+0100", "name": "Name", "title1": "Title", "title2": "", "title3": None, "title4": 0, } def test_conversions(self): self._run_test("{a!l}", "hello world") self._run_test("{a!u}", "HELLO WORLD") self._run_test("{a!c}", "Hello world") self._run_test("{a!C}", "Hello World") self._run_test("{s!t}", "SPACE") self._run_test("{a!U}", self.kwdict["a"]) self._run_test("{u!U}", "'< / >'") self._run_test("{a!s}", self.kwdict["a"]) self._run_test("{a!r}", "'" + self.kwdict["a"] + "'") self._run_test("{a!a}", "'" + self.kwdict["a"] + "'") self._run_test("{b!a}", "'\\xe4\\xf6\\xfc'") self._run_test("{a!S}", self.kwdict["a"]) self._run_test("{l!S}", "a, b, c") self._run_test("{n!S}", "") self._run_test("{t!d}", datetime.datetime(2010, 1, 1)) self._run_test("{t!d:%Y-%m-%d}", "2010-01-01") self._run_test("{dt!T}", "1262304000") self._run_test("{l!j}", '["a", "b", "c"]') with self.assertRaises(KeyError): self._run_test("{a!q}", "hello world") def test_optional(self): self._run_test("{name}{title1}", "NameTitle") self._run_test("{name}{title1:?//}", "NameTitle") self._run_test("{name}{title1:? **/''/}", "Name **Title''") self._run_test("{name}{title2}", "Name") self._run_test("{name}{title2:?//}", "Name") self._run_test("{name}{title2:? **/''/}", "Name") self._run_test("{name}{title3}", "NameNone") self._run_test("{name}{title3:?//}", "Name") self._run_test("{name}{title3:? **/''/}", "Name") self._run_test("{name}{title4}", "Name0") self._run_test("{name}{title4:?//}", "Name") self._run_test("{name}{title4:? 
**/''/}", "Name") def test_missing(self): replacement = "None" self._run_test("{missing}", replacement) self._run_test("{missing.attr}", replacement) self._run_test("{missing[key]}", replacement) self._run_test("{missing:?a//}", "") self._run_test("{name[missing]}", replacement) self._run_test("{name[missing].attr}", replacement) self._run_test("{name[missing][key]}", replacement) self._run_test("{name[missing]:?a//}", "") def test_missing_custom_default(self): replacement = default = "foobar" self._run_test("{missing}" , replacement, default) self._run_test("{missing.attr}", replacement, default) self._run_test("{missing[key]}", replacement, default) self._run_test("{missing:?a//}", "a" + default, default) def test_alternative(self): self._run_test("{a|z}" , "hElLo wOrLd") self._run_test("{z|a}" , "hElLo wOrLd") self._run_test("{z|y|a}" , "hElLo wOrLd") self._run_test("{z|y|x|a}", "hElLo wOrLd") self._run_test("{z|n|a|y}", "hElLo wOrLd") self._run_test("{z|a!C}" , "Hello World") self._run_test("{z|a:Rh/C/}" , "CElLo wOrLd") self._run_test("{z|a!C:RH/C/}", "Cello World") self._run_test("{z|y|x:?/}", "") self._run_test("{d[c]|d[b]|d[a]}", "foo") self._run_test("{d[a]|d[b]|d[c]}", "foo") self._run_test("{d[z]|d[y]|d[x]}", "None") def test_indexing(self): self._run_test("{l[0]}" , "a") self._run_test("{a[6]}" , "w") def test_slicing(self): v = self.kwdict["a"] self._run_test("{a[1:10]}" , v[1:10]) self._run_test("{a[-10:-1]}", v[-10:-1]) self._run_test("{a[5:]}" , v[5:]) self._run_test("{a[50:]}", v[50:]) self._run_test("{a[:5]}" , v[:5]) self._run_test("{a[:50]}", v[:50]) self._run_test("{a[:]}" , v) self._run_test("{a[1:10:2]}" , v[1:10:2]) self._run_test("{a[-10:-1:2]}", v[-10:-1:2]) self._run_test("{a[5::2]}" , v[5::2]) self._run_test("{a[50::2]}", v[50::2]) self._run_test("{a[:5:2]}" , v[:5:2]) self._run_test("{a[:50:2]}", v[:50:2]) self._run_test("{a[::]}" , v) def test_maxlen(self): v = self.kwdict["a"] self._run_test("{a:L5/foo/}" , "foo") self._run_test("{a:L50/foo/}", v) self._run_test("{a:L50/foo/>50}", " " * 39 + v) self._run_test("{a:L50/foo/>51}", "foo") self._run_test("{a:Lab/foo/}", "foo") def test_join(self): self._run_test("{l:J}" , "abc") self._run_test("{l:J,}" , "a,b,c") self._run_test("{l:J,/}" , "a,b,c") self._run_test("{l:J,/>20}" , " a,b,c") self._run_test("{l:J - }" , "a - b - c") self._run_test("{l:J - /}" , "a - b - c") self._run_test("{l:J - />20}", " a - b - c") self._run_test("{a:J/}" , self.kwdict["a"]) self._run_test("{a:J, /}" , ", ".join(self.kwdict["a"])) def test_replace(self): self._run_test("{a:Rh/C/}" , "CElLo wOrLd") self._run_test("{a!l:Rh/C/}", "Cello world") self._run_test("{a!u:Rh/C/}", "HELLO WORLD") self._run_test("{a!l:Rl/_/}", "he__o wor_d") self._run_test("{a!l:Rl//}" , "heo word") self._run_test("{name:Rame/othing/}", "Nothing") def test_datetime(self): self._run_test("{ds:D%Y-%m-%dT%H:%M:%S%z}", "2010-01-01 00:00:00") self._run_test("{ds:D%Y}", "2010-01-01T01:00:00+0100") self._run_test("{l:D%Y}", "None") def test_chain_special(self): # multiple replacements self._run_test("{a:Rh/C/RE/e/RL/l/}", "Cello wOrld") self._run_test("{d[b]!s:R1/Q/R2/A/R0/Y/}", "Y") # join-and-replace self._run_test("{l:J-/Rb/E/}", "a-E-c") # optional-and-maxlen self._run_test("{d[a]:?/L1/too long/}", "") self._run_test("{d[c]:?/L5/too long/}", "") # parse and format datetime self._run_test("{ds:D%Y-%m-%dT%H:%M:%S%z/%Y%m%d}", "20100101") def test_globals_env(self): os.environ["FORMATTER_TEST"] = value = self.kwdict["a"] self._run_test("{_env[FORMATTER_TEST]}" , 
value) self._run_test("{_env[FORMATTER_TEST]!l}", value.lower()) self._run_test("{z|_env[FORMATTER_TEST]}", value) def test_globals_now(self): fmt = formatter.parse("{_now}") out1 = fmt.format_map(self.kwdict) self.assertRegex(out1, r"^\d{4}-\d\d-\d\d \d\d:\d\d:\d\d(\.\d+)?$") out = formatter.parse("{_now:%Y%m%d}").format_map(self.kwdict) now = datetime.datetime.now() self.assertRegex(out, r"^\d{8}$") self.assertEqual(out, format(now, "%Y%m%d")) out = formatter.parse("{z|_now:%Y}").format_map(self.kwdict) self.assertRegex(out, r"^\d{4}$") self.assertEqual(out, format(now, "%Y")) out2 = fmt.format_map(self.kwdict) self.assertRegex(out1, r"^\d{4}-\d\d-\d\d \d\d:\d\d:\d\d(\.\d+)?$") self.assertNotEqual(out1, out2) def test_template(self): with tempfile.TemporaryDirectory() as tmpdirname: path1 = os.path.join(tmpdirname, "tpl1") path2 = os.path.join(tmpdirname, "tpl2") with open(path1, "w") as fp: fp.write("{a}") fmt1 = formatter.parse("\fT " + path1) with open(path2, "w") as fp: fp.write("{a!u:Rh/C/}\nFooBar") fmt2 = formatter.parse("\fT " + path2) self.assertEqual(fmt1.format_map(self.kwdict), self.kwdict["a"]) self.assertEqual(fmt2.format_map(self.kwdict), "HELLO WORLD\nFooBar") with self.assertRaises(OSError): formatter.parse("\fT /") def test_expression(self): self._run_test("\fE a", self.kwdict["a"]) self._run_test("\fE name * 2 + ' ' + a", "{}{} {}".format( self.kwdict["name"], self.kwdict["name"], self.kwdict["a"])) @unittest.skipIf(sys.hexversion < 0x3060000, "no fstring support") def test_fstring(self): self._run_test("\fF {a}", self.kwdict["a"]) self._run_test("\fF {name}{name} {a}", "{}{} {}".format( self.kwdict["name"], self.kwdict["name"], self.kwdict["a"])) self._run_test("\fF foo-'\"{a.upper()}\"'-bar", """foo-'"{}"'-bar""".format(self.kwdict["a"].upper())) def test_module(self): with tempfile.TemporaryDirectory() as tmpdirname: path = os.path.join(tmpdirname, "testmod.py") with open(path, "w") as fp: fp.write(""" def gentext(kwdict): name = kwdict.get("Name") or kwdict.get("name") or "foo" return "'{title1}' by {}".format(name, **kwdict) def lengths(kwdict): a = 0 for k, v in kwdict.items(): try: a += len(v) except TypeError: pass return format(a) def noarg(): return "" """) sys.path.insert(0, tmpdirname) try: fmt1 = formatter.parse("\fM testmod:gentext") fmt2 = formatter.parse("\fM testmod:lengths") fmt3 = formatter.parse("\fM testmod:noarg") with self.assertRaises(AttributeError): formatter.parse("\fM testmod:missing") with self.assertRaises(ImportError): formatter.parse("\fM missing:missing") finally: sys.path.pop(0) self.assertEqual(fmt1.format_map(self.kwdict), "'Title' by Name") self.assertEqual(fmt2.format_map(self.kwdict), "89") with self.assertRaises(TypeError): self.assertEqual(fmt3.format_map(self.kwdict), "") def _run_test(self, format_string, result, default=None): fmt = formatter.parse(format_string, default) output = fmt.format_map(self.kwdict) self.assertEqual(output, result, format_string) if __name__ == '__main__': unittest.main() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1645832469.0 gallery_dl-1.21.1/test/test_job.py0000644000175000017500000002407314206264425015563 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
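# The tests in this module drive the DownloadJob, KeywordJob, UrlJob,
# InfoJob and DataJob classes against small fake "test:" extractors and
# compare their console output with expected dumps. As an illustrative
# sketch (not part of the original suite), the stdout-capture pattern used
# by TestJob._capture_stdout below could be reproduced stand-alone roughly
# like this, assuming the built-in "test:" extractor is available:
#
#     import io, contextlib
#     from gallery_dl import job
#
#     with io.StringIO() as buffer:
#         with contextlib.redirect_stdout(buffer):
#             job.UrlJob("test:").run()
#         urls = buffer.getvalue().splitlines()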
import os import sys import unittest from unittest.mock import patch import io import contextlib sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from gallery_dl import job, config, text # noqa E402 from gallery_dl.extractor.common import Extractor, Message # noqa E402 class TestJob(unittest.TestCase): def tearDown(self): config.clear() def _capture_stdout(self, extr_or_job): if isinstance(extr_or_job, Extractor): jobinstance = self.jobclass(extr_or_job) else: jobinstance = extr_or_job with io.StringIO() as buffer: with contextlib.redirect_stdout(buffer): jobinstance.run() return buffer.getvalue() class TestDownloadJob(TestJob): jobclass = job.DownloadJob def test_extractor_filter(self): extr = TestExtractor.from_url("test:") tjob = self.jobclass(extr) func = tjob._build_extractor_filter() self.assertEqual(func(TestExtractor) , False) self.assertEqual(func(TestExtractorParent), False) self.assertEqual(func(TestExtractorAlt) , True) config.set((), "blacklist", ":test_subcategory") func = tjob._build_extractor_filter() self.assertEqual(func(TestExtractor) , False) self.assertEqual(func(TestExtractorParent), True) self.assertEqual(func(TestExtractorAlt) , False) config.set((), "whitelist", "test_category:test_subcategory") func = tjob._build_extractor_filter() self.assertEqual(func(TestExtractor) , True) self.assertEqual(func(TestExtractorParent), False) self.assertEqual(func(TestExtractorAlt) , False) class TestKeywordJob(TestJob): jobclass = job.KeywordJob def test_default(self): extr = TestExtractor.from_url("test:") self.assertEqual(self._capture_stdout(extr), """\ Keywords for directory names: ----------------------------- category test_category subcategory test_subcategory Keywords for filenames and --filter: ------------------------------------ category test_category extension jpg filename 1 num 1 subcategory test_subcategory tags[] - foo - bar - テスト user[id] 123 user[name] test """) class TestUrlJob(TestJob): jobclass = job.UrlJob def test_default(self): extr = TestExtractor.from_url("test:") self.assertEqual(self._capture_stdout(extr), """\ https://example.org/1.jpg https://example.org/2.jpg https://example.org/3.jpg """) def test_fallback(self): extr = TestExtractor.from_url("test:") tjob = self.jobclass(extr) tjob.handle_url = tjob.handle_url_fallback self.assertEqual(self._capture_stdout(tjob), """\ https://example.org/1.jpg | https://example.org/alt/1.jpg https://example.org/2.jpg | https://example.org/alt/2.jpg https://example.org/3.jpg | https://example.org/alt/3.jpg """) def test_parent(self): extr = TestExtractorParent.from_url("test:parent") self.assertEqual(self._capture_stdout(extr), """\ test:child test:child test:child """) def test_child(self): extr = TestExtractorParent.from_url("test:parent") tjob = job.UrlJob(extr, depth=0) self.assertEqual(self._capture_stdout(tjob), 3 * """\ https://example.org/1.jpg https://example.org/2.jpg https://example.org/3.jpg """) class TestInfoJob(TestJob): jobclass = job.InfoJob def test_default(self): extr = TestExtractor.from_url("test:") self.assertEqual(self._capture_stdout(extr), """\ Category / Subcategory "test_category" / "test_subcategory" Filename format (default): "test_{filename}.{extension}" Directory format (default): ["{category}"] """) def test_custom(self): config.set((), "filename", "custom") config.set((), "directory", ("custom",)) config.set((), "sleep-request", 321) extr = TestExtractor.from_url("test:") extr.request_interval = 123.456 self.assertEqual(self._capture_stdout(extr), """\ 
Category / Subcategory "test_category" / "test_subcategory" Filename format (custom): "custom" Filename format (default): "test_{filename}.{extension}" Directory format (custom): ["custom"] Directory format (default): ["{category}"] Request interval (custom): 321 Request interval (default): 123.456 """) def test_base_category(self): extr = TestExtractor.from_url("test:") extr.basecategory = "test_basecategory" self.assertEqual(self._capture_stdout(extr), """\ Category / Subcategory / Basecategory "test_category" / "test_subcategory" / "test_basecategory" Filename format (default): "test_{filename}.{extension}" Directory format (default): ["{category}"] """) class TestDataJob(TestJob): jobclass = job.DataJob def test_default(self): extr = TestExtractor.from_url("test:") tjob = self.jobclass(extr, file=io.StringIO()) tjob.run() self.assertEqual(tjob.data, [ (Message.Directory, { "category" : "test_category", "subcategory": "test_subcategory", }), (Message.Url, "https://example.org/1.jpg", { "category" : "test_category", "subcategory": "test_subcategory", "filename" : "1", "extension" : "jpg", "num" : 1, "tags" : ["foo", "bar", "テスト"], "user" : {"id": 123, "name": "test"}, }), (Message.Url, "https://example.org/2.jpg", { "category" : "test_category", "subcategory": "test_subcategory", "filename" : "2", "extension" : "jpg", "num" : 2, "tags" : ["foo", "bar", "テスト"], "user" : {"id": 123, "name": "test"}, }), (Message.Url, "https://example.org/3.jpg", { "category" : "test_category", "subcategory": "test_subcategory", "filename" : "3", "extension" : "jpg", "num" : 3, "tags" : ["foo", "bar", "テスト"], "user" : {"id": 123, "name": "test"}, }), ]) def test_exception(self): extr = TestExtractorException.from_url("test:exception") tjob = self.jobclass(extr, file=io.StringIO()) tjob.run() self.assertEqual( tjob.data[-1], ("ZeroDivisionError", "division by zero")) def test_private(self): config.set(("output",), "private", True) extr = TestExtractor.from_url("test:") tjob = self.jobclass(extr, file=io.StringIO()) tjob.run() for i in range(1, 4): self.assertEqual( tjob.data[i][2]["_fallback"], ("https://example.org/alt/{}.jpg".format(i),), ) def test_sleep(self): extr = TestExtractor.from_url("test:") tjob = self.jobclass(extr, file=io.StringIO()) config.set((), "sleep-extractor", 123) with patch("time.sleep") as sleep: tjob.run() sleep.assert_called_once_with(123) config.set((), "sleep-extractor", 0) with patch("time.sleep") as sleep: tjob.run() sleep.assert_not_called() def test_ascii(self): extr = TestExtractor.from_url("test:") tjob = self.jobclass(extr) tjob.file = buffer = io.StringIO() tjob.run() self.assertIn("""\ "tags": [ "foo", "bar", "\\u30c6\\u30b9\\u30c8" ], """, buffer.getvalue()) tjob.file = buffer = io.StringIO() tjob.ascii = False tjob.run() self.assertIn("""\ "tags": [ "foo", "bar", "テスト" ], """, buffer.getvalue()) def test_num_string(self): extr = TestExtractor.from_url("test:") tjob = self.jobclass(extr, file=io.StringIO()) with patch("gallery_dl.util.number_to_string") as nts: tjob.run() self.assertEqual(len(nts.call_args_list), 0) config.set(("output",), "num-to-str", True) with patch("gallery_dl.util.number_to_string") as nts: tjob.run() self.assertEqual(len(nts.call_args_list), 52) tjob.run() self.assertEqual(tjob.data[-1][0], Message.Url) self.assertEqual(tjob.data[-1][2]["num"], "3") class TestExtractor(Extractor): category = "test_category" subcategory = "test_subcategory" directory_fmt = ("{category}",) filename_fmt = "test_{filename}.{extension}" pattern = r"test:(child)?$" def 
items(self): root = "https://example.org" yield Message.Directory, {} for i in range(1, 4): url = "{}/{}.jpg".format(root, i) yield Message.Url, url, text.nameext_from_url(url, { "num" : i, "tags": ["foo", "bar", "テスト"], "user": {"id": 123, "name": "test"}, "_fallback": ("{}/alt/{}.jpg".format(root, i),), }) class TestExtractorParent(Extractor): category = "test_category" subcategory = "test_subcategory_parent" pattern = r"test:parent" def items(self): url = "test:child" for i in range(11, 14): yield Message.Queue, url, { "num" : i, "tags": ["abc", "def"], "_extractor": TestExtractor, } class TestExtractorException(Extractor): category = "test_category" subcategory = "test_subcategory_exception" pattern = r"test:exception$" def items(self): return 1/0 class TestExtractorAlt(Extractor): category = "test_category_alt" subcategory = "test_subcategory" if __name__ == '__main__': unittest.main() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1618779452.0 gallery_dl-1.21.1/test/test_oauth.py0000644000175000017500000000736614037116474016141 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2018-2020 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import os import sys import unittest sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from gallery_dl import oauth, text # noqa E402 TESTSERVER = "http://term.ie/oauth/example" CONSUMER_KEY = "key" CONSUMER_SECRET = "secret" REQUEST_TOKEN = "requestkey" REQUEST_TOKEN_SECRET = "requestsecret" ACCESS_TOKEN = "accesskey" ACCESS_TOKEN_SECRET = "accesssecret" class TestOAuthSession(unittest.TestCase): def test_concat(self): concat = oauth.concat self.assertEqual(concat(), "") self.assertEqual(concat("str"), "str") self.assertEqual(concat("str1", "str2"), "str1&str2") self.assertEqual(concat("&", "?/"), "%26&%3F%2F") self.assertEqual( concat("GET", "http://example.org/", "foo=bar&baz=a"), "GET&http%3A%2F%2Fexample.org%2F&foo%3Dbar%26baz%3Da" ) def test_nonce(self, size=16): nonce_values = set(oauth.nonce(size) for _ in range(size)) # uniqueness self.assertEqual(len(nonce_values), size) # length for nonce in nonce_values: self.assertEqual(len(nonce), size) def test_quote(self): quote = oauth.quote reserved = ",;:!\"§$%&/(){}[]=?`´+*'äöü" unreserved = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ" "abcdefghijklmnopqrstuvwxyz" "0123456789-._~") for char in unreserved: self.assertEqual(quote(char), char) for char in reserved: quoted = quote(char) quoted_hex = quoted.replace("%", "") self.assertTrue(quoted.startswith("%")) self.assertTrue(len(quoted) >= 3) self.assertEqual(quoted_hex.upper(), quoted_hex) def test_request_token(self): response = self._oauth_request( "/request_token.php", {}) expected = "oauth_token=requestkey&oauth_token_secret=requestsecret" self.assertEqual(response, expected, msg=response) data = text.parse_query(response) self.assertTrue(data["oauth_token"], REQUEST_TOKEN) self.assertTrue(data["oauth_token_secret"], REQUEST_TOKEN_SECRET) def test_access_token(self): response = self._oauth_request( "/access_token.php", {}, REQUEST_TOKEN, REQUEST_TOKEN_SECRET) expected = "oauth_token=accesskey&oauth_token_secret=accesssecret" self.assertEqual(response, expected, msg=response) data = text.parse_query(response) self.assertTrue(data["oauth_token"], ACCESS_TOKEN) self.assertTrue(data["oauth_token_secret"], ACCESS_TOKEN_SECRET) def 
test_authenticated_call(self): params = {"method": "foo", "a": "äöüß/?&#", "äöüß/?&#": "a"} response = self._oauth_request( "/echo_api.php", params, ACCESS_TOKEN, ACCESS_TOKEN_SECRET) self.assertEqual(text.parse_query(response), params) def _oauth_request(self, endpoint, params=None, oauth_token=None, oauth_token_secret=None): # the test server at 'term.ie' is unreachable raise unittest.SkipTest() session = oauth.OAuth1Session( CONSUMER_KEY, CONSUMER_SECRET, oauth_token, oauth_token_secret, ) try: response = session.get(TESTSERVER + endpoint, params=params) response.raise_for_status() return response.text except OSError: raise unittest.SkipTest() if __name__ == "__main__": unittest.main(warnings="ignore") ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643756959.0 gallery_dl-1.21.1/test/test_output.py0000644000175000017500000001627514176336637016371 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import os import sys import unittest sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from gallery_dl import output # noqa E402 class TestShorten(unittest.TestCase): def test_shorten_noop(self, f=output.shorten_string): self.assertEqual(f("" , 10), "") self.assertEqual(f("foobar", 10), "foobar") def test_shorten(self, f=output.shorten_string): s = "01234567890123456789" # string of length 20 self.assertEqual(f(s, 30), s) self.assertEqual(f(s, 25), s) self.assertEqual(f(s, 20), s) self.assertEqual(f(s, 19), "012345678…123456789") self.assertEqual(f(s, 18), "01234567…123456789") self.assertEqual(f(s, 17), "01234567…23456789") self.assertEqual(f(s, 16), "0123456…23456789") self.assertEqual(f(s, 15), "0123456…3456789") self.assertEqual(f(s, 14), "012345…3456789") self.assertEqual(f(s, 13), "012345…456789") self.assertEqual(f(s, 12), "01234…456789") self.assertEqual(f(s, 11), "01234…56789") self.assertEqual(f(s, 10), "0123…56789") self.assertEqual(f(s, 9) , "0123…6789") self.assertEqual(f(s, 3) , "0…9") self.assertEqual(f(s, 2) , "…9") def test_shorten_separator(self, f=output.shorten_string): s = "01234567890123456789" # string of length 20 self.assertEqual(f(s, 20, "|---|"), s) self.assertEqual(f(s, 19, "|---|"), "0123456|---|3456789") self.assertEqual(f(s, 15, "|---|"), "01234|---|56789") self.assertEqual(f(s, 10, "|---|"), "01|---|789") self.assertEqual(f(s, 19, "..."), "01234567...23456789") self.assertEqual(f(s, 19, "..") , "01234567..123456789") self.assertEqual(f(s, 19, ".") , "012345678.123456789") self.assertEqual(f(s, 19, "") , "0123456780123456789") class TestShortenEAW(unittest.TestCase): def test_shorten_eaw_noop(self, f=output.shorten_string_eaw): self.assertEqual(f("" , 10), "") self.assertEqual(f("foobar", 10), "foobar") def test_shorten_eaw(self, f=output.shorten_string_eaw): s = "01234567890123456789" # 20 ascii characters self.assertEqual(f(s, 30), s) self.assertEqual(f(s, 25), s) self.assertEqual(f(s, 20), s) self.assertEqual(f(s, 19), "012345678…123456789") self.assertEqual(f(s, 18), "01234567…123456789") self.assertEqual(f(s, 17), "01234567…23456789") self.assertEqual(f(s, 16), "0123456…23456789") self.assertEqual(f(s, 15), "0123456…3456789") self.assertEqual(f(s, 14), "012345…3456789") self.assertEqual(f(s, 13), "012345…456789") self.assertEqual(f(s, 12), "01234…456789") 
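# the pure-ASCII expectations here mirror TestShorten above; the wide and
# mixed-width cases further below are where shorten_string_eaw differs,
# since East-Asian fullwidth characters are counted as two terminal columns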
self.assertEqual(f(s, 11), "01234…56789") self.assertEqual(f(s, 10), "0123…56789") self.assertEqual(f(s, 9) , "0123…6789") self.assertEqual(f(s, 3) , "0…9") self.assertEqual(f(s, 2) , "…9") def test_shorten_eaw_wide(self, f=output.shorten_string_eaw): s = "幻想郷幻想郷幻想郷幻想郷" # 12 wide characters self.assertEqual(f(s, 30), s) self.assertEqual(f(s, 25), s) self.assertEqual(f(s, 20), "幻想郷幻…想郷幻想郷") self.assertEqual(f(s, 19), "幻想郷幻…想郷幻想郷") self.assertEqual(f(s, 18), "幻想郷幻…郷幻想郷") self.assertEqual(f(s, 17), "幻想郷幻…郷幻想郷") self.assertEqual(f(s, 16), "幻想郷…郷幻想郷") self.assertEqual(f(s, 15), "幻想郷…郷幻想郷") self.assertEqual(f(s, 14), "幻想郷…幻想郷") self.assertEqual(f(s, 13), "幻想郷…幻想郷") self.assertEqual(f(s, 12), "幻想…幻想郷") self.assertEqual(f(s, 11), "幻想…幻想郷") self.assertEqual(f(s, 10), "幻想…想郷") self.assertEqual(f(s, 9) , "幻想…想郷") self.assertEqual(f(s, 3) , "…郷") def test_shorten_eaw_mix(self, f=output.shorten_string_eaw): s = "幻-想-郷##幻-想-郷##幻-想-郷" # mixed characters self.assertEqual(f(s, 28), s) self.assertEqual(f(s, 25), "幻-想-郷##幻…郷##幻-想-郷") self.assertEqual(f(s, 20), "幻-想-郷#…##幻-想-郷") self.assertEqual(f(s, 19), "幻-想-郷#…#幻-想-郷") self.assertEqual(f(s, 18), "幻-想-郷…#幻-想-郷") self.assertEqual(f(s, 17), "幻-想-郷…幻-想-郷") self.assertEqual(f(s, 16), "幻-想-…#幻-想-郷") self.assertEqual(f(s, 15), "幻-想-…幻-想-郷") self.assertEqual(f(s, 14), "幻-想-…-想-郷") self.assertEqual(f(s, 13), "幻-想-…-想-郷") self.assertEqual(f(s, 12), "幻-想…-想-郷") self.assertEqual(f(s, 11), "幻-想…想-郷") self.assertEqual(f(s, 10), "幻-…-想-郷") self.assertEqual(f(s, 9) , "幻-…想-郷") self.assertEqual(f(s, 3) , "…郷") def test_shorten_eaw_separator(self, f=output.shorten_string_eaw): s = "01234567890123456789" # 20 ascii characters self.assertEqual(f(s, 20, "|---|"), s) self.assertEqual(f(s, 19, "|---|"), "0123456|---|3456789") self.assertEqual(f(s, 15, "|---|"), "01234|---|56789") self.assertEqual(f(s, 10, "|---|"), "01|---|789") self.assertEqual(f(s, 19, "..."), "01234567...23456789") self.assertEqual(f(s, 19, "..") , "01234567..123456789") self.assertEqual(f(s, 19, ".") , "012345678.123456789") self.assertEqual(f(s, 19, "") , "0123456780123456789") def test_shorten_eaw_separator_wide(self, f=output.shorten_string_eaw): s = "幻想郷幻想郷幻想郷幻想郷" # 12 wide characters self.assertEqual(f(s, 24, "|---|"), s) self.assertEqual(f(s, 19, "|---|"), "幻想郷|---|郷幻想郷") self.assertEqual(f(s, 15, "|---|"), "幻想|---|幻想郷") self.assertEqual(f(s, 10, "|---|"), "幻|---|郷") self.assertEqual(f(s, 19, "..."), "幻想郷幻...郷幻想郷") self.assertEqual(f(s, 19, "..") , "幻想郷幻..郷幻想郷") self.assertEqual(f(s, 19, ".") , "幻想郷幻.想郷幻想郷") self.assertEqual(f(s, 19, "") , "幻想郷幻想郷幻想郷") def test_shorten_eaw_separator_mix_(self, f=output.shorten_string_eaw): s = "幻-想-郷##幻-想-郷##幻-想-郷" # mixed characters self.assertEqual(f(s, 30, "|---|"), s) self.assertEqual(f(s, 19, "|---|"), "幻-想-|---|幻-想-郷") self.assertEqual(f(s, 15, "|---|"), "幻-想|---|想-郷") self.assertEqual(f(s, 10, "|---|"), "幻|---|-郷") self.assertEqual(f(s, 19, "..."), "幻-想-郷...幻-想-郷") self.assertEqual(f(s, 19, "..") , "幻-想-郷..#幻-想-郷") self.assertEqual(f(s, 19, ".") , "幻-想-郷#.#幻-想-郷") self.assertEqual(f(s, 19, "") , "幻-想-郷###幻-想-郷") if __name__ == '__main__': unittest.main() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/test/test_postprocessor.py0000644000175000017500000003524214220623232017725 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2019-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License 
version 2 as # published by the Free Software Foundation. import os import sys import unittest from unittest.mock import Mock, mock_open, patch import logging import zipfile import tempfile import collections from datetime import datetime sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from gallery_dl import extractor, output, path # noqa E402 from gallery_dl import postprocessor, config # noqa E402 from gallery_dl.postprocessor.common import PostProcessor # noqa E402 class MockPostprocessorModule(Mock): __postprocessor__ = "mock" class FakeJob(): def __init__(self, extr=extractor.find("test:")): self.extractor = extr self.pathfmt = path.PathFormat(extr) self.out = output.NullOutput() self.get_logger = logging.getLogger self.hooks = collections.defaultdict(list) def register_hooks(self, hooks, options): for hook, callback in hooks.items(): self.hooks[hook].append(callback) class TestPostprocessorModule(unittest.TestCase): def setUp(self): postprocessor._cache.clear() def test_find(self): for name in (postprocessor.modules): cls = postprocessor.find(name) self.assertEqual(cls.__name__, name.capitalize() + "PP") self.assertIs(cls.__base__, PostProcessor) self.assertEqual(postprocessor.find("foo"), None) self.assertEqual(postprocessor.find(1234) , None) self.assertEqual(postprocessor.find(None) , None) @patch("builtins.__import__") def test_cache(self, import_module): import_module.return_value = MockPostprocessorModule() for name in (postprocessor.modules): postprocessor.find(name) self.assertEqual(import_module.call_count, len(postprocessor.modules)) # no new calls to import_module for name in (postprocessor.modules): postprocessor.find(name) self.assertEqual(import_module.call_count, len(postprocessor.modules)) class BasePostprocessorTest(unittest.TestCase): @classmethod def setUpClass(cls): cls.dir = tempfile.TemporaryDirectory() config.set((), "base-directory", cls.dir.name) cls.job = FakeJob() @classmethod def tearDownClass(cls): cls.dir.cleanup() config.clear() def tearDown(self): self.job.hooks.clear() def _create(self, options=None, data=None): kwdict = {"category": "test", "filename": "file", "extension": "ext"} if options is None: options = {} if data is not None: kwdict.update(data) self.pathfmt = self.job.pathfmt self.pathfmt.set_directory(kwdict) self.pathfmt.set_filename(kwdict) pp = postprocessor.find(self.__class__.__name__[:-4].lower()) return pp(self.job, options) def _trigger(self, events=None, *args): for event in (events or ("prepare", "file")): for callback in self.job.hooks[event]: callback(self.pathfmt, *args) class ClassifyTest(BasePostprocessorTest): def test_classify_default(self): pp = self._create() self.assertEqual(pp.mapping, { ext: directory for directory, exts in pp.DEFAULT_MAPPING.items() for ext in exts }) self.pathfmt.set_extension("jpg") pp.prepare(self.pathfmt) path = os.path.join(self.dir.name, "test", "Pictures") self.assertEqual(self.pathfmt.path, path + "/file.jpg") self.assertEqual(self.pathfmt.realpath, path + "/file.jpg") with patch("os.makedirs") as mkdirs: self._trigger() mkdirs.assert_called_once_with(path, exist_ok=True) def test_classify_noop(self): pp = self._create() rp = self.pathfmt.realpath pp.prepare(self.pathfmt) self.assertEqual(self.pathfmt.path, rp) self.assertEqual(self.pathfmt.realpath, rp) with patch("os.makedirs") as mkdirs: self._trigger() self.assertEqual(mkdirs.call_count, 0) def test_classify_custom(self): pp = self._create({"mapping": { "foo/bar": ["foo", "bar"], }}) 
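# the user-supplied directory -> extensions mapping is inverted into a flat
# extension -> directory lookup table, just like DEFAULT_MAPPING in
# test_classify_default above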
self.assertEqual(pp.mapping, { "foo": "foo/bar", "bar": "foo/bar", }) self.pathfmt.set_extension("foo") pp.prepare(self.pathfmt) path = os.path.join(self.dir.name, "test", "foo", "bar") self.assertEqual(self.pathfmt.path, path + "/file.foo") self.assertEqual(self.pathfmt.realpath, path + "/file.foo") with patch("os.makedirs") as mkdirs: self._trigger() mkdirs.assert_called_once_with(path, exist_ok=True) class MetadataTest(BasePostprocessorTest): def test_metadata_default(self): pp = self._create() # default arguments self.assertEqual(pp.write , pp._write_json) self.assertEqual(pp.ascii , False) self.assertEqual(pp.indent , 4) self.assertEqual(pp.extension, "json") def test_metadata_json(self): pp = self._create({ "mode" : "json", "ascii" : True, "indent" : 2, "extension": "JSON", }, { "public" : "hello", "_private" : "world", }) self.assertEqual(pp.write , pp._write_json) self.assertEqual(pp.ascii , True) self.assertEqual(pp.indent , 2) self.assertEqual(pp.extension, "JSON") with patch("builtins.open", mock_open()) as m: self._trigger() path = self.pathfmt.realpath + ".JSON" m.assert_called_once_with(path, "w", encoding="utf-8") self.assertEqual(self._output(m), """{ "category": "test", "extension": "ext", "filename": "file", "public": "hello" } """) def test_metadata_tags(self): pp = self._create( {"mode": "tags"}, {"tags": ["foo", "bar", "baz"]}, ) self.assertEqual(pp.write, pp._write_tags) self.assertEqual(pp.extension, "txt") with patch("builtins.open", mock_open()) as m: self._trigger() path = self.pathfmt.realpath + ".txt" m.assert_called_once_with(path, "w", encoding="utf-8") self.assertEqual(self._output(m), "foo\nbar\nbaz\n") def test_metadata_tags_split_1(self): self._create( {"mode": "tags"}, {"tags": "foo, bar, baz"}, ) with patch("builtins.open", mock_open()) as m: self._trigger() self.assertEqual(self._output(m), "foo\nbar\nbaz\n") def test_metadata_tags_split_2(self): self._create( {"mode": "tags"}, {"tags": "foobar1 foobar2 foobarbaz"}, ) with patch("builtins.open", mock_open()) as m: self._trigger() self.assertEqual(self._output(m), "foobar1\nfoobar2\nfoobarbaz\n") def test_metadata_tags_tagstring(self): self._create( {"mode": "tags"}, {"tag_string": "foo, bar, baz"}, ) with patch("builtins.open", mock_open()) as m: self._trigger() self.assertEqual(self._output(m), "foo\nbar\nbaz\n") def test_metadata_tags_dict(self): self._create( {"mode": "tags"}, {"tags": {"g": ["foobar1", "foobar2"], "m": ["foobarbaz"]}}, ) with patch("builtins.open", mock_open()) as m: self._trigger() self.assertEqual(self._output(m), "foobar1\nfoobar2\nfoobarbaz\n") def test_metadata_custom(self): def test(pp_info): pp = self._create(pp_info, {"foo": "bar"}) self.assertEqual(pp.write, pp._write_custom) self.assertEqual(pp.extension, "txt") self.assertTrue(pp._content_fmt) with patch("builtins.open", mock_open()) as m: self._trigger() self.assertEqual(self._output(m), "bar\nNone\n") self.job.hooks.clear() test({"mode": "custom", "content-format": "{foo}\n{missing}\n"}) test({"mode": "custom", "content-format": ["{foo}", "{missing}"]}) test({"mode": "custom", "format": "{foo}\n{missing}\n"}) def test_metadata_extfmt(self): pp = self._create({ "extension" : "ignored", "extension-format": "json", }) self.assertEqual(pp._filename, pp._filename_extfmt) with patch("builtins.open", mock_open()) as m: self._trigger() path = self.pathfmt.realdirectory + "file.json" m.assert_called_once_with(path, "w", encoding="utf-8") def test_metadata_extfmt_2(self): self._create({ "extension-format": 
"{extension!u}-data:{category:Res/ES/}", }) self.pathfmt.prefix = "2." with patch("builtins.open", mock_open()) as m: self._trigger() path = self.pathfmt.realdirectory + "file.2.EXT-data:tESt" m.assert_called_once_with(path, "w", encoding="utf-8") def test_metadata_directory(self): self._create({ "directory": "metadata", }) with patch("builtins.open", mock_open()) as m: self._trigger() path = self.pathfmt.realdirectory + "metadata/file.ext.json" m.assert_called_once_with(path, "w", encoding="utf-8") def test_metadata_directory_2(self): self._create({ "directory" : "metadata////", "extension-format": "json", }) with patch("builtins.open", mock_open()) as m: self._trigger() path = self.pathfmt.realdirectory + "metadata/file.json" m.assert_called_once_with(path, "w", encoding="utf-8") def test_metadata_filename(self): self._create({ "filename" : "{category}_{filename}_/meta/\n\r.data", "extension-format": "json", }) with patch("builtins.open", mock_open()) as m: self._trigger() path = self.pathfmt.realdirectory + "test_file__meta_.data" m.assert_called_once_with(path, "w", encoding="utf-8") @staticmethod def _output(mock): return "".join( call[1][0] for call in mock.mock_calls if call[0] == "().write" ) class MtimeTest(BasePostprocessorTest): def test_mtime_default(self): pp = self._create() self.assertEqual(pp.key, "date") def test_mtime_datetime(self): self._create(None, {"date": datetime(1980, 1, 1)}) self._trigger() self.assertEqual(self.pathfmt.kwdict["_mtime"], 315532800) def test_mtime_timestamp(self): self._create(None, {"date": 315532800}) self._trigger() self.assertEqual(self.pathfmt.kwdict["_mtime"], 315532800) def test_mtime_custom(self): self._create({"key": "foo"}, {"foo": 315532800}) self._trigger() self.assertEqual(self.pathfmt.kwdict["_mtime"], 315532800) class ZipTest(BasePostprocessorTest): def test_zip_default(self): pp = self._create() self.assertEqual(self.job.hooks["file"][0], pp.write) self.assertEqual(pp.path, self.pathfmt.realdirectory) self.assertEqual(pp.delete, True) self.assertEqual(pp.args, ( pp.path[:-1] + ".zip", "a", zipfile.ZIP_STORED, True, )) self.assertTrue(pp.args[0].endswith("/test.zip")) def test_zip_safe(self): pp = self._create({"mode": "safe"}) self.assertEqual(self.job.hooks["file"][0], pp.write_safe) self.assertEqual(pp.path, self.pathfmt.realdirectory) self.assertEqual(pp.delete, True) self.assertEqual(pp.args, ( pp.path[:-1] + ".zip", "a", zipfile.ZIP_STORED, True, )) self.assertTrue(pp.args[0].endswith("/test.zip")) def test_zip_options(self): pp = self._create({ "keep-files": True, "compression": "zip", "extension": "cbz", }) self.assertEqual(pp.delete, False) self.assertEqual(pp.args, ( pp.path[:-1] + ".cbz", "a", zipfile.ZIP_DEFLATED, True, )) self.assertTrue(pp.args[0].endswith("/test.cbz")) def test_zip_write(self): pp = self._create() with tempfile.NamedTemporaryFile("w", dir=self.dir.name) as file: file.write("foobar\n") # write dummy file with 3 different names for i in range(3): name = "file{}.ext".format(i) self.pathfmt.temppath = file.name self.pathfmt.filename = name self._trigger() nti = pp.zfile.NameToInfo self.assertEqual(len(nti), i+1) self.assertIn(name, nti) # check file contents self.assertEqual(len(nti), 3) self.assertIn("file0.ext", nti) self.assertIn("file1.ext", nti) self.assertIn("file2.ext", nti) # write the last file a second time (will be skipped) self._trigger() self.assertEqual(len(pp.zfile.NameToInfo), 3) # close file self._trigger(("finalize",), 0) # reopen to check persistence with 
zipfile.ZipFile(pp.zfile.filename) as file: nti = file.NameToInfo self.assertEqual(len(pp.zfile.NameToInfo), 3) self.assertIn("file0.ext", pp.zfile.NameToInfo) self.assertIn("file1.ext", pp.zfile.NameToInfo) self.assertIn("file2.ext", pp.zfile.NameToInfo) os.unlink(pp.zfile.filename) def test_zip_write_mock(self): def side_effect(_, name): pp.zfile.NameToInfo.add(name) pp = self._create() pp.zfile = Mock() pp.zfile.NameToInfo = set() pp.zfile.write.side_effect = side_effect # write 3 files for i in range(3): self.pathfmt.temppath = self.pathfmt.realdirectory + "file.ext" self.pathfmt.filename = "file{}.ext".format(i) self._trigger() # write the last file a second time (should be skipped) self._trigger() # close file self._trigger(("finalize",), 0) self.assertEqual(pp.zfile.write.call_count, 3) for call in pp.zfile.write.call_args_list: args, kwargs = call self.assertEqual(len(args), 2) self.assertEqual(len(kwargs), 0) self.assertEqual(args[0], self.pathfmt.temppath) self.assertRegex(args[1], r"file\d\.ext") self.assertEqual(pp.zfile.close.call_count, 1) if __name__ == "__main__": unittest.main() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1646253139.0 gallery_dl-1.21.1/test/test_results.py0000644000175000017500000003041414207752123016504 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2015-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import os import sys import unittest import re import json import hashlib import datetime sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from gallery_dl import \ extractor, util, job, config, exception, formatter # noqa E402 # temporary issues, etc. 
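# Extractor categories listed in BROKEN are excluded from the default test
# run: generate_tests() below builds a blacklist from this set unless other
# categories are given on the command line. Illustrative invocations
# (assuming the script is started from the test directory):
#   python test_results.py pinterest   # test only the "pinterest" extractors
#   python test_results.py broken      # test only the categories in BROKEN
#   python test_results.py all         # test everything, BROKEN included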
BROKEN = { "photobucket", } class TestExtractorResults(unittest.TestCase): def setUp(self): setup_test_config() def tearDown(self): config.clear() @classmethod def setUpClass(cls): cls._skipped = [] @classmethod def tearDownClass(cls): if cls._skipped: print("\n\nSkipped tests:") for url, exc in cls._skipped: print('- {} ("{}")'.format(url, exc)) def _run_test(self, extr, url, result): if result: if "options" in result: for key, value in result["options"]: key = key.split(".") config.set(key[:-1], key[-1], value) if "range" in result: config.set((), "image-range" , result["range"]) config.set((), "chapter-range", result["range"]) content = "content" in result else: content = False tjob = ResultJob(url, content=content) self.assertEqual(extr, tjob.extractor.__class__) if not result: return if "exception" in result: with self.assertRaises(result["exception"]): tjob.run() return try: tjob.run() except exception.StopExtraction: pass except exception.HttpError as exc: exc = str(exc) if re.match(r"'5\d\d ", exc) or \ re.search(r"\bRead timed out\b", exc): self._skipped.append((url, exc)) self.skipTest(exc) raise if result.get("archive", True): self.assertEqual( len(set(tjob.archive_list)), len(tjob.archive_list), "archive-id uniqueness", ) if tjob.queue: # test '_extractor' entries for url, kwdict in zip(tjob.url_list, tjob.kwdict_list): if "_extractor" in kwdict: extr = kwdict["_extractor"].from_url(url) if extr is None and not result.get("extractor", True): continue self.assertIsInstance(extr, kwdict["_extractor"]) self.assertEqual(extr.url, url) else: # test 'extension' entries for kwdict in tjob.kwdict_list: self.assertIn("extension", kwdict) # test extraction results if "url" in result: self.assertEqual(result["url"], tjob.url_hash.hexdigest()) if "content" in result: expected = result["content"] digest = tjob.content_hash.hexdigest() if isinstance(expected, str): self.assertEqual(digest, expected, "content") else: # assume iterable self.assertIn(digest, expected, "content") if "keyword" in result: expected = result["keyword"] if isinstance(expected, dict): for kwdict in tjob.kwdict_list: self._test_kwdict(kwdict, expected) else: # assume SHA1 hash self.assertEqual(expected, tjob.kwdict_hash.hexdigest()) if "count" in result: count = result["count"] if isinstance(count, str): self.assertRegex(count, r"^ *(==|!=|<|<=|>|>=) *\d+ *$") expr = "{} {}".format(len(tjob.url_list), count) self.assertTrue(eval(expr), msg=expr) else: # assume integer self.assertEqual(len(tjob.url_list), count) if "pattern" in result: self.assertGreater(len(tjob.url_list), 0) for url in tjob.url_list: self.assertRegex(url, result["pattern"]) def _test_kwdict(self, kwdict, tests): for key, test in tests.items(): if key.startswith("?"): key = key[1:] if key not in kwdict: continue self.assertIn(key, kwdict) value = kwdict[key] if isinstance(test, dict): self._test_kwdict(value, test) elif isinstance(test, type): self.assertIsInstance(value, test, msg=key) elif isinstance(test, list): subtest = False for idx, item in enumerate(test): if isinstance(item, dict): subtest = True self._test_kwdict(value[idx], item) if not subtest: self.assertEqual(value, test, msg=key) elif isinstance(test, str): if test.startswith("re:"): self.assertRegex(value, test[3:], msg=key) elif test.startswith("dt:"): self.assertIsInstance(value, datetime.datetime, msg=key) self.assertEqual(str(value), test[3:], msg=key) elif test.startswith("type:"): self.assertEqual(type(value).__name__, test[5:], msg=key) else: self.assertEqual(value, test, 
msg=key) else: self.assertEqual(value, test, msg=key) class ResultJob(job.DownloadJob): """Generate test-results for extractor runs""" def __init__(self, url, parent=None, content=False): job.DownloadJob.__init__(self, url, parent) self.queue = False self.content = content self.url_list = [] self.url_hash = hashlib.sha1() self.kwdict_list = [] self.kwdict_hash = hashlib.sha1() self.archive_list = [] self.archive_hash = hashlib.sha1() self.content_hash = hashlib.sha1() if content: self.fileobj = TestPathfmt(self.content_hash) self.format_directory = TestFormatter( "".join(self.extractor.directory_fmt)).format_map self.format_filename = TestFormatter( self.extractor.filename_fmt).format_map def run(self): for msg in self.extractor: self.dispatch(msg) def handle_url(self, url, kwdict, fallback=None): self._update_url(url) self._update_kwdict(kwdict) self._update_archive(kwdict) self._update_content(url, kwdict) self.format_filename(kwdict) def handle_directory(self, kwdict): self._update_kwdict(kwdict, False) self.format_directory(kwdict) def handle_metadata(self, kwdict): pass def handle_queue(self, url, kwdict): self.queue = True self._update_url(url) self._update_kwdict(kwdict) def _update_url(self, url): self.url_list.append(url) self.url_hash.update(url.encode()) def _update_kwdict(self, kwdict, to_list=True): if to_list: self.kwdict_list.append(kwdict.copy()) kwdict = util.filter_dict(kwdict) self.kwdict_hash.update( json.dumps(kwdict, sort_keys=True, default=str).encode()) def _update_archive(self, kwdict): archive_id = self.extractor.archive_fmt.format_map(kwdict) self.archive_list.append(archive_id) self.archive_hash.update(archive_id.encode()) def _update_content(self, url, kwdict): if self.content: scheme = url.partition(":")[0] self.fileobj.kwdict = kwdict self.get_downloader(scheme).download(url, self.fileobj) class TestPathfmt(): def __init__(self, hashobj): self.hashobj = hashobj self.path = "" self.size = 0 self.kwdict = {} self.extension = "jpg" def __enter__(self): return self def __exit__(self, *args): pass def open(self, mode): self.size = 0 return self def write(self, content): """Update SHA1 hash""" self.size += len(content) self.hashobj.update(content) def tell(self): return self.size def part_size(self): return 0 class TestFormatter(formatter.StringFormatter): @staticmethod def _noop(_): return "" def _apply_simple(self, key, fmt): if key == "extension" or "_parse_optional." in repr(fmt): return self._noop def wrap(obj): return fmt(obj[key]) return wrap def _apply(self, key, funcs, fmt): if key == "extension" or "_parse_optional." 
in repr(fmt): return self._noop def wrap(obj): obj = obj[key] for func in funcs: obj = func(obj) return fmt(obj) return wrap def setup_test_config(): name = "gallerydl" email = "gallerydl@openaliasbox.org" email2 = "gallerydl@protonmail.com" config.clear() config.set(("cache",), "file", None) config.set(("downloader",), "part", False) config.set(("downloader",), "adjust-extensions", False) config.set(("extractor" ,), "timeout" , 60) config.set(("extractor" ,), "username", name) config.set(("extractor" ,), "password", name) config.set(("extractor", "nijie") , "username", email) config.set(("extractor", "seiga") , "username", email) config.set(("extractor", "pinterest") , "username", email2) config.set(("extractor", "pinterest") , "username", None) # login broken config.set(("extractor", "newgrounds"), "username", "d1618111") config.set(("extractor", "newgrounds"), "password", "d1618111") config.set(("extractor", "mangoxo") , "username", "LiQiang3") config.set(("extractor", "mangoxo") , "password", "5zbQF10_5u25259Ma") for category in ("danbooru", "instagram", "twitter", "subscribestar", "e621", "atfbooru", "inkbunny", "tapas", "pillowfort", "mangadex"): config.set(("extractor", category), "username", None) config.set(("extractor", "mastodon.social"), "access-token", "Blf9gVqG7GytDTfVMiyYQjwVMQaNACgf3Ds3IxxVDUQ") config.set(("extractor", "deviantart"), "client-id", "7777") config.set(("extractor", "deviantart"), "client-secret", "ff14994c744d9208e5caeec7aab4a026") config.set(("extractor", "tumblr"), "api-key", "0cXoHfIqVzMQcc3HESZSNsVlulGxEXGDTTZCDrRrjaa0jmuTc6") config.set(("extractor", "tumblr"), "api-secret", "6wxAK2HwrXdedn7VIoZWxGqVhZ8JdYKDLjiQjL46MLqGuEtyVj") config.set(("extractor", "tumblr"), "access-token", "N613fPV6tOZQnyn0ERTuoEZn0mEqG8m2K8M3ClSJdEHZJuqFdG") config.set(("extractor", "tumblr"), "access-token-secret", "sgOA7ZTT4FBXdOGGVV331sSp0jHYp4yMDRslbhaQf7CaS71i4O") def generate_tests(): """Dynamically generate extractor unittests""" def _generate_test(extr, tcase): def test(self): url, result = tcase print("\n", url, sep="") self._run_test(extr, url, result) return test # enable selective testing for direct calls if __name__ == '__main__' and len(sys.argv) > 1: categories = sys.argv[1:] negate = False if categories[0].lower() == "all": categories = () negate = True elif categories[0].lower() == "broken": categories = BROKEN del sys.argv[1:] else: categories = BROKEN negate = True if categories: print("skipping:", ", ".join(categories)) fltr = util.build_extractor_filter(categories, negate=negate) # add 'test_...' methods for extr in filter(fltr, extractor.extractors()): name = "test_" + extr.__name__ + "_" for num, tcase in enumerate(extr._get_tests(), 1): test = _generate_test(extr, tcase) test.__name__ = name + str(num) setattr(TestExtractorResults, test.__name__, test) generate_tests() if __name__ == '__main__': unittest.main(warnings='ignore') ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1648567962.0 gallery_dl-1.21.1/test/test_text.py0000644000175000017500000003465414220623232015772 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2015-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
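# Unit tests for the gallery_dl.text helper functions: HTML stripping and
# splitting, URL/filename/extension helpers, the extract()/rextract() family,
# and the parse_* conversion utilities exercised below.
# Rough usage sketch (illustrative, based on the assertions in this file):
#   text.nameext_from_url("http://example.org/v2/filename.ext?param=value")
#   # -> {"filename": "filename", "extension": "ext"}
#   text.parse_bytes("0.5M")   # -> 524288
#   text.parse_int("zzz", 1)   # -> 1 (the 'default' argument)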
import os import sys import unittest import datetime sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from gallery_dl import text # noqa E402 INVALID = ((), [], {}, None, 1, 2.3) INVALID_ALT = ((), [], {}, None, "") class TestText(unittest.TestCase): def test_remove_html(self, f=text.remove_html): result = "Hello World." # standard usage self.assertEqual(f(""), "") self.assertEqual(f("Hello World."), result) self.assertEqual(f(" Hello World. "), result) self.assertEqual(f("Hello
    World."), result) self.assertEqual( f("
    HelloWorld.
    "), result) # empty HTML self.assertEqual(f("
    "), "") self.assertEqual(f("
    "), "") # malformed HTML self.assertEqual(f(""), "") self.assertEqual(f(""), "") # invalid arguments for value in INVALID: self.assertEqual(f(value), "") def test_split_html(self, f=text.split_html): result = ["Hello", "World."] empty = [] # standard usage self.assertEqual(f(""), empty) self.assertEqual(f("Hello World."), ["Hello World."]) self.assertEqual(f(" Hello World. "), ["Hello World."]) self.assertEqual(f("Hello
    World."), result) self.assertEqual(f(" Hello
    World. "), result) self.assertEqual( f("
    HelloWorld.
    "), result) # escaped HTML entities self.assertEqual( f("<foo> <bar> "), ["", ""]) # empty HTML self.assertEqual(f("
    "), empty) self.assertEqual(f("
    "), empty) # malformed HTML self.assertEqual(f(""), empty) self.assertEqual(f(""), empty) # invalid arguments for value in INVALID: self.assertEqual(f(value), empty) def test_ensure_http_scheme(self, f=text.ensure_http_scheme): result = "https://example.org/filename.ext" # standard usage self.assertEqual(f(""), "") self.assertEqual(f("example.org/filename.ext"), result) self.assertEqual(f("/example.org/filename.ext"), result) self.assertEqual(f("//example.org/filename.ext"), result) self.assertEqual(f("://example.org/filename.ext"), result) # no change self.assertEqual(f(result), result) self.assertEqual( f("http://example.org/filename.ext"), "http://example.org/filename.ext", ) # ... self.assertEqual( f("htp://example.org/filename.ext"), "https://htp://example.org/filename.ext", ) # invalid arguments for value in INVALID_ALT: self.assertEqual(f(value), value) def test_root_from_url(self, f=text.root_from_url): result = "https://example.org" self.assertEqual(f("https://example.org/") , result) self.assertEqual(f("https://example.org/path"), result) self.assertEqual(f("example.org/") , result) self.assertEqual(f("example.org/path/") , result) result = "http://example.org" self.assertEqual(f("http://example.org/") , result) self.assertEqual(f("http://example.org/path/"), result) self.assertEqual(f("example.org/", "http://") , result) def test_filename_from_url(self, f=text.filename_from_url): result = "filename.ext" # standard usage self.assertEqual(f(""), "") self.assertEqual(f("filename.ext"), result) self.assertEqual(f("/filename.ext"), result) self.assertEqual(f("example.org/filename.ext"), result) self.assertEqual(f("http://example.org/v2/filename.ext"), result) self.assertEqual( f("http://example.org/v2/filename.ext?param=value#frag"), result) # invalid arguments for value in INVALID: self.assertEqual(f(value), "") def test_ext_from_url(self, f=text.ext_from_url): result = "ext" # standard usage self.assertEqual(f(""), "") self.assertEqual(f("filename"), "") self.assertEqual(f("filename.ext"), result) self.assertEqual(f("/filename.ExT"), result) self.assertEqual(f("example.org/filename.ext"), result) self.assertEqual(f("http://example.org/v2/filename.ext"), result) self.assertEqual( f("http://example.org/v2/filename.ext?param=value#frag"), result) # invalid arguments for value in INVALID: self.assertEqual(f(value), "") def test_nameext_from_url(self, f=text.nameext_from_url): empty = {"filename": "", "extension": ""} result = {"filename": "filename", "extension": "ext"} # standard usage self.assertEqual(f(""), empty) self.assertEqual(f("filename.ext"), result) self.assertEqual(f("/filename.ExT"), result) self.assertEqual(f("example.org/filename.ext"), result) self.assertEqual(f("http://example.org/v2/filename.ext"), result) self.assertEqual( f("http://example.org/v2/filename.ext?param=value#frag"), result) # long "extension" fn = "httpswww.example.orgpath-path-path-path-path-path-path-path" self.assertEqual(f(fn), {"filename": fn, "extension": ""}) # invalid arguments for value in INVALID: self.assertEqual(f(value), empty) def test_extract(self, f=text.extract): txt = "
    " self.assertEqual(f(txt, "<", ">"), ("a" , 3)) self.assertEqual(f(txt, "X", ">"), (None, 0)) self.assertEqual(f(txt, "<", "X"), (None, 0)) # 'pos' argument for i in range(1, 4): self.assertEqual(f(txt, "<", ">", i), ("b", 6)) for i in range(4, 10): self.assertEqual(f(txt, "<", ">", i), (None, i)) # invalid arguments for value in INVALID: self.assertEqual(f(value, "<" , ">") , (None, 0)) self.assertEqual(f(txt , value, ">") , (None, 0)) self.assertEqual(f(txt , "<" , value), (None, 0)) def test_rextract(self, f=text.rextract): txt = "" self.assertEqual(f(txt, "<", ">"), ("b" , 3)) self.assertEqual(f(txt, "X", ">"), (None, -1)) self.assertEqual(f(txt, "<", "X"), (None, -1)) # 'pos' argument for i in range(10, 3, -1): self.assertEqual(f(txt, "<", ">", i), ("b", 3)) for i in range(3, 0, -1): self.assertEqual(f(txt, "<", ">", i), ("a", 0)) # invalid arguments for value in INVALID: self.assertEqual(f(value, "<" , ">") , (None, -1)) self.assertEqual(f(txt , value, ">") , (None, -1)) self.assertEqual(f(txt , "<" , value), (None, -1)) def test_extract_all(self, f=text.extract_all): txt = "[c][b][a]: xyz! [d][e" self.assertEqual( f(txt, ()), ({}, 0)) self.assertEqual( f(txt, (("C", "[", "]"), ("B", "[", "]"), ("A", "[", "]"))), ({"A": "a", "B": "b", "C": "c"}, 9), ) # 'None' as field name self.assertEqual( f(txt, ((None, "[", "]"), (None, "[", "]"), ("A", "[", "]"))), ({"A": "a"}, 9), ) self.assertEqual( f(txt, ((None, "[", "]"), (None, "[", "]"), (None, "[", "]"))), ({}, 9), ) # failed matches self.assertEqual( f(txt, (("C", "[", "]"), ("X", "X", "X"), ("B", "[", "]"))), ({"B": "b", "C": "c", "X": None}, 6), ) # 'pos' argument self.assertEqual( f(txt, (("B", "[", "]"), ("A", "[", "]")), pos=1), ({"A": "a", "B": "b"}, 9), ) # 'values' argument self.assertEqual( f(txt, (("C", "[", "]"),), values={"A": "a", "B": "b"}), ({"A": "a", "B": "b", "C": "c"}, 3), ) vdict = {} rdict, pos = f(txt, (), values=vdict) self.assertIs(vdict, rdict) def test_extract_iter(self, f=text.extract_iter): txt = "[c][b][a]: xyz! [d][e" def g(*args): return list(f(*args)) self.assertEqual( g("", "[", "]"), []) self.assertEqual( g("[a]", "[", "]"), ["a"]) self.assertEqual( g(txt, "[", "]"), ["c", "b", "a", "d"]) self.assertEqual( g(txt, "X", "X"), []) self.assertEqual( g(txt, "[", "]", 6), ["a", "d"]) def test_extract_from(self, f=text.extract_from): txt = "[c][b][a]: xyz! 
[d][e" e = f(txt) self.assertEqual(e("[", "]"), "c") self.assertEqual(e("[", "]"), "b") self.assertEqual(e("[", "]"), "a") self.assertEqual(e("[", "]"), "d") self.assertEqual(e("[", "]"), "") self.assertEqual(e("[", "]"), "") e = f(txt, pos=6, default="END") self.assertEqual(e("[", "]"), "a") self.assertEqual(e("[", "]"), "d") self.assertEqual(e("[", "]"), "END") self.assertEqual(e("[", "]"), "END") def test_parse_unicode_escapes(self, f=text.parse_unicode_escapes): self.assertEqual(f(""), "") self.assertEqual(f("foobar"), "foobar") self.assertEqual(f("foo’bar"), "foo’bar") self.assertEqual(f("foo\\u2019bar"), "foo’bar") self.assertEqual(f("foo\\u201bar"), "foo‛ar") self.assertEqual(f("foo\\u201zar"), "foo\\u201zar") self.assertEqual( f("\\u2018foo\\u2019\\u2020bar\\u00ff"), "‘foo’†barÿ", ) def test_parse_bytes(self, f=text.parse_bytes): self.assertEqual(f("0"), 0) self.assertEqual(f("50"), 50) self.assertEqual(f("50k"), 50 * 1024**1) self.assertEqual(f("50m"), 50 * 1024**2) self.assertEqual(f("50g"), 50 * 1024**3) self.assertEqual(f("50t"), 50 * 1024**4) self.assertEqual(f("50p"), 50 * 1024**5) # fractions self.assertEqual(f("123.456"), 123) self.assertEqual(f("123.567"), 124) self.assertEqual(f("0.5M"), round(0.5 * 1024**2)) # invalid arguments for value in INVALID_ALT: self.assertEqual(f(value), 0) self.assertEqual(f("NaN"), 0) self.assertEqual(f("invalid"), 0) self.assertEqual(f(" 123 kb "), 0) def test_parse_int(self, f=text.parse_int): self.assertEqual(f(0), 0) self.assertEqual(f("0"), 0) self.assertEqual(f(123), 123) self.assertEqual(f("123"), 123) # invalid arguments for value in INVALID_ALT: self.assertEqual(f(value), 0) self.assertEqual(f("123.456"), 0) self.assertEqual(f("zzz"), 0) self.assertEqual(f([1, 2, 3]), 0) self.assertEqual(f({1: 2, 3: 4}), 0) # 'default' argument default = "default" for value in INVALID_ALT: self.assertEqual(f(value, default), default) self.assertEqual(f("zzz", default), default) def test_parse_float(self, f=text.parse_float): self.assertEqual(f(0), 0.0) self.assertEqual(f("0"), 0.0) self.assertEqual(f(123), 123.0) self.assertEqual(f("123"), 123.0) self.assertEqual(f(123.456), 123.456) self.assertEqual(f("123.456"), 123.456) # invalid arguments for value in INVALID_ALT: self.assertEqual(f(value), 0.0) self.assertEqual(f("zzz"), 0.0) self.assertEqual(f([1, 2, 3]), 0.0) self.assertEqual(f({1: 2, 3: 4}), 0.0) # 'default' argument default = "default" for value in INVALID_ALT: self.assertEqual(f(value, default), default) self.assertEqual(f("zzz", default), default) def test_parse_query(self, f=text.parse_query): # standard usage self.assertEqual(f(""), {}) self.assertEqual(f("foo=1"), {"foo": "1"}) self.assertEqual(f("foo=1&bar=2"), {"foo": "1", "bar": "2"}) # missing value self.assertEqual(f("bar"), {}) self.assertEqual(f("foo=1&bar"), {"foo": "1"}) self.assertEqual(f("foo=1&bar&baz=3"), {"foo": "1", "baz": "3"}) # keys with identical names self.assertEqual(f("foo=1&foo=2"), {"foo": "1"}) self.assertEqual( f("foo=1&bar=2&foo=3&bar=4"), {"foo": "1", "bar": "2"}, ) # invalid arguments for value in INVALID: self.assertEqual(f(value), {}) def test_parse_timestamp(self, f=text.parse_timestamp): null = datetime.datetime.utcfromtimestamp(0) value = datetime.datetime.utcfromtimestamp(1555816235) self.assertEqual(f(0) , null) self.assertEqual(f("0") , null) self.assertEqual(f(1555816235) , value) self.assertEqual(f("1555816235"), value) for value in INVALID_ALT: self.assertEqual(f(value), None) self.assertEqual(f(value, "foo"), "foo") def test_parse_datetime(self, 
f=text.parse_datetime): null = datetime.datetime.utcfromtimestamp(0) self.assertEqual(f("1970-01-01T00:00:00+00:00"), null) self.assertEqual(f("1970-01-01T00:00:00+0000") , null) self.assertEqual(f("1970.01.01", "%Y.%m.%d") , null) self.assertEqual( f("2019-05-07T21:25:02+09:00"), datetime.datetime(2019, 5, 7, 12, 25, 2), ) self.assertEqual( f("2019-05-07T21:25:02+0900"), datetime.datetime(2019, 5, 7, 12, 25, 2), ) self.assertEqual( f("2019-05-07T21:25:02.753+0900", "%Y-%m-%dT%H:%M:%S.%f%z"), datetime.datetime(2019, 5, 7, 12, 25, 2), ) self.assertEqual( f("2019-05-07T21:25:02", "%Y-%m-%dT%H:%M:%S", utcoffset=9), datetime.datetime(2019, 5, 7, 12, 25, 2), ) self.assertEqual( f("2019-05-07 21:25:02"), "2019-05-07 21:25:02", ) for value in INVALID: self.assertEqual(f(value), None) self.assertEqual(f("1970.01.01"), "1970.01.01") if __name__ == '__main__': unittest.main() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1649421579.0 gallery_dl-1.21.1/test/test_util.py0000644000175000017500000005121014224026413015750 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2015-2022 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import os import sys import unittest import io import random import string import datetime import http.cookiejar sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from gallery_dl import util, text, exception # noqa E402 class TestRange(unittest.TestCase): def test_parse_range(self, f=util.RangePredicate.parse_range): self.assertEqual( f(""), []) self.assertEqual( f("1-2"), [(1, 2)]) self.assertEqual( f("-"), [(1, sys.maxsize)]) self.assertEqual( f("-2,4,6-8,10-"), [(1, 2), (4, 4), (6, 8), (10, sys.maxsize)]) self.assertEqual( f(" - 3 , 4- 4, 2-6"), [(1, 3), (4, 4), (2, 6)]) def test_optimize_range(self, f=util.RangePredicate.optimize_range): self.assertEqual( f([]), []) self.assertEqual( f([(2, 4)]), [(2, 4)]) self.assertEqual( f([(2, 4), (6, 8), (10, 12)]), [(2, 4), (6, 8), (10, 12)]) self.assertEqual( f([(2, 4), (4, 6), (5, 8)]), [(2, 8)]) self.assertEqual( f([(1, 1), (2, 2), (3, 6), (8, 9)]), [(1, 6), (8, 9)]) class TestPredicate(unittest.TestCase): def test_range_predicate(self): dummy = None pred = util.RangePredicate(" - 3 , 4- 4, 2-6") for i in range(6): self.assertTrue(pred(dummy, dummy)) with self.assertRaises(exception.StopExtraction): bool(pred(dummy, dummy)) pred = util.RangePredicate("1, 3, 5") self.assertTrue(pred(dummy, dummy)) self.assertFalse(pred(dummy, dummy)) self.assertTrue(pred(dummy, dummy)) self.assertFalse(pred(dummy, dummy)) self.assertTrue(pred(dummy, dummy)) with self.assertRaises(exception.StopExtraction): bool(pred(dummy, dummy)) pred = util.RangePredicate("") with self.assertRaises(exception.StopExtraction): bool(pred(dummy, dummy)) def test_unique_predicate(self): dummy = None pred = util.UniquePredicate() # no duplicates self.assertTrue(pred("1", dummy)) self.assertTrue(pred("2", dummy)) self.assertFalse(pred("1", dummy)) self.assertFalse(pred("2", dummy)) self.assertTrue(pred("3", dummy)) self.assertFalse(pred("3", dummy)) # duplicates for "text:" self.assertTrue(pred("text:123", dummy)) self.assertTrue(pred("text:123", dummy)) self.assertTrue(pred("text:123", dummy)) def test_filter_predicate(self): url = "" pred = util.FilterPredicate("a < 3") self.assertTrue(pred(url, {"a": 2})) self.assertFalse(pred(url, 
{"a": 3})) with self.assertRaises(SyntaxError): util.FilterPredicate("(") with self.assertRaises(exception.FilterError): util.FilterPredicate("a > 1")(url, {"a": None}) with self.assertRaises(exception.FilterError): util.FilterPredicate("b > 1")(url, {"a": 2}) def test_build_predicate(self): pred = util.build_predicate([]) self.assertIsInstance(pred, type(lambda: True)) pred = util.build_predicate([util.UniquePredicate()]) self.assertIsInstance(pred, util.UniquePredicate) pred = util.build_predicate([util.UniquePredicate(), util.UniquePredicate()]) self.assertIs(pred.func, util.chain_predicates) class TestISO639_1(unittest.TestCase): def test_code_to_language(self): d = "default" self._run_test(util.code_to_language, { ("en",): "English", ("FR",): "French", ("ja",): "Japanese", ("xx",): None, ("" ,): None, (None,): None, ("en", d): "English", ("FR", d): "French", ("xx", d): d, ("" , d): d, (None, d): d, }) def test_language_to_code(self): d = "default" self._run_test(util.language_to_code, { ("English",): "en", ("fRENch",): "fr", ("Japanese",): "ja", ("xx",): None, ("" ,): None, (None,): None, ("English", d): "en", ("fRENch", d): "fr", ("xx", d): d, ("" , d): d, (None, d): d, }) def _run_test(self, func, tests): for args, result in tests.items(): self.assertEqual(func(*args), result) class TestCookiesTxt(unittest.TestCase): def test_load_cookiestxt(self): def _assert(content, expected): cookies = util.load_cookiestxt(io.StringIO(content, None)) for c, e in zip(cookies, expected): self.assertEqual(c.__dict__, e.__dict__) _assert("", []) _assert("\n\n\n", []) _assert("$ Comment", []) _assert("# Comment", []) _assert(" # Comment \n\n $ Comment ", []) _assert( ".example.org\tTRUE\t/\tTRUE\t0\tname\tvalue", [self._cookie("name", "value", ".example.org")], ) _assert( ".example.org\tTRUE\t/\tTRUE\t\tname\t", [self._cookie("name", "", ".example.org")], ) _assert( "\tTRUE\t/\tTRUE\t\tname\t", [self._cookie("name", "", "")], ) _assert( "# Netscape HTTP Cookie File\n" "\n" "# default\n" ".example.org TRUE / FALSE 0 n1 v1\n" ".example.org TRUE / TRUE 2145945600 n2 v2\n" ".example.org TRUE /path FALSE 0 n3\n" "\n" " # # extra # # \n" "www.example.org FALSE / FALSE n4 \n" "www.example.org FALSE /path FALSE 100 n5 v5\n", [ self._cookie( "n1", "v1", ".example.org", True, "/", False), self._cookie( "n2", "v2", ".example.org", True, "/", True, 2145945600), self._cookie( "n3", None, ".example.org", True, "/path", False), self._cookie( "n4", "" , "www.example.org", False, "/", False), self._cookie( "n5", "v5", "www.example.org", False, "/path", False, 100), ], ) with self.assertRaises(ValueError): util.load_cookiestxt("example.org\tTRUE\t/\tTRUE\t0\tname") def test_save_cookiestxt(self): def _assert(cookies, expected): fp = io.StringIO(newline=None) util.save_cookiestxt(fp, cookies) self.assertMultiLineEqual(fp.getvalue(), expected) _assert([], "# Netscape HTTP Cookie File\n\n") _assert( [self._cookie("name", "value", ".example.org")], "# Netscape HTTP Cookie File\n\n" ".example.org\tTRUE\t/\tTRUE\t0\tname\tvalue\n", ) _assert( [ self._cookie( "n1", "v1", ".example.org", True, "/", False), self._cookie( "n2", "v2", ".example.org", True, "/", True, 2145945600), self._cookie( "n3", None, ".example.org", True, "/path", False), self._cookie( "n4", "" , "www.example.org", False, "/", False), self._cookie( "n5", "v5", "www.example.org", False, "/path", False, 100), self._cookie( "n6", "v6", "", False), ], "# Netscape HTTP Cookie File\n" "\n" ".example.org TRUE / FALSE 0 n1 v1\n" ".example.org TRUE / TRUE 2145945600 
n2 v2\n" ".example.org TRUE /path FALSE 0 n3\n" "www.example.org FALSE / FALSE 0 n4 \n" "www.example.org FALSE /path FALSE 100 n5 v5\n", ) def _cookie(self, name, value, domain, domain_specified=True, path="/", secure=True, expires=None): return http.cookiejar.Cookie( 0, name, value, None, False, domain, domain_specified, domain.startswith("."), path, False, secure, expires, False, None, None, {}, ) class TestOther(unittest.TestCase): def test_bencode(self): self.assertEqual(util.bencode(0), "") self.assertEqual(util.bencode(123), "123") self.assertEqual(util.bencode(123, "01"), "1111011") self.assertEqual(util.bencode(123, "BA"), "AAAABAA") def test_bdecode(self): self.assertEqual(util.bdecode(""), 0) self.assertEqual(util.bdecode("123"), 123) self.assertEqual(util.bdecode("1111011", "01"), 123) self.assertEqual(util.bdecode("AAAABAA", "BA"), 123) def test_bencode_bdecode(self): for _ in range(100): value = random.randint(0, 1000000) for alphabet in ("01", "0123456789", string.ascii_letters): result = util.bdecode(util.bencode(value, alphabet), alphabet) self.assertEqual(result, value) def test_advance(self): items = range(5) self.assertCountEqual( util.advance(items, 0), items) self.assertCountEqual( util.advance(items, 3), range(3, 5)) self.assertCountEqual( util.advance(items, 9), []) self.assertCountEqual( util.advance(util.advance(items, 1), 2), range(3, 5)) def test_unique(self): self.assertSequenceEqual( list(util.unique("")), "") self.assertSequenceEqual( list(util.unique("AABBCC")), "ABC") self.assertSequenceEqual( list(util.unique("ABABABCAABBCC")), "ABC") self.assertSequenceEqual( list(util.unique([1, 2, 1, 3, 2, 1])), [1, 2, 3]) def test_unique_sequence(self): self.assertSequenceEqual( list(util.unique_sequence("")), "") self.assertSequenceEqual( list(util.unique_sequence("AABBCC")), "ABC") self.assertSequenceEqual( list(util.unique_sequence("ABABABCAABBCC")), "ABABABCABC") self.assertSequenceEqual( list(util.unique_sequence([1, 2, 1, 3, 2, 1])), [1, 2, 1, 3, 2, 1]) def test_contains(self): c = [1, "2", 3, 4, "5", "foo"] self.assertTrue(util.contains(c, 1)) self.assertTrue(util.contains(c, "foo")) self.assertTrue(util.contains(c, [1, 3, "5"])) self.assertTrue(util.contains(c, ["a", "b", "5"])) self.assertFalse(util.contains(c, "bar")) self.assertFalse(util.contains(c, [2, 5, "bar"])) s = "1 2 3 asd qwe y(+)c f(+)(-) bar" self.assertTrue(util.contains(s, "y(+)c")) self.assertTrue(util.contains(s, ["asd", "qwe", "yxc"])) self.assertTrue(util.contains(s, ["sdf", "dfg", "qwe"])) self.assertFalse(util.contains(s, "tag1")) self.assertFalse(util.contains(s, ["tag1", "tag2", "tag3"])) s = "1, 2, 3, asd, qwe, y(+)c, f(+)(-), bar" self.assertTrue(util.contains(s, "y(+)c", ", ")) self.assertTrue(util.contains(s, ["sdf", "dfg", "qwe"], ", ")) self.assertFalse(util.contains(s, "tag1", ", ")) def test_raises(self): func = util.raises(Exception) with self.assertRaises(Exception): func() func = util.raises(ValueError) with self.assertRaises(ValueError): func(1) with self.assertRaises(ValueError): func(2) with self.assertRaises(ValueError): func(3) def test_identity(self): for value in (123, "foo", [1, 2, 3], (1, 2, 3), {1: 2}, None): self.assertIs(util.identity(value), value) def test_noop(self): self.assertEqual(util.noop(), None) def test_compile_expression(self): expr = util.compile_expression("1 + 2 * 3") self.assertEqual(expr(), 7) self.assertEqual(expr({"a": 1, "b": 2, "c": 3}), 7) self.assertEqual(expr({"a": 9, "b": 9, "c": 9}), 7) expr = util.compile_expression("a + b * c") 
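# compile_expression() evidently returns a callable that evaluates the compiled
# expression against an optional mapping of variable names; unknown names raise
# NameError and abort() raises StopExtraction (see the assertions below).
# Illustrative use with hypothetical field names:
#   area = util.compile_expression("width * height")
#   area({"width": 3, "height": 4})  # -> 12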
self.assertEqual(expr({"a": 1, "b": 2, "c": 3}), 7) self.assertEqual(expr({"a": 9, "b": 9, "c": 9}), 90) with self.assertRaises(NameError): expr() with self.assertRaises(NameError): expr({"a": 2}) with self.assertRaises(SyntaxError): util.compile_expression("") with self.assertRaises(SyntaxError): util.compile_expression("x++") expr = util.compile_expression("1 and abort()") with self.assertRaises(exception.StopExtraction): expr() def test_build_duration_func(self, f=util.build_duration_func): for v in (0, 0.0, "", None, (), []): self.assertIsNone(f(v)) def test_single(df, v): for _ in range(10): self.assertEqual(df(), v) def test_range(df, lower, upper): for __ in range(10): v = df() self.assertGreaterEqual(v, lower) self.assertLessEqual(v, upper) test_single(f(3), 3) test_single(f(3.0), 3.0) test_single(f("3"), 3) test_single(f("3.0-"), 3) test_single(f(" 3 -"), 3) test_range(f((2, 4)), 2, 4) test_range(f([2, 4]), 2, 4) test_range(f("2-4"), 2, 4) test_range(f(" 2.0 - 4 "), 2, 4) def test_extractor_filter(self): # empty func = util.build_extractor_filter("") self.assertEqual(func(TestExtractor) , True) self.assertEqual(func(TestExtractorParent), True) self.assertEqual(func(TestExtractorAlt) , True) # category func = util.build_extractor_filter("test_category") self.assertEqual(func(TestExtractor) , False) self.assertEqual(func(TestExtractorParent), False) self.assertEqual(func(TestExtractorAlt) , True) # subcategory func = util.build_extractor_filter("*:test_subcategory") self.assertEqual(func(TestExtractor) , False) self.assertEqual(func(TestExtractorParent), True) self.assertEqual(func(TestExtractorAlt) , False) # basecategory func = util.build_extractor_filter("test_basecategory") self.assertEqual(func(TestExtractor) , False) self.assertEqual(func(TestExtractorParent), False) self.assertEqual(func(TestExtractorAlt) , False) # category-subcategory pair func = util.build_extractor_filter("test_category:test_subcategory") self.assertEqual(func(TestExtractor) , False) self.assertEqual(func(TestExtractorParent), True) self.assertEqual(func(TestExtractorAlt) , True) # combination func = util.build_extractor_filter( ["test_category", "*:test_subcategory"]) self.assertEqual(func(TestExtractor) , False) self.assertEqual(func(TestExtractorParent), False) self.assertEqual(func(TestExtractorAlt) , False) # whitelist func = util.build_extractor_filter( "test_category:test_subcategory", negate=False) self.assertEqual(func(TestExtractor) , True) self.assertEqual(func(TestExtractorParent), False) self.assertEqual(func(TestExtractorAlt) , False) func = util.build_extractor_filter( ["test_category:test_subcategory", "*:test_subcategory_parent"], negate=False) self.assertEqual(func(TestExtractor) , True) self.assertEqual(func(TestExtractorParent), True) self.assertEqual(func(TestExtractorAlt) , False) def test_generate_token(self): tokens = set() for _ in range(100): token = util.generate_token() tokens.add(token) self.assertEqual(len(token), 16 * 2) self.assertRegex(token, r"^[0-9a-f]+$") self.assertGreaterEqual(len(tokens), 99) token = util.generate_token(80) self.assertEqual(len(token), 80 * 2) self.assertRegex(token, r"^[0-9a-f]+$") def test_format_value(self): self.assertEqual(util.format_value(0) , "0") self.assertEqual(util.format_value(1) , "1") self.assertEqual(util.format_value(12) , "12") self.assertEqual(util.format_value(123) , "123") self.assertEqual(util.format_value(1234) , "1.23k") self.assertEqual(util.format_value(12345) , "12.34k") self.assertEqual(util.format_value(123456) , 
"123.45k") self.assertEqual(util.format_value(1234567) , "1.23M") self.assertEqual(util.format_value(12345678) , "12.34M") self.assertEqual(util.format_value(123456789) , "123.45M") self.assertEqual(util.format_value(1234567890), "1.23G") def test_combine_dict(self): self.assertEqual( util.combine_dict({}, {}), {}) self.assertEqual( util.combine_dict({1: 1, 2: 2}, {2: 4, 4: 8}), {1: 1, 2: 4, 4: 8}) self.assertEqual( util.combine_dict( {1: {11: 22, 12: 24}, 2: {13: 26, 14: 28}}, {1: {11: 33, 13: 39}, 2: "str"}), {1: {11: 33, 12: 24, 13: 39}, 2: "str"}) self.assertEqual( util.combine_dict( {1: {2: {3: {4: {"1": "a", "2": "b"}}}}}, {1: {2: {3: {4: {"1": "A", "3": "C"}}}}}), {1: {2: {3: {4: {"1": "A", "2": "b", "3": "C"}}}}}) def test_transform_dict(self): d = {} util.transform_dict(d, str) self.assertEqual(d, {}) d = {1: 123, 2: "123", 3: True, 4: None} util.transform_dict(d, str) self.assertEqual( d, {1: "123", 2: "123", 3: "True", 4: "None"}) d = {1: 123, 2: "123", 3: "foo", 4: {11: 321, 12: "321", 13: "bar"}} util.transform_dict(d, text.parse_int) self.assertEqual( d, {1: 123, 2: 123, 3: 0, 4: {11: 321, 12: 321, 13: 0}}) def test_filter_dict(self): d = {} r = util.filter_dict(d) self.assertEqual(r, d) self.assertIsNot(r, d) d = {"foo": 123, "bar": [], "baz": None} r = util.filter_dict(d) self.assertEqual(r, d) self.assertIsNot(r, d) d = {"foo": 123, "_bar": [], "__baz__": None} r = util.filter_dict(d) self.assertEqual(r, {"foo": 123}) def test_number_to_string(self, f=util.number_to_string): self.assertEqual(f(1) , "1") self.assertEqual(f(1.0) , "1.0") self.assertEqual(f("1.0") , "1.0") self.assertEqual(f([1]) , [1]) self.assertEqual(f({1: 2}), {1: 2}) self.assertEqual(f(True) , True) self.assertEqual(f(None) , None) def test_to_string(self, f=util.to_string): self.assertEqual(f(1) , "1") self.assertEqual(f(1.0) , "1.0") self.assertEqual(f("1.0"), "1.0") self.assertEqual(f("") , "") self.assertEqual(f(None) , "") self.assertEqual(f(0) , "") self.assertEqual(f(["a"]), "a") self.assertEqual(f([1]) , "1") self.assertEqual(f(["a", "b", "c"]), "a, b, c") self.assertEqual(f([1, 2, 3]), "1, 2, 3") def test_datetime_to_timestamp(self, f=util.datetime_to_timestamp): self.assertEqual(f(util.EPOCH), 0.0) self.assertEqual(f(datetime.datetime(2010, 1, 1)), 1262304000.0) self.assertEqual(f(datetime.datetime(2010, 1, 1, 0, 0, 0, 128000)), 1262304000.128000) with self.assertRaises(TypeError): f(None) def test_datetime_to_timestamp_string( self, f=util.datetime_to_timestamp_string): self.assertEqual(f(util.EPOCH), "0") self.assertEqual(f(datetime.datetime(2010, 1, 1)), "1262304000") self.assertEqual(f(None), "") def test_universal_none(self): obj = util.NONE self.assertFalse(obj) self.assertEqual(str(obj), str(None)) self.assertEqual(repr(obj), repr(None)) self.assertIs(obj.attr, obj) self.assertIs(obj["key"], obj) class TestExtractor(): category = "test_category" subcategory = "test_subcategory" basecategory = "test_basecategory" class TestExtractorParent(TestExtractor): category = "test_category" subcategory = "test_subcategory_parent" class TestExtractorAlt(TestExtractor): category = "test_category_alt" subcategory = "test_subcategory" if __name__ == '__main__': unittest.main()