gallery_dl-1.29.7/CHANGELOG.md

## 1.29.7 - 2025-05-23
### Extractors
#### Additions
- [mangadex] add `following` extractor ([#7487](https://github.com/mikf/gallery-dl/issues/7487))
- [pixeldrain] add support for filesystem URLs ([#7473](https://github.com/mikf/gallery-dl/issues/7473))
#### Fixes
- [bluesky] handle posts without `record` data ([#7499](https://github.com/mikf/gallery-dl/issues/7499))
- [civitai] fix & improve video downloads ([#7502](https://github.com/mikf/gallery-dl/issues/7502))
- [civitai] fix exception for images without `modelVersionId` ([#7432](https://github.com/mikf/gallery-dl/issues/7432))
- [civitai] make metadata extraction non-fatal ([#7562](https://github.com/mikf/gallery-dl/issues/7562))
- [fanbox] use `"browser": "firefox"` by default ([#7490](https://github.com/mikf/gallery-dl/issues/7490))
- [idolcomplex] fix pagination logic ([#7549](https://github.com/mikf/gallery-dl/issues/7549))
- [idolcomplex] fix 429 error during login by adding a 10s delay
- [instagram:stories] fix `post_date` metadata ([#7521](https://github.com/mikf/gallery-dl/issues/7521))
- [motherless] fix video gallery downloads ([#7530](https://github.com/mikf/gallery-dl/issues/7530))
- [pinterest] handle `story_pin_product_sticker_block` blocks ([#7563](https://github.com/mikf/gallery-dl/issues/7563))
- [subscribestar] fix `content` and `title` metadata ([#7486](https://github.com/mikf/gallery-dl/issues/7486) [#7526](https://github.com/mikf/gallery-dl/issues/7526))
#### Improvements
- [arcalive] allow overriding default `User-Agent` header ([#7556](https://github.com/mikf/gallery-dl/issues/7556))
- [fanbox] update API headers ([#7490](https://github.com/mikf/gallery-dl/issues/7490))
- [flickr] add `info` option ([#4720](https://github.com/mikf/gallery-dl/issues/4720) [#6817](https://github.com/mikf/gallery-dl/issues/6817))
- [flickr] add `profile` option
- [instagram:stories] add `split` option ([#7521](https://github.com/mikf/gallery-dl/issues/7521))
- [mangadex] implement login with client credentials
- [mangadex] send `Authorization` header only when necessary
- [mastodon] support Akkoma/Pleroma `/notice/:ID` URLs ([#7496](https://github.com/mikf/gallery-dl/issues/7496))
- [mastodon] support Akkoma/Pleroma `/objects/:UUID` URLs ([#7497](https://github.com/mikf/gallery-dl/issues/7497))
- [pixiv] implement sanity handling for ugoira works ([#4327](https://github.com/mikf/gallery-dl/issues/4327) [#6297](https://github.com/mikf/gallery-dl/issues/6297) [#7285](https://github.com/mikf/gallery-dl/issues/7285) [#7434](https://github.com/mikf/gallery-dl/issues/7434))
- [twitter:ctid] reduce chance of generating the same ID
#### Metadata
- [civitai] provide proper `extension` for model files ([#7432](https://github.com/mikf/gallery-dl/issues/7432))
- [flickr] provide `license_name` metadata
- [sankaku] support new `tags` categories ([#7333](https://github.com/mikf/gallery-dl/issues/7333) [#7553](https://github.com/mikf/gallery-dl/issues/7553))
- [vipergirls] provide `num` and `count` metadata ([#7479](https://github.com/mikf/gallery-dl/issues/7479))
- [vipergirls] extract more metadata & rename fields ([#7479](https://github.com/mikf/gallery-dl/issues/7479))
### Downloaders
- [http] fix setting `mtime` per file ([#7529](https://github.com/mikf/gallery-dl/issues/7529))
- [ytdl] improve temp/part file handling ([#6949](https://github.com/mikf/gallery-dl/issues/6949) [#7494](https://github.com/mikf/gallery-dl/issues/7494))
### Cookies
- support Zen browser ([#7233](https://github.com/mikf/gallery-dl/issues/7233) [#7546](https://github.com/mikf/gallery-dl/issues/7546))

gallery_dl-1.29.7/LICENSE

                    GNU GENERAL PUBLIC LICENSE
                       Version 2, June 1991

 Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

                            Preamble

  The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too.

  When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

  To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.

  For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

  We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.

  Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.

  Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.

  The precise terms and conditions for copying, distribution and modification follow.

                    GNU GENERAL PUBLIC LICENSE
   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

  0.
This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. 
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. 
Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. 
For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. 
You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names:

  Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker.

  <signature of Ty Coon>, 1 April 1989
  Ty Coon, President of Vice

This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License.

gallery_dl-1.29.7/MANIFEST.in

include README.rst CHANGELOG.md LICENSE scripts/run_tests.py
recursive-include docs *.conf

gallery_dl-1.29.7/PKG-INFO

Metadata-Version: 2.4
Name: gallery_dl
Version: 1.29.7
Summary: Command-line program to download image galleries and collections from several image hosting sites
Home-page: https://github.com/mikf/gallery-dl
Download-URL: https://github.com/mikf/gallery-dl/releases/latest
Author: Mike Fährmann
Author-email: mike_faehrmann@web.de
Maintainer: Mike Fährmann
Maintainer-email: mike_faehrmann@web.de
License: GPLv2
Keywords: image gallery downloader crawler scraper
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Multimedia :: Graphics
Classifier: Topic :: Utilities
Requires-Python: >=3.4
License-File: LICENSE
Requires-Dist: requests>=2.11.0
Provides-Extra: video
Requires-Dist: youtube-dl; extra == "video"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: download-url
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: maintainer
Dynamic: maintainer-email
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary
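The ``video`` extra declared above only pulls in youtube-dl_. Assuming standard pip extras syntax (an illustration, not something the package documents explicitly), it could be requested like this:

.. code:: bash

    # hypothetical: install gallery-dl together with its optional "video" dependency
    python3 -m pip install -U "gallery_dl[video]"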
==========
gallery-dl
==========

*gallery-dl* is a command-line program to download image galleries and collections from several image hosting sites (see `Supported Sites `__). It is a cross-platform tool with many `configuration options `__ and powerful `filenaming capabilities `__.

|pypi| |build|

.. contents::

Dependencies
============

- Python_ 3.4+
- Requests_

Optional
--------

- yt-dlp_ or youtube-dl_: HLS/DASH video downloads, ``ytdl`` integration
- FFmpeg_: Pixiv Ugoira conversion
- mkvmerge_: Accurate Ugoira frame timecodes
- PySocks_: SOCKS proxy support
- brotli_ or brotlicffi_: Brotli compression support
- zstandard_: Zstandard compression support
- PyYAML_: YAML configuration file support
- toml_: TOML configuration file support for Python<3.11
- SecretStorage_: GNOME keyring passwords for ``--cookies-from-browser``
- Psycopg_: PostgreSQL archive support

Installation
============

Pip
---

The stable releases of *gallery-dl* are distributed on PyPI_ and can be easily installed or upgraded using pip_:

.. code:: bash

    python3 -m pip install -U gallery-dl

Installing the latest dev version directly from GitHub can be done with pip_ as well:

.. code:: bash

    python3 -m pip install -U --force-reinstall --no-deps https://github.com/mikf/gallery-dl/archive/master.tar.gz

Omit :code:`--no-deps` if Requests_ hasn't been installed yet.

Note: Windows users should use :code:`py -3` instead of :code:`python3`.

It is advised to use the latest version of pip_, including the essential packages :code:`setuptools` and :code:`wheel`. To ensure these packages are up-to-date, run

.. code:: bash

    python3 -m pip install --upgrade pip setuptools wheel

Standalone Executable
---------------------

Prebuilt executable files with a Python interpreter and required Python packages included are available for

- `Windows `__ (Requires `Microsoft Visual C++ Redistributable Package (x86) `__)
- `Linux `__

Nightly Builds
--------------

| Executables built from the latest commit can be found at
| https://github.com/gdl-org/builds/releases

Snap
----

Linux users that are using a distro that is supported by Snapd_ can install *gallery-dl* from the Snap Store:

.. code:: bash

    snap install gallery-dl

Chocolatey
----------

Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository:

.. code:: powershell

    choco install gallery-dl

Scoop
-----

*gallery-dl* is also available in the Scoop_ "main" bucket for Windows users:

.. code:: powershell

    scoop install gallery-dl

Homebrew
--------

For macOS or Linux users using Homebrew:

.. code:: bash

    brew install gallery-dl

MacPorts
--------

For macOS users with MacPorts:

.. code:: bash

    sudo port install gallery-dl

Docker
------

Using the Dockerfile in the repository:

.. code:: bash

    git clone https://github.com/mikf/gallery-dl.git
    cd gallery-dl/
    docker build -t gallery-dl:latest .

Pulling image from `Docker Hub `__:

.. code:: bash

    docker pull mikf123/gallery-dl
    docker tag mikf123/gallery-dl gallery-dl

Pulling image from `GitHub Container Registry `__:

.. code:: bash

    docker pull ghcr.io/mikf/gallery-dl
    docker tag ghcr.io/mikf/gallery-dl gallery-dl

To run the container you will probably want to attach some directories on the host so that the config file and downloads can persist across runs. Make sure to either download the example config file referenced in the repo and place it in the mounted volume location or touch an empty file there.

If you gave the container a different tag, or are using podman, adjust the following commands accordingly. Run ``docker image ls`` to check the image name if you are not sure.

The ``--rm`` flag removes the container after every use, so you will always have a fresh environment for it to run in. If you set up a CI/CD pipeline to autobuild the container, you can also add a ``--pull=newer`` flag so that a newer image is checked for and downloaded before running.

.. code:: bash

    docker run --rm -v $HOME/Downloads/:/gallery-dl/ -v $HOME/.config/gallery-dl/gallery-dl.conf:/etc/gallery-dl.conf -it gallery-dl:latest

You can also add an alias to your shell for "gallery-dl" or create a simple bash script and drop it somewhere in your $PATH to act as a shim for this command.
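For example, a small wrapper script placed somewhere in your ``$PATH`` could act as that shim. This is only a sketch: the mount paths and the ``gallery-dl:latest`` tag are the ones assumed by the ``docker run`` example above and may need adjusting.

.. code:: bash

    #!/bin/sh
    # Hypothetical ~/bin/gallery-dl shim: forwards all arguments to the container,
    # mounting the same download directory and config file as the example above.
    exec docker run --rm \
        -v "$HOME/Downloads/:/gallery-dl/" \
        -v "$HOME/.config/gallery-dl/gallery-dl.conf:/etc/gallery-dl.conf" \
        -it gallery-dl:latest "$@"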
Nix and Home Manager
--------------------

Adding *gallery-dl* to your system environment:

.. code:: nix

    environment.systemPackages = with pkgs; [ gallery-dl ];

Using :code:`nix-shell`:

.. code:: bash

    nix-shell -p gallery-dl

.. code:: bash

    nix-shell -p gallery-dl --run "gallery-dl "

For Home Manager users, you can manage *gallery-dl* declaratively:

.. code:: nix

    programs.gallery-dl = {
      enable = true;
      settings = {
        extractor.base-directory = "~/Downloads";
      };
    };

Alternatively, you can just add it to :code:`home.packages` if you don't want to manage it declaratively:

.. code:: nix

    home.packages = with pkgs; [ gallery-dl ];

After making these changes, simply rebuild your configuration and open a new shell to have *gallery-dl* available.

Usage
=====

To use *gallery-dl* simply call it with the URLs you wish to download images from:

.. code:: bash

    gallery-dl [OPTIONS]... URLS...

Use :code:`gallery-dl --help` or see ``__ for a full list of all command-line options.

Examples
--------

Download images; in this case from danbooru via tag search for 'bonocho':

.. code:: bash

    gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho"

Get the direct URL of an image from a site supporting authentication with username & password:

.. code:: bash

    gallery-dl -g -u "" -p "" "https://twitter.com/i/web/status/604341487988576256"

Filter manga chapters by chapter number and language:

.. code:: bash

    gallery-dl --chapter-filter "10 <= chapter < 20" -o "lang=fr" "https://mangadex.org/title/59793dd0-a2d8-41a2-9758-8197287a8539"

| Search a remote resource for URLs and download images from them:
| (URLs for which no extractor can be found will be silently ignored)

.. code:: bash

    gallery-dl "r:https://pastebin.com/raw/FLwrCYsT"

If a site's address is nonstandard for its extractor, you can prefix the URL with the extractor's name to force the use of a specific extractor:

.. code:: bash

    gallery-dl "tumblr:https://sometumblrblog.example"

Configuration
=============

Configuration files for *gallery-dl* use a JSON-based file format.

Documentation
-------------

A list of all available configuration options and their descriptions can be found at ``__.

| For a default configuration file with available options set to their default values, see ``__.
| For a commented example with more involved settings and option usage, see ``__.

Locations
---------

*gallery-dl* searches for configuration files in the following places:

Windows:

* ``%APPDATA%\gallery-dl\config.json``
* ``%USERPROFILE%\gallery-dl\config.json``
* ``%USERPROFILE%\gallery-dl.conf``

(``%USERPROFILE%`` usually refers to a user's home directory, i.e. ``C:\Users\\``)

Linux, macOS, etc.:

* ``/etc/gallery-dl.conf``
* ``${XDG_CONFIG_HOME}/gallery-dl/config.json``
* ``${HOME}/.config/gallery-dl/config.json``
* ``${HOME}/.gallery-dl.conf``

When run as `executable `__, *gallery-dl* will also look for a ``gallery-dl.conf`` file in the same directory as said executable.

It is possible to use more than one configuration file at a time. In this case, any values from files after the first will get merged into the already loaded settings and potentially override previous ones.
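As a quick sketch of how that merging plays out (the file name here is made up), an additional file passed via :code:`-c/--config` only needs to contain the options you want to override; everything else keeps its value from the files loaded before it:

.. code:: bash

    # values from overrides.json are merged on top of the already loaded
    # configuration and win on conflicts
    gallery-dl -c ~/overrides.json "https://danbooru.donmai.us/posts?tags=bonocho"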
Authentication
==============

Username & Password
-------------------

Some extractors require you to provide valid login credentials in the form of a username & password pair. This is necessary for ``nijie`` and optional for ``aryion``, ``danbooru``, ``e621``, ``exhentai``, ``idolcomplex``, ``imgbb``, ``inkbunny``, ``mangadex``, ``mangoxo``, ``pillowfort``, ``sankaku``, ``subscribestar``, ``tapas``, ``tsumino``, ``twitter``, and ``zerochan``.

You can set the necessary information in your `configuration file `__

.. code:: json

    {
        "extractor": {
            "twitter": {
                "username": "",
                "password": ""
            }
        }
    }

or you can provide them directly via the :code:`-u/--username` and :code:`-p/--password` or via the :code:`-o/--option` command-line options

.. code:: bash

    gallery-dl -u "" -p "" "URL"
    gallery-dl -o "username=" -o "password=" "URL"

Cookies
-------

For sites where login with username & password is not possible due to CAPTCHA or similar, or has not been implemented yet, you can use the cookies from a browser login session and input them into *gallery-dl*.

This can be done via the `cookies `__ option in your configuration file by specifying

- | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon
  | (e.g. `Get cookies.txt LOCALLY `__ for Chrome, `Export Cookies `__ for Firefox)

- | a list of name-value pairs gathered from your browser's web developer tools
  | (in `Chrome `__, in `Firefox `__)

- | the name of a browser to extract cookies from
  | (supported browsers are Chromium-based ones, Firefox, and Safari)

For example:

.. code:: json

    {
        "extractor": {
            "instagram": {
                "cookies": "$HOME/path/to/cookies.txt"
            },
            "patreon": {
                "cookies": {
                    "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a"
                }
            },
            "twitter": {
                "cookies": ["firefox"]
            }
        }
    }

| You can also specify a cookies.txt file with the :code:`--cookies` command-line option
| or a browser to extract cookies from with :code:`--cookies-from-browser`:

.. code:: bash

    gallery-dl --cookies "$HOME/path/to/cookies.txt" "URL"
    gallery-dl --cookies-from-browser firefox "URL"

OAuth
-----

*gallery-dl* supports user authentication via OAuth_ for some extractors. This is necessary for ``pixiv`` and optional for ``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``, and ``mastodon`` instances.

Linking your account to *gallery-dl* grants it the ability to issue requests on your account's behalf and enables it to access resources which would otherwise be unavailable to a public user.

To do so, start by invoking it with ``oauth:`` as an argument. For example:

.. code:: bash

    gallery-dl oauth:flickr

You will be sent to the site's authorization page and asked to grant read access to *gallery-dl*. Authorize it and you will be shown one or more "tokens", which should be added to your configuration file.

To authenticate with a ``mastodon`` instance, run *gallery-dl* with ``oauth:mastodon:`` as argument. For example:

.. code:: bash

    gallery-dl oauth:mastodon:pawoo.net
    gallery-dl oauth:mastodon:https://mastodon.social/
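Where exactly such a token goes depends on the extractor, so the following is only an illustrative sketch with a made-up value; check the configuration documentation for the real option names (for ``reddit``, for instance, the key is assumed here to be ``refresh-token``):

.. code:: json

    {
        "extractor": {
            "reddit": {
                "refresh-token": "1234567890-abcdefghijklmnopqrstuvwxyz"
            }
        }
    }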
.. _Python: https://www.python.org/downloads/
.. _PyPI: https://pypi.org/
.. _pip: https://pip.pypa.io/en/stable/
.. _Requests: https://requests.readthedocs.io/en/master/
.. _FFmpeg: https://www.ffmpeg.org/
.. _mkvmerge: https://www.matroska.org/downloads/mkvtoolnix.html
.. _yt-dlp: https://github.com/yt-dlp/yt-dlp
.. _youtube-dl: https://ytdl-org.github.io/youtube-dl/
.. _PySocks: https://pypi.org/project/PySocks/
.. _brotli: https://github.com/google/brotli
.. _brotlicffi: https://github.com/python-hyper/brotlicffi
.. _zstandard: https://github.com/indygreg/python-zstandard
.. _PyYAML: https://pyyaml.org/
.. _toml: https://pypi.org/project/toml/
.. _SecretStorage: https://pypi.org/project/SecretStorage/
.. _Psycopg: https://www.psycopg.org/
.. _Snapd: https://docs.snapcraft.io/installing-snapd
.. _OAuth: https://en.wikipedia.org/wiki/OAuth
.. _Chocolatey: https://chocolatey.org/install
.. _Scoop: https://scoop.sh

.. |pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg
    :target: https://pypi.org/project/gallery-dl/
.. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg
    :target: https://github.com/mikf/gallery-dl/actions
.. |gitter| image:: https://badges.gitter.im/gallery-dl/main.svg
    :target: https://gitter.im/gallery-dl/main

gallery_dl-1.29.7/README.rst

(identical to the long description embedded in PKG-INFO above)

gallery_dl-1.29.7/data/completion/_gallery-dl

#compdef gallery-dl

local curcontext="$curcontext"
typeset -A opt_args
local rc=1

_arguments -s -S \ {-h,--help}'[Print this help message and exit]' \ --version'[Print program version and exit]' \ {-f,--filename}'[Filename format string for downloaded files ('\''/O'\'' for "original" filenames)]':'' \ {-d,--destination}'[Target location for file downloads]':'' \ {-D,--directory}'[Exact location for file downloads]':'' \ {-X,--extractors}'[Load external extractors from PATH]':'' \ --user-agent'[User-Agent request header]':'' \ --clear-cache'[Delete cached login sessions, cookies, etc.
for MODULE (ALL to delete everything)]':'' \ {-U,--update-check}'[Check if a newer version is available]' \ {-i,--input-file}'[Download URLs found in FILE ('\''-'\'' for stdin). More than one --input-file can be specified]':'':_files \ {-I,--input-file-comment}'[Download URLs found in FILE. Comment them out after they were downloaded successfully.]':'':_files \ {-x,--input-file-delete}'[Download URLs found in FILE. Delete them after they were downloaded successfully.]':'':_files \ --no-input'[Do not prompt for passwords/tokens]' \ {-q,--quiet}'[Activate quiet mode]' \ {-w,--warning}'[Print only warnings and errors]' \ {-v,--verbose}'[Print various debugging information]' \ {-g,--get-urls}'[Print URLs instead of downloading]' \ {-G,--resolve-urls}'[Print URLs instead of downloading; resolve intermediary URLs]' \ {-j,--dump-json}'[Print JSON information]' \ {-J,--resolve-json}'[Print JSON information; resolve intermediary URLs]' \ {-s,--simulate}'[Simulate data extraction; do not download anything]' \ {-E,--extractor-info}'[Print extractor defaults and settings]' \ {-K,--list-keywords}'[Print a list of available keywords and example values for the given URLs]' \ {-e,--error-file}'[Add input URLs which returned an error to FILE]':'':_files \ {-N,--print}'[Write FORMAT during EVENT (default '\''prepare'\'') to standard output. Examples: '\''id'\'' or '\''post:{md5\[:8\]}'\'']':'<[event:]format>' \ --print-to-file'[Append FORMAT during EVENT to FILE]':'<[event:]format file>' \ --list-modules'[Print a list of available extractor modules]' \ --list-extractors'[Print a list of extractor classes with description, (sub)category and example URL]':'<[categories]>' \ --write-log'[Write logging output to FILE]':'':_files \ --write-unsupported'[Write URLs, which get emitted by other extractors but cannot be handled, to FILE]':'':_files \ --write-pages'[Write downloaded intermediary pages to files in the current directory to debug problems]' \ --print-traffic'[Display sent and read HTTP traffic]' \ --no-colors'[Do not emit ANSI color codes in output]' \ {-R,--retries}'[Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)]':'' \ --http-timeout'[Timeout for HTTP connections (default: 30.0)]':'' \ --proxy'[Use the specified proxy]':'' \ --source-address'[Client-side IP address to bind to]':'' \ {-4,--force-ipv4}'[Make all connections via IPv4]' \ {-6,--force-ipv6}'[Make all connections via IPv6]' \ --no-check-certificate'[Disable HTTPS certificate validation]' \ {-r,--limit-rate}'[Maximum download rate (e.g. 500k or 2.5M)]':'' \ --chunk-size'[Size of in-memory data chunks (default: 32k)]':'' \ --sleep'[Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5)]':'' \ --sleep-request'[Number of seconds to wait between HTTP requests during data extraction]':'' \ --sleep-extractor'[Number of seconds to wait before starting data extraction for an input URL]':'' \ --no-part'[Do not use .part files]' \ --no-skip'[Do not skip downloads; overwrite existing files]' \ --no-mtime'[Do not set file modification times according to Last-Modified HTTP response headers]' \ --no-download'[Do not download any files]' \ {-o,--option}'[Additional options. 
Example: -o browser=firefox]':'' \ {-c,--config}'[Additional configuration files]':'':_files \ --config-yaml'[Additional configuration files in YAML format]':'':_files \ --config-toml'[Additional configuration files in TOML format]':'':_files \ --config-create'[Create a basic configuration file]' \ --config-status'[Show configuration file status]' \ --config-open'[Open configuration file in external application]' \ --config-ignore'[Do not read default configuration files]' \ {-u,--username}'[Username to login with]':'' \ {-p,--password}'[Password belonging to the given username]':'' \ --netrc'[Enable .netrc authentication data]' \ {-C,--cookies}'[File to load additional cookies from]':'':_files \ --cookies-export'[Export session cookies to FILE]':'':_files \ --cookies-from-browser'[Name of the browser to load cookies from, with optional domain prefixed with '\''/'\'', keyring name prefixed with '\''+'\'', profile prefixed with '\'':'\'', and container prefixed with '\''::'\'' ('\''none'\'' for no container (default), '\''all'\'' for all containers)]':'' \ {-A,--abort}'[Stop current extractor run after N consecutive file downloads were skipped]':'' \ {-T,--terminate}'[Stop current and parent extractor runs after N consecutive file downloads were skipped]':'' \ --filesize-min'[Do not download files smaller than SIZE (e.g. 500k or 2.5M)]':'' \ --filesize-max'[Do not download files larger than SIZE (e.g. 500k or 2.5M)]':'' \ --download-archive'[Record all downloaded or skipped files in FILE and skip downloading any file already in it]':'':_files \ --range'[Index range(s) specifying which files to download. These can be either a constant value, range, or slice (e.g. '\''5'\'', '\''8-20'\'', or '\''1:24:3'\'')]':'' \ --chapter-range'[Like '\''--range'\'', but applies to manga chapters and other delegated URLs]':'' \ --filter'[Python expression controlling which files to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by '\''-K'\''. Example: --filter "image_width >= 1000 and rating in ('\''s'\'', '\''q'\'')"]':'' \ --chapter-filter'[Like '\''--filter'\'', but applies to manga chapters and other delegated URLs]':'' \ {-P,--postprocessor}'[Activate the specified post processor]':'' \ --no-postprocessors'[Do not run any post processors]' \ {-O,--postprocessor-option}'[Additional post processor options]':'' \ --write-metadata'[Write metadata to separate JSON files]' \ --write-info-json'[Write gallery metadata to a info.json file]' \ --write-tags'[Write image tags to separate text files]' \ --zip'[Store downloaded files in a ZIP archive]' \ --cbz'[Store downloaded files in a CBZ archive]' \ --mtime'[Set file modification times according to metadata selected by NAME. Examples: '\''date'\'' or '\''status\[date\]'\'']':'' \ --rename'[Rename previously downloaded files from FORMAT to the current filename format]':'' \ --rename-to'[Rename previously downloaded files from the current filename format to FORMAT]':'' \ --ugoira'[Convert Pixiv Ugoira to FMT using FFmpeg. Supported formats are '\''webm'\'', '\''mp4'\'', '\''gif'\'', '\''vp8'\'', '\''vp9'\'', '\''vp9-lossless'\'', '\''copy'\'', '\''zip'\''.]':'' \ --exec'[Execute CMD for each downloaded file. Supported replacement fields are {} or {_path}, {_directory}, {_filename}. Example: --exec "convert {} {}.png && rm {}"]':'' \ --exec-after'[Execute CMD after all files were downloaded. 
Example: --exec-after "cd {_directory} && convert * ../doc.pdf"]':'' && rc=0 return rc ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1747043814.0 gallery_dl-1.29.7/data/completion/gallery-dl0000644000175000017500000000347115010342746017474 0ustar00mikemike_gallery_dl() { local cur prev COMPREPLY=() cur="${COMP_WORDS[COMP_CWORD]}" prev="${COMP_WORDS[COMP_CWORD-1]}" if [[ "${prev}" =~ ^(-i|--input-file|-I|--input-file-comment|-x|--input-file-delete|-e|--error-file|--write-log|--write-unsupported|-c|--config|--config-yaml|--config-toml|-C|--cookies|--cookies-export|--download-archive)$ ]]; then COMPREPLY=( $(compgen -f -- "${cur}") ) elif [[ "${prev}" =~ ^()$ ]]; then COMPREPLY=( $(compgen -d -- "${cur}") ) else COMPREPLY=( $(compgen -W "--help --version --filename --destination --directory --extractors --user-agent --clear-cache --update-check --input-file --input-file-comment --input-file-delete --no-input --quiet --warning --verbose --get-urls --resolve-urls --dump-json --resolve-json --simulate --extractor-info --list-keywords --error-file --print --print-to-file --list-modules --list-extractors --write-log --write-unsupported --write-pages --print-traffic --no-colors --retries --http-timeout --proxy --source-address --force-ipv4 --force-ipv6 --no-check-certificate --limit-rate --chunk-size --sleep --sleep-request --sleep-extractor --no-part --no-skip --no-mtime --no-download --option --config --config-yaml --config-toml --config-create --config-status --config-open --config-ignore --ignore-config --username --password --netrc --cookies --cookies-export --cookies-from-browser --abort --terminate --filesize-min --filesize-max --download-archive --range --chapter-range --filter --chapter-filter --postprocessor --no-postprocessors --postprocessor-option --write-metadata --write-info-json --write-infojson --write-tags --zip --cbz --mtime --mtime-from-date --rename --rename-to --ugoira --ugoira-conv --ugoira-conv-lossless --ugoira-conv-copy --exec --exec-after" -- "${cur}") ) fi } complete -F _gallery_dl gallery-dl ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1747043814.0 gallery_dl-1.29.7/data/completion/gallery-dl.fish0000644000175000017500000002313215010342746020420 0ustar00mikemikecomplete -c gallery-dl -x complete -c gallery-dl -s 'h' -l 'help' -d 'Print this help message and exit' complete -c gallery-dl -l 'version' -d 'Print program version and exit' complete -c gallery-dl -x -s 'f' -l 'filename' -d 'Filename format string for downloaded files ("/O" for "original" filenames)' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'd' -l 'destination' -d 'Target location for file downloads' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'D' -l 'directory' -d 'Exact location for file downloads' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'X' -l 'extractors' -d 'Load external extractors from PATH' complete -c gallery-dl -x -l 'user-agent' -d 'User-Agent request header' complete -c gallery-dl -x -l 'clear-cache' -d 'Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything)' complete -c gallery-dl -s 'U' -l 'update-check' -d 'Check if a newer version is available' complete -c gallery-dl -r -F -s 'i' -l 'input-file' -d 'Download URLs found in FILE ("-" for stdin). More than one --input-file can be specified' complete -c gallery-dl -r -F -s 'I' -l 'input-file-comment' -d 'Download URLs found in FILE. 
Comment them out after they were downloaded successfully.' complete -c gallery-dl -r -F -s 'x' -l 'input-file-delete' -d 'Download URLs found in FILE. Delete them after they were downloaded successfully.' complete -c gallery-dl -l 'no-input' -d 'Do not prompt for passwords/tokens' complete -c gallery-dl -s 'q' -l 'quiet' -d 'Activate quiet mode' complete -c gallery-dl -s 'w' -l 'warning' -d 'Print only warnings and errors' complete -c gallery-dl -s 'v' -l 'verbose' -d 'Print various debugging information' complete -c gallery-dl -s 'g' -l 'get-urls' -d 'Print URLs instead of downloading' complete -c gallery-dl -s 'G' -l 'resolve-urls' -d 'Print URLs instead of downloading; resolve intermediary URLs' complete -c gallery-dl -s 'j' -l 'dump-json' -d 'Print JSON information' complete -c gallery-dl -s 'J' -l 'resolve-json' -d 'Print JSON information; resolve intermediary URLs' complete -c gallery-dl -s 's' -l 'simulate' -d 'Simulate data extraction; do not download anything' complete -c gallery-dl -s 'E' -l 'extractor-info' -d 'Print extractor defaults and settings' complete -c gallery-dl -s 'K' -l 'list-keywords' -d 'Print a list of available keywords and example values for the given URLs' complete -c gallery-dl -r -F -s 'e' -l 'error-file' -d 'Add input URLs which returned an error to FILE' complete -c gallery-dl -x -s 'N' -l 'print' -d 'Write FORMAT during EVENT (default "prepare") to standard output. Examples: "id" or "post:{md5[:8]}"' complete -c gallery-dl -x -l 'print-to-file' -d 'Append FORMAT during EVENT to FILE' complete -c gallery-dl -l 'list-modules' -d 'Print a list of available extractor modules' complete -c gallery-dl -x -l 'list-extractors' -d 'Print a list of extractor classes with description, (sub)category and example URL' complete -c gallery-dl -r -F -l 'write-log' -d 'Write logging output to FILE' complete -c gallery-dl -r -F -l 'write-unsupported' -d 'Write URLs, which get emitted by other extractors but cannot be handled, to FILE' complete -c gallery-dl -l 'write-pages' -d 'Write downloaded intermediary pages to files in the current directory to debug problems' complete -c gallery-dl -l 'print-traffic' -d 'Display sent and read HTTP traffic' complete -c gallery-dl -l 'no-colors' -d 'Do not emit ANSI color codes in output' complete -c gallery-dl -x -s 'R' -l 'retries' -d 'Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)' complete -c gallery-dl -x -l 'http-timeout' -d 'Timeout for HTTP connections (default: 30.0)' complete -c gallery-dl -x -l 'proxy' -d 'Use the specified proxy' complete -c gallery-dl -x -l 'source-address' -d 'Client-side IP address to bind to' complete -c gallery-dl -s '4' -l 'force-ipv4' -d 'Make all connections via IPv4' complete -c gallery-dl -s '6' -l 'force-ipv6' -d 'Make all connections via IPv6' complete -c gallery-dl -l 'no-check-certificate' -d 'Disable HTTPS certificate validation' complete -c gallery-dl -x -s 'r' -l 'limit-rate' -d 'Maximum download rate (e.g. 500k or 2.5M)' complete -c gallery-dl -x -l 'chunk-size' -d 'Size of in-memory data chunks (default: 32k)' complete -c gallery-dl -x -l 'sleep' -d 'Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 
2.7 or 2.0-3.5)' complete -c gallery-dl -x -l 'sleep-request' -d 'Number of seconds to wait between HTTP requests during data extraction' complete -c gallery-dl -x -l 'sleep-extractor' -d 'Number of seconds to wait before starting data extraction for an input URL' complete -c gallery-dl -l 'no-part' -d 'Do not use .part files' complete -c gallery-dl -l 'no-skip' -d 'Do not skip downloads; overwrite existing files' complete -c gallery-dl -l 'no-mtime' -d 'Do not set file modification times according to Last-Modified HTTP response headers' complete -c gallery-dl -l 'no-download' -d 'Do not download any files' complete -c gallery-dl -x -s 'o' -l 'option' -d 'Additional options. Example: -o browser=firefox' complete -c gallery-dl -r -F -s 'c' -l 'config' -d 'Additional configuration files' complete -c gallery-dl -r -F -l 'config-yaml' -d 'Additional configuration files in YAML format' complete -c gallery-dl -r -F -l 'config-toml' -d 'Additional configuration files in TOML format' complete -c gallery-dl -l 'config-create' -d 'Create a basic configuration file' complete -c gallery-dl -l 'config-status' -d 'Show configuration file status' complete -c gallery-dl -l 'config-open' -d 'Open configuration file in external application' complete -c gallery-dl -l 'config-ignore' -d 'Do not read default configuration files' complete -c gallery-dl -l 'ignore-config' -d '==SUPPRESS==' complete -c gallery-dl -x -s 'u' -l 'username' -d 'Username to login with' complete -c gallery-dl -x -s 'p' -l 'password' -d 'Password belonging to the given username' complete -c gallery-dl -l 'netrc' -d 'Enable .netrc authentication data' complete -c gallery-dl -r -F -s 'C' -l 'cookies' -d 'File to load additional cookies from' complete -c gallery-dl -r -F -l 'cookies-export' -d 'Export session cookies to FILE' complete -c gallery-dl -x -l 'cookies-from-browser' -d 'Name of the browser to load cookies from, with optional domain prefixed with "/", keyring name prefixed with "+", profile prefixed with ":", and container prefixed with "::" ("none" for no container (default), "all" for all containers)' complete -c gallery-dl -x -s 'A' -l 'abort' -d 'Stop current extractor run after N consecutive file downloads were skipped' complete -c gallery-dl -x -s 'T' -l 'terminate' -d 'Stop current and parent extractor runs after N consecutive file downloads were skipped' complete -c gallery-dl -x -l 'filesize-min' -d 'Do not download files smaller than SIZE (e.g. 500k or 2.5M)' complete -c gallery-dl -x -l 'filesize-max' -d 'Do not download files larger than SIZE (e.g. 500k or 2.5M)' complete -c gallery-dl -r -F -l 'download-archive' -d 'Record all downloaded or skipped files in FILE and skip downloading any file already in it' complete -c gallery-dl -x -l 'range' -d 'Index range(s) specifying which files to download. These can be either a constant value, range, or slice (e.g. "5", "8-20", or "1:24:3")' complete -c gallery-dl -x -l 'chapter-range' -d 'Like "--range", but applies to manga chapters and other delegated URLs' complete -c gallery-dl -x -l 'filter' -d 'Python expression controlling which files to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by "-K". 
Example: --filter "image_width >= 1000 and rating in ("s", "q")"' complete -c gallery-dl -x -l 'chapter-filter' -d 'Like "--filter", but applies to manga chapters and other delegated URLs' complete -c gallery-dl -x -s 'P' -l 'postprocessor' -d 'Activate the specified post processor' complete -c gallery-dl -l 'no-postprocessors' -d 'Do not run any post processors' complete -c gallery-dl -x -s 'O' -l 'postprocessor-option' -d 'Additional post processor options' complete -c gallery-dl -l 'write-metadata' -d 'Write metadata to separate JSON files' complete -c gallery-dl -l 'write-info-json' -d 'Write gallery metadata to a info.json file' complete -c gallery-dl -l 'write-infojson' -d '==SUPPRESS==' complete -c gallery-dl -l 'write-tags' -d 'Write image tags to separate text files' complete -c gallery-dl -l 'zip' -d 'Store downloaded files in a ZIP archive' complete -c gallery-dl -l 'cbz' -d 'Store downloaded files in a CBZ archive' complete -c gallery-dl -x -l 'mtime' -d 'Set file modification times according to metadata selected by NAME. Examples: "date" or "status[date]"' complete -c gallery-dl -l 'mtime-from-date' -d '==SUPPRESS==' complete -c gallery-dl -x -l 'rename' -d 'Rename previously downloaded files from FORMAT to the current filename format' complete -c gallery-dl -x -l 'rename-to' -d 'Rename previously downloaded files from the current filename format to FORMAT' complete -c gallery-dl -x -l 'ugoira' -d 'Convert Pixiv Ugoira to FMT using FFmpeg. Supported formats are "webm", "mp4", "gif", "vp8", "vp9", "vp9-lossless", "copy", "zip".' complete -c gallery-dl -l 'ugoira-conv' -d '==SUPPRESS==' complete -c gallery-dl -l 'ugoira-conv-lossless' -d '==SUPPRESS==' complete -c gallery-dl -l 'ugoira-conv-copy' -d '==SUPPRESS==' complete -c gallery-dl -x -l 'exec' -d 'Execute CMD for each downloaded file. Supported replacement fields are {} or {_path}, {_directory}, {_filename}. Example: --exec "convert {} {}.png && rm {}"' complete -c gallery-dl -x -l 'exec-after' -d 'Execute CMD after all files were downloaded. Example: --exec-after "cd {_directory} && convert * ../doc.pdf"' ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1747991096.4174023 gallery_dl-1.29.7/data/man/0000755000175000017500000000000015014035070014103 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1747991092.0 gallery_dl-1.29.7/data/man/gallery-dl.10000644000175000017500000002304015014035064016223 0ustar00mikemike.TH "GALLERY-DL" "1" "2025-05-23" "1.29.7" "gallery-dl Manual" .\" disable hyphenation .nh .SH NAME gallery-dl \- download image-galleries and -collections .SH SYNOPSIS .B gallery-dl [OPTION]... URL... .SH DESCRIPTION .B gallery-dl is a command-line program to download image-galleries and -collections from several image hosting sites. It is a cross-platform tool with many configuration options and powerful filenaming capabilities. 
.SH OPTIONS .TP .B "\-h, \-\-help" Print this help message and exit .TP .B "\-\-version" Print program version and exit .TP .B "\-f, \-\-filename" \f[I]FORMAT\f[] Filename format string for downloaded files ('/O' for "original" filenames) .TP .B "\-d, \-\-destination" \f[I]PATH\f[] Target location for file downloads .TP .B "\-D, \-\-directory" \f[I]PATH\f[] Exact location for file downloads .TP .B "\-X, \-\-extractors" \f[I]PATH\f[] Load external extractors from PATH .TP .B "\-\-user\-agent" \f[I]UA\f[] User-Agent request header .TP .B "\-\-clear\-cache" \f[I]MODULE\f[] Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything) .TP .B "\-U, \-\-update\-check" Check if a newer version is available .TP .B "\-i, \-\-input\-file" \f[I]FILE\f[] Download URLs found in FILE ('-' for stdin). More than one --input-file can be specified .TP .B "\-I, \-\-input\-file\-comment" \f[I]FILE\f[] Download URLs found in FILE. Comment them out after they were downloaded successfully. .TP .B "\-x, \-\-input\-file\-delete" \f[I]FILE\f[] Download URLs found in FILE. Delete them after they were downloaded successfully. .TP .B "\-\-no\-input" Do not prompt for passwords/tokens .TP .B "\-q, \-\-quiet" Activate quiet mode .TP .B "\-w, \-\-warning" Print only warnings and errors .TP .B "\-v, \-\-verbose" Print various debugging information .TP .B "\-g, \-\-get\-urls" Print URLs instead of downloading .TP .B "\-G, \-\-resolve\-urls" Print URLs instead of downloading; resolve intermediary URLs .TP .B "\-j, \-\-dump\-json" Print JSON information .TP .B "\-J, \-\-resolve\-json" Print JSON information; resolve intermediary URLs .TP .B "\-s, \-\-simulate" Simulate data extraction; do not download anything .TP .B "\-E, \-\-extractor\-info" Print extractor defaults and settings .TP .B "\-K, \-\-list\-keywords" Print a list of available keywords and example values for the given URLs .TP .B "\-e, \-\-error\-file" \f[I]FILE\f[] Add input URLs which returned an error to FILE .TP .B "\-N, \-\-print" \f[I][EVENT:]FORMAT\f[] Write FORMAT during EVENT (default 'prepare') to standard output. Examples: 'id' or 'post:{md5[:8]}' .TP .B "\-\-print\-to\-file" \f[I][EVENT:]FORMAT FILE\f[] Append FORMAT during EVENT to FILE .TP .B "\-\-list\-modules" Print a list of available extractor modules .TP .B "\-\-list\-extractors" \f[I][CATEGORIES]\f[] Print a list of extractor classes with description, (sub)category and example URL .TP .B "\-\-write\-log" \f[I]FILE\f[] Write logging output to FILE .TP .B "\-\-write\-unsupported" \f[I]FILE\f[] Write URLs, which get emitted by other extractors but cannot be handled, to FILE .TP .B "\-\-write\-pages" Write downloaded intermediary pages to files in the current directory to debug problems .TP .B "\-\-print\-traffic" Display sent and read HTTP traffic .TP .B "\-\-no\-colors" Do not emit ANSI color codes in output .TP .B "\-R, \-\-retries" \f[I]N\f[] Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4) .TP .B "\-\-http\-timeout" \f[I]SECONDS\f[] Timeout for HTTP connections (default: 30.0) .TP .B "\-\-proxy" \f[I]URL\f[] Use the specified proxy .TP .B "\-\-source\-address" \f[I]IP\f[] Client-side IP address to bind to .TP .B "\-4, \-\-force\-ipv4" Make all connections via IPv4 .TP .B "\-6, \-\-force\-ipv6" Make all connections via IPv6 .TP .B "\-\-no\-check\-certificate" Disable HTTPS certificate validation .TP .B "\-r, \-\-limit\-rate" \f[I]RATE\f[] Maximum download rate (e.g. 
500k or 2.5M) .TP .B "\-\-chunk\-size" \f[I]SIZE\f[] Size of in-memory data chunks (default: 32k) .TP .B "\-\-sleep" \f[I]SECONDS\f[] Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5) .TP .B "\-\-sleep\-request" \f[I]SECONDS\f[] Number of seconds to wait between HTTP requests during data extraction .TP .B "\-\-sleep\-extractor" \f[I]SECONDS\f[] Number of seconds to wait before starting data extraction for an input URL .TP .B "\-\-no\-part" Do not use .part files .TP .B "\-\-no\-skip" Do not skip downloads; overwrite existing files .TP .B "\-\-no\-mtime" Do not set file modification times according to Last-Modified HTTP response headers .TP .B "\-\-no\-download" Do not download any files .TP .B "\-o, \-\-option" \f[I]KEY=VALUE\f[] Additional options. Example: -o browser=firefox .TP .B "\-c, \-\-config" \f[I]FILE\f[] Additional configuration files .TP .B "\-\-config\-yaml" \f[I]FILE\f[] Additional configuration files in YAML format .TP .B "\-\-config\-toml" \f[I]FILE\f[] Additional configuration files in TOML format .TP .B "\-\-config\-create" Create a basic configuration file .TP .B "\-\-config\-status" Show configuration file status .TP .B "\-\-config\-open" Open configuration file in external application .TP .B "\-\-config\-ignore" Do not read default configuration files .TP .B "\-u, \-\-username" \f[I]USER\f[] Username to login with .TP .B "\-p, \-\-password" \f[I]PASS\f[] Password belonging to the given username .TP .B "\-\-netrc" Enable .netrc authentication data .TP .B "\-C, \-\-cookies" \f[I]FILE\f[] File to load additional cookies from .TP .B "\-\-cookies\-export" \f[I]FILE\f[] Export session cookies to FILE .TP .B "\-\-cookies\-from\-browser" \f[I]BROWSER[/DOMAIN][+KEYRING][:PROFILE][::CONTAINER]\f[] Name of the browser to load cookies from, with optional domain prefixed with '/', keyring name prefixed with '+', profile prefixed with ':', and container prefixed with '::' ('none' for no container (default), 'all' for all containers) .TP .B "\-A, \-\-abort" \f[I]N\f[] Stop current extractor run after N consecutive file downloads were skipped .TP .B "\-T, \-\-terminate" \f[I]N\f[] Stop current and parent extractor runs after N consecutive file downloads were skipped .TP .B "\-\-filesize\-min" \f[I]SIZE\f[] Do not download files smaller than SIZE (e.g. 500k or 2.5M) .TP .B "\-\-filesize\-max" \f[I]SIZE\f[] Do not download files larger than SIZE (e.g. 500k or 2.5M) .TP .B "\-\-download\-archive" \f[I]FILE\f[] Record all downloaded or skipped files in FILE and skip downloading any file already in it .TP .B "\-\-range" \f[I]RANGE\f[] Index range(s) specifying which files to download. These can be either a constant value, range, or slice (e.g. '5', '8-20', or '1:24:3') .TP .B "\-\-chapter\-range" \f[I]RANGE\f[] Like '--range', but applies to manga chapters and other delegated URLs .TP .B "\-\-filter" \f[I]EXPR\f[] Python expression controlling which files to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by '-K'. 
Example: --filter "image_width >= 1000 and rating in ('s', 'q')" .TP .B "\-\-chapter\-filter" \f[I]EXPR\f[] Like '--filter', but applies to manga chapters and other delegated URLs .TP .B "\-P, \-\-postprocessor" \f[I]NAME\f[] Activate the specified post processor .TP .B "\-\-no\-postprocessors" Do not run any post processors .TP .B "\-O, \-\-postprocessor\-option" \f[I]KEY=VALUE\f[] Additional post processor options .TP .B "\-\-write\-metadata" Write metadata to separate JSON files .TP .B "\-\-write\-info\-json" Write gallery metadata to a info.json file .TP .B "\-\-write\-tags" Write image tags to separate text files .TP .B "\-\-zip" Store downloaded files in a ZIP archive .TP .B "\-\-cbz" Store downloaded files in a CBZ archive .TP .B "\-\-mtime" \f[I]NAME\f[] Set file modification times according to metadata selected by NAME. Examples: 'date' or 'status[date]' .TP .B "\-\-rename" \f[I]FORMAT\f[] Rename previously downloaded files from FORMAT to the current filename format .TP .B "\-\-rename\-to" \f[I]FORMAT\f[] Rename previously downloaded files from the current filename format to FORMAT .TP .B "\-\-ugoira" \f[I]FMT\f[] Convert Pixiv Ugoira to FMT using FFmpeg. Supported formats are 'webm', 'mp4', 'gif', 'vp8', 'vp9', 'vp9-lossless', 'copy', 'zip'. .TP .B "\-\-exec" \f[I]CMD\f[] Execute CMD for each downloaded file. Supported replacement fields are {} or {_path}, {_directory}, {_filename}. Example: --exec "convert {} {}.png && rm {}" .TP .B "\-\-exec\-after" \f[I]CMD\f[] Execute CMD after all files were downloaded. Example: --exec-after "cd {_directory} && convert * ../doc.pdf" .SH EXAMPLES .TP gallery-dl \f[I]URL\f[] Download images from \f[I]URL\f[]. .TP gallery-dl -g -u -p \f[I]URL\f[] Print direct URLs from a site that requires authentication. .TP gallery-dl --filter 'type == "ugoira"' --range '2-4' \f[I]URL\f[] Apply filter and range expressions. This will only download the second, third, and fourth file where its type value is equal to "ugoira". .TP gallery-dl r:\f[I]URL\f[] Scan \f[I]URL\f[] for other URLs and invoke \f[B]gallery-dl\f[] on them. .TP gallery-dl oauth:\f[I]SITE\-NAME\f[] Gain OAuth authentication tokens for .IR deviantart , .IR flickr , .IR reddit , .IR smugmug ", and" .IR tumblr . .SH FILES .TP .I /etc/gallery-dl.conf The system wide configuration file. .TP .I ~/.config/gallery-dl/config.json Per user configuration file. .TP .I ~/.gallery-dl.conf Alternate per user configuration file. 
.SH BUGS https://github.com/mikf/gallery-dl/issues .SH AUTHORS Mike Fährmann .br and https://github.com/mikf/gallery-dl/graphs/contributors .SH "SEE ALSO" .BR gallery-dl.conf (5) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1747991093.0 gallery_dl-1.29.7/data/man/gallery-dl.conf.50000644000175000017500000054123515014035065017167 0ustar00mikemike.TH "GALLERY-DL.CONF" "5" "2025-05-23" "1.29.7" "gallery-dl Manual" .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .SH NAME gallery-dl.conf \- gallery-dl configuration file .SH DESCRIPTION gallery-dl will search for configuration files in the following places every time it is started, unless .B --ignore-config is specified: .PP .RS 4 .nf .I /etc/gallery-dl.conf .I $HOME/.config/gallery-dl/config.json .I $HOME/.gallery-dl.conf .fi .RE .PP It is also possible to specify additional configuration files with the .B -c/--config command-line option or to add further option values with .B -o/--option as = pairs, Configuration files are JSON-based and therefore don't allow any ordinary comments, but, since unused keys are simply ignored, it is possible to utilize those as makeshift comments by settings their values to arbitrary strings. .SH EXAMPLE { .RS 4 "base-directory": "/tmp/", .br "extractor": { .RS 4 "pixiv": { .RS 4 "directory": ["Pixiv", "Works", "{user[id]}"], .br "filename": "{id}{num}.{extension}", .br "username": "foo", .br "password": "bar" .RE }, .br "flickr": { .RS 4 "_comment": "OAuth keys for account 'foobar'", .br "access-token": "0123456789-0123456789abcdef", .br "access-token-secret": "fedcba9876543210" .RE } .RE }, .br "downloader": { .RS 4 "retries": 3, .br "timeout": 2.5 .RE } .RE } .SH EXTRACTOR OPTIONS .SS extractor.*.filename .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (condition -> \f[I]format string\f[]) .IP "Example:" 4 .. code:: json "{manga}_c{chapter}_{page:>03}.{extension}" .. code:: json { "extension == 'mp4'": "{id}_video.{extension}", "'nature' in title" : "{id}_{title}.{extension}", "" : "{id}_default.{extension}" } .IP "Description:" 4 A \f[I]format string\f[] to build filenames for downloaded files with. If this is an \f[I]object\f[], it must contain Python expressions mapping to the filename format strings to use. These expressions are evaluated in the order as specified in Python 3.6+ and in an undetermined order in Python 3.4 and 3.5. The available replacement keys depend on the extractor used. A list of keys for a specific one can be acquired by calling *gallery-dl* with the \f[I]-K\f[]/\f[I]--list-keywords\f[] command-line option. For example: .. code:: $ gallery-dl -K http://seiga.nicovideo.jp/seiga/im5977527 Keywords for directory names: category seiga subcategory image Keywords for filenames: category seiga extension None image-id 5977527 subcategory image Note: Even if the value of the \f[I]extension\f[] key is missing or \f[I]None\f[], it will be filled in later when the file download is starting. This key is therefore always available to provide a valid filename extension. .SS extractor.*.directory .IP "Type:" 6 .br * \f[I]list\f[] of \f[I]strings\f[] .br * \f[I]object\f[] (condition -> \f[I]format strings\f[]) .IP "Example:" 4 .. code:: json ["{category}", "{manga}", "c{chapter} - {title}"] .. 
code:: json { "'nature' in content": ["Nature Pictures"], "retweet_id != 0" : ["{category}", "{user[name]}", "Retweets"], "" : ["{category}", "{user[name]}"] } .IP "Description:" 4 A list of \f[I]format strings\f[] to build target directory paths with. If this is an \f[I]object\f[], it must contain Python expressions mapping to the list of format strings to use. Each individual string in such a list represents a single path segment, which will be joined together and appended to the \f[I]base-directory\f[] to form the complete target directory path. .SS extractor.*.base-directory .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"./gallery-dl/"\f[] .IP "Description:" 4 Directory path used as base for all download destinations. .SS extractor.*.parent-directory .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Use an extractor's current target directory as \f[I]base-directory\f[] for any spawned child extractors. .SS extractor.*.metadata-parent .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 If \f[I]true\f[], overwrite any metadata provided by a child extractor with its parent's. If this is a \f[I]string\f[], add a parent's metadata to its children's .br to a field named after said string. For example with \f[I]"parent-metadata": "_p_"\f[]: .br .. code:: json { "id": "child-id", "_p_": {"id": "parent-id"} } .SS extractor.*.parent-skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Share number of skipped downloads between parent and child extractors. .SS extractor.*.path-restrict .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (character -> replacement character(s)) .IP "Default:" 9 \f[I]"auto"\f[] .IP "Example:" 4 .br * "/!? (){}" .br * {" ": "_", "/": "-", "|": "-", ":": "_-_", "*": "_+_"} .IP "Description:" 4 A string of characters to be replaced with the value of .br \f[I]path-replace\f[] or an object mapping invalid/unwanted characters to their replacements .br for generated path segment names. .br Special values: .br * \f[I]"auto"\f[]: Use characters from \f[I]"unix"\f[] or \f[I]"windows"\f[] depending on the local operating system .br * \f[I]"unix"\f[]: \f[I]"/"\f[] .br * \f[I]"windows"\f[]: \f[I]"\\\\\\\\|/<>:\\"?*"\f[] .br * \f[I]"ascii"\f[]: \f[I]"^0-9A-Za-z_."\f[] (only ASCII digits, letters, underscores, and dots) .br * \f[I]"ascii+"\f[]: \f[I]"^0-9@-[\\\\]-{ #-)+-.;=!}~"\f[] (all ASCII characters except the ones not allowed by Windows) Implementation Detail: For \f[I]strings\f[] with length >= 2, this option uses a \f[I]Regular Expression Character Set\f[], meaning that: .br * using a caret \f[I]^\f[] as first character inverts the set .br * character ranges are supported (\f[I]0-9a-z\f[]) .br * \f[I]]\f[], \f[I]-\f[], and \f[I]\\\f[] need to be escaped as \f[I]\\\\]\f[], \f[I]\\\\-\f[], and \f[I]\\\\\\\\\f[] respectively to use them as literal characters .SS extractor.*.path-replace .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"_"\f[] .IP "Description:" 4 The replacement character(s) for \f[I]path-restrict\f[] .SS extractor.*.path-remove .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"\\u0000-\\u001f\\u007f"\f[] (ASCII control characters) .IP "Description:" 4 Set of characters to remove from generated path names. Note: In a string with 2 or more characters, \f[I][]^-\\\f[] need to be escaped with backslashes, e.g. 
\f[I]"\\\\[\\\\]"\f[] .SS extractor.*.path-strip .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Set of characters to remove from the end of generated path segment names using \f[I]str.rstrip()\f[] Special values: .br * \f[I]"auto"\f[]: Use characters from \f[I]"unix"\f[] or \f[I]"windows"\f[] depending on the local operating system .br * \f[I]"unix"\f[]: \f[I]""\f[] .br * \f[I]"windows"\f[]: \f[I]". "\f[] .SS extractor.*.path-extended .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 On Windows, use \f[I]extended-length paths\f[] prefixed with \f[I]\\\\?\\\f[] to work around the 260 characters path length limit. .SS extractor.*.extension-map .IP "Type:" 6 \f[I]object\f[] (extension -> replacement) .IP "Default:" 9 .. code:: json { "jpeg": "jpg", "jpe" : "jpg", "jfif": "jpg", "jif" : "jpg", "jfi" : "jpg" } .IP "Description:" 4 A JSON \f[I]object\f[] mapping filename extensions to their replacements. .SS extractor.*.skip .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the behavior when downloading files that have been downloaded before, i.e. a file with the same filename already exists or its ID is in a \f[I]download archive\f[]. .br * \f[I]true\f[]: Skip downloads .br * \f[I]false\f[]: Overwrite already existing files .br * \f[I]"abort"\f[]: Stop the current extractor run .br * \f[I]"abort:N"\f[]: Skip downloads and stop the current extractor run after \f[I]N\f[] consecutive skips .br * \f[I]"terminate"\f[]: Stop the current extractor run, including parent extractors .br * \f[I]"terminate:N"\f[]: Skip downloads and stop the current extractor run, including parent extractors, after \f[I]N\f[] consecutive skips .br * \f[I]"exit"\f[]: Exit the program altogether .br * \f[I]"exit:N"\f[]: Skip downloads and exit the program after \f[I]N\f[] consecutive skips .br * \f[I]"enumerate"\f[]: Add an enumeration index to the beginning of the filename extension (\f[I]file.1.ext\f[], \f[I]file.2.ext\f[], etc.) .SS extractor.*.skip-filter .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Python expression controlling which skipped files to count towards \f[I]"abort"\f[] / \f[I]"terminate"\f[] / \f[I]"exit"\f[]. .SS extractor.*.sleep .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Number of seconds to sleep before each download. .SS extractor.*.sleep-extractor .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Number of seconds to sleep before handling an input URL, i.e. before starting a new extractor. .SS extractor.*.sleep-429 .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]60\f[] .IP "Description:" 4 Number of seconds to sleep when receiving a 429 Too Many Requests response before \f[I]retrying\f[] the request. 
.SS extractor.*.sleep-request .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 .br * \f[I]"0.5-1.5"\f[] \f[I]ao3\f[], \f[I]arcalive\f[], \f[I]civitai\f[], \f[I][Danbooru]\f[], \f[I][E621]\f[], \f[I][foolfuuka]:search\f[], \f[I]itaku\f[], \f[I]koharu\f[], \f[I]newgrounds\f[], \f[I][philomena]\f[], \f[I]pixiv:novel\f[], \f[I]plurk\f[], \f[I]poipiku\f[] , \f[I]pornpics\f[], \f[I]scrolller\f[], \f[I]soundgasm\f[], \f[I]urlgalleries\f[], \f[I]vk\f[], \f[I]webtoons\f[], \f[I]weebcentral\f[], \f[I]xfolio\f[], \f[I]zerochan\f[] .br * \f[I]"1.0"\f[] \f[I]furaffinity\f[] .br * \f[I]"1.0-2.0"\f[] \f[I]flickr\f[], \f[I]pexels\f[], \f[I]weibo\f[], \f[I][wikimedia]\f[] .br * \f[I]"1.4"\f[] \f[I]wallhaven\f[] .br * \f[I]"2.0-4.0"\f[] \f[I]behance\f[], \f[I]imagefap\f[], \f[I][Nijie]\f[] .br * \f[I]"3.0-6.0"\f[] \f[I]bilibili\f[], \f[I]exhentai\f[], \f[I]idolcomplex\f[], \f[I][reactor]\f[], \f[I]readcomiconline\f[] .br * \f[I]"6.0-6.1"\f[] \f[I]twibooru\f[] .br * \f[I]"6.0-12.0"\f[] \f[I]instagram\f[] .br * \f[I]0\f[] otherwise .IP "Description:" 4 Minimal time interval in seconds between each HTTP request during data extraction. .SS extractor.*.username & .password .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The username and password to use when attempting to log in to another site. This is supported for .br * \f[I]aibooru\f[] (*) .br * \f[I]ao3\f[] .br * \f[I]aryion\f[] .br * \f[I]atfbooru\f[] (*) .br * \f[I]bluesky\f[] .br * \f[I]booruvar\f[] (*) .br * \f[I]coomerparty\f[] .br * \f[I]danbooru\f[] (*) .br * \f[I]deviantart\f[] .br * \f[I]e621\f[] (*) .br * \f[I]e6ai\f[] (*) .br * \f[I]e926\f[] (*) .br * \f[I]exhentai\f[] .br * \f[I]horne\f[] (R) .br * \f[I]idolcomplex\f[] .br * \f[I]imgbb\f[] .br * \f[I]inkbunny\f[] .br * \f[I]kemonoparty\f[] .br * \f[I]koharu\f[] .br * \f[I]mangadex\f[] .br * \f[I]mangoxo\f[] .br * \f[I]newgrounds\f[] .br * \f[I]nijie\f[] (R) .br * \f[I]pillowfort\f[] .br * \f[I]sankaku\f[] .br * \f[I]scrolller\f[] .br * \f[I]seiga\f[] .br * \f[I]subscribestar\f[] .br * \f[I]tapas\f[] .br * \f[I]tsumino\f[] .br * \f[I]twitter\f[] .br * \f[I]vipergirls\f[] .br * \f[I]zerochan\f[] These values can also be specified via the \f[I]-u/--username\f[] and \f[I]-p/--password\f[] command-line options or by using a \f[I].netrc\f[] file. (see Authentication_) (*) The password value for these sites should be the API key found in your user profile, not the actual account password. (R) Login with username & password or supplying logged-in \f[I]cookies\f[] is required Note: Leave the \f[I]password\f[] value empty or undefined to be prompted for a password when performing a login (see \f[I]getpass()\f[]). .SS extractor.*.input .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] if stdin is attached to a terminal, \f[I]false\f[] otherwise .IP "Description:" 4 Allow prompting the user for interactive input. .SS extractor.*.netrc .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Enable the use of \f[I].netrc\f[] authentication data. .SS extractor.*.cookies .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]object\f[] (name -> value) .br * \f[I]list\f[] .IP "Description:" 4 Source to read additional cookies from. This can be .br * The \f[I]Path\f[] to a Mozilla/Netscape format cookies.txt file .. code:: json "~/.local/share/cookies-instagram-com.txt" .br * An \f[I]object\f[] specifying cookies as name-value pairs .. 
code:: json { "cookie-name": "cookie-value", "sessionid" : "14313336321%3AsabDFvuASDnlpb%3A31", "isAdult" : "1" } .br * A \f[I]list\f[] with up to 5 entries specifying a browser profile. .br * The first entry is the browser name .br * The optional second entry is a profile name or an absolute path to a profile directory .br * The optional third entry is the keyring to retrieve passwords for decrypting cookies from .br * The optional fourth entry is a (Firefox) container name (\f[I]"none"\f[] for only cookies with no container (default)) .br * The optional fifth entry is the domain to extract cookies for. Prefix it with a dot \f[I].\f[] to include cookies for subdomains. .. code:: json ["firefox"] ["firefox", null, null, "Personal"] ["chromium", "Private", "kwallet", null, ".twitter.com"] .SS extractor.*.cookies-select .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"random"\f[] .IP "Description:" 4 Interpret \f[I]extractor.cookies\f[] as a list of cookie sources and select one of them for each extractor run. .br * \f[I]"random"\f[]: Select cookies \f[I]randomly\f[] .br * \f[I]"rotate"\f[]: Select cookies in sequence. Start over from the beginning after reaching the end of the list. .. code:: json [ "~/.local/share/cookies-instagram-com-1.txt", "~/.local/share/cookies-instagram-com-2.txt", "~/.local/share/cookies-instagram-com-3.txt", ["firefox", null, null, "c1", ".instagram-com"], ] .SS extractor.*.cookies-update .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]Path\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Export session cookies in cookies.txt format. .br * If this is a \f[I]Path\f[], write cookies to the given file path. .br * If this is \f[I]true\f[] and \f[I]extractor.*.cookies\f[] specifies the \f[I]Path\f[] of a valid cookies.txt file, update its contents. .SS extractor.*.proxy .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (scheme -> proxy) .IP "Example:" 4 .. code:: json "http://10.10.1.10:3128" .. code:: json { "http" : "http://10.10.1.10:3128", "https": "http://10.10.1.10:1080", "http://10.20.1.128": "http://10.10.1.10:5323" } .IP "Description:" 4 Proxy (or proxies) to be used for remote connections. .br * If this is a \f[I]string\f[], it is the proxy URL for all outgoing requests. .br * If this is an \f[I]object\f[], it is a scheme-to-proxy mapping to specify different proxy URLs for each scheme. It is also possible to set a proxy for a specific host by using \f[I]scheme://host\f[] as key. See \f[I]Requests' proxy documentation\f[] for more details. Note: If a proxy URL does not include a scheme, \f[I]http://\f[] is assumed. .SS extractor.*.proxy-env .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Collect proxy configuration information from environment variables (\f[I]HTTP_PROXY\f[], \f[I]HTTPS_PROXY\f[], \f[I]NO_PROXY\f[]) and Windows Registry settings. .SS extractor.*.source-address .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] with 1 \f[I]string\f[] and 1 \f[I]integer\f[] as elements .IP "Example:" 4 .br * "192.168.178.20" .br * ["192.168.178.20", 8080] .IP "Description:" 4 Client-side IP address to bind to. Can be either a simple \f[I]string\f[] with just the local IP address .br or a \f[I]list\f[] with IP and explicit port number as elements. 
.br .SS extractor.*.user-agent .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 .br * \f[I]"gallery-dl/VERSION"\f[]: \f[I][Danbooru]\f[], \f[I]mangadex\f[], \f[I]weasyl\f[] .br * \f[I]"gallery-dl/VERSION (by mikf)"\f[]: \f[I][E621]\f[] .br * \f[I]"net.umanle.arca.android.playstore/0.9.75"\f[]: \f[I]arcalive\f[] .br * \f[I]"Patreon/72.2.28 (Android; Android 14; Scale/2.10)"\f[]: \f[I]patreon\f[] .br * \f[I]"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/LATEST.0.0.0 Safari/537.36"\f[]: \f[I]instagram\f[] .br * \f[I]"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:LATEST) Gecko/20100101 Firefox/LATEST"\f[]: otherwise .IP "Description:" 4 User-Agent header value used for HTTP requests. Setting this value to \f[I]"browser"\f[] will try to automatically detect and use the \f[I]User-Agent\f[] header of the system's default browser. Note: This option has *no* effect if \f[I]extractor.browser\f[] is enabled. .SS extractor.*.browser .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 .br * \f[I]"firefox"\f[]: \f[I]artstation\f[], \f[I]fanbox\f[], \f[I]mangasee\f[], \f[I]twitter\f[] .br * \f[I]null\f[]: otherwise .IP "Example:" 4 .br * "chrome:macos" .IP "Description:" 4 Try to emulate a real browser (\f[I]firefox\f[] or \f[I]chrome\f[]) by using their default HTTP headers and TLS ciphers for HTTP requests. Optionally, the operating system used in the \f[I]User-Agent\f[] header can be specified after a \f[I]:\f[] (\f[I]windows\f[], \f[I]linux\f[], or \f[I]macos\f[]). Note: This option overrides \f[I]user-agent\f[] and sets custom \f[I]headers\f[] and \f[I]ciphers\f[] defaults. Note: \f[I]requests\f[] and \f[I]urllib3\f[] only support HTTP/1.1, while a real browser would use HTTP/2. .SS extractor.*.referer .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Send \f[I]Referer\f[] headers with all outgoing HTTP requests. If this is a \f[I]string\f[], send it as Referer instead of the extractor's \f[I]root\f[] domain. .SS extractor.*.headers .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Default:" 9 .. code:: json { "User-Agent" : "", "Accept" : "*/*", "Accept-Language": "en-US,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Referer" : "" } .IP "Description:" 4 Additional \f[I]HTTP headers\f[] to be sent with each HTTP request, To disable sending a header, set its value to \f[I]null\f[]. .SS extractor.*.ciphers .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .. code:: json ["ECDHE-ECDSA-AES128-GCM-SHA256", "ECDHE-RSA-AES128-GCM-SHA256", "ECDHE-ECDSA-CHACHA20-POLY1305", "ECDHE-RSA-CHACHA20-POLY1305"] .IP "Description:" 4 List of TLS/SSL cipher suites in \f[I]OpenSSL cipher list format\f[] to be passed to \f[I]ssl.SSLContext.set_ciphers()\f[] .SS extractor.*.tls12 .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 .br * \f[I]false\f[]: \f[I]artstation\f[] .br * \f[I]true\f[]: otherwise .IP "Description:" 4 Allow selecting TLS 1.2 cipher suites. Can be disabled to alter TLS fingerprints and potentially bypass Cloudflare blocks. .SS extractor.*.keywords .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 {"type": "Pixel Art", "type_id": 123} .IP "Description:" 4 Additional name-value pairs to be added to each metadata dictionary. .SS extractor.*.keywords-eval .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Evaluate each \f[I]keywords\f[] \f[I]string\f[] value as a \f[I]format string\f[]. 
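A hypothetical snippet combining keywords and keywords-eval; the field names and the {user[name]} replacement are assumptions for this sketch and may not be available for every extractor:
.. code:: json

{
    "extractor": {
        "keywords": {
            "_comment": "'collection' is only evaluated as a format string because keywords-eval is enabled",
            "type": "Pixel Art",
            "collection": "{user[name]} favorites"
        },
        "keywords-eval": true
    }
}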
.SS extractor.*.keywords-default .IP "Type:" 6 any .IP "Default:" 9 \f[I]"None"\f[] .IP "Description:" 4 Default value used for missing or undefined keyword names in \f[I]format strings\f[]. .SS extractor.*.url-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert a file's download URL into its metadata dictionary as the given name. For example, setting this option to \f[I]"gdl_file_url"\f[] will cause a new metadata field with name \f[I]gdl_file_url\f[] to appear, which contains the current file's download URL. This can then be used in \f[I]filenames\f[], with a \f[I]metadata\f[] post processor, etc. .SS extractor.*.path-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert a reference to the current \f[I]PathFormat\f[] data structure into metadata dictionaries as the given name. For example, setting this option to \f[I]"gdl_path"\f[] would make it possible to access the current file's filename as \f[I]"{gdl_path.filename}"\f[]. .SS extractor.*.extractor-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert a reference to the current \f[I]Extractor\f[] object into metadata dictionaries as the given name. .SS extractor.*.http-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert an \f[I]object\f[] containing a file's HTTP headers and \f[I]filename\f[], \f[I]extension\f[], and \f[I]date\f[] parsed from them into metadata dictionaries as the given name. For example, setting this option to \f[I]"gdl_http"\f[] would make it possible to access the current file's \f[I]Last-Modified\f[] header as \f[I]"{gdl_http[Last-Modified]}"\f[] and its parsed form as \f[I]"{gdl_http[date]}"\f[]. .SS extractor.*.version-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert an \f[I]object\f[] containing gallery-dl's version info into metadata dictionaries as the given name. The content of the object is as follows: .. code:: json { "version" : "string", "is_executable" : "bool", "current_git_head": "string or null" } .SS extractor.*.category-transfer .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 Extractor-specific .IP "Description:" 4 Transfer an extractor's (sub)category values to all child extractors spawned by it, to let them inherit their parent's config options. .SS extractor.*.blacklist & .whitelist .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["oauth", "recursive", "test"]\f[] + current extractor category .IP "Example:" 4 ["imgur", "redgifs:user", "*:image"] .IP "Description:" 4 A list of extractor identifiers to ignore (or allow) when spawning child extractors for unknown URLs, e.g. from \f[I]reddit\f[] or \f[I]plurk\f[]. Each identifier can be .br * A category or basecategory name (\f[I]"imgur"\f[], \f[I]"mastodon"\f[]) .br * | A (base)category-subcategory pair, where both names are separated by a colon (\f[I]"redgifs:user"\f[]). Both names can be a * or left empty, matching all possible names (\f[I]"*:image"\f[], \f[I]":user"\f[]). .br Note: Any \f[I]blacklist\f[] setting will automatically include \f[I]"oauth"\f[], \f[I]"recursive"\f[], and \f[I]"test"\f[]. .SS extractor.*.archive .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Path\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 .br * "$HOME/.archives/{category}.sqlite3" .br * "postgresql://user:pass@host/database" .IP "Description:" 4 File to store IDs of downloaded files in. Downloads of files already recorded in this archive file will be \f[I]skipped\f[]. 
The resulting archive file is not a plain text file but an SQLite3 database, as either lookup operations are significantly faster or memory requirements are significantly lower when the amount of stored IDs gets reasonably large. If this value is a \f[I]PostgreSQL Connection URI\f[], the archive will use this PostgreSQL database as backend (requires \f[I]Psycopg\f[]). Note: Archive files that do not already exist get generated automatically. Note: Archive paths support regular \f[I]format string\f[] replacements, but be aware that using external inputs for building local paths may pose a security risk. .SS extractor.*.archive-event .IP "Type:" 6 + \f[I]string\f[] + \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Example:" 4 .br * "file,skip" .br * ["file", "skip"] .IP "Description:" 4 \f[I]Event(s)\f[] for which IDs get written to an \f[I]archive\f[]. Available events are: \f[I]file\f[], \f[I]skip\f[] .SS extractor.*.archive-format .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "{id}_{offset}" .IP "Description:" 4 An alternative \f[I]format string\f[] to build archive IDs with. .SS extractor.*.archive-mode .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 Controls when to write \f[I]archive IDs\f[] to the archive database. .br * \f[I]"file"\f[]: Write IDs immediately after completing or skipping a file download. .br * \f[I]"memory"\f[]: Keep IDs in memory and only write them after successful job completion. .SS extractor.*.archive-prefix .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 .br * \f[I]""\f[] when \f[I]archive-table\f[] is set .br * \f[I]"{category}"\f[] otherwise .IP "Description:" 4 Prefix for archive IDs. .SS extractor.*.archive-pragma .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["journal_mode=WAL", "synchronous=NORMAL"] .IP "Description:" 4 A list of SQLite \f[I]PRAGMA\f[] statements to run during archive initialization. See \f[I]\f[] for available \f[I]PRAGMA\f[] statements and further details. .SS extractor.*.archive-table .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"archive"\f[] .IP "Example:" 4 "{category}" .IP "Description:" 4 \f[I]Format string\f[] selecting the archive database table name. .SS extractor.*.actions .IP "Type:" 6 .br * \f[I]object\f[] (pattern -> action(s)) .br * \f[I]list\f[] of \f[I]lists\f[] with pattern -> action(s) pairs as elements .IP "Example:" 4 .. code:: json { "info:Logging in as .+" : "level = debug", "warning:(?i)unable to .+": "exit 127", "error" : [ "status \f[I]= 1", "exec notify.sh 'gdl error'", "abort" ] } .. code:: json [ ["info:Logging in as .+" , "level = debug"], ["warning:(?i)unable to .+", "exit 127" ], ["error" , [ "status \f[]= 1", "exec notify.sh 'gdl error'", "abort" ]] ] .IP "Description:" 4 Perform an \f[I]action\f[] when logging a message matched by \f[I]pattern\f[]. \f[I]pattern\f[] is parsed as severity level (\f[I]debug\f[], \f[I]info\f[], \f[I]warning\f[], \f[I]error\f[], or integer value) followed by an optional \f[I]Python Regular Expression\f[] separated by a colon \f[I]:\f[]. Using \f[I]*\f[] as level or leaving it empty matches logging messages of all levels (e.g. \f[I]*:\f[] or \f[I]:\f[]). \f[I]action\f[] is parsed as action type followed by (optional) arguments. It is possible to specify more than one \f[I]action\f[] per \f[I]pattern\f[] by providing them as a \f[I]list\f[]: \f[I]["", "", …]\f[] Supported Action Types: \f[I]status\f[]: Modify job exit status. .br Expected syntax is \f[I] \f[] (e.g. \f[I]= 100\f[]). 
.br Supported operators are \f[I]=\f[] (assignment), \f[I]&\f[] (bitwise AND), \f[I]|\f[] (bitwise OR), \f[I]^\f[] (bitwise XOR). \f[I]level\f[]: Modify severity level of the current logging message. .br Can be one of \f[I]debug\f[], \f[I]info\f[], \f[I]warning\f[], \f[I]error\f[] or an integer value. .br \f[I]print\f[]: Write argument to stdout. \f[I]exec\f[]: Run a shell command. \f[I]abort\f[]: Stop the current extractor run. \f[I]terminate\f[]: Stop the current extractor run, including parent extractors. \f[I]restart\f[]: Restart the current extractor run. \f[I]wait\f[]: Sleep for a given \f[I]Duration\f[] or .br wait until Enter is pressed when no argument was given. .br \f[I]exit\f[]: Exit the program with the given argument as exit status. .SS extractor.*.postprocessors .IP "Type:" 6 .br * \f[I]Postprocessor Configuration\f[] object .br * \f[I]list\f[] of \f[I]Postprocessor Configuration\f[] objects .IP "Example:" 4 .. code:: json [ { "name": "zip" , "compression": "store" }, { "name": "exec", "command": ["/home/foobar/script", "{category}", "{image_id}"] } ] .IP "Description:" 4 A list of \f[I]post processors\f[] to be applied to each downloaded file in the specified order. Unlike other options, a \f[I]postprocessors\f[] setting at a deeper level .br does not override any \f[I]postprocessors\f[] setting at a lower level. Instead, all post processors from all applicable \f[I]postprocessors\f[] .br settings get combined into a single list. For example .br * an \f[I]mtime\f[] post processor at \f[I]extractor.postprocessors\f[], .br * a \f[I]zip\f[] post processor at \f[I]extractor.pixiv.postprocessors\f[], .br * and using \f[I]--exec\f[] will run all three post processors - \f[I]mtime\f[], \f[I]zip\f[], \f[I]exec\f[] - for each downloaded \f[I]pixiv\f[] file. .SS extractor.*.postprocessor-options .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 .. code:: json { "archive": null, "keep-files": true } .IP "Description:" 4 Additional \f[I]Postprocessor Options\f[] that get added to each individual \f[I]post processor object\f[] before initializing it and evaluating filters. .SS extractor.*.retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]4\f[] .IP "Description:" 4 Maximum number of times a failed HTTP request is retried before giving up, or \f[I]-1\f[] for infinite retries. .SS extractor.*.retry-codes .IP "Type:" 6 \f[I]list\f[] of \f[I]integers\f[] .IP "Example:" 4 [404, 429, 430] .IP "Description:" 4 Additional \f[I]HTTP response status codes\f[] to retry an HTTP request on. \f[I]2xx\f[] codes (success responses) and \f[I]3xx\f[] codes (redirection messages) will never be retried and always count as success, regardless of this option. \f[I]5xx\f[] codes (server error responses) will always be retried, regardless of this option. .SS extractor.*.timeout .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]30.0\f[] .IP "Description:" 4 Amount of time (in seconds) to wait for a successful connection and response from a remote server. This value gets internally used as the \f[I]timeout\f[] parameter for the \f[I]requests.request()\f[] method. .SS extractor.*.verify .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to verify SSL/TLS certificates for HTTPS requests. If this is a \f[I]string\f[], it must be the path to a CA bundle to use instead of the default certificates. This value gets internally used as the \f[I]verify\f[] parameter for the \f[I]requests.request()\f[] method. 
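For illustration, the connection-related options above might be combined as follows; the CA bundle path is a placeholder and only needed when the default certificates should be replaced:
.. code:: json

{
    "extractor": {
        "retries": 6,
        "timeout": 20.0,
        "verify": "/etc/ssl/certs/custom-ca-bundle.pem"
    }
}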
.SS extractor.*.download .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to download media files. Setting this to \f[I]false\f[] won't download any files, but all other functions (\f[I]postprocessors\f[], \f[I]download archive\f[], etc.) will be executed as normal. .SS extractor.*.fallback .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Use fallback download URLs when a download fails. .SS extractor.*.image-range .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Examples:" 4 .br * \f[I]"10-20"\f[] .br * \f[I]"-5, 10, 30-50, 100-"\f[] .br * \f[I]"10:21, 30:51:2, :5, 100:"\f[] .br * \f[I]["-5", "10", "30-50", "100-"]\f[] .IP "Description:" 4 Index range(s) selecting which files to download. These can be specified as .br * index: \f[I]3\f[] (file number 3) .br * range: \f[I]2-4\f[] (files 2, 3, and 4) .br * \f[I]slice\f[]: \f[I]3:8:2\f[] (files 3, 5, and 7) Arguments for range and slice notation are optional .br and will default to begin (\f[I]1\f[]) or end (\f[I]sys.maxsize\f[]) if omitted. For example \f[I]5-\f[], \f[I]5:\f[], and \f[I]5::\f[] all mean "Start at file number 5". .br Note: The index of the first file is \f[I]1\f[]. .SS extractor.*.chapter-range .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Like \f[I]image-range\f[], but applies to delegated URLs like manga chapters, etc. .SS extractor.*.image-filter .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Examples:" 4 .br * \f[I]"re.search(r'foo(bar)+', description)"\f[] .br * \f[I]["width >= 1200", "width/height > 1.2"]\f[] .IP "Description:" 4 Python expression controlling which files to download. A file only gets downloaded when *all* of the given expressions evaluate to \f[I]True\f[]. Available values are the filename-specific ones listed by \f[I]-K\f[] or \f[I]-j\f[]. .SS extractor.*.chapter-filter .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Examples:" 4 .br * \f[I]"lang == 'en'"\f[] .br * \f[I]["language == 'French'", "10 <= chapter < 20"]\f[] .IP "Description:" 4 Like \f[I]image-filter\f[], but applies to delegated URLs like manga chapters, etc. .SS extractor.*.image-unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Ignore image URLs that have been encountered before during the current extractor run. .SS extractor.*.chapter-unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Like \f[I]image-unique\f[], but applies to delegated URLs like manga chapters, etc. .SS extractor.*.date-format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"%Y-%m-%dT%H:%M:%S"\f[] .IP "Description:" 4 Format string used to parse \f[I]string\f[] values of date-min and date-max. See \f[I]strptime\f[] for a list of formatting directives. Note: Despite its name, this option does **not** control how \f[I]{date}\f[] metadata fields are formatted. To use a different formatting for those values other than the default \f[I]%Y-%m-%d %H:%M:%S\f[], put \f[I]strptime\f[] formatting directives after a colon \f[I]:\f[], for example \f[I]{date:%Y%m%d}\f[]. .SS extractor.*.write-pages .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 During data extraction, write received HTTP request data to enumerated files in the current working directory. Special values: .br * \f[I]"all"\f[]: Include HTTP request and response headers. 
Hide \f[I]Authorization\f[], \f[I]Cookie\f[], and \f[I]Set-Cookie\f[] values. .br * \f[I]"ALL"\f[]: Include all HTTP request and response headers. .SH EXTRACTOR-SPECIFIC OPTIONS .SS extractor.ao3.formats .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"pdf"\f[] .IP "Example:" 4 .br * "azw3,epub,mobi,pdf,html" .br * ["azw3", "epub", "mobi", "pdf", "html"] .IP "Description:" 4 Format(s) to download. .SS extractor.arcalive.emoticons .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download emoticon images. .SS extractor.arcalive.gifs .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Try to download \f[I].gif\f[] versions of \f[I].mp4\f[] videos. \f[I]true\f[] | \f[I]"fallback\f[] Use the \f[I].gif\f[] version as primary URL and provide the \f[I].mp4\f[] one as \f[I]fallback\f[]. \f[I]"check"\f[] Check whether a \f[I].gif\f[] version is available by sending an extra HEAD request. \f[I]false\f[] Always download the \f[I].mp4\f[] version. .SS extractor.artstation.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Try to follow external URLs of embedded players. .SS extractor.artstation.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Limit the number of posts/projects to download. .SS extractor.artstation.previews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download video previews. .SS extractor.artstation.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video clips. .SS extractor.artstation.search.pro-first .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable the "Show Studio and Pro member artwork first" checkbox when retrieving search results. .SS extractor.aryion.recursive .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the post extraction strategy. .br * \f[I]true\f[]: Start on users' main gallery pages and recursively descend into subfolders .br * \f[I]false\f[]: Get posts from "Latest Updates" pages .SS extractor.batoto.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Example:" 4 "mangatoto.org" .IP "Description:" 4 Specifies the domain used by \f[I]batoto\f[] extractors. \f[I]"auto"\f[] | \f[I]"url"\f[] Use the input URL's domain \f[I]"nolegacy"\f[] Use the input URL's domain .br - replace legacy domains with \f[I]"xbato.org"\f[] \f[I]"nowarn"\f[] Use the input URL's domain .br - do not warn about legacy domains any \f[I]string\f[] Use this domain .SS extractor.bbc.width .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]1920\f[] .IP "Description:" 4 Specifies the requested image width. This value must be divisble by 16 and gets rounded down otherwise. The maximum possible value appears to be \f[I]1920\f[]. .SS extractor.behance.modules .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["image", "video", "mediacollection", "embed"]\f[] .IP "Description:" 4 Selects which gallery modules to download from. Supported module types are \f[I]image\f[], \f[I]video\f[], \f[I]mediacollection\f[], \f[I]embed\f[], \f[I]text\f[]. .SS extractor.[blogger].api-key .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Custom Blogger API key. 
https://developers.google.com/blogger/docs/3.0/using#APIKey .SS extractor.[blogger].videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download embedded videos hosted on https://www.blogger.com/ .SS extractor.bluesky.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 .br * \f[I]"posts"\f[] if \f[I]reposts\f[] or \f[I]quoted\f[] is enabled .br * \f[I]"media"\f[] otherwise .IP "Example:" 4 .br * "avatar,background,posts" .br * ["avatar", "background", "posts"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"info"\f[], \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"posts"\f[], \f[I]"replies"\f[], \f[I]"media"\f[], \f[I]"video"\f[], \f[I]"likes"\f[], It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.bluesky.likes.endpoint .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"listRecords"\f[] .IP "Description:" 4 API endpoint to use for retrieving liked posts. \f[I]"listRecords"\f[] Use the results from .br \f[I]com.atproto.repo.listRecords\f[] Requires no login and alows accessing likes of all users, .br but uses one request to \f[I]getPostThread\f[] per post, \f[I]"getActorLikes"\f[] Use the results from .br \f[I]app.bsky.feed.getActorLikes\f[] Requires login and only allows accessing your own likes. .br .SS extractor.bluesky.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "facets,user" .br * ["facets", "user"] .IP "Description:" 4 Extract additional metadata. .br * \f[I]facets\f[]: \f[I]hashtags\f[], \f[I]mentions\f[], and \f[I]uris\f[] .br * \f[I]user\f[]: detailed \f[I]user\f[] metadata for the user referenced in the input URL (See \f[I]app.bsky.actor.getProfile\f[]). .SS extractor.bluesky.likes.depth .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Sets the maximum depth of returned reply posts. (See depth parameter of \f[I]app.bsky.feed.getPostThread\f[]) .SS extractor.bluesky.quoted .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from quoted posts. .SS extractor.bluesky.reposts .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Process reposts. .SS extractor.bluesky.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download videos. .SS extractor.boosty.allowed .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Request only available posts. .SS extractor.boosty.bought .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Request only purchased posts for \f[I]feed\f[] results. .SS extractor.boosty.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Provide detailed \f[I]user\f[] metadata. .SS extractor.boosty.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Example:" 4 ["full_hd", "high", "medium"] .IP "Description:" 4 Download videos. If this is a \f[I]list\f[], it selects which format to try to download. 
.br Possibly available formats are .br .br * \f[I]ultra_hd\f[] (2160p) .br * \f[I]quad_hd\f[] (1440p) .br * \f[I]full_hd\f[] (1080p) .br * \f[I]high\f[] (720p) .br * \f[I]medium\f[] (480p) .br * \f[I]low\f[] (360p) .br * \f[I]lowest\f[] (240p) .br * \f[I]tiny\f[] (144p) .SS extractor.bunkr.endpoint .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"/api/_001"\f[] .IP "Description:" 4 API endpoint for retrieving file URLs. .SS extractor.bunkr.tlds .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls which \f[I]bunkr\f[] TLDs to accept. .br * \f[I]true\f[]: Match URLs with *all* possible TLDs (e.g. \f[I]bunkr.xyz\f[] or \f[I]bunkrrr.duck\f[]) .br * \f[I]false\f[]: Match only URLs with known TLDs .SS extractor.cien.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["image", "video", "download", "gallery"]\f[] .IP "Description:" 4 Determines the type and order of files to download. Available types are \f[I]image\f[], \f[I]video\f[], \f[I]download\f[], \f[I]gallery\f[]. .SS extractor.civitai.api .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"trpc"\f[] .IP "Description:" 4 Selects which API endpoints to use. .br * \f[I]"rest"\f[]: \f[I]Public REST API\f[] .br * \f[I]"trpc"\f[]: Internal tRPC API .SS extractor.civitai.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 The API Key value generated in your \f[I]User Account Settings\f[] to make authorized API requests. See \f[I]API/Authorization\f[] for details. .SS extractor.civitai.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["image"]\f[] .IP "Description:" 4 Determines the type and order of files to download when processing models. Available types are \f[I]model\f[], \f[I]image\f[], \f[I]gallery\f[]. .SS extractor.civitai.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["user-models", "user-posts"]\f[] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are .br * \f[I]"user-models"\f[] .br * \f[I]"user-posts"\f[] .br * \f[I]"user-images"\f[] .br * \f[I]"user-videos"\f[] It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.civitai.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "generation,version" .br * ["generation", "version"] .IP "Description:" 4 Extract additional \f[I]generation\f[] and \f[I]version\f[] metadata. Note: This requires 1 additional HTTP request per image or video. .SS extractor.civitai.nsfw .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] (\f[I]"api": "rest"\f[]) .br * \f[I]integer\f[] (\f[I]"api": "trpc"\f[]) .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download NSFW-rated images. .br * For \f[I]"api": "rest"\f[], this can be one of \f[I]"None"\f[], \f[I]"Soft"\f[], \f[I]"Mature"\f[], \f[I]"X"\f[] to set the highest returned mature content flag. .br * For \f[I]"api": "trpc"\f[], this can be an \f[I]integer\f[] whose bits select the returned mature content flags. 
For example, \f[I]28\f[] (\f[I]4|8|16\f[]) would return only \f[I]R\f[], \f[I]X\f[], and \f[I]XXX\f[] rated images, while \f[I]3\f[] (\f[I]1|2\f[]) would return only \f[I]None\f[] and \f[I]Soft\f[] rated images. .SS extractor.civitai.quality .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"original=true"\f[] .IP "Example:" 4 .br * "width=1280,quality=90" .br * ["width=1280", "quality=90"] .IP "Description:" 4 A (comma-separated) list of image quality options to pass with every image URL. Known available options include \f[I]original\f[], \f[I]quality\f[], \f[I]width\f[]. Note: Set this option to an arbitrary letter, e.g., \f[I]"w"\f[], to download images in JPEG format at their original resolution. .SS extractor.civitai.quality-videos .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"quality=100"\f[] .IP "Example:" 4 .br * "+transcode=true,quality=100" .br * ["+", "transcode=true", "quality=100"] .IP "Description:" 4 A (comma-separated) list of video quality options to pass with every video URL. Known available options include \f[I]original\f[], \f[I]quality\f[], \f[I]transcode\f[]. Use \f[I]+\f[] as first character to add the given options to the \f[I]quality\f[] ones. .SS extractor.cyberdrop.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "cyberdrop.to" .IP "Description:" 4 Specifies the domain used by \f[I]cyberdrop\f[] regardless of input URL. Setting this option to \f[I]"auto"\f[] uses the same domain as a given input URL. .SS extractor.[Danbooru].external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For unavailable or restricted posts, follow the \f[I]source\f[] and download from there if possible. .SS extractor.[Danbooru].favgroup.order-posts .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"pool"\f[] .IP "Description:" 4 Controls the order in which \f[I]pool\f[]/\f[I]favgroup\f[] posts are returned. \f[I]"pool"\f[] | \f[I]"pool_asc"\f[] | \f[I]"asc"\f[] | \f[I]"asc_pool"\f[] Pool order \f[I]"pool_desc"\f[] | \f[I]"desc_pool"\f[] | \f[I]"desc"\f[] Reverse Pool order \f[I]"id"\f[] | \f[I]"id_desc"\f[] | \f[I]"desc_id"\f[] Descending Post ID order \f[I]"id_asc"\f[] | \f[I]"asc_id"\f[] Ascending Post ID order .SS extractor.[Danbooru].ugoira .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls the download target for Ugoira posts. .br * \f[I]true\f[]: Original ZIP archives .br * \f[I]false\f[]: Converted video files .SS extractor.[Danbooru].metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "replacements,comments,ai_tags" .br * ["replacements", "comments", "ai_tags"] .IP "Description:" 4 Extract additional metadata (notes, artist commentary, parent, children, uploader). It is possible to specify a custom list of metadata includes. See \f[I]available_includes\f[] for possible field names. \f[I]aibooru\f[] also supports \f[I]ai_metadata\f[]. Note: This requires 1 additional HTTP request per 200-post batch. .SS extractor.[Danbooru].threshold .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Stop paginating over API results if the length of a batch of returned posts is less than the specified number. Defaults to the per-page limit of the current instance, which is 200. 
Note: Changing this setting is normally not necessary. When the value is greater than the per-page limit, gallery-dl will stop after the first batch. The value cannot be less than 1. .SS extractor.deviantart.auto-watch .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Automatically watch users when encountering "Watchers-Only Deviations" (requires a \f[I]refresh-token\f[]). .SS extractor.deviantart.auto-unwatch .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 After watching a user through \f[I]auto-watch\f[], unwatch that user at the end of the current extractor run. .SS extractor.deviantart.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]comments\f[] metadata. .SS extractor.deviantart.comments-avatars .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download the avatar of each commenting user. Note: Enabling this option also enables deviantart.comments_. .SS extractor.deviantart.extra .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download extra Sta.sh resources from description texts and journals. Note: Enabling this option also enables deviantart.metadata_. .SS extractor.deviantart.flat .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Select the directory structure created by the Gallery- and Favorite-Extractors. .br * \f[I]true\f[]: Use a flat directory structure. .br * \f[I]false\f[]: Collect a list of all gallery-folders or favorites-collections and transfer any further work to other extractors (\f[I]folder\f[] or \f[I]collection\f[]), which will then create individual subdirectories for each of them. Note: Going through all gallery folders will not be able to fetch deviations which aren't in any folder. .SS extractor.deviantart.folders .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Provide a \f[I]folders\f[] metadata field that contains the names of all folders a deviation is present in. Note: Gathering this information requires a lot of API calls. Use with caution. .SS extractor.deviantart.group .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check whether the profile name in a given URL belongs to a group or a regular user. When disabled, assume every given profile name belongs to a regular user. Special values: .br * \f[I]"skip"\f[]: Skip groups .SS extractor.deviantart.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Example:" 4 .br * "favorite,journal,scraps" .br * ["favorite", "journal", "scraps"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"gallery"\f[], \f[I]"scraps"\f[], \f[I]"journal"\f[], \f[I]"favorite"\f[], \f[I]"status"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.deviantart.intermediary .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 For older non-downloadable images, download a higher-quality \f[I]/intermediary/\f[] version. .SS extractor.deviantart.journals .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"html"\f[] .IP "Description:" 4 Selects the output format for textual content. This includes journals, literature and status updates. 
.br * \f[I]"html"\f[]: HTML with (roughly) the same layout as on DeviantArt. .br * \f[I]"text"\f[]: Plain text with image references and HTML tags removed. .br * \f[I]"none"\f[]: Don't download textual content. .SS extractor.deviantart.jwt .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Update \f[I]JSON Web Tokens\f[] (the \f[I]token\f[] URL parameter) of otherwise non-downloadable, low-resolution images to be able to download them in full resolution. Note: No longer functional as of 2023-10-11 .SS extractor.deviantart.mature .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable mature content. This option simply sets the \f[I]mature_content\f[] parameter for API calls to either \f[I]"true"\f[] or \f[I]"false"\f[] and does not do any other form of content filtering. .SS extractor.deviantart.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "stats,submission" .br * ["camera", "stats", "submission"] .IP "Description:" 4 Extract additional metadata for deviation objects. Provides \f[I]description\f[], \f[I]tags\f[], \f[I]license\f[], and \f[I]is_watching\f[] fields when enabled. It is possible to request extended metadata by specifying a list of .br * \f[I]camera\f[] : EXIF information (if available) .br * \f[I]stats\f[] : deviation statistics .br * \f[I]submission\f[] : submission information .br * \f[I]collection\f[] : favourited folder information (requires a \f[I]refresh token\f[]) .br * \f[I]gallery\f[] : gallery folder information (requires a \f[I]refresh token\f[]) Set this option to \f[I]"all"\f[] to request all extended metadata categories. See \f[I]/deviation/metadata\f[] for official documentation. .SS extractor.deviantart.original .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download original files if available. Setting this option to \f[I]"images"\f[] only downloads original files if they are images and falls back to preview versions for everything else (archives, videos, etc.). .SS extractor.deviantart.pagination .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"api"\f[] .IP "Description:" 4 Controls when to stop paginating over API results. .br * \f[I]"api"\f[]: Trust the API and stop when \f[I]has_more\f[] is \f[I]false\f[]. .br * \f[I]"manual"\f[]: Disregard \f[I]has_more\f[] and only stop when a batch of results is empty. .SS extractor.deviantart.previews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For non-image files (archives, videos, etc.), also download the file's preview image. Set this option to \f[I]"all"\f[] to download previews for all files. .SS extractor.deviantart.public .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Use a public access token for API requests. Disable this option to *force* using a private token for all requests when a \f[I]refresh token\f[] is provided. .SS extractor.deviantart.quality .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]100\f[] .IP "Description:" 4 JPEG quality level of images for which an original file download is not available. Set this to \f[I]"png"\f[] to download a PNG version of these images instead. 
.SS extractor.deviantart.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from \f[I]linking your DeviantArt account to gallery-dl\f[]. Using a \f[I]refresh-token\f[] allows you to access private or otherwise not publicly available deviations. Note: The \f[I]refresh-token\f[] becomes invalid \f[I]after 3 months\f[] or whenever your \f[I]cache file\f[] is deleted or cleared. .SS extractor.deviantart.wait-min .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Minimum wait time in seconds before API requests. .SS extractor.deviantart.avatar.formats .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["original.jpg", "big.jpg", "big.gif", ".png"] .IP "Description:" 4 Avatar URL formats to return. Each format is parsed as \f[I]SIZE.EXT\f[]. .br Leave \f[I]SIZE\f[] empty to download the regular, small avatar format. .br .SS extractor.deviantart.folder.subfolders .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Also extract subfolder content. .SS extractor.discord.embeds .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["image", "gifv", "video"]\f[] .IP "Description:" 4 Selects which embed types to download from. Supported embed types are \f[I]image\f[], \f[I]gifv\f[], \f[I]video\f[], \f[I]rich\f[], \f[I]article\f[], \f[I]link\f[]. .SS extractor.discord.threads .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract threads from Discord text channels. .SS extractor.discord.token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Discord Bot Token for API requests. You can follow \f[I]this guide\f[] to get a token. .SS extractor.[E621].metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "notes,pools" .br * ["notes", "pools"] .IP "Description:" 4 Extract additional metadata (notes, pool metadata) if available. Note: This requires 0-2 additional HTTP requests per post. .SS extractor.[E621].threshold .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Stop paginating over API results if the length of a batch of returned posts is less than the specified number. Defaults to the per-page limit of the current instance, which is 320. Note: Changing this setting is normally not necessary. When the value is greater than the per-page limit, gallery-dl will stop after the first batch. The value cannot be less than 1. .SS extractor.exhentai.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 .br * \f[I]"auto"\f[]: Use \f[I]e-hentai.org\f[] or \f[I]exhentai.org\f[] depending on the input URL .br * \f[I]"e-hentai.org"\f[]: Use \f[I]e-hentai.org\f[] for all URLs .br * \f[I]"exhentai.org"\f[]: Use \f[I]exhentai.org\f[] for all URLs .SS extractor.exhentai.fallback-retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]2\f[] .IP "Description:" 4 Number of times a failed image gets retried or \f[I]-1\f[] for infinite retries. .SS extractor.exhentai.fav .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "4" .IP "Description:" 4 After downloading a gallery, add it to your account's favorites as the given category number. Note: Set this to "favdel" to remove galleries from your favorites. Note: This will remove any Favorite Notes when applied to already favorited galleries. 
.SS extractor.exhentai.gp .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"resized"\f[] .IP "Description:" 4 Selects how to handle "you do not have enough GP" errors. .br * "resized": Continue downloading \f[I]non-original\f[] images. .br * "stop": Stop the current extractor run. .br * "wait": Wait for user input before retrying the current image. .SS extractor.exhentai.limits .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Sets a custom image download limit and stops extraction when it gets exceeded. .SS extractor.exhentai.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Load extended gallery metadata from the \f[I]API\f[]. Adds \f[I]archiver_key\f[], \f[I]posted\f[], and \f[I]torrents\f[]. Makes \f[I]date\f[] and \f[I]filesize\f[] more precise. .SS extractor.exhentai.original .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download full-sized original images if available. .SS extractor.exhentai.source .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Description:" 4 Selects an alternative source to download files from. .br * \f[I]"hitomi"\f[]: Download the corresponding gallery from \f[I]hitomi.la\f[] .SS extractor.exhentai.tags .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Group \f[I]tags\f[] by type and provide them as \f[I]tags_\f[] metadata fields, for example \f[I]tags_artist\f[] or \f[I]tags_character\f[]. .SS extractor.facebook.author-followups .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "description:" 4 Extract comments that include photo attachments made by the author of the post. .SS extractor.facebook.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. .br * \f[I]true\f[]: Extract and download video & audio separately. .br * \f[I]"ytdl"\f[]: Let \f[I]ytdl\f[] handle video extraction and download, and merge video & audio streams. .br * \f[I]false\f[]: Ignore videos. .SS extractor.fanbox.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]comments\f[] metadata. Note: This requires 1 or more additional API requests per post, depending on the number of comments. .SS extractor.fanbox.embeds .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control behavior on embedded content from external sites. .br * \f[I]true\f[]: Extract embed URLs and download them if supported (videos are not downloaded). .br * \f[I]"ytdl"\f[]: Like \f[I]true\f[], but let \f[I]ytdl\f[] handle video extraction and download for YouTube, Vimeo, and SoundCloud embeds. .br * \f[I]false\f[]: Ignore embeds. .SS extractor.fanbox.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * user,plan,comments .br * ["user", "plan", "comments"] .IP "Description:" 4 Extract \f[I]plan\f[] and extended \f[I]user\f[] metadata. 
Supported fields when selecting which data to extract are .br * \f[I]comments\f[] .br * \f[I]plan\f[] .br * \f[I]user\f[] Note: \f[I]comments\f[] can also be enabled via \f[I]fanbox.comments\f[] .SS extractor.flickr.access-token & .access-token-secret .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]access_token\f[] and \f[I]access_token_secret\f[] values you get from \f[I]linking your Flickr account to gallery-dl\f[]. .SS extractor.flickr.contexts .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For each photo, return the albums and pools it belongs to as \f[I]set\f[] and \f[I]pool\f[] metadata. Note: This requires 1 additional API call per photo. See \f[I]flickr.photos.getAllContexts\f[] for details. .SS extractor.flickr.exif .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For each photo, return its EXIF/TIFF/GPS tags as \f[I]exif\f[] and \f[I]camera\f[] metadata. Note: This requires 1 additional API call per photo. See \f[I]flickr.photos.getExif\f[] for details. .SS extractor.flickr.info .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For each photo, retrieve its "full" metadata as provided by \f[I]flickr.photos.getInfo\f[] Note: This requires 1 additional API call per photo. .SS extractor.flickr.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * license,last_update,machine_tags .br * ["license", "last_update", "machine_tags"] .IP "Description:" 4 Extract additional metadata (license, date_taken, original_format, last_update, geo, machine_tags, o_dims) It is possible to specify a custom list of metadata includes. See \f[I]the extras parameter\f[] in \f[I]Flickr's API docs\f[] for possible field names. .SS extractor.flickr.profile .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional \f[I]user\f[] profile metadata. Note: This requires 1 additional API call per user profile. See \f[I]flickr.people.getInfo\f[] for details. .SS extractor.flickr.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract and download videos. .SS extractor.flickr.size-max .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Sets the maximum allowed size for downloaded images. .br * If this is an \f[I]integer\f[], it specifies the maximum image dimension (width and height) in pixels. .br * If this is a \f[I]string\f[], it should be one of Flickr's format specifiers (\f[I]"Original"\f[], \f[I]"Large"\f[], ... or \f[I]"o"\f[], \f[I]"k"\f[], \f[I]"h"\f[], \f[I]"l"\f[], ...) to use as an upper limit. .SS extractor.furaffinity.descriptions .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"text"\f[] .IP "Description:" 4 Controls the format of \f[I]description\f[] metadata fields. .br * \f[I]"text"\f[]: Plain text with HTML tags removed .br * \f[I]"html"\f[]: Raw HTML content .SS extractor.furaffinity.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow external URLs linked in descriptions. 
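.PP
As a sketch of how the flickr options above combine in a configuration file (the token values are placeholders and must be replaced with your own):
.PP
.nf
{
    "extractor": {
        "flickr": {
            "access-token": "...",
            "access-token-secret": "...",
            "exif": true,
            "metadata": "license,last_update,machine_tags",
            "size-max": "Large"
        }
    }
}
.fi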
.SS extractor.furaffinity.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Example:" 4 .br * "scraps,favorite" .br * ["scraps", "favorite"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"gallery"\f[], \f[I]"scraps"\f[], \f[I]"favorite"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.furaffinity.layout .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Selects which site layout to expect when parsing posts. .br * \f[I]"auto"\f[]: Automatically differentiate between \f[I]"old"\f[] and \f[I]"new"\f[] .br * \f[I]"old"\f[]: Expect the *old* site layout .br * \f[I]"new"\f[]: Expect the *new* site layout .SS extractor.gelbooru.api-key & .user-id .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Values from the API Access Credentials section found at the bottom of your \f[I]Account Options\f[] page. .SS extractor.gelbooru.favorite.order-posts .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"desc"\f[] .IP "Description:" 4 Controls the order in which favorited posts are returned. .br * \f[I]"asc"\f[]: Ascending favorite date order (oldest first) .br * \f[I]"desc"\f[]: Descending favorite date order (newest first) .br * \f[I]"reverse"\f[]: Same as \f[I]"asc"\f[] .SS extractor.generic.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Match **all** URLs not otherwise supported by gallery-dl, even ones without a \f[I]generic:\f[] prefix. .SS extractor.gofile.api-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 API token value found at the bottom of your \f[I]profile page\f[]. If not set, a temporary guest token will be used. .SS extractor.gofile.website-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 API token value used during API requests. An invalid or not up-to-date value will result in \f[I]401 Unauthorized\f[] errors. Keeping this option unset will use an extra HTTP request to attempt to fetch the current value used by gofile. .SS extractor.gofile.recursive .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Recursively download files from subfolders. .SS extractor.hentaifoundry.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"pictures"\f[] .IP "Example:" 4 .br * "scraps,stories" .br * ["scraps", "stories"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"pictures"\f[], \f[I]"scraps"\f[], \f[I]"stories"\f[], \f[I]"favorite"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.hitomi.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"webp"\f[] .IP "Description:" 4 Selects which image format to download. Available formats are \f[I]"webp"\f[] and \f[I]"avif"\f[]. .SS extractor.imagechest.access-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Your personal Image Chest access token. These tokens allow using the API instead of having to scrape HTML pages, providing more detailed metadata. (\f[I]date\f[], \f[I]description\f[], etc) See https://imgchest.com/docs/api/1.0/general/authorization for instructions on how to generate such a token. 
.SS extractor.imgur.client-id .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Custom Client ID value for API requests. .SS extractor.imgur.mp4 .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to choose the GIF or MP4 version of an animation. .br * \f[I]true\f[]: Follow Imgur's advice and choose MP4 if the \f[I]prefer_video\f[] flag in an image's metadata is set. .br * \f[I]false\f[]: Always choose GIF. .br * \f[I]"always"\f[]: Always choose MP4. .SS extractor.inkbunny.orderby .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"create_datetime"\f[] .IP "Description:" 4 Value of the \f[I]orderby\f[] parameter for submission searches. (See \f[I]API#Search\f[] for details) .SS extractor.instagram.api .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"rest"\f[] .IP "Description:" 4 Selects which API endpoints to use. .br * \f[I]"rest"\f[]: REST API - higher-resolution media .br * \f[I]"graphql"\f[]: GraphQL API - lower-resolution media .SS extractor.instagram.cursor .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Example:" 4 "3414259811154179155_25025320" .IP "Description:" 4 Controls from which position to start the extraction process from. .br * \f[I]true\f[]: Start from the beginning. Log the most recent \f[I]cursor\f[] value when interrupted before reaching the end. .br * \f[I]false\f[]: Start from the beginning. .br * any \f[I]string\f[]: Start from the position defined by this value. .SS extractor.instagram.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"posts"\f[] .IP "Example:" 4 .br * "stories,highlights,posts" .br * ["stories", "highlights", "posts"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"posts"\f[], \f[I]"reels"\f[], \f[I]"tagged"\f[], \f[I]"stories"\f[], \f[I]"highlights"\f[], \f[I]"info"\f[], \f[I]"avatar"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.instagram.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Limit the number of posts to download. .SS extractor.instagram.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Provide extended \f[I]user\f[] metadata even when referring to a user by ID, e.g. \f[I]instagram.com/id:12345678\f[]. Note: This metadata is always available when referring to a user by name, e.g. \f[I]instagram.com/USERNAME\f[]. .SS extractor.instagram.order-files .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"asc"\f[] .IP "Description:" 4 Controls the order in which files of each post are returned. .br * \f[I]"asc"\f[]: Same order as displayed in a post .br * \f[I]"desc"\f[]: Reverse order as displayed in a post .br * \f[I]"reverse"\f[]: Same as \f[I]"desc"\f[] Note: This option does *not* affect \f[I]{num}\f[]. To enumerate files in reverse order, use \f[I]count - num + 1\f[]. .SS extractor.instagram.order-posts .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"asc"\f[] .IP "Description:" 4 Controls the order in which posts are returned. .br * \f[I]"asc"\f[]: Same order as displayed .br * \f[I]"desc"\f[]: Reverse order as displayed .br * \f[I]"id"\f[] or \f[I]"id_asc"\f[]: Ascending order by ID .br * \f[I]"id_desc"\f[]: Descending order by ID .br * \f[I]"reverse"\f[]: Same as \f[I]"desc"\f[] Note: This option only affects \f[I]highlights\f[]. 
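.PP
An illustrative instagram configuration combining several of the options above (values chosen only as an example):
.PP
.nf
{
    "extractor": {
        "instagram": {
            "include": "stories,highlights,posts",
            "max-posts": 100,
            "order-files": "desc",
            "cursor": false
        }
    }
}
.fi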
.SS extractor.instagram.previews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download video previews. .SS extractor.instagram.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls video download behavior. \f[I]true\f[] | \f[I]"dash"\f[] | \f[I]"ytdl"\f[] Download videos from \f[I]video_dash_manifest\f[] data using \f[I]ytdl\f[] \f[I]"merged"\f[] Download pre-merged video formats \f[I]false\f[] Do not download videos .SS extractor.instagram.stories.split .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Split \f[I]stories\f[] elements into separate posts. .SS extractor.itaku.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.kemonoparty.archives .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata for \f[I]archives\f[] files, including \f[I]file\f[], \f[I]file_list\f[], and \f[I]password\f[]. Note: This requires 1 additional HTTP request per \f[I]archives\f[] file. .SS extractor.kemonoparty.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]comments\f[] metadata. Note: This requires 1 additional HTTP request per post. .SS extractor.kemonoparty.duplicates .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls how to handle duplicate files in a post. .br * \f[I]true\f[]: Download duplicates .br * \f[I]false\f[]: Ignore duplicates .SS extractor.kemonoparty.dms .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract a user's direct messages as \f[I]dms\f[] metadata. .SS extractor.kemonoparty.announcements .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract a user's announcements as \f[I]announcements\f[] metadata. .SS extractor.kemonoparty.endpoint .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"posts"\f[] .IP "Description:" 4 API endpoint to use for retrieving creator posts. \f[I]"legacy"\f[] Use the results from .br \f[I]/v1/{service}/user/{creator_id}/posts-legacy\f[] Provides less metadata, but is more reliable at returning all posts. .br Supports filtering results by \f[I]tag\f[] query parameter. .br \f[I]"legacy+"\f[] Use the results from .br \f[I]/v1/{service}/user/{creator_id}/posts-legacy\f[] to retrieve post IDs and one request to .br \f[I]/v1/{service}/user/{creator_id}/post/{post_id}\f[] to get a full set of metadata for each. \f[I]"posts"\f[] Use the results from .br \f[I]/v1/{service}/user/{creator_id}\f[] Provides more metadata, but might not return a creator's first/last posts. .br .SS extractor.kemonoparty.favorites .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"artist"\f[] .IP "Description:" 4 Determines the type of favorites to be downloaded. Available types are \f[I]artist\f[] and \f[I]post\f[]. .SS extractor.kemonoparty.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["attachments", "file", "inline"]\f[] .IP "Description:" 4 Determines the type and order of files to be downloaded. Available types are \f[I]file\f[], \f[I]attachments\f[], and \f[I]inline\f[]. .SS extractor.kemonoparty.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Limit the number of posts to download. 
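.PP
A possible kemonoparty configuration using the options above (example values only):
.PP
.nf
{
    "extractor": {
        "kemonoparty": {
            "endpoint": "legacy",
            "files": ["file", "attachments", "inline"],
            "comments": true,
            "duplicates": false
        }
    }
}
.fi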
.SS extractor.kemonoparty.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract \f[I]username\f[] and \f[I]user_profile\f[] metadata. .SS extractor.kemonoparty.revisions .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract post revisions. Set this to \f[I]"unique"\f[] to filter out duplicate revisions. Note: This requires 1 additional HTTP request per post. .SS extractor.kemonoparty.order-revisions .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"desc"\f[] .IP "Description:" 4 Controls the order in which \f[I]revisions\f[] are returned. .br * \f[I]"asc"\f[]: Ascending order (oldest first) .br * \f[I]"desc"\f[]: Descending order (newest first) .br * \f[I]"reverse"\f[]: Same as \f[I]"asc"\f[] .SS extractor.khinsider.covers .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download album cover images. .SS extractor.khinsider.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"mp3"\f[] .IP "Description:" 4 The name of the preferred file format to download. Use \f[I]"all"\f[] to download all available formats, or a (comma-separated) list to select multiple formats. If the selected format is not available, the first in the list gets chosen (usually mp3). .SS extractor.koharu.cbz .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download each gallery as a single \f[I].cbz\f[] file. Disabling this option causes a gallery to be downloaded as individual image files. .SS extractor.koharu.format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["0", "1600", "1280", "980", "780"]\f[] .IP "Description:" 4 Name(s) of the image format to download. When more than one format is given, the first available one is selected. Possible formats are .br \f[I]"780"\f[], \f[I]"980"\f[], \f[I]"1280"\f[], \f[I]"1600"\f[], \f[I]"0"\f[] (original) .br .SS extractor.koharu.tags .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Group \f[I]tags\f[] by type and provide them as \f[I]tags_\f[] metadata fields, for example \f[I]tags_artist\f[] or \f[I]tags_character\f[]. .SS extractor.lolisafe.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Specifies the domain used by a \f[I]lolisafe\f[] extractor regardless of input URL. Setting this option to \f[I]"auto"\f[] uses the same domain as a given input URL. .SS extractor.luscious.gif .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Format in which to download animated images. Use \f[I]true\f[] to download animated images as gifs and \f[I]false\f[] to download as mp4 videos. .SS extractor.mangadex.api-server .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"https://api.mangadex.org"\f[] .IP "Description:" 4 The server to use for API requests. .SS extractor.mangadex.api-parameters .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 {"order[updatedAt]": "desc"} .IP "Description:" 4 Additional query parameters to send when fetching manga chapters. (See \f[I]/manga/{id}/feed\f[] and \f[I]/user/follows/manga/feed\f[]) .SS extractor.mangadex.lang .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "en" .br * "fr,it" .br * ["fr", "it"] .IP "Description:" 4 \f[I]ISO 639-1\f[] language codes to filter chapters by. 
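.PP
For example, a mangadex configuration restricting chapters to certain languages and passing extra query parameters might be sketched as (values are illustrative):
.PP
.nf
{
    "extractor": {
        "mangadex": {
            "lang": ["en", "fr"],
            "api-parameters": {"order[updatedAt]": "desc"}
        }
    }
}
.fi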
.SS extractor.mangadex.ratings .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["safe", "suggestive", "erotica", "pornographic"]\f[] .IP "Description:" 4 List of acceptable content ratings for returned chapters. .SS extractor.mangapark.source .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Example:" 4 .br * "koala:en" .br * 15150116 .IP "Description:" 4 Select chapter source and language for a manga. The general syntax is \f[I]"SOURCE:LANGUAGE"\f[], i.e. a source name followed by an \f[I]ISO 639-1\f[] language code. .br Both are optional, meaning \f[I]"koala"\f[], \f[I]"koala:"\f[], \f[I]":en"\f[], .br or even just \f[I]":"\f[] are possible as well. Specifying the numeric \f[I]ID\f[] of a source is also supported. .SS extractor.[mastodon].access-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]access-token\f[] value you get from \f[I]linking your account to gallery-dl\f[]. Note: gallery-dl comes with built-in tokens for \f[I]mastodon.social\f[], \f[I]pawoo\f[] and \f[I]baraag\f[]. For other instances, you need to obtain an \f[I]access-token\f[] in order to use usernames in place of numerical user IDs. .SS extractor.[mastodon].cards .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from cards. .SS extractor.[mastodon].reblogs .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from reblogged posts. .SS extractor.[mastodon].replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other posts. .SS extractor.[mastodon].text-posts .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also emit metadata for text-only posts without media content. .SS extractor.[misskey].access-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Your access token, necessary to fetch favorited notes. .SS extractor.[misskey].renotes .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from renoted notes. .SS extractor.[misskey].replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other notes. .SS extractor.[moebooru].pool.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract extended \f[I]pool\f[] metadata. Note: Not supported by all \f[I]moebooru\f[] instances. .SS extractor.naver.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download videos. .SS extractor.newgrounds.flash .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download original Adobe Flash animations instead of pre-rendered videos. .SS extractor.newgrounds.format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"original"\f[] .IP "Example:" 4 .br * "720p" .br * ["mp4", "mov", "1080p", "720p"] .IP "Description:" 4 Selects the preferred format for video downloads. If the selected format is not available, the next smaller one gets chosen. If this is a \f[I]list\f[], try each given filename extension in original resolution or recoded format until an available format is found. .SS extractor.newgrounds.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"art"\f[] .IP "Example:" 4 .br * "movies,audio" .br * ["movies", "audio"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. 
Possible values are \f[I]"art"\f[], \f[I]"audio"\f[], \f[I]"games"\f[], \f[I]"movies"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.nijie.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"illustration,doujin"\f[] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"illustration"\f[], \f[I]"doujin"\f[], \f[I]"favorite"\f[], \f[I]"nuita"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.nitter.quoted .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from quoted Tweets. .SS extractor.nitter.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from Retweets. .SS extractor.nitter.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. .br * \f[I]true\f[]: Download videos .br * \f[I]"ytdl"\f[]: Download videos using \f[I]ytdl\f[] .br * \f[I]false\f[]: Skip video Tweets .SS extractor.oauth.browser .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls how a user is directed to an OAuth authorization page. .br * \f[I]true\f[]: Use Python's \f[I]webbrowser.open()\f[] method to automatically open the URL in the user's default browser. .br * \f[I]false\f[]: Ask the user to copy & paste an URL from the terminal. .SS extractor.oauth.cache .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Store tokens received during OAuth authorizations in \f[I]cache\f[]. .SS extractor.oauth.host .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"localhost"\f[] .IP "Description:" 4 Host name / IP address to bind to during OAuth authorization. .SS extractor.oauth.port .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]6414\f[] .IP "Description:" 4 Port number to listen on during OAuth authorization. Note: All redirects will go to port \f[I]6414\f[], regardless of the port specified here. You'll have to manually adjust the port number in your browser's address bar when using a different port than the default. .SS extractor.paheal.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata (\f[I]source\f[], \f[I]uploader\f[]) Note: This requires 1 additional HTTP request per post. .SS extractor.patreon.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["images", "image_large", "attachments", "postfile", "content"]\f[] .IP "Description:" 4 Determines types and order of files to download. Available types: .br * \f[I]postfile\f[] .br * \f[I]images\f[] .br * \f[I]image_large\f[] .br * \f[I]attachments\f[] .br * \f[I]content\f[] .SS extractor.patreon.format-images .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"download_url"\f[] .IP "Description:" 4 Selects the format of \f[I]images\f[] \f[I]files\f[]. 
Possible formats: .br * \f[I]download_url\f[] (\f[I]"a":1,"p":1\f[]) .br * \f[I]url\f[] (\f[I]"w":620\f[]) .br * \f[I]original\f[] (\f[I]"q":100,"webp":0\f[]) .br * \f[I]default\f[] (\f[I]"w":620\f[]) .br * \f[I]default_small\f[] (\f[I]"w":360\f[]) .br * \f[I]default_blurred\f[] (\f[I]"w":620\f[]) .br * \f[I]default_blurred_small\f[] (\f[I]"w":360\f[]) .br * \f[I]thumbnail\f[] (\f[I]"h":360,"w":360\f[]) .br * \f[I]thumbnail_large\f[] (\f[I]"h":1080,"w":1080\f[]) .br * \f[I]thumbnail_small\f[] (\f[I]"h":100,"w":100\f[]) .SS extractor.[philomena].api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your account's API Key, to use your personal browsing settings and filters. .SS extractor.[philomena].filter .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 .br * \f[I]derpibooru\f[]: \f[I]56027\f[] (\f[I]Everything\f[] filter) .br * \f[I]ponybooru\f[]: \f[I]3\f[] (\f[I]Nah.\f[] filter) .br * otherwise: \f[I]2\f[] .IP "Description:" 4 The content filter ID to use. Setting an explicit filter ID overrides any default filters and can be used to access 18+ content without an \f[I]API Key\f[]. See \f[I]Filters\f[] for details. .SS extractor.[philomena].svg .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download SVG versions of images when available. Try to download the \f[I]view_url\f[] version of these posts when this option is disabled. .SS extractor.pillowfort.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow links to external sites, e.g. Twitter. .SS extractor.pillowfort.inline .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract inline images. .SS extractor.pillowfort.reblogs .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract media from reblogged posts. .SS extractor.pinterest.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Specifies the domain used by \f[I]pinterest\f[] extractors. Setting this option to \f[I]"auto"\f[] uses the same domain as a given input URL. .SS extractor.pinterest.sections .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include pins from board sections. .SS extractor.pinterest.stories .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract files from story pins. .SS extractor.pinterest.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download from video pins. .SS extractor.pixeldrain.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Your account's \f[I]API key\f[] .SS extractor.pixeldrain.recursive .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Recursively download files from subfolders. .SS extractor.pixiv.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"artworks"\f[] .IP "Example:" 4 .br * "avatar,background,artworks" .br * ["avatar", "background", "artworks"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"artworks"\f[], \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"favorite"\f[], \f[I]"novel-user"\f[], \f[I]"novel-bookmark"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. 
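.PP
A small sketch of the pinterest and pixeldrain options just described (the API key value is a placeholder):
.PP
.nf
{
    "extractor": {
        "pinterest": {
            "domain": "auto",
            "sections": false,
            "videos": true
        },
        "pixeldrain": {
            "api-key": "...",
            "recursive": true
        }
    }
}
.fi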
.SS extractor.pixiv.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from running \f[I]gallery-dl oauth:pixiv\f[] (see OAuth) or by using a third-party tool like \f[I]gppt\f[]. .SS extractor.pixiv.novel.covers .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download cover images. .SS extractor.pixiv.novel.embeds .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download embedded images. .SS extractor.pixiv.novel.full-series .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 When downloading a novel being part of a series, download all novels of that series. .SS extractor.pixiv.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch extended \f[I]user\f[] metadata. .SS extractor.pixiv.metadata-bookmark .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For works bookmarked by \f[I]your own account\f[], fetch bookmark tags as \f[I]tags_bookmark\f[] metadata. Note: This requires 1 additional API request per bookmarked post. .SS extractor.pixiv.captions .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For works with seemingly empty \f[I]caption\f[] metadata, try to grab the actual \f[I]caption\f[] value using the AJAX API. .SS extractor.pixiv.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch \f[I]comments\f[] metadata. Note: This requires 1 or more additional API requests per post, depending on the number of comments. .SS extractor.pixiv.work.related .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also download related artworks. .SS extractor.pixiv.tags .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"japanese"\f[] .IP "Description:" 4 Controls the \f[I]tags\f[] metadata field. .br * "japanese": List of Japanese tags .br * "translated": List of translated tags .br * "original": Unmodified list with both Japanese and translated tags .SS extractor.pixiv.ugoira .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download Pixiv's Ugoira animations. These animations come as a \f[I].zip\f[] archive containing all animation frames in JPEG format by default. Set this option to \f[I]"original"\f[] to download them as individual, higher-quality frames. Use an ugoira post processor to convert them to watchable animations. .SS extractor.pixiv.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 When downloading galleries, this sets the maximum number of posts to get. A value of \f[I]0\f[] means no limit. .SS extractor.pixiv.sanity .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Try to fetch \f[I]limit_sanity_level\f[] works via web API. .SS extractor.plurk.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also search Plurk comments for URLs. .SS extractor.[postmill].save-link-post-body .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Whether or not to save the body for link/image posts. .SS extractor.reactor.gif .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Format in which to download animated images. Use \f[I]true\f[] to download animated images as gifs and \f[I]false\f[] to download as mp4 videos. 
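.PP
A rough pixiv configuration illustrating the options above (the refresh-token value is a placeholder for the one obtained via \f[I]gallery-dl oauth:pixiv\f[]):
.PP
.nf
{
    "extractor": {
        "pixiv": {
            "refresh-token": "...",
            "ugoira": "original",
            "tags": "translated",
            "metadata-bookmark": true
        }
    }
}
.fi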
.SS extractor.readcomiconline.captcha .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"stop"\f[] .IP "Description:" 4 Controls how to handle redirects to CAPTCHA pages. .br * \f[I]"stop"\f[]: Stop the current extractor run. .br * \f[I]"wait"\f[]: Ask the user to solve the CAPTCHA and wait. .SS extractor.readcomiconline.quality .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Sets the \f[I]quality\f[] query parameter of issue pages. (\f[I]"lq"\f[] or \f[I]"hq"\f[]) \f[I]"auto"\f[] uses the quality parameter of the input URL or \f[I]"hq"\f[] if not present. .SS extractor.reddit.comments .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 The value of the \f[I]limit\f[] parameter when loading a submission and its comments. This number (roughly) specifies the total amount of comments being retrieved with the first API call. Reddit's internal default and maximum values for this parameter appear to be 200 and 500 respectively. The value \f[I]0\f[] ignores all comments and significantly reduces the time required when scanning a subreddit. .SS extractor.reddit.morecomments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Retrieve additional comments by resolving the \f[I]more\f[] comment stubs in the base comment tree. Note: This requires 1 additional API call for every 100 extra comments. .SS extractor.reddit.embeds .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download embedded comments media. .SS extractor.reddit.date-min & .date-max .IP "Type:" 6 \f[I]Date\f[] .IP "Default:" 9 \f[I]0\f[] and \f[I]253402210800\f[] (timestamp of \f[I]datetime.max\f[]) .IP "Description:" 4 Ignore all submissions posted before/after this date. .SS extractor.reddit.id-min & .id-max .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "6kmzv2" .IP "Description:" 4 Ignore all submissions posted before/after the submission with this ID. .SS extractor.reddit.previews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 For failed downloads from external URLs / child extractors, download Reddit's preview image/video if available. .SS extractor.reddit.recursion .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Reddit extractors can recursively visit other submissions linked to in the initial set of submissions. This value sets the maximum recursion depth. Special values: .br * \f[I]0\f[]: Recursion is disabled .br * \f[I]-1\f[]: Infinite recursion (don't do this) .SS extractor.reddit.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from \f[I]linking your Reddit account to gallery-dl\f[]. Using a \f[I]refresh-token\f[] allows you to access private or otherwise not publicly available subreddits, given that your account is authorized to do so, but requests to the reddit API will be rate limited to 600 requests every 10 minutes (600 seconds). .SS extractor.reddit.selftext .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 .br * \f[I]true\f[] if \f[I]comments\f[] are enabled .br * \f[I]false\f[] otherwise .IP "Description:" 4 Follow links in the original post's \f[I]selftext\f[]. .SS extractor.reddit.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. 
.br * \f[I]true\f[]: Download videos and use \f[I]ytdl\f[] to handle HLS and DASH manifests .br * \f[I]"ytdl"\f[]: Download videos and let \f[I]ytdl\f[] handle all of video extraction and download .br * \f[I]"dash"\f[]: Extract DASH manifest URLs and use \f[I]ytdl\f[] to download and merge them. (*) .br * \f[I]false\f[]: Ignore videos (*) This saves 1 HTTP request per video and might potentially be able to download otherwise deleted videos, but it will not always get the best video quality available. .SS extractor.redgifs.format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["hd", "sd", "gif"]\f[] .IP "Description:" 4 List of names of the preferred animation format, which can be \f[I]"hd"\f[], \f[I]"sd"\f[], \f[I]"gif"\f[], \f[I]"thumbnail"\f[], \f[I]"vthumbnail"\f[], or \f[I]"poster"\f[]. If a selected format is not available, the next one in the list will be tried until an available format is found. If the format is given as \f[I]string\f[], it will be extended with \f[I]["hd", "sd", "gif"]\f[]. Use a list with one element to restrict it to only one possible format. .SS extractor.rule34xyz.format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["10", "40", "41", "2"]\f[] .IP "Example:" 4 "33,34,4" .IP "Description:" 4 Selects the file format to extract. When more than one format is given, the first available one is selected. .SS extractor.sankaku.id-format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"numeric"\f[] .IP "Description:" 4 Format of \f[I]id\f[] metadata fields. .br * \f[I]"alphanumeric"\f[] or \f[I]"alnum"\f[]: 11-character alphanumeric IDs (\f[I]y0abGlDOr2o\f[]) .br * \f[I]"numeric"\f[] or \f[I]"legacy"\f[]: numeric IDs (\f[I]360451\f[]) .SS extractor.sankaku.refresh .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Refresh download URLs before they expire. .SS extractor.sankaku.tags .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Group \f[I]tags\f[] by type and .br provide them as \f[I]tags_TYPE\f[] and \f[I]tag_string_TYPE\f[] metadata fields, for example \f[I]tags_artist\f[] and \f[I]tags_character\f[]. .br \f[I]true\f[] Enable general \f[I]tags\f[] categories Requires: .br * 1 additional API request per 100 tags per post \f[I]"extended"\f[] Group \f[I]tags\f[] by the new, extended tag category system used on \f[I]chan.sankakucomplex.com\f[] Requires: .br * 1 additional HTTP request per post .br * logged-in \f[I]cookies\f[] to fetch full \f[I]tags\f[] category data \f[I]false\f[] Disable \f[I]tags\f[] categories .SS extractor.sankakucomplex.embeds .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download video embeds from external sites. .SS extractor.sankakucomplex.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download videos. .SS extractor.skeb.article .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download article images. .SS extractor.skeb.sent-requests .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download sent requests. .SS extractor.skeb.thumbnails .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download thumbnails. 
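.PP
As an example of the redgifs and sankaku options described above, restricting redgifs to a single format (a one-element list) and enabling extended sankaku tag categories might look like this (illustrative only):
.PP
.nf
{
    "extractor": {
        "redgifs": {
            "format": ["hd"]
        },
        "sankaku": {
            "id-format": "alphanumeric",
            "tags": "extended"
        }
    }
}
.fi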
.SS extractor.skeb.search.filters .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["genre:art", "genre:voice", "genre:novel", "genre:video", "genre:music", "genre:correction"]\f[] .IP "Example:" 4 "genre:music OR genre:voice" .IP "Description:" 4 Filters used during searches. .SS extractor.smugmug.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.steamgriddb.animated .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include animated assets when downloading from a list of assets. .SS extractor.steamgriddb.epilepsy .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include assets tagged with epilepsy when downloading from a list of assets. .SS extractor.steamgriddb.dimensions .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Examples:" 4 .br * \f[I]"1024x512,512x512"\f[] .br * \f[I]["460x215", "920x430"]\f[] .IP "Description:" 4 Only include assets that are in the specified dimensions. \f[I]all\f[] can be used to specify all dimensions. Valid values are: .br * Grids: \f[I]460x215\f[], \f[I]920x430\f[], \f[I]600x900\f[], \f[I]342x482\f[], \f[I]660x930\f[], \f[I]512x512\f[], \f[I]1024x1024\f[] .br * Heroes: \f[I]1920x620\f[], \f[I]3840x1240\f[], \f[I]1600x650\f[] .br * Logos: N/A (will be ignored) .br * Icons: \f[I]8x8\f[], \f[I]10x10\f[], \f[I]14x14\f[], \f[I]16x16\f[], \f[I]20x20\f[], \f[I]24x24\f[], \f[I]28x28\f[], \f[I]32x32\f[], \f[I]35x35\f[], \f[I]40x40\f[], \f[I]48x48\f[], \f[I]54x54\f[], \f[I]56x56\f[], \f[I]57x57\f[], \f[I]60x60\f[], \f[I]64x64\f[], \f[I]72x72\f[], \f[I]76x76\f[], \f[I]80x80\f[], \f[I]90x90\f[], \f[I]96x96\f[], \f[I]100x100\f[], \f[I]114x114\f[], \f[I]120x120\f[], \f[I]128x128\f[], \f[I]144x144\f[], \f[I]150x150\f[], \f[I]152x152\f[], \f[I]160x160\f[], \f[I]180x180\f[], \f[I]192x192\f[], \f[I]194x194\f[], \f[I]256x256\f[], \f[I]310x310\f[], \f[I]512x512\f[], \f[I]768x768\f[], \f[I]1024x1024\f[] .SS extractor.steamgriddb.file-types .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Examples:" 4 .br * \f[I]"png,jpeg"\f[] .br * \f[I]["jpeg", "webp"]\f[] .IP "Description:" 4 Only include assets that are in the specified file types. \f[I]all\f[] can be used to specify all file types. Valid values are: .br * Grids: \f[I]png\f[], \f[I]jpeg\f[], \f[I]jpg\f[], \f[I]webp\f[] .br * Heroes: \f[I]png\f[], \f[I]jpeg\f[], \f[I]jpg\f[], \f[I]webp\f[] .br * Logos: \f[I]png\f[], \f[I]webp\f[] .br * Icons: \f[I]png\f[], \f[I]ico\f[] .SS extractor.steamgriddb.download-fake-png .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download fake PNGs alongside the real file. .SS extractor.steamgriddb.humor .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include assets tagged with humor when downloading from a list of assets. .SS extractor.steamgriddb.languages .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Examples:" 4 .br * \f[I]"en,km"\f[] .br * \f[I]["fr", "it"]\f[] .IP "Description:" 4 Only include assets that are in the specified languages. \f[I]all\f[] can be used to specify all languages. Valid values are \f[I]ISO 639-1\f[] language codes. 
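Taken together, the \f[I]steamgriddb\f[] filter options above can restrict downloads by dimensions, file types, and languages. A sketch with illustrative values drawn from the lists above:
.. code:: json
{
  "extractor": {
    "steamgriddb": {
      "dimensions": "460x215,920x430",
      "file-types": ["png", "webp"],
      "languages": "en",
      "animated": false
    }
  }
}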
.SS extractor.steamgriddb.nsfw .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include assets tagged with adult content when downloading from a list of assets. .SS extractor.steamgriddb.sort .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]score_desc\f[] .IP "Description:" 4 Set the chosen sorting method when downloading from a list of assets. Can be one of: .br * \f[I]score_desc\f[] (Highest Score (Beta)) .br * \f[I]score_asc\f[] (Lowest Score (Beta)) .br * \f[I]score_old_desc\f[] (Highest Score (Old)) .br * \f[I]score_old_asc\f[] (Lowest Score (Old)) .br * \f[I]age_desc\f[] (Newest First) .br * \f[I]age_asc\f[] (Oldest First) .SS extractor.steamgriddb.static .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include static assets when downloading from a list of assets. .SS extractor.steamgriddb.styles .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]all\f[] .IP "Examples:" 4 .br * \f[I]white,black\f[] .br * \f[I]["no_logo", "white_logo"]\f[] .IP "Description:" 4 Only include assets that are in the specified styles. \f[I]all\f[] can be used to specify all styles. Valid values are: .br * Grids: \f[I]alternate\f[], \f[I]blurred\f[], \f[I]no_logo\f[], \f[I]material\f[], \f[I]white_logo\f[] .br * Heroes: \f[I]alternate\f[], \f[I]blurred\f[], \f[I]material\f[] .br * Logos: \f[I]official\f[], \f[I]white\f[], \f[I]black\f[], \f[I]custom\f[] .br * Icons: \f[I]official\f[], \f[I]custom\f[] .SS extractor.steamgriddb.untagged .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include untagged assets when downloading from a list of assets. .SS extractor.[szurubooru].username & .token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Username and login token of your account to access private resources. To generate a token, visit \f[I]/user/USERNAME/list-tokens\f[] and click \f[I]Create Token\f[]. .SS extractor.tenor.format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["gif", "mp4", "webm", "webp"]\f[] .IP "Description:" 4 List of names of the preferred animation format. If a selected format is not available, the next one in the list will be tried until a format is found. Possible formats include .br * \f[I]gif\f[] .br * \f[I]gif_transparent\f[] .br * \f[I]mediumgif\f[] .br * \f[I]gifpreview\f[] .br * \f[I]tinygif\f[] .br * \f[I]tinygif_transparent\f[] .br * \f[I]mp4\f[] .br * \f[I]tinymp4\f[] .br * \f[I]webm\f[] .br * \f[I]webp\f[] .br * \f[I]webp_transparent\f[] .br * \f[I]tinywebp\f[] .br * \f[I]tinywebp_transparent\f[] .SS extractor.tiktok.audio .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls audio download behavior. .br * \f[I]true\f[]: Download audio tracks .br * \f[I]"ytdl"\f[]: Download audio tracks using \f[I]ytdl\f[] .br * \f[I]false\f[]: Ignore audio tracks .SS extractor.tiktok.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download videos using \f[I]ytdl\f[]. .SS extractor.tiktok.user.avatar .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download user avatars. .SS extractor.tiktok.user.module .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Name or filesystem path of the \f[I]ytdl\f[] Python module to extract posts from a \f[I]tiktok\f[] user profile with. See \f[I]extractor.ytdl.module\f[]. 
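As an illustration of the \f[I]tiktok\f[] options above: subcategory options such as \f[I]user.avatar\f[] and \f[I]user.module\f[] are nested one level deeper in the configuration file. A sketch with illustrative values, assuming the usual nesting of subcategory options:
.. code:: json
{
  "extractor": {
    "tiktok": {
      "audio": false,
      "videos": true,
      "user": {
        "avatar": false,
        "module": "yt-dlp"
      }
    }
  }
}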
.SS extractor.tiktok.user.tiktok-range .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]""\f[] .IP "Example:" 4 "1-20" .IP "Description:" 4 Range or playlist indices of \f[I]tiktok\f[] user posts to extract. See \f[I]ytdl/playlist_items\f[] for details. .SS extractor.tumblr.avatar .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download blog avatars. .SS extractor.tumblr.date-min & .date-max .IP "Type:" 6 \f[I]Date\f[] .IP "Default:" 9 \f[I]0\f[] and \f[I]null\f[] .IP "Description:" 4 Ignore all posts published before/after this date. .SS extractor.tumblr.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow external URLs (e.g. from "Link" posts) and try to extract images from them. .SS extractor.tumblr.inline .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Search posts for inline images and videos. .SS extractor.tumblr.offset .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Custom \f[I]offset\f[] starting value when paginating over blog posts. Allows skipping over posts without having to waste API calls. .SS extractor.tumblr.original .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download full-resolution \f[I]photo\f[] and \f[I]inline\f[] images. For each photo with "maximum" resolution (width equal to 2048 or height equal to 3072) or each inline image, use an extra HTTP request to find the URL to its full-resolution version. .SS extractor.tumblr.pagination .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"offset"\f[] .IP "Description:" 4 Controls how to paginate over blog posts. .br * \f[I]"api"\f[]: \f[I]next\f[] parameter provided by the API (potentially misses posts due to a \f[I]bug\f[] in Tumblr's API) .br * \f[I]"before"\f[]: timestamp of last post .br * \f[I]"offset"\f[]: post offset number .SS extractor.tumblr.ratelimit .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"abort"\f[] .IP "Description:" 4 Selects how to handle exceeding the daily API rate limit. .br * \f[I]"abort"\f[]: Raise an error and stop extraction .br * \f[I]"wait"\f[]: Wait until rate limit reset .SS extractor.tumblr.reblogs .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 .br * \f[I]true\f[]: Extract media from reblogged posts .br * \f[I]false\f[]: Skip reblogged posts .br * \f[I]"same-blog"\f[]: Skip reblogged posts unless the original post is from the same blog .SS extractor.tumblr.posts .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Example:" 4 .br * "video,audio,link" .br * ["video", "audio", "link"] .IP "Description:" 4 A (comma-separated) list of post types to extract images, etc. from. Possible types are \f[I]text\f[], \f[I]quote\f[], \f[I]link\f[], \f[I]answer\f[], \f[I]video\f[], \f[I]audio\f[], \f[I]photo\f[], \f[I]chat\f[]. It is possible to use \f[I]"all"\f[] instead of listing all types separately. .SS extractor.tumblr.fallback-delay .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]120.0\f[] .IP "Description:" 4 Number of seconds to wait between retries for fetching full-resolution images. .SS extractor.tumblr.fallback-retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]2\f[] .IP "Description:" 4 Number of retries for fetching full-resolution images or \f[I]-1\f[] for infinite retries. 
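A combined sketch of the \f[I]tumblr\f[] options above (illustrative values, not defaults):
.. code:: json
{
  "extractor": {
    "tumblr": {
      "posts": ["photo", "video"],
      "reblogs": "same-blog",
      "ratelimit": "wait",
      "fallback-retries": 3
    }
  }
}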
.SS extractor.twibooru.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Twibooru API Key\f[], to use your account's browsing settings and filters. .SS extractor.twibooru.filter .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]2\f[] (\f[I]Everything\f[] filter) .IP "Description:" 4 The content filter ID to use. Setting an explicit filter ID overrides any default filters and can be used to access 18+ content without an \f[I]API Key\f[]. See \f[I]Filters\f[] for details. .SS extractor.twibooru.svg .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download SVG versions of images when available. Try to download the \f[I]view_url\f[] version of these posts when this option is disabled. .SS extractor.twitter.ads .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from promoted Tweets. .SS extractor.twitter.cards .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls how to handle \f[I]Twitter Cards\f[]. .br * \f[I]false\f[]: Ignore cards .br * \f[I]true\f[]: Download image content from supported cards .br * \f[I]"ytdl"\f[]: Additionally download video content from unsupported cards using \f[I]ytdl\f[] .SS extractor.twitter.cards-blacklist .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["summary", "youtube.com", "player:twitch.tv"] .IP "Description:" 4 List of card types to ignore. Possible values are .br * card names .br * card domains .br * \f[I]name:domain\f[] pairs .SS extractor.twitter.conversations .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For input URLs pointing to a single Tweet, e.g. https://twitter.com/i/web/status/TWEET_ID, fetch media from all Tweets and replies in this \f[I]conversation\f[]. If this option is equal to \f[I]"accessible"\f[], only download from conversation Tweets if the given initial Tweet is accessible. .SS extractor.twitter.csrf .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"cookies"\f[] .IP "Description:" 4 Controls how to handle Cross Site Request Forgery (CSRF) tokens. .br * \f[I]"auto"\f[]: Always auto-generate a token. .br * \f[I]"cookies"\f[]: Use token given by the \f[I]ct0\f[] cookie if present. .SS extractor.twitter.cursor .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Example:" 4 "1/DAABCgABGVKi5lE___oKAAIYbfYNcxrQLggAAwAAAAIAAA" .IP "Description:" 4 Controls from which position to start the extraction process. .br * \f[I]true\f[]: Start from the beginning. Log the most recent \f[I]cursor\f[] value when interrupted before reaching the end. .br * \f[I]false\f[]: Start from the beginning. .br * any \f[I]string\f[]: Start from the position defined by this value. Note: A \f[I]cursor\f[] value from one timeline cannot be used with another. .SS extractor.twitter.expand .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For each Tweet, return *all* Tweets from that initial Tweet's conversation or thread, i.e. *expand* all Twitter threads. Going through a timeline with this option enabled is essentially the same as running \f[I]gallery-dl https://twitter.com/i/web/status/TWEET_ID\f[] with the \f[I]conversations\f[] option enabled for each Tweet in said timeline. Note: This requires at least 1 additional API call per initial Tweet.
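To illustrate the \f[I]twitter\f[] options above, a sketch that downloads card media, ignores a few card types, only expands accessible conversations, and always starts timelines from the beginning (illustrative values):
.. code:: json
{
  "extractor": {
    "twitter": {
      "cards": true,
      "cards-blacklist": ["summary", "youtube.com", "player:twitch.tv"],
      "conversations": "accessible",
      "cursor": false
    }
  }
}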
.SS extractor.twitter.unavailable .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Try to download media marked as \f[I]Unavailable\f[], e.g. \f[I]Geoblocked\f[] videos. .SS extractor.twitter.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"timeline"\f[] .IP "Example:" 4 .br * "avatar,background,media" .br * ["avatar", "background", "media"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"info"\f[], \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"timeline"\f[], \f[I]"tweets"\f[], \f[I]"media"\f[], \f[I]"replies"\f[], \f[I]"likes"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.twitter.transform .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Transform Tweet and User metadata into a simpler, uniform format. .SS extractor.twitter.tweet-endpoint .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Selects the API endpoint used to retrieve single Tweets. .br * \f[I]"restid"\f[]: \f[I]/TweetResultByRestId\f[] - accessible to guest users .br * \f[I]"detail"\f[]: \f[I]/TweetDetail\f[] - more stable .br * \f[I]"auto"\f[]: \f[I]"detail"\f[] when logged in, \f[I]"restid"\f[] otherwise .SS extractor.twitter.size .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["orig", "4096x4096", "large", "medium", "small"]\f[] .IP "Description:" 4 The image version to download. Any entries after the first one will be used for potential \f[I]fallback\f[] URLs. Known available sizes are .br * \f[I]orig\f[] .br * \f[I]large\f[] .br * \f[I]medium\f[] .br * \f[I]small\f[] .br * \f[I]4096x4096\f[] .br * \f[I]900x900\f[] .br * \f[I]360x360\f[] .SS extractor.twitter.logout .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Logout and retry as guest when access to another user's Tweets is blocked. .SS extractor.twitter.pinned .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from pinned Tweets. .SS extractor.twitter.quoted .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from quoted Tweets. If this option is enabled, gallery-dl will try to fetch a quoted (original) Tweet when it sees the Tweet which quotes it. .SS extractor.twitter.ratelimit .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"wait"\f[] .IP "Description:" 4 Selects how to handle exceeding the API rate limit. .br * \f[I]"abort"\f[]: Raise an error and stop extraction .br * \f[I]"wait"\f[]: Wait until rate limit reset .br * \f[I]"wait:N"\f[]: Wait for \f[I]N\f[] seconds .SS extractor.twitter.relogin .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 When receiving a "Could not authenticate you" error while logged in with \f[I]username & password\f[], refresh the current login session and try to continue from where it left off. .SS extractor.twitter.locked .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"abort"\f[] .IP "Description:" 4 Selects how to handle "account is temporarily locked" errors. .br * \f[I]"abort"\f[]: Raise an error and stop extraction .br * \f[I]"wait"\f[]: Wait until the account is unlocked and retry .SS extractor.twitter.replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other Tweets. 
If this value is \f[I]"self"\f[], only consider replies where reply and original Tweet are from the same user. Note: Twitter will automatically expand conversations if you use the \f[I]/with_replies\f[] timeline while logged in. For example, media from Tweets which the user replied to will also be downloaded. It is possible to exclude unwanted Tweets using \f[I]image-filter\f[]. .SS extractor.twitter.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from Retweets. If this value is \f[I]"original"\f[], metadata for these files will be taken from the original Tweets, not the Retweets. .SS extractor.twitter.timeline.strategy .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the strategy / tweet source used for timeline URLs (\f[I]https://twitter.com/USER/timeline\f[]). .br * \f[I]"tweets"\f[]: \f[I]/tweets\f[] timeline + search .br * \f[I]"media"\f[]: \f[I]/media\f[] timeline + search .br * \f[I]"with_replies"\f[]: \f[I]/with_replies\f[] timeline + search .br * \f[I]"auto"\f[]: \f[I]"tweets"\f[] or \f[I]"media"\f[], depending on \f[I]retweets\f[] and \f[I]text-tweets\f[] settings .SS extractor.twitter.text-tweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also emit metadata for text-only Tweets without media content. This only has an effect with a \f[I]metadata\f[] (or \f[I]exec\f[]) post processor with \f[I]"event": "post"\f[] and appropriate \f[I]filename\f[]. .SS extractor.twitter.twitpic .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]TwitPic\f[] embeds. .SS extractor.twitter.unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Ignore previously seen Tweets. .SS extractor.twitter.username-alt .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Alternate identifier (username, email, phone number) when \f[I]logging in\f[]. When not specified and asked for by Twitter, this identifier will need to be entered in an interactive prompt. .SS extractor.twitter.users .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"user"\f[] .IP "Example:" 4 "https://twitter.com/search?q=from:{legacy[screen_name]}" .IP "Description:" 4 Format string for user URLs generated from .br \f[I]following\f[] and \f[I]list-members\f[] queries, whose replacement field values come from Twitter \f[I]user\f[] objects .br (\f[I]Example\f[]) Special values: .br * \f[I]"user"\f[]: \f[I]https://twitter.com/i/user/{rest_id}\f[] .br * \f[I]"timeline"\f[]: \f[I]https://twitter.com/id:{rest_id}/timeline\f[] .br * \f[I]"tweets"\f[]: \f[I]https://twitter.com/id:{rest_id}/tweets\f[] .br * \f[I]"media"\f[]: \f[I]https://twitter.com/id:{rest_id}/media\f[] Note: To allow gallery-dl to follow custom URL formats, set the \f[I]blacklist\f[] for \f[I]twitter\f[] to a non-default value, e.g. an empty string \f[I]""\f[]. .SS extractor.twitter.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. .br * \f[I]true\f[]: Download videos .br * \f[I]"ytdl"\f[]: Download videos using \f[I]ytdl\f[] .br * \f[I]false\f[]: Skip video Tweets .SS extractor.unsplash.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"raw"\f[] .IP "Description:" 4 Name of the image format to download. Available formats are \f[I]"raw"\f[], \f[I]"full"\f[], \f[I]"regular"\f[], \f[I]"small"\f[], and \f[I]"thumb"\f[].
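Several of the options above can be combined in one configuration file. The following sketch sets the \f[I]twitter.users\f[] format string from the example above together with the \f[I]blacklist\f[] adjustment mentioned in its Note, and picks a smaller \f[I]unsplash\f[] image format (illustrative values):
.. code:: json
{
  "extractor": {
    "twitter": {
      "users": "https://twitter.com/search?q=from:{legacy[screen_name]}",
      "blacklist": ""
    },
    "unsplash": {
      "format": "regular"
    }
  }
}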
.SS extractor.vipergirls.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"viper.click"\f[] .IP "Description:" 4 Specifies the domain used by \f[I]vipergirls\f[] extractors. For example \f[I]"viper.click"\f[] if the main domain is blocked or to bypass Cloudflare. .SS extractor.vipergirls.like .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Automatically like posts after downloading their images. Note: Requires \f[I]login\f[] or \f[I]cookies\f[]. .SS extractor.vk.offset .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Custom \f[I]offset\f[] starting value when paginating over image results. .SS extractor.vsco.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Example:" 4 .br * "avatar,collection" .br * ["avatar", "collection"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"avatar"\f[], \f[I]"gallery"\f[], \f[I]"spaces"\f[], \f[I]"collection"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.vsco.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.wallhaven.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Wallhaven API Key\f[], to use your account's browsing settings and default filters when searching. See https://wallhaven.cc/help/api for more information. .SS extractor.wallhaven.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"uploads"\f[] .IP "Example:" 4 .br * "uploads,collections" .br * ["uploads", "collections"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"uploads"\f[], \f[I]"collections"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.wallhaven.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata (tags, uploader). Note: This requires 1 additional HTTP request per post. .SS extractor.weasyl.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Weasyl API Key\f[], to use your account's browsing settings and filters. .SS extractor.weasyl.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch extra submission metadata during gallery downloads. .br (\f[I]comments\f[], \f[I]description\f[], \f[I]favorites\f[], \f[I]folder_name\f[], .br \f[I]tags\f[], \f[I]views\f[]) Note: This requires 1 additional HTTP request per submission. .SS extractor.webtoons.quality .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .br * \f[I]object\f[] (ext -> type) .IP "Default:" 9 \f[I]"original"\f[] .IP "Example:" 4 .br * 90 .br * "q50" .br * {"jpg": "q80", "jpeg": "q80", "png": false} .IP "Description:" 4 Controls the quality of downloaded files by modifying URLs' \f[I]type\f[] parameter.
\f[I]"original"\f[] Download minimally compressed versions of JPG files any \f[I]integer\f[] Use \f[I]"q"\f[] as \f[I]type\f[] parameter for JPEG files any \f[I]string\f[] Use this value as \f[I]type\f[] parameter for JPEG files any \f[I]object\f[] Use the given values as \f[I]type\f[] parameter for URLs with the specified extensions .br - Set a value to \f[I]false\f[] to completely remove these extension's \f[I]type\f[] parameter .br - Omit an extension to leave its URLs unchanged .br .SS extractor.weibo.gifs .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download \f[I]gif\f[] files. Set this to \f[I]"video"\f[] to download GIFs as video files. .SS extractor.weibo.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"feed"\f[] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"home"\f[], \f[I]"feed"\f[], \f[I]"videos"\f[], \f[I]"newvideo"\f[], \f[I]"article"\f[], \f[I]"album"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.weibo.livephoto .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download \f[I]livephoto\f[] files. .SS extractor.weibo.movies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download \f[I]movie\f[] videos. .SS extractor.weibo.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from retweeted posts. If this value is \f[I]"original"\f[], metadata for these files will be taken from the original posts, not the retweeted posts. .SS extractor.weibo.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.wikimedia.limit .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]50\f[] .IP "Description:" 4 Number of results to return in a single API query. The value must be between 10 and 500. .SS extractor.wikimedia.subcategories .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 For \f[I]Category:\f[] pages, recursively descent into subcategories. .SS extractor.ytdl.cmdline-args .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "--quiet --write-sub --merge-output-format mkv" .br * ["--quiet", "--write-sub", "--merge-output-format", "mkv"] .IP "Description:" 4 Additional \f[I]ytdl\f[] options specified as command-line arguments. See \f[I]yt-dlp options\f[] / \f[I]youtube-dl options\f[] .SS extractor.ytdl.config-file .IP "Type:" 6 \f[I]Path\f[] .IP "Example:" 4 "~/.config/yt-dlp/config" .IP "Description:" 4 Location of a \f[I]ytdl\f[] configuration file to load options from. .SS extractor.ytdl.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Process URLs otherwise unsupported by gallery-dl with \f[I]ytdl\f[]. .SS extractor.ytdl.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 Default of the \f[I]ytdl\f[] \f[I]module\f[] used. .br (\f[I]"bestvideo*+bestaudio/best"\f[] for \f[I]yt_dlp\f[], .br \f[I]"bestvideo+bestaudio/best"\f[] for \f[I]youtube_dl\f[]) .IP "Description:" 4 \f[I]ytdl\f[] format selection string. 
See \f[I]yt-dlp format selection\f[] / \f[I]youtube-dl format selection\f[] .SS extractor.ytdl.generic .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enables the use of \f[I]ytdl's\f[] \f[I]generic\f[] extractor. Set this option to \f[I]"force"\f[] for the same effect as \f[I]--force-generic-extractor\f[]. .SS extractor.ytdl.logging .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Route \f[I]ytdl's\f[] output through gallery-dl's logging system. Otherwise it will be written directly to stdout/stderr. Note: Set \f[I]quiet\f[] and \f[I]no_warnings\f[] in \f[I]extractor.ytdl.raw-options\f[] to \f[I]true\f[] to suppress all output. .SS extractor.ytdl.module .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Path\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 .br * "yt-dlp" .br * "/home/user/.local/lib/python3.13/site-packages/youtube_dl" .IP "Description:" 4 Name or filesystem path of the \f[I]ytdl\f[] Python module to import. Setting this to \f[I]null\f[] will try to import \f[I]"yt_dlp"\f[] followed by \f[I]"youtube_dl"\f[] as fallback. .SS extractor.ytdl.raw-options .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 .. code:: json { "quiet": true, "writesubtitles": true, "merge_output_format": "mkv" } .IP "Description:" 4 Additional options passed directly to the \f[I]YoutubeDL\f[] constructor. Available options can be found in \f[I]yt-dlp's docstrings\f[] / \f[I]youtube-dl's docstrings\f[] .SS extractor.zerochan.extensions .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["jpg", "png", "webp", "gif"]\f[] .IP "Example:" 4 .br * "gif" .br * ["webp", "gif", "jpg"] .IP "Description:" 4 List of filename extensions to try when dynamically building download URLs (\f[I]"pagination": "api"\f[] + \f[I]"metadata": false\f[]). .SS extractor.zerochan.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata (date, md5, tags, ...). Note: This requires 1-2 additional HTTP requests per post. .SS extractor.zerochan.pagination .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"api"\f[] .IP "Description:" 4 Controls how to paginate over tag search results. .br * \f[I]"api"\f[]: Use the \f[I]JSON API\f[] (no \f[I]extension\f[] metadata) .br * \f[I]"html"\f[]: Parse HTML pages (limited to 100 pages * 24 posts) .SS extractor.zerochan.redirects .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Automatically follow tag redirects. .SS extractor.[booru].tags .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Group \f[I]tags\f[] by type and provide them as \f[I]tags_TYPE\f[] metadata fields, for example \f[I]tags_artist\f[] or \f[I]tags_character\f[]. Note: This requires 1 additional HTTP request per post. .SS extractor.[booru].notes .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract overlay notes (position and text). Note: This requires 1 additional HTTP request per post. .SS extractor.[booru].url .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file_url"\f[] .IP "Example:" 4 .br * "preview_url" .br * ["sample_url", "preview_url", "file_url"] .IP "Description:" 4 Alternate field name to retrieve download URLs from. When multiple names are given, download the first available one.
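For instance, a sketch for a booru site (using \f[I]danbooru\f[] here purely as one example of a \f[I][booru]\f[] category) that prefers sample URLs with fallbacks and enables grouped tags and notes (illustrative values):
.. code:: json
{
  "extractor": {
    "danbooru": {
      "url": ["sample_url", "preview_url", "file_url"],
      "tags": true,
      "notes": true
    }
  }
}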
.SS extractor.[manga-extractor].chapter-reverse .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Reverse the order of chapter URLs extracted from manga pages. .br * \f[I]true\f[]: Start with the latest chapter .br * \f[I]false\f[]: Start with the first chapter .SS extractor.[manga-extractor].page-reverse .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download manga chapter pages in reverse order. .SH DOWNLOADER OPTIONS .SS downloader.*.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable/Disable this downloader module. .SS downloader.*.filesize-min & .filesize-max .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "32000", "500k", "2.5M" .IP "Description:" 4 Minimum/Maximum allowed file size in bytes. Any file smaller/larger than this limit will not be downloaded. Possible values are valid integer or floating-point numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[], \f[I]g\f[], \f[I]t\f[], or \f[I]p\f[]. These suffixes are case-insensitive. .SS downloader.*.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Use \f[I]Last-Modified\f[] HTTP response headers to set file modification times. .SS downloader.*.part .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the use of \f[I].part\f[] files during file downloads. .br * \f[I]true\f[]: Write downloaded data into \f[I].part\f[] files and rename them upon download completion. This mode additionally supports resuming incomplete downloads. .br * \f[I]false\f[]: Do not use \f[I].part\f[] files and write data directly into the actual output files. .SS downloader.*.part-directory .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Alternate location for \f[I].part\f[] files. Missing directories will be created as needed. If this value is \f[I]null\f[], \f[I].part\f[] files are going to be stored alongside the actual output files. .SS downloader.*.progress .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]3.0\f[] .IP "Description:" 4 Number of seconds until a download progress indicator for the current download is displayed. Set this option to \f[I]null\f[] to disable this indicator. .SS downloader.*.rate .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "32000", "500k", "2.5M" .IP "Description:" 4 Maximum download rate in bytes per second. Possible values are valid integer or floating-point numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[], \f[I]g\f[], \f[I]t\f[], or \f[I]p\f[]. These suffixes are case-insensitive. .SS downloader.*.retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]extractor.*.retries\f[] .IP "Description:" 4 Maximum number of retries during file downloads, or \f[I]-1\f[] for infinite retries. .SS downloader.*.timeout .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]extractor.*.timeout\f[] .IP "Description:" 4 Connection timeout during file downloads. .SS downloader.*.verify .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]extractor.*.verify\f[] .IP "Description:" 4 Certificate validation during file downloads. .SS downloader.*.proxy .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (scheme -> proxy) .IP "Default:" 9 \f[I]extractor.*.proxy\f[] .IP "Description:" 4 Proxy server used for file downloads. Disable the use of a proxy for file downloads by explicitly setting this option to \f[I]null\f[].
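A combined sketch of the \f[I]downloader.*\f[] options above, assuming the usual gallery-dl pattern of setting them once for all downloader modules directly under \f[I]downloader\f[] (the values, including the \f[I]part-directory\f[] path, are purely illustrative):
.. code:: json
{
  "downloader": {
    "rate": "1M",
    "retries": 5,
    "filesize-max": "100M",
    "part-directory": "/tmp/.gallery-dl"
  }
}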
.SS downloader.http.adjust-extensions .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check file headers of downloaded files and adjust their filename extensions if they do not match. For example, this will change the filename extension (\f[I]{extension}\f[]) of a file called \f[I]example.png\f[] from \f[I]png\f[] to \f[I]jpg\f[] when said file contains JPEG/JFIF data. .SS downloader.http.consume-content .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls the behavior when an HTTP response is considered unsuccessful. If the value is \f[I]true\f[], consume the response body. This avoids closing the connection and therefore improves connection reuse. If the value is \f[I]false\f[], immediately close the connection without reading the response. This can be useful if the server is known to send large bodies for error responses. .SS downloader.http.chunk-size .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]32768\f[] .IP "Example:" 4 "50k", "0.8M" .IP "Description:" 4 Number of bytes per downloaded chunk. Possible values are integer numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[], \f[I]g\f[], \f[I]t\f[], or \f[I]p\f[]. These suffixes are case-insensitive. .SS downloader.http.headers .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 {"Accept": "image/webp,*/*", "Referer": "https://example.org/"} .IP "Description:" 4 Additional HTTP headers to send when downloading files. .SS downloader.http.retry-codes .IP "Type:" 6 \f[I]list\f[] of \f[I]integers\f[] .IP "Default:" 9 \f[I]extractor.*.retry-codes\f[] .IP "Description:" 4 Additional \f[I]HTTP response status codes\f[] to retry a download on. Codes \f[I]200\f[], \f[I]206\f[], and \f[I]416\f[] (when resuming a \f[I]partial\f[] download) will never be retried and always count as success, regardless of this option. \f[I]5xx\f[] codes (server error responses) will always be retried, regardless of this option. .SS downloader.http.sleep-429 .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]extractor.*.sleep-429\f[] .IP "Description:" 4 Number of seconds to sleep when receiving a 429 Too Many Requests response before \f[I]retrying\f[] the request. Note: Requires \f[I]retry-codes\f[] to include \f[I]429\f[]. .SS downloader.http.validate .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check for invalid responses. Fail a download when a file does not pass validation instead of downloading a potentially broken file. .SS downloader.ytdl.cmdline-args .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "--quiet --write-sub --merge-output-format mkv" .br * ["--quiet", "--write-sub", "--merge-output-format", "mkv"] .IP "Description:" 4 Additional \f[I]ytdl\f[] options specified as command-line arguments. See \f[I]yt-dlp options\f[] / \f[I]youtube-dl options\f[] .SS downloader.ytdl.config-file .IP "Type:" 6 \f[I]Path\f[] .IP "Example:" 4 "~/.config/yt-dlp/config" .IP "Description:" 4 Location of a \f[I]ytdl\f[] configuration file to load options from. .SS downloader.ytdl.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 Default of the \f[I]ytdl\f[] \f[I]module\f[] used. .br (\f[I]"bestvideo*+bestaudio/best"\f[] for \f[I]yt_dlp\f[], .br \f[I]"bestvideo+bestaudio/best"\f[] for \f[I]youtube_dl\f[]) .IP "Description:" 4 \f[I]ytdl\f[] format selection string.
See \f[I]yt-dlp format selection\f[] / \f[I]youtube-dl format selection\f[] .SS downloader.ytdl.forward-cookies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Forward gallery-dl's cookies to \f[I]ytdl\f[]. .SS downloader.ytdl.logging .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Route \f[I]ytdl's\f[] output through gallery-dl's logging system. Otherwise it will be written directly to stdout/stderr. Note: Set \f[I]quiet\f[] and \f[I]no_warnings\f[] in \f[I]downloader.ytdl.raw-options\f[] to \f[I]true\f[] to suppress all output. .SS downloader.ytdl.module .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Path\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 .br * "yt-dlp" .br * "/home/user/.local/lib/python3.13/site-packages/youtube_dl" .IP "Description:" 4 Name or filesystem path of the \f[I]ytdl\f[] Python module to import. Setting this to \f[I]null\f[] will try to import \f[I]"yt_dlp"\f[] followed by \f[I]"youtube_dl"\f[] as fallback. .SS downloader.ytdl.outtmpl .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The Output Template used to generate filenames for files downloaded with \f[I]ytdl\f[]. See \f[I]yt-dlp output template\f[] / \f[I]youtube-dl output template\f[]. Special values: .br * \f[I]null\f[]: generate filenames with \f[I]extractor.*.filename\f[] .br * \f[I]"default"\f[]: use \f[I]ytdl's\f[] default, currently \f[I]"%(title)s [%(id)s].%(ext)s"\f[] for \f[I]yt-dlp\f[] / \f[I]"%(title)s-%(id)s.%(ext)s"\f[] for \f[I]youtube-dl\f[] Note: An output template other than \f[I]null\f[] might cause unexpected results in combination with certain options (e.g. \f[I]"skip": "enumerate"\f[]) .SS downloader.ytdl.raw-options .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 .. code:: json { "quiet": true, "writesubtitles": true, "merge_output_format": "mkv" } .IP "Description:" 4 Additional options passed directly to the \f[I]YoutubeDL\f[] constructor. Available options can be found in \f[I]yt-dlp's docstrings\f[] / \f[I]youtube-dl's docstrings\f[] .SH OUTPUT OPTIONS .SS output.mode .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (key -> format string) .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the output string format and status indicators. .br * \f[I]"null"\f[]: No output .br * \f[I]"pipe"\f[]: Suitable for piping to other processes or files .br * \f[I]"terminal"\f[]: Suitable for the standard Windows console .br * \f[I]"color"\f[]: Suitable for terminals that understand ANSI escape codes and colors .br * \f[I]"auto"\f[]: \f[I]"terminal"\f[] on Windows with \f[I]output.ansi\f[] disabled, \f[I]"color"\f[] otherwise. It is possible to use custom output format strings .br by setting this option to an \f[I]object\f[] and specifying \f[I]start\f[], \f[I]success\f[], \f[I]skip\f[], \f[I]progress\f[], and \f[I]progress-total\f[]. .br For example, the following will replicate the same output as \f[I]mode: color\f[]: .. code:: json { "start" : "{}", "success": "\\r\\u001b[1;32m{}\\u001b[0m\\n", "skip" : "\\u001b[2m{}\\u001b[0m\\n", "progress" : "\\r{0:>7}B {1:>7}B/s ", "progress-total": "\\r{3:>3}% {0:>7}B {1:>7}B/s " } \f[I]start\f[], \f[I]success\f[], and \f[I]skip\f[] are used to output the current filename, where \f[I]{}\f[] or \f[I]{0}\f[] is replaced with said filename. If a given format string contains printable characters other than that, their number needs to be specified as \f[I][, ]\f[] to get the correct results for \f[I]output.shorten\f[]. 
For example .. code:: json "start" : [12, "Downloading {}"] \f[I]progress\f[] and \f[I]progress-total\f[] are used when displaying the .br \f[I]download progress indicator\f[], \f[I]progress\f[] when the total number of bytes to download is unknown, .br \f[I]progress-total\f[] otherwise. For these format strings .br * \f[I]{0}\f[] is number of bytes downloaded .br * \f[I]{1}\f[] is number of downloaded bytes per second .br * \f[I]{2}\f[] is total number of bytes .br * \f[I]{3}\f[] is percent of bytes downloaded to total bytes .SS output.stdout & .stdin & .stderr .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] .IP "Example:" 4 .. code:: json "utf-8" .. code:: json { "encoding": "utf-8", "errors": "replace", "line_buffering": true } .IP "Description:" 4 \f[I]Reconfigure\f[] a \f[I]standard stream\f[]. Possible options are .br * \f[I]encoding\f[] .br * \f[I]errors\f[] .br * \f[I]newline\f[] .br * \f[I]line_buffering\f[] .br * \f[I]write_through\f[] When this option is specified as a simple \f[I]string\f[], it is interpreted as \f[I]{"encoding": "", "errors": "replace"}\f[] Note: \f[I]errors\f[] always defaults to \f[I]"replace"\f[] .SS output.shorten .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether the output strings should be shortened to fit on one console line. Set this option to \f[I]"eaw"\f[] to also work with east-asian characters with a display width greater than 1. .SS output.colors .IP "Type:" 6 \f[I]object\f[] (key -> ANSI color) .IP "Default:" 9 .. code:: json { "success": "1;32", "skip" : "2", "debug" : "0;37", "info" : "1;37", "warning": "1;33", "error" : "1;31" } .IP "Description:" 4 Controls the \f[I]ANSI colors\f[] used for various outputs. Output for \f[I]mode: color\f[] .br * \f[I]success\f[]: successfully downloaded files .br * \f[I]skip\f[]: skipped files Logging Messages: .br * \f[I]debug\f[]: debug logging messages .br * \f[I]info\f[]: info logging messages .br * \f[I]warning\f[]: warning logging messages .br * \f[I]error\f[]: error logging messages .SS output.ansi .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 On Windows, enable ANSI escape sequences and colored output .br by setting the \f[I]ENABLE_VIRTUAL_TERMINAL_PROCESSING\f[] flag for stdout and stderr. .br .SS output.skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Show skipped file downloads. .SS output.fallback .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include fallback URLs in the output of \f[I]-g/--get-urls\f[]. .SS output.private .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Include private fields, i.e. fields whose name starts with an underscore, in the output of \f[I]-K/--list-keywords\f[] and \f[I]-j/--dump-json\f[]. .SS output.progress .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the progress indicator when *gallery-dl* is run with multiple URLs as arguments. .br * \f[I]true\f[]: Show the default progress indicator (\f[I]"[{current}/{total}] {url}"\f[]) .br * \f[I]false\f[]: Do not show any progress indicator .br * Any \f[I]string\f[]: Show the progress indicator using this as a custom \f[I]format string\f[]. Possible replacement keys are \f[I]current\f[], \f[I]total\f[] and \f[I]url\f[]. 
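A combined sketch of the \f[I]output\f[] options above (illustrative values; the \f[I]progress\f[] string simply reuses the documented replacement keys):
.. code:: json
{
  "output": {
    "mode": "color",
    "shorten": "eaw",
    "skip": false,
    "progress": "[{current}/{total}] {url}"
  }
}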
.SS output.log .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Logging Configuration\f[] .IP "Default:" 9 \f[I]"[{name}][{levelname}] {message}"\f[] .IP "Description:" 4 Configuration for logging output to stderr. If this is a simple \f[I]string\f[], it specifies the format string for logging messages. .SS output.logfile .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]Logging Configuration\f[] .IP "Description:" 4 File to write logging output to. .SS output.unsupportedfile .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]Logging Configuration\f[] .IP "Description:" 4 File to write external URLs unsupported by *gallery-dl* to. The default format string here is \f[I]"{message}"\f[]. .SS output.errorfile .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]Logging Configuration\f[] .IP "Description:" 4 File to write input URLs which returned an error to. The default format string here is also \f[I]"{message}"\f[]. When combined with \f[I]-I\f[]/\f[I]--input-file-comment\f[] or \f[I]-x\f[]/\f[I]--input-file-delete\f[], this option will cause *all* input URLs from these files to be commented/deleted after processing them and not just successful ones. .SS output.num-to-str .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Convert numeric values (\f[I]integer\f[] or \f[I]float\f[]) to \f[I]string\f[] before outputting them as JSON. .SH POSTPROCESSOR OPTIONS .SS classify.mapping .IP "Type:" 6 \f[I]object\f[] (directory -> extensions) .IP "Default:" 9 .. code:: json { "Pictures" : ["jpg", "jpeg", "png", "gif", "bmp", "svg", "webp", "avif", "heic", "heif", "ico", "psd"], "Video" : ["flv", "ogv", "avi", "mp4", "mpg", "mpeg", "3gp", "mkv", "webm", "vob", "wmv", "m4v", "mov"], "Music" : ["mp3", "aac", "flac", "ogg", "wma", "m4a", "wav"], "Archives" : ["zip", "rar", "7z", "tar", "gz", "bz2"], "Documents": ["txt", "pdf"] } .IP "Description:" 4 A mapping from directory names to filename extensions that should be stored in them. Files with an extension not listed will be ignored and stored in their default location. .SS compare.action .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"replace"\f[] .IP "Description:" 4 The action to take when files do **not** compare as equal. .br * \f[I]"replace"\f[]: Replace/Overwrite the old version with the new one .br * \f[I]"enumerate"\f[]: Add an enumeration index to the filename of the new version like \f[I]skip = "enumerate"\f[] .SS compare.equal .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"null"\f[] .IP "Description:" 4 The action to take when files do compare as equal. .br * \f[I]"abort:N"\f[]: Stop the current extractor run after \f[I]N\f[] consecutive files compared as equal. .br * \f[I]"terminate:N"\f[]: Stop the current extractor run, including parent extractors, after \f[I]N\f[] consecutive files compared as equal. .br * \f[I]"exit:N"\f[]: Exit the program after \f[I]N\f[] consecutive files compared as equal. .SS compare.shallow .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Only compare file sizes. Do not read and compare their content. .SS directory.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"prepare"\f[] .IP "Description:" 4 The event(s) for which \f[I]directory\f[] format strings are (re)evaluated. See \f[I]metadata.event\f[] for a list of available events. .SS exec.archive .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Path\f[] .IP "Description:" 4 Database to store IDs of executed commands in, similar to \f[I]extractor.*.archive\f[]. 
The following archive options are also supported: .br * \f[I]archive-format\f[] .br * \f[I]archive-prefix\f[] .br * \f[I]archive-pragma\f[] .br * \f[I]archive-table\f[] .SS exec.async .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls whether to wait for a subprocess to finish or to let it run asynchronously. .SS exec.command .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "convert {} {}.png && rm {}" .br * ["echo", "{user[account]}", "{id}"] .IP "Description:" 4 The command to run. .br * If this is a \f[I]string\f[], it will be executed using the system's shell, e.g. \f[I]/bin/sh\f[]. Any \f[I]{}\f[] will be replaced with the full path of a file or target directory, depending on \f[I]exec.event\f[] .br * If this is a \f[I]list\f[], the first element specifies the program name and any further elements its arguments. Each element of this list is treated as a \f[I]format string\f[] using the files' metadata as well as \f[I]{_path}\f[], \f[I]{_directory}\f[], and \f[I]{_filename}\f[]. .SS exec.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"after"\f[] .IP "Description:" 4 The event(s) for which \f[I]exec.command\f[] is run. See \f[I]metadata.event\f[] for a list of available events. .SS hash.chunk-size .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]32768\f[] .IP "Description:" 4 Number of bytes read per chunk during file hash computation. .SS hash.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 The event(s) for which \f[I]file hashes\f[] are computed. See \f[I]metadata.event\f[] for a list of available events. .SS hash.filename .IP "Type:" 6 .br * \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Rebuild \f[I]filenames\f[] after computing \f[I]hash digests\f[] and adding them to the metadata dict. .SS hash.hashes .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (field name -> hash algorithm) .IP "Default:" 9 \f[I]"md5,sha1"\f[] .IP "Example:" 4 .. code:: json "sha256:hash_sha,sha3_512:hash_sha3" .. code:: json { "hash_sha" : "sha256", "hash_sha3": "sha3_512" } .IP "Description:" 4 Hash digests to compute. For a list of available hash algorithms, run .. code:: python -c "import hashlib; print('\\n'.join(hashlib.algorithms_available))" or see \f[I]python/hashlib\f[]. .br * If this is a \f[I]string\f[], it is parsed as a comma-separated list of algorithm-fieldname pairs: .. code:: [ALGORITHM":"]FIELDNAME ["," ...] When \f[I]ALGORITHM\f[] is omitted, \f[I]FIELDNAME\f[] is used as algorithm name. .br * If this is an \f[I]object\f[], it is a \f[I]FIELDNAME\f[] to \f[I]ALGORITHM\f[] mapping for hash digests to compute. .SS metadata.mode .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"json"\f[] .IP "Description:" 4 Selects how to process metadata. .br * \f[I]"json"\f[]: write metadata using \f[I]json.dump()\f[] .br * \f[I]"jsonl"\f[]: write metadata in \f[I]JSON Lines\f[] format .br * \f[I]"tags"\f[]: write \f[I]tags\f[] separated by newlines .br * \f[I]"custom"\f[]: write the result of applying \f[I]metadata.content-format\f[] to a file's metadata dictionary .br * \f[I]"modify"\f[]: add or modify metadata entries .br * \f[I]"delete"\f[]: remove metadata entries .SS metadata.filename .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "{id}.data.json" .IP "Description:" 4 A \f[I]format string\f[] to build the filenames for metadata files with.
(see \f[I]extractor.filename\f[]). Using \f[I]"-"\f[] as filename will write all output to \f[I]stdout\f[]. If this option is set, \f[I]metadata.extension\f[] and \f[I]metadata.extension-format\f[] will be ignored. .SS metadata.directory .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"."\f[] .IP "Example:" 4 .br * "metadata" .br * ["..", "metadata", "\\fF {id // 500 * 500}"] .IP "Description:" 4 Directory where metadata files are stored, relative to \f[I]metadata.base-directory\f[]. .SS metadata.base-directory .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]Path\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Selects the relative location for metadata files. .br * \f[I]false\f[]: current target location for file downloads (\f[I]base-directory\f[] + \f[I]directory\f[]) .br * \f[I]true\f[]: current \f[I]base-directory\f[] location .br * any \f[I]Path\f[]: custom location .SS metadata.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"json"\f[] or \f[I]"txt"\f[] .IP "Description:" 4 Filename extension for metadata files, which will be appended to the original file names. .SS metadata.extension-format .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 .br * "{extension}.json" .br * "json" .IP "Description:" 4 Custom format string to build filename extensions for metadata files with, which will replace the original filename extensions. Note: \f[I]metadata.extension\f[] is ignored if this option is set. .SS metadata.metadata-path .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "_meta_path" .IP "Description:" 4 Insert the path of generated files into metadata dictionaries as the given name. .SS metadata.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Example:" 4 .br * "prepare,file,after" .br * ["prepare-after", "skip"] .IP "Description:" 4 The event(s) for which metadata gets written to a file. Available events are: .br * \f[I]init\f[]: After post processor initialization and before the first file download .br * \f[I]finalize\f[]: On extractor shutdown, e.g. after all files were downloaded .br * \f[I]finalize-success\f[]: On extractor shutdown when no error occurred .br * \f[I]finalize-error\f[]: On extractor shutdown when at least one error occurred .br * \f[I]prepare\f[]: Before a file download .br * \f[I]prepare-after\f[]: Before a file download, but after building and checking file paths .br * \f[I]file\f[]: When completing a file download, but before it gets moved to its target location .br * \f[I]after\f[]: After a file got moved to its target location .br * \f[I]skip\f[]: When skipping a file download .br * \f[I]error\f[]: After a file download failed .br * \f[I]post\f[]: When starting to download all files of a post, e.g. a Tweet on Twitter or a post on Patreon .br * \f[I]post-after\f[]: After downloading all files of a post .SS metadata.include .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["id", "width", "height", "description"] .IP "Description:" 4 Include only the given top-level keys when writing JSON data. Note: Missing or undefined fields will be silently ignored. .SS metadata.exclude .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["blocked", "watching", "status"] .IP "Description:" 4 Exclude all given keys from written JSON data. Note: Cannot be used with \f[I]metadata.include\f[]. .SS metadata.fields .IP "Type:" 6 .br * \f[I]list\f[] of \f[I]strings\f[] .br * \f[I]object\f[] (field name -> \f[I]format string\f[]) .IP "Example:" 4 .. code:: json ["blocked", "watching", "status[creator][name]"] ..
code:: json { "blocked" : "***", "watching" : "\\fE 'yes' if watching else 'no'", "status[username]": "{status[creator][name]!l}" } .IP "Description:" 4 .br * \f[I]"mode": "delete"\f[]: A list of metadata field names to remove. .br * \f[I]"mode": "modify"\f[]: An object with metadata field names mapping to a \f[I]format string\f[] whose result is assigned to said field name. .SS metadata.content-format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "tags:\\n\\n{tags:J\\n}\\n" .br * ["tags:", "", "{tags:J\\n}"] .IP "Description:" 4 Custom format string to build the content of metadata files with. Note: Only applies for \f[I]"mode": "custom"\f[]. .SS metadata.ascii .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Escape all non-ASCII characters. See the \f[I]ensure_ascii\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[] and \f[I]"jsonl"\f[]. .SS metadata.indent .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]4\f[] .IP "Description:" 4 Indentation level of JSON output. See the \f[I]indent\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[]. .SS metadata.separators .IP "Type:" 6 \f[I]list\f[] with two \f[I]string\f[] elements .IP "Default:" 9 \f[I][", ", ": "]\f[] .IP "Description:" 4 \f[I]\f[] - \f[I]\f[] pair to separate JSON keys and values with. See the \f[I]separators\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[] and \f[I]"jsonl"\f[]. .SS metadata.sort .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Sort output by key. See the \f[I]sort_keys\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[] and \f[I]"jsonl"\f[]. .SS metadata.open .IP "Type:" 6 \f[I]string\f[] .IP "Defsult:" 4 \f[I]"w"\f[] .IP "Description:" 4 The \f[I]mode\f[] in which metadata files get opened. For example, use \f[I]"a"\f[] to append to a file's content or \f[I]"w"\f[] to truncate it. See the \f[I]mode\f[] argument of \f[I]open()\f[] for further details. .SS metadata.encoding .IP "Type:" 6 \f[I]string\f[] .IP "Defsult:" 4 \f[I]"utf-8"\f[] .IP "Description:" 4 Name of the encoding used to encode a file's content. See the \f[I]encoding\f[] argument of \f[I]open()\f[] for further details. .SS metadata.private .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Include private fields, i.e. fields whose name starts with an underscore. .SS metadata.skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Do not overwrite already existing files. .SS metadata.archive .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Path\f[] .IP "Description:" 4 Database to store IDs of generated metadata files in, similar to \f[I]extractor.*.archive\f[]. The following archive options are also supported: .br * \f[I]archive-format\f[] .br * \f[I]archive-prefix\f[] .br * \f[I]archive-pragma\f[] .br * \f[I]archive-table \f[] .SS metadata.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Set modification times of generated metadata files according to the accompanying downloaded file. 
Enabling this option will only have an effect *if* there is actual \f[I]mtime\f[] metadata available, that is .br * after a file download (\f[I]"event": "file"\f[] (default), \f[I]"event": "after"\f[]) .br * when running *after* an \f[I]mtime\f[] post processes for the same \f[I]event\f[] For example, a \f[I]metadata\f[] post processor for \f[I]"event": "post"\f[] will *not* be able to set its file's modification time unless an \f[I]mtime\f[] post processor with \f[I]"event": "post"\f[] runs *before* it. .SS mtime.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 The event(s) for which \f[I]mtime.key\f[] or \f[I]mtime.value\f[] get evaluated. See \f[I]metadata.event\f[] for a list of available events. .SS mtime.key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"date"\f[] .IP "Description:" 4 Name of the metadata field whose value should be used. This value must be either a UNIX timestamp or a \f[I]datetime\f[] object. Note: This option gets ignored if \f[I]mtime.value\f[] is set. .SS mtime.value .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 .br * "{status[date]}" .br * "{content[0:6]:R22/2022/D%Y%m%d/}" .IP "Description:" 4 A \f[I]format string\f[] whose value should be used. The resulting value must be either a UNIX timestamp or a \f[I]datetime\f[] object. .SS python.archive .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Path\f[] .IP "Description:" 4 Database to store IDs of called Python functions in, similar to \f[I]extractor.*.archive\f[]. The following archive options are also supported: .br * \f[I]archive-format\f[] .br * \f[I]archive-prefix\f[] .br * \f[I]archive-pragma\f[] .br * \f[I]archive-table \f[] .SS python.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 The event(s) for which \f[I]python.function\f[] gets called. See \f[I]metadata.event\f[] for a list of available events. .SS python.function .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 .br * "my_module:generate_text" .br * "~/.local/share/gdl-utils.py:resize" .IP "Description:" 4 The Python function to call. This function is specified as \f[I]:\f[] and gets called with the current metadata dict as argument. \f[I]module\f[] is either an importable Python module name or the \f[I]Path\f[] to a .py file, .SS rename.from .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 The \f[I]format string\f[] for filenames to rename. When no value is given, \f[I]extractor.*.filename\f[] is used. .SS rename.to .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 The \f[I]format string\f[] for target filenames. When no value is given, \f[I]extractor.*.filename\f[] is used. .SS rename.skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Do not rename a file when another file with the target name already exists. .SS ugoira.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"webm"\f[] .IP "Description:" 4 Filename extension for the resulting video files. .SS ugoira.ffmpeg-args .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 ["-c:v", "libvpx-vp9", "-an", "-b:v", "2M"] .IP "Description:" 4 Additional \f[I]ffmpeg\f[] command-line arguments. .SS ugoira.ffmpeg-demuxer .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]auto\f[] .IP "Description:" 4 \f[I]ffmpeg\f[] demuxer to read and process input files with. 
Possible values are .br * "\f[I]concat\f[]" (inaccurate frame timecodes for non-uniform frame delays) .br * "\f[I]image2\f[]" (accurate timecodes, requires nanosecond file timestamps, i.e. no Windows or macOS) .br * "mkvmerge" (accurate timecodes, only WebM or MKV, requires \f[I]mkvmerge\f[]) .br * "archive" (store "original" frames in a \f[I].zip\f[] archive) "auto" will select mkvmerge if available and fall back to concat otherwise. .SS ugoira.ffmpeg-location .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"ffmpeg"\f[] .IP "Description:" 4 Location of the \f[I]ffmpeg\f[] (or \f[I]avconv\f[]) executable to use. .SS ugoira.mkvmerge-location .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"mkvmerge"\f[] .IP "Description:" 4 Location of the \f[I]mkvmerge\f[] executable for use with the \f[I]mkvmerge demuxer\f[]. .SS ugoira.ffmpeg-output .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]"error"\f[] .IP "Description:" 4 Controls \f[I]ffmpeg\f[] output. .br * \f[I]true\f[]: Enable \f[I]ffmpeg\f[] output .br * \f[I]false\f[]: Disable all \f[I]ffmpeg\f[] output .br * any \f[I]string\f[]: Pass \f[I]-hide_banner\f[] and \f[I]-loglevel\f[] with this value as argument to \f[I]ffmpeg\f[] .SS ugoira.ffmpeg-twopass .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Enable Two-Pass encoding. .SS ugoira.framerate .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the frame rate argument (\f[I]-r\f[]) for \f[I]ffmpeg\f[] .br * \f[I]"auto"\f[]: Automatically assign a fitting frame rate based on delays between frames. .br * \f[I]"uniform"\f[]: Like \f[I]auto\f[], but assign an explicit frame rate only to Ugoira with uniform frame delays. .br * any other \f[I]string\f[]: Use this value as argument for \f[I]-r\f[]. .br * \f[I]null\f[] or an empty \f[I]string\f[]: Don't set an explicit frame rate. .SS ugoira.keep-files .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Keep ZIP archives after conversion. .SS ugoira.libx264-prevent-odd .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Prevent \f[I]"width/height not divisible by 2"\f[] errors when using \f[I]libx264\f[] or \f[I]libx265\f[] encoders by applying a simple cropping filter. See this \f[I]Stack Overflow thread\f[] for more information. This option, when \f[I]libx264/5\f[] is used, automatically adds \f[I]["-vf", "crop=iw-mod(iw\\\\,2):ih-mod(ih\\\\,2)"]\f[] to the list of \f[I]ffmpeg\f[] command-line arguments to reduce an odd width/height by 1 pixel and make them even. .SS ugoira.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 When using \f[I]"mode": "archive"\f[], save Ugoira frame delay data as \f[I]animation.json\f[] within the archive file. If this is a \f[I]string\f[], use it as alternate filename for frame delay files. .SS ugoira.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Set modification times of generated ugoira aniomations. .SS ugoira.repeat-last-frame .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Allow repeating the last frame when necessary to prevent it from only being displayed for a very short amount of time. .SS ugoira.skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Do not convert frames if target file already exists. 
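For reference, a complete \f[I]ugoira\f[] post-processor entry combining several of the options above could look like the following sketch. The values shown are illustrative rather than defaults; a similar \f[I]ugoira-webm\f[] definition appears in the example configuration file later in this document.
.. code:: json

{
    "#": "illustrative sketch, not a default configuration",
    "name"          : "ugoira",
    "extension"     : "webm",
    "ffmpeg-args"   : ["-c:v", "libvpx-vp9", "-an", "-b:v", "0", "-crf", "30"],
    "ffmpeg-twopass": true,
    "ffmpeg-demuxer": "image2",
    "keep-files"    : false
}
..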
.SS zip.compression .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"store"\f[] .IP "Description:" 4 Compression method to use when writing the archive. Possible values are \f[I]"store"\f[], \f[I]"zip"\f[], \f[I]"bzip2"\f[], \f[I]"lzma"\f[]. .SS zip.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"zip"\f[] .IP "Description:" 4 Filename extension for the created ZIP archive. .SS zip.files .IP "Type:" 6 \f[I]list\f[] of \f[I]Path\f[] .IP "Example:" 4 ["info.json"] .IP "Description:" 4 List of extra files to be added to a ZIP archive. Note: Relative paths are relative to the current \f[I]download directory\f[]. .SS zip.keep-files .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Keep the actual files after writing them to a ZIP archive. .SS zip.mode .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"default"\f[] .IP "Description:" 4 .br * \f[I]"default"\f[]: Write the central directory file header once after everything is done or an exception is raised. .br * \f[I]"safe"\f[]: Update the central directory file header each time a file is stored in a ZIP archive. This greatly reduces the chance a ZIP archive gets corrupted in case the Python interpreter gets shut down unexpectedly (power outage, SIGKILL) but is also a lot slower. .SH MISCELLANEOUS OPTIONS .SS extractor.modules .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 The \f[I]modules\f[] list in \f[I]extractor/__init__.py\f[] .IP "Example:" 4 ["reddit", "danbooru", "mangadex"] .IP "Description:" 4 List of internal modules to load when searching for a suitable extractor class. Useful to reduce startup time and memory usage. .SS extractor.module-sources .IP "Type:" 6 \f[I]list\f[] of \f[I]Path\f[] instances .IP "Example:" 4 ["~/.config/gallery-dl/modules", null] .IP "Description:" 4 List of directories to load external extractor modules from. Any file in a specified directory with a \f[I].py\f[] filename extension gets \f[I]imported\f[] and searched for potential extractors, i.e. classes with a \f[I]pattern\f[] attribute. Note: \f[I]null\f[] references internal extractors defined in \f[I]extractor/__init__.py\f[] or by \f[I]extractor.modules\f[]. .SS globals .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]string\f[] .IP "Example:" 4 .br * "~/.local/share/gdl-globals.py" .br * "gdl-globals" .IP "Description:" 4 Path to or name of an .br \f[I]importable\f[] Python module, whose namespace, .br in addition to the \f[I]GLOBALS\f[] dict in \f[I]util.py\f[], gets used as \f[I]globals parameter\f[] for compiled Python expressions. .SS cache.file .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 .br * (\f[I]%APPDATA%\f[] or \f[I]"~"\f[]) + \f[I]"/gallery-dl/cache.sqlite3"\f[] on Windows .br * (\f[I]$XDG_CACHE_HOME\f[] or \f[I]"~/.cache"\f[]) + \f[I]"/gallery-dl/cache.sqlite3"\f[] on all other platforms .IP "Description:" 4 Path of the SQLite3 database used to cache login sessions, cookies and API tokens across gallery-dl invocations. Set this option to \f[I]null\f[] or an invalid path to disable this cache. .SS filters-environment .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Evaluate filter expressions in a special environment preventing them from raising fatal exceptions. 
\f[I]true\f[] or \f[I]"tryexcept"\f[]: Wrap expressions in a try/except block; Evaluate expressions raising an exception as \f[I]false\f[] \f[I]false\f[] or \f[I]"raw"\f[]: Do not wrap expressions in a special environment \f[I]"defaultdict"\f[]: Prevent exceptions when accessing undefined variables by using a \f[I]defaultdict\f[] .SS format-separator .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"/"\f[] .IP "Description:" 4 Character(s) used as argument separator in format string \f[I]format specifiers\f[]. For example, setting this option to \f[I]"#"\f[] would allow a replacement operation to be \f[I]Rold#new#\f[] instead of the default \f[I]Rold/new/\f[] .SS input-files .IP "Type:" 6 \f[I]list\f[] of \f[I]Path\f[] .IP "Example:" 4 ["~/urls.txt", "$HOME/input"] .IP "Description:" 4 Additional input files. .SS signals-ignore .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["SIGTTOU", "SIGTTIN", "SIGTERM"] .IP "Description:" 4 The list of signal names to ignore, i.e. set \f[I]SIG_IGN\f[] as signal handler for. .SS subconfigs .IP "Type:" 6 \f[I]list\f[] of \f[I]Path\f[] .IP "Example:" 4 ["~/cfg-twitter.json", "~/cfg-reddit.json"] .IP "Description:" 4 Additional configuration files to load. .SS warnings .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"default"\f[] .IP "Description:" 4 The \f[I]Warnings Filter action\f[] used for (urllib3) warnings. .SH API TOKENS & IDS .SS extractor.deviantart.client-id & .client-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and visit DeviantArt's \f[I]Applications & Keys\f[] section .br * click "Register Application" .br * scroll to "OAuth2 Redirect URI Whitelist (Required)" and enter "https://mikf.github.io/gallery-dl/oauth-redirect.html" .br * scroll to the bottom and agree to the API License Agreement. Submission Policy, and Terms of Service. .br * click "Save" .br * copy \f[I]client_id\f[] and \f[I]client_secret\f[] of your new application and put them in your configuration file as \f[I]"client-id"\f[] and \f[I]"client-secret"\f[] .br * clear your \f[I]cache\f[] to delete any remaining \f[I]access-token\f[] entries. (\f[I]gallery-dl --clear-cache deviantart\f[]) .br * get a new \f[I]refresh-token\f[] for the new \f[I]client-id\f[] (\f[I]gallery-dl oauth:deviantart\f[]) .SS extractor.flickr.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and \f[I]Create an App\f[] in Flickr's \f[I]App Garden\f[] .br * click "APPLY FOR A NON-COMMERCIAL KEY" .br * fill out the form with a random name and description and click "SUBMIT" .br * copy \f[I]Key\f[] and \f[I]Secret\f[] and put them in your configuration file as \f[I]"api-key"\f[] and \f[I]"api-secret"\f[] .SS extractor.mangadex.client-id & .client-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and go to your \f[I]User Settings\f[] .br * open the "API Clients" section .br * click "\f[I]+ Create\f[]" .br * choose a name .br * click "\f[I]✔️ Create\f[]" .br * wait for approval / reload the page .br * copy the value after "AUTOAPPROVED ACTIVE" in the form "personal-client-..." and put it in your configuration file as \f[I]"client-id"\f[] .br * click "\f[I]Get Secret\f[]", then "\f[I]Copy Secret\f[]", and paste it into your configuration file as \f[I]"client-secret"\f[] .SS extractor.reddit.client-id & .user-agent .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and visit the \f[I]apps\f[] section of your account's preferences .br * click the "are you a developer? create an app..." 
button .br * fill out the form: .br * choose a name .br * select "installed app" .br * set \f[I]http://localhost:6414/\f[] as "redirect uri" .br * solve the "I'm not a robot" reCAPTCHA if needed .br * click "create app" .br * copy the client id (third line, under your application's name and "installed app") and put it in your configuration file as \f[I]"client-id"\f[] .br * use "\f[I]Python::v1.0 (by /u/)\f[]" as \f[I]user-agent\f[] and replace \f[I]\f[] and \f[I]\f[] accordingly (see Reddit's \f[I]API access rules\f[]) .br * clear your \f[I]cache\f[] to delete any remaining \f[I]access-token\f[] entries. (\f[I]gallery-dl --clear-cache reddit\f[]) .br * get a \f[I]refresh-token\f[] for the new \f[I]client-id\f[] (\f[I]gallery-dl oauth:reddit\f[]) .SS extractor.smugmug.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and \f[I]Apply for an API Key\f[] .br * use a random name and description, set "Type" to "Application", "Platform" to "All", and "Use" to "Non-Commercial" .br * fill out the two checkboxes at the bottom and click "Apply" .br * copy \f[I]API Key\f[] and \f[I]API Secret\f[] and put them in your configuration file as \f[I]"api-key"\f[] and \f[I]"api-secret"\f[] .SS extractor.tumblr.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and visit Tumblr's \f[I]Applications\f[] section .br * click "Register application" .br * fill out the form: use a random name and description, set https://example.org/ as "Application Website" and "Default callback URL" .br * solve Google's "I'm not a robot" challenge and click "Register" .br * click "Show secret key" (below "OAuth Consumer Key") .br * copy your \f[I]OAuth Consumer Key\f[] and \f[I]Secret Key\f[] and put them in your configuration file as \f[I]"api-key"\f[] and \f[I]"api-secret"\f[] .SH CUSTOM TYPES .SS Date .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Example:" 4 .br * "2019-01-01T00:00:00" .br * "2019" with "%Y" as \f[I]date-format\f[] .br * 1546297200 .IP "Description:" 4 A \f[I]Date\f[] value represents a specific point in time. .br * If given as \f[I]string\f[], it is parsed according to \f[I]date-format\f[]. .br * If given as \f[I]integer\f[], it is interpreted as UTC timestamp. .SS Duration .IP "Type:" 6 .br * \f[I]float\f[] .br * \f[I]list\f[] with 2 \f[I]floats\f[] .br * \f[I]string\f[] .IP "Example:" 4 .br * 2.85 .br * [1.5, 3.0] .br * "2.85", "1.5-3.0" .IP "Description:" 4 A \f[I]Duration\f[] represents a span of time in seconds. .br * If given as a single \f[I]float\f[], it will be used as that exact value. .br * If given as a \f[I]list\f[] with 2 floating-point numbers \f[I]a\f[] & \f[I]b\f[] , it will be randomly chosen with uniform distribution such that \f[I]a <= N <= b\f[]. (see \f[I]random.uniform()\f[]) .br * If given as a \f[I]string\f[], it can either represent a single \f[I]float\f[] value (\f[I]"2.85"\f[]) or a range (\f[I]"1.5-3.0"\f[]). .SS Path .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "file.ext" .br * "~/path/to/file.ext" .br * "$HOME/path/to/file.ext" .br * ["$HOME", "path", "to", "file.ext"] .IP "Description:" 4 A \f[I]Path\f[] is a \f[I]string\f[] representing the location of a file or directory. Simple \f[I]tilde expansion\f[] and \f[I]environment variable expansion\f[] is supported. In Windows environments, backslashes (\f[I]"\\"\f[]) can, in addition to forward slashes (\f[I]"/"\f[]), be used as path separators. 
Because backslashes are JSON's escape character, they themselves have to be escaped. The path \f[I]C:\\path\\to\\file.ext\f[] has therefore to be written as \f[I]"C:\\\\path\\\\to\\\\file.ext"\f[] if you want to use backslashes. .SS Logging Configuration .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 .. code:: json { "format" : "{asctime} {name}: {message}", "format-date": "%H:%M:%S", "path" : "~/log.txt", "encoding" : "ascii" } .. code:: json { "level" : "debug", "format": { "debug" : "debug: {message}", "info" : "[{name}] {message}", "warning": "Warning: {message}", "error" : "ERROR: {message}" } } .IP "Description:" 4 Extended logging output configuration. .br * format .br * General format string for logging messages or an \f[I]object\f[] with format strings for each loglevel. In addition to the default \f[I]LogRecord attributes\f[], it is also possible to access the current \f[I]extractor\f[], \f[I]job\f[], \f[I]path\f[], and keywords objects and their attributes, for example \f[I]"{extractor.url}"\f[], \f[I]"{path.filename}"\f[], \f[I]"{keywords.title}"\f[] .br * Default: \f[I]"[{name}][{levelname}] {message}"\f[] .br * format-date .br * Format string for \f[I]{asctime}\f[] fields in logging messages (see \f[I]strftime() directives\f[]) .br * Default: \f[I]"%Y-%m-%d %H:%M:%S"\f[] .br * level .br * Minimum logging message level (one of \f[I]"debug"\f[], \f[I]"info"\f[], \f[I]"warning"\f[], \f[I]"error"\f[], \f[I]"exception"\f[]) .br * Default: \f[I]"info"\f[] .br * path .br * \f[I]Path\f[] to the output file .br * mode .br * Mode in which the file is opened; use \f[I]"w"\f[] to truncate or \f[I]"a"\f[] to append (see \f[I]open()\f[]) .br * Default: \f[I]"w"\f[] .br * encoding .br * File encoding .br * Default: \f[I]"utf-8"\f[] Note: path, mode, and encoding are only applied when configuring logging output to a file. .SS Postprocessor Configuration .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 .. code:: json { "name": "mtime" } .. code:: json { "name" : "zip", "compression": "store", "extension" : "cbz", "filter" : "extension not in ('zip', 'rar')", "whitelist" : ["mangadex", "exhentai", "nhentai"] } .IP "Description:" 4 An \f[I]object\f[] containing a \f[I]"name"\f[] attribute specifying the post-processor type, as well as any of its \f[I]options\f[]. It is possible to set a \f[I]"filter"\f[] expression similar to \f[I]image-filter\f[] to only run a post-processor conditionally. It is also possible set a \f[I]"whitelist"\f[] or \f[I]"blacklist"\f[] to only enable or disable a post-processor for the specified extractor categories. 
The available post-processor types are \f[I]classify\f[] Categorize files by filename extension \f[I]compare\f[] Compare versions of the same file and replace/enumerate them on mismatch .br (requires \f[I]downloader.*.part\f[] = \f[I]true\f[] and \f[I]extractor.*.skip\f[] = \f[I]false\f[]) .br \f[I]directory\f[] Reevaluate \f[I]directory\f[] format strings \f[I]exec\f[] Execute external commands \f[I]hash\f[] Compute file hash digests \f[I]metadata\f[] Write metadata to separate files \f[I]mtime\f[] Set file modification time according to its metadata \f[I]python\f[] Call Python functions \f[I]rename\f[] Rename previously downloaded files \f[I]ugoira\f[] Convert Pixiv Ugoira to WebM using \f[I]ffmpeg\f[] \f[I]zip\f[] Store files in a ZIP archive .SH BUGS https://github.com/mikf/gallery-dl/issues .SH AUTHORS Mike Fährmann .br and https://github.com/mikf/gallery-dl/graphs/contributors .SH "SEE ALSO" .BR gallery-dl (1) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1747991096.4178958 gallery_dl-1.29.7/docs/0000755000175000017500000000000015014035070013347 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/docs/gallery-dl-example.conf0000644000175000017500000003460715001510422017710 0ustar00mikemike{ "extractor": { "base-directory": "~/gallery-dl/", "#": "set global archive file for all extractors", "archive": "~/gallery-dl/archive.sqlite3", "archive-pragma": ["journal_mode=WAL", "synchronous=NORMAL"], "#": "add two custom keywords into the metadata dictionary", "#": "these can be used to further refine your output directories or filenames", "keywords": {"bkey": "", "ckey": ""}, "#": "make sure that custom keywords are empty, i.e. they don't appear unless specified by the user", "keywords-default": "", "#": "replace invalid path characters with unicode alternatives", "path-restrict": { "\\": "⧹", "/" : "⧸", "|" : "│", ":" : "꞉", "*" : "∗", "?" 
: "?", "\"": "″", "<" : "﹤", ">" : "﹥" }, "#": "write tags for several *booru sites", "postprocessors": [ { "name": "metadata", "mode": "tags", "whitelist": ["danbooru", "moebooru", "sankaku"] } ], "pixiv": { "#": "override global archive path for pixiv", "archive": "~/gallery-dl/archive-pixiv.sqlite3", "#": "set custom directory and filename format strings for all pixiv downloads", "filename": "{id}{num}.{extension}", "directory": ["Pixiv", "Works", "{user[id]}"], "refresh-token": "aBcDeFgHiJkLmNoPqRsTuVwXyZ01234567890-FedC9", "#": "transform ugoira into lossless MKVs", "ugoira": true, "postprocessors": ["ugoira-copy"], "#": "use special settings for favorites and bookmarks", "favorite": { "directory": ["Pixiv", "Favorites", "{user[id]}"] }, "bookmark": { "directory": ["Pixiv", "My Bookmarks"], "refresh-token": "01234567890aBcDeFgHiJkLmNoPqRsTuVwXyZ-ZyxW1" } }, "danbooru": { "ugoira": true, "postprocessors": ["ugoira-webm"] }, "exhentai": { "#": "use cookies instead of logging in with username and password", "cookies": { "ipb_member_id": "12345", "ipb_pass_hash": "1234567890abcdef", "igneous" : "123456789", "hath_perks" : "m1.m2.m3.a-123456789a", "sk" : "n4m34tv3574m2c4e22c35zgeehiw", "sl" : "dm_2" }, "#": "wait 2 to 4.8 seconds between HTTP requests", "sleep-request": [2.0, 4.8], "filename": "{num:>04}_{name}.{extension}", "directory": ["{category!c}", "{title}"] }, "sankaku": { "#": "authentication with cookies is not possible for sankaku", "username": "user", "password": "#secret#" }, "furaffinity": { "#": "authentication with username and password is not possible due to CAPTCHA", "cookies": { "a": "01234567-89ab-cdef-fedc-ba9876543210", "b": "fedcba98-7654-3210-0123-456789abcdef" }, "descriptions": "html", "postprocessors": ["content"] }, "deviantart": { "#": "download 'gallery' and 'scraps' images for user profile URLs", "include": "gallery,scraps", "#": "use custom API credentials to avoid 429 errors", "client-id": "98765", "client-secret": "0123456789abcdef0123456789abcdef", "refresh-token": "0123456789abcdef0123456789abcdef01234567", "#": "put description texts into a separate directory", "metadata": true, "postprocessors": [ { "name": "metadata", "mode": "custom", "directory" : "Descriptions", "content-format" : "{description}\n", "extension-format": "descr.txt" } ] }, "kemonoparty": { "postprocessors": [ { "name": "metadata", "event": "post", "filename": "{id} {title}.txt", "#": "write text content and external URLs", "mode": "custom", "format": "{content}\n{embed[url]:?/\n/}", "#": "onlx write file if there is an external link present", "filter": "embed.get('url') or re.search(r'(?i)(gigafile|xgf|1drv|mediafire|mega|google|drive)', content)" } ] }, "flickr": { "access-token": "1234567890-abcdef", "access-token-secret": "1234567890abcdef", "size-max": 1920 }, "mangadex": { "#": "only download safe/suggestive chapters translated to English", "lang": "en", "ratings": ["safe", "suggestive"], "#": "put chapters into '.cbz' archives", "postprocessors": ["cbz"] }, "reddit": { "#": "only spawn child extractors for links to specific sites", "whitelist": ["imgur", "redgifs"], "#": "put files from child extractors into the reddit directory", "parent-directory": true, "#": "transfer metadata to any child extractor as '_reddit'", "parent-metadata": "_reddit" }, "imgur": { "#": "general imgur settings", "filename": "{id}.{extension}" }, "reddit>imgur": { "#": "special settings for imgur URLs found in reddit posts", "directory": [], "filename": "{_reddit[id]} {_reddit[title]} 
{id}.{extension}" }, "tumblr": { "posts" : "all", "external": false, "reblogs" : false, "inline" : true, "#": "use special settings when downloading liked posts", "likes": { "posts" : "video,photo,link", "external": true, "reblogs" : true } }, "twitter": { "#": "write text content for *all* tweets", "postprocessors": ["content"], "text-tweets": true }, "ytdl": { "#": "enable 'ytdl' extractor", "#": "i.e. invoke ytdl on all otherwise unsupported input URLs", "enabled": true, "#": "use yt-dlp instead of youtube-dl", "module": "yt_dlp", "#": "load ytdl options from config file", "config-file": "~/yt-dlp.conf" }, "mastodon": { "#": "add 'tabletop.social' as recognized mastodon instance", "#": "(run 'gallery-dl oauth:mastodon:tabletop.social to get an access token')", "tabletop.social": { "root": "https://tabletop.social", "access-token": "513a36c6..." }, "#": "set filename format strings for all 'mastodon' instances", "directory": ["mastodon", "{instance}", "{account[username]!l}"], "filename" : "{id}_{media[id]}.{extension}" }, "foolslide": { "#": "add two more foolslide instances", "otscans" : {"root": "https://otscans.com/foolslide"}, "helvetica": {"root": "https://helveticascans.com/r" } }, "foolfuuka": { "#": "add two other foolfuuka 4chan archives", "fireden-onion": {"root": "http://ydt6jy2ng3s3xg2e.onion"}, "scalearchive" : {"root": "https://archive.scaled.team" } }, "gelbooru_v01": { "#": "add a custom gelbooru_v01 instance", "#": "this is just an example, this specific instance is already included!", "allgirlbooru": {"root": "https://allgirl.booru.org"}, "#": "the following options are used for all gelbooru_v01 instances", "tag": { "directory": { "locals().get('bkey')": ["Booru", "AllGirlBooru", "Tags", "{bkey}", "{ckey}", "{search_tags}"], "" : ["Booru", "AllGirlBooru", "Tags", "_Unsorted", "{search_tags}"] } }, "post": { "directory": ["Booru", "AllGirlBooru", "Posts"] }, "archive": "~/gallery-dl/custom-archive-file-for-gelbooru_v01_instances.db", "filename": "{tags}_{id}_{md5}.{extension}", "sleep-request": [0, 1.2] }, "gelbooru_v02": { "#": "add a custom gelbooru_v02 instance", "#": "this is just an example, this specific instance is already included!", "tbib": { "root": "https://tbib.org", "#": "some sites have different domains for API access", "#": "use the 'api_root' option in addition to the 'root' setting here" } }, "tbib": { "#": "the following options are only used for TBIB", "#": "gelbooru_v02 has four subcategories at the moment, use custom directory settings for all of these", "tag": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Tags", "{bkey}", "{ckey}", "{search_tags}"], "" : ["Other Boorus", "TBIB", "Tags", "_Unsorted", "{search_tags}"] } }, "pool": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Pools", "{bkey}", "{ckey}", "{pool}"], "" : ["Other Boorus", "TBIB", "Pools", "_Unsorted", "{pool}"] } }, "favorite": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Favorites", "{bkey}", "{ckey}", "{favorite_id}"], "" : ["Other Boorus", "TBIB", "Favorites", "_Unsorted", "{favorite_id}"] } }, "post": { "directory": ["Other Boorus", "TBIB", "Posts"] }, "archive": "~/gallery-dl/custom-archive-file-for-TBIB.db", "filename": "{id}_{md5}.{extension}", "sleep-request": [0, 1.2] }, "urlshortener": { "tinyurl": {"root": "https://tinyurl.com"} } }, "downloader": { "#": "restrict download speed to 1 MB/s", "rate": "1M", "#": "show download progress indicator after 2 seconds", "progress": 2.0, "#": "retry failed downloads up to 
3 times", "retries": 3, "#": "consider a download 'failed' after 8 seconds of inactivity", "timeout": 8.0, "#": "write '.part' files into a special directory", "part-directory": "/tmp/.download/", "#": "do not update file modification times", "mtime": false, "ytdl": { "#": "use yt-dlp instead of youtube-dl", "module": "yt_dlp" } }, "output": { "log": { "level": "info", "#": "use different ANSI colors for each log level", "format": { "debug" : "\u001b[0;37m{name}: {message}\u001b[0m", "info" : "\u001b[1;37m{name}: {message}\u001b[0m", "warning": "\u001b[1;33m{name}: {message}\u001b[0m", "error" : "\u001b[1;31m{name}: {message}\u001b[0m" } }, "#": "shorten filenames to fit into one terminal line", "#": "while also considering wider East-Asian characters", "shorten": "eaw", "#": "enable ANSI escape sequences on Windows", "ansi": true, "#": "write logging messages to a separate file", "logfile": { "path": "~/gallery-dl/log.txt", "mode": "w", "level": "debug" }, "#": "write unrecognized URLs to a separate file", "unsupportedfile": { "path": "~/gallery-dl/unsupported.txt", "mode": "a", "format": "{asctime} {message}", "format-date": "%Y-%m-%d-%H-%M-%S" } }, "postprocessor": { "#": "write 'content' metadata into separate files", "content": { "name" : "metadata", "#": "write data for every post instead of each individual file", "event": "post", "filename": "{post_id|tweet_id|id}.txt", "#": "write only the values for 'content' or 'description'", "mode" : "custom", "format": "{content|description}\n" }, "#": "put files into a '.cbz' archive", "cbz": { "name": "zip", "extension": "cbz" }, "#": "various ugoira post processor configurations to create different file formats", "ugoira-webm": { "name": "ugoira", "extension": "webm", "ffmpeg-args": ["-c:v", "libvpx-vp9", "-an", "-b:v", "0", "-crf", "30"], "ffmpeg-twopass": true, "ffmpeg-demuxer": "image2" }, "ugoira-mp4": { "name": "ugoira", "extension": "mp4", "ffmpeg-args": ["-c:v", "libx264", "-an", "-b:v", "4M", "-preset", "veryslow"], "ffmpeg-twopass": true, "libx264-prevent-odd": true }, "ugoira-gif": { "name": "ugoira", "extension": "gif", "ffmpeg-args": ["-filter_complex", "[0:v] split [a][b];[a] palettegen [p];[b][p] paletteuse"] }, "ugoira-copy": { "name": "ugoira", "extension": "mkv", "ffmpeg-args": ["-c", "copy"], "libx264-prevent-odd": false, "repeat-last-frame": false } }, "#": "use a custom cache file location", "cache": { "file": "~/gallery-dl/cache.sqlite3" } } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1747857760.0 gallery_dl-1.29.7/docs/gallery-dl.conf0000644000175000017500000006113015013430540016253 0ustar00mikemike{ "#": "gallery-dl default configuration file", "#": "full documentation at", "#": "https://gdl-org.github.io/docs/configuration.html", "extractor": { "#": "===============================================================", "#": "==== General Extractor Options ==========================", "#": "(these can be set as site-specific extractor options as well) ", "base-directory": "./gallery-dl/", "postprocessors": null, "skip" : true, "skip-filter" : null, "user-agent" : "auto", "referer" : true, "headers" : {}, "ciphers" : null, "tls12" : true, "browser" : null, "proxy" : null, "proxy-env" : true, "source-address": null, "retries" : 4, "retry-codes" : [], "timeout" : 30.0, "verify" : true, "download" : true, "fallback" : true, "archive" : null, "archive-format": null, "archive-prefix": null, "archive-pragma": [], "archive-event" : ["file"], "archive-mode" : "file", "archive-table" : null, 
"cookies": null, "cookies-select": null, "cookies-update": true, "image-filter" : null, "image-range" : null, "image-unique" : false, "chapter-filter": null, "chapter-range" : null, "chapter-unique": false, "keywords" : {}, "keywords-eval" : false, "keywords-default" : null, "parent-directory": false, "parent-metadata" : false, "parent-skip" : false, "path-restrict": "auto", "path-replace" : "_", "path-remove" : "\\u0000-\\u001f\\u007f", "path-strip" : "auto", "path-extended": true, "metadata-extractor": null, "metadata-http" : null, "metadata-parent" : null, "metadata-path" : null, "metadata-url" : null, "metadata-version" : null, "sleep" : 0, "sleep-request" : 0, "sleep-extractor": 0, "sleep-429" : 60.0, "actions": [], "input" : null, "netrc" : false, "extension-map": { "jpeg": "jpg", "jpe" : "jpg", "jfif": "jpg", "jif" : "jpg", "jfi" : "jpg" }, "#": "===============================================================", "#": "==== Site-specific Extractor Options ====================", "ao3": { "username": "", "password": "", "sleep-request": "0.5-1.5", "formats": ["pdf"] }, "arcalive": { "sleep-request": "0.5-1.5", "emoticons": false, "gifs" : true }, "artstation": { "external" : false, "max-posts": null, "previews" : false, "videos" : true, "search": { "pro-first": true } }, "aryion": { "username": "", "password": "", "recursive": true }, "batoto": { "domain": "auto" }, "bbc": { "width": 1920 }, "behance": { "sleep-request": "2.0-4.0", "modules": ["image", "video", "mediacollection", "embed"] }, "bilibili": { "sleep-request": "3.0-6.0" }, "bluesky": { "username": "", "password": "", "include" : ["media"], "metadata": false, "quoted" : false, "reposts" : false, "videos" : true, "likes": { "depth" : 0, "endpoint": "listRecords" }, "post": { "depth": 0 } }, "boosty": { "allowed" : true, "bought" : false, "metadata": false, "videos" : true }, "bunkr": { "endpoint": "/api/_001", "tlds": false }, "cien": { "sleep-request": "1.0-2.0", "files": ["image", "video", "download", "gallery"] }, "civitai": { "api-key": null, "sleep-request": "0.5-1.5", "api" : "trpc", "files" : ["image"], "include" : ["user-models", "user-posts"], "metadata": false, "nsfw" : true, "quality" : "original=true", "quality-videos": "quality=100" }, "coomerparty": { "username": "", "password": "", "announcements": false, "comments" : false, "dms" : false, "duplicates" : false, "favorites" : "artist", "files" : ["attachments", "file", "inline"], "max-posts" : null, "metadata" : false, "revisions" : false, "order-revisions": "desc" }, "cyberdrop": { "domain": null }, "deviantart": { "client-id" : null, "client-secret": null, "refresh-token": null, "auto-watch" : false, "auto-unwatch" : false, "comments" : false, "comments-avatars": false, "extra" : false, "flat" : true, "folders" : false, "group" : true, "include" : "gallery", "intermediary" : true, "journals" : "html", "jwt" : false, "mature" : true, "metadata" : false, "original" : true, "pagination" : "api", "previews" : false, "public" : true, "quality" : 100, "wait-min" : 0, "avatar": { "formats": null }, "folder": { "subfolders": true } }, "exhentai": { "username": "", "password": "", "cookies" : null, "sleep-request": "3.0-6.0", "domain" : "auto", "fav" : null, "gp" : "resized", "limits" : null, "metadata": false, "original": true, "source" : null, "tags" : false, "fallback-retries": 2 }, "fanbox": { "cookies" : null, "comments": false, "embeds" : true, "metadata": false }, "flickr": { "access-token" : null, "access-token-secret": null, "sleep-request" : "1.0-2.0", 
"contexts": false, "exif" : false, "info" : false, "metadata": false, "profile" : false, "size-max": null, "videos" : true }, "furaffinity": { "cookies" : null, "sleep-request": "1.0", "descriptions": "text", "external" : false, "include" : ["gallery"], "layout" : "auto" }, "gelbooru": { "api-key": null, "user-id": null, "favorite": { "order-posts": "desc" } }, "generic": { "enabled": false }, "gofile": { "api-token": null, "website-token": null, "recursive": false }, "hentaifoundry": { "include": ["pictures"] }, "hitomi": { "format": "webp" }, "idolcomplex": { "username": "", "password": "", "referer" : false, "sleep-request": "3.0-6.0" }, "imagechest": { "access-token": null }, "imagefap": { "sleep-request": "2.0-4.0" }, "imgbb": { "username": "", "password": "" }, "imgur": { "client-id": null, "mp4": true }, "inkbunny": { "username": "", "password": "", "orderby": "create_datetime" }, "instagram": { "cookies": null, "sleep-request": "6.0-12.0", "api" : "rest", "cursor" : true, "include" : "posts", "max-posts" : null, "metadata" : false, "order-files": "asc", "order-posts": "asc", "previews" : false, "videos" : true, "stories": { "split": false } }, "itaku": { "sleep-request": "0.5-1.5", "videos": true }, "kemonoparty": { "username": "", "password": "", "announcements": false, "archives" : false, "comments" : false, "dms" : false, "duplicates" : false, "endpoint" : "posts", "favorites" : "artist", "files" : ["attachments", "file", "inline"], "max-posts" : null, "metadata" : true, "revisions" : false, "order-revisions": "desc" }, "khinsider": { "covers": false, "format": "mp3" }, "koharu": { "username": "", "password": "", "sleep-request": "0.5-1.5", "cbz" : true, "format": ["0", "1600", "1280", "980", "780"], "tags" : false }, "luscious": { "gif": false }, "mangadex": { "client-id" : "", "client-secret": "", "username": "", "password": "", "api-server": "https://api.mangadex.org", "api-parameters": null, "lang": null, "ratings": ["safe", "suggestive", "erotica", "pornographic"] }, "mangoxo": { "username": "", "password": "" }, "naver": { "videos": true }, "newgrounds": { "username": "", "password": "", "sleep-request": "0.5-1.5", "flash" : true, "format" : "original", "include": ["art"] }, "nsfwalbum": { "referer": false }, "oauth": { "browser": true, "cache" : true, "host" : "localhost", "port" : 6414 }, "paheal": { "metadata": false }, "patreon": { "cookies": null, "files" : ["images", "image_large", "attachments", "postfile", "content"], "format-images": "download_url" }, "pexels": { "sleep-request": "1.0-2.0" }, "pillowfort": { "username": "", "password": "", "external": false, "inline" : true, "reblogs" : false }, "pinterest": { "domain" : "auto", "sections": true, "stories" : true, "videos" : true }, "pixeldrain": { "api-key" : null, "recursive": false }, "pixiv": { "refresh-token": null, "cookies" : null, "captions" : false, "comments" : false, "include" : ["artworks"], "max-posts": null, "metadata" : false, "metadata-bookmark": false, "sanity" : true, "tags" : "japanese", "ugoira" : true, "covers" : false, "embeds" : false, "full-series": false }, "plurk": { "sleep-request": "0.5-1.5", "comments": false }, "poipiku": { "sleep-request": "0.5-1.5" }, "pornpics": { "sleep-request": "0.5-1.5" }, "readcomiconline": { "sleep-request": "3.0-6.0", "captcha": "stop", "quality": "auto" }, "reddit": { "client-id" : null, "user-agent" : null, "refresh-token": null, "comments" : 0, "morecomments": false, "embeds" : true, "date-min" : 0, "date-max" : 253402210800, "date-format" : 
"%Y-%m-%dT%H:%M:%S", "id-min" : null, "id-max" : null, "previews" : true, "recursion" : 0, "selftext" : null, "videos" : true }, "redgifs": { "format": ["hd", "sd", "gif"] }, "rule34xyz": { "format": ["10", "40", "41", "2"] }, "sankaku": { "username": "", "password": "", "id-format": "numeric", "refresh" : false, "tags" : false }, "sankakucomplex": { "embeds": false, "videos": true }, "scrolller": { "username": "", "password": "", "sleep-request": "0.5-1.5" }, "skeb": { "article" : false, "sent-requests": false, "thumbnails" : false, "search": { "filters": null } }, "smugmug": { "access-token" : null, "access-token-secret": null, "videos": true }, "soundgasm": { "sleep-request": "0.5-1.5" }, "steamgriddb": { "animated" : true, "epilepsy" : true, "humor" : true, "dimensions": "all", "file-types": "all", "languages" : "all,", "nsfw" : true, "sort" : "score_desc", "static" : true, "styles" : "all", "untagged" : true, "download-fake-png": true }, "seiga": { "username": "", "password": "", "cookies" : null }, "subscribestar": { "username": "", "password": "" }, "tapas": { "username": "", "password": "" }, "tenor": { "format": ["gif", "mp4", "webm", "webp"] }, "tiktok": { "audio" : true, "videos": true, "user": { "avatar": true, "module": null, "tiktok-range": "" } }, "tsumino": { "username": "", "password": "" }, "tumblr": { "access-token" : null, "access-token-secret": null, "avatar" : false, "date-min" : 0, "date-max" : null, "external" : false, "inline" : true, "offset" : 0, "original" : true, "pagination": "offset", "posts" : "all", "ratelimit" : "abort", "reblogs" : true, "fallback-delay" : 120.0, "fallback-retries": 2 }, "tumblrgallery": { "referer": false }, "twitter": { "username" : "", "username-alt": "", "password" : "", "cookies" : null, "ads" : false, "cards" : false, "cards-blacklist": [], "csrf" : "cookies", "cursor" : true, "expand" : false, "include" : ["timeline"], "locked" : "abort", "logout" : true, "pinned" : false, "quoted" : false, "ratelimit" : "wait", "relogin" : true, "replies" : true, "retweets" : false, "size" : ["orig", "4096x4096", "large", "medium", "small"], "text-tweets" : false, "tweet-endpoint": "auto", "transform" : true, "twitpic" : false, "unavailable" : false, "unique" : true, "users" : "user", "videos" : true, "timeline": { "strategy": "auto" }, "tweet": { "conversations": false } }, "unsplash": { "format": "raw" }, "urlgalleries": { "sleep-request": "0.5-1.5" }, "vipergirls": { "username": "", "password": "", "sleep-request": "0.5", "domain" : "viper.click", "like" : false }, "vk": { "sleep-request": "0.5-1.5", "offset": 0 }, "vsco": { "include": ["gallery"], "videos" : true }, "wallhaven": { "api-key" : null, "sleep-request": "1.4", "include" : ["uploads"], "metadata": false }, "weasyl": { "api-key" : null, "metadata": false }, "webtoons": { "sleep-request": "0.5-1.5", "quality": "original" }, "weebcentral": { "sleep-request": "0.5-1.5" }, "weibo": { "sleep-request": "1.0-2.0", "gifs" : true, "include" : ["feed"], "livephoto": true, "movies" : false, "retweets" : false, "videos" : true }, "xfolio": { "sleep-request": "0.5-1.5" }, "ytdl": { "cmdline-args": null, "config-file" : null, "enabled" : false, "format" : null, "generic" : true, "logging" : true, "module" : null, "raw-options" : null }, "zerochan": { "username": "", "password": "", "sleep-request": "0.5-1.5", "metadata" : false, "pagination": "api", "redirects" : false }, "#": "===============================================================", "#": "==== Base-Extractor and Instance Options 
================", "blogger": { "api-key": null, "videos" : true }, "Danbooru": { "sleep-request": "0.5-1.5", "external" : false, "metadata" : false, "threshold": "auto", "ugoira" : false, "favgroup": { "order-posts": "pool" }, "pool": { "order-posts": "pool" } }, "danbooru": { "username": "", "password": "" }, "atfbooru": { "username": "", "password": "" }, "aibooru": { "username": "", "password": "" }, "booruvar": { "username": "", "password": "" }, "E621": { "sleep-request": "0.5-1.5", "metadata" : false, "threshold": "auto" }, "e621": { "username": "", "password": "" }, "e926": { "username": "", "password": "" }, "e6ai": { "username": "", "password": "" }, "foolfuuka": { "sleep-request": "0.5-1.5" }, "archivedmoe": { "referer": false }, "mastodon": { "access-token": null, "cards" : false, "reblogs" : false, "replies" : true, "text-posts" : false }, "misskey": { "access-token": null, "renotes" : false, "replies" : true }, "Nijie": { "sleep-request": "2.0-4.0", "include" : ["illustration", "doujin"] }, "nijie": { "username": "", "password": "" }, "horne": { "username": "", "password": "" }, "nitter": { "quoted" : false, "retweets": false, "videos" : true }, "philomena": { "api-key": null, "sleep-request": "0.5-1.5", "svg" : true, "filter": 2 }, "derpibooru": { "filter": 56027 }, "ponybooru": { "filter": 3 }, "twibooru": { "sleep-request": "6.0-6.1" }, "postmill": { "save-link-post-body": false }, "reactor": { "sleep-request": "3.0-6.0", "gif": false }, "wikimedia": { "sleep-request": "1.0-2.0", "limit": 50, "subcategories": true }, "booru": { "tags" : false, "notes": false, "url" : "file_url" } }, "#": "===================================================================", "#": "==== Downloader Options =====================================", "downloader": { "filesize-min" : null, "filesize-max" : null, "mtime" : true, "part" : true, "part-directory": null, "progress" : 3.0, "proxy" : null, "rate" : null, "retries" : 4, "timeout" : 30.0, "verify" : true, "http": { "adjust-extensions": true, "chunk-size" : 32768, "consume-content" : false, "enabled" : true, "headers" : null, "retry-codes" : [], "sleep-429" : 60.0, "validate" : true }, "ytdl": { "cmdline-args" : null, "config-file" : null, "enabled" : true, "format" : null, "forward-cookies": true, "logging" : true, "module" : null, "outtmpl" : null, "raw-options" : null } }, "#": "===================================================================", "#": "==== Output Options =========================================", "output": { "ansi" : true, "fallback" : true, "mode" : "auto", "private" : false, "progress" : true, "shorten" : true, "skip" : true, "stdin" : null, "stdout" : null, "stderr" : null, "log" : "[{name}][{levelname}] {message}", "logfile" : null, "errorfile": null, "unsupportedfile": null, "colors" : { "success": "1;32", "skip" : "2", "debug" : "0;37", "info" : "1;37", "warning": "1;33", "error" : "1;31" } } } ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1747991096.4227486 gallery_dl-1.29.7/gallery_dl/0000755000175000017500000000000015014035070014535 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1747990168.0 gallery_dl-1.29.7/gallery_dl/__init__.py0000644000175000017500000004701415014033230016650 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software 
Foundation. import sys import logging from . import version, config, option, output, extractor, job, util, exception __author__ = "Mike Fährmann" __copyright__ = "Copyright 2014-2023 Mike Fährmann" __license__ = "GPLv2" __maintainer__ = "Mike Fährmann" __email__ = "mike_faehrmann@web.de" __version__ = version.__version__ def main(): try: parser = option.build_parser() args = parser.parse_args() log = output.initialize_logging(args.loglevel) # configuration if args.config_load: config.load() if args.configs_json: config.load(args.configs_json, strict=True) if args.configs_yaml: import yaml config.load(args.configs_yaml, strict=True, loads=yaml.safe_load) if args.configs_toml: try: import tomllib as toml except ImportError: import toml config.load(args.configs_toml, strict=True, loads=toml.loads) if not args.colors: output.ANSI = False config.set((), "colors", False) if util.WINDOWS: config.set(("output",), "ansi", False) if args.filename: filename = args.filename if filename == "/O": filename = "{filename}.{extension}" elif filename.startswith("\\f"): filename = "\f" + filename[2:] config.set((), "filename", filename) if args.directory is not None: config.set((), "base-directory", args.directory) config.set((), "directory", ()) if args.postprocessors: config.set((), "postprocessors", args.postprocessors) if args.abort: config.set((), "skip", "abort:" + str(args.abort)) if args.terminate: config.set((), "skip", "terminate:" + str(args.terminate)) if args.cookies_from_browser: browser, _, profile = args.cookies_from_browser.partition(":") browser, _, keyring = browser.partition("+") browser, _, domain = browser.partition("/") if profile and profile[0] == ":": container = profile[1:] profile = None else: profile, _, container = profile.partition("::") config.set((), "cookies", ( browser, profile, keyring, container, domain)) if args.options_pp: config.set((), "postprocessor-options", args.options_pp) for opts in args.options: config.set(*opts) output.configure_standard_streams() # signals signals = config.get((), "signals-ignore") if signals: import signal if isinstance(signals, str): signals = signals.split(",") for signal_name in signals: signal_num = getattr(signal, signal_name, None) if signal_num is None: log.warning("signal '%s' is not defined", signal_name) else: signal.signal(signal_num, signal.SIG_IGN) # enable ANSI escape sequences on Windows if util.WINDOWS and config.get(("output",), "ansi", output.COLORS): from ctypes import windll, wintypes, byref kernel32 = windll.kernel32 mode = wintypes.DWORD() for handle_id in (-11, -12): # stdout and stderr handle = kernel32.GetStdHandle(handle_id) kernel32.GetConsoleMode(handle, byref(mode)) if not mode.value & 0x4: mode.value |= 0x4 kernel32.SetConsoleMode(handle, mode) output.ANSI = True # filter environment filterenv = config.get((), "filters-environment", True) if filterenv is True: pass elif not filterenv: util.compile_expression = util.compile_expression_raw elif isinstance(filterenv, str): if filterenv == "raw": util.compile_expression = util.compile_expression_raw elif filterenv.startswith("default"): util.compile_expression = util.compile_expression_defaultdict # format string separator separator = config.get((), "format-separator") if separator: from . 
import formatter formatter._SEPARATOR = separator # eval globals path = config.get((), "globals") if path: util.GLOBALS.update(util.import_file(path).__dict__) # loglevels output.configure_logging(args.loglevel) if args.loglevel >= logging.WARNING: config.set(("output",), "mode", "null") config.set(("downloader",), "progress", None) elif args.loglevel <= logging.DEBUG: import platform import requests extra = "" if util.EXECUTABLE: extra = " - Executable ({})".format(version.__variant__) else: git_head = util.git_head() if git_head: extra = " - Git HEAD: " + git_head log.debug("Version %s%s", __version__, extra) log.debug("Python %s - %s", platform.python_version(), platform.platform()) try: log.debug("requests %s - urllib3 %s", requests.__version__, requests.packages.urllib3.__version__) except AttributeError: pass log.debug("Configuration Files %s", config._files) if args.print_traffic: import requests requests.packages.urllib3.connection.HTTPConnection.debuglevel = 1 # extractor modules modules = config.get(("extractor",), "modules") if modules is not None: if isinstance(modules, str): modules = modules.split(",") extractor.modules = modules # external modules if args.extractor_sources: sources = args.extractor_sources sources.append(None) else: sources = config.get(("extractor",), "module-sources") if sources: import os modules = [] for source in sources: if source: path = util.expand_path(source) try: files = os.listdir(path) modules.append(extractor._modules_path(path, files)) except Exception as exc: log.warning("Unable to load modules from %s (%s: %s)", path, exc.__class__.__name__, exc) else: modules.append(extractor._modules_internal()) if len(modules) > 1: import itertools extractor._module_iter = itertools.chain(*modules) elif not modules: extractor._module_iter = () else: extractor._module_iter = iter(modules[0]) if args.update: from . import update extr = update.UpdateExtractor.from_url("update:" + args.update) ujob = update.UpdateJob(extr) return ujob.run() elif args.list_modules: extractor.modules.append("") sys.stdout.write("\n".join(extractor.modules)) elif args.list_extractors is not None: write = sys.stdout.write fmt = ("{}{}\nCategory: {} - Subcategory: {}" "\nExample : {}\n\n").format extractors = extractor.extractors() if args.list_extractors: fltr = util.build_extractor_filter( args.list_extractors, negate=False) extractors = filter(fltr, extractors) for extr in extractors: write(fmt( extr.__name__, "\n" + extr.__doc__ if extr.__doc__ else "", extr.category, extr.subcategory, extr.example, )) elif args.clear_cache: from . 
import cache log = logging.getLogger("cache") cnt = cache.clear(args.clear_cache) if cnt is None: log.error("Database file not available") return 1 else: log.info( "Deleted %d %s from '%s'", cnt, "entry" if cnt == 1 else "entries", cache._path(), ) elif args.config: if args.config == "init": return config.initialize() elif args.config == "status": return config.status() else: return config.open_extern() else: input_files = config.get((), "input-files") if input_files: for input_file in input_files: if isinstance(input_file, str): input_file = (input_file, None) args.input_files.append(input_file) if not args.urls and not args.input_files: if args.cookies_from_browser or config.interpolate( ("extractor",), "cookies"): args.urls.append("noop") else: parser.error( "The following arguments are required: URL\nUse " "'gallery-dl --help' to get a list of all options.") if args.list_urls: jobtype = job.UrlJob jobtype.maxdepth = args.list_urls if config.get(("output",), "fallback", True): jobtype.handle_url = \ staticmethod(jobtype.handle_url_fallback) elif args.dump_json: jobtype = job.DataJob jobtype.resolve = args.dump_json - 1 else: jobtype = args.jobtype or job.DownloadJob input_manager = InputManager() input_manager.log = input_log = logging.getLogger("inputfile") # unsupported file logging handler handler = output.setup_logging_handler( "unsupportedfile", fmt="{message}") if handler: ulog = job.Job.ulog = logging.getLogger("unsupported") ulog.addHandler(handler) ulog.propagate = False # error file logging handler handler = output.setup_logging_handler( "errorfile", fmt="{message}", mode="a") if handler: elog = input_manager.err = logging.getLogger("errorfile") elog.addHandler(handler) elog.propagate = False # collect input URLs input_manager.add_list(args.urls) if args.input_files: for input_file, action in args.input_files: try: path = util.expand_path(input_file) input_manager.add_file(path, action) except Exception as exc: input_log.error(exc) return getattr(exc, "code", 128) pformat = config.get(("output",), "progress", True) if pformat and len(input_manager.urls) > 1 and \ args.loglevel < logging.ERROR: input_manager.progress(pformat) # process input URLs retval = 0 for url in input_manager: try: log.debug("Starting %s for '%s'", jobtype.__name__, url) if isinstance(url, ExtendedUrl): for opts in url.gconfig: config.set(*opts) with config.apply(url.lconfig): status = jobtype(url.value).run() else: status = jobtype(url).run() if status: retval |= status input_manager.error() else: input_manager.success() except exception.StopExtraction: pass except exception.TerminateExtraction: pass except exception.RestartExtraction: log.debug("Restarting '%s'", url) continue except exception.NoExtractorError: log.error("Unsupported URL '%s'", url) retval |= 64 input_manager.error() input_manager.next() return retval return 0 except KeyboardInterrupt: raise SystemExit("\nKeyboardInterrupt") except BrokenPipeError: pass except OSError as exc: import errno if exc.errno != errno.EPIPE: raise return 1 class InputManager(): def __init__(self): self.urls = [] self.files = () self.log = self.err = None self._url = "" self._item = None self._index = 0 self._pformat = None def add_url(self, url): self.urls.append(url) def add_list(self, urls): self.urls += urls def add_file(self, path, action=None): """Process an input file. Lines starting with '#' and empty lines will be ignored. Lines starting with '-' will be interpreted as a key-value pair separated by an '='. 
where 'key' is a dot-separated option name and 'value' is a JSON-parsable string. These configuration options will be applied while processing the next URL only. Lines starting with '-G' are the same as above, except these options will be applied for *all* following URLs, i.e. they are Global. Everything else will be used as a potential URL. Example input file: # settings global options -G base-directory = "/tmp/" -G skip = false # setting local options for the next URL -filename="spaces_are_optional.jpg" -skip = true https://example.org/ # next URL uses default filename and 'skip' is false. https://example.com/index.htm # comment1 https://example.com/404.htm # comment2 """ if path == "-" and not action: try: lines = sys.stdin.readlines() except Exception: raise exception.InputFileError("stdin is not readable") path = None else: try: with open(path, encoding="utf-8") as fp: lines = fp.readlines() except Exception as exc: raise exception.InputFileError(str(exc)) if self.files: self.files[path] = lines else: self.files = {path: lines} if action == "c": action = self._action_comment elif action == "d": action = self._action_delete else: action = None gconf = [] lconf = [] indicies = [] strip_comment = None append = self.urls.append for n, line in enumerate(lines): line = line.strip() if not line or line[0] == "#": # empty line or comment continue elif line[0] == "-": # config spec if len(line) >= 2 and line[1] == "G": conf = gconf line = line[2:] else: conf = lconf line = line[1:] if action: indicies.append(n) key, sep, value = line.partition("=") if not sep: raise exception.InputFileError( "Invalid KEY=VALUE pair '%s' on line %s in %s", line, n+1, path) try: value = util.json_loads(value.strip()) except ValueError as exc: self.log.debug("%s: %s", exc.__class__.__name__, exc) raise exception.InputFileError( "Unable to parse '%s' on line %s in %s", value, n+1, path) key = key.strip().split(".") conf.append((key[:-1], key[-1], value)) else: # url if " #" in line or "\t#" in line: if strip_comment is None: import re strip_comment = re.compile(r"\s+#.*").sub line = strip_comment("", line) if gconf or lconf: url = ExtendedUrl(line, gconf, lconf) gconf = [] lconf = [] else: url = line if action: indicies.append(n) append((url, path, action, indicies)) indicies = [] else: append(url) def progress(self, pformat=True): if pformat is True: pformat = "[{current}/{total}] {url}\n" else: pformat += "\n" self._pformat = pformat.format_map def next(self): self._index += 1 def success(self): if self._item: self._rewrite() def error(self): if self.err: if self._item: url, path, action, indicies = self._item lines = self.files[path] out = "".join(lines[i] for i in indicies) if out and out[-1] == "\n": out = out[:-1] self._rewrite() else: out = str(self._url) self.err.info(out) def _rewrite(self): url, path, action, indicies = self._item lines = self.files[path] action(lines, indicies) try: with open(path, "w", encoding="utf-8") as fp: fp.writelines(lines) except Exception as exc: self.log.warning( "Unable to update '%s' (%s: %s)", path, exc.__class__.__name__, exc) @staticmethod def _action_comment(lines, indicies): for i in indicies: lines[i] = "# " + lines[i] @staticmethod def _action_delete(lines, indicies): for i in indicies: lines[i] = "" def __iter__(self): self._index = 0 return self def __next__(self): try: url = self.urls[self._index] except IndexError: raise StopIteration if isinstance(url, tuple): self._item = url url = url[0] else: self._item = None self._url = url if self._pformat: 
output.stderr_write(self._pformat({ "total" : len(self.urls), "current": self._index + 1, "url" : url, })) return url class ExtendedUrl(): """URL with attached config key-value pairs""" __slots__ = ("value", "gconfig", "lconfig") def __init__(self, url, gconf, lconf): self.value = url self.gconfig = gconf self.lconfig = lconf def __str__(self): return self.value ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1743510442.0 gallery_dl-1.29.7/gallery_dl/__main__.py0000644000175000017500000000105714772755652016663 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2017-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import sys if not __package__ and not hasattr(sys, "frozen"): import os.path path = os.path.realpath(os.path.abspath(__file__)) sys.path.insert(0, os.path.dirname(os.path.dirname(path))) import gallery_dl if __name__ == "__main__": raise SystemExit(gallery_dl.main()) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/gallery_dl/actions.py0000644000175000017500000001261115001510422016543 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """ """ import re import time import logging import operator import functools from . import util, exception def parse(actionspec): if isinstance(actionspec, dict): actionspec = actionspec.items() actions = {} actions[-logging.DEBUG] = actions_bd = [] actions[-logging.INFO] = actions_bi = [] actions[-logging.WARNING] = actions_bw = [] actions[-logging.ERROR] = actions_be = [] actions[logging.DEBUG] = actions_ad = [] actions[logging.INFO] = actions_ai = [] actions[logging.WARNING] = actions_aw = [] actions[logging.ERROR] = actions_ae = [] for event, spec in actionspec: level, _, pattern = event.partition(":") search = re.compile(pattern).search if pattern else util.true if isinstance(spec, str): type, _, args = spec.partition(" ") before, after = ACTIONS[type](args) else: actions_before = [] actions_after = [] for s in spec: type, _, args = s.partition(" ") before, after = ACTIONS[type](args) if before: actions_before.append(before) if after: actions_after.append(after) before = _chain_actions(actions_before) after = _chain_actions(actions_after) level = level.strip() if not level or level == "*": if before: action = (search, before) actions_bd.append(action) actions_bi.append(action) actions_bw.append(action) actions_be.append(action) if after: action = (search, after) actions_ad.append(action) actions_ai.append(action) actions_aw.append(action) actions_ae.append(action) else: level = _level_to_int(level) if before: actions[-level].append((search, before)) if after: actions[level].append((search, after)) return actions class LoggerAdapter(): def __init__(self, logger, job): self.logger = logger self.extra = job._logger_extra self.actions = job._logger_actions self.debug = functools.partial(self.log, logging.DEBUG) self.info = functools.partial(self.log, logging.INFO) self.warning = functools.partial(self.log, logging.WARNING) self.error = functools.partial(self.log, logging.ERROR) def log(self, level, msg, *args, **kwargs): msg = str(msg) if args: msg = msg % args before = 
self.actions[-level] after = self.actions[level] if before: args = self.extra.copy() args["level"] = level for cond, action in before: if cond(msg): action(args) level = args["level"] if self.logger.isEnabledFor(level): kwargs["extra"] = self.extra self.logger._log(level, msg, (), **kwargs) if after: args = self.extra.copy() for cond, action in after: if cond(msg): action(args) def _level_to_int(level): try: return logging._nameToLevel[level] except KeyError: return int(level) def _chain_actions(actions): def _chain(args): for action in actions: action(args) return _chain # -------------------------------------------------------------------- def action_print(opts): def _print(_): print(opts) return None, _print def action_status(opts): op, value = re.match(r"\s*([&|^=])=?\s*(\d+)", opts).groups() op = { "&": operator.and_, "|": operator.or_, "^": operator.xor, "=": lambda x, y: y, }[op] value = int(value) def _status(args): args["job"].status = op(args["job"].status, value) return _status, None def action_level(opts): level = _level_to_int(opts.lstrip(" ~=")) def _level(args): args["level"] = level return _level, None def action_exec(opts): def _exec(_): util.Popen(opts, shell=True).wait() return None, _exec def action_wait(opts): if opts: seconds = util.build_duration_func(opts) def _wait(args): time.sleep(seconds()) else: def _wait(args): input("Press Enter to continue") return None, _wait def action_abort(opts): return None, util.raises(exception.StopExtraction) def action_terminate(opts): return None, util.raises(exception.TerminateExtraction) def action_restart(opts): return None, util.raises(exception.RestartExtraction) def action_exit(opts): try: opts = int(opts) except ValueError: pass def _exit(args): raise SystemExit(opts) return None, _exit ACTIONS = { "abort" : action_abort, "exec" : action_exec, "exit" : action_exit, "level" : action_level, "print" : action_print, "restart" : action_restart, "status" : action_status, "terminate": action_terminate, "wait" : action_wait, } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1746776695.0 gallery_dl-1.29.7/gallery_dl/aes.py0000644000175000017500000005152515007331167015677 0ustar00mikemike# -*- coding: utf-8 -*- # This is a slightly modified version of yt-dlp's aes module. 
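# Usage sketch (illustrative only, not part of the original module): the
# byte-level helpers defined below, e.g. aes_cbc_decrypt_bytes(), use
# pycryptodome when it is installed and otherwise fall back to the pure
# Python implementation in this file. The key/IV values shown are assumed
# example data, and 'ciphertext' stands for AES-CBC encrypted bytes.
#
#   from gallery_dl.aes import aes_cbc_decrypt_bytes, unpad_pkcs7
#   key = b"0123456789abcdef"   # 16-byte key -> AES-128 (example value)
#   iv  = b"\x00" * 16          # 16-byte initialization vector (example)
#   plaintext = unpad_pkcs7(aes_cbc_decrypt_bytes(ciphertext, key, iv))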
# https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/aes.py import struct import binascii from math import ceil try: from Cryptodome.Cipher import AES as Cryptodome_AES except ImportError: try: from Crypto.Cipher import AES as Cryptodome_AES except ImportError: Cryptodome_AES = None except Exception as exc: Cryptodome_AES = None import logging logging.getLogger("aes").warning( "Error when trying to import 'Cryptodome' module (%s: %s)", exc.__class__.__name__, exc) del logging if Cryptodome_AES: def aes_cbc_decrypt_bytes(data, key, iv): """Decrypt bytes with AES-CBC using pycryptodome""" return Cryptodome_AES.new( key, Cryptodome_AES.MODE_CBC, iv).decrypt(data) def aes_gcm_decrypt_and_verify_bytes(data, key, tag, nonce): """Decrypt bytes with AES-GCM using pycryptodome""" return Cryptodome_AES.new( key, Cryptodome_AES.MODE_GCM, nonce).decrypt_and_verify(data, tag) else: def aes_cbc_decrypt_bytes(data, key, iv): """Decrypt bytes with AES-CBC using native implementation""" return intlist_to_bytes(aes_cbc_decrypt( bytes_to_intlist(data), bytes_to_intlist(key), bytes_to_intlist(iv), )) def aes_gcm_decrypt_and_verify_bytes(data, key, tag, nonce): """Decrypt bytes with AES-GCM using native implementation""" return intlist_to_bytes(aes_gcm_decrypt_and_verify( bytes_to_intlist(data), bytes_to_intlist(key), bytes_to_intlist(tag), bytes_to_intlist(nonce), )) bytes_to_intlist = list def intlist_to_bytes(xs): if not xs: return b"" return struct.pack("%dB" % len(xs), *xs) def unpad_pkcs7(data): return data[:-data[-1]] BLOCK_SIZE_BYTES = 16 def aes_ecb_encrypt(data, key, iv=None): """ Encrypt with aes in ECB mode @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv Unused for this mode @returns {int[]} encrypted data """ expanded_key = key_expansion(key) block_count = ceil(len(data) / BLOCK_SIZE_BYTES) encrypted_data = [] for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] encrypted_data += aes_encrypt(block, expanded_key) encrypted_data = encrypted_data[:len(data)] return encrypted_data def aes_ecb_decrypt(data, key, iv=None): """ Decrypt with aes in ECB mode @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv Unused for this mode @returns {int[]} decrypted data """ expanded_key = key_expansion(key) block_count = ceil(len(data) / BLOCK_SIZE_BYTES) encrypted_data = [] for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] encrypted_data += aes_decrypt(block, expanded_key) encrypted_data = encrypted_data[:len(data)] return encrypted_data def aes_ctr_decrypt(data, key, iv): """ Decrypt with aes in counter mode @param {int[]} data cipher @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte initialization vector @returns {int[]} decrypted data """ return aes_ctr_encrypt(data, key, iv) def aes_ctr_encrypt(data, key, iv): """ Encrypt with aes in counter mode @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte initialization vector @returns {int[]} encrypted data """ expanded_key = key_expansion(key) block_count = ceil(len(data) / BLOCK_SIZE_BYTES) counter = iter_vector(iv) encrypted_data = [] for i in range(block_count): counter_block = next(counter) block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] block += [0] * (BLOCK_SIZE_BYTES - len(block)) cipher_counter_block = aes_encrypt(counter_block, expanded_key) encrypted_data += xor(block, cipher_counter_block) encrypted_data = 
encrypted_data[:len(data)] return encrypted_data def aes_cbc_decrypt(data, key, iv): """ Decrypt with aes in CBC mode @param {int[]} data cipher @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte IV @returns {int[]} decrypted data """ expanded_key = key_expansion(key) block_count = ceil(len(data) / BLOCK_SIZE_BYTES) decrypted_data = [] previous_cipher_block = iv for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] block += [0] * (BLOCK_SIZE_BYTES - len(block)) decrypted_block = aes_decrypt(block, expanded_key) decrypted_data += xor(decrypted_block, previous_cipher_block) previous_cipher_block = block decrypted_data = decrypted_data[:len(data)] return decrypted_data def aes_cbc_encrypt(data, key, iv): """ Encrypt with aes in CBC mode. Using PKCS#7 padding @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte IV @returns {int[]} encrypted data """ expanded_key = key_expansion(key) block_count = ceil(len(data) / BLOCK_SIZE_BYTES) encrypted_data = [] previous_cipher_block = iv for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] remaining_length = BLOCK_SIZE_BYTES - len(block) block += [remaining_length] * remaining_length mixed_block = xor(block, previous_cipher_block) encrypted_block = aes_encrypt(mixed_block, expanded_key) encrypted_data += encrypted_block previous_cipher_block = encrypted_block return encrypted_data def aes_gcm_decrypt_and_verify(data, key, tag, nonce): """ Decrypt with aes in GBM mode and checks authenticity using tag @param {int[]} data cipher @param {int[]} key 16-Byte cipher key @param {int[]} tag authentication tag @param {int[]} nonce IV (recommended 12-Byte) @returns {int[]} decrypted data """ # XXX: check aes, gcm param hash_subkey = aes_encrypt([0] * BLOCK_SIZE_BYTES, key_expansion(key)) if len(nonce) == 12: j0 = nonce + [0, 0, 0, 1] else: fill = (BLOCK_SIZE_BYTES - (len(nonce) % BLOCK_SIZE_BYTES)) % \ BLOCK_SIZE_BYTES + 8 ghash_in = nonce + [0] * fill + bytes_to_intlist( (8 * len(nonce)).to_bytes(8, "big")) j0 = ghash(hash_subkey, ghash_in) # TODO: add nonce support to aes_ctr_decrypt # nonce_ctr = j0[:12] iv_ctr = inc(j0) decrypted_data = aes_ctr_decrypt( data, key, iv_ctr + [0] * (BLOCK_SIZE_BYTES - len(iv_ctr))) pad_len = ( (BLOCK_SIZE_BYTES - (len(data) % BLOCK_SIZE_BYTES)) % BLOCK_SIZE_BYTES) s_tag = ghash( hash_subkey, data + [0] * pad_len + # pad bytes_to_intlist( (0 * 8).to_bytes(8, "big") + # length of associated data ((len(data) * 8).to_bytes(8, "big")) # length of data ) ) if tag != aes_ctr_encrypt(s_tag, key, j0): raise ValueError("Mismatching authentication tag") return decrypted_data def aes_encrypt(data, expanded_key): """ Encrypt one block with aes @param {int[]} data 16-Byte state @param {int[]} expanded_key 176/208/240-Byte expanded key @returns {int[]} 16-Byte cipher """ rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1 data = xor(data, expanded_key[:BLOCK_SIZE_BYTES]) for i in range(1, rounds + 1): data = sub_bytes(data) data = shift_rows(data) if i != rounds: data = list(iter_mix_columns(data, MIX_COLUMN_MATRIX)) data = xor(data, expanded_key[ i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES]) return data def aes_decrypt(data, expanded_key): """ Decrypt one block with aes @param {int[]} data 16-Byte cipher @param {int[]} expanded_key 176/208/240-Byte expanded key @returns {int[]} 16-Byte state """ rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1 for i in range(rounds, 0, -1): data = xor(data, 
expanded_key[ i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES]) if i != rounds: data = list(iter_mix_columns(data, MIX_COLUMN_MATRIX_INV)) data = shift_rows_inv(data) data = sub_bytes_inv(data) data = xor(data, expanded_key[:BLOCK_SIZE_BYTES]) return data def aes_decrypt_text(data, password, key_size_bytes): """ Decrypt text - The first 8 Bytes of decoded 'data' are the 8 high Bytes of the counter - The cipher key is retrieved by encrypting the first 16 Byte of 'password' with the first 'key_size_bytes' Bytes from 'password' (if necessary filled with 0's) - Mode of operation is 'counter' @param {str} data Base64 encoded string @param {str,unicode} password Password (will be encoded with utf-8) @param {int} key_size_bytes Possible values: 16 for 128-Bit, 24 for 192-Bit, or 32 for 256-Bit @returns {str} Decrypted data """ NONCE_LENGTH_BYTES = 8 data = bytes_to_intlist(binascii.a2b_base64(data)) password = bytes_to_intlist(password.encode("utf-8")) key = password[:key_size_bytes] + [0] * (key_size_bytes - len(password)) key = aes_encrypt(key[:BLOCK_SIZE_BYTES], key_expansion(key)) * \ (key_size_bytes // BLOCK_SIZE_BYTES) nonce = data[:NONCE_LENGTH_BYTES] cipher = data[NONCE_LENGTH_BYTES:] return intlist_to_bytes(aes_ctr_decrypt( cipher, key, nonce + [0] * (BLOCK_SIZE_BYTES - NONCE_LENGTH_BYTES) )) RCON = ( 0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36, ) SBOX = ( 0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76, 0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0, 0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15, 0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75, 0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84, 0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF, 0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8, 0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2, 0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73, 0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB, 0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79, 0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08, 0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A, 0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E, 0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF, 0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16, ) SBOX_INV = ( 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38, 0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb, 0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87, 0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb, 0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d, 0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e, 0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2, 0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25, 0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16, 0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92, 0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda, 0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84, 0x90, 0xd8, 0xab, 0x00, 
0x8c, 0xbc, 0xd3, 0x0a, 0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06, 0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02, 0xc1, 0xaf, 0xbd, 0x03, 0x01, 0x13, 0x8a, 0x6b, 0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea, 0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73, 0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85, 0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e, 0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89, 0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b, 0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20, 0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4, 0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31, 0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f, 0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d, 0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef, 0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0, 0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61, 0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26, 0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d ) MIX_COLUMN_MATRIX = ( (0x2, 0x3, 0x1, 0x1), (0x1, 0x2, 0x3, 0x1), (0x1, 0x1, 0x2, 0x3), (0x3, 0x1, 0x1, 0x2), ) MIX_COLUMN_MATRIX_INV = ( (0xE, 0xB, 0xD, 0x9), (0x9, 0xE, 0xB, 0xD), (0xD, 0x9, 0xE, 0xB), (0xB, 0xD, 0x9, 0xE), ) RIJNDAEL_EXP_TABLE = ( 0x01, 0x03, 0x05, 0x0F, 0x11, 0x33, 0x55, 0xFF, 0x1A, 0x2E, 0x72, 0x96, 0xA1, 0xF8, 0x13, 0x35, 0x5F, 0xE1, 0x38, 0x48, 0xD8, 0x73, 0x95, 0xA4, 0xF7, 0x02, 0x06, 0x0A, 0x1E, 0x22, 0x66, 0xAA, 0xE5, 0x34, 0x5C, 0xE4, 0x37, 0x59, 0xEB, 0x26, 0x6A, 0xBE, 0xD9, 0x70, 0x90, 0xAB, 0xE6, 0x31, 0x53, 0xF5, 0x04, 0x0C, 0x14, 0x3C, 0x44, 0xCC, 0x4F, 0xD1, 0x68, 0xB8, 0xD3, 0x6E, 0xB2, 0xCD, 0x4C, 0xD4, 0x67, 0xA9, 0xE0, 0x3B, 0x4D, 0xD7, 0x62, 0xA6, 0xF1, 0x08, 0x18, 0x28, 0x78, 0x88, 0x83, 0x9E, 0xB9, 0xD0, 0x6B, 0xBD, 0xDC, 0x7F, 0x81, 0x98, 0xB3, 0xCE, 0x49, 0xDB, 0x76, 0x9A, 0xB5, 0xC4, 0x57, 0xF9, 0x10, 0x30, 0x50, 0xF0, 0x0B, 0x1D, 0x27, 0x69, 0xBB, 0xD6, 0x61, 0xA3, 0xFE, 0x19, 0x2B, 0x7D, 0x87, 0x92, 0xAD, 0xEC, 0x2F, 0x71, 0x93, 0xAE, 0xE9, 0x20, 0x60, 0xA0, 0xFB, 0x16, 0x3A, 0x4E, 0xD2, 0x6D, 0xB7, 0xC2, 0x5D, 0xE7, 0x32, 0x56, 0xFA, 0x15, 0x3F, 0x41, 0xC3, 0x5E, 0xE2, 0x3D, 0x47, 0xC9, 0x40, 0xC0, 0x5B, 0xED, 0x2C, 0x74, 0x9C, 0xBF, 0xDA, 0x75, 0x9F, 0xBA, 0xD5, 0x64, 0xAC, 0xEF, 0x2A, 0x7E, 0x82, 0x9D, 0xBC, 0xDF, 0x7A, 0x8E, 0x89, 0x80, 0x9B, 0xB6, 0xC1, 0x58, 0xE8, 0x23, 0x65, 0xAF, 0xEA, 0x25, 0x6F, 0xB1, 0xC8, 0x43, 0xC5, 0x54, 0xFC, 0x1F, 0x21, 0x63, 0xA5, 0xF4, 0x07, 0x09, 0x1B, 0x2D, 0x77, 0x99, 0xB0, 0xCB, 0x46, 0xCA, 0x45, 0xCF, 0x4A, 0xDE, 0x79, 0x8B, 0x86, 0x91, 0xA8, 0xE3, 0x3E, 0x42, 0xC6, 0x51, 0xF3, 0x0E, 0x12, 0x36, 0x5A, 0xEE, 0x29, 0x7B, 0x8D, 0x8C, 0x8F, 0x8A, 0x85, 0x94, 0xA7, 0xF2, 0x0D, 0x17, 0x39, 0x4B, 0xDD, 0x7C, 0x84, 0x97, 0xA2, 0xFD, 0x1C, 0x24, 0x6C, 0xB4, 0xC7, 0x52, 0xF6, 0x01, ) RIJNDAEL_LOG_TABLE = ( 0x00, 0x00, 0x19, 0x01, 0x32, 0x02, 0x1a, 0xc6, 0x4b, 0xc7, 0x1b, 0x68, 0x33, 0xee, 0xdf, 0x03, 0x64, 0x04, 0xe0, 0x0e, 0x34, 0x8d, 0x81, 0xef, 0x4c, 0x71, 0x08, 0xc8, 0xf8, 0x69, 0x1c, 0xc1, 0x7d, 0xc2, 0x1d, 0xb5, 0xf9, 0xb9, 0x27, 0x6a, 0x4d, 0xe4, 0xa6, 0x72, 0x9a, 0xc9, 0x09, 0x78, 0x65, 0x2f, 0x8a, 0x05, 0x21, 0x0f, 0xe1, 0x24, 0x12, 0xf0, 0x82, 0x45, 0x35, 0x93, 0xda, 0x8e, 0x96, 0x8f, 0xdb, 0xbd, 0x36, 0xd0, 0xce, 0x94, 0x13, 0x5c, 0xd2, 0xf1, 0x40, 0x46, 0x83, 0x38, 0x66, 0xdd, 0xfd, 0x30, 0xbf, 0x06, 0x8b, 0x62, 0xb3, 0x25, 0xe2, 0x98, 0x22, 0x88, 0x91, 0x10, 0x7e, 0x6e, 0x48, 0xc3, 0xa3, 0xb6, 0x1e, 0x42, 0x3a, 0x6b, 0x28, 0x54, 0xfa, 0x85, 0x3d, 0xba, 0x2b, 0x79, 0x0a, 0x15, 0x9b, 0x9f, 0x5e, 0xca, 0x4e, 0xd4, 0xac, 0xe5, 0xf3, 0x73, 0xa7, 0x57, 0xaf, 0x58, 0xa8, 0x50, 0xf4, 0xea, 
0xd6, 0x74, 0x4f, 0xae, 0xe9, 0xd5, 0xe7, 0xe6, 0xad, 0xe8, 0x2c, 0xd7, 0x75, 0x7a, 0xeb, 0x16, 0x0b, 0xf5, 0x59, 0xcb, 0x5f, 0xb0, 0x9c, 0xa9, 0x51, 0xa0, 0x7f, 0x0c, 0xf6, 0x6f, 0x17, 0xc4, 0x49, 0xec, 0xd8, 0x43, 0x1f, 0x2d, 0xa4, 0x76, 0x7b, 0xb7, 0xcc, 0xbb, 0x3e, 0x5a, 0xfb, 0x60, 0xb1, 0x86, 0x3b, 0x52, 0xa1, 0x6c, 0xaa, 0x55, 0x29, 0x9d, 0x97, 0xb2, 0x87, 0x90, 0x61, 0xbe, 0xdc, 0xfc, 0xbc, 0x95, 0xcf, 0xcd, 0x37, 0x3f, 0x5b, 0xd1, 0x53, 0x39, 0x84, 0x3c, 0x41, 0xa2, 0x6d, 0x47, 0x14, 0x2a, 0x9e, 0x5d, 0x56, 0xf2, 0xd3, 0xab, 0x44, 0x11, 0x92, 0xd9, 0x23, 0x20, 0x2e, 0x89, 0xb4, 0x7c, 0xb8, 0x26, 0x77, 0x99, 0xe3, 0xa5, 0x67, 0x4a, 0xed, 0xde, 0xc5, 0x31, 0xfe, 0x18, 0x0d, 0x63, 0x8c, 0x80, 0xc0, 0xf7, 0x70, 0x07, ) def key_expansion(data): """ Generate key schedule @param {int[]} data 16/24/32-Byte cipher key @returns {int[]} 176/208/240-Byte expanded key """ data = data[:] # copy rcon_iteration = 1 key_size_bytes = len(data) expanded_key_size_bytes = (key_size_bytes // 4 + 7) * BLOCK_SIZE_BYTES while len(data) < expanded_key_size_bytes: temp = data[-4:] temp = key_schedule_core(temp, rcon_iteration) rcon_iteration += 1 data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) for _ in range(3): temp = data[-4:] data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) if key_size_bytes == 32: temp = data[-4:] temp = sub_bytes(temp) data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) for _ in range(3 if key_size_bytes == 32 else 2 if key_size_bytes == 24 else 0): temp = data[-4:] data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) data = data[:expanded_key_size_bytes] return data def iter_vector(iv): while True: yield iv iv = inc(iv) def sub_bytes(data): return [SBOX[x] for x in data] def sub_bytes_inv(data): return [SBOX_INV[x] for x in data] def rotate(data): return data[1:] + [data[0]] def key_schedule_core(data, rcon_iteration): data = rotate(data) data = sub_bytes(data) data[0] = data[0] ^ RCON[rcon_iteration] return data def xor(data1, data2): return [x ^ y for x, y in zip(data1, data2)] def iter_mix_columns(data, matrix): for i in (0, 4, 8, 12): for row in matrix: mixed = 0 for j in range(4): if data[i:i + 4][j] == 0 or row[j] == 0: mixed ^= 0 else: mixed ^= RIJNDAEL_EXP_TABLE[ (RIJNDAEL_LOG_TABLE[data[i + j]] + RIJNDAEL_LOG_TABLE[row[j]]) % 0xFF ] yield mixed def shift_rows(data): return [ data[((column + row) & 0b11) * 4 + row] for column in range(4) for row in range(4) ] def shift_rows_inv(data): return [ data[((column - row) & 0b11) * 4 + row] for column in range(4) for row in range(4) ] def shift_block(data): data_shifted = [] bit = 0 for n in data: if bit: n |= 0x100 bit = n & 1 n >>= 1 data_shifted.append(n) return data_shifted def inc(data): data = data[:] # copy for i in range(len(data) - 1, -1, -1): if data[i] == 255: data[i] = 0 else: data[i] = data[i] + 1 break return data def block_product(block_x, block_y): # NIST SP 800-38D, Algorithm 1 if len(block_x) != BLOCK_SIZE_BYTES or len(block_y) != BLOCK_SIZE_BYTES: raise ValueError( "Length of blocks need to be %d bytes" % BLOCK_SIZE_BYTES) block_r = [0xE1] + [0] * (BLOCK_SIZE_BYTES - 1) block_v = block_y[:] block_z = [0] * BLOCK_SIZE_BYTES for i in block_x: for bit in range(7, -1, -1): if i & (1 << bit): block_z = xor(block_z, block_v) do_xor = block_v[-1] & 1 block_v = shift_block(block_v) if do_xor: block_v = xor(block_v, block_r) return block_z def ghash(subkey, data): # NIST SP 800-38D, Algorithm 2 if len(data) % BLOCK_SIZE_BYTES: raise ValueError( "Length of data should be %d bytes" % 
BLOCK_SIZE_BYTES) last_y = [0] * BLOCK_SIZE_BYTES for i in range(0, len(data), BLOCK_SIZE_BYTES): block = data[i: i + BLOCK_SIZE_BYTES] last_y = block_product(xor(last_y, block), subkey) return last_y ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1746776695.0 gallery_dl-1.29.7/gallery_dl/archive.py0000644000175000017500000001720015007331167016540 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2024-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Download Archives""" import os import logging from . import util, formatter log = logging.getLogger("archive") def connect(path, prefix, format, table=None, mode=None, pragma=None, kwdict=None, cache_key=None): keygen = formatter.parse(prefix + format).format_map if isinstance(path, str) and path.startswith( ("postgres://", "postgresql://")): if mode == "memory": cls = DownloadArchivePostgresqlMemory else: cls = DownloadArchivePostgresql else: path = util.expand_path(path) if kwdict is not None and "{" in path: path = formatter.parse(path).format_map(kwdict) if mode == "memory": cls = DownloadArchiveMemory else: cls = DownloadArchive if kwdict is not None and table: table = formatter.parse(table).format_map(kwdict) return cls(path, keygen, table, pragma, cache_key) def sanitize(name): return '"' + name.replace('"', "_") + '"' class DownloadArchive(): _sqlite3 = None def __init__(self, path, keygen, table=None, pragma=None, cache_key=None): if self._sqlite3 is None: DownloadArchive._sqlite3 = __import__("sqlite3") try: con = self._sqlite3.connect( path, timeout=60, check_same_thread=False) except self._sqlite3.OperationalError: os.makedirs(os.path.dirname(path)) con = self._sqlite3.connect( path, timeout=60, check_same_thread=False) con.isolation_level = None self.keygen = keygen self.connection = con self.close = con.close self.cursor = cursor = con.cursor() self._cache_key = cache_key or "_archive_key" table = "archive" if table is None else sanitize(table) self._stmt_select = ( "SELECT 1 " "FROM " + table + " " "WHERE entry=? 
" "LIMIT 1") self._stmt_insert = ( "INSERT OR IGNORE INTO " + table + " " "(entry) VALUES (?)") if pragma: for stmt in pragma: cursor.execute("PRAGMA " + stmt) try: cursor.execute("CREATE TABLE IF NOT EXISTS " + table + " " "(entry TEXT PRIMARY KEY) WITHOUT ROWID") except self._sqlite3.OperationalError: # fallback for missing WITHOUT ROWID support (#553) cursor.execute("CREATE TABLE IF NOT EXISTS " + table + " " "(entry TEXT PRIMARY KEY)") def add(self, kwdict): """Add item described by 'kwdict' to archive""" key = kwdict.get(self._cache_key) or self.keygen(kwdict) self.cursor.execute(self._stmt_insert, (key,)) def check(self, kwdict): """Return True if the item described by 'kwdict' exists in archive""" key = kwdict[self._cache_key] = self.keygen(kwdict) self.cursor.execute(self._stmt_select, (key,)) return self.cursor.fetchone() def finalize(self): pass class DownloadArchiveMemory(DownloadArchive): def __init__(self, path, keygen, table=None, pragma=None, cache_key=None): DownloadArchive.__init__( self, path, keygen, table, pragma, cache_key) self.keys = set() def add(self, kwdict): self.keys.add( kwdict.get(self._cache_key) or self.keygen(kwdict)) def check(self, kwdict): key = kwdict[self._cache_key] = self.keygen(kwdict) if key in self.keys: return True self.cursor.execute(self._stmt_select, (key,)) return self.cursor.fetchone() def finalize(self): if not self.keys: return cursor = self.cursor with self.connection: try: cursor.execute("BEGIN") except self._sqlite3.OperationalError: pass stmt = self._stmt_insert if len(self.keys) < 100: for key in self.keys: cursor.execute(stmt, (key,)) else: cursor.executemany(stmt, ((key,) for key in self.keys)) class DownloadArchivePostgresql(): _psycopg = None def __init__(self, uri, keygen, table=None, pragma=None, cache_key=None): if self._psycopg is None: DownloadArchivePostgresql._psycopg = __import__("psycopg") self.connection = con = self._psycopg.connect(uri) self.cursor = cursor = con.cursor() self.close = con.close self.keygen = keygen self._cache_key = cache_key or "_archive_key" table = "archive" if table is None else sanitize(table) self._stmt_select = ( "SELECT true " "FROM " + table + " " "WHERE entry=%s " "LIMIT 1") self._stmt_insert = ( "INSERT INTO " + table + " (entry) " "VALUES (%s) " "ON CONFLICT DO NOTHING") try: cursor.execute("CREATE TABLE IF NOT EXISTS " + table + " " "(entry TEXT PRIMARY KEY)") con.commit() except Exception as exc: log.error("%s: %s when creating '%s' table: %s", con, exc.__class__.__name__, table, exc) con.rollback() raise def add(self, kwdict): key = kwdict.get(self._cache_key) or self.keygen(kwdict) try: self.cursor.execute(self._stmt_insert, (key,)) self.connection.commit() except Exception as exc: log.error("%s: %s when writing entry: %s", self.connection, exc.__class__.__name__, exc) self.connection.rollback() def check(self, kwdict): key = kwdict[self._cache_key] = self.keygen(kwdict) try: self.cursor.execute(self._stmt_select, (key,)) return self.cursor.fetchone() except Exception as exc: log.error("%s: %s when checking entry: %s", self.connection, exc.__class__.__name__, exc) self.connection.rollback() return False def finalize(self): pass class DownloadArchivePostgresqlMemory(DownloadArchivePostgresql): def __init__(self, path, keygen, table=None, pragma=None, cache_key=None): DownloadArchivePostgresql.__init__( self, path, keygen, table, pragma, cache_key) self.keys = set() def add(self, kwdict): self.keys.add( kwdict.get(self._cache_key) or self.keygen(kwdict)) def check(self, kwdict): key = 
kwdict[self._cache_key] = self.keygen(kwdict) if key in self.keys: return True try: self.cursor.execute(self._stmt_select, (key,)) return self.cursor.fetchone() except Exception as exc: log.error("%s: %s when checking entry: %s", self.connection, exc.__class__.__name__, exc) self.connection.rollback() return False def finalize(self): if not self.keys: return try: self.cursor.executemany( self._stmt_insert, ((key,) for key in self.keys)) self.connection.commit() except Exception as exc: log.error("%s: %s when writing entries: %s", self.connection, exc.__class__.__name__, exc) self.connection.rollback() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1743510442.0 gallery_dl-1.29.7/gallery_dl/cache.py0000644000175000017500000001451514772755652016211 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Decorators to keep function results in an in-memory and database cache""" import sqlite3 import pickle import time import os import functools from . import config, util class CacheDecorator(): """Simplified in-memory cache""" def __init__(self, func, keyarg): self.func = func self.cache = {} self.keyarg = keyarg def __get__(self, instance, cls): return functools.partial(self.__call__, instance) def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] try: value = self.cache[key] except KeyError: value = self.cache[key] = self.func(*args, **kwargs) return value def update(self, key, value): self.cache[key] = value def invalidate(self, key=""): try: del self.cache[key] except KeyError: pass class MemoryCacheDecorator(CacheDecorator): """In-memory cache""" def __init__(self, func, keyarg, maxage): CacheDecorator.__init__(self, func, keyarg) self.maxage = maxage def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] timestamp = int(time.time()) try: value, expires = self.cache[key] except KeyError: expires = 0 if expires <= timestamp: value = self.func(*args, **kwargs) expires = timestamp + self.maxage self.cache[key] = value, expires return value def update(self, key, value): self.cache[key] = value, int(time.time()) + self.maxage class DatabaseCacheDecorator(): """Database cache""" db = None _init = True def __init__(self, func, keyarg, maxage): self.key = "%s.%s" % (func.__module__, func.__name__) self.func = func self.cache = {} self.keyarg = keyarg self.maxage = maxage def __get__(self, obj, objtype): return functools.partial(self.__call__, obj) def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] timestamp = int(time.time()) # in-memory cache lookup try: value, expires = self.cache[key] if expires > timestamp: return value except KeyError: pass # database lookup fullkey = "%s-%s" % (self.key, key) with self.database() as db: cursor = db.cursor() try: cursor.execute("BEGIN EXCLUSIVE") except sqlite3.OperationalError: pass # Silently swallow exception - workaround for Python 3.6 cursor.execute( "SELECT value, expires FROM data WHERE key=? 
LIMIT 1", (fullkey,), ) result = cursor.fetchone() if result and result[1] > timestamp: value, expires = result value = pickle.loads(value) else: value = self.func(*args, **kwargs) expires = timestamp + self.maxage cursor.execute( "INSERT OR REPLACE INTO data VALUES (?,?,?)", (fullkey, pickle.dumps(value), expires), ) self.cache[key] = value, expires return value def update(self, key, value): expires = int(time.time()) + self.maxage self.cache[key] = value, expires with self.database() as db: db.execute( "INSERT OR REPLACE INTO data VALUES (?,?,?)", ("%s-%s" % (self.key, key), pickle.dumps(value), expires), ) def invalidate(self, key): try: del self.cache[key] except KeyError: pass with self.database() as db: db.execute( "DELETE FROM data WHERE key=?", ("%s-%s" % (self.key, key),), ) def database(self): if self._init: self.db.execute( "CREATE TABLE IF NOT EXISTS data " "(key TEXT PRIMARY KEY, value TEXT, expires INTEGER)" ) DatabaseCacheDecorator._init = False return self.db def memcache(maxage=None, keyarg=None): if maxage: def wrap(func): return MemoryCacheDecorator(func, keyarg, maxage) else: def wrap(func): return CacheDecorator(func, keyarg) return wrap def cache(maxage=3600, keyarg=None): def wrap(func): return DatabaseCacheDecorator(func, keyarg, maxage) return wrap def clear(module): """Delete database entries for 'module'""" db = DatabaseCacheDecorator.db if not db: return None rowcount = 0 cursor = db.cursor() try: if module == "ALL": cursor.execute("DELETE FROM data") else: cursor.execute( "DELETE FROM data " "WHERE key LIKE 'gallery_dl.extractor.' || ? || '.%'", (module.lower(),) ) except sqlite3.OperationalError: pass # database not initialized, cannot be modified, etc. else: rowcount = cursor.rowcount db.commit() if rowcount: cursor.execute("VACUUM") return rowcount def _path(): path = config.get(("cache",), "file", util.SENTINEL) if path is not util.SENTINEL: return util.expand_path(path) if util.WINDOWS: cachedir = os.environ.get("APPDATA", "~") else: cachedir = os.environ.get("XDG_CACHE_HOME", "~/.cache") cachedir = util.expand_path(os.path.join(cachedir, "gallery-dl")) os.makedirs(cachedir, exist_ok=True) return os.path.join(cachedir, "cache.sqlite3") def _init(): try: dbfile = _path() # restrict access permissions for new db files os.close(os.open(dbfile, os.O_CREAT | os.O_RDONLY, 0o600)) DatabaseCacheDecorator.db = sqlite3.connect( dbfile, timeout=60, check_same_thread=False) except (OSError, TypeError, sqlite3.OperationalError): global cache cache = memcache _init() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1746776695.0 gallery_dl-1.29.7/gallery_dl/config.py0000644000175000017500000002046515007331167016373 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Global configuration module""" import sys import os.path import logging from . 
import util log = logging.getLogger("config") # -------------------------------------------------------------------- # internals _config = {} _files = [] if util.WINDOWS: _default_configs = [ r"%APPDATA%\gallery-dl\config.json", r"%USERPROFILE%\gallery-dl\config.json", r"%USERPROFILE%\gallery-dl.conf", ] else: _default_configs = [ "/etc/gallery-dl.conf", "${XDG_CONFIG_HOME}/gallery-dl/config.json" if os.environ.get("XDG_CONFIG_HOME") else "${HOME}/.config/gallery-dl/config.json", "${HOME}/.gallery-dl.conf", ] if util.EXECUTABLE: # look for config file in PyInstaller executable directory (#682) _default_configs.append(os.path.join( os.path.dirname(sys.executable), "gallery-dl.conf", )) # -------------------------------------------------------------------- # public interface def initialize(): paths = list(map(util.expand_path, _default_configs)) for path in paths: if os.access(path, os.R_OK | os.W_OK): log.error("There is already a configuration file at '%s'", path) return 1 for path in paths: try: os.makedirs(os.path.dirname(path), exist_ok=True) with open(path, "x", encoding="utf-8") as fp: fp.write("""\ { "extractor": { }, "downloader": { }, "output": { }, "postprocessor": { } } """) break except OSError as exc: log.debug("%s: %s", exc.__class__.__name__, exc) else: log.error("Unable to create a new configuration file " "at any of the default paths") return 1 log.info("Created a basic configuration file at '%s'", path) return 0 def open_extern(): for path in _default_configs: path = util.expand_path(path) if os.access(path, os.R_OK | os.W_OK): break else: log.warning("Unable to find any writable configuration file") return 1 if util.WINDOWS: openers = ("explorer", "notepad") else: openers = ("xdg-open", "open") editor = os.environ.get("EDITOR") if editor: openers = (editor,) + openers import shutil for opener in openers: opener = shutil.which(opener) if opener: break else: log.warning("Unable to find a program to open '%s' with", path) return 1 log.info("Running '%s %s'", opener, path) retcode = util.Popen((opener, path)).wait() if not retcode: try: with open(path, encoding="utf-8") as fp: util.json_loads(fp.read()) except Exception as exc: log.warning("%s when parsing '%s': %s", exc.__class__.__name__, path, exc) return 2 return retcode def status(): from .output import stdout_write paths = [] for path in _default_configs: path = util.expand_path(path) try: with open(path, encoding="utf-8") as fp: util.json_loads(fp.read()) except FileNotFoundError: status = "Not Present" except OSError: status = "Inaccessible" except ValueError: status = "Invalid JSON" except Exception as exc: log.debug(exc) status = "Unknown" else: status = "OK" paths.append((path, status)) fmt = "{{:<{}}} : {{}}\n".format( max(len(p[0]) for p in paths)).format for path, status in paths: stdout_write(fmt(path, status)) def load(files=None, strict=False, loads=util.json_loads): """Load JSON configuration files""" for pathfmt in files or _default_configs: path = util.expand_path(pathfmt) try: with open(path, encoding="utf-8") as fp: conf = loads(fp.read()) except OSError as exc: if strict: log.error(exc) raise SystemExit(1) except Exception as exc: log.error("%s when loading '%s': %s", exc.__class__.__name__, path, exc) if strict: raise SystemExit(2) else: if not _config: _config.update(conf) else: util.combine_dict(_config, conf) _files.append(pathfmt) if "subconfigs" in conf: subconfigs = conf["subconfigs"] if subconfigs: if isinstance(subconfigs, str): subconfigs = (subconfigs,) load(subconfigs, strict, loads) def 
clear(): """Reset configuration to an empty state""" _config.clear() def get(path, key, default=None, conf=_config): """Get the value of property 'key' or a default value""" try: for p in path: conf = conf[p] return conf[key] except Exception: return default def interpolate(path, key, default=None, conf=_config): """Interpolate the value of 'key'""" if key in conf: return conf[key] try: for p in path: conf = conf[p] if key in conf: default = conf[key] except Exception: pass return default def interpolate_common(common, paths, key, default=None, conf=_config): """Interpolate the value of 'key' using multiple 'paths' along a 'common' ancestor """ if key in conf: return conf[key] # follow the common path try: for p in common: conf = conf[p] if key in conf: default = conf[key] except Exception: return default # try all paths until a value is found value = util.SENTINEL for path in paths: c = conf try: for p in path: c = c[p] if key in c: value = c[key] except Exception: pass if value is not util.SENTINEL: return value return default def accumulate(path, key, conf=_config): """Accumulate the values of 'key' along 'path'""" result = [] try: if key in conf: value = conf[key] if value: if isinstance(value, list): result.extend(value) else: result.append(value) for p in path: conf = conf[p] if key in conf: value = conf[key] if value: if isinstance(value, list): result[:0] = value else: result.insert(0, value) except Exception: pass return result def set(path, key, value, conf=_config): """Set the value of property 'key' for this session""" for p in path: try: conf = conf[p] except KeyError: conf[p] = conf = {} conf[key] = value def setdefault(path, key, value, conf=_config): """Set the value of property 'key' if it doesn't exist""" for p in path: try: conf = conf[p] except KeyError: conf[p] = conf = {} return conf.setdefault(key, value) def unset(path, key, conf=_config): """Unset the value of property 'key'""" try: for p in path: conf = conf[p] del conf[key] except Exception: pass class apply(): """Context Manager: apply a collection of key-value pairs""" def __init__(self, kvlist): self.original = [] self.kvlist = kvlist def __enter__(self): for path, key, value in self.kvlist: self.original.append((path, key, get(path, key, util.SENTINEL))) set(path, key, value) def __exit__(self, exc_type, exc_value, traceback): self.original.reverse() for path, key, value in self.original: if value is util.SENTINEL: unset(path, key) else: set(path, key, value) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1747857760.0 gallery_dl-1.29.7/gallery_dl/cookies.py0000644000175000017500000011527615013430540016557 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. # Adapted from yt-dlp's cookies module. # https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/cookies.py import binascii import ctypes import logging import os import shutil import sqlite3 import struct import subprocess import sys import tempfile from hashlib import pbkdf2_hmac from http.cookiejar import Cookie from . 
import aes, text, util SUPPORTED_BROWSERS_CHROMIUM = { "brave", "chrome", "chromium", "edge", "opera", "thorium", "vivaldi"} SUPPORTED_BROWSERS_FIREFOX = {"firefox", "zen"} SUPPORTED_BROWSERS = \ SUPPORTED_BROWSERS_CHROMIUM | SUPPORTED_BROWSERS_FIREFOX | {"safari"} logger = logging.getLogger("cookies") def load_cookies(browser_specification): browser_name, profile, keyring, container, domain = \ _parse_browser_specification(*browser_specification) if browser_name in SUPPORTED_BROWSERS_FIREFOX: return load_cookies_firefox(browser_name, profile, container, domain) elif browser_name == "safari": return load_cookies_safari(profile, domain) elif browser_name in SUPPORTED_BROWSERS_CHROMIUM: return load_cookies_chromium(browser_name, profile, keyring, domain) else: raise ValueError("unknown browser '{}'".format(browser_name)) def load_cookies_firefox(browser_name, profile=None, container=None, domain=None): path, container_id = _firefox_cookies_database(browser_name, profile, container) sql = ("SELECT name, value, host, path, isSecure, expiry " "FROM moz_cookies") conditions = [] parameters = [] if container_id is False: conditions.append("NOT INSTR(originAttributes,'userContextId=')") elif container_id: uid = "%userContextId={}".format(container_id) conditions.append("originAttributes LIKE ? OR originAttributes LIKE ?") parameters += (uid, uid + "&%") if domain: if domain[0] == ".": conditions.append("host == ? OR host LIKE ?") parameters += (domain[1:], "%" + domain) else: conditions.append("host == ? OR host == ?") parameters += (domain, "." + domain) if conditions: sql = "{} WHERE ( {} )".format(sql, " ) AND ( ".join(conditions)) with DatabaseConnection(path) as db: cookies = [ Cookie( 0, name, value, None, False, domain, True if domain else False, domain[0] == "." if domain else False, path, True if path else False, secure, expires, False, None, None, {}, ) for name, value, domain, path, secure, expires in db.execute( sql, parameters) ] _log_info("Extracted %s cookies from %s", len(cookies), browser_name.capitalize()) return cookies def load_cookies_safari(profile=None, domain=None): """Ref.: https://github.com/libyal/dtformats/blob /main/documentation/Safari%20Cookies.asciidoc - This data appears to be out of date but the important parts of the database structure is the same - There are a few bytes here and there which are skipped during parsing """ with _safari_cookies_database() as fp: data = fp.read() page_sizes, body_start = _safari_parse_cookies_header(data) p = DataParser(data[body_start:]) cookies = [] for page_size in page_sizes: _safari_parse_cookies_page(p.read_bytes(page_size), cookies) _log_info("Extracted %s cookies from Safari", len(cookies)) return cookies def load_cookies_chromium(browser_name, profile=None, keyring=None, domain=None): config = _chromium_browser_settings(browser_name) path = _chromium_cookies_database(profile, config) _log_debug("Extracting cookies from %s", path) if domain: if domain[0] == ".": condition = " WHERE host_key == ? OR host_key LIKE ?" parameters = (domain[1:], "%" + domain) else: condition = " WHERE host_key == ? OR host_key == ?" parameters = (domain, "." 
+ domain) else: condition = "" parameters = () with DatabaseConnection(path) as db: db.text_factory = bytes cursor = db.cursor() try: meta_version = int(cursor.execute( "SELECT value FROM meta WHERE key = 'version'").fetchone()[0]) except Exception as exc: _log_warning("Failed to get cookie database meta version (%s: %s)", exc.__class__.__name__, exc) meta_version = 0 try: rows = cursor.execute( "SELECT host_key, name, value, encrypted_value, path, " "expires_utc, is_secure FROM cookies" + condition, parameters) except sqlite3.OperationalError: rows = cursor.execute( "SELECT host_key, name, value, encrypted_value, path, " "expires_utc, secure FROM cookies" + condition, parameters) failed_cookies = 0 unencrypted_cookies = 0 decryptor = _chromium_cookie_decryptor( config["directory"], config["keyring"], keyring, meta_version) cookies = [] for domain, name, value, enc_value, path, expires, secure in rows: if not value and enc_value: # encrypted value = decryptor.decrypt(enc_value) if value is None: failed_cookies += 1 continue else: value = value.decode() unencrypted_cookies += 1 if expires: # https://stackoverflow.com/a/43520042 expires = int(expires) // 1000000 - 11644473600 else: expires = None domain = domain.decode() path = path.decode() name = name.decode() cookies.append(Cookie( 0, name, value, None, False, domain, True if domain else False, domain[0] == "." if domain else False, path, True if path else False, secure, expires, False, None, None, {}, )) if failed_cookies > 0: failed_message = " ({} could not be decrypted)".format(failed_cookies) else: failed_message = "" _log_info("Extracted %s cookies from %s%s", len(cookies), browser_name.capitalize(), failed_message) counts = decryptor.cookie_counts counts["unencrypted"] = unencrypted_cookies _log_debug("version breakdown: %s", counts) return cookies # -------------------------------------------------------------------- # firefox def _firefox_cookies_database(browser_name, profile=None, container=None): if not profile: search_root = _firefox_browser_directory(browser_name) elif _is_path(profile): search_root = profile else: search_root = os.path.join( _firefox_browser_directory(browser_name), profile) path = _find_most_recently_used_file(search_root, "cookies.sqlite") if path is None: raise FileNotFoundError("Unable to find Firefox cookies database in " "{}".format(search_root)) _log_debug("Extracting cookies from %s", path) if not container or container == "none": container_id = False _log_debug("Only loading cookies not belonging to any container") elif container == "all": container_id = None else: containers_path = os.path.join( os.path.dirname(path), "containers.json") try: with open(containers_path) as fp: identities = util.json_loads(fp.read())["identities"] except OSError: _log_error("Unable to read Firefox container database at '%s'", containers_path) raise except KeyError: identities = () for context in identities: if container == context.get("name") or container == text.extr( context.get("l10nID", ""), "userContext", ".label"): container_id = context["userContextId"] break else: raise ValueError("Unable to find Firefox container '{}'".format( container)) _log_debug("Only loading cookies from container '%s' (ID %s)", container, container_id) return path, container_id def _firefox_browser_directory(browser_name): join = os.path.join if sys.platform in ("win32", "cygwin"): appdata = os.path.expandvars("%APPDATA%") return { "firefox": join(appdata, R"Mozilla\Firefox\Profiles"), "zen" : join(appdata, R"zen\Profiles") 
}[browser_name] elif sys.platform == "darwin": appdata = os.path.expanduser("~/Library/Application Support") return { "firefox": join(appdata, R"Firefox/Profiles"), "zen" : join(appdata, R"zen/Profiles") }[browser_name] else: home = os.path.expanduser("~") return { "firefox": join(home, R".mozilla/firefox"), "zen" : join(home, R".zen") }[browser_name] # -------------------------------------------------------------------- # safari def _safari_cookies_database(): try: path = os.path.expanduser("~/Library/Cookies/Cookies.binarycookies") return open(path, "rb") except FileNotFoundError: _log_debug("Trying secondary cookie location") path = os.path.expanduser("~/Library/Containers/com.apple.Safari/Data" "/Library/Cookies/Cookies.binarycookies") return open(path, "rb") def _safari_parse_cookies_header(data): p = DataParser(data) p.expect_bytes(b"cook", "database signature") number_of_pages = p.read_uint(big_endian=True) page_sizes = [p.read_uint(big_endian=True) for _ in range(number_of_pages)] return page_sizes, p.cursor def _safari_parse_cookies_page(data, cookies, domain=None): p = DataParser(data) p.expect_bytes(b"\x00\x00\x01\x00", "page signature") number_of_cookies = p.read_uint() record_offsets = [p.read_uint() for _ in range(number_of_cookies)] if number_of_cookies == 0: _log_debug("Cookies page of size %s has no cookies", len(data)) return p.skip_to(record_offsets[0], "unknown page header field") for i, record_offset in enumerate(record_offsets): p.skip_to(record_offset, "space between records") record_length = _safari_parse_cookies_record( data[record_offset:], cookies, domain) p.read_bytes(record_length) p.skip_to_end("space in between pages") def _safari_parse_cookies_record(data, cookies, host=None): p = DataParser(data) record_size = p.read_uint() p.skip(4, "unknown record field 1") flags = p.read_uint() is_secure = True if (flags & 0x0001) else False p.skip(4, "unknown record field 2") domain_offset = p.read_uint() name_offset = p.read_uint() path_offset = p.read_uint() value_offset = p.read_uint() p.skip(8, "unknown record field 3") expiration_date = _mac_absolute_time_to_posix(p.read_double()) _creation_date = _mac_absolute_time_to_posix(p.read_double()) # noqa: F841 try: p.skip_to(domain_offset) domain = p.read_cstring() if host: if host[0] == ".": if host[1:] != domain and not domain.endswith(host): return record_size else: if host != domain and ("." + host) != domain: return record_size p.skip_to(name_offset) name = p.read_cstring() p.skip_to(path_offset) path = p.read_cstring() p.skip_to(value_offset) value = p.read_cstring() except UnicodeDecodeError: _log_warning("Failed to parse Safari cookie") return record_size p.skip_to(record_size, "space at the end of the record") cookies.append(Cookie( 0, name, value, None, False, domain, True if domain else False, domain[0] == "." 
if domain else False, path, True if path else False, is_secure, expiration_date, False, None, None, {}, )) return record_size # -------------------------------------------------------------------- # chromium def _chromium_cookies_database(profile, config): if profile is None: search_root = config["directory"] elif _is_path(profile): search_root = profile config["directory"] = (os.path.dirname(profile) if config["profiles"] else profile) elif config["profiles"]: search_root = os.path.join(config["directory"], profile) else: _log_warning("%s does not support profiles", config["browser"]) search_root = config["directory"] path = _find_most_recently_used_file(search_root, "Cookies") if path is None: raise FileNotFoundError("Unable to find {} cookies database in " "'{}'".format(config["browser"], search_root)) return path def _chromium_browser_settings(browser_name): # https://chromium.googlesource.com/chromium # /src/+/HEAD/docs/user_data_dir.md join = os.path.join if sys.platform in ("win32", "cygwin"): appdata_local = os.path.expandvars("%LOCALAPPDATA%") appdata_roaming = os.path.expandvars("%APPDATA%") browser_dir = { "brave" : join(appdata_local, R"BraveSoftware\Brave-Browser\User Data"), "chrome" : join(appdata_local, R"Google\Chrome\User Data"), "chromium": join(appdata_local, R"Chromium\User Data"), "edge" : join(appdata_local, R"Microsoft\Edge\User Data"), "opera" : join(appdata_roaming, R"Opera Software\Opera Stable"), "thorium" : join(appdata_local, R"Thorium\User Data"), "vivaldi" : join(appdata_local, R"Vivaldi\User Data"), }[browser_name] elif sys.platform == "darwin": appdata = os.path.expanduser("~/Library/Application Support") browser_dir = { "brave" : join(appdata, "BraveSoftware/Brave-Browser"), "chrome" : join(appdata, "Google/Chrome"), "chromium": join(appdata, "Chromium"), "edge" : join(appdata, "Microsoft Edge"), "opera" : join(appdata, "com.operasoftware.Opera"), "thorium" : join(appdata, "Thorium"), "vivaldi" : join(appdata, "Vivaldi"), }[browser_name] else: config = (os.environ.get("XDG_CONFIG_HOME") or os.path.expanduser("~/.config")) browser_dir = { "brave" : join(config, "BraveSoftware/Brave-Browser"), "chrome" : join(config, "google-chrome"), "chromium": join(config, "chromium"), "edge" : join(config, "microsoft-edge"), "opera" : join(config, "opera"), "thorium" : join(config, "Thorium"), "vivaldi" : join(config, "vivaldi"), }[browser_name] # Linux keyring names can be determined by snooping on dbus # while opening the browser in KDE: # dbus-monitor "interface="org.kde.KWallet"" "type=method_return" keyring_name = { "brave" : "Brave", "chrome" : "Chrome", "chromium": "Chromium", "edge" : "Microsoft Edge" if sys.platform == "darwin" else "Chromium", "opera" : "Opera" if sys.platform == "darwin" else "Chromium", "thorium" : "Thorium", "vivaldi" : "Vivaldi" if sys.platform == "darwin" else "Chrome", }[browser_name] browsers_without_profiles = {"opera"} return { "browser" : browser_name, "directory": browser_dir, "keyring" : keyring_name, "profiles" : browser_name not in browsers_without_profiles } def _chromium_cookie_decryptor( browser_root, browser_keyring_name, keyring=None, meta_version=0): if sys.platform in ("win32", "cygwin"): return WindowsChromiumCookieDecryptor( browser_root, meta_version) elif sys.platform == "darwin": return MacChromiumCookieDecryptor( browser_keyring_name, meta_version) else: return LinuxChromiumCookieDecryptor( browser_keyring_name, keyring, meta_version) class ChromiumCookieDecryptor: """ Overview: Linux: - cookies are either v10 or v11 
- v10: AES-CBC encrypted with a fixed key - v11: AES-CBC encrypted with an OS protected key (keyring) - v11 keys can be stored in various places depending on the activate desktop environment [2] Mac: - cookies are either v10 or not v10 - v10: AES-CBC encrypted with an OS protected key (keyring) and more key derivation iterations than linux - not v10: "old data" stored as plaintext Windows: - cookies are either v10 or not v10 - v10: AES-GCM encrypted with a key which is encrypted with DPAPI - not v10: encrypted with DPAPI Sources: - [1] https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/ - [2] https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/key_storage_linux.cc - KeyStorageLinux::CreateService """ def decrypt(self, encrypted_value): raise NotImplementedError("Must be implemented by sub classes") @property def cookie_counts(self): raise NotImplementedError("Must be implemented by sub classes") class LinuxChromiumCookieDecryptor(ChromiumCookieDecryptor): def __init__(self, browser_keyring_name, keyring=None, meta_version=0): password = _get_linux_keyring_password(browser_keyring_name, keyring) self._empty_key = self.derive_key(b"") self._v10_key = self.derive_key(b"peanuts") self._v11_key = None if password is None else self.derive_key(password) self._cookie_counts = {"v10": 0, "v11": 0, "other": 0} self._offset = (32 if meta_version >= 24 else 0) @staticmethod def derive_key(password): # values from # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_linux.cc return pbkdf2_sha1(password, salt=b"saltysalt", iterations=1, key_length=16) @property def cookie_counts(self): return self._cookie_counts def decrypt(self, encrypted_value): version = encrypted_value[:3] ciphertext = encrypted_value[3:] if version == b"v10": self._cookie_counts["v10"] += 1 value = _decrypt_aes_cbc(ciphertext, self._v10_key, self._offset) elif version == b"v11": self._cookie_counts["v11"] += 1 if self._v11_key is None: _log_warning("Unable to decrypt v11 cookies: no key found") return None value = _decrypt_aes_cbc(ciphertext, self._v11_key, self._offset) else: self._cookie_counts["other"] += 1 return None if value is None: value = _decrypt_aes_cbc(ciphertext, self._empty_key, self._offset) if value is None: _log_warning("Failed to decrypt cookie (AES-CBC)") return value class MacChromiumCookieDecryptor(ChromiumCookieDecryptor): def __init__(self, browser_keyring_name, meta_version=0): password = _get_mac_keyring_password(browser_keyring_name) self._v10_key = None if password is None else self.derive_key(password) self._cookie_counts = {"v10": 0, "other": 0} self._offset = (32 if meta_version >= 24 else 0) @staticmethod def derive_key(password): # values from # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_mac.mm return pbkdf2_sha1(password, salt=b"saltysalt", iterations=1003, key_length=16) @property def cookie_counts(self): return self._cookie_counts def decrypt(self, encrypted_value): version = encrypted_value[:3] ciphertext = encrypted_value[3:] if version == b"v10": self._cookie_counts["v10"] += 1 if self._v10_key is None: _log_warning("Unable to decrypt v10 cookies: no key found") return None return _decrypt_aes_cbc(ciphertext, self._v10_key, self._offset) else: self._cookie_counts["other"] += 1 # other prefixes are considered "old data", # which were stored as plaintext # https://chromium.googlesource.com/chromium/src/+/refs/heads # 
/main/components/os_crypt/os_crypt_mac.mm return encrypted_value class WindowsChromiumCookieDecryptor(ChromiumCookieDecryptor): def __init__(self, browser_root, meta_version=0): self._v10_key = _get_windows_v10_key(browser_root) self._cookie_counts = {"v10": 0, "other": 0} self._offset = (32 if meta_version >= 24 else 0) @property def cookie_counts(self): return self._cookie_counts def decrypt(self, encrypted_value): version = encrypted_value[:3] ciphertext = encrypted_value[3:] if version == b"v10": self._cookie_counts["v10"] += 1 if self._v10_key is None: _log_warning("Unable to decrypt v10 cookies: no key found") return None # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_win.cc # kNonceLength nonce_length = 96 // 8 # boringssl # EVP_AEAD_AES_GCM_TAG_LEN authentication_tag_length = 16 raw_ciphertext = ciphertext nonce = raw_ciphertext[:nonce_length] ciphertext = raw_ciphertext[ nonce_length:-authentication_tag_length] authentication_tag = raw_ciphertext[-authentication_tag_length:] return _decrypt_aes_gcm( ciphertext, self._v10_key, nonce, authentication_tag, self._offset) else: self._cookie_counts["other"] += 1 # any other prefix means the data is DPAPI encrypted # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_win.cc return _decrypt_windows_dpapi(encrypted_value).decode() # -------------------------------------------------------------------- # keyring def _choose_linux_keyring(): """ https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/key_storage_util_linux.cc SelectBackend """ desktop_environment = _get_linux_desktop_environment(os.environ) _log_debug("Detected desktop environment: %s", desktop_environment) if desktop_environment == DE_KDE: return KEYRING_KWALLET if desktop_environment == DE_OTHER: return KEYRING_BASICTEXT return KEYRING_GNOMEKEYRING def _get_kwallet_network_wallet(): """ The name of the wallet used to store network passwords. https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/kwallet_dbus.cc KWalletDBus::NetworkWallet which does a dbus call to the following function: https://api.kde.org/frameworks/kwallet/html/classKWallet_1_1Wallet.html Wallet::NetworkWallet """ default_wallet = "kdewallet" try: proc, stdout = Popen_communicate( "dbus-send", "--session", "--print-reply=literal", "--dest=org.kde.kwalletd5", "/modules/kwalletd5", "org.kde.KWallet.networkWallet" ) if proc.returncode != 0: _log_warning("Failed to read NetworkWallet") return default_wallet else: network_wallet = stdout.decode().strip() _log_debug("NetworkWallet = '%s'", network_wallet) return network_wallet except Exception as exc: _log_warning("Error while obtaining NetworkWallet (%s: %s)", exc.__class__.__name__, exc) return default_wallet def _get_kwallet_password(browser_keyring_name): _log_debug("Using kwallet-query to obtain password from kwallet") if shutil.which("kwallet-query") is None: _log_error( "kwallet-query command not found. KWallet and kwallet-query " "must be installed to read from KWallet. kwallet-query should be " "included in the kwallet package for your distribution") return b"" network_wallet = _get_kwallet_network_wallet() try: proc, stdout = Popen_communicate( "kwallet-query", "--read-password", browser_keyring_name + " Safe Storage", "--folder", browser_keyring_name + " Keys", network_wallet, ) if proc.returncode != 0: _log_error("kwallet-query failed with return code {}. 
" "Please consult the kwallet-query man page " "for details".format(proc.returncode)) return b"" if stdout.lower().startswith(b"failed to read"): _log_debug("Failed to read password from kwallet. " "Using empty string instead") # This sometimes occurs in KDE because chrome does not check # hasEntry and instead just tries to read the value (which # kwallet returns "") whereas kwallet-query checks hasEntry. # To verify this: # dbus-monitor "interface="org.kde.KWallet"" "type=method_return" # while starting chrome. # This may be a bug, as the intended behaviour is to generate a # random password and store it, but that doesn't matter here. return b"" else: if stdout[-1:] == b"\n": stdout = stdout[:-1] return stdout except Exception as exc: _log_warning("Error when running kwallet-query (%s: %s)", exc.__class__.__name__, exc) return b"" def _get_gnome_keyring_password(browser_keyring_name): try: import secretstorage except ImportError: _log_error("'secretstorage' Python package not available") return b"" # Gnome keyring does not seem to organise keys in the same way as KWallet, # using `dbus-monitor` during startup, it can be observed that chromium # lists all keys and presumably searches for its key in the list. # It appears that we must do the same. # https://github.com/jaraco/keyring/issues/556 con = secretstorage.dbus_init() try: col = secretstorage.get_default_collection(con) label = browser_keyring_name + " Safe Storage" for item in col.get_all_items(): if item.get_label() == label: return item.get_secret() else: _log_error("Failed to read from GNOME keyring") return b"" finally: con.close() def _get_linux_keyring_password(browser_keyring_name, keyring): # Note: chrome/chromium can be run with the following flags # to determine which keyring backend it has chosen to use # - chromium --enable-logging=stderr --v=1 2>&1 | grep key_storage_ # # Chromium supports --password-store= # so the automatic detection will not be sufficient in all cases. 
if not keyring: keyring = _choose_linux_keyring() _log_debug("Chosen keyring: %s", keyring) if keyring == KEYRING_KWALLET: return _get_kwallet_password(browser_keyring_name) elif keyring == KEYRING_GNOMEKEYRING: return _get_gnome_keyring_password(browser_keyring_name) elif keyring == KEYRING_BASICTEXT: # when basic text is chosen, all cookies are stored as v10 # so no keyring password is required return None assert False, "Unknown keyring " + keyring def _get_mac_keyring_password(browser_keyring_name): _log_debug("Using find-generic-password to obtain " "password from OSX keychain") try: proc, stdout = Popen_communicate( "security", "find-generic-password", "-w", # write password to stdout "-a", browser_keyring_name, # match "account" "-s", browser_keyring_name + " Safe Storage", # match "service" ) if stdout[-1:] == b"\n": stdout = stdout[:-1] return stdout except Exception as exc: _log_warning("Error when using find-generic-password (%s: %s)", exc.__class__.__name__, exc) return None def _get_windows_v10_key(browser_root): path = _find_most_recently_used_file(browser_root, "Local State") if path is None: _log_error("Unable to find Local State file") return None _log_debug("Found Local State file at '%s'", path) with open(path, encoding="utf-8") as fp: data = util.json_loads(fp.read()) try: base64_key = data["os_crypt"]["encrypted_key"] except KeyError: _log_error("Unable to find encrypted key in Local State") return None encrypted_key = binascii.a2b_base64(base64_key) prefix = b"DPAPI" if not encrypted_key.startswith(prefix): _log_error("Invalid Local State key") return None return _decrypt_windows_dpapi(encrypted_key[len(prefix):]) # -------------------------------------------------------------------- # utility class ParserError(Exception): pass class DataParser: def __init__(self, data): self.cursor = 0 self._data = data def read_bytes(self, num_bytes): if num_bytes < 0: raise ParserError("invalid read of {} bytes".format(num_bytes)) end = self.cursor + num_bytes if end > len(self._data): raise ParserError("reached end of input") data = self._data[self.cursor:end] self.cursor = end return data def expect_bytes(self, expected_value, message): value = self.read_bytes(len(expected_value)) if value != expected_value: raise ParserError("unexpected value: {} != {} ({})".format( value, expected_value, message)) def read_uint(self, big_endian=False): data_format = ">I" if big_endian else " 0: _log_debug("Skipping {} bytes ({}): {!r}".format( num_bytes, description, self.read_bytes(num_bytes))) elif num_bytes < 0: raise ParserError("Invalid skip of {} bytes".format(num_bytes)) def skip_to(self, offset, description="unknown"): self.skip(offset - self.cursor, description) def skip_to_end(self, description="unknown"): self.skip_to(len(self._data), description) class DatabaseConnection(): def __init__(self, path): self.path = path self.database = None self.directory = None def __enter__(self): try: # https://www.sqlite.org/uri.html#the_uri_path path = self.path.replace("?", "%3f").replace("#", "%23") if util.WINDOWS: path = "/" + os.path.abspath(path) uri = "file:{}?mode=ro&immutable=1".format(path) self.database = sqlite3.connect( uri, uri=True, isolation_level=None, check_same_thread=False) return self.database except Exception as exc: _log_debug("Falling back to temporary database copy (%s: %s)", exc.__class__.__name__, exc) try: self.directory = tempfile.TemporaryDirectory(prefix="gallery-dl-") path_copy = os.path.join(self.directory.name, "copy.sqlite") shutil.copyfile(self.path, path_copy) 
self.database = sqlite3.connect( path_copy, isolation_level=None, check_same_thread=False) return self.database except BaseException: if self.directory: self.directory.cleanup() raise def __exit__(self, exc_type, exc_value, traceback): self.database.close() if self.directory: self.directory.cleanup() def Popen_communicate(*args): proc = util.Popen( args, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL) try: stdout, stderr = proc.communicate() except BaseException: # Including KeyboardInterrupt proc.kill() proc.wait() raise return proc, stdout """ https://chromium.googlesource.com/chromium/src/+/refs/heads /main/base/nix/xdg_util.h - DesktopEnvironment """ DE_OTHER = "other" DE_CINNAMON = "cinnamon" DE_GNOME = "gnome" DE_KDE = "kde" DE_PANTHEON = "pantheon" DE_UNITY = "unity" DE_XFCE = "xfce" """ https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/key_storage_util_linux.h - SelectedLinuxBackend """ KEYRING_KWALLET = "kwallet" KEYRING_GNOMEKEYRING = "gnomekeyring" KEYRING_BASICTEXT = "basictext" SUPPORTED_KEYRINGS = {"kwallet", "gnomekeyring", "basictext"} def _get_linux_desktop_environment(env): """ Ref: https://chromium.googlesource.com/chromium/src/+/refs/heads /main/base/nix/xdg_util.cc - GetDesktopEnvironment """ xdg_current_desktop = env.get("XDG_CURRENT_DESKTOP") desktop_session = env.get("DESKTOP_SESSION") if xdg_current_desktop: xdg_current_desktop = (xdg_current_desktop.partition(":")[0] .strip().lower()) if xdg_current_desktop == "unity": if desktop_session and "gnome-fallback" in desktop_session: return DE_GNOME else: return DE_UNITY elif xdg_current_desktop == "gnome": return DE_GNOME elif xdg_current_desktop == "x-cinnamon": return DE_CINNAMON elif xdg_current_desktop == "kde": return DE_KDE elif xdg_current_desktop == "pantheon": return DE_PANTHEON elif xdg_current_desktop == "xfce": return DE_XFCE if desktop_session: if desktop_session in ("mate", "gnome"): return DE_GNOME if "kde" in desktop_session: return DE_KDE if "xfce" in desktop_session: return DE_XFCE if "GNOME_DESKTOP_SESSION_ID" in env: return DE_GNOME if "KDE_FULL_SESSION" in env: return DE_KDE return DE_OTHER def _mac_absolute_time_to_posix(timestamp): # 978307200 is timestamp of 2001-01-01 00:00:00 return 978307200 + int(timestamp) def pbkdf2_sha1(password, salt, iterations, key_length): return pbkdf2_hmac("sha1", password, salt, iterations, key_length) def _decrypt_aes_cbc(ciphertext, key, offset=0, initialization_vector=b" " * 16): plaintext = aes.unpad_pkcs7(aes.aes_cbc_decrypt_bytes( ciphertext, key, initialization_vector)) if offset: plaintext = plaintext[offset:] try: return plaintext.decode() except UnicodeDecodeError: return None def _decrypt_aes_gcm(ciphertext, key, nonce, authentication_tag, offset=0): try: plaintext = aes.aes_gcm_decrypt_and_verify_bytes( ciphertext, key, authentication_tag, nonce) if offset: plaintext = plaintext[offset:] return plaintext.decode() except UnicodeDecodeError: _log_warning("Failed to decrypt cookie (AES-GCM Unicode)") except ValueError: _log_warning("Failed to decrypt cookie (AES-GCM MAC)") return None def _decrypt_windows_dpapi(ciphertext): """ References: - https://docs.microsoft.com/en-us/windows /win32/api/dpapi/nf-dpapi-cryptunprotectdata """ from ctypes.wintypes import DWORD class DATA_BLOB(ctypes.Structure): _fields_ = [("cbData", DWORD), ("pbData", ctypes.POINTER(ctypes.c_char))] buffer = ctypes.create_string_buffer(ciphertext) blob_in = DATA_BLOB(ctypes.sizeof(buffer), buffer) blob_out = DATA_BLOB() ret = 
ctypes.windll.crypt32.CryptUnprotectData( ctypes.byref(blob_in), # pDataIn None, # ppszDataDescr: human readable description of pDataIn None, # pOptionalEntropy: salt? None, # pvReserved: must be NULL None, # pPromptStruct: information about prompts to display 0, # dwFlags ctypes.byref(blob_out) # pDataOut ) if not ret: _log_warning("Failed to decrypt cookie (DPAPI)") return None result = ctypes.string_at(blob_out.pbData, blob_out.cbData) ctypes.windll.kernel32.LocalFree(blob_out.pbData) return result def _find_most_recently_used_file(root, filename): # if the provided root points to an exact profile path # check if it contains the wanted filename first_choice = os.path.join(root, filename) if os.path.exists(first_choice): return first_choice # if there are multiple browser profiles, take the most recently used one paths = [] for curr_root, dirs, files in os.walk(root): for file in files: if file == filename: paths.append(os.path.join(curr_root, file)) if not paths: return None return max(paths, key=lambda path: os.lstat(path).st_mtime) def _is_path(value): return os.path.sep in value def _parse_browser_specification( browser, profile=None, keyring=None, container=None, domain=None): browser = browser.lower() if browser not in SUPPORTED_BROWSERS: raise ValueError("Unsupported browser '{}'".format(browser)) if keyring and keyring not in SUPPORTED_KEYRINGS: raise ValueError("Unsupported keyring '{}'".format(keyring)) if profile and _is_path(profile): profile = os.path.expanduser(profile) return browser, profile, keyring, container, domain _log_cache = set() _log_debug = logger.debug _log_info = logger.info def _log_warning(msg, *args): if msg not in _log_cache: _log_cache.add(msg) logger.warning(msg, *args) def _log_error(msg, *args): if msg not in _log_cache: _log_cache.add(msg) logger.error(msg, *args) ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1747991096.424338 gallery_dl-1.29.7/gallery_dl/downloader/0000755000175000017500000000000015014035070016673 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1743510442.0 gallery_dl-1.29.7/gallery_dl/downloader/__init__.py0000644000175000017500000000176414772755652021045 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
"""Downloader modules""" modules = [ "http", "text", "ytdl", ] def find(scheme): """Return downloader class suitable for handling the given scheme""" try: return _cache[scheme] except KeyError: pass cls = None if scheme == "https": scheme = "http" if scheme in modules: # prevent unwanted imports try: module = __import__(scheme, globals(), None, (), 1) except ImportError: pass else: cls = module.__downloader__ if scheme == "http": _cache["http"] = _cache["https"] = cls else: _cache[scheme] = cls return cls # -------------------------------------------------------------------- # internals _cache = {} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1746776695.0 gallery_dl-1.29.7/gallery_dl/downloader/common.py0000644000175000017500000000627015007331167020552 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Common classes and constants used by downloader modules.""" import os from .. import config, util _config = config._config class DownloaderBase(): """Base class for downloaders""" scheme = "" def __init__(self, job): extractor = job.extractor self.log = job.get_logger("downloader." + self.scheme) opts = self._extractor_config(extractor) if opts: self.opts = opts self.config = self.config_opts self.out = job.out self.session = extractor.session self.part = self.config("part", True) self.partdir = self.config("part-directory") if self.partdir: self.partdir = util.expand_path(self.partdir) os.makedirs(self.partdir, exist_ok=True) proxies = self.config("proxy", util.SENTINEL) if proxies is util.SENTINEL: self.proxies = extractor._proxies else: self.proxies = util.build_proxy_map(proxies, self.log) def config(self, key, default=None): """Interpolate downloader config value for 'key'""" return config.interpolate(("downloader", self.scheme), key, default) def config_opts(self, key, default=None, conf=_config): if key in conf: return conf[key] value = self.opts.get(key, util.SENTINEL) if value is not util.SENTINEL: return value return config.interpolate(("downloader", self.scheme), key, default) def _extractor_config(self, extractor): path = extractor._cfgpath if not isinstance(path, list): return self._extractor_opts(path[1], path[2]) opts = {} for cat, sub in reversed(path): popts = self._extractor_opts(cat, sub) if popts: opts.update(popts) return opts def _extractor_opts(self, category, subcategory): cfg = config.get(("extractor",), category) if not cfg: return None copts = cfg.get(self.scheme) if copts: if subcategory in cfg: try: sopts = cfg[subcategory].get(self.scheme) if sopts: opts = copts.copy() opts.update(sopts) return opts except Exception: self._report_config_error(subcategory, cfg[subcategory]) return copts if subcategory in cfg: try: return cfg[subcategory].get(self.scheme) except Exception: self._report_config_error(subcategory, cfg[subcategory]) return None def _report_config_error(self, subcategory, value): config.log.warning("Subcategory '%s' set to '%s' instead of object", subcategory, util.json_dumps(value).strip('"')) def download(self, url, pathfmt): """Write data from 'url' into the file specified by 'pathfmt'""" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1747857760.0 gallery_dl-1.29.7/gallery_dl/downloader/http.py0000644000175000017500000004403115013430540020226 0ustar00mikemike# -*- 
coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Downloader module for http:// and https:// URLs""" import time import mimetypes from requests.exceptions import RequestException, ConnectionError, Timeout from .common import DownloaderBase from .. import text, util, output from ssl import SSLError class HttpDownloader(DownloaderBase): scheme = "http" def __init__(self, job): DownloaderBase.__init__(self, job) extractor = job.extractor self.downloading = False self.adjust_extension = self.config("adjust-extensions", True) self.chunk_size = self.config("chunk-size", 32768) self.metadata = extractor.config("http-metadata") self.progress = self.config("progress", 3.0) self.validate = self.config("validate", True) self.headers = self.config("headers") self.minsize = self.config("filesize-min") self.maxsize = self.config("filesize-max") self.retries = self.config("retries", extractor._retries) self.retry_codes = self.config("retry-codes", extractor._retry_codes) self.timeout = self.config("timeout", extractor._timeout) self.verify = self.config("verify", extractor._verify) self.mtime = self.config("mtime", True) self.rate = self.config("rate") interval_429 = self.config("sleep-429") if not self.config("consume-content", False): # this resets the underlying TCP connection, and therefore # if the program makes another request to the same domain, # a new connection (either TLS or plain TCP) must be made self.release_conn = lambda resp: resp.close() if self.retries < 0: self.retries = float("inf") if self.minsize: minsize = text.parse_bytes(self.minsize) if not minsize: self.log.warning( "Invalid minimum file size (%r)", self.minsize) self.minsize = minsize if self.maxsize: maxsize = text.parse_bytes(self.maxsize) if not maxsize: self.log.warning( "Invalid maximum file size (%r)", self.maxsize) self.maxsize = maxsize if isinstance(self.chunk_size, str): chunk_size = text.parse_bytes(self.chunk_size) if not chunk_size: self.log.warning( "Invalid chunk size (%r)", self.chunk_size) chunk_size = 32768 self.chunk_size = chunk_size if self.rate: rate = text.parse_bytes(self.rate) if rate: if rate < self.chunk_size: self.chunk_size = rate self.rate = rate self.receive = self._receive_rate else: self.log.warning("Invalid rate limit (%r)", self.rate) if self.progress is not None: self.receive = self._receive_rate if self.progress < 0.0: self.progress = 0.0 if interval_429 is None: self.interval_429 = extractor._interval_429 else: self.interval_429 = util.build_duration_func(interval_429) def download(self, url, pathfmt): try: return self._download_impl(url, pathfmt) except Exception: output.stderr_write("\n") raise finally: # remove file from incomplete downloads if self.downloading and not self.part: util.remove_file(pathfmt.temppath) def _download_impl(self, url, pathfmt): response = None tries = code = 0 msg = "" metadata = self.metadata kwdict = pathfmt.kwdict expected_status = kwdict.get( "_http_expected_status", ()) adjust_extension = kwdict.get( "_http_adjust_extension", self.adjust_extension) if self.part and not metadata: pathfmt.part_enable(self.partdir) while True: if tries: if response: self.release_conn(response) response = None self.log.warning("%s (%s/%s)", msg, tries, self.retries+1) if tries > self.retries: return False if code == 429 and self.interval_429: s = self.interval_429() time.sleep(s if s > 
tries else tries) else: time.sleep(tries) code = 0 tries += 1 file_header = None # collect HTTP headers headers = {"Accept": "*/*"} # file-specific headers extra = kwdict.get("_http_headers") if extra: headers.update(extra) # general headers if self.headers: headers.update(self.headers) # partial content file_size = pathfmt.part_size() if file_size: headers["Range"] = "bytes={}-".format(file_size) # connect to (remote) source try: response = self.session.request( kwdict.get("_http_method", "GET"), url, stream=True, headers=headers, data=kwdict.get("_http_data"), timeout=self.timeout, proxies=self.proxies, verify=self.verify, ) except ConnectionError as exc: try: reason = exc.args[0].reason cls = reason.__class__.__name__ pre, _, err = str(reason.args[-1]).partition(":") msg = "{}: {}".format(cls, (err or pre).lstrip()) except Exception: msg = str(exc) continue except Timeout as exc: msg = str(exc) continue except Exception as exc: self.log.warning(exc) return False # check response code = response.status_code if code == 200 or code in expected_status: # OK offset = 0 size = response.headers.get("Content-Length") elif code == 206: # Partial Content offset = file_size size = response.headers["Content-Range"].rpartition("/")[2] elif code == 416 and file_size: # Requested Range Not Satisfiable break else: msg = "'{} {}' for '{}'".format(code, response.reason, url) challenge = util.detect_challenge(response) if challenge is not None: self.log.warning(challenge) if code in self.retry_codes or 500 <= code < 600: continue retry = kwdict.get("_http_retry") if retry and retry(response): continue self.release_conn(response) self.log.warning(msg) return False # check for invalid responses validate = kwdict.get("_http_validate") if validate and self.validate: try: result = validate(response) except Exception: self.release_conn(response) raise if isinstance(result, str): url = result tries -= 1 continue if not result: self.release_conn(response) self.log.warning("Invalid response") return False # check file size size = text.parse_int(size, None) if size is not None: if self.minsize and size < self.minsize: self.release_conn(response) self.log.warning( "File size smaller than allowed minimum (%s < %s)", size, self.minsize) pathfmt.temppath = "" return True if self.maxsize and size > self.maxsize: self.release_conn(response) self.log.warning( "File size larger than allowed maximum (%s > %s)", size, self.maxsize) pathfmt.temppath = "" return True build_path = False # set missing filename extension from MIME type if not pathfmt.extension: pathfmt.set_extension(self._find_extension(response)) build_path = True # set metadata from HTTP headers if metadata: kwdict[metadata] = util.extract_headers(response) build_path = True # build and check file path if build_path: pathfmt.build_path() if pathfmt.exists(): pathfmt.temppath = "" # release the connection back to pool by explicitly # calling .close() # see https://requests.readthedocs.io/en/latest/user # /advanced/#body-content-workflow # when the image size is on the order of megabytes, # re-establishing a TLS connection will typically be faster # than consuming the whole response response.close() return True if self.part and metadata: pathfmt.part_enable(self.partdir) metadata = False content = response.iter_content(self.chunk_size) # check filename extension against file header if adjust_extension and not offset and \ pathfmt.extension in SIGNATURE_CHECKS: try: file_header = next( content if response.raw.chunked else response.iter_content(16), b"") except 
(RequestException, SSLError) as exc: msg = str(exc) output.stderr_write("\n") continue if self._adjust_extension(pathfmt, file_header) and \ pathfmt.exists(): pathfmt.temppath = "" response.close() return True # set open mode if not offset: mode = "w+b" if file_size: self.log.debug("Unable to resume partial download") else: mode = "r+b" self.log.debug("Resuming download at byte %d", offset) # download content self.downloading = True with pathfmt.open(mode) as fp: if file_header: fp.write(file_header) offset += len(file_header) elif offset: if adjust_extension and \ pathfmt.extension in SIGNATURE_CHECKS: self._adjust_extension(pathfmt, fp.read(16)) fp.seek(offset) self.out.start(pathfmt.path) try: self.receive(fp, content, size, offset) except (RequestException, SSLError) as exc: msg = str(exc) output.stderr_write("\n") continue # check file size if size and fp.tell() < size: msg = "file size mismatch ({} < {})".format( fp.tell(), size) output.stderr_write("\n") continue break self.downloading = False if self.mtime: if "_http_lastmodified" in kwdict: kwdict["_mtime"] = kwdict["_http_lastmodified"] else: kwdict["_mtime"] = response.headers.get("Last-Modified") else: kwdict["_mtime"] = None return True def release_conn(self, response): """Release connection back to pool by consuming response body""" try: for _ in response.iter_content(self.chunk_size): pass except (RequestException, SSLError) as exc: output.stderr_write("\n") self.log.debug( "Unable to consume response body (%s: %s); " "closing the connection anyway", exc.__class__.__name__, exc) response.close() @staticmethod def receive(fp, content, bytes_total, bytes_start): write = fp.write for data in content: write(data) def _receive_rate(self, fp, content, bytes_total, bytes_start): rate = self.rate write = fp.write progress = self.progress bytes_downloaded = 0 time_start = time.monotonic() for data in content: time_elapsed = time.monotonic() - time_start bytes_downloaded += len(data) write(data) if progress is not None: if time_elapsed > progress: self.out.progress( bytes_total, bytes_start + bytes_downloaded, int(bytes_downloaded / time_elapsed), ) if rate: time_expected = bytes_downloaded / rate if time_expected > time_elapsed: time.sleep(time_expected - time_elapsed) def _find_extension(self, response): """Get filename extension from MIME type""" mtype = response.headers.get("Content-Type", "image/jpeg") mtype = mtype.partition(";")[0] if "/" not in mtype: mtype = "image/" + mtype if mtype in MIME_TYPES: return MIME_TYPES[mtype] ext = mimetypes.guess_extension(mtype, strict=False) if ext: return ext[1:] self.log.warning("Unknown MIME type '%s'", mtype) return "bin" @staticmethod def _adjust_extension(pathfmt, file_header): """Check filename extension against file header""" if not SIGNATURE_CHECKS[pathfmt.extension](file_header): for ext, check in SIGNATURE_CHECKS.items(): if check(file_header): pathfmt.set_extension(ext) pathfmt.build_path() return True return False MIME_TYPES = { "image/jpeg" : "jpg", "image/jpg" : "jpg", "image/png" : "png", "image/gif" : "gif", "image/bmp" : "bmp", "image/x-bmp" : "bmp", "image/x-ms-bmp": "bmp", "image/webp" : "webp", "image/avif" : "avif", "image/heic" : "heic", "image/heif" : "heif", "image/svg+xml" : "svg", "image/ico" : "ico", "image/icon" : "ico", "image/x-icon" : "ico", "image/vnd.microsoft.icon" : "ico", "image/x-photoshop" : "psd", "application/x-photoshop" : "psd", "image/vnd.adobe.photoshop": "psd", "video/webm": "webm", "video/ogg" : "ogg", "video/mp4" : "mp4", "video/m4v" : "m4v", 
"video/x-m4v": "m4v", "video/quicktime": "mov", "audio/wav" : "wav", "audio/x-wav": "wav", "audio/webm" : "webm", "audio/ogg" : "ogg", "audio/mpeg" : "mp3", "application/zip" : "zip", "application/x-zip": "zip", "application/x-zip-compressed": "zip", "application/rar" : "rar", "application/x-rar": "rar", "application/x-rar-compressed": "rar", "application/x-7z-compressed" : "7z", "application/pdf" : "pdf", "application/x-pdf": "pdf", "application/x-shockwave-flash": "swf", "application/ogg": "ogg", # https://www.iana.org/assignments/media-types/model/obj "model/obj": "obj", "application/octet-stream": "bin", } # https://en.wikipedia.org/wiki/List_of_file_signatures SIGNATURE_CHECKS = { "jpg" : lambda s: s[0:3] == b"\xFF\xD8\xFF", "png" : lambda s: s[0:8] == b"\x89PNG\r\n\x1A\n", "gif" : lambda s: s[0:6] in (b"GIF87a", b"GIF89a"), "bmp" : lambda s: s[0:2] == b"BM", "webp": lambda s: (s[0:4] == b"RIFF" and s[8:12] == b"WEBP"), "avif": lambda s: s[4:11] == b"ftypavi" and s[11] in b"fs", "heic": lambda s: (s[4:10] == b"ftyphe" and s[10:12] in ( b"ic", b"im", b"is", b"ix", b"vc", b"vm", b"vs")), "svg" : lambda s: s[0:5] == b"= 0 else float("inf"), "socket_timeout": self.config("timeout", extractor._timeout), "nocheckcertificate": not self.config("verify", extractor._verify), "proxy": self.proxies.get("http") if self.proxies else None, } self.ytdl_instance = None self.forward_cookies = self.config("forward-cookies", True) self.progress = self.config("progress", 3.0) self.outtmpl = self.config("outtmpl") def download(self, url, pathfmt): kwdict = pathfmt.kwdict ytdl_instance = kwdict.pop("_ytdl_instance", None) if not ytdl_instance: ytdl_instance = self.ytdl_instance if not ytdl_instance: try: module = ytdl.import_module(self.config("module")) except (ImportError, SyntaxError) as exc: self.log.error("Cannot import module '%s'", getattr(exc, "name", "")) self.log.debug("", exc_info=exc) self.download = lambda u, p: False return False try: ytdl_version = module.version.__version__ except Exception: ytdl_version = "" self.log.debug("Using %s version %s", module, ytdl_version) self.ytdl_instance = ytdl_instance = ytdl.construct_YoutubeDL( module, self, self.ytdl_opts) if self.outtmpl == "default": self.outtmpl = module.DEFAULT_OUTTMPL if self.forward_cookies: self.log.debug("Forwarding cookies to %s", ytdl_instance.__module__) set_cookie = ytdl_instance.cookiejar.set_cookie for cookie in self.session.cookies: set_cookie(cookie) if self.progress is not None and not ytdl_instance._progress_hooks: ytdl_instance.add_progress_hook(self._progress_hook) info_dict = kwdict.pop("_ytdl_info_dict", None) if not info_dict: url = url[5:] try: manifest = kwdict.pop("_ytdl_manifest", None) if manifest: info_dict = self._extract_manifest( ytdl_instance, url, manifest, kwdict.pop("_ytdl_manifest_data", None)) else: info_dict = self._extract_info(ytdl_instance, url) except Exception as exc: self.log.debug("", exc_info=exc) self.log.warning("%s: %s", exc.__class__.__name__, exc) if not info_dict: return False if "entries" in info_dict: index = kwdict.get("_ytdl_index") if index is None: return self._download_playlist( ytdl_instance, pathfmt, info_dict) else: info_dict = info_dict["entries"][index] extra = kwdict.get("_ytdl_extra") if extra: info_dict.update(extra) return self._download_video(ytdl_instance, pathfmt, info_dict) def _download_video(self, ytdl_instance, pathfmt, info_dict): if "url" in info_dict: text.nameext_from_url(info_dict["url"], pathfmt.kwdict) formats = info_dict.get("requested_formats") if formats 
and not compatible_formats(formats): info_dict["ext"] = "mkv" elif "ext" not in info_dict: try: info_dict["ext"] = info_dict["formats"][0]["ext"] except LookupError: info_dict["ext"] = "mp4" if self.outtmpl: self._set_outtmpl(ytdl_instance, self.outtmpl) pathfmt.filename = filename = \ ytdl_instance.prepare_filename(info_dict) pathfmt.extension = info_dict["ext"] pathfmt.path = pathfmt.directory + filename pathfmt.realpath = pathfmt.temppath = ( pathfmt.realdirectory + filename) else: pathfmt.set_extension(info_dict["ext"]) pathfmt.build_path() if pathfmt.exists(): pathfmt.temppath = "" return True self.out.start(pathfmt.path) if self.part: pathfmt.kwdict["extension"] = pathfmt.prefix + "part" filename = pathfmt.build_filename(pathfmt.kwdict) pathfmt.kwdict["extension"] = info_dict["ext"] if self.partdir: path = os.path.join(self.partdir, filename) else: path = pathfmt.realdirectory + filename else: path = pathfmt.realpath self._set_outtmpl(ytdl_instance, path.replace("%", "%%")) try: ytdl_instance.process_info(info_dict) except Exception as exc: self.log.debug("", exc_info=exc) return False pathfmt.temppath = info_dict["filepath"] return True def _download_playlist(self, ytdl_instance, pathfmt, info_dict): pathfmt.set_extension("%(playlist_index)s.%(ext)s") pathfmt.build_path() self._set_outtmpl(ytdl_instance, pathfmt.realpath) for entry in info_dict["entries"]: ytdl_instance.process_info(entry) return True def _extract_info(self, ytdl, url): return ytdl.extract_info(url, download=False) def _extract_manifest(self, ytdl, url, manifest_type, manifest_data=None): extr = ytdl.get_info_extractor("Generic") video_id = extr._generic_id(url) if manifest_type == "hls": if manifest_data is None: try: fmts, subs = extr._extract_m3u8_formats_and_subtitles( url, video_id, "mp4") except AttributeError: fmts = extr._extract_m3u8_formats(url, video_id, "mp4") subs = None else: try: fmts, subs = extr._parse_m3u8_formats_and_subtitles( url, video_id, "mp4") except AttributeError: fmts = extr._parse_m3u8_formats(url, video_id, "mp4") subs = None elif manifest_type == "dash": if manifest_data is None: try: fmts, subs = extr._extract_mpd_formats_and_subtitles( url, video_id) except AttributeError: fmts = extr._extract_mpd_formats(url, video_id) subs = None else: if isinstance(manifest_data, str): manifest_data = ElementTree.fromstring(manifest_data) try: fmts, subs = extr._parse_mpd_formats_and_subtitles( manifest_data, mpd_id="dash") except AttributeError: fmts = extr._parse_mpd_formats( manifest_data, mpd_id="dash") subs = None else: self.log.error("Unsupported manifest type '%s'", manifest_type) return None info_dict = { "extractor": "", "id" : video_id, "title" : video_id, "formats" : fmts, "subtitles": subs, } return ytdl.process_ie_result(info_dict, download=False) def _progress_hook(self, info): if info["status"] == "downloading" and \ info["elapsed"] >= self.progress: total = info.get("total_bytes") or info.get("total_bytes_estimate") speed = info.get("speed") self.out.progress( None if total is None else int(total), info["downloaded_bytes"], int(speed) if speed else 0, ) @staticmethod def _set_outtmpl(ytdl_instance, outtmpl): try: ytdl_instance._parse_outtmpl except AttributeError: try: ytdl_instance.outtmpl_dict["default"] = outtmpl except AttributeError: ytdl_instance.params["outtmpl"] = outtmpl else: ytdl_instance.params["outtmpl"] = {"default": outtmpl} def compatible_formats(formats): """Returns True if 'formats' are compatible for merge""" video_ext = formats[0].get("ext") audio_ext = 
formats[1].get("ext") if video_ext == "webm" and audio_ext == "webm": return True exts = ("mp3", "mp4", "m4a", "m4p", "m4b", "m4r", "m4v", "ismv", "isma") return video_ext in exts and audio_ext in exts __downloader__ = YoutubeDLDownloader ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/gallery_dl/exception.py0000644000175000017500000000733015001510422017103 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Exception classes used by gallery-dl Class Hierarchy: Exception +-- GalleryDLException +-- ExtractionError | +-- AuthenticationError | +-- AuthorizationError | +-- NotFoundError | +-- HttpError +-- FormatError | +-- FilenameFormatError | +-- DirectoryFormatError +-- FilterError +-- InputFileError +-- NoExtractorError +-- StopExtraction +-- TerminateExtraction +-- RestartExtraction """ class GalleryDLException(Exception): """Base class for GalleryDL exceptions""" default = None msgfmt = None code = 1 def __init__(self, message=None, fmt=True): if not message: message = self.default elif isinstance(message, Exception): message = "{}: {}".format(message.__class__.__name__, message) if self.msgfmt and fmt: message = self.msgfmt.format(message) Exception.__init__(self, message) class ExtractionError(GalleryDLException): """Base class for exceptions during information extraction""" class HttpError(ExtractionError): """HTTP request during data extraction failed""" default = "HTTP request failed" code = 4 def __init__(self, message="", response=None): self.response = response if response is None: self.status = 0 else: self.status = response.status_code if not message: message = "'{} {}' for '{}'".format( response.status_code, response.reason, response.url) ExtractionError.__init__(self, message) class NotFoundError(ExtractionError): """Requested resource (gallery/image) could not be found""" msgfmt = "Requested {} could not be found" default = "resource (gallery/image)" code = 8 class AuthenticationError(ExtractionError): """Invalid or missing login credentials""" default = "Invalid or missing login credentials" code = 16 class AuthorizationError(ExtractionError): """Insufficient privileges to access a resource""" default = "Insufficient privileges to access the specified resource" code = 16 class FormatError(GalleryDLException): """Error while building output paths""" code = 32 class FilenameFormatError(FormatError): """Error while building output filenames""" msgfmt = "Applying filename format string failed ({})" class DirectoryFormatError(FormatError): """Error while building output directory paths""" msgfmt = "Applying directory format string failed ({})" class FilterError(GalleryDLException): """Error while evaluating a filter expression""" msgfmt = "Evaluating filter expression failed ({})" code = 32 class InputFileError(GalleryDLException): """Error when parsing input file""" code = 32 def __init__(self, message, *args): GalleryDLException.__init__( self, message % args if args else message) class NoExtractorError(GalleryDLException): """No extractor can handle the given URL""" code = 64 class StopExtraction(GalleryDLException): """Stop data extraction""" def __init__(self, message=None, *args): GalleryDLException.__init__(self) self.message = message % args if args else message self.code = 1 if message else 0 
class TerminateExtraction(GalleryDLException): """Terminate data extraction""" code = 0 class RestartExtraction(GalleryDLException): """Restart data extraction""" code = 0 ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1747991096.4574318 gallery_dl-1.29.7/gallery_dl/extractor/0000755000175000017500000000000015014035070016550 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/gallery_dl/extractor/2ch.py0000644000175000017500000000605615001510422017600 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://2ch.hk/""" from .common import Extractor, Message from .. import text, util class _2chThreadExtractor(Extractor): """Extractor for 2ch threads""" category = "2ch" subcategory = "thread" root = "https://2ch.hk" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{tim}{filename:? //}.{extension}" archive_fmt = "{board}_{thread}_{tim}" pattern = r"(?:https?://)?2ch\.hk/([^/?#]+)/res/(\d+)" example = "https://2ch.hk/a/res/12345.html" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "{}/{}/res/{}.json".format(self.root, self.board, self.thread) posts = self.request(url).json()["threads"][0]["posts"] op = posts[0] title = op.get("subject") or text.remove_html(op["comment"]) thread = { "board" : self.board, "thread": self.thread, "title" : text.unescape(title)[:50], } yield Message.Directory, thread for post in posts: files = post.get("files") if files: post["post_name"] = post["name"] post["date"] = text.parse_timestamp(post["timestamp"]) del post["files"] del post["name"] for file in files: file.update(thread) file.update(post) file["filename"] = file["fullname"].rpartition(".")[0] file["tim"], _, file["extension"] = \ file["name"].rpartition(".") yield Message.Url, self.root + file["path"], file class _2chBoardExtractor(Extractor): """Extractor for 2ch boards""" category = "2ch" subcategory = "board" root = "https://2ch.hk" pattern = r"(?:https?://)?2ch\.hk/([^/?#]+)/?$" example = "https://2ch.hk/a/" def __init__(self, match): Extractor.__init__(self, match) self.board = match.group(1) def items(self): # index page url = "{}/{}/index.json".format(self.root, self.board) index = self.request(url).json() index["_extractor"] = _2chThreadExtractor for thread in index["threads"]: url = "{}/{}/res/{}.html".format( self.root, self.board, thread["thread_num"]) yield Message.Queue, url, index # pages 1..n for n in util.advance(index["pages"], 1): url = "{}/{}/{}.json".format(self.root, self.board, n) page = self.request(url).json() page["_extractor"] = _2chThreadExtractor for thread in page["threads"]: url = "{}/{}/res/{}.html".format( self.root, self.board, thread["thread_num"]) yield Message.Queue, url, page ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1743510441.0 gallery_dl-1.29.7/gallery_dl/extractor/2chan.py0000644000175000017500000000623114772755651020147 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
"""Extractors for https://www.2chan.net/""" from .common import Extractor, Message from .. import text class _2chanThreadExtractor(Extractor): """Extractor for 2chan threads""" category = "2chan" subcategory = "thread" directory_fmt = ("{category}", "{board_name}", "{thread}") filename_fmt = "{tim}.{extension}" archive_fmt = "{board}_{thread}_{tim}" url_fmt = "https://{server}.2chan.net/{board}/src/{filename}" pattern = r"(?:https?://)?([\w-]+)\.2chan\.net/([^/?#]+)/res/(\d+)" example = "https://dec.2chan.net/12/res/12345.htm" def __init__(self, match): Extractor.__init__(self, match) self.server, self.board, self.thread = match.groups() def items(self): url = "https://{}.2chan.net/{}/res/{}.htm".format( self.server, self.board, self.thread) page = self.request(url).text data = self.metadata(page) yield Message.Directory, data for post in self.posts(page): if "filename" not in post: continue post.update(data) url = self.url_fmt.format_map(post) yield Message.Url, url, post def metadata(self, page): """Collect metadata for extractor-job""" title, _, boardname = text.extr( page, "", "").rpartition(" - ") return { "server": self.server, "title": title, "board": self.board, "board_name": boardname[:-4], "thread": self.thread, } def posts(self, page): """Build a list of all post-objects""" page = text.extr( page, '
') return [ self.parse(post) for post in page.split('') ] def parse(self, post): """Build post-object by extracting data from an HTML post""" data = self._extract_post(post) if data["name"]: data["name"] = data["name"].strip() path = text.extr(post, '' , '<'), ("name", 'class="cnm">' , '<'), ("now" , 'class="cnw">' , '<'), ("no" , 'class="cno">No.', '<'), (None , '', ''), ))[0] @staticmethod def _extract_image(post, data): text.extract_all(post, ( (None , '_blank', ''), ("filename", '>', '<'), ("fsize" , '(', ' '), ), 0, data) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1743510441.0 gallery_dl-1.29.7/gallery_dl/extractor/2chen.py0000644000175000017500000000640714772755651020160 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://sturdychan.help/""" from .common import Extractor, Message from .. import text BASE_PATTERN = r"(?:https?://)?(?:sturdychan.help|2chen\.(?:moe|club))" class _2chenThreadExtractor(Extractor): """Extractor for 2chen threads""" category = "2chen" subcategory = "thread" root = "https://sturdychan.help" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{time} {filename}.{extension}" archive_fmt = "{board}_{thread}_{hash}_{time}" pattern = BASE_PATTERN + r"/([^/?#]+)/(\d+)" example = "https://sturdychan.help/a/12345/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "{}/{}/{}".format(self.root, self.board, self.thread) page = self.request(url, encoding="utf-8", notfound="thread").text data = self.metadata(page) yield Message.Directory, data for post in self.posts(page): url = post["url"] if not url: continue if url[0] == "/": url = self.root + url post["url"] = url = url.partition("?")[0] post.update(data) post["time"] = text.parse_int(post["date"].timestamp()) yield Message.Url, url, text.nameext_from_url( post["filename"], post) def metadata(self, page): board, pos = text.extract(page, 'class="board">/', '/<') title = text.extract(page, "
<h3>", "</h3>
", pos)[0] return { "board" : board, "thread": self.thread, "title" : text.unescape(title), } def posts(self, page): """Return iterable with relevant posts""" return map(self.parse, text.extract_iter( page, 'class="glass media', '')) def parse(self, post): extr = text.extract_from(post) return { "name" : text.unescape(extr("", "")), "date" : text.parse_datetime( extr("")[2], "%d %b %Y (%a) %H:%M:%S" ), "no" : extr('href="#p', '"'), "url" : extr('
[^&#]+)") example = "http://behoimi.org/post?tags=TAG" def posts(self): params = {"tags": self.tags} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPoolExtractor(_3dbooruBase, moebooru.MoebooruPoolExtractor): """Extractor for image-pools from behoimi.org""" pattern = r"(?:https?://)?(?:www\.)?behoimi\.org/pool/show/(?P\d+)" example = "http://behoimi.org/pool/show/12345" def posts(self): params = {"tags": "pool:" + self.pool_id} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPostExtractor(_3dbooruBase, moebooru.MoebooruPostExtractor): """Extractor for single images from behoimi.org""" pattern = r"(?:https?://)?(?:www\.)?behoimi\.org/post/show/(?P\d+)" example = "http://behoimi.org/post/show/12345" def posts(self): params = {"tags": "id:" + self.post_id} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPopularExtractor( _3dbooruBase, moebooru.MoebooruPopularExtractor): """Extractor for popular images from behoimi.org""" pattern = (r"(?:https?://)?(?:www\.)?behoimi\.org" r"/post/popular_(?Pby_(?:day|week|month)|recent)" r"(?:\?(?P[^#]*))?") example = "http://behoimi.org/post/popular_by_month" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/gallery_dl/extractor/4archive.py0000644000175000017500000000750115001510422020625 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://4archive.org/""" from .common import Extractor, Message from .. import text, util class _4archiveThreadExtractor(Extractor): """Extractor for 4archive threads""" category = "4archive" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{no} {filename}.{extension}" archive_fmt = "{board}_{thread}_{no}" root = "https://4archive.org" referer = False pattern = r"(?:https?://)?4archive\.org/board/([^/?#]+)/thread/(\d+)" example = "https://4archive.org/board/a/thread/12345/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "{}/board/{}/thread/{}".format( self.root, self.board, self.thread) page = self.request(url).text data = self.metadata(page) posts = self.posts(page) if not data["title"]: data["title"] = posts[0]["com"][:50] for post in posts: post.update(data) post["time"] = int(util.datetime_to_timestamp(post["date"])) yield Message.Directory, post if "url" in post: yield Message.Url, post["url"], text.nameext_from_url( post["filename"], post) def metadata(self, page): return { "board" : self.board, "thread": text.parse_int(self.thread), "title" : text.unescape(text.extr( page, 'class="subject">', "")) } def posts(self, page): return [ self.parse(post) for post in page.split('class="postContainer')[1:] ] @staticmethod def parse(post): extr = text.extract_from(post) data = { "name": extr('class="name">', ""), "date": text.parse_datetime( extr('class="dateTime postNum" >', "<").strip(), "%Y-%m-%d %H:%M:%S"), "no" : text.parse_int(extr('href="#p', '"')), } if 'class="file"' in post: extr('class="fileText"', ">File: ").strip()[1:], "size" : text.parse_bytes(extr(" (", ", ")[:-1]), "width" : text.parse_int(extr("", "x")), "height" : text.parse_int(extr("", "px")), }) extr("
", "
"))) return data class _4archiveBoardExtractor(Extractor): """Extractor for 4archive boards""" category = "4archive" subcategory = "board" root = "https://4archive.org" pattern = r"(?:https?://)?4archive\.org/board/([^/?#]+)(?:/(\d+))?/?$" example = "https://4archive.org/board/a/" def __init__(self, match): Extractor.__init__(self, match) self.board = match.group(1) self.num = text.parse_int(match.group(2), 1) def items(self): data = {"_extractor": _4archiveThreadExtractor} while True: url = "{}/board/{}/{}".format(self.root, self.board, self.num) page = self.request(url).text if 'class="thread"' not in page: return for thread in text.extract_iter(page, 'class="thread" id="t', '"'): url = "{}/board/{}/thread/{}".format( self.root, self.board, thread) yield Message.Queue, url, data self.num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1743510441.0 gallery_dl-1.29.7/gallery_dl/extractor/4chan.py0000644000175000017500000000500714772755651020151 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.4chan.org/""" from .common import Extractor, Message from .. import text class _4chanThreadExtractor(Extractor): """Extractor for 4chan threads""" category = "4chan" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{tim} {filename}.{extension}" archive_fmt = "{board}_{thread}_{tim}" pattern = (r"(?:https?://)?boards\.4chan(?:nel)?\.org" r"/([^/]+)/thread/(\d+)") example = "https://boards.4channel.org/a/thread/12345/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "https://a.4cdn.org/{}/thread/{}.json".format( self.board, self.thread) posts = self.request(url).json()["posts"] title = posts[0].get("sub") or text.remove_html(posts[0]["com"]) data = { "board" : self.board, "thread": self.thread, "title" : text.unescape(title)[:50], } yield Message.Directory, data for post in posts: if "filename" in post: post.update(data) post["extension"] = post["ext"][1:] post["filename"] = text.unescape(post["filename"]) url = "https://i.4cdn.org/{}/{}{}".format( post["board"], post["tim"], post["ext"]) yield Message.Url, url, post class _4chanBoardExtractor(Extractor): """Extractor for 4chan boards""" category = "4chan" subcategory = "board" pattern = r"(?:https?://)?boards\.4chan(?:nel)?\.org/([^/?#]+)/\d*$" example = "https://boards.4channel.org/a/" def __init__(self, match): Extractor.__init__(self, match) self.board = match.group(1) def items(self): url = "https://a.4cdn.org/{}/threads.json".format(self.board) threads = self.request(url).json() for page in threads: for thread in page["threads"]: url = "https://boards.4chan.org/{}/thread/{}/".format( self.board, thread["no"]) thread["page"] = page["page"] thread["_extractor"] = _4chanThreadExtractor yield Message.Queue, url, thread ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/gallery_dl/extractor/4chanarchives.py0000644000175000017500000000770515001510422021650 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free 
Software Foundation. """Extractors for https://4chanarchives.com/""" from .common import Extractor, Message from .. import text class _4chanarchivesThreadExtractor(Extractor): """Extractor for threads on 4chanarchives.com""" category = "4chanarchives" subcategory = "thread" root = "https://4chanarchives.com" directory_fmt = ("{category}", "{board}", "{thread} - {title}") filename_fmt = "{no}-{filename}.{extension}" archive_fmt = "{board}_{thread}_{no}" referer = False pattern = r"(?:https?://)?4chanarchives\.com/board/([^/?#]+)/thread/(\d+)" example = "https://4chanarchives.com/board/a/thread/12345/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = "{}/board/{}/thread/{}".format( self.root, self.board, self.thread) page = self.request(url).text data = self.metadata(page) posts = self.posts(page) if not data["title"]: data["title"] = text.unescape(text.remove_html( posts[0]["com"]))[:50] for post in posts: post.update(data) yield Message.Directory, post if "url" in post: yield Message.Url, post["url"], post def metadata(self, page): return { "board" : self.board, "thread" : self.thread, "title" : text.unescape(text.extr( page, 'property="og:title" content="', '"')), } def posts(self, page): """Build a list of all post objects""" return [self.parse(html) for html in text.extract_iter( page, 'id="pc', '')] def parse(self, html): """Build post object by extracting data from an HTML post""" post = self._extract_post(html) if ">File: <" in html: self._extract_file(html, post) post["extension"] = post["url"].rpartition(".")[2] return post @staticmethod def _extract_post(html): extr = text.extract_from(html) return { "no" : text.parse_int(extr('', '"')), "name": extr('class="name">', '<'), "time": extr('class="dateTime postNum" >', '<').rstrip(), "com" : text.unescape( html[html.find('")[2]), } @staticmethod def _extract_file(html, post): extr = text.extract_from(html, html.index(">File: <")) post["url"] = extr('href="', '"') post["filename"] = text.unquote(extr(">", "<").rpartition(".")[0]) post["fsize"] = extr("(", ", ") post["w"] = text.parse_int(extr("", "x")) post["h"] = text.parse_int(extr("", ")")) class _4chanarchivesBoardExtractor(Extractor): """Extractor for boards on 4chanarchives.com""" category = "4chanarchives" subcategory = "board" root = "https://4chanarchives.com" pattern = r"(?:https?://)?4chanarchives\.com/board/([^/?#]+)(?:/(\d+))?/?$" example = "https://4chanarchives.com/board/a/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.page = match.groups() def items(self): data = {"_extractor": _4chanarchivesThreadExtractor} pnum = text.parse_int(self.page, 1) needle = '''
data["pageCount"]: return url = "{}/{}/{}.json".format(self.root, board, pnum) threads = self.request(url).json()["threads"] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/gallery_dl/extractor/8muses.py0000644000175000017500000000727515001510422020354 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://comics.8muses.com/""" from .common import Extractor, Message from .. import text, util class _8musesAlbumExtractor(Extractor): """Extractor for image albums on comics.8muses.com""" category = "8muses" subcategory = "album" directory_fmt = ("{category}", "{album[path]}") filename_fmt = "{page:>03}.{extension}" archive_fmt = "{hash}" root = "https://comics.8muses.com" pattern = (r"(?:https?://)?(?:comics\.|www\.)?8muses\.com" r"(/comics/album/[^?#]+)(\?[^#]+)?") example = "https://comics.8muses.com/comics/album/PATH/TITLE" def __init__(self, match): Extractor.__init__(self, match) self.path = match.group(1) self.params = match.group(2) or "" def items(self): url = self.root + self.path + self.params while True: data = self._unobfuscate(text.extr( self.request(url).text, 'id="ractive-public" type="text/plain">', '')) images = data.get("pictures") if images: count = len(images) album = self._make_album(data["album"]) yield Message.Directory, {"album": album, "count": count} for num, image in enumerate(images, 1): url = self.root + "/image/fl/" + image["publicUri"] img = { "url" : url, "page" : num, "hash" : image["publicUri"], "count" : count, "album" : album, "extension": "jpg", } yield Message.Url, url, img albums = data.get("albums") if albums: for album in albums: permalink = album.get("permalink") if not permalink: self.log.debug("Private album") continue url = self.root + "/comics/album/" + permalink yield Message.Queue, url, { "url" : url, "name" : album["name"], "private" : album["isPrivate"], "_extractor": _8musesAlbumExtractor, } if data["page"] >= data["pages"]: return path, _, num = self.path.rstrip("/").rpartition("/") path = path if num.isdecimal() else self.path url = "{}{}/{}{}".format( self.root, path, data["page"] + 1, self.params) def _make_album(self, album): return { "id" : album["id"], "path" : album["path"], "parts" : album["path"].split("/"), "title" : album["name"], "private": album["isPrivate"], "url" : self.root + "/comics/album/" + album["permalink"], "parent" : text.parse_int(album["parentId"]), "views" : text.parse_int(album["numberViews"]), "likes" : text.parse_int(album["numberLikes"]), "date" : text.parse_datetime( album["updatedAt"], "%Y-%m-%dT%H:%M:%S.%fZ"), } @staticmethod def _unobfuscate(data): return util.json_loads("".join([ chr(33 + (ord(c) + 14) % 94) if "!" <= c <= "~" else c for c in text.unescape(data.strip("\t\n\r !")) ])) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1747857760.0 gallery_dl-1.29.7/gallery_dl/extractor/__init__.py0000644000175000017500000001256415013430540020671 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
import sys from ..util import re_compile modules = [ "2ch", "2chan", "2chen", "35photo", "3dbooru", "4chan", "4archive", "4chanarchives", "500px", "8chan", "8muses", "adultempire", "agnph", "ao3", "arcalive", "architizer", "artstation", "aryion", "batoto", "bbc", "behance", "bilibili", "blogger", "bluesky", "boosty", "bunkr", "catbox", "chevereto", "cien", "civitai", "comicvine", "cyberdrop", "danbooru", "desktopography", "deviantart", "discord", "dynastyscans", "e621", "erome", "everia", "exhentai", "facebook", "fanbox", "fantia", "fapello", "fapachi", "flickr", "furaffinity", "furry34", "fuskator", "gelbooru", "gelbooru_v01", "gelbooru_v02", "gofile", "hatenablog", "hentai2read", "hentaicosplays", "hentaifoundry", "hentaihand", "hentaihere", "hentainexus", "hiperdex", "hitomi", "hotleak", "idolcomplex", "imagebam", "imagechest", "imagefap", "imgbb", "imgbox", "imgth", "imgur", "imhentai", "inkbunny", "instagram", "issuu", "itaku", "itchio", "jschan", "kabeuchi", "keenspot", "kemonoparty", "khinsider", "koharu", "komikcast", "lensdump", "lexica", "lightroom", "livedoor", "lofter", "luscious", "lynxchan", "mangadex", "mangafox", "mangahere", "manganelo", "mangapark", "mangaread", "mangasee", "mangoxo", "misskey", "motherless", "myhentaigallery", "myportfolio", "naver", "naverwebtoon", "nekohouse", "newgrounds", "nhentai", "nijie", "nitter", "nozomi", "nsfwalbum", "paheal", "patreon", "pexels", "philomena", "photovogue", "picarto", "pictoa", "piczel", "pillowfort", "pinterest", "pixeldrain", "pixiv", "pixnet", "plurk", "poipiku", "poringa", "pornhub", "pornpics", "postmill", "reactor", "readcomiconline", "realbooru", "reddit", "redgifs", "rule34us", "rule34vault", "rule34xyz", "saint", "sankaku", "sankakucomplex", "scrolller", "seiga", "senmanga", "sexcom", "shimmie2", "simplyhentai", "skeb", "slickpic", "slideshare", "smugmug", "soundgasm", "speakerdeck", "steamgriddb", "subscribestar", "szurubooru", "tapas", "tcbscans", "telegraph", "tenor", "tiktok", "tmohentai", "toyhouse", "tsumino", "tumblr", "tumblrgallery", "twibooru", "twitter", "urlgalleries", "unsplash", "uploadir", "urlshortener", "vanillarock", "vichan", "vipergirls", "vk", "vsco", "wallhaven", "wallpapercave", "warosu", "weasyl", "webmshare", "webtoons", "weebcentral", "weibo", "wikiart", "wikifeet", "wikimedia", "xfolio", "xhamster", "xvideos", "yiffverse", "zerochan", "zzup", "booru", "moebooru", "foolfuuka", "foolslide", "mastodon", "shopify", "lolisafe", "imagehosts", "directlink", "recursive", "oauth", "noop", "ytdl", "generic", ] def find(url): """Find a suitable extractor for the given URL""" for cls in _list_classes(): match = cls.pattern.match(url) if match: return cls(match) return None def add(cls): """Add 'cls' to the list of available extractors""" if isinstance(cls.pattern, str): cls.pattern = re_compile(cls.pattern) _cache.append(cls) return cls def add_module(module): """Add all extractors in 'module' to the list of available extractors""" classes = _get_classes(module) if classes: if isinstance(classes[0].pattern, str): for cls in classes: cls.pattern = re_compile(cls.pattern) _cache.extend(classes) return classes def extractors(): """Yield all available extractor classes""" return sorted( _list_classes(), key=lambda x: x.__name__ ) # -------------------------------------------------------------------- # internals def _list_classes(): """Yield available extractor classes""" yield from _cache for module in _module_iter: yield from add_module(module) globals()["_list_classes"] = lambda : _cache def 
_modules_internal(): globals_ = globals() for module_name in modules: yield __import__(module_name, globals_, None, (), 1) def _modules_path(path, files): sys.path.insert(0, path) try: return [ __import__(name[:-3]) for name in files if name.endswith(".py") ] finally: del sys.path[0] def _get_classes(module): """Return a list of all extractor classes in a module""" return [ cls for cls in module.__dict__.values() if ( hasattr(cls, "pattern") and cls.__module__ == module.__name__ ) ] _cache = [] _module_iter = _modules_internal() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/gallery_dl/extractor/adultempire.py0000644000175000017500000000355515001510422021440 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.adultempire.com/""" from .common import GalleryExtractor from .. import text class AdultempireGalleryExtractor(GalleryExtractor): """Extractor for image galleries from www.adultempire.com""" category = "adultempire" root = "https://www.adultempire.com" pattern = (r"(?:https?://)?(?:www\.)?adult(?:dvd)?empire\.com" r"(/(\d+)/gallery\.html)") example = "https://www.adultempire.com/12345/gallery.html" def __init__(self, match): GalleryExtractor.__init__(self, match) self.gallery_id = match.group(2) def _init(self): self.cookies.set("ageConfirmed", "true", domain="www.adultempire.com") def metadata(self, page): extr = text.extract_from(page, page.index('
')) return { "gallery_id": text.parse_int(self.gallery_id), "title" : text.unescape(extr('title="', '"')), "studio" : extr(">studio", "<").strip(), "date" : text.parse_datetime(extr( ">released", "<").strip(), "%m/%d/%Y"), "actors" : sorted(text.split_html(extr( '
    = int(attrib["count"]): return params["page"] += 1 def _html(self, post): url = "{}/gallery/post/show/{}/".format(self.root, post["id"]) return self.request(url).text def _tags(self, post, page): tag_container = text.extr( page, '
      ', '

      Statistics

      ') if not tag_container: return tags = collections.defaultdict(list) pattern = re.compile(r'class="(.)typetag">([^<]+)') for tag_type, tag_name in pattern.findall(tag_container): tags[tag_type].append(text.unquote(tag_name).replace(" ", "_")) for key, value in tags.items(): post["tags_" + self.TAG_TYPES[key]] = " ".join(value) class AgnphTagExtractor(AgnphExtractor): subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/gallery/post/(?:\?([^#]+))?$" example = "https://agn.ph/gallery/post/?search=TAG" def __init__(self, match): AgnphExtractor.__init__(self, match) self.params = text.parse_query(self.groups[0]) def metadata(self): return {"search_tags": self.params.get("search") or ""} def posts(self): url = self.root + "/gallery/post/" return self._pagination(url, self.params.copy()) class AgnphPostExtractor(AgnphExtractor): subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/gallery/post/show/(\d+)" example = "https://agn.ph/gallery/post/show/12345/" def posts(self): url = "{}/gallery/post/show/{}/?api=xml".format( self.root, self.groups[0]) post = ElementTree.fromstring(self.request(url).text) return (self._xml_to_dict(post),) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/gallery_dl/extractor/ao3.py0000644000175000017500000002701215001510422017601 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2024 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://archiveofourown.org/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache BASE_PATTERN = (r"(?:https?://)?(?:www\.)?" 
r"a(?:rchiveofourown|o3)\.(?:org|com|net)") class Ao3Extractor(Extractor): """Base class for ao3 extractors""" category = "ao3" root = "https://archiveofourown.org" categorytransfer = True cookies_domain = ".archiveofourown.org" cookies_names = ("remember_user_token",) request_interval = (0.5, 1.5) def items(self): self.login() base = self.root + "/works/" data = {"_extractor": Ao3WorkExtractor, "type": "work"} for work_id in self.works(): yield Message.Queue, base + work_id, data def items_list(self, type, needle, part=True): self.login() base = self.root + "/" data_work = {"_extractor": Ao3WorkExtractor, "type": "work"} data_series = {"_extractor": Ao3SeriesExtractor, "type": "series"} data_user = {"_extractor": Ao3UserExtractor, "type": "user"} for item in self._pagination(self.groups[0], needle): path = item.rpartition("/")[0] if part else item url = base + path if item.startswith("works/"): yield Message.Queue, url, data_work elif item.startswith("series/"): yield Message.Queue, url, data_series elif item.startswith("users/"): yield Message.Queue, url, data_user else: self.log.warning("Unsupported %s type '%s'", type, path) def works(self): return self._pagination(self.groups[0]) def login(self): if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: return self.cookies_update(self._login_impl(username, password)) @cache(maxage=90*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/users/login" page = self.request(url).text pos = page.find('id="loginform"') token = text.extract( page, ' name="authenticity_token" value="', '"', pos)[0] if not token: self.log.error("Unable to extract 'authenticity_token'") data = { "authenticity_token": text.unescape(token), "user[login]" : username, "user[password]" : password, "user[remember_me]" : "1", "commit" : "Log In", } response = self.request(url, method="POST", data=data) if not response.history: raise exception.AuthenticationError() remember = response.history[0].cookies.get("remember_user_token") if not remember: raise exception.AuthenticationError() return { "remember_user_token": remember, "user_credentials" : "1", } def _pagination(self, path, needle='
    ") for dl in text.extract_iter(download, ' href="', "') fmts[type.lower()] = path data = { "id" : text.parse_int(work_id), "rating" : text.split_html( extr('
    ', "
    ")), "warnings" : text.split_html( extr('
    ', "
    ")), "categories" : text.split_html( extr('
    ', "
    ")), "fandom" : text.split_html( extr('
    ', "
    ")), "relationships": text.split_html( extr('
    ', "
    ")), "characters" : text.split_html( extr('
    ', "
    ")), "tags" : text.split_html( extr('
    ', "
    ")), "lang" : extr('
    ', "
    "), "date" : text.parse_datetime( extr('
    ', "<"), "%Y-%m-%d"), "date_completed": text.parse_datetime( extr('>Completed:
    ', "<"), "%Y-%m-%d"), "date_updated" : text.parse_timestamp( path.rpartition("updated_at=")[2]), "words" : text.parse_int( extr('
    ', "<").replace(",", "")), "chapters" : chapters, "comments" : text.parse_int( extr('
    ', "<").replace(",", "")), "likes" : text.parse_int( extr('
    ', "<").replace(",", "")), "bookmarks" : text.parse_int(text.remove_html( extr('
    ', "
    ")).replace(",", "")), "views" : text.parse_int( extr('
    ', "<").replace(",", "")), "title" : text.unescape(text.remove_html( extr(' class="title heading">', "")).strip()), "author" : text.unescape(text.remove_html( extr(' class="byline heading">', ""))), "summary" : text.split_html( extr(' class="heading">Summary:', "
")), } data["language"] = util.code_to_language(data["lang"]) series = data["series"] if series: extr = text.extract_from(series) data["series"] = { "prev" : extr(' class="previous" href="/works/', '"'), "index": extr(' class="position">Part ', " "), "id" : extr(' href="/series/', '"'), "name" : text.unescape(extr(">", "<")), "next" : extr(' class="next" href="/works/', '"'), } else: data["series"] = None yield Message.Directory, data for fmt in self.formats: try: url = text.urljoin(self.root, fmts[fmt]) except KeyError: self.log.warning("%s: Format '%s' not available", work_id, fmt) else: yield Message.Url, url, text.nameext_from_url(url, data) class Ao3SeriesExtractor(Ao3Extractor): """Extractor for AO3 works of a series""" subcategory = "series" pattern = BASE_PATTERN + r"(/series/(\d+))" example = "https://archiveofourown.org/series/12345" class Ao3TagExtractor(Ao3Extractor): """Extractor for AO3 works by tag""" subcategory = "tag" pattern = BASE_PATTERN + r"(/tags/([^/?#]+)/works(?:/?\?.+)?)" example = "https://archiveofourown.org/tags/TAG/works" class Ao3SearchExtractor(Ao3Extractor): """Extractor for AO3 search results""" subcategory = "search" pattern = BASE_PATTERN + r"(/works/search/?\?.+)" example = "https://archiveofourown.org/works/search?work_search[query]=air" class Ao3UserExtractor(Ao3Extractor): """Extractor for an AO3 user profile""" subcategory = "user" pattern = (BASE_PATTERN + r"/users/([^/?#]+(?:/pseuds/[^/?#]+)?)" r"(?:/profile)?/?(?:$|\?|#)") example = "https://archiveofourown.org/users/USER" def initialize(self): pass def items(self): base = "{}/users/{}/".format(self.root, self.groups[0]) return self._dispatch_extractors(( (Ao3UserWorksExtractor , base + "works"), (Ao3UserSeriesExtractor , base + "series"), (Ao3UserBookmarkExtractor, base + "bookmarks"), ), ("user-works", "user-series")) class Ao3UserWorksExtractor(Ao3Extractor): """Extractor for works of an AO3 user""" subcategory = "user-works" pattern = (BASE_PATTERN + r"(/users/([^/?#]+)/(?:pseuds/([^/?#]+)/)?" r"works(?:/?\?.+)?)") example = "https://archiveofourown.org/users/USER/works" class Ao3UserSeriesExtractor(Ao3Extractor): """Extractor for series of an AO3 user""" subcategory = "user-series" pattern = (BASE_PATTERN + r"(/users/([^/?#]+)/(?:pseuds/([^/?#]+)/)?" r"series(?:/?\?.+)?)") example = "https://archiveofourown.org/users/USER/series" def items(self): self.login() base = self.root + "/series/" data = {"_extractor": Ao3SeriesExtractor} for series_id in self.series(): yield Message.Queue, base + series_id, data def series(self): return self._pagination(self.groups[0], '
  • \n]+)").findall return extr(content) class ArcaliveBoardExtractor(ArcaliveExtractor): """Extractor for an arca.live board's posts""" subcategory = "board" pattern = BASE_PATTERN + r"/b/([^/?#]+)/?(?:\?([^#]+))?$" example = "https://arca.live/b/breaking" def articles(self): self.board, query = self.groups params = text.parse_query(query) return self.api.board(self.board, params) class ArcaliveUserExtractor(ArcaliveExtractor): """Extractor for an arca.live users's posts""" subcategory = "user" pattern = BASE_PATTERN + r"/u/@([^/?#]+)/?(?:\?([^#]+))?$" example = "https://arca.live/u/@USER" def articles(self): self.board = None user, query = self.groups params = text.parse_query(query) return self.api.user_posts(text.unquote(user), params) class ArcaliveAPI(): def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.root = extractor.root + "/api/app" extractor.session.headers["X-Device-Token"] = util.generate_token(64) def board(self, board_slug, params): endpoint = "/list/channel/" + board_slug return self._pagination(endpoint, params, "articles") def post(self, post_id): endpoint = "/view/article/breaking/" + str(post_id) return self._call(endpoint) def user_posts(self, username, params): endpoint = "/list/channel/breaking" params["target"] = "nickname" params["keyword"] = username return self._pagination(endpoint, params, "articles") def _call(self, endpoint, params=None): url = self.root + endpoint response = self.extractor.request(url, params=params) data = response.json() if response.status_code == 200: return data self.log.debug("Server response: %s", data) msg = data.get("message") raise exception.StopExtraction( "API request failed%s", ": " + msg if msg else "") def _pagination(self, endpoint, params, key): while True: data = self._call(endpoint, params) posts = data.get(key) if not posts: break yield from posts params.update(data["next"]) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1746776695.0 gallery_dl-1.29.7/gallery_dl/extractor/architizer.py0000644000175000017500000000575115007331167021306 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://architizer.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text class ArchitizerProjectExtractor(GalleryExtractor): """Extractor for project pages on architizer.com""" category = "architizer" subcategory = "project" root = "https://architizer.com" directory_fmt = ("{category}", "{firm}", "{title}") filename_fmt = "{filename}.{extension}" archive_fmt = "{gid}_{num}" pattern = r"(?:https?://)?architizer\.com/projects/([^/?#]+)" example = "https://architizer.com/projects/NAME/" def __init__(self, match): url = "{}/projects/{}/".format(self.root, match.group(1)) GalleryExtractor.__init__(self, match, url) def metadata(self, page): extr = text.extract_from(page) extr('id="Pages"', "") return { "title" : extr("data-name='", "'"), "slug" : extr("data-slug='", "'"), "gid" : extr("data-gid='", "'").rpartition(".")[2], "firm" : extr("data-firm-leaders-str='", "'"), "location" : extr("

    ", "<").strip(), "type" : text.unescape(text.remove_html(extr( '
    Type
    ', 'STATUS', 'YEAR', 'SIZE', '', '') .replace("
    ", "\n")), } def images(self, page): return [ (url, None) for url in text.extract_iter( page, 'property="og:image:secure_url" content="', "?") ] class ArchitizerFirmExtractor(Extractor): """Extractor for all projects of a firm""" category = "architizer" subcategory = "firm" root = "https://architizer.com" pattern = r"(?:https?://)?architizer\.com/firms/([^/?#]+)" example = "https://architizer.com/firms/NAME/" def __init__(self, match): Extractor.__init__(self, match) self.firm = match.group(1) def items(self): url = url = "{}/firms/{}/?requesting_merlin=pages".format( self.root, self.firm) page = self.request(url).text data = {"_extractor": ArchitizerProjectExtractor} for project in text.extract_iter(page, '
    = data["total_count"]: return params["page"] += 1 def _init_csrf_token(self): url = self.root + "/api/v2/csrf_protection/token.json" headers = { "Accept" : "*/*", "Origin" : self.root, } return self.request( url, method="POST", headers=headers, json={}, ).json()["public_csrf_token"] @staticmethod def _no_cache(url): """Cause a cache miss to prevent Cloudflare 'optimizations' Cloudflare's 'Polish' optimization strips image metadata and may even recompress an image as lossy JPEG. This can be prevented by causing a cache miss when requesting an image by adding a random dummy query parameter. Ref: https://github.com/r888888888/danbooru/issues/3528 https://danbooru.donmai.us/forum_topics/14952 """ sep = "&" if "?" in url else "?" token = util.generate_token(8) return url + sep + token[:4] + "=" + token[4:] class ArtstationUserExtractor(ArtstationExtractor): """Extractor for all projects of an artstation user""" subcategory = "user" pattern = (r"(?:https?://)?(?:(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)(?:/albums/all)?" r"|((?!www)[\w-]+)\.artstation\.com(?:/projects)?)/?$") example = "https://www.artstation.com/USER" def projects(self): url = "{}/users/{}/projects.json".format(self.root, self.user) params = {"album_id": "all"} return self._pagination(url, params) class ArtstationAlbumExtractor(ArtstationExtractor): """Extractor for all projects in an artstation album""" subcategory = "album" directory_fmt = ("{category}", "{userinfo[username]}", "Albums", "{album[id]} - {album[title]}") archive_fmt = "a_{album[id]}_{asset[id]}" pattern = (r"(?:https?://)?(?:(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)" r"|((?!www)[\w-]+)\.artstation\.com)/albums/(\d+)") example = "https://www.artstation.com/USER/albums/12345" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.album_id = text.parse_int(match.group(3)) def metadata(self): userinfo = self.get_user_info(self.user) album = None for album in userinfo["albums_with_community_projects"]: if album["id"] == self.album_id: break else: raise exception.NotFoundError("album") return { "userinfo": userinfo, "album": album } def projects(self): url = "{}/users/{}/projects.json".format(self.root, self.user) params = {"album_id": self.album_id} return self._pagination(url, params) class ArtstationLikesExtractor(ArtstationExtractor): """Extractor for liked projects of an artstation user""" subcategory = "likes" directory_fmt = ("{category}", "{userinfo[username]}", "Likes") archive_fmt = "f_{userinfo[id]}_{asset[id]}" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/likes") example = "https://www.artstation.com/USER/likes" def projects(self): url = "{}/users/{}/likes.json".format(self.root, self.user) return self._pagination(url) class ArtstationCollectionExtractor(ArtstationExtractor): """Extractor for an artstation collection""" subcategory = "collection" directory_fmt = ("{category}", "{user}", "{collection[id]} {collection[name]}") archive_fmt = "c_{collection[id]}_{asset[id]}" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/collections/(\d+)") example = "https://www.artstation.com/USER/collections/12345" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.collection_id = match.group(2) def metadata(self): url = "{}/collections/{}.json".format( self.root, self.collection_id) params = {"username": self.user} collection = self.request( url, params=params, 
notfound="collection").json() return {"collection": collection, "user": self.user} def projects(self): url = "{}/collections/{}/projects.json".format( self.root, self.collection_id) params = {"collection_id": self.collection_id} return self._pagination(url, params) class ArtstationCollectionsExtractor(ArtstationExtractor): """Extractor for an artstation user's collections""" subcategory = "collections" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/collections/?$") example = "https://www.artstation.com/USER/collections" def items(self): url = self.root + "/collections.json" params = {"username": self.user} for collection in self.request( url, params=params, notfound="collections").json(): url = "{}/{}/collections/{}".format( self.root, self.user, collection["id"]) collection["_extractor"] = ArtstationCollectionExtractor yield Message.Queue, url, collection class ArtstationChallengeExtractor(ArtstationExtractor): """Extractor for submissions of artstation challenges""" subcategory = "challenge" filename_fmt = "{submission_id}_{asset_id}_{filename}.{extension}" directory_fmt = ("{category}", "Challenges", "{challenge[id]} - {challenge[title]}") archive_fmt = "c_{challenge[id]}_{asset_id}" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/contests/[^/?#]+/challenges/(\d+)" r"/?(?:\?sorting=([a-z]+))?") example = "https://www.artstation.com/contests/NAME/challenges/12345" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.challenge_id = match.group(1) self.sorting = match.group(2) or "popular" def items(self): challenge_url = "{}/contests/_/challenges/{}.json".format( self.root, self.challenge_id) submission_url = "{}/contests/_/challenges/{}/submissions.json".format( self.root, self.challenge_id) update_url = "{}/contests/submission_updates.json".format( self.root) challenge = self.request(challenge_url).json() yield Message.Directory, {"challenge": challenge} params = {"sorting": self.sorting} for submission in self._pagination(submission_url, params): params = {"submission_id": submission["id"]} for update in self._pagination(update_url, params=params): del update["replies"] update["challenge"] = challenge for url in text.extract_iter( update["body_presentation_html"], ' href="', '"'): update["asset_id"] = self._id_from_url(url) text.nameext_from_url(url, update) yield Message.Url, self._no_cache(url), update @staticmethod def _id_from_url(url): """Get an image's submission ID from its URL""" parts = url.split("/") return text.parse_int("".join(parts[7:10])) class ArtstationSearchExtractor(ArtstationExtractor): """Extractor for artstation search results""" subcategory = "search" directory_fmt = ("{category}", "Searches", "{search[query]}") archive_fmt = "s_{search[query]}_{asset[id]}" pattern = (r"(?:https?://)?(?:\w+\.)?artstation\.com" r"/search/?\?([^#]+)") example = "https://www.artstation.com/search?query=QUERY" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.params = query = text.parse_query(match.group(1)) self.query = text.unquote(query.get("query") or query.get("q", "")) self.sorting = query.get("sort_by", "relevance").lower() self.tags = query.get("tags", "").split(",") def metadata(self): return {"search": { "query" : self.query, "sorting": self.sorting, "tags" : self.tags, }} def projects(self): filters = [] for key, value in self.params.items(): if key.endswith("_ids") or key == "tags": filters.append({ "field" : key, "method": "include", "value" : value.split(","), }) url = 
"{}/api/v2/search/projects.json".format(self.root) data = { "query" : self.query, "page" : None, "per_page" : 50, "sorting" : self.sorting, "pro_first" : ("1" if self.config("pro-first", True) else "0"), "filters" : filters, "additional_fields": (), } return self._pagination(url, json=data) class ArtstationArtworkExtractor(ArtstationExtractor): """Extractor for projects on artstation's artwork page""" subcategory = "artwork" directory_fmt = ("{category}", "Artworks", "{artwork[sorting]!c}") archive_fmt = "A_{asset[id]}" pattern = (r"(?:https?://)?(?:\w+\.)?artstation\.com" r"/artwork/?\?([^#]+)") example = "https://www.artstation.com/artwork?sorting=SORT" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.query = text.parse_query(match.group(1)) def metadata(self): return {"artwork": self.query} def projects(self): url = "{}/projects.json".format(self.root) return self._pagination(url, self.query.copy()) class ArtstationImageExtractor(ArtstationExtractor): """Extractor for images from a single artstation project""" subcategory = "image" pattern = (r"(?:https?://)?(?:" r"(?:[\w-]+\.)?artstation\.com/(?:artwork|projects|search)" r"|artstn\.co/p)/(\w+)") example = "https://www.artstation.com/artwork/abcde" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.project_id = match.group(1) self.assets = None def metadata(self): self.assets = list(ArtstationExtractor.get_project_assets( self, self.project_id)) try: self.user = self.assets[0]["user"]["username"] except IndexError: self.user = "" return ArtstationExtractor.metadata(self) def projects(self): return ({"hash_id": self.project_id},) def get_project_assets(self, project_id): return self.assets class ArtstationFollowingExtractor(ArtstationExtractor): """Extractor for a user's followed users""" subcategory = "following" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/following") example = "https://www.artstation.com/USER/following" def items(self): url = "{}/users/{}/following.json".format(self.root, self.user) for user in self._pagination(url): url = "{}/{}".format(self.root, user["username"]) user["_extractor"] = ArtstationUserExtractor yield Message.Queue, url, user ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1747857760.0 gallery_dl-1.29.7/gallery_dl/extractor/aryion.py0000644000175000017500000002027615013430540020432 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://aryion.com/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache from email.utils import parsedate_tz from datetime import datetime BASE_PATTERN = r"(?:https?://)?(?:www\.)?aryion\.com/g4" class AryionExtractor(Extractor): """Base class for aryion extractors""" category = "aryion" directory_fmt = ("{category}", "{user!l}", "{path:J - }") filename_fmt = "{id} {title}.{extension}" archive_fmt = "{id}" cookies_domain = ".aryion.com" cookies_names = ("phpbb3_rl7a3_sid",) root = "https://aryion.com" def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) self.recursive = True def login(self): if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: self.cookies_update(self._login_impl(username, password)) @cache(maxage=14*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/forum/ucp.php?mode=login" data = { "username": username, "password": password, "login": "Login", } response = self.request(url, method="POST", data=data) if b"You have been successfully logged in." not in response.content: raise exception.AuthenticationError() return {c: response.cookies[c] for c in self.cookies_names} def items(self): self.login() data = self.metadata() for post_id in self.posts(): post = self._parse_post(post_id) if post: if data: post.update(data) yield Message.Directory, post yield Message.Url, post["url"], post elif post is False and self.recursive: base = self.root + "/g4/view/" data = {"_extractor": AryionPostExtractor} for post_id in self._pagination_params(base + post_id): yield Message.Queue, base + post_id, data def posts(self): """Yield relevant post IDs""" def metadata(self): """Return general metadata""" def _pagination_params(self, url, params=None, needle=None): if params is None: params = {"p": 1} else: params["p"] = text.parse_int(params.get("p"), 1) if needle is None: needle = "class='gallery-item' id='" while True: page = self.request(url, params=params).text cnt = 0 for post_id in text.extract_iter(page, needle, "'"): cnt += 1 yield post_id if cnt < 40: return params["p"] += 1 def _pagination_next(self, url): while True: page = self.request(url).text yield from text.extract_iter(page, "thumb' href='/g4/view/", "'") pos = page.find("Next >>") if pos < 0: return url = self.root + text.rextract(page, "href='", "'", pos)[0] def _parse_post(self, post_id): url = "{}/g4/data.php?id={}".format(self.root, post_id) with self.request(url, method="HEAD", fatal=False) as response: if response.status_code >= 400: self.log.warning( "Unable to fetch post %s ('%s %s')", post_id, response.status_code, response.reason) return None headers = response.headers # folder if headers["content-type"] in ( "application/x-folder", "application/x-comic-folder", "application/x-comic-folder-nomerge", ): return False # get filename from 'Content-Disposition' header cdis = headers["content-disposition"] fname, _, ext = text.extr(cdis, 'filename="', '"').rpartition(".") if not fname: fname, ext = ext, fname # get file size from 'Content-Length' header clen = headers.get("content-length") # fix 'Last-Modified' header lmod = headers["last-modified"] if lmod[22] != ":": lmod = "{}:{} GMT".format(lmod[:22], lmod[22:24]) post_url = "{}/g4/view/{}".format(self.root, post_id) extr = text.extract_from(self.request(post_url).text) title, _, artist = text.unescape(extr( "g4 :: ", "<")).rpartition(" by ") return { "id" : text.parse_int(post_id), "url" : url, "user" : self.user or artist, 
"title" : title, "artist": artist, "path" : text.split_html(extr( "cookiecrumb'>", '</span'))[4:-1:2], "date" : datetime(*parsedate_tz(lmod)[:6]), "size" : text.parse_int(clen), "views" : text.parse_int(extr("Views</b>:", "<").replace(",", "")), "width" : text.parse_int(extr("Resolution</b>:", "x")), "height": text.parse_int(extr("", "<")), "comments" : text.parse_int(extr("Comments</b>:", "<")), "favorites": text.parse_int(extr("Favorites</b>:", "<")), "tags" : text.split_html(extr("class='taglist'>", "</span>")), "description": text.unescape(text.remove_html(extr( "<p>", "</p>"), "", "")), "filename" : fname, "extension": ext, "_http_lastmodified": lmod, } class AryionGalleryExtractor(AryionExtractor): """Extractor for a user's gallery on eka's portal""" subcategory = "gallery" categorytransfer = True pattern = BASE_PATTERN + r"/(?:gallery/|user/|latest.php\?name=)([^/?#]+)" example = "https://aryion.com/g4/gallery/USER" def __init__(self, match): AryionExtractor.__init__(self, match) self.offset = 0 def _init(self): self.recursive = self.config("recursive", True) def skip(self, num): if self.recursive: return 0 self.offset += num return num def posts(self): if self.recursive: url = "{}/g4/gallery/{}".format(self.root, self.user) return self._pagination_params(url) else: url = "{}/g4/latest.php?name={}".format(self.root, self.user) return util.advance(self._pagination_next(url), self.offset) class AryionFavoriteExtractor(AryionExtractor): """Extractor for a user's favorites gallery""" subcategory = "favorite" directory_fmt = ("{category}", "{user!l}", "favorites") archive_fmt = "f_{user}_{id}" categorytransfer = True pattern = BASE_PATTERN + r"/favorites/([^/?#]+)" example = "https://aryion.com/g4/favorites/USER" def posts(self): url = "{}/g4/favorites/{}".format(self.root, self.user) return self._pagination_params( url, None, "class='gallery-item favorite' id='") class AryionTagExtractor(AryionExtractor): """Extractor for tag searches on eka's portal""" subcategory = "tag" directory_fmt = ("{category}", "tags", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/tags\.php\?([^#]+)" example = "https://aryion.com/g4/tags.php?tag=TAG" def _init(self): self.params = text.parse_query(self.user) self.user = None def metadata(self): return {"search_tags": self.params.get("tag")} def posts(self): url = self.root + "/g4/tags.php" return self._pagination_params(url, self.params) class AryionPostExtractor(AryionExtractor): """Extractor for individual posts on eka's portal""" subcategory = "post" pattern = BASE_PATTERN + r"/view/(\d+)" example = "https://aryion.com/g4/view/12345" def posts(self): post_id, self.user = self.user, None return (post_id,) ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1746776695.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/batoto.py����������������������������������������������������0000644�0001750�0001750�00000013275�15007331167�020432� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bato.to/""" from .common import Extractor, ChapterExtractor, MangaExtractor from .. import text, exception import re BASE_PATTERN = (r"(?:https?://)?(" r"(?:ba|d|f|h|j|m|w)to\.to|" r"(?:(?:manga|read)toto|batocomic|[xz]bato)\.(?:com|net|org)|" r"comiko\.(?:net|org)|" r"bat(?:otoo|o?two)\.com)") # https://rentry.co/batoto DOMAINS = { "dto.to", "fto.to", "hto.to", "jto.to", "mto.to", "wto.to", "xbato.com", "xbato.net", "xbato.org", "zbato.com", "zbato.net", "zbato.org", "readtoto.com", "readtoto.net", "readtoto.org", "batocomic.com", "batocomic.net", "batocomic.org", "batotoo.com", "batotwo.com", "comiko.net", "comiko.org", "battwo.com", } LEGACY_DOMAINS = { "bato.to", "mangatoto.com", "mangatoto.net", "mangatoto.org", } class BatotoBase(): """Base class for batoto extractors""" category = "batoto" root = "https://xbato.org" _warn_legacy = True def _init_root(self): domain = self.config("domain") if domain is None or domain in {"auto", "url"}: domain = self.groups[0] if domain in LEGACY_DOMAINS: if self._warn_legacy: BatotoBase._warn_legacy = False self.log.warning("Legacy domain '%s'", domain) elif domain == "nolegacy": domain = self.groups[0] if domain in LEGACY_DOMAINS: domain = "xbato.org" elif domain == "nowarn": domain = self.groups[0] self.root = "https://" + domain def request(self, url, **kwargs): kwargs["encoding"] = "utf-8" return Extractor.request(self, url, **kwargs) class BatotoChapterExtractor(BatotoBase, ChapterExtractor): """Extractor for batoto manga chapters""" archive_fmt = "{chapter_id}_{page}" pattern = BASE_PATTERN + r"/(?:title/[^/?#]+|chapter)/(\d+)" example = "https://xbato.org/title/12345-MANGA/54321" def __init__(self, match): ChapterExtractor.__init__(self, match, False) self._init_root() self.chapter_id = self.groups[1] self.gallery_url = "{}/title/0/{}".format(self.root, self.chapter_id) def metadata(self, page): extr = text.extract_from(page) try: manga, info, _ = 
extr("<title>", "<").rsplit(" - ", 3) except ValueError: manga = info = None manga_id = text.extr( extr('rel="canonical" href="', '"'), "/title/", "/") if not manga: manga = extr('link-hover">', "<") info = text.remove_html(extr('link-hover">', "</")) info = text.unescape(info) match = re.match( r"(?i)(?:(?:Volume|S(?:eason)?)\s*(\d+)\s+)?" r"(?:Chapter|Episode)\s*(\d+)([\w.]*)", info) if match: volume, chapter, minor = match.groups() else: volume = chapter = 0 minor = "" return { "manga" : text.unescape(manga), "manga_id" : text.parse_int(manga_id), "chapter_url" : extr(self.chapter_id + "-ch_", '"'), "title" : text.unescape(text.remove_html(extr( "selected>", "</option")).partition(" : ")[2]), "volume" : text.parse_int(volume), "chapter" : text.parse_int(chapter), "chapter_minor" : minor, "chapter_string": info, "chapter_id" : text.parse_int(self.chapter_id), "date" : text.parse_timestamp(extr(' time="', '"')[:-3]), } def images(self, page): images_container = text.extr(page, 'pageOpts', ':[0,0]}"') images_container = text.unescape(images_container) return [ (url, None) for url in text.extract_iter(images_container, r"\"", r"\"") ] class BatotoMangaExtractor(BatotoBase, MangaExtractor): """Extractor for batoto manga""" reverse = False chapterclass = BatotoChapterExtractor pattern = (BASE_PATTERN + r"/(?:title/(\d+)[^/?#]*|series/(\d+)(?:/[^/?#]*)?)/?$") example = "https://xbato.org/title/12345-MANGA/" def __init__(self, match): MangaExtractor.__init__(self, match, False) self._init_root() self.manga_id = self.groups[1] or self.groups[2] self.manga_url = "{}/title/{}".format(self.root, self.manga_id) def chapters(self, page): extr = text.extract_from(page) warning = extr(' class="alert alert-warning">', "</div><") if warning: raise exception.StopExtraction("'%s'", text.remove_html(warning)) data = { "manga_id": text.parse_int(self.manga_id), "manga" : text.unescape(extr( "<title>", "<").rpartition(" - ")[0]), } extr('<div data-hk="0-0-0-0"', "") results = [] while True: href = extr('<a href="/title/', '"') if not href: break chapter = href.rpartition("-ch_")[2] chapter, sep, minor = chapter.partition(".") data["chapter"] = text.parse_int(chapter) data["chapter_minor"] = sep + minor data["date"] = text.parse_datetime( extr('time="', '"'), "%Y-%m-%dT%H:%M:%S.%fZ") url = "{}/title/{}".format(self.root, href) results.append((url, data.copy())) return results �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1746776695.0 
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/bbc.py�������������������������������������������������������0000644�0001750�0001750�00000006573�15007331167�017673� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bbc.co.uk/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util BASE_PATTERN = r"(?:https?://)?(?:www\.)?bbc\.co\.uk(/programmes/" class BbcGalleryExtractor(GalleryExtractor): """Extractor for a programme gallery on bbc.co.uk""" category = "bbc" root = "https://www.bbc.co.uk" directory_fmt = ("{category}", "{path[0]}", "{path[1]}", "{path[2]}", "{path[3:]:J - /}") filename_fmt = "{num:>02}.{extension}" archive_fmt = "{programme}_{num}" pattern = BASE_PATTERN + r"[^/?#]+(?!/galleries)(?:/[^/?#]+)?)$" example = "https://www.bbc.co.uk/programmes/PATH" def metadata(self, page): data = self._extract_jsonld(page) return { "title": text.unescape(text.extr( page, "<h1>", "</h1>").rpartition("</span>")[2]), "description": text.unescape(text.extr( page, 'property="og:description" content="', '"')), "programme": self.gallery_url.split("/")[4], "path": list(util.unique_sequence( element["name"] for element in data["itemListElement"] )), } def images(self, page): width = self.config("width") width = width - width % 16 if width else 1920 dimensions = "/{}xn/".format(width) results = [] for img in text.extract_iter(page, 'class="gallery__thumbnail', ">"): src = text.extr(img, 'data-image-src="', '"') results.append(( src.replace("/320x180_b/", dimensions), { "title_image": text.unescape(text.extr( img, 'data-gallery-title="', '"')), "synopsis": text.unescape(text.extr( img, 'data-gallery-synopsis="', '"')), "_fallback": self._fallback_urls(src, width), }, )) return results @staticmethod def _fallback_urls(src, max_width): front, _, back = src.partition("/320x180_b/") for width in (1920, 1600, 1280, 976): if width < max_width: yield "{}/{}xn/{}".format(front, width, back) class BbcProgrammeExtractor(Extractor): """Extractor for all galleries of a bbc programme""" category = "bbc" subcategory = "programme" root = "https://www.bbc.co.uk" pattern = BASE_PATTERN + r"[^/?#]+/galleries)(?:/?\?page=(\d+))?" 
example = "https://www.bbc.co.uk/programmes/ID/galleries" def items(self): path, pnum = self.groups data = {"_extractor": BbcGalleryExtractor} params = {"page": text.parse_int(pnum, 1)} galleries_url = self.root + path while True: page = self.request(galleries_url, params=params).text for programme_id in text.extract_iter( page, '<a href="https://www.bbc.co.uk/programmes/', '"'): url = "https://www.bbc.co.uk/programmes/" + programme_id yield Message.Queue, url, data if 'rel="next"' not in page: return params["page"] += 1 �������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1745260818.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/behance.py���������������������������������������������������0000644�0001750�0001750�00000037334�15001510422�020514� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.behance.net/""" from .common import Extractor, Message from .. 
import text, util, exception class BehanceExtractor(Extractor): """Base class for behance extractors""" category = "behance" root = "https://www.behance.net" request_interval = (2.0, 4.0) def _init(self): self._bcp = self.cookies.get("bcp", domain="www.behance.net") if not self._bcp: self._bcp = "4c34489d-914c-46cd-b44c-dfd0e661136d" self.cookies.set("bcp", self._bcp, domain="www.behance.net") def items(self): for gallery in self.galleries(): gallery["_extractor"] = BehanceGalleryExtractor yield Message.Queue, gallery["url"], self._update(gallery) def galleries(self): """Return all relevant gallery URLs""" def _request_graphql(self, endpoint, variables): url = self.root + "/v3/graphql" headers = { "Origin": self.root, "X-BCP" : self._bcp, "X-Requested-With": "XMLHttpRequest", } data = { "query" : GRAPHQL_QUERIES[endpoint], "variables": variables, } return self.request(url, method="POST", headers=headers, json=data).json()["data"] def _update(self, data): # compress data to simple lists if data.get("fields") and isinstance(data["fields"][0], dict): data["fields"] = [ field.get("name") or field.get("label") for field in data["fields"] ] data["owners"] = [ owner.get("display_name") or owner.get("displayName") for owner in data["owners"] ] tags = data.get("tags") or () if tags and isinstance(tags[0], dict): tags = [tag["title"] for tag in tags] data["tags"] = tags data["date"] = text.parse_timestamp( data.get("publishedOn") or data.get("conceived_on") or 0) # backwards compatibility data["gallery_id"] = data["id"] data["title"] = data["name"] data["user"] = ", ".join(data["owners"]) return data class BehanceGalleryExtractor(BehanceExtractor): """Extractor for image galleries from www.behance.net""" subcategory = "gallery" directory_fmt = ("{category}", "{owners:J, }", "{id} {name}") filename_fmt = "{category}_{id}_{num:>02}.{extension}" archive_fmt = "{id}_{num}" pattern = r"(?:https?://)?(?:www\.)?behance\.net/gallery/(\d+)" example = "https://www.behance.net/gallery/12345/TITLE" def __init__(self, match): BehanceExtractor.__init__(self, match) self.gallery_id = match.group(1) def _init(self): BehanceExtractor._init(self) modules = self.config("modules") if modules: if isinstance(modules, str): modules = modules.split(",") self.modules = set(modules) else: self.modules = {"image", "video", "mediacollection", "embed"} def items(self): data = self.get_gallery_data() imgs = self.get_images(data) data["count"] = len(imgs) yield Message.Directory, data for data["num"], (url, module) in enumerate(imgs, 1): data["module"] = module data["extension"] = (module.get("extension") or text.ext_from_url(url)) yield Message.Url, url, data def get_gallery_data(self): """Collect gallery info dict""" url = "{}/gallery/{}/a".format(self.root, self.gallery_id) cookies = { "gki": '{"feature_project_view":false,' '"feature_discover_login_prompt":false,' '"feature_project_login_prompt":false}', "ilo0": "true", } page = self.request(url, cookies=cookies).text data = util.json_loads(text.extr( page, 'id="beconfig-store_state">', '</script>')) return self._update(data["project"]["project"]) def get_images(self, data): """Extract image results from an API response""" if not data["modules"]: access = data.get("matureAccess") if access == "logged-out": raise exception.AuthorizationError( "Mature content galleries require logged-in cookies") if access == "restricted-safe": raise exception.AuthorizationError( "Mature content blocked in account settings") if access and access != "allowed": raise 
exception.AuthorizationError() return () result = [] append = result.append for module in data["modules"]: mtype = module["__typename"][:-6].lower() if mtype not in self.modules: self.log.debug("Skipping '%s' module", mtype) continue if mtype == "image": sizes = { size["url"].rsplit("/", 2)[1]: size for size in module["imageSizes"]["allAvailable"] } size = (sizes.get("source") or sizes.get("max_3840") or sizes.get("fs") or sizes.get("hd") or sizes.get("disp")) append((size["url"], module)) elif mtype == "video": try: url = text.extr(module["embed"], 'src="', '"') page = self.request(text.unescape(url)).text url = text.extr(page, '<source src="', '"') if text.ext_from_url(url) == "m3u8": url = "ytdl:" + url module["_ytdl_manifest"] = "hls" module["extension"] = "mp4" append((url, module)) continue except Exception as exc: self.log.debug("%s: %s", exc.__class__.__name__, exc) try: renditions = module["videoData"]["renditions"] except Exception: self.log.warning("No download URLs for video %s", module.get("id") or "???") continue try: url = [ r["url"] for r in renditions if text.ext_from_url(r["url"]) != "m3u8" ][-1] except Exception as exc: self.log.debug("%s: %s", exc.__class__.__name__, exc) url = "ytdl:" + renditions[-1]["url"] append((url, module)) elif mtype == "mediacollection": for component in module["components"]: for size in component["imageSizes"].values(): if size: parts = size["url"].split("/") parts[4] = "source" append(("/".join(parts), module)) break elif mtype == "embed": embed = module.get("originalEmbed") or module.get("fluidEmbed") if embed: embed = text.unescape(text.extr(embed, 'src="', '"')) module["extension"] = "mp4" append(("ytdl:" + embed, module)) elif mtype == "text": module["extension"] = "txt" append(("text:" + module["text"], module)) return result class BehanceUserExtractor(BehanceExtractor): """Extractor for a user's galleries from www.behance.net""" subcategory = "user" categorytransfer = True pattern = r"(?:https?://)?(?:www\.)?behance\.net/([^/?#]+)/?$" example = "https://www.behance.net/USER" def __init__(self, match): BehanceExtractor.__init__(self, match) self.user = match.group(1) def galleries(self): endpoint = "GetProfileProjects" variables = { "username": self.user, "after" : "MAo=", # "0" in base64 } while True: data = self._request_graphql(endpoint, variables) items = data["user"]["profileProjects"] yield from items["nodes"] if not items["pageInfo"]["hasNextPage"]: return variables["after"] = items["pageInfo"]["endCursor"] class BehanceCollectionExtractor(BehanceExtractor): """Extractor for a collection's galleries from www.behance.net""" subcategory = "collection" categorytransfer = True pattern = r"(?:https?://)?(?:www\.)?behance\.net/collection/(\d+)" example = "https://www.behance.net/collection/12345/TITLE" def __init__(self, match): BehanceExtractor.__init__(self, match) self.collection_id = match.group(1) def galleries(self): endpoint = "GetMoodboardItemsAndRecommendations" variables = { "afterItem": "MAo=", # "0" in base64 "firstItem": 40, "id" : int(self.collection_id), "shouldGetItems" : True, "shouldGetMoodboardFields": False, "shouldGetRecommendations": False, } while True: data = self._request_graphql(endpoint, variables) items = data["moodboard"]["items"] for node in items["nodes"]: yield node["entity"] if not items["pageInfo"]["hasNextPage"]: return variables["afterItem"] = items["pageInfo"]["endCursor"] GRAPHQL_QUERIES = { "GetProfileProjects": """\ query GetProfileProjects($username: String, $after: String) { user(username: 
$username) { profileProjects(first: 12, after: $after) { pageInfo { endCursor hasNextPage } nodes { __typename adminFlags { mature_lock privacy_lock dmca_lock flagged_lock privacy_violation_lock trademark_lock spam_lock eu_ip_lock } colors { r g b } covers { size_202 { url } size_404 { url } size_808 { url } } features { url name featuredOn ribbon { image image2x image3x } } fields { id label slug url } hasMatureContent id isFeatured isHiddenFromWorkTab isMatureReviewSubmitted isOwner isFounder isPinnedToSubscriptionOverview isPrivate linkedAssets { ...sourceLinkFields } linkedAssetsCount sourceFiles { ...sourceFileFields } matureAccess modifiedOn name owners { ...OwnerFields images { size_50 { url } } } premium publishedOn stats { appreciations { all } views { all } comments { all } } slug tools { id title category categoryLabel categoryId approved url backgroundColor } url } } } } fragment sourceFileFields on SourceFile { __typename sourceFileId projectId userId title assetId renditionUrl mimeType size category licenseType unitAmount currency tier hidden extension hasUserPurchased } fragment sourceLinkFields on LinkedAsset { __typename name premium url category licenseType } fragment OwnerFields on User { displayName hasPremiumAccess id isFollowing isProfileOwner location locationUrl url username availabilityInfo { availabilityTimeline isAvailableFullTime isAvailableFreelance } } """, "GetMoodboardItemsAndRecommendations": """\ query GetMoodboardItemsAndRecommendations( $id: Int! $firstItem: Int! $afterItem: String $shouldGetRecommendations: Boolean! $shouldGetItems: Boolean! $shouldGetMoodboardFields: Boolean! ) { viewer @include(if: $shouldGetMoodboardFields) { isOptedOutOfRecommendations isAdmin } moodboard(id: $id) { ...moodboardFields @include(if: $shouldGetMoodboardFields) items(first: $firstItem, after: $afterItem) @include(if: $shouldGetItems) { pageInfo { endCursor hasNextPage } nodes { ...nodesFields } } recommendedItems(first: 80) @include(if: $shouldGetRecommendations) { nodes { ...nodesFields fetchSource } } } } fragment moodboardFields on Moodboard { id label privacy followerCount isFollowing projectCount url isOwner owners { ...OwnerFields images { size_50 { url } size_100 { url } size_115 { url } size_230 { url } size_138 { url } size_276 { url } } } } fragment projectFields on Project { __typename id isOwner publishedOn matureAccess hasMatureContent modifiedOn name url isPrivate slug license { license description id label url text images } fields { label } colors { r g b } owners { ...OwnerFields images { size_50 { url } size_100 { url } size_115 { url } size_230 { url } size_138 { url } size_276 { url } } } covers { size_original { url } size_max_808 { url } size_808 { url } size_404 { url } size_202 { url } size_230 { url } size_115 { url } } stats { views { all } appreciations { all } comments { all } } } fragment exifDataValueFields on exifDataValue { id label value searchValue } fragment nodesFields on MoodboardItem { id entityType width height flexWidth flexHeight images { size url } entity { ... on Project { ...projectFields } ... 
on ImageModule { project { ...projectFields } colors { r g b } exifData { lens { ...exifDataValueFields } software { ...exifDataValueFields } makeAndModel { ...exifDataValueFields } focalLength { ...exifDataValueFields } iso { ...exifDataValueFields } location { ...exifDataValueFields } flash { ...exifDataValueFields } exposureMode { ...exifDataValueFields } shutterSpeed { ...exifDataValueFields } aperture { ...exifDataValueFields } } } ... on MediaCollectionComponent { project { ...projectFields } } } } fragment OwnerFields on User { displayName hasPremiumAccess id isFollowing isProfileOwner location locationUrl url username availabilityInfo { availabilityTimeline isAvailableFullTime isAvailableFreelance } } """, } ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1746776695.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/bilibili.py��������������������������������������������������0000644�0001750�0001750�00000012775�15007331167�020725� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.bilibili.com/""" from .common import Extractor, Message from .. 
import text, util, exception class BilibiliExtractor(Extractor): """Base class for bilibili extractors""" category = "bilibili" root = "https://www.bilibili.com" request_interval = (3.0, 6.0) def _init(self): self.api = BilibiliAPI(self) class BilibiliUserArticlesExtractor(BilibiliExtractor): """Extractor for a bilibili user's articles""" subcategory = "user-articles" pattern = (r"(?:https?://)?space\.bilibili\.com/(\d+)" r"/(?:article|upload/opus)") example = "https://space.bilibili.com/12345/article" def items(self): for article in self.api.user_articles(self.groups[0]): article["_extractor"] = BilibiliArticleExtractor url = "{}/opus/{}".format(self.root, article["opus_id"]) yield Message.Queue, url, article class BilibiliArticleExtractor(BilibiliExtractor): """Extractor for a bilibili article""" subcategory = "article" pattern = (r"(?:https?://)?" r"(?:t\.bilibili\.com|(?:www\.)?bilibili.com/opus)/(\d+)") example = "https://www.bilibili.com/opus/12345" directory_fmt = ("{category}", "{username}") filename_fmt = "{id}_{num}.{extension}" archive_fmt = "{id}_{num}" def items(self): article = self.api.article(self.groups[0]) # Flatten modules list modules = {} for module in article["detail"]["modules"]: del module['module_type'] modules.update(module) article["detail"]["modules"] = modules article["username"] = modules["module_author"]["name"] pics = [] if "module_top" in modules: try: pics.extend(modules["module_top"]["display"]["album"]["pics"]) except Exception: pass for paragraph in modules['module_content']['paragraphs']: if "pic" not in paragraph: continue try: pics.extend(paragraph["pic"]["pics"]) except Exception: pass article["count"] = len(pics) yield Message.Directory, article for article["num"], pic in enumerate(pics, 1): url = pic["url"] article.update(pic) yield Message.Url, url, text.nameext_from_url(url, article) class BilibiliUserArticlesFavoriteExtractor(BilibiliExtractor): subcategory = "user-articles-favorite" pattern = (r"(?:https?://)?space\.bilibili\.com" r"/(\d+)/favlist\?fid=opus") example = "https://space.bilibili.com/12345/favlist?fid=opus" _warning = True def _init(self): BilibiliExtractor._init(self) if self._warning: if not self.cookies_check(("SESSDATA",)): self.log.error("'SESSDATA' cookie required") BilibiliUserArticlesFavoriteExtractor._warning = False def items(self): for article in self.api.user_favlist(): article["_extractor"] = BilibiliArticleExtractor url = "{}/opus/{}".format(self.root, article["opus_id"]) yield Message.Queue, url, article class BilibiliAPI(): def __init__(self, extractor): self.extractor = extractor def _call(self, endpoint, params): url = "https://api.bilibili.com/x/polymer/web-dynamic/v1" + endpoint data = self.extractor.request(url, params=params).json() if data["code"] != 0: self.extractor.log.debug("Server response: %s", data) raise exception.StopExtraction("API request failed") return data def user_articles(self, user_id): endpoint = "/opus/feed/space" params = {"host_mid": user_id} while True: data = self._call(endpoint, params) for item in data["data"]["items"]: params["offset"] = item["opus_id"] yield item if not data["data"]["has_more"]: break def article(self, article_id): url = "https://www.bilibili.com/opus/" + article_id while True: page = self.extractor.request(url).text try: return util.json_loads(text.extr( page, "window.__INITIAL_STATE__=", "};") + "}") except Exception: if "window._riskdata_" not in page: raise exception.StopExtraction( "%s: Unable to extract INITIAL_STATE data", article_id) 
                self.extractor.wait(seconds=300)

    def user_favlist(self):
        endpoint = "/opus/feed/fav"
        params = {"page": 1, "page_size": 20}

        while True:
            data = self._call(endpoint, params)["data"]

            yield from data["items"]

            if not data.get("has_more"):
                break
            params["page"] += 1

    def login_user_id(self):
        url = "https://api.bilibili.com/x/space/v2/myinfo"
        data = self.extractor.request(url).json()

        if data["code"] != 0:
            self.extractor.log.debug("Server response: %s", data)
            raise exception.StopExtraction(
                "API request failed. Are you logged in?")
        try:
            return data["data"]["profile"]["mid"]
        except Exception:
            raise exception.StopExtraction("API request failed")


# gallery_dl-1.29.7/gallery_dl/extractor/blogger.py

# -*- coding: utf-8 -*-

# Copyright 2019-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for Blogger blogs"""

from .common import BaseExtractor, Message
from ..
import text, util import re class BloggerExtractor(BaseExtractor): """Base class for blogger extractors""" basecategory = "blogger" directory_fmt = ("blogger", "{blog[name]}", "{post[date]:%Y-%m-%d} {post[title]}") filename_fmt = "{num:>03}.{extension}" archive_fmt = "{post[id]}_{num}" def _init(self): self.api = BloggerAPI(self) self.blog = self.root.rpartition("/")[2] self.videos = self.config("videos", True) def items(self): blog = self.api.blog_by_url("http://" + self.blog) blog["pages"] = blog["pages"]["totalItems"] blog["posts"] = blog["posts"]["totalItems"] blog["date"] = text.parse_datetime(blog["published"]) del blog["selfLink"] sub = re.compile(r"(/|=)(?:[sw]\d+|w\d+-h\d+)(?=/|$)").sub findall_image = re.compile( r'src="(https?://(?:' r'blogger\.googleusercontent\.com/img|' r'lh\d+(?:-\w+)?\.googleusercontent\.com|' r'\d+\.bp\.blogspot\.com)/[^"]+)').findall findall_video = re.compile( r'src="(https?://www\.blogger\.com/video\.g\?token=[^"]+)').findall metadata = self.metadata() for post in self.posts(blog): content = post["content"] files = findall_image(content) for idx, url in enumerate(files): files[idx] = sub(r"\1s0", url).replace("http:", "https:", 1) if self.videos and 'id="BLOG_video-' in content: page = self.request(post["url"]).text for url in findall_video(page): page = self.request(url).text video_config = util.json_loads(text.extr( page, 'var VIDEO_CONFIG =', '\n')) files.append(max( video_config["streams"], key=lambda x: x["format_id"], )["play_url"]) post["author"] = post["author"]["displayName"] post["replies"] = post["replies"]["totalItems"] post["content"] = text.remove_html(content) post["date"] = text.parse_datetime(post["published"]) del post["selfLink"] del post["blog"] data = {"blog": blog, "post": post} if metadata: data.update(metadata) yield Message.Directory, data for data["num"], url in enumerate(files, 1): data["url"] = url yield Message.Url, url, text.nameext_from_url(url, data) def posts(self, blog): """Return an iterable with all relevant post objects""" def metadata(self): """Return additional metadata""" BASE_PATTERN = BloggerExtractor.update({ "blogspot": { "root": None, "pattern": r"[\w-]+\.blogspot\.com", }, }) class BloggerPostExtractor(BloggerExtractor): """Extractor for a single blog post""" subcategory = "post" pattern = BASE_PATTERN + r"(/\d\d\d\d/\d\d/[^/?#]+\.html)" example = "https://BLOG.blogspot.com/1970/01/TITLE.html" def __init__(self, match): BloggerExtractor.__init__(self, match) self.path = match.group(match.lastindex) def posts(self, blog): return (self.api.post_by_path(blog["id"], self.path),) class BloggerBlogExtractor(BloggerExtractor): """Extractor for an entire Blogger blog""" subcategory = "blog" pattern = BASE_PATTERN + r"/?$" example = "https://BLOG.blogspot.com/" def posts(self, blog): return self.api.blog_posts(blog["id"]) class BloggerSearchExtractor(BloggerExtractor): """Extractor for Blogger search resuls""" subcategory = "search" pattern = BASE_PATTERN + r"/search/?\?q=([^&#]+)" example = "https://BLOG.blogspot.com/search?q=QUERY" def __init__(self, match): BloggerExtractor.__init__(self, match) self.query = text.unquote(match.group(match.lastindex)) def posts(self, blog): return self.api.blog_search(blog["id"], self.query) def metadata(self): return {"query": self.query} class BloggerLabelExtractor(BloggerExtractor): """Extractor for Blogger posts by label""" subcategory = "label" pattern = BASE_PATTERN + r"/search/label/([^/?#]+)" example = "https://BLOG.blogspot.com/search/label/LABEL" def __init__(self, 
match): BloggerExtractor.__init__(self, match) self.label = text.unquote(match.group(match.lastindex)) def posts(self, blog): return self.api.blog_posts(blog["id"], self.label) def metadata(self): return {"label": self.label} class BloggerAPI(): """Minimal interface for the Blogger v3 API Ref: https://developers.google.com/blogger """ API_KEY = "AIzaSyCN9ax34oMMyM07g_M-5pjeDp_312eITK8" def __init__(self, extractor): self.extractor = extractor self.api_key = extractor.config("api-key") or self.API_KEY def blog_by_url(self, url): return self._call("blogs/byurl", {"url": url}, "blog") def blog_posts(self, blog_id, label=None): endpoint = "blogs/{}/posts".format(blog_id) params = {"labels": label} return self._pagination(endpoint, params) def blog_search(self, blog_id, query): endpoint = "blogs/{}/posts/search".format(blog_id) params = {"q": query} return self._pagination(endpoint, params) def post_by_path(self, blog_id, path): endpoint = "blogs/{}/posts/bypath".format(blog_id) return self._call(endpoint, {"path": path}, "post") def _call(self, endpoint, params, notfound=None): url = "https://www.googleapis.com/blogger/v3/" + endpoint params["key"] = self.api_key return self.extractor.request( url, params=params, notfound=notfound).json() def _pagination(self, endpoint, params): while True: data = self._call(endpoint, params) if "items" in data: yield from data["items"] if "nextPageToken" not in data: return params["pageToken"] = data["nextPageToken"] ���������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1747857760.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/bluesky.py���������������������������������������������������0000644�0001750�0001750�00000047416�15013430540�020614� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2024 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bsky.app/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache, memcache BASE_PATTERN = (r"(?:https?://)?" r"(?:(?:www\.)?(?:c|[fv]x)?bs[ky]y[ex]?\.app|main\.bsky\.dev)") USER_PATTERN = BASE_PATTERN + r"/profile/([^/?#]+)" class BlueskyExtractor(Extractor): """Base class for bluesky extractors""" category = "bluesky" directory_fmt = ("{category}", "{author[handle]}") filename_fmt = "{createdAt[:19]}_{post_id}_{num}.{extension}" archive_fmt = "{filename}" root = "https://bsky.app" def _init(self): meta = self.config("metadata") or () if meta: if isinstance(meta, str): meta = meta.replace(" ", "").split(",") elif not isinstance(meta, (list, tuple)): meta = ("user", "facets") self._metadata_user = ("user" in meta) self._metadata_facets = ("facets" in meta) self.api = BlueskyAPI(self) self._user = self._user_did = None self.instance = self.root.partition("://")[2] self.videos = self.config("videos", True) self.quoted = self.config("quoted", False) def items(self): for post in self.posts(): if "post" in post: post = post["post"] if self._user_did and post["author"]["did"] != self._user_did: self.log.debug("Skipping %s (repost)", self._pid(post)) continue embed = post.get("embed") try: post.update(post.pop("record")) except Exception: self.log.debug("Skipping %s (no 'record')", self._pid(post)) continue while True: self._prepare(post) files = self._extract_files(post) yield Message.Directory, post if files: did = post["author"]["did"] base = ( "{}/xrpc/com.atproto.sync.getBlob?did={}&cid=".format( self.api.service_endpoint(did), did)) for post["num"], file in enumerate(files, 1): post.update(file) yield Message.Url, base + file["filename"], post if not self.quoted or not embed or "record" not in embed: break quote = embed["record"] if "record" in quote: quote = quote["record"] value = quote.pop("value", None) if value is None: break quote["quote_id"] = self._pid(post) quote["quote_by"] = post["author"] embed = quote.get("embed") quote.update(value) post = quote def posts(self): return () def _posts_records(self, actor, collection): depth = self.config("depth", "0") for record in self.api.list_records(actor, collection): uri = None try: uri = record["value"]["subject"]["uri"] if "/app.bsky.feed.post/" in uri: yield from self.api.get_post_thread_uri(uri, depth) except exception.StopExtraction: pass # deleted post except Exception as exc: self.log.debug(record, exc_info=exc) self.log.warning("Failed to extract %s (%s: %s)", uri or "record", exc.__class__.__name__, exc) def _pid(self, post): return post["uri"].rpartition("/")[2] @memcache(keyarg=1) def _instance(self, handle): return ".".join(handle.rsplit(".", 2)[-2:]) def _prepare(self, post): author = post["author"] author["instance"] = self._instance(author["handle"]) if self._metadata_facets: if "facets" in post: post["hashtags"] = tags = [] post["mentions"] = dids = [] post["uris"] = uris = [] for facet in post["facets"]: features = facet["features"][0] if "tag" in features: tags.append(features["tag"]) elif "did" in features: dids.append(features["did"]) elif "uri" in features: uris.append(features["uri"]) else: post["hashtags"] = post["mentions"] = post["uris"] = () if self._metadata_user: post["user"] = self._user or author post["instance"] = self.instance post["post_id"] = self._pid(post) post["date"] = text.parse_datetime( post["createdAt"][:19], "%Y-%m-%dT%H:%M:%S") def _extract_files(self, post): if "embed" not in post: post["count"] = 0 return () files = [] media = post["embed"] if "media" in media: media = media["media"] if "images" in media: 
for image in media["images"]: files.append(self._extract_media(image, "image")) if "video" in media and self.videos: files.append(self._extract_media(media, "video")) post["count"] = len(files) return files def _extract_media(self, media, key): try: aspect = media["aspectRatio"] width = aspect["width"] height = aspect["height"] except KeyError: width = height = 0 data = media[key] try: cid = data["ref"]["$link"] except KeyError: cid = data["cid"] return { "description": media.get("alt") or "", "width" : width, "height" : height, "filename" : cid, "extension" : data["mimeType"].rpartition("/")[2], } def _make_post(self, actor, kind): did = self.api._did_from_actor(actor) profile = self.api.get_profile(did) if kind not in profile: return () cid = profile[kind].rpartition("/")[2].partition("@")[0] return ({ "post": { "embed": {"images": [{ "alt": kind, "image": { "$type" : "blob", "ref" : {"$link": cid}, "mimeType": "image/jpeg", "size" : 0, }, "aspectRatio": { "width" : 1000, "height": 1000, }, }]}, "author" : profile, "record" : (), "createdAt": "", "uri" : cid, }, },) class BlueskyUserExtractor(BlueskyExtractor): subcategory = "user" pattern = USER_PATTERN + r"$" example = "https://bsky.app/profile/HANDLE" def initialize(self): pass def items(self): base = "{}/profile/{}/".format(self.root, self.groups[0]) default = ("posts" if self.config("quoted", False) or self.config("reposts", False) else "media") return self._dispatch_extractors(( (BlueskyInfoExtractor , base + "info"), (BlueskyAvatarExtractor , base + "avatar"), (BlueskyBackgroundExtractor, base + "banner"), (BlueskyPostsExtractor , base + "posts"), (BlueskyRepliesExtractor , base + "replies"), (BlueskyMediaExtractor , base + "media"), (BlueskyVideoExtractor , base + "video"), (BlueskyLikesExtractor , base + "likes"), ), (default,)) class BlueskyPostsExtractor(BlueskyExtractor): subcategory = "posts" pattern = USER_PATTERN + r"/posts" example = "https://bsky.app/profile/HANDLE/posts" def posts(self): return self.api.get_author_feed( self.groups[0], "posts_and_author_threads") class BlueskyRepliesExtractor(BlueskyExtractor): subcategory = "replies" pattern = USER_PATTERN + r"/replies" example = "https://bsky.app/profile/HANDLE/replies" def posts(self): return self.api.get_author_feed( self.groups[0], "posts_with_replies") class BlueskyMediaExtractor(BlueskyExtractor): subcategory = "media" pattern = USER_PATTERN + r"/media" example = "https://bsky.app/profile/HANDLE/media" def posts(self): return self.api.get_author_feed( self.groups[0], "posts_with_media") class BlueskyVideoExtractor(BlueskyExtractor): subcategory = "video" pattern = USER_PATTERN + r"/video" example = "https://bsky.app/profile/HANDLE/video" def posts(self): return self.api.get_author_feed( self.groups[0], "posts_with_video") class BlueskyLikesExtractor(BlueskyExtractor): subcategory = "likes" pattern = USER_PATTERN + r"/likes" example = "https://bsky.app/profile/HANDLE/likes" def posts(self): if self.config("endpoint") == "getActorLikes": return self.api.get_actor_likes(self.groups[0]) return self._posts_records(self.groups[0], "app.bsky.feed.like") class BlueskyFeedExtractor(BlueskyExtractor): subcategory = "feed" pattern = USER_PATTERN + r"/feed/([^/?#]+)" example = "https://bsky.app/profile/HANDLE/feed/NAME" def posts(self): actor, feed = self.groups return self.api.get_feed(actor, feed) class BlueskyListExtractor(BlueskyExtractor): subcategory = "list" pattern = USER_PATTERN + r"/lists/([^/?#]+)" example = "https://bsky.app/profile/HANDLE/lists/ID" def 
posts(self): actor, list_id = self.groups return self.api.get_list_feed(actor, list_id) class BlueskyFollowingExtractor(BlueskyExtractor): subcategory = "following" pattern = USER_PATTERN + r"/follows" example = "https://bsky.app/profile/HANDLE/follows" def items(self): for user in self.api.get_follows(self.groups[0]): url = "https://bsky.app/profile/" + user["did"] user["_extractor"] = BlueskyUserExtractor yield Message.Queue, url, user class BlueskyPostExtractor(BlueskyExtractor): subcategory = "post" pattern = USER_PATTERN + r"/post/([^/?#]+)" example = "https://bsky.app/profile/HANDLE/post/ID" def posts(self): actor, post_id = self.groups return self.api.get_post_thread(actor, post_id) class BlueskyInfoExtractor(BlueskyExtractor): subcategory = "info" pattern = USER_PATTERN + r"/info" example = "https://bsky.app/profile/HANDLE/info" def items(self): self._metadata_user = True self.api._did_from_actor(self.groups[0]) return iter(((Message.Directory, self._user),)) class BlueskyAvatarExtractor(BlueskyExtractor): subcategory = "avatar" filename_fmt = "avatar_{post_id}.{extension}" pattern = USER_PATTERN + r"/avatar" example = "https://bsky.app/profile/HANDLE/avatar" def posts(self): return self._make_post(self.groups[0], "avatar") class BlueskyBackgroundExtractor(BlueskyExtractor): subcategory = "background" filename_fmt = "background_{post_id}.{extension}" pattern = USER_PATTERN + r"/ba(?:nner|ckground)" example = "https://bsky.app/profile/HANDLE/banner" def posts(self): return self._make_post(self.groups[0], "banner") class BlueskySearchExtractor(BlueskyExtractor): subcategory = "search" pattern = BASE_PATTERN + r"/search(?:/|\?q=)(.+)" example = "https://bsky.app/search?q=QUERY" def posts(self): query = text.unquote(self.groups[0].replace("+", " ")) return self.api.search_posts(query) class BlueskyHashtagExtractor(BlueskyExtractor): subcategory = "hashtag" pattern = BASE_PATTERN + r"/hashtag/([^/?#]+)(?:/(top|latest))?" 
example = "https://bsky.app/hashtag/NAME" def posts(self): hashtag, order = self.groups return self.api.search_posts("#"+hashtag, order) class BlueskyAPI(): """Interface for the Bluesky API https://docs.bsky.app/docs/category/http-reference """ def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.headers = {"Accept": "application/json"} self.username, self.password = extractor._get_auth_info() if self.username: self.root = "https://bsky.social" else: self.root = "https://api.bsky.app" self.authenticate = util.noop def get_actor_likes(self, actor): endpoint = "app.bsky.feed.getActorLikes" params = { "actor": self._did_from_actor(actor), "limit": "100", } return self._pagination(endpoint, params, check_empty=True) def get_author_feed(self, actor, filter="posts_and_author_threads"): endpoint = "app.bsky.feed.getAuthorFeed" params = { "actor" : self._did_from_actor(actor, True), "filter": filter, "limit" : "100", } return self._pagination(endpoint, params) def get_feed(self, actor, feed): endpoint = "app.bsky.feed.getFeed" params = { "feed" : "at://{}/app.bsky.feed.generator/{}".format( self._did_from_actor(actor), feed), "limit": "100", } return self._pagination(endpoint, params) def get_follows(self, actor): endpoint = "app.bsky.graph.getFollows" params = { "actor": self._did_from_actor(actor), "limit": "100", } return self._pagination(endpoint, params, "follows") def get_list_feed(self, actor, list): endpoint = "app.bsky.feed.getListFeed" params = { "list" : "at://{}/app.bsky.graph.list/{}".format( self._did_from_actor(actor), list), "limit": "100", } return self._pagination(endpoint, params) def get_post_thread(self, actor, post_id): uri = "at://{}/app.bsky.feed.post/{}".format( self._did_from_actor(actor), post_id) depth = self.extractor.config("depth", "0") return self.get_post_thread_uri(uri, depth) def get_post_thread_uri(self, uri, depth="0"): endpoint = "app.bsky.feed.getPostThread" params = { "uri" : uri, "depth" : depth, "parentHeight": "0", } thread = self._call(endpoint, params)["thread"] if "replies" not in thread: return (thread,) index = 0 posts = [thread] while index < len(posts): post = posts[index] if "replies" in post: posts.extend(post["replies"]) index += 1 return posts @memcache(keyarg=1) def get_profile(self, did): endpoint = "app.bsky.actor.getProfile" params = {"actor": did} return self._call(endpoint, params) def list_records(self, actor, collection): endpoint = "com.atproto.repo.listRecords" actor_did = self._did_from_actor(actor) params = { "repo" : actor_did, "collection": collection, "limit" : "100", # "reverse" : "false", } return self._pagination(endpoint, params, "records", self.service_endpoint(actor_did)) @memcache(keyarg=1) def resolve_handle(self, handle): endpoint = "com.atproto.identity.resolveHandle" params = {"handle": handle} return self._call(endpoint, params)["did"] @memcache(keyarg=1) def service_endpoint(self, did): if did.startswith('did:web:'): url = "https://" + did[8:] + "/.well-known/did.json" else: url = "https://plc.directory/" + did try: data = self.extractor.request(url).json() for service in data["service"]: if service["type"] == "AtprotoPersonalDataServer": return service["serviceEndpoint"] except Exception: pass return "https://bsky.social" def search_posts(self, query, sort=None): endpoint = "app.bsky.feed.searchPosts" params = { "q" : query, "limit": "100", "sort" : sort, } return self._pagination(endpoint, params, "posts") def _did_from_actor(self, actor, user_did=False): if 
actor.startswith("did:"): did = actor else: did = self.resolve_handle(actor) extr = self.extractor if user_did and not extr.config("reposts", False): extr._user_did = did if extr._metadata_user: extr._user = user = self.get_profile(did) user["instance"] = extr._instance(user["handle"]) return did def authenticate(self): self.headers["Authorization"] = self._authenticate_impl(self.username) @cache(maxage=3600, keyarg=1) def _authenticate_impl(self, username): refresh_token = _refresh_token_cache(username) if refresh_token: self.log.info("Refreshing access token for %s", username) endpoint = "com.atproto.server.refreshSession" headers = {"Authorization": "Bearer " + refresh_token} data = None else: self.log.info("Logging in as %s", username) endpoint = "com.atproto.server.createSession" headers = None data = { "identifier": username, "password" : self.password, } url = "{}/xrpc/{}".format(self.root, endpoint) response = self.extractor.request( url, method="POST", headers=headers, json=data, fatal=None) data = response.json() if response.status_code != 200: self.log.debug("Server response: %s", data) raise exception.AuthenticationError('"{}: {}"'.format( data.get("error"), data.get("message"))) _refresh_token_cache.update(self.username, data["refreshJwt"]) return "Bearer " + data["accessJwt"] def _call(self, endpoint, params, root=None): if root is None: root = self.root url = "{}/xrpc/{}".format(root, endpoint) while True: self.authenticate() response = self.extractor.request( url, params=params, headers=self.headers, fatal=None) if response.status_code < 400: return response.json() if response.status_code == 429: until = response.headers.get("RateLimit-Reset") self.extractor.wait(until=until) continue try: data = response.json() msg = "API request failed ('{}: {}')".format( data["error"], data["message"]) except Exception: msg = "API request failed ({} {})".format( response.status_code, response.reason) self.extractor.log.debug("Server response: %s", response.text) raise exception.StopExtraction(msg) def _pagination(self, endpoint, params, key="feed", root=None, check_empty=False): while True: data = self._call(endpoint, params, root) if check_empty and not data[key]: return yield from data[key] cursor = data.get("cursor") if not cursor: return params["cursor"] = cursor @cache(maxage=84*86400, keyarg=0) def _refresh_token_cache(username): return None ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1746855846.0 
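# --- Illustrative sketch (editor's addition, not part of gallery-dl) --------
# How the Bluesky extractor above turns post metadata into a downloadable
# URL: it resolves the author's DID to a PDS endpoint via
# BlueskyAPI.service_endpoint() and appends the blob CID to a
# com.atproto.sync.getBlob query.  The DID, endpoint, and CID values below
# are placeholders.

did = "did:plc:0000000000000000000000000"
pds = "https://bsky.social"        # result of BlueskyAPI.service_endpoint(did)
cid = "bafkreiexampleexampleexampleexampleexampleexample"

blob_url = "{}/xrpc/com.atproto.sync.getBlob?did={}&cid={}".format(
    pds, did, cid)
# -----------------------------------------------------------------------------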
# gallery_dl-1.29.7/gallery_dl/extractor/booru.py

# -*- coding: utf-8 -*-

# Copyright 2015-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for *booru sites"""

from .common import BaseExtractor, Message
from .. import text

import operator


class BooruExtractor(BaseExtractor):
    """Base class for *booru extractors"""
    basecategory = "booru"
    filename_fmt = "{category}_{id}_{md5}.{extension}"
    page_start = 0
    per_page = 100

    def items(self):
        self.login()
        data = self.metadata()
        tags = self.config("tags", False)
        notes = self.config("notes", False)
        fetch_html = tags or notes

        url_key = self.config("url")
        if url_key:
            if isinstance(url_key, (list, tuple)):
                self._file_url = self._file_url_list
                self._file_url_keys = url_key
            else:
                self._file_url = operator.itemgetter(url_key)

        for post in self.posts():
            try:
                url = self._file_url(post)
                if url[0] == "/":
                    url = self.root + url
            except Exception as exc:
                self.log.debug("%s: %s", exc.__class__.__name__, exc)
                self.log.warning("Unable to fetch download URL for post %s "
                                 "(md5: %s)", post.get("id"), post.get("md5"))
                continue

            if fetch_html:
                html = self._html(post)
                if tags:
                    self._tags(post, html)
                if notes:
                    self._notes(post, html)

            text.nameext_from_url(url, post)
            post.update(data)
            self._prepare(post)

            yield Message.Directory, post
            yield Message.Url, url, post

    def skip(self, num):
        pages = num // self.per_page
        self.page_start += pages
        return pages * self.per_page

    def login(self):
        """Login and set necessary cookies"""

    def metadata(self):
        """Return a dict with general metadata"""
        return ()

    def posts(self):
        """Return an iterable with post objects"""
        return ()

    _file_url = operator.itemgetter("file_url")

    def _file_url_list(self, post):
        urls = (post[key] for key in self._file_url_keys if post.get(key))
        post["_fallback"] = it = iter(urls)
        return next(it)

    def _prepare(self, post):
        """Prepare a 'post's metadata"""

    def _html(self, post):
        """Return HTML content of a post"""

    def _tags(self, post, page):
        """Extract extended tag metadata"""

    def _notes(self, post, page):
        """Extract notes metadata"""
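# --- Illustrative sketch (editor's addition, not part of gallery-dl) --------
# BooruExtractor above is only a template: concrete extractors supply the URL
# pattern and implement posts()/metadata(), while items() handles file URLs,
# optional tag/note fetching, and metadata preparation.  A minimal,
# hypothetical subclass might look like the following; the category, domain,
# API endpoint, and JSON field names are assumptions for illustration only,
# and for brevity it bypasses the BASE_PATTERN/update() mechanism the real
# *booru modules use to share one pattern across several domains.

class ExampleBooruExtractor(BooruExtractor):
    """Hypothetical extractor for tag searches on an imaginary booru site"""
    category = "examplebooru"
    root = "https://booru.example.org"
    page_start = 1
    pattern = r"(?:https?://)?booru\.example\.org/posts\?tags=([^&#]+)"
    example = "https://booru.example.org/posts?tags=TAG"

    def __init__(self, match):
        BooruExtractor.__init__(self, match)
        self.tags = text.unquote(match.group(1))

    def metadata(self):
        # merged into every post dict before it is yielded
        return {"search_tags": self.tags}

    def posts(self):
        # paginated API requests; posts are assumed to be JSON objects with
        # 'id', 'md5', and 'file_url' keys (the default _file_url getter)
        url = self.root + "/api/posts.json"
        params = {"tags": self.tags, "limit": self.per_page,
                  "page": self.page_start}
        while True:
            posts = self.request(url, params=params).json()
            if not posts:
                return
            yield from posts
            params["page"] += 1
# -----------------------------------------------------------------------------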
�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1746776695.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/boosty.py����������������������������������������������������0000644�0001750�0001750�00000035250�15007331167�020456� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.boosty.to/""" from .common import Extractor, Message from .. 
import text, util, exception import itertools BASE_PATTERN = r"(?:https?://)?boosty\.to" class BoostyExtractor(Extractor): """Base class for boosty extractors""" category = "boosty" root = "https://www.boosty.to" directory_fmt = ("{category}", "{user[blogUrl]} ({user[id]})", "{post[date]:%Y-%m-%d} {post[int_id]}") filename_fmt = "{num:>02} {file[id]}.{extension}" archive_fmt = "{file[id]}" cookies_domain = ".boosty.to" cookies_names = ("auth",) def _init(self): self.api = BoostyAPI(self) self._user = None if self.config("metadata") else False self.only_allowed = self.config("allowed", True) self.only_bought = self.config("bought") videos = self.config("videos") if videos is None or videos: if isinstance(videos, str): videos = videos.split(",") elif not isinstance(videos, (list, tuple)): # ultra_hd: 2160p # quad_hd: 1440p # full_hd: 1080p # high: 720p # medium: 480p # low: 360p # lowest: 240p # tiny: 144p videos = ("ultra_hd", "quad_hd", "full_hd", "high", "medium", "low", "lowest", "tiny") self.videos = videos def items(self): for post in self.posts(): if not post.get("hasAccess"): self.log.warning("Not allowed to access post %s", post["id"]) continue files = self._extract_files(post) if self._user: post["user"] = self._user data = { "post" : post, "user" : post.pop("user", None), "count": len(files), } yield Message.Directory, data for data["num"], file in enumerate(files, 1): data["file"] = file url = file["url"] yield Message.Url, url, text.nameext_from_url(url, data) def posts(self): """Yield JSON content of all relevant posts""" def _extract_files(self, post): files = [] post["content"] = content = [] post["links"] = links = [] if "createdAt" in post: post["date"] = text.parse_timestamp(post["createdAt"]) for block in post["data"]: try: type = block["type"] if type == "text": if block["modificator"] == "BLOCK_END": continue c = util.json_loads(block["content"]) content.append(c[0]) elif type == "image": files.append(self._update_url(post, block)) elif type == "ok_video": if not self.videos: self.log.debug("%s: Skipping video %s", post["id"], block["id"]) continue fmts = { fmt["type"]: fmt["url"] for fmt in block["playerUrls"] if fmt["url"] } formats = [ fmts[fmt] for fmt in self.videos if fmt in fmts ] if formats: formats = iter(formats) block["url"] = next(formats) block["_fallback"] = formats files.append(block) else: self.log.warning( "%s: Found no suitable video format for %s", post["id"], block["id"]) elif type == "link": url = block["url"] links.append(url) content.append(url) elif type == "audio_file": files.append(self._update_url(post, block)) elif type == "file": files.append(self._update_url(post, block)) elif type == "smile": content.append(":" + block["name"] + ":") else: self.log.debug("%s: Unsupported data type '%s'", post["id"], type) except Exception as exc: self.log.debug("%s: %s", exc.__class__.__name__, exc) del post["data"] return files def _update_url(self, post, block): url = block["url"] sep = "&" if "?" in url else "?" 
signed_query = post.get("signedQuery") if signed_query: url += sep + signed_query[1:] sep = "&" migrated = post.get("isMigrated") if migrated is not None: url += sep + "is_migrated=" + str(migrated).lower() block["url"] = url return block class BoostyUserExtractor(BoostyExtractor): """Extractor for boosty.to user profiles""" subcategory = "user" pattern = BASE_PATTERN + r"/([^/?#]+)(?:\?([^#]+))?$" example = "https://boosty.to/USER" def posts(self): user, query = self.groups params = text.parse_query(query) if self._user is None: self._user = self.api.user(user) return self.api.blog_posts(user, params) class BoostyMediaExtractor(BoostyExtractor): """Extractor for boosty.to user media""" subcategory = "media" directory_fmt = "{category}", "{user[blogUrl]} ({user[id]})", "media" filename_fmt = "{post[id]}_{num}.{extension}" pattern = BASE_PATTERN + r"/([^/?#]+)/media/([^/?#]+)(?:\?([^#]+))?" example = "https://boosty.to/USER/media/all" def posts(self): user, media, query = self.groups params = text.parse_query(query) self._user = self.api.user(user) return self.api.blog_media_album(user, media, params) class BoostyFeedExtractor(BoostyExtractor): """Extractor for your boosty.to subscription feed""" subcategory = "feed" pattern = BASE_PATTERN + r"/(?:\?([^#]+))?(?:$|#)" example = "https://boosty.to/" def posts(self): params = text.parse_query(self.groups[0]) return self.api.feed_posts(params) class BoostyPostExtractor(BoostyExtractor): """Extractor for boosty.to posts""" subcategory = "post" pattern = BASE_PATTERN + r"/([^/?#]+)/posts/([0-9a-f-]+)" example = "https://boosty.to/USER/posts/01234567-89ab-cdef-0123-456789abcd" def posts(self): user, post_id = self.groups if self._user is None: self._user = self.api.user(user) return (self.api.post(user, post_id),) class BoostyFollowingExtractor(BoostyExtractor): """Extractor for your boosty.to subscribed users""" subcategory = "following" pattern = BASE_PATTERN + r"/app/settings/subscriptions" example = "https://boosty.to/app/settings/subscriptions" def items(self): for user in self.api.user_subscriptions(): url = "{}/{}".format(self.root, user["blog"]["blogUrl"]) user["_extractor"] = BoostyUserExtractor yield Message.Queue, url, user class BoostyDirectMessagesExtractor(BoostyExtractor): """Extractor for boosty.to direct messages""" subcategory = "direct-messages" directory_fmt = ("{category}", "{user[blogUrl]} ({user[id]})", "Direct Messages") pattern = BASE_PATTERN + r"/app/messages/?\?dialogId=(\d+)" example = "https://boosty.to/app/messages?dialogId=12345" def items(self): """Yield direct messages from a given dialog ID.""" dialog_id = self.groups[0] response = self.api.dialog(dialog_id) signed_query = response.get("signedQuery") try: messages = response["messages"]["data"] offset = messages[0]["id"] except Exception: return try: user = self.api.user(response["chatmate"]["url"]) except Exception: user = None messages.reverse() for message in itertools.chain( messages, self.api.dialog_messages(dialog_id, offset=offset) ): message["signedQuery"] = signed_query files = self._extract_files(message) data = { "post": message, "user": user, "count": len(files), } yield Message.Directory, data for data["num"], file in enumerate(files, 1): data["file"] = file url = file["url"] yield Message.Url, url, text.nameext_from_url(url, data) class BoostyAPI(): """Interface for the Boosty API""" root = "https://api.boosty.to" def __init__(self, extractor, access_token=None): self.extractor = extractor self.headers = { "Accept": "application/json, text/plain, 
*/*", "Origin": extractor.root, } if not access_token: auth = self.extractor.cookies.get("auth", domain=".boosty.to") if auth: access_token = text.extr( auth, "%22accessToken%22%3A%22", "%22") if access_token: self.headers["Authorization"] = "Bearer " + access_token def blog_posts(self, username, params): endpoint = "/v1/blog/{}/post/".format(username) params = self._merge_params(params, { "limit" : "5", "offset" : None, "comments_limit": "2", "reply_limit" : "1", }) return self._pagination(endpoint, params) def blog_media_album(self, username, type="all", params=()): endpoint = "/v1/blog/{}/media_album/".format(username) params = self._merge_params(params, { "type" : type.rstrip("s"), "limit" : "15", "limit_by": "media", "offset" : None, }) return self._pagination(endpoint, params, self._transform_media_posts) def _transform_media_posts(self, data): posts = [] for obj in data["mediaPosts"]: post = obj["post"] post["data"] = obj["media"] posts.append(post) return posts def post(self, username, post_id): endpoint = "/v1/blog/{}/post/{}".format(username, post_id) return self._call(endpoint) def feed_posts(self, params=None): endpoint = "/v1/feed/post/" params = self._merge_params(params, { "limit" : "5", "offset" : None, "comments_limit": "2", }) if "only_allowed" not in params and self.extractor.only_allowed: params["only_allowed"] = "true" if "only_bought" not in params and self.extractor.only_bought: params["only_bought"] = "true" return self._pagination(endpoint, params, key="posts") def user(self, username): endpoint = "/v1/blog/" + username user = self._call(endpoint) user["id"] = user["owner"]["id"] return user def user_subscriptions(self, params=None): endpoint = "/v1/user/subscriptions" params = self._merge_params(params, { "limit" : "30", "with_follow": "true", "offset" : None, }) return self._pagination_users(endpoint, params) def _merge_params(self, params_web, params_api): if params_web: web_to_api = { "isOnlyAllowedPosts": "is_only_allowed", "postsTagsIds" : "tags_ids", "postsFrom" : "from_ts", "postsTo" : "to_ts", } for name, value in params_web.items(): name = web_to_api.get(name, name) params_api[name] = value return params_api def _call(self, endpoint, params=None): url = self.root + endpoint while True: response = self.extractor.request( url, params=params, headers=self.headers, fatal=None, allow_redirects=False) if response.status_code < 300: return response.json() elif response.status_code < 400: raise exception.AuthenticationError("Invalid API access token") elif response.status_code == 429: self.extractor.wait(seconds=600) else: self.extractor.log.debug(response.text) raise exception.StopExtraction("API request failed") def _pagination(self, endpoint, params, transform=None, key=None): if "is_only_allowed" not in params and self.extractor.only_allowed: params["only_allowed"] = "true" params["is_only_allowed"] = "true" while True: data = self._call(endpoint, params) if transform: yield from transform(data["data"]) elif key: yield from data["data"][key] else: yield from data["data"] extra = data["extra"] if extra.get("isLast"): return offset = extra.get("offset") if not offset: return params["offset"] = offset def _pagination_users(self, endpoint, params): while True: data = self._call(endpoint, params) yield from data["data"] offset = data["offset"] + data["limit"] if offset > data["total"]: return params["offset"] = offset def dialog(self, dialog_id): endpoint = "/v1/dialog/{}".format(dialog_id) return self._call(endpoint) def dialog_messages(self, dialog_id, 
limit=300, offset=None): endpoint = "/v1/dialog/{}/message/".format(dialog_id) params = { "limit": limit, "reverse": "true", "offset": offset, } return self._pagination_dialog(endpoint, params) def _pagination_dialog(self, endpoint, params): while True: data = self._call(endpoint, params) yield from data["data"] try: extra = data["extra"] if extra.get("isLast"): break params["offset"] = offset = extra["offset"] if not offset: break except Exception: break ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1746776695.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/bunkr.py�����������������������������������������������������0000644�0001750�0001750�00000016203�15007331167�020255� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bunkr.si/""" from .common import Extractor from .lolisafe import LolisafeAlbumExtractor from .. 
import text, util, config, exception import random if config.get(("extractor", "bunkr"), "tlds"): BASE_PATTERN = ( r"(?:bunkr:(?:https?://)?([^/?#]+)|" r"(?:https?://)?(?:app\.)?(bunkr+\.\w+))" ) else: BASE_PATTERN = ( r"(?:bunkr:(?:https?://)?([^/?#]+)|" r"(?:https?://)?(?:app\.)?(bunkr+" r"\.(?:s[kiu]|c[ir]|fi|p[hks]|ru|la|is|to|a[cx]" r"|black|cat|media|red|site|ws|org)))" ) DOMAINS = [ "bunkr.ac", "bunkr.ci", "bunkr.cr", "bunkr.fi", "bunkr.ph", "bunkr.pk", "bunkr.ps", "bunkr.si", "bunkr.sk", "bunkr.ws", "bunkr.black", "bunkr.red", "bunkr.media", "bunkr.site", ] LEGACY_DOMAINS = { "bunkr.ax", "bunkr.cat", "bunkr.ru", "bunkrr.ru", "bunkr.su", "bunkrr.su", "bunkr.la", "bunkr.is", "bunkr.to", } CF_DOMAINS = set() class BunkrAlbumExtractor(LolisafeAlbumExtractor): """Extractor for bunkr.si albums""" category = "bunkr" root = "https://bunkr.si" root_dl = "https://get.bunkrr.su" archive_fmt = "{album_id}_{id|id_url}" pattern = BASE_PATTERN + r"/a/([^/?#]+)" example = "https://bunkr.si/a/ID" def __init__(self, match): LolisafeAlbumExtractor.__init__(self, match) domain = self.groups[0] or self.groups[1] if domain not in LEGACY_DOMAINS: self.root = "https://" + domain def _init(self): LolisafeAlbumExtractor._init(self) endpoint = self.config("endpoint") if not endpoint: endpoint = self.root_dl + "/api/_001" elif endpoint[0] == "/": endpoint = self.root_dl + endpoint self.endpoint = endpoint self.offset = 0 def skip(self, num): self.offset = num return num def request(self, url, **kwargs): kwargs["encoding"] = "utf-8" kwargs["allow_redirects"] = False while True: try: response = Extractor.request(self, url, **kwargs) if response.status_code < 300: return response # redirect url = response.headers["Location"] if url[0] == "/": url = self.root + url continue root, path = self._split(url) if root not in CF_DOMAINS: continue self.log.debug("Redirect to known CF challenge domain '%s'", root) except exception.HttpError as exc: if exc.status != 403: raise # CF challenge root, path = self._split(url) CF_DOMAINS.add(root) self.log.debug("Added '%s' to CF challenge domains", root) try: DOMAINS.remove(root.rpartition("/")[2]) except ValueError: pass else: if not DOMAINS: raise exception.StopExtraction( "All Bunkr domains require solving a CF challenge") # select alternative domain self.root = root = "https://" + random.choice(DOMAINS) self.log.debug("Trying '%s' as fallback", root) url = root + path def fetch_album(self, album_id): # album metadata page = self.request(self.root + "/a/" + album_id).text title = text.unescape(text.unescape(text.extr( page, 'property="og:title" content="', '"'))) # files items = list(text.extract_iter( page, '<div class="grid-images_box', "</a>")) return self._extract_files(items), { "album_id" : album_id, "album_name" : title, "album_size" : text.extr( page, '<span class="font-semibold">(', ')'), "count" : len(items), } def _extract_files(self, items): if self.offset: items = util.advance(items, self.offset) for item in items: try: url = text.unescape(text.extr(item, ' href="', '"')) if url[0] == "/": url = self.root + url file = self._extract_file(url) info = text.split_html(item) if not file["name"]: file["name"] = info[-3] file["size"] = info[-2] file["date"] = text.parse_datetime( info[-1], "%H:%M:%S %d/%m/%Y") yield file except exception.StopExtraction: raise except Exception as exc: self.log.error("%s: %s", exc.__class__.__name__, exc) self.log.debug("", exc_info=exc) def _extract_file(self, webpage_url): page = self.request(webpage_url).text data_id = text.extr(page, 
'data-file-id="', '"') referer = self.root_dl + "/file/" + data_id headers = {"Referer": referer, "Origin": self.root_dl} data = self.request(self.endpoint, method="POST", headers=headers, json={"id": data_id}).json() if data.get("encrypted"): key = "SECRET_KEY_{}".format(data["timestamp"] // 3600) file_url = util.decrypt_xor(data["url"], key.encode()) else: file_url = data["url"] file_name = text.extr(page, "<h1", "<").rpartition(">")[2] fallback = text.extr(page, 'property="og:url" content="', '"') return { "file" : file_url, "name" : text.unescape(file_name), "id_url" : data_id, "_fallback" : (fallback,) if fallback else (), "_http_headers" : {"Referer": referer}, "_http_validate": self._validate, } def _validate(self, response): if response.history and response.url.endswith("/maintenance-vid.mp4"): self.log.warning("File server in maintenance mode") return False return True def _split(self, url): pos = url.index("/", 8) return url[:pos], url[pos:] class BunkrMediaExtractor(BunkrAlbumExtractor): """Extractor for bunkr.si media links""" subcategory = "media" directory_fmt = ("{category}",) pattern = BASE_PATTERN + r"(/[fvid]/[^/?#]+)" example = "https://bunkr.si/f/FILENAME" def fetch_album(self, album_id): try: file = self._extract_file(self.root + album_id) except Exception as exc: self.log.error("%s: %s", exc.__class__.__name__, exc) return (), {} return (file,), { "album_id" : "", "album_name" : "", "album_size" : -1, "description": "", "count" : 1, } ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1743584435.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/catbox.py����������������������������������������������������0000644�0001750�0001750�00000003515�14773176263�020433� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of 
the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://catbox.moe/""" from .common import GalleryExtractor, Extractor, Message from .. import text class CatboxAlbumExtractor(GalleryExtractor): """Extractor for catbox albums""" category = "catbox" subcategory = "album" root = "https://catbox.moe" filename_fmt = "{filename}.{extension}" directory_fmt = ("{category}", "{album_name} ({album_id})") archive_fmt = "{album_id}_{filename}" pattern = r"(?:https?://)?(?:www\.)?catbox\.moe(/c/[^/?#]+)" example = "https://catbox.moe/c/ID" def metadata(self, page): extr = text.extract_from(page) return { "album_id" : self.gallery_url.rpartition("/")[2], "album_name" : text.unescape(extr("<h1>", "<")), "date" : text.parse_datetime(extr( "<p>Created ", "<"), "%B %d %Y"), "description": text.unescape(extr("<p>", "<")), } def images(self, page): return [ ("https://files.catbox.moe/" + path, None) for path in text.extract_iter( page, ">https://files.catbox.moe/", "<") ] class CatboxFileExtractor(Extractor): """Extractor for catbox files""" category = "catbox" subcategory = "file" archive_fmt = "{filename}" pattern = r"(?:https?://)?(?:files|litter|de)\.catbox\.moe/([^/?#]+)" example = "https://files.catbox.moe/NAME.EXT" def items(self): url = text.ensure_http_scheme(self.url) file = text.nameext_from_url(url, {"url": url}) yield Message.Directory, file yield Message.Url, url, file �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1747857760.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/chevereto.py�������������������������������������������������0000644�0001750�0001750�00000007512�15013430540�021113� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Chevereto galleries""" from .common import BaseExtractor, Message from .. 
import text, util class CheveretoExtractor(BaseExtractor): """Base class for chevereto extractors""" basecategory = "chevereto" directory_fmt = ("{category}", "{user}", "{album}",) archive_fmt = "{id}" def _init(self): self.path = self.groups[-1] def _pagination(self, url): while True: page = self.request(url).text for item in text.extract_iter( page, '<div class="list-item-image ', 'image-container'): yield text.urljoin(self.root, text.extr( item, '<a href="', '"')) url = text.extr(page, 'data-pagination="next" href="', '"') if not url: return if url[0] == "/": url = self.root + url BASE_PATTERN = CheveretoExtractor.update({ "jpgfish": { "root": "https://jpg5.su", "pattern": r"jpe?g\d?\.(?:su|pet|fish(?:ing)?|church)", }, "imgkiwi": { "root": "https://img.kiwi", "pattern": r"img\.kiwi", }, "imagepond": { "root": "https://imagepond.net", "pattern": r"imagepond\.net", }, }) class CheveretoImageExtractor(CheveretoExtractor): """Extractor for chevereto Images""" subcategory = "image" pattern = BASE_PATTERN + r"(/im(?:g|age)/[^/?#]+)" example = "https://jpg2.su/img/TITLE.ID" def items(self): url = self.root + self.path page = self.request(url).text extr = text.extract_from(page) url = (extr('<meta property="og:image" content="', '"') or extr('url: "', '"')) if not url or url.endswith("/loading.svg"): pos = page.find(" download=") url = text.rextract(page, 'href="', '"', pos)[0] if not url.startswith("https://"): url = util.decrypt_xor( url, b"seltilovessimpcity@simpcityhatesscrapers", fromhex=True) image = { "id" : self.path.rpartition(".")[2], "url" : url, "album": text.extr(extr("Added to <a", "/a>"), ">", "<"), "date" : text.parse_datetime(extr( '<span title="', '"'), "%Y-%m-%d %H:%M:%S"), "user" : extr('username: "', '"'), } text.nameext_from_url(image["url"], image) yield Message.Directory, image yield Message.Url, image["url"], image class CheveretoAlbumExtractor(CheveretoExtractor): """Extractor for chevereto Albums""" subcategory = "album" pattern = BASE_PATTERN + r"(/a(?:lbum)?/[^/?#]+(?:/sub)?)" example = "https://jpg2.su/album/TITLE.ID" def items(self): url = self.root + self.path data = {"_extractor": CheveretoImageExtractor} if self.path.endswith("/sub"): albums = self._pagination(url) else: albums = (url,) for album in albums: for image in self._pagination(album): yield Message.Queue, image, data class CheveretoUserExtractor(CheveretoExtractor): """Extractor for chevereto Users""" subcategory = "user" pattern = BASE_PATTERN + r"(/(?!img|image|a(?:lbum)?)[^/?#]+(?:/albums)?)" example = "https://jpg2.su/USER" def items(self): url = self.root + self.path if self.path.endswith("/albums"): data = {"_extractor": CheveretoAlbumExtractor} else: data = {"_extractor": CheveretoImageExtractor} for url in self._pagination(url): yield Message.Queue, url, data ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1745260818.0 
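# Hedged usage sketch, assuming only the public pieces shown in this release
# (Extractor.from_url() from extractor/common.py and the Message constants
# from extractor/message.py); the album URL is a placeholder, not real data.
if __name__ == "__main__":
    from gallery_dl.extractor.chevereto import CheveretoAlbumExtractor
    from gallery_dl.extractor.message import Message

    extr = CheveretoAlbumExtractor.from_url("https://img.kiwi/album/TITLE.ID")
    if extr is not None:
        # items() yields (Message.Queue, image_page_url, data) tuples, where
        # data carries {"_extractor": CheveretoImageExtractor} so the frontend
        # knows which extractor should handle each queued URL next.
        for msg in extr:
            if msg[0] == Message.Queue:
                print("queued image page:", msg[1])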
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/cien.py������������������������������������������������������0000644�0001750�0001750�00000015604�15001510422�020041� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2024 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://ci-en.net/""" from .common import Extractor, Message from .. import text BASE_PATTERN = r"(?:https?://)?ci-en\.(?:net|dlsite\.com)" class CienExtractor(Extractor): category = "cien" root = "https://ci-en.net" request_interval = (1.0, 2.0) def __init__(self, match): self.root = text.root_from_url(match.group(0)) Extractor.__init__(self, match) def _init(self): self.cookies.set("accepted_rating", "r18g", domain="ci-en.dlsite.com") def _pagination_articles(self, url, params): data = {"_extractor": CienArticleExtractor} params["page"] = text.parse_int(params.get("page"), 1) while True: page = self.request(url, params=params).text for card in text.extract_iter( page, ' class="c-cardCase-item', '</div>'): article_url = text.extr(card, ' href="', '"') yield Message.Queue, article_url, data if ' rel="next"' not in page: return params["page"] += 1 class CienArticleExtractor(CienExtractor): subcategory = "article" filename_fmt = "{num:>02} {filename}.{extension}" directory_fmt = ("{category}", "{author[name]}", "{post_id} {name}") archive_fmt = "{post_id}_{num}" pattern = BASE_PATTERN + r"/creator/(\d+)/article/(\d+)" example = "https://ci-en.net/creator/123/article/12345" def items(self): url = "{}/creator/{}/article/{}".format( self.root, self.groups[0], self.groups[1]) page = self.request(url, notfound="article").text files = self._extract_files(page) post = self._extract_jsonld(page)[0] post["post_url"] = url post["post_id"] = text.parse_int(self.groups[1]) post["count"] = len(files) post["date"] = text.parse_datetime(post["datePublished"]) try: del post["publisher"] del post["sameAs"] except Exception: pass yield Message.Directory, post for post["num"], file in enumerate(files, 1): post.update(file) if "extension" not in file: text.nameext_from_url(file["url"], post) yield Message.Url, file["url"], post def _extract_files(self, page): files = [] filetypes = self.config("files") if filetypes is None: self._extract_files_image(page, files) self._extract_files_video(page, files) self._extract_files_download(page, files) self._extract_files_gallery(page, files) else: generators = { "image" : self._extract_files_image, "video" : self._extract_files_video, "download": self._extract_files_download, "gallery" : self._extract_files_gallery, "gallerie": 
self._extract_files_gallery, } if isinstance(filetypes, str): filetypes = filetypes.split(",") for ft in filetypes: generators[ft.rstrip("s")](page, files) return files def _extract_files_image(self, page, files): for image in text.extract_iter( page, 'class="file-player-image"', "</figure>"): size = text.extr(image, ' data-size="', '"') w, _, h = size.partition("x") files.append({ "url" : text.extr(image, ' data-raw="', '"'), "width" : text.parse_int(w), "height": text.parse_int(h), "type" : "image", }) def _extract_files_video(self, page, files): for video in text.extract_iter( page, "<vue-file-player", "</vue-file-player>"): path = text.extr(video, ' base-path="', '"') name = text.extr(video, ' file-name="', '"') auth = text.extr(video, ' auth-key="', '"') file = text.nameext_from_url(name) file["url"] = "{}video-web.mp4?{}".format(path, auth) file["type"] = "video" files.append(file) def _extract_files_download(self, page, files): for download in text.extract_iter( page, 'class="downloadBlock', "</div>"): name = text.extr(download, "<p>", "<") file = text.nameext_from_url(name.rpartition(" ")[0]) file["url"] = text.extr(download, ' href="', '"') file["type"] = "download" files.append(file) def _extract_files_gallery(self, page, files): for gallery in text.extract_iter( page, "<vue-image-gallery", "</vue-image-gallery>"): url = self.root + "/api/creator/gallery/images" params = { "hash" : text.extr(gallery, ' hash="', '"'), "gallery_id": text.extr(gallery, ' gallery-id="', '"'), "time" : text.extr(gallery, ' time="', '"'), } data = self.request(url, params=params).json() url = self.root + "/api/creator/gallery/imagePath" for params["page"], params["file_id"] in enumerate( data["imgList"]): path = self.request(url, params=params).json()["path"] file = params.copy() file["url"] = path files.append(file) class CienCreatorExtractor(CienExtractor): subcategory = "creator" pattern = BASE_PATTERN + r"/creator/(\d+)(?:/article(?:\?([^#]+))?)?/?$" example = "https://ci-en.net/creator/123" def items(self): url = "{}/creator/{}/article".format(self.root, self.groups[0]) params = text.parse_query(self.groups[1]) params["mode"] = "list" return self._pagination_articles(url, params) class CienRecentExtractor(CienExtractor): subcategory = "recent" pattern = BASE_PATTERN + r"/mypage/recent(?:\?([^#]+))?" example = "https://ci-en.net/mypage/recent" def items(self): url = self.root + "/mypage/recent" params = text.parse_query(self.groups[0]) return self._pagination_articles(url, params) class CienFollowingExtractor(CienExtractor): subcategory = "following" pattern = BASE_PATTERN + r"/mypage/subscription(/following)?" 
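    # The optional "(/following)" group lands in self.groups[0]; items()
    # appends it (or "") to /mypage/subscription, so both subscription
    # tabs are handled by this one extractor.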
example = "https://ci-en.net/mypage/subscription" def items(self): url = self.root + "/mypage/subscription" + (self.groups[0] or "") page = self.request(url).text data = {"_extractor": CienCreatorExtractor} for subscription in text.extract_iter( page, 'class="c-grid-subscriptionInfo', '</figure>'): url = text.extr(subscription, ' href="', '"') yield Message.Queue, url, data ����������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1747990168.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/civitai.py���������������������������������������������������0000644�0001750�0001750�00000061524�15014033230�020556� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2024 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.civitai.com/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import memcache import itertools import time BASE_PATTERN = r"(?:https?://)?civitai\.com" USER_PATTERN = BASE_PATTERN + r"/user/([^/?#]+)" class CivitaiExtractor(Extractor): """Base class for civitai extractors""" category = "civitai" root = "https://civitai.com" directory_fmt = ("{category}", "{username|user[username]}", "images") filename_fmt = "{file[id]|id|filename}.{extension}" archive_fmt = "{file[uuid]|uuid}" request_interval = (0.5, 1.5) def _init(self): if self.config("api") == "rest": self.log.debug("Using REST API") self.api = CivitaiRestAPI(self) else: self.log.debug("Using tRPC API") self.api = CivitaiTrpcAPI(self) quality = self.config("quality") if quality: if not isinstance(quality, str): quality = ",".join(quality) self._image_quality = quality self._image_ext = ("png" if quality == "original=true" else "jpg") else: self._image_quality = "original=true" self._image_ext = "png" quality_video = self.config("quality-videos") if quality_video: if not isinstance(quality_video, str): quality_video = ",".join(quality_video) if quality_video[0] == "+": quality_video = (self._image_quality + "," + quality_video.lstrip("+,")) self._video_quality = quality_video elif quality_video is not None and quality: self._video_quality = self._image_quality else: self._video_quality = "quality=100" self._video_ext = "webm" metadata = self.config("metadata") if metadata: if isinstance(metadata, str): metadata = metadata.split(",") elif not isinstance(metadata, (list, tuple)): metadata = ("generation", "version") self._meta_generation = ("generation" in metadata) self._meta_version = ("version" in metadata) else: self._meta_generation = self._meta_version = False def items(self): models = self.models() if models: data = {"_extractor": CivitaiModelExtractor} for model in models: url = "{}/models/{}".format(self.root, model["id"]) yield Message.Queue, url, data return posts = self.posts() if posts: for post in posts: if "images" in post: images = post["images"] else: images = self.api.images_post(post["id"]) post = self.api.post(post["id"]) post["date"] = text.parse_datetime( post["publishedAt"], "%Y-%m-%dT%H:%M:%S.%fZ") data = { "post": post, "user": post.pop("user"), } if self._meta_version: data["model"], data["version"] = \ self._extract_meta_version(post) yield Message.Directory, data for file in self._image_results(images): file.update(data) yield Message.Url, file["url"], file return images = self.images() if images: for image in images: if self._meta_generation: image["generation"] = \ self._extract_meta_generation(image) if self._meta_version: image["model"], image["version"] = \ self._extract_meta_version(image, False) image["date"] = text.parse_datetime( image["createdAt"], "%Y-%m-%dT%H:%M:%S.%fZ") url = self._url(image) text.nameext_from_url(url, image) if not image["extension"]: image["extension"] = ( self._video_ext if image.get("type") == "video" else self._image_ext) yield Message.Directory, image yield Message.Url, url, image return def models(self): return () def posts(self): return () def images(self): return () def _url(self, image): url = image["url"] video = image.get("type") == "video" quality = self._video_quality if video else self._image_quality if "/" in url: parts = url.rsplit("/", 3) image["uuid"] = parts[1] parts[2] = quality return "/".join(parts) image["uuid"] = url name = image.get("name") if not name: mime = image.get("mimeType") or self._image_ext name = "{}.{}".format(image.get("id"), mime.rpartition("/")[2]) return ( 
"https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/{}/{}/{}".format( url, quality, name) ) def _image_results(self, images): for num, file in enumerate(images, 1): data = text.nameext_from_url(file["url"], { "num" : num, "file": file, "url" : self._url(file), }) if not data["extension"]: data["extension"] = ( self._video_ext if file.get("type") == "video" else self._image_ext) if "id" not in file and data["filename"].isdecimal(): file["id"] = text.parse_int(data["filename"]) if self._meta_generation: file["generation"] = self._extract_meta_generation(file) yield data def _parse_query(self, value): return text.parse_query_list( value, {"tags", "reactions", "baseModels", "tools", "techniques", "types", "fileFormats"}) def _extract_meta_generation(self, image): try: return self.api.image_generationdata(image["id"]) except Exception as exc: return self.log.debug("", exc_info=exc) def _extract_meta_version(self, item, is_post=True): try: version_id = self._extract_version_id(item, is_post) if version_id: version = self.api.model_version(version_id).copy() return version.pop("model", None), version except Exception as exc: self.log.debug("", exc_info=exc) return None, None def _extract_version_id(self, item, is_post=True): version_id = item.get("modelVersionId") if version_id: return version_id version_ids = item.get("modelVersionIds") if version_ids: return version_ids[0] if is_post: return None item["post"] = post = self.api.post(item["postId"]) post.pop("user", None) return self._extract_version_id(post) class CivitaiModelExtractor(CivitaiExtractor): subcategory = "model" directory_fmt = ("{category}", "{user[username]}", "{model[id]}{model[name]:? //}", "{version[id]}{version[name]:? //}") pattern = BASE_PATTERN + r"/models/(\d+)(?:/?\?modelVersionId=(\d+))?" 
example = "https://civitai.com/models/12345/TITLE" def items(self): model_id, version_id = self.groups model = self.api.model(model_id) if "user" in model: user = model["user"] del model["user"] else: user = model["creator"] del model["creator"] versions = model["modelVersions"] del model["modelVersions"] if version_id: version_id = int(version_id) for version in versions: if version["id"] == version_id: break else: version = self.api.model_version(version_id) versions = (version,) for version in versions: version["date"] = text.parse_datetime( version["createdAt"], "%Y-%m-%dT%H:%M:%S.%fZ") data = { "model" : model, "version": version, "user" : user, } yield Message.Directory, data for file in self._extract_files(model, version, user): file.update(data) yield Message.Url, file["url"], file def _extract_files(self, model, version, user): filetypes = self.config("files") if filetypes is None: return self._extract_files_image(model, version, user) generators = { "model" : self._extract_files_model, "image" : self._extract_files_image, "gallery" : self._extract_files_gallery, "gallerie": self._extract_files_gallery, } if isinstance(filetypes, str): filetypes = filetypes.split(",") return itertools.chain.from_iterable( generators[ft.rstrip("s")](model, version, user) for ft in filetypes ) def _extract_files_model(self, model, version, user): files = [] for num, file in enumerate(version["files"], 1): name, sep, ext = file["name"].rpartition(".") if not sep: name = ext ext = "bin" file["uuid"] = "model-{}-{}-{}".format( model["id"], version["id"], file["id"]) files.append({ "num" : num, "file" : file, "filename" : name, "extension": ext, "url" : (file.get("downloadUrl") or "{}/api/download/models/{}".format( self.root, version["id"])), "_http_headers" : { "Authorization": self.api.headers.get("Authorization")}, "_http_validate": self._validate_file_model, }) return files def _extract_files_image(self, model, version, user): if "images" in version: images = version["images"] else: params = { "modelVersionId": version["id"], "prioritizedUserIds": [user["id"]], "period": "AllTime", "sort": "Most Reactions", "limit": 20, "pending": True, } images = self.api.images(params, defaults=False) return self._image_results(images) def _extract_files_gallery(self, model, version, user): images = self.api.images_gallery(model, version, user) return self._image_results(images) def _validate_file_model(self, response): if response.headers.get("Content-Type", "").startswith("text/html"): alert = text.extr( response.text, 'mantine-Alert-message">', "</div></div></div>") if alert: msg = "\"{}\" - 'api-key' required".format( text.remove_html(alert)) else: msg = "'api-key' required to download this file" self.log.warning(msg) return False return True class CivitaiImageExtractor(CivitaiExtractor): subcategory = "image" pattern = BASE_PATTERN + r"/images/(\d+)" example = "https://civitai.com/images/12345" def images(self): return self.api.image(self.groups[0]) class CivitaiPostExtractor(CivitaiExtractor): subcategory = "post" directory_fmt = ("{category}", "{username|user[username]}", "posts", "{post[id]}{post[title]:? 
//}") pattern = BASE_PATTERN + r"/posts/(\d+)" example = "https://civitai.com/posts/12345" def posts(self): return ({"id": int(self.groups[0])},) class CivitaiTagExtractor(CivitaiExtractor): subcategory = "tag" pattern = BASE_PATTERN + r"/tag/([^/?&#]+)" example = "https://civitai.com/tag/TAG" def models(self): tag = text.unquote(self.groups[0]) return self.api.models_tag(tag) class CivitaiSearchExtractor(CivitaiExtractor): subcategory = "search" pattern = BASE_PATTERN + r"/search/models\?([^#]+)" example = "https://civitai.com/search/models?query=QUERY" def models(self): params = text.parse_query(self.groups[0]) return self.api.models(params) class CivitaiModelsExtractor(CivitaiExtractor): subcategory = "models" pattern = BASE_PATTERN + r"/models(?:/?\?([^#]+))?(?:$|#)" example = "https://civitai.com/models" def models(self): params = text.parse_query(self.groups[0]) return self.api.models(params) class CivitaiImagesExtractor(CivitaiExtractor): subcategory = "images" pattern = BASE_PATTERN + r"/images(?:/?\?([^#]+))?(?:$|#)" example = "https://civitai.com/images" def images(self): params = text.parse_query(self.groups[0]) return self.api.images(params) class CivitaiUserExtractor(CivitaiExtractor): subcategory = "user" pattern = USER_PATTERN + r"/?(?:$|\?|#)" example = "https://civitai.com/user/USER" def initialize(self): pass def items(self): base = "{}/user/{}/".format(self.root, self.groups[0]) return self._dispatch_extractors(( (CivitaiUserModelsExtractor, base + "models"), (CivitaiUserPostsExtractor , base + "posts"), (CivitaiUserImagesExtractor, base + "images"), (CivitaiUserVideosExtractor, base + "videos"), ), ("user-models", "user-posts")) class CivitaiUserModelsExtractor(CivitaiExtractor): subcategory = "user-models" pattern = USER_PATTERN + r"/models/?(?:\?([^#]+))?" example = "https://civitai.com/user/USER/models" def models(self): user, query = self.groups params = self._parse_query(query) params["username"] = text.unquote(user) return self.api.models(params) class CivitaiUserPostsExtractor(CivitaiExtractor): subcategory = "user-posts" directory_fmt = ("{category}", "{username|user[username]}", "posts", "{post[id]}{post[title]:? //}") pattern = USER_PATTERN + r"/posts/?(?:\?([^#]+))?" example = "https://civitai.com/user/USER/posts" def posts(self): user, query = self.groups params = self._parse_query(query) params["username"] = text.unquote(user) return self.api.posts(params) class CivitaiUserImagesExtractor(CivitaiExtractor): subcategory = "user-images" pattern = USER_PATTERN + r"/images/?(?:\?([^#]+))?" 
example = "https://civitai.com/user/USER/images" def __init__(self, match): self.params = self._parse_query(match.group(2)) if self.params.get("section") == "reactions": self.subcategory = "reactions" self.images = self.images_reactions CivitaiExtractor.__init__(self, match) def images(self): params = self.params params["username"] = text.unquote(self.groups[0]) return self.api.images(params) def images_reactions(self): if "Authorization" not in self.api.headers and \ not self.cookies.get( "__Secure-civitai-token", domain=".civitai.com"): raise exception.AuthorizationError("api-key or cookies required") params = self.params params["authed"] = True params["useIndex"] = False if "reactions" not in params: params["reactions"] = ("Like", "Dislike", "Heart", "Laugh", "Cry") return self.api.images(params) class CivitaiUserVideosExtractor(CivitaiExtractor): subcategory = "user-videos" directory_fmt = ("{category}", "{username|user[username]}", "videos") pattern = USER_PATTERN + r"/videos/?(?:\?([^#]+))?" example = "https://civitai.com/user/USER/videos" def images(self): self._image_ext = "mp4" user, query = self.groups params = self._parse_query(query) params["types"] = ["video"] params["username"] = text.unquote(user) return self.api.images(params) class CivitaiRestAPI(): """Interface for the Civitai Public REST API https://developer.civitai.com/docs/api/public-rest """ def __init__(self, extractor): self.extractor = extractor self.root = extractor.root + "/api" self.headers = {"Content-Type": "application/json"} api_key = extractor.config("api-key") if api_key: extractor.log.debug("Using api_key authentication") self.headers["Authorization"] = "Bearer " + api_key nsfw = extractor.config("nsfw") if nsfw is None or nsfw is True: nsfw = "X" elif not nsfw: nsfw = "Safe" self.nsfw = nsfw def image(self, image_id): return self.images({ "imageId": image_id, }) def images(self, params): endpoint = "/v1/images" if "nsfw" not in params: params["nsfw"] = self.nsfw return self._pagination(endpoint, params) def images_gallery(self, model, version, user): return self.images({ "modelId" : model["id"], "modelVersionId": version["id"], }) def model(self, model_id): endpoint = "/v1/models/{}".format(model_id) return self._call(endpoint) @memcache(keyarg=1) def model_version(self, model_version_id): endpoint = "/v1/model-versions/{}".format(model_version_id) return self._call(endpoint) def models(self, params): return self._pagination("/v1/models", params) def models_tag(self, tag): return self.models({"tag": tag}) def _call(self, endpoint, params=None): if endpoint[0] == "/": url = self.root + endpoint else: url = endpoint response = self.extractor.request( url, params=params, headers=self.headers) return response.json() def _pagination(self, endpoint, params): while True: data = self._call(endpoint, params) yield from data["items"] try: endpoint = data["metadata"]["nextPage"] except KeyError: return params = None class CivitaiTrpcAPI(): """Interface for the Civitai tRPC API""" def __init__(self, extractor): self.extractor = extractor self.root = extractor.root + "/api/trpc/" self.headers = { "content-type" : "application/json", "x-client-version": "5.0.701", "x-client-date" : "", "x-client" : "web", "x-fingerprint" : "undefined", } api_key = extractor.config("api-key") if api_key: extractor.log.debug("Using api_key authentication") self.headers["Authorization"] = "Bearer " + api_key nsfw = extractor.config("nsfw") if nsfw is None or nsfw is True: nsfw = 31 elif not nsfw: nsfw = 1 self.nsfw = nsfw def 
image(self, image_id): endpoint = "image.get" params = {"id": int(image_id)} return (self._call(endpoint, params),) def image_generationdata(self, image_id): endpoint = "image.getGenerationData" params = {"id": int(image_id)} return self._call(endpoint, params) def images(self, params, defaults=True): endpoint = "image.getInfinite" if defaults: params = self._merge_params(params, { "useIndex" : True, "period" : "AllTime", "sort" : "Newest", "types" : ["image"], "withMeta" : False, # Metadata Only "fromPlatform" : False, # Made On-Site "browsingLevel": self.nsfw, "include" : ["cosmetics"], }) params = self._type_params(params) return self._pagination(endpoint, params) def images_gallery(self, model, version, user): endpoint = "image.getImagesAsPostsInfinite" params = { "period" : "AllTime", "sort" : "Newest", "modelVersionId": version["id"], "modelId" : model["id"], "hidden" : False, "limit" : 50, "browsingLevel" : self.nsfw, } for post in self._pagination(endpoint, params): yield from post["images"] def images_post(self, post_id): params = { "postId" : int(post_id), "pending": True, } return self.images(params) def model(self, model_id): endpoint = "model.getById" params = {"id": int(model_id)} return self._call(endpoint, params) @memcache(keyarg=1) def model_version(self, model_version_id): endpoint = "modelVersion.getById" params = {"id": int(model_version_id)} return self._call(endpoint, params) def models(self, params, defaults=True): endpoint = "model.getAll" if defaults: params = self._merge_params(params, { "period" : "AllTime", "periodMode" : "published", "sort" : "Newest", "pending" : False, "hidden" : False, "followed" : False, "earlyAccess" : False, "fromPlatform" : False, "supportsGeneration": False, "browsingLevel": self.nsfw, }) return self._pagination(endpoint, params) def models_tag(self, tag): return self.models({"tagname": tag}) def post(self, post_id): endpoint = "post.get" params = {"id": int(post_id)} return self._call(endpoint, params) def posts(self, params, defaults=True): endpoint = "post.getInfinite" meta = {"cursor": ("Date",)} if defaults: params = self._merge_params(params, { "browsingLevel": self.nsfw, "period" : "AllTime", "periodMode" : "published", "sort" : "Newest", "followed" : False, "draftOnly" : False, "pending" : True, "include" : ["cosmetics"], }) return self._pagination(endpoint, params, meta) def user(self, username): endpoint = "user.getCreator" params = {"username": username} return (self._call(endpoint, params),) def _call(self, endpoint, params, meta=None): url = self.root + endpoint headers = self.headers if meta: input = {"json": params, "meta": {"values": meta}} else: input = {"json": params} params = {"input": util.json_dumps(input)} headers["x-client-date"] = str(int(time.time() * 1000)) response = self.extractor.request(url, params=params, headers=headers) return response.json()["result"]["data"]["json"] def _pagination(self, endpoint, params, meta=None): if "cursor" not in params: params["cursor"] = None meta_ = {"cursor": ("undefined",)} while True: data = self._call(endpoint, params, meta_) yield from data["items"] try: if not data["nextCursor"]: return except KeyError: return params["cursor"] = data["nextCursor"] meta_ = meta def _merge_params(self, params_user, params_default): """Combine 'params_user' with 'params_default'""" params_default.update(params_user) return params_default def _type_params(self, params): """Convert 'params' values to expected types""" types = { "tags" : int, "tools" : int, "techniques" : int, "modelId" : 
int, "modelVersionId": int, "remixesOnly" : _bool, "nonRemixesOnly": _bool, "withMeta" : _bool, "fromPlatform" : _bool, "supportsGeneration": _bool, } for name, value in params.items(): if name not in types: continue elif isinstance(value, str): params[name] = types[name](value) elif isinstance(value, list): type = types[name] params[name] = [type(item) for item in value] return params def _bool(value): return True if value == "true" else False ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1743510441.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/comicvine.py�������������������������������������������������0000644�0001750�0001750�00000004007�14772755651�021127� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://comicvine.gamespot.com/""" from .booru import BooruExtractor from .. 
import text import operator class ComicvineTagExtractor(BooruExtractor): """Extractor for a gallery on comicvine.gamespot.com""" category = "comicvine" subcategory = "tag" basecategory = "" root = "https://comicvine.gamespot.com" per_page = 1000 directory_fmt = ("{category}", "{tag}") filename_fmt = "{filename}.{extension}" archive_fmt = "{id}" pattern = (r"(?:https?://)?comicvine\.gamespot\.com" r"(/([^/?#]+)/(\d+-\d+)/images/.*)") example = "https://comicvine.gamespot.com/TAG/123-45/images/" def __init__(self, match): BooruExtractor.__init__(self, match) self.path, self.object_name, self.object_id = match.groups() def metadata(self): return {"tag": text.unquote(self.object_name)} def posts(self): url = self.root + "/js/image-data.json" params = { "images": text.extract( self.request(self.root + self.path).text, 'data-gallery-id="', '"')[0], "start" : self.page_start, "count" : self.per_page, "object": self.object_id, } while True: images = self.request(url, params=params).json()["images"] yield from images if len(images) < self.per_page: return params["start"] += self.per_page def skip(self, num): self.page_start = num return num _file_url = operator.itemgetter("original") @staticmethod def _prepare(post): post["date"] = text.parse_datetime( post["dateCreated"], "%a, %b %d %Y") post["tags"] = [tag["name"] for tag in post["tags"] if tag["name"]] �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1747990037.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/common.py����������������������������������������������������0000644�0001750�0001750�00000103760�15014033025�020417� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under 
the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Common classes and constants used by extractor modules.""" import os import re import ssl import time import netrc import queue import random import getpass import logging import datetime import requests import threading from requests.adapters import HTTPAdapter from .message import Message from .. import config, output, text, util, cache, exception urllib3 = requests.packages.urllib3 class Extractor(): category = "" subcategory = "" basecategory = "" categorytransfer = False directory_fmt = ("{category}",) filename_fmt = "{filename}.{extension}" archive_fmt = "" root = "" cookies_domain = "" cookies_index = 0 referer = True ciphers = None tls12 = True browser = None useragent = util.USERAGENT_FIREFOX request_interval = 0.0 request_interval_min = 0.0 request_interval_429 = 60.0 request_timestamp = 0.0 def __init__(self, match): self.log = logging.getLogger(self.category) self.url = match.string self.match = match self.groups = match.groups() self._cfgpath = ("extractor", self.category, self.subcategory) self._parentdir = "" @classmethod def from_url(cls, url): if isinstance(cls.pattern, str): cls.pattern = util.re_compile(cls.pattern) match = cls.pattern.match(url) return cls(match) if match else None def __iter__(self): self.initialize() return self.items() def initialize(self): self._init_options() self._init_session() self._init_cookies() self._init() self.initialize = util.noop def finalize(self): pass def items(self): yield Message.Version, 1 def skip(self, num): return 0 def config(self, key, default=None): return config.interpolate(self._cfgpath, key, default) def config2(self, key, key2, default=None, sentinel=util.SENTINEL): value = self.config(key, sentinel) if value is not sentinel: return value return self.config(key2, default) def config_deprecated(self, key, deprecated, default=None, sentinel=util.SENTINEL, history=set()): value = self.config(deprecated, sentinel) if value is not sentinel: if deprecated not in history: history.add(deprecated) self.log.warning("'%s' is deprecated. 
Use '%s' instead.", deprecated, key) default = value value = self.config(key, sentinel) if value is not sentinel: return value return default def config_accumulate(self, key): return config.accumulate(self._cfgpath, key) def config_instance(self, key, default=None): return default def _config_shared(self, key, default=None): return config.interpolate_common( ("extractor",), self._cfgpath, key, default) def _config_shared_accumulate(self, key): first = True extr = ("extractor",) for path in self._cfgpath: if first: first = False values = config.accumulate(extr + path, key) else: conf = config.get(extr, path[0]) if conf: values[:0] = config.accumulate( (self.subcategory,), key, conf=conf) return values def request(self, url, method="GET", session=None, retries=None, retry_codes=None, encoding=None, fatal=True, notfound=None, **kwargs): if session is None: session = self.session if retries is None: retries = self._retries if retry_codes is None: retry_codes = self._retry_codes if "proxies" not in kwargs: kwargs["proxies"] = self._proxies if "timeout" not in kwargs: kwargs["timeout"] = self._timeout if "verify" not in kwargs: kwargs["verify"] = self._verify if "json" in kwargs: json = kwargs["json"] if json is not None: kwargs["data"] = util.json_dumps(json).encode() del kwargs["json"] headers = kwargs.get("headers") if headers: headers["Content-Type"] = "application/json" else: kwargs["headers"] = {"Content-Type": "application/json"} response = None tries = 1 if self._interval: seconds = (self._interval() - (time.time() - Extractor.request_timestamp)) if seconds > 0.0: self.sleep(seconds, "request") while True: try: response = session.request(method, url, **kwargs) except requests.exceptions.ConnectionError as exc: code = 0 try: reason = exc.args[0].reason cls = reason.__class__.__name__ pre, _, err = str(reason.args[-1]).partition(":") msg = " {}: {}".format(cls, (err or pre).lstrip()) except Exception: msg = exc except (requests.exceptions.Timeout, requests.exceptions.ChunkedEncodingError, requests.exceptions.ContentDecodingError) as exc: msg = exc code = 0 except (requests.exceptions.RequestException) as exc: raise exception.HttpError(exc) else: code = response.status_code if self._write_pages: self._dump_response(response) if ( code < 400 or code < 500 and ( not fatal and code != 429 or fatal is None) or fatal is ... 
): if encoding: response.encoding = encoding return response if notfound and code == 404: raise exception.NotFoundError(notfound) msg = "'{} {}' for '{}'".format( code, response.reason, response.url) challenge = util.detect_challenge(response) if challenge is not None: self.log.warning(challenge) if code == 429 and self._handle_429(response): continue elif code == 429 and self._interval_429: pass elif code not in retry_codes and code < 500: break finally: Extractor.request_timestamp = time.time() self.log.debug("%s (%s/%s)", msg, tries, retries+1) if tries > retries: break seconds = tries if self._interval: s = self._interval() if seconds < s: seconds = s if code == 429 and self._interval_429: s = self._interval_429() if seconds < s: seconds = s self.wait(seconds=seconds, reason="429 Too Many Requests") else: self.sleep(seconds, "retry") tries += 1 raise exception.HttpError(msg, response) def request_location(self, url, **kwargs): kwargs.setdefault("method", "HEAD") kwargs.setdefault("allow_redirects", False) return self.request(url, **kwargs).headers.get("location", "") _handle_429 = util.false def wait(self, seconds=None, until=None, adjust=1.0, reason="rate limit"): now = time.time() if seconds: seconds = float(seconds) until = now + seconds elif until: if isinstance(until, datetime.datetime): # convert to UTC timestamp until = util.datetime_to_timestamp(until) else: until = float(until) seconds = until - now else: raise ValueError("Either 'seconds' or 'until' is required") seconds += adjust if seconds <= 0.0: return if reason: t = datetime.datetime.fromtimestamp(until).time() isotime = "{:02}:{:02}:{:02}".format(t.hour, t.minute, t.second) self.log.info("Waiting until %s (%s)", isotime, reason) time.sleep(seconds) def sleep(self, seconds, reason): self.log.debug("Sleeping %.2f seconds (%s)", seconds, reason) time.sleep(seconds) def input(self, prompt, echo=True): self._check_input_allowed(prompt) if echo: try: return input(prompt) except (EOFError, OSError): return None else: return getpass.getpass(prompt) def _check_input_allowed(self, prompt=""): input = self.config("input") if input is None: input = output.TTY_STDIN if not input: raise exception.StopExtraction( "User input required (%s)", prompt.strip(" :")) def _get_auth_info(self): """Return authentication information as (username, password) tuple""" username = self.config("username") password = None if username: password = self.config("password") if not password: self._check_input_allowed("password") password = util.LazyPrompt() elif self.config("netrc", False): try: info = netrc.netrc().authenticators(self.category) username, _, password = info except (OSError, netrc.NetrcParseError) as exc: self.log.error("netrc: %s", exc) except TypeError: self.log.warning("netrc: No authentication info") return username, password def _init(self): pass def _init_options(self): self._write_pages = self.config("write-pages", False) self._retry_codes = self.config("retry-codes") self._retries = self.config("retries", 4) self._timeout = self.config("timeout", 30) self._verify = self.config("verify", True) self._proxies = util.build_proxy_map(self.config("proxy"), self.log) self._interval = util.build_duration_func( self.config("sleep-request", self.request_interval), self.request_interval_min, ) self._interval_429 = util.build_duration_func( self.config("sleep-429", self.request_interval_429), ) if self._retries < 0: self._retries = float("inf") if not self._retry_codes: self._retry_codes = () def _init_session(self): self.session = session = 
requests.Session() headers = session.headers headers.clear() ssl_options = ssl_ciphers = 0 # .netrc Authorization headers are alwsays disabled session.trust_env = True if self.config("proxy-env", True) else False browser = self.config("browser") if browser is None: browser = self.browser if browser and isinstance(browser, str): browser, _, platform = browser.lower().partition(":") if not platform or platform == "auto": platform = ("Windows NT 10.0; Win64; x64" if util.WINDOWS else "X11; Linux x86_64") elif platform == "windows": platform = "Windows NT 10.0; Win64; x64" elif platform == "linux": platform = "X11; Linux x86_64" elif platform == "macos": platform = "Macintosh; Intel Mac OS X 11.5" if browser == "chrome": if platform.startswith("Macintosh"): platform = platform.replace(".", "_") + "_2" else: browser = "firefox" for key, value in HTTP_HEADERS[browser]: if value and "{}" in value: headers[key] = value.format(platform) else: headers[key] = value ssl_options |= (ssl.OP_NO_SSLv2 | ssl.OP_NO_SSLv3 | ssl.OP_NO_TLSv1 | ssl.OP_NO_TLSv1_1) ssl_ciphers = SSL_CIPHERS[browser] else: useragent = self.config("user-agent") if useragent is None or useragent == "auto": useragent = self.useragent elif useragent == "browser": useragent = _browser_useragent() elif self.useragent is not Extractor.useragent and \ useragent is config.get(("extractor",), "user-agent"): useragent = self.useragent headers["User-Agent"] = useragent headers["Accept"] = "*/*" headers["Accept-Language"] = "en-US,en;q=0.5" ssl_ciphers = self.ciphers if BROTLI: headers["Accept-Encoding"] = "gzip, deflate, br" else: headers["Accept-Encoding"] = "gzip, deflate" if ZSTD: headers["Accept-Encoding"] += ", zstd" referer = self.config("referer", self.referer) if referer: if isinstance(referer, str): headers["Referer"] = referer elif self.root: headers["Referer"] = self.root + "/" custom_headers = self.config("headers") if custom_headers: headers.update(custom_headers) custom_ciphers = self.config("ciphers") if custom_ciphers: if isinstance(custom_ciphers, list): ssl_ciphers = ":".join(custom_ciphers) else: ssl_ciphers = custom_ciphers source_address = self.config("source-address") if source_address: if isinstance(source_address, str): source_address = (source_address, 0) else: source_address = (source_address[0], source_address[1]) tls12 = self.config("tls12") if tls12 is None: tls12 = self.tls12 if not tls12: ssl_options |= ssl.OP_NO_TLSv1_2 self.log.debug("TLS 1.2 disabled.") adapter = _build_requests_adapter( ssl_options, ssl_ciphers, source_address) session.mount("https://", adapter) session.mount("http://", adapter) def _init_cookies(self): """Populate the session's cookiejar""" self.cookies = self.session.cookies self.cookies_file = None if self.cookies_domain is None: return cookies = self.config("cookies") if cookies: select = self.config("cookies-select") if select: if select == "rotate": cookies = cookies[self.cookies_index % len(cookies)] Extractor.cookies_index += 1 else: cookies = random.choice(cookies) self.cookies_load(cookies) def cookies_load(self, cookies_source): if isinstance(cookies_source, dict): self.cookies_update_dict(cookies_source, self.cookies_domain) elif isinstance(cookies_source, str): path = util.expand_path(cookies_source) try: with open(path) as fp: cookies = util.cookiestxt_load(fp) except Exception as exc: self.log.warning("cookies: %s", exc) else: self.log.debug("Loading cookies from '%s'", cookies_source) set_cookie = self.cookies.set_cookie for cookie in cookies: set_cookie(cookie) 
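                # keep the path around so cookies_store() can write
                # updated cookies back to this cookies.txt file later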
self.cookies_file = path elif isinstance(cookies_source, (list, tuple)): key = tuple(cookies_source) cookies = _browser_cookies.get(key) if cookies is None: from ..cookies import load_cookies try: cookies = load_cookies(cookies_source) except Exception as exc: self.log.warning("cookies: %s", exc) cookies = () else: _browser_cookies[key] = cookies else: self.log.debug("Using cached cookies from %s", key) set_cookie = self.cookies.set_cookie for cookie in cookies: set_cookie(cookie) else: self.log.warning( "Expected 'dict', 'list', or 'str' value for 'cookies' " "option, got '%s' (%s)", cookies_source.__class__.__name__, cookies_source) def cookies_store(self): """Store the session's cookies in a cookies.txt file""" export = self.config("cookies-update", True) if not export: return if isinstance(export, str): path = util.expand_path(export) else: path = self.cookies_file if not path: return path_tmp = path + ".tmp" try: with open(path_tmp, "w") as fp: util.cookiestxt_store(fp, self.cookies) os.replace(path_tmp, path) except OSError as exc: self.log.warning("cookies: %s", exc) def cookies_update(self, cookies, domain=""): """Update the session's cookiejar with 'cookies'""" if isinstance(cookies, dict): self.cookies_update_dict(cookies, domain or self.cookies_domain) else: set_cookie = self.cookies.set_cookie try: cookies = iter(cookies) except TypeError: set_cookie(cookies) else: for cookie in cookies: set_cookie(cookie) def cookies_update_dict(self, cookiedict, domain): """Update cookiejar with name-value pairs from a dict""" set_cookie = self.cookies.set for name, value in cookiedict.items(): set_cookie(name, value, domain=domain) def cookies_check(self, cookies_names, domain=None, subdomains=False): """Check if all 'cookies_names' are in the session's cookiejar""" if not self.cookies: return False if domain is None: domain = self.cookies_domain names = set(cookies_names) now = time.time() for cookie in self.cookies: if cookie.name not in names: continue if not domain or cookie.domain == domain: pass elif not subdomains or not cookie.domain.endswith(domain): continue if cookie.expires: diff = int(cookie.expires - now) if diff <= 0: self.log.warning( "Cookie '%s' has expired", cookie.name) continue elif diff <= 86400: hours = diff // 3600 self.log.warning( "Cookie '%s' will expire in less than %s hour%s", cookie.name, hours + 1, "s" if hours else "") names.discard(cookie.name) if not names: return True return False def _extract_jsonld(self, page): return util.json_loads(text.extr( page, '<script type="application/ld+json">', "</script>")) def _extract_nextdata(self, page): return util.json_loads(text.extr( page, ' id="__NEXT_DATA__" type="application/json">', "</script>")) def _prepare_ddosguard_cookies(self): if not self.cookies.get("__ddg2", domain=self.cookies_domain): self.cookies.set( "__ddg2", util.generate_token(), domain=self.cookies_domain) def _cache(self, func, maxage, keyarg=None): # return cache.DatabaseCacheDecorator(func, maxage, keyarg) return cache.DatabaseCacheDecorator(func, keyarg, maxage) def _cache_memory(self, func, maxage=None, keyarg=None): return cache.Memcache() def _get_date_min_max(self, dmin=None, dmax=None): """Retrieve and parse 'date-min' and 'date-max' config values""" def get(key, default): ts = self.config(key, default) if isinstance(ts, str): try: ts = int(datetime.datetime.strptime(ts, fmt).timestamp()) except ValueError as exc: self.log.warning("Unable to parse '%s': %s", key, exc) ts = default return ts fmt = self.config("date-format", 
"%Y-%m-%dT%H:%M:%S") return get("date-min", dmin), get("date-max", dmax) def _dispatch_extractors(self, extractor_data, default=()): """ """ extractors = { data[0].subcategory: data for data in extractor_data } include = self.config("include", default) or () if include == "all": include = extractors elif isinstance(include, str): include = include.replace(" ", "").split(",") result = [(Message.Version, 1)] for category in include: try: extr, url = extractors[category] except KeyError: self.log.warning("Invalid include '%s'", category) else: result.append((Message.Queue, url, {"_extractor": extr})) return iter(result) @classmethod def _dump(cls, obj): util.dump_json(obj, ensure_ascii=False, indent=2) def _dump_response(self, response, history=True): """Write the response content to a .dump file in the current directory. The file name is derived from the response url, replacing special characters with "_" """ if history: for resp in response.history: self._dump_response(resp, False) if hasattr(Extractor, "_dump_index"): Extractor._dump_index += 1 else: Extractor._dump_index = 1 Extractor._dump_sanitize = re.compile(r"[\\\\|/<>:\"?*&=#]+").sub fname = "{:>02}_{}".format( Extractor._dump_index, Extractor._dump_sanitize('_', response.url), ) if util.WINDOWS: path = os.path.abspath(fname)[:255] else: path = fname[:251] try: with open(path + ".txt", 'wb') as fp: util.dump_response( response, fp, headers=(self._write_pages in ("all", "ALL")), hide_auth=(self._write_pages != "ALL") ) self.log.info("Writing '%s' response to '%s'", response.url, path + ".txt") except Exception as e: self.log.warning("Failed to dump HTTP request (%s: %s)", e.__class__.__name__, e) class GalleryExtractor(Extractor): subcategory = "gallery" filename_fmt = "{category}_{gallery_id}_{num:>03}.{extension}" directory_fmt = ("{category}", "{gallery_id} {title}") archive_fmt = "{gallery_id}_{num}" enum = "num" def __init__(self, match, url=None): Extractor.__init__(self, match) self.gallery_url = self.root + self.groups[0] if url is None else url def items(self): self.login() if self.gallery_url: page = self.request( self.gallery_url, notfound=self.subcategory).text else: page = None data = self.metadata(page) imgs = self.images(page) if "count" in data: if self.config("page-reverse"): images = util.enumerate_reversed(imgs, 1, data["count"]) else: images = zip( range(1, data["count"]+1), imgs, ) else: enum = enumerate try: data["count"] = len(imgs) except TypeError: pass else: if self.config("page-reverse"): enum = util.enumerate_reversed images = enum(imgs, 1) yield Message.Directory, data for data[self.enum], (url, imgdata) in images: if imgdata: data.update(imgdata) if "extension" not in imgdata: text.nameext_from_url(url, data) else: text.nameext_from_url(url, data) yield Message.Url, url, data def login(self): """Login and set necessary cookies""" def metadata(self, page): """Return a dict with general metadata""" def images(self, page): """Return a list of all (image-url, metadata)-tuples""" class ChapterExtractor(GalleryExtractor): subcategory = "chapter" directory_fmt = ( "{category}", "{manga}", "{volume:?v/ />02}c{chapter:>03}{chapter_minor:?//}{title:?: //}") filename_fmt = ( "{manga}_c{chapter:>03}{chapter_minor:?//}_{page:>03}.{extension}") archive_fmt = ( "{manga}_{chapter}{chapter_minor}_{page}") enum = "page" class MangaExtractor(Extractor): subcategory = "manga" categorytransfer = True chapterclass = None reverse = True def __init__(self, match, url=None): Extractor.__init__(self, match) self.manga_url = 
self.root + self.groups[0] if url is None else url if self.config("chapter-reverse", False): self.reverse = not self.reverse def items(self): self.login() if self.manga_url: page = self.request(self.manga_url, notfound=self.subcategory).text else: page = None chapters = self.chapters(page) if self.reverse: chapters.reverse() for chapter, data in chapters: data["_extractor"] = self.chapterclass yield Message.Queue, chapter, data def login(self): """Login and set necessary cookies""" def chapters(self, page): """Return a list of all (chapter-url, metadata)-tuples""" class AsynchronousMixin(): """Run info extraction in a separate thread""" def __iter__(self): self.initialize() messages = queue.Queue(5) thread = threading.Thread( target=self.async_items, args=(messages,), daemon=True, ) thread.start() while True: msg = messages.get() if msg is None: thread.join() return if isinstance(msg, Exception): thread.join() raise msg yield msg messages.task_done() def async_items(self, messages): try: for msg in self.items(): messages.put(msg) except Exception as exc: messages.put(exc) messages.put(None) class BaseExtractor(Extractor): instances = () def __init__(self, match): if not self.category: self.groups = match.groups() self.match = match self._init_category() Extractor.__init__(self, match) def _init_category(self): for index, group in enumerate(self.groups): if group is not None: if index: self.category, self.root, info = self.instances[index-1] if not self.root: self.root = text.root_from_url(self.match.group(0)) self.config_instance = info.get else: self.root = group self.category = group.partition("://")[2] break @classmethod def update(cls, instances): extra_instances = config.get(("extractor",), cls.basecategory) if extra_instances: for category, info in extra_instances.items(): if isinstance(info, dict) and "root" in info: instances[category] = info pattern_list = [] instance_list = cls.instances = [] for category, info in instances.items(): root = info["root"] if root: root = root.rstrip("/") instance_list.append((category, root, info)) pattern = info.get("pattern") if not pattern: pattern = re.escape(root[root.index(":") + 3:]) pattern_list.append(pattern + "()") return ( r"(?:" + cls.basecategory + r":(https?://[^/?#]+)|" r"(?:https?://)?(?:" + "|".join(pattern_list) + r"))" ) class RequestsAdapter(HTTPAdapter): def __init__(self, ssl_context=None, source_address=None): self.ssl_context = ssl_context self.source_address = source_address HTTPAdapter.__init__(self) def init_poolmanager(self, *args, **kwargs): kwargs["ssl_context"] = self.ssl_context kwargs["source_address"] = self.source_address return HTTPAdapter.init_poolmanager(self, *args, **kwargs) def proxy_manager_for(self, *args, **kwargs): kwargs["ssl_context"] = self.ssl_context kwargs["source_address"] = self.source_address return HTTPAdapter.proxy_manager_for(self, *args, **kwargs) def _build_requests_adapter(ssl_options, ssl_ciphers, source_address): key = (ssl_options, ssl_ciphers, source_address) try: return _adapter_cache[key] except KeyError: pass if ssl_options or ssl_ciphers: ssl_context = urllib3.connection.create_urllib3_context( options=ssl_options or None, ciphers=ssl_ciphers) if not requests.__version__ < "2.32": # https://github.com/psf/requests/pull/6731 ssl_context.load_verify_locations(requests.certs.where()) ssl_context.check_hostname = False else: ssl_context = None adapter = _adapter_cache[key] = RequestsAdapter( ssl_context, source_address) return adapter @cache.cache(maxage=86400) def 
_browser_useragent(): """Get User-Agent header from default browser""" import webbrowser import socket server = socket.socket(socket.AF_INET, socket.SOCK_STREAM) server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) server.bind(("127.0.0.1", 0)) server.listen(1) host, port = server.getsockname() webbrowser.open("http://{}:{}/user-agent".format(host, port)) client = server.accept()[0] server.close() for line in client.recv(1024).split(b"\r\n"): key, _, value = line.partition(b":") if key.strip().lower() == b"user-agent": useragent = value.strip() break else: useragent = b"" client.send(b"HTTP/1.1 200 OK\r\n\r\n" + useragent) client.close() return useragent.decode() _adapter_cache = {} _browser_cookies = {} HTTP_HEADERS = { "firefox": ( ("User-Agent", "Mozilla/5.0 ({}; " "rv:128.0) Gecko/20100101 Firefox/128.0"), ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9," "image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8"), ("Accept-Language", "en-US,en;q=0.5"), ("Accept-Encoding", None), ("Referer", None), ("Connection", "keep-alive"), ("Upgrade-Insecure-Requests", "1"), ("Cookie", None), ("Sec-Fetch-Dest", "empty"), ("Sec-Fetch-Mode", "no-cors"), ("Sec-Fetch-Site", "same-origin"), ("TE", "trailers"), ), "chrome": ( ("Connection", "keep-alive"), ("Upgrade-Insecure-Requests", "1"), ("User-Agent", "Mozilla/5.0 ({}) AppleWebKit/537.36 (KHTML, " "like Gecko) Chrome/111.0.0.0 Safari/537.36"), ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9," "image/avif,image/webp,image/apng,*/*;q=0.8," "application/signed-exchange;v=b3;q=0.7"), ("Referer", None), ("Sec-Fetch-Site", "same-origin"), ("Sec-Fetch-Mode", "no-cors"), ("Sec-Fetch-Dest", "empty"), ("Accept-Encoding", None), ("Accept-Language", "en-US,en;q=0.9"), ("cookie", None), ("content-length", None), ), } SSL_CIPHERS = { "firefox": ( "TLS_AES_128_GCM_SHA256:" "TLS_CHACHA20_POLY1305_SHA256:" "TLS_AES_256_GCM_SHA384:" "ECDHE-ECDSA-AES128-GCM-SHA256:" "ECDHE-RSA-AES128-GCM-SHA256:" "ECDHE-ECDSA-CHACHA20-POLY1305:" "ECDHE-RSA-CHACHA20-POLY1305:" "ECDHE-ECDSA-AES256-GCM-SHA384:" "ECDHE-RSA-AES256-GCM-SHA384:" "ECDHE-ECDSA-AES256-SHA:" "ECDHE-ECDSA-AES128-SHA:" "ECDHE-RSA-AES128-SHA:" "ECDHE-RSA-AES256-SHA:" "AES128-GCM-SHA256:" "AES256-GCM-SHA384:" "AES128-SHA:" "AES256-SHA" ), "chrome": ( "TLS_AES_128_GCM_SHA256:" "TLS_AES_256_GCM_SHA384:" "TLS_CHACHA20_POLY1305_SHA256:" "ECDHE-ECDSA-AES128-GCM-SHA256:" "ECDHE-RSA-AES128-GCM-SHA256:" "ECDHE-ECDSA-AES256-GCM-SHA384:" "ECDHE-RSA-AES256-GCM-SHA384:" "ECDHE-ECDSA-CHACHA20-POLY1305:" "ECDHE-RSA-CHACHA20-POLY1305:" "ECDHE-RSA-AES128-SHA:" "ECDHE-RSA-AES256-SHA:" "AES128-GCM-SHA256:" "AES256-GCM-SHA384:" "AES128-SHA:" "AES256-SHA" ), } # disable Basic Authorization header injection from .netrc data try: requests.sessions.get_netrc_auth = lambda _: None except Exception: pass # detect brotli support try: BROTLI = urllib3.response.brotli is not None except AttributeError: BROTLI = False # detect zstandard support try: ZSTD = urllib3.response.HAS_ZSTD except AttributeError: ZSTD = False # set (urllib3) warnings filter action = config.get((), "warnings", "default") if action: try: import warnings warnings.simplefilter(action, urllib3.exceptions.HTTPWarning) except Exception: pass del action ����������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1745260818.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/cyberdrop.py�������������������������������������������������0000644�0001750�0001750�00000006074�15001510422�021115� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://cyberdrop.me/""" from . import lolisafe from .common import Message from .. import text BASE_PATTERN = r"(?:https?://)?(?:www\.)?cyberdrop\.(?:me|to)" class CyberdropAlbumExtractor(lolisafe.LolisafeAlbumExtractor): """Extractor for cyberdrop albums""" category = "cyberdrop" root = "https://cyberdrop.me" root_api = "https://api.cyberdrop.me" pattern = BASE_PATTERN + r"/a/([^/?#]+)" example = "https://cyberdrop.me/a/ID" def items(self): files, data = self.fetch_album(self.album_id) yield Message.Directory, data for data["num"], file in enumerate(files, 1): file.update(data) text.nameext_from_url(file["name"], file) file["name"], sep, file["id"] = file["filename"].rpartition("-") yield Message.Url, file["url"], file def fetch_album(self, album_id): url = "{}/a/{}".format(self.root, album_id) page = self.request(url).text extr = text.extract_from(page) desc = extr('property="og:description" content="', '"') if desc.startswith("A privacy-focused censorship-resistant file " "sharing platform free for everyone."): desc = "" extr('id="title"', "") album = { "album_id" : album_id, "album_name" : text.unescape(extr('title="', '"')), "album_size" : text.parse_bytes(extr( '<p class="title">', "B")), "date" : text.parse_datetime(extr( '<p class="title">', '<'), "%d.%m.%Y"), "description": text.unescape(text.unescape( # double desc.rpartition(" [R")[0])), } file_ids = list(text.extract_iter(page, 'id="file" href="/f/', '"')) album["count"] = len(file_ids) return self._extract_files(file_ids), album def _extract_files(self, file_ids): for file_id in file_ids: try: url = "{}/api/file/info/{}".format(self.root_api, file_id) file = self.request(url).json() auth = self.request(file["auth_url"]).json() file["url"] = auth["url"] except Exception as exc: self.log.warning("%s (%s: %s)", file_id, exc.__class__.__name__, exc) continue yield file class CyberdropMediaExtractor(CyberdropAlbumExtractor): """Extractor for cyberdrop 
media links""" subcategory = "media" directory_fmt = ("{category}",) pattern = BASE_PATTERN + r"/f/([^/?#]+)" example = "https://cyberdrop.me/f/ID" def fetch_album(self, album_id): return self._extract_files((album_id,)), { "album_id" : "", "album_name" : "", "album_size" : -1, "description": "", "count" : 1, } ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1746944935.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/danbooru.py��������������������������������������������������0000644�0001750�0001750�00000032212�15010041647�020736� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://danbooru.donmai.us/ and other Danbooru instances""" from .common import BaseExtractor, Message from .. 
import text, util import datetime class DanbooruExtractor(BaseExtractor): """Base class for danbooru extractors""" basecategory = "Danbooru" filename_fmt = "{category}_{id}_{filename}.{extension}" page_limit = 1000 page_start = None per_page = 200 useragent = util.USERAGENT request_interval = (0.5, 1.5) def _init(self): self.ugoira = self.config("ugoira", False) self.external = self.config("external", False) self.includes = False threshold = self.config("threshold") if isinstance(threshold, int): self.threshold = 1 if threshold < 1 else threshold else: self.threshold = self.per_page - 20 username, api_key = self._get_auth_info() if username: self.log.debug("Using HTTP Basic Auth for user '%s'", username) self.session.auth = util.HTTPBasicAuth(username, api_key) def skip(self, num): pages = num // self.per_page if pages >= self.page_limit: pages = self.page_limit - 1 self.page_start = pages + 1 return pages * self.per_page def items(self): # 'includes' initialization must be done here and not in '_init()' # or it'll cause an exception with e621 when 'metadata' is enabled includes = self.config("metadata") if includes: if isinstance(includes, (list, tuple)): includes = ",".join(includes) elif not isinstance(includes, str): includes = "artist_commentary,children,notes,parent,uploader" self.includes = includes + ",id" data = self.metadata() for post in self.posts(): try: url = post["file_url"] except KeyError: if self.external and post["source"]: post.update(data) yield Message.Directory, post yield Message.Queue, post["source"], post continue text.nameext_from_url(url, post) post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z") post["tags"] = ( post["tag_string"].split(" ") if post["tag_string"] else ()) post["tags_artist"] = ( post["tag_string_artist"].split(" ") if post["tag_string_artist"] else ()) post["tags_character"] = ( post["tag_string_character"].split(" ") if post["tag_string_character"] else ()) post["tags_copyright"] = ( post["tag_string_copyright"].split(" ") if post["tag_string_copyright"] else ()) post["tags_general"] = ( post["tag_string_general"].split(" ") if post["tag_string_general"] else ()) post["tags_meta"] = ( post["tag_string_meta"].split(" ") if post["tag_string_meta"] else ()) if post["extension"] == "zip": if self.ugoira: post["_ugoira_original"] = False post["_ugoira_frame_data"] = post["frames"] = \ self._ugoira_frames(post) post["_http_adjust_extension"] = False else: url = post["large_file_url"] post["extension"] = "webm" if url[0] == "/": url = self.root + url post.update(data) yield Message.Directory, post yield Message.Url, url, post def items_artists(self): for artist in self.artists(): artist["_extractor"] = DanbooruTagExtractor url = "{}/posts?tags={}".format( self.root, text.quote(artist["name"])) yield Message.Queue, url, artist def metadata(self): return () def posts(self): return () def _pagination(self, endpoint, params, prefix=None): url = self.root + endpoint params["limit"] = self.per_page params["page"] = self.page_start first = True while True: posts = self.request(url, params=params).json() if isinstance(posts, dict): posts = posts["posts"] if posts: if self.includes: params_meta = { "only" : self.includes, "limit": len(posts), "tags" : "id:" + ",".join(str(p["id"]) for p in posts), } data = { meta["id"]: meta for meta in self.request( url, params=params_meta).json() } for post in posts: post.update(data[post["id"]]) if prefix == "a" and not first: posts.reverse() yield from posts if len(posts) < self.threshold: 
return if prefix: params["page"] = "{}{}".format(prefix, posts[-1]["id"]) elif params["page"]: params["page"] += 1 else: params["page"] = 2 first = False def _ugoira_frames(self, post): data = self.request("{}/posts/{}.json?only=media_metadata".format( self.root, post["id"]) ).json()["media_metadata"]["metadata"] ext = data["ZIP:ZipFileName"].rpartition(".")[2] fmt = ("{:>06}." + ext).format delays = data["Ugoira:FrameDelays"] return [{"file": fmt(index), "delay": delay} for index, delay in enumerate(delays)] def _collection_posts(self, cid, ctype): reverse = prefix = None order = self.config("order-posts") if not order or order in {"asc", "pool", "pool_asc", "asc_pool"}: params = {"tags": "ord{}:{}".format(ctype, cid)} elif order in {"id", "desc_id", "id_desc"}: params = {"tags": "{}:{}".format(ctype, cid)} prefix = "b" elif order in {"desc", "desc_pool", "pool_desc"}: params = {"tags": "ord{}:{}".format(ctype, cid)} reverse = True elif order in {"asc_id", "id_asc"}: params = {"tags": "{}:{}".format(ctype, cid)} reverse = True posts = self._pagination("/posts.json", params, prefix) if reverse: self.log.info("Collecting posts of %s %s", ctype, cid) return self._collection_enumerate_reverse(posts) else: return self._collection_enumerate(posts) def _collection_metadata(self, cid, ctype, cname=None): url = "{}/{}s/{}.json".format(self.root, cname or ctype, cid) collection = self.request(url).json() collection["name"] = collection["name"].replace("_", " ") self.post_ids = collection.pop("post_ids", ()) return {ctype: collection} def _collection_enumerate(self, posts): pid_to_num = {pid: num for num, pid in enumerate(self.post_ids, 1)} for post in posts: post["num"] = pid_to_num[post["id"]] yield post def _collection_enumerate_reverse(self, posts): posts = list(posts) posts.reverse() pid_to_num = {pid: num for num, pid in enumerate(self.post_ids, 1)} for post in posts: post["num"] = pid_to_num[post["id"]] return posts BASE_PATTERN = DanbooruExtractor.update({ "danbooru": { "root": None, "pattern": r"(?:(?:danbooru|hijiribe|sonohara|safebooru)\.donmai\.us" r"|donmai\.moe)", }, "atfbooru": { "root": "https://booru.allthefallen.moe", "pattern": r"booru\.allthefallen\.moe", }, "aibooru": { "root": None, "pattern": r"(?:safe\.)?aibooru\.online", }, "booruvar": { "root": "https://booru.borvar.art", "pattern": r"booru\.borvar\.art", }, }) class DanbooruTagExtractor(DanbooruExtractor): """Extractor for danbooru posts from tag searches""" subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/posts\?(?:[^&#]*&)*tags=([^&#]*)" example = "https://danbooru.donmai.us/posts?tags=TAG" def metadata(self): self.tags = text.unquote(self.groups[-1].replace("+", " ")) return {"search_tags": self.tags} def posts(self): prefix = "b" for tag in self.tags.split(): if tag.startswith("order:"): if tag == "order:id" or tag == "order:id_asc": prefix = "a" elif tag == "order:id_desc": prefix = "b" else: prefix = None elif tag.startswith( ("id:", "md5:", "ordfav:", "ordfavgroup:", "ordpool:")): prefix = None break return self._pagination("/posts.json", {"tags": self.tags}, prefix) class DanbooruPoolExtractor(DanbooruExtractor): """Extractor for Danbooru pools""" subcategory = "pool" directory_fmt = ("{category}", "pool", "{pool[id]} {pool[name]}") filename_fmt = "{num:>04}_{id}_{filename}.{extension}" archive_fmt = "p_{pool[id]}_{id}" pattern = BASE_PATTERN + r"/pool(?:s|/show)/(\d+)" example = "https://danbooru.donmai.us/pools/12345" def 
metadata(self): self.pool_id = self.groups[-1] return self._collection_metadata(self.pool_id, "pool") def posts(self): return self._collection_posts(self.pool_id, "pool") class DanbooruFavgroupExtractor(DanbooruExtractor): """Extractor for Danbooru favorite groups""" subcategory = "favgroup" directory_fmt = ("{category}", "Favorite Groups", "{favgroup[id]} {favgroup[name]}") filename_fmt = "{num:>04}_{id}_{filename}.{extension}" archive_fmt = "fg_{favgroup[id]}_{id}" pattern = BASE_PATTERN + r"/favorite_group(?:s|/show)/(\d+)" example = "https://danbooru.donmai.us/favorite_groups/12345" def metadata(self): return self._collection_metadata( self.groups[-1], "favgroup", "favorite_group") def posts(self): return self._collection_posts(self.groups[-1], "favgroup") class DanbooruPostExtractor(DanbooruExtractor): """Extractor for single danbooru posts""" subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/post(?:s|/show)/(\d+)" example = "https://danbooru.donmai.us/posts/12345" def posts(self): url = "{}/posts/{}.json".format(self.root, self.groups[-1]) post = self.request(url).json() if self.includes: params = {"only": self.includes} post.update(self.request(url, params=params).json()) return (post,) class DanbooruPopularExtractor(DanbooruExtractor): """Extractor for popular images from danbooru""" subcategory = "popular" directory_fmt = ("{category}", "popular", "{scale}", "{date}") archive_fmt = "P_{scale[0]}_{date}_{id}" pattern = BASE_PATTERN + r"/(?:explore/posts/)?popular(?:\?([^#]*))?" example = "https://danbooru.donmai.us/explore/posts/popular" def metadata(self): self.params = params = text.parse_query(self.groups[-1]) scale = params.get("scale", "day") date = params.get("date") or datetime.date.today().isoformat() if scale == "week": date = datetime.date.fromisoformat(date) date = (date - datetime.timedelta(days=date.weekday())).isoformat() elif scale == "month": date = date[:-3] return {"date": date, "scale": scale} def posts(self): return self._pagination("/explore/posts/popular.json", self.params) class DanbooruArtistExtractor(DanbooruExtractor): """Extractor for danbooru artists""" subcategory = "artist" pattern = BASE_PATTERN + r"/artists/(\d+)" example = "https://danbooru.donmai.us/artists/12345" items = DanbooruExtractor.items_artists def artists(self): url = "{}/artists/{}.json".format(self.root, self.groups[-1]) return (self.request(url).json(),) class DanbooruArtistSearchExtractor(DanbooruExtractor): """Extractor for danbooru artist searches""" subcategory = "artist-search" pattern = BASE_PATTERN + r"/artists/?\?([^#]+)" example = "https://danbooru.donmai.us/artists?QUERY" items = DanbooruExtractor.items_artists def artists(self): url = self.root + "/artists.json" params = text.parse_query(self.groups[-1]) params["page"] = text.parse_int(params.get("page"), 1) while True: artists = self.request(url, params=params).json() yield from artists if len(artists) < 20: return params["page"] += 1 ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1743510441.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/desktopography.py��������������������������������������������0000644�0001750�0001750�00000006111�14772755651�022214� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://desktopography.net/""" from .common import Extractor, Message from .. import text BASE_PATTERN = r"(?:https?://)?desktopography\.net" class DesktopographyExtractor(Extractor): """Base class for desktopography extractors""" category = "desktopography" archive_fmt = "{filename}" root = "https://desktopography.net" class DesktopographySiteExtractor(DesktopographyExtractor): """Extractor for all desktopography exhibitions """ subcategory = "site" pattern = BASE_PATTERN + r"/$" example = "https://desktopography.net/" def items(self): page = self.request(self.root).text data = {"_extractor": DesktopographyExhibitionExtractor} for exhibition_year in text.extract_iter( page, '<a href="https://desktopography.net/exhibition-', '/">'): url = self.root + "/exhibition-" + exhibition_year + "/" yield Message.Queue, url, data class DesktopographyExhibitionExtractor(DesktopographyExtractor): """Extractor for a yearly desktopography exhibition""" subcategory = "exhibition" pattern = BASE_PATTERN + r"/exhibition-([^/?#]+)/" example = "https://desktopography.net/exhibition-2020/" def __init__(self, match): DesktopographyExtractor.__init__(self, match) self.year = match.group(1) def items(self): url = "{}/exhibition-{}/".format(self.root, self.year) base_entry_url = "https://desktopography.net/portfolios/" page = self.request(url).text data = { "_extractor": DesktopographyEntryExtractor, "year": self.year, } for entry_url in text.extract_iter( page, '<a class="overlay-background" href="' + base_entry_url, '">'): url = base_entry_url + entry_url yield Message.Queue, url, data class DesktopographyEntryExtractor(DesktopographyExtractor): """Extractor for all resolutions of a desktopography wallpaper""" subcategory = "entry" pattern = BASE_PATTERN + r"/portfolios/([\w-]+)" example = "https://desktopography.net/portfolios/NAME/" def __init__(self, match): DesktopographyExtractor.__init__(self, match) self.entry = match.group(1) def 
items(self): url = "{}/portfolios/{}".format(self.root, self.entry) page = self.request(url).text entry_data = {"entry": self.entry} yield Message.Directory, entry_data for image_data in text.extract_iter( page, '<a target="_blank" href="https://desktopography.net', '">'): path, _, filename = image_data.partition( '" class="wallpaper-button" download="') text.nameext_from_url(filename, entry_data) yield Message.Url, self.root + path, entry_data �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1747857760.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.29.7/gallery_dl/extractor/deviantart.py������������������������������������������������0000644�0001750�0001750�00000245454�15013430540�021301� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.deviantart.com/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache, memcache import collections import mimetypes import binascii import time import re BASE_PATTERN = ( r"(?:https?://)?(?:" r"(?:www\.)?(?:fx)?deviantart\.com/(?!watch/)([\w-]+)|" r"(?!www\.)([\w-]+)\.(?:fx)?deviantart\.com)" ) DEFAULT_AVATAR = "https://a.deviantart.net/avatars/default.gif" class DeviantartExtractor(Extractor): """Base class for deviantart extractors""" category = "deviantart" root = "https://www.deviantart.com" directory_fmt = ("{category}", "{username}") filename_fmt = "{category}_{index}_{title}.{extension}" cookies_domain = ".deviantart.com" cookies_names = ("auth", "auth_secure", "userinfo") _last_request = 0 def __init__(self, match): Extractor.__init__(self, match) self.user = (match.group(1) or match.group(2) or "").lower() self.offset = 0 def _init(self): self.jwt = self.config("jwt", False) self.flat = self.config("flat", True) self.extra = self.config("extra", False) self.quality = self.config("quality", "100") self.original = self.config("original", True) self.previews = self.config("previews", False) self.intermediary = self.config("intermediary", True) self.comments_avatars = self.config("comments-avatars", False) self.comments = self.comments_avatars or self.config("comments", False) self.api = DeviantartOAuthAPI(self) self.eclipse_api = None self.group = False self._premium_cache = {} unwatch = self.config("auto-unwatch") if unwatch: self.unwatch = [] self.finalize = self._unwatch_premium else: self.unwatch = None if self.quality: if self.quality == "png": self.quality = "-fullview.png?" self.quality_sub = re.compile(r"-fullview\.[a-z0-9]+\?").sub else: self.quality = ",q_{}".format(self.quality) self.quality_sub = re.compile(r",q_\d+").sub if isinstance(self.original, str) and \ self.original.lower().startswith("image"): self.original = True self._update_content = self._update_content_image else: self._update_content = self._update_content_default if self.previews == "all": self.previews_images = self.previews = True else: self.previews_images = False journals = self.config("journals", "html") if journals == "html": self.commit_journal = self._commit_journal_html elif journals == "text": self.commit_journal = self._commit_journal_text else: self.commit_journal = None def request(self, url, **kwargs): if "fatal" not in kwargs: kwargs["fatal"] = False while True: response = Extractor.request(self, url, **kwargs) if response.status_code != 403 or \ b"Request blocked." 
not in response.content: return response self.wait(seconds=300, reason="CloudFront block") def skip(self, num): self.offset += num return num def login(self): if self.cookies_check(self.cookies_names): return True username, password = self._get_auth_info() if username: self.cookies_update(_login_impl(self, username, password)) return True def items(self): if self.user: group = self.config("group", True) if group: user = _user_details(self, self.user) if user: self.user = user["username"] self.group = False elif group == "skip": self.log.info("Skipping group '%s'", self.user) raise exception.StopExtraction() else: self.subcategory = "group-" + self.subcategory self.group = True for deviation in self.deviations(): if isinstance(deviation, tuple): url, data = deviation yield Message.Queue, url, data continue if deviation["is_deleted"]: # prevent crashing in case the deviation really is # deleted self.log.debug( "Skipping %s (deleted)", deviation["deviationid"]) continue tier_access = deviation.get("tier_access") if tier_access == "locked": self.log.debug( "Skipping %s (access locked)", deviation["deviationid"]) continue if "premium_folder_data" in deviation: data = self._fetch_premium(deviation) if not data: continue deviation.update(data) self.prepare(deviation) yield Message.Directory, deviation if "content" in deviation: content = self._extract_content(deviation) yield self.commit(deviation, content) elif deviation["is_downloadable"]: content = self.api.deviation_download(deviation["deviationid"]) deviation["is_original"] = True yield self.commit(deviation, content) if "videos" in deviation and deviation["videos"]: video = max(deviation["videos"], key=lambda x: text.parse_int(x["quality"][:-1])) deviation["is_original"] = False yield self.commit(deviation, video) if "flash" in deviation: deviation["is_original"] = True yield self.commit(deviation, deviation["flash"]) if self.commit_journal: journal = self._extract_journal(deviation) if journal: if self.extra: deviation["_journal"] = journal["html"] deviation["is_original"] = True yield self.commit_journal(deviation, journal) if self.comments_avatars: for comment in deviation["comments"]: user = comment["user"] name = user["username"].lower() if user["usericon"] == DEFAULT_AVATAR: self.log.debug( "Skipping avatar of '%s' (default)", name) continue _user_details.update(name, user) url = "{}/{}/avatar/".format(self.root, name) comment["_extractor"] = DeviantartAvatarExtractor yield Message.Queue, url, comment if self.previews and "preview" in deviation: preview = deviation["preview"] deviation["is_preview"] = True if self.previews_images: yield self.commit(deviation, preview) else: mtype = mimetypes.guess_type( "a." 
+ deviation["extension"], False)[0] if mtype and not mtype.startswith("image/"): yield self.commit(deviation, preview) del deviation["is_preview"] if not self.extra: continue # ref: https://www.deviantart.com # /developers/http/v1/20210526/object/editor_text # the value of "features" is a JSON string with forward # slashes escaped text_content = \ deviation["text_content"]["body"]["features"].replace( "\\/", "/") if "text_content" in deviation else None for txt in (text_content, deviation.get("description"), deviation.get("_journal")): if txt is None: continue for match in DeviantartStashExtractor.pattern.finditer(txt): url = text.ensure_http_scheme(match.group(0)) deviation["_extractor"] = DeviantartStashExtractor yield Message.Queue, url, deviation def deviations(self): """Return an iterable containing all relevant Deviation-objects""" def prepare(self, deviation): """Adjust the contents of a Deviation-object""" if "index" not in deviation: try: if deviation["url"].startswith(( "https://www.deviantart.com/stash/", "https://sta.sh", )): filename = deviation["content"]["src"].split("/")[5] deviation["index_base36"] = filename.partition("-")[0][1:] deviation["index"] = id_from_base36( deviation["index_base36"]) else: deviation["index"] = text.parse_int( deviation["url"].rpartition("-")[2]) except KeyError: deviation["index"] = 0 deviation["index_base36"] = "0" if "index_base36" not in deviation: deviation["index_base36"] = base36_from_id(deviation["index"]) if self.user: deviation["username"] = self.user deviation["_username"] = self.user.lower() else: deviation["username"] = deviation["author"]["username"] deviation["_username"] = deviation["username"].lower() deviation["published_time"] = text.parse_int( deviation["published_time"]) deviation["date"] = text.parse_timestamp( deviation["published_time"]) if self.comments: deviation["comments"] = ( self._extract_comments(deviation["deviationid"], "deviation") if deviation["stats"]["comments"] else () ) # filename metadata sub = re.compile(r"\W").sub deviation["filename"] = "".join(( sub("_", deviation["title"].lower()), "_by_", sub("_", deviation["author"]["username"].lower()), "-d", deviation["index_base36"], )) @staticmethod def commit(deviation, target): url = target["src"] name = target.get("filename") or url target = target.copy() target["filename"] = deviation["filename"] deviation["target"] = target deviation["extension"] = target["extension"] = text.ext_from_url(name) if "is_original" not in deviation: deviation["is_original"] = ("/v1/" not in url) return Message.Url, url, deviation def _commit_journal_html(self, deviation, journal): title = text.escape(deviation["title"]) url = deviation["url"] thumbs = deviation.get("thumbs") or deviation.get("files") html = journal["html"] shadow = SHADOW_TEMPLATE.format_map(thumbs[0]) if thumbs else "" if not html: self.log.warning("%s: Empty journal content", deviation["index"]) if "css" in journal: css, cls = journal["css"], "withskin" elif html.startswith("<style"): css, _, html = html.partition("</style>") css = css.partition(">")[2] cls = "withskin" else: css, cls = "", "journal-green" if html.find('<div class="boxtop journaltop">', 0, 250) != -1: needle = '<div class="boxtop journaltop">' header = HEADER_CUSTOM_TEMPLATE.format( title=title, url=url, date=deviation["date"], ) else: needle = '<div usr class="gr">' username = deviation["author"]["username"] urlname = deviation.get("username") or username.lower() header = HEADER_TEMPLATE.format( title=title, url=url, 
userurl="{}/{}/".format(self.root, urlname), username=username, date=deviation["date"], ) if needle in html: html = html.replace(needle, header, 1) else: html = JOURNAL_TEMPLATE_HTML_EXTRA.format(header, html) html = JOURNAL_TEMPLATE_HTML.format( title=title, html=html, shadow=shadow, css=css, cls=cls) deviation["extension"] = "htm" return Message.Url, html, deviation def _commit_journal_text(self, deviation, journal): html = journal["html"] if not html: self.log.warning("%s: Empty journal content", deviation["index"]) elif html.startswith("<style"): html = html.partition("</style>")[2] head, _, tail = html.rpartition("<script") content = "\n".join( text.unescape(text.remove_html(txt)) for txt in (head or tail).split("<br />") ) txt = JOURNAL_TEMPLATE_TEXT.format( title=deviation["title"], username=deviation["author"]["username"], date=deviation["date"], content=content, ) deviation["extension"] = "txt" return Message.Url, txt, deviation def _extract_journal(self, deviation): if "excerpt" in deviation: # # empty 'html' # return self.api.deviation_content(deviation["deviationid"]) if "_page" in deviation: page = deviation["_page"] del deviation["_page"] else: page = self._limited_request(deviation["url"]).text # extract journal html from webpage html = text.extr( page, "<h2>Literature Text</h2></span><div>", "</div></section></div></div>") if html: return {"html": html} self.log.debug("%s: Failed to extract journal HTML from webpage. " "Falling back to __INITIAL_STATE__ markup.", deviation["index"]) # parse __INITIAL_STATE__ as fallback state = util.json_loads(text.extr( page, 'window.__INITIAL_STATE__ = JSON.parse("', '");') .replace("\\\\", "\\").replace("\\'", "'").replace('\\"', '"')) deviations = state["@@entities"]["deviation"] content = deviations.popitem()[1]["textContent"] html = self._textcontent_to_html(deviation, content) if html: return {"html": html} return {"html": content["excerpt"].replace("\n", "<br />")} if "body" in deviation: return {"html": deviation.pop("body")} return None def _textcontent_to_html(self, deviation, content): html = content["html"] markup = html.get("markup") if not markup or markup[0] != "{": return markup if html["type"] == "tiptap": try: return self._tiptap_to_html(markup) except Exception as exc: self.log.debug("", exc_info=exc) self.log.error("%s: '%s: %s'", deviation["index"], exc.__class__.__name__, exc) self.log.warning("%s: Unsupported '%s' markup.", deviation["index"], html["type"]) def _tiptap_to_html(self, markup): html = [] html.append('<div data-editor-viewer="1" ' 'class="_83r8m _2CKTq _3NjDa mDnFl">') data = util.json_loads(markup) for block in data["document"]["content"]: self._tiptap_process_content(html, block) html.append("</div>") return "".join(html) def _tiptap_process_content(self, html, content): type = content["type"] if type == "paragraph": children = content.get("content") if children: html.append('<p style="') attrs = content["attrs"] if "textAlign" in attrs: html.append("text-align:") html.append(attrs["textAlign"]) html.append(";") self._tiptap_process_indentation(html, attrs) html.append('">') for block in children: self._tiptap_process_content(html, block) html.append("</p>") else: html.append('<p class="empty-p"><br/></p>') elif type == "text": self._tiptap_process_text(html, content) elif type == "heading": attrs = content["attrs"] level = str(attrs.get("level") or "3") html.append("<h") html.append(level) html.append(' style="text-align:') html.append(attrs.get("textAlign") or "left") html.append('">') 
html.append('<span style="') self._tiptap_process_indentation(html, attrs) html.append('">') self._tiptap_process_children(html, content) html.append("</span></h") html.append(level) html.append(">") elif type in ("listItem", "bulletList", "orderedList", "blockquote"): c = type[1] tag = ( "li" if c == "i" else "ul" if c == "u" else "ol" if c == "r" else "blockquote" ) html.append("<" + tag + ">") self._tiptap_process_children(html, content) html.append("</" + tag + ">") elif type == "anchor": attrs = content["attrs"] html.append('<a id="') html.append(attrs.get("id") or "") html.append('" data-testid="anchor"></a>') elif type == "hardBreak": html.append("<br/><br/>") elif type == "horizontalRule": html.append("<hr/>") elif type == "da-deviation": self._tiptap_process_deviation(html, content) elif type == "da-mention": user = content["attrs"]["user"]["username"] html.append('<a href="https://www.deviantart.com/') html.append(user.lower()) html.append('" data-da-type="da-mention" data-user="">@<!-- -->') html.append(user) html.append('</a>') elif type == "da-gif": attrs = content["attrs"] width = str(attrs.get("width") or "") height = str(attrs.get("height") or "") url = text.escape(attrs.get("url") or "") html.append('<div data-da-type="da-gif" data-width="') html.append(width) html.append('" data-height="') html.append(height) html.append('" data-alignment="') html.append(attrs.get("alignment") or "") html.append('" data-url="') html.append(url) html.append('" class="t61qu"><video role="img" autoPlay="" ' 'muted="" loop="" style="pointer-events:none" ' 'controlsList="nofullscreen" playsInline="" ' 'aria-label="gif" data-da-type="da-gif" width="') html.append(width) html.append('" height="') html.append(height) html.append('" src="') html.append(url) html.append('" class="_1Fkk6"></video></div>') elif type == "da-video": src = text.escape(content["attrs"].get("src") or "") html.append('<div data-testid="video" data-da-type="da-video" ' 'data-src="') html.append(src) html.append('" class="_1Uxvs"><div data-canfs="yes" data-testid="v' 'ideo-inner" class="main-video" style="width:780px;hei' 'ght:438px"><div style="width:780px;height:438px">' '<video src="') html.append(src) html.append('" style="width:100%;height:100%;" preload="auto" cont' 'rols=""></video></div></div></div>') else: self.log.warning("Unsupported content type '%s'", type) def _tiptap_process_text(self, html, content): marks = content.get("marks") if marks: close = [] for mark in marks: type = mark["type"] if type == "link": attrs = mark.get("attrs") or {} html.append('<a href="') html.append(text.escape(attrs.get("href") or "")) if "target" in attrs: html.append('" target="') html.append(attrs["target"]) html.append('" rel="') html.append(attrs.get("rel") or "noopener noreferrer nofollow ugc") html.append('">') close.append("</a>") elif type == "bold": html.append("<strong>") close.append("</strong>") elif type == "italic": html.append("<em>") close.append("</em>") elif type == "underline": html.append("<u>") close.append("</u>") elif type == "strike": html.append("<s>") close.append("</s>") elif type == "textStyle" and len(mark) <= 1: pass else: self.log.warning("Unsupported text marker '%s'", type) close.reverse() html.append(text.escape(content["text"])) html.extend(close) else: html.append(text.escape(content["text"])) def _tiptap_process_children(self, html, content): children = content.get("content") if children: for block in children: self._tiptap_process_content(html, block) def _tiptap_process_indentation(self, html, 
attrs): itype = ("text-indent" if attrs.get("indentType") == "line" else "margin-inline-start") isize = str((attrs.get("indentation") or 0) * 24) html.append(itype + ":" + isize + "px") def _tiptap_process_deviation(self, html, content): dev = content["attrs"]["deviation"] media = dev.get("media") or () html.append('<div class="jjNX2">') html.append('<figure class="Qf-HY" data-da-type="da-deviation" ' 'data-deviation="" ' 'data-width="" data-link="" data-alignment="center">') if "baseUri" in media: url, formats = self._eclipse_media(media) full = formats["fullview"] html.append('<a href="') html.append(text.escape(dev["url"])) html.append('" class="_3ouD5" style="margin:0 auto;display:flex;' 'align-items:center;justify-content:center;' 'overflow:hidden;width:780px;height:') html.append(str(780 * full["h"] / full["w"])) html.append('px">') html.append('<img src="') html.append(text.escape(url)) html.append('" alt="') html.append(text.escape(dev["title"])) html.append('" style="width:100%;max-width:100%;display:block"/>') html.append("</a>") elif "textContent" in dev: html.append('<div class="_32Hs4" style="width:350px">') html.append('<a href="') html.append(text.escape(dev["url"])) html.append('" class="_3ouD5">') html.append('''\ <section class="Q91qI aG7Yi" style="width:350px;height:313px">\ <div class="_16ECM _1xMkk" aria-hidden="true">\ <svg height="100%" viewBox="0 0 15 12" preserveAspectRatio="xMidYMin slice" \ fill-rule="evenodd">\ <linearGradient x1="87.8481761%" y1="16.3690766%" \ x2="45.4107524%" y2="71.4898596%" id="app-root-3">\ <stop stop-color="#00FF62" offset="0%"></stop>\ <stop stop-color="#3197EF" stop-opacity="0" offset="100%"></stop>\ </linearGradient>\ <text class="_2uqbc" fill="url(#app-root-3)" text-anchor="end" x="15" y="11">J\ </text></svg></div><div class="_1xz9u">Literature</div><h3 class="_2WvKD">\ ''') html.append(text.escape(dev["title"])) html.append('</h3><div class="_2CPLm">') html.append(text.escape(dev["textContent"]["excerpt"])) html.append('</div></section></a></div>') html.append('</figure></div>') def _extract_content(self, deviation): content = deviation["content"] if self.original and deviation["is_downloadable"]: self._update_content(deviation, content) return content if self.jwt: self._update_token(deviation, content) return content if content["src"].startswith("https://images-wixmp-"): if self.intermediary and deviation["index"] <= 790677560: # https://github.com/r888888888/danbooru/issues/4069 intermediary, count = re.subn( r"(/f/[^/]+/[^/]+)/v\d+/.*", r"/intermediary\1", content["src"], 1) if count: deviation["is_original"] = False deviation["_fallback"] = (content["src"],) content["src"] = intermediary if self.quality: content["src"] = self.quality_sub( self.quality, content["src"], 1) return content @staticmethod def _find_folder(folders, name, uuid): if uuid.isdecimal(): match = re.compile(name.replace( "-", r"[^a-z0-9]+") + "$", re.IGNORECASE).match for folder in folders: if match(folder["name"]): return folder elif folder.get("has_subfolders"): for subfolder in folder["subfolders"]: if match(subfolder["name"]): return subfolder else: for folder in folders: if folder["folderid"] == uuid: return folder elif folder.get("has_subfolders"): for subfolder in folder["subfolders"]: if subfolder["folderid"] == uuid: return subfolder raise exception.NotFoundError("folder") def _folder_urls(self, folders, category, extractor): base = "{}/{}/{}/".format(self.root, self.user, category) for folder in folders: folder["_extractor"] = extractor url = 
"{}{}/{}".format(base, folder["folderid"], folder["name"]) yield url, folder def _update_content_default(self, deviation, content): if "premium_folder_data" in deviation or deviation.get("is_mature"): public = False else: public = None data = self.api.deviation_download(deviation["deviationid"], public) content.update(data) deviation["is_original"] = True def _update_content_image(self, deviation, content): data = self.api.deviation_download(deviation["deviationid"]) url = data["src"].partition("?")[0] mtype = mimetypes.guess_type(url, False)[0] if mtype and mtype.startswith("image/"): content.update(data) deviation["is_original"] = True def _update_token(self, deviation, content): """Replace JWT to be able to remove width/height limits All credit goes to @Ironchest337 for discovering and implementing this method """ url, sep, _ = content["src"].partition("/v1/") if not sep: return # 'images-wixmp' returns 401 errors, but just 'wixmp' still works url = url.replace("//images-wixmp", "//wixmp", 1) # header = b'{"typ":"JWT","alg":"none"}' payload = ( b'{"sub":"urn:app:","iss":"urn:app:","obj":[[{"path":"/f/' + url.partition("/f/")[2].encode() + b'"}]],"aud":["urn:service:file.download"]}' ) deviation["_fallback"] = (content["src"],) deviation["is_original"] = True content["src"] = ( "{}?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJub25lIn0.{}.".format( url, # base64 of 'header' is precomputed as 'eyJ0eX...' # binascii.b2a_base64(header).rstrip(b"=\n").decode(), binascii.b2a_base64(payload).rstrip(b"=\n").decode()) ) def _extract_comments(self, target_id, target_type="deviation"): results = None comment_ids = [None] while comment_ids: comments = self.api.comments( target_id, target_type, comment_ids.pop()) if results: results.extend(comments) else: results = comments # parent comments, i.e. nodes with at least one child parents = {c["parentid"] for c in comments} # comments with more than one reply replies = {c["commentid"] for c in comments if c["replies"]} # add comment UUIDs with replies that are not parent to any node comment_ids.extend(replies - parents) return results def _limited_request(self, url, **kwargs): """Limits HTTP requests to one every 2 seconds""" diff = time.time() - DeviantartExtractor._last_request if diff < 2.0: self.sleep(2.0 - diff, "request") response = self.request(url, **kwargs) DeviantartExtractor._last_request = time.time() return response def _fetch_premium(self, deviation): try: return self._premium_cache[deviation["deviationid"]] except KeyError: pass if not self.api.refresh_token_key: self.log.warning( "Unable to access premium content (no refresh-token)") self._fetch_premium = lambda _: None return None dev = self.api.deviation(deviation["deviationid"], False) folder = deviation["premium_folder_data"] username = dev["author"]["username"] # premium_folder_data is no longer present when user has access (#5063) has_access = ("premium_folder_data" not in dev) or folder["has_access"] if not has_access and folder["type"] == "watchers" and \ self.config("auto-watch"): if self.unwatch is not None: self.unwatch.append(username) if self.api.user_friends_watch(username): has_access = True self.log.info( "Watching %s for premium folder access", username) else: self.log.warning( "Error when trying to watch %s. 
" "Try again with a new refresh-token", username) if has_access: self.log.info("Fetching premium folder data") else: self.log.warning("Unable to access premium content (type: %s)", folder["type"]) cache = self._premium_cache for dev in self.api.gallery( username, folder["gallery_id"], public=False): cache[dev["deviationid"]] = dev if has_access else None return cache.get(deviation["deviationid"]) def _unwatch_premium(self): for username in self.unwatch: self.log.info("Unwatching %s", username) self.api.user_friends_unwatch(username) def _eclipse_media(self, media, format="preview"): url = [media["baseUri"]] formats = { fmt["t"]: fmt for fmt in media["types"] } tokens = media.get("token") or () if tokens: if len(tokens) <= 1: fmt = formats[format] if "c" in fmt: url.append(fmt["c"].replace( "<prettyName>", media["prettyName"])) url.append("?token=") url.append(tokens[-1]) return "".join(url), formats def _eclipse_to_oauth(self, eclipse_api, deviations): for obj in deviations: deviation = obj["deviation"] if "deviation" in obj else obj deviation_uuid = eclipse_api.deviation_extended_fetch( deviation["deviationId"], deviation["author"]["username"], "journal" if deviation["isJournal"] else "art", )["deviation"]["extended"]["deviationUuid"] yield self.api.deviation(deviation_uuid) def _unescape_json(self, json): return json.replace('\\"', '"') \ .replace("\\'", "'") \ .replace("\\\\", "\\") class DeviantartUserExtractor(DeviantartExtractor): """Extractor for an artist's user profile""" subcategory = "user" pattern = BASE_PATTERN + r"/?$" example = "https://www.deviantart.com/USER" def initialize(self): pass skip = Extractor.skip def items(self): base = "{}/{}/".format(self.root, self.user) return self._dispatch_extractors(( (DeviantartAvatarExtractor , base + "avatar"), (DeviantartBackgroundExtractor, base + "banner"), (DeviantartGalleryExtractor , base + "gallery"), (DeviantartScrapsExtractor , base + "gallery/scraps"), (DeviantartJournalExtractor , base + "posts"), (DeviantartStatusExtractor , base + "posts/statuses"), (DeviantartFavoriteExtractor , base + "favourites"), ), ("gallery",)) ############################################################################### # OAuth ####################################################################### class DeviantartGalleryExtractor(DeviantartExtractor): """Extractor for all deviations from an artist's gallery""" subcategory = "gallery" archive_fmt = "g_{_username}_{index}.{extension}" pattern = (BASE_PATTERN + r"/gallery" r"(?:/all|/recommended-for-you|/?\?catpath=)?/?$") example = "https://www.deviantart.com/USER/gallery/" def deviations(self): if self.flat and not self.group: return self.api.gallery_all(self.user, self.offset) folders = self.api.gallery_folders(self.user) return self._folder_urls(folders, "gallery", DeviantartFolderExtractor) class DeviantartAvatarExtractor(DeviantartExtractor): """Extractor for an artist's avatar""" subcategory = "avatar" archive_fmt = "a_{_username}_{index}" pattern = BASE_PATTERN + r"/avatar" example = "https://www.deviantart.com/USER/avatar/" def deviations(self): name = self.user.lower() user = _user_details(self, name) if not user: return () icon = user["usericon"] if icon == DEFAULT_AVATAR: self.log.debug("Skipping avatar of '%s' (default)", name) return () _, sep, index = icon.rpartition("?") if not sep: index = "0" formats = self.config("formats") if not formats: url = icon.replace("/avatars/", "/avatars-big/", 1) return (self._make_deviation(url, user, index, ""),) if isinstance(formats, str): formats 
= formats.replace(" ", "").split(",") results = [] for fmt in formats: fmt, _, ext = fmt.rpartition(".") if fmt: fmt = "-" + fmt url = "https://a.deviantart.net/avatars{}/{}/{}/{}.{}?{}".format( fmt, name[0], name[1], name, ext, index) results.append(self._make_deviation(url, user, index, fmt)) return results def _make_deviation(self, url, user, index, fmt): return { "author" : user, "da_category" : "avatar", "index" : text.parse_int(index), "is_deleted" : False, "is_downloadable": False, "published_time" : 0, "title" : "avatar" + fmt, "stats" : {"comments": 0}, "content" : {"src": url}, } class DeviantartBackgroundExtractor(DeviantartExtractor): """Extractor for an artist's banner""" subcategory = "background" archive_fmt = "b_{index}" pattern = BASE_PATTERN + r"/ba(?:nner|ckground)" example = "https://www.deviantart.com/USER/banner/" def deviations(self): try: return (self.api.user_profile(self.user.lower()) ["cover_deviation"]["cover_deviation"],) except Exception: return () class DeviantartFolderExtractor(DeviantartExtractor): """Extractor for deviations inside an artist's gallery folder""" subcategory = "folder" directory_fmt = ("{category}", "{username}", "{folder[title]}") archive_fmt = "F_{folder[uuid]}_{index}.{extension}" pattern = BASE_PATTERN + r"/gallery/([^/?#]+)/([^/?#]+)" example = "https://www.deviantart.com/USER/gallery/12345/TITLE" def __init__(self, match): DeviantartExtractor.__init__(self, match) self.folder = None self.folder_id = match.group(3) self.folder_name = match.group(4) def deviations(self): folders = self.api.gallery_folders(self.user) folder = self._find_folder(folders, self.folder_name, self.folder_id) # Leaving this here for backwards compatibility self.folder = { "title": folder["name"], "uuid" : folder["folderid"], "index": self.folder_id, "owner": self.user, "parent_uuid": folder["parent"], } if folder.get("subfolder"): self.folder["parent_folder"] = folder["parent_folder"] self.archive_fmt = "F_{folder[parent_uuid]}_{index}.{extension}" if self.flat: self.directory_fmt = ("{category}", "{username}", "{folder[parent_folder]}") else: self.directory_fmt = ("{category}", "{username}", "{folder[parent_folder]}", "{folder[title]}") if folder.get("has_subfolders") and self.config("subfolders", True): for subfolder in folder["subfolders"]: subfolder["parent_folder"] = folder["name"] subfolder["subfolder"] = True yield from self._folder_urls( folder["subfolders"], "gallery", DeviantartFolderExtractor) yield from self.api.gallery(self.user, folder["folderid"], self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["folder"] = self.folder class DeviantartStashExtractor(DeviantartExtractor): """Extractor for sta.sh-ed deviations""" subcategory = "stash" archive_fmt = "{index}.{extension}" pattern = (r"(?:https?://)?(?:(?:www\.)?deviantart\.com/stash|sta\.s(h))" r"/([a-z0-9]+)") example = "https://www.deviantart.com/stash/abcde" skip = Extractor.skip def __init__(self, match): DeviantartExtractor.__init__(self, match) self.user = None def deviations(self, stash_id=None, stash_data=None): if stash_id is None: legacy_url, stash_id = self.groups else: legacy_url = False if legacy_url and stash_id[0] == "2": url = "https://sta.sh/" + stash_id response = self._limited_request(url) stash_id = response.url.rpartition("/")[2] page = response.text else: url = "https://www.deviantart.com/stash/" + stash_id page = self._limited_request(url).text if stash_id[0] == "0": uuid = text.extr(page, '//deviation/', '"') if uuid: deviation = 
self.api.deviation(uuid) deviation["_page"] = page deviation["index"] = text.parse_int(text.extr( page, '\\"deviationId\\":', ',')) deviation["stash_id"] = stash_id if stash_data: folder = stash_data["folder"] deviation["stash_name"] = folder["name"] deviation["stash_folder"] = folder["folderId"] deviation["stash_parent"] = folder["parentId"] or 0 deviation["stash_description"] = \ folder["richDescription"]["excerpt"] else: deviation["stash_name"] = "" deviation["stash_description"] = "" deviation["stash_folder"] = 0 deviation["stash_parent"] = 0 yield deviation return stash_data = text.extr(page, ',\\"stash\\":', ',\\"@@') if stash_data: stash_data = util.json_loads(self._unescape_json(stash_data)) for sid in text.extract_iter( page, 'href="https://www.deviantart.com/stash/', '"'): if sid == stash_id or sid.endswith("#comments"): continue yield from self.deviations(sid, stash_data) class DeviantartFavoriteExtractor(DeviantartExtractor): """Extractor for an artist's favorites""" subcategory = "favorite" directory_fmt = ("{category}", "{username}", "Favourites") archive_fmt = "f_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/favourites(?:/all|/?\?catpath=)?/?$" example = "https://www.deviantart.com/USER/favourites/" def deviations(self): if self.flat: return self.api.collections_all(self.user, self.offset) folders = self.api.collections_folders(self.user) return self._folder_urls( folders, "favourites", DeviantartCollectionExtractor) class DeviantartCollectionExtractor(DeviantartExtractor): """Extractor for a single favorite collection""" subcategory = "collection" directory_fmt = ("{category}", "{username}", "Favourites", "{collection[title]}") archive_fmt = "C_{collection[uuid]}_{index}.{extension}" pattern = BASE_PATTERN + r"/favourites/([^/?#]+)/([^/?#]+)" example = "https://www.deviantart.com/USER/favourites/12345/TITLE" def __init__(self, match): DeviantartExtractor.__init__(self, match) self.collection = None self.collection_id = match.group(3) self.collection_name = match.group(4) def deviations(self): folders = self.api.collections_folders(self.user) folder = self._find_folder( folders, self.collection_name, self.collection_id) self.collection = { "title": folder["name"], "uuid" : folder["folderid"], "index": self.collection_id, "owner": self.user, } return self.api.collections(self.user, folder["folderid"], self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["collection"] = self.collection class DeviantartJournalExtractor(DeviantartExtractor): """Extractor for an artist's journals""" subcategory = "journal" directory_fmt = ("{category}", "{username}", "Journal") archive_fmt = "j_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/(?:posts(?:/journals)?|journal)/?(?:\?.*)?$" example = "https://www.deviantart.com/USER/posts/journals/" def deviations(self): return self.api.browse_user_journals(self.user, self.offset) class DeviantartStatusExtractor(DeviantartExtractor): """Extractor for an artist's status updates""" subcategory = "status" directory_fmt = ("{category}", "{username}", "Status") filename_fmt = "{category}_{index}_{title}_{date}.{extension}" archive_fmt = "S_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/posts/statuses" example = "https://www.deviantart.com/USER/posts/statuses/" def deviations(self): for status in self.api.user_statuses(self.user, self.offset): yield from self.status(status) def status(self, status): for item in status.get("items") or (): # do not trust is_share # shared 
deviations/statuses if "deviation" in item: yield item["deviation"].copy() if "status" in item: yield from self.status(item["status"].copy()) # assume is_deleted == true means necessary fields are missing if status["is_deleted"]: self.log.warning( "Skipping status %s (deleted)", status.get("statusid")) return yield status def prepare(self, deviation): if "deviationid" in deviation: return DeviantartExtractor.prepare(self, deviation) try: path = deviation["url"].split("/") deviation["index"] = text.parse_int(path[-1] or path[-2]) except KeyError: deviation["index"] = 0 if self.user: deviation["username"] = self.user deviation["_username"] = self.user.lower() else: deviation["username"] = deviation["author"]["username"] deviation["_username"] = deviation["username"].lower() deviation["date"] = dt = text.parse_datetime(deviation["ts"]) deviation["published_time"] = int(util.datetime_to_timestamp(dt)) deviation["da_category"] = "Status" deviation["category_path"] = "status" deviation["is_downloadable"] = False deviation["title"] = "Status Update" comments_count = deviation.pop("comments_count", 0) deviation["stats"] = {"comments": comments_count} if self.comments: deviation["comments"] = ( self._extract_comments(deviation["statusid"], "status") if comments_count else () ) class DeviantartTagExtractor(DeviantartExtractor): """Extractor for deviations from tag searches""" subcategory = "tag" directory_fmt = ("{category}", "Tags", "{search_tags}") archive_fmt = "T_{search_tags}_{index}.{extension}" pattern = r"(?:https?://)?www\.deviantart\.com/tag/([^/?#]+)" example = "https://www.deviantart.com/tag/TAG" def __init__(self, match): DeviantartExtractor.__init__(self, match) self.tag = text.unquote(match.group(1)) def deviations(self): return self.api.browse_tags(self.tag, self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["search_tags"] = self.tag class DeviantartWatchExtractor(DeviantartExtractor): """Extractor for Deviations from watched users""" subcategory = "watch" pattern = (r"(?:https?://)?(?:www\.)?deviantart\.com" r"/(?:watch/deviations|notifications/watch)()()") example = "https://www.deviantart.com/watch/deviations" def deviations(self): return self.api.browse_deviantsyouwatch() class DeviantartWatchPostsExtractor(DeviantartExtractor): """Extractor for Posts from watched users""" subcategory = "watch-posts" pattern = r"(?:https?://)?(?:www\.)?deviantart\.com/watch/posts()()" example = "https://www.deviantart.com/watch/posts" def deviations(self): return self.api.browse_posts_deviantsyouwatch() ############################################################################### # Eclipse ##################################################################### class DeviantartDeviationExtractor(DeviantartExtractor): """Extractor for single deviations""" subcategory = "deviation" archive_fmt = "g_{_username}_{index}.{extension}" pattern = (BASE_PATTERN + r"/(art|journal)/(?:[^/?#]+-)?(\d+)" r"|(?:https?://)?(?:www\.)?(?:fx)?deviantart\.com/" r"(?:view/|deviation/|view(?:-full)?\.php/*\?(?:[^#]+&)?id=)" r"(\d+)" # bare deviation ID without slug r"|(?:https?://)?fav\.me/d([0-9a-z]+)") # base36 example = "https://www.deviantart.com/UsER/art/TITLE-12345" skip = Extractor.skip def __init__(self, match): DeviantartExtractor.__init__(self, match) self.type = match.group(3) self.deviation_id = \ match.group(4) or match.group(5) or id_from_base36(match.group(6)) def deviations(self): if self.user: url = "{}/{}/{}/{}".format( self.root, self.user, self.type or 
"art", self.deviation_id) else: url = "{}/view/{}/".format(self.root, self.deviation_id) page = self._limited_request(url, notfound="deviation").text uuid = text.extr(page, '"deviationUuid\\":\\"', '\\') if not uuid: raise exception.NotFoundError("deviation") deviation = self.api.deviation(uuid) deviation["_page"] = page deviation["index_file"] = 0 deviation["num"] = deviation["count"] = 1 additional_media = text.extr(page, ',\\"additionalMedia\\":', '}],\\"') if not additional_media: yield deviation return self.filename_fmt = ("{category}_{index}_{index_file}_{title}_" "{num:>02}.{extension}") self.archive_fmt = ("g_{_username}_{index}{index_file:?_//}." "{extension}") additional_media = util.json_loads(self._unescape_json( additional_media) + "}]") deviation["count"] = 1 + len(additional_media) yield deviation for index, post in enumerate(additional_media): uri = self._eclipse_media(post["media"], "fullview")[0] deviation["content"]["src"] = uri deviation["num"] += 1 deviation["index_file"] = post["fileId"] # Download only works on purchased materials - no way to check deviation["is_downloadable"] = False yield deviation class DeviantartScrapsExtractor(DeviantartExtractor): """Extractor for an artist's scraps""" subcategory = "scraps" directory_fmt = ("{category}", "{username}", "Scraps") archive_fmt = "s_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/gallery/(?:\?catpath=)?scraps\b" example = "https://www.deviantart.com/USER/gallery/scraps" def deviations(self): self.login() eclipse_api = DeviantartEclipseAPI(self) return self._eclipse_to_oauth( eclipse_api, eclipse_api.gallery_scraps(self.user, self.offset)) class DeviantartSearchExtractor(DeviantartExtractor): """Extractor for deviantart search results""" subcategory = "search" directory_fmt = ("{category}", "Search", "{search_tags}") archive_fmt = "Q_{search_tags}_{index}.{extension}" pattern = (r"(?:https?://)?www\.deviantart\.com" r"/search(?:/deviations)?/?\?([^#]+)") example = "https://www.deviantart.com/search?q=QUERY" skip = Extractor.skip def __init__(self, match): DeviantartExtractor.__init__(self, match) self.query = text.parse_query(self.user) self.search = self.query.get("q", "") self.user = "" def deviations(self): logged_in = self.login() eclipse_api = DeviantartEclipseAPI(self) search = (eclipse_api.search_deviations if logged_in else self._search_html) return self._eclipse_to_oauth(eclipse_api, search(self.query)) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["search_tags"] = self.search def _search_html(self, params): url = self.root + "/search" while True: response = self.request(url, params=params) if response.history and "/users/login" in response.url: raise exception.StopExtraction("HTTP redirect to login page") page = response.text for dev in DeviantartDeviationExtractor.pattern.findall( page)[2::3]: yield { "deviationId": dev[3], "author": {"username": dev[0]}, "isJournal": dev[2] == "journal", } cursor = text.extr(page, r'\"cursor\":\"', '\\',) if not cursor: return params["cursor"] = cursor class DeviantartGallerySearchExtractor(DeviantartExtractor): """Extractor for deviantart gallery searches""" subcategory = "gallery-search" archive_fmt = "g_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/gallery/?\?(q=[^#]+)" example = "https://www.deviantart.com/USER/gallery?q=QUERY" def __init__(self, match): DeviantartExtractor.__init__(self, match) self.query = match.group(3) def deviations(self): self.login() eclipse_api = DeviantartEclipseAPI(self) 
query = text.parse_query(self.query) self.search = query["q"] return self._eclipse_to_oauth( eclipse_api, eclipse_api.galleries_search( self.user, self.search, self.offset, query.get("sort", "most-recent"), )) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["search_tags"] = self.search class DeviantartFollowingExtractor(DeviantartExtractor): """Extractor for user's watched users""" subcategory = "following" pattern = BASE_PATTERN + "/(?:about#)?watching" example = "https://www.deviantart.com/USER/about#watching" def items(self): api = DeviantartOAuthAPI(self) for user in api.user_friends(self.user): url = "{}/{}".format(self.root, user["user"]["username"]) user["_extractor"] = DeviantartUserExtractor yield Message.Queue, url, user ############################################################################### # API Interfaces ############################################################## class DeviantartOAuthAPI(): """Interface for the DeviantArt OAuth API https://www.deviantart.com/developers/http/v1/20160316 """ CLIENT_ID = "5388" CLIENT_SECRET = "76b08c69cfb27f26d6161f9ab6d061a1" def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.headers = {"dA-minor-version": "20210526"} self._warn_429 = True self.delay = extractor.config("wait-min", 0) self.delay_min = max(2, self.delay) self.mature = extractor.config("mature", "true") if not isinstance(self.mature, str): self.mature = "true" if self.mature else "false" self.strategy = extractor.config("pagination") self.folders = extractor.config("folders", False) self.public = extractor.config("public", True) client_id = extractor.config("client-id") if client_id: self.client_id = str(client_id) self.client_secret = extractor.config("client-secret") else: self.client_id = self.CLIENT_ID self.client_secret = self.CLIENT_SECRET token = extractor.config("refresh-token") if token is None or token == "cache": token = "#" + self.client_id if not _refresh_token_cache(token): token = None self.refresh_token_key = token metadata = extractor.config("metadata", False) if not metadata: metadata = True if extractor.extra else False if metadata: self.metadata = True if isinstance(metadata, str): if metadata == "all": metadata = ("submission", "camera", "stats", "collection", "gallery") else: metadata = metadata.replace(" ", "").split(",") elif not isinstance(metadata, (list, tuple)): metadata = () self._metadata_params = {"mature_content": self.mature} self._metadata_public = None if metadata: # extended metadata self.limit = 10 for param in metadata: self._metadata_params["ext_" + param] = "1" if "ext_collection" in self._metadata_params or \ "ext_gallery" in self._metadata_params: if token: self._metadata_public = False else: self.log.error("'collection' and 'gallery' metadata " "require a refresh token") else: # base metadata self.limit = 50 else: self.metadata = False self.limit = None self.log.debug( "Using %s API credentials (client-id %s)", "default" if self.client_id == self.CLIENT_ID else "custom", self.client_id, ) def browse_deviantsyouwatch(self, offset=0): """Yield deviations from users you watch""" endpoint = "/browse/deviantsyouwatch" params = {"limit": 50, "offset": offset, "mature_content": self.mature} return self._pagination(endpoint, params, public=False) def browse_posts_deviantsyouwatch(self, offset=0): """Yield posts from users you watch""" endpoint = "/browse/posts/deviantsyouwatch" params = {"limit": 50, "offset": offset, "mature_content": self.mature} return 
self._pagination(endpoint, params, public=False, unpack=True) def browse_tags(self, tag, offset=0): """ Browse a tag """ endpoint = "/browse/tags" params = { "tag" : tag, "offset" : offset, "limit" : 50, "mature_content": self.mature, } return self._pagination(endpoint, params) def browse_user_journals(self, username, offset=0): journals = filter( lambda post: "/journal/" in post["url"], self.user_profile_posts(username)) if offset: journals = util.advance(journals, offset) return journals def collections(self, username, folder_id, offset=0): """Yield all Deviation-objects contained in a collection folder""" endpoint = "/collections/" + folder_id params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) def collections_all(self, username, offset=0): """Yield all deviations in a user's collection""" endpoint = "/collections/all" params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) @memcache(keyarg=1) def collections_folders(self, username, offset=0): """Yield all collection folders of a specific user""" endpoint = "/collections/folders" params = {"username": username, "offset": offset, "limit": 50, "mature_content": self.mature} return self._pagination_list(endpoint, params) def comments(self, target_id, target_type="deviation", comment_id=None, offset=0): """Fetch comments posted on a target""" endpoint = "/comments/{}/{}".format(target_type, target_id) params = { "commentid" : comment_id, "maxdepth" : "5", "offset" : offset, "limit" : 50, "mature_content": self.mature, } return self._pagination_list(endpoint, params=params, key="thread") def deviation(self, deviation_id, public=None): """Query and return info about a single Deviation""" endpoint = "/deviation/" + deviation_id deviation = self._call(endpoint, public=public) if deviation.get("is_mature") and public is None and \ self.refresh_token_key: deviation = self._call(endpoint, public=False) if self.metadata: self._metadata((deviation,)) if self.folders: self._folders((deviation,)) return deviation def deviation_content(self, deviation_id, public=None): """Get extended content of a single Deviation""" endpoint = "/deviation/content" params = {"deviationid": deviation_id} content = self._call(endpoint, params=params, public=public) if public and content["html"].startswith( ' <span class=\"username-with-symbol'): if self.refresh_token_key: content = self._call(endpoint, params=params, public=False) else: self.log.warning("Private Journal") return content def deviation_download(self, deviation_id, public=None): """Get the original file download (if allowed)""" endpoint = "/deviation/download/" + deviation_id params = {"mature_content": self.mature} try: return self._call( endpoint, params=params, public=public, log=False) except Exception: if not self.refresh_token_key: raise return self._call(endpoint, params=params, public=False) def deviation_metadata(self, deviations): """ Fetch deviation metadata for a set of deviations""" endpoint = "/deviation/metadata?" 
+ "&".join( "deviationids[{}]={}".format(num, deviation["deviationid"]) for num, deviation in enumerate(deviations) ) return self._call( endpoint, params=self._metadata_params, public=self._metadata_public, )["metadata"] def gallery(self, username, folder_id, offset=0, extend=True, public=None): """Yield all Deviation-objects contained in a gallery folder""" endpoint = "/gallery/" + folder_id params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature, "mode": "newest"} return self._pagination(endpoint, params, extend, public) def gallery_all(self, username, offset=0): """Yield all Deviation-objects of a specific user""" endpoint = "/gallery/all" params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) @memcache(keyarg=1) def gallery_folders(self, username, offset=0): """Yield all gallery folders of a specific user""" endpoint = "/gallery/folders" params = {"username": username, "offset": offset, "limit": 50, "mature_content": self.mature} return self._pagination_list(endpoint, params) def user_friends(self, username, offset=0): """Get the users list of friends""" endpoint = "/user/friends/" + username params = {"limit": 50, "offset": offset, "mature_content": self.mature} return self._pagination(endpoint, params) def user_friends_watch(self, username): """Watch a user""" endpoint = "/user/friends/watch/" + username data = { "watch[friend]" : "0", "watch[deviations]" : "0", "watch[journals]" : "0", "watch[forum_threads]": "0", "watch[critiques]" : "0", "watch[scraps]" : "0", "watch[activity]" : "0", "watch[collections]" : "0", "mature_content" : self.mature, } return self._call( endpoint, method="POST", data=data, public=False, fatal=False, ).get("success") def user_friends_unwatch(self, username): """Unwatch a user""" endpoint = "/user/friends/unwatch/" + username return self._call( endpoint, method="POST", public=False, fatal=False, ).get("success") @memcache(keyarg=1) def user_profile(self, username): """Get user profile information""" endpoint = "/user/profile/" + username return self._call(endpoint, fatal=False) def user_profile_posts(self, username): endpoint = "/user/profile/posts" params = {"username": username, "limit": 50, "mature_content": self.mature} return self._pagination(endpoint, params) def user_statuses(self, username, offset=0): """Yield status updates of a specific user""" statuses = filter( lambda post: "/status-update/" in post["url"], self.user_profile_posts(username)) if offset: statuses = util.advance(statuses, offset) return statuses def authenticate(self, refresh_token_key): """Authenticate the application by requesting an access token""" self.headers["Authorization"] = \ self._authenticate_impl(refresh_token_key) @cache(maxage=3600, keyarg=1) def _authenticate_impl(self, refresh_token_key): """Actual authenticate implementation""" url = "https://www.deviantart.com/oauth2/token" if refresh_token_key: self.log.info("Refreshing private access token") data = {"grant_type": "refresh_token", "refresh_token": _refresh_token_cache(refresh_token_key)} else: self.log.info("Requesting public access token") data = {"grant_type": "client_credentials"} auth = util.HTTPBasicAuth(self.client_id, self.client_secret) response = self.extractor.request( url, method="POST", data=data, auth=auth, fatal=False) data = response.json() if response.status_code != 200: self.log.debug("Server response: %s", data) raise exception.AuthenticationError('"{}" ({})'.format( 
data.get("error_description"), data.get("error"))) if refresh_token_key: _refresh_token_cache.update( refresh_token_key, data["refresh_token"]) return "Bearer " + data["access_token"] def _call(self, endpoint, fatal=True, log=True, public=None, **kwargs): """Call an API endpoint""" url = "https://www.deviantart.com/api/v1/oauth2" + endpoint kwargs["fatal"] = None if public is None: public = self.public while True: if self.delay: self.extractor.sleep(self.delay, "api") self.authenticate(None if public else self.refresh_token_key) kwargs["headers"] = self.headers response = self.extractor.request(url, **kwargs) try: data = response.json() except ValueError: self.log.error("Unable to parse API response") data = {} status = response.status_code if 200 <= status < 400: if self.delay > self.delay_min: self.delay -= 1 return data if not fatal and status != 429: return None error = data.get("error_description") if error == "User not found.": raise exception.NotFoundError("user or group") if error == "Deviation not downloadable.": raise exception.AuthorizationError() self.log.debug(response.text) msg = "API responded with {} {}".format( status, response.reason) if status == 429: if self.delay < 30: self.delay += 1 self.log.warning("%s. Using %ds delay.", msg, self.delay) if self._warn_429 and self.delay >= 3: self._warn_429 = False if self.client_id == self.CLIENT_ID: self.log.info( "Register your own OAuth application and use its " "credentials to prevent this error: " "https://gdl-org.github.io/docs/configuration.html" "#extractor-deviantart-client-id-client-secret") else: if log: self.log.error(msg) return data def _should_switch_tokens(self, results, params): if len(results) < params["limit"]: return True if not self.extractor.jwt: for item in results: if item.get("is_mature"): return True return False def _pagination(self, endpoint, params, extend=True, public=None, unpack=False, key="results"): warn = True if public is None: public = self.public if self.limit and params["limit"] > self.limit: params["limit"] = (params["limit"] // self.limit) * self.limit while True: data = self._call(endpoint, params=params, public=public) try: results = data[key] except KeyError: self.log.error("Unexpected API response: %s", data) return if unpack: results = [item["journal"] for item in results if "journal" in item] if extend: if public and self._should_switch_tokens(results, params): if self.refresh_token_key: self.log.debug("Switching to private access token") public = False continue elif data["has_more"] and warn: warn = False self.log.warning( "Private or mature deviations detected! 
" "Run 'gallery-dl oauth:deviantart' and follow the " "instructions to be able to access them.") # "statusid" cannot be used instead if results and "deviationid" in results[0]: if self.metadata: self._metadata(results) if self.folders: self._folders(results) else: # attempt to fix "deleted" deviations for dev in self._shared_content(results): if not dev["is_deleted"]: continue patch = self._call( "/deviation/" + dev["deviationid"], fatal=False) if patch: dev.update(patch) yield from results if not data["has_more"] and ( self.strategy != "manual" or not results or not extend): return if "next_cursor" in data: if not data["next_cursor"]: return params["offset"] = None params["cursor"] = data["next_cursor"] elif data["next_offset"] is not None: params["offset"] = data["next_offset"] params["cursor"] = None else: if params.get("offset") is None: return params["offset"] = int(params["offset"]) + len(results) def _pagination_list(self, endpoint, params, key="results"): result = [] result.extend(self._pagination(endpoint, params, False, key=key)) return result @staticmethod def _shared_content(results): """Return an iterable of shared deviations in 'results'""" for result in results: for item in result.get("items") or (): if "deviation" in item: yield item["deviation"] def _metadata(self, deviations): """Add extended metadata to each deviation object""" if len(deviations) <= self.limit: self._metadata_batch(deviations) else: n = self.limit for index in range(0, len(deviations), n): self._metadata_batch(deviations[index:index+n]) def _metadata_batch(self, deviations): """Fetch extended metadata for a single batch of deviations""" for deviation, metadata in zip( deviations, self.deviation_metadata(deviations)): deviation.update(metadata) deviation["tags"] = [t["tag_name"] for t in deviation["tags"]] def _folders(self, deviations): """Add a list of all containing folders to each deviation object""" for deviation in deviations: deviation["folders"] = self._folders_map( deviation["author"]["username"])[deviation["deviationid"]] @memcache(keyarg=1) def _folders_map(self, username): """Generate a deviation_id -> folders mapping for 'username'""" self.log.info("Collecting folder information for '%s'", username) folders = self.gallery_folders(username) # create 'folderid'-to-'folder' mapping fmap = { folder["folderid"]: folder for folder in folders } # add parent names to folders, but ignore "Featured" as parent featured = folders[0]["folderid"] done = False while not done: done = True for folder in folders: parent = folder["parent"] if not parent: pass elif parent == featured: folder["parent"] = None else: parent = fmap[parent] if parent["parent"]: done = False else: folder["name"] = parent["name"] + "/" + folder["name"] folder["parent"] = None # map deviationids to folder names dmap = collections.defaultdict(list) for folder in folders: for deviation in self.gallery( username, folder["folderid"], 0, False): dmap[deviation["deviationid"]].append(folder["name"]) return dmap class DeviantartEclipseAPI(): """Interface to the DeviantArt Eclipse API""" def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.request = self.extractor._limited_request self.csrf_token = None def deviation_extended_fetch(self, deviation_id, user, kind=None): endpoint = "/_puppy/dadeviation/init" params = { "deviationid" : deviation_id, "username" : user, "type" : kind, "include_session" : "false", "expand" : "deviation.related", "da_minor_version": "20230710", } return self._call(endpoint, params) 
def gallery_scraps(self, user, offset=0): endpoint = "/_puppy/dashared/gallection/contents" params = { "username" : user, "type" : "gallery", "offset" : offset, "limit" : 24, "scraps_folder": "true", } return self._pagination(endpoint, params) def galleries_search(self, user, query, offset=0, order="most-recent"): endpoint = "/_puppy/dashared/gallection/search" params = { "username": user, "type" : "gallery", "order" : order, "q" : query, "offset" : offset, "limit" : 24, } return self._pagination(endpoint, params) def search_deviations(self, params): endpoint = "/_puppy/dabrowse/search/deviations" return self._pagination(endpoint, params, key="deviations") def user_info(self, user, expand=False): endpoint = "/_puppy/dauserprofile/init/about" params = {"username": user} return self._call(endpoint, params) def user_watching(self, user, offset=0): gruserid, moduleid = self._ids_watching(user) endpoint = "/_puppy/gruser/module/watching" params = { "gruserid" : gruserid, "gruser_typeid": "4", "username" : user, "moduleid" : moduleid, "offset" : offset, "limit" : 24, } return self._pagination(endpoint, params) def _call(self, endpoint, params): url = "https://www.deviantart.com" + endpoint params["csrf_token"] = self.csrf_token or self._fetch_csrf_token() response = self.request(url, params=params, fatal=None) try: return response.json() except Exception: return {"error": response.text} def _pagination(self, endpoint, params, key="results"): limit = params.get("limit", 24) warn = True while True: data = self._call(endpoint, params) results = data.get(key) if results is None: return if len(results) < limit and warn and data.get("hasMore"): warn = False self.log.warning( "Private deviations detected! " "Provide login credentials or session cookies " "to be able to access them.") yield from results if not data.get("hasMore"): return if "nextCursor" in data: params["offset"] = None params["cursor"] = data["nextCursor"] elif "nextOffset" in data: params["offset"] = data["nextOffset"] params["cursor"] = None elif params.get("offset") is None: return else: params["offset"] = int(params["offset"]) + len(results) def _ids_watching(self, user): url = "{}/{}/about".format(self.extractor.root, user) page = self.request(url).text gruser_id = text.extr(page, ' data-userid="', '"') pos = page.find('\\"name\\":\\"watching\\"') if pos < 0: raise exception.NotFoundError("'watching' module ID") module_id = text.rextract( page, '\\"id\\":', ',', pos)[0].strip('" ') self._fetch_csrf_token(page) return gruser_id, module_id def _fetch_csrf_token(self, page=None): if page is None: page = self.request(self.extractor.root + "/").text self.csrf_token = token = text.extr( page, "window.__CSRF_TOKEN__ = '", "'") return token @memcache(keyarg=1) def _user_details(extr, name): try: return extr.api.user_profile(name)["user"] except Exception: return None @cache(maxage=36500*86400, keyarg=0) def _refresh_token_cache(token): if token and token[0] == "#": return None return token @cache(maxage=28*86400, keyarg=1) def _login_impl(extr, username, password): extr.log.info("Logging in as %s", username) url = "https://www.deviantart.com/users/login" page = extr.request(url).text data = {} for item in text.extract_iter(page, '<input type="hidden" name="', '"/>'): name, _, value = item.partition('" value="') data[name] = value challenge = data.get("challenge") if challenge and challenge != "0": extr.log.warning("Login requires solving a CAPTCHA") extr.log.debug(challenge) data["username"] = username data["password"] = password 
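    # The signin POST below only redirects on success; a response without
    # redirect history is treated as a failed login, otherwise the session
    # cookies are returned so the surrounding @cache decorator can reuse them
    # on later runs.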
data["remember"] = "on" extr.sleep(2.0, "login") url = "https://www.deviantart.com/_sisu/do/signin" response = extr.request(url, method="POST", data=data) if not response.history: raise exception.AuthenticationError() return { cookie.name: cookie.value for cookie in extr.cookies } def id_from_base36(base36): return util.bdecode(base36, _ALPHABET) def base36_from_id(deviation_id): return util.bencode(int(deviation_id), _ALPHABET) _ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz" ############################################################################### # Journal Formats ############################################################# SHADOW_TEMPLATE = """ <span class="shadow"> <img src="{src}" class="smshadow" width="{width}" height="{height}"> </span> <br><br> """ HEADER_TEMPLATE = """<div usr class="gr"> <div class="metadata"> <h2><a href="{url}">{title}</a></h2> <ul> <li class="author"> by <span class="name"><span class="username-with-symbol u"> <a class="u regular username" href="{userurl}">{username}</a>\ <span class="user-symbol regular"></span></span></span>, <span>{date}</span> </li> </ul> </div> """ HEADER_CUSTOM_TEMPLATE = """<div class='boxtop journaltop'> <h2> <img src="https://st.deviantart.net/minish/gruzecontrol/icons/journal.gif\ ?2" style="vertical-align:middle" alt=""/> <a href="{url}">{title}</a> </h2> Journal Entry: <span>{date}</span> """ JOURNAL_TEMPLATE_HTML = """text:<!DOCTYPE html> <html> <head> <meta charset="utf-8"> <title>{title}
    {shadow}
    {html}
    """ JOURNAL_TEMPLATE_HTML_EXTRA = """\
    \
    {}
    {}
    """ JOURNAL_TEMPLATE_TEXT = """text:{title} by {username}, {date} {content} """ ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/gallery_dl/extractor/directlink.py0000644000175000017500000000310515001510422021244 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Direct link handling""" from .common import Extractor, Message from .. import text class DirectlinkExtractor(Extractor): """Extractor for direct links to images and other media files""" category = "directlink" filename_fmt = "{domain}/{path}/{filename}.{extension}" archive_fmt = filename_fmt pattern = (r"(?i)https?://(?P[^/?#]+)/(?P[^?#]+\." r"(?:jpe?g|jpe|png|gif|bmp|svg|web[mp]|avif|heic|psd" r"|mp4|m4v|mov|mkv|og[gmv]|wav|mp3|opus|zip|rar|7z|pdf|swf))" r"(?:\?(?P[^#]*))?(?:#(?P.*))?$") example = "https://en.wikipedia.org/static/images/project-logos/enwiki.png" def __init__(self, match): Extractor.__init__(self, match) self.data = data = match.groupdict() self.subcategory = ".".join(data["domain"].rsplit(".", 2)[-2:]) def items(self): data = self.data for key, value in data.items(): if value: data[key] = text.unquote(value) data["path"], _, name = data["path"].rpartition("/") data["filename"], _, ext = name.rpartition(".") data["extension"] = ext.lower() data["_http_headers"] = { "Referer": self.url.encode("latin-1", "ignore")} yield Message.Directory, data yield Message.Url, self.url, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1746776695.0 gallery_dl-1.29.7/gallery_dl/extractor/discord.py0000644000175000017500000003452215007331167020567 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://discord.com/""" from .common import Extractor, Message from .. 
import text, exception BASE_PATTERN = r"(?:https?://)?discord\.com" class DiscordExtractor(Extractor): """Base class for Discord extractors""" category = "discord" root = "https://discord.com" directory_fmt = ("{category}", "{server_id}_{server}", "{channel_id}_{channel}") filename_fmt = "{message_id}_{num:>02}_{filename}.{extension}" archive_fmt = "{message_id}_{num}" cdn_fmt = "https://cdn.discordapp.com/{}/{}/{}.png?size=4096" server_metadata = {} server_channels_metadata = {} def _init(self): self.token = self.config("token") self.enabled_embeds = self.config("embeds", ["image", "gifv", "video"]) self.enabled_threads = self.config("threads", True) self.api = DiscordAPI(self) def extract_message_text(self, message): text_content = [message["content"]] for embed in message["embeds"]: if embed["type"] == "rich": try: text_content.append(embed["author"]["name"]) except Exception: pass text_content.append(embed.get("title", "")) text_content.append(embed.get("description", "")) for field in embed.get("fields", []): text_content.append(field.get("name", "")) text_content.append(field.get("value", "")) try: text_content.append(embed["footer"]["text"]) except Exception: pass if message.get("poll"): text_content.append(message["poll"]["question"]["text"]) for answer in message["poll"]["answers"]: text_content.append(answer["poll_media"]["text"]) return "\n".join(t for t in text_content if t) def extract_message(self, message): # https://discord.com/developers/docs/resources/message#message-object-message-types if message["type"] in (0, 19, 21): message_metadata = {} message_metadata.update(self.server_metadata) message_metadata.update( self.server_channels_metadata[message["channel_id"]]) message_metadata.update({ "author": message["author"]["username"], "author_id": message["author"]["id"], "author_files": [], "message": self.extract_message_text(message), "message_id": message["id"], "date": text.parse_datetime( message["timestamp"], "%Y-%m-%dT%H:%M:%S.%f%z" ), "files": [] }) for icon_type, icon_path in ( ("avatar", "avatars"), ("banner", "banners") ): if message["author"].get(icon_type): message_metadata["author_files"].append({ "url": self.cdn_fmt.format( icon_path, message_metadata["author_id"], message["author"][icon_type] ), "filename": icon_type, "extension": "png", }) for attachment in message["attachments"]: message_metadata["files"].append({ "url": attachment["url"], "type": "attachment", }) for embed in message["embeds"]: if embed["type"] in self.enabled_embeds: for field in ("video", "image", "thumbnail"): if field not in embed: continue url = embed[field].get("proxy_url") if url is not None: message_metadata["files"].append({ "url": url, "type": "embed", }) break for num, file in enumerate(message_metadata["files"], start=1): text.nameext_from_url(file["url"], file) file["num"] = num yield Message.Directory, message_metadata for file in message_metadata["files"]: message_metadata_file = message_metadata.copy() message_metadata_file.update(file) yield Message.Url, file["url"], message_metadata_file def extract_channel_text(self, channel_id): for message in self.api.get_channel_messages(channel_id): yield from self.extract_message(message) def extract_channel_threads(self, channel_id): for thread in self.api.get_channel_threads(channel_id): id = self.parse_channel(thread)["channel_id"] yield from self.extract_channel_text(id) def extract_channel(self, channel_id, safe=False): try: if channel_id not in self.server_channels_metadata: 
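                # channel metadata is cached per run in
                # server_channels_metadata; fetch and parse it through the API
                # only when it has not been seen yet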
self.parse_channel(self.api.get_channel(channel_id)) channel_type = ( self.server_channels_metadata[channel_id]["channel_type"] ) # https://discord.com/developers/docs/resources/channel#channel-object-channel-types if channel_type in (0, 5): yield from self.extract_channel_text(channel_id) if self.enabled_threads: yield from self.extract_channel_threads(channel_id) elif channel_type in (1, 3, 10, 11, 12): yield from self.extract_channel_text(channel_id) elif channel_type in (15, 16): yield from self.extract_channel_threads(channel_id) elif channel_type in (4,): for channel in self.server_channels_metadata.copy().values(): if channel["parent_id"] == channel_id: yield from self.extract_channel( channel["channel_id"], safe=True) elif not safe: raise exception.StopExtraction( "This channel type is not supported." ) except exception.HttpError as exc: if not (exc.status == 403 and safe): raise def parse_channel(self, channel): parent_id = channel.get("parent_id") channel_metadata = { "channel": channel.get("name", ""), "channel_id": channel.get("id"), "channel_type": channel.get("type"), "channel_topic": channel.get("topic", ""), "parent_id": parent_id, "is_thread": "thread_metadata" in channel } if parent_id in self.server_channels_metadata: parent_metadata = self.server_channels_metadata[parent_id] channel_metadata.update({ "parent": parent_metadata["channel"], "parent_type": parent_metadata["channel_type"] }) if channel_metadata["channel_type"] in (1, 3): channel_metadata.update({ "channel": "DMs", "recipients": ( [user["username"] for user in channel["recipients"]] ), "recipients_id": ( [user["id"] for user in channel["recipients"]] ) }) channel_id = channel_metadata["channel_id"] self.server_channels_metadata[channel_id] = channel_metadata return channel_metadata def parse_server(self, server): self.server_metadata = { "server": server["name"], "server_id": server["id"], "server_files": [], "owner_id": server["owner_id"] } for icon_type, icon_path in ( ("icon", "icons"), ("banner", "banners"), ("splash", "splashes"), ("discovery_splash", "discovery-splashes") ): if server.get(icon_type): self.server_metadata["server_files"].append({ "url": self.cdn_fmt.format( icon_path, self.server_metadata["server_id"], server[icon_type] ), "filename": icon_type, "extension": "png", }) return self.server_metadata def build_server_and_channels(self, server_id): self.parse_server(self.api.get_server(server_id)) for channel in sorted( self.api.get_server_channels(server_id), key=lambda ch: ch["type"] != 4 ): self.parse_channel(channel) class DiscordChannelExtractor(DiscordExtractor): subcategory = "channel" pattern = BASE_PATTERN + r"/channels/(\d+)/(?:\d+/threads/)?(\d+)/?$" example = "https://discord.com/channels/1234567890/9876543210" def items(self): server_id, channel_id = self.groups self.build_server_and_channels(server_id) return self.extract_channel(channel_id) class DiscordMessageExtractor(DiscordExtractor): subcategory = "message" pattern = BASE_PATTERN + r"/channels/(\d+)/(\d+)/(\d+)/?$" example = "https://discord.com/channels/1234567890/9876543210/2468013579" def items(self): server_id, channel_id, message_id = self.groups self.build_server_and_channels(server_id) if channel_id not in self.server_channels_metadata: self.parse_channel(self.api.get_channel(channel_id)) return self.extract_message( self.api.get_message(channel_id, message_id)) class DiscordServerExtractor(DiscordExtractor): subcategory = "server" pattern = BASE_PATTERN + r"/channels/(\d+)/?$" example = 
"https://discord.com/channels/1234567890" def items(self): server_id = self.groups[0] self.build_server_and_channels(server_id) for channel in self.server_channels_metadata.copy().values(): if channel["channel_type"] in (0, 5, 15, 16): yield from self.extract_channel( channel["channel_id"], safe=True) class DiscordDirectMessagesExtractor(DiscordExtractor): subcategory = "direct-messages" directory_fmt = ("{category}", "Direct Messages", "{channel_id}_{recipients:J,}") pattern = BASE_PATTERN + r"/channels/@me/(\d+)/?$" example = "https://discord.com/channels/@me/1234567890" def items(self): return self.extract_channel(self.groups[0]) class DiscordDirectMessageExtractor(DiscordExtractor): subcategory = "direct-message" directory_fmt = ("{category}", "Direct Messages", "{channel_id}_{recipients:J,}") pattern = BASE_PATTERN + r"/channels/@me/(\d+)/(\d+)/?$" example = "https://discord.com/channels/@me/1234567890/9876543210" def items(self): channel_id, message_id = self.groups self.parse_channel(self.api.get_channel(channel_id)) return self.extract_message( self.api.get_message(channel_id, message_id)) class DiscordAPI(): """Interface for the Discord API v10 https://discord.com/developers/docs/reference """ def __init__(self, extractor): self.extractor = extractor self.root = extractor.root + "/api/v10" self.headers = {"Authorization": extractor.token} def get_server(self, server_id): """Get server information""" return self._call("/guilds/" + server_id) def get_server_channels(self, server_id): """Get server channels""" return self._call("/guilds/" + server_id + "/channels") def get_channel(self, channel_id): """Get channel information""" return self._call("/channels/" + channel_id) def get_channel_threads(self, channel_id): """Get channel threads""" THREADS_BATCH = 25 def _method(offset): return self._call("/channels/" + channel_id + "/threads/search", { "sort_by": "last_message_time", "sort_order": "desc", "limit": THREADS_BATCH, "offset": + offset, })["threads"] return self._pagination(_method, THREADS_BATCH) def get_channel_messages(self, channel_id): """Get channel messages""" MESSAGES_BATCH = 100 before = None def _method(_): nonlocal before messages = self._call("/channels/" + channel_id + "/messages", { "limit": MESSAGES_BATCH, "before": before }) if messages: before = messages[-1]["id"] return messages return self._pagination(_method, MESSAGES_BATCH) def get_message(self, channel_id, message_id): """Get message information""" return self._call("/channels/" + channel_id + "/messages", { "limit": 1, "around": message_id })[0] def _call(self, endpoint, params=None): url = self.root + endpoint try: response = self.extractor.request( url, params=params, headers=self.headers) except exception.HttpError as exc: if exc.status == 401: self._raise_invalid_token() raise return response.json() def _pagination(self, method, batch): offset = 0 while True: data = method(offset) yield from data if len(data) < batch: return offset += len(data) @staticmethod def _raise_invalid_token(): raise exception.AuthenticationError("""Invalid or missing token. 
Please provide a valid token following these instructions: 1) Open Discord in your browser (https://discord.com/app); 2) Open your browser's Developer Tools (F12) and switch to the Network panel; 3) Reload the page and select any request going to https://discord.com/api/...; 4) In the "Headers" tab, look for an entry beginning with "Authorization: "; 5) Right-click the entry and click "Copy Value"; 6) Paste the token in your configuration file under "extractor.discord.token", or run this command with the -o "token=[your token]" argument.""") ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/gallery_dl/extractor/dynastyscans.py0000644000175000017500000001110415001510422021635 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://dynasty-scans.com/""" from .common import ChapterExtractor, MangaExtractor, Extractor, Message from .. import text, util import re BASE_PATTERN = r"(?:https?://)?(?:www\.)?dynasty-scans\.com" class DynastyscansBase(): """Base class for dynastyscans extractors""" category = "dynastyscans" root = "https://dynasty-scans.com" def _parse_image_page(self, image_id): url = "{}/images/{}".format(self.root, image_id) extr = text.extract_from(self.request(url).text) date = extr("class='create_at'>", "") tags = extr("class='tags'>", "") src = extr("class='btn-group'>", "") url = extr(' src="', '"') src = text.extr(src, 'href="', '"') if "Source<" in src else "" return { "url" : self.root + url, "image_id": text.parse_int(image_id), "tags" : text.split_html(tags), "date" : text.remove_html(date), "source" : text.unescape(src), } class DynastyscansChapterExtractor(DynastyscansBase, ChapterExtractor): """Extractor for manga-chapters from dynasty-scans.com""" pattern = BASE_PATTERN + r"(/chapters/[^/?#]+)" example = "https://dynasty-scans.com/chapters/NAME" def metadata(self, page): extr = text.extract_from(page) match = re.match( (r"(?:]*>)?([^<]+)(?:
    )?" # manga name r"(?: ch(\d+)([^:<]*))?" # chapter info r"(?:: (.+))?"), # title extr("

    ", ""), ) author = extr(" by ", "") group = extr('"icon-print"> ', '') return { "manga" : text.unescape(match.group(1)), "chapter" : text.parse_int(match.group(2)), "chapter_minor": match.group(3) or "", "title" : text.unescape(match.group(4) or ""), "author" : text.remove_html(author), "group" : (text.remove_html(group) or text.extr(group, ' alt="', '"')), "date" : text.parse_datetime(extr( '"icon-calendar"> ', '<'), "%b %d, %Y"), "tags" : text.split_html(extr( "class='tags'>", "
    \n', pos) urls = [] date = None groups = page.split('
    1: date = text.parse_timestamp(ts) data = { "album_id": album_id, "title" : text.unescape(title), "user" : text.unquote(user), "count" : len(urls), "date" : date, "tags" : ([t.replace("+", " ") for t in text.extract_iter(tags, "?q=", '"')] if tags else ()), "_http_headers": {"Referer": url}, } yield Message.Directory, data for data["num"], url in enumerate(urls, 1): yield Message.Url, url, text.nameext_from_url(url, data) def albums(self): return () def request(self, url, **kwargs): if self.__cookies: self.__cookies = False self.cookies.update(_cookie_cache()) for _ in range(5): response = Extractor.request(self, url, **kwargs) if response.cookies: _cookie_cache.update("", response.cookies) if response.content.find( b"Please wait a few moments", 0, 600) < 0: return response self.sleep(5.0, "check") def _pagination(self, url, params): for params["page"] in itertools.count(1): page = self.request(url, params=params).text album_ids = EromeAlbumExtractor.pattern.findall(page)[::2] yield from album_ids if len(album_ids) < 36: return class EromeAlbumExtractor(EromeExtractor): """Extractor for albums on erome.com""" subcategory = "album" pattern = BASE_PATTERN + r"/a/(\w+)" example = "https://www.erome.com/a/ID" def albums(self): return (self.groups[0],) class EromeUserExtractor(EromeExtractor): subcategory = "user" pattern = BASE_PATTERN + r"/(?!a/|search\?)([^/?#]+)" example = "https://www.erome.com/USER" def albums(self): url = "{}/{}".format(self.root, self.groups[0]) return self._pagination(url, {}) class EromeSearchExtractor(EromeExtractor): subcategory = "search" pattern = BASE_PATTERN + r"/search/?\?(q=[^#]+)" example = "https://www.erome.com/search?q=QUERY" def albums(self): url = self.root + "/search" params = text.parse_query(self.groups[0]) return self._pagination(url, params) @cache() def _cookie_cache(): return () ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1746776695.0 gallery_dl-1.29.7/gallery_dl/extractor/everia.py0000644000175000017500000000600215007331167020403 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://everia.club""" from .common import Extractor, Message from .. 
import text import re BASE_PATTERN = r"(?:https?://)?everia\.club" class EveriaExtractor(Extractor): category = "everia" root = "https://everia.club" def items(self): data = {"_extractor": EveriaPostExtractor} for url in self.posts(): yield Message.Queue, url, data def posts(self): return self._pagination(self.groups[0]) def _pagination(self, path, params=None, pnum=1): find_posts = re.compile(r'thumbnail">\s*= 300: return yield from find_posts(response.text) pnum += 1 class EveriaPostExtractor(EveriaExtractor): subcategory = "post" directory_fmt = ("{category}", "{title}") archive_fmt = "{post_url}_{num}" pattern = BASE_PATTERN + r"(/\d{4}/\d{2}/\d{2}/[^/?#]+)" example = "https://everia.club/0000/00/00/TITLE" def items(self): url = self.root + self.groups[0] page = self.request(url).text content = text.extr(page, 'itemprop="text">', "', "', "")), "post_url": url, "post_category": text.extr( page, "post-in-category-", " ").capitalize(), "count": len(urls), } yield Message.Directory, data for data["num"], url in enumerate(urls, 1): yield Message.Url, url, text.nameext_from_url(url, data) class EveriaTagExtractor(EveriaExtractor): subcategory = "tag" pattern = BASE_PATTERN + r"(/tag/[^/?#]+)" example = "https://everia.club/tag/TAG" class EveriaCategoryExtractor(EveriaExtractor): subcategory = "category" pattern = BASE_PATTERN + r"(/category/[^/?#]+)" example = "https://everia.club/category/CATEGORY" class EveriaDateExtractor(EveriaExtractor): subcategory = "date" pattern = (BASE_PATTERN + r"(/\d{4}(?:/\d{2})?(?:/\d{2})?)(?:/page/\d+)?/?$") example = "https://everia.club/0000/00/00" class EveriaSearchExtractor(EveriaExtractor): subcategory = "search" pattern = BASE_PATTERN + r"/(?:page/\d+/)?\?s=([^&#]+)" example = "https://everia.club/?s=SEARCH" def posts(self): params = {"s": self.groups[0]} return self._pagination("", params) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1747990168.0 gallery_dl-1.29.7/gallery_dl/extractor/exhentai.py0000644000175000017500000005045515014033230020734 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://e-hentai.org/ and https://exhentai.org/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache import collections import itertools import math BASE_PATTERN = r"(?:https?://)?(e[x-]|g\.e-)hentai\.org" class ExhentaiExtractor(Extractor): """Base class for exhentai extractors""" category = "exhentai" directory_fmt = ("{category}", "{gid} {title[:247]}") filename_fmt = "{gid}_{num:>04}_{image_token}_{filename}.{extension}" archive_fmt = "{gid}_{num}" cookies_domain = ".exhentai.org" cookies_names = ("ipb_member_id", "ipb_pass_hash") root = "https://exhentai.org" request_interval = (3.0, 6.0) ciphers = "DEFAULT:!DH" LIMIT = False def __init__(self, match): Extractor.__init__(self, match) self.version = match.group(1) def initialize(self): domain = self.config("domain", "auto") if domain == "auto": domain = ("ex" if self.version == "ex" else "e-") + "hentai.org" self.root = "https://" + domain self.api_url = self.root + "/api.php" self.cookies_domain = "." 
+ domain Extractor.initialize(self) if self.version != "ex": self.cookies.set("nw", "1", domain=self.cookies_domain) def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if "Cache-Control" not in response.headers and not response.content: self.log.info("blank page") raise exception.AuthorizationError() return response def login(self): """Login and set necessary cookies""" if self.LIMIT: raise exception.StopExtraction("Image limit reached!") if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: return self.cookies_update(self._login_impl(username, password)) if self.version == "ex": self.log.info("No username or cookies given; using e-hentai.org") self.root = "https://e-hentai.org" self.cookies_domain = ".e-hentai.org" self.cookies.set("nw", "1", domain=self.cookies_domain) self.original = False self.limits = False @cache(maxage=90*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = "https://forums.e-hentai.org/index.php?act=Login&CODE=01" headers = { "Referer": "https://e-hentai.org/bounce_login.php?b=d&bt=1-1", } data = { "CookieDate": "1", "b": "d", "bt": "1-1", "UserName": username, "PassWord": password, "ipb_login_submit": "Login!", } self.cookies.clear() response = self.request(url, method="POST", headers=headers, data=data) content = response.content if b"You are now logged in as:" not in content: if b"The captcha was not entered correctly" in content: raise exception.AuthenticationError( "CAPTCHA required. Use cookies instead.") raise exception.AuthenticationError() # collect more cookies url = self.root + "/favorites.php" response = self.request(url) if response.history: self.request(url) return self.cookies class ExhentaiGalleryExtractor(ExhentaiExtractor): """Extractor for image galleries from exhentai.org""" subcategory = "gallery" pattern = (BASE_PATTERN + r"(?:/g/(\d+)/([\da-f]{10})" r"|/s/([\da-f]{10})/(\d+)-(\d+))") example = "https://e-hentai.org/g/12345/67890abcde/" def __init__(self, match): ExhentaiExtractor.__init__(self, match) self.gallery_id = text.parse_int(match.group(2) or match.group(5)) self.gallery_token = match.group(3) self.image_token = match.group(4) self.image_num = text.parse_int(match.group(6), 1) self.key_start = None self.key_show = None self.key_next = None self.count = 0 self.data = None def _init(self): source = self.config("source") if source == "hitomi": self.items = self._items_hitomi limits = self.config("limits", False) if limits and limits.__class__ is int: self.limits = limits self._remaining = 0 else: self.limits = False self.fallback_retries = self.config("fallback-retries", 2) self.original = self.config("original", True) def finalize(self): if self.data: self.log.info("Use '%s/s/%s/%s-%s' as input URL " "to continue downloading from the current position", self.root, self.data["image_token"], self.gallery_id, self.data["num"]) def favorite(self, slot="0"): url = self.root + "/gallerypopups.php" params = { "gid": self.gallery_id, "t" : self.gallery_token, "act": "addfav", } data = { "favcat" : slot, "apply" : "Apply Changes", "update" : "1", } self.request(url, method="POST", params=params, data=data) def items(self): self.login() if self.gallery_token: gpage = self._gallery_page() self.image_token = text.extr(gpage, 'hentai.org/s/', '"') if not self.image_token: self.log.debug("Page content:\n%s", gpage) raise exception.StopExtraction( "Failed to extract initial image token") ipage = 
self._image_page() else: ipage = self._image_page() part = text.extr(ipage, 'hentai.org/g/', '"') if not part: self.log.debug("Page content:\n%s", ipage) raise exception.StopExtraction( "Failed to extract gallery token") self.gallery_token = part.split("/")[1] gpage = self._gallery_page() self.data = data = self.get_metadata(gpage) self.count = text.parse_int(data["filecount"]) yield Message.Directory, data images = itertools.chain( (self.image_from_page(ipage),), self.images_from_api()) for url, image in images: data.update(image) if self.limits: self._check_limits(data) if "/fullimg" in url: data["_http_validate"] = self._validate_response else: data["_http_validate"] = None yield Message.Url, url, data fav = self.config("fav") if fav is not None: self.favorite(fav) self.data = None def _items_hitomi(self): if self.config("metadata", False): data = self.metadata_from_api() data["date"] = text.parse_timestamp(data["posted"]) else: data = {} from .hitomi import HitomiGalleryExtractor url = "https://hitomi.la/galleries/{}.html".format(self.gallery_id) data["_extractor"] = HitomiGalleryExtractor yield Message.Queue, url, data def get_metadata(self, page): """Extract gallery metadata""" data = self.metadata_from_page(page) if self.config("metadata", False): data.update(self.metadata_from_api()) data["date"] = text.parse_timestamp(data["posted"]) if self.config("tags", False): tags = collections.defaultdict(list) for tag in data["tags"]: type, _, value = tag.partition(":") tags[type].append(value) for type, values in tags.items(): data["tags_" + type] = values return data def metadata_from_page(self, page): extr = text.extract_from(page) api_url = extr('var api_url = "', '"') if api_url: self.api_url = api_url data = { "gid" : self.gallery_id, "token" : self.gallery_token, "thumb" : extr("background:transparent url(", ")"), "title" : text.unescape(extr('
    ', ' ')), "title_jpn" : text.unescape(extr(' ', ' ')), "_" : extr(' ', '<'), "uploader" : extr(' ', ' '), "date" : text.parse_datetime(extr( '>Posted: • '), "%Y-%m-%d %H:%M"), "parent" : extr( '>Parent:
    ', 'Visible:', '<'), "language" : extr('>Language:', ' '), "filesize" : text.parse_bytes(extr( '>File Size:', '<').rstrip("Bbi")), "filecount" : extr('>Length:', ' '), "favorites" : extr('id="favcount">', ' '), "rating" : extr(">Average: ", "<"), "torrentcount" : extr('>Torrent Download (', ')'), } uploader = data["uploader"] if uploader and uploader[0] == "<": data["uploader"] = text.unescape(text.extr(uploader, ">", "<")) f = data["favorites"][0] if f == "N": data["favorites"] = "0" elif f == "O": data["favorites"] = "1" data["lang"] = util.language_to_code(data["language"]) data["tags"] = [ text.unquote(tag.replace("+", " ")) for tag in text.extract_iter(page, 'hentai.org/tag/', '"') ] return data def metadata_from_api(self): data = { "method" : "gdata", "gidlist" : ((self.gallery_id, self.gallery_token),), "namespace": 1, } data = self.request(self.api_url, method="POST", json=data).json() if "error" in data: raise exception.StopExtraction(data["error"]) return data["gmetadata"][0] def image_from_page(self, page): """Get image url and data from webpage""" pos = page.index('
    = 0: origurl, pos = text.rextract(i6, '"', '"', pos) url = text.unescape(origurl) data = self._parse_original_info(text.extract( i6, "ownload original", "<", pos)[0]) data["_fallback"] = self._fallback_original(nl, url) else: url = imgurl data = self._parse_image_info(url) data["_fallback"] = self._fallback_1280( nl, request["page"], imgkey) except IndexError: self.log.debug("Page content:\n%s", page) raise exception.StopExtraction( "Unable to parse image info for '%s'", url) data["num"] = request["page"] data["image_token"] = imgkey data["_url_1280"] = imgurl data["_nl"] = nl self._check_509(imgurl) yield url, text.nameext_from_url(url, data) request["imgkey"] = nextkey def _validate_response(self, response): if not response.history and response.headers.get( "content-type", "").startswith("text/html"): page = response.text self.log.warning("'%s'", page) if " requires GP" in page: gp = self.config("gp") if gp == "stop": raise exception.StopExtraction("Not enough GP") elif gp == "wait": input("Press ENTER to continue.") return response.url self.log.info("Falling back to non-original downloads") self.original = False return self.data["_url_1280"] if " temporarily banned " in page: raise exception.AuthorizationError("Temporarily Banned") self._report_limits() return True def _report_limits(self): ExhentaiExtractor.LIMIT = True raise exception.StopExtraction("Image limit reached!") def _check_limits(self, data): if not self._remaining or data["num"] % 25 == 0: self._update_limits() self._remaining -= data["cost"] if self._remaining <= 0: self._report_limits() def _check_509(self, url): # full 509.gif URLs # - https://exhentai.org/img/509.gif # - https://ehgt.org/g/509.gif if url.endswith(("hentai.org/img/509.gif", "ehgt.org/g/509.gif")): self.log.debug(url) self._report_limits() def _update_limits(self): url = "https://e-hentai.org/home.php" cookies = { cookie.name: cookie.value for cookie in self.cookies if cookie.domain == self.cookies_domain and cookie.name != "igneous" } page = self.request(url, cookies=cookies).text current = text.extr(page, "", "").replace(",", "") self.log.debug("Image Limits: %s/%s", current, self.limits) self._remaining = self.limits - text.parse_int(current) def _gallery_page(self): url = "{}/g/{}/{}/".format( self.root, self.gallery_id, self.gallery_token) response = self.request(url, fatal=False) page = response.text if response.status_code == 404 and "Gallery Not Available" in page: raise exception.AuthorizationError() if page.startswith(("Key missing", "Gallery not found")): raise exception.NotFoundError("gallery") if page.count("hentai.org/mpv/") > 1: self.log.warning("Enabled Multi-Page Viewer is not supported") return page def _image_page(self): url = "{}/s/{}/{}-{}".format( self.root, self.image_token, self.gallery_id, self.image_num) page = self.request(url, fatal=False).text if page.startswith(("Invalid page", "Keep trying")): raise exception.NotFoundError("image page") return page def _fallback_original(self, nl, fullimg): url = "{}?nl={}".format(fullimg, nl) for _ in util.repeat(self.fallback_retries): yield url def _fallback_1280(self, nl, num, token=None): if not token: token = self.key_start for _ in util.repeat(self.fallback_retries): url = "{}/s/{}/{}-{}?nl={}".format( self.root, token, self.gallery_id, num, nl) page = self.request(url, fatal=False).text if page.startswith(("Invalid page", "Keep trying")): return url, data = self.image_from_page(page) yield url nl = data["_nl"] @staticmethod def _parse_image_info(url): for part in 
url.split("/")[4:]: try: _, size, width, height, _ = part.split("-") break except ValueError: pass else: size = width = height = 0 return { "cost" : 1, "size" : text.parse_int(size), "width" : text.parse_int(width), "height": text.parse_int(height), } @staticmethod def _parse_original_info(info): parts = info.lstrip().split(" ") size = text.parse_bytes(parts[3] + parts[4][0]) return { # 1 initial point + 1 per 0.1 MB "cost" : 1 + math.ceil(size / 100000), "size" : size, "width" : text.parse_int(parts[0]), "height": text.parse_int(parts[2]), } class ExhentaiSearchExtractor(ExhentaiExtractor): """Extractor for exhentai search results""" subcategory = "search" pattern = BASE_PATTERN + r"/(?:\?([^#]*)|tag/([^/?#]+))" example = "https://e-hentai.org/?f_search=QUERY" def __init__(self, match): ExhentaiExtractor.__init__(self, match) _, query, tag = match.groups() if tag: if "+" in tag: ns, _, tag = tag.rpartition(":") tag = '{}:"{}$"'.format(ns, tag.replace("+", " ")) else: tag += "$" self.params = {"f_search": tag, "page": 0} else: self.params = text.parse_query(query) if "next" not in self.params: self.params["page"] = text.parse_int(self.params.get("page")) def _init(self): self.search_url = self.root def items(self): self.login() data = {"_extractor": ExhentaiGalleryExtractor} search_url = self.search_url params = self.params while True: last = None page = self.request(search_url, params=params).text for gallery in ExhentaiGalleryExtractor.pattern.finditer(page): url = gallery.group(0) if url == last: continue last = url data["gallery_id"] = text.parse_int(gallery.group(2)) data["gallery_token"] = gallery.group(3) yield Message.Queue, url + "/", data next_url = text.extr(page, 'nexturl="', '"', None) if next_url is not None: if not next_url: return search_url = next_url params = None elif 'class="ptdd">><' in page or ">No hits found
    " in page: return else: params["page"] += 1 class ExhentaiFavoriteExtractor(ExhentaiSearchExtractor): """Extractor for favorited exhentai galleries""" subcategory = "favorite" pattern = BASE_PATTERN + r"/favorites\.php(?:\?([^#]*)())?" example = "https://e-hentai.org/favorites.php" def _init(self): self.search_url = self.root + "/favorites.php" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1746776695.0 gallery_dl-1.29.7/gallery_dl/extractor/facebook.py0000644000175000017500000003554415007331167020716 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.facebook.com/""" from .common import Extractor, Message from .. import text, exception BASE_PATTERN = r"(?:https?://)?(?:[\w-]+\.)?facebook\.com" class FacebookExtractor(Extractor): """Base class for Facebook extractors""" category = "facebook" root = "https://www.facebook.com" directory_fmt = ("{category}", "{username}", "{title} ({set_id})") filename_fmt = "{id}.{extension}" archive_fmt = "{id}.{extension}" set_url_fmt = root + "/media/set/?set={set_id}" photo_url_fmt = root + "/photo/?fbid={photo_id}&set={set_id}" def _init(self): headers = self.session.headers headers["Accept"] = ( "text/html,application/xhtml+xml,application/xml;q=0.9," "image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8" ) headers["Sec-Fetch-Dest"] = "empty" headers["Sec-Fetch-Mode"] = "navigate" headers["Sec-Fetch-Site"] = "same-origin" self.fallback_retries = self.config("fallback-retries", 2) self.videos = self.config("videos", True) self.author_followups = self.config("author-followups", False) @staticmethod def decode_all(txt): return text.unescape( txt.encode().decode("unicode_escape") .encode("utf_16", "surrogatepass").decode("utf_16") ).replace("\\/", "/") @staticmethod def parse_set_page(set_page): directory = { "set_id": text.extr( set_page, '"mediaSetToken":"', '"' ) or text.extr( set_page, '"mediasetToken":"', '"' ), "username": FacebookExtractor.decode_all( text.extr( set_page, '"user":{"__isProfile":"User","name":"', '","' ) or text.extr( set_page, '"actors":[{"__typename":"User","name":"', '","' ) ), "user_id": text.extr( set_page, '"owner":{"__typename":"User","id":"', '"' ), "title": FacebookExtractor.decode_all(text.extr( set_page, '"title":{"text":"', '"' )), "first_photo_id": text.extr( set_page, '{"__typename":"Photo","__isMedia":"Photo","', '","creation_story"' ).rsplit('"id":"', 1)[-1] or text.extr( set_page, '{"__typename":"Photo","id":"', '"' ) } return directory @staticmethod def parse_photo_page(photo_page): photo = { "id": text.extr( photo_page, '"__isNode":"Photo","id":"', '"' ), "set_id": text.extr( photo_page, '"url":"https:\\/\\/www.facebook.com\\/photo\\/?fbid=', '"' ).rsplit("&set=", 1)[-1], "username": FacebookExtractor.decode_all(text.extr( photo_page, '"owner":{"__typename":"User","name":"', '"' )), "user_id": text.extr( photo_page, '"owner":{"__typename":"User","id":"', '"' ), "caption": FacebookExtractor.decode_all(text.extr( photo_page, '"message":{"delight_ranges"', '"},"message_preferred_body"' ).rsplit('],"text":"', 1)[-1]), "date": text.parse_timestamp( text.extr(photo_page, '\\"publish_time\\":', ',') or text.extr(photo_page, '"created_time":', ',') ), "url": FacebookExtractor.decode_all(text.extr( photo_page, ',"image":{"uri":"', '","' )), "next_photo_id": text.extr( photo_page, 
'"nextMediaAfterNodeId":{"__typename":"Photo","id":"', '"' ) or text.extr( photo_page, '"nextMedia":{"edges":[{"node":{"__typename":"Photo","id":"', '"' ) } text.nameext_from_url(photo["url"], photo) photo["followups_ids"] = [] for comment_raw in text.extract_iter( photo_page, '{"node":{"id"', '"cursor":null}' ): if ('"is_author_original_poster":true' in comment_raw and '{"__typename":"Photo","id":"' in comment_raw): photo["followups_ids"].append(text.extr( comment_raw, '{"__typename":"Photo","id":"', '"' )) return photo @staticmethod def parse_post_page(post_page): first_photo_url = text.extr( text.extr( post_page, '"__isMedia":"Photo"', '"target_group"' ), '"url":"', ',' ) post = { "set_id": text.extr(post_page, '{"mediaset_token":"', '"') or text.extr(first_photo_url, 'set=', '"').rsplit("&", 1)[0] } return post @staticmethod def parse_video_page(video_page): video = { "id": text.extr( video_page, '\\"video_id\\":\\"', '\\"' ), "username": FacebookExtractor.decode_all(text.extr( video_page, '"actors":[{"__typename":"User","name":"', '","' )), "user_id": text.extr( video_page, '"owner":{"__typename":"User","id":"', '"' ), "date": text.parse_timestamp(text.extr( video_page, '\\"publish_time\\":', ',' )), "type": "video" } if not video["username"]: video["username"] = FacebookExtractor.decode_all(text.extr( video_page, '"__typename":"User","id":"' + video["user_id"] + '","name":"', '","' )) first_video_raw = text.extr( video_page, '"permalink_url"', '\\/Period>\\u003C\\/MPD>' ) audio = { **video, "url": FacebookExtractor.decode_all(text.extr( text.extr( first_video_raw, "AudioChannelConfiguration", "BaseURL>\\u003C" ), "BaseURL>", "\\u003C\\/" )), "type": "audio" } video["urls"] = {} for raw_url in text.extract_iter( first_video_raw, 'FBQualityLabel=\\"', '\\u003C\\/BaseURL>' ): resolution = raw_url.split('\\"', 1)[0] video["urls"][resolution] = FacebookExtractor.decode_all( raw_url.split('BaseURL>', 1)[1] ) if not video["urls"]: return video, audio video["url"] = max( video["urls"].items(), key=lambda x: text.parse_int(x[0][:-1]) )[1] text.nameext_from_url(video["url"], video) audio["filename"] = video["filename"] audio["extension"] = "m4a" return video, audio def photo_page_request_wrapper(self, url, **kwargs): LEFT_OFF_TXT = "" if url.endswith("&set=") else ( "\nYou can use this URL to continue from " "where you left off (added \"&setextract\"): " "\n" + url + "&setextract" ) res = self.request(url, **kwargs) if res.url.startswith(self.root + "/login"): raise exception.AuthenticationError( "You must be logged in to continue viewing images." + LEFT_OFF_TXT ) if b'{"__dr":"CometErrorRoot.react"}' in res.content: raise exception.StopExtraction( "You've been temporarily blocked from viewing images. " "\nPlease try using a different account, " "using a VPN or waiting before you retry." 
+ LEFT_OFF_TXT ) return res def extract_set(self, set_data): set_id = set_data["set_id"] all_photo_ids = [set_data["first_photo_id"]] retries = 0 i = 0 while i < len(all_photo_ids): photo_id = all_photo_ids[i] photo_url = self.photo_url_fmt.format( photo_id=photo_id, set_id=set_id ) photo_page = self.photo_page_request_wrapper(photo_url).text photo = self.parse_photo_page(photo_page) photo["num"] = i + 1 if self.author_followups: for followup_id in photo["followups_ids"]: if followup_id not in all_photo_ids: self.log.debug( "Found a followup in comments: %s", followup_id ) all_photo_ids.append(followup_id) if not photo["url"]: if retries < self.fallback_retries and self._interval_429: seconds = self._interval_429() self.log.warning( "Failed to find photo download URL for %s. " "Retrying in %s seconds.", photo_url, seconds, ) self.wait(seconds=seconds, reason="429 Too Many Requests") retries += 1 continue else: self.log.error( "Failed to find photo download URL for " + photo_url + ". Skipping." ) retries = 0 else: retries = 0 photo.update(set_data) yield Message.Directory, photo yield Message.Url, photo["url"], photo if not photo["next_photo_id"]: self.log.debug( "Can't find next image in the set. " "Extraction is over." ) elif photo["next_photo_id"] in all_photo_ids: if photo["next_photo_id"] != photo["id"]: self.log.debug( "Detected a loop in the set, it's likely finished. " "Extraction is over." ) else: all_photo_ids.append(photo["next_photo_id"]) i += 1 class FacebookSetExtractor(FacebookExtractor): """Base class for Facebook Set extractors""" subcategory = "set" pattern = ( BASE_PATTERN + r"/(?:(?:media/set|photo)/?\?(?:[^&#]+&)*set=([^&#]+)" r"[^/?#]*(?= fee] if fees: plan = plans[min(fees)] else: plan = plans[0].copy() plan["fee"] = fee post["plan"] = plans[fee] = plan if self._meta_comments: if post["commentCount"]: post["comments"] = list(self._get_comment_data(post_id)) else: post["commentd"] = () return content_body, post @memcache(keyarg=1) def _get_user_data(self, creator_id): url = "https://api.fanbox.cc/creator.get" params = {"creatorId": creator_id} data = self.request(url, params=params, headers=self.headers).json() user = data["body"] user.update(user.pop("user")) return user @memcache(keyarg=1) def _get_plan_data(self, creator_id): url = "https://api.fanbox.cc/plan.listCreator" params = {"creatorId": creator_id} data = self.request(url, params=params, headers=self.headers).json() plans = {0: { "id" : "", "title" : "", "fee" : 0, "description" : "", "coverImageUrl" : "", "creatorId" : creator_id, "hasAdultContent": None, "paymentMethod" : None, }} for plan in data["body"]: del plan["user"] plans[plan["fee"]] = plan return plans def _get_comment_data(self, post_id): url = ("https://api.fanbox.cc/post.getComments" "?limit=10&postId=" + post_id) comments = [] while url: url = text.ensure_http_scheme(url) body = self.request(url, headers=self.headers).json()["body"] data = body["commentList"] comments.extend(data["items"]) url = data["nextUrl"] return comments def _get_urls_from_post(self, content_body, post): num = 0 cover_image = post.get("coverImageUrl") if cover_image: cover_image = re.sub("/c/[0-9a-z_]+", "", cover_image) final_post = post.copy() final_post["isCoverImage"] = True final_post["fileUrl"] = cover_image text.nameext_from_url(cover_image, final_post) final_post["num"] = num num += 1 yield Message.Url, cover_image, final_post if not content_body: return if "html" in content_body: html_urls = [] for href in text.extract_iter(content_body["html"], 'href="', '"'): 
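                # keep only direct links to Fanbox-hosted content: legacy
                # image entries (fanbox.pixiv.net/images/entry) and file
                # downloads (downloads.fanbox.cc); other hrefs are ignored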
if "fanbox.pixiv.net/images/entry" in href: html_urls.append(href) elif "downloads.fanbox.cc" in href: html_urls.append(href) for src in text.extract_iter(content_body["html"], 'data-src-original="', '"'): html_urls.append(src) for url in html_urls: final_post = post.copy() text.nameext_from_url(url, final_post) final_post["fileUrl"] = url final_post["num"] = num num += 1 yield Message.Url, url, final_post for group in ("images", "imageMap"): if group in content_body: for item in content_body[group]: if group == "imageMap": # imageMap is a dict with image objects as values item = content_body[group][item] final_post = post.copy() final_post["fileUrl"] = item["originalUrl"] text.nameext_from_url(item["originalUrl"], final_post) if "extension" in item: final_post["extension"] = item["extension"] final_post["fileId"] = item.get("id") final_post["width"] = item.get("width") final_post["height"] = item.get("height") final_post["num"] = num num += 1 yield Message.Url, item["originalUrl"], final_post for group in ("files", "fileMap"): if group in content_body: for item in content_body[group]: if group == "fileMap": # fileMap is a dict with file objects as values item = content_body[group][item] final_post = post.copy() final_post["fileUrl"] = item["url"] text.nameext_from_url(item["url"], final_post) if "extension" in item: final_post["extension"] = item["extension"] if "name" in item: final_post["filename"] = item["name"] final_post["fileId"] = item.get("id") final_post["num"] = num num += 1 yield Message.Url, item["url"], final_post if self.embeds: embeds_found = [] if "video" in content_body: embeds_found.append(content_body["video"]) embeds_found.extend(content_body.get("embedMap", {}).values()) for embed in embeds_found: # embed_result is (message type, url, metadata dict) embed_result = self._process_embed(post, embed) if not embed_result: continue embed_result[2]["num"] = num num += 1 yield embed_result def _process_embed(self, post, embed): final_post = post.copy() provider = embed["serviceProvider"] content_id = embed.get("videoId") or embed.get("contentId") prefix = "ytdl:" if self.embeds == "ytdl" else "" url = None is_video = False if provider == "soundcloud": url = prefix+"https://soundcloud.com/"+content_id is_video = True elif provider == "youtube": url = prefix+"https://youtube.com/watch?v="+content_id is_video = True elif provider == "vimeo": url = prefix+"https://vimeo.com/"+content_id is_video = True elif provider == "fanbox": # this is an old URL format that redirects # to a proper Fanbox URL url = "https://www.pixiv.net/fanbox/"+content_id # resolve redirect try: url = self.request_location(url) except Exception as exc: url = None self.log.warning("Unable to extract fanbox embed %s (%s: %s)", content_id, exc.__class__.__name__, exc) else: final_post["_extractor"] = FanboxPostExtractor elif provider == "twitter": url = "https://twitter.com/_/status/"+content_id elif provider == "google_forms": templ = "https://docs.google.com/forms/d/e/{}/viewform?usp=sf_link" url = templ.format(content_id) else: self.log.warning("service not recognized: {}".format(provider)) if url: final_post["embed"] = embed final_post["embedUrl"] = url text.nameext_from_url(url, final_post) msg_type = Message.Queue if is_video and self.embeds == "ytdl": msg_type = Message.Url return msg_type, url, final_post class FanboxCreatorExtractor(FanboxExtractor): """Extractor for a Fanbox creator's works""" subcategory = "creator" pattern = USER_PATTERN + r"(?:/posts)?/?$" example = "https://USER.fanbox.cc/" def 
__init__(self, match): FanboxExtractor.__init__(self, match) self.creator_id = match.group(1) or match.group(2) def posts(self): url = "https://api.fanbox.cc/post.paginateCreator?creatorId=" return self._pagination_creator(url + self.creator_id) def _pagination_creator(self, url): urls = self.request(url, headers=self.headers).json()["body"] for url in urls: url = text.ensure_http_scheme(url) body = self.request(url, headers=self.headers).json()["body"] for item in body: try: yield self._get_post_data(item["id"]) except Exception as exc: self.log.warning("Skipping post %s (%s: %s)", item["id"], exc.__class__.__name__, exc) class FanboxPostExtractor(FanboxExtractor): """Extractor for media from a single Fanbox post""" subcategory = "post" pattern = USER_PATTERN + r"/posts/(\d+)" example = "https://USER.fanbox.cc/posts/12345" def __init__(self, match): FanboxExtractor.__init__(self, match) self.post_id = match.group(3) def posts(self): return (self._get_post_data(self.post_id),) class FanboxHomeExtractor(FanboxExtractor): """Extractor for your Fanbox home feed""" subcategory = "home" pattern = BASE_PATTERN + r"/?$" example = "https://fanbox.cc/" def posts(self): url = "https://api.fanbox.cc/post.listHome?limit=10" return self._pagination(url) class FanboxSupportingExtractor(FanboxExtractor): """Extractor for your supported Fanbox users feed""" subcategory = "supporting" pattern = BASE_PATTERN + r"/home/supporting" example = "https://fanbox.cc/home/supporting" def posts(self): url = "https://api.fanbox.cc/post.listSupporting?limit=10" return self._pagination(url) class FanboxRedirectExtractor(Extractor): """Extractor for pixiv redirects to fanbox.cc""" category = "fanbox" subcategory = "redirect" pattern = r"(?:https?://)?(?:www\.)?pixiv\.net/fanbox/creator/(\d+)" example = "https://www.pixiv.net/fanbox/creator/12345" def items(self): url = "https://www.pixiv.net/fanbox/creator/" + self.groups[0] location = self.request_location(url, notfound="user") yield Message.Queue, location, {"_extractor": FanboxCreatorExtractor} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/gallery_dl/extractor/fantia.py0000644000175000017500000001553015001510422020363 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fantia.jp/""" from .common import Extractor, Message from .. 
import text, util class FantiaExtractor(Extractor): """Base class for Fantia extractors""" category = "fantia" root = "https://fantia.jp" directory_fmt = ("{category}", "{fanclub_id}") filename_fmt = "{post_id}_{file_id}.{extension}" archive_fmt = "{post_id}_{file_id}" _warning = True def _init(self): self.headers = { "Accept" : "application/json, text/plain, */*", "X-Requested-With": "XMLHttpRequest", } self._empty_plan = { "id" : 0, "price": 0, "limit": 0, "name" : "", "description": "", "thumb": self.root + "/images/fallback/plan/thumb_default.png", } if self._warning: if not self.cookies_check(("_session_id",)): self.log.warning("no '_session_id' cookie set") FantiaExtractor._warning = False def items(self): for post_id in self.posts(): post = self._get_post_data(post_id) post["num"] = 0 contents = self._get_post_contents(post) post["content_count"] = len(contents) post["content_num"] = 0 for content in contents: files = self._process_content(post, content) yield Message.Directory, post if content["visible_status"] != "visible": self.log.warning( "Unable to download '%s' files from " "%s#post-content-id-%s", content["visible_status"], post["post_url"], content["id"]) for file in files: post.update(file) post["num"] += 1 text.nameext_from_url( post["content_filename"] or file["file_url"], post) yield Message.Url, file["file_url"], post post["content_num"] += 1 def posts(self): """Return post IDs""" def _pagination(self, url): params = {"page": 1} while True: page = self.request(url, params=params).text self._csrf_token(page) post_id = None for post_id in text.extract_iter( page, 'class="link-block" href="/posts/', '"'): yield post_id if not post_id: return params["page"] += 1 def _csrf_token(self, page=None): if not page: page = self.request(self.root + "/").text self.headers["X-CSRF-Token"] = text.extr( page, 'name="csrf-token" content="', '"') def _get_post_data(self, post_id): """Fetch and process post data""" url = self.root+"/api/v1/posts/"+post_id resp = self.request(url, headers=self.headers).json()["post"] return { "post_id": resp["id"], "post_url": self.root + "/posts/" + str(resp["id"]), "post_title": resp["title"], "comment": resp["comment"], "rating": resp["rating"], "posted_at": resp["posted_at"], "date": text.parse_datetime( resp["posted_at"], "%a, %d %b %Y %H:%M:%S %z"), "fanclub_id": resp["fanclub"]["id"], "fanclub_user_id": resp["fanclub"]["user"]["id"], "fanclub_user_name": resp["fanclub"]["user"]["name"], "fanclub_name": resp["fanclub"]["name"], "fanclub_url": self.root+"/fanclubs/"+str(resp["fanclub"]["id"]), "tags": [t["name"] for t in resp["tags"]], "_data": resp, } def _get_post_contents(self, post): contents = post["_data"]["post_contents"] try: url = post["_data"]["thumb"]["original"] except Exception: pass else: contents.insert(0, { "id": "thumb", "title": "thumb", "category": "thumb", "download_uri": url, "visible_status": "visible", "plan": None, }) return contents def _process_content(self, post, content): post["content_category"] = content["category"] post["content_title"] = content["title"] post["content_filename"] = content.get("filename") or "" post["content_id"] = content["id"] post["content_comment"] = content.get("comment") or "" post["content_num"] += 1 post["plan"] = content["plan"] or self._empty_plan files = [] if "post_content_photos" in content: for photo in content["post_content_photos"]: files.append({"file_id" : photo["id"], "file_url": photo["url"]["original"]}) if "download_uri" in content: url = content["download_uri"] if url[0] == "/": 
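                # "download_uri" may be a path relative to the site root;
                # make it absolute ("https://fantia.jp" + path) before
                # adding it to the file list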
url = self.root + url files.append({"file_id" : content["id"], "file_url": url}) if content["category"] == "blog" and "comment" in content: comment_json = util.json_loads(content["comment"]) blog_text = "" for op in comment_json.get("ops") or (): insert = op.get("insert") if isinstance(insert, str): blog_text += insert elif isinstance(insert, dict) and "fantiaImage" in insert: img = insert["fantiaImage"] files.append({"file_id" : img["id"], "file_url": self.root + img["original_url"]}) post["blogpost_text"] = blog_text else: post["blogpost_text"] = "" return files class FantiaCreatorExtractor(FantiaExtractor): """Extractor for a Fantia creator's works""" subcategory = "creator" pattern = r"(?:https?://)?(?:www\.)?fantia\.jp/fanclubs/(\d+)" example = "https://fantia.jp/fanclubs/12345" def __init__(self, match): FantiaExtractor.__init__(self, match) self.creator_id = match.group(1) def posts(self): url = "{}/fanclubs/{}/posts".format(self.root, self.creator_id) return self._pagination(url) class FantiaPostExtractor(FantiaExtractor): """Extractor for media from a single Fantia post""" subcategory = "post" pattern = r"(?:https?://)?(?:www\.)?fantia\.jp/posts/(\d+)" example = "https://fantia.jp/posts/12345" def __init__(self, match): FantiaExtractor.__init__(self, match) self.post_id = match.group(1) def posts(self): self._csrf_token() return (self.post_id,) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/gallery_dl/extractor/fapachi.py0000644000175000017500000000444315001510422020515 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fapachi.com/""" from .common import Extractor, Message from .. import text class FapachiPostExtractor(Extractor): """Extractor for individual posts on fapachi.com""" category = "fapachi" subcategory = "post" root = "https://fapachi.com" directory_fmt = ("{category}", "{user}") filename_fmt = "{user}_{id}.{extension}" archive_fmt = "{user}_{id}" pattern = (r"(?:https?://)?(?:www\.)?fapachi\.com" r"/(?!search/)([^/?#]+)/media/(\d+)") example = "https://fapachi.com/MODEL/media/12345" def __init__(self, match): Extractor.__init__(self, match) self.user, self.id = match.groups() def items(self): data = { "user": self.user, "id" : self.id, } page = self.request("{}/{}/media/{}".format( self.root, self.user, self.id)).text url = self.root + text.extract( page, 'data-src="', '"', page.index('class="media-img'))[0] yield Message.Directory, data yield Message.Url, url, text.nameext_from_url(url, data) class FapachiUserExtractor(Extractor): """Extractor for all posts from a fapachi user""" category = "fapachi" subcategory = "user" root = "https://fapachi.com" pattern = (r"(?:https?://)?(?:www\.)?fapachi\.com" r"/(?!search(?:/|$))([^/?#]+)(?:/page/(\d+))?$") example = "https://fapachi.com/MODEL" def __init__(self, match): Extractor.__init__(self, match) self.user = match.group(1) self.num = text.parse_int(match.group(2), 1) def items(self): data = {"_extractor": FapachiPostExtractor} while True: page = self.request("{}/{}/page/{}".format( self.root, self.user, self.num)).text for post in text.extract_iter(page, 'model-media-prew">', ">"): path = text.extr(post, '
    Next page' not in page: return self.num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1746776695.0 gallery_dl-1.29.7/gallery_dl/extractor/fapello.py0000644000175000017500000001000315007331167020546 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fapello.com/""" from .common import Extractor, Message from .. import text, exception BASE_PATTERN = r"(?:https?://)?(?:www\.)?fapello\.(?:com|su)" class FapelloPostExtractor(Extractor): """Extractor for individual posts on fapello.com""" category = "fapello" subcategory = "post" directory_fmt = ("{category}", "{model}") filename_fmt = "{model}_{id}.{extension}" archive_fmt = "{type}_{model}_{id}" pattern = BASE_PATTERN + r"/(?!search/|popular_videos/)([^/?#]+)/(\d+)" example = "https://fapello.com/MODEL/12345/" def __init__(self, match): Extractor.__init__(self, match) self.root = text.root_from_url(match.group(0)) self.model, self.id = match.groups() def items(self): url = "{}/{}/{}/".format(self.root, self.model, self.id) page = text.extr( self.request(url, allow_redirects=False).text, 'class="uk-align-center"', "
    ", None) if page is None: raise exception.NotFoundError("post") data = { "model": self.model, "id" : text.parse_int(self.id), "type" : "video" if 'type="video' in page else "photo", "thumbnail": text.extr(page, 'poster="', '"'), } url = text.extr(page, 'src="', '"').replace( ".md", "").replace(".th", "") yield Message.Directory, data yield Message.Url, url, text.nameext_from_url(url, data) class FapelloModelExtractor(Extractor): """Extractor for all posts from a fapello model""" category = "fapello" subcategory = "model" pattern = (BASE_PATTERN + r"/(?!top-(?:likes|followers)|popular_videos" r"|videos|trending|search/?$)" r"([^/?#]+)/?$") example = "https://fapello.com/model/" def __init__(self, match): Extractor.__init__(self, match) self.root = text.root_from_url(match.group(0)) self.model = match.group(1) def items(self): num = 1 data = {"_extractor": FapelloPostExtractor} while True: url = "{}/ajax/model/{}/page-{}/".format( self.root, self.model, num) page = self.request(url).text if not page: return url = None for url in text.extract_iter(page, '', ""): yield Message.Queue, text.extr(item, ' 0 and (int(size["width"]) > self.maxsize or int(size["height"]) > self.maxsize): del sizes[index:] break return sizes def photos_search(self, params): """Return a list of photos matching some criteria.""" return self._pagination("photos.search", params.copy()) def photosets_getInfo(self, photoset_id, user_id): """Gets information about a photoset.""" params = {"photoset_id": photoset_id, "user_id": user_id} photoset = self._call("photosets.getInfo", params)["photoset"] return self._clean_info(photoset) def photosets_getList(self, user_id): """Returns the photosets belonging to the specified user.""" params = {"user_id": user_id} return self._pagination_sets("photosets.getList", params) def photosets_getPhotos(self, photoset_id): """Get the list of photos in a set.""" params = {"photoset_id": photoset_id} return self._pagination("photosets.getPhotos", params, "photoset") def urls_lookupGroup(self, groupname): """Returns a group NSID, given the url to a group's page.""" params = {"url": "https://www.flickr.com/groups/" + groupname} group = self._call("urls.lookupGroup", params)["group"] return {"nsid": group["id"], "path_alias": groupname, "groupname": group["groupname"]["_content"]} def urls_lookupUser(self, username): """Returns a user NSID, given the url to a user's photos or profile.""" params = {"url": "https://www.flickr.com/photos/" + username} user = self._call("urls.lookupUser", params)["user"] return { "nsid" : user["id"], "username" : user["username"]["_content"], "path_alias": username, } def video_getStreamInfo(self, video_id, secret=None): """Returns all available video streams""" params = {"photo_id": video_id} if not secret: secret = self._call("photos.getInfo", params)["photo"]["secret"] params["secret"] = secret stream = self._call("video.getStreamInfo", params)["streams"]["stream"] return max(stream, key=lambda s: self.VIDEO_FORMATS.get(s["type"], 0)) def _call(self, method, params): params["method"] = "flickr." 
+ method params["format"] = "json" params["nojsoncallback"] = "1" if self.api_key: params["api_key"] = self.api_key response = self.request(self.API_URL, params=params) try: data = response.json() except ValueError: data = {"code": -1, "message": response.content} if "code" in data: msg = data.get("message") self.log.debug("Server response: %s", data) if data["code"] == 1: raise exception.NotFoundError(self.extractor.subcategory) elif data["code"] == 2: raise exception.AuthorizationError(msg) elif data["code"] == 98: raise exception.AuthenticationError(msg) elif data["code"] == 99: raise exception.AuthorizationError(msg) raise exception.StopExtraction("API request failed: %s", msg) return data def _pagination(self, method, params, key="photos"): extras = ("description,date_upload,tags,views,media," "path_alias,owner_name,") includes = self.extractor.config("metadata") if includes: if isinstance(includes, (list, tuple)): includes = ",".join(includes) elif not isinstance(includes, str): includes = ("license,date_taken,original_format,last_update," "geo,machine_tags,o_dims") extras = extras + includes + "," extras += ",".join("url_" + fmt[0] for fmt in self.formats) params["extras"] = extras params["page"] = 1 while True: data = self._call(method, params)[key] yield from data["photo"] if params["page"] >= data["pages"]: return params["page"] += 1 def _pagination_sets(self, method, params): params["page"] = 1 while True: data = self._call(method, params)["photosets"] yield from data["photoset"] if params["page"] >= data["pages"]: return params["page"] += 1 def _extract_format(self, photo): photo["description"] = photo["description"]["_content"].strip() photo["views"] = text.parse_int(photo["views"]) photo["date"] = text.parse_timestamp(photo["dateupload"]) photo["tags"] = photo["tags"].split() self._extract_metadata(photo) photo["id"] = text.parse_int(photo["id"]) if "owner" not in photo: photo["owner"] = self.extractor.user elif not self.meta_info: photo["owner"] = { "nsid" : photo["owner"], "username" : photo["ownername"], "path_alias": photo["pathalias"], } del photo["pathalias"] del photo["ownername"] if photo["media"] == "video" and self.videos: return self._extract_video(photo) for fmt, fmtname, fmtwidth in self.formats: key = "url_" + fmt if key in photo: photo["width"] = text.parse_int(photo["width_" + fmt]) photo["height"] = text.parse_int(photo["height_" + fmt]) if self.maxsize and (photo["width"] > self.maxsize or photo["height"] > self.maxsize): continue photo["url"] = photo[key] photo["label"] = fmtname # remove excess data keys = [ key for key in photo if key.startswith(("url_", "width_", "height_")) ] for key in keys: del photo[key] break else: self._extract_photo(photo) return photo def _extract_photo(self, photo): size = self.photos_getSizes(photo["id"])[-1] photo["url"] = size["source"] photo["label"] = size["label"] photo["width"] = text.parse_int(size["width"]) photo["height"] = text.parse_int(size["height"]) return photo def _extract_video(self, photo): stream = self.video_getStreamInfo(photo["id"], photo.get("secret")) photo["url"] = stream["_content"] photo["label"] = stream["type"] photo["width"] = photo["height"] = 0 return photo def _extract_metadata(self, photo, info=True): if info and self.meta_info: try: photo.update(self.photos_getInfo(photo["id"])) photo["title"] = photo["title"]["_content"] photo["comments"] = text.parse_int( photo["comments"]["_content"]) photo["description"] = photo["description"]["_content"] photo["tags"] = [t["raw"] for t in 
photo["tags"]["tag"]] photo["views"] = text.parse_int(photo["views"]) photo["id"] = text.parse_int(photo["id"]) except Exception as exc: self.log.warning( "Unable to retrieve 'info' data for %s (%s: %s)", photo["id"], exc.__class__.__name__, exc) if self.meta_exif: try: photo.update(self.photos_getExif(photo["id"])) except Exception as exc: self.log.warning( "Unable to retrieve 'exif' data for %s (%s: %s)", photo["id"], exc.__class__.__name__, exc) if self.meta_contexts: try: photo.update(self.photos_getAllContexts(photo["id"])) except Exception as exc: self.log.warning( "Unable to retrieve 'contexts' data for %s (%s: %s)", photo["id"], exc.__class__.__name__, exc) if "license" in photo: photo["license_name"] = self.LICENSES.get(photo["license"]) @staticmethod def _clean_info(info): info["title"] = info["title"]["_content"] info["description"] = info["description"]["_content"] return info ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1746776695.0 gallery_dl-1.29.7/gallery_dl/extractor/foolfuuka.py0000644000175000017500000002012115007331167021121 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for FoolFuuka 4chan archives""" from .common import BaseExtractor, Message from .. import text import itertools class FoolfuukaExtractor(BaseExtractor): """Base extractor for FoolFuuka based boards/archives""" basecategory = "foolfuuka" filename_fmt = "{timestamp_ms} {filename_media}.{extension}" archive_fmt = "{board[shortname]}_{num}_{timestamp}" external = "default" def __init__(self, match): BaseExtractor.__init__(self, match) if self.category == "b4k": self.remote = self._remote_direct elif self.category == "archivedmoe": self.referer = False def items(self): yield Message.Directory, self.metadata() for post in self.posts(): media = post["media"] if not media: continue url = media["media_link"] if not url and "remote_media_link" in media: url = self.remote(media) if url and url[0] == "/": url = self.root + url post["filename"], _, post["extension"] = \ media["media"].rpartition(".") post["filename_media"] = media["media_filename"].rpartition(".")[0] post["timestamp_ms"] = text.parse_int( media["media_orig"].rpartition(".")[0]) yield Message.Url, url, post def metadata(self): """Return general metadata""" def posts(self): """Return an iterable with all relevant posts""" def remote(self, media): """Resolve a remote media link""" page = self.request(media["remote_media_link"]).text url = text.extr(page, 'http-equiv="Refresh" content="0; url=', '"') if url.endswith(".webm") and \ url.startswith("https://thebarchive.com/"): return url[:-1] return url @staticmethod def _remote_direct(media): return media["remote_media_link"] BASE_PATTERN = FoolfuukaExtractor.update({ "4plebs": { "root": "https://archive.4plebs.org", "pattern": r"(?:archive\.)?4plebs\.org", }, "archivedmoe": { "root": "https://archived.moe", "pattern": r"archived\.moe", }, "archiveofsins": { "root": "https://archiveofsins.com", "pattern": r"(?:www\.)?archiveofsins\.com", }, "b4k": { "root": "https://arch.b4k.dev", "pattern": r"arch\.b4k\.(?:dev|co)", }, "desuarchive": { "root": "https://desuarchive.org", "pattern": r"desuarchive\.org", }, "fireden": { "root": "https://boards.fireden.net", "pattern": r"boards\.fireden\.net", }, "palanq": { "root": "https://archive.palanq.win", 
"pattern": r"archive\.palanq\.win", }, "rbt": { "root": "https://rbt.asia", "pattern": r"(?:rbt\.asia|(?:archive\.)?rebeccablacktech\.com)", }, "thebarchive": { "root": "https://thebarchive.com", "pattern": r"thebarchive\.com", }, }) class FoolfuukaThreadExtractor(FoolfuukaExtractor): """Base extractor for threads on FoolFuuka based boards/archives""" subcategory = "thread" directory_fmt = ("{category}", "{board[shortname]}", "{thread_num} {title|comment[:50]}") pattern = BASE_PATTERN + r"/([^/?#]+)/thread/(\d+)" example = "https://archived.moe/a/thread/12345/" def __init__(self, match): FoolfuukaExtractor.__init__(self, match) self.board = self.groups[-2] self.thread = self.groups[-1] self.data = None def metadata(self): url = self.root + "/_/api/chan/thread/" params = {"board": self.board, "num": self.thread} self.data = self.request(url, params=params).json()[self.thread] return self.data["op"] def posts(self): op = (self.data["op"],) posts = self.data.get("posts") if posts: posts = list(posts.values()) posts.sort(key=lambda p: p["timestamp"]) return itertools.chain(op, posts) return op class FoolfuukaBoardExtractor(FoolfuukaExtractor): """Base extractor for FoolFuuka based boards/archives""" subcategory = "board" pattern = BASE_PATTERN + r"/([^/?#]+)(?:/(?:page/)?(\d*))?$" example = "https://archived.moe/a/" def __init__(self, match): FoolfuukaExtractor.__init__(self, match) self.board = self.groups[-2] self.page = self.groups[-1] def items(self): index_base = "{}/_/api/chan/index/?board={}&page=".format( self.root, self.board) thread_base = "{}/{}/thread/".format(self.root, self.board) page = self.page for pnum in itertools.count(text.parse_int(page, 1)): with self.request(index_base + format(pnum)) as response: try: threads = response.json() except ValueError: threads = None if not threads: return for num, thread in threads.items(): thread["url"] = thread_base + format(num) thread["_extractor"] = FoolfuukaThreadExtractor yield Message.Queue, thread["url"], thread if page: return class FoolfuukaSearchExtractor(FoolfuukaExtractor): """Base extractor for search results on FoolFuuka based boards/archives""" subcategory = "search" directory_fmt = ("{category}", "search", "{search}") pattern = BASE_PATTERN + r"/([^/?#]+)/search((?:/[^/?#]+/[^/?#]+)+)" example = "https://archived.moe/_/search/text/QUERY/" request_interval = (0.5, 1.5) def __init__(self, match): FoolfuukaExtractor.__init__(self, match) self.params = params = {} key = None for arg in self.groups[-1].split("/"): if key: params[key] = text.unescape(arg) key = None else: key = arg board = self.groups[-2] if board != "_": params["boards"] = board def metadata(self): return {"search": self.params.get("text", "")} def posts(self): url = self.root + "/_/api/chan/search/" params = self.params.copy() params["page"] = text.parse_int(params.get("page"), 1) if "filter" not in params: params["filter"] = "text" while True: try: data = self.request(url, params=params).json() except ValueError: return if isinstance(data, dict): if data.get("error"): return posts = data["0"]["posts"] elif isinstance(data, list): posts = data[0]["posts"] else: return yield from posts if len(posts) <= 3: return params["page"] += 1 class FoolfuukaGalleryExtractor(FoolfuukaExtractor): """Base extractor for FoolFuuka galleries""" subcategory = "gallery" directory_fmt = ("{category}", "{board}", "gallery") pattern = BASE_PATTERN + r"/([^/?#]+)/gallery(?:/(\d+))?" 
example = "https://archived.moe/a/gallery" def __init__(self, match): FoolfuukaExtractor.__init__(self, match) board = match.group(match.lastindex) if board.isdecimal(): self.board = match.group(match.lastindex-1) self.pages = (board,) else: self.board = board self.pages = map(format, itertools.count(1)) def metadata(self): return {"board": self.board} def posts(self): base = "{}/_/api/chan/gallery/?board={}&page=".format( self.root, self.board) for page in self.pages: with self.request(base + page) as response: posts = response.json() if not posts: return yield from posts ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1745260818.0 gallery_dl-1.29.7/gallery_dl/extractor/foolslide.py0000644000175000017500000001047115001510422021100 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for FoOlSlide based sites""" from .common import BaseExtractor, Message from .. import text, util class FoolslideExtractor(BaseExtractor): """Base class for FoOlSlide extractors""" basecategory = "foolslide" def __init__(self, match): BaseExtractor.__init__(self, match) self.gallery_url = self.root + match.group(match.lastindex) def request(self, url): return BaseExtractor.request( self, url, encoding="utf-8", method="POST", data={"adult": "true"}) @staticmethod def parse_chapter_url(url, data): info = url.partition("/read/")[2].rstrip("/").split("/") lang = info[1].partition("-")[0] data["lang"] = lang data["language"] = util.code_to_language(lang) data["volume"] = text.parse_int(info[2]) data["chapter"] = text.parse_int(info[3]) data["chapter_minor"] = "." + info[4] if len(info) >= 5 else "" data["title"] = data["chapter_string"].partition(":")[2].strip() return data BASE_PATTERN = FoolslideExtractor.update({ }) class FoolslideChapterExtractor(FoolslideExtractor): """Base class for chapter extractors for FoOlSlide based sites""" subcategory = "chapter" directory_fmt = ("{category}", "{manga}", "{chapter_string}") filename_fmt = ( "{manga}_c{chapter:>03}{chapter_minor:?//}_{page:>03}.{extension}") archive_fmt = "{id}" pattern = BASE_PATTERN + r"(/read/[^/?#]+/[a-z-]+/\d+/\d+(?:/\d+)?)" example = "https://read.powermanga.org/read/MANGA/en/0/123/" def items(self): page = self.request(self.gallery_url).text data = self.metadata(page) imgs = self.images(page) data["count"] = len(imgs) data["chapter_id"] = text.parse_int(imgs[0]["chapter_id"]) yield Message.Directory, data enum = util.enumerate_reversed if self.config( "page-reverse") else enumerate for data["page"], image in enum(imgs, 1): try: url = image["url"] del image["url"] del image["chapter_id"] del image["thumb_url"] except KeyError: pass for key in ("height", "id", "size", "width"): image[key] = text.parse_int(image[key]) data.update(image) text.nameext_from_url(data["filename"], data) yield Message.Url, url, data def metadata(self, page): extr = text.extract_from(page) extr('
    ', '') return self.parse_chapter_url(self.gallery_url, { "manga" : text.unescape(extr('title="', '"')).strip(), "chapter_string": text.unescape(extr('title="', '"')), }) def images(self, page): return util.json_loads(text.extr(page, "var pages = ", ";")) class FoolslideMangaExtractor(FoolslideExtractor): """Base class for manga extractors for FoOlSlide based sites""" subcategory = "manga" categorytransfer = True pattern = BASE_PATTERN + r"(/series/[^/?#]+)" example = "https://read.powermanga.org/series/MANGA/" def items(self): page = self.request(self.gallery_url).text chapters = self.chapters(page) if not self.config("chapter-reverse", False): chapters.reverse() for chapter, data in chapters: data["_extractor"] = FoolslideChapterExtractor yield Message.Queue, chapter, data def chapters(self, page): extr = text.extract_from(page) manga = text.unescape(extr('
    ', '
    ')).strip() author = extr('Author: ', 'Artist: ', '
    ")) path = extr('href="//d', '"') if not path: msg = text.remove_html( extr('System Message', '') or extr('System Message', '
    ') ).partition(" . Continue ")[0] return self.log.warning( "Unable to download post %s (\"%s\")", post_id, msg) pi = text.parse_int rh = text.remove_html data = text.nameext_from_url(path, { "id" : pi(post_id), "url": "https://d" + path, }) if self._new_layout: data["tags"] = text.split_html(extr( 'class="tags-row">', '')) data["scraps"] = (extr(' submissions">', "<") == "Scraps") data["title"] = text.unescape(extr("
    ", "
    ")) data["artist_url"] = extr('title="', '"').strip() data["artist"] = extr(">", "<") data["_description"] = extr( 'class="submission-description user-submitted-links">', ' ') data["views"] = pi(rh(extr('class="views">', ''))) data["favorites"] = pi(rh(extr('class="favorites">', ''))) data["comments"] = pi(rh(extr('class="comments">', ''))) data["rating"] = rh(extr('class="rating">', '')) data["fa_category"] = rh(extr('>Category', '')) data["theme"] = rh(extr('>', '<')) data["species"] = rh(extr('>Species', '')) data["gender"] = rh(extr('>Gender', '')) data["width"] = pi(extr("", "x")) data["height"] = pi(extr("", "p")) data["folders"] = folders = [] for folder in extr( "
    Listed in Folders
    ", "").split(""): folder = rh(folder) if folder: folders.append(folder) else: # old site layout data["scraps"] = ( "/scraps/" in extr('class="minigallery-title', "")) data["title"] = text.unescape(extr("
    ", "
    ")) data["artist_url"] = extr('title="', '"').strip() data["artist"] = extr(">", "<") data["fa_category"] = extr("Category:", "<").strip() data["theme"] = extr("Theme:", "<").strip() data["species"] = extr("Species:", "<").strip() data["gender"] = extr("Gender:", "<").strip() data["favorites"] = pi(extr("Favorites:", "<")) data["comments"] = pi(extr("Comments:", "<")) data["views"] = pi(extr("Views:", "<")) data["width"] = pi(extr("Resolution:", "x")) data["height"] = pi(extr("", "<")) data["tags"] = text.split_html(extr( 'id="keywords">', ''))[::2] data["rating"] = extr('', ' ')
            data[', ' ') data["folders"] = () # folders not present in old layout data["user"] = self.user or data["artist_url"] data["date"] = text.parse_timestamp(data["filename"].partition(".")[0]) data["description"] = self._process_description(data["_description"]) data["thumbnail"] = "https://t.furaffinity.net/{}@600-{}.jpg".format( post_id, path.rsplit("/", 2)[1]) return data @staticmethod def _process_description(description): return text.unescape(text.remove_html(description, "", "")) def _pagination(self, path, folder=None): num = 1 folder = "" if folder is None else "/folder/{}/a".format(folder) while True: url = "{}/{}/{}{}/{}/".format( self.root, path, self.user, folder, num) page = self.request(url).text post_id = None for post_id in text.extract_iter(page, 'id="sid-', '"'): yield post_id if not post_id: return num += 1 def _pagination_favorites(self): path = "/favorites/{}/".format(self.user) while path: page = self.request(self.root + path).text extr = text.extract_from(page) while True: post_id = extr('id="sid-', '"') if not post_id: break self._favorite_id = text.parse_int(extr('data-fav-id="', '"')) yield post_id pos = page.find('type="submit">Next') if pos >= 0: path = text.rextract(page, '
    ", "").strip() title, _, gallery_id = title.rpartition("#") return { "gallery_id" : text.parse_int(gallery_id), "gallery_hash": self.gallery_hash, "title" : text.unescape(title[:-15]), "views" : data.get("hits"), "score" : data.get("rating"), "tags" : (data.get("tags") or "").split(","), } def images(self, page): return [ ("https:" + image["imageUrl"], image) for image in self.data["images"] ] class FuskatorSearchExtractor(Extractor): """Extractor for search results on fuskator.com""" category = "fuskator" subcategory = "search" root = "https://fuskator.com" pattern = r"(?:https?://)?fuskator\.com(/(?:search|page)/.+)" example = "https://fuskator.com/search/TAG/" def __init__(self, match): Extractor.__init__(self, match) self.path = match.group(1) def items(self): url = self.root + self.path data = {"_extractor": FuskatorGalleryExtractor} while True: page = self.request(url).text for path in text.extract_iter( page, 'class="pic_pad">', '>>><') if not pages: return url = self.root + text.rextract(pages, 'href="', '"')[0] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1746776695.0 gallery_dl-1.29.7/gallery_dl/extractor/gelbooru.py0000644000175000017500000002240615007331167020754 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://gelbooru.com/""" from .common import Extractor, Message from . import gelbooru_v02 from .. import text, exception import binascii BASE_PATTERN = r"(?:https?://)?(?:www\.)?gelbooru\.com/(?:index\.php)?\?" class GelbooruBase(): """Base class for gelbooru extractors""" category = "gelbooru" basecategory = "booru" root = "https://gelbooru.com" offset = 0 def _api_request(self, params, key="post", log=False): if "s" not in params: params["s"] = "post" params["api_key"] = self.api_key params["user_id"] = self.user_id url = self.root + "/index.php?page=dapi&q=index&json=1" data = self.request(url, params=params).json() if not key: return data try: posts = data[key] except KeyError: if log: self.log.error("Incomplete API response (missing '%s')", key) self.log.debug("%s", data) return [] if not isinstance(posts, list): return (posts,) return posts def _pagination(self, params): params["pid"] = self.page_start params["limit"] = self.per_page limit = self.per_page // 2 pid = False if "tags" in params: tags = params["tags"].split() op = "<" id = False for tag in tags: if tag.startswith("sort:"): if tag == "sort:id:asc": op = ">" elif tag == "sort:id" or tag.startswith("sort:id:"): op = "<" else: pid = True elif tag.startswith("id:"): id = True if not pid: if id: tag = "id:" + op tags = [t for t in tags if not t.startswith(tag)] tags = "{} id:{}".format(" ".join(tags), op) while True: posts = self._api_request(params) yield from posts if len(posts) < limit: return if pid: params["pid"] += 1 else: if "pid" in params: del params["pid"] params["tags"] = tags + str(posts[-1]["id"]) def _pagination_html(self, params): url = self.root + "/index.php" params["pid"] = self.offset data = {} while True: num_ids = 0 page = self.request(url, params=params).text for data["id"] in text.extract_iter(page, '" id="p', '"'): num_ids += 1 yield from self._api_request(data) if num_ids < self.per_page: return params["pid"] += self.per_page def _file_url(self, post): url = post["file_url"] if url.endswith((".webm", ".mp4")): 
post["_fallback"] = (url,) md5 = post["md5"] root = text.root_from_url(post["preview_url"]) path = "/images/{}/{}/{}.webm".format(md5[0:2], md5[2:4], md5) url = root + path return url def _notes(self, post, page): notes_data = text.extr(page, '
        notes_data = text.extr(page, '<section id="notes"', '</section>')
        if not notes_data:
            return

        post["notes"] = notes = []
        extr = text.extract
        for note in text.extract_iter(notes_data, '<article', '</article>'):
            notes.append({
                "width" : int(extr(note, 'data-width="', '"')[0]),
                "height": int(extr(note, 'data-height="', '"')[0]),
                "x"     : int(extr(note, 'data-x="', '"')[0]),
                "y"     : int(extr(note, 'data-y="', '"')[0]),
                "body"  : extr(note, 'data-body="', '"')[0],
            })

    def _skip_offset(self, num):
        self.offset += num
        return num


class GelbooruTagExtractor(GelbooruBase,
                           gelbooru_v02.GelbooruV02TagExtractor):
    """Extractor for images from gelbooru.com based on search-tags"""
    pattern = BASE_PATTERN + r"page=post&s=list&tags=([^&#]*)"
    example = "https://gelbooru.com/index.php?page=post&s=list&tags=TAG"


class GelbooruPoolExtractor(GelbooruBase,
                            gelbooru_v02.GelbooruV02PoolExtractor):
    """Extractor for gelbooru pools"""
    per_page = 45
    pattern = BASE_PATTERN + r"page=pool&s=show&id=(\d+)"
    example = "https://gelbooru.com/index.php?page=pool&s=show&id=12345"

    skip = GelbooruBase._skip_offset

    def metadata(self):
        url = self.root + "/index.php"
        self._params = {
            "page": "pool",
            "s"   : "show",
            "id"  : self.pool_id,
        }
        page = self.request(url, params=self._params).text

        name, pos = text.extract(page, "<h3>Now Viewing: ", "</h3>")
    ") if not name: raise exception.NotFoundError("pool") return { "pool": text.parse_int(self.pool_id), "pool_name": text.unescape(name), } def posts(self): return self._pagination_html(self._params) class GelbooruFavoriteExtractor(GelbooruBase, gelbooru_v02.GelbooruV02FavoriteExtractor): """Extractor for gelbooru favorites""" per_page = 100 pattern = BASE_PATTERN + r"page=favorites&s=view&id=(\d+)" example = "https://gelbooru.com/index.php?page=favorites&s=view&id=12345" skip = GelbooruBase._skip_offset def posts(self): # get number of favorites params = { "s" : "favorite", "id" : self.favorite_id, "limit": "2", } data = self._api_request(params, None, True) count = data["@attributes"]["count"] self.log.debug("API reports %s favorite entries", count) favs = data["favorite"] try: order = 1 if favs[0]["id"] < favs[1]["id"] else -1 except LookupError as exc: self.log.debug( "Error when determining API favorite order (%s: %s)", exc.__class__.__name__, exc) order = -1 else: self.log.debug("API yields favorites in %sscending order", "a" if order > 0 else "de") order_favs = self.config("order-posts") if order_favs and order_favs[0] in ("r", "a"): self.log.debug("Returning them in reverse") order = -order if order < 0: return self._pagination(params, count) return self._pagination_reverse(params, count) def _pagination(self, params, count): if self.offset: pnum, skip = divmod(self.offset, self.per_page) else: pnum = skip = 0 params["pid"] = pnum params["limit"] = self.per_page while True: favs = self._api_request(params, "favorite") if not favs: return if skip: favs = favs[skip:] skip = 0 for fav in favs: for post in self._api_request({"id": fav["favorite"]}): post["date_favorited"] = text.parse_timestamp(fav["added"]) yield post params["pid"] += 1 def _pagination_reverse(self, params, count): pnum, last = divmod(count-1, self.per_page) if self.offset > last: # page number change self.offset -= last diff, self.offset = divmod(self.offset-1, self.per_page) pnum -= diff + 1 skip = self.offset params["pid"] = pnum params["limit"] = self.per_page while True: favs = self._api_request(params, "favorite") favs.reverse() if skip: favs = favs[skip:] skip = 0 for fav in favs: for post in self._api_request({"id": fav["favorite"]}): post["date_favorited"] = text.parse_timestamp(fav["added"]) yield post params["pid"] -= 1 if params["pid"] < 0: return class GelbooruPostExtractor(GelbooruBase, gelbooru_v02.GelbooruV02PostExtractor): """Extractor for single images from gelbooru.com""" pattern = (BASE_PATTERN + r"(?=(?:[^#]+&)?page=post(?:&|#|$))" r"(?=(?:[^#]+&)?s=view(?:&|#|$))" r"(?:[^#]+&)?id=(\d+)") example = "https://gelbooru.com/index.php?page=post&s=view&id=12345" class GelbooruRedirectExtractor(GelbooruBase, Extractor): subcategory = "redirect" pattern = (r"(?:https?://)?(?:www\.)?gelbooru\.com" r"/redirect\.php\?s=([^&#]+)") example = "https://gelbooru.com/redirect.php?s=BASE64" def __init__(self, match): Extractor.__init__(self, match) self.url_base64 = match.group(1) def items(self): url = text.ensure_http_scheme(binascii.a2b_base64( self.url_base64).decode()) data = {"_extractor": GelbooruPostExtractor} yield Message.Queue, url, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1743510441.0 gallery_dl-1.29.7/gallery_dl/extractor/gelbooru_v01.py0000644000175000017500000001071214772755651021457 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of 
# published by the Free Software Foundation.

"""Extractors for Gelbooru Beta 0.1.11 sites"""

from . import booru
from .. import text


class GelbooruV01Extractor(booru.BooruExtractor):
    basecategory = "gelbooru_v01"
    per_page = 20

    def _parse_post(self, post_id):
        url = "{}/index.php?page=post&s=view&id={}".format(
            self.root, post_id)
        extr = text.extract_from(self.request(url).text)

        post = {
            "id"        : post_id,
            "created_at": extr('Posted: ', ' <'),
            "uploader"  : extr('By: ', ' <'),
            "width"     : extr('Size: ', 'x'),
            "height"    : extr('', ' <'),
            "source"    : extr('Source: ', ' <'),
            "rating"    : (extr('Rating: ', '<') or "?")[0].lower(),
            "score"     : extr('Score: ', ' <'),
            "file_url"  : extr('<img alt="img" src="', '"'),
            "tags"      : text.unescape(extr(
                'id="tags" name="tags" cols="40" rows="5">', '<')),
        }

        post["md5"] = post["file_url"].rpartition("/")[2].partition(".")[0]
        post["date"] = text.parse_datetime(
            post["created_at"], "%Y-%m-%d %H:%M:%S")

        return post

    def skip(self, num):
        self.page_start += num
        return num

    def _pagination(self, url, begin, end):
        pid = self.page_start

        while True:
            page = self.request(url + str(pid)).text

            cnt = 0
            for post_id in text.extract_iter(page, begin, end):
                yield self._parse_post(post_id)
                cnt += 1

            if cnt < self.per_page:
                return
            pid += self.per_page


BASE_PATTERN = GelbooruV01Extractor.update({
    "thecollection": {
        "root": "https://the-collection.booru.org",
        "pattern": r"the-collection\.booru\.org",
    },
    "illusioncardsbooru": {
        "root": "https://illusioncards.booru.org",
        "pattern": r"illusioncards\.booru\.org",
    },
    "allgirlbooru": {
        "root": "https://allgirl.booru.org",
        "pattern": r"allgirl\.booru\.org",
    },
    "drawfriends": {
        "root": "https://drawfriends.booru.org",
        "pattern": r"drawfriends\.booru\.org",
    },
    "vidyart2": {
        "root": "https://vidyart2.booru.org",
        "pattern": r"vidyart2\.booru\.org",
    },
})


class GelbooruV01TagExtractor(GelbooruV01Extractor):
    subcategory = "tag"
    directory_fmt = ("{category}", "{search_tags}")
    archive_fmt = "t_{search_tags}_{id}"
    pattern = BASE_PATTERN + r"/index\.php\?page=post&s=list&tags=([^&#]+)"
    example = "https://allgirl.booru.org/index.php?page=post&s=list&tags=TAG"

    def __init__(self, match):
        GelbooruV01Extractor.__init__(self, match)
        self.tags = match.group(match.lastindex)

    def metadata(self):
        return {"search_tags": text.unquote(self.tags.replace("+", " "))}

    def posts(self):
        url = "{}/index.php?page=post&s=list&tags={}&pid=".format(
            self.root, self.tags)
        return self._pagination(url, 'class="thumb">
            = total:
                return
            if not num:
                self.log.debug("Empty response - Retrying")
                continue

            params["pid"] += 1

    def _pagination_html(self, params):
        url = self.root + "/index.php"
        params["pid"] = self.page_start * self.per_page

        data = {}
        find_ids = re.compile(r"\sid=\"p(\d+)").findall
        while True:
            page = self.request(url, params=params).text
            pids = find_ids(page)

            for data["id"] in pids:
                for post in self._api_request(data):
                    yield post.attrib

            if len(pids) < self.per_page:
                return
            params["pid"] += self.per_page

    @staticmethod
    def _prepare(post):
        post["tags"] = post["tags"].strip()
        post["date"] = text.parse_datetime(
            post["created_at"], "%a %b %d %H:%M:%S %z %Y")

    def _html(self, post):
        return self.request("{}/index.php?page=post&s=view&id={}".format(
            self.root, post["id"])).text

    def _tags(self, post, page):
        tag_container = (text.extr(page, '