././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1722197851.6217132 sacad-2.8.0/0000755000175000017500000000000014651523534012166 5ustar00maximemaxime././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1616451900.0 sacad-2.8.0/LICENSE0000664000175000017500000004052514026214474013177 0ustar00maximemaximeMozilla Public License Version 2.0 ================================== 1. Definitions -------------- 1.1. "Contributor" means each individual or legal entity that creates, contributes to the creation of, or owns Covered Software. 1.2. "Contributor Version" means the combination of the Contributions of others (if any) used by a Contributor and that particular Contributor's Contribution. 1.3. "Contribution" means Covered Software of a particular Contributor. 1.4. "Covered Software" means Source Code Form to which the initial Contributor has attached the notice in Exhibit A, the Executable Form of such Source Code Form, and Modifications of such Source Code Form, in each case including portions thereof. 1.5. "Incompatible With Secondary Licenses" means (a) that the initial Contributor has attached the notice described in Exhibit B to the Covered Software; or (b) that the Covered Software was made available under the terms of version 1.1 or earlier of the License, but not also under the terms of a Secondary License. 1.6. "Executable Form" means any form of the work other than Source Code Form. 1.7. "Larger Work" means a work that combines Covered Software with other material, in a separate file or files, that is not Covered Software. 1.8. "License" means this document. 1.9. "Licensable" means having the right to grant, to the maximum extent possible, whether at the time of the initial grant or subsequently, any and all of the rights conveyed by this License. 1.10. "Modifications" means any of the following: (a) any file in Source Code Form that results from an addition to, deletion from, or modification of the contents of Covered Software; or (b) any new file in Source Code Form that contains any Covered Software. 1.11. "Patent Claims" of a Contributor means any patent claim(s), including without limitation, method, process, and apparatus claims, in any patent Licensable by such Contributor that would be infringed, but for the grant of the License, by the making, using, selling, offering for sale, having made, import, or transfer of either its Contributions or its Contributor Version. 1.12. "Secondary License" means either the GNU General Public License, Version 2.0, the GNU Lesser General Public License, Version 2.1, the GNU Affero General Public License, Version 3.0, or any later versions of those licenses. 1.13. "Source Code Form" means the form of the work preferred for making modifications. 1.14. "You" (or "Your") means an individual or a legal entity exercising rights under this License. For legal entities, "You" includes any entity that controls, is controlled by, or is under common control with You. For purposes of this definition, "control" means (a) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (b) ownership of more than fifty percent (50%) of the outstanding shares or beneficial ownership of such entity. 2. License Grants and Conditions -------------------------------- 2.1. 
Grants Each Contributor hereby grants You a world-wide, royalty-free, non-exclusive license: (a) under intellectual property rights (other than patent or trademark) Licensable by such Contributor to use, reproduce, make available, modify, display, perform, distribute, and otherwise exploit its Contributions, either on an unmodified basis, with Modifications, or as part of a Larger Work; and (b) under Patent Claims of such Contributor to make, use, sell, offer for sale, have made, import, and otherwise transfer either its Contributions or its Contributor Version. 2.2. Effective Date The licenses granted in Section 2.1 with respect to any Contribution become effective for each Contribution on the date the Contributor first distributes such Contribution. 2.3. Limitations on Grant Scope The licenses granted in this Section 2 are the only rights granted under this License. No additional rights or licenses will be implied from the distribution or licensing of Covered Software under this License. Notwithstanding Section 2.1(b) above, no patent license is granted by a Contributor: (a) for any code that a Contributor has removed from Covered Software; or (b) for infringements caused by: (i) Your and any other third party's modifications of Covered Software, or (ii) the combination of its Contributions with other software (except as part of its Contributor Version); or (c) under Patent Claims infringed by Covered Software in the absence of its Contributions. This License does not grant any rights in the trademarks, service marks, or logos of any Contributor (except as may be necessary to comply with the notice requirements in Section 3.4). 2.4. Subsequent Licenses No Contributor makes additional grants as a result of Your choice to distribute the Covered Software under a subsequent version of this License (see Section 10.2) or under the terms of a Secondary License (if permitted under the terms of Section 3.3). 2.5. Representation Each Contributor represents that the Contributor believes its Contributions are its original creation(s) or it has sufficient rights to grant the rights to its Contributions conveyed by this License. 2.6. Fair Use This License is not intended to limit any rights You have under applicable copyright doctrines of fair use, fair dealing, or other equivalents. 2.7. Conditions Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted in Section 2.1. 3. Responsibilities ------------------- 3.1. Distribution of Source Form All distribution of Covered Software in Source Code Form, including any Modifications that You create or to which You contribute, must be under the terms of this License. You must inform recipients that the Source Code Form of the Covered Software is governed by the terms of this License, and how they can obtain a copy of this License. You may not attempt to alter or restrict the recipients' rights in the Source Code Form. 3.2. 
Distribution of Executable Form If You distribute Covered Software in Executable Form then: (a) such Covered Software must also be made available in Source Code Form, as described in Section 3.1, and You must inform recipients of the Executable Form how they can obtain a copy of such Source Code Form by reasonable means in a timely manner, at a charge no more than the cost of distribution to the recipient; and (b) You may distribute such Executable Form under the terms of this License, or sublicense it under different terms, provided that the license for the Executable Form does not attempt to limit or alter the recipients' rights in the Source Code Form under this License. 3.3. Distribution of a Larger Work You may create and distribute a Larger Work under terms of Your choice, provided that You also comply with the requirements of this License for the Covered Software. If the Larger Work is a combination of Covered Software with a work governed by one or more Secondary Licenses, and the Covered Software is not Incompatible With Secondary Licenses, this License permits You to additionally distribute such Covered Software under the terms of such Secondary License(s), so that the recipient of the Larger Work may, at their option, further distribute the Covered Software under the terms of either this License or such Secondary License(s). 3.4. Notices You may not remove or alter the substance of any license notices (including copyright notices, patent notices, disclaimers of warranty, or limitations of liability) contained within the Source Code Form of the Covered Software, except that You may alter any license notices to the extent required to remedy known factual inaccuracies. 3.5. Application of Additional Terms You may choose to offer, and to charge a fee for, warranty, support, indemnity or liability obligations to one or more recipients of Covered Software. However, You may do so only on Your own behalf, and not on behalf of any Contributor. You must make it absolutely clear that any such warranty, support, indemnity, or liability obligation is offered by You alone, and You hereby agree to indemnify every Contributor for any liability incurred by such Contributor as a result of warranty, support, indemnity or liability terms You offer. You may include additional disclaimers of warranty and limitations of liability specific to any jurisdiction. 4. Inability to Comply Due to Statute or Regulation --------------------------------------------------- If it is impossible for You to comply with any of the terms of this License with respect to some or all of the Covered Software due to statute, judicial order, or regulation then You must: (a) comply with the terms of this License to the maximum extent possible; and (b) describe the limitations and the code they affect. Such description must be placed in a text file included with all distributions of the Covered Software under this License. Except to the extent prohibited by statute or regulation, such description must be sufficiently detailed for a recipient of ordinary skill to be able to understand it. 5. Termination -------------- 5.1. The rights granted under this License will terminate automatically if You fail to comply with any of its terms. 
However, if You become compliant, then the rights granted under this License from a particular Contributor are reinstated (a) provisionally, unless and until such Contributor explicitly and finally terminates Your grants, and (b) on an ongoing basis, if such Contributor fails to notify You of the non-compliance by some reasonable means prior to 60 days after You have come back into compliance. Moreover, Your grants from a particular Contributor are reinstated on an ongoing basis if such Contributor notifies You of the non-compliance by some reasonable means, this is the first time You have received notice of non-compliance with this License from such Contributor, and You become compliant prior to 30 days after Your receipt of the notice. 5.2. If You initiate litigation against any entity by asserting a patent infringement claim (excluding declaratory judgment actions, counter-claims, and cross-claims) alleging that a Contributor Version directly or indirectly infringes any patent, then the rights granted to You by any and all Contributors for the Covered Software under Section 2.1 of this License shall terminate. 5.3. In the event of termination under Sections 5.1 or 5.2 above, all end user license agreements (excluding distributors and resellers) which have been validly granted by You or Your distributors under this License prior to termination shall survive termination. ************************************************************************ * * * 6. Disclaimer of Warranty * * ------------------------- * * * * Covered Software is provided under this License on an "as is" * * basis, without warranty of any kind, either expressed, implied, or * * statutory, including, without limitation, warranties that the * * Covered Software is free of defects, merchantable, fit for a * * particular purpose or non-infringing. The entire risk as to the * * quality and performance of the Covered Software is with You. * * Should any Covered Software prove defective in any respect, You * * (not any Contributor) assume the cost of any necessary servicing, * * repair, or correction. This disclaimer of warranty constitutes an * * essential part of this License. No use of any Covered Software is * * authorized under this License except under this disclaimer. * * * ************************************************************************ ************************************************************************ * * * 7. Limitation of Liability * * -------------------------- * * * * Under no circumstances and under no legal theory, whether tort * * (including negligence), contract, or otherwise, shall any * * Contributor, or anyone who distributes Covered Software as * * permitted above, be liable to You for any direct, indirect, * * special, incidental, or consequential damages of any character * * including, without limitation, damages for lost profits, loss of * * goodwill, work stoppage, computer failure or malfunction, or any * * and all other commercial damages or losses, even if such party * * shall have been informed of the possibility of such damages. This * * limitation of liability shall not apply to liability for death or * * personal injury resulting from such party's negligence to the * * extent applicable law prohibits such limitation. Some * * jurisdictions do not allow the exclusion or limitation of * * incidental or consequential damages, so this exclusion and * * limitation may not apply to You. * * * ************************************************************************ 8. 
Litigation ------------- Any litigation relating to this License may be brought only in the courts of a jurisdiction where the defendant maintains its principal place of business and such litigation shall be governed by laws of that jurisdiction, without reference to its conflict-of-law provisions. Nothing in this Section shall prevent a party's ability to bring cross-claims or counter-claims. 9. Miscellaneous ---------------- This License represents the complete agreement concerning the subject matter hereof. If any provision of this License is held to be unenforceable, such provision shall be reformed only to the extent necessary to make it enforceable. Any law or regulation which provides that the language of a contract shall be construed against the drafter shall not be used to construe this License against a Contributor. 10. Versions of the License --------------------------- 10.1. New Versions Mozilla Foundation is the license steward. Except as provided in Section 10.3, no one other than the license steward has the right to modify or publish new versions of this License. Each version will be given a distinguishing version number. 10.2. Effect of New Versions You may distribute the Covered Software under the terms of the version of the License under which You originally received the Covered Software, or under the terms of any subsequent version published by the license steward. 10.3. Modified Versions If you create software not governed by this License, and you want to create a new license for such software, you may create and use a modified version of this License if you rename the license and remove any references to the name of the license steward (except to note that such modified license differs from this License). 10.4. Distributing Source Code Form that is Incompatible With Secondary Licenses If You choose to distribute Source Code Form that is Incompatible With Secondary Licenses under the terms of this version of the License, the notice described in Exhibit B of this License must be attached. Exhibit A - Source Code Form License Notice ------------------------------------------- This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/. If it is not possible or desirable to put the notice in a particular file, then You may include the notice in a location (such as a LICENSE file in a relevant directory) where a recipient would be likely to look for such a notice. You may add additional accurate notices of copyright ownership. Exhibit B - "Incompatible With Secondary Licenses" Notice --------------------------------------------------------- This Source Code Form is "Incompatible With Secondary Licenses", as defined by the Mozilla Public License, v. 2.0. 
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1564081451.0 sacad-2.8.0/MANIFEST.in0000644000175000017500000000010113516376453013721 0ustar00maximemaximeinclude LICENSE README.md requirements.txt test-requirements.txt ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1722197851.6217132 sacad-2.8.0/PKG-INFO0000644000175000017500000001532614651523534013272 0ustar00maximemaximeMetadata-Version: 2.1 Name: sacad Version: 2.8.0 Summary: Search and download music album covers Home-page: https://github.com/desbma/sacad Download-URL: https://github.com/desbma/sacad/archive/2.8.0.tar.gz Author: desbma Keywords: download,album,cover,art,albumart,music Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Console Classifier: Intended Audience :: End Users/Desktop Classifier: License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0) Classifier: Natural Language :: English Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3 :: Only Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: 3.9 Classifier: Programming Language :: Python :: 3.10 Classifier: Programming Language :: Python :: 3.11 Classifier: Programming Language :: Python :: 3.12 Classifier: Topic :: Internet :: WWW/HTTP Classifier: Topic :: Multimedia :: Graphics Classifier: Topic :: Utilities Description-Content-Type: text/markdown License-File: LICENSE Requires-Dist: aiohttp>=3.6 Requires-Dist: appdirs>=1.4.0 Requires-Dist: bitarray>=2.0.0 Requires-Dist: cssselect>=1.1.0 Requires-Dist: lxml>=4.0.0 Requires-Dist: mutagen>=1.31 Requires-Dist: pillow>=8.0.0 Requires-Dist: tqdm>=4.28.1 Requires-Dist: unidecode>=1.1.1 Requires-Dist: web_cache>=1.1.0 # SACAD ## Smart Automatic Cover Art Downloader [![PyPI version](https://img.shields.io/pypi/v/sacad.svg?style=flat)](https://pypi.python.org/pypi/sacad/) [![AUR version](https://img.shields.io/aur/version/sacad.svg?style=flat)](https://aur.archlinux.org/packages/sacad/) [![CI status](https://img.shields.io/github/actions/workflow/status/desbma/sacad/ci.yml)](https://github.com/desbma/sacad/actions) [![Supported Python versions](https://img.shields.io/pypi/pyversions/sacad.svg?style=flat)](https://pypi.python.org/pypi/sacad/) [![License](https://img.shields.io/github/license/desbma/sacad.svg?style=flat)](https://github.com/desbma/sacad/blob/master/LICENSE) SACAD is a multi platform command line tool to download album covers without manual intervention, ideal for integration in scripts, audio players, etc. SACAD also provides a second command line tool, `sacad_r`, to scan a music library, read metadata from audio tags, and download missing covers automatically, optionally embedding the image into audio audio files. ## Features - Can target specific image size, and find results for high resolution covers - Support JPEG and PNG formats - Customizable output: save image along with the audio files / in a different directory named by artist/album / embed cover in audio files... 
- Currently support the following cover sources: - ~~Amazon CD (.com, .ca, .cn, .fr, .de, .co.jp and .co.uk variants) & Amazon digital music~~ (removed, too unreliable) - ~~CoverLib~~ (site is dead) - Deezer - Discogs - ~~Google Images~~ (removed, too unreliable) - Last.fm - Itunes - Smart sorting algorithm to select THE best cover for a given query, using several factors: source reliability, image format, image size, image similarity with reference cover, etc. - Automatically crunch images with optipng, oxipng or jpegoptim (can save 30% of filesize without any loss of quality, great for portable players) - Cache search results locally for faster future search - Do everything to avoid getting blocked by the sources: hide user-agent and automatically take care of rate limiting - Automatically convert/resize image if needed - Multiplatform (Windows/Mac/Linux) SACAD is designed to be robust and be executed in batch of thousands of queries: - HTML parsing is done without regex but with the LXML library, which is faster, and more robust to page changes - When the size of an image reported by a source is not reliable (ie. Google Images), automatically download the first KB of the file to get its real size from the file header - Process several queries simultaneously (using [asyncio](https://docs.python.org/3/library/asyncio.html)), to speed up processing - Automatically reuse TCP connections (HTTP Keep-Alive), for better network performance - Automatically retry failed HTTP requests - Music library scan supports all common audio formats (MP3, AAC, Vorbis, FLAC..) - Cover sources page or API changes are quickly detected, thanks to high test coverage, and SACAD is quickly updated accordingly ## Installation SACAD requires [Python](https://www.python.org/downloads/) >= 3.8. ### Standalone Windows executable Windows users can download a [standalone binary](https://github.com/desbma/sacad/releases/latest) which does not require Python. ### Arch Linux Arch Linux users can install the [sacad](https://aur.archlinux.org/packages/sacad/) AUR package. ### From PyPI (with PIP) 1. If you don't already have it, [install pip](https://pip.pypa.io/en/stable/installing/) for Python 3 2. Install SACAD: `pip3 install sacad` ### From source 1. If you don't already have it, [install setuptools](https://pypi.python.org/pypi/setuptools#installation-instructions) for Python 3 2. Clone this repository: `git clone https://github.com/desbma/sacad` 3. Install SACAD: `python3 setup.py install` #### Optional Additionally, if you want to benefit from image crunching (lossless recompression to save additional space): - Install [oxipng](https://github.com/shssoichiro/oxipng) or [optipng](http://optipng.sourceforge.net/) - Install [jpegoptim](http://freecode.com/projects/jpegoptim) On Ubuntu and other Debian derivatives, you can install them with `sudo apt-get install optipng jpegoptim`. Note that depending of the speed of your CPU, crunching may significantly slow down processing as it is very CPU intensive (especially with optipng). ## Command line usage Two tools are provided: `sacad` to search and download one cover, and `sacad_r` to scan a music library and download all missing covers. Run `sacad -h` / `sacad_r -h` to get full command line reference. ### Examples To download the cover of _Master of Puppets_ from _Metallica_, to the file `AlbumArt.jpg`, targetting ~ 600x600 pixel resolution: `sacad "metallica" "master of puppets" 600 AlbumArt.jpg`. 
To download covers for your library with the same parameters as previous example: `sacad_r library_directory 600 AlbumArt.jpg`. ## Limitations - Only supports front covers ## Adding cover sources Adding a new cover source is very easy if you are a Python developer, you need to inherit the `CoverSource` class and implement the following methods: - `getSearchUrl(self, album, artist)` - `parseResults(self, api_data)` - `updateHttpHeaders(self, headers)` (optional) See comments in the code for more information. ## License [Mozilla Public License Version 2.0](https://www.mozilla.org/MPL/2.0/) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722194536.0 sacad-2.8.0/README.md0000644000175000017500000001244714651515150013450 0ustar00maximemaxime# SACAD ## Smart Automatic Cover Art Downloader [![PyPI version](https://img.shields.io/pypi/v/sacad.svg?style=flat)](https://pypi.python.org/pypi/sacad/) [![AUR version](https://img.shields.io/aur/version/sacad.svg?style=flat)](https://aur.archlinux.org/packages/sacad/) [![CI status](https://img.shields.io/github/actions/workflow/status/desbma/sacad/ci.yml)](https://github.com/desbma/sacad/actions) [![Supported Python versions](https://img.shields.io/pypi/pyversions/sacad.svg?style=flat)](https://pypi.python.org/pypi/sacad/) [![License](https://img.shields.io/github/license/desbma/sacad.svg?style=flat)](https://github.com/desbma/sacad/blob/master/LICENSE) SACAD is a multi platform command line tool to download album covers without manual intervention, ideal for integration in scripts, audio players, etc. SACAD also provides a second command line tool, `sacad_r`, to scan a music library, read metadata from audio tags, and download missing covers automatically, optionally embedding the image into audio audio files. ## Features - Can target specific image size, and find results for high resolution covers - Support JPEG and PNG formats - Customizable output: save image along with the audio files / in a different directory named by artist/album / embed cover in audio files... - Currently support the following cover sources: - ~~Amazon CD (.com, .ca, .cn, .fr, .de, .co.jp and .co.uk variants) & Amazon digital music~~ (removed, too unreliable) - ~~CoverLib~~ (site is dead) - Deezer - Discogs - ~~Google Images~~ (removed, too unreliable) - Last.fm - Itunes - Smart sorting algorithm to select THE best cover for a given query, using several factors: source reliability, image format, image size, image similarity with reference cover, etc. - Automatically crunch images with optipng, oxipng or jpegoptim (can save 30% of filesize without any loss of quality, great for portable players) - Cache search results locally for faster future search - Do everything to avoid getting blocked by the sources: hide user-agent and automatically take care of rate limiting - Automatically convert/resize image if needed - Multiplatform (Windows/Mac/Linux) SACAD is designed to be robust and be executed in batch of thousands of queries: - HTML parsing is done without regex but with the LXML library, which is faster, and more robust to page changes - When the size of an image reported by a source is not reliable (ie. 
Google Images), automatically download the first KB of the file to get its real size from the file header - Process several queries simultaneously (using [asyncio](https://docs.python.org/3/library/asyncio.html)), to speed up processing - Automatically reuse TCP connections (HTTP Keep-Alive), for better network performance - Automatically retry failed HTTP requests - Music library scan supports all common audio formats (MP3, AAC, Vorbis, FLAC..) - Cover sources page or API changes are quickly detected, thanks to high test coverage, and SACAD is quickly updated accordingly ## Installation SACAD requires [Python](https://www.python.org/downloads/) >= 3.8. ### Standalone Windows executable Windows users can download a [standalone binary](https://github.com/desbma/sacad/releases/latest) which does not require Python. ### Arch Linux Arch Linux users can install the [sacad](https://aur.archlinux.org/packages/sacad/) AUR package. ### From PyPI (with PIP) 1. If you don't already have it, [install pip](https://pip.pypa.io/en/stable/installing/) for Python 3 2. Install SACAD: `pip3 install sacad` ### From source 1. If you don't already have it, [install setuptools](https://pypi.python.org/pypi/setuptools#installation-instructions) for Python 3 2. Clone this repository: `git clone https://github.com/desbma/sacad` 3. Install SACAD: `python3 setup.py install` #### Optional Additionally, if you want to benefit from image crunching (lossless recompression to save additional space): - Install [oxipng](https://github.com/shssoichiro/oxipng) or [optipng](http://optipng.sourceforge.net/) - Install [jpegoptim](http://freecode.com/projects/jpegoptim) On Ubuntu and other Debian derivatives, you can install them with `sudo apt-get install optipng jpegoptim`. Note that depending of the speed of your CPU, crunching may significantly slow down processing as it is very CPU intensive (especially with optipng). ## Command line usage Two tools are provided: `sacad` to search and download one cover, and `sacad_r` to scan a music library and download all missing covers. Run `sacad -h` / `sacad_r -h` to get full command line reference. ### Examples To download the cover of _Master of Puppets_ from _Metallica_, to the file `AlbumArt.jpg`, targetting ~ 600x600 pixel resolution: `sacad "metallica" "master of puppets" 600 AlbumArt.jpg`. To download covers for your library with the same parameters as previous example: `sacad_r library_directory 600 AlbumArt.jpg`. ## Limitations - Only supports front covers ## Adding cover sources Adding a new cover source is very easy if you are a Python developer, you need to inherit the `CoverSource` class and implement the following methods: - `getSearchUrl(self, album, artist)` - `parseResults(self, api_data)` - `updateHttpHeaders(self, headers)` (optional) See comments in the code for more information. 
## License [Mozilla Public License Version 2.0](https://www.mozilla.org/MPL/2.0/) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1698670037.0 sacad-2.8.0/pyproject.toml0000644000175000017500000000003614517722725015105 0ustar00maximemaxime[tool.ruff] line-length = 120 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1669404060.0 sacad-2.8.0/requirements.txt0000644000175000017500000000022414340212634015437 0ustar00maximemaximeaiohttp>=3.6 appdirs>=1.4.0 bitarray>=2.0.0 cssselect>=1.1.0 lxml>=4.0.0 mutagen>=1.31 pillow>=8.0.0 tqdm>=4.28.1 unidecode>=1.1.1 web_cache>=1.1.0 ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1722197851.6217132 sacad-2.8.0/sacad/0000755000175000017500000000000014651523534013241 5ustar00maximemaxime././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722197812.0 sacad-2.8.0/sacad/__init__.py0000755000175000017500000001627514651523464015372 0ustar00maximemaxime#!/usr/bin/env python3 """Smart Automatic Cover Art Downloader : search and download music album covers.""" __version__ = "2.8.0" __author__ = "desbma" __license__ = "MPL 2.0" import argparse import asyncio import functools import inspect import logging import os from typing import Any, Optional, Sequence from sacad import colored_logging, sources from sacad.cover import ( HAS_JPEGOPTIM, HAS_OPTIPNG, HAS_OXIPNG, SUPPORTED_IMG_FORMATS, CoverImageFormat, CoverSourceResult, ) from sacad.sources.base import CoverSource COVER_SOURCE_CLASSES = { m[0][: -len(CoverSource.__name__)].lower(): m[1] for m in inspect.getmembers(sources, lambda x: inspect.isclass(x) and issubclass(x, CoverSource)) } async def search_and_download( album: str, artist: str, format: CoverImageFormat, size: int, out_filepath: str, *, size_tolerance_prct: int, source_classes: Optional[Sequence[Any]] = None, preserve_format: bool = False, convert_progressive_jpeg: bool = False, ) -> bool: """Search and download a cover, return True if success, False instead.""" logger = logging.getLogger("Main") # register sources source_args = (size, size_tolerance_prct) if source_classes is None: source_classes = tuple(COVER_SOURCE_CLASSES.values()) assert source_classes is not None # makes MyPy chill cover_sources = [cls(*source_args) for cls in source_classes] # schedule search work search_futures = [] for cover_source in cover_sources: coroutine = cover_source.search(album, artist) future = asyncio.ensure_future(coroutine) search_futures.append(future) # wait for it await asyncio.wait(search_futures) # get results results = [] for future in search_futures: source_results = future.result() results.extend(source_results) # sort results results = await CoverSourceResult.preProcessForComparison(results, size, size_tolerance_prct) results.sort( reverse=True, key=functools.cmp_to_key( functools.partial( CoverSourceResult.compare, target_size=size, size_tolerance_prct=size_tolerance_prct, ) ), ) if not results: logger.info("No results") else: for i, result in enumerate(results, 1): logger.debug(f"#{i:02} {result}") # download done = False for result in results: try: await result.get( format, size, size_tolerance_prct, out_filepath, preserve_format=preserve_format, convert_progressive_jpeg=convert_progressive_jpeg, ) except Exception as e: logger.warning(f"Download of {result} failed: {e.__class__.__qualname__} {e}") continue else: done = True break # cleanup sessions close_cr = [] for cover_source in cover_sources: 
close_cr.append(cover_source.closeSession()) await asyncio.gather(*close_cr) return done def setup_common_args(arg_parser: argparse.ArgumentParser) -> None: """Set up command line arguments shared between sacad and sacad_r.""" arg_parser.add_argument( "-t", "--size-tolerance", type=int, default=25, dest="size_tolerance_prct", help="""Tolerate this percentage of size difference with the target size. Note that covers with size above or close to the target size will still be preferred if available""", ) arg_parser.add_argument( "-s", "--cover-sources", choices=tuple(COVER_SOURCE_CLASSES.keys()), default=tuple(COVER_SOURCE_CLASSES.keys()), nargs="+", help="Cover sources to use, if not set use all of them.", ) arg_parser.add_argument( "-p", "--preserve-format", action="store_true", default=False, help="Preserve source image format if possible. Target format will still be prefered when sorting results.", ) arg_parser.add_argument( "--convert-progressive-jpeg", action="store_true", default=False, help="Convert progressive JPEG to baseline if needed. May result in bigger files and loss of quality.", ) def cl_main() -> None: """Command line entry point for sacad_r.""" # parse args arg_parser = argparse.ArgumentParser( description=f"SACAD v{__version__}. Search and download an album cover.", formatter_class=argparse.ArgumentDefaultsHelpFormatter, ) arg_parser.add_argument("artist", help="Artist to search for") arg_parser.add_argument("album", help="Album to search for") arg_parser.add_argument("size", type=int, help="Target image size") arg_parser.add_argument("out_filepath", help="Output image filepath") setup_common_args(arg_parser) arg_parser.add_argument( "-v", "--verbosity", choices=("quiet", "warning", "normal", "debug"), default="normal", dest="verbosity", help="Level of logging output", ) args = arg_parser.parse_args() args.format = os.path.splitext(args.out_filepath)[1][1:].lower() try: args.format = SUPPORTED_IMG_FORMATS[args.format] except KeyError: print(f"Unable to guess image format from extension, or unknown format: {args.format}") exit(1) args.cover_sources = tuple(COVER_SOURCE_CLASSES[source] for source in args.cover_sources) # setup logger logging_level = { "quiet": logging.CRITICAL + 1, "warning": logging.WARNING, "normal": logging.INFO, "debug": logging.DEBUG, } logging.getLogger().setLevel(logging_level[args.verbosity]) if logging_level[args.verbosity] == logging.DEBUG: fmt = "%(asctime)s %(levelname)s [%(name)s] %(message)s" else: fmt = "%(name)s: %(message)s" logging_formatter = colored_logging.ColoredFormatter(fmt=fmt) logging_handler = logging.StreamHandler() logging_handler.setFormatter(logging_formatter) logging.getLogger().addHandler(logging_handler) if logging_level[args.verbosity] == logging.DEBUG: logging.getLogger("asyncio").setLevel(logging.WARNING) else: logging.getLogger("asyncio").setLevel(logging.CRITICAL + 1) # display warning if optipng/oxipng or jpegoptim are missing if not HAS_JPEGOPTIM: logging.getLogger("Main").warning("jpegoptim could not be found, JPEG crunching will be disabled") if not (HAS_OPTIPNG or HAS_OXIPNG): logging.getLogger("Main").warning("optipng or oxipng could not be found, PNG crunching will be disabled") # search and download coroutine = search_and_download( args.album, args.artist, args.format, args.size, args.out_filepath, size_tolerance_prct=args.size_tolerance_prct, source_classes=args.cover_sources, preserve_format=args.preserve_format, convert_progressive_jpeg=args.convert_progressive_jpeg, ) future = asyncio.ensure_future(coroutine) 
asyncio.get_event_loop().run_until_complete(future) if __name__ == "__main__": cl_main() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/__main__.py0000755000175000017500000000021014651441105015320 0ustar00maximemaxime#!/usr/bin/env python3 """Command line entry point for sacad program.""" import sacad if __name__ == "__main__": sacad.cl_main() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/colored_logging.py0000644000175000017500000000201514651441105016737 0ustar00maximemaxime"""Formatter for the logging module, coloring terminal output according to error criticity.""" import enum import logging import sys Colors = enum.Enum("Colors", ("RED", "GREEN", "YELLOW", "BLUE")) LEVEL_COLOR_MAPPING = {logging.WARNING: Colors.YELLOW, logging.ERROR: Colors.RED, logging.CRITICAL: Colors.RED} LEVEL_BOLD_MAPPING = {logging.WARNING: False, logging.ERROR: False, logging.CRITICAL: True} class ColoredFormatter(logging.Formatter): """Logging formatter coloring terminal output according to error criticity.""" def format(self, record): """See logging.Formatter.format.""" message = super().format(record) if sys.stderr.isatty() and not sys.platform.startswith("win32"): try: color_code = LEVEL_COLOR_MAPPING[record.levelno].value bold = LEVEL_BOLD_MAPPING[record.levelno] except KeyError: pass else: message = f"\033[{bold:d};{30 + color_code}m{message}\033[0m" return message ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/cover.py0000644000175000017500000007373514651441105014741 0ustar00maximemaxime"""Sacad album cover.""" import asyncio import enum import io import itertools import logging import math import mimetypes import operator import os import pickle import shutil import urllib.parse from typing import Dict import appdirs import bitarray try: from bitarray import bitdiff except ImportError: from bitarray.util import count_xor as bitdiff import PIL.Image import PIL.ImageFile import PIL.ImageFilter import web_cache from sacad import mkstemp_ctx PIL.ImageFile.LOAD_TRUNCATED_IMAGES = True CoverImageFormat = enum.Enum("CoverImageFormat", ("JPEG", "PNG")) class CoverSourceQuality(enum.IntFlag): """Flags to describe cover source quality.""" # whether or not the search query matching is fuzzy (does a typo return results ?) FUZZY_SEARCH = 0 EXACT_SEARCH = 1 << 1 # whether or not the source can return cover images for the correct album, but not being the front cover # unused for now NOT_FRONT_COVER_RESULT_RISK = 0 NO_NOT_FRONT_COVER_RESULT_RISK = 1 << 2 # whether or not the source will return complete crap instead of no results when it has not found a real match # this seems similar to the fuzzy flags but is different UNRELATED_RESULT_RISK = 0 NO_UNRELATED_RESULT_RISK = 1 << 3 def isReference(self) -> bool: """ Return True if the source if of 'reference' quality. 'Reference' means a result can only be the correct cover (but can be wrong size/format). 
""" mask = self.EXACT_SEARCH | self.NO_UNRELATED_RESULT_RISK return (self.value & mask) == mask class CoverImageMetadata(enum.IntFlag): """Flags to describe image metadata.""" NONE = 0 FORMAT = 1 SIZE = 2 ALL = 3 HAS_JPEGOPTIM = shutil.which("jpegoptim") is not None HAS_OPTIPNG = shutil.which("optipng") is not None HAS_OXIPNG = shutil.which("oxipng") is not None SUPPORTED_IMG_FORMATS = {"jpg": CoverImageFormat.JPEG, "jpeg": CoverImageFormat.JPEG, "png": CoverImageFormat.PNG} FORMAT_EXTENSIONS = {CoverImageFormat.JPEG: "jpg", CoverImageFormat.PNG: "png"} def is_square(x): """Return True if integer x is a perfect square, False otherwise.""" return math.sqrt(x).is_integer() class CoverSourceResult: """Cover image returned by a source, candidate to be downloaded.""" METADATA_PEEK_SIZE_INCREMENT = 2**12 MAX_FILE_METADATA_PEEK_SIZE = 20 * METADATA_PEEK_SIZE_INCREMENT IMG_SIG_SIZE = 16 def __init__( self, urls, size, format, *, thumbnail_url, source, source_quality, rank=None, check_metadata=CoverImageMetadata.NONE, ): # noqa: D205,D213,D400,D415 """ Args: urls: Cover image file URL. Can be a tuple of URLs of images to be joined size: Cover size as a (with, height) tuple format: Cover image format as a CoverImageFormat enum, or None if unknown thumbnail_url: Cover thumbnail image file URL, or None if not available source: Cover source object that produced this result source_quality: Quality of the cover's source as a CoverSourceQuality enum value rank: Integer ranking of the cover in the other results from the same source, or None if not available check_metadata: If != 0, hint that the format and/or size parameters are not reliable and must be double checked """ if not isinstance(urls, str): self.urls = urls else: self.urls = (urls,) self.size = size assert (format is None) or (format in CoverImageFormat) self.format = format self.thumbnail_url = thumbnail_url self.thumbnail_sig = None self.source = source self.source_quality = source_quality self.rank = rank assert (format is not None) or ((check_metadata & CoverImageMetadata.FORMAT) != 0) assert (size is not None) or ((check_metadata & CoverImageMetadata.SIZE) != 0) self.check_metadata = check_metadata self.reliable_metadata = True self.is_similar_to_reference = False self.is_only_reference = False if not hasattr(__class__, "image_cache"): cache_filepath = os.path.join( appdirs.user_cache_dir(appname="sacad", appauthor=False), "sacad-cache.sqlite" ) os.makedirs(os.path.dirname(cache_filepath), exist_ok=True) __class__.image_cache = web_cache.WebCache( cache_filepath, "cover_image_data", caching_strategy=web_cache.CachingStrategy.LRU, expiration=60 * 60 * 24 * 365, ) # 1 year __class__.metadata_cache = web_cache.WebCache( cache_filepath, "cover_metadata", caching_strategy=web_cache.CachingStrategy.LRU, expiration=60 * 60 * 24 * 365, ) # 1 year for cache, cache_name in zip( (__class__.image_cache, __class__.metadata_cache), ("cover_image_data", "cover_metadata") ): purged_count = cache.purge() logging.getLogger("Cache").debug( f"{purged_count} obsolete entries have been removed from cache {cache_name!r}" ) row_count = len(cache) logging.getLogger("Cache").debug(f"Cache {cache_name!r} contains {row_count} entries") def __str__(self): s = f"{self.__class__.__name__} {self.urls[0]!r}" if len(self.urls) > 1: s += f" [x{len(self.urls)}]" return s async def get( self, target_format: CoverImageFormat, target_size: int, size_tolerance_prct: float, out_filepath: str, *, preserve_format: bool = False, convert_progressive_jpeg: bool = False, ) -> None: 
"""Download cover and process it.""" images_data = [] for i, url in enumerate(self.urls): # download logging.getLogger("Cover").info(f"Downloading cover {url!r} (part {i + 1}/{len(self.urls)})...") headers: Dict[str, str] = {} self.source.updateHttpHeaders(headers) async def pre_cache_callback(img_data): return await __class__.crunch(img_data, self.format) store_in_cache_callback, image_data = await self.source.http.query( url, headers=headers, verify=False, cache=__class__.image_cache, # type: ignore pre_cache_callback=pre_cache_callback, ) # store immediately in cache await store_in_cache_callback() # append for multi images images_data.append(image_data) need_format_change = self.format != target_format need_size_change = (max(self.size) > target_size) and ( abs(max(self.size) - target_size) > target_size * size_tolerance_prct / 100 ) need_join = len(images_data) > 1 need_post_process = (need_format_change and (not preserve_format)) or need_join or need_size_change need_post_process = need_post_process or ( __class__.isProgressiveJpegData(images_data[0]) and convert_progressive_jpeg # type: ignore ) if need_post_process: # post process image_data = self.postProcess( images_data, target_format if need_format_change else None, target_size if need_size_change else None ) # crunch image again image_data = await __class__.crunch(image_data, target_format) # type: ignore format_changed = need_format_change else: format_changed = False # write it if need_format_change and (not format_changed): assert preserve_format out_filepath = f"{os.path.splitext(out_filepath)[0]}.{FORMAT_EXTENSIONS[self.format]}" with open(out_filepath, "wb") as file: file.write(image_data) @staticmethod def isProgressiveJpegData(data: bytes) -> bool: """Return True if data is from a progressive JPEG.""" in_bytes = io.BytesIO(data) try: img = PIL.Image.open(in_bytes) return bool(img.info["progressive"]) except Exception: return False def postProcess(self, images_data, new_format, new_size): """ Convert image binary data. Convert image binary data to a target format and/or size (None if no conversion needed), and return the processed data. 
""" if len(images_data) == 1: in_bytes = io.BytesIO(images_data[0]) img = PIL.Image.open(in_bytes) if img.mode != "RGB": img = img.convert("RGB") else: # images need to be joined before further processing logging.getLogger("Cover").info(f"Joining {len(images_data)} images...") # TODO find a way to do this losslessly for JPEG new_img = PIL.Image.new("RGB", self.size) assert is_square(len(images_data)) sq = int(math.sqrt(len(images_data))) images_data_it = iter(images_data) img_sizes = {} for x in range(sq): for y in range(sq): current_image_data = next(images_data_it) img_stream = io.BytesIO(current_image_data) img = PIL.Image.open(img_stream) img_sizes[(x, y)] = img.size box = [0, 0] if x > 0: for px in range(x): box[0] += img_sizes[(px, y)][0] if y > 0: for py in range(y): box[1] += img_sizes[(x, py)][1] box.extend((box[0] + img.size[0], box[1] + img.size[1])) new_img.paste(img, box=tuple(box)) img = new_img out_bytes = io.BytesIO() if new_size is not None: logging.getLogger("Cover").info(f"Resizing from {self.size[0]}x{self.size[1]} to {new_size}x{new_size}...") img = img.resize((new_size, new_size), PIL.Image.LANCZOS) # apply unsharp filter to remove resize blur (equivalent to (images/graphics)magick -unsharp 1.5x1+0.7+0.02) # we don't use PIL.ImageFilter.SHARPEN or PIL.ImageEnhance.Sharpness because we want precise control over # parameters unsharper = PIL.ImageFilter.UnsharpMask(radius=1.5, percent=70, threshold=5) img = img.filter(unsharper) if new_format is not None: logging.getLogger("Cover").info(f"Converting to {new_format.name.upper()}...") target_format = new_format else: target_format = self.format img.save(out_bytes, format=target_format.name, quality=90, optimize=True) return out_bytes.getvalue() async def updateImageMetadata(self): # noqa: C901 """Download image file(s) partially to get its real metadata, or get it from cache.""" assert self.needMetadataUpdate() width_sum, height_sum = 0, 0 # only download metadata for the needed images to get full size idxs = [] assert is_square(len(self.urls)) sq = int(math.sqrt(len(self.urls))) for x in range(sq): for y in range(sq): if x == y: idxs.append((x * sq + y, x, y)) for idx, x, y in idxs: url = self.urls[idx] format, width, height = None, None, None try: format, width, height = pickle.loads(__class__.metadata_cache[url]) except KeyError: # cache miss pass except Exception as e: logging.getLogger("Cover").warning( f"Unable to load metadata for URL {url!r} from cache: {e.__class__.__qualname__} {e}" ) else: # cache hit logging.getLogger("Cover").debug(f"Got metadata for URL {url!r} from cache") if format is not None: self.setFormatMetadata(format) if self.needMetadataUpdate(CoverImageMetadata.FORMAT) or ( self.needMetadataUpdate(CoverImageMetadata.SIZE) and ((width is None) or (height is None)) ): # download logging.getLogger("Cover").debug(f"Downloading file header for URL {url!r}...") try: headers = {} self.source.updateHttpHeaders(headers) response = await self.source.http.fastStreamedQuery(url, headers=headers, verify=False) try: if self.needMetadataUpdate(CoverImageMetadata.FORMAT): # try to get format from response format = __class__.guessImageFormatFromHttpResponse(response) if format is not None: self.setFormatMetadata(format) if self.needMetadataUpdate(): # try to get metadata from HTTP data metadata = await __class__.guessImageMetadataFromHttpData(response) if metadata is not None: format, width, height = metadata if format is not None: self.setFormatMetadata(format) finally: await response.release() except Exception as 
e: logging.getLogger("Cover").warning( f"Failed to get file metadata for URL {url!r} ({e.__class__.__qualname__} {e})" ) if self.needMetadataUpdate(): # did we fail to get needed metadata at this point? if (self.format is None) or ((self.size is None) and ((width is None) or (height is None))): # if we get here, file is probably not reachable, or not even an image logging.getLogger("Cover").debug( f"Unable to get file metadata from file or HTTP headers for URL {url!r}, " "skipping this result" ) return if (self.format is not None) and ((self.size is not None) and (width is None) and (height is None)): logging.getLogger("Cover").debug( f"Unable to get file metadata from file or HTTP headers for URL {url!r}, " "falling back to API data" ) self.check_metadata = CoverImageMetadata.NONE self.reliable_metadata = False return # save it to cache __class__.metadata_cache[url] = pickle.dumps((format, width, height)) # sum sizes if (width is not None) and (height is not None): width_sum += width height_sum += height if self.needMetadataUpdate(CoverImageMetadata.SIZE) and (width_sum > 0) and (height_sum > 0): self.setSizeMetadata((width_sum, height_sum)) def needMetadataUpdate(self, what=CoverImageMetadata.ALL): """Return True if image metadata needs to be checked, False instead.""" return (self.check_metadata & what) != 0 def setFormatMetadata(self, format): """Set format image metadata to what has been reliably identified.""" self.format = format self.check_metadata &= ~CoverImageMetadata.FORMAT def setSizeMetadata(self, size): """Set size image metadata to what has been reliably identified.""" self.size = size self.check_metadata &= ~CoverImageMetadata.SIZE async def updateSignature(self): """Calculate a cover's "signature" using its thumbnail url.""" assert self.thumbnail_sig is None if self.thumbnail_url is None: logging.getLogger("Cover").warning(f"No thumbnail available for {self}") return # download logging.getLogger("Cover").debug(f"Downloading cover thumbnail {self.thumbnail_url!r}...") headers = {} self.source.updateHttpHeaders(headers) async def pre_cache_callback(img_data): return await __class__.crunch(img_data, CoverImageFormat.JPEG, silent=True) try: store_in_cache_callback, image_data = await self.source.http.query( self.thumbnail_url, cache=__class__.image_cache, headers=headers, pre_cache_callback=pre_cache_callback ) except Exception as e: logging.getLogger("Cover").warning( f"Download of {self.thumbnail_url!r} failed: {e.__class__.__qualname__} {e}" ) return # compute sig logging.getLogger("Cover").debug(f"Computing signature of {self}...") try: self.thumbnail_sig = __class__.computeImgSignature(image_data) except Exception as e: logging.getLogger("Cover").warning( f"Failed to compute signature of '{self}': {e.__class__.__qualname__} {e}" ) else: await store_in_cache_callback() @staticmethod def compare(first, second, *, target_size, size_tolerance_prct): """ Compare cover relevance/quality. Return -1 if first is a worst match than second, 1 otherwise, or 0 if cover can't be discriminated. This code is responsible for comparing two cover results to identify the best one, and is used to sort all results. It is probably the most important piece of code of this tool. Covers with sizes under the target size (+- configured tolerance) are excluded before comparison. The following factors are used in order: 1. Prefer approximately square covers 2. Prefer covers similar to the reference cover 3. Prefer size above target size 4. If both below target size, prefer closest 5. 
Prefer covers of most reliable source 6. Prefer best ranked cover 7. Prefer covers with reliable metadata If all previous factors do not allow sorting of two results (very unlikely): 8. Prefer covers with less images to join 9. Prefer covers having the target size 10. Prefer PNG covers 11. Prefer exactly square covers We don't overload the __lt__ operator because we need to pass the target_size parameter. """ for c in (first, second): assert c.format is not None assert isinstance(c.size[0], int) and isinstance(c.size[1], int) # prefer square covers #1 delta_ratio1 = abs(first.size[0] / first.size[1] - 1) delta_ratio2 = abs(second.size[0] / second.size[1] - 1) if abs(delta_ratio1 - delta_ratio2) > 0.15: return -1 if (delta_ratio1 > delta_ratio2) else 1 # prefer similar to reference sr1 = first.is_similar_to_reference sr2 = second.is_similar_to_reference if sr1 and (not sr2): return 1 if (not sr1) and sr2: return -1 # prefer size above preferred delta_size1 = ((first.size[0] + first.size[1]) / 2) - target_size delta_size2 = ((second.size[0] + second.size[1]) / 2) - target_size if ((delta_size1 < 0) and (delta_size2 >= 0)) or (delta_size1 >= 0) and (delta_size2 < 0): return -1 if (delta_size1 < delta_size2) else 1 # if both below target size, prefer closest if (delta_size1 < 0) and (delta_size2 < 0) and (delta_size1 != delta_size2): return -1 if (delta_size1 < delta_size2) else 1 # prefer covers of most reliable source qs1 = first.source_quality.value qs2 = second.source_quality.value if qs1 != qs2: return -1 if (qs1 < qs2) else 1 # prefer best ranked if ( (first.rank is not None) and (second.rank is not None) and (first.__class__ is second.__class__) and (first.rank != second.rank) ): return -1 if (first.rank > second.rank) else 1 # prefer reliable metadata if first.reliable_metadata != second.reliable_metadata: return 1 if first.reliable_metadata else -1 # prefer covers with less images to join ic1 = len(first.urls) ic2 = len(second.urls) if ic1 != ic2: return -1 if (ic1 > ic2) else 1 # prefer the preferred size if abs(delta_size1) != abs(delta_size2): return -1 if (abs(delta_size1) > abs(delta_size2)) else 1 # prefer png if first.format != second.format: return -1 if (second.format is CoverImageFormat.PNG) else 1 # prefer square covers #2 if delta_ratio1 != delta_ratio2: return -1 if (delta_ratio1 > delta_ratio2) else 1 # fuck, they are the same! 
return 0 @staticmethod async def crunch(image_data, format, silent=False): """Crunch image data, and return the processed data, or orignal data if operation failed.""" if ((format is CoverImageFormat.PNG) and (not (HAS_OPTIPNG or HAS_OXIPNG))) or ( (format is CoverImageFormat.JPEG) and (not HAS_JPEGOPTIM) ): return image_data with mkstemp_ctx.mkstemp(suffix=f".{format.name.lower()}") as tmp_out_filepath: if not silent: logging.getLogger("Cover").info(f"Crunching {format.name.upper()} image...") with open(tmp_out_filepath, "wb") as tmp_out_file: tmp_out_file.write(image_data) size_before = len(image_data) if format is CoverImageFormat.PNG: if HAS_OXIPNG: cmd = ["oxipng", "-q", "-s"] else: cmd = ["optipng", "-quiet", "-o1"] elif format is CoverImageFormat.JPEG: cmd = ["jpegoptim", "-q", "--strip-all"] cmd.append(tmp_out_filepath) p = await asyncio.create_subprocess_exec( *cmd, stdin=asyncio.subprocess.DEVNULL, stdout=asyncio.subprocess.DEVNULL, stderr=asyncio.subprocess.DEVNULL, ) await p.wait() if p.returncode != 0: if not silent: logging.getLogger("Cover").warning("Crunching image failed") return image_data with open(tmp_out_filepath, "rb") as tmp_out_file: crunched_image_data = tmp_out_file.read() size_after = len(crunched_image_data) pct_saved = 100 * (size_before - size_after) / size_before if not silent: logging.getLogger("Cover").debug(f"Crunching image saved {pct_saved:.2f}% filesize") return crunched_image_data @staticmethod def guessImageMetadataFromData(img_data): """Identify an image format and size from its first bytes.""" format, width, height = None, None, None img_stream = io.BytesIO(img_data) try: img = PIL.Image.open(img_stream) except (IOError, OSError, RuntimeError): # PIL.UnidentifiedImageError inherits from OSError pass else: format = img.format.lower() format = SUPPORTED_IMG_FORMATS.get(format, None) width, height = img.size return format, width, height @staticmethod async def guessImageMetadataFromHttpData(response): """Identify an image format and size from the beginning of its HTTP data.""" metadata = None img_data = bytearray() while len(img_data) < CoverSourceResult.MAX_FILE_METADATA_PEEK_SIZE: new_img_data = await response.content.read(__class__.METADATA_PEEK_SIZE_INCREMENT) if not new_img_data: break img_data.extend(new_img_data) metadata = __class__.guessImageMetadataFromData(img_data) if (metadata is not None) and all(metadata): return metadata return metadata @staticmethod def guessImageFormatFromHttpResponse(response): """Guess file format from HTTP response, return format or None.""" extensions = [] # try to guess extension from response content-type header try: content_type = response.headers["Content-Type"] except KeyError: pass else: ext = mimetypes.guess_extension(content_type, strict=False) if ext is not None: extensions.append(ext) # try to extract extension from URL urls = list(response.history) + [response.url] for url in map(str, urls): ext = os.path.splitext(urllib.parse.urlsplit(url).path)[-1] if (ext is not None) and (ext not in extensions): extensions.append(ext) # now guess from the extensions for ext in extensions: try: return SUPPORTED_IMG_FORMATS[ext[1:]] except KeyError: pass @staticmethod async def preProcessForComparison(results, target_size, size_tolerance_prct): """Process results to prepare them for future comparison and sorting.""" # find reference (=image most likely to match target cover ignoring factors like size and format) reference = None for result in results: if result.source_quality.isReference(): if (reference is None) or ( 
CoverSourceResult.compare( result, reference, target_size=target_size, size_tolerance_prct=size_tolerance_prct ) > 0 ): reference = result # remove results that are only refs results = list(itertools.filterfalse(operator.attrgetter("is_only_reference"), results)) # remove duplicates no_dup_results = [] for result in results: is_dup = False for result_comp in results: if ( (result_comp is not result) and (result_comp.urls == result.urls) and ( __class__.compare( result, result_comp, target_size=target_size, size_tolerance_prct=size_tolerance_prct ) < 0 ) ): is_dup = True break if not is_dup: no_dup_results.append(result) dup_count = len(results) - len(no_dup_results) if dup_count > 0: logging.getLogger("Cover").info(f"Removed {dup_count} duplicate results") results = no_dup_results if reference is not None: logging.getLogger("Cover").info(f"Reference is: {reference}") reference.is_similar_to_reference = True # calculate sigs futures = [] for result in results: coroutine = result.updateSignature() future = asyncio.ensure_future(coroutine) futures.append(future) if reference.is_only_reference: assert reference not in results coroutine = reference.updateSignature() future = asyncio.ensure_future(coroutine) futures.append(future) if futures: await asyncio.wait(futures) for future in futures: future.result() # raise pending exception if any # compare other results to reference for result in results: if ( (result is not reference) and (result.thumbnail_sig is not None) and (reference.thumbnail_sig is not None) ): result.is_similar_to_reference = __class__.areImageSigsSimilar( result.thumbnail_sig, reference.thumbnail_sig ) if result.is_similar_to_reference: logging.getLogger("Cover").debug(f"{result} is similar to reference") else: logging.getLogger("Cover").debug(f"{result} is NOT similar to reference") else: logging.getLogger("Cover").warning("No reference result found") return results @staticmethod def computeImgSignature(image_data): """ Calculate an image signature. 
This is similar to ahash but uses 3 color components. See: https://github.com/JohannesBuchner/imagehash/blob/4.0/imagehash/__init__.py#L125 """ parser = PIL.ImageFile.Parser() parser.feed(image_data) img = parser.close() target_size = (__class__.IMG_SIG_SIZE, __class__.IMG_SIG_SIZE) img.thumbnail(target_size, PIL.Image.Resampling.BICUBIC) if img.size != target_size: logging.getLogger("Cover").debug( "Non square thumbnail after resize to %ux%u, unable to compute signature" % target_size ) return None img = img.convert(mode="RGB") pixels = img.getdata() pixel_count = target_size[0] * target_size[1] color_count = 3 r = bitarray.bitarray(pixel_count * color_count) r.setall(False) for ic in range(color_count): mean = sum(p[ic] for p in pixels) // pixel_count for ip, p in enumerate(pixels): if p[ic] > mean: r[pixel_count * ic + ip] = True return r @staticmethod def areImageSigsSimilar(sig1, sig2): """Compare 2 image "signatures" and return True if they seem to come from a similar image, False otherwise.""" return bitdiff(sig1, sig2) < 100 # silence third party module loggers logging.getLogger("PIL").setLevel(logging.ERROR) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/http_helpers.py0000644000175000017500000002246714651441105016320 0ustar00maximemaxime"""Common HTTP code.""" import asyncio import logging import os import pickle import urllib.parse import aiohttp import appdirs from sacad import rate_watcher, redo def aiohttp_socket_timeout(socket_timeout_s): """Return an aiohttp.ClientTimeout object with only socket timeouts set.""" return aiohttp.ClientTimeout(total=None, connect=None, sock_connect=socket_timeout_s, sock_read=socket_timeout_s) HTTP_NORMAL_TIMEOUT = aiohttp_socket_timeout(12.1) HTTP_SHORT_TIMEOUT = aiohttp_socket_timeout(6.1) HTTP_MAX_ATTEMPTS = 3 HTTP_MAX_RETRY_SLEEP_S = 5 HTTP_MAX_RETRY_SLEEP_SHORT_S = 2 DEFAULT_USER_AGENT = "Mozilla/5.0" class Http: """Async HTTP client code.""" def __init__( self, *, allow_session_cookies=False, min_delay_between_accesses=0, jitter_range_ms=None, rate_limited_domains=None, logger=logging.getLogger(), ): self.allow_session_cookies = allow_session_cookies self.session = None self.watcher_db_filepath = os.path.join( appdirs.user_cache_dir(appname="sacad", appauthor=False), "rate_watcher.sqlite" ) self.min_delay_between_accesses = min_delay_between_accesses self.jitter_range_ms = jitter_range_ms self.rate_limited_domains = rate_limited_domains self.logger = logger async def close(self): """Close HTTP session to make aiohttp happy.""" if self.session is not None: await self.session.close() async def query( # noqa: C901 self, url, *, post_data=None, headers=None, verify=True, cache=None, pre_cache_callback=None ): """ Send a GET/POST request. Send a GET/POST request or get data from cache, retry if it fails, and return a tuple of (store in cache callback, response content). """ async def store_in_cache_callback(): pass if cache is not None: # try from cache first if post_data is not None: if (url, post_data) in cache: self.logger.debug(f"Got data for URL {url!r} {dict(post_data)} from cache") return store_in_cache_callback, cache[(url, post_data)] elif url in cache: self.logger.debug(f"Got data for URL {url!r} from cache") return store_in_cache_callback, cache[url] if self.session is None: self._initSession() # do we need to rate limit? 
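# The rate-limit check below treats self.rate_limited_domains as an opt-in list: when it is set,
# only URLs whose domain is listed are throttled; when it is None, every URL is throttled. The
# "domain" is the netloc component of the URL; for illustration (stdlib urlsplit behavior):
#   urllib.parse.urlsplit("https://api.deezer.com/search?q=x").netloc == "api.deezer.com"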
if self.rate_limited_domains is not None: domain = urllib.parse.urlsplit(url).netloc rate_limit = domain in self.rate_limited_domains else: rate_limit = True if rate_limit: domain_rate_watcher = rate_watcher.AccessRateWatcher( self.watcher_db_filepath, url, self.min_delay_between_accesses, jitter_range_ms=self.jitter_range_ms, logger=self.logger, ) for attempt, time_to_sleep in enumerate( redo.retrier( max_attempts=HTTP_MAX_ATTEMPTS, sleeptime=1, max_sleeptime=HTTP_MAX_RETRY_SLEEP_S, sleepscale=1.5 ), 1, ): if rate_limit: await domain_rate_watcher.waitAccessAsync() try: if post_data is not None: async with self.session.post( url, data=post_data, headers=self._buildHeaders(headers), timeout=HTTP_NORMAL_TIMEOUT, ssl=verify, ) as response: content = await response.read() else: async with self.session.get( url, headers=self._buildHeaders(headers), timeout=HTTP_NORMAL_TIMEOUT, ssl=verify ) as response: content = await response.read() if cache is not None: async def store_in_cache_callback(): if pre_cache_callback is not None: # process try: data = await pre_cache_callback(content) except Exception: data = content else: data = content # add to cache if post_data is not None: cache[(url, post_data)] = data else: cache[url] = data except (asyncio.TimeoutError, aiohttp.ClientError) as e: self.logger.warning( f"Querying {url!r} failed (attempt {attempt}/{HTTP_MAX_ATTEMPTS}): {e.__class__.__qualname__} {e}" ) if attempt == HTTP_MAX_ATTEMPTS: raise else: self.logger.debug(f"Retrying in {time_to_sleep:.3f}s") await asyncio.sleep(time_to_sleep) else: break # http retry loop response.raise_for_status() return store_in_cache_callback, content async def isReachable(self, url, *, headers=None, verify=True, response_headers=None, cache=None): """ Send a HEAD request. Send a HEAD request with short timeout or get data from cache, return True if resource has a 2xx status code, False otherwise. """ if (cache is not None) and (url in cache): # try from cache first self.logger.debug(f"Got headers for URL {url!r} from cache") resp_ok, response_headers = pickle.loads(cache[url]) return resp_ok if self.session is None: self._initSession() # do we need to rate limit? 
if self.rate_limited_domains is not None: domain = urllib.parse.urlsplit(url).netloc rate_limit = domain in self.rate_limited_domains else: rate_limit = True if rate_limit: domain_rate_watcher = rate_watcher.AccessRateWatcher( self.watcher_db_filepath, url, self.min_delay_between_accesses, jitter_range_ms=self.jitter_range_ms, logger=self.logger, ) resp_ok = True try: for attempt, time_to_sleep in enumerate( redo.retrier( max_attempts=HTTP_MAX_ATTEMPTS, sleeptime=0.5, max_sleeptime=HTTP_MAX_RETRY_SLEEP_SHORT_S, sleepscale=1.5, ), 1, ): if rate_limit: await domain_rate_watcher.waitAccessAsync() try: async with self.session.head( url, headers=self._buildHeaders(headers), timeout=HTTP_SHORT_TIMEOUT, ssl=verify ) as response: pass except (asyncio.TimeoutError, aiohttp.ClientError) as e: self.logger.warning( f"Probing {url!r} failed (attempt {attempt}/{HTTP_MAX_ATTEMPTS}): {e.__class__.__qualname__} {e}" ) if attempt == HTTP_MAX_ATTEMPTS: resp_ok = False else: self.logger.debug(f"Retrying in {time_to_sleep:.3f}s") await asyncio.sleep(time_to_sleep) else: response.raise_for_status() if response_headers is not None: response_headers.update(response.headers) break # http retry loop except aiohttp.ClientResponseError as e: self.logger.debug(f"Probing {url!r} failed: {e.__class__.__qualname__} {e}") resp_ok = False if cache is not None: # store in cache cache[url] = pickle.dumps((resp_ok, response_headers)) return resp_ok async def fastStreamedQuery(self, url, *, headers=None, verify=True): """Send a GET request with short timeout, do not retry, and return streamed response.""" if self.session is None: self._initSession() response = await self.session.get( url, headers=self._buildHeaders(headers), timeout=HTTP_SHORT_TIMEOUT, ssl=verify ) response.raise_for_status() return response def _buildHeaders(self, headers): """Build HTTP headers dictionary.""" if headers is None: headers = {} if "User-Agent" not in headers: headers["User-Agent"] = DEFAULT_USER_AGENT return headers def _initSession(self): """ Initialize HTTP session. It must be done in a coroutine, see https://docs.aiohttp.org/en/stable/faq.html#why-is-creating-a-clientsession-outside-of-an-event-loop-dangerous """ assert self.session is None if self.allow_session_cookies: cookie_jar = None else: cookie_jar = aiohttp.cookiejar.DummyCookieJar() connector = aiohttp.TCPConnector() self.session = aiohttp.ClientSession(connector=connector, cookie_jar=cookie_jar) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/mkstemp_ctx.py0000644000175000017500000000074514651441105016150 0ustar00maximemaxime"""Additions to the tempfile module.""" import contextlib import os import tempfile @contextlib.contextmanager def mkstemp(*args, **kwargs): """ Safely generate a temporary file path. 
Context manager similar to tempfile.NamedTemporaryFile except the file is not deleted on close, and only the filepath is returned """ fd, filename = tempfile.mkstemp(*args, **kwargs) os.close(fd) try: yield filename finally: os.remove(filename) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/rate_watcher.py0000644000175000017500000000624014651441105016256 0ustar00maximemaxime"""Provide a class with a context manager to help avoid overloading web servers.""" import asyncio import logging import os import random import sqlite3 import time import urllib.parse class AccessRateWatcher: """Access rate limiter, supporting concurrent access by threads and/or processes.""" def __init__( self, db_filepath, url, min_delay_between_accesses, *, jitter_range_ms=None, logger=logging.getLogger() ): self.domain = urllib.parse.urlsplit(url).netloc self.min_delay_between_accesses = min_delay_between_accesses self.jitter_range_ms = jitter_range_ms self.logger = logger os.makedirs(os.path.dirname(db_filepath), exist_ok=True) self.connection = sqlite3.connect(db_filepath) with self.connection: self.connection.executescript( """CREATE TABLE IF NOT EXISTS access_timestamp (domain TEXT PRIMARY KEY, timestamp FLOAT NOT NULL);""" ) self.lock = None async def waitAccessAsync(self): """Wait the needed time before sending a request to honor rate limit.""" if self.lock is None: self.lock = asyncio.Lock() async with self.lock: while True: last_access_ts = self.__getLastAccess() if last_access_ts is not None: now = time.time() last_access_ts = last_access_ts[0] time_since_last_access = now - last_access_ts if time_since_last_access < self.min_delay_between_accesses: time_to_wait = self.min_delay_between_accesses - time_since_last_access if self.jitter_range_ms is not None: time_to_wait += random.randint(*self.jitter_range_ms) / 1000 self.logger.debug( "Sleeping for %.2fms because of rate limit for domain %s" % (time_to_wait * 1000, self.domain) ) await asyncio.sleep(time_to_wait) access_time = time.time() self.__access(access_time) # now we should be good... 
except if another process did the same query at the same time # the database serves as an atomic lock, query again to be sure the last row is the one # we just inserted last_access_ts = self.__getLastAccess() if last_access_ts[0] == access_time: break def __getLastAccess(self): with self.connection: return self.connection.execute( """SELECT timestamp FROM access_timestamp WHERE domain = ?;""", (self.domain,), ).fetchone() def __access(self, ts): """Record an API access.""" with self.connection: self.connection.execute( "INSERT OR REPLACE INTO access_timestamp (timestamp, domain) VALUES (?, ?)", (ts, self.domain) ) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/recurse.py0000755000175000017500000004207714651441105015271 0ustar00maximemaxime#!/usr/bin/env python3 """Recursively search and download album covers for a music library.""" import argparse import asyncio import base64 import collections import contextlib import inspect import itertools import logging import mimetypes import operator import os import string import sys import tempfile import mutagen import tqdm import unidecode import sacad from sacad import COVER_SOURCE_CLASSES, colored_logging, tqdm_logging EMBEDDED_ALBUM_ART_SYMBOL = "+" AUDIO_EXTENSIONS = frozenset( ("aac", "ape", "flac", "m4a", "mp3", "mp4", "mpc", "ogg", "oga", "opus", "tta", "wv") ) | frozenset( ext for ext, mime in {**mimetypes.types_map, **mimetypes.common_types}.items() if mime.startswith("audio/") ) Metadata = collections.namedtuple("Metadata", ("artist", "album", "has_embedded_cover")) # TODO use a dataclasses.dataclass when Python < 3.7 is dropped class Work: """Represent a single search & download work item.""" def __init__(self, cover_filepath, audio_filepaths, metadata): self.cover_filepath = cover_filepath self.tmp_cover_filepath = None self.audio_filepaths = audio_filepaths self.metadata = metadata def __repr__(self): return ( f"<{__class__.__qualname__} " f"cover_filepath={self.cover_filepath!r} " f"tmp_cover_filepath={self.tmp_cover_filepath!r} " f"audio_filepaths={self.audio_filepaths!r} " f"metadata={self.metadata!r}>" ) def __str__(self): return ( f"cover for {self.metadata.album!r} by {self.metadata.artist!r} " f"from {', '.join(map(repr, self.audio_filepaths))}" ) def __eq__(self, other): if not isinstance(other, __class__): return False return ( (self.cover_filepath == other.cover_filepath) and (self.tmp_cover_filepath == other.tmp_cover_filepath) and (self.audio_filepaths == other.audio_filepaths) and (self.metadata == other.metadata) ) def analyze_lib(lib_dir, cover_pattern, *, ignore_existing=False, full_scan=False, all_formats=False): """Recursively analyze library, and return a list of work.""" work = [] stats = collections.OrderedDict((k, 0) for k in ("files", "albums", "missing covers", "errors")) with tqdm.tqdm(desc="Analyzing library", unit="dir", postfix=stats) as progress, tqdm_logging.redirect_logging( progress ): for rootpath, rel_dirpaths, rel_filepaths in os.walk(lib_dir): new_work = analyze_dir( stats, rootpath, rel_filepaths, cover_pattern, ignore_existing=ignore_existing, full_scan=full_scan, all_formats=all_formats, ) progress.set_postfix(stats, refresh=False) progress.update(1) work.extend(new_work) return work def get_file_metadata(audio_filepath): """Get a Metadata object for this file or None.""" try: mf = mutagen.File(audio_filepath) except Exception: return if mf is None: return # artist for key in ("albumartist", "artist", "TPE1", "TPE2", "aART", 
"\xa9ART"): # ogg/ape, mp3, mp4 try: val = mf.get(key, None) except ValueError: val = None if val is not None: artist = val[-1] break else: return # album for key in ("_album", "album", "TALB", "\xa9alb"): # ogg/ape, mp3, mp4 try: val = mf.get(key, None) except ValueError: val = None if val is not None: album = val[-1] break else: return # album art if isinstance(mf.tags, mutagen._vorbis.VComment): has_embedded_cover = ("metadata_block_picture" in mf) or ( isinstance(mf, mutagen.flac.FLAC) and any((p.type == mutagen.id3.PictureType.COVER_FRONT) for p in mf.pictures) ) elif isinstance(mf.tags, mutagen.id3.ID3): has_embedded_cover = any(map(operator.methodcaller("startswith", "APIC:"), mf.keys())) elif isinstance(mf.tags, mutagen.mp4.MP4Tags): has_embedded_cover = "covr" in mf elif isinstance(mf.tags, mutagen.apev2.APEv2): has_embedded_cover = "cover art (front)" in mf else: # unknown tag format return return Metadata(artist, album, has_embedded_cover) def get_dir_metadata(audio_filepaths, *, full_scan=False): """Build a dict of Metadata to audio filepath list by analyzing audio files.""" r = collections.defaultdict(list) audio_filepaths = tuple(sorted(audio_filepaths)) for audio_filepath in audio_filepaths: file_metadata = get_file_metadata(audio_filepath) if file_metadata is None: continue if not full_scan: # stop at the first file that succeeds (for performance) # assume all directory files have the same artist/album couple r[file_metadata] = audio_filepaths break r[file_metadata].append(audio_filepath) return r VALID_PATH_CHARS = frozenset(r"-_.()!#$%&'@^{}~" + string.ascii_letters + string.digits + " ") def sanitize_for_path(s): """Sanitize a string to be FAT/NTFS friendly when used in file path.""" s = s.translate(str.maketrans("/\\|*", "---x")) s = "".join(c for c in unidecode.unidecode_expect_ascii(s) if c in VALID_PATH_CHARS) s = s.strip() s = s.rstrip(".") # this is for FAT on Android return s def pattern_to_filepath(pattern, parent_dir, metadata): """Build absolute cover file path from pattern.""" assert pattern != EMBEDDED_ALBUM_ART_SYMBOL assert metadata.artist is not None assert metadata.album is not None filepath = pattern.format(artist=sanitize_for_path(metadata.artist), album=sanitize_for_path(metadata.album)) if not os.path.isabs(filepath): filepath = os.path.join(parent_dir, filepath) return filepath def analyze_dir( stats, parent_dir, rel_filepaths, cover_pattern, *, ignore_existing=False, full_scan=False, all_formats=False ): """Analyze a directory (non recursively) and return a list of Work objects.""" r = [] # filter out non audio files audio_filepaths = [] for rel_filepath in rel_filepaths: stats["files"] += 1 try: ext = os.path.splitext(rel_filepath)[1][1:].lower() except IndexError: continue if ext in AUDIO_EXTENSIONS: audio_filepaths.append(os.path.join(parent_dir, rel_filepath)) # get metadata dir_metadata = get_dir_metadata(audio_filepaths, full_scan=full_scan) if audio_filepaths and (not dir_metadata): # failed to get any metadata for this directory stats["errors"] += 1 logging.getLogger("sacad_r").error(f"Unable to read metadata for album directory {parent_dir!r}") for metadata, album_audio_filepaths in dir_metadata.items(): # update stats stats["albums"] += 1 # add work item if needed if cover_pattern != EMBEDDED_ALBUM_ART_SYMBOL: cover_filepath = pattern_to_filepath(cover_pattern, parent_dir, metadata) if all_formats: missing = ignore_existing or ( not any( os.path.isfile(f"{os.path.splitext(cover_filepath)[0]}.{ext}") for ext in sacad.SUPPORTED_IMG_FORMATS ) 
) else: missing = ignore_existing or (not os.path.isfile(cover_filepath)) else: cover_filepath = EMBEDDED_ALBUM_ART_SYMBOL missing = (not metadata.has_embedded_cover) or ignore_existing if missing: stats["missing covers"] += 1 r.append(Work(cover_filepath, album_audio_filepaths, metadata)) return r def embed_album_art(cover_filepath, audio_filepaths): """Embed album art into audio files.""" with open(cover_filepath, "rb") as f: cover_data = f.read() for filepath in audio_filepaths: mf = mutagen.File(filepath) if mf.tags is None: mf.add_tags() if isinstance(mf.tags, mutagen._vorbis.VComment): picture = mutagen.flac.Picture() picture.data = cover_data picture.type = mutagen.id3.PictureType.COVER_FRONT picture.mime = "image/jpeg" if isinstance(mf, mutagen.flac.FLAC): mf.add_picture(picture) else: encoded_data = base64.b64encode(picture.write()) mf["metadata_block_picture"] = encoded_data.decode("ascii") elif isinstance(mf.tags, mutagen.id3.ID3): mf.tags.add(mutagen.id3.APIC(mime="image/jpeg", type=mutagen.id3.PictureType.COVER_FRONT, data=cover_data)) elif isinstance(mf.tags, mutagen.mp4.MP4Tags): mf["covr"] = [mutagen.mp4.MP4Cover(cover_data, imageformat=mutagen.mp4.AtomDataType.JPEG)] elif isinstance(mf.tags, mutagen.apev2.APEv2): mf["cover art (front)"] = cover_data mf.save() def ichunk(iterable, n): """Split an iterable into n-sized chunks.""" it = iter(iterable) while True: chunk = tuple(itertools.islice(it, n)) if not chunk: return yield chunk def get_covers(work, args): """Get missing covers.""" with contextlib.ExitStack() as cm: if args.cover_pattern == EMBEDDED_ALBUM_ART_SYMBOL: tmp_prefix = f"{os.path.splitext(os.path.basename(inspect.getfile(inspect.currentframe())))[0]}_" tmp_dir = cm.enter_context(tempfile.TemporaryDirectory(prefix=tmp_prefix)) # setup progress report stats = collections.OrderedDict((k, 0) for k in ("ok", "errors", "no result found")) progress = cm.enter_context( tqdm.tqdm(total=len(work), miniters=1, desc="Searching covers", unit="cover", postfix=stats) ) cm.enter_context(tqdm_logging.redirect_logging(progress)) def post_download(future): work = futures[future] try: status = future.result() except Exception as exception: stats["errors"] += 1 logging.getLogger("sacad_r").error( f"Error occurred while searching {work}: {exception.__class__.__qualname__} {exception}" ) else: if status: if work.cover_filepath == EMBEDDED_ALBUM_ART_SYMBOL: try: embed_album_art(work.tmp_cover_filepath, work.audio_filepaths) except Exception as exception: stats["errors"] += 1 logging.getLogger("sacad_r").error( f"Error occurred while embedding {work}: " f"{exception.__class__.__qualname__} {exception}" ) else: stats["ok"] += 1 finally: os.remove(work.tmp_cover_filepath) else: stats["ok"] += 1 else: stats["no result found"] += 1 logging.getLogger("sacad_r").warning(f"Unable to find {work}") progress.set_postfix(stats, refresh=False) progress.update(1) # post work i = 0 # default event loop on Windows has a 512 fd limit, # see https://docs.python.org/3/library/asyncio-eventloops.html#windows # also on Linux default max open fd limit is 1024 (ulimit -n) # so work in smaller chunks to avoid hitting fd limit # this also updates the progress faster (instead of working on all searches, work on finishing the chunk before # getting to the next one) work_chunk_length = 4 if sys.platform.startswith("win") else 12 for work_chunk in ichunk(work, work_chunk_length): futures = {} for i, cur_work in enumerate(work_chunk, i): if cur_work.cover_filepath == EMBEDDED_ALBUM_ART_SYMBOL: cover_filepath = 
os.path.join(tmp_dir, f"{i:02}.{args.format.name.lower()}") cur_work.tmp_cover_filepath = cover_filepath else: cover_filepath = cur_work.cover_filepath os.makedirs(os.path.dirname(cover_filepath), exist_ok=True) coroutine = sacad.search_and_download( cur_work.metadata.album, cur_work.metadata.artist, args.format, args.size, cover_filepath, size_tolerance_prct=args.size_tolerance_prct, source_classes=args.cover_sources, preserve_format=args.preserve_format, convert_progressive_jpeg=args.convert_progressive_jpeg, ) future = asyncio.ensure_future(coroutine) futures[future] = cur_work for future in futures: future.add_done_callback(post_download) # wait for end of work root_future = asyncio.gather(*futures.keys()) asyncio.get_event_loop().run_until_complete(root_future) def cl_main(): """Command line entry point.""" # parse args arg_parser = argparse.ArgumentParser( description=f"SACAD (recursive tool) v{sacad.__version__}.{__doc__}", formatter_class=argparse.ArgumentDefaultsHelpFormatter, ) arg_parser.add_argument("lib_dir", help="Music library directory to recursively analyze") arg_parser.add_argument("size", type=int, help="Target image size") arg_parser.add_argument( "cover_pattern", help="""Cover image path pattern. {artist} and {album} are replaced by their tag value. You can set an absolute path, otherwise destination directory is relative to the audio files. Use single character '%s' to embed JPEG into audio files.""" % (EMBEDDED_ALBUM_ART_SYMBOL), ) arg_parser.add_argument( "-i", "--ignore-existing", action="store_true", default=False, help="Ignore existing covers and force search and download for all files", ) arg_parser.add_argument( "-f", "--full-scan", action="store_true", default=False, help="""Enable scanning of all audio files in each directory. By default the scanner will assume all audio files in a single directory are part of the same album, and only read metadata for the first file. Enable this if your files are organized in a way that allows files for different albums to be in the same directory level. 
WARNING: This will make the initial scan much slower.""", ) sacad.setup_common_args(arg_parser) arg_parser.add_argument( "-v", "--verbose", action="store_true", default=False, dest="verbose", help="Enable verbose output" ) args = arg_parser.parse_args() if args.cover_pattern == EMBEDDED_ALBUM_ART_SYMBOL: args.format = "jpg" else: args.format = os.path.splitext(args.cover_pattern)[1][1:].lower() try: args.format = sacad.SUPPORTED_IMG_FORMATS[args.format] except KeyError: print(f"Unable to guess image format from extension, or unknown format: {args.format}") exit(1) args.cover_sources = tuple(COVER_SOURCE_CLASSES[source] for source in args.cover_sources) # setup logger if not args.verbose: logging.getLogger("sacad_r").setLevel(logging.WARNING) logging.getLogger().setLevel(logging.ERROR) logging.getLogger("asyncio").setLevel(logging.CRITICAL + 1) fmt = "%(name)s: %(message)s" else: logging.getLogger("sacad_r").setLevel(logging.DEBUG) logging.getLogger().setLevel(logging.DEBUG) logging.getLogger("asyncio").setLevel(logging.WARNING) fmt = "%(asctime)s %(levelname)s [%(name)s] %(message)s" logging_formatter = colored_logging.ColoredFormatter(fmt=fmt) logging_handler = logging.StreamHandler() logging_handler.setFormatter(logging_formatter) logging.getLogger().addHandler(logging_handler) # bump nofile ulimit try: import resource soft_lim, hard_lim = resource.getrlimit(resource.RLIMIT_NOFILE) if (soft_lim != resource.RLIM_INFINITY) and ((soft_lim < hard_lim) or (hard_lim == resource.RLIM_INFINITY)): resource.setrlimit(resource.RLIMIT_NOFILE, (hard_lim, hard_lim)) logging.getLogger().debug(f"Max open files count set from {soft_lim} to {hard_lim}") except (AttributeError, ImportError, OSError, ValueError): # not supported on system pass # do the job work = analyze_lib( args.lib_dir, args.cover_pattern, ignore_existing=args.ignore_existing, full_scan=args.full_scan, all_formats=args.preserve_format, ) get_covers(work, args) if __name__ == "__main__": cl_main() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/redo.py0000644000175000017500000000134614651441105014541 0ustar00maximemaxime""" Helper to retry things. Module inspired by https://github.com/mozilla-releng/redo, but yielding time to sleep instead of sleeping, to use with asyncio. 
""" import random def retrier(*, max_attempts, sleeptime, max_sleeptime, sleepscale=1.5, jitter=0.2): """Yield the time to wait after each attempt, if it failed.""" assert max_attempts > 1 assert sleeptime >= 0 assert 0 <= jitter <= sleeptime assert sleepscale >= 1 cur_sleeptime = min(max_sleeptime, sleeptime) for attempt in range(max_attempts): cur_jitter = random.randint(int(-jitter * 1000), int(jitter * 1000)) / 1000 yield max(0, cur_sleeptime + cur_jitter) cur_sleeptime = min(max_sleeptime, cur_sleeptime * sleepscale) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1722197851.6217132 sacad-2.8.0/sacad/sources/0000755000175000017500000000000014651523534014724 5ustar00maximemaxime././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/sources/__init__.py0000644000175000017500000000060714651441105017031 0ustar00maximemaxime"""SACAD cover sources.""" from sacad.sources.deezer import DeezerCoverSource, DeezerCoverSourceResult # noqa: F401 from sacad.sources.discogs import DiscogsCoverSource, DiscogsCoverSourceResult # noqa: F401 from sacad.sources.itunes import ItunesCoverSource, ItunesCoverSourceResult # noqa: F401 from sacad.sources.lastfm import LastFmCoverSource, LastFmCoverSourceResult # noqa: F401 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/sources/base.py0000644000175000017500000002230014651441105016202 0ustar00maximemaxime"""Common code for all cover sources.""" import abc import asyncio import itertools import logging import operator import os import random import string import unicodedata import urllib.parse import appdirs import web_cache from sacad import http_helpers from sacad.cover import CoverSourceQuality # noqa: F401 MAX_THUMBNAIL_SIZE = 256 class CoverSource(metaclass=abc.ABCMeta): """Base class for all cover sources.""" def __init__( self, target_size, size_tolerance_prct, *, min_delay_between_accesses=0, jitter_range_ms=None, rate_limited_domains=None, allow_cookies=False, ): self.target_size = target_size self.size_tolerance_prct = size_tolerance_prct self.logger = logging.getLogger(self.__class__.__name__) self.http = http_helpers.Http( allow_session_cookies=allow_cookies, min_delay_between_accesses=min_delay_between_accesses, jitter_range_ms=jitter_range_ms, rate_limited_domains=rate_limited_domains, logger=self.logger, ) self.ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0" if not hasattr(__class__, "api_cache"): db_filepath = os.path.join(appdirs.user_cache_dir(appname="sacad", appauthor=False), "sacad-cache.sqlite") os.makedirs(os.path.dirname(db_filepath), exist_ok=True) day_s = 60 * 60 * 24 __class__.api_cache = web_cache.WebCache( db_filepath, "cover_source_api_data", caching_strategy=web_cache.CachingStrategy.FIFO, expiration=random.randint(day_s * 7, day_s * 14), # 1-2 weeks compression=web_cache.Compression.DEFLATE, ) __class__.probe_cache = web_cache.WebCache( db_filepath, "cover_source_probe_data", caching_strategy=web_cache.CachingStrategy.FIFO, expiration=day_s * 30 * 6, ) # 6 months logging.getLogger("Cache").debug( f"Total size of file {db_filepath!r}: {__class__.api_cache.getDatabaseFileSize()}" ) for cache, cache_name in zip( (__class__.api_cache, __class__.probe_cache), ("cover_source_api_data", "cover_source_probe_data") ): purged_count = cache.purge() logging.getLogger("Cache").debug( f"{purged_count} obsolete entries have been removed from 
cache {cache_name!r}" ) row_count = len(cache) logging.getLogger("Cache").debug(f"Cache {cache_name!r} contains {row_count} entries") async def closeSession(self): """Close HTTP session to make aiohttp happy.""" await self.http.close() async def search(self, album, artist): """Search for a given album/artist and return an iterable of CoverSourceResult.""" self.logger.debug(f"Searching with source {self.__class__.__name__!r}...") album = self.processAlbumString(album) artist = self.processArtistString(artist) url_data = self.getSearchUrl(album, artist) if isinstance(url_data, tuple): url, post_data = url_data else: url = url_data post_data = None try: store_in_cache_callback, api_data = await self.fetchResults(url, post_data) results = await self.parseResults(api_data, search_album=album, search_artist=artist) except Exception as e: # raise self.logger.warning( f"Search with source {self.__class__.__name__!r} failed: {e.__class__.__qualname__} {e}" ) return () else: if results: # only store in cache if parsing succeeds and we have results await store_in_cache_callback() # get metadata futures = [] for result in filter(operator.methodcaller("needMetadataUpdate"), results): coroutine = result.updateImageMetadata() future = asyncio.ensure_future(coroutine) futures.append(future) if futures: await asyncio.wait(futures) for future in futures: future.result() # raise pending exception if any # filter results_excluded_count = 0 reference_only_count = 0 results_kept = [] for result in results: if ( (result.size[0] + (self.size_tolerance_prct * self.target_size / 100) < self.target_size) or ( # skip too small images result.size[1] + (self.size_tolerance_prct * self.target_size / 100) < self.target_size ) or (result.format is None) or result.needMetadataUpdate() # unknown format ): # if still true, it means we failed to grab metadata, so exclude it if result.source_quality.isReference(): # we keep this result just for the reference, it will be excluded from the results result.is_only_reference = True results_kept.append(result) reference_only_count += 1 else: results_excluded_count += 1 else: results_kept.append(result) result_kept_count = len(results_kept) - reference_only_count # log self.logger.info( f"Got {result_kept_count} relevant ({results_excluded_count + reference_only_count} excluded) results " f"from source {self.__class__.__name__!r}" ) for result in itertools.filterfalse(operator.attrgetter("is_only_reference"), results_kept): self.logger.debug( "%s %s%s %4dx%4d %s%s" % ( result.__class__.__name__, ("(%02d) " % (result.rank)) if result.rank is not None else "", result.format.name, result.size[0], result.size[1], result.urls[0], " [x%u]" % (len(result.urls)) if len(result.urls) > 1 else "", ) ) return results_kept async def fetchResults(self, url, post_data=None): """Get a (store in cache callback, search results) tuple from an URL.""" if post_data is not None: self.logger.debug(f"Querying URL {url!r} {dict(post_data)}...") else: self.logger.debug(f"Querying URL {url!r}...") headers = {} self.updateHttpHeaders(headers) return await self.http.query(url, post_data=post_data, headers=headers, cache=__class__.api_cache) async def probeUrl(self, url, response_headers=None): """Probe URL reachability from cache or HEAD request.""" self.logger.debug(f"Probing URL {url!r}...") headers = {} self.updateHttpHeaders(headers) resp_headers = {} resp_ok = await self.http.isReachable( url, headers=headers, response_headers=resp_headers, cache=__class__.probe_cache ) if response_headers is not None: 
response_headers.update(resp_headers) return resp_ok @staticmethod def assembleUrl(base_url, params): """Build an URL from a base URL and parameters.""" return f"{base_url}?{urllib.parse.urlencode(params)}" @staticmethod def unaccentuate(s): """Replace accentuated chars in string by their non accentuated equivalent.""" return "".join(c for c in unicodedata.normalize("NFKD", s) if not unicodedata.combining(c)) @staticmethod def unpunctuate(s, *, char_blacklist=string.punctuation): """Remove punctuation from string s.""" # remove punctuation s = "".join(c for c in s if c not in char_blacklist) # remove consecutive spaces return " ".join(filter(None, s.split(" "))) # # The following methods can or should be overridden in subclasses # def processQueryString(self, s): """Process artist or album string before building query URL.""" return __class__.unpunctuate(s.lower()) def processArtistString(self, artist): """Process artist string before building query URL.""" return self.processQueryString(artist) def processAlbumString(self, album): """Process album string before building query URL.""" return self.processQueryString(album) @abc.abstractmethod def getSearchUrl(self, album, artist): """ Build a search results URL from an album and/or artist name. If the URL must be accessed with an HTTP GET request, return the URL as a string. If the URL must be accessed with an HTTP POST request, return a tuple with: - the URL as a string - the post parameters as a collections.OrderedDict """ pass def updateHttpHeaders(self, headers): """Add API specific HTTP headers.""" pass @abc.abstractmethod async def parseResults(self, api_data, *, search_album, search_artist): """Parse API data and return an iterable of results.""" pass ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/sources/deezer.py0000644000175000017500000000603314651441105016547 0ustar00maximemaxime"""Deezer cover source.""" import collections import json import operator from sacad.cover import CoverImageFormat, CoverImageMetadata, CoverSourceQuality, CoverSourceResult from sacad.sources.base import CoverSource class DeezerCoverSourceResult(CoverSourceResult): """Deezer search cover result.""" def __init__(self, *args, **kwargs): super().__init__( *args, source_quality=CoverSourceQuality.EXACT_SEARCH | CoverSourceQuality.NO_UNRELATED_RESULT_RISK, **kwargs, ) class DeezerCoverSource(CoverSource): """ Cover source using the official Deezer API. 
https://developers.deezer.com/api/ """ BASE_URL = "https://api.deezer.com/search" COVER_SIZES = { "cover_small": (56, 56), "cover": (120, 120), "cover_medium": (250, 250), "cover_big": (500, 500), "cover_xl": (1000, 1000), } def __init__(self, *args, **kwargs): super().__init__(*args, min_delay_between_accesses=0.1, **kwargs) def getSearchUrl(self, album, artist): """See CoverSource.getSearchUrl.""" # build request url search_params = collections.OrderedDict() search_params["artist"] = artist search_params["album"] = album url_params = collections.OrderedDict() # url_params["strict"] = "on" url_params["order"] = "RANKING" url_params["q"] = " ".join(f'{k}:"{v}"' for k, v in search_params.items()) return __class__.assembleUrl(__class__.BASE_URL, url_params) def processQueryString(self, s): """See CoverSource.processQueryString.""" # API search is fuzzy, no need to alter query return s async def parseResults(self, api_data, *, search_album, search_artist): """See CoverSource.parseResults.""" results = [] # get unique albums json_data = json.loads(api_data) albums = [] index_exact_match = 0 for e in json_data["data"]: album = e["album"] album_id = album["id"] if album_id in map(operator.itemgetter("id"), albums): continue if album["title"].lower() == search_album.lower(): # override default sorting by putting exact matches first albums.insert(index_exact_match, album) index_exact_match += 1 else: albums.append(album) for rank, album in enumerate(albums, 1): for key, size in __class__.COVER_SIZES.items(): img_url = album[key] thumbnail_url = album["cover_small"] results.append( DeezerCoverSourceResult( img_url, size, CoverImageFormat.JPEG, thumbnail_url=thumbnail_url, source=self, rank=rank, check_metadata=CoverImageMetadata.NONE, ) ) return results ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/sources/discogs.py0000644000175000017500000000600314651441105016721 0ustar00maximemaxime"""Discogs cover source.""" import collections import json from sacad import __version__ from sacad.cover import CoverImageFormat, CoverImageMetadata, CoverSourceQuality, CoverSourceResult from sacad.sources.base import CoverSource FUZZY_MODE = False class DiscogsCoverSourceResult(CoverSourceResult): """Discogs search cover result.""" def __init__(self, *args, **kwargs): quality = CoverSourceQuality.NO_UNRELATED_RESULT_RISK if FUZZY_MODE: quality |= CoverSourceQuality.FUZZY_SEARCH else: quality |= CoverSourceQuality.EXACT_SEARCH super().__init__(*args, source_quality=quality, **kwargs) class DiscogsCoverSource(CoverSource): """ Cover source using the official Discogs API. 
See https://www.discogs.com/developers """ BASE_URL = "https://api.discogs.com" API_KEY = "cGWMOYjQNdWYKXDaxVnR" API_SECRET = "NCyWcKHWLAvAreyjDdvVogBzVnzPEEDf" # not that secret in fact def __init__(self, *args, **kwargs): # https://www.discogs.com/developers#page:home,header:home-rate-limiting super().__init__(*args, min_delay_between_accesses=1, **kwargs) def getSearchUrl(self, album, artist): """See CoverSource.getSearchUrl.""" url_params = collections.OrderedDict() if FUZZY_MODE: url_params["q"] = f"{artist} - {album}" else: url_params["artist"] = artist url_params["release_title"] = album url_params["type"] = "release" return __class__.assembleUrl(f"{__class__.BASE_URL}/database/search", url_params) def updateHttpHeaders(self, headers): """See CoverSource.updateHttpHeaders.""" headers["User-Agent"] = f"sacad/{__version__}" headers["Accept"] = "application/vnd.discogs.v2.discogs+json" headers["Authorization"] = f"Discogs key={__class__.API_KEY}, secret={__class__.API_SECRET}" async def parseResults(self, api_data, *, search_album, search_artist): """See CoverSource.parseResults.""" json_data = json.loads(api_data) results = [] for rank, release in enumerate(json_data["results"], 1): if release["formats"][0]["name"] != "CD": continue thumbnail_url = release["thumb"] img_url = release["cover_image"] url_tokens = img_url.split("/") url_tokens.reverse() try: img_width = int(next(t for t in url_tokens if t.startswith("w:")).split(":", 1)[-1]) img_height = int(next(t for t in url_tokens if t.startswith("h:")).split(":", 1)[-1]) except StopIteration: continue result = DiscogsCoverSourceResult( img_url, (img_width, img_height), CoverImageFormat.JPEG, thumbnail_url=thumbnail_url, source=self, rank=rank, check_metadata=CoverImageMetadata.NONE, ) results.append(result) return results ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/sources/itunes.py0000644000175000017500000000545514651441105016607 0ustar00maximemaxime"""Itunes cover source.""" import collections import json from sacad.cover import SUPPORTED_IMG_FORMATS as EXTENSION_FORMAT from sacad.cover import CoverImageFormat, CoverImageMetadata, CoverSourceQuality, CoverSourceResult from sacad.sources.base import CoverSource class ItunesCoverSourceResult(CoverSourceResult): """Itunes search cover result.""" def __init__(self, *args, **kwargs): super().__init__( *args, source_quality=CoverSourceQuality.EXACT_SEARCH | CoverSourceQuality.NO_UNRELATED_RESULT_RISK, **kwargs, ) class ItunesCoverSource(CoverSource): """Itunes cover source.""" SEARCH_URL = "https://itunes.apple.com/search" def __init__(self, *args, **kwargs): # https://stackoverflow.com/questions/12596300/itunes-search-api-rate-limit super().__init__(*args, min_delay_between_accesses=3, **kwargs) def getSearchUrl(self, album, artist): """See CoverSource.getSearchUrl.""" url_params = collections.OrderedDict() url_params["media"] = "music" url_params["entity"] = "album" url_params["term"] = f"{artist} {album}" return __class__.assembleUrl(__class__.SEARCH_URL, url_params) async def parseResults(self, api_data, *, search_album, search_artist): """See CoverSource.parseResults.""" json_data = json.loads(api_data) results = [] for rank, result in enumerate(json_data["results"], 1): if (search_album != self.processAlbumString(result["collectionName"])) or ( search_artist != self.processArtistString(result["artistName"]) ): continue thumbnail_url = result["artworkUrl60"] base_img_url = result["artworkUrl60"].rsplit("/", 1)[0] 
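# Illustration of the probe loop below, with an invented URL (real iTunes URLs differ):
# if artworkUrl60 is "https://img.example.invalid/cover/60x60bb.jpg", then base_img_url is
# "https://img.example.invalid/cover", and the candidates probed are 5000x5000.png,
# 5000x5000-100.jpg, then the same pattern for 1200 and 600, before falling back to
# the 100x100 artworkUrl100 if none of them is reachable.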
url_found = False for img_size in (5000, 1200, 600): for img_format in (CoverImageFormat.PNG, CoverImageFormat.JPEG): suffix = "-100.jpg" if (img_format is CoverImageFormat.JPEG) else ".png" img_url = f"{base_img_url}/{img_size}x{img_size}{suffix}" if await self.probeUrl(img_url): url_found = True break if url_found: break else: img_url = result["artworkUrl100"] img_format = EXTENSION_FORMAT[img_url.rsplit(".", 1)[-1]] img_size = 100 result = ItunesCoverSourceResult( img_url, (img_size, img_size), img_format, thumbnail_url=thumbnail_url, source=self, rank=rank, check_metadata=CoverImageMetadata.NONE, ) results.append(result) return results ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/sources/lastfm.py0000644000175000017500000000676314651441105016571 0ustar00maximemaxime"""LastFM cover source.""" import collections import os.path import string import xml.etree.ElementTree from sacad.cover import SUPPORTED_IMG_FORMATS, CoverImageMetadata, CoverSourceQuality, CoverSourceResult from sacad.sources.base import MAX_THUMBNAIL_SIZE, CoverSource class LastFmCoverSourceResult(CoverSourceResult): """LastFM cover search result.""" def __init__(self, *args, **kwargs): super().__init__( *args, source_quality=CoverSourceQuality.EXACT_SEARCH | CoverSourceQuality.NO_UNRELATED_RESULT_RISK, **kwargs, ) class LastFmCoverSource(CoverSource): """ Cover source using the official LastFM API. http://www.lastfm.fr/api/show?service=290 """ BASE_URL = "https://ws.audioscrobbler.com/2.0/" API_KEY = "2410a53db5c7490d0f50c100a020f359" SIZES = { "small": (34, 34), "medium": (64, 64), "large": (174, 174), "extralarge": (300, 300), "mega": (600, 600), } # this is actually between 600 and 900, sometimes even more (ie 1200) def __init__(self, *args, **kwargs): super().__init__(*args, min_delay_between_accesses=0.1, **kwargs) def getSearchUrl(self, album, artist): """See CoverSource.getSearchUrl.""" # build request url params = collections.OrderedDict() params["method"] = "album.getinfo" params["api_key"] = __class__.API_KEY params["album"] = album params["artist"] = artist return __class__.assembleUrl(__class__.BASE_URL, params) def processQueryString(self, s): """See CoverSource.processQueryString.""" char_blacklist = set(string.punctuation) char_blacklist.remove("'") char_blacklist.remove("&") char_blacklist = frozenset(char_blacklist) return __class__.unpunctuate(s.lower(), char_blacklist=char_blacklist) async def parseResults(self, api_data, *, search_album, search_artist): """See CoverSource.parseResults.""" results = [] # get xml results list xml_text = api_data.decode("utf-8") xml_root = xml.etree.ElementTree.fromstring(xml_text) status = xml_root.get("status") if status != "ok": raise Exception(f"Unexpected Last.fm response status: {status}") img_elements = xml_root.findall("album/image") # build results from xml thumbnail_url = None thumbnail_size = None for img_element in img_elements: img_url = img_element.text if not img_url: # last.fm returns empty image tag for size it does not have continue lfm_size = img_element.get("size") if lfm_size == "mega": check_metadata = CoverImageMetadata.SIZE else: check_metadata = CoverImageMetadata.NONE try: size = __class__.SIZES[lfm_size] except KeyError: continue if (size[0] <= MAX_THUMBNAIL_SIZE) and ((thumbnail_size is None) or (size[0] < thumbnail_size)): thumbnail_url = img_url thumbnail_size = size[0] format = os.path.splitext(img_url)[1][1:].lower() format = SUPPORTED_IMG_FORMATS[format] results.append( 
LastFmCoverSourceResult( img_url, size, format, thumbnail_url=thumbnail_url, source=self, check_metadata=check_metadata ) ) return results ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722171973.0 sacad-2.8.0/sacad/tqdm_logging.py0000644000175000017500000000262614651441105016265 0ustar00maximemaxime"""Code to help using the logging module with tqdm progress bars.""" import contextlib import logging import threading logging_handlers_lock = threading.Lock() class TqdmLoggingHandler(logging.Handler): """Logging handler sending messages to the tqdm write method (avoids overlap).""" def __init__(self, tqdm, *args, **kwargs): self.tqdm = tqdm super().__init__(*args, **kwargs) def emit(self, record): """See logging.Handler.emit.""" msg = self.format(record) self.tqdm.write(msg) @contextlib.contextmanager def redirect_logging(tqdm_obj, logger=logging.getLogger()): """Redirect logging to a TqdmLoggingHandler object and then restore the original logging behavior.""" with logging_handlers_lock: # remove current handlers prev_handlers = [] for handler in logger.handlers.copy(): prev_handlers.append(handler) logger.removeHandler(handler) # add tqdm handler tqdm_handler = TqdmLoggingHandler(tqdm_obj) if prev_handlers[-1].formatter is not None: tqdm_handler.setFormatter(prev_handlers[-1].formatter) logger.addHandler(tqdm_handler) try: yield finally: # restore handlers with logging_handlers_lock: logger.removeHandler(tqdm_handler) for handler in prev_handlers: logger.addHandler(handler) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1722197851.6217132 sacad-2.8.0/sacad.egg-info/0000755000175000017500000000000014651523534014733 5ustar00maximemaxime././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722197851.0 sacad-2.8.0/sacad.egg-info/PKG-INFO0000644000175000017500000001532614651523533016036 0ustar00maximemaximeMetadata-Version: 2.1 Name: sacad Version: 2.8.0 Summary: Search and download music album covers Home-page: https://github.com/desbma/sacad Download-URL: https://github.com/desbma/sacad/archive/2.8.0.tar.gz Author: desbma Keywords: download,album,cover,art,albumart,music Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Console Classifier: Intended Audience :: End Users/Desktop Classifier: License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0) Classifier: Natural Language :: English Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3 :: Only Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: 3.9 Classifier: Programming Language :: Python :: 3.10 Classifier: Programming Language :: Python :: 3.11 Classifier: Programming Language :: Python :: 3.12 Classifier: Topic :: Internet :: WWW/HTTP Classifier: Topic :: Multimedia :: Graphics Classifier: Topic :: Utilities Description-Content-Type: text/markdown License-File: LICENSE Requires-Dist: aiohttp>=3.6 Requires-Dist: appdirs>=1.4.0 Requires-Dist: bitarray>=2.0.0 Requires-Dist: cssselect>=1.1.0 Requires-Dist: lxml>=4.0.0 Requires-Dist: mutagen>=1.31 Requires-Dist: pillow>=8.0.0 Requires-Dist: tqdm>=4.28.1 Requires-Dist: unidecode>=1.1.1 Requires-Dist: web_cache>=1.1.0 # SACAD ## Smart Automatic Cover Art Downloader [![PyPI version](https://img.shields.io/pypi/v/sacad.svg?style=flat)](https://pypi.python.org/pypi/sacad/) [![AUR 
version](https://img.shields.io/aur/version/sacad.svg?style=flat)](https://aur.archlinux.org/packages/sacad/) [![CI status](https://img.shields.io/github/actions/workflow/status/desbma/sacad/ci.yml)](https://github.com/desbma/sacad/actions) [![Supported Python versions](https://img.shields.io/pypi/pyversions/sacad.svg?style=flat)](https://pypi.python.org/pypi/sacad/) [![License](https://img.shields.io/github/license/desbma/sacad.svg?style=flat)](https://github.com/desbma/sacad/blob/master/LICENSE) SACAD is a multi platform command line tool to download album covers without manual intervention, ideal for integration in scripts, audio players, etc. SACAD also provides a second command line tool, `sacad_r`, to scan a music library, read metadata from audio tags, and download missing covers automatically, optionally embedding the image into audio files. ## Features - Can target specific image size, and find results for high resolution covers - Support JPEG and PNG formats - Customizable output: save image along with the audio files / in a different directory named by artist/album / embed cover in audio files... - Currently support the following cover sources: - ~~Amazon CD (.com, .ca, .cn, .fr, .de, .co.jp and .co.uk variants) & Amazon digital music~~ (removed, too unreliable) - ~~CoverLib~~ (site is dead) - Deezer - Discogs - ~~Google Images~~ (removed, too unreliable) - Last.fm - Itunes - Smart sorting algorithm to select THE best cover for a given query, using several factors: source reliability, image format, image size, image similarity with reference cover, etc. - Automatically crunch images with optipng, oxipng or jpegoptim (can save 30% of filesize without any loss of quality, great for portable players) - Cache search results locally for faster future searches - Do everything to avoid getting blocked by the sources: hide user-agent and automatically take care of rate limiting - Automatically convert/resize image if needed - Multiplatform (Windows/Mac/Linux) SACAD is designed to be robust and be executed in batches of thousands of queries: - HTML parsing is done without regex but with the LXML library, which is faster, and more robust to page changes - When the size of an image reported by a source is not reliable (i.e. Google Images), automatically download the first KB of the file to get its real size from the file header - Process several queries simultaneously (using [asyncio](https://docs.python.org/3/library/asyncio.html)), to speed up processing - Automatically reuse TCP connections (HTTP Keep-Alive), for better network performance - Automatically retry failed HTTP requests - Music library scan supports all common audio formats (MP3, AAC, Vorbis, FLAC..) - Cover sources page or API changes are quickly detected, thanks to high test coverage, and SACAD is quickly updated accordingly ## Installation SACAD requires [Python](https://www.python.org/downloads/) >= 3.8. ### Standalone Windows executable Windows users can download a [standalone binary](https://github.com/desbma/sacad/releases/latest) which does not require Python. ### Arch Linux Arch Linux users can install the [sacad](https://aur.archlinux.org/packages/sacad/) AUR package. ### From PyPI (with PIP) 1. If you don't already have it, [install pip](https://pip.pypa.io/en/stable/installing/) for Python 3 2. Install SACAD: `pip3 install sacad` ### From source 1. If you don't already have it, [install setuptools](https://pypi.python.org/pypi/setuptools#installation-instructions) for Python 3 2. 
Clone this repository: `git clone https://github.com/desbma/sacad` 3. Install SACAD: `python3 setup.py install` #### Optional Additionally, if you want to benefit from image crunching (lossless recompression to save additional space): - Install [oxipng](https://github.com/shssoichiro/oxipng) or [optipng](http://optipng.sourceforge.net/) - Install [jpegoptim](http://freecode.com/projects/jpegoptim) On Ubuntu and other Debian derivatives, you can install them with `sudo apt-get install optipng jpegoptim`. Note that depending on the speed of your CPU, crunching may significantly slow down processing as it is very CPU intensive (especially with optipng). ## Command line usage Two tools are provided: `sacad` to search and download one cover, and `sacad_r` to scan a music library and download all missing covers. Run `sacad -h` / `sacad_r -h` to get the full command line reference. ### Examples To download the cover of _Master of Puppets_ from _Metallica_, to the file `AlbumArt.jpg`, targeting ~ 600x600 pixel resolution: `sacad "metallica" "master of puppets" 600 AlbumArt.jpg`. To download covers for your library with the same parameters as the previous example: `sacad_r library_directory 600 AlbumArt.jpg`. ## Limitations - Only supports front covers ## Adding cover sources Adding a new cover source is very easy if you are a Python developer: you need to inherit the `CoverSource` class and implement the following methods: - `getSearchUrl(self, album, artist)` - `parseResults(self, api_data)` - `updateHttpHeaders(self, headers)` (optional) See comments in the code for more information; an illustrative skeleton is sketched further below. ## License [Mozilla Public License Version 2.0](https://www.mozilla.org/MPL/2.0/) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722197851.0 sacad-2.8.0/sacad.egg-info/SOURCES.txt0000644000175000017500000000113714651523533016620 0ustar00maximemaximeLICENSE MANIFEST.in README.md pyproject.toml requirements.txt setup.py test-requirements.txt sacad/__init__.py sacad/__main__.py sacad/colored_logging.py sacad/cover.py sacad/http_helpers.py sacad/mkstemp_ctx.py sacad/rate_watcher.py sacad/recurse.py sacad/redo.py sacad/tqdm_logging.py sacad.egg-info/PKG-INFO sacad.egg-info/SOURCES.txt sacad.egg-info/dependency_links.txt sacad.egg-info/entry_points.txt sacad.egg-info/requires.txt sacad.egg-info/top_level.txt sacad/sources/__init__.py sacad/sources/base.py sacad/sources/deezer.py sacad/sources/discogs.py sacad/sources/itunes.py sacad/sources/lastfm.py././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722197851.0 sacad-2.8.0/sacad.egg-info/dependency_links.txt0000644000175000017500000000000114651523533021000 0ustar00maximemaxime ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722197851.0 sacad-2.8.0/sacad.egg-info/entry_points.txt0000644000175000017500000000011014651523533020220 0ustar00maximemaxime[console_scripts] sacad = sacad:cl_main sacad_r = sacad.recurse:cl_main ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722197851.0 sacad-2.8.0/sacad.egg-info/requires.txt0000644000175000017500000000022414651523533017330 0ustar00maximemaximeaiohttp>=3.6 appdirs>=1.4.0 bitarray>=2.0.0 cssselect>=1.1.0 lxml>=4.0.0 mutagen>=1.31 pillow>=8.0.0 tqdm>=4.28.1 unidecode>=1.1.1 web_cache>=1.1.0 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722197851.0 sacad-2.8.0/sacad.egg-info/top_level.txt0000644000175000017500000000000614651523533017460 0ustar00maximemaximesacad 
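The README's "Adding cover sources" section above says a new source only needs to subclass `CoverSource`; the minimal skeleton below illustrates that shape, modeled on `DeezerCoverSource` from sacad/sources/deezer.py. The endpoint `api.example.com` and its JSON field names are invented for the example and do not belong to any real service.

```python
import collections
import json

from sacad.cover import CoverImageFormat, CoverImageMetadata, CoverSourceQuality, CoverSourceResult
from sacad.sources.base import CoverSource


class ExampleCoverSourceResult(CoverSourceResult):
    """Search result for the hypothetical example cover source."""

    def __init__(self, *args, **kwargs):
        super().__init__(
            *args,
            source_quality=CoverSourceQuality.EXACT_SEARCH | CoverSourceQuality.NO_UNRELATED_RESULT_RISK,
            **kwargs,
        )


class ExampleCoverSource(CoverSource):
    """Cover source for a hypothetical JSON search API."""

    BASE_URL = "https://api.example.com/search"  # invented endpoint

    def getSearchUrl(self, album, artist):
        """See CoverSource.getSearchUrl."""
        url_params = collections.OrderedDict()
        url_params["artist"] = artist
        url_params["album"] = album
        return __class__.assembleUrl(__class__.BASE_URL, url_params)

    async def parseResults(self, api_data, *, search_album, search_artist):
        """See CoverSource.parseResults."""
        results = []
        # "albums", "cover_url", "width", "height" and "thumbnail_url" are invented field names
        for rank, album in enumerate(json.loads(api_data)["albums"], 1):
            results.append(
                ExampleCoverSourceResult(
                    album["cover_url"],
                    (album["width"], album["height"]),
                    CoverImageFormat.JPEG,
                    thumbnail_url=album["thumbnail_url"],
                    source=self,
                    rank=rank,
                    check_metadata=CoverImageMetadata.NONE,
                )
            )
        return results
```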
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1722197851.6217132 sacad-2.8.0/setup.cfg0000644000175000017500000000004614651523534014007 0ustar00maximemaxime[egg_info] tag_build = tag_date = 0 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1720295194.0 sacad-2.8.0/setup.py0000755000175000017500000000431414642317432013703 0ustar00maximemaxime#!/usr/bin/env python3 """Package setup.""" import os import re import sys from setuptools import find_packages, setup if sys.hexversion < 0x3080000: print("Python version %s is unsupported, >= 3.8.0 is needed" % (".".join(map(str, sys.version_info[:3])))) exit(1) with open(os.path.join("sacad", "__init__.py"), "rt") as f: version_match = re.search('__version__ = "([^"]+)"', f.read()) assert version_match is not None version = version_match.group(1) with open("requirements.txt", "rt") as f: requirements = f.read().splitlines() if sys.platform == "win32": requirements.append("idna==2.6") with open("test-requirements.txt", "rt") as f: test_requirements = f.read().splitlines() with open("README.md", "rt") as f: readme = f.read() setup( name="sacad", version=version, author="desbma", packages=find_packages(exclude=("tests",)), entry_points={"console_scripts": ["sacad = sacad:cl_main", "sacad_r = sacad.recurse:cl_main"]}, test_suite="tests", install_requires=requirements, tests_require=test_requirements, description="Search and download music album covers", long_description=readme, long_description_content_type="text/markdown", url="https://github.com/desbma/sacad", download_url="https://github.com/desbma/sacad/archive/%s.tar.gz" % (version), keywords=["download", "album", "cover", "art", "albumart", "music"], classifiers=[ "Development Status :: 5 - Production/Stable", "Environment :: Console", "Intended Audience :: End Users/Desktop", "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3 :: Only", "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.9", "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.12", "Topic :: Internet :: WWW/HTTP", "Topic :: Multimedia :: Graphics", "Topic :: Utilities", ], ) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1564081451.0 sacad-2.8.0/test-requirements.txt0000644000175000017500000000002013516376453016424 0ustar00maximemaximerequests>=2.6.0
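A closing usage sketch for the retry helper in sacad/redo.py, mirroring the yield-then-sleep pattern of Http.query above; the do_request coroutine parameter is hypothetical:

import asyncio

from sacad import redo


async def fetch_with_retries(do_request, max_attempts=3):
    """Await do_request(), sleeping between failed attempts for as long as retrier dictates."""
    for attempt, time_to_sleep in enumerate(
        redo.retrier(max_attempts=max_attempts, sleeptime=1, max_sleeptime=5, sleepscale=1.5), 1
    ):
        try:
            return await do_request()
        except Exception:
            if attempt == max_attempts:
                raise
            # retrier only yields the delay; the caller does the actual (async) sleeping
            await asyncio.sleep(time_to_sleep)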