pax_global_header00006660000000000000000000000064140043361450014512gustar00rootroot0000000000000052 comment=f9cbcb2315f5712bfac770b41c66fea45a506fff python-internetarchive-1.9.9/000077500000000000000000000000001400433614500162435ustar00rootroot00000000000000python-internetarchive-1.9.9/.gitignore000066400000000000000000000003231400433614500202310ustar00rootroot00000000000000.coverage *.pyc build dist *.egg-info stairs itemlist.txt .tox TAGS *.csv htmlcov *.log *.pex pex/ wheelhouse *gz .venv* .cache .vagrant .idea v2/ v3.6/ v3.7/ .pytest_cache/ .python-version trash/ .vscode v3.8/ python-internetarchive-1.9.9/.travis.yml000066400000000000000000000003321400433614500203520ustar00rootroot00000000000000os: linux dist: xenial language: python install: pip install tox-travis python: - "2.7" - "3.6" - "3.7" - "3.8" - "3.9" - "3.10-dev" jobs: allow_failures: - python: "3.10-dev" script: tox python-internetarchive-1.9.9/AUTHORS.rst000066400000000000000000000005431400433614500201240ustar00rootroot00000000000000Authors ======= The Internet Archive Python library and command-line tool is written and maintained by Jake Johnson and various contributors: Development Lead ---------------- - Jake Johnson Contributors ------------ - Bryce Drennan Patches and Suggestions ----------------------- - VM Brasseur python-internetarchive-1.9.9/CONTRIBUTING.rst000066400000000000000000000055261400433614500207140ustar00rootroot00000000000000How to Contribute ================= Thank you for considering contributing. All contributions are welcome and appreciated! Support Questions ----------------- Please don't use the Github issue tracker for asking support questions. All support questions should be emailed to `info@archive.org `_. Bug Reports ----------- `Github issues `_ is used for tracking bugs. Please consider the following when opening an issue: - Avoid opening duplicate issues by taking a look at the current open issues. 
- Provide details on the version, operating system and Python version you are running. - Include complete tracebacks and error messages. Pull Requests ------------- All pull requests and patches are welcome, but please consider the following: - Include tests. - Include documentation for new features. - If your patch is supposed to fix a bug, please describe in as much detail as possible the circumstances in which the bug happens. - Please follow `PEP8 `_, with the exception of what is ignored in `setup.cfg `_. PEP8 compliance is checked when tests run. Tests will fail if your patch is not PEP8 compliant. - Add yourself to AUTHORS.rst. - Avoid introducing new dependencies. - Open an issue if a relevant one is not already open, so others have visibility into what you're working on and efforts aren't duplicated. - Clarity is preferred over brevity. Running Tests ------------- The minimal requirements for running tests are ``pytest``, ``pytest-pep8`` and ``responses``: .. code:: bash $ pip install pytest pytest-pep8 responses Clone the `internetarchive lib `_: .. code:: bash $ git clone https://github.com/jjjake/internetarchive Install the `internetarchive lib `_ as an editable package: .. code:: bash $ cd internetarchive $ pip install -e . Run the tests: .. code:: bash $ py.test --pep8 Note that this will only test against the Python version you are currently using; however, ``internetarchive`` tests against multiple Python versions defined in `tox.ini `_. Tests must pass on all versions defined in ``tox.ini`` for all pull requests. To test against all supported Python versions, first make sure you have all of the required versions of Python installed. Then simply install and execute tox from the root directory of the repo: .. code:: bash $ pip install tox $ tox Even easier is simply creating a pull request. `Travis `_ is used for continuous integration, and is set up to run the full test suite whenever a pull request is submitted or updated. 
python-internetarchive-1.9.9/HISTORY.rst000066400000000000000000000622621400433614500201460ustar00rootroot00000000000000.. :changelog: Release History --------------- 1.9.9 (2021-01-27) ++++++++++++++++++ **Features and Improvements** - Added support for FTS API. - Validate identifiers in spreadsheet before uploading file with ``ia upload --spreadsheet``. - Added ``ia configure --print-cookies``. This is helpful for using your archive.org cookies in other programs like ``curl``. e.g. ``curl -b $(ia configure --print-cookies) ...`` 1.9.6 (2020-11-10) ++++++++++++++++++ **Features and Improvements** - Added ability to submit tasks with a reduced priority. - Added ability to add headers to modify_metadata requests. **Bugfixes** - Bumped version requirements for ``six``. This addresses the "No module named collections_abc" error. 1.9.5 (2020-09-18) ++++++++++++++++++ **Features and Improvements** - Increased chunk size in download and added other download optimizations. - Added support for submitting reviews via ``Item.review()`` and ``ia review``. - Improved exception/error messages in cases where s3.us.archive.org returns invalid XML during uploads. - Minor updates and improvements to continuous integration. 1.9.4 (2020-06-24) ++++++++++++++++++ **Features and Improvements** - Added support for adding file-level metadata at time of upload. - Added ``--no-backup`` to ``ia upload`` to turn off backups. **Bugfixes** - Fixed bug in ``internetarchive.get_tasks`` where no tasks were returned unless ``catalog`` or ``history`` params were provided. - Fixed bug in upload where headers were being reused in certain cases. This led to issues such as queue-derive being turned off in some cases. - Fix crash in ``ia tasks`` when a task log contains an invalid UTF-8 character. - Fixed bug in upload where requests were not being closed. 1.9.3 (2020-04-07) ++++++++++++++++++ **Features and Improvements** - Added support for removing items from simplelists as if they were collections. 
- Added ``Item.derive()`` method for deriving items. - Added ``Item.fixer()`` method for submitting fixer tasks. - Added ``--task-args`` to ``ia tasks`` for submitting task args to the Tasks API. **Bugfixes** - Minor bug fix in ``ia tasks`` to fix support for tasks that do not require a ``--comment`` option. 1.9.2 (2020-03-15) ++++++++++++++++++ **Features and Improvements** - Switched to ``tqdm`` for progress bar (``clint`` is no longer maintained). - Added ``Item.identifier_available()`` method for calling check_identifier.php. - Added support for opening details page in default browser after upload. - Added support for using ``item`` or ``identifier`` as column header in spreadsheet mode. - Added ``ArchiveSession.get_my_catalog()`` method for retrieving running/queued tasks. - Removed backports.csv requirement for newer Python releases. - Authorization header is now used for metadata reads, to support privileged access to /metadata. - ``ia download`` no longer downloads history dir by default. - Added ``ignore_history_dir`` to ``Item.download()``. The default is False. **Bugfixes** - Fixed bug in ``ia copy`` and ``ia move`` where filenames weren't being encoded/quoted correctly. - Fixed bug in ``Item.get_all_item_tasks()`` where all calls would fail unless a dict was provided to ``params``. - Read from ~/.config/ia.ini with fallback to ~/.ia regardless of the existence of ~/.config - Fixed S3 overload message always mentioning the total maximum number of retries, not the remaining ones. - Fixed bug where a KeyError exception would be raised on most calls to dark items. - Fixed bug where md5 was being calculated for every upload. 1.9.0 (2019-12-05) ++++++++++++++++++ **Features and Improvements** - Implemented new archive.org `Tasks API `_. - Added support for darking and undarking items via the Tasks API. - Added support for submitting arbitrary tasks (only darking/undarking currently supported, see Tasks API documentation). 
**Bugfixes** - ``ia download`` now displays `download failed` instead of `success` when download fails. - Fixed bug where ``Item.get_file`` would not work on unicode names in Python 2. 1.8.5 (2019-06-07) ++++++++++++++++++ **Features and Improvements** - Improved timeout logging and exceptions. - Added support for arbitrary targets to metadata write. - IA-S3 keys now supported for auth in download. - Authorization (i.e. ``ia configure``) now uses the archive.org xauthn endpoint. **Bugfixes** - Fixed encoding error in --get-task-log - Fixed bug in upload where connections were not being closed. 1.8.4 (2019-04-11) ++++++++++++++++++ **Features and Improvements** - It's now possible to retrieve task logs, given a task id, without first retrieving the item's task history. - Added examples to ``ia tasks`` help. 1.8.3 (2019-03-29) ++++++++++++++++++ **Features and Improvements** - Increased search timeout from 24 to 300 seconds. **Bugfixes** - Fixed bug in setup.py where backports.csv wasn't being installed when installing from pypi. 1.8.2 (2019-03-21) ++++++++++++++++++ **Features and Improvements** - Documentation updates. - Added support for write-many to modify_metadata. **Bugfixes** - Fixed bug in ``ia tasks --task-id`` where no task was being returned. - Fixed bug in ``internetarchive.get_tasks()`` where it was not possible to query by ``task_id``. - Fixed TypeError bug in upload when uploading with checksum=True. 1.8.1 (2018-06-28) ++++++++++++++++++ **Bugfixes** - Fixed bug in ``ia tasks --get-task-log`` that was returning an unable to parse JSON error. 1.8.0 (2018-06-28) ++++++++++++++++++ **Features and Improvements** - Only use backports.csv for python2 in support of FreeBSD port. - Added a nicer error message to ``ia search`` for authentication errors. - Added support for using netrc files in ``ia configure``. - Added ``--remove`` option to ``ia metadata`` for removing values from single or multi-field metadata elements. 
- Added support for appending a metadata value to an existing metadata element (as a new entry, not simply appending to a string). - Added ``--no-change-timestamp`` flag to ``ia download``. Downloaded files retain the timestamp of "now", not of the source material when this option is used. **Bugfixes** - Fixed bug in upload where StringIO objects were not uploadable. - Fixed encoding issues that were causing some ``ia tasks`` commands to fail. - Fixed bug where keep-old-version wasn't working in ``ia move``. - Fixed bug in ``internetarchive.api.modify_metadata`` where debug and other args were not honoured. 1.7.7 (2018-03-05) ++++++++++++++++++ **Features and Improvements** - Added support for downloading on-the-fly archive_marc.xml files. **Bugfixes** - Improved syntax checking in ``ia move`` and ``ia copy``. - Added ``Connection:close`` header to all requests to force close connections after each request. This is a workaround for dealing with a bug on archive.org servers where the server hangs up before sending the complete response. 1.7.6 (2018-01-05) ++++++++++++++++++ **Features and Improvements** - Added ability to set the remote-name for a directory in ``ia upload`` (previously you could only do this for single files). **Bugfixes** - Fixed bug in ``ia delete`` where all requests were failing due to a typo in a function arg. 1.7.5 (2017-12-07) ++++++++++++++++++ **Features and Improvements** - Turned on ``x-archive-keep-old-version`` S3 header by default for all ``ia upload``, ``ia delete``, ``ia copy``, and ``ia move`` commands. This means that any ``ia`` command that clobbers or deletes a file will save a version of the file in ``/history/files/$key.~N~``. This is only on by default in the CLI, and not in the Python lib. It can be turned off by adding ``-H x-archive-keep-old-version:0`` to any ``ia upload``, ``ia delete``, ``ia copy``, or ``ia move`` command. 
1.7.4 (2017-11-06) ++++++++++++++++++ **Features and Improvements** - Increased timeout in search from 12 seconds to 24. - Added ability to set the ``max_retries`` in :func:`internetarchive.search_items`. - Made :meth:`internetarchive.ArchiveSession.mount_http_adapter` a public method for supporting complex custom retry logic. - Added ``--timeout`` option to ``ia search`` for setting a custom timeout. - Loosened requirements for schema library to ``schema>=0.4.0``. **Bugfixes** - The scraping API has reverted to using ``items`` key rather than ``docs`` key. v1.7.3 will still work, but this change keeps ia consistent with the API. 1.7.3 (2017-09-20) ++++++++++++++++++ **Bugfixes** - Fixed bug in search where search requests were failing with ``KeyError: 'items'``. 1.7.2 (2017-09-11) ++++++++++++++++++ **Features and Improvements** - Added support for adding custom headers to ``ia search``. **Bugfixes** - ``internetarchive.utils.get_s3_xml_text()`` is used to parse errors returned by S3 in XML. Sometimes there is no XML in the response. Most of the time this is due to 5xx errors. Either way, we want to always return the HTTPError, even if the XML parsing fails. - Fixed a regression where ``:`` was being stripped from filenames in upload. - Do not create a directory in ``download()`` when ``return_responses`` is ``True``. - Fixed bug in upload where file-like objects were failing with a TypeError exception. 1.7.1 (2017-07-25) ++++++++++++++++++ **Bugfixes** - Fixed bug in ``Item.upload_file()`` where ``checksum`` was being set to ``True`` if it was set to ``None``. 1.7.1 (2017-07-25) ++++++++++++++++++ **Bugfixes** - Fixed bug in ``ia upload`` where all commands would fail if multiple collections were specified (e.g. -m collection:foo -m collection:bar). 1.7.0 (2017-07-25) ++++++++++++++++++ **Features and Improvements** - Loosened up ``jsonpatch`` requirements, as the metadata API now supports more recent versions of the JSON Patch standard. 
- Added support for building "snap" packages (https://snapcraft.io/). **Bugfixes** - Fixed bug in upload where users were unable to add their own timeout via ``request_kwargs``. - Fixed bug where files with non-ascii filenames failed to upload on some platforms. - Fixed bug in upload where metadata keys with an index (e.g. ``subject[0]``) would make the request fail if the key was the only indexed key provided. - Added a default timeout to ``ArchiveSession.s3_is_overloaded()``. If it times out now, it returns ``True`` (as in, yes, S3 is overloaded). 1.6.0 (2017-06-27) ++++++++++++++++++ **Features and Improvements** - Added 60 second timeout to all upload requests. - Added support for uploading empty files. - Refactored ``Item.get_files()`` to be faster, especially for items with many files. - Updated search to use IA-S3 keys for auth instead of cookies. **Bugfixes** - Fixed bug in upload where derives weren't being queued in some cases where checksum=True was set. - Fixed bug where ``ia tasks`` and other ``Catalog`` functions were always using HTTP even when it should have been HTTPS. - ``ia metadata`` was exiting with a non-zero status for "no changes to xml" errors. This now exits with 0, as nearly every time this happens it should not be considered an "error". - Added unicode support to ``ia upload --spreadsheet`` and ``ia metadata --spreadsheet`` using the ``backports.csv`` module. - Fixed bug in ``ia upload --spreadsheet`` where some metadata was accidentally being copied from previous rows (e.g. when multiple subjects were used). - Submitter wasn't being added to ``ia tasks --json`` output; it now is. - ``row_type`` in ``ia tasks --json`` was returning integer for row-type rather than name (e.g. 'red'). 1.5.0 (2017-02-17) ++++++++++++++++++ **Features and Improvements** - Added option to download() for returning a list of response objects rather than writing files to disk. 
1.4.0 (2017-01-26) ++++++++++++++++++ **Bugfixes** - Another bugfix for setting mtime correctly after ``fileobj`` functionality was added to ``ia download``. 1.3.0 (2017-01-26) ++++++++++++++++++ **Bugfixes** - Fixed bug where download was trying to set mtime, even when ``fileobj`` was set to ``True`` (e.g. ``ia download --stdout``). 1.2.0 (2017-01-26) ++++++++++++++++++ **Features and Improvements** - Added ``ia copy`` and ``ia move`` for copying and moving files in archive.org items. - Added support for outputting JSON in ``ia tasks``. - Added support to ``ia download`` to write to stdout instead of file. **Bugfixes** - Fixed bug in upload where AttributeError was raised when trying to upload file-like objects without a name attribute. - Removed identifier validation from ``ia delete``. If an identifier already exists, we don't need to validate it. This only makes things annoying if an identifier exists but fails ``internetarchive`` id validation. - Fixed bug where error message isn't returned in ``ia upload`` if the response body is not XML. Ideally IA-S3 would always return XML, but that's not the case as of now. Try to dump the HTML in the S3 response if unable to parse XML. - Fixed bug where ArchiveSession headers weren't being sent in prepared requests. - Fixed bug in ``ia upload --size-hint`` where value was an integer, but requests requires it to be a string. - Added support for downloading files to stdout in ``ia download`` and ``File.download``. 1.1.0 (2016-11-18) ++++++++++++++++++ **Features and Improvements** - Make sure collection exists when creating new item via ``ia upload``. If it doesn't, upload will fail. - Refactored tests. **Bugfixes** - Fixed bug where the full filepath was being set as the remote filename in Windows. - Convert all metadata header values to strings for compatibility with ``requests>=2.11.0``. 
1.0.10 (2016-09-20) +++++++++++++++++++ **Bugfixes** - Convert x-archive-cascade-delete headers to strings for compatibility with ``requests>=2.11.0``. 1.0.9 (2016-08-16) ++++++++++++++++++ **Features and Improvements** - Added support to the CLI for providing username and password as options on the command-line. 1.0.8 (2016-08-10) ++++++++++++++++++ **Features and Improvements** - Increased maximum identifier length from 80 to 100 characters in ``ia upload``. **Bugfixes** - As of version 2.11.0 of the requests library, all header values must be strings (i.e. not integers). ``internetarchive`` now converts all header values to strings. 1.0.7 (2016-08-02) ++++++++++++++++++ **Features and Improvements** - Added ``internetarchive.api.get_user_info()``. 1.0.6 (2016-07-14) ++++++++++++++++++ **Bugfixes** - Fixed bug where upload was failing on file-like objects (e.g. StringIO objects). 1.0.5 (2016-07-07) ++++++++++++++++++ **Features and Improvements** - All metadata writes are now submitted at -5 priority by default. This is friendlier to the archive.org catalog, and should only be changed for one-off metadata writes. - Expanded scope of valid identifiers in ``utils.validate_ia_identifier`` (i.e. ``ia upload``). Periods are now allowed. Periods, underscores, and dashes are not allowed as the first character. 1.0.4 (2016-06-28) ++++++++++++++++++ **Features and Improvements** - Search now uses the v1 scraping API endpoint. - Moved ``internetarchive.item.Item.upload.iter_directory()`` to ``internetarchive.utils``. - Added support for downloading "on-the-fly" files (e.g. EPUB, MOBI, and DAISY) via ``ia download --on-the-fly`` or ``item.download(on_the_fly=True)``. **Bugfixes** - ``s3_is_overloaded()`` now returns ``True`` if the call is unsuccessful. - Fixed bug in upload where a derive task wasn't being queued when a directory is uploaded. 
1.0.3 (2016-05-16) ++++++++++++++++++ **Features and Improvements** - Use scrape API for getting total number of results rather than the advanced search API. - Improved error messages for IA-S3 (upload) related errors. - Added retry support to delete. - ``ia delete`` no longer exits if a single request fails when deleting multiple files, but continues on to the next file. If any file fails, the command will exit with a non-zero status code. - All search requests now require authentication via IA-S3 keys. You can run ``ia configure`` to generate a config file that will be used to authenticate all search requests automatically. For more details refer to the following links: http://internetarchive.readthedocs.io/en/latest/quickstart.html?highlight=configure#configuring http://internetarchive.readthedocs.io/en/latest/api.html#configuration - Added ability to specify your own filepath in ``ia configure`` and ``internetarchive.configure()``. **Bugfixes** - Updated ``requests`` lib version requirements. This resolves issues with sending binary strings as bodies in Python 3. - Improved support for Windows, see `https://github.com/jjjake/internetarchive/issues/126 `_ for more details. - Previously all requests were made in HTTP for Python versions < 2.7.9 due to the issues described at `https://urllib3.readthedocs.org/en/latest/security.html `_. In favor of security over convenience, all requests are now made via HTTPS regardless of Python version. Refer to `http://internetarchive.readthedocs.org/en/latest/troubleshooting.html#https-issues `_ if you are experiencing issues. - Fixed bug in ``ia`` CLI where ``--insecure`` was still making HTTPS requests when it should have been making HTTP requests. - Fixed bug in ``ia delete`` where ``--all`` option wasn't working because it was using ``item.iter_files`` instead of ``item.get_files``. - Fixed bug in ``ia upload`` where uploading files with unicode file names was failing. 
- Fixed bug in upload where filenames with ``;`` characters were being truncated. - Fixed bug in ``internetarchive.catalog`` where TypeError was being raised in Python 3 due to mixing bytes with strings. 1.0.2 (2016-03-07) ++++++++++++++++++ **Bugfixes** - Fixed OverflowError bug in uploads on 32-bit systems when uploading files larger than ~2GB. - Fixed unicode bug in upload where ``urllib.parse.quote`` is unable to parse non-encoded strings. **Features and Improvements** - Only generate MD5s in upload if they are used (i.e. verify, delete, or checksum is True). - verify is off by default in ``ia upload``; it can be turned on with ``ia upload --verify``. 1.0.1 (2016-03-04) ++++++++++++++++++ **Bugfixes** - Fixed memory leak in `ia upload --spreadsheet=metadata.csv`. - Fixed arg parsing bug in `ia` CLI. 1.0.0 (2016-03-01) ++++++++++++++++++ **Features and Improvements** - Renamed ``internetarchive.iacli`` to ``internetarchive.cli``. - Moved ``File`` object to ``internetarchive.files``. - Converted config format from YAML to INI to avoid PyYAML requirement. - Use HTTPS by default for Python versions > 2.7.9. - Added ``get_username`` function to API. - Improved Python 3 support. ``internetarchive`` is now being tested against Python versions 2.6, 2.7, 3.4, and 3.5. - Improved plugin support. - Added retry support to download and metadata retrieval. - Added ``Collection`` object. - Made ``Item`` objects hashable and orderable. **Bugfixes** - IA's Advanced Search API no longer supports deep-paging of large result sets. All search functions have been refactored to use the new Scrape API (http://archive.org/help/aboutsearch.htm). Search functions in previous versions are effectively broken, upgrade to >=1.0.0. 0.9.8 (2015-11-09) ++++++++++++++++++ **Bugfixes** - Fixed `ia help` bug. - Fixed bug in `File.download()` where connection errors weren't being caught/retried correctly. 
0.9.7 (2015-11-05) ++++++++++++++++++ **Bugfixes** - Cleanup partially downloaded files when `download()` fails. **Features and Improvements** - Added `--format` option to `ia delete`. - Refactored `download()` and `ia download` to behave more like rsync. Files are now clobbered by default, `ignore_existing` and `--ignore-existing` now skip over files already downloaded without making a request. - Added retry support to `download()` and `ia download`. - Added `files` kwarg to `Item.download()` for downloading specific files. - Added `ignore_errors` option to `File.download()` for ignoring (but logging) exceptions. - Added default timeouts to metadata and download requests. - Less verbose output in `ia download` by default, use `ia download --verbose` for old style output. 0.9.6 (2015-10-12) ++++++++++++++++++ **Bugfixes** - Removed sync-db features for now, as lazytable is not playing nicely with setup.py right now. 0.9.5 (2015-10-12) ++++++++++++++++++ **Features and Improvements** - Added skip based on mtime and length if no other clobber/skip options specified in `download()` and `ia download`. 0.9.4 (2015-10-01) ++++++++++++++++++ **Features and Improvements** - Added `internetarchive.api.get_username()` for retrieving a username with an S3 key-pair. - Added ability to sync downloads via an sqlite database. 0.9.3 (2015-09-28) ++++++++++++++++++ **Features and Improvements** - Added ability to download items from an itemlist or search query in `ia download`. - Made `ia configure` Python 3 compatible. **Bugfixes** - Fixed bug in `ia upload` where uploading an item with more than one collection specified caused the collection check to fail. 0.9.2 (2015-08-17) ++++++++++++++++++ **Bugfixes** - Added error message for failed `ia configure` calls due to invalid creds. 0.9.1 (2015-08-13) ++++++++++++++++++ **Bugfixes** - Updated docopt to v0.6.2 and PyYAML to v3.11. - Updated setup.py to automatically pull version from `__init__`. 
0.8.5 (2015-07-13) ++++++++++++++++++ **Bugfixes** - Fixed UnicodeEncodeError in `ia metadata --append`. **Features and Improvements** - Added configuration documentation to readme. - Updated requests to v2.7.0. 0.8.4 (2015-06-18) ++++++++++++++++++ **Features and Improvements** - Added check to `ia upload` to see if the collection being uploaded to exists. Also added an option to override this check. 0.8.3 (2015-05-18) ++++++++++++++++++ **Features and Improvements** - Fixed append to work like a standard metadata update if the metadata field does not yet exist for the given item. 0.8.0 (2015-03-09) ++++++++++++++++++ **Bugfixes** - Encode filenames in upload URLs. 0.7.9 (2015-01-26) ++++++++++++++++++ **Bugfixes** - Fixed bug in `internetarchive.config.get_auth_config` (i.e. `ia configure`) where logged-in cookies returned expired within hours. Cookies should now be valid for about one year. 0.7.8 (2014-12-23) ++++++++++++++++++ - Output error message when downloading non-existing files in `ia download` rather than raising Python exception. - Fixed IOError in `ia search` when using `head`, `tail`, etc. - Simplified `ia search` to output only JSON, rather than doing any special formatting. - Added experimental support for creating pex binaries of ia in `Makefile`. 0.7.7 (2014-12-17) ++++++++++++++++++ - Simplified `ia configure`. It now only asks for Archive.org email/password and automatically adds S3 keys and Archive.org cookies to config. See `internetarchive.config.get_auth_config()`. 0.7.6 (2014-12-17) ++++++++++++++++++ - Write metadata to stdout rather than stderr in `ia mine`. - Added options to search archive.org/v2. - Added destdir option to download files/itemdirs to a given destination dir. 0.7.5 (2014-10-08) ++++++++++++++++++ - Fixed typo. 0.7.4 (2014-10-08) ++++++++++++++++++ - Fixed missing "import" typo in `internetarchive.iacli.ia_upload`. 0.7.3 (2014-10-08) ++++++++++++++++++ - Added progress bar to `ia mine`. 
- Fixed unicode metadata support for `upload()`. 0.7.2 (2014-09-16) ++++++++++++++++++ - Suppress `KeyboardInterrupt` exceptions and exit with status code 130. - Added ability to skip downloading files based on checksum in `ia download`, `Item.download()`, and `File.download()`. - `ia download` is now verbose by default. Output can be suppressed with the `--quiet` flag. - Added an option to not download into item directories, but rather the current working directory (i.e. `ia download --no-directories `). - Added/fixed support for modifying different metadata targets (i.e. files/logo.jpg). 0.7.1 (2014-08-25) ++++++++++++++++++ - Added `Item.s3_is_overloaded()` method for S3 status check. This method is now used on retries in the upload method as well. This will avoid uploading any data if a 503 is expected. If a 503 is still returned, retries are attempted. - Added `--status-check` option to `ia upload` for S3 status check. - Added `--source` parameter to `ia list` for returning files matching IA source (i.e. original, derivative, metadata, etc.). - Added support to `ia upload` for setting remote-name if only a single file is being uploaded. - Derive tasks are now only queued after the last file has been uploaded. - File URLs are now quoted in `File` objects, for downloading files with special characters in their filenames. 0.7.0 (2014-07-23) ++++++++++++++++++ - Added support for retry on S3 503 SlowDown errors. 0.6.9 (2014-07-15) ++++++++++++++++++ - Added support for \n and \r characters in upload headers. - Added support for reading filenames from stdin when using the `ia delete` command. 0.6.8 (2014-07-11) ++++++++++++++++++ - The delete `ia` subcommand is now verbose by default. - Added glob support to the delete `ia` subcommand (i.e. `ia delete --glob='*jpg'`). - Changed indexed metadata elements to clobber values instead of insert. - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are now deprecated. 
IAS3_ACCESS_KEY and IAS3_SECRET_KEY must be used if setting IAS3 keys via environment variables. python-internetarchive-1.9.9/LICENSE000066400000000000000000001033301400433614500172500ustar00rootroot00000000000000 GNU AFFERO GENERAL PUBLIC LICENSE Version 3, 19 November 2007 Copyright (C) 2007 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The GNU Affero General Public License is a free, copyleft license for software and other kinds of works, specifically designed to ensure cooperation with the community in the case of network server software. The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, our General Public Licenses are intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things. Developers that use our General Public Licenses protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License which gives you legal permission to copy, distribute and/or modify the software. A secondary benefit of defending all users' freedom is that improvements made in alternate versions of the program, if they receive widespread use, become available for other developers to incorporate. Many developers of free software are heartened and encouraged by the resulting cooperation. However, in the case of software used on network servers, this result may fail to come about. 
The GNU General Public License permits making a modified version and letting the public access it on a server without ever releasing its source code to the public. The GNU Affero General Public License is designed specifically to ensure that, in such cases, the modified source code becomes available to the community. It requires the operator of a network server to provide the source code of the modified version running there to the users of that server. Therefore, public use of a modified version, on a publicly accessible server, gives the public access to the source code of the modified version. An older license, called the Affero General Public License and published by Affero, was designed to accomplish similar goals. This is a different license, not a version of the Affero GPL, but Affero has released a new version of the Affero GPL which permits relicensing under this license. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU Affero General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. 
Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. 
The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. 
Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. 
You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. 
You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. 
Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. "Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. 
If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. 
When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. 
If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. 
If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. 
For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. 
If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. "Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. 
You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Remote Network Interaction; Use with the GNU General Public License. 
Notwithstanding any other provision of this License, if you modify the Program, your modified version must prominently offer all users interacting with it remotely through a computer network (if your version supports such interaction) an opportunity to receive the Corresponding Source of your version by providing access to the Corresponding Source from a network server at no charge, through some standard or customary means of facilitating copying of software. This Corresponding Source shall include the Corresponding Source for any work covered by version 3 of the GNU General Public License that is incorporated pursuant to the following paragraph. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the work with which it is combined will remain governed by version 3 of the GNU General Public License. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU Affero General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU Affero General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU Affero General Public License, you may choose any version ever published by the Free Software Foundation. 
If the Program specifies that a proxy can decide which future versions of the GNU Affero General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. 
If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details. You should have received a copy of the GNU Affero General Public License along with this program. If not, see . Also add information on how to contact you by electronic and paper mail. If your software can interact with users remotely through a computer network, you should also make sure that it provides a way for users to get its source. For example, if your program is a web application, its interface could display a "Source" link that leads users to an archive of the code. 
There are many ways you could offer source, and different solutions will be better for different programs; see section 13 for the specific requirements.

You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU AGPL, see .

python-internetarchive-1.9.9/MANIFEST.in

include LICENSE AUTHORS.rst HISTORY.rst
recursive-include tests *.py

python-internetarchive-1.9.9/Makefile

.PHONY: docs

VERSION=$(shell grep -m1 __version__ internetarchive/__init__.py | cut -d\' -f2)

init:
	pip install responses==0.5.0 pytest-cov pytest-pep8
	pip install -e .

clean:
	find . -type f -name '*\.pyc' -delete
	find . -type d -name '__pycache__' -delete

pep8-test:
	py.test --pep8 -m pep8 --cov-report term-missing --cov internetarchive

test:
	py.test --pep8 --cov-report term-missing --cov internetarchive

publish:
	git tag -a v$(VERSION) -m 'version $(VERSION)'
	git push --tags
	python setup.py sdist upload
	python setup.py bdist_wheel upload

docs-init:
	pip install -r docs/requirements.txt

docs:
	cd docs && make html
	@echo "\033[95m\n\nBuild successful! View the docs homepage at docs/build/html/index.html.\n\033[0m"

binary:
	# This requires using https://github.com/jjjake/pex which has been hacked for multi-platform support.
	pex . --python python3.7 --python python2 --python-shebang='/usr/bin/env python' --platform=linux-x86_64 --platform=macosx_10_11 -e internetarchive.cli.ia:main -o ia-$(VERSION)-py2.py3-none-any.pex -r pex-requirements.txt
	# make with py2???
	# Use pex==1.4.0
	#pex . --python python3 --python /usr/bin/python --python-shebang='/usr/bin/env python' --platform=linux-x86_64 --platform=macosx_10_11 -e internetarchive.cli.ia:main -o ia-$(VERSION)-py2.py3-none-any.pex -f wheelhouse/ --no-pypi

publish-binary:
	./ia-$(VERSION)-py2.py3-none-any.pex upload ia-pex ia-$(VERSION)-py2.py3-none-any.pex --no-derive
	./ia-$(VERSION)-py2.py3-none-any.pex upload ia-pex ia-$(VERSION)-py2.py3-none-any.pex --remote-name=ia --no-derive

python-internetarchive-1.9.9/README.rst

A Python and Command-Line Interface to Archive.org
==================================================

.. image:: https://badges.gitter.im/internetarchive/ia.svg
   :alt: Join the chat at https://gitter.im/internetarchive/ia
   :target: https://gitter.im/internetarchive/ia?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

|travis|

.. |travis| image:: https://travis-ci.com/jjjake/internetarchive.svg
   :target: https://travis-ci.com/jjjake/internetarchive

This package installs a command-line tool named ``ia`` for using Archive.org from the command line.
It also installs the ``internetarchive`` Python module for programmatic access to archive.org.
Please report all bugs and issues on `Github `__.

Installation
------------

You can install this module via pip:

.. code:: bash

    $ pip install internetarchive

Binaries of the command-line tool are also available:

.. code:: bash

    $ curl -LO https://archive.org/download/ia-pex/ia
    $ chmod +x ia
    $ ./ia help

Documentation
-------------

Documentation is available at `https://archive.org/services/docs/api/internetarchive `_.

Contributing
------------

All contributions are welcome and appreciated.
Please see `https://archive.org/services/docs/api/internetarchive/contributing.html `_ for more details.
python-internetarchive-1.9.9/docs/000077500000000000000000000000001400433614500171735ustar00rootroot00000000000000python-internetarchive-1.9.9/docs/Makefile000066400000000000000000000152271400433614500206420ustar00rootroot00000000000000# Makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build PAPER = BUILDDIR = build # User-friendly check for sphinx-build ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/) endif # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext help: @echo "Please use \`make ' where is one of" @echo " html to make standalone HTML files" @echo " dirhtml to make HTML files named index.html in directories" @echo " singlehtml to make a single large HTML file" @echo " pickle to make pickle files" @echo " json to make JSON files" @echo " htmlhelp to make HTML files and a HTML help project" @echo " qthelp to make HTML files and a qthelp project" @echo " devhelp to make HTML files and a Devhelp project" @echo " epub to make an epub" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" @echo " latexpdf to make LaTeX files and run them through pdflatex" @echo " latexpdfja to make LaTeX files and run them 
through platex/dvipdfmx" @echo " text to make text files" @echo " man to make manual pages" @echo " texinfo to make Texinfo files" @echo " info to make Texinfo files and run them through makeinfo" @echo " gettext to make PO message catalogs" @echo " changes to make an overview of all changed/added/deprecated items" @echo " xml to make Docutils-native XML files" @echo " pseudoxml to make pseudoxml-XML files for display purposes" @echo " linkcheck to check all external links for integrity" @echo " doctest to run all doctests embedded in the documentation (if enabled)" clean: rm -rf $(BUILDDIR)/* html: $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." singlehtml: $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml @echo @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." pickle: $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle @echo @echo "Build finished; now you can process the pickle files." json: $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json @echo @echo "Build finished; now you can process the JSON files." htmlhelp: $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp @echo @echo "Build finished; now you can run HTML Help Workshop with the" \ ".hhp project file in $(BUILDDIR)/htmlhelp." qthelp: $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp @echo @echo "Build finished; now you can run "qcollectiongenerator" with the" \ ".qhcp project file in $(BUILDDIR)/qthelp, like this:" @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/internetarchive.qhcp" @echo "To view the help file:" @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/internetarchive.qhc" devhelp: $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp @echo @echo "Build finished." 
@echo "To view the help file:" @echo "# mkdir -p $$HOME/.local/share/devhelp/internetarchive" @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/internetarchive" @echo "# devhelp" epub: $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub @echo @echo "Build finished. The epub file is in $(BUILDDIR)/epub." latex: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." @echo "Run \`make' in that directory to run these through (pdf)latex" \ "(use \`make latexpdf' here to do that automatically)." latexpdf: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through pdflatex..." $(MAKE) -C $(BUILDDIR)/latex all-pdf @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." latexpdfja: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through platex and dvipdfmx..." $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." text: $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text @echo @echo "Build finished. The text files are in $(BUILDDIR)/text." man: $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man @echo @echo "Build finished. The manual pages are in $(BUILDDIR)/man." texinfo: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." @echo "Run \`make' in that directory to run these through makeinfo" \ "(use \`make info' here to do that automatically)." info: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo "Running Texinfo files through makeinfo..." make -C $(BUILDDIR)/texinfo info @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." gettext: $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale @echo @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." 
changes: $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes @echo @echo "The overview file is in $(BUILDDIR)/changes." linkcheck: $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck @echo @echo "Link check complete; look for any errors in the above output " \ "or in $(BUILDDIR)/linkcheck/output.txt." doctest: $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest @echo "Testing of doctests in the sources finished, look at the " \ "results in $(BUILDDIR)/doctest/output.txt." xml: $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml @echo @echo "Build finished. The XML files are in $(BUILDDIR)/xml." pseudoxml: $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml @echo @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." python-internetarchive-1.9.9/docs/make.bat000066400000000000000000000151101400433614500205760ustar00rootroot00000000000000@ECHO OFF REM Command file for Sphinx documentation if "%SPHINXBUILD%" == "" ( set SPHINXBUILD=sphinx-build ) set BUILDDIR=build set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% source set I18NSPHINXOPTS=%SPHINXOPTS% source if NOT "%PAPER%" == "" ( set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% ) if "%1" == "" goto help if "%1" == "help" ( :help echo.Please use `make ^` where ^ is one of echo. html to make standalone HTML files echo. dirhtml to make HTML files named index.html in directories echo. singlehtml to make a single large HTML file echo. pickle to make pickle files echo. json to make JSON files echo. htmlhelp to make HTML files and a HTML help project echo. qthelp to make HTML files and a qthelp project echo. devhelp to make HTML files and a Devhelp project echo. epub to make an epub echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter echo. text to make text files echo. man to make manual pages echo. texinfo to make Texinfo files echo. 
gettext to make PO message catalogs echo. changes to make an overview over all changed/added/deprecated items echo. xml to make Docutils-native XML files echo. pseudoxml to make pseudoxml-XML files for display purposes echo. linkcheck to check all external links for integrity echo. doctest to run all doctests embedded in the documentation if enabled goto end ) if "%1" == "clean" ( for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i del /q /s %BUILDDIR%\* goto end ) %SPHINXBUILD% 2> nul if errorlevel 9009 ( echo. echo.The 'sphinx-build' command was not found. Make sure you have Sphinx echo.installed, then set the SPHINXBUILD environment variable to point echo.to the full path of the 'sphinx-build' executable. Alternatively you echo.may add the Sphinx directory to PATH. echo. echo.If you don't have Sphinx installed, grab it from echo.http://sphinx-doc.org/ exit /b 1 ) if "%1" == "html" ( %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/html. goto end ) if "%1" == "dirhtml" ( %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. goto end ) if "%1" == "singlehtml" ( %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. goto end ) if "%1" == "pickle" ( %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can process the pickle files. goto end ) if "%1" == "json" ( %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can process the JSON files. goto end ) if "%1" == "htmlhelp" ( %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp if errorlevel 1 exit /b 1 echo. 
echo.Build finished; now you can run HTML Help Workshop with the ^ .hhp project file in %BUILDDIR%/htmlhelp. goto end ) if "%1" == "qthelp" ( %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can run "qcollectiongenerator" with the ^ .qhcp project file in %BUILDDIR%/qthelp, like this: echo.^> qcollectiongenerator %BUILDDIR%\qthelp\internetarchive.qhcp echo.To view the help file: echo.^> assistant -collectionFile %BUILDDIR%\qthelp\internetarchive.ghc goto end ) if "%1" == "devhelp" ( %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp if errorlevel 1 exit /b 1 echo. echo.Build finished. goto end ) if "%1" == "epub" ( %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub if errorlevel 1 exit /b 1 echo. echo.Build finished. The epub file is in %BUILDDIR%/epub. goto end ) if "%1" == "latex" ( %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex if errorlevel 1 exit /b 1 echo. echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. goto end ) if "%1" == "latexpdf" ( %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex cd %BUILDDIR%/latex make all-pdf cd %BUILDDIR%/.. echo. echo.Build finished; the PDF files are in %BUILDDIR%/latex. goto end ) if "%1" == "latexpdfja" ( %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex cd %BUILDDIR%/latex make all-pdf-ja cd %BUILDDIR%/.. echo. echo.Build finished; the PDF files are in %BUILDDIR%/latex. goto end ) if "%1" == "text" ( %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text if errorlevel 1 exit /b 1 echo. echo.Build finished. The text files are in %BUILDDIR%/text. goto end ) if "%1" == "man" ( %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man if errorlevel 1 exit /b 1 echo. echo.Build finished. The manual pages are in %BUILDDIR%/man. goto end ) if "%1" == "texinfo" ( %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo if errorlevel 1 exit /b 1 echo. echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. 
goto end ) if "%1" == "gettext" ( %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale if errorlevel 1 exit /b 1 echo. echo.Build finished. The message catalogs are in %BUILDDIR%/locale. goto end ) if "%1" == "changes" ( %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes if errorlevel 1 exit /b 1 echo. echo.The overview file is in %BUILDDIR%/changes. goto end ) if "%1" == "linkcheck" ( %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck if errorlevel 1 exit /b 1 echo. echo.Link check complete; look for any errors in the above output ^ or in %BUILDDIR%/linkcheck/output.txt. goto end ) if "%1" == "doctest" ( %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest if errorlevel 1 exit /b 1 echo. echo.Testing of doctests in the sources finished, look at the ^ results in %BUILDDIR%/doctest/output.txt. goto end ) if "%1" == "xml" ( %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml if errorlevel 1 exit /b 1 echo. echo.Build finished. The XML files are in %BUILDDIR%/xml. goto end ) if "%1" == "pseudoxml" ( %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml if errorlevel 1 exit /b 1 echo. echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml. 
goto end ) :end python-internetarchive-1.9.9/docs/requirements.txt000066400000000000000000000000561400433614500224600ustar00rootroot00000000000000Sphinx==1.2.2 alabaster==0.7.6 docutils==0.12 python-internetarchive-1.9.9/docs/source/000077500000000000000000000000001400433614500204735ustar00rootroot00000000000000python-internetarchive-1.9.9/docs/source/_static/000077500000000000000000000000001400433614500221215ustar00rootroot00000000000000python-internetarchive-1.9.9/docs/source/_static/ia.png000066400000000000000000000111021400433614500232130ustar00rootroot00000000000000(binary PNG image data omitted) python-internetarchive-1.9.9/docs/source/_templates/000077500000000000000000000000001400433614500226305ustar00rootroot00000000000000python-internetarchive-1.9.9/docs/source/_templates/about.html000066400000000000000000000001331400433614500246250ustar00rootroot00000000000000

The internetarchive library is a Python & command-line interface to archive.org

python-internetarchive-1.9.9/docs/source/_templates/sidebarlogo.html000066400000000000000000000002411400433614500260050ustar00rootroot00000000000000 python-internetarchive-1.9.9/docs/source/_templates/usefullinks.html000066400000000000000000000011611400433614500260610ustar00rootroot00000000000000

Useful links

python-internetarchive-1.9.9/docs/source/api.rst000066400000000000000000000133151400433614500220010ustar00rootroot00000000000000.. _api: Developer Interface =================== .. module:: internetarchive Configuration ------------- Certain functions of the internetarchive library require your archive.org credentials (i.e. uploading, modifying metadata, searching). Your credentials and other configurations can be provided via a dictionary when instantiating an :class:`ArchiveSession` or :class:`Item` object, or in a config file. The easiest way to create a config file is with the `configure `_ function:: >>> from internetarchive import configure >>> configure('user@example.com', 'password') Config files are stored in either ``$HOME/.ia`` or ``$HOME/.config/ia.ini`` by default. You can also specify your own path:: >>> from internetarchive import configure >>> configure('user@example.com', 'password', config_file='/home/jake/.config/ia-alternate.ini') Custom config files can be specified when instantiating an :class:`ArchiveSession` object:: >>> from internetarchive import get_session >>> s = get_session(config_file='/home/jake/.config/ia-alternate.ini') Or an :class:`Item` object:: >>> from internetarchive import get_item >>> item = get_item('nasa', config_file='/home/jake/.config/ia-alternate.ini') IA-S3 Configuration ~~~~~~~~~~~~~~~~~~~ Your IA-S3 keys are required for uploading and modifying metadata. You can retrieve your IA-S3 keys at https://archive.org/account/s3.php. 
They can be specified in your config file like so:: [s3] access = mYaccEsSkEY secret = mYs3cREtKEy Or, using the :class:`ArchiveSession` object:: >>> from internetarchive import get_session >>> c = {'s3': {'access': 'mYaccEsSkEY', 'secret': 'mYs3cREtKEy'}} >>> s = get_session(config=c) >>> s.access_key 'mYaccEsSkEY' Cookie Configuration ~~~~~~~~~~~~~~~~~~~~ Your archive.org logged-in cookies are required for downloading access-restricted files that you have permissions to and retrieving information about archive.org catalog tasks. Your cookies can be specified like so:: [cookies] logged-in-user = user%40example.com logged-in-sig = Or, using the :class:`ArchiveSession` object:: >>> from internetarchive import get_session >>> c = {'cookies': {'logged-in-user': 'user%40example.com', 'logged-in-sig': 'foo'}} >>> s = get_session(config=c) >>> s.cookies['logged-in-user'] 'user%40example.com' Logging Configuration ~~~~~~~~~~~~~~~~~~~~~ You can specify logging levels and the location of your log file like so:: [logging] level = INFO file = /tmp/ia.log Or, using the :class:`ArchiveSession` object:: >>> from internetarchive import get_session >>> c = {'logging': {'level': 'INFO', 'file': '/tmp/ia.log'}} >>> s = get_session(config=c) By default logging is turned off. Other Configuration ~~~~~~~~~~~~~~~~~~~ By default all requests are HTTPS in Python versions 2.7.10 or newer. You can change this setting in your config file in the ``general`` section:: [general] secure = False Or, using the :class:`ArchiveSession` object:: >>> from internetarchive import get_session >>> s = get_session(config={'general': {'secure': False}}) In the example above, all requests will be made via HTTP. ArchiveSession Objects ---------------------- The ArchiveSession object is subclassed from :class:`requests.Session`. It collects together your credentials and config. .. autofunction:: get_session Item Objects ------------ :class:`Item` objects represent `Internet Archive items `_. 
From the :class:`Item` object you can create new items, upload files to existing items, read and write metadata, and download or delete files. .. autofunction:: get_item Uploading ~~~~~~~~~ Uploading to an item can be done using :meth:`Item.upload`:: >>> from internetarchive import get_item >>> item = get_item('my_item') >>> r = item.upload('/home/user/foo.txt') Or :func:`internetarchive.upload`:: >>> from internetarchive import upload >>> r = upload('my_item', '/home/user/foo.txt') The item will automatically be created if it does not exist. Refer to `archive.org Identifiers `_ for more information on creating valid archive.org identifiers. Setting Remote Filenames ^^^^^^^^^^^^^^^^^^^^^^^^ Remote filenames can be defined using a dictionary:: >>> from io import BytesIO >>> fh = BytesIO() >>> fh.write(b'foo bar') >>> item.upload({'my-remote-filename.txt': fh}) .. autofunction:: upload Metadata ~~~~~~~~ .. autofunction:: modify_metadata The default target to write to is ``metadata``. If you would like to write to another target, such as ``files``, you can do so using the ``target`` parameter. For example, if you had an item whose identifier was ``my_identifier`` and wanted to add a metadata field to a file within the item called ``foo.txt``:: >>> from internetarchive import modify_metadata >>> r = modify_metadata('my_identifier', metadata=dict(title='My File'), target='files/foo.txt') >>> from internetarchive import get_files >>> f = list(get_files('my_identifier', 'foo.txt'))[0] >>> f.title 'My File' You can also create new targets if they don’t exist:: >>> r = modify_metadata('my_identifier', metadata=dict(foo='bar'), target='extra_metadata') >>> from internetarchive import get_item >>> item = get_item('my_identifier') >>> item.item_metadata['extra_metadata'] {'foo': 'bar'} Downloading ~~~~~~~~~~~ .. autofunction:: download Deleting ~~~~~~~~ .. autofunction:: delete File Objects ~~~~~~~~~~~~ .. autofunction:: get_files Searching Items --------------- .. autofunction:: search_items Internet Archive Tasks ---------------------- ..
autofunction:: get_tasks python-internetarchive-1.9.9/docs/source/authors.rst000066400000000000000000000000371400433614500227120ustar00rootroot00000000000000.. include:: ../../AUTHORS.rst python-internetarchive-1.9.9/docs/source/cli.rst000066400000000000000000000317141400433614500220020ustar00rootroot00000000000000Command-Line Interface ====================== The ``ia`` command-line tool is installed with ``internetarchive``, or `available as a binary `_. ``ia`` allows you to interact with various archive.org services from the command-line. Getting Started --------------- The easiest way to start using ``ia`` is downloading a binary. The only requirements of the binary are a Unix-like environment with Python installed. To download the latest binary, and make it executable simply: .. code:: bash $ curl -LOs https://archive.org/download/ia-pex/ia $ chmod +x ia $ ./ia help A command line interface to archive.org. usage: ia [--help | --version] ia [--config-file FILE] [--log | --debug] [--insecure] []... options: -h, --help -v, --version -c, --config-file FILE Use FILE as config file. -l, --log Turn on logging [default: False]. -d, --debug Turn on verbose logging [default: False]. -i, --insecure Use HTTP for all requests instead of HTTPS [default: false] commands: help Retrieve help for subcommands. configure Configure `ia`. metadata Retrieve and modify metadata for items on archive.org. upload Upload items to archive.org. download Download files from archive.org. delete Delete files from archive.org. search Search archive.org. tasks Retrieve information about your archive.org catalog tasks. list List files in a given item. See 'ia help ' for more information on a specific command. Metadata -------- Reading Metadata ^^^^^^^^^^^^^^^^ You can use ``ia`` to read and write metadata from archive.org. To retrieve all of an item's metadata in JSON, simply: .. code:: bash $ ia metadata TripDown1905 A particularly useful tool to use alongside ``ia`` is `jq `_. 
``jq`` is a command-line tool for parsing JSON. For example: .. code:: bash $ ia metadata TripDown1905 | jq '.metadata.date' "1906" Modifying Metadata ^^^^^^^^^^^^^^^^^^ Once ``ia`` has been `configured `_, you can modify `metadata `_: .. code:: bash $ ia metadata --modify="foo:bar" --modify="baz:foooo" You can remove a metadata field by setting the value of the given field to ``REMOVE_TAG``. For example, to remove the metadata field ``foo`` from the item ````: .. code:: bash $ ia metadata --modify="foo:REMOVE_TAG" Note that some metadata fields (e.g. ``mediatype``) cannot be modified, and must instead be set initially on upload. The default target to write to is ``metadata``. If you would like to write to another target, such as ``files``, you can do so using the ``--target`` parameter. For example, if we had an item whose identifier was ``my_identifier`` and we wanted to add a metadata field to a file within the item called ``foo.txt``: .. code:: bash $ ia metadata my_identifier --target="files/foo.txt" --modify="title:My File" You can also create new targets if they don't exist: .. code:: bash $ ia metadata --target="extra_metadata" --modify="foo:bar" There is also an ``--append`` option which allows you to append a string to an existing metadata string (note: use ``--append-list`` for appending elements to a list). For example, if your item's title was ``Foo`` and you wanted it to be ``Foo Bar``, you could simply do: .. code:: bash $ ia metadata --append="title: Bar" If you would like to add a new value to an existing field that is an array (like ``subject`` or ``collection``), you can use the ``--append-list`` option: .. code:: bash $ ia metadata --append-list="subject:another subject" This command would append ``another subject`` to the item's list of subjects, if it doesn't already exist (i.e. no duplicate elements are added). Metadata fields or elements can be removed with the ``--remove`` option: ..
code:: bash $ ia metadata --remove="subject:another subject" This would remove ``another subject`` from the item's subject field, regardless of whether the field is a single-value or multi-value field. Refer to `Internet Archive Metadata `_ for more specific details regarding metadata and archive.org. Modifying Metadata in Bulk ^^^^^^^^^^^^^^^^^^^^^^^^^^ If you have a lot of metadata changes to submit, you can use a CSV spreadsheet to submit many changes with a single command. Your CSV must contain an ``identifier`` column, with one item per row. Any other column added will be treated as a metadata field to modify. If no value is provided in a given row for a column, no changes will be submitted for that field. If you would like to specify multiple values for certain fields, an index can be provided: ``subject[0]``, ``subject[1]``. Your CSV file should be UTF-8 encoded. See `metadata.csv `_ for an example CSV file. Once you're ready to submit your changes, you can submit them like so: .. code:: bash $ ia metadata --spreadsheet=metadata.csv See ``ia help metadata`` for more details. Upload ------ ``ia`` can also be used to upload items to archive.org. After `configuring ia `__, you can upload files like so: .. code:: bash $ ia upload file1 file2 --metadata="mediatype:texts" --metadata="blah:arg" .. warning:: Please note that, unless specified otherwise, items will be uploaded with a ``data`` mediatype. **This cannot be changed afterwards.** Therefore, you should specify a mediatype when uploading, e.g. ``--metadata="mediatype:movies"``. Similarly, if you want your upload to end up somewhere other than the default collection (currently `community texts `_), you should also specify a collection with ``--metadata="collection:foo"``. See `metadata documentation `_ for more information. You can upload files from ``stdin``: ..
code:: bash $ curl http://dumps.wikimedia.org/kywiki/20130927/kywiki-20130927-pages-logging.xml.gz \ | ia upload - --remote-name=kywiki-20130927-pages-logging.xml.gz --metadata="title:Uploaded from stdin." You can use the ``--retries`` parameter to retry on errors (i.e. if IA-S3 is overloaded): .. code:: bash $ ia upload file1 --retries 10 Note that ``ia upload`` makes a backup of any files that are clobbered. They are saved to a directory in the item named ``history/files/``. The files are named in the format ``$key.~N~``. These files can be deleted like normal files. You can also prevent the backup from happening on clobbers by adding ``-H x-archive-keep-old-version:0`` to your command. Refer to `archive.org Identifiers `_ for more information on creating valid archive.org identifiers. Please also read the `Internet Archive Items `_ page before getting started. Bulk Uploading ^^^^^^^^^^^^^^ Uploading in bulk can be done similarly to `Modifying Metadata in Bulk`_. The only difference is that you must provide a ``file`` column which contains a relative or absolute path to your file. Please see `uploading.csv `_ for an example. Once you are ready to start your upload, simply run: .. code:: bash $ ia upload --spreadsheet=uploading.csv See ``ia help upload`` for more details. Setting File-Level Metadata on Upload ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can set file-level metadata at time of upload via a JSON/JSONL file. The JSON or JSONL must have a dict for each file, with the local path to the file stored under the key, ``name``. For example, you could upload two files named ``foo.txt`` and ``bar.txt`` with a file-level ``title`` with the following JSONL file (named ``file_md.jsonl``): .. code:: json {"name": "foo.txt", "title": "my foo file"} {"name": "bar.txt", "title": "my foo file"} And the following command: .. code:: bash $ ia upload --file-metadata file_md.jsonl Download -------- Download an entire item: .. 
code:: bash $ ia download TripDown1905 Download specific files from an item: .. code:: bash $ ia download TripDown1905 TripDown1905_512kb.mp4 TripDown1905.ogv Download specific files matching a glob pattern: .. code:: bash $ ia download TripDown1905 --glob="*.mp4" Note that you may have to escape the ``*`` differently depending on your shell (e.g. ``\*.mp4``, ``'*.mp4'``, etc.). Download only files of a specific format: .. code:: bash $ ia download TripDown1905 --format='512Kb MPEG4' Note that ``--format`` cannot be used with ``--glob``. You can get a list of the formats of a given item like so: .. code:: bash $ ia metadata --formats TripDown1905 Download an entire collection: .. code:: bash $ ia download --search 'collection:glasgowschoolofart' Download from an itemlist: .. code:: bash $ ia download --itemlist itemlist.txt See ``ia help download`` for more details. Downloading On-The-Fly Files ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Some files on archive.org are generated on-the-fly as requested. This currently includes non-original files of the formats EPUB, MOBI, DAISY, and archive.org's own MARC XML. These files can be downloaded using the ``--on-the-fly`` parameter: .. code:: bash $ ia download goodytwoshoes00newyiala --on-the-fly Delete ------ You can use ``ia`` to delete files from archive.org items: .. code:: bash $ ia delete Delete a file *and* all files derived from the specified file: .. code:: bash $ ia delete --cascade Delete all files in an item: .. code:: bash $ ia delete --all Note that ``ia delete`` makes a backup of any files that are deleted. They are saved to a directory in the item named ``history/files/``. The files are named in the format ``$key.~N~``. These files can be deleted like normal files. You can also prevent the backup from happening on deletes by adding ``-H x-archive-keep-old-version:0`` to your command. See ``ia help delete`` for more details. Search ------ ``ia`` can also be used for retrieving archive.org search results in JSON: .. 
code:: bash $ ia search 'subject:"market street" collection:prelinger' By default, ``ia search`` attempts to return all items meeting the search criteria, and the results are sorted by item identifier. If you want to just select the top ``n`` items, you can specify a ``page`` and ``rows`` parameter. For example, to get the top 20 items matching the search 'dogs': .. code:: bash $ ia search --parameters="page=1&rows=20" "dogs" You can use ``ia search`` to create an itemlist: .. code:: bash $ ia search 'collection:glasgowschoolofart' --itemlist > itemlist.txt You can pipe your itemlist into a GNU Parallel command to download items concurrently: .. code:: bash $ ia search 'collection:glasgowschoolofart' --itemlist | parallel 'ia download {}' See ``ia help search`` for more details. Tasks ----- You can also use ``ia`` to retrieve information about your catalog tasks, after `configuring ia `__. To retrieve the task history for an item, simply run: .. code:: bash $ ia tasks View all of your queued and running archive.org tasks: .. code:: bash $ ia tasks See ``ia help tasks`` for more details. List ---- You can list files in an item like so: .. code:: bash $ ia list goodytwoshoes00newyiala See ``ia help list`` for more details. Copy ---- You can copy files in archive.org items like so: .. code:: bash $ ia copy / / If you're copying your file to a new item, you can provide metadata as well: .. code:: bash $ ia copy / / --metadata 'title:My New Item' --metadata collection:test_collection Note that ``ia copy`` makes a backup of any files that are clobbered. They are saved to a directory in the item named ``history/files/``. The files are named in the format ``$key.~N~``. These files can be deleted like normal files. You can also prevent the backup from happening on clobbers by adding ``-H x-archive-keep-old-version:0`` to your command. Move ---- ``ia move`` works just like ``ia copy`` except the source file is deleted after the file has been successfully copied. 
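As a rough illustration of the ordering described above (copy first, then delete the source only once the copy has succeeded), the following sketch uses an in-memory dictionary in place of archive.org items. The names ``move_file`` and ``store`` are hypothetical and are not part of ``ia`` or the ``internetarchive`` library; this is not how ``ia move`` is implemented.

```python
def move_file(store, src, dest):
    """Copy ``src`` to ``dest`` within ``store``, then delete ``src``.

    ``store`` is a plain dict standing in for remote storage (an assumption
    made purely for illustration). Returns True if the file was moved and
    False if nothing was changed.
    """
    if src not in store or src == dest:
        # Nothing can be copied, so the source is left untouched.
        return False
    store[dest] = store[src]  # copy step
    del store[src]            # delete step, reached only after the copy
    return True


# Hypothetical "identifier/filename" keys, mirroring the paths used by ``ia copy``.
store = {'item_a/foo.txt': b'foo bar'}
moved = move_file(store, 'item_a/foo.txt', 'item_b/foo.txt')
print(moved, sorted(store))
```

If the copy step cannot happen (for example, the source key is missing), the function returns before the delete step, so the source is never removed without a copy existing.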
Note that ``ia move`` makes a backup of any files that are clobbered or deleted. They are saved to a directory in the item named ``history/files/``. The files are named in the format ``$key.~N~``. These files can be deleted like normal files. You can also prevent the backup from happening on clobbers or deletes by adding ``-H x-archive-keep-old-version:0`` to your command. python-internetarchive-1.9.9/docs/source/conf.py000066400000000000000000000216521400433614500220000ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import unicode_literals # internetarchive documentation build configuration file, created by # sphinx-quickstart on Mon Sep 23 20:16:03 2013. # # This file is execfile()d with the current directory set to its # containing dir. # # Note that not all possible configuration values are present in this # autogenerated file. # # All configuration values have a default; values that are commented out # serve to show the default. import sys import os import internetarchive from internetarchive import __version__ import alabaster # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. sys.path.insert(0, os.path.abspath('../../')) # -- General configuration ------------------------------------------------ # If your documentation needs a minimal Sphinx version, state it here. #needs_sphinx = '1.0' # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. extensions = [ 'sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.todo', 'sphinx.ext.coverage', 'sphinx.ext.viewcode', 'sphinx.ext.intersphinx', 'alabaster', ] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. 
source_suffix = '.rst' # The encoding of source files. #source_encoding = 'utf-8-sig' # The master toctree document. master_doc = 'index' # General information about the project. project = 'internetarchive' copyright = '2015, Internet Archive' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. version = __version__ # The full version, including alpha/beta/rc tags. release = version # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. #language = None # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: #today = '' # Else, today_fmt is used as the format for a strftime call. #today_fmt = '%B %d, %Y' # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. exclude_patterns = [] # The reST default role (used for this markup: `text`) to use for all # documents. #default_role = None # If true, '()' will be appended to :func: etc. cross-reference text. #add_function_parentheses = True # If true, the current module name will be prepended to all description # unit titles (such as .. function::). add_module_names = False # If true, sectionauthor and moduleauthor directives will be shown in the # output. They are ignored by default. #show_authors = False # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # A list of ignored prefixes for module index sorting. #modindex_common_prefix = [] # If true, keep warnings as "system message" paragraphs in the built documents. #keep_warnings = False # -- Options for HTML output ---------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. 
html_theme_path = [alabaster.get_path()] html_theme = 'alabaster' html_sidebars = { '**': [ 'sidebarlogo.html', 'about.html', 'navigation.html', 'usefullinks.html', 'searchbox.html', ] } html_theme_options = { 'github_user': 'jjjake', 'github_repo': 'internetarchive', 'travis_button': True, 'github_button': True, 'show_powered_by': 'false', 'sidebar_width': '200px', } # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. #html_theme_options = {} # Add any paths that contain custom themes here, relative to this directory. #html_theme_path = [] # The name for this set of Sphinx documents. If None, it defaults to # " v documentation". #html_title = None # A shorter title for the navigation bar. Default is the same as html_title. #html_short_title = None # The name of an image file (relative to this directory) to place at the top # of the sidebar. #html_logo = '_static/ia.png' # The name of an image file (within the static path) to use as favicon of the # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # pixels large. #html_favicon = None # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] # Add any extra paths that contain custom files (such as robots.txt or # .htaccess) here, relative to this directory. These files are copied # directly to the root of the documentation. #html_extra_path = [] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. #html_last_updated_fmt = '%b %d, %Y' # If true, SmartyPants will be used to convert quotes and dashes to # typographically correct entities. 
#html_use_smartypants = True # Custom sidebar templates, maps document names to template names. #html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to # template names. #html_additional_pages = {} # If false, no module index is generated. #html_domain_indices = True # If false, no index is generated. #html_use_index = True # If true, the index is split into individual pages for each letter. #html_split_index = False # If true, links to the reST sources are added to the pages. #html_show_sourcelink = True # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. #html_show_sphinx = True # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. #html_show_copyright = True # If true, an OpenSearch description file will be output, and all pages will # contain a tag referring to it. The value of this option must be the # base URL from which the finished HTML is served. #html_use_opensearch = '' # This is the file name suffix for HTML files (e.g. ".xhtml"). #html_file_suffix = None # Output file base name for HTML help builder. htmlhelp_basename = 'internetarchivedoc' # -- Options for LaTeX output --------------------------------------------- latex_elements = { # The paper size ('letterpaper' or 'a4paper'). #'papersize': 'letterpaper', # The font size ('10pt', '11pt' or '12pt'). #'pointsize': '10pt', # Additional stuff for the LaTeX preamble. #'preamble': '', } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, # author, documentclass [howto/manual]). latex_documents = [ ('index', 'internetarchive.tex', 'internetarchive Documentation', 'Jacob M. Johnson', 'manual'), ] # The name of an image file (relative to this directory) to place at the top of # the title page. #latex_logo = None # For "manual" documents, if this is true, then toplevel headings are parts, # not chapters. 
#latex_use_parts = False # If true, show page references after internal links. #latex_show_pagerefs = False # If true, show URL addresses after external links. #latex_show_urls = False # Documents to append as an appendix to all manuals. #latex_appendices = [] # If false, no module index is generated. #latex_domain_indices = True # -- Options for manual page output --------------------------------------- # One entry per manual page. List of tuples # (source start file, name, description, authors, manual section). man_pages = [ ('index', 'internetarchive', 'internetarchive Documentation', ['Jacob M. Johnson'], 1) ] # If true, show URL addresses after external links. #man_show_urls = False # -- Options for Texinfo output ------------------------------------------- # Grouping the document tree into Texinfo files. List of tuples # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ ('index', 'internetarchive', 'internetarchive Documentation', 'Jacob M. Johnson', 'internetarchive', 'One line description of project.', 'Miscellaneous'), ] # Documents to append as an appendix to all manuals. #texinfo_appendices = [] # If false, no module index is generated. #texinfo_domain_indices = True # How to display URL addresses: 'footnote', 'no', or 'inline'. #texinfo_show_urls = 'footnote' # If true, do not generate a @detailmenu in the "Top" node's menu. #texinfo_no_detailmenu = False #autodoc_member_order = 'bysource' intersphinx_mapping = { 'python': ('https://docs.python.org/2.7', None), 'requests': ('http://docs.python-requests.org/en/latest/', None) } python-internetarchive-1.9.9/docs/source/contributing.rst000066400000000000000000000000441400433614500237320ustar00rootroot00000000000000.. include:: ../../CONTRIBUTING.rst python-internetarchive-1.9.9/docs/source/index.rst000066400000000000000000000021261400433614500223350ustar00rootroot00000000000000.. 
internetarchive documentation master file, created by sphinx-quickstart on Mon Sep 23 20:16:03 2013. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. The Internet Archive Python Library =================================== Release v\ |version|. (:ref:`Installation `) .. image:: https://travis-ci.com/jjjake/internetarchive.svg :target: https://travis-ci.com/jjjake/internetarchive Welcome to the documentation for the ``internetarchive`` Python library. ``internetarchive`` is a command-line and Python interface to archive.org. Please report any issues on `Github `__. If you're not sure where to begin, the quickest and easiest way to get started is `downloading a binary `__ and taking a look at the `command-line interface documentation `__. User's Guide ------------ .. toctree:: :maxdepth: 2 installation quickstart cli api updates parallel troubleshooting contributing authors python-internetarchive-1.9.9/docs/source/installation.rst000066400000000000000000000076421400433614500237370ustar00rootroot00000000000000.. _install: Installation ============ System-Wide Installation ------------------------ Installing the ``internetarchive`` library globally on your system can be done with `pip `_. This is the recommended method for installing ``internetarchive`` (`see below `_ for details on installing pip):: $ sudo pip install internetarchive or, with `easy_install `_:: $ sudo easy_install internetarchive Either of these commands will install the ``internetarchive`` Python library and ``ia`` command-line tool on your system. **Note**: Some versions of Mac OS X come with Python libraries that are required by ``internetarchive`` (e.g. the Python package ``six``). This can cause installation issues. 
If your installation is failing with a message that looks something like:: OSError: [Errno 1] Operation not permitted: '/var/folders/bk/3wx7qs8d0x79tqbmcdmsk1040000gp/T/pip-TGyjVo-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/six-1.4.1-py2.7.egg-info' You can use the ``--ignore-installed`` parameter in ``pip`` to ignore the libraries that are already installed, and continue with the rest of the installation:: $ sudo pip install --ignore-installed internetarchive More details on this issue can be found here: https://github.com/pypa/pip/issues/3165 Installing Pip ~~~~~~~~~~~~~~ Pip can be `installed with the get-pip.py script `_:: $ curl -LOs https://bootstrap.pypa.io/get-pip.py $ python get-pip.py virtualenv ---------- If you don't want to, or can't, install the package system-wide, you can use ``virtualenv`` to create an isolated Python environment. First, make sure ``virtualenv`` is installed on your system. If it's not, you can install it with pip:: $ sudo pip install virtualenv With ``easy_install``:: $ sudo easy_install virtualenv Or with your system's package manager, ``apt-get`` for example:: $ sudo apt-get install python-virtualenv Once you have ``virtualenv`` installed on your system, create a virtualenv:: $ mkdir myproject $ cd myproject $ virtualenv venv New python executable in venv/bin/python Installing setuptools, pip............done. Activate your virtualenv:: $ . venv/bin/activate Install ``internetarchive`` into your virtualenv:: $ pip install internetarchive Snap ---- You can install the latest ``ia`` `snap `_, and help test the most recent changes of the master branch in `all the supported Linux distros `_ with:: $ sudo snap install ia --edge Every time a new version of ``ia`` is pushed to the store, it will be updated automatically. Binaries -------- Binaries are also available for the ``ia`` command-line tool:: $ curl -LOs https://archive.org/download/ia-pex/ia $ chmod +x ia Binaries are generated with `PEX `_.
The only requirement for using the binaries is that you have Python installed on a Unix-like operating system. For more details on the command-line interface please refer to the `README `_, or ``ia help``. Get the Code ------------ Internetarchive is `actively developed on GitHub `_. You can either clone the public repository:: $ git clone git://github.com/jjjake/internetarchive.git Download the `tarball `_:: $ curl -OL https://github.com/jjjake/internetarchive/tarball/master Or, download the `zipball `_:: $ curl -OL https://github.com/jjjake/internetarchive/zipball/master Once you have a copy of the source, you can install it into your site-packages easily:: $ python setup.py install python-internetarchive-1.9.9/docs/source/internetarchive.rst000066400000000000000000000016601400433614500244220ustar00rootroot00000000000000:orphan: .. _internetarchive: Internetarchive: A Python Interface to archive.org ================================================== .. automodule:: internetarchive :class:`internetarchive.Item` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autoclass:: Item :members: :show-inheritance: :class:`internetarchive.File` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autoclass:: File :members: :show-inheritance: :class:`internetarchive.Search` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autoclass:: Search :members: :show-inheritance: :class:`internetarchive.Catalog` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autoclass:: Catalog :members: :show-inheritance: :class:`internetarchive.ArchiveSession` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. autoclass:: ArchiveSession :members: :show-inheritance: :mod:`internetarchive.api` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. automodule:: internetarchive.api :members: :show-inheritance: python-internetarchive-1.9.9/docs/source/jq.rst000066400000000000000000000241441400433614500216440ustar00rootroot00000000000000.. _jq: Using jq with ia ================ `jq `_ is a lightweight and flexible command-line JSON processor. 
It's a great tool for processing the JSON output of ``ia``. This document will go over how to install or download ``jq`` and how to use it with ``ia``. If you have a tip you'd like to add to this page, please email `jake@archive.org `_ or send a pull request. If you're unable to figure out a ``jq`` command to do what you need and don't see it on this page, please email `jake@archive.org `_ for help. Installation ------------ Downloading a binary ^^^^^^^^^^^^^^^^^^^^ The easiest way to get started with ``jq`` is to download a binary. Binaries for Linux, OS X, and Windows are available at `https://stedolan.github.io/jq/download/ `_. Once you find the binary for your OS, you could right-click the hypertext and copy the link to the binary. Then you could paste it into your terminal and download it like so: .. code:: bash $ curl -Ls https://github.com/stedolan/jq/releases/download/jq-1.5/jq-osx-amd64 > jq $ chmod +x jq # make it executable To confirm it's working, simply run the following. You should see the help page. .. code:: bash $ ./jq jq - commandline JSON processor [version 1.5] Usage: ./jq [options] [file...] jq is a tool for processing JSON inputs, applying the given filter to its JSON text inputs and producing the filter's results as JSON on standard output. The simplest filter is ., which is the identity filter, copying jq's input to its output unmodified (except for formatting). 
For more advanced filters see the jq(1) manpage ("man jq") and/or https://stedolan.github.io/jq Some of the options include: -c compact instead of pretty-printed output; -n use `null` as the single input value; -e set the exit status code based on the output; -s read (slurp) all inputs into an array; apply filter to it; -r output raw strings, not JSON texts; -R read raw strings, not JSON texts; -C colorize JSON; -M monochrome (don't colorize JSON); -S sort keys of objects on output; --tab use tabs for indentation; --arg a v set variable $a to value ; --argjson a v set variable $a to JSON value ; --slurpfile a f set variable $a to an array of JSON texts read from ; See the manpage for more options. Just like the ``ia`` binary, downloading the ``jq`` binary does not install it to your system. It's simply an executable binary. To use it, you'll have to use either a relative or absolute path. For example: .. code:: bash $ ~/jq --help $ ./jq --help $ /Users/jake/jq --help Installing with a package manager ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``jq`` can also be installed with most popular package managers: .. code:: bash # Linux $ sudo apt-get install jq # OS X $ brew install jq # FreeBSD $ pkg install jq # Solaris $ pkgutil -i jq # Windows $ choco install jq Please refer to `https://stedolan.github.io/jq/download/ `_ for more details. Getting started --------------- ``jq`` can seem a bit overwhelming at first, so let's get started with some basic examples. A good way to make sense of how you can access a specific metadata field is to use ``jq 'keys'``. This will show you the top-level keys that exist in the JSON document. .. code:: bash $ ia metadata nasa | jq 'keys' [ "created", "d1", "d2", "dir", "files", "files_count", "is_collection", "item_size", "metadata", "reviews", "server", "uniq", "workable_servers" ] To access the value of a given key, you can simply do: ..
code:: bash $ ia metadata nasa | jq '.files_count' 8 As you can see, the command above returns the value for the ``files_count`` key. There are 8 files in the item. When working with ``ia metadata``, the ``metadata`` and ``files`` keys are likely to be the targets you'll want to access most. Let's take a look at ``metadata``: .. code:: bash $ ia metadata | jq '.metadata | keys' [ "addeddate", "backup_location", "collection", "description", "hidden", "homepage", "identifier", "mediatype", "num_recent_reviews", "num_subcollections", "num_top_dl", "publicdate", "related_collection", "rights", "show_browse_by_date", "show_hidden_subcollections", "show_search_by_year", "spotlight_identifier", "title", "updatedate", "updater", "uploader" ] As you might notice, this is all of the item-level metadata (i.e. the JSON equivalent of an item's ``_meta.xml`` file). We can descend deeper into the JSON document like so: .. code:: bash $ ia metadata nasa | jq '.metadata.title' "NASA Images" ``jq`` returns JSON by default. In this case, a quoted string. To access the raw value, you can use the ``-r`` option: .. code:: bash $ ia metadata nasa | jq -r '.metadata.title' NASA Images Search ------ ``ia search`` outputs JSONL. JSONL is a series of JSON documents separated by newlines. In this case, one JSON document is returned per search result. Converting search results to CSV and other formats ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``jq`` can be used to parse the JSON returned by ``ia search`` into CSV or TSV files: .. code:: bash $ ia search 'identifier:nasa OR identifier:stairs' --field title,date,subject | jq -r '[.identifier, .title, .date, .subject] | @csv' "nasa","NASA Images",, "stairs","stairs where i worked","2004-01-01T00:00:00Z","test" If you'd prefer a tab-separated spreadsheet, you can replace ``@csv`` with ``@tsv`` in the command above. More options can be found in the *Format strings and escaping* section in the `jq manual `_.
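Where ``jq`` is not available, the same JSONL-to-CSV conversion can be sketched with Python's standard library. This is only an illustration, not part of ``ia``: the function name ``jsonl_to_csv`` is hypothetical, the sample records simply mirror the search output shown above, and joining multi-valued fields with ``;`` is an assumption made for this sketch.

```python
import csv
import io
import json


def jsonl_to_csv(jsonl_text, fields):
    """Convert JSONL text (one JSON object per line) into a CSV string.

    Hypothetical helper for illustration; it is not part of the
    internetarchive library.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between documents
        doc = json.loads(line)
        row = []
        for field in fields:
            value = doc.get(field, "")  # missing fields become empty cells
            if isinstance(value, list):
                # Assumption: flatten multi-valued fields with ";"
                value = ";".join(str(v) for v in value)
            row.append(value)
        writer.writerow(row)
    return out.getvalue()


# Sample records shaped like the search results shown above.
sample = "\n".join([
    '{"identifier": "nasa", "title": "NASA Images"}',
    '{"identifier": "stairs", "title": "stairs where i worked", '
    '"date": "2004-01-01T00:00:00Z", "subject": "test"}',
])

print(jsonl_to_csv(sample, ["identifier", "title", "date", "subject"]), end="")
```

In practice the JSONL text would come from the captured output of ``ia search``. Unlike ``@csv``, the ``csv`` module only quotes values that need it, so the cells are the same but the quoting differs.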
Catalog ------- Get info on all of your IA-S3 tasks: .. code:: bash $ ia tasks --json | jq 'select(.args.comment == "s3-put")' Or, output a link to the tasklog for each S3 task you currently have queued or running: .. code:: bash $ ia tasks nasa --json \ | jq -r 'select(.args.comment == "s3-put") | "https://archive.org/log/\(.task_id)"' https://archive.org/log/469558161 https://archive.org/log/400818482 Get the identifiers for all of your redrows: .. code:: bash $ ia tasks --json | jq -r 'select(.row_type == "red").identifier' TODO ____ Recipes to document, work in progress... Select files of a specific format ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code:: bash $ ia metadata nasa | jq '.files[] | select(.format == "JPEG")' { "name": "globe_west_540.jpg", "source": "original", "size": "66065", "format": "JPEG", "mtime": "1245274910", "md5": "9366a4b09386bf673c447e33d806d904", "crc32": "2283b5fd", "sha1": "3e20a009994405f535cdf07cdc2974cef2fce8f2", "rotation": "0" } Select a file by name ^^^^^^^^^^^^^^^^^^^^^ .. code:: bash $ ia metadata nasa | jq '.files[] | select(.name == "nasa_meta.xml")' { "name": "nasa_meta.xml", "source": "metadata", "size": "7968", "format": "Metadata", "mtime": "1530756295", "md5": "06cd95343d60df0f10fb8518b349a795", "crc32": "6b9c6e24", "sha1": "c0dc994eeba245671ef53e2f6c52612722bf51d3" } Get the size of a collection ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code:: bash » ia search 'collection:georgeblood' -f item_size | jq '.item_size' | paste -sd+ - | bc 51677834206186 Getting checksums for all files in an item ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. 
code:: bash $ ia metadata nasa | jq -r '.metadata.identifier as $id | .files[] | [$id, .name, .md5] | @tsv' nasa NASAarchiveLogo.jpg 64dcc1092df36142eb4aab7cc255a4a6 nasa __ia_thumb.jpg c354f821954f80516d163c23135e7dd7 nasa globe_west_540.jpg 9366a4b09386bf673c447e33d806d904 nasa globe_west_540_thumb.jpg d3dab682c56058c8af0df5a2073b1dd1 nasa nasa_archive.torrent 70a7b2b44c318bac381c25febca3b2ca nasa nasa_files.xml 5b8a61ea930ce04d093deebe260fd5f8 nasa nasa_meta.xml 06cd95343d60df0f10fb8518b349a795 nasa nasa_reviews.xml 711ba65d49383a25657640716c45e840 Creating histograms ^^^^^^^^^^^^^^^^^^^ This example totals ``item_size`` per publisher and lists the ten publishers with the largest totals. .. code:: bash $ ia search 'collection:georgeblood' -f publisher,item_size \ | jq -r '"\(.publisher) \(.item_size)"' \ | awk '{arr[$1]+=$2} END {for (i in arr) {print i,arr[i]}}' \ | sort -rn -k2 \ | head Decca 9518737758200 Victor 8067854677756 Columbia 7221975357654 Capitol 1944338651172 Brunswick 1574280922547 Bluebird 1058465142211 Mercury 1003001910967 MGM 898067089555 Okeh 808308437878 Vocalion 608766709327 Get total imagecount of a collection ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code:: bash $ ia search 'scanningcenter:uoft AND shiptracking:ace54704' -f imagecount | jq '.imagecount' | paste -sd+ - | bc 8172 Selecting files based on filesize ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Get the filenames of every PDF file in ``goodytwoshoes00newyiala`` that is larger than 3000 bytes: .. code:: bash $ ia metadata goodytwoshoes00newyiala \ | jq -r '.files[] | select(.name | endswith(".pdf")) | select((.size | tonumber) > 3000) | .name' goodytwoshoes00newyiala.pdf goodytwoshoes00newyiala_bw.pdf You can also include the identifier in the output like so: ..
code:: bash $ ia metadata goodytwoshoes00newyiala \ | jq -r '.metadata.identifier as $i | .files[] | select(.name | endswith(".pdf")) | select((.size | tonumber) > 3000) | "\($i)/\(.name)"' goodytwoshoes00newyiala/goodytwoshoes00newyiala.pdf goodytwoshoes00newyiala/goodytwoshoes00newyiala_bw.pdf python-internetarchive-1.9.9/docs/source/metadata.rst000066400000000000000000000275401400433614500230150ustar00rootroot00000000000000Internet Archive Metadata ========================= `Metadata `_ is data about data. In the case of Internet Archive items, the metadata describes the contents of the items. Metadata can include information such as the performance date for a concert, the name of the artist, and a set list for the event. Metadata is a very important element of items in the Internet Archive. Metadata allows people to locate and view information. Items with little or poor metadata may never be seen and can become lost. Note that metadata keys must be valid XML tags. Please refer to the XML Naming Rules section `here `_. Archive.org Identifiers ----------------------- Each item at Internet Archive has an identifier. An identifier is composed of any unique combination of alphanumeric characters, underscore (``_``) and dash (``-``). While there are no official limits it is strongly suggested that identifiers be between 5 and 80 characters in length. Identifiers must be unique across the entirety of Internet Archive, not simply unique within a single collection. Once defined an identifier **can not** be changed. It will travel with the item or object and is involved in every manner of accessing or referring to the item. Standard Internet Archive Metadata Fields ----------------------------------------- There are several standard metadata fields recognized for Internet Archive items. Most metadata fields are optional. addeddate ^^^^^^^^^ Contains the date on which the item was added to Internet Archive. Please use an `ISO 8601`_ compatible format for this date. 
For instance, these are all valid date formats: - YYYY - YYYY-MM-DD - YYYY-MM-DD HH:MM:SS While it is possible to set the ``addeddate`` metadata value, it is not recommended. This value is typically set by automated processes. adder ^^^^^ The name of the account which added the item to the Internet Archive. While it is possible to set the ``adder`` metadata value, it is not recommended. This value is typically set by automated processes. collection ^^^^^^^^^^ A collection is a specialized item used for curation and aggregation of other items. Assigning an item to a collection defines where the item may be located by a user browsing Internet Archive. A collection **must** exist prior to assigning any items to it. Currently, collections can only be created by Internet Archive staff members. Please `contact Internet Archive `_ if you need a collection created. All items **should** belong to a collection. If a collection is not specified at the time of upload, the item will be added to the `Community texts `_ collection. For testing purposes, you may upload to the ``test_collection`` collection. The following collections are also available to the public at the time of writing: * `Community Audio `_ * `Community Media `_ * `Community Software `_ * `Community Texts `_ (default collection) * `Community Video `_ * `Test collection `_ contributor ^^^^^^^^^^^ The value of the ``contributor`` metadata field is information about the entity responsible for making contributions to the content of the item. This is often the library, organization or individual making the item available on Internet Archive. The value of this metadata field may contain HTML. ``