parfive-1.0.0/0000755000175000017500000000000013462277352014045 5ustar stuartstuart00000000000000parfive-1.0.0/.circleci/0000755000175000017500000000000013462277352015700 5ustar stuartstuart00000000000000parfive-1.0.0/.circleci/config.yml0000644000175000017500000000335213424341410017654 0ustar stuartstuart00000000000000skip-check: &skip-check name: Check for [ci skip] command: bash .circleci/early_exit.sh apt-run: &apt-install name: Install apt packages command: | sudo apt update sudo apt install -y graphviz build-essential tox-install: &tox-install name: Install Tox command: | sudo pip install tox version: 2 jobs: egg-info-37: docker: - image: circleci/python:3.7 steps: - checkout - run: python setup.py egg_info html-docs: docker: - image: circleci/python:3.6 steps: - checkout - run: *skip-check - run: *apt-install - run: *tox-install - run: tox -e build_docs - store_artifacts: path: docs/_build/html - run: name: "Built documentation is available at:" command: DOCS_URL="${CIRCLE_BUILD_URL}/artifacts/${CIRCLE_NODE_INDEX}/${CIRCLE_WORKING_DIRECTORY/#\~/$HOME}/docs/_build/html/index.html"; echo $DOCS_URL tests_37: docker: - image: circleci/python:3.7 steps: - checkout - run: *skip-check - run: *apt-install - run: *tox-install - run: tox -e py37 tests_36: docker: - image: circleci/python:3.6 steps: - checkout - run: *skip-check - run: *apt-install - run: *tox-install - run: tox -e py36 tests_35: docker: - image: circleci/python:3.5 steps: - checkout - run: *skip-check - run: *apt-install - run: *tox-install - run: tox -e py35 workflows: version: 2 egg-info: jobs: - egg-info-37 tests: jobs: - tests_37 - tests_36 - tests_35 test-documentation: jobs: - html-docs notify: webhooks: - url: https://giles.cadair.com/circleci parfive-1.0.0/.circleci/early_exit.sh0000755000175000017500000000035313424341410020366 0ustar stuartstuart00000000000000#!/bin/bash commitmessage=$(git log --pretty=%B -n 1) if [[ $commitmessage = *"[ci skip]"* ]] || [[ $commitmessage = *"[skip ci]"* ]]; 
then echo "Skipping build because [ci skip] found in commit message" circleci step halt fi parfive-1.0.0/.gitignore0000644000175000017500000000054513437774543016047 0ustar stuartstuart00000000000000*.py[cod] .eggs/** # C extensions *.so # Packages *.egg *.egg-info dist build eggs parts bin var sdist develop-eggs .installed.cfg lib lib64 __pycache__ # Installer logs pip-log.txt # Unit test / coverage reports .coverage .tox nosetests.xml # Translations *.mo # Mr Developer .mr.developer.cfg .project .pydevproject docs/_build/ docs/api/ htmlcov/ parfive-1.0.0/.readthedocs.yml0000644000175000017500000000021313424341410017110 0ustar stuartstuart00000000000000build: image: latest requirements_file: docs/rtd-requirements.txt python: version: 3.6 pip_install: false setup_py_install: true parfive-1.0.0/LICENSE0000644000175000017500000000204613311707062015041 0ustar stuartstuart00000000000000Copyright (c) 2017-2018 Stuart Mumford Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.parfive-1.0.0/PKG-INFO0000644000175000017500000000552013462277352015144 0ustar stuartstuart00000000000000Metadata-Version: 2.1
Name: parfive
Version: 1.0.0
Summary: An HTTP and FTP parallel file downloader.
Home-page: https://parfive.readthedocs.io/
Author: "Stuart Mumford"
Author-email: "stuart@cadair.com"
License: MIT
Description: ParFive
        =======

        .. image:: https://img.shields.io/pypi/v/parfive.svg
            :target: https://pypi.python.org/pypi/parfive
            :alt: Latest PyPI version

        A parallel file downloader using asyncio.

        Usage
        -----

        parfive works by creating a downloader object, appending files to it
        and then running the download. parfive has a synchronous API, but uses
        asyncio to parallelise downloading the files.

        A simple example is::

            from parfive import Downloader

            dl = Downloader()
            dl.enqueue_file("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt", path="./")
            files = dl.download()

        Results
        ^^^^^^^

        ``parfive.Downloader.download`` returns a ``parfive.Results`` object,
        which is a list of the filenames that have been downloaded. It also
        tracks any files which failed to download.

        Handling Errors
        ^^^^^^^^^^^^^^^

        If files fail to download, the URLs and the server responses are
        stored in the ``Results`` object returned by ``parfive.Downloader``.
        These can be used to inform users about the errors. (Note, the
        progress bar will finish in an incomplete state if a download fails,
        i.e. it will show ``4/5 Files Downloaded``.)

        The ``Results`` object is a list with an extra attribute ``errors``;
        this property returns a list of named tuples, each of which contains
        ``.url`` and ``.response``, where the response is an
        ``aiohttp.ClientResponse`` or an ``aiohttp.ClientError`` object.
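        As a concrete sketch of inspecting failures (the host name below is
        deliberately unresolvable, so the download always fails; this mirrors
        the package's own test suite rather than documented example code):

```python
from parfive import Downloader

dl = Downloader(progress=False)
# This host does not exist, so the request raises a client error.
dl.enqueue_file("http://notaurl.wibble/file", path="./")
results = dl.download()

# Each entry in ``results.errors`` is a named tuple of
# (filepath_partial, url, exception).
for error in results.errors:
    print(error.url, error.exception)
```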
        Installation
        ------------

        parfive is available on PyPI; you can install it with pip::

            pip install parfive

        or, if you want to use FTP downloads::

            pip install parfive[ftp]

        Requirements
        ^^^^^^^^^^^^

        - Python 3.5+
        - aiohttp
        - tqdm
        - aioftp (for downloads over FTP)

        Licence
        -------

        MIT Licensed

        Authors
        -------

        `parfive` was written by `Stuart Mumford `_.

Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Provides-Extra: ftp
Provides-Extra: test
parfive-1.0.0/README.rst0000644000175000017500000000341113424341410015524 0ustar stuartstuart00000000000000ParFive
=======

.. image:: https://img.shields.io/pypi/v/parfive.svg
    :target: https://pypi.python.org/pypi/parfive
    :alt: Latest PyPI version

A parallel file downloader using asyncio.

Usage
-----

parfive works by creating a downloader object, appending files to it and then running the download. parfive has a synchronous API, but uses asyncio to parallelise downloading the files.

A simple example is::

    from parfive import Downloader

    dl = Downloader()
    dl.enqueue_file("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt", path="./")
    files = dl.download()

Results
^^^^^^^

``parfive.Downloader.download`` returns a ``parfive.Results`` object, which is a list of the filenames that have been downloaded. It also tracks any files which failed to download.

Handling Errors
^^^^^^^^^^^^^^^

If files fail to download, the URLs and the server responses are stored in the ``Results`` object returned by ``parfive.Downloader``. These can be used to inform users about the errors. (Note, the progress bar will finish in an incomplete state if a download fails, i.e. it will show ``4/5 Files Downloaded``.)
The ``Results`` object is a list with an extra attribute ``errors``; this property returns a list of named tuples, each of which contains ``.url`` and ``.response``, where the response is an ``aiohttp.ClientResponse`` or an ``aiohttp.ClientError`` object.

Installation
------------

parfive is available on PyPI; you can install it with pip::

    pip install parfive

or, if you want to use FTP downloads::

    pip install parfive[ftp]

Requirements
^^^^^^^^^^^^

- Python 3.5+
- aiohttp
- tqdm
- aioftp (for downloads over FTP)

Licence
-------

MIT Licensed

Authors
-------

`parfive` was written by `Stuart Mumford `_.
parfive-1.0.0/docs/0000755000175000017500000000000013462277352014775 5ustar stuartstuart00000000000000parfive-1.0.0/docs/Makefile0000644000175000017500000000110413424341410016412 0ustar stuartstuart00000000000000# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)parfive-1.0.0/docs/conf.py0000644000175000017500000000436413462275720016300 0ustar stuartstuart00000000000000# -*- coding: utf-8 -*-
#
# Configuration file for the Sphinx documentation builder.
from pkg_resources import get_distribution from sphinx_astropy.conf.v1 import * # -- Project information ----------------------------------------------------- project = 'Parfive' copyright = '2018, Stuart Mumford' author = 'Stuart Mumford' release = get_distribution('parfive').version # for example take major/minor version = '.'.join(release.split('.')[:2]) # -- General configuration --------------------------------------------------- # If your documentation needs a minimal Sphinx version, state it here. # # needs_sphinx = '1.0' # The suffix(es) of source filenames. # You can specify multiple suffix as a list of string: # # source_suffix = ['.rst', '.md'] source_suffix = '.rst' # The master toctree document. master_doc = 'index' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. # # This is also used if you do content translation via gettext catalogs. # Usually you set "language" from the command line for these cases. language = None # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. # This pattern also affects html_static_path and html_extra_path. exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] # The name of the Pygments (syntax highlighting) style to use. pygments_style = None # -- Options for HTML output ------------------------------------------------- try: from sunpy_sphinx_theme.conf import * except ImportError: html_theme = 'alabaster' html_theme_options = {'logo_url': 'https://parfive.readthedocs.io/en/latest/'} # -- Options for HTMLHelp output --------------------------------------------- # Output file base name for HTML help builder. htmlhelp_basename = 'Parfivedoc' # -- Extension configuration ------------------------------------------------- # -- Options for intersphinx extension --------------------------------------- # Example configuration for intersphinx: refer to the Python standard library. 
intersphinx_mapping = {'https://docs.python.org/': None,
                       'http://aiohttp.readthedocs.io/en/stable': None,
                       'https://aioftp.readthedocs.io/': None}
parfive-1.0.0/docs/index.rst0000644000175000017500000000303413462276707016641 0ustar stuartstuart00000000000000Parfive
=======

.. toctree::
   :hidden:

   self

Parfive is a small library for downloading files; its objective is to provide a simple API for queuing files for download and then to give excellent feedback to the user about the downloads in progress. It also aims to provide a clear interface for inspecting any failed downloads.

The parfive package was motivated by the needs of `SunPy's `__ ``net`` submodule, but should be generally applicable to anyone who wants a user-friendly way of downloading multiple files in parallel.

Parfive supports downloading files over either HTTP or FTP, using `aiohttp `__ and `aioftp `__. ``aioftp`` is an optional dependency, which does not need to be installed to download files over HTTP.

Installation
------------

parfive can be installed via pip::

    pip install parfive

or with FTP support::

    pip install parfive[ftp]

or with conda from conda-forge::

    conda install -c conda-forge parfive

or from `GitHub `__.

Usage
-----

parfive works by creating a downloader object, queuing downloads with it and then running the download. parfive has a synchronous API, but uses `asyncio` to parallelise downloading the files.

A simple example is::

    from parfive import Downloader

    dl = Downloader()
    dl.enqueue_file("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt", path="./")
    files = dl.download()

..
automodapi:: parfive :no-heading: :no-main-docstr: parfive-1.0.0/docs/rtd-requirements.txt0000644000175000017500000000006613424341410021033 0ustar stuartstuart00000000000000aiohttp tqdm aioftp sunpy-sphinx-theme sphinx-astropy parfive-1.0.0/parfive/0000755000175000017500000000000013462277352015501 5ustar stuartstuart00000000000000parfive-1.0.0/parfive/__init__.py0000644000175000017500000000065013424556534017613 0ustar stuartstuart00000000000000"""parfive - A parallel file downloader using asyncio.""" from pkg_resources import get_distribution, DistributionNotFound from .downloader import Downloader from .results import Results __all__ = ['Downloader', 'Results'] __author__ = 'Stuart Mumford ' try: __version__ = get_distribution(__name__).version except DistributionNotFound: # package is not installed __version__ = "unknown" parfive-1.0.0/parfive/downloader.py0000644000175000017500000004102413462264630020205 0ustar stuartstuart00000000000000import asyncio import contextlib import os import pathlib import urllib.parse from concurrent.futures import ThreadPoolExecutor from functools import partial import aiohttp from tqdm import tqdm, tqdm_notebook from .results import Results from .utils import (FailedDownload, Token, default_name, get_filepath, get_ftp_size, get_http_size, in_notebook, run_in_thread) try: import aioftp except ImportError: aioftp = None __all__ = ['Downloader'] class Downloader: """ Download files in parallel. Parameters ---------- max_conn : `int`, optional The number of parallel download slots. progress : `bool`, optional If `True` show a main progress bar showing how many of the total files have been downloaded. If `False`, no progress bars will be shown at all. file_progress : `bool`, optional If `True` and ``progress`` is true, show ``max_conn`` progress bars detailing the progress of each individual file being downloaded. loop : `asyncio.AbstractEventLoop`, optional The event loop to use to download the files. 
If not specified a new loop will be created and executed in a new thread so it does not interfere with any currently running event loop. notebook : `bool`, optional If `True` tqdm will be used in notebook mode. If `None` an attempt will be made to detect the notebook and guess which progress bar to use. overwrite : `bool` or `str`, optional Determine how to handle downloading if a file already exists with the same name. If `False` the file download will be skipped and the path returned to the existing file, if `True` the file will be downloaded and the existing file will be overwritten, if `'unique'` the filename will be modified to be unique. """ def __init__(self, max_conn=5, progress=True, file_progress=True, loop=None, notebook=None, overwrite=False): self.max_conn = max_conn self._start_loop(loop) # Configure progress bars if notebook is None: notebook = in_notebook() self.progress = progress self.file_progress = file_progress if self.progress else False self.tqdm = tqdm if not notebook else tqdm_notebook self.overwrite = overwrite def _start_loop(self, loop): # Setup asyncio loops if not loop: aio_pool = ThreadPoolExecutor(1) self.loop = asyncio.new_event_loop() self.run_until_complete = partial(run_in_thread, aio_pool, self.loop) else: self.loop = loop self.run_until_complete = self.loop.run_until_complete # Setup queues self.http_queue = asyncio.Queue(loop=self.loop) self.http_tokens = asyncio.Queue(maxsize=self.max_conn, loop=self.loop) self.ftp_queue = asyncio.Queue(loop=self.loop) self.ftp_tokens = asyncio.Queue(maxsize=self.max_conn, loop=self.loop) for i in range(self.max_conn): self.http_tokens.put_nowait(Token(i + 1)) self.ftp_tokens.put_nowait(Token(i + 1)) @property def queued_downloads(self): """ The total number of files already queued for download. """ return self.http_queue.qsize() + self.ftp_queue.qsize() def enqueue_file(self, url, path=None, filename=None, overwrite=None, **kwargs): """ Add a file to the download queue. 
Parameters ---------- url : `str` The URL to retrieve. path : `str`, optional The directory to retrieve the file into, if `None` defaults to the current directory. filename : `str` or `callable`, optional The filename to save the file as. Can also be a callable which takes two arguments the url and the response object from opening that URL, and returns the filename. (Note, for FTP downloads the response will be ``None``.) If `None` the HTTP headers will be read for the filename, or the last segment of the URL will be used. overwrite : `bool` or `str`, optional Determine how to handle downloading if a file already exists with the same name. If `False` the file download will be skipped and the path returned to the existing file, if `True` the file will be downloaded and the existing file will be overwritten, if `'unique'` the filename will be modified to be unique. If `None` the value set when constructing the `~parfive.Downloader` object will be used. kwargs : `dict` Extra keyword arguments are passed to `aiohttp.ClientSession.get` or `aioftp.ClientSession` depending on the protocol. 
""" overwrite = overwrite or self.overwrite if path is None and filename is None: raise ValueError("Either path or filename must be specified.") elif path is None: path = './' path = pathlib.Path(path) if not filename: filepath = partial(default_name, path) elif callable(filename): filepath = filename else: # Define a function because get_file expects a callback def filepath(*args): return path / filename scheme = urllib.parse.urlparse(url).scheme if scheme in ('http', 'https'): get_file = partial(self._get_http, url=url, filepath_partial=filepath, overwrite=overwrite, **kwargs) self.http_queue.put_nowait(get_file) elif scheme == 'ftp': if aioftp is None: raise ValueError("The aioftp package must be installed to download over FTP.") get_file = partial(self._get_ftp, url=url, filepath_partial=filepath, overwrite=overwrite, **kwargs) self.ftp_queue.put_nowait(get_file) else: raise ValueError("URL must start with either 'http' or 'ftp'.") def download(self, timeouts=None): """ Download all files in the queue. Parameters ---------- timeouts : `dict`, optional Overrides for the default timeouts for http downloads. Supported keys are any accepted by the `aiohttp.ClientTimeout` class. Defaults to 5 minutes for total session timeout and 90 seconds for socket read timeout. Returns ------- filenames : `parfive.Results` A list of files downloaded. Notes ----- The defaults for the `'total'` and `'sock_read'` timeouts can be overridden by two environment variables ``PARFIVE_TOTAL_TIMEOUT`` and ``PARFIVE_SOCK_READ_TIMEOUT``. """ timeouts = timeouts or {"total": os.environ.get("PARFIVE_TOTAL_TIMEOUT", 5 * 60), "sock_read": os.environ.get("PARFIVE_SOCK_READ_TIMEOUT", 90)} try: future = self.run_until_complete(self._run_download(timeouts)) finally: self.loop.stop() dlresults = future.result() results = Results() # Iterate through the results and store any failed download errors in # the errors list of the results object. 
for res in dlresults: if isinstance(res, FailedDownload): results.add_error(res.filepath_partial, res.url, res.exception) elif isinstance(res, Exception): raise res else: results.append(res) return results def retry(self, results): """ Retry any failed downloads in a results object. .. note:: This will start a new event loop. Parameters ---------- results : `parfive.Results` A previous results object, the ``.errors`` property will be read and the downloads retried. Returns ------- results : `parfive.Results` A modified version of the input ``results`` with all the errors from this download attempt and any new files appended to the list of file paths. """ # Restart the loop. self._start_loop(None) for err in results.errors: self.enqueue_file(err.url, filename=err.filepath_partial) new_res = self.download() results += new_res results._errors = new_res._errors return results def _get_main_pb(self, total): """ Return the tqdm instance if we want it, else return a contextmanager that just returns None. """ if self.progress: return self.tqdm(total=total, unit='file', desc="Files Downloaded", position=0) else: return contextlib.contextmanager(lambda: iter([None]))() async def _run_download(self, timeouts): """ Download all files in the queue. Returns ------- results : `parfive.Results` A list of filenames which successfully downloaded. This list also has an attribute ``errors`` which lists any failed urls and their error. """ total_files = self.http_queue.qsize() + self.ftp_queue.qsize() done = set() with self._get_main_pb(total_files) as main_pb: if not self.http_queue.empty(): done.update(await self._run_http_download(main_pb, timeouts)) if not self.ftp_queue.empty(): done.update(await self._run_ftp_download(main_pb, timeouts)) # Return one future to represent all the results. 
return asyncio.gather(*done, return_exceptions=True) async def _run_http_download(self, main_pb, timeouts): async with aiohttp.ClientSession(loop=self.loop) as session: futures = await self._run_from_queue(self.http_queue, self.http_tokens, main_pb, session=session, timeouts=timeouts) # Wait for all the coroutines to finish done, _ = await asyncio.wait(futures) return done async def _run_ftp_download(self, main_pb, timeouts): futures = await self._run_from_queue(self.ftp_queue, self.ftp_tokens, main_pb, timeouts=timeouts) # Wait for all the coroutines to finish done, _ = await asyncio.wait(futures) return done async def _run_from_queue(self, queue, tokens, main_pb, *, session=None, timeouts): futures = [] while not queue.empty(): get_file = await queue.get() token = await tokens.get() file_pb = self.tqdm if self.file_progress else False future = asyncio.ensure_future(get_file(session, token=token, file_pb=file_pb, timeouts=timeouts)) def callback(token, future, main_pb): tokens.put_nowait(token) # Update the main progressbar if main_pb and not future.exception(): main_pb.update(1) future.add_done_callback(partial(callback, token, main_pb=main_pb)) futures.append(future) return futures @staticmethod async def _get_http(session, *, url, filepath_partial, chunksize=100, file_pb=None, token, overwrite, timeouts, **kwargs): """ Read the file from the given url into the filename given by ``filepath_partial``. Parameters ---------- session : `aiohttp.ClientSession` The `aiohttp.ClientSession` to use to retrieve the files. url : `str` The url to retrieve. filepath_partial : `callable` A function to call which returns the filepath to save the url to. Takes two arguments ``resp, url``. chunksize : `int` The number of bytes to read into the file at a time. file_pb : `tqdm.tqdm` or `False` Should progress bars be displayed for each file downloaded. token : `parfive.downloader.Token` A token for this download slot. 
kwargs : `dict` Extra keyword arguments are passed to `aiohttp.ClientSession.get`. Returns ------- filepath : `str` The name of the file saved. """ timeout = aiohttp.ClientTimeout(**timeouts) try: async with session.get(url, timeout=timeout, **kwargs) as resp: if resp.status != 200: raise FailedDownload(filepath_partial, url, resp) else: filepath, skip = get_filepath(filepath_partial(resp, url), overwrite) if skip: return str(filepath) if callable(file_pb): file_pb = file_pb(position=token.n, unit='B', unit_scale=True, desc=filepath.name, leave=False, total=get_http_size(resp)) else: file_pb = None with open(str(filepath), 'wb') as fd: while True: chunk = await resp.content.read(chunksize) if not chunk: # Close the file progressbar if file_pb is not None: file_pb.close() return str(filepath) # Write this chunk to the output file. fd.write(chunk) # Update the progressbar for file if file_pb is not None: file_pb.update(chunksize) except Exception as e: raise FailedDownload(filepath_partial, url, e) @staticmethod async def _get_ftp(session=None, *, url, filepath_partial, file_pb=None, token, overwrite, timeouts, **kwargs): """ Read the file from the given url into the filename given by ``filepath_partial``. Parameters ---------- session : `None` A placeholder for API compatibility with ``_get_http`` url : `str` The url to retrieve. filepath_partial : `callable` A function to call which returns the filepath to save the url to. Takes two arguments ``resp, url``. file_pb : `tqdm.tqdm` or `False` Should progress bars be displayed for each file downloaded. token : `parfive.downloader.Token` A token for this download slot. kwargs : `dict` Extra keyword arguments are passed to `~aioftp.ClientSession`. Returns ------- filepath : `str` The name of the file saved. 
""" parse = urllib.parse.urlparse(url) try: async with aioftp.ClientSession(parse.hostname, **kwargs) as client: if parse.username and parse.password: client.login(parse.username, parse.password) # This has to be done before we start streaming the file: total_size = await get_ftp_size(client, parse.path) async with client.download_stream(parse.path) as stream: filepath, skip = get_filepath(filepath_partial(None, url), overwrite) if skip: return str(filepath) if callable(file_pb): file_pb = file_pb(position=token.n, unit='B', unit_scale=True, desc=filepath.name, leave=False, total=total_size) else: file_pb = None with open(str(filepath), 'wb') as fd: async for chunk in stream.iter_by_block(): # Write this chunk to the output file. fd.write(chunk) # Update the progressbar for file if file_pb is not None: file_pb.update(len(chunk)) # Close the file progressbar if file_pb is not None: file_pb.close() return str(filepath) except Exception as e: raise FailedDownload(filepath_partial, url, e) parfive-1.0.0/parfive/results.py0000644000175000017500000000474513462264433017562 0ustar stuartstuart00000000000000from collections import UserList, namedtuple import aiohttp from .utils import FailedDownload __all__ = ['Results'] class Results(UserList): """ The results of a download from `parfive.Downloader.download`. This object contains the filenames of successful downloads as well as a list of any errors encountered in the `~parfive.Results.errors` property. 
""" def __init__(self, *args, errors=None): super().__init__(*args) self._errors = errors or list() self._error = namedtuple("error", ("filepath_partial", "url", "exception")) def _get_nice_resp_repr(self, response): # This is a modified version of aiohttp.ClientResponse.__repr__ if isinstance(response, aiohttp.ClientResponse): ascii_encodable_url = str(response.url) if response.reason: ascii_encodable_reason = response.reason.encode('ascii', 'backslashreplace').decode('ascii') else: ascii_encodable_reason = response.reason return ''.format( ascii_encodable_url, response.status, ascii_encodable_reason) else: return repr(response) def __str__(self): out = super().__repr__() if self.errors: out += '\nErrors:\n' for error in self.errors: if isinstance(error, FailedDownload): resp = self._get_nice_resp_repr(error.exception) out += "(url={}, response={})\n".format(error.url, resp) else: out += "({})".format(repr(error)) return out def __repr__(self): out = object.__repr__(self) out += '\n' out += str(self) return out def add_error(self, filename, url, exception): """ Add an error to the results. """ if isinstance(exception, aiohttp.ClientResponse): exception._headers = None self._errors.append(self._error(filename, url, exception)) @property def errors(self): """ A list of errors encountered during the download. The errors are represented as a tuple containing ``(filepath, url, exception)`` where ``filepath`` is a function for generating a filepath, ``url`` is the url to be downloaded and ``exception`` is the error raised during download. 
""" return self._errors parfive-1.0.0/parfive/tests/0000755000175000017500000000000013462277352016643 5ustar stuartstuart00000000000000parfive-1.0.0/parfive/tests/__init__.py0000644000175000017500000000000013311545227020732 0ustar stuartstuart00000000000000parfive-1.0.0/parfive/tests/test_downloader.py0000644000175000017500000002000713462266765022417 0ustar stuartstuart00000000000000from pathlib import Path from unittest.mock import patch import aiohttp import pytest from pytest_localserver.http import WSGIServer from parfive.downloader import Downloader, Token, FailedDownload, Results def test_setup(event_loop): dl = Downloader(loop=event_loop) assert isinstance(dl, Downloader) assert dl.http_queue.qsize() == 0 assert dl.http_tokens.qsize() == 5 assert dl.ftp_queue.qsize() == 0 assert dl.ftp_tokens.qsize() == 5 def test_download(event_loop, httpserver, tmpdir): tmpdir = str(tmpdir) httpserver.serve_content('SIMPLE = T', headers={'Content-Disposition': "attachment; filename=testfile.fits"}) dl = Downloader(loop=event_loop) dl.enqueue_file(httpserver.url, path=Path(tmpdir)) assert dl.queued_downloads == 1 f = dl.download() assert len(f) == 1 assert Path(f[0]).name == "testfile.fits" def test_download_partial(event_loop, httpserver, tmpdir): tmpdir = str(tmpdir) httpserver.serve_content('SIMPLE = T') dl = Downloader(loop=event_loop) dl.enqueue_file(httpserver.url, filename=lambda resp, url: Path(tmpdir) / "filename") f = dl.download() assert len(f) == 1 # strip the http:// assert "filename" in f[0] def test_empty_download(event_loop, tmpdir): dl = Downloader(loop=event_loop) f = dl.download() assert len(f) == 0 def test_download_filename(event_loop, httpserver, tmpdir): httpserver.serve_content('SIMPLE = T') fname = "testing123" filename = str(tmpdir.join(fname)) with open(filename, "w") as fh: fh.write("SIMPLE = T") dl = Downloader(loop=event_loop) dl.enqueue_file(httpserver.url, filename=filename, chunksize=200) f = dl.download() assert isinstance(f, Results) 
    assert len(f) == 1
    assert f[0] == filename


def test_download_no_overwrite(event_loop, httpserver, tmpdir):
    httpserver.serve_content('SIMPLE = T')
    fname = "testing123"
    filename = str(tmpdir.join(fname))
    with open(filename, "w") as fh:
        fh.write("Hello world")
    dl = Downloader(loop=event_loop)
    dl.enqueue_file(httpserver.url, filename=filename, chunksize=200)
    f = dl.download()
    assert isinstance(f, Results)
    assert len(f) == 1
    assert f[0] == filename
    with open(filename) as fh:
        # If the contents is the same as when we wrote it, it hasn't been
        # overwritten
        assert fh.read() == "Hello world"


def test_download_overwrite(event_loop, httpserver, tmpdir):
    httpserver.serve_content('SIMPLE = T')
    fname = "testing123"
    filename = str(tmpdir.join(fname))
    with open(filename, "w") as fh:
        fh.write("Hello world")
    dl = Downloader(loop=event_loop, overwrite=True)
    dl.enqueue_file(httpserver.url, filename=filename, chunksize=200)
    f = dl.download()
    assert isinstance(f, Results)
    assert len(f) == 1
    assert f[0] == filename
    with open(filename) as fh:
        assert fh.read() == "SIMPLE = T"


def test_download_unique(event_loop, httpserver, tmpdir):
    httpserver.serve_content('SIMPLE = T')
    fname = "testing123"
    filename = str(tmpdir.join(fname))
    filenames = [filename, filename+'.fits', filename+'.fits.gz']

    dl = Downloader(loop=event_loop, overwrite='unique')

    # Write files to all the target filenames.
    for fn in filenames:
        with open(fn, "w") as fh:
            fh.write("Hello world")
        dl.enqueue_file(httpserver.url, filename=fn, chunksize=200)

    f = dl.download()
    assert isinstance(f, Results)
    assert len(f) == len(filenames)

    for fn in f:
        assert fn not in filenames
        assert "{fname}.1".format(fname=fname) in fn


@pytest.fixture
def testserver(request):
    """A server that throws a 404 for the second request."""
    counter = 0

    def simple_app(environ, start_response):
        """Simplest possible WSGI application."""
        nonlocal counter
        counter += 1
        if counter != 2:
            status = '200 OK'
            response_headers = [('Content-type', 'text/plain'),
                                ('Content-Disposition', ('testfile_{}'.format(counter)))]
            start_response(status, response_headers)
            return [b'Hello world!\n']
        else:
            status = '404'
            response_headers = [('Content-type', 'text/plain')]
            start_response(status, response_headers)
            return ""

    server = WSGIServer(application=simple_app)
    server.start()
    request.addfinalizer(server.stop)
    return server


def test_retrieve_some_content(testserver, tmpdir):
    """
    Test that the downloader handles errors properly.
    """
    tmpdir = str(tmpdir)

    dl = Downloader()

    nn = 5
    for i in range(nn):
        dl.enqueue_file(testserver.url, path=tmpdir)

    f = dl.download()
    assert len(f) == nn - 1
    assert len(f.errors) == 1


def test_no_progress(httpserver, tmpdir, capsys):
    tmpdir = str(tmpdir)
    httpserver.serve_content('SIMPLE = T')

    dl = Downloader(progress=False)

    dl.enqueue_file(httpserver.url, path=tmpdir)

    dl.download()

    # Check that there was no stdout
    captured = capsys.readouterr().out
    assert not captured


def throwerror(*args, **kwargs):
    raise ValueError("Out of Cheese.")


@patch("parfive.downloader.default_name", throwerror)
def test_raises_other_exception(httpserver, tmpdir):
    tmpdir = str(tmpdir)
    httpserver.serve_content('SIMPLE = T')

    dl = Downloader()
    dl.enqueue_file(httpserver.url, path=tmpdir)
    res = dl.download()

    assert isinstance(res.errors[0].exception, ValueError)


def test_token():
    t = Token(5)
    assert "5" in repr(t)
    assert "5" in str(t)


def test_failed_download():
    err = FailedDownload("wibble", "bbc.co.uk", "running away")

    assert "bbc.co.uk" in repr(err)
    assert "running away" in str(err)


def test_results():
    res = Results()
    res.append("hello")
    res.add_error("wibble", "notaurl", "out of cheese")

    assert "notaurl" in repr(res)
    assert "hello" in repr(res)
    assert "out of cheese" in repr(res)


def test_notaurl(tmpdir):
    tmpdir = str(tmpdir)

    dl = Downloader(progress=False)

    dl.enqueue_file("http://notaurl.wibble/file", path=tmpdir)

    f = dl.download()
    assert len(f.errors) == 1
    assert isinstance(f.errors[0].exception, aiohttp.ClientConnectionError)


def test_retry(tmpdir, testserver):
    tmpdir = str(tmpdir)
    dl = Downloader()

    nn = 5
    for i in range(nn):
        dl.enqueue_file(testserver.url, path=tmpdir)

    f = dl.download()
    assert len(f) == nn - 1
    assert len(f.errors) == 1

    f2 = dl.retry(f)
    assert len(f2) == nn
    assert len(f2.errors) == 0


def test_empty_retry():
    f = Results()
    dl = Downloader()
    dl.retry(f)


@pytest.mark.allow_hosts(True)
def test_ftp(tmpdir):
    tmpdir = str(tmpdir)
    dl = Downloader()
    dl.enqueue_file("ftp://ftp.swpc.noaa.gov/pub/warehouse/2011/2011_SRS.tar.gz", path=tmpdir)
    dl.enqueue_file("ftp://ftp.swpc.noaa.gov/pub/warehouse/2011/2013_SRS.tar.gz", path=tmpdir)
    dl.enqueue_file("ftp://ftp.swpc.noaa.gov/pub/_SRS.tar.gz", path=tmpdir)
    dl.enqueue_file("ftp://notaserver/notafile.fileL", path=tmpdir)

    f = dl.download()
    assert len(f) == 1
    assert len(f.errors) == 3


@pytest.mark.allow_hosts(True)
def test_ftp_http(tmpdir, httpserver):
    tmpdir = str(tmpdir)
    httpserver.serve_content('SIMPLE = T')
    dl = Downloader()
    dl.enqueue_file("ftp://ftp.swpc.noaa.gov/pub/warehouse/2011/2011_SRS.tar.gz", path=tmpdir)
    dl.enqueue_file("ftp://ftp.swpc.noaa.gov/pub/warehouse/2011/2013_SRS.tar.gz", path=tmpdir)
    dl.enqueue_file("ftp://ftp.swpc.noaa.gov/pub/_SRS.tar.gz", path=tmpdir)
    dl.enqueue_file("ftp://notaserver/notafile.fileL", path=tmpdir)
    dl.enqueue_file(httpserver.url, path=tmpdir)
    dl.enqueue_file("http://noaurl.notadomain/noafile", path=tmpdir)

    assert dl.queued_downloads == 6

    f = dl.download()
    assert len(f) == 2
    assert len(f.errors) == 4
parfive-1.0.0/parfive/utils.py0000644000175000017500000000650113462264620017207 0ustar stuartstuart00000000000000
import cgi
import pathlib
from itertools import count

__all__ = ['run_in_thread', 'Token', 'FailedDownload', 'default_name', 'in_notebook']


def in_notebook():
    try:
        import ipykernel.zmqshell
        shell = get_ipython()  # noqa
        if isinstance(shell, ipykernel.zmqshell.ZMQInteractiveShell):
            # Check that we can import the right widget
            from tqdm import _tqdm_notebook
            _tqdm_notebook.IntProgress
            return True
        return False
    except Exception:
        return False


def default_name(path, resp, url):
    url_filename = url.split('/')[-1]
    if resp:
        cdheader = resp.headers.get("Content-Disposition", None)
        if cdheader:
            value, params = cgi.parse_header(cdheader)
            name = params.get('filename', url_filename)
        else:
            name = url_filename
    else:
        name = url_filename
    return pathlib.Path(path) / name


def run_in_thread(aio_pool, loop,
                  coro):
    """
    This function returns the asyncio Future after running the loop in a
    thread. This makes the return value of this function the same as the
    return of ``loop.run_until_complete``.
    """
    return aio_pool.submit(loop.run_until_complete, coro).result()


async def get_ftp_size(client, filepath):
    """
    Given an `aioftp.ClientSession` object get the expected size of the file,
    return ``None`` if the size can not be determined.
    """
    try:
        size = await client.stat(filepath)
        size = size.get("size", None)
    except Exception:
        size = None

    return int(size) if size else size


def get_http_size(resp):
    size = resp.headers.get("content-length", None)
    return int(size) if size else size


def replacement_filename(path):
    """
    Given a path generate a unique filename.
    """
    path = pathlib.Path(path)

    if not path.exists():
        return path

    suffix = ''.join(path.suffixes)
    for c in count(1):
        if suffix:
            name, _ = path.name.split(suffix)
        else:
            name = path.name
        new_name = "{name}.{c}{suffix}".format(name=name, c=c, suffix=suffix)
        new_path = path.parent / new_name
        if not new_path.exists():
            return new_path


def get_filepath(filepath, overwrite):
    """
    Get the filepath to download to and ensure dir exists.
    """
    filepath = pathlib.Path(filepath)
    if filepath.exists():
        if not overwrite:
            return str(filepath), True
        if overwrite == 'unique':
            filepath = replacement_filename(filepath)

    if not filepath.parent.exists():
        filepath.parent.mkdir(parents=True)

    return filepath, False


class FailedDownload(Exception):
    def __init__(self, filepath_partial, url, exception):
        self.filepath_partial = filepath_partial
        self.url = url
        self.exception = exception
        super().__init__()

    def __repr__(self):
        out = super().__repr__()
        out += '\n {} {}'.format(self.url, self.exception)
        return out

    def __str__(self):
        return "Download Failed: {} with error {}".format(self.url, str(self.exception))


class Token:
    def __init__(self, n):
        self.n = n

    def __repr__(self):
        return super().__repr__() + "n = {}".format(self.n)

    def __str__(self):
        return "Token {}".format(self.n)
parfive-1.0.0/parfive.egg-info/0000755000175000017500000000000013462277352017173 5ustar stuartstuart00000000000000parfive-1.0.0/parfive.egg-info/PKG-INFO0000644000175000017500000000552013462277352020272 0ustar stuartstuart00000000000000
Metadata-Version: 2.1
Name: parfive
Version: 1.0.0
Summary: A HTTP and FTP parallel file downloader.
Home-page: https://parfive.readthedocs.io/
Author: "Stuart Mumford"
Author-email: "stuart@cadair.com"
License: MIT
Description: ParFive
        =======
        
        .. image:: https://img.shields.io/pypi/v/parfive.svg
            :target: https://pypi.python.org/pypi/parfive
            :alt: Latest PyPI version
        
        A parallel file downloader using asyncio.
        
        Usage
        -----
        
        parfive works by creating a downloader object, appending files to it and
        then running the download. parfive has a synchronous API, but uses
        asyncio to parallelise downloading the files.
        
        A simple example is::
        
            from parfive import Downloader
        
            dl = Downloader()
        
            dl.enqueue_file("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt", path="./")
        
            files = dl.download()
        
        Results
        ^^^^^^^
        
        ``parfive.Downloader.download`` returns a ``parfive.Results`` object,
        which is a list of the filenames that have been downloaded. It also
        tracks any files which failed to download.
        
        Handling Errors
        ^^^^^^^^^^^^^^^
        
        If files fail to download, the urls and the response from the server are
        stored in the ``Results`` object returned by ``parfive.Downloader``.
        These can be used to inform users about the errors. (Note, the progress
        bar will finish in an incomplete state if a download fails, i.e. it will
        show ``4/5 Files Downloaded``.)
        
        The ``Results`` object is a list with an extra attribute ``errors``;
        this property returns a list of named tuples, where each named tuple
        contains the ``.url`` and the ``.response``, which is an
        ``aiohttp.ClientResponse`` or an ``aiohttp.ClientError`` object.
        
        Installation
        ------------
        
        parfive is available on PyPI, you can install it with pip::
        
            pip install parfive
        
        or if you want to use FTP downloads::
        
            pip install parfive[ftp]
        
        Requirements
        ^^^^^^^^^^^^
        
        - Python 3.5+
        - aiohttp
        - tqdm
        - aioftp (for downloads over FTP)
        
        Licence
        -------
        
        MIT Licensed
        
        Authors
        -------
        
        `parfive` was written by `Stuart Mumford `_.
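The error-handling pattern described above can be sketched as follows. This is a minimal, self-contained sketch: the ``Error`` named tuple and the hand-built ``errors`` list here stand in for the real ``Results.errors`` attribute, whose entries expose the same ``.url`` and ``.response`` fields after an actual download.

```python
from collections import namedtuple

# Hypothetical stand-in for the named tuples described above, so the
# snippet runs without performing a download. With a real parfive
# download you would iterate ``results.errors`` in exactly the same way.
Error = namedtuple("Error", ["url", "response"])

errors = [Error("http://example.invalid/file.txt", "ClientConnectionError")]

# Collect the failed URLs and report each failure to the user.
failed_urls = [err.url for err in errors]
for err in errors:
    print("Failed to download {}: {}".format(err.url, err.response))
```

With a real ``Results`` object, ``err.response`` is the ``aiohttp`` response or error object, so you can inspect status codes or exception details before deciding whether to retry.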
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Provides-Extra: ftp
Provides-Extra: test
parfive-1.0.0/parfive.egg-info/SOURCES.txt0000644000175000017500000000073513462277352021064 0ustar stuartstuart00000000000000
.gitignore
.readthedocs.yml
LICENSE
README.rst
setup.cfg
setup.py
tox.ini
.circleci/config.yml
.circleci/early_exit.sh
docs/Makefile
docs/conf.py
docs/index.rst
docs/rtd-requirements.txt
parfive/__init__.py
parfive/downloader.py
parfive/results.py
parfive/utils.py
parfive.egg-info/PKG-INFO
parfive.egg-info/SOURCES.txt
parfive.egg-info/dependency_links.txt
parfive.egg-info/requires.txt
parfive.egg-info/top_level.txt
parfive/tests/__init__.py
parfive/tests/test_downloader.py
parfive-1.0.0/parfive.egg-info/dependency_links.txt0000644000175000017500000000000113462277352023241 0ustar stuartstuart00000000000000
parfive-1.0.0/parfive.egg-info/requires.txt0000644000175000017500000000014513462277352021573 0ustar stuartstuart00000000000000
tqdm
aiohttp

[ftp]
aioftp

[test]
pytest
pytest-localserver
pytest-asyncio
pytest-socket
pytest-cov
parfive-1.0.0/parfive.egg-info/top_level.txt0000644000175000017500000000001013462277352021714 0ustar stuartstuart00000000000000
parfive
parfive-1.0.0/setup.cfg0000644000175000017500000000157413462277352015673 0ustar stuartstuart00000000000000
[metadata]
name = parfive
description = A HTTP and FTP parallel file downloader.
long_description = file: README.rst
url = https://parfive.readthedocs.io/
license = MIT
author = "Stuart Mumford"
author_email = "stuart@cadair.com"
classifiers =
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3.5
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.7

[options]
install_requires =
    tqdm
    aiohttp
setup_requires = setuptools_scm
packages = find:

[options.extras_require]
ftp = aioftp
test =
    pytest
    pytest-localserver
    pytest-asyncio
    pytest-socket
    pytest-cov

[flake8]
max-line-length = 100
ignore = I100,I101,I102,I103,I104,I201

[tool:pytest]
addopts = --allow-hosts=127.0.0.1,::1

[coverage:run]
source = parfive
omit =
    parfive/conftest.py
    parfive/*setup*
    parfive/tests/*
    parfive/__init__*

[egg_info]
tag_build =
tag_date = 0

parfive-1.0.0/setup.py0000644000175000017500000000007213424555667015564 0ustar stuartstuart00000000000000
import setuptools

setuptools.setup(use_scm_version=True)
parfive-1.0.0/tox.ini0000644000175000017500000000055613437774543015364 0ustar stuartstuart00000000000000
[tox]
envlist = py{37,36,35}, build_docs

[testenv]
deps =
    tqdm
    aiohttp
    aioftp
    pytest-cov
    pytest-localserver
    pytest-asyncio
    pytest-sugar
    pytest-socket
    build_docs: sphinx-astropy
    build_docs: sunpy-sphinx-theme
commands =
    py37,py36,py35: pytest --cov {posargs}
    build_docs: sphinx-build docs docs/_build/html -W -b html