xopen-0.8.4/.codecov.yml
comment: off
codecov:
  require_ci_to_pass: no
coverage:
  precision: 1
  round: down
  range: "70...100"
  status:
    project: yes
    patch: no
    changes: no
comment: off
xopen-0.8.4/PKG-INFO
Metadata-Version: 2.1
Name: xopen
Version: 0.8.4
Summary: Open compressed files transparently
Home-page: https://github.com/marcelm/xopen/
Author: Marcel Martin
Author-email: mail@marcelm.net
License: MIT
Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
   :target: https://travis-ci.org/marcelm/xopen
   :alt:

.. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
   :target: https://pypi.python.org/pypi/xopen

.. image:: https://img.shields.io/conda/v/conda-forge/xopen.svg
   :target: https://anaconda.org/conda-forge/xopen
   :alt:

.. image:: https://codecov.io/gh/marcelm/xopen/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/marcelm/xopen
   :alt:

=====
xopen
=====

This small Python module provides an ``xopen`` function that works like the
built-in ``open`` function, but can also deal with compressed files.

Supported compression formats are gzip, bzip2 and xz. They are automatically
recognized by their file extensions ``.gz``, ``.bz2`` or ``.xz``.
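
Extension-based dispatch can be sketched with the standard library modules ``gzip``, ``bz2`` and ``lzma``. This is a minimal illustration that assumes Python 3. The helper ``open_by_extension`` is hypothetical and not part of ``xopen``, which also handles ``-`` (standard input/output) and external ``pigz`` processes.

```python
import bz2
import gzip
import lzma

# Hypothetical mapping from filename suffix to a stdlib opener.
_OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}


def open_by_extension(filename, mode="rt"):
    """Open *filename*, (de)compressing transparently based on its suffix."""
    for suffix, opener in _OPENERS.items():
        if filename.endswith(suffix):
            return opener(filename, mode)
    # No known compression suffix: treat it as an uncompressed file.
    return open(filename, mode)
```

With this helper, ``open_by_extension('reads.fastq.gz', 'wt')`` returns a text-mode gzip writer. A name without a known suffix is opened as a plain file.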

The focus is on being as efficient as possible on all supported Python versions.
For example, to open ``.gz`` files, ``xopen`` uses ``pigz``, a parallel
implementation of ``gzip`` that is faster than the standard library's
``gzip.open`` function. ``pigz`` can use multiple threads when compressing,
and it is also faster when decompressing, so it is used for both reading and
writing if it is installed.
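
The reading strategy described above can be sketched with ``shutil.which`` and ``subprocess``. The sketch decompresses through an external ``pigz`` process when one is on the ``PATH`` and otherwise falls back to ``gzip.open``. It is a simplified Python 3 illustration, not ``xopen``'s ``PipedGzipReader``. The name ``read_gz_text`` is hypothetical, and unlike ``xopen``, the sketch does not capture ``pigz``'s error messages.

```python
import contextlib
import gzip
import io
import shutil
import signal
import subprocess


@contextlib.contextmanager
def read_gz_text(path):
    """Yield a text stream with the decompressed contents of *path*."""
    pigz = shutil.which("pigz")
    if pigz is None:
        # pigz is not installed: fall back to the standard-library reader.
        with gzip.open(path, "rt") as f:
            yield f
        return
    # -c: write to stdout, -d: decompress, -p 1: use a single thread.
    process = subprocess.Popen([pigz, "-cd", "-p", "1", path], stdout=subprocess.PIPE)
    stream = io.TextIOWrapper(process.stdout)
    try:
        yield stream
    finally:
        if process.poll() is None:
            # Still running (e.g. the caller stopped early): stop pigz.
            process.terminate()
        stream.close()
        retcode = process.wait()
        # A nonzero exit code means pigz failed (missing or corrupt file);
        # termination by SIGTERM is expected after an early stop.
        if retcode not in (0, -signal.SIGTERM):
            raise OSError("pigz exited with code {}".format(retcode))
```

Terminating a still-running process and tolerating the resulting SIGTERM exit code mirrors what ``PipedGzipReader.close()`` does.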

This module was originally developed as part of the `cutadapt
tool `_ that is used in bioinformatics to
manipulate sequencing data. It has been in successful use within that software
for a few years.

``xopen`` is compatible with Python versions 2.7 and 3.4 to 3.8.

Usage
-----

Open a file for reading::

    from xopen import xopen

    with xopen('file.txt.xz') as f:
        content = f.read()

Or without a context manager::

    from xopen import xopen

    f = xopen('file.txt.xz')
    content = f.read()
    f.close()

Open a file in binary mode for writing::

    from xopen import xopen

    with xopen('file.txt.gz', mode='wb') as f:
        f.write(b'Hello')
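
Both usage styles work for piped gzip files because the module's ``Closing`` base class derives ``__enter__``/``__exit__`` from a ``close()`` method. A stripped-down sketch of that pattern follows. ``LoggingResource`` is a hypothetical subclass that only demonstrates the behaviour. The module's own version also calls ``close()`` from ``__del__``.

```python
class Closing(object):
    """Mixin: subclasses implement close(); ``with`` support comes for free."""

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        # Returning None (falsy) lets exceptions from the with-body propagate.


class LoggingResource(Closing):
    """Hypothetical resource that records whether it has been closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True
```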

Credits
-------

The name ``xopen`` was taken from the C function of the same name in the
`utils.h file which is part of
BWA `_.

Kyle Beauchamp has contributed support for
appending to files.

Ruben Vorderman contributed improvements to
make reading gzipped files faster.

Some ideas were taken from the `canopener project `_.
If you also want to open S3 files, you may want to use that module instead.

Changes
-------

v0.8.4
~~~~~~

* When reading gzipped files, force ``pigz`` to use only a single process.
  ``pigz`` cannot use multiple cores anyway when decompressing. By default,
  it would use extra I/O processes, which slightly reduces wall-clock time,
  but increases CPU time. Single-core decompression with ``pigz`` is still
  about twice as fast as regular ``gzip``.
* Allow ``threads=0`` for specifying that no external ``pigz``/``gzip``
  process should be used (then regular ``gzip.open()`` is used instead).
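
The read-side thread handling in the v0.8.4 entries can be illustrated by how the decompression command line is assembled. This sketch assumes the flags that ``PipedGzipReader`` passes: ``-cd`` to decompress to standard output and ``-p`` for the thread count. ``pigz_read_args`` is a hypothetical helper. Returning ``None`` for ``threads=0`` stands for "use ``gzip.open()`` instead".

```python
def pigz_read_args(path, threads=None):
    """Return the pigz argv for decompressing *path*, or None for gzip.open().

    threads=None -> a single thread (the default for reading since v0.8.4)
    threads=0    -> no external process at all
    """
    if threads == 0:
        return None
    if threads is None:
        threads = 1
    return ["pigz", "-cd", path, "-p", str(threads)]
```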

v0.8.3
~~~~~~

* When reading gzipped files, let ``pigz`` use at most four threads by default.
  This limit previously only applied when writing to a file.
* Support Python 3.8.
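
The four-thread default amounts to ``min(number of CPUs, 4)``, and it still applies when writing. The following sketch builds a ``pigz`` compression command line in the same order as ``PipedGzipWriter``: the thread option first, then the compression level. The level is only passed when it differs from gzip's default of 6. The sketch assumes ``os.cpu_count()`` for the CPU count, whereas ``xopen`` uses its own helper that also respects CPU affinity. ``pigz_write_args`` is a hypothetical name.

```python
import os


def pigz_write_args(compresslevel=6, threads=None):
    """Return the pigz argv for compressing stdin to stdout."""
    args = ["pigz"]
    if threads is None:
        # Use the available CPUs, but at most four threads by default.
        threads = min(os.cpu_count() or 1, 4)
    args += ["-p", str(threads)]
    if compresslevel != 6:
        # 6 is gzip's default level, so it need not be passed explicitly.
        args.append("-" + str(compresslevel))
    return args
```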

v0.8.0
~~~~~~

* Speed improvements when iterating over gzipped files.

v0.6.0
~~~~~~

* When reading gzipped files, ``xopen`` now uses a ``pigz`` subprocess.
  This is faster than using ``gzip.open``.
* Python 2 support will be dropped in one of the next releases.

v0.5.0
~~~~~~

* By default, ``pigz`` is now allowed to use at most four threads. This should
  reduce problems that some users had with too many threads when opening many
  files at the same time.
* ``xopen`` now accepts ``pathlib.Path`` objects.
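
Accepting path objects comes down to converting them to ``str`` before the extension checks. The following Python 3.6+ sketch uses ``os.fspath``. ``normalize_path`` is a hypothetical name; ``xopen`` ships its own ``fspath`` fallback for older Python versions.

```python
import os


def normalize_path(filename):
    """Return *filename* as a str; accepts str and os.PathLike objects."""
    # os.fspath returns str unchanged, converts pathlib.Path to str,
    # returns bytes unchanged, and raises TypeError for other types.
    path = os.fspath(filename)
    if not isinstance(path, str):
        raise TypeError("filename must be a str or a path-like object")
    return path
```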

Author
------

Marcel Martin (`@marcelm_ on Twitter `_)

Links
-----

* `Source code `_
* `Report an issue `_
* `Project page on PyPI (Python package index) `_
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4
Provides-Extra: dev
xopen-0.8.4/src/xopen.egg-info/dependency_links.txt
xopen-0.8.4/src/xopen.egg-info/top_level.txt
xopen
xopen-0.8.4/src/xopen.egg-info/SOURCES.txt
.codecov.yml
.editorconfig
.gitignore
.travis.yml
LICENSE
README.rst
pyproject.toml
setup.cfg
setup.py
tox.ini
src/xopen/__init__.py
src/xopen/_version.py
src/xopen.egg-info/PKG-INFO
src/xopen.egg-info/SOURCES.txt
src/xopen.egg-info/dependency_links.txt
src/xopen.egg-info/requires.txt
src/xopen.egg-info/top_level.txt
tests/file.txt
tests/file.txt.bz2
tests/file.txt.gz
tests/file.txt.xz
tests/hello.gz
tests/test_xopen.py
xopen-0.8.4/src/xopen.egg-info/requires.txt
[:python_version == "2.7"]
bz2file
[dev]
pytest
xopen-0.8.4/src/xopen/_version.py
# coding: utf-8
# file generated by setuptools_scm
# don't change, don't track in version control
version = '0.8.4'
xopen-0.8.4/src/xopen/__init__.py
"""
Open compressed files transparently.
"""
from __future__ import print_function, division, absolute_import

import gzip
import sys
import io
import os
import time
import signal
from subprocess import Popen, PIPE

from ._version import version as __version__


_PY3 = sys.version > '3'

if not _PY3:
    import bz2file as bz2
else:
    try:
        import bz2
    except ImportError:
        bz2 = None

try:
    import lzma
except ImportError:
    lzma = None

if _PY3:
    basestring = str

try:
    import pathlib  # Exists in Python 3.4+
except ImportError:
    pathlib = None

try:
    from os import fspath  # Exists in Python 3.6+
except ImportError:
    def fspath(path):
        if hasattr(path, "__fspath__"):
            return path.__fspath__()
        # Python 3.4 and 3.5 have pathlib, but do not support the file system
        # path protocol
        if pathlib is not None and isinstance(path, pathlib.Path):
            return str(path)
        if not isinstance(path, basestring):
            raise TypeError("path must be a string")
        return path


def _available_cpu_count():
    """
    Number of available virtual or physical CPUs on this system
    Adapted from http://stackoverflow.com/a/1006301/715090
    """
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        pass
    import re
    try:
        with open('/proc/self/status') as f:
            status = f.read()
        m = re.search(r'(?m)^Cpus_allowed:\s*(.*)$', status)
        if m:
            res = bin(int(m.group(1).replace(',', ''), 16)).count('1')
            if res > 0:
                return res
    except IOError:
        pass
    try:
        import multiprocessing
        return multiprocessing.cpu_count()
    except (ImportError, NotImplementedError):
        return 1


class Closing(object):
    """
    Inherit from this class and implement a close() method to offer context
    manager functionality.
    """

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

    def __del__(self):
        try:
            self.close()
        except:
            pass


class PipedGzipWriter(Closing):
    """
    Write gzip-compressed files by running an external gzip or pigz process and
    piping into it. pigz is tried first. It is fast because it can compress using
    multiple cores.

    If pigz is not available, a gzip subprocess is used. On Python 2, this saves
    CPU time because gzip.GzipFile is slower. On Python 3, gzip.GzipFile is on
    par with gzip itself, but running an external gzip can still reduce wall-clock
    time because the compression happens in a separate process.
    """

    def __init__(self, path, mode='wt', compresslevel=6, threads=None):
        """
        mode -- one of 'w', 'wt', 'wb', 'a', 'at', 'ab'
        compresslevel -- gzip compression level
        threads (int) -- number of pigz threads. If this is set to None, a reasonable default is
            used. At the moment, this means that the number of available CPU cores is used, capped
            at four to avoid creating too many threads. Use 0 to let pigz use all available cores.
        """
        if mode not in ('w', 'wt', 'wb', 'a', 'at', 'ab'):
            raise ValueError("Mode is '{0}', but it must be 'w', 'wt', 'wb', 'a', 'at' or 'ab'".format(mode))

        # TODO use a context manager
        self.outfile = open(path, mode)
        self.devnull = open(os.devnull, mode)
        self.closed = False
        self.name = path

        kwargs = dict(stdin=PIPE, stdout=self.outfile, stderr=self.devnull)

        # Setting close_fds to True in the Popen arguments is necessary due to
        # .
        # However, close_fds is not supported on Windows. See
        # .
        if sys.platform != 'win32':
            kwargs['close_fds'] = True

        if 'w' in mode and compresslevel != 6:
            extra_args = ['-' + str(compresslevel)]
        else:
            extra_args = []

        pigz_args = ['pigz']
        if threads is None:
            threads = min(_available_cpu_count(), 4)
        if threads != 0:
            pigz_args += ['-p', str(threads)]
        try:
            self.process = Popen(pigz_args + extra_args, **kwargs)
            self.program = 'pigz'
        except OSError:
            # pigz not found, try regular gzip
            try:
                self.process = Popen(['gzip'] + extra_args, **kwargs)
                self.program = 'gzip'
            except (IOError, OSError):
                self.outfile.close()
                self.devnull.close()
                raise
        except IOError:  # TODO IOError is the same as OSError on Python 3.3
            self.outfile.close()
            self.devnull.close()
            raise
        if _PY3 and 'b' not in mode:
            self._file = io.TextIOWrapper(self.process.stdin)
        else:
            self._file = self.process.stdin

    def write(self, arg):
        self._file.write(arg)

    def close(self):
        if self.closed:
            return
        self.closed = True
        self._file.close()
        retcode = self.process.wait()
        self.outfile.close()
        self.devnull.close()
        if retcode != 0:
            raise IOError("Output {0} process terminated with exit code {1}".format(self.program, retcode))

    def __iter__(self):
        return self

    def __next__(self):
        raise io.UnsupportedOperation('not readable')


class PipedGzipReader(Closing):
    """
    Open a pipe to pigz for reading a gzipped file. Even though pigz is mostly
    used to speed up writing by using many compression threads, it is
    also faster when reading, even when forced to use a single thread
    (ca. 2x speedup).
    """

    def __init__(self, path, mode='r', threads=None):
        """
        Raise an OSError when pigz could not be found.
        """
        if mode not in ('r', 'rt', 'rb'):
            raise ValueError("Mode is '{0}', but it must be 'r', 'rt' or 'rb'".format(mode))

        pigz_args = ['pigz', '-cd', path]

        if threads is None:
            # Single threaded behaviour by default because:
            # - Using a single thread to read a file is the least unexpected
            #   behaviour. (For users of xopen, who do not know which backend is used.)
            # - There is quite a substantial overhead (+25% CPU time) when
            #   using multiple threads while there is only a 10% gain in wall
            #   clock time.
            threads = 1

        pigz_args += ['-p', str(threads)]

        self.process = Popen(pigz_args, stdout=PIPE, stderr=PIPE)
        self.name = path
        if _PY3 and 'b' not in mode:
            self._file = io.TextIOWrapper(self.process.stdout)
        else:
            self._file = self.process.stdout
        if _PY3:
            self._stderr = io.TextIOWrapper(self.process.stderr)
        else:
            self._stderr = self.process.stderr
        self.closed = False
        # Give the subprocess a little bit of time to report any errors (such as
        # a non-existing file)
        time.sleep(0.01)
        self._raise_if_error()

    def close(self):
        if self.closed:
            return
        self.closed = True
        retcode = self.process.poll()
        if retcode is None:
            # still running
            self.process.terminate()
            allow_sigterm = True
        else:
            allow_sigterm = False
        self.process.wait()
        self._raise_if_error(allow_sigterm=allow_sigterm)

    def __iter__(self):
        return self._file

    def _raise_if_error(self, allow_sigterm=False):
        """
        Raise IOError if process is not running anymore and the exit code is
        nonzero. If allow_sigterm is set and a SIGTERM exit code is
        encountered, no error is raised.
        """
        retcode = self.process.poll()
        if (
            retcode is not None and retcode != 0
            and not (allow_sigterm and retcode == -signal.SIGTERM)
        ):
            message = self._stderr.read().strip()
            raise IOError("{} (exit code {})".format(message, retcode))

    def read(self, *args):
        return self._file.read(*args)

    def readinto(self, *args):
        return self._file.readinto(*args)

    def readline(self, *args):
        return self._file.readline(*args)

    def seekable(self):
        return self._file.seekable()

    def peek(self, n=None):
        return self._file.peek(n)
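
    def readable(self):
        if _PY3:
            return self._file.readable()
        else:
            # Raise (rather than return) the exception so callers do not
            # mistake a truthy exception object for "readable".
            raise NotImplementedError(
                "Python 2 does not support the readable() method."
            )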

    def writable(self):
        return self._file.writable()

    def flush(self):
        return None


def _open_stdin_or_out(mode):
    # Do not return sys.stdin or sys.stdout directly as we want the returned object
    # to be closable without closing sys.stdout.
    std = dict(r=sys.stdin, w=sys.stdout)[mode[0]]
    if not _PY3:
        # Enforce str type on Python 2
        # Note that io.open is slower than regular open() on Python 2.7, but
        # it appears to be the only API that has a closefd parameter.
        mode = mode[0] + 'b'
    return io.open(std.fileno(), mode=mode, closefd=False)


def _open_bz2(filename, mode):
    if bz2 is None:
        raise ImportError("Cannot open bz2 files: The bz2 module is not available")
    if _PY3:
        return bz2.open(filename, mode)
    else:
        if mode[0] == 'a':
            raise ValueError("mode '{0}' not supported with BZ2 compression".format(mode))
        return bz2.BZ2File(filename, mode)


def _open_xz(filename, mode):
    if lzma is None:
        raise ImportError(
            "Cannot open xz files: The lzma module is not available (use Python 3.3 or newer)")
    return lzma.open(filename, mode)


def _open_gz(filename, mode, compresslevel, threads):
    if sys.version_info[:2] == (2, 7):
        buffered_reader = io.BufferedReader
        buffered_writer = io.BufferedWriter
    else:
        buffered_reader = lambda x: x
        buffered_writer = lambda x: x
    if _PY3:
        exc = FileNotFoundError  # was introduced in Python 3.3
    else:
        exc = OSError

    if 'r' in mode:
        def open_with_threads():
            return PipedGzipReader(filename, mode, threads=threads)

        def open_without_threads():
            return buffered_reader(gzip.open(filename, mode))
    else:
        def open_with_threads():
            return PipedGzipWriter(filename, mode, compresslevel, threads=threads)

        def open_without_threads():
            return buffered_writer(gzip.open(filename, mode, compresslevel=compresslevel))

    if threads == 0:
        return open_without_threads()

    try:
        return open_with_threads()
    except exc:
        # pigz is not installed, use fallback
        return open_without_threads()


def xopen(filename, mode='r', compresslevel=6, threads=None):
    """
    A replacement for the "open" function that can also read and write
    compressed files transparently. The supported compression formats are gzip,
    bzip2 and xz. If the filename is '-', standard output (mode 'w') or
    standard input (mode 'r') is returned.

    The file type is determined based on the filename: .gz is gzip, .bz2 is bzip2, .xz is
    xz/lzma, and no compression is assumed otherwise.

    mode can be: 'rt', 'rb', 'at', 'ab', 'wt', or 'wb'. Also, the 't' can be omitted,
    so instead of 'rt', 'wt' and 'at', the abbreviations 'r', 'w' and 'a' can be used.
    In Python 2, the 't' and 'b' characters are ignored.

    On Python 2, append mode ('a', 'at', 'ab') is not available with BZ2 compression
    and will raise an error.

    compresslevel is the compression level for writing to gzip files.
    This parameter is ignored for the other compression formats.

    threads only has a meaning when reading or writing gzip files.

    When threads is None (the default), reading or writing a gzip file is done with a pigz
    (parallel gzip) subprocess if possible. See PipedGzipWriter and PipedGzipReader.
    A positive value sets the number of pigz threads.
    When threads = 0, no subprocess is used.
    """
    if mode in ('r', 'w', 'a'):
        mode += 't'
    if mode not in ('rt', 'rb', 'wt', 'wb', 'at', 'ab'):
        raise ValueError("mode '{0}' not supported".format(mode))
    if not _PY3:
        mode = mode[0]
    filename = fspath(filename)
    if compresslevel not in range(1, 10):
        raise ValueError("compresslevel must be between 1 and 9")

    if filename == '-':
        return _open_stdin_or_out(mode)
    elif filename.endswith('.bz2'):
        return _open_bz2(filename, mode)
    elif filename.endswith('.xz'):
        return _open_xz(filename, mode)
    elif filename.endswith('.gz'):
        return _open_gz(filename, mode, compresslevel, threads)
    else:
        # Python 2.6 and 2.7 have io.open, which we could use to make the returned
        # object consistent with the one returned in Python 3, but reading a file
        # with io.open() is 100 times slower (!) on Python 2.6, and still about
        # three times slower on Python 2.7 (tested with "for _ in io.open(path): pass")
        return open(filename, mode)
xopen-0.8.4/.editorconfig
[*.py]
charset=utf-8
end_of_line=lf
insert_final_newline=true
indent_style=space
indent_size=4
xopen-0.8.4/tox.ini
[tox]
envlist = py27,py34,py35,py36,py37,py38
[testenv]
deps = pytest
commands = pytest --doctest-modules --pyargs src/xopen tests
xopen-0.8.4/.gitignore
__pycache__/
*.pyc
*.egg-info
*~
.tox
venv/
src/xopen/_version.py
xopen-0.8.4/setup.cfg
[bdist_wheel]
universal = 1

[coverage:run]
parallel = True
include =
    */site-packages/xopen/*
    tests/*

[coverage:paths]
source =
    src/
    **/site-packages/

[egg_info]
tag_build =
tag_date = 0
xopen-0.8.4/.travis.yml
language: python
dist: xenial
cache:
  directories:
    - $HOME/.cache/pip
python:
  - "2.7"
  - "3.4"
  - "3.5"
  - "3.6"
  - "3.7"
  - "3.8"
install:
  - sudo apt-get update && sudo apt-get install -y pigz
  - pip install --upgrade coverage codecov
  - pip install .
script:
  - python setup.py --version  # Detect encoding problems
  - coverage run -m pytest
after_success:
  - coverage combine
  - codecov
env:
  global:
    - TWINE_USERNAME=marcelm
jobs:
  include:
    - stage: deploy
      services:
        - docker
      python: "3.6"
      install: python3 -m pip install twine
      if: tag IS present
      script:
        - |
          python3 setup.py sdist
          python3 -m pip wheel -w dist/ .
          ls -l dist/
          python3 -m twine upload dist/xopen-*
xopen-0.8.4/LICENSE
Copyright (c) 2010-2019 Marcel Martin
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
xopen-0.8.4/pyproject.toml
[build-system]
requires = ["setuptools", "wheel", "setuptools_scm"]
xopen-0.8.4/setup.py
import sys

from setuptools import setup, find_packages

if sys.version_info < (2, 7):
    sys.stdout.write("At least Python 2.7 is required.\n")
    sys.exit(1)

with open('README.rst') as f:
    long_description = f.read()

setup(
    name='xopen',
    use_scm_version={'write_to': 'src/xopen/_version.py'},
    setup_requires=['setuptools_scm'],  # Support pip versions that don't know about pyproject.toml
    author='Marcel Martin',
    author_email='mail@marcelm.net',
    url='https://github.com/marcelm/xopen/',
    description='Open compressed files transparently',
    long_description=long_description,
    license='MIT',
    package_dir={'': 'src'},
    packages=find_packages('src'),
    install_requires=[
        'bz2file; python_version=="2.7"',
    ],
    extras_require={
        'dev': ['pytest'],
    },
    python_requires='>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4',
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python :: 2.7",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.4",
        "Programming Language :: Python :: 3.5",
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
    ]
)