xopen-1.2.1/README.rst

.. image:: https://github.com/pycompression/xopen/workflows/CI/badge.svg
    :target: https://github.com/pycompression/xopen
    :alt:

.. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
    :target: https://pypi.python.org/pypi/xopen

.. image:: https://img.shields.io/conda/v/conda-forge/xopen.svg
    :target: https://anaconda.org/conda-forge/xopen
    :alt:

.. image:: https://codecov.io/gh/pycompression/xopen/branch/main/graph/badge.svg
    :target: https://codecov.io/gh/pycompression/xopen
    :alt:

=====
xopen
=====

This small Python module provides an ``xopen`` function that works like the
built-in ``open`` function, but can also deal with compressed files.
Supported compression formats are gzip, bzip2 and xz. They are automatically
recognized by their file extensions ``.gz``, ``.bz2`` or ``.xz``.

The focus is on being as efficient as possible on all supported Python versions.
For example, to open ``.gz`` files, ``xopen`` uses ``pigz``, a parallel version
of ``gzip``, which is faster than the built-in ``gzip.open`` function.
``pigz`` can use multiple threads when compressing, but it is also faster when
reading ``.gz`` files, so it is used for both reading and writing if it is
available. For gzip compression levels 1 to 3, `igzip `_ is used for an even
greater speedup.

If only the main thread should be used, call ``xopen`` with ``threads=0``.
This uses `python-isal `_ (Python bindings for ISA-L) if it is installed,
which happens automatically on Linux systems because python-isal is a
requirement there. For installation instructions, please check out the
`python-isal homepage `_. If python-isal is not available, ``gzip.open`` is
used.

This module was originally developed as part of the `Cutadapt tool `_, which
is used in bioinformatics to manipulate sequencing data. It has been in
successful use within that software for a few years.

``xopen`` is compatible with Python versions 3.6 and later.


Usage
-----

Open a file for reading::

    from xopen import xopen

    with xopen('file.txt.xz') as f:
        content = f.read()

Or without a context manager::

    from xopen import xopen

    f = xopen('file.txt.xz')
    content = f.read()
    f.close()

Open a file in binary mode for writing::

    from xopen import xopen

    with xopen('file.txt.gz', mode='wb') as f:
        f.write(b'Hello')


Credits
-------

The name ``xopen`` was taken from the C function of the same name in the
`utils.h file which is part of BWA `_.

Kyle Beauchamp contributed support for appending to files.

Ruben Vorderman contributed improvements to make reading and writing gzipped
files faster.

Benjamin Vaisvil contributed support for format detection from content.

Dries Schaumont contributed support for faster bz2 reading and writing using
pbzip2.

Some ideas were taken from the `canopener project `_. If you also want to open
S3 files, you may want to use that module instead.


Changes
-------

v1.2.0
~~~~~~
* `pbzip2 `_ is now used to open ``.bz2`` files if ``threads`` is greater than
  zero.

v1.1.0
~~~~~~
* Python 3.5 support is dropped.
* On Linux systems, `python-isal `_ is now added as a requirement. This
  speeds up the reading of gzip files significantly when no external
  processes are used.

v1.0.0
~~~~~~
* If installed, the ``igzip`` program (part of `Intel ISA-L `_) is now used
  for reading and writing gzip-compressed files at compression levels 1-3,
  which results in a significant speedup.

v0.9.0
~~~~~~
* When the file name extension of a file to be opened for reading is not
  available, the content is inspected (if possible) and used to determine
  which compression format applies.
* This release drops Python 2.7 and 3.4 support. Python 3.5 or later is now
  required.

v0.8.4
~~~~~~
* When reading gzipped files, force ``pigz`` to use only a single process.
  ``pigz`` cannot use multiple cores anyway when decompressing. By default,
  it would use extra I/O processes, which slightly reduces wall-clock time,
  but increases CPU time. Single-core decompression with ``pigz`` is still
  about twice as fast as regular ``gzip``.
* Allow ``threads=0`` for specifying that no external ``pigz``/``gzip``
  process should be used (then regular ``gzip.open()`` is used instead).

v0.8.3
~~~~~~
* When reading gzipped files, let ``pigz`` use at most four threads by default.
  This limit previously only applied when writing to a file.
* Support Python 3.8

v0.8.0
~~~~~~
* Speed improvements when iterating over gzipped files.

v0.6.0
~~~~~~
* For reading from gzipped files, xopen will now use a ``pigz`` subprocess.
  This is faster than using ``gzip.open``.
* Python 2 support will be dropped in one of the next releases.

v0.5.0
~~~~~~
* By default, pigz is now only allowed to use at most four threads. This
  hopefully reduces problems some users had with too many threads when
  opening many files at the same time.
* xopen now accepts pathlib.Path objects.


Contributors
------------

* Marcel Martin
* Ruben Vorderman
* For more contributors, see


Links
-----

* `Source code `_
* `Report an issue `_
* `Project page on PyPI (Python package index) `_

xopen-1.2.1/PKG-INFO

Metadata-Version: 2.1
Name: xopen
Version: 1.2.1
Summary: Open compressed files transparently
Home-page: https://github.com/pycompression/xopen/
Author: Marcel Martin et al.
Author-email: mail@marcelm.net
License: MIT
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.6
Provides-Extra: dev
License-File: LICENSE

xopen-1.2.1/.editorconfig

[*.py]
charset=utf-8
end_of_line=lf
insert_final_newline=true
indent_style=space
indent_size=4

xopen-1.2.1/src/xopen.egg-info/top_level.txt

xopen

xopen-1.2.1/src/xopen.egg-info/dependency_links.txt

xopen-1.2.1/src/xopen.egg-info/SOURCES.txt

.codecov.yml
.editorconfig
.gitignore
LICENSE
README.rst
pyproject.toml
setup.cfg
setup.py
tox.ini
.github/workflows/ci.yml
src/xopen/__init__.py
src/xopen/_version.py
src/xopen/_version.pyi
src/xopen/py.typed
src/xopen.egg-info/PKG-INFO
src/xopen.egg-info/SOURCES.txt
src/xopen.egg-info/dependency_links.txt
src/xopen.egg-info/requires.txt
src/xopen.egg-info/top_level.txt
tests/file.txt
tests/file.txt.bz2
tests/file.txt.bz2.test
tests/file.txt.gz
tests/file.txt.gz.test
tests/file.txt.test
tests/file.txt.xz
tests/file.txt.xz.test
tests/hello.gz
tests/test_xopen.py

xopen-1.2.1/src/xopen.egg-info/requires.txt

[:platform_machine == "x86_64" or platform_machine == "AMD64" or platform_machine == "aarch64"]
isal>=0.9.0

[dev]
pytest

xopen-1.2.1/src/xopen/py.typed

xopen-1.2.1/src/xopen/__init__.py

"""
Open compressed files transparently.
"""

__all__ = [
    "xopen",
    "PipedGzipReader",
    "PipedGzipWriter",
    "PipedIGzipReader",
    "PipedIGzipWriter",
    "PipedPigzReader",
    "PipedPigzWriter",
    "PipedPBzip2Reader",
    "PipedPBzip2Writer",
    "PipedPythonIsalReader",
    "PipedPythonIsalWriter",
    "__version__",
]

import gzip
import sys
import io
import os
import bz2
import lzma
import stat
import signal
import pathlib
import subprocess
import tempfile
import time
from abc import ABC, abstractmethod
from subprocess import Popen, PIPE, DEVNULL
from typing import Optional, TextIO, AnyStr, IO, List, Set

from ._version import version as __version__


try:
    from isal import igzip, isal_zlib  # type: ignore
except ImportError:
    igzip = None
    isal_zlib = None

try:
    import fcntl

    # fcntl.F_SETPIPE_SZ will be available in python 3.10.
    # https://github.com/python/cpython/pull/21921
    # If not available: set it to the correct value for known platforms.
    if not hasattr(fcntl, "F_SETPIPE_SZ") and sys.platform == "linux":
        setattr(fcntl, "F_SETPIPE_SZ", 1031)
except ImportError:
    fcntl = None  # type: ignore

_MAX_PIPE_SIZE_PATH = pathlib.Path("/proc/sys/fs/pipe-max-size")
try:
    _MAX_PIPE_SIZE = int(_MAX_PIPE_SIZE_PATH.read_text())  # type: Optional[int]
except OSError:  # Catches file not found and permission errors. Possible other errors too.
    _MAX_PIPE_SIZE = None


def _available_cpu_count() -> int:
    """
    Number of available virtual or physical CPUs on this system
    Adapted from http://stackoverflow.com/a/1006301/715090
    """
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        pass
    import re

    try:
        with open('/proc/self/status') as f:
            status = f.read()
        m = re.search(r'(?m)^Cpus_allowed:\s*(.*)$', status)
        if m:
            res = bin(int(m.group(1).replace(',', ''), 16)).count('1')
            if res > 0:
                return res
    except OSError:
        pass
    try:
        import multiprocessing

        return multiprocessing.cpu_count()
    except (ImportError, NotImplementedError):
        return 1


def _set_pipe_size_to_max(fd: int) -> None:
    """
    Set pipe size to maximum on platforms that support it.

    :param fd: The file descriptor to increase the pipe size for.
    """
    if not hasattr(fcntl, "F_SETPIPE_SZ") or not _MAX_PIPE_SIZE:
        return
    try:
        fcntl.fcntl(fd, fcntl.F_SETPIPE_SZ, _MAX_PIPE_SIZE)  # type: ignore
    except OSError:
        pass


def _can_read_concatenated_gz(program: str) -> bool:
    """
    Check if a concatenated gzip file can be read properly. Not all deflate
    programs handle this properly.
    """
    fd, temp_path = tempfile.mkstemp(suffix=".gz", prefix="xopen.")
    try:
        # Create a concatenated gzip file. gzip.compress recreates the contents
        # of a gzip file including header and trailer.
        with open(temp_path, "wb") as temp_file:
            temp_file.write(gzip.compress(b"AB") + gzip.compress(b"CD"))
        try:
            result = subprocess.run([program, "-c", "-d", temp_path],
                                    check=True, stderr=PIPE, stdout=PIPE)
            return result.stdout == b"ABCD"
        except subprocess.CalledProcessError:
            # The program cannot read the concatenated gzip file
            return False
    finally:
        os.close(fd)
        os.remove(temp_path)


class Closing(ABC):
    """
    Inherit from this class and implement a close() method to offer context
    manager functionality.
    """

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

    def __del__(self):
        try:
            self.close()
        except Exception:
            pass

    @abstractmethod
    def close(self):
        """Called when exiting the context manager"""


class PipedCompressionWriter(Closing):
    """
    Write compressed files by running an external process and piping into it.
    """

    def __init__(self, path, program_args: List[str], mode='wt',
                 compresslevel: Optional[int] = None,
                 threads_flag: Optional[str] = None,
                 threads: Optional[int] = None):
        """
        mode -- one of 'w', 'wt', 'wb', 'a', 'at', 'ab'
        compresslevel -- compression level
        threads_flag -- which flag is used to denote the number of threads in the program.
            If set to None, the program will be called without a threads flag.
        threads (int) -- number of threads. If this is set to None, a reasonable default is
            used.
            At the moment, this means that the number of available CPU cores is used, capped
            at four to avoid creating too many threads. Use 0 to use all available cores.
        """
        if mode not in ('w', 'wt', 'wb', 'a', 'at', 'ab'):
            raise ValueError(
                "Mode is '{}', but it must be 'w', 'wt', 'wb', 'a', 'at' or 'ab'".format(mode))

        # TODO use a context manager
        self.outfile = open(path, mode)
        self.closed: bool = False
        self.name: str = path
        self._mode: str = mode
        self._program_args: List[str] = program_args
        self._threads_flag: Optional[str] = threads_flag

        if threads is None:
            threads = min(_available_cpu_count(), 4)
        self._threads = threads
        try:
            self.process = self._open_process(
                mode, compresslevel, threads, self.outfile)
        except OSError:
            self.outfile.close()
            raise
        assert self.process.stdin is not None
        _set_pipe_size_to_max(self.process.stdin.fileno())

        if 'b' not in mode:
            self._file = io.TextIOWrapper(self.process.stdin)  # type: IO
        else:
            self._file = self.process.stdin

    def __repr__(self):
        return "{}('{}', mode='{}', program='{}', threads={})".format(
            self.__class__.__name__,
            self.name,
            self._mode,
            " ".join(self._program_args),
            self._threads,
        )

    def _open_process(
        self, mode: str, compresslevel: Optional[int], threads: int, outfile: TextIO,
    ) -> Popen:
        program_args: List[str] = self._program_args[:]  # prevent list aliasing
        if threads != 0 and self._threads_flag is not None:
            program_args += [f"{self._threads_flag}{threads}"]
        extra_args = []
        if 'w' in mode and compresslevel is not None:
            extra_args += ['-' + str(compresslevel)]

        kwargs = dict(stdin=PIPE, stdout=outfile, stderr=DEVNULL)

        # Setting close_fds to True in the Popen arguments is necessary due to
        # .
        # However, close_fds is not supported on Windows. See
        # .
        if sys.platform != 'win32':
            kwargs['close_fds'] = True

        process = Popen(program_args + extra_args, **kwargs)  # type: ignore
        return process

    def write(self, arg: AnyStr) -> None:
        self._file.write(arg)

    def close(self) -> None:
        if self.closed:
            return
        self.closed = True
        self._file.close()
        retcode = self.process.wait()
        self.outfile.close()
        if retcode != 0:
            raise OSError(
                "Output {} process terminated with exit code {}".format(
                    " ".join(self._program_args), retcode))

    def __iter__(self):  # type: ignore
        # For compatibility with Pandas, which checks for an __iter__ method
        # to determine whether an object is file-like.
        return self

    def __next__(self):
        raise io.UnsupportedOperation('not readable')


class PipedCompressionReader(Closing):
    """
    Open a pipe to a process for reading a compressed file.
    """

    # This exit code is not interpreted as an error when terminating the process
    _allowed_exit_code: Optional[int] = -signal.SIGTERM
    # If this message is printed on stderr on terminating the process,
    # it is not interpreted as an error
    _allowed_exit_message: Optional[bytes] = None

    def __init__(
        self,
        path,
        program_args: List[str],
        mode: str = "r",
        threads_flag: Optional[str] = None,
        threads: Optional[int] = None,
    ):
        """
        Raise an OSError when the decompression program could not be found.
        """
        if mode not in ('r', 'rt', 'rb'):
            raise ValueError("Mode is '{}', but it must be 'r', 'rt' or 'rb'".format(mode))
        self._program_args = program_args
        program_args = program_args + ['-cd', path]

        if threads_flag is not None:
            if threads is None:
                # Single threaded behaviour by default because:
                # - Using a single thread to read a file is the least unexpected
                #   behaviour. (For users of xopen, who do not know which backend is used.)
                # - There is quite a substantial overhead (+25% CPU time) when
                #   using multiple threads while there is only a 10% gain in wall
                #   clock time.
                threads = 1
            program_args += [f"{threads_flag}{threads}"]
        self._threads = threads
        self.process = Popen(program_args, stdout=PIPE, stderr=PIPE)
        self.name = path

        assert self.process.stdout is not None
        _set_pipe_size_to_max(self.process.stdout.fileno())

        self._mode = mode
        if 'b' not in mode:
            self._file: IO = io.TextIOWrapper(self.process.stdout)
        else:
            self._file = self.process.stdout
        self.closed = False
        self._wait_for_output_or_process_exit()
        self._raise_if_error()

    def __repr__(self):
        return "{}('{}', mode='{}', program='{}', threads={})".format(
            self.__class__.__name__,
            self.name,
            self._mode,
            " ".join(self._program_args),
            self._threads,
        )

    def close(self) -> None:
        if self.closed:
            return
        self.closed = True
        retcode = self.process.poll()
        check_allowed_code_and_message = False
        if retcode is None:
            # still running
            self.process.terminate()
            check_allowed_code_and_message = True
        _, stderr_message = self.process.communicate()
        self._file.close()
        self._raise_if_error(check_allowed_code_and_message, stderr_message)

    def __iter__(self):
        return self

    def __next__(self) -> AnyStr:
        return self._file.__next__()

    def _wait_for_output_or_process_exit(self):
        """
        Wait for the process to produce at least some output, or to exit.
        """
        # The program may crash due to a non-existing file, internal error etc.
        # In that case we need to check. However the 'time-to-crash' differs
        # between programs. Some crash faster than others.
        # Therefore we peek the first character(s) of stdout. Peek will return at
        # least one byte of data, unless the buffer is empty or at EOF. If at EOF,
        # we should wait for the program to exit. This way we ensure the program
        # has at least decompressed some output, or stopped before we continue.
# stdout is io.BufferedReader if set to PIPE while True: first_output = self.process.stdout.peek(1) # type: ignore if first_output or self.process.poll() is not None: break time.sleep(0.01) def _raise_if_error(self, check_allowed_code_and_message: bool = False, stderr_message: bytes = b"") -> None: """ Raise OSError if process is not running anymore and the exit code is nonzero. If check_allowed_code_and_message is set, OSError is not raised when (1) the exit value of the process is equal to the value of the allowed_exit_code attribute or (2) the allowed_exit_message attribute is set and it matches with stderr_message. """ retcode = self.process.poll() if retcode is None: # process still running return if retcode == 0: # process terminated successfully return if check_allowed_code_and_message: if retcode == self._allowed_exit_code: # terminated with allowed exit code return if ( self._allowed_exit_message and stderr_message.startswith(self._allowed_exit_message) ): # terminated with another exit code, but message is allowed return assert self.process.stderr is not None if not stderr_message: stderr_message = self.process.stderr.read() self._file.close() raise OSError("{!r} (exit code {})".format(stderr_message, retcode)) def read(self, *args) -> AnyStr: return self._file.read(*args) def readinto(self, *args): return self._file.readinto(*args) def readline(self, *args) -> AnyStr: return self._file.readline(*args) def seekable(self) -> bool: return self._file.seekable() def peek(self, n: int = None): if hasattr(self._file, "peek"): return self._file.peek(n) # type: ignore else: raise AttributeError("Peek is not available when 'b' not in mode") def readable(self) -> bool: return self._file.readable() def writable(self) -> bool: return self._file.writable() def flush(self) -> None: return None class PipedGzipReader(PipedCompressionReader): """ Open a pipe to gzip for reading a gzipped file. 
""" def __init__(self, path, mode: str = "r"): super().__init__(path, ["gzip"], mode) class PipedGzipWriter(PipedCompressionWriter): """ Write gzip-compressed files by running an external gzip process and piping into it. On Python 3, gzip.GzipFile is on par with gzip itself, but running an external gzip can still reduce wall-clock time because the compression happens in a separate process. """ def __init__(self, path, mode: str = "wt", compresslevel: Optional[int] = None): """ mode -- one of 'w', 'wt', 'wb', 'a', 'at', 'ab' compresslevel -- compression level threads (int) -- number of pigz threads. If this is set to None, a reasonable default is used. At the moment, this means that the number of available CPU cores is used, capped at four to avoid creating too many threads. Use 0 to let pigz use all available cores. """ if compresslevel is not None and compresslevel not in range(1, 10): raise ValueError("compresslevel must be between 1 and 9") super().__init__(path, ["gzip"], mode, compresslevel, None) class PipedPigzReader(PipedCompressionReader): """ Open a pipe to pigz for reading a gzipped file. Even though pigz is mostly used to speed up writing by using many compression threads, it is also faster when reading, even when forced to use a single thread (ca. 2x speedup). """ def __init__(self, path, mode: str = "r", threads: Optional[int] = None): super().__init__(path, ["pigz"], mode, "-p", threads) class PipedPigzWriter(PipedCompressionWriter): """ Write gzip-compressed files by running an external pigz process and piping into it. pigz can compress using multiple cores. It is also more efficient than gzip on only one core. (But then igzip is even faster and should be preferred if the compression level allows it.) 
""" _accepted_compression_levels: Set[int] = set(list(range(10)) + [11]) def __init__( self, path, mode: str = "wt", compresslevel: Optional[int] = None, threads: Optional[int] = None, ): """ mode -- one of 'w', 'wt', 'wb', 'a', 'at', 'ab' compresslevel -- compression level threads (int) -- number of pigz threads. If this is set to None, a reasonable default is used. At the moment, this means that the number of available CPU cores is used, capped at four to avoid creating too many threads. Use 0 to let pigz use all available cores. """ if compresslevel is not None and compresslevel not in self._accepted_compression_levels: raise ValueError("compresslevel must be between 0 and 9 or 11") super().__init__(path, ["pigz"], mode, compresslevel, "-p", threads) class PipedPBzip2Reader(PipedCompressionReader): """ Open a pipe to pbzip2 for reading a bzipped file. """ _allowed_exit_code = None _allowed_exit_message = b"\n *Control-C or similar caught [sig=15], quitting..." def __init__(self, path, mode: str = "r", threads: Optional[int] = None): super().__init__(path, ["pbzip2"], mode, "-p", threads) class PipedPBzip2Writer(PipedCompressionWriter): """ Write bzip2-compressed files by running an external pbzip2 process and piping into it. pbzip2 can compress using multiple cores. """ def __init__( self, path, mode: str = "wt", threads: Optional[int] = None, ): # Use default compression level for pbzip2: 9 super().__init__(path, ["pbzip2"], mode, 9, "-p", threads) class PipedIGzipReader(PipedCompressionReader): """ Uses igzip for reading of a gzipped file. This is much faster than either gzip or pigz which were written to run on a wide array of systems. igzip can only run on x86 and ARM architectures, but is able to use more architecture-specific optimizations as a result. 
""" def __init__(self, path, mode: str = "r"): if not _can_read_concatenated_gz("igzip"): # Rather than checking version strings once the problem is # fixed, it is simpler to test the behavior directly # ("the proof of the pudding"). raise ValueError( "This version of igzip does not support reading " "concatenated gzip files and is therefore not " "safe to use. See: https://github.com/intel/isa-l/issues/143") super().__init__(path, ["igzip"], mode) class PipedIGzipWriter(PipedCompressionWriter): """ Uses igzip for writing a gzipped file. This is much faster than either gzip or pigz, which were written to run on a wide array of systems. igzip can only run on x86 and ARM architectures, but is able to use more architecture-specific optimizations as a result. Threads are supported by a flag, but do not add any speed. Also, on some distribution versions (the isal package in Debian buster) the thread flag is not present. For these reasons, threads are omitted from the interface. Only compression levels 0-3 are supported, and these produce slightly different file sizes than their pigz/gzip counterparts.
See: https://gist.github.com/rhpvorderman/4f1201c3f39518ff28dde45409eb696b """ def __init__(self, path, mode: str = "wt", compresslevel: Optional[int] = None): if compresslevel is not None and compresslevel not in range(0, 4): raise ValueError("compresslevel must be between 0 and 3") super().__init__(path, ["igzip"], mode, compresslevel) class PipedPythonIsalReader(PipedCompressionReader): """ Read a gzipped file by piping it through the command-line interface of python-isal (python -m isal.igzip), which runs in a separate Python process. """ def __init__(self, path, mode: str = "r"): super().__init__(path, [sys.executable, "-m", "isal.igzip"], mode) class PipedPythonIsalWriter(PipedCompressionWriter): """ Write a gzipped file by piping into the command-line interface of python-isal (python -m isal.igzip), which runs in a separate Python process. Only compression levels 0-3 are supported. """ def __init__(self, path, mode: str = "wt", compresslevel: Optional[int] = None): if compresslevel is not None and compresslevel not in range(0, 4): raise ValueError("compresslevel must be between 0 and 3") super().__init__(path, [sys.executable, "-m", "isal.igzip"], mode, compresslevel) def _open_stdin_or_out(mode: str) -> IO: # Do not return sys.stdin or sys.stdout directly as we want the returned object # to be closable without closing sys.stdin or sys.stdout. std = dict(r=sys.stdin, w=sys.stdout)[mode[0]] return open(std.fileno(), mode=mode, closefd=False) def _open_bz2(filename, mode: str, threads: Optional[int]): if threads != 0: try: if "r" in mode: return PipedPBzip2Reader(filename, mode, threads) else: return PipedPBzip2Writer(filename, mode, threads) except OSError: pass # We try without threads. 
return bz2.open(filename, mode) def _open_xz(filename, mode: str) -> IO: return lzma.open(filename, mode) def _open_external_gzip_reader(filename, mode, compresslevel, threads): assert "r" in mode try: return PipedIGzipReader(filename, mode) except (OSError, ValueError): # No igzip installed or version does not support reading # concatenated files.
pass if igzip: return PipedPythonIsalReader(filename, mode) try: return PipedPigzReader(filename, mode, threads=threads) except OSError: return PipedGzipReader(filename, mode) def _open_external_gzip_writer(filename, mode, compresslevel, threads): assert "r" not in mode try: return PipedIGzipWriter(filename, mode, compresslevel) except (OSError, ValueError): # No igzip installed or compression level higher than 3 pass if igzip: # We can use the CLI from isal.igzip try: return PipedPythonIsalWriter(filename, mode, compresslevel) except ValueError: # Wrong compression level pass try: return PipedPigzWriter(filename, mode, compresslevel, threads=threads) except OSError: return PipedGzipWriter(filename, mode, compresslevel) def _open_gz(filename, mode: str, compresslevel, threads): if threads != 0: try: if "r" in mode: return _open_external_gzip_reader(filename, mode, compresslevel, threads) else: return _open_external_gzip_writer(filename, mode, compresslevel, threads) except OSError: pass # We try without threads. if 'r' in mode: if igzip is not None: return igzip.open(filename, mode) return gzip.open(filename, mode) if igzip is not None: try: return igzip.open(filename, mode, compresslevel=isal_zlib.ISAL_DEFAULT_COMPRESSION if compresslevel is None else compresslevel) except ValueError: # Compression level not supported, move to built-in gzip. pass # Override gzip.open's default of 9 for consistency with command-line gzip. return gzip.open(filename, mode, compresslevel=6 if compresslevel is None else compresslevel) def _detect_format_from_content(filename: str) -> Optional[str]: """ Attempts to detect file format from the content by reading the first 6 bytes. Returns None if no format could be detected. 
""" try: if stat.S_ISREG(os.stat(filename).st_mode): with open(filename, "rb") as fh: bs = fh.read(6) if bs[:2] == b'\x1f\x8b': # https://tools.ietf.org/html/rfc1952#page-6 return "gz" elif bs[:3] == b'\x42\x5a\x68': # https://en.wikipedia.org/wiki/List_of_file_signatures return "bz2" elif bs[:6] == b'\xfd\x37\x7a\x58\x5a\x00': # https://tukaani.org/xz/xz-file-format.txt return "xz" except OSError: pass return None def _detect_format_from_extension(filename: str) -> Optional[str]: """ Attempts to detect file format from the filename extension. Returns None if no format could be detected. """ if filename.endswith('.bz2'): return "bz2" elif filename.endswith('.xz'): return "xz" elif filename.endswith('.gz'): return "gz" else: return None def xopen( filename, mode: str = "r", compresslevel: Optional[int] = None, threads: Optional[int] = None, ) -> IO: """ A replacement for the "open" function that can also read and write compressed files transparently. The supported compression formats are gzip, bzip2 and xz. If the filename is '-', standard output (mode 'w') or standard input (mode 'r') is returned. When writing, the file format is chosen based on the file name extension: - .gz uses gzip compression - .bz2 uses bzip2 compression - .xz uses xz/lzma compression - otherwise, no compression is used When reading or appending, the format is detected from the file name extension if there is one, and otherwise from the file contents. mode can be: 'rt', 'rb', 'at', 'ab', 'wt', or 'wb'. Also, the 't' can be omitted, so instead of 'rt', 'wt' and 'at', the abbreviations 'r', 'w' and 'a' can be used. compresslevel is the compression level for writing to gzip files. This parameter is ignored for the other compression formats. 
If set to None (default), the default level of the backend in use applies: 6 for gzip, pigz and the built-in gzip module, and a lower level for the ISA-L-based backends (igzip and python-isal), which only support levels 0-3. threads only has a meaning when reading or writing gzip or bzip2 files. When threads is None (the default), a subprocess is used if possible: for gzip files, igzip, python-isal (where the compression level allows it), pigz and gzip are tried in that order; for bzip2 files, pbzip2 is used. See _open_gz and _open_bz2. When threads = 0, no subprocess is used. """ if mode in ('r', 'w', 'a'): mode += 't' if mode not in ('rt', 'rb', 'wt', 'wb', 'at', 'ab'): raise ValueError("Mode '{}' not supported".format(mode)) filename = os.fspath(filename) if filename == '-': return _open_stdin_or_out(mode) detected_format = _detect_format_from_extension(filename) if detected_format is None and "w" not in mode: detected_format = _detect_format_from_content(filename) if detected_format == "gz": opened_file = _open_gz(filename, mode, compresslevel, threads) elif detected_format == "xz": opened_file = _open_xz(filename, mode) elif detected_format == "bz2": opened_file = _open_bz2(filename, mode, threads) else: opened_file = open(filename, mode) # The "write" method for GzipFile is very costly. Lots of Python calls are # made. To a lesser extent this is true for LzmaFile and BZ2File. By # putting a buffer in between, the expensive write method is called much # less. The effect is very noticeable when writing small units such as # lines or FASTQ records. if (isinstance(opened_file, (gzip.GzipFile, bz2.BZ2File, lzma.LZMAFile)) and "w" in mode): opened_file = io.BufferedWriter(opened_file) # type: ignore return opened_file xopen-1.2.1/src/xopen/_version.pyi0000644000175000017500000000022414132253245016460 0ustar nileshnilesh# The _version.py file is generated on installation. By including this stub, # we can run mypy without having to install the package.
version: str
xopen-1.2.1/src/xopen/_version.py0000644000175000017500000000021614132253260016305 0ustar nileshnilesh
# coding: utf-8
# file generated by setuptools_scm
# don't change, don't track in version control
version = '1.2.1'
version_tuple = (1, 2, 1)
xopen-1.2.1/tox.ini0000644000175000017500000000123714132253245013511 0ustar nileshnilesh
[tox]
envlist = flake8,mypy,py36,py37,py38,py39,pypy3

[testenv]
deps =
    pytest
    coverage
setenv =
    PYTHONDEVMODE = 1
commands =
    coverage run --branch --source=xopen,tests -m pytest -v --doctest-modules tests
    coverage report
    coverage xml
    coverage html

[testenv:isal]
deps =
    pytest
    coverage
    isal

[testenv:flake8]
basepython = python3.7
deps = flake8
commands = flake8 src/ tests/
skip_install = true

[testenv:mypy]
basepython = python3.7
deps = mypy
commands = mypy src/
skip_install = true

[flake8]
max-line-length = 99
max-complexity = 10
extend_ignore = E731

[coverage:report]
exclude_lines =
    pragma: no cover
    def __repr__
xopen-1.2.1/setup.py0000644000175000017500000000010714132253245013703 0ustar nileshnilesh
from setuptools import setup

setup(setup_requires=["setuptools_scm"])
xopen-1.2.1/tests/0000755000175000017500000000000014132253261013333 5ustar nileshnilesh
xopen-1.2.1/tests/file.txt.gz0000644000175000017500000000006514132253245015435 0ustar nileshnilesh
xopen-1.2.1/tests/file.txt.bz2.test0000644000175000017500000000016614132253245016472 0ustar nileshnilesh
xopen-1.2.1/tests/hello.gz0000644000175000017500000000003114132253245014774 0ustar nileshnilesh
xopen-1.2.1/tests/file.txt.gz.test0000644000175000017500000000006514132253245016413 0ustar nileshnilesh
xopen-1.2.1/tests/file.txt.xz.test0000644000175000017500000000014014132253245016426 0ustar nileshnilesh
xopen-1.2.1/tests/file.txt.test0000644000175000017500000000004614132253245015973 0ustar nileshnilesh
Testing, testing ...
The second line. xopen-1.2.1/tests/test_xopen.py0000644000175000017500000005207014132253245016103 0ustar nileshnileshimport gzip import bz2 import lzma import io import os import random import shutil import signal import sys import time import pytest from pathlib import Path from contextlib import contextmanager from itertools import cycle from xopen import ( xopen, PipedCompressionReader, PipedCompressionWriter, PipedGzipReader, PipedGzipWriter, PipedPBzip2Reader, PipedPBzip2Writer, PipedPigzReader, PipedPigzWriter, PipedIGzipReader, PipedIGzipWriter, PipedPythonIsalReader, PipedPythonIsalWriter, _MAX_PIPE_SIZE, _can_read_concatenated_gz, igzip, ) extensions = ["", ".gz", ".bz2", ".xz"] try: import fcntl if not hasattr(fcntl, "F_GETPIPE_SZ") and sys.platform == "linux": setattr(fcntl, "F_GETPIPE_SZ", 1032) except ImportError: fcntl = None base = "tests/file.txt" files = [base + ext for ext in extensions] CONTENT_LINES = ['Testing, testing ...\n', 'The second line.\n'] CONTENT = ''.join(CONTENT_LINES) def available_gzip_readers_and_writers(): readers = [ klass for prog, klass in [ ("gzip", PipedGzipReader), ("pigz", PipedPigzReader), ("igzip", PipedIGzipReader), ] if shutil.which(prog) ] if PipedIGzipReader in readers and not _can_read_concatenated_gz("igzip"): readers.remove(PipedIGzipReader) writers = [ klass for prog, klass in [ ("gzip", PipedGzipWriter), ("pigz", PipedPigzWriter), ("igzip", PipedIGzipWriter), ] if shutil.which(prog) ] if igzip is not None: readers.append(PipedPythonIsalReader) writers.append(PipedPythonIsalWriter) return readers, writers PIPED_GZIP_READERS, PIPED_GZIP_WRITERS = available_gzip_readers_and_writers() def available_bzip2_readers_and_writers(): if shutil.which("pbzip2"): return [PipedPBzip2Reader], [PipedPBzip2Writer] return [], [] PIPED_BZIP2_READERS, PIPED_BZIP2_WRITERS = available_bzip2_readers_and_writers() ALL_READERS_WITH_EXTENSION = list(zip(PIPED_GZIP_READERS, cycle([".gz"]))) + \ list(zip(PIPED_BZIP2_READERS, 
cycle([".bz2"]))) ALL_WRITERS_WITH_EXTENSION = list(zip(PIPED_GZIP_WRITERS, cycle([".gz"]))) + \ list(zip(PIPED_BZIP2_WRITERS, cycle([".bz2"]))) THREADED_READERS = set([(PipedPigzReader, ".gz"), (PipedPBzip2Reader, ".bz2")]) & \ set(ALL_READERS_WITH_EXTENSION) @pytest.fixture(params=PIPED_GZIP_WRITERS) def gzip_writer(request): return request.param @pytest.fixture(params=extensions) def ext(request): return request.param @pytest.fixture(params=files) def fname(request): return request.param @pytest.fixture(params=ALL_READERS_WITH_EXTENSION) def reader(request): return request.param @pytest.fixture(params=THREADED_READERS) def threaded_reader(request): return request.param @pytest.fixture(params=ALL_WRITERS_WITH_EXTENSION) def writer(request): return request.param @contextmanager def disable_binary(tmp_path, binary_name): """ Find the location of the binary by its name, then set PATH to a directory that contains the binary with permissions set to 000. If no suitable binary could be found, PATH is set to an empty directory """ try: binary_path = shutil.which(binary_name) if binary_path: shutil.copy(binary_path, str(tmp_path)) os.chmod(str(tmp_path / binary_name), 0) path = os.environ["PATH"] os.environ["PATH"] = str(tmp_path) yield finally: os.environ["PATH"] = path @pytest.fixture def lacking_pigz_permissions(tmp_path): with disable_binary(tmp_path, "pigz"): yield @pytest.fixture def lacking_pbzip2_permissions(tmp_path): with disable_binary(tmp_path, "pbzip2"): yield @pytest.fixture(params=[1024, 2048, 4096]) def create_large_file(tmpdir, request): def _create_large_file(extension): path = str(tmpdir.join(f"large{extension}")) random_text = ''.join(random.choice('ABCDEFGHIJKLMNOPQRSTUVWXYZ\n') for _ in range(1024)) # Make the text a lot bigger in order to ensure that it is larger than the # pipe buffer size. 
random_text *= request.param with xopen(path, 'w') as f: f.write(random_text) return path return _create_large_file @pytest.fixture def create_truncated_file(create_large_file): def _create_truncated_file(extension): large_file = create_large_file(extension) with open(large_file, 'a') as f: f.truncate(os.stat(large_file).st_size - 10) return large_file return _create_truncated_file @pytest.fixture def xopen_without_igzip(monkeypatch): import xopen # xopen local overrides xopen global variable monkeypatch.setattr(xopen, "igzip", None) return xopen.xopen def test_xopen_text(fname): with xopen(fname, 'rt') as f: lines = list(f) assert len(lines) == 2 assert lines[1] == 'The second line.\n', fname def test_xopen_binary(fname): with xopen(fname, 'rb') as f: lines = list(f) assert len(lines) == 2 assert lines[1] == b'The second line.\n', fname def test_xopen_binary_no_isal_no_threads(fname, xopen_without_igzip): with xopen_without_igzip(fname, 'rb', threads=0) as f: lines = list(f) assert len(lines) == 2 assert lines[1] == b'The second line.\n', fname def test_xopen_binary_no_isal(fname, xopen_without_igzip): with xopen_without_igzip(fname, 'rb', threads=1) as f: lines = list(f) assert len(lines) == 2 assert lines[1] == b'The second line.\n', fname def test_no_context_manager_text(fname): f = xopen(fname, 'rt') lines = list(f) assert len(lines) == 2 assert lines[1] == 'The second line.\n', fname f.close() assert f.closed def test_no_context_manager_binary(fname): f = xopen(fname, 'rb') lines = list(f) assert len(lines) == 2 assert lines[1] == b'The second line.\n', fname f.close() assert f.closed def test_readinto(fname): content = CONTENT.encode('utf-8') with xopen(fname, 'rb') as f: b = bytearray(len(content) + 100) length = f.readinto(b) assert length == len(content) assert b[:length] == content def test_reader_readinto(reader): opener, extension = reader content = CONTENT.encode('utf-8') with opener(f"tests/file.txt{extension}", "rb") as f: b = bytearray(len(content) 
+ 100) length = f.readinto(b) assert length == len(content) assert b[:length] == content def test_reader_textiowrapper(reader): opener, extension = reader with opener(f"tests/file.txt{extension}", "rb") as f: wrapped = io.TextIOWrapper(f) assert wrapped.read() == CONTENT def test_detect_file_format_from_content(ext): with xopen(f"tests/file.txt{ext}.test", "rb") as fh: assert fh.readline() == CONTENT_LINES[0].encode("utf-8") def test_readline(fname): first_line = CONTENT_LINES[0].encode('utf-8') with xopen(fname, 'rb') as f: assert f.readline() == first_line def test_readline_text(fname): with xopen(fname, 'r') as f: assert f.readline() == CONTENT_LINES[0] def test_reader_readline(reader): opener, extension = reader first_line = CONTENT_LINES[0].encode('utf-8') with opener(f"tests/file.txt{extension}", "rb") as f: assert f.readline() == first_line def test_reader_readline_text(reader): opener, extension = reader with opener(f"tests/file.txt{extension}", "r") as f: assert f.readline() == CONTENT_LINES[0] @pytest.mark.parametrize("threads", [None, 1, 2]) def test_piped_reader_iter(threads, threaded_reader): opener, extension = threaded_reader with opener(f"tests/file.txt{extension}", mode="r", threads=threads) as f: lines = list(f) assert lines[0] == CONTENT_LINES[0] def test_next(fname): with xopen(fname, "rt") as f: _ = next(f) line2 = next(f) assert line2 == 'The second line.\n', fname def test_xopen_has_iter_method(ext, tmpdir): path = str(tmpdir.join("out" + ext)) with xopen(path, mode='w') as f: assert hasattr(f, '__iter__') def test_writer_has_iter_method(tmpdir, writer): opener, extension = writer with opener(str(tmpdir.join(f"out.{extension}"))) as f: assert hasattr(f, '__iter__') def test_iter_without_with(fname): f = xopen(fname, "rt") it = iter(f) assert CONTENT_LINES[0] == next(it) f.close() def test_reader_iter_without_with(reader): opener, extension = reader it = iter(opener(f"tests/file.txt{extension}")) assert CONTENT_LINES[0] == next(it) 
@pytest.mark.parametrize("mode", ["rb", "rt"]) def test_reader_close(mode, reader, create_large_file): reader, extension = reader large_file = create_large_file(extension) with reader(large_file, mode=mode) as f: f.readline() time.sleep(0.2) # The subprocess should be properly terminated now @pytest.mark.parametrize("extension", [".gz", ".bz2"]) def test_partial_iteration_closes_correctly(extension, create_large_file): class LineReader: def __init__(self, file): self.file = xopen(file, "rb") def __iter__(self): wrapper = io.TextIOWrapper(self.file) yield from wrapper large_file = create_large_file(extension) f = LineReader(large_file) next(iter(f)) f.file.close() def test_nonexisting_file(ext): with pytest.raises(IOError): with xopen('this-file-does-not-exist' + ext): pass # pragma: no cover def test_write_to_nonexisting_dir(ext): with pytest.raises(IOError): with xopen('this/path/does/not/exist/file.txt' + ext, 'w'): pass # pragma: no cover def test_invalid_mode(ext): with pytest.raises(ValueError): with xopen(f"tests/file.txt.{ext}", mode="hallo"): pass # pragma: no cover def test_filename_not_a_string(): with pytest.raises(TypeError): with xopen(123, mode="r"): pass # pragma: no cover def test_invalid_compression_level(tmpdir): path = str(tmpdir.join("out.gz")) with pytest.raises(ValueError) as e: with xopen(path, mode="w", compresslevel=17) as f: f.write("hello") # pragma: no cover assert "compresslevel must be" in e.value.args[0] def test_invalid_compression_level_writers(gzip_writer, tmpdir): # Currently only gzip writers handle compression levels path = str(tmpdir.join("out.gz")) with pytest.raises(ValueError) as e: with gzip_writer(path, mode="w", compresslevel=17) as f: f.write("hello") # pragma: no cover assert "compresslevel must be" in e.value.args[0] @pytest.mark.parametrize("ext", extensions) def test_append(ext, tmpdir): text = b"AB" reference = text + text path = str(tmpdir.join("the-file" + ext)) with xopen(path, "ab") as f: f.write(text) with 
xopen(path, "ab") as f: f.write(text) with xopen(path, "r") as f: for appended in f: pass reference = reference.decode("utf-8") assert appended == reference @pytest.mark.parametrize("ext", extensions) def test_append_text(ext, tmpdir): text = "AB" reference = text + text path = str(tmpdir.join("the-file" + ext)) with xopen(path, "at") as f: f.write(text) with xopen(path, "at") as f: f.write(text) with xopen(path, "rt") as f: for appended in f: pass assert appended == reference class TookTooLongError(Exception): pass class timeout: # copied from https://stackoverflow.com/a/22348885/715090 def __init__(self, seconds=1): self.seconds = seconds def handle_timeout(self, signum, frame): raise TookTooLongError() # pragma: no cover def __enter__(self): signal.signal(signal.SIGALRM, self.handle_timeout) signal.alarm(self.seconds) def __exit__(self, type, value, traceback): signal.alarm(0) @pytest.mark.parametrize("extension", [".gz", ".bz2"]) def test_truncated_file(extension, create_truncated_file): truncated_file = create_truncated_file(extension) with timeout(seconds=2): with pytest.raises((EOFError, IOError)): f = xopen(truncated_file, "r") f.read() f.close() # pragma: no cover @pytest.mark.parametrize("extension", [".gz", ".bz2"]) def test_truncated_iter(extension, create_truncated_file): truncated_file = create_truncated_file(extension) with timeout(seconds=2): with pytest.raises((EOFError, IOError)): f = xopen(truncated_file, 'r') for line in f: pass f.close() # pragma: no cover @pytest.mark.parametrize("extension", [".gz", ".bz2"]) def test_truncated_with(extension, create_truncated_file): truncated_file = create_truncated_file(extension) with timeout(seconds=2): with pytest.raises((EOFError, IOError)): with xopen(truncated_file, 'r') as f: f.read() @pytest.mark.parametrize("extension", [".gz", ".bz2"]) def test_truncated_iter_with(extension, create_truncated_file): truncated_file = create_truncated_file(extension) with timeout(seconds=2): with 
pytest.raises((EOFError, IOError)): with xopen(truncated_file, 'r') as f: for line in f: pass def test_bare_read_from_gz(): with xopen('tests/hello.gz', 'rt') as f: assert f.read() == 'hello' def test_readers_read(reader): opener, extension = reader with opener(f'tests/file.txt{extension}', 'rt') as f: assert f.read() == CONTENT def test_write_threads(tmpdir, ext): path = str(tmpdir.join(f'out.{ext}')) with xopen(path, mode='w', threads=3) as f: f.write('hello') with xopen(path) as f: assert f.read() == 'hello' def test_write_pigz_threads_no_isal(tmpdir, xopen_without_igzip): path = str(tmpdir.join('out.gz')) with xopen_without_igzip(path, mode='w', threads=3) as f: f.write('hello') with xopen_without_igzip(path) as f: assert f.read() == 'hello' def test_read_no_threads(ext): klasses = { ".bz2": bz2.BZ2File, ".gz": gzip.GzipFile, ".xz": lzma.LZMAFile, "": io.BufferedReader, } klass = klasses[ext] with xopen(f"tests/file.txt{ext}", "rb", threads=0) as f: assert isinstance(f, klass), f def test_write_no_threads(tmpdir, ext): klasses = { ".bz2": bz2.BZ2File, ".gz": gzip.GzipFile, ".xz": lzma.LZMAFile, "": io.BufferedWriter, } klass = klasses[ext] path = str(tmpdir.join(f"out.{ext}")) with xopen(path, "wb", threads=0) as f: assert isinstance(f, io.BufferedWriter) if ext: assert isinstance(f.raw, klass), f def test_write_gzip_no_threads_no_isal(tmpdir, xopen_without_igzip): import gzip path = str(tmpdir.join("out.gz")) with xopen_without_igzip(path, "wb", threads=0) as f: assert isinstance(f.raw, gzip.GzipFile), f def test_write_stdout(): f = xopen('-', mode='w') print("Hello", file=f) f.close() # ensure stdout is not closed print("Still there?") def test_write_stdout_contextmanager(): # Do not close stdout with xopen('-', mode='w') as f: print("Hello", file=f) # ensure stdout is not closed print("Still there?") def test_read_pathlib(fname): path = Path(fname) with xopen(path, mode='rt') as f: assert f.read() == CONTENT def test_read_pathlib_binary(fname): path = 
Path(fname) with xopen(path, mode='rb') as f: assert f.read() == bytes(CONTENT, 'ascii') def test_write_pathlib(ext, tmpdir): path = Path(str(tmpdir)) / ('hello.txt' + ext) with xopen(path, mode='wt') as f: f.write('hello') with xopen(path, mode='rt') as f: assert f.read() == 'hello' def test_write_pathlib_binary(ext, tmpdir): path = Path(str(tmpdir)) / ('hello.txt' + ext) with xopen(path, mode='wb') as f: f.write(b'hello') with xopen(path, mode='rb') as f: assert f.read() == b'hello' def test_concatenated_gzip_function(): assert _can_read_concatenated_gz("gzip") is True assert _can_read_concatenated_gz("pigz") is True assert _can_read_concatenated_gz("xz") is False @pytest.mark.skipif( not hasattr(fcntl, "F_GETPIPE_SZ") or _MAX_PIPE_SIZE is None, reason="Pipe size modifications not available on this platform.") def test_pipesize_changed(tmpdir): path = Path(str(tmpdir), "hello.gz") with xopen(path, "wb") as f: assert isinstance(f, PipedCompressionWriter) assert fcntl.fcntl(f._file.fileno(), fcntl.F_GETPIPE_SZ) == _MAX_PIPE_SIZE def test_xopen_falls_back_to_gzip_open(lacking_pigz_permissions): with xopen("tests/file.txt.gz", "rb") as f: assert f.readline() == CONTENT_LINES[0].encode("utf-8") def test_xopen_falls_back_to_gzip_open_no_isal(lacking_pigz_permissions, xopen_without_igzip): with xopen_without_igzip("tests/file.txt.gz", "rb") as f: assert f.readline() == CONTENT_LINES[0].encode("utf-8") def test_xopen_falls_back_to_gzip_open_write_no_isal(lacking_pigz_permissions, xopen_without_igzip, tmp_path): tmp = tmp_path / "test.gz" with xopen_without_igzip(tmp, "wb") as f: f.write(b"hello") assert gzip.decompress(tmp.read_bytes()) == b"hello" def test_xopen_falls_back_to_bzip2_open(lacking_pbzip2_permissions): with xopen("tests/file.txt.bz2", "rb") as f: assert f.readline() == CONTENT_LINES[0].encode("utf-8") def test_open_many_writers(tmp_path, ext): files = [] # Because lzma.open allocates a lot of memory, # open fewer files to avoid MemoryError on 32-bit
architectures n = 21 if ext == ".xz" else 61 for i in range(1, n): path = tmp_path / f"{i:03d}.txt{ext}" f = xopen(path, "wb", threads=2) f.write(b"hello") files.append(f) for f in files: f.close() def test_pipedcompressionwriter_wrong_mode(tmpdir): with pytest.raises(ValueError) as error: PipedCompressionWriter(tmpdir.join("test"), ["gzip"], "xb") error.match("Mode is 'xb', but it must be") def test_pipedcompressionwriter_wrong_program(tmpdir): with pytest.raises(OSError): PipedCompressionWriter(tmpdir.join("test"), ["XVXCLSKDLA"], "wb") def test_compression_level(tmpdir, gzip_writer): # Currently only the gzip writers handle compression levels. with gzip_writer(tmpdir.join("test.gz"), "wt", 2) as test_h: test_h.write("test") assert gzip.decompress(Path(tmpdir.join("test.gz")).read_bytes()) == b"test" def test_iter_method_writers(writer, tmpdir): opener, extension = writer test_path = tmpdir.join(f"test{extension}") writer = opener(test_path, "wb") assert iter(writer) == writer def test_next_method_writers(writer, tmpdir): opener, extension = writer test_path = tmpdir.join(f"test.{extension}") writer = opener(test_path, "wb") with pytest.raises(io.UnsupportedOperation) as error: next(writer) error.match('not readable') def test_pipedcompressionreader_wrong_mode(): with pytest.raises(ValueError) as error: PipedCompressionReader("test", ["gzip"], "xb") error.match("Mode is 'xb', but it must be") def test_piped_compression_reader_peek_binary(reader): opener, extension = reader filegz = Path(__file__).parent / f"file.txt{extension}" with opener(filegz, "rb") as read_h: # Peek returns at least the amount of characters but maybe more # depending on underlying stream. Hence startswith not ==. 
assert read_h.peek(1).startswith(b"T") @pytest.mark.parametrize("mode", ["r", "rt"]) def test_piped_compression_reader_peek_text(reader, mode): opener, extension = reader compressed_file = Path(__file__).parent / f"file.txt{extension}" with opener(compressed_file, mode) as read_h: with pytest.raises(AttributeError): read_h.peek(1) def writers_and_levels(): for writer in PIPED_GZIP_WRITERS: if writer == PipedGzipWriter: # Levels 1-9 are supported yield from ((writer, i) for i in range(1, 10)) elif writer == PipedPigzWriter: # Levels 0-9 + 11 are supported yield from ((writer, i) for i in list(range(10)) + [11]) elif writer == PipedIGzipWriter or writer == PipedPythonIsalWriter: # Levels 0-3 are supported yield from ((writer, i) for i in range(4)) else: raise NotImplementedError(f"Test should be implemented for " f"{writer}") # pragma: no cover @pytest.mark.parametrize(["writer", "level"], writers_and_levels()) def test_valid_compression_levels(writer, level, tmpdir): test_file = tmpdir.join("test.gz") with writer(test_file, "wb", level) as handle: handle.write(b"test") assert gzip.decompress(Path(test_file).read_bytes()) == b"test" xopen-1.2.1/tests/file.txt.bz20000644000175000017500000000016614132253245015514 0ustar nileshnilesh xopen-1.2.1/tests/file.txt0000644000175000017500000000004614132253245015015 0ustar nileshnileshTesting, testing ... The second line. xopen-1.2.1/tests/file.txt.xz0000644000175000017500000000014014132253245015450 0ustar nileshnilesh xopen-1.2.1/setup.cfg0000644000175000017500000000132114132253261014007 0ustar nileshnilesh[metadata] name = xopen author = Marcel Martin et al.
author_email = mail@marcelm.net url = https://github.com/pycompression/xopen/ description = Open compressed files transparently long_description = file: README.rst license = MIT classifiers = Development Status :: 5 - Production/Stable License :: OSI Approved :: MIT License Programming Language :: Python :: 3 [options] python_requires = >=3.6 package_dir = =src packages = find: install_requires = isal>=0.9.0; platform_machine == "x86_64" or platform_machine == "AMD64" or platform_machine == "aarch64" [options.packages.find] where = src [options.package_data] * = py.typed [options.extras_require] dev = pytest [egg_info] tag_build = tag_date = 0 xopen-1.2.1/.gitignore0000644000175000017500000000010214132253245014154 0ustar nileshnilesh__pycache__/ *.pyc *.egg-info *~ .tox venv/ src/xopen/_version.py xopen-1.2.1/pyproject.toml0000644000175000017500000000020314132253245015102 0ustar nileshnilesh[build-system] requires = ["setuptools", "wheel", "setuptools_scm>=6.2"] [tool.setuptools_scm] write_to = "src/xopen/_version.py" xopen-1.2.1/.codecov.yml0000644000175000017500000000026414132253245014420 0ustar nileshnileshcomment: off codecov: require_ci_to_pass: no coverage: precision: 1 round: down range: "70...100" status: project: yes patch: no changes: no comment: off xopen-1.2.1/LICENSE0000644000175000017500000000205114132253245013176 0ustar nileshnileshCopyright (c) 2010-2021 xopen developers Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. xopen-1.2.1/.github/0000755000175000017500000000000014132253261013531 5ustar nileshnileshxopen-1.2.1/.github/workflows/0000755000175000017500000000000014132253261015566 5ustar nileshnileshxopen-1.2.1/.github/workflows/ci.yml0000644000175000017500000000451214132253245016710 0ustar nileshnileshname: CI on: [push, pull_request] jobs: lint: timeout-minutes: 10 runs-on: ubuntu-latest strategy: matrix: python-version: [3.7] toxenv: [flake8, mypy] steps: - uses: actions/checkout@v2 - name: Set up Python ${{ matrix.python-version }} uses: actions/setup-python@v2 with: python-version: ${{ matrix.python-version }} - name: Install dependencies run: python -m pip install tox - name: Run tox ${{ matrix.toxenv }} run: tox -e ${{ matrix.toxenv }} test: timeout-minutes: 10 runs-on: ${{ matrix.os }} strategy: matrix: os: [ubuntu-latest] python-version: ["3.6", "3.7", "3.8", "3.9", "pypy-3.7"] include: - os: macos-latest python-version: 3.7 - os: ubuntu-20.04 python-version: 3.7 with-isal: true steps: - name: Install pigz and pbzip2 MacOS if: startsWith(matrix.os, 'macos') run: brew install pigz pbzip2 - name: Install pigz and pbzip2 Linux if: startsWith(matrix.os, 'ubuntu') run: sudo apt-get install pigz pbzip2 - name: Install isal if: matrix.with-isal && !startsWith(matrix.os, 'macos') run: sudo apt-get install isal libisal-dev - uses: actions/checkout@v2 - name: Set up Python ${{ matrix.python-version }} uses: actions/setup-python@v2 with: python-version: ${{ matrix.python-version }} - name: Install dependencies run: python -m 
pip install tox - name: Test run: tox -e py if: matrix.with-isal == null - name: Test with isal run: tox -e isal if: matrix.with-isal - name: Upload coverage report uses: codecov/codecov-action@v1 deploy: timeout-minutes: 10 runs-on: ubuntu-latest needs: [lint, test] if: startsWith(github.ref, 'refs/tags') steps: - uses: actions/checkout@v2 with: fetch-depth: 0 # required for setuptools_scm - name: Set up Python uses: actions/setup-python@v2 with: python-version: 3.7 - name: Make distributions run: | python -m pip install build python -m build ls -l dist/ - name: Publish to PyPI uses: pypa/gh-action-pypi-publish@v1.4.1 with: user: __token__ password: ${{ secrets.pypi_password }}
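When no file name extension is recognized, xopen's `_detect_format_from_content` compares the first bytes of the file with the gzip (`1f 8b`, RFC 1952), bzip2 (`BZh`) and xz (`fd 37 7a 58 5a 00`) magic numbers. The following is a minimal standalone sketch of the same checks applied to an in-memory byte string; `detect_format_from_bytes` is a hypothetical helper for illustration, not part of xopen's API.

```python
from typing import Optional, Tuple

# The same signatures that _detect_format_from_content checks, in the same order.
# detect_format_from_bytes is a hypothetical helper, not an xopen function.
_SIGNATURES: Tuple[Tuple[bytes, str], ...] = (
    (b"\x1f\x8b", "gz"),          # gzip member header (RFC 1952)
    (b"BZh", "bz2"),              # b'\x42\x5a\x68'
    (b"\xfd7zXZ\x00", "xz"),      # b'\xfd\x37\x7a\x58\x5a\x00'
)


def detect_format_from_bytes(head: bytes) -> Optional[str]:
    """Return "gz", "bz2" or "xz" if head starts with a known signature, else None."""
    for magic, fmt in _SIGNATURES:
        if head.startswith(magic):
            return fmt
    return None
```

Unlike xopen's function, this sketch takes bytes instead of a file name, so it can be tried without touching the file system; xopen additionally checks that the path is a regular file before reading its first six bytes.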
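With threads enabled, `_open_external_gzip_writer` tries igzip first, then the python-isal command-line interface (only if python-isal is importable), then pigz, and falls back to gzip only when pigz cannot be started. Each backend accepts a different range of compression levels, matching `writers_and_levels` in the tests. The sketch below models only these level checks and assumes every program is installed; `candidate_gzip_writers` is a hypothetical helper, not part of xopen.

```python
from typing import List, Optional, Set, Tuple

# Accepted compression levels per piped gzip writer, in the order that
# _open_external_gzip_writer tries them. Program availability is NOT modeled:
# in xopen, a missing program raises OSError and the next candidate is tried.
_ACCEPTED_LEVELS: List[Tuple[str, Set[int]]] = [
    ("igzip", set(range(0, 4))),         # PipedIGzipWriter: 0-3
    ("python-isal", set(range(0, 4))),   # PipedPythonIsalWriter: 0-3
    ("pigz", set(range(0, 10)) | {11}),  # PipedPigzWriter: 0-9 and 11
    ("gzip", set(range(1, 10))),         # PipedGzipWriter: 1-9
]


def candidate_gzip_writers(level: Optional[int]) -> List[str]:
    """Return the backends (hypothetical names) that accept `level`, in try order.

    None means "use the program's default level" and is accepted by all of them.
    """
    return [name for name, levels in _ACCEPTED_LEVELS
            if level is None or level in levels]
```

For example, level 11 can only be handled by pigz, while a level such as 6 skips both ISA-L backends. xopen's in-process fallbacks used with threads=0 are not modeled here.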