bitshuffle-0.3.5/.gitignore
## C
# Object files
*.o
*.ko
*.obj
*.elf
# Libraries
*.lib
*.a
# Shared objects (inc. Windows DLLs)
*.dll
*.so
*.so.*
*.dylib
# Executables
*.exe
*.out
*.app
*.i*86
*.x86_64
*.hex
## Python
*.py[cod]
# C extensions
*.so
# Packages
*.egg
*.egg-info
dist
build
eggs
parts
bin
var
sdist
develop-eggs
.installed.cfg
lib
lib64
__pycache__
# Installer logs
pip-log.txt
# Unit test / coverage reports
.coverage
.tox
nosetests.xml
# Translations
*.mo
# Mr Developer
.mr.developer.cfg
.project
.pydevproject
# Documentation builds
doc/_build
doc/generated
## Editor files and backups.
*.swp
*.swo
# Generated files
bitshuffle/ext.c
bitshuffle/h5.c
bitshuffle-0.3.5/.travis.yml
language: python
os: linux
# To test filter plugins, need hdf5 1.8.11+, present in Trusty but not Precise.
dist: trusty
# Required to get Trusty.
#sudo: true
python:
- "2.7"
- "3.4"
- "3.5"
- "3.6"
addons:
  apt:
    packages:
      - libhdf5-serial-dev
      - hdf5-tools
install:
- "pip install -U pip virtualenv"
# Ensures the system hdf5 headers/libs will be used whatever its version
- "export HDF5_DIR=/usr/lib"
- "pip install -r requirements.txt"
# Installing the plugin to arbitrary directory to check the install script.
- "python setup.py install --h5plugin --h5plugin-dir ~/hdf5/lib"
# Ensure it's installable and usable in virtualenv
- "virtualenv ~/venv"
- "travis_wait 30 ~/venv/bin/pip -v install --no-binary=h5py ."
- "~/venv/bin/pip -v install nose"
# Can't be somewhere that has a 'bitshuffle' directory as nose will use that
# copy instead of installed package.
script:
- "cd ~"
- "nosetests -v bitshuffle" # Test the system install
- "venv/bin/nosetests -v bitshuffle" # Test the virtualenv install
bitshuffle-0.3.5/LICENSE
Bitshuffle - Filter for improving compression of typed binary data.
Copyright (c) 2014 Kiyoshi Masui (kiyo@physics.ubc.ca)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
bitshuffle-0.3.5/MANIFEST.in
recursive-include src *.h *.c
recursive-include bitshuffle *.pyx
recursive-include lz4 *.h *.c
recursive-include lzf *.h *.c
include setup.cfg.example
include LICENSE
include README.rst
include requirements.txt
exclude setup.cfg
bitshuffle-0.3.5/README.rst
==========
Bitshuffle
==========
Filter for improving compression of typed binary data.
Bitshuffle is an algorithm that rearranges typed, binary data to improve
compression, as well as a Python/C package that implements this algorithm
within the NumPy framework.
The library can be used alongside HDF5 to compress and decompress datasets and
is integrated through the `dynamically loaded filters`_ framework. Bitshuffle
is HDF5 filter number ``32008``.
Algorithmically, Bitshuffle is closely related to HDF5's `Shuffle filter`_
except it operates at the bit level instead of the byte level. Arranging a
typed data array into a matrix with the elements as the rows and the bits
within the elements as the columns, Bitshuffle "transposes" the matrix,
such that all the least significant bits are in one row, the next-significant
bits in the next row, etc. This transpose is performed within blocks of data
roughly 8 kB long [1]_.
This does not in itself compress data, it only rearranges it for more
efficient compression. To perform the actual compression you will need a
compression library. Bitshuffle has been designed to be well matched to Marc
Lehmann's LZF_ as well as LZ4_. Note that because Bitshuffle modifies the data
at the bit level, sophisticated entropy-reducing compression libraries such as
GZIP and BZIP are unlikely to achieve significantly better compression than
simpler and faster duplicate-string-elimination algorithms such as LZF and
LZ4. Bitshuffle thus includes routines (and HDF5 filter options) to apply LZ4
compression to each block after shuffling [2]_.
The Bitshuffle algorithm relies on neighbouring elements of a dataset being
highly correlated to improve data compression. Any correlations that span at
least 24 elements of the dataset may be exploited to improve compression.
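As a rough illustration of this transpose (a pure-NumPy reference sketch, not the optimized library routine, and not bit-for-bit identical to Bitshuffle's on-disk format, which shuffles within blocks):

```python
import numpy as np

def bitshuffle_ref(a):
    """Reference bit-transpose: gather bit-plane k of every element together."""
    # View each element as its constituent bytes, then as individual bits.
    bits = np.unpackbits(a.view(np.uint8).reshape(a.size, a.dtype.itemsize),
                         axis=1)
    # Transpose (elements x bits) -> (bits x elements) and repack into bytes.
    return np.packbits(bits.T)

def bitunshuffle_ref(buf, dtype, n):
    """Inverse of bitshuffle_ref."""
    nbits = 8 * np.dtype(dtype).itemsize
    bits = np.unpackbits(buf)[:nbits * n].reshape(nbits, n)
    return np.packbits(bits.T, axis=1).reshape(-1).view(dtype)

data = np.arange(1000, dtype=np.uint16)   # only the low bits vary
shuffled = bitshuffle_ref(data)
assert np.array_equal(bitunshuffle_ref(shuffled, np.uint16, data.size), data)
```

After the transpose, the rarely-changing high bit-planes of ``data`` become long constant runs, which is what makes the output friendly to LZF/LZ4-style compressors.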
Bitshuffle was designed with performance in mind. On most machines the
time required for Bitshuffle+LZ4 is insignificant compared to the time required
to read or write the compressed data to disk. Because it is able to exploit the
SSE and AVX instruction sets present on modern Intel and AMD processors, on
these machines compression is only marginally slower than an out-of-cache
memory copy. On modern x86 processors you can expect Bitshuffle to have a
throughput of roughly 1 byte per clock cycle, and on the Haswell generation of
Intel processors (2013) and later, you can expect up to 2 bytes per clock
cycle. In addition, Bitshuffle is parallelized using OpenMP.
As a bonus, Bitshuffle ships with a dynamically loaded version of
`h5py`'s LZF compression filter, such that the filter can be transparently
used outside of python and in command line utilities such as ``h5dump``.
.. [1] Chosen to fit comfortably within L1 cache as well as to be well
   matched to the window of the LZF compression library.
.. [2] Compared to applying Bitshuffle to the full dataset and then applying
   LZ4 compression, this has the advantage that each block is already in the
   L1 cache when it is compressed.
.. _`dynamically loaded filters`: http://www.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf
.. _`Shuffle filter`: http://www.hdfgroup.org/HDF5/doc_resource/H5Shuffle_Perf.pdf
.. _LZF: http://oldhome.schmorp.de/marc/liblzf.html
.. _LZ4: https://code.google.com/p/lz4/
Applications
------------
Bitshuffle might be right for your application if:
- You need to compress typed binary data.
- Your data is arranged such that adjacent elements over the fastest varying
  index of your dataset are similar (highly correlated).
- A special case of the previous point is if you are only exercising a subset
  of the bits in your data-type, as is often true of integer data.
- You need both high compression ratios and high performance.
Comparing Bitshuffle to other compression algorithms and HDF5 filters:
- Bitshuffle is less general than many other compression algorithms.
  To achieve good compression ratios, consecutive elements of your data must
  be highly correlated.
- For the right datasets, Bitshuffle is one of the few compression
  algorithms that promises both high throughput and high compression ratios.
- Bitshuffle should have roughly the same throughput as Shuffle, but
  may obtain higher compression ratios.
- The MAFISC_ filter actually includes something similar to Bitshuffle as one
  of its prefilters. However, MAFISC's emphasis is on obtaining high
  compression ratios at all costs, sacrificing throughput.
.. _MAFISC: http://wr.informatik.uni-hamburg.de/research/projects/icomex/mafisc
Installation for Python
-----------------------
Installation requires python 2.7+ or 3.3+, HDF5 1.8.4 or later, HDF5 for python
(h5py), Numpy and Cython. Bitshuffle must be linked against the same version of
HDF5 as h5py, which in practice means h5py must be built from source_ rather
than pre-built wheels [3]_. To use the dynamically loaded HDF5 filter requires
HDF5 1.8.11 or later.
To install::

    python setup.py install [--h5plugin [--h5plugin-dir=spam]]
To get finer control of installation options, including whether to compile
with OpenMP multi-threading, copy the ``setup.cfg.example`` to ``setup.cfg``
and edit the values therein.
If using the dynamically loaded HDF5 filter (which gives you access to the
Bitshuffle and LZF filters outside of python), set the environment variable
``HDF5_PLUGIN_PATH`` to the value of ``--h5plugin-dir`` or use HDF5's default
search location of ``/usr/local/hdf5/lib/plugin``.
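For example, assuming the plugin was installed to a hypothetical directory ``$HOME/hdf5/lib`` via ``--h5plugin-dir``, the environment could be configured as:

```shell
# Hypothetical plugin directory; match whatever was passed to --h5plugin-dir.
export HDF5_PLUGIN_PATH="$HOME/hdf5/lib"
# External tools such as h5dump can now resolve filter 32008, e.g.:
# h5dump -d data compressed.h5
```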
If you get an error about missing source files when building the extensions,
try upgrading setuptools. There is a weird bug where setuptools prior to 0.7
doesn't work properly with Cython in some cases.
.. _source: http://docs.h5py.org/en/latest/build.html#source-installation
.. [3] Typically you will be able to install Bitshuffle, but there will be
   errors when creating and reading datasets.
Usage from Python
-----------------
The `bitshuffle` module contains routines for shuffling and unshuffling
Numpy arrays.
If installed with the dynamically loaded filter plugins, Bitshuffle can be used
in conjunction with HDF5 both inside and outside of python, in the same way as
any other filter; simply by specifying the filter number ``32008``. Otherwise
the filter will be available only within python and only after importing
`bitshuffle.h5`. Reading Bitshuffle encoded datasets will be transparent.
The filter can be added to new datasets either through the `h5py` low level
interface or through the convenience functions provided in
`bitshuffle.h5`. See the docstrings and unit tests for examples. For `h5py`
version 2.5.0 and later, Bitshuffle can be added to new datasets through the
high level interface, as in the example below.
Example h5py
------------
::

    import h5py
    import numpy
    import bitshuffle.h5

    print(h5py.__version__)  # >= '2.5.0'

    f = h5py.File(filename, "w")

    # block_size = 0 lets Bitshuffle choose its value
    block_size = 0

    dataset = f.create_dataset(
        "data",
        (100, 100, 100),
        compression=bitshuffle.h5.H5FILTER,
        compression_opts=(block_size, bitshuffle.h5.H5_COMPRESS_LZ4),
        dtype='float32',
        )

    # create some random data
    array = numpy.random.rand(100, 100, 100)
    array = array.astype('float32')

    dataset[:] = array

    f.close()
Usage from C
------------
If you wish to use Bitshuffle in your C program and would prefer not to use the
HDF5 dynamically loaded filter, the C library in the ``src/`` directory is
self-contained and complete.
Usage from Java
---------------
Bitshuffle can also be used from Java: the shuffling and unshuffling routines
are ported into `snappy-java`_. To use them, add the following dependency to
your ``pom.xml``::

    <dependency>
      <groupId>org.xerial.snappy</groupId>
      <artifactId>snappy-java</artifactId>
      <version>1.1.3-M1</version>
    </dependency>

First, import ``org.xerial.snappy.BitShuffle`` in your Java code::

    import org.xerial.snappy.BitShuffle;

Then, use the routines like this::

    int[] data = new int[] {1, 3, 34, 43, 34};
    byte[] shuffledData = BitShuffle.bitShuffle(data);
    int[] result = BitShuffle.bitUnShuffleIntArray(shuffledData);
.. _`snappy-java`: https://github.com/xerial/snappy-java
Anaconda
--------
The conda package can be built via::

    conda build conda-recipe
For Best Results
----------------
Here are a few tips to help you get the most out of Bitshuffle:
- For multi-dimensional datasets, order your data such that the fastest
  varying dimension is the one over which your data is most correlated (has
  values that change the least), or fake this using chunks.
- To achieve the highest throughput, use a data type that is 64 *bytes* or
  smaller. If you have a very large compound data type, consider adding a
  dimension to your datasets instead.
- To make full use of the SSE2 instruction set, use a data type whose size
  is a multiple of 2 bytes. For the AVX2 instruction set, use a data type
  whose size is a multiple of 4 bytes.
Citing Bitshuffle
-----------------
Bitshuffle was initially described in
http://dx.doi.org/10.1016/j.ascom.2015.07.002, pre-print available at
http://arxiv.org/abs/1503.00638.
bitshuffle-0.3.5/bitshuffle/__init__.py
"""
Filter for improving compression of typed binary data.
Functions
=========
using_NEON
using_SSE2
using_AVX2
bitshuffle
bitunshuffle
compress_lz4
decompress_lz4
"""
from __future__ import absolute_import
from bitshuffle.ext import (__version__, bitshuffle, bitunshuffle, using_NEON, using_SSE2,
using_AVX2, compress_lz4, decompress_lz4)
bitshuffle-0.3.5/bitshuffle/ext.pyx
"""
Wrappers for public and private bitshuffle routines
"""
from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np
cimport numpy as np
cimport cython
np.import_array()
# Repeat each calculation this many times. For timing.
cdef int REPEATC = 1
#cdef int REPEATC = 32
REPEAT = REPEATC
cdef extern from b"bitshuffle.h":
    int bshuf_using_NEON()
    int bshuf_using_SSE2()
    int bshuf_using_AVX2()
    int bshuf_bitshuffle(void *A, void *B, int size, int elem_size,
                         int block_size)
    int bshuf_bitunshuffle(void *A, void *B, int size, int elem_size,
                           int block_size)
    int bshuf_compress_lz4_bound(int size, int elem_size, int block_size)
    int bshuf_compress_lz4(void *A, void *B, int size, int elem_size,
                           int block_size)
    int bshuf_decompress_lz4(void *A, void *B, int size, int elem_size,
                             int block_size)
    int BSHUF_VERSION_MAJOR
    int BSHUF_VERSION_MINOR
    int BSHUF_VERSION_POINT
__version__ = "%d.%d.%d" % (BSHUF_VERSION_MAJOR, BSHUF_VERSION_MINOR,
                            BSHUF_VERSION_POINT)
# Prototypes from bitshuffle.c
cdef extern int bshuf_copy(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_byte_elem_scal(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_byte_elem_SSE(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_byte_elem_NEON(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_bit_byte_scal(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_bit_byte_SSE(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_bit_byte_NEON(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_bit_byte_AVX(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_bitrow_eight(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_bit_elem_AVX(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_bit_elem_SSE(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_bit_elem_NEON(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_bit_elem_scal(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_byte_bitrow_SSE(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_byte_bitrow_NEON(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_byte_bitrow_AVX(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_byte_bitrow_scal(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_shuffle_bit_eightelem_scal(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_shuffle_bit_eightelem_SSE(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_shuffle_bit_eightelem_NEON(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_shuffle_bit_eightelem_AVX(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_untrans_bit_elem_SSE(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_untrans_bit_elem_NEON(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_untrans_bit_elem_AVX(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_untrans_bit_elem_scal(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_trans_bit_elem(void *A, void *B, int size, int elem_size)
cdef extern int bshuf_untrans_bit_elem(void *A, void *B, int size, int elem_size)
ctypedef int (*Cfptr) (void *A, void *B, int size, int elem_size)
def using_NEON():
    """Whether compiled using Arm NEON instructions."""
    if bshuf_using_NEON():
        return True
    else:
        return False


def using_SSE2():
    """Whether compiled using SSE2 instructions."""
    if bshuf_using_SSE2():
        return True
    else:
        return False


def using_AVX2():
    """Whether compiled using AVX2 instructions."""
    if bshuf_using_AVX2():
        return True
    else:
        return False
def _setup_arr(arr):
    shape = tuple(arr.shape)
    if not arr.flags['C_CONTIGUOUS']:
        msg = "Input array must be C-contiguous."
        raise ValueError(msg)
    size = arr.size
    dtype = arr.dtype
    itemsize = dtype.itemsize
    out = np.empty(shape, dtype=dtype)
    return out, size, itemsize


@cython.boundscheck(False)
@cython.wraparound(False)
cdef _wrap_C_fun(Cfptr fun, np.ndarray arr):
    """Wrap a C function with standard call signature."""
    cdef int ii, size, itemsize, count=0
    cdef np.ndarray out
    out, size, itemsize = _setup_arr(arr)

    cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] arr_flat
    arr_flat = arr.view(np.uint8).ravel()
    cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] out_flat
    out_flat = out.view(np.uint8).ravel()
    cdef void* arr_ptr = &arr_flat[0]
    cdef void* out_ptr = &out_flat[0]

    for ii in range(REPEATC):
        count = fun(arr_ptr, out_ptr, size, itemsize)
    if count < 0:
        msg = "Failed. Error code %d."
        excp = RuntimeError(msg % count, count)
        raise excp
    return out
def copy(np.ndarray arr not None):
    """Copies the data.

    For testing and profiling purposes.

    """
    return _wrap_C_fun(&bshuf_copy, arr)


def trans_byte_elem_scal(np.ndarray arr not None):
    """Transpose bytes within words but not bits."""
    return _wrap_C_fun(&bshuf_trans_byte_elem_scal, arr)


def trans_byte_elem_SSE(np.ndarray arr not None):
    """Transpose bytes within array elements."""
    return _wrap_C_fun(&bshuf_trans_byte_elem_SSE, arr)


def trans_byte_elem_NEON(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_byte_elem_NEON, arr)


def trans_bit_byte_scal(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_bit_byte_scal, arr)


def trans_bit_byte_SSE(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_bit_byte_SSE, arr)


def trans_bit_byte_NEON(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_bit_byte_NEON, arr)


def trans_bit_byte_AVX(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_bit_byte_AVX, arr)


def trans_bitrow_eight(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_bitrow_eight, arr)


def trans_bit_elem_AVX(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_bit_elem_AVX, arr)


def trans_bit_elem_scal(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_bit_elem_scal, arr)


def trans_bit_elem_SSE(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_bit_elem_SSE, arr)


def trans_bit_elem_NEON(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_bit_elem_NEON, arr)


def trans_byte_bitrow_SSE(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_byte_bitrow_SSE, arr)


def trans_byte_bitrow_NEON(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_byte_bitrow_NEON, arr)


def trans_byte_bitrow_AVX(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_byte_bitrow_AVX, arr)


def trans_byte_bitrow_scal(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_byte_bitrow_scal, arr)


def shuffle_bit_eightelem_scal(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_shuffle_bit_eightelem_scal, arr)


def shuffle_bit_eightelem_SSE(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_shuffle_bit_eightelem_SSE, arr)


def shuffle_bit_eightelem_NEON(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_shuffle_bit_eightelem_NEON, arr)


def shuffle_bit_eightelem_AVX(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_shuffle_bit_eightelem_AVX, arr)


def untrans_bit_elem_SSE(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_untrans_bit_elem_SSE, arr)


def untrans_bit_elem_NEON(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_untrans_bit_elem_NEON, arr)


def untrans_bit_elem_AVX(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_untrans_bit_elem_AVX, arr)


def untrans_bit_elem_scal(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_untrans_bit_elem_scal, arr)


def trans_bit_elem(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_trans_bit_elem, arr)


def untrans_bit_elem(np.ndarray arr not None):
    return _wrap_C_fun(&bshuf_untrans_bit_elem, arr)
@cython.boundscheck(False)
@cython.wraparound(False)
def bitshuffle(np.ndarray arr not None, int block_size=0):
    """Bitshuffle an array.

    Output array is the same shape and data type as input array but underlying
    buffer has been bitshuffled.

    Parameters
    ----------
    arr : numpy array
        Data to be processed.
    block_size : positive integer
        Block size in number of elements. By default, block size is chosen
        automatically.

    Returns
    -------
    out : numpy array
        Array with the same shape as input but underlying data has been
        bitshuffled.

    """
    cdef int ii, size, itemsize, count=0
    cdef np.ndarray out
    out, size, itemsize = _setup_arr(arr)

    cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] arr_flat
    arr_flat = arr.view(np.uint8).ravel()
    cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] out_flat
    out_flat = out.view(np.uint8).ravel()
    cdef void* arr_ptr = &arr_flat[0]
    cdef void* out_ptr = &out_flat[0]
    for ii in range(REPEATC):
        count = bshuf_bitshuffle(arr_ptr, out_ptr, size, itemsize, block_size)
    if count < 0:
        msg = "Failed. Error code %d."
        excp = RuntimeError(msg % count, count)
        raise excp
    return out
@cython.boundscheck(False)
@cython.wraparound(False)
def bitunshuffle(np.ndarray arr not None, int block_size=0):
    """Un-bitshuffle an array.

    Output array is the same shape and data type as input array but underlying
    buffer has been un-bitshuffled.

    Parameters
    ----------
    arr : numpy array
        Data to be processed.
    block_size : positive integer
        Block size in number of elements. Must match value used for shuffling.

    Returns
    -------
    out : numpy array
        Array with the same shape as input but underlying data has been
        un-bitshuffled.

    """
    cdef int ii, size, itemsize, count=0
    cdef np.ndarray out
    out, size, itemsize = _setup_arr(arr)

    cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] arr_flat
    arr_flat = arr.view(np.uint8).ravel()
    cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] out_flat
    out_flat = out.view(np.uint8).ravel()
    cdef void* arr_ptr = &arr_flat[0]
    cdef void* out_ptr = &out_flat[0]
    for ii in range(REPEATC):
        count = bshuf_bitunshuffle(arr_ptr, out_ptr, size, itemsize,
                                   block_size)
    if count < 0:
        msg = "Failed. Error code %d."
        excp = RuntimeError(msg % count, count)
        raise excp
    return out
@cython.boundscheck(False)
@cython.wraparound(False)
def compress_lz4(np.ndarray arr not None, int block_size=0):
    """Bitshuffle then compress an array using LZ4.

    Parameters
    ----------
    arr : numpy array
        Data to be processed.
    block_size : positive integer
        Block size in number of elements. By default, block size is chosen
        automatically.

    Returns
    -------
    out : array with np.uint8 data type
        Buffer holding compressed data.

    """
    cdef int ii, size, itemsize, count=0
    if not arr.flags['C_CONTIGUOUS']:
        msg = "Input array must be C-contiguous."
        raise ValueError(msg)
    size = arr.size
    dtype = arr.dtype
    itemsize = dtype.itemsize

    max_out_size = bshuf_compress_lz4_bound(size, itemsize, block_size)

    cdef np.ndarray out
    out = np.empty(max_out_size, dtype=np.uint8)

    cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] arr_flat
    arr_flat = arr.view(np.uint8).ravel()
    cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] out_flat
    out_flat = out.view(np.uint8).ravel()
    cdef void* arr_ptr = &arr_flat[0]
    cdef void* out_ptr = &out_flat[0]
    for ii in range(REPEATC):
        count = bshuf_compress_lz4(arr_ptr, out_ptr, size, itemsize,
                                   block_size)
    if count < 0:
        msg = "Failed. Error code %d."
        excp = RuntimeError(msg % count, count)
        raise excp
    return out[:count]
@cython.boundscheck(False)
@cython.wraparound(False)
def decompress_lz4(np.ndarray arr not None, shape, dtype, int block_size=0):
    """Decompress a buffer using LZ4 then bitunshuffle it yielding an array.

    Parameters
    ----------
    arr : numpy array
        Input data to be decompressed.
    shape : tuple of integers
        Shape of the output (decompressed array). Must match the shape of the
        original data array before compression.
    dtype : numpy dtype
        Datatype of the output array. Must match the data type of the original
        data array before compression.
    block_size : positive integer
        Block size in number of elements. Must match value used for
        compression.

    Returns
    -------
    out : numpy array with shape *shape* and data type *dtype*
        Decompressed data.

    """
    cdef int ii, size, itemsize, count=0
    if not arr.flags['C_CONTIGUOUS']:
        msg = "Input array must be C-contiguous."
        raise ValueError(msg)
    size = np.prod(shape)
    itemsize = dtype.itemsize

    cdef np.ndarray out
    out = np.empty(tuple(shape), dtype=dtype)

    cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] arr_flat
    arr_flat = arr.view(np.uint8).ravel()
    cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] out_flat
    out_flat = out.view(np.uint8).ravel()
    cdef void* arr_ptr = &arr_flat[0]
    cdef void* out_ptr = &out_flat[0]
    for ii in range(REPEATC):
        count = bshuf_decompress_lz4(arr_ptr, out_ptr, size, itemsize,
                                     block_size)
    if count < 0:
        msg = "Failed. Error code %d."
        excp = RuntimeError(msg % count, count)
        raise excp
    if count != arr.size:
        msg = "Decompressed different number of bytes than input buffer size. "
        msg += "Input buffer %d, decompressed %d." % (arr.size, count)
        raise RuntimeError(msg, count)
    return out
bitshuffle-0.3.5/bitshuffle/h5.pyx
"""
HDF5 support for Bitshuffle.
To read a dataset that uses the Bitshuffle filter using h5py, simply import
this module (unless you have installed the Bitshuffle dynamically loaded
filter, in which case importing this module is unnecessary).
To create a new dataset that includes the Bitshuffle filter, use one of the
convenience functions provided.
Constants
=========
H5FILTER : The Bitshuffle HDF5 filter integer identifier.
H5_COMPRESS_LZ4 : Filter option flag for LZ4 compression.
Functions
=========
create_dataset
create_bitshuffle_lzf_dataset
create_bitshuffle_compressed_dataset
Examples
========
>>> import numpy as np
>>> import h5py
>>> import bitshuffle.h5
>>> shape = (123, 456)
>>> chunks = (10, 456)
>>> dtype = np.float64
>>> f = h5py.File("tmp_test.h5")
>>> bitshuffle.h5.create_bitshuffle_compressed_dataset(
...     f, "some_data", shape, dtype, chunks)
>>> f["some_data"][:] = 42
"""
from __future__ import absolute_import, division, print_function, unicode_literals
import numpy
import h5py
from h5py import h5d, h5s, h5t, h5p, filters
cimport cython
cdef extern from b"bshuf_h5filter.h":
    int bshuf_register_h5filter()
    int BSHUF_H5FILTER
    int BSHUF_H5_COMPRESS_LZ4
cdef int LZF_FILTER = 32000
H5FILTER = BSHUF_H5FILTER
H5_COMPRESS_LZ4 = BSHUF_H5_COMPRESS_LZ4
def register_h5_filter():
    ret = bshuf_register_h5filter()
    if ret < 0:
        raise RuntimeError("Failed to register bitshuffle HDF5 filter.", ret)


register_h5_filter()
def create_dataset(parent, name, shape, dtype, chunks=None, maxshape=None,
                   fillvalue=None, track_times=None,
                   filter_pipeline=(), filter_flags=None, filter_opts=None):
    """Create a dataset with an arbitrary filter pipeline.

    Return a new low-level dataset identifier.

    Much of this code is copied from h5py, but couldn't reuse much code due to
    unstable API.

    """
    if hasattr(filter_pipeline, "__getitem__"):
        filter_pipeline = list(filter_pipeline)
    else:
        filter_pipeline = [filter_pipeline]
        filter_flags = [filter_flags]
        filter_opts = [filter_opts]
    nfilters = len(filter_pipeline)
    if filter_flags is None:
        filter_flags = [None] * nfilters
    if filter_opts is None:
        filter_opts = [None] * nfilters
    if not len(filter_flags) == nfilters or not len(filter_opts) == nfilters:
        msg = "Supplied incompatible number of filters, flags, and options."
        raise ValueError(msg)

    shape = tuple(shape)

    tmp_shape = maxshape if maxshape is not None else shape
    # Validate chunk shape. Only evaluate the comparison when *chunks* is
    # actually a tuple; otherwise zip() would fail on None.
    if isinstance(chunks, tuple):
        chunks_larger = any(not i >= j
                            for i, j in zip(tmp_shape, chunks)
                            if i is not None)
        if chunks_larger:
            errmsg = ("Chunk shape must not be greater than data shape in any "
                      "dimension. {} is not compatible with {}"
                      .format(chunks, shape))
            raise ValueError(errmsg)

    if isinstance(dtype, h5py.Datatype):
        # Named types are used as-is
        tid = dtype.id
        dtype = tid.dtype  # Following code needs this
    else:
        # Validate dtype
        dtype = numpy.dtype(dtype)
        tid = h5t.py_create(dtype, logical=1)

    if shape == ():
        if any((chunks, filter_pipeline)):
            raise TypeError("Scalar datasets don't support chunk/filter options")
        if maxshape and maxshape != ():
            raise TypeError("Scalar datasets cannot be extended")
        return h5p.create(h5p.DATASET_CREATE)

    def rq_tuple(tpl, name):
        """Check if chunks/maxshape match dataset rank"""
        if tpl in (None, True):
            return
        try:
            tpl = tuple(tpl)
        except TypeError:
            raise TypeError('"%s" argument must be None or a sequence object'
                            % name)
        if len(tpl) != len(shape):
            raise ValueError('"%s" must have same rank as dataset shape'
                             % name)

    rq_tuple(chunks, 'chunks')
    rq_tuple(maxshape, 'maxshape')

    if (chunks is True) or (chunks is None and filter_pipeline):
        chunks = filters.guess_chunk(shape, maxshape, dtype.itemsize)

    if maxshape is True:
        maxshape = (None,) * len(shape)

    dcpl = h5p.create(h5p.DATASET_CREATE)
    if chunks is not None:
        dcpl.set_chunk(chunks)
        dcpl.set_fill_time(h5d.FILL_TIME_ALLOC)  # prevent resize glitch

    if fillvalue is not None:
        fillvalue = numpy.array(fillvalue)
        dcpl.set_fill_value(fillvalue)

    if track_times in (True, False):
        dcpl.set_obj_track_times(track_times)
    elif track_times is not None:
        raise TypeError("track_times must be either True or False")

    for ii in range(nfilters):
        this_filter = filter_pipeline[ii]
        this_flags = filter_flags[ii]
        this_opts = filter_opts[ii]
        if this_flags is None:
            this_flags = 0
        if this_opts is None:
            this_opts = ()
        dcpl.set_filter(this_filter, this_flags, this_opts)

    if maxshape is not None:
        maxshape = tuple(m if m is not None else h5s.UNLIMITED
                         for m in maxshape)
    sid = h5s.create_simple(shape, maxshape)

    dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl)

    return dset_id
def create_bitshuffle_lzf_dataset(parent, name, shape, dtype, chunks=None,
                                  maxshape=None, fillvalue=None,
                                  track_times=None):
    """Create dataset with a filter pipeline including bitshuffle and LZF"""
    filter_pipeline = [H5FILTER, LZF_FILTER]
    dset_id = create_dataset(parent, name, shape, dtype, chunks=chunks,
                             filter_pipeline=filter_pipeline,
                             maxshape=maxshape, fillvalue=fillvalue,
                             track_times=track_times)
    return dset_id


def create_bitshuffle_compressed_dataset(parent, name, shape, dtype,
                                         chunks=None, maxshape=None,
                                         fillvalue=None, track_times=None):
    """Create dataset with bitshuffle+internal LZ4 compression."""
    filter_pipeline = [H5FILTER, ]
    filter_opts = [(0, H5_COMPRESS_LZ4)]
    dset_id = create_dataset(parent, name, shape, dtype, chunks=chunks,
                             filter_pipeline=filter_pipeline,
                             filter_opts=filter_opts, maxshape=maxshape,
                             fillvalue=fillvalue, track_times=track_times)
    return dset_id
bitshuffle-0.3.5/bitshuffle/tests/__init__.py
bitshuffle-0.3.5/bitshuffle/tests/data/regression_0.1.3.h5
[binary HDF5 regression test data; contents omitted]