pax_global_header00006660000000000000000000000064143624761520014524gustar00rootroot0000000000000052 comment=893b3e2dd044f18eeb6f6a33eaf8f438cc00d4e1 hdf5storage-0.1.19/000077500000000000000000000000001436247615200140275ustar00rootroot00000000000000hdf5storage-0.1.19/.gitattributes000066400000000000000000000003451436247615200167240ustar00rootroot00000000000000# Set default behaviour, in case users don't have core.autocrlf set. * text=auto # Explicitly declare text files we want to always be normalized and converted # to native line endings on checkout. *.py text *.txt text eol=crlf hdf5storage-0.1.19/.github/000077500000000000000000000000001436247615200153675ustar00rootroot00000000000000hdf5storage-0.1.19/.github/workflows/000077500000000000000000000000001436247615200174245ustar00rootroot00000000000000hdf5storage-0.1.19/.github/workflows/unit_tests.yml000066400000000000000000000030441436247615200223510ustar00rootroot00000000000000name: unit-tests on: push: branches: - 0.1.x paths-ignore: - 'docs/**' - 'MANIFEST.in' - 'README.rst' - 'THANKS.rst' - 'COPYING.txt' - 'requirements**.txt' - '.gitignore' - '.gitattributes' jobs: build: runs-on: ubuntu-20.04 strategy: matrix: python-version: - '3.7' - '3.9' h5py-version: - '2.6' - '2.10' - '3.0' - '3.1' - '3.2' include: - python-version: '2.7' h5py-version: '2.6' - python-version: '2.7' h5py-version: '2.10' - python-version: '3.5' h5py-version: '2.6' - python-version: '3.5' h5py-version: '2.10' steps: - uses: actions/checkout@v2 - name: Set up Python ${{ matrix.python-version }} uses: actions/setup-python@v2 with: python-version: ${{ matrix.python-version }} - name: Install System Dependencies run: | sudo apt-get update sudo apt-get install gcc libhdf5-serial-dev libblas-dev liblapack-dev libatlas-base-dev libquadmath0 - name: Install Python dependencies including h5py ${{ matrix.h5py-version }} env: H5PY_VERSION: ${{ matrix.h5py-version }} run: | python -m pip install -U numpy Cython python -m pip install h5py==$H5PY_VERSION python -m pip install -r requirements_tests.txt python -m pip install . - name: Test with nosetests run: | nosetests hdf5storage-0.1.19/.gitignore000066400000000000000000000005421436247615200160200ustar00rootroot00000000000000*.py[cod] # C extensions *.so # Packages *.egg *.egg-info dist build eggs parts bin var sdist develop-eggs .installed.cfg lib lib64 __pycache__ # Installer logs pip-log.txt # Unit test / coverage reports .coverage .tox nosetests.xml # Translations *.mo # Mr Developer .mr.developer.cfg .project .pydevproject # autosaves *.py~ *.yml~ *.rst~ *.txt~hdf5storage-0.1.19/COPYING.txt000066400000000000000000000024521436247615200157030ustar00rootroot00000000000000Copyright (c) 2013-2023, Freja Nordsiek All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

hdf5storage-0.1.19/MANIFEST.in

include *.txt
include *.rst
include *.py
include *.cfg
include *.toml
recursive-include tests *.py *.m
recursive-include doc *
prune doc/build

hdf5storage-0.1.19/README.rst

Overview
========

This Python package provides high level utilities to read/write a variety
of Python types to/from HDF5 (Hierarchical Data Format) formatted files.
This package also provides support for MATLAB MAT v7.3 formatted files,
which are just HDF5 files with a different extension and some extra
meta-data.

All of this is done without pickling data. Pickling is bad for security
because it allows arbitrary code to be executed in the interpreter. One
wants to be able to read possibly HDF5 and MAT files from untrusted
sources, so pickling is avoided in this package.

The package's documentation is found at http://pythonhosted.org/hdf5storage/

The package's source code is found at
https://github.com/frejanordsiek/hdf5storage

The package is licensed under a 2-clause BSD license
(https://github.com/frejanordsiek/hdf5storage/blob/master/COPYING.txt).

Installation
============

Dependencies
------------

This package only supports Python >= 2.6. It requires the numpy and h5py
(>= 2.1) packages to run. Note that full functionality requires
h5py >= 2.3. An optional dependency is the scipy package.

Installing by pip
-----------------

This package is on `PyPI `_. To install hdf5storage using pip, run the
command::

    pip install hdf5storage

Installing from Source
----------------------

To install hdf5storage from source, download the package and then install
the dependencies::

    pip install -r requirements.txt

Then to install the package, run the command with Python::

    python setup.py install

Running Tests
-------------

For testing, the package nose (>= 1.0) is required, as well as unittest2
on Python 2.6. Some tests require Matlab and scipy to be installed and be
in the executable path. Not having them means that those tests cannot be
run (they will be skipped), but all the other tests will still run. To
install all testing dependencies other than scipy, run::

    pip install -r requirements_tests.txt

To run the tests::

    python setup.py nosetests

Building Documentation
----------------------

The documentation additionally requires sphinx (>= 1.7). The documentation
dependencies can be installed by::

    pip install -r requirements_doc.txt

To build the documentation::

    python setup.py build_sphinx

Python 2
========

This package was designed and written for Python 3, with Python 2.7 and
2.6 support added later. This does mean that a few things are a little
clunky in Python 2. Examples include requiring ``unicode`` keys for
dictionaries, the ``int`` and ``long`` types both being mapped to the
Python 3 ``int`` type, etc. The storage format's metadata looks more
familiar from a Python 3 standpoint as well.
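As a quick illustration, here is a minimal usage sketch in Python 3 syntax
(the file and variable names are just placeholders for this example)::

    import hdf5storage

    # Keys must be str in Python 3 (unicode in Python 2); values are
    # converted to numpy types when written.
    data = {'a': 3.14, 'b': [1, 2, 3]}

    # Write the dict to a Group in an HDF5 file and read it back.
    hdf5storage.write(data, path='/data', filename='example.h5')
    out = hdf5storage.read(path='/data', filename='example.h5')

    # savemat/loadmat behave like their SciPy counterparts, but work on
    # MATLAB v7.3 (HDF5 based) MAT files.
    hdf5storage.savemat('example.mat', {'data': data})
    out_mat = hdf5storage.loadmat('example.mat')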
The documentation is written in terms of Python 3 syntax and types
primarily. Important Python 2 information beyond direct translations of
syntax and types will be pointed out.

Hierarchical Data Format 5 (HDF5)
=================================

HDF5 files (see http://www.hdfgroup.org/HDF5/) are a commonly used file
format for exchange of numerical data. It has built-in support for a large
variety of number formats (un/signed integers, floating point numbers,
strings, etc.) as scalars and arrays, enums and compound types. It also
handles differences in data representation on different hardware platforms
(endianness, different floating point formats, etc.). As can be imagined
from the name, data is represented in an HDF5 file in a hierarchical form
modelling a Unix filesystem (Datasets are equivalent to files, Groups are
equivalent to directories, and links are supported).

This package interfaces HDF5 files using the h5py package
(http://www.h5py.org/) as opposed to the PyTables package
(http://www.pytables.org/).

MATLAB MAT v7.3 file support
============================

MATLAB (http://www.mathworks.com/) MAT files version 7.3 and later are
HDF5 files with a different file extension (``.mat``) and a very specific
set of meta-data and storage conventions. This package provides read and
write support for a limited set of Python and MATLAB types.

SciPy (http://scipy.org/) has functions to read and write the older MAT
file formats. This package has functions modeled after the
``scipy.io.savemat`` and ``scipy.io.loadmat`` functions, which have the
same names and similar arguments. They dispatch to the SciPy versions if
the MAT file format is not an HDF5 based one.

Supported Types
===============

The supported Python and MATLAB types are given in the tables below. The
tables assume that one has imported collections and numpy as::

    import collections as cl
    import numpy as np

The first table gives which Python types can be read and written, the
first version of this package to support them, the numpy type they get
converted to for storage (if type information is not written, that is what
they will be read back as), the MATLAB class they become when targeting a
MAT file, and the first version of this package to support writing them so
MATLAB can read them.

=============== ======= ========================== =========== ============== Python MATLAB ---------------------------------------------------- --------------------------- Type Version Converted to Class Version =============== ======= ========================== =========== ============== bool 0.1 np.bool\_ or np.uint8 logical 0.1 [1]_ None 0.1 ``np.float64([])`` ``[]`` 0.1 int [2]_ [3]_ 0.1 np.int64 [2]_ int64 0.1 long [3]_ [4]_ 0.1 np.int64 int64 0.1 float 0.1 np.float64 double 0.1 complex 0.1 np.complex128 double 0.1 str 0.1 np.uint32/16 char 0.1 [5]_ bytes 0.1 np.bytes\_ or np.uint16 char 0.1 [6]_ bytearray 0.1 np.bytes\_ or np.uint16 char 0.1 [6]_ list 0.1 np.object\_ cell 0.1 tuple 0.1 np.object\_ cell 0.1 set 0.1 np.object\_ cell 0.1 frozenset 0.1 np.object\_ cell 0.1 cl.deque 0.1 np.object\_ cell 0.1 dict 0.1 struct 0.1 [7]_ np.bool\_ 0.1 logical 0.1 np.void 0.1 np.uint8 0.1 uint8 0.1 np.uint16 0.1 uint16 0.1 np.uint32 0.1 uint32 0.1 np.uint64 0.1 uint64 0.1 np.uint8 0.1 int8 0.1 np.int16 0.1 int16 0.1 np.int32 0.1 int32 0.1 np.int64 0.1 int64 0.1 np.float16 [8]_ 0.1 np.float32 0.1 single 0.1 np.float64 0.1 double 0.1 np.complex64 0.1 single 0.1 np.complex128 0.1 double 0.1 np.str\_ 0.1 np.uint32/16 char/uint32 0.1 [5]_ np.bytes\_ 0.1 np.bytes\_ or np.uint16 char 0.1 [6]_ np.object\_ 0.1 cell 0.1 np.ndarray 0.1 [9]_ [10]_ [9]_ [10]_ 0.1 [9]_ [11]_ np.matrix 0.1 [9]_ [9]_ 0.1 [9]_ np.chararray 0.1 [9]_ [9]_ 0.1 [9]_ np.recarray 0.1 structured np.ndarray [9]_ [10]_ 0.1 [9]_ =============== ======= ========================== =========== ============== .. [1] Depends on the selected options. Always ``np.uint8`` when doing MATLAB compatiblity, or if the option is explicitly set. .. [2] In Python 2.x, it may be read back as a ``long`` if it can't fit in the size of an ``int``. .. [3] Must be small enough to fit into an ``np.int64``. .. [4] Type found only in Python 2.x. Python 2.x's ``long`` and ``int`` are unified into a single ``int`` type in Python 3.x. Read as an ``int`` in Python 3.x. .. [5] Depends on the selected options and whether it can be converted to UTF-16 without using doublets. If the option is explicity set (or implicitly when doing MATLAB compatibility) and it can be converted to UTF-16 without losing any characters that can't be represented in UTF-16 or using UTF-16 doublets (MATLAB doesn't support them), then it is written as ``np.uint16`` in UTF-16 encoding. Otherwise, it is stored at ``np.uint32`` in UTF-32 encoding. .. [6] Depends on the selected options. If the option is explicitly set (or implicitly when doing MATLAB compatibility), it will be stored as ``np.uint16`` in UTF-16 encoding unless it has non-ASCII characters in which case a ``NotImplementedError`` is thrown). Otherwise, it is just written as ``np.bytes_``. .. [7] All keys must be ``str`` in Python 3 or ``unicode`` in Python 2. They cannot have null characters (``'\x00'``) or forward slashes (``'/'``) in them. .. [8] ``np.float16`` are not supported for h5py versions before ``2.2``. .. [9] Container types are only supported if their underlying dtype is supported. Data conversions are done based on its dtype. .. [10] Structured ``np.ndarray`` s (have fields in their dtypes) can be written as an HDF5 COMPOUND type or as an HDF5 Group with Datasets holding its fields (either the values directly, or as an HDF5 Reference array to the values for the different elements of the data). Can only be written as an HDF5 COMPOUND type if none of its field are of dtype ``'object'``. 
Field names cannot have null characters (``'\x00'``) and, when writing as an HDF5 GROUP, forward slashes (``'/'``) in them. .. [11] Structured ``np.ndarray`` s with no elements, when written like a structure, will not be read back with the right dtypes for their fields (will all become 'object'). This table gives the MATLAB classes that can be read from a MAT file, the first version of this package that can read them, and the Python type they are read as. =============== ======= ================================= MATLAB Class Version Python Type =============== ======= ================================= logical 0.1 np.bool\_ single 0.1 np.float32 or np.complex64 [12]_ double 0.1 np.float64 or np.complex128 [12]_ uint8 0.1 np.uint8 uint16 0.1 np.uint16 uint32 0.1 np.uint32 uint64 0.1 np.uint64 int8 0.1 np.int8 int16 0.1 np.int16 int32 0.1 np.int32 int64 0.1 np.int64 char 0.1 np.str\_ struct 0.1 structured np.ndarray cell 0.1 np.object\_ canonical empty 0.1 ``np.float64([])`` =============== ======= ================================= .. [12] Depends on whether there is a complex part or not. File Incompatibilities ====================== The storage of empty ``numpy.ndarray`` (or objects that would be stored like one) when the ``Options.store_shape_for_empty`` (implicitly set when Matlab compatibility is enabled) is incompatible with both Matlab and the main branch of this package after 2021-07-11 due to a bug (Issue #114) that cannot be fixed without breaking compatibility in the 0.1.x series and thus will not be fixed (it is however fixed in the main branch after 2021-07-11) since such a fix would mean the version could not be of the form 0.1.x. The incompatibility is caused by storing the array shape in the Dataset after reversing the dimension order instead of before, meaning that the array is read with its dimensions reversed from what is expected if read by Matlab or the main branch after 2021-07-11. Versions ======== 0.1.19. Bugfix release. * Issue #122 and #124. Replaced use of deprecated ``numpy.asscalar`` functions with the ``numpy.ndarray.item`` method. * Issue #123. Forced the use of English month and day of the week names in the HDF5 header for MATLAB compatibility. * Issue #125. Fixed accidental collection of ``pkg_resources.parse_version`` from setuptools as a Marshaller now that it is a class. 0.1.18. Performance improving release. * Pull Request #111 from Daniel Hrisca. Many repeated calls to the ``__getitem__`` methods of objects were turned into single calls. * Further reducionts in ``__getitem__`` calls in the spirit of PR #111. 0.1.17. Bugfix and deprecation workaround release that fixed the following. * Issue #109. Fixed the fix Issue #102 for 32-bit platforms (previous fix was segfaulting). * Moved to using ``pkg_resources.parse_version`` from ``setuptools`` with ``distutils.version`` classes as a fallback instead of just the later to prepare for the removal of ``distutils`` (PEP 632) and prevent warnings on Python versions where it is marked as deprecated. * Issue #110. Changed all uses of the ``tostring`` method on numpy types to using ``tobytes`` if available, with ``tostring`` as the fallback for old versions of numpy where it is not. 0.1.16. Bugfix release that fixed the following bugs. * Issue #81 and #82. ``h5py.File`` will require the mode to be passed explicitly in the future. All calls without passing it were fixed to pass it. * Issue #102. Added support for h5py 3.0 and 3.1. * Issue #73. 
Fixed bug where a missing variable in ``loadmat`` would cause the function to think that the file is a pre v7.3 format MAT file fall back to ``scipy.io.loadmat`` which won't work since the file is a v7.3 format MAT file. * Fixed formatting issues in the docstrings and the documentation that prevented the documentation from building. 0.1.15. Bugfix release that fixed the following bugs. * Issue #68. Fixed bug where ``str`` and ``numpy.unicode_`` strings (but not ndarrays of them) were saved in ``uint32`` format regardless of the value of ``Options.convert_numpy_bytes_to_utf16``. * Issue #70. Updated ``setup.py`` and ``requirements.txt`` to specify the maximum versions of numpy and h5py that can be used for specific python versions (avoid version with dropped support). * Issue #71. Fixed bug where the ``'python_fields'`` attribute wouldn't always be written when doing python metadata for data written in a struct-like fashion. The bug caused the field order to not be preserved when writing and reading. * Fixed an assertion in the tests to handle field re-ordering when no metadata is used for structured dtypes that only worked on older versions of numpy. * Issue #72. Fixed bug where python collections filled with ndarrays that all have the same shape were converted to multi-dimensional object ndarrays instead of a 1D object ndarray of the elements. 0.1.14. Bugfix release that also added a couple features. * Issue #45. Fixed syntax errors in unicode strings for Python 3.0 to 3.2. * Issues #44 and #47. Fixed bugs in testing of conversion and storage of string types. * Issue #46. Fixed raising of ``RuntimeWarnings`` in tests due to signalling NaNs. * Added requirements files for building documentation and running tests. * Made it so that Matlab compatability tests are skipped if Matlab is not found, instead of raising errors. 0.1.13. Bugfix release fixing the following bug. * Issue #36. Fixed bugs in writing ``int`` and ``long`` to HDF5 and their tests on 32 bit systems. 0.1.12. Bugfix release fixing the following bugs. In addition, copyright years were also updated and notices put in the Matlab files used for testing. * Issue #32. Fixed transposing before reshaping ``np.ndarray`` when reading from HDF5 files where python metadata was stored but not Matlab metadata. * Issue #33. Fixed the loss of the number of characters when reading empty numpy string arrays. * Issue #34. Fixed a conversion error when ``np.chararray`` are written with Matlab metadata. 0.1.11. Bugfix release fixing the following. * Issue #30. Fixed ``loadmat`` not opening files in read mode. 0.1.10. Minor feature/performance fix release doing the following. * Issue #29. Added ``writes`` and ``reads`` functions to write and read more than one piece of data at a time and made ``savemat`` and ``loadmat`` use them to increase performance. Previously, the HDF5 file was being opened and closed for each piece of data, which impacted performance, especially for large files. 0.1.9. Bugfix and minor feature release doing the following. * Issue #23. Fixed bug where a structured ``np.ndarray`` with a field name of ``'O'`` could never be written as an HDF5 COMPOUND Dataset (falsely thought a field's dtype was object). * Issue #6. Added optional data compression and the storage of data checksums. Controlled by several new options. 0.1.8. Bugfix release fixing the following two bugs. * Issue #21. Fixed bug where the ``'MATLAB_class'`` Attribute is not set when writing ``dict`` types when writing MATLAB metadata. * Issue #22. 
Fixed bug where null characters (``'\x00'``) and forward slashes (``'/'``) were allowed in ``dict`` keys and the field names of structured ``np.ndarray`` (except that forward slashes are allowed when the ``structured_numpy_ndarray_as_struct`` is not set as is the case when the ``matlab_compatible`` option is set). These cause problems for the ``h5py`` package and the HDF5 library. ``NotImplementedError`` is now thrown in these cases. 0.1.7. Bugfix release with an added compatibility option and some added test code. Did the following. * Fixed an issue reading variables larger than 2 GB in MATLAB MAT v7.3 files when no explicit variable names to read are given to ``hdf5storage.loadmat``. Fix also reduces memory consumption and processing time a little bit by removing an unneeded memory copy. * ``Options`` now will accept any additional keyword arguments it doesn't support, ignoring them, to be API compatible with future package versions with added options. * Added tests for reading data that has been compressed or had other HDF5 filters applied. 0.1.6. Bugfix release fixing a bug with determining the maximum size of a Python 2.x ``int`` on a 32-bit system. 0.1.5. Bugfix release fixing the following bug. * Fixed bug where an ``int`` could be stored that is too big to fit into an ``int`` when read back in Python 2.x. When it is too big, it is converted to a ``long``. * Fixed a bug where an ``int`` or ``long`` that is too big to big to fit into an ``np.int64`` raised the wrong exception. * Fixed bug where fields names for structured ``np.ndarray`` with non-ASCII characters (assumed to be UTF-8 encoded in Python 2.x) can't be read or written properly. * Fixed bug where ``np.bytes_`` with non-ASCII characters can were converted incorrectly to UTF-16 when that option is set (set implicitly when doing MATLAB compatibility). Now, it throws a ``NotImplementedError``. 0.1.4. Bugfix release fixing the following bugs. Thanks goes to `mrdomino `_ for writing the bug fixes. * Fixed bug where ``dtype`` is used as a keyword parameter of ``np.ndarray.astype`` when it is a positional argument. * Fixed error caused by ``h5py.__version__`` being absent on Ubuntu 12.04. 0.1.3. Bugfix release fixing the following bug. * Fixed broken ability to correctly read and write empty structured ``np.ndarray`` (has fields). 0.1.2. Bugfix release fixing the following bugs. * Removed mistaken support for ``np.float16`` for h5py versions before ``2.2`` since that was when support for it was introduced. * Structured ``np.ndarray`` where one or more fields is of the ``'object'`` dtype can now be written without an error when the ``structured_numpy_ndarray_as_struct`` option is not set. They are written as an HDF5 Group, as if the option was set. * Support for the ``'MATLAB_fields'`` Attribute for data types that are structures in MATLAB has been added for when the version of the h5py package being used is ``2.3`` or greater. Support is still missing for earlier versions (this package requires a minimum version of ``2.1``). * The check for non-unicode string keys (``str`` in Python 3 and ``unicode`` in Python 2) in the type ``dict`` is done right before any changes are made to the HDF5 file instead of in the middle so that no changes are applied if an invalid key is present. * HDF5 userblock set with the proper metadata for MATLAB support right at the beginning of when data is being written to an HDF5 file instead of at the end, meaning the writing can crash and the file will still be a valid MATLAB file. 0.1.1. 
Bugfix release fixing the following bugs. * ``str`` is now written like ``numpy.str_`` instead of ``numpy.bytes_``. * Complex numbers where the real or imaginary part are ``nan`` but the other part are not are now read correctly as opposed to setting both parts to ``nan``. * Fixed bugs in string conversions on Python 2 resulting from ``str.decode()`` and ``unicode.encode()`` not taking the same keyword arguments as in Python 3. * MATLAB structure arrays can now be read without producing an error on Python 2. * ``numpy.str_`` now written as ``numpy.uint16`` on Python 2 if the ``convert_numpy_str_to_utf16`` option is set and the conversion can be done without using UTF-16 doublets, instead of always writing them as ``numpy.uint32``. 0.1. Initial version. hdf5storage-0.1.19/doc/000077500000000000000000000000001436247615200145745ustar00rootroot00000000000000hdf5storage-0.1.19/doc/Makefile000066400000000000000000000127311436247615200162400ustar00rootroot00000000000000# Makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build PAPER = BUILDDIR = build # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext help: @echo "Please use \`make ' where is one of" @echo " html to make standalone HTML files" @echo " dirhtml to make HTML files named index.html in directories" @echo " singlehtml to make a single large HTML file" @echo " pickle to make pickle files" @echo " json to make JSON files" @echo " htmlhelp to make HTML files and a HTML help project" @echo " qthelp to make HTML files and a qthelp project" @echo " devhelp to make HTML files and a Devhelp project" @echo " epub to make an epub" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" @echo " latexpdf to make LaTeX files and run them through pdflatex" @echo " text to make text files" @echo " man to make manual pages" @echo " texinfo to make Texinfo files" @echo " info to make Texinfo files and run them through makeinfo" @echo " gettext to make PO message catalogs" @echo " changes to make an overview of all changed/added/deprecated items" @echo " linkcheck to check all external links for integrity" @echo " doctest to run all doctests embedded in the documentation (if enabled)" clean: -rm -rf $(BUILDDIR)/* html: $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." singlehtml: $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml @echo @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." pickle: $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle @echo @echo "Build finished; now you can process the pickle files." json: $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json @echo @echo "Build finished; now you can process the JSON files." 
htmlhelp: $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp @echo @echo "Build finished; now you can run HTML Help Workshop with the" \ ".hhp project file in $(BUILDDIR)/htmlhelp." qthelp: $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp @echo @echo "Build finished; now you can run "qcollectiongenerator" with the" \ ".qhcp project file in $(BUILDDIR)/qthelp, like this:" @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/hdf5storage.qhcp" @echo "To view the help file:" @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/hdf5storage.qhc" devhelp: $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp @echo @echo "Build finished." @echo "To view the help file:" @echo "# mkdir -p $$HOME/.local/share/devhelp/hdf5storage" @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/hdf5storage" @echo "# devhelp" epub: $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub @echo @echo "Build finished. The epub file is in $(BUILDDIR)/epub." latex: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." @echo "Run \`make' in that directory to run these through (pdf)latex" \ "(use \`make latexpdf' here to do that automatically)." latexpdf: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through pdflatex..." $(MAKE) -C $(BUILDDIR)/latex all-pdf @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." text: $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text @echo @echo "Build finished. The text files are in $(BUILDDIR)/text." man: $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man @echo @echo "Build finished. The manual pages are in $(BUILDDIR)/man." texinfo: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." @echo "Run \`make' in that directory to run these through makeinfo" \ "(use \`make info' here to do that automatically)." info: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo "Running Texinfo files through makeinfo..." make -C $(BUILDDIR)/texinfo info @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." gettext: $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale @echo @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." changes: $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes @echo @echo "The overview file is in $(BUILDDIR)/changes." linkcheck: $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck @echo @echo "Link check complete; look for any errors in the above output " \ "or in $(BUILDDIR)/linkcheck/output.txt." doctest: $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest @echo "Testing of doctests in the sources finished, look at the " \ "results in $(BUILDDIR)/doctest/output.txt." hdf5storage-0.1.19/doc/make.bat000066400000000000000000000117731436247615200162120ustar00rootroot00000000000000@ECHO OFF REM Command file for Sphinx documentation if "%SPHINXBUILD%" == "" ( set SPHINXBUILD=sphinx-build ) set BUILDDIR=build set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% source set I18NSPHINXOPTS=%SPHINXOPTS% source if NOT "%PAPER%" == "" ( set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% ) if "%1" == "" goto help if "%1" == "help" ( :help echo.Please use `make ^` where ^ is one of echo. html to make standalone HTML files echo. 
dirhtml to make HTML files named index.html in directories echo. singlehtml to make a single large HTML file echo. pickle to make pickle files echo. json to make JSON files echo. htmlhelp to make HTML files and a HTML help project echo. qthelp to make HTML files and a qthelp project echo. devhelp to make HTML files and a Devhelp project echo. epub to make an epub echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter echo. text to make text files echo. man to make manual pages echo. texinfo to make Texinfo files echo. gettext to make PO message catalogs echo. changes to make an overview over all changed/added/deprecated items echo. linkcheck to check all external links for integrity echo. doctest to run all doctests embedded in the documentation if enabled goto end ) if "%1" == "clean" ( for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i del /q /s %BUILDDIR%\* goto end ) if "%1" == "html" ( %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/html. goto end ) if "%1" == "dirhtml" ( %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. goto end ) if "%1" == "singlehtml" ( %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. goto end ) if "%1" == "pickle" ( %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can process the pickle files. goto end ) if "%1" == "json" ( %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can process the JSON files. goto end ) if "%1" == "htmlhelp" ( %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can run HTML Help Workshop with the ^ .hhp project file in %BUILDDIR%/htmlhelp. goto end ) if "%1" == "qthelp" ( %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can run "qcollectiongenerator" with the ^ .qhcp project file in %BUILDDIR%/qthelp, like this: echo.^> qcollectiongenerator %BUILDDIR%\qthelp\hdf5storage.qhcp echo.To view the help file: echo.^> assistant -collectionFile %BUILDDIR%\qthelp\hdf5storage.ghc goto end ) if "%1" == "devhelp" ( %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp if errorlevel 1 exit /b 1 echo. echo.Build finished. goto end ) if "%1" == "epub" ( %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub if errorlevel 1 exit /b 1 echo. echo.Build finished. The epub file is in %BUILDDIR%/epub. goto end ) if "%1" == "latex" ( %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex if errorlevel 1 exit /b 1 echo. echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. goto end ) if "%1" == "text" ( %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text if errorlevel 1 exit /b 1 echo. echo.Build finished. The text files are in %BUILDDIR%/text. goto end ) if "%1" == "man" ( %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man if errorlevel 1 exit /b 1 echo. echo.Build finished. The manual pages are in %BUILDDIR%/man. goto end ) if "%1" == "texinfo" ( %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo if errorlevel 1 exit /b 1 echo. echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. 
goto end ) if "%1" == "gettext" ( %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale if errorlevel 1 exit /b 1 echo. echo.Build finished. The message catalogs are in %BUILDDIR%/locale. goto end ) if "%1" == "changes" ( %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes if errorlevel 1 exit /b 1 echo. echo.The overview file is in %BUILDDIR%/changes. goto end ) if "%1" == "linkcheck" ( %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck if errorlevel 1 exit /b 1 echo. echo.Link check complete; look for any errors in the above output ^ or in %BUILDDIR%/linkcheck/output.txt. goto end ) if "%1" == "doctest" ( %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest if errorlevel 1 exit /b 1 echo. echo.Testing of doctests in the sources finished, look at the ^ results in %BUILDDIR%/doctest/output.txt. goto end ) :end hdf5storage-0.1.19/doc/source/000077500000000000000000000000001436247615200160745ustar00rootroot00000000000000hdf5storage-0.1.19/doc/source/api.rst000066400000000000000000000002031436247615200173720ustar00rootroot00000000000000API === .. toctree:: :maxdepth: 2 hdf5storage hdf5storage.lowlevel hdf5storage.Marshallers hdf5storage.utilities hdf5storage-0.1.19/doc/source/compression.rst000066400000000000000000000166311436247615200211760ustar00rootroot00000000000000.. currentmodule:: hdf5storage .. _Compression: =========== Compression =========== The HDF5 libraries and the :py:mod:`h5py` module support transparent compression of data in HDF5 files. The use of compression can sometimes drastically reduce file size, often makes it faster to read the data from the file, and sometimes makes it faster to write the data. Though, not all data compresses very well and can occassionally end up larger after compression than it was uncompressed. Compression does cost CPU time both when compressing the data and when decompressing it. The reason this can sometimes lead to faster read and write times is because disks are very slow and the space savings can save enough disk access time to make up for the CPU time. All versions of this package can read compressed data, but not all versions can write compressed data. .. versionadded:: 0.1.9 HDF5 write compression features added along with several options to control it in :py:class:`Options`. .. versionadded:: 0.1.7 :py:class:`Options` will take the compression options but ignores them. .. warning:: Passing the compression options for versions earlier than ``0.1.7`` will result in an error. Enabling Compression ==================== Compression, which is enabled by default, is controlled by setting :py:attr:`Options.compress` to ``True`` or passing ``compress=X`` to :py:func:`write` and :py:func:`savemat` where ``X`` is ``True`` or ``False``. .. note:: Not all python objects written to the HDF5 file will be compressed, or even support compression. For one, :py:mod:`numpy` scalars or any type that is stored as one do not support compression due to limitations of the HDF5 library, though compressing them would be a waste (hence the lack of support). Setting The Minimum Data Size for Compression ============================================= Compressing small pieces of data often wastes space (compressed size is larger than uncompressed size) and CPU time. Due to this, python objects have to be larger than a particular size before this package will compress them. 
The threshold, in bytes, is controlled by setting :py:attr:`Options.compress_size_threshold` or passing ``compress_size_threshold=X`` to :py:func:`write` and :py:func:`savemat` where ``X`` is a non-negative integer. The default value is 16 KB. Controlling The Compression Algorithm And Level =============================================== Many compression algorithms can be used with HDF5 files, though only three are common. The Deflate algorithm (sometimes known as the GZIP algorithm), LZF algorithm, and SZIP algorithms are the algorithms that the HDF5 library is explicitly setup to support. The library has a mechanism for adding additional algorithms. Popular ones include the BZIP2 and BLOSC algorithms. The compression algorithm used is controlled by setting :py:attr:`Options.compression_algorithm` or passing ``compression_algorithm=X`` to :py:func:`write` and :py:func:`savemat`. ``X`` is the ``str`` name of the algorithm. The default is ``'gzip'`` corresponding to the Deflate/GZIP algorithm. .. note:: As of version ``0.2``, only the Deflate (``X = 'gzip'``), LZF (``X = 'lzf'``), and SZIP (``X = 'szip'``) algorithms are supported. .. note:: If doing MATLAB compatibility (:py:attr:`Options.matlab_compatible` is ``True``), only the Deflate algorithm is supported. The algorithms, in more detail GZIP / Deflate (``'gzip'``) The common Deflate algorithm seen in the Unix and Linux ``gzip`` utility and the most common compression algorithm used in ZIP files. It is the most compatible algorithm. It achieves good compression and is reasonably fast. It has no patent or license restrictions. LZF (``'lzf'``) A very fast algorithm but with inferior compression to GZIP/Deflate. It is less commonly used than GZIP/Deflate, but similarly has no patent or license restrictions. SZIP (``'szip'``) This compression algorithm isn't always available and has patent and license restrictions. See `SZIP License `_. If GZIP/Deflate compression is being used, the compression level can be adjusted by setting :py:attr:`Options.gzip_compression_level` or passing ``gzip_compression_level=X`` to :py:func:`write` and :py:func:`savemat` where ``X`` is an integer between ``0`` and ``9`` inclusive. ``0`` is the lowest compression, but is the fastest. ``9`` gives the best compression, but is the slowest. The default is ``7``. For all compression algorithms, there is an additional filter which can help achieve better compression at relatively low cost in CPU time. It is the shuffle filter. It is controlled by setting :py:attr:`Options.shuffle_filter` or passing ``shuffle_filter=X`` to :py:func:`write` and :py:func:`savemat` where ``X`` is ``True`` or ``False``. The default is ``True``. Using Checksums =============== Fletcher32 checksums can be calculated and stored for most types of stored data in an HDF5 file. These are then checked when the data is read to catch file corruption, which will cause an error when reading the data informing the user that there is data corruption. The filter can be enabled or disabled separately for data that is compressed and data that is not compressed (e.g. compression is disabled, the python object can't be compressed, or the python object's data size is smaller than the compression threshold). For compressed data, it is controlled by setting :py:attr:`Options.compressed_fletcher32_filter` or passing ``compressed_fletcher32_filter=X`` to :py:func:`write` and :py:func:`savemat` where ``X`` is ``True`` or ``False``. The default is ``True``. 
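For example, a sketch passing the compression and checksum options
described above to ``savemat`` (the file name and data are placeholders;
the same keywords can be passed to ``write``)::

    import numpy as np
    import hdf5storage

    data = {'x': np.zeros((1000, 1000))}

    # Deflate/GZIP compression at level 7 with the shuffle filter and
    # Fletcher32 checksums on the compressed data.
    hdf5storage.savemat('data.mat', data,
                        compress=True,
                        compress_size_threshold=16384,
                        compression_algorithm='gzip',
                        gzip_compression_level=7,
                        shuffle_filter=True,
                        compressed_fletcher32_filter=True)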
For uncompressed data, it is controlled by setting :py:attr:`Options.uncompressed_fletcher32_filter` or passing ``uncompressed_fletcher32_filter=X`` to :py:func:`write` and :py:func:`savemat` where ``X`` is ``True`` or ``False``. The default is ``False``. .. note:: Fletcher32 checksums are not computed for anything that is stored as a :py:mod:`numpy` scalar. Chunking ======== When no filters are used (compression and Fletcher32), this package stores data in HDF5 files in a contiguous manner. The use of any filter requires that the data use chunked storage. Chunk sizes are determined automatically using the autochunk feature of :py:mod:`h5py`. The HDF5 libraries make reading contiguous and chunked data transparent, though access speeds can differ and the chunk size affects the compression ratio. Further Reading =============== .. seealso:: `HDF5 Datasets Filter pipeline `_ Description of the Dataset filter pipeline in the :py:mod:`h5py` `Using Compression in HDF5 `_ FAQ on compression from the HDF Group. `HDF5 Tutorial: Learning The Basics: Dataset Storage Layout `_ Information on Dataset storage format from the HDF Group `SZIP License `_ The license for using the SZIP compression algorithm. `SZIP COMPRESSION IN HDF PRODUCTS `_ Information on using SZIP compression from the HDF Group. `3rd Party Compression Algorithms for HDF5 `_ List of common additional compression algorithms. hdf5storage-0.1.19/doc/source/conf.py000066400000000000000000000207071436247615200174010ustar00rootroot00000000000000#!/usr/bin/env python3 # -*- coding: utf-8 -*- # # hdf5storage documentation build configuration file, created by # sphinx-quickstart on Sun Dec 22 00:05:54 2013. # # This file is execfile()d with the current directory set to its containing dir. # # Note that not all possible configuration values are present in this # autogenerated file. # # All configuration values have a default; values that are commented out # serve to show the default. import sys, os # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. #sys.path.insert(0, os.path.abspath('.')) # -- General configuration ----------------------------------------------------- # If your documentation needs a minimal Sphinx version, state it here. needs_sphinx = '1.7' # Add any Sphinx extension module names here, as strings. They can be extensions # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = ['sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx', 'sphinx.ext.todo', 'sphinx.ext.coverage', 'sphinx.ext.imgmath', 'sphinx.ext.ifconfig', 'sphinx.ext.viewcode', 'sphinx.ext.autosummary', 'numpydoc'] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. source_suffix = '.rst' # The encoding of source files. #source_encoding = 'utf-8-sig' # The master toctree document. master_doc = 'index' # General information about the project. project = 'hdf5storage' copyright = '2013-2021, Freja Nordsiek' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. version = '0.1.18' # The full version, including alpha/beta/rc tags. release = '0.1.18' # The language for content autogenerated by Sphinx. 
Refer to documentation # for a list of supported languages. #language = None # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: #today = '' # Else, today_fmt is used as the format for a strftime call. #today_fmt = '%B %d, %Y' # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. exclude_patterns = [] # The reST default role (used for this markup: `text`) to use for all documents. #default_role = None # If true, '()' will be appended to :func: etc. cross-reference text. #add_function_parentheses = True # If true, the current module name will be prepended to all description # unit titles (such as .. function::). #add_module_names = True # If true, sectionauthor and moduleauthor directives will be shown in the # output. They are ignored by default. #show_authors = False # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # A list of ignored prefixes for module index sorting. #modindex_common_prefix = [] # -- Options for HTML output --------------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. html_theme = 'default' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. #html_theme_options = {} # Add any paths that contain custom themes here, relative to this directory. #html_theme_path = [] # The name for this set of Sphinx documents. If None, it defaults to # " v documentation". #html_title = None # A shorter title for the navigation bar. Default is the same as html_title. #html_short_title = None # The name of an image file (relative to this directory) to place at the top # of the sidebar. #html_logo = None # The name of an image file (within the static path) to use as favicon of the # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # pixels large. #html_favicon = None # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. #html_last_updated_fmt = '%b %d, %Y' # If true, SmartyPants will be used to convert quotes and dashes to # typographically correct entities. #html_use_smartypants = True # Custom sidebar templates, maps document names to template names. #html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to # template names. #html_additional_pages = {} # If false, no module index is generated. #html_domain_indices = True # If false, no index is generated. #html_use_index = True # If true, the index is split into individual pages for each letter. #html_split_index = False # If true, links to the reST sources are added to the pages. html_show_sourcelink = True # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. #html_show_sphinx = True # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. #html_show_copyright = True # If true, an OpenSearch description file will be output, and all pages will # contain a tag referring to it. 
The value of this option must be the # base URL from which the finished HTML is served. #html_use_opensearch = '' # This is the file name suffix for HTML files (e.g. ".xhtml"). #html_file_suffix = None # Output file base name for HTML help builder. htmlhelp_basename = 'hdf5storagedoc' # -- Options for LaTeX output -------------------------------------------------- latex_elements = { # The paper size ('letterpaper' or 'a4paper'). #'papersize': 'letterpaper', # The font size ('10pt', '11pt' or '12pt'). #'pointsize': '10pt', # Additional stuff for the LaTeX preamble. #'preamble': '', } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, author, documentclass [howto/manual]). latex_documents = [ ('index', 'hdf5storage.tex', 'hdf5storage Documentation', 'Freja Nordsiek', 'manual'), ] # The name of an image file (relative to this directory) to place at the top of # the title page. #latex_logo = None # For "manual" documents, if this is true, then toplevel headings are parts, # not chapters. #latex_use_parts = False # If true, show page references after internal links. #latex_show_pagerefs = False # If true, show URL addresses after external links. #latex_show_urls = False # Documents to append as an appendix to all manuals. #latex_appendices = [] # If false, no module index is generated. #latex_domain_indices = True # -- Options for manual page output -------------------------------------------- # One entry per manual page. List of tuples # (source start file, name, description, authors, manual section). man_pages = [ ('index', 'hdf5storage', 'hdf5storage Documentation', ['Freja Nordsiek'], 1) ] # If true, show URL addresses after external links. #man_show_urls = False # -- Options for Texinfo output ------------------------------------------------ # Grouping the document tree into Texinfo files. List of tuples # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ ('index', 'hdf5storage', 'hdf5storage Documentation', 'Freja Nordsiek', 'hdf5storage', 'One line description of project.', 'Miscellaneous'), ] # Documents to append as an appendix to all manuals. #texinfo_appendices = [] # If false, no module index is generated. #texinfo_domain_indices = True # How to display URL addresses: 'footnote', 'no', or 'inline'. #texinfo_show_urls = 'footnote' # Example configuration for intersphinx: refer to the Python standard library. intersphinx_mapping = {'python': ('http://docs.python.org/3.6', None), 'numpy': ('http://docs.scipy.org/doc/numpy', None), 'scipy': ('http://docs.scipy.org/doc/scipy/reference', None), 'h5py': ('http://docs.h5py.org/en/latest/', None)} # -- Options for Autosummary --------------------------------------------------- autosummary_generate = True # -- Options for Numpydoc ------------------------------------------------------ numpydoc_show_class_members = True hdf5storage-0.1.19/doc/source/development.rst000066400000000000000000000140001436247615200211430ustar00rootroot00000000000000.. currentmodule:: hdf5storage ======================= Development Information ======================= The source code can be found on Github at https://github.com/frejanordsiek/hdf5storage Package Overview ================ The package is currently a pure Python package; using no Cython, C/C++, or other languages. Also, pickling is not used at all and should not be added. 
It is a security risk since pickled data is read through the interpreter,
allowing arbitrary code (which could be malicious) to be executed in the
interpreter. One wants to be able to read possibly HDF5 and MAT files from
untrusted sources, so pickling is avoided in this package.

The :py:mod:`hdf5storage` module contains the high level reading and
writing functions, as well as the :py:class:`Options` class for
encapsulating all the various options governing how data is read and
written. The high level reading and writing functions can either be given
an :py:class:`Options` object, or be given the keyword arguments that its
constructor takes (they will make one from those arguments); a short
sketch further below illustrates both forms. There is also the
:py:class:`MarshallerCollection`, which holds all the Marshallers (more
below) and provides functions to find the appropriate Marshaller given the
``type`` of a Python object, the type string used for the 'Python.Type'
Attribute, or the MATLAB class string (contained in the 'MATLAB_class'
Attribute). One can give the collection additional user provided
Marshallers.

:py:mod:`hdf5storage.lowlevel` contains the low level reading and writing
functions :py:func:`lowlevel.read_data` and :py:func:`lowlevel.write_data`.
They can only work on already opened HDF5 files (the high level ones handle
file creation/opening), can only be given options using an
:py:class:`Options` object, and read/write individual Groups/Datasets and
Python objects. Any Marshaller (more below) that needs to read or write a
nested object within a Group or Python object must call these functions.

:py:mod:`hdf5storage.Marshallers` contains all the Marshallers for the
different Python data types that can be read from or written to an HDF5
file. They are all automatically added to any
:py:class:`MarshallerCollection`, which inspects this module and grabs all
classes within it (if a class other than a Marshaller is added to this
module, :py:class:`MarshallerCollection` will need to be modified). All
Marshallers need to provide the same interface as
:py:class:`Marshallers.TypeMarshaller`, which is the base class for all
Marshallers in this module, and should probably be inherited from by any
custom Marshallers that one would write (while it can't marshall any types,
it does have some useful built in functionality). The main Marshaller in
the module is :py:class:`Marshallers.NumpyScalarArrayMarshaller`, which can
marshall most Numpy types. All the other built in Marshallers other than
:py:class:`Marshallers.PythonDictMarshaller` inherit from it since they
convert their types to and from Numpy types and use the inherited functions
to do the actual work with the HDF5 file.

:py:mod:`hdf5storage.utilities` contains many functions that are used
throughout the package, especially by the Marshallers. There are several
functions to get, set, and delete different kinds of HDF5 Attributes
(handling things such as them already existing, not existing, etc.). There
are also functions to convert between different string representations, as
well as to encode complex types for writing and decode them after reading.
And then there is the function
:py:func:`utilities.next_unused_name_in_group`, which produces a random
unused name in a Group.

TODO
====

There are several features that need to be added, bugs that need to be
fixed, etc.
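Before the specific items below, here is the short sketch referred to in
the overview above of the two ways the high level functions can be given
options. It is only a sketch; the data and file names are placeholders::

    import hdf5storage
    from hdf5storage import Options

    data = {'a': 1.5}

    # Passing an explicit Options object.
    opts = Options(matlab_compatible=False, compress=False)
    hdf5storage.write(data, path='/data', filename='example.h5',
                      options=opts)

    # Equivalent: passing the same settings as keyword arguments, from
    # which an Options object is constructed internally.
    hdf5storage.write(data, path='/data', filename='example.h5',
                      matlab_compatible=False, compress=False)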
Standing Bugs ------------- * Structured ``np.ndarray`` with no elements, when :py:attr:`Options.structured_numpy_ndarray_as_struct` is set, are not written in a way that the dtypes for the fields can be restored when it is read back from file. * The Attribute 'MATLAB_fields' is supported for h5py version ``2.3`` and newer. But for older versions, it is not currently set when writing data that should be imported into MATLAB as structures, and is ignored when reading data from file. This is because the h5py package cannot work with its format in older versions. If a structure with fields 'a' and 'cd' are saved, the Attribute looks like the following when using the ``h5dump`` utility:: ATTRIBUTE "MATLAB_fields" { DATATYPE H5T_VLEN { H5T_STRING { STRSIZE 1; STRPAD H5T_STR_NULLTERM; CSET H5T_CSET_ASCII; CTYPE H5T_C_S1; }} DATASPACE SIMPLE { ( 1 ) / ( 1 ) } DATA { (0): ("a"), ("c", "d") } } In h5py version ``2.3``, the Attribute is an array of variable length arrays of single character ASCII numpy strings (vlen of ``'S1'``). It is created like so:: fields = ['a', 'cd'] dt = h5py.special_dtype(vlen=np.dtype('S1')) fs = np.empty(shape=(len(fields),), dtype=dt) for i, s in enumerate(fields): fs[i] = np.array([c.encode('ascii') for c in s], dtype='S1') Then ``fs`` looks like:: array([array([b'a'], dtype='|S1'), array([b'c', b'd'], dtype='|S1']), dtype=object) MATLAB doesn't strictly require this field, but supporting it will help with reading/writing empty MATLAB structs and not losing the fields. Adding support for older verions of h5py would probably require writing a custom Cython or C function, or porting some h5py code. Features to Add --------------- * Marshallers for more Python types. * Marshallers to be able to read the following MATLAB types * Categorical Arrays * Tables * Maps * Time Series * Classes (could be hard if they don't look like a struct in file) * Function Handles (wouldn't be able run in Python, but could at least manipulate) * A ``whosmat`` function like the SciPy one :py:func:`scipy.io.whosmat`. * A function to find and delete Datasets and Groups inside the Group :py:attr:`Options.group_for_references` that are not referenced by other Datasets in the file. hdf5storage-0.1.19/doc/source/hdf5storage.Marshallers.rst000066400000000000000000000172631436247615200233260ustar00rootroot00000000000000hdf5storage.Marshallers ======================= .. currentmodule:: hdf5storage.Marshallers .. automodule:: hdf5storage.Marshallers .. autosummary:: write_object_array read_object_array TypeMarshaller NumpyScalarArrayMarshaller PythonScalarMarshaller PythonStringMarshaller PythonNoneMarshaller PythonDictMarshaller PythonListMarshaller PythonTupleSetDequeMarshaller write_object_array ------------------ .. autofunction:: write_object_array read_object_array ------------------ .. autofunction:: read_object_array TypeMarshaller -------------- .. autoclass:: TypeMarshaller :members: get_type_string, read, write, write_metadata :show-inheritance: .. autoattribute:: TypeMarshaller.python_attributes :annotation: = {'Python.Type'} .. autoattribute:: TypeMarshaller.matlab_attributes :annotation: = {'H5PATH'} .. autoattribute:: TypeMarshaller.types :annotation: = [] .. autoattribute:: TypeMarshaller.python_type_strings :annotation: = [] .. autoattribute:: TypeMarshaller.matlab_classes :annotation: = [] NumpyScalarArrayMarshaller -------------------------- .. autoclass:: NumpyScalarArrayMarshaller :members: read, write, write_metadata :show-inheritance: .. 
autoattribute:: NumpyScalarArrayMarshaller.python_attributes :annotation: = {'Python.Type', 'Python.Shape', 'Python.Empty', 'Python.numpy.UnderlyingType', 'Python.numpy.Container', 'Python.Fields'} .. autoattribute:: NumpyScalarArrayMarshaller.matlab_attributes :annotation: = {'H5PATH', 'MATLAB_class', 'MATLAB_empty', 'MATLAB_int_decode', 'MATLAB_fields'} .. autoattribute:: NumpyScalarArrayMarshaller.types :annotation: = [np.ndarray, np.matrix, np.chararray, np.core.records.recarray, np.bool_, np.void, np.uint8, np.uint16, np.uint32, np.uint64, np.int8, np.int16, np.int32, np.int64, np.float16, np.float32, np.float64, np.complex64, np.complex128, np.bytes_, np.str_, np.object_] .. autoattribute:: NumpyScalarArrayMarshaller.python_type_strings :annotation: = ['numpy.ndarray', 'numpy.matrix', 'numpy.chararray', 'numpy.recarray', 'numpy.bool_', 'numpy.void', 'numpy.uint8', 'numpy.uint16', 'numpy.uint32', 'numpy.uint64', 'numpy.int8', 'numpy.int16', 'numpy.int32', 'numpy.int64', 'numpy.float16', 'numpy.float32', 'numpy.float64', 'numpy.complex64', 'numpy.complex128', 'numpy.bytes_', 'numpy.str_', 'numpy.object_'] .. autoattribute:: NumpyScalarArrayMarshaller.matlab_classes :annotation: = ['logical', 'char', 'single', 'double', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'cell', 'canonical empty'] PythonScalarMarshaller ---------------------- .. autoclass:: PythonScalarMarshaller :members: read, write :show-inheritance: .. autoattribute:: PythonScalarMarshaller.python_attributes :annotation: = {'Python.Type', 'Python.Shape', 'Python.Empty', 'Python.numpy.UnderlyingType', 'Python.numpy.Container', 'Python.Fields'} .. autoattribute:: PythonScalarMarshaller.matlab_attributes :annotation: = {'H5PATH', 'MATLAB_class', 'MATLAB_empty', 'MATLAB_int_decode'} .. autoattribute:: PythonScalarMarshaller.types :annotation: = [bool, int, float, complex] .. autoattribute:: PythonScalarMarshaller.python_type_strings :annotation: = ['bool', 'int', 'float', 'complex'] .. autoattribute:: PythonScalarMarshaller.matlab_classes :annotation: = [] PythonStringMarshaller ---------------------- .. autoclass:: PythonStringMarshaller :members: read, write :show-inheritance: .. autoattribute:: PythonStringMarshaller.python_attributes :annotation: = {'Python.Type', 'Python.Shape', 'Python.Empty', 'Python.numpy.UnderlyingType', 'Python.numpy.Container', 'Python.Fields'} .. autoattribute:: PythonStringMarshaller.matlab_attributes :annotation: = {'H5PATH', 'MATLAB_class', 'MATLAB_empty', 'MATLAB_int_decode'} .. autoattribute:: PythonStringMarshaller.types :annotation: = [str, bytes, bytearray] .. autoattribute:: PythonStringMarshaller.python_type_strings :annotation: = ['str', 'bytes', 'bytearray'] .. autoattribute:: PythonStringMarshaller.matlab_classes :annotation: = [] PythonNoneMarshaller -------------------- .. autoclass:: PythonNoneMarshaller :members: read, write :show-inheritance: .. autoattribute:: PythonNoneMarshaller.python_attributes :annotation: = {'Python.Type', 'Python.Shape', 'Python.Empty', 'Python.numpy.UnderlyingType', 'Python.numpy.Container', 'Python.Fields'} .. autoattribute:: PythonNoneMarshaller.matlab_attributes :annotation: = {'H5PATH', 'MATLAB_class', 'MATLAB_empty', 'MATLAB_int_decode'} .. autoattribute:: PythonNoneMarshaller.types :annotation: = [builtins.NoneType] .. autoattribute:: PythonNoneMarshaller.python_type_strings :annotation: = ['builtins.NoneType'] .. 
autoattribute:: PythonNoneMarshaller.matlab_classes :annotation: = [] PythonDictMarshaller -------------------- .. autoclass:: PythonDictMarshaller :members: read, write, write_metadata :show-inheritance: .. autoattribute:: PythonDictMarshaller.python_attributes :annotation: = {'Python.Type', 'Python.Fields'} .. autoattribute:: PythonDictMarshaller.matlab_attributes :annotation: = {'H5PATH', 'MATLAB_class', 'MATLAB_fields'} .. autoattribute:: PythonDictMarshaller.types :annotation: = [dict] .. autoattribute:: PythonDictMarshaller.python_type_strings :annotation: = ['dict'] .. autoattribute:: PythonDictMarshaller.matlab_classes :annotation: = [] PythonListMarshaller -------------------- .. autoclass:: PythonListMarshaller :members: read, write :show-inheritance: .. autoattribute:: PythonListMarshaller.python_attributes :annotation: = {'Python.Type', 'Python.Shape', 'Python.Empty', 'Python.numpy.UnderlyingType', 'Python.numpy.Container', 'Python.Fields'} .. autoattribute:: PythonListMarshaller.matlab_attributes :annotation: = {'H5PATH', 'MATLAB_class', 'MATLAB_empty', 'MATLAB_int_decode'} .. autoattribute:: PythonListMarshaller.types :annotation: = [list] .. autoattribute:: PythonListMarshaller.python_type_strings :annotation: = ['list'] .. autoattribute:: PythonListMarshaller.matlab_classes :annotation: = [] PythonTupleSetDequeMarshaller ----------------------------- .. autoclass:: PythonTupleSetDequeMarshaller :members: read, write :show-inheritance: .. autoattribute:: PythonTupleSetDequeMarshaller.python_attributes :annotation: = {'Python.Type', 'Python.Shape', 'Python.Empty', 'Python.numpy.UnderlyingType', 'Python.numpy.Container', 'Python.Fields'} .. autoattribute:: PythonTupleSetDequeMarshaller.matlab_attributes :annotation: = {'H5PATH', 'MATLAB_class', 'MATLAB_empty', 'MATLAB_int_decode'} .. autoattribute:: PythonTupleSetDequeMarshaller.types :annotation: = [tuple, set, frozenset, collections.deque] .. autoattribute:: PythonTupleSetDequeMarshaller.python_type_strings :annotation: = ['tuple', 'set', 'frozenset', 'collections.deque'] .. autoattribute:: PythonTupleSetDequeMarshaller.matlab_classes :annotation: = [] hdf5storage-0.1.19/doc/source/hdf5storage.lowlevel.rst000066400000000000000000000012211436247615200226650ustar00rootroot00000000000000hdf5storage.lowlevel ==================== .. currentmodule:: hdf5storage.lowlevel .. automodule:: hdf5storage.lowlevel .. autosummary:: Hdf5storageError CantReadError TypeNotMatlabCompatibleError write_data read_data Hdf5storageError ---------------- .. autoexception:: Hdf5storageError :show-inheritance: CantReadError ------------- .. autoexception:: CantReadError :show-inheritance: TypeNotMatlabCompatibleError ---------------------------- .. autoexception:: TypeNotMatlabCompatibleError :show-inheritance: write_data ---------- .. autofunction:: write_data read_data --------- .. autofunction:: read_data hdf5storage-0.1.19/doc/source/hdf5storage.rst000066400000000000000000000012011436247615200210330ustar00rootroot00000000000000hdf5storage =========== .. currentmodule:: hdf5storage .. automodule:: hdf5storage .. autosummary:: write writes read reads savemat loadmat Options MarshallerCollection write ----- .. autofunction:: write writes ------ .. autofunction:: writes read ----- .. autofunction:: read reads ----- .. autofunction:: reads savemat ------- .. autofunction:: savemat loadmat ------- .. autofunction:: loadmat Options ------- .. autoclass:: Options :members: :show-inheritance: MarshallerCollection -------------------- .. 
autoclass:: MarshallerCollection :members: :show-inheritance: hdf5storage-0.1.19/doc/source/hdf5storage.utilities.rst000066400000000000000000000042651436247615200230620ustar00rootroot00000000000000hdf5storage.utilities ===================== .. currentmodule:: hdf5storage.utilities .. automodule:: hdf5storage.utilities .. autosummary:: numpy_to_bytes does_dtype_have_a_zero_shape next_unused_name_in_group convert_numpy_str_to_uint16 convert_numpy_str_to_uint32 convert_to_str convert_to_numpy_str convert_to_numpy_bytes decode_complex encode_complex get_attribute get_attribute_string get_attribute_string_array read_all_attributes_into read_matlab_fields_attribute set_attribute set_attribute_string set_attribute_string_array del_attribute numpy_to_bytes -------------- .. autofunction:: numpy_to_bytes does_dtype_have_a_zero_shape ---------------------------- .. autofunction:: does_dtype_have_a_zero_shape next_unused_name_in_group ------------------------- .. autofunction:: next_unused_name_in_group convert_numpy_str_to_uint16 --------------------------- .. autofunction:: convert_numpy_str_to_uint16 convert_numpy_str_to_uint32 --------------------------- .. autofunction:: convert_numpy_str_to_uint32 convert_to_str -------------- .. autofunction:: convert_to_str convert_to_numpy_str -------------------- .. autofunction:: convert_to_numpy_str convert_to_numpy_bytes ---------------------- .. autofunction:: convert_to_numpy_bytes decode_complex -------------- .. autofunction:: decode_complex encode_complex -------------- .. autofunction:: encode_complex get_attribute ------------- .. autofunction:: get_attribute get_attribute_string -------------------- .. autofunction:: get_attribute_string get_attribute_string_array -------------------------- .. autofunction:: get_attribute_string_array read_all_attributes_into ------------------------ .. autofunction:: read_all_attributes_into read_matlab_fields_attribute ---------------------------- .. autofunction:: read_matlab_fields_attribute set_attribute ------------- .. autofunction:: set_attribute set_attribute_string -------------------- .. autofunction:: set_attribute_string set_attribute_string_array -------------------------- .. autofunction:: set_attribute_string_array del_attribute ------------- .. autofunction:: del_attribute hdf5storage-0.1.19/doc/source/index.rst000066400000000000000000000010131436247615200177300ustar00rootroot00000000000000.. hdf5storage documentation master file, created by sphinx-quickstart on Sun Dec 22 00:05:54 2013. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. Welcome to hdf5storage's documentation! ======================================= Contents: .. toctree:: :maxdepth: 2 information introduction compression storage_format development api Indices and tables ================== * :ref:`genindex` * :ref:`modindex` * :ref:`search` hdf5storage-0.1.19/doc/source/information.rst000066400000000000000000000001031436247615200211450ustar00rootroot00000000000000=========== hdf5storage =========== .. include:: ../../README.rst hdf5storage-0.1.19/doc/source/introduction.rst000066400000000000000000000432111436247615200213500ustar00rootroot00000000000000.. currentmodule:: hdf5storage ============ Introduction ============ Getting Started =============== Most of the functionality that one will use is contained in the main module :: import hdf5storage Lower level functionality needed mostly for extending this package to work with more datatypes are in its submodules. 
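These submodules (documented in the API section) would be imported explicitly when extending the package; ordinary reading and writing only needs the top-level import. For example ::

    import hdf5storage.lowlevel
    import hdf5storage.Marshallers
    import hdf5storage.utilities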
The main functions in this module are :py:func:`write` and :py:func:`read` which write a single Python variable to an HDF5 file or read the specified contents at one location in an HDF5 file and convert to Python types. HDF5 files are structured much like a Unix filesystem, so everything can be referenced with a POSIX style path, which look like ``'/pyth/hf'``. Unlike a Windows path, back slashes (``'/'``) are used as directory separators instead of forward slashes (``'\'``) and the base of the file system is just ``'/'`` instead of something like ``'C:\'``. In the language of HDF5, what we call directories and files in filesystems are called groups and datasets. :py:func:`write` has many options for controlling how the data is stored, and what metadata is stored, but we can ignore that for now. If we have a variable named ``foo`` that we want to write to an HDF5 file named ``data.h5``, we would write it by :: hdf5storage.write(foo, path='/foo', filename='data.h5') And then we can read it back from the file with the :py:func:`read` function, which returns the read data. Here, we will put the data we read back into the variable ``bar`` :: bar = hdf5storage.read(path='/foo', filename='data.h5') Writing And Reading Several Python Variables at Once ==================================================== To write and read more than one Python variable, one could use :py:func:`write` and :py:func:`read` for each variable individually. This can incur a major performance penalty, especially for large HDF5 files, since each call opens and closes the HDF5 file (sometimes more than once). Version ``0.1.10`` added a way to do this without incuring this performance penalty by adding two new functions: :py:func:`writes` and :py:func:`reads`. They can write and read more than one Python variable at once, though they can still work with a single variable. In fact, :py:func:`write` and :py:func:`read` are now wrappers around them. :py:func:`savemat` and :py:func:`loadmat` currently use them for the improved performance. .. versionadded:: 0.1.10 Ability to write and read more than one Python variable at a time without opening and closing the HDF5 file each time. Main Options Controlling Writing/Reading Data ============================================= There are many individual options that control how data is written and read to/from file. These can be set by passing an :py:class:`Options` object to :py:func:`write` and :py:func:`read` by :: options = hdf5storage.Options(...) hdf5storage.write(... , options=options) hdf5storage.read(... , options=options) or passing the individual keyword arguments used by the :py:class:`Options` constructor to :py:func:`write` and :py:func:`read`. The two methods cannot be mixed (the functions will give precedence to the given :py:class:`Options` object). .. note:: Functions in the various submodules only support the :py:class:`Options` object method of passing options. The two main options are :py:attr:`Options.store_python_metadata` and :py:attr:`Options.matlab_compatible`. A more minor option is :py:attr:`Options.oned_as`. .. versionadded:: 0.1.9 Support for the transparent compression of data has been added. It is enabled by default, compressing all python objects resulting in HDF5 Datasets larger than 16 KB with the GZIP/Deflate algorithm. store_python_metadata --------------------- ``bool`` Setting this options causes metadata to be written so that the written objects can be read back into Python accurately. 
As HDF5 does not natively support many Python data types (essentially only Numpy types), most Python data types have to be converted before being written. If metadata isn't also written, the data cannot be read back to its original form and will instead be read back as the Python type most closely resembling how it is stored, which will be a Numpy type of some sort. .. note This option is especially important when we consider that when ``matlab_compatible == True``, many additional conversions and manipulations will be done to the data that cannot be reversed without this metadata. matlab_compatible ----------------- ``bool`` Setting this option causes the writing of HDF5 files be done in a way compatible with MATLAB v7.3 MAT files. This consists of writing some file metadata so that MATLAB recognizes the file, adding specific metadata to every stored object so that MATLAB recognizes them, and transforming the data to be in the form that MATLAB expects for certain types (for example, MATLAB expects everything to be at least a 2D array and strings to be stored in UTF-16 but with no doublets). .. note:: There are many individual small options in the :py:class:`Options` class that this option sets to specific values. Setting ``matlab_compatible`` automatically sets them, while changing their values to something else automatically turns ``matlab_compatible`` off. action_for_matlab_incompatible ------------------------------ {``'ignore'``, ``'discard'``, ``'error'``} The action to perform when doing MATLAB compatibility (``matlab_compatible == True``) but a type being written is not MATLAB compatible. The actions are to write the data anyways ('ignore'), don't write the incompatible data ('discard'), or throw a :py:exc:`lowlevel.TypeNotMatlabCompatibleError` exception. The default is 'error'. oned_as ------- {'row', 'column'} This option is only actually relevant when ``matlab_compatible == True``. MATLAB only supports 2D and higher dimensionality arrays, but Numpy supports 1D arrays. So, 1D arrays have to be made 2 dimensional making them either into row vectors or column vectors. This option sets which they become when imported into MATLAB. compress -------- .. versionadded:: 0.1.9 ``bool`` Whether to use compression when writing data. Enabled (``True``) by default. See :ref:`Compression` for more information. Convenience Functions for MATLAB MAT Files ========================================== Two functions are provided for reading and writing to MATLAB MAT files in a convenient way. They are :py:func:`savemat` and :py:func:`loadmat`, which are modelled after the SciPy functions of the same name (:py:func:`scipy.io.savemat` and :py:func:`scipy.io.loadmat`), which work with non-HDF5 based MAT files. They take not only the same options, but dispatch calls automatically to the SciPy versions when instructed to write to a non-HDF5 based MAT file, or read a MAT file that is not HDF5 based. SciPy must be installed to take advantage of this functionality. :py:func:`savemat` takes a ``dict`` having data (values) and the names to give each piece of data (keys), and writes them to a MATLAB compatible MAT file. The `format` keyword sets the MAT file format, with ``'7.3'`` being the HDF5 based format supported by this package and ``'5'`` and ``'4'`` being the non HDF5 based formats supported by SciPy. If you want the data to be able to be read accurately back into Python, you should set ``store_python_metadata=True``. 
Writing a couple variables to a file looks like :: hdf5storage.savemat('data.mat', {'foo': 2.3, 'bar': (1+2j)}, format='7.3', oned_as='column', store_python_metadata=True) Then, to read variables back, we can either explicitly name the variables we want :: out = hdf5storage.loadmat('data.mat', variable_names=['foo', 'bar']) or grab all variables by either not giving the `variable_names` option or setting it to ``None``. :: out = hdf5storage.loadmat('data.mat') Example: Write And Readback Including Different Metadata ======================================================== Making The Data --------------- Make a ``dict`` containing many different types in it that we want to store to disk in an HDF5 file. The initialization method depends on the Python version. Python 3 ^^^^^^^^ The ``dict`` keys must be ``str`` (the unicode string type). >>> import numpy as np >>> import hdf5storage >>> a = {'a': True, ... 'b': None, ... 'c': 2, ... 'd': -3.2, ... 'e': (1-2.3j), ... 'f': 'hello', ... 'g': b'goodbye', ... 'h': ['list', 'of', 'stuff', [30, 2.3]], ... 'i': np.zeros(shape=(2,), dtype=[('bi', 'uint8')]), ... 'j':{'aa': np.bool_(False), ... 'bb': np.uint8(4), ... 'cc': np.uint32([70, 8]), ... 'dd': np.int32([]), ... 'ee': np.float32([[3.3], [5.3e3]]), ... 'ff': np.complex128([[3.4, 3], [9+2j, 0]]), ... 'gg': np.array(['one', 'two', 'three'], dtype='str'), ... 'hh': np.bytes_(b'how many?'), ... 'ii': np.object_(['text', np.int8([1, -3, 0])])}} Python 2 ^^^^^^^^ The same thing but in Python 2 where the ``dict`` keys must be ``unicode``. The other datatypes are translated from the Python 3 example appropriately. The rest of the examples on this page are run identically in Python 2 and 3, but the outputs are listed as is returned in Python 3. >>> import numpy as np >>> import hdf5storage >>> a = {u'a': True, ... u'b': None, ... u'c': 2, ... u'd': -3.2, ... u'e': (1-2.3j), ... u'f': u'hello', ... u'g': 'goodbye', ... u'h': [u'list', u'of', u'stuff', [30, 2.3]], ... u'i': np.zeros(shape=(2,), dtype=[('bi', 'uint8')]), ... u'j':{u'aa': np.bool_(False), ... u'bb': np.uint8(4), ... u'cc': np.uint32([70, 8]), ... u'dd': np.int32([]), ... u'ee': np.float32([[3.3], [5.3e3]]), ... u'ff': np.complex128([[3.4, 3], [9+2j, 0]]), ... u'gg': np.array([u'one', u'two', u'three'], dtype='unicode'), ... u'hh': np.str_('how many?'), ... u'ii': np.object_([u'text', np.int8([1, -3, 0])])}} Using No Metadata ----------------- Write it to a file at the ``'/a'`` directory, but include no Python or MATLAB metadata. Then, read it back and notice that many objects come back quite different from what was written. Namely, everything was converted to Numpy types. This even included the dictionaries which were converted to structured ``np.ndarray``s. This happens because all other types (other than ``dict``) must be converted to these types before being written to the HDF5 file, and without metadata, the conversion cannot be reversed (while ``dict`` isn't converted, it has the same form and thus cannot be extracted reversibly). >>> hdf5storage.write(data=a, path='/a', filename='data.h5', ... store_python_metadata=False, ... 
matlab_compatible=False) >>> hdf5storage.read(path='/a', filename='data.h5') array([ (True, [], 2, -3.2, (1-2.3j), b'hello', b'goodbye', [array(b'list', dtype='|S4'), array(b'of', dtype='|S2'), array(b'stuff', dtype='|S5'), array([array(30), array(2.3)], dtype=object)], [(0,), (0,)], [(False, 4, array([70, 8], dtype=uint32), array([], dtype=int32), array([[ 3.29999995e+00], [ 5.30000000e+03]], dtype=float32), array([[ 3.4+0.j, 3.0+0.j], [ 9.0+2.j, 0.0+0.j]]), array([111, 110, 101, 0, 0, 116, 119, 111, 0, 0, 116, 104, 114, 101, 101], dtype=uint32), b'how many?', array([array(b'text', dtype='|S4'), array([ 1, -3, 0], dtype=int8)], dtype=object))])], dtype=[('a', '?'), ('b', '>> hdf5storage.write(data=a, path='/a', filename='data_typeinfo.h5', ... store_python_metadata=True, ... matlab_compatible=False) >>> hdf5storage.read(path='/a', filename='data_typeinfo.h5') {'a': True, 'b': None, 'c': 2, 'd': -3.2, 'e': (1-2.3j), 'f': 'hello', 'g': b'goodbye', 'h': ['list', 'of', 'stuff', [30, 2.3]], 'i': array([(0,), (0,)], dtype=[('bi', 'u1')]), 'j': {'aa': False, 'bb': 4, 'cc': array([70, 8], dtype=uint32), 'dd': array([], dtype=int32), 'ee': array([[ 3.29999995e+00], [ 5.30000000e+03]], dtype=float32), 'ff': array([[ 3.4+0.j, 3.0+0.j], [ 9.0+2.j, 0.0+0.j]]), 'gg': array(['one', 'two', 'three'], dtype='>> hdf5storage.write(data=a, path='/a', filename='data.mat', ... store_python_metadata=False, ... matlab_compatible=True) >>> hdf5storage.read(path='/a', filename='data.mat') array([ ([[True]], [[]], [[2]], [[-3.2]], [[(1-2.3j)]], [['hello']], [['goodbye']], [[array([['list']], dtype='>> hdf5storage.write(data=a, path='/a', filename='data_typeinfo.mat', ... store_python_metadata=True, ... matlab_compatible=True) >>> hdf5storage.read(path='/a', filename='data_typeinfo.mat') {'a': True, 'b': None, 'c': 2, 'd': -3.2, 'e': (1-2.3j), 'f': 'hello', 'g': b'goodbye', 'h': ['list', 'of', 'stuff', [30, 2.3]], 'i': array([(0,), (0,)], dtype=[('bi', 'u1')]), 'j': {'aa': False, 'bb': 4, 'cc': array([70, 8], dtype=uint32), 'dd': array([], dtype=int32), 'ee': array([[ 3.29999995e+00], [ 5.30000000e+03]], dtype=float32), 'ff': array([[ 3.4+0.j, 3.0+0.j], [ 9.0+2.j, 0.0+0.j]]), 'gg': array(['one', 'two', 'three'], dtype='`_). This bug cannot be fixed in the 0.1.x series without breaking compatibility and thus will not be fixed in the 0.1.x series. H5PATH ------ MATLAB Attribute ``np.str_`` For every object that is stored inside a Group other than the root of the HDF5 file (``'/'``), the path to the object is stored in this Attribute. MATLAB does not seem to require this Attribute to be there, though it does set it in the files it produces. MATLAB_fields ------------- MATLAB Attribute numpy array of vlen numpy arrays of ``'S1'`` .. versionchanged:: 0.1.2 Support for this Attribute added. Was deleted upon writing and ignored when reading before. For MATLAB structures, MATLAB sets this field to all of the field names of the structure. If this Attribute is missing, MATLAB does not seem to care. Can only be set or read properly for h5py version ``2.3`` and newer. Trying to set it to a differently formatted array of strings that older versions of h5py can handle causes an error in MATLAB when the file is imported, so this package does not set this Attribute at all for h5py version before ``2.3``. The Attribute is an array of variable length arrays of single character ASCII numpy strings (vlen of ``'S1'``). 
If there are two fields named ``'a'`` and ``'cd'``, it is created like so:: fields = ['a', 'cd'] dt = h5py.special_dtype(vlen=np.dtype('S1')) fs = np.empty(shape=(len(fields),), dtype=dt) for i, s in enumerate(fields): fs[i] = np.array([c.encode('ascii') for c in s], dtype='S1') Then ``fs`` looks like:: array([array([b'a'], dtype='|S1'), array([b'c', b'd'], dtype='|S1']), dtype=object) Storage of Special Types ======================== int and long ------------ Python 2.x has two integer types: a fixed-width ``int`` corresponding to a C int type, and a variable-width ``long`` for holding arbitrarily large values. An ``int`` is thus 32 or 64 bits depending on whether the python interpreter was is a 32 or 64 bit executable. In Python 3.x, both types are both unified into a single ``int`` type. Both an ``int`` and a ``long`` written in Python 2.x will be read as a ``int`` in Python 3.x. Python 3.x always writes as ``int``. Due to this and the fact that the interpreter in Python 2.x could be using 32-bits ``int``, it is possible that a value could be read that is too large to fit into ``int``. When that happens, it read as a ``long`` instead. .. warning:: Writing Python 2.x ``long`` and Python 3.x ``int`` too big to fit into an ``np.int64`` is not supported. A ``NotImplementedError`` is raised if attempted. Complex Numbers --------------- Complex numbers and ``np.object_`` arrays (and things converted to them) have to be stored in a special fashion. Since HDF5 has no builtin complex type, complex numbers are stored as an HDF5 COMPOUND type with different fieldnames for the real and imaginary partd like many other pieces of software (including MATLAB) do. Unfortunately, there is not a standardized pair of field names. h5py by default uses 'r' and 'i' for the real and imaginary parts. MATLAB uses 'real' and 'imag' instead. The :py:attr:`Options.complex_names` option controls the field names (given as a tuple in real, imaginary order) that are used for complex numbers as they are written. It is set automatically to ``('real', 'imag')`` when ``matlab_compatible == True``. When reading data, this package automatically checks numeric types for many combinations of reasonably expected field names to find complex types. np.object\_ ----------- When storing ``np.object_`` arrays, the individual elements are stored elsewhere and then an array of HDF5 Object References to their storage locations is written as the data object. The elements are all written to the Group path set by :py:attr:`Options.group_for_references` with a randomized name (this package keeps generating randomized names till an available one is found). It must be ``'/#refs#'`` for MATLAB (setting ``matlab_compatible`` sets this automatically). Those elements that can't be written (doing MATLAB compatibility and we are set to discard MATLAB incompatible types :py:attr:`Options.action_for_matlab_incompatible`) will instead end up being a reference to the canonical empty inside the group. The canonical empty has the same format as in MATLAB and is a Dataset named 'a' of ``np.uint32/64([0, 0])`` with the Attribute 'MATLAB_class' set to 'canonical empty' and the Attribute 'MATLAB_empty' set to ``np.uint8(1)``. Structure Like Data ------------------- When storing data that is MATLAB struct like (``dict`` or structured ``np.ndarray`` when :py:attr:`Options.structured_numpy_ndarray_as_struct` is set and none of its fields are of dtype ``'object'``), it is stored as an HDF5 Group with its contents of its fields written inside of the Group. 
For single element data (``dict`` or structured ``np.ndarray`` with only a single element), the fields are written to Datasets inside the Group. For multi-element data, the elements for each field are written in :py:attr:`Options.group_for_references` and an HDF5 Reference array to all of those elements is written as a Dataset under the field name in the Group. Otherwise, it is written as is to a Dataset that is an HDF5 COMPOUND type.

.. warning::

   Field names cannot contain null characters (``'\x00'``) and, when writing as an HDF5 Group, cannot contain forward slashes (``'/'``).

.. warning::

   If it has no elements and :py:attr:`Options.structured_numpy_ndarray_as_struct` is set, it can't be read back from the file accurately. The dtype for all the fields will become 'object' instead of what they originally were.

Optional Data Transformations
=============================

Many different data conversions, beyond turning most non-Numpy types into Numpy types, can be done and are controlled by individual settings in the :py:class:`Options` class. Most are set to fixed values when ``matlab_compatible == True``, which are shown in the table below. The transformations are listed below by their option name, other than `complex_names` and `group_for_references` which were explained in the previous section.

================================== ====================
attribute                          value
================================== ====================
delete_unused_variables            ``True``
structured_numpy_ndarray_as_struct ``True``
make_atleast_2d                    ``True``
convert_numpy_bytes_to_utf16       ``True``
convert_numpy_str_to_utf16         ``True``
convert_bools_to_uint8             ``True``
reverse_dimension_order            ``True``
store_shape_for_empty              ``True``
complex_names                      ``('real', 'imag')``
group_for_references               ``'/#refs#'``
================================== ====================

delete_unused_variables
-----------------------

``bool``

Whether any variable names in something that would be stored as an HDF5 Group (would end up a struct in MATLAB) that currently exist in the file but are not in the object being stored should be deleted from the file or not.

structured_numpy_ndarray_as_struct
----------------------------------

``bool``

Whether ``np.ndarray`` types (or things converted to them) should be written as structures/Groups if their dtype has fields, as long as none of the fields' dtypes are ``'object'`` in which case this option is treated as if it were ``True``. A dtype with fields looks like ``np.dtype([('a', np.uint16), ('b', np.float32)])``. If an array satisfies this criterion and the option is set, rather than writing the data as a single Dataset, it is written as a Group with the contents of the individual fields written as Datasets within it. This option is set to ``True`` implicitly by ``matlab_compatible``.

make_atleast_2d
---------------

``bool``

Whether all Numpy types (or things converted to them) should be made into arrays of 2 dimensions if they have fewer than that or not. This option is set to ``True`` implicitly by ``matlab_compatible``.

convert_numpy_bytes_to_utf16
----------------------------

``bool``

Whether all ``np.bytes_`` strings (or things converted to it) should be converted to UTF-16 and written as an array of ``np.uint16`` or not. This option is set to ``True`` implicitly by ``matlab_compatible``.

.. warning::

   Only ASCII characters are supported in ``np.bytes_`` when this option is set. A ``NotImplementedError`` is raised if any non-ASCII characters are present.
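As a rough illustration of what this conversion involves (a minimal sketch of the idea, not the package's exact internal code), the raw bytes are reinterpreted as ``np.uint16`` code units and checked to be in the ASCII range ::

    import numpy as np

    data = np.bytes_(b'hello')
    # View the raw bytes as uint8 and widen to uint16. The result is a
    # valid UTF-16 representation only if every value is an ASCII code
    # point (less than 128).
    converted = np.uint16(np.atleast_1d(data).view(np.uint8))
    if np.any(converted >= 128):
        raise NotImplementedError('Only ASCII characters are supported.')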
convert_numpy_str_to_utf16 -------------------------- ``bool`` Whether all ``np.str_`` strings (or things converted to it) should be converted to UTF-16 and written as an array of ``np.uint16`` if the strings use no characters outside of the UTF-16 set and the conversion does not result in any UTF-16 doublets or not. This option is set to ``True`` implicitly by ``matlab_compatible``. convert_bools_to_uint8 ---------------------- ``bool`` Whether the ``np.bool_`` type (or things converted to it) should be converted to ``np.uint8`` (``True`` becomes ``1`` and ``False`` becomes ``0``) or not. If not, then the h5py default of an enum type that is not MATLAB compatible is used. This option is set to ``True`` implicitly by ``matlab_compatible``. reverse_dimension_order ----------------------- ``bool`` Whether the dimension order of all arrays should be reversed (essentially a transpose) or not before writing to the file. This option is set to ``True`` implicitly by ``matlab_compatible``. This option needs to be set if one wants an array to end up the same shape when imported into MATLAB. This option is necessary because Numpy and MATLAB use opposite dimension ordering schemes, which are C and Fortan schemes respectively. 2D arrays are stored by row in the C scheme and column in the Fortran scheme. store_shape_for_empty --------------------- ``bool`` Whether, for empty arrays, to store the shape of the array (after transformations) as the Dataset for the object. This option is set to ``True`` implicitly by ``matlab_compatible``. How Data Is Read from MATLAB MAT Files ====================================== This table gives the MATLAB classes that can be read from a MAT file, the first version of this package that can read them, and the Python type they are read as if there is no Python metadata attached to them. =============== ======= ================================= MATLAB Class Version Python Type =============== ======= ================================= logical 0.1 np.bool\_ single 0.1 np.float32 or np.complex64 [14]_ double 0.1 np.float64 or np.complex128 [14]_ uint8 0.1 np.uint8 uint16 0.1 np.uint16 uint32 0.1 np.uint32 uint64 0.1 np.uint64 int8 0.1 np.int8 int16 0.1 np.int16 int32 0.1 np.int32 int64 0.1 np.int64 char 0.1 np.str\_ struct 0.1 structured np.ndarray cell 0.1 np.object\_ canonical empty 0.1 ``np.float64([])`` =============== ======= ================================= .. [14] Depends on whether there is a complex part or not. hdf5storage-0.1.19/ez_setup.py000066400000000000000000000261511436247615200162440ustar00rootroot00000000000000#!/usr/bin/env python """Bootstrap setuptools installation To use setuptools in your package's setup.py, include this file in the same directory and add this to the top of your setup.py:: from ez_setup import use_setuptools use_setuptools() To require a specific version of setuptools, set a download mirror, or use an alternate download directory, simply supply the appropriate options to ``use_setuptools()``. This file can also be run as a script to install or upgrade setuptools. 
""" import os import shutil import sys import tempfile import tarfile import optparse import subprocess import platform import textwrap from distutils import log try: from site import USER_SITE except ImportError: USER_SITE = None DEFAULT_VERSION = "2.1" DEFAULT_URL = "https://pypi.python.org/packages/source/s/setuptools/" def _python_cmd(*args): args = (sys.executable,) + args return subprocess.call(args) == 0 def _install(tarball, install_args=()): # extracting the tarball tmpdir = tempfile.mkdtemp() log.warn('Extracting in %s', tmpdir) old_wd = os.getcwd() try: os.chdir(tmpdir) tar = tarfile.open(tarball) _extractall(tar) tar.close() # going in the directory subdir = os.path.join(tmpdir, os.listdir(tmpdir)[0]) os.chdir(subdir) log.warn('Now working in %s', subdir) # installing log.warn('Installing Setuptools') if not _python_cmd('setup.py', 'install', *install_args): log.warn('Something went wrong during the installation.') log.warn('See the error message above.') # exitcode will be 2 return 2 finally: os.chdir(old_wd) shutil.rmtree(tmpdir) def _build_egg(egg, tarball, to_dir): # extracting the tarball tmpdir = tempfile.mkdtemp() log.warn('Extracting in %s', tmpdir) old_wd = os.getcwd() try: os.chdir(tmpdir) tar = tarfile.open(tarball) _extractall(tar) tar.close() # going in the directory subdir = os.path.join(tmpdir, os.listdir(tmpdir)[0]) os.chdir(subdir) log.warn('Now working in %s', subdir) # building an egg log.warn('Building a Setuptools egg in %s', to_dir) _python_cmd('setup.py', '-q', 'bdist_egg', '--dist-dir', to_dir) finally: os.chdir(old_wd) shutil.rmtree(tmpdir) # returning the result log.warn(egg) if not os.path.exists(egg): raise IOError('Could not build the egg.') def _do_download(version, download_base, to_dir, download_delay): egg = os.path.join(to_dir, 'setuptools-%s-py%d.%d.egg' % (version, sys.version_info[0], sys.version_info[1])) if not os.path.exists(egg): tarball = download_setuptools(version, download_base, to_dir, download_delay) _build_egg(egg, tarball, to_dir) sys.path.insert(0, egg) # Remove previously-imported pkg_resources if present (see # https://bitbucket.org/pypa/setuptools/pull-request/7/ for details). if 'pkg_resources' in sys.modules: del sys.modules['pkg_resources'] import setuptools setuptools.bootstrap_install_from = egg def use_setuptools(version=DEFAULT_VERSION, download_base=DEFAULT_URL, to_dir=os.curdir, download_delay=15): to_dir = os.path.abspath(to_dir) rep_modules = 'pkg_resources', 'setuptools' imported = set(sys.modules).intersection(rep_modules) try: import pkg_resources except ImportError: return _do_download(version, download_base, to_dir, download_delay) try: pkg_resources.require("setuptools>=" + version) return except pkg_resources.DistributionNotFound: return _do_download(version, download_base, to_dir, download_delay) except pkg_resources.VersionConflict as VC_err: if imported: msg = textwrap.dedent(""" The required version of setuptools (>={version}) is not available, and can't be installed while this script is running. Please install a more recent version first, using 'easy_install -U setuptools'. (Currently using {VC_err.args[0]!r}) """).format(VC_err=VC_err, version=version) sys.stderr.write(msg) sys.exit(2) # otherwise, reload ok del pkg_resources, sys.modules['pkg_resources'] return _do_download(version, download_base, to_dir, download_delay) def _clean_check(cmd, target): """ Run the command to download target. If the command fails, clean up before re-raising the error. 
""" try: subprocess.check_call(cmd) except subprocess.CalledProcessError: if os.access(target, os.F_OK): os.unlink(target) raise def download_file_powershell(url, target): """ Download the file at url to target using Powershell (which will validate trust). Raise an exception if the command cannot complete. """ target = os.path.abspath(target) cmd = [ 'powershell', '-Command', "(new-object System.Net.WebClient).DownloadFile(%(url)r, %(target)r)" % vars(), ] _clean_check(cmd, target) def has_powershell(): if platform.system() != 'Windows': return False cmd = ['powershell', '-Command', 'echo test'] devnull = open(os.path.devnull, 'wb') try: try: subprocess.check_call(cmd, stdout=devnull, stderr=devnull) except: return False finally: devnull.close() return True download_file_powershell.viable = has_powershell def download_file_curl(url, target): cmd = ['curl', url, '--silent', '--output', target] _clean_check(cmd, target) def has_curl(): cmd = ['curl', '--version'] devnull = open(os.path.devnull, 'wb') try: try: subprocess.check_call(cmd, stdout=devnull, stderr=devnull) except: return False finally: devnull.close() return True download_file_curl.viable = has_curl def download_file_wget(url, target): cmd = ['wget', url, '--quiet', '--output-document', target] _clean_check(cmd, target) def has_wget(): cmd = ['wget', '--version'] devnull = open(os.path.devnull, 'wb') try: try: subprocess.check_call(cmd, stdout=devnull, stderr=devnull) except: return False finally: devnull.close() return True download_file_wget.viable = has_wget def download_file_insecure(url, target): """ Use Python to download the file, even though it cannot authenticate the connection. """ try: from urllib.request import urlopen except ImportError: from urllib2 import urlopen src = dst = None try: src = urlopen(url) # Read/write all in one block, so we don't create a corrupt file # if the download is interrupted. data = src.read() dst = open(target, "wb") dst.write(data) finally: if src: src.close() if dst: dst.close() download_file_insecure.viable = lambda: True def get_best_downloader(): downloaders = [ download_file_powershell, download_file_curl, download_file_wget, download_file_insecure, ] for dl in downloaders: if dl.viable(): return dl def download_setuptools(version=DEFAULT_VERSION, download_base=DEFAULT_URL, to_dir=os.curdir, delay=15, downloader_factory=get_best_downloader): """Download setuptools from a specified location and return its filename `version` should be a valid setuptools version number that is available as an egg for download under the `download_base` URL (which should end with a '/'). `to_dir` is the directory where the egg will be downloaded. `delay` is the number of seconds to pause before an actual download attempt. ``downloader_factory`` should be a function taking no arguments and returning a function for downloading a URL to a target. """ # making sure we use the absolute path to_dir = os.path.abspath(to_dir) tgz_name = "setuptools-%s.tar.gz" % version url = download_base + tgz_name saveto = os.path.join(to_dir, tgz_name) if not os.path.exists(saveto): # Avoid repeated downloads log.warn("Downloading %s", url) downloader = downloader_factory() downloader(url, saveto) return os.path.realpath(saveto) def _extractall(self, path=".", members=None): """Extract all members from the archive to the current working directory and set owner, modification time and permissions on directories afterwards. `path' specifies a different directory to extract to. 
`members' is optional and must be a subset of the list returned by getmembers(). """ import copy import operator from tarfile import ExtractError directories = [] if members is None: members = self for tarinfo in members: if tarinfo.isdir(): # Extract directories with a safe mode. directories.append(tarinfo) tarinfo = copy.copy(tarinfo) tarinfo.mode = 448 # decimal for oct 0700 self.extract(tarinfo, path) # Reverse sort directories. directories.sort(key=operator.attrgetter('name'), reverse=True) # Set correct owner, mtime and filemode on directories. for tarinfo in directories: dirpath = os.path.join(path, tarinfo.name) try: self.chown(tarinfo, dirpath) self.utime(tarinfo, dirpath) self.chmod(tarinfo, dirpath) except ExtractError as e: if self.errorlevel > 1: raise else: self._dbg(1, "tarfile: %s" % e) def _build_install_args(options): """ Build the arguments to 'python setup.py install' on the setuptools package """ return ['--user'] if options.user_install else [] def _parse_args(): """ Parse the command line for options """ parser = optparse.OptionParser() parser.add_option( '--user', dest='user_install', action='store_true', default=False, help='install in user site package (requires Python 2.6 or later)') parser.add_option( '--download-base', dest='download_base', metavar="URL", default=DEFAULT_URL, help='alternative URL from where to download the setuptools package') parser.add_option( '--insecure', dest='downloader_factory', action='store_const', const=lambda: download_file_insecure, default=get_best_downloader, help='Use internal, non-validating downloader' ) options, args = parser.parse_args() # positional arguments are ignored return options def main(version=DEFAULT_VERSION): """Install or upgrade setuptools and EasyInstall""" options = _parse_args() tarball = download_setuptools(download_base=options.download_base, downloader_factory=options.downloader_factory) return _install(tarball, _build_install_args(options)) if __name__ == '__main__': sys.exit(main()) hdf5storage-0.1.19/hdf5storage/000077500000000000000000000000001436247615200162425ustar00rootroot00000000000000hdf5storage-0.1.19/hdf5storage/Marshallers.py000066400000000000000000002323661436247615200211050ustar00rootroot00000000000000# Copyright (c) 2013-2023, Freja Nordsiek # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. """ Module for the classes to marshall Python types to/from file. """ import sys import posixpath import collections try: from pkg_resources import parse_version except: from distutils.version import StrictVersion as parse_version import numpy as np import h5py from hdf5storage.utilities import * from hdf5storage import lowlevel from hdf5storage.lowlevel import write_data, read_data # Ubuntu 12.04's h5py doesn't have __version__ set so we need to try to # grab the version and if it isn't available, just assume it is 2.0. try: _H5PY_VERSION = h5py.__version__ except: _H5PY_VERSION = '2.0' def write_object_array(f, data, options): """ Writes an array of objects recursively. Writes the elements of the given object array recursively in the HDF5 Group ``options.group_for_references`` and returns an ``h5py.Reference`` array to all the elements. Parameters ---------- f : h5py.File The HDF5 file handle that is open. data : numpy.ndarray of objects Numpy object array to write the elements of. options : hdf5storage.core.Options hdf5storage options object. Returns ------- numpy.ndarray of h5py.Reference A reference array pointing to all the elements written to the HDF5 file. For those that couldn't be written, the respective element points to the canonical empty. Raises ------ TypeNotMatlabCompatibleError If writing a type not compatible with MATLAB and `options.action_for_matlab_incompatible` is set to ``'error'``. See Also -------- read_object_array hdf5storage.Options.group_for_references h5py.Reference """ # We need to grab the special reference dtype and make an empty # array to store all the references in. ref_dtype = h5py.special_dtype(ref=h5py.Reference) data_refs = np.zeros(shape=data.shape, dtype='object') # We need to make sure that the group to hold references is present, # and create it if it isn't. if options.group_for_references not in f: f.create_group(options.group_for_references) grp2 = f[options.group_for_references] if not isinstance(grp2, h5py.Group): del f[options.group_for_references] f.create_group(options.group_for_references) grp2 = f[options.group_for_references] # The Dataset 'a' needs to be present as the canonical empty. It is # just and np.uint32/64([0, 0]) with its a MATLAB_class of # 'canonical empty' and the 'MATLAB_empty' attribute set. If it # isn't present or is incorrectly formatted, it is created # truncating anything previously there. if 'a' not in grp2 or grp2['a'].shape != (2,) \ or not grp2['a'].dtype.name.startswith('uint') \ or np.any(grp2['a'][...] != np.uint64([0, 0])) \ or get_attribute_string(grp2['a'], 'MATLAB_class') != \ 'canonical empty' \ or get_attribute(grp2['a'], 'MATLAB_empty') != 1: if 'a' in grp2: del grp2['a'] grp2.create_dataset('a', data=np.uint64([0, 0])) set_attribute_string(grp2['a'], 'MATLAB_class', 'canonical empty') set_attribute(grp2['a'], 'MATLAB_empty', np.uint8(1)) # Go through all the elements of data and write them, gabbing their # references and putting them in data_refs. 
They will be put in # group_for_references, which is also what the H5PATH needs to be # set to if we are doing MATLAB compatibility (otherwise, the # attribute needs to be deleted). If an element can't be written # (doing matlab compatibility, but it isn't compatible with matlab # and action_for_matlab_incompatible option is True), the reference # to the canonical empty will be used for the reference array to # point to. for index, x in np.ndenumerate(data): data_refs[index] = None name_for_ref = next_unused_name_in_group(grp2, 16) write_data(f, grp2, name_for_ref, x, None, options) if name_for_ref in grp2: data_refs[index] = grp2[name_for_ref].ref if options.matlab_compatible: set_attribute_string(grp2[name_for_ref], 'H5PATH', grp2.name) else: del_attribute(grp2[name_for_ref], 'H5PATH') else: data_refs[index] = grp2['a'].ref # Now, the dtype needs to be changed to the reference type and the # whole thing copied over to data_to_store. return data_refs.astype(ref_dtype).copy() def read_object_array(f, data, options): """ Reads an array of objects recursively. Read the elements of the given HDF5 Reference array recursively in the and constructs a ``numpy.object_`` array from its elements, which is returned. Parameters ---------- f : h5py.File The HDF5 file handle that is open. data : numpy.ndarray of h5py.Reference The array of HDF5 References to read and make an object array from. options : hdf5storage.core.Options hdf5storage options object. Raises ------ NotImplementedError If reading the object from file is currently not supported. Returns ------- numpy.ndarray of numpy.object\\_ The Python object array containing the items pointed to by `data`. See Also -------- write_object_array hdf5storage.Options.group_for_references h5py.Reference """ # Go through all the elements of data and read them using their # references, and the putting the output in new object array. data_derefed = np.zeros(shape=data.shape, dtype='object') for index, x in np.ndenumerate(data): try: data_derefed[index] = read_data(f, f[x].parent, \ posixpath.basename(f[x].name), options) except: raise return data_derefed class TypeMarshaller(object): """ Base class for marshallers of Python types. Base class providing the class interface for marshallers of Python types to/from disk. All marshallers should inherit from this class or at least replicate its functionality. This includes several attributes that are needed in order for reading/writing methods to know if it is the appropriate marshaller to use and methods to actually do the reading and writing. Subclasses should run this class's ``__init__()`` first thing. Inheritance information is in the **Notes** section of each method. Generally, ``read``, ``write``, and ``write_metadata`` need to be overridden and the different attributes set to the proper values. For marshalling types that are containers of other data, one will need to appropriate read/write them with the lowlevel functions ``lowlevel.read_data`` and ``lowlevel.write_data``. Attributes ---------- python_attributes : set of str Attributes used to store type information. matlab_attributes : set of str Attributes used for MATLAB compatibility. types : list of types Types the marshaller can work on. python_type_strings : list of str Type strings of readable types. matlab_classes : list of str Readable MATLAB classes. 
See Also -------- hdf5storage.core.Options h5py.Dataset h5py.Group h5py.AttributeManager hdf5storage.lowlevel.read_data hdf5storage.lowlevel.write_data """ def __init__(self): #: Attributes used to store type information. #: #: set of str #: #: ``set`` of attribute names the marshaller uses when #: an ``Option.store_python_metadata`` is ``True``. self.python_attributes = set(['Python.Type']) #: Attributes used for MATLAB compatibility. #: #: ``set`` of ``str`` #: #: ``set`` of attribute names the marshaller uses when maintaing #: Matlab HDF5 based mat file compatibility #: (``Option.matlab_compatible`` is ``True``). self.matlab_attributes = set(['H5PATH']) #: List of Python types that can be marshalled. #: #: list of types #: #: ``list`` of the types (gotten by doing ``type(data)``) that the #: marshaller can marshall. Default value is ``[]``. self.types = [] #: Type strings of readable types. #: #: list of str #: #: ``list`` of the ``str`` that the marshaller would put in the #: HDF5 attribute 'Python.Type' to identify the Python type to be #: able to read it back correctly. Default value is ``[]``. self.python_type_strings = [] #: MATLAB class strings of readable types. #: #: list of str #: #: ``list`` of the MATLAB class ``str`` that the marshaller can #: read into Python objects. Default value is ``[]``. self.matlab_classes = [] def get_type_string(self, data, type_string): """ Gets type string. Finds the type string for 'data' contained in ``python_type_strings`` using its ``type``. Non-``None`` 'type_string` overrides whatever type string is looked up. The override makes it easier for subclasses to convert something that the parent marshaller can write to disk but still put the right type string in place). Parameters ---------- data : type to be marshalled The Python object that is being written to disk. type_string : str or None If it is a ``str``, it overrides any looked up type string. ``None`` means don't override. Returns ------- str The type string associated with 'data'. Will be 'type_string' if it is not ``None``. Notes ----- Subclasses probably do not need to override this method. """ if type_string is not None: return type_string else: i = self.types.index(type(data)) return self.python_type_strings[i] def write(self, f, grp, name, data, type_string, options): """ Writes an object's metadata to file. Writes the Python object 'data' to 'name' in h5py.Group 'grp'. Parameters ---------- f : h5py.File The HDF5 file handle that is open. grp : h5py.Group or h5py.File The parent HDF5 Group (or File if at '/') that contains the object with the specified name. name : str Name of the object. data The object to write to file. type_string : str or None The type string for `data`. If it is ``None``, one will have to be gotten by ``get_type_string``. options : hdf5storage.core.Options hdf5storage options object. Raises ------ NotImplementedError If writing 'data' to file is currently not supported. TypeNotMatlabCompatibleError If writing a type not compatible with MATLAB and `options.action_for_matlab_incompatible` is set to ``'error'``. Notes ----- Must be overridden in a subclass because a ``NotImplementedError`` is thrown immediately. See Also -------- hdf5storage.lowlevel.write_data """ raise NotImplementedError('Can''t write data type: ' + str(type(data))) def write_metadata(self, f, grp, name, data, type_string, options): """ Writes an object to file. Writes the metadata for a Python object `data` to file at `name` in h5py.Group `grp`. Metadata is written to HDF5 Attributes. 
Existing Attributes that are not being used are deleted. Parameters ---------- f : h5py.File The HDF5 file handle that is open. grp : h5py.Group or h5py.File The parent HDF5 Group (or File if at '/') that contains the object with the specified name. name : str Name of the object. data The object to write to file. type_string : str or None The type string for `data`. If it is ``None``, one will have to be gotten by ``get_type_string``. options : hdf5storage.core.Options hdf5storage options object. Notes ----- The attribute 'Python.Type' is set to the type string. All H5PY Attributes not in ``python_attributes`` and/or ``matlab_attributes`` (depending on the attributes of 'options') are deleted. These are needed functions for writting essentially any Python object, so subclasses should probably call the baseclass's version of this function if they override it and just provide the additional functionality needed. This requires that the names of any additional HDF5 Attributes are put in the appropriate set. """ # Make sure we have a complete type_string. type_string = self.get_type_string(data, type_string) # The metadata that is written depends on the format. dsetgrp = grp[name] if options.store_python_metadata: set_attribute_string(dsetgrp, 'Python.Type', type_string) # If we are not storing python information or doing MATLAB # compatibility, then attributes not in the python and/or # MATLAB lists need to be removed. attributes_used = set() if options.store_python_metadata: attributes_used |= self.python_attributes if options.matlab_compatible: attributes_used |= self.matlab_attributes for attribute in (set(dsetgrp.attrs.keys()) - attributes_used): del_attribute(dsetgrp, attribute) def read(self, f, grp, name, options): """ Read a Python object from file. Reads the Python object 'name' from the HDF5 Group 'grp', if possible, and returns it. Parameters ---------- f : h5py.File The HDF5 file handle that is open. grp : h5py.Group or h5py.File The parent HDF5 Group (or File if at '/') that contains the object with the specified name. name : str Name of the object. options : hdf5storage.core.Options hdf5storage options object. Raises ------ NotImplementedError If reading the object from file is currently not supported. Returns ------- data The Python object 'name' in the HDF5 Group 'grp'. Notes ----- Must be overridden in a subclass because a ``NotImplementedError`` is thrown immediately. See Also -------- hdf5storage.lowlevel.read_data """ raise NotImplementedError('Can''t read data: ' + name) class NumpyScalarArrayMarshaller(TypeMarshaller): def __init__(self): TypeMarshaller.__init__(self) self.python_attributes |= set(['Python.Shape', 'Python.Empty', 'Python.numpy.UnderlyingType', 'Python.numpy.Container', 'Python.Fields']) self.matlab_attributes |= set(['MATLAB_class', 'MATLAB_empty', 'MATLAB_int_decode', 'MATLAB_fields']) # As np.str_ is the unicode type string in Python 3 and the bare # bytes string in Python 2, we have to use np.unicode_ which is # or points to the unicode one in both versions. self.types = [np.ndarray, np.matrix, np.chararray, np.core.records.recarray, np.bool_, np.void, np.uint8, np.uint16, np.uint32, np.uint64, np.int8, np.int16, np.int32, np.int64, np.float32, np.float64, np.complex64, np.complex128, np.bytes_, np.unicode_, np.object_] self._numpy_types = list(self.types) # Using Python 3 type strings. 
self.python_type_strings = ['numpy.ndarray', 'numpy.matrix', 'numpy.chararray', 'numpy.recarray', 'numpy.bool_', 'numpy.void', 'numpy.uint8', 'numpy.uint16', 'numpy.uint32', 'numpy.uint64', 'numpy.int8', 'numpy.int16', 'numpy.int32', 'numpy.int64', 'numpy.float32', 'numpy.float64', 'numpy.complex64', 'numpy.complex128', 'numpy.bytes_', 'numpy.str_', 'numpy.object_'] # If we are storing in MATLAB format, we will need to be able to # set the MATLAB_class attribute. The different numpy types just # need to be properly mapped to the right strings. Some types do # not have a string since MATLAB does not support them. self.__MATLAB_classes = {np.bool_: 'logical', np.uint8: 'uint8', np.uint16: 'uint16', np.uint32: 'uint32', np.uint64: 'uint64', np.int8: 'int8', np.int16: 'int16', np.int32: 'int32', np.int64: 'int64', np.float32: 'single', np.float64: 'double', np.complex64: 'single', np.complex128: 'double', np.bytes_: 'char', np.unicode_: 'char', np.object_: 'cell'} # Make a dict to look up the opposite direction (given a matlab # class, what numpy type to use. self.__MATLAB_classes_reverse = {'logical': np.bool_, 'uint8': np.uint8, 'uint16': np.uint16, 'uint32': np.uint32, 'uint64': np.uint64, 'int8': np.int8, 'int16': np.int16, 'int32': np.int32, 'int64': np.int64, 'single': np.float32, 'double': np.float64, 'char': np.unicode_, 'cell': np.object_, 'canonical empty': np.float64, 'struct': np.object_} # Set matlab_classes to the supported classes (the values). self.matlab_classes = list(self.__MATLAB_classes.values()) # For h5py >= 2.2, half precisions (np.float16) are supported. if parse_version(_H5PY_VERSION) \ >= parse_version('2.2'): self.types.append(np.float16) self.python_type_strings.append('numpy.float16') def write(self, f, grp, name, data, type_string, options): # If we are doing matlab compatibility and the data type is not # one of those that is supported for matlab, skip writing the # data or throw an error if appropriate. structured ndarrays and # recarrays are compatible if the # structured_numpy_ndarray_as_struct option is set. if options.matlab_compatible \ and not (data.dtype.type in self.__MATLAB_classes \ or (data.dtype.fields is not None \ and options.structured_numpy_ndarray_as_struct)): if options.action_for_matlab_incompatible == 'error': raise lowlevel.TypeNotMatlabCompatibleError( \ 'Data type ' + data.dtype.name + ' not supported by MATLAB.') elif options.action_for_matlab_incompatible == 'discard': return # Need to make a set of data that will be stored. It will start # out as a copy of data and then be steadily manipulated. data_to_store = data.copy() # recarrays must be converted to structured ndarrays in order # for h5py to be able to write them. if isinstance(data_to_store, np.core.records.recarray): data_to_store = data_to_store.view(np.ndarray) # Optionally convert bytes_ strings to UTF-16, if possible (all # are in the ASCII character set). This is done by simply # converting to uint16's and checking that each one's value is # less than 128 (in the ASCII character set). This will require # making them at least 1 dimensional. If it fails, throw an # exception. 
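        # As a hedged illustration (example values, not from the
        # original source): np.bytes_(b'abc') viewed as uint8 gives
        # [97, 98, 99], which np.uint16 widens to [97, 98, 99]; every
        # value is below 128, so the string can be stored as UTF-16.
        # A byte such as 0xff (255) would fail the check below and
        # raise NotImplementedError.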
if data.dtype.type == np.bytes_ \ and options.convert_numpy_bytes_to_utf16: if data_to_store.nbytes == 0: data_to_store = np.uint16([]) else: data_to_store = np.uint16(np.atleast_1d( \ data_to_store).view(np.ndarray).view(np.uint8)) if np.any(data_to_store >= 128): raise NotImplementedError( \ 'Can''t write non-ASCII numpy.bytes_.') # As of 2013-12-13, h5py cannot write numpy.str_ (UTF-32 # encoding) types (its numpy.unicode_ in Python 2, which is an # alias for it in Python 3). If the option is set to try to # convert them to UTF-16, then an attempt at the conversion is # made. If no conversion is to be done, the conversion throws an # exception (a UTF-32 character had no UTF-16 equivalent), or a # UTF-32 character gets turned into a UTF-16 doublet (the # increase in the number of columns will be by a factor more # than the length of the strings); then it will be simply # converted to uint32's byte for byte instead. if data.dtype.type == np.unicode_: new_data = None if options.convert_numpy_str_to_utf16: try: new_data = convert_numpy_str_to_uint16( \ data_to_store) except: pass if new_data is None or (type(data_to_store) == np.unicode_ \ and len(data_to_store) != len(new_data)) \ or (isinstance(data_to_store, np.ndarray) \ and new_data.shape[-1] != data_to_store.shape[-1] \ * (data_to_store.dtype.itemsize//4)): data_to_store = convert_numpy_str_to_uint32( \ data_to_store) else: data_to_store = new_data # Convert scalars to arrays if that option is set. For 1d # arrays, an option determines whether they become row or column # vectors. if options.make_atleast_2d: new_data = np.atleast_2d(data_to_store) if len(data_to_store.shape) == 1 \ and options.oned_as == 'column': new_data = new_data.T data_to_store = new_data # Reverse the dimension order if that option is set. if options.reverse_dimension_order: data_to_store = data_to_store.T # Bools need to be converted to uint8 if the option is given. if data_to_store.dtype.name == 'bool' \ and options.convert_bools_to_uint8: data_to_store = np.uint8(data_to_store) # If data is empty, we instead need to store the shape of the # array if the appropriate option is set. if options.store_shape_for_empty and (data.size == 0 \ or ((data.dtype.type == np.bytes_ \ or data.dtype.type == np.str_) \ and data.nbytes == 0)): data_to_store = np.uint64(data_to_store.shape) # If it is a complex type, then it needs to be encoded to have # the proper complex field names. if np.iscomplexobj(data_to_store): data_to_store = encode_complex(data_to_store, options.complex_names) # If we are storing an object type and it isn't empty # (data_to_store is still an object), then we must recursively # write what each element points to and make an array of the # references to them. if data_to_store.dtype.name == 'object': data_to_store = write_object_array(f, data_to_store, options) # If it an ndarray with fields and we are writing such things as # a Group/struct or if its shape is zero (h5py can't write it # Dataset then), that needs to be handled. Otherwise, it is # simply written as is to a Dataset. As HDF5 Reference types do # look like a structured object array, those have to be excluded # explicitly. Complex types may have been converted so that they # can have different field names as an HDF5 COMPOUND type, so # those have to be excluded too. 
Also, if any of its fields are # an object time (no matter how nested), then rather than # converting that field to a HDF5 Reference types, it will just # be written as a Group instead (just have to see if ", 'O'" is # in str(data_to_store.dtype). # # A flag, wrote_as_struct, is set depending on which path is # taken, which is then passed onto write_metadata. if data_to_store.dtype.fields is not None \ and h5py.check_dtype(ref=data_to_store.dtype) \ is not h5py.Reference \ and not np.iscomplexobj(data) \ and (options.structured_numpy_ndarray_as_struct \ or (data_to_store.dtype.hasobject \ or '\\x00' in str(data_to_store.dtype)) \ or does_dtype_have_a_zero_shape(data_to_store.dtype)): wrote_as_struct = True # Grab the list of fields that don't have a null character # or a / in them since those can't be written. field_names = [n for n in data_to_store.dtype.names if '/' not in n and '\x00' not in n] # Throw and exception if we had to exclude any field names. if len(field_names) != len(data_to_store.dtype.names): raise NotImplementedError("Null characters ('\x00') " \ + "and '/' in the field names of this type of " \ + 'numpy.ndarray are not supported.') # If the group doesn't exist, it needs to be created. If it # already exists but is not a group, it needs to be deleted # before being created. if name not in grp: grp.create_group(name) elif not isinstance(grp[name], h5py.Group): del grp[name] grp.create_group(name) grp2 = grp[name] # Write the metadata, and set the MATLAB_class to 'struct' # explicitly. if options.matlab_compatible: set_attribute_string(grp2, 'MATLAB_class', 'struct') # Delete any Datasets/Groups not corresponding to a field # name in data if that option is set. if options.delete_unused_variables: for field in set([i for i in grp2]).difference( \ set(field_names)): del grp2[field] # Go field by field making an object array (make an empty # object array and assign element wise) and write it inside # the Group. If it only has a single element, write that # single element extracted from it (will be a standard # Dataset as opposed to a HDF5 Reference array). The H5PATH # attribute needs to be set appropriately, while all other # attributes need to be deleted. grp2_name = grp2.name for field in field_names: new_data = np.zeros(shape=data_to_store.shape, dtype='object') for index, x in np.ndenumerate(data_to_store): new_data[index] = x[field] # If we are supposed to reverse dimension order, it has # already been done, but write_data expects that it # hasn't, so it needs to be reversed again before # passing it on. if options.reverse_dimension_order: new_data = new_data.T # If there is only a single element, write it extracted # (don't need to use a Reference array in this # case). Otherwise, write the whole thing. if np.prod(new_data.shape) == 1: write_data(f, grp2, field, new_data.flatten()[0], None, options) else: write_data(f, grp2, field, new_data, None, options) if field in grp2: grp2_field = grp2[field] if options.matlab_compatible: set_attribute_string(grp2_field, 'H5PATH', grp2_name) else: del_attribute(grp2_field, 'H5PATH') # In the case that we wrote a Reference array (not a # single element), then all other attributes need to # be removed. if np.prod(new_data.shape) != 1: for attribute in (set( \ grp2_field.attrs.keys()) \ - set(['H5PATH'])): del_attribute(grp2_field, attribute) else: wrote_as_struct = False # If it has fields and it isn't a Reference type, none of # them can contain a / character. 
if data_to_store.dtype.fields is not None \ and h5py.check_dtype(ref=data_to_store.dtype) \ is not h5py.Reference: for n in data_to_store.dtype.fields: if '\x00' in n: raise NotImplementedError( \ "Null characters ('\x00') " \ + 'in the field names of this type of ' \ + 'numpy.ndarray are not supported.') # Set the storage options such as compression, chunking, # filters, etc. If the data is being compressed (compression # is enabled and the data is bigger than the threshold), # turn on compression, set the algorithm, set the # compression level, and enable the shuffle and fletcher32 # filters appropriately. If the data is not being # compressed, turn on the fletcher32 filter if # indicated. Compression should not be done for scalars. filters = dict() is_scalar = (data_to_store.shape != tuple()) if is_scalar and options.compress \ and data_to_store.nbytes \ >= options.compress_size_threshold: filters['compression'] = \ options.compression_algorithm if filters['compression'] == 'gzip': filters['compression_opts'] = \ options.gzip_compression_level filters['shuffle'] = options.shuffle_filter filters['fletcher32'] = \ options.compressed_fletcher32_filter else: filters['compression'] = None filters['shuffle'] = False filters['compression_opts'] = None if is_scalar: filters['fletcher32'] = \ options.uncompressed_fletcher32_filter else: filters['fletcher32'] = False # Set the chunking to auto if it is being chuncked # (compressed or using the fletcher32 filter). if filters['compression'] is not None \ or filters['fletcher32']: filters['chunks'] = True else: filters['chunks'] = None # The data must first be written. If name is not present # yet, then it must be created. If it is present, but not a # Dataset, has the wrong dtype, is the wrong shape, doesn't # use the same compression, or doesn't use the same filters; # then it must be deleted and then written. Otherwise, it is # just overwritten in place. if name not in grp: grp.create_dataset(name, data=data_to_store, **filters) else: # avoid multiple calls to __getitem__ by storing the # reference in a local variable dset = grp[name] if not isinstance(dset, h5py.Dataset) \ or dset.dtype != data_to_store.dtype \ or dset.shape != data_to_store.shape \ or dset.compression != filters['compression'] \ or dset.shuffle != filters['shuffle'] \ or dset.fletcher32 != filters['fletcher32'] \ or dset.compression_opts != \ filters['compression_opts']: del grp[name] grp.create_dataset(name, data=data_to_store, **filters) else: dset[...] = data_to_store # Write the metadata using the inherited function (good enough). self.write_metadata(f, grp, name, data, type_string, options, wrote_as_struct=wrote_as_struct) def write_metadata(self, f, grp, name, data, type_string, options, wrote_as_struct=False): # wote_as_struct is used to pass whether data was written like a # matlab struct or not. If yes, then the field names must be put # in the metadata. # First, call the inherited version to do most of the work. TypeMarshaller.write_metadata(self, f, grp, name, data, type_string, options) # Write the underlying numpy type if we are storing python # information. # If we are storing python information; the shape, underlying # numpy type, and its type of container ('scalar', 'ndarray', # 'matrix', or 'chararray') need to be stored. 
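        # As a hedged illustration (example values assumed): a 2x3
        # numpy.float32 ndarray would get Python.Shape = [2 3],
        # Python.numpy.UnderlyingType = 'float32', and
        # Python.numpy.Container = 'ndarray' written below.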
# avoid multiple calls to __getitem__ by storing the # reference in a local variable dsetgrp = grp[name] if options.store_python_metadata: set_attribute(dsetgrp, 'Python.Shape', np.uint64(data.shape)) # Now, in Python 3, the dtype names for bare bytes and # unicode strings start with 'bytes' and 'str' respectively, # but in Python 2, they start with 'string' and 'unicode' # respectively. The Python 2 ones must be converted to the # Python 3 ones for writing. set_attribute_string(dsetgrp, \ 'Python.numpy.UnderlyingType', \ data.dtype.name.replace('string', 'bytes').replace( \ 'unicode', 'str')) if isinstance(data, np.matrix): container = 'matrix' elif isinstance(data, np.chararray): container = 'chararray' elif isinstance(data, np.core.records.recarray): container = 'recarray' elif isinstance(data, np.ndarray): container = 'ndarray' else: container = 'scalar' set_attribute_string(dsetgrp, 'Python.numpy.Container', container) # If it was written like a matlab struct, then we set the # 'Python.Fields' and 'MATLAB_fields' Attributes to the field # names if we are storing python metadata or doing matlab # compatibility and we are storing a structured ndarray as a # structure. if wrote_as_struct: # Grab the list of fields. They need to be converted to # unicode in Python 2.x. if sys.hexversion >= 0x03000000: field_names = list(data.dtype.names) else: field_names = [c.decode('UTF-8') for c in list(data.dtype.names)] # Write or delete 'Python.Fields' as appropriate. if options.store_python_metadata: set_attribute_string_array(dsetgrp, 'Python.Fields', field_names) else: del_attribute(dsetgrp, 'Python.Fields') # If we are making it MATLAB compatible and have h5py # version >= 2.3, then we can set the MATLAB_fields # Attribute as long as all keys are mappable to # ASCII. Otherwise, the attribute should be deleted. It is # written as a vlen='S1' array of bytes_ arrays of the # individual characters. if options.matlab_compatible \ and parse_version( \ _H5PY_VERSION) \ >= parse_version('2.3'): try: dt = h5py.special_dtype(vlen=np.dtype('S1')) fs = np.empty(shape=(len(field_names),), dtype=dt) for i, s in enumerate(field_names): fs[i] = np.array([c.encode('ascii') for c in s], dtype='S1') except UnicodeEncodeError: del_attribute(dsetgrp, 'MATLAB_fields') else: set_attribute(dsetgrp, 'MATLAB_fields', fs) else: del_attribute(dsetgrp, 'MATLAB_fields') else: del_attribute(dsetgrp, 'Python.Fields') del_attribute(dsetgrp, 'MATLAB_fields') # If data is empty, we need to set the Python.Empty and # MATLAB_empty attributes to 1 if we are storing type info or # making it MATLAB compatible. Otherwise, no empty attribute is # set and existing ones must be deleted. if data.size == 0 or ((data.dtype.type == np.bytes_ \ or data.dtype.type == np.str_) and data.nbytes == 0): if options.store_python_metadata: set_attribute(dsetgrp, 'Python.Empty', np.uint8(1)) else: del_attribute(dsetgrp, 'Python.Empty') if options.matlab_compatible: set_attribute(dsetgrp, 'MATLAB_empty', np.uint8(1)) else: del_attribute(dsetgrp, 'MATLAB_empty') else: del_attribute(dsetgrp, 'Python.Empty') del_attribute(dsetgrp, 'MATLAB_empty') # If we are making it MATLAB compatible, the MATLAB_class # attribute needs to be set looking up the data type (gotten # using np.dtype.type). If it is a string or bool type, then # the MATLAB_int_decode attribute must be set to the number of # bytes each element takes up (dtype.itemsize). If the dtype has # fields and we are writing it as a structure, the class needs # to be overriddent to 'struct'. 
Otherwise, the attributes must # be deleted. tp = data.dtype.type if options.matlab_compatible: if data.dtype.fields is not None \ and options.structured_numpy_ndarray_as_struct: set_attribute_string(dsetgrp, 'MATLAB_class', 'struct') elif tp in self.__MATLAB_classes: set_attribute_string(dsetgrp, 'MATLAB_class', self.__MATLAB_classes[tp]) if tp in (np.bytes_, np.str_, np.bool_): set_attribute(dsetgrp, 'MATLAB_int_decode', np.int64(grp[name].dtype.itemsize)) else: del_attribute(dsetgrp, 'MATLAB_int_decode') else: del_attribute(dsetgrp, 'MATLAB_class') del_attribute(dsetgrp, 'MATLAB_empty') del_attribute(dsetgrp, 'MATLAB_int_decode') else: del_attribute(dsetgrp, 'MATLAB_class') del_attribute(dsetgrp, 'MATLAB_empty') del_attribute(dsetgrp, 'MATLAB_int_decode') def read(self, f, grp, name, options): # If name is not present, then we can't read it and have to # throw an error. if name not in grp: raise NotImplementedError(name + ' is not present.') # Get the object. dsetgrp = grp[name] # Get the different attributes this marshaller uses. if sys.hexversion >= 0x03000000: defaultfactory = type(None) else: defaultfactory = lambda : None attributes = collections.defaultdict(defaultfactory) read_all_attributes_into(dsetgrp.attrs, attributes) str_attrs = dict() for attr_name in ('Python.Type', 'Python.numpy.UnderlyingType', 'Python.numpy.Container', 'MATLAB_class'): value = attributes[attr_name] if value is None: str_attrs[attr_name] = value elif (sys.hexversion >= 0x03000000 \ and isinstance(value, str)) \ or (sys.hexversion < 0x03000000 \ and isinstance(value, unicode)): str_attrs[attr_name] = value elif isinstance(value, bytes): str_attrs[attr_name] = value.decode() elif isinstance(value, np.unicode_): str_attrs[attr_name] = str(value) elif isinstance(value, np.bytes_): str_attrs[attr_name] = value.decode() else: str_attrs[attr_name] = None type_string = str_attrs['Python.Type'] underlying_type = str_attrs['Python.numpy.UnderlyingType'] container = str_attrs['Python.numpy.Container'] matlab_class = str_attrs['MATLAB_class'] shape = attributes['Python.Shape'] python_empty = attributes['Python.Empty'] matlab_empty = attributes['MATLAB_empty'] python_fields = attributes['Python.Fields'] if python_fields is not None: python_fields = [convert_to_str(x) for x in python_fields] # Read the MATLAB_fields Attribute if it was present. matlab_fields = attributes['MATLAB_fields'] # If it is a Dataset, it can simply be read and then acted upon # (if it is an HDF5 Reference array, it will need to be read # recursively). If it is a Group, then it is a structured # ndarray like object that needs to be read field wise and # constructed. if isinstance(dsetgrp, h5py.Dataset): # Read the data. data = dsetgrp[...] # If it is a reference type, then we need to make an object # array that is its replicate, but with the objects they are # pointing to in their elements instead of just the # references. if h5py.check_dtype(ref=dsetgrp.dtype) is not None: data = read_object_array(f, data, options) else: # Starting with an empty dict, all that has to be done is # iterate through all the Datasets and Groups in dsetgrp # and add them to a dict with their name as the key. Since # we don't want an exception thrown by reading an element to # stop the whole reading process, the reading is wrapped in # a try block that just catches exceptions and then does # nothing about them (nothing needs to be done). 
We also # need to keep track of whether any of the fields are # Groups, aren't Reference arrays, or have attributes other # than H5PATH since that means that the fields are the # values (single element structured ndarray), as opposed to # Reference arrays to all the values (multi-element structed # ndarray). In Python 2, the field names need to be # converted to str from unicode when storing the fields in # struct_data. struct_data = dict() is_multi_element = True for k, fld in dsetgrp.items(): # We must exclude group_for_references if fld.name == options.group_for_references: continue if isinstance(fld, h5py.Group) \ or h5py.check_dtype(ref=fld.dtype) is None \ or len(set(fld.attrs.keys()) \ & ((set(self.python_attributes) \ | set(self.matlab_attributes)) - set(['H5PATH', 'MATLAB_empty', 'Python.Empty']))) != 0: is_multi_element = False try: struct_data[k] = read_data(f, dsetgrp, k, options) except: pass # If it isn't multi element, we need to pack all the values # in struct_array inside of numpy.object_'s so that the code # after this that depends on this will work. if not is_multi_element: for k, v in struct_data.items(): obj = np.zeros((1,), dtype='object') obj[0] = v struct_data[k] = obj # The dtype for the structured ndarray needs to be # composed. This is done by going through each field (in the # proper order, if the fields were given, or any order if # not) and determine the dtype and shape of that field to # put in the list. if python_fields is not None or matlab_fields is not None: if python_fields is not None: fields = python_fields else: fields = [numpy_to_bytes(k).decode() for k in matlab_fields] # Now, there may be fields available that were not # given, but still should be read. Keys that are not in # python_fields need to be added to the list. extra_fields = list(set(struct_data.keys()) - set(fields)) fields.extend(sorted(extra_fields)) else: fields = sorted(list(struct_data.keys())) dt_whole = [] for k in fields: # In Python 2, the field names for a structured ndarray # must be str as opposed to unicode, so k needs to be # converted in the Python 2 case. if sys.hexversion >= 0x03000000: k_name = k else: k_name = k.encode('UTF-8') # Read the value. v = struct_data[k] # If any of the elements are not Numpy types or if they # don't all have the exact same dtype and shape, then # this field will just be an object field. if v.size == 0 or type(v.flat[0]) \ not in self._numpy_types: dt_whole.append((k_name, 'object')) continue first = v.flatten()[0] dt = first.dtype sp = first.shape all_same = True for index, x in np.ndenumerate(v): if not isinstance(x, tuple(self.types)) \ or dt != x.dtype or sp != x.shape: all_same = False break # If they are all the same, then dt and shape should be # used. Otherwise, it has to be object. if all_same: dt_whole.append((k_name, dt, sp)) else: dt_whole.append((k_name, 'object')) # Make the structured ndarray with the constructed # dtype. The shape is simply the shape of the object arrays # of its fields, so we might as well use the shape of # v. Then, all the elements of every field need to be # assigned. Now, if dtype's itemsize is 0, a TypeError will # be thrown by numpy due to a bug in numpy. np.zeros (as # well as ones and empty) does not like to make arrays with # no bytes. A workaround is to make an empty array of some # other type and convert its dtype. The smallest one we can # make is an np.int8([]). Yes, one byte will be wasted, but # at least no errors will happen. 
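            # Hedged sketch of the workaround (it mirrors the code just
            # below): rather than np.zeros(shape=v.shape, dtype=dtwhole)
            # when dtwhole.itemsize is 0, an int8 array of the same
            # shape is made first and then converted with
            # .astype(dtwhole).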
dtwhole = np.dtype(dt_whole) if dtwhole.itemsize == 0: data = np.zeros(shape=v.shape, dtype='int8').astype(dtwhole) else: data = np.zeros(shape=v.shape, dtype=dtwhole) for k, v in struct_data.items(): # There is no sense iterating through the elements if # the shape is an empty shape. if all(data.shape) and all(v.shape): for index, x in np.ndenumerate(v): if sys.hexversion >= 0x03000000: data[k][index] = x else: data[k.encode('UTF-8')][index] = x # If metadata is present, that can be used to do convert to the # desired/closest Python data types. If none is present, or not # enough of it, then no conversions can be done. if type_string is not None and underlying_type is not None and \ shape is not None: # If the Attributes 'Python.Fields' and/or 'MATLAB_fields' # are present, the underlying type needs to be changed to # the proper dtype for the structure. if python_fields is not None or matlab_fields is not None: if python_fields is not None: fields = python_fields else: fields = [numpy_to_bytes(k).decode() for k in matlab_fields] struct_dtype = list() for k in fields: if sys.hexversion >= 0x03000000: struct_dtype.append((k, 'object')) else: struct_dtype.append((k.encode('UTF-8'), 'object')) else: struct_dtype = None # If it is empty ('Python.Empty' set to 1), then the shape # information is stored in data and we need to set data to # the empty array of the proper type (in underlying_type) # and the given shape. If we are going to transpose it # later, we need to transpose it now so that it still keeps # the right shape. Also, if it is a structure that we just # figured out the dtype for, that needs to be used. if python_empty == 1: if underlying_type.startswith('bytes'): if underlying_type == 'bytes': nchars = 1 else: nchars = int(int( underlying_type[len('bytes'):]) / 8) data = np.zeros(tuple(shape), dtype='S' + str(nchars)) elif underlying_type.startswith('str'): if underlying_type == 'str': nchars = 1 else: nchars = int(int( underlying_type[len('str'):]) / 32) data = np.zeros(tuple(shape), dtype='U' + str(nchars)) elif struct_dtype is not None: data = np.zeros(tuple(shape), dtype=struct_dtype) else: data = np.zeros(tuple(shape), dtype=underlying_type) if matlab_class is not None or \ options.reverse_dimension_order: data = data.T # If it is a complex type, then it needs to be decoded # properly. if underlying_type.startswith('complex'): data = decode_complex(data) # If its underlying type is 'bool' but it is something else, # then it needs to be converted (means it was written with # the convert_bools_to_uint8 option). if underlying_type == 'bool' and data.dtype.name != 'bool': data = np.bool_(data) # If MATLAB attributes are present or the reverse dimension # order option was given, the dimension order needs to be # reversed. This needs to be done before any reshaping as # the shape was stored before any dimensional reordering. if matlab_class is not None or \ options.reverse_dimension_order: data = data.T # String types might have to be decoded depending on the # underlying type, and MATLAB class if given. They also need # to be properly decoded into strings of the right length if # it originally represented an array of strings (turned into # uints of some sort). The length in bits is contained in # the dtype name, which is the underlying_type. 
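            # Hedged example of the naming scheme (values assumed): an
            # underlying_type of 'bytes128' denotes 128 bits per
            # element, i.e. a 16 character numpy.bytes_ ('S16'), while
            # 'str96' denotes 96 bits, i.e. a 3 character numpy.str_
            # ('U3') since each UTF-32 character takes 32 bits.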
if underlying_type.startswith('bytes'): if underlying_type == 'bytes': data = np.bytes_(b'') else: data = convert_to_numpy_bytes(data, \ length=int(underlying_type[5:])//8) elif underlying_type.startswith('str') \ or matlab_class == 'char': if underlying_type == 'str': data = np.unicode_('') elif underlying_type.startswith('str'): data = convert_to_numpy_str(data, \ length=int(underlying_type[3:])//32) else: data = convert_to_numpy_str(data) # If the shape of data and the shape attribute are # different but give the same number of elements, then data # needs to be reshaped. if tuple(shape) != data.shape \ and np.prod(shape) == np.prod(data.shape): data = data.reshape(tuple(shape)) # If data is a structured ndarray and the type string says # it is a recarray, then turn it into one. if type_string == 'numpy.recarray': data = data.view(np.core.records.recarray) # Convert to scalar, matrix, chararray, or ndarray depending # on the container type. For an empty scalar string, it # needs to be manually set to '' and b'' or there will be # problems. if container == 'scalar': if underlying_type.startswith('bytes'): if python_empty == 1: data = np.bytes_(b'') elif isinstance(data, np.ndarray): data = data.flatten()[0] elif underlying_type.startswith('str'): if python_empty == 1: data = np.unicode_('') elif isinstance(data, np.ndarray): data = data.flatten()[0] else: data = data.flatten()[0] elif container == 'matrix': data = np.asmatrix(data) elif container == 'chararray': data = data.view(np.chararray) elif container == 'ndarray': data = np.asarray(data) elif matlab_class in self.__MATLAB_classes_reverse: # MATLAB formatting information was given. The extraction # did most of the work except handling empties, array # dimension order, and string conversion. # If it is empty ('MATLAB_empty' set to 1), then the shape # information is stored in data and we need to set data to # the empty array of the proper type. If it is a MATLAB # struct, then the proper dtype has to be constructed from # the field names if present (the dtype of each individual # field is set to object). if matlab_empty == 1: if matlab_fields is None: data = np.zeros(tuple(np.uint64(data)), \ dtype=self.__MATLAB_classes_reverse[ \ matlab_class]) else: dt_whole = list() for k in matlab_fields: if sys.hexversion >= 0x03000000: dt_whole.append((numpy_to_bytes(k).decode(), 'object')) else: dt_whole.append((numpy_to_bytes(k), 'object')) data = np.zeros(shape=tuple(np.uint64(data)), dtype=dt_whole) # The order of the dimensions must be switched from Fortran # order which MATLAB uses to C order which Python uses. data = data.T # Now, if the matlab class is 'single' or 'double', data # could possibly be a complex type which needs to be # properly decoded. if matlab_class in ['single', 'double']: data = decode_complex(data) # If it is a logical, then it must be converted to # numpy.bool8. if matlab_class == 'logical': data = np.bool_(data) # If it is a 'char' type, the proper conversion to # numpy.unicode needs to be done. if matlab_class == 'char': data = convert_to_numpy_str(data) # Done adjusting data, so it can be returned. return data class PythonScalarMarshaller(NumpyScalarArrayMarshaller): def __init__(self): NumpyScalarArrayMarshaller.__init__(self) # In Python 3, there is only a single integer type int, which is # variable width. In Python 2, there is the fixed width int and # the variable width long. 
Python 2 needs to be able to save # with either, but Python 3 needs to map both to int, which can # be done by just putting the type int for its entry in types. if sys.hexversion >= 0x03000000: self.types = [bool, int, int, float, complex] else: self.types = [bool, int, long, float, complex] self.python_type_strings = ['bool', 'int', 'long', 'float', 'complex'] # As the parent class already has MATLAB strings handled, there # are no MATLAB classes that this marshaller should be used for. self.matlab_classes = [] def write(self, f, grp, name, data, type_string, options): # data just needs to be converted to the appropriate numpy # type. If it is a Python 3.x int or Python 2.x long that is too # big to fit in a numpy.int64, we need to throw an not # implemented exception so it doesn't get packaged as an # object. It is converted explicitly to a numpy.int64. If it is # too big, there will be an OverflowError. Otherwise, data is # passed through np.array and then access [()] to get the scalar # back as a scalar numpy type. The proper type_string needs to # be grabbed now as the parent function will have a modified # form of data to guess from if not given the right one # explicitly. if sys.hexversion >= 0x03000000: tp = int else: tp = long if type(data) == tp: try: out = np.int64(data) except OverflowError: raise NotImplementedError('Int/long too big to fit ' + 'into numpy.int64.') else: out = data NumpyScalarArrayMarshaller.write(self, f, grp, name, np.array(out)[()], self.get_type_string(data, type_string), options) def read(self, f, grp, name, options): # Use the parent class version to read it and do most of the # work. data = NumpyScalarArrayMarshaller.read(self, f, grp, name, options) # The type string determines how to convert it back to a Python # type (just look up the entry in types). As it might be # returned as an ndarray, we just need to use the item # method. Now, since int and long are unified in Python 3.x and # the size of int in Python 2.x is not always the same, if the # type_string is 'int', then we need to check to see if it can # fit into an int if we are in Python 2.x. If it will fit, it is # returned as an int. If it would not fit, it is returned as a # long. type_string = get_attribute_string(grp[name], 'Python.Type') if type_string in self.python_type_strings: tp = self.types[self.python_type_strings.index( type_string)] sdata = data.item() if sys.hexversion >= 0x03000000 or tp != int: return tp(sdata) else: num = long(sdata) if num > sys.maxint or num < -(sys.maxint - 1): return num else: return int(num) else: # Must be some other type, so return it as is. return data class PythonStringMarshaller(NumpyScalarArrayMarshaller): def __init__(self): NumpyScalarArrayMarshaller.__init__(self) # In Python 3, the unicode and bare bytes type strings are str # and bytes, but before Python 3, they were unicode and str # respectively. The Python 3 python_type_strings will be used, # though. if sys.hexversion >= 0x03000000: self.types = [str, bytes, bytearray] else: self.types = [unicode, str, bytearray] self.python_type_strings = ['str', 'bytes', 'bytearray'] # As the parent class already has MATLAB strings handled, there # are no MATLAB classes that this marshaller should be used for. self.matlab_classes = [] def write(self, f, grp, name, data, type_string, options): # data just needs to be converted to a numpy string of the # appropriate type (str to np.str_ and the others to np.bytes_). 
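        # Hedged illustration (example values, not from the original
        # source): 'abc' becomes np.unicode_('abc'), while b'abc' and
        # bytearray(b'abc') both become np.bytes_(b'abc') before being
        # handed to the numpy marshaller.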
if (sys.hexversion >= 0x03000000 and isinstance(data, str)) \ or (sys.hexversion < 0x03000000 \ and isinstance(data, unicode)): cdata = np.unicode_(data) else: cdata = np.bytes_(data) # Now pass it to the parent version of this function to write # it. The proper type_string needs to be grabbed now as the # parent function will have a modified form of data to guess # from if not given the right one explicitly. NumpyScalarArrayMarshaller.write(self, f, grp, name, cdata, self.get_type_string(data, type_string), options) def read(self, f, grp, name, options): # Use the parent class version to read it and do most of the # work. data = NumpyScalarArrayMarshaller.read(self, f, grp, name, options) # The type string determines how to convert it back to a Python # type (just look up the entry in types). Otherwise, return it # as is. type_string = get_attribute_string(grp[name], 'Python.Type') if type_string == 'str': return convert_to_str(data) elif type_string == 'bytes': if sys.hexversion >= 0x03000000: return bytes(data) else: return str(data) elif type_string == 'bytearray': return bytearray(data) else: return data class PythonNoneMarshaller(NumpyScalarArrayMarshaller): def __init__(self): NumpyScalarArrayMarshaller.__init__(self) self.types = [type(None)] self.python_type_strings = ['builtins.NoneType'] # None corresponds to no MATLAB class. self.matlab_classes = [] def write(self, f, grp, name, data, type_string, options): # Just going to use the parent function with an empty double # (two dimensional so that MATLAB will import it as a []) as the # data and the right type_string set (parent can't guess right # from the modified form). NumpyScalarArrayMarshaller.write(self, f, grp, name, np.float64([]), self.get_type_string(data, type_string), options) def read(self, f, grp, name, options): # There is only one value, so return it. return None class PythonDictMarshaller(TypeMarshaller): def __init__(self): TypeMarshaller.__init__(self) self.python_attributes |= set(['Python.Fields']) self.matlab_attributes |= set(['MATLAB_class', 'MATLAB_fields']) self.types = [dict] self.python_type_strings = ['dict'] self.__MATLAB_classes = {dict: 'struct'} # Set matlab_classes to empty since NumpyScalarArrayMarshaller # handles Groups by default now. self.matlab_classes = list() def write(self, f, grp, name, data, type_string, options): # Check for any field names that are not unicode since they # cannot be handled. Also check for null characters and / # characters since they can't be handled either. How it is # checked (what type it is) and the error message are different # for each Python version. if sys.hexversion >= 0x03000000: for fieldname in data: if not isinstance(fieldname, str): raise NotImplementedError('Dictionaries with non-' + 'str keys are not ' + 'supported: ' + repr(fieldname)) if '\x00' in fieldname or '/' in fieldname: raise NotImplementedError('Dictionary keys with ' \ + "null characters ('\x00') and '/' are not " \ + 'supported.') else: for fieldname in data: if not isinstance(fieldname, unicode): raise NotImplementedError('Dictionaries with non-' + 'unicode keys are not ' + 'supported: ' + repr(fieldname)) if unicode('\x00') in fieldname \ or unicode('/') in fieldname: raise NotImplementedError('Dictionary keys with ' \ + "null characters ('\x00') and '/' are not " \ + 'supported.') # If the group doesn't exist, it needs to be created. If it # already exists but is not a group, it needs to be deleted # before being created. 
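        # As a hedged sketch of the overall layout this method produces
        # (example keys assumed): writing {'a': 1, 'b': 'text'} to
        # 'name' yields a Group 'name' containing members 'a' and 'b',
        # each written by whatever marshaller handles its value, with
        # an H5PATH attribute pointing back at the Group when MATLAB
        # compatibility is on.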
if name not in grp: grp2 = grp.create_group(name) elif not isinstance(grp[name], h5py.Group): del grp[name] grp2 = grp.create_group(name) else: grp2 = grp[name] # Write the metadata. self.write_metadata(f, grp, name, data, type_string, options) # Delete any Datasets/Groups not corresponding to a field name # in data if that option is set. if options.delete_unused_variables: for field in set([i for i in grp2]).difference( \ set([i for i in data])): del grp2[field] # Go through all the elements of data and write them. The H5PATH # needs to be set as the path of grp2 on all of them if we are # doing MATLAB compatibility (otherwise, the attribute needs to # be deleted). grp2_name = grp2.name for k, v in data.items(): write_data(f, grp2, k, v, None, options) if k in grp2: if options.matlab_compatible: set_attribute_string(grp2[k], 'H5PATH', grp2_name) else: del_attribute(grp2[k], 'H5PATH') def write_metadata(self, f, grp, name, data, type_string, options): # First, call the inherited version to do most of the work and # get the group. TypeMarshaller.write_metadata(self, f, grp, name, data, type_string, options) grp2 = grp[name] # Grab all the keys and sort the list. fields = sorted(list(data.keys())) # If we are storing python metadata, we need to set the # 'Python.Fields' Attribute to be all the keys. if options.store_python_metadata: set_attribute_string_array(grp2, 'Python.Fields', fields) # If we are making it MATLAB compatible and have h5py version # >= 2.3, then we can set the MATLAB_fields Attribute as long as # all keys are mappable to ASCII. Otherwise, the attribute # should be deleted. It is written as a vlen='S1' array of # bytes_ arrays of the individual characters. if options.matlab_compatible \ and parse_version(_H5PY_VERSION) \ >= parse_version('2.3'): try: dt = h5py.special_dtype(vlen=np.dtype('S1')) fs = np.empty(shape=(len(fields),), dtype=dt) for i, s in enumerate(fields): fs[i] = np.array([c.encode('ascii') for c in s], dtype='S1') except UnicodeDecodeError: del_attribute(grp2, 'MATLAB_fields') else: set_attribute(grp2, 'MATLAB_fields', fs) else: del_attribute(grp2, 'MATLAB_fields') # If we are making it MATLAB compatible, the MATLAB_class # attribute needs to be set for the data type. If the type # cannot be found or if we are not doing MATLAB compatibility, # the attributes need to be deleted. tp = type(data) if options.matlab_compatible and tp in self.__MATLAB_classes: set_attribute_string(grp2, 'MATLAB_class', self.__MATLAB_classes[tp]) else: del_attribute(grp2, 'MATLAB_class') def read(self, f, grp, name, options): # If name is not present or is not a Group, then we can't read # it and have to throw an error. grp2 = grp.get(name) if grp2 is None: raise NotImplementedError('No object with name ' + name + 'is present.') if not isinstance(grp2, h5py.Group): raise NotImplementedError(name + ' is not a Group.') # Starting with an empty dict, all that has to be done is # iterate through all the Datasets and Groups in grp[name] and # add them to the dict with their name as the key. Since we # don't want an exception thrown by reading an element to stop # the whole reading process, the reading is wrapped in a try # block that just catches exceptions and then does nothing about # them (nothing needs to be done). 
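        # Hedged illustration (example names assumed): a Group holding
        # Datasets 'x' and 'y' reads back as {'x': ..., 'y': ...}; the
        # reference Group (default '/#refs#') is skipped if present and
        # any member that fails to read is silently left out.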
data = dict() for k, dsetgrp in grp2.items(): # We must exclude group_for_references if dsetgrp.name == options.group_for_references: continue try: data[k] = read_data(f, grp2, k, options) except: pass return data class PythonListMarshaller(NumpyScalarArrayMarshaller): def __init__(self): NumpyScalarArrayMarshaller.__init__(self) self.types = [list] self.python_type_strings = ['list'] # As the parent class already has MATLAB strings handled, there # are no MATLAB classes that this marshaller should be used for. self.matlab_classes = [] def write(self, f, grp, name, data, type_string, options): # data just needs to be converted to the appropriate numpy type # (pass it through np.object_ to get the and then pass it to the # parent version of this function. The proper type_string needs # to be grabbed now as the parent function will have a modified # form of data to guess from if not given the right one # explicitly. out = np.zeros(dtype='object', shape=(len(data), )) out[:] = data NumpyScalarArrayMarshaller.write(self, f, grp, name, out, self.get_type_string(data, type_string), options) def read(self, f, grp, name, options): # Use the parent class version to read it and do most of the # work. data = NumpyScalarArrayMarshaller.read(self, f, grp, name, options) # Passing it through list does all the work of making it a list # again. return list(data) class PythonTupleSetDequeMarshaller(PythonListMarshaller): def __init__(self): PythonListMarshaller.__init__(self) self.types = [tuple, set, frozenset, collections.deque] self.python_type_strings = ['tuple', 'set', 'frozenset', 'collections.deque'] # As the parent class already has MATLAB strings handled, there # are no MATLAB classes that this marshaller should be used for. self.matlab_classes = [] def write(self, f, grp, name, data, type_string, options): # data just needs to be converted to a list and then pass it to # the parent version of this function. The proper type_string # needs to be grabbed now as the parent function will have a # modified form of data to guess from if not given the right one # explicitly. PythonListMarshaller.write(self, f, grp, name, list(data), self.get_type_string(data, type_string), options) def read(self, f, grp, name, options): # Use the parent class version to read it and do most of the # work. data = PythonListMarshaller.read(self, f, grp, name, options) # The type string determines how to convert it back to a Python # type (just look up the entry in types). type_string = get_attribute_string(grp[name], 'Python.Type') if type_string in self.python_type_strings: tp = self.types[self.python_type_strings.index( type_string)] return tp(data) else: # Must be some other type, so return it as is. return data hdf5storage-0.1.19/hdf5storage/__init__.py000066400000000000000000002054331436247615200203620ustar00rootroot00000000000000# Copyright (c) 2013-2023, Freja Nordsiek # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. 
# # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. """ This is the hdf5storage package, a Python package to read and write python data types to HDF5 (Heirarchal Data Format) files beyond just Numpy types. Version 0.1.19 """ __version__ = "0.1.19" import sys import os import posixpath import copy import inspect import datetime import h5py from . import lowlevel from hdf5storage.lowlevel import Hdf5storageError, CantReadError, \ TypeNotMatlabCompatibleError from . import Marshallers class Options(object): """ Set of options governing how data is read/written to/from disk. There are many ways that data can be transformed as it is read or written from a file, and many attributes can be used to describe the data depending on its format. The option with the most effect is the `matlab_compatible` option. It makes sure that the file is compatible with MATLAB's HDF5 based version 7.3 mat file format. It overrides several options to the values in the following table. ================================== ==================== attribute value ================================== ==================== delete_unused_variables ``True`` structured_numpy_ndarray_as_struct ``True`` make_atleast_2d ``True`` convert_numpy_bytes_to_utf16 ``True`` convert_numpy_str_to_utf16 ``True`` convert_bools_to_uint8 ``True`` reverse_dimension_order ``True`` store_shape_for_empty ``True`` complex_names ``('real', 'imag')`` group_for_references ``'/#refs#'`` compression_algorithm ``'gzip'`` ================================== ==================== In addition to setting these options, a specially formatted block of bytes is put at the front of the file so that MATLAB can recognize its format. Parameters ---------- store_python_metadata : bool, optional See Attributes. matlab_compatible : bool, optional See Attributes. action_for_matlab_incompatible : str, optional See Attributes. Only valid values are 'ignore', 'discard', and 'error'. delete_unused_variables : bool, optional See Attributes. structured_numpy_ndarray_as_struct : bool, optional See Attributes. make_atleast_2d : bool, optional See Attributes. convert_numpy_bytes_to_utf16 : bool, optional See Attributes. convert_numpy_str_to_utf16 : bool, optional See Attributes. convert_bools_to_uint8 : bool, optional See Attributes. reverse_dimension_order : bool, optional See Attributes. store_shape_for_empty : bool, optional See Attributes. complex_names : tuple of two str, optional See Attributes. group_for_references : str, optional See Attributes. oned_as : str, optional See Attributes. compress : bool, optional See Attributes. compress_size_threshold : int, optional See Attributes. compression_algorithm : str, optional See Attributes. gzip_compression_level : int, optional See Attributes. shuffle_filter : bool, optional See Attributes. 
    compressed_fletcher32_filter : bool, optional
        See Attributes.
    uncompressed_fletcher32_filter : bool, optional
        See Attributes.
    marshaller_collection : MarshallerCollection, optional
        See Attributes.
    **keywords :
        Additional keyword arguments. They are ignored. They are
        allowed to be given to be more compatible with future versions
        of this package where more options will be added.

    Attributes
    ----------
    store_python_metadata : bool
    matlab_compatible : bool
    action_for_matlab_incompatible : str
    delete_unused_variables : bool
    structured_numpy_ndarray_as_struct : bool
    make_atleast_2d : bool
    convert_numpy_bytes_to_utf16 : bool
    convert_numpy_str_to_utf16 : bool
    convert_bools_to_uint8 : bool
    reverse_dimension_order : bool
    store_shape_for_empty : bool
    complex_names : tuple of two str
    group_for_references : str
    oned_as : {'row', 'column'}
    compress : bool
    compress_size_threshold : int
    compression_algorithm : {'gzip', 'lzf', 'szip'}
    gzip_compression_level : int
    shuffle_filter : bool
    compressed_fletcher32_filter : bool
    uncompressed_fletcher32_filter : bool
    scalar_options : dict
        ``h5py.Group.create_dataset`` options for writing scalars.
    array_options : dict
        ``h5py.Group.create_dataset`` options for writing arrays.
    marshaller_collection : MarshallerCollection
        Collection of marshallers to disk.

    """
    def __init__(self, store_python_metadata=True,
                 matlab_compatible=True,
                 action_for_matlab_incompatible='error',
                 delete_unused_variables=False,
                 structured_numpy_ndarray_as_struct=False,
                 make_atleast_2d=False,
                 convert_numpy_bytes_to_utf16=False,
                 convert_numpy_str_to_utf16=False,
                 convert_bools_to_uint8=False,
                 reverse_dimension_order=False,
                 store_shape_for_empty=False,
                 complex_names=('r', 'i'),
                 group_for_references="/#refs#",
                 oned_as='row',
                 compress=True,
                 compress_size_threshold=16*1024,
                 compression_algorithm='gzip',
                 gzip_compression_level=7,
                 shuffle_filter=True,
                 compressed_fletcher32_filter=True,
                 uncompressed_fletcher32_filter=False,
                 marshaller_collection=None,
                 **keywords):
        # Set the defaults.
        self._store_python_metadata = True
        self._action_for_matlab_incompatible = 'error'
        self._delete_unused_variables = False
        self._structured_numpy_ndarray_as_struct = False
        self._make_atleast_2d = False
        self._convert_numpy_bytes_to_utf16 = False
        self._convert_numpy_str_to_utf16 = False
        self._convert_bools_to_uint8 = False
        self._reverse_dimension_order = False
        self._store_shape_for_empty = False
        self._complex_names = ('r', 'i')
        self._group_for_references = "/#refs#"
        self._oned_as = 'row'
        self._compress = True
        self._compress_size_threshold = 16*1024
        self._compression_algorithm = 'gzip'
        self._gzip_compression_level = 7
        self._shuffle_filter = True
        self._compressed_fletcher32_filter = True
        self._uncompressed_fletcher32_filter = False
        self._matlab_compatible = True

        # Apply all the given options using the setters, making sure to
        # do matlab_compatible last since it will override most of the
        # other ones.
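        # Hedged illustration of why the ordering matters (argument
        # values assumed): Options(complex_names=('re', 'im'),
        # matlab_compatible=True) ends up with complex_names equal to
        # ('real', 'imag') because the matlab_compatible setter, run
        # last, overrides it.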
self.store_python_metadata = store_python_metadata self.action_for_matlab_incompatible = \ action_for_matlab_incompatible self.delete_unused_variables = delete_unused_variables self.structured_numpy_ndarray_as_struct = \ structured_numpy_ndarray_as_struct self.make_atleast_2d = make_atleast_2d self.convert_numpy_bytes_to_utf16 = convert_numpy_bytes_to_utf16 self.convert_numpy_str_to_utf16 = convert_numpy_str_to_utf16 self.convert_bools_to_uint8 = convert_bools_to_uint8 self.reverse_dimension_order = reverse_dimension_order self.store_shape_for_empty = store_shape_for_empty self.complex_names = complex_names self.group_for_references = group_for_references self.oned_as = oned_as self.compress = compress self.compress_size_threshold = compress_size_threshold self.compression_algorithm = compression_algorithm self.gzip_compression_level = gzip_compression_level self.shuffle_filter = shuffle_filter self.compressed_fletcher32_filter = compressed_fletcher32_filter self.uncompressed_fletcher32_filter = \ uncompressed_fletcher32_filter self.matlab_compatible = matlab_compatible # Set the h5py options to use for writing scalars and arrays to # blank for now. self.scalar_options = dict() self.array_options = dict() # Use the given marshaller collection if it was # given. Otherwise, use the default. #: Collection of marshallers to disk. #: #: MarshallerCollection #: #: See Also #: -------- #: MarshallerCollection self.marshaller_collection = marshaller_collection if not isinstance(marshaller_collection, MarshallerCollection): self.marshaller_collection = MarshallerCollection() @property def store_python_metadata(self): """ Whether or not to store Python metadata. bool If ``True`` (default), information on the Python type for each object written to disk is put in its attributes so that it can be read back into Python as the same type. """ return self._store_python_metadata @store_python_metadata.setter def store_python_metadata(self, value): # Check that it is a bool, and then set it. This option does not # effect MATLAB compatibility if isinstance(value, bool): self._store_python_metadata = value @property def matlab_compatible(self): """ Whether or not to make the file compatible with MATLAB. bool If ``True`` (default), data is written to file in such a way that it compatible with MATLAB's version 7.3 mat file format which is HDF5 based. Setting it to ``True`` forces other options to hold the specific values in the table below. ================================== ==================== attribute value ================================== ==================== delete_unused_variables ``True`` structured_numpy_ndarray_as_struct ``True`` make_atleast_2d ``True`` convert_numpy_bytes_to_utf16 ``True`` convert_numpy_str_to_utf16 ``True`` convert_bools_to_uint8 ``True`` reverse_dimension_order ``True`` store_shape_for_empty ``True`` complex_names ``('real', 'imag')`` group_for_references ``'/#refs#'`` compression_algorithm ``'gzip'`` ================================== ==================== In addition to setting these options, a specially formatted block of bytes is put at the front of the file so that MATLAB can recognize its format. """ return self._matlab_compatible @matlab_compatible.setter def matlab_compatible(self, value): # If it is a bool, it can be set. If it is set to true, then # several other options need to be set appropriately. 
if isinstance(value, bool): self._matlab_compatible = value if value: self._delete_unused_variables = True self._structured_numpy_ndarray_as_struct = True self._make_atleast_2d = True self._convert_numpy_bytes_to_utf16 = True self._convert_numpy_str_to_utf16 = True self._convert_bools_to_uint8 = True self._reverse_dimension_order = True self._store_shape_for_empty = True self._complex_names = ('real', 'imag') self._group_for_references = "/#refs#" self._compression_algorithm = 'gzip' @property def action_for_matlab_incompatible(self): """ The action to do when writing non-MATLAB compatible data. {'ignore', 'discard', 'error'} The action to perform when doing MATLAB compatibility but a type being written is not MATLAB compatible. The actions are to write the data anyways ('ignore'), don't write the incompatible data ('discard'), or throw a ``TypeNotMatlabCompatibleError`` exception. The default is 'error'. See Also -------- matlab_compatible hdf5storage.lowlevel.TypeNotMatlabCompatibleError """ return self._action_for_matlab_incompatible @action_for_matlab_incompatible.setter def action_for_matlab_incompatible(self, value): # Check that it is one of the allowed values, and then set # it. This option does not effect MATLAB compatibility. if value in ('ignore', 'discard', 'error'): self._action_for_matlab_incompatible = value @property def delete_unused_variables(self): """ Whether or not to delete file variables not written to. bool If ``True`` (defaults to ``False`` unless MATLAB compatibility is being done), variables in the file below where writing starts that are not written to are deleted. Must be ``True`` if doing MATLAB compatibility. """ return self._delete_unused_variables @delete_unused_variables.setter def delete_unused_variables(self, value): # Check that it is a bool, and then set it. If it is false, we # are not doing MATLAB compatible formatting. if isinstance(value, bool): self._delete_unused_variables = value if not self._delete_unused_variables: self._matlab_compatible = False @property def structured_numpy_ndarray_as_struct(self): """ Whether or not to convert structured ndarrays to structs. bool If ``True`` (defaults to ``False`` unless MATLAB compatibility is being done), all ``numpy.ndarray``s with fields (compound dtypes) are written as HDF5 Groups with the fields as Datasets (correspond to struct arrays in MATLAB). Must be ``True`` if doing MATLAB compatibility. MATLAB cannot handle the compound types made by writing these types. """ return self._structured_numpy_ndarray_as_struct @structured_numpy_ndarray_as_struct.setter def structured_numpy_ndarray_as_struct(self, value): # Check that it is a bool, and then set it. If it is false, we # are not doing MATLAB compatible formatting. if isinstance(value, bool): self._structured_numpy_ndarray_as_struct = value if not self._structured_numpy_ndarray_as_struct: self._matlab_compatible = False @property def make_atleast_2d(self): """ Whether or not to convert scalar types to 2D arrays. bool If ``True`` (defaults to ``False`` unless MATLAB compatibility is being done), all scalar types are converted to 2D arrays when written to file. ``oned_as`` determines whether 1D arrays are turned into row or column vectors. Must be ``True`` if doing MATLAB compatibility. MATLAB can only import 2D and higher dimensional arrays. See Also -------- oned_as """ return self._make_atleast_2d @make_atleast_2d.setter def make_atleast_2d(self, value): # Check that it is a bool, and then set it. 
If it is false, we # are not doing MATLAB compatible formatting. if isinstance(value, bool): self._make_atleast_2d = value if not self._make_atleast_2d: self._matlab_compatible = False @property def convert_numpy_bytes_to_utf16(self): """ Whether or not to convert numpy.bytes\\_ to UTF-16. bool If ``True`` (defaults to ``False`` unless MATLAB compatibility is being done), ``numpy.bytes_`` and anything that is converted to them (``bytes``, and ``bytearray``) are converted to UTF-16 before being written to file as ``numpy.uint16``. Must be ``True`` if doing MATLAB compatibility. MATLAB uses UTF-16 for its strings. See Also -------- numpy.bytes_ convert_numpy_str_to_utf16 """ return self._convert_numpy_bytes_to_utf16 @convert_numpy_bytes_to_utf16.setter def convert_numpy_bytes_to_utf16(self, value): # Check that it is a bool, and then set it. If it is false, we # are not doing MATLAB compatible formatting. if isinstance(value, bool): self._convert_numpy_bytes_to_utf16 = value if not self._convert_numpy_bytes_to_utf16: self._matlab_compatible = False @property def convert_numpy_str_to_utf16(self): """ Whether or not to convert numpy.str\\_ to UTF-16. bool If ``True`` (defaults to ``False`` unless MATLAB compatibility is being done), ``numpy.str_`` and anything that is converted to them (``str``) will be converted to UTF-16 if possible before being written to file as ``numpy.uint16``. If doing so would lead to a loss of data (character can't be translated to UTF-16) or would change the shape of an array of ``numpy.str_`` due to a character being converted into a pair 2-bytes, the conversion will not be made and the string will be stored in UTF-32 form as a ``numpy.uint32``. Must be ``True`` if doing MATLAB compatibility. MATLAB uses UTF-16 for its strings. See Also -------- numpy.bytes_ convert_numpy_str_to_utf16 """ return self._convert_numpy_str_to_utf16 @convert_numpy_str_to_utf16.setter def convert_numpy_str_to_utf16(self, value): # Check that it is a bool, and then set it. If it is false, we # are not doing MATLAB compatible formatting. if isinstance(value, bool): self._convert_numpy_str_to_utf16 = value if not self._convert_numpy_str_to_utf16: self._matlab_compatible = False @property def convert_bools_to_uint8(self): """ Whether or not to convert bools to ``numpy.uint8``. bool If ``True`` (defaults to ``False`` unless MATLAB compatibility is being done), bool types are converted to ``numpy.uint8`` before being written to file. Must be ``True`` if doing MATLAB compatibility. MATLAB doesn't use the enums that ``h5py`` wants to use by default and also uses uint8 intead of int8. """ return self._convert_bools_to_uint8 @convert_bools_to_uint8.setter def convert_bools_to_uint8(self, value): # Check that it is a bool, and then set it. If it is false, we # are not doing MATLAB compatible formatting. if isinstance(value, bool): self._convert_bools_to_uint8 = value if not self._convert_bools_to_uint8: self._matlab_compatible = False @property def reverse_dimension_order(self): """ Whether or not to reverse the order of array dimensions. bool If ``True`` (defaults to ``False`` unless MATLAB compatibility is being done), the dimension order of ``numpy.ndarray`` and ``numpy.matrix`` are reversed. This switches them from C ordering to Fortran ordering. The switch of ordering is essentially a transpose. Must be ``True`` if doing MATLAB compatibility. MATLAB uses Fortran ordering. 
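        For example, a ``numpy.ndarray`` with shape ``(2, 3)`` is
        written to disk as a 3x2 Dataset, which MATLAB, using Fortran
        ordering, then reads back as the original 2x3 matrix.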
""" return self._reverse_dimension_order @reverse_dimension_order.setter def reverse_dimension_order(self, value): # Check that it is a bool, and then set it. If it is false, we # are not doing MATLAB compatible formatting. if isinstance(value, bool): self._reverse_dimension_order = value if not self._reverse_dimension_order: self._matlab_compatible = False @property def store_shape_for_empty(self): """ Whether to write the shape if an object has no elements. bool If ``True`` (defaults to ``False`` unless MATLAB compatibility is being done), objects that have no elements (e.g. a 0x0x2 array) will have their shape (an array of the number of elements along each axis) written to disk in place of nothing, which would otherwise be written. Must be ``True`` if doing MATLAB compatibility. For empty arrays, MATLAB requires that the shape array be written in its place along with the attribute 'MATLAB_empty' set to 1 to flag it. """ return self._store_shape_for_empty @store_shape_for_empty.setter def store_shape_for_empty(self, value): # Check that it is a bool, and then set it. If it is false, we # are not doing MATLAB compatible formatting. if isinstance(value, bool): self._store_shape_for_empty = value if not self._store_shape_for_empty: self._matlab_compatible = False @property def complex_names(self): """ Names to use for the real and imaginary fields. tuple of two str ``(r, i)`` where `r` and `i` are two ``str``. When reading and writing complex numbers, the real part gets the name in `r` and the imaginary part gets the name in `i`. ``h5py`` uses ``('r', 'i')`` by default, unless MATLAB compatibility is being done in which case its default is ``('real', 'imag')``. Must be ``('real', 'imag')`` if doing MATLAB compatibility. """ return self._complex_names @complex_names.setter def complex_names(self, value): # Check that it is a tuple of two strings, and then set it. If # it is something other than ('real', 'imag'), then we are not # doing MATLAB compatible formatting. if isinstance(value, tuple) and len(value) == 2 \ and isinstance(value[0], str) \ and isinstance(value[1], str): self._complex_names = value if self._complex_names != ('real', 'imag'): self._matlab_compatible = False @property def group_for_references(self): """ Path for where to put objects pointed at by references. str The absolute POSIX path for the Group to place all data that is pointed to by another piece of data (needed for ``numpy.object_`` and similar types). This path is automatically excluded from its parent group when reading back a ``dict``. Must be ``'/#refs#`` if doing MATLAB compatibility. """ return self._group_for_references @group_for_references.setter def group_for_references(self, value): # Check that it an str and a valid absolute POSIX path, and then # set it. If it is something other than "/#refs#", then we are # not doing MATLAB compatible formatting. if isinstance(value, str): pth = posixpath.normpath(value) if len(pth) > 1 and posixpath.isabs(pth): self._group_for_references = value if self._group_for_references != "/#refs#": self._matlab_compatible = False @property def oned_as(self): """ Vector that 1D arrays become when making everything >= 2D. {'row', 'column'} When the ``make_atleast_2d`` option is set (set implicitly by doing MATLAB compatibility), this option controls whether 1D arrays become row vectors or column vectors. See Also -------- make_atleast_2d """ return self._oned_as @oned_as.setter def oned_as(self, value): # Check that it is one of the valid values before setting it. 
if value in ('row', 'column'): self._oned_as = value @property def compress(self): """ Whether to compress large python objects (datasets). bool If ``True``, python objects (datasets) larger than ``compress_size_threshold`` will be compressed. See Also -------- compress_size_threshold compression_algorithm shuffle_filter compressed_fletcher32_filter """ return self._compress @compress.setter def compress(self, value): # Check that it is a bool, and then set it. if isinstance(value, bool): self._compress = value @property def compress_size_threshold(self): """ Minimum size of a python object before it is compressed. int Minimum size in bytes a python object must be for it to be compressed if ``compress`` is set. Must be non-negative. See Also -------- compress """ return self._compress_size_threshold @compress_size_threshold.setter def compress_size_threshold(self, value): # Check that it is a non-negative integer, and then set it. if isinstance(value, int) and value >= 0: self._compress_size_threshold = value @property def compression_algorithm(self): """ Algorithm to use for compression. {'gzip', 'lzf', 'szip'} Compression algorithm to use When the ``compress`` option is set and a python object is larger than ``compress_size_threshold``. ``'gzip'`` is the only MATLAB compatible option. ``'gzip'`` is also known as the Deflate algorithm, which is the default compression algorithm of ZIP files and is a common compression algorithm used on tarballs. It is the most compatible option. It has good compression and is reasonably fast. Its compression level is set with the ``gzip_compression_level`` option, which is an integer between 0 and 9 inclusive. ``'lzf'`` is a very fast but low to moderate compression algorithm. It is less commonly used than gzip/Deflate, but doesn't have any patent or license issues. ``'szip'`` is a compression algorithm that has some patents and license restrictions. It is not always available. See Also -------- compress compress_size_threshold h5py.Group.create_dataset """ return self._compression_algorithm @compression_algorithm.setter def compression_algorithm(self, value): # Check that it is one of the valid values before setting it. If # it is something other than 'gzip', then we are not doing # MATLAB compatible formatting. if value in ('gzip', 'lzf', 'szip'): self._compression_algorithm = value if self._compression_algorithm != 'gzip': self._matlab_compatible = False @property def gzip_compression_level(self): """ The compression level to use when doing the gzip algorithm. int Compression level to use when data is being compressed with the ``'gzip'`` algorithm. Must be an integer between 0 and 9 inclusive. Lower values are faster while higher values give better compression. See Also -------- compress compression_algorithm """ return self._gzip_compression_level @gzip_compression_level.setter def gzip_compression_level(self, value): # Check that it is an integer between 0 and 9. if isinstance(value, int) and value >= 0 and value <= 9: self._gzip_compression_level = value @property def shuffle_filter(self): """ Whether to use the shuffle filter on compressed python objects. bool If ``True``, python objects (datasets) that are compressed are run through the shuffle filter, which reversibly rearranges the data to improve compression. See Also -------- compress h5py.Group.create_dataset """ return self._shuffle_filter @shuffle_filter.setter def shuffle_filter(self, value): # Check that it is a bool, and then set it. 
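        # Example (illustrative sketch): a typical compression setup is
        # gzip at a moderate level with the shuffle filter, applied only
        # to objects above the size threshold, e.g.
        #
        #     opts = Options(compress=True,
        #                    compression_algorithm='gzip',
        #                    gzip_compression_level=6,
        #                    shuffle_filter=True,
        #                    compress_size_threshold=16384)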
if isinstance(value, bool): self._shuffle_filter = value @property def compressed_fletcher32_filter(self): """ Whether to use the fletcher32 filter on compressed python objects. bool If ``True``, python objects (datasets) that are compressed are run through the fletcher32 filter, which stores a checksum with each chunk so that data corruption can be more easily detected. See Also -------- compress shuffle_filter uncompressed_flether32_filter h5py.Group.create_dataset """ return self._compressed_fletcher32_filter @compressed_fletcher32_filter.setter def compressed_fletcher32_filter(self, value): # Check that it is a bool, and then set it. if isinstance(value, bool): self._compressed_fletcher32_filter = value @property def uncompressed_fletcher32_filter(self): """ Whether to use the fletcher32 filter on uncompressed non-scalar python objects. bool If ``True``, python objects (datasets) that are **NOT** compressed and are not scalars (when converted to a Numpy type, their shape is not an empty ``tuple``) are run through the fletcher32 filter, which stores a checksum with each chunk so that data corruption can be more easily detected. This forces all uncompressed data to be chuncked regardless of how small and can increase file sizes. See Also -------- compress shuffle_filter compressed_flether32_filter h5py.Group.create_dataset """ return self._uncompressed_fletcher32_filter @uncompressed_fletcher32_filter.setter def uncompressed_fletcher32_filter(self, value): # Check that it is a bool, and then set it. if isinstance(value, bool): self._uncompressed_fletcher32_filter = value class MarshallerCollection(object): """ Represents, maintains, and retreives a set of marshallers. Maintains a list of marshallers used to marshal data types to and from HDF5 files. It includes the builtin marshallers from the ``hdf5storage.Marshallers`` module as well as any user supplied or added marshallers. While the builtin list cannot be changed; user ones can be added or removed. Also has functions to get the marshaller appropriate for ``type`` or type_string for a python data type. User marshallers must provide the same interface as ``hdf5storage.Marshallers.TypeMarshaller``, which is probably most easily done by inheriting from it. Parameters ---------- marshallers : marshaller or list of marshallers, optional The user marshaller/s to add to the collection. Could also be a ``tuple``, ``set``, or ``frozenset`` of marshallers. See Also -------- hdf5storage.Marshallers hdf5storage.Marshallers.TypeMarshaller """ def __init__(self, marshallers=[]): # Two lists of marshallers need to be maintained: one for the # builtin ones in the Marshallers module, and another for user # supplied ones. # Grab all the marshallers in the Marshallers module (they are # the classes) by inspection. self._builtin_marshallers = [m() for key, m in dict( inspect.getmembers(Marshallers, inspect.isclass)).items() if m != Marshallers.parse_version] self._user_marshallers = [] # A list of all the marshallers will be needed along with # dictionaries to lookup up the marshaller to use for given # types, type string, or MATLAB class string (they are the # keys). self._marshallers = [] self._types = dict() self._type_strings = dict() self._matlab_classes = dict() # Add any user given marshallers. self.add_marshaller(copy.deepcopy(marshallers)) def _update_marshallers(self): """ Update the full marshaller list and other data structures. 
Makes a full list of both builtin and user marshallers and rebuilds internal data structures used for looking up which marshaller to use for reading/writing Python objects to/from file. """ # Combine both sets of marshallers. self._marshallers = copy.deepcopy(self._builtin_marshallers) self._marshallers.extend(copy.deepcopy(self._user_marshallers)) # Construct the dictionary to look up the appropriate marshaller # by type. It would normally be a dict comprehension such as # # self._types = {tp: m for m in self._marshallers # for tp in m.types} # # but that is not supported in Python 2.6 so it has to be done # with a for loop. self._types = dict() for m in self._marshallers: for tp in m.types: self._types[tp] = m # The equivalent one to read data types given type strings needs # to be created from it. Basically, we have to make the key be # the python_type_string from it. Same issue as before with # Python 2.6 # # self._type_strings = {type_string: m for key, m in # self._types.items() for type_string in # m.python_type_strings} self._type_strings = dict() for key, m in self._types.items(): for type_string in m.python_type_strings: self._type_strings[type_string] = m # The equivalent one to read data types given MATLAB class # strings needs to be created from it. Basically, we have to # make the key be the matlab_class from it. Same issue as before # with Python 2.6 # # self._matlab_classes = {matlab_class: m for key, m in # self._types.items() for matlab_class in # m.matlab_classes} self._matlab_classes = dict() for key, m in self._types.items(): for matlab_class in m.matlab_classes: self._matlab_classes[matlab_class] = m def add_marshaller(self, marshallers): """ Add a marshaller/s to the user provided list. Adds a marshaller or a list of them to the user provided set of marshallers. Parameters ---------- marshallers : marshaller or list of marshallers The user marshaller/s to add to the user provided collection. Could also be a ``tuple``, ``set``, or ``frozenset`` of marshallers. """ if not isinstance(marshallers, (list, tuple, set, frozenset)): marshallers = [marshallers] for m in marshallers: if m not in self._user_marshallers: self._user_marshallers.append(copy.deepcopy(m)) self._update_marshallers() def remove_marshaller(self, marshallers): """ Removes a marshaller/s from the user provided list. Removes a marshaller or a list of them from the user provided set of marshallers. Parameters ---------- marshallers : marshaller or list of marshallers The user marshaller/s to from the user provided collection. Could also be a ``tuple``, ``set``, or ``frozenset`` of marshallers. """ if not isinstance(marshallers, (list, tuple, set, frozenset)): marshallers = [marshallers] for m in marshallers: if m in self._user_marshallers: self._user_marshallers.remove(m) self._update_marshallers() def clear_marshallers(self): """ Clears the list of user provided marshallers. Removes all user provided marshallers, but not the builtin ones from the ``hdf5storage.Marshallers`` module, from the list of marshallers used. """ self._user_marshallers.clear() self._update_marshallers() def get_marshaller_for_type(self, tp): """ Gets the appropriate marshaller for a type. Retrieves the marshaller, if any, that can be used to read/write a Python object with type 'tp'. Parameters ---------- tp : type Python object ``type``. Returns ------- marshaller The marshaller that can read/write the type to file. ``None`` if no appropriate marshaller is found. 
See Also -------- hdf5storage.Marshallers.TypeMarshaller.types """ if tp in self._types: return copy.deepcopy(self._types[tp]) else: return None def get_marshaller_for_type_string(self, type_string): """ Gets the appropriate marshaller for a type string. Retrieves the marshaller, if any, that can be used to read/write a Python object with the given type string. Parameters ---------- type_string : str Type string for a Python object. Returns ------- marshaller The marshaller that can read/write the type to file. ``None`` if no appropriate marshaller is found. See Also -------- hdf5storage.Marshallers.TypeMarshaller.python_type_strings """ if type_string in self._type_strings: return copy.deepcopy(self._type_strings[type_string]) else: return None def get_marshaller_for_matlab_class(self, matlab_class): """ Gets the appropriate marshaller for a MATLAB class string. Retrieves the marshaller, if any, that can be used to read/write a Python object associated with the given MATLAB class string. Parameters ---------- matlab_class : str MATLAB class string for a Python object. Returns ------- marshaller The marshaller that can read/write the type to file. ``None`` if no appropriate marshaller is found. See Also -------- hdf5storage.Marshallers.TypeMarshaller.python_type_strings """ if matlab_class in self._matlab_classes: return copy.deepcopy(self._matlab_classes[matlab_class]) else: return None def writes(mdict, filename='data.h5', truncate_existing=False, truncate_invalid_matlab=False, options=None, **keywords): """ Writes data into an HDF5 file (high level). High level function to store one or more Python types (data) to specified pathes in an HDF5 file. The paths are specified as POSIX style paths where the directory name is the Group to put it in and the basename is the name to write it to. There are various options that can be used to influence how the data is written. They can be passed as an already constructed ``Options`` into `options` or as additional keywords that will be used to make one by ``options = Options(**keywords)``. Two very important options are ``store_python_metadata`` and ``matlab_compatible``, which are ``bool``. The first makes it so that enough metadata (HDF5 Attributes) are written that `data` can be read back accurately without it (or its contents if it is a container type) ending up different types, transposed in the case of numpy arrays, etc. The latter makes it so that the appropriate metadata is written, string and bool and complex types are converted properly, and numpy arrays are transposed; which is needed to make sure that MATLAB can import `data` correctly (the HDF5 header is also set so MATLAB will recognize it). Parameters ---------- mdict : dict, dict like The ``dict`` or other dictionary type object of paths and data to write to the file. The paths, the keys, must be POSIX style paths where the directory name is the Group to put it in and the basename is the name to write it to. The values are the data to write. filename : str, optional The name of the HDF5 file to write `data` to. truncate_existing : bool, optional Whether to truncate the file if it already exists before writing to it. truncate_invalid_matlab : bool, optional Whether to truncate a file if matlab_compatibility is being done and the file doesn't have the proper header (userblock in HDF5 terms) setup for MATLAB metadata to be placed. options : Options, optional The options to use when writing. 
Is mutually exclusive with any additional keyword arguments given (set to ``None`` or don't provide to use them). **keywords : If `options` was not provided or was ``None``, these are used as arguments to make a ``Options``. Raises ------ NotImplementedError If writing `data` is not supported. TypeNotMatlabCompatibleError If writing a type not compatible with MATLAB and `options.action_for_matlab_incompatible` is set to ``'error'``. See Also -------- write : Writes just a single piece of data reads read Options lowlevel.write_data : Low level version """ # Pack the different options into an Options class if an Options was # not given. if not isinstance(options, Options): options = Options(**keywords) # Go through mdict, extract the paths and data, and process the # paths. A list of tulpes for each piece of data to write will be # constructed where he first element is the group name, the second # the target name (name of the Dataset/Group holding the data), and # the third element the data to write. towrite = [] for p, v in mdict.items(): # Remove double slashes and a non-root trailing slash. path = posixpath.normpath(p) # Extract the group name and the target name (will be a dataset if # data can be mapped to it, but will end up being made into a group # otherwise. As HDF5 files use posix path, conventions, posixpath # will do everything. groupname = posixpath.dirname(path) targetname = posixpath.basename(path) # If groupname got turned into blank, then it is just root. if groupname == '': groupname = '/' # If targetname got turned blank, then it is the current directory. if targetname == '': targetname = '.' # Pack into towrite. towrite.append((groupname, targetname, v)) # Open/create the hdf5 file but don't write the data yet since the # userblock still needs to be set. This is all wrapped in a try # block, so that the file can be closed if any errors happen (the # error is re-raised). f = None try: # If the file doesn't already exist or the option is set to # truncate it if it does, just open it truncating whatever is # there. Otherwise, open it for read/write access without # truncating. Now, if we are doing matlab compatibility and it # doesn't have a big enough userblock (for metadata for MATLAB # to be able to tell it is a valid .mat file) and the # truncate_invalid_matlab is set, then it needs to be closed and # re-opened with truncation. Whenever we create the file from # scratch, even if matlab compatibility isn't being done, a # sufficiently sized userblock is going to be allocated # (smallest size is 512) for future use (after all, someone # might want to turn it to a .mat file later and need it and it # is only 512 bytes). if truncate_existing or not os.path.isfile(filename): f = h5py.File(filename, mode='w', userblock_size=512) else: f = h5py.File(filename, mode='a') if options.matlab_compatible and truncate_invalid_matlab \ and f.userblock_size < 128: f.close() f = h5py.File(filename, mode='w', userblock_size=512) except: raise finally: # If the hdf5 file was opened at all, get the userblock size and # close it since we need to set the userblock. if isinstance(f, h5py.File): userblock_size = f.userblock_size f.close() else: raise IOError('Unable to create or open file.') # If we are doing MATLAB formatting and there is a sufficiently # large userblock, write the new userblock. The same sort of error # handling is used. if options.matlab_compatible and userblock_size >= 128: # Get the time. now = datetime.datetime.now() # Construct the leading string. 
The MATLAB one looks like # # s = 'MATLAB 7.3 MAT-file, Platform: GLNXA64, Created on: ' \ # + now.strftime('%a %b %d %H:%M:%S %Y') \ # + ' HDF5 schema 1.00 .' # # Platform is going to be changed to CPython version. The # version is just gotten from sys.version_info, which is a class # for Python >= 2.7, but a tuple before that. v = sys.version_info if sys.hexversion >= 0x02070000: v = {'major': v.major, 'minor': v.minor, 'micro': v.micro} else: v = {'major': v[0], 'minor': v[1], 'micro': v[1]} s = 'MATLAB 7.3 MAT-file, Platform: CPython ' \ + '{0}.{1}.{2}'.format(v['major'], v['minor'], v['micro']) \ + ', Created on: {0} {1}'.format( ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')[ \ now.weekday()], \ ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', \ 'Sep', 'Oct', 'Nov', 'Dec')[now.month - 1]) \ + now.strftime(' %d %H:%M:%S %Y') \ + ' HDF5 schema 1.00 .' # Make the bytearray while padding with spaces up to 128-12 # (the minus 12 is there since the last 12 bytes are special. b = bytearray(s + (128-12-len(s))*' ', encoding='utf-8') # Add 8 nulls (0) and the magic number (or something) that # MATLAB uses. Lengths must be gone to to make sure the argument # to fromhex is unicode because Python 2.6 requires it. b.extend(bytearray.fromhex( b'00000000 00000000 0002494D'.decode())) # Now, write it to the beginning of the file. try: fd = open(filename, 'r+b') fd.write(b) except: raise finally: fd.close() # Open the hdf5 file again and write the data, making the Group if # necessary. This is all wrapped in a try block, so that the file # can be closed if any errors happen (the error is re-raised). f = None try: f = h5py.File(filename, mode='a') # Go through each element of towrite and write them. for groupname, targetname, data in towrite: # Need to make sure groupname is a valid group in f and grab its # handle to pass on to the low level function. grp = f.get(groupname) if grp is None: grp = f.require_group(groupname) # Hand off to the low level function. lowlevel.write_data(f, grp, targetname, data, None, options) except: raise finally: if isinstance(f, h5py.File): f.close() def write(data, path='/', filename='data.h5', truncate_existing=False, truncate_invalid_matlab=False, options=None, **keywords): """ Writes one piece of data into an HDF5 file (high level). A wrapper around ``writes`` to write a single piece of data, `data`, to a single location, `path`. High level function to store a Python type (`data`) to a specified path (`path`) in an HDF5 file. The path is specified as a POSIX style path where the directory name is the Group to put it in and the basename is the name to write it to. There are various options that can be used to influence how the data is written. They can be passed as an already constructed ``Options`` into `options` or as additional keywords that will be used to make one by ``options = Options(**keywords)``. Two very important options are ``store_python_metadata`` and ``matlab_compatible``, which are ``bool``. The first makes it so that enough metadata (HDF5 Attributes) are written that `data` can be read back accurately without it (or its contents if it is a container type) ending up different types, transposed in the case of numpy arrays, etc. The latter makes it so that the appropriate metadata is written, string and bool and complex types are converted properly, and numpy arrays are transposed; which is needed to make sure that MATLAB can import `data` correctly (the HDF5 header is also set so MATLAB will recognize it). 
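    For example (an illustrative call, not a required usage),
    ``write(numpy.zeros((2, 3)), path='/a', filename='data.h5')``
    stores the array at ``'/a'`` in ``data.h5`` using the default
    options.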
Parameters ---------- data : any The data to write. path : str, optional The path to write `data` to. Must be a POSIX style path where the directory name is the Group to put it in and the basename is the name to write it to. filename : str, optional The name of the HDF5 file to write `data` to. truncate_existing : bool, optional Whether to truncate the file if it already exists before writing to it. truncate_invalid_matlab : bool, optional Whether to truncate a file if matlab_compatibility is being done and the file doesn't have the proper header (userblock in HDF5 terms) setup for MATLAB metadata to be placed. options : Options, optional The options to use when writing. Is mutually exclusive with any additional keyword arguments given (set to ``None`` or don't provide to use them). **keywords : If `options` was not provided or was ``None``, these are used as arguments to make a ``Options``. Raises ------ NotImplementedError If writing `data` is not supported. TypeNotMatlabCompatibleError If writing a type not compatible with MATLAB and `options.action_for_matlab_incompatible` is set to ``'error'``. See Also -------- writes : Writes more than one piece of data at once reads read Options lowlevel.write_data : Low level version """ writes(mdict={path: data}, filename=filename, truncate_existing=truncate_existing, truncate_invalid_matlab=truncate_invalid_matlab, options=options, **keywords) def reads(paths, filename='data.h5', options=None, **keywords): """ Reads data from an HDF5 file (high level). High level function to read one or more pieces of data from an HDF5 file located at the paths specified in `paths` into Python types. Each path is specified as a POSIX style path where the data to read is located. There are various options that can be used to influence how the data is read. They can be passed as an already constructed ``Options`` into `options` or as additional keywords that will be used to make one by ``options = Options(**keywords)``. Parameters ---------- paths : iterable of str An iterable of paths to read data from. Each must be a POSIX style path where the directory name is the Group to put it in and the basename is the name to write it to. filename : str, optional The name of the HDF5 file to read data from. options : Options, optional The options to use when reading. Is mutually exclusive with any additional keyword arguments given (set to ``None`` or don't provide to use them). **keywords : If `options` was not provided or was ``None``, these are used as arguments to make a ``Options``. Returns ------- datas : iterable An iterable holding the piece of data for each path in `paths` in the same order. Raises ------ CantReadError If reading the data can't be done. See Also -------- read : Reads just a single piece of data writes write Options lowlevel.read_data : Low level version. """ # Pack the different options into an Options class if an Options was # not given. By default, the matlab_compatible option is set to # False. So, if it wasn't passed in the keywords, this needs to be # added to override the default value (True) for a new Options. if not isinstance(options, Options): kw = copy.deepcopy(keywords) if 'matlab_compatible' not in kw: kw['matlab_compatible'] = False options = Options(**kw) # Process the paths and stuff the group names and target names as # tuples into toread. toread = [] for p in paths: # Remove double slashes and a non-root trailing slash. 
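        # Illustrative examples of the path handling below:
        #
        #     posixpath.normpath('/foo//bar/')  -> '/foo/bar'
        #     posixpath.dirname('/foo/bar')     -> '/foo'  (the Group)
        #     posixpath.basename('/foo/bar')    -> 'bar'   (the target)
        #     posixpath.dirname('/bar')         -> '/'     (root Group)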
path = posixpath.normpath(p) # Extract the group name and the target name (will be a dataset if # data can be mapped to it, but will end up being made into a group # otherwise. As HDF5 files use posix path, conventions, posixpath # will do everything. groupname = posixpath.dirname(path) targetname = posixpath.basename(path) # If groupname got turned into blank, then it is just root. if groupname == '': groupname = '/' # If targetname got turned blank, then it is the current directory. if targetname == '': targetname = '.' # Pack them into toread toread.append((groupname, targetname)) # Open the hdf5 file and start reading the data. This is all wrapped # in a try block, so that the file can be closed if any errors # happen (the error is re-raised). try: f = None f = h5py.File(filename, mode='r') # Read the data item by item datas = [] for groupname, targetname in toread: # Check that the containing group is in f and is indeed a # group. If it isn't an error needs to be thrown. grp = f.get(groupname) if grp is None or not isinstance(grp, h5py.Group): raise CantReadError('Could not find containing Group ' + groupname + '.') # Hand off everything to the low level reader. datas.append(lowlevel.read_data(f, grp, targetname, options)) except: raise finally: if f is not None: f.close() return datas def read(path='/', filename='data.h5', options=None, **keywords): """ Reads one piece of data from an HDF5 file (high level). A wrapper around ``reads`` to read a single piece of data at the single location `path`. High level function to read data from an HDF5 file located at `path` into Python types. The path is specified as a POSIX style path where the data to read is located. There are various options that can be used to influence how the data is read. They can be passed as an already constructed ``Options`` into `options` or as additional keywords that will be used to make one by ``options = Options(**keywords)``. Parameters ---------- path : str, optional The path to read data from. Must be a POSIX style path where the directory name is the Group to put it in and the basename is the name to write it to. filename : str, optional The name of the HDF5 file to read data from. options : Options, optional The options to use when reading. Is mutually exclusive with any additional keyword arguments given (set to ``None`` or don't provide to use them). **keywords : If `options` was not provided or was ``None``, these are used as arguments to make a ``Options``. Returns ------- data : The piece of data at `path`. Raises ------ CantReadError If reading the data can't be done. See Also -------- reads : Reads more than one piece of data at once writes write Options lowlevel.read_data : Low level version. """ return reads(paths=(path,), filename=filename, options=options, **keywords)[0] def savemat(file_name, mdict, appendmat=True, format='7.3', oned_as='row', store_python_metadata=True, action_for_matlab_incompatible='error', marshaller_collection=None, truncate_existing=False, truncate_invalid_matlab=False, **keywords): """ Save a dictionary of python types to a MATLAB MAT file. Saves the data provided in the dictionary `mdict` to a MATLAB MAT file. `format` determines which kind/vesion of file to use. The '7.3' version, which is HDF5 based, is handled by this package and all types that this package can write are supported. 
Versions 4 and 5 are not HDF5 based, so everything is dispatched to the SciPy package's ``scipy.io.savemat`` function, which this function is modelled after (arguments not specific to this package have the same names, etc.). Parameters ---------- file_name : str or file-like object Name of the MAT file to store in. The '.mat' extension is added on automatically if not present if `appendmat` is set to ``True``. An open file-like object can be passed if the writing is being dispatched to SciPy (`format` < 7.3). mdict : dict The dictionary of variables and their contents to store in the file. appendmat : bool, optional Whether to append the '.mat' extension to `file_name` if it doesn't already end in it or not. format : {'4', '5', '7.3'}, optional The MATLAB mat file format to use. The '7.3' format is handled by this package while the '4' and '5' formats are dispatched to SciPy. oned_as : {'row', 'column'}, optional Whether 1D arrays should be turned into row or column vectors. store_python_metadata : bool, optional Whether or not to store Python type information. Doing so allows most types to be read back perfectly. Only applicable if not dispatching to SciPy (`format` >= 7.3). action_for_matlab_incompatible: str, optional The action to perform writing data that is not MATLAB compatible. The actions are to write the data anyways ('ignore'), don't write the incompatible data ('discard'), or throw a ``TypeNotMatlabCompatibleError`` exception. marshaller_collection : MarshallerCollection, optional Collection of marshallers to disk to use. Only applicable if not dispatching to SciPy (`format` >= 7.3). truncate_existing : bool, optional Whether to truncate the file if it already exists before writing to it. truncate_invalid_matlab : bool, optional Whether to truncate a file if the file doesn't have the proper header (userblock in HDF5 terms) setup for MATLAB metadata to be placed. **keywords : Additional keywords arguments to be passed onto ``scipy.io.savemat`` if dispatching to SciPy (`format` < 7.3). Raises ------ ImportError If `format` < 7.3 and the ``scipy`` module can't be found. NotImplementedError If writing a variable in `mdict` is not supported. TypeNotMatlabCompatibleError If writing a type not compatible with MATLAB and `action_for_matlab_incompatible` is set to ``'error'``. Notes ----- Writing the same data and then reading it back from disk using the HDF5 based version 7.3 format (the functions in this package) or the older format (SciPy functions) can lead to very different results. Each package supports a different set of data types and converts them to and from the same MATLAB types differently. See Also -------- loadmat : Equivelent function to do reading. scipy.io.savemat : SciPy function this one models after and dispatches to. Options writes : Function used to do the actual writing. """ # If format is a number less than 7.3, the call needs to be # dispatched to the scipy version, if it is available, with all the # relevant and extra keywords options provided. if float(format) < 7.3: import scipy.io scipy.io.savemat(file_name, mdict, appendmat=appendmat, format=format, oned_as=oned_as, **keywords) return # Append .mat if it isn't on the end of the file name and we are # supposed to. if appendmat and not file_name.endswith('.mat'): file_name = file_name + '.mat' # Make the options with matlab compatibility forced. 
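    # Illustrative sketch of a call that reaches this point:
    #
    #     savemat('results.mat', {'x': np.arange(10.0)},
    #             format='7.3', oned_as='column')
    #
    # Setting matlab_compatible=True below then forces the 'gzip'
    # compression algorithm, ('real', 'imag') complex field names,
    # '/#refs#' for references, and the other options listed in
    # Options.matlab_compatible.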
options = Options(store_python_metadata=store_python_metadata, \ matlab_compatible=True, oned_as=oned_as, \ action_for_matlab_incompatible=action_for_matlab_incompatible, \ marshaller_collection=marshaller_collection) # Write the variables in the dictionary to file. writes(mdict=mdict, filename=file_name, truncate_existing=truncate_existing, truncate_invalid_matlab=truncate_invalid_matlab, options=options) def loadmat(file_name, mdict=None, appendmat=True, variable_names=None, marshaller_collection=None, **keywords): """ Loads data to a MATLAB MAT file. Reads data from the specified variables (or all) in a MATLAB MAT file. There are many different formats of MAT files. This package can only handle the HDF5 based ones (the version 7.3 and later). As SciPy's ``scipy.io.loadmat`` function can handle the earlier formats, if this function cannot read the file, it will dispatch it onto the scipy function with all the calling arguments it uses passed on. This function is modelled after the SciPy one (arguments not specific to this package have the same names, etc.). Warning ------- Variables in `variable_names` that are missing from the file do not cause an exception and will just be missing from the output. Parameters ---------- file_name : str Name of the MAT file to read from. The '.mat' extension is added on automatically if not present if `appendmat` is set to ``True``. mdict : dict, optional The dictionary to insert read variables into appendmat : bool, optional Whether to append the '.mat' extension to `file_name` if it doesn't already end in it or not. variable_names: None or sequence, optional The variable names to read from the file. ``None`` selects all. marshaller_collection : MarshallerCollection, optional Collection of marshallers from disk to use. Only applicable if not dispatching to SciPy (version 7.3 and newer files). **keywords : Additional keywords arguments to be passed onto ``scipy.io.loadmat`` if dispatching to SciPy if the file is not a version 7.3 or later format. Returns ------- dict Dictionary of all the variables read from the MAT file (name as the key, and content as the value). If a variable was missing from the file, it will not be present here. Raises ------ ImportError If it is not a version 7.3 .mat file and the ``scipy`` module can't be found when dispatching to SciPy. CantReadError If reading the data can't be done. Notes ----- Writing the same data and then reading it back from disk using the HDF5 based version 7.3 format (the functions in this package) or the older format (SciPy functions) can lead to very different results. Each package supports a different set of data types and converts them to and from the same MATLAB types differently. See Also -------- savemat : Equivalent function to do writing. scipy.io.loadmat : SciPy function this one models after and dispatches to. Options reads : Function used to do the actual reading. """ # Will first assume that it is the HDF5 based 7.3 format. If an # OSError occurs, then it wasn't an HDF5 file and the scipy function # can be tried instead. try: # Make the options with the given marshallers. options = Options(marshaller_collection=marshaller_collection) # Append .mat if it isn't on the end of the file name and we are # supposed to. if appendmat and not file_name.endswith('.mat'): filename = file_name + '.mat' else: filename = file_name # Read everything if we were instructed. if variable_names is None: data = dict() with h5py.File(filename, mode='r') as f: for k in f: # Read if not group_for_references. 
Data that # produces errors when read is dicarded (the OSError # that would happen if this is not an HDF5 file # would already have happened when opening the # file). if f[k].name != options.group_for_references: try: data[k] = lowlevel.read_data(f, f, k, options) except: pass else: # Extract the desired fields one by one, catching any errors # for missing variables (so we don't fall back to # scipy.io.loadmat). data = dict() with h5py.File(filename, mode='r') as f: for k in variable_names: try: data[k] = lowlevel.read_data(f, f, k, options) except: pass # Read all the variables, stuff them into mdict, and return it. if mdict is None: mdict = dict() for k, v in data.items(): mdict[k] = v return mdict except OSError: import scipy.io return scipy.io.loadmat(file_name, mdict, appendmat=appendmat, variable_names=variable_names, **keywords) hdf5storage-0.1.19/hdf5storage/lowlevel.py000066400000000000000000000135421436247615200204520ustar00rootroot00000000000000# Copyright (c) 2013-2016, Freja Nordsiek # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. """ Module of Exceptions and low level read and write functions. """ import posixpath import numpy as np import h5py from hdf5storage.utilities import * class Hdf5storageError(IOError): """ Base class of hdf5storage package exceptions.""" pass class CantReadError(Hdf5storageError): """ Exception for a failure to read the desired data.""" pass class TypeNotMatlabCompatibleError(Hdf5storageError): """ Exception for trying to write non-MATLAB compatible data. In the event that MATLAB compatibility is being done (``Options.matlab_compatible``) and a Python type is not importable by MATLAB, the data is either not written or this exception is thrown depending on the value of ``Options.action_for_matlab_incompatible``. See Also -------- hdf5storage.Options.matlab_compatible hdf5storage.Options.action_for_matlab_incompatible """ pass def write_data(f, grp, name, data, type_string, options): """ Writes a piece of data into an open HDF5 file. Low level function to store a Python type (`data`) into the specified Group. Parameters ---------- f : h5py.File The open HDF5 file. grp : h5py.Group or h5py.File The Group to place the data in. name : str The name to write the data to. data : any The data to write. 
type_string : str or None The type string of the data, or ``None`` to deduce automatically. options : hdf5storage.core.Options The options to use when writing. Raises ------ NotImplementedError If writing `data` is not supported. TypeNotMatlabCompatibleError If writing a type not compatible with MATLAB and `options.action_for_matlab_incompatible` is set to ``'error'``. See Also -------- hdf5storage.core.write : Higher level version. read_data hdf5storage.core.Options """ # Get the marshaller for type(data). tp = type(data) m = options.marshaller_collection.get_marshaller_for_type(tp) # If a marshaller was found, use it to write the data. Otherwise, # return an error. If we get something other than None back, then we # must recurse through the entries. Also, we must set the H5PATH # attribute to be the path to the containing group. if m is not None: m.write(f, grp, name, data, type_string, options) else: raise NotImplementedError('Can''t write data type: '+str(tp)) def read_data(f, grp, name, options): """ Writes a piece of data into an open HDF5 file. Low level function to read a Python type of the specified name from specified Group. Parameters ---------- f : h5py.File The open HDF5 file. grp : h5py.Group or h5py.File The Group to read the data from. name : str The name of the data to read. options : hdf5storage.core.Options The options to use when reading. Returns ------- data The data named `name` in Group `grp`. Raises ------ CantReadError If the data cannot be read successfully. See Also -------- hdf5storage.core.read : Higher level version. write_data hdf5storage.core.Options """ # If name isn't found, return error. dsetgrp = grp.get(name) if dsetgrp is None: raise CantReadError('Could not find ' + posixpath.join(grp.name, name)) # Get the different attributes that can be used to identify they # type, which are the type string and the MATLAB class. type_string = get_attribute_string(dsetgrp, 'Python.Type') matlab_class = get_attribute_string(dsetgrp, 'MATLAB_class') # If the type_string is present, get the marshaller for it. If it is # not, use the one for the matlab class if it is given. Otherwise, # use the fallback (NumpyScalarArrayMarshaller for both Datasets and # Groups). If calls to the marshaller collection to get the right # marshaller don't return one (return None), we also go to the # default). m = None mc = options.marshaller_collection if type_string is not None: m = mc.get_marshaller_for_type_string(type_string) elif matlab_class is not None: m = mc.get_marshaller_for_matlab_class(matlab_class) if m is None: m = mc.get_marshaller_for_type(np.uint8) # If a marshaller was found, use it to write the data. Otherwise, # return an error. if m is not None: return m.read(f, grp, name, options) else: raise CantReadError('Could not read ' + dsetgrp.name) hdf5storage-0.1.19/hdf5storage/utilities.py000066400000000000000000001103701436247615200206310ustar00rootroot00000000000000# Copyright (c) 2013-2016, Freja Nordsiek # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. 
# # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. """ Module of functions to set and delete HDF5 attributes. """ import sys import collections import copy import random # In old versions of Python, everyting in collections.abc was in # collections. try: import collections.abc except ImportError: collections.abc = collections try: from pkg_resources import parse_version except: from distutils.version import StrictVersion as parse_version\ import numpy as np import h5py # We need to determine if h5py is one of the versions that cannot read # the MATLAB_fields Attribute in the normal fashion so that we can # handle it specially. _cant_read_matlab_fields = ( parse_version(h5py.__version__) < parse_version('2.3')) _handle_matlab_fields_specially = ( parse_version(h5py.__version__) in (parse_version('3.0'), parse_version('3.1'))) if _handle_matlab_fields_specially: import ctypes import h5py._objects # Determine if numpy.ndarrays and scalars have a tobytes method or # not. If they don't, then the tostring method must be used instead. _numpy_has_tobytes = hasattr(np.array([1]), 'tobytes') def numpy_to_bytes(obj): """ Get the raw bytes of a numpy object's data. Calls the ``tobytes`` method on `obj` for new versions of ``numpy`` where the method exists, and ``tostring`` for old versions of ``numpy`` where it does not. Parameters ---------- obj : numpy.generic or numpy.ndarray Numpy scalar or array. Returns ------- data : bytes The raw data. """ if _numpy_has_tobytes: return obj.tobytes() else: return obj.tostring() def read_matlab_fields_attribute(attrs): """ Reads the ``MATLAB_fields`` Attribute. On some versions of ``h5py``, the ``MATLAB_fields`` Attribute cannot be read in the standard way and must instead be read in a more manual fashion. This function reads the Attribute by the proper method. Parameters ---------- attrs : h5py.AttributeManager The Attribute manager to read from. Returns ------- value : numpy.ndarray or None The value of the ``MATLAB_fields`` Attribute, or ``None`` if it isn't available or its format is invalid. Raises ------ TypeError If an argument has the wrong type. """ if not isinstance(attrs, h5py.AttributeManager): raise TypeError('attrs must be a h5py.AttributeManager.') if not _handle_matlab_fields_specially: return attrs.get('MATLAB_fields') if _cant_read_matlab_fields or 'MATLAB_fields' not in attrs: return None # The following method is loosely based on the method provided by # takluyver at # https://github.com/h5py/h5py/issues/1817#issuecomment-781385699 # # but has been improved by reading it directly as (size_t, void*) # pairs uint64s instead of using struct.unpack, and avoiding making # intermediate copies of the data by copying directly to the output # array. 
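    # What the block below unpacks (a sketch of the assumed layout):
    # each element of the raw attribute buffer is an HDF5 variable
    # length record, essentially a (length, pointer) pair like HDF5's C
    # hvl_t struct, so it is read with the dtype
    # [('length', uintp), ('pointer', intp)] and the pointed-to bytes
    # are then copied out with ctypes.memmove into one 'S1' array per
    # field name.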
with h5py._objects.phil: attr_id = attrs.get_id('MATLAB_fields') dt = np.dtype([('length', np.uintp), ('pointer', np.intp)]) raw_buf = np.empty(attr_id.shape, dtype=dt) attr_id.read(raw_buf, mtype=attr_id.get_type()) attr = np.empty(raw_buf.shape, dtype='object') for i, (length, ptr) in enumerate(raw_buf.flat): at = np.empty(length, dtype='S1') ctypes.memmove(at.ctypes.data, int(ptr), int(length)) attr.flat[i] = at return attr def read_all_attributes_into(attrs, out): """ Reads all Attributes into a MutableMapping (dict-like) Reads all Attributes into the MutableMapping (dict-like) out, including the special handling of the ``MATLAB_fields`` Attribute on versions of ``h5py`` where it cannot be read in the standard fashion. Parameters ---------- attrs : h5py.AttributeManager The Attribute manager to read from. out : MutableMapping The MutableMapping (dict-like) to write the Attributes into. Raises ------ TypeError If an argument has the wrong type. See Also -------- read_matlab_fields_attribute """ if not isinstance(attrs, h5py.AttributeManager): raise TypeError('attrs must be a h5py.AttributeManager.') if not isinstance(out, (dict, collections.defaultdict, collections.abc.MutableMapping)): raise TypeError('out must be a MutableMapping.') if not _handle_matlab_fields_specially \ or 'MATLAB_fields' not in attrs: out.update(attrs.items()) else: for k in attrs: if k != 'MATLAB_fields': out[k] = attrs[k] else: out[k] = read_matlab_fields_attribute(attrs) def does_dtype_have_a_zero_shape(dt): """ Determine whether a dtype (or its fields) have zero shape. Determines whether the given ``numpy.dtype`` has a shape with a zero element or if one of its fields does, or if one of its fields' fields does, and so on recursively. The following dtypes do not have zero shape. * ``'uint8'`` * ``[('a', 'int32'), ('blah', 'float16', (3, 3))]`` * ``[('a', [('b', 'complex64')], (2, 1, 3))]`` But the following do * ``('uint8', (1, 0))`` * ``[('a', 'int32'), ('blah', 'float16', (3, 0))]`` * ``[('a', [('b', 'complex64')], (2, 0, 3))]`` Parameters ---------- dt : numpy.dtype The dtype to check. Returns ------- yesno : bool Whether `dt` or one of its fields has a shape with at least one element that is zero. Raises ------ TypeError If `dt` is not a ``numpy.dtype``. """ components = [dt] while 0 != len(components): c = components.pop() if 0 in c.shape: return True if c.names is not None: components.extend([v[0] for v in c.fields.values()]) if c.base != c: components.append(c.base) return False def next_unused_name_in_group(grp, length): """ Gives a name that isn't used in a Group. Generates a name of the desired length that is not a Dataset or Group in the given group. Note, if length is not large enough and `grp` is full enough, there may be no available names meaning that this function will hang. Parameters ---------- grp : h5py.Group or h5py.File The HDF5 Group (or File if at '/') to generate an unused name in. length : int Number of characters the name should be. Returns ------- str A name that isn't already an existing Dataset or Group in `grp`. """ # While # # ltrs = string.ascii_letters + string.digits # name = ''.join([random.choice(ltrs) for i in range(length)]) # # seems intuitive, its performance is abysmal compared to # # '%0{0}x'.format(length) % random.getrandbits(length * 4) # # The difference is a factor of 20. 
Idea from # # https://stackoverflow.com/questions/2782229/most-lightweight-way- # to-create-a-random-string-and-a-random-hexadecimal-number/ # 35161595#35161595 fmt = '%0{0}x'.format(length) name = fmt % random.getrandbits(length * 4) while name in grp: name = fmt % random.getrandbits(length * 4) return name def convert_numpy_str_to_uint16(data): """ Converts a numpy.unicode\\_ to UTF-16 in numpy.uint16 form. Convert a ``numpy.unicode_`` or an array of them (they are UTF-32 strings) to UTF-16 in the equivalent array of ``numpy.uint16``. The conversion will throw an exception if any characters cannot be converted to UTF-16. Strings are expanded along rows (across columns) so a 2x3x4 array of 10 element strings will get turned into a 2x30x4 array of uint16's if every UTF-32 character converts easily to a UTF-16 singlet, as opposed to a UTF-16 doublet. Parameters ---------- data : numpy.unicode\\_ or numpy.ndarray of numpy.unicode\\_ The string or array of them to convert. Returns ------- array : numpy.ndarray of numpy.uint16 The result of the conversion. Raises ------ UnicodeEncodeError If a UTF-32 character has no UTF-16 representation. See Also -------- convert_numpy_str_to_uint32 convert_to_numpy_str """ # An empty string should be an empty uint16 if data.nbytes == 0: return np.uint16([]) # We need to use the UTF-16 codec for our endianness. Using the # right one means we don't have to worry about removing the BOM. if sys.byteorder == 'little': codec = 'UTF-16LE' else: codec = 'UTF-16BE' # numpy.char.encode can do the conversion element wise. Then, we # just have convert to uin16 with the appropriate dimensions. The # dimensions are gotten from the shape of the converted data with # the number of column increased by the number of words (pair of # bytes) in the strings. cdata = np.char.encode(np.atleast_1d(data), codec) shape = list(cdata.shape) shape[-1] *= (cdata.dtype.itemsize // 2) return np.ndarray(shape=shape, dtype='uint16', buffer=numpy_to_bytes(cdata)) def convert_numpy_str_to_uint32(data): """ Converts a numpy.str\\_ to its numpy.uint32 representation. Convert a ``numpy.str`` or an array of them (they are UTF-32 strings) into the equivalent array of ``numpy.uint32`` that is byte for byte identical. Strings are expanded along rows (across columns) so a 2x3x4 array of 10 element strings will get turned into a 2x30x4 array of uint32's. Parameters ---------- data : numpy.str\\_ or numpy.ndarray of numpy.str\\_ The string or array of them to convert. Returns ------- numpy.ndarray of numpy.uint32 The result of the conversion. See Also -------- convert_numpy_str_to_uint16 decode_to_numpy_str """ if data.nbytes == 0: # An empty string should be an empty uint32. return np.uint32([]) else: # We need to calculate the new shape from the current shape, # which will have to be expanded along the rows to fit all the # characters (the dtype.itemsize gets the number of bytes in # each string, which is just 4 times the number of # characters. Then it is a mstter of getting a view of the # string (in flattened form so that it is contiguous) as uint32 # and then reshaping it. shape = list(np.atleast_1d(data).shape) shape[-1] *= data.dtype.itemsize//4 return data.flatten().view(np.uint32).reshape(tuple(shape)) def convert_to_str(data): """ Decodes data to the Python 3.x str (Python 2.x unicode) type. Decodes `data` to a Python 3.x ``str`` (Python 2.x ``unicode``). If it can't be decoded, it is returned as is. 
Unsigned integers, Python ``bytes``, and Numpy strings (``numpy.str_`` and ``numpy.bytes_``). Python 3.x ``bytes``, Python 2.x ``str``, and ``numpy.bytes_`` are assumed to be encoded in UTF-8. Parameters ---------- data : some type Data decode into an ``str`` string. Returns ------- str or data If `data` can be decoded into a ``str``, the decoded version is returned. Otherwise, `data` is returned unchanged. See Also -------- convert_to_numpy_str convert_to_numpy_bytes """ # How the conversion is done depends on the exact underlying # type. Numpy types are handled separately. For uint types, it is # assumed to be stored as UTF-8, UTF-16, or UTF-32 depending on the # size when converting to an str. numpy.string_ is just like # converting a bytes. numpy.unicode has to be encoded into bytes # before it can be decoded back into an str. bytes is decoded # assuming it is in UTF-8. Otherwise, data has to be returned as is. if isinstance(data, (np.ndarray, np.uint8, np.uint16, np.uint32, np.bytes_, np.unicode_)): if data.dtype.name == 'uint8': return numpy_to_bytes(data.flatten()).decode('UTF-8') elif data.dtype.name == 'uint16': return numpy_to_bytes(data.flatten()).decode('UTF-16') elif data.dtype.name == 'uint32': return numpy_to_bytes(data.flatten()).decode('UTF-32') elif data.dtype.char == 'S': return data.decode('UTF-8') else: if isinstance(data, np.ndarray): return numpy_to_bytes(data.flatten()).decode('UTF-32') else: return data.encode('UTF-32').decode('UTF-32') if isinstance(data, bytes): return data.decode('UTF-8') else: return data def convert_to_numpy_str(data, length=None): """ Decodes data to Numpy unicode string (str\\_). Decodes `data` to Numpy unicode string (UTF-32), which is ``numpy.str_``, or an array of them. If it can't be decoded, it is returned as is. Unsigned integers, Python string types (``str``, ``bytes``), and ``numpy.bytes_`` are supported. If it is an array of ``numpy.bytes_``, an array of those all converted to ``numpy.str_`` is returned. Python 3.x ``bytes``, Python 2.x ``str``, and ``numpy.bytes_`` are assumed to be encoded in UTF-8. For an array of unsigned integers, it may be desirable to make an array with strings of some specified length as opposed to an array of the same size with each element being a one element string. This naturally arises when converting strings to unsigned integer types in the first place, so it needs to be reversible. The `length` parameter specifies how many to group together into a string (desired string length). For 1d arrays, this is along its only dimension. For higher dimensional arrays, it is done along each row (across columns). So, for a 3x10x5 input array of uints and a `length` of 5, the output array would be a 3x2x5 of 5 element strings. Parameters ---------- data : some type Data decode into a Numpy unicode string. length : int or None, optional The number of consecutive elements (in the case of unsigned integer `data`) to compose each string in the output array from. ``None`` indicates the full amount for a 1d array or the number of columns (full length of row) for a higher dimension array. Returns ------- numpy.str\\_ or numpy.ndarray of numpy.str\\_ or data If `data` can be decoded into a ``numpy.str_`` or a ``numpy.ndarray`` of them, the decoded version is returned. Otherwise, `data` is returned unchanged. See Also -------- convert_to_str convert_to_numpy_bytes numpy.str_ """ # The method of conversion depends on its type. 
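    # For example (roughly): b'hi' comes back as numpy.unicode_('hi'),
    # and a 1d uint16 array holding the native-endian UTF-16 code units
    # of 'hi' comes back as the single element array ['hi'] with dtype
    # 'U2', since code units are grouped `length` at a time (the whole
    # row by default).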
if isinstance(data, np.unicode_) or (isinstance(data, np.ndarray) \ and data.dtype.char == 'U'): # It is already an np.str_ or array of them, so nothing needs to # be done. return data elif (sys.hexversion >= 0x03000000 and isinstance(data, str)) \ or (sys.hexversion < 0x03000000 \ and isinstance(data, unicode)): # Easily converted through constructor. return np.unicode_(data) elif isinstance(data, (bytes, bytearray, np.bytes_)): # All of them can be decoded and then passed through the # constructor. return np.unicode_(data.decode('UTF-8')) elif isinstance(data, (np.uint8, np.uint16)): # They are single UTF-8 or UTF-16 scalars, and are easily # converted to a UTF-8 string and then passed through the # constructor. return np.unicode_(convert_to_str(data)) elif isinstance(data, np.uint32): # It is just the uint32 version of the character, so it just # needs to be have the dtype essentially changed by having its # bytes read into ndarray. return np.ndarray(shape=tuple(), dtype='U1', buffer=numpy_to_bytes(data.flatten()))[()] elif isinstance(data, np.ndarray) and data.dtype.char == 'S': # We just need to convert it elementwise. new_data = np.zeros(shape=data.shape, dtype='U' + str(data.dtype.itemsize)) for index, x in np.ndenumerate(data): new_data[index] = np.unicode_(x.decode('UTF-8')) return new_data elif isinstance(data, np.ndarray) \ and data.dtype.name in ('uint8', 'uint16', 'uint32'): # It is an ndarray of some uint type. How it is converted # depends on its shape. If its shape is just (), then it is just # a scalar wrapped in an array, which can be converted by # recursing the scalar value back into this function. shape = list(data.shape) if len(shape) == 0: return convert_to_numpy_str(data[()]) # As there are more than one element, it gets a bit more # complicated. We need to take the subarrays of the specified # length along columns (1D arrays will be treated as row arrays # here), each of those converted to an str_ scalar (normal # string) and stuffed into a new array. # # If the length was not given, it needs to be set to full. Then # the shape of the new array needs to be calculated (divide the # appropriate dimension, which depends on the number of # dimentions). if len(shape) == 1: if length is None: length = shape[0] new_shape = (shape[0]//length,) else: if length is None: length = shape[-1] new_shape = copy.deepcopy(shape) new_shape[-1] //= length # The new array can be made as all zeros (nulls) with enough # padding to hold everything (dtype='UL' where 'L' is the # length). It will start out as a 1d array and be reshaped into # the proper shape later (makes indexing easier). new_data = np.zeros(shape=(np.prod(new_shape),), dtype='U'+str(length)) # With data flattened into a 1d array, we just need to take # length sized chunks, convert them (if they are uint8 or 16, # then decode to str first, if they are uint32, put them as an # input buffer for an ndarray of type 'U'). data = data.flatten() for i in range(0, new_data.shape[0]): chunk = data[(i*length):((i+1)*length)] if data.dtype.name == 'uint32': new_data[i] = np.ndarray( shape=tuple(), dtype=new_data.dtype, buffer=numpy_to_bytes(chunk))[()] else: new_data[i] = np.unicode_(convert_to_str(chunk)) # Only thing is left is to reshape it. return new_data.reshape(tuple(new_shape)) else: # Couldn't figure out what it is, so nothing can be done but # return it as is. return data def convert_to_numpy_bytes(data, length=None): """ Decodes data to Numpy UTF-8 econded string (bytes\\_). 
Decodes `data` to a Numpy UTF-8 encoded string, which is ``numpy.bytes_``, or an array of them in which case it will be ASCII encoded instead. If it can't be decoded, it is returned as is. Unsigned integers, Python string types (``str``, ``bytes``), and ``numpy.str_`` (UTF-32) are supported. For an array of unsigned integers, it may be desirable to make an array with strings of some specified length as opposed to an array of the same size with each element being a one element string. This naturally arises when converting strings to unsigned integer types in the first place, so it needs to be reversible. The `length` parameter specifies how many to group together into a string (desired string length). For 1d arrays, this is along its only dimension. For higher dimensional arrays, it is done along each row (across columns). So, for a 3x10x5 input array of uints and a `length` of 5, the output array would be a 3x2x5 of 5 element strings. Parameters ---------- data : some type Data decode into a Numpy UTF-8 encoded string/s. length : int or None, optional The number of consecutive elements (in the case of unsigned integer `data`) to compose each string in the output array from. ``None`` indicates the full amount for a 1d array or the number of columns (full length of row) for a higher dimension array. Returns ------- numpy.bytes\\_ or numpy.ndarray of numpy.bytes\\_ or data If `data` can be decoded into a ``numpy.bytes_`` or a ``numpy.ndarray`` of them, the decoded version is returned. Otherwise, `data` is returned unchanged. See Also -------- convert_to_str convert_to_numpy_str numpy.bytes_ """ # The method of conversion depends on its type. if isinstance(data, np.bytes_) or (isinstance(data, np.ndarray) \ and data.dtype.char == 'S'): # It is already an np.bytes_ or array of them, so nothing needs # to be done. return data elif isinstance(data, (bytes, bytearray)): # Easily converted through constructor. return np.bytes_(data) elif (sys.hexversion >= 0x03000000 and isinstance(data, str)) \ or (sys.hexversion < 0x03000000 \ and isinstance(data, unicode)): return np.bytes_(data.encode('UTF-8')) elif isinstance(data, (np.uint16, np.uint32)): # They are single UTF-16 or UTF-32 scalars, and are easily # converted to a UTF-8 string and then passed through the # constructor. return np.bytes_(convert_to_str(data).encode('UTF-8')) elif isinstance(data, np.uint8): # It is just the uint8 version of the character, so it just # needs to be have the dtype essentially changed by having its # bytes read into ndarray. return np.ndarray(shape=tuple(), dtype='S1', buffer=numpy_to_bytes(data.flatten()))[()] elif isinstance(data, np.ndarray) and data.dtype.char == 'U': # We just need to convert it elementwise. new_data = np.zeros(shape=data.shape, dtype='S' + str(data.dtype.itemsize)) for index, x in np.ndenumerate(data): new_data[index] = np.bytes_(x.encode('UTF-8')) return new_data elif isinstance(data, np.ndarray) \ and data.dtype.name in ('uint8', 'uint16', 'uint32'): # It is an ndarray of some uint type. How it is converted # depends on its shape. If its shape is just (), then it is just # a scalar wrapped in an array, which can be converted by # recursing the scalar value back into this function. shape = list(data.shape) if len(shape) == 0: return convert_to_numpy_bytes(data[()]) # As there are more than one element, it gets a bit more # complicated. 
We need to take the subarrays of the specified # length along columns (1D arrays will be treated as row arrays # here), each of those converted to a bytes_ scalar and stuffed # into a new array. # # If the length was not given, it needs to be set to full. Then # the shape of the new array needs to be calculated (divide the # appropriate dimension, which depends on the number of # dimensions). if len(shape) == 1: if length is None: length2 = shape[0] new_shape = (shape[0],) else: length2 = length new_shape = (shape[0]//length2,) else: if length is None: length2 = shape[-1] else: length2 = length new_shape = copy.deepcopy(shape) new_shape[-1] //= length2 # The new array can be made as all zeros (nulls) with enough # padding to hold everything (dtype='SL' where 'L' is the # length). It will start out as a 1d array and be reshaped into # the proper shape later (makes indexing easier). new_data = np.zeros(shape=(np.prod(new_shape),), dtype='S'+str(length2)) # With data flattened into a 1d array, we just need to take # length sized chunks, convert them (if they are uint16 or 32, # then decode to str first and encode back to UTF-8; if they are # uint8, put them as an input buffer for an ndarray of type 'S'). data = data.flatten() for i in range(0, new_data.shape[0]): chunk = data[(i*length2):((i+1)*length2)] if data.dtype.name == 'uint8': new_data[i] = np.ndarray( shape=tuple(), dtype=new_data.dtype, buffer=numpy_to_bytes(chunk))[()] else: new_data[i] = np.bytes_( \ convert_to_str(chunk).encode('UTF-8')) # The only thing left is to reshape it. return new_data.reshape(tuple(new_shape)) else: # Couldn't figure out what it is, so nothing can be done but # return it as is. return data def decode_complex(data, complex_names=(None, None)): """ Decodes possibly complex data read from an HDF5 file. Decodes possibly complex datasets read from an HDF5 file. HDF5 doesn't have a native complex type, so they are stored as H5T_COMPOUND types with fields such as 'r' and 'i' for the real and imaginary parts. As there is no standardization for field names, the field names have to be given explicitly, or the fieldnames in `data` analyzed for proper decoding to figure out the names. A variety of reasonably expected combinations of field names are checked and used if available to decode. If decoding is not possible, it is returned as is. Parameters ---------- data : arraylike The data read from an HDF5 file, that might be complex, to decode into the proper Numpy complex type. complex_names : tuple of 2 str and/or Nones, optional ``tuple`` of the names to use (in order) for the real and imaginary fields. A ``None`` indicates that various common field names should be tried. Returns ------- decoded data or data If `data` can be decoded into a complex type, the decoded complex version is returned. Otherwise, `data` is returned unchanged. See Also -------- encode_complex Notes ----- Currently looks for real field names of ``('r', 're', 'real')`` and imaginary field names of ``('i', 'im', 'imag', 'imaginary')`` ignoring case. """ # Now, complex types are stored in HDF5 files as an H5T_COMPOUND type # with fields along the lines of ('r', 're', 'real') and ('i', 'im', # 'imag', 'imaginary') for the real and imaginary parts, which most # likely won't be properly extracted back into making a Python # complex type unless the proper h5py configuration is set. Since we # can't depend on it being set and adjusting it is hazardous (the # setting is global), it is best to just decode it manually.
These # fields are obtained from the fields of its dtype. Obviously, if # there are no fields, then there is nothing to do. if data.dtype.fields is None: return data fields = list(data.dtype.fields) # If there aren't exactly two fields, then it can't be complex. if len(fields) != 2: return data # We need to grab the field names for the real and imaginary # parts. This will be done by seeing which list, if any, each field # is and setting variables to the proper name if it is in it (they # are initialized to None so that we know if one isn't found). real_fields = ['r', 're', 'real'] imag_fields = ['i', 'im', 'imag', 'imaginary'] cnames = list(complex_names) for s in fields: if s.lower() in real_fields: cnames[0] = s elif s.lower() in imag_fields: cnames[1] = s # If the real and imaginary fields were found, construct the complex # form from the fields. This is done by finding the complex type # that they cast to, making an array, and then setting the # parts. Otherwise, return what we were given because it isn't in # the right form. if cnames[0] is not None and cnames[1] is not None: cdata = np.result_type(data[cnames[0]].dtype, \ data[cnames[1]].dtype, 'complex64').type(data[cnames[0]]) cdata.imag = data[cnames[1]] return cdata else: return data def encode_complex(data, complex_names): """ Encodes complex data to have arbitrary complex field names. Encodes complex `data` to have the real and imaginary field names given in `complex_names`. This is needed because the field names have to be set so that it can be written to an HDF5 file with the right field names (HDF5 doesn't have a native complex type, so H5T_COMPOUND types have to be used). Parameters ---------- data : arraylike The data to encode as a complex type with the desired real and imaginary part field names. complex_names : tuple of 2 str ``tuple`` of the names to use (in order) for the real and imaginary fields. Returns ------- encoded data `data` encoded into having the specified field names for the real and imaginary parts. See Also -------- decode_complex """ # Grab the dtype name, and convert it to the right non-complex type # if it isn't already one. dtype_name = data.dtype.name if dtype_name[0:7] == 'complex': dtype_name = 'float' + str(int(float(dtype_name[7:])/2)) # Create the new version of the data with the right field names for # the real and imaginary parts. This is easy to do by putting the # right dtype in the view function. dt = np.dtype([(complex_names[0], dtype_name), (complex_names[1], dtype_name)]) return data.view(dt).copy() def get_attribute(target, name): """ Gets an attribute from a Dataset or Group. Gets the value of an Attribute if it is present (get ``None`` if not). Parameters ---------- target : Dataset or Group Dataset or Group to get the attribute of. name : str Name of the attribute to get. Returns ------- The value of the attribute if it is present, or ``None`` if it isn't. """ try: return target.attrs[name] except: return None def get_attribute_string(target, name): """ Gets a string attribute from a Dataset or Group. Gets the value of an Attribute that is a string if it is present (get ``None`` if it is not present or isn't a string type). Parameters ---------- target : Dataset or Group Dataset or Group to get the string attribute of. name : str Name of the attribute to get.
Returns ------- str or None The ``str`` value of the attribute if it is present, or ``None`` if it isn't or isn't a type that can be converted to ``str`` """ value = get_attribute(target, name) if value is None: return value elif (sys.hexversion >= 0x03000000 and isinstance(value, str)) \ or (sys.hexversion < 0x03000000 \ and isinstance(value, unicode)): return value elif isinstance(value, bytes): return value.decode() elif isinstance(value, np.unicode_): return str(value) elif isinstance(value, np.bytes_): return value.decode() else: return None def get_attribute_string_array(target, name): """ Gets a string array Attribute from a Dataset or Group. Gets the value of an Attribute that is a string array if it is present (get ``None`` if not). Parameters ---------- target : Dataset or Group Dataset or Group to get the attribute of. name : str Name of the string array Attribute to get. Returns ------- list of str or None The string array value of the Attribute if it is present, or ``None`` if it isn't. """ value = get_attribute(target, name) if value is None: return value return [convert_to_str(x) for x in value] def set_attribute(target, name, value): """ Sets an attribute on a Dataset or Group. If the attribute `name` doesn't exist yet, it is created. If it already exists, it is overwritten if it differs from `value`. Parameters ---------- target : Dataset or Group Dataset or Group to set the attribute of. name : str Name of the attribute to set. value : numpy type other than ``numpy.str_`` Value to set the attribute to. """ # use alias to speed up the code target_attributes = target.attrs if name not in target_attributes: target_attributes.create(name, value) elif name == 'MATLAB_fields': if not np.array_equal(value, read_matlab_fields_attribute( target_attributes)): target_attributes.create(name, value) else: attr = target_attributes[name] if attr.dtype != value.dtype \ or attr.shape != value.shape: target_attributes.create(name, value) elif np.any(attr != value): target_attributes.modify(name, value) def set_attribute_string(target, name, value): """ Sets an attribute to a string on a Dataset or Group. If the attribute `name` doesn't exist yet, it is created. If it already exists, it is overwritten if it differs from `value`. Parameters ---------- target : Dataset or Group Dataset or Group to set the string attribute of. name : str Name of the attribute to set. value : string Value to set the attribute to. Can be any sort of string type that will convert to a ``numpy.bytes_`` """ set_attribute(target, name, np.bytes_(value)) def set_attribute_string_array(target, name, string_list): """ Sets an attribute to an array of string on a Dataset or Group. If the attribute `name` doesn't exist yet, it is created. If it already exists, it is overwritten with the list of string `string_list` (they will be vlen strings). Parameters ---------- target : Dataset or Group Dataset or Group to set the string array attribute of. name : str Name of the attribute to set. string_list : list of str List of strings to set the attribute to. Strings must be ``str`` """ s_list = [convert_to_str(s) for s in string_list] if sys.hexversion >= 0x03000000: target.attrs.create(name, s_list, dtype=h5py.special_dtype(vlen=str)) else: target.attrs.create(name, s_list, dtype=h5py.special_dtype(vlen=unicode)) def del_attribute(target, name): """ Deletes an attribute on a Dataset or Group. If the attribute `name` exists, it is deleted. Parameters ---------- target : Dataset or Group Dataset or Group to delete attribute of. 
name : str Name of the attribute to delete. """ attr_manager = target.attrs if name in attr_manager: del attr_manager[name] hdf5storage-0.1.19/pyproject.toml000066400000000000000000000002541436247615200167440ustar00rootroot00000000000000[build-system] # Minimum requirements for the build system to execute. requires = ["setuptools", "wheel"] # PEP 508 specifications. build-backend = "setuptools.build_meta"hdf5storage-0.1.19/requirements.txt000066400000000000000000000007651436247615200173230ustar00rootroot00000000000000numpy<1.12.0 ; python_version == '2.6' numpy ; python_version == '2.7' numpy<1.6.0 ; python_version == '3.0' numpy<1.8.0 ; python_version == '3.1' numpy<1.12.0 ; python_version == '3.2' numpy<1.12.0 ; python_version == '3.3' numpy ; python_version >= '3.4' h5py>=2.1 ; python_version == '2.6' h5py>=2.1 ; python_version == '2.7' h5py>=2.1,<2.4 ; python_version == '3.0' h5py>=2.1,<2.7 ; python_version == '3.1' h5py>=2.1,<2.7 ; python_version == '3.2' h5py>=2.1 ; python_version >= '3.3' hdf5storage-0.1.19/requirements_doc.txt000066400000000000000000000000421436247615200201340ustar00rootroot00000000000000-r requirements.txt sphinx>=1.7 hdf5storage-0.1.19/requirements_tests.txt000066400000000000000000000001051436247615200205310ustar00rootroot00000000000000-r requirements.txt unittest2 ; python_version == '2.6' nose>=1.0 hdf5storage-0.1.19/setup.cfg000066400000000000000000000001371436247615200156510ustar00rootroot00000000000000[bdist_wheel] universal=1 [build_sphinx] all-files=1 build-dir=doc/build source-dir=doc/sourcehdf5storage-0.1.19/setup.py000066400000000000000000000051751436247615200155510ustar00rootroot00000000000000import sys if sys.hexversion < 0x2060000: raise NotImplementedError('Python < 2.6 not supported.') # Try to import setuptools and if that fails, use ez_setup to get it # (fallback for old versions of python if it isn't installed). try: from setuptools import setup except: try: import ez_setup ez_setup.use_setuptools() except: pass from setuptools import setup # If distutils.version.StrictVersion no longer exists, setuptools is # also a dependency to get version parsing. 
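# (For example, on newer interpreters where distutils is no longer
# available, the import below fails and 'setuptools' is added to the
# install requirements as a fallback.)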
try: from distutils.version import StrictVersion extra_deps = [] except: extra_deps = ['setuptools'] with open('README.rst') as file: long_description = file.read() setup(name='hdf5storage', version='0.1.19', description='Utilities to read/write Python types to/from HDF5 files, including MATLAB v7.3 MAT files.', long_description=long_description, author='Freja Nordsiek', author_email='fnordsie@posteo.net', url='https://github.com/frejanordsiek/hdf5storage', packages=['hdf5storage'], install_requires=["numpy < 1.12.0 ; python_version == '2.6'", "numpy ; python_version == '2.7'", "numpy<1.6.0 ; python_version == '3.0'", "numpy<1.8.0 ; python_version == '3.1'", "numpy<1.12.0 ; python_version == '3.2'", "numpy<1.12.0 ; python_version == '3.3'", "numpy ; python_version >= '3.4'", "h5py>=2.1 ; python_version == '2.6'", "h5py>=2.1 ; python_version == '2.7'", "h5py>=2.1,<2.4 ; python_version == '3.0'", "h5py>=2.1,<2.7 ; python_version == '3.1'", "h5py>=2.1,<2.7 ; python_version == '3.2'", "h5py>=2.1 ; python_version >= '3.3'"] + extra_deps, license='BSD', keywords='hdf5 matlab', classifiers=[ "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Development Status :: 3 - Alpha", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Intended Audience :: Developers", "Intended Audience :: Information Technology", "Intended Audience :: Science/Research", "Topic :: Scientific/Engineering", "Topic :: Database", "Topic :: Software Development :: Libraries :: Python Modules" ], test_suite='nose.collector', tests_require='nose>=1.0' ) hdf5storage-0.1.19/tests/000077500000000000000000000000001436247615200151715ustar00rootroot00000000000000hdf5storage-0.1.19/tests/asserts.py000066400000000000000000000437141436247615200172400ustar00rootroot00000000000000# Copyright (c) 2013-2016, Freja Nordsiek # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. import sys import collections import warnings import numpy as np import numpy.testing as npt def assert_dtypes_equal(a, b): # Check that two dtypes are equal, but ignorning itemsize for dtypes # whose shape is 0. 
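    # For example (roughly): plain dtypes are compared directly with ==,
    # while structured dtypes must have matching field names and each
    # field's dtype is then compared recursively.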
assert isinstance(a, np.dtype) assert a.shape == b.shape if b.names is None: assert a == b else: assert a.names == b.names for n in b.names: assert_dtypes_equal(a[n], b[n]) def assert_equal(a, b): # Compares a and b for equality. If they are dictionaries, they must # have the same set of keys, after which they values must all be # compared. If they are a collection type (list, tuple, set, # frozenset, or deque), they must have the same length and their # elements must be compared. If they are not numpy types (aren't # or don't inherit from np.generic or np.ndarray), then it is a # matter of just comparing them. Otherwise, their dtypes and shapes # have to be compared. Then, if they are not an object array, # numpy.testing.assert_equal will compare them elementwise. For # object arrays, each element must be iterated over to be compared. assert type(a) == type(b) if type(b) == dict: assert set(a.keys()) == set(b.keys()) for k in b: assert_equal(a[k], b[k]) elif type(b) in (list, tuple, set, frozenset, collections.deque): assert len(a) == len(b) if type(b) in (set, frozenset): assert a == b else: for index in range(0, len(a)): assert_equal(a[index], b[index]) elif not isinstance(b, (np.generic, np.ndarray)): with warnings.catch_warnings(): warnings.simplefilter('ignore', RuntimeWarning) if isinstance(b, complex): assert a.real == b.real \ or np.all(np.isnan([a.real, b.real])) assert a.imag == b.imag \ or np.all(np.isnan([a.imag, b.imag])) else: assert a == b or np.all(np.isnan([a, b])) else: assert_dtypes_equal(a.dtype, b.dtype) assert a.shape == b.shape if b.dtype.name != 'object': with warnings.catch_warnings(): warnings.simplefilter('ignore', RuntimeWarning) npt.assert_equal(a, b) else: for index, x in np.ndenumerate(a): assert_equal(a[index], b[index]) def assert_equal_none_format(a, b): # Compares a and b for equality. b is always the original. If they # are dictionaries, a must be a structured ndarray and they must # have the same set of keys, after which they values must all be # compared. If they are a collection type (list, tuple, set, # frozenset, or deque), then the compairison must be made with b # converted to an object array. If the original is not a numpy type # (isn't or doesn't inherit from np.generic or np.ndarray), then it # is a matter of converting it to the appropriate numpy # type. Otherwise, both are supposed to be numpy types. For object # arrays, each element must be iterated over to be compared. Then, # if it isn't a string type, then they must have the same dtype, # shape, and all elements. If it is an empty string, then it would # have been stored as just a null byte (recurse to do that # comparison). If it is a bytes_ type, the dtype, shape, and # elements must all be the same. If it is string_ type, we must # convert to uint32 and then everything can be compared. 
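    # For instance (roughly): a dict written in this format is expected
    # back as a structured ndarray whose field names are the dict's
    # keys, and a unicode string is expected back as its UTF-32 code
    # points (compared through a uint32 view).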
if type(b) == dict: assert type(a) == np.ndarray assert a.dtype.names is not None assert set(a.dtype.names) == set(b.keys()) for k in b: assert_equal_none_format(a[k][0], b[k]) elif type(b) in (list, tuple, set, frozenset, collections.deque): b_conv = np.zeros(dtype='object', shape=(len(b), )) for i, v in enumerate(b): b_conv[i] = v assert_equal_none_format(a, b_conv) elif not isinstance(b, (np.generic, np.ndarray)): if b is None: # It should be np.float64([]) assert type(a) == np.ndarray assert a.dtype == np.float64([]).dtype assert a.shape == (0, ) elif (sys.hexversion >= 0x03000000 \ and isinstance(b, (bytes, bytearray))) \ or (sys.hexversion < 0x03000000 \ and isinstance(b, (bytes, bytearray))): assert a == np.bytes_(b) elif (sys.hexversion >= 0x03000000 \ and isinstance(b, str)) \ or (sys.hexversion < 0x03000000 \ and isinstance(b, unicode)): assert_equal_none_format(a, np.unicode_(b)) elif (sys.hexversion >= 0x03000000 \ and type(b) == int) \ or (sys.hexversion < 0x03000000 \ and type(b) == long): assert_equal_none_format(a, np.int64(b)) else: assert_equal_none_format(a, np.array(b)[()]) elif isinstance(b, np.recarray): assert_equal_none_format(a, b.view(np.ndarray)) else: if b.dtype.name != 'object': if b.dtype.char in ('U', 'S'): if b.dtype.char == 'S' and b.shape == tuple() \ and len(b) == 0: assert_equal(a, \ np.zeros(shape=tuple(), dtype=b.dtype.char)) elif b.dtype.char == 'U': if b.shape == tuple() and len(b) == 0: c = np.uint32(()) else: c = np.atleast_1d(b).view(np.uint32) assert a.dtype == c.dtype assert a.shape == c.shape npt.assert_equal(a, c) else: assert a.dtype == b.dtype assert a.shape == b.shape npt.assert_equal(a, b) else: # Check that the dtype's shape matches. assert a.dtype.shape == b.dtype.shape # Now, if b.shape is just all ones, then a.shape will # just be (1,). Otherwise, we need to compare the shapes # directly. Also, dimensions need to be squeezed before # comparison in this case. assert np.prod(a.shape) == np.prod(b.shape) if a.shape != b.shape: assert np.prod(b.shape) == 1 assert a.shape == (1, ) if np.prod(a.shape) == 1: a = np.squeeze(a) b = np.squeeze(b) # If there was a null in the dtype or the dtype of one # of its fields (or subfields) has a 0 in its shape, # then it was written as a Group so the field order # could have changed. has_zero_shape = False if b.dtype.names is not None: parts = [b.dtype] while 0 != len(parts): part = parts.pop() if 0 in part.shape: has_zero_shape = True if part.names is not None: parts.extend([v[0] for v in part.fields.values()]) if part.base != part: parts.append(part.base) if b.dtype.names is not None \ and ('\\x00' in str(b.dtype) \ or has_zero_shape): assert a.shape == b.shape assert set(a.dtype.names) == set(b.dtype.names) for n in b.dtype.names: assert_equal_none_format(a[n], b[n]) else: assert a.dtype == b.dtype with warnings.catch_warnings(): warnings.simplefilter('ignore', RuntimeWarning) npt.assert_equal(a, b) else: # If the original is structued, it is possible that the # fields got out of order, in which case the dtype won't # quite match. It will need to be checked just to make sure # all pieces are there. Otherwise, the dtypes can be # directly compared. if b.dtype.fields is None: assert a.dtype == b.dtype else: assert dict(a.dtype.fields) == dict(b.dtype.fields) assert a.shape == b.shape for index, x in np.ndenumerate(a): assert_equal_none_format(a[index], b[index]) def assert_equal_matlab_format(a, b): # Compares a and b for equality. b is always the original. 
If they # are dictionaries, a must be a structured ndarray and they must # have the same set of keys, after which they values must all be # compared. If they are a collection type (list, tuple, set, # frozenset, or deque), then the compairison must be made with b # converted to an object array. If the original is not a numpy type # (isn't or doesn't inherit from np.generic or np.ndarray), then it # is a matter of converting it to the appropriate numpy # type. Otherwise, both are supposed to be numpy types. For object # arrays, each element must be iterated over to be compared. Then, # if it isn't a string type, then they must have the same dtype, # shape, and all elements. All strings are converted to numpy.str_ # on read. If it is empty, it has shape (1, 0). A numpy.str_ has all # of its strings per row compacted together. A numpy.bytes_ string # has to have the same thing done, but then it needs to be converted # up to UTF-32 and to numpy.str_ through uint32. # # In all cases, we expect things to be at least two dimensional # arrays. if type(b) == dict: assert type(a) == np.ndarray assert a.dtype.names is not None assert set(a.dtype.names) == set(b.keys()) for k in b: assert_equal_matlab_format(a[k][0], b[k]) elif type(b) in (list, tuple, set, frozenset, collections.deque): b_conv = np.zeros(dtype='object', shape=(len(b), )) for i, v in enumerate(b): b_conv[i] = v assert_equal_matlab_format(a, b_conv) elif not isinstance(b, (np.generic, np.ndarray)): if b is None: # It should be np.zeros(shape=(0, 1), dtype='float64')) assert type(a) == np.ndarray assert a.dtype == np.dtype('float64') assert a.shape == (1, 0) elif (sys.hexversion >= 0x03000000 \ and isinstance(b, (bytes, str, bytearray))) \ or (sys.hexversion < 0x03000000 \ and isinstance(b, (bytes, unicode, bytearray))): if len(b) == 0: assert_equal(a, np.zeros(shape=(1, 0), dtype='U')) elif isinstance(b, (bytes, bytearray)): assert_equal(a, np.atleast_2d(np.unicode_( \ b.decode('UTF-8')))) else: assert_equal(a, np.atleast_2d(np.unicode_(b))) elif (sys.hexversion >= 0x03000000 \ and type(b) == int) \ or (sys.hexversion < 0x03000000 \ and type(b) == long): assert_equal(a, np.atleast_2d(np.int64(b))) else: assert_equal(a, np.atleast_2d(np.array(b))) else: if b.dtype.name != 'object': if b.dtype.char in ('U', 'S'): if len(b) == 0 and (b.shape == tuple() \ or b.shape == (0, )): assert_equal(a, np.zeros(shape=(1, 0), dtype='U')) elif b.dtype.char == 'U': c = np.atleast_1d(b) c = np.atleast_2d(c.view(np.dtype('U' \ + str(c.shape[-1]*c.dtype.itemsize//4)))) assert a.dtype == c.dtype assert a.shape == c.shape npt.assert_equal(a, c) elif b.dtype.char == 'S': c = np.atleast_1d(b) c = c.view(np.dtype('S' \ + str(c.shape[-1]*c.dtype.itemsize))) c = np.uint32(c.view(np.ndarray).view(np.dtype('uint8'))) c = c.view(np.dtype('U' + str(c.shape[-1]))) c = np.atleast_2d(c) assert a.dtype == c.dtype assert a.shape == c.shape npt.assert_equal(a, c) pass else: c = np.atleast_2d(b) assert a.dtype == c.dtype assert a.shape == c.shape with warnings.catch_warnings(): warnings.simplefilter('ignore', RuntimeWarning) npt.assert_equal(a, c) else: c = np.atleast_2d(b) # An empty complex number gets turned into a real # number when it is stored. if np.prod(c.shape) == 0 \ and b.dtype.name.startswith('complex'): c = np.real(c) # If it is structured, check that the field names are # the same, in the same order, and then go through them # one by one. Otherwise, make sure the dtypes and shapes # are the same before comparing all values. 
if b.dtype.names is None and a.dtype.names is None: assert a.dtype == c.dtype assert a.shape == c.shape with warnings.catch_warnings(): warnings.simplefilter('ignore', RuntimeWarning) npt.assert_equal(a, c) else: assert a.dtype.names is not None assert b.dtype.names is not None assert set(a.dtype.names) == set(b.dtype.names) # The ordering of fields must be preserved if the # MATLAB_fields attribute could be used, which can # only be done if there are no non-ascii characters # in any of the field names. if sys.hexversion >= 0x03000000: allfields = ''.join(b.dtype.names) else: allfields = unicode('').join( \ [nm.decode('UTF-8') \ for nm in b.dtype.names]) if np.all(np.array([ord(ch) < 128 \ for ch in allfields])): assert a.dtype.names == b.dtype.names a = a.flatten() b = b.flatten() for k in b.dtype.names: for index, x in np.ndenumerate(a): assert_equal_from_matlab(a[k][index], b[k][index]) else: c = np.atleast_2d(b) assert a.dtype == c.dtype assert a.shape == c.shape for index, x in np.ndenumerate(a): assert_equal_matlab_format(a[index], c[index]) def assert_equal_from_matlab(a, b): # Compares a and b for equality. They are all going to be numpy # types. hdf5storage and scipy behave differently when importing # arrays as to whether they are 2D or not, so we will make them all # at least 2D regardless. For strings, the two packages produce # transposed results of each other, so one just needs to be # transposed. For object arrays, each element must be iterated over # to be compared. For structured ndarrays, their fields need to be # compared and then they can be compared element and field # wise. Otherwise, they can be directly compared. Note, the type is # often converted by scipy (or on route to the file before scipy # gets it), so comparisons are done by value, which is not perfect. a = np.atleast_2d(a) b = np.atleast_2d(b) if a.dtype.char == 'U': a = a.T if b.dtype.name == 'object': a = a.flatten() b = b.flatten() for index, x in np.ndenumerate(a): assert_equal_from_matlab(a[index], b[index]) elif b.dtype.names is not None or a.dtype.names is not None: assert a.dtype.names is not None assert b.dtype.names is not None assert set(a.dtype.names) == set(b.dtype.names) a = a.flatten() b = b.flatten() for k in b.dtype.names: for index, x in np.ndenumerate(a): assert_equal_from_matlab(a[k][index], b[k][index]) else: with warnings.catch_warnings(): warnings.simplefilter('ignore', RuntimeWarning) npt.assert_equal(a, b) hdf5storage-0.1.19/tests/make_mat_with_all_types.m000066400000000000000000000103101436247615200222270ustar00rootroot00000000000000% Copyright (c) 2013-2016, Freja Nordsiek % All rights reserved. % % Redistribution and use in source and binary forms, with or without % modification, are permitted provided that the following conditions are % met: % % 1. Redistributions of source code must retain the above copyright % notice, this list of conditions and the following disclaimer. % % 2. Redistributions in binary form must reproduce the above copyright % notice, this list of conditions and the following disclaimer in the % documentation and/or other materials provided with the distribution. % % THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS % "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT % LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR % A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT % HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, % SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT % LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, % DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY % THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT % (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE % OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. clear a % Main types as scalars and arrays. a.logical = true; a.uint8 = uint8(2); a.uint16 = uint16(28); a.uint32 = uint32(28347394); a.uint64 = uint64(234392); a.int8 = int8(-32); a.int16 = int16(284); a.int32 = int32(-7394); a.int64 = int64(2334322); a.single = single(4.2134e-2); a.single_complex = single(33.4 + 3i); a.single_nan = single(NaN); a.single_inf = single(inf); a.double = 14.2134e200; a.double_complex = 8e-30 - 3.2e40i; a.double_nan = NaN; a.double_inf = -inf; a.char = 'p'; a.logical_array = logical([1 0 0 0; 0 1 1 0]); a.uint8_array = uint8([0 1 3 4; 92 3 2 8]); a.uint16_array = uint16([0 1; 3 4; 92 3; 2 8]); a.uint32_array = uint32([0 1 3 4 92 3 2 8]); a.uint64_array = uint64([0; 1; 3; 4; 92; 3; 2; 8]); a.int8_array = int8([0 1 3 4; 92 3 2 8]); a.int16_array = int16([0 1; 3 4; 92 3; 2 8]); a.int32_array = int32([0 1 3 4 92 3 2 8]); a.int64_array = int64([0; 1; 3; 4; 92; 3; 2; 8]); a.single_array = single(rand(4, 9)); a.single_array_complex = single(rand(2,7) + 1i*rand(2,7)); a.double_array = rand(3, 2); a.double_array_complex = rand(5,2) + 1i*rand(5,2); a.char_array = ['ivkea'; 'avvai']; a.char_cell_array = {'v83nv', 'aADvai98v3'}; % Empties of main types. a.logical_empty = logical([]); a.uint8_empty = uint8([]); a.uint16_empty = uint16([]); a.uint32_empty = uint32([]); a.uint64_empty = uint64([]); a.int8_empty = int8([]); a.int16_empty = int16([]); a.int32_empty = int32([]); a.int64_empty = int64([]); a.single_empty = single([]); a.double_empty = []; % Main container types. a.cell = {5.34+9i}; a.cell_array = {1, [2 3]; 8.3, -[3; 3]; [], 20}; a.cell_empty = {}; a.struct = struct('a', {3.3}, 'bc', {[1 4 5]}); a.struct_empty = struct('vea', {}, 'b', {}); a.struct_array = struct('a', {3.3; 3}, 'avav_Ab', {[1 4 5]; []}); % % Function handles. % % ab = 1:6; % a.fhandle = @sin; % a.fhandle_args = @(x, y) x .* cos(y); % a.fhandle_args_environment = @(m, n) m*(b.*rand(size(b))) + n; % % % Map type. % % a.map_char = containers.Map({'4v', 'u', '2vn'}, {4, uint8(9), 'bafd'}); % a.map_single = containers.Map({single(3), single(38.3), single(2e-3)}, {4, uint8(9), 'bafd'}); % a.map_empty = containers.Map; % % % The categorical type. % % b = {'small', 'medium', 'small', 'medium', 'medium', 'large', 'medium'}; % c = {'small', 'medium', 'large'}; % d = round(2*rand(10,3)); % % a.categorical = categorical(b); % a.categorical_ordinal = categorical(b, c, 'Ordinal', true); % a.categorical_ordinal_int = categorical(d, 0:2, c, 'Ordinal', true); % % a.categorical_empty = categorical({}); % a.categorical_ordinal_empty = categorical({}, c, 'Ordinal', true); % a.categorical_ordinal_int_empty = categorical([], 0:2, c, 'Ordinal', true); % % % Tables. % % a.table = readtable('patients.dat'); % a.table_oneentry = a.table(1,:); % a.table_empty = a.table([], :); % % % Not doing time series yet. 
save('types_v7p3.mat','-struct','a','-v7.3') save('types_v7.mat','-struct','a','-v7') exit hdf5storage-0.1.19/tests/make_randoms.py000066400000000000000000000270021436247615200202040ustar00rootroot00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2013-2016, Freja Nordsiek # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. import sys import posixpath import string import random import warnings import numpy as np import numpy.random random.seed() # The dtypes that can be made dtypes = ['bool', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'float32', 'float64', 'complex64', 'complex128', 'S', 'U'] # Define the sizes of random datasets to use. max_string_length = 10 max_array_axis_length = 8 max_list_length = 6 max_posix_path_depth = 5 max_posix_path_lengths = 17 object_subarray_dimensions = 2 max_object_subarray_axis_length = 5 min_dict_keys = 4 max_dict_keys = 12 max_dict_key_length = 10 dict_value_subarray_dimensions = 2 max_dict_value_subarray_axis_length = 5 min_structured_ndarray_fields = 2 max_structured_ndarray_fields = 5 max_structured_ndarray_field_lengths = 10 max_structured_ndarray_axis_length = 2 structured_ndarray_subarray_dimensions = 2 max_structured_ndarray_subarray_axis_length = 4 def random_str_ascii_letters(length): # Makes a random ASCII str of the specified length. if sys.hexversion >= 0x03000000: ltrs = string.ascii_letters return ''.join([random.choice(ltrs) for i in range(0, length)]) else: ltrs = unicode(string.ascii_letters) return unicode('').join([random.choice(ltrs) for i in range(0, length)]) def random_str_ascii(length): # Makes a random ASCII str of the specified length. if sys.hexversion >= 0x03000000: ltrs = string.ascii_letters + string.digits return ''.join([random.choice(ltrs) for i in range(0, length)]) else: ltrs = unicode(string.ascii_letters + string.digits) return unicode('').join([random.choice(ltrs) for i in range(0, length)]) def random_str_some_unicode(length): # Makes a random ASCII+limited unicode str of the specified # length. 
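    # For example, the pool below is ten random ASCII letters/digits plus
    # Greek letters, so random_str_some_unicode(4) might return something
    # like 'aΩ3β'.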
ltrs = random_str_ascii(10) if sys.hexversion >= 0x03000000: ltrs += 'αβγδεζηθικλμνξοπρστυφχψωΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩς' c = '' else: ltrs += unicode('αβγδεζηθικλμνξοπρστυφχψω' + 'ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩς', 'utf-8') c = unicode('') return c.join([random.choice(ltrs) for i in range(0, length)]) def random_bytes(length): # Makes a random sequence of bytes of the specified length from # the ASCII set. ltrs = bytes(range(1, 127)) return bytes([random.choice(ltrs) for i in range(0, length)]) def random_bytes_fullrange(length): # Makes a random sequence of bytes of the specified length from # the ASCII set. ltrs = bytes(range(1, 255)) return bytes([random.choice(ltrs) for i in range(0, length)]) def random_int(): return random.randint(-(2**31 - 1), 2**31) def random_float(): return random.uniform(-1.0, 1.0) \ * 10.0**random.randint(-300, 300) def random_numpy(shape, dtype, allow_nan=True, allow_unicode=False): # Makes a random numpy array of the specified shape and dtype # string. The method is slightly different depending on the # type. For 'bytes', 'str', and 'object'; an array of the # specified size is made and then each element is set to either # a numpy.bytes_, numpy.str_, or some other object of any type # (here, it is a randomly typed random numpy array). If it is # any other type, then it is just a matter of constructing the # right sized ndarray from a random sequence of bytes (all must # be forced to 0 and 1 for bool). Optionally include unicode # characters. if dtype == 'S': length = random.randint(1, max_string_length) data = np.zeros(shape=shape, dtype='S' + str(length)) for index, x in np.ndenumerate(data): if allow_unicode: chars = random_bytes_fullrange(length) else: chars = random_bytes(length) data[index] = np.bytes_(chars) return data elif dtype == 'U': length = random.randint(1, max_string_length) data = np.zeros(shape=shape, dtype='U' + str(length)) for index, x in np.ndenumerate(data): if allow_unicode: chars = random_str_some_unicode(length) else: chars = random_str_ascii(length) data[index] = np.unicode_(chars) return data elif dtype == 'object': data = np.zeros(shape=shape, dtype='object') for index, x in np.ndenumerate(data): data[index] = random_numpy( \ shape=random_numpy_shape( \ object_subarray_dimensions, \ max_object_subarray_axis_length), \ dtype=random.choice(dtypes)) return data else: nbytes = np.ndarray(shape=(1,), dtype=dtype).nbytes bts = np.random.bytes(nbytes * np.prod(shape)) if dtype == 'bool': bts = b''.join([{True: b'\x01', False: b'\x00'}[ \ ch > 127] for ch in bts]) data = np.ndarray(shape=shape, dtype=dtype, buffer=bts) # If it is a floating point type and we are supposed to # remove NaN's, then turn them to zeros. Numpy will throw # RuntimeWarnings for some NaN values, so those warnings need to # be caught and ignored. if not allow_nan and data.dtype.kind in ('f', 'c'): data = data.copy() with warnings.catch_warnings(): warnings.simplefilter('ignore', RuntimeWarning) if data.dtype.kind == 'f': data[np.isnan(data)] = 0.0 else: data.real[np.isnan(data.real)] = 0.0 data.imag[np.isnan(data.imag)] = 0.0 return data def random_numpy_scalar(dtype): # How a random scalar is made depends on th type. For must, it # is just a single number. But for the string types, it is a # string of any length. 
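    # For example (roughly): random_numpy_scalar('S') might return
    # something like numpy.bytes_(b'q2Zf'), while
    # random_numpy_scalar('float64') returns a single numpy.float64 built
    # from random bytes (so it can even be NaN or infinite).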
if dtype == 'S': return np.bytes_(random_bytes(random.randint(1, max_string_length))) elif dtype == 'U': return np.unicode_(random_str_ascii( random.randint(1, max_string_length))) else: return random_numpy(tuple(), dtype)[()] def random_numpy_shape(dimensions, max_length): # Makes a random shape tuple having the specified number of # dimensions. The maximum size along each axis is max_length. return tuple([random.randint(1, max_length) for x in range(0, dimensions)]) def random_list(N, python_or_numpy='numpy'): # Makes a random list of the specified type. If instructed, it # will be composed entirely from random numpy arrays (make a # random object array and then convert that to a # list). Otherwise, it will be a list of random bytes. if python_or_numpy == 'numpy': return random_numpy((N,), dtype='object').tolist() else: data = [] for i in range(0, N): data.append(random_bytes(random.randint(1, max_string_length))) return data def random_dict(): # Makes a random dict (random number of randomized keys with # random numpy arrays as values). data = dict() for i in range(0, random.randint(min_dict_keys, \ max_dict_keys)): name = random_str_ascii(max_dict_key_length) data[name] = \ random_numpy(random_numpy_shape( \ dict_value_subarray_dimensions, \ max_dict_value_subarray_axis_length), \ dtype=random.choice(dtypes)) return data def random_structured_numpy_array(shape, field_shapes=None, nonascii_fields=False, names=None): # Make random field names (if not provided with field names), # dtypes, and sizes. Though, if field_shapes is explicitly given, # the sizes should be random. The field names must all be of type # str, not unicode in Python 2. Optionally include non-ascii # characters in the field names (will have to be encoded in Python # 2.x). String types will not be used due to the difficulty in # assigning the length. if names is None: if nonascii_fields: name_func = random_str_some_unicode else: name_func = random_str_ascii names = [name_func( max_structured_ndarray_field_lengths) for i in range(0, random.randint( min_structured_ndarray_fields, max_structured_ndarray_fields))] if sys.hexversion < 0x03000000: for i, name in enumerate(names): names[i] = name.encode('UTF-8') dts = [random.choice(list(set(dtypes) - set(('S', 'U')))) for i in range(len(names))] if field_shapes is None: shapes = [random_numpy_shape( structured_ndarray_subarray_dimensions, max_structured_ndarray_subarray_axis_length) for i in range(len(names))] else: shapes = [field_shapes] * len(names) # Construct the type of the whole thing. dt = np.dtype([(names[i], dts[i], shapes[i]) for i in range(len(names))]) # Make the array. If dt.itemsize is 0, then we need to make an # array of int8's the size in shape and convert it to work # around a numpy bug. Otherwise, we will just create an empty # array and then proceed by assigning each field. if dt.itemsize == 0: return np.zeros(shape=shape, dtype='int8').astype(dt) else: data = np.empty(shape=shape, dtype=dt) for index, x in np.ndenumerate(data): for i, name in enumerate(names): data[name][index] = random_numpy(shapes[i], \ dts[i], allow_nan=False) return data def random_name(): # Makes a random POSIX path of a random depth. 
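    # For example, this might produce something like '/a3F/Zk29qW': one
    # to max_posix_path_depth randomly named components joined under '/'.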
depth = random.randint(1, max_posix_path_depth) path = '/' for i in range(0, depth): path = posixpath.join(path, random_str_ascii( random.randint(1, max_posix_path_lengths))) return path hdf5storage-0.1.19/tests/read_write_mat.m000066400000000000000000000026201436247615200203350ustar00rootroot00000000000000% Copyright (c) 2013-2016, Freja Nordsiek % All rights reserved. % % Redistribution and use in source and binary forms, with or without % modification, are permitted provided that the following conditions are % met: % % 1. Redistributions of source code must retain the above copyright % notice, this list of conditions and the following disclaimer. % % 2. Redistributions in binary form must reproduce the above copyright % notice, this list of conditions and the following disclaimer in the % documentation and/or other materials provided with the distribution. % % THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS % "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT % LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR % A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT % HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, % SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT % LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, % DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY % THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT % (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE % OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. a = load('python_v7p3.mat'); save('python_v7.mat','-struct','a','-v7'); exit; hdf5storage-0.1.19/tests/test_hdf5_filters.py000066400000000000000000000236051436247615200211660ustar00rootroot00000000000000# Copyright (c) 2013-2016, Freja Nordsiek # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. import os import os.path import random import h5py import hdf5storage from nose.tools import raises from asserts import * from make_randoms import * random.seed() filename = 'data.mat' def check_read_filters(filters): # Read out the filter arguments. 
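    # For example, the test generators below pass dicts like
    # {'compression': 'gzip', 'shuffle': False, 'fletcher32': True,
    # 'gzip_level': 3}; anything missing falls back to the defaults in
    # filts, and for gzip the level is moved into h5py's
    # compression_opts argument.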
filts = {'compression': 'gzip', 'shuffle': True, 'fletcher32': True, 'gzip_level': 7} for k, v in filters.items(): filts[k] = v if filts['compression'] == 'gzip': filts['compression_opts'] = filts['gzip_level'] del filts['gzip_level'] # Make some random data. dims = random.randint(1, 4) data = random_numpy(shape=random_numpy_shape(dims, max_array_axis_length), dtype=random.choice(tuple( set(dtypes) - set(['U'])))) # Make a random name. name = random_name() # Write the data to the proper file with the given name with the # provided filters and read it backt. The file needs to be deleted # before and after to keep junk from building up. if os.path.exists(filename): os.remove(filename) try: with h5py.File(filename, mode='w') as f: f.create_dataset(name, data=data, chunks=True, **filts) out = hdf5storage.read(path=name, filename=filename, matlab_compatible=False) except: raise finally: if os.path.exists(filename): os.remove(filename) # Compare assert_equal(out, data) def check_write_filters(filters): # Read out the filter arguments. filts = {'compression': 'gzip', 'shuffle': True, 'fletcher32': True, 'gzip_level': 7} for k, v in filters.items(): filts[k] = v # Make some random data. The dtype must be restricted so that it can # be read back reliably. dims = random.randint(1, 4) dts = tuple(set(dtypes) - set(['U', 'S', 'bool', 'complex64', \ 'complex128'])) data = random_numpy(shape=random_numpy_shape(dims, max_array_axis_length), dtype=random.choice(dts)) # Make a random name. name = random_name() # Write the data to the proper file with the given name with the # provided filters and read it backt. The file needs to be deleted # before and after to keep junk from building up. if os.path.exists(filename): os.remove(filename) try: hdf5storage.write(data, path=name, filename=filename, \ store_python_metadata=False, matlab_compatible=False, \ compress=True, compress_size_threshold=0, \ compression_algorithm=filts['compression'], \ gzip_compression_level=filts['gzip_level'], \ shuffle_filter=filts['shuffle'], \ compressed_fletcher32_filter=filts['fletcher32']) with h5py.File(filename, mode='r') as f: d = f[name] fletcher32 = d.fletcher32 shuffle = d.shuffle compression = d.compression gzip_level = d.compression_opts out = d[...] except: raise finally: if os.path.exists(filename): os.remove(filename) # Check the filters assert fletcher32 == filts['fletcher32'] assert shuffle == filts['shuffle'] assert compression == filts['compression'] if filts['compression'] == 'gzip': assert gzip_level == filts['gzip_level'] # Compare assert_equal(out, data) def check_uncompressed_write_filters(method, uncompressed_fletcher32_filter, filters): # Read out the filter arguments. filts = {'compression': 'gzip', 'shuffle': True, 'fletcher32': True, 'gzip_level': 7} for k, v in filters.items(): filts[k] = v # Make some random data. The dtype must be restricted so that it can # be read back reliably. dims = random.randint(1, 4) dts = tuple(set(dtypes) - set(['U', 'S', 'bool', 'complex64', \ 'complex128'])) data = random_numpy(shape=random_numpy_shape(dims, max_array_axis_length), dtype=random.choice(dts)) # Make a random name. name = random_name() # Make the options to disable compression by the method specified, # which is either that it is outright disabled or that the data is # smaller than the compression threshold. 
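    # For example, 'compression_disabled' maps to compress=False with a
    # zero size threshold, while 'data_too_small' keeps compress=True but
    # sets compress_size_threshold just above data.nbytes so the array is
    # never compressed.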
if method == 'compression_disabled': opts = {'compress': False, 'compress_size_threshold': 0} else: opts = {'compress': True, 'compress_size_threshold': data.nbytes + 1} # Write the data to the proper file with the given name with the # provided filters and read it backt. The file needs to be deleted # before and after to keep junk from building up. if os.path.exists(filename): os.remove(filename) try: hdf5storage.write(data, path=name, filename=filename, \ store_python_metadata=False, matlab_compatible=False, \ compression_algorithm=filts['compression'], \ gzip_compression_level=filts['gzip_level'], \ shuffle_filter=filts['shuffle'], \ compressed_fletcher32_filter=filts['fletcher32'], \ uncompressed_fletcher32_filter= \ uncompressed_fletcher32_filter, \ **opts) with h5py.File(filename, mode='r') as f: d = f[name] fletcher32 = d.fletcher32 shuffle = d.shuffle compression = d.compression gzip_level = d.compression_opts out = d[...] except: raise finally: if os.path.exists(filename): os.remove(filename) # Check the filters assert compression == None assert shuffle == False assert fletcher32 == uncompressed_fletcher32_filter # Compare assert_equal(out, data) def test_read_filtered_data(): for compression in ('gzip', 'lzf'): for shuffle in (True, False): for fletcher32 in (True, False): if compression != 'gzip': filters = {'compression': compression, 'shuffle': shuffle, 'fletcher32': fletcher32} yield check_read_filters, filters else: for level in range(10): filters = {'compression': compression, 'shuffle': shuffle, 'fletcher32': fletcher32, 'gzip_level': level} yield check_read_filters, filters def test_write_filtered_data(): for compression in ('gzip', 'lzf'): for shuffle in (True, False): for fletcher32 in (True, False): if compression != 'gzip': filters = {'compression': compression, 'shuffle': shuffle, 'fletcher32': fletcher32} yield check_read_filters, filters else: for level in range(10): filters = {'compression': compression, 'shuffle': shuffle, 'fletcher32': fletcher32, 'gzip_level': level} yield check_write_filters, filters def test_uncompressed_write_filtered_data(): for method in ('compression_disabled', 'data_too_small'): for uncompressed_fletcher32_filter in (True, False): for compression in ('gzip', 'lzf'): for shuffle in (True, False): for fletcher32 in (True, False): if compression != 'gzip': filters = {'compression': compression, 'shuffle': shuffle, 'fletcher32': fletcher32} yield check_read_filters, filters else: for level in range(10): filters = {'compression': compression, 'shuffle': shuffle, 'fletcher32': fletcher32, 'gzip_level': level} yield check_uncompressed_write_filters,\ method, uncompressed_fletcher32_filter,\ filters hdf5storage-0.1.19/tests/test_matlab_compatibility.py000066400000000000000000000072101436247615200227730ustar00rootroot00000000000000# Copyright (c) 2014-2016, Freja Nordsiek # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. 
# # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. import os import os.path import subprocess from nose.plugins.skip import SkipTest import hdf5storage from asserts import * mat_files = ['types_v7p3.mat', 'types_v7.mat', 'python_v7p3.mat', 'python_v7.mat'] for i in range(0, len(mat_files)): mat_files[i] = os.path.join(os.path.dirname(__file__), mat_files[i]) script_names = ['make_mat_with_all_types.m', 'read_write_mat.m'] for i in range(0, len(script_names)): script_names[i] = os.path.join(os.path.dirname(__file__), script_names[i]) types_v7 = dict() types_v7p3 = dict() python_v7 = dict() python_v7p3 = dict() # Have a flag for whether matlab was found and run successfully or not, # so tests can be skipped if not. ran_matlab_successful = [False] def setup_module(): teardown_module() try: import scipy.io matlab_command = "run('" + script_names[0] + "')" subprocess.check_call(['matlab', '-nosplash', '-nodesktop', '-nojvm', '-r', matlab_command]) scipy.io.loadmat(file_name=mat_files[1], mdict=types_v7) hdf5storage.loadmat(file_name=mat_files[0], mdict=types_v7p3) hdf5storage.savemat(file_name=mat_files[2], mdict=types_v7p3) matlab_command = "run('" + script_names[1] + "')" subprocess.check_call(['matlab', '-nosplash', '-nodesktop', '-nojvm', '-r', matlab_command]) scipy.io.loadmat(file_name=mat_files[3], mdict=python_v7) hdf5storage.loadmat(file_name=mat_files[2], mdict=python_v7p3) except: pass else: ran_matlab_successful[0] = True def teardown_module(): for name in mat_files: if os.path.exists(name): os.remove(name) def test_read_from_matlab(): if not ran_matlab_successful[0]: raise SkipTest for k in (set(types_v7.keys()) - set(['__version__', '__header__', \ '__globals__'])): yield check_variable_from_matlab, k def test_to_matlab_back(): if not ran_matlab_successful[0]: raise SkipTest for k in set(types_v7p3.keys()): yield check_variable_to_matlab_back, k def check_variable_from_matlab(name): assert_equal_from_matlab(types_v7p3[name], types_v7[name]) def check_variable_to_matlab_back(name): assert_equal_from_matlab(python_v7p3[name], types_v7[name]) hdf5storage-0.1.19/tests/test_multi_io.py000066400000000000000000000071051436247615200204260ustar00rootroot00000000000000# Copyright (c) 2013-2016, Freja Nordsiek # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. 
# # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. import os import os.path import random import hdf5storage from asserts import * from make_randoms import * random.seed() filename = 'data.mat' # A series of tests to make sure that more than one data item can be # written or read at a time using the writes and reads functions. def test_multi_write(): # Makes a random dict of random paths and variables (random number # of randomized paths with random numpy arrays as values). data = dict() for i in range(0, random.randint(min_dict_keys, \ max_dict_keys)): name = random_name() data[name] = \ random_numpy(random_numpy_shape( \ dict_value_subarray_dimensions, \ max_dict_value_subarray_axis_length), \ dtype=random.choice(dtypes)) # Write it and then read it back item by item. if os.path.exists(filename): os.remove(filename) try: hdf5storage.writes(mdict=data, filename=filename) out = dict() for p in data: out[p] = hdf5storage.read(path=p, filename=filename) except: raise finally: if os.path.exists(filename): os.remove(filename) # Compare data and out. assert_equal(out, data) def test_multi_read(): # Makes a random dict of random paths and variables (random number # of randomized paths with random numpy arrays as values). data = dict() for i in range(0, random.randint(min_dict_keys, \ max_dict_keys)): name = random_name() data[name] = \ random_numpy(random_numpy_shape( \ dict_value_subarray_dimensions, \ max_dict_value_subarray_axis_length), \ dtype=random.choice(dtypes)) paths = data.keys() # Write it item by item and then read it back in one unit. if os.path.exists(filename): os.remove(filename) try: for p in paths: hdf5storage.write(data=data[p], path=p, filename=filename) out = hdf5storage.reads(paths=list(data.keys()), filename=filename) except: raise finally: if os.path.exists(filename): os.remove(filename) # Compare data and out. for i, p in enumerate(paths): assert_equal(out[i], data[p]) hdf5storage-0.1.19/tests/test_ndarray_O_field.py000066400000000000000000000066411436247615200216720ustar00rootroot00000000000000# Copyright (c) 2013-2016, Freja Nordsiek # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. 
# # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. import os import os.path import numpy as np import h5py import hdf5storage filename = 'data.mat' # A series of tests to make sure that structured ndarrays with a field # that has an object dtype are written like structs (are HDF5 Groups) # but are written as an HDF5 COMPOUND Dataset otherwise (even in the # case that a field's name is 'O'). def test_O_field_compound(): name = '/a' data = np.empty(shape=(1, ), dtype=[('O', 'int8'), ('a', 'uint16')]) if os.path.exists(filename): os.remove(filename) try: hdf5storage.write(data, path=name, filename=filename, matlab_compatible=False, structured_numpy_ndarray_as_struct=False) with h5py.File(filename, mode='r') as f: assert isinstance(f[name], h5py.Dataset) except: raise finally: if os.path.exists(filename): os.remove(filename) def test_object_field_group(): name = '/a' data = np.empty(shape=(1, ), dtype=[('a', 'O'), ('b', 'uint16')]) data['a'][0] = [1, 2] if os.path.exists(filename): os.remove(filename) try: hdf5storage.write(data, path=name, filename=filename, matlab_compatible=False, structured_numpy_ndarray_as_struct=False) with h5py.File(filename, mode='r') as f: assert isinstance(f[name], h5py.Group) except: raise finally: if os.path.exists(filename): os.remove(filename) def test_O_and_object_field_group(): name = '/a' data = np.empty(shape=(1, ), dtype=[('a', 'O'), ('O', 'uint16')]) data['a'][0] = [1, 2] if os.path.exists(filename): os.remove(filename) try: hdf5storage.write(data, path=name, filename=filename, matlab_compatible=False, structured_numpy_ndarray_as_struct=False) with h5py.File(filename, mode='r') as f: assert isinstance(f[name], h5py.Group) except: raise finally: if os.path.exists(filename): os.remove(filename) hdf5storage-0.1.19/tests/test_string_utf16_conversion.py000066400000000000000000000047261436247615200234130ustar00rootroot00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2013-2016, Freja Nordsiek # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. import sys import os import os.path import tempfile import numpy as np import h5py import hdf5storage import nose.tools # A test to make sure that the following are written as UTF-16 # (uint16) if they don't contain doublets and the # convert_numpy_str_to_utf16 option is set. # # * str # * numpy.unicode_ scalars def check_conv_utf16(tp): name = '/a' data = tp('abcdefghijklmnopqrstuvwxyz') fld = None try: fld = tempfile.mkstemp() os.close(fld[0]) filename = fld[1] hdf5storage.write(data, path=name, filename=filename, matlab_compatible=False, store_python_metadata=False, convert_numpy_str_to_utf16=True) with h5py.File(filename, mode='r') as f: nose.tools.assert_equal(f[name].dtype.type, np.uint16) except: raise finally: if fld is not None: os.remove(fld[1]) def test_conv_utf16(): if sys.hexversion < 0x3000000: tps = (unicode, np.unicode_) else: tps = (str, np.unicode_) for tp in tps: yield check_conv_utf16, tp hdf5storage-0.1.19/tests/test_write_readback.py000066400000000000000000000715551436247615200215650ustar00rootroot00000000000000# Copyright (c) 2013-2016, Freja Nordsiek # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. import sys import copy import os import os.path import math import random import collections import numpy as np import numpy.random import hdf5storage from nose.tools import raises from asserts import * from make_randoms import * random.seed() class TestPythonMatlabFormat(object): # Test for the ability to write python types to an HDF5 file that # type information and matlab information are stored in, and then # read it back and have it be the same. 
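    # A minimal sketch of the round trip these tests exercise (the literal
    # dict, path, and file name here are illustrative only):
    #
    #     import hdf5storage
    #     options = hdf5storage.Options()  # Python metadata + MATLAB compatibility
    #     hdf5storage.write({'a': 1}, path='/x', filename='data.mat',
    #                       options=options)
    #     out = hdf5storage.read(path='/x', filename='data.mat',
    #                            options=options)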
def __init__(self): self.filename = 'data.mat' self.options = hdf5storage.Options() # Need a list of the supported numeric dtypes to test, excluding # those not supported by MATLAB. 'S' and 'U' dtype chars have to # be used for the bare byte and unicode string dtypes since the # dtype strings (but not chars) are not the same in Python 2 and # 3. self.dtypes = ['bool', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'float32', 'float64', 'complex64', 'complex128', 'S', 'U'] def write_readback(self, data, name, options, read_options=None): # Write the data to the proper file with the given name, read it # back, and return the result. The file needs to be deleted # before and after to keep junk from building up. Different # options can be used for reading the data back. if os.path.exists(self.filename): os.remove(self.filename) try: hdf5storage.write(data, path=name, filename=self.filename, options=options) out = hdf5storage.read(path=name, filename=self.filename, options=read_options) except: raise finally: if os.path.exists(self.filename): os.remove(self.filename) return out def assert_equal(self, a, b): assert_equal(a, b) def check_numpy_scalar(self, dtype): # Makes a random numpy scalar of the given type, writes it and # reads it back, and then compares it. data = random_numpy_scalar(dtype) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_array(self, dtype, dimensions): # Makes a random numpy array of the given type, writes it and # reads it back, and then compares it. shape = random_numpy_shape(dimensions, max_array_axis_length) data = random_numpy(shape, dtype) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_empty(self, dtype): # Makes an empty numpy array of the given type, writes it and # reads it back, and then compares it. data = np.array([], dtype) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_stringlike_empty(self, dtype, num_chars): # Makes an empty stringlike numpy array of the given type and # size, writes it and reads it back, and then compares it. data = np.array([], dtype + str(num_chars)) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_structured_array(self, dimensions): # Makes a random structured ndarray of the given type, writes it # and reads it back, and then compares it. shape = random_numpy_shape(dimensions, \ max_structured_ndarray_axis_length) data = random_structured_numpy_array(shape) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_structured_array_empty(self, dimensions): # Makes a random structured ndarray of the given type, writes it # and reads it back, and then compares it. shape = random_numpy_shape(dimensions, \ max_structured_ndarray_axis_length) data = random_structured_numpy_array(shape, (1, 0)) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_structured_array_field_special_char(self, ch): # Makes a random 1d structured ndarray with the character # in one field, writes it and reads it back, and then compares # it. 
field_names = [random_str_ascii(max_dict_key_length) for i in range(2)] field_names[1] = field_names[1][0] + ch + field_names[1][1:] if sys.hexversion < 0x03000000: for i in range(len(field_names)): field_names[i] = field_names[i].encode('UTF-8') shape = random_numpy_shape(1, \ max_structured_ndarray_axis_length) data = random_structured_numpy_array(shape, names=field_names) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_matrix(self, dtype): # Makes a random numpy array of the given type, converts it to # a matrix, writes it and reads it back, and then compares it. shape = random_numpy_shape(2, max_array_axis_length) data = np.matrix(random_numpy(shape, dtype)) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_recarray(self, dimensions): # Makes a random structured ndarray of the given type, converts # it to a recarray, writes it and reads it back, and then # compares it. shape = random_numpy_shape(dimensions, \ max_structured_ndarray_axis_length) data = random_structured_numpy_array(shape).view(np.recarray).copy() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_recarray_empty(self, dimensions): # Makes a random structured ndarray of the given type, converts # it to a recarray, writes it and reads it back, and then # compares it. shape = random_numpy_shape(dimensions, \ max_structured_ndarray_axis_length) data = random_structured_numpy_array(shape, (1, 0)).view(np.recarray).copy() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_recarray_field_special_char(self, ch): # Makes a random 1d structured ndarray with the character # in one field, converts it to a recarray, writes it and reads # it back, and then compares it. field_names = [random_str_ascii(max_dict_key_length) for i in range(2)] field_names[1] = field_names[1][0] + ch + field_names[1][1:] if sys.hexversion < 0x03000000: for i in range(len(field_names)): field_names[i] = field_names[i].encode('UTF-8') shape = random_numpy_shape(1, \ max_structured_ndarray_axis_length) data = random_structured_numpy_array(shape, names=field_names).view(np.recarray).copy() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_chararray(self, dimensions): # Makes a random numpy array of bytes, converts it to a # chararray, writes it and reads it back, and then compares it. shape = random_numpy_shape(dimensions, max_array_axis_length) data = random_numpy(shape, 'S').view(np.chararray).copy() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_chararray_empty(self, num_chars): # Makes an empty numpy array of bytes of the given number of # characters, converts it to a chararray, writes it and reads it # back, and then compares it. 
data = np.array([], 'S' + str(num_chars)).view(np.chararray).copy() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_sized_dtype_nested_0(self, zero_shaped): dtypes = ('uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'float32', 'float64', 'complex64', 'complex128') for i in range(10): dt = (random.choice(dtypes), (2, 2 * zero_shaped)) data = np.zeros((2, ), dtype=dt) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_sized_dtype_nested_1(self, zero_shaped): dtypes = ('uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'float32', 'float64', 'complex64', 'complex128') for i in range(10): dt = [('a', random.choice(dtypes), (1, 2)), ('b', random.choice(dtypes), (1, 1, 4 * zero_shaped)), ('c', [('a', random.choice(dtypes)), ('b', random.choice(dtypes), (1, 2))])] data = np.zeros((random.randrange(1, 4), ), dtype=dt) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_sized_dtype_nested_2(self, zero_shaped): dtypes = ('uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'float32', 'float64', 'complex64', 'complex128') for i in range(10): dt = [('a', random.choice(dtypes), (1, 3)), ('b', [('a', random.choice(dtypes), (2, )), ('b', random.choice(dtypes), (1, 2, 1))]), ('c', [('a', random.choice(dtypes), (3 * zero_shaped, 1)), ('b', random.choice(dtypes), (2, ))], (2, 1))] data = np.zeros((2, ), dtype=dt) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_numpy_sized_dtype_nested_3(self, zero_shaped): dtypes = ('uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'float32', 'float64', 'complex64', 'complex128') for i in range(10): dt = [('a', random.choice(dtypes), (3, 2)), ('b', [('a', [('a', random.choice(dtypes))], (2, 2)), ('b', random.choice(dtypes), (1, 2))]), ('c', [('a', [('a', random.choice(dtypes), (2, 1, zero_shaped * 2))]), ('b', random.choice(dtypes))])] data = np.zeros((1, ), dtype=dt) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def check_python_collection(self, tp, same_dims): # Makes a random collection of the specified type, writes it and # reads it back, and then compares it. 
if tp in (set, frozenset): data = tp(random_list(max_list_length, python_or_numpy='python')) else: if same_dims == 'same-dims': shape = random_numpy_shape(random.randrange(2, 4), random.randrange(1, 4)) dtypes = ('uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'float32', 'float64', 'complex64', 'complex128') data = tp([random_numpy(shape, random.choice(dtypes), allow_nan=True) for i in range(random.randrange(2, 7))]) elif same_dims == 'diff-dims': data = tp(random_list(max_list_length, python_or_numpy='numpy')) else: raise ValueError('invalid value of same_dims') out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_None(self): data = None out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_bool_True(self): data = True out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_bool_False(self): data = False out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_int_needs_32_bits(self): data = random_int() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_int_needs_64_bits(self): data = (2**32) * random_int() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) # Only relevant in Python 2.x. def test_long_needs_32_bits(self): if sys.hexversion < 0x03000000: data = long(random_int()) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) # Only relevant in Python 2.x. def test_long_needs_64_bits(self): if sys.hexversion < 0x03000000: data = long(2)**32 * long(random_int()) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) @raises(NotImplementedError) def test_int_or_long_too_big(self): if sys.hexversion >= 0x03000000: data = 2**64 * random_int() else: data = long(2)**64 * long(random_int()) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_float(self): data = random_float() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_float_inf(self): data = float(np.inf) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_float_ninf(self): data = float(-np.inf) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_float_nan(self): data = float(np.nan) out = self.write_readback(data, random_name(), self.options) assert math.isnan(out) def test_complex(self): data = random_float() + 1j*random_float() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_complex_real_nan(self): data = complex(np.nan, random_float()) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_complex_imaginary_nan(self): data = complex(random_float(), np.nan) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_str_ascii(self): data = random_str_ascii(random.randint(1, max_string_length)) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) @raises(NotImplementedError) def test_str_ascii_encoded_utf8(self): ltrs = string.ascii_letters + string.digits data = 'a' if sys.hexversion < 0x03000000: data = unicode(data) ltrs = unicode(ltrs) while all([(c in ltrs) for c in data]): data = 
random_str_some_unicode(random.randint(1, \ max_string_length)) data = data.encode('utf-8') out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_str_unicode(self): data = random_str_some_unicode(random.randint(1, max_string_length)) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_str_empty(self): data = '' out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_bytes(self): data = random_bytes(random.randint(1, max_string_length)) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_bytes_empty(self): data = b'' out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_bytearray(self): data = bytearray(random_bytes(random.randint(1, max_string_length))) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_bytearray_empty(self): data = bytearray(b'') out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) def test_numpy_scalar(self): for dt in self.dtypes: yield self.check_numpy_scalar, dt def test_numpy_array_1d(self): dts = copy.deepcopy(self.dtypes) dts.append('object') for dt in dts: yield self.check_numpy_array, dt, 1 def test_numpy_array_2d(self): dts = copy.deepcopy(self.dtypes) dts.append('object') for dt in dts: yield self.check_numpy_array, dt, 2 def test_numpy_array_3d(self): dts = copy.deepcopy(self.dtypes) dts.append('object') for dt in dts: yield self.check_numpy_array, dt, 3 def test_numpy_matrix(self): dts = copy.deepcopy(self.dtypes) dts.append('object') for dt in dts: yield self.check_numpy_matrix, dt def test_numpy_empty(self): for dt in self.dtypes: yield self.check_numpy_empty, dt def test_numpy_stringlike_empty(self): dts = ['S', 'U'] for dt in dts: for n in range(1,10): yield self.check_numpy_stringlike_empty, dt, n def test_numpy_structured_array(self): for i in range(1, 4): yield self.check_numpy_structured_array, i def test_numpy_structured_array_empty(self): for i in range(1, 4): yield self.check_numpy_structured_array_empty, i def test_numpy_structured_array_unicode_fields(self): # Makes a random 1d structured ndarray with non-ascii characters # in its fields, writes it and reads it back, and then compares # it. shape = random_numpy_shape(1, \ max_structured_ndarray_axis_length) data = random_structured_numpy_array(shape, \ nonascii_fields=True).view(np.recarray).copy() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) @raises(NotImplementedError) def test_numpy_structured_array_field_null_character(self): self.check_numpy_structured_array_field_special_char('\x00') @raises(NotImplementedError) def test_numpy_structured_array_field_forward_slash(self): self.check_numpy_structured_array_field_special_char('/') def test_numpy_recarray(self): for i in range(1, 4): yield self.check_numpy_recarray, i def test_numpy_recarray_empty(self): for i in range(1, 4): yield self.check_numpy_recarray_empty, i def test_numpy_recarray_unicode_fields(self): # Makes a random 1d structured ndarray with non-ascii characters # in its fields, converts it to a recarray, writes it and reads # it back, and then compares it. 
shape = random_numpy_shape(1, \ max_structured_ndarray_axis_length) data = random_structured_numpy_array(shape, nonascii_fields=True) out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) @raises(NotImplementedError) def test_numpy_recarray_field_null_character(self): self.check_numpy_recarray_field_special_char('\x00') @raises(NotImplementedError) def test_numpy_recarray_field_forward_slash(self): self.check_numpy_recarray_field_special_char('/') def test_numpy_chararray(self): dims = range(1, 4) for dim in dims: yield self.check_numpy_chararray, dim def test_numpy_chararray_empty(self): for n in range(1, 10): yield self.check_numpy_chararray_empty, n def test_numpy_sized_dtype_nested_0(self): for zero_shaped in (False, True): yield self.check_numpy_sized_dtype_nested_0, zero_shaped def test_numpy_sized_dtype_nested_1(self): for zero_shaped in (False, True): yield self.check_numpy_sized_dtype_nested_1, zero_shaped def test_numpy_sized_dtype_nested_2(self): for zero_shaped in (False, True): yield self.check_numpy_sized_dtype_nested_2, zero_shaped def test_numpy_sized_dtype_nested_3(self): for zero_shaped in (False, True): yield self.check_numpy_sized_dtype_nested_3, zero_shaped def test_python_collection(self): for tp in (list, tuple, set, frozenset, collections.deque): yield self.check_python_collection, tp, 'same-dims' yield self.check_python_collection, tp, 'diff-dims' def test_dict(self): data = random_dict() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) @raises(NotImplementedError) def test_dict_bytes_key(self): data = random_dict() key = random_bytes(max_dict_key_length) data[key] = random_int() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) @raises(NotImplementedError) def test_dict_key_null_character(self): data = random_dict() if sys.hexversion >= 0x03000000: ch = '\x00' else: ch = unicode('\x00') key = ch.join([random_str_ascii(max_dict_key_length) for i in range(2)]) data[key] = random_int() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) @raises(NotImplementedError) def test_dict_key_forward_slash(self): data = random_dict() if sys.hexversion >= 0x03000000: ch = '/' else: ch = unicode('/') key = ch.join([random_str_ascii(max_dict_key_length) for i in range(2)]) data[key] = random_int() out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) class TestPythonFormat(TestPythonMatlabFormat): def __init__(self): # The parent does most of the setup. All that has to be changed # is turning MATLAB compatibility off and changing the file # name. TestPythonMatlabFormat.__init__(self) self.options = hdf5storage.Options(matlab_compatible=False) self.filename = 'data.h5' # Won't throw an exception unlike the parent. def test_str_ascii_encoded_utf8(self): ltrs = string.ascii_letters + string.digits data = 'a' if sys.hexversion < 0x03000000: data = unicode(data) ltrs = unicode(ltrs) while all([(c in ltrs) for c in data]): data = random_str_some_unicode(random.randint(1, \ max_string_length)) data = data.encode('utf-8') out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) # Won't throw an exception unlike the parent. def test_numpy_structured_array_field_forward_slash(self): self.check_numpy_structured_array_field_special_char('/') # Won't throw an exception unlike the parent. 
def test_numpy_recarray_field_forward_slash(self): self.check_numpy_recarray_field_special_char('/') class TestNoneFormat(TestPythonMatlabFormat): def __init__(self): # The parent does most of the setup. All that has to be changed # is turning off the storage of type information as well as # MATLAB compatibility. TestPythonMatlabFormat.__init__(self) self.options = hdf5storage.Options(store_python_metadata=False, matlab_compatible=False) # Add in float16 to the set of types tested. self.dtypes.append('float16') # Won't throw an exception unlike the parent. def test_str_ascii_encoded_utf8(self): ltrs = string.ascii_letters + string.digits data = 'a' if sys.hexversion < 0x03000000: data = unicode(data) ltrs = unicode(ltrs) while all([(c in ltrs) for c in data]): data = random_str_some_unicode(random.randint(1, \ max_string_length)) data = data.encode('utf-8') out = self.write_readback(data, random_name(), self.options) self.assert_equal(out, data) # Won't throw an exception unlike the parent. def test_numpy_structured_array_field_forward_slash(self): self.check_numpy_structured_array_field_special_char('/') # Won't throw an exception unlike the parent. def test_numpy_recarray_field_forward_slash(self): self.check_numpy_recarray_field_special_char('/') def assert_equal(self, a, b): assert_equal_none_format(a, b) class TestMatlabFormat(TestPythonMatlabFormat): def __init__(self): # The parent does most of the setup. All that has to be changed # is turning on the matlab compatibility, and changing the # filename. TestPythonMatlabFormat.__init__(self) self.options = hdf5storage.Options(store_python_metadata=False, matlab_compatible=True) self.filename = 'data.mat' def assert_equal(self, a, b): assert_equal_matlab_format(a, b)
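# A small, self-contained sketch (separate from the test classes above) of how
# the storage configurations they exercise map onto the hdf5storage API. The
# helper name, file name, path, and example data are illustrative assumptions,
# not part of the package.
def _example_format_options():
    import os

    import hdf5storage

    # The four Options configurations mirrored by the test classes above.
    configurations = {
        'python+matlab': hdf5storage.Options(),
        'python only': hdf5storage.Options(matlab_compatible=False),
        'no metadata': hdf5storage.Options(store_python_metadata=False,
                                           matlab_compatible=False),
        'matlab only': hdf5storage.Options(store_python_metadata=False,
                                           matlab_compatible=True),
    }

    data = {u'x': 3.14}  # unicode key so this also works under Python 2
    results = {}
    for label, options in configurations.items():
        # Write to a scratch file and read the data back with the same
        # options, removing the file afterwards (mirroring the pattern used
        # throughout these tests).
        filename = 'example_data.mat'
        if os.path.exists(filename):
            os.remove(filename)
        try:
            hdf5storage.write(data, path='/example', filename=filename,
                              options=options)
            results[label] = hdf5storage.read(path='/example',
                                              filename=filename,
                                              options=options)
        finally:
            if os.path.exists(filename):
                os.remove(filename)
    return results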