h5netcdf-0.7.1/0000755131111500116100000000000013443342124013317 5ustar shoyereng00000000000000h5netcdf-0.7.1/LICENSE0000644131111500116100000000273313263424647014344 0ustar shoyereng00000000000000Copyright (c) 2015, Stephan Hoyer All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. h5netcdf-0.7.1/MANIFEST.in0000644131111500116100000000006613263424647015072 0ustar shoyereng00000000000000include LICENSE recursive-include h5netcdf/tests *.py h5netcdf-0.7.1/PKG-INFO0000644131111500116100000002662713443342124014431 0ustar shoyereng00000000000000Metadata-Version: 1.1 Name: h5netcdf Version: 0.7.1 Summary: netCDF4 via h5py Home-page: https://github.com/shoyer/h5netcdf Author: Stephan Hoyer Author-email: shoyer@gmail.com License: BSD Description: h5netcdf ======== .. image:: https://travis-ci.org/shoyer/h5netcdf.svg?branch=master :target: https://travis-ci.org/shoyer/h5netcdf .. image:: https://badge.fury.io/py/h5netcdf.svg :target: https://pypi.python.org/pypi/h5netcdf/ A Python interface for the netCDF4_ file-format that reads and writes local or remote HDF5 files directly via h5py_ or h5pyd_, without relying on the Unidata netCDF library. .. _netCDF4: http://www.unidata.ucar.edu/software/netcdf/docs/file_format_specifications.html#netcdf_4_spec .. _h5py: http://www.h5py.org/ .. _h5pyd: https://github.com/HDFGroup/h5pyd Why h5netcdf? ------------- - We've seen occasional reports of better performance with h5py than netCDF4-python, though in many cases performance is identical. For `one workflow`_, h5netcdf was reported to be almost **4x faster** than `netCDF4-python`_. - It has one less massive binary dependency (netCDF C). If you already have h5py installed, reading netCDF4 with h5netcdf may be much easier than installing netCDF4-Python. - Anecdotally, HDF5 users seem to be unexcited about switching to netCDF -- hopefully this will convince them that the netCDF4 is actually quite sane! - Finally, side-stepping the netCDF C library (and Cython bindings to it) gives us an easier way to identify the source of performance issues and bugs. .. _one workflow: https://github.com/Unidata/netcdf4-python/issues/390#issuecomment-93864839 .. 
_xarray: http://github.com/pydata/xarray/ Install ------- Ensure you have a recent version of h5py installed (I recommend using conda_). At least version 2.1 is required (for dimension scales); versions 2.3 and newer have been verified to work, though some tests only pass on h5py 2.6. Then: ``pip install h5netcdf`` .. _conda: http://conda.io/ Usage ----- h5netcdf has two APIs, a new API and a legacy API. Both interfaces currently reproduce most of the features of the netCDF interface, with the notable exception of support for operations the rename or delete existing objects. We simply haven't gotten around to implementing this yet. Patches would be very welcome. New API ~~~~~~~ The new API supports direct hierarchical access of variables and groups. Its design is an adaptation of h5py to the netCDF data model. For example: .. code-block:: python import h5netcdf import numpy as np with h5netcdf.File('mydata.nc', 'w') as f: # set dimensions with a dictionary f.dimensions = {'x': 5} # and update them with a dict-like interface # f.dimensions['x'] = 5 # f.dimensions.update({'x': 5}) v = f.create_variable('hello', ('x',), float) v[:] = np.ones(5) # you don't need to create groups first # you also don't need to create dimensions first if you supply data # with the new variable v = f.create_variable('/grouped/data', ('y',), data=np.arange(10)) # access and modify attributes with a dict-like interface v.attrs['foo'] = 'bar' # you can access variables and groups directly using a hierarchical # keys like h5py print(f['/grouped/data']) # add an unlimited dimension f.dimensions['z'] = None # explicitly resize a dimension and all variables using it f.resize_dimension('z', 3) Legacy API ~~~~~~~~~~ The legacy API is designed for compatibility with netCDF4-python_. To use it, import ``h5netcdf.legacyapi``: .. _netCDF4-python: https://github.com/Unidata/netcdf4-python .. code-block:: python import h5netcdf.legacyapi as netCDF4 # everything here would also work with this instead: # import netCDF4 import numpy as np with netCDF4.Dataset('mydata.nc', 'w') as ds: ds.createDimension('x', 5) v = ds.createVariable('hello', float, ('x',)) v[:] = np.ones(5) g = ds.createGroup('grouped') g.createDimension('y', 10) g.createVariable('data', 'i8', ('y',)) v = g['data'] v[:] = np.arange(10) v.foo = 'bar' print(ds.groups['grouped'].variables['data']) The legacy API is designed to be easy to try-out for netCDF4-python users, but it is not an exact match. Here is an incomplete list of functionality we don't include: - Utility functions ``chartostring``, ``num2date``, etc., that are not directly necessary for writing netCDF files. - We don't support the ``endian`` argument to ``createVariable`` yet (see `GitHub issue`_). - h5netcdf variables do not support automatic masking or scaling (e.g., of values matching the ``_FillValue`` attribute). We prefer to leave this functionality to client libraries (e.g., xarray_), which can implement their exact desired scaling behavior. - No support yet for automatic resizing of unlimited dimensions with array indexing. This would be a welcome pull request. For now, dimensions can be manually resized with ``Group.resize_dimension(dimension, size)``. .. 
_GitHub issue: https://github.com/shoyer/h5netcdf/issues/15 Invalid netCDF files ~~~~~~~~~~~~~~~~~~~~ h5py implements some features that do not (yet) result in valid netCDF files: - Data types: - Booleans - Complex values - Non-string variable length types - Enum types - Reference types - Arbitrary filters: - Scale-offset filters By default [*]_, h5netcdf will not allow writing files using any of these features, as files with such features are not readable by other netCDF tools. However, these are still valid HDF5 files. If you don't care about netCDF compatibility, you can use these features by setting ``invalid_netcdf=True`` when creating a file: .. code-block:: python # avoid the .nc extension for non-netcdf files f = h5netcdf.File('mydata.h5', invalid_netcdf=True) ... # works with the legacy API, too, though compression options are not exposed ds = h5netcdf.legacyapi.Dataset('mydata.h5', invalid_netcdf=True) ... .. [*] Currently, we only issue a warning, but in a future version of h5netcdf, we will raise ``h5netcdf.CompatibilityError``. Use ``invalid_netcdf=False`` to switch to the new behavior now. Change Log ---------- Version 0.7.1 (Mar 16, 2019): - Fixed a bug where h5netcdf could write invalid netCDF files with reused dimension IDs. netCDF-C 4.6.2 will crash when reading these files. - Updated to use version 2 of ``_NCProperties`` attribute. Version 0.7 (Feb 26, 2019): - Support for reading and writing file-like objects (requires h5py 2.9 or newer). By `Scott Henderson `_. Version 0.6.2 (Aug 19, 2018): - Fixed a bug that prevented creating variables with the same name as previously created dimensions in reopened files. Version 0.6.1 (Jun 8, 2018): - Compression with arbitrary filters no longer triggers warnings about invalid netCDF files, because this is now `supported by netCDF `__. Version 0.6 (Jun 7, 2018): - Support for reading and writing data to remote HDF5 files via the HDF5 REST API using the h5pyd_ package. Any file "path" starting with either ``http://``, ``https://``, or ``hdf5://`` will automatically trigger the use of this package. By `Aleksandar Jelenak `_. Version 0.5.1 (Apr 11, 2018): - Bug fix for files with an unlimited dimension with no associated variables. By `Aleksandar Jelenak `_. Version 0.5 (Oct 17, 2017): - Support for creating unlimited dimensions. By `Lion Krischer `_. Version 0.4.3 (Oct 10, 2017): - Fix test suite failure with recent versions of netCDF4-Python. Version 0.4.2 (Sep 12, 2017): - Raise ``AttributeError`` rather than ``KeyError`` when attributes are not found using the legacy API. This fixes an issue that prevented writing to h5netcdf with dask. Version 0.4.1 (Sep 6, 2017): - Include tests in source distribution on pypi. Version 0.4 (Aug 30, 2017): - Add ``invalid_netcdf`` argument. Warnings are now issued by default when writing an invalid NetCDF file. See the "Invalid netCDF files" section of the README for full details. Version 0.3.1 (Sep 2, 2016): - Fix garbage collection issue. - Add missing ``.flush()`` method for groups. - Allow creating dimensions of size 0. Version 0.3.0 (Aug 7, 2016): - Datasets are now loaded lazily. This should increase performance when opening files with a large number of groups and/or variables. - Support for writing arrays of variable length unicode strings with ``dtype=str`` via the legacy API. - h5netcdf now writes the ``_NCProperties`` attribute for identifying netCDF4 files. License ------- `3-clause BSD`_ .. 
_3-clause BSD: https://github.com/shoyer/h5netcdf/blob/master/LICENSE

Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Scientific/Engineering
h5netcdf-0.7.1/README.rst0000644131111500116100000002112113443341601015002 0ustar shoyereng00000000000000h5netcdf
========

.. image:: https://travis-ci.org/shoyer/h5netcdf.svg?branch=master
    :target: https://travis-ci.org/shoyer/h5netcdf
.. image:: https://badge.fury.io/py/h5netcdf.svg
    :target: https://pypi.python.org/pypi/h5netcdf/

A Python interface for the netCDF4_ file-format that reads and writes local or
remote HDF5 files directly via h5py_ or h5pyd_, without relying on the Unidata
netCDF library.

.. _netCDF4: http://www.unidata.ucar.edu/software/netcdf/docs/file_format_specifications.html#netcdf_4_spec
.. _h5py: http://www.h5py.org/
.. _h5pyd: https://github.com/HDFGroup/h5pyd

Why h5netcdf?
-------------

- We've seen occasional reports of better performance with h5py than
  netCDF4-python, though in many cases performance is identical. For
  `one workflow`_, h5netcdf was reported to be almost **4x faster** than
  `netCDF4-python`_.
- It has one less massive binary dependency (netCDF C). If you already have
  h5py installed, reading netCDF4 with h5netcdf may be much easier than
  installing netCDF4-Python.
- Anecdotally, HDF5 users seem to be unexcited about switching to netCDF --
  hopefully this will convince them that netCDF4 is actually quite sane!
- Finally, side-stepping the netCDF C library (and Cython bindings to it)
  gives us an easier way to identify the source of performance issues and
  bugs.

.. _one workflow: https://github.com/Unidata/netcdf4-python/issues/390#issuecomment-93864839
.. _xarray: http://github.com/pydata/xarray/

Install
-------

Ensure you have a recent version of h5py installed (I recommend using conda_).
At least version 2.1 is required (for dimension scales); versions 2.3 and
newer have been verified to work, though some tests only pass on h5py 2.6.
Then: ``pip install h5netcdf``

.. _conda: http://conda.io/

Usage
-----

h5netcdf has two APIs, a new API and a legacy API. Both interfaces currently
reproduce most of the features of the netCDF interface, with the notable
exception of support for operations that rename or delete existing objects.
We simply haven't gotten around to implementing this yet. Patches would be
very welcome.

New API
~~~~~~~

The new API supports direct hierarchical access of variables and groups. Its
design is an adaptation of h5py to the netCDF data model. For example:

.. code-block:: python

    import h5netcdf
    import numpy as np

    with h5netcdf.File('mydata.nc', 'w') as f:
        # set dimensions with a dictionary
        f.dimensions = {'x': 5}
        # and update them with a dict-like interface
        # f.dimensions['x'] = 5
        # f.dimensions.update({'x': 5})

        v = f.create_variable('hello', ('x',), float)
        v[:] = np.ones(5)

        # you don't need to create groups first
        # you also don't need to create dimensions first if you supply data
        # with the new variable
        v = f.create_variable('/grouped/data', ('y',), data=np.arange(10))

        # access and modify attributes with a dict-like interface
        v.attrs['foo'] = 'bar'

        # you can access variables and groups directly using hierarchical
        # keys, like h5py
        print(f['/grouped/data'])

        # add an unlimited dimension
        f.dimensions['z'] = None
        # explicitly resize a dimension and all variables using it
        f.resize_dimension('z', 3)

Legacy API
~~~~~~~~~~

The legacy API is designed for compatibility with netCDF4-python_. To use it,
import ``h5netcdf.legacyapi``:

.. _netCDF4-python: https://github.com/Unidata/netcdf4-python

.. code-block:: python

    import h5netcdf.legacyapi as netCDF4
    # everything here would also work with this instead:
    # import netCDF4
    import numpy as np

    with netCDF4.Dataset('mydata.nc', 'w') as ds:
        ds.createDimension('x', 5)
        v = ds.createVariable('hello', float, ('x',))
        v[:] = np.ones(5)

        g = ds.createGroup('grouped')
        g.createDimension('y', 10)
        g.createVariable('data', 'i8', ('y',))
        v = g['data']
        v[:] = np.arange(10)
        v.foo = 'bar'
        print(ds.groups['grouped'].variables['data'])

The legacy API is designed to be easy to try out for netCDF4-python users, but
it is not an exact match. Here is an incomplete list of functionality we don't
include:

- Utility functions ``chartostring``, ``num2date``, etc., that are not
  directly necessary for writing netCDF files.
- We don't support the ``endian`` argument to ``createVariable`` yet (see
  `GitHub issue`_).
- h5netcdf variables do not support automatic masking or scaling (e.g., of
  values matching the ``_FillValue`` attribute). We prefer to leave this
  functionality to client libraries (e.g., xarray_), which can implement
  their exact desired scaling behavior.
- No support yet for automatic resizing of unlimited dimensions with array
  indexing. This would be a welcome pull request. For now, dimensions can be
  manually resized with ``Group.resize_dimension(dimension, size)``.

.. _GitHub issue: https://github.com/shoyer/h5netcdf/issues/15

Invalid netCDF files
~~~~~~~~~~~~~~~~~~~~

h5py implements some features that do not (yet) result in valid netCDF files:

- Data types:

  - Booleans
  - Complex values
  - Non-string variable length types
  - Enum types
  - Reference types

- Arbitrary filters:

  - Scale-offset filters

By default [*]_, h5netcdf will not allow writing files using any of these
features, as files with such features are not readable by other netCDF tools.
However, these are still valid HDF5 files. If you don't care about netCDF
compatibility, you can use these features by setting ``invalid_netcdf=True``
when creating a file:

.. code-block:: python

    # avoid the .nc extension for non-netcdf files
    f = h5netcdf.File('mydata.h5', invalid_netcdf=True)
    ...

    # works with the legacy API, too, though compression options are not exposed
    ds = h5netcdf.legacyapi.Dataset('mydata.h5', invalid_netcdf=True)
    ...

.. [*] Currently, we only issue a warning, but in a future version of
   h5netcdf, we will raise ``h5netcdf.CompatibilityError``. Use
   ``invalid_netcdf=False`` to switch to the new behavior now.
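As a minimal sketch of these three modes, mirroring the package's own test
suite (the file and variable names here are placeholders), the same
invalid-netCDF write behaves as follows:

.. code-block:: python

    import h5netcdf

    # default: the write succeeds with a FutureWarning, and the
    # _NCProperties netCDF marker is omitted from the file
    with h5netcdf.File('default.h5', 'w') as f:
        f.create_variable('cplx', data=1j)

    # invalid_netcdf=False (the future default): the same write raises
    try:
        with h5netcdf.File('strict.h5', 'w', invalid_netcdf=False) as f:
            f.create_variable('cplx', data=1j)
    except h5netcdf.CompatibilityError:
        print('complex values are not valid netCDF')

    # invalid_netcdf=True: plain-HDF5 features are allowed without complaint
    with h5netcdf.File('loose.h5', 'w', invalid_netcdf=True) as f:
        f.create_variable('cplx', data=1j)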
Change Log ---------- Version 0.7.1 (Mar 16, 2019): - Fixed a bug where h5netcdf could write invalid netCDF files with reused dimension IDs. netCDF-C 4.6.2 will crash when reading these files. - Updated to use version 2 of ``_NCProperties`` attribute. Version 0.7 (Feb 26, 2019): - Support for reading and writing file-like objects (requires h5py 2.9 or newer). By `Scott Henderson `_. Version 0.6.2 (Aug 19, 2018): - Fixed a bug that prevented creating variables with the same name as previously created dimensions in reopened files. Version 0.6.1 (Jun 8, 2018): - Compression with arbitrary filters no longer triggers warnings about invalid netCDF files, because this is now `supported by netCDF `__. Version 0.6 (Jun 7, 2018): - Support for reading and writing data to remote HDF5 files via the HDF5 REST API using the h5pyd_ package. Any file "path" starting with either ``http://``, ``https://``, or ``hdf5://`` will automatically trigger the use of this package. By `Aleksandar Jelenak `_. Version 0.5.1 (Apr 11, 2018): - Bug fix for files with an unlimited dimension with no associated variables. By `Aleksandar Jelenak `_. Version 0.5 (Oct 17, 2017): - Support for creating unlimited dimensions. By `Lion Krischer `_. Version 0.4.3 (Oct 10, 2017): - Fix test suite failure with recent versions of netCDF4-Python. Version 0.4.2 (Sep 12, 2017): - Raise ``AttributeError`` rather than ``KeyError`` when attributes are not found using the legacy API. This fixes an issue that prevented writing to h5netcdf with dask. Version 0.4.1 (Sep 6, 2017): - Include tests in source distribution on pypi. Version 0.4 (Aug 30, 2017): - Add ``invalid_netcdf`` argument. Warnings are now issued by default when writing an invalid NetCDF file. See the "Invalid netCDF files" section of the README for full details. Version 0.3.1 (Sep 2, 2016): - Fix garbage collection issue. - Add missing ``.flush()`` method for groups. - Allow creating dimensions of size 0. Version 0.3.0 (Aug 7, 2016): - Datasets are now loaded lazily. This should increase performance when opening files with a large number of groups and/or variables. - Support for writing arrays of variable length unicode strings with ``dtype=str`` via the legacy API. - h5netcdf now writes the ``_NCProperties`` attribute for identifying netCDF4 files. License ------- `3-clause BSD`_ .. _3-clause BSD: https://github.com/shoyer/h5netcdf/blob/master/LICENSE h5netcdf-0.7.1/h5netcdf/0000755131111500116100000000000013443342124015017 5ustar shoyereng00000000000000h5netcdf-0.7.1/h5netcdf/__init__.py0000644131111500116100000000035713263424647017150 0ustar shoyereng00000000000000""" h5netcdf ======== A Python library for the netCDF4 file-format that directly reads and writes HDF5 files via h5py, without using the Unidata netCDF library. 
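The public names (File, Group, Variable and CompatibilityError) are
re-exported from h5netcdf.core below.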
""" from .core import CompatibilityError, File, Group, Variable, __version__ h5netcdf-0.7.1/h5netcdf/_chainmap.py0000644131111500116100000001102013263424647017315 0ustar shoyereng00000000000000# Backported from Python 3.4 # Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, # 2011, 2012, 2013, 2014, 2015 Python Software Foundation; All Rights Reserved from collections import MutableMapping try: from thread import get_ident except ImportError: from _thread import get_ident def recursive_repr(fillvalue='...'): 'Decorator to make a repr function return fillvalue for a recursive call' def decorating_function(user_function): repr_running = set() def wrapper(self): key = id(self), get_ident() if key in repr_running: return fillvalue repr_running.add(key) try: result = user_function(self) finally: repr_running.discard(key) return result # Can't use functools.wraps() here because of bootstrap issues wrapper.__module__ = getattr(user_function, '__module__') wrapper.__doc__ = getattr(user_function, '__doc__') wrapper.__name__ = getattr(user_function, '__name__') return wrapper return decorating_function class ChainMap(MutableMapping): ''' A ChainMap groups multiple dicts (or other mappings) together to create a single, updateable view. The underlying mappings are stored in a list. That list is public and can accessed or updated using the *maps* attribute. There is no other state. Lookups search the underlying mappings successively until a key is found. In contrast, writes, updates, and deletions only operate on the first mapping. ''' def __init__(self, *maps): '''Initialize a ChainMap by setting *maps* to the given mappings. If no mappings are provided, a single empty dictionary is used. ''' self.maps = list(maps) or [{}] # always at least one map def __missing__(self, key): raise KeyError(key) def __getitem__(self, key): for mapping in self.maps: try: return mapping[key] # can't use 'key in mapping' with defaultdict except KeyError: pass return self.__missing__(key) # support subclasses that define __missing__ def get(self, key, default=None): return self[key] if key in self else default def __len__(self): return len(set().union(*self.maps)) # reuses stored hash values if possible def __iter__(self): return iter(set().union(*self.maps)) def __contains__(self, key): return any(key in m for m in self.maps) def __bool__(self): return any(self.maps) @recursive_repr() def __repr__(self): return '{0.__class__.__name__}({1})'.format( self, ', '.join(repr(m) for m in self.maps)) @classmethod def fromkeys(cls, iterable, *args): 'Create a ChainMap with a single dict created from the iterable.' return cls(dict.fromkeys(iterable, *args)) def copy(self): 'New ChainMap or subclass with a new copy of maps[0] and refs to maps[1:]' return self.__class__(self.maps[0].copy(), *self.maps[1:]) __copy__ = copy def new_child(self, m=None): # like Django's Context.push() ''' New ChainMap with a new map followed by all previous maps. If no map is provided, an empty dict is used. ''' if m is None: m = {} return self.__class__(m, *self.maps) @property def parents(self): # like Django's Context.pop() 'New ChainMap from maps[1:].' return self.__class__(*self.maps[1:]) def __setitem__(self, key, value): self.maps[0][key] = value def __delitem__(self, key): try: del self.maps[0][key] except KeyError: raise KeyError('Key not found in the first mapping: {!r}'.format(key)) def popitem(self): 'Remove and return an item pair from maps[0]. Raise KeyError is maps[0] is empty.' 
try: return self.maps[0].popitem() except KeyError: raise KeyError('No keys found in the first mapping.') def pop(self, key, *args): 'Remove *key* from maps[0] and return its value. Raise KeyError if *key* not in maps[0].' try: return self.maps[0].pop(key, *args) except KeyError: raise KeyError('Key not found in the first mapping: {!r}'.format(key)) def clear(self): 'Clear maps[0], leaving maps[1:] intact.' self.maps[0].clear() h5netcdf-0.7.1/h5netcdf/attrs.py0000644131111500116100000000266613263424647016553 0ustar shoyereng00000000000000from collections import MutableMapping import numpy as np _HIDDEN_ATTRS = frozenset(['REFERENCE_LIST', 'CLASS', 'DIMENSION_LIST', 'NAME', '_Netcdf4Dimid', '_Netcdf4Coordinates', '_nc3_strict', '_NCProperties']) class Attributes(MutableMapping): def __init__(self, h5attrs, check_dtype): self._h5attrs = h5attrs self._check_dtype = check_dtype def __getitem__(self, key): if key in _HIDDEN_ATTRS: raise KeyError(key) return self._h5attrs[key] def __setitem__(self, key, value): if key in _HIDDEN_ATTRS: raise AttributeError('cannot write attribute with reserved name %r' % key) if hasattr(value, 'dtype'): dtype = value.dtype else: dtype = np.asarray(value).dtype self._check_dtype(dtype) self._h5attrs[key] = value def __delitem__(self, key): del self._h5attrs[key] def __iter__(self): for key in self._h5attrs: if key not in _HIDDEN_ATTRS: yield key def __len__(self): hidden_count = sum(1 if attr in self._h5attrs else 0 for attr in _HIDDEN_ATTRS) return len(self._h5attrs) - hidden_count def __repr__(self): return '\n'.join(['%r' % type(self)] + ['%s: %r' % (k, v) for k, v in self.items()]) h5netcdf-0.7.1/h5netcdf/compat.py0000644131111500116100000000056513263424647016675 0ustar shoyereng00000000000000import sys PY2 = sys.version_info[0] < 3 if PY2: unicode = unicode else: unicode = str try: from collections import OrderedDict except ImportError: from ordereddict import OrderedDict if sys.version_info < (3, 4): # we need the optional argument to ChainMap.new_child from ._chainmap import ChainMap else: from collections import ChainMap h5netcdf-0.7.1/h5netcdf/core.py0000644131111500116100000005647213443341605016342 0ustar shoyereng00000000000000# For details on how netCDF4 builds on HDF5: # http://www.unidata.ucar.edu/software/netcdf/docs/file_format_specifications.html#netcdf_4_spec from collections import Mapping import os.path import warnings import h5py import numpy as np from distutils.version import LooseVersion from .compat import ChainMap, OrderedDict, unicode from .attrs import Attributes from .dimensions import Dimensions from .utils import Frozen try: import h5pyd except ImportError: no_h5pyd = True h5_group_types = (h5py.Group,) else: no_h5pyd = False h5_group_types = (h5py.Group, h5pyd.Group) __version__ = '0.7.1' _NC_PROPERTIES = (u'version=2,h5netcdf=%s,hdf5=%s,h5py=%s' % (__version__, h5py.version.hdf5_version, h5py.__version__)) NOT_A_VARIABLE = b'This is a netCDF dimension but not a netCDF variable.' def _reverse_dict(dict_): return dict(zip(dict_.values(), dict_.keys())) def _join_h5paths(parent_path, child_path): return '/'.join([parent_path.rstrip('/'), child_path.lstrip('/')]) def _name_from_dimension(dim): # First value in a dimension is the actual dimension scale # which we'll use to extract the name. 
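    # The scale's h5py ``.name`` is its full HDF5 path (e.g. '/subgroup/x'),
    # so keep just the basename.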
return dim[0].name.split('/')[-1] class CompatibilityError(Exception): """Raised when using features that are not part of the NetCDF4 API.""" def _invalid_netcdf_feature(feature, allow, file, stacklevel=0): if allow is None: msg = ('{} are supported by h5py, but not part of the NetCDF API. ' 'You are writing an HDF5 file that is not a valid NetCDF file! ' 'In the future, this will be an error, unless you set ' 'invalid_netcdf=True.'.format(feature)) warnings.warn(msg, FutureWarning, stacklevel=stacklevel) file._write_ncproperties = False elif not allow: msg = ('{} are not a supported NetCDF feature, and are not allowed by ' 'h5netcdf unless invalid_netcdf=True.'.format(feature)) raise CompatibilityError(msg) class BaseVariable(object): def __init__(self, parent, name, dimensions=None): self._parent = parent self._root = parent._root self._h5path = _join_h5paths(parent.name, name) self._dimensions = dimensions self._initialized = True @property def _h5ds(self): # Always refer to the root file and store not h5py object # subclasses: return self._root._h5file[self._h5path] @property def name(self): return self._h5ds.name def _lookup_dimensions(self): attrs = self._h5ds.attrs if '_Netcdf4Coordinates' in attrs: order_dim = _reverse_dict(self._parent._dim_order) return tuple(order_dim[coord_id] for coord_id in attrs['_Netcdf4Coordinates']) child_name = self.name.split('/')[-1] if child_name in self._parent.dimensions: return (child_name,) dims = [] for axis, dim in enumerate(self._h5ds.dims): # TODO: read dimension labels even if there is no associated # scale? it's not netCDF4 spec, but it is unambiguous... # Also: the netCDF lib can read HDF5 datasets with unlabeled # dimensions. if len(dim) == 0: raise ValueError('variable %r has no dimension scale ' 'associated with axis %s' % (self.name, axis)) name = _name_from_dimension(dim) dims.append(name) return tuple(dims) @property def dimensions(self): if self._dimensions is None: self._dimensions = self._lookup_dimensions() return self._dimensions @property def shape(self): return self._h5ds.shape @property def ndim(self): return len(self.shape) def __len__(self): return self.shape[0] @property def dtype(self): return self._h5ds.dtype def __array__(self, *args, **kwargs): return self._h5ds.__array__(*args, **kwargs) def __getitem__(self, key): return self._h5ds[key] def __setitem__(self, key, value): self._h5ds[key] = value @property def attrs(self): return Attributes(self._h5ds.attrs, self._root._check_valid_netcdf_dtype) _cls_name = 'h5netcdf.Variable' def __repr__(self): if self._parent._root._closed: return '' % self._cls_name header = ('<%s %r: dimensions %s, shape %s, dtype %s>' % (self._cls_name, self.name, self.dimensions, self.shape, self.dtype)) return '\n'.join([header] + ['Attributes:'] + [' %s: %r' % (k, v) for k, v in self.attrs.items()]) class Variable(BaseVariable): @property def chunks(self): return self._h5ds.chunks @property def compression(self): return self._h5ds.compression @property def compression_opts(self): return self._h5ds.compression_opts @property def fletcher32(self): return self._h5ds.fletcher32 @property def shuffle(self): return self._h5ds.shuffle class _LazyObjectLookup(Mapping): def __init__(self, parent, object_cls): self._parent = parent self._object_cls = object_cls self._objects = OrderedDict() def __setitem__(self, name, obj): self._objects[name] = obj def add(self, name): self._objects[name] = None def __iter__(self): for name in self._objects: yield name def __len__(self): return len(self._objects) def 
__getitem__(self, key): if self._objects[key] is not None: return self._objects[key] else: self._objects[key] = self._object_cls(self._parent, key) return self._objects[key] def _netcdf_dimension_but_not_variable(h5py_dataset): return NOT_A_VARIABLE in h5py_dataset.attrs.get('NAME', b'') class Group(Mapping): _variable_cls = Variable @property def _group_cls(self): return Group def __init__(self, parent, name): self._parent = parent self._root = parent._root self._h5path = _join_h5paths(parent.name, name) if parent is not self: self._dim_sizes = parent._dim_sizes.new_child() self._current_dim_sizes = parent._current_dim_sizes.new_child() self._dim_order = parent._dim_order.new_child() self._all_h5groups = parent._all_h5groups.new_child(self._h5group) self._variables = _LazyObjectLookup(self, self._variable_cls) self._groups = _LazyObjectLookup(self, self._group_cls) for k, v in self._h5group.items(): if isinstance(v, h5_group_types): # add to the groups collection if this is a h5py(d) Group # instance self._groups.add(k) else: if v.attrs.get('CLASS') == b'DIMENSION_SCALE': dim_id = v.attrs.get('_Netcdf4Dimid') if '_Netcdf4Coordinates' in v.attrs: assert dim_id is not None coord_ids = v.attrs['_Netcdf4Coordinates'] size = v.shape[list(coord_ids).index(dim_id)] current_size = size else: assert len(v.shape) == 1 # Unlimited dimensions are represented as None. size = None if v.maxshape == (None,) else v.size current_size = v.size self._dim_sizes[k] = size # Figure out the current size of a dimension, which for # unlimited dimensions requires looking at the actual # variables. self._current_dim_sizes[k] = \ self._determine_current_dimension_size(k, current_size) if dim_id is None: dim_id = self._root._get_next_dim_id() self._dim_order[k] = dim_id if not _netcdf_dimension_but_not_variable(v): var_name = k if k.startswith('_nc4_non_coord_'): var_name = k[len('_nc4_non_coord_'):] self._variables.add(var_name) self._initialized = True def _determine_current_dimension_size(self, dim_name, max_size): """ Helper method to determine the current size of a dimension. """ # Limited dimension. 
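        # Its declared size is fixed, so it is also the current size.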
if self.dimensions[dim_name] is not None: return max_size def _find_dim(h5group, dim): if dim not in h5group: return _find_dim(h5group.parent, dim) return h5group[dim] dim_variable = _find_dim(self._h5group, dim_name) if "REFERENCE_LIST" not in dim_variable.attrs: return max_size root = self._h5group["/"] for ref, _ in dim_variable.attrs["REFERENCE_LIST"]: var = root[ref] for i, var_d in enumerate(var.dims): name = _name_from_dimension(var_d) if name == dim_name: max_size = max(var.shape[i], max_size) return max_size @property def _h5group(self): # Always refer to the root file and store not h5py object # subclasses: return self._root._h5file[self._h5path] @property def name(self): return self._h5group.name def _create_dimension(self, name, size=None): if name in self._dim_sizes.maps[0]: raise ValueError('dimension %r already exists' % name) self._dim_sizes[name] = size self._current_dim_sizes[name] = 0 if size is None else size self._dim_order[name] = self._root._get_next_dim_id() @property def dimensions(self): return Dimensions(self) @dimensions.setter def dimensions(self, value): for k, v in self._dim_sizes.maps[0].items(): if k in value: if v != value[k]: raise ValueError('cannot modify existing dimension %r' % k) else: raise ValueError('new dimensions do not include existing ' 'dimension %r' % k) self.dimensions.update(value) def _create_child_group(self, name): if name in self: raise ValueError('unable to create group %r (name already exists)' % name) self._h5group.create_group(name) self._groups[name] = self._group_cls(self, name) return self._groups[name] def _require_child_group(self, name): try: return self._groups[name] except KeyError: return self._create_child_group(name) def create_group(self, name): if name.startswith('/'): return self._root.create_group(name[1:]) keys = name.split('/') group = self for k in keys[:-1]: group = group._require_child_group(k) return group._create_child_group(keys[-1]) def _create_child_variable(self, name, dimensions, dtype, data, fillvalue, **kwargs): stacklevel = 4 # correct if name does not start with '/' if name in self: raise ValueError('unable to create variable %r ' '(name already exists)' % name) if data is not None: data = np.asarray(data) for d, s in zip(dimensions, data.shape): if d not in self.dimensions: self.dimensions[d] = s if dtype is None: dtype = data.dtype if dtype == np.bool_: # never warn since h5netcdf has always errored here _invalid_netcdf_feature('boolean dtypes', allow=bool(self._root.invalid_netcdf), file=self._root, stacklevel=stacklevel) else: self._root._check_valid_netcdf_dtype(dtype, stacklevel=stacklevel) if 'scaleoffset' in kwargs: _invalid_netcdf_feature('scale-offset filters', allow=self._root.invalid_netcdf, file=self._root, stacklevel=stacklevel) if name in self.dimensions and name not in dimensions: h5name = '_nc4_non_coord_' + name else: h5name = name shape = tuple(self._current_dim_sizes[d] for d in dimensions) maxshape = tuple(self._dim_sizes[d] for d in dimensions) # If it is passed directly it will change the default compression # settings. if shape != maxshape: kwargs["maxshape"] = maxshape # Clear dummy HDF5 datasets with this name that were created for a # dimension scale without a corresponding variable. 
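        # (Such dummy datasets carry the NOT_A_VARIABLE marker in their
        # 'NAME' attribute, which _netcdf_dimension_but_not_variable()
        # detects.)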
if name in self.dimensions and name in self._h5group: h5ds = self._h5group[name] if _netcdf_dimension_but_not_variable(h5ds): self._detach_dim_scale(name) del self._h5group[name] self._h5group.create_dataset(h5name, shape, dtype=dtype, data=data, fillvalue=fillvalue, **kwargs) self._variables[h5name] = self._variable_cls(self, h5name, dimensions) variable = self._variables[h5name] if fillvalue is not None: value = variable.dtype.type(fillvalue) variable.attrs._h5attrs['_FillValue'] = value return variable def create_variable(self, name, dimensions=(), dtype=None, data=None, fillvalue=None, **kwargs): if name.startswith('/'): return self._root.create_variable(name[1:], dimensions, dtype, data, fillvalue, **kwargs) keys = name.split('/') group = self for k in keys[:-1]: group = group._require_child_group(k) return group._create_child_variable(keys[-1], dimensions, dtype, data, fillvalue, **kwargs) def _get_child(self, key): try: return self.variables[key] except KeyError: return self.groups[key] def __getitem__(self, key): if key.startswith('/'): return self._root[key[1:]] keys = key.split('/') item = self for k in keys: item = item._get_child(k) return item def __iter__(self): for name in self.groups: yield name for name in self.variables: yield name def __len__(self): return len(self.variables) + len(self.groups) def _create_dim_scales(self): """Create all necessary HDF5 dimension scale.""" dim_order = self._dim_order.maps[0] for dim in sorted(dim_order, key=lambda d: dim_order[d]): if dim not in self._h5group: size = self._current_dim_sizes[dim] kwargs = {} if self._dim_sizes[dim] is None: kwargs["maxshape"] = (None,) self._h5group.create_dataset( name=dim, shape=(size,), dtype='S1', **kwargs) h5ds = self._h5group[dim] h5ds.attrs['_Netcdf4Dimid'] = dim_order[dim] if len(h5ds.shape) > 1: dims = self._variables[dim].dimensions coord_ids = np.array([dim_order[d] for d in dims], 'int32') h5ds.attrs['_Netcdf4Coordinates'] = coord_ids scale_name = dim if dim in self.variables else NOT_A_VARIABLE h5ds.dims.create_scale(h5ds, scale_name) for subgroup in self.groups.values(): subgroup._create_dim_scales() def _attach_dim_scales(self): """Attach dimension scales to all variables.""" for name, var in self.variables.items(): if name not in self.dimensions: for n, dim in enumerate(var.dimensions): var._h5ds.dims[n].attach_scale(self._all_h5groups[dim]) for subgroup in self.groups.values(): subgroup._attach_dim_scales() def _detach_dim_scale(self, name): """Detach the dimension scale corresponding to a dimension name.""" for var in self.variables.values(): for n, dim in enumerate(var.dimensions): if dim == name: var._h5ds.dims[n].detach_scale(self._all_h5groups[dim]) for subgroup in self.groups.values(): if dim not in subgroup._h5group: subgroup._detach_dim_scale(name) @property def parent(self): return self._parent def flush(self): self._root.flush() sync = flush @property def groups(self): return Frozen(self._groups) @property def variables(self): return Frozen(self._variables) @property def attrs(self): return Attributes(self._h5group.attrs, self._root._check_valid_netcdf_dtype) _cls_name = 'h5netcdf.Group' def _repr_body(self): return ( ['Dimensions:'] + [' %s: %s' % ( k, ("Unlimited (current: %s)" % self._current_dim_sizes[k]) if v is None else v) for k, v in self.dimensions.items()] + ['Groups:'] + [' %s' % g for g in self.groups] + ['Variables:'] + [' %s: %r %s' % (k, v.dimensions, v.dtype) for k, v in self.variables.items()] + ['Attributes:'] + [' %s: %r' % (k, v) for k, v in 
self.attrs.items()]) def __repr__(self): if self._root._closed: return '' % self._cls_name header = ('<%s %r (%s members)>' % (self._cls_name, self.name, len(self))) return '\n'.join([header] + self._repr_body()) def resize_dimension(self, dimension, size): """ Resize a dimension to a certain size. It will pad with the underlying HDF5 data sets' fill values (usually zero) where necessary. """ if self.dimensions[dimension] is not None: raise ValueError("Dimension '%s' is not unlimited and thus " "cannot be resized." % dimension) # Resize the dimension. self._current_dim_sizes[dimension] = size for var in self.variables.values(): new_shape = list(var.shape) for i, d in enumerate(var.dimensions): if d == dimension: new_shape[i] = size new_shape = tuple(new_shape) if new_shape != var.shape: var._h5ds.resize(new_shape) # Recurse as dimensions are visible to this group and all child groups. for i in self.groups.values(): i.resize_dimension(dimension, size) class File(Group): def __init__(self, path, mode='a', invalid_netcdf=None, **kwargs): try: if isinstance(path, str): if path.startswith(('http://', 'https://', 'hdf5://')): if no_h5pyd: raise ImportError( "No module named 'h5pyd'. h5pyd is required for " "opening urls: {}".format(path)) try: with h5pyd.File(path, 'r') as f: # noqa pass self._preexisting_file = True except IOError: self._preexisting_file = False self._h5file = h5pyd.File(path, mode, **kwargs) else: self._preexisting_file = os.path.exists(path) self._h5file = h5py.File(path, mode, **kwargs) else: # file-like object if h5py.__version__ < LooseVersion('2.9.0'): raise TypeError( "h5py version ({}) must be greater than 2.9.0 to load " "file-like objects.".format(h5py.__version__)) else: self._preexisting_file = mode in {'r', 'r+', 'a'} self._h5file = h5py.File(path, mode, **kwargs) except Exception: self._closed = True raise else: self._closed = False self._mode = mode self._root = self self._h5path = '/' self.invalid_netcdf = invalid_netcdf # If invalid_netcdf is None, we'll disable writing _NCProperties only # if we actually use invalid NetCDF features. self._write_ncproperties = invalid_netcdf is not True # These maps keep track of dimensions in terms of size (might be # unlimited), current size (identical to size for limited dimensions), # their position, and look-up for HDF5 datasets corresponding to a # dimension. 
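        # Each subgroup layers its own entries on top of its parent's maps
        # via ChainMap.new_child() in Group.__init__.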
self._dim_sizes = ChainMap() self._current_dim_sizes = ChainMap() self._dim_order = ChainMap() self._all_h5groups = ChainMap(self._h5group) # used for picking numbers to use in self._dim_order self._next_dim_id = 0 super(File, self).__init__(self, self._h5path) def _get_next_dim_id(self): dim_id = self._next_dim_id self._next_dim_id += 1 return dim_id def _check_valid_netcdf_dtype(self, dtype, stacklevel=3): dtype = np.dtype(dtype) if dtype == bool: description = 'boolean' elif dtype == complex: description = 'complex' elif h5py.check_dtype(enum=dtype) is not None: description = 'enum' elif h5py.check_dtype(ref=dtype) is not None: description = 'reference' elif h5py.check_dtype(vlen=dtype) not in {None, unicode, bytes}: description = 'non-string variable length' else: description = None if description is not None: _invalid_netcdf_feature('{} dtypes'.format(description), allow=self.invalid_netcdf, file=self, stacklevel=stacklevel + 1) @property def mode(self): return self._h5file.mode @property def filename(self): return self._h5file.filename @property def parent(self): return None def flush(self): if 'r' not in self._mode: self._create_dim_scales() self._attach_dim_scales() if not self._preexisting_file and self._write_ncproperties: self.attrs._h5attrs['_NCProperties'] = _NC_PROPERTIES sync = flush def close(self): if not self._closed: self.flush() self._h5file.close() self._closed = True __del__ = close def __enter__(self): return self def __exit__(self, type, value, traceback): self.close() _cls_name = 'h5netcdf.File' def __repr__(self): if self._closed: return '' % self._cls_name header = '<%s %r (mode %s)>' % (self._cls_name, self.filename.split('/')[-1], self.mode) return '\n'.join([header] + self._repr_body()) h5netcdf-0.7.1/h5netcdf/dimensions.py0000644131111500116100000000142113263424647017552 0ustar shoyereng00000000000000from collections import MutableMapping class Dimensions(MutableMapping): def __init__(self, group): self._group = group def __getitem__(self, key): return self._group._dim_sizes[key] def __setitem__(self, key, value): self._group._create_dimension(key, value) def __delitem__(self, key): raise NotImplementedError('cannot yet delete dimensions') def __iter__(self): for key in self._group._dim_sizes: yield key def __len__(self): return len(self._group._dim_sizes) def __repr__(self): if self._group._root._closed: return '' return ('' % ', '.join('%s=%r' % (k, v) for k, v in self.items())) h5netcdf-0.7.1/h5netcdf/legacyapi.py0000644131111500116100000000543413263424647017350 0ustar shoyereng00000000000000import h5py from . 
import core
from .compat import unicode


class HasAttributesMixin(object):
    _initialized = False

    def getncattr(self, name):
        return self.attrs[name]

    def setncattr(self, name, value):
        self.attrs[name] = value

    def ncattrs(self):
        return list(self.attrs)

    def __getattr__(self, name):
        try:
            return self.attrs[name]
        except KeyError:
            raise AttributeError('NetCDF: attribute {}.{} not found'
                                 .format(type(self).__name__, name))

    def __setattr__(self, name, value):
        if self._initialized and name not in self.__dict__:
            self.attrs[name] = value
        else:
            object.__setattr__(self, name, value)


class Variable(core.BaseVariable, HasAttributesMixin):
    _cls_name = 'h5netcdf.legacyapi.Variable'

    def chunking(self):
        chunks = self._h5ds.chunks
        if chunks is None:
            return 'contiguous'
        else:
            return chunks

    def filters(self):
        complevel = self._h5ds.compression_opts
        return {'complevel': 0 if complevel is None else complevel,
                'fletcher32': self._h5ds.fletcher32,
                'shuffle': self._h5ds.shuffle,
                'zlib': self._h5ds.compression == 'gzip'}

    @property
    def dtype(self):
        dt = self._h5ds.dtype
        if h5py.check_dtype(vlen=dt) is unicode:
            return str
        return dt


class Group(core.Group, HasAttributesMixin):
    _cls_name = 'h5netcdf.legacyapi.Group'
    _variable_cls = Variable

    @property
    def _group_cls(self):
        return Group

    createGroup = core.Group.create_group
    createDimension = core.Group._create_dimension

    def createVariable(self, varname, datatype, dimensions=(), zlib=False,
                       complevel=4, shuffle=True, fletcher32=False,
                       chunksizes=None, fill_value=None):
        if len(dimensions) == 0:  # it's a scalar
            # rip off chunk and filter options for consistency with
            # netCDF4-python
            chunksizes = None
            zlib = False
            fletcher32 = False
            shuffle = False

        if datatype is str:
            datatype = h5py.special_dtype(vlen=unicode)

        kwds = {}
        if zlib:
            # only add compression related keyword arguments if relevant (h5py
            # chokes otherwise)
            kwds['compression'] = 'gzip'
            kwds['compression_opts'] = complevel
            kwds['shuffle'] = shuffle

        return super(Group, self).create_variable(
            varname, dimensions, dtype=datatype, fletcher32=fletcher32,
            chunks=chunksizes, fillvalue=fill_value, **kwds)


class Dataset(core.File, Group, HasAttributesMixin):
    _cls_name = 'h5netcdf.legacyapi.Dataset'
h5netcdf-0.7.1/h5netcdf/tests/0000755131111500116100000000000013443342124016161 5ustar shoyereng00000000000000h5netcdf-0.7.1/h5netcdf/tests/conftest.py0000644131111500116100000000025413306421546020365 0ustar shoyereng00000000000000def pytest_addoption(parser):
    parser.addoption('--restapi', action='store_true',
                     dest="restapi", default=False,
                     help="Enable HDF5 REST API tests")
h5netcdf-0.7.1/h5netcdf/tests/test_h5netcdf.py0000644131111500116100000007417513443341215021310 0ustar shoyereng00000000000000import netCDF4
import numpy as np
import gc
import re
import string
import random
from os import environ as env
import io
import tempfile
from distutils.version import LooseVersion

import h5netcdf
from h5netcdf import legacyapi
from h5netcdf.compat import PY2, unicode
from h5netcdf.core import NOT_A_VARIABLE
import h5py
import pytest
from pytest import raises

try:
    import h5pyd
    without_h5pyd = False
except ImportError:
    without_h5pyd = True


remote_h5 = ('http:', 'hdf5:')


@pytest.fixture
def tmp_local_netcdf(tmpdir):
    return str(tmpdir.join('testfile.nc'))


@pytest.fixture(params=['testfile.nc', 'hdf5://testfile'])
def tmp_local_or_remote_netcdf(request, tmpdir):
    if request.param.startswith(remote_h5):
        if not pytest.config.option.restapi:
            pytest.skip('Do not test with HDF5 REST API')
        elif without_h5pyd:
            pytest.skip('h5pyd package not available')
        if 
any([env.get(v) is None for v in ('HS_USERNAME', 'HS_PASSWORD')]): pytest.skip('HSDS username and/or password missing') rnd = ''.join(random.choice(string.ascii_uppercase) for _ in range(5)) return (env['HS_ENDPOINT'] + env['H5PYD_TEST_FOLDER'] + '/' + 'testfile' + rnd + '.nc') else: return str(tmpdir.join(request.param)) def get_hdf5_module(resource): """Return the correct h5py module based on the input resource.""" if isinstance(resource, str) and resource.startswith(remote_h5): return h5pyd else: return h5py def string_to_char(arr): """Like nc4.stringtochar, but faster and more flexible. """ # ensure the array is contiguous arr = np.array(arr, copy=False, order='C') kind = arr.dtype.kind if kind not in ['U', 'S']: raise ValueError('argument must be a string') return arr.reshape(arr.shape + (1,)).view(kind + '1') def array_equal(a, b): a, b = map(np.array, (a[...], b[...])) if a.shape != b.shape: return False try: return np.allclose(a, b) except TypeError: return (a == b).all() _char_array = string_to_char(np.array(['a', 'b', 'c', 'foo', 'bar', 'baz'], dtype='S')) _string_array = np.array([['foobar0', 'foobar1', 'foobar3'], ['foofoofoo', 'foofoobar', 'foobarbar']]) def is_h5py_char_working(tmp_netcdf, name): h5 = get_hdf5_module(tmp_netcdf) # https://github.com/Unidata/netcdf-c/issues/298 with h5.File(tmp_netcdf, 'r') as ds: v = ds[name] try: assert array_equal(v, _char_array) return True except Exception as e: if re.match("^Can't read data", e.args[0]): return False else: raise def write_legacy_netcdf(tmp_netcdf, write_module): ds = write_module.Dataset(tmp_netcdf, 'w') ds.setncattr('global', 42) ds.other_attr = 'yes' ds.createDimension('x', 4) ds.createDimension('y', 5) ds.createDimension('z', 6) ds.createDimension('empty', 0) ds.createDimension('string3', 3) v = ds.createVariable('foo', float, ('x', 'y'), chunksizes=(4, 5), zlib=True) v[...] = 1 v.setncattr('units', 'meters') v = ds.createVariable('y', int, ('y',), fill_value=-1) v[:4] = np.arange(4) v = ds.createVariable('z', 'S1', ('z', 'string3'), fill_value=b'X') v[...] = _char_array v = ds.createVariable('scalar', np.float32, ()) v[...] = 2.0 # test creating a scalar with compression option (with should be ignored) v = ds.createVariable('intscalar', np.int64, (), zlib=6, fill_value=None) v[...] = 2 with raises((h5netcdf.CompatibilityError, TypeError)): ds.createVariable('boolean', np.bool_, ('x')) g = ds.createGroup('subgroup') v = g.createVariable('subvar', np.int32, ('x',)) v[...] = np.arange(4.0) g.createDimension('y', 10) g.createVariable('y_var', float, ('y',)) ds.createDimension('mismatched_dim', 1) ds.createVariable('mismatched_dim', int, ()) v = ds.createVariable('var_len_str', str, ('x')) v[0] = u'foo' ds.close() def write_h5netcdf(tmp_netcdf): ds = h5netcdf.File(tmp_netcdf, 'w') ds.attrs['global'] = 42 ds.attrs['other_attr'] = 'yes' ds.dimensions = {'x': 4, 'y': 5, 'z': 6, 'empty': 0} v = ds.create_variable('foo', ('x', 'y'), float, chunks=(4, 5), compression='gzip', shuffle=True) v[...] = 1 v.attrs['units'] = 'meters' v = ds.create_variable('y', ('y',), int, fillvalue=-1) v[:4] = np.arange(4) v = ds.create_variable('z', ('z', 'string3'), data=_char_array, fillvalue=b'X') v = ds.create_variable('scalar', data=np.float32(2.0)) v = ds.create_variable('intscalar', data=np.int64(2)) with raises((h5netcdf.CompatibilityError, TypeError)): ds.create_variable('boolean', data=True) g = ds.create_group('subgroup') v = g.create_variable('subvar', ('x',), np.int32) v[...] 
= np.arange(4.0) with raises(AttributeError): v.attrs['_Netcdf4Dimid'] = -1 g.dimensions['y'] = 10 g.create_variable('y_var', ('y',), float) g.flush() ds.dimensions['mismatched_dim'] = 1 ds.create_variable('mismatched_dim', dtype=int) ds.flush() dt = h5py.special_dtype(vlen=unicode) v = ds.create_variable('var_len_str', ('x',), dtype=dt) v[0] = u'foo' ds.close() def read_legacy_netcdf(tmp_netcdf, read_module, write_module): ds = read_module.Dataset(tmp_netcdf, 'r') assert ds.ncattrs() == ['global', 'other_attr'] assert ds.getncattr('global') == 42 if not PY2 and write_module is not netCDF4: # skip for now: https://github.com/Unidata/netcdf4-python/issues/388 assert ds.other_attr == 'yes' with pytest.raises(AttributeError): ds.does_not_exist assert set(ds.dimensions) == set(['x', 'y', 'z', 'empty', 'string3', 'mismatched_dim']) assert set(ds.variables) == set(['foo', 'y', 'z', 'intscalar', 'scalar', 'var_len_str', 'mismatched_dim']) assert set(ds.groups) == set(['subgroup']) assert ds.parent is None v = ds.variables['foo'] assert array_equal(v, np.ones((4, 5))) assert v.dtype == float assert v.dimensions == ('x', 'y') assert v.ndim == 2 assert v.ncattrs() == ['units'] if not PY2 and write_module is not netCDF4: assert v.getncattr('units') == 'meters' assert tuple(v.chunking()) == (4, 5) assert v.filters() == {'complevel': 4, 'fletcher32': False, 'shuffle': True, 'zlib': True} v = ds.variables['y'] assert array_equal(v, np.r_[np.arange(4), [-1]]) assert v.dtype == int assert v.dimensions == ('y',) assert v.ndim == 1 assert v.ncattrs() == ['_FillValue'] assert v.getncattr('_FillValue') == -1 assert v.chunking() == 'contiguous' assert v.filters() == {'complevel': 0, 'fletcher32': False, 'shuffle': False, 'zlib': False} ds.close() # Check the behavior if h5py. 
Cannot expect h5netcdf to overcome these # errors: if is_h5py_char_working(tmp_netcdf, 'z'): ds = read_module.Dataset(tmp_netcdf, 'r') v = ds.variables['z'] assert array_equal(v, _char_array) assert v.dtype == 'S1' assert v.ndim == 2 assert v.dimensions == ('z', 'string3') assert v.ncattrs() == ['_FillValue'] assert v.getncattr('_FillValue') == b'X' else: ds = read_module.Dataset(tmp_netcdf, 'r') v = ds.variables['scalar'] assert array_equal(v, np.array(2.0)) assert v.dtype == 'float32' assert v.ndim == 0 assert v.dimensions == () assert v.ncattrs() == [] v = ds.variables['intscalar'] assert array_equal(v, np.array(2)) assert v.dtype == 'int64' assert v.ndim == 0 assert v.dimensions == () assert v.ncattrs() == [] v = ds.variables['var_len_str'] assert v.dtype == str assert v[0] == u'foo' v = ds.groups['subgroup'].variables['subvar'] assert ds.groups['subgroup'].parent is ds assert array_equal(v, np.arange(4.0)) assert v.dtype == 'int32' assert v.ndim == 1 assert v.dimensions == ('x',) assert v.ncattrs() == [] v = ds.groups['subgroup'].variables['y_var'] assert v.shape == (10,) assert 'y' in ds.groups['subgroup'].dimensions ds.close() def read_h5netcdf(tmp_netcdf, write_module): remote_file = (isinstance(tmp_netcdf, str) and tmp_netcdf.startswith(remote_h5)) ds = h5netcdf.File(tmp_netcdf, 'r') assert ds.name == '/' assert list(ds.attrs) == ['global', 'other_attr'] assert ds.attrs['global'] == 42 if not PY2 and write_module is not netCDF4: # skip for now: https://github.com/Unidata/netcdf4-python/issues/388 assert ds.attrs['other_attr'] == 'yes' assert set(ds.dimensions) == set(['x', 'y', 'z', 'empty', 'string3', 'mismatched_dim']) assert set(ds.variables) == set(['foo', 'y', 'z', 'intscalar', 'scalar', 'var_len_str', 'mismatched_dim']) assert set(ds.groups) == set(['subgroup']) assert ds.parent is None v = ds['foo'] assert v.name == '/foo' assert array_equal(v, np.ones((4, 5))) assert v.dtype == float assert v.dimensions == ('x', 'y') assert v.ndim == 2 assert list(v.attrs) == ['units'] if not PY2 and write_module is not netCDF4: assert v.attrs['units'] == 'meters' assert v.chunks == (4, 5) assert v.compression == 'gzip' assert v.compression_opts == 4 assert not v.fletcher32 assert v.shuffle v = ds['y'] assert array_equal(v, np.r_[np.arange(4), [-1]]) assert v.dtype == int assert v.dimensions == ('y',) assert v.ndim == 1 assert list(v.attrs) == ['_FillValue'] assert v.attrs['_FillValue'] == -1 if not remote_file: assert v.chunks is None assert v.compression is None assert v.compression_opts is None assert not v.fletcher32 assert not v.shuffle ds.close() if is_h5py_char_working(tmp_netcdf, 'z'): ds = h5netcdf.File(tmp_netcdf, 'r') v = ds['z'] assert v.dtype == 'S1' assert v.ndim == 2 assert v.dimensions == ('z', 'string3') assert list(v.attrs) == ['_FillValue'] assert v.attrs['_FillValue'] == b'X' else: ds = h5netcdf.File(tmp_netcdf, 'r') v = ds['scalar'] assert array_equal(v, np.array(2.0)) assert v.dtype == 'float32' assert v.ndim == 0 assert v.dimensions == () assert list(v.attrs) == [] v = ds.variables['intscalar'] assert array_equal(v, np.array(2)) assert v.dtype == 'int64' assert v.ndim == 0 assert v.dimensions == () assert list(v.attrs) == [] v = ds['var_len_str'] assert h5py.check_dtype(vlen=v.dtype) == unicode assert v[0] == u'foo' v = ds['/subgroup/subvar'] assert v is ds['subgroup']['subvar'] assert v is ds['subgroup/subvar'] assert v is ds['subgroup']['/subgroup/subvar'] assert v.name == '/subgroup/subvar' assert ds['subgroup'].name == '/subgroup' assert ds['subgroup'].parent is 
ds assert array_equal(v, np.arange(4.0)) assert v.dtype == 'int32' assert v.ndim == 1 assert v.dimensions == ('x',) assert list(v.attrs) == [] assert ds['/subgroup/y_var'].shape == (10,) assert ds['/subgroup'].dimensions['y'] == 10 ds.close() def roundtrip_legacy_netcdf(tmp_netcdf, read_module, write_module): write_legacy_netcdf(tmp_netcdf, write_module) read_legacy_netcdf(tmp_netcdf, read_module, write_module) def test_write_legacyapi_read_netCDF4(tmp_local_netcdf): roundtrip_legacy_netcdf(tmp_local_netcdf, netCDF4, legacyapi) def test_roundtrip_h5netcdf_legacyapi(tmp_local_netcdf): roundtrip_legacy_netcdf(tmp_local_netcdf, legacyapi, legacyapi) def test_write_netCDF4_read_legacyapi(tmp_local_netcdf): roundtrip_legacy_netcdf(tmp_local_netcdf, legacyapi, netCDF4) def test_write_h5netcdf_read_legacyapi(tmp_local_netcdf): write_h5netcdf(tmp_local_netcdf) read_legacy_netcdf(tmp_local_netcdf, legacyapi, h5netcdf) def test_write_h5netcdf_read_netCDF4(tmp_local_netcdf): write_h5netcdf(tmp_local_netcdf) read_legacy_netcdf(tmp_local_netcdf, netCDF4, h5netcdf) def test_roundtrip_h5netcdf(tmp_local_or_remote_netcdf): write_h5netcdf(tmp_local_or_remote_netcdf) read_h5netcdf(tmp_local_or_remote_netcdf, h5netcdf) def test_write_netCDF4_read_h5netcdf(tmp_local_netcdf): write_legacy_netcdf(tmp_local_netcdf, netCDF4) read_h5netcdf(tmp_local_netcdf, netCDF4) def test_write_legacyapi_read_h5netcdf(tmp_local_netcdf): write_legacy_netcdf(tmp_local_netcdf, legacyapi) read_h5netcdf(tmp_local_netcdf, legacyapi) def test_fileobj(): if h5py.__version__ < LooseVersion('2.9.0'): pytest.skip('h5py > 2.9.0 required to test file-like objects') fileobj = tempfile.TemporaryFile() write_h5netcdf(fileobj) read_h5netcdf(fileobj, h5netcdf) fileobj = io.BytesIO() write_h5netcdf(fileobj) read_h5netcdf(fileobj, h5netcdf) def test_repr(tmp_local_or_remote_netcdf): write_h5netcdf(tmp_local_or_remote_netcdf) f = h5netcdf.File(tmp_local_or_remote_netcdf, 'r') assert 'h5netcdf.File' in repr(f) assert 'subgroup' in repr(f) assert 'foo' in repr(f) assert 'other_attr' in repr(f) assert 'h5netcdf.attrs.Attributes' in repr(f.attrs) assert 'global' in repr(f.attrs) d = f.dimensions assert 'h5netcdf.Dimensions' in repr(d) assert 'x=4' in repr(d) g = f['subgroup'] assert 'h5netcdf.Group' in repr(g) assert 'subvar' in repr(g) v = f['foo'] assert 'h5netcdf.Variable' in repr(v) assert 'float' in repr(v) assert 'units' in repr(v) f.dimensions['temp'] = None assert 'temp: Unlimited (current: 0)' in repr(f) f.resize_dimension('temp', 5) assert 'temp: Unlimited (current: 5)' in repr(f) f.close() assert 'Closed' in repr(f) assert 'Closed' in repr(d) assert 'Closed' in repr(g) assert 'Closed' in repr(v) def test_attrs_api(tmp_local_or_remote_netcdf): with h5netcdf.File(tmp_local_or_remote_netcdf) as ds: ds.attrs['conventions'] = 'CF' ds.dimensions['x'] = 1 v = ds.create_variable('x', ('x',), 'i4') v.attrs.update({'units': 'meters', 'foo': 'bar'}) assert ds._closed with h5netcdf.File(tmp_local_or_remote_netcdf) as ds: assert len(ds.attrs) == 1 assert dict(ds.attrs) == {'conventions': 'CF'} assert list(ds.attrs) == ['conventions'] assert dict(ds['x'].attrs) == {'units': 'meters', 'foo': 'bar'} assert len(ds['x'].attrs) == 2 assert sorted(ds['x'].attrs) == ['foo', 'units'] def test_optional_netcdf4_attrs(tmp_local_or_remote_netcdf): h5 = get_hdf5_module(tmp_local_or_remote_netcdf) with h5.File(tmp_local_or_remote_netcdf) as f: foo_data = np.arange(50).reshape(5, 10) f.create_dataset('foo', data=foo_data) f.create_dataset('x', data=np.arange(5)) 
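        # 'x' (above) and 'y' (below) are plain datasets that get promoted
        # to dimension scales and attached to 'foo'.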
        f.create_dataset('y', data=np.arange(10))
        f['foo'].dims.create_scale(f['x'])
        f['foo'].dims.create_scale(f['y'])
        f['foo'].dims[0].attach_scale(f['x'])
        f['foo'].dims[1].attach_scale(f['y'])

    with h5netcdf.File(tmp_local_or_remote_netcdf, 'r') as ds:
        assert ds['foo'].dimensions == ('x', 'y')
        assert ds.dimensions == {'x': 5, 'y': 10}
        assert array_equal(ds['foo'], foo_data)


def test_error_handling(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf, 'w') as ds:
        ds.dimensions['x'] = 1
        with raises(ValueError):
            ds.dimensions['x'] = 2
        with raises(ValueError):
            ds.dimensions = {'x': 2}
        with raises(ValueError):
            ds.dimensions = {'y': 3}
        ds.create_variable('x', ('x',), dtype=float)
        with raises(ValueError):
            ds.create_variable('x', ('x',), dtype=float)
        ds.create_group('subgroup')
        with raises(ValueError):
            ds.create_group('subgroup')


def test_invalid_netcdf4(tmp_local_or_remote_netcdf):
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    with h5.File(tmp_local_or_remote_netcdf) as f:
        f.create_dataset('foo', data=np.arange(5))
        # labeled dimensions but no dimension scales
        f['foo'].dims[0].label = 'x'
    with h5netcdf.File(tmp_local_or_remote_netcdf, 'r') as ds:
        with raises(ValueError):
            ds.variables['foo'].dimensions


def test_hierarchical_access_auto_create(tmp_local_or_remote_netcdf):
    ds = h5netcdf.File(tmp_local_or_remote_netcdf, 'w')
    ds.create_variable('/foo/bar', data=1)
    g = ds.create_group('foo/baz')
    g.create_variable('/foo/hello', data=2)
    assert set(ds) == set(['foo'])
    assert set(ds['foo']) == set(['bar', 'baz', 'hello'])
    ds.close()

    ds = h5netcdf.File(tmp_local_or_remote_netcdf, 'r')
    assert set(ds) == set(['foo'])
    assert set(ds['foo']) == set(['bar', 'baz', 'hello'])
    ds.close()


def test_netcdf4Dimid(tmp_local_netcdf):
    # regression test for https://github.com/shoyer/h5netcdf/issues/53
    with h5netcdf.File(tmp_local_netcdf, 'w') as f:
        f.dimensions['x'] = 1
        g = f.create_group('foo')
        g.dimensions['x'] = 2
        g.dimensions['y'] = 3

    with h5py.File(tmp_local_netcdf) as f:
        assert f['x'].attrs['_Netcdf4Dimid'] == 0
        assert f['foo/x'].attrs['_Netcdf4Dimid'] == 1
        assert f['foo/y'].attrs['_Netcdf4Dimid'] == 2


def test_reading_str_array_from_netCDF4(tmp_local_netcdf):
    # This tests reading string variables created by netCDF4
    with netCDF4.Dataset(tmp_local_netcdf, 'w') as ds:
        ds.createDimension('foo1', _string_array.shape[0])
        ds.createDimension('foo2', _string_array.shape[1])
        ds.createVariable('bar', str, ('foo1', 'foo2'))
        ds.variables['bar'][:] = _string_array

    ds = h5netcdf.File(tmp_local_netcdf, 'r')
    v = ds.variables['bar']
    assert array_equal(v, _string_array)
    ds.close()


def test_nc_properties_new(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf):
        pass
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    with h5.File(tmp_local_or_remote_netcdf, 'r') as f:
        assert 'h5netcdf' in f.attrs['_NCProperties']


def test_failed_read_open_and_clean_delete(tmpdir):
    # A file that does not exist but is opened for
    # reading should only raise an IOError and
    # no AttributeError at garbage collection.
    path = str(tmpdir.join('this_file_does_not_exist.nc'))
    try:
        with h5netcdf.File(path, 'r') as ds:
            pass
    except IOError:
        pass

    # Look at garbage collection:
    # A simple gc.collect() does not raise an exception.
    # Must seek the File object and imitate its del command
    # by forcing it to close.
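    # The sweep below walks everything the garbage collector tracks, finds
    # any lingering h5netcdf.File instance, and closes it explicitly, which
    # imitates what File.__del__ would do at collection time. The isinstance
    # check is wrapped in try/except because some objects found this way
    # raise AttributeError when inspected.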
    obj_list = gc.get_objects()
    for obj in obj_list:
        try:
            is_h5netcdf_File = isinstance(obj, h5netcdf.File)
        except AttributeError:
            is_h5netcdf_File = False
        if is_h5netcdf_File:
            obj.close()


def test_create_variable_matching_saved_dimension(tmp_local_or_remote_netcdf):
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    if h5 is not h5py:
        pytest.xfail('https://github.com/shoyer/h5netcdf/issues/48')

    with h5netcdf.File(tmp_local_or_remote_netcdf) as f:
        f.dimensions['x'] = 2
        f.create_variable('y', data=[1, 2], dimensions=('x',))

    with h5.File(tmp_local_or_remote_netcdf) as f:
        assert f['y'].dims[0].keys() == [NOT_A_VARIABLE.decode('ascii')]

    with h5netcdf.File(tmp_local_or_remote_netcdf) as f:
        f.create_variable('x', data=[0, 1], dimensions=('x',))

    with h5.File(tmp_local_or_remote_netcdf) as f:
        assert f['y'].dims[0].keys() == ['x']


def test_invalid_netcdf_warns(tmp_local_or_remote_netcdf):
    if tmp_local_or_remote_netcdf.startswith(remote_h5):
        pytest.skip('h5pyd does not support NumPy complex dtype yet')
    with h5netcdf.File(tmp_local_or_remote_netcdf) as f:
        # valid
        with pytest.warns(None) as record:
            f.create_variable('lzf_compressed', data=[1], dimensions=('x',),
                              compression='lzf')
        assert not record.list
        # invalid
        with pytest.warns(FutureWarning):
            f.create_variable('complex', data=1j)
        with pytest.warns(FutureWarning):
            f.attrs['complex_attr'] = 1j
        with pytest.warns(FutureWarning):
            f.create_variable('scaleoffset', data=[1], dimensions=('x',),
                              scaleoffset=0)
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    with h5.File(tmp_local_or_remote_netcdf) as f:
        assert '_NCProperties' not in f.attrs


def test_invalid_netcdf_error(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf, 'w',
                       invalid_netcdf=False) as f:
        # valid
        f.create_variable('lzf_compressed', data=[1], dimensions=('x',),
                          compression='lzf')
        # invalid
        with pytest.raises(h5netcdf.CompatibilityError):
            f.create_variable('complex', data=1j)
        with pytest.raises(h5netcdf.CompatibilityError):
            f.attrs['complex_attr'] = 1j
        with pytest.raises(h5netcdf.CompatibilityError):
            f.create_variable('scaleoffset', data=[1], dimensions=('x',),
                              scaleoffset=0)


def test_invalid_netcdf_okay(tmp_local_or_remote_netcdf):
    if tmp_local_or_remote_netcdf.startswith(remote_h5):
        pytest.skip('h5pyd does not support NumPy complex dtype yet')
    with h5netcdf.File(tmp_local_or_remote_netcdf, invalid_netcdf=True) as f:
        f.create_variable('lzf_compressed', data=[1], dimensions=('x',),
                          compression='lzf')
        f.create_variable('complex', data=1j)
        f.attrs['complex_attr'] = 1j
        f.create_variable('scaleoffset', data=[1], dimensions=('x',),
                          scaleoffset=0)
    with h5netcdf.File(tmp_local_or_remote_netcdf) as f:
        np.testing.assert_equal(f['lzf_compressed'][:], [1])
        assert f['complex'][...] == 1j
        assert f.attrs['complex_attr'] == 1j
        np.testing.assert_equal(f['scaleoffset'][:], [1])
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    with h5.File(tmp_local_or_remote_netcdf) as f:
        assert '_NCProperties' not in f.attrs


def test_invalid_then_valid_no_ncproperties(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf, invalid_netcdf=True):
        pass
    with h5netcdf.File(tmp_local_or_remote_netcdf):
        pass
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    with h5.File(tmp_local_or_remote_netcdf) as f:
        # still not a valid netcdf file
        assert '_NCProperties' not in f.attrs


def test_creating_and_resizing_unlimited_dimensions(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf) as f:
        f.dimensions['x'] = None
        f.dimensions['y'] = 15
        f.dimensions['z'] = None
        f.resize_dimension('z', 20)
        with pytest.raises(ValueError) as e:
            f.resize_dimension('y', 20)
        assert e.value.args[0] == (
            "Dimension 'y' is not unlimited and thus cannot be resized.")
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    # Assert some behavior observed by using the C netCDF bindings.
    with h5.File(tmp_local_or_remote_netcdf) as f:
        assert f["x"].shape == (0,)
        assert f["x"].maxshape == (None,)
        assert f["y"].shape == (15,)
        assert f["y"].maxshape == (15,)
        assert f["z"].shape == (20,)
        assert f["z"].maxshape == (None,)


def test_creating_variables_with_unlimited_dimensions(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf) as f:
        f.dimensions['x'] = None
        f.dimensions['y'] = 2

        # Creating a variable without data will initialize an array with zero
        # length.
        f.create_variable('dummy', dimensions=('x', 'y'), dtype=np.int64)
        assert f.variables["dummy"].shape == (0, 2)
        assert f.variables["dummy"]._h5ds.maxshape == (None, 2)

        # Trying to create a variable while the current size of the dimension
        # is still zero will fail.
        with pytest.raises(ValueError) as e:
            f.create_variable('dummy2', data=np.array([[1, 2], [3, 4]]),
                              dimensions=('x', 'y'))
        assert e.value.args[0] == "Shape tuple is incompatible with data"

        # Resize data.
        assert f.variables["dummy"].shape == (0, 2)
        f.resize_dimension('x', 3)
        # This will also force a resize of the existing variables and it will
        # be padded with zeros.
        np.testing.assert_allclose(f.variables["dummy"], np.zeros((3, 2)))

        # Creating another variable with no data will now also take the shape
        # of the current dimensions.
        f.create_variable('dummy3', dimensions=('x', 'y'), dtype=np.int64)
        assert f.variables["dummy3"].shape == (3, 2)
        assert f.variables["dummy3"]._h5ds.maxshape == (None, 2)

    # Close and read again to also test correct parsing of unlimited
    # dimensions.
    with h5netcdf.File(tmp_local_or_remote_netcdf) as f:
        assert f.dimensions['x'] is None
        assert f._h5file['x'].maxshape == (None,)
        assert f._h5file['x'].shape == (3,)

        assert f.dimensions['y'] == 2
        assert f._h5file['y'].maxshape == (2,)
        assert f._h5file['y'].shape == (2,)


def test_writing_to_an_unlimited_dimension(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf) as f:
        # Two dimensions, only one is unlimited.
        f.dimensions['x'] = None
        f.dimensions['y'] = 3

        # Cannot create it without first resizing it.
        with pytest.raises(ValueError) as e:
            f.create_variable('dummy1', data=np.array([[1, 2, 3]]),
                              dimensions=('x', 'y'))
        assert e.value.args[0] == "Shape tuple is incompatible with data"

        # Without data.
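        # (Variables created without data on an unlimited dimension start
        # with length 0 along that axis and only grow when the dimension is
        # resized explicitly below.)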
        f.create_variable('dummy1', dimensions=('x', 'y'), dtype=np.int64)
        f.create_variable('dummy2', dimensions=('x', 'y'), dtype=np.int64)
        f.create_variable('dummy3', dimensions=('x', 'y'), dtype=np.int64)
        g = f.create_group('test')
        g.create_variable('dummy4', dimensions=('y', 'x', 'x'),
                          dtype=np.int64)
        g.create_variable('dummy5', dimensions=('y', 'y'), dtype=np.int64)

        assert f.variables['dummy1'].shape == (0, 3)
        assert f.variables['dummy2'].shape == (0, 3)
        assert f.variables['dummy3'].shape == (0, 3)
        assert g.variables['dummy4'].shape == (3, 0, 0)
        assert g.variables['dummy5'].shape == (3, 3)
        f.resize_dimension("x", 2)
        assert f.variables['dummy1'].shape == (2, 3)
        assert f.variables['dummy2'].shape == (2, 3)
        assert f.variables['dummy3'].shape == (2, 3)
        assert g.variables['dummy4'].shape == (3, 2, 2)
        assert g.variables['dummy5'].shape == (3, 3)

        f.variables['dummy2'][:] = [[1, 2, 3], [5, 6, 7]]
        np.testing.assert_allclose(f.variables['dummy2'],
                                   [[1, 2, 3], [5, 6, 7]])

        f.variables['dummy3'][...] = [[1, 2, 3], [5, 6, 7]]
        np.testing.assert_allclose(f.variables['dummy3'],
                                   [[1, 2, 3], [5, 6, 7]])


def test_c_api_can_read_unlimited_dimensions(tmp_local_netcdf):
    with h5netcdf.File(tmp_local_netcdf) as f:
        # Three dimensions, only one is limited.
        f.dimensions['x'] = None
        f.dimensions['y'] = 3
        f.dimensions['z'] = None
        f.create_variable('dummy1', dimensions=('x', 'y'), dtype=np.int64)
        f.create_variable('dummy2', dimensions=('y', 'x', 'x'),
                          dtype=np.int64)
        g = f.create_group('test')
        g.create_variable('dummy3', dimensions=('y', 'y'), dtype=np.int64)
        g.create_variable('dummy4', dimensions=('z', 'z'), dtype=np.int64)
        f.resize_dimension('x', 2)

    with netCDF4.Dataset(tmp_local_netcdf) as f:
        assert f.dimensions['x'].size == 2
        assert f.dimensions['x'].isunlimited() is True
        assert f.dimensions['y'].size == 3
        assert f.dimensions['y'].isunlimited() is False
        assert f.dimensions['z'].size == 0
        assert f.dimensions['z'].isunlimited() is True

        assert f.variables['dummy1'].shape == (2, 3)
        assert f.variables['dummy2'].shape == (3, 2, 2)
        g = f.groups["test"]
        assert g.variables['dummy3'].shape == (3, 3)
        assert g.variables['dummy4'].shape == (0, 0)


def test_reading_unlimited_dimensions_created_with_c_api(tmp_local_netcdf):
    with netCDF4.Dataset(tmp_local_netcdf, "w") as f:
        f.createDimension('x', None)
        f.createDimension('y', 3)
        f.createDimension('z', None)
        dummy1 = f.createVariable('dummy1', float, ('x', 'y'))
        f.createVariable('dummy2', float, ('y', 'x', 'x'))
        g = f.createGroup('test')
        g.createVariable('dummy3', float, ('y', 'y'))
        g.createVariable('dummy4', float, ('z', 'z'))

        # Assign something to trigger a resize.
        dummy1[:] = [[1, 2, 3], [4, 5, 6]]

    with h5netcdf.File(tmp_local_netcdf) as f:
        assert f.dimensions['x'] is None
        assert f.dimensions['y'] == 3
        assert f.dimensions['z'] is None

        # This is parsed correctly due to h5netcdf's init trickery.
        assert f._current_dim_sizes['x'] == 2
        assert f._current_dim_sizes['y'] == 3
        assert f._current_dim_sizes['z'] == 0

        # But the actual data-set and arrays are not correct.
        assert f['dummy1'].shape == (2, 3)
        # XXX: This array has some data with dimension x - netcdf does not
        # appear to keep dimensions consistent.
        assert f['dummy2'].shape == (3, 0, 0)
        assert f.groups['test']['dummy3'].shape == (3, 3)
        assert f.groups['test']['dummy4'].shape == (0, 0)


def test_reading_unused_unlimited_dimension(tmp_local_or_remote_netcdf):
    """Test reading a file with unused dimension of unlimited size"""
    with h5netcdf.File(tmp_local_or_remote_netcdf, 'w') as f:
        f.dimensions = {'x': None}
        f.resize_dimension('x', 5)
        assert f.dimensions == {'x': None}
    f = h5netcdf.File(tmp_local_or_remote_netcdf, 'r')
h5netcdf-0.7.1/h5netcdf/utils.py0000644131111500116100000000126513263424647016550 0ustar shoyereng00000000000000try:
    from collections.abc import Mapping  # Python 3.3+
except ImportError:
    from collections import Mapping  # Python 2

import numpy as np


class Frozen(Mapping):
    """Wrapper around an object implementing the mapping interface to make it
    immutable. If you really want to modify the mapping, the mutable version
    is saved under the `_mapping` attribute.
    """
    def __init__(self, mapping):
        self._mapping = mapping

    def __getitem__(self, key):
        return self._mapping[key]

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)

    def __contains__(self, key):
        return key in self._mapping

    def __repr__(self):
        return '%s(%r)' % (type(self).__name__, self._mapping)
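# A minimal usage sketch of Frozen (illustrative only, not part of the
# shipped module): reads behave like a plain dict, while mutation is only
# possible through the documented `_mapping` escape hatch, since Mapping
# provides no __setitem__.
if __name__ == '__main__':
    frozen = Frozen({'x': 5})
    assert frozen['x'] == 5                    # item access works
    assert 'x' in frozen and len(frozen) == 1  # so do the Mapping mixins
    assert repr(frozen) == "Frozen({'x': 5})"
    frozen._mapping['y'] = 10                  # mutate the wrapped mapping
    assert len(frozen) == 2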
h5netcdf-0.7.1/h5netcdf.egg-info/0000755131111500116100000000000013443342124016511 5ustar shoyereng00000000000000h5netcdf-0.7.1/h5netcdf.egg-info/PKG-INFO0000644131111500116100000002662713443342124017613 0ustar shoyereng00000000000000Metadata-Version: 1.1
Name: h5netcdf
Version: 0.7.1
Summary: netCDF4 via h5py
Home-page: https://github.com/shoyer/h5netcdf
Author: Stephan Hoyer
Author-email: shoyer@gmail.com
License: BSD
Description:
h5netcdf
========

.. image:: https://travis-ci.org/shoyer/h5netcdf.svg?branch=master
    :target: https://travis-ci.org/shoyer/h5netcdf
.. image:: https://badge.fury.io/py/h5netcdf.svg
    :target: https://pypi.python.org/pypi/h5netcdf/

A Python interface for the netCDF4_ file-format that reads and writes local
or remote HDF5 files directly via h5py_ or h5pyd_, without relying on the
Unidata netCDF library.

.. _netCDF4: http://www.unidata.ucar.edu/software/netcdf/docs/file_format_specifications.html#netcdf_4_spec
.. _h5py: http://www.h5py.org/
.. _h5pyd: https://github.com/HDFGroup/h5pyd

Why h5netcdf?
-------------

- We've seen occasional reports of better performance with h5py than
  netCDF4-python, though in many cases performance is identical. For
  `one workflow`_, h5netcdf was reported to be almost **4x faster** than
  `netCDF4-python`_.
- It has one less massive binary dependency (netCDF C). If you already have
  h5py installed, reading netCDF4 with h5netcdf may be much easier than
  installing netCDF4-Python.
- Anecdotally, HDF5 users seem to be unexcited about switching to netCDF --
  hopefully this will convince them that netCDF4 is actually quite sane!
- Finally, side-stepping the netCDF C library (and Cython bindings to it)
  gives us an easier way to identify the source of performance issues and
  bugs.

.. _one workflow: https://github.com/Unidata/netcdf4-python/issues/390#issuecomment-93864839
.. _xarray: http://github.com/pydata/xarray/

Install
-------

Ensure you have a recent version of h5py installed (I recommend using
conda_). At least version 2.1 is required (for dimension scales); versions
2.3 and newer have been verified to work, though some tests only pass on
h5py 2.6. Then: ``pip install h5netcdf``

.. _conda: http://conda.io/

Usage
-----

h5netcdf has two APIs, a new API and a legacy API. Both interfaces currently
reproduce most of the features of the netCDF interface, with the notable
exception of support for operations that rename or delete existing objects.
We simply haven't gotten around to implementing this yet. Patches would be
very welcome.

New API
~~~~~~~

The new API supports direct hierarchical access of variables and groups. Its
design is an adaptation of h5py to the netCDF data model. For example:

.. code-block:: python

    import h5netcdf
    import numpy as np

    with h5netcdf.File('mydata.nc', 'w') as f:
        # set dimensions with a dictionary
        f.dimensions = {'x': 5}
        # and update them with a dict-like interface
        # f.dimensions['x'] = 5
        # f.dimensions.update({'x': 5})

        v = f.create_variable('hello', ('x',), float)
        v[:] = np.ones(5)

        # you don't need to create groups first
        # you also don't need to create dimensions first if you supply data
        # with the new variable
        v = f.create_variable('/grouped/data', ('y',), data=np.arange(10))

        # access and modify attributes with a dict-like interface
        v.attrs['foo'] = 'bar'

        # you can access variables and groups directly using hierarchical
        # keys, like h5py
        print(f['/grouped/data'])

        # add an unlimited dimension
        f.dimensions['z'] = None
        # explicitly resize a dimension and all variables using it
        f.resize_dimension('z', 3)

Legacy API
~~~~~~~~~~

The legacy API is designed for compatibility with netCDF4-python_. To use it,
import ``h5netcdf.legacyapi``:

.. _netCDF4-python: https://github.com/Unidata/netcdf4-python

.. code-block:: python

    import h5netcdf.legacyapi as netCDF4
    # everything here would also work with this instead:
    # import netCDF4
    import numpy as np

    with netCDF4.Dataset('mydata.nc', 'w') as ds:
        ds.createDimension('x', 5)
        v = ds.createVariable('hello', float, ('x',))
        v[:] = np.ones(5)

        g = ds.createGroup('grouped')
        g.createDimension('y', 10)
        g.createVariable('data', 'i8', ('y',))
        v = g['data']
        v[:] = np.arange(10)
        v.foo = 'bar'
        print(ds.groups['grouped'].variables['data'])

The legacy API is designed to be easy to try out for netCDF4-python users,
but it is not an exact match. Here is an incomplete list of functionality we
don't include:

- Utility functions ``chartostring``, ``num2date``, etc., that are not
  directly necessary for writing netCDF files.
- We don't support the ``endian`` argument to ``createVariable`` yet (see
  `GitHub issue`_).
- h5netcdf variables do not support automatic masking or scaling (e.g., of
  values matching the ``_FillValue`` attribute). We prefer to leave this
  functionality to client libraries (e.g., xarray_), which can implement
  their exact desired scaling behavior.
- No support yet for automatic resizing of unlimited dimensions with array
  indexing. This would be a welcome pull request. For now, dimensions can be
  manually resized with ``Group.resize_dimension(dimension, size)``, as the
  sketch after this list shows.

.. _GitHub issue: https://github.com/shoyer/h5netcdf/issues/15
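As a sketch of that last point, manually growing an unlimited dimension with
the legacy API might look like this (the file and variable names are made up,
and we assume ``createDimension`` accepts ``None`` for an unlimited size, as
in netCDF4-python):

.. code-block:: python

    import h5netcdf.legacyapi as netCDF4
    import numpy as np

    with netCDF4.Dataset('unlimited.nc', 'w') as ds:
        ds.createDimension('t', None)  # unlimited dimension, current size 0
        v = ds.createVariable('series', float, ('t',))
        # indexing past the end does not auto-resize, so grow the
        # dimension first and then assign
        ds.resize_dimension('t', 3)
        v[:] = np.arange(3.0)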
Invalid netCDF files
~~~~~~~~~~~~~~~~~~~~

h5py implements some features that do not (yet) result in valid netCDF
files:

- Data types:

  - Booleans
  - Complex values
  - Non-string variable length types
  - Enum types
  - Reference types

- Arbitrary filters:

  - Scale-offset filters

By default [*]_, h5netcdf will not allow writing files using any of these
features, as files with such features are not readable by other netCDF
tools.

However, these are still valid HDF5 files. If you don't care about netCDF
compatibility, you can use these features by setting ``invalid_netcdf=True``
when creating a file:

.. code-block:: python

    # avoid the .nc extension for non-netcdf files
    f = h5netcdf.File('mydata.h5', invalid_netcdf=True)
    ...

    # works with the legacy API, too, though compression options are not
    # exposed
    ds = h5netcdf.legacyapi.Dataset('mydata.h5', invalid_netcdf=True)
    ...

.. [*] Currently, we only issue a warning, but in a future version of
   h5netcdf, we will raise ``h5netcdf.CompatibilityError``. Use
   ``invalid_netcdf=False`` to switch to the new behavior now.
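To make the default concrete: with no ``invalid_netcdf`` argument the write
succeeds and emits a ``FutureWarning``, while ``invalid_netcdf=False`` raises
instead. A small sketch (the file and attribute names are made up):

.. code-block:: python

    import warnings
    import h5netcdf

    with h5netcdf.File('demo.h5', 'w') as f:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter('always')
            f.attrs['c'] = 1j  # complex values are not valid netCDF
        assert any(issubclass(w.category, FutureWarning) for w in caught)

    with h5netcdf.File('demo2.h5', 'w', invalid_netcdf=False) as f:
        try:
            f.attrs['c'] = 1j
        except h5netcdf.CompatibilityError:
            pass  # raised instead of warned when invalid_netcdf=False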
Change Log
----------

Version 0.7.1 (Mar 16, 2019):

- Fixed a bug where h5netcdf could write invalid netCDF files with reused
  dimension IDs. netCDF-C 4.6.2 will crash when reading these files.
- Updated to use version 2 of ``_NCProperties`` attribute.

Version 0.7 (Feb 26, 2019):

- Support for reading and writing file-like objects (requires h5py 2.9 or
  newer). By Scott Henderson.

Version 0.6.2 (Aug 19, 2018):

- Fixed a bug that prevented creating variables with the same name as
  previously created dimensions in reopened files.

Version 0.6.1 (Jun 8, 2018):

- Compression with arbitrary filters no longer triggers warnings about
  invalid netCDF files, because this is now supported by netCDF.

Version 0.6 (Jun 7, 2018):

- Support for reading and writing data to remote HDF5 files via the HDF5
  REST API using the h5pyd_ package. Any file "path" starting with either
  ``http://``, ``https://``, or ``hdf5://`` will automatically trigger the
  use of this package. By Aleksandar Jelenak.

Version 0.5.1 (Apr 11, 2018):

- Bug fix for files with an unlimited dimension with no associated
  variables. By Aleksandar Jelenak.

Version 0.5 (Oct 17, 2017):

- Support for creating unlimited dimensions. By Lion Krischer.

Version 0.4.3 (Oct 10, 2017):

- Fix test suite failure with recent versions of netCDF4-Python.

Version 0.4.2 (Sep 12, 2017):

- Raise ``AttributeError`` rather than ``KeyError`` when attributes are not
  found using the legacy API. This fixes an issue that prevented writing to
  h5netcdf with dask.

Version 0.4.1 (Sep 6, 2017):

- Include tests in source distribution on pypi.

Version 0.4 (Aug 30, 2017):

- Add ``invalid_netcdf`` argument. Warnings are now issued by default when
  writing an invalid NetCDF file. See the "Invalid netCDF files" section of
  the README for full details.

Version 0.3.1 (Sep 2, 2016):

- Fix garbage collection issue.
- Add missing ``.flush()`` method for groups.
- Allow creating dimensions of size 0.

Version 0.3.0 (Aug 7, 2016):

- Datasets are now loaded lazily. This should increase performance when
  opening files with a large number of groups and/or variables.
- Support for writing arrays of variable length unicode strings with
  ``dtype=str`` via the legacy API.
- h5netcdf now writes the ``_NCProperties`` attribute for identifying netCDF4
  files.

License
-------

`3-clause BSD`_

.. _3-clause BSD: https://github.com/shoyer/h5netcdf/blob/master/LICENSE

Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Scientific/Engineering
h5netcdf-0.7.1/h5netcdf.egg-info/SOURCES.txt0000644131111500116100000000065313443342124020401 0ustar shoyereng00000000000000LICENSE
MANIFEST.in
README.rst
setup.cfg
setup.py
h5netcdf/__init__.py
h5netcdf/_chainmap.py
h5netcdf/attrs.py
h5netcdf/compat.py
h5netcdf/core.py
h5netcdf/dimensions.py
h5netcdf/legacyapi.py
h5netcdf/utils.py
h5netcdf.egg-info/PKG-INFO
h5netcdf.egg-info/SOURCES.txt
h5netcdf.egg-info/dependency_links.txt
h5netcdf.egg-info/requires.txt
h5netcdf.egg-info/top_level.txt
h5netcdf/tests/conftest.py
h5netcdf/tests/test_h5netcdf.py
h5netcdf-0.7.1/h5netcdf.egg-info/dependency_links.txt0000644131111500116100000000000113443342124022557 0ustar shoyereng00000000000000
h5netcdf-0.7.1/h5netcdf.egg-info/requires.txt0000644131111500116100000000000513443342124021104 0ustar shoyereng00000000000000h5py
h5netcdf-0.7.1/h5netcdf.egg-info/top_level.txt0000644131111500116100000000001113443342124021233 0ustar shoyereng00000000000000h5netcdf
h5netcdf-0.7.1/setup.cfg0000644131111500116100000000007513443342124015142 0ustar shoyereng00000000000000[wheel]
universal = 1

[egg_info]
tag_build =
tag_date = 0
h5netcdf-0.7.1/setup.py0000644131111500116100000000216013443342065015034 0ustar shoyereng00000000000000import os
from setuptools import setup, find_packages

CLASSIFIERS = [
    'Development Status :: 4 - Beta',
    'License :: OSI Approved :: BSD License',
    'Operating System :: OS Independent',
    'Intended Audience :: Science/Research',
    'Programming Language :: Python',
    'Programming Language :: Python :: 2',
    'Programming Language :: Python :: 2.7',
    'Programming Language :: Python :: 3',
    'Programming Language :: Python :: 3.4',
    'Programming Language :: Python :: 3.5',
    'Programming Language :: Python :: 3.6',
    'Programming Language :: Python :: 3.7',
    'Topic :: Scientific/Engineering',
]

setup(name='h5netcdf',
      description='netCDF4 via h5py',
      long_description=(open('README.rst').read()
                        if os.path.exists('README.rst') else ''),
      version='0.7.1',
      license='BSD',
      classifiers=CLASSIFIERS,
      author='Stephan Hoyer',
      author_email='shoyer@gmail.com',
      url='https://github.com/shoyer/h5netcdf',
      install_requires=['h5py'],
      tests_require=['netCDF4', 'pytest'],
      packages=find_packages())
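# One possible way to exercise this source tree after unpacking (hypothetical
# shell commands; tests_require above lists what the test suite needs beyond
# the h5py install requirement):
#
#   pip install h5py netCDF4 pytest
#   pip install .
#   pytest h5netcdf/tests/test_h5netcdf.py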