h5netcdf-0.12.0/LICENSE

Copyright (c) 2015, Stephan Hoyer
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors
   may be used to endorse or promote products derived from this software
   without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

h5netcdf-0.12.0/MANIFEST.in

include LICENSE
recursive-include h5netcdf/tests *.py

h5netcdf-0.12.0/PKG-INFO

Metadata-Version: 2.1
Name: h5netcdf
Version: 0.12.0
Summary: netCDF4 via h5py
Home-page: https://github.com/h5netcdf/h5netcdf
Author: Stephan Hoyer
Author-email: shoyer@gmail.com
License: BSD
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.6
License-File: LICENSE

h5netcdf-0.12.0/README.rst

h5netcdf
========

.. image:: https://github.com/h5netcdf/h5netcdf/workflows/CI/badge.svg
    :target: https://github.com/h5netcdf/h5netcdf/actions
.. image:: https://badge.fury.io/py/h5netcdf.svg
    :target: https://pypi.python.org/pypi/h5netcdf/

A Python interface for the netCDF4_ file-format that reads and writes local or
remote HDF5 files directly via h5py_ or h5pyd_, without relying on the Unidata
netCDF library.

.. _netCDF4: http://www.unidata.ucar.edu/software/netcdf/docs/file_format_specifications.html#netcdf_4_spec
.. _h5py: http://www.h5py.org/
.. _h5pyd: https://github.com/HDFGroup/h5pyd

Why h5netcdf?
-------------

- It has one less binary dependency (netCDF C). If you already have h5py
  installed, reading netCDF4 with h5netcdf may be much easier than installing
  netCDF4-Python.
- We've seen occasional reports of better performance with h5py than
  netCDF4-python, though in many cases performance is identical. For
  `one workflow`_, h5netcdf was reported to be almost **4x faster** than
  `netCDF4-python`_.
- Anecdotally, HDF5 users seem to be unexcited about switching to netCDF --
  hopefully this will convince them that netCDF4 is actually quite sane!
- Finally, side-stepping the netCDF C library (and Cython bindings to it)
  gives us an easier way to identify the source of performance issues and
  bugs in the netCDF libraries/specification.

.. _one workflow: https://github.com/Unidata/netcdf4-python/issues/390#issuecomment-93864839
.. _xarray: http://github.com/pydata/xarray/

Install
-------

Ensure you have a recent version of h5py installed (I recommend using conda_).
At least version 2.1 is required (for dimension scales); versions 2.3 and
newer have been verified to work, though some tests only pass on h5py 2.6.
Then: ``pip install h5netcdf``

.. _conda: http://conda.io/

Usage
-----

h5netcdf has two APIs, a new API and a legacy API. Both interfaces currently
reproduce most of the features of the netCDF interface, with the notable
exception of support for operations that rename or delete existing objects.
We simply haven't gotten around to implementing this yet. Patches would be
very welcome.

New API
~~~~~~~

The new API supports direct hierarchical access of variables and groups. Its
design is an adaptation of h5py to the netCDF data model. For example:

.. code-block:: python

    import h5netcdf
    import numpy as np

    with h5netcdf.File('mydata.nc', 'w') as f:
        # set dimensions with a dictionary
        f.dimensions = {'x': 5}
        # and update them with a dict-like interface
        # f.dimensions['x'] = 5
        # f.dimensions.update({'x': 5})

        v = f.create_variable('hello', ('x',), float)
        v[:] = np.ones(5)

        # you don't need to create groups first
        # you also don't need to create dimensions first if you supply data
        # with the new variable
        v = f.create_variable('/grouped/data', ('y',), data=np.arange(10))

        # access and modify attributes with a dict-like interface
        v.attrs['foo'] = 'bar'

        # you can access variables and groups directly using hierarchical
        # keys, like h5py
        print(f['/grouped/data'])

        # add an unlimited dimension
        f.dimensions['z'] = None
        # explicitly resize a dimension and all variables using it
        f.resize_dimension('z', 3)

Legacy API
~~~~~~~~~~

The legacy API is designed for compatibility with netCDF4-python_. To use it,
import ``h5netcdf.legacyapi``:

.. _netCDF4-python: https://github.com/Unidata/netcdf4-python

.. code-block:: python

    import h5netcdf.legacyapi as netCDF4
    # everything here would also work with this instead:
    # import netCDF4
    import numpy as np

    with netCDF4.Dataset('mydata.nc', 'w') as ds:
        ds.createDimension('x', 5)
        v = ds.createVariable('hello', float, ('x',))
        v[:] = np.ones(5)

        g = ds.createGroup('grouped')
        g.createDimension('y', 10)
        g.createVariable('data', 'i8', ('y',))
        v = g['data']
        v[:] = np.arange(10)
        v.foo = 'bar'
        print(ds.groups['grouped'].variables['data'])

The legacy API is designed to be easy to try out for netCDF4-python users,
but it is not an exact match. Here is an incomplete list of functionality we
don't include:

- Utility functions ``chartostring``, ``num2date``, etc., that are not
  directly necessary for writing netCDF files.
- We don't support the ``endian`` argument to ``createVariable`` yet (see
  `GitHub issue`_).
- h5netcdf variables do not support automatic masking or scaling (e.g., of
  values matching the ``_FillValue`` attribute). We prefer to leave this
  functionality to client libraries (e.g., xarray_), which can implement
  their exact desired scaling behavior.
- No support yet for automatic resizing of unlimited dimensions with array
  indexing. This would be a welcome pull request. For now, dimensions can be
  manually resized with ``Group.resize_dimension(dimension, size)``, as
  sketched below.

.. _GitHub issue: https://github.com/h5netcdf/h5netcdf/issues/15
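
For instance, growing an unlimited dimension by hand with the legacy API
looks like this (a minimal sketch; the file and variable names are
illustrative):

.. code-block:: python

    import h5netcdf.legacyapi as netCDF4
    import numpy as np

    with netCDF4.Dataset('resizable.nc', 'w') as ds:
        ds.createDimension('t', None)  # unlimited dimension, current size 0
        v = ds.createVariable('series', 'f8', ('t',))
        # grow the dimension (and every variable using it) explicitly,
        # then assign
        ds.resize_dimension('t', 3)
        v[:] = np.arange(3.0)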

Invalid netCDF files
~~~~~~~~~~~~~~~~~~~~

h5py implements some features that do not (yet) result in valid netCDF files:

- Data types:

  - Booleans
  - Complex values
  - Non-string variable length types
  - Enum types
  - Reference types

- Arbitrary filters:

  - Scale-offset filters

By default [*]_, h5netcdf will not allow writing files using any of these
features, as files with such features are not readable by other netCDF tools.
However, these are still valid HDF5 files. If you don't care about netCDF
compatibility, you can use these features by setting ``invalid_netcdf=True``
when creating a file:

.. code-block:: python

    # avoid the .nc extension for non-netcdf files
    f = h5netcdf.File('mydata.h5', invalid_netcdf=True)
    ...

    # works with the legacy API, too, though compression options are not
    # exposed
    ds = h5netcdf.legacyapi.Dataset('mydata.h5', invalid_netcdf=True)
    ...

.. [*] Without ``invalid_netcdf=True``, h5netcdf will raise
   ``h5netcdf.CompatibilityError``.

Decoding variable length strings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

h5py 3.0 introduced `new behavior`_ for handling variable length strings.
Instead of being automatically decoded with UTF-8 into NumPy arrays of
``str``, they are returned as arrays of ``bytes``.

The legacy API preserves the old behavior of h5py (which matches netCDF4),
and automatically decodes strings.

The new API *also* currently preserves the old behavior of h5py, but issues a
warning that it will change in the future to match h5py. Explicitly set
``decode_vlen_strings=False`` in the ``h5netcdf.File`` constructor to opt in
to the new behavior early, or set ``decode_vlen_strings=True`` to opt in to
automatic decoding.

.. _new behavior: https://docs.h5py.org/en/stable/strings.html
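
With h5py >= 3.0, a minimal sketch of both settings (the file name is
illustrative):

.. code-block:: python

    import h5netcdf

    # keep netCDF4-like decoded ``str`` and silence the FutureWarning
    with h5netcdf.File('mydata.nc', 'r', decode_vlen_strings=True) as f:
        ...

    # opt in to the future default, matching h5py >= 3.0: raw ``bytes``
    with h5netcdf.File('mydata.nc', 'r', decode_vlen_strings=False) as f:
        ...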

Datasets with missing dimension scales
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default [*]_ h5netcdf raises a ``ValueError`` if variables with no
dimension scale associated with one of their axes are accessed. You can set
``phony_dims='sort'`` when opening a file to let h5netcdf invent phony
dimensions according to `netCDF`_ behaviour.

.. code-block:: python

    # mimic netCDF-behaviour for non-netcdf files
    f = h5netcdf.File('mydata.h5', mode='r', phony_dims='sort')
    ...

Note that this iterates once over the whole group hierarchy, which can affect
performance if you rely on lazy group access. You can set
``phony_dims='access'`` instead to defer phony dimension creation to group
access time. The created phony dimension naming will differ from `netCDF`_
behaviour.

.. code-block:: python

    f = h5netcdf.File('mydata.h5', mode='r', phony_dims='access')
    ...

.. _netCDF: https://www.unidata.ucar.edu/software/netcdf/docs/interoperability_hdf5.html

.. [*] The keyword default ``phony_dims=None`` is kept for backwards
   compatibility.
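
End to end, a minimal sketch of reading a plain HDF5 file with invented
dimensions (the file name is illustrative):

.. code-block:: python

    import h5py
    import h5netcdf
    import numpy as np

    # a plain HDF5 file without dimension scales
    with h5py.File('raw.h5', 'w') as f:
        f.create_dataset('data', data=np.zeros((5, 10)))

    with h5netcdf.File('raw.h5', mode='r', phony_dims='sort') as ds:
        print(ds['data'].dimensions)  # ('phony_dim_0', 'phony_dim_1')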
""" from .core import CompatibilityError, File, Group, Variable, __version__ # noqa ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1639987165.0 h5netcdf-0.12.0/h5netcdf/attrs.py0000644000175100001710000000330500000000000016042 0ustar00runnerdockerfrom collections.abc import MutableMapping import numpy as np _HIDDEN_ATTRS = frozenset( [ "REFERENCE_LIST", "CLASS", "DIMENSION_LIST", "NAME", "_Netcdf4Dimid", "_Netcdf4Coordinates", "_nc3_strict", "_NCProperties", ] ) class Attributes(MutableMapping): def __init__(self, h5attrs, check_dtype): self._h5attrs = h5attrs self._check_dtype = check_dtype def __getitem__(self, key): import h5py if key in _HIDDEN_ATTRS: raise KeyError(key) # see https://github.com/h5netcdf/h5netcdf/issues/94 for details if isinstance(self._h5attrs[key], h5py.Empty): string_info = h5py.check_string_dtype(self._h5attrs[key].dtype) if string_info and string_info.length == 1: return b"" return self._h5attrs[key] def __setitem__(self, key, value): if key in _HIDDEN_ATTRS: raise AttributeError("cannot write attribute with reserved name %r" % key) if hasattr(value, "dtype"): dtype = value.dtype else: dtype = np.asarray(value).dtype self._check_dtype(dtype) self._h5attrs[key] = value def __delitem__(self, key): del self._h5attrs[key] def __iter__(self): for key in self._h5attrs: if key not in _HIDDEN_ATTRS: yield key def __len__(self): hidden_count = sum(1 if attr in self._h5attrs else 0 for attr in _HIDDEN_ATTRS) return len(self._h5attrs) - hidden_count def __repr__(self): return "\n".join( ["%r" % type(self)] + ["%s: %r" % (k, v) for k, v in self.items()] ) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1639987165.0 h5netcdf-0.12.0/h5netcdf/core.py0000644000175100001710000007601200000000000015642 0ustar00runnerdocker# For details on how netCDF4 builds on HDF5: # http://www.unidata.ucar.edu/software/netcdf/docs/file_format_specifications.html#netcdf_4_spec import os.path import warnings import weakref from collections import ChainMap, OrderedDict, defaultdict from collections.abc import Mapping from distutils.version import LooseVersion import h5py import numpy as np from .attrs import Attributes from .dimensions import Dimensions from .utils import Frozen try: import h5pyd except ImportError: no_h5pyd = True h5_group_types = (h5py.Group,) h5_dataset_types = (h5py.Dataset,) else: no_h5pyd = False h5_group_types = (h5py.Group, h5pyd.Group) h5_dataset_types = (h5py.Dataset, h5pyd.Dataset) __version__ = "0.12.0" _NC_PROPERTIES = "version=2,h5netcdf=%s,hdf5=%s,h5py=%s" % ( __version__, h5py.version.hdf5_version, h5py.__version__, ) NOT_A_VARIABLE = b"This is a netCDF dimension but not a netCDF variable." def _reverse_dict(dict_): return dict(zip(dict_.values(), dict_.keys())) def _join_h5paths(parent_path, child_path): return "/".join([parent_path.rstrip("/"), child_path.lstrip("/")]) def _name_from_dimension(dim): # First value in a dimension is the actual dimension scale # which we'll use to extract the name. 

h5netcdf-0.12.0/h5netcdf/core.py

# For details on how netCDF4 builds on HDF5:
# http://www.unidata.ucar.edu/software/netcdf/docs/file_format_specifications.html#netcdf_4_spec
import os.path
import warnings
import weakref
from collections import ChainMap, OrderedDict, defaultdict
from collections.abc import Mapping
from distutils.version import LooseVersion

import h5py
import numpy as np

from .attrs import Attributes
from .dimensions import Dimensions
from .utils import Frozen

try:
    import h5pyd
except ImportError:
    no_h5pyd = True
    h5_group_types = (h5py.Group,)
    h5_dataset_types = (h5py.Dataset,)
else:
    no_h5pyd = False
    h5_group_types = (h5py.Group, h5pyd.Group)
    h5_dataset_types = (h5py.Dataset, h5pyd.Dataset)

__version__ = "0.12.0"


_NC_PROPERTIES = "version=2,h5netcdf=%s,hdf5=%s,h5py=%s" % (
    __version__,
    h5py.version.hdf5_version,
    h5py.__version__,
)

NOT_A_VARIABLE = b"This is a netCDF dimension but not a netCDF variable."


def _reverse_dict(dict_):
    return dict(zip(dict_.values(), dict_.keys()))


def _join_h5paths(parent_path, child_path):
    return "/".join([parent_path.rstrip("/"), child_path.lstrip("/")])


def _name_from_dimension(dim):
    # First value in a dimension is the actual dimension scale
    # which we'll use to extract the name.
    return dim[0].name.split("/")[-1]


class CompatibilityError(Exception):
    """Raised when using features that are not part of the NetCDF4 API."""


def _invalid_netcdf_feature(feature, allow):
    if not allow:
        msg = (
            "{} are not a supported NetCDF feature, and are not allowed by "
            "h5netcdf unless invalid_netcdf=True.".format(feature)
        )
        raise CompatibilityError(msg)


class BaseVariable(object):
    def __init__(self, parent, name, dimensions=None):
        self._parent_ref = weakref.ref(parent)
        self._root_ref = weakref.ref(parent._root)
        self._h5path = _join_h5paths(parent.name, name)
        self._dimensions = dimensions
        self._initialized = True

    @property
    def _parent(self):
        return self._parent_ref()

    @property
    def _root(self):
        return self._root_ref()

    @property
    def _h5ds(self):
        # Always refer to the root file and store not h5py object
        # subclasses:
        return self._root._h5file[self._h5path]

    @property
    def name(self):
        # fix name if _nc4_non_coord_
        return self._h5ds.name.replace("_nc4_non_coord_", "")

    def _lookup_dimensions(self):
        attrs = self._h5ds.attrs
        if "_Netcdf4Coordinates" in attrs:
            order_dim = _reverse_dict(self._parent._dim_order)
            return tuple(
                order_dim[coord_id] for coord_id in attrs["_Netcdf4Coordinates"]
            )

        child_name = self.name.split("/")[-1]
        if child_name in self._parent.dimensions:
            return (child_name,)

        dims = []
        phony_dims = defaultdict(int)
        for axis, dim in enumerate(self._h5ds.dims):
            # get current dimension
            dimsize = self.shape[axis]
            phony_dims[dimsize] += 1
            if len(dim):
                name = _name_from_dimension(dim)
            else:
                # if unlabeled dimensions are found
                if self._root._phony_dims_mode is None:
                    raise ValueError(
                        "variable %r has no dimension scale "
                        "associated with axis %s. \n"
                        "Use phony_dims=%r for sorted naming or "
                        "phony_dims=%r for per access naming."
                        % (self.name, axis, "sort", "access")
                    )
                else:
                    # get dimension name
                    name = self._parent._phony_dims[(dimsize, phony_dims[dimsize] - 1)]
            dims.append(name)
        return tuple(dims)

    @property
    def dimensions(self):
        if self._dimensions is None:
            self._dimensions = self._lookup_dimensions()
        return self._dimensions

    @property
    def shape(self):
        return self._h5ds.shape

    @property
    def ndim(self):
        return len(self.shape)

    def __len__(self):
        return self.shape[0]

    @property
    def dtype(self):
        return self._h5ds.dtype

    def __array__(self, *args, **kwargs):
        return self._h5ds.__array__(*args, **kwargs)

    def __getitem__(self, key):
        if getattr(self._root, "decode_vlen_strings", False):
            string_info = h5py.check_string_dtype(self._h5ds.dtype)
            if string_info and string_info.length is None:
                return self._h5ds.asstr()[key]
        return self._h5ds[key]

    def __setitem__(self, key, value):
        self._h5ds[key] = value

    @property
    def attrs(self):
        return Attributes(self._h5ds.attrs, self._root._check_valid_netcdf_dtype)

    _cls_name = "h5netcdf.Variable"

    def __repr__(self):
        if self._parent._root._closed:
            return "<Closed %s>" % self._cls_name
        header = "<%s %r: dimensions %s, shape %s, dtype %s>" % (
            self._cls_name,
            self.name,
            self.dimensions,
            self.shape,
            self.dtype,
        )
        return "\n".join(
            [header]
            + ["Attributes:"]
            + ["    %s: %r" % (k, v) for k, v in self.attrs.items()]
        )


class Variable(BaseVariable):
    @property
    def chunks(self):
        return self._h5ds.chunks

    @property
    def compression(self):
        return self._h5ds.compression

    @property
    def compression_opts(self):
        return self._h5ds.compression_opts

    @property
    def fletcher32(self):
        return self._h5ds.fletcher32

    @property
    def shuffle(self):
        return self._h5ds.shuffle


class _LazyObjectLookup(Mapping):
    def __init__(self, parent, object_cls):
        self._parent_ref = weakref.ref(parent)
        self._object_cls = object_cls
        self._objects = OrderedDict()

    @property
    def _parent(self):
        return self._parent_ref()

    def __setitem__(self, name, obj):
        self._objects[name] = obj

    def add(self, name):
        self._objects[name] = None

    def __iter__(self):
        for name in self._objects:
            # fix variable name for variable which clashes with dim name
            yield name.replace("_nc4_non_coord_", "")

    def __len__(self):
        return len(self._objects)

    def __getitem__(self, key):
        # check for _nc4_non_coord_ variable
        if key not in self._objects and "_nc4_non_coord_" + key in self._objects:
            key = "_nc4_non_coord_" + key
        if self._objects[key] is not None:
            return self._objects[key]
        else:
            self._objects[key] = self._object_cls(self._parent, key)
            return self._objects[key]


def _netcdf_dimension_but_not_variable(h5py_dataset):
    return NOT_A_VARIABLE in h5py_dataset.attrs.get("NAME", b"")


def _unlabeled_dimension_mix(h5py_dataset):
    dims = sum([len(j) for j in h5py_dataset.dims])
    if dims:
        if dims != h5py_dataset.ndim:
            name = h5py_dataset.name.split("/")[-1]
            raise ValueError(
                "malformed variable {0} has mixing of labeled and "
                "unlabeled dimensions.".format(name)
            )
    return dims


class Group(Mapping):

    _variable_cls = Variable

    @property
    def _group_cls(self):
        return Group

    def __init__(self, parent, name):
        self._parent_ref = weakref.ref(parent)
        self._root_ref = weakref.ref(parent._root)
        self._h5path = _join_h5paths(parent.name, name)

        if parent is not self:
            self._dim_sizes = parent._dim_sizes.new_child()
            self._current_dim_sizes = parent._current_dim_sizes.new_child()
            self._dim_order = parent._dim_order.new_child()
            self._all_h5groups = parent._all_h5groups.new_child(self._h5group)

        self._variables = _LazyObjectLookup(self, self._variable_cls)
        self._groups = _LazyObjectLookup(self, self._group_cls)

        # initialize phony dimension counter
        if self._root._phony_dims_mode is not None:
            self._phony_dims = {}
            phony_dims = defaultdict(int)
            labeled_dims = defaultdict(int)

        for k, v in self._h5group.items():
            if isinstance(v, h5_group_types):
                # add to the groups collection if this is a h5py(d) Group
                # instance
                self._groups.add(k)
            else:
                if v.attrs.get("CLASS") == b"DIMENSION_SCALE":
                    dim_id = v.attrs.get("_Netcdf4Dimid")
                    if "_Netcdf4Coordinates" in v.attrs:
                        assert dim_id is not None
                        coord_ids = v.attrs["_Netcdf4Coordinates"]
                        size = v.shape[list(coord_ids).index(dim_id)]
                        current_size = size
                    else:
                        assert len(v.shape) == 1
                        # Unlimited dimensions are represented as None.
                        size = None if v.maxshape == (None,) else v.size
                        current_size = v.size

                    self._dim_sizes[k] = size

                    # keep track of found labeled dimensions
                    if self._root._phony_dims_mode is not None:
                        labeled_dims[size] += 1
                        self._phony_dims[(size, labeled_dims[size] - 1)] = k

                    # Figure out the current size of a dimension, which for
                    # unlimited dimensions requires looking at the actual
                    # variables.
                    self._current_dim_sizes[k] = self._determine_current_dimension_size(
                        k, current_size
                    )

                    self._dim_order[k] = dim_id
                else:
                    if self._root._phony_dims_mode is not None:
                        # check if malformed variable
                        if not _unlabeled_dimension_mix(v):
                            # if unscaled variable, get phony dimensions
                            vdims = defaultdict(int)
                            for i in v.shape:
                                vdims[i] += 1
                            for dimsize, cnt in vdims.items():
                                phony_dims[dimsize] = max(phony_dims[dimsize], cnt)

                if not _netcdf_dimension_but_not_variable(v):
                    if isinstance(v, h5_dataset_types):
                        self._variables.add(k)

        # iterate over found phony dimensions and create them
        if self._root._phony_dims_mode is not None:
            grp_phony_count = 0
            for size, cnt in phony_dims.items():
                # only create missing dimensions
                for pcnt in range(labeled_dims[size], cnt):
                    name = grp_phony_count + self._root._phony_dim_count
                    grp_phony_count += 1
                    if self._root._phony_dims_mode == "access":
                        name = "phony_dim_{}".format(name)
                        self._create_dimension(name, size)
                    self._phony_dims[(size, pcnt)] = name
            # finally increase phony dim count at file level
            self._root._phony_dim_count += grp_phony_count

        self._initialized = True

    @property
    def _root(self):
        return self._root_ref()

    @property
    def _parent(self):
        return self._parent_ref()

    def _create_phony_dimensions(self):
        # this is for 'sort' naming
        for key, value in self._phony_dims.items():
            if isinstance(value, int):
                value += self._root._labeled_dim_count
                name = "phony_dim_{}".format(value)
                self._create_dimension(name, key[0])
                self._phony_dims[key] = name

    def _determine_current_dimension_size(self, dim_name, max_size):
        """
        Helper method to determine the current size of a dimension.
        """
        # Limited dimension.
        if self.dimensions[dim_name] is not None:
            return max_size

        def _find_dim(h5group, dim):
            if dim not in h5group:
                return _find_dim(h5group.parent, dim)
            return h5group[dim]

        dim_variable = _find_dim(self._h5group, dim_name)

        if "REFERENCE_LIST" not in dim_variable.attrs:
            return max_size

        root = self._h5group["/"]

        for ref, _ in dim_variable.attrs["REFERENCE_LIST"]:
            var = root[ref]

            for i, var_d in enumerate(var.dims):
                name = _name_from_dimension(var_d)
                if name == dim_name:
                    max_size = max(var.shape[i], max_size)
        return max_size

    @property
    def _h5group(self):
        # Always refer to the root file and store not h5py object
        # subclasses:
        return self._root._h5file[self._h5path]

    @property
    def name(self):
        return self._h5group.name

    def _create_dimension(self, name, size=None):
        if name in self._dim_sizes.maps[0]:
            raise ValueError("dimension %r already exists" % name)

        self._dim_sizes[name] = size
        self._current_dim_sizes[name] = 0 if size is None else size
        self._dim_order[name] = None

    @property
    def dimensions(self):
        return Dimensions(self)

    @dimensions.setter
    def dimensions(self, value):
        for k, v in self._dim_sizes.maps[0].items():
            if k in value:
                if v != value[k]:
                    raise ValueError("cannot modify existing dimension %r" % k)
            else:
                raise ValueError(
                    "new dimensions do not include existing " "dimension %r" % k
                )
        self.dimensions.update(value)

    def _create_child_group(self, name):
        if name in self:
            raise ValueError("unable to create group %r (name already exists)" % name)
        self._h5group.create_group(name)
        self._groups[name] = self._group_cls(self, name)
        return self._groups[name]

    def _require_child_group(self, name):
        try:
            return self._groups[name]
        except KeyError:
            return self._create_child_group(name)

    def create_group(self, name):
        if name.startswith("/"):
            return self._root.create_group(name[1:])
        keys = name.split("/")
        group = self
        for k in keys[:-1]:
            group = group._require_child_group(k)
        return group._create_child_group(keys[-1])
    def _create_child_variable(
        self, name, dimensions, dtype, data, fillvalue, **kwargs
    ):
        if name in self:
            raise ValueError(
                "unable to create variable %r " "(name already exists)" % name
            )

        if data is not None:
            data = np.asarray(data)
            for d, s in zip(dimensions, data.shape):
                if d not in self.dimensions:
                    self.dimensions[d] = s

        if dtype is None:
            dtype = data.dtype

        if dtype == np.bool_:
            # never warn since h5netcdf has always errored here
            _invalid_netcdf_feature(
                "boolean dtypes",
                self._root.invalid_netcdf,
            )
        else:
            self._root._check_valid_netcdf_dtype(dtype)

        if "scaleoffset" in kwargs:
            _invalid_netcdf_feature(
                "scale-offset filters",
                self._root.invalid_netcdf,
            )

        # variable <-> dimension name clash
        if name in self.dimensions and (
            name not in dimensions or (len(dimensions) > 1 and dimensions[0] != name)
        ):
            h5name = "_nc4_non_coord_" + name
        else:
            h5name = name

        shape = tuple(self._current_dim_sizes[d] for d in dimensions)
        maxshape = tuple(self._dim_sizes[d] for d in dimensions)

        # If it is passed directly it will change the default compression
        # settings.
        if shape != maxshape:
            kwargs["maxshape"] = maxshape

        # Clear dummy HDF5 datasets with this name that were created for a
        # dimension scale without a corresponding variable.
        if h5name in self.dimensions and h5name in self._h5group:
            h5ds = self._h5group[h5name]
            if _netcdf_dimension_but_not_variable(h5ds):
                self._detach_dim_scale(h5name)
                del self._h5group[h5name]

        self._h5group.create_dataset(
            h5name, shape, dtype=dtype, data=data, fillvalue=fillvalue, **kwargs
        )

        self._variables[h5name] = self._variable_cls(self, h5name, dimensions)
        variable = self._variables[h5name]

        if fillvalue is not None:
            value = variable.dtype.type(fillvalue)
            variable.attrs._h5attrs["_FillValue"] = value
        return variable

    def create_variable(
        self, name, dimensions=(), dtype=None, data=None, fillvalue=None, **kwargs
    ):
        if name.startswith("/"):
            return self._root.create_variable(
                name[1:], dimensions, dtype, data, fillvalue, **kwargs
            )
        keys = name.split("/")
        group = self
        for k in keys[:-1]:
            group = group._require_child_group(k)
        return group._create_child_variable(
            keys[-1], dimensions, dtype, data, fillvalue, **kwargs
        )

    def _get_child(self, key):
        try:
            return self.variables[key]
        except KeyError:
            return self.groups[key]

    def __getitem__(self, key):
        if key.startswith("/"):
            return self._root[key[1:]]
        keys = key.split("/")
        item = self
        for k in keys:
            item = item._get_child(k)
        return item

    def __iter__(self):
        for name in self.groups:
            yield name
        for name in self.variables:
            yield name

    def __len__(self):
        return len(self.variables) + len(self.groups)

    def _create_dim_scales(self):
        """Create all necessary HDF5 dimension scale."""
        dim_order = self._dim_order.maps[0]
        for dim in sorted(dim_order, key=lambda d: dim_order[d]):
            dimlen = bytes(f"{self._current_dim_sizes[dim]:10}", "ascii")
            scale_name = (
                dim
                if dim in self._variables and dim in self._h5group
                else NOT_A_VARIABLE + dimlen
            )
            if dim not in self._h5group:
                size = self._current_dim_sizes[dim]
                kwargs = {}
                if self._dim_sizes[dim] is None:
                    kwargs["maxshape"] = (None,)
                self._h5group.create_dataset(
                    name=dim, shape=(size,), dtype=">f4", **kwargs
                )

            h5ds = self._h5group[dim]
            h5ds.attrs["_Netcdf4Dimid"] = np.int32(dim_order[dim])

            if len(h5ds.shape) > 1:
                dims = self._variables[dim].dimensions
                coord_ids = np.array([dim_order[d] for d in dims], "int32")
                h5ds.attrs["_Netcdf4Coordinates"] = coord_ids

            if not h5py.h5ds.is_scale(h5ds.id):
                if h5py.__version__ < LooseVersion("2.10.0"):
                    h5ds.dims.create_scale(h5ds, scale_name)
                else:
                    h5ds.make_scale(scale_name)

        for subgroup in self.groups.values():
            subgroup._create_dim_scales()
    def _attach_dim_scales(self):
        """Attach dimension scales to all variables."""
        for name, var in self.variables.items():
            # also attach for _nc4_non_coord_ variables
            if name not in self.dimensions or "_nc4_non_coord_" in var._h5ds.name:
                for n, dim in enumerate(var.dimensions):
                    vards = var._h5ds
                    scale = self._all_h5groups[dim]
                    # attach only, if not already attached
                    if not h5py.h5ds.is_attached(vards.id, scale.id, n):
                        vards.dims[n].attach_scale(scale)

        for subgroup in self.groups.values():
            subgroup._attach_dim_scales()

    def _detach_dim_scale(self, name):
        """Detach the dimension scale corresponding to a dimension name."""
        for var in self.variables.values():
            for n, dim in enumerate(var.dimensions):
                if dim == name:
                    vards = var._h5ds
                    scale = self._all_h5groups[dim]
                    # only detach if attached
                    if h5py.h5ds.is_attached(vards.id, scale.id, n):
                        vards.dims[n].detach_scale(scale)

        for subgroup in self.groups.values():
            if dim not in subgroup._h5group:
                subgroup._detach_dim_scale(name)

    @property
    def parent(self):
        return self._parent

    def flush(self):
        self._root.flush()

    sync = flush

    @property
    def groups(self):
        return Frozen(self._groups)

    @property
    def variables(self):
        return Frozen(self._variables)

    @property
    def attrs(self):
        return Attributes(self._h5group.attrs, self._root._check_valid_netcdf_dtype)

    _cls_name = "h5netcdf.Group"

    def _repr_body(self):
        return (
            ["Dimensions:"]
            + [
                "    %s: %s"
                % (
                    k,
                    ("Unlimited (current: %s)" % self._current_dim_sizes[k])
                    if v is None
                    else v,
                )
                for k, v in self.dimensions.items()
            ]
            + ["Groups:"]
            + ["    %s" % g for g in self.groups]
            + ["Variables:"]
            + [
                "    %s: %r %s" % (k, v.dimensions, v.dtype)
                for k, v in self.variables.items()
            ]
            + ["Attributes:"]
            + ["    %s: %r" % (k, v) for k, v in self.attrs.items()]
        )

    def __repr__(self):
        if self._root._closed:
            return "<Closed %s>" % self._cls_name
        header = "<%s %r (%s members)>" % (self._cls_name, self.name, len(self))
        return "\n".join([header] + self._repr_body())

    def resize_dimension(self, dimension, size):
        """
        Resize a dimension to a certain size.

        It will pad with the underlying HDF5 data sets' fill values (usually
        zero) where necessary.
        """
        if self.dimensions[dimension] is not None:
            raise ValueError(
                "Dimension '%s' is not unlimited and thus "
                "cannot be resized." % dimension
            )

        # Resize the dimension.
        self._current_dim_sizes[dimension] = size

        for var in self.variables.values():
            new_shape = list(var.shape)
            for i, d in enumerate(var.dimensions):
                if d == dimension:
                    new_shape[i] = size
            new_shape = tuple(new_shape)
            if new_shape != var.shape:
                var._h5ds.resize(new_shape)

        # Recurse as dimensions are visible to this group and all child
        # groups.
        for i in self.groups.values():
            i.resize_dimension(dimension, size)


class File(Group):
    def __init__(
        self, path, mode=None, invalid_netcdf=False, phony_dims=None, **kwargs
    ):
        # Deprecating mode='a' in favor of mode='r'
        # If mode is None default to 'a' and issue a warning
        if mode is None:
            msg = (
                "Falling back to mode='a'. "
                "In future versions, mode will default to read-only. "
                "It is recommended to explicitly set mode='r' to prevent any "
                "unintended changes to the file."
            )
            warnings.warn(msg, FutureWarning, stacklevel=2)
            mode = "a"

        if h5py.__version__ >= LooseVersion("3.0.0"):
            self.decode_vlen_strings = kwargs.pop("decode_vlen_strings", None)
        try:
            if isinstance(path, str):
                if path.startswith(("http://", "https://", "hdf5://")):
                    if no_h5pyd:
                        raise ImportError(
                            "No module named 'h5pyd'. h5pyd is required for "
                            "opening urls: {}".format(path)
                        )
                    try:
                        with h5pyd.File(path, "r") as f:  # noqa
                            pass
                        self._preexisting_file = True
                    except IOError:
                        self._preexisting_file = False
                    self._h5file = h5pyd.File(path, mode, **kwargs)
                else:
                    self._preexisting_file = os.path.exists(path) and mode != "w"
                    self._h5file = h5py.File(path, mode, **kwargs)
            else:  # file-like object
                if h5py.__version__ < LooseVersion("2.9.0"):
                    raise TypeError(
                        "h5py version ({}) must be greater than 2.9.0 to load "
                        "file-like objects.".format(h5py.__version__)
                    )
                else:
                    self._preexisting_file = mode in {"r", "r+", "a"}
                    self._h5file = h5py.File(path, mode, **kwargs)
        except Exception:
            self._closed = True
            raise
        else:
            self._closed = False

        self._mode = mode
        self._root_ref = weakref.ref(self)
        self._h5path = "/"
        self.invalid_netcdf = invalid_netcdf

        # phony dimension handling
        self._phony_dims_mode = phony_dims
        if phony_dims is not None:
            self._phony_dim_count = 0
            if phony_dims not in ["sort", "access"]:
                raise ValueError(
                    "unknown value %r for phony_dims\n"
                    "Use phony_dims=%r for sorted naming, "
                    "phony_dims=%r for per access naming."
                    % (phony_dims, "sort", "access")
                )

        # string decoding
        if h5py.__version__ >= LooseVersion("3.0.0"):
            if "legacy" in self._cls_name:
                if self.decode_vlen_strings is not None:
                    msg = (
                        "'decode_vlen_strings' keyword argument is not allowed "
                        "in h5netcdf legacy API."
                    )
                    raise TypeError(msg)
                self.decode_vlen_strings = True
            else:
                if self.decode_vlen_strings is None:
                    msg = (
                        "String decoding changed with h5py >= 3.0. "
                        "See https://docs.h5py.org/en/latest/strings.html for more details. "
                        "Currently backwards compatibility with h5py < 3.0 is kept by "
                        "decoding vlen strings per default. This will change in future "
                        "versions for consistency with h5py >= 3.0. To silence this "
                        "warning set kwarg ``decode_vlen_strings=False``. Setting "
                        "``decode_vlen_strings=True`` forces vlen string decoding."
                    )
                    warnings.warn(msg, FutureWarning, stacklevel=2)
                    self.decode_vlen_strings = True

        # These maps keep track of dimensions in terms of size (might be
        # unlimited), current size (identical to size for limited dimensions),
        # their position, and look-up for HDF5 datasets corresponding to a
        # dimension.
        self._dim_sizes = ChainMap()
        self._current_dim_sizes = ChainMap()
        self._dim_order = ChainMap()
        self._all_h5groups = ChainMap(self._h5group)
        super(File, self).__init__(self, self._h5path)
        # initialize all groups to detect/create phony dimensions
        # mimics netcdf-c style naming
        if phony_dims == "sort":
            self._determine_phony_dimensions()

    def _determine_phony_dimensions(self):
        def get_labeled_dimension_count(grp):
            count = len(grp._dim_sizes.maps[0])
            for name in grp.groups:
                count += get_labeled_dimension_count(grp[name])
            return count

        def create_phony_dimensions(grp):
            grp._create_phony_dimensions()
            for name in grp.groups:
                create_phony_dimensions(grp[name])

        self._labeled_dim_count = get_labeled_dimension_count(self)
        create_phony_dimensions(self)

    def _check_valid_netcdf_dtype(self, dtype):
        dtype = np.dtype(dtype)

        if dtype == bool:
            description = "boolean"
        elif dtype == complex:
            description = "complex"
        elif h5py.check_dtype(enum=dtype) is not None:
            description = "enum"
        elif h5py.check_dtype(ref=dtype) is not None:
            description = "reference"
        elif h5py.check_dtype(vlen=dtype) not in {None, str, bytes}:
            description = "non-string variable length"
        else:
            description = None

        if description is not None:
            _invalid_netcdf_feature(
                "{} dtypes".format(description),
                self.invalid_netcdf,
            )

    @property
    def mode(self):
        return self._h5file.mode

    @property
    def filename(self):
        return self._h5file.filename

    @property
    def parent(self):
        return None

    def _set_unassigned_dimension_ids(self):
        max_dim_id = -1
        # collect the largest assigned dimension ID
        groups = [self]
        while groups:
            group = groups.pop()
            assigned_dim_ids = [
                dim_id for dim_id in group._dim_order.values() if dim_id is not None
            ]
            max_dim_id = max([max_dim_id] + assigned_dim_ids)
            groups.extend(group._groups.values())

        # set all dimension IDs to valid values
        next_dim_id = max_dim_id + 1
        groups = [self]
        while groups:
            group = groups.pop()
            for key in group._dim_order:
                if group._dim_order[key] is None:
                    group._dim_order[key] = next_dim_id
                    next_dim_id += 1
            groups.extend(group._groups.values())

    def flush(self):
        if self._mode != "r":
            self._set_unassigned_dimension_ids()
            self._create_dim_scales()
            self._attach_dim_scales()
            if not self._preexisting_file and not self.invalid_netcdf:
                self.attrs._h5attrs["_NCProperties"] = np.array(
                    _NC_PROPERTIES,
                    dtype=h5py.string_dtype(
                        encoding="ascii", length=len(_NC_PROPERTIES)
                    ),
                )

    sync = flush

    def close(self):
        if not self._closed:
            self.flush()
            self._h5file.close()
            self._closed = True

    __del__ = close

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        self.close()

    _cls_name = "h5netcdf.File"

    def __repr__(self):
        if self._closed:
            return "<Closed %s>" % self._cls_name
        header = "<%s %r (mode %s)>" % (
            self._cls_name,
            self.filename.split("/")[-1],
            self.mode,
        )
        return "\n".join([header] + self._repr_body())

h5netcdf-0.12.0/h5netcdf/dimensions.py

from collections.abc import MutableMapping


class Dimensions(MutableMapping):
    def __init__(self, group):
        self._group = group

    def __getitem__(self, key):
        return self._group._dim_sizes[key]

    def __setitem__(self, key, value):
        self._group._create_dimension(key, value)

    def __delitem__(self, key):
        raise NotImplementedError("cannot yet delete dimensions")

    def __iter__(self):
        for key in self._group._dim_sizes:
            yield key

    def __len__(self):
        return len(self._group._dim_sizes)

    def __repr__(self):
        if self._group._root._closed:
            return "<Closed h5netcdf.Dimensions>"
        return "<h5netcdf.Dimensions: %s>" % ", ".join(
            "%s=%r" % (k, v) for k, v in self.items()
        )
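
# A minimal usage sketch (not part of the library): ``Dimensions`` is a
# dict-like view where reads delegate to the group's dimension sizes and
# assignment creates a new dimension. The file name is illustrative. Run
# this file directly to try it.
if __name__ == "__main__":
    import h5netcdf

    with h5netcdf.File("demo.nc", "w") as f:
        f.dimensions["x"] = 3  # calls Group._create_dimension under the hood
        print(f.dimensions["x"])  # -> 3
        print(len(f.dimensions))  # -> 1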
".join( "%s=%r" % (k, v) for k, v in self.items() ) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1639987165.0 h5netcdf-0.12.0/h5netcdf/legacyapi.py0000644000175100001710000000556200000000000016652 0ustar00runnerdockerimport h5py from . import core class HasAttributesMixin(object): _initialized = False def getncattr(self, name): return self.attrs[name] def setncattr(self, name, value): self.attrs[name] = value def ncattrs(self): return list(self.attrs) def __getattr__(self, name): try: return self.attrs[name] except KeyError: raise AttributeError( "NetCDF: attribute {0}:{1} not found".format(type(self).__name__, name) ) def __setattr__(self, name, value): if self._initialized and name not in self.__dict__: self.attrs[name] = value else: object.__setattr__(self, name, value) class Variable(core.BaseVariable, HasAttributesMixin): _cls_name = "h5netcdf.legacyapi.Variable" def chunking(self): chunks = self._h5ds.chunks if chunks is None: return "contiguous" else: return chunks def filters(self): complevel = self._h5ds.compression_opts return { "complevel": 0 if complevel is None else complevel, "fletcher32": self._h5ds.fletcher32, "shuffle": self._h5ds.shuffle, "zlib": self._h5ds.compression == "gzip", } @property def dtype(self): dt = self._h5ds.dtype if h5py.check_dtype(vlen=dt) is str: return str return dt class Group(core.Group, HasAttributesMixin): _cls_name = "h5netcdf.legacyapi.Group" _variable_cls = Variable @property def _group_cls(self): return Group createGroup = core.Group.create_group createDimension = core.Group._create_dimension def createVariable( self, varname, datatype, dimensions=(), zlib=False, complevel=4, shuffle=True, fletcher32=False, chunksizes=None, fill_value=None, ): if len(dimensions) == 0: # it's a scalar # rip off chunk and filter options for consistency with netCDF4-python chunksizes = None zlib = False fletcher32 = False shuffle = False if datatype is str: datatype = h5py.special_dtype(vlen=str) kwds = {} if zlib: # only add compression related keyword arguments if relevant (h5py # chokes otherwise) kwds["compression"] = "gzip" kwds["compression_opts"] = complevel kwds["shuffle"] = shuffle return super(Group, self).create_variable( varname, dimensions, dtype=datatype, fletcher32=fletcher32, chunks=chunksizes, fillvalue=fill_value, **kwds ) class Dataset(core.File, Group, HasAttributesMixin): _cls_name = "h5netcdf.legacyapi.Dataset" ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1639987175.5187743 h5netcdf-0.12.0/h5netcdf/tests/0000755000175100001710000000000000000000000015474 5ustar00runnerdocker././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1639987165.0 h5netcdf-0.12.0/h5netcdf/tests/conftest.py0000644000175100001710000000030600000000000017672 0ustar00runnerdockerdef pytest_addoption(parser): parser.addoption( "--restapi", action="store_true", dest="restapi", default=False, help="Enable HDF5 REST API tests", ) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1639987165.0 h5netcdf-0.12.0/h5netcdf/tests/test_h5netcdf.py0000644000175100001710000012274200000000000020615 0ustar00runnerdockerimport gc import io import random import re import string import tempfile from distutils.version import LooseVersion from os import environ as env import h5py import netCDF4 import numpy as np import pytest from pytest import raises import h5netcdf from h5netcdf import legacyapi from h5netcdf.core import NOT_A_VARIABLE try: import h5pyd 
    without_h5pyd = False
except ImportError:
    without_h5pyd = True


remote_h5 = ("http:", "hdf5:")


@pytest.fixture()
def restapi(pytestconfig):
    return pytestconfig.getoption("restapi")


@pytest.fixture
def tmp_local_netcdf(tmpdir):
    return str(tmpdir.join("testfile.nc"))


@pytest.fixture(params=["testfile.nc", "hdf5://testfile"])
def tmp_local_or_remote_netcdf(request, tmpdir, restapi):
    if request.param.startswith(remote_h5):
        if not restapi:
            pytest.skip("Do not test with HDF5 REST API")
        elif without_h5pyd:
            pytest.skip("h5pyd package not available")
        if any([env.get(v) is None for v in ("HS_USERNAME", "HS_PASSWORD")]):
            pytest.skip("HSDS username and/or password missing")
        rnd = "".join(random.choice(string.ascii_uppercase) for _ in range(5))
        return (
            env["HS_ENDPOINT"]
            + env["H5PYD_TEST_FOLDER"]
            + "/"
            + "testfile"
            + rnd
            + ".nc"
        )
    else:
        return str(tmpdir.join(request.param))


@pytest.fixture(params=[True, False])
def decode_vlen_strings(request):
    if h5py.__version__ >= LooseVersion("3.0.0"):
        return dict(decode_vlen_strings=request.param)
    else:
        return {}


def get_hdf5_module(resource):
    """Return the correct h5py module based on the input resource."""
    if isinstance(resource, str) and resource.startswith(remote_h5):
        return h5pyd
    else:
        return h5py


def string_to_char(arr):
    """Like nc4.stringtochar, but faster and more flexible."""
    # ensure the array is contiguous
    arr = np.array(arr, copy=False, order="C")
    kind = arr.dtype.kind
    if kind not in ["U", "S"]:
        raise ValueError("argument must be a string")
    return arr.reshape(arr.shape + (1,)).view(kind + "1")


def array_equal(a, b):
    a, b = map(np.array, (a[...], b[...]))
    if a.shape != b.shape:
        return False
    try:
        return np.allclose(a, b)
    except TypeError:
        return (a == b).all()


_char_array = string_to_char(np.array(["a", "b", "c", "foo", "bar", "baz"], dtype="S"))

_string_array = np.array(
    [["foobar0", "foobar1", "foobar3"], ["foofoofoo", "foofoobar", "foobarbar"]]
)

_vlen_string = "foo"


def is_h5py_char_working(tmp_netcdf, name):
    h5 = get_hdf5_module(tmp_netcdf)
    # https://github.com/Unidata/netcdf-c/issues/298
    with h5.File(tmp_netcdf, "r") as ds:
        v = ds[name]
        try:
            assert array_equal(v, _char_array)
            return True
        except Exception as e:
            if re.match("^Can't read data", e.args[0]):
                return False
            else:
                raise


def write_legacy_netcdf(tmp_netcdf, write_module):
    ds = write_module.Dataset(tmp_netcdf, "w")
    ds.setncattr("global", 42)
    ds.other_attr = "yes"
    ds.createDimension("x", 4)
    ds.createDimension("y", 5)
    ds.createDimension("z", 6)
    ds.createDimension("empty", 0)
    ds.createDimension("string3", 3)

    v = ds.createVariable("foo", float, ("x", "y"), chunksizes=(4, 5), zlib=True)
    v[...] = 1
    v.setncattr("units", "meters")

    v = ds.createVariable("y", int, ("y",), fill_value=-1)
    v[:4] = np.arange(4)

    v = ds.createVariable("z", "S1", ("z", "string3"), fill_value=b"X")
    v[...] = _char_array

    v = ds.createVariable("scalar", np.float32, ())
    v[...] = 2.0

    # test creating a scalar with compression option (which should be
    # ignored)
    v = ds.createVariable("intscalar", np.int64, (), zlib=6, fill_value=None)
    v[...] = 2

    with raises((h5netcdf.CompatibilityError, TypeError)):
        ds.createVariable("boolean", np.bool_, ("x"))

    g = ds.createGroup("subgroup")
    v = g.createVariable("subvar", np.int32, ("x",))
    v[...] = np.arange(4.0)
    g.createDimension("y", 10)
    g.createVariable("y_var", float, ("y",))

    ds.createDimension("mismatched_dim", 1)
    ds.createVariable("mismatched_dim", int, ())

    v = ds.createVariable("var_len_str", str, ("x"))
    v[0] = "foo"

    ds.close()


def write_h5netcdf(tmp_netcdf):
    ds = h5netcdf.File(tmp_netcdf, "w")
    ds.attrs["global"] = 42
    ds.attrs["other_attr"] = "yes"
    ds.dimensions = {"x": 4, "y": 5, "z": 6, "empty": 0}

    v = ds.create_variable(
        "foo", ("x", "y"), float, chunks=(4, 5), compression="gzip", shuffle=True
    )
    v[...] = 1
    v.attrs["units"] = "meters"

    v = ds.create_variable("y", ("y",), int, fillvalue=-1)
    v[:4] = np.arange(4)

    v = ds.create_variable("z", ("z", "string3"), data=_char_array, fillvalue=b"X")

    v = ds.create_variable("scalar", data=np.float32(2.0))

    v = ds.create_variable("intscalar", data=np.int64(2))

    with raises((h5netcdf.CompatibilityError, TypeError)):
        ds.create_variable("boolean", data=True)

    g = ds.create_group("subgroup")
    v = g.create_variable("subvar", ("x",), np.int32)
    v[...] = np.arange(4.0)
    with raises(AttributeError):
        v.attrs["_Netcdf4Dimid"] = -1

    g.dimensions["y"] = 10
    g.create_variable("y_var", ("y",), float)
    g.flush()

    ds.dimensions["mismatched_dim"] = 1
    ds.create_variable("mismatched_dim", dtype=int)
    ds.flush()

    dt = h5py.special_dtype(vlen=str)
    v = ds.create_variable("var_len_str", ("x",), dtype=dt)
    v[0] = _vlen_string

    ds.close()


def read_legacy_netcdf(tmp_netcdf, read_module, write_module):
    ds = read_module.Dataset(tmp_netcdf, "r")
    assert ds.ncattrs() == ["global", "other_attr"]
    assert ds.getncattr("global") == 42
    if write_module is not netCDF4:
        # skip for now: https://github.com/Unidata/netcdf4-python/issues/388
        assert ds.other_attr == "yes"
    with pytest.raises(AttributeError):
        ds.does_not_exist
    assert set(ds.dimensions) == set(
        ["x", "y", "z", "empty", "string3", "mismatched_dim"]
    )
    assert set(ds.variables) == set(
        ["foo", "y", "z", "intscalar", "scalar", "var_len_str", "mismatched_dim"]
    )
    assert set(ds.groups) == set(["subgroup"])
    assert ds.parent is None

    v = ds.variables["foo"]
    assert array_equal(v, np.ones((4, 5)))
    assert v.dtype == float
    assert v.dimensions == ("x", "y")
    assert v.ndim == 2
    assert v.ncattrs() == ["units"]
    if write_module is not netCDF4:
        assert v.getncattr("units") == "meters"
    assert tuple(v.chunking()) == (4, 5)
    assert v.filters() == {
        "complevel": 4,
        "fletcher32": False,
        "shuffle": True,
        "zlib": True,
    }

    v = ds.variables["y"]
    assert array_equal(v, np.r_[np.arange(4), [-1]])
    assert v.dtype == int
    assert v.dimensions == ("y",)
    assert v.ndim == 1
    assert v.ncattrs() == ["_FillValue"]
    assert v.getncattr("_FillValue") == -1
    assert v.chunking() == "contiguous"
    assert v.filters() == {
        "complevel": 0,
        "fletcher32": False,
        "shuffle": False,
        "zlib": False,
    }
    ds.close()

    # Check the behavior of h5py. Cannot expect h5netcdf to overcome these
    # errors:
    if is_h5py_char_working(tmp_netcdf, "z"):
        ds = read_module.Dataset(tmp_netcdf, "r")
        v = ds.variables["z"]
        assert array_equal(v, _char_array)
        assert v.dtype == "S1"
        assert v.ndim == 2
        assert v.dimensions == ("z", "string3")
        assert v.ncattrs() == ["_FillValue"]
        assert v.getncattr("_FillValue") == b"X"
    else:
        ds = read_module.Dataset(tmp_netcdf, "r")

    v = ds.variables["scalar"]
    assert array_equal(v, np.array(2.0))
    assert v.dtype == "float32"
    assert v.ndim == 0
    assert v.dimensions == ()
    assert v.ncattrs() == []

    v = ds.variables["intscalar"]
    assert array_equal(v, np.array(2))
    assert v.dtype == "int64"
    assert v.ndim == 0
    assert v.dimensions == ()
    assert v.ncattrs() == []

    v = ds.variables["var_len_str"]
    assert v.dtype == str
    assert v[0] == _vlen_string

    v = ds.groups["subgroup"].variables["subvar"]
    assert ds.groups["subgroup"].parent is ds
    assert array_equal(v, np.arange(4.0))
    assert v.dtype == "int32"
    assert v.ndim == 1
    assert v.dimensions == ("x",)
    assert v.ncattrs() == []

    v = ds.groups["subgroup"].variables["y_var"]
    assert v.shape == (10,)
    assert "y" in ds.groups["subgroup"].dimensions

    ds.close()


def read_h5netcdf(tmp_netcdf, write_module, decode_vlen_strings):
    remote_file = isinstance(tmp_netcdf, str) and tmp_netcdf.startswith(remote_h5)
    ds = h5netcdf.File(tmp_netcdf, "r", **decode_vlen_strings)
    assert ds.name == "/"
    assert list(ds.attrs) == ["global", "other_attr"]
    assert ds.attrs["global"] == 42
    if write_module is not netCDF4:
        # skip for now: https://github.com/Unidata/netcdf4-python/issues/388
        assert ds.attrs["other_attr"] == "yes"
    assert set(ds.dimensions) == set(
        ["x", "y", "z", "empty", "string3", "mismatched_dim"]
    )
    assert set(ds.variables) == set(
        ["foo", "y", "z", "intscalar", "scalar", "var_len_str", "mismatched_dim"]
    )
    assert set(ds.groups) == set(["subgroup"])
    assert ds.parent is None

    v = ds["foo"]
    assert v.name == "/foo"
    assert array_equal(v, np.ones((4, 5)))
    assert v.dtype == float
    assert v.dimensions == ("x", "y")
    assert v.ndim == 2
    assert list(v.attrs) == ["units"]
    if write_module is not netCDF4:
        assert v.attrs["units"] == "meters"
    assert v.chunks == (4, 5)
    assert v.compression == "gzip"
    assert v.compression_opts == 4
    assert not v.fletcher32
    assert v.shuffle

    v = ds["y"]
    assert array_equal(v, np.r_[np.arange(4), [-1]])
    assert v.dtype == int
    assert v.dimensions == ("y",)
    assert v.ndim == 1
    assert list(v.attrs) == ["_FillValue"]
    assert v.attrs["_FillValue"] == -1
    if not remote_file:
        assert v.chunks is None
        assert v.compression is None
        assert v.compression_opts is None
        assert not v.fletcher32
        assert not v.shuffle
    ds.close()

    if is_h5py_char_working(tmp_netcdf, "z"):
        ds = h5netcdf.File(tmp_netcdf, "r")
        v = ds["z"]
        assert array_equal(v, _char_array)
        assert v.dtype == "S1"
        assert v.ndim == 2
        assert v.dimensions == ("z", "string3")
        assert list(v.attrs) == ["_FillValue"]
        assert v.attrs["_FillValue"] == b"X"
    else:
        ds = h5netcdf.File(tmp_netcdf, "r", **decode_vlen_strings)

    v = ds["scalar"]
    assert array_equal(v, np.array(2.0))
    assert v.dtype == "float32"
    assert v.ndim == 0
    assert v.dimensions == ()
    assert list(v.attrs) == []

    v = ds.variables["intscalar"]
    assert array_equal(v, np.array(2))
    assert v.dtype == "int64"
    assert v.ndim == 0
    assert v.dimensions == ()
    assert list(v.attrs) == []

    v = ds["var_len_str"]
    assert h5py.check_dtype(vlen=v.dtype) == str
    if getattr(ds, "decode_vlen_strings", True):
        assert v[0] == _vlen_string
    else:
        assert v[0] == _vlen_string.encode("utf_8")

    v = ds["/subgroup/subvar"]
    assert v is ds["subgroup"]["subvar"]
    assert v is ds["subgroup/subvar"]
    assert v is ds["subgroup/subvar"]
    assert v is ds["subgroup"]["/subgroup/subvar"]
    assert v.name == "/subgroup/subvar"
    assert ds["subgroup"].name == "/subgroup"
    assert ds["subgroup"].parent is ds
    assert array_equal(v, np.arange(4.0))
    assert v.dtype == "int32"
    assert v.ndim == 1
    assert v.dimensions == ("x",)
    assert list(v.attrs) == []

    assert ds["/subgroup/y_var"].shape == (10,)
    assert ds["/subgroup"].dimensions["y"] == 10

    ds.close()


def roundtrip_legacy_netcdf(tmp_netcdf, read_module, write_module):
    write_legacy_netcdf(tmp_netcdf, write_module)
    read_legacy_netcdf(tmp_netcdf, read_module, write_module)


def test_write_legacyapi_read_netCDF4(tmp_local_netcdf):
    roundtrip_legacy_netcdf(tmp_local_netcdf, netCDF4, legacyapi)


def test_roundtrip_h5netcdf_legacyapi(tmp_local_netcdf):
    roundtrip_legacy_netcdf(tmp_local_netcdf, legacyapi, legacyapi)


def test_write_netCDF4_read_legacyapi(tmp_local_netcdf):
    roundtrip_legacy_netcdf(tmp_local_netcdf, legacyapi, netCDF4)


def test_write_h5netcdf_read_legacyapi(tmp_local_netcdf):
    write_h5netcdf(tmp_local_netcdf)
    read_legacy_netcdf(tmp_local_netcdf, legacyapi, h5netcdf)


def test_write_h5netcdf_read_netCDF4(tmp_local_netcdf):
    write_h5netcdf(tmp_local_netcdf)
    read_legacy_netcdf(tmp_local_netcdf, netCDF4, h5netcdf)


def test_roundtrip_h5netcdf(tmp_local_or_remote_netcdf, decode_vlen_strings):
    write_h5netcdf(tmp_local_or_remote_netcdf)
    read_h5netcdf(tmp_local_or_remote_netcdf, h5netcdf, decode_vlen_strings)


def test_write_netCDF4_read_h5netcdf(tmp_local_netcdf, decode_vlen_strings):
    write_legacy_netcdf(tmp_local_netcdf, netCDF4)
    read_h5netcdf(tmp_local_netcdf, netCDF4, decode_vlen_strings)


def test_write_legacyapi_read_h5netcdf(tmp_local_netcdf, decode_vlen_strings):
    write_legacy_netcdf(tmp_local_netcdf, legacyapi)
    read_h5netcdf(tmp_local_netcdf, legacyapi, decode_vlen_strings)


def test_fileobj(decode_vlen_strings):
    if h5py.__version__ < LooseVersion("2.9.0"):
        pytest.skip("h5py >= 2.9.0 required to test file-like objects")
    fileobj = tempfile.TemporaryFile()
    write_h5netcdf(fileobj)
    read_h5netcdf(fileobj, h5netcdf, decode_vlen_strings)
    fileobj = io.BytesIO()
    write_h5netcdf(fileobj)
    read_h5netcdf(fileobj, h5netcdf, decode_vlen_strings)
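# Illustrative sketch (not part of the upstream test suite): beyond the
# round-trip above, file-like objects make it possible to build a netCDF4
# file entirely in memory. Assumes only ``io.BytesIO`` and h5py >= 2.9,
# as in test_fileobj; the function name is ours.
def _example_in_memory_netcdf():
    bio = io.BytesIO()
    with h5netcdf.File(bio, "w") as f:
        f.dimensions["x"] = 3
        v = f.create_variable("data", ("x",), "i4")
        v[:] = np.arange(3)
    # raw HDF5 bytes, e.g. for sending over a network or storing in a blob
    return bio.getvalue()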
def test_repr(tmp_local_or_remote_netcdf):
    write_h5netcdf(tmp_local_or_remote_netcdf)
    f = h5netcdf.File(tmp_local_or_remote_netcdf, "r")
    assert "h5netcdf.File" in repr(f)
    assert "subgroup" in repr(f)
    assert "foo" in repr(f)
    assert "other_attr" in repr(f)

    assert "h5netcdf.attrs.Attributes" in repr(f.attrs)
    assert "global" in repr(f.attrs)

    d = f.dimensions
    assert "h5netcdf.Dimensions" in repr(d)
    assert "x=4" in repr(d)

    g = f["subgroup"]
    assert "h5netcdf.Group" in repr(g)
    assert "subvar" in repr(g)

    v = f["foo"]
    assert "h5netcdf.Variable" in repr(v)
    assert "float" in repr(v)
    assert "units" in repr(v)

    f.dimensions["temp"] = None
    assert "temp: Unlimited (current: 0)" in repr(f)
    f.resize_dimension("temp", 5)
    assert "temp: Unlimited (current: 5)" in repr(f)

    f.close()

    assert "Closed" in repr(f)
    assert "Closed" in repr(d)
    assert "Closed" in repr(g)
    assert "Closed" in repr(v)


def test_attrs_api(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf, "w") as ds:
        ds.attrs["conventions"] = "CF"
        ds.attrs["empty_string"] = h5py.Empty(dtype=np.dtype("|S1"))
        ds.dimensions["x"] = 1
        v = ds.create_variable("x", ("x",), "i4")
        v.attrs.update({"units": "meters", "foo": "bar"})
    assert ds._closed
    with h5netcdf.File(tmp_local_or_remote_netcdf, "r") as ds:
        assert len(ds.attrs) == 2
        assert dict(ds.attrs) == {"conventions": "CF", "empty_string": b""}
        assert list(ds.attrs) == ["conventions", "empty_string"]
        assert dict(ds["x"].attrs) == {"units": "meters", "foo": "bar"}
        assert len(ds["x"].attrs) == 2
        assert sorted(ds["x"].attrs) == ["foo", "units"]


def test_optional_netcdf4_attrs(tmp_local_or_remote_netcdf):
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    with h5.File(tmp_local_or_remote_netcdf, "w") as f:
        foo_data = np.arange(50).reshape(5, 10)
        f.create_dataset("foo", data=foo_data)
        f.create_dataset("x", data=np.arange(5))
        f.create_dataset("y", data=np.arange(10))
        if h5py.__version__ < LooseVersion("2.10.0"):
            f["foo"].dims.create_scale(f["x"])
            f["foo"].dims.create_scale(f["y"])
        else:
            f["x"].make_scale()
            f["y"].make_scale()
        f["foo"].dims[0].attach_scale(f["x"])
        f["foo"].dims[1].attach_scale(f["y"])
    with h5netcdf.File(tmp_local_or_remote_netcdf, "r") as ds:
        assert ds["foo"].dimensions == ("x", "y")
        assert ds.dimensions == {"x": 5, "y": 10}
        assert array_equal(ds["foo"], foo_data)


def test_error_handling(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf, "w") as ds:
        ds.dimensions["x"] = 1
        with raises(ValueError):
            ds.dimensions["x"] = 2
        with raises(ValueError):
            ds.dimensions = {"x": 2}
        with raises(ValueError):
            ds.dimensions = {"y": 3}
        ds.create_variable("x", ("x",), dtype=float)
        with raises(ValueError):
            ds.create_variable("x", ("x",), dtype=float)
        ds.create_group("subgroup")
        with raises(ValueError):
            ds.create_group("subgroup")


@pytest.mark.skipif(
    h5py.__version__ < LooseVersion("3.0.0"), reason="not needed with h5py < 3.0"
)
def test_decode_string_warning(tmp_local_or_remote_netcdf):
    write_h5netcdf(tmp_local_or_remote_netcdf)
    with pytest.warns(FutureWarning):
        with h5netcdf.File(tmp_local_or_remote_netcdf, "r") as ds:
            assert ds.name == "/"


@pytest.mark.skipif(
    h5py.__version__ < LooseVersion("3.0.0"), reason="not needed with h5py < 3.0"
)
def test_decode_string_error(tmp_local_or_remote_netcdf):
    write_h5netcdf(tmp_local_or_remote_netcdf)
    with pytest.raises(TypeError):
        with h5netcdf.legacyapi.Dataset(
            tmp_local_or_remote_netcdf, "r", decode_vlen_strings=True
        ) as ds:
            assert ds.name == "/"


def test_mode_warning(tmp_local_or_remote_netcdf):
    with pytest.warns(FutureWarning):
        with h5netcdf.File(tmp_local_or_remote_netcdf):
            pass


def create_invalid_netcdf_data():
    foo_data = np.arange(125).reshape(5, 5, 5)
    bar_data = np.arange(625).reshape(25, 5, 5)
    var = {"foo1": foo_data, "foo2": bar_data, "foo3": foo_data, "foo4": bar_data}
    var2 = {"x": 5, "y": 5, "z": 5, "x1": 25, "y1": 5, "z1": 5}
    return var, var2


def check_invalid_netcdf4(var, i):
    pdim = "phony_dim_{}"
    assert var["foo1"].dimensions[0] == pdim.format(i * 4)
    assert var["foo1"].dimensions[1] == pdim.format(1 + i * 4)
    assert var["foo1"].dimensions[2] == pdim.format(2 + i * 4)
    assert var["foo2"].dimensions[0] == pdim.format(3 + i * 4)
    assert var["foo2"].dimensions[1] == pdim.format(0 + i * 4)
    assert var["foo2"].dimensions[2] == pdim.format(1 + i * 4)
    assert var["foo3"].dimensions[0] == pdim.format(i * 4)
    assert var["foo3"].dimensions[1] == pdim.format(1 + i * 4)
    assert var["foo3"].dimensions[2] == pdim.format(2 + i * 4)
    assert var["foo4"].dimensions[0] == pdim.format(3 + i * 4)
    assert var["foo4"].dimensions[1] == pdim.format(i * 4)
    assert var["foo4"].dimensions[2] == pdim.format(1 + i * 4)
    assert var["x"].dimensions[0] == pdim.format(i * 4)
    assert var["y"].dimensions[0] == pdim.format(i * 4)
    assert var["z"].dimensions[0] == pdim.format(i * 4)
    assert var["x1"].dimensions[0] == pdim.format(3 + i * 4)
    assert var["y1"].dimensions[0] == pdim.format(i * 4)
    assert var["z1"].dimensions[0] == pdim.format(i * 4)
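# Illustrative sketch (not part of the upstream test suite): the checks
# above rely on h5netcdf numbering invented ("phony") dimensions per group,
# in order of appearance. A minimal demonstration of the feature the tests
# exercise; the function name and file layout are ours.
def _example_phony_dims(path):
    # write a plain HDF5 file with no dimension scales at all
    with h5py.File(path, "w") as f:
        f.create_dataset("foo", data=np.zeros((5, 10)))
    # 'sort' mimics netCDF-C naming; 'access' defers creation to access time
    with h5netcdf.File(path, "r", phony_dims="sort") as ds:
        return ds["foo"].dimensions  # e.g. ("phony_dim_0", "phony_dim_1")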
def test_invalid_netcdf4(tmp_local_or_remote_netcdf):
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    with h5.File(tmp_local_or_remote_netcdf, "w") as f:
        var, var2 = create_invalid_netcdf_data()
        grps = ["bar", "baz"]
        for grp in grps:
            fx = f.create_group(grp)
            for k, v in var.items():
                fx.create_dataset(k, data=v)
            for k, v in var2.items():
                fx.create_dataset(k, data=np.arange(v))

    with h5netcdf.File(tmp_local_or_remote_netcdf, "r", phony_dims="sort") as dsr:
        i = len(grps) - 1
        for grp in grps[::-1]:
            var = dsr[grp].variables
            check_invalid_netcdf4(var, i)
            i -= 1

    with h5netcdf.File(tmp_local_or_remote_netcdf, "r", phony_dims="access") as dsr:
        for i, grp in enumerate(grps[::-1]):
            print(dsr[grp])
            var = dsr[grp].variables
            check_invalid_netcdf4(var, i)

    with netCDF4.Dataset(tmp_local_or_remote_netcdf, "r") as dsr:
        for i, grp in enumerate(grps):
            print(dsr[grp])
            var = dsr[grp].variables
            check_invalid_netcdf4(var, i)

    with h5netcdf.File(tmp_local_or_remote_netcdf, "r") as ds:
        with raises(ValueError):
            ds["bar"].variables["foo1"].dimensions

    with raises(ValueError):
        # "srt" is an intentionally invalid phony_dims value
        with h5netcdf.File(tmp_local_or_remote_netcdf, "r", phony_dims="srt") as ds:
            pass


def check_invalid_netcdf4_mixed(var, i):
    pdim = "phony_dim_{}".format(i)
    assert var["foo1"].dimensions[0] == "y1"
    assert var["foo1"].dimensions[1] == "z1"
    assert var["foo1"].dimensions[2] == pdim
    assert var["foo2"].dimensions[0] == "x1"
    assert var["foo2"].dimensions[1] == "y1"
    assert var["foo2"].dimensions[2] == "z1"
    assert var["foo3"].dimensions[0] == "y1"
    assert var["foo3"].dimensions[1] == "z1"
    assert var["foo3"].dimensions[2] == pdim
    assert var["foo4"].dimensions[0] == "x1"
    assert var["foo4"].dimensions[1] == "y1"
    assert var["foo4"].dimensions[2] == "z1"
    assert var["x"].dimensions[0] == "y1"
    assert var["y"].dimensions[0] == "y1"
    assert var["z"].dimensions[0] == "y1"
    assert var["x1"].dimensions[0] == "x1"
    assert var["y1"].dimensions[0] == "y1"
    assert var["z1"].dimensions[0] == "z1"


def test_invalid_netcdf4_mixed(tmp_local_or_remote_netcdf):
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    with h5.File(tmp_local_or_remote_netcdf, "w") as f:
        var, var2 = create_invalid_netcdf_data()
        for k, v in var.items():
            f.create_dataset(k, data=v)
        for k, v in var2.items():
            f.create_dataset(k, data=np.arange(v))
        if h5py.__version__ < LooseVersion("2.10.0"):
            f["foo2"].dims.create_scale(f["x1"])
            f["foo2"].dims.create_scale(f["y1"])
            f["foo2"].dims.create_scale(f["z1"])
        else:
            f["x1"].make_scale()
            f["y1"].make_scale()
            f["z1"].make_scale()
        f["foo2"].dims[0].attach_scale(f["x1"])
        f["foo2"].dims[1].attach_scale(f["y1"])
        f["foo2"].dims[2].attach_scale(f["z1"])

    with h5netcdf.File(tmp_local_or_remote_netcdf, "r", phony_dims="sort") as ds:
        var = ds.variables
        check_invalid_netcdf4_mixed(var, 3)

    with h5netcdf.File(tmp_local_or_remote_netcdf, "r", phony_dims="access") as ds:
        var = ds.variables
        check_invalid_netcdf4_mixed(var, 0)

    with netCDF4.Dataset(tmp_local_or_remote_netcdf, "r") as ds:
        var = ds.variables
        check_invalid_netcdf4_mixed(var, 3)

    with h5netcdf.File(tmp_local_or_remote_netcdf, "r") as ds:
        with raises(ValueError):
            ds.variables["foo1"].dimensions
def test_invalid_netcdf_malformed_dimension_scales(tmp_local_or_remote_netcdf):
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    with h5.File(tmp_local_or_remote_netcdf, "w") as f:
        foo_data = np.arange(125).reshape(5, 5, 5)
        f.create_dataset("foo1", data=foo_data)
        f.create_dataset("x", data=np.arange(5))
        f.create_dataset("y", data=np.arange(5))
        f.create_dataset("z", data=np.arange(5))
        if h5py.__version__ < LooseVersion("2.10.0"):
            f["foo1"].dims.create_scale(f["x"])
            f["foo1"].dims.create_scale(f["y"])
            f["foo1"].dims.create_scale(f["z"])
        else:
            f["x"].make_scale()
            f["y"].make_scale()
            f["z"].make_scale()
        # only attach a scale to the first axis, leaving the others malformed
        f["foo1"].dims[0].attach_scale(f["x"])

    with raises(ValueError):
        with h5netcdf.File(tmp_local_or_remote_netcdf, "r", phony_dims="sort") as ds:
            assert ds


def test_hierarchical_access_auto_create(tmp_local_or_remote_netcdf):
    ds = h5netcdf.File(tmp_local_or_remote_netcdf, "w")
    ds.create_variable("/foo/bar", data=1)
    g = ds.create_group("foo/baz")
    g.create_variable("/foo/hello", data=2)
    assert set(ds) == set(["foo"])
    assert set(ds["foo"]) == set(["bar", "baz", "hello"])
    ds.close()

    ds = h5netcdf.File(tmp_local_or_remote_netcdf, "r")
    assert set(ds) == set(["foo"])
    assert set(ds["foo"]) == set(["bar", "baz", "hello"])
    ds.close()


def test_Netcdf4Dimid(tmp_local_netcdf):
    # regression test for https://github.com/h5netcdf/h5netcdf/issues/53
    with h5netcdf.File(tmp_local_netcdf, "w") as f:
        f.dimensions["x"] = 1
        g = f.create_group("foo")
        g.dimensions["x"] = 2
        g.dimensions["y"] = 3

    with h5py.File(tmp_local_netcdf, "r") as f:
        # all dimension IDs should be present exactly once
        dim_ids = {f[name].attrs["_Netcdf4Dimid"] for name in ["x", "foo/x", "foo/y"]}
        assert dim_ids == {0, 1, 2}


def test_reading_str_array_from_netCDF4(tmp_local_netcdf, decode_vlen_strings):
    # This tests reading string variables created by netCDF4
    with netCDF4.Dataset(tmp_local_netcdf, "w") as ds:
        ds.createDimension("foo1", _string_array.shape[0])
        ds.createDimension("foo2", _string_array.shape[1])
        ds.createVariable("bar", str, ("foo1", "foo2"))
        ds.variables["bar"][:] = _string_array

    ds = h5netcdf.File(tmp_local_netcdf, "r", **decode_vlen_strings)
    v = ds.variables["bar"]
    if getattr(ds, "decode_vlen_strings", True):
        assert array_equal(v, _string_array)
    else:
        assert array_equal(v, np.char.encode(_string_array))
    ds.close()


def test_nc_properties_new(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf, "w"):
        pass
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    with h5.File(tmp_local_or_remote_netcdf, "r") as f:
        assert b"h5netcdf" in f.attrs["_NCProperties"]
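# Illustrative sketch (not part of the upstream test suite): ``_NCProperties``
# is the root attribute netCDF tools use to record which library wrote the
# file, so inspecting it is a quick provenance check. Plain h5py suffices;
# the function name is ours.
def _example_writing_library(path):
    with h5py.File(path, "r") as f:
        # e.g. b'version=2,h5netcdf=0.12.0,hdf5=...,h5py=...'
        return f.attrs.get("_NCProperties", b"not a netCDF4 file")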
def test_failed_read_open_and_clean_delete(tmpdir):
    # A file that does not exist but is opened for
    # reading should only raise an IOError and
    # no AttributeError at garbage collection.
    path = str(tmpdir.join("this_file_does_not_exist.nc"))
    try:
        with h5netcdf.File(path, "r") as ds:
            assert ds
    except IOError:
        pass

    # Look at garbage collection:
    # A simple gc.collect() does not raise an exception.
    # Must seek the File object and imitate its del command
    # by forcing it to close.
    obj_list = gc.get_objects()
    for obj in obj_list:
        try:
            is_h5netcdf_File = isinstance(obj, h5netcdf.File)
        except AttributeError:
            is_h5netcdf_File = False
        if is_h5netcdf_File:
            obj.close()


def test_create_variable_matching_saved_dimension(tmp_local_or_remote_netcdf):
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    if h5 is not h5py:
        pytest.xfail("https://github.com/h5netcdf/h5netcdf/issues/48")

    with h5netcdf.File(tmp_local_or_remote_netcdf, "w") as f:
        f.dimensions["x"] = 2
        f.create_variable("y", data=[1, 2], dimensions=("x",))

    with h5.File(tmp_local_or_remote_netcdf, "r") as f:
        dimlen = f"{f['y'].dims[0].values()[0].size:10}"
        assert f["y"].dims[0].keys() == [NOT_A_VARIABLE.decode("ascii") + dimlen]

    with h5netcdf.File(tmp_local_or_remote_netcdf, "a") as f:
        f.create_variable("x", data=[0, 1], dimensions=("x",))

    with h5.File(tmp_local_or_remote_netcdf, "r") as f:
        assert f["y"].dims[0].keys() == ["x"]


def test_invalid_netcdf_error(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf, "w", invalid_netcdf=False) as f:
        # valid
        f.create_variable(
            "lzf_compressed", data=[1], dimensions=("x",), compression="lzf"
        )
        # invalid
        with pytest.raises(h5netcdf.CompatibilityError):
            f.create_variable("complex", data=1j)
        with pytest.raises(h5netcdf.CompatibilityError):
            f.attrs["complex_attr"] = 1j
        with pytest.raises(h5netcdf.CompatibilityError):
            f.create_variable(
                "scaleoffset", data=[1], dimensions=("x",), scaleoffset=0
            )


def test_invalid_netcdf_okay(tmp_local_or_remote_netcdf):
    if tmp_local_or_remote_netcdf.startswith(remote_h5):
        pytest.skip("h5pyd does not support NumPy complex dtype yet")
    with h5netcdf.File(tmp_local_or_remote_netcdf, "w", invalid_netcdf=True) as f:
        f.create_variable(
            "lzf_compressed", data=[1], dimensions=("x",), compression="lzf"
        )
        f.create_variable("complex", data=1j)
        f.attrs["complex_attr"] = 1j
        f.create_variable("scaleoffset", data=[1], dimensions=("x",), scaleoffset=0)
    with h5netcdf.File(tmp_local_or_remote_netcdf, "r") as f:
        np.testing.assert_equal(f["lzf_compressed"][:], [1])
        assert f["complex"][...] == 1j
        assert f.attrs["complex_attr"] == 1j
        np.testing.assert_equal(f["scaleoffset"][:], [1])
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    with h5.File(tmp_local_or_remote_netcdf, "r") as f:
        assert "_NCProperties" not in f.attrs


def test_reopen_file_different_dimension_sizes(tmp_local_netcdf):
    # regression test for https://github.com/h5netcdf/h5netcdf/issues/55
    with h5netcdf.File(tmp_local_netcdf, "w") as f:
        f.create_variable("/one/foo", data=[1], dimensions=("x",))
    with h5netcdf.File(tmp_local_netcdf, "a") as f:
        f.create_variable("/two/foo", data=[1, 2], dimensions=("x",))
    with netCDF4.Dataset(tmp_local_netcdf, "r") as f:
        assert f.groups["one"].variables["foo"][...].shape == (1,)


def test_invalid_then_valid_no_ncproperties(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf, "w", invalid_netcdf=True):
        pass
    with h5netcdf.File(tmp_local_or_remote_netcdf, "a"):
        pass
    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    with h5.File(tmp_local_or_remote_netcdf, "r") as f:
        # still not a valid netcdf file
        assert "_NCProperties" not in f.attrs
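# Illustrative sketch (not part of the upstream test suite): application code
# can catch ``CompatibilityError`` and fall back to invalid-netCDF mode. The
# helper name and the fallback policy are illustrative assumptions.
def _example_fallback_write(path, value):
    try:
        with h5netcdf.File(path, "w") as f:
            f.create_variable("value", data=value)
    except h5netcdf.CompatibilityError:
        # data (e.g. complex values) is not representable as valid netCDF4
        with h5netcdf.File(path, "w", invalid_netcdf=True) as f:
            f.create_variable("value", data=value)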
def test_creating_and_resizing_unlimited_dimensions(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf, "w") as f:
        f.dimensions["x"] = None
        f.dimensions["y"] = 15
        f.dimensions["z"] = None
        f.resize_dimension("z", 20)

        with pytest.raises(ValueError) as e:
            f.resize_dimension("y", 20)
        assert e.value.args[0] == (
            "Dimension 'y' is not unlimited and thus cannot be resized."
        )

    h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    # Assert some behavior observed by using the C netCDF bindings.
    with h5.File(tmp_local_or_remote_netcdf, "r") as f:
        assert f["x"].shape == (0,)
        assert f["x"].maxshape == (None,)
        assert f["y"].shape == (15,)
        assert f["y"].maxshape == (15,)
        assert f["z"].shape == (20,)
        assert f["z"].maxshape == (None,)


def test_creating_variables_with_unlimited_dimensions(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf, "w") as f:
        f.dimensions["x"] = None
        f.dimensions["y"] = 2

        # Creating a variable without data will initialize an array with zero
        # length.
        f.create_variable("dummy", dimensions=("x", "y"), dtype=np.int64)
        assert f.variables["dummy"].shape == (0, 2)
        assert f.variables["dummy"]._h5ds.maxshape == (None, 2)

        # Trying to create a variable while the current size of the dimension
        # is still zero will fail.
        with pytest.raises(ValueError) as e:
            f.create_variable(
                "dummy2", data=np.array([[1, 2], [3, 4]]), dimensions=("x", "y")
            )
        assert e.value.args[0] == "Shape tuple is incompatible with data"

        # Resize data.
        assert f.variables["dummy"].shape == (0, 2)
        f.resize_dimension("x", 3)
        # This will also force a resize of the existing variables and it will
        # be padded with zeros.
        np.testing.assert_allclose(f.variables["dummy"], np.zeros((3, 2)))

        # Creating another variable with no data will now also take the shape
        # of the current dimensions.
        f.create_variable("dummy3", dimensions=("x", "y"), dtype=np.int64)
        assert f.variables["dummy3"].shape == (3, 2)
        assert f.variables["dummy3"]._h5ds.maxshape == (None, 2)

    # Close and read again to also test correct parsing of unlimited
    # dimensions.
    with h5netcdf.File(tmp_local_or_remote_netcdf, "r") as f:
        assert f.dimensions["x"] is None
        assert f._h5file["x"].maxshape == (None,)
        assert f._h5file["x"].shape == (3,)

        assert f.dimensions["y"] == 2
        assert f._h5file["y"].maxshape == (2,)
        assert f._h5file["y"].shape == (2,)
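# Illustrative sketch (not part of the upstream test suite): the pattern
# exercised above, condensed. Unlimited dimensions start at length zero and
# must be resized explicitly; variables using them grow (zero-padded) with
# them. The function and variable names are ours.
def _example_unlimited_dimension(path):
    with h5netcdf.File(path, "w") as f:
        f.dimensions["t"] = None  # unlimited dimension, initially length 0
        v = f.create_variable("series", ("t",), dtype="f8")
        f.resize_dimension("t", 4)  # also resizes "series" to shape (4,)
        v[:] = np.arange(4.0)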
def test_writing_to_an_unlimited_dimension(tmp_local_or_remote_netcdf):
    with h5netcdf.File(tmp_local_or_remote_netcdf, "w") as f:
        # Two dimensions, only one is unlimited.
        f.dimensions["x"] = None
        f.dimensions["y"] = 3

        # Cannot create it without first resizing it.
        with pytest.raises(ValueError) as e:
            f.create_variable(
                "dummy1", data=np.array([[1, 2, 3]]), dimensions=("x", "y")
            )
        assert e.value.args[0] == "Shape tuple is incompatible with data"

        # Without data.
        f.create_variable("dummy1", dimensions=("x", "y"), dtype=np.int64)
        f.create_variable("dummy2", dimensions=("x", "y"), dtype=np.int64)
        f.create_variable("dummy3", dimensions=("x", "y"), dtype=np.int64)
        g = f.create_group("test")
        g.create_variable("dummy4", dimensions=("y", "x", "x"), dtype=np.int64)
        g.create_variable("dummy5", dimensions=("y", "y"), dtype=np.int64)

        assert f.variables["dummy1"].shape == (0, 3)
        assert f.variables["dummy2"].shape == (0, 3)
        assert f.variables["dummy3"].shape == (0, 3)
        assert g.variables["dummy4"].shape == (3, 0, 0)
        assert g.variables["dummy5"].shape == (3, 3)
        f.resize_dimension("x", 2)
        assert f.variables["dummy1"].shape == (2, 3)
        assert f.variables["dummy2"].shape == (2, 3)
        assert f.variables["dummy3"].shape == (2, 3)
        assert g.variables["dummy4"].shape == (3, 2, 2)
        assert g.variables["dummy5"].shape == (3, 3)

        f.variables["dummy2"][:] = [[1, 2, 3], [5, 6, 7]]
        np.testing.assert_allclose(f.variables["dummy2"], [[1, 2, 3], [5, 6, 7]])

        f.variables["dummy3"][...] = [[1, 2, 3], [5, 6, 7]]
        np.testing.assert_allclose(f.variables["dummy3"], [[1, 2, 3], [5, 6, 7]])


def test_c_api_can_read_unlimited_dimensions(tmp_local_netcdf):
    with h5netcdf.File(tmp_local_netcdf, "w") as f:
        # Three dimensions, only one is limited.
        f.dimensions["x"] = None
        f.dimensions["y"] = 3
        f.dimensions["z"] = None
        f.create_variable("dummy1", dimensions=("x", "y"), dtype=np.int64)
        f.create_variable("dummy2", dimensions=("y", "x", "x"), dtype=np.int64)
        g = f.create_group("test")
        g.create_variable("dummy3", dimensions=("y", "y"), dtype=np.int64)
        g.create_variable("dummy4", dimensions=("z", "z"), dtype=np.int64)
        f.resize_dimension("x", 2)

    with netCDF4.Dataset(tmp_local_netcdf, "r") as f:
        assert f.dimensions["x"].size == 2
        assert f.dimensions["x"].isunlimited() is True
        assert f.dimensions["y"].size == 3
        assert f.dimensions["y"].isunlimited() is False
        assert f.dimensions["z"].size == 0
        assert f.dimensions["z"].isunlimited() is True
        assert f.variables["dummy1"].shape == (2, 3)
        assert f.variables["dummy2"].shape == (3, 2, 2)
        g = f.groups["test"]
        assert g.variables["dummy3"].shape == (3, 3)
        assert g.variables["dummy4"].shape == (0, 0)


def test_reading_unlimited_dimensions_created_with_c_api(tmp_local_netcdf):
    with netCDF4.Dataset(tmp_local_netcdf, "w") as f:
        f.createDimension("x", None)
        f.createDimension("y", 3)
        f.createDimension("z", None)
        dummy1 = f.createVariable("dummy1", float, ("x", "y"))
        f.createVariable("dummy2", float, ("y", "x", "x"))
        g = f.createGroup("test")
        g.createVariable("dummy3", float, ("y", "y"))
        g.createVariable("dummy4", float, ("z", "z"))

        # Assign something to trigger a resize.
        dummy1[:] = [[1, 2, 3], [4, 5, 6]]

    with h5netcdf.File(tmp_local_netcdf, "r") as f:
        assert f.dimensions["x"] is None
        assert f.dimensions["y"] == 3
        assert f.dimensions["z"] is None

        # This is parsed correctly due to h5netcdf's init trickery.
        assert f._current_dim_sizes["x"] == 2
        assert f._current_dim_sizes["y"] == 3
        assert f._current_dim_sizes["z"] == 0

        # But the actual data-set and arrays are not correct.
        assert f["dummy1"].shape == (2, 3)
        # XXX: This array has some data with dimension x - netcdf does not
        # appear to keep dimensions consistent.
        assert f["dummy2"].shape == (3, 0, 0)
        assert f.groups["test"]["dummy3"].shape == (3, 3)
        assert f.groups["test"]["dummy4"].shape == (0, 0)
def test_reading_unused_unlimited_dimension(tmp_local_or_remote_netcdf):
    """Test reading a file with unused dimension of unlimited size"""
    with h5netcdf.File(tmp_local_or_remote_netcdf, "w") as f:
        f.dimensions = {"x": None}
        f.resize_dimension("x", 5)
        assert f.dimensions == {"x": None}


def test_reading_special_datatype_created_with_c_api(tmp_local_netcdf):
    """Test reading a file with unsupported Datatype"""
    with netCDF4.Dataset(tmp_local_netcdf, "w") as f:
        complex128 = np.dtype([("real", np.float64), ("imag", np.float64)])
        f.createCompoundType(complex128, "complex128")
    with h5netcdf.File(tmp_local_netcdf, "r") as f:
        # simply opening the file must not raise
        pass


def test_nc4_non_coord(tmp_local_netcdf):
    with h5netcdf.File(tmp_local_netcdf, "w") as f:
        f.dimensions = {"x": None, "y": 2}
        f.create_variable("test", dimensions=("x",), dtype=np.int64)
        f.create_variable("y", dimensions=("x",), dtype=np.int64)

    with h5netcdf.File(tmp_local_netcdf, "r") as f:
        assert f.dimensions == {"x": None, "y": 2}
        assert list(f.variables) == ["y", "test"]
        # "y" shares its name with a dimension but is not backed by it, so
        # it is stored at the HDF5 level under the "_nc4_non_coord_" prefix
        assert list(f._h5group.keys()) == ["_nc4_non_coord_y", "test", "x", "y"]


def test_overwrite_existing_file(tmp_local_netcdf):
    # create file with _NCProperties attribute
    with netCDF4.Dataset(tmp_local_netcdf, "w") as ds:
        ds.createDimension("x", 10)

    # check attribute
    with h5netcdf.File(tmp_local_netcdf, "r") as ds:
        assert ds.attrs._h5attrs.get("_NCProperties", False)

    # overwrite file with legacyapi
    with legacyapi.Dataset(tmp_local_netcdf, "w") as ds:
        ds.createDimension("x", 10)

    # check attribute
    with h5netcdf.File(tmp_local_netcdf, "r") as ds:
        assert ds.attrs._h5attrs.get("_NCProperties", False)

    # overwrite file with new api
    with h5netcdf.File(tmp_local_netcdf, "w") as ds:
        ds.dimensions["x"] = 10

    # check attribute
    with h5netcdf.File(tmp_local_netcdf, "r") as ds:
        assert ds.attrs._h5attrs.get("_NCProperties", False)


def test_scales_on_append(tmp_local_netcdf):
    # create file with _NCProperties attribute
    with netCDF4.Dataset(tmp_local_netcdf, "w") as ds:
        ds.createDimension("x", 10)

    # append file with netCDF4
    with netCDF4.Dataset(tmp_local_netcdf, "r+") as ds:
        ds.createVariable("test", "i4", ("x",))

    # check scales
    with h5netcdf.File(tmp_local_netcdf, "r") as ds:
        assert ds.variables["test"].attrs._h5attrs.get("DIMENSION_LIST", False)

    # append file with legacyapi
    with legacyapi.Dataset(tmp_local_netcdf, "r+") as ds:
        ds.createVariable("test1", "i4", ("x",))

    # check scales
    with h5netcdf.File(tmp_local_netcdf, "r") as ds:
        assert ds.variables["test1"].attrs._h5attrs.get("DIMENSION_LIST", False)


def create_attach_scales(filename, append_module):
    # create file with netCDF4
    with netCDF4.Dataset(filename, "w") as ds:
        ds.createDimension("x", 0)
        ds.createDimension("y", 1)
        ds.createVariable("test", "i4", ("x",))
        ds.variables["test"][:] = np.ones((10,))

    # append file with netCDF4 or legacyapi
    with append_module.Dataset(filename, "a") as ds:
        ds.createVariable("test1", "i4", ("x",))
        ds.createVariable("y", "i4", ("x", "y"))

    # check scales
    with h5netcdf.File(filename, "r") as ds:
        refs = ds._h5group["x"].attrs.get("REFERENCE_LIST", False)
        assert len(refs) == 3
        for (ref, dim), name in zip(refs, ["/test", "/test1", "/_nc4_non_coord_y"]):
            assert dim == 0
            assert ds._root._h5file[ref].name == name


def test_create_attach_scales_netcdf4(tmp_local_netcdf):
    create_attach_scales(tmp_local_netcdf, netCDF4)


def test_create_attach_scales_legacyapi(tmp_local_netcdf):
    create_attach_scales(tmp_local_netcdf, legacyapi)
def test_detach_scale(tmp_local_netcdf):
    with h5netcdf.File(tmp_local_netcdf, "w") as ds:
        ds.dimensions["x"] = 2
        ds.dimensions["y"] = 2

    with h5netcdf.File(tmp_local_netcdf, "a") as ds:
        ds.create_variable("test", dimensions=("x",), dtype=np.int64)
        # this forces detach and re-creation of the "x" dimension scale
        ds.create_variable("x", dimensions=("y",), dtype=np.int64)

    with h5netcdf.File(tmp_local_netcdf, "r") as ds:
        refs = ds._h5group["x"].attrs.get("REFERENCE_LIST", False)
        assert len(refs) == 1
        for (ref, dim), name in zip(refs, ["/test"]):
            assert dim == 0
            assert ds._root._h5file[ref].name == name


def test_no_circular_references(tmp_local_netcdf):
    # https://github.com/h5py/h5py/issues/2019
    with h5netcdf.File(tmp_local_netcdf, "w") as ds:
        ds.dimensions["x"] = 2
        ds.dimensions["y"] = 2

    gc.collect()
    with h5netcdf.File(tmp_local_netcdf, "r") as ds:
        assert len(gc.get_referrers(ds)) == 1


h5netcdf-0.12.0/h5netcdf/utils.py

from collections.abc import Mapping


class Frozen(Mapping):
    """Wrapper around an object implementing the mapping interface to make it
    immutable. If you really want to modify the mapping, the mutable version is
    saved under the `_mapping` attribute.
    """

    def __init__(self, mapping):
        self._mapping = mapping

    def __getitem__(self, key):
        return self._mapping[key]

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)

    def __contains__(self, key):
        return key in self._mapping

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, self._mapping)
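# A minimal usage sketch (not part of the module): Frozen is how h5netcdf
# exposes read-only mappings such as a group's variables; writes must go
# through the mutable mapping it wraps.
if __name__ == "__main__":
    frozen = Frozen({"x": 5})
    assert frozen["x"] == 5 and "x" in frozen and len(frozen) == 1
    try:
        frozen["y"] = 10  # Mapping provides no __setitem__
    except TypeError:
        pass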
h5netcdf-0.12.0/h5netcdf.egg-info/PKG-INFO

The legacy API is designed to be easy to try-out for netCDF4-python users, but
it is not an exact match. Here is an incomplete list of functionality we
don't include:

- Utility functions ``chartostring``, ``num2date``, etc., that are not
  directly necessary for writing netCDF files.
- We don't support the ``endian`` argument to ``createVariable`` yet (see
  `GitHub issue`_).
- h5netcdf variables do not support automatic masking or scaling (e.g., of
  values matching the ``_FillValue`` attribute). We prefer to leave this
  functionality to client libraries (e.g., xarray_), which can implement
  their exact desired scaling behavior.
- No support yet for automatic resizing of unlimited dimensions with array
  indexing. This would be a welcome pull request. For now, dimensions can be
  manually resized with ``Group.resize_dimension(dimension, size)``.

.. _GitHub issue: https://github.com/h5netcdf/h5netcdf/issues/15
.. _xarray: http://github.com/pydata/xarray/

Invalid netCDF files
~~~~~~~~~~~~~~~~~~~~

h5py implements some features that do not (yet) result in valid netCDF files:

- Data types:

  - Booleans
  - Complex values
  - Non-string variable length types
  - Enum types
  - Reference types

- Arbitrary filters:

  - Scale-offset filters

By default [*]_, h5netcdf will not allow writing files using any of these
features, as files with such features are not readable by other netCDF tools.
However, these are still valid HDF5 files. If you don't care about netCDF
compatibility, you can use these features by setting ``invalid_netcdf=True``
when creating a file:

.. code-block:: python

  # avoid the .nc extension for non-netcdf files
  f = h5netcdf.File('mydata.h5', invalid_netcdf=True)
  ...

  # works with the legacy API, too, though compression options are not exposed
  ds = h5netcdf.legacyapi.Dataset('mydata.h5', invalid_netcdf=True)
  ...

.. [*] Otherwise, h5netcdf will raise ``h5netcdf.CompatibilityError``.

Decoding variable length strings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

h5py 3.0 introduced `new behavior`_ for handling variable length strings.
Instead of being automatically decoded with UTF-8 into NumPy arrays of
``str``, they are now returned as arrays of ``bytes``.

The legacy API preserves the old behavior of h5py (which matches netCDF4),
and automatically decodes strings.

The new API *also* currently preserves the old behavior of h5py, but issues a
warning that it will change in the future to match h5py. Explicitly set
``decode_vlen_strings=False`` in the ``h5netcdf.File`` constructor to opt-in
to the new behavior early, or set ``decode_vlen_strings=True`` to opt-in to
automatic decoding.

.. _new behavior: https://docs.h5py.org/en/stable/strings.html
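For example (a short sketch; the variable name is arbitrary), the same
variable can be read either way:

.. code-block:: python

  # opt-in to the h5py 3.x behavior: raw bytes
  with h5netcdf.File('mydata.nc', 'r', decode_vlen_strings=False) as f:
      raw = f['var_len_str'][0]      # e.g. b'foo'

  # opt-in to automatic decoding: str
  with h5netcdf.File('mydata.nc', 'r', decode_vlen_strings=True) as f:
      decoded = f['var_len_str'][0]  # e.g. 'foo'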
Datasets with missing dimension scales
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default [*]_ h5netcdf raises a ``ValueError`` if variables with no
dimension scale associated with one of their axes are accessed.
You can set ``phony_dims='sort'`` when opening a file to let h5netcdf invent
phony dimensions according to `netCDF`_ behaviour.

.. code-block:: python

  # mimic netCDF-behaviour for non-netcdf files
  f = h5netcdf.File('mydata.h5', mode='r', phony_dims='sort')
  ...

Note that this iterates once over the whole group hierarchy, which can affect
performance if you rely on lazy group access. You can set
``phony_dims='access'`` instead to defer phony dimension creation to group
access time. The created phony dimension naming will differ from `netCDF`_
behaviour.

.. code-block:: python

  f = h5netcdf.File('mydata.h5', mode='r', phony_dims='access')
  ...

.. _netCDF: https://www.unidata.ucar.edu/software/netcdf/docs/interoperability_hdf5.html

.. [*] The keyword default is ``phony_dims=None`` for backwards compatibility.

Changelog
---------

`Changelog`_

.. _Changelog: https://github.com/h5netcdf/h5netcdf/blob/master/CHANGELOG.rst

License
-------

`3-clause BSD`_
.. _3-clause BSD: https://github.com/h5netcdf/h5netcdf/blob/master/LICENSE


h5netcdf-0.12.0/h5netcdf.egg-info/SOURCES.txt

LICENSE
MANIFEST.in
README.rst
setup.cfg
setup.py
h5netcdf/__init__.py
h5netcdf/attrs.py
h5netcdf/core.py
h5netcdf/dimensions.py
h5netcdf/legacyapi.py
h5netcdf/utils.py
h5netcdf.egg-info/PKG-INFO
h5netcdf.egg-info/SOURCES.txt
h5netcdf.egg-info/dependency_links.txt
h5netcdf.egg-info/requires.txt
h5netcdf.egg-info/top_level.txt
h5netcdf/tests/conftest.py
h5netcdf/tests/test_h5netcdf.py

h5netcdf-0.12.0/h5netcdf.egg-info/dependency_links.txt

h5netcdf-0.12.0/h5netcdf.egg-info/requires.txt

h5py

h5netcdf-0.12.0/h5netcdf.egg-info/top_level.txt

h5netcdf

h5netcdf-0.12.0/setup.cfg

[bdist_wheel]
universal = 1

[flake8]
ignore =
    E203  # whitespace before ':' - doesn't work well with black
    E402  # module level import not at top of file
    E501  # line too long - let black worry about that
    E731  # do not assign a lambda expression, use a def
    W503  # line break before binary operator
exclude = .eggs

[isort]
profile = black
skip_gitignore = true
force_to_top = true
default_section = THIRDPARTY
known_first_party = h5netcdf

[egg_info]
tag_build =
tag_date = 0

h5netcdf-0.12.0/setup.py

import os
import sys

from setuptools import find_packages, setup

if sys.version_info[:2] < (3, 6):
    raise RuntimeError("Python version >= 3.6 required.")

CLASSIFIERS = [
    "Development Status :: 4 - Beta",
    "License :: OSI Approved :: BSD License",
    "Operating System :: OS Independent",
    "Intended Audience :: Science/Research",
    "Programming Language :: Python",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.6",
    "Programming Language :: Python :: 3.7",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Topic :: Scientific/Engineering",
]

setup(
    name="h5netcdf",
    description="netCDF4 via h5py",
    long_description=(
        open("README.rst").read() if os.path.exists("README.rst") else ""
    ),
    version="0.12.0",
    license="BSD",
    classifiers=CLASSIFIERS,
    author="Stephan Hoyer",
    author_email="shoyer@gmail.com",
    url="https://github.com/h5netcdf/h5netcdf",
    python_requires=">=3.6",
    install_requires=["h5py"],
    tests_require=["netCDF4", "pytest"],
    packages=find_packages(),
)