deepdish-0.3.7/0000755000175000017500000000000014123256273014477 5ustar larssonlarsson00000000000000deepdish-0.3.7/LICENSE0000644000175000017500000000271713052123256015505 0ustar larssonlarsson00000000000000Copyright (c) 2014, Amit Group All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the {organization} nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. deepdish-0.3.7/MANIFEST.in0000644000175000017500000000013514123251306016224 0ustar larssonlarsson00000000000000include requirements.txt include requirements_docs.txt recursive-include deepdish *.pyx *.py deepdish-0.3.7/PKG-INFO0000644000175000017500000000135614123256273015601 0ustar larssonlarsson00000000000000Metadata-Version: 2.1 Name: deepdish Version: 0.3.7 Summary: Deep Learning experiments from University of Chicago. Home-page: https://github.com/uchicago-cs/deepdish Maintainer: Gustav Larsson Maintainer-email: gustav.m.larsson@gmail.com License: BSD Platform: UNKNOWN Classifier: Development Status :: 4 - Beta Classifier: Intended Audience :: Science/Research Classifier: License :: OSI Approved :: BSD License Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 2 Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.4 Classifier: Topic :: Scientific/Engineering Provides-Extra: image License-File: LICENSE UNKNOWN deepdish-0.3.7/README.rst0000644000175000017500000000524714123254760016175 0ustar larssonlarsson00000000000000.. image:: https://readthedocs.org/projects/deepdish/badge/?version=latest :target: https://readthedocs.org/projects/deepdish/?badge=latest :alt: Documentation Status .. image:: https://travis-ci.org/uchicago-cs/deepdish.svg?branch=master :target: https://travis-ci.org/uchicago-cs/deepdish/ .. image:: https://img.shields.io/pypi/v/deepdish.svg :target: https://pypi.python.org/pypi/deepdish .. image:: https://coveralls.io/repos/uchicago-cs/deepdish/badge.svg?branch=master&service=github :target: https://coveralls.io/github/uchicago-cs/deepdish?branch=master .. 
image:: https://img.shields.io/badge/license-BSD%203--Clause-blue.svg?style=flat :target: http://opensource.org/licenses/BSD-3-Clause deepdish ======== Flexible HDF5 saving/loading and other data science tools from the University of Chicago. This repository also host a Deep Learning blog: * http://deepdish.io Installation ------------ :: pip install deepdish Alternatively (if you have conda with the `conda-forge `__ channel):: conda install -c conda-forge deepdish Main feature ------------ The primary feature of deepdish is its ability to save and load all kinds of data as HDF5. It can save any Python data structure, offering the same ease of use as pickling or `numpy.save `__. However, it improves by also offering: - Interoperability between languages (HDF5 is a popular standard) - Easy to inspect the content from the command line (using ``h5ls`` or our specialized tool ``ddls``) - Highly compressed storage (thanks to a PyTables backend) - Native support for scipy sparse matrices and pandas ``DataFrame``, ``Series`` and ``Panel`` - Ability to partially read files, even slices of arrays An example: .. code:: python import deepdish as dd d = { 'foo': np.ones((10, 20)), 'sub': { 'bar': 'a string', 'baz': 1.23, }, } dd.io.save('test.h5', d) This can be reconstructed using ``dd.io.load('test.h5')``, or inspected through the command line using either a standard tool:: $ h5ls test.h5 foo Dataset {10, 20} sub Group Or, better yet, our custom tool ``ddls`` (or ``python -m deepdish.io.ls``):: $ ddls test.h5 /foo array (10, 20) [float64] /sub dict /sub/bar 'a string' (8) [unicode] /sub/baz 1.23 [float64] Read more at `Saving and loading data `__. Documentation ------------- * http://deepdish.readthedocs.io/ deepdish-0.3.7/deepdish/0000755000175000017500000000000014123256273016264 5ustar larssonlarsson00000000000000deepdish-0.3.7/deepdish/__init__.py0000644000175000017500000000276314123255565020410 0ustar larssonlarsson00000000000000from __future__ import print_function, division, absolute_import # Load the following modules by default from deepdish.core import ( bytesize, humanize_bytesize, memsize, span, apply_once, tupled_argmax, multi_range, timed, aslice, ) from deepdish import io from deepdish import util from deepdish import image from deepdish import parallel from deepdish.conf import config class MovedPackage(object): def __init__(self, old_loc, new_loc): self.old_loc = old_loc self.new_loc = new_loc def __getattr__(self, name): raise ImportError('The package {} has been moved to {}'.format( self.old_loc, self.new_loc)) # This is temporary: remove after a few minor releases plot = MovedPackage('deepdish.plot', 'vzlog.image') __all__ = ['deepdish', 'set_verbose', 'info', 'warning', 'bytesize', 'humanize_bytesize', 'memsize', 'span', 'apply_once', 'tupled_argmax', 'multi_range', 'io', 'util', 'image', 'plot', 'parallel', 'config', 'timed', 'aslice', ] VERSION = (0, 3, 7) ISRELEASED = True __version__ = '{0}.{1}.{2}'.format(*VERSION) if not ISRELEASED: __version__ += '.git' deepdish-0.3.7/deepdish/conf.py0000644000175000017500000000075113052123256017560 0ustar larssonlarsson00000000000000from __future__ import division, print_function, absolute_import import os import sys if sys.version_info >= (3,): from configparser import ConfigParser else: from ConfigParser import ConfigParser def config(): """ Loads and returns a ConfigParser from ``~/.deepdish.conf``. 
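
    A minimal usage sketch (the value shown assumes no ``~/.deepdish.conf``
    is present to override the built-in default):

    >>> import deepdish as dd
    >>> dd.config().get('io', 'compression')  # doctest: +SKIP
    'zlib'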
""" conf = ConfigParser() # Set up defaults conf.add_section('io') conf.set('io', 'compression', 'zlib') conf.read(os.path.expanduser('~/.deepdish.conf')) return conf deepdish-0.3.7/deepdish/core.py0000644000175000017500000001704714123254760017576 0ustar larssonlarsson00000000000000from __future__ import division, print_function, absolute_import import time import warnings import numpy as np import itertools as itr import sys from contextlib import contextmanager warnings.simplefilter("ignore", np.ComplexWarning) _is_verbose = False _is_silent = False class AbortException(Exception): """ This exception is used for when the user wants to quit algorithms mid-way. The `AbortException` can for instance be sent by pygame input, and caught by whatever is running the algorithm. """ pass def bytesize(arr): """ Returns the memory byte size of a Numpy array as an integer. """ byte_size = np.prod(arr.shape) * np.dtype(arr.dtype).itemsize return byte_size def humanize_bytesize(byte_size): order = np.log(byte_size) / np.log(1024) orders = [ (5, 'PB'), (4, 'TB'), (3, 'GB'), (2, 'MB'), (1, 'KB'), (0, 'B') ] for ex, name in orders: if order >= ex: return '{:.4g} {}'.format(byte_size / 1024**ex, name) def memsize(arr): """ Returns the required memory of a Numpy array as a humanly readable string. """ return humanize_bytesize(bytesize(arr)) def span(arr): """ Calculate and return the mininum and maximum of an array. Parameters ---------- arr : ndarray Numpy array. Returns ------- min : dtype Minimum of array. max : dtype Maximum of array. """ # TODO: This could be made faster with a custom ufunc return (np.min(arr), np.max(arr)) def apply_once(func, arr, axes, keepdims=True): """ Similar to `numpy.apply_over_axes`, except this performs the operation over a flattened version of all the axes, meaning that the function will only be called once. This only makes a difference for non-linear functions. Parameters ---------- func : callback Function that operates well on Numpy arrays and returns a single value of compatible dtype. arr : ndarray Array to do operation over. axes : int or iterable Specifies the axes to perform the operation. Only one call will be made to `func`, with all values flattened. keepdims : bool By default, this is True, so the collapsed dimensions remain with length 1. This is simlar to `numpy.apply_over_axes` in that regard. If this is set to False, the dimensions are removed, just like when using for instance `numpy.sum` over a single axis. Note that this is safer than subsequently calling squeeze, since this option will preserve length-1 dimensions that were not operated on. Examples -------- >>> import deepdish as dd >>> import numpy as np >>> rs = np.random.RandomState(0) >>> x = rs.uniform(size=(10, 3, 3)) Image that you have ten 3x3 images and you want to calculate each image's intensity standard deviation: >>> np.apply_over_axes(np.std, x, [1, 2]).ravel() array([ 0.06056838, 0.08230712, 0.08135083, 0.09938963, 0.08533604, 0.07830725, 0.066148 , 0.07983019, 0.08134123, 0.01839635]) This is the same as ``x.std(1).std(1)``, which is not the standard deviation of all 9 pixels together. 
To fix this we can flatten the pixels and try again: >>> x.reshape(10, 9).std(axis=1) array([ 0.17648981, 0.32849108, 0.29409526, 0.25547501, 0.23649064, 0.26928468, 0.20081239, 0.33052397, 0.29950855, 0.26535717]) This is exactly what this function does for you: >>> dd.apply_once(np.std, x, [1, 2], keepdims=False) array([ 0.17648981, 0.32849108, 0.29409526, 0.25547501, 0.23649064, 0.26928468, 0.20081239, 0.33052397, 0.29950855, 0.26535717]) """ all_axes = np.arange(arr.ndim) if isinstance(axes, int): axes = {axes} else: axes = set(axis % arr.ndim for axis in axes) principal_axis = min(axes) for i, axis in enumerate(axes): axis0 = principal_axis + i if axis != axis0: all_axes[axis0], all_axes[axis] = all_axes[axis], all_axes[axis0] transposed_arr = arr.transpose(all_axes) new_shape = [] new_shape_keepdims = [] for axis, dim in enumerate(arr.shape): if axis == principal_axis: new_shape.append(-1) elif axis not in axes: new_shape.append(dim) if axis in axes: new_shape_keepdims.append(1) else: new_shape_keepdims.append(dim) collapsed = np.apply_along_axis(func, principal_axis, transposed_arr.reshape(new_shape)) if keepdims: return collapsed.reshape(new_shape_keepdims) else: return collapsed def tupled_argmax(a): """ Argmax that returns an index tuple. Note that `numpy.argmax` will return a scalar index as if you had flattened the array. Parameters ---------- a : array_like Input array. Returns ------- index : tuple Tuple of index, even if `a` is one-dimensional. Note that this can immediately be used to index `a` as in ``a[index]``. Examples -------- >>> import numpy as np >>> import deepdish as dd >>> a = np.arange(6).reshape(2,3) >>> a array([[0, 1, 2], [3, 4, 5]]) >>> dd.tupled_argmax(a) (1, 2) """ return np.unravel_index(np.argmax(a), np.shape(a)) def multi_range(*args): return itr.product(*[range(a) for a in args]) @contextmanager def timed(name=None, file=sys.stdout, callback=None, wall_clock=True): """ Context manager to make it easy to time the execution of a piece of code. This timer will never run your code several times and is meant more for simple in-production timing, instead of benchmarking. Reports the wall-clock time (using `time.time`) and not the processor time. Parameters ---------- name : str Name of the timing block, to identify it. file : file handler Which file handler to print the results to. Default is standard output. If a numpy array and size 1 is given, the time in seconds will be stored inside it. Ignored if `callback` is set. callback : callable This offer even more flexibility than `file`. The callable will be called at the end of the execution with a single floating point argument with the elapsed time in seconds. Examples -------- >>> import deepdish as dd >>> import time The `timed` function is a context manager, so everything inside the ``with`` block will be timed. The results will be printed by default to standard output: >>> with dd.timed('Sleep'): # doctest: +SKIP ... time.sleep(1) [timed] Sleep: 1.001035451889038 s Using the `callback` parameter, we can accumulate multiple runs into a list: >>> times = [] >>> for i in range(3): # doctest: +SKIP ... with dd.timed(callback=times.append): ... 
time.sleep(1) >>> times # doctest: +SKIP [1.0035350322723389, 1.0035550594329834, 1.0039470195770264] """ start = time.time() yield end = time.time() delta = end - start if callback is not None: callback(delta) elif isinstance(file, np.ndarray) and len(file) == 1: file[0] = delta else: name_str = ' {}'.format(name) if name is not None else '' print(("[timed]{0}: {1} s".format(name_str, delta)), file=file) class SliceClass(object): def __getitem__(self, index): return index aslice = SliceClass() deepdish-0.3.7/deepdish/experiments/0000755000175000017500000000000014123256273020627 5ustar larssonlarsson00000000000000deepdish-0.3.7/deepdish/experiments/__init__.py0000644000175000017500000000000013052123256022720 0ustar larssonlarsson00000000000000deepdish-0.3.7/deepdish/experiments/pylearn2/0000755000175000017500000000000014123256273022363 5ustar larssonlarsson00000000000000deepdish-0.3.7/deepdish/experiments/pylearn2/datasets/0000755000175000017500000000000014123256273024173 5ustar larssonlarsson00000000000000deepdish-0.3.7/deepdish/experiments/pylearn2/datasets/mediaeval.py0000644000175000017500000000657213052123256026500 0ustar larssonlarsson00000000000000""" .. todo:: Based on code from pylearn2.datasets.hdf5 """ __authors__ = "Mark Stoehr" __copyright__ = "Copyright 2014, Mark Stoehr" __credits__ = ["Mark Stoehr"] __license__ = "MIT" __maintainer__ = "deepdish.io" __email__ = "mark@deepdish.io" import numpy as np import warnings from pylearn2.datasets import (dense_design_matrix, control, cache) from pylearn2.datasets.hdf5 import (HDF5Dataset, DenseDesignMatrix, HDF5DatasetIterator, HDF5ViewConverter, HDF5TopoViewConverter) from pylearn2.utils import serial try: import h5py except ImportError: h5py = None from pylearn2.utils.rng import make_np_rng class MediaEval(DenseDesignMatrix): """ .. todo:: WRITEME Parameters ---------- filename X y start stop """ def __init__(self,filename,X,y,start,stop**kwargs): self.load_all = False if h5py is None: raise RuntimeError("Could not import h5py") self.h5py.File(filename) X = self.get_dataset(X) y = self.get_dataset(y) super(MediaEval,self).__init__(X=X,y=y,**kwargs) def _check_labels(self): """ Sanity checks for X_labels and y_labels. Since the np.all test used for these labels does not work with HDF5 datasets, we issue a warning that those values are not checked. """ if self.X_labels is not None: assert self.X is not None assert self.view_converter is None assert self.X.ndim <= 2 if self.load_all: assert np.all(self.X < self.X_labels) else: warnings.warn("HDF5Dataset cannot perform test np.all(X < " + "X_labels). Use X_labels at your own risk.") if self.y_labels is not None: assert self.y is not None assert self.y.ndim <= 2 if self.load_all: assert np.all(self.y < self.y_labels) else: warnings.warn("HDF5Dataset cannot perform test np.all(y < " + "y_labels). Use y_labels at your own risk.") def get_dataset(self, dataset, load_all=False): """ Get a handle for an HDF5 dataset, or load the entire dataset into memory. Parameters ---------- dataset : str Name or path of HDF5 dataset. load_all : bool, optional (default False) If true, load dataset into memory. """ if load_all: data = self._file[dataset][:] else: data = self._file[dataset] data.ndim = len(data.shape) # hdf5 handle has no ndim return data def iterator(self, *args, **kwargs): """ Get an iterator for this dataset. 
The FiniteDatasetIterator uses indexing that is not supported by HDF5 datasets, so we change the class to HDF5DatasetIterator to override the iterator.next method used in dataset iteration. Parameters ---------- WRITEME """ iterator = super(MediaEval, self).iterator(*args, **kwargs) iterator.__class__ = HDF5DatasetIterator return iterator deepdish-0.3.7/deepdish/image.py0000644000175000017500000002357714123254760017735 0ustar larssonlarsson00000000000000""" Basic functions for working with images. """ from __future__ import division, print_function, absolute_import import itertools as itr import numpy as np def _import_skimage(): """Import scikit-image, with slightly modified `ImportError` message""" try: import skimage except ImportError: raise ImportError("scikit-image is required to use this function.") return skimage def _import_pil(): """Import scikit-image, with slightly modified `ImportError` message""" try: import PIL except ImportError: raise ImportError("PIL/Pillow is required to use this function.") return PIL def resize_by_factor(im, factor): """ Resizes the image according to a factor. The image is pre-filtered with a Gaussian and then resampled with bilinear interpolation. This function uses scikit-image and essentially combines its `pyramid_reduce` with `pyramid_expand` into one function. Returns the same object if factor is 1, not a copy. Parameters ---------- im : ndarray, ndim=2 or 3 Image. Either 2D or 3D with 3 or 4 channels. factor : float Resize factor, e.g. a factor of 0.5 will halve both sides. """ _import_skimage() from skimage.transform.pyramids import pyramid_reduce, pyramid_expand if factor < 1: return pyramid_reduce(im, downscale=1/factor) elif factor > 1: return pyramid_expand(im, upscale=factor) else: return im def resize(im, shape=None, max_side=None, min_side=None): if min_side is not None: min = np.min(im.shape[:2]) factor = min_side / min return resize_by_factor(im, factor) elif max_side is not None: max = np.max(im.shape[:2]) factor = max_side / max return resize_by_factor(im, factor) else: factor_y = shape[0] / im.shape[0] factor_x = shape[1] / im.shape[1] assert np.fabs(factor_x - factor_y) < 0.5 return resize_by_factor(im, factor_x) def asgray(im): """ Takes an image and returns its grayscale version by averaging the color channels. if an alpha channel is present, it will simply be ignored. If a grayscale image is given, the original image is returned. Parameters ---------- image : ndarray, ndim 2 or 3 RGB or grayscale image. Returns ------- gray_image : ndarray, ndim 2 Grayscale version of image. """ if im.ndim == 2: return im elif im.ndim == 3 and im.shape[2] in (3, 4): return im[..., :3].mean(axis=-1) else: raise ValueError('Invalid image format') def crop(im, size): """ Crops an image in the center. Parameters ---------- size : tuple, (height, width) Finally size after cropping. """ diff = [im.shape[index] - size[index] for index in (0, 1)] im2 = im[diff[0]//2:diff[0]//2 + size[0], diff[1]//2:diff[1]//2 + size[1]] return im2 def crop_or_pad(im, size, value=0): """ Crops an image in the center. Parameters ---------- size : tuple, (height, width) Finally size after cropping. """ diff = [im.shape[index] - size[index] for index in (0, 1)] im2 = im[diff[0]//2:diff[0]//2 + size[0], diff[1]//2:diff[1]//2 + size[1]] return im2 def crop_to_bounding_box(im, bb): """ Crops according to a bounding box. Parameters ---------- bounding_box : tuple, (top, left, bottom, right) Crops inclusively for top/left and exclusively for bottom/right. 
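
    Examples
    --------
    A small illustrative sketch using a synthetic 5x5 array:

    >>> import numpy as np
    >>> import deepdish as dd
    >>> im = np.arange(25).reshape(5, 5)
    >>> dd.image.crop_to_bounding_box(im, (1, 1, 4, 4)).shape
    (3, 3)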
""" return im[bb[0]:bb[2], bb[1]:bb[3]] def load(path, dtype=np.float64): """ Loads an image from file. Parameters ---------- path : str Path to image file. dtype : np.dtype Defaults to ``np.float64``, which means the image will be returned as a float with values between 0 and 1. If ``np.uint8`` is specified, the values will be between 0 and 255 and no conversion cost will be incurred. """ _import_skimage() import skimage.io im = skimage.io.imread(path) if dtype == np.uint8: return im elif dtype in {np.float16, np.float32, np.float64}: return im.astype(dtype) / 255 else: raise ValueError('Unsupported dtype') def load_raw(path): """ Load image using PIL/Pillow without any processing. This is particularly useful for palette images, which will be loaded using their palette index values as opposed to `load` which will convert them to RGB. Parameters ---------- path : str Path to image file. """ _import_pil() from PIL import Image return np.array(Image.open(path)) def save(path, im): """ Saves an image to file. If the image is type float, it will assume to have values in [0, 1]. Parameters ---------- path : str Path to which the image will be saved. im : ndarray (image) Image. """ from PIL import Image if im.dtype == np.uint8: pil_im = Image.fromarray(im) else: pil_im = Image.fromarray((im*255).astype(np.uint8)) pil_im.save(path) def integrate(ii, r0, c0, r1, c1): """ Use an integral image to integrate over a given window. Parameters ---------- ii : ndarray Integral image. r0, c0 : int Top-left corner of block to be summed. r1, c1 : int Bottom-right corner of block to be summed. Returns ------- S : int Integral (sum) over the given window. """ # This line is modified S = np.zeros(ii.shape[-1]) S += ii[r1, c1] if (r0 - 1 >= 0) and (c0 - 1 >= 0): S += ii[r0 - 1, c0 - 1] if (r0 - 1 >= 0): S -= ii[r0 - 1, c1] if (c0 - 1 >= 0): S -= ii[r1, c0 - 1] return S def offset(img, offset, fill_value=0): """ Moves the contents of image without changing the image size. The missing values are given a specified fill value. Parameters ---------- img : array Image. offset : (vertical_offset, horizontal_offset) Tuple of length 2, specifying the offset along the two axes. fill_value : dtype of img Fill value. Defaults to 0. """ sh = img.shape if sh == (0, 0): return img else: x = np.empty(sh) x[:] = fill_value x[max(offset[0], 0):min(sh[0]+offset[0], sh[0]), max(offset[1], 0):min(sh[1]+offset[1], sh[1])] = \ img[max(-offset[0], 0):min(sh[0]-offset[0], sh[0]), max(-offset[1], 0):min(sh[1]-offset[1], sh[1])] return x def bounding_box(alpha, threshold=0.1): """ Returns a bounding box of the support. Parameters ---------- alpha : ndarray, ndim=2 Any one-channel image where the background has zero or low intensity. threshold : float The threshold that divides background from foreground. Returns ------- bounding_box : (top, left, bottom, right) The bounding box describing the smallest rectangle containing the foreground object, as defined by the threshold. """ assert alpha.ndim == 2 # Take the bounding box of the support, with a certain threshold. supp_axs = [alpha.max(axis=1-i) for i in range(2)] # Check first and last value of that threshold bb = [np.where(supp_axs[i] > threshold)[0][[0, -1]] for i in range(2)] return (bb[0][0], bb[1][0], bb[0][1], bb[1][1]) def bounding_box_as_binary_map(alpha, threshold=0.1): """ Similar to `bounding_box`, except returns the bounding box as a binary map the same size as the input. Same parameters as `bounding_box`. 
Returns ------- binary_map : ndarray, ndim=2, dtype=np.bool_ Binary map with True if object and False if background. """ bb = bounding_box(alpha) x = np.zeros(alpha.shape, dtype=np.bool_) x[bb[0]:bb[2], bb[1]:bb[3]] = 1 return x def extract_patches(images, patch_shape, samples_per_image=40, seed=0, cycle=True): """ Takes a set of images and yields randomly chosen patches of specified size. Parameters ---------- images : iterable The images have to be iterable, and each element must be a Numpy array with at least two spatial 2 dimensions as the first and second axis. patch_shape : tuple, length 2 The spatial shape of the patches that should be extracted. If the images have further dimensions beyond the spatial, the patches will copy these too. samples_per_image : int Samples to extract before moving on to the next image. seed : int Seed with which to select the patches. cycle : bool If True, then the function will produce patches indefinitely, by going back to the first image when all are done. If False, the iteration will stop when there are no more images. Returns ------- patch_generator This function returns a generator that will produce patches. Examples -------- >>> import deepdish as dd >>> import matplotlib.pylab as plt >>> import itertools >>> images = ag.io.load_example('mnist') Now, let us say we want to exact patches from the these, where each patch has at least some activity. >>> gen = dd.image.extract_patches(images, (5, 5)) >>> gen = (x for x in gen if x.mean() > 0.1) >>> patches = np.array(list(itertools.islice(gen, 25))) >>> patches.shape (25, 5, 5) >>> dd.plot.images(patches) >>> plt.show() """ rs = np.random.RandomState(seed) for Xi in itr.cycle(images): # How many patches could we extract? w, h = [Xi.shape[i]-patch_shape[i] for i in range(2)] assert w > 0 and h > 0 # Maybe shuffle an iterator of the indices? 
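        # For now: materialize all w*h candidate top-left corners, shuffle
        # them in place, and yield patches from the first
        # `samples_per_image` of them.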
indices = np.asarray(list(itr.product(range(w), range(h)))) rs.shuffle(indices) for x, y in indices[:samples_per_image]: yield Xi[x:x+patch_shape[0], y:y+patch_shape[1]] deepdish-0.3.7/deepdish/io/0000755000175000017500000000000014123256273016673 5ustar larssonlarsson00000000000000deepdish-0.3.7/deepdish/io/__init__.py0000644000175000017500000000065713052123256021006 0ustar larssonlarsson00000000000000from __future__ import division, print_function, absolute_import try: import tables _pytables_ok = True del tables except ImportError: _pytables_ok = False if _pytables_ok: from .hdf5io import load, save, ForcePickle, Compression else: def _f(*args, **kwargs): raise ImportError("You need PyTables for this function") load = save = _f __all__ = ['load', 'save', 'ForcePickle', 'Compression'] deepdish-0.3.7/deepdish/io/hdf5io.py0000644000175000017500000006204514123254760020431 0ustar larssonlarsson00000000000000from __future__ import division, print_function, absolute_import import numpy as np import tables import warnings from scipy import sparse from deepdish import conf try: import pandas as pd pd.io.pytables._tables() _pandas = True except ImportError: _pandas = False try: from types import SimpleNamespace _sns = True except ImportError: _sns = False from deepdish import six IO_VERSION = 12 DEEPDISH_IO_PREFIX = 'DEEPDISH_IO' DEEPDISH_IO_VERSION_STR = DEEPDISH_IO_PREFIX + '_VERSION' DEEPDISH_IO_UNPACK = DEEPDISH_IO_PREFIX + '_DEEPDISH_IO_UNPACK' DEEPDISH_IO_ROOT_IS_SNS = DEEPDISH_IO_PREFIX + '_ROOT_IS_SNS' # Types that should be saved as pytables attribute ATTR_TYPES = (int, float, bool, six.string_types, six.binary_type, np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint16, np.uint32, np.uint64, np.float16, np.float32, np.float64, np.bool_, np.complex64, np.complex128) if _pandas: class _HDFStoreWithHandle(pd.io.pytables.HDFStore): def __init__(self, handle): self._path = None self._complevel = None self._complib = None self._fletcher32 = False self._filters = None self._handle = handle def is_pandas_dataframe(level): return ('pandas_version' in level._v_attrs and 'pandas_type' in level._v_attrs) class ForcePickle(object): """ When saving an object with `deepdish.io.save`, you can wrap objects in this class to force them to be pickled. They will automatically be unpacked at load time. """ def __init__(self, obj): self.obj = obj class Compression(object): """ Class to enable explicit compression settings for individual arrays. """ def __init__(self, obj, compression='default'): self.obj = obj self.compression = compression def _dict_native_ok(d): """ This checks if a dictionary can be saved natively as HDF5 groups. If it can't, it will be pickled. 
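
    A couple of illustrative checks, mirroring the conditions tested below:

    >>> _dict_native_ok({'a': 1, 'b': 2})
    True
    >>> _dict_native_ok({1: 'non-string key'})
    False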
""" if len(d) >= 256: return False # All keys must be strings for k in d: if not isinstance(k, six.string_types): return False return True def _get_compression_filters(compression='default'): if compression == 'default': config = conf.config() compression = config.get('io', 'compression') elif compression is True: compression = 'zlib' if (compression is False or compression is None or compression == 'none' or compression == 'None'): ff = None else: if isinstance(compression, (tuple, list)): compression, level = compression else: level = 9 try: ff = tables.Filters(complevel=level, complib=compression, shuffle=True) except Exception: warnings.warn(("(deepdish.io.save) Missing compression method {}: " "no compression will be used.").format(compression)) ff = None return ff def _save_ndarray(handler, group, name, x, filters=None): if np.issubdtype(x.dtype, np.unicode_): # Convert unicode strings to pure byte arrays strtype = b'unicode' itemsize = x.itemsize // 4 atom = tables.UInt8Atom() x = x.view(dtype=np.uint8) elif np.issubdtype(x.dtype, np.string_): strtype = b'ascii' itemsize = x.itemsize atom = tables.StringAtom(itemsize) elif x.dtype == np.object: # Not supported by HDF5, force pickling _save_pickled(handler, group, x, name=name) return else: atom = tables.Atom.from_dtype(x.dtype) strtype = None itemsize = None if x.ndim > 0 and np.min(x.shape) == 0: sh = np.array(x.shape) atom0 = tables.Atom.from_dtype(np.dtype(np.int64)) node = handler.create_array(group, name, atom=atom0, shape=(sh.size,)) node._v_attrs.zeroarray_dtype = np.dtype(x.dtype).str.encode('ascii') node[:] = sh return if x.ndim == 0 and len(x.shape) == 0: # This is a numpy array scalar. We will store it as a regular scalar # instead, which means it will be unpacked as a numpy scalar (not numpy # array scalar) setattr(group._v_attrs, name, x[()]) return # For small arrays, compression actually leads to larger files, so we are # settings a threshold here. The threshold has been set through # experimentation. 
if filters is not None and x.size > 300: node = handler.create_carray(group, name, atom=atom, shape=x.shape, chunkshape=None, filters=filters) else: node = handler.create_array(group, name, atom=atom, shape=x.shape) if strtype is not None: node._v_attrs.strtype = strtype node._v_attrs.itemsize = itemsize node[:] = x def _save_pickled(handler, group, level, name=None): warnings.warn(('(deepdish.io.save) Pickling {}: This may cause ' 'incompatibities (for instance between Python 2 and ' '3) and should ideally be avoided').format(level), DeprecationWarning) node = handler.create_vlarray(group, name, tables.ObjectAtom()) node.append(level) def _is_linkable(level): if isinstance(level, ATTR_TYPES): return False return True def _save_level(handler, group, level, name=None, filters=None, idtable=None): _id = id(level) try: oldpath = idtable[_id] except KeyError: if _is_linkable(level): # store path to object: if group._v_pathname.endswith('/'): idtable[_id] = '{}{}'.format(group._v_pathname, name) else: idtable[_id] = '{}/{}'.format(group._v_pathname, name) else: # object already saved, so create soft link to it: handler.create_soft_link(group, name, target=oldpath) return if isinstance(level, Compression): custom_filters = _get_compression_filters(level.compression) return _save_level(handler, group, level.obj, name=name, filters=custom_filters, idtable=idtable) elif isinstance(level, ForcePickle): _save_pickled(handler, group, level, name=name) elif isinstance(level, dict) and _dict_native_ok(level): # First create a new group new_group = handler.create_group(group, name, "dict:{}".format(len(level))) for k, v in level.items(): if isinstance(k, six.string_types): _save_level(handler, new_group, v, name=k, filters=filters, idtable=idtable) elif (_sns and isinstance(level, SimpleNamespace) and _dict_native_ok(level.__dict__)): # Create a new group in same manner as for dict new_group = handler.create_group( group, name, "SimpleNamespace:{}".format(len(level.__dict__))) for k, v in level.__dict__.items(): if isinstance(k, six.string_types): _save_level(handler, new_group, v, name=k, filters=filters, idtable=idtable) elif isinstance(level, list) and len(level) < 256: # Lists can contain other dictionaries and numpy arrays, so we don't # want to serialize them. Instead, we will store each entry as i0, i1, # etc. new_group = handler.create_group(group, name, "list:{}".format(len(level))) for i, entry in enumerate(level): level_name = 'i{}'.format(i) _save_level(handler, new_group, entry, name=level_name, filters=filters, idtable=idtable) elif isinstance(level, tuple) and len(level) < 256: # Lists can contain other dictionaries and numpy arrays, so we don't # want to serialize them. Instead, we will store each entry as i0, i1, # etc. 
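        # Same i0, i1, ... layout as lists; the "tuple:N" title is what lets
        # the loader rebuild a tuple rather than a list.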
new_group = handler.create_group(group, name, "tuple:{}".format(len(level))) for i, entry in enumerate(level): level_name = 'i{}'.format(i) _save_level(handler, new_group, entry, name=level_name, filters=filters, idtable=idtable) elif isinstance(level, np.ndarray): _save_ndarray(handler, group, name, level, filters=filters) elif _pandas and isinstance(level, (pd.DataFrame, pd.Series)): store = _HDFStoreWithHandle(handler) store.put(group._v_pathname + '/' + name, level) elif isinstance(level, (sparse.dok_matrix, sparse.lil_matrix)): raise NotImplementedError( 'deepdish.io.save does not support DOK or LIL matrices; ' 'please convert before saving to one of the following supported ' 'types: BSR, COO, CSR, CSC, DIA') elif isinstance(level, (sparse.csr_matrix, sparse.csc_matrix, sparse.bsr_matrix)): new_group = handler.create_group(group, name, "sparse:") _save_ndarray(handler, new_group, 'data', level.data, filters=filters) _save_ndarray(handler, new_group, 'indices', level.indices, filters=filters) _save_ndarray(handler, new_group, 'indptr', level.indptr, filters=filters) _save_ndarray(handler, new_group, 'shape', np.asarray(level.shape)) new_group._v_attrs.format = level.format new_group._v_attrs.maxprint = level.maxprint elif isinstance(level, sparse.dia_matrix): new_group = handler.create_group(group, name, "sparse:") _save_ndarray(handler, new_group, 'data', level.data, filters=filters) _save_ndarray(handler, new_group, 'offsets', level.offsets, filters=filters) _save_ndarray(handler, new_group, 'shape', np.asarray(level.shape)) new_group._v_attrs.format = level.format new_group._v_attrs.maxprint = level.maxprint elif isinstance(level, sparse.coo_matrix): new_group = handler.create_group(group, name, "sparse:") _save_ndarray(handler, new_group, 'data', level.data, filters=filters) _save_ndarray(handler, new_group, 'col', level.col, filters=filters) _save_ndarray(handler, new_group, 'row', level.row, filters=filters) _save_ndarray(handler, new_group, 'shape', np.asarray(level.shape)) new_group._v_attrs.format = level.format new_group._v_attrs.maxprint = level.maxprint elif isinstance(level, ATTR_TYPES): setattr(group._v_attrs, name, level) elif level is None: # Store a None as an empty group new_group = handler.create_group(group, name, "nonetype:") else: _save_pickled(handler, group, level, name=name) def _load_specific_level(handler, grp, path, sel=None, pathtable=None): if path == '': if sel is not None: return _load_sliced_level(handler, grp, sel) else: return _load_level(handler, grp, pathtable) vv = path.split('/', 1) if len(vv) == 1: if hasattr(grp, vv[0]): if sel is not None: return _load_sliced_level(handler, getattr(grp, vv[0]), sel) else: return _load_level(handler, getattr(grp, vv[0]), pathtable) elif hasattr(grp, '_v_attrs') and vv[0] in grp._v_attrs: if sel is not None: raise ValueError("Cannot slice this type") v = grp._v_attrs[vv[0]] if isinstance(v, np.string_): v = v.decode('utf-8') return v else: raise ValueError('Undefined entry "{}"'.format(vv[0])) else: level, rest = vv if level == '': return _load_specific_level(handler, grp.root, rest, sel=sel, pathtable=pathtable) else: if hasattr(grp, level): return _load_specific_level(handler, getattr(grp, level), rest, sel=sel, pathtable=pathtable) else: raise ValueError('Undefined group "{}"'.format(level)) def _load_pickled(level): if isinstance(level[0], ForcePickle): return level[0].obj else: return level[0] def _load_nonlink_level(handler, level, pathtable, pathname): """ Loads level and builds appropriate type, without 
handling softlinks """ if isinstance(level, tables.Group): if _sns and (level._v_title.startswith('SimpleNamespace:') or DEEPDISH_IO_ROOT_IS_SNS in level._v_attrs): val = SimpleNamespace() dct = val.__dict__ elif level._v_title.startswith('list:'): dct = {} val = [] else: dct = {} val = dct # in case of recursion, object needs to be put in pathtable # before trying to fully load it pathtable[pathname] = val # Load sub-groups for grp in level: lev = _load_level(handler, grp, pathtable) n = grp._v_name # Check if it's a complicated pair or a string-value pair if n.startswith('__pair'): dct[lev['key']] = lev['value'] else: dct[n] = lev # Load attributes for name in level._v_attrs._f_list(): if name.startswith(DEEPDISH_IO_PREFIX): continue v = level._v_attrs[name] dct[name] = v if level._v_title.startswith('list:'): N = int(level._v_title[len('list:'):]) for i in range(N): val.append(dct['i{}'.format(i)]) return val elif level._v_title.startswith('tuple:'): N = int(level._v_title[len('tuple:'):]) lst = [] for i in range(N): lst.append(dct['i{}'.format(i)]) return tuple(lst) elif level._v_title.startswith('nonetype:'): return None elif is_pandas_dataframe(level): assert _pandas, "pandas is required to read this file" store = _HDFStoreWithHandle(handler) return store.get(level._v_pathname) elif level._v_title.startswith('sparse:'): frm = level._v_attrs.format if frm in ('csr', 'csc', 'bsr'): shape = tuple(level.shape[:]) cls = {'csr': sparse.csr_matrix, 'csc': sparse.csc_matrix, 'bsr': sparse.bsr_matrix} matrix = cls[frm](shape) matrix.data = level.data[:] matrix.indices = level.indices[:] matrix.indptr = level.indptr[:] matrix.maxprint = level._v_attrs.maxprint return matrix elif frm == 'dia': shape = tuple(level.shape[:]) matrix = sparse.dia_matrix(shape) matrix.data = level.data[:] matrix.offsets = level.offsets[:] matrix.maxprint = level._v_attrs.maxprint return matrix elif frm == 'coo': shape = tuple(level.shape[:]) matrix = sparse.coo_matrix(shape) matrix.data = level.data[:] matrix.col = level.col[:] matrix.row = level.row[:] matrix.maxprint = level._v_attrs.maxprint return matrix else: raise ValueError('Unknown sparse matrix type: {}'.format(frm)) else: return val elif isinstance(level, tables.VLArray): if level.shape == (1,): return _load_pickled(level) else: return level[:] elif isinstance(level, tables.Array): if 'zeroarray_dtype' in level._v_attrs: # Unpack zero-size arrays (shape is stored in an HDF5 array and # type is stored in the attibute 'zeroarray_dtype') dtype = level._v_attrs.zeroarray_dtype sh = level[:] return np.zeros(tuple(sh), dtype=dtype) if 'strtype' in level._v_attrs: strtype = level._v_attrs.strtype itemsize = level._v_attrs.itemsize if strtype == b'unicode': return level[:].view(dtype=(np.unicode_, itemsize)) elif strtype == b'ascii': return level[:].view(dtype=(np.string_, itemsize)) # This serves two purposes: # (1) unpack big integers: the only time we save arrays like this # (2) unpack non-deepdish "scalars" if level.shape == (): return level[()] return level[:] def _load_level(handler, level, pathtable): """ Loads level and builds appropriate type, handling softlinks if necessary """ if isinstance(level, tables.link.SoftLink): # this is a link, so see if target is already loaded, return it pathname = level.target node = level() else: # not a link, but it might be a target that's already been # loaded ... 
if so, return it pathname = level._v_pathname node = level try: return pathtable[pathname] except KeyError: pathtable[pathname] = _load_nonlink_level(handler, node, pathtable, pathname) return pathtable[pathname] def _load_sliced_level(handler, level, sel): if isinstance(level, tables.link.SoftLink): # this is a link; get target: level = level() if isinstance(level, tables.VLArray): if level.shape == (1,): return _load_pickled(level) else: return level[sel] elif isinstance(level, tables.Array): return level[sel] else: raise ValueError('Cannot partially load this data type using `sel`') def save(path, data, compression='default'): """ Save any Python structure to an HDF5 file. It is particularly suited for Numpy arrays. This function works similar to ``numpy.save``, except if you save a Python object at the top level, you do not need to issue ``data.flat[0]`` to retrieve it from inside a Numpy array of type ``object``. Some types of objects get saved natively in HDF5. The rest get serialized automatically. For most needs, you should be able to stick to the natively supported types, which are: * Dictionaries * Short lists and tuples (<256 in length) * Basic data types (including strings and None) * Numpy arrays * Scipy sparse matrices * Pandas ``DataFrame``, ``Series``, and ``Panel`` * SimpleNamespaces (for Python >= 3.3, but see note below) A recommendation is to always convert your data to using only these types That way your data will be portable and can be opened through any HDF5 reader. A class that helps you with this is :class:`deepdish.util.Saveable`. Lists and tuples are supported and can contain heterogeneous types. This is mostly useful and plays well with HDF5 for short lists and tuples. If you have a long list (>256) it will be serialized automatically. However, in such cases it is common for the elements to have the same type, in which case we strongly recommend converting to a Numpy array first. Note that the SimpleNamespace type will be read in as dictionaries for earlier versions of Python. This function requires the `PyTables `_ module to be installed. You can change the default compression method to ``blosc`` (much faster, but less portable) by creating a ``~/.deepdish.conf`` with:: [io] compression: blosc This is the recommended compression method if you plan to use your HDF5 files exclusively through deepdish (or PyTables). Parameters ---------- path : string Filename to which the data is saved. data : anything Data to be saved. This can be anything from a Numpy array, a string, an object, or a dictionary containing all of them including more dictionaries. compression : string or tuple Set compression method, choosing from `blosc`, `zlib`, `lzo`, `bzip2` and more (see PyTables documentation). It can also be specified as a tuple (e.g. ``('blosc', 5)``), with the latter value specifying the level of compression, choosing from 0 (no compression) to 9 (maximum compression). Set to `None` to turn off compression. The default is `zlib`, since it is highly portable; for much greater speed, try for instance `blosc`. 
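
    Examples
    --------
    A short sketch; the filename and keys are only examples:

    >>> import numpy as np
    >>> import deepdish as dd
    >>> d = {'x': np.arange(10), 'name': 'example'}
    >>> dd.io.save('test.h5', d)       # doctest: +SKIP
    >>> dd.io.load('test.h5')['name']  # doctest: +SKIP
    'example'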
See also -------- load """ filters = _get_compression_filters(compression) with tables.open_file(path, mode='w') as h5file: # If the data is a dictionary, put it flatly in the root group = h5file.root group._v_attrs[DEEPDISH_IO_VERSION_STR] = IO_VERSION idtable = {} # dict to keep track of objects already saved # Sparse matrices match isinstance(data, dict), so we'll have to be # more strict with the type checking if type(data) == type({}) and _dict_native_ok(data): idtable[id(data)] = '/' for key, value in data.items(): _save_level(h5file, group, value, name=key, filters=filters, idtable=idtable) elif (_sns and isinstance(data, SimpleNamespace) and _dict_native_ok(data.__dict__)): idtable[id(data)] = '/' group._v_attrs[DEEPDISH_IO_ROOT_IS_SNS] = True for key, value in data.__dict__.items(): _save_level(h5file, group, value, name=key, filters=filters, idtable=idtable) else: _save_level(h5file, group, data, name='data', filters=filters, idtable=idtable) # Mark this to automatically unpack when loaded group._v_attrs[DEEPDISH_IO_UNPACK] = True def load(path, group=None, sel=None, unpack=False): """ Loads an HDF5 saved with `save`. This function requires the `PyTables `_ module to be installed. Parameters ---------- path : string Filename from which to load the data. group : string or list Load a specific group in the HDF5 hierarchy. If `group` is a list of strings, then a tuple will be returned with all the groups that were specified. sel : slice or tuple of slices If you specify `group` and the target is a numpy array, then you can use this to slice it. This is useful for opening subsets of large HDF5 files. To compose the selection, you can use `deepdish.aslice`. unpack : bool If True, a single-entry dictionaries will be unpacked and the value will be returned directly. That is, if you save ``dict(a=100)``, only ``100`` will be loaded. Returns ------- data : anything Hopefully an identical reconstruction of the data that was saved. See also -------- save """ with tables.open_file(path, mode='r') as h5file: pathtable = {} # dict to keep track of objects already loaded if group is not None: if isinstance(group, str): data = _load_specific_level(h5file, h5file, group, sel=sel, pathtable=pathtable) else: # Assume group is a list or tuple data = [] for g in group: data_i = _load_specific_level(h5file, h5file, g, sel=sel, pathtable=pathtable) data.append(data_i) data = tuple(data) else: grp = h5file.root auto_unpack = (DEEPDISH_IO_UNPACK in grp._v_attrs and grp._v_attrs[DEEPDISH_IO_UNPACK]) do_unpack = unpack or auto_unpack if do_unpack and len(grp._v_children) == 1: name = next(iter(grp._v_children)) data = _load_specific_level(h5file, grp, name, sel=sel, pathtable=pathtable) do_unpack = False elif sel is not None: raise ValueError("Must specify group with `sel` unless it " "automatically unpacks") else: data = _load_level(h5file, grp, pathtable) if DEEPDISH_IO_VERSION_STR in grp._v_attrs: v = grp._v_attrs[DEEPDISH_IO_VERSION_STR] else: v = 0 if v > IO_VERSION: warnings.warn('This file was saved with a newer version of ' 'deepdish. Please upgrade to make sure it loads ' 'correctly.') # Attributes can't be unpacked with the method above, so fall back # to this if do_unpack and isinstance(data, dict) and len(data) == 1: data = next(iter(data.values())) return data deepdish-0.3.7/deepdish/io/ls.py0000644000175000017500000006437314123254760017677 0ustar larssonlarsson00000000000000""" Look inside HDF5 files from the terminal, especially those created by deepdish. 
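
Example (same data as the README example)::

    $ ddls test.h5
    /foo                       array (10, 20) [float64]
    /sub                       dict
    /sub/bar                   'a string' (8) [unicode]
    /sub/baz                   1.23 [float64]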
""" from __future__ import division, print_function, absolute_import from .hdf5io import (DEEPDISH_IO_VERSION_STR, DEEPDISH_IO_PREFIX, DEEPDISH_IO_UNPACK, DEEPDISH_IO_ROOT_IS_SNS, IO_VERSION, _sns, is_pandas_dataframe) import tables import numpy as np import sys import os import re from deepdish import io, six, __version__ COLORS = dict( black='30', darkgray='2;39', red='0;31', green='0;32', brown='0;33', yellow='0;33', blue='0;34', purple='0;35', cyan='0;36', white='0;39', reset='0' ) MIN_COLUMN_WIDTH = 5 MIN_AUTOMATIC_COLUMN_WIDTH = 20 MAX_AUTOMATIC_COLUMN_WIDTH = 80 ABRIDGE_OVER_N_CHILDREN = 50 ABRIDGE_SHOW_EACH_SIDE = 5 def _format_dtype(dtype): dtype = np.dtype(dtype) dtype_str = dtype.name if dtype.byteorder == '<': dtype_str += ' little-endian' elif dtype.byteorder == '>': dtype_str += ' big-endian' return dtype_str def _pandas_shape(level): if 'ndim' in level._v_attrs: ndim = level._v_attrs['ndim'] shape = [] for i in range(ndim): axis_name = 'axis{}'.format(i) if axis_name in level._v_children: axis = len(level._v_children[axis_name]) shape.append(axis) elif axis_name + '_label0' in level._v_children: axis = len(level._v_children[axis_name + '_label0']) shape.append(axis) else: return None return tuple(shape) def sorted_maybe_numeric(x): """ Sorts x with numeric semantics if all keys are nonnegative integers. Otherwise uses standard string sorting. """ all_numeric = all(map(str.isdigit, x)) if all_numeric: return sorted(x, key=int) else: return sorted(x) def paint(s, color, colorize=True): if colorize: if color in COLORS: return '\033[{}m{}\033[0m'.format(COLORS[color], s) else: raise ValueError('Invalid color') else: return s def type_string(typename, dtype=None, extra=None, type_color='red', colorize=True): ll = [paint(typename, type_color, colorize=colorize)] if extra: ll += [extra] if dtype: ll += [paint('[' + dtype + ']', 'darkgray', colorize=colorize)] return ' '.join(ll) def container_info(name, size=None, colorize=True, type_color=None, final_level=False): if final_level: d = {} if size is not None: d['extra'] = '(' + str(size) + ')' if type_color is not None: d['type_color'] = type_color s = type_string(name, colorize=colorize, **d) # Mark that it's abbreviated s += ' ' + paint('[...]', 'darkgray', colorize=colorize) return s else: # If not abbreviated, then display the type in dark gray, since # the information is already conveyed through the children return type_string(name, colorize=colorize, type_color='darkgray') def abbreviate(s, maxlength=25): """Color-aware abbreviator""" assert maxlength >= 4 skip = False abbrv = None i = 0 for j, c in enumerate(s): if c == '\033': skip = True elif skip: if c == 'm': skip = False else: i += 1 if i == maxlength - 1: abbrv = s[:j] + '\033[0m...' 
elif i > maxlength: break if i <= maxlength: return s else: return abbrv def print_row(key, value, level=0, parent='/', colorize=True, file=sys.stdout, unpack=False, settings={}, parent_color='darkgray', key_color='white'): s = '{}{}'.format(paint(parent, parent_color, colorize=colorize), paint(key, key_color, colorize=colorize)) s_raw = '{}{}'.format(parent, key) if 'filter' in settings: if not re.search(settings['filter'], s_raw): settings['filtered_count'] += 1 return if unpack: extra_str = '*' s_raw += extra_str s += paint(extra_str, 'purple', colorize=colorize) print('{}{} {}'.format(abbreviate(s, settings['left-column-width']), ' '*max(0, (settings['left-column-width'] + 1 - len(s_raw))), value)) class Node(object): def __repr__(self): return 'Node' def print(self, level=0, parent='/', colorize=True, max_level=None, file=sys.stdout, settings={}): pass def info(self, colorize=True, final_level=False): return paint('Node', 'red', colorize=colorize) class FileNotFoundNode(Node): def __init__(self, filename): self.filename = filename def __repr__(self): return 'FileNotFoundNode' def print(self, level=0, parent='/', colorize=True, max_level=None, file=sys.stdout, settings={}): print(paint('File not found', 'red', colorize=colorize), file=file) def info(self, colorize=True, final_level=False): return paint('FileNotFoundNode', 'red', colorize=colorize) class InvalidFileNode(Node): def __init__(self, filename): self.filename = filename def __repr__(self): return 'InvalidFileNode' def print(self, level=0, parent='/', colorize=True, max_level=None, file=sys.stdout, settings={}): print(paint('Invalid HDF5 file', 'red', colorize=colorize), file=file) def info(self, colorize=True, final_level=False): return paint('InvalidFileNode', 'red', colorize=colorize) class DictNode(Node): def __init__(self): self.children = {} self.header = {} def add(self, k, v): self.children[k] = v def print(self, level=0, parent='/', colorize=True, max_level=None, file=sys.stdout, settings={}): if level < max_level: ch = sorted_maybe_numeric(self.children) N = len(ch) if N > ABRIDGE_OVER_N_CHILDREN and not settings.get('all'): ch = ch[:ABRIDGE_SHOW_EACH_SIDE] + [None] + ch[-ABRIDGE_SHOW_EACH_SIDE:] for k in ch:#sorted(self.children): if k is None: #print(paint('... 
({} omitted)'.format(N-20), 'darkgray', colorize=colorize)) omitted = N-2 * ABRIDGE_SHOW_EACH_SIDE info = paint('{} omitted ({} in total)'.format(omitted, N), 'darkgray', colorize=colorize) print_row('...', info, level=level, parent=parent, unpack=self.header.get('dd_io_unpack'), colorize=colorize, file=file, key_color='darkgray', settings=settings) continue v = self.children[k] final = level+1 == max_level if (not settings.get('leaves-only') or not isinstance(v, DictNode)): print_row(k, v.info(colorize=colorize, final_level=final), level=level, parent=parent, unpack=self.header.get('dd_io_unpack'), colorize=colorize, file=file, settings=settings) v.print(level=level+1, parent='{}{}/'.format(parent, k), colorize=colorize, max_level=max_level, file=file, settings=settings) def info(self, colorize=True, final_level=False): return container_info('dict', size=len(self.children), colorize=colorize, type_color='purple', final_level=final_level) def __repr__(self): s = ['{}={}'.format(k, repr(v)) for k, v in self.children.items()] return 'DictNode({})'.format(', '.join(s)) class SimpleNamespaceNode(DictNode): def info(self, colorize=True, final_level=False): return container_info('SimpleNamespace', size=len(self.children), colorize=colorize, type_color='purple', final_level=final_level) def print(self, level=0, parent='/', colorize=True, max_level=None, file=sys.stdout): if level == 0 and not self.header.get('dd_io_unpack'): print_row('', self.info(colorize=colorize, final_level=(0 == max_level)), level=level, parent=parent, unpack=False, colorize=colorize, file=file) DictNode.print(self, level, parent, colorize, max_level, file) def __repr__(self): s = ['{}={}'.format(k, repr(v)) for k, v in self.children.items()] return 'SimpleNamespaceNode({})'.format(', '.join(s)) class PandasDataFrameNode(Node): def __init__(self, shape): self.shape = shape def info(self, colorize=True, final_level=False): d = {} if self.shape is not None: d['extra'] = repr(self.shape) return type_string('DataFrame', type_color='red', colorize=colorize, **d) def __repr__(self): return 'PandasDataFrameNode({})'.format(self.shape) class PandasPanelNode(Node): def __init__(self, shape): self.shape = shape def info(self, colorize=True, final_level=False): d = {} if self.shape is not None: d['extra'] = repr(self.shape) return type_string('Panel', type_color='red', colorize=colorize, **d) def __repr__(self): return 'PandasPanelNode({})'.format(self.shape) class PandasSeriesNode(Node): def __init__(self, size, dtype): self.size = size self.dtype = dtype def info(self, colorize=True, final_level=False): d = {} if self.size is not None: d['extra'] = repr((self.size,)) if self.dtype is not None: d['dtype'] = str(self.dtype) return type_string('Series', type_color='red', colorize=colorize, **d) def __repr__(self): return 'SeriesNode()' class ListNode(Node): def __init__(self, typename='list'): self.children = [] self.typename = typename def append(self, v): self.children.append(v) def __repr__(self): s = [repr(v) for v in self.children] return 'ListNode({})'.format(', '.join(s)) def print(self, level=0, parent='/', colorize=True, max_level=None, file=sys.stdout, settings={}): if level < max_level: for i, v in enumerate(self.children): k = str(i) final = level + 1 == max_level print_row(k, v.info(colorize=colorize, final_level=final), level=level, parent=parent + 'i', colorize=colorize, file=file, settings=settings) v.print(level=level+1, parent='{}{}/'.format(parent + 'i', k), colorize=colorize, max_level=max_level, file=file, 
settings=settings) def info(self, colorize=True, final_level=False): return container_info(self.typename, size=len(self.children), colorize=colorize, type_color='purple', final_level=final_level) class NumpyArrayNode(Node): def __init__(self, shape, dtype, statistics=None, compression=None): self.shape = shape self.dtype = dtype self.statistics = statistics self.compression = compression def info(self, colorize=True, final_level=False): if not self.statistics: s = type_string('array', extra=repr(self.shape), dtype=str(self.dtype), type_color='red', colorize=colorize) if self.compression: if self.compression['complib'] is not None: compstr = '{} lvl{}'.format(self.compression['complib'], self.compression['complevel']) else: compstr = 'none' s += ' ' + paint(compstr, 'yellow', colorize=colorize) else: s = type_string('array', extra=repr(self.shape), type_color='red', colorize=colorize) raw_s = type_string('array', extra=repr(self.shape), type_color='red', colorize=False) if len(raw_s) < 25: s += ' ' * (25 - len(raw_s)) s += paint(' {:14.2g}'.format(self.statistics.get('mean')), 'white', colorize=colorize) s += paint(u' \u00b1 ', 'darkgray', colorize=colorize) s += paint('{:.2g}'.format(self.statistics.get('std')), 'reset', colorize=colorize) return s def __repr__(self): return ('NumpyArrayNode(shape={}, dtype={})' .format(self.shape, self.dtype)) class SparseMatrixNode(Node): def __init__(self, fmt, shape, dtype): self.sparse_format = fmt self.shape = shape self.dtype = dtype def info(self, colorize=True, final_level=False): return type_string('sparse {}'.format(self.sparse_format), extra=repr(self.shape), dtype=str(self.dtype), type_color='red', colorize=colorize) def __repr__(self): return ('NumpyArrayNode(shape={}, dtype={})' .format(self.shape, self.dtype)) class ValueNode(Node): def __init__(self, value): self.value = value def __repr__(self): return 'ValueNode(type={})'.format(type(self.value)) def info(self, colorize=True, final_level=False): if isinstance(self.value, six.text_type): if len(self.value) > 25: s = repr(self.value[:22] + '...') else: s = repr(self.value) return type_string(s, dtype='unicode', type_color='green', extra='({})'.format(len(self.value)), colorize=colorize) elif isinstance(self.value, six.binary_type): if len(self.value) > 25: s = repr(self.value[:22] + b'...') else: s = repr(self.value) return type_string(s, dtype='ascii', type_color='green', extra='({})'.format(len(self.value)), colorize=colorize) elif self.value is None: return type_string('None', dtype='python', type_color='blue', colorize=colorize) else: return type_string(repr(self.value)[:20], dtype=str(np.dtype(type(self.value))), type_color='blue', colorize=colorize) class ObjectNode(Node): def __init__(self): pass def __repr__(self): return 'ObjectNode' def info(self, colorize=True, final_level=False): return type_string('pickled', dtype='object', type_color='yellow', colorize=colorize) class SoftLinkNode(Node): def __init__(self, target): self.target = target def info(self, colorize=True, final_level=False): return type_string('link -> {}'.format(self.target), dtype='SoftLink', type_color='cyan', colorize=colorize) def __repr__(self): return ('SoftLinkNode(target={})' .format(self.target)) def _tree_level(level, raw=False, settings={}): if isinstance(level, tables.Group): if _sns and (level._v_title.startswith('SimpleNamespace:') or DEEPDISH_IO_ROOT_IS_SNS in level._v_attrs): node = SimpleNamespaceNode() else: node = DictNode() for grp in level: node.add(grp._v_name, _tree_level(grp, raw=raw, 
settings=settings)) for name in level._v_attrs._f_list(): v = level._v_attrs[name] if name == DEEPDISH_IO_VERSION_STR: node.header['dd_io_version'] = v if name == DEEPDISH_IO_UNPACK: node.header['dd_io_unpack'] = v if name.startswith(DEEPDISH_IO_PREFIX): continue if isinstance(v, np.ndarray): node.add(name, NumpyArrayNode(v.shape, _format_dtype(v.dtype))) else: node.add(name, ValueNode(v)) if (level._v_title.startswith('list:') or level._v_title.startswith('tuple:')): s = level._v_title.split(':', 1)[1] N = int(s) lst = ListNode(typename=level._v_title.split(':')[0]) for i in range(N): t = node.children['i{}'.format(i)] lst.append(t) return lst elif level._v_title.startswith('nonetype:'): return ValueNode(None) elif is_pandas_dataframe(level): pandas_type = level._v_attrs['pandas_type'] if raw: # Treat as regular dictionary pass elif pandas_type == 'frame': shape = _pandas_shape(level) new_node = PandasDataFrameNode(shape) return new_node elif pandas_type == 'series': try: values = level._v_children['values'] size = len(values) dtype = values.dtype except: size = None dtype = None new_node = PandasSeriesNode(size, dtype) return new_node elif pandas_type == 'wide': shape = _pandas_shape(level) new_node = PandasPanelNode(shape) return new_node # else: it will simply be treated as a dict elif level._v_title.startswith('sparse:') and not raw: frm = level._v_attrs.format dtype = level.data.dtype shape = tuple(level.shape[:]) node = SparseMatrixNode(frm, shape, dtype) return node return node elif isinstance(level, tables.VLArray): if level.shape == (1,): return ObjectNode() node = NumpyArrayNode(level.shape, 'unknown') return node elif isinstance(level, tables.Array): stats = {} if settings.get('summarize'): stats['mean'] = level[:].mean() stats['std'] = level[:].std() compression = {} if settings.get('compression'): compression['complib'] = level.filters.complib compression['shuffle'] = level.filters.shuffle compression['complevel'] = level.filters.complevel node = NumpyArrayNode(level.shape, _format_dtype(level.dtype), statistics=stats, compression=compression) if hasattr(level._v_attrs, 'zeroarray_dtype'): dtype = level._v_attrs.zeroarray_dtype node = NumpyArrayNode(tuple(level), _format_dtype(dtype)) elif hasattr(level._v_attrs, 'strtype'): strtype = level._v_attrs.strtype itemsize = level._v_attrs.itemsize if strtype == b'unicode': shape = level.shape[:-1] + (level.shape[-1] // itemsize // 4,) elif strtype == b'ascii': shape = level.shape node = NumpyArrayNode(shape, strtype.decode('ascii')) return node elif isinstance(level, tables.link.SoftLink): node = SoftLinkNode(level.target) return node else: return Node() def get_tree(path, raw=False, settings={}): fn = os.path.basename(path) try: with tables.open_file(path, mode='r') as h5file: grp = h5file.root s = _tree_level(grp, raw=raw, settings=settings) s.header['filename'] = fn return s except OSError: return FileNotFoundNode(fn) except IOError: return FileNotFoundNode(fn) except tables.exceptions.HDF5ExtError: return InvalidFileNode(fn) def _column_width(level): if isinstance(level, tables.Group): max_w = 0 for grp in level: max_w = max(max_w, _column_width(grp)) for name in level._v_attrs._f_list(): if name.startswith(DEEPDISH_IO_PREFIX): continue max_w = max(max_w, len(level._v_pathname) + 1 + len(name)) return max_w else: return len(level._v_pathname) def _discover_column_width(path): if not os.path.isfile(path): return MIN_AUTOMATIC_COLUMN_WIDTH with tables.open_file(path, mode='r') as h5file: return _column_width(h5file.root) def 
main(): import argparse parser = argparse.ArgumentParser( description=("Look inside HDF5 files. Works particularly well " "for HDF5 files saved with deepdish.io.save()."), prog='ddls', epilog='example: ddls test.h5 -i /foo/bar --ipython') parser.add_argument('file', nargs='+', help='filename of HDF5 file') parser.add_argument('-d', '--depth', type=int, default=4, help='max depth, defaults to 4') parser.add_argument('-nc', '--no-color', action='store_true', help='turn off bash colors') parser.add_argument('-i', '--inspect', metavar='GRP', help='print a specific variable (e.g. /data)') parser.add_argument('--ipython', action='store_true', help=('load file into an IPython session. ' 'Works with -i')) parser.add_argument('--raw', action='store_true', help=('print the raw HDF5 structure for complex ' 'data types, such as sparse matrices and pandas ' 'data frames')) parser.add_argument('-f', '--filter', type=str, help=('print only entries that match this regular ' 'expression')) parser.add_argument('-l', '--leaves-only', action='store_true', help=('print only leaves')) parser.add_argument('-a', '--all', action='store_true', help=('do not abridge')) parser.add_argument('-s', '--summarize', action='store_true', help=('print summary statistics of numpy arrays')) parser.add_argument('-c', '--compression', action='store_true', help=('print compression method for each array')) parser.add_argument('-v', '--version', action='version', version='deepdish {} (io protocol {})'.format( __version__, IO_VERSION)) parser.add_argument('--column-width', type=int, default=None) args = parser.parse_args() colorize = sys.stdout.isatty() and not args.no_color settings = {} if args.filter: settings['filter'] = args.filter if args.leaves_only: settings['leaves-only'] = True if args.summarize: settings['summarize'] = True if args.compression: settings['compression'] = True if args.all: settings['all'] = True def single_file(files): if len(files) >= 2: s = 'Error: Select a single file when using --inspect' print(paint(s, 'red', colorize=colorize)) sys.exit(1) return files[0] def run_ipython(fn, group=None, data=None): file_desc = paint(fn, 'yellow', colorize=colorize) if group is None: path_desc = file_desc else: path_desc = '{}:{}'.format( file_desc, paint(group, 'white', colorize=colorize)) welcome = "Loaded {} into '{}':".format( path_desc, paint('data', 'blue', colorize=colorize)) # Import deepdish for the session import deepdish as dd import IPython IPython.embed(header=welcome) i = 0 if args.inspect is not None: fn = single_file(args.file) try: data = io.load(fn, args.inspect) except ValueError: s = 'Error: Could not find group: {}'.format(args.inspect) print(paint(s, 'red', colorize=colorize)) sys.exit(1) if args.ipython: run_ipython(fn, group=args.inspect, data=data) else: print(data) elif args.ipython: fn = single_file(args.file) data = io.load(fn) run_ipython(fn, data=data) else: for f in args.file: # State that will be incremented settings['filtered_count'] = 0 if args.column_width is None: settings['left-column-width'] = max(MIN_AUTOMATIC_COLUMN_WIDTH, min(MAX_AUTOMATIC_COLUMN_WIDTH, _discover_column_width(f))) else: settings['left-column-width'] = args.column_width s = get_tree(f, raw=args.raw, settings=settings) if s is not None: if i > 0: print() if len(args.file) >= 2: print(paint(f, 'yellow', colorize=colorize)) s.print(colorize=colorize, max_level=args.depth, settings=settings) i += 1 if settings.get('filter'): print('Filtered on: {} ({} rows omitted)'.format( paint(args.filter, 'purple', 
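                      # 'filtered_count' is reset to 0 per file above and is
                      # (presumably) incremented by the row-printing code as
                      # entries are hidden, so this reports how many rows the
                      # regular expression filtered out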
colorize=colorize), paint(str(settings['filtered_count']), 'white', colorize=colorize))) if __name__ == '__main__': main() deepdish-0.3.7/deepdish/parallel/0000755000175000017500000000000014123256273020060 5ustar larssonlarsson00000000000000deepdish-0.3.7/deepdish/parallel/__init__.py0000644000175000017500000000044313052123256022164 0ustar larssonlarsson00000000000000from __future__ import print_function, division, absolute_import try: import mpi4py from deepdish.parallel.mpi import * except ImportError: from deepdish.parallel.fallback import * __all__ = ['rank', 'imap_unordered', 'imap', 'starmap_unordered', 'starmap', 'main'] deepdish-0.3.7/deepdish/parallel/fallback.py0000644000175000017500000000261713052123256022171 0ustar larssonlarsson00000000000000from __future__ import division, print_function, absolute_import import itertools as itr __all__ = ['rank', 'imap_unordered', 'imap', 'starmap_unordered', 'starmap', 'main'] def rank(): """ Returns MPI rank. If the MPI backend is not used, it will always return 0. """ return 0 def imap_unordered(f, params): """ This can return the elements in any particular order. This has a lower memory footprint than the ordered version and will be more responsive in terms of printing the results. For instance, if you run the ordered version, and the first batch is particularly slow, you won't see any feedback for a long time. """ return map(f, params) def imap(f, params): """ Analogous to `itertools.imap` (Python 2) and `map` (Python 3), but run in parallel. """ return map(f, params) def starmap_unordered(f, params): """ Similar to `imap_unordered`, but it will unpack the parameters. That is, it will call ``f(*p)``, for each `p` in `params`. """ return itr.starmap(f, params) def starmap(f, params): """ Analogous to `itertools.starmap`, but run in parallel. """ return itr.starmap(f, params) def main(name=None): """ Main function. Example use: >>> if gv.parallel.main(__name__): ... 
res = gv.parallel.imap_unordered(f, params) """ return name == '__main__' deepdish-0.3.7/deepdish/parallel/mpi.py0000644000175000017500000001063113052123256021212 0ustar larssonlarsson00000000000000from __future__ import division, print_function, absolute_import import sys import itertools as itr import numpy as np __all__ = ['rank', 'imap_unordered', 'imap', 'starmap_unordered', 'starmap', 'main'] # Global set of workers - initialized a map function is first called _g_available_workers = None _g_initialized = False # For docstrings, see deepdish.parallel.fallback def rank(): from mpi4py import MPI rank = MPI.COMM_WORLD.Get_rank() return rank def kill_workers(): from mpi4py import MPI all_workers = range(1, MPI.COMM_WORLD.Get_size()) for worker in all_workers: MPI.COMM_WORLD.send(None, dest=worker, tag=666) def _init(): global _g_available_workers, _g_initialized from mpi4py import MPI import atexit _g_available_workers = set(range(1, MPI.COMM_WORLD.Get_size())) _g_initialized = True atexit.register(kill_workers) def imap_unordered(f, workloads, star=False): global _g_available_workers, _g_initialized from mpi4py import MPI N = MPI.COMM_WORLD.Get_size() - 1 if N == 0 or not _g_initialized: mapf = [map, itr.starmap][star] for res in mapf(f, workloads): yield res return for job_index, workload in enumerate(itr.chain(workloads, itr.repeat(None))): if workload is None and len(_g_available_workers) == N: break while not _g_available_workers or workload is None: # Wait to receive results status = MPI.Status() ret = MPI.COMM_WORLD.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status) if status.tag == 2: yield ret['output_data'] _g_available_workers.add(status.source) if len(_g_available_workers) == N: break if _g_available_workers and workload is not None: dest_rank = _g_available_workers.pop() # Send off job task = dict(func=f, input_data=workload, job_index=job_index, unpack=star) MPI.COMM_WORLD.send(task, dest=dest_rank, tag=10) def imap(f, workloads, star=False): global _g_available_workers, _g_initialized from mpi4py import MPI N = MPI.COMM_WORLD.Get_size() - 1 if N == 0 or not _g_initialized: mapf = [map, itr.starmap][star] for res in mapf(f, workloads): yield res return results = [] indices = [] for job_index, workload in enumerate(itr.chain(workloads, itr.repeat(None))): if workload is None and len(_g_available_workers) == N: break while not _g_available_workers or workload is None: # Wait to receive results status = MPI.Status() ret = MPI.COMM_WORLD.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status) if status.tag == 2: results.append(ret['output_data']) indices.append(ret['job_index']) _g_available_workers.add(status.source) if len(_g_available_workers) == N: break if _g_available_workers and workload is not None: dest_rank = _g_available_workers.pop() # Send off job task = dict(func=f, input_data=workload, job_index=job_index, unpack=star) MPI.COMM_WORLD.send(task, dest=dest_rank, tag=10) II = np.argsort(indices) for i in II: yield results[i] def starmap(f, workloads): return imap(f, workloads, star=True) def starmap_unordered(f, workloads): return imap_unordered(f, workloads, star=True) def worker(): from mpi4py import MPI while True: status = MPI.Status() ret = MPI.COMM_WORLD.recv(source=0, tag=MPI.ANY_TAG, status=status) if status.tag == 10: # Workload received func = ret['func'] if ret.get('unpack'): res = func(*ret['input_data']) else: res = func(ret['input_data']) # Done, let's send it back MPI.COMM_WORLD.send(dict(job_index=ret['job_index'], output_data=res), 
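                                # tag 2 marks a completed result; the master's
                                # recv loop in imap/imap_unordered matches on
                                # status.tag == 2 and returns this worker to the
                                # available pool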
dest=0, tag=2) elif status.tag == 666: # Kill code sys.exit(0) def main(name=None): if name is not None and name != '__main__': return False from mpi4py import MPI rank = MPI.COMM_WORLD.Get_rank() if rank == 0: _init() return True else: worker() sys.exit(0) deepdish-0.3.7/deepdish/six.py0000644000175000017500000006362613052123256017450 0ustar larssonlarsson00000000000000"""Utilities for writing code that runs on Python 2 and 3""" # Copyright (c) 2010-2014 Benjamin Peterson # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. import functools import operator import sys import types __author__ = "Benjamin Peterson " __version__ = "1.7.3" # Useful for very coarse version differentiation. PY2 = sys.version_info[0] == 2 PY3 = sys.version_info[0] == 3 if PY3: string_types = str, integer_types = int, class_types = type, text_type = str binary_type = bytes MAXSIZE = sys.maxsize else: string_types = basestring, integer_types = (int, long) class_types = (type, types.ClassType) text_type = unicode binary_type = str if sys.platform.startswith("java"): # Jython always uses 32 bits. MAXSIZE = int((1 << 31) - 1) else: # It's possible to have sizeof(long) != sizeof(Py_ssize_t). class X(object): def __len__(self): return 1 << 31 try: len(X()) except OverflowError: # 32-bit MAXSIZE = int((1 << 31) - 1) else: # 64-bit MAXSIZE = int((1 << 63) - 1) del X def _add_doc(func, doc): """Add documentation to a function.""" func.__doc__ = doc def _import_module(name): """Import module, returning the module after the last dot.""" __import__(name) return sys.modules[name] class _LazyDescr(object): def __init__(self, name): self.name = name def __get__(self, obj, tp): result = self._resolve() setattr(obj, self.name, result) # Invokes __set__. # This is a bit ugly, but it avoids running this again. 
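        # once the class-level descriptor is deleted, later lookups find the
        # value stored on the instance and never re-enter __get__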
delattr(obj.__class__, self.name) return result class MovedModule(_LazyDescr): def __init__(self, name, old, new=None): super(MovedModule, self).__init__(name) if PY3: if new is None: new = name self.mod = new else: self.mod = old def _resolve(self): return _import_module(self.mod) def __getattr__(self, attr): _module = self._resolve() value = getattr(_module, attr) setattr(self, attr, value) return value class _LazyModule(types.ModuleType): def __init__(self, name): super(_LazyModule, self).__init__(name) self.__doc__ = self.__class__.__doc__ def __dir__(self): attrs = ["__doc__", "__name__"] attrs += [attr.name for attr in self._moved_attributes] return attrs # Subclasses should override this _moved_attributes = [] class MovedAttribute(_LazyDescr): def __init__(self, name, old_mod, new_mod, old_attr=None, new_attr=None): super(MovedAttribute, self).__init__(name) if PY3: if new_mod is None: new_mod = name self.mod = new_mod if new_attr is None: if old_attr is None: new_attr = name else: new_attr = old_attr self.attr = new_attr else: self.mod = old_mod if old_attr is None: old_attr = name self.attr = old_attr def _resolve(self): module = _import_module(self.mod) return getattr(module, self.attr) class _SixMetaPathImporter(object): """ A meta path importer to import six.moves and its submodules. This class implements a PEP302 finder and loader. It should be compatible with Python 2.5 and all existing versions of Python3 """ def __init__(self, six_module_name): self.name = six_module_name self.known_modules = {} def _add_module(self, mod, *fullnames): for fullname in fullnames: self.known_modules[self.name + "." + fullname] = mod def _get_module(self, fullname): return self.known_modules[self.name + "." + fullname] def find_module(self, fullname, path=None): if fullname in self.known_modules: return self return None def __get_module(self, fullname): try: return self.known_modules[fullname] except KeyError: raise ImportError("This loader does not know module " + fullname) def load_module(self, fullname): try: # in case of a reload return sys.modules[fullname] except KeyError: pass mod = self.__get_module(fullname) if isinstance(mod, MovedModule): mod = mod._resolve() else: mod.__loader__ = self sys.modules[fullname] = mod return mod def is_package(self, fullname): """ Return true, if the named module is a package. 
We need this method to get correct spec objects with Python 3.4 (see PEP451) """ return hasattr(self.__get_module(fullname), "__path__") def get_code(self, fullname): """Return None Required, if is_package is implemented""" self.__get_module(fullname) # eventually raises ImportError return None get_source = get_code # same as get_code _importer = _SixMetaPathImporter(__name__) class _MovedItems(_LazyModule): """Lazy loading of moved objects""" __path__ = [] # mark as package _moved_attributes = [ MovedAttribute("cStringIO", "cStringIO", "io", "StringIO"), MovedAttribute("filter", "itertools", "builtins", "ifilter", "filter"), MovedAttribute("filterfalse", "itertools", "itertools", "ifilterfalse", "filterfalse"), MovedAttribute("input", "__builtin__", "builtins", "raw_input", "input"), MovedAttribute("map", "itertools", "builtins", "imap", "map"), MovedAttribute("range", "__builtin__", "builtins", "xrange", "range"), MovedAttribute("reload_module", "__builtin__", "imp", "reload"), MovedAttribute("reduce", "__builtin__", "functools"), MovedAttribute("StringIO", "StringIO", "io"), MovedAttribute("UserDict", "UserDict", "collections"), MovedAttribute("UserList", "UserList", "collections"), MovedAttribute("UserString", "UserString", "collections"), MovedAttribute("xrange", "__builtin__", "builtins", "xrange", "range"), MovedAttribute("zip", "itertools", "builtins", "izip", "zip"), MovedAttribute("zip_longest", "itertools", "itertools", "izip_longest", "zip_longest"), MovedModule("builtins", "__builtin__"), MovedModule("configparser", "ConfigParser"), MovedModule("copyreg", "copy_reg"), MovedModule("dbm_gnu", "gdbm", "dbm.gnu"), MovedModule("_dummy_thread", "dummy_thread", "_dummy_thread"), MovedModule("http_cookiejar", "cookielib", "http.cookiejar"), MovedModule("http_cookies", "Cookie", "http.cookies"), MovedModule("html_entities", "htmlentitydefs", "html.entities"), MovedModule("html_parser", "HTMLParser", "html.parser"), MovedModule("http_client", "httplib", "http.client"), MovedModule("email_mime_multipart", "email.MIMEMultipart", "email.mime.multipart"), MovedModule("email_mime_text", "email.MIMEText", "email.mime.text"), MovedModule("email_mime_base", "email.MIMEBase", "email.mime.base"), MovedModule("BaseHTTPServer", "BaseHTTPServer", "http.server"), MovedModule("CGIHTTPServer", "CGIHTTPServer", "http.server"), MovedModule("SimpleHTTPServer", "SimpleHTTPServer", "http.server"), MovedModule("cPickle", "cPickle", "pickle"), MovedModule("queue", "Queue"), MovedModule("reprlib", "repr"), MovedModule("socketserver", "SocketServer"), MovedModule("_thread", "thread", "_thread"), MovedModule("tkinter", "Tkinter"), MovedModule("tkinter_dialog", "Dialog", "tkinter.dialog"), MovedModule("tkinter_filedialog", "FileDialog", "tkinter.filedialog"), MovedModule("tkinter_scrolledtext", "ScrolledText", "tkinter.scrolledtext"), MovedModule("tkinter_simpledialog", "SimpleDialog", "tkinter.simpledialog"), MovedModule("tkinter_tix", "Tix", "tkinter.tix"), MovedModule("tkinter_ttk", "ttk", "tkinter.ttk"), MovedModule("tkinter_constants", "Tkconstants", "tkinter.constants"), MovedModule("tkinter_dnd", "Tkdnd", "tkinter.dnd"), MovedModule("tkinter_colorchooser", "tkColorChooser", "tkinter.colorchooser"), MovedModule("tkinter_commondialog", "tkCommonDialog", "tkinter.commondialog"), MovedModule("tkinter_tkfiledialog", "tkFileDialog", "tkinter.filedialog"), MovedModule("tkinter_font", "tkFont", "tkinter.font"), MovedModule("tkinter_messagebox", "tkMessageBox", "tkinter.messagebox"), 
MovedModule("tkinter_tksimpledialog", "tkSimpleDialog", "tkinter.simpledialog"), MovedModule("urllib_parse", __name__ + ".moves.urllib_parse", "urllib.parse"), MovedModule("urllib_error", __name__ + ".moves.urllib_error", "urllib.error"), MovedModule("urllib", __name__ + ".moves.urllib", __name__ + ".moves.urllib"), MovedModule("urllib_robotparser", "robotparser", "urllib.robotparser"), MovedModule("xmlrpc_client", "xmlrpclib", "xmlrpc.client"), MovedModule("xmlrpc_server", "SimpleXMLRPCServer", "xmlrpc.server"), MovedModule("winreg", "_winreg"), ] for attr in _moved_attributes: setattr(_MovedItems, attr.name, attr) if isinstance(attr, MovedModule): _importer._add_module(attr, "moves." + attr.name) del attr _MovedItems._moved_attributes = _moved_attributes moves = _MovedItems(__name__ + ".moves") _importer._add_module(moves, "moves") class Module_six_moves_urllib_parse(_LazyModule): """Lazy loading of moved objects in six.moves.urllib_parse""" _urllib_parse_moved_attributes = [ MovedAttribute("ParseResult", "urlparse", "urllib.parse"), MovedAttribute("SplitResult", "urlparse", "urllib.parse"), MovedAttribute("parse_qs", "urlparse", "urllib.parse"), MovedAttribute("parse_qsl", "urlparse", "urllib.parse"), MovedAttribute("urldefrag", "urlparse", "urllib.parse"), MovedAttribute("urljoin", "urlparse", "urllib.parse"), MovedAttribute("urlparse", "urlparse", "urllib.parse"), MovedAttribute("urlsplit", "urlparse", "urllib.parse"), MovedAttribute("urlunparse", "urlparse", "urllib.parse"), MovedAttribute("urlunsplit", "urlparse", "urllib.parse"), MovedAttribute("quote", "urllib", "urllib.parse"), MovedAttribute("quote_plus", "urllib", "urllib.parse"), MovedAttribute("unquote", "urllib", "urllib.parse"), MovedAttribute("unquote_plus", "urllib", "urllib.parse"), MovedAttribute("urlencode", "urllib", "urllib.parse"), MovedAttribute("splitquery", "urllib", "urllib.parse"), ] for attr in _urllib_parse_moved_attributes: setattr(Module_six_moves_urllib_parse, attr.name, attr) del attr Module_six_moves_urllib_parse._moved_attributes = _urllib_parse_moved_attributes _importer._add_module(Module_six_moves_urllib_parse(__name__ + ".moves.urllib_parse"), "moves.urllib_parse", "moves.urllib.parse") class Module_six_moves_urllib_error(_LazyModule): """Lazy loading of moved objects in six.moves.urllib_error""" _urllib_error_moved_attributes = [ MovedAttribute("URLError", "urllib2", "urllib.error"), MovedAttribute("HTTPError", "urllib2", "urllib.error"), MovedAttribute("ContentTooShortError", "urllib", "urllib.error"), ] for attr in _urllib_error_moved_attributes: setattr(Module_six_moves_urllib_error, attr.name, attr) del attr Module_six_moves_urllib_error._moved_attributes = _urllib_error_moved_attributes _importer._add_module(Module_six_moves_urllib_error(__name__ + ".moves.urllib.error"), "moves.urllib_error", "moves.urllib.error") class Module_six_moves_urllib_request(_LazyModule): """Lazy loading of moved objects in six.moves.urllib_request""" _urllib_request_moved_attributes = [ MovedAttribute("urlopen", "urllib2", "urllib.request"), MovedAttribute("install_opener", "urllib2", "urllib.request"), MovedAttribute("build_opener", "urllib2", "urllib.request"), MovedAttribute("pathname2url", "urllib", "urllib.request"), MovedAttribute("url2pathname", "urllib", "urllib.request"), MovedAttribute("getproxies", "urllib", "urllib.request"), MovedAttribute("Request", "urllib2", "urllib.request"), MovedAttribute("OpenerDirector", "urllib2", "urllib.request"), MovedAttribute("HTTPDefaultErrorHandler", "urllib2", 
"urllib.request"), MovedAttribute("HTTPRedirectHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPCookieProcessor", "urllib2", "urllib.request"), MovedAttribute("ProxyHandler", "urllib2", "urllib.request"), MovedAttribute("BaseHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPPasswordMgr", "urllib2", "urllib.request"), MovedAttribute("HTTPPasswordMgrWithDefaultRealm", "urllib2", "urllib.request"), MovedAttribute("AbstractBasicAuthHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPBasicAuthHandler", "urllib2", "urllib.request"), MovedAttribute("ProxyBasicAuthHandler", "urllib2", "urllib.request"), MovedAttribute("AbstractDigestAuthHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPDigestAuthHandler", "urllib2", "urllib.request"), MovedAttribute("ProxyDigestAuthHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPSHandler", "urllib2", "urllib.request"), MovedAttribute("FileHandler", "urllib2", "urllib.request"), MovedAttribute("FTPHandler", "urllib2", "urllib.request"), MovedAttribute("CacheFTPHandler", "urllib2", "urllib.request"), MovedAttribute("UnknownHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPErrorProcessor", "urllib2", "urllib.request"), MovedAttribute("urlretrieve", "urllib", "urllib.request"), MovedAttribute("urlcleanup", "urllib", "urllib.request"), MovedAttribute("URLopener", "urllib", "urllib.request"), MovedAttribute("FancyURLopener", "urllib", "urllib.request"), MovedAttribute("proxy_bypass", "urllib", "urllib.request"), ] for attr in _urllib_request_moved_attributes: setattr(Module_six_moves_urllib_request, attr.name, attr) del attr Module_six_moves_urllib_request._moved_attributes = _urllib_request_moved_attributes _importer._add_module(Module_six_moves_urllib_request(__name__ + ".moves.urllib.request"), "moves.urllib_request", "moves.urllib.request") class Module_six_moves_urllib_response(_LazyModule): """Lazy loading of moved objects in six.moves.urllib_response""" _urllib_response_moved_attributes = [ MovedAttribute("addbase", "urllib", "urllib.response"), MovedAttribute("addclosehook", "urllib", "urllib.response"), MovedAttribute("addinfo", "urllib", "urllib.response"), MovedAttribute("addinfourl", "urllib", "urllib.response"), ] for attr in _urllib_response_moved_attributes: setattr(Module_six_moves_urllib_response, attr.name, attr) del attr Module_six_moves_urllib_response._moved_attributes = _urllib_response_moved_attributes _importer._add_module(Module_six_moves_urllib_response(__name__ + ".moves.urllib.response"), "moves.urllib_response", "moves.urllib.response") class Module_six_moves_urllib_robotparser(_LazyModule): """Lazy loading of moved objects in six.moves.urllib_robotparser""" _urllib_robotparser_moved_attributes = [ MovedAttribute("RobotFileParser", "robotparser", "urllib.robotparser"), ] for attr in _urllib_robotparser_moved_attributes: setattr(Module_six_moves_urllib_robotparser, attr.name, attr) del attr Module_six_moves_urllib_robotparser._moved_attributes = _urllib_robotparser_moved_attributes _importer._add_module(Module_six_moves_urllib_robotparser(__name__ + ".moves.urllib.robotparser"), "moves.urllib_robotparser", "moves.urllib.robotparser") class Module_six_moves_urllib(types.ModuleType): """Create a six.moves.urllib namespace that resembles the Python 3 namespace""" __path__ = [] # mark as package parse = _importer._get_module("moves.urllib_parse") error = _importer._get_module("moves.urllib_error") request = 
_importer._get_module("moves.urllib_request") response = _importer._get_module("moves.urllib_response") robotparser = _importer._get_module("moves.urllib_robotparser") def __dir__(self): return ['parse', 'error', 'request', 'response', 'robotparser'] _importer._add_module(Module_six_moves_urllib(__name__ + ".moves.urllib"), "moves.urllib") def add_move(move): """Add an item to six.moves.""" setattr(_MovedItems, move.name, move) def remove_move(name): """Remove item from six.moves.""" try: delattr(_MovedItems, name) except AttributeError: try: del moves.__dict__[name] except KeyError: raise AttributeError("no such move, %r" % (name,)) if PY3: _meth_func = "__func__" _meth_self = "__self__" _func_closure = "__closure__" _func_code = "__code__" _func_defaults = "__defaults__" _func_globals = "__globals__" else: _meth_func = "im_func" _meth_self = "im_self" _func_closure = "func_closure" _func_code = "func_code" _func_defaults = "func_defaults" _func_globals = "func_globals" try: advance_iterator = next except NameError: def advance_iterator(it): return it.next() next = advance_iterator try: callable = callable except NameError: def callable(obj): return any("__call__" in klass.__dict__ for klass in type(obj).__mro__) if PY3: def get_unbound_function(unbound): return unbound create_bound_method = types.MethodType Iterator = object else: def get_unbound_function(unbound): return unbound.im_func def create_bound_method(func, obj): return types.MethodType(func, obj, obj.__class__) class Iterator(object): def next(self): return type(self).__next__(self) callable = callable _add_doc(get_unbound_function, """Get the function out of a possibly unbound function""") get_method_function = operator.attrgetter(_meth_func) get_method_self = operator.attrgetter(_meth_self) get_function_closure = operator.attrgetter(_func_closure) get_function_code = operator.attrgetter(_func_code) get_function_defaults = operator.attrgetter(_func_defaults) get_function_globals = operator.attrgetter(_func_globals) if PY3: def iterkeys(d, **kw): return iter(d.keys(**kw)) def itervalues(d, **kw): return iter(d.values(**kw)) def iteritems(d, **kw): return iter(d.items(**kw)) def iterlists(d, **kw): return iter(d.lists(**kw)) else: def iterkeys(d, **kw): return iter(d.iterkeys(**kw)) def itervalues(d, **kw): return iter(d.itervalues(**kw)) def iteritems(d, **kw): return iter(d.iteritems(**kw)) def iterlists(d, **kw): return iter(d.iterlists(**kw)) _add_doc(iterkeys, "Return an iterator over the keys of a dictionary.") _add_doc(itervalues, "Return an iterator over the values of a dictionary.") _add_doc(iteritems, "Return an iterator over the (key, value) pairs of a dictionary.") _add_doc(iterlists, "Return an iterator over the (key, [values]) pairs of a dictionary.") if PY3: def b(s): return s.encode("latin-1") def u(s): return s unichr = chr if sys.version_info[1] <= 1: def int2byte(i): return bytes((i,)) else: # This is about 2x faster than the implementation above on 3.2+ int2byte = operator.methodcaller("to_bytes", 1, "big") byte2int = operator.itemgetter(0) indexbytes = operator.getitem iterbytes = iter import io StringIO = io.StringIO BytesIO = io.BytesIO else: def b(s): return s # Workaround for standalone backslash def u(s): return unicode(s.replace(r'\\', r'\\\\'), "unicode_escape") unichr = unichr int2byte = chr def byte2int(bs): return ord(bs[0]) def indexbytes(buf, i): return ord(buf[i]) def iterbytes(buf): return (ord(byte) for byte in buf) import StringIO StringIO = BytesIO = StringIO.StringIO _add_doc(b, """Byte 
literal""") _add_doc(u, """Text literal""") if PY3: exec_ = getattr(moves.builtins, "exec") def reraise(tp, value, tb=None): if value.__traceback__ is not tb: raise value.with_traceback(tb) raise value else: def exec_(_code_, _globs_=None, _locs_=None): """Execute code in a namespace.""" if _globs_ is None: frame = sys._getframe(1) _globs_ = frame.f_globals if _locs_ is None: _locs_ = frame.f_locals del frame elif _locs_ is None: _locs_ = _globs_ exec("""exec _code_ in _globs_, _locs_""") exec_("""def reraise(tp, value, tb=None): raise tp, value, tb """) print_ = getattr(moves.builtins, "print", None) if print_ is None: def print_(*args, **kwargs): """The new-style print function for Python 2.4 and 2.5.""" fp = kwargs.pop("file", sys.stdout) if fp is None: return def write(data): if not isinstance(data, basestring): data = str(data) # If the file has an encoding, encode unicode with it. if (isinstance(fp, file) and isinstance(data, unicode) and fp.encoding is not None): errors = getattr(fp, "errors", None) if errors is None: errors = "strict" data = data.encode(fp.encoding, errors) fp.write(data) want_unicode = False sep = kwargs.pop("sep", None) if sep is not None: if isinstance(sep, unicode): want_unicode = True elif not isinstance(sep, str): raise TypeError("sep must be None or a string") end = kwargs.pop("end", None) if end is not None: if isinstance(end, unicode): want_unicode = True elif not isinstance(end, str): raise TypeError("end must be None or a string") if kwargs: raise TypeError("invalid keyword arguments to print()") if not want_unicode: for arg in args: if isinstance(arg, unicode): want_unicode = True break if want_unicode: newline = unicode("\n") space = unicode(" ") else: newline = "\n" space = " " if sep is None: sep = space if end is None: end = newline for i, arg in enumerate(args): if i: write(sep) write(arg) write(end) _add_doc(reraise, """Reraise an exception.""") if sys.version_info[0:2] < (3, 4): def wraps(wrapped): def wrapper(f): f = functools.wraps(wrapped)(f) f.__wrapped__ = wrapped return f return wrapper else: wraps = functools.wraps def with_metaclass(meta, *bases): """Create a base class with a metaclass.""" # This requires a bit of explanation: the basic idea is to make a dummy # metaclass for one level of class instantiation that replaces itself with # the actual metaclass. class metaclass(meta): def __new__(cls, name, this_bases, d): return meta(name, bases, d) return type.__new__(metaclass, 'temporary_class', (), {}) def add_metaclass(metaclass): """Class decorator for creating a class with a metaclass.""" def wrapper(cls): orig_vars = cls.__dict__.copy() orig_vars.pop('__dict__', None) orig_vars.pop('__weakref__', None) slots = orig_vars.get('__slots__') if slots is not None: if isinstance(slots, str): slots = [slots] for slots_var in slots: orig_vars.pop(slots_var) return metaclass(cls.__name__, cls.__bases__, orig_vars) return wrapper # Complete the moves implementation. # This code is at the end of this module to speed up module loading. # Turn this module into a package. __path__ = [] # required for PEP 302 and PEP 451 __package__ = __name__ # see PEP 366 @ReservedAssignment if globals().get("__spec__") is not None: __spec__.submodule_search_locations = [] # PEP 451 @UndefinedVariable # Remove other six meta path importers, since they cause problems. This can # happen if six is removed from sys.modules and then reloaded. (Setuptools does # this for some reason.) 
if sys.meta_path: for i, importer in enumerate(sys.meta_path): # Here's some real nastiness: Another "instance" of the six module might # be floating around. Therefore, we can't use isinstance() to check for # the six meta path importer, since the other six instance will have # inserted an importer with different class. if (type(importer).__name__ == "_SixMetaPathImporter" and importer.name == __name__): del sys.meta_path[i] break del i, importer # Finally, add the importer to the meta path import hook. sys.meta_path.append(_importer) deepdish-0.3.7/deepdish/tests/0000755000175000017500000000000014123256273017426 5ustar larssonlarsson00000000000000deepdish-0.3.7/deepdish/tests/__init__.py0000644000175000017500000000000013052123256021517 0ustar larssonlarsson00000000000000deepdish-0.3.7/deepdish/tests/test_core.py0000644000175000017500000000407413052123256021766 0ustar larssonlarsson00000000000000from __future__ import division, print_function, absolute_import import unittest from tempfile import NamedTemporaryFile import os import numpy as np import deepdish as dd from contextlib import contextmanager class TestCore(unittest.TestCase): def test_multi_range(self): x0 = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)] x1 = list(dd.multi_range(2, 3)) assert x0 == x1 def test_bytesize(self): assert dd.humanize_bytesize(1) == '1 B' assert dd.humanize_bytesize(2 * 1024) == '2 KB' assert dd.humanize_bytesize(3 * 1024**2) == '3 MB' assert dd.humanize_bytesize(4 * 1024**3) == '4 GB' assert dd.humanize_bytesize(5 * 1024**4) == '5 TB' assert dd.bytesize(np.ones((5, 2), dtype=np.int16)) == 20 assert dd.memsize(np.ones((5, 2), dtype=np.int16)) == '20 B' def test_span(self): assert dd.span(np.array([0, -10, 20])) == (-10, 20) def test_apply_once(self): x = np.arange(3 * 4 * 5).reshape((3, 4, 5)) np.testing.assert_array_almost_equal(dd.apply_once(np.std, x, [0, -1]), 16.39105447 * np.ones((1, 4, 1))) x = np.arange(2 * 3).reshape((2, 3)) np.testing.assert_array_equal(dd.apply_once(np.sum, x, 1, keepdims=False), np.array([3, 12])) def test_tupled_argmax(self): x = np.zeros((3, 4, 5)) x[1, 2, 3] = 10 assert dd.tupled_argmax(x) == (1, 2, 3) def test_slice(self): s = [slice(None, 3), slice(None), slice(2, None), slice(3, 4), Ellipsis, [1, 2, 3]] assert dd.aslice[:3, :, 2:, 3:4, ..., [1, 2, 3]] def test_timed(self): # These tests only make sure it does not cause errors with dd.timed(): pass times = [] with dd.timed(callback=times.append): pass assert len(times) == 1 x = np.zeros(1) x[:] = np.nan with dd.timed(file=x): pass assert not np.isnan(x[0]) if __name__ == '__main__': unittest.main() deepdish-0.3.7/deepdish/tests/test_io.py0000644000175000017500000003032214123254760021445 0ustar larssonlarsson00000000000000from __future__ import division, print_function, absolute_import import unittest from tempfile import NamedTemporaryFile import os import numpy as np import deepdish as dd import pandas as pd from contextlib import contextmanager try: from types import SimpleNamespace _sns = True except ImportError: _sns = False @contextmanager def tmp_filename(): f = NamedTemporaryFile(delete=False) yield f.name f.close() os.unlink(f.name) @contextmanager def tmp_file(): f = NamedTemporaryFile(delete=False) yield f f.close() os.unlink(f.name) def reconstruct(fn, x): dd.io.save(fn, x) return dd.io.load(fn) def assert_array(fn, x): dd.io.save(fn, x) x1 = dd.io.load(fn) np.testing.assert_array_equal(x, x1) class TestIO(unittest.TestCase): def test_basic_data_types(self): with tmp_filename() as fn: x = 100 x1 = 
reconstruct(fn, x) assert x == x1 x = 1.23 x1 = reconstruct(fn, x) assert x == x1 # This doesn't work - complex numpy arrays work however #x = 1.23 + 2.3j #x1 = reconstruct(fn, x) #assert x == x1 x = u'this is a string' x1 = reconstruct(fn, x) assert x == x1 x = b'this is a bytearray' x1 = reconstruct(fn, x) assert x == x1 x = None x1 = reconstruct(fn, x) assert x1 is None def test_big_integers(self): with tmp_filename() as fn: x = 1239487239847234982392837423874 x1 = reconstruct(fn, x) assert x == x1 def test_numpy_array(self): with tmp_filename() as fn: x0 = np.arange(3 * 4 * 5, dtype=np.int64).reshape((3, 4, 5)) assert_array(fn, x0) x0 = x0.astype(np.float32) assert_array(fn, x0) x0 = x0.astype(np.uint8) assert_array(fn, x0) x0 = x0.astype(np.complex128) x0[0] = 1 + 2j assert_array(fn, x0) def test_numpy_array_zero_size(self): # Arrays where one of the axes is length 0. These zero-length arrays cannot # be stored natively in HDF5, so we'll have to store only the shape with tmp_filename() as fn: x0 = np.arange(0, dtype=np.int64) assert_array(fn, x0) x0 = np.arange(0, dtype=np.float32).reshape((10, 20, 0)) assert_array(fn, x0) x0 = np.arange(0, dtype=np.complex128).reshape((0, 5, 0)) assert_array(fn, x0) def test_numpy_string_array(self): with tmp_filename() as fn: x0 = np.array([[b'this', b'string'], [b'foo', b'bar']]) assert_array(fn, x0) x0 = np.array([[u'this', u'string'], [u'foo', u'bar']]) assert_array(fn, x0) def test_dictionary(self): with tmp_filename() as fn: d = dict(a=100, b='this is a string', c=np.ones(5), sub=dict(a=200, b='another string', c=np.random.randn(3, 4))) d1 = reconstruct(fn, d) assert d['a'] == d1['a'] assert d['b'] == d1['b'] np.testing.assert_array_equal(d['c'], d1['c']) assert d['sub']['a'] == d1['sub']['a'] assert d['sub']['b'] == d1['sub']['b'] np.testing.assert_array_equal(d['sub']['c'], d1['sub']['c']) def test_simplenamespace(self): if _sns: with tmp_filename() as fn: d = SimpleNamespace( a=100, b='this is a string', c=np.ones(5), sub=SimpleNamespace(a=200, b='another string', c=np.random.randn(3, 4))) d1 = reconstruct(fn, d) assert d.a == d1.a assert d.b == d1.b np.testing.assert_array_equal(d.c, d1.c) assert d.sub.a == d1.sub.a assert d.sub.b == d1.sub.b np.testing.assert_array_equal(d.sub.c, d1.sub.c) def test_softlinks_recursion(self): with tmp_filename() as fn: A = np.random.randn(3, 3) df = pd.DataFrame({'int': np.arange(3), 'name': ['zero', 'one', 'two']}) AA = 4 s = dict(A=A, B=A, c=A, d=A, f=A, g=[A, A, A], AA=AA, h=AA, df=df, df2=df) s['g'].append(s) n = reconstruct(fn, s) assert n['g'][0] is n['A'] assert (n['A'] is n['B'] is n['c'] is n['d'] is n['f'] is n['g'][0] is n['g'][1] is n['g'][2]) assert n['g'][3] is n assert n['AA'] == AA == n['h'] assert n['df'] is n['df2'] assert (n['df'] == df).all().all() # test 'sel' option on link ... 
need to read two vars # to ensure at least one is a link: col1 = dd.io.load(fn, '/A', dd.aslice[:, 1]) assert np.all(A[:, 1] == col1) col1 = dd.io.load(fn, '/B', dd.aslice[:, 1]) assert np.all(A[:, 1] == col1) def test_softlinks_recursion_sns(self): if _sns: with tmp_filename() as fn: A = np.random.randn(3, 3) AA = 4 s = SimpleNamespace(A=A, B=A, c=A, d=A, f=A, g=[A, A, A], AA=AA, h=AA) s.g.append(s) n = reconstruct(fn, s) assert n.g[0] is n.A assert (n.A is n.B is n.c is n.d is n.f is n.g[0] is n.g[1] is n.g[2]) assert n.g[3] is n assert n.AA == AA == n.h def test_pickle_recursion(self): with tmp_filename() as fn: f = {4: 78} f['rec'] = f g = [23.4, f] h = dict(f=f, g=g) h2 = reconstruct(fn, h) assert h2['g'][0] == 23.4 assert h2['g'][1] is h2['f']['rec'] is h2['f'] assert h2['f'][4] == 78 def test_list_recursion(self): with tmp_filename() as fn: lst = [1, 3] inlst = ['inside', 'list', lst] inlst.append(inlst) lst.append(lst) lst.append(inlst) lst2 = reconstruct(fn, lst) assert lst2[2] is lst2 assert lst2[3][2] is lst2 assert lst[3][2] is lst assert lst2[3][3] is lst2[3] assert lst[3][3] is lst[3] def test_list(self): with tmp_filename() as fn: x = [100, 'this is a string', np.ones(3), dict(foo=100)] x1 = reconstruct(fn, x) assert isinstance(x1, list) assert x[0] == x1[0] assert x[1] == x1[1] np.testing.assert_array_equal(x[2], x1[2]) assert x[3]['foo'] == x1[3]['foo'] def test_tuple(self): with tmp_filename() as fn: x = (100, 'this is a string', np.ones(3), dict(foo=100)) x1 = reconstruct(fn, x) assert isinstance(x1, tuple) assert x[0] == x1[0] assert x[1] == x1[1] np.testing.assert_array_equal(x[2], x1[2]) assert x[3]['foo'] == x1[3]['foo'] def test_sparse_matrices(self): import scipy.sparse as S with tmp_filename() as fn: x = S.lil_matrix((50, 70)) x[34, 37] = 1 x[34, 39] = 2.5 x[34, 41] = -2 x[38, 41] = -1 x1 = reconstruct(fn, x.tocsr()) assert x.shape == x1.shape np.testing.assert_array_equal(x.todense(), x1.todense()) x1 = reconstruct(fn, x.tocsc()) assert x.shape == x1.shape np.testing.assert_array_equal(x.todense(), x1.todense()) x1 = reconstruct(fn, x.tocoo()) assert x.shape == x1.shape np.testing.assert_array_equal(x.todense(), x1.todense()) x1 = reconstruct(fn, x.todia()) assert x.shape == x1.shape np.testing.assert_array_equal(x.todense(), x1.todense()) x1 = reconstruct(fn, x.tobsr()) assert x.shape == x1.shape np.testing.assert_array_equal(x.todense(), x1.todense()) def test_array_scalar(self): with tmp_filename() as fn: v = np.array(12.3) v1 = reconstruct(fn, v) assert v1[()] == v and isinstance(v1[()], np.float64) v = np.array(40, dtype=np.int8) v1 = reconstruct(fn, v) assert v1[()] == v and isinstance(v1[()], np.int8) def test_load_group(self): with tmp_filename() as fn: x = dict(one=np.ones(10), two='string') dd.io.save(fn, x) one = dd.io.load(fn, '/one') np.testing.assert_array_equal(one, x['one']) two = dd.io.load(fn, '/two') assert two == x['two'] full = dd.io.load(fn, '/') np.testing.assert_array_equal(x['one'], full['one']) assert x['two'] == full['two'] def test_load_multiple_groups(self): with tmp_filename() as fn: x = dict(one=np.ones(10), two='string', three=200) dd.io.save(fn, x) one, three = dd.io.load(fn, ['/one', '/three']) np.testing.assert_array_equal(one, x['one']) assert three == x['three'] three, two = dd.io.load(fn, ['/three', '/two']) assert three == x['three'] assert two == x['two'] def test_load_slice(self): with tmp_filename() as fn: x = np.arange(3 * 4 * 5).reshape((3, 4, 5)) dd.io.save(fn, dict(x=x)) s = dd.aslice[:2] xs = dd.io.load(fn, '/x', 
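                            # sel takes a slice built with dd.aslice and reads
                            # only that part of the stored array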
sel=s) np.testing.assert_array_equal(xs, x[s]) s = dd.aslice[:, 1:3] xs = dd.io.load(fn, '/x', sel=s) np.testing.assert_array_equal(xs, x[s]) xs = dd.io.load(fn, sel=s, unpack=True) np.testing.assert_array_equal(xs, x[s]) dd.io.save(fn, x) xs = dd.io.load(fn, sel=s) np.testing.assert_array_equal(xs, x[s]) def test_force_pickle(self): with tmp_filename() as fn: x = dict(one=dict(two=np.arange(10)), three='string') xf = dict(one=dict(two=x['one']['two']), three=x['three']) dd.io.save(fn, xf) xs = dd.io.load(fn) np.testing.assert_array_equal(x['one']['two'], xs['one']['two']) assert x['three'] == xs['three'] # Try direct loading one two = dd.io.load(fn, '/one/two') np.testing.assert_array_equal(x['one']['two'], two) def test_non_string_key_dict(self): with tmp_filename() as fn: # These will be pickled, but it should still work x = {0: 'zero', 1: 'one', 2: 'two'} x1 = reconstruct(fn, x) assert x == x1 x = {1+1j: 'zero', b'test': 'one', (1, 2): 'two'} x1 = reconstruct(fn, x) assert x == x1 def test_force_pickle(self): with tmp_filename() as fn: x = {0: 'zero', 1: 'one', 2: 'two'} fx = dd.io.ForcePickle(x) d = dict(foo=x, bar=100) fd = dict(foo=fx, bar=100) d1 = reconstruct(fn, fd) assert d == d1 def test_pandas_dataframe(self): with tmp_filename() as fn: # These will be pickled, but it should still work df = pd.DataFrame({'int': np.arange(3), 'name': ['zero', 'one', 'two']}) df1 = reconstruct(fn, df) assert (df == df1).all().all() def test_pandas_series(self): rs = np.random.RandomState(1234) with tmp_filename() as fn: s = pd.Series(rs.randn(5), index=['a', 'b', 'c', 'd', 'e']) s1 = reconstruct(fn, s) assert (s == s1).all() def test_compression_true(self): rs = np.random.RandomState(1234) with tmp_filename() as fn: x = rs.normal(size=(1000, 5)) for comp in [None, True, 'blosc', 'zlib', ('zlib', 5)]: dd.io.save(fn, x, compression=comp) x1 = dd.io.load(fn) assert (x == x1).all() if __name__ == '__main__': unittest.main() deepdish-0.3.7/deepdish/tests/test_util.py0000644000175000017500000000434313052123256022012 0ustar larssonlarsson00000000000000from __future__ import division, print_function, absolute_import import unittest import os import numpy as np import deepdish as dd class TestUtil(unittest.TestCase): def test_pad(self): x = np.ones((2, 2)) y = np.array([[0, 0, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 0, 0]]) y1 = dd.util.pad(x, (1, 2), value=0.0) np.testing.assert_array_equal(y, y1) x = np.ones((2, 2)) y = np.array([[2, 2, 2, 2], [2, 1, 1, 2], [2, 1, 1, 2], [2, 2, 2, 2]]) y1 = dd.util.pad(x, 1, value=2.0) np.testing.assert_array_equal(y, y1) def test_pad_to_size(self): x = np.ones((2, 2)) y = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]]) y1 = dd.util.pad_to_size(x, (3, 3), value=0.0) np.testing.assert_array_equal(y, y1) def test_pad_repeat_border(self): x = np.array([[1.0, 2.0], [3.0, 4.0]]) y = np.array([[1.0, 1.0, 1.0, 2.0, 2.0, 2.0], [1.0, 1.0, 1.0, 2.0, 2.0, 2.0], [1.0, 1.0, 1.0, 2.0, 2.0, 2.0], [3.0, 3.0, 3.0, 4.0, 4.0, 4.0], [3.0, 3.0, 3.0, 4.0, 4.0, 4.0], [3.0, 3.0, 3.0, 4.0, 4.0, 4.0]]) y1 = dd.util.pad_repeat_border(x, 2) np.testing.assert_array_equal(y, y1) y = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [3.0, 4.0], [3.0, 4.0], [3.0, 4.0]]) y1 = dd.util.pad_repeat_border(x, (2, 0)) np.testing.assert_array_equal(y, y1) def test_pad_repeat_border_corner(self): x = np.array([[1.0, 2.0], [3.0, 4.0]]) y = np.array([[1.0, 2.0, 2.0, 2.0], [3.0, 4.0, 4.0, 4.0], [3.0, 4.0, 4.0, 4.0], [3.0, 4.0, 4.0, 4.0]]) y1 = dd.util.pad_repeat_border_corner(x, (4, 4)) 
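        # only the upper end of each axis is padded, by repeating the last
        # row/column of x until the (4, 4) target shape is reached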
np.testing.assert_array_equal(y, y1) if __name__ == '__main__': unittest.main() deepdish-0.3.7/deepdish/util/0000755000175000017500000000000014123256273017241 5ustar larssonlarsson00000000000000deepdish-0.3.7/deepdish/util/__init__.py0000644000175000017500000000100013052123256021333 0ustar larssonlarsson00000000000000from __future__ import division, print_function, absolute_import from .padding import (pad, pad_to_size, pad_repeat_border, pad_repeat_border_corner) from .saveable import Saveable, NamedRegistry, SaveableRegistry from .zca_whitening import whiten, zca_whitening_matrix, apply_whitening_matrix __all__ = [ 'pad', 'pad_to_size', 'pad_repeat_border', 'pad_repeat_border_corner', 'Saveable', 'NamedRegistry', 'SaveableRegistry', 'whiten', 'zca_whitening_matrix', 'apply_whitening_matrix', ] deepdish-0.3.7/deepdish/util/padding.py0000644000175000017500000001421513052123256021216 0ustar larssonlarsson00000000000000from __future__ import division, print_function, absolute_import import numpy as np def pad(data, padwidth, value=0.0): """ Pad an array with a specific value. Parameters ---------- data : ndarray Numpy array of any dimension and type. padwidth : int or tuple If int, it will pad using this amount at the beginning and end of all dimensions. If it is a tuple (of same length as `ndim`), then the padding amount will be specified per axis. value : data.dtype The value with which to pad. Default is ``0.0``. See also -------- pad_to_size, pad_repeat_border, pad_repeat_border_corner Examples -------- >>> import deepdish as dd >>> import numpy as np Pad an array with zeros. >>> x = np.ones((3, 3)) >>> dd.util.pad(x, (1, 2), value=0.0) array([[ 0., 0., 0., 0., 0., 0., 0.], [ 0., 0., 1., 1., 1., 0., 0.], [ 0., 0., 1., 1., 1., 0., 0.], [ 0., 0., 1., 1., 1., 0., 0.], [ 0., 0., 0., 0., 0., 0., 0.]]) """ data = np.asarray(data) shape = data.shape if isinstance(padwidth, int): padwidth = (padwidth,)*len(shape) padded_shape = tuple(map(lambda ix: ix[1]+padwidth[ix[0]]*2, enumerate(shape))) new_data = np.empty(padded_shape, dtype=data.dtype) new_data[..., :] = value new_data[[slice(w, -w) if w > 0 else slice(None) for w in padwidth]] = data return new_data def pad_to_size(data, shape, value=0.0): """ This is similar to `pad`, except you specify the final shape of the array. Parameters ---------- data : ndarray Numpy array of any dimension and type. shape : tuple Final shape of padded array. Should be tuple of length ``data.ndim``. If it has to pad unevenly, it will pad one more at the end of the axis than at the beginning. If a dimension is specified as ``-1``, then it will remain its current size along that dimension. value : data.dtype The value with which to pad. Default is ``0.0``. This can even be an array, as long as ``pdata[:] = value`` is valid, where ``pdata`` is the size of the padded array. Examples -------- >>> import deepdish as dd >>> import numpy as np Pad an array with zeros. >>> x = np.ones((4, 2)) >>> dd.util.pad_to_size(x, (5, 5)) array([[ 0., 1., 1., 0., 0.], [ 0., 1., 1., 0., 0.], [ 0., 1., 1., 0., 0.], [ 0., 1., 1., 0., 0.], [ 0., 0., 0., 0., 0.]]) """ shape = [data.shape[i] if shape[i] == -1 else shape[i] for i in range(len(shape))] new_data = np.empty(shape) new_data[:] = value II = [slice((shape[i] - data.shape[i])//2, (shape[i] - data.shape[i])//2 + data.shape[i]) for i in range(len(shape))] new_data[II] = data return new_data def pad_repeat_border(data, padwidth): """ Similar to `pad`, except the border value from ``data`` is used to pad. 
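    Each padded axis is extended by copying its first and last entries
    outward (edge replication, not reflection), so no fill value is needed.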
Parameters ---------- data : ndarray Numpy array of any dimension and type. padwidth : int or tuple If int, it will pad using this amount at the beginning and end of all dimensions. If it is a tuple (of same length as `ndim`), then the padding amount will be specified per axis. Examples -------- >>> import deepdish as dd >>> import numpy as np Pad an array by repeating its borders: >>> shape = (3, 4) >>> x = np.arange(np.prod(shape)).reshape(shape) >>> dd.util.pad_repeat_border(x, 2) array([[ 0, 0, 0, 1, 2, 3, 3, 3], [ 0, 0, 0, 1, 2, 3, 3, 3], [ 0, 0, 0, 1, 2, 3, 3, 3], [ 4, 4, 4, 5, 6, 7, 7, 7], [ 8, 8, 8, 9, 10, 11, 11, 11], [ 8, 8, 8, 9, 10, 11, 11, 11], [ 8, 8, 8, 9, 10, 11, 11, 11]]) """ data = np.asarray(data) shape = data.shape if isinstance(padwidth, int): padwidth = (padwidth,)*len(shape) padded_shape = tuple(map(lambda ix: ix[1]+padwidth[ix[0]]*2, enumerate(shape))) new_data = np.empty(padded_shape, dtype=data.dtype) new_data[[slice(w, -w) if w > 0 else slice(None) for w in padwidth]] = data for i, pw in enumerate(padwidth): if pw > 0: selection = [slice(None)] * data.ndim selection2 = [slice(None)] * data.ndim # Lower boundary selection[i] = slice(0, pw) selection2[i] = slice(pw, pw+1) new_data[tuple(selection)] = new_data[tuple(selection2)] # Upper boundary selection[i] = slice(-pw, None) selection2[i] = slice(-pw-1, -pw) new_data[tuple(selection)] = new_data[tuple(selection2)] return new_data def pad_repeat_border_corner(data, shape): """ Similar to `pad_repeat_border`, except the padding is always done on the upper end of each axis and the target size is specified. Parameters ---------- data : ndarray Numpy array of any dimension and type. shape : tuple Final shape of padded array. Should be tuple of length ``data.ndim``. If it has to pad unevenly, it will pad one more at the end of the axis than at the beginning. Examples -------- >>> import deepdish as dd >>> import numpy as np Pad an array by repeating its upper borders. >>> shape = (3, 4) >>> x = np.arange(np.prod(shape)).reshape(shape) >>> dd.util.pad_repeat_border_corner(x, (5, 5)) array([[ 0., 1., 2., 3., 3.], [ 4., 5., 6., 7., 7.], [ 8., 9., 10., 11., 11.], [ 8., 9., 10., 11., 11.], [ 8., 9., 10., 11., 11.]]) """ new_data = np.empty(shape) new_data[[slice(upper) for upper in data.shape]] = data for i in range(len(shape)): selection = [slice(None)]*i + [slice(data.shape[i], None)] selection2 = [slice(None)]*i + [slice(data.shape[i]-1, data.shape[i])] new_data[selection] = new_data[selection2] return new_data deepdish-0.3.7/deepdish/util/saveable.py0000644000175000017500000001126114123254760021375 0ustar larssonlarsson00000000000000from __future__ import division, print_function, absolute_import from deepdish import io _ERR_STR = "Must override load_from_dict for Saveable interface" class Saveable(object): """ Key-value coding interface for classes. Generally, this is an interface that make it possible to access instance members through keys (strings), instead of through named variables. What this interface enables, is to save and load an instance of the class to file. This is done by encoding it into a dictionary, or decoding it from a dictionary. The dictionary is then saved/loaded using :func:`deepdish.io.save`. """ @classmethod def load(cls, path): """ Loads an instance of the class from a file. Parameters ---------- path : str Path to an HDF5 file. Examples -------- This is an abstract data type, but let us say that ``Foo`` inherits from ``Saveable``. 
To construct an object of this class from a file, we do: >>> foo = Foo.load('foo.h5') #doctest: +SKIP """ if path is None: return cls.load_from_dict({}) else: d = io.load(path) return cls.load_from_dict(d) def save(self, path): """ Saves an instance of the class using :func:`deepdish.io.save`. Parameters ---------- path : str Output path to HDF5 file. """ io.save(path, self.save_to_dict()) @classmethod def load_from_dict(cls, d): """ Overload this function in your subclass. It takes a dictionary and should return a constructed object. When overloading, you have to decorate this function with ``@classmethod``. Parameters ---------- d : dict Dictionary representation of an instance of your class. Returns ------- obj : object Returns an object that has been constructed based on the dictionary. """ raise NotImplementedError(_ERR_STR) def save_to_dict(self): """ Overload this function in your subclass. It should return a dictionary representation of the current instance. If you member variables that are objects, it is best to convert them to dictionaries before they are entered into your dictionary hierarchy. Returns ------- d : dict Returns a dictionary representation of the current instance. """ raise NotImplementedError(_ERR_STR) class NamedRegistry(object): """ This class provides a named hierarchy of classes, where each class is associated with a string name. """ REGISTRY = {} @property def name(self): """Returns the name of the registry entry.""" # Automatically overloaded by 'register' return "noname" @classmethod def register(cls, name): """Decorator to register a class.""" def register_decorator(reg_cls): def name_func(self): return name reg_cls.name = property(name_func) assert issubclass(reg_cls, cls), \ "Must be subclass matching your NamedRegistry class" cls.REGISTRY[name] = reg_cls return reg_cls return register_decorator @classmethod def getclass(cls, name): """ Returns the class object given its name. """ return cls.REGISTRY[name] @classmethod def construct(cls, name, *args, **kwargs): """ Constructs an instance of an object given its name. """ return cls.REGISTRY[name](*args, **kwargs) @classmethod def registry(cls): return cls.REGISTRY @classmethod def root(cls, reg_cls): """ Decorate your base class with this, to create a new registry for it """ reg_cls.REGISTRY = {} return reg_cls class SaveableRegistry(Saveable, NamedRegistry): """ This combines the features of :class:`deepdish.util.Saveable` and :class:`deepdish.util.NamedRegistry`. 
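    A minimal usage sketch (``Foo`` and ``Bar`` here are hypothetical classes,
    not part of deepdish):

    >>> @SaveableRegistry.root          #doctest: +SKIP
    ... class Foo(SaveableRegistry):
    ...     pass
    >>> @Foo.register('bar')            #doctest: +SKIP
    ... class Bar(Foo):
    ...     @classmethod
    ...     def load_from_dict(cls, d):
    ...         return cls()
    ...     def save_to_dict(self):
    ...         return {}

    Since ``save`` stores the registered name under the key ``'name'``,
    ``Foo.load('bar.h5')`` will reconstruct a ``Bar`` instance after
    ``Bar().save('bar.h5')``.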
See also -------- Saveable, NamedRegistry """ @classmethod def load(cls, path): if path is None: return cls.load_from_dict({}) else: d = io.load(path) # Check class type class_name = d.get('name') if class_name is not None: return cls.getclass(class_name).load_from_dict(d) else: return cls.load_from_dict(d) def save(self, path): d = self.save_to_dict() d['name'] = self.name io.save(path, d) deepdish-0.3.7/deepdish/util/zca_whitening.py0000644000175000017500000000250113052123256022434 0ustar larssonlarsson00000000000000from __future__ import division, print_function, absolute_import import numpy as np def zca_whitening_matrix(X, w_epsilon, batch=1000): shape = X.shape N = shape[0] Xflat = X.reshape((N, -1)) sigma = None num_batches = int(np.ceil(N / batch)) for b in range(num_batches): Xb = Xflat[b*batch:(b+1)*batch] C = np.dot(Xb.T, Xb) if sigma is None: sigma = C else: sigma += C sigma /= N U, S, _ = np.linalg.svd(sigma) shrinker = np.diag(1 / np.sqrt(S + w_epsilon)) W = np.dot(U, np.dot(shrinker, U.T)) return W def apply_whitening_matrix(X, W, batch=1000): shape = X.shape N = shape[0] Xflat = X.reshape((N, -1)) wX = np.empty(shape) num_batches = int(np.ceil(N / batch)) for b in range(num_batches): Xb = Xflat[b*batch:(b+1)*batch] wX[b*batch:(b+1)*batch] = np.dot(W, Xb.T).T.reshape((-1,) + shape[1:]) return wX def whiten(X, w_epsilon, batch=1000): shape = X.shape N = shape[0] Xflat = X.reshape((N, -1)) W = zca_whitening_matrix(X, w_epsilon) wX = np.empty(shape) num_batches = int(np.ceil(N / batch)) for b in range(num_batches): Xb = Xflat[b*batch:(b+1)*batch] wX[b*batch:(b+1)*batch] = np.dot(W, Xb.T).T.reshape((-1,) + shape[1:]) return wX deepdish-0.3.7/deepdish.egg-info/0000755000175000017500000000000014123256273017756 5ustar larssonlarsson00000000000000deepdish-0.3.7/deepdish.egg-info/PKG-INFO0000644000175000017500000000135614123256273021060 0ustar larssonlarsson00000000000000Metadata-Version: 2.1 Name: deepdish Version: 0.3.7 Summary: Deep Learning experiments from University of Chicago. 
Home-page: https://github.com/uchicago-cs/deepdish Maintainer: Gustav Larsson Maintainer-email: gustav.m.larsson@gmail.com License: BSD Platform: UNKNOWN Classifier: Development Status :: 4 - Beta Classifier: Intended Audience :: Science/Research Classifier: License :: OSI Approved :: BSD License Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 2 Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.4 Classifier: Topic :: Scientific/Engineering Provides-Extra: image License-File: LICENSE UNKNOWN deepdish-0.3.7/deepdish.egg-info/SOURCES.txt0000644000175000017500000000144014123256273021641 0ustar larssonlarsson00000000000000LICENSE MANIFEST.in README.rst requirements.txt requirements_docs.txt setup.cfg setup.py deepdish/__init__.py deepdish/conf.py deepdish/core.py deepdish/image.py deepdish/six.py deepdish.egg-info/PKG-INFO deepdish.egg-info/SOURCES.txt deepdish.egg-info/dependency_links.txt deepdish.egg-info/requires.txt deepdish.egg-info/top_level.txt deepdish/experiments/__init__.py deepdish/experiments/pylearn2/datasets/mediaeval.py deepdish/io/__init__.py deepdish/io/hdf5io.py deepdish/io/ls.py deepdish/parallel/__init__.py deepdish/parallel/fallback.py deepdish/parallel/mpi.py deepdish/tests/__init__.py deepdish/tests/test_core.py deepdish/tests/test_io.py deepdish/tests/test_util.py deepdish/util/__init__.py deepdish/util/padding.py deepdish/util/saveable.py deepdish/util/zca_whitening.py scripts/ddlsdeepdish-0.3.7/deepdish.egg-info/dependency_links.txt0000644000175000017500000000000114123256273024024 0ustar larssonlarsson00000000000000 deepdish-0.3.7/deepdish.egg-info/requires.txt0000644000175000017500000000004414123256273022354 0ustar larssonlarsson00000000000000numpy scipy tables [image] skimage deepdish-0.3.7/deepdish.egg-info/top_level.txt0000644000175000017500000000001114123256273022500 0ustar larssonlarsson00000000000000deepdish deepdish-0.3.7/requirements.txt0000644000175000017500000000002313052123256017750 0ustar larssonlarsson00000000000000numpy scipy tables deepdish-0.3.7/requirements_docs.txt0000644000175000017500000000003213052123256020760 0ustar larssonlarsson00000000000000mock docutils sphinx>=1.3 deepdish-0.3.7/scripts/0000755000175000017500000000000014123256273016166 5ustar larssonlarsson00000000000000deepdish-0.3.7/scripts/ddls0000755000175000017500000000041413052123256017033 0ustar larssonlarsson00000000000000#!/usr/bin/env python from __future__ import division, print_function, absolute_import import os import sys sys.path = [os.path.join(os.path.abspath(os.path.dirname(__file__)), "..")] + sys.path from deepdish.io.ls import main if __name__ == '__main__': main() deepdish-0.3.7/setup.cfg0000644000175000017500000000010314123256273016312 0ustar larssonlarsson00000000000000[bdist_wheel] universal = 1 [egg_info] tag_build = tag_date = 0 deepdish-0.3.7/setup.py0000644000175000017500000000242314123255554016213 0ustar larssonlarsson00000000000000#!/usr/bin/env python from __future__ import division, print_function, absolute_import from setuptools import setup import os if os.getenv('READTHEDOCS'): with open('requirements_docs.txt') as f: required = f.read().splitlines() else: with open('requirements.txt') as f: required = f.read().splitlines() CLASSIFIERS = [ 'Development Status :: 4 - Beta', 'Intended Audience :: Science/Research', 'License :: OSI Approved :: BSD License', 'Programming Language :: Python', 'Programming Language :: Python 
:: 2', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3.4', 'Topic :: Scientific/Engineering', ] args = dict( name='deepdish', version='0.3.7', url="https://github.com/uchicago-cs/deepdish", description="Deep Learning experiments from University of Chicago.", maintainer='Gustav Larsson', maintainer_email='gustav.m.larsson@gmail.com', install_requires=required, extras_require={ 'image': ["skimage"], }, scripts=['scripts/ddls'], packages=[ 'deepdish', 'deepdish.parallel', 'deepdish.io', 'deepdish.util', ], license='BSD', classifiers=CLASSIFIERS, ) setup(**args)