deepdish-0.3.7/LICENSE
Copyright (c) 2014, Amit Group
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the {organization} nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
deepdish-0.3.7/MANIFEST.in
include requirements.txt
include requirements_docs.txt
recursive-include deepdish *.pyx *.py
deepdish-0.3.7/PKG-INFO
Metadata-Version: 2.1
Name: deepdish
Version: 0.3.7
Summary: Deep Learning experiments from University of Chicago.
Home-page: https://github.com/uchicago-cs/deepdish
Maintainer: Gustav Larsson
Maintainer-email: gustav.m.larsson@gmail.com
License: BSD
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Topic :: Scientific/Engineering
Provides-Extra: image
License-File: LICENSE
UNKNOWN
deepdish-0.3.7/README.rst
.. image:: https://readthedocs.org/projects/deepdish/badge/?version=latest
:target: https://readthedocs.org/projects/deepdish/?badge=latest
:alt: Documentation Status
.. image:: https://travis-ci.org/uchicago-cs/deepdish.svg?branch=master
:target: https://travis-ci.org/uchicago-cs/deepdish/
.. image:: https://img.shields.io/pypi/v/deepdish.svg
:target: https://pypi.python.org/pypi/deepdish
.. image:: https://coveralls.io/repos/uchicago-cs/deepdish/badge.svg?branch=master&service=github
:target: https://coveralls.io/github/uchicago-cs/deepdish?branch=master
.. image:: https://img.shields.io/badge/license-BSD%203--Clause-blue.svg?style=flat
:target: http://opensource.org/licenses/BSD-3-Clause
deepdish
========
Flexible HDF5 saving/loading and other data science tools from the University of Chicago. This repository also hosts a Deep Learning blog:
* http://deepdish.io
Installation
------------
::
pip install deepdish
Alternatively (if you have conda with the ``conda-forge`` channel)::
conda install -c conda-forge deepdish
Main feature
------------
The primary feature of deepdish is its ability to save and load all kinds of
data as HDF5. It can save any Python data structure, offering the same ease of
use as pickling or ``numpy.save``. However, it improves on them by also offering:
- Interoperability between languages (HDF5 is a popular standard)
- Easy to inspect the content from the command line (using ``h5ls`` or our
specialized tool ``ddls``)
- Highly compressed storage (thanks to a PyTables backend)
- Native support for scipy sparse matrices and pandas ``DataFrame``, ``Series``
and ``Panel``
- Ability to partially read files, even slices of arrays
An example:
.. code:: python
    import numpy as np
    import deepdish as dd
d = {
'foo': np.ones((10, 20)),
'sub': {
'bar': 'a string',
'baz': 1.23,
},
}
dd.io.save('test.h5', d)
This can be reconstructed using ``dd.io.load('test.h5')``, or inspected through
the command line using either a standard tool::
$ h5ls test.h5
foo Dataset {10, 20}
sub Group
Or, better yet, our custom tool ``ddls`` (or ``python -m deepdish.io.ls``)::
$ ddls test.h5
/foo array (10, 20) [float64]
/sub dict
/sub/bar 'a string' (8) [unicode]
/sub/baz 1.23 [float64]
Read more at *Saving and loading data* in the documentation.
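As a quick illustration of partial reading (a usage sketch based on the
``group`` and ``sel`` options of ``dd.io.load``):

.. code:: python

    foo_top = dd.io.load('test.h5', '/foo', sel=dd.aslice[:5])  # first 5 rows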
Documentation
-------------
* http://deepdish.readthedocs.io/
deepdish-0.3.7/deepdish/__init__.py
from __future__ import print_function, division, absolute_import
# Load the following modules by default
from deepdish.core import (
bytesize,
humanize_bytesize,
memsize,
span,
apply_once,
tupled_argmax,
multi_range,
timed,
aslice,
)
from deepdish import io
from deepdish import util
from deepdish import image
from deepdish import parallel
from deepdish.conf import config
class MovedPackage(object):
def __init__(self, old_loc, new_loc):
self.old_loc = old_loc
self.new_loc = new_loc
def __getattr__(self, name):
raise ImportError('The package {} has been moved to {}'.format(
self.old_loc, self.new_loc))
# This is temporary: remove after a few minor releases
plot = MovedPackage('deepdish.plot', 'vzlog.image')
__all__ = ['deepdish',
'set_verbose',
'info',
'warning',
'bytesize',
'humanize_bytesize',
'memsize',
'span',
'apply_once',
'tupled_argmax',
'multi_range',
'io',
'util',
'image',
'plot',
'parallel',
'config',
'timed',
'aslice',
]
VERSION = (0, 3, 7)
ISRELEASED = True
__version__ = '{0}.{1}.{2}'.format(*VERSION)
if not ISRELEASED:
__version__ += '.git'
deepdish-0.3.7/deepdish/conf.py
from __future__ import division, print_function, absolute_import
import os
import sys
if sys.version_info >= (3,):
from configparser import ConfigParser
else:
from ConfigParser import ConfigParser
def config():
"""
Loads and returns a ConfigParser from ``~/.deepdish.conf``.
"""
conf = ConfigParser()
# Set up defaults
conf.add_section('io')
conf.set('io', 'compression', 'zlib')
conf.read(os.path.expanduser('~/.deepdish.conf'))
return conf
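
# Usage sketch: reading the configured compression method, which falls back
# to the 'zlib' default set above unless overridden in ~/.deepdish.conf:
#
#     from deepdish.conf import config
#     compression = config().get('io', 'compression')  # e.g. 'zlib' or 'blosc'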
deepdish-0.3.7/deepdish/core.py
from __future__ import division, print_function, absolute_import
import time
import warnings
import numpy as np
import itertools as itr
import sys
from contextlib import contextmanager
warnings.simplefilter("ignore", np.ComplexWarning)
_is_verbose = False
_is_silent = False
class AbortException(Exception):
"""
This exception is used for when the user wants to quit algorithms mid-way.
The `AbortException` can for instance be sent by pygame input, and caught
by whatever is running the algorithm.
"""
pass
def bytesize(arr):
"""
Returns the memory byte size of a Numpy array as an integer.
"""
byte_size = np.prod(arr.shape) * np.dtype(arr.dtype).itemsize
return byte_size
def humanize_bytesize(byte_size):
order = np.log(byte_size) / np.log(1024)
orders = [
(5, 'PB'),
(4, 'TB'),
(3, 'GB'),
(2, 'MB'),
(1, 'KB'),
(0, 'B')
]
for ex, name in orders:
if order >= ex:
return '{:.4g} {}'.format(byte_size / 1024**ex, name)
def memsize(arr):
"""
    Returns the required memory of a Numpy array as a human-readable string.
"""
return humanize_bytesize(bytesize(arr))
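
# Example sketch: an 8-byte float array with one million elements occupies
# 8,000,000 bytes, so memsize(np.ones((1000, 1000))) returns '7.629 MB'
# (8e6 / 1024**2, formatted with four significant digits).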
def span(arr):
"""
Calculate and return the mininum and maximum of an array.
Parameters
----------
arr : ndarray
Numpy array.
Returns
-------
min : dtype
Minimum of array.
max : dtype
Maximum of array.
"""
# TODO: This could be made faster with a custom ufunc
return (np.min(arr), np.max(arr))
def apply_once(func, arr, axes, keepdims=True):
"""
Similar to `numpy.apply_over_axes`, except this performs the operation over
a flattened version of all the axes, meaning that the function will only be
called once. This only makes a difference for non-linear functions.
Parameters
----------
func : callback
Function that operates well on Numpy arrays and returns a single value
of compatible dtype.
arr : ndarray
Array to do operation over.
axes : int or iterable
Specifies the axes to perform the operation. Only one call will be made
to `func`, with all values flattened.
keepdims : bool
By default, this is True, so the collapsed dimensions remain with
        length 1. This is similar to `numpy.apply_over_axes` in that regard. If
this is set to False, the dimensions are removed, just like when using
for instance `numpy.sum` over a single axis. Note that this is safer
than subsequently calling squeeze, since this option will preserve
length-1 dimensions that were not operated on.
Examples
--------
>>> import deepdish as dd
>>> import numpy as np
>>> rs = np.random.RandomState(0)
>>> x = rs.uniform(size=(10, 3, 3))
    Imagine that you have ten 3x3 images and you want to calculate each image's
intensity standard deviation:
>>> np.apply_over_axes(np.std, x, [1, 2]).ravel()
array([ 0.06056838, 0.08230712, 0.08135083, 0.09938963, 0.08533604,
0.07830725, 0.066148 , 0.07983019, 0.08134123, 0.01839635])
This is the same as ``x.std(1).std(1)``, which is not the standard
deviation of all 9 pixels together. To fix this we can flatten the pixels
and try again:
>>> x.reshape(10, 9).std(axis=1)
array([ 0.17648981, 0.32849108, 0.29409526, 0.25547501, 0.23649064,
0.26928468, 0.20081239, 0.33052397, 0.29950855, 0.26535717])
This is exactly what this function does for you:
>>> dd.apply_once(np.std, x, [1, 2], keepdims=False)
array([ 0.17648981, 0.32849108, 0.29409526, 0.25547501, 0.23649064,
0.26928468, 0.20081239, 0.33052397, 0.29950855, 0.26535717])
"""
all_axes = np.arange(arr.ndim)
if isinstance(axes, int):
axes = {axes}
else:
axes = set(axis % arr.ndim for axis in axes)
principal_axis = min(axes)
for i, axis in enumerate(axes):
axis0 = principal_axis + i
if axis != axis0:
all_axes[axis0], all_axes[axis] = all_axes[axis], all_axes[axis0]
transposed_arr = arr.transpose(all_axes)
new_shape = []
new_shape_keepdims = []
for axis, dim in enumerate(arr.shape):
if axis == principal_axis:
new_shape.append(-1)
elif axis not in axes:
new_shape.append(dim)
if axis in axes:
new_shape_keepdims.append(1)
else:
new_shape_keepdims.append(dim)
collapsed = np.apply_along_axis(func,
principal_axis,
transposed_arr.reshape(new_shape))
if keepdims:
return collapsed.reshape(new_shape_keepdims)
else:
return collapsed
def tupled_argmax(a):
"""
Argmax that returns an index tuple. Note that `numpy.argmax` will return a
scalar index as if you had flattened the array.
Parameters
----------
a : array_like
Input array.
Returns
-------
index : tuple
        Index tuple, even if `a` is one-dimensional. Note that this can
immediately be used to index `a` as in ``a[index]``.
Examples
--------
>>> import numpy as np
>>> import deepdish as dd
>>> a = np.arange(6).reshape(2,3)
>>> a
array([[0, 1, 2],
[3, 4, 5]])
>>> dd.tupled_argmax(a)
(1, 2)
"""
return np.unravel_index(np.argmax(a), np.shape(a))
def multi_range(*args):
return itr.product(*[range(a) for a in args])
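
# Example sketch: multi_range iterates over the Cartesian product of ranges,
# so list(multi_range(2, 3)) gives
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)].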
@contextmanager
def timed(name=None, file=sys.stdout, callback=None, wall_clock=True):
"""
Context manager to make it easy to time the execution of a piece of code.
This timer will never run your code several times and is meant more for
simple in-production timing, instead of benchmarking. Reports the
wall-clock time (using `time.time`) and not the processor time.
Parameters
----------
name : str
Name of the timing block, to identify it.
file : file handler
Which file handler to print the results to. Default is standard output.
        If a numpy array of size 1 is given, the time in seconds will be
stored inside it. Ignored if `callback` is set.
callback : callable
        This offers even more flexibility than `file`. The callable will be
called at the end of the execution with a single floating point
argument with the elapsed time in seconds.
Examples
--------
>>> import deepdish as dd
>>> import time
The `timed` function is a context manager, so everything inside the
``with`` block will be timed. The results will be printed by default to
standard output:
>>> with dd.timed('Sleep'): # doctest: +SKIP
... time.sleep(1)
[timed] Sleep: 1.001035451889038 s
Using the `callback` parameter, we can accumulate multiple runs into a
list:
>>> times = []
>>> for i in range(3): # doctest: +SKIP
... with dd.timed(callback=times.append):
... time.sleep(1)
>>> times # doctest: +SKIP
[1.0035350322723389, 1.0035550594329834, 1.0039470195770264]
"""
start = time.time()
yield
end = time.time()
delta = end - start
if callback is not None:
callback(delta)
elif isinstance(file, np.ndarray) and len(file) == 1:
file[0] = delta
else:
name_str = ' {}'.format(name) if name is not None else ''
print(("[timed]{0}: {1} s".format(name_str, delta)), file=file)
class SliceClass(object):
def __getitem__(self, index):
return index
aslice = SliceClass()
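
# Example sketch: `aslice` builds slice objects with indexing syntax, so
# aslice[10:20] == slice(10, 20, None). This is useful for composing the
# `sel` argument of deepdish.io.load, e.g.
# dd.io.load(path, '/foo', sel=dd.aslice[:100]).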
deepdish-0.3.7/deepdish/experiments/__init__.py
deepdish-0.3.7/deepdish/experiments/pylearn2/datasets/mediaeval.py
"""
.. todo::
Based on code from pylearn2.datasets.hdf5
"""
__authors__ = "Mark Stoehr"
__copyright__ = "Copyright 2014, Mark Stoehr"
__credits__ = ["Mark Stoehr"]
__license__ = "MIT"
__maintainer__ = "deepdish.io"
__email__ = "mark@deepdish.io"
import numpy as np
import warnings
from pylearn2.datasets import (dense_design_matrix,
control,
cache)
from pylearn2.datasets.hdf5 import (HDF5Dataset,
DenseDesignMatrix,
HDF5DatasetIterator,
HDF5ViewConverter,
HDF5TopoViewConverter)
from pylearn2.utils import serial
try:
import h5py
except ImportError:
h5py = None
from pylearn2.utils.rng import make_np_rng
class MediaEval(DenseDesignMatrix):
"""
.. todo::
WRITEME
Parameters
----------
filename
X
y
start
stop
"""
    def __init__(self, filename, X, y, start, stop, **kwargs):
        self.load_all = False
        if h5py is None:
            raise RuntimeError("Could not import h5py")
        self._file = h5py.File(filename)
        X = self.get_dataset(X)
        y = self.get_dataset(y)
        super(MediaEval, self).__init__(X=X, y=y, **kwargs)
def _check_labels(self):
"""
Sanity checks for X_labels and y_labels.
Since the np.all test used for these labels does not work with HDF5
datasets, we issue a warning that those values are not checked.
"""
if self.X_labels is not None:
assert self.X is not None
assert self.view_converter is None
assert self.X.ndim <= 2
if self.load_all:
assert np.all(self.X < self.X_labels)
else:
warnings.warn("HDF5Dataset cannot perform test np.all(X < " +
"X_labels). Use X_labels at your own risk.")
if self.y_labels is not None:
assert self.y is not None
assert self.y.ndim <= 2
if self.load_all:
assert np.all(self.y < self.y_labels)
else:
warnings.warn("HDF5Dataset cannot perform test np.all(y < " +
"y_labels). Use y_labels at your own risk.")
def get_dataset(self, dataset, load_all=False):
"""
Get a handle for an HDF5 dataset, or load the entire dataset into
memory.
Parameters
----------
dataset : str
Name or path of HDF5 dataset.
load_all : bool, optional (default False)
If true, load dataset into memory.
"""
if load_all:
data = self._file[dataset][:]
else:
data = self._file[dataset]
data.ndim = len(data.shape) # hdf5 handle has no ndim
return data
def iterator(self, *args, **kwargs):
"""
Get an iterator for this dataset.
The FiniteDatasetIterator uses indexing that is not supported by
HDF5 datasets, so we change the class to HDF5DatasetIterator to
override the iterator.next method used in dataset iteration.
Parameters
----------
WRITEME
"""
iterator = super(MediaEval, self).iterator(*args, **kwargs)
iterator.__class__ = HDF5DatasetIterator
return iterator
deepdish-0.3.7/deepdish/image.py
"""
Basic functions for working with images.
"""
from __future__ import division, print_function, absolute_import
import itertools as itr
import numpy as np
def _import_skimage():
"""Import scikit-image, with slightly modified `ImportError` message"""
try:
import skimage
except ImportError:
raise ImportError("scikit-image is required to use this function.")
return skimage
def _import_pil():
"""Import scikit-image, with slightly modified `ImportError` message"""
try:
import PIL
except ImportError:
raise ImportError("PIL/Pillow is required to use this function.")
return PIL
def resize_by_factor(im, factor):
"""
Resizes the image according to a factor. The image is pre-filtered
with a Gaussian and then resampled with bilinear interpolation.
This function uses scikit-image and essentially combines its
`pyramid_reduce` with `pyramid_expand` into one function.
Returns the same object if factor is 1, not a copy.
Parameters
----------
im : ndarray, ndim=2 or 3
Image. Either 2D or 3D with 3 or 4 channels.
factor : float
Resize factor, e.g. a factor of 0.5 will halve both sides.
"""
_import_skimage()
from skimage.transform.pyramids import pyramid_reduce, pyramid_expand
if factor < 1:
return pyramid_reduce(im, downscale=1/factor)
elif factor > 1:
return pyramid_expand(im, upscale=factor)
else:
return im
def resize(im, shape=None, max_side=None, min_side=None):
    """
    Resizes an image, either to an explicit `shape` or so that its
    smallest/largest side matches `min_side`/`max_side`. Exactly one of
    `shape`, `max_side` and `min_side` should be specified.
    """
    if min_side is not None:
        min_dim = np.min(im.shape[:2])
        factor = min_side / min_dim
        return resize_by_factor(im, factor)
    elif max_side is not None:
        max_dim = np.max(im.shape[:2])
        factor = max_side / max_dim
        return resize_by_factor(im, factor)
    else:
        factor_y = shape[0] / im.shape[0]
        factor_x = shape[1] / im.shape[1]
        assert np.fabs(factor_x - factor_y) < 0.5
        return resize_by_factor(im, factor_x)
def asgray(im):
"""
Takes an image and returns its grayscale version by averaging the color
    channels. If an alpha channel is present, it will simply be ignored. If a
grayscale image is given, the original image is returned.
Parameters
----------
image : ndarray, ndim 2 or 3
RGB or grayscale image.
Returns
-------
gray_image : ndarray, ndim 2
Grayscale version of image.
"""
if im.ndim == 2:
return im
elif im.ndim == 3 and im.shape[2] in (3, 4):
return im[..., :3].mean(axis=-1)
else:
raise ValueError('Invalid image format')
def crop(im, size):
"""
Crops an image in the center.
Parameters
----------
size : tuple, (height, width)
        Final size after cropping.
"""
diff = [im.shape[index] - size[index] for index in (0, 1)]
im2 = im[diff[0]//2:diff[0]//2 + size[0], diff[1]//2:diff[1]//2 + size[1]]
return im2
def crop_or_pad(im, size, value=0):
    """
    Crops an image in the center, padding with `value` when the requested
    `size` (height, width) is larger than the image along an axis.
    """
    diff = [max(im.shape[index] - size[index], 0) for index in (0, 1)]
    im2 = im[diff[0]//2:diff[0]//2 + size[0], diff[1]//2:diff[1]//2 + size[1]]
    if im2.shape[:2] == tuple(size):
        return im2
    # Pad out to the requested size, centering the cropped content
    im3 = np.empty(tuple(size) + im.shape[2:], dtype=im.dtype)
    im3[:] = value
    off = [(size[index] - im2.shape[index]) // 2 for index in (0, 1)]
    im3[off[0]:off[0] + im2.shape[0],
        off[1]:off[1] + im2.shape[1]] = im2
    return im3
def crop_to_bounding_box(im, bb):
"""
Crops according to a bounding box.
Parameters
----------
bounding_box : tuple, (top, left, bottom, right)
Crops inclusively for top/left and exclusively for bottom/right.
"""
return im[bb[0]:bb[2], bb[1]:bb[3]]
def load(path, dtype=np.float64):
"""
Loads an image from file.
Parameters
----------
path : str
Path to image file.
dtype : np.dtype
Defaults to ``np.float64``, which means the image will be returned as a
float with values between 0 and 1. If ``np.uint8`` is specified, the
values will be between 0 and 255 and no conversion cost will be
incurred.
"""
_import_skimage()
import skimage.io
im = skimage.io.imread(path)
if dtype == np.uint8:
return im
elif dtype in {np.float16, np.float32, np.float64}:
return im.astype(dtype) / 255
else:
raise ValueError('Unsupported dtype')
def load_raw(path):
"""
Load image using PIL/Pillow without any processing. This is particularly
useful for palette images, which will be loaded using their palette index
values as opposed to `load` which will convert them to RGB.
Parameters
----------
path : str
Path to image file.
"""
_import_pil()
from PIL import Image
return np.array(Image.open(path))
def save(path, im):
"""
Saves an image to file.
    If the image is of type float, it is assumed to have values in [0, 1].
Parameters
----------
path : str
Path to which the image will be saved.
im : ndarray (image)
Image.
"""
from PIL import Image
if im.dtype == np.uint8:
pil_im = Image.fromarray(im)
else:
pil_im = Image.fromarray((im*255).astype(np.uint8))
pil_im.save(path)
def integrate(ii, r0, c0, r1, c1):
"""
Use an integral image to integrate over a given window.
Parameters
----------
ii : ndarray
Integral image.
r0, c0 : int
Top-left corner of block to be summed.
r1, c1 : int
Bottom-right corner of block to be summed.
Returns
-------
S : int
Integral (sum) over the given window.
"""
# This line is modified
S = np.zeros(ii.shape[-1])
S += ii[r1, c1]
if (r0 - 1 >= 0) and (c0 - 1 >= 0):
S += ii[r0 - 1, c0 - 1]
if (r0 - 1 >= 0):
S -= ii[r0 - 1, c1]
if (c0 - 1 >= 0):
S -= ii[r1, c0 - 1]
return S
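
# Example sketch (assuming a channel-last integral image built with cumsum):
#
#     im = np.ones((4, 4, 1))
#     ii = im.cumsum(axis=0).cumsum(axis=1)
#     integrate(ii, 1, 1, 2, 2)  # -> array([4.]), the sum of a 2x2 window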
def offset(img, offset, fill_value=0):
"""
Moves the contents of image without changing the image size. The missing
values are given a specified fill value.
Parameters
----------
img : array
Image.
offset : (vertical_offset, horizontal_offset)
Tuple of length 2, specifying the offset along the two axes.
fill_value : dtype of img
Fill value. Defaults to 0.
"""
sh = img.shape
if sh == (0, 0):
return img
else:
x = np.empty(sh)
x[:] = fill_value
x[max(offset[0], 0):min(sh[0]+offset[0], sh[0]),
max(offset[1], 0):min(sh[1]+offset[1], sh[1])] = \
img[max(-offset[0], 0):min(sh[0]-offset[0], sh[0]),
max(-offset[1], 0):min(sh[1]-offset[1], sh[1])]
return x
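
# Example sketch: shifting a 2x2 image down by one row fills the vacated
# row with the fill value (the output is float64, since `x` is created
# with np.empty):
#
#     offset(np.arange(4).reshape(2, 2), (1, 0))
#     # -> array([[0., 0.],
#     #           [0., 1.]])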
def bounding_box(alpha, threshold=0.1):
"""
Returns a bounding box of the support.
Parameters
----------
alpha : ndarray, ndim=2
Any one-channel image where the background has zero or low intensity.
threshold : float
The threshold that divides background from foreground.
Returns
-------
bounding_box : (top, left, bottom, right)
The bounding box describing the smallest rectangle containing the
foreground object, as defined by the threshold.
"""
assert alpha.ndim == 2
# Take the bounding box of the support, with a certain threshold.
supp_axs = [alpha.max(axis=1-i) for i in range(2)]
# Check first and last value of that threshold
bb = [np.where(supp_axs[i] > threshold)[0][[0, -1]] for i in range(2)]
return (bb[0][0], bb[1][0], bb[0][1], bb[1][1])
def bounding_box_as_binary_map(alpha, threshold=0.1):
"""
Similar to `bounding_box`, except returns the bounding box as a
binary map the same size as the input.
Same parameters as `bounding_box`.
Returns
-------
binary_map : ndarray, ndim=2, dtype=np.bool_
Binary map with True if object and False if background.
"""
bb = bounding_box(alpha)
x = np.zeros(alpha.shape, dtype=np.bool_)
x[bb[0]:bb[2], bb[1]:bb[3]] = 1
return x
def extract_patches(images, patch_shape, samples_per_image=40, seed=0,
cycle=True):
"""
Takes a set of images and yields randomly chosen patches of specified size.
Parameters
----------
images : iterable
The images have to be iterable, and each element must be a Numpy array
        with at least two spatial dimensions as the first and second axes.
patch_shape : tuple, length 2
The spatial shape of the patches that should be extracted. If the
images have further dimensions beyond the spatial, the patches will
copy these too.
samples_per_image : int
Samples to extract before moving on to the next image.
seed : int
Seed with which to select the patches.
cycle : bool
If True, then the function will produce patches indefinitely, by going
back to the first image when all are done. If False, the iteration will
stop when there are no more images.
Returns
-------
patch_generator
This function returns a generator that will produce patches.
Examples
--------
>>> import deepdish as dd
>>> import matplotlib.pylab as plt
>>> import itertools
    >>> import amitgroup as ag
    >>> images = ag.io.load_example('mnist')
Now, let us say we want to exact patches from the these, where each patch
has at least some activity.
>>> gen = dd.image.extract_patches(images, (5, 5))
>>> gen = (x for x in gen if x.mean() > 0.1)
>>> patches = np.array(list(itertools.islice(gen, 25)))
>>> patches.shape
(25, 5, 5)
>>> dd.plot.images(patches)
>>> plt.show()
"""
rs = np.random.RandomState(seed)
for Xi in itr.cycle(images):
# How many patches could we extract?
w, h = [Xi.shape[i]-patch_shape[i] for i in range(2)]
assert w > 0 and h > 0
# Maybe shuffle an iterator of the indices?
indices = np.asarray(list(itr.product(range(w), range(h))))
rs.shuffle(indices)
for x, y in indices[:samples_per_image]:
yield Xi[x:x+patch_shape[0], y:y+patch_shape[1]]
deepdish-0.3.7/deepdish/io/__init__.py
from __future__ import division, print_function, absolute_import
try:
import tables
_pytables_ok = True
del tables
except ImportError:
_pytables_ok = False
if _pytables_ok:
from .hdf5io import load, save, ForcePickle, Compression
else:
def _f(*args, **kwargs):
raise ImportError("You need PyTables for this function")
load = save = _f
__all__ = ['load', 'save', 'ForcePickle', 'Compression']
deepdish-0.3.7/deepdish/io/hdf5io.py
from __future__ import division, print_function, absolute_import
import numpy as np
import tables
import warnings
from scipy import sparse
from deepdish import conf
try:
import pandas as pd
pd.io.pytables._tables()
_pandas = True
except ImportError:
_pandas = False
try:
from types import SimpleNamespace
_sns = True
except ImportError:
_sns = False
from deepdish import six
IO_VERSION = 12
DEEPDISH_IO_PREFIX = 'DEEPDISH_IO'
DEEPDISH_IO_VERSION_STR = DEEPDISH_IO_PREFIX + '_VERSION'
DEEPDISH_IO_UNPACK = DEEPDISH_IO_PREFIX + '_DEEPDISH_IO_UNPACK'
DEEPDISH_IO_ROOT_IS_SNS = DEEPDISH_IO_PREFIX + '_ROOT_IS_SNS'
# Types that should be saved as pytables attribute
ATTR_TYPES = (int, float, bool, six.string_types, six.binary_type,
np.int8, np.int16, np.int32, np.int64, np.uint8,
np.uint16, np.uint32, np.uint64, np.float16, np.float32,
np.float64, np.bool_, np.complex64, np.complex128)
if _pandas:
class _HDFStoreWithHandle(pd.io.pytables.HDFStore):
def __init__(self, handle):
self._path = None
self._complevel = None
self._complib = None
self._fletcher32 = False
self._filters = None
self._handle = handle
def is_pandas_dataframe(level):
return ('pandas_version' in level._v_attrs and
'pandas_type' in level._v_attrs)
class ForcePickle(object):
"""
When saving an object with `deepdish.io.save`, you can wrap objects in this
class to force them to be pickled. They will automatically be unpacked at
load time.
"""
def __init__(self, obj):
self.obj = obj
class Compression(object):
"""
Class to enable explicit compression settings for individual arrays.
"""
def __init__(self, obj, compression='default'):
self.obj = obj
self.compression = compression
def _dict_native_ok(d):
"""
This checks if a dictionary can be saved natively as HDF5 groups.
If it can't, it will be pickled.
"""
if len(d) >= 256:
return False
# All keys must be strings
for k in d:
if not isinstance(k, six.string_types):
return False
return True
def _get_compression_filters(compression='default'):
if compression == 'default':
config = conf.config()
compression = config.get('io', 'compression')
elif compression is True:
compression = 'zlib'
if (compression is False or compression is None or
compression == 'none' or compression == 'None'):
ff = None
else:
if isinstance(compression, (tuple, list)):
compression, level = compression
else:
level = 9
try:
ff = tables.Filters(complevel=level, complib=compression,
shuffle=True)
except Exception:
warnings.warn(("(deepdish.io.save) Missing compression method {}: "
"no compression will be used.").format(compression))
ff = None
return ff
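
# Usage sketch: the `compression` argument accepted throughout this module
# maps onto PyTables filters, e.g.
#
#     _get_compression_filters(('blosc', 5))  # Filters(complevel=5,
#                                             #         complib='blosc', ...)
#     _get_compression_filters(None)          # no compression (returns None)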
def _save_ndarray(handler, group, name, x, filters=None):
if np.issubdtype(x.dtype, np.unicode_):
# Convert unicode strings to pure byte arrays
strtype = b'unicode'
itemsize = x.itemsize // 4
atom = tables.UInt8Atom()
x = x.view(dtype=np.uint8)
elif np.issubdtype(x.dtype, np.string_):
strtype = b'ascii'
itemsize = x.itemsize
atom = tables.StringAtom(itemsize)
elif x.dtype == np.object:
# Not supported by HDF5, force pickling
_save_pickled(handler, group, x, name=name)
return
else:
atom = tables.Atom.from_dtype(x.dtype)
strtype = None
itemsize = None
if x.ndim > 0 and np.min(x.shape) == 0:
sh = np.array(x.shape)
atom0 = tables.Atom.from_dtype(np.dtype(np.int64))
node = handler.create_array(group, name, atom=atom0,
shape=(sh.size,))
node._v_attrs.zeroarray_dtype = np.dtype(x.dtype).str.encode('ascii')
node[:] = sh
return
if x.ndim == 0 and len(x.shape) == 0:
# This is a numpy array scalar. We will store it as a regular scalar
# instead, which means it will be unpacked as a numpy scalar (not numpy
# array scalar)
setattr(group._v_attrs, name, x[()])
return
# For small arrays, compression actually leads to larger files, so we are
# settings a threshold here. The threshold has been set through
# experimentation.
if filters is not None and x.size > 300:
node = handler.create_carray(group, name, atom=atom,
shape=x.shape,
chunkshape=None,
filters=filters)
else:
node = handler.create_array(group, name, atom=atom,
shape=x.shape)
if strtype is not None:
node._v_attrs.strtype = strtype
node._v_attrs.itemsize = itemsize
node[:] = x
def _save_pickled(handler, group, level, name=None):
warnings.warn(('(deepdish.io.save) Pickling {}: This may cause '
                   'incompatibilities (for instance between Python 2 and '
'3) and should ideally be avoided').format(level),
DeprecationWarning)
node = handler.create_vlarray(group, name, tables.ObjectAtom())
node.append(level)
def _is_linkable(level):
if isinstance(level, ATTR_TYPES):
return False
return True
def _save_level(handler, group, level, name=None, filters=None, idtable=None):
_id = id(level)
try:
oldpath = idtable[_id]
except KeyError:
if _is_linkable(level):
# store path to object:
if group._v_pathname.endswith('/'):
idtable[_id] = '{}{}'.format(group._v_pathname, name)
else:
idtable[_id] = '{}/{}'.format(group._v_pathname, name)
else:
# object already saved, so create soft link to it:
handler.create_soft_link(group, name, target=oldpath)
return
if isinstance(level, Compression):
custom_filters = _get_compression_filters(level.compression)
return _save_level(handler, group, level.obj, name=name,
filters=custom_filters, idtable=idtable)
elif isinstance(level, ForcePickle):
_save_pickled(handler, group, level, name=name)
elif isinstance(level, dict) and _dict_native_ok(level):
# First create a new group
new_group = handler.create_group(group, name,
"dict:{}".format(len(level)))
for k, v in level.items():
if isinstance(k, six.string_types):
_save_level(handler, new_group, v, name=k, filters=filters,
idtable=idtable)
elif (_sns and isinstance(level, SimpleNamespace) and
_dict_native_ok(level.__dict__)):
# Create a new group in same manner as for dict
new_group = handler.create_group(
group, name, "SimpleNamespace:{}".format(len(level.__dict__)))
for k, v in level.__dict__.items():
if isinstance(k, six.string_types):
_save_level(handler, new_group, v, name=k, filters=filters,
idtable=idtable)
elif isinstance(level, list) and len(level) < 256:
# Lists can contain other dictionaries and numpy arrays, so we don't
# want to serialize them. Instead, we will store each entry as i0, i1,
# etc.
new_group = handler.create_group(group, name,
"list:{}".format(len(level)))
for i, entry in enumerate(level):
level_name = 'i{}'.format(i)
_save_level(handler, new_group, entry,
name=level_name, filters=filters, idtable=idtable)
elif isinstance(level, tuple) and len(level) < 256:
        # Tuples can contain other dictionaries and numpy arrays, so we don't
# want to serialize them. Instead, we will store each entry as i0, i1,
# etc.
new_group = handler.create_group(group, name,
"tuple:{}".format(len(level)))
for i, entry in enumerate(level):
level_name = 'i{}'.format(i)
_save_level(handler, new_group, entry, name=level_name,
filters=filters, idtable=idtable)
elif isinstance(level, np.ndarray):
_save_ndarray(handler, group, name, level, filters=filters)
elif _pandas and isinstance(level, (pd.DataFrame, pd.Series)):
store = _HDFStoreWithHandle(handler)
store.put(group._v_pathname + '/' + name, level)
elif isinstance(level, (sparse.dok_matrix,
sparse.lil_matrix)):
raise NotImplementedError(
'deepdish.io.save does not support DOK or LIL matrices; '
'please convert before saving to one of the following supported '
'types: BSR, COO, CSR, CSC, DIA')
elif isinstance(level, (sparse.csr_matrix,
sparse.csc_matrix,
sparse.bsr_matrix)):
new_group = handler.create_group(group, name, "sparse:")
_save_ndarray(handler, new_group, 'data', level.data, filters=filters)
_save_ndarray(handler, new_group, 'indices', level.indices,
filters=filters)
_save_ndarray(handler, new_group, 'indptr', level.indptr,
filters=filters)
_save_ndarray(handler, new_group, 'shape', np.asarray(level.shape))
new_group._v_attrs.format = level.format
new_group._v_attrs.maxprint = level.maxprint
elif isinstance(level, sparse.dia_matrix):
new_group = handler.create_group(group, name, "sparse:")
_save_ndarray(handler, new_group, 'data', level.data, filters=filters)
_save_ndarray(handler, new_group, 'offsets', level.offsets,
filters=filters)
_save_ndarray(handler, new_group, 'shape', np.asarray(level.shape))
new_group._v_attrs.format = level.format
new_group._v_attrs.maxprint = level.maxprint
elif isinstance(level, sparse.coo_matrix):
new_group = handler.create_group(group, name, "sparse:")
_save_ndarray(handler, new_group, 'data', level.data, filters=filters)
_save_ndarray(handler, new_group, 'col', level.col, filters=filters)
_save_ndarray(handler, new_group, 'row', level.row, filters=filters)
_save_ndarray(handler, new_group, 'shape', np.asarray(level.shape))
new_group._v_attrs.format = level.format
new_group._v_attrs.maxprint = level.maxprint
elif isinstance(level, ATTR_TYPES):
setattr(group._v_attrs, name, level)
elif level is None:
# Store a None as an empty group
new_group = handler.create_group(group, name, "nonetype:")
else:
_save_pickled(handler, group, level, name=name)
def _load_specific_level(handler, grp, path, sel=None, pathtable=None):
if path == '':
if sel is not None:
return _load_sliced_level(handler, grp, sel)
else:
return _load_level(handler, grp, pathtable)
vv = path.split('/', 1)
if len(vv) == 1:
if hasattr(grp, vv[0]):
if sel is not None:
return _load_sliced_level(handler, getattr(grp, vv[0]), sel)
else:
return _load_level(handler, getattr(grp, vv[0]), pathtable)
elif hasattr(grp, '_v_attrs') and vv[0] in grp._v_attrs:
if sel is not None:
raise ValueError("Cannot slice this type")
v = grp._v_attrs[vv[0]]
if isinstance(v, np.string_):
v = v.decode('utf-8')
return v
else:
raise ValueError('Undefined entry "{}"'.format(vv[0]))
else:
level, rest = vv
if level == '':
return _load_specific_level(handler, grp.root, rest, sel=sel,
pathtable=pathtable)
else:
if hasattr(grp, level):
return _load_specific_level(handler, getattr(grp, level),
rest, sel=sel, pathtable=pathtable)
else:
raise ValueError('Undefined group "{}"'.format(level))
def _load_pickled(level):
if isinstance(level[0], ForcePickle):
return level[0].obj
else:
return level[0]
def _load_nonlink_level(handler, level, pathtable, pathname):
"""
Loads level and builds appropriate type, without handling softlinks
"""
if isinstance(level, tables.Group):
if _sns and (level._v_title.startswith('SimpleNamespace:') or
DEEPDISH_IO_ROOT_IS_SNS in level._v_attrs):
val = SimpleNamespace()
dct = val.__dict__
elif level._v_title.startswith('list:'):
dct = {}
val = []
else:
dct = {}
val = dct
# in case of recursion, object needs to be put in pathtable
# before trying to fully load it
pathtable[pathname] = val
# Load sub-groups
for grp in level:
lev = _load_level(handler, grp, pathtable)
n = grp._v_name
# Check if it's a complicated pair or a string-value pair
if n.startswith('__pair'):
dct[lev['key']] = lev['value']
else:
dct[n] = lev
# Load attributes
for name in level._v_attrs._f_list():
if name.startswith(DEEPDISH_IO_PREFIX):
continue
v = level._v_attrs[name]
dct[name] = v
if level._v_title.startswith('list:'):
N = int(level._v_title[len('list:'):])
for i in range(N):
val.append(dct['i{}'.format(i)])
return val
elif level._v_title.startswith('tuple:'):
N = int(level._v_title[len('tuple:'):])
lst = []
for i in range(N):
lst.append(dct['i{}'.format(i)])
return tuple(lst)
elif level._v_title.startswith('nonetype:'):
return None
elif is_pandas_dataframe(level):
assert _pandas, "pandas is required to read this file"
store = _HDFStoreWithHandle(handler)
return store.get(level._v_pathname)
elif level._v_title.startswith('sparse:'):
frm = level._v_attrs.format
if frm in ('csr', 'csc', 'bsr'):
shape = tuple(level.shape[:])
cls = {'csr': sparse.csr_matrix,
'csc': sparse.csc_matrix,
'bsr': sparse.bsr_matrix}
matrix = cls[frm](shape)
matrix.data = level.data[:]
matrix.indices = level.indices[:]
matrix.indptr = level.indptr[:]
matrix.maxprint = level._v_attrs.maxprint
return matrix
elif frm == 'dia':
shape = tuple(level.shape[:])
matrix = sparse.dia_matrix(shape)
matrix.data = level.data[:]
matrix.offsets = level.offsets[:]
matrix.maxprint = level._v_attrs.maxprint
return matrix
elif frm == 'coo':
shape = tuple(level.shape[:])
matrix = sparse.coo_matrix(shape)
matrix.data = level.data[:]
matrix.col = level.col[:]
matrix.row = level.row[:]
matrix.maxprint = level._v_attrs.maxprint
return matrix
else:
raise ValueError('Unknown sparse matrix type: {}'.format(frm))
else:
return val
elif isinstance(level, tables.VLArray):
if level.shape == (1,):
return _load_pickled(level)
else:
return level[:]
elif isinstance(level, tables.Array):
if 'zeroarray_dtype' in level._v_attrs:
# Unpack zero-size arrays (shape is stored in an HDF5 array and
        # type is stored in the attribute 'zeroarray_dtype')
dtype = level._v_attrs.zeroarray_dtype
sh = level[:]
return np.zeros(tuple(sh), dtype=dtype)
if 'strtype' in level._v_attrs:
strtype = level._v_attrs.strtype
itemsize = level._v_attrs.itemsize
if strtype == b'unicode':
return level[:].view(dtype=(np.unicode_, itemsize))
elif strtype == b'ascii':
return level[:].view(dtype=(np.string_, itemsize))
# This serves two purposes:
# (1) unpack big integers: the only time we save arrays like this
# (2) unpack non-deepdish "scalars"
if level.shape == ():
return level[()]
return level[:]
def _load_level(handler, level, pathtable):
"""
Loads level and builds appropriate type, handling softlinks if necessary
"""
if isinstance(level, tables.link.SoftLink):
# this is a link, so see if target is already loaded, return it
pathname = level.target
node = level()
else:
# not a link, but it might be a target that's already been
# loaded ... if so, return it
pathname = level._v_pathname
node = level
try:
return pathtable[pathname]
except KeyError:
pathtable[pathname] = _load_nonlink_level(handler, node, pathtable,
pathname)
return pathtable[pathname]
def _load_sliced_level(handler, level, sel):
if isinstance(level, tables.link.SoftLink):
# this is a link; get target:
level = level()
if isinstance(level, tables.VLArray):
if level.shape == (1,):
return _load_pickled(level)
else:
return level[sel]
elif isinstance(level, tables.Array):
return level[sel]
else:
raise ValueError('Cannot partially load this data type using `sel`')
def save(path, data, compression='default'):
"""
Save any Python structure to an HDF5 file. It is particularly suited for
Numpy arrays. This function works similar to ``numpy.save``, except if you
save a Python object at the top level, you do not need to issue
``data.flat[0]`` to retrieve it from inside a Numpy array of type
``object``.
Some types of objects get saved natively in HDF5. The rest get serialized
automatically. For most needs, you should be able to stick to the natively
supported types, which are:
* Dictionaries
* Short lists and tuples (<256 in length)
* Basic data types (including strings and None)
* Numpy arrays
* Scipy sparse matrices
* Pandas ``DataFrame``, ``Series``, and ``Panel``
* SimpleNamespaces (for Python >= 3.3, but see note below)
    A recommendation is to always convert your data to use only these types.
That way your data will be portable and can be opened through any HDF5
reader. A class that helps you with this is
:class:`deepdish.util.Saveable`.
Lists and tuples are supported and can contain heterogeneous types. This is
mostly useful and plays well with HDF5 for short lists and tuples. If you
have a long list (>256) it will be serialized automatically. However,
in such cases it is common for the elements to have the same type, in which
case we strongly recommend converting to a Numpy array first.
Note that the SimpleNamespace type will be read in as dictionaries for
earlier versions of Python.
    This function requires the ``PyTables`` module to
be installed.
You can change the default compression method to ``blosc`` (much faster,
but less portable) by creating a ``~/.deepdish.conf`` with::
[io]
compression: blosc
This is the recommended compression method if you plan to use your HDF5
files exclusively through deepdish (or PyTables).
Parameters
----------
path : string
Filename to which the data is saved.
data : anything
Data to be saved. This can be anything from a Numpy array, a string, an
object, or a dictionary containing all of them including more
dictionaries.
compression : string or tuple
Set compression method, choosing from `blosc`, `zlib`, `lzo`, `bzip2`
and more (see PyTables documentation). It can also be specified as a
tuple (e.g. ``('blosc', 5)``), with the latter value specifying the
level of compression, choosing from 0 (no compression) to 9 (maximum
compression). Set to `None` to turn off compression. The default is
`zlib`, since it is highly portable; for much greater speed, try for
instance `blosc`.
See also
--------
load
"""
filters = _get_compression_filters(compression)
with tables.open_file(path, mode='w') as h5file:
# If the data is a dictionary, put it flatly in the root
group = h5file.root
group._v_attrs[DEEPDISH_IO_VERSION_STR] = IO_VERSION
idtable = {} # dict to keep track of objects already saved
# Sparse matrices match isinstance(data, dict), so we'll have to be
# more strict with the type checking
if type(data) == type({}) and _dict_native_ok(data):
idtable[id(data)] = '/'
for key, value in data.items():
_save_level(h5file, group, value, name=key,
filters=filters, idtable=idtable)
elif (_sns and isinstance(data, SimpleNamespace) and
_dict_native_ok(data.__dict__)):
idtable[id(data)] = '/'
group._v_attrs[DEEPDISH_IO_ROOT_IS_SNS] = True
for key, value in data.__dict__.items():
_save_level(h5file, group, value, name=key,
filters=filters, idtable=idtable)
else:
_save_level(h5file, group, data, name='data',
filters=filters, idtable=idtable)
# Mark this to automatically unpack when loaded
group._v_attrs[DEEPDISH_IO_UNPACK] = True
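
# Usage sketch for `save` (hypothetical file name):
#
#     import numpy as np
#     import deepdish as dd
#     dd.io.save('data.h5', {'x': np.zeros(1000)}, compression=('blosc', 5))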
def load(path, group=None, sel=None, unpack=False):
"""
Loads an HDF5 saved with `save`.
    This function requires the ``PyTables`` module to
be installed.
Parameters
----------
path : string
Filename from which to load the data.
group : string or list
Load a specific group in the HDF5 hierarchy. If `group` is a list of
strings, then a tuple will be returned with all the groups that were
specified.
sel : slice or tuple of slices
If you specify `group` and the target is a numpy array, then you can
use this to slice it. This is useful for opening subsets of large HDF5
files. To compose the selection, you can use `deepdish.aslice`.
unpack : bool
        If True, single-entry dictionaries will be unpacked and the value
will be returned directly. That is, if you save ``dict(a=100)``, only
``100`` will be loaded.
Returns
-------
data : anything
Hopefully an identical reconstruction of the data that was saved.
See also
--------
save
"""
with tables.open_file(path, mode='r') as h5file:
pathtable = {} # dict to keep track of objects already loaded
if group is not None:
if isinstance(group, str):
data = _load_specific_level(h5file, h5file, group, sel=sel,
pathtable=pathtable)
else: # Assume group is a list or tuple
data = []
for g in group:
data_i = _load_specific_level(h5file, h5file, g, sel=sel,
pathtable=pathtable)
data.append(data_i)
data = tuple(data)
else:
grp = h5file.root
auto_unpack = (DEEPDISH_IO_UNPACK in grp._v_attrs and
grp._v_attrs[DEEPDISH_IO_UNPACK])
do_unpack = unpack or auto_unpack
if do_unpack and len(grp._v_children) == 1:
name = next(iter(grp._v_children))
data = _load_specific_level(h5file, grp, name, sel=sel,
pathtable=pathtable)
do_unpack = False
elif sel is not None:
raise ValueError("Must specify group with `sel` unless it "
"automatically unpacks")
else:
data = _load_level(h5file, grp, pathtable)
if DEEPDISH_IO_VERSION_STR in grp._v_attrs:
v = grp._v_attrs[DEEPDISH_IO_VERSION_STR]
else:
v = 0
if v > IO_VERSION:
warnings.warn('This file was saved with a newer version of '
'deepdish. Please upgrade to make sure it loads '
'correctly.')
# Attributes can't be unpacked with the method above, so fall back
# to this
if do_unpack and isinstance(data, dict) and len(data) == 1:
data = next(iter(data.values()))
return data
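
# Usage sketch for `load` (hypothetical file from the `save` example above):
#
#     data = dd.io.load('data.h5')                             # whole file
#     x100 = dd.io.load('data.h5', '/x', sel=dd.aslice[:100])  # partial read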
deepdish-0.3.7/deepdish/io/ls.py
"""
Look inside HDF5 files from the terminal, especially those created by deepdish.
"""
from __future__ import division, print_function, absolute_import
from .hdf5io import (DEEPDISH_IO_VERSION_STR, DEEPDISH_IO_PREFIX,
DEEPDISH_IO_UNPACK, DEEPDISH_IO_ROOT_IS_SNS,
IO_VERSION, _sns, is_pandas_dataframe)
import tables
import numpy as np
import sys
import os
import re
from deepdish import io, six, __version__
COLORS = dict(
black='30',
darkgray='2;39',
red='0;31',
green='0;32',
brown='0;33',
yellow='0;33',
blue='0;34',
purple='0;35',
cyan='0;36',
white='0;39',
reset='0'
)
MIN_COLUMN_WIDTH = 5
MIN_AUTOMATIC_COLUMN_WIDTH = 20
MAX_AUTOMATIC_COLUMN_WIDTH = 80
ABRIDGE_OVER_N_CHILDREN = 50
ABRIDGE_SHOW_EACH_SIDE = 5
def _format_dtype(dtype):
dtype = np.dtype(dtype)
dtype_str = dtype.name
if dtype.byteorder == '<':
dtype_str += ' little-endian'
elif dtype.byteorder == '>':
dtype_str += ' big-endian'
return dtype_str
def _pandas_shape(level):
if 'ndim' in level._v_attrs:
ndim = level._v_attrs['ndim']
shape = []
for i in range(ndim):
axis_name = 'axis{}'.format(i)
if axis_name in level._v_children:
axis = len(level._v_children[axis_name])
shape.append(axis)
elif axis_name + '_label0' in level._v_children:
axis = len(level._v_children[axis_name + '_label0'])
shape.append(axis)
else:
return None
return tuple(shape)
def sorted_maybe_numeric(x):
"""
Sorts x with numeric semantics if all keys are nonnegative integers.
Otherwise uses standard string sorting.
"""
all_numeric = all(map(str.isdigit, x))
if all_numeric:
return sorted(x, key=int)
else:
return sorted(x)
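
# Example sketch: sorted_maybe_numeric(['10', '2', '1']) returns
# ['1', '2', '10'] (numeric order), while a mix such as ['10', 'b'] falls
# back to string order, giving ['10', 'b'].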
def paint(s, color, colorize=True):
if colorize:
if color in COLORS:
return '\033[{}m{}\033[0m'.format(COLORS[color], s)
else:
raise ValueError('Invalid color')
else:
return s
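
# Example sketch: paint wraps text in an ANSI escape sequence, so
# paint('hi', 'red') returns '\033[0;31mhi\033[0m'.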
def type_string(typename, dtype=None, extra=None,
type_color='red', colorize=True):
ll = [paint(typename, type_color, colorize=colorize)]
if extra:
ll += [extra]
if dtype:
ll += [paint('[' + dtype + ']', 'darkgray', colorize=colorize)]
return ' '.join(ll)
def container_info(name, size=None, colorize=True, type_color=None,
final_level=False):
if final_level:
d = {}
if size is not None:
d['extra'] = '(' + str(size) + ')'
if type_color is not None:
d['type_color'] = type_color
s = type_string(name, colorize=colorize, **d)
# Mark that it's abbreviated
s += ' ' + paint('[...]', 'darkgray', colorize=colorize)
return s
else:
# If not abbreviated, then display the type in dark gray, since
# the information is already conveyed through the children
return type_string(name, colorize=colorize, type_color='darkgray')
def abbreviate(s, maxlength=25):
"""Color-aware abbreviator"""
assert maxlength >= 4
skip = False
abbrv = None
i = 0
for j, c in enumerate(s):
if c == '\033':
skip = True
elif skip:
if c == 'm':
skip = False
else:
i += 1
if i == maxlength - 1:
abbrv = s[:j] + '\033[0m...'
elif i > maxlength:
break
if i <= maxlength:
return s
else:
return abbrv
def print_row(key, value, level=0, parent='/', colorize=True,
file=sys.stdout, unpack=False, settings={},
parent_color='darkgray',
key_color='white'):
s = '{}{}'.format(paint(parent, parent_color, colorize=colorize),
paint(key, key_color, colorize=colorize))
s_raw = '{}{}'.format(parent, key)
if 'filter' in settings:
if not re.search(settings['filter'], s_raw):
settings['filtered_count'] += 1
return
if unpack:
extra_str = '*'
s_raw += extra_str
s += paint(extra_str, 'purple', colorize=colorize)
    print('{}{} {}'.format(abbreviate(s, settings['left-column-width']),
                           ' '*max(0, (settings['left-column-width'] + 1 - len(s_raw))),
                           value), file=file)
class Node(object):
def __repr__(self):
return 'Node'
def print(self, level=0, parent='/', colorize=True, max_level=None,
file=sys.stdout, settings={}):
pass
def info(self, colorize=True, final_level=False):
return paint('Node', 'red', colorize=colorize)
class FileNotFoundNode(Node):
def __init__(self, filename):
self.filename = filename
def __repr__(self):
return 'FileNotFoundNode'
def print(self, level=0, parent='/', colorize=True, max_level=None,
file=sys.stdout, settings={}):
print(paint('File not found', 'red', colorize=colorize),
file=file)
def info(self, colorize=True, final_level=False):
return paint('FileNotFoundNode', 'red', colorize=colorize)
class InvalidFileNode(Node):
def __init__(self, filename):
self.filename = filename
def __repr__(self):
return 'InvalidFileNode'
def print(self, level=0, parent='/', colorize=True, max_level=None,
file=sys.stdout, settings={}):
print(paint('Invalid HDF5 file', 'red', colorize=colorize),
file=file)
def info(self, colorize=True, final_level=False):
return paint('InvalidFileNode', 'red', colorize=colorize)
class DictNode(Node):
def __init__(self):
self.children = {}
self.header = {}
def add(self, k, v):
self.children[k] = v
def print(self, level=0, parent='/', colorize=True, max_level=None,
file=sys.stdout, settings={}):
if level < max_level:
ch = sorted_maybe_numeric(self.children)
N = len(ch)
if N > ABRIDGE_OVER_N_CHILDREN and not settings.get('all'):
ch = ch[:ABRIDGE_SHOW_EACH_SIDE] + [None] + ch[-ABRIDGE_SHOW_EACH_SIDE:]
            for k in ch:
                if k is None:
                    omitted = N - 2 * ABRIDGE_SHOW_EACH_SIDE
info = paint('{} omitted ({} in total)'.format(omitted, N),
'darkgray', colorize=colorize)
print_row('...', info,
level=level,
parent=parent,
unpack=self.header.get('dd_io_unpack'),
colorize=colorize, file=file,
key_color='darkgray',
settings=settings)
continue
v = self.children[k]
final = level+1 == max_level
if (not settings.get('leaves-only') or
not isinstance(v, DictNode)):
print_row(k,
v.info(colorize=colorize, final_level=final),
level=level,
parent=parent,
unpack=self.header.get('dd_io_unpack'),
colorize=colorize, file=file,
settings=settings)
v.print(level=level+1, parent='{}{}/'.format(parent, k),
colorize=colorize, max_level=max_level, file=file,
settings=settings)
def info(self, colorize=True, final_level=False):
return container_info('dict', size=len(self.children),
colorize=colorize,
type_color='purple',
final_level=final_level)
def __repr__(self):
s = ['{}={}'.format(k, repr(v)) for k, v in self.children.items()]
return 'DictNode({})'.format(', '.join(s))
class SimpleNamespaceNode(DictNode):
def info(self, colorize=True, final_level=False):
return container_info('SimpleNamespace', size=len(self.children),
colorize=colorize,
type_color='purple',
final_level=final_level)
    def print(self, level=0, parent='/', colorize=True, max_level=None,
              file=sys.stdout, settings={}):
        if level == 0 and not self.header.get('dd_io_unpack'):
            print_row('', self.info(colorize=colorize,
                                    final_level=(0 == max_level)),
                      level=level, parent=parent, unpack=False,
                      colorize=colorize, file=file, settings=settings)
        DictNode.print(self, level, parent, colorize, max_level, file,
                       settings=settings)
def __repr__(self):
s = ['{}={}'.format(k, repr(v)) for k, v in self.children.items()]
return 'SimpleNamespaceNode({})'.format(', '.join(s))
class PandasDataFrameNode(Node):
def __init__(self, shape):
self.shape = shape
def info(self, colorize=True, final_level=False):
d = {}
if self.shape is not None:
d['extra'] = repr(self.shape)
return type_string('DataFrame',
type_color='red',
colorize=colorize, **d)
def __repr__(self):
return 'PandasDataFrameNode({})'.format(self.shape)
class PandasPanelNode(Node):
def __init__(self, shape):
self.shape = shape
def info(self, colorize=True, final_level=False):
d = {}
if self.shape is not None:
d['extra'] = repr(self.shape)
return type_string('Panel',
type_color='red',
colorize=colorize, **d)
def __repr__(self):
return 'PandasPanelNode({})'.format(self.shape)
class PandasSeriesNode(Node):
def __init__(self, size, dtype):
self.size = size
self.dtype = dtype
def info(self, colorize=True, final_level=False):
d = {}
if self.size is not None:
d['extra'] = repr((self.size,))
if self.dtype is not None:
d['dtype'] = str(self.dtype)
return type_string('Series',
type_color='red',
colorize=colorize, **d)
def __repr__(self):
return 'SeriesNode()'
class ListNode(Node):
def __init__(self, typename='list'):
self.children = []
self.typename = typename
def append(self, v):
self.children.append(v)
def __repr__(self):
s = [repr(v) for v in self.children]
return 'ListNode({})'.format(', '.join(s))
def print(self, level=0, parent='/', colorize=True,
max_level=None, file=sys.stdout, settings={}):
if level < max_level:
for i, v in enumerate(self.children):
k = str(i)
final = level + 1 == max_level
print_row(k, v.info(colorize=colorize,
final_level=final),
level=level, parent=parent + 'i',
colorize=colorize, file=file,
settings=settings)
v.print(level=level+1, parent='{}{}/'.format(parent + 'i', k),
colorize=colorize, max_level=max_level, file=file,
settings=settings)
def info(self, colorize=True, final_level=False):
return container_info(self.typename, size=len(self.children),
colorize=colorize,
type_color='purple',
final_level=final_level)
class NumpyArrayNode(Node):
def __init__(self, shape, dtype, statistics=None, compression=None):
self.shape = shape
self.dtype = dtype
self.statistics = statistics
self.compression = compression
def info(self, colorize=True, final_level=False):
if not self.statistics:
s = type_string('array', extra=repr(self.shape),
dtype=str(self.dtype),
type_color='red',
colorize=colorize)
if self.compression:
if self.compression['complib'] is not None:
compstr = '{} lvl{}'.format(self.compression['complib'],
self.compression['complevel'])
else:
compstr = 'none'
s += ' ' + paint(compstr, 'yellow', colorize=colorize)
else:
s = type_string('array', extra=repr(self.shape),
type_color='red',
colorize=colorize)
raw_s = type_string('array', extra=repr(self.shape),
type_color='red',
colorize=False)
if len(raw_s) < 25:
s += ' ' * (25 - len(raw_s))
s += paint(' {:14.2g}'.format(self.statistics.get('mean')),
'white', colorize=colorize)
s += paint(u' \u00b1 ', 'darkgray', colorize=colorize)
s += paint('{:.2g}'.format(self.statistics.get('std')),
'reset', colorize=colorize)
return s
def __repr__(self):
return ('NumpyArrayNode(shape={}, dtype={})'
.format(self.shape, self.dtype))
class SparseMatrixNode(Node):
def __init__(self, fmt, shape, dtype):
self.sparse_format = fmt
self.shape = shape
self.dtype = dtype
def info(self, colorize=True, final_level=False):
return type_string('sparse {}'.format(self.sparse_format),
extra=repr(self.shape),
dtype=str(self.dtype),
type_color='red',
colorize=colorize)
    def __repr__(self):
        return ('SparseMatrixNode(format={}, shape={}, dtype={})'
                .format(self.sparse_format, self.shape, self.dtype))
class ValueNode(Node):
def __init__(self, value):
self.value = value
def __repr__(self):
return 'ValueNode(type={})'.format(type(self.value))
def info(self, colorize=True, final_level=False):
if isinstance(self.value, six.text_type):
if len(self.value) > 25:
s = repr(self.value[:22] + '...')
else:
s = repr(self.value)
return type_string(s, dtype='unicode',
type_color='green',
extra='({})'.format(len(self.value)),
colorize=colorize)
elif isinstance(self.value, six.binary_type):
if len(self.value) > 25:
s = repr(self.value[:22] + b'...')
else:
s = repr(self.value)
return type_string(s, dtype='ascii',
type_color='green',
extra='({})'.format(len(self.value)),
colorize=colorize)
elif self.value is None:
return type_string('None', dtype='python',
type_color='blue',
colorize=colorize)
else:
return type_string(repr(self.value)[:20],
dtype=str(np.dtype(type(self.value))),
type_color='blue',
colorize=colorize)
class ObjectNode(Node):
def __init__(self):
pass
def __repr__(self):
return 'ObjectNode'
def info(self, colorize=True, final_level=False):
return type_string('pickled', dtype='object', type_color='yellow',
colorize=colorize)
class SoftLinkNode(Node):
def __init__(self, target):
self.target = target
def info(self, colorize=True, final_level=False):
return type_string('link -> {}'.format(self.target),
dtype='SoftLink',
type_color='cyan',
colorize=colorize)
def __repr__(self):
return ('SoftLinkNode(target={})'
.format(self.target))
def _tree_level(level, raw=False, settings={}):
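    # Recursively convert a PyTables node (group, array, or soft link) into
    # the printable Node tree used by ddls.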
if isinstance(level, tables.Group):
if _sns and (level._v_title.startswith('SimpleNamespace:') or
DEEPDISH_IO_ROOT_IS_SNS in level._v_attrs):
node = SimpleNamespaceNode()
else:
node = DictNode()
for grp in level:
node.add(grp._v_name, _tree_level(grp, raw=raw, settings=settings))
for name in level._v_attrs._f_list():
v = level._v_attrs[name]
if name == DEEPDISH_IO_VERSION_STR:
node.header['dd_io_version'] = v
if name == DEEPDISH_IO_UNPACK:
node.header['dd_io_unpack'] = v
if name.startswith(DEEPDISH_IO_PREFIX):
continue
if isinstance(v, np.ndarray):
node.add(name, NumpyArrayNode(v.shape, _format_dtype(v.dtype)))
else:
node.add(name, ValueNode(v))
if (level._v_title.startswith('list:') or
level._v_title.startswith('tuple:')):
s = level._v_title.split(':', 1)[1]
N = int(s)
lst = ListNode(typename=level._v_title.split(':')[0])
for i in range(N):
t = node.children['i{}'.format(i)]
lst.append(t)
return lst
elif level._v_title.startswith('nonetype:'):
return ValueNode(None)
elif is_pandas_dataframe(level):
pandas_type = level._v_attrs['pandas_type']
if raw:
# Treat as regular dictionary
pass
elif pandas_type == 'frame':
shape = _pandas_shape(level)
new_node = PandasDataFrameNode(shape)
return new_node
elif pandas_type == 'series':
try:
values = level._v_children['values']
size = len(values)
dtype = values.dtype
except:
size = None
dtype = None
new_node = PandasSeriesNode(size, dtype)
return new_node
elif pandas_type == 'wide':
shape = _pandas_shape(level)
new_node = PandasPanelNode(shape)
return new_node
# else: it will simply be treated as a dict
elif level._v_title.startswith('sparse:') and not raw:
frm = level._v_attrs.format
dtype = level.data.dtype
shape = tuple(level.shape[:])
node = SparseMatrixNode(frm, shape, dtype)
return node
return node
elif isinstance(level, tables.VLArray):
if level.shape == (1,):
return ObjectNode()
node = NumpyArrayNode(level.shape, 'unknown')
return node
elif isinstance(level, tables.Array):
stats = {}
if settings.get('summarize'):
stats['mean'] = level[:].mean()
stats['std'] = level[:].std()
compression = {}
if settings.get('compression'):
compression['complib'] = level.filters.complib
compression['shuffle'] = level.filters.shuffle
compression['complevel'] = level.filters.complevel
node = NumpyArrayNode(level.shape, _format_dtype(level.dtype),
statistics=stats, compression=compression)
if hasattr(level._v_attrs, 'zeroarray_dtype'):
dtype = level._v_attrs.zeroarray_dtype
node = NumpyArrayNode(tuple(level), _format_dtype(dtype))
elif hasattr(level._v_attrs, 'strtype'):
strtype = level._v_attrs.strtype
itemsize = level._v_attrs.itemsize
if strtype == b'unicode':
shape = level.shape[:-1] + (level.shape[-1] // itemsize // 4,)
elif strtype == b'ascii':
shape = level.shape
node = NumpyArrayNode(shape, strtype.decode('ascii'))
return node
elif isinstance(level, tables.link.SoftLink):
node = SoftLinkNode(level.target)
return node
else:
return Node()
def get_tree(path, raw=False, settings={}):
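    """
    Open the HDF5 file at `path` and return its printable Node tree, or an
    error node if the file is missing or is not a valid HDF5 file.
    """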
fn = os.path.basename(path)
try:
with tables.open_file(path, mode='r') as h5file:
grp = h5file.root
s = _tree_level(grp, raw=raw, settings=settings)
s.header['filename'] = fn
return s
except OSError:
return FileNotFoundNode(fn)
except IOError:
return FileNotFoundNode(fn)
except tables.exceptions.HDF5ExtError:
return InvalidFileNode(fn)
def _column_width(level):
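    # Length of the longest "/path attribute" string in the file, used to
    # size the left column of the listing.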
if isinstance(level, tables.Group):
max_w = 0
for grp in level:
max_w = max(max_w, _column_width(grp))
for name in level._v_attrs._f_list():
if name.startswith(DEEPDISH_IO_PREFIX):
continue
max_w = max(max_w, len(level._v_pathname) + 1 + len(name))
return max_w
else:
return len(level._v_pathname)
def _discover_column_width(path):
if not os.path.isfile(path):
return MIN_AUTOMATIC_COLUMN_WIDTH
with tables.open_file(path, mode='r') as h5file:
return _column_width(h5file.root)
def main():
import argparse
parser = argparse.ArgumentParser(
description=("Look inside HDF5 files. Works particularly well "
"for HDF5 files saved with deepdish.io.save()."),
prog='ddls',
epilog='example: ddls test.h5 -i /foo/bar --ipython')
parser.add_argument('file', nargs='+',
help='filename of HDF5 file')
parser.add_argument('-d', '--depth', type=int, default=4,
help='max depth, defaults to 4')
parser.add_argument('-nc', '--no-color', action='store_true',
help='turn off bash colors')
parser.add_argument('-i', '--inspect', metavar='GRP',
help='print a specific variable (e.g. /data)')
parser.add_argument('--ipython', action='store_true',
help=('load file into an IPython session. '
'Works with -i'))
parser.add_argument('--raw', action='store_true',
help=('print the raw HDF5 structure for complex '
'data types, such as sparse matrices and pandas '
'data frames'))
parser.add_argument('-f', '--filter', type=str,
help=('print only entries that match this regular '
'expression'))
parser.add_argument('-l', '--leaves-only', action='store_true',
help=('print only leaves'))
parser.add_argument('-a', '--all', action='store_true',
help=('do not abridge'))
parser.add_argument('-s', '--summarize', action='store_true',
help=('print summary statistics of numpy arrays'))
parser.add_argument('-c', '--compression', action='store_true',
help=('print compression method for each array'))
parser.add_argument('-v', '--version', action='version',
version='deepdish {} (io protocol {})'.format(
__version__, IO_VERSION))
parser.add_argument('--column-width', type=int, default=None)
args = parser.parse_args()
colorize = sys.stdout.isatty() and not args.no_color
settings = {}
if args.filter:
settings['filter'] = args.filter
if args.leaves_only:
settings['leaves-only'] = True
if args.summarize:
settings['summarize'] = True
if args.compression:
settings['compression'] = True
if args.all:
settings['all'] = True
def single_file(files):
if len(files) >= 2:
            s = 'Error: Select a single file when using --inspect/--ipython'
print(paint(s, 'red', colorize=colorize))
sys.exit(1)
return files[0]
def run_ipython(fn, group=None, data=None):
file_desc = paint(fn, 'yellow', colorize=colorize)
if group is None:
path_desc = file_desc
else:
path_desc = '{}:{}'.format(
file_desc,
paint(group, 'white', colorize=colorize))
welcome = "Loaded {} into '{}':".format(
path_desc,
paint('data', 'blue', colorize=colorize))
# Import deepdish for the session
import deepdish as dd
import IPython
IPython.embed(header=welcome)
i = 0
if args.inspect is not None:
fn = single_file(args.file)
try:
data = io.load(fn, args.inspect)
except ValueError:
s = 'Error: Could not find group: {}'.format(args.inspect)
print(paint(s, 'red', colorize=colorize))
sys.exit(1)
if args.ipython:
run_ipython(fn, group=args.inspect, data=data)
else:
print(data)
elif args.ipython:
fn = single_file(args.file)
data = io.load(fn)
run_ipython(fn, data=data)
else:
for f in args.file:
# State that will be incremented
settings['filtered_count'] = 0
if args.column_width is None:
                settings['left-column-width'] = max(
                    MIN_AUTOMATIC_COLUMN_WIDTH,
                    min(MAX_AUTOMATIC_COLUMN_WIDTH,
                        _discover_column_width(f)))
else:
settings['left-column-width'] = args.column_width
s = get_tree(f, raw=args.raw, settings=settings)
if s is not None:
if i > 0:
print()
if len(args.file) >= 2:
print(paint(f, 'yellow', colorize=colorize))
s.print(colorize=colorize, max_level=args.depth,
settings=settings)
i += 1
if settings.get('filter'):
print('Filtered on: {} ({} rows omitted)'.format(
paint(args.filter, 'purple', colorize=colorize),
paint(str(settings['filtered_count']), 'white',
colorize=colorize)))
if __name__ == '__main__':
main()
deepdish-0.3.7/deepdish/parallel/ 0000755 0001750 0001750 00000000000 14123256273 020060 5 ustar larsson larsson 0000000 0000000 deepdish-0.3.7/deepdish/parallel/__init__.py 0000644 0001750 0001750 00000000443 13052123256 022164 0 ustar larsson larsson 0000000 0000000 from __future__ import print_function, division, absolute_import
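# Prefer the MPI backend when mpi4py is available; otherwise fall back to
# the serial implementations, which expose the same API.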
try:
import mpi4py
from deepdish.parallel.mpi import *
except ImportError:
from deepdish.parallel.fallback import *
__all__ = ['rank', 'imap_unordered', 'imap',
'starmap_unordered', 'starmap', 'main']
deepdish-0.3.7/deepdish/parallel/fallback.py 0000644 0001750 0001750 00000002617 13052123256 022171 0 ustar larsson larsson 0000000 0000000 from __future__ import division, print_function, absolute_import
import itertools as itr
__all__ = ['rank', 'imap_unordered', 'imap',
'starmap_unordered', 'starmap', 'main']
def rank():
"""
Returns MPI rank. If the MPI backend is not used, it will always return 0.
"""
return 0
def imap_unordered(f, params):
"""
    This may return the elements in any order. It has a lower
memory footprint than the ordered version and will be more responsive in
terms of printing the results. For instance, if you run the ordered
version, and the first batch is particularly slow, you won't see any
feedback for a long time.
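
    Example (this fallback backend runs everything serially, so the results
    happen to come back in order):

    >>> list(imap_unordered(lambda x: x * x, range(4)))
    [0, 1, 4, 9]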
"""
return map(f, params)
def imap(f, params):
"""
Analogous to `itertools.imap` (Python 2) and `map` (Python 3), but run in
parallel.
"""
return map(f, params)
def starmap_unordered(f, params):
"""
Similar to `imap_unordered`, but it will unpack the parameters. That is, it
will call ``f(*p)``, for each `p` in `params`.
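
    Example (serial fallback, so the order is preserved):

    >>> import operator
    >>> list(starmap_unordered(operator.add, [(1, 2), (3, 4)]))
    [3, 7]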
"""
return itr.starmap(f, params)
def starmap(f, params):
"""
Analogous to `itertools.starmap`, but run in parallel.
"""
return itr.starmap(f, params)
def main(name=None):
"""
Main function.
Example use:
    >>> if dd.parallel.main(__name__):
    ...     res = dd.parallel.imap_unordered(f, params)
"""
return name == '__main__'
deepdish-0.3.7/deepdish/parallel/mpi.py 0000644 0001750 0001750 00000010631 13052123256 021212 0 ustar larsson larsson 0000000 0000000 from __future__ import division, print_function, absolute_import
import sys
import itertools as itr
import numpy as np
__all__ = ['rank', 'imap_unordered', 'imap',
'starmap_unordered', 'starmap', 'main']
# Global set of workers - initialized when a map function is first called
_g_available_workers = None
_g_initialized = False
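# Message protocol between the rank-0 master and the workers, using the MPI
# tags that appear throughout this module:
#   tag 10:  master -> worker, dict(func, input_data, job_index, unpack)
#   tag 2:   worker -> master, dict(job_index, output_data)
#   tag 666: master -> worker, kill signal (payload is None)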
# For docstrings, see deepdish.parallel.fallback
def rank():
from mpi4py import MPI
rank = MPI.COMM_WORLD.Get_rank()
return rank
def kill_workers():
from mpi4py import MPI
all_workers = range(1, MPI.COMM_WORLD.Get_size())
for worker in all_workers:
MPI.COMM_WORLD.send(None, dest=worker, tag=666)
def _init():
global _g_available_workers, _g_initialized
from mpi4py import MPI
import atexit
_g_available_workers = set(range(1, MPI.COMM_WORLD.Get_size()))
_g_initialized = True
atexit.register(kill_workers)
def imap_unordered(f, workloads, star=False):
global _g_available_workers, _g_initialized
from mpi4py import MPI
N = MPI.COMM_WORLD.Get_size() - 1
if N == 0 or not _g_initialized:
mapf = [map, itr.starmap][star]
for res in mapf(f, workloads):
yield res
return
for job_index, workload in enumerate(itr.chain(workloads, itr.repeat(None))):
if workload is None and len(_g_available_workers) == N:
break
while not _g_available_workers or workload is None:
# Wait to receive results
status = MPI.Status()
ret = MPI.COMM_WORLD.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
if status.tag == 2:
yield ret['output_data']
_g_available_workers.add(status.source)
if len(_g_available_workers) == N:
break
if _g_available_workers and workload is not None:
dest_rank = _g_available_workers.pop()
# Send off job
task = dict(func=f, input_data=workload, job_index=job_index, unpack=star)
MPI.COMM_WORLD.send(task, dest=dest_rank, tag=10)
def imap(f, workloads, star=False):
global _g_available_workers, _g_initialized
from mpi4py import MPI
N = MPI.COMM_WORLD.Get_size() - 1
if N == 0 or not _g_initialized:
mapf = [map, itr.starmap][star]
for res in mapf(f, workloads):
yield res
return
results = []
indices = []
for job_index, workload in enumerate(itr.chain(workloads, itr.repeat(None))):
if workload is None and len(_g_available_workers) == N:
break
while not _g_available_workers or workload is None:
# Wait to receive results
status = MPI.Status()
ret = MPI.COMM_WORLD.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
if status.tag == 2:
results.append(ret['output_data'])
indices.append(ret['job_index'])
_g_available_workers.add(status.source)
if len(_g_available_workers) == N:
break
if _g_available_workers and workload is not None:
dest_rank = _g_available_workers.pop()
# Send off job
task = dict(func=f, input_data=workload, job_index=job_index, unpack=star)
MPI.COMM_WORLD.send(task, dest=dest_rank, tag=10)
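    # All jobs have completed; yield the results in submission order.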
II = np.argsort(indices)
for i in II:
yield results[i]
def starmap(f, workloads):
return imap(f, workloads, star=True)
def starmap_unordered(f, workloads):
return imap_unordered(f, workloads, star=True)
def worker():
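    # Loop run by every non-zero rank: block on messages from rank 0, execute
    # each received task, send the result back, and exit on the kill tag.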
from mpi4py import MPI
while True:
status = MPI.Status()
ret = MPI.COMM_WORLD.recv(source=0, tag=MPI.ANY_TAG, status=status)
if status.tag == 10:
# Workload received
func = ret['func']
if ret.get('unpack'):
res = func(*ret['input_data'])
else:
res = func(ret['input_data'])
# Done, let's send it back
MPI.COMM_WORLD.send(dict(job_index=ret['job_index'], output_data=res), dest=0, tag=2)
elif status.tag == 666:
# Kill code
sys.exit(0)
def main(name=None):
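    # Rank 0 acts as the task master and returns True so the caller's main
    # block runs; every other rank enters the worker loop and never returns.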
if name is not None and name != '__main__':
return False
from mpi4py import MPI
rank = MPI.COMM_WORLD.Get_rank()
if rank == 0:
_init()
return True
else:
worker()
sys.exit(0)
deepdish-0.3.7/deepdish/six.py 0000644 0001750 0001750 00000063626 13052123256 017450 0 ustar larsson larsson 0000000 0000000 """Utilities for writing code that runs on Python 2 and 3"""
# Copyright (c) 2010-2014 Benjamin Peterson
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import functools
import operator
import sys
import types
__author__ = "Benjamin Peterson "
__version__ = "1.7.3"
# Useful for very coarse version differentiation.
PY2 = sys.version_info[0] == 2
PY3 = sys.version_info[0] == 3
if PY3:
string_types = str,
integer_types = int,
class_types = type,
text_type = str
binary_type = bytes
MAXSIZE = sys.maxsize
else:
string_types = basestring,
integer_types = (int, long)
class_types = (type, types.ClassType)
text_type = unicode
binary_type = str
if sys.platform.startswith("java"):
# Jython always uses 32 bits.
MAXSIZE = int((1 << 31) - 1)
else:
# It's possible to have sizeof(long) != sizeof(Py_ssize_t).
class X(object):
def __len__(self):
return 1 << 31
try:
len(X())
except OverflowError:
# 32-bit
MAXSIZE = int((1 << 31) - 1)
else:
# 64-bit
MAXSIZE = int((1 << 63) - 1)
del X
def _add_doc(func, doc):
"""Add documentation to a function."""
func.__doc__ = doc
def _import_module(name):
"""Import module, returning the module after the last dot."""
__import__(name)
return sys.modules[name]
class _LazyDescr(object):
def __init__(self, name):
self.name = name
def __get__(self, obj, tp):
result = self._resolve()
setattr(obj, self.name, result) # Invokes __set__.
# This is a bit ugly, but it avoids running this again.
delattr(obj.__class__, self.name)
return result
class MovedModule(_LazyDescr):
def __init__(self, name, old, new=None):
super(MovedModule, self).__init__(name)
if PY3:
if new is None:
new = name
self.mod = new
else:
self.mod = old
def _resolve(self):
return _import_module(self.mod)
def __getattr__(self, attr):
_module = self._resolve()
value = getattr(_module, attr)
setattr(self, attr, value)
return value
class _LazyModule(types.ModuleType):
def __init__(self, name):
super(_LazyModule, self).__init__(name)
self.__doc__ = self.__class__.__doc__
def __dir__(self):
attrs = ["__doc__", "__name__"]
attrs += [attr.name for attr in self._moved_attributes]
return attrs
# Subclasses should override this
_moved_attributes = []
class MovedAttribute(_LazyDescr):
def __init__(self, name, old_mod, new_mod, old_attr=None, new_attr=None):
super(MovedAttribute, self).__init__(name)
if PY3:
if new_mod is None:
new_mod = name
self.mod = new_mod
if new_attr is None:
if old_attr is None:
new_attr = name
else:
new_attr = old_attr
self.attr = new_attr
else:
self.mod = old_mod
if old_attr is None:
old_attr = name
self.attr = old_attr
def _resolve(self):
module = _import_module(self.mod)
return getattr(module, self.attr)
class _SixMetaPathImporter(object):
"""
A meta path importer to import six.moves and its submodules.
This class implements a PEP302 finder and loader. It should be compatible
with Python 2.5 and all existing versions of Python3
"""
def __init__(self, six_module_name):
self.name = six_module_name
self.known_modules = {}
def _add_module(self, mod, *fullnames):
for fullname in fullnames:
self.known_modules[self.name + "." + fullname] = mod
def _get_module(self, fullname):
return self.known_modules[self.name + "." + fullname]
def find_module(self, fullname, path=None):
if fullname in self.known_modules:
return self
return None
def __get_module(self, fullname):
try:
return self.known_modules[fullname]
except KeyError:
raise ImportError("This loader does not know module " + fullname)
def load_module(self, fullname):
try:
# in case of a reload
return sys.modules[fullname]
except KeyError:
pass
mod = self.__get_module(fullname)
if isinstance(mod, MovedModule):
mod = mod._resolve()
else:
mod.__loader__ = self
sys.modules[fullname] = mod
return mod
def is_package(self, fullname):
"""
Return true, if the named module is a package.
We need this method to get correct spec objects with
Python 3.4 (see PEP451)
"""
return hasattr(self.__get_module(fullname), "__path__")
def get_code(self, fullname):
"""Return None
Required, if is_package is implemented"""
self.__get_module(fullname) # eventually raises ImportError
return None
get_source = get_code # same as get_code
_importer = _SixMetaPathImporter(__name__)
class _MovedItems(_LazyModule):
"""Lazy loading of moved objects"""
__path__ = [] # mark as package
_moved_attributes = [
MovedAttribute("cStringIO", "cStringIO", "io", "StringIO"),
MovedAttribute("filter", "itertools", "builtins", "ifilter", "filter"),
MovedAttribute("filterfalse", "itertools", "itertools", "ifilterfalse", "filterfalse"),
MovedAttribute("input", "__builtin__", "builtins", "raw_input", "input"),
MovedAttribute("map", "itertools", "builtins", "imap", "map"),
MovedAttribute("range", "__builtin__", "builtins", "xrange", "range"),
MovedAttribute("reload_module", "__builtin__", "imp", "reload"),
MovedAttribute("reduce", "__builtin__", "functools"),
MovedAttribute("StringIO", "StringIO", "io"),
MovedAttribute("UserDict", "UserDict", "collections"),
MovedAttribute("UserList", "UserList", "collections"),
MovedAttribute("UserString", "UserString", "collections"),
MovedAttribute("xrange", "__builtin__", "builtins", "xrange", "range"),
MovedAttribute("zip", "itertools", "builtins", "izip", "zip"),
MovedAttribute("zip_longest", "itertools", "itertools", "izip_longest", "zip_longest"),
MovedModule("builtins", "__builtin__"),
MovedModule("configparser", "ConfigParser"),
MovedModule("copyreg", "copy_reg"),
MovedModule("dbm_gnu", "gdbm", "dbm.gnu"),
MovedModule("_dummy_thread", "dummy_thread", "_dummy_thread"),
MovedModule("http_cookiejar", "cookielib", "http.cookiejar"),
MovedModule("http_cookies", "Cookie", "http.cookies"),
MovedModule("html_entities", "htmlentitydefs", "html.entities"),
MovedModule("html_parser", "HTMLParser", "html.parser"),
MovedModule("http_client", "httplib", "http.client"),
MovedModule("email_mime_multipart", "email.MIMEMultipart", "email.mime.multipart"),
MovedModule("email_mime_text", "email.MIMEText", "email.mime.text"),
MovedModule("email_mime_base", "email.MIMEBase", "email.mime.base"),
MovedModule("BaseHTTPServer", "BaseHTTPServer", "http.server"),
MovedModule("CGIHTTPServer", "CGIHTTPServer", "http.server"),
MovedModule("SimpleHTTPServer", "SimpleHTTPServer", "http.server"),
MovedModule("cPickle", "cPickle", "pickle"),
MovedModule("queue", "Queue"),
MovedModule("reprlib", "repr"),
MovedModule("socketserver", "SocketServer"),
MovedModule("_thread", "thread", "_thread"),
MovedModule("tkinter", "Tkinter"),
MovedModule("tkinter_dialog", "Dialog", "tkinter.dialog"),
MovedModule("tkinter_filedialog", "FileDialog", "tkinter.filedialog"),
MovedModule("tkinter_scrolledtext", "ScrolledText", "tkinter.scrolledtext"),
MovedModule("tkinter_simpledialog", "SimpleDialog", "tkinter.simpledialog"),
MovedModule("tkinter_tix", "Tix", "tkinter.tix"),
MovedModule("tkinter_ttk", "ttk", "tkinter.ttk"),
MovedModule("tkinter_constants", "Tkconstants", "tkinter.constants"),
MovedModule("tkinter_dnd", "Tkdnd", "tkinter.dnd"),
MovedModule("tkinter_colorchooser", "tkColorChooser",
"tkinter.colorchooser"),
MovedModule("tkinter_commondialog", "tkCommonDialog",
"tkinter.commondialog"),
MovedModule("tkinter_tkfiledialog", "tkFileDialog", "tkinter.filedialog"),
MovedModule("tkinter_font", "tkFont", "tkinter.font"),
MovedModule("tkinter_messagebox", "tkMessageBox", "tkinter.messagebox"),
MovedModule("tkinter_tksimpledialog", "tkSimpleDialog",
"tkinter.simpledialog"),
MovedModule("urllib_parse", __name__ + ".moves.urllib_parse", "urllib.parse"),
MovedModule("urllib_error", __name__ + ".moves.urllib_error", "urllib.error"),
MovedModule("urllib", __name__ + ".moves.urllib", __name__ + ".moves.urllib"),
MovedModule("urllib_robotparser", "robotparser", "urllib.robotparser"),
MovedModule("xmlrpc_client", "xmlrpclib", "xmlrpc.client"),
MovedModule("xmlrpc_server", "SimpleXMLRPCServer", "xmlrpc.server"),
MovedModule("winreg", "_winreg"),
]
for attr in _moved_attributes:
setattr(_MovedItems, attr.name, attr)
if isinstance(attr, MovedModule):
_importer._add_module(attr, "moves." + attr.name)
del attr
_MovedItems._moved_attributes = _moved_attributes
moves = _MovedItems(__name__ + ".moves")
_importer._add_module(moves, "moves")
class Module_six_moves_urllib_parse(_LazyModule):
"""Lazy loading of moved objects in six.moves.urllib_parse"""
_urllib_parse_moved_attributes = [
MovedAttribute("ParseResult", "urlparse", "urllib.parse"),
MovedAttribute("SplitResult", "urlparse", "urllib.parse"),
MovedAttribute("parse_qs", "urlparse", "urllib.parse"),
MovedAttribute("parse_qsl", "urlparse", "urllib.parse"),
MovedAttribute("urldefrag", "urlparse", "urllib.parse"),
MovedAttribute("urljoin", "urlparse", "urllib.parse"),
MovedAttribute("urlparse", "urlparse", "urllib.parse"),
MovedAttribute("urlsplit", "urlparse", "urllib.parse"),
MovedAttribute("urlunparse", "urlparse", "urllib.parse"),
MovedAttribute("urlunsplit", "urlparse", "urllib.parse"),
MovedAttribute("quote", "urllib", "urllib.parse"),
MovedAttribute("quote_plus", "urllib", "urllib.parse"),
MovedAttribute("unquote", "urllib", "urllib.parse"),
MovedAttribute("unquote_plus", "urllib", "urllib.parse"),
MovedAttribute("urlencode", "urllib", "urllib.parse"),
MovedAttribute("splitquery", "urllib", "urllib.parse"),
]
for attr in _urllib_parse_moved_attributes:
setattr(Module_six_moves_urllib_parse, attr.name, attr)
del attr
Module_six_moves_urllib_parse._moved_attributes = _urllib_parse_moved_attributes
_importer._add_module(Module_six_moves_urllib_parse(__name__ + ".moves.urllib_parse"),
"moves.urllib_parse", "moves.urllib.parse")
class Module_six_moves_urllib_error(_LazyModule):
"""Lazy loading of moved objects in six.moves.urllib_error"""
_urllib_error_moved_attributes = [
MovedAttribute("URLError", "urllib2", "urllib.error"),
MovedAttribute("HTTPError", "urllib2", "urllib.error"),
MovedAttribute("ContentTooShortError", "urllib", "urllib.error"),
]
for attr in _urllib_error_moved_attributes:
setattr(Module_six_moves_urllib_error, attr.name, attr)
del attr
Module_six_moves_urllib_error._moved_attributes = _urllib_error_moved_attributes
_importer._add_module(Module_six_moves_urllib_error(__name__ + ".moves.urllib.error"),
"moves.urllib_error", "moves.urllib.error")
class Module_six_moves_urllib_request(_LazyModule):
"""Lazy loading of moved objects in six.moves.urllib_request"""
_urllib_request_moved_attributes = [
MovedAttribute("urlopen", "urllib2", "urllib.request"),
MovedAttribute("install_opener", "urllib2", "urllib.request"),
MovedAttribute("build_opener", "urllib2", "urllib.request"),
MovedAttribute("pathname2url", "urllib", "urllib.request"),
MovedAttribute("url2pathname", "urllib", "urllib.request"),
MovedAttribute("getproxies", "urllib", "urllib.request"),
MovedAttribute("Request", "urllib2", "urllib.request"),
MovedAttribute("OpenerDirector", "urllib2", "urllib.request"),
MovedAttribute("HTTPDefaultErrorHandler", "urllib2", "urllib.request"),
MovedAttribute("HTTPRedirectHandler", "urllib2", "urllib.request"),
MovedAttribute("HTTPCookieProcessor", "urllib2", "urllib.request"),
MovedAttribute("ProxyHandler", "urllib2", "urllib.request"),
MovedAttribute("BaseHandler", "urllib2", "urllib.request"),
MovedAttribute("HTTPPasswordMgr", "urllib2", "urllib.request"),
MovedAttribute("HTTPPasswordMgrWithDefaultRealm", "urllib2", "urllib.request"),
MovedAttribute("AbstractBasicAuthHandler", "urllib2", "urllib.request"),
MovedAttribute("HTTPBasicAuthHandler", "urllib2", "urllib.request"),
MovedAttribute("ProxyBasicAuthHandler", "urllib2", "urllib.request"),
MovedAttribute("AbstractDigestAuthHandler", "urllib2", "urllib.request"),
MovedAttribute("HTTPDigestAuthHandler", "urllib2", "urllib.request"),
MovedAttribute("ProxyDigestAuthHandler", "urllib2", "urllib.request"),
MovedAttribute("HTTPHandler", "urllib2", "urllib.request"),
MovedAttribute("HTTPSHandler", "urllib2", "urllib.request"),
MovedAttribute("FileHandler", "urllib2", "urllib.request"),
MovedAttribute("FTPHandler", "urllib2", "urllib.request"),
MovedAttribute("CacheFTPHandler", "urllib2", "urllib.request"),
MovedAttribute("UnknownHandler", "urllib2", "urllib.request"),
MovedAttribute("HTTPErrorProcessor", "urllib2", "urllib.request"),
MovedAttribute("urlretrieve", "urllib", "urllib.request"),
MovedAttribute("urlcleanup", "urllib", "urllib.request"),
MovedAttribute("URLopener", "urllib", "urllib.request"),
MovedAttribute("FancyURLopener", "urllib", "urllib.request"),
MovedAttribute("proxy_bypass", "urllib", "urllib.request"),
]
for attr in _urllib_request_moved_attributes:
setattr(Module_six_moves_urllib_request, attr.name, attr)
del attr
Module_six_moves_urllib_request._moved_attributes = _urllib_request_moved_attributes
_importer._add_module(Module_six_moves_urllib_request(__name__ + ".moves.urllib.request"),
"moves.urllib_request", "moves.urllib.request")
class Module_six_moves_urllib_response(_LazyModule):
"""Lazy loading of moved objects in six.moves.urllib_response"""
_urllib_response_moved_attributes = [
MovedAttribute("addbase", "urllib", "urllib.response"),
MovedAttribute("addclosehook", "urllib", "urllib.response"),
MovedAttribute("addinfo", "urllib", "urllib.response"),
MovedAttribute("addinfourl", "urllib", "urllib.response"),
]
for attr in _urllib_response_moved_attributes:
setattr(Module_six_moves_urllib_response, attr.name, attr)
del attr
Module_six_moves_urllib_response._moved_attributes = _urllib_response_moved_attributes
_importer._add_module(Module_six_moves_urllib_response(__name__ + ".moves.urllib.response"),
"moves.urllib_response", "moves.urllib.response")
class Module_six_moves_urllib_robotparser(_LazyModule):
"""Lazy loading of moved objects in six.moves.urllib_robotparser"""
_urllib_robotparser_moved_attributes = [
MovedAttribute("RobotFileParser", "robotparser", "urllib.robotparser"),
]
for attr in _urllib_robotparser_moved_attributes:
setattr(Module_six_moves_urllib_robotparser, attr.name, attr)
del attr
Module_six_moves_urllib_robotparser._moved_attributes = _urllib_robotparser_moved_attributes
_importer._add_module(Module_six_moves_urllib_robotparser(__name__ + ".moves.urllib.robotparser"),
"moves.urllib_robotparser", "moves.urllib.robotparser")
class Module_six_moves_urllib(types.ModuleType):
"""Create a six.moves.urllib namespace that resembles the Python 3 namespace"""
__path__ = [] # mark as package
parse = _importer._get_module("moves.urllib_parse")
error = _importer._get_module("moves.urllib_error")
request = _importer._get_module("moves.urllib_request")
response = _importer._get_module("moves.urllib_response")
robotparser = _importer._get_module("moves.urllib_robotparser")
def __dir__(self):
return ['parse', 'error', 'request', 'response', 'robotparser']
_importer._add_module(Module_six_moves_urllib(__name__ + ".moves.urllib"),
"moves.urllib")
def add_move(move):
"""Add an item to six.moves."""
setattr(_MovedItems, move.name, move)
def remove_move(name):
"""Remove item from six.moves."""
try:
delattr(_MovedItems, name)
except AttributeError:
try:
del moves.__dict__[name]
except KeyError:
raise AttributeError("no such move, %r" % (name,))
if PY3:
_meth_func = "__func__"
_meth_self = "__self__"
_func_closure = "__closure__"
_func_code = "__code__"
_func_defaults = "__defaults__"
_func_globals = "__globals__"
else:
_meth_func = "im_func"
_meth_self = "im_self"
_func_closure = "func_closure"
_func_code = "func_code"
_func_defaults = "func_defaults"
_func_globals = "func_globals"
try:
advance_iterator = next
except NameError:
def advance_iterator(it):
return it.next()
next = advance_iterator
try:
callable = callable
except NameError:
def callable(obj):
return any("__call__" in klass.__dict__ for klass in type(obj).__mro__)
if PY3:
def get_unbound_function(unbound):
return unbound
create_bound_method = types.MethodType
Iterator = object
else:
def get_unbound_function(unbound):
return unbound.im_func
def create_bound_method(func, obj):
return types.MethodType(func, obj, obj.__class__)
class Iterator(object):
def next(self):
return type(self).__next__(self)
callable = callable
_add_doc(get_unbound_function,
"""Get the function out of a possibly unbound function""")
get_method_function = operator.attrgetter(_meth_func)
get_method_self = operator.attrgetter(_meth_self)
get_function_closure = operator.attrgetter(_func_closure)
get_function_code = operator.attrgetter(_func_code)
get_function_defaults = operator.attrgetter(_func_defaults)
get_function_globals = operator.attrgetter(_func_globals)
if PY3:
def iterkeys(d, **kw):
return iter(d.keys(**kw))
def itervalues(d, **kw):
return iter(d.values(**kw))
def iteritems(d, **kw):
return iter(d.items(**kw))
def iterlists(d, **kw):
return iter(d.lists(**kw))
else:
def iterkeys(d, **kw):
return iter(d.iterkeys(**kw))
def itervalues(d, **kw):
return iter(d.itervalues(**kw))
def iteritems(d, **kw):
return iter(d.iteritems(**kw))
def iterlists(d, **kw):
return iter(d.iterlists(**kw))
_add_doc(iterkeys, "Return an iterator over the keys of a dictionary.")
_add_doc(itervalues, "Return an iterator over the values of a dictionary.")
_add_doc(iteritems,
"Return an iterator over the (key, value) pairs of a dictionary.")
_add_doc(iterlists,
"Return an iterator over the (key, [values]) pairs of a dictionary.")
if PY3:
def b(s):
return s.encode("latin-1")
def u(s):
return s
unichr = chr
if sys.version_info[1] <= 1:
def int2byte(i):
return bytes((i,))
else:
# This is about 2x faster than the implementation above on 3.2+
int2byte = operator.methodcaller("to_bytes", 1, "big")
byte2int = operator.itemgetter(0)
indexbytes = operator.getitem
iterbytes = iter
import io
StringIO = io.StringIO
BytesIO = io.BytesIO
else:
def b(s):
return s
# Workaround for standalone backslash
def u(s):
return unicode(s.replace(r'\\', r'\\\\'), "unicode_escape")
unichr = unichr
int2byte = chr
def byte2int(bs):
return ord(bs[0])
def indexbytes(buf, i):
return ord(buf[i])
def iterbytes(buf):
return (ord(byte) for byte in buf)
import StringIO
StringIO = BytesIO = StringIO.StringIO
_add_doc(b, """Byte literal""")
_add_doc(u, """Text literal""")
if PY3:
exec_ = getattr(moves.builtins, "exec")
def reraise(tp, value, tb=None):
if value.__traceback__ is not tb:
raise value.with_traceback(tb)
raise value
else:
def exec_(_code_, _globs_=None, _locs_=None):
"""Execute code in a namespace."""
if _globs_ is None:
frame = sys._getframe(1)
_globs_ = frame.f_globals
if _locs_ is None:
_locs_ = frame.f_locals
del frame
elif _locs_ is None:
_locs_ = _globs_
exec("""exec _code_ in _globs_, _locs_""")
exec_("""def reraise(tp, value, tb=None):
raise tp, value, tb
""")
print_ = getattr(moves.builtins, "print", None)
if print_ is None:
def print_(*args, **kwargs):
"""The new-style print function for Python 2.4 and 2.5."""
fp = kwargs.pop("file", sys.stdout)
if fp is None:
return
def write(data):
if not isinstance(data, basestring):
data = str(data)
# If the file has an encoding, encode unicode with it.
if (isinstance(fp, file) and
isinstance(data, unicode) and
fp.encoding is not None):
errors = getattr(fp, "errors", None)
if errors is None:
errors = "strict"
data = data.encode(fp.encoding, errors)
fp.write(data)
want_unicode = False
sep = kwargs.pop("sep", None)
if sep is not None:
if isinstance(sep, unicode):
want_unicode = True
elif not isinstance(sep, str):
raise TypeError("sep must be None or a string")
end = kwargs.pop("end", None)
if end is not None:
if isinstance(end, unicode):
want_unicode = True
elif not isinstance(end, str):
raise TypeError("end must be None or a string")
if kwargs:
raise TypeError("invalid keyword arguments to print()")
if not want_unicode:
for arg in args:
if isinstance(arg, unicode):
want_unicode = True
break
if want_unicode:
newline = unicode("\n")
space = unicode(" ")
else:
newline = "\n"
space = " "
if sep is None:
sep = space
if end is None:
end = newline
for i, arg in enumerate(args):
if i:
write(sep)
write(arg)
write(end)
_add_doc(reraise, """Reraise an exception.""")
if sys.version_info[0:2] < (3, 4):
def wraps(wrapped):
def wrapper(f):
f = functools.wraps(wrapped)(f)
f.__wrapped__ = wrapped
return f
return wrapper
else:
wraps = functools.wraps
def with_metaclass(meta, *bases):
"""Create a base class with a metaclass."""
# This requires a bit of explanation: the basic idea is to make a dummy
# metaclass for one level of class instantiation that replaces itself with
# the actual metaclass.
class metaclass(meta):
def __new__(cls, name, this_bases, d):
return meta(name, bases, d)
return type.__new__(metaclass, 'temporary_class', (), {})
def add_metaclass(metaclass):
"""Class decorator for creating a class with a metaclass."""
def wrapper(cls):
orig_vars = cls.__dict__.copy()
orig_vars.pop('__dict__', None)
orig_vars.pop('__weakref__', None)
slots = orig_vars.get('__slots__')
if slots is not None:
if isinstance(slots, str):
slots = [slots]
for slots_var in slots:
orig_vars.pop(slots_var)
return metaclass(cls.__name__, cls.__bases__, orig_vars)
return wrapper
# Complete the moves implementation.
# This code is at the end of this module to speed up module loading.
# Turn this module into a package.
__path__ = [] # required for PEP 302 and PEP 451
__package__ = __name__ # see PEP 366 @ReservedAssignment
if globals().get("__spec__") is not None:
__spec__.submodule_search_locations = [] # PEP 451 @UndefinedVariable
# Remove other six meta path importers, since they cause problems. This can
# happen if six is removed from sys.modules and then reloaded. (Setuptools does
# this for some reason.)
if sys.meta_path:
for i, importer in enumerate(sys.meta_path):
# Here's some real nastiness: Another "instance" of the six module might
# be floating around. Therefore, we can't use isinstance() to check for
# the six meta path importer, since the other six instance will have
# inserted an importer with different class.
if (type(importer).__name__ == "_SixMetaPathImporter" and
importer.name == __name__):
del sys.meta_path[i]
break
del i, importer
# Finally, add the importer to the meta path import hook.
sys.meta_path.append(_importer)
deepdish-0.3.7/deepdish/tests/ 0000755 0001750 0001750 00000000000 14123256273 017426 5 ustar larsson larsson 0000000 0000000 deepdish-0.3.7/deepdish/tests/__init__.py 0000644 0001750 0001750 00000000000 13052123256 021517 0 ustar larsson larsson 0000000 0000000 deepdish-0.3.7/deepdish/tests/test_core.py 0000644 0001750 0001750 00000004074 13052123256 021766 0 ustar larsson larsson 0000000 0000000 from __future__ import division, print_function, absolute_import
import unittest
from tempfile import NamedTemporaryFile
import os
import numpy as np
import deepdish as dd
from contextlib import contextmanager
class TestCore(unittest.TestCase):
def test_multi_range(self):
x0 = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
x1 = list(dd.multi_range(2, 3))
assert x0 == x1
def test_bytesize(self):
assert dd.humanize_bytesize(1) == '1 B'
assert dd.humanize_bytesize(2 * 1024) == '2 KB'
assert dd.humanize_bytesize(3 * 1024**2) == '3 MB'
assert dd.humanize_bytesize(4 * 1024**3) == '4 GB'
assert dd.humanize_bytesize(5 * 1024**4) == '5 TB'
assert dd.bytesize(np.ones((5, 2), dtype=np.int16)) == 20
assert dd.memsize(np.ones((5, 2), dtype=np.int16)) == '20 B'
def test_span(self):
assert dd.span(np.array([0, -10, 20])) == (-10, 20)
def test_apply_once(self):
x = np.arange(3 * 4 * 5).reshape((3, 4, 5))
np.testing.assert_array_almost_equal(dd.apply_once(np.std, x, [0, -1]),
16.39105447 * np.ones((1, 4, 1)))
x = np.arange(2 * 3).reshape((2, 3))
np.testing.assert_array_equal(dd.apply_once(np.sum, x, 1, keepdims=False),
np.array([3, 12]))
def test_tupled_argmax(self):
x = np.zeros((3, 4, 5))
x[1, 2, 3] = 10
assert dd.tupled_argmax(x) == (1, 2, 3)
def test_slice(self):
s = [slice(None, 3), slice(None), slice(2, None), slice(3, 4), Ellipsis, [1, 2, 3]]
        assert dd.aslice[:3, :, 2:, 3:4, ..., [1, 2, 3]] == tuple(s)
def test_timed(self):
# These tests only make sure it does not cause errors
with dd.timed():
pass
times = []
with dd.timed(callback=times.append):
pass
assert len(times) == 1
x = np.zeros(1)
x[:] = np.nan
with dd.timed(file=x):
pass
assert not np.isnan(x[0])
if __name__ == '__main__':
unittest.main()
deepdish-0.3.7/deepdish/tests/test_io.py 0000644 0001750 0001750 00000030322 14123254760 021445 0 ustar larsson larsson 0000000 0000000 from __future__ import division, print_function, absolute_import
import unittest
from tempfile import NamedTemporaryFile
import os
import numpy as np
import deepdish as dd
import pandas as pd
from contextlib import contextmanager
try:
from types import SimpleNamespace
_sns = True
except ImportError:
_sns = False
@contextmanager
def tmp_filename():
f = NamedTemporaryFile(delete=False)
yield f.name
f.close()
os.unlink(f.name)
@contextmanager
def tmp_file():
f = NamedTemporaryFile(delete=False)
yield f
f.close()
os.unlink(f.name)
def reconstruct(fn, x):
dd.io.save(fn, x)
return dd.io.load(fn)
def assert_array(fn, x):
dd.io.save(fn, x)
x1 = dd.io.load(fn)
np.testing.assert_array_equal(x, x1)
class TestIO(unittest.TestCase):
def test_basic_data_types(self):
with tmp_filename() as fn:
x = 100
x1 = reconstruct(fn, x)
assert x == x1
x = 1.23
x1 = reconstruct(fn, x)
assert x == x1
            # Complex Python scalars don't work - complex numpy arrays do,
            # however
#x = 1.23 + 2.3j
#x1 = reconstruct(fn, x)
#assert x == x1
x = u'this is a string'
x1 = reconstruct(fn, x)
assert x == x1
x = b'this is a bytearray'
x1 = reconstruct(fn, x)
assert x == x1
x = None
x1 = reconstruct(fn, x)
assert x1 is None
def test_big_integers(self):
with tmp_filename() as fn:
x = 1239487239847234982392837423874
x1 = reconstruct(fn, x)
assert x == x1
def test_numpy_array(self):
with tmp_filename() as fn:
x0 = np.arange(3 * 4 * 5, dtype=np.int64).reshape((3, 4, 5))
assert_array(fn, x0)
x0 = x0.astype(np.float32)
assert_array(fn, x0)
x0 = x0.astype(np.uint8)
assert_array(fn, x0)
x0 = x0.astype(np.complex128)
x0[0] = 1 + 2j
assert_array(fn, x0)
def test_numpy_array_zero_size(self):
# Arrays where one of the axes is length 0. These zero-length arrays cannot
# be stored natively in HDF5, so we'll have to store only the shape
with tmp_filename() as fn:
x0 = np.arange(0, dtype=np.int64)
assert_array(fn, x0)
x0 = np.arange(0, dtype=np.float32).reshape((10, 20, 0))
assert_array(fn, x0)
x0 = np.arange(0, dtype=np.complex128).reshape((0, 5, 0))
assert_array(fn, x0)
def test_numpy_string_array(self):
with tmp_filename() as fn:
x0 = np.array([[b'this', b'string'], [b'foo', b'bar']])
assert_array(fn, x0)
x0 = np.array([[u'this', u'string'], [u'foo', u'bar']])
assert_array(fn, x0)
def test_dictionary(self):
with tmp_filename() as fn:
d = dict(a=100, b='this is a string', c=np.ones(5),
sub=dict(a=200, b='another string',
c=np.random.randn(3, 4)))
d1 = reconstruct(fn, d)
assert d['a'] == d1['a']
assert d['b'] == d1['b']
np.testing.assert_array_equal(d['c'], d1['c'])
assert d['sub']['a'] == d1['sub']['a']
assert d['sub']['b'] == d1['sub']['b']
np.testing.assert_array_equal(d['sub']['c'], d1['sub']['c'])
def test_simplenamespace(self):
if _sns:
with tmp_filename() as fn:
d = SimpleNamespace(
a=100, b='this is a string', c=np.ones(5),
sub=SimpleNamespace(a=200, b='another string',
c=np.random.randn(3, 4)))
d1 = reconstruct(fn, d)
assert d.a == d1.a
assert d.b == d1.b
np.testing.assert_array_equal(d.c, d1.c)
assert d.sub.a == d1.sub.a
assert d.sub.b == d1.sub.b
np.testing.assert_array_equal(d.sub.c, d1.sub.c)
def test_softlinks_recursion(self):
with tmp_filename() as fn:
A = np.random.randn(3, 3)
df = pd.DataFrame({'int': np.arange(3),
'name': ['zero', 'one', 'two']})
AA = 4
s = dict(A=A, B=A, c=A, d=A, f=A, g=[A, A, A], AA=AA, h=AA,
df=df, df2=df)
s['g'].append(s)
n = reconstruct(fn, s)
assert n['g'][0] is n['A']
assert (n['A'] is n['B'] is n['c'] is n['d'] is n['f'] is
n['g'][0] is n['g'][1] is n['g'][2])
assert n['g'][3] is n
assert n['AA'] == AA == n['h']
assert n['df'] is n['df2']
assert (n['df'] == df).all().all()
# test 'sel' option on link ... need to read two vars
# to ensure at least one is a link:
col1 = dd.io.load(fn, '/A', dd.aslice[:, 1])
assert np.all(A[:, 1] == col1)
col1 = dd.io.load(fn, '/B', dd.aslice[:, 1])
assert np.all(A[:, 1] == col1)
def test_softlinks_recursion_sns(self):
if _sns:
with tmp_filename() as fn:
A = np.random.randn(3, 3)
AA = 4
s = SimpleNamespace(A=A, B=A, c=A, d=A, f=A,
g=[A, A, A], AA=AA, h=AA)
s.g.append(s)
n = reconstruct(fn, s)
assert n.g[0] is n.A
assert (n.A is n.B is n.c is n.d is n.f is
n.g[0] is n.g[1] is n.g[2])
assert n.g[3] is n
assert n.AA == AA == n.h
def test_pickle_recursion(self):
with tmp_filename() as fn:
f = {4: 78}
f['rec'] = f
g = [23.4, f]
h = dict(f=f, g=g)
h2 = reconstruct(fn, h)
assert h2['g'][0] == 23.4
assert h2['g'][1] is h2['f']['rec'] is h2['f']
assert h2['f'][4] == 78
def test_list_recursion(self):
with tmp_filename() as fn:
lst = [1, 3]
inlst = ['inside', 'list', lst]
inlst.append(inlst)
lst.append(lst)
lst.append(inlst)
lst2 = reconstruct(fn, lst)
assert lst2[2] is lst2
assert lst2[3][2] is lst2
assert lst[3][2] is lst
assert lst2[3][3] is lst2[3]
assert lst[3][3] is lst[3]
def test_list(self):
with tmp_filename() as fn:
x = [100, 'this is a string', np.ones(3), dict(foo=100)]
x1 = reconstruct(fn, x)
assert isinstance(x1, list)
assert x[0] == x1[0]
assert x[1] == x1[1]
np.testing.assert_array_equal(x[2], x1[2])
assert x[3]['foo'] == x1[3]['foo']
def test_tuple(self):
with tmp_filename() as fn:
x = (100, 'this is a string', np.ones(3), dict(foo=100))
x1 = reconstruct(fn, x)
assert isinstance(x1, tuple)
assert x[0] == x1[0]
assert x[1] == x1[1]
np.testing.assert_array_equal(x[2], x1[2])
assert x[3]['foo'] == x1[3]['foo']
def test_sparse_matrices(self):
import scipy.sparse as S
with tmp_filename() as fn:
x = S.lil_matrix((50, 70))
x[34, 37] = 1
x[34, 39] = 2.5
x[34, 41] = -2
x[38, 41] = -1
x1 = reconstruct(fn, x.tocsr())
assert x.shape == x1.shape
np.testing.assert_array_equal(x.todense(), x1.todense())
x1 = reconstruct(fn, x.tocsc())
assert x.shape == x1.shape
np.testing.assert_array_equal(x.todense(), x1.todense())
x1 = reconstruct(fn, x.tocoo())
assert x.shape == x1.shape
np.testing.assert_array_equal(x.todense(), x1.todense())
x1 = reconstruct(fn, x.todia())
assert x.shape == x1.shape
np.testing.assert_array_equal(x.todense(), x1.todense())
x1 = reconstruct(fn, x.tobsr())
assert x.shape == x1.shape
np.testing.assert_array_equal(x.todense(), x1.todense())
def test_array_scalar(self):
with tmp_filename() as fn:
v = np.array(12.3)
v1 = reconstruct(fn, v)
assert v1[()] == v and isinstance(v1[()], np.float64)
v = np.array(40, dtype=np.int8)
v1 = reconstruct(fn, v)
assert v1[()] == v and isinstance(v1[()], np.int8)
def test_load_group(self):
with tmp_filename() as fn:
x = dict(one=np.ones(10), two='string')
dd.io.save(fn, x)
one = dd.io.load(fn, '/one')
np.testing.assert_array_equal(one, x['one'])
two = dd.io.load(fn, '/two')
assert two == x['two']
full = dd.io.load(fn, '/')
np.testing.assert_array_equal(x['one'], full['one'])
assert x['two'] == full['two']
def test_load_multiple_groups(self):
with tmp_filename() as fn:
x = dict(one=np.ones(10), two='string', three=200)
dd.io.save(fn, x)
one, three = dd.io.load(fn, ['/one', '/three'])
np.testing.assert_array_equal(one, x['one'])
assert three == x['three']
three, two = dd.io.load(fn, ['/three', '/two'])
assert three == x['three']
assert two == x['two']
def test_load_slice(self):
with tmp_filename() as fn:
x = np.arange(3 * 4 * 5).reshape((3, 4, 5))
dd.io.save(fn, dict(x=x))
s = dd.aslice[:2]
xs = dd.io.load(fn, '/x', sel=s)
np.testing.assert_array_equal(xs, x[s])
s = dd.aslice[:, 1:3]
xs = dd.io.load(fn, '/x', sel=s)
np.testing.assert_array_equal(xs, x[s])
xs = dd.io.load(fn, sel=s, unpack=True)
np.testing.assert_array_equal(xs, x[s])
dd.io.save(fn, x)
xs = dd.io.load(fn, sel=s)
np.testing.assert_array_equal(xs, x[s])
    def test_nested_dict(self):
with tmp_filename() as fn:
x = dict(one=dict(two=np.arange(10)),
three='string')
xf = dict(one=dict(two=x['one']['two']),
three=x['three'])
dd.io.save(fn, xf)
xs = dd.io.load(fn)
np.testing.assert_array_equal(x['one']['two'], xs['one']['two'])
assert x['three'] == xs['three']
# Try direct loading one
two = dd.io.load(fn, '/one/two')
np.testing.assert_array_equal(x['one']['two'], two)
def test_non_string_key_dict(self):
with tmp_filename() as fn:
# These will be pickled, but it should still work
x = {0: 'zero', 1: 'one', 2: 'two'}
x1 = reconstruct(fn, x)
assert x == x1
x = {1+1j: 'zero', b'test': 'one', (1, 2): 'two'}
x1 = reconstruct(fn, x)
assert x == x1
def test_force_pickle(self):
with tmp_filename() as fn:
x = {0: 'zero', 1: 'one', 2: 'two'}
fx = dd.io.ForcePickle(x)
d = dict(foo=x, bar=100)
fd = dict(foo=fx, bar=100)
d1 = reconstruct(fn, fd)
assert d == d1
def test_pandas_dataframe(self):
with tmp_filename() as fn:
            # DataFrames are stored natively via pandas' HDF5 support
df = pd.DataFrame({'int': np.arange(3), 'name': ['zero', 'one', 'two']})
df1 = reconstruct(fn, df)
assert (df == df1).all().all()
def test_pandas_series(self):
rs = np.random.RandomState(1234)
with tmp_filename() as fn:
s = pd.Series(rs.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s1 = reconstruct(fn, s)
assert (s == s1).all()
def test_compression_true(self):
rs = np.random.RandomState(1234)
with tmp_filename() as fn:
x = rs.normal(size=(1000, 5))
for comp in [None, True, 'blosc', 'zlib', ('zlib', 5)]:
dd.io.save(fn, x, compression=comp)
x1 = dd.io.load(fn)
assert (x == x1).all()
if __name__ == '__main__':
unittest.main()
deepdish-0.3.7/deepdish/tests/test_util.py 0000644 0001750 0001750 00000004343 13052123256 022012 0 ustar larsson larsson 0000000 0000000 from __future__ import division, print_function, absolute_import
import unittest
import os
import numpy as np
import deepdish as dd
class TestUtil(unittest.TestCase):
def test_pad(self):
x = np.ones((2, 2))
y = np.array([[0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 0],
[0, 0, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0]])
y1 = dd.util.pad(x, (1, 2), value=0.0)
np.testing.assert_array_equal(y, y1)
x = np.ones((2, 2))
y = np.array([[2, 2, 2, 2],
[2, 1, 1, 2],
[2, 1, 1, 2],
[2, 2, 2, 2]])
y1 = dd.util.pad(x, 1, value=2.0)
np.testing.assert_array_equal(y, y1)
def test_pad_to_size(self):
x = np.ones((2, 2))
y = np.array([[1, 1, 0],
[1, 1, 0],
[0, 0, 0]])
y1 = dd.util.pad_to_size(x, (3, 3), value=0.0)
np.testing.assert_array_equal(y, y1)
def test_pad_repeat_border(self):
x = np.array([[1.0, 2.0],
[3.0, 4.0]])
y = np.array([[1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
[1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
[1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
[3.0, 3.0, 3.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 4.0, 4.0, 4.0]])
y1 = dd.util.pad_repeat_border(x, 2)
np.testing.assert_array_equal(y, y1)
y = np.array([[1.0, 2.0],
[1.0, 2.0],
[1.0, 2.0],
[3.0, 4.0],
[3.0, 4.0],
[3.0, 4.0]])
y1 = dd.util.pad_repeat_border(x, (2, 0))
np.testing.assert_array_equal(y, y1)
def test_pad_repeat_border_corner(self):
x = np.array([[1.0, 2.0],
[3.0, 4.0]])
y = np.array([[1.0, 2.0, 2.0, 2.0],
[3.0, 4.0, 4.0, 4.0],
[3.0, 4.0, 4.0, 4.0],
[3.0, 4.0, 4.0, 4.0]])
y1 = dd.util.pad_repeat_border_corner(x, (4, 4))
np.testing.assert_array_equal(y, y1)
if __name__ == '__main__':
unittest.main()
deepdish-0.3.7/deepdish/util/ 0000755 0001750 0001750 00000000000 14123256273 017241 5 ustar larsson larsson 0000000 0000000 deepdish-0.3.7/deepdish/util/__init__.py 0000644 0001750 0001750 00000001000 13052123256 021333 0 ustar larsson larsson 0000000 0000000 from __future__ import division, print_function, absolute_import
from .padding import (pad, pad_to_size, pad_repeat_border,
pad_repeat_border_corner)
from .saveable import Saveable, NamedRegistry, SaveableRegistry
from .zca_whitening import whiten, zca_whitening_matrix, apply_whitening_matrix
__all__ = [
'pad',
'pad_to_size',
'pad_repeat_border',
'pad_repeat_border_corner',
'Saveable',
'NamedRegistry',
'SaveableRegistry',
'whiten',
'zca_whitening_matrix',
'apply_whitening_matrix',
]
deepdish-0.3.7/deepdish/util/padding.py 0000644 0001750 0001750 00000014215 13052123256 021216 0 ustar larsson larsson 0000000 0000000 from __future__ import division, print_function, absolute_import
import numpy as np
def pad(data, padwidth, value=0.0):
"""
Pad an array with a specific value.
Parameters
----------
data : ndarray
Numpy array of any dimension and type.
padwidth : int or tuple
If int, it will pad using this amount at the beginning and end of all
dimensions. If it is a tuple (of same length as `ndim`), then the
padding amount will be specified per axis.
value : data.dtype
The value with which to pad. Default is ``0.0``.
See also
--------
pad_to_size, pad_repeat_border, pad_repeat_border_corner
Examples
--------
>>> import deepdish as dd
>>> import numpy as np
Pad an array with zeros.
>>> x = np.ones((3, 3))
>>> dd.util.pad(x, (1, 2), value=0.0)
array([[ 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 1., 1., 1., 0., 0.],
[ 0., 0., 1., 1., 1., 0., 0.],
[ 0., 0., 1., 1., 1., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0.]])
"""
data = np.asarray(data)
shape = data.shape
if isinstance(padwidth, int):
padwidth = (padwidth,)*len(shape)
padded_shape = tuple(map(lambda ix: ix[1]+padwidth[ix[0]]*2,
enumerate(shape)))
new_data = np.empty(padded_shape, dtype=data.dtype)
new_data[..., :] = value
    new_data[tuple(slice(w, -w) if w > 0 else slice(None)
                   for w in padwidth)] = data
return new_data
def pad_to_size(data, shape, value=0.0):
"""
This is similar to `pad`, except you specify the final shape of the array.
Parameters
----------
data : ndarray
Numpy array of any dimension and type.
shape : tuple
Final shape of padded array. Should be tuple of length ``data.ndim``.
If it has to pad unevenly, it will pad one more at the end of the axis
than at the beginning. If a dimension is specified as ``-1``, then it
will remain its current size along that dimension.
value : data.dtype
The value with which to pad. Default is ``0.0``. This can even be an
array, as long as ``pdata[:] = value`` is valid, where ``pdata`` is the
size of the padded array.
Examples
--------
>>> import deepdish as dd
>>> import numpy as np
Pad an array with zeros.
>>> x = np.ones((4, 2))
>>> dd.util.pad_to_size(x, (5, 5))
array([[ 0., 1., 1., 0., 0.],
[ 0., 1., 1., 0., 0.],
[ 0., 1., 1., 0., 0.],
[ 0., 1., 1., 0., 0.],
[ 0., 0., 0., 0., 0.]])
"""
shape = [data.shape[i] if shape[i] == -1 else shape[i]
for i in range(len(shape))]
new_data = np.empty(shape)
new_data[:] = value
II = [slice((shape[i] - data.shape[i])//2,
(shape[i] - data.shape[i])//2 + data.shape[i])
for i in range(len(shape))]
    new_data[tuple(II)] = data
return new_data
def pad_repeat_border(data, padwidth):
"""
Similar to `pad`, except the border value from ``data`` is used to pad.
Parameters
----------
data : ndarray
Numpy array of any dimension and type.
padwidth : int or tuple
If int, it will pad using this amount at the beginning and end of all
dimensions. If it is a tuple (of same length as `ndim`), then the
padding amount will be specified per axis.
Examples
--------
>>> import deepdish as dd
>>> import numpy as np
Pad an array by repeating its borders:
>>> shape = (3, 4)
>>> x = np.arange(np.prod(shape)).reshape(shape)
>>> dd.util.pad_repeat_border(x, 2)
array([[ 0, 0, 0, 1, 2, 3, 3, 3],
[ 0, 0, 0, 1, 2, 3, 3, 3],
[ 0, 0, 0, 1, 2, 3, 3, 3],
[ 4, 4, 4, 5, 6, 7, 7, 7],
[ 8, 8, 8, 9, 10, 11, 11, 11],
[ 8, 8, 8, 9, 10, 11, 11, 11],
[ 8, 8, 8, 9, 10, 11, 11, 11]])
"""
data = np.asarray(data)
shape = data.shape
if isinstance(padwidth, int):
padwidth = (padwidth,)*len(shape)
padded_shape = tuple(map(lambda ix: ix[1]+padwidth[ix[0]]*2,
enumerate(shape)))
new_data = np.empty(padded_shape, dtype=data.dtype)
    new_data[tuple(slice(w, -w) if w > 0 else slice(None)
                   for w in padwidth)] = data
for i, pw in enumerate(padwidth):
if pw > 0:
selection = [slice(None)] * data.ndim
selection2 = [slice(None)] * data.ndim
# Lower boundary
selection[i] = slice(0, pw)
selection2[i] = slice(pw, pw+1)
new_data[tuple(selection)] = new_data[tuple(selection2)]
# Upper boundary
selection[i] = slice(-pw, None)
selection2[i] = slice(-pw-1, -pw)
new_data[tuple(selection)] = new_data[tuple(selection2)]
return new_data
def pad_repeat_border_corner(data, shape):
"""
Similar to `pad_repeat_border`, except the padding is always done on the
upper end of each axis and the target size is specified.
Parameters
----------
data : ndarray
Numpy array of any dimension and type.
shape : tuple
Final shape of padded array. Should be tuple of length ``data.ndim``.
If it has to pad unevenly, it will pad one more at the end of the axis
than at the beginning.
Examples
--------
>>> import deepdish as dd
>>> import numpy as np
Pad an array by repeating its upper borders.
>>> shape = (3, 4)
>>> x = np.arange(np.prod(shape)).reshape(shape)
>>> dd.util.pad_repeat_border_corner(x, (5, 5))
array([[ 0., 1., 2., 3., 3.],
[ 4., 5., 6., 7., 7.],
[ 8., 9., 10., 11., 11.],
[ 8., 9., 10., 11., 11.],
[ 8., 9., 10., 11., 11.]])
"""
new_data = np.empty(shape)
    new_data[tuple(slice(upper) for upper in data.shape)] = data
for i in range(len(shape)):
selection = [slice(None)]*i + [slice(data.shape[i], None)]
selection2 = [slice(None)]*i + [slice(data.shape[i]-1, data.shape[i])]
        new_data[tuple(selection)] = new_data[tuple(selection2)]
return new_data
deepdish-0.3.7/deepdish/util/saveable.py 0000644 0001750 0001750 00000011261 14123254760 021375 0 ustar larsson larsson 0000000 0000000 from __future__ import division, print_function, absolute_import
from deepdish import io
_ERR_STR = "Must override load_from_dict for Saveable interface"
class Saveable(object):
"""
    Key-value coding interface for classes. Generally, this is an interface
    that makes it possible to access instance members through keys (strings)
    instead of through named variables. What this interface enables is saving
    and loading an instance of the class to file: the instance is encoded
    into, or decoded from, a dictionary, which is then saved or loaded using
    :func:`deepdish.io.save`.
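
    Examples
    --------
    A minimal sketch of a subclass (``Foo`` and its member ``x`` are
    illustrative):

    >>> class Foo(Saveable):  #doctest: +SKIP
    ...     def __init__(self, x):
    ...         self.x = x
    ...     @classmethod
    ...     def load_from_dict(cls, d):
    ...         return cls(d['x'])
    ...     def save_to_dict(self):
    ...         return {'x': self.x}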
"""
@classmethod
def load(cls, path):
"""
Loads an instance of the class from a file.
Parameters
----------
path : str
Path to an HDF5 file.
Examples
--------
This is an abstract data type, but let us say that ``Foo`` inherits
from ``Saveable``. To construct an object of this class from a file, we
do:
>>> foo = Foo.load('foo.h5') #doctest: +SKIP
"""
if path is None:
return cls.load_from_dict({})
else:
d = io.load(path)
return cls.load_from_dict(d)
def save(self, path):
"""
Saves an instance of the class using :func:`deepdish.io.save`.
Parameters
----------
path : str
Output path to HDF5 file.
"""
io.save(path, self.save_to_dict())
@classmethod
def load_from_dict(cls, d):
"""
Overload this function in your subclass. It takes a dictionary and
should return a constructed object.
When overloading, you have to decorate this function with
``@classmethod``.
Parameters
----------
d : dict
Dictionary representation of an instance of your class.
Returns
-------
obj : object
Returns an object that has been constructed based on the
dictionary.
"""
raise NotImplementedError(_ERR_STR)
def save_to_dict(self):
"""
Overload this function in your subclass. It should return a dictionary
representation of the current instance.
        If you have member variables that are objects, it is best to convert
        them to dictionaries before they are entered into your dictionary
        hierarchy.
Returns
-------
d : dict
Returns a dictionary representation of the current instance.
"""
raise NotImplementedError(_ERR_STR)
class NamedRegistry(object):
"""
This class provides a named hierarchy of classes, where each class is
associated with a string name.
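
    Examples
    --------
    A minimal sketch of how a registry is set up and used (``Animal`` and
    ``Dog`` are illustrative):

    >>> @NamedRegistry.root
    ... class Animal(NamedRegistry):
    ...     pass
    >>> @Animal.register('dog')
    ... class Dog(Animal):
    ...     pass
    >>> Animal.getclass('dog') is Dog
    True
    >>> Animal.construct('dog').name
    'dog'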
"""
REGISTRY = {}
@property
def name(self):
"""Returns the name of the registry entry."""
# Automatically overloaded by 'register'
return "noname"
@classmethod
def register(cls, name):
"""Decorator to register a class."""
def register_decorator(reg_cls):
def name_func(self):
return name
reg_cls.name = property(name_func)
            assert issubclass(reg_cls, cls), \
                "Registered class must be a subclass of the registry base class"
cls.REGISTRY[name] = reg_cls
return reg_cls
return register_decorator
@classmethod
def getclass(cls, name):
"""
Returns the class object given its name.
"""
return cls.REGISTRY[name]
@classmethod
def construct(cls, name, *args, **kwargs):
"""
Constructs an instance of an object given its name.
"""
return cls.REGISTRY[name](*args, **kwargs)
@classmethod
def registry(cls):
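        """Returns the dictionary mapping names to registered classes."""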
return cls.REGISTRY
@classmethod
def root(cls, reg_cls):
"""
        Decorate your base class with this to create a new registry for it.
"""
reg_cls.REGISTRY = {}
return reg_cls
class SaveableRegistry(Saveable, NamedRegistry):
"""
This combines the features of :class:`deepdish.util.Saveable` and
:class:`deepdish.util.NamedRegistry`.
See also
--------
Saveable, NamedRegistry
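
    Examples
    --------
    A sketch of how loading dispatches on the saved name (``Foo`` and
    ``'foo.h5'`` are illustrative). When a registered subclass is saved, its
    registry name is stored under the key ``'name'``, so loading through the
    base class reconstructs the right subclass:

    >>> @SaveableRegistry.register('foo')  #doctest: +SKIP
    ... class Foo(SaveableRegistry):
    ...     pass
    >>> foo = SaveableRegistry.load('foo.h5')  #doctest: +SKIP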
"""
@classmethod
def load(cls, path):
if path is None:
return cls.load_from_dict({})
else:
d = io.load(path)
# Check class type
class_name = d.get('name')
if class_name is not None:
return cls.getclass(class_name).load_from_dict(d)
else:
return cls.load_from_dict(d)
def save(self, path):
d = self.save_to_dict()
d['name'] = self.name
io.save(path, d)
deepdish-0.3.7/deepdish/util/zca_whitening.py 0000644 0001750 0001750 00000002501 13052123256 022434 0 ustar larsson larsson 0000000 0000000 from __future__ import division, print_function, absolute_import
import numpy as np
def zca_whitening_matrix(X, w_epsilon, batch=1000):
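    """
    Computes a ZCA whitening matrix from the samples in `X`.

    Each sample is flattened and the second-moment matrix
    ``sigma = X.T X / N`` is accumulated `batch` samples at a time. With the
    SVD ``sigma = U S U.T``, the whitening matrix is
    ``W = U diag(1 / sqrt(S + w_epsilon)) U.T``.

    Parameters
    ----------
    X : ndarray
        Data of shape ``(N, ...)``; each of the `N` samples is flattened.
    w_epsilon : float
        Whitening epsilon, added to the eigenvalues before inversion to
        regularize the result.
    batch : int
        Number of samples processed at a time when accumulating ``sigma``.

    Returns
    -------
    W : ndarray
        Whitening matrix of shape ``(D, D)``, where `D` is the size of a
        flattened sample.
    """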
shape = X.shape
N = shape[0]
Xflat = X.reshape((N, -1))
sigma = None
num_batches = int(np.ceil(N / batch))
for b in range(num_batches):
Xb = Xflat[b*batch:(b+1)*batch]
C = np.dot(Xb.T, Xb)
if sigma is None:
sigma = C
else:
sigma += C
sigma /= N
U, S, _ = np.linalg.svd(sigma)
shrinker = np.diag(1 / np.sqrt(S + w_epsilon))
W = np.dot(U, np.dot(shrinker, U.T))
return W
def apply_whitening_matrix(X, W, batch=1000):
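    """
    Applies a whitening matrix `W`, for instance one computed by
    :func:`zca_whitening_matrix`, to every sample in `X`, `batch` samples at
    a time. Returns the whitened data with the same shape as `X`.
    """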
shape = X.shape
N = shape[0]
Xflat = X.reshape((N, -1))
wX = np.empty(shape)
num_batches = int(np.ceil(N / batch))
for b in range(num_batches):
Xb = Xflat[b*batch:(b+1)*batch]
wX[b*batch:(b+1)*batch] = np.dot(W, Xb.T).T.reshape((-1,) + shape[1:])
return wX
def whiten(X, w_epsilon, batch=1000):
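    """
    Computes a ZCA whitening matrix from `X` and applies it to `X`, in
    batches of `batch` samples. Equivalent to :func:`zca_whitening_matrix`
    followed by :func:`apply_whitening_matrix`.
    """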
    W = zca_whitening_matrix(X, w_epsilon, batch=batch)
    return apply_whitening_matrix(X, W, batch=batch)
deepdish-0.3.7/deepdish.egg-info/ 0000755 0001750 0001750 00000000000 14123256273 017756 5 ustar larsson larsson 0000000 0000000 deepdish-0.3.7/deepdish.egg-info/PKG-INFO 0000644 0001750 0001750 00000001356 14123256273 021060 0 ustar larsson larsson 0000000 0000000 Metadata-Version: 2.1
Name: deepdish
Version: 0.3.7
Summary: Deep Learning experiments from University of Chicago.
Home-page: https://github.com/uchicago-cs/deepdish
Maintainer: Gustav Larsson
Maintainer-email: gustav.m.larsson@gmail.com
License: BSD
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Topic :: Scientific/Engineering
Provides-Extra: image
License-File: LICENSE
UNKNOWN
deepdish-0.3.7/deepdish.egg-info/SOURCES.txt 0000644 0001750 0001750 00000001440 14123256273 021641 0 ustar larsson larsson 0000000 0000000 LICENSE
MANIFEST.in
README.rst
requirements.txt
requirements_docs.txt
setup.cfg
setup.py
deepdish/__init__.py
deepdish/conf.py
deepdish/core.py
deepdish/image.py
deepdish/six.py
deepdish.egg-info/PKG-INFO
deepdish.egg-info/SOURCES.txt
deepdish.egg-info/dependency_links.txt
deepdish.egg-info/requires.txt
deepdish.egg-info/top_level.txt
deepdish/experiments/__init__.py
deepdish/experiments/pylearn2/datasets/mediaeval.py
deepdish/io/__init__.py
deepdish/io/hdf5io.py
deepdish/io/ls.py
deepdish/parallel/__init__.py
deepdish/parallel/fallback.py
deepdish/parallel/mpi.py
deepdish/tests/__init__.py
deepdish/tests/test_core.py
deepdish/tests/test_io.py
deepdish/tests/test_util.py
deepdish/util/__init__.py
deepdish/util/padding.py
deepdish/util/saveable.py
deepdish/util/zca_whitening.py
scripts/ddls deepdish-0.3.7/deepdish.egg-info/dependency_links.txt 0000644 0001750 0001750 00000000001 14123256273 024024 0 ustar larsson larsson 0000000 0000000
deepdish-0.3.7/deepdish.egg-info/requires.txt 0000644 0001750 0001750 00000000044 14123256273 022354 0 ustar larsson larsson 0000000 0000000 numpy
scipy
tables
[image]
scikit-image
deepdish-0.3.7/deepdish.egg-info/top_level.txt 0000644 0001750 0001750 00000000011 14123256273 022500 0 ustar larsson larsson 0000000 0000000 deepdish
deepdish-0.3.7/requirements.txt 0000644 0001750 0001750 00000000023 13052123256 017750 0 ustar larsson larsson 0000000 0000000 numpy
scipy
tables
deepdish-0.3.7/requirements_docs.txt 0000644 0001750 0001750 00000000032 13052123256 020760 0 ustar larsson larsson 0000000 0000000 mock
docutils
sphinx>=1.3
deepdish-0.3.7/scripts/ 0000755 0001750 0001750 00000000000 14123256273 016166 5 ustar larsson larsson 0000000 0000000 deepdish-0.3.7/scripts/ddls 0000755 0001750 0001750 00000000414 13052123256 017033 0 ustar larsson larsson 0000000 0000000 #!/usr/bin/env python
from __future__ import division, print_function, absolute_import
import os
import sys
sys.path = [os.path.join(os.path.abspath(os.path.dirname(__file__)), "..")] + sys.path
from deepdish.io.ls import main
if __name__ == '__main__':
main()
deepdish-0.3.7/setup.cfg 0000644 0001750 0001750 00000000103 14123256273 016312 0 ustar larsson larsson 0000000 0000000 [bdist_wheel]
universal = 1
[egg_info]
tag_build =
tag_date = 0
deepdish-0.3.7/setup.py 0000644 0001750 0001750 00000002423 14123255554 016213 0 ustar larsson larsson 0000000 0000000 #!/usr/bin/env python
from __future__ import division, print_function, absolute_import
from setuptools import setup
import os
if os.getenv('READTHEDOCS'):
with open('requirements_docs.txt') as f:
required = f.read().splitlines()
else:
with open('requirements.txt') as f:
required = f.read().splitlines()
CLASSIFIERS = [
'Development Status :: 4 - Beta',
'Intended Audience :: Science/Research',
'License :: OSI Approved :: BSD License',
'Programming Language :: Python',
'Programming Language :: Python :: 2',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.4',
'Topic :: Scientific/Engineering',
]
args = dict(
name='deepdish',
version='0.3.7',
url="https://github.com/uchicago-cs/deepdish",
description="Deep Learning experiments from University of Chicago.",
maintainer='Gustav Larsson',
maintainer_email='gustav.m.larsson@gmail.com',
install_requires=required,
extras_require={
'image': ["skimage"],
},
scripts=['scripts/ddls'],
packages=[
'deepdish',
'deepdish.parallel',
'deepdish.io',
'deepdish.util',
],
license='BSD',
classifiers=CLASSIFIERS,
)
setup(**args)