fast-histogram-0.7/0000755000077000000240000000000013415414723014232 5ustar tomstaff00000000000000fast-histogram-0.7/CHANGES.rst0000644000077000000240000000225313415414666016044 0ustar tomstaff000000000000000.7 (2019-01-09) ---------------- - Fix definition of numpy as a build-time dependency. [#36] 0.6 (2019-01-07) ---------------- - Define numpy as a build-time dependency in pyproject.toml. [#33] - Release the GIL during calculations in C code. [#31] 0.5 (2018-09-26) ---------------- - Fix bug that caused histograms of n-dimensional arrays to not be computed correctly. [#21] - Avoid memory copies for non-native endian 64-bit float arrays. [#18] - Avoid memory copies for any numerical Numpy type and non-contiguous arrays. [#23] - Raise a better error if arrays are passed to the ``bins`` argument. [#24] 0.4 (2018-02-12) ---------------- - Make sure that Numpy is not required to run setup.py. [#15] - Fix installation on platforms with an ASCII locale. [#15] 0.3 (2017-10-28) ---------------- - Use long instead of int for x/y sizes and indices - Implement support for weights= option 0.2.1 (2017-07-18) ------------------ - Fixed rst syntax in README 0.2 (2017-07-18) ---------------- - Fixed segmentation fault under certain conditions. - Ensure that arrays are C-contiguous before passing them to the C code. 0.1 (2017-07-18) ---------------- - Initial version fast-histogram-0.7/LICENSE0000644000077000000240000000242313414215513015233 0ustar tomstaff00000000000000Copyright (c) 2017, Thomas P. Robitaille All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. 
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. fast-histogram-0.7/MANIFEST.in0000644000077000000240000000006713415414551015772 0ustar tomstaff00000000000000include LICENSE include README.rst include CHANGES.rst fast-histogram-0.7/PKG-INFO0000644000077000000240000001773113415414723015340 0ustar tomstaff00000000000000Metadata-Version: 1.0 Name: fast-histogram Version: 0.7 Summary: Fast simple 1D and 2D histograms Home-page: https://github.com/astrofrog/fast-histogram Author: Thomas Robitaille Author-email: thomas.robitaille@gmail.com License: BSD Description: |Travis Status| |AppVeyor Status| |CircleCI Status| |asv| About ----- Sometimes you just want to compute simple 1D or 2D histograms with regular bins. Fast. No nonsense. `Numpy's `__ histogram functions are versatile, and can handle for example non-regular binning, but this versatility comes at the expense of performance. The **fast-histogram** mini-package aims to provide simple and fast histogram functions for regular bins that don't compromise on performance. 
It doesn't do anything complicated - it just implements a simple histogram algorithm in C and keeps it simple. The aim is to have functions that are fast but also robust and reliable. The result is a 1D histogram function here that is **7-15x faster** than ``numpy.histogram``, and a 2D histogram function that is **20-25x faster** than ``numpy.histogram2d``. To install:: pip install fast-histogram or if you use conda you can instead do:: conda install -c conda-forge fast-histogram The ``fast_histogram`` module then provides two functions: ``histogram1d`` and ``histogram2d``: .. code:: python from fast_histogram import histogram1d, histogram2d Example ------- Here's an example of binning 10 million points into a regular 2D histogram: .. code:: python In [1]: import numpy as np In [2]: x = np.random.random(10_000_000) In [3]: y = np.random.random(10_000_000) In [4]: %timeit _ = np.histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30) 935 ms ± 58.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [5]: from fast_histogram import histogram2d In [6]: %timeit _ = histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30) 40.2 ms ± 624 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) (note that ``10_000_000`` is possible in Python 3.6 syntax, use ``10000000`` instead in previous versions) The version here is over 20 times faster! The following plot shows the speedup as a function of array size for the bin parameters shown above: .. figure:: https://github.com/astrofrog/fast-histogram/raw/master/speedup_compared.png :alt: Comparison of performance between Numpy and fast-histogram as well as results for the 1D case, also with 30 bins. The speedup for the 2D case is consistently between 20-25x, and for the 1D case goes from 15x for small arrays to around 7x for large arrays. Q&A --- Why don't the histogram functions return the edges? 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Computing and returning the edges may seem trivial but it can slow things down by a factor of a few when computing histograms of 10^5 or fewer elements, so not returning the edges is a deliberate decision related to performance. You can easily compute the edges yourself if needed though, using ``numpy.linspace``. Doesn't package X already do this, but better? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This may very well be the case! If this duplicates another package, or if it is possible to use Numpy in a smarter way to get the same performance gains, please open an issue and I'll consider deprecating this package :) One package that does include fast histogram functions (including in n-dimensions) and can compute other statistics is `vaex `_, so take a look there if you need more advanced functionality! Are the 2D histograms not transposed compared to what they should be? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ There is technically no 'right' and 'wrong' orientation - here we adopt the convention which gives results consistent with Numpy, so: .. code:: python numpy.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny]) should give the same result as: .. code:: python fast_histogram.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny]) Why not contribute this to Numpy directly? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ As mentioned above, the Numpy functions are much more versatile, so they could not be replaced by the ones here. One option would be to check in Numpy's functions for cases that are simple and dispatch to functions such as the ones here, or add dedicated functions for regular binning. I hope we can get this in Numpy in some form or another eventually, but for now, the aim is to have this available to packages that need to support a range of Numpy versions. Why not use Cython? 
~~~~~~~~~~~~~~~~~~~ I originally implemented this in Cython, but found that I could get a 50% performance improvement by going straight to a C extension. What about using Numba? ~~~~~~~~~~~~~~~~~~~~~~~ I specifically want to keep this package as easy as possible to install, and while `Numba `__ is a great package, it is not trivial to install outside of Anaconda. Could this be parallelized? ~~~~~~~~~~~~~~~~~~~~~~~~~~~ This may benefit from parallelization under certain circumstances. The easiest solution might be to use OpenMP, but this won't work on all platforms, so it would need to be made optional. Couldn't you make it faster by using the GPU? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Almost certainly, though the aim here is to have an easily installable and portable package, and introducing GPUs is going to affect both of these. Why make a package specifically for this? This is a tiny amount of functionality ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Packages that need this could simply bundle their own C extension or Cython code to do this, but the main motivation for releasing this as a mini-package is to avoid making pure-Python packages into packages that require compilation just because of the need to compute fast histograms. Can I contribute? ~~~~~~~~~~~~~~~~~ Yes please! This is not meant to be a finished package, and I welcome pull request to improve things. .. |Travis Status| image:: https://travis-ci.org/astrofrog/fast-histogram.svg?branch=master :target: https://travis-ci.org/astrofrog/fast-histogram .. |AppVeyor Status| image:: https://ci.appveyor.com/api/projects/status/ek63g9haku5on0q2/branch/master?svg=true :target: https://ci.appveyor.com/project/astrofrog/fast-histogram .. |CircleCI Status| image:: https://circleci.com/gh/astrofrog/fast-histogram/tree/master.svg?style=svg :target: https://circleci.com/gh/astrofrog/fast-histogram/tree/master .. 
|asv| image:: https://img.shields.io/badge/benchmarked%20by-asv-brightgreen.svg :target: https://astrofrog.github.io/fast-histogram Platform: UNKNOWN fast-histogram-0.7/README.rst0000644000077000000240000001463713414215513015727 0ustar tomstaff00000000000000|Travis Status| |AppVeyor Status| |CircleCI Status| |asv| About ----- Sometimes you just want to compute simple 1D or 2D histograms with regular bins. Fast. No nonsense. `Numpy's `__ histogram functions are versatile, and can handle for example non-regular binning, but this versatility comes at the expense of performance. The **fast-histogram** mini-package aims to provide simple and fast histogram functions for regular bins that don't compromise on performance. It doesn't do anything complicated - it just implements a simple histogram algorithm in C and keeps it simple. The aim is to have functions that are fast but also robust and reliable. The result is a 1D histogram function here that is **7-15x faster** than ``numpy.histogram``, and a 2D histogram function that is **20-25x faster** than ``numpy.histogram2d``. To install:: pip install fast-histogram or if you use conda you can instead do:: conda install -c conda-forge fast-histogram The ``fast_histogram`` module then provides two functions: ``histogram1d`` and ``histogram2d``: .. code:: python from fast_histogram import histogram1d, histogram2d Example ------- Here's an example of binning 10 million points into a regular 2D histogram: .. code:: python In [1]: import numpy as np In [2]: x = np.random.random(10_000_000) In [3]: y = np.random.random(10_000_000) In [4]: %timeit _ = np.histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30) 935 ms ± 58.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [5]: from fast_histogram import histogram2d In [6]: %timeit _ = histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30) 40.2 ms ± 624 µs per loop (mean ± std. dev. 
of 7 runs, 10 loops each) (note that ``10_000_000`` is possible in Python 3.6 syntax, use ``10000000`` instead in previous versions) The version here is over 20 times faster! The following plot shows the speedup as a function of array size for the bin parameters shown above: .. figure:: https://github.com/astrofrog/fast-histogram/raw/master/speedup_compared.png :alt: Comparison of performance between Numpy and fast-histogram as well as results for the 1D case, also with 30 bins. The speedup for the 2D case is consistently between 20-25x, and for the 1D case goes from 15x for small arrays to around 7x for large arrays. Q&A --- Why don't the histogram functions return the edges? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Computing and returning the edges may seem trivial but it can slow things down by a factor of a few when computing histograms of 10^5 or fewer elements, so not returning the edges is a deliberate decision related to performance. You can easily compute the edges yourself if needed though, using ``numpy.linspace``. Doesn't package X already do this, but better? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This may very well be the case! If this duplicates another package, or if it is possible to use Numpy in a smarter way to get the same performance gains, please open an issue and I'll consider deprecating this package :) One package that does include fast histogram functions (including in n-dimensions) and can compute other statistics is `vaex `_, so take a look there if you need more advanced functionality! Are the 2D histograms not transposed compared to what they should be? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ There is technically no 'right' and 'wrong' orientation - here we adopt the convention which gives results consistent with Numpy, so: .. code:: python numpy.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny]) should give the same result as: .. 
code:: python fast_histogram.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny]) Why not contribute this to Numpy directly? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ As mentioned above, the Numpy functions are much more versatile, so they could not be replaced by the ones here. One option would be to check in Numpy's functions for cases that are simple and dispatch to functions such as the ones here, or add dedicated functions for regular binning. I hope we can get this in Numpy in some form or another eventually, but for now, the aim is to have this available to packages that need to support a range of Numpy versions. Why not use Cython? ~~~~~~~~~~~~~~~~~~~ I originally implemented this in Cython, but found that I could get a 50% performance improvement by going straight to a C extension. What about using Numba? ~~~~~~~~~~~~~~~~~~~~~~~ I specifically want to keep this package as easy as possible to install, and while `Numba `__ is a great package, it is not trivial to install outside of Anaconda. Could this be parallelized? ~~~~~~~~~~~~~~~~~~~~~~~~~~~ This may benefit from parallelization under certain circumstances. The easiest solution might be to use OpenMP, but this won't work on all platforms, so it would need to be made optional. Couldn't you make it faster by using the GPU? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Almost certainly, though the aim here is to have an easily installable and portable package, and introducing GPUs is going to affect both of these. Why make a package specifically for this? This is a tiny amount of functionality ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Packages that need this could simply bundle their own C extension or Cython code to do this, but the main motivation for releasing this as a mini-package is to avoid making pure-Python packages into packages that require compilation just because of the need to compute fast histograms. Can I contribute? 
~~~~~~~~~~~~~~~~~ Yes please! This is not meant to be a finished package, and I welcome pull requests to improve things. .. |Travis Status| image:: https://travis-ci.org/astrofrog/fast-histogram.svg?branch=master :target: https://travis-ci.org/astrofrog/fast-histogram .. |AppVeyor Status| image:: https://ci.appveyor.com/api/projects/status/ek63g9haku5on0q2/branch/master?svg=true :target: https://ci.appveyor.com/project/astrofrog/fast-histogram .. |CircleCI Status| image:: https://circleci.com/gh/astrofrog/fast-histogram/tree/master.svg?style=svg :target: https://circleci.com/gh/astrofrog/fast-histogram/tree/master .. |asv| image:: https://img.shields.io/badge/benchmarked%20by-asv-brightgreen.svg :target: https://astrofrog.github.io/fast-histogram fast-histogram-0.7/fast_histogram/0000755000077000000240000000000013415414723017244 5ustar tomstaff00000000000000fast-histogram-0.7/fast_histogram/__init__.py0000644000077000000240000000005613415414670021357 0ustar tomstaff00000000000000from .histogram import * __version__ = "0.7" fast-histogram-0.7/fast_histogram/_histogram_core.c0000644000077000000240000004377713414666307022604 0ustar tomstaff00000000000000#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION #include #include #include /* Define docstrings */ static char module_docstring[] = "Fast histogram functions"; static char _histogram1d_docstring[] = "Compute a 1D histogram"; static char _histogram2d_docstring[] = "Compute a 2D histogram"; static char _histogram1d_weighted_docstring[] = "Compute a weighted 1D histogram"; static char _histogram2d_weighted_docstring[] = "Compute a weighted 2D histogram"; /* Declare the C functions here. */ static PyObject *_histogram1d(PyObject *self, PyObject *args); static PyObject *_histogram2d(PyObject *self, PyObject *args); static PyObject *_histogram1d_weighted(PyObject *self, PyObject *args); static PyObject *_histogram2d_weighted(PyObject *self, PyObject *args); /* Define the methods that will be available on the module.
*/ static PyMethodDef module_methods[] = { {"_histogram1d", _histogram1d, METH_VARARGS, _histogram1d_docstring}, {"_histogram2d", _histogram2d, METH_VARARGS, _histogram2d_docstring}, {"_histogram1d_weighted", _histogram1d_weighted, METH_VARARGS, _histogram1d_weighted_docstring}, {"_histogram2d_weighted", _histogram2d_weighted, METH_VARARGS, _histogram2d_weighted_docstring}, {NULL, NULL, 0, NULL} }; /* This is the function that is called on import. */ #if PY_MAJOR_VERSION >= 3 #define MOD_ERROR_VAL NULL #define MOD_SUCCESS_VAL(val) val #define MOD_INIT(name) PyMODINIT_FUNC PyInit_##name(void) #define MOD_DEF(ob, name, doc, methods) \ static struct PyModuleDef moduledef = { \ PyModuleDef_HEAD_INIT, name, doc, -1, methods, }; \ ob = PyModule_Create(&moduledef); #else #define MOD_ERROR_VAL #define MOD_SUCCESS_VAL(val) #define MOD_INIT(name) void init##name(void) #define MOD_DEF(ob, name, doc, methods) \ ob = Py_InitModule3(name, methods, doc); #endif MOD_INIT(_histogram_core) { PyObject *m; MOD_DEF(m, "_histogram_core", module_docstring, module_methods); if (m == NULL) return MOD_ERROR_VAL; import_array(); return MOD_SUCCESS_VAL(m); } static PyObject *_histogram1d(PyObject *self, PyObject *args) { long n; int ix, nx; double xmin, xmax, tx, fnx, normx; PyObject *x_obj, *count_obj; PyArrayObject *x_array, *count_array; npy_intp dims[1]; double *count; NpyIter *iter; NpyIter_IterNextFunc *iternext; char **dataptr; npy_intp *strideptr, *innersizeptr; PyArray_Descr *dtype; /* Parse the input tuple */ if (!PyArg_ParseTuple(args, "Oidd", &x_obj, &nx, &xmin, &xmax)) { PyErr_SetString(PyExc_TypeError, "Error parsing input"); return NULL; } /* Interpret the input objects as `numpy` arrays. */ x_array = (PyArrayObject *)PyArray_FROM_O(x_obj); /* If that didn't work, throw an `Exception`. */ if (x_array == NULL) { PyErr_SetString(PyExc_TypeError, "Couldn't parse the input arrays."); Py_XDECREF(x_array); return NULL; } /* How many data points are there? 
*/ n = (long)PyArray_DIM(x_array, 0); /* Build the output array */ dims[0] = nx; count_obj = PyArray_SimpleNew(1, dims, NPY_DOUBLE); if (count_obj == NULL) { PyErr_SetString(PyExc_RuntimeError, "Couldn't build output array"); Py_DECREF(x_array); Py_XDECREF(count_obj); return NULL; } count_array = (PyArrayObject *)count_obj; PyArray_FILLWBYTE(count_array, 0); if (n == 0) { Py_DECREF(x_array); return count_obj; } dtype = PyArray_DescrFromType(NPY_DOUBLE); iter = NpyIter_New(x_array, NPY_ITER_READONLY | NPY_ITER_EXTERNAL_LOOP | NPY_ITER_BUFFERED, NPY_KEEPORDER, NPY_SAFE_CASTING, dtype); if (iter == NULL) { PyErr_SetString(PyExc_RuntimeError, "Couldn't set up iterator"); Py_DECREF(x_array); Py_DECREF(count_obj); Py_DECREF(count_array); return NULL; } /* * The iternext function gets stored in a local variable * so it can be called repeatedly in an efficient manner. */ iternext = NpyIter_GetIterNext(iter, NULL); if (iternext == NULL) { PyErr_SetString(PyExc_RuntimeError, "Couldn't set up iterator"); NpyIter_Deallocate(iter); Py_DECREF(x_array); Py_DECREF(count_obj); Py_DECREF(count_array); return NULL; } /* The location of the data pointer which the iterator may update */ dataptr = NpyIter_GetDataPtrArray(iter); /* The location of the stride which the iterator may update */ strideptr = NpyIter_GetInnerStrideArray(iter); /* The location of the inner loop size which the iterator may update */ innersizeptr = NpyIter_GetInnerLoopSizePtr(iter); /* Pre-compute variables for efficiency in the histogram calculation */ fnx = nx; normx = 1. 
/ (xmax - xmin); /* Get C array for output array */ count = (double *)PyArray_DATA(count_array); Py_BEGIN_ALLOW_THREADS do { /* Get the inner loop data/stride/count values */ npy_intp stride = *strideptr; npy_intp size = *innersizeptr; /* This is a typical inner loop for NPY_ITER_EXTERNAL_LOOP */ while (size--) { tx = *(double *)dataptr[0]; if (tx >= xmin && tx < xmax) { ix = (tx - xmin) * normx * fnx; count[ix] += 1.; } dataptr[0] += stride; } } while (iternext(iter)); Py_END_ALLOW_THREADS NpyIter_Deallocate(iter); /* Clean up. */ Py_DECREF(x_array); return count_obj; } static PyObject *_histogram2d(PyObject *self, PyObject *args) { long n; int ix, iy, nx, ny; double xmin, xmax, tx, fnx, normx, ymin, ymax, ty, fny, normy; PyObject *x_obj, *y_obj, *count_obj; PyArrayObject *x_array, *y_array, *count_array, *arrays[2]; npy_intp dims[2]; double *count; NpyIter *iter; NpyIter_IterNextFunc *iternext; char **dataptr; npy_intp *strideptr, *innersizeptr; PyArray_Descr *dtypes[] = {PyArray_DescrFromType(NPY_DOUBLE), PyArray_DescrFromType(NPY_DOUBLE)}; npy_uint32 op_flags[] = {NPY_ITER_READONLY, NPY_ITER_READONLY}; /* Parse the input tuple */ if (!PyArg_ParseTuple(args, "OOiddidd", &x_obj, &y_obj, &nx, &xmin, &xmax, &ny, &ymin, &ymax)) { PyErr_SetString(PyExc_TypeError, "Error parsing input"); return NULL; } /* Interpret the input objects as `numpy` arrays. */ x_array = (PyArrayObject *)PyArray_FROM_O(x_obj); y_array = (PyArrayObject *)PyArray_FROM_O(y_obj); /* If that didn't work, throw an `Exception`. */ if (x_array == NULL || y_array == NULL) { PyErr_SetString(PyExc_TypeError, "Couldn't parse the input arrays."); Py_XDECREF(x_array); Py_XDECREF(y_array); return NULL; } /* How many data points are there? */ n = (long)PyArray_DIM(x_array, 0); /* Check the dimensions. 
*/ if (n != (long)PyArray_DIM(y_array, 0)) { PyErr_SetString(PyExc_RuntimeError, "Dimension mismatch between x and y"); Py_DECREF(x_array); Py_DECREF(y_array); return NULL; } /* Build the output array */ dims[0] = nx; dims[1] = ny; count_obj = PyArray_SimpleNew(2, dims, NPY_DOUBLE); if (count_obj == NULL) { PyErr_SetString(PyExc_RuntimeError, "Couldn't build output array"); Py_DECREF(x_array); Py_DECREF(y_array); Py_XDECREF(count_obj); return NULL; } count_array = (PyArrayObject *)count_obj; PyArray_FILLWBYTE(count_array, 0); if (n == 0) { Py_DECREF(x_array); Py_DECREF(y_array); return count_obj; } arrays[0] = x_array; arrays[1] = y_array; iter = NpyIter_AdvancedNew(2, arrays, NPY_ITER_EXTERNAL_LOOP | NPY_ITER_BUFFERED, NPY_KEEPORDER, NPY_SAFE_CASTING, op_flags, dtypes, -1, NULL, NULL, 0); if (iter == NULL) { PyErr_SetString(PyExc_RuntimeError, "Couldn't set up iterator"); Py_DECREF(x_array); Py_DECREF(y_array); Py_DECREF(count_obj); Py_DECREF(count_array); return NULL; } /* * The iternext function gets stored in a local variable * so it can be called repeatedly in an efficient manner. */ iternext = NpyIter_GetIterNext(iter, NULL); if (iternext == NULL) { PyErr_SetString(PyExc_RuntimeError, "Couldn't set up iterator"); NpyIter_Deallocate(iter); Py_DECREF(x_array); Py_DECREF(y_array); Py_DECREF(count_obj); Py_DECREF(count_array); return NULL; } /* The location of the data pointer which the iterator may update */ dataptr = NpyIter_GetDataPtrArray(iter); /* The location of the stride which the iterator may update */ strideptr = NpyIter_GetInnerStrideArray(iter); /* The location of the inner loop size which the iterator may update */ innersizeptr = NpyIter_GetInnerLoopSizePtr(iter); /* Pre-compute variables for efficiency in the histogram calculation */ fnx = nx; fny = ny; normx = 1. / (xmax - xmin); normy = 1. 
/ (ymax - ymin); /* Get C array for output array */ count = (double *)PyArray_DATA(count_array); Py_BEGIN_ALLOW_THREADS do { /* Get the inner loop data/stride/count values */ npy_intp stride = *strideptr; npy_intp size = *innersizeptr; /* This is a typical inner loop for NPY_ITER_EXTERNAL_LOOP */ while (size--) { tx = *(double *)dataptr[0]; ty = *(double *)dataptr[1]; if (tx >= xmin && tx < xmax && ty >= ymin && ty < ymax) { ix = (tx - xmin) * normx * fnx; iy = (ty - ymin) * normy * fny; count[iy + ny * ix] += 1.; } dataptr[0] += stride; dataptr[1] += stride; } } while (iternext(iter)); Py_END_ALLOW_THREADS NpyIter_Deallocate(iter); /* Clean up. */ Py_DECREF(x_array); Py_DECREF(y_array); return count_obj; } static PyObject *_histogram1d_weighted(PyObject *self, PyObject *args) { long n; int ix, nx; double xmin, xmax, tx, tw, fnx, normx; PyObject *x_obj, *w_obj, *count_obj; PyArrayObject *x_array, *w_array, *count_array, *arrays[2]; npy_intp dims[1]; double *count; NpyIter *iter; NpyIter_IterNextFunc *iternext; char **dataptr; npy_intp *strideptr, *innersizeptr; PyArray_Descr *dtypes[] = {PyArray_DescrFromType(NPY_DOUBLE), PyArray_DescrFromType(NPY_DOUBLE)}; npy_uint32 op_flags[] = {NPY_ITER_READONLY, NPY_ITER_READONLY}; /* Parse the input tuple */ if (!PyArg_ParseTuple(args, "OOidd", &x_obj, &w_obj, &nx, &xmin, &xmax)) { PyErr_SetString(PyExc_TypeError, "Error parsing input"); return NULL; } /* Interpret the input objects as `numpy` arrays. */ x_array = (PyArrayObject *)PyArray_FROM_O(x_obj); w_array = (PyArrayObject *)PyArray_FROM_O(w_obj); /* If that didn't work, throw an `Exception`. */ if (x_array == NULL || w_array == NULL) { PyErr_SetString(PyExc_TypeError, "Couldn't parse the input arrays."); Py_XDECREF(x_array); Py_XDECREF(w_array); return NULL; } /* How many data points are there? */ n = (long)PyArray_DIM(x_array, 0); /* Check the dimensions. 
*/ if (n != (long)PyArray_DIM(w_array, 0)) { PyErr_SetString(PyExc_RuntimeError, "Dimension mismatch between x and w"); Py_DECREF(x_array); Py_DECREF(w_array); return NULL; } /* Build the output array */ dims[0] = nx; count_obj = PyArray_SimpleNew(1, dims, NPY_DOUBLE); if (count_obj == NULL) { PyErr_SetString(PyExc_RuntimeError, "Couldn't build output array"); Py_DECREF(x_array); Py_DECREF(w_array); Py_XDECREF(count_obj); return NULL; } count_array = (PyArrayObject *)count_obj; PyArray_FILLWBYTE(count_array, 0); if (n == 0) { Py_DECREF(x_array); Py_DECREF(w_array); return count_obj; } arrays[0] = x_array; arrays[1] = w_array; iter = NpyIter_AdvancedNew(2, arrays, NPY_ITER_EXTERNAL_LOOP | NPY_ITER_BUFFERED, NPY_KEEPORDER, NPY_SAFE_CASTING, op_flags, dtypes, -1, NULL, NULL, 0); if (iter == NULL) { PyErr_SetString(PyExc_RuntimeError, "Couldn't set up iterator"); Py_DECREF(x_array); Py_DECREF(w_array); Py_DECREF(count_obj); Py_DECREF(count_array); return NULL; } /* * The iternext function gets stored in a local variable * so it can be called repeatedly in an efficient manner. */ iternext = NpyIter_GetIterNext(iter, NULL); if (iternext == NULL) { PyErr_SetString(PyExc_RuntimeError, "Couldn't set up iterator"); NpyIter_Deallocate(iter); Py_DECREF(x_array); Py_DECREF(w_array); Py_DECREF(count_obj); Py_DECREF(count_array); return NULL; } /* The location of the data pointer which the iterator may update */ dataptr = NpyIter_GetDataPtrArray(iter); /* The location of the stride which the iterator may update */ strideptr = NpyIter_GetInnerStrideArray(iter); /* The location of the inner loop size which the iterator may update */ innersizeptr = NpyIter_GetInnerLoopSizePtr(iter); /* Pre-compute variables for efficiency in the histogram calculation */ fnx = nx; normx = 1. 
/ (xmax - xmin); /* Get C array for output array */ count = (double *)PyArray_DATA(count_array); Py_BEGIN_ALLOW_THREADS do { /* Get the inner loop data/stride/count values */ npy_intp stride = *strideptr; npy_intp size = *innersizeptr; /* This is a typical inner loop for NPY_ITER_EXTERNAL_LOOP */ while (size--) { tx = *(double *)dataptr[0]; tw = *(double *)dataptr[1]; if (tx >= xmin && tx < xmax) { ix = (tx - xmin) * normx * fnx; count[ix] += tw; } dataptr[0] += stride; dataptr[1] += stride; } } while (iternext(iter)); Py_END_ALLOW_THREADS NpyIter_Deallocate(iter); /* Clean up. */ Py_DECREF(x_array); Py_DECREF(w_array); return count_obj; } static PyObject *_histogram2d_weighted(PyObject *self, PyObject *args) { long n; int ix, iy, nx, ny; double xmin, xmax, tx, fnx, normx, ymin, ymax, ty, fny, normy, tw; PyObject *x_obj, *y_obj, *w_obj, *count_obj; PyArrayObject *x_array, *y_array, *w_array, *count_array, *arrays[3]; npy_intp dims[2]; double *count; NpyIter *iter; NpyIter_IterNextFunc *iternext; char **dataptr; npy_intp *strideptr, *innersizeptr; PyArray_Descr *dtypes[] = {PyArray_DescrFromType(NPY_DOUBLE), PyArray_DescrFromType(NPY_DOUBLE), PyArray_DescrFromType(NPY_DOUBLE)}; npy_uint32 op_flags[] = {NPY_ITER_READONLY, NPY_ITER_READONLY, NPY_ITER_READONLY}; /* Parse the input tuple */ if (!PyArg_ParseTuple(args, "OOOiddidd", &x_obj, &y_obj, &w_obj, &nx, &xmin, &xmax, &ny, &ymin, &ymax)) { PyErr_SetString(PyExc_TypeError, "Error parsing input"); return NULL; } /* Interpret the input objects as `numpy` arrays. */ x_array = (PyArrayObject *)PyArray_FROM_O(x_obj); y_array = (PyArrayObject *)PyArray_FROM_O(y_obj); w_array = (PyArrayObject *)PyArray_FROM_O(w_obj); /* If that didn't work, throw an `Exception`. */ if (x_array == NULL || y_array == NULL || w_array == NULL) { PyErr_SetString(PyExc_TypeError, "Couldn't parse the input arrays."); Py_XDECREF(x_array); Py_XDECREF(y_array); Py_XDECREF(w_array); return NULL; } /* How many data points are there? 
*/ n = (long)PyArray_DIM(x_array, 0); /* Check the dimensions. */ if (n != (long)PyArray_DIM(y_array, 0) || n != (long)PyArray_DIM(w_array, 0)) { PyErr_SetString(PyExc_RuntimeError, "Dimension mismatch between x, y, and w"); Py_DECREF(x_array); Py_DECREF(y_array); Py_DECREF(w_array); return NULL; } /* Build the output array */ dims[0] = nx; dims[1] = ny; count_obj = PyArray_SimpleNew(2, dims, NPY_DOUBLE); if (count_obj == NULL) { PyErr_SetString(PyExc_RuntimeError, "Couldn't build output array"); Py_DECREF(x_array); Py_DECREF(y_array); Py_DECREF(w_array); Py_XDECREF(count_obj); return NULL; } count_array = (PyArrayObject *)count_obj; PyArray_FILLWBYTE(count_array, 0); if (n == 0) { Py_DECREF(x_array); Py_DECREF(y_array); Py_DECREF(w_array); return count_obj; } arrays[0] = x_array; arrays[1] = y_array; arrays[2] = w_array; iter = NpyIter_AdvancedNew(3, arrays, NPY_ITER_EXTERNAL_LOOP | NPY_ITER_BUFFERED, NPY_KEEPORDER, NPY_SAFE_CASTING, op_flags, dtypes, -1, NULL, NULL, 0); if (iter == NULL) { PyErr_SetString(PyExc_RuntimeError, "Couldn't set up iterator"); Py_DECREF(x_array); Py_DECREF(y_array); Py_DECREF(w_array); Py_DECREF(count_obj); Py_DECREF(count_array); return NULL; } /* * The iternext function gets stored in a local variable * so it can be called repeatedly in an efficient manner. 
*/ iternext = NpyIter_GetIterNext(iter, NULL); if (iternext == NULL) { PyErr_SetString(PyExc_RuntimeError, "Couldn't set up iterator"); NpyIter_Deallocate(iter); Py_DECREF(x_array); Py_DECREF(y_array); Py_DECREF(w_array); Py_DECREF(count_obj); Py_DECREF(count_array); return NULL; } /* The location of the data pointer which the iterator may update */ dataptr = NpyIter_GetDataPtrArray(iter); /* The location of the stride which the iterator may update */ strideptr = NpyIter_GetInnerStrideArray(iter); /* The location of the inner loop size which the iterator may update */ innersizeptr = NpyIter_GetInnerLoopSizePtr(iter); /* Pre-compute variables for efficiency in the histogram calculation */ fnx = nx; fny = ny; normx = 1. / (xmax - xmin); normy = 1. / (ymax - ymin); /* Get C array for output array */ count = (double *)PyArray_DATA(count_array); Py_BEGIN_ALLOW_THREADS do { /* Get the inner loop data/stride/count values */ npy_intp stride = *strideptr; npy_intp size = *innersizeptr; /* This is a typical inner loop for NPY_ITER_EXTERNAL_LOOP */ while (size--) { tx = *(double *)dataptr[0]; ty = *(double *)dataptr[1]; tw = *(double *)dataptr[2]; if (tx >= xmin && tx < xmax && ty >= ymin && ty < ymax) { ix = (tx - xmin) * normx * fnx; iy = (ty - ymin) * normy * fny; count[iy + ny * ix] += tw; } dataptr[0] += stride; dataptr[1] += stride; dataptr[2] += stride; } } while (iternext(iter)); Py_END_ALLOW_THREADS NpyIter_Deallocate(iter); /* Clean up. */ Py_DECREF(x_array); Py_DECREF(y_array); Py_DECREF(w_array); return count_obj; } fast-histogram-0.7/fast_histogram/histogram.py0000644000077000000240000000621513414215513021612 0ustar tomstaff00000000000000from __future__ import division import numbers import numpy as np from ._histogram_core import (_histogram1d, _histogram2d, _histogram1d_weighted, _histogram2d_weighted) __all__ = ['histogram1d', 'histogram2d'] def histogram1d(x, bins, range, weights=None): """ Compute a 1D histogram assuming equally spaced bins. 
Parameters ---------- x : `~numpy.ndarray` The position of the points to bin in the 1D histogram bins : int The number of bins range : iterable The range as a tuple of (xmin, xmax) weights : `~numpy.ndarray` The weights of the points in the 1D histogram Returns ------- array : `~numpy.ndarray` The 1D histogram array """ nx = bins if not np.isscalar(bins): raise TypeError('bins should be an integer') xmin, xmax = range if not np.isfinite(xmin): raise ValueError("xmin should be finite") if not np.isfinite(xmax): raise ValueError("xmax should be finite") if xmax <= xmin: raise ValueError("xmax should be greater than xmin") if nx <= 0: raise ValueError("nx should be strictly positive") if weights is None: return _histogram1d(x, nx, xmin, xmax) else: return _histogram1d_weighted(x, weights, nx, xmin, xmax) def histogram2d(x, y, bins, range, weights=None): """ Compute a 2D histogram assuming equally spaced bins. Parameters ---------- x, y : `~numpy.ndarray` The position of the points to bin in the 2D histogram bins : int or iterable The number of bins in each dimension. If given as an integer, the same number of bins is used for each dimension. range : iterable The range to use in each dimension, as an iterable of value pairs, i.e.
[(xmin, xmax), (ymin, ymax)] weights : `~numpy.ndarray` The weights of the points in the 2D histogram Returns ------- array : `~numpy.ndarray` The 2D histogram array """ if isinstance(bins, numbers.Integral): nx = ny = bins else: nx, ny = bins if not np.isscalar(nx) or not np.isscalar(ny): raise TypeError('bins should be an iterable of two integers') (xmin, xmax), (ymin, ymax) = range if not np.isfinite(xmin): raise ValueError("xmin should be finite") if not np.isfinite(xmax): raise ValueError("xmax should be finite") if not np.isfinite(ymin): raise ValueError("ymin should be finite") if not np.isfinite(ymax): raise ValueError("ymax should be finite") if xmax <= xmin: raise ValueError("xmax should be greater than xmin") if ymax <= ymin: raise ValueError("ymax should be greater than ymin") if nx <= 0: raise ValueError("nx should be strictly positive") if ny <= 0: raise ValueError("ny should be strictly positive") if weights is None: return _histogram2d(x, y, nx, xmin, xmax, ny, ymin, ymax) else: return _histogram2d_weighted(x, y, weights, nx, xmin, xmax, ny, ymin, ymax) fast-histogram-0.7/fast_histogram/tests/0000755000077000000240000000000013415414723020406 5ustar tomstaff00000000000000fast-histogram-0.7/fast_histogram/tests/__init__.py0000644000077000000240000000000013414215513022500 0ustar tomstaff00000000000000fast-histogram-0.7/fast_histogram/tests/test_histogram.py0000644000077000000240000001446213414215513024016 0ustar tomstaff00000000000000import numpy as np import pytest from hypothesis import given, settings, example, assume from hypothesis import strategies as st from hypothesis.extra.numpy import arrays from ..histogram import histogram1d, histogram2d # NOTE: for now we don't test the full range of floating-point values in the # tests below, because Numpy's behavior isn't always deterministic in some # of the extreme regimes. We should add manual (non-hypothesis and not # comparing to Numpy) test cases.
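# Editorial sketch, not part of the upstream test suite: one possible shape
# for the manual test cases mentioned in the NOTE above. The names
# ``_reference_histogram2d`` and ``test_manual_reference_2d`` are
# hypothetical. The helper restates, in pure Python, the binning rule used by
# the weighted 2D inner loop in the C extension (half-open ranges and a
# truncated bin index), so the expected counts can be checked by hand without
# calling Numpy. In this module its output could be compared with
# ``histogram2d`` via ``np.testing.assert_equal``.


def _reference_histogram2d(x, y, nx, xmin, xmax, ny, ymin, ymax, weights=None):
    # counts[ix][iy] matches the (nx, ny) layout of the C output array
    counts = [[0.0] * ny for _ in range(nx)]
    normx = 1. / (xmax - xmin)
    normy = 1. / (ymax - ymin)
    if weights is None:
        weights = [1.0] * len(x)
    for tx, ty, tw in zip(x, y, weights):
        # Points on the upper edges (tx == xmax or ty == ymax) are excluded
        if xmin <= tx < xmax and ymin <= ty < ymax:
            ix = int((tx - xmin) * normx * nx)
            iy = int((ty - ymin) * normy * ny)
            counts[ix][iy] += tw
    return counts


def test_manual_reference_2d():
    # Three points fall inside [0, 1) x [0, 1); the last two fall outside
    x = [0.25, 0.75, 0.75, 1.0, -0.5]
    y = [0.25, 0.25, 0.75, 0.5, 0.5]
    result = _reference_histogram2d(x, y, 2, 0., 1., 2, 0., 1.)
    assert result == [[1.0, 0.0], [1.0, 1.0]]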
@given(size=st.integers(0, 100), nx=st.integers(1, 10), xmin=st.floats(-1e10, 1e10), xmax=st.floats(-1e10, 1e10), weights=st.booleans(), dtype=st.sampled_from(['>f4', 'f8', '= xmin) if weights: assume(np.allclose(np.sum(w[inside]), np.sum(reference))) else: n_inside = np.sum(inside) assume(n_inside == np.sum(reference)) fast = histogram1d(x, bins=nx, weights=w, range=(xmin, xmax)) # Numpy returns results for 32-bit results as a 32-bit histogram, but only # for 1D arrays. Since this is a summation variable it makes sense to # return 64-bit, so rather than changing the behavior of histogram1d, we # cast to 32-bit float here. if 'f4' in dtype: fast = fast.astype(np.float32) np.testing.assert_equal(fast, reference) @given(size=st.integers(0, 100), nx=st.integers(1, 10), xmin=st.floats(-1e10, 1e10), xmax=st.floats(-1e10, 1e10), ny=st.integers(1, 10), ymin=st.floats(-1e10, 1e10), ymax=st.floats(-1e10, 1e10), weights=st.booleans(), dtype=st.sampled_from(['>f4', 'f8', '= xmin) & (y <= ymax) & (y >= ymin) if weights: assume(np.allclose(np.sum(w[inside]), np.sum(reference))) else: n_inside = np.sum(inside) assume(n_inside == np.sum(reference)) fast = histogram2d(x, y, bins=(nx, ny), weights=w, range=((xmin, xmax), (ymin, ymax))) np.testing.assert_equal(fast, reference) def test_nd_arrays(): x = np.random.random(1000) result_1d = histogram1d(x, bins=10, range=(0, 1)) result_3d = histogram1d(x.reshape((10, 10, 10)), bins=10, range=(0, 1)) np.testing.assert_equal(result_1d, result_3d) y = np.random.random(1000) result_1d = histogram2d(x, y, bins=(10, 10), range=[(0, 1), (0, 1)]) result_3d = histogram2d(x.reshape((10, 10, 10)), y.reshape((10, 10, 10)), bins=(10, 10), range=[(0, 1), (0, 1)]) np.testing.assert_equal(result_1d, result_3d) def test_list(): # Make sure that lists can be passed in x_list = [1.4, 2.1, 4.2] x_arr = np.array(x_list) result_list = histogram1d(x_list, bins=10, range=(0, 10)) result_arr = histogram1d(x_arr, bins=10, range=(0, 10)) 
np.testing.assert_equal(result_list, result_arr) def test_non_contiguous(): x = np.random.random((10, 10, 10))[::2, ::3, :] y = np.random.random((10, 10, 10))[::2, ::3, :] w = np.random.random((10, 10, 10))[::2, ::3, :] assert not x.flags.c_contiguous assert not x.flags.f_contiguous result_1 = histogram1d(x, bins=10, range=(0, 1)) result_2 = histogram1d(x.copy(), bins=10, range=(0, 1)) np.testing.assert_equal(result_1, result_2) result_1 = histogram1d(x, bins=10, range=(0, 1), weights=w) result_2 = histogram1d(x.copy(), bins=10, range=(0, 1), weights=w) np.testing.assert_equal(result_1, result_2) result_1 = histogram2d(x, y, bins=(10, 10), range=[(0, 1), (0, 1)]) result_2 = histogram2d(x.copy(), y.copy(), bins=(10, 10), range=[(0, 1), (0, 1)]) np.testing.assert_equal(result_1, result_2) result_1 = histogram2d(x, y, bins=(10, 10), range=[(0, 1), (0, 1)], weights=w) result_2 = histogram2d(x.copy(), y.copy(), bins=(10, 10), range=[(0, 1), (0, 1)], weights=w) np.testing.assert_equal(result_1, result_2) def test_array_bins(): edges = np.array([0, 1, 2, 3, 4]) with pytest.raises(TypeError) as exc: histogram1d([1, 2, 3], bins=edges, range=(0, 10)) assert exc.value.args[0] == 'bins should be an integer' with pytest.raises(TypeError) as exc: histogram2d([1, 2, 3], [1, 2 ,3], bins=[edges, edges], range=[(0, 10), (0, 10)]) assert exc.value.args[0] == 'bins should be an iterable of two integers' fast-histogram-0.7/fast_histogram.egg-info/0000755000077000000240000000000013415414723020736 5ustar tomstaff00000000000000fast-histogram-0.7/fast_histogram.egg-info/PKG-INFO0000644000077000000240000001773113415414723022044 0ustar tomstaff00000000000000Metadata-Version: 1.0 Name: fast-histogram Version: 0.7 Summary: Fast simple 1D and 2D histograms Home-page: https://github.com/astrofrog/fast-histogram Author: Thomas Robitaille Author-email: thomas.robitaille@gmail.com License: BSD Description: |Travis Status| |AppVeyor Status| |CircleCI Status| |asv| About ----- Sometimes you just want 
to compute simple 1D or 2D histograms with regular bins. Fast. No nonsense. `Numpy's `__ histogram functions are versatile, and can handle for example non-regular binning, but this versatility comes at the expense of performance. The **fast-histogram** mini-package aims to provide simple and fast histogram functions for regular bins that don't compromise on performance. It doesn't do anything complicated - it just implements a simple histogram algorithm in C and keeps it simple. The aim is to have functions that are fast but also robust and reliable. The result is a 1D histogram function here that is **7-15x faster** than ``numpy.histogram``, and a 2D histogram function that is **20-25x faster** than ``numpy.histogram2d``. To install:: pip install fast-histogram or if you use conda you can instead do:: conda install -c conda-forge fast-histogram The ``fast_histogram`` module then provides two functions: ``histogram1d`` and ``histogram2d``: .. code:: python from fast_histogram import histogram1d, histogram2d Example ------- Here's an example of binning 10 million points into a regular 2D histogram: .. code:: python In [1]: import numpy as np In [2]: x = np.random.random(10_000_000) In [3]: y = np.random.random(10_000_000) In [4]: %timeit _ = np.histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30) 935 ms ± 58.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [5]: from fast_histogram import histogram2d In [6]: %timeit _ = histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30) 40.2 ms ± 624 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) (note that ``10_000_000`` is possible in Python 3.6 syntax, use ``10000000`` instead in previous versions) The version here is over 20 times faster! The following plot shows the speedup as a function of array size for the bin parameters shown above: .. 
figure:: https://github.com/astrofrog/fast-histogram/raw/master/speedup_compared.png :alt: Comparison of performance between Numpy and fast-histogram as well as results for the 1D case, also with 30 bins. The speedup for the 2D case is consistently between 20-25x, and for the 1D case goes from 15x for small arrays to around 7x for large arrays. Q&A --- Why don't the histogram functions return the edges? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Computing and returning the edges may seem trivial but it can slow things down by a factor of a few when computing histograms of 10^5 or fewer elements, so not returning the edges is a deliberate decision related to performance. You can easily compute the edges yourself if needed though, using ``numpy.linspace``. Doesn't package X already do this, but better? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This may very well be the case! If this duplicates another package, or if it is possible to use Numpy in a smarter way to get the same performance gains, please open an issue and I'll consider deprecating this package :) One package that does include fast histogram functions (including in n-dimensions) and can compute other statistics is `vaex `_, so take a look there if you need more advanced functionality! Are the 2D histograms not transposed compared to what they should be? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ There is technically no 'right' and 'wrong' orientation - here we adopt the convention which gives results consistent with Numpy, so: .. code:: python numpy.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny]) should give the same result as: .. code:: python fast_histogram.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny]) Why not contribute this to Numpy directly? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ As mentioned above, the Numpy functions are much more versatile, so they could not be replaced by the ones here. 
One option would be to check in Numpy's functions for cases that are simple and dispatch to functions such as the ones here, or add dedicated functions for regular binning. I hope we can get this in Numpy in some form or another eventually, but for now, the aim is to have this available to packages that need to support a range of Numpy versions. Why not use Cython? ~~~~~~~~~~~~~~~~~~~ I originally implemented this in Cython, but found that I could get a 50% performance improvement by going straight to a C extension. What about using Numba? ~~~~~~~~~~~~~~~~~~~~~~~ I specifically want to keep this package as easy as possible to install, and while `Numba `__ is a great package, it is not trivial to install outside of Anaconda. Could this be parallelized? ~~~~~~~~~~~~~~~~~~~~~~~~~~~ This may benefit from parallelization under certain circumstances. The easiest solution might be to use OpenMP, but this won't work on all platforms, so it would need to be made optional. Couldn't you make it faster by using the GPU? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Almost certainly, though the aim here is to have an easily installable and portable package, and introducing GPUs is going to affect both of these. Why make a package specifically for this? This is a tiny amount of functionality ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Packages that need this could simply bundle their own C extension or Cython code to do this, but the main motivation for releasing this as a mini-package is to avoid making pure-Python packages into packages that require compilation just because of the need to compute fast histograms. Can I contribute? ~~~~~~~~~~~~~~~~~ Yes please! This is not meant to be a finished package, and I welcome pull requests to improve things. .. |Travis Status| image:: https://travis-ci.org/astrofrog/fast-histogram.svg?branch=master :target: https://travis-ci.org/astrofrog/fast-histogram .. 
|AppVeyor Status| image:: https://ci.appveyor.com/api/projects/status/ek63g9haku5on0q2/branch/master?svg=true :target: https://ci.appveyor.com/project/astrofrog/fast-histogram .. |CircleCI Status| image:: https://circleci.com/gh/astrofrog/fast-histogram/tree/master.svg?style=svg :target: https://circleci.com/gh/astrofrog/fast-histogram/tree/master .. |asv| image:: https://img.shields.io/badge/benchmarked%20by-asv-brightgreen.svg :target: https://astrofrog.github.io/fast-histogram Platform: UNKNOWN fast-histogram-0.7/fast_histogram.egg-info/SOURCES.txt0000644000077000000240000000062013415414723022620 0ustar tomstaff00000000000000CHANGES.rst LICENSE MANIFEST.in README.rst setup.py fast_histogram/__init__.py fast_histogram/_histogram_core.c fast_histogram/histogram.py fast_histogram.egg-info/PKG-INFO fast_histogram.egg-info/SOURCES.txt fast_histogram.egg-info/dependency_links.txt fast_histogram.egg-info/requires.txt fast_histogram.egg-info/top_level.txt fast_histogram/tests/__init__.py fast_histogram/tests/test_histogram.pyfast-histogram-0.7/fast_histogram.egg-info/dependency_links.txt0000644000077000000240000000000113415414723025004 0ustar tomstaff00000000000000 fast-histogram-0.7/fast_histogram.egg-info/requires.txt0000644000077000000240000000000613415414723023332 0ustar tomstaff00000000000000numpy fast-histogram-0.7/fast_histogram.egg-info/top_level.txt0000644000077000000240000000001713415414723023466 0ustar tomstaff00000000000000fast_histogram fast-histogram-0.7/setup.cfg0000644000077000000240000000004613415414723016053 0ustar tomstaff00000000000000[egg_info] tag_build = tag_date = 0 fast-histogram-0.7/setup.py0000644000077000000240000000277613415414647015765 0ustar tomstaff00000000000000import os import io import sys from setuptools import setup from setuptools.extension import Extension from setuptools.command.build_ext import build_ext class build_ext_with_numpy(build_ext): def run(self): import numpy self.include_dirs.append(numpy.get_include()) 
build_ext.run(self) extensions = [Extension("fast_histogram._histogram_core", [os.path.join('fast_histogram', '_histogram_core.c')])] with io.open('README.rst', encoding='utf-8') as f: LONG_DESCRIPTION = f.read() try: import numpy except ImportError: # We include an upper limit to the version because setup_requires is # honored by easy_install not pip, and the former doesn't ignore pre- # releases. It's not an issue if the package is built against 1.15 and # then 1.16 gets installed after, but it still makes sense to update the # upper limit whenever a new version of Numpy is released. setup_requires = ['numpy<1.16'] else: setup_requires = [] setup(name='fast-histogram', version='0.7', description='Fast simple 1D and 2D histograms', long_description=LONG_DESCRIPTION, setup_requires=setup_requires, install_requires=['numpy'], author='Thomas Robitaille', author_email='thomas.robitaille@gmail.com', license='BSD', url='https://github.com/astrofrog/fast-histogram', packages=['fast_histogram', 'fast_histogram.tests'], ext_modules=extensions, cmdclass={'build_ext': build_ext_with_numpy})