pax_global_header00006660000000000000000000000064132323607520014515gustar00rootroot0000000000000052 comment=c2b871c0d022e04963000b6506d14c69f5bc8edd sparse-0.2.0/000077500000000000000000000000001323236075200130115ustar00rootroot00000000000000sparse-0.2.0/.gitignore000066400000000000000000000013651323236075200150060ustar00rootroot00000000000000#####=== Python ===##### # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] *$py.class # C extensions *.so # Distribution / packaging .Python env/ build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib/ lib64/ parts/ sdist/ var/ *.egg-info/ .installed.cfg *.egg # PyInstaller # Usually these files are written by a python script from a template # before PyInstaller builds the exe, so as to inject date/other infos into it. *.manifest *.spec # Installer logs pip-log.txt pip-delete-this-directory.txt # Unit test / coverage reports htmlcov/ .tox/ .coverage .coverage.* .cache nosetests.xml coverage.xml *,cover # Translations *.mo *.pot # Django stuff: *.log # Sphinx documentation docs/_build/ # PyBuilder target/ # PyCharm .idea/ sparse-0.2.0/.travis.yml000066400000000000000000000011401323236075200151160ustar00rootroot00000000000000sudo: False language: python matrix: include: - python: 2.7 - python: 3.6 install: # Install conda - wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh - bash miniconda.sh -b -p $HOME/miniconda - export PATH="$HOME/miniconda/bin:$PATH" - conda config --set always_yes yes --set changeps1 no - conda update conda # Install dependencies - conda create -n test-sparse python=$TRAVIS_PYTHON_VERSION pytest numpy scipy flake8 nomkl - source activate test-sparse - pip install -e .[tests] script: - py.test notifications: email: false sparse-0.2.0/LICENSE.rst000066400000000000000000000030461323236075200146300ustar00rootroot00000000000000Modified BSD License ==================== | *Copyright © 2017, Continuum Analytics, Inc. and contributors* | *All rights reserved.* Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the Continuum Analytics nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
sparse-0.2.0/MANIFEST.in000066400000000000000000000003241323236075200145460ustar00rootroot00000000000000
recursive-include sparse *.py
recursive-include sparse *.html
recursive-include docs *.rst

include setup.py
include README.rst
include LICENSE.rst
include MANIFEST.in
include requirements.txt

prune docs/_build
sparse-0.2.0/README.rst000066400000000000000000000115421323236075200145030ustar00rootroot00000000000000
Sparse Multidimensional Arrays
==============================

|Build Status|

This implements sparse multidimensional arrays on top of NumPy and
scipy.sparse.  It generalizes the scipy.sparse.coo_matrix_ layout but extends
beyond just rows and columns to an arbitrary number of dimensions.

The original motivation is for machine learning algorithms, but it is
intended for somewhat general use.

This Supports
--------------

-  NumPy ufuncs (where zeros are preserved)
-  Binary operations with other :code:`COO` objects, where zeros are preserved
-  Binary operations with Scipy sparse matrices, where zeros are preserved
-  Binary operations with scalars, where zeros are preserved
-  Broadcasting binary operations and :code:`broadcast_to`
-  Reductions (sum, max, min, prod, ...)
-  Reshape
-  Transpose
-  Tensordot
-  triu, tril
-  Slicing with integers, lists, and slices (with no step value)
-  Concatenation and stacking

This may yet support
--------------------

A "does not support" list is hard to build because it is infinitely long.
However, the following things are in scope, relatively doable, and not yet
built (help welcome).

-  Incremental building of arrays and in-place updates
-  More operations supported by NumPy :code:`ndarray` objects, such as
   :code:`argmin` and :code:`argmax`
-  Array building functions such as :code:`eye` and :code:`spdiags`.  See
   `building sparse matrices`_.
-  Linear algebra operations such as :code:`inv`, :code:`norm` and
   :code:`solve`.  See scipy.sparse.linalg_.

There are no plans to support
-----------------------------

-  Parallel computing (though Dask.array may use this in the future)

Example
-------

::

    pip install sparse

.. code-block:: python

    import numpy as np
    n = 1000
    ndims = 4
    nnz = 1000000
    coords = np.random.randint(0, n - 1, size=(ndims, nnz))
    data = np.random.random(nnz)

    import sparse
    x = sparse.COO(coords, data, shape=((n,) * ndims))

    x.nbytes
    # 16000000

    y = sparse.tensordot(x, x, axes=((3, 0), (1, 2)))

    z = y.sum(axis=(0, 1, 2))
    z.todense()
    # array([ 244.0671803 ,  246.38455787,  243.43383158,  256.46068737,
    #         261.18598416,  256.36439011,  271.74177584,  238.56059193,
    #         ...

How does this work?
-------------------

scipy.sparse implements decent 2-d sparse matrix objects for the standard
layouts, notably for our purposes CSR, CSC, and COO.  However it doesn't
include support for sparse arrays of greater than 2 dimensions.

This library extends the COO layout, which stores the row index, column
index, and value of every element:

=== === ====
row col data
=== === ====
  0   0   10
  0   2   13
  1   3    9
  3   8   21
=== === ====

It is straightforward to extend the COO layout to an arbitrary number of
dimensions:

==== ==== ==== === ====
dim1 dim2 dim3 ... data
==== ==== ==== === ====
   0    0    0   .   10
   0    0    3   .   13
   0    2    2   .    9
   3    1    4   .   21
==== ==== ==== === ====

This makes it easy to *store* a multidimensional sparse array, but we still
need to reimplement all of the array operations like transpose, reshape,
slicing, tensordot, reductions, etc., which can be quite challenging in
general.
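
To make the layout concrete, here is a minimal sketch that stores a small
three-dimensional array in this format.  The coordinates below are invented
for illustration and are not taken from the table above.

.. code-block:: python

    import numpy as np
    import sparse

    # Each row of `coords` holds one dimension; each column is one nonzero.
    coords = np.array([[0, 0, 0, 3],    # dim1
                       [0, 0, 2, 1],    # dim2
                       [0, 3, 2, 4]])   # dim3
    data = np.array([10, 13, 9, 21])

    s = sparse.COO(coords, data, shape=(4, 3, 5))
    assert s.todense()[0, 0, 0] == 10
    assert s.todense()[3, 1, 4] == 21
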
Fortunately in many cases we can leverage the existing scipy.sparse
algorithms if we can intelligently transpose and reshape our multi-dimensional
array into an appropriate 2-d sparse matrix, perform a modified sparse matrix
operation, and then reshape and transpose back.  These reshape and transpose
operations can all be done at numpy speeds by modifying the arrays of
coordinates.  After scipy.sparse runs its operations (coded in C), we can
convert back using the same path of reshapings and transpositions in reverse.

This approach is not novel; it has been around in the multidimensional array
community for a while.  It is also how some operations in numpy work.  For
example the ``numpy.tensordot`` function performs transposes and reshapes so
that it can use the ``numpy.dot`` function for matrix multiplication, which
is backed by fast BLAS implementations.  The ``sparse.tensordot`` code is a
very slight modification of ``numpy.tensordot``, replacing ``numpy.dot`` with
``scipy.sparse.csr_matrix.dot``.

LICENSE
-------

This is licensed under the New BSD (3-clause) license.

.. _scipy.sparse.coo_matrix: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html
.. _building sparse matrices: https://docs.scipy.org/doc/scipy/reference/sparse.html#functions
.. _scipy.sparse.linalg: https://docs.scipy.org/doc/scipy/reference/sparse.linalg.html

.. |Build Status| image:: https://travis-ci.org/mrocklin/sparse.svg?branch=master
   :target: https://travis-ci.org/mrocklin/sparse
sparse-0.2.0/docs/000077500000000000000000000000001323236075200137415ustar00rootroot00000000000000
sparse-0.2.0/docs/_templates/000077500000000000000000000000001323236075200160765ustar00rootroot00000000000000
sparse-0.2.0/docs/_templates/autosummary/000077500000000000000000000000001323236075200204645ustar00rootroot00000000000000
sparse-0.2.0/docs/_templates/autosummary/base.rst000066400000000000000000000001511323236075200221250ustar00rootroot00000000000000
{{ objname | escape | underline}}

.. currentmodule:: {{ module }}

.. auto{{ objtype }}:: {{ objname }}
sparse-0.2.0/docs/_templates/autosummary/class.rst000066400000000000000000000010171323236075200223220ustar00rootroot00000000000000
{{ objname | escape | underline}}

.. currentmodule:: {{ module }}

.. autoclass:: {{ objname }}

   {% block attributes %}
   {% if attributes %}
   .. rubric:: Attributes

   .. autosummary::
      :toctree:
   {% for item in attributes %}
      {{ name }}.{{ item }}
   {% endfor %}
   {% endif %}
   {% endblock %}

   {% block methods %}
   {% if methods %}
   .. rubric:: Methods

   .. autosummary::
      :toctree:
   {% for item in methods %}
      {{ name }}.{{ item }}
   {% endfor %}
   {% endif %}
   {% endblock %}
sparse-0.2.0/docs/_templates/autosummary/module.rst000066400000000000000000000006541323236075200225100ustar00rootroot00000000000000
{{ fullname | escape | underline }}

.. rubric:: Description

.. automodule:: {{ fullname }}

.. currentmodule:: {{ fullname }}

{% if classes %}
.. rubric:: Classes

.. autosummary::
   :toctree:
{% for class in classes %}
   {{ class }}
{% endfor %}
{% endif %}

{% if functions %}
.. rubric:: Functions

.. autosummary::
   :toctree:
{% for function in functions %}
   {{ function }}
{% endfor %}
{% endif %}
sparse-0.2.0/docs/api.rst000066400000000000000000000001441323236075200152430ustar00rootroot00000000000000
API Reference
=============

.. rubric:: Modules
.. autosummary::
   :toctree: generated

   sparse
sparse-0.2.0/docs/changelog.rst000066400000000000000000000040351323236075200164240ustar00rootroot00000000000000
Changelog
=========

0.2.0 / 2018-01-25
-------------------

- Add Elementwise broadcasting and broadcast_to (:pr:`35`) `Hameer Abbasi`_
- Add Bitwise ops (:pr:`38`) `Hameer Abbasi`_
- Add slicing support for Ellipsis and None (:pr:`37`) `Matthew Rocklin`_
- Add triu and tril and tests (:pr:`40`) `Hameer Abbasi`_
- Extend gitignore file (:pr:`42`) `Nils Werner`_
- Update MANIFEST.in (:pr:`45`) `Matthew Rocklin`_
- Remove auto densification and unify operator code (:pr:`46`) `Hameer Abbasi`_
- Fix nnz for scalars (:pr:`48`) `Hameer Abbasi`_
- Update README (:pr:`50`) (:pr:`53`) `Hameer Abbasi`_
- Fix large concatenations and stacks (:pr:`50`) `Hameer Abbasi`_
- Add __array_ufunc__ for __call__ and reduce (:pr:`r9`) `Hameer Abbasi`_
- Update documentation (:pr:`54`) `Hameer Abbasi`_
- Flake8 and coverage in pytest (:pr:`59`) `Nils Werner`_
- Copy constructor (:pr:`55`) `Nils Werner`_
- Add random function (:pr:`41`) `Nils Werner`_
- Add lots of indexing features (:pr:`57`) `Hameer Abbasi`_
- Validate .transpose axes (:pr:`61`) `Nils Werner`_
- Simplify axes normalization logic `Nils Werner`_
- Use higher density for sparse.random in tests (:pr:`64`) `Keisuke Fujii`_
- Support left-side np.number elemwise operations (:pr:`67`) `Keisuke Fujii`_
- Support len on COO (:pr:`68`) `Nils Werner`_
- Update scipy version in requirements (:pr:`70`) `Hameer Abbasi`_
- Documentation (:pr:`43`) `Nils Werner`_ and `Hameer Abbasi`_
- Use Tox for cross Python-version testing (:pr:`77`) `Nils Werner`_
- Support mixed sparse-dense when result is sparse (:pr:`75`) `Hameer Abbasi`_
- Update contributing.rst (:pr:`76`) `Hameer Abbasi`_
- Size and density properties (:pr:`69`) `Nils Werner`_
- Fix large sum (:pr:`83`) `Hameer Abbasi`_
- Add DOK (:pr:`85`) `Hameer Abbasi`_
- Implement __array__ protocol (:pr:`87`) `Matthew Rocklin`_

.. _`Matthew Rocklin`: https://github.com/mrocklin
.. _`Hameer Abbasi`: https://github.com/hameerabbasi
.. _`Nils Werner`: https://github.com/nils-werner
.. _`Keisuke Fujii`: https://github.com/fujiisoup
sparse-0.2.0/docs/conf.py000066400000000000000000000135431323236075200152460ustar00rootroot00000000000000
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# sparse documentation build configuration file, created by
# sphinx-quickstart on Fri Dec 29 20:58:03 2017.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath('..'))

# -- General configuration ------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [ 'sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx', 'sphinx.ext.coverage', 'sphinx.ext.mathjax', 'sphinx.ext.napoleon', 'sphinx.ext.viewcode', 'sphinx.ext.autosummary', 'sphinx.ext.inheritance_diagram', 'sphinx.ext.extlinks', ] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] mathjax_path = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" # The suffix(es) of source filenames. # You can specify multiple suffix as a list of string: # # source_suffix = ['.rst', '.md'] source_suffix = '.rst' # The master toctree document. master_doc = 'index' # General information about the project. project = 'sparse' copyright = '2017, Sparse Developers' author = 'Sparse Developers' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. version = '0.1.1' # The full version, including alpha/beta/rc tags. release = '0.1.1' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. # # This is also used if you do content translation via gettext catalogs. # Usually you set "language" from the command line for these cases. language = None # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. # This patterns also effect to html_static_path and html_extra_path exclude_patterns = ['_build', '**tests**', '**setup**', '**extern**', '**data**'] # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # If true, `todo` and `todoList` produce output, else they produce nothing. todo_include_todos = False autosummary_generate = True # -- Options for HTML output ---------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. # html_theme = "sphinx_rtd_theme" # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. # # html_theme_options = {} # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". # html_static_path = ['_static'] # Custom sidebar templates, must be a dictionary that maps document names # to template names. # # This is required for the alabaster theme # refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars # html_sidebars = { # '**': [ # 'relations.html', # needs 'show_related': True theme option to display # 'searchbox.html', # ] # } # -- Options for HTMLHelp output ------------------------------------------ # Output file base name for HTML help builder. htmlhelp_basename = 'sparsedoc' # -- Options for LaTeX output --------------------------------------------- latex_elements = { # The paper size ('letterpaper' or 'a4paper'). # # 'papersize': 'letterpaper', # The font size ('10pt', '11pt' or '12pt'). # # 'pointsize': '10pt', # Additional stuff for the LaTeX preamble. # # 'preamble': '', # Latex figure (float) alignment # # 'figure_align': 'htbp', } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, # author, documentclass [howto, manual, or own class]). 
latex_documents = [ (master_doc, 'sparse.tex', 'sparse Documentation', 'Sparse Developers', 'manual'), ] # -- Options for manual page output --------------------------------------- # One entry per manual page. List of tuples # (source start file, name, description, authors, manual section). man_pages = [ (master_doc, 'sparse', 'sparse Documentation', [author], 1) ] # -- Options for Texinfo output ------------------------------------------- # Grouping the document tree into Texinfo files. List of tuples # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ (master_doc, 'sparse', 'sparse Documentation', author, 'sparse', 'One line description of project.', 'Miscellaneous'), ] # Example configuration for intersphinx: refer to the Python standard library. intersphinx_mapping = { 'python': ('https://docs.python.org/3/', None), 'numpy': ('https://docs.scipy.org/doc/numpy/', None), 'scipy': ('https://docs.scipy.org/doc/scipy/reference/', None) } extlinks = { 'issue': ('https://github.com/mrocklin/sparse/issues/%s', 'GH#'), 'pr': ('https://github.com/mrocklin/sparse/pull/%s', 'GH#'), } sparse-0.2.0/docs/contributing.rst000066400000000000000000000053231323236075200172050ustar00rootroot00000000000000Contributing to sparse ====================== General Guidelines ------------------ sparse is a community-driven project on GitHub. You can find our `repository on GitHub `_. Feel free to open issues for new features or bugs, or open a pull request to fix a bug or add a new feature. If you haven't contributed to open-source before, we recommend you read `this excellent guide by GitHub on how to contribute to open source `_. The guide is long, so you can gloss over things you're familiar with. If you're not already familiar with it, we follow the `fork and pull model `_ on GitHub. Running/Adding Unit Tests ------------------------- It is best if all new functionality and/or bug fixes have unit tests added with each use-case. Since we support both Python 2.7 and Python 3.5 and newer, it is recommended to test with at least these two versions before committing your code or opening a pull request. We use `pytest `_ as our unit testing framework, with the pytest-cov extension to check code coverage and pytest-flake8 to check code style. You don't need to configure these extensions yourself. Once you've configured your environment, you can just :code:`cd` to the root of your repository and run .. code-block:: bash py.test Adding/Building the Documentation --------------------------------- If a feature is stable and relatively finalized, it is time to add it to the documentation. If you are adding any private/public functions, it is best to add docstrings, to aid in reviewing code and also for the API reference. We use `Numpy style docstrings `_ and `Sphinx `_ to document this library. Sphinx, in turn, uses `reStructuredText `_ as its markup language for adding code. We use the `Sphinx Autosummary extension `_ to generate API references. In particular, you may want do look at the :code:`docs/generated` directory to see how these files look and where to add new functions, classes or modules. For example, if you add a new function to the :code:`sparse.COO` class, you would open up :code:`docs/generated/sparse.COO.rst`, and add in the name of the function where appropriate. To build the documentation, you can :code:`cd` into the :code:`docs` directory and run .. code-block:: bash sphinx-build -b html . 
_build/html After this, you can find an HTML version of the documentation in :code:`docs/_build/html/index.html`. sparse-0.2.0/docs/generated/000077500000000000000000000000001323236075200156775ustar00rootroot00000000000000sparse-0.2.0/docs/generated/sparse.COO.T.rst000066400000000000000000000001021323236075200205400ustar00rootroot00000000000000COO\.T ====== .. currentmodule:: sparse .. autoattribute:: COO.Tsparse-0.2.0/docs/generated/sparse.COO.abs.rst000066400000000000000000000001051323236075200211050ustar00rootroot00000000000000COO\.abs ======== .. currentmodule:: sparse .. automethod:: COO.abssparse-0.2.0/docs/generated/sparse.COO.astype.rst000066400000000000000000000001161323236075200216470ustar00rootroot00000000000000COO\.astype =========== .. currentmodule:: sparse .. automethod:: COO.astypesparse-0.2.0/docs/generated/sparse.COO.broadcast_to.rst000066400000000000000000000001421323236075200230050ustar00rootroot00000000000000COO\.broadcast\_to ================== .. currentmodule:: sparse .. automethod:: COO.broadcast_tosparse-0.2.0/docs/generated/sparse.COO.ceil.rst000066400000000000000000000001101323236075200212500ustar00rootroot00000000000000COO\.ceil ========= .. currentmodule:: sparse .. automethod:: COO.ceilsparse-0.2.0/docs/generated/sparse.COO.conj.rst000066400000000000000000000001101323236075200212650ustar00rootroot00000000000000COO\.conj ========= .. currentmodule:: sparse .. automethod:: COO.conjsparse-0.2.0/docs/generated/sparse.COO.conjugate.rst000066400000000000000000000001271323236075200223230ustar00rootroot00000000000000COO\.conjugate ============== .. currentmodule:: sparse .. automethod:: COO.conjugatesparse-0.2.0/docs/generated/sparse.COO.density.rst000066400000000000000000000001251323236075200220210ustar00rootroot00000000000000COO\.density ============ .. currentmodule:: sparse .. autoattribute:: COO.density sparse-0.2.0/docs/generated/sparse.COO.dot.rst000066400000000000000000000001051323236075200211260ustar00rootroot00000000000000COO\.dot ======== .. currentmodule:: sparse .. automethod:: COO.dotsparse-0.2.0/docs/generated/sparse.COO.dtype.rst000066400000000000000000000001161323236075200214670ustar00rootroot00000000000000COO\.dtype ========== .. currentmodule:: sparse .. autoattribute:: COO.dtypesparse-0.2.0/docs/generated/sparse.COO.elemwise.rst000066400000000000000000000001241323236075200221530ustar00rootroot00000000000000COO\.elemwise ============= .. currentmodule:: sparse .. automethod:: COO.elemwisesparse-0.2.0/docs/generated/sparse.COO.enable_caching.rst000066400000000000000000000001501323236075200232420ustar00rootroot00000000000000COO\.enable\_caching ==================== .. currentmodule:: sparse .. automethod:: COO.enable_cachingsparse-0.2.0/docs/generated/sparse.COO.exp.rst000066400000000000000000000001051323236075200211340ustar00rootroot00000000000000COO\.exp ======== .. currentmodule:: sparse .. automethod:: COO.expsparse-0.2.0/docs/generated/sparse.COO.expm1.rst000066400000000000000000000001131323236075200213710ustar00rootroot00000000000000COO\.expm1 ========== .. currentmodule:: sparse .. automethod:: COO.expm1sparse-0.2.0/docs/generated/sparse.COO.floor.rst000066400000000000000000000001131323236075200214600ustar00rootroot00000000000000COO\.floor ========== .. currentmodule:: sparse .. automethod:: COO.floorsparse-0.2.0/docs/generated/sparse.COO.from_numpy.rst000066400000000000000000000001341323236075200225350ustar00rootroot00000000000000COO\.from\_numpy ================ .. currentmodule:: sparse .. 
automethod:: COO.from_numpysparse-0.2.0/docs/generated/sparse.COO.from_scipy_sparse.rst000066400000000000000000000001631323236075200240730ustar00rootroot00000000000000COO\.from\_scipy\_sparse ======================== .. currentmodule:: sparse .. automethod:: COO.from_scipy_sparsesparse-0.2.0/docs/generated/sparse.COO.linear_loc.rst000066400000000000000000000001341323236075200224510ustar00rootroot00000000000000COO\.linear\_loc ================ .. currentmodule:: sparse .. automethod:: COO.linear_locsparse-0.2.0/docs/generated/sparse.COO.log1p.rst000066400000000000000000000001131323236075200213610ustar00rootroot00000000000000COO\.log1p ========== .. currentmodule:: sparse .. automethod:: COO.log1psparse-0.2.0/docs/generated/sparse.COO.max.rst000066400000000000000000000001051323236075200211250ustar00rootroot00000000000000COO\.max ======== .. currentmodule:: sparse .. automethod:: COO.maxsparse-0.2.0/docs/generated/sparse.COO.maybe_densify.rst000066400000000000000000000001451323236075200231620ustar00rootroot00000000000000COO\.maybe\_densify =================== .. currentmodule:: sparse .. automethod:: COO.maybe_densifysparse-0.2.0/docs/generated/sparse.COO.min.rst000066400000000000000000000001051323236075200211230ustar00rootroot00000000000000COO\.min ======== .. currentmodule:: sparse .. automethod:: COO.minsparse-0.2.0/docs/generated/sparse.COO.nbytes.rst000066400000000000000000000001211323236075200216420ustar00rootroot00000000000000COO\.nbytes =========== .. currentmodule:: sparse .. autoattribute:: COO.nbytessparse-0.2.0/docs/generated/sparse.COO.ndim.rst000066400000000000000000000001131323236075200212660ustar00rootroot00000000000000COO\.ndim ========= .. currentmodule:: sparse .. autoattribute:: COO.ndimsparse-0.2.0/docs/generated/sparse.COO.nnz.rst000066400000000000000000000001101323236075200211410ustar00rootroot00000000000000COO\.nnz ======== .. currentmodule:: sparse .. autoattribute:: COO.nnzsparse-0.2.0/docs/generated/sparse.COO.prod.rst000066400000000000000000000001101323236075200213000ustar00rootroot00000000000000COO\.prod ========= .. currentmodule:: sparse .. automethod:: COO.prodsparse-0.2.0/docs/generated/sparse.COO.reduce.rst000066400000000000000000000001161323236075200216110ustar00rootroot00000000000000COO\.reduce =========== .. currentmodule:: sparse .. automethod:: COO.reducesparse-0.2.0/docs/generated/sparse.COO.reshape.rst000066400000000000000000000001211323236075200217650ustar00rootroot00000000000000COO\.reshape ============ .. currentmodule:: sparse .. automethod:: COO.reshapesparse-0.2.0/docs/generated/sparse.COO.rint.rst000066400000000000000000000001101323236075200213100ustar00rootroot00000000000000COO\.rint ========= .. currentmodule:: sparse .. automethod:: COO.rintsparse-0.2.0/docs/generated/sparse.COO.round.rst000066400000000000000000000001131323236075200214660ustar00rootroot00000000000000COO\.round ========== .. currentmodule:: sparse .. automethod:: COO.roundsparse-0.2.0/docs/generated/sparse.COO.rst000066400000000000000000000032361323236075200203510ustar00rootroot00000000000000COO === .. currentmodule:: sparse .. autoclass:: COO .. note:: :obj:`COO` objects also support :doc:`operators <../user_manual/operations/basic>` and :doc:`indexing <../user_manual/operations/indexing>` .. rubric:: Attributes .. autosummary:: :toctree: COO.T COO.dtype COO.nbytes COO.ndim COO.nnz COO.size COO.density .. rubric:: :doc:`Constructing COO objects <../user_manual/constructing>` .. autosummary:: :toctree: COO.from_numpy COO.from_scipy_sparse .. 
rubric:: :doc:`Element-wise operations <../user_manual/operations/elemwise>` .. autosummary:: :toctree: COO.elemwise COO.abs COO.astype COO.ceil COO.conj COO.conjugate COO.exp COO.expm1 COO.floor COO.log1p COO.rint COO.round COO.sin COO.sinh COO.sqrt COO.tan COO.tanh .. rubric:: :doc:`Reductions <../user_manual/operations/reductions>` .. autosummary:: :toctree: COO.reduce COO.sum COO.max COO.min COO.prod .. rubric:: :doc:`Converting to other formats <../user_manual/converting>` .. autosummary:: :toctree: COO.todense COO.maybe_densify COO.to_scipy_sparse COO.tocsc COO.tocsr .. rubric:: :doc:`Other operations <../user_manual/operations/other>` .. autosummary:: :toctree: COO.dot COO.reshape COO.transpose .. rubric:: Utility functions .. autosummary:: :toctree: COO.broadcast_to COO.enable_caching COO.linear_loc COO.sort_indices COO.sum_duplicates sparse-0.2.0/docs/generated/sparse.COO.sin.rst000066400000000000000000000001051323236075200211310ustar00rootroot00000000000000COO\.sin ======== .. currentmodule:: sparse .. automethod:: COO.sinsparse-0.2.0/docs/generated/sparse.COO.sinh.rst000066400000000000000000000001101323236075200212750ustar00rootroot00000000000000COO\.sinh ========= .. currentmodule:: sparse .. automethod:: COO.sinhsparse-0.2.0/docs/generated/sparse.COO.size.rst000066400000000000000000000001141323236075200213120ustar00rootroot00000000000000COO\.size ========= .. currentmodule:: sparse .. autoattribute:: COO.size sparse-0.2.0/docs/generated/sparse.COO.sort_indices.rst000066400000000000000000000001421323236075200230260ustar00rootroot00000000000000COO\.sort\_indices ================== .. currentmodule:: sparse .. automethod:: COO.sort_indicessparse-0.2.0/docs/generated/sparse.COO.sqrt.rst000066400000000000000000000001101323236075200213250ustar00rootroot00000000000000COO\.sqrt ========= .. currentmodule:: sparse .. automethod:: COO.sqrtsparse-0.2.0/docs/generated/sparse.COO.sum.rst000066400000000000000000000001051323236075200211440ustar00rootroot00000000000000COO\.sum ======== .. currentmodule:: sparse .. automethod:: COO.sumsparse-0.2.0/docs/generated/sparse.COO.sum_duplicates.rst000066400000000000000000000001501323236075200233610ustar00rootroot00000000000000COO\.sum\_duplicates ==================== .. currentmodule:: sparse .. automethod:: COO.sum_duplicatessparse-0.2.0/docs/generated/sparse.COO.tan.rst000066400000000000000000000001051323236075200211220ustar00rootroot00000000000000COO\.tan ======== .. currentmodule:: sparse .. automethod:: COO.tansparse-0.2.0/docs/generated/sparse.COO.tanh.rst000066400000000000000000000001101323236075200212660ustar00rootroot00000000000000COO\.tanh ========= .. currentmodule:: sparse .. automethod:: COO.tanhsparse-0.2.0/docs/generated/sparse.COO.to_scipy_sparse.rst000066400000000000000000000001551323236075200235530ustar00rootroot00000000000000COO\.to\_scipy\_sparse ====================== .. currentmodule:: sparse .. automethod:: COO.to_scipy_sparsesparse-0.2.0/docs/generated/sparse.COO.tocsc.rst000066400000000000000000000001131323236075200214520ustar00rootroot00000000000000COO\.tocsc ========== .. currentmodule:: sparse .. automethod:: COO.tocscsparse-0.2.0/docs/generated/sparse.COO.tocsr.rst000066400000000000000000000001131323236075200214710ustar00rootroot00000000000000COO\.tocsr ========== .. currentmodule:: sparse .. automethod:: COO.tocsrsparse-0.2.0/docs/generated/sparse.COO.todense.rst000066400000000000000000000001211323236075200217770ustar00rootroot00000000000000COO\.todense ============ .. currentmodule:: sparse .. 
automethod:: COO.todensesparse-0.2.0/docs/generated/sparse.COO.transpose.rst000066400000000000000000000001271323236075200223620ustar00rootroot00000000000000COO\.transpose ============== .. currentmodule:: sparse .. automethod:: COO.transposesparse-0.2.0/docs/generated/sparse.DOK.from_coo.rst000066400000000000000000000001261323236075200221430ustar00rootroot00000000000000DOK\.from\_coo ============== .. currentmodule:: sparse .. automethod:: DOK.from_coosparse-0.2.0/docs/generated/sparse.DOK.from_numpy.rst000066400000000000000000000001341323236075200225320ustar00rootroot00000000000000DOK\.from\_numpy ================ .. currentmodule:: sparse .. automethod:: DOK.from_numpysparse-0.2.0/docs/generated/sparse.DOK.ndim.rst000066400000000000000000000001131323236075200212630ustar00rootroot00000000000000DOK\.ndim ========= .. currentmodule:: sparse .. autoattribute:: DOK.ndimsparse-0.2.0/docs/generated/sparse.DOK.nnz.rst000066400000000000000000000001101323236075200211360ustar00rootroot00000000000000DOK\.nnz ======== .. currentmodule:: sparse .. autoattribute:: DOK.nnzsparse-0.2.0/docs/generated/sparse.DOK.rst000066400000000000000000000005311323236075200203410ustar00rootroot00000000000000DOK === .. currentmodule:: sparse .. autoclass:: DOK .. rubric:: Attributes .. autosummary:: :toctree: DOK.ndim DOK.nnz .. rubric:: Methods .. autosummary:: :toctree: DOK.from_coo DOK.from_numpy DOK.to_coo DOK.todense sparse-0.2.0/docs/generated/sparse.DOK.to_coo.rst000066400000000000000000000001201323236075200216140ustar00rootroot00000000000000DOK\.to\_coo ============ .. currentmodule:: sparse .. automethod:: DOK.to_coosparse-0.2.0/docs/generated/sparse.DOK.todense.rst000066400000000000000000000001211323236075200217740ustar00rootroot00000000000000DOK\.todense ============ .. currentmodule:: sparse .. automethod:: DOK.todensesparse-0.2.0/docs/generated/sparse.concatenate.rst000066400000000000000000000001211323236075200222030ustar00rootroot00000000000000concatenate =========== .. currentmodule:: sparse .. autofunction:: concatenatesparse-0.2.0/docs/generated/sparse.dot.rst000066400000000000000000000000711323236075200205110ustar00rootroot00000000000000dot === .. currentmodule:: sparse .. autofunction:: dotsparse-0.2.0/docs/generated/sparse.random.rst000066400000000000000000000001021323236075200211760ustar00rootroot00000000000000random ====== .. currentmodule:: sparse .. autofunction:: randomsparse-0.2.0/docs/generated/sparse.rst000066400000000000000000000005251323236075200177300ustar00rootroot00000000000000sparse ====== .. rubric:: Description .. automodule:: sparse .. currentmodule:: sparse .. rubric:: Classes .. autosummary:: :toctree: COO DOK .. rubric:: Functions .. autosummary:: :toctree: concatenate dot random stack tensordot tril triu sparse-0.2.0/docs/generated/sparse.stack.rst000066400000000000000000000000771323236075200210360ustar00rootroot00000000000000stack ===== .. currentmodule:: sparse .. autofunction:: stacksparse-0.2.0/docs/generated/sparse.tensordot.rst000066400000000000000000000001131323236075200217410ustar00rootroot00000000000000tensordot ========= .. currentmodule:: sparse .. autofunction:: tensordotsparse-0.2.0/docs/generated/sparse.tril.rst000066400000000000000000000000741323236075200207000ustar00rootroot00000000000000tril ==== .. currentmodule:: sparse .. autofunction:: trilsparse-0.2.0/docs/generated/sparse.triu.rst000066400000000000000000000000741323236075200207110ustar00rootroot00000000000000triu ==== .. currentmodule:: sparse .. 
autofunction:: triu
sparse-0.2.0/docs/index.rst000066400000000000000000000033111323236075200156000ustar00rootroot00000000000000
sparse
======

Introduction
------------

In many scientific applications, arrays come up that are mostly empty or
filled with zeros. These arrays are aptly named *sparse arrays*. However, it
is a matter of choice how these are stored. One may store the full array,
i.e., with all the zeros included. This incurs a significant cost in terms of
memory and performance when working with these arrays.

An alternative way is to store them in a standalone data structure that keeps
track of only the nonzero entries. Often, this improves performance and
memory consumption, but most operations on sparse arrays then have to be
re-written. :obj:`sparse` tries to provide one such data structure. It isn't
the only library that does this. Notably, :obj:`scipy.sparse` achieves this,
along with Pysparse.

Motivation
----------

So why use :obj:`sparse`? Well, the other libraries mentioned are mostly
limited to two-dimensional arrays. In addition, inter-compatibility with
:obj:`numpy` is hit-or-miss. :obj:`sparse` strives to achieve
inter-compatibility with :obj:`numpy.ndarray`, and provide mostly the same
API. It defers to :obj:`scipy.sparse` when it is convenient to do so, and
writes custom implementations of operations where this isn't possible. It
also supports general N-dimensional arrays.

Where to from here?
-------------------

If you're new to this library, you can visit the :doc:`user manual
<user_manual>` page. If you're already familiar with this library, or you
want to dive straight in, you can jump to the :doc:`API reference <api>`.

You can also see the contents in the sidebar.

.. toctree::
   :maxdepth: 3
   :hidden:

   self
   user_manual
   api
   contributing
   changelog
sparse-0.2.0/docs/user_manual.rst000066400000000000000000000007161323236075200170120ustar00rootroot00000000000000
User Manual
===========

.. currentmodule:: sparse

The main class in this package is the :obj:`COO` array type. Learning a few
things about this class can be very useful in using this library. This
section attempts to document some common things about this object.

.. toctree::
   :maxdepth: 2

   user_manual/installing
   user_manual/getting_started
   user_manual/constructing
   user_manual/building_dok
   user_manual/operations
   user_manual/converting
sparse-0.2.0/docs/user_manual/000077500000000000000000000000001323236075200162545ustar00rootroot00000000000000
sparse-0.2.0/docs/user_manual/building_dok.rst000066400000000000000000000023421323236075200214410ustar00rootroot00000000000000
.. currentmodule:: sparse

Building :obj:`COO` Arrays from :obj:`DOK` Arrays
=================================================

It's possible to build :obj:`COO` arrays from :obj:`DOK` arrays, if it is not
easy to construct the :code:`coords` and :code:`data` directly. :obj:`DOK`
arrays provide a simple builder interface to build :obj:`COO` arrays, but at
this time, they can do little else.

You can get started by defining the shape (and optionally, datatype) of the
:obj:`DOK` array. If you do not specify a dtype, it is inferred from the
value dictionary or is set to :code:`dtype('float64')` if that is not
present.

.. code-block:: python

    s = DOK((6, 5, 2))
    s2 = DOK((2, 3, 4), dtype=np.float64)

After this, you can build the array by assigning arrays or scalars to
elements or slices of the original array. Broadcasting rules are followed.

.. code-block:: python

    s[1:3, 3:1:-1] = [[6, 5]]

At the end, you can convert the :obj:`DOK` array to a :obj:`COO` array, and
perform arithmetic or other operations on it.

.. code-block:: python

    s2 = COO(s)

In addition, it is possible to access single elements of the :obj:`DOK` array
using normal Numpy indexing.

.. code-block:: python

    s[1, 2, 1]  # 5
    s[5, 1, 1]  # 0
sparse-0.2.0/docs/user_manual/constructing.rst000066400000000000000000000053331323236075200215340ustar00rootroot00000000000000
.. currentmodule:: sparse

Constructing :obj:`COO` arrays
==============================

From coordinates and data
-------------------------

This is the preferred way of constructing :obj:`COO` arrays. The constructor
for :obj:`COO` (see :obj:`COO.__init__`) can create these objects from two
main variables: :code:`coords` and :code:`data`.

:code:`coords` contains the indices where the data is nonzero, and
:code:`data` contains the data corresponding to those indices. For example,
the following code will generate a :math:`5 \times 5` identity matrix:

.. code-block:: python

    coords = [[0, 1, 2, 3, 4],
              [0, 1, 2, 3, 4]]
    data = [1, 1, 1, 1, 1]
    s = COO(coords, data)

In general, :code:`coords` should be a :code:`(ndim, nnz)` shaped array. Each
row of :code:`coords` contains one dimension of the desired sparse array, and
each column contains the index corresponding to that nonzero element.
:code:`data` contains the nonzero elements of the array corresponding to the
indices in :code:`coords`. Its shape should be :code:`(nnz,)`.

You can, and should, pass in :obj:`numpy.ndarray` objects for :code:`coords`
and :code:`data`.

In this case, the shape of the resulting array is determined from the maximum
index in each dimension. If the desired shape extends beyond the maximum
index in :code:`coords`, you should supply the shape explicitly. For example,
if we did the following without the :code:`shape` keyword argument, it would
result in a :math:`4 \times 5` matrix, but maybe we wanted one that was
actually :math:`5 \times 5`.

.. code-block:: python

    coords = [[0, 3, 2, 1], [4, 1, 2, 0]]
    data = [1, 4, 2, 1]
    s = COO(coords, data, shape=(5, 5))

From :obj:`scipy.sparse.spmatrix` objects
-----------------------------------------

To construct a :obj:`COO` array from :obj:`scipy.sparse.spmatrix` objects,
you can use the :obj:`COO.from_scipy_sparse` method. As an example, if
:code:`x` is a :obj:`scipy.sparse.spmatrix`, you can do the following to get
an equivalent :obj:`COO` array:

.. code-block:: python

    s = COO.from_scipy_sparse(x)

From :obj:`numpy.ndarray` objects
---------------------------------

To construct :obj:`COO` arrays from :obj:`numpy.ndarray` objects, you can use
the :obj:`COO.from_numpy` method. As an example, if :code:`x` is a
:obj:`numpy.ndarray`, you can do the following to get an equivalent
:obj:`COO` array:

.. code-block:: python

    s = COO.from_numpy(x)

Generating random :obj:`COO` objects
------------------------------------

The :obj:`sparse.random` method can be used to create random :obj:`COO`
arrays. For example, the following will generate a :math:`10 \times 10`
matrix with :math:`10` nonzero entries, each in the interval :math:`[0, 1)`.

.. code-block:: python

    s = sparse.random((10, 10), density=0.1)
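
As a quick sanity check that these constructors agree with one another, here
is a small sketch (the values are made up for illustration) that builds the
same array from a dense matrix and from coordinates:

.. code-block:: python

    import numpy as np
    from sparse import COO

    x = np.zeros((5, 5))
    x[1, 4] = 3
    x[3, 2] = 7

    s1 = COO.from_numpy(x)
    s2 = COO([[1, 3], [4, 2]], [3, 7], shape=(5, 5))

    assert np.array_equal(s1.todense(), s2.todense())
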
sparse-0.2.0/docs/user_manual/converting.rst000066400000000000000000000013561323236075200211670ustar00rootroot00000000000000
.. currentmodule:: sparse

Converting :obj:`COO` objects to Other Formats
==============================================

:obj:`COO` arrays can be converted to :obj:`numpy.ndarray` objects, or to
some :obj:`scipy.sparse.spmatrix` subclasses, via the following methods:

* :obj:`COO.todense`: Converts to a :obj:`numpy.ndarray` unconditionally.
* :obj:`COO.maybe_densify`: Converts to a :obj:`numpy.ndarray` based on
  certain constraints.
* :obj:`COO.to_scipy_sparse`: Converts to a :obj:`scipy.sparse.coo_matrix` if
  the array is two dimensional.
* :obj:`COO.tocsr`: Converts to a :obj:`scipy.sparse.csr_matrix` if the array
  is two dimensional.
* :obj:`COO.tocsc`: Converts to a :obj:`scipy.sparse.csc_matrix` if the array
  is two dimensional.
sparse-0.2.0/docs/user_manual/getting_started.rst000066400000000000000000000023461323236075200222020ustar00rootroot00000000000000
.. currentmodule:: sparse

Getting Started
===============

:obj:`COO` arrays can be constructed from :obj:`numpy.ndarray` objects and
:obj:`scipy.sparse.spmatrix` objects. For example, to generate the identity
matrix,

.. code-block:: python

    import numpy as np
    import scipy.sparse
    import sparse

    sps_identity = scipy.sparse.eye(5)
    identity = sparse.COO.from_scipy_sparse(sps_identity)

:obj:`COO` arrays can have operations performed on them just like
:obj:`numpy.ndarray` objects. For example, if :code:`x` and :code:`y` are
:obj:`COO` arrays, you can add them like so:

.. code-block:: python

    z = x + y

You can also apply any :obj:`numpy.ufunc` to :obj:`COO` arrays.

.. code-block:: python

    sin_x = np.sin(x)

However, operations which convert the sparse array into a dense one aren't
currently supported. For example, the following raises a :obj:`ValueError`.

.. code-block:: python

    y = x + 5

However, if you're sure you want to convert a sparse array to a dense one,
you can do this (which will result in a :obj:`numpy.ndarray`):

.. code-block:: python

    y = x.todense() + 5

That's it! You can move on to the :doc:`user manual <../user_manual>` to see
what part of this library interests you, or you can jump straight in with the
:doc:`API reference <../api>`.
sparse-0.2.0/docs/user_manual/installing.rst000066400000000000000000000007661323236075200211600ustar00rootroot00000000000000
.. currentmodule:: sparse

Installing
==========

:obj:`sparse` can be obtained from pip via

.. code-block:: bash

    pip install sparse

You can also get :obj:`sparse` from its current source on GitHub, to get all
the latest and greatest features. :obj:`sparse` is under active development,
and many new features are being added. Note, however, that its API is
currently unstable.

.. code-block:: bash

    git clone https://github.com/mrocklin/sparse.git
    cd ./sparse/
    pip install .
sparse-0.2.0/docs/user_manual/operations.rst000066400000000000000000000007251323236075200211750ustar00rootroot00000000000000
Operations on :obj:`COO` arrays
===============================

You can do a number of operations on :obj:`COO` arrays. These include
:doc:`basic operations with operators <operations/basic>`, :doc:`element-wise
operations <operations/elemwise>`, :doc:`reductions <operations/reductions>`,
:doc:`indexing <operations/indexing>` and :doc:`other common operations
<operations/other>`. A short sketch combining several of these appears after
the table of contents below.

.. toctree::
   :maxdepth: 2

   operations/basic
   operations/elemwise
   operations/reductions
   operations/indexing
   operations/other
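
The following sketch strings several of these operations together; the use of
:obj:`sparse.random` and the particular shapes are illustrative choices, not
requirements.

.. code-block:: python

    import numpy as np
    import sparse

    x = sparse.random((4, 5), density=0.25)
    y = sparse.random((4, 5), density=0.25)

    z = x * y           # a basic binary operator
    w = np.sin(x)       # a zero-preserving element-wise ufunc
    t = z.sum(axis=0)   # a reduction
    row = w[2]          # integer indexing
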
sparse-0.2.0/docs/user_manual/operations/000077500000000000000000000000001323236075200204375ustar00rootroot00000000000000
sparse-0.2.0/docs/user_manual/operations/basic.rst000066400000000000000000000056311323236075200222570ustar00rootroot00000000000000
.. currentmodule:: sparse

Basic Operations
================

:obj:`COO` objects can have a number of operators applied to them. They
support operations with scalars, :obj:`scipy.sparse.spmatrix` objects, and
other :obj:`COO` objects. For example, to get the sum of two :obj:`COO`
objects, you would do the following:

.. code-block:: python

    z = x + y

Note that in-place operators are currently not supported. For example,

.. code-block:: python

    x += y

will not work.

.. _auto-densification:

Auto-Densification
------------------

Operations that would result in dense matrices, such as binary operations
with :obj:`numpy.ndarray` objects or certain operations with scalars, are
not allowed and will raise a :obj:`ValueError`. For example, all of the
following will raise a :obj:`ValueError`. Here, :code:`x` and :code:`y` are
:obj:`COO` objects.

.. code-block:: python

    x == y
    x + 5
    x == 0
    x != 5
    x / y

However, all of the following are valid operations.

.. code-block:: python

    x + 0
    x != y
    x + y
    x == 5
    5 * x
    x / 7.3
    x != 0

If densification is needed, it must be explicit. In other words, you must
call :obj:`COO.todense` on the :obj:`COO` object. If both operands are
:obj:`COO`, both must be densified.

Broadcasting
------------

All binary operators support :obj:`broadcasting <numpy.doc.broadcasting>`.
This means that (under certain conditions) you can perform binary operations
on arrays with unequal shape. Namely, when the shape is missing a dimension,
or when a dimension is :code:`1`. For example, performing a binary operation
on two :obj:`COO` arrays with shapes :code:`(4,)` and :code:`(5, 1)` yields
an object of shape :code:`(5, 4)`. The same happens with arrays of shape
:code:`(1, 4)` and :code:`(5, 1)`. However, :code:`(4, 1)` and :code:`(5, 1)`
will raise a :obj:`ValueError`.

Full List of Operators
----------------------

Here, :code:`x` and :code:`y` can be :obj:`COO` arrays,
:obj:`scipy.sparse.spmatrix` objects or scalars, keeping in mind :ref:`auto
densification rules <auto-densification>`.

The following operators are supported:

* Basic algebraic operations

   * :obj:`operator.add` (:code:`x + y`)
   * :obj:`operator.neg` (:code:`-x`)
   * :obj:`operator.sub` (:code:`x - y`)
   * :obj:`operator.mul` (:code:`x * y`)
   * :obj:`operator.truediv` (:code:`x / y`)
   * :obj:`operator.floordiv` (:code:`x // y`)
   * :obj:`operator.pow` (:code:`x ** y`)

* Comparison operators

   * :obj:`operator.eq` (:code:`x == y`)
   * :obj:`operator.ne` (:code:`x != y`)
   * :obj:`operator.gt` (:code:`x > y`)
   * :obj:`operator.ge` (:code:`x >= y`)
   * :obj:`operator.lt` (:code:`x < y`)
   * :obj:`operator.le` (:code:`x <= y`)

* Bitwise operators

   * :obj:`operator.and_` (:code:`x & y`)
   * :obj:`operator.or_` (:code:`x | y`)
   * :obj:`operator.xor` (:code:`x ^ y`)

* Bit-shifting operators

   * :obj:`operator.lshift` (:code:`x << y`)
   * :obj:`operator.rshift` (:code:`x >> y`)
sparse-0.2.0/docs/user_manual/operations/elemwise.rst000066400000000000000000000031461323236075200230070ustar00rootroot00000000000000
.. currentmodule:: sparse

Element-wise Operations
=======================

:obj:`COO` arrays support a variety of element-wise operations. However, as
with operators, operations that map zero to a nonzero value are not
supported.

To illustrate, the following are all possible, and will produce another
:obj:`COO` array:

.. code-block:: python

    x.abs()
    np.sin(x)
    np.sqrt(x)
    x.conj()
    x.expm1()
    np.log1p(x)

However, the following are all unsupported and will raise a :obj:`ValueError`:

.. code-block:: python

    x.exp()
    np.cos(x)
    np.log(x)

Notice that you can apply any unary or binary :obj:`numpy.ufunc` to
:obj:`COO` arrays, :obj:`scipy.sparse.spmatrix` objects and scalars, and it
will work so long as the result is not dense.

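
As a small sketch of this rule (the random array here is purely
illustrative), a ufunc that maps zero to zero succeeds, while one that does
not raises an error:

.. code-block:: python

    import numpy as np
    import sparse

    x = sparse.random((3, 4), density=0.25)

    y = np.expm1(x)   # expm1(0) == 0, so the result stays sparse

    try:
        np.exp(x)     # exp(0) == 1 would densify, so this raises
    except ValueError:
        pass
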
:obj:`COO.elemwise`
-------------------

This function allows you to apply any arbitrary unary or binary function
where the first object is :obj:`COO`, and the second is a scalar, :obj:`COO`,
or a :obj:`scipy.sparse.spmatrix`. For example, the following will add two
:obj:`COO` objects:

.. code-block:: python

    x.elemwise(np.add, y)

Partial List of Supported :obj:`numpy.ufunc`\ s
-----------------------------------------------

Although any unary or binary :obj:`numpy.ufunc` should work if the result is
not dense, when calling in the form :code:`x.func()`, the following
operations are supported:

* :obj:`COO.abs`
* :obj:`COO.expm1`
* :obj:`COO.log1p`
* :obj:`COO.sin`
* :obj:`COO.sinh`
* :obj:`COO.tan`
* :obj:`COO.tanh`
* :obj:`COO.sqrt`
* :obj:`COO.ceil`
* :obj:`COO.floor`
* :obj:`COO.round`
* :obj:`COO.rint`
* :obj:`COO.conj`
* :obj:`COO.conjugate`
* :obj:`COO.astype`
sparse-0.2.0/docs/user_manual/operations/indexing.rst000066400000000000000000000016151323236075200230010ustar00rootroot00000000000000
.. currentmodule:: sparse

Indexing
========

:obj:`COO` arrays can be :obj:`indexed <numpy.doc.indexing>` just like
regular :obj:`numpy.ndarray` objects. They support integer, slice and boolean
indexing. However, currently, numpy advanced indexing is not properly
supported. This means that all of the following work like in Numpy, except
that they will produce :obj:`COO` arrays rather than :obj:`numpy.ndarray`
objects, and will produce scalars where expected. Assume that :code:`z.shape`
is :code:`(5, 6, 7)`.

.. code-block:: python

    z[0]
    z[1, 3]
    z[1, 4, 3]
    z[:3, :2, 3]
    z[::-1, 1, 3]
    z[-1]
    z[[True, False, True, False, True], 3, 4]

All of the following will raise an :obj:`IndexError`, like in Numpy 1.13 and
later.

.. code-block:: python

    z[6]
    z[3, 6]
    z[1, 4, 8]
    z[-6]
    z[[True, True, False, True], 3, 4]

.. note:: Numpy advanced indexing is currently not supported.
sparse-0.2.0/docs/user_manual/operations/other.rst000066400000000000000000000005021323236075200223070ustar00rootroot00000000000000
.. currentmodule:: sparse

Other Operations
================

:obj:`COO` arrays support a number of other common operations. Among them are
:obj:`dot`, :obj:`tensordot`, :obj:`concatenate` and :obj:`stack`,
:obj:`COO.transpose` and :obj:`COO.reshape`. You can view the full list on
the API reference page for :obj:`sparse`.
sparse-0.2.0/docs/user_manual/operations/reductions.rst000066400000000000000000000025021323236075200233470ustar00rootroot00000000000000
.. currentmodule:: sparse

Reductions
==========

:obj:`COO` objects support a number of reductions. However, not all important
reductions are currently implemented (help welcome!). All of the following
currently work:

.. code-block:: python

    x.sum(axis=1)
    np.max(x)
    np.min(x, axis=(0, 2))
    x.prod()

.. note:: If you are performing multiple reductions along the same axes, it
   may be beneficial to call :obj:`COO.enable_caching`.

:obj:`COO.reduce`
-----------------

This method can take an arbitrary :obj:`numpy.ufunc` and performs a reduction
using that method. For example, the following will perform a sum:

.. code-block:: python

    x.reduce(np.add, axis=1)

.. note:: :obj:`sparse` currently performs reductions by grouping together
   all coordinates along the supplied axes and reducing those. Then, if the
   number of entries in a group is deficient, it reduces an extra time with
   zero. As a result, if a reduction's value can change when extra zeros are
   included, this method won't be accurate. However, it works in most cases.
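
As a sketch of how :obj:`COO.reduce` relates to the named reductions (again
with an illustrative random array), the two forms below should agree:

.. code-block:: python

    import numpy as np
    import sparse

    x = sparse.random((3, 4, 5), density=0.3)

    a = x.reduce(np.add, axis=1)
    b = x.sum(axis=1)

    assert np.allclose(a.todense(), b.todense())
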
Partial List of Supported Reductions ------------------------------------ Although any binary :obj:`numpy.ufunc` should work for reductions, when calling in the form :code:`x.reduction()`, the following reductions are supported: * :obj:`COO.sum` * :obj:`COO.max` * :obj:`COO.min` * :obj:`COO.prod` sparse-0.2.0/release-procedure.md000066400000000000000000000004431323236075200167420ustar00rootroot00000000000000* Update changelog in docs/changelog.rst * Tag commit git tag -a x.x.x -m 'Version x.x.x' * Push to github git push mrocklin master --tags * Upload to PyPI git clean -xfd python setup.py sdist bdist_wheel --universal twine upload dist/* sparse-0.2.0/requirements.txt000066400000000000000000000000301323236075200162660ustar00rootroot00000000000000numpy scipy >= 0.19 six sparse-0.2.0/setup.cfg000066400000000000000000000024171323236075200146360ustar00rootroot00000000000000[flake8] # References: # https://flake8.readthedocs.io/en/latest/user/configuration.html # https://flake8.readthedocs.io/en/latest/user/error-codes.html # Note: there cannot be spaces after comma's here exclude = __init__.py ignore = # Extra space in brackets E20, # Multiple spaces around "," E231,E241, # Comments E26, # Import formatting E4, # Comparing types instead of isinstance E721, # Assigning lambda expression E731, # continuation line under-indented for hanging indent E121, # continuation line over-indented for hanging indent E126, # continuation line over-indented for visual indent E127, # E128 continuation line under-indented for visual indent E128, # multiple statements on one line (semicolon) E702, # line break before binary operator W503, # visually indented line with same indent as next logical line E129, # unexpected indentation E116 max-line-length = 120 [versioneer] VCS = git style = pep440 versionfile_source = distributed/_version.py versionfile_build = distributed/_version.py tag_prefix = parentdir_prefix = distributed- [bdist_wheel] universal=1 [tool:pytest] addopts = --flake8 --doctest-modules sparse --cov-report term-missing --cov sparse sparse-0.2.0/setup.py000077500000000000000000000016771323236075200145410ustar00rootroot00000000000000#!/usr/bin/env python from os.path import exists from setuptools import setup setup(name='sparse', version='0.2.0', description='Sparse n-dimensional arrays', url='http://github.com/mrocklin/sparse/', maintainer='Matthew Rocklin', maintainer_email='mrocklin@gmail.com', license='BSD', keywords='sparse,numpy,scipy,dask', packages=['sparse'], long_description=(open('README.rst').read() if exists('README.rst') else ''), install_requires=list(open('requirements.txt').read().strip().split('\n')), extras_require={ 'tests': [ 'tox', 'pytest', 'pytest-cov', 'pytest-flake8', 'packaging', ], 'docs': [ 'sphinx', 'sphinxcontrib-napoleon', 'sphinx_rtd_theme', 'numpydoc', ], }, zip_safe=False) sparse-0.2.0/sparse/000077500000000000000000000000001323236075200143065ustar00rootroot00000000000000sparse-0.2.0/sparse/__init__.py000066400000000000000000000003521323236075200164170ustar00rootroot00000000000000from .coo import COO, tensordot, concatenate, stack, dot, triu, tril from .dok import DOK from .utils import random __version__ = '0.1.1' __all__ = ["COO", "DOK", "tensordot", "concatenate", "stack", "dot", "triu", "tril", "random"] sparse-0.2.0/sparse/coo.py000066400000000000000000002646311323236075200154540ustar00rootroot00000000000000from __future__ import absolute_import, division, print_function from collections import Iterable, defaultdict, deque from functools import reduce, 
partial import numbers import operator import numpy as np import scipy.sparse from .slicing import normalize_index from .utils import _zero_of_dtype # zip_longest with Python 2/3 compat from six.moves import range, zip_longest try: # Windows compatibility int = long except NameError: pass class COO(object): """ A sparse multidimensional array. This is stored in COO format. It depends on NumPy and Scipy.sparse for computation, but supports arrays of arbitrary dimension. Parameters ---------- coords : numpy.ndarray (COO.ndim, COO.nnz) An array holding the index locations of every value Should have shape (number of dimensions, number of non-zeros) data : numpy.ndarray (COO.nnz,) An array of Values shape : tuple[int] (COO.ndim,), optional The shape of the array has_duplicates : bool, optional A value indicating whether the supplied value for :code:`coords` has duplicates. Note that setting this to `False` when :code:`coords` does have duplicates may result in undefined behaviour. See :obj:`COO.sum_duplicates` sorted : bool, optional A value indicating whether the values in `coords` are sorted. Note that setting this to `False` when :code:`coords` isn't sorted may result in undefined behaviour. See :obj:`COO.sort_indices`. cache : bool, optional Whether to enable cacheing for various operations. See :obj:`COO.enable_caching` Attributes ---------- coords : numpy.ndarray (ndim, nnz) An array holding the coordinates of every nonzero element. data : numpy.ndarray (nnz,) An array holding the values corresponding to :obj:`COO.coords`. shape : tuple[int] (ndim,) The dimensions of this array. See Also -------- DOK : A mostly write-only sparse array. Examples -------- You can create :obj:`COO` objects from Numpy arrays. >>> x = np.eye(4, dtype=np.uint8) >>> x[2, 3] = 5 >>> s = COO.from_numpy(x) >>> s >>> s.data # doctest: +NORMALIZE_WHITESPACE array([1, 1, 1, 5, 1], dtype=uint8) >>> s.coords # doctest: +NORMALIZE_WHITESPACE array([[0, 1, 2, 2, 3], [0, 1, 2, 3, 3]], dtype=uint8) :obj:`COO` objects support basic arithmetic and binary operations. >>> x2 = np.eye(4, dtype=np.uint8) >>> x2[3, 2] = 5 >>> s2 = COO.from_numpy(x2) >>> (s + s2).todense() # doctest: +NORMALIZE_WHITESPACE array([[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 5], [0, 0, 5, 2]], dtype=uint8) >>> (s * s2).todense() # doctest: +NORMALIZE_WHITESPACE array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=uint8) Binary operations support broadcasting. >>> x3 = np.zeros((4, 1), dtype=np.uint8) >>> x3[2, 0] = 1 >>> s3 = COO.from_numpy(x3) >>> (s * s3).todense() # doctest: +NORMALIZE_WHITESPACE array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 5], [0, 0, 0, 0]], dtype=uint8) :obj:`COO` objects also support dot products and reductions. >>> s.dot(s.T).sum(axis=0).todense() # doctest: +NORMALIZE_WHITESPACE array([ 1, 1, 31, 6], dtype=uint64) You can use Numpy :code:`ufunc` operations on :obj:`COO` arrays as well. >>> np.sum(s, axis=1).todense() # doctest: +NORMALIZE_WHITESPACE array([1, 1, 6, 1], dtype=uint64) >>> np.round(np.sqrt(s, dtype=np.float64), decimals=1).todense() # doctest: +SKIP array([[ 1. , 0. , 0. , 0. ], [ 0. , 1. , 0. , 0. ], [ 0. , 0. , 1. , 2.2], [ 0. , 0. , 0. , 1. ]]) Operations that will result in a dense array will raise a :obj:`ValueError`, such as the following. >>> np.exp(s) Traceback (most recent call last): ... ValueError: Performing this operation would produce a dense result: You can also create :obj:`COO` arrays from coordinates and data. >>> coords = [[0, 0, 0, 1, 1], ... [0, 1, 2, 0, 3], ... 
[0, 3, 2, 0, 1]] >>> data = [1, 2, 3, 4, 5] >>> s4 = COO(coords, data, shape=(3, 4, 5)) >>> s4 Following scipy.sparse conventions you can also pass these as a tuple with rows and columns >>> rows = [0, 1, 2, 3, 4] >>> cols = [0, 0, 0, 1, 1] >>> data = [10, 20, 30, 40, 50] >>> z = COO((data, (rows, cols))) >>> z.todense() # doctest: +NORMALIZE_WHITESPACE array([[10, 0], [20, 0], [30, 0], [ 0, 40], [ 0, 50]]) You can also pass a dictionary or iterable of index/value pairs. Repeated indices imply summation: >>> d = {(0, 0, 0): 1, (1, 2, 3): 2, (1, 1, 0): 3} >>> COO(d) >>> L = [((0, 0), 1), ... ((1, 1), 2), ... ((0, 0), 3)] >>> COO(L).todense() # doctest: +NORMALIZE_WHITESPACE array([[4, 0], [0, 2]]) You can convert :obj:`DOK` arrays to :obj:`COO` arrays. >>> from sparse import DOK >>> s5 = DOK((5, 5), dtype=np.int64) >>> s5[1:3, 1:3] = [[4, 5], [6, 7]] >>> s5 >>> s6 = COO(s5) >>> s6 >>> s6.todense() # doctest: +NORMALIZE_WHITESPACE array([[0, 0, 0, 0, 0], [0, 4, 5, 0, 0], [0, 6, 7, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]) """ __array_priority__ = 12 def __init__(self, coords, data=None, shape=None, has_duplicates=True, sorted=False, cache=False): self._cache = None if cache: self.enable_caching() if data is None: from .dok import DOK if isinstance(coords, COO): self.coords = coords.coords self.data = coords.data self.has_duplicates = coords.has_duplicates self.sorted = coords.sorted self.shape = coords.shape return if isinstance(coords, DOK): shape = coords.shape coords = coords.data # {(i, j, k): x, (i, j, k): y, ...} if isinstance(coords, dict): coords = list(coords.items()) has_duplicates = False if isinstance(coords, np.ndarray): result = COO.from_numpy(coords) self.coords = result.coords self.data = result.data self.has_duplicates = result.has_duplicates self.sorted = result.sorted self.shape = result.shape return if isinstance(coords, scipy.sparse.spmatrix): result = COO.from_scipy_sparse(coords) self.coords = result.coords self.data = result.data self.has_duplicates = result.has_duplicates self.sorted = result.sorted self.shape = result.shape return # [] if not coords: data = [] coords = [] # [((i, j, k), value), (i, j, k), value), ...] elif isinstance(coords[0][0], Iterable): if coords: assert len(coords[0]) == 2 data = [x[1] for x in coords] coords = [x[0] for x in coords] coords = np.asarray(coords).T # (data, (row, col, slab, ...)) else: data = coords[0] coords = np.stack(coords[1], axis=0) self.data = np.asarray(data) self.coords = np.asarray(coords) if self.coords.ndim == 1: self.coords = self.coords[None, :] if shape and not self.coords.size: self.coords = np.zeros((len(shape), 0), dtype=np.uint64) if shape is None: if self.coords.nbytes: shape = tuple((self.coords.max(axis=1) + 1).tolist()) else: shape = () if isinstance(shape, numbers.Integral): shape = (int(shape),) self.shape = tuple(shape) if self.shape: dtype = np.min_scalar_type(max(self.shape)) else: dtype = np.int_ self.coords = self.coords.astype(dtype) assert not self.shape or len(data) == self.coords.shape[1] self.has_duplicates = has_duplicates self.sorted = sorted def enable_caching(self): """ Enable caching of reshape, transpose, and tocsr/csc operations This enables efficient iterative workflows that make heavy use of csr/csc operations, such as tensordot. This maintains a cache of recent results of reshape and transpose so that operations like tensordot (which uses both internally) store efficiently stored representations for repeated use. 
This can significantly cut down on computational costs in common numeric algorithms. However, this also assumes that neither this object, nor the downstream objects will have their data mutated. Examples -------- >>> s.enable_caching() # doctest: +SKIP >>> csr1 = s.transpose((2, 0, 1)).reshape((100, 120)).tocsr() # doctest: +SKIP >>> csr2 = s.transpose((2, 0, 1)).reshape((100, 120)).tocsr() # doctest: +SKIP >>> csr1 is csr2 # doctest: +SKIP True """ self._cache = defaultdict(lambda: deque(maxlen=3)) return self @classmethod def from_numpy(cls, x): """ Convert the given :obj:`numpy.ndarray` to a :obj:`COO` object. Parameters ---------- x : np.ndarray The dense array to convert. Returns ------- COO The converted COO array. Examples -------- >>> x = np.eye(5) >>> s = COO.from_numpy(x) >>> s """ x = np.asanyarray(x) if x.shape: coords = np.where(x) data = x[coords] coords = np.vstack(coords) else: coords = np.empty((0, 1), dtype=np.uint8) data = np.array(x, ndmin=1) return cls(coords, data, shape=x.shape, has_duplicates=False, sorted=True) def todense(self): """ Convert this :obj:`COO` array to a dense :obj:`numpy.ndarray`. Note that this may take a large amount of memory if the :obj:`COO` object's :code:`shape` is large. Returns ------- numpy.ndarray The converted dense array. See Also -------- DOK.todense : Equivalent :obj:`DOK` array method. scipy.sparse.coo_matrix.todense : Equivalent Scipy method. Examples -------- >>> x = np.random.randint(100, size=(7, 3)) >>> s = COO.from_numpy(x) >>> x2 = s.todense() >>> np.array_equal(x, x2) True """ self.sum_duplicates() x = np.zeros(shape=self.shape, dtype=self.dtype) coords = tuple([self.coords[i, :] for i in range(self.ndim)]) data = self.data if coords != (): x[coords] = data else: if len(data) != 0: x[coords] = data return x @classmethod def from_scipy_sparse(cls, x): """ Construct a :obj:`COO` array from a :obj:`scipy.sparse.spmatrix` Parameters ---------- x : scipy.sparse.spmatrix The sparse matrix to construct the array from. Returns ------- COO The converted :obj:`COO` object. Examples -------- >>> x = scipy.sparse.rand(6, 3, density=0.2) >>> s = COO.from_scipy_sparse(x) >>> np.array_equal(x.todense(), s.todense()) True """ x = scipy.sparse.coo_matrix(x) coords = np.empty((2, x.nnz), dtype=x.row.dtype) coords[0, :] = x.row coords[1, :] = x.col return COO(coords, x.data, shape=x.shape, has_duplicates=not x.has_canonical_format, sorted=x.has_canonical_format) @property def dtype(self): """ The datatype of this array. Returns ------- numpy.dtype The datatype of this array. See Also -------- numpy.ndarray.dtype : Numpy equivalent property. scipy.sparse.coo_matrix.dtype : Scipy equivalent property. Examples -------- >>> x = (200 * np.random.rand(5, 4)).astype(np.int32) >>> s = COO.from_numpy(x) >>> s.dtype dtype('int32') >>> x.dtype == s.dtype True """ return self.data.dtype @property def ndim(self): """ The number of dimensions of this array. Returns ------- int The number of dimensions of this array. See Also -------- DOK.ndim : Equivalent property for :obj:`DOK` arrays. numpy.ndarray.ndim : Numpy equivalent property. Examples -------- >>> x = np.random.rand(1, 2, 3, 1, 2) >>> s = COO.from_numpy(x) >>> s.ndim 5 >>> s.ndim == x.ndim True """ return len(self.shape) @property def nnz(self): """ The number of nonzero elements in this array. Note that any duplicates in :code:`coords` are counted multiple times. To avoid this, call :obj:`COO.sum_duplicates`. Returns ------- int The number of nonzero elements in this array. 
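The converters above round-trip cleanly between dense, :obj:`COO`, and scipy.sparse forms. A minimal round-trip sketch (illustrative only, not part of this file; it assumes the package is importable as :code:`sparse`):

.. code-block:: python

    import numpy as np
    import sparse

    x = np.eye(4)
    s = sparse.COO.from_numpy(x)            # dense -> COO
    csr = s.tocsr()                         # COO -> scipy.sparse (2-d only)
    s2 = sparse.COO.from_scipy_sparse(csr)  # scipy.sparse -> COO
    assert np.array_equal(s2.todense(), x)  # values survive the round trip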
See Also -------- DOK.nnz : Equivalent :obj:`DOK` array property. numpy.count_nonzero : A similar Numpy function. scipy.sparse.coo_matrix.nnz : The Scipy equivalent property. Examples -------- >>> x = np.array([0, 0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 0]) >>> np.count_nonzero(x) 6 >>> s = COO.from_numpy(x) >>> s.nnz 6 >>> np.count_nonzero(x) == s.nnz True """ return self.coords.shape[1] @property def nbytes(self): """ The number of bytes taken up by this object. Note that for small arrays, this may undercount the number of bytes due to the large constant overhead. Returns ------- int The approximate bytes of memory taken by this object. See Also -------- numpy.ndarray.nbytes : The equivalent Numpy property. Examples -------- >>> data = np.arange(6, dtype=np.uint8) >>> coords = np.random.randint(1000, size=(3, 6), dtype=np.uint16) >>> s = COO(coords, data, shape=(1000, 1000, 1000)) >>> s.nbytes 42 """ return self.data.nbytes + self.coords.nbytes def __len__(self): """ Get "length" of array, which is by definition the size of the first dimension. Returns ------- int The size of the first dimension. See Also -------- numpy.ndarray.__len__ : Numpy equivalent property. Examples -------- >>> x = np.zeros((10, 10)) >>> s = COO.from_numpy(x) >>> len(s) 10 """ return self.shape[0] @property def size(self): """ The number of all elements (including zeros) in this array. Returns ------- int The number of elements. See Also -------- numpy.ndarray.size : Numpy equivalent property. Examples -------- >>> x = np.zeros((10, 10)) >>> s = COO.from_numpy(x) >>> s.size 100 """ return np.prod(self.shape) @property def density(self): """ The ratio of nonzero to all elements in this array. Returns ------- float The ratio of nonzero to all elements. See Also -------- COO.size : Number of elements. COO.nnz : Number of nonzero elements. 
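To see how :code:`nnz`, :code:`size`, :code:`density` and :code:`nbytes` relate, here is a small sketch (assuming the :code:`random` utility re-exported in :code:`__init__.py`, whose :code:`nnz` follows from :code:`int(size * density)`):

.. code-block:: python

    import sparse

    s = sparse.random((1000, 1000), density=0.01)
    assert s.nnz == 10000 and s.size == 1000000
    assert abs(s.density - 0.01) < 1e-9
    # COO stores one row of coordinates per dimension, plus the data itself:
    assert s.nbytes == s.data.nbytes + s.coords.nbytes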
Examples -------- >>> x = np.zeros((8, 8)) >>> x[0, :] = 1 >>> s = COO.from_numpy(x) >>> s.density 0.125 """ return self.nnz / self.size def __sizeof__(self): return self.nbytes def __getitem__(self, index): if not isinstance(index, tuple): if isinstance(index, str): data = self.data[index] idx = np.where(data) coords = list(self.coords[:, idx[0]]) coords.extend(idx[1:]) return COO(coords, data[idx].flatten(), shape=self.shape + self.data.dtype[index].shape, has_duplicates=self.has_duplicates, sorted=self.sorted) else: index = (index,) last_ellipsis = len(index) > 0 and index[-1] is Ellipsis index = normalize_index(index, self.shape) if len(index) != 0 and all(not isinstance(ind, Iterable) and ind == slice(None) for ind in index): return self mask = np.ones(self.nnz, dtype=np.bool) for i, ind in enumerate([i for i in index if i is not None]): if not isinstance(ind, Iterable) and ind == slice(None): continue mask &= _mask(self.coords[i], ind, self.shape[i]) n = mask.sum() coords = [] shape = [] i = 0 for ind in index: if isinstance(ind, numbers.Integral): i += 1 continue elif isinstance(ind, slice): step = ind.step if ind.step is not None else 1 if step > 0: start = ind.start if ind.start is not None else 0 start = max(start, 0) stop = ind.stop if ind.stop is not None else self.shape[i] stop = min(stop, self.shape[i]) if start > stop: start = stop shape.append((stop - start + step - 1) // step) else: start = ind.start or self.shape[i] - 1 stop = ind.stop if ind.stop is not None else -1 start = min(start, self.shape[i] - 1) stop = max(stop, -1) if start < stop: start = stop shape.append((start - stop - step - 1) // (-step)) dt = np.min_scalar_type(min(-(dim - 1) if dim != 0 else -1 for dim in shape)) coords.append((self.coords[i, mask].astype(dt) - start) // step) i += 1 elif isinstance(ind, Iterable): old = self.coords[i][mask] new = np.empty(shape=old.shape, dtype=old.dtype) for j, item in enumerate(ind): new[old == item] = j coords.append(new) shape.append(len(ind)) i += 1 elif ind is None: coords.append(np.zeros(n)) shape.append(1) for j in range(i, self.ndim): coords.append(self.coords[j][mask]) shape.append(self.shape[j]) if coords: coords = np.stack(coords, axis=0) else: if last_ellipsis: coords = np.empty((0, np.sum(mask)), dtype=np.uint8) else: if np.sum(mask) != 0: return self.data[mask][0] else: return _zero_of_dtype(self.dtype)[()] shape = tuple(shape) data = self.data[mask] return COO(coords, data, shape=shape, has_duplicates=self.has_duplicates, sorted=self.sorted) def __str__(self): return "<COO: shape=%s, dtype=%s, nnz=%d, sorted=%s, duplicates=%s>" % ( self.shape, self.dtype, self.nnz, self.sorted, self.has_duplicates) __repr__ = __str__ @staticmethod def _reduce(method, *args, **kwargs): assert len(args) == 1 self = args[0] if isinstance(self, scipy.sparse.spmatrix): self = COO.from_scipy_sparse(self) return self.reduce(method, **kwargs) def reduce(self, method, axis=None, keepdims=False, **kwargs): """ Performs a reduction operation on this array. Parameters ---------- method : numpy.ufunc The method to use for performing the reduction. axis : Union[int, Iterable[int]], optional The axes along which to perform the reduction. Uses all axes by default. keepdims : bool, optional Whether or not to keep the dimensions of the original array. kwargs : dict Any extra arguments to pass to the reduction operation. Returns ------- COO The result of the reduction operation. Raises ------ ValueError If reducing an all-zero axis would produce a nonzero result.
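Any zero-preserving ufunc can drive :code:`reduce`, and NumPy's own ufunc reductions reach the same code path through :code:`__array_ufunc__`. A small sketch (assumes a NumPy recent enough for ufunc dispatch, i.e. 1.13+):

.. code-block:: python

    import numpy as np
    import sparse

    s = sparse.COO.from_numpy(np.ones((4, 4)))
    r1 = s.reduce(np.add, axis=0)    # explicit call
    r2 = np.add.reduce(s, axis=0)    # dispatched via __array_ufunc__
    assert r1.todense().tolist() == r2.todense().tolist() == [4.0] * 4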
Notes ----- This function internally calls :obj:`COO.sum_duplicates` to bring the array into canonical form. See Also -------- numpy.ufunc.reduce : A similar Numpy method. Examples -------- You can use the :obj:`COO.reduce` method to apply a reduction operation to any Numpy :code:`ufunc`. >>> x = np.ones((5, 5), dtype=np.int) >>> s = COO.from_numpy(x) >>> s2 = s.reduce(np.add, axis=1) >>> s2.todense() # doctest: +NORMALIZE_WHITESPACE array([5, 5, 5, 5, 5]) You can also use the :code:`keepdims` argument to keep the dimensions after the reduction. >>> s3 = s.reduce(np.add, axis=1, keepdims=True) >>> s3.shape (5, 1) You can also pass in any keyword argument that :obj:`numpy.ufunc.reduce` supports. For example, :code:`dtype`. Note that :code:`out` isn't supported. >>> s4 = s.reduce(np.add, axis=1, dtype=np.float16) >>> s4.dtype dtype('float16') By default, this reduces the array down to one number, reducing along all axes. >>> s.reduce(np.add) 25 """ zero_reduce_result = method.reduce([_zero_of_dtype(self.dtype)], **kwargs) if zero_reduce_result != _zero_of_dtype(np.dtype(zero_reduce_result)): raise ValueError("Performing this reduction operation would produce " "a dense result: %s" % str(method)) # Needed for more esoteric reductions like product. self.sum_duplicates() if axis is None: axis = tuple(range(self.ndim)) if not isinstance(axis, tuple): axis = (axis,) if set(axis) == set(range(self.ndim)): result = method.reduce(self.data, **kwargs) if self.nnz != self.size: result = method(result, _zero_of_dtype(self.dtype)[()], **kwargs) else: axis = tuple(axis) neg_axis = tuple(ax for ax in range(self.ndim) if ax not in axis) a = self.transpose(neg_axis + axis) a = a.reshape((np.prod([self.shape[d] for d in neg_axis]), np.prod([self.shape[d] for d in axis]))) a.sort_indices() result, inv_idx, counts = _grouped_reduce(a.data, a.coords[0], method, **kwargs) missing_counts = counts != a.shape[1] result[missing_counts] = method(result[missing_counts], _zero_of_dtype(self.dtype), **kwargs) coords = a.coords[0:1, inv_idx] a = COO(coords, result, shape=(a.shape[0],), has_duplicates=False, sorted=True) a = a.reshape([self.shape[d] for d in neg_axis]) result = a if keepdims: result = _keepdims(self, result, axis) return result def sum(self, axis=None, keepdims=False, dtype=None, out=None): """ Performs a sum operation along the given axes. Uses all axes by default. Parameters ---------- axis : Union[int, Iterable[int]], optional The axes along which to sum. Uses all axes by default. keepdims : bool, optional Whether or not to keep the dimensions of the original array. dtype: numpy.dtype The data type of the output array. Returns ------- COO The reduced output sparse array. See Also -------- :obj:`numpy.sum` : Equivalent numpy function. scipy.sparse.coo_matrix.sum : Equivalent Scipy function. Notes ----- * This function internally calls :obj:`COO.sum_duplicates` to bring the array into canonical form. * The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. Examples -------- You can use :obj:`COO.sum` to sum an array across any dimension. >>> x = np.ones((5, 5), dtype=np.int) >>> s = COO.from_numpy(x) >>> s2 = s.sum(axis=1) >>> s2.todense() # doctest: +NORMALIZE_WHITESPACE array([5, 5, 5, 5, 5]) You can also use the :code:`keepdims` argument to keep the dimensions after the sum. >>> s3 = s.sum(axis=1, keepdims=True) >>> s3.shape (5, 1) You can pass in an output datatype, if needed. 
>>> s4 = s.sum(axis=1, dtype=np.float16) >>> s4.dtype dtype('float16') By default, this reduces the array down to one number, summing along all axes. >>> s.sum() 25 """ assert out is None return self.reduce(np.add, axis=axis, keepdims=keepdims, dtype=dtype) def max(self, axis=None, keepdims=False, out=None): """ Maximize along the given axes. Uses all axes by default. Parameters ---------- axis : Union[int, Iterable[int]], optional The axes along which to maximize. Uses all axes by default. keepdims : bool, optional Whether or not to keep the dimensions of the original array. dtype: numpy.dtype The data type of the output array. Returns ------- COO The reduced output sparse array. See Also -------- :obj:`numpy.max` : Equivalent numpy function. scipy.sparse.coo_matrix.max : Equivalent Scipy function. Notes ----- * This function internally calls :obj:`COO.sum_duplicates` to bring the array into canonical form. * The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. Examples -------- You can use :obj:`COO.max` to maximize an array across any dimension. >>> x = np.add.outer(np.arange(5), np.arange(5)) >>> x # doctest: +NORMALIZE_WHITESPACE array([[0, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8]]) >>> s = COO.from_numpy(x) >>> s2 = s.max(axis=1) >>> s2.todense() # doctest: +NORMALIZE_WHITESPACE array([4, 5, 6, 7, 8]) You can also use the :code:`keepdims` argument to keep the dimensions after the maximization. >>> s3 = s.max(axis=1, keepdims=True) >>> s3.shape (5, 1) By default, this reduces the array down to one number, maximizing along all axes. >>> s.max() 8 """ assert out is None return self.reduce(np.maximum, axis=axis, keepdims=keepdims) def min(self, axis=None, keepdims=False, out=None): """ Minimize along the given axes. Uses all axes by default. Parameters ---------- axis : Union[int, Iterable[int]], optional The axes along which to minimize. Uses all axes by default. keepdims : bool, optional Whether or not to keep the dimensions of the original array. dtype: numpy.dtype The data type of the output array. Returns ------- COO The reduced output sparse array. See Also -------- :obj:`numpy.min` : Equivalent numpy function. scipy.sparse.coo_matrix.min : Equivalent Scipy function. Notes ----- * This function internally calls :obj:`COO.sum_duplicates` to bring the array into canonical form. * The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. Examples -------- You can use :obj:`COO.min` to minimize an array across any dimension. >>> x = np.add.outer(np.arange(5), np.arange(5)) >>> x # doctest: +NORMALIZE_WHITESPACE array([[0, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8]]) >>> s = COO.from_numpy(x) >>> s2 = s.min(axis=1) >>> s2.todense() # doctest: +NORMALIZE_WHITESPACE array([0, 1, 2, 3, 4]) You can also use the :code:`keepdims` argument to keep the dimensions after the minimization. >>> s3 = s.min(axis=1, keepdims=True) >>> s3.shape (5, 1) By default, this reduces the array down to one number, minimizing along all axes. >>> s.min() 0 """ assert out is None return self.reduce(np.minimum, axis=axis, keepdims=keepdims) def prod(self, axis=None, keepdims=False, dtype=None, out=None): """ Performs a product operation along the given axes. Uses all axes by default. Parameters ---------- axis : Union[int, Iterable[int]], optional The axes along which to multiply. Uses all axes by default. 
keepdims : bool, optional Whether or not to keep the dimensions of the original array. dtype: numpy.dtype The data type of the output array. Returns ------- COO The reduced output sparse array. See Also -------- :obj:`numpy.prod` : Equivalent numpy function. Notes ----- * This function internally calls :obj:`COO.sum_duplicates` to bring the array into canonical form. * The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. Examples -------- You can use :obj:`COO.prod` to multiply an array across any dimension. >>> x = np.add.outer(np.arange(5), np.arange(5)) >>> x # doctest: +NORMALIZE_WHITESPACE array([[0, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8]]) >>> s = COO.from_numpy(x) >>> s2 = s.prod(axis=1) >>> s2.todense() # doctest: +NORMALIZE_WHITESPACE array([ 0, 120, 720, 2520, 6720]) You can also use the :code:`keepdims` argument to keep the dimensions after the reduction. >>> s3 = s.prod(axis=1, keepdims=True) >>> s3.shape (5, 1) You can pass in an output datatype, if needed. >>> s4 = s.prod(axis=1, dtype=np.float16) >>> s4.dtype dtype('float16') By default, this reduces the array down to one number, multiplying along all axes. >>> s.prod() 0 """ assert out is None return self.reduce(np.multiply, axis=axis, keepdims=keepdims, dtype=dtype) def transpose(self, axes=None): """ Returns a new array which has the order of the axes switched. Parameters ---------- axes : Iterable[int], optional The new order of the axes compared to the previous one. Reverses the axes by default. Returns ------- COO The new array with the axes in the desired order. See Also -------- :obj:`COO.T` : A quick property to reverse the order of the axes. numpy.ndarray.transpose : Numpy equivalent function. Examples -------- We can change the order of the dimensions of any :obj:`COO` array with this function. >>> x = np.add.outer(np.arange(5), np.arange(5)[::-1]) >>> x # doctest: +NORMALIZE_WHITESPACE array([[4, 3, 2, 1, 0], [5, 4, 3, 2, 1], [6, 5, 4, 3, 2], [7, 6, 5, 4, 3], [8, 7, 6, 5, 4]]) >>> s = COO.from_numpy(x) >>> s.transpose((1, 0)).todense() # doctest: +NORMALIZE_WHITESPACE array([[4, 5, 6, 7, 8], [3, 4, 5, 6, 7], [2, 3, 4, 5, 6], [1, 2, 3, 4, 5], [0, 1, 2, 3, 4]]) Note that by default, this reverses the order of the axes rather than switching the last and second-to-last axes as required by some linear algebra operations. 
>>> x = np.random.rand(2, 3, 4) >>> s = COO.from_numpy(x) >>> s.transpose().shape (4, 3, 2) """ if axes is None: axes = list(reversed(range(self.ndim))) # Normalize all axis indices to positive values try: axes = np.arange(self.ndim)[list(axes)] except IndexError: raise ValueError("invalid axis for this array") if len(np.unique(axes)) < len(axes): raise ValueError("repeated axis in transpose") if not len(axes) == self.ndim: raise ValueError("axes don't match array") axes = tuple(axes) if axes == tuple(range(self.ndim)): return self if self._cache is not None: for ax, value in self._cache['transpose']: if ax == axes: return value shape = tuple(self.shape[ax] for ax in axes) result = COO(self.coords[axes, :], self.data, shape, has_duplicates=self.has_duplicates, cache=self._cache is not None) if self._cache is not None: self._cache['transpose'].append((axes, result)) return result @property def T(self): """ Returns a new array which has the order of the axes reversed. Returns ------- COO The new array with the axes in the desired order. See Also -------- :obj:`COO.transpose` : A method where you can specify the order of the axes. numpy.ndarray.T : Numpy equivalent property. Examples -------- We can change the order of the dimensions of any :obj:`COO` array with this function. >>> x = np.add.outer(np.arange(5), np.arange(5)[::-1]) >>> x # doctest: +NORMALIZE_WHITESPACE array([[4, 3, 2, 1, 0], [5, 4, 3, 2, 1], [6, 5, 4, 3, 2], [7, 6, 5, 4, 3], [8, 7, 6, 5, 4]]) >>> s = COO.from_numpy(x) >>> s.T.todense() # doctest: +NORMALIZE_WHITESPACE array([[4, 5, 6, 7, 8], [3, 4, 5, 6, 7], [2, 3, 4, 5, 6], [1, 2, 3, 4, 5], [0, 1, 2, 3, 4]]) Note that by default, this reverses the order of the axes rather than switching the last and second-to-last axes as required by some linear algebra operations. >>> x = np.random.rand(2, 3, 4) >>> s = COO.from_numpy(x) >>> s.T.shape (4, 3, 2) """ return self.transpose(tuple(range(self.ndim))[::-1]) def dot(self, other): """ Performs the equivalent of :code:`x.dot(y)` for :obj:`COO`. Parameters ---------- other : Union[COO, numpy.ndarray, scipy.sparse.spmatrix] The second operand of the dot product operation. Returns ------- {COO, numpy.ndarray} The result of the dot product. If the result turns out to be dense, then a dense array is returned, otherwise, a sparse array. See Also -------- dot : Equivalent function for two arguments. :obj:`numpy.dot` : Numpy equivalent function. scipy.sparse.coo_matrix.dot : Scipy equivalent function.
Examples -------- >>> x = np.arange(4).reshape((2, 2)) >>> s = COO.from_numpy(x) >>> s.dot(s) # doctest: +SKIP array([[ 2, 3], [ 6, 11]], dtype=int64) """ return dot(self, other) def __matmul__(self, other): try: return dot(self, other) except NotImplementedError: return NotImplemented def __rmatmul__(self, other): try: return dot(other, self) except NotImplementedError: return NotImplemented def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): if method == '__call__': return COO._elemwise(ufunc, *inputs, **kwargs) elif method == 'reduce': return COO._reduce(ufunc, *inputs, **kwargs) else: return NotImplemented def __array__(self, dtype=None, **kwargs): x = self.todense() if dtype and x.dtype != dtype: x = x.astype(dtype) return x def linear_loc(self, signed=False): """ The nonzero coordinates of a flattened version of this array. Note that the coordinates may be out of order. Parameters ---------- signed : bool, optional Whether to use a signed datatype for the output array. :code:`False` by default. Returns ------- numpy.ndarray The flattened coordinates. See Also -------- :obj:`numpy.flatnonzero` : Equivalent Numpy function. Examples -------- >>> x = np.eye(5) >>> s = COO.from_numpy(x) >>> s.linear_loc() # doctest: +NORMALIZE_WHITESPACE array([ 0, 6, 12, 18, 24], dtype=uint8) >>> np.array_equal(np.flatnonzero(x), s.linear_loc()) True """ return _linear_loc(self.coords, self.shape, signed) def reshape(self, shape): """ Returns a new :obj:`COO` array that is a reshaped version of this array. Parameters ---------- shape : tuple[int] The desired shape of the output array. Returns ------- COO The reshaped output array. See Also -------- numpy.ndarray.reshape : The equivalent Numpy function. Examples -------- >>> s = COO.from_numpy(np.arange(25)) >>> s2 = s.reshape((5, 5)) >>> s2.todense() # doctest: +NORMALIZE_WHITESPACE array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24]]) """ if self.shape == shape: return self if any(d == -1 for d in shape): extra = int(self.size / np.prod([d for d in shape if d != -1])) shape = tuple([d if d != -1 else extra for d in shape]) if self.shape == shape: return self if self._cache is not None: for sh, value in self._cache['reshape']: if sh == shape: return value # TODO: this self.size enforces a 2**64 limit to array size linear_loc = self.linear_loc() max_shape = max(shape) if len(shape) != 0 else 1 coords = np.empty((len(shape), self.nnz), dtype=np.min_scalar_type(max_shape - 1)) strides = 1 for i, d in enumerate(shape[::-1]): coords[-(i + 1), :] = (linear_loc // strides) % d strides *= d result = COO(coords, self.data, shape, has_duplicates=self.has_duplicates, sorted=self.sorted, cache=self._cache is not None) if self._cache is not None: self._cache['reshape'].append((shape, result)) return result def to_scipy_sparse(self): """ Converts this :obj:`COO` object into a :obj:`scipy.sparse.coo_matrix`. Returns ------- :obj:`scipy.sparse.coo_matrix` The converted Scipy sparse matrix. Raises ------ ValueError If the array is not two-dimensional. See Also -------- COO.tocsr : Convert to a :obj:`scipy.sparse.csr_matrix`. COO.tocsc : Convert to a :obj:`scipy.sparse.csc_matrix`. 
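The linearize-then-unravel trick behind :code:`linear_loc` and :code:`reshape` above can be sketched in plain NumPy (hypothetical shapes, row-major order):

.. code-block:: python

    import numpy as np

    old_shape, new_shape = (4, 6), (3, 8)
    coords = np.array([[1, 3], [5, 2]])            # nonzeros at (1, 5), (3, 2)
    linear = coords[0] * old_shape[1] + coords[1]  # -> [11, 20]
    new_coords = np.stack([linear // new_shape[1], linear % new_shape[1]])
    assert new_coords.tolist() == [[1, 2], [3, 4]]  # now at (1, 3) and (2, 4)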
""" if self.ndim != 2: raise ValueError("Can only convert a 2-dimensional array to a Scipy sparse matrix.") result = scipy.sparse.coo_matrix((self.data, (self.coords[0], self.coords[1])), shape=self.shape) result.has_canonical_format = (not self.has_duplicates and self.sorted) return result def _tocsr(self): if self.ndim != 2: raise ValueError('This array must be two-dimensional for this conversion ' 'to work.') # Pass 1: sum duplicates self.sum_duplicates() # Pass 2: sort indices self.sort_indices() row, col = self.coords # Pass 3: count nonzeros in each row indptr = np.zeros(self.shape[0] + 1, dtype=np.int64) np.cumsum(np.bincount(row, minlength=self.shape[0]), out=indptr[1:]) return scipy.sparse.csr_matrix((self.data, col, indptr), shape=self.shape) def tocsr(self): """ Converts this array to a :obj:`scipy.sparse.csr_matrix`. Returns ------- scipy.sparse.csr_matrix The result of the conversion. Raises ------ ValueError If the array is not two-dimensional. See Also -------- COO.tocsc : Convert to a :obj:`scipy.sparse.csc_matrix`. COO.to_scipy_sparse : Convert to a :obj:`scipy.sparse.coo_matrix`. scipy.sparse.coo_matrix.tocsr : Equivalent Scipy function. """ if self._cache is not None: try: return self._csr except AttributeError: pass try: self._csr = self._csc.tocsr() return self._csr except AttributeError: pass self._csr = csr = self._tocsr() else: csr = self._tocsr() return csr def tocsc(self): """ Converts this array to a :obj:`scipy.sparse.csc_matrix`. Returns ------- scipy.sparse.csc_matrix The result of the conversion. Raises ------ ValueError If the array is not two-dimensional. See Also -------- COO.tocsr : Convert to a :obj:`scipy.sparse.csr_matrix`. COO.to_scipy_sparse : Convert to a :obj:`scipy.sparse.coo_matrix`. scipy.sparse.coo_matrix.tocsc : Equivalent Scipy function. """ if self._cache is not None: try: return self._csc except AttributeError: pass try: self._csc = self._csr.tocsc() return self._csc except AttributeError: pass self._csc = csc = self.tocsr().tocsc() else: csc = self.tocsr().tocsc() return csc def sort_indices(self): """ Sorts the :obj:`COO.coords` attribute. Also sorts the data in :obj:`COO.data` to match. Examples -------- >>> coords = np.array([[1, 2, 0]], dtype=np.uint8) >>> data = np.array([4, 1, 3], dtype=np.uint8) >>> s = COO(coords, data) >>> s.sort_indices() >>> s.coords # doctest: +NORMALIZE_WHITESPACE array([[0, 1, 2]], dtype=uint8) >>> s.data # doctest: +NORMALIZE_WHITESPACE array([3, 4, 1], dtype=uint8) """ if self.sorted: return linear = self.linear_loc(signed=True) if (np.diff(linear) > 0).all(): # already sorted self.sorted = True return order = np.argsort(linear) self.coords = self.coords[:, order] self.data = self.data[order] self.sorted = True def sum_duplicates(self): """ Sums data corresponding to duplicates in :obj:`COO.coords`. See Also -------- scipy.sparse.coo_matrix.sum_duplicates : Equivalent Scipy function. 
Examples -------- >>> coords = np.array([[0, 1, 1, 2]], dtype=np.uint8) >>> data = np.array([6, 5, 2, 2], dtype=np.uint8) >>> s = COO(coords, data) >>> s.sum_duplicates() >>> s.coords # doctest: +NORMALIZE_WHITESPACE array([[0, 1, 2]], dtype=uint8) >>> s.data # doctest: +NORMALIZE_WHITESPACE array([6, 7, 2], dtype=uint8) """ # Inspired by scipy/sparse/coo.py::sum_duplicates # See https://github.com/scipy/scipy/blob/master/LICENSE.txt if not self.has_duplicates and self.sorted: return if not self.coords.size: return self.sort_indices() linear = self.linear_loc() unique_mask = np.diff(linear) != 0 if unique_mask.sum() == len(unique_mask): # already unique self.has_duplicates = False return unique_mask = np.append(True, unique_mask) coords = self.coords[:, unique_mask] (unique_inds,) = np.nonzero(unique_mask) data = np.add.reduceat(self.data, unique_inds, dtype=self.data.dtype) self.data = data self.coords = coords self.has_duplicates = False def __add__(self, other): return self.elemwise(operator.add, other) def __radd__(self, other): return self.elemwise(_reverse_self_other(operator.add), other) def __neg__(self): return self.elemwise(operator.neg) def __sub__(self, other): return self.elemwise(operator.sub, other) def __rsub__(self, other): return self.elemwise(_reverse_self_other(operator.sub), other) def __mul__(self, other): return self.elemwise(operator.mul, other) def __rmul__(self, other): return self.elemwise(_reverse_self_other(operator.mul), other) def __truediv__(self, other): return self.elemwise(operator.truediv, other) def __rtruediv__(self, other): return self.elemwise(_reverse_self_other(operator.truediv), other) def __floordiv__(self, other): return self.elemwise(operator.floordiv, other) def __rfloordiv__(self, other): return self.elemwise(_reverse_self_other(operator.floordiv), other) __div__ = __truediv__ __rdiv__ = __rtruediv__ def __pow__(self, other): return self.elemwise(operator.pow, other) def __rpow__(self, other): return self.elemwise(_reverse_self_other(operator.pow), other) def __mod__(self, other): return self.elemwise(operator.mod, other) def __rmod__(self, other): return self.elemwise(_reverse_self_other(operator.mod), other) def __and__(self, other): return self.elemwise(operator.and_, other) def __rand__(self, other): return self.elemwise(_reverse_self_other(operator.and_), other) def __xor__(self, other): return self.elemwise(operator.xor, other) def __rxor__(self, other): return self.elemwise(_reverse_self_other(operator.xor), other) def __or__(self, other): return self.elemwise(operator.or_, other) def __ror__(self, other): return self.elemwise(_reverse_self_other(operator.or_), other) def __invert__(self): return self.elemwise(operator.invert) def __gt__(self, other): return self.elemwise(operator.gt, other) def __ge__(self, other): return self.elemwise(operator.ge, other) def __lt__(self, other): return self.elemwise(operator.lt, other) def __le__(self, other): return self.elemwise(operator.le, other) def __eq__(self, other): return self.elemwise(operator.eq, other) def __ne__(self, other): return self.elemwise(operator.ne, other) def __lshift__(self, other): return self.elemwise(operator.lshift, other) def __rlshift__(self, other): return self.elemwise(_reverse_self_other(operator.lshift), other) def __rshift__(self, other): return self.elemwise(operator.rshift, other) def __rrshift__(self, other): return self.elemwise(_reverse_self_other(operator.rshift), other) @staticmethod def _elemwise(func, *args, **kwargs): if len(args) == 0: return func() 
self = args[0] if isinstance(self, scipy.sparse.spmatrix): self = COO.from_scipy_sparse(self) elif np.isscalar(self) or (isinstance(self, np.ndarray) and self.ndim == 0): func = partial(func, self) other = args[1] if isinstance(other, scipy.sparse.spmatrix): other = COO.from_scipy_sparse(other) return _elemwise_unary(func, other, *args[2:], **kwargs) if len(args) == 1: return _elemwise_unary(func, self, *args[1:], **kwargs) else: other = args[1] if isinstance(other, scipy.sparse.spmatrix): other = COO.from_scipy_sparse(other) if isinstance(other, COO) or isinstance(other, np.ndarray): return _elemwise_binary(func, self, other, *args[2:], **kwargs) else: return _elemwise_unary(func, self, *args[1:], **kwargs) def elemwise(self, func, *args, **kwargs): """ Apply a function to one or two arguments. Parameters ---------- func : Callable The function to apply to one or two arguments. args : tuple, optional The extra arguments to pass to the function. If :code:`args[0]` is a COO object, a scipy.sparse.spmatrix or a scalar, the function will be treated as a binary function. Otherwise, it will be treated as a unary function. kwargs : dict, optional The kwargs to pass to the function. Returns ------- COO The result of applying the function. Raises ------ ValueError If the operation would result in a dense matrix. See Also -------- :obj:`numpy.ufunc` : A similar Numpy construct. Note that any :code:`ufunc` can be used as the :code:`func` input to this function. """ return COO._elemwise(func, self, *args, **kwargs) def broadcast_to(self, shape): """ Performs the equivalent of :obj:`numpy.broadcast_to` for :obj:`COO`. Note that this function returns a new array instead of a view. Parameters ---------- shape : tuple[int] The shape to broadcast the data to. Returns ------- COO The broadcasted sparse array. Raises ------ ValueError If the operand cannot be broadcast to the given shape. See also -------- :obj:`numpy.broadcast_to` : NumPy equivalent function """ result_shape = _get_broadcast_shape(self.shape, shape, is_result=True) params = _get_broadcast_parameters(self.shape, result_shape) coords, data = _get_expanded_coords_data(self.coords, self.data, params, result_shape) return COO(coords, data, shape=result_shape, has_duplicates=self.has_duplicates, sorted=self.sorted) def __abs__(self): """ Calculate the absolute value element-wise. See also -------- :obj:`numpy.absolute` : NumPy equivalent ufunc. """ return self.elemwise(abs) abs = __abs__ def exp(self, out=None): """ Calculate the exponential of all elements in the array. See also -------- :obj:`numpy.exp` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.exp) def expm1(self, out=None): """ Calculate :code:`exp(x) - 1` for all elements in the array. See also -------- scipy.sparse.coo_matrix.expm1 : SciPy sparse equivalent function :obj:`numpy.expm1` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.expm1) def log1p(self, out=None): """ Return the natural logarithm of one plus the input array, element-wise. Calculates :code:`log(1 + x)`.
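The element-wise machinery above composes with broadcasting; a short illustrative sketch of both entry points:

.. code-block:: python

    import numpy as np
    import sparse

    x = np.zeros((3, 1))
    x[1, 0] = 2.0
    s = sparse.COO.from_numpy(x)
    b = s.broadcast_to((4, 3, 2))         # a new array, not a view
    assert b.shape == (4, 3, 2) and b.nnz == 8
    doubled = s.elemwise(np.multiply, 2)  # zero-preserving, stays sparse
    assert doubled.todense()[1, 0] == 4.0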
See also -------- scipy.sparse.coo_matrix.log1p : SciPy sparse equivalent function :obj:`numpy.log1p` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.log1p) def sin(self, out=None): """ Trigonometric sine, element-wise. See also -------- scipy.sparse.coo_matrix.sin : SciPy sparse equivalent function :obj:`numpy.sin` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.sin) def sinh(self, out=None): """ Hyperbolic sine, element-wise. See also -------- scipy.sparse.coo_matrix.sinh : SciPy sparse equivalent function :obj:`numpy.sinh` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.sinh) def tan(self, out=None): """ Compute tangent element-wise. See also -------- scipy.sparse.coo_matrix.tan : SciPy sparse equivalent function :obj:`numpy.tan` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. """ assert out is None return self.elemwise(np.tan) def tanh(self, out=None): """ Compute hyperbolic tangent element-wise. See also -------- scipy.sparse.coo_matrix.tanh : SciPy sparse equivalent function :obj:`numpy.tanh` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.tanh) def sqrt(self, out=None): """ Return the positive square-root of an array, element-wise. See also -------- scipy.sparse.coo_matrix.sqrt : SciPy sparse equivalent function :obj:`numpy.sqrt` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.sqrt) def ceil(self, out=None): """ Return the ceiling of the input, element-wise. See also -------- scipy.sparse.coo_matrix.ceil : SciPy sparse equivalent function :obj:`numpy.ceil` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.ceil) def floor(self, out=None): """ Return the floor of the input, element-wise. See also -------- scipy.sparse.coo_matrix.floor : SciPy sparse equivalent function :obj:`numpy.floor` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.floor) def round(self, decimals=0, out=None): """ Evenly round to the given number of decimals. 
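All of these wrappers share one rule: the operation must map zero to zero, otherwise the result would be dense. A quick sketch of both outcomes:

.. code-block:: python

    import numpy as np
    import sparse

    s = sparse.COO.from_numpy(np.eye(3))
    s.expm1()    # fine: expm1(0) == 0, so sparsity is preserved
    try:
        s.exp()  # exp(0) == 1 would densify, so ValueError is raised
    except ValueError:
        pass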
See also -------- :obj:`numpy.round` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.round, decimals) def rint(self, out=None): """ Round elements of the array to the nearest integer. See also -------- scipy.sparse.coo_matrix.rint : SciPy sparse equivalent function :obj:`numpy.rint` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.rint) def conj(self, out=None): """ Return the complex conjugate, element-wise. See also -------- conjugate : Equivalent function scipy.sparse.coo_matrix.conj : SciPy sparse equivalent function :obj:`numpy.conj` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.conj) def conjugate(self, out=None): """ Return the complex conjugate, element-wise. See also -------- conj : Equivalent function scipy.sparse.coo_matrix.conjugate : SciPy sparse equivalent function :obj:`numpy.conj` : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.conjugate) def astype(self, dtype, out=None): """ Copy of the array, cast to a specified type. See also -------- scipy.sparse.coo_matrix.astype : SciPy sparse equivalent function numpy.ndarray.astype : NumPy equivalent ufunc. :obj:`COO.elemwise`: Apply an arbitrary element-wise function to one or two arguments. Notes ----- The :code:`out` parameter is provided just for compatibility with Numpy and isn't actually supported. """ assert out is None return self.elemwise(np.ndarray.astype, dtype) def maybe_densify(self, max_size=1000, min_density=0.25): """ Converts this :obj:`COO` array to a :obj:`numpy.ndarray` if not too costly. Parameters ---------- max_size : int Maximum number of elements in output min_density : float Minimum density of output Returns ------- numpy.ndarray The dense array. Raises ------- ValueError If the returned array would be too large. Examples -------- Convert a small sparse array to a dense array. >>> s = COO.from_numpy(np.random.rand(2, 3, 4)) >>> x = s.maybe_densify() >>> np.allclose(x, s.todense()) True You can also specify the minimum allowed density or the maximum number of output elements. If both conditions are unmet, this method will throw an error. >>> x = np.zeros((5, 5), dtype=np.uint8) >>> x[2, 2] = 1 >>> s = COO.from_numpy(x) >>> s.maybe_densify(max_size=5, min_density=0.25) Traceback (most recent call last): ... ValueError: Operation would require converting large sparse array to dense """ if self.size <= max_size or self.density >= min_density: return self.todense() else: raise ValueError("Operation would require converting " "large sparse array to dense") def tensordot(a, b, axes=2): """ Perform the equivalent of :obj:`numpy.tensordot`. 
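A small usage sketch for the function defined below (illustrative shapes and densities):

.. code-block:: python

    import sparse

    a = sparse.random((3, 4, 5), density=0.1)
    b = sparse.random((5, 4, 2), density=0.1)
    # Contract axis 2 of `a` with axis 0 of `b`, and axis 1 with axis 1:
    c = sparse.tensordot(a, b, axes=((2, 1), (0, 1)))
    assert c.shape == (3, 2)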
Parameters ---------- a, b : Union[COO, np.ndarray, scipy.sparse.spmatrix] The arrays to perform the :code:`tensordot` operation on. axes : tuple[Union[int, tuple[int], Union[int, tuple[int]], optional The axes to match when performing the sum. Returns ------- Union[COO, numpy.ndarray] The result of the operation. See Also -------- numpy.tensordot : NumPy equivalent function """ # Much of this is stolen from numpy/core/numeric.py::tensordot # Please see license at https://github.com/numpy/numpy/blob/master/LICENSE.txt try: iter(axes) except TypeError: axes_a = list(range(-axes, 0)) axes_b = list(range(0, axes)) else: axes_a, axes_b = axes try: na = len(axes_a) axes_a = list(axes_a) except TypeError: axes_a = [axes_a] na = 1 try: nb = len(axes_b) axes_b = list(axes_b) except TypeError: axes_b = [axes_b] nb = 1 # a, b = asarray(a), asarray(b) # <--- modified as_ = a.shape nda = a.ndim bs = b.shape ndb = b.ndim equal = True if na != nb: equal = False else: for k in range(na): if as_[axes_a[k]] != bs[axes_b[k]]: equal = False break if axes_a[k] < 0: axes_a[k] += nda if axes_b[k] < 0: axes_b[k] += ndb if not equal: raise ValueError("shape-mismatch for sum") # Move the axes to sum over to the end of "a" # and to the front of "b" notin = [k for k in range(nda) if k not in axes_a] newaxes_a = notin + axes_a N2 = 1 for axis in axes_a: N2 *= as_[axis] newshape_a = (-1, N2) olda = [as_[axis] for axis in notin] notin = [k for k in range(ndb) if k not in axes_b] newaxes_b = axes_b + notin N2 = 1 for axis in axes_b: N2 *= bs[axis] newshape_b = (N2, -1) oldb = [bs[axis] for axis in notin] at = a.transpose(newaxes_a).reshape(newshape_a) bt = b.transpose(newaxes_b).reshape(newshape_b) res = _dot(at, bt) if isinstance(res, scipy.sparse.spmatrix): if res.nnz > reduce(operator.mul, res.shape) / 2: res = res.todense() else: res = COO.from_scipy_sparse(res) # <--- modified res.has_duplicates = False if isinstance(res, np.matrix): res = np.asarray(res) return res.reshape(olda + oldb) def dot(a, b): """ Perform the equivalent of :obj:`numpy.dot` on two arrays. Parameters ---------- a, b : Union[COO, np.ndarray, scipy.sparse.spmatrix] The arrays to perform the :code:`dot` operation on. Returns ------- Union[COO, numpy.ndarray] The result of the operation. See Also -------- numpy.dot : NumPy equivalent function. COO.dot : Equivalent function for COO objects. 
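An illustrative sketch; note that a mostly-nonzero result may come back as a dense :code:`numpy.ndarray`, as documented above:

.. code-block:: python

    import numpy as np
    import sparse

    a = sparse.COO.from_numpy(np.arange(6).reshape(2, 3))
    b = sparse.COO.from_numpy(np.arange(6).reshape(3, 2))
    out = sparse.dot(a, b)      # contracts a's last axis with b's first here
    assert out.shape == (2, 2)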
""" if not hasattr(a, 'ndim') or not hasattr(b, 'ndim'): raise NotImplementedError( "Cannot perform dot product on types %s, %s" % (type(a), type(b))) return tensordot(a, b, axes=((a.ndim - 1,), (b.ndim - 2,))) def _dot(a, b): if isinstance(a, COO): a.sum_duplicates() if isinstance(b, COO): b.sum_duplicates() if isinstance(b, COO) and not isinstance(a, COO): return _dot(b.T, a.T).T aa = a.tocsr() if isinstance(b, (COO, scipy.sparse.spmatrix)): b = b.tocsc() return aa.dot(b) def _keepdims(original, new, axis): shape = list(original.shape) for ax in axis: shape[ax] = 1 return new.reshape(shape) def _mask(coords, idx, shape): if isinstance(idx, numbers.Integral): return coords == idx elif isinstance(idx, slice): step = idx.step if idx.step is not None else 1 if step > 0: start = idx.start if idx.start is not None else 0 stop = idx.stop if idx.stop is not None else shape return (coords >= start) & (coords < stop) & \ (coords % step == start % step) else: start = idx.start if idx.start is not None else (shape - 1) stop = idx.stop if idx.stop is not None else -1 return (coords <= start) & (coords > stop) & \ (coords % step == start % step) elif isinstance(idx, Iterable): mask = np.zeros(len(coords), dtype=np.bool) for item in idx: mask |= _mask(coords, item, shape) return mask def concatenate(arrays, axis=0): """ Concatenate the input arrays along the given dimension. Parameters ---------- arrays : Iterable[Union[COO, numpy.ndarray, scipy.sparse.spmatrix]] The input arrays to concatenate. axis : int, optional The axis along which to concatenate the input arrays. The default is zero. Returns ------- COO The output concatenated array. See Also -------- numpy.concatenate : NumPy equivalent function """ arrays = [x if isinstance(x, COO) else COO(x) for x in arrays] if axis < 0: axis = axis + arrays[0].ndim assert all(x.shape[ax] == arrays[0].shape[ax] for x in arrays for ax in set(range(arrays[0].ndim)) - {axis}) nnz = 0 dim = sum(x.shape[axis] for x in arrays) shape = list(arrays[0].shape) shape[axis] = dim coords_dtype = np.min_scalar_type(max(shape) - 1) if len(shape) != 0 else np.uint8 data = np.concatenate([x.data for x in arrays]) coords = np.concatenate([x.coords for x in arrays], axis=1).astype(coords_dtype) dim = 0 for x in arrays: if dim: coords[axis, nnz:x.nnz + nnz] += dim dim += x.shape[axis] nnz += x.nnz has_duplicates = any(x.has_duplicates for x in arrays) return COO(coords, data, shape=shape, has_duplicates=has_duplicates, sorted=(axis == 0) and all(a.sorted for a in arrays)) def stack(arrays, axis=0): """ Stack the input arrays along the given dimension. Parameters ---------- arrays : Iterable[Union[COO, numpy.ndarray, scipy.sparse.spmatrix]] The input arrays to stack. axis : int, optional The axis along which to stack the input arrays. Returns ------- COO The output stacked array. 
See Also -------- numpy.stack : NumPy equivalent function """ assert len(set(x.shape for x in arrays)) == 1 arrays = [x if isinstance(x, COO) else COO(x) for x in arrays] if axis < 0: axis = axis + arrays[0].ndim + 1 data = np.concatenate([x.data for x in arrays]) coords = np.concatenate([x.coords for x in arrays], axis=1) shape = list(arrays[0].shape) shape.insert(axis, len(arrays)) coords_dtype = np.min_scalar_type(max(shape) - 1) if len(shape) != 0 else np.uint8 nnz = 0 dim = 0 new = np.empty(shape=(coords.shape[1],), dtype=coords_dtype) for x in arrays: new[nnz:x.nnz + nnz] = dim dim += 1 nnz += x.nnz has_duplicates = any(x.has_duplicates for x in arrays) coords = [coords[i].astype(coords_dtype) for i in range(coords.shape[0])] coords.insert(axis, new) coords = np.stack(coords, axis=0) return COO(coords, data, shape=shape, has_duplicates=has_duplicates, sorted=(axis == 0) and all(a.sorted for a in arrays)) def triu(x, k=0): """ Returns an array with all elements below the k-th diagonal set to zero. Parameters ---------- x : COO The input array. k : int, optional The diagonal below which elements are set to zero. The default is zero, which corresponds to the main diagonal. Returns ------- COO The output upper-triangular matrix. See Also -------- numpy.triu : NumPy equivalent function """ if not x.ndim >= 2: raise NotImplementedError('sparse.triu is not implemented for scalars or 1-D arrays.') mask = x.coords[-2] + k <= x.coords[-1] coords = x.coords[:, mask] data = x.data[mask] return COO(coords, data, x.shape, x.has_duplicates, x.sorted) def tril(x, k=0): """ Returns an array with all elements above the k-th diagonal set to zero. Parameters ---------- x : COO The input array. k : int, optional The diagonal above which elements are set to zero. The default is zero, which corresponds to the main diagonal. Returns ------- COO The output lower-triangular matrix. See Also -------- numpy.tril : NumPy equivalent function """ if not x.ndim >= 2: raise NotImplementedError('sparse.tril is not implemented for scalars or 1-D arrays.') mask = x.coords[-2] + k >= x.coords[-1] coords = x.coords[:, mask] data = x.data[mask] return COO(coords, data, x.shape, x.has_duplicates, x.sorted) # (c) Paul Panzer # Taken from https://stackoverflow.com/a/47833496/774273 # License: https://creativecommons.org/licenses/by-sa/3.0/ def _match_arrays(a, b): """ Finds all indexes into a and b such that a[i] = b[j]. The outputs are sorted in lexographical order. Parameters ---------- a, b : np.ndarray The input 1-D arrays to match. If matching of multiple fields is needed, use np.recarrays. These two arrays must be sorted. Returns ------- a_idx, b_idx : np.ndarray The output indices of every possible pair of matching elements. 
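A concrete illustration of the matching contract (calling the private helper directly, purely for demonstration):

.. code-block:: python

    import numpy as np
    from sparse.coo import _match_arrays

    a = np.array([1, 3, 3, 7])  # sorted inputs, as required
    b = np.array([3, 5, 7])
    a_idx, b_idx = _match_arrays(a, b)
    # Every (i, j) with a[i] == b[j], in lexicographical order:
    assert list(zip(a_idx.tolist(), b_idx.tolist())) == [(1, 0), (2, 0), (3, 2)]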
""" if len(a) == 0 or len(b) == 0: return np.array([], dtype=np.uint8), np.array([], dtype=np.uint8) asw = np.r_[0, 1 + np.flatnonzero(a[:-1] != a[1:]), len(a)] bsw = np.r_[0, 1 + np.flatnonzero(b[:-1] != b[1:]), len(b)] al, bl = np.diff(asw), np.diff(bsw) na = len(al) asw, bsw = asw, bsw abunq = np.r_[a[asw[:-1]], b[bsw[:-1]]] m = np.argsort(abunq, kind='mergesort') mv = abunq[m] midx = np.flatnonzero(mv[:-1] == mv[1:]) ai, bi = m[midx], m[midx + 1] - na aic = np.r_[0, np.cumsum(al[ai])] a_idx = np.ones((aic[-1],), dtype=np.int_) a_idx[aic[:-1]] = asw[ai] a_idx[aic[1:-1]] -= asw[ai[:-1]] + al[ai[:-1]] - 1 a_idx = np.repeat(np.cumsum(a_idx), np.repeat(bl[bi], al[ai])) bi = np.repeat(bi, al[ai]) bic = np.r_[0, np.cumsum(bl[bi])] b_idx = np.ones((bic[-1],), dtype=np.int_) b_idx[bic[:-1]] = bsw[bi] b_idx[bic[1:-1]] -= bsw[bi[:-1]] + bl[bi[:-1]] - 1 b_idx = np.cumsum(b_idx) return a_idx, b_idx def _grouped_reduce(x, groups, method, **kwargs): """ Performs a :code:`ufunc` grouped reduce. Parameters ---------- x : np.ndarray The data to reduce. groups : np.ndarray The groups the data belongs to. The groups must be contiguous. method : np.ufunc The :code:`ufunc` to use to perform the reduction. kwargs : dict The kwargs to pass to the :code:`ufunc`'s :code:`reduceat` function. Returns ------- result : np.ndarray The result of the grouped reduce operation. inv_idx : np.ndarray The index of the first element where each group is found. counts : np.ndarray The number of elements in each group. """ # Partial credit to @shoyer # Ref: https://gist.github.com/shoyer/f538ac78ae904c936844 flag = np.concatenate(([True] if len(x) != 0 else [], groups[1:] != groups[:-1])) inv_idx = np.flatnonzero(flag) result = method.reduceat(x, inv_idx, **kwargs) counts = np.diff(np.concatenate((inv_idx, [len(x)]))) return result, inv_idx, counts def _elemwise_binary(func, self, other, *args, **kwargs): check = kwargs.pop('check', True) self_zero = _zero_of_dtype(self.dtype) other_zero = _zero_of_dtype(other.dtype) func_zero = _zero_of_dtype(func(self_zero, other_zero, *args, **kwargs).dtype) if check and func(self_zero, other_zero, *args, **kwargs) != func_zero: raise ValueError("Performing this operation would produce " "a dense result: %s" % str(func)) if not isinstance(self, COO): if not check or np.array_equiv(func(self, other_zero, *args, **kwargs), func_zero): return _elemwise_binary_self_dense(func, self, other, *args, **kwargs) else: raise ValueError("Performing this operation would produce " "a dense result: %s" % str(func)) if not isinstance(other, COO): if not check or np.array_equiv(func(self_zero, other, *args, **kwargs), func_zero): temp_func = _reverse_self_other(func) return _elemwise_binary_self_dense(temp_func, other, self, *args, **kwargs) else: raise ValueError("Performing this operation would produce " "a dense result: %s" % str(func)) self_shape, other_shape = self.shape, other.shape result_shape = _get_broadcast_shape(self_shape, other_shape) self_params = _get_broadcast_parameters(self.shape, result_shape) other_params = _get_broadcast_parameters(other.shape, result_shape) combined_params = [p1 and p2 for p1, p2 in zip(self_params, other_params)] self_reduce_params = combined_params[-self.ndim:] other_reduce_params = combined_params[-other.ndim:] self.sum_duplicates() # TODO: document side-effect or make copy other.sum_duplicates() # TODO: document side-effect or make copy self_coords = self.coords self_data = self.data self_reduced_coords, self_reduced_shape = \ _get_reduced_coords(self_coords, 
self_shape, self_reduce_params) self_reduced_linear = _linear_loc(self_reduced_coords, self_reduced_shape) i = np.argsort(self_reduced_linear) self_reduced_linear = self_reduced_linear[i] self_coords = self_coords[:, i] self_data = self_data[i] # Store coords other_coords = other.coords other_data = other.data other_reduced_coords, other_reduced_shape = \ _get_reduced_coords(other_coords, other_shape, other_reduce_params) other_reduced_linear = _linear_loc(other_reduced_coords, other_reduced_shape) i = np.argsort(other_reduced_linear) other_reduced_linear = other_reduced_linear[i] other_coords = other_coords[:, i] other_data = other_data[i] # Find matches between self.coords and other.coords matched_self, matched_other = _match_arrays(self_reduced_linear, other_reduced_linear) # Start with an empty list. This may reduce computation in many cases. data_list = [] coords_list = [] # Add the matched part. matched_coords = _get_matching_coords(self_coords[:, matched_self], other_coords[:, matched_other], self_shape, other_shape) data_list.append(func(self_data[matched_self], other_data[matched_other], *args, **kwargs)) coords_list.append(matched_coords) self_func = func(self_data, other_zero, *args, **kwargs) # Add unmatched parts as necessary. if (self_func != func_zero).any(): self_unmatched_coords, self_unmatched_func = \ _get_unmatched_coords_data(self_coords, self_func, self_shape, result_shape, matched_self, matched_coords) data_list.extend(self_unmatched_func) coords_list.extend(self_unmatched_coords) other_func = func(self_zero, other_data, *args, **kwargs) if (other_func != func_zero).any(): other_unmatched_coords, other_unmatched_func = \ _get_unmatched_coords_data(other_coords, other_func, other_shape, result_shape, matched_other, matched_coords) coords_list.extend(other_unmatched_coords) data_list.extend(other_unmatched_func) # Concatenate matches and mismatches data = np.concatenate(data_list) if len(data_list) else np.empty((0,), dtype=self.dtype) coords = np.concatenate(coords_list, axis=1) if len(coords_list) else \ np.empty((0, len(result_shape)), dtype=self.coords.dtype) nonzero = data != func_zero data = data[nonzero] coords = coords[:, nonzero] return COO(coords, data, shape=result_shape, has_duplicates=False) def _elemwise_binary_self_dense(func, self, other, *args, **kwargs): assert isinstance(self, np.ndarray) assert isinstance(other, COO) result_shape = _get_broadcast_shape(self.shape, other.shape) if result_shape != other.shape: other = other.broadcast_to(result_shape) self = np.broadcast_to(self, result_shape) self_coords = tuple([other.coords[i, :] for i in range(other.ndim)]) self_data = self[self_coords] func_data = func(self_data, other.data, *args, **kwargs) mask = func_data != 0 func_data = func_data[mask] func_coords = other.coords[:, mask] return COO(func_coords, func_data, shape=result_shape, has_duplicates=other.has_duplicates, sorted=other.sorted) def _reverse_self_other(func): def wrapper(*args, **kwargs): return func(args[1], args[0], *args[2:], **kwargs) return wrapper def _get_unmatched_coords_data(coords, data, shape, result_shape, matched_idx, matched_coords): """ Get the unmatched coordinates and data - both those that are unmatched with any point of the other data as well as those which are added because of broadcasting. Parameters ---------- coords : np.ndarray The coordinates to get the unmatched coordinates from. data : np.ndarray The data corresponding to these coordinates. shape : tuple[int] The shape corresponding to these coordinates. 
result_shape : tuple[int] The result broadcasting shape. matched_idx : np.ndarray The indices into the coords array where it matches with the other array. matched_coords : np.ndarray The overall coordinates that match from both arrays. Returns ------- coords_list : list[np.ndarray] The list of unmatched/broadcasting coordinates. data_list : list[np.ndarray] The data corresponding to the coordinates. """ params = _get_broadcast_parameters(shape, result_shape) matched = np.zeros(len(data), dtype=np.bool) matched[matched_idx] = True unmatched = ~matched data_zero = _zero_of_dtype(data.dtype) nonzero = data != data_zero unmatched &= nonzero matched &= nonzero coords_list = [] data_list = [] unmatched_coords, unmatched_data = \ _get_expanded_coords_data(coords[:, unmatched], data[unmatched], params, result_shape) coords_list.append(unmatched_coords) data_list.append(unmatched_data) if shape != result_shape: broadcast_coords, broadcast_data = \ _get_broadcast_coords_data(coords[:, matched], matched_coords, data[matched], params, result_shape) coords_list.append(broadcast_coords) data_list.append(broadcast_data) return coords_list, data_list def _get_broadcast_shape(shape1, shape2, is_result=False): """ Get the overall broadcasted shape. Parameters ---------- shape1, shape2 : tuple[int] The input shapes to broadcast together. is_result : bool Whether or not shape2 is also the result shape. Returns ------- result_shape : tuple[int] The overall shape of the result. Raises ------ ValueError If the two shapes cannot be broadcast together. """ # https://stackoverflow.com/a/47244284/774273 if not all((l1 == l2) or (l1 == 1) or ((l2 == 1) and not is_result) for l1, l2 in zip(shape1[::-1], shape2[::-1])): raise ValueError('operands could not be broadcast together with shapes %s, %s' % (shape1, shape2)) result_shape = tuple(max(l1, l2) for l1, l2 in zip_longest(shape1[::-1], shape2[::-1], fillvalue=1))[::-1] return result_shape def _get_broadcast_parameters(shape, broadcast_shape): """ Get the broadcast parameters. Parameters ---------- shape : tuple[int] The input shape. broadcast_shape The shape to broadcast to. Returns ------- params : list A list containing None if the dimension isn't in the original array, False if it needs to be broadcast, and True if it doesn't. """ params = [None if l1 is None else l1 == l2 for l1, l2 in zip_longest(shape[::-1], broadcast_shape[::-1], fillvalue=None)][::-1] return params def _get_reduced_coords(coords, shape, params): """ Gets only those dimensions of the coordinates that don't need to be broadcast. Parameters ---------- coords : np.ndarray The coordinates to reduce. params : list The params from which to check which dimensions to get. Returns ------- reduced_coords : np.ndarray The reduced coordinates. """ reduced_params = [bool(param) for param in params] reduced_shape = tuple(l for l, p in zip(shape, params) if p) return coords[reduced_params], reduced_shape def _get_expanded_coords_data(coords, data, params, broadcast_shape): """ Expand coordinates/data to broadcast_shape. Does most of the heavy lifting for broadcast_to. Produces sorted output for sorted inputs. Parameters ---------- coords : np.ndarray The coordinates to expand. data : np.ndarray The data corresponding to the coordinates. params : list The broadcast parameters. broadcast_shape : tuple[int] The shape to broadcast to. Returns ------- expanded_coords : np.ndarray List of 1-D arrays. Each item in the list has one dimension of coordinates. 
expanded_data : np.ndarray The data corresponding to expanded_coords. """ first_dim = -1 expand_shapes = [] for d, p, l in zip(range(len(broadcast_shape)), params, broadcast_shape): if p and first_dim == -1: expand_shapes.append(coords.shape[1]) first_dim = d if not p: expand_shapes.append(l) all_idx = _cartesian_product(*(np.arange(d, dtype=np.min_scalar_type(d - 1)) for d in expand_shapes)) dt = np.result_type(*(np.min_scalar_type(l - 1) for l in broadcast_shape)) false_dim = 0 dim = 0 expanded_coords = np.empty((len(broadcast_shape), all_idx.shape[1]), dtype=dt) expanded_data = data[all_idx[first_dim]] for d, p, l in zip(range(len(broadcast_shape)), params, broadcast_shape): if p: expanded_coords[d] = coords[dim, all_idx[first_dim]] else: expanded_coords[d] = all_idx[false_dim + (d > first_dim)] false_dim += 1 if p is not None: dim += 1 return np.asarray(expanded_coords), np.asarray(expanded_data) # (c) senderle # Taken from https://stackoverflow.com/a/11146645/774273 # License: https://creativecommons.org/licenses/by-sa/3.0/ def _cartesian_product(*arrays): """ Get the cartesian product of a number of arrays. Parameters ---------- arrays : Tuple[np.ndarray] The arrays to get a cartesian product of. Always sorted with respect to the original array. Returns ------- out : np.ndarray The overall cartesian product of all the input arrays. """ broadcastable = np.ix_(*arrays) broadcasted = np.broadcast_arrays(*broadcastable) rows, cols = np.prod(broadcasted[0].shape), len(broadcasted) dtype = np.result_type(*arrays) out = np.empty(rows * cols, dtype=dtype) start, end = 0, rows for a in broadcasted: out[start:end] = a.reshape(-1) start, end = end, end + rows return out.reshape(cols, rows) def _elemwise_unary(func, self, *args, **kwargs): check = kwargs.pop('check', True) data_zero = _zero_of_dtype(self.dtype) func_zero = _zero_of_dtype(func(data_zero, *args, **kwargs).dtype) if check and func(data_zero, *args, **kwargs) != func_zero: raise ValueError("Performing this operation would produce " "a dense result: %s" % str(func)) data_func = func(self.data, *args, **kwargs) nonzero = data_func != func_zero return COO(self.coords[:, nonzero], data_func[nonzero], shape=self.shape, has_duplicates=self.has_duplicates, sorted=self.sorted) def _get_matching_coords(coords1, coords2, shape1, shape2): """ Takes in the matching coordinates in both dimensions (only those dimensions that don't need to be broadcast in both arrays and returns the coordinates that will overlap in the output array, i.e., the coordinates for which both broadcast arrays will be nonzero. Parameters ---------- coords1, coords2 : np.ndarray shape1, shape2 : tuple[int] Returns ------- matching_coords : np.ndarray The coordinates of the output array for which both inputs will be nonzero. """ result_shape = _get_broadcast_shape(shape1, shape2) params1 = _get_broadcast_parameters(shape1, result_shape) params2 = _get_broadcast_parameters(shape2, result_shape) matching_coords = [] dim1 = 0 dim2 = 0 for p1, p2 in zip(params1, params2): if p1: matching_coords.append(coords1[dim1]) else: matching_coords.append(coords2[dim2]) if p1 is not None: dim1 += 1 if p2 is not None: dim2 += 1 return np.asarray(matching_coords) def _get_broadcast_coords_data(coords, matched_coords, data, params, broadcast_shape): """ Get data that matched in the reduced coordinates but still had a partial overlap because of the broadcast, i.e., it didn't match in one of the other dimensions. 
Parameters ---------- coords : np.ndarray The list of coordinates of the required array. Must be sorted. matched_coords : np.ndarray The list of coordinates that match. Must be sorted. data : np.ndarray The data corresponding to coords. params : list The broadcast parameters. broadcast_shape : tuple[int] The shape to get the broadcast coordinates. Returns ------- broadcast_coords : np.ndarray The broadcasted coordinates. Is sorted. broadcasted_data : np.ndarray The data corresponding to those coordinates. """ full_coords, full_data = _get_expanded_coords_data(coords, data, params, broadcast_shape) linear_full_coords = _linear_loc(full_coords, broadcast_shape) linear_matched_coords = _linear_loc(matched_coords, broadcast_shape) overlapping_coords, _ = _match_arrays(linear_full_coords, linear_matched_coords) mask = np.ones(full_coords.shape[1], dtype=np.bool) mask[overlapping_coords] = False return full_coords[:, mask], full_data[mask] def _linear_loc(coords, shape, signed=False): n = reduce(operator.mul, shape, 1) if signed: n = -n dtype = np.min_scalar_type(n) out = np.zeros(coords.shape[1], dtype=dtype) tmp = np.zeros(coords.shape[1], dtype=dtype) strides = 1 for i, d in enumerate(shape[::-1]): # out += self.coords[-(i + 1), :].astype(dtype) * strides np.multiply(coords[-(i + 1), :], strides, out=tmp, dtype=dtype) np.add(tmp, out, out=out) strides *= d return out sparse-0.2.0/sparse/dok.py000066400000000000000000000243761323236075200154510ustar00rootroot00000000000000import six import numpy as np # Zip with Python 2/3 compat # Consumes less memory than Py2 zip from six.moves import zip, range from numbers import Integral from collections import Iterable from .slicing import normalize_index from .utils import _zero_of_dtype try: # Windows compatibility int = long except NameError: pass class DOK(object): """ A class for building sparse multidimensional arrays. Parameters ---------- shape : tuple[int] The shape of the array data : dict, optional The key-value pairs for the data in this array. dtype : np.dtype, optional The data type of this array. If left empty, it is inferred from the first element. Attributes ---------- dtype : numpy.dtype The datatype of this array. Can be :code:`None` if no elements have been set yet. shape : tuple[int] The shape of this array. data : dict The keys of this dictionary contain all the indices and the values contain the nonzero entries. See Also -------- COO : A read-only sparse array. Examples -------- You can create :obj:`DOK` objects from Numpy arrays. >>> x = np.eye(5, dtype=np.uint8) >>> x[2, 3] = 5 >>> s = DOK.from_numpy(x) >>> s You can also create them from just shapes, and use slicing assignment. >>> s2 = DOK((5, 5), dtype=np.int64) >>> s2[1:3, 1:3] = [[4, 5], [6, 7]] >>> s2 You can convert :obj:`DOK` arrays to :obj:`COO` arrays, or :obj:`numpy.ndarray` objects. >>> from sparse import COO >>> s3 = COO(s2) >>> s3 >>> s2.todense() # doctest: +NORMALIZE_WHITESPACE array([[0, 0, 0, 0, 0], [0, 4, 5, 0, 0], [0, 6, 7, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]) >>> s4 = COO.from_numpy(np.eye(4, dtype=np.uint8)) >>> s4 >>> s5 = DOK.from_coo(s4) >>> s5 You can also create :obj:`DOK` arrays from a shape and a dict of values. Zeros are automatically ignored. >>> values = { ... (1, 2, 3): 4, ... (3, 2, 1): 0, ... 
}
    >>> s6 = DOK((5, 5, 5), values)
    >>> s6
    """

    def __init__(self, shape, data=None, dtype=None):
        from .coo import COO
        self.data = {}

        if isinstance(shape, COO):
            ar = DOK.from_coo(shape)
            self.shape = ar.shape
            self.dtype = ar.dtype
            self.data = ar.data
            return

        if isinstance(shape, np.ndarray):
            ar = DOK.from_numpy(shape)
            self.shape = ar.shape
            self.dtype = ar.dtype
            self.data = ar.data
            return

        self.dtype = np.dtype(dtype)

        if isinstance(shape, Integral):
            self.shape = (int(shape),)
        elif isinstance(shape, Iterable):
            # Every dimension must be a non-negative integer.
            if not all(isinstance(l, Integral) and int(l) >= 0 for l in shape):
                raise ValueError('shape must be an iterable of non-negative integers.')
            self.shape = tuple(shape)

        if not data:
            data = {}

        if isinstance(data, dict):
            if not dtype:
                if not len(data):
                    self.dtype = np.dtype('float64')
                else:
                    self.dtype = np.result_type(
                        *map(lambda x: np.asarray(x).dtype,
                             six.itervalues(data)))

            for c, d in six.iteritems(data):
                self[c] = d
        else:
            raise ValueError('data must be a dict.')

    @classmethod
    def from_coo(cls, x):
        """
        Get a :obj:`DOK` array from a :obj:`COO` array.

        Parameters
        ----------
        x : COO
            The array to convert.

        Returns
        -------
        DOK
            The equivalent :obj:`DOK` array.

        Examples
        --------
        >>> from sparse import COO
        >>> s = COO.from_numpy(np.eye(4))
        >>> s2 = DOK.from_coo(s)
        >>> s2
        """
        ar = cls(x.shape, dtype=x.dtype)

        for c, d in zip(x.coords.T, x.data):
            ar.data[tuple(c)] = d

        return ar

    def to_coo(self):
        """
        Convert this :obj:`DOK` array to a :obj:`COO` array.

        Returns
        -------
        COO
            The equivalent :obj:`COO` array.

        Examples
        --------
        >>> s = DOK((5, 5))
        >>> s[1:3, 1:3] = [[4, 5], [6, 7]]
        >>> s
        >>> s2 = s.to_coo()
        >>> s2
        """
        from .coo import COO
        return COO(self)

    @classmethod
    def from_numpy(cls, x):
        """
        Get a :obj:`DOK` array from a Numpy array.

        Parameters
        ----------
        x : np.ndarray
            The array to convert.

        Returns
        -------
        DOK
            The equivalent :obj:`DOK` array.

        Examples
        --------
        >>> s = DOK.from_numpy(np.eye(4))
        >>> s
        """
        ar = cls(x.shape, dtype=x.dtype)
        coords = np.nonzero(x)
        data = x[coords]

        for c in zip(data, *coords):
            d, c = c[0], c[1:]
            ar.data[c] = d

        return ar

    @property
    def ndim(self):
        """
        The number of dimensions in this array.

        Returns
        -------
        int
            The number of dimensions.

        See Also
        --------
        COO.ndim : Equivalent property for :obj:`COO` arrays.
        numpy.ndarray.ndim : Numpy equivalent property.

        Examples
        --------
        >>> s = DOK((1, 2, 3))
        >>> s.ndim
        3
        """
        return len(self.shape)

    @property
    def nnz(self):
        """
        The number of nonzero elements in this array.

        Returns
        -------
        int
            The number of nonzero elements.

        See Also
        --------
        COO.nnz : Equivalent :obj:`COO` array property.
        numpy.count_nonzero : A similar Numpy function.
        scipy.sparse.dok_matrix.nnz : The Scipy equivalent property.

        Examples
        --------
        >>> values = {
        ...     (1, 2, 3): 4,
        ...     (3, 2, 1): 0,
        ... }
        >>> s = DOK((5, 5, 5), values)
        >>> s.nnz
        1
        """
        return len(self.data)

    def __getitem__(self, key):
        key = normalize_index(key, self.shape)

        if not all(isinstance(i, Integral) for i in key):
            raise NotImplementedError('All indices must be integers'
                                      ' when getting an item.')

        if len(key) != self.ndim:
            raise NotImplementedError('Can only get single elements. '
                                      'Expected key of length %d, got %s'
                                      % (self.ndim, str(key)))

        key = tuple(int(k) for k in key)

        if key in self.data:
            return self.data[key]
        else:
            return _zero_of_dtype(self.dtype)[()]

    def __setitem__(self, key, value):
        key = normalize_index(key, self.shape)
        value = np.asanyarray(value)

        value = value.astype(self.dtype)

        key_list = [int(k) if isinstance(k, Integral) else k for k in key]

        self._setitem(key_list, value)

    def _setitem(self, key_list, value):
        value_missing_dims = len([ind for ind in key_list
                                  if isinstance(ind, slice)]) - value.ndim

        if value_missing_dims < 0:
            raise ValueError('setting an array element with a sequence.')

        for i, ind in enumerate(key_list):
            if isinstance(ind, slice):
                step = ind.step if ind.step is not None else 1
                if step > 0:
                    start = ind.start if ind.start is not None else 0
                    start = max(start, 0)
                    stop = ind.stop if ind.stop is not None else self.shape[i]
                    stop = min(stop, self.shape[i])
                    if start > stop:
                        start = stop
                else:
                    # Test explicitly against None: a start of 0 is a valid
                    # starting point for a negative-step slice.
                    start = ind.start if ind.start is not None \
                        else self.shape[i] - 1
                    stop = ind.stop if ind.stop is not None else -1
                    start = min(start, self.shape[i] - 1)
                    stop = max(stop, -1)
                    if start < stop:
                        start = stop

                key_list_temp = key_list[:]
                for v_idx, ki in enumerate(range(start, stop, step)):
                    key_list_temp[i] = ki
                    vi = value if value_missing_dims > 0 else \
                        (value[0] if value.shape[0] == 1 else value[v_idx])
                    self._setitem(key_list_temp, vi)

                return
            elif not isinstance(ind, Integral):
                raise IndexError('All indices must be slices or integers'
                                 ' when setting an item.')

        if value != _zero_of_dtype(self.dtype):
            self.data[tuple(key_list)] = value[()]

    def __str__(self):
        return "<DOK: shape=%s, dtype=%s, nnz=%d>" % \
            (self.shape, self.dtype, self.nnz)

    __repr__ = __str__

    def todense(self):
        """
        Convert this :obj:`DOK` array into a Numpy array.

        Returns
        -------
        numpy.ndarray
            The equivalent dense array.

        See Also
        --------
        COO.todense : Equivalent :obj:`COO` array method.
        scipy.sparse.dok_matrix.todense : Equivalent Scipy method.

        Examples
        --------
        >>> s = DOK((5, 5))
        >>> s[1:3, 1:3] = [[4, 5], [6, 7]]
        >>> s.todense()  # doctest: +SKIP
        array([[0., 0., 0., 0., 0.],
               [0., 4., 5., 0., 0.],
               [0., 6., 7., 0., 0.],
               [0., 0., 0., 0., 0.],
               [0., 0., 0., 0., 0.]])
        """
        result = np.zeros(self.shape, dtype=self.dtype)

        for c, d in six.iteritems(self.data):
            result[c] = d

        return result

sparse-0.2.0/sparse/slicing.py
# Most of this file is taken from https://github.com/dask/dask/blob/master/dask/array/slicing.py
# See license at https://github.com/dask/dask/blob/master/LICENSE.txt

import math
from numbers import Integral, Number

import numpy as np


def normalize_index(idx, shape):
    """ Normalize slicing indexes

    1. Replaces ellipses with many full slices
    2. Adds full slices to end of index
    3. Checks bounding conditions
    4. Replaces numpy arrays with lists
    5. Posify's integers and lists
    6.
Normalizes slices to canonical form Examples -------- >>> normalize_index(1, (10,)) (1,) >>> normalize_index(-1, (10,)) (9,) >>> normalize_index([-1], (10,)) (array([9]),) >>> normalize_index(slice(-3, 10, 1), (10,)) (slice(7, None, None),) >>> normalize_index((Ellipsis, None), (10,)) (slice(None, None, None), None) """ if not isinstance(idx, tuple): idx = (idx,) idx = replace_ellipsis(len(shape), idx) n_sliced_dims = 0 for i in idx: if hasattr(i, 'ndim') and i.ndim >= 1: n_sliced_dims += i.ndim elif i is None: continue else: n_sliced_dims += 1 idx = idx + (slice(None),) * (len(shape) - n_sliced_dims) if len([i for i in idx if i is not None]) > len(shape): raise IndexError("Too many indices for array") none_shape = [] i = 0 for ind in idx: if ind is not None: none_shape.append(shape[i]) i += 1 else: none_shape.append(None) for i, d in zip(idx, none_shape): if d is not None: check_index(i, d) idx = tuple(map(sanitize_index, idx)) idx = tuple(map(normalize_slice, idx, none_shape)) idx = posify_index(none_shape, idx) return idx def replace_ellipsis(n, index): """ Replace ... with slices, :, : ,: >>> replace_ellipsis(4, (3, Ellipsis, 2)) (3, slice(None, None, None), slice(None, None, None), 2) >>> replace_ellipsis(2, (Ellipsis, None)) (slice(None, None, None), slice(None, None, None), None) """ # Careful about using in or index because index may contain arrays isellipsis = [i for i, ind in enumerate(index) if ind is Ellipsis] if not isellipsis: return index elif len(isellipsis) > 1: raise IndexError("an index can only have a single ellipsis ('...')") else: loc = isellipsis[0] extra_dimensions = n - (len(index) - sum(i is None for i in index) - 1) return index[:loc] + (slice(None, None, None),) * extra_dimensions + index[loc + 1:] def check_index(ind, dimension): """ Check validity of index for a given dimension Examples -------- >>> check_index(3, 5) >>> check_index(5, 5) Traceback (most recent call last): ... IndexError: Index is not smaller than dimension 5 >= 5 >>> check_index(6, 5) Traceback (most recent call last): ... IndexError: Index is not smaller than dimension 6 >= 5 >>> check_index(-1, 5) >>> check_index(-6, 5) Traceback (most recent call last): ... IndexError: Negative index is not greater than negative dimension -6 <= -5 >>> check_index([1, 2], 5) >>> check_index([6, 3], 5) Traceback (most recent call last): ... 
    IndexError: Index out of bounds 5

    >>> check_index(slice(0, 3), 5)
    """
    # unknown dimension, assumed to be in bounds
    if np.isnan(dimension):
        return
    elif isinstance(ind, (list, np.ndarray)):
        x = np.asanyarray(ind)
        if np.issubdtype(x.dtype, np.integer) and \
                ((x >= dimension).any() or (x < -dimension).any()):
            raise IndexError("Index out of bounds %s" % dimension)
        elif x.dtype == bool and len(x) != dimension:
            raise IndexError("boolean index did not match indexed array; "
                             "dimension is %s but corresponding boolean "
                             "dimension is %s" % (dimension, len(x)))
    elif isinstance(ind, slice):
        return
    elif ind is None:
        return
    elif ind >= dimension:
        raise IndexError("Index is not smaller than dimension %d >= %d" %
                         (ind, dimension))
    elif ind < -dimension:
        msg = "Negative index is not greater than negative dimension %d <= -%d"
        raise IndexError(msg % (ind, dimension))


def sanitize_index(ind):
    """ Sanitize the elements for indexing along one axis

    >>> sanitize_index([2, 3, 5])
    array([2, 3, 5])
    >>> sanitize_index([True, False, True, False])
    array([0, 2])
    >>> sanitize_index(np.array([1, 2, 3]))
    array([1, 2, 3])
    >>> sanitize_index(np.array([False, True, True]))
    array([1, 2])
    >>> type(sanitize_index(np.int32(0)))  # doctest: +SKIP
    <type 'int'>
    >>> sanitize_index(1.0)
    1
    >>> sanitize_index(0.5)
    Traceback (most recent call last):
    ...
    IndexError: Bad index. Must be integer-like: 0.5
    """
    if ind is None:
        return None
    elif isinstance(ind, slice):
        return slice(_sanitize_index_element(ind.start),
                     _sanitize_index_element(ind.stop),
                     _sanitize_index_element(ind.step))
    elif isinstance(ind, Number):
        return _sanitize_index_element(ind)
    index_array = np.asanyarray(ind)
    if index_array.dtype == bool:
        nonzero = np.nonzero(index_array)
        if len(nonzero) == 1:
            # If a 1-element tuple, unwrap the element
            nonzero = nonzero[0]
        return np.asanyarray(nonzero)
    elif np.issubdtype(index_array.dtype, np.integer):
        return index_array
    elif np.issubdtype(index_array.dtype, float):
        int_index = index_array.astype(np.intp)
        if np.allclose(index_array, int_index):
            return int_index
        else:
            check_int = np.isclose(index_array, int_index)
            first_err = index_array.ravel()[np.flatnonzero(~check_int)[0]]
            raise IndexError("Bad index. Must be integer-like: %s" % first_err)
    else:
        raise TypeError("Invalid index type", type(ind), ind)


def _sanitize_index_element(ind):
    """Sanitize a one-element index."""
    if isinstance(ind, Number):
        ind2 = int(ind)
        if ind2 != ind:
            raise IndexError("Bad index. 
Must be integer-like: %s" % ind) else: return ind2 elif ind is None: return None else: raise TypeError("Invalid index type", type(ind), ind) def normalize_slice(idx, dim): """ Normalize slices to canonical form Parameters ---------- idx: slice or other index dim: dimension length Examples -------- >>> normalize_slice(slice(0, 10, 1), 10) slice(None, None, None) """ if isinstance(idx, slice): start, stop, step = idx.start, idx.stop, idx.step if start is not None: if start < 0 and not math.isnan(dim): start = max(0, start + dim) elif start > dim: start = dim if stop is not None: if stop < 0 and not math.isnan(dim): stop = max(0, stop + dim) elif stop > dim: stop = dim step = 1 if step is None else step if step > 0: if start == 0: start = None if stop == dim: stop = None else: if start == dim - 1: start = None if stop == -1: stop = None if step == 1: step = None return slice(start, stop, step) return idx def posify_index(shape, ind): """ Flip negative indices around to positive ones >>> posify_index(10, 3) 3 >>> posify_index(10, -3) 7 >>> posify_index(10, [3, -3]) array([3, 7]) >>> posify_index((10, 20), (3, -3)) (3, 17) >>> posify_index((10, 20), (3, [3, 4, -3])) # doctest: +NORMALIZE_WHITESPACE (3, array([ 3, 4, 17])) """ if isinstance(ind, tuple): return tuple(map(posify_index, shape, ind)) if isinstance(ind, Integral): if ind < 0 and not math.isnan(shape): return ind + shape else: return ind if isinstance(ind, (np.ndarray, list)) and not math.isnan(shape): ind = np.asanyarray(ind) return np.where(ind < 0, ind + shape, ind) return ind sparse-0.2.0/sparse/tests/000077500000000000000000000000001323236075200154505ustar00rootroot00000000000000sparse-0.2.0/sparse/tests/test_coo.py000066400000000000000000000715051323236075200176510ustar00rootroot00000000000000import pytest from packaging import version import operator import numpy as np import scipy.sparse import scipy.stats from sparse import COO import sparse from sparse.utils import assert_eq, is_lexsorted @pytest.mark.parametrize('reduction,kwargs,eqkwargs', [ ('max', {}, {}), ('sum', {}, {}), ('sum', {'dtype': np.float16}, {'atol': 1e-2}), ('prod', {}, {}), ('min', {}, {}), ]) @pytest.mark.parametrize('axis', [None, 0, 1, 2, (0, 2)]) @pytest.mark.parametrize('keepdims', [True, False]) def test_reductions(reduction, axis, keepdims, kwargs, eqkwargs): x = sparse.random((2, 3, 4), density=.25) y = x.todense() xx = getattr(x, reduction)(axis=axis, keepdims=keepdims, **kwargs) yy = getattr(y, reduction)(axis=axis, keepdims=keepdims, **kwargs) assert_eq(xx, yy, **eqkwargs) @pytest.mark.parametrize('reduction,kwargs,eqkwargs', [ (np.max, {}, {}), (np.sum, {}, {}), (np.sum, {'dtype': np.float16}, {'atol': 1e-2}), (np.prod, {}, {}), (np.min, {}, {}), ]) @pytest.mark.parametrize('axis', [None, 0, 1, 2, (0, 2)]) @pytest.mark.parametrize('keepdims', [True, False]) def test_ufunc_reductions(reduction, axis, keepdims, kwargs, eqkwargs): x = sparse.random((2, 3, 4), density=.5) y = x.todense() xx = reduction(x, axis=axis, keepdims=keepdims, **kwargs) yy = reduction(y, axis=axis, keepdims=keepdims, **kwargs) assert_eq(xx, yy, **eqkwargs) @pytest.mark.parametrize('axis', [ None, (1, 2, 0), (2, 1, 0), (0, 1, 2), (0, 1, -1), (0, -2, -1), (-3, -2, -1), ]) def test_transpose(axis): x = sparse.random((2, 3, 4), density=.25) y = x.todense() xx = x.transpose(axis) yy = y.transpose(axis) assert_eq(xx, yy) @pytest.mark.parametrize('axis', [ (0, 1), # too few (0, 1, 2, 3), # too many (3, 1, 0), # axis 3 illegal (0, -1, -4), # axis -4 illegal (0, 0, 1), # duplicate 
axis 0 (0, -1, 2), # duplicate axis -1 == 2 ]) def test_transpose_error(axis): x = sparse.random((2, 3, 4), density=.25) y = x.todense() with pytest.raises(ValueError): x.transpose(axis) with pytest.raises(ValueError): y.transpose(axis) @pytest.mark.parametrize('a,b', [ [(3, 4), (3, 4)], [(12,), (3, 4)], [(12,), (3, -1)], [(3, 4), (12,)], [(3, 4), (-1, 4)], [(3, 4), (3, -1)], [(2, 3, 4, 5), (8, 15)], [(2, 3, 4, 5), (24, 5)], [(2, 3, 4, 5), (20, 6)], [(), ()], ]) def test_reshape(a, b): s = sparse.random(a, density=0.5) x = s.todense() assert_eq(x.reshape(b), s.reshape(b)) def test_large_reshape(): n = 100 m = 10 row = np.arange(n, dtype=np.uint16) # np.random.randint(0, n, size=n, dtype=np.uint16) col = row % m # np.random.randint(0, m, size=n, dtype=np.uint16) data = np.ones(n, dtype=np.uint8) x = COO((data, (row, col)), sorted=True, has_duplicates=False) assert_eq(x, x.reshape(x.shape)) def test_reshape_same(): s = sparse.random((3, 5), density=0.5) assert s.reshape(s.shape) is s def test_to_scipy_sparse(): s = sparse.random((3, 5), density=0.5) a = s.to_scipy_sparse() b = scipy.sparse.coo_matrix(s.todense()) assert_eq(a, b) @pytest.mark.parametrize('a_shape,b_shape,axes', [ [(3, 4), (4, 3), (1, 0)], [(3, 4), (4, 3), (0, 1)], [(3, 4, 5), (4, 3), (1, 0)], [(3, 4), (5, 4, 3), (1, 1)], [(3, 4), (5, 4, 3), ((0, 1), (2, 1))], [(3, 4), (5, 4, 3), ((1, 0), (1, 2))], [(3, 4, 5), (4,), (1, 0)], [(4,), (3, 4, 5), (0, 1)], [(4,), (4,), (0, 0)], [(4,), (4,), 0], ]) def test_tensordot(a_shape, b_shape, axes): sa = sparse.random(a_shape, density=0.5) sb = sparse.random(b_shape, density=0.5) a = sa.todense() b = sb.todense() assert_eq(np.tensordot(a, b, axes), sparse.tensordot(sa, sb, axes)) assert_eq(np.tensordot(a, b, axes), sparse.tensordot(sa, b, axes)) # assert isinstance(sparse.tensordot(sa, b, axes), COO) assert_eq(np.tensordot(a, b, axes), sparse.tensordot(a, sb, axes)) # assert isinstance(sparse.tensordot(a, sb, axes), COO) def test_dot(): import operator sa = sparse.random((3, 4, 5), density=0.5) sb = sparse.random((5, 6), density=0.5) a = sa.todense() b = sb.todense() assert_eq(a.dot(b), sa.dot(sb)) assert_eq(np.dot(a, b), sparse.dot(sa, sb)) if hasattr(operator, 'matmul'): # Basic equivalences assert_eq(eval("a @ b"), eval("sa @ sb")) assert_eq(eval("sa @ sb"), sparse.dot(sa, sb)) # Test that SOO's and np.array's combine correctly # Not possible due to https://github.com/numpy/numpy/issues/9028 # assert_eq(eval("a @ sb"), eval("sa @ b")) @pytest.mark.xfail def test_dot_nocoercion(): sa = sparse.random((3, 4, 5), density=0.5) sb = sparse.random((5, 6), density=0.5) a = sa.todense() b = sb.todense() la = a.tolist() lb = b.tolist() la, lb # silencing flake8 if hasattr(operator, 'matmul'): # Operations with naive collection (list) assert_eq(eval("la @ b"), eval("la @ sb")) assert_eq(eval("a @ lb"), eval("sa @ lb")) @pytest.mark.parametrize('func', [np.expm1, np.log1p, np.sin, np.tan, np.sinh, np.tanh, np.floor, np.ceil, np.sqrt, np.conj, np.round, np.rint, lambda x: x.astype('int32'), np.conjugate, np.conj, lambda x: x.round(decimals=2), abs]) def test_elemwise(func): s = sparse.random((2, 3, 4), density=0.5) x = s.todense() fs = func(s) assert isinstance(fs, COO) assert_eq(func(x), fs) @pytest.mark.parametrize('func', [ operator.mul, operator.add, operator.sub, operator.gt, operator.lt, operator.ne ]) @pytest.mark.parametrize('shape', [(2,), (2, 3), (2, 3, 4), (2, 3, 4, 5)]) def test_elemwise_binary(func, shape): xs = sparse.random(shape, density=0.5) ys = sparse.random(shape, density=0.5) x = 
xs.todense() y = ys.todense() assert_eq(func(xs, ys), func(x, y)) @pytest.mark.parametrize('func', [ operator.pow, operator.truediv, operator.floordiv, operator.ge, operator.le, operator.eq, operator.mod ]) @pytest.mark.filterwarnings('ignore:divide by zero') @pytest.mark.filterwarnings('ignore:invalid value') def test_auto_densification_fails(func): xs = sparse.random((2, 3, 4), density=0.5) ys = sparse.random((2, 3, 4), density=0.5) with pytest.raises(ValueError): func(xs, ys) @pytest.mark.parametrize('func', [ operator.mul, operator.add, operator.sub, operator.gt, operator.lt, operator.ne ]) def test_op_scipy_sparse(func): xs = sparse.random((3, 4), density=0.5) y = sparse.random((3, 4), density=0.5).todense() ys = scipy.sparse.csr_matrix(y) x = xs.todense() assert_eq(func(x, y), func(xs, ys)) @pytest.mark.parametrize('func, scalar', [ (operator.mul, 5), (operator.add, 0), (operator.sub, 0), (operator.pow, 5), (operator.truediv, 3), (operator.floordiv, 4), (operator.gt, 5), (operator.lt, -5), (operator.ne, 0), (operator.ge, 5), (operator.le, -3), (operator.eq, 1), (operator.mod, 5) ]) @pytest.mark.parametrize('convert_to_np_number', [True, False]) def test_elemwise_scalar(func, scalar, convert_to_np_number): xs = sparse.random((2, 3, 4), density=0.5) if convert_to_np_number: scalar = np.float32(scalar) y = scalar x = xs.todense() fs = func(xs, y) assert isinstance(fs, COO) assert xs.nnz >= fs.nnz assert_eq(fs, func(x, y)) @pytest.mark.parametrize('func, scalar', [ (operator.mul, 5), (operator.add, 0), (operator.sub, 0), (operator.gt, -5), (operator.lt, 5), (operator.ne, 0), (operator.ge, -5), (operator.le, 3), (operator.eq, 1), ]) @pytest.mark.parametrize('convert_to_np_number', [True, False]) def test_leftside_elemwise_scalar(func, scalar, convert_to_np_number): xs = sparse.random((2, 3, 4), density=0.5) if convert_to_np_number: scalar = np.float32(scalar) y = scalar x = xs.todense() fs = func(y, xs) assert isinstance(fs, COO) assert xs.nnz >= fs.nnz assert_eq(fs, func(y, x)) @pytest.mark.parametrize('func, scalar', [ (operator.add, 5), (operator.sub, -5), (operator.pow, -3), (operator.truediv, 0), (operator.floordiv, 0), (operator.gt, -5), (operator.lt, 5), (operator.ne, 1), (operator.ge, -3), (operator.le, 3), (operator.eq, 0) ]) @pytest.mark.filterwarnings('ignore:divide by zero') @pytest.mark.filterwarnings('ignore:invalid value') def test_scalar_densification_fails(func, scalar): xs = sparse.random((2, 3, 4), density=0.5) y = scalar with pytest.raises(ValueError): func(xs, y) @pytest.mark.parametrize('func', [ operator.and_, operator.or_, operator.xor ]) @pytest.mark.parametrize('shape', [ (2,), (2, 3), (2, 3, 4), (2, 3, 4, 5) ]) def test_bitwise_binary(func, shape): # Small arrays need high density to have nnz entries # Casting floats to int will result in all zeros, hence the * 100 xs = (sparse.random(shape, density=0.5) * 100).astype(np.int_) ys = (sparse.random(shape, density=0.5) * 100).astype(np.int_) x = xs.todense() y = ys.todense() assert_eq(func(xs, ys), func(x, y)) @pytest.mark.parametrize('func', [ operator.lshift, operator.rshift ]) @pytest.mark.parametrize('shape', [ (2,), (2, 3), (2, 3, 4), (2, 3, 4, 5) ]) def test_bitshift_binary(func, shape): # Small arrays need high density to have nnz entries # Casting floats to int will result in all zeros, hence the * 100 xs = (sparse.random(shape, density=0.5) * 100).astype(np.int_) # Can't merge into test_bitwise_binary because left/right shifting # with something >= 64 isn't defined. 
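    # (Illustrative note, not from the original source: e.g.
    # np.int64(1) << 64 falls into C's undefined behaviour for shift
    # counts >= the integer width, so the result is platform-dependent;
    # capping the right operand below 64 keeps this test deterministic.)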
ys = (sparse.random(shape, density=0.5) * 64).astype(np.int_) x = xs.todense() y = ys.todense() assert_eq(func(xs, ys), func(x, y)) @pytest.mark.parametrize('func', [ operator.and_ ]) @pytest.mark.parametrize('shape', [ (2,), (2, 3), (2, 3, 4), (2, 3, 4, 5) ]) def test_bitwise_scalar(func, shape): # Small arrays need high density to have nnz entries # Casting floats to int will result in all zeros, hence the * 100 xs = (sparse.random(shape, density=0.5) * 100).astype(np.int_) # Can't merge into test_bitwise_binary because left/right shifting # with something >= 64 isn't defined. y = np.random.randint(100) x = xs.todense() assert_eq(func(xs, y), func(x, y)) assert_eq(func(y, xs), func(y, x)) @pytest.mark.parametrize('func', [ operator.lshift, operator.rshift ]) @pytest.mark.parametrize('shape', [ (2,), (2, 3), (2, 3, 4), (2, 3, 4, 5) ]) def test_bitshift_scalar(func, shape): # Small arrays need high density to have nnz entries # Casting floats to int will result in all zeros, hence the * 100 xs = (sparse.random(shape, density=0.5) * 100).astype(np.int_) # Can't merge into test_bitwise_binary because left/right shifting # with something >= 64 isn't defined. y = np.random.randint(64) x = xs.todense() assert_eq(func(xs, y), func(x, y)) @pytest.mark.parametrize('func', [operator.invert]) @pytest.mark.parametrize('shape', [(2,), (2, 3), (2, 3, 4), (2, 3, 4, 5)]) def test_unary_bitwise_densification_fails(func, shape): # Small arrays need high density to have nnz entries # Casting floats to int will result in all zeros, hence the * 100 xs = (sparse.random(shape, density=0.5) * 100).astype(np.int_) with pytest.raises(ValueError): func(xs) @pytest.mark.parametrize('func', [operator.or_, operator.xor]) @pytest.mark.parametrize('shape', [(2,), (2, 3), (2, 3, 4), (2, 3, 4, 5)]) def test_binary_bitwise_densification_fails(func, shape): # Small arrays need high density to have nnz entries # Casting floats to int will result in all zeros, hence the * 100 xs = (sparse.random(shape, density=0.5) * 100).astype(np.int_) y = np.random.randint(1, 100) with pytest.raises(ValueError): func(xs, y) with pytest.raises(ValueError): func(y, xs) @pytest.mark.parametrize('func', [operator.lshift, operator.rshift]) @pytest.mark.parametrize('shape', [(2,), (2, 3), (2, 3, 4), (2, 3, 4, 5)]) def test_binary_bitshift_densification_fails(func, shape): # Small arrays need high density to have nnz entries # Casting floats to int will result in all zeros, hence the * 100 x = np.random.randint(1, 100) ys = (sparse.random(shape, density=0.5) * 64).astype(np.int_) with pytest.raises(ValueError): func(x, ys) @pytest.mark.parametrize('func', [operator.and_, operator.or_, operator.xor]) @pytest.mark.parametrize('shape', [(2,), (2, 3), (2, 3, 4), (2, 3, 4, 5)]) def test_bitwise_binary_bool(func, shape): # Small arrays need high density to have nnz entries xs = sparse.random(shape, density=0.5).astype(bool) ys = sparse.random(shape, density=0.5).astype(bool) x = xs.todense() y = ys.todense() assert_eq(func(xs, ys), func(x, y)) @pytest.mark.parametrize('func', [operator.mul]) @pytest.mark.parametrize('shape', [(2,), (2, 3), (2, 3, 4), (2, 3, 4, 5)]) def test_numpy_mixed_binary(func, shape): xs = sparse.random(shape, density=0.5) y = np.random.rand(*shape) x = xs.todense() fs1 = func(xs, y) assert isinstance(fs1, COO) assert fs1.nnz <= xs.nnz assert_eq(fs1, func(x, y)) fs2 = func(y, xs) assert isinstance(fs2, COO) assert fs2.nnz <= xs.nnz assert_eq(fs2, func(y, x)) @pytest.mark.parametrize('func', [operator.and_]) 
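# (Illustrative note: `and_` mixes safely with a dense operand because
# `0 & y == 0` for every y, so the sparse pattern can only shrink. The
# `or_`/`xor` cases would densify and are covered by
# test_binary_bitwise_densification_fails above.)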
@pytest.mark.parametrize('shape', [(2,), (2, 3), (2, 3, 4), (2, 3, 4, 5)]) def test_numpy_mixed_binary_bitwise(func, shape): xs = (sparse.random(shape, density=0.5) * 100).astype(np.int_) y = np.random.randint(100, size=shape) x = xs.todense() fs1 = func(xs, y) assert isinstance(fs1, COO) assert fs1.nnz <= xs.nnz assert_eq(fs1, func(x, y)) fs2 = func(y, xs) assert isinstance(fs2, COO) assert fs2.nnz <= xs.nnz assert_eq(fs2, func(y, x)) def test_elemwise_binary_empty(): x = COO({}, shape=(10, 10)) y = sparse.random((10, 10), density=0.5) for z in [x * y, y * x]: assert z.nnz == 0 assert z.coords.shape == (2, 0) assert z.data.shape == (0,) def test_gt(): s = sparse.random((2, 3, 4), density=0.5) x = s.todense() m = x.mean() assert_eq(x > m, s > m) m = s.data[2] assert_eq(x > m, s > m) assert_eq(x >= m, s >= m) @pytest.mark.parametrize('index', [ 0, 1, -1, (slice(0, 2),), (slice(None, 2), slice(None, 2)), (slice(1, None), slice(1, None)), (slice(None, None),), (slice(None, 2, -1), slice(None, 2, -1)), (slice(1, None, 2), slice(1, None, 2)), (slice(None, None, 2),), (slice(None, 2, -1), slice(None, 2, -2)), (slice(1, None, 2), slice(1, None, 1)), (slice(None, None, -2),), (0, slice(0, 2),), (slice(0, 1), 0), ([1, 0], 0), (1, [0, 2]), (0, [1, 0], 0), (1, [2, 0], 0), (None, slice(1, 3), 0), (Ellipsis, slice(1, 3)), (1, Ellipsis, slice(1, 3)), (slice(0, 1), Ellipsis), (Ellipsis, None), (None, Ellipsis), (1, Ellipsis), (1, Ellipsis, None), (1, 1, 1), (1, 1, 1, Ellipsis), (Ellipsis, 1, None), (slice(0, 3), None, 0), (slice(1, 2), slice(2, 4)), (slice(1, 2), slice(None, None)), (slice(1, 2), slice(None, None), 2), (slice(1, 2, 2), slice(None, None), 2), (slice(1, 2, None), slice(None, None, 2), 2), (slice(1, 2, -2), slice(None, None), -2), (slice(1, 2, None), slice(None, None, -2), 2), (slice(1, 2, -1), slice(None, None), -1), (slice(1, 2, None), slice(None, None, -1), 2), (slice(2, 0, -1), slice(None, None), -1), (slice(-2, None, None),), (slice(-1, None, None), slice(-2, None, None)), ([True, False], slice(1, None), slice(-2, None)), (slice(1, None), slice(-2, None), [True, False, True, False]), ]) def test_slicing(index): s = sparse.random((2, 3, 4), density=0.5) x = s.todense() assert_eq(x[index], s[index]) def test_custom_dtype_slicing(): dt = np.dtype([('part1', np.float_), ('part2', np.int_, (2,)), ('part3', np.int_, (2, 2))]) x = np.zeros((2, 3, 4), dtype=dt) x[1, 1, 1] = (0.64, [4, 2], [[1, 2], [3, 0]]) s = COO.from_numpy(x) assert x[1, 1, 1] == s[1, 1, 1] assert x[0, 1, 2] == s[0, 1, 2] assert_eq(x['part1'], s['part1']) assert_eq(x['part2'], s['part2']) assert_eq(x['part3'], s['part3']) @pytest.mark.parametrize('index', [ (Ellipsis, Ellipsis), (1, 1, 1, 1), (slice(None),) * 4, 5, -5, 'foo', pytest.param( [True, False, False], marks=pytest.mark.skipif( version.parse(np.version.version) < version.parse("1.13.0"), reason="NumPy < 1.13.0 does not raise these Exceptions" ) ), ]) def test_slicing_errors(index): s = sparse.random((2, 3, 4), density=0.5) x = s.todense() try: x[index] except Exception as e: e1 = e else: raise Exception("exception not raised") try: s[index] except Exception as e: assert type(e) == type(e1) else: raise Exception("exception not raised") def test_canonical(): coords = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 3], [0, 1, 0], [1, 0, 3]]).T data = np.arange(5) old = COO(coords, data, shape=(2, 2, 5)) x = COO(coords, data, shape=(2, 2, 5)) x.sum_duplicates() assert_eq(old, x) # assert x.nnz == 5 # assert x.has_duplicates assert x.nnz == 3 assert not x.has_duplicates def 
test_concatenate(): xx = sparse.random((2, 3, 4), density=0.5) x = xx.todense() yy = sparse.random((5, 3, 4), density=0.5) y = yy.todense() zz = sparse.random((4, 3, 4), density=0.5) z = zz.todense() assert_eq(np.concatenate([x, y, z], axis=0), sparse.concatenate([xx, yy, zz], axis=0)) xx = sparse.random((5, 3, 1), density=0.5) x = xx.todense() yy = sparse.random((5, 3, 3), density=0.5) y = yy.todense() zz = sparse.random((5, 3, 2), density=0.5) z = zz.todense() assert_eq(np.concatenate([x, y, z], axis=2), sparse.concatenate([xx, yy, zz], axis=2)) assert_eq(np.concatenate([x, y, z], axis=-1), sparse.concatenate([xx, yy, zz], axis=-1)) @pytest.mark.parametrize('axis', [0, 1]) @pytest.mark.parametrize('func', ['stack', 'concatenate']) def test_concatenate_mixed(func, axis): s = sparse.random((10, 10), density=0.5) d = s.todense() result = getattr(sparse, func)([d, s, s], axis=axis) expected = getattr(np, func)([d, d, d], axis=axis) assert isinstance(result, COO) assert_eq(result, expected) @pytest.mark.parametrize('shape', [(5,), (2, 3, 4), (5, 2)]) @pytest.mark.parametrize('axis', [0, 1, -1]) def test_stack(shape, axis): xx = sparse.random(shape, density=0.5) x = xx.todense() yy = sparse.random(shape, density=0.5) y = yy.todense() zz = sparse.random(shape, density=0.5) z = zz.todense() assert_eq(np.stack([x, y, z], axis=axis), sparse.stack([xx, yy, zz], axis=axis)) def test_large_concat_stack(): data = np.array([1], dtype=np.uint8) coords = np.array([[255]], dtype=np.uint8) xs = COO(coords, data, shape=(256,), has_duplicates=False, sorted=True) x = xs.todense() assert_eq(np.stack([x, x]), sparse.stack([xs, xs])) assert_eq(np.concatenate((x, x)), sparse.concatenate((xs, xs))) def test_coord_dtype(): s = sparse.random((2, 3, 4), density=0.5) assert s.coords.dtype == np.uint8 s = COO.from_numpy(np.zeros(1000)) assert s.coords.dtype == np.uint16 def test_addition(): a = sparse.random((2, 3, 4), density=0.5) x = a.todense() b = sparse.random((2, 3, 4), density=0.5) y = b.todense() assert_eq(x + y, a + b) assert_eq(x - y, a - b) def test_addition_not_ok_when_large_and_sparse(): x = COO({(0, 0): 1}, shape=(1000000, 1000000)) with pytest.raises(ValueError): x + 1 with pytest.raises(ValueError): 1 + x with pytest.raises(ValueError): 1 - x with pytest.raises(ValueError): x - 1 with pytest.raises(ValueError): np.exp(x) @pytest.mark.parametrize('func', [operator.add, operator.mul]) @pytest.mark.parametrize('shape1,shape2', [((2, 3, 4), (3, 4)), ((3, 4), (2, 3, 4)), ((3, 1, 4), (3, 2, 4)), ((1, 3, 4), (3, 4)), ((3, 4, 1), (3, 4, 2)), ((1, 5), (5, 1))]) def test_broadcasting(func, shape1, shape2): xs = sparse.random(shape1, density=0.5) x = xs.todense() ys = sparse.random(shape2, density=0.5) y = ys.todense() expected = func(x, y) actual = func(xs, ys) assert_eq(expected, actual) assert np.count_nonzero(expected) == actual.nnz @pytest.mark.parametrize('func', [operator.mul]) @pytest.mark.parametrize('shape1,shape2', [((2, 3, 4), (3, 4)), ((3, 4), (2, 3, 4)), ((3, 1, 4), (3, 2, 4)), ((1, 3, 4), (3, 4)), ((3, 4, 1), (3, 4, 2)), ((1, 5), (5, 1))]) def test_numpy_mixed_broadcasting(func, shape1, shape2): xs = sparse.random(shape1, density=0.5) x = xs.todense() y = np.random.rand(*shape2) expected = func(x, y) actual = func(xs, y) assert isinstance(actual, COO) assert_eq(expected, actual) assert np.count_nonzero(expected) == actual.nnz @pytest.mark.parametrize('shape1,shape2', [((3, 4), (2, 3, 4)), ((3, 1, 4), (3, 2, 4)), ((3, 4, 1), (3, 4, 2))]) def test_broadcast_to(shape1, shape2): a = 
sparse.random(shape1, density=0.5) x = a.todense() assert_eq(np.broadcast_to(x, shape2), a.broadcast_to(shape2)) @pytest.mark.parametrize('scalar', [2, 2.5, np.float32(2.0), np.int8(3)]) def test_scalar_multiplication(scalar): a = sparse.random((2, 3, 4), density=0.5) x = a.todense() assert_eq(x * scalar, a * scalar) assert (a * scalar).nnz == a.nnz assert_eq(scalar * x, scalar * a) assert (scalar * a).nnz == a.nnz assert_eq(x / scalar, a / scalar) assert (a / scalar).nnz == a.nnz assert_eq(x // scalar, a // scalar) # division may reduce nnz. @pytest.mark.filterwarnings('ignore:divide by zero') def test_scalar_exponentiation(): a = sparse.random((2, 3, 4), density=0.5) x = a.todense() assert_eq(x ** 2, a ** 2) assert_eq(x ** 0.5, a ** 0.5) with pytest.raises((ValueError, ZeroDivisionError)): assert_eq(x ** -1, a ** -1) def test_create_with_lists_of_tuples(): L = [((0, 0, 0), 1), ((1, 2, 1), 1), ((1, 1, 1), 2), ((1, 3, 2), 3)] s = COO(L) x = np.zeros((2, 4, 3), dtype=np.asarray([1, 2, 3]).dtype) for ind, value in L: x[ind] = value assert_eq(s, x) def test_sizeof(): import sys x = np.eye(100) y = COO.from_numpy(x) nb = sys.getsizeof(y) assert 400 < nb < x.nbytes / 10 def test_scipy_sparse_interface(): n = 100 m = 10 row = np.random.randint(0, n, size=n, dtype=np.uint16) col = np.random.randint(0, m, size=n, dtype=np.uint16) data = np.ones(n, dtype=np.uint8) inp = (data, (row, col)) x = scipy.sparse.coo_matrix(inp) xx = sparse.COO(inp) assert_eq(x, xx) assert_eq(x.T, xx.T) assert_eq(xx.to_scipy_sparse(), x) assert_eq(COO.from_scipy_sparse(xx.to_scipy_sparse()), xx) assert_eq(x, xx) assert_eq(x.T.dot(x), xx.T.dot(xx)) assert isinstance(x + xx, COO) assert isinstance(xx + x, COO) @pytest.mark.parametrize('scipy_format', ['coo', 'csr', 'dok', 'csc']) def test_scipy_sparse_interaction(scipy_format): x = sparse.random((10, 20), density=0.2).todense() sp = getattr(scipy.sparse, scipy_format + '_matrix')(x) coo = COO(x) assert isinstance(sp + coo, COO) assert isinstance(coo + sp, COO) assert_eq(sp, coo) def test_cache_csr(): x = sparse.random((10, 5), density=0.5).todense() s = COO(x, cache=True) assert isinstance(s.tocsr(), scipy.sparse.csr_matrix) assert isinstance(s.tocsc(), scipy.sparse.csc_matrix) assert s.tocsr() is s.tocsr() assert s.tocsc() is s.tocsc() def test_empty_shape(): x = COO(np.empty((0, 1), dtype=np.int8), [1.0]) assert x.shape == () assert ((2 * x).todense() == np.array(2.0)).all() def test_single_dimension(): x = COO([1, 3], [1.0, 3.0]) assert x.shape == (4,) assert_eq(x, np.array([0, 1.0, 0, 3.0])) def test_raise_dense(): x = COO({(10000, 10000): 1.0}) with pytest.raises((ValueError, NotImplementedError)) as exc_info: np.exp(x) assert 'dense' in str(exc_info.value).lower() with pytest.raises((ValueError, NotImplementedError)): x + 1 def test_large_sum(): n = 500000 x = np.random.randint(0, 10000, size=(n,)) y = np.random.randint(0, 1000, size=(n,)) z = np.random.randint(0, 3, size=(n,)) data = np.random.random(n) a = COO((x, y, z), data) assert a.shape == (10000, 1000, 3) b = a.sum(axis=2) assert b.nnz > 100000 def test_add_many_sparse_arrays(): x = COO({(1, 1): 1}) y = sum([x] * 100) assert y.nnz < np.prod(y.shape) def test_caching(): x = COO({(10, 10, 10): 1}) assert x[:].reshape((100, 10)).transpose().tocsr() is not x[:].reshape((100, 10)).transpose().tocsr() x = COO({(10, 10, 10): 1}, cache=True) assert x[:].reshape((100, 10)).transpose().tocsr() is x[:].reshape((100, 10)).transpose().tocsr() x = COO({(1, 1, 1, 1, 1, 1, 1, 2): 1}, cache=True) for i in range(x.ndim): 
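        # Each of the x.ndim == 8 distinct target shapes below lands in
        # x._cache['reshape']; the assertion that follows checks that the
        # cache stays bounded, i.e. older entries are evicted rather than
        # accumulated.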
x.reshape((1,) * i + (2,) + (1,) * (x.ndim - i - 1)) assert len(x._cache['reshape']) < 5 def test_scalar_slicing(): x = np.array([0, 1]) s = COO(x) assert np.isscalar(s[0]) assert_eq(x[0], s[0]) assert isinstance(s[0, ...], COO) assert s[0, ...].shape == () assert_eq(x[0, ...], s[0, ...]) assert np.isscalar(s[1]) assert_eq(x[1], s[1]) assert isinstance(s[1, ...], COO) assert s[1, ...].shape == () assert_eq(x[1, ...], s[1, ...]) @pytest.mark.parametrize('shape, k', [ ((3, 4), 0), ((3, 4, 5), 1), ((4, 2), -1), ((2, 4), -2), ((4, 4), 1000), ]) def test_triul(shape, k): s = sparse.random(shape, density=0.5) x = s.todense() assert_eq(np.triu(x, k), sparse.triu(s, k)) assert_eq(np.tril(x, k), sparse.tril(s, k)) def test_empty_reduction(): x = np.zeros((2, 3, 4), dtype=np.float_) xs = COO.from_numpy(x) assert_eq(x.sum(axis=(0, 2)), xs.sum(axis=(0, 2))) @pytest.mark.parametrize('shape', [ (2,), (2, 3), (2, 3, 4), ]) @pytest.mark.parametrize('density', [ 0.1, 0.3, 0.5, 0.7 ]) def test_random_shape(shape, density): s = sparse.random(shape, density) assert isinstance(s, COO) assert s.shape == shape expected_nnz = density * np.prod(shape) assert np.floor(expected_nnz) <= s.nnz <= np.ceil(expected_nnz) def test_two_random_unequal(): s1 = sparse.random((2, 3, 4), 0.3) s2 = sparse.random((2, 3, 4), 0.3) assert not np.allclose(s1.todense(), s2.todense()) def test_two_random_same_seed(): state = np.random.randint(100) s1 = sparse.random((2, 3, 4), 0.3, random_state=state) s2 = sparse.random((2, 3, 4), 0.3, random_state=state) assert_eq(s1, s2) def test_random_sorted(): s = sparse.random((2, 3, 4), canonical_order=True) assert is_lexsorted(s) @pytest.mark.parametrize('rvs, dtype', [ (None, np.float64), (scipy.stats.poisson(25, loc=10).rvs, np.int), (lambda x: np.random.choice([True, False], size=x), np.bool), ]) @pytest.mark.parametrize('shape', [ (2, 4, 5), (20, 40, 50), ]) @pytest.mark.parametrize('density', [ 0.0, 0.01, 0.1, 0.2, ]) def test_random_rvs(rvs, dtype, shape, density): x = sparse.random(shape, density, data_rvs=rvs) assert x.shape == shape assert x.dtype == dtype def test_scalar_shape_construction(): x = np.random.rand(5) coords = np.arange(5)[None] s = COO(coords, x, shape=5) assert_eq(x, s) def test_len(): s = sparse.random((20, 30, 40)) assert len(s) == 20 def test_density(): s = sparse.random((20, 30, 40), density=0.1) assert np.isclose(s.density, 0.1) def test_size(): s = sparse.random((20, 30, 40)) assert s.size == 20 * 30 * 40 def test_np_array(): s = sparse.random((20, 30, 40)) x = np.array(s) assert isinstance(x, np.ndarray) assert_eq(x, s) sparse-0.2.0/sparse/tests/test_dok.py000066400000000000000000000060271323236075200176430ustar00rootroot00000000000000import pytest import numpy as np import six import sparse from sparse import DOK from sparse.utils import assert_eq @pytest.mark.parametrize('shape', [ (2,), (2, 3), (2, 3, 4), ]) @pytest.mark.parametrize('density', [ 0.1, 0.3, 0.5, 0.7 ]) def test_random_shape_nnz(shape, density): s = sparse.random(shape, density, format='dok') assert isinstance(s, DOK) assert s.shape == shape expected_nnz = density * np.prod(shape) assert np.floor(expected_nnz) <= s.nnz <= np.ceil(expected_nnz) def test_convert_to_coo(): s1 = sparse.random((2, 3, 4), 0.5, format='dok') s2 = sparse.COO(s1) assert_eq(s1, s2) def test_convert_from_coo(): s1 = sparse.random((2, 3, 4), 0.5, format='coo') s2 = DOK(s1) assert_eq(s1, s2) def test_convert_from_numpy(): x = np.random.rand(2, 3, 4) s = DOK(x) assert_eq(x, s) def test_convert_to_numpy(): s = sparse.random((2, 
3, 4), 0.5, format='dok') x = s.todense() assert_eq(x, s) @pytest.mark.parametrize('shape, data', [ (2, { 0: 1 }), ((2, 3), { (0, 1): 3, (1, 2): 4, }), ((2, 3, 4), { (0, 1): 3, (1, 2, 3): 4, (1, 1): [6, 5, 4, 1] }), ]) def test_construct(shape, data): s = DOK(shape, data) x = np.zeros(shape, dtype=s.dtype) for c, d in six.iteritems(data): x[c] = d assert_eq(x, s) @pytest.mark.parametrize('shape', [ (2,), (2, 3), (2, 3, 4), ]) @pytest.mark.parametrize('density', [ 0.1, 0.3, 0.5, 0.7 ]) def test_getitem(shape, density): s = sparse.random(shape, density, format='dok') x = s.todense() for _ in range(s.nnz): idx = np.random.randint(np.prod(shape)) idx = np.unravel_index(idx, shape) assert np.isclose(s[idx], x[idx]) @pytest.mark.parametrize('shape, index, value', [ ((2,), slice(None), np.random.rand()), ((2,), slice(1, 2), np.random.rand()), ((2,), slice(0, 2), np.random.rand(2)), ((2,), 1, np.random.rand()), ((2, 3), (0, slice(None)), np.random.rand()), ((2, 3), (0, slice(1, 3)), np.random.rand()), ((2, 3), (1, slice(None)), np.random.rand(3)), ((2, 3), (0, slice(1, 3)), np.random.rand(2)), ((2, 3), (0, slice(2, 0, -1)), np.random.rand(2)), ((2, 3), (slice(None), 1), np.random.rand()), ((2, 3), (slice(None), 1), np.random.rand(2)), ((2, 3), (slice(1, 2), 1), np.random.rand()), ((2, 3), (slice(1, 2), 1), np.random.rand(1)), ((2, 3), (0, 2), np.random.rand()), ]) def test_setitem(shape, index, value): s = sparse.random(shape, 0.5, format='dok') x = s.todense() s[index] = value x[index] = value assert_eq(x, s) def test_default_dtype(): s = DOK((5,)) assert s.dtype == np.float64 def test_int_dtype(): data = { 1: np.uint8(1), 2: np.uint16(2), } s = DOK((5,), data) assert s.dtype == np.uint16 def test_float_dtype(): data = { 1: np.uint8(1), 2: np.float32(2), } s = DOK((5,), data) assert s.dtype == np.float32 sparse-0.2.0/sparse/utils.py000066400000000000000000000076201323236075200160250ustar00rootroot00000000000000import numpy as np from numbers import Integral def assert_eq(x, y, **kwargs): from .coo import COO assert x.shape == y.shape assert x.dtype == y.dtype if isinstance(x, COO): if x.sorted: assert is_lexsorted(x) if isinstance(y, COO): if y.sorted: assert is_lexsorted(y) if hasattr(x, 'todense'): xx = x.todense() else: xx = x if hasattr(y, 'todense'): yy = y.todense() else: yy = y assert np.allclose(xx, yy, **kwargs) def is_lexsorted(x): return not x.shape or (np.diff(x.linear_loc()) > 0).all() def _zero_of_dtype(dtype): """ Creates a ()-shaped 0-dimensional zero array of a given dtype. Parameters ---------- dtype : numpy.dtype The dtype for the array. Returns ------- np.ndarray The zero array. """ return np.zeros((), dtype=dtype) def random( shape, density=0.01, canonical_order=False, random_state=None, data_rvs=None, format='coo' ): """ Generate a random sparse multidimensional array Parameters ---------- shape: Tuple[int] Shape of the array density: float, optional Density of the generated array. canonical_order : bool, optional Whether or not to put the output :obj:`COO` object into canonical order. :code:`False` by default. random_state : Union[numpy.random.RandomState, int], optional Random number generator or random seed. If not given, the singleton numpy.random will be used. This random state will be used for sampling the sparsity structure, but not necessarily for sampling the values of the structurally nonzero entries of the matrix. data_rvs : Callable Data generation callback. 
Must accept one single parameter: number of :code:`nnz` elements, and return one single NumPy array of exactly that length. format: {'coo', 'dok'} The format to return the output array in. Returns ------- {COO, DOK} The generated random matrix. See Also -------- :obj:`scipy.sparse.rand` Equivalent Scipy function. :obj:`numpy.random.rand` Similar Numpy function. Examples -------- >>> from sparse import random >>> from scipy import stats >>> rvs = lambda x: stats.poisson(25, loc=10).rvs(x, random_state=np.random.RandomState(1)) >>> s = random((2, 3, 4), density=0.25, random_state=np.random.RandomState(1), data_rvs=rvs) >>> s.todense() # doctest: +NORMALIZE_WHITESPACE array([[[ 0, 0, 0, 0], [ 0, 34, 0, 0], [33, 34, 0, 29]], [[30, 0, 0, 34], [ 0, 0, 0, 0], [ 0, 0, 0, 0]]]) """ # Copied, in large part, from scipy.sparse.random # See https://github.com/scipy/scipy/blob/master/LICENSE.txt from .coo import COO from .dok import DOK elements = np.prod(shape) nnz = int(elements * density) if random_state is None: random_state = np.random elif isinstance(random_state, Integral): random_state = np.random.RandomState(random_state) if data_rvs is None: data_rvs = random_state.rand # Use the algorithm from python's random.sample for k < mn/3. if elements < 3 * nnz: ind = random_state.choice(elements, size=nnz, replace=False) else: ind = np.empty(nnz, dtype=np.min_scalar_type(elements - 1)) selected = set() for i in range(nnz): j = random_state.randint(elements) while j in selected: j = random_state.randint(elements) selected.add(j) ind[i] = j data = data_rvs(nnz) ar = COO(ind[None, :], data, shape=nnz).reshape(shape) if canonical_order: ar.sum_duplicates() if format == 'dok': ar = DOK(ar) return ar sparse-0.2.0/tox.ini000066400000000000000000000001371323236075200143250ustar00rootroot00000000000000[tox] envlist = py27,py36 [testenv] commands= py.test {posargs} extras= docs tests
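# ---------------------------------------------------------------------------
# A self-contained sketch (not part of the package) of the coordinate-
# matching strategy used by _elemwise_binary above: coordinates are
# linearized row-major (as _linear_loc does), sorted, and the two sorted
# linear index arrays are intersected to find the positions where both
# operands are nonzero. The helper name `linear_loc` and the shapes/seeds
# are assumptions for illustration, not part of the library's API.
import numpy as np

import sparse


def linear_loc(coords, shape):
    # Row-major linearization, mirroring _linear_loc: the last axis varies
    # fastest, so its stride is 1 and strides grow right-to-left.
    out = np.zeros(coords.shape[1], dtype=np.intp)
    stride = 1
    for i, d in enumerate(shape[::-1]):
        out += coords[-(i + 1), :].astype(np.intp) * stride
        stride *= d
    return out


x = sparse.random((3, 4), density=0.5, random_state=42)
y = sparse.random((3, 4), density=0.5, random_state=7)

lx = np.sort(linear_loc(x.coords, x.shape))
ly = np.sort(linear_loc(y.coords, y.shape))

# Positions nonzero in both operands; elementwise multiplication keeps
# exactly these entries, since float data drawn from `rand` has no
# exact zeros.
both = np.intersect1d(lx, ly)
assert len(both) == (x * y).nnz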