---- astroML-1.0.2/CHANGES.rst ----

1.0.2 (2022-01-25)
==================

- Add ``fetch_sdss_galaxy_images`` to download a sample of SDSS galaxy
  images for the CNN figure example. [#242]

- Fix bug in ``lumfunc.Cminus`` that led to returning NaN values. [#237]

1.0.1 (2020-09-09)
==================

- Make PyMC3 an optional dependency. [#228]

1.0 (2020-08-21)
================

- Added ``LinearRegressionwithErrors`` to handle errors in both dependent
  and independent variables using pymc3. [#206]

- Removed support for Python versions <3.5. [#174]

- Deprecated function ``savitzky_golay`` in favour of the scipy
  implementation. [#193]

- Deprecated functions ``check_random_state`` and ``logsumexp`` in favour
  of their equivalents in scikit-learn and scipy, respectively. [#190]

0.4.1 (2019-10-01)
==================

- Fix syntax for matplotlib.rc usage. [#188]

- Various code cleanups and updates to the website.

0.4 (2019-03-06)
================

- New utils subpackage, including the deprecated decorator and new warning
  types. ``astroML.decorators`` has been moved to this subpackage. [#141]

API Changes and Other Changes
-----------------------------

- Removed deprecated KDE class. [#119]

- Switched to use the updated scikit-learn API for GaussianMixture. This
  change depends on scikit-learn 0.18+. [#125]

- Minimum required astropy version is now 1.2. [#173]

- Deprecated ``astroML.cosmology.Cosmology()`` in favour of
  ``astropy.cosmology``. [#121]

- Deprecated the Lomb-Scargle periodograms from ``astroML.time_series`` in
  favour of ``astropy.stats.LombScargle``. [#173]

- Deprecated histograms, including Bayesian blocks, as they have been moved
  to ``astropy.stats``. [#142]

- The book and paper figures have been moved out to a separate repository
  (``astroML_figures``).

0.3 (2015-01-28)
================

- Add support for Python 3

- Add continuous integration via Travis

- Bug: correctly account for errors in Ridge/Lasso regression

- Add figure tests in ``compare_images.py``

0.2 (2013-12-09)
================

- Documentation and example updates

- Moved from using ``pyfits`` to using ``astropy.io.fits``

- Fix the prior for the Bayesian Blocks algorithm

0.1.1 (2013-01-19)
==================

Bug fixes, January 2013
-----------------------

- Fixed errors in dataset downloaders: they failed on some platforms

- Added citation information to the website

- Updated figures to reflect those submitted for publication

- Performance improvement in ``freedman_bin_width``

- Fix setup issue when sklearn is not installed

- Enhancements to ``devectorize_axes`` function

0.1 (2012-10)
=============

Initial release, October 2012
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

---- astroML-1.0.2/LICENSE.rst ----

Copyright (c) 2012-2013, Jacob Vanderplas
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

---- astroML-1.0.2/MANIFEST.in ----

include *.rst
include *.py
recursive-include examples *.py *.rst
recursive-include astroML *.py
include LICENSE

---- astroML-1.0.2/PKG-INFO ----

Metadata-Version: 2.1
Name: astroML
Version: 1.0.2
Summary: Tools for machine learning and data mining in Astronomy
Home-page: http://astroML.github.com
Author: Jake VanderPlas
Author-email: vanderplas@astro.washington.edu
Maintainer: Brigitta Sipocz
Maintainer-email: bsipocz@gmail.com
License: BSD 3-Clause License
Description: (identical to README.rst below)
Keywords: astronomy,astrophysics,cosmology,space,science,modeling,models,fitting,machine-learning
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Astronomy
Provides-Extra: test
Provides-Extra: all
Provides-Extra: codestyle
Provides-Extra: docs

---- astroML-1.0.2/README.rst ----

.. -*- mode: rst -*-

=======================================
AstroML: Machine Learning for Astronomy
=======================================

.. image:: https://img.shields.io/badge/arXiv-1411.5039-orange.svg?style=flat
    :target: https://arxiv.org/abs/1411.5039

.. image:: https://img.shields.io/travis/astroML/astroML/master.svg?style=flat
    :target: https://travis-ci.org/astroML/astroML/

.. image:: https://img.shields.io/pypi/v/astroML.svg?style=flat
    :target: https://pypi.python.org/pypi/astroML

.. image:: https://img.shields.io/pypi/dm/astroML.svg?style=flat
    :target: https://pypi.python.org/pypi/astroML

.. image:: https://img.shields.io/badge/license-BSD-blue.svg?style=flat
    :target: https://github.com/astroml/astroml/blob/main/LICENSE.rst

AstroML is a Python module for machine learning and data mining built on
numpy, scipy, scikit-learn, and matplotlib, and distributed under the BSD
license. It contains a growing library of statistical and machine learning
routines for analyzing astronomical data in Python, loaders for several open
astronomical datasets, and a large suite of examples of analyzing and
visualizing astronomical datasets.

This project was started in 2012 by Jake VanderPlas to accompany the book
*Statistics, Data Mining, and Machine Learning in Astronomy* by Zeljko
Ivezic, Andrew Connolly, Jacob VanderPlas, and Alex Gray.
Important Links
===============

- HTML documentation: https://www.astroML.org
- Core source-code repository: https://github.com/astroML/astroML
- Figure source-code repository: https://github.com/astroML/astroML-figures
- Issue Tracker: https://github.com/astroML/astroML/issues
- Mailing List: https://groups.google.com/forum/#!forum/astroml-general


Installation
============

**Before installation, make sure your system meets the prerequisites listed
in the Dependencies section below.**

Core
----

To install the core ``astroML`` package, use::

    pip install astroML

A conda package for astroML is also available, on either the conda-forge or
the astropy conda channel::

    conda install -c astropy astroML

The core package is pure Python, so installation should be straightforward
on most systems. To install from source, use::

    python setup.py install

You can specify an arbitrary directory for installation using::

    python setup.py install --prefix='/some/path'

To install system-wide on Linux/Unix systems::

    python setup.py build
    sudo python setup.py install


Dependencies
============

There are two levels of dependencies in astroML. *Core* dependencies are
required for the core ``astroML`` package. *Optional* dependencies are
required to run some (but not all) of the example scripts. Individual
example scripts will list their optional dependencies at the top of the
file.

Core Dependencies
-----------------

The core ``astroML`` package requires the following (some of the
functionality might work with older versions):

- Python_ version 3.6+
- Numpy_ >= 1.13
- Scipy_ >= 0.18
- Scikit-learn_ >= 0.18
- Matplotlib_ >= 3.0
- AstroPy_ >= 3.0

Optional Dependencies
---------------------

Several of the example scripts require specialized or upgraded packages.
These requirements are listed at the top of the particular scripts.

- HEALPy_ provides an interface to the HEALPix pixelization scheme, as well
  as fast spherical harmonic transforms.


Development
===========

This package is designed to be a repository for well-written astronomy code,
and submissions of new routines are encouraged. After installing the
version-control system Git_, you can check out the latest sources from
GitHub_ using::

    git clone git://github.com/astroML/astroML.git

or if you have write privileges::

    git clone git@github.com:astroML/astroML.git

Contribution
------------

We strongly encourage contributions of useful astronomy-related code: for
`astroML` to be a relevant tool for the python/astronomy community, it will
need to grow with the field of research. There are a few guidelines for
contribution:

General
~~~~~~~

Any contribution should be done through the github pull request system (for
more information, see the GitHub help pages on pull requests). Code
submitted to ``astroML`` should conform to a BSD-style license, and follow
the `PEP8 style guide <https://www.python.org/dev/peps/pep-0008/>`_.

Documentation and Examples
~~~~~~~~~~~~~~~~~~~~~~~~~~

All submitted code should be documented following the `Numpy Documentation
Guide`_. This is a unified documentation style used by many packages in the
scipy universe. In addition, it is highly recommended to create example
scripts that show the usefulness of the method on an astronomical dataset
(preferably making use of the loaders in ``astroML.datasets``). These
example scripts are in the ``examples`` subdirectory of the main source
repository.
.. _Numpy Documentation Guide: https://numpydoc.readthedocs.io/en/latest/format.html


Authors
=======

Package Author
--------------

* Jake Vanderplas https://github.com/jakevdp http://jakevdp.github.com

Maintainer
----------

* Brigitta Sipocz https://github.com/bsipocz

Code Contribution
-----------------

* Morgan Fouesneau https://github.com/mfouesneau
* Julian Taylor http://github.com/juliantaylor

.. _Python: https://www.python.org
.. _Numpy: https://www.numpy.org
.. _Scipy: https://www.scipy.org
.. _Scikit-learn: https://scikit-learn.org
.. _Matplotlib: https://matplotlib.org
.. _AstroPy: http://www.astropy.org/
.. _HEALPy: https://github.com/healpy/healpy
.. _Git: https://git-scm.com/
.. _GitHub: https://www.github.com

---- astroML-1.0.2/astroML/__init__.py ----

__version__ = '1.0.2'

__citation__ = """@INPROCEEDINGS{astroML,
 author={{Vanderplas}, J.T. and {Connolly}, A.J. and {Ivezi{\'c}}, {\v Z}. and {Gray}, A.},
 booktitle={Conference on Intelligent Data Understanding (CIDU)},
 title={Introduction to astroML: Machine learning for astrophysics},
 month={Oct.},
 pages={47-54},
 doi={10.1109/CIDU.2012.6382200},
 year={2012}
}"""

---- astroML-1.0.2/astroML/classification/__init__.py ----

from .gmm_bayes import GMMBayes

---- astroML-1.0.2/astroML/classification/gmm_bayes.py ----

"""
GMM Bayes
---------

This implements generative classification based on mixtures of Gaussians
to model the probability density of each class.
"""

import warnings

import numpy as np

try:
    from sklearn.naive_bayes import _BaseNB
except ImportError:  # workaround for sklearn < 0.22
    from sklearn.naive_bayes import BaseNB

    class _BaseNB(BaseNB):
        pass

from sklearn.mixture import GaussianMixture
from sklearn.utils import check_array


class GMMBayes(_BaseNB):
    """GaussianMixture Bayes Classifier

    This is a generalization of the naive Bayes classifier: rather than
    modeling the distribution of each class with axis-aligned Gaussians,
    GMMBayes models the distribution of each class with mixtures of
    Gaussians. This can lead to better classification in some cases.

    Parameters
    ----------
    n_components : int or list
        number of components to use in the GaussianMixture. If specified as
        a list, it must match the number of class labels. Default is 1.
    **kwargs : dict, optional
        other keywords are passed directly to GaussianMixture
    """

    def __init__(self, n_components=1, **kwargs):
        self.n_components = np.atleast_1d(n_components)
        self.kwargs = kwargs

    def fit(self, X, y):
        X = self._check_X(X)
        y = np.asarray(y)

        n_samples, n_features = X.shape

        if n_samples != y.shape[0]:
            raise ValueError("X and y have incompatible shapes")

        self.classes_ = np.unique(y)
        self.classes_.sort()
        unique_y = self.classes_

        n_classes = unique_y.shape[0]

        if self.n_components.size not in (1, len(unique_y)):
            raise ValueError("n_components must be compatible with "
                             "the number of classes")

        self.gmms_ = [None for i in range(n_classes)]
        self.class_prior_ = np.zeros(n_classes)

        n_comp = np.zeros(len(self.classes_), dtype=int) + self.n_components

        for i, y_i in enumerate(unique_y):
            if n_comp[i] > X[y == y_i].shape[0]:
                warnstr = ("Expected n_samples >= n_components but got "
                           "n_samples={0}, n_components={1}, "
                           "n_components set to {0}.")
                warnings.warn(warnstr.format(X[y == y_i].shape[0],
                                             n_comp[i]))
                n_comp[i] = X[y == y_i].shape[0]

            self.gmms_[i] = GaussianMixture(n_comp[i],
                                            **self.kwargs).fit(X[y == y_i])
            self.class_prior_[i] = float(np.sum(y == y_i)) / n_samples

        return self

    def _joint_log_likelihood(self, X):
        X = np.asarray(np.atleast_2d(X))
        logprobs = np.array([g.score_samples(X) for g in self.gmms_]).T
        return logprobs + np.log(self.class_prior_)

    def _check_X(self, X):
        return check_array(X)

---- astroML-1.0.2/astroML/classification/tests/__init__.py ---- (empty)

---- astroML-1.0.2/astroML/classification/tests/test_gmm_bayes.py ----

"""Tests of the GMM Bayes classifier"""

import numpy as np
from numpy.testing import assert_allclose
import pytest

from astroML.classification import GMMBayes


def test_gmm1d():
    x1 = np.random.normal(0, 1, size=100)
    x2 = np.random.normal(10, 1, size=100)
    X = np.concatenate((x1, x2)).reshape((200, 1))
    y = np.zeros(200)
    y[100:] = 1

    ncm = 1
    clf = GMMBayes(ncm)
    clf.fit(X, y)
    predicted = clf.predict(X)
    assert_allclose(y, predicted)


def test_gmm2d():
    x1 = np.random.normal(0, 1, size=(100, 2))
    x2 = np.random.normal(10, 1, size=(100, 2))
    X = np.vstack((x1, x2))
    y = np.zeros(200)
    y[100:] = 1

    for ncm in (1, 2, 3):
        clf = GMMBayes(ncm)
        clf.fit(X, y)
        predicted = clf.predict(X)
        assert_allclose(y, predicted)


def test_incompatible_shapes_exception():
    X = np.random.normal(0, 1, size=(100, 2))
    y = np.zeros(99)

    ncm = 1
    clf = GMMBayes(ncm)
    with pytest.raises(Exception) as e:
        assert clf.fit(X, y)
    assert str(e.value) == "X and y have incompatible shapes"


def test_incompatible_number_of_components_exception():
    X = np.random.normal(0, 1, size=(100, 2))
    y = np.zeros(100)

    ncm = [1, 2, 3]
    clf = GMMBayes(ncm)
    with pytest.raises(Exception) as e:
        assert clf.fit(X, y)
    assert str(e.value) == ("n_components must be compatible with "
                            "the number of classes")


def test_too_many_components_warning():
    X = np.random.normal(0, 1, size=(3, 2))
    y = np.zeros(3)

    ncm = 5
    clf = GMMBayes(ncm)
    with pytest.warns(UserWarning, match="Expected n_samples >= "
                                         "n_components but got "):
        clf.fit(X, y)
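
A minimal usage sketch for ``GMMBayes``, mirroring the synthetic setup of the
tests above (the cluster centers, sample sizes, and component count are
purely illustrative, not part of the library)::

    import numpy as np
    from astroML.classification import GMMBayes

    rng = np.random.RandomState(0)
    # two well-separated classes in two dimensions
    X = np.vstack([rng.normal(0, 1, size=(100, 2)),
                   rng.normal(5, 1, size=(100, 2))])
    y = np.repeat([0, 1], 100)

    clf = GMMBayes(n_components=3)   # three Gaussians per class
    clf.fit(X, y)
    y_pred = clf.predict(X)          # predict() is inherited from _BaseNB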
---- astroML-1.0.2/astroML/clustering/__init__.py ----

from .mst_clustering import HierarchicalClustering, get_graph_segments

---- astroML-1.0.2/astroML/clustering/mst_clustering.py ----

"""
Minimum Spanning Tree Clustering
"""

import numpy as np

from scipy import sparse
from sklearn.base import BaseEstimator
from sklearn.neighbors import kneighbors_graph

try:
    from scipy.sparse.csgraph import (
        minimum_spanning_tree, connected_components)
except ImportError:
    raise ImportError("scipy v0.11 or greater required "
                      "for minimum spanning tree")


class HierarchicalClustering(BaseEstimator):
    """Hierarchical Clustering via Approximate Euclidean Minimum Spanning Tree

    Parameters
    ----------
    n_neighbors : int
        number of neighbors of each point used for the approximate Euclidean
        minimum spanning tree (MST) algorithm. See Notes below.
    edge_cutoff : float
        specify a fraction of edges to keep when selecting clusters.
        edge_cutoff should be between 0 and 1.
    min_cluster_size : int, optional
        specify a minimum number of points per cluster.
        If not specified, all clusters will be kept.

    Attributes
    ----------
    X_train_ : ndarray
        the training data
    full_tree_ : sparse graph
        the full approximate Euclidean MST spanning the data
    cluster_graph_ : sparse graph
        the final (truncated) graph showing clusters
    n_components_ : int
        the number of clusters found.
    labels_ : int
        the cluster labels for each training point. Labels range from -1
        to n_components_ - 1: points labeled -1 are in the background
        (i.e. their clusters were smaller than min_cluster_size)

    Notes
    -----
    This routine uses an approximate Euclidean minimum spanning tree (MST)
    to perform hierarchical clustering. A true Euclidean minimum spanning
    tree naively costs O[N^3]. Graph traversal algorithms only help so much,
    because all N^2 edges must be used as candidates. In this approximate
    algorithm, we use k < N edges from each point, so that the cost is only
    O[Nk log(Nk)]. For k = N, the approximation is exact; in practice for
    well-behaved data sets, the result is exact for k << N.
    """

    def __init__(self, n_neighbors=20, edge_cutoff=0.9, min_cluster_size=1):
        self.n_neighbors = n_neighbors
        self.edge_cutoff = edge_cutoff
        self.min_cluster_size = min_cluster_size

    def fit(self, X):
        """Fit the clustering model

        Parameters
        ----------
        X : array_like
            the data to be clustered: shape = [n_samples, n_features]
        """
        X = np.asarray(X, dtype=float)
        self.X_train_ = X

        # generate a sparse graph using the k nearest neighbors of each point
        G = kneighbors_graph(X, n_neighbors=self.n_neighbors,
                             mode='distance')

        # Compute the minimum spanning tree of this graph
        self.full_tree_ = minimum_spanning_tree(G, overwrite=True)

        # Find the cluster labels
        self.n_components_, self.labels_, self.cluster_graph_ =\
            self.compute_clusters()

        return self

    def compute_clusters(self, edge_cutoff=None, min_cluster_size=None):
        """Compute the clusters given a trained tree

        After fit() is called, this method may be called to obtain a
        clustering result with a new edge_cutoff and min_cluster_size.
        Parameters
        ----------
        edge_cutoff : float, optional
            specify a fraction of edges to keep when selecting clusters.
            edge_cutoff should be between 0 and 1. If not specified,
            self.edge_cutoff will be used.
        min_cluster_size : int, optional
            specify a minimum number of points per cluster.
            If not specified, self.min_cluster_size will be used.

        Returns
        -------
        n_components : int
            the number of clusters found
        labels : ndarray
            the labels of each point. Labels range from -1 to
            n_components_ - 1: points labeled -1 are in the background
            (i.e. their clusters were smaller than min_cluster_size)
        T_trunc : sparse matrix
            the truncated minimum spanning tree
        """
        if edge_cutoff is None:
            edge_cutoff = self.edge_cutoff
        if min_cluster_size is None:
            min_cluster_size = self.min_cluster_size

        if not hasattr(self, 'full_tree_'):
            raise ValueError("must call fit() before calling "
                             "compute_clusters()")

        T_trunc = self.full_tree_.copy()

        # cut-off edges at the percentile given by edge_cutoff
        cutoff = np.percentile(T_trunc.data, 100 * edge_cutoff)
        T_trunc.data[T_trunc.data > cutoff] = 0
        T_trunc.eliminate_zeros()

        # find connected components
        n_components, labels = connected_components(T_trunc, directed=False)
        counts = np.bincount(labels)

        # for all components with less than min_cluster_size points, set
        # to background, and re-label the clusters
        i_bg = np.where(counts < min_cluster_size)[0]

        for i in i_bg:
            labels[labels == i] = -1

        if len(i_bg) > 0:
            _, labels = np.unique(labels, return_inverse=True)
            labels -= 1
            n_components = labels.max() + 1

        # eliminate links in T_trunc which are not clusters
        Eye = sparse.eye(len(labels), len(labels))
        Eye.data[0, labels < 0] = 0
        T_trunc = Eye * T_trunc * Eye

        return n_components, labels, T_trunc


def get_graph_segments(X, G):
    """Get graph segments for plotting a 2D graph

    Parameters
    ----------
    X : array_like
        the data, of shape [n_samples, 2]
    G : array_like or sparse graph
        the [n_samples, n_samples] matrix encoding the graph of
        connections on X

    Returns
    -------
    x_coords, y_coords : ndarrays
        the x and y coordinates for plotting the graph.
        They are of size [2, n_links], and can be visualized using
        ``plt.plot(x_coords, y_coords, '-k')``
    """
    X = np.asarray(X)
    if (X.ndim != 2) or (X.shape[1] != 2):
        raise ValueError('shape of X should be (n_samples, 2)')

    G = sparse.coo_matrix(G)
    A = X[G.row].T
    B = X[G.col].T

    x_coords = np.vstack([A[0], B[0]])
    y_coords = np.vstack([A[1], B[1]])

    return x_coords, y_coords
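
A minimal usage sketch for the estimator and helper above (the uniform toy
data, neighbor count, and cutoff values are illustrative)::

    import numpy as np
    from astroML.clustering import HierarchicalClustering, get_graph_segments

    rng = np.random.RandomState(0)
    X = rng.random_sample((200, 2))

    model = HierarchicalClustering(n_neighbors=10, edge_cutoff=0.9,
                                   min_cluster_size=10)
    model.fit(X)
    print(model.n_components_)          # number of clusters found

    # line segments of the cluster graph, e.g. plt.plot(xseg, yseg, '-k')
    xseg, yseg = get_graph_segments(model.X_train_, model.cluster_graph_)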
---- astroML-1.0.2/astroML/clustering/tests/__init__.py ---- (empty)

---- astroML-1.0.2/astroML/clustering/tests/test_MST_clustering.py ----

import numpy as np
from numpy.testing import assert_, assert_allclose

from astroML.clustering import HierarchicalClustering, get_graph_segments


def test_simple_clustering():
    np.random.seed(0)
    N = 10
    X = np.random.random((N, 2))

    model = HierarchicalClustering(8, edge_cutoff=0.5)
    model.fit(X)

    assert_(model.n_components_ == N / 2)
    assert_(np.sum(model.full_tree_.toarray() > 0) == N - 1)
    assert_(np.sum(model.cluster_graph_.toarray() > 0) == N / 2)
    assert_allclose(np.unique(model.labels_), np.arange(N / 2))


def test_cluster_cutoff():
    np.random.seed(0)
    N = 100
    X = np.random.random((N, 2))

    model = HierarchicalClustering(8, edge_cutoff=0.9, min_cluster_size=10)
    model.fit(X)

    assert_allclose(np.unique(model.labels_),
                    np.arange(-1, model.n_components_))


def test_graph_segments():
    np.random.seed(0)
    N = 4
    X = np.random.random((N, 2))

    G = np.zeros([N, N])
    G[0, 1] = 1
    G[2, 1] = 1
    G[2, 3] = 1

    ind = np.array([[0, 2, 2],
                    [1, 1, 3]])

    xseg_check = X[ind, 0]
    yseg_check = X[ind, 1]

    xseg, yseg = get_graph_segments(X, G)

    assert_allclose(xseg, xseg_check)
    assert_allclose(yseg, yseg_check)

---- astroML-1.0.2/astroML/conftest.py ----

try:
    from pytest_astropy_header.display import (PYTEST_HEADER_MODULES,
                                               TESTED_VERSIONS)

    def pytest_configure(config):
        config.option.astropy_header = True

        try:
            PYTEST_HEADER_MODULES['Astropy'] = 'astropy'
            PYTEST_HEADER_MODULES['scikit-learn'] = 'sklearn'
            PYTEST_HEADER_MODULES['pymc3'] = 'pymc3'
            PYTEST_HEADER_MODULES['Theano'] = 'theano'
            del PYTEST_HEADER_MODULES['h5py']
            del PYTEST_HEADER_MODULES['Pandas']
        except KeyError:
            pass

        # This is to figure out the package version, rather than
        # using Astropy's
        from . import __version__ as version
        TESTED_VERSIONS['astroML'] = version

except ImportError:
    pass

---- astroML-1.0.2/astroML/correlation.py ----

"""
Tools for computing two-point correlation functions.
""" import numpy as np from sklearn.neighbors import KDTree from sklearn.utils import check_random_state def uniform_sphere(RAlim, DEClim, size=1): """Draw a uniform sample on a sphere Parameters ---------- RAlim : tuple select Right Ascension between RAlim[0] and RAlim[1] units are degrees DEClim : tuple select Declination between DEClim[0] and DEClim[1] size : int (optional) the size of the random arrays to return (default = 1) Returns ------- RA, DEC : ndarray the random sample on the sphere within the given limits. arrays have shape equal to size. """ zlim = np.sin(np.pi * np.asarray(DEClim) / 180.) z = zlim[0] + (zlim[1] - zlim[0]) * np.random.random(size) DEC = (180. / np.pi) * np.arcsin(z) RA = RAlim[0] + (RAlim[1] - RAlim[0]) * np.random.random(size) return RA, DEC def ra_dec_to_xyz(ra, dec): """Convert ra & dec to Euclidean points Parameters ---------- ra, dec : ndarrays Returns x, y, z : ndarrays """ sin_ra = np.sin(ra * np.pi / 180.) cos_ra = np.cos(ra * np.pi / 180.) sin_dec = np.sin(np.pi / 2 - dec * np.pi / 180.) cos_dec = np.cos(np.pi / 2 - dec * np.pi / 180.) return (cos_ra * sin_dec, sin_ra * sin_dec, cos_dec) def angular_dist_to_euclidean_dist(D, r=1): """convert angular distances to euclidean distances""" return 2 * r * np.sin(0.5 * D * np.pi / 180.) def two_point(data, bins, method='standard', data_R=None, random_state=None): """Two-point correlation function Parameters ---------- data : array_like input data, shape = [n_samples, n_features] bins : array_like bins within which to compute the 2-point correlation. shape = Nbins + 1 method : string "standard" or "landy-szalay". data_R : array_like (optional) if specified, use this as the random comparison sample random_state : integer, np.random.RandomState, or None specify the random state to use for generating background Returns ------- corr : ndarray the estimate of the correlation function within each bin shape = Nbins """ data = np.asarray(data) bins = np.asarray(bins) rng = check_random_state(random_state) if method not in ['standard', 'landy-szalay']: raise ValueError("method must be 'standard' or 'landy-szalay'") if bins.ndim != 1: raise ValueError("bins must be a 1D array") if data.ndim == 1: data = data[:, np.newaxis] elif data.ndim != 2: raise ValueError("data should be 1D or 2D") n_samples, n_features = data.shape # shuffle all but one axis to get background distribution if data_R is None: data_R = data.copy() for i in range(n_features - 1): rng.shuffle(data_R[:, i]) else: data_R = np.asarray(data_R) if (data_R.ndim != 2) or (data_R.shape[-1] != n_features): raise ValueError('data_R must have same n_features as data') factor = len(data_R) * 1. / len(data) # Fast two-point correlation functions added in scikit-learn v. 
    KDT_D = KDTree(data)
    KDT_R = KDTree(data_R)

    counts_DD = KDT_D.two_point_correlation(data, bins)
    counts_RR = KDT_R.two_point_correlation(data_R, bins)

    DD = np.diff(counts_DD)
    RR = np.diff(counts_RR)

    # check for zero in the denominator
    RR_zero = (RR == 0)
    RR[RR_zero] = 1

    if method == 'standard':
        corr = factor ** 2 * DD / RR - 1
    elif method == 'landy-szalay':
        counts_DR = KDT_R.two_point_correlation(data, bins)

        DR = np.diff(counts_DR)

        corr = (factor ** 2 * DD - 2 * factor * DR + RR) / RR

    corr[RR_zero] = np.nan

    return corr


def bootstrap_two_point(data, bins, Nbootstrap=10,
                        method='standard', return_bootstraps=False,
                        random_state=None):
    """Bootstrapped two-point correlation function

    Parameters
    ----------
    data : array_like
        input data, shape = [n_samples, n_features]
    bins : array_like
        bins within which to compute the 2-point correlation.
        shape = Nbins + 1
    Nbootstrap : integer
        number of bootstrap resamples to perform (default = 10)
    method : string
        "standard" or "landy-szalay".
    return_bootstraps : bool
        if True, return full bootstrapped samples
    random_state : integer, np.random.RandomState, or None
        specify the random state to use for generating background

    Returns
    -------
    corr, corr_err : ndarrays
        the estimate of the correlation function and the bootstrap
        error within each bin. shape = Nbins
    """
    data = np.asarray(data)
    bins = np.asarray(bins)
    rng = check_random_state(random_state)

    if method not in ['standard', 'landy-szalay']:
        raise ValueError("method must be 'standard' or 'landy-szalay'")

    if bins.ndim != 1:
        raise ValueError("bins must be a 1D array")

    if data.ndim == 1:
        data = data[:, np.newaxis]
    elif data.ndim != 2:
        raise ValueError("data should be 1D or 2D")

    if Nbootstrap < 2:
        raise ValueError("Nbootstrap must be greater than 1")

    n_samples, n_features = data.shape

    # get the baseline estimate
    corr = two_point(data, bins, method=method, random_state=rng)

    bootstraps = np.zeros((Nbootstrap, len(corr)))

    for i in range(Nbootstrap):
        indices = rng.randint(0, n_samples, n_samples)
        bootstraps[i] = two_point(data[indices, :], bins, method=method,
                                  random_state=rng)

    # use masked std dev in case of NaNs
    corr_err = np.asarray(np.ma.masked_invalid(bootstraps).std(0, ddof=1))

    if return_bootstraps:
        return corr, corr_err, bootstraps
    else:
        return corr, corr_err


def two_point_angular(ra, dec, bins, method='standard', random_state=None):
    """Angular two-point correlation function

    A separate function is needed because angular distances are not
    Euclidean, and random sampling needs to take into account the spherical
    volume element.

    Parameters
    ----------
    ra : array_like
        input right ascension, shape = (n_samples,)
    dec : array_like
        input declination
    bins : array_like
        bins within which to compute the 2-point correlation.
        shape = Nbins + 1
    method : string
        "standard" or "landy-szalay".
    random_state : integer, np.random.RandomState, or None
        specify the random state to use for generating background

    Returns
    -------
    corr : ndarray
        the estimate of the correlation function within each bin
        shape = Nbins
    """
    ra = np.asarray(ra)
    dec = np.asarray(dec)
    bins = np.asarray(bins)
    rng = check_random_state(random_state)

    if method not in ['standard', 'landy-szalay']:
        raise ValueError("method must be 'standard' or 'landy-szalay'")

    if bins.ndim != 1:
        raise ValueError("bins must be a 1D array")

    if (ra.ndim != 1) or (dec.ndim != 1) or (ra.shape != dec.shape):
        raise ValueError('ra and dec must be 1-dimensional '
                         'arrays of the same length')

    # draw a random sample with N points
    ra_R, dec_R = uniform_sphere((min(ra), max(ra)),
                                 (min(dec), max(dec)),
                                 2 * len(ra))

    data = np.asarray(ra_dec_to_xyz(ra, dec), order='F').T
    data_R = np.asarray(ra_dec_to_xyz(ra_R, dec_R), order='F').T

    # convert spherical bins to cartesian bins
    bins_transform = angular_dist_to_euclidean_dist(bins)

    return two_point(data, bins_transform, method=method,
                     data_R=data_R, random_state=rng)


def bootstrap_two_point_angular(ra, dec, bins, method='standard',
                                Nbootstraps=10, random_state=None):
    """Angular two-point correlation function

    A separate function is needed because angular distances are not
    Euclidean, and random sampling needs to take into account the spherical
    volume element.

    Parameters
    ----------
    ra : array_like
        input right ascension, shape = (n_samples,)
    dec : array_like
        input declination
    bins : array_like
        bins within which to compute the 2-point correlation.
        shape = Nbins + 1
    method : string
        "standard" or "landy-szalay".
    Nbootstraps : int
        number of bootstrap resamples
    random_state : integer, np.random.RandomState, or None
        specify the random state to use for generating background

    Returns
    -------
    corr : ndarray
        the estimate of the correlation function within each bin
        shape = Nbins
    dcorr : ndarray
        error estimate on corr (sample standard deviation of
        bootstrap resamples)
    bootstraps : ndarray
        The full sample of bootstraps used to compute corr and dcorr
    """
    ra = np.asarray(ra)
    dec = np.asarray(dec)
    bins = np.asarray(bins)
    rng = check_random_state(random_state)

    if method not in ['standard', 'landy-szalay']:
        raise ValueError("method must be 'standard' or 'landy-szalay'")

    if bins.ndim != 1:
        raise ValueError("bins must be a 1D array")

    if (ra.ndim != 1) or (dec.ndim != 1) or (ra.shape != dec.shape):
        raise ValueError('ra and dec must be 1-dimensional '
                         'arrays of the same length')

    data = np.asarray(ra_dec_to_xyz(ra, dec), order='F').T

    # convert spherical bins to cartesian bins
    bins_transform = angular_dist_to_euclidean_dist(bins)

    bootstraps = []

    for i in range(Nbootstraps):
        # draw a random sample with N points
        ra_R, dec_R = uniform_sphere((min(ra), max(ra)),
                                     (min(dec), max(dec)),
                                     2 * len(ra))

        data_R = np.asarray(ra_dec_to_xyz(ra_R, dec_R), order='F').T

        if i > 0:
            # random sample of the data
            ind = rng.randint(0, data.shape[0], data.shape[0])
            data_b = data[ind]
        else:
            data_b = data

        bootstraps.append(two_point(data_b, bins_transform,
                                    method=method, data_R=data_R,
                                    random_state=rng))

    bootstraps = np.asarray(bootstraps)
    corr = np.mean(bootstraps, 0)
    corr_err = np.std(bootstraps, 0, ddof=1)

    return corr, corr_err, bootstraps
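
A minimal sketch of the angular interface above (the sky positions and bin
edges are synthetic and purely illustrative)::

    import numpy as np
    from astroML.correlation import bootstrap_two_point_angular

    rng = np.random.RandomState(42)
    ra = 360 * rng.random_sample(1000)        # degrees
    dec = -5 + 10 * rng.random_sample(1000)   # a 10-degree declination band

    bins = np.linspace(0.1, 10, 11)           # angular bins in degrees
    corr, corr_err, boots = bootstrap_two_point_angular(
        ra, dec, bins, method='landy-szalay', Nbootstraps=5)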
---- astroML-1.0.2/astroML/cosmology.py ----

import numpy as np
from scipy import integrate

from astroML.utils import deprecated
from astroML.utils.exceptions import AstroMLDeprecationWarning


@deprecated('0.4', alternative='astropy.cosmology',
            warning_type=AstroMLDeprecationWarning)
class Cosmology:
    """Class to enable simple cosmological calculations.

    For a more full-featured cosmology package, see CosmoloPy [1]_

    Parameters
    ----------
    omegaM : float
        Matter Density. 0 <= omegaM <= 1
    omegaL : float
        Dark energy density. 0 <= omegaL <= 1
    h : float
        Hubble parameter, in units of 100 km/s/Mpc

    References
    ----------
    [1] http://roban.github.com/CosmoloPy/
    """

    def __init__(self, omegaM=0.27, omegaL=0.73, h=0.71):
        self.omegaM = omegaM
        self.omegaL = omegaL
        self.omegaK = 1. - omegaM - omegaL
        self.h = h

        # compute hubble distance in Mpc
        self.Dh = 2.9979E5 / (100 * h)

    def _hinv(self, z):
        """Dimensionless Hubble constant at redshift z

        This is used in integration routines.
        Defined as in equation 14 from Hogg 1999.
        """
        if np.isinf(z):
            return np.inf
        return np.sqrt(self.omegaM * (1. + z) ** 3
                       + self.omegaK * (1. + z) ** 2
                       + self.omegaL)

    def Dc(self, z):
        """Line of sight comoving distance at redshift z

        Remains constant with epoch if objects are in the Hubble flow
        """
        if z == 0:
            return 0
        else:
            def f(z):
                return 1.0 / self._hinv(z)
            integral = integrate.quad(f, 0, z)
            return self.Dh * integral[0]

    def Dm(self, z):
        """Transverse comoving distance at redshift z

        At same redshift but separated by angle dtheta;
        Dm * dtheta is transverse comoving distance
        """
        sOk = np.sqrt(abs(self.omegaK))
        if self.omegaK < 0.0:
            return self.Dh * np.sin(sOk * self.Dc(z) / self.Dh) / sOk
        elif self.omegaK == 0.0:
            return self.Dc(z)
        else:
            return self.Dh * np.sinh(sOk * self.Dc(z) / self.Dh) / sOk

    def Dl(self, z):
        """Luminosity distance (Mpc) at redshift z"""
        return (1. + z) * self.Dm(z)

    def mu(self, z):
        """Distance Modulus at redshift z"""
        return 5. * np.log10(self.Dl(z) * 1E6) - 5.
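
The class above is deprecated in favour of ``astropy.cosmology``; a minimal
sketch of both the legacy interface and its supported replacement (the
parameter values shown are just the defaults)::

    from astroML.cosmology import Cosmology

    cosmo = Cosmology(omegaM=0.27, omegaL=0.73, h=0.71)  # emits a deprecation warning
    print(cosmo.Dl(0.5))   # luminosity distance in Mpc at z = 0.5
    print(cosmo.mu(0.5))   # distance modulus

    # the supported replacement
    from astropy.cosmology import LambdaCDM
    cosmo2 = LambdaCDM(H0=71, Om0=0.27, Ode0=0.73)
    print(cosmo2.distmod(0.5))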
---- astroML-1.0.2/astroML/crossmatch.py ----

import numpy as np
from scipy.spatial import cKDTree


def crossmatch(X1, X2, max_distance=np.inf):
    """Cross-match the values between X1 and X2

    By default, this uses a KD Tree for speed.

    Parameters
    ----------
    X1 : array_like
        first dataset, shape(N1, D)
    X2 : array_like
        second dataset, shape(N2, D)
    max_distance : float (optional)
        maximum radius of search. If no point is within the given radius,
        then inf will be returned.

    Returns
    -------
    dist, ind : ndarrays
        The distance and index of the closest point in X2 to each point in
        X1. Both arrays are length N1. Locations with no match are
        indicated by dist[i] = inf, ind[i] = N2
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)

    N1, D = X1.shape
    N2, D2 = X2.shape

    if D != D2:
        raise ValueError('Arrays must have the same second dimension')

    kdt = cKDTree(X2)

    dist, ind = kdt.query(X1, k=1, distance_upper_bound=max_distance)

    return dist, ind


def crossmatch_angular(X1, X2, max_distance=np.inf):
    """Cross-match angular values between X1 and X2

    By default, this uses a KD Tree for speed. Because the KD Tree only
    handles cartesian distances, the angles are projected onto a 3D sphere.

    Parameters
    ----------
    X1 : array_like
        first dataset, shape(N1, 2). X1[:, 0] is the RA, X1[:, 1] is the
        DEC, both measured in degrees
    X2 : array_like
        second dataset, shape(N2, 2). X2[:, 0] is the RA, X2[:, 1] is the
        DEC, both measured in degrees
    max_distance : float (optional)
        maximum radius of search, measured in degrees.
        If no point is within the given radius, then inf will be returned.

    Returns
    -------
    dist, ind : ndarrays
        The angular distance and index of the closest point in X2 to each
        point in X1. Both arrays are length N1. Locations with no match
        are indicated by dist[i] = inf, ind[i] = N2
    """
    X1 = X1 * (np.pi / 180.)
    X2 = X2 * (np.pi / 180.)
    max_distance = max_distance * (np.pi / 180.)

    # Convert 2D RA/DEC to 3D cartesian coordinates
    Y1 = np.transpose(np.vstack([np.cos(X1[:, 0]) * np.cos(X1[:, 1]),
                                 np.sin(X1[:, 0]) * np.cos(X1[:, 1]),
                                 np.sin(X1[:, 1])]))
    Y2 = np.transpose(np.vstack([np.cos(X2[:, 0]) * np.cos(X2[:, 1]),
                                 np.sin(X2[:, 0]) * np.cos(X2[:, 1]),
                                 np.sin(X2[:, 1])]))

    # law of cosines to compute 3D distance
    max_y = np.sqrt(2 - 2 * np.cos(max_distance))
    dist, ind = crossmatch(Y1, Y2, max_y)

    # convert distances back to angles using the law of tangents
    not_inf = ~np.isinf(dist)
    x = 0.5 * dist[not_inf]
    dist[not_inf] = (180. / np.pi * 2
                     * np.arctan2(x, np.sqrt(np.maximum(0, 1 - x ** 2))))

    return dist, ind
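
A minimal sketch of the angular cross-match above (the perturbation scale
and the 1-arcsecond search radius are illustrative)::

    import numpy as np
    from astroML.crossmatch import crossmatch_angular

    rng = np.random.RandomState(0)
    X1 = np.vstack([360 * rng.random_sample(50),
                    -90 + 180 * rng.random_sample(50)]).T  # RA, DEC in degrees
    X2 = X1 + 1e-5 * rng.standard_normal(X1.shape)         # perturbed copy

    dist, ind = crossmatch_angular(X1, X2, max_distance=1. / 3600)
    matched = ~np.isinf(dist)
    print(matched.sum(), "matches within 1 arcsec")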
---- astroML-1.0.2/astroML/datasets/LIGO_bigdog.py ----

"""
Fetch the LIGO BigDog time-domain dataset
"""

import os
from io import BytesIO
from gzip import GzipFile

import numpy as np

from . import get_data_home
from .tools import download_with_progress_bar

DATA_URL_LARGE = ('https://github.com/astroML/astroML-data/raw/main/datasets/'
                  'hoft.968653908-968655956.H1.dat.gz')
LOCAL_FILE_LARGE = 'LIGO_large.npy'

DATA_URL = 'http://www.ligo.org/science/GW100916/HLV-strain.txt'
LOCAL_FILE = 'LIGO_bigdog.npy'


def fetch_LIGO_large(data_home=None, download_if_missing=True):
    """Loader for LIGO large dataset

    Parameters
    ----------
    data_home : optional, default=None
        Specify another download and cache folder for the datasets.
        By default all astroML data is stored in '~/astroML_data'.

    download_if_missing : optional, default=True
        If False, raise an IOError if the data is not locally available
        instead of trying to download the data from the source site.

    Returns
    -------
    data : ndarray
    dt : float
        data represents ~2000s of amplitude data from LIGO Hanford;
        dt is the time spacing between measurements in seconds.
    """
    data_home = get_data_home(data_home)

    local_file = os.path.join(data_home, LOCAL_FILE_LARGE)

    if os.path.exists(local_file):
        data = np.load(local_file)
    else:
        if not download_if_missing:
            raise IOError('data not present on disk. '
                          'set download_if_missing=True to download')

        print("downloading LIGO bigdog data from %s to %s"
              % (DATA_URL_LARGE, local_file))

        zipped_buf = download_with_progress_bar(DATA_URL_LARGE,
                                                return_buffer=True)
        gzf = GzipFile(fileobj=zipped_buf, mode='rb')
        print("uncompressing file...")
        extracted_buf = BytesIO(gzf.read())
        data = np.loadtxt(extracted_buf)
        np.save(local_file, data)

    return data, 1. / 4096


def fetch_LIGO_bigdog(data_home=None, download_if_missing=True):
    """Loader for LIGO bigdog event

    Parameters
    ----------
    data_home : optional, default=None
        Specify another download and cache folder for the datasets.
        By default all astroML data is stored in '~/astroML_data'.

    download_if_missing : optional, default=True
        If False, raise an IOError if the data is not locally available
        instead of trying to download the data from the source site.

    Returns
    -------
    data : record array
        The data is 10 seconds of measurements from three sites, along
        with the time of each measurement.

    Examples
    --------
    >>> from astroML.datasets import fetch_LIGO_bigdog
    >>> data = fetch_LIGO_bigdog()  # doctest: +IGNORE_OUTPUT +REMOTE_DATA
    >>> print(data.dtype.names)  # doctest: +REMOTE_DATA
    ('t', 'Hanford', 'Livingston', 'Virgo')
    >>> print(data['t'][:3])  # doctest: +REMOTE_DATA
    [ 0.00000000e+00 6.10400000e-05 1.22070000e-04]
    >>> print(data['Hanford'][:3])  # doctest: +REMOTE_DATA
    [ 1.26329846e-17 1.26846778e-17 1.19187381e-17]
    """
    data_home = get_data_home(data_home)

    local_file = os.path.join(data_home, LOCAL_FILE)

    if os.path.exists(local_file):
        data = np.load(local_file)
    else:
        if not download_if_missing:
            raise IOError('data not present on disk. '
                          'set download_if_missing=True to download')

        print("downloading LIGO bigdog data from %s to %s"
              % (DATA_URL, local_file))

        buffer = download_with_progress_bar(DATA_URL, return_buffer=True)
        data = np.loadtxt(buffer, skiprows=2,
                          dtype=[('t', 'f8'),
                                 ('Hanford', 'f8'),
                                 ('Livingston', 'f8'),
                                 ('Virgo', 'f8')])
        np.save(local_file, data)

    return data

---- astroML-1.0.2/astroML/datasets/LINEAR_sample.py ----

import os
import tarfile

import numpy as np

from astropy.table import Table

from . import get_data_home
from .tools import download_with_progress_bar

TARGETLIST_URL = ("https://github.com/astroML/astroML-data/raw/main/datasets/"
                  "allLINEARfinal_targets.dat.gz")
DATA_URL = ("https://github.com/astroML/astroML-data/raw/main/datasets/"
            "allLINEARfinal_dat.tar.gz")

GENEVA_URL = ("https://github.com/astroML/astroML-data/raw/main/datasets/"
              "LINEARattributesFinalApr2013.dat.gz")
GENEVA_ARCHIVE = 'LINEARattributesFinalApr2013.npy'
ARCHIVE_DTYPE = ([(s, 'f8') for s in ('RA', 'Dec', 'ug', 'gi', 'iK', 'JK',
                                      'logP', 'Ampl', 'skew', 'kurt',
                                      'magMed', 'nObs')]
                 + [('LCtype', 'i4'), ('LINEARobjectID', '|S20')])

target_names = ['objectID', 'raLIN', 'decLIN', 'raSDSS', 'decSDSS', 'r',
                'ug', 'gr', 'ri', 'iz', 'JK', '', 'std', 'rms', 'Lchi2',
                'LP1', 'phi1', 'S', 'prior']


class LINEARdata:
    """A container class for the linear dataset.

    Because the dataset is often not needed all at once, this class offers
    tools to access just the needed components.

    Example
    -------
    >>> data = fetch_LINEAR_sample()  # doctest: +IGNORE_OUTPUT +REMOTE_DATA
    >>> lightcurve = data[data.ids[0]]  # doctest: +REMOTE_DATA
    """

    @staticmethod
    def _name_to_id(name):
        return int(name.split('.')[0])

    @staticmethod
    def _id_to_name(id):
        return str(id) + '.dat'

    def __init__(self, data_file, targetlist_file):
        self.targets = np.recfromtxt(targetlist_file)
        self.targets.dtype.names = target_names
        self.dataF = tarfile.open(data_file)
        self.ids = np.array(list(map(self._name_to_id,
                                     self.dataF.getnames())))

        # rearrange targets so lists are in the same order
        self.targets = self.targets[self.targets['objectID'].argsort()]
        ind = self.targets['objectID'].searchsorted(self.ids)
        self.targets = self.targets[ind]

    def get_light_curve(self, id):
        """Get a light curve with the given id.

        Parameters
        ----------
        id : integer
            LINEAR id of the desired object

        Returns
        -------
        lightcurve : ndarray
            a size (n_observations, 3) light-curve.
            columns are [MJD, flux, flux_err]
        """
        return self[id]

    def get_target_parameter(self, id, param):
        """Get a target parameter associated with the given id.
        Parameters
        ----------
        id : integer
            LINEAR id of the desired object
        param : string
            parameter name of the desired object (see below)

        Returns
        -------
        val : scalar
            value of the requested target parameter

        Notes
        -----
        Target parameters are one of the following:
        ['objectID', 'raLIN', 'decLIN', 'raSDSS', 'decSDSS', 'r', 'ug',
         'gr', 'ri', 'iz', 'JK', '', 'std', 'rms', 'Lchi2', 'LP1', 'phi1',
         'S', 'prior']
        """
        i = np.where(self.targets['objectID'] == id)[0]

        try:
            val = self.targets[param][i[0]]
        except BaseException:
            raise KeyError(id)

        return val

    def __getitem__(self, id):
        try:
            lc = np.loadtxt(self.dataF.extractfile(self._id_to_name(id)))
        except BaseException:
            raise KeyError(id)

        return lc


def fetch_LINEAR_sample(data_home=None, download_if_missing=True):
    """Loader for LINEAR data sample

    Parameters
    ----------
    data_home : optional, default=None
        Specify another download and cache folder for the datasets.
        By default all astroML data is stored in '~/astroML_data'.

    download_if_missing : optional, default=True
        If False, raise an IOError if the data is not locally available
        instead of trying to download the data from the source site.

    Returns
    -------
    data : LINEARdata object
        A custom object which provides access to 7010 selected LINEAR
        light curves.
    """
    data_home = get_data_home(data_home)

    targetlist_file = os.path.join(data_home,
                                   os.path.basename(TARGETLIST_URL))
    data_file = os.path.join(data_home, os.path.basename(DATA_URL))

    if not os.path.exists(targetlist_file):
        if not download_if_missing:
            raise IOError('data not present on disk. '
                          'set download_if_missing=True to download')
        targets = download_with_progress_bar(TARGETLIST_URL)
        open(targetlist_file, 'wb').write(targets)

    if not os.path.exists(data_file):
        if not download_if_missing:
            raise IOError('data not present on disk. '
                          'set download_if_missing=True to download')
        databuffer = download_with_progress_bar(DATA_URL)
        open(data_file, 'wb').write(databuffer)

    return LINEARdata(data_file, targetlist_file)


def fetch_LINEAR_geneva(data_home=None, download_if_missing=True):
    """Loader for LINEAR geneva data.

    This supplements the LINEAR data above with well-determined periods
    and other light curve characteristics.

    Parameters
    ----------
    data_home : optional, default=None
        Specify another download and cache folder for the datasets.
        By default all astroML data is stored in '~/astroML_data'.

    download_if_missing : optional, default=True
        If False, raise an IOError if the data is not locally available
        instead of trying to download the data from the source site.

    Returns
    -------
    data : record array
        data on 7000+ LINEAR stars from the Geneva catalog
    """
    data_home = get_data_home(data_home)

    archive_file = os.path.join(data_home, GENEVA_ARCHIVE)

    if not os.path.exists(archive_file):
        if not download_if_missing:
            raise IOError('data not present on disk. '
                          'set download_if_missing=True to download')

        data = Table.read(GENEVA_URL, format='ascii', header_start=19)
        data = data.as_array()
        np.save(archive_file, data)
    else:
        data = np.load(archive_file)

    return data
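
A minimal usage sketch for the loader above (this downloads the sample on
first use; the parameter name 'LP1' comes from the target list documented
above)::

    from astroML.datasets import fetch_LINEAR_sample

    data = fetch_LINEAR_sample()
    first_id = data.ids[0]

    lc = data.get_light_curve(first_id)   # columns: MJD, flux, flux_err
    log_period = data.get_target_parameter(first_id, 'LP1')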
---- astroML-1.0.2/astroML/datasets/__init__.py ----

"""
Astronomy Datasets
------------------
"""

from .tools import get_data_home
from .sdss_S82standards import fetch_sdss_S82standards
from .dr7_quasar import fetch_dr7_quasar
from .moving_objects import fetch_moving_objects
from .sdss_galaxy_colors import fetch_sdss_galaxy_colors
from .sdss_spectrum import fetch_sdss_spectrum
from .sdss_corrected_spectra import fetch_sdss_corrected_spectra
from .nasa_atlas import fetch_nasa_atlas
from .sdss_sspp import fetch_sdss_sspp
from .sdss_specgals import fetch_sdss_specgals, fetch_great_wall
from .imaging_sample import fetch_imaging_sample
from .wmap_temperatures import fetch_wmap_temperatures
from .rrlyrae_mags import fetch_rrlyrae_mags, fetch_rrlyrae_combined
from .LINEAR_sample import fetch_LINEAR_sample, fetch_LINEAR_geneva
from .LIGO_bigdog import fetch_LIGO_bigdog, fetch_LIGO_large
from .generated import generate_mu_z
from .hogg2010test import fetch_hogg2010test
from .rrlyrae_templates import fetch_rrlyrae_templates
from .sdss_filters import fetch_sdss_filter, fetch_vega_spectrum
from .kelly2007test import simulation_kelly
from .sdss_galaxy_images import fetch_sdss_galaxy_images

---- astroML-1.0.2/astroML/datasets/dr7_quasar.py ----

"""
SDSS DR7 Quasar Dataset Loader.

This implements a loader for the DR7 quasar dataset, located at
http://www.sdss.org/dr7/products/value_added/qsocat_dr7.html
"""

import os
from gzip import GzipFile
from io import BytesIO

import numpy as np

from .tools import download_with_progress_bar
from . import get_data_home

DATA_URL = 'http://das.sdss.org/va/qsocat/dr7qso.dat.gz'
ARCHIVE_FILE = 'dr7_quasar.npy'

# column numbers for extraction
DR7_DTYPE = [('sdssID', 'a14'),
             ('RA', 'f8'),
             ('dec', 'f8'),
             ('redshift', 'f4'),
             ('mag_u', 'f4'),
             ('err_u', 'f4'),
             ('mag_g', 'f4'),
             ('err_g', 'f4'),
             ('mag_r', 'f4'),
             ('err_r', 'f4'),
             ('mag_i', 'f4'),
             ('err_i', 'f4'),
             ('mag_z', 'f4'),
             ('err_z', 'f4'),
             ('mag_J', 'f4'),
             ('err_J', 'f4'),
             ('mag_H', 'f4'),
             ('err_H', 'f4'),
             ('mag_K', 'f4'),
             ('err_K', 'f4'),
             ('specobjid', 'i8')]

COLUMN_NUMBERS = [0, 1, 2, 3,
                  4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
                  22, 23, 24, 25, 26, 27,
                  72]

# length of header information
SKIP_ROWS = 80


def fetch_dr7_quasar(data_home=None, download_if_missing=True):
    """Loader for SDSS DR7 quasar catalog

    Parameters
    ----------
    data_home : optional, default=None
        Specify another download and cache folder for the datasets.
        By default all astroML data is stored in '~/astroML_data'.

    download_if_missing : optional, default=True
        If False, raise an IOError if the data is not locally available
        instead of trying to download the data from the source site.
Returns ------- data : ndarray, shape = (105783,) numpy record array containing the quasar catalog Examples -------- >>> from astroML.datasets import fetch_dr7_quasar >>> data = fetch_dr7_quasar() # doctest: +IGNORE_OUTPUT +REMOTE_DATA >>> u_g = data['mag_u'] - data['mag_g'] # doctest: +REMOTE_DATA >>> u_g[:3] # first three u-g colors # doctest: +REMOTE_DATA array([-0.07699966, 0.03600121, 0.10900116], dtype=float32) Notes ----- Not all available data is extracted and saved. The extracted columns are: sdssID, RA, DEC, redshift, mag_u, err_u, mag_g, err_g, mag_r, err_r, mag_i, err_i, mag_z, err_z, mag_J, err_J, mag_H, err_H, mag_K, err_K, specobjid many of the objects are missing 2mass photometry. More information at http://www.sdss.org/dr7/products/value_added/qsocat_dr7.html """ data_home = get_data_home(data_home) archive_file = os.path.join(data_home, ARCHIVE_FILE) if not os.path.exists(archive_file): if not download_if_missing: raise IOError('data not present on disk. ' 'set download_if_missing=True to download') print("downloading DR7 quasar dataset from %s to %s" % (DATA_URL, data_home)) zipped_buf = download_with_progress_bar(DATA_URL, return_buffer=True) gzf = GzipFile(fileobj=zipped_buf, mode='rb') extracted_buf = BytesIO(gzf.read()) data = np.loadtxt(extracted_buf, skiprows=SKIP_ROWS, usecols=COLUMN_NUMBERS, dtype=DR7_DTYPE) np.save(archive_file, data) else: data = np.load(archive_file) return data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224989.0 astroML-1.0.2/astroML/datasets/generated.py0000644000076700000240000000313500000000000020135 0ustar00bsipoczstaffimport numpy as np from astropy.cosmology import FlatLambdaCDM from ..density_estimation import FunctionDistribution from sklearn.utils import check_random_state def redshift_distribution(z, z0): return (z / z0) ** 2 * np.exp(-1.5 * (z / z0)) def generate_mu_z(size=1000, z0=0.3, dmu_0=0.1, dmu_1=0.02, random_state=None, cosmo=None): """Generate a dataset of distance modulus vs redshift. Parameters ---------- size : int or tuple size of generated data z0 : float parameter in redshift distribution: p(z) ~ (z / z0)^2 exp[-1.5 (z / z0)] dmu_0, dmu_1 : float specify the error in mu, dmu = dmu_0 + dmu_1 * mu random_state : None, int, or np.random.RandomState instance random seed or random number generator cosmo : astropy.cosmology instance specifying cosmology to use when generating the sample. If not provided, a Flat Lambda CDM model with H0=71, Om0=0.27, Tcmb=0 is used. 
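    For example (a sketch; the exact values drawn depend on the random
    state):

    >>> from astroML.datasets import generate_mu_z
    >>> z, mu, dmu = generate_mu_z(size=100, random_state=0)  # doctest: +SKIP
    >>> z.shape, mu.shape, dmu.shape                          # doctest: +SKIP
    ((100,), (100,), (100,))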
Returns ------- z, mu, dmu : ndarrays arrays of shape ``size`` """ if cosmo is None: cosmo = FlatLambdaCDM(H0=71, Om0=0.27, Tcmb0=0) random_state = check_random_state(random_state) zdist = FunctionDistribution(redshift_distribution, func_args=dict(z0=z0), xmin=0.1 * z0, xmax=10 * z0, random_state=random_state) z_sample = zdist.rvs(size) mu_sample = cosmo.distmod(z_sample).value dmu = dmu_0 + dmu_1 * mu_sample mu_sample = random_state.normal(mu_sample, dmu) return z_sample, mu_sample, dmu ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090039.0 astroML-1.0.2/astroML/datasets/hogg2010test.py0000644000076700000240000000322400000000000020325 0ustar00bsipoczstaff""" Data from Hogg et al 2010; useful for testing robust regression methods """ import numpy as np def fetch_hogg2010test(structured=False): """Fetch the Hogg et al 2010 test data """ data = np.array([[1, 201, 592, 61, 9, -0.84], [2, 244, 401, 25, 4, 0.31], [3, 47, 583, 38, 11, 0.64], [4, 287, 402, 15, 7, -0.27], [5, 203, 495, 21, 5, -0.33], [6, 58, 173, 15, 9, 0.67], [7, 210, 479, 27, 4, -0.02], [8, 202, 504, 14, 4, -0.05], [9, 198, 510, 30, 11, -0.84], [10, 158, 416, 16, 7, -0.69], [11, 165, 393, 14, 5, 0.30], [12, 201, 442, 25, 5, -0.46], [13, 157, 317, 52, 5, -0.03], [14, 131, 311, 16, 6, 0.50], [15, 166, 400, 34, 6, 0.73], [16, 160, 337, 31, 5, -0.52], [17, 186, 423, 42, 9, 0.90], [18, 125, 334, 26, 8, 0.40], [19, 218, 533, 16, 6, -0.78], [20, 146, 344, 22, 5, -0.56]]) dtype = [("ID", np.int32), ("x", np.float64), ("y", np.float64), ("sigma_x", np.float64), ("sigma_y", np.float64), ("rho_xy", np.float64)] recarray = np.empty(data.shape[0], dtype=dtype) recarray['ID'] = data[:, 0] recarray['x'] = data[:, 1] recarray['y'] = data[:, 2] recarray['sigma_x'] = data[:, 4] recarray['sigma_y'] = data[:, 3] recarray['rho_xy'] = data[:, 5] return recarray ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/datasets/imaging_sample.py0000644000076700000240000000765000000000000021161 0ustar00bsipoczstaffimport os import numpy as np from astropy.table import Table from . import get_data_home DATA_URL = ("https://github.com/astroML/astroML-data/raw/main/datasets/" "sgSDSSimagingSample.fit.gz") def fetch_imaging_sample(data_home=None, download_if_missing=True): """Loader for SDSS Imaging sample data Parameters ---------- data_home : optional, default=None Specify another download and cache folder for the datasets. By default all astroML data is stored in '~/astroML_data'. download_if_missing : optional, default=True If False, raise a IOError if the data is not locally available instead of trying to download the data from the source site. 
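    (A quick check of ``fetch_hogg2010test`` above -- the expected values
    are taken directly from the data table embedded in that function:)

    >>> from astroML.datasets import fetch_hogg2010test
    >>> data = fetch_hogg2010test()   # doctest: +SKIP
    >>> print(data['x'][:3])          # doctest: +SKIP
    [201. 244.  47.]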
Returns ------- data : recarray, shape = (330753,) record array containing imaging data Examples -------- >>> from astroML.datasets import fetch_imaging_sample >>> data = fetch_imaging_sample() # doctest: +IGNORE_OUTPUT +REMOTE_DATA >>> # number of objects in dataset >>> data.shape # doctest: +REMOTE_DATA (330753,) >>> # names of the first five columns >>> print(data.dtype.names[:5]) # doctest: +REMOTE_DATA ('ra', 'dec', 'run', 'rExtSFD', 'uRaw') >>> print(data['ra'][:2]) # doctest: +REMOTE_DATA [0.358174 0.358382] >>> print(data['dec'][:2]) # doctest: +REMOTE_DATA [-0.508718 -0.551157] Notes ----- This data was selected from the SDSS database using the following SQL query:: SELECT round(p.ra,6) as ra, round(p.dec,6) as dec, p.run, --- comments are preceded by --- round(p.extinction_r,3) as rExtSFD, --- r band extinction from SFD round(p.modelMag_u,3) as uRaw, --- ISM-uncorrected model mags round(p.modelMag_g,3) as gRaw, --- rounding up model magnitudes round(p.modelMag_r,3) as rRaw, round(p.modelMag_i,3) as iRaw, round(p.modelMag_z,3) as zRaw, round(p.modelMagErr_u,3) as uErr, --- errors are important! round(p.modelMagErr_g,3) as gErr, round(p.modelMagErr_r,3) as rErr, round(p.modelMagErr_i,3) as iErr, round(p.modelMagErr_z,3) as zErr, round(p.psfMag_u,3) as psfRaw, --- psf magnitudes round(p.psfMag_g,3) as psfRaw, round(p.psfMag_r,3) as psfRaw, round(p.psfMag_i,3) as psfRaw, round(p.psfMag_z,3) as psfRaw, round(p.psfMagErr_u,3) as psfuErr, round(p.psfMagErr_g,3) as psfgErr, round(p.psfMagErr_r,3) as psfrErr, round(p.psfMagErr_i,3) as psfiErr, round(p.psfMagErr_z,3) as psfzErr, p.type, --- tells if a source is resolved or not (case when (p.flags & '16') = 0 then 1 else 0 end) as ISOLATED INTO mydb.SDSSimagingSample FROM PhotoTag p WHERE --- 10x2 sq.deg. p.ra > 0.0 and p.ra < 10.0 and p.dec > -1 and p.dec < 1 --- resolved and unresolved sources and (p.type = 3 OR p.type = 6) and --- '4295229440' is magic code for no --- DEBLENDED_AS_MOVING or SATURATED objects (p.flags & '4295229440') = 0 and --- PRIMARY objects only, which implies --- !BRIGHT && (!BLENDED || NODEBLEND || nchild == 0)] p.mode = 1 and --- adopted faint limit (same as about SDSS limit) p.modelMag_r < 22.5 --- the end of query """ data_home = get_data_home(data_home) archive_file = os.path.join(data_home, os.path.basename(DATA_URL)) if not os.path.exists(archive_file): if not download_if_missing: raise IOError('data not present on disk. 
' 'set download_if_missing=True to download')

        data = Table.read(DATA_URL)
        data.write(archive_file)
    else:
        data = Table.read(archive_file)

    return np.asarray(data)

astroML-1.0.2/astroML/datasets/kelly2007test.py

"""
Generate input data based on Section 7 in Kelly 2007
"""

import numpy as np
from scipy.stats.distributions import rv_continuous

__all__ = ['simulation_kelly']


class simulation_dist(rv_continuous):
    def _pdf(self, x):
        # eq 110
        return 0.796248 * np.exp(x) / (1 + np.exp(2.75 * x))


def simulation_kelly(size=50, low=-10, high=10, alpha=1, beta=0.5,
                     epsilon=(0, 0.75), scalex=1, scaley=1, multidim=1,
                     ksi=None, eta=None):
    """Data simulator from Kelly 2007

    Parameters
    ==========
    size : int
        Number of datapoints to be generated
    alpha : float
        Regression coefficient defined in eq 1
    beta : float
        Regression coefficient defined in eq 1
    epsilon : tuple of floats
        Mean and standard deviation of normally distributed intrinsic scatter
    scalex : float
        Scale parameter for x measurement errors
    scaley : float
        Scale parameter for y measurement errors
    multidim : int
        Dimension of multivariate data
    ksi : array
        If both ``ksi`` and ``eta`` are not None, use them to generate the
        measured ``xi`` and ``yi`` values and their measurement errors
        using the scale parameters.
    eta : array
        If both ``ksi`` and ``eta`` are not None, use them to generate the
        measured ``xi`` and ``yi`` values and their measurement errors
        using the scale parameters.
    low : float
        Lower bound of the support of the distribution (see eq. 110 in
        Kelly 2007).
    high : float
        Upper bound of the support of the distribution (see eq. 110 in
        Kelly 2007).

    Returns
    =======
    ksi : array
        Latent (error-free) independent variable
    eta : array
        Latent (error-free) dependent variable
    xi : array
        Measured values of ``ksi``
    yi : array
        Measured values of ``eta``
    xi_error : array
        Measurement errors on ``xi``
    yi_error : array
        Measurement errors on ``yi``
    alpha_in : float
        Input regression coefficient ``alpha``
    beta_in : array
        Input regression coefficient ``beta``
    """
    eps = np.random.normal(epsilon[0], scale=epsilon[1], size=size)
    beta = np.atleast_1d(beta)

    if ksi is None and eta is None:
        dist = simulation_dist(a=low, b=high)
        ksi = dist.rvs(size=(multidim, size))
        # eq 1
        eta = alpha + np.dot(beta, ksi) + eps

    tau = np.var(ksi)
    t = scalex * tau
    s = scaley * epsilon[1]

    # measurement errors from scaled inverse chi2 with df=5
    sigma_x = 5 * t / np.random.chisquare(df=5, size=(multidim, size))
    sigma_y = 5 * s / np.random.chisquare(df=5, size=size)

    x = np.random.normal(ksi, sigma_x)
    y = np.random.normal(eta, sigma_y)

    return ksi, eta, x, y, sigma_x, sigma_y, alpha, beta

astroML-1.0.2/astroML/datasets/moving_objects.py

import os
from gzip import GzipFile
from io import BytesIO

import numpy as np

from .tools import download_with_progress_bar
from .
import get_data_home DATA_URL = ('https://github.com/astroML/astroML-data/raw/main/datasets/' 'ADR3.dat.gz') ARCHIVE_FILE = 'moving_objects.npy' ADR4_dtype = [('moID', 'a6'), ('sdss_run', 'i4'), ('sdss_col', 'i4'), ('sdss_field', 'i4'), ('sdss_obj', 'i4'), ('rowc', 'f4'), ('colc', 'f4'), ('mjd', 'f8'), ('ra', 'f8'), ('dec', 'f8'), ('lambda', 'f8'), ('beta', 'f8'), ('phi', 'f8'), ('vmu', 'f4'), ('vmu_err', 'f4'), ('vnu', 'f4'), ('vnu_err', 'f4'), ('vlambda', 'f4'), ('vbeta', 'f4'), ('mag_u', 'f4'), ('err_u', 'f4'), ('mag_g', 'f4'), ('err_g', 'f4'), ('mag_r', 'f4'), ('err_r', 'f4'), ('mag_i', 'f4'), ('err_i', 'f4'), ('mag_z', 'f4'), ('err_z', 'f4'), ('mag_a', 'f4'), ('err_a', 'f4'), ('mag_V', 'f4'), ('mag_B', 'f4'), ('ast_flag', 'i4'), ('ast_num', 'i8'), ('ast_designation', 'a17'), ('ast_det_count', 'i4'), ('ast_det_total', 'i4'), ('ast_flags', 'i8'), ('ra_comp', 'f8'), ('dec_comp', 'f8'), ('mag_comp', 'f4'), ('r_helio', 'f4'), ('r_geo', 'f4'), ('phase', 'f4'), ('cat_id', 'a15'), ('H', 'f4'), ('G', 'f4'), ('Arc', 'f4'), ('Epoch', 'f8'), ('a', 'f8'), ('e', 'f8'), ('i', 'f8'), ('asc_node', 'f8'), ('arg_peri', 'f8'), ('M', 'f8'), ('PEcat_id', 'a17'), ('aprime', 'f8'), ('eprime', 'f8'), ('sin_iprime', 'f8')] def fetch_moving_objects(data_home=None, download_if_missing=True, Parker2008_cuts=False): """Loader for SDSS moving objects datasets Parameters ---------- data_home : optional, default=None Specify another download and cache folder for the datasets. By default all astroML data is stored in '~/astroML_data'. download_if_missing : optional, default=True If False, raise a IOError if the data is not locally available instead of trying to download the data from the source site. Parker2008_cuts : bool (optional) If true, apply cuts on magnitudes and orbital parameters used in Parker et al. 2008 Returns ------- data : recarray, shape = (??,) record array containing 60 values for each item Notes ----- See http://www.astro.washington.edu/users/ivezic/sdssmoc/sdssmoc3.html Columns 0, 35, 45, and 56 are left out of the fetch: they are string parameters. Only columns with known orbital parameters are saved. Examples -------- >>> from astroML.datasets import fetch_moving_objects >>> data = fetch_moving_objects() # doctest: +IGNORE_OUTPUT +REMOTE_DATA >>> # number of objects >>> print(len(data)) # doctest: +REMOTE_DATA 43424 >>> # first five u-g colors of the dataset >>> u_g = data['mag_u'] - data['mag_g'] # doctest: +REMOTE_DATA >>> print(u_g[:5]) # doctest: +REMOTE_DATA [1.4899998 1.7800007 1.6500015 2.0100002 1.8199997] """ data_home = get_data_home(data_home) archive_file = os.path.join(data_home, ARCHIVE_FILE) if not os.path.exists(archive_file): if not download_if_missing: raise IOError('data not present on disk. 
' 'set download_if_missing=True to download') print("downloading moving object catalog from %s to %s" % (DATA_URL, data_home)) zipped_buf = download_with_progress_bar(DATA_URL, return_buffer=True) gzf = GzipFile(fileobj=zipped_buf, mode='rb') print("uncompressing file...") extracted_buf = BytesIO(gzf.read()) data = np.loadtxt(extracted_buf, dtype=ADR4_dtype) # Select unique sources with known orbital elements flag = (data['ast_flag'] == 1) & (data['ast_det_count'] == 1) data = data[flag] np.save(archive_file, data) else: data = np.load(archive_file) if Parker2008_cuts: i_z = data['mag_i'] - data['mag_z'] flag = ((data['aprime'] >= 0.01) & (data['aprime'] <= 100) & (data['mag_a'] <= 0.4) & (data['mag_a'] >= -0.3) & (i_z <= 0.6) & (i_z >= -0.8)) data = data[flag] return data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/datasets/nasa_atlas.py0000644000076700000240000000401600000000000020304 0ustar00bsipoczstaff""" NASA Sloan Atlas dataset size reduction --------------------------------------- The NASA Sloan Atlas dataset is contained in a ~0.5GB available at http://www.nsatlas.org/data This function fetches a ~50MB subset of that data. This subset is created using the code that can be found at examples/datasets/truncate_nsa_data.py """ import os import numpy as np from .tools import download_with_progress_bar from . import get_data_home DATA_URL = ('https://github.com/astroML/astroML-data/raw/main/datasets/' 'nsa_v0_1_2_reduced.npy') ARCHIVE_FILE = os.path.basename(DATA_URL) def fetch_nasa_atlas(data_home=None, download_if_missing=True): """Loader for NASA galaxy atlas data Parameters ---------- data_home : optional, default=None Specify another download and cache folder for the datasets. By default all astroML data is stored in '~/astroML_data'. download_if_missing : optional, default=True If False, raise a IOError if the data is not locally available instead of trying to download the data from the source site. Returns ------- data : ndarray The data, in the form of a numpy record array. Notes ----- This is the file created by the example script at examples/datasets/truncate_nsa_data.py For an explanation of the meaning of the fields, see the description at http://www.nsatlas.org/data """ data_home = get_data_home(data_home) archive_file = os.path.join(data_home, ARCHIVE_FILE) if not os.path.exists(archive_file): if not download_if_missing: raise IOError('data not present on disk. ' 'set download_if_missing=True to download') print("downloading NASA atlas data from %s to %s" % (DATA_URL, data_home)) buf = download_with_progress_bar(DATA_URL, return_buffer=True) data = np.load(buf) np.save(archive_file, data) else: data = np.load(archive_file) return data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/datasets/rrlyrae_mags.py0000644000076700000240000001007600000000000020670 0ustar00bsipoczstaffimport os import numpy as np from . import get_data_home from . import fetch_sdss_S82standards from .tools import download_with_progress_bar DATA_URL = ("https://github.com/astroML/astroML-data/raw/main/datasets/" "RRLyrae.fit") def fetch_rrlyrae_mags(data_home=None, download_if_missing=True): """Loader for RR-Lyrae data Parameters ---------- data_home : optional, default=None Specify another download and cache folder for the datasets. By default all astroML data is stored in '~/astroML_data'. 
download_if_missing : optional, default=True If False, raise a IOError if the data is not locally available instead of trying to download the data from the source site. Returns ------- data : recarray, shape = (483,) record array containing imaging data Examples -------- >>> from astroML.datasets import fetch_rrlyrae_mags >>> data = fetch_rrlyrae_mags() # doctest: +IGNORE_OUTPUT +REMOTE_DATA >>> # number of objects in dataset >>> data.shape # doctest: +REMOTE_DATA (483,) Notes ----- This data is from table 1 of Sesar et al 2010 ApJ 708:717 """ # fits is an optional dependency: don't import globally from astropy.io import fits data_home = get_data_home(data_home) archive_file = os.path.join(data_home, os.path.basename(DATA_URL)) if not os.path.exists(archive_file): if not download_if_missing: raise IOError('data not present on disk. ' 'set download_if_missing=True to download') fitsdata = download_with_progress_bar(DATA_URL) open(archive_file, 'wb').write(fitsdata) hdulist = fits.open(archive_file) return np.asarray(hdulist[1].data) def fetch_rrlyrae_combined(data_home=None, download_if_missing=True): """Loader for RR-Lyrae combined data This returns the combined RR-Lyrae colors and SDSS standards colors. The RR-Lyrae sample is confirmed through time-domain observations; this result in a nice dataset for testing classification routines. Parameters ---------- data_home : optional, default=None Specify another download and cache folder for the datasets. By default all astroML data is stored in '~/astroML_data'. download_if_missing : optional, default=True If False, raise a IOError if the data is not locally available instead of trying to download the data from the source site. Returns ------- X : ndarray a shape (n_samples, 4) array. Columns are u-g, g-r, r-i, i-z y : ndarray a shape (n_samples,) array of labels. 1 indicates an RR Lyrae, 0 indicates a background star. """ # ---------------------------------------------------------------------- # Load data kwds = dict(data_home=data_home, download_if_missing=download_if_missing) rrlyrae = fetch_rrlyrae_mags(**kwds) standards = fetch_sdss_S82standards(**kwds) # ------------------------------------------------------------ # perform color cuts on standard stars # these come from eqns 1-4 of Sesar et al 2010, ApJ 708:717 u_g = standards['mmu_u'] - standards['mmu_g'] g_r = standards['mmu_g'] - standards['mmu_r'] r_i = standards['mmu_r'] - standards['mmu_i'] i_z = standards['mmu_i'] - standards['mmu_z'] standards = standards[(u_g > 0.7) & (u_g < 1.35) & (g_r > -0.15) & (g_r < 0.4) & (r_i > -0.15) & (r_i < 0.22) & (i_z > -0.21) & (i_z < 0.25)] # ---------------------------------------------------------------------- # get magnitudes and colors; split into train and test sets mags_rr = np.vstack([rrlyrae[f + 'mag'] for f in 'ugriz']) colors_rr = mags_rr[:-1] - mags_rr[1:] mags_st = np.vstack([standards['mmu_' + f] for f in 'ugriz']) colors_st = mags_st[:-1] - mags_st[1:] # stack the two sets of colors together X = np.vstack((colors_st.T, colors_rr.T)) y = np.zeros(X.shape[0]) y[-colors_rr.shape[1]:] = 1 return X, y ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/datasets/rrlyrae_templates.py0000644000076700000240000000272600000000000021742 0ustar00bsipoczstaffimport os import tarfile import numpy as np from . 
import get_data_home from .tools import download_with_progress_bar DATA_URL = ("https://github.com/astroML/astroML-data/raw/main/datasets/" "RRLyr_ugriz_templates.tar.gz") def fetch_rrlyrae_templates(data_home=None, download_if_missing=True): """Loader for RR-Lyrae template data These are the light-curve templates from Sesar et al 2010, ApJ 708:717 Parameters ---------- data_home : optional, default=None Specify another download and cache folder for the datasets. By default all astroML data is stored in '~/astroML_data'. download_if_missing : optional, default=True If False, raise a IOError if the data is not locally available instead of trying to download the data from the source site. Returns ------- data : numpy record array record array containing the templates """ data_home = get_data_home(data_home) data_file = os.path.join(data_home, os.path.basename(DATA_URL)) if not os.path.exists(data_file): if not download_if_missing: raise IOError('data not present on disk. ' 'set download_if_missing=True to download') databuffer = download_with_progress_bar(DATA_URL) open(data_file, 'wb').write(databuffer) data = tarfile.open(data_file) return {name.strip('.dat'): np.loadtxt(data.extractfile(name)) for name in data.getnames()} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/datasets/sdss_S82standards.py0000644000076700000240000001162200000000000021513 0ustar00bsipoczstaffimport os from gzip import GzipFile from io import BytesIO import numpy as np from .tools import download_with_progress_bar from . import get_data_home DATA_URL = ('https://github.com/astroML/astroML-data/raw/main/datasets/' 'stripe82calibStars_v2.6.dat.gz') DATA_URL_2MASS = ('https://github.com/astroML/astroML-data/raw/main/datasets/' 'stripe82calibStars_2MASS_v2.6.dat.gz') ARCHIVE_FILE = 'sdss_S82standards.npy' ARCHIVE_FILE_2MASS = 'sdss_S82standards_2mass.npy' DTYPE = [('RA', 'f8'), ('DEC', 'f8'), ('RArms', 'f4'), ('DECrms', 'f4'), ('Ntot', 'i4'), ('A_r', 'f4')] for band in 'ugriz': DTYPE += [('Nobs_%s' % band, 'i4')] DTYPE += map(lambda s: (s + '_' + band, 'f4'), ['mmed', 'mmu', 'msig', 'mrms', 'mchi2']) DTYPE_2MASS = DTYPE + [('ra2MASS', 'f4'), ('dec2MASS', 'f4'), ('J', 'f4'), ('Jerr', 'f4'), ('H', 'f4'), ('Herr', 'f4'), ('K', 'f4'), ('Kerr', 'f4'), ('theta', 'f4')] # first column is 'CALIBSTARS'. We'll ignore this. COLUMNS = range(1, len(DTYPE) + 1) def fetch_sdss_S82standards(data_home=None, download_if_missing=True, crossmatch_2mass=False): """Loader for SDSS stripe82 standard star catalog Parameters ---------- data_home : optional, default=None Specify another download and cache folder for the datasets. By default all astroML data is stored in '~/astroML_data'. download_if_missing : bool, optional, default=True If False, raise a IOError if the data is not locally available instead of trying to download the data from the source site. crossmatch_2mass: bool, optional, default=False If True, return the standard star catalog cross-matched with 2mass magnitudes Returns ------- data : ndarray, shape = (313859,) record array containing sdss standard stars (see notes below) Notes ----- Information on the data can be found at http://www.astro.washington.edu/users/ivezic/sdss/catalogs/stripe82.html Data is described in Ivezic et al. 2007 (Astronomical Journal, 134, 973). 
    Columns are as follows:

    RA        right ascension of source (degrees)
    DEC       declination of source (degrees)
    RArms     rms of right ascension (arcsec)
    DECrms    rms of declination (arcsec)
    Ntot      total number of epochs
    A_r       SFD ISM extinction (mags)

    for each band in (u g r i z):

    Nobs_     number of observations in this band
    mmed_     median magnitude in this band
    mmu_      mean magnitude in this band
    msig_     standard error on the mean (1.25 times larger for median)
    mrms_     root-mean-square scatter
    mchi2_    chi2 per degree of freedom for mean magnitude

    For 2-MASS, the following columns are added:

    ra2MASS   2MASS right ascension
    dec2MASS  2MASS declination
    J         J-band magnitude
    Jerr      J-band error
    H         H-band magnitude
    Herr      H-band error
    K         K-band magnitude
    Kerr      K-band error
    theta     difference between SDSS and 2MASS position (arcsec)

    Examples
    --------
    >>> data = fetch_sdss_S82standards()  # doctest: +IGNORE_OUTPUT +REMOTE_DATA
    >>> u_g = data['mmed_u'] - data['mmed_g']  # doctest: +REMOTE_DATA
    >>> print(u_g[:4])  # doctest: +REMOTE_DATA
    [-22.23500061   1.34900093   1.43799973   2.08200073]

    References
    ----------
    Ivezic et al. AJ 134:973 (2007)
    """
    data_home = get_data_home(data_home)

    if crossmatch_2mass:
        archive_file = os.path.join(data_home, ARCHIVE_FILE_2MASS)
        data_url = DATA_URL_2MASS
        kwargs = dict(dtype=DTYPE_2MASS)
    else:
        archive_file = os.path.join(data_home, ARCHIVE_FILE)
        data_url = DATA_URL
        kwargs = dict(usecols=COLUMNS, dtype=DTYPE)

    if not os.path.exists(archive_file):
        if not download_if_missing:
            raise IOError('data not present on disk. '
                          'set download_if_missing=True to download')

        print("downloading cross-matched SDSS/2MASS dataset from %s to %s"
              % (data_url, data_home))

        zipped_buf = download_with_progress_bar(data_url, return_buffer=True)
        gzf = GzipFile(fileobj=zipped_buf, mode='rb')
        print("uncompressing file...")
        extracted_buf = BytesIO(gzf.read())
        data = np.loadtxt(extracted_buf, **kwargs)
        np.save(archive_file, data)
    else:
        data = np.load(archive_file)

    return data

astroML-1.0.2/astroML/datasets/sdss_corrected_spectra.py

import os

import numpy as np

from . import get_data_home
from .tools import download_with_progress_bar

DATA_URL = ("https://github.com/astroML/astroML-data/raw/main/datasets/"
            "spec4000.npz")


def reconstruct_spectra(data):
    """Compute the reconstructed spectra.

    Parameters
    ----------
    data: NpzFile
        numpy data object returned by fetch_sdss_corrected_spectra.

    Returns
    -------
    spec_recons: ndarray
        Reconstructed spectra, using principal components to interpolate
        across the masked region.
    """
    spectra = data['spectra']
    coeffs = data['coeffs']
    evecs = data['evecs']
    mask = data['mask']
    mu = data['mu']
    norms = data['norms']

    spec_recons = spectra.copy()
    nev = coeffs.shape[1]

    spec_fill = mu + np.dot(coeffs, evecs[:nev])
    spec_fill *= norms[:, np.newaxis]

    spec_recons[mask] = spec_fill[mask]

    return spec_recons


def compute_wavelengths(data):
    """Compute the wavelength associated with spectra.

    Parameters
    ----------
    data: NpzFile
        numpy data object returned by fetch_sdss_corrected_spectra.

    Returns
    -------
    wavelength: ndarray
        One-dimensional wavelength array for spectra.
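    The grid follows the SDSS log-lambda convention; equivalently
    (a sketch, mirroring the implementation below):

    >>> data = fetch_sdss_corrected_spectra()               # doctest: +SKIP
    >>> lam = 10 ** (data['coeff0'] + data['coeff1']
    ...              * np.arange(data['spectra'].shape[1]))  # doctest: +SKIP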
""" return 10 ** (data['coeff0'] + data['coeff1'] * np.arange(data['spectra'].shape[1])) def fetch_sdss_corrected_spectra(data_home=None, download_if_missing=True): """Loader for Iterative PCA pre-processed galaxy spectra Parameters ---------- data_home : optional, default=None Specify another download and cache folder for the datasets. By default all astroML data is stored in '~/astroML_data'. download_if_missing : optional, default=True If False, raise a IOError if the data is not locally available instead of trying to download the data from the source site. Returns ------- data : NpzFile The data dictionary Notes ----- This is the file created by the example script examples/datasets/compute_sdss_pca.py """ data_home = get_data_home(data_home) data_file = os.path.join(data_home, os.path.basename(DATA_URL)) if not os.path.exists(data_file): if not download_if_missing: raise IOError('data not present on disk. ' 'set download_if_missing=True to download') print("downloading PCA-processed SDSS spectra from %s to %s" % (DATA_URL, data_home)) databuffer = download_with_progress_bar(DATA_URL) open(data_file, 'wb').write(databuffer) data = np.load(data_file) return data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/datasets/sdss_filters.py0000644000076700000240000000617500000000000020712 0ustar00bsipoczstaffimport os import numpy as np from urllib.request import urlopen from astroML.datasets import get_data_home # Info on vega spectrum: http://www.stsci.edu/hst/observatory/cdbs/calspec.html VEGA_URL = ('https://github.com/astroML/astroML-data/raw/main/datasets/' '1732526_nic_002.ascii') FILTER_URL = 'http://classic.sdss.org/dr7/instruments/imager/filters/%s.dat' def fetch_sdss_filter(fname, data_home=None, download_if_missing=True): """Loader for SDSS Filter profiles Parameters ---------- fname : str filter name: must be one of 'ugriz' data_home : optional, default=None Specify another download and cache folder for the datasets. By default all data is stored in '~/astroML_data'. download_if_missing : optional, default=True If False, raise a IOError if the data is not locally available instead of trying to download the data from the source site. Returns ------- data : ndarray data is an array of shape (5, Nlam) first row: wavelength in angstroms second row: sensitivity to point source, airmass 1.3 third row: sensitivity to extended source, airmass 1.3 fourth row: sensitivity to extended source, airmass 0.0 fifth row: assumed atmospheric extinction, airmass 1.0 """ if fname not in 'ugriz': raise ValueError("Unrecognized filter name '%s'" % fname) url = FILTER_URL % fname data_home = get_data_home(data_home) archive_file = os.path.join(data_home, '%s.dat' % fname) if not os.path.exists(archive_file): if not download_if_missing: raise IOError('data not present on disk. ' 'set download_if_missing=True to download') print("downloading from %s" % url) F = urlopen(url) open(archive_file, 'wb').write(F.read()) F = open(archive_file) return np.loadtxt(F, unpack=True) def fetch_vega_spectrum(data_home=None, download_if_missing=True): """Loader for Vega reference spectrum Parameters ---------- fname : str filter name: must be one of 'ugriz' data_home : optional, default=None Specify another download and cache folder for the datasets. By default all astroML data is stored in '~/astroML_data'. 
    download_if_missing : optional, default=True
        If False, raise an IOError if the data is not locally available
        instead of trying to download the data from the source site.

    Returns
    -------
    data : ndarray
        data[0] is the array of wavelength in angstroms
        data[1] is the array of fluxes in Jy (F_nu, not F_lambda)
    """
    data_home = get_data_home(data_home)

    archive_name = os.path.join(data_home, VEGA_URL.split('/')[-1])

    if not os.path.exists(archive_name):
        if not download_if_missing:
            raise IOError('data not present on disk. '
                          'set download_if_missing=True to download')
        print("downloading from %s" % VEGA_URL)
        F = urlopen(VEGA_URL)
        open(archive_name, 'wb').write(F.read())

    F = open(archive_name, 'r')

    return np.loadtxt(F, unpack=True)

astroML-1.0.2/astroML/datasets/sdss_galaxy_colors.py

import os

import numpy as np

from . import get_data_home
from .tools import sql_query

NOBJECTS = 50000

GAL_COLORS_NAMES = ['u', 'g', 'r', 'i', 'z',
                    'specClass', 'redshift', 'redshift_err']

ARCHIVE_FILE = 'sdss_galaxy_colors.npy'


def fetch_sdss_galaxy_colors(data_home=None, download_if_missing=True):
    """Loader for SDSS galaxy colors.

    This function directly queries the sdss SQL database at
    http://cas.sdss.org/

    Parameters
    ----------
    data_home : optional, default=None
        Specify another download and cache folder for the datasets.
        By default all astroML data is stored in '~/astroML_data'.

    download_if_missing : optional, default=True
        If False, raise an IOError if the data is not locally available
        instead of trying to download the data from the source site.

    Returns
    -------
    data : recarray, shape = (50000,)
        record array containing magnitudes and redshift for each galaxy
    """
    data_home = get_data_home(data_home)

    archive_file = os.path.join(data_home, ARCHIVE_FILE)

    query_text = ('\n'.join(
        ("SELECT TOP %i" % NOBJECTS,
         "   p.u, p.g, p.r, p.i, p.z, s.class, s.z, s.zerr",
         "FROM PhotoObj AS p",
         "   JOIN SpecObj AS s ON s.bestobjid = p.objid",
         "WHERE ",
         "   p.u BETWEEN 0 AND 19.6",
         "   AND p.g BETWEEN 0 AND 20",
         "   AND s.class <> 'UNKNOWN'",
         "   AND s.class <> 'STAR'",
         "   AND s.class <> 'SKY'",
         "   AND s.class <> 'STAR_LATE'")))

    if not os.path.exists(archive_file):
        if not download_if_missing:
            raise IOError('data not present on disk. '
                          'set download_if_missing=True to download')

        print("querying for %i objects" % NOBJECTS)
        print(query_text)
        output = sql_query(query_text)
        print("finished.")

        kwargs = {'delimiter': ',',
                  'skip_header': 2,
                  'names': GAL_COLORS_NAMES,
                  'dtype': None,
                  'encoding': 'ascii',
                  }

        data = np.genfromtxt(output, **kwargs)
        np.save(archive_file, data)
    else:
        data = np.load(archive_file)

    return data

astroML-1.0.2/astroML/datasets/sdss_galaxy_images.py

import os
from io import BytesIO
from gzip import GzipFile

import numpy as np

from astroML.datasets import get_data_home
from astroML.datasets.tools import download_with_progress_bar

IMAGES_URL = ('https://github.com/astroML/astroML-data/raw/main/datasets/'
              'sdss_images_1000.npy.gz')
LABELS_URL = ('https://github.com/astroML/astroML-data/raw/main/datasets/'
              'sdss_labels_1000.npy')


def fetch_sdss_galaxy_images(data_home=None, download_if_missing=True):
    """ Loader for SDSS galaxy images.
A sample of 1000 coloured galaxy image stamps are loaded along with labels for their morphological classification. Parameters ---------- data_home : optional, default=None Specify another download and cache folder for the datasets. By default all astroML data is stored in '~/astroML_data'. download_if_missing : optional, default=True If False, raise a IOError if the data is not locally available instead of trying to download the data from the source site. Returns ------- data : ndarray, shape = (1000, 68, 68, 3) Array containing image data for 1000 galaxies in 3 colours. labels: ndarray, shape = (1000,) Labels of morphological classification (1 for spiral, 0 for elliptical). Notes ----- The sample selection is courtesy of Marc Huertas-Company from the full dataset of Nair & Abraham 2010 ApJS 186:427. """ data_home = get_data_home(data_home) images_file = os.path.join(data_home, os.path.basename(IMAGES_URL).split('.gz')[0]) labels_file = os.path.join(data_home, os.path.basename(LABELS_URL)) if not os.path.exists(images_file) or not os.path.exists(labels_file): if not download_if_missing: raise IOError('data not present on disk. ' 'set download_if_missing=True to download') zipped_buf = download_with_progress_bar(IMAGES_URL, return_buffer=True) gzf = GzipFile(fileobj=zipped_buf, mode='rb') data = np.load(BytesIO(gzf.read())) np.save(images_file, data) labels_buffer = download_with_progress_bar(LABELS_URL, return_buffer=True) labels = np.load(labels_buffer) np.save(labels_file, labels) else: data = np.load(images_file) labels = np.load(labels_file) return data, labels ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/datasets/sdss_specgals.py0000644000076700000240000001606500000000000021042 0ustar00bsipoczstaffimport os import numpy as np from astropy.cosmology import FlatLambdaCDM from astropy.table import Table, vstack from . import get_data_home # We store the data in two parts to comply with GitHub 100Mb file size limit DATA_URL1 = ("https://github.com/astroML/astroML-data/raw/main/datasets/" "SDSSspecgalsDR8_1.fit.gz") DATA_URL2 = ("https://github.com/astroML/astroML-data/raw/main/datasets/" "SDSSspecgalsDR8_2.fit.gz") def fetch_sdss_specgals(data_home=None, download_if_missing=True): """Loader for SDSS Galaxies with spectral information Parameters ---------- data_home : optional, default=None Specify another download and cache folder for the datasets. By default all astroML data is stored in '~/astroML_data'. download_if_missing : optional, default=True If False, raise a IOError if the data is not locally available instead of trying to download the data from the source site. Returns ------- data : recarray, shape = (661598,) record array containing pipeline parameters Notes ----- These were compiled from the SDSS database using the following SQL query:: SELECT G.ra, G.dec, S.mjd, S.plate, S.fiberID, --- basic identifiers --- basic spectral data S.z, S.zErr, S.rChi2, S.velDisp, S.velDispErr, --- some useful imaging parameters G.extinction_r, G.petroMag_r, G.psfMag_r, G.psfMagErr_r, G.modelMag_u, modelMagErr_u, G.modelMag_g, modelMagErr_g, G.modelMag_r, modelMagErr_r, G.modelMag_i, modelMagErr_i, G.modelMag_z, modelMagErr_z, G.petroR50_r, G.petroR90_r, --- line fluxes for BPT diagram and other derived spec. 
parameters GSL.nii_6584_flux, GSL.nii_6584_flux_err, GSL.h_alpha_flux, GSL.h_alpha_flux_err, GSL.oiii_5007_flux, GSL.oiii_5007_flux_err, GSL.h_beta_flux, GSL.h_beta_flux_err, GSL.h_delta_flux, GSL.h_delta_flux_err, GSX.d4000, GSX.d4000_err, GSE.bptclass, GSE.lgm_tot_p50, GSE.sfr_tot_p50, G.objID, GSI.specObjID INTO mydb.SDSSspecgalsDR8 FROM SpecObj S CROSS APPLY dbo.fGetNearestObjEQ(S.ra, S.dec, 0.06) N, Galaxy G, GalSpecInfo GSI, GalSpecLine GSL, GalSpecIndx GSX, GalSpecExtra GSE WHERE N.objID = G.objID AND GSI.specObjID = S.specObjID AND GSL.specObjID = S.specObjID AND GSX.specObjID = S.specObjID AND GSE.specObjID = S.specObjID --- add some quality cuts to get rid of obviously bad measurements AND (G.petroMag_r > 10 AND G.petroMag_r < 18) AND (G.modelMag_u-G.modelMag_r) > 0 AND (G.modelMag_u-G.modelMag_r) < 6 AND (modelMag_u > 10 AND modelMag_u < 25) AND (modelMag_g > 10 AND modelMag_g < 25) AND (modelMag_r > 10 AND modelMag_r < 25) AND (modelMag_i > 10 AND modelMag_i < 25) AND (modelMag_z > 10 AND modelMag_z < 25) AND S.rChi2 < 2 AND (S.zErr > 0 AND S.zErr < 0.01) AND S.z > 0.02 --- end of query --- Examples -------- >>> from astroML.datasets import fetch_sdss_specgals >>> data = fetch_sdss_specgals() # doctest: +IGNORE_OUTPUT +REMOTE_DATA >>> # number of objects in dataset >>> data.shape # doctest: +REMOTE_DATA (661598,) >>> # first five column names >>> data.dtype.names[:5] # doctest: +REMOTE_DATA ('ra', 'dec', 'mjd', 'plate', 'fiberID') >>> # first three RA values >>> print(data['ra'][:3]) # doctest: +REMOTE_DATA [ 146.71419105 146.74414186 146.62857334] >>> # first three declination values >>> print(data['dec'][:3]) # doctest: +REMOTE_DATA [-1.04127639 -0.6522198 -0.7651468 ] """ data_home = get_data_home(data_home) archive_file1 = os.path.join(data_home, os.path.basename(DATA_URL1)) archive_file2 = os.path.join(data_home, os.path.basename(DATA_URL2)) if not (os.path.exists(archive_file1) and os.path.exists(archive_file2)): if not download_if_missing: raise IOError('data not present on disk. ' 'set download_if_missing=True to download') for url, name in zip([DATA_URL1, DATA_URL2], [archive_file1, archive_file2]): data = Table.read(url) data.write(name) data1 = Table.read(archive_file1) data2 = Table.read(archive_file2) data = vstack([data1, data2]) return np.asarray(data) def fetch_great_wall(data_home=None, download_if_missing=True, xlim=(-375, -175), ylim=(-300, 200), cosmo=None): """Get the 2D SDSS "Great Wall" distribution, following Cowan et al 2008 Parameters ---------- data_home : optional, default=None Specify another download and cache folder for the datasets. By default all astroML data is stored in '~/astroML_data'. download_if_missing : optional, default=True If False, raise a IOError if the data is not locally available instead of trying to download the data from the source site. xlim, ylim : tuples or None the limits in Mpc of the data: default values are the same as that used for the plots in Cowan 2008. If set to None, no cuts will be performed. cosmo : `astropy.cosmology` instance specifying cosmology to use when generating the sample. If not provided, a Flat Lambda CDM model with H0=73.2, Om0=0.27, Tcmb0=0 is used. Returns ------- data : ndarray, shape = (Ngals, 2) grid of projected (x, y) locations of galaxies in Mpc """ # local imports so we don't need dependencies for loading module from scipy.interpolate import interp1d # We need some cosmological information to compute the r-band # absolute magnitudes. 
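    # (Added note) distmod(z) and comoving_distance(z) are smooth functions
    # of redshift, so below they are sampled on a 100-point grid and
    # interpolated with scipy's interp1d rather than being evaluated once
    # per galaxy -- a large speedup for the ~10^5-galaxy sample at
    # negligible accuracy cost.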
if cosmo is None: cosmo = FlatLambdaCDM(H0=73.2, Om0=0.27, Tcmb0=0) data = fetch_sdss_specgals(data_home, download_if_missing) # cut to the part of the sky with the "great wall" data = data[(data['dec'] > -7) & (data['dec'] < 7)] data = data[(data['ra'] > 80) & (data['ra'] < 280)] # do a redshift cut, following Cowan et al 2008 z = data['z'] data = data[(z > 0.01) & (z < 0.12)] # first sample the distance modulus on a grid zgrid = np.linspace(min(data['z']), max(data['z']), 100) mugrid = cosmo.distmod(zgrid).value f = interp1d(zgrid, mugrid) mu = f(data['z']) # do an absolute magnitude cut at -20 Mr = data['petroMag_r'] + data['extinction_r'] - mu data = data[Mr < -21] # compute distances in the equatorial plane # first sample comoving distance Dcgrid = cosmo.comoving_distance(zgrid).value f = interp1d(zgrid, Dcgrid) dist = f(data['z']) locs = np.vstack([dist * np.cos(data['ra'] * np.pi / 180.), dist * np.sin(data['ra'] * np.pi / 180.)]).T # cut on x and y limits if specified if xlim is not None: locs = locs[(locs[:, 0] > xlim[0]) & (locs[:, 0] < xlim[1])] if ylim is not None: locs = locs[(locs[:, 1] > ylim[0]) & (locs[:, 1] < ylim[1])] return locs ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224989.0 astroML-1.0.2/astroML/datasets/sdss_spectrum.py0000644000076700000240000000354100000000000021076 0ustar00bsipoczstaffimport os from .tools import get_data_home, download_with_progress_bar,\ SDSSfits, sdss_fits_url, sdss_fits_filename def fetch_sdss_spectrum(plate, mjd, fiber, data_home=None, download_if_missing=True, cache_to_disk=True): """Fetch an SDSS spectrum from the Data Archive Server Parameters ---------- plate: integer plate number of desired spectrum mjd: integer mean julian date of desired spectrum fiber: integer fiber number of desired spectrum Other Parameters ---------------- data_home: string (optional) directory in which to cache downloaded fits files. If not specified, it will be set to ~/astroML_data. download_if_missing: boolean (default = True) download the fits file if it is not cached locally. cache_to_disk: boolean (default = True) cache downloaded file to data_home. Returns ------- spec: :class:`astroML.tools.SDSSfits` object An object wrapper for the fits data """ data_home = get_data_home(data_home) target_url = sdss_fits_url(plate, mjd, fiber) target_file = os.path.join(data_home, 'SDSSspec', '%04i' % plate, sdss_fits_filename(plate, mjd, fiber)) if not os.path.exists(target_file): if not download_if_missing: raise IOError("SDSS colors training data not found") buf = download_with_progress_bar(target_url, return_buffer=True) if cache_to_disk: print("caching to %s" % target_file) if not os.path.exists(os.path.dirname(target_file)): os.makedirs(os.path.dirname(target_file)) fhandler = open(target_file, 'wb') fhandler.write(buf.read()) buf.seek(0) else: buf = target_file return SDSSfits(buf) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/datasets/sdss_sspp.py0000644000076700000240000001137600000000000020226 0ustar00bsipoczstaffimport os import numpy as np from astropy.table import Table from . import get_data_home DATA_URL = ("https://github.com/astroML/astroML-data/raw/main/datasets/" "SDSSssppDR9_rerun122.fit.gz") def compute_distances(data): """Compute the distances to select stars in the sdss_sspp sample. 
    Distances are determined using empirical color/magnitude fits from
    Ivezic et al 2008, ApJ 684:287

    Extinction corrections come from Berry et al 2011, arXiv 1111.4985

    This distance estimate only works for stars with log(g) > 3.3;
    other stars will have distance=-1
    """
    # extinction terms from Berry et al
    Ar = data['Ar']
    # Au = 1.810 * Ar
    Ag = 1.400 * Ar
    Ai = 0.759 * Ar
    # Az = 0.561 * Ar

    # compute corrected mags and colors
    gmag = data['gpsf'] - Ag
    rmag = data['rpsf'] - Ar
    imag = data['ipsf'] - Ai
    gi = gmag - imag

    # compute distance fit from Ivezic et al
    FeH = data['FeH']
    Mr0 = (-5.06 + 14.32 * gi - 12.97 * gi ** 2
           + 6.127 * gi ** 3 - 1.267 * gi ** 4 + 0.0967 * gi ** 5)
    FeHoffset = 4.50 - 1.11 * FeH - 0.18 * FeH ** 2
    Mr = Mr0 + FeHoffset
    dist = 0.01 * 10 ** (0.2 * (rmag - Mr))

    # stars with log(g) < 3.3 don't work for this fit: set distance to -1
    dist[data['logg'] < 3.3] = -1

    return dist


def fetch_sdss_sspp(data_home=None, download_if_missing=True, cleaned=False):
    """Loader for SDSS SEGUE Stellar Parameter Pipeline data

    Parameters
    ----------
    data_home : optional, default=None
        Specify another download and cache folder for the datasets.
        By default all astroML data is stored in '~/astroML_data'.

    download_if_missing : bool (optional) default=True
        If False, raise an IOError if the data is not locally available
        instead of trying to download the data from the source site.

    cleaned : bool (optional) default=False
        if True, then return a cleaned catalog where objects with extreme
        values are removed.

    Returns
    -------
    data : recarray, shape = (327260,)
        record array containing pipeline parameters

    Notes
    -----
    Here are the comments from the fits file header:

    Imaging data and spectrum identifiers for a sample of 327,260
    stars with SDSS spectra, selected as:

    1) available SSPP parameters in SDSS Data Release 9
       (SSPP rerun 122, file from Y.S. Lee)
    2) 14 < r < 21 (psf magnitudes, uncorrected for ISM extinction)
    3) 10 < u < 25 & 10 < z < 25 (same as above)
    4) errors in ugriz well measured (>0) and <10
    5) 0 < u-g < 3 (all color cuts based on psf mags, dereddened)
    6) -0.5 < g-r < 1.5 & -0.5 < r-i < 1.0 & -0.5 < i-z < 1.0
    7) -200 < pmL < 200 & -200 < pmB < 200 (proper motion in mas/yr)
    8) pmErr < 10 mas/yr (proper motion error)
    9) 1 < log(g) < 5
    10) TeffErr < 300 K

    Teff and TeffErr are given in Kelvin, radVel and radVelErr in km/s.
    (ZI, Feb 2012, ivezic@astro.washington.edu)

    Examples
    --------
    >>> from astroML.datasets import fetch_sdss_sspp
    >>> data = fetch_sdss_sspp()  # doctest: +IGNORE_OUTPUT +REMOTE_DATA
    >>> # number of objects in dataset
    >>> data.shape  # doctest: +REMOTE_DATA
    (327260,)
    >>> # names of the first five columns
    >>> print(data.dtype.names[:5])  # doctest: +REMOTE_DATA
    ('ra', 'dec', 'Ar', 'upsf', 'uErr')
    >>> # first RA value
    >>> print(data['ra'][:1])  # doctest: +REMOTE_DATA
    [49.6275024]
    >>> # first DEC value
    >>> print(data['dec'][:1])  # doctest: +REMOTE_DATA
    [-1.04175591]
    """
    data_home = get_data_home(data_home)

    archive_file = os.path.join(data_home, os.path.basename(DATA_URL))

    if not os.path.exists(archive_file):
        if not download_if_missing:
            raise IOError('data not present on disk. 
' 'set download_if_missing=True to download') data = Table.read(DATA_URL) data.write(archive_file) else: data = Table.read(archive_file) if cleaned: # -1.1 < FeH < 0.1 data = data[(data['FeH'] > -1.1) & (data['FeH'] < 0.1)] # -0.03 < alpha/Fe < 0.57 data = data[(data['alphFe'] > -0.03) & (data['alphFe'] < 0.57)] # 5000 < Teff < 6500 data = data[(data['Teff'] > 5000) & (data['Teff'] < 6500)] # 3.5 < log(g) < 5 data = data[(data['logg'] > 3.5) & (data['logg'] < 5)] # 0 < error for FeH < 0.1 data = data[(data['FeHErr'] > 0) & (data['FeHErr'] < 0.1)] # 0 < error for alpha/Fe < 0.05 data = data[(data['alphFeErr'] > 0) & (data['alphFeErr'] < 0.05)] # 15 < g mag < 18 data = data[(data['gpsf'] > 15) & (data['gpsf'] < 18)] # abs(radVel) < 100 km/s data = data[(abs(data['radVel']) < 100)] return np.asarray(data) ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1643147665.419409 astroML-1.0.2/astroML/datasets/tests/0000755000076700000240000000000000000000000016765 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224989.0 astroML-1.0.2/astroML/datasets/tests/__init__.py0000644000076700000240000000000000000000000021064 0ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224989.0 astroML-1.0.2/astroML/datasets/tests/test_datasets.py0000644000076700000240000000026500000000000022211 0ustar00bsipoczstaffimport pytest from astroML.datasets import fetch_great_wall @pytest.mark.remote_data def test_fetch_great_wall(): data = fetch_great_wall() assert data.shape == (8014, 2) ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1643147665.421034 astroML-1.0.2/astroML/datasets/tools/0000755000076700000240000000000000000000000016763 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/datasets/tools/__init__.py0000644000076700000240000000171200000000000021075 0ustar00bsipoczstaff""" tools for the dataset loaders """ from .download import download_with_progress_bar from .sql_query import sql_query from .cas_query import * # noqa: F403, F401 from .sdss_fits import sdss_fits_url, sdss_fits_filename, SDSSfits def get_data_home(data_home=None): """Get the home data directory. By default the data dir is set to a folder named 'astroML_data' in the user home folder. Alternatively, it can be set by the 'ASTROML_DATA' environment variable or programatically by giving an explit folder path. The '~' symbol is expanded to the user home folder. If the folder does not already exist, it is automatically created. """ import os if data_home is None: data_home = os.environ.get('ASTROML_DATA', os.path.join('~', 'astroML_data')) data_home = os.path.expanduser(data_home) if not os.path.exists(data_home): os.makedirs(data_home) return data_home ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224989.0 astroML-1.0.2/astroML/datasets/tools/cas_query.py0000644000076700000240000000574000000000000021336 0ustar00bsipoczstaffimport numpy as np from . 
import sql_query # SDSS primtarget codes TARGET_QSO_HIZ = int('0x00000001', 16) TARGET_QSO_CAP = int('0x00000002', 16) TARGET_QSO_SKIRT = int('0x00000004', 16) TARGET_QSO_FIRST_CAP = int('0x00000008', 16) TARGET_QSO_FIRST_SKIRT = int('0x00000010', 16) TARGET_GALAXY_RED = int('0x00000020', 16) TARGET_GALAXY = int('0x00000040', 16) TARGET_GALAXY_BIG = int('0x00000080', 16) TARGET_GALAXY_BRIGHT_CORE = int('0x00000100', 16) TARGET_ROSAT_A = int('0x00000200', 16) TARGET_ROSAT_B = int('0x00000400', 16) TARGET_ROSAT_C = int('0x00000800', 16) TARGET_ROSAT_D = int('0x00001000', 16) TARGET_STAR_BHB = int('0x00002000', 16) TARGET_STAR_CARBON = int('0x00004000', 16) TARGET_STAR_BROWN_DWARF = int('0x00008000', 16) TARGET_STAR_SUB_DWARF = int('0x00010000', 16) TARGET_STAR_CATY_VAR = int('0x00020000', 16) TARGET_STAR_RED_DWARF = int('0x00040000', 16) TARGET_STAR_WHITE_DWARF = int('0x00080000', 16) TARGET_SERENDIP_BLUE = int('0x00100000', 16) TARGET_SERENDIP_FIRST = int('0x00200000', 16) TARGET_SERENDIP_RED = int('0x00400000', 16) TARGET_SERENDIP_DISTANT = int('0x00800000', 16) TARGET_SERENDIP_MANUAL = int('0x01000000', 16) TARGET_QSO_FAINT = int('0x02000000', 16) TARGET_GALAXY_RED_II = int('0x04000000', 16) TARGET_ROSAT_E = int('0x08000000', 16) TARGET_STAR_PN = int('0x10000000', 16) TARGET_QSO_REJECT = int('0x20000000', 16) DEFAULT_TARGET = TARGET_GALAXY # main galaxy sample def query_plate_mjd_fiber(n_spectra, primtarget=DEFAULT_TARGET, zmin=0, zmax=0.7): """Query the SDSS server for plate, mjd, and fiber numbers Parameters ---------- n_spectra: int number of spectra to query. Max is 100,000 (set by CAS server) primtarget: int prime target flag. See notes below zmin, zmax: float minimum and maximum redshift range for query Returns ------- plate, mjd, fiber : ndarrays, size=n_spectra The plate numbers MJD, and fiber numbers of the spectra Notes ----- Primtarget flag values can be found at http://cas.sdss.org/dr7/en/help/browser/enum.asp?n=PrimTarget """ query_text = '\n'.join(("SELECT TOP %(n_spectra)i ", " plate, mjd, fiberid ", "FROM specObj ", "WHERE ((PrimTarget & %(primtarget)i) > 0) ", " AND (z > %(zmin)f)", " AND (z <= %(zmax)f) ")) % locals() output = sql_query(query_text).readlines() keys = output[0] res = np.zeros((n_spectra, 3), dtype=int) for i, line in enumerate(output[2:]): try: res[i] = line.decode().strip().split(',') except BaseException: raise ValueError(b'\n'.join(output)) ntot = i + 1 return res[:ntot].T ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224989.0 astroML-1.0.2/astroML/datasets/tools/download.py0000644000076700000240000000324100000000000021144 0ustar00bsipoczstaffimport sys from io import BytesIO from urllib.request import urlopen def bytes_to_string(nbytes): if nbytes < 1024: return '%ib' % nbytes nbytes /= 1024. if nbytes < 1024: return '%.1fkb' % nbytes nbytes /= 1024. if nbytes < 1024: return '%.2fMb' % nbytes nbytes /= 1024. 
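    # nbytes is now in units of Gb; any remaining size is reported as Gb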
return '%.1fGb' % nbytes def url_content_length(fhandle): length = dict(fhandle.info())['Content-Length'] return int(length.strip()) def download_with_progress_bar(data_url, return_buffer=False): """Download a file, showing progress Parameters ---------- data_url : string web address return_buffer : boolean (optional) if true, return a BytesIO buffer rather than a string Returns ------- s : string content of the file """ num_units = 40 fhandle = urlopen(data_url) content_length = url_content_length(fhandle) chunk_size = content_length // num_units print("Downloading %s" % data_url) nchunks = 0 buf = BytesIO() content_length_str = bytes_to_string(content_length) while True: next_chunk = fhandle.read(chunk_size) nchunks += 1 if next_chunk: buf.write(next_chunk) s = ('[' + nchunks * '=' + (num_units - 1 - nchunks) * ' ' + '] {} / {} \r'.format(bytes_to_string(buf.tell()), content_length_str)) else: sys.stdout.write('\n') break sys.stdout.write(s) sys.stdout.flush() buf.seek(0) if return_buffer: return buf else: return buf.getvalue() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224989.0 astroML-1.0.2/astroML/datasets/tools/sdss_fits.py0000644000076700000240000003003100000000000021333 0ustar00bsipoczstaff""" Tools to download and process SDSS fits files. More information can be found at http://www.sdss.org/dr7/products/spectra/index.html """ import gc # garbage collection import numpy as np from scipy.ndimage.filters import gaussian_filter1d, uniform_filter1d from scipy import interpolate from . import download_with_progress_bar # This is the URL of the sdss fits spectra FITS_FILENAME = 'spSpec-%(mjd)05i-%(plate)04i-%(fiber)03i.fit' SDSS_URL = ('http://das.sdss.org/spectro/1d_26/%(plate)04i/' '1d/spSpec-%(mjd)05i-%(plate)04i-%(fiber)03i.fit') # lines used to generate line-index labeling LINES = dict(Ha=6564.61, Hb=4862.68, OI=6302.05, OIII=5008.24, NIIa=6549.86, NIIb=6585.27, SIIa=6718.29, SIIb=6732.67) def sdss_fits_url(plate, mjd, fiber): """Return the URL of the spectrum FITS file""" return SDSS_URL % dict(plate=plate, mjd=mjd, fiber=fiber) def sdss_fits_filename(plate, mjd, fiber): """Return the name of the spectrum FITS file""" return FITS_FILENAME % dict(plate=plate, mjd=mjd, fiber=fiber) spec_cln_dict = ['SPEC_UNKNOWN', 'SPEC_STAR', 'SPEC_GALAXY', 'SPEC_QSO', 'SPEC_HIZ_QSO', # high redshift QSO, z>2.3 'SPEC_SKY', 'STAR_LATE', # Type M or later (molecular bands dominate) 'GAL_EM'] # emission line galaxy class SDSSfits: """A class to open and interact with fits files from SDSS Parameters ---------- buf : string or file buffer (optional) file path, buffer, or url of SDSS spectra fits file if None, then initialize an empty instance. Notes ----- This class only provides access to a subset of the information available in the sdss spectra fits file. The raw fits data can be accessed using the fits object directly. This can be found in the attribute ``hdulist``. 
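    A typical access pattern (a sketch; ``plate``, ``mjd`` and ``fiber``
    are placeholder identifiers and a network download is required):

    >>> from astroML.datasets import fetch_sdss_spectrum
    >>> spec = fetch_sdss_spectrum(plate, mjd, fiber)   # doctest: +SKIP
    >>> lam = spec.wavelength()                         # doctest: +SKIP
    >>> flux = spec.spectrum                            # doctest: +SKIP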
For details, please refer to the data description: http://www.sdss.org/dr7/dm/flatFiles/spSpec.html """ def __init__(self, source=None): if source is None: pass elif isinstance(source, str): if source.startswith('http://'): self._load_fits_url(source) else: self._load_fits_file(source) else: self._load_fits_file(source) def _load_fits_url(self, url): # fits is an optional dependency: don't import globally from astropy.io import fits buffer = download_with_progress_bar(url, return_buffer=True) self._initialize(fits.open(buffer)) def _load_fits_file(self, file_or_buffer): # fits is an optional dependency: don't import globally from astropy.io import fits self._initialize(fits.open(file_or_buffer)) def _initialize(self, hdulist): data = hdulist[0].data self.name = hdulist[0].header['NAME'] self.spec_cln = hdulist[0].header['SPEC_CLN'] self.coeff0 = hdulist[0].header['COEFF0'] self.coeff1 = hdulist[0].header['COEFF1'] self.z = hdulist[0].header['Z'] self.zerr = hdulist[0].header['Z_ERR'] self.zconf = hdulist[0].header['Z_CONF'] self.spectrum = data[0] self.spectrum_cont = data[1] self.error = data[2] self.mask = data[3] self.large_err = self.error.max() * 2 self.hdulist = hdulist def get_line_ew(self, wavelength): i = np.where(abs(self.hdulist[2].data['restWave'] - wavelength) < 1) return self.hdulist[2].data['ew'][i] def __del__(self): if hasattr(self, 'hdulist'): del self.hdulist gc.collect() def copy(self): snew = self.__class__() for param in ['name', 'spec_cln', 'coeff0', 'coeff1', 'z', 'zerr', 'zconf', 'spectrum', 'spectrum_cont', 'error', 'large_err', 'mask', 'hdulist']: setattr(snew, param, getattr(self, param)) return snew def restframe(self): snew = self.copy() snew.coeff0 = self.coeff0_restframe() snew.z = 0 return snew def __len__(self): return len(self.spectrum) def log_w_min(self, i=None): """ if i is specified, return log_w_min of bin i otherwise, return log_w_min of the spectrum """ if i is None: i = 0 return self.coeff0 + (i - 0.5) * self.coeff1 def log_w_max(self, i=None): """ if i is specified, return log_w_max of bin i otherwise, return log_max of the spectrum """ if i is None: i = len(self) - 1 return self.coeff0 + (i + 0.5) * self.coeff1 def w_min(self, i=None): return 10 ** self.log_w_min(i) def w_max(self, i=None): return 10 ** self.log_w_max(i) def coeff0_restframe(self): return self.coeff0 - np.log10(1 + self.z) def wavelength(self, restframe=False): """ return the wavelength of the spectrum in angstroms """ if restframe: coeff0 = self.coeff0_restframe() else: coeff0 = self.coeff0 return 10 ** (coeff0 + self.coeff1 * np.arange(len(self.spectrum))) def compute_mask(self, frac=0.5, filtwidth=5): """ return a mask showing where noise spikes to frac over the local background """ smoothed_noise = gaussian_filter1d(self.error, filtwidth) mask = ((self.error >= (1 + frac) * smoothed_noise) | (self.error <= 0) | (self.error >= self.large_err) | (self.spectrum == 0)) mask_filtered = uniform_filter1d(mask.astype(float), max(3, filtwidth)) return mask_filtered > 0.5 / filtwidth def rebin(self, rebin_coeff0, rebin_coeff1, rebin_length): """Rebin the spectrum to a new grid. 
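        The interpolation is performed on the cumulative sum of the
        spectrum, so the total flux is conserved (see the comments in the
        implementation below).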
        Parameters
        ----------
        rebin_coeff0: float
            log minimum wavelength
        rebin_coeff1: float
            log wavelength bin width
        rebin_length: int
            number of bins

        Returns
        -------
        S_new: SDSSfits object
            The new spectrum, rebinned to the desired wavelength binning
        """
        snew = self.copy()
        snew.spectrum = np.zeros(rebin_length)
        snew.error = np.zeros(rebin_length)
        snew.coeff0 = rebin_coeff0
        snew.coeff1 = rebin_coeff1

        N_old = len(self.spectrum)
        N_new = len(snew.spectrum)

        log_w_old = self.coeff0 + (np.arange(N_old + 1) - 0.5) * self.coeff1
        log_w_new = snew.coeff0 + (np.arange(N_new + 1) - 0.5) * snew.coeff1

        # Perform the interpolation. We'll interpolate the cumulative sum
        # so that the total flux of the spectrum is conserved.

        # interpolate spectrum
        spec_cuml_old = self.spectrum.cumsum()
        tck = interpolate.splrep(log_w_old, np.hstack(([0], spec_cuml_old)))
        spec_cuml_new = interpolate.splev(log_w_new, tck)
        # clamp the cumulative flux outside the old wavelength range:
        # zero below the old minimum, total flux above the old maximum
        spec_cuml_new[log_w_new >= log_w_old[-1]] = spec_cuml_old[-1]
        spec_cuml_new[log_w_new <= log_w_old[0]] = 0
        snew.spectrum = np.diff(spec_cuml_new)
        snew.spectrum *= self.coeff1 / snew.coeff1

        # interpolate error
        err_cuml_old = self.error.cumsum()
        tck = interpolate.splrep(log_w_old, np.hstack(([0], err_cuml_old)))
        err_cuml_new = interpolate.splev(log_w_new, tck)
        err_cuml_new[log_w_new >= log_w_old[-1]] = err_cuml_old[-1]
        err_cuml_new[log_w_new <= log_w_old[0]] = 0
        snew.error = np.diff(err_cuml_new)
        snew.error *= self.coeff1 / snew.coeff1

        return snew

    def _get_line_strength(self, line):
        lam = LINES.get(line)

        if lam is None:
            lam1 = LINES.get(line + 'a')
            ind1 = np.where(abs(self.hdulist[2].data['restWave']
                                - lam1) < 1)[0]
            lam2 = LINES.get(line + 'b')
            ind2 = np.where(abs(self.hdulist[2].data['restWave']
                                - lam2) < 1)[0]

            if len(ind1) == 0:
                s1 = h1 = 0
                nsig1 = 0
            else:
                s1 = self.hdulist[2].data['sigma'][ind1]
                h1 = self.hdulist[2].data['height'][ind1]
                nsig1 = self.hdulist[2].data['nsigma'][ind1]

            if len(ind2) == 0:
                s2 = h2 = 0
                nsig2 = 0
            else:
                s2 = self.hdulist[2].data['sigma'][ind2]
                h2 = self.hdulist[2].data['height'][ind2]
                nsig2 = self.hdulist[2].data['nsigma'][ind2]

            strength = s1 * h1 + s2 * h2
            nsig = max(nsig1, nsig2)
        else:
            ind = np.where(abs(self.hdulist[2].data['restWave'] - lam) < 1)[0]
            if len(ind) == 0:
                strength = 0
                nsig = 0
            else:
                s = self.hdulist[2].data['sigma'][ind]
                h = self.hdulist[2].data['height'][ind]
                nsig = self.hdulist[2].data['nsigma'][ind]
                strength = s * h

        return strength, nsig

    def lineratio_index(self, indicator='NII'):
        """Return the line ratio index for the given galaxy.

        This is the index used in Vanderplas et al 2009, and makes use
        of line-ratio fits from Kewley et al 2001

        Parameters
        ----------
        indicator: string ['NII'|'OI'|'SII']
            The emission line to use as an indicator

        Returns
        -------
        cln: integer
            The classification of the spectrum based on the SDSS pipeline
            and the line ratios.
0 : unknown (SPEC_CLN = 0) 1 : star (SPEC_CLN = 1) 2 : absorption galaxy (H-alpha seen in absorption) 3 : normal galaxy (no significant H-alpha emission or absorption) 4 : emission line galaxies (below line-ratio curve) 5 : narrow-line QSO (above line-ratio curve) 6 : broad-line QSO (SPEC_CLN = 3) 7 : Sky (SPEC_CLN = 4) 8 : Hi-z QSO (SPEC_CLN = 5) 9 : Late-type star (SPEC_CLN = 6) 10 : Emission galaxy (SPEC_CLN = 7) ratios: tuple The line ratios used to compute this """ assert indicator in ['NII', 'OI', 'SII'] if self.spec_cln < 2: return self.spec_cln, (0, 0) elif self.spec_cln > 2: return self.spec_cln + 3, (0, 0) strength_Ha, nsig_Ha = self._get_line_strength('Ha') strength_Hb, nsig_Hb = self._get_line_strength('Hb') if nsig_Ha < 3 or nsig_Hb < 3: return 3, (0, 0) if strength_Ha < 0 or strength_Hb < 0: return 2, (0, 0) # all that's left is choosing between 4 and 5 # we do this based on line-ratios strength_I, nsig_I = self._get_line_strength(indicator) strength_OIII, nsig_OIII = self._get_line_strength('OIII') log_OIII_Hb = np.log10(strength_OIII / strength_Hb) I_Ha = np.log10(strength_I / strength_Ha) if indicator == 'NII': if I_Ha >= 0.47 or log_OIII_Hb >= log_OIII_Hb_NII(I_Ha): return 5, (I_Ha, log_OIII_Hb) else: return 4, (I_Ha, log_OIII_Hb) elif indicator == 'OI': if I_Ha >= -0.59 or log_OIII_Hb >= log_OIII_Hb_OI(I_Ha): return 5, (I_Ha, log_OIII_Hb) else: return 4, (I_Ha, log_OIII_Hb) else: if I_Ha >= 0.32 or log_OIII_Hb >= log_OIII_Hb_SII(I_Ha): return 5, (I_Ha, log_OIII_Hb) else: return 4, (I_Ha, log_OIII_Hb) # ---------------------------------------------------------------------- # Empirical fits from Kewley et al 2001 def log_OIII_Hb_NII(log_NII_Ha, eps=0): return 1.19 + eps + 0.61 / (log_NII_Ha - eps - 0.47) def log_OIII_Hb_OI(log_OI_Ha, eps=0): return 1.33 + eps + 0.73 / (log_OI_Ha - eps + 0.59) def log_OIII_Hb_SII(log_SII_Ha, eps=0): return 1.30 + eps + 0.72 / (log_SII_Ha - eps - 0.32) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224989.0 astroML-1.0.2/astroML/datasets/tools/sql_query.py0000644000076700000240000000167100000000000021366 0ustar00bsipoczstaff""" Tools to perform a SQL queries to an online server. Default values are provided for http://cas.sdss.org """ from urllib.request import urlopen from urllib.parse import urlencode PUBLIC_URL = 'http://cas.sdss.org/public/en/tools/search/x_sql.aspx' DEFAULT_FMT = 'csv' def remove_sql_comments(sql): """Strip SQL comments starting with --""" return ' \n'.join(map(lambda x: x.split('--')[0], sql.split('\n'))) def sql_query(sql_str, url=PUBLIC_URL, format='csv'): """Execute query Parameters ---------- sql_str : string valid sql query url: string (optional) query url. Default is http://cas.sdss.org query script format: string (default='csv') query output format Returns ------- F: file object results of the query """ sql_str = remove_sql_comments(sql_str) params = urlencode(dict(cmd=sql_str, format=format)) return urlopen(url + '?%s' % params) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224989.0 astroML-1.0.2/astroML/datasets/wmap_temperatures.py0000644000076700000240000000453000000000000021743 0ustar00bsipoczstaffimport os import numpy as np from . 
import get_data_home from .tools import download_with_progress_bar DATA_URL = ('http://lambda.gsfc.nasa.gov/data/map/dr4/' 'skymaps/7yr/raw/wmap_band_imap_r9_7yr_W_v4.fits') MASK_URL = ('http://lambda.gsfc.nasa.gov/data/map/dr4/' 'ancillary/masks/wmap_temperature_analysis_mask_r9_7yr_v4.fits') def fetch_wmap_temperatures(masked=False, data_home=None, download_if_missing=True): """Loader for WMAP temperature map data Parameters ---------- masked : optional, default=False If True, then return the foreground-masked healpix array of data If False, then return the raw temperature array data_home : optional, default=None Specify another download and cache folder for the datasets. By default all astroML data is stored in '~/astroML_data'. download_if_missing : optional, default=True If False, raise a IOError if the data is not locally available instead of trying to download the data from the source site. Returns ------- data : np.ndarray or np.ma.MaskedArray record array containing (masked) temperature data """ # because of a bug in healpy, pylab must be imported before healpy is # or else a segmentation fault can result. import healpy as hp data_home = get_data_home(data_home) data_file = os.path.join(data_home, os.path.basename(DATA_URL)) mask_file = os.path.join(data_home, os.path.basename(MASK_URL)) if not os.path.exists(data_file): if not download_if_missing: raise IOError('data not present on disk. ' 'set download_if_missing=True to download') data_buffer = download_with_progress_bar(DATA_URL) open(data_file, 'wb').write(data_buffer) data = hp.read_map(data_file) if masked: if not os.path.exists(mask_file): if not download_if_missing: raise IOError('mask data not present on disk. ' 'set download_if_missing=True to download') mask_buffer = download_with_progress_bar(MASK_URL) open(mask_file, 'wb').write(mask_buffer) mask = hp.read_map(mask_file) data = hp.ma(data) data.mask = np.logical_not(mask) # WMAP mask has 0=bad. 
We need 1=bad return data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/decorators.py0000644000076700000240000000051100000000000016527 0ustar00bsipoczstaffimport warnings from astroML.utils.exceptions import AstroMLDeprecationWarning from astroML.utils.decorators import pickle_results # noqa: F401 warnings.warn("'decorators' has been moved to 'astroML.utils' and will be " "removed from the main namespace in the future.", AstroMLDeprecationWarning) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4231021 astroML-1.0.2/astroML/density_estimation/0000755000076700000240000000000000000000000017726 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/density_estimation/__init__.py0000644000076700000240000000061400000000000022040 0ustar00bsipoczstafffrom .density_estimation import KNeighborsDensity from .xdeconv import XDGMM from .bayesian_blocks import bayesian_blocks from .empirical import FunctionDistribution, EmpiricalDistribution from .gauss_mixture import GaussianMixture1D from .histtools import (scotts_bin_width, freedman_bin_width, knuth_bin_width, histogram) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/density_estimation/bayesian_blocks.py0000644000076700000240000003060200000000000023431 0ustar00bsipoczstaff""" Bayesian Block implementation ============================= Dynamic programming algorithm for finding the optimal adaptive-width histogram. Based on Scargle et al 2012 [1]_ References ---------- .. [1] http://adsabs.harvard.edu/abs/2012arXiv1207.5578S """ import numpy as np # TODO: implement other fitness functions from appendix B of Scargle 2012 from astroML.utils import deprecated from astroML.utils.exceptions import AstroMLDeprecationWarning @deprecated('0.4', alternative='astropy.stats.FitnessFunc', warning_type=AstroMLDeprecationWarning) class FitnessFunc: """Base class for fitness functions Each fitness function class has the following: - fitness(...) : compute fitness function. Arguments accepted by fitness must be among [T_k, N_k, a_k, b_k, c_k] - prior(N, Ntot) : compute prior on N given a total number of points Ntot """ def __init__(self, p0=0.05, gamma=None): self.p0 = p0 self.gamma = gamma def validate_input(self, t, x, sigma): """Check that input is valid""" pass def fitness(**kwargs): raise NotImplementedError() def prior(self, N, Ntot): if self.gamma is None: return self.p0_prior(N, Ntot) else: return self.gamma_prior(N, Ntot) def p0_prior(self, N, Ntot): # eq. 21 from Scargle 2012 return 4 - np.log(73.53 * self.p0 * (N ** -0.478)) def gamma_prior(self, N, Ntot): """Basic prior, parametrized by gamma (eq. 3 in Scargle 2012)""" if self.gamma == 1: return 0 else: return (np.log(1 - self.gamma) - np.log(1 - self.gamma ** (Ntot + 1)) + N * np.log(self.gamma)) # the fitness_args property will return the list of arguments accepted by # the method fitness(). This allows more efficient computation below. 
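    # For example (illustrative note): the Events subclass below defines
    # fitness(self, N_k, T_k), so its ``args`` property evaluates to
    # ('N_k', 'T_k'), and bayesian_blocks() then computes only those two
    # per-block arrays in its dynamic-programming loop.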
@property def args(self): try: # Python 2 return self.fitness.func_code.co_varnames[1:] except AttributeError: return self.fitness.__code__.co_varnames[1:] @deprecated('0.4', alternative='astropy.stats.Events', warning_type=AstroMLDeprecationWarning) class Events(FitnessFunc): """Fitness for binned or unbinned events Parameters ---------- p0 : float False alarm probability, used to compute the prior on N (see eq. 21 of Scargle 2012). Default prior is for p0 = 0. gamma : float or None If specified, then use this gamma to compute the general prior form, p ~ gamma^N. If gamma is specified, p0 is ignored. """ def fitness(self, N_k, T_k): # eq. 19 from Scargle 2012 return N_k * (np.log(N_k) - np.log(T_k)) def prior(self, N, Ntot): if self.gamma is not None: return self.gamma_prior(N, Ntot) else: # eq. 21 from Scargle 2012 return 4 - np.log(73.53 * self.p0 * (N ** -0.478)) @deprecated('0.4', alternative='astropy.stats.RegularEvents', warning_type=AstroMLDeprecationWarning) class RegularEvents(FitnessFunc): """Fitness for regular events This is for data which has a fundamental "tick" length, so that all measured values are multiples of this tick length. In each tick, there are either zero or one counts. Parameters ---------- dt : float tick rate for data gamma : float specifies the prior on the number of bins: p ~ gamma^N """ def __init__(self, dt, p0=0.05, gamma=None): self.dt = dt self.p0 = p0 self.gamma = gamma def validate_input(self, t, x, sigma): unique_x = np.unique(x) if list(unique_x) not in ([0], [1], [0, 1]): raise ValueError("Regular events must have only 0 and 1 in x") def fitness(self, T_k, N_k): # Eq. 75 of Scargle 2012 M_k = T_k / self.dt N_over_M = N_k * 1. / M_k eps = 1E-8 if np.any(N_over_M > 1 + eps): import warnings warnings.warn('regular events: N/M > 1. ' 'Is the time step correct?') one_m_NM = 1 - N_over_M N_over_M[N_over_M <= 0] = 1 one_m_NM[one_m_NM <= 0] = 1 return N_k * np.log(N_over_M) + (M_k - N_k) * np.log(one_m_NM) @deprecated('0.4', alternative='astropy.stats.PointMeasures', warning_type=AstroMLDeprecationWarning) class PointMeasures(FitnessFunc): """Fitness for point measures Parameters ---------- gamma : float specifies the prior on the number of bins: p ~ gamma^N if gamma is not specified, then a prior based on simulations will be used (see sec 3.3 of Scargle 2012) """ def __init__(self, p0=None, gamma=None): self.p0 = p0 self.gamma = gamma def fitness(self, a_k, b_k): # eq. 41 from Scargle 2012 return (b_k * b_k) / (4 * a_k) def prior(self, N, Ntot): if self.gamma is not None: return self.gamma_prior(N, Ntot) elif self.p0 is not None: return self.p0_prior(N, Ntot) else: # eq. at end of sec 3.3 in Scargle 2012 return 1.32 + 0.577 * np.log10(N) @deprecated('0.4', alternative='astropy.stats.bayesian_blocks', warning_type=AstroMLDeprecationWarning) def bayesian_blocks(t, x=None, sigma=None, fitness='events', **kwargs): """Bayesian Blocks Implementation This is a flexible implementation of the Bayesian Blocks algorithm described in Scargle 2012 [1]_ Parameters ---------- t : array_like data times (one dimensional, length N) x : array_like (optional) data values sigma : array_like or float (optional) data errors fitness : str or object the fitness function to use. If a string, the following options are supported: - 'events' : binned or unbinned event data extra arguments are `p0`, which gives the false alarm probability to compute the prior, or `gamma` which gives the slope of the prior on the number of bins. 
- 'regular_events' : non-overlapping events measured at multiples of a fundamental tick rate, `dt`, which must be specified as an additional argument. The prior can be specified through `gamma`, which gives the slope of the prior on the number of bins. - 'measures' : fitness for a measured sequence with Gaussian errors The prior can be specified using `gamma`, which gives the slope of the prior on the number of bins. If `gamma` is not specified, then a simulation-derived prior will be used. Alternatively, the fitness can be a user-specified object of type derived from the FitnessFunc class. Returns ------- edges : ndarray array containing the (N+1) bin edges Examples -------- Event data: >>> t = np.random.normal(size=100) >>> bins = bayesian_blocks(t, fitness='events', p0=0.01) Event data with repeats: >>> t = np.random.normal(size=100) >>> t[80:] = t[:20] >>> bins = bayesian_blocks(t, fitness='events', p0=0.01) Regular event data: >>> dt = 0.01 >>> t = dt * np.arange(1000) >>> x = np.zeros(len(t)) >>> x[np.random.randint(0, len(t), int(len(t) / 10))] = 1 >>> bins = bayesian_blocks(t, x, fitness='regular_events', dt=dt, gamma=0.9) Measured point data with errors: >>> t = 100 * np.random.random(100) >>> x = np.exp(-0.5 * (t - 50) ** 2) >>> sigma = 0.1 >>> x_obs = np.random.normal(x, sigma) >>> bins = bayesian_blocks(t, x=x_obs, fitness='measures') References ---------- .. [1] Scargle, J `et al.` (2012) http://adsabs.harvard.edu/abs/2012arXiv1207.5578S See Also -------- astroML.plotting.hist : histogram plotting function which can make use of bayesian blocks. """ # validate array input t = np.asarray(t, dtype=float) if x is not None: x = np.asarray(x) if sigma is not None: sigma = np.asarray(sigma) # verify the fitness function if fitness == 'events': if x is not None and np.any(x % 1 > 0): raise ValueError("x must be integer counts for fitness='events'") fitfunc = Events(**kwargs) elif fitness == 'regular_events': if x is not None and (np.any(x % 1 > 0) or np.any(x > 1)): raise ValueError("x must be 0 or 1 for fitness='regular_events'") fitfunc = RegularEvents(**kwargs) elif fitness == 'measures': if x is None: raise ValueError("x must be specified for fitness='measures'") fitfunc = PointMeasures(**kwargs) else: if not (hasattr(fitness, 'args') and hasattr(fitness, 'fitness') and hasattr(fitness, 'prior')): raise ValueError("fitness not understood") fitfunc = fitness # find unique values of t t = np.array(t, dtype=float) assert t.ndim == 1 unq_t, unq_ind, unq_inv = np.unique(t, return_index=True, return_inverse=True) # if x is not specified, x will be counts at each time if x is None: if sigma is not None: raise ValueError("If sigma is specified, x must be specified") if len(unq_t) == len(t): x = np.ones_like(t) else: x = np.bincount(unq_inv) t = unq_t sigma = 1 # if x is specified, then we need to sort t and x together else: x = np.asarray(x) if len(t) != len(x): raise ValueError("Size of t and x does not match") if len(unq_t) != len(t): raise ValueError("Repeated values in t not supported when " "x is specified") t = unq_t x = x[unq_ind] # verify the given sigma value N = t.size if sigma is not None: sigma = np.asarray(sigma) if sigma.shape not in [(), (1,), (N,)]: raise ValueError('sigma does not match the shape of x') else: sigma = 1 # validate the input fitfunc.validate_input(t, x, sigma) # compute values needed for computation, below if 'a_k' in fitfunc.args: ak_raw = np.ones_like(x) / sigma / sigma if 'b_k' in fitfunc.args: bk_raw = x / sigma / sigma if 'c_k' in fitfunc.args: ck_raw 
= x * x / sigma / sigma # create length-(N + 1) array of cell edges edges = np.concatenate([t[:1], 0.5 * (t[1:] + t[:-1]), t[-1:]]) block_length = t[-1] - edges # arrays to store the best configuration best = np.zeros(N, dtype=float) last = np.zeros(N, dtype=int) # ----------------------------------------------------------------- # Start with first data cell; add one cell at each iteration # ----------------------------------------------------------------- for R in range(N): # Compute fit_vec : fitness of putative last block (end at R) kwds = {} # T_k: width/duration of each block if 'T_k' in fitfunc.args: kwds['T_k'] = block_length[:R + 1] - block_length[R + 1] # N_k: number of elements in each block if 'N_k' in fitfunc.args: kwds['N_k'] = np.cumsum(x[:R + 1][::-1])[::-1] # a_k: eq. 31 if 'a_k' in fitfunc.args: kwds['a_k'] = 0.5 * np.cumsum(ak_raw[:R + 1][::-1])[::-1] # b_k: eq. 32 if 'b_k' in fitfunc.args: kwds['b_k'] = - np.cumsum(bk_raw[:R + 1][::-1])[::-1] # c_k: eq. 33 if 'c_k' in fitfunc.args: kwds['c_k'] = 0.5 * np.cumsum(ck_raw[:R + 1][::-1])[::-1] # evaluate fitness function fit_vec = fitfunc.fitness(**kwds) A_R = fit_vec - fitfunc.prior(R + 1, N) A_R[1:] += best[:R] i_max = np.argmax(A_R) last[R] = i_max best[R] = A_R[i_max] # ----------------------------------------------------------------- # Now find changepoints by iteratively peeling off the last block # ----------------------------------------------------------------- change_points = np.zeros(N, dtype=int) i_cp = N ind = N while True: i_cp -= 1 change_points[i_cp] = ind if ind == 0: break ind = last[ind - 1] change_points = change_points[i_cp:] return edges[change_points] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/density_estimation/density_estimation.py0000644000076700000240000000611600000000000024217 0ustar00bsipoczstaff""" Tools for density estimation See also: - sklearn.mixture.gmm : gaussian mixture models - sklearn.neighbors.KernelDensity : Kernel Density Estimation (version 0.14+) - astroML.density_estimation.XDGMM : extreme deconvolution - scipy.spatial.gaussian_kde : a gaussian KDE implementation """ import numpy as np from scipy import special from sklearn.base import BaseEstimator from sklearn.neighbors import BallTree def n_volume(r, n): """compute the n-volume of a sphere of radius r in n dimensions""" return np.pi ** (0.5 * n) / special.gamma(0.5 * n + 1) * (r ** n) class KNeighborsDensity(BaseEstimator): """K-neighbors density estimation Parameters ---------- method : string method to use. Must be one of ['simple'|'bayesian'] (see below) n_neighbors : int number of neighbors to use Notes ----- The two methods are as follows: - simple: The density at a point x is estimated by n(x) ~ k / r_k^n - bayesian: The density at a point x is estimated by n(x) ~ sum_{i=1}^k[1 / r_i^n]. See Also -------- KDE : kernel density estimation """ def __init__(self, method='bayesian', n_neighbors=10): if method not in ['simple', 'bayesian']: raise ValueError("method = %s not recognized" % method) self.n_neighbors = n_neighbors self.method = method def fit(self, X): """Train the K-neighbors density estimator Parameters ---------- X : array_like array of points to use to train the KDE. 
Shape is (n_points, n_dim) """ self.X_ = np.atleast_2d(X) if self.X_.ndim != 2: raise ValueError('X must be two-dimensional') self.bt_ = BallTree(self.X_) return self def eval(self, X): """Evaluate the kernel density estimation Parameters ---------- X : array_like array of points at which to evaluate the KDE. Shape is (n_points, n_dim), where n_dim matches the dimension of the training points. Returns ------- dens : ndarray array of shape (n_points,) giving the density at each point. The density will be normalized for metric='gaussian' or metric='tophat', and will be unnormalized otherwise. """ X = np.atleast_2d(X) if X.ndim != 2: raise ValueError('X must be two-dimensional') if X.shape[1] != self.X_.shape[1]: raise ValueError('dimensions of X do not match training dimension') dist, ind = self.bt_.query(X, self.n_neighbors, return_distance=True) k = float(self.n_neighbors) ndim = X.shape[1] if self.method == 'simple': return k / n_volume(dist[:, -1], ndim) elif self.method == 'bayesian': # XXX this may be wrong in more than 1 dimension! return (k * (k + 1) * 0.5 / n_volume(1, ndim) / (dist ** ndim).sum(1)) else: raise ValueError("Unrecognized method '%s'" % self.method) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/density_estimation/empirical.py0000644000076700000240000000625000000000000022250 0ustar00bsipoczstaffimport numpy as np from scipy import interpolate from sklearn.utils import check_random_state class FunctionDistribution: """Generate random variables distributed according to an arbitrary function Parameters ---------- func : function func should take an array of x values, and return an array proportional to the probability density at each value xmin : float minimum value of interest xmax : float maximum value of interest Nx : int (optional) number of samples to draw. Default is 1000 random_state : None, int, or np.random.RandomState instance random seed or random number generator func_args : dictionary (optional) additional keyword arguments to be passed to func """ def __init__(self, func, xmin, xmax, Nx=1000, random_state=None, func_args=None): self.random_state = check_random_state(random_state) if func_args is None: func_args = {} x = np.linspace(xmin, xmax, Nx) Px = func(x, **func_args) # if there are too many zeros, interpolation will fail positive = (Px > 1E-10 * Px.max()) x = x[positive] Px = Px[positive].cumsum() Px /= Px[-1] self._tck = interpolate.splrep(Px, x) def rvs(self, shape): """Draw random variables from the distribution Parameters ---------- shape : integer or tuple shape of desired array Returns ------- rv : ndarray, shape=shape random variables """ # generate uniform variables between 0 and 1 y = self.random_state.random_sample(shape) return interpolate.splev(y, self._tck) class EmpiricalDistribution: """Empirically learn a distribution from one-dimensional data Parameters ---------- data : one-dimensional array input data Examples -------- >>> import numpy as np >>> np.random.seed(0) >>> x = np.random.normal(size=10000) # normally-distributed variables >>> x.mean(), x.std() (-0.018433720158265783, 0.98755656817612003) >>> x2 = EmpiricalDistribution(x).rvs(10000) >>> x2.mean(), x2.std() (-0.020293716681613363, 1.0039249294845276) Notes ----- This function works by approximating the inverse of the cumulative distribution using an efficient spline fit to the sorted values. 
""" def __init__(self, data): # copy, because we'll need to sort in-place data = np.array(data, copy=True) if data.ndim != 1: raise ValueError("data should be one-dimensional") data.sort() # set up spline y = np.linspace(0, 1, data.size) self._tck = interpolate.splrep(y, data) def rvs(self, shape): """Draw random variables from the distribution Parameters ---------- shape : integer or tuple shape of desired array Returns ------- rv : ndarray, shape=shape random variables """ # generate uniform variables between 0 and 1 y = np.random.random(shape) return interpolate.splev(y, self._tck) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/density_estimation/gauss_mixture.py0000644000076700000240000000307300000000000023202 0ustar00bsipoczstaffimport numpy as np from sklearn.mixture import GaussianMixture class GaussianMixture1D: """ Simple class to work with 1D mixtures of Gaussians Parameters ---------- means : array_like means of component distributions (default = 0) sigmas : array_like standard deviations of component distributions (default = 1) weights : array_like weight of component distributions (default = 1) """ def __init__(self, means=0, sigmas=1, weights=1): data = np.array([t for t in np.broadcast(means, sigmas, weights)]) components = data.shape[0] self._gmm = GaussianMixture(components, covariance_type='spherical') self._gmm.means_ = data[:, :1] self._gmm.weights_ = data[:, 2] / data[:, 2].sum() self._gmm.covariances_ = data[:, 1] ** 2 self._gmm.precisions_cholesky_ = 1 / np.sqrt(self._gmm.covariances_) self._gmm.fit = None # disable fit method for safety def sample(self, size): """Random sample""" return self._gmm.sample(size) def pdf(self, x): """Compute probability distribution""" if x.ndim == 1: x = x[:, np.newaxis] logprob = self._gmm.score_samples(x) return np.exp(logprob) def pdf_individual(self, x): """Compute probability distribution of each component""" if x.ndim == 1: x = x[:, np.newaxis] logprob = self._gmm.score_samples(x) responsibilities = self._gmm.predict_proba(x) return responsibilities * np.exp(logprob[:, np.newaxis]) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/density_estimation/histtools.py0000644000076700000240000001663600000000000022344 0ustar00bsipoczstaff""" Tools for working with distributions """ import numpy as np from scipy.special import gammaln from astropy import stats as astropy_stats from astroML.utils import deprecated from astroML.utils.exceptions import AstroMLDeprecationWarning @deprecated('0.4', alternative='astropy.stats.scott_bin_width', warning_type=AstroMLDeprecationWarning) def scotts_bin_width(data, return_bins=False): r"""Return the optimal histogram bin width using Scott's rule: Parameters ---------- data : array-like, ndim=1 observed (one-dimensional) data return_bins : bool (optional) if True, then return the bin edges Returns ------- width : float optimal bin width using Scott's rule bins : ndarray bin edges: returned if `return_bins` is True Notes ----- The optimal bin width is .. math:: \Delta_b = \frac{3.5\sigma}{n^{1/3}} where :math:`\sigma` is the standard deviation of the data, and :math:`n` is the number of data points. 
    See Also
    --------
    knuth_bin_width
    freedman_bin_width
    astroML.plotting.hist
    """
    return astropy_stats.scott_bin_width(data, return_bins)


@deprecated('0.4', alternative='astropy.stats.freedman_bin_width',
            warning_type=AstroMLDeprecationWarning)
def freedman_bin_width(data, return_bins=False):
    r"""Return the optimal histogram bin width using the Freedman-Diaconis rule

    Parameters
    ----------
    data : array-like, ndim=1
        observed (one-dimensional) data
    return_bins : bool (optional)
        if True, then return the bin edges

    Returns
    -------
    width : float
        optimal bin width using the Freedman-Diaconis rule
    bins : ndarray
        bin edges: returned if `return_bins` is True

    Notes
    -----
    The optimal bin width is

    .. math::
        \Delta_b = \frac{2(q_{75} - q_{25})}{n^{1/3}}

    where :math:`q_{N}` is the :math:`N`-th percentile of the data, and
    :math:`n` is the number of data points.

    See Also
    --------
    knuth_bin_width
    scotts_bin_width
    astroML.plotting.hist
    """
    return astropy_stats.freedman_bin_width(data, return_bins)


@deprecated('0.4', warning_type=AstroMLDeprecationWarning)
class KnuthF:
    r"""Class which implements the function minimized by knuth_bin_width

    Parameters
    ----------
    data : array-like, one dimension
        data to be histogrammed

    Notes
    -----
    The function F is given by

    .. math::
        F(M|x,I) = n\log(M) + \log\Gamma(\frac{M}{2})
        - M\log\Gamma(\frac{1}{2})
        - \log\Gamma(\frac{2n+M}{2})
        + \sum_{k=1}^M \log\Gamma(n_k + \frac{1}{2})

    where :math:`\Gamma` is the Gamma function, :math:`n` is the number of
    data points, :math:`n_k` is the number of measurements in bin :math:`k`.

    See Also
    --------
    knuth_bin_width
    astroML.plotting.hist
    """
    def __init__(self, data):
        self.data = np.array(data, copy=True)
        if self.data.ndim != 1:
            raise ValueError("data should be 1-dimensional")
        self.data.sort()
        self.n = self.data.size

    def bins(self, M):
        """Return the bin edges for M bins"""
        return np.linspace(self.data[0], self.data[-1], int(M) + 1)

    def __call__(self, M):
        return self.eval(M)

    def eval(self, M):
        """Evaluate the Knuth function

        Parameters
        ----------
        M : int
            number of bins

        Returns
        -------
        F : float
            evaluation of the negative Knuth likelihood function:
            smaller values indicate a better fit.
        """
        M = int(M)

        if M <= 0:
            return np.inf

        bins = self.bins(M)
        nk, bins = np.histogram(self.data, bins)

        return -(self.n * np.log(M)
                 + gammaln(0.5 * M)
                 - M * gammaln(0.5)
                 - gammaln(self.n + 0.5 * M)
                 + np.sum(gammaln(nk + 0.5)))


@deprecated('0.4', alternative='astropy.stats.knuth_bin_width',
            warning_type=AstroMLDeprecationWarning)
def knuth_bin_width(data, return_bins=False, disp=True):
    r"""Return the optimal histogram bin width using Knuth's rule [1]_

    Parameters
    ----------
    data : array-like, ndim=1
        observed (one-dimensional) data
    return_bins : bool (optional)
        if True, then return the bin edges

    Returns
    -------
    dx : float
        optimal bin width. Bins are measured starting at the
        first data point.
    bins : ndarray
        bin edges: returned if `return_bins` is True

    Notes
    -----
    The optimal number of bins is the value M which maximizes the function

    .. math::
        F(M|x,I) = n\log(M) + \log\Gamma(\frac{M}{2})
        - M\log\Gamma(\frac{1}{2})
        - \log\Gamma(\frac{2n+M}{2})
        + \sum_{k=1}^M \log\Gamma(n_k + \frac{1}{2})

    where :math:`\Gamma` is the Gamma function, :math:`n` is the number of
    data points, :math:`n_k` is the number of measurements in bin :math:`k`.

    References
    ----------
    .. [1] Knuth, K.H. "Optimal Data-Based Binning for Histograms".
arXiv:0605197, 2006 See Also -------- KnuthF freedman_bin_width scotts_bin_width """ return astropy_stats.knuth_bin_width(data, return_bins) @deprecated('0.4', alternative='astropy.stats.histogram', warning_type=AstroMLDeprecationWarning) def histogram(a, bins=10, range=None, **kwargs): """Enhanced histogram This is a histogram function that enables the use of more sophisticated algorithms for determining bins. Aside from the `bins` argument allowing a string specified how bins are computed, the parameters are the same as numpy.histogram(). Parameters ---------- a : array_like array of data to be histogrammed bins : int or list or str (optional) If bins is a string, then it must be one of: 'blocks' : use bayesian blocks for dynamic bin widths 'knuth' : use Knuth's rule to determine bins 'scotts' : use Scott's rule to determine bins 'freedman' : use the Freedman-diaconis rule to determine bins range : tuple or None (optional) the minimum and maximum range for the histogram. If not specified, it will be (x.min(), x.max()) other keyword arguments are described in numpy.hist(). Returns ------- hist : array The values of the histogram. See `normed` and `weights` for a description of the possible semantics. bin_edges : array of dtype float Return the bin edges ``(length(hist)+1)``. See Also -------- numpy.histogram astroML.plotting.hist """ a = np.asarray(a) # if range is specified, we need to truncate the data for # the bin-finding routines if (range is not None and (bins in ['blocks', 'knuth', 'scotts', 'freedman'])): a = a[(a >= range[0]) & (a <= range[1])] if isinstance(bins, str): if bins == 'blocks': bins = astropy_stats.bayesian_blocks(a) elif bins == 'knuth': da, bins = astropy_stats.knuth_bin_width(a, True) elif bins == 'scotts': da, bins = astropy_stats.scott_bin_width(a, True) elif bins == 'freedman': da, bins = astropy_stats.freedman_bin_width(a, True) else: raise ValueError("unrecognized bin code: '{}'".format(bins)) return np.histogram(a, bins, range, **kwargs) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4247468 astroML-1.0.2/astroML/density_estimation/tests/0000755000076700000240000000000000000000000021070 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1541133836.0 astroML-1.0.2/astroML/density_estimation/tests/__init__.py0000644000076700000240000000000000000000000023167 0ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/density_estimation/tests/test_bayesian_blocks.py0000644000076700000240000000375500000000000025643 0ustar00bsipoczstaffimport numpy as np from numpy.testing import assert_allclose, assert_ from astropy.tests.helper import catch_warnings from astroML.density_estimation import bayesian_blocks from astroML.utils.exceptions import AstroMLDeprecationWarning def test_single_change_point(): np.random.seed(0) x = np.concatenate([np.random.random(100), 1 + np.random.random(200)]) with catch_warnings(AstroMLDeprecationWarning): bins = bayesian_blocks(x) assert_(len(bins) == 3) assert_allclose(bins[1], 1, rtol=0.02) def test_duplicate_events(): t = np.random.random(100) t[80:] = t[:20] x = np.ones_like(t) x[:20] += 1 with catch_warnings(AstroMLDeprecationWarning): bins1 = bayesian_blocks(t) bins2 = bayesian_blocks(t[:80], x[:80]) assert_allclose(bins1, bins2) def test_measures_fitness_homoscedastic(): np.random.seed(0) t = np.linspace(0, 1, 11) x = np.exp(-0.5 * (t - 0.5) ** 2 / 0.01 ** 2) 
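    # x is a narrow Gaussian bump centered on t = 0.5; with the
    # homoscedastic noise added below, the Bayesian-blocks segmentation
    # asserted here isolates the bump in a single central block.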
sigma = 0.05 x = np.random.normal(x, sigma) with catch_warnings(AstroMLDeprecationWarning): bins = bayesian_blocks(t, x, sigma, fitness='measures') assert_allclose(bins, [0, 0.45, 0.55, 1]) def test_measures_fitness_heteroscedastic(): np.random.seed(1) t = np.linspace(0, 1, 11) x = np.exp(-0.5 * (t - 0.5) ** 2 / 0.01 ** 2) sigma = 0.02 + 0.02 * np.random.random(len(x)) x = np.random.normal(x, sigma) with catch_warnings(AstroMLDeprecationWarning): bins = bayesian_blocks(t, x, sigma, fitness='measures') assert_allclose(bins, [0, 0.45, 0.55, 1]) def test_regular_events(): np.random.seed(0) dt = 0.01 steps = np.concatenate([np.unique(np.random.randint(0, 500, 100)), np.unique(np.random.randint(500, 1000, 200))]) t = dt * steps with catch_warnings(AstroMLDeprecationWarning): bins = bayesian_blocks(t, fitness='regular_events', dt=dt) assert_(len(bins) == 3) assert_allclose(bins[1], 5, rtol=0.05) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/density_estimation/tests/test_density.py0000644000076700000240000000214500000000000024162 0ustar00bsipoczstaff""" Test density estimation techniques """ import pytest import numpy as np from numpy.testing import assert_allclose from scipy.stats import norm from astroML.density_estimation import KNeighborsDensity, GaussianMixture1D classifiers = [KNeighborsDensity(method='simple', n_neighbors=250), KNeighborsDensity(method='bayesian', n_neighbors=250)] @pytest.mark.parametrize("clf", classifiers) def test_1D_density(clf, atol=100): np.random.seed(0) dist = norm(0, 1) X = dist.rvs((5000, 1)) X2 = np.linspace(-5, 5, 10).reshape((10, 1)) true_dens = dist.pdf(X2[:, 0]) * X.shape[0] clf.fit(X) dens = clf.eval(X2) assert_allclose(dens, true_dens, atol=atol) def test_gaussian1d(): x = np.linspace(-6, 10, 1000) means = np.array([-1.5, 0.0, 2.3]) sigmas = np.array([1, 0.25, 3.8]) weights = np.array([1, 1, 1]) gauss = GaussianMixture1D(means=means, sigmas=sigmas, weights=weights) y = gauss.pdf(x) # Check whether sampling works gauss.sample(10) dx = x[1] - x[0] integral = np.sum(y*dx) assert_allclose(integral, 1., atol=0.02) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090039.0 astroML-1.0.2/astroML/density_estimation/tests/test_empirical.py0000644000076700000240000000151300000000000024446 0ustar00bsipoczstaffimport numpy as np from numpy.testing import assert_allclose from scipy.stats import norm from astroML.density_estimation import\ EmpiricalDistribution, FunctionDistribution def test_empirical_distribution(N=1000, rseed=0): np.random.seed(rseed) X = norm.rvs(0, 1, size=N) dist = EmpiricalDistribution(X) X2 = dist.rvs(N) meanX = X.mean() meanX2 = X2.mean() stdX = X.std() stdX2 = X2.std() assert_allclose([meanX, stdX], [meanX2, stdX2], atol=3 / np.sqrt(N)) def test_function_distribution(N=1000, rseed=0): f = norm(0, 1).pdf # go from -10 to 10 to check interpolation in presence of zeros dist = FunctionDistribution(f, -10, 10) np.random.seed(rseed) X = dist.rvs(N) meanX = X.mean() stdX = X.std() assert_allclose([meanX, stdX], [0, 1], atol=3 / np.sqrt(N)) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/density_estimation/tests/test_hist_binwidth.py0000644000076700000240000000270000000000000025337 0ustar00bsipoczstaffimport numpy as np from numpy.testing import assert_allclose, assert_ from astropy.tests.helper import catch_warnings from astroML.density_estimation import \ scotts_bin_width, 
freedman_bin_width, knuth_bin_width, histogram from astroML.utils.exceptions import AstroMLDeprecationWarning def test_scotts_bin_width(N=10000, rseed=0): np.random.seed(rseed) X = np.random.normal(size=N) with catch_warnings(AstroMLDeprecationWarning): delta = scotts_bin_width(X) assert_allclose(delta, 3.5 * np.std(X) / N ** (1. / 3)) def test_freedman_bin_width(N=10000, rseed=0): np.random.seed(rseed) X = np.random.normal(size=N) with catch_warnings(AstroMLDeprecationWarning): delta = freedman_bin_width(X) v25, v75 = np.percentile(X, [25, 75]) assert_allclose(delta, 2 * (v75 - v25) / N ** (1. / 3)) def test_knuth_bin_width(N=10000, rseed=0): np.random.seed(0) X = np.random.normal(size=N) with catch_warnings(AstroMLDeprecationWarning): dx, bins = knuth_bin_width(X, return_bins=True) assert_allclose(len(bins), 59) def test_histogram(N=1000, rseed=0): np.random.seed(0) x = np.random.normal(0, 1, N) for bins in [30, np.linspace(-5, 5, 31), 'knuth', 'scotts', 'freedman']: with catch_warnings(AstroMLDeprecationWarning): counts, bins = histogram(x, bins) assert_(counts.sum() == len(x)) assert_(len(counts) == len(bins) - 1) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/density_estimation/tests/test_xdeconv.py0000644000076700000240000000230400000000000024146 0ustar00bsipoczstaffimport pytest import numpy as np from numpy.testing import assert_allclose from astroML.density_estimation import XDGMM def test_XDGMM_1D_gaussian(N=100, sigma=0.1): np.random.seed(0) mu = 0 V = 1 X = np.random.normal(mu, V, size=(N, 1)) X += np.random.normal(0, sigma, size=(N, 1)) Xerr = sigma ** 2 * np.ones((N, 1, 1)) xdgmm = XDGMM(1).fit(X, Xerr) # because of sample variance, results will be similar # but not identical. We'll use a fudge factor of 0.1 assert_allclose(mu, xdgmm.mu[0], atol=0.1) assert_allclose(V, xdgmm.V[0], atol=0.1) @pytest.mark.parametrize("D", [1, 2, 3]) def test_single_gaussian(D, N=100, sigma=0.1): np.random.seed(0) mu = np.random.random(D) V = np.random.random((D, D)) V = np.dot(V, V.T) X = np.random.multivariate_normal(mu, V, size=N) Xerr = np.zeros((N, D, D)) Xerr[:, range(D), range(D)] = sigma ** 2 X += np.random.normal(0, sigma, X.shape) xdgmm = XDGMM(1) xdgmm.fit(X, Xerr) # because of sample variance, results will be similar # but not identical. We'll use a fudge factor of 0.1 assert_allclose(mu, xdgmm.mu[0], atol=0.1) assert_allclose(V, xdgmm.V[0], atol=0.1) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/density_estimation/xdeconv.py0000644000076700000240000001603700000000000021755 0ustar00bsipoczstaff""" Extreme deconvolution solver This follows Bovy et al. http://arxiv.org/pdf/0905.2979v2.pdf Arbitrary mixing matrices R are not yet implemented: currently, this only works with R = I. 
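A minimal usage sketch (illustrative only: the data, error level, and
component count here are made up)::

    >>> import numpy as np
    >>> X = np.random.normal(size=(100, 2))        # observed points
    >>> Xerr = np.zeros((100, 2, 2))
    >>> Xerr[:, range(2), range(2)] = 0.1 ** 2     # diagonal covariances
    >>> xd = XDGMM(n_components=1).fit(X, Xerr)    # doctest: +SKIP
    >>> samples = xd.sample(10)                    # doctest: +SKIP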
""" from time import time import numpy as np from scipy import linalg try: # SciPy >= 0.19 from scipy.special import logsumexp as logsumexp except ImportError: from scipy.misc import logsumexp as logsumexp from sklearn.base import BaseEstimator from sklearn.mixture import GaussianMixture from sklearn.utils import check_random_state from ..utils import log_multivariate_gaussian class XDGMM(BaseEstimator): """Extreme Deconvolution Fit an extreme deconvolution (XD) model to the data Parameters ---------- n_components: integer number of gaussian components to fit to the data max_iter: integer (optional) number of EM iterations to perform (default=100) tol: float (optional) stopping criterion for EM iterations (default=1E-5) Notes ----- This implementation follows Bovy et al. arXiv 0905.2979 """ def __init__(self, n_components, max_iter=100, tol=1E-5, verbose=False, random_state=None): self.n_components = n_components self.max_iter = max_iter self.tol = tol self.verbose = verbose self.random_state = random_state # model parameters: these are set by the fit() method self.V = None self.mu = None self.alpha = None def fit(self, X, Xerr, R=None): """Fit the XD model to data Parameters ---------- X: array_like Input data. shape = (n_samples, n_features) Xerr: array_like Error on input data. shape = (n_samples, n_features, n_features) R : array_like (TODO: not implemented) Transformation matrix from underlying to observed data. If unspecified, then it is assumed to be the identity matrix. """ if R is not None: raise NotImplementedError("mixing matrix R is not yet implemented") X = np.asarray(X) Xerr = np.asarray(Xerr) n_samples, n_features = X.shape # assume full covariances of data assert Xerr.shape == (n_samples, n_features, n_features) # initialize components via a few steps of GaussianMixture # this doesn't take into account errors, but is a fast first-guess gmm = GaussianMixture(self.n_components, max_iter=10, covariance_type='full', random_state=self.random_state).fit(X) self.mu = gmm.means_ self.alpha = gmm.weights_ self.V = gmm.covariances_ logL = self.logL(X, Xerr) for i in range(self.max_iter): t0 = time() self._EMstep(X, Xerr) logL_next = self.logL(X, Xerr) t1 = time() if self.verbose: print("%i: log(L) = %.5g" % (i + 1, logL_next)) print(" (%.2g sec)" % (t1 - t0)) if logL_next < logL + self.tol: break logL = logL_next return self def logprob_a(self, X, Xerr): """ Evaluate the probability for a set of points Parameters ---------- X: array_like Input data. shape = (n_samples, n_features) Xerr: array_like Error on input data. shape = (n_samples, n_features, n_features) Returns ------- p: ndarray Probabilities. 
shape = (n_samples,) """ X = np.asarray(X) Xerr = np.asarray(Xerr) n_samples, n_features = X.shape # assume full covariances of data assert Xerr.shape == (n_samples, n_features, n_features) X = X[:, np.newaxis, :] Xerr = Xerr[:, np.newaxis, :, :] T = Xerr + self.V return log_multivariate_gaussian(X, self.mu, T) + np.log(self.alpha) def logL(self, X, Xerr): """Compute the log-likelihood of data given the model Parameters ---------- X: array_like data, shape = (n_samples, n_features) Xerr: array_like errors, shape = (n_samples, n_features, n_features) Returns ------- logL : float log-likelihood """ return np.sum(logsumexp(self.logprob_a(X, Xerr), -1)) def _EMstep(self, X, Xerr): """ Perform the E-step (eq 16 of Bovy et al) """ n_samples, n_features = X.shape X = X[:, np.newaxis, :] Xerr = Xerr[:, np.newaxis, :, :] w_m = X - self.mu T = Xerr + self.V # ------------------------------------------------------------ # compute inverse of each covariance matrix T Tshape = T.shape T = T.reshape([n_samples * self.n_components, n_features, n_features]) Tinv = np.array([linalg.inv(T[i]) for i in range(T.shape[0])]).reshape(Tshape) T = T.reshape(Tshape) # ------------------------------------------------------------ # evaluate each mixture at each point N = np.exp(log_multivariate_gaussian(X, self.mu, T, Vinv=Tinv)) # ------------------------------------------------------------ # E-step: # compute q_ij, b_ij, and B_ij q = (N * self.alpha) / np.dot(N, self.alpha)[:, None] tmp = np.sum(Tinv * w_m[:, :, np.newaxis, :], -1) b = self.mu + np.sum(self.V * tmp[:, :, np.newaxis, :], -1) tmp = np.sum(Tinv[:, :, :, :, np.newaxis] * self.V[:, np.newaxis, :, :], -2) B = self.V - np.sum(self.V[:, :, :, np.newaxis] * tmp[:, :, np.newaxis, :, :], -2) # ------------------------------------------------------------ # M-step: # compute alpha, m, V qj = q.sum(0) self.alpha = qj / n_samples self.mu = np.sum(q[:, :, np.newaxis] * b, 0) / qj[:, np.newaxis] m_b = self.mu - b tmp = m_b[:, :, np.newaxis, :] * m_b[:, :, :, np.newaxis] tmp += B tmp *= q[:, :, np.newaxis, np.newaxis] self.V = tmp.sum(0) / qj[:, np.newaxis, np.newaxis] def sample(self, size=1, random_state=None): if random_state is None: random_state = self.random_state rng = check_random_state(random_state) # noqa: F841 shape = tuple(np.atleast_1d(size)) + (self.mu.shape[1],) npts = np.prod(size) # noqa: F841 alpha_cs = np.cumsum(self.alpha) r = np.atleast_1d(np.random.random(size)) r.sort() ind = r.searchsorted(alpha_cs) ind = np.concatenate(([0], ind)) if ind[-1] != size: ind[-1] = size draw = np.vstack([np.random.multivariate_normal(self.mu[i], self.V[i], (ind[i + 1] - ind[i],)) for i in range(len(self.alpha))]) return draw.reshape(shape) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4253476 astroML-1.0.2/astroML/dimensionality/0000755000076700000240000000000000000000000017043 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1549602299.0 astroML-1.0.2/astroML/dimensionality/__init__.py0000644000076700000240000000005100000000000021150 0ustar00bsipoczstafffrom .iterative_pca import iterative_pca ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/dimensionality/iterative_pca.py0000644000076700000240000001147000000000000022237 0ustar00bsipoczstaffimport sys import numpy as np from scipy.linalg import solve def iterative_pca(X, M, n_ev=5, n_iter=15, norm=None, full_output=False): """ Parameters ---------- X: 
ndarray, shape = (n_samples, n_features) input data M: ndarray, bool, shape = (n_samples, n_features) mask for input data. where mask == True, the spectrum is unconstrained n_ev: int number of eigenvectors to use in reconstructing masked regions n_iter: int number of iterations to find eigenvectors norm: string what type of normalization to use on the data. Options are - None : no normalization - 'L1' : L1-norm - 'L2' : L2-norm full_output: boolean (optional) if False (default) return only the reconstructed data X_recons if True, return the full information (see below) Returns ------- X_recons: ndarray, shape = (n_samples, n_features) data with masked regions reconstructed mu: ndarray, shape = (n_features,) mean of data evecs: ndarray, shape = (min(n_samples, n_features), n_features) eigenvectors of the reconstructed data evals: ndarray, size = min(n_samples, n_features) eigenvalues of the reconstructed data norms: ndarray, size = n_samples normalization of each input coeffs: ndarray, size = (n_samples, n_ev) coefficients used to reconstruct X """ X = np.asarray(X, dtype=float) M = np.asarray(M, dtype=bool) if X.shape != M.shape: raise ValueError('X and M must have the same shape') n_samples, n_features = X.shape if np.any(M.sum(0) == n_samples): raise ValueError('Some features are masked in all samples') if type(norm) == str: norm = norm.upper() if norm not in (None, 'none', 'L1', 'L2'): raise ValueError('unrecognized norm: %s' % norm) notM = (~M) X_recons = X.copy() X_recons[M] = 0 # as an initial guess, we'll fill-in masked regions with the mean # of the rest of the sample if norm is None: mu = (X_recons * notM).sum(0) / notM.sum(0) mu = mu * np.ones([n_samples, 1]) X_recons[M] = mu[M] else: # since we're normalizing each spectrum, and the norm depends on # the filled-in values, we need to iterate a few times to make # sure things are consistent. for i in range(n_iter): # normalize if norm == 'L1': X_recons /= np.sum(X_recons, 1)[:, None] else: X_recons /= np.sqrt(np.sum(X_recons ** 2, 1))[:, None] # find the mean mu = (X_recons * notM).sum(0) / notM.sum(0) mu = mu * np.ones([n_samples, 1]) X_recons[M] = mu[M] # Matrix of coefficients coeffs = np.zeros((n_samples, n_ev)) # Now we iterate through, using the principal components to reconstruct # these regions. for i in range(n_iter): sys.stdout.write(' PCA iteration %i / %i\r' % (i + 1, n_iter)) sys.stdout.flush() # normalize the data if norm == 'L1': X_recons /= np.sum(X_recons, 1)[:, None] else: X_recons /= np.sqrt(np.sum(X_recons ** 2, 1))[:, None] # now compute the principal components mu = X_recons.mean(0) X_centered = X_recons - mu U, S, VT = np.linalg.svd(X_centered, full_matrices=False) # perform a least-squares fit to estimate the coefficients of the # first n_ev eigenvectors for each data point. # The eigenvectors are in the rows of the matrix VT. # The coefficients are given by # a_n = [V_n^T W V_n]^(-1) V_n W x # Such that x can be reconstructed via # x_n = V_n a_n # Variables here are: # x : vector length n_features. This is a data point to be # reconstructed # a_n : vector of length n. These are the reconstruction weights # V_n : eigenvector matrix of size (n_features, n). 
        #  W   : diagonal weight matrix of size (n_features, n_features)
        #        such that W[i,i] = M[i]
        #  x_n : vector of length n_features which approximates x
        VWx = np.dot(VT[:n_ev], (notM * X_centered).T)
        for i in range(n_samples):
            VWV = np.dot(VT[:n_ev], (notM[i] * VT[:n_ev]).T)
            coeffs[i] = solve(VWV, VWx[:, i], sym_pos=True,
                              overwrite_a=True)

        X_fill = mu + np.dot(coeffs, VT[:n_ev])
        X_recons[M] = X_fill[M]
    sys.stdout.write('\n')

    # un-normalize X_recons
    norms = np.zeros(n_samples)
    for i in range(n_samples):
        ratio_i = X[i][notM[i]] / X_recons[i][notM[i]]
        norms[i] = ratio_i[~np.isnan(ratio_i)][0]
        X_recons[i] *= norms[i]

    if full_output:
        return X_recons, mu, VT, S, norms, coeffs
    else:
        return X_recons
././@PaxHeader0000000000000000000000000000003200000000000010210 xustar0026 mtime=1643147665.42585
astroML-1.0.2/astroML/dimensionality/tests/0000755000076700000240000000000000000000000020205 5ustar00bsipoczstaff
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1541133836.0
astroML-1.0.2/astroML/dimensionality/tests/__init__.py0000644000076700000240000000000000000000000022304 0ustar00bsipoczstaff
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090039.0
astroML-1.0.2/astroML/dimensionality/tests/test_iterative_PCA.py0000644000076700000240000000126700000000000024303 0ustar00bsipoczstaff
import numpy as np
from numpy.testing import assert_array_almost_equal
from astroML.dimensionality import iterative_pca


def test_iterative_PCA(n_samples=50, n_features=40):
    np.random.seed(0)

    # construct some data that is well-approximated
    # by two principal components
    x = np.linspace(0, np.pi, n_features)
    x0 = np.linspace(0, np.pi, n_samples)
    X = np.sin(x) * np.cos(0.5 * (x - x0[:, None]))

    # mask 10% of the pixels
    M = (np.random.random(X.shape) > 0.9)

    # reconstruct and check accuracy
    for norm in (None, 'L1', 'L2'):
        X_recons = iterative_pca(X, M, n_ev=2, n_iter=10, norm=norm)
        assert_array_almost_equal(X, X_recons, decimal=2)
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0
astroML-1.0.2/astroML/filters.py0000644000076700000240000002356400000000000016043 0ustar00bsipoczstaff
import numpy as np
from scipy import optimize, fftpack, signal

from astroML.utils.decorators import deprecated
from astroML.utils.exceptions import AstroMLDeprecationWarning


# Note: there is a scipy PR to include an improved SG filter within the
# scipy.signal submodule. It should replace this when it's finished.
# see http://github.com/scipy/scipy/pull/304
@deprecated('1.0', alternative='scipy.signal.savgol_filter',
            warning_type=AstroMLDeprecationWarning)
def savitzky_golay(y, window_size, order, deriv=0, use_fft=True):
    r"""Smooth (and optionally differentiate) data with a Savitzky-Golay filter

    This implementation is based on [1]_.

    The Savitzky-Golay filter removes high frequency noise from data.
    It has the advantage of preserving the original shape and features
    of the signal better than other types of filtering approaches, such
    as moving-average techniques.

    Parameters
    ----------
    y : array_like, shape (N,)
        the values of the time history of the signal.
    window_size : int
        the length of the window. Must be an odd integer number.
    order : int
        the order of the polynomial used in the filtering.
        Must be less than `window_size` - 1.
    deriv: int
        the order of the derivative to compute
        (default = 0 means only smoothing)
    use_fft : bool
        if True (default) then convolve using FFT for speed

    Returns
    -------
    y_smooth : ndarray, shape (N)
        the smoothed signal (or its n-th derivative).

    Notes
    -----
    The Savitzky-Golay filter is a type of low-pass filter, particularly
    suited for smoothing noisy data. The main idea behind this approach
    is to make, for each point, a least-squares fit of a high-order
    polynomial over an odd-sized window centered at the point.

    Examples
    --------
    >>> t = np.linspace(-4, 4, 500)
    >>> y = np.exp(-t ** 2)
    >>> y_smooth = savitzky_golay(y, window_size=31, order=4)

    References
    ----------
    .. [1] http://www.scipy.org/Cookbook/SavitzkyGolay
    .. [2] A. Savitzky, M. J. E. Golay, Smoothing and Differentiation of
       Data by Simplified Least Squares Procedures. Analytical Chemistry,
       1964, 36 (8), pp 1627-1639.
    .. [3] Numerical Recipes 3rd Edition: The Art of Scientific Computing
       W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery
       Cambridge University Press ISBN-13: 9780521880688
    """
    try:
        window_size = np.abs(int(window_size))
        order = np.abs(int(order))
    except ValueError:
        raise ValueError("window_size and order have to be of type int")
    if window_size % 2 != 1 or window_size < 1:
        raise TypeError("window_size must be a positive odd number")
    if window_size < order + 2:
        raise TypeError("window_size is too small for the polynomial's order")

    order_range = range(order + 1)
    half_window = (window_size - 1) // 2

    # precompute coefficients
    b = np.array([[k ** i for i in order_range]
                  for k in range(-half_window, half_window + 1)])
    m = np.linalg.pinv(b)[deriv]

    # pad the signal at the extremes with
    # values taken from the signal itself
    firstvals = y[0] - np.abs(y[1:half_window + 1][::-1] - y[0])
    lastvals = y[-1] + np.abs(y[-half_window - 1:-1][::-1] - y[-1])
    y = np.concatenate((firstvals, y, lastvals))

    if use_fft:
        return signal.fftconvolve(y, m, mode='valid')
    else:
        return np.convolve(y, m, mode='valid')


def wiener_filter(t, h, signal='gaussian', noise='flat', return_PSDs=False,
                  signal_params=None, noise_params=None):
    """Compute a Wiener-filtered time-series

    Parameters
    ----------
    t : array_like
        evenly-sampled time series, length N
    h : array_like
        observations at each t
    signal : str (optional)
        currently only 'gaussian' is supported
    noise : str (optional)
        currently only 'flat' is supported
    return_PSDs : bool (optional)
        if True, then return (h_smooth, PSD, P_S, P_N, Phi)
    signal_params : tuple (optional)
        A starting guess at the parameters for the signal. If not
        specified, a suitable guess will be estimated from the data
        itself. (see Notes below)
    noise_params : tuple (optional)
        A starting guess at the parameters for the noise. If not
        specified, a suitable guess will be estimated from the data
        itself. (see Notes below)

    Returns
    -------
    h_smooth : ndarray
        a smoothed version of h, length N

    Notes
    -----
    The Wiener filter operates by fitting a functional form to the PSD::

       PSD = P_S + P_N

    The resulting frequency-space filter is given by::

       Phi = P_S / (P_S + P_N)

    This entire operation is equivalent to a kernel smoothing by a
    kernel whose Fourier transform is Phi.

    Choosing Signal/Noise Parameters
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    The arguments ``signal_params`` and ``noise_params`` specify the
    initial guess for the characteristics of signal and noise used in
    the minimization. They are generally expected to be tuples, and the
    meaning varies depending on the form of signal and noise used.
    For ``gaussian``, the params are (amplitude, width).
For ``flat``, the params are (amplitude,). See Also -------- scipy.signal.wiener : a static (non-adaptive) wiener filter """ # Validate signal if signal != 'gaussian': raise ValueError("only signal='gaussian' is supported") if signal_params is not None and len(signal_params) != 2: raise ValueError("signal_params should be length 2") # Validate noise if noise != 'flat': raise ValueError("only noise='flat' is supported") if noise_params is not None and len(noise_params) != 1: raise ValueError("noise_params should be length 1") # Validate t and hd t = np.asarray(t) h = np.asarray(h) if (t.ndim != 1) or (t.shape != h.shape): raise ValueError('t and h must be equal-length 1-dimensional arrays') # compute the PSD of the input N = len(t) Df = 1. / N / (t[1] - t[0]) f = fftpack.ifftshift(Df * (np.arange(N) - N / 2)) H = fftpack.fft(h) PSD = abs(H) ** 2 # fit signal/noise params if necessary if signal_params is None: amp_guess = np.max(PSD[1:]) width_guess = np.min(np.abs(f[PSD < np.mean(PSD[1:])])) signal_params = (amp_guess, width_guess) if noise_params is None: noise_params = (np.mean(PSD[1:]),) # Set up the Wiener filter: # fit a model to the PSD: sum of signal form and noise form def signal(x, A, width): width = abs(width) + 1E-99 # prevent divide-by-zero errors return A * np.exp(-0.5 * (x / width) ** 2) def noise(x, n): return n * np.ones(x.shape) # use [1:] here to remove the zero-frequency term: we don't want to # fit to this for data with an offset. def min_func(v): return np.sum((PSD[1:] - signal(f[1:], v[0], v[1]) - noise(f[1:], v[2])) ** 2) v0 = tuple(signal_params) + tuple(noise_params) v = optimize.minimize(min_func, v0, method='Nelder-Mead')['x'] P_S = signal(f, v[0], v[1]) P_N = noise(f, v[2]) Phi = P_S / (P_S + P_N) Phi[0] = 1 # correct for DC offset # Use Phi to filter and smooth the values h_smooth = fftpack.ifft(Phi * H) if not np.iscomplexobj(h): h_smooth = h_smooth.real if return_PSDs: return h_smooth, PSD, P_S, P_N, Phi else: return h_smooth def min_component_filter(x, y, feature_mask, p=1, fcut=None, Q=None): """Minimum component filtering Minimum component filtering is useful for determining the background component of a signal in the presence of spikes Parameters ---------- x : array_like 1D array of evenly spaced x values y : array_like 1D array of y values corresponding to x feature_mask : array_like 1D mask array giving the locations of features in the data which should be ignored for smoothing p : integer (optional) polynomial degree to be used for the fit (default = 1) fcut : float (optional) the cutoff frequency for the low-pass filter. Default value is f_nyq / sqrt(N) Q : float (optional) the strength of the low-pass filter. Larger Q means a steeper cutoff default value is 0.1 * fcut Returns ------- y_filtered : ndarray The filtered version of y. Notes ----- This code follows the procedure explained in the book "Practical Statistics for Astronomers" by Wall & Jenkins book, as well as in Wall, J, A&A 122:371, 1997 """ x = np.asarray(x, dtype=float) y = np.asarray(y, dtype=float) feature_mask = np.asarray(feature_mask, dtype=bool) if ((x.ndim != 1) or (x.shape != y.shape) or (y.shape != feature_mask.shape)): raise ValueError('x, y, and feature_mask must be 1 dimensional ' 'with matching lengths') if fcut is None: f_nyquist = 1. 
/ (x[1] - x[0]) fcut = f_nyquist / np.sqrt(len(x)) if Q is None: Q = 0.1 * fcut # compute polynomial features XX = x[:, None] ** np.arange(p + 1) # compute least-squares fit to non-masked data beta = np.linalg.lstsq(XX[~feature_mask], y[~feature_mask], rcond=None)[0] # subtract polynomial fit and mask the data y_mask = y - np.dot(XX, beta) y_mask[feature_mask] = 0 # get Fourier transforms of arrays yFT_mask = fftpack.fft(y_mask) # compute (shifted) frequency array for filter N = len(x) f = fftpack.ifftshift((np.arange(N) - N / 2.) * 1. / N / (x[1] - x[0])) # construct low-pass filter filt = np.exp(- (Q * (abs(f) - fcut) / fcut) ** 2) filt[abs(f) < fcut] = 1 # reconstruct filtered signal y_filtered = fftpack.ifft(yFT_mask * filt).real + np.dot(XX, beta) return y_filtered ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/fourier.py0000644000076700000240000001550600000000000016047 0ustar00bsipoczstaffimport numpy as np try: # use scipy if available: it's faster from scipy.fftpack import fft, ifft, fftshift except ImportError: from numpy.fft import fft, ifft, fftshift def FT_continuous(t, h, axis=-1, method=1): r"""Approximate a continuous 1D Fourier Transform with sampled data. This function uses the Fast Fourier Transform to approximate the continuous fourier transform of a sampled function, using the convention .. math:: H(f) = \int h(t) exp(-2 \pi i f t) dt It returns f and H, which approximate H(f). Parameters ---------- t : array_like regularly sampled array of times t is assumed to be regularly spaced, i.e. t = t0 + Dt * np.arange(N) h : array_like real or complex signal at each time axis : int axis along which to perform fourier transform. This axis must be the same length as t. Returns ------- f : ndarray frequencies of result. Units are the same as 1/t H : ndarray Fourier coefficients at each frequency. """ assert t.ndim == 1 assert h.shape[axis] == t.shape[0] N = len(t) if N % 2 != 0: raise ValueError("number of samples must be even") Dt = t[1] - t[0] Df = 1. / (N * Dt) t0 = t[N // 2] f = Df * (np.arange(N) - N // 2) shape = np.ones(h.ndim, dtype=int) shape[axis] = N phase = np.ones(N) phase[1::2] = -1 phase = phase.reshape(shape) if method == 1: H = Dt * fft(h * phase, axis=axis) else: H = Dt * fftshift(fft(h, axis=axis), axes=axis) H *= phase H *= np.exp(-2j * np.pi * t0 * f.reshape(shape)) H *= np.exp(-1j * np.pi * N / 2) return f, H def IFT_continuous(f, H, axis=-1, method=1): """Approximate a continuous 1D Inverse Fourier Transform with sampled data. This function uses the Fast Fourier Transform to approximate the continuous fourier transform of a sampled function, using the convention .. math:: H(f) = integral[ h(t) exp(-2 pi i f t) dt] h(t) = integral[ H(f) exp(2 pi i f t) dt] It returns t and h, which approximate h(t). Parameters ---------- f : array_like regularly sampled array of times t is assumed to be regularly spaced, i.e. f = f0 + Df * np.arange(N) H : array_like real or complex signal at each time axis : int axis along which to perform fourier transform. This axis must be the same length as t. Returns ------- f : ndarray frequencies of result. Units are the same as 1/t H : ndarray Fourier coefficients at each frequency. """ assert f.ndim == 1 assert H.shape[axis] == f.shape[0] N = len(f) if N % 2 != 0: raise ValueError("number of samples must be even") f0 = f[0] Df = f[1] - f[0] t0 = -0.5 / Df Dt = 1. 
/ (N * Df) t = t0 + Dt * np.arange(N) shape = np.ones(H.ndim, dtype=int) shape[axis] = N t_calc = t.reshape(shape) f_calc = f.reshape(shape) H_prime = H * np.exp(2j * np.pi * t0 * f_calc) h_prime = ifft(H_prime, axis=axis) h = N * Df * np.exp(2j * np.pi * f0 * (t_calc - t0)) * h_prime return t, h def PSD_continuous(t, h, axis=-1, method=1): r"""Approximate a continuous 1D Power Spectral Density of sampled data. This function uses the Fast Fourier Transform to approximate the continuous fourier transform of a sampled function, using the convention .. math:: H(f) = \int h(t) \exp(-2 \pi i f t) dt It returns f and PSD, which approximate PSD(f) where .. math:: PSD(f) = |H(f)|^2 + |H(-f)|^2 Parameters ---------- t : array_like regularly sampled array of times t is assumed to be regularly spaced, i.e. t = t0 + Dt * np.arange(N) h : array_like real or complex signal at each time axis : int axis along which to perform fourier transform. This axis must be the same length as t. Returns ------- f : ndarray frequencies of result. Units are the same as 1/t PSD : ndarray Fourier coefficients at each frequency. """ assert t.ndim == 1 assert h.shape[axis] == t.shape[0] N = len(t) if N % 2 != 0: raise ValueError("number of samples must be even") ax = axis % h.ndim if method == 1: # use FT_continuous f, Hf = FT_continuous(t, h, axis) Hf = np.rollaxis(Hf, ax) f = -f[N // 2::-1] PSD = abs(Hf[N // 2::-1]) ** 2 PSD[:-1] += abs(Hf[N // 2:]) ** 2 PSD = np.rollaxis(PSD, 0, ax + 1) else: # A faster way to do it is with fftshift # take advantage of the fact that phases go away Dt = t[1] - t[0] Df = 1. / (N * Dt) f = Df * np.arange(N // 2 + 1) Hf = fft(h, axis=axis) Hf = np.rollaxis(Hf, ax) PSD = abs(Hf[:N // 2 + 1]) ** 2 PSD[-1] = 0 PSD[1:] += abs(Hf[N // 2:][::-1]) ** 2 PSD[0] *= 2 PSD = Dt ** 2 * np.rollaxis(PSD, 0, ax + 1) return f, PSD def sinegauss(t, t0, f0, Q): """Sine-gaussian wavelet""" a = (f0 * 1. / Q) ** 2 return (np.exp(-a * (t - t0) ** 2) * np.exp(2j * np.pi * f0 * (t - t0))) def sinegauss_FT(f, t0, f0, Q): """Fourier transform of the sine-gaussian wavelet. This uses the convention .. math:: H(f) = integral[ h(t) exp(-2pi i f t) dt] """ a = (f0 * 1. / Q) ** 2 return (np.sqrt(np.pi / a) * np.exp(-2j * np.pi * f * t0) * np.exp(-np.pi ** 2 * (f - f0) ** 2 / a)) def sinegauss_PSD(f, t0, f0, Q): """Compute the PSD of the sine-gaussian function at frequency f .. math:: PSD(f) = |H(f)|^2 + |H(-f)|^2 """ a = (f0 * 1. / Q) ** 2 Pf = np.pi / a * np.exp(-2 * np.pi ** 2 * (f - f0) ** 2 / a) Pmf = np.pi / a * np.exp(-2 * np.pi ** 2 * (-f - f0) ** 2 / a) return Pf + Pmf def wavelet_PSD(t, h, f0, Q=1.0): """Compute the wavelet PSD as a function of f0 and t Parameters ---------- t : array_like array of times, length N h : array_like array of observed values, length N f0 : array_like array of candidate frequencies, length Nf Q : float Q-parameter for wavelet Returns ------- PSD : ndarray The 2-dimensional PSD, of shape (Nf, N), corresponding with frequencies f0 and times t. 
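
    Examples
    --------
    A minimal, illustrative sketch (the data and parameter values are
    arbitrary, chosen only to show the calling convention; note that the
    input must be evenly sampled with an even number of points):

    >>> t = np.linspace(0, 10, 1000)        # evenly sampled, even length
    >>> h = np.sin(4 * np.pi * t) * np.exp(-0.5 * (t - 5) ** 2)
    >>> f0 = np.linspace(0.5, 4, 100)       # candidate frequencies
    >>> PSD = wavelet_PSD(t, h, f0, Q=1.0)  # PSD.shape == (100, 1000)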
""" t, h, f0 = map(np.asarray, (t, h, f0)) if (t.ndim != 1) or (t.shape != h.shape): raise ValueError('t and h must be one dimensional and the same shape') if f0.ndim != 1: raise ValueError('f0 must be one dimensional') Q = Q + np.zeros_like(f0) f, H = FT_continuous(t, h) W = np.conj(sinegauss_FT(f, 0, f0[:, None], Q[:, None])) _, HW = IFT_continuous(f, H * W) return abs(HW) ** 2 ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4272823 astroML-1.0.2/astroML/linear_model/0000755000076700000240000000000000000000000016445 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/linear_model/TLS.py0000644000076700000240000000262200000000000017463 0ustar00bsipoczstaffimport numpy as np def TLS_logL(v, X, dX): """Compute the total least squares log-likelihood This uses Hogg et al eq. 29-32 Parameters ---------- v : ndarray The normal vector to the linear best fit. shape=(D,). Note that the magnitude |v| is a stand-in for the intercept. X : ndarray The input data. shape = [N, D] dX : ndarray The covariance of the errors for each point. For diagonal errors, the shape = (N, D) and the entries are dX[i] = [sigma_x1, sigma_x2 ... sigma_xD] For full covariance, the shape = (N, D, D) and the entries are dX[i] = Cov(X[i], X[i]), the full error covariance. Returns ------- logL : float The log-likelihood of the model v given the data. Notes ----- This implementation follows Hogg 2010, arXiv 1008.4686 """ # check inputs X, dX, v = map(np.asarray, (X, dX, v)) N, D = X.shape assert v.shape == (D,) assert dX.shape in ((N, D), (N, D, D)) v_norm = np.linalg.norm(v) v_hat = v / v_norm # eq. 30 Delta = np.dot(X, v_hat) - v_norm # eq. 31 if dX.ndim == 2: # diagonal covariance Sig2 = np.sum(dX * v_hat ** 2, 1) else: # full covariance Sig2 = np.dot(np.dot(v_hat, dX), v_hat) return (-0.5 * np.sum(np.log(2 * np.pi * Sig2)) - np.sum(0.5 * Delta ** 2 / Sig2)) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/linear_model/__init__.py0000644000076700000240000000035100000000000020555 0ustar00bsipoczstafffrom .linear_regression import LinearRegression, PolynomialRegression, BasisFunctionRegression from .linear_regression_errors import LinearRegressionwithErrors from .kernel_regression import NadarayaWatson from .TLS import TLS_logL ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/linear_model/kernel_regression.py0000644000076700000240000000304000000000000022534 0ustar00bsipoczstaffimport numpy as np from sklearn.base import BaseEstimator from sklearn.metrics import pairwise_kernels class NadarayaWatson(BaseEstimator): """Nadaraya-Watson Kernel Regression This is basically a gaussian-weighted moving average of points Parameters ---------- kernel : string kernel is either "gaussian", or one of the kernels available in sklearn.metrics.pairwise. h : float or array_like width of kernel. If array, its length must be the number of dimensions in the training data Additional keyword arguments are passed to the kernel. 
""" def __init__(self, kernel='gaussian', h=None, **kwargs): self.kernel = kernel self.h = h self.kwargs = kwargs def fit(self, X, y, dy=1): self.X = np.asarray(X) self.y = np.asarray(y) self.dy = np.atleast_1d(dy) return self def predict(self, X): X = np.asarray(X) if X.ndim != 2: raise ValueError('X must be two-dimensional') if X.shape[1] != self.X.shape[1]: raise ValueError('dimensions of X do not match training dimension') if self.kernel == 'gaussian': # wrangle gaussian into scikit-learn's 'rbf' kernel h = np.asarray(self.h) gamma = 0.5 / h / h K = pairwise_kernels(X, self.X, metric='rbf', gamma=gamma) else: K = pairwise_kernels(X, self.X, metric=self.kernel, **self.kwargs) K /= self.dy ** 2 return (K * self.y).sum(1) / K.sum(1) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/linear_model/linear_regression.py0000644000076700000240000001536600000000000022544 0ustar00bsipoczstaffimport numpy as np from sklearn.base import BaseEstimator from sklearn.preprocessing import PolynomialFeatures from sklearn.linear_model import LinearRegression, Lasso, Ridge # ------------------------------------------------------------ # Basis functions def gaussian_basis(X, mu, sigma): """Gaussian Basis function Parameters ---------- X : array_like input data: shape = (n_samples, n_features) mu : array_like means of bases, shape = (n_bases, n_features) sigma : float or array_like must broadcast to shape of mu Returns ------- Xg : ndarray shape = (n_samples, n_bases) """ X = np.asarray(X) mu = np.atleast_2d(mu) sigma = np.atleast_2d(sigma) n_samples, n_features = X.shape if mu.shape[1] != n_features: raise ValueError('shape of mu must match shape of X') r = (((X[:, None, :] - mu) / sigma) ** 2).sum(2) Xg = np.exp(-0.5 * r) Xg *= 1. / np.sqrt(2 * np.pi) / sigma.prod(1) return Xg class LinearRegression(BaseEstimator): """Simple Linear Regression with errors in y This is a stripped-down version of sklearn.linear_model.LinearRegression which can correctly accounts for errors in the y variable Parameters ---------- fit_intercept : bool (optional) if True (default) then fit the intercept of the data regularization : string (optional) ['l1'|'l2'|'none'] Use L1 (Lasso) or L2 (Ridge) regression kwds: dict additional keyword arguments passed to sklearn estimators: LinearRegression, Lasso (L1), or Ridge (L2) Notes ----- This implementation may be compared to that in sklearn.linear_model.LinearRegression. 
The difference is that here errors are """ _regressors = {'none': LinearRegression, 'l1': Lasso, 'l2': Ridge} def __init__(self, fit_intercept=True, regularization='none', kwds=None): if regularization.lower() not in ['l1', 'l2', 'none']: raise ValueError("regularization='{}' not recognized" "".format(regularization)) self.fit_intercept = fit_intercept self.regularization = regularization self.kwds = kwds def _transform_X(self, X): X = np.asarray(X) if self.fit_intercept: X = np.hstack([np.ones([X.shape[0], 1]), X]) return X @staticmethod def _scale_by_error(X, y, y_error=1): """Scale regression by error on y""" X = np.atleast_2d(X) y = np.asarray(y) y_error = np.asarray(y_error) assert X.ndim == 2 assert y.ndim == 1 assert X.shape[0] == y.shape[0] if y_error.ndim == 0: return X / y_error, y / y_error elif y_error.ndim == 1: assert y_error.shape == y.shape X_out, y_out = X / y_error[:, None], y / y_error elif y_error.ndim == 2: assert y_error.shape == (y.size, y.size) evals, evecs = np.linalg.eigh(y_error) X_out = np.dot(evecs * (evals ** -0.5), np.dot(evecs.T, X)) y_out = np.dot(evecs * (evals ** -0.5), np.dot(evecs.T, y)) else: raise ValueError("shape of y_error does not match that of y") return X_out, y_out def _choose_regressor(self): model = self._regressors.get(self.regularization.lower(), None) if model is None: raise ValueError("regularization='{}' unrecognized" "".format(self.regularization)) return model def fit(self, X, y, y_error=1): kwds = {} if self.kwds is not None: kwds.update(self.kwds) kwds['fit_intercept'] = False model = self._choose_regressor() self.clf_ = model(**kwds) X = self._transform_X(X) X, y = self._scale_by_error(X, y, y_error) self.clf_.fit(X, y) return self def predict(self, X): X = self._transform_X(X) return self.clf_.predict(X) @property def coef_(self): return self.clf_.coef_ class PolynomialRegression(LinearRegression): """Polynomial Regression with errors in y Parameters ---------- degree : int degree of the polynomial. interaction_only : bool (optional) If true, only interaction features are produced: features that are products of at most ``degree`` *distinct* input features (so not ``x[1] ** 2``, ``x[0] * x[2] ** 3``, etc.). fit_intercept : bool (optional) if True (default) then fit the intercept of the data regularization : string (optional) ['l1'|'l2'|'none'] Use L1 (Lasso) or L2 (Ridge) regression kwds: dict additional keyword arguments passed to sklearn estimators: LinearRegression, Lasso (L1), or Ridge (L2) """ def __init__(self, degree=1, interaction_only=False, fit_intercept=True, regularization='none', kwds=None): self.degree = degree self.interaction_only = interaction_only LinearRegression.__init__(self, fit_intercept, regularization, kwds) def _transform_X(self, X): trans = PolynomialFeatures(degree=self.degree, interaction_only=self.interaction_only, include_bias=self.fit_intercept) return trans.fit_transform(X) class BasisFunctionRegression(LinearRegression): """Basis Function with errors in y Parameters ---------- basis_func : str or function specify the basis function to use. This should take an input matrix of size (n_samples, n_features), along with optional parameters, and return a matrix of size (n_samples, n_bases). 
    fit_intercept : bool (optional)
        if True (default) then fit the intercept of the data
    regularization : string (optional)
        ['l1'|'l2'|'none']
        Use L1 (Lasso) or L2 (Ridge) regression
    kwds : dict
        additional keyword arguments passed to sklearn estimators:
        LinearRegression, Lasso (L1), or Ridge (L2)
    """
    _basis_funcs = {'gaussian': gaussian_basis}

    def __init__(self, basis_func='gaussian', fit_intercept=True,
                 regularization='none', kwds=None, **kwargs):
        self.basis_func = basis_func
        self.kwargs = kwargs
        LinearRegression.__init__(self, fit_intercept, regularization, kwds)

    def _transform_X(self, X):
        if callable(self.basis_func):
            basis_func = self.basis_func
        else:
            basis_func = self._basis_funcs.get(self.basis_func, None)

        X = basis_func(X, **self.kwargs)

        if self.fit_intercept:
            X = np.hstack([np.ones((X.shape[0], 1)), X])
        return X


astroML-1.0.2/astroML/linear_model/linear_regression_errors.py

import numpy as np
import warnings

try:
    import pymc3 as pm
    import theano.tensor as tt
    from packaging.version import Version
    PYMC_LT_39 = Version(pm.__version__) < Version("3.9")
except ImportError:
    warnings.warn('LinearRegressionwithErrors requires PyMC3 to be installed')
    PYMC_LT_39 = True

from astroML.linear_model import LinearRegression

__all__ = ['LinearRegressionwithErrors']


class LinearRegressionwithErrors(LinearRegression):

    def __init__(self, fit_intercept=False, regularization='none', kwds=None):
        super().__init__(fit_intercept, regularization, kwds)

    def fit(self, X, y, y_error=1, x_error=None, *,
            sample_kwargs={'draws': 1000, 'target_accept': 0.9}):

        if not PYMC_LT_39:
            sample_kwargs['return_inferencedata'] = False

        kwds = {}
        if self.kwds is not None:
            kwds.update(self.kwds)
        kwds['fit_intercept'] = False
        model = self._choose_regressor()
        self.clf_ = model(**kwds)

        self.fit_intercept = False

        if x_error is not None:
            x_error = np.atleast_2d(x_error)

        with pm.Model():
            # slope and intercept of eta-ksi relation
            slope = pm.Flat('slope', shape=(X.shape[0], ))
            inter = pm.Flat('inter')

            # intrinsic scatter of eta-ksi relation
            int_std = pm.HalfFlat('int_std')
            # standard deviation of Gaussian that ksi are drawn from
            # (assumed mean zero)
            tau = pm.HalfFlat('tau', shape=(X.shape[0],))
            # intrinsic ksi
            mu = pm.Normal('mu', mu=0, sigma=tau, shape=(X.shape[0],))

            # Some wizarding with the dimensions all around.
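            # (Illustrative note on the shapes, not a change in behaviour:
            # X is passed as (n_features, n_samples), so ksi is given shape
            # X.T.shape = (n_samples, n_features) to broadcast against the
            # per-feature mu and tau; the transposes in the eta term below
            # undo this again for the dot product with slope.)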
ksi = pm.Normal('ksi', mu=mu, tau=tau, shape=X.T.shape) # intrinsic eta-ksi linear relation + intrinsic scatter eta = pm.Normal('eta', mu=(tt.dot(slope.T, ksi.T) + inter), sigma=int_std, shape=y.shape) # observed xi, yi x = pm.Normal('xi', mu=ksi.T, sigma=x_error, observed=X, shape=X.shape) # noqa: F841 y = pm.Normal('yi', mu=eta, sigma=y_error, observed=y, shape=y.shape) self.trace = pm.sample(**sample_kwargs) # TODO: make it optional to choose a way to define best HND, edges = np.histogramdd(np.hstack((self.trace['slope'], self.trace['inter'][:, None])), bins=50) w = np.where(HND == HND.max()) # choose the maximum posterior slope and intercept slope_best = [edges[i][w[i][0]] for i in range(len(edges) - 1)] intercept_best = edges[-1][w[-1][0]] self.clf_.coef_ = np.array([intercept_best, *slope_best]) return self ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1643147665.428316 astroML-1.0.2/astroML/linear_model/tests/0000755000076700000240000000000000000000000017607 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1541133836.0 astroML-1.0.2/astroML/linear_model/tests/__init__.py0000644000076700000240000000000000000000000021706 0ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1549602299.0 astroML-1.0.2/astroML/linear_model/tests/test_TLS.py0000644000076700000240000000071000000000000021660 0ustar00bsipoczstaffimport numpy as np from numpy.testing import assert_allclose from ..TLS import TLS_logL def test_TLS_likelihood_diagonal(rseed=0): """Test Total-Least-Squares fit with diagonal covariance""" np.random.seed(rseed) X = np.random.rand(10, 2) dX1 = 0.1 * np.ones((10, 2)) dX2 = 0.1 * np.array([np.eye(2) for i in range(10)]) v = np.random.random(2) assert_allclose(TLS_logL(v, X, dX1), TLS_logL(v, X, dX2)) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/linear_model/tests/test_kernel_regression.py0000644000076700000240000000266400000000000024750 0ustar00bsipoczstaffimport numpy as np from numpy.testing import assert_allclose import pytest from astroML.linear_model import NadarayaWatson def test_NW_simple(): X = np.arange(11.) y = X + 1 dy = 1 # by symmetry, NW regression should get these exactly correct Xfit = np.array([4, 5, 6])[:, None] y_true = np.ravel(Xfit + 1) clf = NadarayaWatson(h=0.5).fit(X[:, None], y, dy) y_fit = clf.predict(Xfit) assert_allclose(y_fit, y_true) def test_NW_simple_laplacian_kernel(): X = np.arange(11.) y = X + 1 dy = 1 # by symmetry, NW regression should get these exactly correct Xfit = np.array([4, 5, 6])[:, None] y_true = np.ravel(Xfit + 1) kwargs = {'gamma': 10.} clf = NadarayaWatson(kernel='laplacian', **kwargs).fit(X[:, None], y, dy) y_fit = clf.predict(Xfit) assert_allclose(y_fit, y_true) def test_X_invalid_shape_exception(): X = np.arange(11.) 
y = X + 1 dy = 1 clf = NadarayaWatson(h=0.5).fit(X[:, None], y, dy) # not valid Xfit.shape[1], should raise an exception Xfit = np.array([[4, 5, 6], [1, 2, 3]]) with pytest.raises(Exception) as e: clf.predict(Xfit) assert str(e.value) == "dimensions of X do not match training dimension" # not valid Xfit.shape[1], should raise an exception Xfit = np.array([4, 5, 6]) with pytest.raises(Exception) as e: clf.predict(Xfit) assert str(e.value) == "X must be two-dimensional" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/linear_model/tests/test_linear_regression.py0000644000076700000240000000710400000000000024734 0ustar00bsipoczstaffimport pytest import numpy as np from numpy.testing import assert_allclose from sklearn.linear_model import LinearRegression as skLinearRegression from astroML.linear_model import \ LinearRegression, PolynomialRegression, BasisFunctionRegression try: import pymc3 as pm # noqa: F401 HAS_PYMC3 = True except ImportError: HAS_PYMC3 = False def test_error_transform_diag(N=20, rseed=0): rng = np.random.RandomState(rseed) X = rng.rand(N, 2) yerr = 0.05 * (1 + rng.rand(N)) y = (X[:, 0] ** 2 + X[:, 1]) + yerr * rng.randn(N) Sigma = np.eye(N) * yerr ** 2 X1, y1 = LinearRegression._scale_by_error(X, y, yerr) X2, y2 = LinearRegression._scale_by_error(X, y, Sigma) assert_allclose(X1, X2) assert_allclose(y1, y2) def test_error_transform_full(N=20, rseed=0): rng = np.random.RandomState(rseed) X = rng.rand(N, 2) # generate a pos-definite error matrix Sigma = 0.05 * rng.randn(N, N) u, s, v = np.linalg.svd(Sigma) Sigma = np.dot(u * s, u.T) # draw y from this error distribution y = (X[:, 0] ** 2 + X[:, 1]) y = rng.multivariate_normal(y, Sigma) X2, y2 = LinearRegression._scale_by_error(X, y, Sigma) # check that the form entering the chi^2 is correct assert_allclose(np.dot(X2.T, X2), np.dot(X.T, np.linalg.solve(Sigma, X))) assert_allclose(np.dot(y2, y2), np.dot(y, np.linalg.solve(Sigma, y))) def test_LinearRegression_simple(): """ Test a simple linear regression """ x = np.arange(10.).reshape((10, 1)) y = np.arange(10.) + 1 dy = 1 clf = LinearRegression().fit(x, y, dy) y_true = clf.predict(x) assert_allclose(y, y_true, atol=1E-10) def test_LinearRegression_err(): """ Test that errors are correctly accounted for By comparing to scikit-learn LinearRegression """ np.random.seed(0) X = np.random.random((10, 1)) y = np.random.random(10) + 1 dy = 0.1 y = np.random.normal(y, dy) clf1 = LinearRegression().fit(X, y, dy) clf2 = skLinearRegression().fit(X / dy, y / dy) assert_allclose(clf1.coef_[1:], clf2.coef_) assert_allclose(clf1.coef_[0], clf2.intercept_ * dy) def test_LinearRegression_fit_intercept(): np.random.seed(0) X = np.random.random((10, 1)) y = np.random.random(10) clf1 = LinearRegression(fit_intercept=False).fit(X, y) clf2 = skLinearRegression(fit_intercept=False).fit(X, y) assert_allclose(clf1.coef_, clf2.coef_) def test_PolynomialRegression_simple(): x = np.arange(10.).reshape((10, 1)) y = np.arange(10.) dy = 1 clf = PolynomialRegression(2).fit(x, y, dy) y_true = clf.predict(x) assert_allclose(y, y_true, atol=1E-10) def test_BasisfunctionRegression_simple(): x = np.arange(10.).reshape((10, 1)) y = np.arange(10.) 
+ 1 dy = 1 mu = np.arange(11.)[:, None] sigma = 1.0 clf = BasisFunctionRegression(mu=mu, sigma=sigma).fit(x, y, dy) y_true = clf.predict(x) assert_allclose(y, y_true, atol=1E-10) @pytest.mark.skipif('not HAS_PYMC3') def test_LinearRegressionwithErrors(): """ Test for small errors agrees with fit with y errors only """ from astroML.linear_model import LinearRegressionwithErrors np.random.seed(0) X = np.random.random(10) + 1 dy = np.random.random(10) * 0.1 y = X * 2 + 1 + (dy - 0.05) dx = np.random.random(10) * 0.01 X = X + (dx - 0.005) clf1 = LinearRegression().fit(X[:, None], y, dy) clf2 = LinearRegressionwithErrors().fit(np.atleast_2d(X), y, dy, dx) assert_allclose(clf1.coef_, clf2.coef_, 0.2) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/lumfunc.py0000644000076700000240000001277200000000000016047 0ustar00bsipoczstaffimport numpy as np def _sorted_interpolate(x, y, x_eval): """utility function for binned_Cminus""" # note that x should be sorted N = len(x) ind = x.searchsorted(x_eval) ind[ind == N] = N - 1 y_eval = np.zeros(x_eval.shape) # find perfect matches match = (x[ind] == x_eval) | (x_eval > x[-1]) | (x_eval < x[0]) y_eval[match] = y[ind[match]] ind = ind[~match] # take care of extrapolation ind[ind == 0] = 1 x_lo = x[ind - 1] x_up = x[ind] y_lo = y[ind - 1] y_up = y[ind] # take care of places where x_lo = x_up y_eval[~match] = (y_lo + (x_eval[~match] - x_lo) * (y_up - y_lo) / (x_up - x_lo)) return y_eval def Cminus(x, y, xmax, ymax): """Lynden-Bell's C-minus method Parameters ---------- x : array_like array of x values y : array_like array of y values xmax : array_like array of maximum x values for each y value ymax : array_like array of maximum y values for each x value Returns ------- Nx, Ny, cuml_x, cuml_y: ndarrays Nx and cuml_x are in the order of the sorted x array Ny and cuml_y are in the order of the sorted y array """ # make copies of input x, y, xmax, ymax = map(np.array, (x, y, xmax, ymax)) Nall = len(x) cuml_x = np.zeros(x.shape) cuml_y = np.zeros(y.shape) Nx = np.zeros(x.shape) Ny = np.zeros(y.shape) # first the y direction. i_sort = np.argsort(y) x = x[i_sort] y = y[i_sort] xmax = xmax[i_sort] ymax = ymax[i_sort] for j in range(1, Nall): # Making sure we don't divide with 0 later objects = np.sum(x[:j] < xmax[j]) if objects: Ny[j] = objects else: Ny[j] = np.inf Ny[0] = np.inf cuml_y = np.cumprod(1. + 1. / Ny) Ny[np.isinf(Ny)] = 0 # renormalize cuml_y *= Nall / cuml_y[-1] # now the x direction i_sort = np.argsort(x) x = x[i_sort] y = y[i_sort] xmax = xmax[i_sort] ymax = ymax[i_sort] for i in range(1, Nall): # Making sure we don't divide with 0 later objects = np.sum(y[:i] < ymax[i]) if objects: Nx[i] = objects else: Nx[i] = np.inf Nx[0] = np.inf cuml_x = np.cumprod(1. + 1. / Nx) Nx[np.isinf(Nx)] = 0 # renormalize cuml_x *= Nall / cuml_x[-1] return Nx, Ny, cuml_x, cuml_y def binned_Cminus(x, y, xmax, ymax, xbins, ybins, normalize=False): """Compute the binned distributions using the Cminus method Parameters ---------- x : array_like array of x values y : array_like array of y values xmax : array_like array of maximum x values for each y value ymax : array_like array of maximum y values for each x value xbins : array_like array of bin edges for the x function: size=Nbins_x + 1 ybins : array_like array of bin edges for the y function: size=Nbins_y + 1 normalize : boolean if true, then returned distributions are normalized. Default is False. 
Returns ------- dist_x, dist_y : ndarrays distributions of size Nbins_x and Nbins_y """ Nx, Ny, cuml_x, cuml_y = Cminus(x, y, xmax, ymax) # simple linear interpolation using a binary search # interpolate the cumulative distributions x_sort = np.sort(x) y_sort = np.sort(y) Ix_edges = _sorted_interpolate(x_sort, cuml_x, xbins) Iy_edges = _sorted_interpolate(y_sort, cuml_y, ybins) if xbins[0] < x_sort[0]: Ix_edges[0] = cuml_x[0] if xbins[-1] > x_sort[-1]: Ix_edges[-1] = cuml_x[-1] if ybins[0] < y_sort[0]: Iy_edges[0] = cuml_y[0] if ybins[-1] > y_sort[-1]: Iy_edges[-1] = cuml_y[-1] x_dist = np.diff(Ix_edges) / np.diff(xbins) y_dist = np.diff(Iy_edges) / np.diff(ybins) if normalize: x_dist /= len(x) y_dist /= len(y) return x_dist, y_dist def bootstrap_Cminus(x, y, xmax, ymax, xbins, ybins, Nbootstraps=10, normalize=False): """ Compute the binned distributions using the Cminus method, with bootstrapped estimates of the errors Parameters ---------- x : array_like array of x values y : array_like array of y values xmax : array_like array of maximum x values for each y value ymax : array_like array of maximum y values for each x value xbins : array_like array of bin edges for the x function: size=Nbins_x + 1 ybins : array_like array of bin edges for the y function: size=Nbins_y + 1 Nbootstraps : int number of bootstrap resamplings to perform normalize : boolean if true, then returned distributions are normalized. Default is False. Returns ------- dist_x, err_x, dist_y, err_y : ndarrays distributions of size Nbins_x and Nbins_y """ x, y, xmax, ymax = map(np.asarray, (x, y, xmax, ymax)) x_dist = np.zeros((Nbootstraps, len(xbins) - 1)) y_dist = np.zeros((Nbootstraps, len(ybins) - 1)) for i in range(Nbootstraps): ind = np.random.randint(0, len(x), len(x)) x_dist[i], y_dist[i] = binned_Cminus(x[ind], y[ind], xmax[ind], ymax[ind], xbins, ybins, normalize=normalize) return (x_dist.mean(0), x_dist.std(0, ddof=1), y_dist.mean(0), y_dist.std(0, ddof=1)) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4310377 astroML-1.0.2/astroML/plotting/0000755000076700000240000000000000000000000015653 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/plotting/__init__.py0000644000076700000240000000043400000000000017765 0ustar00bsipoczstafffrom .hist_tools import hist from .scatter_contour import scatter_contour from .mcmc import plot_mcmc from .ellipse import plot_tissot_ellipse from .multiaxes import MultiAxes from .settings import setup_text_plots from .regression import plot_regressions, plot_regression_from_trace ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1549602299.0 astroML-1.0.2/astroML/plotting/ellipse.py0000644000076700000240000000170000000000000017660 0ustar00bsipoczstaffimport numpy as np def plot_tissot_ellipse(longitude, latitude, radius, ax=None, **kwargs): """Plot Tissot Ellipse/Tissot Indicatrix Parameters ---------- longitude : float or array_like longitude of ellipse centers (radians) latitude : float or array_like latitude of ellipse centers (radians) radius : float or array_like radius of ellipses ax : Axes object (optional) matplotlib axes instance on which to draw ellipses. Other Parameters ---------------- other keyword arguments will be passed to matplotlib.patches.Ellipse. 
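
    Examples
    --------
    A minimal sketch (the projection and the numbers are illustrative
    only):

    >>> import numpy as np
    >>> import matplotlib.pyplot as plt
    >>> fig = plt.figure()
    >>> ax = fig.add_subplot(111, projection='hammer')
    >>> lon = np.linspace(-2, 2, 5)
    >>> plot_tissot_ellipse(lon, 0.0, 0.3, ax=ax, alpha=0.4)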
""" # Import here so that testing with Agg will work from matplotlib import pyplot as plt from matplotlib.patches import Ellipse if ax is None: ax = plt.gca() for long, lat, rad in np.broadcast(longitude, latitude, radius): el = Ellipse((long, lat), radius / np.cos(lat), radius, **kwargs) ax.add_patch(el) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/plotting/hist_tools.py0000644000076700000240000000532200000000000020416 0ustar00bsipoczstaffimport warnings import numpy as np from astropy.stats import (scott_bin_width, freedman_bin_width, knuth_bin_width, bayesian_blocks) from astroML.utils import deprecated from astroML.utils.exceptions import AstroMLDeprecationWarning @deprecated('0.4', alternative='astropy.visualization.hist', warning_type=AstroMLDeprecationWarning) def hist(x, bins=10, range=None, *args, **kwargs): """Enhanced histogram This is a histogram function that enables the use of more sophisticated algorithms for determining bins. Aside from the `bins` argument allowing a string specified how bins are computed, the parameters are the same as pylab.hist(). Parameters ---------- x : array_like array of data to be histogrammed bins : int or list or str (optional) If bins is a string, then it must be one of: 'blocks' : use bayesian blocks for dynamic bin widths 'knuth' : use Knuth's rule to determine bins 'scott' : use Scott's rule to determine bins 'freedman' : use the Freedman-diaconis rule to determine bins range : tuple or None (optional) the minimum and maximum range for the histogram. If not specified, it will be (x.min(), x.max()) ax : Axes instance (optional) specify the Axes on which to draw the histogram. If not specified, then the current active axes will be used. **kwargs : other keyword arguments are described in pylab.hist(). """ if isinstance(bins, str) and "weights" in kwargs: warnings.warn("weights argument is not supported: it will be ignored.") kwargs.pop('weights') x = np.asarray(x) if 'ax' in kwargs: ax = kwargs['ax'] del kwargs['ax'] else: # import here so that testing with Agg will work from matplotlib import pyplot as plt ax = plt.gca() # if range is specified, we need to truncate the data for # the bin-finding routines if (range is not None and (bins in ['blocks', 'knuth', 'knuths', 'scott', 'scotts', 'freedman', 'freedmans'])): x = x[(x >= range[0]) & (x <= range[1])] if bins in ['blocks']: bins = bayesian_blocks(x) elif bins in ['knuth', 'knuths']: dx, bins = knuth_bin_width(x, True) elif bins in ['scott', 'scotts']: dx, bins = scott_bin_width(x, True) elif bins in ['freedman', 'freedmans']: dx, bins = freedman_bin_width(x, True) elif isinstance(bins, str): raise ValueError("unrecognized bin code: '{}'".format(bins)) return ax.hist(x, bins, range, **kwargs) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1549602299.0 astroML-1.0.2/astroML/plotting/mcmc.py0000644000076700000240000001026700000000000017152 0ustar00bsipoczstaffimport numpy as np def convert_to_stdev(logL): """ Given a grid of log-likelihood values, convert them to cumulative standard deviation. This is useful for drawing contours from a grid of likelihoods. 
""" sigma = np.exp(logL) shape = sigma.shape sigma = sigma.ravel() # obtain the indices to sort and unsort the flattened array i_sort = np.argsort(sigma)[::-1] i_unsort = np.argsort(i_sort) sigma_cumsum = sigma[i_sort].cumsum() sigma_cumsum /= sigma_cumsum[-1] return sigma_cumsum[i_unsort].reshape(shape) def plot_mcmc(traces, labels=None, limits=None, true_values=None, fig=None, contour=True, scatter=False, levels=[0.683, 0.955], bins=20, bounds=[0.08, 0.08, 0.95, 0.95], **kwargs): """Plot a grid of MCMC results Parameters ---------- traces : array_like the MCMC chain traces. shape is [Ndim, Nchain] labels : list of strings (optional) if specified, the label associated with each trace limits : list of tuples (optional) if specified, the axes limits for each trace true_values : list of floats (optional) if specified, the true value for each trace (will be indicated with an 'X' on the plot) fig : matplotlib.Figure (optional) the figure on which to draw the axes. If not specified, a new one will be created. contour : bool (optional) if True, then draw contours in each subplot. Default=True. scatter : bool (optional) if True, then scatter points in each subplot. Default=False. levels : list of floats the list of percentile levels at which to plot contours. Each entry should be between 0 and 1 bins : int, tuple, array, or tuple of arrays the binning parameter passed to np.histogram2d. It is assumed that the point density is constant on the scale of the bins bounds : list of floats the bounds of the set of axes used for plotting additional keyword arguments are passed to scatter() and contour() Returns ------- axes_list : list of matplotlib.Axes instances the list of axes created by the routine """ # Import here so that testing with Agg will work from matplotlib import pyplot as plt if fig is None: fig = plt.figure(figsize=(8, 8)) if limits is None: limits = [(t.min(), t.max()) for t in traces] if labels is None: labels = ['' for t in traces] num_traces = len(traces) bins = [np.linspace(limits[i][0], limits[i][1], bins + 1) for i in range(num_traces)] xmin, xmax = bounds[0], bounds[2] ymin, ymax = bounds[1], bounds[3] dx = (xmax - xmin) * 1. / (num_traces - 1) dy = (ymax - ymin) * 1. 
/ (num_traces - 1) axes_list = [] for j in range(1, num_traces): for i in range(j): ax = fig.add_axes([xmin + i * dx, ymin + (num_traces - 1 - j) * dy, dx, dy]) if scatter: plt.scatter(traces[i], traces[j], **kwargs) if contour: H, xbins, ybins = np.histogram2d(traces[i], traces[j], bins=(bins[i], bins[j])) H[H == 0] = 1E-16 Nsigma = convert_to_stdev(np.log(H)) ax.contour(0.5 * (xbins[1:] + xbins[:-1]), 0.5 * (ybins[1:] + ybins[:-1]), Nsigma.T, levels=levels, **kwargs) if i == 0: ax.set_ylabel(labels[j]) else: ax.yaxis.set_major_formatter(plt.NullFormatter()) if j == num_traces - 1: ax.set_xlabel(labels[i]) else: ax.xaxis.set_major_formatter(plt.NullFormatter()) if true_values is not None: ax.plot(limits[i], [true_values[j], true_values[j]], ':k', lw=1) ax.plot([true_values[i], true_values[i]], limits[j], ':k', lw=1) ax.set_xlim(limits[i]) ax.set_ylim(limits[j]) axes_list.append(ax) return axes_list ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/plotting/multiaxes.py0000644000076700000240000002452700000000000020252 0ustar00bsipoczstaff""" Multi-panel plotting """ from copy import deepcopy import numpy as np class MultiAxes: """Visualize Multiple-dimensional data This class enables the visualization of multi-dimensional data, using a triangular grid of 2D plots. Parameters ---------- ndim : integer Number of data dimensions inner_labels : bool If true, then label the inner axes. If false, then only the outer axes will be labeled fig : matplotlib.Figure if specified, draw the plot on this figure. Otherwise, use the current active figure. left, bottom, right, top, wspace, hspace : floats these parameters control the layout of the plots. They behave have an identical effect as the arguments to plt.subplots_adjust. If not specified, default values from the rc file will be used. Examples -------- A grid of scatter plots can be created as follows:: x = np.random.normal((4, 1000)) R = np.random.random((4, 4)) # projection matrix x = np.dot(R, x) ax = MultiAxes(4) ax.scatter(x) ax.set_labels(['x1', 'x2', 'x3', 'x4']) Alternatively, the scatter plot can be visualized as a density:: ax = MultiAxes(4) ax.density(x, bins=[20, 20, 20, 20]) """ def __init__(self, ndim, inner_labels=False, fig=None, left=None, bottom=None, right=None, top=None, wspace=None, hspace=None): # Import here so that testing with Agg will work from matplotlib import pyplot as plt if fig is None: fig = plt.gcf() self.fig = fig self.ndim = ndim self.inner_labels = inner_labels self._update('left', left) self._update('bottom', bottom) self._update('right', right) self._update('top', top) self._update('wspace', wspace) self._update('hspace', hspace) self.axes = self._draw_panels() def _update(self, s, val): # Import here so that testing with Agg will work from matplotlib import rcParams if val is None: val = getattr(self, s, None) if val is None: key = 'figure.subplot.' 
+ s val = rcParams[key] setattr(self, s, val) def _check_data(self, data): data = np.asarray(data) if data.ndim != 2: raise ValueError("data dimension should be 2") if data.shape[1] != self.ndim: raise ValueError("leading dimension of data should match ndim") return data def _draw_panels(self): # Import here so that testing with Agg will work from matplotlib import pyplot as plt if self.top <= self.bottom: raise ValueError('top must be larger than bottom') if self.right <= self.left: raise ValueError('right must be larger than left') ndim = self.ndim panel_width = ((self.right - self.left) / (ndim - 1 + self.wspace * (ndim - 2))) panel_height = ((self.top - self.bottom) / (ndim - 1 + self.hspace * (ndim - 2))) full_panel_width = (1 + self.wspace) * panel_width full_panel_height = (1 + self.hspace) * panel_height axes = np.empty((ndim, ndim), dtype=object) axes.fill(None) for j in range(1, ndim): for i in range(j): left = self.left + i * full_panel_width right = self.bottom + (ndim - 1 - j) * full_panel_height ax = self.fig.add_axes([left, right, panel_width, panel_height]) axes[i, j] = ax if not self.inner_labels: # remove unneeded x labels for i in range(ndim): for j in range(ndim - 1): ax = axes[i, j] if ax is not None: ax.xaxis.set_major_formatter(plt.NullFormatter()) # remove unneeded y labels for i in range(1, ndim): for j in range(ndim): ax = axes[i, j] if ax is not None: ax.yaxis.set_major_formatter(plt.NullFormatter()) return np.asarray(axes, dtype=object) def set_limits(self, limits): """Set the axes limits Parameters ---------- limits : list of tuples a list of plot limits for each dimension, each in the form (xmin, xmax). The length of `limits` should match the data dimension. """ if len(limits) != self.ndim: raise ValueError("limits do not match number of dimensions") for i in range(self.ndim): for j in range(self.ndim): ax = self.axes[i, j] if ax is not None: ax.set_xlim(limits[i]) ax.set_ylim(limits[j]) def set_labels(self, labels): """Set the axes labels Parameters ---------- labels : list of strings a list of plot limits for each dimension. The length of `labels` should match the data dimension. """ if len(labels) != self.ndim: raise ValueError("labels do not match number of dimensions") for i in range(self.ndim): ax = self.axes[i, self.ndim - 1] if ax is not None: ax.set_xlabel(labels[i]) for j in range(self.ndim): ax = self.axes[0, j] if ax is not None: ax.set_ylabel(labels[j]) def set_locators(self, locators): """Set the tick locators for the plots Parameters ---------- locators : list or plt.Locator object If a list, then the length should match the data dimension. If a single Locator instance, then each axes will be given the same locator. """ # Import here so that testing with Agg will work from matplotlib import pyplot as plt if isinstance(locators, plt.Locator): locators = [deepcopy(locators) for i in range(self.ndim)] elif len(locators) != self.ndim: raise ValueError("locators do not match number of dimensions") for i in range(self.ndim): for j in range(self.ndim): ax = self.axes[i, j] if ax is not None: ax.xaxis.set_major_locator(locators[i]) ax.yaxis.set_major_locator(locators[j]) def set_formatters(self, formatters): """Set the tick formatters for the outer edge of plots Parameters ---------- formatterss : list or plt.Formatter object If a list, then the length should match the data dimension. If a single Formatter instance, then each axes will be given the same locator. 
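
        Examples
        --------
        An illustrative use (a single Formatter instance is copied to
        every dimension)::

            from matplotlib.ticker import FormatStrFormatter
            ax = MultiAxes(3)
            ax.set_formatters(FormatStrFormatter('%.1f'))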
""" # Import here so that testing with Agg will work from matplotlib import pyplot as plt if isinstance(formatters, plt.Formatter): formatters = [deepcopy(formatters) for i in range(self.ndim)] elif len(formatters) != self.ndim: raise ValueError("formatters do not match number of dimensions") for i in range(self.ndim): ax = self.axes[i, self.ndim - 1] if ax is not None: ax.xaxis.set_major_formatter(formatters[i]) for j in range(self.ndim): ax = self.axes[0, j] if ax is not None: ax.xaxis.set_major_formatter(formatters[i]) def plot(self, data, *args, **kwargs): """Plot data This function calls plt.plot() on each axes. All arguments or keyword arguments are passed to the plt.plot function. Parameters ---------- data : ndarray shape of data is [n_samples, ndim], and ndim should match that passed to the MultiAxes constructor. """ data = self._check_data(data) for i in range(self.ndim): for j in range(self.ndim): ax = self.axes[i, j] if ax is None: continue ax.plot(data[:, i], data[:, j], *args, **kwargs) def scatter(self, data, *args, **kwargs): """Scatter plot data This function calls plt.scatter() on each axes. All arguments or keyword arguments are passed to the plt.scatter function. Parameters ---------- data : ndarray shape of data is [n_samples, ndim], and ndim should match that passed to the MultiAxes constructor. """ data = self._check_data(data) for i in range(self.ndim): for j in range(self.ndim): ax = self.axes[i, j] if ax is None: continue ax.scatter(data[:, i], data[:, j], *args, **kwargs) def density(self, data, bins=20, **kwargs): """Density plot of data This function calls np.histogram2D to bin the data in each axes, then calls plt.imshow() on the result. All extra arguments or keyword arguments are passed to the plt.imshow function. Parameters ---------- data : ndarray shape of data is [n_samples, ndim], and ndim should match that passed to the MultiAxes constructor. bins : int, array, list of ints, or list of arrays specify the bins for each dimension. 
If bins is a list, then the length must match the data dimension """ data = self._check_data(data) if not hasattr(bins, '__len__'): bins = [bins for i in range(self.ndim)] elif len(bins) != self.ndim: bins = [bins for i in range(self.ndim)] for i in range(self.ndim): for j in range(self.ndim): ax = self.axes[i, j] if ax is None: continue H, xbins, ybins = np.histogram2d(data[:, i], data[:, j], (bins[i], bins[j])) ax.imshow(H.T, origin='lower', aspect='auto', extent=(xbins[0], xbins[-1], ybins[0], ybins[-1]), **kwargs) ax.set_xlim(xbins[0], xbins[-1]) ax.set_ylim(ybins[0], ybins[-1]) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/plotting/regression.py0000644000076700000240000000676600000000000020424 0ustar00bsipoczstaffimport numpy as np import matplotlib.pyplot as plt from scipy import optimize from astroML.linear_model import TLS_logL, LinearRegression # TLS: def get_m_b(beta): b = np.dot(beta, beta) / beta[1] m = -beta[0] / beta[1] return m, b def plot_regressions(ksi, eta, x, y, sigma_x, sigma_y, add_regression_lines=False, alpha_in=1, beta_in=0.5, basis='linear'): figure = plt.figure(figsize=(8, 6)) ax = figure.add_subplot(111) ax.scatter(x, y, alpha=0.5) ax.errorbar(x, y, xerr=sigma_x, yerr=sigma_y, alpha=0.3, ls='') ax.set_xlabel('x') ax.set_ylabel('y') x0 = np.linspace(np.min(x) - 0.5, np.max(x) + 0.5, 20) # True regression line if alpha_in is not None and beta_in is not None: if basis == 'linear': y0 = alpha_in + x0 * beta_in elif basis == 'poly': y0 = alpha_in + beta_in[0] * x0 + beta_in[1] * x0 * x0 + beta_in[2] * x0 * x0 * x0 ax.plot(x0, y0, color='black', label='True regression') else: y0 = None if add_regression_lines: for label, data, *target in [['fit no errors', x, y, 1], ['fit y errors only', x, y, sigma_y], ['fit x errors only', y, x, sigma_x]]: linreg = LinearRegression() linreg.fit(data[:, None], *target) if label == 'fit x errors only' and y0 is not None: x_fit = linreg.predict(y0[:, None]) ax.plot(x_fit, y0, label=label) else: y_fit = linreg.predict(x0[:, None]) ax.plot(x0, y_fit, label=label) # TLS X = np.vstack((x, y)).T dX = np.zeros((len(x), 2, 2)) dX[:, 0, 0] = sigma_x dX[:, 1, 1] = sigma_y def min_func(beta): return -TLS_logL(beta, X, dX) beta_fit = optimize.fmin(min_func, x0=[-1, 1]) m_fit, b_fit = get_m_b(beta_fit) x_fit = np.linspace(-10, 10, 20) ax.plot(x_fit, m_fit * x_fit + b_fit, label='TLS') ax.set_xlim(np.min(x)-0.5, np.max(x)+0.5) ax.set_ylim(np.min(y)-0.5, np.max(y)+0.5) ax.legend() def plot_regression_from_trace(fitted, observed, ax=None, chains=None, multidim_ind=None): traces = [fitted.trace, ] xi, yi, sigx, sigy = observed if multidim_ind is not None: xi = xi[multidim_ind] x = np.linspace(np.min(xi)-0.5, np.max(xi)+0.5, 50) for i, trace in enumerate(traces): if 'theta' in trace.varnames and 'slope' not in trace.varnames: trace.add_values({'slope': np.tan(trace['theta'])}) if multidim_ind is not None: trace_slope = trace['slope'][:, multidim_ind] else: trace_slope = trace['slope'][:, 0] if chains is not None: for chain in range(100, len(trace) * trace.nchains, chains): y = trace['inter'][chain] + trace_slope[chain] * x ax.plot(x, y, alpha=0.03, c='red') # plot the best-fit line only H2D, bins1, bins2 = np.histogram2d(trace_slope, trace['inter'], bins=50) w = np.where(H2D == H2D.max()) # choose the maximum posterior slope and intercept slope_best = bins1[w[0][0]] intercept_best = bins2[w[1][0]] print("beta:", slope_best, "alpha:", intercept_best) y = intercept_best + slope_best * 
x # y_pre = fitted.predict(x[:, None]) ax.plot(x, y, ':', label='fitted') ax.legend() break ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/plotting/scatter_contour.py0000644000076700000240000000746400000000000021456 0ustar00bsipoczstaffimport numpy as np def scatter_contour(x, y, levels=10, threshold=100, log_counts=False, histogram2d_args=None, plot_args=None, contour_args=None, filled_contour=True, ax=None): """Scatter plot with contour over dense regions Parameters ---------- x, y : arrays x and y data for the contour plot levels : integer or array (optional, default=10) number of contour levels, or array of contour levels threshold : float (default=100) number of points per 2D bin at which to begin drawing contours log_counts :boolean (optional) if True, contour levels are the base-10 logarithm of bin counts. histogram2d_args : dict keyword arguments passed to numpy.histogram2d see doc string of numpy.histogram2d for more information plot_args : dict keyword arguments passed to plt.plot. By default it will use dict(marker='.', linestyle='none'). see doc string of pylab.plot for more information contour_args : dict keyword arguments passed to plt.contourf or plt.contour see doc string of pylab.contourf for more information filled_contour : bool If True (default) use filled contours. Otherwise, use contour outlines. ax : pylab.Axes instance the axes on which to plot. If not specified, the current axes will be used Returns ------- points, contours : points is the return value of ax.plot() contours is the return value of ax.contour or ax.contourf """ x = np.asarray(x) y = np.asarray(y) default_contour_args = dict(zorder=2) default_plot_args = dict(marker='.', linestyle='none', zorder=1) if plot_args is not None: default_plot_args.update(plot_args) plot_args = default_plot_args if contour_args is not None: default_contour_args.update(contour_args) contour_args = default_contour_args if histogram2d_args is None: histogram2d_args = {} if contour_args is None: contour_args = {} if ax is None: # Import here so that testing with Agg will work from matplotlib import pyplot as plt ax = plt.gca() H, xbins, ybins = np.histogram2d(x, y, **histogram2d_args) if log_counts: H = np.log10(1 + H) threshold = np.log10(1 + threshold) levels = np.asarray(levels) if levels.size == 1: levels = np.linspace(threshold, H.max(), levels) extent = [xbins[0], xbins[-1], ybins[0], ybins[-1]] i_min = np.argmin(levels) # draw a zero-width line: this gives us the outer polygon to # reduce the number of points we draw # somewhat hackish... we could probably get the same info from # the full contour plot below. 
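    # (Clarifying note: the contour call below is drawn with zero linewidth
    # and zero alpha, so nothing is rendered; it exists only so that
    # `outline.allsegs` yields the outermost contour polygon, which is then
    # used to drop the scatter points hidden underneath the contours.)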
outline = ax.contour(H.T, levels[i_min:i_min + 1], linewidths=0, extent=extent, alpha=0) if filled_contour: contours = ax.contourf(H.T, levels, extent=extent, **contour_args) else: contours = ax.contour(H.T, levels, extent=extent, **contour_args) X = np.hstack([x[:, None], y[:, None]]) if len(outline.allsegs[0]) > 0: outer_poly = outline.allsegs[0][0] try: # this works in newer matplotlib versions from matplotlib.path import Path points_inside = Path(outer_poly).contains_points(X) except ImportError: # this works in older matplotlib versions import matplotlib.nxutils as nx points_inside = nx.points_inside_poly(X, outer_poly) Xplot = X[~points_inside] else: Xplot = X points = ax.plot(Xplot[:, 0], Xplot[:, 1], **plot_args) return points, contours ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643147368.0 astroML-1.0.2/astroML/plotting/settings.py0000644000076700000240000000242600000000000020071 0ustar00bsipoczstaffdef setup_text_plots(fontsize=8, usetex=True): """ This function adjusts matplotlib settings so that all figures in the textbook have a uniform format and look. """ import matplotlib from packaging.version import Version matplotlib.rc('legend', fontsize=fontsize, handlelength=3) matplotlib.rc('axes', titlesize=fontsize) matplotlib.rc('axes', labelsize=fontsize) matplotlib.rc('xtick', labelsize=fontsize) matplotlib.rc('ytick', labelsize=fontsize) matplotlib.rc('text', usetex=usetex) matplotlib.rc('font', size=fontsize, family='serif', style='normal', variant='normal', stretch='normal', weight='normal') matplotlib.rc('patch', force_edgecolor=True) if Version(matplotlib.__version__) < Version("3.1"): matplotlib.rc('_internal', classic_mode=True) else: # New in mpl 3.1 matplotlib.rc('scatter', edgecolors='b') matplotlib.rc('grid', linestyle=':') matplotlib.rc('errorbar', capsize=3) matplotlib.rc('image', cmap='viridis') matplotlib.rc('axes', xmargin=0) matplotlib.rc('axes', ymargin=0) matplotlib.rc('xtick', direction='in') matplotlib.rc('ytick', direction='in') matplotlib.rc('xtick', top=True) matplotlib.rc('ytick', right=True) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4317245 astroML-1.0.2/astroML/plotting/tests/0000755000076700000240000000000000000000000017015 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1549602299.0 astroML-1.0.2/astroML/plotting/tests/__init__.py0000644000076700000240000000005000000000000021121 0ustar00bsipoczstaffimport matplotlib matplotlib.use('Agg') ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/plotting/tests/test_devectorize.py0000644000076700000240000000161400000000000022753 0ustar00bsipoczstafffrom io import BytesIO import numpy as np from numpy.testing import assert_ import matplotlib from matplotlib import image import matplotlib.pyplot as plt from astroML.plotting.tools import devectorize_axes matplotlib.use('Agg') # don't display plots def test_devectorize_axes(): np.random.seed(0) x, y = np.random.random((2, 1000)) # save vectorized version fig = plt.figure() ax = fig.add_subplot(111) ax.scatter(x, y) output = BytesIO() fig.savefig(output) output.seek(0) im1 = image.imread(output) plt.close() # save devectorized version fig = plt.figure() ax = fig.add_subplot(111) ax.scatter(x, y) devectorize_axes(ax, dpi=200) output = BytesIO() fig.savefig(output) output.seek(0) im2 = image.imread(output) plt.close() assert_(im1.shape == im2.shape) assert_((im1 
!= im2).sum() < 0.1 * im1.size) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643147368.0 astroML-1.0.2/astroML/plotting/tools.py0000644000076700000240000001134100000000000017365 0ustar00bsipoczstaffimport numpy as np from io import BytesIO from matplotlib import pyplot as plt from scipy import interpolate from matplotlib import image from matplotlib.colors import LinearSegmentedColormap from matplotlib.transforms import Bbox from matplotlib.patches import Ellipse def devectorize_axes(ax=None, dpi=None, transparent=True): """Convert axes contents to a png. This is useful when plotting many points, as the size of the saved file can become very large otherwise. Parameters ---------- ax : Axes instance (optional) Axes to de-vectorize. If None, this uses the current active axes (plt.gca()) dpi: int (optional) resolution of the png image. If not specified, the default from 'savefig.dpi' in rcParams will be used transparent : bool (optional) if True (default) then the PNG will be made transparent Returns ------- ax : Axes instance the in-place modified Axes instance Examples -------- The code can be used in the following way:: >>> import matplotlib.pyplot as plt >>> import numpy as np >>> from astroML.plotting.tools import devectorize_axes >>> fig, ax = plt.subplots() >>> x, y = np.random.random((2, 10000)) >>> ax.scatter(x, y) # doctest: +IGNORE_OUTPUT >>> devectorize_axes(ax) # doctest: +IGNORE_OUTPUT The resulting figure will be much smaller than the vectorized version. """ if ax is None: ax = plt.gca() fig = ax.figure axlim = ax.axis() # setup: make all visible spines (axes & ticks) & text invisible # we need to set these back later, so we save their current state _sp = {} _txt_vis = [t.get_visible() for t in ax.texts] for k in ax.spines: _sp[k] = ax.spines[k].get_visible() ax.spines[k].set_visible(False) for t in ax.texts: t.set_visible(False) _xax = ax.xaxis.get_visible() _yax = ax.yaxis.get_visible() _patch = ax.patch.get_visible() ax.patch.set_visible(False) ax.xaxis.set_visible(False) ax.yaxis.set_visible(False) # convert canvas to PNG extents = ax.bbox.extents / fig.dpi output = BytesIO() plt.savefig(output, format='png', dpi=dpi, transparent=transparent, bbox_inches=Bbox([extents[:2], extents[2:]])) output.seek(0) im = image.imread(output) # clear everything on axis (but not text) ax.lines.clear() ax.patches.clear() ax.tables.clear() ax.artists.clear() ax.images.clear() # Show the image ax.imshow(im, extent=axlim, aspect='auto', interpolation='nearest') # restore all the spines & text for k in ax.spines: ax.spines[k].set_visible(_sp[k]) for t, v in zip(ax.texts, _txt_vis): t.set_visible(v) ax.patch.set_visible(_patch) ax.xaxis.set_visible(_xax) ax.yaxis.set_visible(_yax) if plt.isinteractive(): plt.draw() return ax def discretize_cmap(cmap, N): """Return a discrete colormap from the continuous colormap cmap. Parameters ---------- cmap : colormap instance, eg. cm.jet. N : Number of colors. Returns ------- cmap_d : discretized colormap Example ------- >>> from matplotlib import cm >>> djet = discretize_cmap(cm.jet, 5) """ cdict = cmap._segmentdata.copy() # N colors colors_i = np.linspace(0, 1., N) # N+1 indices indices = np.linspace(0, 1., N + 1) for key in ('red', 'green', 'blue'): # Find the N colors D = np.array(cdict[key]) interp = interpolate.interp1d(D[:, 0], D[:, 1]) colors = interp(colors_i) # Place these colors at the correct indices. 
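    # (Each row of A is (index, value_below, value_above), the segment
    # format expected by LinearSegmentedColormap; repeating each sampled
    # color on either side of adjacent indices keeps the color constant
    # across each of the N segments.)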
A = np.zeros((N + 1, 3), float) A[:, 0] = indices A[1:, 1] = colors A[:-1, 2] = colors # Create a tuple for the dictionary. L = [] for color in A: L.append(tuple(color)) cdict[key] = tuple(L) # Return colormap object. return LinearSegmentedColormap('colormap', cdict, 1024) def draw_ellipse(mu, C, scales=[1, 2, 3], ax=None, **kwargs): if ax is None: ax = plt.gca() # find principal components and rotation angle of ellipse sigma_x2 = C[0, 0] sigma_y2 = C[1, 1] sigma_xy = C[0, 1] alpha = 0.5 * np.arctan2(2 * sigma_xy, (sigma_x2 - sigma_y2)) tmp1 = 0.5 * (sigma_x2 + sigma_y2) tmp2 = np.sqrt(0.25 * (sigma_x2 - sigma_y2) ** 2 + sigma_xy ** 2) sigma1 = np.sqrt(tmp1 + tmp2) sigma2 = np.sqrt(tmp1 - tmp2) for scale in scales: ax.add_patch(Ellipse((mu[0], mu[1]), 2 * scale * sigma1, 2 * scale * sigma2, alpha * 180. / np.pi, **kwargs)) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/resample.py0000644000076700000240000001377500000000000016212 0ustar00bsipoczstaffimport numpy as np import warnings from sklearn.utils import check_random_state def bootstrap(data, n_bootstraps, user_statistic, kwargs=None, pass_indices=False, random_state=None): """Compute bootstraped statistics of a dataset. Parameters ---------- data : array_like An n-dimensional data array of size n_samples by n_attributes n_bootstraps : integer the number of bootstrap samples to compute. Note that internally, two arrays of size (n_bootstraps, n_samples) will be allocated. For very large numbers of bootstraps, this can cause memory issues. user_statistic : function The statistic to be computed. This should take an array of data of size (n_bootstraps, n_samples) and return the row-wise statistics of the data. kwargs : dictionary (optional) A dictionary of keyword arguments to be passed to the user_statistic function. pass_indices : boolean (optional) if True, then the indices of the points rather than the points themselves are passed to `user_statistic` random_state: RandomState or an int seed (0 by default) A random number generator instance Returns ------- distribution : ndarray the bootstrapped distribution of statistics (length = n_bootstraps) """ # we don't set kwargs={} by default in the argument list, because using # a mutable type as a default argument can lead to strange results if kwargs is None: kwargs = {} rng = check_random_state(random_state) data = np.asarray(data) if data.ndim != 1: n_samples = data.shape[0] warnings.warn("bootstrap data are n-dimensional: " "assuming ordered n_samples by n_attributes") else: n_samples = data.size # Generate random indices with repetition ind = rng.randint(n_samples, size=(n_bootstraps, n_samples)) data = data[ind].reshape(-1, data[ind].shape[-1]) # Call the function if pass_indices: stat_bootstrap = user_statistic(ind, **kwargs) else: stat_bootstrap = user_statistic(data, **kwargs) # compute the statistic on the data return stat_bootstrap def jackknife(data, user_statistic, kwargs=None, return_raw_distribution=False, pass_indices=False): """Compute first-order jackknife statistics of the data. Parameters ---------- data : array_like A 1-dimensional data array of size n_samples user_statistic : function The statistic to be computed. This should take an array of data of size (n_samples, n_samples - 1) and return an array of size n_samples or tuple of arrays of size n_samples, representing the row-wise statistics of the input. 
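        For example, ``lambda d: np.mean(d, axis=1)`` computes the
        required row-wise means.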
kwargs : dictionary (optional) A dictionary of keyword arguments to be passed to the user_statistic function. return_raw_distribution : boolean (optional) if True, return the raw jackknife distribution. Be aware that this distribution is not reflective of the true distribution: it is simply an intermediate step in the jackknife calculation pass_indices : boolean (optional) if True, then the indices of the points rather than the points themselves are passed to `user_statistic` Returns ------- mean, stdev : floats The mean and standard deviation of the jackknifed distribution raw_distribution : ndarray Returned only if `return_raw_distribution` is True The array containing the raw distribution (length n_samples) Be aware that this distribution is not reflective of the true distribution: it is simply an intermediate step in the jackknife calculation Notes ----- This implementation is a leave-one-out jackknife. Jackknife resampling is known to fail on rank-based statistics (e.g. median, quartiles, etc.) It works well on smooth statistics (e.g. mean, standard deviation, etc.) """ # we don't set kwargs={} by default in the argument list, because using # a mutable type as a default argument can lead to strange results if kwargs is None: kwargs = {} data = np.asarray(data) n_samples = data.size if data.ndim != 1: raise ValueError("bootstrap expects 1-dimensional data") # generate indices for the entire dataset, converting to row vector ind0 = np.arange(n_samples)[np.newaxis, :] # generate sets of indices where a single datapoint is left-out ind = np.arange(n_samples, dtype=int) ind = np.vstack([np.hstack((ind[:i], ind[i + 1:])) for i in ind]) # compute the statistic for the whole dataset if pass_indices: stat_data = user_statistic(ind0, **kwargs) stat_jackknife = user_statistic(ind, **kwargs) else: stat_data = user_statistic(data[ind0], **kwargs) stat_jackknife = user_statistic(data[ind], **kwargs) # handle multiple statistics: # if ndim=0, then the statistic is not operating on rows (error). # if ndim=1, then it's a single statistic returned # if ndim=2, then a tuple has been returned stat_data = np.asarray(stat_data) ndim = stat_data.ndim if ndim == 0: raise ValueError("user_statistic should return row-wise statistics") stat_data = np.atleast_2d(stat_data).T stat_jackknife = np.atleast_2d(stat_jackknife) # compute the jackknife correction formula delta_stat = (n_samples - 1) * (stat_data - stat_jackknife.mean(1)) stat_corrected = (stat_data + delta_stat)[0] sigma_stat = np.sqrt(1. 
/ n_samples / (n_samples + 1) * np.sum((n_samples * stat_data - stat_corrected - (n_samples - 1) * stat_jackknife.T) ** 2, 0)) if return_raw_distribution: results = tuple(zip(stat_corrected, sigma_stat, stat_jackknife)) else: results = tuple(zip(stat_corrected, sigma_stat)) if ndim == 1: return results[0] else: return results ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4330575 astroML-1.0.2/astroML/stats/0000755000076700000240000000000000000000000015151 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/stats/__init__.py0000644000076700000240000000061500000000000017264 0ustar00bsipoczstafffrom ._binned_statistic import (binned_statistic, binned_statistic_2d, binned_statistic_dd) from ._point_statistics import (mean_sigma, sigmaG, median_sigmaG, fit_bivariate_normal) from .random import bivariate_normal, trunc_exp, linear ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/stats/_binned_statistic.py0000644000076700000240000003201400000000000021210 0ustar00bsipoczstaffimport numpy as np def binned_statistic(x, values, statistic='mean', bins=10, range=None): """ Compute a binned statistic for a set of data. This is a generalization of a histogram function. A histogram divides the space into bins, and returns the count of the number of points in each bin. This function allows the computation of the sum, mean, median, or other statistic of the values within each bin. Parameters ---------- x : array_like A sequence of values to be binned. values : array_like The values on which the statistic will be computed. This must be the same shape as x. statistic : string or callable, optional The statistic to compute (default is 'mean'). The following statistics are available: * 'mean' : compute the mean of values for points within each bin. Empty bins will be represented by NaN. * 'median' : compute the median of values for points within each bin. Empty bins will be represented by NaN. * 'count' : compute the count of points within each bin. This is identical to an unweighted histogram. `values` array is not referenced. * 'sum' : compute the sum of values for points within each bin. This is identical to a weighted histogram. * function : a user-defined function which takes a 1D array of values, and outputs a single numerical statistic. This function will be called on the values in each bin. Empty bins will be represented by function([]), or NaN if this returns an error. bins : int or sequence of scalars, optional If `bins` is an int, it defines the number of equal-width bins in the given range (10, by default). If `bins` is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths. range : (float, float), optional The lower and upper range of the bins. If not provided, range is simply ``(x.min(), x.max())``. Values outside the range are ignored. Returns ------- statistic : array The values of the selected statistic in each bin. bin_edges : array of dtype float Return the bin edges ``(length(statistic)+1)``. Notes ----- All but the last (righthand-most) bin is half-open. In other words, if `bins` is:: [1, 2, 3, 4] then the first bin is ``[1, 2)`` (including 1, but excluding 2) and the second ``[2, 3)``. The last bin, however, is ``[3, 4]``, which *includes* 4. 
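    For example, a value equal to the rightmost edge is counted in the
    last bin rather than being discarded as an outlier:

    >>> binned_statistic([1, 4], [10., 20.], bins=[1, 2, 3, 4],
    ...                  statistic='sum')
    (array([10.,  0., 20.]), array([1., 2., 3., 4.]))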
Examples -------- >>> binned_statistic([1, 2, 1], [2, 5, 3], bins=[0, 1, 2, 3], statistic='count') (array([0., 2., 1.]), array([0., 1., 2., 3.])) See Also -------- np.histogram, binned_statistic_2d, binned_statistic_dd """ try: N = len(bins) except TypeError: N = 1 if N != 1: bins = [np.asarray(bins, float)] medians, edges = binned_statistic_dd([x], values, statistic, bins, range) return medians, edges[0] def binned_statistic_2d(x, y, values, statistic='mean', bins=10, range=None): """ Compute a bidimensional binned statistic for a set of data. This is a generalization of a histogram2d function. A histogram divides the space into bins, and returns the count of the number of points in each bin. This function allows the computation of the sum, mean, median, or other statistic of the values within each bin. Parameters ---------- x : array_like A sequence of values to be binned along the first dimension. y : array_like A sequence of values to be binned along the second dimension. values : array_like The values on which the statistic will be computed. This must be the same shape as x. statistic : string or callable, optional The statistic to compute (default is 'mean'). The following statistics are available: * 'mean' : compute the mean of values for points within each bin. Empty bins will be represented by NaN. * 'median' : compute the median of values for points within each bin. Empty bins will be represented by NaN. * 'count' : compute the count of points within each bin. This is identical to an unweighted histogram. `values` array is not referenced. * 'sum' : compute the sum of values for points within each bin. This is identical to a weighted histogram. * function : a user-defined function which takes a 1D array of values, and outputs a single numerical statistic. This function will be called on the values in each bin. Empty bins will be represented by function([]), or NaN if this returns an error. bins : int or [int, int] or array-like or [array, array], optional The bin specification: * the number of bins for the two dimensions (nx=ny=bins), * the number of bins in each dimension (nx, ny = bins), * the bin edges for the two dimensions (x_edges=y_edges=bins), * the bin edges in each dimension (x_edges, y_edges = bins). range : array_like, shape(2,2), optional The leftmost and rightmost edges of the bins along each dimension (if not specified explicitly in the `bins` parameters): [[xmin, xmax], [ymin, ymax]]. All values outside of this range will be considered outliers and not tallied in the histogram. Returns ------- statistic : ndarray, shape(nx, ny) The values of the selected statistic in each two-dimensional bin xedges : ndarray, shape(nx + 1,) The bin edges along the first dimension. yedges : ndarray, shape(ny + 1,) The bin edges along the second dimension. See Also -------- np.histogram2d, binned_statistic, binned_statistic_dd """ # This code is based on np.histogram2d try: N = len(bins) except TypeError: N = 1 if N != 1 and N != 2: xedges = yedges = np.asarray(bins, float) bins = [xedges, yedges] medians, edges = binned_statistic_dd([x, y], values, statistic, bins, range) return medians, edges[0], edges[1] def binned_statistic_dd(sample, values, statistic='mean', bins=10, range=None): """ Compute a multidimensional binned statistic for a set of data. This is a generalization of a histogramdd function. A histogram divides the space into bins, and returns the count of the number of points in each bin. 
This function allows the computation of the sum, mean, median, or other
    statistic of the values within each bin.

    Parameters
    ----------
    sample : array_like
        Data to histogram passed as a sequence of D arrays of length N, or
        as an (N,D) array.
    values : array_like
        The values on which the statistic will be computed. This must be
        the same shape as x.
    statistic : string or callable, optional
        The statistic to compute (default is 'mean'). The following
        statistics are available:

        * 'mean' : compute the mean of values for points within each bin.
          Empty bins will be represented by NaN.
        * 'median' : compute the median of values for points within each
          bin. Empty bins will be represented by NaN.
        * 'count' : compute the count of points within each bin. This is
          identical to an unweighted histogram. `values` array is not
          referenced.
        * 'sum' : compute the sum of values for points within each bin.
          This is identical to a weighted histogram.
        * function : a user-defined function which takes a 1D array of
          values, and outputs a single numerical statistic. This function
          will be called on the values in each bin. Empty bins will be
          represented by function([]), or NaN if this returns an error.
    bins : sequence or int, optional
        The bin specification:

        * A sequence of arrays describing the bin edges along each
          dimension.
        * The number of bins for each dimension (nx, ny, ... =bins)
        * The number of bins for all dimensions (nx=ny=...=bins).
    range : sequence, optional
        A sequence of lower and upper bin edges to be used if the edges
        are not given explicitly in `bins`. Defaults to the minimum and
        maximum values along each dimension.

    Returns
    -------
    statistic : ndarray, shape(nx1, nx2, nx3,...)
        The values of the selected statistic in each bin
    edges : list of ndarrays
        A list of D arrays describing the (nxi + 1) bin edges for each
        dimension

    See Also
    --------
    np.histogramdd, binned_statistic, binned_statistic_2d
    """
    if type(statistic) == str:
        if statistic not in ['mean', 'median', 'count', 'sum']:
            raise ValueError('unrecognized statistic "%s"' % statistic)
    elif callable(statistic):
        pass
    else:
        raise ValueError("statistic not understood")

    # This code is based on np.histogramdd
    try:
        # Sample is an ND-array.
        N, D = sample.shape
    except (AttributeError, ValueError):
        # Sample is a sequence of 1D arrays.
        sample = np.atleast_2d(sample).T
        N, D = sample.shape

    nbin = np.empty(D, int)
    edges = D * [None]
    dedges = D * [None]

    try:
        M = len(bins)
        if M != D:
            raise AttributeError('The dimension of bins must be equal '
                                 'to the dimension of the sample x.')
    except TypeError:
        bins = D * [bins]

    # Select range for each dimension
    # Used only if number of bins is given.
    if range is None:
        smin = np.atleast_1d(np.array(sample.min(0), float))
        smax = np.atleast_1d(np.array(sample.max(0), float))
    else:
        smin = np.zeros(D)
        smax = np.zeros(D)
        for i in np.arange(D):
            smin[i], smax[i] = range[i]

    # Make sure the bins have a finite width.
    for i in np.arange(len(smin)):
        if smin[i] == smax[i]:
            smin[i] = smin[i] - .5
            smax[i] = smax[i] + .5

    # Create edge arrays
    for i in np.arange(D):
        if np.isscalar(bins[i]):
            nbin[i] = bins[i] + 2  # +2 for outlier bins
            edges[i] = np.linspace(smin[i], smax[i], nbin[i] - 1)
        else:
            edges[i] = np.asarray(bins[i], float)
            nbin[i] = len(edges[i]) + 1  # +1 for outlier bins
        dedges[i] = np.diff(edges[i])

    nbin = np.asarray(nbin)

    # Compute the bin number each sample falls into.
    Ncount = {}
    for i in np.arange(D):
        Ncount[i] = np.digitize(sample[:, i], edges[i])

    # Using digitize, values that fall on an edge are put in the right bin.
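    # (np.digitize returns 0 for values below the first edge and
    # len(edges[i]) for values at or beyond the last edge, so the first
    # and last of the nbin[i] indices serve as underflow/overflow bins.)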
# For the rightmost bin, we want values equal to the right # edge to be counted in the last bin, and not as an outlier. for i in np.arange(D): # Rounding precision decimal = int(-np.log10(dedges[i].min())) + 6 # Find which points are on the rightmost edge. on_edge = np.where(np.around(sample[:, i], decimal) == np.around(edges[i][-1], decimal))[0] # Shift these points one bin to the left. Ncount[i][on_edge] -= 1 # Compute the sample indices in the flattened statistic matrix. ni = nbin.argsort() xy = np.zeros(N, int) for i in np.arange(0, D - 1): xy += Ncount[ni[i]] * nbin[ni[i + 1:]].prod() xy += Ncount[ni[-1]] result = np.empty(nbin.prod(), float) if statistic == 'mean': result.fill(np.nan) flatcount = np.bincount(xy, None) flatsum = np.bincount(xy, values) a = np.arange(len(flatcount)) result[a] = flatsum result[a] /= flatcount elif statistic == 'count': result.fill(0) flatcount = np.bincount(xy, None) a = np.arange(len(flatcount)) result[a] = flatcount elif statistic == 'sum': result.fill(0) flatsum = np.bincount(xy, values) a = np.arange(len(flatsum)) result[a] = flatsum elif statistic == 'median': result.fill(np.nan) for i in np.unique(xy): result[i] = np.median(values[xy == i]) elif callable(statistic): try: null = statistic([]) except Exception: null = np.nan result.fill(null) for i in np.unique(xy): result[i] = statistic(values[xy == i]) # Shape into a proper matrix result = result.reshape(np.sort(nbin)) for i in np.arange(nbin.size): j = ni.argsort()[i] result = result.swapaxes(i, j) ni[i], ni[j] = ni[j], ni[i] # Remove outliers (indices 0 and -1 for each dimension). core = D * [slice(1, -1)] result = result[tuple(core)] if (result.shape != nbin - 2).any(): raise RuntimeError('Internal Shape Error') return result, edges ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/stats/_point_statistics.py0000644000076700000240000002110300000000000021262 0ustar00bsipoczstaffimport numpy as np from scipy import stats # from scipy.special import erfinv # sigmaG_factor = 1. / (2 * np.sqrt(2) * erfinv(0.5)) sigmaG_factor = 0.74130110925280102 def mean_sigma(a, axis=None, dtype=None, ddof=0, keepdims=False): """Compute mean and standard deviation for an array Parameters ---------- a : array_like Array containing numbers whose mean is desired. If `a` is not an array, a conversion is attempted. axis : int, optional Axis along which the means are computed. The default is to compute the mean of the flattened array. dtype : dtype, optional Type to use in computing the standard deviation. For arrays of integer type the default is float64, for arrays of float types it is the same as the array type. keepdims : bool, optional If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original `arr`. Returns ------- mu : ndarray, see dtype parameter above array containing the mean values sigma : ndarray, see dtype parameter above. array containing the standard deviation See Also -------- median_sigmaG : robust rank-based version of this calculation. Notes ----- This routine simply calls ``np.mean`` and ``np.std``, passing the keyword arguments to them. 
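    For example, ``mean_sigma(a, axis=1)`` is equivalent to
    ``(np.mean(a, axis=1), np.std(a, axis=1))``.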
It is provided for ease of comparison with the function median_sigmaG()
    """
    mu = np.mean(a, axis=axis, dtype=dtype)
    sigma = np.std(a, axis=axis, dtype=dtype, ddof=ddof)

    if keepdims:
        if axis is None:
            newshape = a.ndim * (1,)
        else:
            newshape = np.asarray(a.shape)
            newshape[axis] = 1
        mu = mu.reshape(newshape)
        sigma = sigma.reshape(newshape)

    return mu, sigma


def median_sigmaG(a, axis=None, overwrite_input=False, keepdims=False):
    """Compute median and rank-based estimate of the standard deviation

    Parameters
    ----------
    a : array_like
        Array containing numbers whose median is desired. If `a` is not an
        array, a conversion is attempted.
    axis : int, optional
        Axis along which the medians are computed. The default is to
        compute the median of the flattened array.
    overwrite_input : bool, optional
        If True, then allow use of memory of input array `a` for
        calculations. The input array will be modified by the call to
        median. This will save memory when you do not need to preserve
        the contents of the input array. Treat the input as undefined,
        but it will probably be fully or partially sorted.
        Default is False. Note that, if `overwrite_input` is True and the
        input is not already an array, an error will be raised.
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left
        in the result as dimensions with size one. With this option,
        the result will broadcast correctly against the original `arr`.

    Returns
    -------
    median : ndarray
        array containing the median values
    sigmaG : ndarray
        array containing the robust estimator of the standard deviation

    See Also
    --------
    mean_sigma : non-robust version of this calculation
    sigmaG : robust rank-based estimate of standard deviation

    Notes
    -----
    This routine uses a single call to ``np.percentile`` to find the
    quartiles along the given axis, and uses these to compute the
    median and sigmaG:

    median = q50
    sigmaG = (q75 - q25) * 0.7413

    where 0.7413 ~ 1 / (2 sqrt(2) erf^-1(0.5))
    """
    q25, median, q75 = np.percentile(a, [25, 50, 75],
                                     axis=axis,
                                     overwrite_input=overwrite_input)
    sigmaG = sigmaG_factor * (q75 - q25)

    if keepdims:
        if axis is None:
            newshape = a.ndim * (1,)
        else:
            newshape = np.asarray(a.shape)
            newshape[axis] = 1
        median = median.reshape(newshape)
        sigmaG = sigmaG.reshape(newshape)

    return median, sigmaG


def sigmaG(a, axis=None, overwrite_input=False, keepdims=False):
    """Compute the rank-based estimate of the standard deviation

    Parameters
    ----------
    a : array_like
        Array containing numbers whose spread is desired. If `a` is not an
        array, a conversion is attempted.
    axis : int, optional
        Axis along which the statistic is computed. The default is to
        compute the statistic of the flattened array.
    overwrite_input : bool, optional
        If True, then allow use of memory of input array `a` for
        calculations. The input array will be modified by the call to
        median. This will save memory when you do not need to preserve
        the contents of the input array. Treat the input as undefined,
        but it will probably be fully or partially sorted.
        Default is False. Note that, if `overwrite_input` is True and the
        input is not already an array, an error will be raised.
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left
        in the result as dimensions with size one. With this option,
        the result will broadcast correctly against the original `arr`.

    Returns
    -------
    sigmaG : ndarray
array containing the robust estimator of the standard deviation See Also -------- median_sigmaG : robust rank-based estimate of mean and standard deviation Notes ----- This routine uses a single call to ``np.percentile`` to find the quartiles along the given axis, and uses these to compute the sigmaG, a robust estimate of the standard deviation sigma: sigmaG = 0.7413 * (q75 - q25) where 0.7413 ~ 1 / (2 sqrt(2) erf^-1(0.5)) """ q25, q75 = np.percentile(a, [25, 75], axis=axis, overwrite_input=overwrite_input) sigmaG = sigmaG_factor * (q75 - q25) if keepdims: if axis is None: newshape = a.ndim * (1,) else: newshape = np.asarray(a.shape) newshape[axis] = 1 sigmaG = sigmaG.reshape(newshape) return sigmaG def fit_bivariate_normal(x, y, robust=False): """Fit bivariate normal parameters to a 2D distribution of points Parameters ---------- x, y : array_like The x, y coordinates of the points robust : boolean (optional, default=False) If True, then use rank-based statistics which are robust to outliers Otherwise, use mean/std statistics which are not robust Returns ------- mu : tuple (x, y) location of the best-fit bivariate normal sigma_1, sigma_2 : float The best-fit gaussian widths in the uncorrelated frame alpha : float The rotation angle in radians of the uncorrelated frame """ x = np.asarray(x) y = np.asarray(y) assert x.shape == y.shape if robust: # use quartiles to compute center and spread med_x, sigmaG_x = median_sigmaG(x) med_y, sigmaG_y = median_sigmaG(y) # define the principal variables from Shevlyakov & Smirnov (2011) sx = 2 * sigmaG_x sy = 2 * sigmaG_y u = (x / sx + y / sy) / np.sqrt(2) v = (x / sx - y / sy) / np.sqrt(2) med_u, sigmaG_u = median_sigmaG(u) med_v, sigmaG_v = median_sigmaG(v) r_xy = ((sigmaG_u ** 2 - sigmaG_v ** 2) / (sigmaG_u ** 2 + sigmaG_v ** 2)) # rename estimators mu_x, mu_y = med_x, med_y sigma_x, sigma_y = sigmaG_x, sigmaG_y else: mu_x = np.mean(x) sigma_x = np.std(x) mu_y = np.mean(y) sigma_y = np.std(y) r_xy = stats.pearsonr(x, y)[0] # We need to use the full (-180, 180) version of arctan: this is # np.arctan2(x, y) = np.arctan(x / y), modulo 180 degrees sigma_xy = r_xy * sigma_x * sigma_y alpha = 0.5 * np.arctan2(2 * sigma_xy, sigma_x ** 2 - sigma_y ** 2) sigma1 = np.sqrt((0.5 * (sigma_x ** 2 + sigma_y ** 2) + np.sqrt(0.25 * (sigma_x ** 2 - sigma_y ** 2) ** 2 + sigma_xy ** 2))) sigma2 = np.sqrt((0.5 * (sigma_x ** 2 + sigma_y ** 2) - np.sqrt(0.25 * (sigma_x ** 2 - sigma_y ** 2) ** 2 + sigma_xy ** 2))) return [mu_x, mu_y], sigma1, sigma2, alpha ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/stats/random.py0000644000076700000240000000746600000000000017020 0ustar00bsipoczstaff""" Statistics for astronomy """ import numpy as np from scipy.stats.distributions import rv_continuous def bivariate_normal(mu=[0, 0], sigma_1=1, sigma_2=1, alpha=0, size=None, return_cov=False): """Sample points from a 2D normal distribution Parameters ---------- mu : array-like (length 2) The mean of the distribution sigma_1 : float The unrotated x-axis width sigma_2 : float The unrotated y-axis width alpha : float The rotation counter-clockwise about the origin size : tuple of ints, optional Given a shape of, for example, ``(m,n,k)``, ``m*n*k`` samples are generated, and packed in an `m`-by-`n`-by-`k` arrangement. Because each sample is `N`-dimensional, the output shape is ``(m,n,k,N)``. If no shape is specified, a single (`N`-D) sample is returned. 
return_cov : boolean, optional
        If True, return the computed covariance matrix.

    Returns
    -------
    out : ndarray
        The drawn samples, of shape *size*, if that was provided. If not,
        the shape is ``(N,)``. In other words, each entry
        ``out[i,j,...,:]`` is an N-dimensional value drawn from the
        distribution.
    cov : ndarray
        The 2x2 covariance matrix. Returned only if return_cov == True.

    Notes
    -----
    This function works by computing a covariance matrix from the inputs,
    and calling ``np.random.multivariate_normal()``. If the covariance
    matrix is already available, ``np.random.multivariate_normal`` can be
    called directly.
    """
    # compute covariance matrix
    sigma_xx = ((sigma_1 * np.cos(alpha)) ** 2
                + (sigma_2 * np.sin(alpha)) ** 2)
    sigma_yy = ((sigma_1 * np.sin(alpha)) ** 2
                + (sigma_2 * np.cos(alpha)) ** 2)
    sigma_xy = (sigma_1 ** 2 - sigma_2 ** 2) * np.sin(alpha) * np.cos(alpha)

    cov = np.array([[sigma_xx, sigma_xy],
                    [sigma_xy, sigma_yy]])

    # draw points from the distribution
    x = np.random.multivariate_normal(mu, cov, size)

    if return_cov:
        return x, cov
    else:
        return x


# ----------------------------------------------------------------------
# Define some new distributions based on rv_continuous
class trunc_exp_gen(rv_continuous):
    """A truncated positive exponential continuous random variable.

    The probability distribution is::

        p(x) ~ exp(k * x)   between a and b
             = 0            otherwise

    The arguments are (a, b, k)

    %(before_notes)s

    %(example)s
    """
    def _argcheck(self, a, b, k):
        self._const = k / (np.exp(k * b) - np.exp(k * a))
        return (a != b) and not np.isinf(k)

    def _pdf(self, x, a, b, k):
        pdf = self._const * np.exp(k * x)
        pdf[(x < a) | (x > b)] = 0
        return pdf

    def _rvs(self, a, b, k):
        y = np.random.random(self._size)
        return (1. / k) * np.log(1 + y * k / self._const)


trunc_exp = trunc_exp_gen(name="trunc_exp", shapes='a, b, k')


class linear_gen(rv_continuous):
    """A truncated linear continuous random variable.

    The probability distribution is::

        p(x) ~ c * x + d   between a and b
             = 0           otherwise

    The arguments are (a, b, c). d is set by the normalization

    %(before_notes)s

    %(example)s
    """
    def _argcheck(self, a, b, c):
        return (a != b) and not np.isinf(c)

    def _pdf(self, x, a, b, c):
        d = 1. / (b - a) - 0.5 * c * (b + a)
        pdf = c * x + d
        pdf[(x < a) | (x > b)] = 0
        return pdf

    def _rvs(self, a, b, c):
        mu = 0.5 * (a + b)
        W = (b - a)
        x0 = 1. / c / W - mu
        r = np.random.random(self._size)
        return -x0 + np.sqrt(2. * r / c + a * a + 2.
* a * x0 + x0 * x0) linear = linear_gen(name="linear", shapes='a, b, c') ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4339814 astroML-1.0.2/astroML/stats/tests/0000755000076700000240000000000000000000000016313 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1541133836.0 astroML-1.0.2/astroML/stats/tests/__init__.py0000644000076700000240000000000000000000000020412 0ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090039.0 astroML-1.0.2/astroML/stats/tests/test_binned_statistic.py0000644000076700000240000000774100000000000023263 0ustar00bsipoczstaffimport numpy as np from numpy.testing import assert_array_almost_equal from astroML.stats import \ binned_statistic, binned_statistic_2d, binned_statistic_dd def test_1d_count(): x = np.random.random(100) v = np.random.random(100) count1, edges1 = binned_statistic(x, v, 'count', bins=10) count2, edges2 = np.histogram(x, bins=10) assert_array_almost_equal(count1, count2) assert_array_almost_equal(edges1, edges2) def test_1d_sum(): x = np.random.random(100) v = np.random.random(100) sum1, edges1 = binned_statistic(x, v, 'sum', bins=10) sum2, edges2 = np.histogram(x, bins=10, weights=v) assert_array_almost_equal(sum1, sum2) assert_array_almost_equal(edges1, edges2) def test_1d_mean(): x = np.random.random(100) v = np.random.random(100) stat1, edges1 = binned_statistic(x, v, 'mean', bins=10) stat2, edges2 = binned_statistic(x, v, np.mean, bins=10) assert_array_almost_equal(stat1, stat2) assert_array_almost_equal(edges1, edges2) def test_1d_median(): x = np.random.random(100) v = np.random.random(100) stat1, edges1 = binned_statistic(x, v, 'median', bins=10) stat2, edges2 = binned_statistic(x, v, np.median, bins=10) assert_array_almost_equal(stat1, stat2) assert_array_almost_equal(edges1, edges2) def test_2d_count(): x = np.random.random(100) y = np.random.random(100) v = np.random.random(100) count1, binx1, biny1 = binned_statistic_2d(x, y, v, 'count', bins=5) count2, binx2, biny2 = np.histogram2d(x, y, bins=5) assert_array_almost_equal(count1, count2) assert_array_almost_equal(binx1, binx2) assert_array_almost_equal(biny1, biny2) def test_2d_sum(): x = np.random.random(100) y = np.random.random(100) v = np.random.random(100) sum1, binx1, biny1 = binned_statistic_2d(x, y, v, 'sum', bins=5) sum2, binx2, biny2 = np.histogram2d(x, y, bins=5, weights=v) assert_array_almost_equal(sum1, sum2) assert_array_almost_equal(binx1, binx2) assert_array_almost_equal(biny1, biny2) def test_2d_mean(): x = np.random.random(100) y = np.random.random(100) v = np.random.random(100) stat1, binx1, biny1 = binned_statistic_2d(x, y, v, 'mean', bins=5) stat2, binx2, biny2 = binned_statistic_2d(x, y, v, np.mean, bins=5) assert_array_almost_equal(stat1, stat2) assert_array_almost_equal(binx1, binx2) assert_array_almost_equal(biny1, biny2) def test_2d_median(): x = np.random.random(100) y = np.random.random(100) v = np.random.random(100) stat1, binx1, biny1 = binned_statistic_2d(x, y, v, 'median', bins=5) stat2, binx2, biny2 = binned_statistic_2d(x, y, v, np.median, bins=5) assert_array_almost_equal(stat1, stat2) assert_array_almost_equal(binx1, binx2) assert_array_almost_equal(biny1, biny2) def test_dd_count(): X = np.random.random((100, 3)) v = np.random.random(100) count1, edges1 = binned_statistic_dd(X, v, 'count', bins=3) count2, edges2 = np.histogramdd(X, bins=3) assert_array_almost_equal(count1, count2) 
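    # the returned bin edges should match np.histogramdd as well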
assert_array_almost_equal(edges1, edges2) def test_dd_sum(): X = np.random.random((100, 3)) v = np.random.random(100) sum1, edges1 = binned_statistic_dd(X, v, 'sum', bins=3) sum2, edges2 = np.histogramdd(X, bins=3, weights=v) assert_array_almost_equal(sum1, sum2) assert_array_almost_equal(edges1, edges2) def test_dd_mean(): X = np.random.random((100, 3)) v = np.random.random(100) stat1, edges1 = binned_statistic_dd(X, v, 'mean', bins=3) stat2, edges2 = binned_statistic_dd(X, v, np.mean, bins=3) assert_array_almost_equal(stat1, stat2) assert_array_almost_equal(edges1, edges2) def test_dd_median(): X = np.random.random((100, 3)) v = np.random.random(100) stat1, edges1 = binned_statistic_dd(X, v, 'median', bins=3) stat2, edges2 = binned_statistic_dd(X, v, np.median, bins=3) assert_array_almost_equal(stat1, stat2) assert_array_almost_equal(edges1, edges2) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/stats/tests/test_stats.py0000644000076700000240000001376100000000000021072 0ustar00bsipoczstaffimport pytest import numpy as np from numpy.testing import (assert_array_almost_equal, assert_array_equal, assert_allclose) from astroML.stats import (mean_sigma, median_sigmaG, sigmaG, fit_bivariate_normal) from astroML.stats.random import bivariate_normal, trunc_exp, linear # --------------------------------------------------------------------------- # Check that mean_sigma() returns the same values as np.mean() and np.std() @pytest.mark.parametrize("a_shape", [(4, ), (4, 5), (4, 5, 6)]) @pytest.mark.parametrize("axis", [None, 0]) @pytest.mark.parametrize("ddof", [0, 1]) def test_mean_sigma(a_shape, axis, ddof): np.random.seed(0) a = np.random.random(a_shape) mu1, sigma1 = mean_sigma(a, axis=axis, ddof=ddof) mu2 = np.mean(a, axis=axis) sigma2 = np.std(a, axis=axis, ddof=ddof) assert_array_almost_equal(mu1, mu2) assert_array_almost_equal(sigma1, sigma2) # --------------------------------------------------------------------------- # Check that the keepdims argument works as expected # we'll later compare median_sigmaG to these results, so that # is effectively tested as well. @pytest.mark.parametrize("axis", [None, 0, 1, 2]) def test_mean_sigma_keepdims(axis): np.random.seed(0) a = np.random.random((4, 5, 6)) mu1, sigma1 = mean_sigma(a, axis, keepdims=False) mu2, sigma2 = mean_sigma(a, axis, keepdims=True) assert_array_equal(mu1.ravel(), mu2.ravel()) assert_array_equal(sigma1.ravel(), sigma2.ravel()) assert_array_equal(np.broadcast(a, mu2).shape, a.shape) assert_array_equal(np.broadcast(a, sigma2).shape, a.shape) # --------------------------------------------------------------------------- # Check that median_sigmaG matches the values computed using np.percentile # and np.median @pytest.mark.parametrize("axis", [None, 0, 1, 2]) def test_median_sigmaG(axis): np.random.seed(0) a = np.random.random((20, 40, 60)) from scipy.special import erfinv factor = 1. / (2 * np.sqrt(2) * erfinv(0.5)) med1, sigmaG1 = median_sigmaG(a, axis=axis) med2 = np.median(a, axis=axis) q25, q75 = np.percentile(a, [25, 75], axis=axis) sigmaG2 = factor * (q75 - q25) assert_array_almost_equal(med1, med2) assert_array_almost_equal(sigmaG1, sigmaG2) @pytest.mark.parametrize("axis", [None, 0, 1, 2]) def test_sigmaG(axis): np.random.seed(0) a = np.random.random((20, 40, 60)) from scipy.special import erfinv factor = 1. 
/ (2 * np.sqrt(2) * erfinv(0.5)) sigmaG1 = sigmaG(a, axis=axis) q25, q75 = np.percentile(a, [25, 75], axis=axis) sigmaG2 = factor * (q75 - q25) assert_array_almost_equal(sigmaG1, sigmaG2) # --------------------------------------------------------------------------- # Check that median_sigmaG() is a good approximation of mean_sigma() # for normally-distributed data. @pytest.mark.parametrize('axis', [None, 1]) @pytest.mark.parametrize('keepdims', [True, False]) def test_median_sigmaG_approx(axis, keepdims, atol=0.02): np.random.seed(0) a = np.random.normal(0, 1, size=(10, 10000)) med, sigmaG = median_sigmaG(a, axis=axis, keepdims=keepdims) mu, sigma = mean_sigma(a, axis=axis, ddof=1, keepdims=keepdims) assert_allclose(med, mu, atol=atol) assert_allclose(sigmaG, sigma, atol=atol) # --------------------------------------------------------------------------- # Check the bivariate normal fit @pytest.mark.parametrize("alpha", np.linspace(-np.pi / 2, np.pi / 2, 7)) def test_fit_bivariate_normal(alpha): mu = [10, 10] sigma1 = 2.0 sigma2 = 1.0 N = 1000 # poisson stats rtol = 2 * np.sqrt(N) / N x, y = bivariate_normal(mu, sigma1, sigma2, alpha, N).T mu_fit, sigma1_fit, sigma2_fit, alpha_fit = fit_bivariate_normal(x, y) if alpha_fit > np.pi / 2: alpha_fit -= np.pi elif alpha_fit < -np.pi / 2: alpha_fit += np.pi # Circular degeneracy in alpha: test sin(2*alpha) instead assert_allclose(np.sin(2 * alpha_fit), np.sin(2 * alpha), atol=2 * rtol) assert_allclose(mu, mu_fit, rtol=rtol) assert_allclose(sigma1_fit, sigma1, rtol=rtol) assert_allclose(sigma2_fit, sigma2, rtol=rtol) # ------------------------------------------------------ # Check truncated exponential and linear functions def test_trunc_exp(): x = np.linspace(0, 10, 100) k = 0.25 xlim = [3, 5] # replaced with from astroML.stats.random import trunc_exp # trunc_exp = trunc_exp_gen(name="trunc_exp", shapes='a, b, k') myfunc = trunc_exp(xlim[0], xlim[1], k) y = myfunc.pdf(x) zeros = np.zeros(len(y)) # Test that the function is zero outside of defined limits assert_array_equal(y[x < xlim[0]], zeros[x < xlim[0]]) assert_array_equal(y[x > xlim[1]], zeros[x > xlim[1]]) inlims = (x < xlim[1]) & (x > xlim[0]) C = k / (np.exp(k * xlim[1]) - np.exp(k * xlim[0])) # Test that within defined limits, function is exponential assert_array_equal(y[inlims], C*np.exp(k * x[inlims])) # Test that the PDF integrates to just about 1 dx = x[1] - x[0] integral = np.sum(y * dx) assert np.round(integral, 1) == 1 # Check the linear generator def test_linear_gen(): x = np.linspace(-10, 10, 200) c = -0.5 xlim = [-2.4, 6.] # replaced with from astroML.stats.random import linear # linear = linear_gen(name="linear", shapes="a, b, c") y = linear.pdf(x, xlim[0], xlim[1], c) zeros = np.zeros(len(y)) # Test that the function is zero outside of defined limits assert_array_equal(y[x < xlim[0]], zeros[x < xlim[0]]) assert_array_equal(y[x > xlim[1]], zeros[x > xlim[1]]) inlims = (x < xlim[1]) & (x > xlim[0]) d = 1. 
/ (xlim[1] - xlim[0]) - 0.5 * c * (xlim[1] + xlim[0]) inlims = (x < xlim[1]) & (x > xlim[0]) # Test that within defined limits, function is linear assert_array_equal(y[inlims], c*x[inlims] + d) # Test that the PDF integrates to about 1 dx = x[1] - x[0] integral = np.sum(y * dx) assert np.round(integral, 1) == 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090039.0 astroML-1.0.2/astroML/sum_of_norms.py0000644000076700000240000001043200000000000017073 0ustar00bsipoczstaff""" Functions for regression using sums-of-norms """ import numpy as np def norm(x, x0, sigma): return (1. / np.sqrt(2 * np.pi) / sigma * np.exp(-0.5 * (x - x0) ** 2 / sigma ** 2)) def sum_of_norms(x, y, num_gaussians=None, locs=None, widths=None, spacing='linear', full_output=False): r"""Approximate a function with a sum of gaussians Parameters ---------- x : array-like, shape = n_training The x-value of the input function y : array-like, shape = n_training The y-value of the input function num_gaussians : integer (optional) The number of gaussians to use. If this is not specified, then the number of items in `locs` is used. If neither is specified, this defaults to 30 locs : array-like (optional) The locations of the gaussians to use. If not specified, locations will be uniformly spaced between the end-points of x. widths : float or array-like (optional) The widths of the gaussians to use. If a single value, use this for all widths. If multiple values, the length must be equal to len(locs), if specified, and/or num_gaussians, if specified. If widths is not provided, then widths will be used which are half the distance between adjacent gaussians will be used. full_output : boolean (default = False) if True, return the rms error of the best-fit, the list of locations, and the list of widths spacing : string, ['linear'|'log'] spacing to use for automatic determination of locs. Not referenced if locs is specified Returns ------- weights if full_output == False (weights, rms, locs, widths) if full_output == True weights : array-like, length = num_gaussians The weights which best approximate the spectrum. The reconstruction is given by sum_{i=1}^{num_gaussians} weights[i] * norm(locs[i], widths[i]) rms : float the root-mean-square error of the best-fit solution locs : array the locations of the gaussians used for the fit widths : array the widths of the gaussians used for the fit Notes ----- This is solved using linear regression. Our matrix :math:`X` has shape :math:`(m, n)` where :math:`m` is the number of training points, and :math:`n` is the number of gaussians in the fit. We seek the linear combination of these :math:`n` gaussians which minimizes the squared residual error, which in matrix form can be expressed .. math: \epsilon = \min\left|y - Xw \right| here the vector :math:`w` encodes the linear combination. The vector :math:`w` which minimizes :math:`\epsilon` can be shown to be .. math: w = (X^T X)^{-1} X^T y This is the result returned by this function. 
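    Examples
    --------
    A minimal usage sketch on synthetic data, evaluating the
    reconstruction with the module-level ``norm`` helper defined above
    (the exact weights depend on the inputs):

    >>> import numpy as np
    >>> x = np.linspace(0, 10, 200)
    >>> y = np.exp(-0.5 * (x - 5) ** 2)
    >>> w, rms, locs, widths = sum_of_norms(x, y, num_gaussians=10,
    ...                                     full_output=True)
    >>> y_fit = (w * norm(x[:, None], locs, widths)).sum(1)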
""" x, y = map(np.asarray, (x, y)) assert x.ndim == 1 assert y.shape == x.shape n_training = x.shape[0] if locs is None: if num_gaussians is None: num_gaussians = 30 if spacing == 'linear': locs = np.linspace(x[0], x[-1], num_gaussians) elif spacing == 'log': locs = np.logspace(np.log10(x[0]), np.log10(x[-1]), num_gaussians) else: locs = np.asarray(locs) if num_gaussians is None: num_gaussians = len(locs) if num_gaussians is not None: assert len(locs) == num_gaussians if widths is None: widths = np.zeros(num_gaussians) widths[:-1] = locs[1:] - locs[:-1] if len(widths) > 1: widths[-1] = widths[-2] else: widths[-1] = x[-1] - x[0] else: widths = np.atleast_1d(widths) assert widths.size in (1, num_gaussians) widths = widths + np.zeros(num_gaussians) # broadcast to shape # use broadcasting to compute X in one go, without slow loops X = norm(x.reshape(n_training, 1), locs.reshape(1, num_gaussians), widths.reshape(1, num_gaussians)) # use pinv rather than inv for numerical stability w_best = np.dot(np.linalg.pinv(np.dot(X.T, X)), np.dot(X.T, y)) if not full_output: return w_best else: rms = np.sqrt(np.mean(y - np.dot(X, w_best)) ** 2) return w_best, rms, locs, widths ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1643147665.436084 astroML-1.0.2/astroML/tests/0000755000076700000240000000000000000000000015155 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1541133836.0 astroML-1.0.2/astroML/tests/__init__.py0000644000076700000240000000000000000000000017254 0ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/tests/test_correlation.py0000644000076700000240000000202400000000000021105 0ustar00bsipoczstaffimport numpy as np from numpy.testing import assert_allclose from astroML.correlation import uniform_sphere, ra_dec_to_xyz, angular_dist_to_euclidean_dist def test_uniform_sphere(): np.random.seed(42) # check number of points in 3 axis-aligned cones is approximately the same ra, dec = uniform_sphere((-180, 180), (-90, 90), 10000) x, y, z = ra_dec_to_xyz(ra, dec) assert_allclose(x ** 2 + y ** 2 + z ** 2, np.ones_like(x)) in_x_cone = (y**2 + z**2 < 0.25).mean() in_y_cone = (x**2 + z**2 < 0.25).mean() in_z_cone = (x**2 + y**2 < 0.25).mean() # with prop > 0.999999 should not differ for more than 5 standard deviations assert_allclose(in_x_cone, in_y_cone, atol=5e-2) assert_allclose(in_x_cone, in_z_cone, atol=5e-2) assert_allclose(in_y_cone, in_z_cone, atol=5e-2) def test_angular_d_to_euclidean_d(): assert_allclose(angular_dist_to_euclidean_dist(180.), 2.) assert_allclose(angular_dist_to_euclidean_dist(60.), 1.) assert_allclose(angular_dist_to_euclidean_dist(0.), 0.) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/tests/test_filters.py0000644000076700000240000000232200000000000020235 0ustar00bsipoczstaffimport pytest import numpy as np from numpy.testing import assert_allclose from astroML.filters import savitzky_golay, wiener_filter from astroML.utils.exceptions import AstroMLDeprecationWarning def test_savitzky_golay(): y = np.zeros(100) y[::2] = 1 with pytest.warns(AstroMLDeprecationWarning): f = savitzky_golay(y, window_size=3, order=1) assert_allclose(f, (2 - y) / 3.) 
def test_savitzky_golay_fft(): y = np.random.normal(size=100) for width in [3, 5]: for order in range(width - 1): with pytest.warns(AstroMLDeprecationWarning): f1 = savitzky_golay(y, width, order, use_fft=False) f2 = savitzky_golay(y, width, order, use_fft=True) assert_allclose(f1, f2) def test_wiener_filter_simple(): t = np.linspace(0, 1, 256) h = np.zeros_like(t) h[::2] = 1000 s = wiener_filter(t, h) assert_allclose(s, np.mean(h)) def test_wienter_filter_spike(): np.random.seed(0) N = 2048 dt = 0.05 t = dt * np.arange(N) h = np.exp(-0.5 * ((t - 20.) / 1.0) ** 2) + 10 hN = h + np.random.normal(0, 0.05, size=h.shape) h_smooth = wiener_filter(t, hN) assert_allclose(h, h_smooth, atol=0.03) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/tests/test_fourier.py0000644000076700000240000000637500000000000020254 0ustar00bsipoczstaffimport pytest import numpy as np from numpy.testing import assert_allclose from astroML.fourier import (FT_continuous, IFT_continuous, PSD_continuous, sinegauss, sinegauss_FT) @pytest.mark.parametrize('t0', [-1, 0, 1]) @pytest.mark.parametrize('f0', [1, 2]) @pytest.mark.parametrize('Q', [1, 2]) def test_wavelets(t0, f0, Q): t = np.linspace(-10, 10, 10000) h = sinegauss(t, t0, f0, Q) f, H = FT_continuous(t, h) H2 = sinegauss_FT(f, t0, f0, Q) assert_allclose(H, H2, atol=1E-8) def sinegauss_a(t, t0, f0, a): """Sine-gaussian wavelet. Differs from the ``astroML.fourier.sinegauss`` in that instead of taking a coefficient ``Q`` and calculating the coefficient ``a``, it assumes the given coefficient ``a`` is correct. The relationship between the two is given as: a = (f0 * 1. / Q) ** 2 """ return (np.exp(-a * (t - t0) ** 2) * np.exp(2j * np.pi * f0 * (t - t0))) def sinegauss_FT_a(f, t0, f0, a): """Fourier transform of the sine-gaussian wavelet. This uses the convention H(f) = integral[ h(t) exp(-2pi i f t) dt] Differs from the ``astroML.fourier.sinegauss`` in that instead of taking a coefficient ``Q`` and calculating the coefficient ``a``, it assumes the given coefficient ``a`` is correct. The relationship between the two is given as: a = (f0 * 1. 
/ Q) ** 2 """ return (np.sqrt(np.pi / a) * np.exp(-2j * np.pi * f * t0) * np.exp(-np.pi ** 2 * (f - f0) ** 2 / a)) def sinegauss_PSD(f, t0, f0, a): """PSD of the sine-gaussian wavelet PSD(f) = |H(f)|^2 + |H(-f)|^2 """ Pf = np.pi / a * np.exp(-2 * np.pi ** 2 * (f - f0) ** 2 / a) Pmf = np.pi / a * np.exp(-2 * np.pi ** 2 * (-f - f0) ** 2 / a) return Pf + Pmf @pytest.mark.parametrize('a', [1, 2]) @pytest.mark.parametrize('t0', [-2, 0, 2]) @pytest.mark.parametrize('f0', [-1, 0, 1]) @pytest.mark.parametrize('method', [1, 2]) def test_FT_continuous(a, t0, f0, method): t = np.linspace(-9, 10, 10000) h = sinegauss_a(t, t0, f0, a) f, H = FT_continuous(t, h, method=method) assert_allclose(H, sinegauss_FT_a(f, t0, f0, a), atol=1E-12) @pytest.mark.parametrize('a', [1, 2]) @pytest.mark.parametrize('t0', [-2, 0, 2]) @pytest.mark.parametrize('f0', [-1, 0, 1]) @pytest.mark.parametrize('method', [1, 2]) def test_PSD_continuous(a, t0, f0, method): t = np.linspace(-9, 10, 10000) h = sinegauss_a(t, t0, f0, a) f, P = PSD_continuous(t, h, method=method) assert_allclose(P, sinegauss_PSD(f, t0, f0, a), atol=1E-12) @pytest.mark.parametrize('a', [1, 2]) @pytest.mark.parametrize('t0', [-2, 0, 2]) @pytest.mark.parametrize('f0', [-1, 0, 1]) @pytest.mark.parametrize('method', [1, 2]) def check_IFT_continuous(a, t0, f0, method): f = np.linspace(-9, 10, 10000) H = sinegauss_FT_a(f, t0, f0, a) t, h = IFT_continuous(f, H, method=method) assert_allclose(h, sinegauss_a(t, t0, f0, a), atol=1E-12) def test_IFT_FT(): # Test IFT(FT(x)) = x np.random.seed(0) t = -50 + 0.01 * np.arange(10000.) x = np.random.random(10000) f, y = FT_continuous(t, x) t, xp = IFT_continuous(f, y) assert_allclose(x, xp, atol=1E-7) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/tests/test_lumfunc.py0000644000076700000240000000046500000000000020244 0ustar00bsipoczstaffimport numpy as np from astroML.lumfunc import Cminus def test_cminus_nans(): # Regression test for https://github.com/astroML/astroML/issues/234 x = [10.02, 10.00] y = [14.97, 14.99] xmax = [10.03, 10.01] ymax = [14.98, 15.00] assert np.isfinite(np.sum(Cminus(x, y, xmax, ymax))) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/tests/test_resample.py0000644000076700000240000000515300000000000020402 0ustar00bsipoczstaffimport numpy as np from numpy.testing import assert_allclose, run_module_suite from astroML.resample import bootstrap, jackknife from astroML.stats import mean_sigma def test_jackknife_results(): np.random.seed(0) x = np.random.normal(0, 1, 100) mu1, sig1 = jackknife(x, np.mean, kwargs=dict(axis=1)) mu2, sig2 = jackknife(x, np.std, kwargs=dict(axis=1)) assert_allclose([mu1, sig1, mu2, sig2], [0.0598080155345, 0.100288031685, 1.01510470168, 0.0649020337599]) def test_jackknife_multiple(): np.random.seed(0) x = np.random.normal(0, 1, 100) mu1, sig1 = jackknife(x, np.mean, kwargs=dict(axis=1)) mu2, sig2 = jackknife(x, np.std, kwargs=dict(axis=1)) res = jackknife(x, mean_sigma, kwargs=dict(axis=1)) assert_allclose(res[0], (mu1, sig1)) assert_allclose(res[1], (mu2, sig2)) def test_bootstrap_results(): np.random.seed(0) x = np.random.normal(0, 1, 100) distribution = bootstrap(x, 100, np.mean, kwargs=dict(axis=1), random_state=0) mu, sigma = mean_sigma(distribution) assert_allclose([mu, sigma], [0.08139846, 0.10465327]) def test_bootstrap_multiple(): np.random.seed(0) x = np.random.normal(0, 1, 100) dist_mean = bootstrap(x, 100, np.mean, 
kwargs=dict(axis=1), random_state=0) dist_std = bootstrap(x, 100, np.std, kwargs=dict(axis=1), random_state=0) res = bootstrap(x, 100, mean_sigma, kwargs=dict(axis=1), random_state=0) assert_allclose(res[0], dist_mean) assert_allclose(res[1], dist_std) def test_bootstrap_covar(): np.random.seed(0) mean = [0., 0.] covar = [[10., 3.], [3., 20.]] x = np.random.multivariate_normal(mean, covar, 1000) dist_cov = bootstrap(x, 10000, np.cov, kwargs=dict(rowvar=0), random_state=0) assert_allclose(covar[0][0], dist_cov[0][0], atol=2.*0.4) def test_bootstrap_pass_indices(): np.random.seed(0) x = np.random.normal(0, 1, 100) dist1 = bootstrap(x, 100, np.mean, kwargs=dict(axis=1), random_state=0) dist2 = bootstrap(x, 100, lambda i: np.mean(x[i], axis=1), pass_indices=True, random_state=0) assert_allclose(dist1, dist2) def test_jackknife_pass_indices(): np.random.seed(0) x = np.random.normal(0, 1, 100) res1 = jackknife(x, np.mean, kwargs=dict(axis=1)) res2 = jackknife(x, lambda i: np.mean(x[i], axis=1), pass_indices=True) assert_allclose(res1, res2) if __name__ == '__main__': run_module_suite() ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4371226 astroML-1.0.2/astroML/time_series/0000755000076700000240000000000000000000000016323 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1641327596.0 astroML-1.0.2/astroML/time_series/ACF.py0000644000076700000240000000717200000000000017275 0ustar00bsipoczstaff""" Auto-correlation functions """ import numpy as np from scipy import fftpack from .periodogram import lomb_scargle def ACF_scargle(t, y, dy, n_omega=2 ** 10, omega_max=100): """Compute the Auto-correlation function via Scargle's method Parameters ---------- t : array_like times of observation. Assumed to be in increasing order. y : array_like values of each observation. Should be same shape as t dy : float or array_like errors in each observation. n_omega : int (optional) number of angular frequencies at which to evaluate the periodogram default is 2^10 omega_max : float (optional) maximum value of omega at which to evaluate the periodogram default is 100 Returns ------- ACF, t : ndarrays The auto-correlation function and associated times """ t = np.asarray(t) y = np.asarray(y) if y.shape != t.shape: raise ValueError("shapes of t and y must match") dy = np.asarray(dy) * np.ones(y.shape) d_omega = omega_max * 1. / (n_omega + 1) omega = d_omega * np.arange(1, n_omega + 1) # recall that P(omega = 0) = (chi^2(0) - chi^2(0)) / chi^2(0) # = 0 # compute P and shifted full-frequency array P = lomb_scargle(t, y, dy, omega, generalized=True) P = np.concatenate([[0], P, P[-2::-1]]) # compute PW, the power of the window function PW = lomb_scargle(t, np.ones(len(t)), dy, omega, generalized=False, subtract_mean=False) PW = np.concatenate([[0], PW, PW[-2::-1]]) # compute the inverse fourier transform of P and PW rho = fftpack.ifft(P).real rhoW = fftpack.ifft(PW).real ACF = fftpack.fftshift(rho / rhoW) / np.sqrt(2) N = len(ACF) dt = 2 * np.pi / N / (omega[1] - omega[0]) t = dt * (np.arange(N) - N // 2) return ACF, t def ACF_EK(t, y, dy, bins=20): """Auto-correlation function via the Edelson-Krolik method Parameters ---------- t : array_like times of observation. Assumed to be in increasing order. y : array_like values of each observation. Should be same shape as t dy : float or array_like errors in each observation. bins : int or array_like (optional) if integer, the number of bins to use in the analysis. 
        if array, the (nbins + 1) bin edges.
        Default is bins=20.

    Returns
    -------
    ACF : ndarray
        The auto-correlation function
    err : ndarray
        the error in the ACF
    bins : ndarray
        bin edges used in computation
    """
    t = np.asarray(t)
    y = np.asarray(y)

    if y.shape != t.shape:
        raise ValueError("shapes of t and y must match")

    if t.ndim != 1:
        raise ValueError("t should be a 1-dimensional array")

    dy = np.asarray(dy) * np.ones(y.shape)

    # compute mean and standard deviation of y
    w = 1. / dy / dy
    w /= w.sum()
    mu = np.dot(w, y)
    sigma = np.std(y, ddof=1)

    dy2 = dy[:, None]
    dt = t - t[:, None]

    UDCF = ((y - mu) * (y - mu)[:, None]
            / np.sqrt((sigma ** 2 - dy ** 2)
                      * (sigma ** 2 - dy2 ** 2)))

    # determine binning
    bins = np.asarray(bins)
    if bins.size == 1:
        dt_min = dt.min()
        dt_max = dt.max()
        bins = np.linspace(dt_min, dt_max + 1E-10, bins + 1)

    ACF = np.zeros(len(bins) - 1)
    M = np.zeros(len(bins) - 1)
    for i in range(len(bins) - 1):
        flag = (dt >= bins[i]) & (dt < bins[i + 1])
        M[i] = flag.sum()
        ACF[i] = np.sum(UDCF[flag])

    ACF /= M

    return ACF, np.sqrt(2. / M), bins
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/time_series/__init__.py0000644000076700000240000000063300000000000020436 0ustar00bsipoczstaff
from .ACF import ACF_scargle, ACF_EK
from .generate import generate_power_law, generate_damped_RW
from .periodogram import (lomb_scargle, lomb_scargle_bootstrap,
                          lomb_scargle_AIC, lomb_scargle_BIC,
                          multiterm_periodogram, search_frequencies,
                          MultiTermFit)
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/time_series/generate.py0000644000076700000240000000722000000000000020470 0ustar00bsipoczstaff
import numpy as np
from sklearn.utils import check_random_state


def generate_power_law(N, dt, beta, generate_complex=False, random_state=None):
    """Generate a power-law light curve

    This uses the method from Timmer & Koenig [1]_

    Parameters
    ----------
    N : integer
        Number of equal-spaced time steps to generate
    dt : float
        Spacing between time-steps
    beta : float
        Power-law index.  The spectrum will be (1 / f)^beta
    generate_complex : boolean (optional)
        if True, generate a complex time series rather than a real time series
    random_state : None, int, or np.random.RandomState instance (optional)
        random seed or random number generator

    Returns
    -------
    x : ndarray
        the length-N sequence of sampled values

    References
    ----------
    .. [1] Timmer, J. & Koenig, M. On Generating Power Law Noise. A&A 300:707
    """
    random_state = check_random_state(random_state)
    dt = float(dt)
    N = int(N)

    Npos = int(N / 2)
    # Nneg = int((N - 1) / 2)
    domega = (2 * np.pi / dt / N)

    if generate_complex:
        omega = domega * np.fft.ifftshift(np.arange(N) - int(N / 2))
    else:
        omega = domega * np.arange(Npos + 1)

    x_fft = np.zeros(len(omega), dtype=complex)
    x_fft.real[1:] = random_state.normal(0, 1, len(omega) - 1)
    x_fft.imag[1:] = random_state.normal(0, 1, len(omega) - 1)

    x_fft[1:] *= (1. / omega[1:]) ** (0.5 * beta)
    x_fft[1:] *= (1. / np.sqrt(2))

    # by symmetry, the Nyquist frequency is real if x is real
    if (not generate_complex) and (N % 2 == 0):
        x_fft.imag[-1] = 0

    if generate_complex:
        x = np.fft.ifft(x_fft)
    else:
        x = np.fft.irfft(x_fft, N)

    return x


def generate_damped_RW(t_rest, tau=300., z=2.0,
                       xmean=0, SFinf=0.3, random_state=None):
    """Generate a damped random walk light curve

    This uses a damped random walk model to generate a light curve similar
    to that of a QSO [1]_.

    Parameters
    ----------
    t_rest : array_like
        rest-frame time.  Should be in increasing order
    tau : float
        relaxation time
    z : float
        redshift
    xmean : float (optional)
        mean value of random walk; default=0
    SFinf : float (optional)
        Structure function at infinity; default=0.3
    random_state : None, int, or np.random.RandomState instance (optional)
        random seed or random number generator

    Returns
    -------
    x : ndarray
        the sampled values corresponding to times t_rest

    Notes
    -----
    The differential equation is (with t = time/tau):

        dX = -X(t) * dt + sigma * sqrt(tau) * e(t) * sqrt(dt) + b * tau * dt

    where e(t) is white noise with zero mean and unit variance, and

        Xmean = b * tau
        SFinf = sigma * sqrt(tau / 2)

    so

        dX(t) = -X(t) * dt + sqrt(2) * SFinf * e(t) * sqrt(dt) + Xmean * dt

    References
    ----------
    .. [1] Kelly, B., Bechtold, J. & Siemiginowska, A. (2009)
           Are the Variations in Quasar Optical Flux Driven by Thermal
           Fluctuations?  ApJ 698:895 (2009)
    """
    #  Xmean = b * tau
    #  SFinf = sigma * sqrt(tau / 2)
    t_rest = np.atleast_1d(t_rest)

    if t_rest.ndim != 1:
        raise ValueError('t_rest should be a 1D array')

    random_state = check_random_state(random_state)

    N = len(t_rest)

    t_obs = t_rest * (1. + z) / tau

    x = np.zeros(N)
    x[0] = random_state.normal(xmean, SFinf)
    E = random_state.normal(0, 1, N)

    for i in range(1, N):
        dt = t_obs[i] - t_obs[i - 1]
        x[i] = (x[i - 1]
                - dt * (x[i - 1] - xmean)
                + np.sqrt(2) * SFinf * E[i] * np.sqrt(dt))

    return x
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/time_series/periodogram.py0000644000076700000240000003131500000000000021210 0ustar00bsipoczstaff
import numpy as np
from sklearn.utils import check_random_state

try:
    # astropy.timeseries is new in v3.2
    from astropy.timeseries import LombScargle
except ImportError:
    from astropy.stats import LombScargle

from ..utils.decorators import deprecated
from ..utils.exceptions import AstroMLDeprecationWarning


@deprecated('0.4', alternative='astropy.stats.LombScargle',
            warning_type=AstroMLDeprecationWarning)
def lomb_scargle(t, y, dy, omega, generalized=True,
                 subtract_mean=True, significance=None):
    """
    (Generalized) Lomb-Scargle Periodogram with Floating Mean

    Parameters
    ----------
    t : array_like
        sequence of times
    y : array_like
        sequence of observations
    dy : array_like
        sequence of observational errors
    omega : array_like
        frequencies at which to evaluate p(omega)
    generalized : bool
        if True (default) use generalized lomb-scargle method
        otherwise, use classic lomb-scargle.
    subtract_mean : bool
        if True (default) subtract the sample mean from the data before
        computing the periodogram.  Only referenced if generalized is False
    significance : None or float or ndarray
        if specified, then this is a list of significances to compute
        for the results.

    Returns
    -------
    p : array_like
        Lomb-Scargle power associated with each frequency omega
    z : array_like
        if significance is specified, this gives the levels corresponding
        to the desired significance (using the Scargle 1982 formalism)

    Notes
    -----
    The algorithm is based on reference [1]_.  The result for
    generalized=False is given by equation 4 of this work, while the
    result for generalized=True is given by equation 20.

    Note that the normalization used in this reference is different from
    that used in other places in the literature (e.g. [2]_).  For a
    discussion of normalization and false-alarm probability, see [1]_.

    To recover the normalization used in Scargle [3]_, the results should
    be multiplied by (N - 1) / 2 where N is the number of data points.

    References
    ----------
    .. [1] M. Zechmeister and M. Kurster, A&A 496, 577-584 (2009)
    .. [2] W. Press et al, Numerical Recipes in C (2002)
    .. [3] Scargle, J.D. 1982, ApJ 263:835-853
    """
    # delegate to astropy's Lomb-Scargle
    ls = LombScargle(t, y, dy, fit_mean=generalized,
                     center_data=subtract_mean)
    frequency = np.asarray(omega) / (2 * np.pi)
    p_omega = ls.power(frequency, method='cython')

    if significance is not None:
        N = t.size
        M = 2 * N
        z = (-2.0 / (N - 1.)
             * np.log(1 - (1 - np.asarray(significance)) ** (1. / M)))
        return p_omega, z
    else:
        return p_omega


@deprecated('0.4',
            alternative='astropy.stats.LombScargle.false_alarm_probability',
            warning_type=AstroMLDeprecationWarning)
def lomb_scargle_bootstrap(t, y, dy, omega, generalized=True,
                           subtract_mean=True, N_bootstraps=100,
                           random_state=None):
    """Use a bootstrap analysis to compute Lomb-Scargle significance

    Parameters
    ----------
    The first set of parameters are passed to the lomb_scargle algorithm

    t : array_like
        sequence of times
    y : array_like
        sequence of observations
    dy : array_like
        sequence of observational errors
    omega : array_like
        frequencies at which to evaluate p(omega)
    generalized : bool
        if True (default) use generalized lomb-scargle method
        otherwise, use classic lomb-scargle.
    subtract_mean : bool
        if True (default) subtract the sample mean from the data before
        computing the periodogram.  Only referenced if generalized is False

    Remaining parameters control the bootstrap

    N_bootstraps : int
        number of bootstraps
    random_state : None, int, or RandomState object
        random seed, or random number generator

    Returns
    -------
    D : ndarray
        distribution of the height of the highest peak
    """
    random_state = check_random_state(random_state)
    t = np.asarray(t)
    y = np.asarray(y)
    dy = np.asarray(dy) + np.zeros_like(y)

    D = np.zeros(N_bootstraps)

    for i in range(N_bootstraps):
        ind = random_state.randint(0, len(y), len(y))
        p = lomb_scargle(t, y[ind], dy[ind], omega,
                         generalized=generalized, subtract_mean=subtract_mean)
        D[i] = p.max()

    return D


def lomb_scargle_AIC(P, y, dy, n_harmonics=1):
    """Compute the AIC for a Lomb-Scargle Periodogram

    Parameters
    ----------
    P : array_like
        lomb-scargle power
    y : array_like
        observations
    dy : array_like
        errors
    n_harmonics : int (optional)
        the number of harmonics used in the Lomb-Scargle fit. Default is 1

    Returns
    -------
    AIC : ndarray
        AIC value corresponding to values in P
    """
    P, y, dy = map(np.asarray, (P, y, dy))
    w = 1. / dy ** 2
    mu = np.dot(w, y) / w.sum()
    return np.sum(((y - mu) / dy) ** 2) * P - (2 * n_harmonics + 1) * 2


def lomb_scargle_BIC(P, y, dy, n_harmonics=1):
    """Compute the BIC for a Lomb-Scargle Periodogram

    Parameters
    ----------
    P : array_like
        lomb-scargle power
    y : array_like
        observations
    dy : array_like
        errors
    n_harmonics : int (optional)
        the number of harmonics used in the Lomb-Scargle fit. Default is 1

    Returns
    -------
    BIC : ndarray
        BIC value corresponding to values in P
    """
    P, y, dy = map(np.asarray, (P, y, dy))
    w = 1. / dy ** 2
    mu = np.dot(w, y) / w.sum()
    N = len(y)
    return np.sum(((y - mu) / dy) ** 2) * P - (2 * n_harmonics + 1) * np.log(N)


@deprecated('0.4', alternative='astropy.stats.LombScargle',
            warning_type=AstroMLDeprecationWarning)
def multiterm_periodogram(t, y, dy, omega, n_terms=3):
    """Perform a multiterm periodogram at each omega

    This calculates the chi2 for the best-fit least-squares solution
    for each frequency omega.

    Parameters
    ----------
    t : array_like
        sequence of times
    y : array_like
        sequence of observations
    dy : array_like
        sequence of observational errors
    omega : float or array_like
        frequencies at which to evaluate p(omega)
    n_terms : int (optional)
        number of Fourier terms to use in the fit.  Default is 3

    Returns
    -------
    power : ndarray
        P = 1. - chi2 / chi2_0
        where chi2_0 is the chi-square for a simple mean fit to the data
    """
    # TODO: deprecate this
    ls = LombScargle(t, y, dy, nterms=n_terms)
    frequency = np.asarray(omega) / (2 * np.pi)
    return ls.power(frequency)


def search_frequencies(t, y, dy, LS_func=lomb_scargle, LS_kwargs=None,
                       initial_guess=25, limit_fractions=[0.04, 0.3, 0.9, 0.99],
                       n_eval=10000, n_retry=5, n_save=50):
    """Utility Routine to find the best frequencies

    Finding the best frequency with a Lomb-Scargle periodogram requires
    searching a large range of frequencies at a very fine resolution.
    This is an iterative routine that searches progressively finer
    grids to narrow-in on the best result.

    Parameters
    ----------
    t: array_like
        observed times
    y: array_like
        observed fluxes or magnitudes
    dy: array_like
        observed errors on y

    Other Parameters
    ----------------
    LS_func : function
        Function used to perform Lomb-Scargle periodogram.  The call signature
        should be LS_func(t, y, dy, omega, **kwargs)
        (Default is astroML.periodogram.lomb_scargle)
    LS_kwargs : dict
        dictionary of keyword arguments to pass to LS_func in addition to
        (t, y, dy, omega)
    initial_guess : float
        the initial guess of the best angular frequency
    limit_fractions : array_like
        the list of fractions to use when zooming in on peak possibilities.
        On the i^th iteration, with f_i = limit_fractions[i], the range
        probed around each candidate will be
        (candidate * f_i, candidate / f_i).
    n_eval : integer or list
        The number of points to evaluate in the range on each iteration.
        If n_eval is a list, it should have the same length as
        limit_fractions.
    n_retry : integer or list
        Number of top points to search on each iteration.  If n_retry is
        a list, it should have the same length as limit_fractions.
    n_save : integer or list
        Number of evaluations to save on each iteration.  If n_save is a
        list, it should have the same length as limit_fractions.

    Returns
    -------
    omega_top, power_top: ndarrays
        The saved values of omega and power.
These will have size 1 + n_save * (1 + n_retry * len(limit_fractions)) as long as n_save > n_retry """ if LS_kwargs is None: LS_kwargs = dict() omega_best = [initial_guess] power_best = LS_func(t, y, dy, omega_best, **LS_kwargs) for (Ne, Nr, Ns, frac) in np.broadcast(n_eval, n_retry, n_save, limit_fractions): # make sure we explore differing regions log_ob = np.log(omega_best) width = 0.1 * np.log(frac) log_ob = np.floor(-log_ob / width).astype(int) indices = np.arange(len(log_ob)) for i in range(Nr): if len(indices) == 0: break omega_try = omega_best[indices[-1]] non_duplicates = (log_ob != log_ob[-1]) log_ob = log_ob[non_duplicates] indices = indices[non_duplicates] omega = np.linspace(omega_try * frac, omega_try / frac, Ne) power = LS_func(t, y, dy, omega, **LS_kwargs) i = np.argsort(power)[-Ns:] power_best = np.concatenate([power_best, power[i]]) omega_best = np.concatenate([omega_best, omega[i]]) i = np.argsort(power_best) power_best = power_best[i] omega_best = omega_best[i] i = np.argsort(omega_best) return omega_best[i], power_best[i] class MultiTermFit: """Multi-term Fourier fit to a light curve Parameters ---------- omega : float angular frequency of the fundamental mode n_terms : int the number of Fourier modes to use in the fit """ def __init__(self, omega, n_terms): self.omega = omega self.n_terms = n_terms def _make_X(self, t): t = np.asarray(t) k = np.arange(1, self.n_terms + 1) X = np.hstack([np.ones(t[:, None].shape), np.sin(k * self.omega * t[:, None]), np.cos(k * self.omega * t[:, None])]) return X def fit(self, t, y, dy): """Fit multiple Fourier terms to the data Parameters ---------- t: array_like observed times y: array_like observed fluxes or magnitudes dy: array_like observed errors on y Returns ------- self : The MultiTermFit object is returned """ t = np.asarray(t) y = np.asarray(y) dy = np.asarray(dy) X_scaled = self._make_X(t) / dy[:, None] y_scaled = y / dy self.t_ = t self.w_ = np.linalg.solve(np.dot(X_scaled.T, X_scaled), np.dot(X_scaled.T, y_scaled)) return self def predict(self, Nphase, return_phased_times=False, adjust_offset=True): """Compute the phased fit, and optionally return phased times Parameters ---------- Nphase : int Number of terms to use in the phased fit return_phased_times : bool If True, then return a phased version of the input times adjust_offset : bool If true, then shift results so that the minimum value is at phase 0 Returns ------- phase, y_fit : ndarrays The phase and y value of the best-fit light curve phased_times : ndarray The phased version of the training times. Returned if return_phased_times is set to True. 
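
        Examples
        --------
        A minimal usage sketch (here ``t``, ``y`` and ``dy`` stand for 1D
        arrays of observation times, observed values and errors; they are
        placeholders rather than names defined in this module)::

            fit = MultiTermFit(omega=2 * np.pi, n_terms=4)
            fit.fit(t, y, dy)
            phase, y_fit = fit.predict(Nphase=100)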
""" phase_fit = np.linspace(0, 1, Nphase + 1)[:-1] X_fit = self._make_X(2 * np.pi * phase_fit / self.omega) y_fit = np.dot(X_fit, self.w_) i_offset = np.argmin(y_fit) if adjust_offset: y_fit = np.concatenate([y_fit[i_offset:], y_fit[:i_offset]]) if return_phased_times: if adjust_offset: offset = phase_fit[i_offset] else: offset = 0 phased_times = (self.t_ * self.omega * 0.5 / np.pi - offset) % 1 return phase_fit, y_fit, phased_times else: return phase_fit, y_fit ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4378738 astroML-1.0.2/astroML/time_series/tests/0000755000076700000240000000000000000000000017465 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1541133836.0 astroML-1.0.2/astroML/time_series/tests/__init__.py0000644000076700000240000000000000000000000021564 0ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/time_series/tests/test_generate.py0000644000076700000240000000144400000000000022673 0ustar00bsipoczstaffimport pytest import numpy as np from numpy.testing import assert_, assert_almost_equal from astroML.time_series import generate_power_law, generate_damped_RW @pytest.mark.parametrize("N", [10, 11]) @pytest.mark.parametrize("generate_complex", [True, False]) def test_generate_args(N, generate_complex): dt = 0.1 beta = 2 x = generate_power_law(N, dt, beta, generate_complex) assert_(bool(generate_complex) == np.iscomplexobj(x)) assert_(len(x) == N) def test_generate_RW(): t = np.arange(0., 1E2) tau = 300 z = 2.0 rng = np.random.RandomState(0) xmean = rng.rand(1)*200 - 100 N = len(t) y = generate_damped_RW(t, tau=tau, z=z, xmean=xmean, random_state=rng) assert_(len(generate_damped_RW(t)) == N) assert_almost_equal(np.mean(y), xmean, 0) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/time_series/tests/test_periodogram.py0000644000076700000240000000105000000000000023402 0ustar00bsipoczstaffimport numpy as np from numpy.testing import assert_almost_equal from astroML.time_series import search_frequencies # TODO: add tests of lomb_scargle inputs & significance # TODO: add tests of bootstrap def test_search_frequencies(): rng = np.random.RandomState(0) t = np.arange(0, 1E1, 0.01) f = 1 w = 2 * np.pi * np.array(f) y = np.sin(w * t) dy = 0.01 y += dy * rng.randn(len(y)) omegas, power = search_frequencies(t, y, dy) omax = omegas[power == max(power)] assert_almost_equal(w, omax, decimal=3) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4389567 astroML-1.0.2/astroML/utils/0000755000076700000240000000000000000000000015153 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/utils/__init__.py0000644000076700000240000000005700000000000017266 0ustar00bsipoczstafffrom .utils import * from .decorators import * ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643147368.0 astroML-1.0.2/astroML/utils/decorators.py0000644000076700000240000001323000000000000017671 0ustar00bsipoczstaffimport warnings import functools from packaging.version import Version import numpy as np import astropy import pickle from astroML.utils.exceptions import AstroMLDeprecationWarning # We use functionality of the deprecated decorator from astropy that was # added in v2.0.10 LTS and v3.1 av = astropy.__version__ 
ASTROPY_LT_31 = (Version(av) < Version("2.0.10")
                 or (Version("3.0") <= Version(av)
                     and Version(av) < Version("3.1")))

__all__ = ['pickle_results', 'deprecated']


def pickle_results(filename=None, verbose=True):
    """Generator for decorator which allows pickling the results of a function

    Pickle is python's built-in object serialization.  This decorator, when
    used on a function, saves the results of the computation in the function
    to a pickle file.  If the function is called a second time with the
    same inputs, then the computation will not be repeated and the previous
    results will be used.

    This functionality is useful for computations which take a long time,
    but will need to be repeated (such as the first step of a data analysis).

    Parameters
    ----------
    filename : string (optional)
        pickle file to which results will be saved.  If not specified,
        then the file is '<funcname>_output.pkl' where '<funcname>' is
        replaced by the name of the decorated function.
    verbose : boolean (optional)
        if True, then print a message to standard out specifying when the
        pickle file is written or read.

    Examples
    --------
    >>> @pickle_results('tmp.pkl', verbose=True)
    ... def f(x):
    ...     return x * x
    >>> f(4)
    @pickle_results: computing results and saving to 'tmp.pkl'
    16
    >>> f(4)
    @pickle_results: using precomputed results from 'tmp.pkl'
    16
    """
    def pickle_func(f, filename=filename, verbose=verbose):
        if filename is None:
            filename = '%s_output.pkl' % f.__name__

        def new_f(*args, **kwargs):
            # While loading, pickle can raise any number of errors.  Treat a
            # missing file (FileNotFoundError) and any error raised by pickle
            # as equivalent: in either case the data in the cache will have
            # to be regenerated.
            try:
                D = pickle.load(open(filename, 'rb'))
                cache_exists = True
            except Exception:
                D = {}
                cache_exists = False

            # simple comparison doesn't work in the case of numpy arrays
            Dargs = D.get('args')
            Dkwargs = D.get('kwargs')

            try:
                args_match = (args == Dargs)
            except ValueError:
                args_match = np.all([np.all(a1 == a2)
                                     for (a1, a2) in zip(Dargs, args)])

            try:
                kwargs_match = (kwargs == Dkwargs)
            except ValueError:
                kwargs_match = ((sorted(Dkwargs.keys())
                                 == sorted(kwargs.keys()))
                                and (np.all([np.all(Dkwargs[key]
                                                    == kwargs[key])
                                             for key in kwargs])))

            if (type(D) == dict and D.get('funcname') == f.__name__
                    and args_match and kwargs_match):
                if verbose:
                    print("@pickle_results: using precomputed "
                          "results from '%s'" % filename)
                retval = D['retval']

            else:
                if verbose:
                    print("@pickle_results: computing results "
                          "and saving to '%s'" % filename)
                    if cache_exists:
                        print("  warning: cache file '%s' exists" % filename)
                        print("    - args match:   %s" % args_match)
                        print("    - kwargs match: %s" % kwargs_match)
                retval = f(*args, **kwargs)

                funcdict = dict(funcname=f.__name__, retval=retval,
                                args=args, kwargs=kwargs)
                with open(filename, 'wb') as outfile:
                    pickle.dump(funcdict, outfile)

            return retval
        return new_f
    return pickle_func


if not ASTROPY_LT_31:
    from astropy.utils.decorators import deprecated
else:
    def deprecated(since, message='', alternative=None, **kwargs):

        def deprecate_function(func, message=message, since=since,
                               alternative=alternative):
            if message == '':
                message = ('Function {} has been deprecated since {}.'
                           .format(func.__name__, since))
            if alternative is not None:
                message += '\n Use {} instead.'.format(alternative)

            @functools.wraps(func)
            def deprecated_func(*args, **kwargs):
                warnings.warn(message, AstroMLDeprecationWarning)
                return func(*args, **kwargs)

            return deprecated_func

        def deprecate_class(cls, message=message, since=since,
                            alternative=alternative):
            if message == '':
                message = ('Class {} has been deprecated since {}.'
                           .format(cls.__name__, since))
            if alternative is not None:
                message += '\n Use {} instead.'.format(alternative)

            cls.__init__ = deprecate_function(cls.__init__, message=message)
            return cls

        def deprecate(obj):
            if isinstance(obj, type):
                return deprecate_class(obj)
            else:
                return deprecate_function(obj)

        return deprecate
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/utils/exceptions.py0000644000076700000240000000100200000000000017677 0ustar00bsipoczstaff
"""
This module contains errors/exceptions and warnings for astroML.
"""

from astropy.utils.exceptions import AstropyWarning


class AstroMLWarning(AstropyWarning):
    """
    A base warning class from which all AstroML warnings should inherit.

    This class is subclassed from AstropyWarning, so warnings inheriting
    from it are handled by the Astropy logger.
    """


class AstroMLDeprecationWarning(AstroMLWarning):
    """
    A warning class to indicate a deprecated feature in astroML.
    """
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4396706 astroML-1.0.2/astroML/utils/tests/0000755000076700000240000000000000000000000016315 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/utils/tests/__init__.py0000644000076700000240000000000000000000000020414 0ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/astroML/utils/tests/test_pickle_results.py0000644000076700000240000000142400000000000022757 0ustar00bsipoczstaff
import os

from astroML.utils.decorators import pickle_results


def test_pickle_results():
    filename = 'tmp.pkl'

    @pickle_results('tmp.pkl')
    def foo(x):
        foo.called = True
        return x * x

    # cleanup if necessary
    if os.path.exists(filename):
        os.remove(filename)

    # initial calculation: function should be executed
    foo.called = False
    assert foo(4) == 16
    assert foo.called is True

    # recalculation: function should not be executed
    foo.called = False
    assert foo(4) == 16
    assert foo.called is False

    # recalculation with different input: function should be executed
    foo.called = False
    assert foo(5) == 25
    assert foo.called is True

    # cleanup
    assert os.path.exists(filename)
    os.remove(filename)
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/astroML/utils/tests/test_utils.py0000644000076700000240000000624100000000000021071 0ustar00bsipoczstaff
import numpy as np
from numpy.testing import assert_array_almost_equal, assert_allclose

from astroML.utils import (log_multivariate_gaussian, convert_2D_cov,
                           completeness_contamination, split_samples)


def positive_definite_matrix(N, M=None):
    """return an array of M positive-definite matrices with shape (N, N)"""
    if M is None:
        V = np.random.random((N, N))
        V = np.dot(V, V.T)
    else:
        V = np.random.random((M, N, N))
        for i in range(M):
            V[i] = np.dot(V[i], V[i].T)
    return V


def test_log_multivariate_gaussian_methods():
    np.random.seed(0)
    x = np.random.random(3)
    mu = np.random.random(3)
    V = positive_definite_matrix(3,
M=10) res1 = log_multivariate_gaussian(x, mu, V, method=0) res2 = log_multivariate_gaussian(x, mu, V, method=1) assert_array_almost_equal(res1, res2) def test_log_multivariate_gaussian(): np.random.seed(0) x = np.random.random((2, 1, 1, 3)) mu = np.random.random((3, 1, 3)) V = positive_definite_matrix(3, M=4) res1 = log_multivariate_gaussian(x, mu, V) assert res1.shape == (2, 3, 4) res2 = np.zeros_like(res1) for i in range(2): for j in range(3): for k in range(4): res2[i, j, k] = log_multivariate_gaussian(x[i, 0, 0], mu[j, 0], V[k]) assert_array_almost_equal(res1, res2) def test_log_multivariate_gaussian_Vinv(): np.random.seed(0) x = np.random.random((2, 1, 1, 3)) mu = np.random.random((3, 1, 3)) V = positive_definite_matrix(3, M=4) Vinv = np.array([np.linalg.inv(Vi) for Vi in V]) res1 = log_multivariate_gaussian(x, mu, V) res2 = log_multivariate_gaussian(x, mu, V, Vinv=Vinv) assert_array_almost_equal(res1, res2) def test_2D_cov(): s1 = 1.3 s2 = 1.0 alpha = 0.2 cov = convert_2D_cov(s1, s2, alpha) assert_array_almost_equal([s1, s2, alpha], convert_2D_cov(cov)) def test_completeness_contamination(): completeness, contamination = completeness_contamination(np.ones(100), np.ones(100)) assert_allclose(completeness, 1) assert_allclose(contamination, 0) completeness, contamination = completeness_contamination(np.zeros(100), np.zeros(100)) assert_allclose(completeness, 0) assert_allclose(contamination, 0) completeness, contamination = completeness_contamination( np.concatenate((np.ones(50), np.zeros(50))), np.concatenate((np.ones(25), np.zeros(50), np.ones(25))) ) assert_allclose(completeness, 0.5) assert_allclose(contamination, 0.5) def test_split_samples(): X = np.arange(100.) y = np.arange(100.) X_divisions, y_divisions = split_samples(X, y) assert (len(X_divisions[0]) == len(y_divisions[0]) == 75) assert (len(X_divisions[1]) == len(y_divisions[1]) == 25) assert (len(set(X_divisions[0]) | set(X_divisions[1])) == 100) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1641327596.0 astroML-1.0.2/astroML/utils/utils.py0000644000076700000240000001635600000000000016700 0ustar00bsipoczstaffimport numpy as np from scipy import linalg from sklearn.utils import check_random_state as sk_check_random_state from astroML.utils.decorators import deprecated from astroML.utils.exceptions import AstroMLDeprecationWarning try: # SciPy >= 0.19 from scipy.special import logsumexp as scipy_logsumexp except ImportError: from scipy.misc import logsumexp as scipy_logsumexp __all__ = ['logsumexp', 'log_multivariate_gaussian', 'check_random_state', 'split_samples', 'completeness_contamination', 'convert_2D_cov'] @deprecated('1.0', alternative='scipy.special.logsumexp', warning_type=AstroMLDeprecationWarning) def logsumexp(arr, axis=None): return scipy_logsumexp(arr, axis) def log_multivariate_gaussian(x, mu, V, Vinv=None, method=1): """Evaluate a multivariate gaussian N(x|mu, V) This allows for multiple evaluations at once, using array broadcasting Parameters ---------- x: array_like points, shape[-1] = n_features mu: array_like centers, shape[-1] = n_features V: array_like covariances, shape[-2:] = (n_features, n_features) Vinv: array_like or None pre-computed inverses of V: should have the same shape as V method: integer, optional method = 0: use cholesky decompositions of V method = 1: use explicit inverse of V Returns ------- values: ndarray shape = broadcast(x.shape[:-1], mu.shape[:-1], V.shape[:-2]) Examples -------- >>> x = [1, 2] >>> mu = [0, 0] >>> V = [[2, 1], [1, 2]] >>> 
log_multivariate_gaussian(x, mu, V)
    -3.3871832107434003
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    V = np.asarray(V, dtype=float)

    ndim = x.shape[-1]
    x_mu = x - mu

    if V.shape[-2:] != (ndim, ndim):
        raise ValueError("Shape of (x-mu) and V do not match")

    Vshape = V.shape
    V = V.reshape([-1, ndim, ndim])

    if Vinv is not None:
        assert Vinv.shape == Vshape
        method = 1

    if method == 0:
        Vchol = np.array([linalg.cholesky(V[i], lower=True)
                          for i in range(V.shape[0])])

        # it may be more efficient to use scipy.linalg.solve_triangular
        # with each cholesky decomposition
        VcholI = np.array([linalg.inv(Vchol[i])
                           for i in range(V.shape[0])])
        logdet = np.array([2 * np.sum(np.log(np.diagonal(Vchol[i])))
                           for i in range(V.shape[0])])

        VcholI = VcholI.reshape(Vshape)
        logdet = logdet.reshape(Vshape[:-2])

        VcIx = np.sum(VcholI * x_mu.reshape(x_mu.shape[:-1] + (1,)
                                            + x_mu.shape[-1:]), -1)
        xVIx = np.sum(VcIx ** 2, -1)

    elif method == 1:
        if Vinv is None:
            Vinv = np.array([linalg.inv(V[i])
                             for i in range(V.shape[0])]).reshape(Vshape)
        else:
            assert Vinv.shape == Vshape

        logdet = np.log(np.array([linalg.det(V[i])
                                  for i in range(V.shape[0])]))
        logdet = logdet.reshape(Vshape[:-2])

        xVI = np.sum(x_mu.reshape(x_mu.shape + (1,)) * Vinv, -2)
        xVIx = np.sum(xVI * x_mu, -1)

    else:
        raise ValueError("unrecognized method %s" % method)

    return -0.5 * ndim * np.log(2 * np.pi) - 0.5 * (logdet + xVIx)


@deprecated('1.0', alternative='sklearn.utils.check_random_state',
            warning_type=AstroMLDeprecationWarning)
def check_random_state(seed):
    return sk_check_random_state(seed)


def split_samples(X, y, fractions=[0.75, 0.25], random_state=None):
    """Split samples into training, test, and cross-validation sets

    Parameters
    ----------
    X, y : array_like
        leading dimension n_samples
    fractions : array_like
        length n_splits.  If the fractions do not add to 1, they will be
        re-normalized.
    random_state : None, int, or RandomState object
        random seed, or random number generator
    """
    X = np.asarray(X)
    y = np.asarray(y)

    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y should have the same leading dimension")

    n_samples = X.shape[0]

    fractions = np.asarray(fractions).ravel().cumsum()
    fractions /= fractions[-1]
    fractions *= n_samples
    N = np.concatenate([[0], fractions.astype(int)])
    N[-1] = n_samples  # in case of roundoff errors

    random_state = sk_check_random_state(random_state)
    indices = np.arange(len(y))
    random_state.shuffle(indices)

    X_divisions = tuple(X[indices[N[i]:N[i + 1]]]
                        for i in range(len(fractions)))
    y_divisions = tuple(y[indices[N[i]:N[i + 1]]]
                        for i in range(len(fractions)))

    return X_divisions, y_divisions


def completeness_contamination(predicted, true):
    """Compute the completeness and contamination values

    Parameters
    ----------
    predicted, true : array_like
        integer arrays of predicted and true values.  This assumes that
        'false' values are given by 0, and 'true' values are nonzero.

    Returns
    -------
    completeness, contamination : float or array_like
        the completeness and contamination of the results.  shape is
        np.broadcast(predicted, true).shape[:-1]
    """
    predicted = np.asarray(predicted)
    true = np.asarray(true)

    outshape = np.broadcast(predicted, true).shape[:-1]

    predicted = np.atleast_2d(predicted)
    true = np.atleast_2d(true)

    matches = (predicted == true)

    tp = np.sum(matches & (true != 0), -1)
    fp = np.sum(~matches & (true == 0), -1)
    fn = np.sum(~matches & (true != 0), -1)

    tot = (tp + fn)
    tot[tot == 0] = 1
    completeness = tp * 1. / tot

    tot = (tp + fp)
    tot[tot == 0] = 1
    contamination = fp * 1.
/ tot completeness[np.isnan(completeness)] = 0 contamination[np.isnan(contamination)] = 0 return completeness.reshape(outshape), contamination.reshape(outshape) def convert_2D_cov(*args): """Convert a 2D covariance from matrix form to principal form, and back if one parameter is passed, it is a covariance matrix, and the principal axes and rotation (sigma1, sigma2, alpha) are returned. if three parameters are passed, they are assumed to be (sigma1, sigma2, alpha) and the covariance is returned """ if len(args) == 1: C = np.asarray(args[0]) if C.shape != (2, 2): raise ValueError("Input not understood") sigma_x2 = C[0, 0] sigma_y2 = C[1, 1] sigma_xy = C[0, 1] alpha = 0.5 * np.arctan2(2 * sigma_xy, (sigma_x2 - sigma_y2)) tmp1 = 0.5 * (sigma_x2 + sigma_y2) tmp2 = np.sqrt(0.25 * (sigma_x2 - sigma_y2) ** 2 + sigma_xy ** 2) sigma1 = np.sqrt(tmp1 + tmp2) sigma2 = np.sqrt(tmp1 - tmp2) return (sigma1, sigma2, alpha) elif len(args) == 3: sigma1, sigma2, alpha = args s = np.sin(alpha) c = np.cos(alpha) sigma_x2 = (sigma1 * c) ** 2 + (sigma2 * s) ** 2 sigma_y2 = (sigma1 * s) ** 2 + (sigma2 * c) ** 2 sigma_xy = (sigma1 ** 2 - sigma2 ** 2) * s * c return np.array([[sigma_x2, sigma_xy], [sigma_xy, sigma_y2]]) else: raise ValueError("Input not understood") ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1643147665.409542 astroML-1.0.2/astroML.egg-info/0000755000076700000240000000000000000000000015505 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643147665.0 astroML-1.0.2/astroML.egg-info/PKG-INFO0000644000076700000240000001763500000000000016616 0ustar00bsipoczstaffMetadata-Version: 2.1 Name: astroML Version: 1.0.2 Summary: Tools for machine learning and data mining in Astronomy Home-page: http://astroML.github.com Author: Jake VanderPlas Author-email: vanderplas@astro.washington.edu Maintainer: Brigitta Sipocz Maintainer-email: bsipocz@gmail.com License: BSD 3-Clause License Description: .. -*- mode: rst -*- ======================================= AstroML: Machine Learning for Astronomy ======================================= .. image:: https://img.shields.io/badge/arXiv-1411.5039-orange.svg?style=flat :target: https://arxiv.org/abs/1411.5039 .. image:: https://img.shields.io/travis/astroML/astroML/master.svg?style=flat :target: https://travis-ci.org/astroML/astroML/ .. image:: https://img.shields.io/pypi/v/astroML.svg?style=flat :target: https://pypi.python.org/pypi/astroML .. image:: https://img.shields.io/pypi/dm/astroML.svg?style=flat :target: https://pypi.python.org/pypi/astroML .. image:: https://img.shields.io/badge/license-BSD-blue.svg?style=flat :target: https://github.com/astroml/astroml/blob/main/LICENSE.rst AstroML is a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, and matplotlib, and distributed under the BSD license. It contains a growing library of statistical and machine learning routines for analyzing astronomical data in python, loaders for several open astronomical datasets, and a large suite of examples of analyzing and visualizing astronomical datasets. This project was started in 2012 by Jake VanderPlas to accompany the book *Statistics, Data Mining, and Machine Learning in Astronomy* by Zeljko Ivezic, Andrew Connolly, Jacob VanderPlas, and Alex Gray. 
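A minimal sketch of the kind of workflow astroML supports
(``fetch_sdss_sspp`` is just one of the loaders in ``astroML.datasets``;
see the documentation for complete, runnable examples)::

    from astroML.datasets import fetch_sdss_sspp
    data = fetch_sdss_sspp()  # downloaded and cached locally on first call
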
Important Links
===============

- HTML documentation: https://www.astroML.org
- Core source-code repository: https://github.com/astroML/astroML
- Figure source-code repository: https://github.com/astroML/astroML-figures
- Issue Tracker: https://github.com/astroML/astroML/issues
- Mailing List: https://groups.google.com/forum/#!forum/astroml-general


Installation
============

**Before installation, make sure your system meets the prerequisites
listed in Dependencies, listed below.**

Core
----

To install the core ``astroML`` package in your home directory, use::

    pip install astroML

A conda package for astroML is also available either on the conda-forge
or on the astropy conda channels::

    conda install -c astropy astroML

The core package is pure python, so installation should be straightforward
on most systems.  To install from source, use::

    python setup.py install

You can specify an arbitrary directory for installation using::

    python setup.py install --prefix='/some/path'

To install system-wide on Linux/Unix systems::

    python setup.py build
    sudo python setup.py install


Dependencies
============

There are two levels of dependencies in astroML.  *Core* dependencies are
required for the core ``astroML`` package.  *Optional* dependencies are
required to run some (but not all) of the example scripts.  Individual
example scripts will list their optional dependencies at the top of the
file.

Core Dependencies
-----------------

The core ``astroML`` package requires the following (some of the
functionality might work with older versions):

- Python_ version 3.6+
- Numpy_ >= 1.13
- Scipy_ >= 0.18
- Scikit-learn_ >= 0.18
- Matplotlib_ >= 3.0
- AstroPy_ >= 3.0

Optional Dependencies
---------------------

Several of the example scripts require specialized or upgraded packages.
These requirements are listed at the top of the particular scripts:

- HEALPy_ provides an interface to the HEALPix pixelization scheme, as
  well as fast spherical harmonic transforms.


Development
===========

This package is designed to be a repository for well-written astronomy
code, and submissions of new routines are encouraged.  After installing
the version-control system Git_, you can check out the latest sources
from GitHub_ using::

    git clone git://github.com/astroML/astroML.git

or if you have write privileges::

    git clone git@github.com:astroML/astroML.git

Contribution
------------

We strongly encourage contributions of useful astronomy-related code:
for `astroML` to be a relevant tool for the python/astronomy community,
it will need to grow with the field of research.  There are a few
guidelines for contribution:

General
~~~~~~~

Any contribution should be done through the github pull request system
(for more information, see the
`help page <https://help.github.com/articles/using-pull-requests>`_).

Code submitted to ``astroML`` should conform to a BSD-style license, and
follow the `PEP8 style guide <https://www.python.org/dev/peps/pep-0008/>`_.

Documentation and Examples
~~~~~~~~~~~~~~~~~~~~~~~~~~

All submitted code should be documented following the
`Numpy Documentation Guide`_.  This is a unified documentation style used
by many packages in the scipy universe.

In addition, it is highly recommended to create example scripts that show
the usefulness of the method on an astronomical dataset (preferably making
use of the loaders in ``astroML.datasets``).  These example scripts are in
the ``examples`` subdirectory of the main source repository.

..
_Numpy Documentation Guide: https://numpydoc.readthedocs.io/en/latest/format.html Authors ======= Package Author -------------- * Jake Vanderplas https://github.com/jakevdp http://jakevdp.github.com Maintainer ---------- * Brigitta Sipocz https://github.com/bsipocz Code Contribution ----------------- * Morgan Fouesneau https://github.com/mfouesneau * Julian Taylor http://github.com/juliantaylor .. _Python: https://www.python.org .. _Numpy: https://www.numpy.org .. _Scipy: https://www.scipy.org .. _Scikit-learn: https://scikit-learn.org .. _Matplotlib: https://matplotlib.org .. _AstroPy: http://www.astropy.org/ .. _HEALPy: https://github.com/healpy/healpy .. _Git: https://git-scm.com/ .. _GitHub: https://www.github.com Keywords: astronomy,astrophysics,cosmology,space,science,modeling,models,fitting,machine-learning Platform: UNKNOWN Classifier: Development Status :: 4 - Beta Classifier: Environment :: Console Classifier: Intended Audience :: Science/Research Classifier: License :: OSI Approved :: BSD License Classifier: Natural Language :: English Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: 3 :: Only Classifier: Topic :: Scientific/Engineering :: Astronomy Provides-Extra: test Provides-Extra: all Provides-Extra: codestyle Provides-Extra: docs ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643147665.0 astroML-1.0.2/astroML.egg-info/SOURCES.txt0000644000076700000240000001205400000000000017373 0ustar00bsipoczstaffCHANGES.rst LICENSE.rst MANIFEST.in README.rst pyproject.toml setup.cfg setup.py astroML/__init__.py astroML/conftest.py astroML/correlation.py astroML/cosmology.py astroML/crossmatch.py astroML/decorators.py astroML/filters.py astroML/fourier.py astroML/lumfunc.py astroML/resample.py astroML/sum_of_norms.py astroML.egg-info/PKG-INFO astroML.egg-info/SOURCES.txt astroML.egg-info/dependency_links.txt astroML.egg-info/requires.txt astroML.egg-info/top_level.txt astroML/classification/__init__.py astroML/classification/gmm_bayes.py astroML/classification/tests/__init__.py astroML/classification/tests/test_gmm_bayes.py astroML/clustering/__init__.py astroML/clustering/mst_clustering.py astroML/clustering/tests/__init__.py astroML/clustering/tests/test_MST_clustering.py astroML/datasets/LIGO_bigdog.py astroML/datasets/LINEAR_sample.py astroML/datasets/__init__.py astroML/datasets/dr7_quasar.py astroML/datasets/generated.py astroML/datasets/hogg2010test.py astroML/datasets/imaging_sample.py astroML/datasets/kelly2007test.py astroML/datasets/moving_objects.py astroML/datasets/nasa_atlas.py astroML/datasets/rrlyrae_mags.py astroML/datasets/rrlyrae_templates.py astroML/datasets/sdss_S82standards.py astroML/datasets/sdss_corrected_spectra.py astroML/datasets/sdss_filters.py astroML/datasets/sdss_galaxy_colors.py astroML/datasets/sdss_galaxy_images.py astroML/datasets/sdss_specgals.py astroML/datasets/sdss_spectrum.py astroML/datasets/sdss_sspp.py astroML/datasets/wmap_temperatures.py astroML/datasets/tests/__init__.py astroML/datasets/tests/test_datasets.py astroML/datasets/tools/__init__.py astroML/datasets/tools/cas_query.py astroML/datasets/tools/download.py astroML/datasets/tools/sdss_fits.py astroML/datasets/tools/sql_query.py astroML/density_estimation/__init__.py astroML/density_estimation/bayesian_blocks.py 
astroML/density_estimation/density_estimation.py astroML/density_estimation/empirical.py astroML/density_estimation/gauss_mixture.py astroML/density_estimation/histtools.py astroML/density_estimation/xdeconv.py astroML/density_estimation/tests/__init__.py astroML/density_estimation/tests/test_bayesian_blocks.py astroML/density_estimation/tests/test_density.py astroML/density_estimation/tests/test_empirical.py astroML/density_estimation/tests/test_hist_binwidth.py astroML/density_estimation/tests/test_xdeconv.py astroML/dimensionality/__init__.py astroML/dimensionality/iterative_pca.py astroML/dimensionality/tests/__init__.py astroML/dimensionality/tests/test_iterative_PCA.py astroML/linear_model/TLS.py astroML/linear_model/__init__.py astroML/linear_model/kernel_regression.py astroML/linear_model/linear_regression.py astroML/linear_model/linear_regression_errors.py astroML/linear_model/tests/__init__.py astroML/linear_model/tests/test_TLS.py astroML/linear_model/tests/test_kernel_regression.py astroML/linear_model/tests/test_linear_regression.py astroML/plotting/__init__.py astroML/plotting/ellipse.py astroML/plotting/hist_tools.py astroML/plotting/mcmc.py astroML/plotting/multiaxes.py astroML/plotting/regression.py astroML/plotting/scatter_contour.py astroML/plotting/settings.py astroML/plotting/tools.py astroML/plotting/tests/__init__.py astroML/plotting/tests/test_devectorize.py astroML/stats/__init__.py astroML/stats/_binned_statistic.py astroML/stats/_point_statistics.py astroML/stats/random.py astroML/stats/tests/__init__.py astroML/stats/tests/test_binned_statistic.py astroML/stats/tests/test_stats.py astroML/tests/__init__.py astroML/tests/test_correlation.py astroML/tests/test_filters.py astroML/tests/test_fourier.py astroML/tests/test_lumfunc.py astroML/tests/test_resample.py astroML/time_series/ACF.py astroML/time_series/__init__.py astroML/time_series/generate.py astroML/time_series/periodogram.py astroML/time_series/tests/__init__.py astroML/time_series/tests/test_generate.py astroML/time_series/tests/test_periodogram.py astroML/utils/__init__.py astroML/utils/decorators.py astroML/utils/exceptions.py astroML/utils/utils.py astroML/utils/tests/__init__.py astroML/utils/tests/test_pickle_results.py astroML/utils/tests/test_utils.py examples/README.rst examples/algorithms/README.rst examples/algorithms/fig_volume_ratio.py examples/algorithms/plot_bayesian_blocks.py examples/algorithms/plot_crossmatch.py examples/algorithms/plot_spectrum_sum_of_norms.py examples/datasets/README.rst examples/datasets/compute_sdss_pca.py examples/datasets/plot_LIGO_spectrum.py examples/datasets/plot_SDSS_SSPP.py examples/datasets/plot_corrected_spectra.py examples/datasets/plot_dr7_quasar.py examples/datasets/plot_great_wall.py examples/datasets/plot_moving_objects.py examples/datasets/plot_nasa_atlas.py examples/datasets/plot_rrlyrae_mags.py examples/datasets/plot_sdss_S82standards.py examples/datasets/plot_sdss_filters.py examples/datasets/plot_sdss_galaxy_colors.py examples/datasets/plot_sdss_imaging.py examples/datasets/plot_sdss_line_ratios.py examples/datasets/plot_sdss_specgals.py examples/datasets/plot_sdss_spectrum.py examples/datasets/plot_wmap_power_spectra.py examples/datasets/plot_wmap_raw.py examples/learning/README.rst examples/learning/plot_neighbors_photoz.py././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643147665.0 astroML-1.0.2/astroML.egg-info/dependency_links.txt0000644000076700000240000000000100000000000021553 0ustar00bsipoczstaff 
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643147665.0 astroML-1.0.2/astroML.egg-info/requires.txt0000644000076700000240000000032100000000000020101 0ustar00bsipoczstaffscikit-learn>=0.18 numpy>=1.13 scipy>=0.18 matplotlib>=3.0 astropy>=3.0 [all] pymc3<3.11,>=3.7 [codestyle] flake8 [docs] sphinx [test] pytest-doctestplus pytest-astropy-header pytest-remotedata pytest-cov ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1643147665.0 astroML-1.0.2/astroML.egg-info/top_level.txt0000644000076700000240000000001000000000000020226 0ustar00bsipoczstaffastroML ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4399118 astroML-1.0.2/examples/0000755000076700000240000000000000000000000014250 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1641327599.0 astroML-1.0.2/examples/README.rst0000644000076700000240000000036600000000000015744 0ustar00bsipoczstaffGeneral astroML Examples ------------------------ This section contains several example plots which do not appear in the textbook. Currently there are only a few examples here: for more, please see the :ref:`text book figures `. ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4414852 astroML-1.0.2/examples/algorithms/0000755000076700000240000000000000000000000016421 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/algorithms/README.rst0000644000076700000240000000040300000000000020105 0ustar00bsipoczstaffData Processing Algorithms -------------------------- These figures and examples show some of the data processing and algorithmic tools enabled by astroML and other Python packages. For more examples, see the :ref:`figures ` from the textbook. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/examples/algorithms/fig_volume_ratio.py0000644000076700000240000000346400000000000022334 0ustar00bsipoczstaff""" Curse of Dimensionality: Volume Ratio ------------------------------------- This figure shows the ratio of the volume of a unit hypercube to the volume of an inscribed hypersphere. The curse of dimensionality is illustrated in the fact that this ratio approaches zero as the number of dimensions approaches infinity. """ # Author: Jake VanderPlas # License: BSD # The figure produced by this code is published in the textbook # "Statistics, Data Mining, and Machine Learning in Astronomy" (2013) # For more information, see http://astroML.github.com # To report a bug or issue, use the following forum: # https://groups.google.com/forum/#!forum/astroml-general import numpy as np from matplotlib import pyplot as plt from scipy.special import gammaln #---------------------------------------------------------------------- # This function adjusts matplotlib settings for a uniform feel in the textbook. # Note that with usetex=True, fonts are rendered with LaTeX. This may # result in an error if LaTeX is not installed on your system. In that case, # you can set usetex to False. 
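# For example, on a system without LaTeX the call below could be replaced
# with setup_text_plots(fontsize=8, usetex=False), which falls back to
# matplotlib's built-in mathtext rendering.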
from astroML.plotting import setup_text_plots setup_text_plots(fontsize=8, usetex=True) dims = np.arange(1, 51) # log of volume of a sphere with r = 1 log_V_sphere = (np.log(2) + 0.5 * dims * np.log(np.pi) - np.log(dims) - gammaln(0.5 * dims)) log_V_cube = dims * np.log(2) # compute the log of f_k to avoid overflow errors log_f_k = log_V_sphere - log_V_cube fig, ax = plt.subplots(figsize=(5, 3.75)) ax.semilogy(dims, np.exp(log_V_cube), '-k', label='side-2 hypercube') ax.semilogy(dims, np.exp(log_V_sphere), '--k', label='inscribed unit hypersphere') ax.set_xlim(1, 50) ax.set_ylim(1E-13, 1E15) ax.set_xlabel('Number of Dimensions') ax.set_ylabel('Hyper-Volume') ax.legend(loc=3) plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/examples/algorithms/plot_bayesian_blocks.py0000644000076700000240000000523300000000000023164 0ustar00bsipoczstaff""" Bayesian Blocks for Histograms ------------------------------ .. currentmodule:: astroML Bayesian Blocks is a dynamic histogramming method which optimizes one of several possible fitness functions to determine an optimal binning for data, where the bins are not necessarily uniform width. The astroML implementation is based on [1]_. For more discussion of this technique, see the blog post at [2]_. The code below uses a fitness function suitable for event data with possible repeats. More fitness functions are available: see :mod:`density_estimation` References ~~~~~~~~~~ .. [1] Scargle, J `et al.` (2012) http://adsabs.harvard.edu/abs/2012arXiv1207.5578S .. [2] http://jakevdp.github.com/blog/2012/09/12/dynamic-programming-in-python/ """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from scipy import stats from matplotlib import pyplot as plt from astropy.visualization import hist # draw a set of variables np.random.seed(0) t = np.concatenate([stats.cauchy(-5, 1.8).rvs(500), stats.cauchy(-4, 0.8).rvs(2000), stats.cauchy(-1, 0.3).rvs(500), stats.cauchy(2, 0.8).rvs(1000), stats.cauchy(4, 1.5).rvs(500)]) # truncate values to a reasonable range t = t[(t > -15) & (t < 15)] #------------------------------------------------------------ # First figure: show normal histogram binning fig = plt.figure(figsize=(10, 4)) fig.subplots_adjust(left=0.1, right=0.95, bottom=0.15) ax1 = fig.add_subplot(121) ax1.hist(t, bins=15, histtype='stepfilled', alpha=0.2, density=True) ax1.set_xlabel('t') ax1.set_ylabel('P(t)') ax2 = fig.add_subplot(122) ax2.hist(t, bins=200, histtype='stepfilled', alpha=0.2, density=True) ax2.set_xlabel('t') ax2.set_ylabel('P(t)') #------------------------------------------------------------ # Second & Third figure: Knuth bins & Bayesian Blocks fig = plt.figure(figsize=(10, 4)) fig.subplots_adjust(left=0.1, right=0.95, bottom=0.15) for bins, title, subplot in zip(['knuth', 'blocks'], ["Knuth's rule", 'Bayesian blocks'], [121, 122]): ax = fig.add_subplot(subplot) # plot a standard histogram in the background, with alpha transparency hist(t, bins=200, histtype='stepfilled', alpha=0.2, density=True, label='standard histogram') # plot an adaptive-width histogram on top hist(t, bins=bins, ax=ax, color='black', histtype='step', density=True, label=title) ax.legend(prop=dict(size=12)) ax.set_xlabel('t') ax.set_ylabel('P(t)') plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 
astroML-1.0.2/examples/algorithms/plot_crossmatch.py0000644000076700000240000000277600000000000022213 0ustar00bsipoczstaff""" Catalog cross-matching ---------------------- This plots the cross-matched samples between the SDSS imaging data and the SDSS Stripe 82 standard stars. """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import os import sys from time import time import numpy as np from matplotlib import pyplot as plt from astropy.visualization import hist from astroML.datasets import fetch_imaging_sample, fetch_sdss_S82standards from astroML.crossmatch import crossmatch_angular # get imaging data image_data = fetch_imaging_sample() imX = np.empty((len(image_data), 2), dtype=np.float64) imX[:, 0] = image_data['ra'] imX[:, 1] = image_data['dec'] # get standard stars standards_data = fetch_sdss_S82standards() stX = np.empty((len(standards_data), 2), dtype=np.float64) stX[:, 0] = standards_data['RA'] stX[:, 1] = standards_data['DEC'] # crossmatch catalogs max_radius = 1. / 3600 # 1 arcsec dist, ind = crossmatch_angular(imX, stX, max_radius) match = ~np.isinf(dist) dist_match = dist[match] dist_match *= 3600 ax = plt.axes() hist(dist_match, bins='knuth', ax=ax, histtype='stepfilled', ec='k', fc='#AAAAAA') ax.set_xlabel('radius of match (arcsec)') ax.set_ylabel('N(r, r+dr)') ax.text(0.95, 0.95, "Total objects: %i\nNumber with match: %i" % (imX.shape[0], np.sum(match)), ha='right', va='top', transform=ax.transAxes) ax.set_xlim(0, 0.2) plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/algorithms/plot_spectrum_sum_of_norms.py0000644000076700000240000000250100000000000024457 0ustar00bsipoczstaff""" Linear Sum of Gaussians ----------------------- Fitting a spectrum with a linear sum of gaussians. """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com from matplotlib import pyplot as plt from astroML.datasets import fetch_vega_spectrum from astroML.sum_of_norms import sum_of_norms, norm # Fetch the data x, y = fetch_vega_spectrum() # truncate the spectrum mask = (x >= 2000) & (x < 10000) x = x[mask] y = y[mask] for n_gaussians in (10, 50, 100): # compute the best-fit linear combination w_best, rms, locs, widths = sum_of_norms(x, y, n_gaussians, spacing='linear', full_output=True) norms = w_best * norm(x[:, None], locs, widths) # plot the results plt.figure() plt.plot(x, y, '-k', label='input spectrum') ylim = plt.ylim() plt.plot(x, norms, ls='-', c='#FFAAAA') plt.plot(x, norms.sum(1), '-r', label='sum of gaussians') plt.ylim(-0.1 * ylim[1], ylim[1]) plt.legend(loc=0) plt.text(0.97, 0.8, "rms error = %.2g" % rms, ha='right', va='top', transform=plt.gca().transAxes) plt.title("Fit to a Spectrum with a Sum of %i Gaussians" % n_gaussians) plt.show() ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4469252 astroML-1.0.2/examples/datasets/0000755000076700000240000000000000000000000016060 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/datasets/README.rst0000644000076700000240000000135100000000000017547 0ustar00bsipoczstaffData set Examples ----------------- These plots show some of the data set loaders available in astroML, and some of the ways that astronomical data can be visualized and processed using open source python tools. 
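As a minimal sketch of the pattern described in more detail below
(``fetch_sdss_S82standards`` is just one of the available loaders; the
exact fields returned depend on the dataset)::

    from astroML.datasets import fetch_sdss_S82standards

    data = fetch_sdss_S82standards()  # downloads on first call, cached after
    print(data.dtype.names)           # inspect the available columns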
The dataset loaders are in the submodule :mod:`astroML.datasets`, and start with the word ``fetch_``. The first time a dataset loader is called, it will attempt to download the dataset from the web and store it locally on disk. The default location is ``~/astroML_data``, but this location can be changed by specifying an alternative directory in the ``ASTROML_DATA`` environment variable. On subsequent calls, the cached version of the data is used. For more examples, see the :ref:`figures ` from the textbook. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/examples/datasets/compute_sdss_pca.py0000644000076700000240000001070600000000000021771 0ustar00bsipoczstaff""" Example of downloading and processing SDSS spectra -------------------------------------------------- This is the code used to create the files fetched by the routine :func:`fetch_sdss_corrected_spectra`. Be aware that this routine downloads a large amount of data (~700MB for 4000 spectra) and takes a long time to run (~30 minutes for 4000 spectra). """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import sys from urllib.error import HTTPError import numpy as np from astroML.datasets import fetch_sdss_spectrum from astroML.datasets.tools import query_plate_mjd_fiber, TARGET_GALAXY from astroML.dimensionality import iterative_pca def fetch_and_shift_spectra(n_spectra, outfile, primtarget=TARGET_GALAXY, zlim=(0, 0.7), loglam_start=3.5, loglam_end=3.9, Nlam=1000): """ This function queries CAS for matching spectra, and then downloads them and shifts them to a common redshift binning """ # First query for the list of spectra to download plate, mjd, fiber = query_plate_mjd_fiber(n_spectra, primtarget, zlim[0], zlim[1]) # Set up arrays to hold information gathered from the spectra spec_cln = np.zeros(n_spectra, dtype=np.int32) lineindex_cln = np.zeros(n_spectra, dtype=np.int32) log_NII_Ha = np.zeros(n_spectra, dtype=np.float32) log_OIII_Hb = np.zeros(n_spectra, dtype=np.float32) z = np.zeros(n_spectra, dtype=np.float32) zerr = np.zeros(n_spectra, dtype=np.float32) spectra = np.zeros((n_spectra, Nlam), dtype=np.float32) mask = np.zeros((n_spectra, Nlam), dtype=bool) # Calculate new wavelength coefficients new_coeff0 = loglam_start new_coeff1 = (loglam_end - loglam_start) / Nlam # Now download all the needed spectra, and resample to a common # wavelength bin. 
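    # (plate, mjd and fiber are the parallel arrays returned by
    # query_plate_mjd_fiber above; together they uniquely identify each
    # SDSS spectrum to be fetched.)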
n_spectra = len(plate) num_skipped = 0 i = 0 while i < n_spectra: sys.stdout.write(' %i / %i spectra\r' % (i + 1, n_spectra)) sys.stdout.flush() try: spec = fetch_sdss_spectrum(plate[i], mjd[i], fiber[i]) except HTTPError: num_skipped += 1 print("%i, %i, %i not found" % (plate[i], mjd[i], fiber[i])) i += 1 continue spec_rebin = spec.restframe().rebin(new_coeff0, new_coeff1, Nlam) if np.all(spec_rebin.spectrum == 0): num_skipped += 1 print("%i, %i, %i is all zero" % (plate[i], mjd[i], fiber[i])) i += 1 continue spec_cln[i] = spec.spec_cln lineindex_cln[i], (log_NII_Ha[i], log_OIII_Hb[i])\ = spec.lineratio_index() z[i] = spec.z zerr[i] = spec.zerr spectra[i] = spec_rebin.spectrum mask[i] = spec_rebin.compute_mask(0.5, 5) i += 1 sys.stdout.write('\n') N = i print(" %i spectra skipped" % num_skipped) print(" %i spectra processed" % N) print("saving to %s" % outfile) np.savez(outfile, spectra=spectra[:N], mask=mask[:N], coeff0=new_coeff0, coeff1=new_coeff1, spec_cln=spec_cln[:N], lineindex_cln=lineindex_cln[:N], log_NII_Ha=log_NII_Ha[:N], log_OIII_Hb=log_OIII_Hb[:N], z=z[:N], zerr=zerr[:N]) def spec_iterative_pca(outfile, n_ev=10, n_iter=20, norm='L2'): """ This function takes the file produced above, performs an iterative PCA to fill in the gaps, and appends the results to the same file. """ data_in = np.load(outfile) spectra = data_in['spectra'] mask = data_in['mask'] res = iterative_pca(spectra, mask, n_ev=n_ev, n_iter=n_iter, norm=norm, full_output=True) input_dict = {key: data_in[key] for key in data_in.files} # don't save the reconstructed spectrum: this can easily # be recomputed from the other parameters. input_dict['mu'] = res[1] input_dict['evecs'] = res[2] input_dict['evals'] = res[3] input_dict['norms'] = res[4] input_dict['coeffs'] = res[5] np.savez(outfile, **input_dict) if __name__ == '__main__': fetch_and_shift_spectra(4000, 'spec4000.npz') spec_iterative_pca('spec4000.npz') ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/examples/datasets/plot_LIGO_spectrum.py0000644000076700000240000000472200000000000022151 0ustar00bsipoczstaff""" Plot the power spectrum of LIGO ------------------------------- This compares the power spectrum computed using the raw FFT with that computed using Welch's method (i.e. overlapping window functions that reduce noise). The top panel shows the raw signal, which consists of measurements of the change in baseline length. The bottom panel shows the raw and smoothed power spectrum, used by the LIGO team to characterize the noise of the detector. The particular data used here is the injected `Big Dog `_ event. """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from matplotlib import pyplot as plt from scipy import fftpack from matplotlib import mlab from astroML.datasets import fetch_LIGO_large #------------------------------------------------------------ # Fetch the LIGO Hanford data data, dt = fetch_LIGO_large() # subset of the data to plot t0 = 646 T = 2 tplot = dt * np.arange(T * 4096) dplot = data[4096 * t0: 4096 * (t0 + T)] tplot = tplot[::10] dplot = dplot[::10] fmin = 40 fmax = 2060 #------------------------------------------------------------ # compute PSD using simple FFT N = len(data) df = 1.
/ (N * dt) PSD = abs(dt * fftpack.fft(data)[:N // 2]) ** 2 f = df * np.arange(N // 2) cutoff = ((f >= fmin) & (f <= fmax)) f = f[cutoff] PSD = PSD[cutoff] f = f[::100] PSD = PSD[::100] #------------------------------------------------------------ # compute PSD using Welch's method -- hanning window function PSDW2, fW2 = mlab.psd(data, NFFT=4096, Fs=1. / dt, window=mlab.window_hanning, noverlap=2048) dfW2 = fW2[1] - fW2[0] cutoff = (fW2 >= fmin) & (fW2 <= fmax) fW2 = fW2[cutoff] PSDW2 = PSDW2[cutoff] #------------------------------------------------------------ # Plot the data fig = plt.figure() fig.subplots_adjust(bottom=0.1, top=0.9, hspace=0.3) # top panel: time series ax = fig.add_subplot(211) ax.plot(tplot, dplot, '-k') ax.set_xlabel('time (s)') ax.set_ylabel('$h(t)$') ax.set_ylim(-1.2E-18, 1.2E-18) # bottom panel: power spectra (raw and Hanning-windowed) ax = fig.add_subplot(212) ax.loglog(f, PSD, '-', c='#AAAAAA') ax.loglog(fW2, PSDW2, '-k') ax.text(0.98, 0.95, "Hanning (cosine) window", ha='right', va='top', transform=ax.transAxes) ax.set_xlabel('frequency (Hz)') ax.set_ylabel(r'$PSD(f)$') ax.set_xlim(40, 2060) ax.set_ylim(1E-46, 1E-36) ax.yaxis.set_major_locator(plt.LogLocator(base=100)) plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/datasets/plot_SDSS_SSPP.py0000644000076700000240000000575100000000000021115 0ustar00bsipoczstaff""" Stellar Parameters Hess Diagram ------------------------------- This example shows how to create Hess diagrams of the Segue Stellar Parameters Pipeline (SSPP) data to show multiple features on a single plot. The left panel shows the density of the points on the plot. The right panel shows the average metallicity in each pixel, with contours reflecting the density shown in the left plot. """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from matplotlib import pyplot as plt #------------------------------------------------------------ # Get SDSS SSPP data from astroML.datasets import fetch_sdss_sspp data = fetch_sdss_sspp() # do some reasonable magnitude cuts rpsf = data['rpsf'] data = data[(rpsf > 15) & (rpsf < 19)] # get the desired data logg = data['logg'] Teff = data['Teff'] FeH = data['FeH'] #------------------------------------------------------------ # Plot the results using the binned_statistic_2d function from astroML.stats import binned_statistic_2d N, xedges, yedges = binned_statistic_2d(Teff, logg, FeH, 'count', bins=100) FeH_mean, xedges, yedges = binned_statistic_2d(Teff, logg, FeH, 'mean', bins=100) # Define custom colormaps: Set pixels with no sources to white cmap = plt.cm.jet cmap.set_bad('w', 1.) cmap_multicolor = plt.cm.jet cmap_multicolor.set_bad('w', 1.)
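# Note: binned_statistic_2d generalizes np.histogram2d: statistic='count'
# returns the number of stars per pixel, while statistic='mean' returns the
# average [Fe/H] of the stars in each pixel. Empty pixels yield NaN for the
# 'mean' statistic, which the set_bad() calls above render as white.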
# Create figure and subplots fig = plt.figure(figsize=(8, 4)) fig.subplots_adjust(wspace=0.25, left=0.1, right=0.95, bottom=0.07, top=0.95) #-------------------- # First axes: plt.subplot(121, xticks=[4000, 5000, 6000, 7000, 8000]) plt.imshow(np.log10(N.T), origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]], aspect='auto', interpolation='nearest', cmap=cmap) plt.xlim(xedges[-1], xedges[0]) plt.ylim(yedges[-1], yedges[0]) plt.xlabel(r'$\mathrm{T_{eff}}$') plt.ylabel(r'$\mathrm{log(g)}$') cb = plt.colorbar(ticks=[0, 1, 2, 3], format=r'$10^{%i}$', orientation='horizontal') cb.set_label(r'$\mathrm{number\ in\ pixel}$') plt.clim(0, 3) #-------------------- # Second axes: plt.subplot(122, xticks=[4000, 5000, 6000, 7000, 8000]) plt.imshow(FeH_mean.T, origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]], aspect='auto', interpolation='nearest', cmap=cmap_multicolor) plt.xlim(xedges[-1], xedges[0]) plt.ylim(yedges[-1], yedges[0]) plt.xlabel(r'$\mathrm{T_{eff}}$') plt.ylabel(r'$\mathrm{log(g)}$') cb = plt.colorbar(ticks=np.arange(-2.5, 1, 0.5), format=r'$%.1f$', orientation='horizontal') cb.set_label(r'$\mathrm{mean\ [Fe/H]\ in\ pixel}$') plt.clim(-2.5, 0.5) # Draw density contours over the colors levels = np.linspace(0, np.log10(N.max()), 7)[2:] plt.contour(np.log10(N.T), levels, colors='k', linewidths=1, extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]]) plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/examples/datasets/plot_corrected_spectra.py0000644000076700000240000000224600000000000023167 0ustar00bsipoczstaff""" Corrected Spectra ----------------- The script examples/datasets/compute_sdss_pca.py uses an iterative PCA technique to reconstruct masked regions of SDSS spectra. Several of the resulting spectra are shown below. """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np import matplotlib.pyplot as plt from astroML.datasets import sdss_corrected_spectra #------------------------------------------------------------ # Fetch the data data = sdss_corrected_spectra.fetch_sdss_corrected_spectra() spectra = sdss_corrected_spectra.reconstruct_spectra(data) lam = sdss_corrected_spectra.compute_wavelengths(data) #------------------------------------------------------------ # Plot several spectra fig = plt.figure(figsize=(8, 8)) fig.subplots_adjust(hspace=0) for i in range(5): ax = fig.add_subplot(511 + i) ax.plot(lam, spectra[i], '-k') if i < 4: ax.xaxis.set_major_formatter(plt.NullFormatter()) else: ax.set_xlabel(r'wavelength $(\AA)$') ax.yaxis.set_major_formatter(plt.NullFormatter()) ax.set_ylabel('flux') plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/datasets/plot_dr7_quasar.py0000644000076700000240000000230000000000000021533 0ustar00bsipoczstaff""" SDSS Data Release 7 Quasar catalog ---------------------------------- This demonstrates how to fetch and visualize the colors from the SDSS DR7 quasar sample.
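The five colors shown are adjacent-band magnitude differences (u-g, g-r, r-i, i-z, z-J), plotted as a grid of pairwise density panels using :class:`astroML.plotting.MultiAxes`.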
""" # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from matplotlib import pyplot as plt from astroML.plotting import MultiAxes from astroML.datasets import fetch_dr7_quasar data = fetch_dr7_quasar() colors = np.empty((len(data), 5)) colors[:, 0] = data['mag_u'] - data['mag_g'] colors[:, 1] = data['mag_g'] - data['mag_r'] colors[:, 2] = data['mag_r'] - data['mag_i'] colors[:, 3] = data['mag_i'] - data['mag_z'] colors[:, 4] = data['mag_z'] - data['mag_J'] labels = ['u-g', 'g-r', 'r-i', 'i-z', 'z-J'] bins = [np.linspace(-0.4, 1.0, 100), np.linspace(-0.4, 1.0, 100), np.linspace(-0.3, 0.6, 100), np.linspace(-0.4, 0.7, 100), np.linspace(0, 2.2, 100)] ax = MultiAxes(5, wspace=0.05, hspace=0.05, fig=plt.figure(figsize=(10, 10))) ax.density(colors, bins) ax.set_labels(labels) ax.set_locators(plt.MaxNLocator(5)) plt.suptitle('SDSS DR7 Quasar Colors', fontsize=18) plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/examples/datasets/plot_great_wall.py0000644000076700000240000000205700000000000021615 0ustar00bsipoczstaff""" SDSS "Great Wall" ----------------- Plotting the SDSS "great wall", a filament of galaxies visible by-eye in the projected locations of the SDSS spectroscopic galaxy sample. This follows a similar procedure to [1]_, References ---------- .. [1] http://adsabs.harvard.edu/abs/2008ApJ...674L..13C """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from matplotlib import pyplot as plt from astroML.datasets import fetch_great_wall from astroML.density_estimation import KNeighborsDensity #------------------------------------------------------------ # Fetch the great wall data X = fetch_great_wall() #------------------------------------------------------------ # Plot the results fig = plt.figure(figsize=(8, 4)) # First plot: scatter the points ax = plt.subplot(111, aspect='equal') ax.scatter(X[:, 1], X[:, 0], s=1, lw=0, c='k') ax.set_xlim(-300, 200) ax.set_ylim(-375, -175) ax.set_xlabel('y (Mpc)') ax.set_ylabel('x (MPC)') plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/examples/datasets/plot_moving_objects.py0000644000076700000240000000721300000000000022503 0ustar00bsipoczstaff""" SDSS Moving Object Catalog -------------------------- This plot demonstrates how to fetch data from the SDSS Moving object catalog, and plot using a multicolor plot similar to that used in figures 3-4 of [1]_ References ~~~~~~~~~~ .. 
[1] Parker `et al.` 2008 http://adsabs.harvard.edu/abs/2008Icar..198..138P """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np import matplotlib from matplotlib import pyplot as plt from astroML.datasets import fetch_moving_objects from astroML.plotting.tools import devectorize_axes def black_bg_subplot(*args, **kwargs): """Create a subplot with black background""" if int(matplotlib.__version__[0]) >= 2: kwargs['facecolor'] = 'k' else: kwargs['axisbg'] = 'k' ax = plt.subplot(*args, **kwargs) # set ticks and labels to white for spine in ax.spines.values(): spine.set_color('w') for tick in ax.xaxis.get_major_ticks() + ax.yaxis.get_major_ticks(): for child in tick.get_children(): child.set_color('w') return ax def compute_color(mag_a, mag_i, mag_z, a_crit=-0.1): """ Compute the scatter-plot color using code adapted from TCL source used in Parker 2008. """ # define the base color scalings R = np.ones_like(mag_i) G = 0.5 * 10 ** (-2 * (mag_i - mag_z - 0.01)) B = 1.5 * 10 ** (-8 * (mag_a + 0.0)) # enhance green beyond the a_crit cutoff i = np.where(mag_a < a_crit) G[i] += 10000 * (10 ** (-0.01 * (mag_a[i] - a_crit)) - 1) # normalize color of each point to its maximum component RGB = np.vstack([R, G, B]) RGB /= RGB.max(0) # return an array of RGB colors, which is shape (n_points, 3) return RGB.T #------------------------------------------------------------ # Fetch data and extract the desired quantities data = fetch_moving_objects(Parker2008_cuts=True) mag_a = data['mag_a'] mag_i = data['mag_i'] mag_z = data['mag_z'] a = data['aprime'] sini = data['sin_iprime'] # dither: magnitudes are recorded only to +/- 0.01 mag_a += -0.005 + 0.01 * np.random.random(size=mag_a.shape) mag_i += -0.005 + 0.01 * np.random.random(size=mag_i.shape) mag_z += -0.005 + 0.01 * np.random.random(size=mag_z.shape) # compute RGB color based on magnitudes color = compute_color(mag_a, mag_i, mag_z) #------------------------------------------------------------ # set up the plot # plot the color-magnitude plot fig = plt.figure(facecolor='k') ax = black_bg_subplot(111) ax.scatter(mag_a, mag_i - mag_z, c=color, s=1, lw=0) devectorize_axes(ax, dpi=400) ax.plot([0, 0], [-0.8, 0.6], '--w', lw=2) ax.plot([0, 0.4], [-0.15, -0.15], '--w', lw=2) ax.set_xlim(-0.3, 0.4) ax.set_ylim(-0.8, 0.6) ax.set_xlabel('a*', color='w') ax.set_ylabel('i-z', color='w') # plot the orbital parameters plot fig = plt.figure(facecolor='k') ax = black_bg_subplot(111) ax.scatter(a, sini, c=color, s=1, lw=0) devectorize_axes(ax, dpi=400) ax.plot([2.5, 2.5], [-0.02, 0.3], '--w') ax.plot([2.82, 2.82], [-0.02, 0.3], '--w') ax.set_xlim(2.0, 3.3) ax.set_ylim(-0.02, 0.3) ax.set_xlabel('a (AU)', color='w') ax.set_ylabel('sin(i)', color='w') # label the plot text_kwargs = dict(color='w', fontsize=14, transform=plt.gca().transAxes, ha='center', va='bottom') ax.text(0.25, 1.01, 'Inner', **text_kwargs) ax.text(0.53, 1.01, 'Mid', **text_kwargs) ax.text(0.83, 1.01, 'Outer', **text_kwargs) # Saving the black-background figure requires some extra arguments: #fig.savefig('moving_objects.png', # facecolor='black', # edgecolor='none') plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/examples/datasets/plot_nasa_atlas.py0000644000076700000240000000321200000000000021574 0ustar00bsipoczstaff""" NASA Sloan Atlas ---------------- This shows some visualizations of the data from the NASA SDSS Atlas """ # Author: Jake VanderPlas # 
License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from matplotlib import pyplot as plt from astropy.visualization import hist from astroML.datasets import fetch_nasa_atlas data = fetch_nasa_atlas() #------------------------------------------------------------ # plot the RA/DEC in an area-preserving projection RA = data['RA'] DEC = data['DEC'] # shift RA to the range [-180, 180) and convert coordinates to radians RA -= 180 RA *= np.pi / 180 DEC *= np.pi / 180 ax = plt.axes(projection='mollweide') plt.scatter(RA, DEC, s=1, c=data['Z'], cmap=plt.cm.copper, edgecolors='none', linewidths=0) plt.grid(True) plt.title('NASA Atlas Galaxy Locations') cb = plt.colorbar(cax=plt.axes([0.05, 0.1, 0.9, 0.05]), orientation='horizontal', ticks=np.linspace(0, 0.05, 6)) cb.set_label('redshift') #------------------------------------------------------------ # plot the r vs u-r color-magnitude diagram absmag = data['ABSMAG'] u = absmag[:, 2] r = absmag[:, 4] plt.figure() ax = plt.axes() plt.scatter(u - r, r, s=1, lw=0, c=data['Z'], cmap=plt.cm.copper) plt.colorbar(ticks=np.linspace(0, 0.05, 6)).set_label('redshift') plt.xlim(0, 3.5) plt.ylim(-10, -24) plt.xlabel('u-r') plt.ylabel('r') #------------------------------------------------------------ # plot a histogram of the redshift plt.figure() hist(data['Z'], bins='knuth', histtype='stepfilled', ec='k', fc='#F5CCB0') plt.xlabel('z') plt.ylabel('N(z)') plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/datasets/plot_rrlyrae_mags.py0000644000076700000240000000206100000000000022156 0ustar00bsipoczstaff""" RR-Lyrae Magnitudes ------------------- This example downloads and plots the colors of RR Lyrae stars along with those of the non-variable stars. Several of the classification examples in the book figures use this dataset. """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from matplotlib import pyplot as plt from astroML.datasets import fetch_rrlyrae_combined #---------------------------------------------------------------------- # get data and split into training & testing sets X, y = fetch_rrlyrae_combined() X = X[-5000:] y = y[-5000:] stars = (y == 0) rrlyrae = (y == 1) #------------------------------------------------------------ # plot the results ax = plt.axes() ax.plot(X[stars, 0], X[stars, 1], '.', ms=5, c='b', label='stars') ax.plot(X[rrlyrae, 0], X[rrlyrae, 1], '.', ms=5, c='r', label='RR-Lyrae') ax.legend(loc=3) ax.set_xlabel('$u-g$') ax.set_ylabel('$g-r$') ax.set_xlim(0.7, 1.4) ax.set_ylim(-0.2, 0.4) plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/datasets/plot_sdss_S82standards.py0000644000076700000240000000431500000000000023007 0ustar00bsipoczstaff""" SDSS Standard Star catalog -------------------------- This demonstrates how to fetch and plot the colors of the SDSS Stripe 82 standard stars, both alone and with the cross-matched 2MASS colors.
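The 2MASS-matched subset is obtained by passing ``crossmatch_2mass=True`` to the same fetcher, as the second half of the script demonstrates.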
""" # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from matplotlib import pyplot as plt from astroML.datasets import fetch_sdss_S82standards from astroML.plotting import MultiAxes #------------------------------------------------------------ # Plot SDSS data alone data = fetch_sdss_S82standards() colors = np.zeros((len(data), 4)) colors[:, 0] = data['mmu_u'] - data['mmu_g'] colors[:, 1] = data['mmu_g'] - data['mmu_r'] colors[:, 2] = data['mmu_r'] - data['mmu_i'] colors[:, 3] = data['mmu_i'] - data['mmu_z'] labels = ['u-g', 'g-r', 'r-i', 'i-z'] bins = [np.linspace(0.0, 3.5, 100), np.linspace(0, 2, 100), np.linspace(-0.2, 1.8, 100), np.linspace(-0.2, 1.0, 100)] fig = plt.figure(figsize=(10, 10)) ax = MultiAxes(4, hspace=0.05, wspace=0.05, fig=fig) ax.density(colors, bins=bins) ax.set_labels(labels) ax.set_locators(plt.MaxNLocator(5)) plt.suptitle('SDSS magnitudes') #------------------------------------------------------------ # Plot datacross-matched with 2MASS data = fetch_sdss_S82standards(crossmatch_2mass=True) colors = np.zeros((len(data), 7)) colors[:, 0] = data['mmu_u'] - data['mmu_g'] colors[:, 1] = data['mmu_g'] - data['mmu_r'] colors[:, 2] = data['mmu_r'] - data['mmu_i'] colors[:, 3] = data['mmu_i'] - data['mmu_z'] colors[:, 4] = data['mmu_z'] - data['J'] colors[:, 5] = data['J'] - data['H'] colors[:, 6] = data['H'] - data['K'] labels = ['u-g', 'g-r', 'r-i', 'i-z', 'z-J', 'J-H', 'H-K'] bins = [np.linspace(0.0, 3.5, 100), np.linspace(0, 2, 100), np.linspace(-0.2, 1.8, 100), np.linspace(-0.2, 1.0, 100), np.linspace(0.5, 2.0, 100), np.linspace(0.0, 1.0, 100), np.linspace(-0.4, 0.8, 100)] fig = plt.figure(figsize=(10, 10)) ax = MultiAxes(7, hspace=0.05, wspace=0.05, fig=fig) ax.density(colors, bins=bins) ax.set_labels(labels) ax.set_locators(plt.MaxNLocator(5)) fig.suptitle('SDSS+2MASS magnitudes') plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/datasets/plot_sdss_filters.py0000644000076700000240000000252600000000000022201 0ustar00bsipoczstaff""" SDSS Filters ------------ Download and plot the five SDSS filter bands along with a Vega spectrum. This data is available on the SDSS website (filters) and on the STSci website (Vega). 
""" # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com from matplotlib import pyplot as plt from astroML.datasets import fetch_sdss_filter, fetch_vega_spectrum #------------------------------------------------------------ # Set up figure and axes fig = plt.figure() ax = fig.add_subplot(111) #---------------------------------------------------------------------- # Fetch and plot the Vega spectrum spec = fetch_vega_spectrum() lam = spec[0] spectrum = spec[1] / 2.1 / spec[1].max() ax.plot(lam, spectrum, '-k', lw=2) #------------------------------------------------------------ # Fetch and plot the five filters text_kwargs = dict(fontsize=20, ha='center', va='center', alpha=0.5) for f, c, loc in zip('ugriz', 'bgrmk', [3500, 4600, 6100, 7500, 8800]): data = fetch_sdss_filter(f) ax.fill(data[0], data[1], ec=c, fc=c, alpha=0.4) ax.text(loc, 0.02, f, color=c, **text_kwargs) ax.set_xlim(3000, 11000) ax.set_title('SDSS Filters and Reference Spectrum') ax.set_xlabel('Wavelength (Angstroms)') ax.set_ylabel('normalized flux / filter transmission') plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/examples/datasets/plot_sdss_galaxy_colors.py0000644000076700000240000000235000000000000023372 0ustar00bsipoczstaff""" SDSS Galaxy Colors ------------------ The function :func:`fetch_sdss_galaxy_colors` used below actually queries the SDSS CASjobs server for the colors of the 50,000 galaxies. Below we extract the :math:`u - g` and :math:`g - r` colors for 5000 objects, and scatter-plot the results. """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com from matplotlib import pyplot as plt from astroML.datasets import fetch_sdss_galaxy_colors #------------------------------------------------------------ # Download data data = fetch_sdss_galaxy_colors() data = data[::10] # truncate for plotting # Extract colors and spectral class ug = data['u'] - data['g'] gr = data['g'] - data['r'] spec_class = data['specClass'] galaxies = (spec_class == 'GALAXY') qsos = (spec_class == 'QSO') #------------------------------------------------------------ # Prepare plot fig = plt.figure() ax = fig.add_subplot(111) ax.set_xlim(-0.5, 2.5) ax.set_ylim(-0.5, 1.5) ax.plot(ug[galaxies], gr[galaxies], '.', ms=4, c='b', label='galaxies') ax.plot(ug[qsos], gr[qsos], '.', ms=4, c='r', label='qsos') ax.legend(loc=2) ax.set_xlabel('$u-g$') ax.set_ylabel('$g-r$') plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/datasets/plot_sdss_imaging.py0000644000076700000240000000350300000000000022140 0ustar00bsipoczstaff""" SDSS Imaging ============ This example shows how to load the magnitude data from the SDSS imaging catalog, and plot colors and magnitudes of the stars and galaxies. 
""" # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from matplotlib import pyplot as plt from astroML.datasets import fetch_imaging_sample #------------------------------------------------------------ # Get the star/galaxy data data = fetch_imaging_sample() objtype = data['type'] stars = data[objtype == 6][:5000] galaxies = data[objtype == 3][:5000] #------------------------------------------------------------ # Plot the stars and galaxies plot_kwargs = dict(color='k', linestyle='none', marker='.', markersize=1) fig = plt.figure() ax1 = fig.add_subplot(221) ax1.plot(galaxies['gRaw'] - galaxies['rRaw'], galaxies['rRaw'], **plot_kwargs) ax2 = fig.add_subplot(223, sharex=ax1) ax2.plot(galaxies['gRaw'] - galaxies['rRaw'], galaxies['rRaw'] - galaxies['iRaw'], **plot_kwargs) ax3 = fig.add_subplot(222, sharey=ax1) ax3.plot(stars['gRaw'] - stars['rRaw'], stars['rRaw'], **plot_kwargs) ax4 = fig.add_subplot(224, sharex=ax3, sharey=ax2) ax4.plot(stars['gRaw'] - stars['rRaw'], stars['rRaw'] - stars['iRaw'], **plot_kwargs) # set labels and titles ax1.set_ylabel('$r$') ax2.set_ylabel('$r-i$') ax2.set_xlabel('$g-r$') ax4.set_xlabel('$g-r$') ax1.set_title('Galaxies') ax3.set_title('Stars') # set axis limits ax2.set_xlim(-0.5, 3) ax3.set_ylim(22.5, 14) ax4.set_xlim(-0.5, 3) ax4.set_ylim(-1, 2) # adjust tick spacings on all axes for ax in (ax1, ax2, ax3, ax4): ax.xaxis.set_major_locator(plt.MultipleLocator(1)) ax.yaxis.set_major_locator(plt.MultipleLocator(1)) plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/datasets/plot_sdss_line_ratios.py0000644000076700000240000000300000000000000023025 0ustar00bsipoczstaff""" SDSS Line-ratio Diagrams ------------------------ This shows how to plot line-ratio diagrams for the SDSS spectra. These diagrams are often called BPT plots [1]_, Osterbrock diagrams [2]_, or Kewley diagrams [3]_. The location of the dividing line is taken from from Kewley et al 2001. References ~~~~~~~~~~ .. [1] Baldwin, J. A.; Phillips, M. M.; Terlevich, R. (1981) http://adsabs.harvard.edu/abs/1981PASP...93....5B .. [2] Osterbrock, D. E.; De Robertis, M. M. (1985) http://adsabs.harvard.edu/abs/1985PASP...97.1129O .. [3] Kewley, L. J. 
`et al.` (2001) http://adsabs.harvard.edu/abs/2001ApJ...556..121K """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from matplotlib import pyplot as plt from astroML.datasets import fetch_sdss_corrected_spectra from astroML.datasets.tools.sdss_fits import log_OIII_Hb_NII data = fetch_sdss_corrected_spectra() i = np.where((data['lineindex_cln'] == 4) | (data['lineindex_cln'] == 5)) plt.scatter(data['log_NII_Ha'][i], data['log_OIII_Hb'][i], c=data['lineindex_cln'][i], s=9, lw=0) NII = np.linspace(-2.0, 0.35) plt.plot(NII, log_OIII_Hb_NII(NII), '-k') plt.plot(NII, log_OIII_Hb_NII(NII, 0.1), '--k') plt.plot(NII, log_OIII_Hb_NII(NII, -0.1), '--k') plt.xlim(-2.0, 1.0) plt.ylim(-1.2, 1.5) plt.xlabel(r'$\mathrm{log([NII]/H\alpha)}$', fontsize='large') plt.ylabel(r'$\mathrm{log([OIII]/H\beta)}$', fontsize='large') plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/examples/datasets/plot_sdss_specgals.py0000644000076700000240000000341200000000000022325 0ustar00bsipoczstaff""" SDSS Spectroscopic Galaxy Sample -------------------------------- This figure shows photometric colors of the SDSS spectroscopic galaxy sample. """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from matplotlib import pyplot as plt from astropy.visualization import hist from astroML.datasets import fetch_sdss_specgals data = fetch_sdss_specgals() #------------------------------------------------------------ # plot the RA/DEC in an area-preserving projection RA = data['ra'] DEC = data['dec'] # shift RA to the range [-180, 180) and convert coordinates to radians RA -= 180 RA *= np.pi / 180 DEC *= np.pi / 180 ax = plt.axes(projection='mollweide') ax.grid() plt.scatter(RA, DEC, s=1, lw=0, c=data['z'], cmap=plt.cm.copper, vmin=0, vmax=0.4) plt.title('SDSS DR8 Spectroscopic Galaxies') cb = plt.colorbar(cax=plt.axes([0.05, 0.1, 0.9, 0.05]), orientation='horizontal', ticks=np.linspace(0, 0.4, 9)) cb.set_label('redshift') #------------------------------------------------------------ # plot the r vs u-r color-magnitude diagram u = data['modelMag_u'] r = data['modelMag_r'] rPetro = data['petroMag_r'] plt.figure() ax = plt.axes() plt.scatter(u - r, rPetro, s=1, lw=0, c=data['z'], cmap=plt.cm.copper, vmin=0, vmax=0.4) plt.colorbar(ticks=np.linspace(0, 0.4, 9)).set_label('redshift') plt.xlim(0.5, 5.5) plt.ylim(18, 12.5) plt.xlabel('u-r') plt.ylabel('rPetrosian') #------------------------------------------------------------ # plot a histogram of the redshift plt.figure() hist(data['z'], bins='knuth', histtype='stepfilled', ec='k', fc='#F5CCB0') plt.xlim(0, 0.4) plt.xlabel('z (redshift)') plt.ylabel('dN/dz(z)') plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/datasets/plot_sdss_spectrum.py0000644000076700000240000000233700000000000022373 0ustar00bsipoczstaff""" SDSS Spectrum Example --------------------- This example shows how to fetch and plot a spectrum from the SDSS database using the plate, MJD, and fiber numbers. The code below sends a query to the SDSS server for the given plate, fiber, and MJD, downloads the spectrum, and plots the result.
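An SDSS spectrum is uniquely identified by its (plate, MJD, fiber) combination; the values hard-coded below select a single example object.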
""" # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com from matplotlib import pyplot as plt from astroML.datasets import fetch_sdss_spectrum #------------------------------------------------------------ # Fetch single spectrum plate = 1615 mjd = 53166 fiber = 513 spec = fetch_sdss_spectrum(plate, mjd, fiber) #------------------------------------------------------------ # Plot the resulting spectrum ax = plt.axes() ax.plot(spec.wavelength(), spec.spectrum, '-k', label='spectrum') ax.plot(spec.wavelength(), spec.error, '-', color='gray', label='error') ax.legend(loc=4) ax.set_title('Plate = %(plate)i, MJD = %(mjd)i, Fiber = %(fiber)i' % locals()) ax.text(0.05, 0.95, 'z = %.2f' % spec.z, size=16, ha='left', va='top', transform=ax.transAxes) ax.set_xlabel(r'$\lambda (\AA)$') ax.set_ylabel('Flux') ax.set_ylim(-10, 300) plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/datasets/plot_wmap_power_spectra.py0000644000076700000240000000405400000000000023374 0ustar00bsipoczstaff""" WMAP power spectrum analysis with HealPy ---------------------------------------- This demonstrates how to plot and take a power spectrum of the WMAP data using healpy, the python wrapper for healpix. Healpy is available for download at the `github site `_ """ # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from matplotlib import pyplot as plt # warning: due to a bug in healpy, importing it before pylab can cause # a segmentation fault in some circumstances. import healpy as hp from astroML.datasets import fetch_wmap_temperatures #------------------------------------------------------------ # Fetch the data wmap_unmasked = fetch_wmap_temperatures(masked=False) wmap_masked = fetch_wmap_temperatures(masked=True) white_noise = np.ma.asarray(np.random.normal(0, 0.062, wmap_masked.shape)) #------------------------------------------------------------ # plot the unmasked map fig = plt.figure(1) hp.mollview(wmap_unmasked, min=-1, max=1, title='Unmasked map', fig=1, unit=r'$\Delta$T (mK)') #------------------------------------------------------------ # plot the masked map # filled() fills the masked regions with a null value. fig = plt.figure(2) hp.mollview(wmap_masked.filled(), title='Masked map', fig=2, unit=r'$\Delta$T (mK)') #------------------------------------------------------------ # compute and plot the power spectrum cl = hp.anafast(wmap_masked.filled(), lmax=1024) ell = np.arange(len(cl)) cl_white = hp.anafast(white_noise, lmax=1024) fig = plt.figure(3) ax = fig.add_subplot(111) ax.scatter(ell, ell * (ell + 1) * cl, s=4, c='black', lw=0, label='data') ax.scatter(ell, ell * (ell + 1) * cl_white, s=4, c='gray', lw=0, label='white noise') ax.set_xlabel(r'$\ell$') ax.set_ylabel(r'$\ell(\ell+1)C_\ell$') ax.set_title('Angular Power (not mask corrected)') ax.legend(loc='upper right') ax.grid() ax.set_xlim(0, 1100) plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/datasets/plot_wmap_raw.py0000644000076700000240000000206300000000000021306 0ustar00bsipoczstaff""" WMAP plotting with HEALPix -------------------------- This example uses the :func:`astromL.datasets.fetch_wmap_temperatures` functionality to download and plot the raw WMAP 7-year data. The visualization requires the `healpy `_ package to be installed. 
""" # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from matplotlib import pyplot as plt # warning: due to a bug in healpy, importing it before pylab can cause # a segmentation fault in some circumstances. import healpy as hp from astroML.datasets import fetch_wmap_temperatures #------------------------------------------------------------ # Fetch the wmap data wmap_unmasked = fetch_wmap_temperatures(masked=False) #------------------------------------------------------------ # plot the unmasked map fig = plt.figure(1) hp.mollview(wmap_unmasked, min=-1, max=1, title='Raw WMAP data', fig=1, cmap=plt.cm.jet, unit=r'$\Delta$T (mK)') plt.show() ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4474697 astroML-1.0.2/examples/learning/0000755000076700000240000000000000000000000016047 5ustar00bsipoczstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1431090040.0 astroML-1.0.2/examples/learning/README.rst0000644000076700000240000000036200000000000017537 0ustar00bsipoczstaffMachine Learning and Data Modeling ---------------------------------- These scripts show some of the machine learning and data modeling tools available in astroML. For more examples, see the :ref:`figures ` from the textbook. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1612224990.0 astroML-1.0.2/examples/learning/plot_neighbors_photoz.py0000644000076700000240000000402200000000000023040 0ustar00bsipoczstaff""" K-Neighbors for Photometric Redshifts ------------------------------------- Estimate redshifts from the colors of sdss galaxies and quasars. This uses colors from a sample of 50,000 objects with SDSS photometry and ugriz magnitudes. The example shows how far one can get with an extremely simple machine learning approach to the photometric redshift problem. The function :func:`fetch_sdss_galaxy_colors` used below actually queries the SDSS CASjobs server for the colors of the 50,000 galaxies. 
""" # Author: Jake VanderPlas # License: BSD # The figure is an example from astroML: see http://astroML.github.com import numpy as np from matplotlib import pyplot as plt from sklearn.neighbors import KNeighborsRegressor from astroML.datasets import fetch_sdss_galaxy_colors from astroML.plotting import scatter_contour n_neighbors = 1 data = fetch_sdss_galaxy_colors() N = len(data) # shuffle data np.random.seed(0) np.random.shuffle(data) # put colors in a matrix X = np.zeros((N, 4)) X[:, 0] = data['u'] - data['g'] X[:, 1] = data['g'] - data['r'] X[:, 2] = data['r'] - data['i'] X[:, 3] = data['i'] - data['z'] z = data['redshift'] # divide into training and testing data Ntrain = N // 2 Xtrain = X[:Ntrain] ztrain = z[:Ntrain] Xtest = X[Ntrain:] ztest = z[Ntrain:] knn = KNeighborsRegressor(n_neighbors, weights='uniform') zpred = knn.fit(Xtrain, ztrain).predict(Xtest) axis_lim = np.array([-0.1, 2.5]) rms = np.sqrt(np.mean((ztest - zpred) ** 2)) print("RMS error = %.2g" % rms) ax = plt.axes() plt.scatter(ztest, zpred, c='k', lw=0, s=4) plt.plot(axis_lim, axis_lim, '--k') plt.plot(axis_lim, axis_lim + rms, ':k') plt.plot(axis_lim, axis_lim - rms, ':k') plt.xlim(axis_lim) plt.ylim(axis_lim) plt.text(0.98, 0.02, "RMS error = %.2g" % rms, ha='right', va='bottom', transform=ax.transAxes, bbox=dict(ec='w', fc='w'), fontsize=12) plt.title('Photo-z: Nearest Neigbor Regression') plt.xlabel(r'$\mathrm{z_{spec}}$', fontsize=14) plt.ylabel(r'$\mathrm{z_{phot}}$', fontsize=14) plt.show() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/pyproject.toml0000644000076700000240000000013100000000000015341 0ustar00bsipoczstaff[build-system] requires = ["setuptools", "wheel"] build-backend = 'setuptools.build_meta'././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1643147665.4486034 astroML-1.0.2/setup.cfg0000644000076700000240000000346500000000000014263 0ustar00bsipoczstaff[metadata] name = astroML version = 1.0.2 author = Jake VanderPlas author_email = vanderplas@astro.washington.edu maintainer = Brigitta Sipocz maintainer_email = bsipocz@gmail.com license = BSD 3-Clause License license_file = LICENSE.rst url = http://astroML.github.com description = Tools for machine learning and data mining in Astronomy long_description = file: README.rst keywords = astronomy, astrophysics, cosmology, space, science, modeling, models, fitting, machine-learning classifiers = Development Status :: 4 - Beta Environment :: Console Intended Audience :: Science/Research License :: OSI Approved :: BSD License Natural Language :: English Programming Language :: Python :: 3 Programming Language :: Python :: 3.5 Programming Language :: Python :: 3.6 Programming Language :: Python :: 3.7 Programming Language :: Python :: 3.8 Programming Language :: Python :: 3 :: Only Topic :: Scientific/Engineering :: Astronomy [tool:pytest] addopts = --doctest-plus doctest_plus = enabled doctest_rst = True testspaths = astroML doc examples doctest_optionflags = FLOAT_CMP ELLIPSIS NORMALIZE_WHITESPACE filterwarnings = error::DeprecationWarning error::FutureWarning ignore:Using or importing the ABCs from 'collections':DeprecationWarning ignore:distutils Version classes are deprecated:DeprecationWarning:xarray [flake8] max-line-length = 100 per-file-ignores = astroML/datasets/tools/cas_query.py:E223, E221 exclude = __init__.py [entry_points] [options] install_requires = scikit-learn>=0.18 numpy>=1.13 scipy>=0.18 matplotlib>=3.0 astropy>=3.0 tests_require = pytest-astropy 
packages = find: [options.extras_require] test = pytest-doctestplus pytest-astropy-header pytest-remotedata pytest-cov all = pymc3>=3.7,<3.11 codestyle = flake8 docs = sphinx [egg_info] tag_build = tag_date = 0 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1615393766.0 astroML-1.0.2/setup.py0000644000076700000240000000004500000000000014143 0ustar00bsipoczstafffrom setuptools import setup setup()