././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543193.2007592 scikit-learn-0.23.2/0000755000175100001660000000000000000000000014455 5ustar00vstsdocker00000000000000././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543178.8526752 scikit-learn-0.23.2/COPYING0000644000175100001660000000302700000000000015512 0ustar00vstsdocker00000000000000New BSD License Copyright (c) 2007–2020 The scikit-learn developers. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: a. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. b. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. c. Neither the name of the Scikit-learn Developers nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543193.2047591 scikit-learn-0.23.2/PKG-INFO0000644000175100001660000001773700000000000015571 0ustar00vstsdocker00000000000000Metadata-Version: 1.1 Name: scikit-learn Version: 0.23.2 Summary: A set of python modules for machine learning and data mining Home-page: http://scikit-learn.org Author: Andreas Mueller Author-email: amueller@ais.uni-bonn.de License: new BSD Download-URL: https://pypi.org/project/scikit-learn/#files Description: .. -*- mode: rst -*- |Azure|_ |Travis|_ |Codecov|_ |CircleCI|_ |PythonVersion|_ |PyPi|_ |DOI|_ .. |Azure| image:: https://dev.azure.com/scikit-learn/scikit-learn/_apis/build/status/scikit-learn.scikit-learn?branchName=master .. _Azure: https://dev.azure.com/scikit-learn/scikit-learn/_build/latest?definitionId=1&branchName=master .. |Travis| image:: https://api.travis-ci.org/scikit-learn/scikit-learn.svg?branch=master .. _Travis: https://travis-ci.org/scikit-learn/scikit-learn .. |Codecov| image:: https://codecov.io/github/scikit-learn/scikit-learn/badge.svg?branch=master&service=github .. _Codecov: https://codecov.io/github/scikit-learn/scikit-learn?branch=master .. |CircleCI| image:: https://circleci.com/gh/scikit-learn/scikit-learn/tree/master.svg?style=shield&circle-token=:circle-token .. _CircleCI: https://circleci.com/gh/scikit-learn/scikit-learn .. |PythonVersion| image:: https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8-blue .. _PythonVersion: https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8-blue .. 
|PyPi| image:: https://badge.fury.io/py/scikit-learn.svg .. _PyPi: https://badge.fury.io/py/scikit-learn .. |DOI| image:: https://zenodo.org/badge/21369/scikit-learn/scikit-learn.svg .. _DOI: https://zenodo.org/badge/latestdoi/21369/scikit-learn/scikit-learn scikit-learn ============ scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license. The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the `About us `__ page for a list of core contributors. It is currently maintained by a team of volunteers. Website: https://scikit-learn.org Installation ------------ Dependencies ~~~~~~~~~~~~ scikit-learn requires: - Python (>= 3.6) - NumPy (>= 1.13.3) - SciPy (>= 0.19.1) - joblib (>= 0.11) - threadpoolctl (>= 2.0.0) **Scikit-learn 0.20 was the last version to support Python 2.7 and Python 3.4.** scikit-learn 0.23 and later require Python 3.6 or newer. Scikit-learn plotting capabilities (i.e., functions start with ``plot_`` and classes end with "Display") require Matplotlib (>= 2.1.1). For running the examples Matplotlib >= 2.1.1 is required. A few examples require scikit-image >= 0.13, a few examples require pandas >= 0.18.0, some examples require seaborn >= 0.9.0. User installation ~~~~~~~~~~~~~~~~~ If you already have a working installation of numpy and scipy, the easiest way to install scikit-learn is using ``pip`` :: pip install -U scikit-learn or ``conda``:: conda install scikit-learn The documentation includes more detailed `installation instructions `_. Changelog --------- See the `changelog `__ for a history of notable changes to scikit-learn. Development ----------- We welcome new contributors of all experience levels. The scikit-learn community goals are to be helpful, welcoming, and effective. The `Development Guide `_ has detailed information about contributing code, documentation, tests, and more. We've included some basic information in this README. Important links ~~~~~~~~~~~~~~~ - Official source code repo: https://github.com/scikit-learn/scikit-learn - Download releases: https://pypi.org/project/scikit-learn/ - Issue tracker: https://github.com/scikit-learn/scikit-learn/issues Source code ~~~~~~~~~~~ You can check the latest sources with the command:: git clone https://github.com/scikit-learn/scikit-learn.git Contributing ~~~~~~~~~~~~ To learn more about making a contribution to scikit-learn, please see our `Contributing guide `_. Testing ~~~~~~~ After installation, you can launch the test suite from outside the source directory (you will need to have ``pytest`` >= 3.3.0 installed):: pytest sklearn See the web page https://scikit-learn.org/dev/developers/advanced_installation.html#testing for more information. Random number generation can be controlled during testing by setting the ``SKLEARN_SEED`` environment variable. Submitting a Pull Request ~~~~~~~~~~~~~~~~~~~~~~~~~ Before opening a Pull Request, have a look at the full Contributing page to make sure your code complies with our guidelines: https://scikit-learn.org/stable/developers/index.html Project History --------------- The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the `About us `__ page for a list of core contributors. The project is currently maintained by a team of volunteers. **Note**: `scikit-learn` was previously referred to as `scikits.learn`. 
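In addition to running the test suite described under *Testing*, a lighter
sanity check of an installation is a short sketch along these lines (it only
prints whatever versions happen to be installed; nothing in it is specific to
this release)::

    import sklearn

    # report the scikit-learn version plus its build and runtime dependencies
    sklearn.show_versions()
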
Help and Support ---------------- Documentation ~~~~~~~~~~~~~ - HTML documentation (stable release): https://scikit-learn.org - HTML documentation (development version): https://scikit-learn.org/dev/ - FAQ: https://scikit-learn.org/stable/faq.html Communication ~~~~~~~~~~~~~ - Mailing list: https://mail.python.org/mailman/listinfo/scikit-learn - IRC channel: ``#scikit-learn`` at ``webchat.freenode.net`` - Stack Overflow: https://stackoverflow.com/questions/tagged/scikit-learn - Website: https://scikit-learn.org Citation ~~~~~~~~ If you use scikit-learn in a scientific publication, we would appreciate citations: https://scikit-learn.org/stable/about.html#citing-scikit-learn Platform: UNKNOWN Classifier: Intended Audience :: Science/Research Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved Classifier: Programming Language :: C Classifier: Programming Language :: Python Classifier: Topic :: Software Development Classifier: Topic :: Scientific/Engineering Classifier: Operating System :: Microsoft :: Windows Classifier: Operating System :: POSIX Classifier: Operating System :: Unix Classifier: Operating System :: MacOS Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: Implementation :: CPython Classifier: Programming Language :: Python :: Implementation :: PyPy ././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543178.8526752 scikit-learn-0.23.2/README.rst0000644000175100001660000001302000000000000016140 0ustar00vstsdocker00000000000000.. -*- mode: rst -*- |Azure|_ |Travis|_ |Codecov|_ |CircleCI|_ |PythonVersion|_ |PyPi|_ |DOI|_ .. |Azure| image:: https://dev.azure.com/scikit-learn/scikit-learn/_apis/build/status/scikit-learn.scikit-learn?branchName=master .. _Azure: https://dev.azure.com/scikit-learn/scikit-learn/_build/latest?definitionId=1&branchName=master .. |Travis| image:: https://api.travis-ci.org/scikit-learn/scikit-learn.svg?branch=master .. _Travis: https://travis-ci.org/scikit-learn/scikit-learn .. |Codecov| image:: https://codecov.io/github/scikit-learn/scikit-learn/badge.svg?branch=master&service=github .. _Codecov: https://codecov.io/github/scikit-learn/scikit-learn?branch=master .. |CircleCI| image:: https://circleci.com/gh/scikit-learn/scikit-learn/tree/master.svg?style=shield&circle-token=:circle-token .. _CircleCI: https://circleci.com/gh/scikit-learn/scikit-learn .. |PythonVersion| image:: https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8-blue .. _PythonVersion: https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8-blue .. |PyPi| image:: https://badge.fury.io/py/scikit-learn.svg .. _PyPi: https://badge.fury.io/py/scikit-learn .. |DOI| image:: https://zenodo.org/badge/21369/scikit-learn/scikit-learn.svg .. _DOI: https://zenodo.org/badge/latestdoi/21369/scikit-learn/scikit-learn scikit-learn ============ scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license. The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the `About us `__ page for a list of core contributors. It is currently maintained by a team of volunteers. 
Website: https://scikit-learn.org Installation ------------ Dependencies ~~~~~~~~~~~~ scikit-learn requires: - Python (>= 3.6) - NumPy (>= 1.13.3) - SciPy (>= 0.19.1) - joblib (>= 0.11) - threadpoolctl (>= 2.0.0) **Scikit-learn 0.20 was the last version to support Python 2.7 and Python 3.4.** scikit-learn 0.23 and later require Python 3.6 or newer. Scikit-learn plotting capabilities (i.e., functions start with ``plot_`` and classes end with "Display") require Matplotlib (>= 2.1.1). For running the examples Matplotlib >= 2.1.1 is required. A few examples require scikit-image >= 0.13, a few examples require pandas >= 0.18.0, some examples require seaborn >= 0.9.0. User installation ~~~~~~~~~~~~~~~~~ If you already have a working installation of numpy and scipy, the easiest way to install scikit-learn is using ``pip`` :: pip install -U scikit-learn or ``conda``:: conda install scikit-learn The documentation includes more detailed `installation instructions `_. Changelog --------- See the `changelog `__ for a history of notable changes to scikit-learn. Development ----------- We welcome new contributors of all experience levels. The scikit-learn community goals are to be helpful, welcoming, and effective. The `Development Guide `_ has detailed information about contributing code, documentation, tests, and more. We've included some basic information in this README. Important links ~~~~~~~~~~~~~~~ - Official source code repo: https://github.com/scikit-learn/scikit-learn - Download releases: https://pypi.org/project/scikit-learn/ - Issue tracker: https://github.com/scikit-learn/scikit-learn/issues Source code ~~~~~~~~~~~ You can check the latest sources with the command:: git clone https://github.com/scikit-learn/scikit-learn.git Contributing ~~~~~~~~~~~~ To learn more about making a contribution to scikit-learn, please see our `Contributing guide `_. Testing ~~~~~~~ After installation, you can launch the test suite from outside the source directory (you will need to have ``pytest`` >= 3.3.0 installed):: pytest sklearn See the web page https://scikit-learn.org/dev/developers/advanced_installation.html#testing for more information. Random number generation can be controlled during testing by setting the ``SKLEARN_SEED`` environment variable. Submitting a Pull Request ~~~~~~~~~~~~~~~~~~~~~~~~~ Before opening a Pull Request, have a look at the full Contributing page to make sure your code complies with our guidelines: https://scikit-learn.org/stable/developers/index.html Project History --------------- The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the `About us `__ page for a list of core contributors. The project is currently maintained by a team of volunteers. **Note**: `scikit-learn` was previously referred to as `scikits.learn`. 
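For a first end-to-end check of the library itself, a minimal sketch such as
the following can be used (the bundled iris dataset and the random forest
below are illustrative choices only, not a recommendation)::

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # load a small bundled dataset and hold out a test split
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # fit an estimator and report held-out accuracy
    clf = RandomForestClassifier(random_state=0)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))

Every supervised estimator follows the same ``fit``/``predict`` interface, so
the classifier above can be swapped for any other without changing the rest
of the snippet.
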
Help and Support ---------------- Documentation ~~~~~~~~~~~~~ - HTML documentation (stable release): https://scikit-learn.org - HTML documentation (development version): https://scikit-learn.org/dev/ - FAQ: https://scikit-learn.org/stable/faq.html Communication ~~~~~~~~~~~~~ - Mailing list: https://mail.python.org/mailman/listinfo/scikit-learn - IRC channel: ``#scikit-learn`` at ``webchat.freenode.net`` - Stack Overflow: https://stackoverflow.com/questions/tagged/scikit-learn - Website: https://scikit-learn.org Citation ~~~~~~~~ If you use scikit-learn in a scientific publication, we would appreciate citations: https://scikit-learn.org/stable/about.html#citing-scikit-learn ././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543193.0047581 scikit-learn-0.23.2/doc/0000755000175100001660000000000000000000000015222 5ustar00vstsdocker00000000000000././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543178.8566754 scikit-learn-0.23.2/doc/Makefile0000644000175100001660000000734000000000000016666 0ustar00vstsdocker00000000000000# Makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = -j auto SPHINXBUILD ?= sphinx-build PAPER = BUILDDIR = _build ifneq ($(EXAMPLES_PATTERN),) EXAMPLES_PATTERN_OPTS := -D sphinx_gallery_conf.filename_pattern="$(EXAMPLES_PATTERN)" endif # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter ALLSPHINXOPTS = -T -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS)\ $(EXAMPLES_PATTERN_OPTS) . .PHONY: help clean html dirhtml pickle json latex latexpdf changes linkcheck doctest optipng all: html-noplot help: @echo "Please use \`make ' where is one of" @echo " html to make standalone HTML files" @echo " dirhtml to make HTML files named index.html in directories" @echo " pickle to make pickle files" @echo " json to make JSON files" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" @echo " latexpdf to make LaTeX files and run them through pdflatex" @echo " changes to make an overview of all changed/added/deprecated items" @echo " linkcheck to check all external links for integrity" @echo " doctest to run all doctests embedded in the documentation (if enabled)" clean: -rm -rf $(BUILDDIR)/* -rm -rf auto_examples/ -rm -rf generated/* -rm -rf modules/generated/ html: # These two lines make the build a bit more lengthy, and the # the embedding of images more robust rm -rf $(BUILDDIR)/html/_images #rm -rf _build/doctrees/ $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html/stable @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html/stable" html-noplot: $(SPHINXBUILD) -D plot_gallery=0 -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html/stable @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html/stable." dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." pickle: $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle @echo @echo "Build finished; now you can process the pickle files." json: $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json @echo @echo "Build finished; now you can process the JSON files." latex: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." 
@echo "Run \`make' in that directory to run these through (pdf)latex" \ "(use \`make latexpdf' here to do that automatically)." latexpdf: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through pdflatex..." make -C $(BUILDDIR)/latex all-pdf @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." changes: $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes @echo @echo "The overview file is in $(BUILDDIR)/changes." linkcheck: $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck @echo @echo "Link check complete; look for any errors in the above output " \ "or in $(BUILDDIR)/linkcheck/output.txt." doctest: $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest @echo "Testing of doctests in the sources finished, look at the " \ "results in $(BUILDDIR)/doctest/output.txt." download-data: python -c "from sklearn.datasets._lfw import _check_fetch_lfw; _check_fetch_lfw()" # Optimize PNG files. Needs OptiPNG. Change the -P argument to the number of # cores you have available, so -P 64 if you have a real computer ;) optipng: find _build auto_examples */generated -name '*.png' -print0 \ | xargs -0 -n 1 -P 4 optipng -o10 dist: html latexpdf cp _build/latex/user_guide.pdf _build/html/stable/_downloads/scikit-learn-docs.pdf ././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543178.8566754 scikit-learn-0.23.2/doc/README.md0000644000175100001660000000037600000000000016507 0ustar00vstsdocker00000000000000# Documentation for scikit-learn This directory contains the full manual and web site as displayed at http://scikit-learn.org. See http://scikit-learn.org/dev/developers/contributing.html#documentation for detailed information about the documentation. ././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543178.8566754 scikit-learn-0.23.2/doc/about.rst0000644000175100001660000003376200000000000017101 0ustar00vstsdocker00000000000000.. _about: About us ======== History ------- This project was started in 2007 as a Google Summer of Code project by David Cournapeau. Later that year, Matthieu Brucher started work on this project as part of his thesis. In 2010 Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort and Vincent Michel of INRIA took leadership of the project and made the first public release, February the 1st 2010. Since then, several releases have appeared following a ~ 3-month cycle, and a thriving international community has been leading the development. Governance ---------- The decision making process and governance structure of scikit-learn is laid out in the :ref:`governance document `. Authors ------- The following people are currently core contributors to scikit-learn's development and maintenance: .. include:: authors.rst Please do not email the authors directly to ask for assistance or report issues. Instead, please see `What's the best way to ask questions about scikit-learn `_ in the FAQ. .. seealso:: :ref:`How you can contribute to the project ` Emeritus Core Developers ------------------------ The following people have been active contributors in the past, but are no longer active in the project: .. include:: authors_emeritus.rst .. _citing-scikit-learn: Citing scikit-learn ------------------- If you use scikit-learn in a scientific publication, we would appreciate citations to the following paper: `Scikit-learn: Machine Learning in Python `_, Pedregosa *et al.*, JMLR 12, pp. 2825-2830, 2011. 
Bibtex entry:: @article{scikit-learn, title={Scikit-learn: Machine Learning in {P}ython}, author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V. and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P. and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.}, journal={Journal of Machine Learning Research}, volume={12}, pages={2825--2830}, year={2011} } If you want to cite scikit-learn for its API or design, you may also want to consider the following paper: `API design for machine learning software: experiences from the scikit-learn project `_, Buitinck *et al.*, 2013. Bibtex entry:: @inproceedings{sklearn_api, author = {Lars Buitinck and Gilles Louppe and Mathieu Blondel and Fabian Pedregosa and Andreas Mueller and Olivier Grisel and Vlad Niculae and Peter Prettenhofer and Alexandre Gramfort and Jaques Grobler and Robert Layton and Jake VanderPlas and Arnaud Joly and Brian Holt and Ga{\"{e}}l Varoquaux}, title = {{API} design for machine learning software: experiences from the scikit-learn project}, booktitle = {ECML PKDD Workshop: Languages for Data Mining and Machine Learning}, year = {2013}, pages = {108--122}, } Artwork ------- High quality PNG and SVG logos are available in the `doc/logos/ `_ source directory. .. image:: images/scikit-learn-logo-notext.png :align: center Funding ------- Scikit-Learn is a community driven project, however institutional and private grants help to assure its sustainability. The project would like to thank the following funders. ................................... .. raw:: html
The `Members `_ of the `Scikit-Learn Consortium at Inria Foundation `_ fund Olivier Grisel, Guillaume Lemaitre, Jérémie du Boisberranger and Chiara Marmo. .. raw:: html
.. |msn| image:: images/microsoft.png :width: 100pt :target: https://www.microsoft.com/ .. |bcg| image:: images/bcg.png :width: 100pt :target: https://www.bcg.com/beyond-consulting/bcg-gamma/default.aspx .. |axa| image:: images/axa.png :width: 50pt :target: https://www.axa.fr/ .. |bnp| image:: images/bnp.png :width: 150pt :target: https://www.bnpparibascardif.com/ .. |fujitsu| image:: images/fujitsu.png :width: 100pt :target: https://www.fujitsu.com/global/ .. |intel| image:: images/intel.png :width: 70pt :target: https://www.intel.com/ .. |nvidia| image:: images/nvidia.png :width: 70pt :target: https://www.nvidia.com/ .. |dataiku| image:: images/dataiku.png :width: 70pt :target: https://www.dataiku.com/ .. |inria| image:: images/inria-logo.jpg :width: 100pt :target: https://www.inria.fr .. raw:: html
.. table:: :class: sk-sponsor-table align-default +---------+----------+ | |msn| | |bcg| | +---------+----------+ | | +---------+----------+ | |axa| | |bnp| | +---------+----------+ ||fujitsu|| |intel| | +---------+----------+ | | +---------+----------+ ||dataiku|| |nvidia| | +---------+----------+ | | +---------+----------+ | |inria| | +---------+----------+ .. raw:: html
........ .. raw:: html
`Columbia University `_ funds Andreas Müller since 2016 .. raw:: html
.. image:: themes/scikit-learn/static/img/columbia.png :width: 50pt :align: center :target: https://www.columbia.edu/ .. raw:: html
.......... .. raw:: html
Andreas Müller received a grant to improve scikit-learn from the
`Alfred P. Sloan Foundation `_. This grant supports the positions of
Nicolas Hug and Thomas J. Fan.

.. raw:: html
.. image:: images/sloan_banner.png :width: 100pt :align: center :target: https://sloan.org/ .. raw:: html
........... .. raw:: html
`The University of Sydney `_ funds Joel Nothman since July 2017. .. raw:: html
.. image:: themes/scikit-learn/static/img/sydney-primary.jpeg :width: 100pt :align: center :target: https://sydney.edu.au/ .. raw:: html
Past Sponsors ............. .. raw:: html
`INRIA `_ actively supports this project. It has provided funding for Fabian Pedregosa (2010-2012), Jaques Grobler (2012-2013) and Olivier Grisel (2013-2017) to work on this project full-time. It also hosts coding sprints and other events. .. raw:: html
.. image:: images/inria-logo.jpg :width: 100pt :align: center :target: https://www.inria.fr .. raw:: html
..................... .. raw:: html
`Paris-Saclay Center for Data Science `_ funded one year for a developer to work on the project full-time (2014-2015), 50% of the time of Guillaume Lemaitre (2016-2017) and 50% of the time of Joris van den Bossche (2017-2018). .. raw:: html
.. image:: images/cds-logo.png :width: 100pt :align: center :target: https://www.datascience-paris-saclay.fr/ .. raw:: html
............ .. raw:: html
`Anaconda, Inc `_ funded Adrin Jalali in 2019. .. raw:: html
.. image:: images/anaconda.png :width: 100pt :align: center :target: https://www.anaconda.com/ .. raw:: html
.......................... .. raw:: html
`NYU Moore-Sloan Data Science Environment `_ funded Andreas Mueller (2014-2016) to work on this project. The Moore-Sloan Data Science Environment also funds several students to work on the project part-time. .. raw:: html
.. image:: images/nyu_short_color.png :width: 100pt :align: center :target: https://cds.nyu.edu/mooresloan/ .. raw:: html
........................ .. raw:: html
`Télécom Paristech `_ funded Manoj Kumar (2014), Tom Dupré la Tour (2015), Raghav RV (2015-2017), Thierry Guillemot (2016-2017) and Albert Thomas (2017) to work on scikit-learn. .. raw:: html
.. image:: themes/scikit-learn/static/img/telecom.png :width: 50pt :align: center :target: https://www.telecom-paristech.fr/ .. raw:: html
..................... .. raw:: html
`The Labex DigiCosme `_ funded Nicolas Goix (2015-2016), Tom Dupré la Tour (2015-2016 and 2017-2018), Mathurin Massias (2018-2019) to work part time on scikit-learn during their PhDs. It also funded a scikit-learn coding sprint in 2015. .. raw:: html
.. image:: themes/scikit-learn/static/img/digicosme.png :width: 100pt :align: center :target: https://digicosme.lri.fr .. raw:: html
...................... The following students were sponsored by `Google `_ to work on scikit-learn through the `Google Summer of Code `_ program. - 2007 - David Cournapeau - 2011 - `Vlad Niculae`_ - 2012 - `Vlad Niculae`_, Immanuel Bayer. - 2013 - Kemal Eren, Nicolas Trésegnie - 2014 - Hamzeh Alsalhi, Issam Laradji, Maheshakya Wijewardena, Manoj Kumar. - 2015 - `Raghav RV `_, Wei Xue - 2016 - `Nelson Liu `_, `YenChen Lin `_ .. _Vlad Niculae: https://vene.ro/ ................... The `NeuroDebian `_ project providing `Debian `_ packaging and contributions is supported by `Dr. James V. Haxby `_ (`Dartmouth College `_). Sprints ------- The International 2019 Paris sprint was kindly hosted by `AXA `_. Also some participants could attend thanks to the support of the `Alfred P. Sloan Foundation `_, the `Python Software Foundation `_ (PSF) and the `DATAIA Institute `_. ..................... The 2013 International Paris Sprint was made possible thanks to the support of `Télécom Paristech `_, `tinyclues `_, the `French Python Association `_ and the `Fonds de la Recherche Scientifique `_. .............. The 2011 International Granada sprint was made possible thanks to the support of the `PSF `_ and `tinyclues `_. Donating to the project ....................... If you are interested in donating to the project or to one of our code-sprints, you can use the *Paypal* button below or the `NumFOCUS Donations Page `_ (if you use the latter, please indicate that you are donating for the scikit-learn project). All donations will be handled by `NumFOCUS `_, a non-profit-organization which is managed by a board of `Scipy community members `_. NumFOCUS's mission is to foster scientific computing software, in particular in Python. As a fiscal home of scikit-learn, it ensures that money is available when needed to keep the project funded and available while in compliance with tax regulations. The received donations for the scikit-learn project mostly will go towards covering travel-expenses for code sprints, as well as towards the organization budget of the project [#f1]_. .. raw :: html


.. rubric:: Notes

.. [#f1] Regarding the organization budget, in particular, we might use some
   of the donated funds to pay for other project expenses such as DNS,
   hosting or continuous integration services.

Infrastructure support
----------------------

- We would like to thank `Rackspace `_ for providing us with a free
  `Rackspace Cloud `_ account to automatically build the documentation and
  the example gallery for the development version of scikit-learn using
  `this tool `_.

- We would also like to thank `Microsoft Azure `_, `Travis CI `_ and
  `CircleCI `_ for free CPU time on their Continuous Integration servers.

scikit-learn-0.23.2/doc/authors.rst

- Jérémie du Boisberranger
- Joris Van den Bossche
- Loïc Estève
- Thomas J Fan
- Alexandre Gramfort
- Olivier Grisel
- Yaroslav Halchenko
- Nicolas Hug
- Adrin Jalali
- Guillaume Lemaitre
- Jan Hendrik Metzen
- Andreas Mueller
- Vlad Niculae
- Joel Nothman
- Hanmin Qin
- Bertrand Thirion
- Tom Dupré la Tour
- Gael Varoquaux
- Nelle Varoquaux
- Roman Yurchak
././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543178.8566754 scikit-learn-0.23.2/doc/authors_emeritus.rst0000644000175100001660000000107500000000000021361 0ustar00vstsdocker00000000000000- Mathieu Blondel - Matthieu Brucher - Lars Buitinck - David Cournapeau - Noel Dawe - Shiqiao Du - Vincent Dubourg - Edouard Duchesnay - Alexander Fabisch - Virgile Fritsch - Satrajit Ghosh - Angel Soler Gollonet - Chris Gorgolewski - Jaques Grobler - Brian Holt - Arnaud Joly - Thouis (Ray) Jones - Kyle Kastner - manoj kumar - Robert Layton - Wei Li - Paolo Losi - Gilles Louppe - Vincent Michel - Jarrod Millman - Alexandre Passos - Fabian Pedregosa - Peter Prettenhofer - (Venkat) Raghav, Rajagopalan - Jacob Schreiber - Jake Vanderplas - David Warde-Farley - Ron Weiss././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543193.0047581 scikit-learn-0.23.2/doc/binder/0000755000175100001660000000000000000000000016465 5ustar00vstsdocker00000000000000././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543178.8566754 scikit-learn-0.23.2/doc/binder/requirements.txt0000644000175100001660000000054500000000000021755 0ustar00vstsdocker00000000000000# A binder requirement file is required by sphinx-gallery. We don't really need # one since the binder requirement files live in the # scikit-learn/binder-examples repo and not in the scikit-learn.github.io repo # that comes from the scikit-learn doc build. This file can be removed if # 'dependencies' is made an optional key for binder in sphinx-gallery. ././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543178.8566754 scikit-learn-0.23.2/doc/conf.py0000644000175100001660000003311000000000000016517 0ustar00vstsdocker00000000000000# -*- coding: utf-8 -*- # # scikit-learn documentation build configuration file, created by # sphinx-quickstart on Fri Jan 8 09:13:42 2010. # # This file is execfile()d with the current directory set to its containing # dir. # # Note that not all possible configuration values are present in this # autogenerated file. # # All configuration values have a default; values that are commented out # serve to show the default. import sys import os import warnings import re from packaging.version import parse from pathlib import Path # If extensions (or modules to document with autodoc) are in another # directory, add these directories to sys.path here. If the directory # is relative to the documentation root, use os.path.abspath to make it # absolute, like shown here. sys.path.insert(0, os.path.abspath('sphinxext')) from github_link import make_linkcode_resolve import sphinx_gallery # -- General configuration --------------------------------------------------- # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = [ 'sphinx.ext.autodoc', 'sphinx.ext.autosummary', 'numpydoc', 'sphinx.ext.linkcode', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx', 'sphinx.ext.imgconverter', 'sphinx_gallery.gen_gallery', 'sphinx_issues' ] # this is needed for some reason... 
# see https://github.com/numpy/numpydoc/issues/69 numpydoc_class_members_toctree = False # For maths, use mathjax by default and svg if NO_MATHJAX env variable is set # (useful for viewing the doc offline) if os.environ.get('NO_MATHJAX'): extensions.append('sphinx.ext.imgmath') imgmath_image_format = 'svg' mathjax_path = '' else: extensions.append('sphinx.ext.mathjax') mathjax_path = ('https://cdn.jsdelivr.net/npm/mathjax@3/es5/' 'tex-chtml.js') autodoc_default_options = { 'members': True, 'inherited-members': True } # Add any paths that contain templates here, relative to this directory. templates_path = ['templates'] # generate autosummary even if no references autosummary_generate = True # The suffix of source filenames. source_suffix = '.rst' # The encoding of source files. #source_encoding = 'utf-8' # The master toctree document. master_doc = 'contents' # General information about the project. project = 'scikit-learn' copyright = '2007 - 2020, scikit-learn developers (BSD License)' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. import sklearn parsed_version = parse(sklearn.__version__) version = ".".join(parsed_version.base_version.split(".")[:2]) # The full version, including alpha/beta/rc tags. # Removes post from release name if parsed_version.is_postrelease: release = parsed_version.base_version else: release = sklearn.__version__ # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. #language = None # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: #today = '' # Else, today_fmt is used as the format for a strftime call. #today_fmt = '%B %d, %Y' # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. exclude_patterns = ['_build', 'templates', 'includes', 'themes'] # The reST default role (used for this markup: `text`) to use for all # documents. default_role = 'literal' # If true, '()' will be appended to :func: etc. cross-reference text. add_function_parentheses = False # If true, the current module name will be prepended to all description # unit titles (such as .. function::). #add_module_names = True # If true, sectionauthor and moduleauthor directives will be shown in the # output. They are ignored by default. #show_authors = False # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # A list of ignored prefixes for module index sorting. #modindex_common_prefix = [] # -- Options for HTML output ------------------------------------------------- # The theme to use for HTML and HTML Help pages. Major themes that come with # Sphinx are currently 'default' and 'sphinxdoc'. html_theme = 'scikit-learn-modern' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. html_theme_options = {'google_analytics': True, 'mathjax_path': mathjax_path} # Add any paths that contain custom themes here, relative to this directory. html_theme_path = ['themes'] # The name for this set of Sphinx documents. If None, it defaults to # " v documentation". #html_title = None # A shorter title for the navigation bar. Default is the same as html_title. 
html_short_title = 'scikit-learn' # The name of an image file (relative to this directory) to place at the top # of the sidebar. html_logo = 'logos/scikit-learn-logo-small.png' # The name of an image file (within the static path) to use as favicon of the # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # pixels large. html_favicon = 'logos/favicon.ico' # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['images'] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. #html_last_updated_fmt = '%b %d, %Y' # Custom sidebar templates, maps document names to template names. #html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to # template names. html_additional_pages = { 'index': 'index.html', 'documentation': 'documentation.html'} # redirects to index # If false, no module index is generated. html_domain_indices = False # If false, no index is generated. html_use_index = False # If true, the index is split into individual pages for each letter. #html_split_index = False # If true, links to the reST sources are added to the pages. #html_show_sourcelink = True # If true, an OpenSearch description file will be output, and all pages will # contain a tag referring to it. The value of this option must be the # base URL from which the finished HTML is served. #html_use_opensearch = '' # If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml"). #html_file_suffix = '' # Output file base name for HTML help builder. htmlhelp_basename = 'scikit-learndoc' # If true, the reST sources are included in the HTML build as _sources/name. html_copy_source = True # Adds variables into templates html_context = {} # finds latest release highlights and places it into HTML context for # index.html release_highlights_dir = Path("..") / "examples" / "release_highlights" # Finds the highlight with the latest version number latest_highlights = sorted(release_highlights_dir.glob( "plot_release_highlights_*.py"))[-1] latest_highlights = latest_highlights.with_suffix('').name html_context["release_highlights"] = \ f"auto_examples/release_highlights/{latest_highlights}" # get version from higlight name assuming highlights have the form # plot_release_highlights_0_22_0 highlight_version = ".".join(latest_highlights.split("_")[-3:-1]) html_context["release_highlights_version"] = highlight_version # -- Options for LaTeX output ------------------------------------------------ latex_elements = { # The paper size ('letterpaper' or 'a4paper'). # 'papersize': 'letterpaper', # The font size ('10pt', '11pt' or '12pt'). # 'pointsize': '10pt', # Additional stuff for the LaTeX preamble. 'preamble': r""" \usepackage{amsmath}\usepackage{amsfonts}\usepackage{bm} \usepackage{morefloats}\usepackage{enumitem} \setlistdepth{10} """ } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, author, documentclass # [howto/manual]). latex_documents = [('contents', 'user_guide.tex', 'scikit-learn user guide', 'scikit-learn developers', 'manual'), ] # The name of an image file (relative to this directory) to place at the top of # the title page. latex_logo = "logos/scikit-learn-logo.png" # Documents to append as an appendix to all manuals. 
# latex_appendices = [] # If false, no module index is generated. latex_domain_indices = False trim_doctests_flags = True # intersphinx configuration intersphinx_mapping = { 'python': ('https://docs.python.org/{.major}'.format( sys.version_info), None), 'numpy': ('https://docs.scipy.org/doc/numpy/', None), 'scipy': ('https://docs.scipy.org/doc/scipy/reference', None), 'matplotlib': ('https://matplotlib.org/', None), 'pandas': ('https://pandas.pydata.org/pandas-docs/stable/', None), 'joblib': ('https://joblib.readthedocs.io/en/latest/', None), 'seaborn': ('https://seaborn.pydata.org/', None), } v = parse(release) if v.release is None: raise ValueError( 'Ill-formed version: {!r}. Version should follow ' 'PEP440'.format(version)) if v.is_devrelease: binder_branch = 'master' else: major, minor = v.release[:2] binder_branch = '{}.{}.X'.format(major, minor) class SubSectionTitleOrder: """Sort example gallery by title of subsection. Assumes README.txt exists for all subsections and uses the subsection with dashes, '---', as the adornment. """ def __init__(self, src_dir): self.src_dir = src_dir self.regex = re.compile(r"^([\w ]+)\n-", re.MULTILINE) def __repr__(self): return '<%s>' % (self.__class__.__name__,) def __call__(self, directory): src_path = os.path.normpath(os.path.join(self.src_dir, directory)) # Forces Release Highlights to the top if os.path.basename(src_path) == "release_highlights": return "0" readme = os.path.join(src_path, "README.txt") try: with open(readme, 'r') as f: content = f.read() except FileNotFoundError: return directory title_match = self.regex.search(content) if title_match is not None: return title_match.group(1) return directory sphinx_gallery_conf = { 'doc_module': 'sklearn', 'backreferences_dir': os.path.join('modules', 'generated'), 'show_memory': False, 'reference_url': { 'sklearn': None}, 'examples_dirs': ['../examples'], 'gallery_dirs': ['auto_examples'], 'subsection_order': SubSectionTitleOrder('../examples'), 'binder': { 'org': 'scikit-learn', 'repo': 'scikit-learn', 'binderhub_url': 'https://mybinder.org', 'branch': binder_branch, 'dependencies': './binder/requirements.txt', 'use_jupyter_lab': True }, # avoid generating too many cross links 'inspect_global_variables': False, 'remove_config_comments': True, } # The following dictionary contains the information used to create the # thumbnails for the front page of the scikit-learn home page. 
# key: first image in set # values: (number of plot in set, height of thumbnail) carousel_thumbs = {'sphx_glr_plot_classifier_comparison_001.png': 600} # enable experimental module so that experimental estimators can be # discovered properly by sphinx from sklearn.experimental import enable_hist_gradient_boosting # noqa from sklearn.experimental import enable_iterative_imputer # noqa def make_carousel_thumbs(app, exception): """produces the final resized carousel images""" if exception is not None: return print('Preparing carousel images') image_dir = os.path.join(app.builder.outdir, '_images') for glr_plot, max_width in carousel_thumbs.items(): image = os.path.join(image_dir, glr_plot) if os.path.exists(image): c_thumb = os.path.join(image_dir, glr_plot[:-4] + '_carousel.png') sphinx_gallery.gen_rst.scale_image(image, c_thumb, max_width, 190) def filter_search_index(app, exception): if exception is not None: return # searchindex only exist when generating html if app.builder.name != 'html': return print('Removing methods from search index') searchindex_path = os.path.join(app.builder.outdir, 'searchindex.js') with open(searchindex_path, 'r') as f: searchindex_text = f.read() searchindex_text = re.sub(r'{__init__.+?}', '{}', searchindex_text) searchindex_text = re.sub(r'{__call__.+?}', '{}', searchindex_text) with open(searchindex_path, 'w') as f: f.write(searchindex_text) # Config for sphinx_issues # we use the issues path for PRs since the issues URL will forward issues_github_path = 'scikit-learn/scikit-learn' def setup(app): # to hide/show the prompt in code examples: app.connect('build-finished', make_carousel_thumbs) app.connect('build-finished', filter_search_index) # The following is used by sphinx.ext.linkcode to provide links to github linkcode_resolve = make_linkcode_resolve('sklearn', 'https://github.com/scikit-learn/' 'scikit-learn/blob/{revision}/' '{package}/{path}#L{lineno}') warnings.filterwarnings("ignore", category=UserWarning, message='Matplotlib is currently using agg, which is a' ' non-GUI backend, so cannot show the figure.') ././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543178.8566754 scikit-learn-0.23.2/doc/conftest.py0000644000175100001660000000566000000000000017430 0ustar00vstsdocker00000000000000import os from os.path import exists from os.path import join import warnings import numpy as np from sklearn.utils import IS_PYPY from sklearn.utils._testing import SkipTest from sklearn.utils._testing import check_skip_network from sklearn.datasets import get_data_home from sklearn.datasets._base import _pkl_filepath from sklearn.datasets._twenty_newsgroups import CACHE_NAME def setup_labeled_faces(): data_home = get_data_home() if not exists(join(data_home, 'lfw_home')): raise SkipTest("Skipping dataset loading doctests") def setup_rcv1(): check_skip_network() # skip the test in rcv1.rst if the dataset is not already loaded rcv1_dir = join(get_data_home(), "RCV1") if not exists(rcv1_dir): raise SkipTest("Download RCV1 dataset to run this test.") def setup_twenty_newsgroups(): data_home = get_data_home() cache_path = _pkl_filepath(get_data_home(), CACHE_NAME) if not exists(cache_path): raise SkipTest("Skipping dataset loading doctests") def setup_working_with_text_data(): if IS_PYPY and os.environ.get('CI', None): raise SkipTest('Skipping too slow test with PyPy on CI') check_skip_network() cache_path = _pkl_filepath(get_data_home(), CACHE_NAME) if not exists(cache_path): raise SkipTest("Skipping dataset loading 
doctests") def setup_compose(): try: import pandas # noqa except ImportError: raise SkipTest("Skipping compose.rst, pandas not installed") def setup_impute(): try: import pandas # noqa except ImportError: raise SkipTest("Skipping impute.rst, pandas not installed") def setup_unsupervised_learning(): try: import skimage # noqa except ImportError: raise SkipTest("Skipping unsupervised_learning.rst, scikit-image " "not installed") # ignore deprecation warnings from scipy.misc.face warnings.filterwarnings('ignore', 'The binary mode of fromstring', DeprecationWarning) def pytest_runtest_setup(item): fname = item.fspath.strpath is_index = fname.endswith('datasets/index.rst') if fname.endswith('datasets/labeled_faces.rst') or is_index: setup_labeled_faces() elif fname.endswith('datasets/rcv1.rst') or is_index: setup_rcv1() elif fname.endswith('datasets/twenty_newsgroups.rst') or is_index: setup_twenty_newsgroups() elif fname.endswith('tutorial/text_analytics/working_with_text_data.rst')\ or is_index: setup_working_with_text_data() elif fname.endswith('modules/compose.rst') or is_index: setup_compose() elif IS_PYPY and fname.endswith('modules/feature_extraction.rst'): raise SkipTest('FeatureHasher is not compatible with PyPy') elif fname.endswith('modules/impute.rst'): setup_impute() elif fname.endswith('statistical_inference/unsupervised_learning.rst'): setup_unsupervised_learning() ././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543178.8566754 scikit-learn-0.23.2/doc/contents.rst0000644000175100001660000000062600000000000017615 0ustar00vstsdocker00000000000000.. include:: includes/big_toc_css.rst .. include:: tune_toc.rst .. Places global toc into the sidebar :globalsidebartoc: True ================= Table Of Contents ================= .. Define an order for the Table of Contents: .. toctree:: :maxdepth: 2 preface tutorial/index getting_started user_guide glossary auto_examples/index modules/classes developers/index ././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543178.8566754 scikit-learn-0.23.2/doc/data_transforms.rst0000644000175100001660000000245400000000000021150 0ustar00vstsdocker00000000000000.. include:: includes/big_toc_css.rst .. _data-transforms: Dataset transformations ----------------------- scikit-learn provides a library of transformers, which may clean (see :ref:`preprocessing`), reduce (see :ref:`data_reduction`), expand (see :ref:`kernel_approximation`) or generate (see :ref:`feature_extraction`) feature representations. Like other estimators, these are represented by classes with a ``fit`` method, which learns model parameters (e.g. mean and standard deviation for normalization) from a training set, and a ``transform`` method which applies this transformation model to unseen data. ``fit_transform`` may be more convenient and efficient for modelling and transforming the training data simultaneously. Combining such transformers, either in parallel or series is covered in :ref:`combining_estimators`. :ref:`metrics` covers transforming feature spaces into affinity matrices, while :ref:`preprocessing_targets` considers transformations of the target space (e.g. categorical labels) for use in scikit-learn. .. 
toctree:: :maxdepth: 2 modules/compose modules/feature_extraction modules/preprocessing modules/impute modules/unsupervised_reduction modules/random_projection modules/kernel_approximation modules/metrics modules/preprocessing_targets ././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543193.0047581 scikit-learn-0.23.2/doc/datasets/0000755000175100001660000000000000000000000017032 5ustar00vstsdocker00000000000000././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1596543178.8566754 scikit-learn-0.23.2/doc/datasets/index.rst0000644000175100001660000004512200000000000020677 0ustar00vstsdocker00000000000000.. _datasets: ========================= Dataset loading utilities ========================= .. currentmodule:: sklearn.datasets The ``sklearn.datasets`` package embeds some small toy datasets as introduced in the :ref:`Getting Started ` section. This package also features helpers to fetch larger datasets commonly used by the machine learning community to benchmark algorithms on data that comes from the 'real world'. To evaluate the impact of the scale of the dataset (``n_samples`` and ``n_features``) while controlling the statistical properties of the data (typically the correlation and informativeness of the features), it is also possible to generate synthetic data. General dataset API =================== There are three main kinds of dataset interfaces that can be used to get datasets depending on the desired type of dataset. **The dataset loaders.** They can be used to load small standard datasets, described in the :ref:`toy_datasets` section. **The dataset fetchers.** They can be used to download and load larger datasets, described in the :ref:`real_world_datasets` section. Both loaders and fetchers functions return a :class:`sklearn.utils.Bunch` object holding at least two items: an array of shape ``n_samples`` * ``n_features`` with key ``data`` (except for 20newsgroups) and a numpy array of length ``n_samples``, containing the target values, with key ``target``. The Bunch object is a dictionary that exposes its keys are attributes. For more information about Bunch object, see :class:`sklearn.utils.Bunch`: It's also possible for almost all of these function to constrain the output to be a tuple containing only the data and the target, by setting the ``return_X_y`` parameter to ``True``. The datasets also contain a full description in their ``DESCR`` attribute and some contain ``feature_names`` and ``target_names``. See the dataset descriptions below for details. **The dataset generation functions.** They can be used to generate controlled synthetic datasets, described in the :ref:`sample_generators` section. These functions return a tuple ``(X, y)`` consisting of a ``n_samples`` * ``n_features`` numpy array ``X`` and an array of length ``n_samples`` containing the targets ``y``. In addition, there are also miscellaneous tools to load datasets of other formats or from other locations, described in the :ref:`loading_other_datasets` section. .. _toy_datasets: Toy datasets ============ scikit-learn comes with a few small standard datasets that do not require to download any file from some external website. They can be loaded using the following functions: .. 
autosummary:: :toctree: ../modules/generated/ :template: function.rst load_boston load_iris load_diabetes load_digits load_linnerud load_wine load_breast_cancer These datasets are useful to quickly illustrate the behavior of the various algorithms implemented in scikit-learn. They are however often too small to be representative of real world machine learning tasks. .. include:: ../../sklearn/datasets/descr/boston_house_prices.rst .. include:: ../../sklearn/datasets/descr/iris.rst .. include:: ../../sklearn/datasets/descr/diabetes.rst .. include:: ../../sklearn/datasets/descr/digits.rst .. include:: ../../sklearn/datasets/descr/linnerud.rst .. include:: ../../sklearn/datasets/descr/wine_data.rst .. include:: ../../sklearn/datasets/descr/breast_cancer.rst .. _real_world_datasets: Real world datasets =================== scikit-learn provides tools to load larger datasets, downloading them if necessary. They can be loaded using the following functions: .. autosummary:: :toctree: ../modules/generated/ :template: function.rst fetch_olivetti_faces fetch_20newsgroups fetch_20newsgroups_vectorized fetch_lfw_people fetch_lfw_pairs fetch_covtype fetch_rcv1 fetch_kddcup99 fetch_california_housing .. include:: ../../sklearn/datasets/descr/olivetti_faces.rst .. include:: ../../sklearn/datasets/descr/twenty_newsgroups.rst .. include:: ../../sklearn/datasets/descr/lfw.rst .. include:: ../../sklearn/datasets/descr/covtype.rst .. include:: ../../sklearn/datasets/descr/rcv1.rst .. include:: ../../sklearn/datasets/descr/kddcup99.rst .. include:: ../../sklearn/datasets/descr/california_housing.rst .. _sample_generators: Generated datasets ================== In addition, scikit-learn includes various random sample generators that can be used to build artificial datasets of controlled size and complexity. Generators for classification and clustering -------------------------------------------- These generators produce a matrix of features and corresponding discrete targets. Single label ~~~~~~~~~~~~ Both :func:`make_blobs` and :func:`make_classification` create multiclass datasets by allocating each class one or more normally-distributed clusters of points. :func:`make_blobs` provides greater control regarding the centers and standard deviations of each cluster, and is used to demonstrate clustering. :func:`make_classification` specialises in introducing noise by way of: correlated, redundant and uninformative features; multiple Gaussian clusters per class; and linear transformations of the feature space. :func:`make_gaussian_quantiles` divides a single Gaussian cluster into near-equal-size classes separated by concentric hyperspheres. :func:`make_hastie_10_2` generates a similar binary, 10-dimensional problem. .. image:: ../auto_examples/datasets/images/sphx_glr_plot_random_dataset_001.png :target: ../auto_examples/datasets/plot_random_dataset.html :scale: 50 :align: center :func:`make_circles` and :func:`make_moons` generate 2d binary classification datasets that are challenging to certain algorithms (e.g. centroid-based clustering or linear classification), including optional Gaussian noise. They are useful for visualisation. :func:`make_circles` produces Gaussian data with a spherical decision boundary for binary classification, while :func:`make_moons` produces two interleaving half circles. Multilabel ~~~~~~~~~~ :func:`make_multilabel_classification` generates random samples with multiple labels, reflecting a bag of words drawn from a mixture of topics. 
The number of topics for each document is drawn from a Poisson distribution, and the topics themselves are drawn from a fixed random distribution. Similarly, the number of words is drawn from Poisson, with words drawn from a multinomial, where each topic defines a probability distribution over words. Simplifications with respect to true bag-of-words mixtures include: * Per-topic word distributions are independently drawn, where in reality all would be affected by a sparse base distribution, and would be correlated. * For a document generated from multiple topics, all topics are weighted equally in generating its bag of words. * Documents without labels words at random, rather than from a base distribution. .. image:: ../auto_examples/datasets/images/sphx_glr_plot_random_multilabel_dataset_001.png :target: ../auto_examples/datasets/plot_random_multilabel_dataset.html :scale: 50 :align: center Biclustering ~~~~~~~~~~~~ .. autosummary:: :toctree: ../modules/generated/ :template: function.rst make_biclusters make_checkerboard Generators for regression ------------------------- :func:`make_regression` produces regression targets as an optionally-sparse random linear combination of random features, with noise. Its informative features may be uncorrelated, or low rank (few features account for most of the variance). Other regression generators generate functions deterministically from randomized features. :func:`make_sparse_uncorrelated` produces a target as a linear combination of four features with fixed coefficients. Others encode explicitly non-linear relations: :func:`make_friedman1` is related by polynomial and sine transforms; :func:`make_friedman2` includes feature multiplication and reciprocation; and :func:`make_friedman3` is similar with an arctan transformation on the target. Generators for manifold learning -------------------------------- .. autosummary:: :toctree: ../modules/generated/ :template: function.rst make_s_curve make_swiss_roll Generators for decomposition ---------------------------- .. autosummary:: :toctree: ../modules/generated/ :template: function.rst make_low_rank_matrix make_sparse_coded_signal make_spd_matrix make_sparse_spd_matrix .. _loading_other_datasets: Loading other datasets ====================== .. _sample_images: Sample images ------------- Scikit-learn also embed a couple of sample JPEG images published under Creative Commons license by their authors. Those images can be useful to test algorithms and pipeline on 2D data. .. autosummary:: :toctree: ../modules/generated/ :template: function.rst load_sample_images load_sample_image .. image:: ../auto_examples/cluster/images/sphx_glr_plot_color_quantization_001.png :target: ../auto_examples/cluster/plot_color_quantization.html :scale: 30 :align: right .. warning:: The default coding of images is based on the ``uint8`` dtype to spare memory. Often machine learning algorithms work best if the input is converted to a floating point representation first. Also, if you plan to use ``matplotlib.pyplpt.imshow`` don't forget to scale to the range 0 - 1 as done in the following example. .. topic:: Examples: * :ref:`sphx_glr_auto_examples_cluster_plot_color_quantization.py` .. _libsvm_loader: Datasets in svmlight / libsvm format ------------------------------------ scikit-learn includes utility functions for loading datasets in the svmlight / libsvm format. In this format, each line takes the form ``