pax_global_header00006660000000000000000000000064125442555340014523gustar00rootroot0000000000000052 comment=9664247bbe209dee807d3dff5b8c259236e5a66b seaborn-0.6.0/000077500000000000000000000000001254425553400131575ustar00rootroot00000000000000seaborn-0.6.0/.coveragerc000066400000000000000000000000541254425553400152770ustar00rootroot00000000000000[run] include = seaborn/* omit = *external* seaborn-0.6.0/.gitignore000066400000000000000000000001261254425553400151460ustar00rootroot00000000000000*.pyc *.sw* build/ .ipynb_checkpoints/ dist/ seaborn.egg-info/ .coverage cover/ .idea seaborn-0.6.0/.mailmap000066400000000000000000000002211254425553400145730ustar00rootroot00000000000000Michael Waskom mwaskom Tal Yarkoni Daniel B. Allan seaborn-0.6.0/.travis.yml000066400000000000000000000026651254425553400153010ustar00rootroot00000000000000language: python env: - PYTHON=2.7 DEPS=modern - PYTHON=2.7 DEPS=minimal - PYTHON=3.3 DEPS=modern - PYTHON=3.4 DEPS=modern install: - conda update conda --yes - conda create -n testenv --yes pip python=$PYTHON - conda update conda --yes - source activate testenv - if [ ${PYTHON:0:1} == "2" ]; then conda install --yes imaging; else pip install pillow; fi - conda install --yes --file testing/deps_${DEPS}_${PYTHON}.txt - conda install ipython-notebook=2 --yes #- pip install sphinx numpydoc sphinx_bootstrap_theme runipy #- sudo apt-get install pandoc - pip install pep8==1.5 # Later versions get stricter... - pip install https://github.com/dcramer/pyflakes/tarball/master - pip install . - cp testing/matplotlibrc . before_install: - sudo apt-get update -yq - sudo sh -c "echo ttf-mscorefonts-installer msttcorefonts/accepted-mscorefonts-eula select true | debconf-set-selections" - sudo apt-get install msttcorefonts -qq - wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh - chmod +x miniconda.sh - ./miniconda.sh -b - export PATH=/home/travis/miniconda/bin:$PATH before_script: - if [ ${PYTHON:0:1} == "2" ]; then make lint; fi #- if [ ${TRAVIS_PYTHON_VERSION:0:1} == "2" ]; then # cd doc; # make notebooks html; # cd ..; # fi script: - if [ $DEPS == 'modern' ]; then nosetests --with-doctest; else nosetests; fi seaborn-0.6.0/CONTRIBUTING.md000066400000000000000000000067731254425553400154250ustar00rootroot00000000000000Contributing to seaborn ======================= General support --------------- General support questions are most at home on [StackOverflow](http://stackoverflow.com/), where they will be seen by more people and are more easily searchable. StackOverflow has a `seaborn` tag, which will bring the question to the attention of people who might be able to answer. Reporting bugs -------------- If you think you have encountered a bug in seaborn, please report it on the [Github issue tracker](https://github.com/mwaskom/seaborn/issues/new). It will be most helpful to include a reproducible script with one of the example datasets (accessed through `load_dataset()`) or using some randomly-generated data. It is difficult debug any issues without knowing the versions of seaborn and matplotlib you are using, as well as what matplotlib backend you are using to draw the plots, so please include those in your bug report. Fixing bugs ----------- If you know how to fix a bug you have encountered or see on the issue tracker, that is very appreciated. Please submit a [pull request](https://help.github.com/articles/using-pull-requests/) on the main seaborn repository with the fix. The presence of a bug implies a lack of coverage in the tests, so when fixing a bug, it is best to add a test that fails before the fix and passes after to make sure it does not reappear. See the section on testing below. But if there is an obvious fix and you're not sure how to write a test, don't let that stop you. Documentation issues -------------------- If you see something wrong or confusing in the documentation, please report it with an issue or fix it and open a pull request. New features ------------ If you'd like to add a new feature to seaborn, it's best to open an issue to discuss it first. Given the nature of seaborn's goals and approach, it can be hard to write a substantial contribution that is consistent with the rest of the package, and I often lack the bandwidth to help. Also, every new feature represents a new commitment for support. For these reasons, I'm somewhat averse to large feature contributions. Smaller or well-targeted enhancements can be helpful and should be submitted through the normal pull-request workflow. Please include tests for any new features and make sure your changes don't break any existing tests. Testing seaborn --------------- Seaborn is primarily tested through a `nose` unit-test suite that interacts with the private objects that actually draw the plots behind the function interface. The basic approach here is to test the numeric information going into and coming out of the matplotlib functions. Currently, there is a general assumption that matplotlib is drawing things properly, and tests are run against the data that ends up in the matplotlib objects but not against the images themselves. See the existing tests for examples of how this works. To execute the test suite and doctests, run `make test` in the root source directory. You can also build a test coverage report with `make coverage`. The `make lint` command will run `pep8` and `pyflakes` over the codebase to check for style issues. Doing so requires [this](https://github.com/dcramer/pyflakes) fork of pyflakes, which can be installed with `pip install https://github.com/dcramer/pyflakes/tarball/master`. Is also currently requires `pep8` 1.5 or older, as the rules got stricter and the codebase has not been updated. This is part of the Travis build, and the build will fail if there are issues, so please do this before submitting a pull request. seaborn-0.6.0/LICENSE000066400000000000000000000027321254425553400141700ustar00rootroot00000000000000Copyright (c) 2012-2013, Michael L. Waskom All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the {organization} nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. seaborn-0.6.0/MANIFEST.in000066400000000000000000000001271254425553400147150ustar00rootroot00000000000000include README.md include CONTRIBUTING.md include LICENSE recursive-include licences * seaborn-0.6.0/Makefile000066400000000000000000000005261254425553400146220ustar00rootroot00000000000000export SHELL := /bin/bash test: cp testing/matplotlibrc . nosetests --with-doctest rm matplotlibrc coverage: cp testing/matplotlibrc . nosetests --cover-erase --with-coverage --cover-html --cover-package seaborn rm matplotlibrc lint: pyflakes -x W seaborn pep8 --exclude external seaborn hexstrip: make -C examples hexstrip seaborn-0.6.0/README.md000066400000000000000000000063041254425553400144410ustar00rootroot00000000000000Seaborn: statistical data visualization ======================================= Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics. Documentation ------------- Online documentation is available [here](http://stanford.edu/~mwaskom/software/seaborn/). It includes a high-level tutorial, detailed API documentation, and other useful info. There are docs for the development version [here](http://stanford.edu/~mwaskom/software/seaborn-dev/). These should more or less correspond with the github master branch, but they're not built automatically and thus may fall out of sync at times. Examples -------- The documentation has an [example gallery](http://stanford.edu/~mwaskom/software/seaborn/examples/index.html) with short scripts showing how to use different parts of the package. Citing ------ Seaborn can be cited using a DOI provided through Zenodo: [![DOI](https://zenodo.org/badge/doi/10.5072/zenodo.12710.png)](http://dx.doi.org/10.5072/zenodo.12710) Dependencies ------------ - Python 2.7 or 3.3+ ### Mandatory - [numpy](http://www.numpy.org/) - [scipy](http://www.scipy.org/) - [matplotlib](http://matplotlib.sourceforge.net) - [pandas](http://pandas.pydata.org/) ### Recommended - [statsmodels](http://statsmodels.sourceforge.net/) Installation ------------ To install the released version, just do pip install seaborn You may instead want to use the development version from Github, by running pip install git+git://github.com/mwaskom/seaborn.git#egg=seaborn Testing ------- [![Build Status](https://travis-ci.org/mwaskom/seaborn.png?branch=master)](https://travis-ci.org/mwaskom/seaborn) To test seaborn, run `make test` in the source directory. This will run the unit-test and doctest suite (using `nose`). Development ----------- https://github.com/mwaskom/seaborn Please [submit](https://github.com/mwaskom/seaborn/issues/new) any bugs you encounter to the Github issue tracker. License ------- Released under a BSD (3-clause) license Celebrity Endorsements ---------------------- "Those are nice plots" -Hadley Wickham seaborn-0.6.0/devel_requirements_py2.txt000066400000000000000000000001021254425553400204050ustar00rootroot00000000000000ipython>=1.0 sphinx>=1.2 sphinx_bootstrap_theme numpydoc PIL nose seaborn-0.6.0/devel_requirements_py3.txt000066400000000000000000000001051254425553400204110ustar00rootroot00000000000000ipython>=1.0 sphinx>=1.2 sphinx_bootstrap_theme numpydoc pillow nose seaborn-0.6.0/doc/000077500000000000000000000000001254425553400137245ustar00rootroot00000000000000seaborn-0.6.0/doc/.gitignore000066400000000000000000000003401254425553400157110ustar00rootroot00000000000000*_files/ _build/ generated/ examples/ example_thumbs/ aesthetics.rst color_palettes.rst distributions.rst regression.rst categorical.rst plotting_distributions.rst dataset_exploration.rst timeseries_plots.rst axis_grids.rst seaborn-0.6.0/doc/Makefile000066400000000000000000000135641254425553400153750ustar00rootroot00000000000000# Makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build PAPER = BUILDDIR = _build # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext help: @echo "Please use \`make ' where is one of" @echo " html to make standalone HTML files" @echo " dirhtml to make HTML files named index.html in directories" @echo " singlehtml to make a single large HTML file" @echo " pickle to make pickle files" @echo " json to make JSON files" @echo " htmlhelp to make HTML files and a HTML help project" @echo " qthelp to make HTML files and a qthelp project" @echo " devhelp to make HTML files and a Devhelp project" @echo " epub to make an epub" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" @echo " latexpdf to make LaTeX files and run them through pdflatex" @echo " text to make text files" @echo " man to make manual pages" @echo " texinfo to make Texinfo files" @echo " info to make Texinfo files and run them through makeinfo" @echo " gettext to make PO message catalogs" @echo " changes to make an overview of all changed/added/deprecated items" @echo " linkcheck to check all external links for integrity" @echo " doctest to run all doctests embedded in the documentation (if enabled)" clean: -rm -rf $(BUILDDIR)/* -rm -rf examples/* -rm -rf example_thumbs/* -rm -rf tutorial/*_files/ -rm -rf tutorial/*.rst -rm -rf generated/* notebooks: make -C tutorial notebooks html: $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." singlehtml: $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml @echo @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." pickle: $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle @echo @echo "Build finished; now you can process the pickle files." json: $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json @echo @echo "Build finished; now you can process the JSON files." htmlhelp: $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp @echo @echo "Build finished; now you can run HTML Help Workshop with the" \ ".hhp project file in $(BUILDDIR)/htmlhelp." qthelp: $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp @echo @echo "Build finished; now you can run "qcollectiongenerator" with the" \ ".qhcp project file in $(BUILDDIR)/qthelp, like this:" @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/lyman.qhcp" @echo "To view the help file:" @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/lyman.qhc" devhelp: $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp @echo @echo "Build finished." @echo "To view the help file:" @echo "# mkdir -p $$HOME/.local/share/devhelp/lyman" @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/lyman" @echo "# devhelp" epub: $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub @echo @echo "Build finished. The epub file is in $(BUILDDIR)/epub." latex: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." @echo "Run \`make' in that directory to run these through (pdf)latex" \ "(use \`make latexpdf' here to do that automatically)." latexpdf: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through pdflatex..." $(MAKE) -C $(BUILDDIR)/latex all-pdf @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." text: $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text @echo @echo "Build finished. The text files are in $(BUILDDIR)/text." man: $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man @echo @echo "Build finished. The manual pages are in $(BUILDDIR)/man." texinfo: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." @echo "Run \`make' in that directory to run these through makeinfo" \ "(use \`make info' here to do that automatically)." info: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo "Running Texinfo files through makeinfo..." make -C $(BUILDDIR)/texinfo info @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." gettext: $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale @echo @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." changes: $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes @echo @echo "The overview file is in $(BUILDDIR)/changes." linkcheck: $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck @echo @echo "Link check complete; look for any errors in the above output " \ "or in $(BUILDDIR)/linkcheck/output.txt." doctest: $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest @echo "Testing of doctests in the sources finished, look at the " \ "results in $(BUILDDIR)/doctest/output.txt." upload: rsync -azP $(BUILDDIR)/html/ mwaskom@cardinal.stanford.edu:WWW/software/seaborn @echo "Uploaded to Stanford webspace" upload-dev: rsync -azP $(BUILDDIR)/html/ mwaskom@cardinal.stanford.edu:WWW/software/seaborn-dev @echo "Uploaded to Stanford webspace (development page)" seaborn-0.6.0/doc/_static/000077500000000000000000000000001254425553400153525ustar00rootroot00000000000000seaborn-0.6.0/doc/_static/copybutton.js000066400000000000000000000051361254425553400201230ustar00rootroot00000000000000// originally taken from scikit-learn's Sphinx theme $(document).ready(function() { /* Add a [>>>] button on the top-right corner of code samples to hide * the >>> and ... prompts and the output and thus make the code * copyable. * Note: This JS snippet was taken from the official python.org * documentation site.*/ var div = $('.highlight-python .highlight,' + '.highlight-python3 .highlight,' + '.highlight-pycon .highlight') var pre = div.find('pre'); // get the styles from the current theme pre.parent().parent().css('position', 'relative'); var hide_text = 'Hide the prompts and output'; var show_text = 'Show the prompts and output'; var border_width = pre.css('border-top-width'); var border_style = pre.css('border-top-style'); var border_color = pre.css('border-top-color'); var button_styles = { 'cursor':'pointer', 'position': 'absolute', 'top': '0', 'right': '0', 'border-color': border_color, 'border-style': border_style, 'border-width': border_width, 'color': border_color, 'text-size': '75%', 'font-family': 'monospace', 'padding-left': '0.2em', 'padding-right': '0.2em' } // create and add the button to all the code blocks that contain >>> div.each(function(index) { var jthis = $(this); if (jthis.find('.gp').length > 0) { var button = $('>>>'); button.css(button_styles) button.attr('title', hide_text); jthis.prepend(button); } // tracebacks (.gt) contain bare text elements that need to be // wrapped in a span to work with .nextUntil() (see later) jthis.find('pre:has(.gt)').contents().filter(function() { return ((this.nodeType == 3) && (this.data.trim().length > 0)); }).wrap(''); }); // define the behavior of the button when it's clicked $('.copybutton').toggle( function() { var button = $(this); button.parent().find('.go, .gp, .gt').hide(); button.next('pre').find('.gt').nextUntil('.gp, .go').css('visibility', 'hidden'); button.css('text-decoration', 'line-through'); button.attr('title', show_text); }, function() { var button = $(this); button.parent().find('.go, .gp, .gt').show(); button.next('pre').find('.gt').nextUntil('.gp, .go').css('visibility', 'visible'); button.css('text-decoration', 'none'); button.attr('title', hide_text); }); }); seaborn-0.6.0/doc/_static/style.css000066400000000000000000000007421254425553400172270ustar00rootroot00000000000000 blockquote p { font-size: 14px !important; } blockquote { margin: 0 0 4px !important; } code { color: #49759c !important; background-color: #f3f5f9 !important; } .alert-info { background-color: #adb8cb !important; border-color: #adb8cb !important; color: #2c3e50 !important; } .headerlink:before { display: block; context: " "; height: 200px; /*same height as header*/ margin-top: -200px; /*same height as header*/ visibility: hidden; } seaborn-0.6.0/doc/api.rst000066400000000000000000000036271254425553400152370ustar00rootroot00000000000000.. _api_ref: .. currentmodule:: seaborn API reference ============= .. _distribution_api: Distribution plots ------------------ .. autosummary:: :toctree: generated/ jointplot pairplot distplot kdeplot rugplot .. _regression_api: Regression plots ---------------- .. autosummary:: :toctree: generated/ lmplot regplot residplot interactplot coefplot .. _categorical_api: Categorical plots ----------------- .. autosummary:: :toctree: generated/ factorplot boxplot violinplot stripplot pointplot barplot countplot .. _matrix_api: Matrix plots ------------ .. autosummary:: :toctree: generated/ heatmap clustermap Timeseries plots ---------------- .. autosummary:: :toctree: generated/ tsplot Miscellaneous plots ------------------- .. autosummary:: :toctree: generated/ palplot .. _grid_api: Axis grids ---------- .. autosummary:: :toctree: generated/ FacetGrid PairGrid JointGrid .. _style_api: Style frontend -------------- .. autosummary:: :toctree: generated/ set axes_style set_style plotting_context set_context set_color_codes reset_defaults reset_orig .. _palette_api: Color palettes -------------- .. autosummary:: :toctree: generated/ set_palette color_palette husl_palette hls_palette cubehelix_palette dark_palette light_palette diverging_palette blend_palette xkcd_palette crayon_palette mpl_palette Palette widgets --------------- .. autosummary:: :toctree: generated/ choose_colorbrewer_palette choose_cubehelix_palette choose_light_palette choose_dark_palette choose_diverging_palette Utility functions ----------------- .. autosummary:: :toctree: generated/ despine desaturate saturate set_hls_values ci_to_errsize axlabel seaborn-0.6.0/doc/conf.py000066400000000000000000000216751254425553400152360ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # seaborn documentation build configuration file, created by # sphinx-quickstart on Mon Jul 29 23:25:46 2013. # # This file is execfile()d with the current directory set to its containing dir. # # Note that not all possible configuration values are present in this # autogenerated file. # # All configuration values have a default; values that are commented out # serve to show the default. import sys, os import sphinx_bootstrap_theme import matplotlib as mpl mpl.use("Agg") # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. #sys.path.insert(0, os.path.abspath('.')) # -- General configuration ----------------------------------------------------- # If your documentation needs a minimal Sphinx version, state it here. #needs_sphinx = '1.0' # Add any Sphinx extension module names here, as strings. They can be extensions # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. sys.path.insert(0, os.path.abspath('sphinxext')) extensions = ['sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.coverage', 'sphinx.ext.mathjax', 'sphinx.ext.autosummary', 'plot_generator', 'plot_directive', 'numpydoc', 'ipython_directive', 'ipython_console_highlighting', ] # Generate the API documentation when building autosummary_generate = True numpydoc_show_class_members = False # Include the example source for plots in API docs plot_include_source = True plot_formats = [("png", 90)] plot_html_show_formats = False plot_html_show_source_link = False # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. source_suffix = '.rst' # The encoding of source files. #source_encoding = 'utf-8-sig' # The master toctree document. master_doc = 'index' # General information about the project. project = u'seaborn' copyright = u'2012-2015, Michael Waskom' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. sys.path.insert(0, os.path.abspath(os.path.pardir)) import seaborn version = seaborn.__version__ # The full version, including alpha/beta/rc tags. release = seaborn.__version__ # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. #language = None # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: #today = '' # Else, today_fmt is used as the format for a strftime call. #today_fmt = '%B %d, %Y' # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. exclude_patterns = ['_build'] # The reST default role (used for this markup: `text`) to use for all documents. #default_role = None # If true, '()' will be appended to :func: etc. cross-reference text. #add_function_parentheses = True # If true, the current module name will be prepended to all description # unit titles (such as .. function::). #add_module_names = True # If true, sectionauthor and moduleauthor directives will be shown in the # output. They are ignored by default. #show_authors = False # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # A list of ignored prefixes for module index sorting. #modindex_common_prefix = [] # -- Options for HTML output --------------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. html_theme = 'bootstrap' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. html_theme_options = { 'source_link_position': "footer", 'bootswatch_theme': "flatly", 'navbar_sidebarrel': False, 'bootstrap_version': "3", 'navbar_links': [("API", "api"), ("Tutorial", "tutorial"), ("Gallery", "examples/index")], } # Add any paths that contain custom themes here, relative to this directory. html_theme_path = sphinx_bootstrap_theme.get_html_theme_path() # The name for this set of Sphinx documents. If None, it defaults to # " v documentation". #html_title = None # A shorter title for the navigation bar. Default is the same as html_title. #html_short_title = None # The name of an image file (relative to this directory) to place at the top # of the sidebar. #html_logo = None # The name of an image file (within the static path) to use as favicon of the # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # pixels large. #html_favicon = None # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static', 'example_thumbs'] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. #html_last_updated_fmt = '%b %d, %Y' # If true, SmartyPants will be used to convert quotes and dashes to # typographically correct entities. #html_use_smartypants = True # Custom sidebar templates, maps document names to template names. #html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to # template names. #html_additional_pages = {} # If false, no module index is generated. #html_domain_indices = True # If false, no index is generated. #html_use_index = True # If true, the index is split into individual pages for each letter. #html_split_index = False # If true, links to the reST sources are added to the pages. #html_show_sourcelink = True # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. #html_show_sphinx = True # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. #html_show_copyright = True # If true, an OpenSearch description file will be output, and all pages will # contain a tag referring to it. The value of this option must be the # base URL from which the finished HTML is served. #html_use_opensearch = '' # This is the file name suffix for HTML files (e.g. ".xhtml"). #html_file_suffix = None # Output file base name for HTML help builder. htmlhelp_basename = 'seaborndoc' # -- Options for LaTeX output -------------------------------------------------- latex_elements = { # The paper size ('letterpaper' or 'a4paper'). #'papersize': 'letterpaper', # The font size ('10pt', '11pt' or '12pt'). #'pointsize': '10pt', # Additional stuff for the LaTeX preamble. #'preamble': '', } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, author, documentclass [howto/manual]). latex_documents = [ ('index', 'seaborn.tex', u'seaborn Documentation', u'Michael Waskom', 'manual'), ] # The name of an image file (relative to this directory) to place at the top of # the title page. #latex_logo = None # For "manual" documents, if this is true, then toplevel headings are parts, # not chapters. #latex_use_parts = False # If true, show page references after internal links. #latex_show_pagerefs = False # If true, show URL addresses after external links. #latex_show_urls = False # Documents to append as an appendix to all manuals. #latex_appendices = [] # If false, no module index is generated. #latex_domain_indices = True # -- Options for manual page output -------------------------------------------- # One entry per manual page. List of tuples # (source start file, name, description, authors, manual section). man_pages = [ ('index', 'seaborn', u'seaborn Documentation', [u'Michael Waskom'], 1) ] # If true, show URL addresses after external links. #man_show_urls = False # -- Options for Texinfo output ------------------------------------------------ # Grouping the document tree into Texinfo files. List of tuples # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ ('index', 'seaborn', u'seaborn Documentation', u'Michael Waskom', 'seaborn', 'One line description of project.', 'Miscellaneous'), ] # Documents to append as an appendix to all manuals. #texinfo_appendices = [] # If false, no module index is generated. #texinfo_domain_indices = True # How to display URL addresses: 'footnote', 'no', or 'inline'. #texinfo_show_urls = 'footnote' # Add the 'copybutton' javascript, to hide/show the prompt in code # examples, originally taken from scikit-learn's doc/conf.py def setup(app): app.add_javascript('copybutton.js') app.add_stylesheet('style.css') seaborn-0.6.0/doc/index.rst000066400000000000000000000065161254425553400155750ustar00rootroot00000000000000.. raw:: html Seaborn: statistical data visualization ======================================= .. raw:: html


Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics. For a brief introduction to the ideas behind the package, you can read the :ref:`introductory notes `. More practical information is on the :ref:`installation page `. You may also want to browse the :ref:`example gallery ` to get a sense for what you can do with seaborn and then check out the :ref:`tutorial ` and :ref:`API reference ` to find out how. To see the code or report a bug, please visit the `github repository `_. General support issues are most at home on `stackoverflow `_, where there is a seaborn tag. .. raw:: html

Documentation

.. toctree:: :maxdepth: 1 introduction whatsnew installing examples/index api tutorial .. raw:: html

Features

* Style functions: :ref:`API ` | :ref:`Tutorial ` * Color palettes: :ref:`API ` | :ref:`Tutorial ` * Distribution plots: :ref:`API ` | :ref:`Tutorial ` * Regression plots: :ref:`API ` | :ref:`Tutorial ` * Categorical plots: :ref:`API ` | :ref:`Tutorial ` * Axis grid objects: :ref:`API ` | :ref:`Tutorial ` .. raw:: html
seaborn-0.6.0/doc/installing.rst000066400000000000000000000111411254425553400166200ustar00rootroot00000000000000.. _installing: Installing and getting started ------------------------------ To install the released version of seaborn, you can use ``pip`` (i.e. ``pip install seaborn``). It's also possible to install the released version using ``conda`` (i.e. ``conda install seaborn``), although this may lag behind the version availible from PyPI. Alternatively, you can use ``pip`` to install the development version, with the command ``pip install git+git://github.com/mwaskom/seaborn.git#egg=seaborn``. Another option would be to to clone the `github repository `_ and install with ``pip install .`` from the source directory. Seaborn itself is pure Python, so installation should be reasonably straightforward. When using the development version, you may want to refer to the `development docs `_. Note that these are not built automatically and may at times fall out of sync with the actual master branch on github. Dependencies ~~~~~~~~~~~~ - Python 2.7 or 3.3+ Mandatory dependencies ^^^^^^^^^^^^^^^^^^^^^^ - `numpy `__ - `scipy `__ - `matplotlib `__ - `pandas `__ Recommended dependencies ^^^^^^^^^^^^^^^^^^^^^^^^ - `statsmodels `__ The `pip` installation script will attempt to download the mandatory dependencies if they do not exist at install-time. I recommend using seaborn with the `Anaconda distribution `_, as this makes it easy to manage the main dependencies, which otherwise can be difficult to install. I attempt to keep seaborn importable and generally functional on the versions available through the stable Debian channels. There may be cases where some more advanced features only work with newer versions of these dependencies, although these should be relatively rare. There are also some known bugs on older versions of matplotlib, so you should in general try to use a modern version. For many use cases, though, older matplotlibs will work fine. Seaborn is tested on the most recent versions offered through ``conda``. Importing seaborn ~~~~~~~~~~~~~~~~~ Seaborn will apply its default style parameters to the global matplotlib style dictionary when you import it. This will change the look of all plots, including those created by using matplotlib functions directly. To avoid this behavior and use the default matplotlib aesthetics (along with any customization in your ``matplotlibrc``), you can import the ``seaborn.apionly`` namespace. Seaborn has several other pre-packaged styles along with high-level :ref:`tools ` for managing them, so you should not limit yourself to the default aesthetics. By convention, ``seaborn`` is abbreviated to ``sns`` on import. Testing ~~~~~~~ To test seaborn, run ``make test`` in the root directory of the source distribution. This runs the unit test suite (which can also be exercised separately by running ``nosetests``). It also runs the code in the example notebooks to smoke-test a broader and more realistic range of example usage. The full set of tests requires an internet connection to download the example datasets, but the unit tests should be able to run offline. Bugs ~~~~ Please report any bugs you encounter through the github `issue tracker `_. It will be most helpful to include a reproducible example on one of the example datasets (accessed through :func:`load_dataset`). It is difficult debug any issues without knowing the versions of seaborn and matplotlib you are using, as well as what matplotlib backend you are using to draw the plots, so please include those in your bug report. Known issues ~~~~~~~~~~~~ There is a `bug `_ in the matplotlib OSX backend that causes unavoidable problems with some of the seaborn functions (particularly those that draw multi-panel figures). If you encounter this, you will want to try a `different backend `_. In particular, this bug affects any multi-panel figure that internally calls the matplotlib ``tight_layout`` function. An unfortunate consequence of how the matplotlib marker styles work is that line-art markers (e.g. ``"+"``) or markers with ``facecolor`` set to ``"none"`` will be invisible when the default seaborn style is in effect. This can be changed by using a different ``markeredgewidth`` (aliased to ``mew``) either in the function call or globally in the `rcParams`. seaborn-0.6.0/doc/introduction.rst000066400000000000000000000071351254425553400172050ustar00rootroot00000000000000.. _introduction: An introduction to seaborn ========================== Seaborn is a library for making attractive and informative statistical graphics in Python. It is built on top of `matplotlib `_ and tightly integrated with the `PyData `_ stack, including support for `numpy `_ and `pandas `_ data structures and statistical routines from `scipy `_ and `statsmodels `_. Some of the features that seaborn offers are - Several :ref:`built-in themes ` that improve on the default matplotlib aesthetics - Tools for choosing :ref:`color palettes ` to make beautiful plots that reveal patterns in your data - Functions for visualizing :ref:`univariate ` and :ref:`bivariate ` distributions or for :ref:`comparing ` them between subsets of data - Tools that fit and visualize :ref:`linear regression ` models for different kinds of :ref:`independent ` and :ref:`dependent ` variables - Functions that visualize :ref:`matrices of data ` and use clustering algorithms to :ref:`discover structure ` in those matrices - A function to plot :ref:`statistical timeseries ` data with flexible estimation and :ref:`representation ` of uncertainty around the estimate - High-level abstractions for structuring :ref:`grids of plots ` that let you easily build :ref:`complex ` visualizations Seaborn aims to make visualization a central part of exploring and understanding data. The plotting functions operate on dataframes and arrays containing a whole dataset and internally perform the necessary aggregation and statistical model-fitting to produce informative plots. If matplotlib "tries to make easy things easy and hard things possible", seaborn tries to make a well-defined set of hard things easy too. The plotting functions try to do something useful when called with a minimal set of arguments, and they expose a number of customizable options through additional parameters. Some of the functions plot directly into a matplotlib axes object, while others operate on an entire figure and produce plots with several panels. In the latter case, the plot is drawn using a Grid object that links the structure of the figure to the structure of the dataset in an abstract way. Because seaborn uses matplotlib, the graphics can be further tweaked using matplotlib tools and rendered with any of the matplotlib backends to generate publication-quality figures. Seaborn can also be used to target web-based graphics through the `mpld3 `_ and `Bokeh `_ libraries. Seaborn should be thought of as a complement to matplotlib, not a replacement for it. When using seaborn, it is likely that you will often invoke matplotlib functions directly to draw simpler plots already available through the ``pyplot`` namespace. Further, while the seaborn functions aim to make plots that are reasonably "production ready" (including extracting semantic information from Pandas objects to add informative labels), full customization of the figures will require a sophisticated understanding of matplotlib objects. For more detailed information and copious examples of the syntax and resulting plots, you can check out the :ref:`example gallery `, :ref:`tutorial ` or :ref:`API reference `. seaborn-0.6.0/doc/releases/000077500000000000000000000000001254425553400155275ustar00rootroot00000000000000seaborn-0.6.0/doc/releases/v0.2.0.txt000066400000000000000000000132411254425553400171140ustar00rootroot00000000000000 v0.2.0 (December 2013) ---------------------- This is a major release from 0.1 with a number of API changes, enhancements, and bug fixes. Highlights include an overhaul of timeseries plotting to work intelligently with dataframes, the new function ``interactplot()`` for visualizing continuous interactions, bivariate kernel density estimates in ``kdeplot()``, and significant improvements to color palette handling. Version 0.2 also introduces experimental support for Python 3. In addition to the library enhancements, the documentation has been substantially rewritten to reflect the new features and improve the presentation of the ideas behind the package. API changes ~~~~~~~~~~~ - The ``tsplot()`` function was rewritten to accept data in a long-form ``DataFrame`` and to plot different traces by condition. This introduced a relatively minor but unavoidable API change, where instead of doing ``sns.tsplot(time, heights)``, you now must do ``sns.tsplot(heights, time=time)`` (the ``time`` parameter is now optional, for quicker specification of simple plots). Additionally, the ``"obs_traces"`` and ``"obs_points"`` error styles in ``tsplot()`` have been renamed to ``"unit_traces"`` and ``"unit_points"``, respectively. - Functions that fit kernel density estimates (``kdeplot()`` and ``violinplot()``) now use ``statsmodels`` instead of ``scipy``, and the parameters that influence the density estimate have changed accordingly. This allows for increased flexibility in specifying the bandwidth and kernel, and smarter choices for defining the range of the support. Default options should produce plots that are very close to the old defaults. - The ``kdeplot()`` function now takes a second positional argument of data for drawing bivariate densities. - The ``violin()`` function has been changed to ``violinplot()``, for consistency. In 0.2, ``violin`` will still work, but it will fire a ``UserWarning``. New plotting functions ~~~~~~~~~~~~~~~~~~~~~~ - The ``interactplot()`` function draws a contour plot for an interactive linear model (i.e., the contour shows ``y-hat`` from the model ``y ~ x1 * x2``) over a scatterplot between the two predictor variables. This plot should aid the understanding of an interaction between two continuous variables. - The ``kdeplot()`` function can now draw a bivariate density estimate as a contour plot if provided with two-dimensional input data. - The ``palplot()`` function provides a simple grid-based visualization of a color palette. Other changes ~~~~~~~~~~~~~ Plotting functions ^^^^^^^^^^^^^^^^^^ - The ``corrplot()`` function can be drawn without the correlation coefficient annotation and with variable names on the side of the plot to work with large datasets. - Additionally, ``corrplot()`` sets the color palette intelligently based on the direction of the specified test. - The ``distplot()`` histogram uses a reference rule to choose the bin size if it is not provided. - Added the ``x_bins`` option in ``lmplot()`` for binning a continuous predictor variable, allowing for clearer trends with many datapoints. - Enhanced support for labeling plot elements and axes based on ``name`` attributes in several distribution plot functions and ``tsplot()`` for smarter Pandas integration. - Scatter points in ``lmplot()`` are slightly transparent so it is easy to see where observations overlap. - Added the ``order`` parameter to ``boxplot()`` and ``violinplot()`` to control the order of the bins when using a Pandas object. - When an ``ax`` argument is not provided to a plotting function, it grabs the currently active axis instead of drawing a new one. Color palettes ^^^^^^^^^^^^^^ - Added the ``dark_palette()`` and ``blend_palette()`` for on-the-fly creation of blended color palettes. - The color palette machinery is now intelligent about qualitative ColorBrewer palettes (``Set1``, ``Paired``, etc.), which are properly treated as discrete. - Seaborn color palettes (``deep``, ``muted``, etc.) have been standardized in terms of basic hue sequence, and all palettes now have 6 colors. - Introduced ``{mpl_palette}_d`` palettes, which make a palette with the basic color scheme of the source palette, but with a sequential blend from dark instead of light colors for use with line/scatter/contour plots. - Added the ``palette_context()`` function for blockwise color palettes controlled by a ``with`` statement. Plot styling ^^^^^^^^^^^^ - Added the ``despine()`` function for easily removing plot spines. - A new plot style, ``"ticks"`` has been added. - Tick labels are padded a bit farther from the axis in all styles, avoiding collisions at (0, 0). General package issues ^^^^^^^^^^^^^^^^^^^^^^ - Reorganized the package by breaking up the monolithic ``plotobjs`` module into smaller modules grouped by general objective of the constituent plots. - Removed the ``scikits-learn`` dependency in ``moss``. - Installing with ``pip`` should automatically install most missing dependencies. - The example notebooks are now used as an automated test suite. Bug fixes ~~~~~~~~~ - Fixed a bug where labels did not match data for ``boxplot()`` and ``violinplot()`` when using a groupby. - Fixed a bug in the ``desaturate()`` function. - Fixed a bug in the ``coefplot()`` figure size calculation. - Fixed a bug where ``regplot()`` choked on list input. - Fixed buggy behavior when drawing horizontal boxplots. - Specifying bins for the ``distplot()`` histogram now works. - Fixed a bug where ``kdeplot()`` would reset the axis height and cut off existing data. - All axis styling has been moved out of the top-level ``seaborn.set()`` function, so context or color palette can be cleanly changed. seaborn-0.6.0/doc/releases/v0.2.1.txt000066400000000000000000000014751254425553400171230ustar00rootroot00000000000000 v0.2.1 (December 2013) ---------------------- This is a bugfix release, with no new features. Bug fixes ~~~~~~~~~ - Changed the mechanics of ``violinplot()`` and ``boxplot()`` when using a ``Series`` object as data and performing a ``groupby`` to assign data to bins to address a problem that arises in Pandas 0.13. - Additionally fixed the ``groupby`` code to work with all styles of group specification (specifically, using a dictionary or a function now works). - Fixed a bug where artifacts from the kde fitting could undershoot and create a plot where the density axis starts below 0. - Ensured that data used for kde fitting is double-typed to avoid a low-level statsmodels error. - Changed the implementation of the histogram bin-width reference rule to take a ceiling of the estimated number of bins. seaborn-0.6.0/doc/releases/v0.3.0.txt000066400000000000000000000147631254425553400171270ustar00rootroot00000000000000 v0.3.0 (March 2014) ------------------- This is a major release from 0.2 with a number of enhancements to the plotting capabilities and styles. Highlights include :class:`FacetGrid`, :func:`factorplot`, :func:`jointplot`, and an overhaul to :ref:`style management `. There is also lots of new documentation, including an :ref:`example gallery ` and reorganized :ref:`tutorial `. New plotting functions ~~~~~~~~~~~~~~~~~~~~~~ - The :class:`FacetGrid` class adds a new form of functionality to seaborn, providing a way to abstractly structure a grid of plots corresponding to subsets of a dataset. It can be used with a wide variety of plotting functions (including most of the matplotlib and seaborn APIs. See the :ref:`tutorial ` for more information. - Version 0.3 introduces the :func:`factorplot` function, which is similar in spirit to :func:`lmplot` but intended for use when the main independent variable is categorical instead of quantitative. :func:`factorplot` can draw a plot in either a point or bar representation using the corresponding Axes-level functions :func:`pointplot` and :func:`barplot` (which are also new). Additionally, the :func:`factorplot` function can be used to draw box plots on a faceted grid. For examples of how to use these functions, you can refer to the :ref:`tutorial `. - Another new function is :func:`jointplot`, which is built using the new :class:`JointGrid` object. :func:`jointplot` generalizes the behavior of :func:`regplot` in previous versions of seaborn (:func:`regplot` has changed somewhat in 0.3; see below for details) by drawing a bivariate plot of the relationship between two variables with their marginal distributions drawn on the side of the plot. With :func:`jointplot`, you can draw a scatterplot or :ref:`regression plot ` as before, but you can now also draw bivariate kernel densities or hexbin plots with appropriate univariate graphs for the marginal distributions. Additionally, it's easy to use :class:`JointGrid` directly to build up more complex plots when the default methods offered by :func:`jointplot` are not suitable for your visualization problem. The :ref:`tutorial ` for :class:`JointGrid` has more examples of how this object can be useful. - The :func:`residplot` function complements :func:`regplot` and can be quickly used to diagnose problems with a linear model by calculating and plotting the residuals of a simple regression. There is also a ``"resid"`` kind for :func:`jointplot`. API changes ~~~~~~~~~~~ - The most noticeable change will be that :func:`regplot` no longer produces a multi-component plot with distributions in marginal axes. Instead. :func:`regplot` is now an "Axes-level" function that can be plotted into any existing figure on a specific set of axes. :func:`regplot` and :func:`lmplot` have also been unified (the latter uses the former behind the scenes), so all options for how to fit and represent the regression model can be used for both functions. To :ref:`get the old behavior ` of :func:`regplot`, use :func:`jointplot` with ``kind="reg"``. - As noted above, :func:`lmplot` has been rewritten to exploit the :class:`FacetGrid` machinery. This involves a few changes. The ``color`` keyword argument has been replaced with ``hue``, for better consistency across the package. The ``hue`` parameter will always take a variable *name*, while ``color`` will take a color name or (in some cases) a palette. The :func:`lmplot` function now returns the :class:`FacetGrid` used to draw the plot instance. - The functions that interact with matplotlib rc parameters have been updated and standardized. There are now three pairs of functions, :func:`axes_style` and :func:`set_style`, :func:`plotting_context` and :func:`set_context`, and :func:`color_palette` and :func:`set_palette`. In each case, the pairs take the exact same arguments. The first function defines and returns the parameters, and the second sets the matplotlib defaults. Additionally, the first function in each pair can be used in a ``with`` statement to temporarily change the defaults. Both the style and context functions also now accept a dictionary of matplotlib rc parameters to override the seaborn defaults, and :func:`set` now also takes a dictionary to update any of the matplotlib defaults. See the :ref:`tutorial ` for more information. - The ``nogrid`` style has been deprecated and changed to ``white`` for more uniformity (i.e. there are now ``darkgrid``, ``dark``, ``whitegrid``, and ``white`` styles). Other changes ~~~~~~~~~~~~~ Using the package ^^^^^^^^^^^^^^^^^ - If you want to use plotting functions provided by the package without setting the matplotlib style to a seaborn theme, you can now do ``import seaborn.apionly as sns`` or ``from seaborn.apionly import lmplot``, etc. This is using the (also new) :func:`reset_orig` function, which returns the rc parameters to what they are at matplotlib import time — i.e. they will respect any custom `matplotlibrc` settings on top of the matplotlib defaults. - The dependency load of the package has been reduced. It can now be installed and used with only ``numpy``, ``scipy``, ``matplotlib``, and ``pandas``. Although ``statsmodels`` is still recommended for full functionality, it is not required. Plotting functions ^^^^^^^^^^^^^^^^^^ - :func:`lmplot` (and :func:`regplot`) have two new options for fitting regression models: ``lowess`` and ``robust``. The former fits a nonparametric smoother, while the latter fits a regression using methods that are less sensitive to outliers. - The regression uncertainty in :func:`lmplot` and :func:`regplot` is now estimated with fewer bootstrap iterations, so plotting should be faster. - The univariate :func:`kdeplot` can now be drawn as a *cumulative* density plot. - Changed :func:`interactplot` to use a robust calculation of the data range when finding default limits for the contour colormap to work better when there are outliers in the data. Style ^^^^^ - There is a new style, ``dark``, which shares most features with ``darkgrid`` but does not draw a grid by default. - There is a new function, :func:`offset_spines`, and a corresponding option in :func:`despine` called ``trim``. Together, these can be used to make plots where the axis spines are offset from the main part of the figure and limited within the range of the ticks. This is recommended for use with the ``ticks`` style. - Other aspects of the seaborn styles have been tweaked for more attractive plots. seaborn-0.6.0/doc/releases/v0.3.1.txt000066400000000000000000000017451254425553400171240ustar00rootroot00000000000000 v0.3.1 (April 2014) ------------------- This is a minor release from 0.3 with fixes for several bugs. Plotting functions ~~~~~~~~~~~~~~~~~~ - The size of the points in :func:`pointplot` and :func:`factorplot` are now scaled with the linewidth for better aesthetics across different plotting contexts. - The :func:`pointplot` glyphs for different levels of the hue variable are drawn at different z-orders so that they appear uniform. Bug Fixes ~~~~~~~~~ - Fixed a bug in :class:`FacetGrid` (and thus affecting lmplot and factorplot) that appeared when ``col_wrap`` was used with a number of facets that did not evenly divide into the column width. - Fixed an issue where the support for kernel density estimates was sometimes computed incorrectly. - Fixed a problem where ``hue`` variable levels that were not strings were missing in :class:`FacetGrid` legends. - When passing a color palette list in a ``with`` statement, the entire palette is now used instead of the first six colors. seaborn-0.6.0/doc/releases/v0.4.0.txt000066400000000000000000000102531254425553400171160ustar00rootroot00000000000000 v0.4.0 (September 2014) ----------------------- This is a major release from 0.3. Highlights include new approaches for :ref:`quick, high-level dataset exploration ` (along with a more :ref:`flexible interface `) and easy creation of :ref:`perceptually-appropriate color palettes ` using the cubehelix system. Along with these additions, there are a number of smaller changes that make visualizing data with seaborn easier and more powerful. Plotting functions ~~~~~~~~~~~~~~~~~~ - A new object, :class:`PairGrid`, and a corresponding function :func:`pairplot`, for drawing grids of pairwise relationships in a dataset. This style of plot is sometimes called a "scatterplot matrix", but the representation of the data in :class:`PairGrid` is flexible and many styles other than scatterplots can be used. See the :ref:`docs ` for more information. **Note:** due to a bug in older versions of matplotlib, you will have best results if you use these functions with matplotlib 1.4 or later. - The rules for choosing default color palettes when variables are mapped to different colors have been unified (and thus changed in some cases). Now when no specific palette is requested, the current global color palette will be used, unless the number of variables to be mapped exceeds the number of unique colors in the palette, in which case the ``"husl"`` palette will be used to avoid cycling. - Added a keyword argument ``hist_norm`` to :func:`distplot`. When a :func:`distplot` is now drawn without a KDE or parametric density, the histogram is drawn as counts instead of a density. This can be overridden by by setting ``hist_norm`` to ``True``. - When using :class:`FacetGrid` with a ``hue`` variable, the legend is no longer drawn by default when you call :meth:`FacetGrid.map`. Instead, you have to call :meth:`FacetGrid.add_legend` manually. This should make it easier to layer multiple plots onto the grid without having duplicated legends. - Made some changes to :func:`factorplot` so that it behaves better when not all levels of the ``x`` variable are represented in each facet. - Added the ``logx`` option to :func:`regplot` for fitting the regression in log space. - When :func:`violinplot` encounters a bin with only a single observation, it will now plot a horizontal line at that value instead of erroring out. Style and color palettes ~~~~~~~~~~~~~~~~~~~~~~~~ - Added the :func:`cubehelix_palette` function for generating sequential palettes from the cubehelix system. See the :ref:`palette docs ` for more information on how these palettes can be used. There is also the :func:`choose_cubehelix` which will launch an interactive app to select cubehelix parameters in the notebook. - Added the :func:`xkcd_palette` and the ``xkcd_rgb`` dictionary so that colors :ref:`can be specified ` with names from the `xkcd color survey `_. - Added the ``font_scale`` option to :func:`plotting_context`, :func:`set_context`, and :func:`set`. ``font_scale`` can independently increase or decrease the size of the font elements in the plot. - Font-handling should work better on systems without Arial installed. This is accomplished by adding the ``font.sans-serif`` field to the ``axes_style`` definition with Arial and Liberation Sans prepended to matplotlib defaults. The font family can also be set through the ``font`` keyword argument in :func:`set`. Due to matplotlib bugs, this might not work as expected on matplotlib 1.3. - The :func:`despine` function gets a new keyword argument ``offset``, which replaces the deprecated :func:`offset_spines` function. You no longer need to offset the spines before plotting data. - Added a default value for ``pdf.fonttype`` so that text in PDFs is editable in Adobe Illustrator. Other API Changes ~~~~~~~~~~~~~~~~~ - Removed the deprecated ``set_color_palette`` and ``palette_context`` functions. These were replaced in version 0.3 by the :func:`set_palette` function and ability to use :func:`color_palette` directly in a ``with`` statement. - Removed the ability to specify a ``nogrid`` style, which was renamed to ``white`` in 0.3. seaborn-0.6.0/doc/releases/v0.5.0.txt000066400000000000000000000112501254425553400171150ustar00rootroot00000000000000 v0.5.0 (November 2014) -------------------------- This is a major release from 0.4. Highlights include new functions for :ref:`plotting heatmaps `, possibly while :ref:`applying clustering algorithms ` to discover structured relationships. These functions are complemented by new custom colormap functions and a full set of IPython widgets that allow interactive selection of colormap parameters. The :ref:`palette tutorial ` has been rewritten to cover these new tools and more generally provide guidance on how to use color in visualizations. There are also a number of smaller changes and bugfixes. Plotting functions ~~~~~~~~~~~~~~~~~~ - Added the :func:`heatmap` function for visualizing a matrix of data by color-encoding the values. See the :ref:`docs ` for more information. - Added the :func:`clustermap` function for clustering and visualizing a matrix of data, with options to label individual rows and columns by colors. See the :ref:`docs ` for more information. This work was lead by Olga Botvinnik. - :func:`lmplot` and :func:`pairplot` get a new keyword argument, ``markers``. This can be a single kind of marker or a list of different markers for each level of the ``hue`` variable. Using different markers for different hues should let plots be more comprehensible when reproduced to black-and-white (i.e. when printed). See the `github pull request `_ for examples. - More generally, there is a new keyword argument in :class:`FacetGrid` and :class:`PairGrid`, ``hue_kws``. This similarly lets plot aesthetics vary across the levels of the hue variable, but more flexibily. ``hue_kws`` should be a dictionary that maps the name of keyword arguments to lists of values that are as long as the number of levels of the hue variable. - The argument ``subplot_kws`` has been added to ``FacetGrid``. This allows for faceted plots with custom projections, including `maps with Cartopy `_. Color palettes ~~~~~~~~~~~~~~ - Added two new functions to create custom color palettes. For sequential palettes, you can use the :func:`light_palette` function, which takes a seed color and creates a ramp from a very light, desaturated variant of it. For diverging palettes, you can use the :func:`diverging_palette` function to create a balanced ramp between two endpoints to a light or dark midpoint. See the :ref:`palette tutorial ` for more information. - Added the ability to specify the seed color for :func:`light_palette` and :func:`dark_palette` as a tuple of ``husl`` or ``hls`` space values or as a named ``xkcd`` color. The interpretation of the seed color is now provided by the new ``input`` parameter to these functions. - Added several new interactive palette widgets: :func:`choose_colorbrewer_palette`, :func:`choose_light_palette`, :func:`choose_dark_palette`, and :func:`choose_diverging_palette`. For consistency, renamed the cubehelix widget to :func:`choose_cubehelix_palette` (and fixed a bug where the cubehelix palette was reversed). These functions also now return either a color palette list or a matplotlib colormap when called, and that object will be live-updated as you play with the widget. This should make it easy to iterate over a plot until you find a good representation for the data. See the `Github pull request `_ or `this notebook (download it to use the widgets) `_ for more information. - Overhauled the color :ref:`palette tutorial ` to organize the discussion by class of color palette and provide more motivation behind the various choices one might make when choosing colors for their data. Bug fixes ~~~~~~~~~ - Fixed a bug in :class:`PairGrid` that gave incorrect results (or a crash) when the input DataFrame has a non-default index. - Fixed a bug in :class:`PairGrid` where passing columns with a date-like datatype raised an exception. - Fixed a bug where :func:`lmplot` would show a legend when the hue variable was also used on either the rows or columns (making the legend redundant). - Worked around a matplotlib bug that was forcing outliers in :func:`boxplot` to appear as blue. - :func:`kdeplot` now accepts pandas Series for the ``data`` and ``data2`` arguments. - Using a non-default correlation method in :func:`corrplot` now implies ``sig_stars=False`` as the permutation test used to significance values for the correlations uses a pearson metric. - Removed ``pdf.fonttype`` from the style definitions, as the value used in version 0.4 resulted in very large PDF files. seaborn-0.6.0/doc/releases/v0.5.1.txt000066400000000000000000000011321254425553400171140ustar00rootroot00000000000000 v0.5.1 (November 2014) ---------------------- This is a bugfix release that includes a workaround for an issue in matplotlib 1.4.2 and fixes for two bugs in functions that were new in 0.5.0. - Implemented a workaround for a bug in matplotlib 1.4.2 that prevented point markers from being drawn when the seaborn styles had been set. See this `github issue `_ for more information. - Fixed a bug in :func:`heatmap` where the mask was vertically reversed relative to the data. - Fixed a bug in :func:`clustermap` when using nested lists of side colors. seaborn-0.6.0/doc/releases/v0.6.0.txt000066400000000000000000000256221254425553400171260ustar00rootroot00000000000000 v0.6.0 (June 2015) ------------------ This is a major release from 0.5. The main objective of this release was to unify the API for categorical plots, which means that there are some relatively large API changes in some of the older functions. See below for details of those changes, which may break code written for older versions of seaborn. There are also some new functions (:func:`stripplot`, and :func:`countplot`), numerous enhancements to existing functions, and bug fixes. Additionally, the documentation has been completely revamped and expanded for the 0.6 release. Now, the API docs page for each function has multiple examples with embedded plots showing how to use the various options. These pages should be considered the most comprehensive resource for examples, and the tutorial pages are now streamlined and oriented towards a higher-level overview of the various features. Changes and updates to categorical plots ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In version 0.6, the "categorical" plots have been unified with a common API. This new category of functions groups together plots that show the relationship between one numeric variable and one or two categorical variables. This includes plots that show distribution of the numeric variable in each bin (:func:`boxplot`, :func:`violinplot`, and :func:`stripplot`) and plots that apply a statistical estimation within each bin (:func:`pointplot`, :func:`barplot`, and :func:`countplot`). There is a new :ref:`tutorial chapter ` that introduces these functions. The categorical functions now each accept the same formats of input data and can be invoked in the same way. They can plot using long- or wide-form data, and can be drawn vertically or horizontally. When long-form data is used, the orientation of the plots is inferred from the types of the input data. Additionally, all functions natively take a ``hue`` variable to add a second layer of categorization. With the (in some cases new) API, these functions can all be drawn correctly by :class:`FacetGrid`. However, :func:`factorplot` can also now create faceted verisons of any of these kinds of plots, so in most cases it will be unnecessary to use :class:`FacetGrid` directly. By default, :func:`factorplot` draws a point plot, but this is controlled by the ``kind`` parameter. Here are details on what has changed in the process of unifying these APIs: - Changes to :func:`boxplot` and :func:`violinplot` will probably be the most disruptive. Both functions maintain backwards-compatibility in terms of the kind of data they can accept, but the syntax has changed to be more similar to other seaborn functions. These functions are now invoked with ``x`` and/or ``y`` parameters that are either vectors of data or names of variables in a long-form DataFrame passed to the new ``data`` parameter. You can still pass wide-form DataFrames or arrays to ``data``, but it is no longer the first positional argument. See the `github pull request `_ for more information on these changes and the logic behind them. - As :func:`pointplot` and :func:`barplot` can now plot with the major categorical variable on the y axis, the ``x_order`` parameter has been renamed to ``order``. - Added a ``hue`` argument to :func:`boxplot` and :func:`violinplot`, which allows for nested grouping the plot elements by a third categorical variable. For :func:`violinplot`, this nesting can also be accomplished by splitting the violins when there are two levels of the ``hue`` variable (using ``split=True``). To make this functionality feasible, the ability to specify where the plots will be draw in data coordinates has been removed. These plots now are drawn at set positions, like (and identical to) :func:`barplot` and :func:`pointplot`. - Added a ``palette`` parameter to :func:`boxplot`/:func:`violinplot`. The ``color`` parameter still exists, but no longer does double-duty in accepting the name of a seaborn palette. ``palette`` supersedes ``color`` so that it can be used with a :class:`FacetGrid`. Along with these API changes, the following changes/enhancements were made to the plotting functions: - The default rules for ordering the categories has changed. Instead of automatically sorting the category levels, the plots now show the levels in the order they appear in the input data (i.e., the order given by ``Series.unique()``). Order can be specified when plotting with the ``order`` and ``hue_order`` parameters. Additionally, when variables are pandas objects with a "categorical" dtype, the category order is inferred from the data object. This change also affects :class:`FacetGrid` and :class:`PairGrid`. - Added the ``scale`` and ``scale_hue`` parameters to :func:`violinplot`. These control how the width of the violins are scaled. The default is ``area``, which is different from how the violins used to be drawn. Use ``scale='width'`` to get the old behavior. - Used a different style for the ``box`` kind of interior plot in :func:`violinplot`, which shows the whisker range in addition to the quartiles. Use ``inner='quartile'`` to get the old style. New plotting functions ~~~~~~~~~~~~~~~~~~~~~~ - Added the :func:`stripplot` function, which draws a scatterplot where one of the variables is categorical. This plot has the same API as :func:`boxplot` and :func:`violinplot`. It is useful both on its own and when composed with one of these other plot kinds to show both the observations and underlying distribution. - Added the :func:`countplot` function, which uses a bar plot representation to show counts of variables in one or more categorical bins. This replaces the old approach of calling :func:`barplot` without a numeric variable. Other additions and changes ~~~~~~~~~~~~~~~~~~~~~~~~~~~ - The :func:`corrplot` and underlying :func:`symmatplot` functions have been deprecated in favor of :func:`heatmap`, which is much more flexible and robust. These two functions are still available in version 0.6, but they will be removed in a future version. - Added the :func:`set_color_codes` function and the ``color_codes`` argument to :func:`set` and :func:`set_palette`. This changes the interpretation of shorthand color codes (i.e. "b", "g", k", etc.) within matplotlib to use the values from one of the named seaborn palettes (i.e. "deep", "muted", etc.). That makes it easier to have a more uniform look when using matplotlib functions directly with seaborn imported. This could be disruptive to existing plots, so it does not happen by default. It is possible this could change in the future. - The :func:`color_palette` function no longer trims palettes that are longer than 6 colors when passed into it. - Added the ``as_hex`` method to color palette objects, to return a list of hex codes rather than rgb tuples. - :func:`jointplot` now passes additional keyword arguments to the function used to draw the plot on the joint axes. - Changed the default ``linewidths`` in :func:`heatmap` and :func:`clustermap` to 0 so that larger matrices plot correctly. This parameter still exists and can be used to get the old effect of lines demarcating each cell in the heatmap (the old default ``linewidths`` was 0.5). - :func:`heatmap` and :func:`clustermap` now automatically use a mask for missing values, which previously were shown with the "under" value of the colormap per default `plt.pcolormesh` behavior. - Added the ``seaborn.crayons`` dictionary and the :func:`crayon_palette` function to define colors from the 120 box (!) of `Crayola crayons `_. - Added the ``line_kws`` parameter to :func:`residplot` to change the style of the lowess line, when used. - Added open-ended ``**kwargs`` to the ``add_legend`` method on :class:`FacetGrid` and :class:`PairGrid`, which will pass additional keyword arguments through when calling the legend function on the ``Figure`` or ``Axes``. - Added the ``gridspec_kws`` parameter to :class:`FacetGrid`, which allows for control over the size of individual facets in the grid to emphasize certain plots or account for differences in variable ranges. - The interactive palette widgets now show a continuous colorbar, rather than a discrete palette, when `as_cmap` is True. - The default Axes size for :func:`pairplot` and :class:`PairGrid` is now slightly smaller. - Added the ``shade_lowest`` parameter to :func:`kdeplot` which will set the alpha for the lowest contour level to 0, making it easier to plot multiple bivariate distributions on the same axes. - The ``height`` parameter of :func:`rugplot` is now interpreted as a function of the axis size and is invariant to changes in the data scale on that axis. The rug lines are also slightly narrower by default. - Added a catch in :func:`distplot` when calculating a default number of bins. For highly skewed data it will now use sqrt(n) bins, where previously the reference rule would return "infinite" bins and cause an exception in matplotlib. - Added a ceiling (50) to the default number of bins used for :func:`distplot` histograms. This will help avoid confusing errors with certain kinds of datasets that heavily violate the assumptions of the reference rule used to get a default number of bins. The ceiling is not applied when passing a specific number of bins. - The various property dictionaries that can be passed to ``plt.boxplot`` are now applied after the seaborn restyling to allow for full customizability. - Added a ``savefig`` method to :class:`JointGrid` that defaults to a tight bounding box to make it easier to save figures using this class, and set a tight bbox as the default for the ``savefig`` method on other Grid objects. - You can now pass an integer to the ``xticklabels`` and ``yticklabels`` parameter of :func:`heatmap` (and, by extension, :func:`clustermap`). This will make the plot use the ticklabels inferred from the data, but only plot every ``n`` label, where ``n`` is the number you pass. This can help when visualizing larger matrices with some sensible ordering to the rows or columns of the dataframe. - Added `"figure.facecolor"` to the style parameters and set the default to white. - The :func:`load_dataset` function now caches datasets locally after downloading them, and uses the local copy on subsequent calls. Bug fixes ~~~~~~~~~ - Fixed bugs in :func:`clustermap` where the mask and specified ticklabels were not being reorganized using the dendrograms. - Fixed a bug in :class:`FacetGrid` and :class:`PairGrid` that lead to incorrect legend labels when levels of the ``hue`` variable appeared in ``hue_order`` but not in the data. - Fixed a bug in :meth:`FacetGrid.set_xticklabels` or :meth:`FacetGrid.set_yticklabels` when ``col_wrap`` is being used. - Fixed a bug in :class:`PairGrid` where the ``hue_order`` parameter was ignored. - Fixed two bugs in :func:`despine` that caused errors when trying to trim the spines on plots that had inverted axes or no ticks. - Improved support for the ``margin_titles`` option in :class:`FacetGrid`, which can now be used with a legend. seaborn-0.6.0/doc/sphinxext/000077500000000000000000000000001254425553400157565ustar00rootroot00000000000000seaborn-0.6.0/doc/sphinxext/ipython_console_highlighting.py000066400000000000000000000101251254425553400242700ustar00rootroot00000000000000"""reST directive for syntax-highlighting ipython interactive sessions. XXX - See what improvements can be made based on the new (as of Sept 2009) 'pycon' lexer for the python console. At the very least it will give better highlighted tracebacks. """ #----------------------------------------------------------------------------- # Needed modules # Standard library import re # Third party from pygments.lexer import Lexer, do_insertions from pygments.lexers.agile import (PythonConsoleLexer, PythonLexer, PythonTracebackLexer) from pygments.token import Comment, Generic from sphinx import highlighting #----------------------------------------------------------------------------- # Global constants line_re = re.compile('.*?\n') #----------------------------------------------------------------------------- # Code begins - classes and functions class IPythonConsoleLexer(Lexer): """ For IPython console output or doctests, such as: .. sourcecode:: ipython In [1]: a = 'foo' In [2]: a Out[2]: 'foo' In [3]: print(a) foo In [4]: 1 / 0 Notes: - Tracebacks are not currently supported. - It assumes the default IPython prompts, not customized ones. """ name = 'IPython console session' aliases = ['ipython'] mimetypes = ['text/x-ipython-console'] input_prompt = re.compile("(In \[[0-9]+\]: )|( \.\.\.+:)") output_prompt = re.compile("(Out\[[0-9]+\]: )|( \.\.\.+:)") continue_prompt = re.compile(" \.\.\.+:") tb_start = re.compile("\-+") def get_tokens_unprocessed(self, text): pylexer = PythonLexer(**self.options) tblexer = PythonTracebackLexer(**self.options) curcode = '' insertions = [] for match in line_re.finditer(text): line = match.group() input_prompt = self.input_prompt.match(line) continue_prompt = self.continue_prompt.match(line.rstrip()) output_prompt = self.output_prompt.match(line) if line.startswith("#"): insertions.append((len(curcode), [(0, Comment, line)])) elif input_prompt is not None: insertions.append((len(curcode), [(0, Generic.Prompt, input_prompt.group())])) curcode += line[input_prompt.end():] elif continue_prompt is not None: insertions.append((len(curcode), [(0, Generic.Prompt, continue_prompt.group())])) curcode += line[continue_prompt.end():] elif output_prompt is not None: # Use the 'error' token for output. We should probably make # our own token, but error is typicaly in a bright color like # red, so it works fine for our output prompts. insertions.append((len(curcode), [(0, Generic.Error, output_prompt.group())])) curcode += line[output_prompt.end():] else: if curcode: for item in do_insertions(insertions, pylexer.get_tokens_unprocessed(curcode)): yield item curcode = '' insertions = [] yield match.start(), Generic.Output, line if curcode: for item in do_insertions(insertions, pylexer.get_tokens_unprocessed(curcode)): yield item def setup(app): """Setup as a sphinx extension.""" # This is only a lexer, so adding it below to pygments appears sufficient. # But if somebody knows that the right API usage should be to do that via # sphinx, by all means fix it here. At least having this setup.py # suppresses the sphinx warning we'd get without it. pass #----------------------------------------------------------------------------- # Register the extension as a valid pygments lexer highlighting.lexers['ipython'] = IPythonConsoleLexer() seaborn-0.6.0/doc/sphinxext/ipython_directive.py000066400000000000000000001112651254425553400220660ustar00rootroot00000000000000# -*- coding: utf-8 -*- """ Sphinx directive to support embedded IPython code. This directive allows pasting of entire interactive IPython sessions, prompts and all, and their code will actually get re-executed at doc build time, with all prompts renumbered sequentially. It also allows you to input code as a pure python input by giving the argument python to the directive. The output looks like an interactive ipython section. To enable this directive, simply list it in your Sphinx ``conf.py`` file (making sure the directory where you placed it is visible to sphinx, as is needed for all Sphinx directives). For example, to enable syntax highlighting and the IPython directive:: extensions = ['IPython.sphinxext.ipython_console_highlighting', 'IPython.sphinxext.ipython_directive'] The IPython directive outputs code-blocks with the language 'ipython'. So if you do not have the syntax highlighting extension enabled as well, then all rendered code-blocks will be uncolored. By default this directive assumes that your prompts are unchanged IPython ones, but this can be customized. The configurable options that can be placed in conf.py are: ipython_savefig_dir: The directory in which to save the figures. This is relative to the Sphinx source directory. The default is `html_static_path`. ipython_rgxin: The compiled regular expression to denote the start of IPython input lines. The default is re.compile('In \[(\d+)\]:\s?(.*)\s*'). You shouldn't need to change this. ipython_rgxout: The compiled regular expression to denote the start of IPython output lines. The default is re.compile('Out\[(\d+)\]:\s?(.*)\s*'). You shouldn't need to change this. ipython_promptin: The string to represent the IPython input prompt in the generated ReST. The default is 'In [%d]:'. This expects that the line numbers are used in the prompt. ipython_promptout: The string to represent the IPython prompt in the generated ReST. The default is 'Out [%d]:'. This expects that the line numbers are used in the prompt. ipython_mplbackend: The string which specifies if the embedded Sphinx shell should import Matplotlib and set the backend. The value specifies a backend that is passed to `matplotlib.use()` before any lines in `ipython_execlines` are executed. If not specified in conf.py, then the default value of 'agg' is used. To use the IPython directive without matplotlib as a dependency, set the value to `None`. It may end up that matplotlib is still imported if the user specifies so in `ipython_execlines` or makes use of the @savefig pseudo decorator. ipython_execlines: A list of strings to be exec'd in the embedded Sphinx shell. Typical usage is to make certain packages always available. Set this to an empty list if you wish to have no imports always available. If specified in conf.py as `None`, then it has the effect of making no imports available. If omitted from conf.py altogether, then the default value of ['import numpy as np', 'import matplotlib.pyplot as plt'] is used. ipython_holdcount When the @suppress pseudo-decorator is used, the execution count can be incremented or not. The default behavior is to hold the execution count, corresponding to a value of `True`. Set this to `False` to increment the execution count after each suppressed command. As an example, to use the IPython directive when `matplotlib` is not available, one sets the backend to `None`:: ipython_mplbackend = None An example usage of the directive is: .. code-block:: rst .. ipython:: In [1]: x = 1 In [2]: y = x**2 In [3]: print(y) See http://matplotlib.org/sampledoc/ipython_directive.html for additional documentation. ToDo ---- - Turn the ad-hoc test() function into a real test suite. - Break up ipython-specific functionality from matplotlib stuff into better separated code. Authors ------- - John D Hunter: orignal author. - Fernando Perez: refactoring, documentation, cleanups, port to 0.11. - VáclavŠmilauer : Prompt generalizations. - Skipper Seabold, refactoring, cleanups, pure python addition """ from __future__ import print_function from __future__ import unicode_literals #----------------------------------------------------------------------------- # Imports #----------------------------------------------------------------------------- # Stdlib import os import re import sys import tempfile import ast from pandas.compat import zip, range, map, lmap, u, cStringIO as StringIO import warnings # To keep compatibility with various python versions try: from hashlib import md5 except ImportError: from md5 import md5 # Third-party import sphinx from docutils.parsers.rst import directives from docutils import nodes from sphinx.util.compat import Directive # Our own from IPython import Config, InteractiveShell from IPython.core.profiledir import ProfileDir from IPython.utils import io from IPython.utils.py3compat import PY3 if PY3: from io import StringIO text_type = str else: from StringIO import StringIO text_type = unicode #----------------------------------------------------------------------------- # Globals #----------------------------------------------------------------------------- # for tokenizing blocks COMMENT, INPUT, OUTPUT = range(3) #----------------------------------------------------------------------------- # Functions and class declarations #----------------------------------------------------------------------------- def block_parser(part, rgxin, rgxout, fmtin, fmtout): """ part is a string of ipython text, comprised of at most one input, one ouput, comments, and blank lines. The block parser parses the text into a list of:: blocks = [ (TOKEN0, data0), (TOKEN1, data1), ...] where TOKEN is one of [COMMENT | INPUT | OUTPUT ] and data is, depending on the type of token:: COMMENT : the comment string INPUT: the (DECORATOR, INPUT_LINE, REST) where DECORATOR: the input decorator (or None) INPUT_LINE: the input as string (possibly multi-line) REST : any stdout generated by the input line (not OUTPUT) OUTPUT: the output string, possibly multi-line """ block = [] lines = part.split('\n') N = len(lines) i = 0 decorator = None while 1: if i==N: # nothing left to parse -- the last line break line = lines[i] i += 1 line_stripped = line.strip() if line_stripped.startswith('#'): block.append((COMMENT, line)) continue if line_stripped.startswith('@'): # we're assuming at most one decorator -- may need to # rethink decorator = line_stripped continue # does this look like an input line? matchin = rgxin.match(line) if matchin: lineno, inputline = int(matchin.group(1)), matchin.group(2) # the ....: continuation string continuation = ' %s:'%''.join(['.']*(len(str(lineno))+2)) Nc = len(continuation) # input lines can continue on for more than one line, if # we have a '\' line continuation char or a function call # echo line 'print'. The input line can only be # terminated by the end of the block or an output line, so # we parse out the rest of the input line if it is # multiline as well as any echo text rest = [] while i 1: if input_lines[-1] != "": input_lines.append('') # make sure there's a blank line # so splitter buffer gets reset continuation = ' %s:'%''.join(['.']*(len(str(lineno))+2)) if is_savefig: image_file, image_directive = self.process_image(decorator) ret = [] is_semicolon = False # Hold the execution count, if requested to do so. if is_suppress and self.hold_count: store_history = False else: store_history = True # Note: catch_warnings is not thread safe with warnings.catch_warnings(record=True) as ws: for i, line in enumerate(input_lines): if line.endswith(';'): is_semicolon = True if i == 0: # process the first input line if is_verbatim: self.process_input_line('') self.IP.execution_count += 1 # increment it anyway else: # only submit the line in non-verbatim mode self.process_input_line(line, store_history=store_history) formatted_line = '%s %s'%(input_prompt, line) else: # process a continuation line if not is_verbatim: self.process_input_line(line, store_history=store_history) formatted_line = '%s %s'%(continuation, line) if not is_suppress: ret.append(formatted_line) if not is_suppress and len(rest.strip()) and is_verbatim: # the "rest" is the standard output of the # input, which needs to be added in # verbatim mode ret.append(rest) self.cout.seek(0) output = self.cout.read() if not is_suppress and not is_semicolon: ret.append(output) elif is_semicolon: # get spacing right ret.append('') # context information filename = self.state.document.current_source lineno = self.state.document.current_line # output any exceptions raised during execution to stdout # unless :okexcept: has been specified. if not is_okexcept and "Traceback" in output: s = "\nException in %s at block ending on line %s\n" % (filename, lineno) s += "Specify :okexcept: as an option in the ipython:: block to suppress this message\n" sys.stdout.write('\n\n>>>' + ('-' * 73)) sys.stdout.write(s) sys.stdout.write(output) sys.stdout.write('<<<' + ('-' * 73) + '\n\n') # output any warning raised during execution to stdout # unless :okwarning: has been specified. if not is_okwarning: for w in ws: s = "\nWarning in %s at block ending on line %s\n" % (filename, lineno) s += "Specify :okwarning: as an option in the ipython:: block to suppress this message\n" sys.stdout.write('\n\n>>>' + ('-' * 73)) sys.stdout.write(s) sys.stdout.write('-' * 76 + '\n') s=warnings.formatwarning(w.message, w.category, w.filename, w.lineno, w.line) sys.stdout.write(s) sys.stdout.write('<<<' + ('-' * 73) + '\n') self.cout.truncate(0) return (ret, input_lines, output, is_doctest, decorator, image_file, image_directive) def process_output(self, data, output_prompt, input_lines, output, is_doctest, decorator, image_file): """ Process data block for OUTPUT token. """ TAB = ' ' * 4 if is_doctest and output is not None: found = output found = found.strip() submitted = data.strip() if self.directive is None: source = 'Unavailable' content = 'Unavailable' else: source = self.directive.state.document.current_source content = self.directive.content # Add tabs and join into a single string. content = '\n'.join([TAB + line for line in content]) # Make sure the output contains the output prompt. ind = found.find(output_prompt) if ind < 0: e = ('output does not contain output prompt\n\n' 'Document source: {0}\n\n' 'Raw content: \n{1}\n\n' 'Input line(s):\n{TAB}{2}\n\n' 'Output line(s):\n{TAB}{3}\n\n') e = e.format(source, content, '\n'.join(input_lines), repr(found), TAB=TAB) raise RuntimeError(e) found = found[len(output_prompt):].strip() # Handle the actual doctest comparison. if decorator.strip() == '@doctest': # Standard doctest if found != submitted: e = ('doctest failure\n\n' 'Document source: {0}\n\n' 'Raw content: \n{1}\n\n' 'On input line(s):\n{TAB}{2}\n\n' 'we found output:\n{TAB}{3}\n\n' 'instead of the expected:\n{TAB}{4}\n\n') e = e.format(source, content, '\n'.join(input_lines), repr(found), repr(submitted), TAB=TAB) raise RuntimeError(e) else: self.custom_doctest(decorator, input_lines, found, submitted) def process_comment(self, data): """Process data fPblock for COMMENT token.""" if not self.is_suppress: return [data] def save_image(self, image_file): """ Saves the image file to disk. """ self.ensure_pyplot() command = ('plt.gcf().savefig("%s", bbox_inches="tight", ' 'dpi=100)' % image_file) #print 'SAVEFIG', command # dbg self.process_input_line('bookmark ipy_thisdir', store_history=False) self.process_input_line('cd -b ipy_savedir', store_history=False) self.process_input_line(command, store_history=False) self.process_input_line('cd -b ipy_thisdir', store_history=False) self.process_input_line('bookmark -d ipy_thisdir', store_history=False) self.clear_cout() def process_block(self, block): """ process block from the block_parser and return a list of processed lines """ ret = [] output = None input_lines = None lineno = self.IP.execution_count input_prompt = self.promptin % lineno output_prompt = self.promptout % lineno image_file = None image_directive = None for token, data in block: if token == COMMENT: out_data = self.process_comment(data) elif token == INPUT: (out_data, input_lines, output, is_doctest, decorator, image_file, image_directive) = \ self.process_input(data, input_prompt, lineno) elif token == OUTPUT: out_data = \ self.process_output(data, output_prompt, input_lines, output, is_doctest, decorator, image_file) if out_data: ret.extend(out_data) # save the image files if image_file is not None: self.save_image(image_file) return ret, image_directive def ensure_pyplot(self): """ Ensures that pyplot has been imported into the embedded IPython shell. Also, makes sure to set the backend appropriately if not set already. """ # We are here if the @figure pseudo decorator was used. Thus, it's # possible that we could be here even if python_mplbackend were set to # `None`. That's also strange and perhaps worthy of raising an # exception, but for now, we just set the backend to 'agg'. if not self._pyplot_imported: if 'matplotlib.backends' not in sys.modules: # Then ipython_matplotlib was set to None but there was a # call to the @figure decorator (and ipython_execlines did # not set a backend). #raise Exception("No backend was set, but @figure was used!") import matplotlib matplotlib.use('agg') # Always import pyplot into embedded shell. self.process_input_line('import matplotlib.pyplot as plt', store_history=False) self._pyplot_imported = True def process_pure_python(self, content): """ content is a list of strings. it is unedited directive content This runs it line by line in the InteractiveShell, prepends prompts as needed capturing stderr and stdout, then returns the content as a list as if it were ipython code """ output = [] savefig = False # keep up with this to clear figure multiline = False # to handle line continuation multiline_start = None fmtin = self.promptin ct = 0 for lineno, line in enumerate(content): line_stripped = line.strip() if not len(line): output.append(line) continue # handle decorators if line_stripped.startswith('@'): output.extend([line]) if 'savefig' in line: savefig = True # and need to clear figure continue # handle comments if line_stripped.startswith('#'): output.extend([line]) continue # deal with lines checking for multiline continuation = u' %s:'% ''.join(['.']*(len(str(ct))+2)) if not multiline: modified = u"%s %s" % (fmtin % ct, line_stripped) output.append(modified) ct += 1 try: ast.parse(line_stripped) output.append(u'') except Exception: # on a multiline multiline = True multiline_start = lineno else: # still on a multiline modified = u'%s %s' % (continuation, line) output.append(modified) # if the next line is indented, it should be part of multiline if len(content) > lineno + 1: nextline = content[lineno + 1] if len(nextline) - len(nextline.lstrip()) > 3: continue try: mod = ast.parse( '\n'.join(content[multiline_start:lineno+1])) if isinstance(mod.body[0], ast.FunctionDef): # check to see if we have the whole function for element in mod.body[0].body: if isinstance(element, ast.Return): multiline = False else: output.append(u'') multiline = False except Exception: pass if savefig: # clear figure if plotted self.ensure_pyplot() self.process_input_line('plt.clf()', store_history=False) self.clear_cout() savefig = False return output def custom_doctest(self, decorator, input_lines, found, submitted): """ Perform a specialized doctest. """ from .custom_doctests import doctests args = decorator.split() doctest_type = args[1] if doctest_type in doctests: doctests[doctest_type](self, args, input_lines, found, submitted) else: e = "Invalid option to @doctest: {0}".format(doctest_type) raise Exception(e) class IPythonDirective(Directive): has_content = True required_arguments = 0 optional_arguments = 4 # python, suppress, verbatim, doctest final_argumuent_whitespace = True option_spec = { 'python': directives.unchanged, 'suppress' : directives.flag, 'verbatim' : directives.flag, 'doctest' : directives.flag, 'okexcept': directives.flag, 'okwarning': directives.flag, 'output_encoding': directives.unchanged_required } shell = None seen_docs = set() def get_config_options(self): # contains sphinx configuration variables config = self.state.document.settings.env.config # get config variables to set figure output directory confdir = self.state.document.settings.env.app.confdir savefig_dir = config.ipython_savefig_dir source_dir = os.path.dirname(self.state.document.current_source) if savefig_dir is None: savefig_dir = config.html_static_path if isinstance(savefig_dir, list): savefig_dir = savefig_dir[0] # safe to assume only one path? savefig_dir = os.path.join(confdir, savefig_dir) # get regex and prompt stuff rgxin = config.ipython_rgxin rgxout = config.ipython_rgxout promptin = config.ipython_promptin promptout = config.ipython_promptout mplbackend = config.ipython_mplbackend exec_lines = config.ipython_execlines hold_count = config.ipython_holdcount return (savefig_dir, source_dir, rgxin, rgxout, promptin, promptout, mplbackend, exec_lines, hold_count) def setup(self): # Get configuration values. (savefig_dir, source_dir, rgxin, rgxout, promptin, promptout, mplbackend, exec_lines, hold_count) = self.get_config_options() if self.shell is None: # We will be here many times. However, when the # EmbeddedSphinxShell is created, its interactive shell member # is the same for each instance. if mplbackend: import matplotlib # Repeated calls to use() will not hurt us since `mplbackend` # is the same each time. matplotlib.use(mplbackend) # Must be called after (potentially) importing matplotlib and # setting its backend since exec_lines might import pylab. self.shell = EmbeddedSphinxShell(exec_lines, self.state) # Store IPython directive to enable better error messages self.shell.directive = self # reset the execution count if we haven't processed this doc #NOTE: this may be borked if there are multiple seen_doc tmp files #check time stamp? if not self.state.document.current_source in self.seen_docs: self.shell.IP.history_manager.reset() self.shell.IP.execution_count = 1 self.shell.IP.prompt_manager.width = 0 self.seen_docs.add(self.state.document.current_source) # and attach to shell so we don't have to pass them around self.shell.rgxin = rgxin self.shell.rgxout = rgxout self.shell.promptin = promptin self.shell.promptout = promptout self.shell.savefig_dir = savefig_dir self.shell.source_dir = source_dir self.shell.hold_count = hold_count # setup bookmark for saving figures directory self.shell.process_input_line('bookmark ipy_savedir %s'%savefig_dir, store_history=False) self.shell.clear_cout() return rgxin, rgxout, promptin, promptout def teardown(self): # delete last bookmark self.shell.process_input_line('bookmark -d ipy_savedir', store_history=False) self.shell.clear_cout() def run(self): debug = False #TODO, any reason block_parser can't be a method of embeddable shell # then we wouldn't have to carry these around rgxin, rgxout, promptin, promptout = self.setup() options = self.options self.shell.is_suppress = 'suppress' in options self.shell.is_doctest = 'doctest' in options self.shell.is_verbatim = 'verbatim' in options self.shell.is_okexcept = 'okexcept' in options self.shell.is_okwarning = 'okwarning' in options self.shell.output_encoding = [options.get('output_encoding', 'utf8')] # handle pure python code if 'python' in self.arguments: content = self.content self.content = self.shell.process_pure_python(content) parts = '\n'.join(self.content).split('\n\n') lines = ['.. code-block:: ipython', ''] figures = [] for part in parts: block = block_parser(part, rgxin, rgxout, promptin, promptout) if len(block): rows, figure = self.shell.process_block(block) for row in rows: lines.extend([' %s'%line for line in row.split('\n')]) if figure is not None: figures.append(figure) for figure in figures: lines.append('') lines.extend(figure.split('\n')) lines.append('') if len(lines)>2: if debug: print('\n'.join(lines)) else: # This has to do with input, not output. But if we comment # these lines out, then no IPython code will appear in the # final output. self.state_machine.insert_input( lines, self.state_machine.input_lines.source(0)) # cleanup self.teardown() return [] # Enable as a proper Sphinx directive def setup(app): setup.app = app app.add_directive('ipython', IPythonDirective) app.add_config_value('ipython_savefig_dir', None, 'env') app.add_config_value('ipython_rgxin', re.compile('In \[(\d+)\]:\s?(.*)\s*'), 'env') app.add_config_value('ipython_rgxout', re.compile('Out\[(\d+)\]:\s?(.*)\s*'), 'env') app.add_config_value('ipython_promptin', 'In [%d]:', 'env') app.add_config_value('ipython_promptout', 'Out[%d]:', 'env') # We could just let matplotlib pick whatever is specified as the default # backend in the matplotlibrc file, but this would cause issues if the # backend didn't work in headless environments. For this reason, 'agg' # is a good default backend choice. app.add_config_value('ipython_mplbackend', 'agg', 'env') # If the user sets this config value to `None`, then EmbeddedSphinxShell's # __init__ method will treat it as []. execlines = ['import numpy as np', 'import matplotlib.pyplot as plt'] app.add_config_value('ipython_execlines', execlines, 'env') app.add_config_value('ipython_holdcount', True, 'env') # Simple smoke test, needs to be converted to a proper automatic test. def test(): examples = [ r""" In [9]: pwd Out[9]: '/home/jdhunter/py4science/book' In [10]: cd bookdata/ /home/jdhunter/py4science/book/bookdata In [2]: from pylab import * In [2]: ion() In [3]: im = imread('stinkbug.png') @savefig mystinkbug.png width=4in In [4]: imshow(im) Out[4]: """, r""" In [1]: x = 'hello world' # string methods can be # used to alter the string @doctest In [2]: x.upper() Out[2]: 'HELLO WORLD' @verbatim In [3]: x.st x.startswith x.strip """, r""" In [130]: url = 'http://ichart.finance.yahoo.com/table.csv?s=CROX\ .....: &d=9&e=22&f=2009&g=d&a=1&br=8&c=2006&ignore=.csv' In [131]: print url.split('&') ['http://ichart.finance.yahoo.com/table.csv?s=CROX', 'd=9', 'e=22', 'f=2009', 'g=d', 'a=1', 'b=8', 'c=2006', 'ignore=.csv'] In [60]: import urllib """, r"""\ In [133]: import numpy.random @suppress In [134]: numpy.random.seed(2358) @doctest In [135]: numpy.random.rand(10,2) Out[135]: array([[ 0.64524308, 0.59943846], [ 0.47102322, 0.8715456 ], [ 0.29370834, 0.74776844], [ 0.99539577, 0.1313423 ], [ 0.16250302, 0.21103583], [ 0.81626524, 0.1312433 ], [ 0.67338089, 0.72302393], [ 0.7566368 , 0.07033696], [ 0.22591016, 0.77731835], [ 0.0072729 , 0.34273127]]) """, r""" In [106]: print x jdh In [109]: for i in range(10): .....: print i .....: .....: 0 1 2 3 4 5 6 7 8 9 """, r""" In [144]: from pylab import * In [145]: ion() # use a semicolon to suppress the output @savefig test_hist.png width=4in In [151]: hist(np.random.randn(10000), 100); @savefig test_plot.png width=4in In [151]: plot(np.random.randn(10000), 'o'); """, r""" # use a semicolon to suppress the output In [151]: plt.clf() @savefig plot_simple.png width=4in In [151]: plot([1,2,3]) @savefig hist_simple.png width=4in In [151]: hist(np.random.randn(10000), 100); """, r""" # update the current fig In [151]: ylabel('number') In [152]: title('normal distribution') @savefig hist_with_text.png In [153]: grid(True) @doctest float In [154]: 0.1 + 0.2 Out[154]: 0.3 @doctest float In [155]: np.arange(16).reshape(4,4) Out[155]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15]]) In [1]: x = np.arange(16, dtype=float).reshape(4,4) In [2]: x[0,0] = np.inf In [3]: x[0,1] = np.nan @doctest float In [4]: x Out[4]: array([[ inf, nan, 2., 3.], [ 4., 5., 6., 7.], [ 8., 9., 10., 11.], [ 12., 13., 14., 15.]]) """, ] # skip local-file depending first example: examples = examples[1:] #ipython_directive.DEBUG = True # dbg #options = dict(suppress=True) # dbg options = dict() for example in examples: content = example.split('\n') IPythonDirective('debug', arguments=None, options=options, content=content, lineno=0, content_offset=None, block_text=None, state=None, state_machine=None, ) # Run test suite as a script if __name__=='__main__': if not os.path.isdir('_static'): os.mkdir('_static') test() print('All OK? Check figures in _static/') seaborn-0.6.0/doc/sphinxext/plot_directive.py000066400000000000000000000656721254425553400213640ustar00rootroot00000000000000""" A directive for including a matplotlib plot in a Sphinx document. By default, in HTML output, `plot` will include a .png file with a link to a high-res .png and .pdf. In LaTeX output, it will include a .pdf. The source code for the plot may be included in one of three ways: 1. **A path to a source file** as the argument to the directive:: .. plot:: path/to/plot.py When a path to a source file is given, the content of the directive may optionally contain a caption for the plot:: .. plot:: path/to/plot.py This is the caption for the plot Additionally, one my specify the name of a function to call (with no arguments) immediately after importing the module:: .. plot:: path/to/plot.py plot_function1 2. Included as **inline content** to the directive:: .. plot:: import matplotlib.pyplot as plt import matplotlib.image as mpimg import numpy as np img = mpimg.imread('_static/stinkbug.png') imgplot = plt.imshow(img) 3. Using **doctest** syntax:: .. plot:: A plotting example: >>> import matplotlib.pyplot as plt >>> plt.plot([1,2,3], [4,5,6]) Options ------- The ``plot`` directive supports the following options: format : {'python', 'doctest'} Specify the format of the input include-source : bool Whether to display the source code. The default can be changed using the `plot_include_source` variable in conf.py encoding : str If this source file is in a non-UTF8 or non-ASCII encoding, the encoding must be specified using the `:encoding:` option. The encoding will not be inferred using the ``-*- coding -*-`` metacomment. context : bool or str If provided, the code will be run in the context of all previous plot directives for which the `:context:` option was specified. This only applies to inline code plot directives, not those run from files. If the ``:context: reset`` option is specified, the context is reset for this and future plots, and previous figures are closed prior to running the code. ``:context:close-figs`` keeps the context but closes previous figures before running the code. nofigs : bool If specified, the code block will be run, but no figures will be inserted. This is usually useful with the ``:context:`` option. Additionally, this directive supports all of the options of the `image` directive, except for `target` (since plot will add its own target). These include `alt`, `height`, `width`, `scale`, `align` and `class`. Configuration options --------------------- The plot directive has the following configuration options: plot_include_source Default value for the include-source option plot_html_show_source_link Whether to show a link to the source in HTML. plot_pre_code Code that should be executed before each plot. plot_basedir Base directory, to which ``plot::`` file names are relative to. (If None or empty, file names are relative to the directory where the file containing the directive is.) plot_formats File formats to generate. List of tuples or strings:: [(suffix, dpi), suffix, ...] that determine the file format and the DPI. For entries whose DPI was omitted, sensible defaults are chosen. plot_html_show_formats Whether to show links to the files in HTML. plot_rcparams A dictionary containing any non-standard rcParams that should be applied before each plot. plot_apply_rcparams By default, rcParams are applied when `context` option is not used in a plot directive. This configuration option overrides this behavior and applies rcParams before each plot. plot_working_directory By default, the working directory will be changed to the directory of the example, so the code can get at its data files, if any. Also its path will be added to `sys.path` so it can import any helper modules sitting beside it. This configuration option can be used to specify a central directory (also added to `sys.path`) where data files and helper modules for all code are located. plot_template Provide a customized template for preparing restructured text. """ from __future__ import (absolute_import, division, print_function, unicode_literals) import six from six.moves import xrange import sys, os, shutil, io, re, textwrap from os.path import relpath import traceback if not six.PY3: import cStringIO from docutils.parsers.rst import directives from docutils.parsers.rst.directives.images import Image align = Image.align import sphinx sphinx_version = sphinx.__version__.split(".") # The split is necessary for sphinx beta versions where the string is # '6b1' sphinx_version = tuple([int(re.split('[^0-9]', x)[0]) for x in sphinx_version[:2]]) try: # Sphinx depends on either Jinja or Jinja2 import jinja2 def format_template(template, **kw): return jinja2.Template(template).render(**kw) except ImportError: import jinja def format_template(template, **kw): return jinja.from_string(template, **kw) import matplotlib import matplotlib.cbook as cbook matplotlib.use('Agg') import matplotlib.pyplot as plt from matplotlib import _pylab_helpers __version__ = 2 #------------------------------------------------------------------------------ # Registration hook #------------------------------------------------------------------------------ def plot_directive(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine): return run(arguments, content, options, state_machine, state, lineno) plot_directive.__doc__ = __doc__ def _option_boolean(arg): if not arg or not arg.strip(): # no argument given, assume used as a flag return True elif arg.strip().lower() in ('no', '0', 'false'): return False elif arg.strip().lower() in ('yes', '1', 'true'): return True else: raise ValueError('"%s" unknown boolean' % arg) def _option_context(arg): if arg in [None, 'reset', 'close-figs']: return arg raise ValueError("argument should be None or 'reset' or 'close-figs'") def _option_format(arg): return directives.choice(arg, ('python', 'doctest')) def _option_align(arg): return directives.choice(arg, ("top", "middle", "bottom", "left", "center", "right")) def mark_plot_labels(app, document): """ To make plots referenceable, we need to move the reference from the "htmlonly" (or "latexonly") node to the actual figure node itself. """ for name, explicit in six.iteritems(document.nametypes): if not explicit: continue labelid = document.nameids[name] if labelid is None: continue node = document.ids[labelid] if node.tagname in ('html_only', 'latex_only'): for n in node: if n.tagname == 'figure': sectname = name for c in n: if c.tagname == 'caption': sectname = c.astext() break node['ids'].remove(labelid) node['names'].remove(name) n['ids'].append(labelid) n['names'].append(name) document.settings.env.labels[name] = \ document.settings.env.docname, labelid, sectname break def setup(app): setup.app = app setup.config = app.config setup.confdir = app.confdir options = {'alt': directives.unchanged, 'height': directives.length_or_unitless, 'width': directives.length_or_percentage_or_unitless, 'scale': directives.nonnegative_int, 'align': _option_align, 'class': directives.class_option, 'include-source': _option_boolean, 'format': _option_format, 'context': _option_context, 'nofigs': directives.flag, 'encoding': directives.encoding } app.add_directive('plot', plot_directive, True, (0, 2, False), **options) app.add_config_value('plot_pre_code', None, True) app.add_config_value('plot_include_source', False, True) app.add_config_value('plot_html_show_source_link', True, True) app.add_config_value('plot_formats', ['png', 'hires.png', 'pdf'], True) app.add_config_value('plot_basedir', None, True) app.add_config_value('plot_html_show_formats', True, True) app.add_config_value('plot_rcparams', {}, True) app.add_config_value('plot_apply_rcparams', False, True) app.add_config_value('plot_working_directory', None, True) app.add_config_value('plot_template', None, True) app.connect(str('doctree-read'), mark_plot_labels) #------------------------------------------------------------------------------ # Doctest handling #------------------------------------------------------------------------------ def contains_doctest(text): try: # check if it's valid Python as-is compile(text, '', 'exec') return False except SyntaxError: pass r = re.compile(r'^\s*>>>', re.M) m = r.search(text) return bool(m) def unescape_doctest(text): """ Extract code from a piece of text, which contains either Python code or doctests. """ if not contains_doctest(text): return text code = "" for line in text.split("\n"): m = re.match(r'^\s*(>>>|\.\.\.) (.*)$', line) if m: code += m.group(2) + "\n" elif line.strip(): code += "# " + line.strip() + "\n" else: code += "\n" return code def split_code_at_show(text): """ Split code at plt.show() """ parts = [] is_doctest = contains_doctest(text) part = [] for line in text.split("\n"): if (not is_doctest and line.strip() == 'plt.show()') or \ (is_doctest and line.strip() == '>>> plt.show()'): part.append(line) parts.append("\n".join(part)) part = [] else: part.append(line) if "\n".join(part).strip(): parts.append("\n".join(part)) return parts def remove_coding(text): """ Remove the coding comment, which six.exec_ doesn't like. """ sub_re = re.compile("^#\s*-\*-\s*coding:\s*.*-\*-$", flags=re.MULTILINE) return sub_re.sub("", text) #------------------------------------------------------------------------------ # Template #------------------------------------------------------------------------------ TEMPLATE = """ {{ source_code }} {{ only_html }} {% if source_link or (html_show_formats and not multi_image) %} ( {%- if source_link -%} `Source code <{{ source_link }}>`__ {%- endif -%} {%- if html_show_formats and not multi_image -%} {%- for img in images -%} {%- for fmt in img.formats -%} {%- if source_link or not loop.first -%}, {% endif -%} `{{ fmt }} <{{ dest_dir }}/{{ img.basename }}.{{ fmt }}>`__ {%- endfor -%} {%- endfor -%} {%- endif -%} ) {% endif %} {% for img in images %} .. figure:: {{ build_dir }}/{{ img.basename }}.png {% for option in options -%} {{ option }} {% endfor %} {% if html_show_formats and multi_image -%} ( {%- for fmt in img.formats -%} {%- if not loop.first -%}, {% endif -%} `{{ fmt }} <{{ dest_dir }}/{{ img.basename }}.{{ fmt }}>`__ {%- endfor -%} ) {%- endif -%} {{ caption }} {% endfor %} {{ only_latex }} {% for img in images %} {% if 'pdf' in img.formats -%} .. image:: {{ build_dir }}/{{ img.basename }}.pdf {% endif -%} {% endfor %} {{ only_texinfo }} {% for img in images %} .. image:: {{ build_dir }}/{{ img.basename }}.png {% for option in options -%} {{ option }} {% endfor %} {% endfor %} """ exception_template = """ .. htmlonly:: [`source code <%(linkdir)s/%(basename)s.py>`__] Exception occurred rendering plot. """ # the context of the plot for all directives specified with the # :context: option plot_context = dict() class ImageFile(object): def __init__(self, basename, dirname): self.basename = basename self.dirname = dirname self.formats = [] def filename(self, format): return os.path.join(self.dirname, "%s.%s" % (self.basename, format)) def filenames(self): return [self.filename(fmt) for fmt in self.formats] def out_of_date(original, derived): """ Returns True if derivative is out-of-date wrt original, both of which are full file paths. """ return (not os.path.exists(derived) or (os.path.exists(original) and os.stat(derived).st_mtime < os.stat(original).st_mtime)) class PlotError(RuntimeError): pass def run_code(code, code_path, ns=None, function_name=None): """ Import a Python module from a path, and run the function given by name, if function_name is not None. """ # Change the working directory to the directory of the example, so # it can get at its data files, if any. Add its path to sys.path # so it can import any helper modules sitting beside it. if six.PY2: pwd = os.getcwdu() else: pwd = os.getcwd() old_sys_path = list(sys.path) if setup.config.plot_working_directory is not None: try: os.chdir(setup.config.plot_working_directory) except OSError as err: raise OSError(str(err) + '\n`plot_working_directory` option in' 'Sphinx configuration file must be a valid ' 'directory path') except TypeError as err: raise TypeError(str(err) + '\n`plot_working_directory` option in ' 'Sphinx configuration file must be a string or ' 'None') sys.path.insert(0, setup.config.plot_working_directory) elif code_path is not None: dirname = os.path.abspath(os.path.dirname(code_path)) os.chdir(dirname) sys.path.insert(0, dirname) # Reset sys.argv old_sys_argv = sys.argv sys.argv = [code_path] # Redirect stdout stdout = sys.stdout if six.PY3: sys.stdout = io.StringIO() else: sys.stdout = cStringIO.StringIO() # Assign a do-nothing print function to the namespace. There # doesn't seem to be any other way to provide a way to (not) print # that works correctly across Python 2 and 3. def _dummy_print(*arg, **kwarg): pass try: try: code = unescape_doctest(code) if ns is None: ns = {} if not ns: if setup.config.plot_pre_code is None: six.exec_(six.text_type("import numpy as np\n" + "from matplotlib import pyplot as plt\n"), ns) else: six.exec_(six.text_type(setup.config.plot_pre_code), ns) ns['print'] = _dummy_print if "__main__" in code: six.exec_("__name__ = '__main__'", ns) code = remove_coding(code) six.exec_(code, ns) if function_name is not None: six.exec_(function_name + "()", ns) except (Exception, SystemExit) as err: raise PlotError(traceback.format_exc()) finally: os.chdir(pwd) sys.argv = old_sys_argv sys.path[:] = old_sys_path sys.stdout = stdout return ns def clear_state(plot_rcparams, close=True): if close: plt.close('all') matplotlib.rc_file_defaults() matplotlib.rcParams.update(plot_rcparams) def render_figures(code, code_path, output_dir, output_base, context, function_name, config, context_reset=False, close_figs=False): """ Run a pyplot script and save the low and high res PNGs and a PDF in *output_dir*. Save the images under *output_dir* with file names derived from *output_base* """ # -- Parse format list default_dpi = {'png': 80, 'hires.png': 200, 'pdf': 200} formats = [] plot_formats = config.plot_formats if isinstance(plot_formats, six.string_types): plot_formats = eval(plot_formats) for fmt in plot_formats: if isinstance(fmt, six.string_types): formats.append((fmt, default_dpi.get(fmt, 80))) elif type(fmt) in (tuple, list) and len(fmt)==2: formats.append((str(fmt[0]), int(fmt[1]))) else: raise PlotError('invalid image format "%r" in plot_formats' % fmt) # -- Try to determine if all images already exist code_pieces = split_code_at_show(code) # Look for single-figure output files first all_exists = True img = ImageFile(output_base, output_dir) for format, dpi in formats: if out_of_date(code_path, img.filename(format)): all_exists = False break img.formats.append(format) if all_exists: return [(code, [img])] # Then look for multi-figure output files results = [] all_exists = True for i, code_piece in enumerate(code_pieces): images = [] for j in xrange(1000): if len(code_pieces) > 1: img = ImageFile('%s_%02d_%02d' % (output_base, i, j), output_dir) else: img = ImageFile('%s_%02d' % (output_base, j), output_dir) for format, dpi in formats: if out_of_date(code_path, img.filename(format)): all_exists = False break img.formats.append(format) # assume that if we have one, we have them all if not all_exists: all_exists = (j > 0) break images.append(img) if not all_exists: break results.append((code_piece, images)) if all_exists: return results # We didn't find the files, so build them results = [] if context: ns = plot_context else: ns = {} if context_reset: clear_state(config.plot_rcparams) plot_context.clear() close_figs = not context or close_figs for i, code_piece in enumerate(code_pieces): if not context or config.plot_apply_rcparams: clear_state(config.plot_rcparams, close_figs) elif close_figs: plt.close('all') run_code(code_piece, code_path, ns, function_name) images = [] fig_managers = _pylab_helpers.Gcf.get_all_fig_managers() for j, figman in enumerate(fig_managers): if len(fig_managers) == 1 and len(code_pieces) == 1: img = ImageFile(output_base, output_dir) elif len(code_pieces) == 1: img = ImageFile("%s_%02d" % (output_base, j), output_dir) else: img = ImageFile("%s_%02d_%02d" % (output_base, i, j), output_dir) images.append(img) for format, dpi in formats: try: figman.canvas.figure.savefig(img.filename(format), dpi=dpi, bbox_inches="tight") except Exception as err: raise PlotError(traceback.format_exc()) img.formats.append(format) results.append((code_piece, images)) if not context or config.plot_apply_rcparams: clear_state(config.plot_rcparams, close=not context) return results def run(arguments, content, options, state_machine, state, lineno): # The user may provide a filename *or* Python code content, but not both if arguments and content: raise RuntimeError("plot:: directive can't have both args and content") document = state_machine.document config = document.settings.env.config nofigs = 'nofigs' in options options.setdefault('include-source', config.plot_include_source) keep_context = 'context' in options context_opt = None if not keep_context else options['context'] rst_file = document.attributes['source'] rst_dir = os.path.dirname(rst_file) if len(arguments): if not config.plot_basedir: source_file_name = os.path.join(setup.app.builder.srcdir, directives.uri(arguments[0])) else: source_file_name = os.path.join(setup.confdir, config.plot_basedir, directives.uri(arguments[0])) # If there is content, it will be passed as a caption. caption = '\n'.join(content) # If the optional function name is provided, use it if len(arguments) == 2: function_name = arguments[1] else: function_name = None with io.open(source_file_name, 'r', encoding='utf-8') as fd: code = fd.read() output_base = os.path.basename(source_file_name) else: source_file_name = rst_file code = textwrap.dedent("\n".join(map(str, content))) counter = document.attributes.get('_plot_counter', 0) + 1 document.attributes['_plot_counter'] = counter base, ext = os.path.splitext(os.path.basename(source_file_name)) output_base = '%s-%d.py' % (base, counter) function_name = None caption = '' base, source_ext = os.path.splitext(output_base) if source_ext in ('.py', '.rst', '.txt'): output_base = base else: source_ext = '' # ensure that LaTeX includegraphics doesn't choke in foo.bar.pdf filenames output_base = output_base.replace('.', '-') # is it in doctest format? is_doctest = contains_doctest(code) if 'format' in options: if options['format'] == 'python': is_doctest = False else: is_doctest = True # determine output directory name fragment source_rel_name = relpath(source_file_name, setup.confdir) source_rel_dir = os.path.dirname(source_rel_name) while source_rel_dir.startswith(os.path.sep): source_rel_dir = source_rel_dir[1:] # build_dir: where to place output files (temporarily) build_dir = os.path.join(os.path.dirname(setup.app.doctreedir), 'plot_directive', source_rel_dir) # get rid of .. in paths, also changes pathsep # see note in Python docs for warning about symbolic links on Windows. # need to compare source and dest paths at end build_dir = os.path.normpath(build_dir) if not os.path.exists(build_dir): os.makedirs(build_dir) # output_dir: final location in the builder's directory dest_dir = os.path.abspath(os.path.join(setup.app.builder.outdir, source_rel_dir)) if not os.path.exists(dest_dir): os.makedirs(dest_dir) # no problem here for me, but just use built-ins # how to link to files from the RST file dest_dir_link = os.path.join(relpath(setup.confdir, rst_dir), source_rel_dir).replace(os.path.sep, '/') build_dir_link = relpath(build_dir, rst_dir).replace(os.path.sep, '/') source_link = dest_dir_link + '/' + output_base + source_ext # make figures try: results = render_figures(code, source_file_name, build_dir, output_base, keep_context, function_name, config, context_reset=context_opt == 'reset', close_figs=context_opt == 'close-figs') errors = [] except PlotError as err: reporter = state.memo.reporter sm = reporter.system_message( 2, "Exception occurred in plotting %s\n from %s:\n%s" % (output_base, source_file_name, err), line=lineno) results = [(code, [])] errors = [sm] # Properly indent the caption caption = '\n'.join(' ' + line.strip() for line in caption.split('\n')) # generate output restructuredtext total_lines = [] for j, (code_piece, images) in enumerate(results): if options['include-source']: if is_doctest: lines = [''] lines += [row.rstrip() for row in code_piece.split('\n')] else: lines = ['.. code-block:: python', ''] lines += [' %s' % row.rstrip() for row in code_piece.split('\n')] source_code = "\n".join(lines) else: source_code = "" if nofigs: images = [] opts = [':%s: %s' % (key, val) for key, val in six.iteritems(options) if key in ('alt', 'height', 'width', 'scale', 'align', 'class')] only_html = ".. only:: html" only_latex = ".. only:: latex" only_texinfo = ".. only:: texinfo" # Not-None src_link signals the need for a source link in the generated # html if j == 0 and config.plot_html_show_source_link: src_link = source_link else: src_link = None result = format_template( config.plot_template or TEMPLATE, dest_dir=dest_dir_link, build_dir=build_dir_link, source_link=src_link, multi_image=len(images) > 1, only_html=only_html, only_latex=only_latex, only_texinfo=only_texinfo, options=opts, images=images, source_code=source_code, html_show_formats=config.plot_html_show_formats and not nofigs, caption=caption) total_lines.extend(result.split("\n")) total_lines.extend("\n") if total_lines: state_machine.insert_input(total_lines, source=source_file_name) # copy image files to builder's output directory, if necessary if not os.path.exists(dest_dir): cbook.mkdirs(dest_dir) for code_piece, images in results: for img in images: for fn in img.filenames(): destimg = os.path.join(dest_dir, os.path.basename(fn)) if fn != destimg: shutil.copyfile(fn, destimg) # copy script (if necessary) target_name = os.path.join(dest_dir, output_base + source_ext) with io.open(target_name, 'w', encoding="utf-8") as f: if source_file_name == rst_file: code_escaped = unescape_doctest(code) else: code_escaped = code f.write(code_escaped) return errors seaborn-0.6.0/doc/sphinxext/plot_generator.py000066400000000000000000000234631254425553400213640ustar00rootroot00000000000000""" Sphinx plugin to run example scripts and create a gallery page. Lightly modified from the mpld3 project. """ from __future__ import division import os import os.path as op import re import glob import token import tokenize import shutil import json import matplotlib matplotlib.use('Agg') import matplotlib.pyplot as plt from matplotlib import image RST_TEMPLATE = """ .. _{sphinx_tag}: {docstring} .. image:: {img_file} **Python source code:** :download:`[download source: {fname}]<{fname}>` .. literalinclude:: {fname} :lines: {end_line}- """ INDEX_TEMPLATE = """ .. raw:: html .. _{sphinx_tag}: Example gallery =============== {toctree} {contents} .. raw:: html
""" def create_thumbnail(infile, thumbfile, width=300, height=300, cx=0.5, cy=0.5, border=4): baseout, extout = op.splitext(thumbfile) im = image.imread(infile) rows, cols = im.shape[:2] x0 = int(cx * cols - .5 * width) y0 = int(cy * rows - .5 * height) xslice = slice(x0, x0 + width) yslice = slice(y0, y0 + height) thumb = im[yslice, xslice] thumb[:border, :, :3] = thumb[-border:, :, :3] = 0 thumb[:, :border, :3] = thumb[:, -border:, :3] = 0 dpi = 100 fig = plt.figure(figsize=(width / dpi, height / dpi), dpi=dpi) ax = fig.add_axes([0, 0, 1, 1], aspect='auto', frameon=False, xticks=[], yticks=[]) ax.imshow(thumb, aspect='auto', resample=True, interpolation='bilinear') fig.savefig(thumbfile, dpi=dpi) return fig def indent(s, N=4): """indent a string""" return s.replace('\n', '\n' + N * ' ') class ExampleGenerator(object): """Tools for generating an example page from a file""" def __init__(self, filename, target_dir): self.filename = filename self.target_dir = target_dir self.thumbloc = .5, .5 self.extract_docstring() with open(filename, "r") as fid: self.filetext = fid.read() outfilename = op.join(target_dir, self.rstfilename) # Only actually run it if the output RST file doesn't # exist or it was modified less recently than the example if (not op.exists(outfilename) or (op.getmtime(outfilename) < op.getmtime(filename))): self.exec_file() else: print("skipping {0}".format(self.filename)) @property def dirname(self): return op.split(self.filename)[0] @property def fname(self): return op.split(self.filename)[1] @property def modulename(self): return op.splitext(self.fname)[0] @property def pyfilename(self): return self.modulename + '.py' @property def rstfilename(self): return self.modulename + ".rst" @property def htmlfilename(self): return self.modulename + '.html' @property def pngfilename(self): pngfile = self.modulename + '.png' return "_images/" + pngfile @property def thumbfilename(self): pngfile = self.modulename + '_thumb.png' return pngfile @property def sphinxtag(self): return self.modulename @property def pagetitle(self): return self.docstring.strip().split('\n')[0].strip() @property def plotfunc(self): match = re.search(r"sns\.(.+plot)\(", self.filetext) if match: return match.group(1) match = re.search(r"sns\.(.+map)\(", self.filetext) if match: return match.group(1) match = re.search(r"sns\.(.+Grid)\(", self.filetext) if match: return match.group(1) return "" def extract_docstring(self): """ Extract a module-level docstring """ lines = open(self.filename).readlines() start_row = 0 if lines[0].startswith('#!'): lines.pop(0) start_row = 1 docstring = '' first_par = '' tokens = tokenize.generate_tokens(lines.__iter__().next) for tok_type, tok_content, _, (erow, _), _ in tokens: tok_type = token.tok_name[tok_type] if tok_type in ('NEWLINE', 'COMMENT', 'NL', 'INDENT', 'DEDENT'): continue elif tok_type == 'STRING': docstring = eval(tok_content) # If the docstring is formatted with several paragraphs, # extract the first one: paragraphs = '\n'.join(line.rstrip() for line in docstring.split('\n') ).split('\n\n') if len(paragraphs) > 0: first_par = paragraphs[0] break thumbloc = None for i, line in enumerate(docstring.split("\n")): m = re.match(r"^_thumb: (\.\d+),\s*(\.\d+)", line) if m: thumbloc = float(m.group(1)), float(m.group(2)) break if thumbloc is not None: self.thumbloc = thumbloc docstring = "\n".join([l for l in docstring.split("\n") if not l.startswith("_thumb")]) self.docstring = docstring self.short_desc = first_par self.end_line = erow + 1 + start_row def exec_file(self): print("running {0}".format(self.filename)) plt.close('all') my_globals = {'pl': plt, 'plt': plt} execfile(self.filename, my_globals) fig = plt.gcf() fig.canvas.draw() pngfile = op.join(self.target_dir, self.pngfilename) thumbfile = op.join("example_thumbs", self.thumbfilename) self.html = "" % self.pngfilename fig.savefig(pngfile, dpi=75, bbox_inches="tight") cx, cy = self.thumbloc create_thumbnail(pngfile, thumbfile, cx=cx, cy=cy) def toctree_entry(self): return " ./%s\n\n" % op.splitext(self.htmlfilename)[0] def contents_entry(self): return (".. raw:: html\n\n" " \n\n" "\n\n" "".format(self.htmlfilename, self.thumbfilename, self.plotfunc)) def main(app): static_dir = op.join(app.builder.srcdir, '_static') target_dir = op.join(app.builder.srcdir, 'examples') image_dir = op.join(app.builder.srcdir, 'examples/_images') thumb_dir = op.join(app.builder.srcdir, "example_thumbs") source_dir = op.abspath(op.join(app.builder.srcdir, '..', 'examples')) if not op.exists(static_dir): os.makedirs(static_dir) if not op.exists(target_dir): os.makedirs(target_dir) if not op.exists(image_dir): os.makedirs(image_dir) if not op.exists(thumb_dir): os.makedirs(thumb_dir) if not op.exists(source_dir): os.makedirs(source_dir) banner_data = [] toctree = ("\n\n" ".. toctree::\n" " :hidden:\n\n") contents = "\n\n" # Write individual example files for filename in glob.glob(op.join(source_dir, "*.py")): ex = ExampleGenerator(filename, target_dir) banner_data.append({"title": ex.pagetitle, "url": op.join('examples', ex.htmlfilename), "thumb": op.join(ex.thumbfilename)}) shutil.copyfile(filename, op.join(target_dir, ex.pyfilename)) output = RST_TEMPLATE.format(sphinx_tag=ex.sphinxtag, docstring=ex.docstring, end_line=ex.end_line, fname=ex.pyfilename, img_file=ex.pngfilename) with open(op.join(target_dir, ex.rstfilename), 'w') as f: f.write(output) toctree += ex.toctree_entry() contents += ex.contents_entry() if len(banner_data) < 10: banner_data = (4 * banner_data)[:10] # write index file index_file = op.join(target_dir, 'index.rst') with open(index_file, 'w') as index: index.write(INDEX_TEMPLATE.format(sphinx_tag="example_gallery", toctree=toctree, contents=contents)) def setup(app): app.connect('builder-inited', main) seaborn-0.6.0/doc/tutorial.rst000066400000000000000000000006161254425553400163240ustar00rootroot00000000000000.. _tutorial: Seaborn tutorial ================ Style management ---------------- .. toctree:: :maxdepth: 2 tutorial/aesthetics tutorial/color_palettes Plotting functions ------------------ .. toctree:: :maxdepth: 2 tutorial/distributions tutorial/regression tutorial/categorical Structured grids ---------------- .. toctree:: :maxdepth: 2 tutorial/axis_grids seaborn-0.6.0/doc/tutorial/000077500000000000000000000000001254425553400155675ustar00rootroot00000000000000seaborn-0.6.0/doc/tutorial/Makefile000066400000000000000000000003171254425553400172300ustar00rootroot00000000000000notebooks: tools/nb_to_doc.py aesthetics tools/nb_to_doc.py color_palettes tools/nb_to_doc.py distributions tools/nb_to_doc.py regression tools/nb_to_doc.py categorical tools/nb_to_doc.py axis_grids seaborn-0.6.0/doc/tutorial/aesthetics.ipynb000066400000000000000000000313671254425553400210000ustar00rootroot00000000000000{ "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ ".. _aesthetics_tutorial:\n", "\n", ".. currentmodule:: seaborn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Controlling figure aesthetics" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Drawing attractive figures is important. When making figures for yourself, as you explore a dataset, it's nice to have plots that are pleasant to look at. Visualizations are also central to communicating quantitative insights to an audience, and in that setting it's even more necessary to have figures that catch the attention and draw a viewer in.\n", "\n", "Matplotlib is highly customizable, but it can be hard to know what settings to tweak to achieve an attractive plot. Seaborn comes with a number of customized themes and a high-level interface for controlling the look of matplotlib figures." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "np.random.seed(sum(map(ord, \"aesthetics\")))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Let's define a simple function to plot some offset sine waves, which will help us see the different stylistic parameters we can tweak." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def sinplot(flip=1):\n", " x = np.linspace(0, 14, 100)\n", " for i in range(1, 7):\n", " plt.plot(x, np.sin(x + i * .5) * (7 - i) * flip)" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "This is what the plot looks like with matplotlib defaults:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sinplot()" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "To switch to seaborn defaults, simply import the package." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import seaborn as sns\n", "sinplot()" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "The seaborn defaults break from the MATLAB inspired aesthetic of matplotlib to plot in more muted colors over a light gray background with white grid lines. We find that the grid aids in the use of figures for conveying quantitative information – in almost all cases, figures should be preferred to tables. The white-on-gray grid that is used by default avoids being obtrusive. The grid is particularly useful in giving structure to figures with multiple facets, which is central to some of the more complex tools in the library.\n", "\n", "Seaborn splits matplotlib parameters into two independent groups. The first group sets the aesthetic style of the plot, and the second scales various elements of the figure so that it can be easily incorporated into different contexts.\n", "\n", "The interface for manipulating these parameters are two pairs of functions. To control the style, use the :func:`axes_style` and :func:`set_style` functions. To scale the plot, use the :func:`plotting_context` and :func:`set_context` functions. In both cases, the first function returns a dictionary of parameters and the second sets the matplotlib defaults.\n", "\n", ".. _axes_style:\n", "\n", "Styling figures with :func:`axes_style` and :func:`set_style`\n", "-------------------------------------------------------------\n", "\n", "There are five preset seaborn themes: ``darkgrid``, ``whitegrid``, ``dark``, ``white``, and ``ticks``. They are each suited to different applications and personal preferences. The default theme is ``darkgrid``. As mentioned above, the grid helps the plot serve as a lookup table for quantitative information, and the white-on grey helps to keep the grid from competing with lines that represent data. The ``whitegrid`` theme is similar, but it is better suited to plots with heavy data elements:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set_style(\"whitegrid\")\n", "data = np.random.normal(size=(20, 6)) + np.arange(6) / 2\n", "sns.boxplot(data=data);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "For many plots, (especially for settings like talks, where you primarily want to use figures to provide impressions of patterns in the data), the grid is less necessary." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set_style(\"dark\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sinplot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set_style(\"white\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sinplot()" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Sometimes you might want to give a little extra structure to the plots, which is where ticks come in handy:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set_style(\"ticks\")\n", "sinplot()" ] }, { "cell_type": "raw", "metadata": {}, "source": [ ".. _remove_spines:\n", "\n", "Removing spines with :func:`despine`\n", "------------------------------------\n", "\n", "Both the ``white`` and ``ticks`` styles can benefit from removing the top and right axes spines, which are not needed. It's impossible to do this through the matplotlib parameters, but you can call the seaborn function :func:`despine` to remove them:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sinplot()\n", "sns.despine()" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Some plots benefit from offsetting the spines away from the data, which can also be done when calling :func:`despine`. When the ticks don't cover the whole range of the axis, the ``trim`` parameter will limit the range of the surviving spines." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f, ax = plt.subplots()\n", "sns.violinplot(data)\n", "sns.despine(offset=10, trim=True);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "You can also control which spines are removed with additional arguments to :func:`despine`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set_style(\"whitegrid\")\n", "sns.boxplot(data=data, color=\"deep\")\n", "sns.despine(left=True)" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Temporarily setting figure style\n", "--------------------------------\n", "\n", "Although it's easy to switch back and forth, you can also use the :func:`axes_style` function in a ``with`` statement to temporarily set plot parameters. This also allows you to make figures with differently-styled axes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "with sns.axes_style(\"darkgrid\"):\n", " plt.subplot(211)\n", " sinplot()\n", "plt.subplot(212)\n", "sinplot(-1)" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Overriding elements of the seaborn styles\n", "-----------------------------------------\n", "\n", "If you want to customize the seaborn styles, you can pass a dictionary of parameters to the ``rc`` argument of :func:`axes_style` and :func:`set_style`. Note that you can only override the parameters that are part of the style definition through this method. (However, the higher-level :func:`set` function takes a dictionary of any matplotlib parameters).\n", "\n", "If you want to see what parameters are included, you can just call the function with no arguments, which will return the current settings:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.axes_style()" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "You can then set different versions of these parameters:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set_style(\"darkgrid\", {\"axes.facecolor\": \".9\"})\n", "sinplot()" ] }, { "cell_type": "raw", "metadata": {}, "source": [ ".. _plotting_context:\n", "\n", "Scaling plot elements with :func:`plotting_context` and :func:`set_context`\n", "---------------------------------------------------------------------------\n", "\n", "A separate set of parameters control the scale of plot elements, which should let you use the same code to make plots that are suited for use in settings where larger or smaller plots are appropriate.\n", "\n", "First let's reset the default parameters by calling :func:`set`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set()" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "The four preset contexts, in order of relative size, are ``paper``, ``notebook``, ``talk``, and ``poster``. The ``notebook`` style is the default, and was used in the plots above." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set_context(\"paper\")\n", "plt.figure(figsize=(8, 6))\n", "sinplot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set_context(\"talk\")\n", "plt.figure(figsize=(8, 6))\n", "sinplot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set_context(\"poster\")\n", "plt.figure(figsize=(8, 6))\n", "sinplot()" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Most of what you now know about the style functions should transfer to the context functions.\n", "\n", "You can call :func:`set_context` with one of these names to set the parameters, and you can override the parameters by providing a dictionary of parameter values.\n", "\n", "You can also independently scale the size of the font elements when changing the context. (This option is also available through the top-level :func:`set` function)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set_context(\"notebook\", font_scale=1.5, rc={\"lines.linewidth\": 2.5})\n", "sinplot()" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Similarly (although it might be less useful), you can temporarily control the scale of figures nested under a ``with`` statement.\n", "\n", "Both the style and the context can be quickly configured with the :func:`set` function. This function also sets the default color palette, but that will be covered in more detail in the :ref:`next section ` of the tutorial." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 } seaborn-0.6.0/doc/tutorial/axis_grids.ipynb000066400000000000000000000570461254425553400210020ustar00rootroot00000000000000{ "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ ".. _grid_tutorial:\n", "\n", ".. currentmodule:: seaborn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Plotting on data-aware grids" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "When exploring medium-dimensional data, a useful approach is to draw multiple instances of the same plot on different subsets of your dataset. This technique is sometimes called either \"lattice\", or `\"trellis\" `_ plotting, and it is related to the idea of `\"small multiples\" `_. It allows a viewer to quickly extract a large amount of information about complex data. Matplotlib offers good support for making figures with multiple axes; seaborn builds on top of this to directly link the structure of the plot to the structure of your dataset.\n", "\n", "To use these features, your data has to be in a Pandas DataFrame and it must take the form of what Hadley Whickam calls `\"tidy\" data `_. In brief, that means your dataframe should be structured such that each column is a variable and each row is an observation.\n", "\n", "For advanced use, you can use the objects discussed in this part of the tutorial directly, which will provide maximum flexibility. Some seaborn functions (such as :func:`lmplot`, :func:`factorplot`, and :func:`pairplot`) also use them behind the scenes. Unlike other seaborn functions that are \"Axes-level\" and draw onto specific (possibly already-existing) matplotlib ``Axes`` without otherwise manipulating the figure, these higher-level functions create a figure when called and are generally more strict about how it gets set up. In some cases, arguments either to those functions or to the constructor of the class they rely on will provide a different interface attributes like the figure size, as in the case of :func:`lmplot` where you can set the height and aspect ratio for each facet rather than the overall size of the figure. Any function that uses one of these objects will always return it after plotting, though, and most of these objects have convenience methods for changing how the plot, often in a more abstract and easy way." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "from scipy import stats\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set(style=\"ticks\")\n", "np.random.seed(sum(map(ord, \"axis_grids\")))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ ".. _facet_grid:\n", "\n", "Subsetting data with :class:`FacetGrid`\n", "---------------------------------------\n", "\n", "The :class:`FacetGrid` class is useful when you want to visualize the distribution of a variable or the relationship between multiple variables separately within subsets of your dataset. A :class:`FacetGrid` can be drawn with up to three dimensions: ``row``, ``col``, and ``hue``. The first two have obvious correspondence with the resulting array of axes; think of the hue variable as a third dimension along a depth axis, where different levels are plotted with different colors.\n", "\n", "The class is used by initializing a :class:`FacetGrid` object with a dataframe and the names of the variables that will form the row, column, or hue dimensions of the grid. These variables should be categorical or discrete, and then the data at each level of the variable will be used for a facet along that axis. For example, say we wanted to examine differences between lunch and dinner in the ``tips`` dataset.\n", "\n", "Additionally, both :func:`lmplot` and :func:`factorplot` use this object internally, and they return the object when they are finsihed so that it can be used for further tweaking." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "tips = sns.load_dataset(\"tips\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.FacetGrid(tips, col=\"time\")" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Initializing the grid like this sets up the matplotlib figure and axes, but doesn't draw anything on them.\n", "\n", "The main approach for visualizing data on this grid is with the :meth:`FacetGrid.map` method. Provide it with a plotting function and the name(s) of variable(s) in the dataframe to plot. Let's look at the distribution of tips in each of these subsets, using a histogram." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.FacetGrid(tips, col=\"time\")\n", "g.map(plt.hist, \"tip\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "This function will draw the figure and annotate the axes, hopefully producing a finished plot in one step. To make a relational plot, just pass multiple variable names. You can also provide keyword arguments, which will be passed to the plotting function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.FacetGrid(tips, col=\"sex\", hue=\"smoker\")\n", "g.map(plt.scatter, \"total_bill\", \"tip\", alpha=.7)\n", "g.add_legend();" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "There are several options for controlling the look of the grid that can be passed to the class constructor." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.FacetGrid(tips, row=\"smoker\", col=\"time\", margin_titles=True)\n", "g.map(sns.regplot, \"size\", \"total_bill\", color=\".3\", fit_reg=False, x_jitter=.1);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Note that ``margin_titles`` isn't formally supported by the matplotlib API, and may not work well in all cases. In particular, it currently can't be used with a legend that lies outside of the plot.\n", "\n", "The size of the figure is set by providing the height of *each* facet, along with the aspect ratio:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.FacetGrid(tips, col=\"day\", size=4, aspect=.5)\n", "g.map(sns.barplot, \"sex\", \"total_bill\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "With versions of matplotlib > 1.4, you can pass parameters to be used in the `gridspec` module. The can be used to draw attention to a particular facet by increasing its size. It's particularly useful when visualizing distributions of datasets with unequal numbers of groups in each facet." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "titanic = sns.load_dataset(\"titanic\")\n", "titanic = titanic.assign(deck=titanic.deck.astype(object)).sort(\"deck\")\n", "g = sns.FacetGrid(titanic, col=\"class\", sharex=False,\n", " gridspec_kws={\"width_ratios\": [5, 3, 3]})\n", "g.map(sns.boxplot, \"deck\", \"age\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "The default ordering of the facets is derived from the information in the DataFrame. If the variable used to define facets has a categorical type, then the order of the categories is used. Otherwise, the facets will be in the order of appearence of the category levels. It is possible, however, to specify an ordering of any facet dimension with the appropriate ``*_order`` parameter:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ordered_days = tips.day.value_counts().index\n", "g = sns.FacetGrid(tips, row=\"day\", row_order=ordered_days,\n", " size=1.7, aspect=4,)\n", "g.map(sns.distplot, \"total_bill\", hist=False, rug=True);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Any seaborn color palette (i.e., something that can be passed to :func:`color_palette()` can be provided. You can also use a dictionary that maps the names of values in the ``hue`` variable to valid matplotlib colors:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "pal = dict(Lunch=\"seagreen\", Dinner=\"gray\")\n", "g = sns.FacetGrid(tips, hue=\"time\", palette=pal, size=5)\n", "g.map(plt.scatter, \"total_bill\", \"tip\", s=50, alpha=.7, linewidth=.5, edgecolor=\"white\")\n", "g.add_legend();" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "You can also let other aspects of the plot vary across levels of the hue variable, which can be helpful for making plots that will be more comprehensible when printed in black-and-white. To do this, pass a dictionary to ``hue_kws`` where keys are the names of plotting function keyword arguments and values are lists of keyword values, one for each level of the hue variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.FacetGrid(tips, hue=\"sex\", palette=\"Set1\", size=5, hue_kws={\"marker\": [\"^\", \"v\"]})\n", "g.map(plt.scatter, \"total_bill\", \"tip\", s=100, linewidth=.5, edgecolor=\"white\")\n", "g.add_legend();" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "If you have many levels of one variable, you can plot it along the columns but \"wrap\" them so that they span multiple rows. When doing this, you cannot use a ``row`` variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "attend = sns.load_dataset(\"attention\").query(\"subject <= 12\")\n", "g = sns.FacetGrid(attend, col=\"subject\", col_wrap=4, size=2, ylim=(0, 10))\n", "g.map(sns.pointplot, \"solutions\", \"score\", color=\".3\", ci=None);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Once you've drawn a plot using :meth:`FacetGrid.map` (which can be called multiple times), you may want to adjust some aspects of the plot. There are also a number of methods on the :class:`FacetGrid` object for manipulating the figure at a higher level of abstraction. The most general is :meth:`FacetGrid.set`, and there are other more specialized methods like :meth:`FacetGrid.set_axis_labels`, which respects the fact that interior facets do not have axis labels. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "with sns.axes_style(\"white\"):\n", " g = sns.FacetGrid(tips, row=\"sex\", col=\"smoker\", margin_titles=True, size=2.5)\n", "g.map(plt.scatter, \"total_bill\", \"tip\", color=\"#334488\", edgecolor=\"white\", lw=.5);\n", "g.set_axis_labels(\"Total bill (US Dollars)\", \"Tip\");\n", "g.set(xticks=[10, 30, 50], yticks=[2, 6, 10]);\n", "g.fig.subplots_adjust(wspace=.02, hspace=.02);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "For even more customization, you can work directly with the underling matplotlib ``Figure`` and ``Axes`` objects, which are stored as member attributes at ``fig`` and ``axes`` (a two-dimensional array), respectively. When making a figure without row or column faceting, you can also use the ``ax`` attribute to directly access the single axes." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.FacetGrid(tips, col=\"smoker\", margin_titles=True, size=4)\n", "g.map(plt.scatter, \"total_bill\", \"tip\", color=\"#338844\", edgecolor=\"white\", s=50, lw=1)\n", "for ax in g.axes.flat:\n", " ax.plot((0, 50), (0, .2 * 50), c=\".2\", ls=\"--\")\n", "g.set(xlim=(0, 60), ylim=(0, 14));" ] }, { "cell_type": "raw", "metadata": {}, "source": [ ".. _custom_map_func:\n", "\n", "Mapping custom functions onto the grid\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n", "\n", "You're not limited to existing matplotlib and seaborn functions when using :class:`FacetGrid`. However, to work properly, any function you use must follow a few rules:\n", "\n", "1. It must plot onto the \"currently active\" matplotlib ``Axes``. This will be true of functions in the ``matplotlib.pyplot`` namespace, and you can call ``plt.gca`` to get a reference to the current ``Axes`` if you want to work directly with its methods.\n", "2. It must accept the data that it plots in positional arguments. Internally, :class:`FacetGrid` will pass a ``Series`` of data for each of the named positional arguments passed to :meth:`FacetGrid.map`.\n", "3. It must be able to accept ``color`` and ``label`` keyword arguments, and, ideally, it will do something useful with them. In most cases, it's easiest to catch a generic dictionary of ``**kwargs`` and pass it along to the underlying plotting function.\n", "\n", "Let's look at minimal example of a function you can plot with. This function will just take a single vector of data for each facet:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def quantile_plot(x, **kwargs):\n", " qntls, xr = stats.probplot(x, fit=False)\n", " plt.scatter(xr, qntls, **kwargs)\n", " \n", "g = sns.FacetGrid(tips, col=\"sex\", size=4)\n", "g.map(quantile_plot, \"total_bill\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "If we want to make a bivariate plot, you should write the function so that it accepts the x-axis variable first and the y-axis variable second:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def qqplot(x, y, **kwargs):\n", " _, xr = stats.probplot(x, fit=False)\n", " _, yr = stats.probplot(y, fit=False)\n", " plt.scatter(xr, yr, **kwargs)\n", " \n", "g = sns.FacetGrid(tips, col=\"smoker\", size=4)\n", "g.map(qqplot, \"total_bill\", \"tip\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Because ``plt.scatter`` accepts ``color`` and ``label`` keyword arguments and does the right thing with them, we can add a hue facet without any difficulty:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.FacetGrid(tips, hue=\"time\", col=\"sex\", size=4)\n", "g.map(qqplot, \"total_bill\", \"tip\")\n", "g.add_legend();" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "This approach also lets us use additional aesthetics to distinguish the levels of the hue variable, along with keyword arguments that won't be depdendent on the faceting variables:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.FacetGrid(tips, hue=\"time\", col=\"sex\", size=4,\n", " hue_kws={\"marker\": [\"s\", \"D\"]})\n", "g.map(qqplot, \"total_bill\", \"tip\", s=40, edgecolor=\"w\")\n", "g.add_legend();" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Sometimes, though, you'll want to map a function that doesn't work the way you expect with the ``color`` and ``label`` keyword arguments. In this case, you'll want to explictly catch them and handle them in the logic of your custom function. For example, this approach will allow use to map ``plt.hexbin``, which otherwise does not play well with the :class:`FacetGrid` API:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def hexbin(x, y, color, **kwargs):\n", " cmap = sns.light_palette(color, as_cmap=True)\n", " plt.hexbin(x, y, gridsize=15, cmap=cmap, **kwargs)\n", "\n", "with sns.axes_style(\"dark\"):\n", " g = sns.FacetGrid(tips, hue=\"time\", col=\"time\", size=4)\n", "g.map(hexbin, \"total_bill\", \"tip\", extent=[0, 50, 0, 10]);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ ".. _pair_grid:\n", "\n", "Plotting pairwise relationships with :class:`PairGrid` and :func:`pairplot`\n", "---------------------------------------------------------------------------\n", "\n", ":class:`PairGrid` also allows you to quickly draw a grid of small subplots using the same plot type to visualize data in each. In a :class:`PairGrid`, each row and column is assigned to a different variable, so the resulting plot shows each pairwise relationship in the dataset. This style of plot is sometimes called a \"scatterplot matrix\", as this is the most common way to show each relationship, but :class:`PairGrid` is not limited to scatterplots.\n", "\n", "It's important to understand the differences between a :class:`FacetGrid` and a :class:`PairGrid`. In the former, each facet shows the same relationship conditioned on different levels of other variables. In the latter, each plot shows a different relationship (although the upper and lower triangles will have mirrored plots). Using :class:`PairGrid` can give you a very quick, very high-level summary of interesting relationships in your dataset.\n", "\n", "The basic usage of the class is very similar to :class:`FacetGrid`. First you initialize the grid, then you pass plotting function to a ``map`` method and it will be called on each subplot. There is also a companion function, :func:`pairplot` that trades off some flexibility for faster plotting.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "iris = sns.load_dataset(\"iris\")\n", "g = sns.PairGrid(iris)\n", "g.map(plt.scatter);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "It's possible to plot a different function on the diagonal to show the univariate distribution of the variable in each column. Note that the axis ticks won't correspond to the count or density axis of this plot, though." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.PairGrid(iris)\n", "g.map_diag(plt.hist)\n", "g.map_offdiag(plt.scatter);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "A very common way to use this plot colors the observations by a separate categorical variable. For example, the iris dataset has four measurements for each of three different species of iris flowers so you can see how they differ." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.PairGrid(iris, hue=\"species\")\n", "g.map_diag(plt.hist)\n", "g.map_offdiag(plt.scatter)\n", "g.add_legend();" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "By default every numeric column in the dataset is used, but you can focus on particular relationships if you want." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.PairGrid(iris, vars=[\"sepal_length\", \"sepal_width\"], hue=\"species\")\n", "g.map(plt.scatter);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "It's also possible to use a different function in the upper and lower triangles to emphasize different aspects of the relationship." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.PairGrid(iris)\n", "g.map_upper(plt.scatter)\n", "g.map_lower(sns.kdeplot, cmap=\"Blues_d\")\n", "g.map_diag(sns.kdeplot, lw=3, legend=False);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "The square grid with identity relationships on the diagonal is actually just a special case, and you can plot with different variables in the rows and columns." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.PairGrid(tips, y_vars=[\"tip\"], x_vars=[\"total_bill\", \"size\"], size=4)\n", "g.map(sns.regplot, color=\".3\")\n", "g.set(ylim=(-1, 11), yticks=[0, 5, 10]);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Of course, the aesthetic attributes are configurable. For instance, you can use a different palette (say, to show an ordering of the ``hue`` variable) and pass keyword arguments into the plotting functions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.PairGrid(tips, hue=\"size\", palette=\"GnBu_d\")\n", "g.map(plt.scatter, s=50, edgecolor=\"white\")\n", "g.add_legend();" ] }, { "cell_type": "raw", "metadata": {}, "source": [ ":class:`PairGrid` is flexible, but to take a quick look at a dataset, it can be easier to use :func:`pairplot`. This function uses scatterplots and histograms by default, although a few other kinds will be added (currently, you can also plot regression plots on the off-diagonals and KDEs on the diagonal)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.pairplot(iris, hue=\"species\", size=2.5);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "You can also control the aesthetics of the plot with keyword arguments, and it returns the :class:`PairGrid` instance for further tweaking." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.pairplot(iris, hue=\"species\", palette=\"Set2\", diag_kind=\"kde\", size=2.5)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }seaborn-0.6.0/doc/tutorial/categorical.ipynb000066400000000000000000000441041254425553400211120ustar00rootroot00000000000000{ "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ ".. _categorical_tutorial:\n", "\n", ".. currentmodule:: seaborn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Plotting with categorical data" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "We :ref:`previously ` learned how to use scatterplots and regression model fits to visualize the relationship between two variables and how it changes across levels of additional categorical variables. However, what if one of the main variables you are interested in is categorical? In this case, the scatterplot and regression model approach won't work. There are several options, however, for visualizing such a relationship, which we will discuss in this tutorial.\n", "\n", "It's useful to divide seaborn's categorical plots into two groups: those that show the full distribution of observations within each level of the categorical variable, and those that apply a statistical estimation to show a measure of central tendency and confidence interval. The former includes the functions :func:`stripplot`, :func:`boxplot`, and :func:`violinplot`, while the latter includes the functions :func:`barplot`, :func:`countplot`, and :func:`pointplot`. These functions all share a basic API for how they accept data, although each has specific parameters that control the particulars of the visualization that is applied to that data.\n", "\n", "Much like the relationship between :func:`regplot` and :func:`lmplot`, in seaborn there are both relatively low-level and relatively high-level approaches for making categorical plots. The functions named above are all low-level in that they plot onto a specific matplotlib axes. There is also the higher-level :func:`factorplot`, which combines these functions with a :class:`FacetGrid` to apply a categorical plot across a grid of figure panels.\n", "\n", "It is easiest and best to invoke these functions with a DataFrame that is in `\"tidy\" `_ format, although the lower-level functions also accept wide-form DataFrames or simple vectors of observations. See below for examples." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import seaborn as sns\n", "sns.set(style=\"whitegrid\", color_codes=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "np.random.seed(sum(map(ord, \"categorical\")))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "titanic = sns.load_dataset(\"titanic\")\n", "tips = sns.load_dataset(\"tips\")\n", "iris = sns.load_dataset(\"iris\")" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Distributions of observations within categories\n", "-----------------------------------------------\n", "\n", "The first set of functions shows the full distribution of the quantitative variable within each level of the categorical variable(s). These generalize some of the approaches we discussed in the :ref:`chapter ` to the case where we want to quickly compare across several distributions.\n", "\n", "Categorical scatterplots\n", "^^^^^^^^^^^^^^^^^^^^^^^^\n", "\n", "A simple way to show the distribution of some quantitative variable across the levels of a categorical variable uses :func:`stripplot`, which generalizes a scatterplot to the case where one of the variables is categorical:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.stripplot(x=\"day\", y=\"total_bill\", data=tips);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "It's also possible to add a nested categorical variable with the ``hue`` paramater. Above the color and position on the categorical axis are redundent, but now each provides information about one of the two variables:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.stripplot(x=\"day\", y=\"total_bill\", hue=\"time\", data=tips);" ] }, { "cell_type": "raw", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.stripplot(x=\"size\", y=\"total_bill\", data=tips.sort(\"size\"));" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "With these plots, it's often helpful to put the categorical variable on the vertical axis (this is particularly useful when the category names are relatively long or there are many categories):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.stripplot(x=\"total_bill\", y=\"day\", hue=\"time\", data=tips);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Boxplots\n", "^^^^^^^^\n", "\n", "At a certain point, the categorical scatterplot approach becomes limited in the information it can provide about the distribution of values within each category. There are several ways to summarize this information in ways that facilitate easy comparisons across the category levels.\n", "\n", "The first is the familiar :func:`boxplot`. This kind of plot shows the three quartile values of the distribution along with extreme values. The \"whiskers\" extend to points that lie within 1.5 IQRs of the lower and upper quartile, and then observations that fall outside this range are displayed independently. Importantly, this means that each value in the boxplot corresponds to an actual observation in the data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.boxplot(x=\"day\", y=\"total_bill\", hue=\"time\", data=tips);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Violinplots\n", "^^^^^^^^^^^\n", "\n", "A different approach is a :func:`violinplot`, which combines a boxplot with the kernel density estimation procedure described in the :ref:`distributions tutorial `_:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.violinplot(x=\"total_bill\", y=\"day\", hue=\"time\", data=tips);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "This approach uses the kernel density estimate to provide a better description of the distribution of values. Additionally, the quartile and whikser values from the boxplot are shown inside the violin. Because the violinplot uses a KDE, there are some other parameters that may need tweaking, adding some complexity relative to the straightforward boxplot:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.violinplot(x=\"total_bill\", y=\"day\", hue=\"time\", data=tips,\n", " bw=.1, scale=\"count\", scale_hue=False);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "It's also possible to \"split\" the violins when the hue parameter has only two levels, which can allow for a more efficient use of space:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.violinplot(x=\"day\", y=\"total_bill\", hue=\"sex\", data=tips, split=True);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Finally, there are several options for the plot that is drawn on the interior of the violins, including ways to show each individual observation instead of the summary boxplot values:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.violinplot(x=\"day\", y=\"total_bill\", hue=\"sex\", data=tips,\n", " split=True, inner=\"stick\", palette=\"Set3\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "It can also be useful to combine :func:`stripplot` with :func:`violinplot` or :func:`boxplot` to show each observation along with a summary of the distribution:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.violinplot(x=\"day\", y=\"total_bill\", data=tips, inner=None)\n", "sns.stripplot(x=\"day\", y=\"total_bill\", data=tips, jitter=True, size=4);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Statistical estimation within categories\n", "----------------------------------------\n", "\n", "Often, rather than showing the distribution within each category, you might want to show the central tendency of the values. Seaborn has two main ways to show this information, but importantly, the basic API for these functions is identical to that for the ones discussed above.\n", "\n", "Bar plots\n", "^^^^^^^^^\n", "\n", "A familiar style of plot that accomplishes this goal is a bar plot. In seaborn, the :func:`barplot` function operates on a full dataset and shows an arbitrary estimate, using the mean by default. When there are multiple observations in each category, it also uses bootstrapping to compute a confidence interval around the estimate and plots that using error bars:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.barplot(x=\"sex\", y=\"survived\", hue=\"class\", data=titanic);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "A special case for the bar plot is when you want to show the number of observations in each category rather than computing a statistic for a second variable. This is similar to a histogram over a categorical, rather than quantitative, variable. In seaborn, it's easy to do so with the :func:`countplot` function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.countplot(x=\"deck\", data=titanic, palette=\"Greens_d\");" ] }, { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [ "Both :func:`barplot` and :func:`countplot` can be invoked with all of the options discussed above, along with others that are demonstrated in the detailed documentation for each function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.countplot(y=\"deck\", hue=\"class\", data=titanic, palette=\"Greens_d\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Point plots\n", "^^^^^^^^^^^\n", "\n", "An alternative style for visualizing the same information is offered by the :func:`pointplot` function. This function also encodes the value of the estimate with height on the other axis, but rather than show a full bar it just plots the point estimate and confidence interval. Additionally, pointplot connects points from the same ``hue`` category. This makes it easy to see how the main relationship is changing as a function of a second variable, because your eyes are quite good at picking up on differences of slopes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.pointplot(x=\"sex\", y=\"survived\", hue=\"class\", data=titanic);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "To make figures that reproduce well in black and white, it can be good to use different markers and line styles for the levels of the ``hue`` category:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.pointplot(x=\"class\", y=\"survived\", hue=\"sex\", data=titanic,\n", " palette={\"male\": \"g\", \"female\": \"m\"},\n", " markers=[\"^\", \"o\"], linestyles=[\"-\", \"--\"]);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Drawing multi-panel categorical plots\n", "-------------------------------------\n", "\n", "As we mentioned above, there are two ways to draw categorical plots in seaborn. Similar to the duality in the regression plots, you can either use the functions introduced above, or the higher-level function :func:`factorplot`, which combines these functions with a :func:`FacetGrid` to add the ability to examine additional categories through the larger structure of the figure.\n", "\n", "While the main options for each plot kind are available either way, the lower-level functions have a bit more flexibility in the kind of inputs they can take. For instance, you can just pass a ``DataFrame`` to the ``data`` parameter, and the distribution or central tendency of each *column* in the dataframe will be shown:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.boxplot(data=iris, orient=\"h\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Additionally, these functions accept vectors of Pandas or numpy objects rather than variables in a ``DataFrame``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.violinplot(x=iris.species, y=iris.sepal_length);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "To control the size and shape of plots made by the functions discussed above, you must set up the figure yourself using matplotlib commands. Of course, this also means that the plots can happily coexist in a multi-panel figure with other kinds of plots:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f, ax = plt.subplots(figsize=(7, 3))\n", "sns.countplot(y=\"deck\", data=titanic, color=\"c\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "The :func:`factorplot` function is a higher-level wrapper on these plots that produces a matplotlib figure managed through a :class:`FacetGrid`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.factorplot(x=\"day\", y=\"total_bill\", hue=\"smoker\", data=tips);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "By default it uses :func:`pairplot`, but the ``kind`` parameter lets you chose any of the kinds of plots discussed above:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.factorplot(x=\"day\", y=\"total_bill\", hue=\"smoker\", data=tips, kind=\"bar\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "The key advantage of :func:`factorplot` is that it's easy to add faceting by additional variables in the ``DataFrame``, such as along the columns:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.factorplot(x=\"day\", y=\"total_bill\", hue=\"smoker\",\n", " col=\"time\", data=tips, kind=\"bar\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Any kind of plot can be drawn. Because of the way :class:`FacetGrid` works, to change the size and shape of the figure you need to specify the ``size`` and ``aspect`` arguments, which apply to a single facet:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.factorplot(x=\"time\", y=\"total_bill\", hue=\"smoker\",\n", " col=\"day\", data=tips, kind=\"box\", size=4, aspect=.5);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Because of the generalized API of the categorical plots, they should be easy to apply to other more complex contexts. For example, they are easily combined with a :class:`PairGrid` to show categorical relationships across several different variables:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.PairGrid(tips,\n", " x_vars=[\"smoker\", \"time\", \"sex\"],\n", " y_vars=[\"total_bill\", \"tip\"],\n", " aspect=.75, size=3.5)\n", "g.map(sns.violinplot, palette=\"pastel\");" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }seaborn-0.6.0/doc/tutorial/color_palettes.ipynb000066400000000000000000000640771254425553400216670ustar00rootroot00000000000000{ "cells": [ { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _palette_tutorial:\n", "\n", ".. currentmodule:: seaborn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Choosing color palettes" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Color is more important than other aspects of figure style because color can reveal patterns in the data if used effectively or hide those patterns if used poorly. There are a number of great resources to learn about good techniques for using color in visualizations, I am partial to this `series of blog posts `_ from Rob Simmon and this `more technical paper `_. The matplotlib docs also now have a `nice tutorial `_ that illustrates some of the perceptual properties of the built in colormaps.\n", "\n", "Seaborn makes it easy to select and use color palettes that are suited to the kind of data you are working with and the goals you have in visualizing it." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set(rc={\"figure.figsize\": (6, 6)})\n", "np.random.seed(sum(map(ord, \"palettes\")))" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Building color palettes with :func:`color_palette`\n", "--------------------------------------------------\n", "\n", "The most important function for working with discrete color palettes is :func:`color_palette`. This function provides an interface to many (though not all) of the possible ways you can generate colors in seaborn, and it's used internally by any function that has a ``palette`` argument (and in some cases for a ``color`` argument when multiple colors are needed).\n", "\n", ":func:`color_palette` will accept the name of any seaborn palette or matplotlib colormap (except ``jet``, which you should never use). It can also take a list of colors specified in any valid matplotlib format (RGB tuples, hex color codes, or HTML color names). The return value is always a list of RGB tuples.\n", "\n", "Finally, calling :func:`color_palette` with no arguments will return the current default color cycle.\n", "\n", "A corresponding function, :func:`set_palette`, takes the same arguments and will set the default color cycle for all plots. You can also use :func:`color_palette` in a ``with`` statement to temporarily change the default palette (see :ref:`below `).\n", "\n", "It is generally not possible to know what kind of color palette or colormap is best for a set of data without knowing about the characteristics of the data. Following that, we'll break up the different ways to use :func:`color_palette` and other seaborn palette functions by the three general kinds of color palettes: *qualitative*, *sequential*, and *diverging*." ] }, { "cell_type": "raw", "metadata": {}, "source": [ ".. _qualitative_palettes:\n", "\n", "Qualitative color palettes\n", "--------------------------\n", "\n", "Qualitative (or categorical) palettes are best when you want to distinguish discrete chunks of data that do not have an inherent ordering.\n", "\n", "When importing seaborn, the default color cycle is changed to a set of six colors that evoke the standard matplotlib color cycle while aiming to be a bit more pleasing to look at." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "current_palette = sns.color_palette()\n", "sns.palplot(current_palette)" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "There are six variations of the default theme, called ``deep``, ``muted``, ``pastel``, ``bright``, ``dark``, and ``colorblind``.\n", "\n", "Using circular color systems\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n", "\n", "When you have more than six categories to distinguish, the easiest thing is to draw evenly-spaced colors in a circular color space (such that the hue changes which keeping the brightness and saturation constant). This is what most seaborn functions default to when they need to use more colors than are currently set in the default color cycle.\n", "\n", "The most common way to do this uses the ``hls`` color space, which is a simple transformation of RGB values." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.color_palette(\"hls\", 8))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "There is also the :func:`hls_palette` function that lets you control the lightness and saturation of the colors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.hls_palette(8, l=.3, s=.8))" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "However, because of the way the human visual system works, colors that are even \"intensity\" in terms of their RGB levels won't necessarily look equally intense. `We perceive `_ yellows and greens as relatively brighter and blues as relatively darker, which can be a problem when aiming for uniformity with the ``hls`` system.\n", "\n", "To remedy this, seaborn provides an interface to the `husl `_ system, which also makes it easy to select evenly spaced hues while keeping the apparent brightness and saturation much more uniform." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.color_palette(\"husl\", 8))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "There is similarly a function called :func:`husl_palette` that provides a more flexible interface to this system.\n", "\n", "Using categorical Color Brewer palettes\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n", "\n", "Another source of visually pleasing categorical palettes comes from the `Color Brewer `_ tool (which also has sequential and diverging palettes, as we'll see below). These also exist as matplotlib colormaps, but they are not handled properly. In seaborn, when you ask for a qualitative Color Brewer palette, you'll always get the discrete colors, but this means that at a certain point they will begin to cycle.\n", "\n", "A nice feature of the Color Brewer website is that it provides some guidance on which palettes are color blind safe. There are a variety of [kinds](http://en.wikipedia.org/wiki/Color_blindness) of color blindness, but the most common variant leads to difficulty distinguishing reds and greens. It's generally a good idea to avoid using red and green for plot elements that need to be discriminated based on color." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.color_palette(\"Paired\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.color_palette(\"Set2\", 10))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "To help you choose palettes from the Color Brewer library, there is the :func:`choose_colorbrewer_palette` function. This function, which must be used in an IPython notebook, will launch an interactive widget that lets you browse the various options and tweak their parameters.\n", "\n", "Of course, you might just want to use a set of colors you particularly like together. Because :func:`color_palette` accepts a list of colors, this is easy to do." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "flatui = [\"#9b59b6\", \"#3498db\", \"#95a5a6\", \"#e74c3c\", \"#34495e\", \"#2ecc71\"]\n", "sns.palplot(sns.color_palette(flatui))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ ".. _using_xkcd_palettes:\n", " \n", "Using named colors from the xkcd color survey\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n", "\n", "A while back, `xkcd `_ ran a `crowdsourced effort `_ to name random RGB colors. This produced a set of `954 named colors `_, which you can now reference in seaborn using the ``xkcd_rgb`` dictionary:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.plot([0, 1], [0, 1], sns.xkcd_rgb[\"pale red\"], lw=3)\n", "plt.plot([0, 1], [0, 2], sns.xkcd_rgb[\"medium green\"], lw=3)\n", "plt.plot([0, 1], [0, 3], sns.xkcd_rgb[\"denim blue\"], lw=3);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "If you want to spend some time picking colors, this `interactive visualization `_ may be useful. In addition to pulling out single colors from the ``xkcd_rgb`` dictionary, you can also pass a list of names to the :func:`xkcd_palette` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "colors = [\"windows blue\", \"amber\", \"greyish\", \"faded green\", \"dusty purple\"]\n", "sns.palplot(sns.xkcd_palette(colors))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ ".. _sequential_palettes:\n", "\n", "Sequential color palettes\n", "-------------------------\n", "\n", "The second major class of color palettes is called \"sequential\". This kind of color mapping is appropriate when data range from relatively low or unintersting values to relatively high or interesting values. Although there are cases where you will want discrete colors in a sequential palette, it's more common to use them as a colormap in functions like :func:`kdeplot` or :func:`corrplot` (along with similar matplotlib functions).\n", "\n", "It's common to see colormaps like ``jet`` (or other rainbow palettes) used in this case, becuase the range of hues gives the impression of providing additional information about the data. However, colormaps with large hue shifts tend to introduce discontinuities that don't exist in the data, and our visual system isn't able to naturally map the rainbow to quantitative distinctions like \"high\" or \"low\". The result is that these visualizations end up being more like a puzzle, and they obscure patterns in the data rather than revealing them. The jet palette is `particularly bad `_ because the brightest colors, yellow and cyan, are used for intermediate data values. This has the effect of emphasizing uninteresting (and arbitrary) values while demphasizing the extremes.\n", "\n", "For sequential data, it's better to use palettes that have at most a relatively subtle shift in hue accompanied by a large shift in brightness and saturation. This approach will naturally draw the eye to the relatively important parts of the data.\n", "\n", "The Color Brewer library has a great set of these palettes. They're named after the dominant color (or colors) in the palette." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.color_palette(\"Blues\"))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Like in matplotlib, if you want the lightness ramp to be reversed, you can add a ``_r`` suffix to the palette name." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.color_palette(\"BuGn_r\"))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Seaborn also adds a trick that allows you to create \"dark\" palettes, which do not have as wide a dynamic range. This can be useful if you want to map lines or points sequentially, as brighter-colored lines might otherwise be hard to distinguish." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.color_palette(\"GnBu_d\"))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Remember that you may want to use the :func:`choose_colorbrewer_palette` function to play with the various options, and you can set the ``as_cmap`` argument to ``True`` if you want the return value to be a colormap object that you can pass to seaborn or matplotlib functions." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _cubehelix_palettes:\n", "\n", "Sequential palettes with :func:`cubehelix_palette`\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n", "\n", "The `cubehelix `_ color palette system makes sequential palettes with a linear increase or decrease in brightness and some variation in hue. This means that the information in your colormap will be preserved when converted to black and white (for printing) or when viewed by a colorblind individual.\n", "\n", "Matplotlib has the default cubehelix version built into it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.color_palette(\"cubehelix\", 8))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Seaborn adds an interface to the cubehelix *system* so that you can make a variety of palettes that all have a well-behaved linear brightness ramp.\n", "\n", "The default palette returned by the seaborn :func:`cubehelix_palette` function is a bit different from the matplotlib default in that it does not rotate as far around the hue wheel or cover as wide a range of intensities. It also reverses the order so that more important values are darker:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.cubehelix_palette(8))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Other arguments to :func:`cubehelix_palette` control how the palette looks. The two main things you'll change are the ``start`` (a value between 0 and 3) and ``rot``, or number of rotations (an arbitrary value, but probably within -1 and 1)," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.cubehelix_palette(8, start=.5, rot=-.75))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "You can also control how dark and light the endpoints are and even reverse the ramp:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.cubehelix_palette(8, start=2, rot=0, dark=0, light=.95, reverse=True))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "By default you just get a list of colors, like any other seaborn palette, but you can also return the palette as a colormap object that can be passed to seaborn or matplotlib functions using ``as_cmap=True``." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x, y = np.random.multivariate_normal([0, 0], [[1, -.5], [-.5, 1]], size=300).T\n", "cmap = sns.cubehelix_palette(light=1, as_cmap=True)\n", "sns.kdeplot(x, y, cmap=cmap, shade=True);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "To help select good palettes or colormaps using this system, you can use the :func:`choose_cubehelix_palette` function in a notebook to launch an interactive app that will let you play with the different parameters. Pass ``as_cmap=True`` if you want the function to return a colormap (rather than a list) for use in function like ``hexbin``." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Custom sequential palettes with :func:`light_palette` and :func:`dark_palette`\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n", "\n", "For a simpler interface to custom sequential palettes, you can use :func:`light_palette` or :func:`dark_palette`, which are both seeded with a single color and produce a palette that ramps either from light or dark desaturated values to that color. These functions are also accompanied by the :func:`choose_light_palette` and :func:`choose_dark_palette` functions that launch interactive widgets to create these palettes." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.light_palette(\"green\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.dark_palette(\"purple\"))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "These palettes can also be reversed." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.light_palette(\"navy\", reverse=True))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "They can also be used to create colormap objects rather than lists of colors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "pal = sns.dark_palette(\"palegreen\", as_cmap=True)\n", "sns.kdeplot(x, y, cmap=pal);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "By default, the input can be any valid matplotlib color. Alternate interpretations are controlled by the ``input`` argument. Currently you can provide tuples in ``hls`` or ``husl`` space along with the default ``rgb``, and you can also seed the palette with any valid ``xkcd`` color." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.light_palette((210, 90, 60), input=\"husl\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.dark_palette(\"muted purple\", input=\"xkcd\"))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Note that the default input space for the interactive palette widgets is ``husl``, which is different from the default for the function itself, but much more useful in this context." ] }, { "cell_type": "raw", "metadata": {}, "source": [ ".. _diverging_palettes:\n", "\n", "Diverging color palettes\n", "------------------------\n", "\n", "The third class of color palettes is called \"diverging\". These are used for data where both large low and high values are interesting. There is also usually a well-defined midpoint in the data. For instance, if you are plotting changes in temperature from some baseline timepoint, it is best to use a diverging colormap to show areas with relative decreases and areas with relative increases.\n", "\n", "The rules for choosing good diverging palettes are similar to good sequential palettes, except now you want to have two relatively subtle hue shifts from distinct starting hues that meet in an under-emphasized color at the midpoint. It's also important that the starting values are of similar brightness and saturation.\n", "\n", "It's also important to emphasize here that using red and green should be avoided, as a substantial population of potential viewers will be `unable to distinguish them `_.\n", "\n", "It should not surprise you that the Color Brewer library comes with a set of well-choosen diverging colormaps." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.color_palette(\"BrBG\", 7))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.color_palette(\"RdBu_r\", 7))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Another good choice that is built into matplotlib is the ``coolwarm`` palette. Note that this colormap has less contrast between the middle values and the extremes." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.color_palette(\"coolwarm\", 7))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Custom diverging palettes with :func:`diverging_palette`\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n", "\n", "You can also use the seaborn function :func:`diverging_palette` to create a custom colormap for diverging data. (Naturally there is also a companion interactive widget, :func:`choose_diverging_palette`). This function makes diverging palettes using the ``husl`` color system. You pass it two hues (in degreees) and, optionally, the lightness and saturation values for the extremes. Using ``husl`` means that the extreme values, and the resulting ramps to the midpoint, will be well-balanced" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.diverging_palette(220, 20, n=7))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.diverging_palette(145, 280, s=85, l=25, n=7))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "The ``sep`` argument controls the width of the separation between the two ramps in the middle region of the palette." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.diverging_palette(10, 220, sep=80, n=7))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "It's also possible to make a palette with the midpoint is dark rather than light." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.palplot(sns.diverging_palette(255, 133, l=60, n=7, center=\"dark\"))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ ".. _palette_contexts:\n", "\n", "Changing default palettes with :func:`set_palette`\n", "--------------------------------------------------\n", "\n", "The :func:`color_palette` function has a companion called :func:`set_palette`. The relationship between them is similar to the pairs covered in the :ref:`aesthetics tutorial `. :func:`set_palette` accepts the same arguments as :func:`color_palette`, but it changes the default matplotlib parameters so that the palette is used for all plots." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def sinplot(flip=1):\n", " x = np.linspace(0, 14, 100)\n", " for i in range(1, 7):\n", " plt.plot(x, np.sin(x + i * .5) * (7 - i) * flip)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.set_palette(\"husl\")\n", "sinplot()" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "The :func:`color_palette` function can also be used in a ``with`` statement to temporarily change the color palette." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "with sns.color_palette(\"PuBuGn_d\"):\n", " sinplot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 }seaborn-0.6.0/doc/tutorial/distributions.ipynb000066400000000000000000000347211254425553400215430ustar00rootroot00000000000000{ "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ ".. _distribution_tutorial:\n", "\n", ".. currentmodule:: seaborn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualizing the distribution of a dataset" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "When dealing with a set of data, often the first thing you'll want to do is get a sense for how the variables are distributed. This chapter of the tutorial will give a brief introduction to some of the tools in seborn for examining univariate and bivariate distributions. You may also want to look at the :ref:`categorical plots ` chapter for examples of functions that make it easy to compare the distribution of a variable across levels of other variables." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from scipy import stats, integrate\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import seaborn as sns\n", "sns.set(color_codes=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.random.seed(sum(map(ord, \"distributions\")))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Plotting univariate distributions\n", "---------------------------------\n", "\n", "The most convenient way to take a quick look at a univariate distribution in seaborn is the :func:`distplot` function. By default, this will draw a `histogram `_ and fit a `kernel density estimate `_ (KDE). " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = np.random.normal(size=100)\n", "sns.distplot(x);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Histograms\n", "^^^^^^^^^^\n", "\n", "Histograms are likely familiar, and a ``hist`` function already exists in matplotlib. A histogram represents the distribution of data by forming bins along the range of the data and then drawing bars to show the number of observations that fall in each bin.\n", "\n", "To illustrate this, let's remove the density curve and add a rug plot, which draws a small vertical tick at each observation. You can make the rug plot itself with the :func:`rugplot` function, but it is also available in :func:`distplot`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.distplot(x, kde=False, rug=True);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "When drawing histograms, the main choice you have is the number of bins to use and where to place them. :func:`distplot` uses a simple rule to make a good guess for what the right number is by default, but trying more or fewer bins might reveal other features in the data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.distplot(x, bins=20, kde=False, rug=True);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Kernel density estimaton\n", "^^^^^^^^^^^^^^^^^^^^^^^^\n", "\n", "The kernel density estimate may be less familiar, but it can be a useful tool for plotting the shape of a distribution. Like the histogram, the KDE plots encodes the density of observations on one axis with height along the other axis:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.distplot(x, hist=False, rug=True);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Drawing a KDE is more computationally involved than drawing a histogram. What happens is that each observation is first replaced with a normal (Gaussian) curve centered at that value:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = np.random.normal(0, 1, size=30)\n", "bandwidth = 1.06 * x.std() * x.size ** (-1 / 5.)\n", "support = np.linspace(-4, 4, 200)\n", "\n", "kernels = []\n", "for x_i in x:\n", "\n", " kernel = stats.norm(x_i, bandwidth).pdf(support)\n", " kernels.append(kernel)\n", " plt.plot(support, kernel, color=\"r\")\n", "\n", "sns.rugplot(x, color=\".2\", linewidth=3);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Next, these curves are summed to compute the value of the density at each point in the support grid. The resulting curve is then normalized so that the area under it is equal to 1:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "density = np.sum(kernels, axis=0)\n", "density /= integrate.trapz(density, support)\n", "plt.plot(support, density);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "We can see that if we use the :func:`kdeplot` function in seaborn, we get the same curve. This function is used by :func:`distplot`, but it provides a more direct interface with easier access to other options when you just want the density estimate:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.kdeplot(x, shade=True);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "The bandwidth (``bw``) parameter of the KDE controls how tightly the estimation is fit to the data, much like the bin size in a histogram. It corresponds to the width of the kernels we plotted above. The default behavior tries to guess a good value using a common reference rule, but it may be helpful to try larger or smaller values:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.kdeplot(x)\n", "sns.kdeplot(x, bw=.2, label=\"bw: 0.2\")\n", "sns.kdeplot(x, bw=2, label=\"bw: 2\")\n", "plt.legend();" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "As you can see above, the nature of the Gaussian KDE process means that estimation extends past the largest and smallest values in the dataset. It's possible to control how far past the extreme values the curve is drawn with the ``cut`` parameter; however, this only influences how the curve is drawn and not how it is fit:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.kdeplot(x, shade=True, cut=0)\n", "sns.rugplot(x);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Fitting parametric distributions\n", "^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\n", "You can also use :func:`distplot` to fit a parametric distribution to a dataset and visually evaluate how closely it corresponds to the observed data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = np.random.gamma(6, size=200)\n", "sns.distplot(x, kde=False, fit=stats.gamma);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Plotting bivariate distributions\n", "--------------------------------\n", "\n", "It can also be useful to visualize a bivariate distribution of two variables. The easiest way to do this in seaborn is to just the :func:`jointplot` function, which creates a multi-panel figure that shows both the bivariate (or joint) relationship between two variables along with the univariate (or marginal) distribution of each on separate axes." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mean, cov = [0, 1], [(1, .5), (.5, 1)]\n", "data = np.random.multivariate_normal(mean, cov, 200)\n", "df = pd.DataFrame(data, columns=[\"x\", \"y\"])" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Scatterplots\n", "^^^^^^^^^^^^\n", "\n", "The most familiar way to visualize a bivariate distribution is a scatterplot, where each observation is shown with point at the *x* and *y* values. This is analgous to a rug plot on two dimensions. You can draw a scatterplot with the matplotlib ``plt.scatter`` function, and it is also the default kind of plot shown by the :func:`jointplot` function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.jointplot(x=\"x\", y=\"y\", data=df);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Hexbin plots\n", "^^^^^^^^^^^^\n", "\n", "The bivariate analogue of a histogram is known as a \"hexbin\" plot, because it shows the counts of observations that fall within hexagonal bins. This plot works best with relatively large datasets. It's availible through the matplotlib ``plt.hexbin`` function and as a style in :func:`jointplot`. It looks best with a white background:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x, y = np.random.multivariate_normal(mean, cov, 1000).T\n", "with sns.axes_style(\"white\"):\n", " sns.jointplot(x=x, y=y, kind=\"hex\", color=\"k\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Kernel density estimation\n", "^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\n", "It is also posible to use the kernel density estimation procedure described above to visualize a bivariate distribution. In seaborn, this kind of plot is shown with a contour plot and is available as a style in :func:`jointplot`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.jointplot(x=\"x\", y=\"y\", data=df, kind=\"kde\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "You can also draw a two-dimensional kernel density plot with the :func:`kdeplot` function. This allows you to draw this kind of plot onto a specific (and possibly already existing) matplotlib axes, whereas the :func:`jointplot` function manages its own figure:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f, ax = plt.subplots(figsize=(6, 6))\n", "sns.kdeplot(df.x, df.y, ax=ax)\n", "sns.rugplot(df.x, color=\"g\", ax=ax)\n", "sns.rugplot(df.y, vertical=True, ax=ax);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "If you wish to show the bivariate density more continuously, you can simply increase the number of contour levels:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f, ax = plt.subplots(figsize=(6, 6))\n", "cmap = sns.cubehelix_palette(as_cmap=True, dark=0, light=1, reverse=True)\n", "sns.kdeplot(df.x, df.y, cmap=cmap, n_levels=60, shade=True);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "The :func:`jointplot` function uses a :class:`JointGrid` to manage the figure. For more flexibility, you may want to draw your figure by using :class:`JointGrid` directly. :func:`jointplot` returns the :class:`JointGrid` object after plotting, which you can use to add more layers or to tweak other aspects of the visualization:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.jointplot(x=\"x\", y=\"y\", data=df, kind=\"kde\", color=\"m\")\n", "g.plot_joint(plt.scatter, c=\"w\", s=30, linewidth=1, marker=\"+\")\n", "g.ax_joint.collections[0].set_alpha(0)\n", "g.set_axis_labels(\"$X$\", \"$Y$\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Visualizing pairwise relationships in a dataset\n", "-----------------------------------------------\n", "\n", "To plot multiple pairwise bivariate distributions in a dataset, you can use the :func:`pairplot` function. This creates a matrix of axes and shows the relationship for each pair of columns in a DataFrame. by default, it also draws the univariate distribution of each variable on the diagonal Axes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "iris = sns.load_dataset(\"iris\")\n", "sns.pairplot(iris);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Much like the relationship between :func:`jointplot` and :class:`JointGrid`, the :func:`pairplot` function is built on top of a :class:`PairGrid` object, which can be used directly for more flexibility:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = sns.PairGrid(iris)\n", "g.map_diag(sns.kdeplot)\n", "g.map_offdiag(sns.kdeplot, cmap=\"Blues_d\", n_levels=6);" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }seaborn-0.6.0/doc/tutorial/regression.ipynb000066400000000000000000000450601254425553400210170ustar00rootroot00000000000000{ "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ ".. _regression_tutorial:\n", "\n", ".. currentmodule:: seaborn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualizing linear relationships" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Many datasets contain multiple quantitative variables, and the goal of an analysis is often to relate those variables to each other. We :ref:`previously discussed ` functions that can accomplish this by showing the joint distribution of two variables. It can be very helpful, though, to use statistical models to estimate a simple relationship between two noisy sets of observations. The functions discussed in this chapter will do so through the common framework of linear regression.\n", "\n", "In the spirit of Tukey, the regression plots in seaborn are primarily intended to add a visual guide that helps to emphasize patterns in a dataset during exploratory data analyses. That is to say that seaborn is not itself a package for statistical analysis. To obtain quantitative measures related to the fit of regression models, you should use `statsmodels `_. The goal of seaborn, however, is to make exploring a dataset through visualization quick and easy, as doing so is just as (if not more) important than exploring a dataset through tables of statistics." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import seaborn as sns\n", "sns.set(color_codes=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.random.seed(sum(map(ord, \"regression\")))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "tips = sns.load_dataset(\"tips\")" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Functions to draw linear regression models\n", "------------------------------------------\n", "\n", "Two main functions in seaborn are used to visualize a linear relationship as determined through regression. These functions, :func:`regplot` and :func:`lmplot` are closely related, and share much of their core functionality. It is important to understand the ways they differ, however, so that you can quickly choose the correct tool for particular job.\n", "\n", "In the simplest invocation, both functions draw a scatterplot of two variables, ``x`` and ``y``, and then fit the regression model ``y ~ x`` and plot the resulting regression line and a 95% confidence interval for that regression:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.regplot(x=\"total_bill\", y=\"tip\", data=tips);" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"total_bill\", y=\"tip\", data=tips);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "You should note that the resulting plots are identical, except that the figure shapes are different. We will explain why this is shortly. For now, the other main difference to know about is that :func:`regplot` accepts the ``x`` and ``y`` variables in a variety of formats including simple numpy arrays, pandas ``Series`` objects, or as references to variables in a pandas ``DataFrame`` object passed to ``data``. In contrast, :func:`lmplot` has ``data`` as a required parameter and the ``x`` and ``y`` variables must be specified as strings. This data format is called \"long-form\" or `\"tidy\" `_ data. Other than this input flexibility, :func:`regplot` possesses a subset of :func:`lmplot`'s features, so we will demonstrate them using the latter.\n", "\n", "It's possible to fit a linear regression when one of the variables takes discrete values, however, the simple scatterplot produced by this kind of dataset is often not optimal:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"size\", y=\"tip\", data=tips);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "One option is to add some random noise (\"jitter\") to the discrete values to make the distribution of those values more clear. Note that jitter is applied only to the scatterplot data and does not influence the regression line fit itself:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"size\", y=\"tip\", data=tips, x_jitter=.05);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "A second option is to collapse over the observations in each discrete bin to plot an estimate of central tendency along with a confidence interval:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"size\", y=\"tip\", data=tips, x_estimator=np.mean);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Fitting different kinds of models\n", "---------------------------------\n", "\n", "The simple linear regression model used above is very simple to fit, however, it is not appropriate for some kinds of datasets. The `Anscombe's quartet `_ dataset shows a few examples where simple linear regression provides an identical estimate of a relationship where simple visual inspection clearly shows differences. For example, in the first case, the linear regression is a good model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "anscombe = sns.load_dataset(\"anscombe\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"x\", y=\"y\", data=anscombe.query(\"dataset == 'I'\"),\n", " ci=None, scatter_kws={\"s\": 80});" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "The linear relationship in the second dataset is the same, but the plot clearly shows that this is not a good model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"x\", y=\"y\", data=anscombe.query(\"dataset == 'II'\"),\n", " ci=None, scatter_kws={\"s\": 80});" ] }, { "cell_type": "raw", "metadata": { "collapsed": true }, "source": [ "In the presence of these kind of higher-order relationships, :func:`lmplot` and :func:`regplot` can fit a polynomial regression model to explore simple kinds of nonlinear trends in the dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"x\", y=\"y\", data=anscombe.query(\"dataset == 'II'\"),\n", " order=2, ci=None, scatter_kws={\"s\": 80});" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "A different problem is posed by \"outlier\" observations that deviate for some reason other than the main relationship under study:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"x\", y=\"y\", data=anscombe.query(\"dataset == 'III'\"),\n", " ci=None, scatter_kws={\"s\": 80});" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "In the presence of outliers, it can be useful to fit a robust regression, which uses a different loss function to downweight relatively large residuals:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"x\", y=\"y\", data=anscombe.query(\"dataset == 'III'\"),\n", " robust=True, ci=None, scatter_kws={\"s\": 80});" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "When the ``y`` variable is binary, simple linear regression also \"works\" but provides implausible predictions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "tips[\"big_tip\"] = (tips.tip / tips.total_bill) > .15\n", "sns.lmplot(x=\"total_bill\", y=\"big_tip\", data=tips,\n", " y_jitter=.03);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "The solution in this case is to fit a logistic regression, such that the regression line shows the estimated probability of ``y = 1`` for a given value of ``x``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"total_bill\", y=\"big_tip\", data=tips,\n", " logistic=True, y_jitter=.03);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Note that the logistic regression estimate is considerably more computationally intensive (this is true of robust regression as well) than simple regression, and as the confidence interval around the regression line is computed using a bootstrap procedure, you may wish to turn this off for faster iteration (using ``ci=False``).\n", "\n", "An altogether different approach is to fit a nonparametric regression using a `lowess smoother `_. This approach has the fewest assumptions, although it is computationally intensive and so currently confidence intervals are not computed at all:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"total_bill\", y=\"tip\", data=tips,\n", " lowess=True);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "The :func:`residplot` function can be a useful tool for checking whether the simple regression model is appropriate for a dataset. It fits and removes a simple linear regression and then plots the residual values for each observation. Ideally, these values should be randomly scattered around ``y = 0``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.residplot(x=\"x\", y=\"y\", data=anscombe.query(\"dataset == 'I'\"),\n", " scatter_kws={\"s\": 80});" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "If there is structure in the residuals, it suggests that simple linear regression is not appropriate:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.residplot(x=\"x\", y=\"y\", data=anscombe.query(\"dataset == 'II'\"),\n", " scatter_kws={\"s\": 80});" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Conditioning on other variables\n", "-------------------------------\n", "\n", "The plots above show many ways to explore the relationship between a pair of variables. Often, however, a more interesting question is \"how does the relationship between these two variables change as a function of a third variable?\" This is where the difference between :func:`regplot` and :func:`lmplot` appears. While :func:`regplot` always shows a single relationsihp, :func:`lmplot` combines :func:`regplot` with :class:`FacetGrid` to provide an easy interface to show a linear regression on \"faceted\" plots that allow you to explore interactions with up to three additional categorical variables.\n", "\n", "The best way to separate out a relationship is to plot both levels on the same axes and to use color to distinguish them:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"total_bill\", y=\"tip\", hue=\"smoker\", data=tips);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "In addition to color, it's possible to use different scatterplot markers to make plots the reproduce to black and white better. You also have full control over the colors used:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"total_bill\", y=\"tip\", hue=\"smoker\", data=tips,\n", " markers=[\"o\", \"x\"], palette=\"Set1\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "To add another variable, you can draw multiple \"facets\" which each level of the variable appearing in the rows or columns of the grid:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"total_bill\", y=\"tip\", hue=\"smoker\", col=\"time\", data=tips);" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"total_bill\", y=\"tip\", hue=\"smoker\",\n", " col=\"time\", row=\"sex\", data=tips);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Controlling the size and shape of the plot\n", "------------------------------------------\n", "\n", "Before we noted that the default plots made by :func:`regplot` and :func:`lmplot` look the same but on axes that have a different size and shape. This is because func:`regplot` is an \"axes-level\" function draws onto a specific axes. This means that you can make mutli-panel figures yourself and control exactly where the the regression plot goes. If no axes is provided, it simply uses the \"currently active\" axes, which is why the default plot has the same size and shape as most other matplotlib functions. To control the size, you need to create a figure object yourself." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f, ax = plt.subplots(figsize=(5, 6))\n", "sns.regplot(x=\"total_bill\", y=\"tip\", data=tips, ax=ax);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "In contrast, the size and shape of the :func:`lmplot` figure is controlled through the :class:`FacetGrid` interface using the ``size`` and ``aspect`` parameters, which apply to each *facet* in the plot, not to the overall figure itself:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.lmplot(x=\"total_bill\", y=\"tip\", col=\"day\", data=tips,\n", " col_wrap=2, size=3);" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [], "source": [ "sns.lmplot(x=\"total_bill\", y=\"tip\", col=\"day\", data=tips,\n", " aspect=.5);" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Plotting a regression in other contexts\n", "---------------------------------------\n", "\n", "A few other seaborn functions use :func:`regplot` in the context of a larger, more complex plot. The first is the :func:`jointplot` function that we introduced in the :ref:`distributions tutorial `. In addition to the plot styles previously discussed, :func:`jointplot` can use :func:`regplot` to show the linear regression fit on the joint axes by passing ``kind=\"reg\"``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.jointplot(x=\"total_bill\", y=\"tip\", data=tips, kind=\"reg\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Using the :func:`pairplot` function with ``kind=\"reg\"`` combines :func:`regplot` and :class:`PairGrid` to show the linear relationship between variables in a dataset. Take care to note how this is different from :func:`lmplot`. In the figure below, the two axes don't show the same relationship conditioned on two levels of a third variable; rather, :func:`PairGrid` is used to show multiple relationships between different pairings of the variables in a dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.pairplot(tips, x_vars=[\"total_bill\", \"size\"], y_vars=[\"tip\"],\n", " size=5, aspect=.8, kind=\"reg\");" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Like :func:`lmplot`, but unlike :func:`jointplot`, conditioning on an additional categorical variable is built into :func:`pairplot` using the ``hue`` parameter:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sns.pairplot(tips, x_vars=[\"total_bill\", \"size\"], y_vars=[\"tip\"],\n", " hue=\"smoker\", size=5, aspect=.8, kind=\"reg\");" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }seaborn-0.6.0/doc/tutorial/tools/000077500000000000000000000000001254425553400167275ustar00rootroot00000000000000seaborn-0.6.0/doc/tutorial/tools/nb_to_doc.py000077500000000000000000000012161254425553400212320ustar00rootroot00000000000000#! /usr/bin/env python """ Convert empty IPython notebook to a sphinx doc page. """ import sys from subprocess import check_call as sh def convert_nb(nbname): # Execute the notebook sh(["ipython", "nbconvert", "--to", "notebook", "--execute", "--inplace", nbname + ".ipynb"]) # Convert to .rst for Sphinx sh(["ipython", "nbconvert", "--to", "rst", nbname + ".ipynb"]) # Clear notebook output sh(["ipython", "nbconvert", "--to", "notebook", "--inplace", "--ClearOutputPreprocessor.enabled=True", nbname + ".ipynb"]) if __name__ == "__main__": for nbname in sys.argv[1:]: convert_nb(nbname) seaborn-0.6.0/doc/tutorial/tools/nbstripout000077500000000000000000000016521254425553400210720ustar00rootroot00000000000000#!/usr/bin/env python """strip outputs from an IPython Notebook Opens a notebook, strips its output, and writes the outputless version to the original file. Useful mainly as a git pre-commit hook for users who don't want to track output in VCS. This does mostly the same thing as the `Clear All Output` command in the notebook UI. """ import io import sys from IPython.nbformat import current def strip_output(nb): """strip the outputs from a notebook object""" for cell in nb.worksheets[0].cells: if 'outputs' in cell: cell['outputs'] = [] if 'prompt_number' in cell: cell['prompt_number'] = None return nb if __name__ == '__main__': filename = sys.argv[1] with io.open(filename, 'r', encoding='utf8') as f: nb = current.read(f, 'json') nb = strip_output(nb) with io.open(filename, 'w', encoding='utf8') as f: current.write(nb, f, 'json') seaborn-0.6.0/doc/whatsnew.rst000066400000000000000000000006711254425553400163220ustar00rootroot00000000000000.. _whatsnew: .. currentmodule:: seaborn What's new in the package ========================= A catalog of new features, improvements, and bug-fixes in each release. .. include:: releases/v0.6.0.txt .. include:: releases/v0.5.1.txt .. include:: releases/v0.5.0.txt .. include:: releases/v0.4.0.txt .. include:: releases/v0.3.1.txt .. include:: releases/v0.3.0.txt .. include:: releases/v0.2.1.txt .. include:: releases/v0.2.0.txt seaborn-0.6.0/examples/000077500000000000000000000000001254425553400147755ustar00rootroot00000000000000seaborn-0.6.0/examples/.gitignore000066400000000000000000000000201254425553400167550ustar00rootroot00000000000000*.html *_files/ seaborn-0.6.0/examples/anscombes_quartet.py000066400000000000000000000006461254425553400210740ustar00rootroot00000000000000""" Anscombe's quartet ================== _thumb: .4, .4 """ import seaborn as sns sns.set(style="ticks") # Load the example dataset for Anscombe's quartet df = sns.load_dataset("anscombe") # Show the results of a linear regression within each dataset sns.lmplot(x="x", y="y", col="dataset", hue="dataset", data=df, col_wrap=2, ci=None, palette="muted", size=4, scatter_kws={"s": 50, "alpha": 1}) seaborn-0.6.0/examples/color_palettes.py000066400000000000000000000015151254425553400203700ustar00rootroot00000000000000""" Color palette choices ===================== """ import numpy as np import seaborn as sns import matplotlib.pyplot as plt sns.set(style="white", context="talk") rs = np.random.RandomState(7) # Set up the matplotlib figure f, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(8, 6), sharex=True) # Generate some sequential data x = np.array(list("ABCDEFGHI")) y1 = np.arange(1, 10) sns.barplot(x, y1, palette="BuGn_d", ax=ax1) ax1.set_ylabel("Sequential") # Center the data to make it diverging y2 = y1 - 5 sns.barplot(x, y2, palette="RdBu_r", ax=ax2) ax2.set_ylabel("Diverging") # Randomly reorder the data to make it qualitative y3 = rs.choice(y1, 9, replace=False) sns.barplot(x, y3, palette="Set3", ax=ax3) ax3.set_ylabel("Qualitative") # Finalize the plot sns.despine(bottom=True) plt.setp(f.axes, yticks=[]) plt.tight_layout(h_pad=3) seaborn-0.6.0/examples/cubehelix_palette.py000066400000000000000000000013451254425553400210400ustar00rootroot00000000000000""" Different cubehelix palettes ============================ _thumb: .4, .65 """ import numpy as np import seaborn as sns import matplotlib.pyplot as plt sns.set(style="dark") rs = np.random.RandomState(50) # Set up the matplotlib figure f, axes = plt.subplots(3, 3, figsize=(9, 9), sharex=True, sharey=True) # Rotate the starting point around the cubehelix hue circle for ax, s in zip(axes.flat, np.linspace(0, 3, 10)): # Create a cubehelix colormap to use with kdeplot cmap = sns.cubehelix_palette(start=s, light=1, as_cmap=True) # Generate and plot a random bivariate dataset x, y = rs.randn(2, 50) sns.kdeplot(x, y, cmap=cmap, shade=True, cut=5, ax=ax) ax.set(xlim=(-3, 3), ylim=(-3, 3)) f.tight_layout() seaborn-0.6.0/examples/distplot_options.py000066400000000000000000000015671254425553400207750ustar00rootroot00000000000000""" Distribution plot options ========================= """ import numpy as np import seaborn as sns import matplotlib.pyplot as plt sns.set(style="white", palette="muted", color_codes=True) rs = np.random.RandomState(10) # Set up the matplotlib figure f, axes = plt.subplots(2, 2, figsize=(7, 7), sharex=True) sns.despine(left=True) # Generate a random univariate dataset d = rs.normal(size=100) # Plot a simple histogram with binsize determined automatically sns.distplot(d, kde=False, color="b", ax=axes[0, 0]) # Plot a kernel density estimate and rug plot sns.distplot(d, hist=False, rug=True, color="r", ax=axes[0, 1]) # Plot a filled kernel density estimate sns.distplot(d, hist=False, color="g", kde_kws={"shade": True}, ax=axes[1, 0]) # Plot a historgram and kernel density estimate sns.distplot(d, color="m", ax=axes[1, 1]) plt.setp(axes, yticks=[]) plt.tight_layout() seaborn-0.6.0/examples/elaborate_violinplot.py000066400000000000000000000020371254425553400215660ustar00rootroot00000000000000""" Violinplot from a wide-form dataset =================================== _thumb: .6, .45 """ import seaborn as sns import matplotlib.pyplot as plt sns.set(style="whitegrid") # Load the example dataset of brain network correlations df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0) # Pull out a specific subset of networks used_networks = [1, 3, 4, 5, 6, 7, 8, 11, 12, 13, 16, 17] used_columns = (df.columns.get_level_values("network") .astype(int) .isin(used_networks)) df = df.loc[:, used_columns] # Compute the correlation matrix and average over networks corr_df = df.corr().groupby(level="network").mean() corr_df.index = corr_df.index.astype(int) corr_df = corr_df.sort_index().T # Set up the matplotlib figure f, ax = plt.subplots(figsize=(11, 6)) # Draw a violinplot with a narrower bandwidth than the default sns.violinplot(data=corr_df, palette="Set3", bw=.2, cut=1, linewidth=1) # Finalize the figure ax.set(ylim=(-.7, 1.05)) sns.despine(left=True, bottom=True) seaborn-0.6.0/examples/faceted_histogram.py000066400000000000000000000006211254425553400210160ustar00rootroot00000000000000""" Facetting histograms by subsets of data ======================================= _thumb: .42, .57 """ import numpy as np import seaborn as sns import matplotlib.pyplot as plt sns.set(style="darkgrid") tips = sns.load_dataset("tips") g = sns.FacetGrid(tips, row="sex", col="time", margin_titles=True) bins = np.linspace(0, 60, 13) g.map(plt.hist, "total_bill", color="steelblue", bins=bins, lw=0) seaborn-0.6.0/examples/facets_with_custom_projection.py000066400000000000000000000013771254425553400235050ustar00rootroot00000000000000""" FacetGrid with custom projection ================================ _thumb: .33, .5 """ import matplotlib.pyplot as plt import numpy as np import pandas as pd import seaborn as sns sns.set() # Generate an example radial datast r = np.linspace(0, 10, num=100) df = pd.DataFrame({'r': r, 'slow': r, 'medium': 2 * r, 'fast': 4 * r}) # Convert the dataframe to long-form or "tidy" format df = pd.melt(df, id_vars=['r'], var_name='speed', value_name='theta') # Set up a grid of axes with a polar projection g = sns.FacetGrid(df, col="speed", hue="speed", subplot_kws=dict(projection='polar'), size=4.5, sharex=False, sharey=False, despine=False) # Draw a scatterplot onto each axes in the grid g.map(plt.scatter, "theta", "r") seaborn-0.6.0/examples/factorplot_bars.py000066400000000000000000000006521254425553400205360ustar00rootroot00000000000000""" Grouped barplots ================ _thumb: .45, .5 """ import seaborn as sns sns.set(style="whitegrid") # Load the example Titanic dataset titanic = sns.load_dataset("titanic") # Draw a nested barplot to show survival for class and sex g = sns.factorplot(x="class", y="survived", hue="sex", data=titanic, size=6, kind="bar", palette="muted") g.despine(left=True) g.set_ylabels("survival probability") seaborn-0.6.0/examples/grouped_boxplot.py000066400000000000000000000004741254425553400205700ustar00rootroot00000000000000""" Grouped boxplots ================ """ import seaborn as sns sns.set(style="ticks") # Load the example tips dataset tips = sns.load_dataset("tips") # Draw a nested boxplot to show bills by day and sex sns.boxplot(x="day", y="total_bill", hue="sex", data=tips, palette="PRGn") sns.despine(offset=10, trim=True) seaborn-0.6.0/examples/grouped_violinplots.py000066400000000000000000000007521254425553400214620ustar00rootroot00000000000000""" Grouped violinplots with split violins ====================================== _thumb: .5, .47 """ import seaborn as sns sns.set(style="whitegrid", palette="pastel", color_codes=True) # Load the example tips dataset tips = sns.load_dataset("tips") # Draw a nested violinplot and split the violins for easier comparison sns.violinplot(x="day", y="total_bill", hue="sex", data=tips, split=True, inner="quart", palette={"Male": "b", "Female": "y"}) sns.despine(left=True) seaborn-0.6.0/examples/heatmap_annotation.py000066400000000000000000000005421254425553400212210ustar00rootroot00000000000000""" Annotated heatmaps ================== """ import seaborn as sns sns.set() # Load the example flights dataset and conver to long-form flights_long = sns.load_dataset("flights") flights = flights_long.pivot("month", "year", "passengers") # Draw a heatmap with the numeric values in each cell sns.heatmap(flights, annot=True, fmt="d", linewidths=.5) seaborn-0.6.0/examples/hexbin_marginals.py000066400000000000000000000005621254425553400206640ustar00rootroot00000000000000""" Hexbin plot with marginal distributions ======================================= _thumb: .45, .4 """ import numpy as np from scipy.stats import kendalltau import seaborn as sns sns.set(style="ticks") rs = np.random.RandomState(11) x = rs.gamma(2, size=1000) y = -.5 * x + rs.normal(size=1000) sns.jointplot(x, y, kind="hex", stat_func=kendalltau, color="#4CB391") seaborn-0.6.0/examples/horizontal_barplot.py000066400000000000000000000015271254425553400212700ustar00rootroot00000000000000""" Horizontal bar plots ==================== """ import seaborn as sns import matplotlib.pyplot as plt sns.set(style="whitegrid") # Initialize the matplotlib figure f, ax = plt.subplots(figsize=(6, 15)) # Load the example car crash dataset crashes = sns.load_dataset("car_crashes").sort("total", ascending=False) # Plot the total crashes sns.set_color_codes("pastel") sns.barplot(x="total", y="abbrev", data=crashes, label="Total", color="b") # Plot the crashes where alcohol was involved sns.set_color_codes("muted") sns.barplot(x="alcohol", y="abbrev", data=crashes, label="Alcohol-involved", color="b") # Add a legend and informative axis label ax.legend(ncol=2, loc="lower right", frameon=True) ax.set(xlim=(0, 24), ylabel="", xlabel="Automobile collisions per billion miles") sns.despine(left=True, bottom=True) seaborn-0.6.0/examples/horizontal_boxplot.py000066400000000000000000000012261254425553400213100ustar00rootroot00000000000000""" Horizontal boxplot with observations ==================================== _thumb: .7, .45 """ import numpy as np import seaborn as sns sns.set(style="ticks", palette="muted", color_codes=True) # Load the example planets dataset planets = sns.load_dataset("planets") # Plot the orbital period with horizontal boxes ax = sns.boxplot(x="distance", y="method", data=planets, whis=np.inf, color="c") # Add in points to show each observation sns.stripplot(x="distance", y="method", data=planets, jitter=True, size=3, color=".3", linewidth=0) # Make the quantitative axis logarithmic ax.set_xscale("log") sns.despine(trim=True) seaborn-0.6.0/examples/interactplot.py000066400000000000000000000010631254425553400200570ustar00rootroot00000000000000""" Continuous interactions ======================= """ import numpy as np import pandas as pd import seaborn as sns sns.set(style="darkgrid") # Generate a random dataset with strong simple effects and an interaction n = 80 rs = np.random.RandomState(11) x1 = rs.randn(n) x2 = x1 / 5 + rs.randn(n) b0, b1, b2, b3 = .5, .25, -1, 2 y = b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2 + rs.randn(n) df = pd.DataFrame(np.c_[x1, x2, y], columns=["x1", "x2", "y"]) # Show a scatterplot of the predictors with the estimated model surface sns.interactplot("x1", "x2", "y", df) seaborn-0.6.0/examples/joint_kde.py000066400000000000000000000010111254425553400173060ustar00rootroot00000000000000""" Joint kernel density estimate ============================= _thumb: .6, .4 """ import numpy as np import pandas as pd import seaborn as sns sns.set(style="white") # Generate a random correlated bivariate dataset rs = np.random.RandomState(5) mean = [0, 0] cov = [(1, .5), (.5, 1)] x1, x2 = rs.multivariate_normal(mean, cov, 500).T x1 = pd.Series(x1, name="$X_1$") x2 = pd.Series(x2, name="$X_2$") # Show the joint distribution using kernel density estimation g = sns.jointplot(x1, x2, kind="kde", size=7, space=0) seaborn-0.6.0/examples/logistic_regression.py000066400000000000000000000010031254425553400214160ustar00rootroot00000000000000""" Faceted logistic regression =========================== _thumb: .58, .5 """ import seaborn as sns sns.set(style="darkgrid") # Load the example titanic dataset df = sns.load_dataset("titanic") # Make a custom palette with gendered colors pal = dict(male="#6495ED", female="#F08080") # Show the survival proability as a function of age and sex g = sns.lmplot(x="age", y="survived", col="sex", hue="sex", data=df, palette=pal, y_jitter=.02, logistic=True) g.set(xlim=(0, 80), ylim=(-.05, 1.05)) seaborn-0.6.0/examples/many_facets.py000066400000000000000000000020461254425553400176420ustar00rootroot00000000000000""" Plotting on a large number of facets ==================================== """ import numpy as np import pandas as pd import seaborn as sns import matplotlib.pyplot as plt sns.set(style="ticks") # Create a dataset with many short random walks rs = np.random.RandomState(4) pos = rs.randint(-1, 2, (20, 5)).cumsum(axis=1) pos -= pos[:, 0, np.newaxis] step = np.tile(range(5), 20) walk = np.repeat(range(20), 5) df = pd.DataFrame(np.c_[pos.flat, step, walk], columns=["position", "step", "walk"]) # Initialize a grid of plots with an Axes for each walk grid = sns.FacetGrid(df, col="walk", hue="walk", col_wrap=5, size=1.5) # Draw a horizontal line to show the starting point grid.map(plt.axhline, y=0, ls=":", c=".5") # Draw a line plot to show the trajectory of each random walk grid.map(plt.plot, "step", "position", marker="o", ms=4) # Adjust the tick positions and labels grid.set(xticks=np.arange(5), yticks=[-3, 3], xlim=(-.5, 4.5), ylim=(-3.5, 3.5)) # Adjust the arrangement of the plots grid.fig.tight_layout(w_pad=1) seaborn-0.6.0/examples/many_pairwise_correlations.py000066400000000000000000000017041254425553400230040ustar00rootroot00000000000000""" Plotting a diagonal correlation matrix ====================================== _thumb: .3, .6 """ from string import letters import numpy as np import pandas as pd import seaborn as sns import matplotlib.pyplot as plt sns.set(style="white") # Generate a large random dataset rs = np.random.RandomState(33) d = pd.DataFrame(data=rs.normal(size=(100, 26)), columns=list(letters[:26])) # Compute the correlation matrix corr = d.corr() # Generate a mask for the upper triangle mask = np.zeros_like(corr, dtype=np.bool) mask[np.triu_indices_from(mask)] = True # Set up the matplotlib figure f, ax = plt.subplots(figsize=(11, 9)) # Generate a custom diverging colormap cmap = sns.diverging_palette(220, 10, as_cmap=True) # Draw the heatmap with the mask and correct aspect ratio sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, square=True, xticklabels=5, yticklabels=5, linewidths=.5, cbar_kws={"shrink": .5}, ax=ax) seaborn-0.6.0/examples/marginal_ticks.py000066400000000000000000000010401254425553400203310ustar00rootroot00000000000000""" Scatterplot with marginal ticks =============================== _thumb: .65, .35 """ import numpy as np import seaborn as sns import matplotlib.pyplot as plt sns.set(style="white", color_codes=True) # Generate a random bivariate dataset rs = np.random.RandomState(9) mean = [0, 0] cov = [(1, 0), (0, 2)] x, y = rs.multivariate_normal(mean, cov, 100).T # Use JointGrid directly to draw a custom plot grid = sns.JointGrid(x, y, space=0, size=6, ratio=50) grid.plot_joint(plt.scatter, color="g") grid.plot_marginals(sns.rugplot, color="g") seaborn-0.6.0/examples/multiple_joint_kde.py000066400000000000000000000015631254425553400212350ustar00rootroot00000000000000""" Multiple bivariate KDE plots ============================ _thumb: .6, .4 """ import seaborn as sns import matplotlib.pyplot as plt sns.set(style="darkgrid") iris = sns.load_dataset("iris") # Subset the iris dataset by species setosa = iris.query("species == 'setosa'") virginica = iris.query("species == 'virginica'") # Set up the figure f, ax = plt.subplots(figsize=(8, 8)) ax.set_aspect("equal") # Draw the two density plots ax = sns.kdeplot(setosa.sepal_width, setosa.sepal_length, cmap="Reds", shade=True, shade_lowest=False) ax = sns.kdeplot(virginica.sepal_width, virginica.sepal_length, cmap="Blues", shade=True, shade_lowest=False) # Add labels to the plot red = sns.color_palette("Reds")[-2] blue = sns.color_palette("Blues")[-2] ax.text(2.5, 8.2, "virginica", size=16, color=blue) ax.text(3.8, 4.5, "setosa", size=16, color=red) seaborn-0.6.0/examples/multiple_regression.py000066400000000000000000000010751254425553400214450ustar00rootroot00000000000000""" Multiple linear regression ========================== """ import seaborn as sns sns.set(style="ticks", context="talk") # Load the example tips dataset tips = sns.load_dataset("tips") # Make a custom sequential palette using the cubehelix system pal = sns.cubehelix_palette(4, 1.5, .75, light=.6, dark=.2) # Plot tip as a function of toal bill across days g = sns.lmplot(x="total_bill", y="tip", hue="day", data=tips, palette=pal, size=7) # Use more informative axis labels than are provided by default g.set_axis_labels("Total bill ($)", "Tip ($)") seaborn-0.6.0/examples/network_correlations.py000066400000000000000000000013521254425553400216250ustar00rootroot00000000000000""" Correlation matrix heatmap ========================== """ import seaborn as sns import matplotlib.pyplot as plt sns.set(context="paper", font="monospace") # Load the datset of correlations between cortical brain networks df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0) corrmat = df.corr() # Set up the matplotlib figure f, ax = plt.subplots(figsize=(12, 9)) # Draw the heatmap using seaborn sns.heatmap(corrmat, vmax=.8, square=True) # Use matplotlib directly to emphasize known networks networks = corrmat.columns.get_level_values("network") for i, network in enumerate(networks): if i and network != networks[i - 1]: ax.axhline(len(networks) - i, c="w") ax.axvline(i, c="w") f.tight_layout() seaborn-0.6.0/examples/pair_grid_with_kde.py000066400000000000000000000005311254425553400211640ustar00rootroot00000000000000""" Paired Density and Scatterplot Matrix ===================================== _thumb: .5, .5 """ import seaborn as sns import matplotlib.pyplot as plt sns.set(style="white") df = sns.load_dataset("iris") g = sns.PairGrid(df, diag_sharey=False) g.map_lower(sns.kdeplot, cmap="Blues_d") g.map_upper(plt.scatter) g.map_diag(sns.kdeplot, lw=3) seaborn-0.6.0/examples/paired_pointplots.py000066400000000000000000000010241254425553400211030ustar00rootroot00000000000000""" Paired discrete plots ===================== """ import seaborn as sns sns.set(style="whitegrid") # Load the example Titanic dataset titanic = sns.load_dataset("titanic") # Set up a grid to plot survival probability against several variables g = sns.PairGrid(titanic, y_vars="survived", x_vars=["class", "sex", "who", "alone"], size=5, aspect=.5) # Draw a seaborn pointplot onto each Axes g.map(sns.pointplot, color=sns.xkcd_rgb["plum"]) g.set(ylim=(0, 1)) sns.despine(fig=g.fig, left=True) seaborn-0.6.0/examples/pairgrid_dotplot.py000066400000000000000000000020401254425553400207110ustar00rootroot00000000000000""" Dot plot with several variables =============================== _thumb: .3, .3 """ import seaborn as sns sns.set(style="whitegrid") # Load the dataset crashes = sns.load_dataset("car_crashes") # Make the PairGrid g = sns.PairGrid(crashes.sort("total", ascending=False), x_vars=crashes.columns[:-3], y_vars=["abbrev"], size=10, aspect=.25) # Draw a dot plot using the stripplot function g.map(sns.stripplot, size=10, orient="h", palette="Reds_r", edgecolor="gray") # Use the same x axis limits on all columns and add better labels g.set(xlim=(0, 25), xlabel="Crashes", ylabel="") # Use semantically meaningful titles for the columns titles = ["Total crashes", "Speeding crashes", "Alcohol crashes", "Not distracted crashes", "No previous crashes"] for ax, title in zip(g.axes.flat, titles): # Set a different title for each axes ax.set(title=title) # Make the grid horizontal instead of vertical ax.xaxis.grid(False) ax.yaxis.grid(True) sns.despine(left=True, bottom=True) seaborn-0.6.0/examples/pointplot_anova.py000066400000000000000000000006551254425553400205710ustar00rootroot00000000000000""" Plotting a three-way ANOVA ========================== _thumb: .42, .5 """ import seaborn as sns sns.set(style="whitegrid") # Load the example exercise dataset df = sns.load_dataset("exercise") # Draw a pointplot to show pulse as a function of three categorical factors g = sns.factorplot(x="time", y="pulse", hue="kind", col="diet", data=df, palette="YlGnBu_d", size=6, aspect=.75) g.despine(left=True) seaborn-0.6.0/examples/regression_marginals.py000066400000000000000000000005261254425553400215670ustar00rootroot00000000000000""" Linear regression with marginal distributions ============================================= _thumb: .5, .6 """ import seaborn as sns sns.set(style="darkgrid", color_codes=True) tips = sns.load_dataset("tips") g = sns.jointplot("total_bill", "tip", data=tips, kind="reg", xlim=(0, 60), ylim=(0, 12), color="r", size=7) seaborn-0.6.0/examples/residplot.py000066400000000000000000000005401254425553400173530ustar00rootroot00000000000000""" Plotting model residuals ======================== """ import numpy as np import seaborn as sns sns.set(style="whitegrid") # Make an example dataset with y ~ x rs = np.random.RandomState(7) x = rs.normal(2, 1, 75) y = 2 + 1.5 * x + rs.normal(0, 2, 75) # Plot the residuals after fitting a linear model sns.residplot(x, y, lowess=True, color="g") seaborn-0.6.0/examples/scatterplot_categorical.py000066400000000000000000000010211254425553400222420ustar00rootroot00000000000000""" Scatterplot with categorical variables ====================================== """ import pandas as pd import seaborn as sns sns.set(style="whitegrid", palette="pastel") # Load the example iris dataset iris = sns.load_dataset("iris") # "Melt" the dataset to "long-form" or "tidy" representation iris = pd.melt(iris, "species", var_name="measurement") # Draw a categorical scatterplot to show each observation sns.stripplot(x="measurement", y="value", hue="species", data=iris, jitter=True, edgecolor="gray") seaborn-0.6.0/examples/scatterplot_matrix.py000066400000000000000000000002351254425553400212770ustar00rootroot00000000000000""" Scatterplot Matrix ================== _thumb: .5, .4 """ import seaborn as sns sns.set() df = sns.load_dataset("iris") sns.pairplot(df, hue="species") seaborn-0.6.0/examples/simple_violinplots.py000066400000000000000000000007571254425553400213130ustar00rootroot00000000000000""" Violinplots with observations ============================= """ import numpy as np import seaborn as sns sns.set() # Create a random dataset across several variables rs = np.random.RandomState(0) n, p = 40, 8 d = rs.normal(0, 2, (n, p)) d += np.log(np.arange(1, p + 1)) * -5 + 10 # Use cubehelix to get a custom sequential palette pal = sns.cubehelix_palette(p, rot=-.5, dark=.3) # Show each distribution with both violins and points sns.violinplot(data=d, palette=pal, inner="points") seaborn-0.6.0/examples/structured_heatmap.py000066400000000000000000000024301254425553400212510ustar00rootroot00000000000000""" Discovering structure in heatmap data ===================================== _thumb: .4, .2 """ import pandas as pd import seaborn as sns sns.set(font="monospace") # Load the brain networks example dataset df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0) # Select a subset of the networks used_networks = [1, 5, 6, 7, 8, 11, 12, 13, 16, 17] used_columns = (df.columns.get_level_values("network") .astype(int) .isin(used_networks)) df = df.loc[:, used_columns] # Create a custom palette to identify the networks network_pal = sns.cubehelix_palette(len(used_networks), light=.9, dark=.1, reverse=True, start=1, rot=-2) network_lut = dict(zip(map(str, used_networks), network_pal)) # Convert the palette to vectors that will be drawn on the side of the matrix networks = df.columns.get_level_values("network") network_colors = pd.Series(networks).map(network_lut) # Create a custom colormap for the heatmap values cmap = sns.diverging_palette(h_neg=210, h_pos=350, s=90, l=30, as_cmap=True) # Draw the full plot sns.clustermap(df.corr(), row_colors=network_colors, linewidths=.5, col_colors=network_colors, figsize=(13, 13), cmap=cmap) seaborn-0.6.0/examples/timeseries_bootstrapped.py000066400000000000000000000010121254425553400223000ustar00rootroot00000000000000""" Plotting timeseries data with bootstrap resampling ================================================== """ import numpy as np import seaborn as sns sns.set(style="darkgrid", palette="Set2") # Create a noisy periodic dataset sines = [] rs = np.random.RandomState(8) for _ in range(15): x = np.linspace(0, 30 / 2, 30) y = np.sin(x) + rs.normal(0, 1.5) + rs.normal(0, .3, 30) sines.append(y) # Plot the average over replicates with bootstrap resamples sns.tsplot(sines, err_style="boot_traces", n_boot=500) seaborn-0.6.0/examples/timeseries_from_dataframe.py000066400000000000000000000005231254425553400225470ustar00rootroot00000000000000""" Timeseries from DataFrame ========================= """ import seaborn as sns sns.set(style="darkgrid") # Load the long-form example gammas dataset gammas = sns.load_dataset("gammas") # Plot the response with standard error sns.tsplot(data=gammas, time="timepoint", unit="subject", condition="ROI", value="BOLD signal") seaborn-0.6.0/examples/timeseries_of_barplots.py000066400000000000000000000010061254425553400221070ustar00rootroot00000000000000""" Barplot timeseries ================== _thumb: .6, .4 """ import numpy as np import seaborn as sns sns.set(style="white") # Load the example planets dataset planets = sns.load_dataset("planets") # Make a range of years to show categories with no observations years = np.arange(2000, 2015) # Draw a count plot to show the number of planets discovered each year g = sns.factorplot(x="year", data=planets, kind="count", palette="BuPu", size=6, aspect=1.5, order=years) g.set_xticklabels(step=2) seaborn-0.6.0/examples/tips.csv000066400000000000000000000230011254425553400164650ustar00rootroot00000000000000"total_bill","tip","sex","smoker","day","time","size" 16.99,1.01,"Female","No","Sun","Dinner",2 10.34,1.66,"Male","No","Sun","Dinner",3 21.01,3.5,"Male","No","Sun","Dinner",3 23.68,3.31,"Male","No","Sun","Dinner",2 24.59,3.61,"Female","No","Sun","Dinner",4 25.29,4.71,"Male","No","Sun","Dinner",4 8.77,2,"Male","No","Sun","Dinner",2 26.88,3.12,"Male","No","Sun","Dinner",4 15.04,1.96,"Male","No","Sun","Dinner",2 14.78,3.23,"Male","No","Sun","Dinner",2 10.27,1.71,"Male","No","Sun","Dinner",2 35.26,5,"Female","No","Sun","Dinner",4 15.42,1.57,"Male","No","Sun","Dinner",2 18.43,3,"Male","No","Sun","Dinner",4 14.83,3.02,"Female","No","Sun","Dinner",2 21.58,3.92,"Male","No","Sun","Dinner",2 10.33,1.67,"Female","No","Sun","Dinner",3 16.29,3.71,"Male","No","Sun","Dinner",3 16.97,3.5,"Female","No","Sun","Dinner",3 20.65,3.35,"Male","No","Sat","Dinner",3 17.92,4.08,"Male","No","Sat","Dinner",2 20.29,2.75,"Female","No","Sat","Dinner",2 15.77,2.23,"Female","No","Sat","Dinner",2 39.42,7.58,"Male","No","Sat","Dinner",4 19.82,3.18,"Male","No","Sat","Dinner",2 17.81,2.34,"Male","No","Sat","Dinner",4 13.37,2,"Male","No","Sat","Dinner",2 12.69,2,"Male","No","Sat","Dinner",2 21.7,4.3,"Male","No","Sat","Dinner",2 19.65,3,"Female","No","Sat","Dinner",2 9.55,1.45,"Male","No","Sat","Dinner",2 18.35,2.5,"Male","No","Sat","Dinner",4 15.06,3,"Female","No","Sat","Dinner",2 20.69,2.45,"Female","No","Sat","Dinner",4 17.78,3.27,"Male","No","Sat","Dinner",2 24.06,3.6,"Male","No","Sat","Dinner",3 16.31,2,"Male","No","Sat","Dinner",3 16.93,3.07,"Female","No","Sat","Dinner",3 18.69,2.31,"Male","No","Sat","Dinner",3 31.27,5,"Male","No","Sat","Dinner",3 16.04,2.24,"Male","No","Sat","Dinner",3 17.46,2.54,"Male","No","Sun","Dinner",2 13.94,3.06,"Male","No","Sun","Dinner",2 9.68,1.32,"Male","No","Sun","Dinner",2 30.4,5.6,"Male","No","Sun","Dinner",4 18.29,3,"Male","No","Sun","Dinner",2 22.23,5,"Male","No","Sun","Dinner",2 32.4,6,"Male","No","Sun","Dinner",4 28.55,2.05,"Male","No","Sun","Dinner",3 18.04,3,"Male","No","Sun","Dinner",2 12.54,2.5,"Male","No","Sun","Dinner",2 10.29,2.6,"Female","No","Sun","Dinner",2 34.81,5.2,"Female","No","Sun","Dinner",4 9.94,1.56,"Male","No","Sun","Dinner",2 25.56,4.34,"Male","No","Sun","Dinner",4 19.49,3.51,"Male","No","Sun","Dinner",2 38.01,3,"Male","Yes","Sat","Dinner",4 26.41,1.5,"Female","No","Sat","Dinner",2 11.24,1.76,"Male","Yes","Sat","Dinner",2 48.27,6.73,"Male","No","Sat","Dinner",4 20.29,3.21,"Male","Yes","Sat","Dinner",2 13.81,2,"Male","Yes","Sat","Dinner",2 11.02,1.98,"Male","Yes","Sat","Dinner",2 18.29,3.76,"Male","Yes","Sat","Dinner",4 17.59,2.64,"Male","No","Sat","Dinner",3 20.08,3.15,"Male","No","Sat","Dinner",3 16.45,2.47,"Female","No","Sat","Dinner",2 3.07,1,"Female","Yes","Sat","Dinner",1 20.23,2.01,"Male","No","Sat","Dinner",2 15.01,2.09,"Male","Yes","Sat","Dinner",2 12.02,1.97,"Male","No","Sat","Dinner",2 17.07,3,"Female","No","Sat","Dinner",3 26.86,3.14,"Female","Yes","Sat","Dinner",2 25.28,5,"Female","Yes","Sat","Dinner",2 14.73,2.2,"Female","No","Sat","Dinner",2 10.51,1.25,"Male","No","Sat","Dinner",2 17.92,3.08,"Male","Yes","Sat","Dinner",2 27.2,4,"Male","No","Thur","Lunch",4 22.76,3,"Male","No","Thur","Lunch",2 17.29,2.71,"Male","No","Thur","Lunch",2 19.44,3,"Male","Yes","Thur","Lunch",2 16.66,3.4,"Male","No","Thur","Lunch",2 10.07,1.83,"Female","No","Thur","Lunch",1 32.68,5,"Male","Yes","Thur","Lunch",2 15.98,2.03,"Male","No","Thur","Lunch",2 34.83,5.17,"Female","No","Thur","Lunch",4 13.03,2,"Male","No","Thur","Lunch",2 18.28,4,"Male","No","Thur","Lunch",2 24.71,5.85,"Male","No","Thur","Lunch",2 21.16,3,"Male","No","Thur","Lunch",2 28.97,3,"Male","Yes","Fri","Dinner",2 22.49,3.5,"Male","No","Fri","Dinner",2 5.75,1,"Female","Yes","Fri","Dinner",2 16.32,4.3,"Female","Yes","Fri","Dinner",2 22.75,3.25,"Female","No","Fri","Dinner",2 40.17,4.73,"Male","Yes","Fri","Dinner",4 27.28,4,"Male","Yes","Fri","Dinner",2 12.03,1.5,"Male","Yes","Fri","Dinner",2 21.01,3,"Male","Yes","Fri","Dinner",2 12.46,1.5,"Male","No","Fri","Dinner",2 11.35,2.5,"Female","Yes","Fri","Dinner",2 15.38,3,"Female","Yes","Fri","Dinner",2 44.3,2.5,"Female","Yes","Sat","Dinner",3 22.42,3.48,"Female","Yes","Sat","Dinner",2 20.92,4.08,"Female","No","Sat","Dinner",2 15.36,1.64,"Male","Yes","Sat","Dinner",2 20.49,4.06,"Male","Yes","Sat","Dinner",2 25.21,4.29,"Male","Yes","Sat","Dinner",2 18.24,3.76,"Male","No","Sat","Dinner",2 14.31,4,"Female","Yes","Sat","Dinner",2 14,3,"Male","No","Sat","Dinner",2 7.25,1,"Female","No","Sat","Dinner",1 38.07,4,"Male","No","Sun","Dinner",3 23.95,2.55,"Male","No","Sun","Dinner",2 25.71,4,"Female","No","Sun","Dinner",3 17.31,3.5,"Female","No","Sun","Dinner",2 29.93,5.07,"Male","No","Sun","Dinner",4 10.65,1.5,"Female","No","Thur","Lunch",2 12.43,1.8,"Female","No","Thur","Lunch",2 24.08,2.92,"Female","No","Thur","Lunch",4 11.69,2.31,"Male","No","Thur","Lunch",2 13.42,1.68,"Female","No","Thur","Lunch",2 14.26,2.5,"Male","No","Thur","Lunch",2 15.95,2,"Male","No","Thur","Lunch",2 12.48,2.52,"Female","No","Thur","Lunch",2 29.8,4.2,"Female","No","Thur","Lunch",6 8.52,1.48,"Male","No","Thur","Lunch",2 14.52,2,"Female","No","Thur","Lunch",2 11.38,2,"Female","No","Thur","Lunch",2 22.82,2.18,"Male","No","Thur","Lunch",3 19.08,1.5,"Male","No","Thur","Lunch",2 20.27,2.83,"Female","No","Thur","Lunch",2 11.17,1.5,"Female","No","Thur","Lunch",2 12.26,2,"Female","No","Thur","Lunch",2 18.26,3.25,"Female","No","Thur","Lunch",2 8.51,1.25,"Female","No","Thur","Lunch",2 10.33,2,"Female","No","Thur","Lunch",2 14.15,2,"Female","No","Thur","Lunch",2 16,2,"Male","Yes","Thur","Lunch",2 13.16,2.75,"Female","No","Thur","Lunch",2 17.47,3.5,"Female","No","Thur","Lunch",2 34.3,6.7,"Male","No","Thur","Lunch",6 41.19,5,"Male","No","Thur","Lunch",5 27.05,5,"Female","No","Thur","Lunch",6 16.43,2.3,"Female","No","Thur","Lunch",2 8.35,1.5,"Female","No","Thur","Lunch",2 18.64,1.36,"Female","No","Thur","Lunch",3 11.87,1.63,"Female","No","Thur","Lunch",2 9.78,1.73,"Male","No","Thur","Lunch",2 7.51,2,"Male","No","Thur","Lunch",2 14.07,2.5,"Male","No","Sun","Dinner",2 13.13,2,"Male","No","Sun","Dinner",2 17.26,2.74,"Male","No","Sun","Dinner",3 24.55,2,"Male","No","Sun","Dinner",4 19.77,2,"Male","No","Sun","Dinner",4 29.85,5.14,"Female","No","Sun","Dinner",5 48.17,5,"Male","No","Sun","Dinner",6 25,3.75,"Female","No","Sun","Dinner",4 13.39,2.61,"Female","No","Sun","Dinner",2 16.49,2,"Male","No","Sun","Dinner",4 21.5,3.5,"Male","No","Sun","Dinner",4 12.66,2.5,"Male","No","Sun","Dinner",2 16.21,2,"Female","No","Sun","Dinner",3 13.81,2,"Male","No","Sun","Dinner",2 17.51,3,"Female","Yes","Sun","Dinner",2 24.52,3.48,"Male","No","Sun","Dinner",3 20.76,2.24,"Male","No","Sun","Dinner",2 31.71,4.5,"Male","No","Sun","Dinner",4 10.59,1.61,"Female","Yes","Sat","Dinner",2 10.63,2,"Female","Yes","Sat","Dinner",2 50.81,10,"Male","Yes","Sat","Dinner",3 15.81,3.16,"Male","Yes","Sat","Dinner",2 7.25,5.15,"Male","Yes","Sun","Dinner",2 31.85,3.18,"Male","Yes","Sun","Dinner",2 16.82,4,"Male","Yes","Sun","Dinner",2 32.9,3.11,"Male","Yes","Sun","Dinner",2 17.89,2,"Male","Yes","Sun","Dinner",2 14.48,2,"Male","Yes","Sun","Dinner",2 9.6,4,"Female","Yes","Sun","Dinner",2 34.63,3.55,"Male","Yes","Sun","Dinner",2 34.65,3.68,"Male","Yes","Sun","Dinner",4 23.33,5.65,"Male","Yes","Sun","Dinner",2 45.35,3.5,"Male","Yes","Sun","Dinner",3 23.17,6.5,"Male","Yes","Sun","Dinner",4 40.55,3,"Male","Yes","Sun","Dinner",2 20.69,5,"Male","No","Sun","Dinner",5 20.9,3.5,"Female","Yes","Sun","Dinner",3 30.46,2,"Male","Yes","Sun","Dinner",5 18.15,3.5,"Female","Yes","Sun","Dinner",3 23.1,4,"Male","Yes","Sun","Dinner",3 15.69,1.5,"Male","Yes","Sun","Dinner",2 19.81,4.19,"Female","Yes","Thur","Lunch",2 28.44,2.56,"Male","Yes","Thur","Lunch",2 15.48,2.02,"Male","Yes","Thur","Lunch",2 16.58,4,"Male","Yes","Thur","Lunch",2 7.56,1.44,"Male","No","Thur","Lunch",2 10.34,2,"Male","Yes","Thur","Lunch",2 43.11,5,"Female","Yes","Thur","Lunch",4 13,2,"Female","Yes","Thur","Lunch",2 13.51,2,"Male","Yes","Thur","Lunch",2 18.71,4,"Male","Yes","Thur","Lunch",3 12.74,2.01,"Female","Yes","Thur","Lunch",2 13,2,"Female","Yes","Thur","Lunch",2 16.4,2.5,"Female","Yes","Thur","Lunch",2 20.53,4,"Male","Yes","Thur","Lunch",4 16.47,3.23,"Female","Yes","Thur","Lunch",3 26.59,3.41,"Male","Yes","Sat","Dinner",3 38.73,3,"Male","Yes","Sat","Dinner",4 24.27,2.03,"Male","Yes","Sat","Dinner",2 12.76,2.23,"Female","Yes","Sat","Dinner",2 30.06,2,"Male","Yes","Sat","Dinner",3 25.89,5.16,"Male","Yes","Sat","Dinner",4 48.33,9,"Male","No","Sat","Dinner",4 13.27,2.5,"Female","Yes","Sat","Dinner",2 28.17,6.5,"Female","Yes","Sat","Dinner",3 12.9,1.1,"Female","Yes","Sat","Dinner",2 28.15,3,"Male","Yes","Sat","Dinner",5 11.59,1.5,"Male","Yes","Sat","Dinner",2 7.74,1.44,"Male","Yes","Sat","Dinner",2 30.14,3.09,"Female","Yes","Sat","Dinner",4 12.16,2.2,"Male","Yes","Fri","Lunch",2 13.42,3.48,"Female","Yes","Fri","Lunch",2 8.58,1.92,"Male","Yes","Fri","Lunch",1 15.98,3,"Female","No","Fri","Lunch",3 13.42,1.58,"Male","Yes","Fri","Lunch",2 16.27,2.5,"Female","Yes","Fri","Lunch",2 10.09,2,"Female","Yes","Fri","Lunch",2 20.45,3,"Male","No","Sat","Dinner",4 13.28,2.72,"Male","No","Sat","Dinner",2 22.12,2.88,"Female","Yes","Sat","Dinner",2 24.01,2,"Male","Yes","Sat","Dinner",4 15.69,3,"Male","Yes","Sat","Dinner",3 11.61,3.39,"Male","No","Sat","Dinner",2 10.77,1.47,"Male","No","Sat","Dinner",2 15.53,3,"Male","Yes","Sat","Dinner",2 10.07,1.25,"Male","No","Sat","Dinner",2 12.6,1,"Male","Yes","Sat","Dinner",2 32.83,1.17,"Male","Yes","Sat","Dinner",2 35.83,4.67,"Female","No","Sat","Dinner",3 29.03,5.92,"Male","No","Sat","Dinner",3 27.18,2,"Female","Yes","Sat","Dinner",2 22.67,2,"Male","Yes","Sat","Dinner",2 17.82,1.75,"Male","No","Sat","Dinner",2 18.78,3,"Female","No","Thur","Dinner",2 seaborn-0.6.0/licences/000077500000000000000000000000001254425553400147445ustar00rootroot00000000000000seaborn-0.6.0/licences/HUSL_LICENSE000066400000000000000000000020431254425553400166030ustar00rootroot00000000000000Copyright (C) 2012 Alexei Boronine Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. seaborn-0.6.0/licences/SIX_LICENSE000066400000000000000000000020521254425553400164730ustar00rootroot00000000000000Copyright (c) 2010-2014 Benjamin Peterson Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. seaborn-0.6.0/requirements.txt000066400000000000000000000000601254425553400164370ustar00rootroot00000000000000numpy scipy matplotlib pandas statsmodels patsy seaborn-0.6.0/seaborn/000077500000000000000000000000001254425553400146105ustar00rootroot00000000000000seaborn-0.6.0/seaborn/__init__.py000066400000000000000000000005171254425553400167240ustar00rootroot00000000000000from .rcmod import * from .utils import * from .palettes import * from .linearmodels import * from .categorical import * from .distributions import * from .timeseries import * from .matrix import * from .miscplot import * from .axisgrid import * from .xkcd_rgb import xkcd_rgb from .crayons import crayons set() __version__ = "0.6.0" seaborn-0.6.0/seaborn/algorithms.py000066400000000000000000000153511254425553400173400ustar00rootroot00000000000000"""Algorithms to support fitting routines in seaborn plotting functions.""" from __future__ import division import numpy as np from scipy import stats from .external.six.moves import range def bootstrap(*args, **kwargs): """Resample one or more arrays with replacement and store aggregate values. Positional arguments are a sequence of arrays to bootstrap along the first axis and pass to a summary function. Keyword arguments: n_boot : int, default 10000 Number of iterations axis : int, default None Will pass axis to ``func`` as a keyword argument. units : array, default None Array of sampling unit IDs. When used the bootstrap resamples units and then observations within units instead of individual datapoints. smooth : bool, default False If True, performs a smoothed bootstrap (draws samples from a kernel destiny estimate); only works for one-dimensional inputs and cannot be used `units` is present. func : callable, default np.mean Function to call on the args that are passed in. random_seed : int | None, default None Seed for the random number generator; useful if you want reproducible resamples. Returns ------- boot_dist: array array of bootstrapped statistic values """ # Ensure list of arrays are same length if len(np.unique(list(map(len, args)))) > 1: raise ValueError("All input arrays must have the same length") n = len(args[0]) # Default keyword arguments n_boot = kwargs.get("n_boot", 10000) func = kwargs.get("func", np.mean) axis = kwargs.get("axis", None) units = kwargs.get("units", None) smooth = kwargs.get("smooth", False) random_seed = kwargs.get("random_seed", None) if axis is None: func_kwargs = dict() else: func_kwargs = dict(axis=axis) # Initialize the resampler rs = np.random.RandomState(random_seed) # Coerce to arrays args = list(map(np.asarray, args)) if units is not None: units = np.asarray(units) # Do the bootstrap if smooth: return _smooth_bootstrap(args, n_boot, func, func_kwargs) if units is not None: return _structured_bootstrap(args, n_boot, units, func, func_kwargs, rs) boot_dist = [] for i in range(int(n_boot)): resampler = rs.randint(0, n, n) sample = [a.take(resampler, axis=0) for a in args] boot_dist.append(func(*sample, **func_kwargs)) return np.array(boot_dist) def _structured_bootstrap(args, n_boot, units, func, func_kwargs, rs): """Resample units instead of datapoints.""" unique_units = np.unique(units) n_units = len(unique_units) args = [[a[units == unit] for unit in unique_units] for a in args] boot_dist = [] for i in range(int(n_boot)): resampler = rs.randint(0, n_units, n_units) sample = [np.take(a, resampler, axis=0) for a in args] lengths = map(len, sample[0]) resampler = [rs.randint(0, n, n) for n in lengths] sample = [[c.take(r, axis=0) for c, r in zip(a, resampler)] for a in sample] sample = list(map(np.concatenate, sample)) boot_dist.append(func(*sample, **func_kwargs)) return np.array(boot_dist) def _smooth_bootstrap(args, n_boot, func, func_kwargs): """Bootstrap by resampling from a kernel density estimate.""" n = len(args[0]) boot_dist = [] kde = [stats.gaussian_kde(np.transpose(a)) for a in args] for i in range(int(n_boot)): sample = [a.resample(n).T for a in kde] boot_dist.append(func(*sample, **func_kwargs)) return np.array(boot_dist) def randomize_corrmat(a, tail="both", corrected=True, n_iter=1000, random_seed=None, return_dist=False): """Test the significance of set of correlations with permutations. By default this corrects for multiple comparisons across one side of the matrix. Parameters ---------- a : n_vars x n_obs array array with variables as rows tail : both | upper | lower whether test should be two-tailed, or which tail to integrate over corrected : boolean if True reports p values with respect to the max stat distribution n_iter : int number of permutation iterations random_seed : int or None seed for RNG return_dist : bool if True, return n_vars x n_vars x n_iter Returns ------- p_mat : float array of probabilites for actual correlation from null CDF """ if tail not in ["upper", "lower", "both"]: raise ValueError("'tail' must be 'upper', 'lower', or 'both'") rs = np.random.RandomState(random_seed) a = np.asarray(a, np.float) flat_a = a.ravel() n_vars, n_obs = a.shape # Do the permutations to establish a null distribution null_dist = np.empty((n_vars, n_vars, n_iter)) for i_i in range(n_iter): perm_i = np.concatenate([rs.permutation(n_obs) + (v * n_obs) for v in range(n_vars)]) a_i = flat_a[perm_i].reshape(n_vars, n_obs) null_dist[..., i_i] = np.corrcoef(a_i) # Get the observed correlation values real_corr = np.corrcoef(a) # Figure out p values based on the permutation distribution p_mat = np.zeros((n_vars, n_vars)) upper_tri = np.triu_indices(n_vars, 1) if corrected: if tail == "both": max_dist = np.abs(null_dist[upper_tri]).max(axis=0) elif tail == "lower": max_dist = null_dist[upper_tri].min(axis=0) elif tail == "upper": max_dist = null_dist[upper_tri].max(axis=0) cdf = lambda x: stats.percentileofscore(max_dist, x) / 100. for i, j in zip(*upper_tri): observed = real_corr[i, j] if tail == "both": p_ij = 1 - cdf(abs(observed)) elif tail == "lower": p_ij = cdf(observed) elif tail == "upper": p_ij = 1 - cdf(observed) p_mat[i, j] = p_ij else: for i, j in zip(*upper_tri): null_corrs = null_dist[i, j] cdf = lambda x: stats.percentileofscore(null_corrs, x) / 100. observed = real_corr[i, j] if tail == "both": p_ij = 2 * (1 - cdf(abs(observed))) elif tail == "lower": p_ij = cdf(observed) elif tail == "upper": p_ij = 1 - cdf(observed) p_mat[i, j] = p_ij # Make p matrix symettrical with nans on the diagonal p_mat += p_mat.T p_mat[np.diag_indices(n_vars)] = np.nan if return_dist: return p_mat, null_dist return p_mat seaborn-0.6.0/seaborn/apionly.py000066400000000000000000000000431254425553400166320ustar00rootroot00000000000000from seaborn import * reset_orig() seaborn-0.6.0/seaborn/axisgrid.py000066400000000000000000002022341254425553400167770ustar00rootroot00000000000000from __future__ import division from itertools import product from distutils.version import LooseVersion import warnings from textwrap import dedent import numpy as np import pandas as pd import matplotlib as mpl import matplotlib.pyplot as plt from six import string_types from . import utils from .palettes import color_palette class Grid(object): """Base class for grids of subplots.""" _margin_titles = False _legend_out = True def set(self, **kwargs): """Set attributes on each subplot Axes.""" for ax in self.axes.flat: ax.set(**kwargs) return self def savefig(self, *args, **kwargs): """Save the figure.""" kwargs = kwargs.copy() kwargs.setdefault("bbox_inches", "tight") self.fig.savefig(*args, **kwargs) def add_legend(self, legend_data=None, title=None, label_order=None, **kwargs): """Draw a legend, maybe placing it outside axes and resizing the figure. Parameters ---------- legend_data : dict, optional Dictionary mapping label names to matplotlib artist handles. The default reads from ``self._legend_data``. title : string, optional Title for the legend. The default reads from ``self._hue_var``. label_order : list of labels, optional The order that the legend entries should appear in. The default reads from ``self.hue_names`` or sorts the keys in ``legend_data``. kwargs : key, value pairings Other keyword arguments are passed to the underlying legend methods on the Figure or Axes object. Returns ------- self : Grid instance Returns self for easy chaining. """ # Find the data for the legend legend_data = self._legend_data if legend_data is None else legend_data if label_order is None: if self.hue_names is None: label_order = np.sort(list(legend_data.keys())) else: label_order = list(map(str, self.hue_names)) blank_handle = mpl.patches.Patch(alpha=0, linewidth=0) handles = [legend_data.get(l, blank_handle) for l in label_order] title = self._hue_var if title is None else title try: title_size = mpl.rcParams["axes.labelsize"] * .85 except TypeError: # labelsize is something like "large" title_size = mpl.rcParams["axes.labelsize"] # Set default legend kwargs kwargs.setdefault("scatterpoints", 1) if self._legend_out: # Draw a full-figure legend outside the grid figlegend = self.fig.legend(handles, label_order, "center right", **kwargs) self._legend = figlegend figlegend.set_title(title) # Set the title size a roundabout way to maintain # compatability with matplotlib 1.1 prop = mpl.font_manager.FontProperties(size=title_size) figlegend._legend_title_box._text.set_font_properties(prop) # Draw the plot to set the bounding boxes correctly plt.draw() # Calculate and set the new width of the figure so the legend fits legend_width = figlegend.get_window_extent().width / self.fig.dpi figure_width = self.fig.get_figwidth() self.fig.set_figwidth(figure_width + legend_width) # Draw the plot again to get the new transformations plt.draw() # Now calculate how much space we need on the right side legend_width = figlegend.get_window_extent().width / self.fig.dpi space_needed = legend_width / (figure_width + legend_width) margin = .04 if self._margin_titles else .01 self._space_needed = margin + space_needed right = 1 - self._space_needed # Place the subplot axes to give space for the legend self.fig.subplots_adjust(right=right) else: # Draw a legend in the first axis ax = self.axes.flat[0] leg = ax.legend(handles, label_order, loc="best", **kwargs) leg.set_title(title) # Set the title size a roundabout way to maintain # compatability with matplotlib 1.1 prop = mpl.font_manager.FontProperties(size=title_size) leg._legend_title_box._text.set_font_properties(prop) return self def _clean_axis(self, ax): """Turn off axis labels and legend.""" ax.set_xlabel("") ax.set_ylabel("") ax.legend_ = None return self def _update_legend_data(self, ax): """Extract the legend data from an axes object and save it.""" handles, labels = ax.get_legend_handles_labels() data = {l: h for h, l in zip(handles, labels)} self._legend_data.update(data) def _get_palette(self, data, hue, hue_order, palette): """Get a list of colors for the hue variable.""" if hue is None: palette = color_palette(n_colors=1) else: hue_names = utils.categorical_order(data[hue], hue_order) n_colors = len(hue_names) # By default use either the current color palette or HUSL if palette is None: current_palette = mpl.rcParams["axes.color_cycle"] if n_colors > len(current_palette): colors = color_palette("husl", n_colors) else: colors = color_palette(n_colors=n_colors) # Allow for palette to map from hue variable names elif isinstance(palette, dict): color_names = [palette[h] for h in hue_names] colors = color_palette(color_names, n_colors) # Otherwise act as if we just got a list of colors else: colors = color_palette(palette, n_colors) palette = color_palette(colors, n_colors) return palette _facet_docs = dict( data=dedent("""\ data : DataFrame Tidy ("long-form") dataframe where each column is a variable and each row is an observation.\ """), col_wrap=dedent("""\ col_wrap : int, optional "Wrap" the column variable at this width, so that the column facets span multiple rows. Incompatible with a ``row`` facet.\ """), share_xy=dedent("""\ share_{x,y} : bool, optional If true, the facets will share y axes across columns and/or x axes across rows.\ """), size=dedent("""\ size : scalar, optional Height (in inches) of each facet. See also: ``aspect``.\ """), aspect=dedent("""\ aspect : scalar, optional Aspect ratio of each facet, so that ``aspect * size`` gives the width of each facet in inches.\ """), palette=dedent("""\ palette : seaborn color palette or dict, optional Colors to use for the different levels of the ``hue`` variable. Should be something that can be interpreted by :func:`color_palette`, or a dictionary mapping hue levels to matplotlib colors.\ """), legend_out=dedent("""\ legend_out : bool, optional If ``True``, the figure size will be extended, and the legend will be drawn outside the plot on the center right.\ """), margin_titles=dedent("""\ margin_titles : bool, optional If ``True``, the titles for the row variable are drawn to the right of the last column. This option is experimental and may not work in all cases.\ """), ) class FacetGrid(Grid): """Subplot grid for plotting conditional relationships.""" def __init__(self, data, row=None, col=None, hue=None, col_wrap=None, sharex=True, sharey=True, size=3, aspect=1, palette=None, row_order=None, col_order=None, hue_order=None, hue_kws=None, dropna=True, legend_out=True, despine=True, margin_titles=False, xlim=None, ylim=None, subplot_kws=None, gridspec_kws=None): MPL_GRIDSPEC_VERSION = LooseVersion('1.4') OLD_MPL = LooseVersion(mpl.__version__) < MPL_GRIDSPEC_VERSION # Determine the hue facet layer information hue_var = hue if hue is None: hue_names = None else: hue_names = utils.categorical_order(data[hue], hue_order) colors = self._get_palette(data, hue, hue_order, palette) # Set up the lists of names for the row and column facet variables if row is None: row_names = [] else: row_names = utils.categorical_order(data[row], row_order) if col is None: col_names = [] else: col_names = utils.categorical_order(data[col], col_order) # Additional dict of kwarg -> list of values for mapping the hue var hue_kws = hue_kws if hue_kws is not None else {} # Make a boolean mask that is True anywhere there is an NA # value in one of the faceting variables, but only if dropna is True none_na = np.zeros(len(data), np.bool) if dropna: row_na = none_na if row is None else data[row].isnull() col_na = none_na if col is None else data[col].isnull() hue_na = none_na if hue is None else data[hue].isnull() not_na = ~(row_na | col_na | hue_na) else: not_na = ~none_na # Compute the grid shape ncol = 1 if col is None else len(col_names) nrow = 1 if row is None else len(row_names) self._n_facets = ncol * nrow self._col_wrap = col_wrap if col_wrap is not None: if row is not None: err = "Cannot use `row` and `col_wrap` together." raise ValueError(err) ncol = col_wrap nrow = int(np.ceil(len(data[col].unique()) / col_wrap)) self._ncol = ncol self._nrow = nrow # Calculate the base figure size # This can get stretched later by a legend figsize = (ncol * size * aspect, nrow * size) # Validate some inputs if col_wrap is not None: margin_titles = False # Build the subplot keyword dictionary subplot_kws = {} if subplot_kws is None else subplot_kws.copy() gridspec_kws = {} if gridspec_kws is None else gridspec_kws.copy() if xlim is not None: subplot_kws["xlim"] = xlim if ylim is not None: subplot_kws["ylim"] = ylim # Initialize the subplot grid if col_wrap is None: kwargs = dict(figsize=figsize, squeeze=False, sharex=sharex, sharey=sharey, subplot_kw=subplot_kws, gridspec_kw=gridspec_kws) if OLD_MPL: _ = kwargs.pop('gridspec_kw', None) if gridspec_kws: msg = "gridspec module only available in mpl >= {}" warnings.warn(msg.format(MPL_GRIDSPEC_VERSION)) fig, axes = plt.subplots(nrow, ncol, **kwargs) self.axes = axes else: # If wrapping the col variable we need to make the grid ourselves if gridspec_kws: warnings.warn("`gridspec_kws` ignored when using `col_wrap`") n_axes = len(col_names) fig = plt.figure(figsize=figsize) axes = np.empty(n_axes, object) axes[0] = fig.add_subplot(nrow, ncol, 1, **subplot_kws) if sharex: subplot_kws["sharex"] = axes[0] if sharey: subplot_kws["sharey"] = axes[0] for i in range(1, n_axes): axes[i] = fig.add_subplot(nrow, ncol, i + 1, **subplot_kws) self.axes = axes # Now we turn off labels on the inner axes if sharex: for ax in self._not_bottom_axes: for label in ax.get_xticklabels(): label.set_visible(False) ax.xaxis.offsetText.set_visible(False) if sharey: for ax in self._not_left_axes: for label in ax.get_yticklabels(): label.set_visible(False) ax.yaxis.offsetText.set_visible(False) # Set up the class attributes # --------------------------- # First the public API self.data = data self.fig = fig self.axes = axes self.row_names = row_names self.col_names = col_names self.hue_names = hue_names self.hue_kws = hue_kws # Next the private variables self._nrow = nrow self._row_var = row self._ncol = ncol self._col_var = col self._margin_titles = margin_titles self._col_wrap = col_wrap self._hue_var = hue_var self._colors = colors self._legend_out = legend_out self._legend = None self._legend_data = {} self._x_var = None self._y_var = None self._dropna = dropna self._not_na = not_na # Make the axes look good fig.tight_layout() if despine: self.despine() __init__.__doc__ = dedent("""\ Initialize the matplotlib figure and FacetGrid object. The :class:`FacetGrid` is an object that links a Pandas DataFrame to a matplotlib figure with a particular structure. In particular, :class:`FacetGrid` is used to draw plots with multiple Axes where each Axes shows the same relationship conditioned on different levels of some variable. It's possible to condition on up to three variables by assigning variables to the rows and columns of the grid and using different colors for the plot elements. The general approach to plotting here is called "small multiples", where the same kind of plot is repeated multiple times, and the specific use of small multiples to display the same relationship conditioned on one ore more other variables is often called a "trellis plot". The basic workflow is to initialize the :class:`FacetGrid` object with the dataset and the variables that are used to structure the grid. Then one or more plotting functions can be applied to each subset by calling :meth:`FacetGrid.map` or :meth:`FacetGrid.map_dataframe`. Finally, the plot can be tweaked with other methods to do things like change the axis labels, use different ticks, or add a legend. See the detailed code examples below for more information. Parameters ---------- {data} row, col, hue : strings Variables that define subsets of the data, which will be drawn on separate facets in the grid. See the ``*_order`` parameters to control the order of levels of this variable. {col_wrap} {share_xy} {size} {aspect} {palette} {{row,col,hue}}_order : lists, optional Order for the levels of the faceting variables. By default, this will be the order that the levels appear in ``data`` or, if the variables are pandas categoricals, the category order. hue_kws : dictionary of param -> list of values mapping Other keyword arguments to insert into the plotting call to let other plot attributes vary across levels of the hue variable (e.g. the markers in a scatterplot). {legend_out} despine : boolean, optional Remove the top and right spines from the plots. {margin_titles} {{x, y}}lim: tuples, optional Limits for each of the axes on each facet (only relevant when share{{x, y}} is True. subplot_kws : dict, optional Dictionary of keyword arguments passed to matplotlib subplot(s) methods. gridspec_kws : dict, optional Dictionary of keyword arguments passed to matplotlib's ``gridspec`` module (via ``plt.subplots``). Requires matplotlib >= 1.4 and is ignored if ``col_wrap`` is not ``None``. See Also -------- PairGrid : Subplot grid for plotting pairwise relationships. lmplot : Combine a regression plot and a :class:`FacetGrid`. factorplot : Combine a categorical plot and a :class:`FacetGrid`. Examples -------- Initialize a 2x2 grid of facets using the tips dataset: .. plot:: :context: close-figs >>> import seaborn as sns; sns.set(style="ticks", color_codes=True) >>> tips = sns.load_dataset("tips") >>> g = sns.FacetGrid(tips, col="time", row="smoker") Draw a univariate plot on each facet: .. plot:: :context: close-figs >>> import matplotlib.pyplot as plt >>> g = sns.FacetGrid(tips, col="time", row="smoker") >>> g = g.map(plt.hist, "total_bill") (Note that it's not necessary to re-catch the returned variable; it's the same object, but doing so in the examples makes dealing with the doctests somewhat less annoying). Pass additional keyword arguments to the mapped function: .. plot:: :context: close-figs >>> import numpy as np >>> bins = np.arange(0, 65, 5) >>> g = sns.FacetGrid(tips, col="time", row="smoker") >>> g = g.map(plt.hist, "total_bill", bins=bins, color="r") Plot a bivariate function on each facet: .. plot:: :context: close-figs >>> g = sns.FacetGrid(tips, col="time", row="smoker") >>> g = g.map(plt.scatter, "total_bill", "tip", edgecolor="w") Assign one of the variables to the color of the plot elements: .. plot:: :context: close-figs >>> g = sns.FacetGrid(tips, col="time", hue="smoker") >>> g = (g.map(plt.scatter, "total_bill", "tip", edgecolor="w") ... .add_legend()) Change the size and aspect ratio of each facet: .. plot:: :context: close-figs >>> g = sns.FacetGrid(tips, col="day", size=4, aspect=.5) >>> g = g.map(sns.boxplot, "time", "total_bill") Specify the order for plot elements: .. plot:: :context: close-figs >>> g = sns.FacetGrid(tips, col="smoker", col_order=["Yes", "No"]) >>> g = g.map(plt.hist, "total_bill", bins=bins, color="m") Use a different color palette: .. plot:: :context: close-figs >>> kws = dict(s=50, linewidth=.5, edgecolor="w") >>> g = sns.FacetGrid(tips, col="sex", hue="time", palette="Set1", ... hue_order=["Dinner", "Lunch"]) >>> g = (g.map(plt.scatter, "total_bill", "tip", **kws) ... .add_legend()) Use a dictionary mapping hue levels to colors: .. plot:: :context: close-figs >>> pal = dict(Lunch="seagreen", Dinner="gray") >>> g = sns.FacetGrid(tips, col="sex", hue="time", palette=pal, ... hue_order=["Dinner", "Lunch"]) >>> g = (g.map(plt.scatter, "total_bill", "tip", **kws) ... .add_legend()) Additionally use a different marker for the hue levels: .. plot:: :context: close-figs >>> g = sns.FacetGrid(tips, col="sex", hue="time", palette=pal, ... hue_order=["Dinner", "Lunch"], ... hue_kws=dict(marker=["^", "v"])) >>> g = (g.map(plt.scatter, "total_bill", "tip", **kws) ... .add_legend()) "Wrap" a column variable with many levels into the rows: .. plot:: :context: close-figs >>> attend = sns.load_dataset("attention") >>> g = sns.FacetGrid(attend, col="subject", col_wrap=5, ... size=1.5, ylim=(0, 10)) >>> g = g.map(sns.pointplot, "solutions", "score", scale=.7) Define a custom bivariate function to map onto the grid: .. plot:: :context: close-figs >>> from scipy import stats >>> def qqplot(x, y, **kwargs): ... _, xr = stats.probplot(x, fit=False) ... _, yr = stats.probplot(y, fit=False) ... plt.scatter(xr, yr, **kwargs) >>> g = sns.FacetGrid(tips, col="smoker", hue="sex") >>> g = (g.map(qqplot, "total_bill", "tip", **kws) ... .add_legend()) Define a custom function that uses a ``DataFrame`` object and accepts column names as positional variables: .. plot:: :context: close-figs >>> import pandas as pd >>> df = pd.DataFrame( ... data=np.random.randn(90, 4), ... columns=pd.Series(list("ABCD"), name="walk"), ... index=pd.date_range("Jan 1", "March 31", name="date")) >>> df = df.cumsum(axis=0).stack().reset_index(name="val") >>> def dateplot(x, y, **kwargs): ... ax = plt.gca() ... data = kwargs.pop("data") ... data.plot(x=x, y=y, ax=ax, grid=False, **kwargs) >>> g = sns.FacetGrid(df, col="walk", col_wrap=2, size=3.5) >>> g = g.map_dataframe(dateplot, "date", "val") Use different axes labels after plotting: .. plot:: :context: close-figs >>> g = sns.FacetGrid(tips, col="smoker", row="sex") >>> g = (g.map(plt.scatter, "total_bill", "tip", color="g", **kws) ... .set_axis_labels("Total bill (US Dollars)", "Tip")) Set other attributes that are shared across the facetes: .. plot:: :context: close-figs >>> g = sns.FacetGrid(tips, col="smoker", row="sex") >>> g = (g.map(plt.scatter, "total_bill", "tip", color="r", **kws) ... .set(xlim=(0, 60), ylim=(0, 12), ... xticks=[10, 30, 50], yticks=[2, 6, 10])) Use a different template for the facet titles: .. plot:: :context: close-figs >>> g = sns.FacetGrid(tips.sort("size"), col="size", col_wrap=3) >>> g = (g.map(plt.hist, "tip", bins=np.arange(0, 13), color="c") ... .set_titles("{{col_name}} diners")) Tighten the facets: .. plot:: :context: close-figs >>> g = sns.FacetGrid(tips, col="smoker", row="sex", ... margin_titles=True) >>> g = (g.map(plt.scatter, "total_bill", "tip", color="m", **kws) ... .set(xlim=(0, 60), ylim=(0, 12), ... xticks=[10, 30, 50], yticks=[2, 6, 10]) ... .fig.subplots_adjust(wspace=.05, hspace=.05)) """).format(**_facet_docs) def facet_data(self): """Generator for name indices and data subsets for each facet. Yields ------ (i, j, k), data_ijk : tuple of ints, DataFrame The ints provide an index into the {row, col, hue}_names attribute, and the dataframe contains a subset of the full data corresponding to each facet. The generator yields subsets that correspond with the self.axes.flat iterator, or self.axes[i, j] when `col_wrap` is None. """ data = self.data # Construct masks for the row variable if self._nrow == 1 or self._col_wrap is not None: row_masks = [np.repeat(True, len(self.data))] else: row_masks = [data[self._row_var] == n for n in self.row_names] # Construct masks for the column variable if self._ncol == 1: col_masks = [np.repeat(True, len(self.data))] else: col_masks = [data[self._col_var] == n for n in self.col_names] # Construct masks for the hue variable if len(self._colors) == 1: hue_masks = [np.repeat(True, len(self.data))] else: hue_masks = [data[self._hue_var] == n for n in self.hue_names] # Here is the main generator loop for (i, row), (j, col), (k, hue) in product(enumerate(row_masks), enumerate(col_masks), enumerate(hue_masks)): data_ijk = data[row & col & hue & self._not_na] yield (i, j, k), data_ijk def map(self, func, *args, **kwargs): """Apply a plotting function to each facet's subset of the data. Parameters ---------- func : callable A plotting function that takes data and keyword arguments. It must plot to the currently active matplotlib Axes and take a `color` keyword argument. If faceting on the `hue` dimension, it must also take a `label` keyword argument. args : strings Column names in self.data that identify variables with data to plot. The data for each variable is passed to `func` in the order the variables are specified in the call. kwargs : keyword arguments All keyword arguments are passed to the plotting function. Returns ------- self : object Returns self. """ # If color was a keyword argument, grab it here kw_color = kwargs.pop("color", None) # Iterate over the data subsets for (row_i, col_j, hue_k), data_ijk in self.facet_data(): # If this subset is null, move on if not data_ijk.values.tolist(): continue # Get the current axis ax = self.facet_axis(row_i, col_j) # Decide what color to plot with kwargs["color"] = self._facet_color(hue_k, kw_color) # Insert the other hue aesthetics if appropriate for kw, val_list in self.hue_kws.items(): kwargs[kw] = val_list[hue_k] # Insert a label in the keyword arguments for the legend if self._hue_var is not None: kwargs["label"] = str(self.hue_names[hue_k]) # Get the actual data we are going to plot with plot_data = data_ijk[list(args)] if self._dropna: plot_data = plot_data.dropna() plot_args = [v for k, v in plot_data.iteritems()] # Some matplotlib functions don't handle pandas objects correctly if func.__module__ is not None: if func.__module__.startswith("matplotlib"): plot_args = [v.values for v in plot_args] # Draw the plot self._facet_plot(func, ax, plot_args, kwargs) # Finalize the annotations and layout self._finalize_grid(args[:2]) return self def map_dataframe(self, func, *args, **kwargs): """Like `map` but passes args as strings and inserts data in kwargs. This method is suitable for plotting with functions that accept a long-form DataFrame as a `data` keyword argument and access the data in that DataFrame using string variable names. Parameters ---------- func : callable A plotting function that takes data and keyword arguments. Unlike the `map` method, a function used here must "understand" Pandas objects. It also must plot to the currently active matplotlib Axes and take a `color` keyword argument. If faceting on the `hue` dimension, it must also take a `label` keyword argument. args : strings Column names in self.data that identify variables with data to plot. The data for each variable is passed to `func` in the order the variables are specified in the call. kwargs : keyword arguments All keyword arguments are passed to the plotting function. Returns ------- self : object Returns self. """ # If color was a keyword argument, grab it here kw_color = kwargs.pop("color", None) # Iterate over the data subsets for (row_i, col_j, hue_k), data_ijk in self.facet_data(): # If this subset is null, move on if not data_ijk.values.tolist(): continue # Get the current axis ax = self.facet_axis(row_i, col_j) # Decide what color to plot with kwargs["color"] = self._facet_color(hue_k, kw_color) # Insert the other hue aesthetics if appropriate for kw, val_list in self.hue_kws.items(): kwargs[kw] = val_list[hue_k] # Insert a label in the keyword arguments for the legend if self._hue_var is not None: kwargs["label"] = self.hue_names[hue_k] # Stick the facet dataframe into the kwargs if self._dropna: data_ijk = data_ijk.dropna() kwargs["data"] = data_ijk # Draw the plot self._facet_plot(func, ax, args, kwargs) # Finalize the annotations and layout self._finalize_grid(args[:2]) return self def _facet_color(self, hue_index, kw_color): color = self._colors[hue_index] if kw_color is not None: return kw_color elif color is not None: return color def _facet_plot(self, func, ax, plot_args, plot_kwargs): # Draw the plot func(*plot_args, **plot_kwargs) # Sort out the supporting information self._update_legend_data(ax) self._clean_axis(ax) def _finalize_grid(self, axlabels): """Finalize the annotations and layout.""" self.set_axis_labels(*axlabels) self.set_titles() self.fig.tight_layout() def facet_axis(self, row_i, col_j): """Make the axis identified by these indices active and return it.""" # Calculate the actual indices of the axes to plot on if self._col_wrap is not None: ax = self.axes.flat[col_j] else: ax = self.axes[row_i, col_j] # Get a reference to the axes object we want, and make it active plt.sca(ax) return ax def despine(self, **kwargs): """Remove axis spines from the facets.""" utils.despine(self.fig, **kwargs) return self def set_axis_labels(self, x_var=None, y_var=None): """Set axis labels on the left column and bottom row of the grid.""" if x_var is not None: self._x_var = x_var self.set_xlabels(x_var) if y_var is not None: self._y_var = y_var self.set_ylabels(y_var) return self def set_xlabels(self, label=None, **kwargs): """Label the x axis on the bottom row of the grid.""" if label is None: label = self._x_var for ax in self._bottom_axes: ax.set_xlabel(label, **kwargs) return self def set_ylabels(self, label=None, **kwargs): """Label the y axis on the left column of the grid.""" if label is None: label = self._y_var for ax in self._left_axes: ax.set_ylabel(label, **kwargs) return self def set_xticklabels(self, labels=None, step=None, **kwargs): """Set x axis tick labels on the bottom row of the grid.""" for ax in self._bottom_axes: if labels is None: labels = [l.get_text() for l in ax.get_xticklabels()] if step is not None: xticks = ax.get_xticks()[::step] labels = labels[::step] ax.set_xticks(xticks) ax.set_xticklabels(labels, **kwargs) return self def set_yticklabels(self, labels=None, **kwargs): """Set y axis tick labels on the left column of the grid.""" for ax in self._left_axes: if labels is None: labels = [l.get_text() for l in ax.get_yticklabels()] ax.set_yticklabels(labels, **kwargs) return self def set_titles(self, template=None, row_template=None, col_template=None, **kwargs): """Draw titles either above each facet or on the grid margins. Parameters ---------- template : string Template for all titles with the formatting keys {col_var} and {col_name} (if using a `col` faceting variable) and/or {row_var} and {row_name} (if using a `row` faceting variable). row_template: Template for the row variable when titles are drawn on the grid margins. Must have {row_var} and {row_name} formatting keys. col_template: Template for the row variable when titles are drawn on the grid margins. Must have {col_var} and {col_name} formatting keys. Returns ------- self: object Returns self. """ args = dict(row_var=self._row_var, col_var=self._col_var) kwargs["size"] = kwargs.pop("size", mpl.rcParams["axes.labelsize"]) # Establish default templates if row_template is None: row_template = "{row_var} = {row_name}" if col_template is None: col_template = "{col_var} = {col_name}" if template is None: if self._row_var is None: template = col_template elif self._col_var is None: template = row_template else: template = " | ".join([row_template, col_template]) if self._margin_titles: if self.row_names is not None: # Draw the row titles on the right edge of the grid for i, row_name in enumerate(self.row_names): ax = self.axes[i, -1] args.update(dict(row_name=row_name)) title = row_template.format(**args) bgcolor = self.fig.get_facecolor() ax.annotate(title, xy=(1.02, .5), xycoords="axes fraction", rotation=270, ha="left", va="center", backgroundcolor=bgcolor, **kwargs) if self.col_names is not None: # Draw the column titles as normal titles for j, col_name in enumerate(self.col_names): args.update(dict(col_name=col_name)) title = col_template.format(**args) self.axes[0, j].set_title(title, **kwargs) return self # Otherwise title each facet with all the necessary information if (self._row_var is not None) and (self._col_var is not None): for i, row_name in enumerate(self.row_names): for j, col_name in enumerate(self.col_names): args.update(dict(row_name=row_name, col_name=col_name)) title = template.format(**args) self.axes[i, j].set_title(title, **kwargs) elif self.row_names is not None and len(self.row_names): for i, row_name in enumerate(self.row_names): args.update(dict(row_name=row_name)) title = template.format(**args) self.axes[i, 0].set_title(title, **kwargs) elif self.col_names is not None and len(self.col_names): for i, col_name in enumerate(self.col_names): args.update(dict(col_name=col_name)) title = template.format(**args) # Index the flat array so col_wrap works self.axes.flat[i].set_title(title, **kwargs) return self @property def ax(self): """Easy access to single axes.""" if self.axes.shape == (1, 1): return self.axes[0, 0] else: raise AttributeError @property def _inner_axes(self): """Return a flat array of the inner axes.""" if self._col_wrap is None: return self.axes[:-1, 1:].flat else: axes = [] n_empty = self._nrow * self._ncol - self._n_facets for i, ax in enumerate(self.axes): append = (i % self._ncol and i < (self._ncol * (self._nrow - 1)) and i < (self._ncol * (self._nrow - 1) - n_empty)) if append: axes.append(ax) return np.array(axes, object).flat @property def _left_axes(self): """Return a flat array of the left column of axes.""" if self._col_wrap is None: return self.axes[:, 0].flat else: axes = [] for i, ax in enumerate(self.axes): if not i % self._ncol: axes.append(ax) return np.array(axes, object).flat @property def _not_left_axes(self): """Return a flat array of axes that aren't on the left column.""" if self._col_wrap is None: return self.axes[:, 1:].flat else: axes = [] for i, ax in enumerate(self.axes): if i % self._ncol: axes.append(ax) return np.array(axes, object).flat @property def _bottom_axes(self): """Return a flat array of the bottom row of axes.""" if self._col_wrap is None: return self.axes[-1, :].flat else: axes = [] n_empty = self._nrow * self._ncol - self._n_facets for i, ax in enumerate(self.axes): append = (i >= (self._ncol * (self._nrow - 1)) or i >= (self._ncol * (self._nrow - 1) - n_empty)) if append: axes.append(ax) return np.array(axes, object).flat @property def _not_bottom_axes(self): """Return a flat array of axes that aren't on the bottom row.""" if self._col_wrap is None: return self.axes[:-1, :].flat else: axes = [] n_empty = self._nrow * self._ncol - self._n_facets for i, ax in enumerate(self.axes): append = (i < (self._ncol * (self._nrow - 1)) and i < (self._ncol * (self._nrow - 1) - n_empty)) if append: axes.append(ax) return np.array(axes, object).flat class PairGrid(Grid): """Subplot grid for plotting pairwise relationships in a dataset.""" def __init__(self, data, hue=None, hue_order=None, palette=None, hue_kws=None, vars=None, x_vars=None, y_vars=None, diag_sharey=True, size=2.5, aspect=1, despine=True, dropna=True): """Initialize the plot figure and PairGrid object. Parameters ---------- data : DataFrame Tidy (long-form) dataframe where each column is a variable and each row is an observation. hue : string (variable name), optional Variable in ``data`` to map plot aspects to different colors. hue_order : list of strings Order for the levels of the hue variable in the palette palette : dict or seaborn color palette Set of colors for mapping the ``hue`` variable. If a dict, keys should be values in the ``hue`` variable. hue_kws : dictionary of param -> list of values mapping Other keyword arguments to insert into the plotting call to let other plot attributes vary across levels of the hue variable (e.g. the markers in a scatterplot). vars : list of variable names, optional Variables within ``data`` to use, otherwise use every column with a numeric datatype. {x, y}_vars : lists of variable names, optional Variables within ``data`` to use separately for the rows and columns of the figure; i.e. to make a non-square plot. size : scalar, optional Height (in inches) of each facet. aspect : scalar, optional Aspect * size gives the width (in inches) of each facet. despine : boolean, optional Remove the top and right spines from the plots. dropna : boolean, optional Drop missing values from the data before plotting. See Also -------- pairplot : Easily drawing common uses of :class:`PairGrid`. FacetGrid : Subplot grid for plotting conditional relationships. Examples -------- Draw a scatterplot for each pairwise relationship: .. plot:: :context: close-figs >>> import matplotlib.pyplot as plt >>> import seaborn as sns; sns.set() >>> iris = sns.load_dataset("iris") >>> g = sns.PairGrid(iris) >>> g = g.map(plt.scatter) Show a univariate distribution on the diagonal: .. plot:: :context: close-figs >>> g = sns.PairGrid(iris) >>> g = g.map_diag(plt.hist) >>> g = g.map_offdiag(plt.scatter) (It's not actually necessary to catch the return value every time, as it is the same object, but it makes it easier to deal with the doctests). Color the points using a categorical variable: .. plot:: :context: close-figs >>> g = sns.PairGrid(iris, hue="species") >>> g = g.map(plt.scatter) >>> g = g.add_legend() Plot a subset of variables .. plot:: :context: close-figs >>> g = sns.PairGrid(iris, vars=["sepal_length", "sepal_width"]) >>> g = g.map(plt.scatter) Pass additional keyword arguments to the functions .. plot:: :context: close-figs >>> g = sns.PairGrid(iris) >>> g = g.map_diag(plt.hist, edgecolor="w") >>> g = g.map_offdiag(plt.scatter, edgecolor="w", s=40) Use different variables for the rows and columns: .. plot:: :context: close-figs >>> g = sns.PairGrid(iris, ... x_vars=["sepal_length", "sepal_width"], ... y_vars=["petal_length", "petal_width"]) >>> g = g.map(plt.scatter) Use different functions on the upper and lower triangles: .. plot:: :context: close-figs >>> g = sns.PairGrid(iris) >>> g = g.map_upper(plt.scatter) >>> g = g.map_lower(sns.kdeplot, cmap="Blues_d") >>> g = g.map_diag(sns.kdeplot, lw=3, legend=False) Use different colors and markers for each categorical level: .. plot:: :context: close-figs >>> g = sns.PairGrid(iris, hue="species", palette="Set2", ... hue_kws={"marker": ["o", "s", "D"]}) >>> g = g.map(plt.scatter, linewidths=1, edgecolor="w", s=40) >>> g = g.add_legend() """ # Sort out the variables that define the grid if vars is not None: x_vars = list(vars) y_vars = list(vars) elif (x_vars is not None) or (y_vars is not None): if (x_vars is None) or (y_vars is None): raise ValueError("Must specify `x_vars` and `y_vars`") else: numeric_cols = self._find_numeric_cols(data) x_vars = numeric_cols y_vars = numeric_cols if np.isscalar(x_vars): x_vars = [x_vars] if np.isscalar(y_vars): y_vars = [y_vars] self.x_vars = list(x_vars) self.y_vars = list(y_vars) self.square_grid = self.x_vars == self.y_vars # Create the figure and the array of subplots figsize = len(x_vars) * size * aspect, len(y_vars) * size fig, axes = plt.subplots(len(y_vars), len(x_vars), figsize=figsize, sharex="col", sharey="row", squeeze=False) self.fig = fig self.axes = axes self.data = data # Save what we are going to do with the diagonal self.diag_sharey = diag_sharey self.diag_axes = None # Label the axes self._add_axis_labels() # Sort out the hue variable self._hue_var = hue if hue is None: self.hue_names = ["_nolegend_"] self.hue_vals = pd.Series(["_nolegend_"] * len(data), index=data.index) else: hue_names = utils.categorical_order(data[hue], hue_order) if dropna: # Filter NA from the list of unique hue names hue_names = list(filter(pd.notnull, hue_names)) self.hue_names = hue_names self.hue_vals = data[hue] # Additional dict of kwarg -> list of values for mapping the hue var self.hue_kws = hue_kws if hue_kws is not None else {} self.palette = self._get_palette(data, hue, hue_order, palette) self._legend_data = {} # Make the plot look nice if despine: utils.despine(fig=fig) fig.tight_layout() def map(self, func, **kwargs): """Plot with the same function in every subplot. Parameters ---------- func : callable plotting function Must take x, y arrays as positional arguments and draw onto the "currently active" matplotlib Axes. """ kw_color = kwargs.pop("color", None) for i, y_var in enumerate(self.y_vars): for j, x_var in enumerate(self.x_vars): hue_grouped = self.data.groupby(self.hue_vals) for k, label_k in enumerate(self.hue_names): # Attempt to get data for this level, allowing for empty try: data_k = hue_grouped.get_group(label_k) except KeyError: data_k = pd.DataFrame(columns=self.data.columns, dtype=np.float) ax = self.axes[i, j] plt.sca(ax) # Insert the other hue aesthetics if appropriate for kw, val_list in self.hue_kws.items(): kwargs[kw] = val_list[k] color = self.palette[k] if kw_color is None else kw_color func(data_k[x_var], data_k[y_var], label=label_k, color=color, **kwargs) self._clean_axis(ax) self._update_legend_data(ax) if kw_color is not None: kwargs["color"] = kw_color self._add_axis_labels() return self def map_diag(self, func, **kwargs): """Plot with a univariate function on each diagonal subplot. Parameters ---------- func : callable plotting function Must take an x array as a positional arguments and draw onto the "currently active" matplotlib Axes. There is a special case when using a ``hue`` variable and ``plt.hist``; the histogram will be plotted with stacked bars. """ # Add special diagonal axes for the univariate plot if self.square_grid and self.diag_axes is None: diag_axes = [] for i, (var, ax) in enumerate(zip(self.x_vars, np.diag(self.axes))): if i and self.diag_sharey: diag_ax = ax._make_twin_axes(sharex=ax, sharey=diag_axes[0], frameon=False) else: diag_ax = ax._make_twin_axes(sharex=ax, frameon=False) diag_ax.set_axis_off() diag_axes.append(diag_ax) self.diag_axes = np.array(diag_axes, np.object) # Plot on each of the diagonal axes for i, var in enumerate(self.x_vars): ax = self.diag_axes[i] hue_grouped = self.data[var].groupby(self.hue_vals) # Special-case plt.hist with stacked bars if func is plt.hist: plt.sca(ax) vals = [] for label in self.hue_names: # Attempt to get data for this level, allowing for empty try: vals.append(np.asarray(hue_grouped.get_group(label))) except KeyError: vals.append(np.array([])) func(vals, color=self.palette, histtype="barstacked", **kwargs) else: for k, label_k in enumerate(self.hue_names): # Attempt to get data for this level, allowing for empty try: data_k = hue_grouped.get_group(label_k) except KeyError: data_k = np.array([]) plt.sca(ax) func(data_k, label=label_k, color=self.palette[k], **kwargs) self._clean_axis(ax) self._add_axis_labels() return self def map_lower(self, func, **kwargs): """Plot with a bivariate function on the lower diagonal subplots. Parameters ---------- func : callable plotting function Must take x, y arrays as positional arguments and draw onto the "currently active" matplotlib Axes. """ kw_color = kwargs.pop("color", None) for i, j in zip(*np.tril_indices_from(self.axes, -1)): hue_grouped = self.data.groupby(self.hue_vals) for k, label_k in enumerate(self.hue_names): # Attempt to get data for this level, allowing for empty try: data_k = hue_grouped.get_group(label_k) except KeyError: data_k = pd.DataFrame(columns=self.data.columns, dtype=np.float) ax = self.axes[i, j] plt.sca(ax) x_var = self.x_vars[j] y_var = self.y_vars[i] # Insert the other hue aesthetics if appropriate for kw, val_list in self.hue_kws.items(): kwargs[kw] = val_list[k] color = self.palette[k] if kw_color is None else kw_color func(data_k[x_var], data_k[y_var], label=label_k, color=color, **kwargs) self._clean_axis(ax) self._update_legend_data(ax) if kw_color is not None: kwargs["color"] = kw_color self._add_axis_labels() return self def map_upper(self, func, **kwargs): """Plot with a bivariate function on the upper diagonal subplots. Parameters ---------- func : callable plotting function Must take x, y arrays as positional arguments and draw onto the "currently active" matplotlib Axes. """ kw_color = kwargs.pop("color", None) for i, j in zip(*np.triu_indices_from(self.axes, 1)): hue_grouped = self.data.groupby(self.hue_vals) for k, label_k in enumerate(self.hue_names): # Attempt to get data for this level, allowing for empty try: data_k = hue_grouped.get_group(label_k) except KeyError: data_k = pd.DataFrame(columns=self.data.columns, dtype=np.float) ax = self.axes[i, j] plt.sca(ax) x_var = self.x_vars[j] y_var = self.y_vars[i] # Insert the other hue aesthetics if appropriate for kw, val_list in self.hue_kws.items(): kwargs[kw] = val_list[k] color = self.palette[k] if kw_color is None else kw_color func(data_k[x_var], data_k[y_var], label=label_k, color=color, **kwargs) self._clean_axis(ax) self._update_legend_data(ax) if kw_color is not None: kwargs["color"] = kw_color return self def map_offdiag(self, func, **kwargs): """Plot with a bivariate function on the off-diagonal subplots. Parameters ---------- func : callable plotting function Must take x, y arrays as positional arguments and draw onto the "currently active" matplotlib Axes. """ self.map_lower(func, **kwargs) self.map_upper(func, **kwargs) return self def _add_axis_labels(self): """Add labels to the left and bottom Axes.""" for ax, label in zip(self.axes[-1, :], self.x_vars): ax.set_xlabel(label) for ax, label in zip(self.axes[:, 0], self.y_vars): ax.set_ylabel(label) def _find_numeric_cols(self, data): """Find which variables in a DataFrame are numeric.""" # This can't be the best way to do this, but I do not # know what the best way might be, so this seems ok numeric_cols = [] for col in data: try: data[col].astype(np.float) numeric_cols.append(col) except (ValueError, TypeError): pass return numeric_cols class JointGrid(object): """Grid for drawing a bivariate plot with marginal univariate plots.""" def __init__(self, x, y, data=None, size=6, ratio=5, space=.2, dropna=True, xlim=None, ylim=None): """Set up the grid of subplots. Parameters ---------- x, y : strings or vectors Data or names of variables in ``data``. data : DataFrame, optional DataFrame when ``x`` and ``y`` are variable names. size : numeric Size of each side of the figure in inches (it will be square). ratio : numeric Ratio of joint axes size to marginal axes height. space : numeric, optional Space between the joint and marginal axes dropna : bool, optional If True, remove observations that are missing from `x` and `y`. {x, y}lim : two-tuples, optional Axis limits to set before plotting. See Also -------- jointplot : High-level interface for drawing bivariate plots with several different default plot kinds. Examples -------- Initialize the figure but don't draw any plots onto it: .. plot:: :context: close-figs >>> import seaborn as sns; sns.set(style="ticks", color_codes=True) >>> tips = sns.load_dataset("tips") >>> g = sns.JointGrid(x="total_bill", y="tip", data=tips) Add plots using default parameters: .. plot:: :context: close-figs >>> g = sns.JointGrid(x="total_bill", y="tip", data=tips) >>> g = g.plot(sns.regplot, sns.distplot) Draw the join and marginal plots separately, which allows finer-level control other parameters: .. plot:: :context: close-figs >>> import matplotlib.pyplot as plt >>> g = sns.JointGrid(x="total_bill", y="tip", data=tips) >>> g = g.plot_joint(plt.scatter, color=".5", edgecolor="white") >>> g = g.plot_marginals(sns.distplot, kde=False, color=".5") Draw the two marginal plots separately: .. plot:: :context: close-figs >>> import numpy as np >>> g = sns.JointGrid(x="total_bill", y="tip", data=tips) >>> g = g.plot_joint(plt.scatter, color="m", edgecolor="white") >>> _ = g.ax_marg_x.hist(tips["total_bill"], color="b", alpha=.6, ... bins=np.arange(0, 60, 5)) >>> _ = g.ax_marg_y.hist(tips["tip"], color="r", alpha=.6, ... orientation="horizontal", ... bins=np.arange(0, 12, 1)) Add an annotation with a statistic summarizing the bivariate relationship: .. plot:: :context: close-figs >>> from scipy import stats >>> g = sns.JointGrid(x="total_bill", y="tip", data=tips) >>> g = g.plot_joint(plt.scatter, ... color="g", s=40, edgecolor="white") >>> g = g.plot_marginals(sns.distplot, kde=False, color="g") >>> g = g.annotate(stats.pearsonr) Use a custom function and formatting for the annotation .. plot:: :context: close-figs >>> g = sns.JointGrid(x="total_bill", y="tip", data=tips) >>> g = g.plot_joint(plt.scatter, ... color="g", s=40, edgecolor="white") >>> g = g.plot_marginals(sns.distplot, kde=False, color="g") >>> rsquare = lambda a, b: stats.pearsonr(a, b)[0] ** 2 >>> g = g.annotate(rsquare, template="{stat}: {val:.2f}", ... stat="$R^2$", loc="upper left", fontsize=12) Remove the space between the joint and marginal axes: .. plot:: :context: close-figs >>> g = sns.JointGrid(x="total_bill", y="tip", data=tips, space=0) >>> g = g.plot_joint(sns.kdeplot, cmap="Blues_d") >>> g = g.plot_marginals(sns.kdeplot, shade=True) Draw a smaller plot with relatively larger marginal axes: .. plot:: :context: close-figs >>> g = sns.JointGrid(x="total_bill", y="tip", data=tips, ... size=5, ratio=2) >>> g = g.plot_joint(sns.kdeplot, cmap="Reds_d") >>> g = g.plot_marginals(sns.kdeplot, color="r", shade=True) Set limits on the axes: .. plot:: :context: close-figs >>> g = sns.JointGrid(x="total_bill", y="tip", data=tips, ... xlim=(0, 50), ylim=(0, 8)) >>> g = g.plot_joint(sns.kdeplot, cmap="Purples_d") >>> g = g.plot_marginals(sns.kdeplot, color="m", shade=True) """ # Set up the subplot grid f = plt.figure(figsize=(size, size)) gs = plt.GridSpec(ratio + 1, ratio + 1) ax_joint = f.add_subplot(gs[1:, :-1]) ax_marg_x = f.add_subplot(gs[0, :-1], sharex=ax_joint) ax_marg_y = f.add_subplot(gs[1:, -1], sharey=ax_joint) self.fig = f self.ax_joint = ax_joint self.ax_marg_x = ax_marg_x self.ax_marg_y = ax_marg_y # Turn off tick visibility for the measure axis on the marginal plots plt.setp(ax_marg_x.get_xticklabels(), visible=False) plt.setp(ax_marg_y.get_yticklabels(), visible=False) # Turn off the ticks on the density axis for the marginal plots plt.setp(ax_marg_x.yaxis.get_majorticklines(), visible=False) plt.setp(ax_marg_x.yaxis.get_minorticklines(), visible=False) plt.setp(ax_marg_y.xaxis.get_majorticklines(), visible=False) plt.setp(ax_marg_y.xaxis.get_minorticklines(), visible=False) plt.setp(ax_marg_x.get_yticklabels(), visible=False) plt.setp(ax_marg_y.get_xticklabels(), visible=False) ax_marg_x.yaxis.grid(False) ax_marg_y.xaxis.grid(False) # Possibly extract the variables from a DataFrame if data is not None: if x in data: x = data[x] if y in data: y = data[y] # Possibly drop NA if dropna: not_na = pd.notnull(x) & pd.notnull(y) x = x[not_na] y = y[not_na] # Find the names of the variables if hasattr(x, "name"): xlabel = x.name ax_joint.set_xlabel(xlabel) if hasattr(y, "name"): ylabel = y.name ax_joint.set_ylabel(ylabel) # Convert the x and y data to arrays for plotting self.x = np.asarray(x) self.y = np.asarray(y) if xlim is not None: ax_joint.set_xlim(xlim) if ylim is not None: ax_joint.set_ylim(ylim) # Make the grid look nice utils.despine(f) utils.despine(ax=ax_marg_x, left=True) utils.despine(ax=ax_marg_y, bottom=True) f.tight_layout() f.subplots_adjust(hspace=space, wspace=space) def plot(self, joint_func, marginal_func, annot_func=None): """Shortcut to draw the full plot. Use `plot_joint` and `plot_marginals` directly for more control. Parameters ---------- joint_func, marginal_func: callables Functions to draw the bivariate and univariate plots. Returns ------- self : JointGrid instance Returns `self`. """ self.plot_marginals(marginal_func) self.plot_joint(joint_func) if annot_func is not None: self.annotate(annot_func) return self def plot_joint(self, func, **kwargs): """Draw a bivariate plot of `x` and `y`. Parameters ---------- func : plotting callable This must take two 1d arrays of data as the first two positional arguments, and it must plot on the "current" axes. kwargs : key, value mappings Keyword argument are passed to the plotting function. Returns ------- self : JointGrid instance Returns `self`. """ plt.sca(self.ax_joint) func(self.x, self.y, **kwargs) return self def plot_marginals(self, func, **kwargs): """Draw univariate plots for `x` and `y` separately. Parameters ---------- func : plotting callable This must take a 1d array of data as the first positional argument, it must plot on the "current" axes, and it must accept a "vertical" keyword argument to orient the measure dimension of the plot vertically. kwargs : key, value mappings Keyword argument are passed to the plotting function. Returns ------- self : JointGrid instance Returns `self`. """ kwargs["vertical"] = False plt.sca(self.ax_marg_x) func(self.x, **kwargs) kwargs["vertical"] = True plt.sca(self.ax_marg_y) func(self.y, **kwargs) return self def annotate(self, func, template=None, stat=None, loc="best", **kwargs): """Annotate the plot with a statistic about the relationship. Parameters ---------- func : callable Statistical function that maps the x, y vectors either to (val, p) or to val. template : string format template, optional The template must have the format keys "stat" and "val"; if `func` returns a p value, it should also have the key "p". stat : string, optional Name to use for the statistic in the annotation, by default it uses the name of `func`. loc : string or int, optional Matplotlib legend location code; used to place the annotation. kwargs : key, value mappings Other keyword arguments are passed to `ax.legend`, which formats the annotation. Returns ------- self : JointGrid instance. Returns `self`. """ default_template = "{stat} = {val:.2g}; p = {p:.2g}" # Call the function and determine the form of the return value(s) out = func(self.x, self.y) try: val, p = out except TypeError: val, p = out, None default_template, _ = default_template.split(";") # Set the default template if template is None: template = default_template # Default to name of the function if stat is None: stat = func.__name__ # Format the annotation if p is None: annotation = template.format(stat=stat, val=val) else: annotation = template.format(stat=stat, val=val, p=p) # Draw an invisible plot and use the legend to draw the annotation # This is a bit of a hack, but `loc=best` works nicely and is not # easily abstracted. phantom, = self.ax_joint.plot(self.x, self.y, linestyle="", alpha=0) self.ax_joint.legend([phantom], [annotation], loc=loc, **kwargs) phantom.remove() return self def set_axis_labels(self, xlabel="", ylabel="", **kwargs): """Set the axis labels on the bivariate axes. Parameters ---------- xlabel, ylabel : strings Label names for the x and y variables. kwargs : key, value mappings Other keyword arguments are passed to the set_xlabel or set_ylabel. Returns ------- self : JointGrid instance returns `self` """ self.ax_joint.set_xlabel(xlabel, **kwargs) self.ax_joint.set_ylabel(ylabel, **kwargs) return self def savefig(self, *args, **kwargs): """Wrap figure.savefig defaulting to tight bounding box.""" kwargs.setdefault("bbox_inches", "tight") self.fig.savefig(*args, **kwargs) seaborn-0.6.0/seaborn/categorical.py000066400000000000000000003073531254425553400174520ustar00rootroot00000000000000from __future__ import division from textwrap import dedent import colorsys import numpy as np from scipy import stats import pandas as pd from pandas.core.series import remove_na import matplotlib as mpl import matplotlib.pyplot as plt import warnings from .external.six import string_types from .external.six.moves import range from . import utils from .utils import desaturate, iqr, categorical_order from .algorithms import bootstrap from .palettes import color_palette, husl_palette, light_palette from .axisgrid import FacetGrid, _facet_docs class _CategoricalPlotter(object): width = .8 def establish_variables(self, x=None, y=None, hue=None, data=None, orient=None, order=None, hue_order=None, units=None): """Convert input specification into a common representation.""" # Option 1: # We are plotting a wide-form dataset # ----------------------------------- if x is None and y is None: # Do a sanity check on the inputs if hue is not None: error = "Cannot use `hue` without `x` or `y`" raise ValueError(error) # No hue grouping with wide inputs plot_hues = None hue_title = None hue_names = None # No statistical units with wide inputs plot_units = None # We also won't get a axes labels here value_label = None group_label = None # Option 1a: # The input data is a Pandas DataFrame # ------------------------------------ if isinstance(data, pd.DataFrame): # Order the data correctly if order is None: order = [] # Reduce to just numeric columns for col in data: try: data[col].astype(np.float) order.append(col) except ValueError: pass plot_data = data[order] group_names = order group_label = data.columns.name # Convert to a list of arrays, the common representation iter_data = plot_data.iteritems() plot_data = [np.asarray(s, np.float) for k, s in iter_data] # Option 1b: # The input data is an array or list # ---------------------------------- else: # We can't reorder the data if order is not None: error = "Input data must be a pandas object to reorder" raise ValueError(error) # The input data is an array if hasattr(data, "shape"): if len(data.shape) == 1: if np.isscalar(data[0]): plot_data = [data] else: plot_data = list(data) elif len(data.shape) == 2: nr, nc = data.shape if nr == 1 or nc == 1: plot_data = [data.ravel()] else: plot_data = [data[:, i] for i in range(nc)] else: error = ("Input `data` can have no " "more than 2 dimensions") raise ValueError(error) # Check if `data` is None to let us bail out here (for testing) elif data is None: plot_data = [[]] # The input data is a flat list elif np.isscalar(data[0]): plot_data = [data] # The input data is a nested list # This will catch some things that might fail later # but exhaustive checks are hard else: plot_data = data # Convert to a list of arrays, the common representation plot_data = [np.asarray(d, np.float) for d in plot_data] # The group names will just be numeric indices group_names = list(range((len(plot_data)))) # Figure out the plotting orientation orient = "h" if str(orient).startswith("h") else "v" # Option 2: # We are plotting a long-form dataset # ----------------------------------- else: # See if we need to get variables from `data` if data is not None: x = data.get(x, x) y = data.get(y, y) hue = data.get(hue, hue) units = data.get(units, units) # Validate the inputs for input in [x, y, hue, units]: if isinstance(input, string_types): err = "Could not interperet input '{}'".format(input) raise ValueError(err) # Figure out the plotting orientation orient = self.infer_orient(x, y, orient) # Option 2a: # We are plotting a single set of data # ------------------------------------ if x is None or y is None: # Determine where the data are vals = y if x is None else x # Put them into the common representation plot_data = [np.asarray(vals)] # Get a label for the value axis if hasattr(vals, "name"): value_label = vals.name else: value_label = None # This plot will not have group labels or hue nesting groups = None group_label = None group_names = [] plot_hues = None hue_names = None hue_title = None plot_units = None # Option 2b: # We are grouping the data values by another variable # --------------------------------------------------- else: # Determine which role each variable will play if orient == "v": vals, groups = y, x else: vals, groups = x, y # Get the categorical axis label group_label = None if hasattr(groups, "name"): group_label = groups.name # Get the order on the categorical axis group_names = categorical_order(groups, order) # Group the numeric data plot_data, value_label = self._group_longform(vals, groups, group_names) # Now handle the hue levels for nested ordering if hue is None: plot_hues = None hue_title = None hue_names = None else: # Get the order of the hue levels hue_names = categorical_order(hue, hue_order) # Group the hue data plot_hues, hue_title = self._group_longform(hue, groups, group_names) # Now handle the units for nested observations if units is None: plot_units = None else: plot_units, _ = self._group_longform(units, groups, group_names) # Assign object attributes # ------------------------ self.orient = orient self.plot_data = plot_data self.group_label = group_label self.value_label = value_label self.group_names = group_names self.plot_hues = plot_hues self.hue_title = hue_title self.hue_names = hue_names self.plot_units = plot_units def _group_longform(self, vals, grouper, order): """Group a long-form variable by another with correct order.""" # Ensure that the groupby will work if not isinstance(vals, pd.Series): vals = pd.Series(vals) # Group the val data grouped_vals = vals.groupby(grouper) out_data = [] for g in order: try: g_vals = np.asarray(grouped_vals.get_group(g)) except KeyError: g_vals = np.array([]) out_data.append(g_vals) # Get the vals axis label label = vals.name return out_data, label def establish_colors(self, color, palette, saturation): """Get a list of colors for the main component of the plots.""" if self.hue_names is None: n_colors = len(self.plot_data) else: n_colors = len(self.hue_names) # Determine the main colors if color is None and palette is None: # Determine whether the current palette will have enough values # If not, we'll default to the husl palette so each is distinct current_palette = mpl.rcParams["axes.color_cycle"] if n_colors <= len(current_palette): colors = color_palette(n_colors=n_colors) else: colors = husl_palette(n_colors, l=.7) elif palette is None: # When passing a specific color, the interpretation depends # on whether there is a hue variable or not. # If so, we will make a blend palette so that the different # levels have some amount of variation. if self.hue_names is None: colors = [color] * n_colors else: colors = light_palette(color, n_colors) else: # Let `palette` be a dict mapping level to color if isinstance(palette, dict): if self.hue_names is None: levels = self.group_names else: levels = self.hue_names palette = [palette[l] for l in levels] colors = color_palette(palette, n_colors) # Conver the colors to a common rgb representation colors = [mpl.colors.colorConverter.to_rgb(c) for c in colors] # Desaturate a bit because these are patches if saturation < 1: colors = [desaturate(c, saturation) for c in colors] # Determine the gray color to use for the lines framing the plot light_vals = [colorsys.rgb_to_hls(*c)[1] for c in colors] l = min(light_vals) * .6 gray = (l, l, l) # Assign object attributes self.colors = colors self.gray = gray def infer_orient(self, x, y, orient=None): """Determine how the plot should be oriented based on the data.""" orient = str(orient) def is_categorical(s): try: # Correct way, but doesnt exist in older Pandas return pd.core.common.is_categorical_dtype(s) except AttributeError: # Also works, but feels hackier return str(s.dtype) == "categorical" def is_not_numeric(s): try: np.asarray(s, dtype=np.float) except ValueError: return True return False no_numeric = "Neither the `x` nor `y` variable appears to be numeric." if orient.startswith("v"): return "v" elif orient.startswith("h"): return "h" elif x is None: return "v" elif y is None: return "h" elif is_categorical(y): if is_categorical(x): raise ValueError(no_numeric) else: return "h" elif is_not_numeric(y): if is_not_numeric(x): raise ValueError(no_numeric) else: return "h" else: return "v" @property def hue_offsets(self): """A list of center positions for plots when hue nesting is used.""" n_levels = len(self.hue_names) each_width = self.width / n_levels offsets = np.linspace(0, self.width - each_width, n_levels) offsets -= offsets.mean() return offsets @property def nested_width(self): """A float with the width of plot elements when hue nesting is used.""" return self.width / len(self.hue_names) * .98 def annotate_axes(self, ax): """Add descriptive labels to an Axes object.""" if self.orient == "v": xlabel, ylabel = self.group_label, self.value_label else: xlabel, ylabel = self.value_label, self.group_label if xlabel is not None: ax.set_xlabel(xlabel) if ylabel is not None: ax.set_ylabel(ylabel) if self.orient == "v": ax.set_xticks(np.arange(len(self.plot_data))) ax.set_xticklabels(self.group_names) else: ax.set_yticks(np.arange(len(self.plot_data))) ax.set_yticklabels(self.group_names) if self.orient == "v": ax.xaxis.grid(False) ax.set_xlim(-.5, len(self.plot_data) - .5) else: ax.yaxis.grid(False) ax.set_ylim(-.5, len(self.plot_data) - .5) if self.hue_names is not None: leg = ax.legend(loc="best") if self.hue_title is not None: leg.set_title(self.hue_title) # Set the title size a roundabout way to maintain # compatability with matplotlib 1.1 try: title_size = mpl.rcParams["axes.labelsize"] * .85 except TypeError: # labelsize is something like "large" title_size = mpl.rcParams["axes.labelsize"] prop = mpl.font_manager.FontProperties(size=title_size) leg._legend_title_box._text.set_font_properties(prop) def add_legend_data(self, ax, color, label): """Add a dummy patch object so we can get legend data.""" rect = plt.Rectangle([0, 0], 0, 0, linewidth=self.linewidth / 2, edgecolor=self.gray, facecolor=color, label=label) ax.add_patch(rect) class _BoxPlotter(_CategoricalPlotter): def __init__(self, x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, fliersize, linewidth): self.establish_variables(x, y, hue, data, orient, order, hue_order) self.establish_colors(color, palette, saturation) self.width = width self.fliersize = fliersize if linewidth is None: linewidth = mpl.rcParams["lines.linewidth"] self.linewidth = linewidth def draw_boxplot(self, ax, kws): """Use matplotlib to draw a boxplot on an Axes.""" vert = self.orient == "v" for i, group_data in enumerate(self.plot_data): if self.plot_hues is None: # Handle case where there is data at this level if group_data.size == 0: continue # Draw a single box or a set of boxes # with a single level of grouping box_data = remove_na(group_data) # Handle case where there is no non-null data if box_data.size == 0: continue artist_dict = ax.boxplot(box_data, vert=vert, patch_artist=True, positions=[i], widths=self.width, **kws) color = self.colors[i] self.restyle_boxplot(artist_dict, color, kws) else: # Draw nested groups of boxes offsets = self.hue_offsets for j, hue_level in enumerate(self.hue_names): hue_mask = self.plot_hues[i] == hue_level # Add a legend for this hue level if not i: self.add_legend_data(ax, self.colors[j], hue_level) # Handle case where there is data at this level if group_data.size == 0: continue box_data = remove_na(group_data[hue_mask]) # Handle case where there is no non-null data if box_data.size == 0: continue center = i + offsets[j] artist_dict = ax.boxplot(box_data, vert=vert, patch_artist=True, positions=[center], widths=self.nested_width, **kws) self.restyle_boxplot(artist_dict, self.colors[j], kws) # Add legend data, but just for one set of boxes def restyle_boxplot(self, artist_dict, color, kws): """Take a drawn matplotlib boxplot and make it look nice.""" for box in artist_dict["boxes"]: box.update(dict(color=color, zorder=.9, edgecolor=self.gray, linewidth=self.linewidth)) box.update(kws.get("boxprops", {})) for whisk in artist_dict["whiskers"]: whisk.update(dict(color=self.gray, linewidth=self.linewidth, linestyle="-")) whisk.update(kws.get("whiskerprops", {})) for cap in artist_dict["caps"]: cap.update(dict(color=self.gray, linewidth=self.linewidth)) cap.update(kws.get("capprops", {})) for med in artist_dict["medians"]: med.update(dict(color=self.gray, linewidth=self.linewidth)) med.update(kws.get("medianprops", {})) for fly in artist_dict["fliers"]: fly.update(dict(color=self.gray, marker="d", markeredgecolor=self.gray, markersize=self.fliersize)) fly.update(kws.get("flierprops", {})) def plot(self, ax, boxplot_kws): """Make the plot.""" self.draw_boxplot(ax, boxplot_kws) self.annotate_axes(ax) if self.orient == "h": ax.invert_yaxis() class _ViolinPlotter(_CategoricalPlotter): def __init__(self, x, y, hue, data, order, hue_order, bw, cut, scale, scale_hue, gridsize, width, inner, split, orient, linewidth, color, palette, saturation): self.establish_variables(x, y, hue, data, orient, order, hue_order) self.establish_colors(color, palette, saturation) self.estimate_densities(bw, cut, scale, scale_hue, gridsize) self.gridsize = gridsize self.width = width if inner is not None: if not any([inner.startswith("quart"), inner.startswith("box"), inner.startswith("stick"), inner.startswith("point")]): err = "Inner style '{}' not recognized".format(inner) raise ValueError(err) self.inner = inner if split and self.hue_names is not None and len(self.hue_names) != 2: raise ValueError("Cannot use `split` with more than 2 hue levels.") self.split = split if linewidth is None: linewidth = mpl.rcParams["lines.linewidth"] self.linewidth = linewidth def estimate_densities(self, bw, cut, scale, scale_hue, gridsize): """Find the support and density for all of the data.""" # Initialize data structures to keep track of plotting data if self.hue_names is None: support = [] density = [] counts = np.zeros(len(self.plot_data)) max_density = np.zeros(len(self.plot_data)) else: support = [[] for _ in self.plot_data] density = [[] for _ in self.plot_data] size = len(self.group_names), len(self.hue_names) counts = np.zeros(size) max_density = np.zeros(size) for i, group_data in enumerate(self.plot_data): # Option 1: we have a single level of grouping # -------------------------------------------- if self.plot_hues is None: # Strip missing datapoints kde_data = remove_na(group_data) # Handle special case of no data at this level if kde_data.size == 0: support.append(np.array([])) density.append(np.array([1.])) counts[i] = 0 max_density[i] = 0 continue # Handle special case of a single unique datapoint elif np.unique(kde_data).size == 1: support.append(np.unique(kde_data)) density.append(np.array([1.])) counts[i] = 1 max_density[i] = 0 continue # Fit the KDE and get the used bandwidth size kde, bw_used = self.fit_kde(kde_data, bw) # Determine the support grid and get the density over it support_i = self.kde_support(kde_data, bw_used, cut, gridsize) density_i = kde.evaluate(support_i) # Update the data structures with these results support.append(support_i) density.append(density_i) counts[i] = kde_data.size max_density[i] = density_i.max() # Option 2: we have nested grouping by a hue variable # --------------------------------------------------- else: for j, hue_level in enumerate(self.hue_names): # Handle special case of no data at this category level if not group_data.size: support[i].append(np.array([])) density[i].append(np.array([1.])) counts[i, j] = 0 max_density[i, j] = 0 continue # Select out the observations for this hue level hue_mask = self.plot_hues[i] == hue_level # Strip missing datapoints kde_data = remove_na(group_data[hue_mask]) # Handle special case of no data at this level if kde_data.size == 0: support[i].append(np.array([])) density[i].append(np.array([1.])) counts[i, j] = 0 max_density[i, j] = 0 continue # Handle special case of a single unique datapoint elif np.unique(kde_data).size == 1: support[i].append(np.unique(kde_data)) density[i].append(np.array([1.])) counts[i, j] = 1 max_density[i, j] = 0 continue # Fit the KDE and get the used bandwidth size kde, bw_used = self.fit_kde(kde_data, bw) # Determine the support grid and get the density over it support_ij = self.kde_support(kde_data, bw_used, cut, gridsize) density_ij = kde.evaluate(support_ij) # Update the data structures with these results support[i].append(support_ij) density[i].append(density_ij) counts[i, j] = kde_data.size max_density[i, j] = density_ij.max() # Scale the height of the density curve. # For a violinplot the density is non-quantitative. # The objective here is to scale the curves relative to 1 so that # they can be multiplied by the width parameter during plotting. if scale == "area": self.scale_area(density, max_density, scale_hue) elif scale == "width": self.scale_width(density) elif scale == "count": self.scale_count(density, counts, scale_hue) else: raise ValueError("scale method '{}' not recognized".format(scale)) # Set object attributes that will be used while plotting self.support = support self.density = density def fit_kde(self, x, bw): """Estimate a KDE for a vector of data with flexible bandwidth.""" # Allow for the use of old scipy where `bw` is fixed try: kde = stats.gaussian_kde(x, bw) except TypeError: kde = stats.gaussian_kde(x) if bw != "scott": # scipy default msg = ("Ignoring bandwidth choice, " "please upgrade scipy to use a different bandwidth.") warnings.warn(msg, UserWarning) # Extract the numeric bandwidth from the KDE object bw_used = kde.factor # At this point, bw will be a numeric scale factor. # To get the actual bandwidth of the kernel, we multiple by the # unbiased standard deviation of the data, which we will use # elsewhere to compute the range of the support. bw_used = bw_used * x.std(ddof=1) return kde, bw_used def kde_support(self, x, bw, cut, gridsize): """Define a grid of support for the violin.""" support_min = x.min() - bw * cut support_max = x.max() + bw * cut return np.linspace(support_min, support_max, gridsize) def scale_area(self, density, max_density, scale_hue): """Scale the relative area under the KDE curve. This essentially preserves the "standard" KDE scaling, but the resulting maximum density will be 1 so that the curve can be properly multiplied by the violin width. """ if self.hue_names is None: for d in density: if d.size > 1: d /= max_density.max() else: for i, group in enumerate(density): for d in group: if scale_hue: max = max_density[i].max() else: max = max_density.max() if d.size > 1: d /= max def scale_width(self, density): """Scale each density curve to the same height.""" if self.hue_names is None: for d in density: d /= d.max() else: for group in density: for d in group: d /= d.max() def scale_count(self, density, counts, scale_hue): """Scale each density curve by the number of observations.""" if self.hue_names is None: for count, d in zip(counts, density): d /= d.max() d *= count / counts.max() else: for i, group in enumerate(density): for j, d in enumerate(group): count = counts[i, j] if scale_hue: scaler = count / counts[i].max() else: scaler = count / counts.max() d /= d.max() d *= scaler @property def dwidth(self): if self.hue_names is None: return self.width / 2 elif self.split: return self.width / 2 else: return self.width / (2 * len(self.hue_names)) def draw_violins(self, ax): """Draw the violins onto `ax`.""" fill_func = ax.fill_betweenx if self.orient == "v" else ax.fill_between for i, group_data in enumerate(self.plot_data): kws = dict(edgecolor=self.gray, linewidth=self.linewidth) # Option 1: we have a single level of grouping # -------------------------------------------- if self.plot_hues is None: support, density = self.support[i], self.density[i] # Handle special case of no observations in this bin if support.size == 0: continue # Handle special case of a single observation elif support.size == 1: val = np.asscalar(support) d = np.asscalar(density) self.draw_single_observation(ax, i, val, d) continue # Draw the violin for this group grid = np.ones(self.gridsize) * i fill_func(support, grid - density * self.dwidth, grid + density * self.dwidth, color=self.colors[i], **kws) # Draw the interior representation of the data if self.inner is None: continue # Get a nan-free vector of datapoints violin_data = remove_na(group_data) # Draw box and whisker information if self.inner.startswith("box"): self.draw_box_lines(ax, violin_data, support, density, i) # Draw quartile lines elif self.inner.startswith("quart"): self.draw_quartiles(ax, violin_data, support, density, i) # Draw stick observations elif self.inner.startswith("stick"): self.draw_stick_lines(ax, violin_data, support, density, i) # Draw point observations elif self.inner.startswith("point"): self.draw_points(ax, violin_data, i) # Option 2: we have nested grouping by a hue variable # --------------------------------------------------- else: offsets = self.hue_offsets for j, hue_level in enumerate(self.hue_names): support, density = self.support[i][j], self.density[i][j] kws["color"] = self.colors[j] # Add legend data, but just for one set of violins if not i: self.add_legend_data(ax, self.colors[j], hue_level) # Handle the special case where we have no observations if support.size == 0: continue # Handle the special case where we have one observation elif support.size == 1: val = np.asscalar(support) d = np.asscalar(density) if self.split: d = d / 2 at_group = i + offsets[j] self.draw_single_observation(ax, at_group, val, d) continue # Option 2a: we are drawing a single split violin # ----------------------------------------------- if self.split: grid = np.ones(self.gridsize) * i if j: fill_func(support, grid, grid + density * self.dwidth, **kws) else: fill_func(support, grid - density * self.dwidth, grid, **kws) # Draw the interior representation of the data if self.inner is None: continue # Get a nan-free vector of datapoints hue_mask = self.plot_hues[i] == hue_level violin_data = remove_na(group_data[hue_mask]) # Draw quartile lines if self.inner.startswith("quart"): self.draw_quartiles(ax, violin_data, support, density, i, ["left", "right"][j]) # Draw stick observations elif self.inner.startswith("stick"): self.draw_stick_lines(ax, violin_data, support, density, i, ["left", "right"][j]) # The box and point interior plots are drawn for # all data at the group level, so we just do that once if not j: continue # Get the whole vector for this group level violin_data = remove_na(group_data) # Draw box and whisker information if self.inner.startswith("box"): self.draw_box_lines(ax, violin_data, support, density, i) # Draw point observations elif self.inner.startswith("point"): self.draw_points(ax, violin_data, i) # Option 2b: we are drawing full nested violins # ----------------------------------------------- else: grid = np.ones(self.gridsize) * (i + offsets[j]) fill_func(support, grid - density * self.dwidth, grid + density * self.dwidth, **kws) # Draw the interior representation if self.inner is None: continue # Get a nan-free vector of datapoints hue_mask = self.plot_hues[i] == hue_level violin_data = remove_na(group_data[hue_mask]) # Draw box and whisker information if self.inner.startswith("box"): self.draw_box_lines(ax, violin_data, support, density, i + offsets[j]) # Draw quartile lines elif self.inner.startswith("quart"): self.draw_quartiles(ax, violin_data, support, density, i + offsets[j]) # Draw stick observations elif self.inner.startswith("stick"): self.draw_stick_lines(ax, violin_data, support, density, i + offsets[j]) # Draw point observations elif self.inner.startswith("point"): self.draw_points(ax, violin_data, i + offsets[j]) def draw_single_observation(self, ax, at_group, at_quant, density): """Draw a line to mark a single observation.""" d_width = density * self.dwidth if self.orient == "v": ax.plot([at_group - d_width, at_group + d_width], [at_quant, at_quant], color=self.gray, linewidth=self.linewidth) else: ax.plot([at_quant, at_quant], [at_group - d_width, at_group + d_width], color=self.gray, linewidth=self.linewidth) def draw_box_lines(self, ax, data, support, density, center): """Draw boxplot information at center of the density.""" # Compute the boxplot statistics q25, q50, q75 = np.percentile(data, [25, 50, 75]) whisker_lim = 1.5 * iqr(data) h1 = np.min(data[data >= (q25 - whisker_lim)]) h2 = np.max(data[data <= (q75 + whisker_lim)]) # Draw a boxplot using lines and a point if self.orient == "v": ax.plot([center, center], [h1, h2], linewidth=self.linewidth, color=self.gray) ax.plot([center, center], [q25, q75], linewidth=self.linewidth * 3, color=self.gray) ax.scatter(center, q50, zorder=3, color="white", edgecolor=self.gray, s=np.square(self.linewidth * 2)) else: ax.plot([h1, h2], [center, center], linewidth=self.linewidth, color=self.gray) ax.plot([q25, q75], [center, center], linewidth=self.linewidth * 3, color=self.gray) ax.scatter(q50, center, zorder=3, color="white", edgecolor=self.gray, s=np.square(self.linewidth * 2)) def draw_quartiles(self, ax, data, support, density, center, split=False): """Draw the quartiles as lines at width of density.""" q25, q50, q75 = np.percentile(data, [25, 50, 75]) self.draw_to_density(ax, center, q25, support, density, split, linewidth=self.linewidth, dashes=[self.linewidth * 1.5] * 2) self.draw_to_density(ax, center, q50, support, density, split, linewidth=self.linewidth, dashes=[self.linewidth * 3] * 2) self.draw_to_density(ax, center, q75, support, density, split, linewidth=self.linewidth, dashes=[self.linewidth * 1.5] * 2) def draw_points(self, ax, data, center): """Draw individual observations as points at middle of the violin.""" kws = dict(s=np.square(self.linewidth * 2), c=self.gray, edgecolor=self.gray) grid = np.ones(len(data)) * center if self.orient == "v": ax.scatter(grid, data, **kws) else: ax.scatter(data, grid, **kws) def draw_stick_lines(self, ax, data, support, density, center, split=False): """Draw individual observations as sticks at width of density.""" for val in data: self.draw_to_density(ax, center, val, support, density, split, linewidth=self.linewidth * .5) def draw_to_density(self, ax, center, val, support, density, split, **kws): """Draw a line orthogonal to the value axis at width of density.""" idx = np.argmin(np.abs(support - val)) width = self.dwidth * density[idx] * .99 kws["color"] = self.gray if self.orient == "v": if split == "left": ax.plot([center - width, center], [val, val], **kws) elif split == "right": ax.plot([center, center + width], [val, val], **kws) else: ax.plot([center - width, center + width], [val, val], **kws) else: if split == "left": ax.plot([val, val], [center - width, center], **kws) elif split == "right": ax.plot([val, val], [center, center + width], **kws) else: ax.plot([val, val], [center - width, center + width], **kws) def plot(self, ax): """Make the violin plot.""" self.draw_violins(ax) self.annotate_axes(ax) if self.orient == "h": ax.invert_yaxis() class _StripPlotter(_CategoricalPlotter): """1-d scatterplot with categorical organization.""" def __init__(self, x, y, hue, data, order, hue_order, jitter, split, orient, color, palette): """Initialize the plotter.""" self.establish_variables(x, y, hue, data, orient, order, hue_order) self.establish_colors(color, palette, 1) # Set object attributes self.split = split self.width = .8 if jitter == 1: # Use a good default for `jitter = True` jlim = 0.1 else: jlim = float(jitter) if self.hue_names is not None and split: jlim /= len(self.hue_names) self.jitterer = stats.uniform(-jlim, jlim * 2).rvs def draw_stripplot(self, ax, kws): """Draw the points onto `ax`.""" # Set the default zorder to 2.1, so that the points # will be drawn on top of line elements (like in a boxplot) kws.setdefault("zorder", 2.1) for i, group_data in enumerate(self.plot_data): if self.plot_hues is None: # Determine the positions of the points strip_data = remove_na(group_data) jitter = self.jitterer(len(strip_data)) kws["color"] = self.colors[i] # Draw the plot if self.orient == "v": ax.scatter(i + jitter, strip_data, **kws) else: ax.scatter(strip_data, i + jitter, **kws) else: offsets = self.hue_offsets for j, hue_level in enumerate(self.hue_names): hue_mask = self.plot_hues[i] == hue_level if not hue_mask.any(): continue # Determine the positions of the points strip_data = remove_na(group_data[hue_mask]) pos = i + offsets[j] if self.split else i jitter = self.jitterer(len(strip_data)) kws["color"] = self.colors[j] # Only label one set of plots if i: kws.pop("label", None) else: kws["label"] = hue_level # Draw the plot if self.orient == "v": ax.scatter(pos + jitter, strip_data, **kws) else: ax.scatter(strip_data, pos + jitter, **kws) def plot(self, ax, kws): """Make the plot.""" self.draw_stripplot(ax, kws) self.annotate_axes(ax) if self.orient == "h": ax.invert_yaxis() class _SwarmPlotter(_BoxPlotter): def __init__(self): pass def plot(self, ax): pass class _CategoricalStatPlotter(_CategoricalPlotter): @property def nested_width(self): """A float with the width of plot elements when hue nesting is used.""" return self.width / len(self.hue_names) def estimate_statistic(self, estimator, ci, n_boot): if self.hue_names is None: statistic = [] confint = [] else: statistic = [[] for _ in self.plot_data] confint = [[] for _ in self.plot_data] for i, group_data in enumerate(self.plot_data): # Option 1: we have a single layer of grouping # -------------------------------------------- if self.plot_hues is None: if self.plot_units is None: stat_data = remove_na(group_data) unit_data = None else: unit_data = self.plot_units[i] have = pd.notnull(np.c_[group_data, unit_data]).all(axis=1) stat_data = group_data[have] unit_data = unit_data[have] # Estimate a statistic from the vector of data if not stat_data.size: statistic.append(np.nan) else: statistic.append(estimator(stat_data)) # Get a confidence interval for this estimate if ci is not None: if stat_data.size < 2: confint.append([np.nan, np.nan]) continue boots = bootstrap(stat_data, func=estimator, n_boot=n_boot, units=unit_data) confint.append(utils.ci(boots, ci)) # Option 2: we are grouping by a hue layer # ---------------------------------------- else: for j, hue_level in enumerate(self.hue_names): if not self.plot_hues[i].size: statistic[i].append(np.nan) if ci is not None: confint[i].append((np.nan, np.nan)) continue hue_mask = self.plot_hues[i] == hue_level if self.plot_units is None: stat_data = remove_na(group_data[hue_mask]) unit_data = None else: group_units = self.plot_units[i] have = pd.notnull( np.c_[group_data, group_units] ).all(axis=1) stat_data = group_data[hue_mask & have] unit_data = group_units[hue_mask & have] # Estimate a statistic from the vector of data if not stat_data.size: statistic[i].append(np.nan) else: statistic[i].append(estimator(stat_data)) # Get a confidence interval for this estimate if ci is not None: if stat_data.size < 2: confint[i].append([np.nan, np.nan]) continue boots = bootstrap(stat_data, func=estimator, n_boot=n_boot, units=unit_data) confint[i].append(utils.ci(boots, ci)) # Save the resulting values for plotting self.statistic = np.array(statistic) self.confint = np.array(confint) # Rename the value label to reflect the estimation if self.value_label is not None: self.value_label = "{}({})".format(estimator.__name__, self.value_label) def draw_confints(self, ax, at_group, confint, colors, **kws): kws.setdefault("lw", mpl.rcParams["lines.linewidth"] * 1.8) for at, (ci_low, ci_high), color in zip(at_group, confint, colors): if self.orient == "v": ax.plot([at, at], [ci_low, ci_high], color=color, **kws) else: ax.plot([ci_low, ci_high], [at, at], color=color, **kws) class _BarPlotter(_CategoricalStatPlotter): """Show point estimates and confidence intervals with bars.""" def __init__(self, x, y, hue, data, order, hue_order, estimator, ci, n_boot, units, orient, color, palette, saturation, errcolor): """Initialize the plotter.""" self.establish_variables(x, y, hue, data, orient, order, hue_order, units) self.establish_colors(color, palette, saturation) self.estimate_statistic(estimator, ci, n_boot) self.errcolor = errcolor def draw_bars(self, ax, kws): """Draw the bars onto `ax`.""" # Get the right matplotlib function depending on the orientation barfunc = ax.bar if self.orient == "v" else ax.barh barpos = np.arange(len(self.statistic)) if self.plot_hues is None: # Draw the bars barfunc(barpos, self.statistic, self.width, color=self.colors, align="center", **kws) # Draw the confidence intervals errcolors = [self.errcolor] * len(barpos) self.draw_confints(ax, barpos, self.confint, errcolors) else: for j, hue_level in enumerate(self.hue_names): # Draw the bars offpos = barpos + self.hue_offsets[j] barfunc(offpos, self.statistic[:, j], self.nested_width, color=self.colors[j], align="center", label=hue_level, **kws) # Draw the confidence intervals if self.confint.size: confint = self.confint[:, j] errcolors = [self.errcolor] * len(offpos) self.draw_confints(ax, offpos, confint, errcolors) def plot(self, ax, bar_kws): """Make the plot.""" self.draw_bars(ax, bar_kws) self.annotate_axes(ax) if self.orient == "h": ax.invert_yaxis() class _PointPlotter(_CategoricalStatPlotter): """Show point estimates and confidence intervals with (joined) points.""" def __init__(self, x, y, hue, data, order, hue_order, estimator, ci, n_boot, units, markers, linestyles, dodge, join, scale, orient, color, palette): """Initialize the plotter.""" self.establish_variables(x, y, hue, data, orient, order, hue_order, units) self.establish_colors(color, palette, 1) self.estimate_statistic(estimator, ci, n_boot) # Override the default palette for single-color plots if hue is None and color is None and palette is None: self.colors = [color_palette()[0]] * len(self.colors) # Don't join single-layer plots with different colors if hue is None and palette is not None: join = False # Use a good default for `dodge=True` if dodge is True and self.hue_names is not None: dodge = .025 * len(self.hue_names) # Make sure we have a marker for each hue level if isinstance(markers, string_types): markers = [markers] * len(self.colors) self.markers = markers # Make sure we have a line style for each hue level if isinstance(linestyles, string_types): linestyles = [linestyles] * len(self.colors) self.linestyles = linestyles # Set the other plot components self.dodge = dodge self.join = join self.scale = scale @property def hue_offsets(self): """Offsets relative to the center position for each hue level.""" offset = np.linspace(0, self.dodge, len(self.hue_names)) offset -= offset.mean() return offset def draw_points(self, ax): """Draw the main data components of the plot.""" # Get the center positions on the categorical axis pointpos = np.arange(len(self.statistic)) # Get the size of the plot elements lw = mpl.rcParams["lines.linewidth"] * 1.8 * self.scale mew = lw * .75 markersize = np.pi * np.square(lw) * 2 if self.plot_hues is None: # Draw lines joining each estimate point if self.join: color = self.colors[0] ls = self.linestyles[0] if self.orient == "h": ax.plot(self.statistic, pointpos, color=color, ls=ls, lw=lw) else: ax.plot(pointpos, self.statistic, color=color, ls=ls, lw=lw) # Draw the confidence intervals self.draw_confints(ax, pointpos, self.confint, self.colors, lw=lw) # Draw the estimate points marker = self.markers[0] if self.orient == "h": ax.scatter(self.statistic, pointpos, linewidth=mew, marker=marker, s=markersize, c=self.colors, edgecolor=self.colors) else: ax.scatter(pointpos, self.statistic, linewidth=mew, marker=marker, s=markersize, c=self.colors, edgecolor=self.colors) else: offsets = self.hue_offsets for j, hue_level in enumerate(self.hue_names): # Determine the values to plot for this level statistic = self.statistic[:, j] # Determine the position on the categorical and z axes offpos = pointpos + offsets[j] z = j + 1 # Draw lines joining each estimate point if self.join: color = self.colors[j] ls = self.linestyles[j] if self.orient == "h": ax.plot(statistic, offpos, color=color, zorder=z, ls=ls, lw=lw) else: ax.plot(offpos, statistic, color=color, zorder=z, ls=ls, lw=lw) # Draw the confidence intervals if self.confint.size: confint = self.confint[:, j] errcolors = [self.colors[j]] * len(offpos) self.draw_confints(ax, offpos, confint, errcolors, zorder=z, lw=lw) # Draw the estimate points marker = self.markers[j] if self.orient == "h": ax.scatter(statistic, offpos, label=hue_level, c=[self.colors[j]] * len(offpos), linewidth=mew, marker=marker, s=markersize, edgecolor=self.colors[j], zorder=z) else: ax.scatter(offpos, statistic, label=hue_level, c=[self.colors[j]] * len(offpos), linewidth=mew, marker=marker, s=markersize, edgecolor=self.colors[j], zorder=z) def plot(self, ax): """Make the plot.""" self.draw_points(ax) self.annotate_axes(ax) if self.orient == "h": ax.invert_yaxis() _categorical_docs = dict( # Shared narrative docs main_api_narrative=dedent("""\ Input data can be passed in a variety of formats, including: - Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the ``x``, ``y``, and/or ``hue`` parameters. - A "long-form" DataFrame, in which case the ``x``, ``y``, and ``hue`` variables will determine how the data are plotted. - A "wide-form" DataFrame, such that each numeric column will be plotted. - Anything accepted by ``plt.boxplot`` (e.g. a 2d array or list of vectors) In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements.\ """), # Shared function parameters input_params=dedent("""\ x, y, hue : names of variables in ``data`` or vector data, optional Inputs for plotting long-form data. See examples for interpretation.\ """), string_input_params=dedent("""\ x, y, hue : names of variables in ``data`` Inputs for plotting long-form data. See examples for interpretation.\ """), data=dedent("""\ data : DataFrame, array, or list of arrays, optional Dataset for plotting. If ``x`` and ``y`` are absent, this is interpreted as wide-form. Otherwise it is expected to be long-form.\ """), long_form_data=dedent("""\ data : DataFrame Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation.\ """), order_vars=dedent("""\ order, hue_order : lists of strings, optional Order to plot the categorical levels in, otherwise the levels are inferred from the data objects.\ """), stat_api_params=dedent("""\ estimator : callable that maps vector -> scalar, optional Statistical function to estimate within each categorical bin. ci : float or None, optional Size of confidence intervals to draw around estimated values. If ``None``, no bootstrapping will be performed, and error bars will not be drawn. n_boot : int, optional Number of bootstrap iterations to use when computing confidence intervals. units : name of variable in ``data`` or vector data, optional Identifier of sampling units, which will be used to perform a multilevel bootstrap and account for repeated measures design.\ """), orient=dedent("""\ orient : "v" | "h", optional Orientation of the plot (vertical or horizontal). This is usually inferred from the dtype of the input variables, but can be used to specify when the "categorical" variable is a numeric or when plotting wide-form data.\ """), color=dedent("""\ color : matplotlib color, optional Color for all of the elements, or seed for :func:`light_palette` when using hue nesting.\ """), palette=dedent("""\ palette : palette name, list, or dict, optional Color palette that maps either the grouping variable or the hue variable. If the palette is a dictionary, keys should be names of levels and values should be matplotlib colors.\ """), saturation=dedent("""\ saturation : float, optional Proportion of the original saturation to draw colors at. Large patches often look better with slightly desaturated colors, but set this to ``1`` if you want the plot colors to perfectly match the input color spec.\ """), width=dedent("""\ width : float, optional Width of a full element when not using hue nesting, or width of all the elements for one level of the major grouping variable.\ """), linewidth=dedent("""\ linewidth : float, optional Width of the gray lines that frame the plot elements.\ """), ax_in=dedent("""\ ax : matplotlib Axes, optional Axes object to draw the plot onto, otherwise uses the current Axes.\ """), ax_out=dedent("""\ ax : matplotlib Axes Returns the Axes object with the boxplot drawn onto it.\ """), # Shared see also boxplot=dedent("""\ boxplot : A traditional box-and-whisker plot with a similar API.\ """), violinplot=dedent("""\ violinplot : A combination of boxplot and kernel density estimation.\ """), stripplot=dedent("""\ stripplot : A scatterplot where one variable is categorical. Can be used in conjunction with a other plots to show each observation.\ """), barplot=dedent("""\ barplot : Show point estimates and confidence intervals using bars.\ """), countplot=dedent("""\ countplot : Show the counts of observations in each categorical bin.\ """), pointplot=dedent("""\ pointplot : Show point estimates and confidence intervals using scatterplot glyphs.\ """), factorplot=dedent("""\ factorplot : Combine categorical plots and a class:`FacetGrid`.\ """), ) _categorical_docs.update(_facet_docs) def boxplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=.75, width=.8, fliersize=5, linewidth=None, whis=1.5, notch=False, ax=None, **kwargs): # Try to handle broken backwards-compatability # This should help with the lack of a smooth deprecation, # but won't catch everything warn = False if isinstance(x, pd.DataFrame): data = x x = None warn = True if "vals" in kwargs: x = kwargs.pop("vals") warn = True if "groupby" in kwargs: y = x x = kwargs.pop("groupby") warn = True if "vert" in kwargs: vert = kwargs.pop("vert", True) if not vert: x, y = y, x orient = "v" if vert else "h" warn = True if "names" in kwargs: kwargs.pop("names") warn = True if "join_rm" in kwargs: kwargs.pop("join_rm") warn = True msg = ("The boxplot API has been changed. Attempting to adjust your " "arguments for the new API (which might not work). Please update " "your code. See the version 0.6 release notes for more info.") if warn: warnings.warn(msg, UserWarning) plotter = _BoxPlotter(x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, fliersize, linewidth) if ax is None: ax = plt.gca() kwargs.update(dict(whis=whis, notch=notch)) plotter.plot(ax, kwargs) return ax boxplot.__doc__ = dedent("""\ Draw a box plot to show distributions with respect to categories. A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be "outliers" using a method that is a function of the inter-quartile range. {main_api_narrative} Parameters ---------- {input_params} {data} {order_vars} {orient} {color} {palette} {saturation} {width} fliersize : float, optional Size of the markers used to indicate outlier observations. {linewidth} whis : float, optional Proportion of the IQR past the low and high quartiles to extend the plot whiskers. Points outside this range will be identified as outliers. notch : boolean, optional Whether to "notch" the box to indicate a confidence interval for the median. There are several other parameters that can control how the notches are drawn; see the ``plt.boxplot`` help for more information on them. {ax_in} kwargs : key, value mappings Other keyword arguments are passed through to ``plt.boxplot`` at draw time. Returns ------- {ax_out} See Also -------- {violinplot} {stripplot} Examples -------- Draw a single horizontal boxplot: .. plot:: :context: close-figs >>> import seaborn as sns >>> sns.set_style("whitegrid") >>> tips = sns.load_dataset("tips") >>> ax = sns.boxplot(x=tips["total_bill"]) Draw a vertical boxplot grouped by a categorical variable: .. plot:: :context: close-figs >>> ax = sns.boxplot(x="day", y="total_bill", data=tips) Draw a boxplot with nested grouping by two categorical variables: .. plot:: :context: close-figs >>> ax = sns.boxplot(x="day", y="total_bill", hue="smoker", ... data=tips, palette="Set3") Draw a boxplot with nested grouping when some bins are empty: .. plot:: :context: close-figs >>> ax = sns.boxplot(x="day", y="total_bill", hue="time", ... data=tips, linewidth=2.5) Control box order by sorting the input data: .. plot:: :context: close-figs >>> ax = sns.boxplot(x="size", y="tip", data=tips.sort("size")) Control box order by passing an explicit order: .. plot:: :context: close-figs >>> ax = sns.boxplot(x="size", y="tip", data=tips, ... order=np.arange(1, 7), palette="Blues_d") Draw a boxplot for each numeric variable in a DataFrame: .. plot:: :context: close-figs >>> iris = sns.load_dataset("iris") >>> ax = sns.boxplot(data=iris, orient="h", palette="Set2") Use :func:`stripplot` to show the datapoints on top of the boxes: .. plot:: :context: close-figs >>> ax = sns.boxplot(x="day", y="total_bill", data=tips) >>> ax = sns.stripplot(x="day", y="total_bill", data=tips, ... size=4, jitter=True, edgecolor="gray") Draw a box plot on to a :class:`FacetGrid` to group within an additional categorical variable: .. plot:: :context: close-figs >>> g = sns.FacetGrid(tips, col="time", size=4, aspect=.7) >>> (g.map(sns.boxplot, "sex", "total_bill", "smoker") ... .despine(left=True) ... .add_legend(title="smoker")) #doctest: +ELLIPSIS """).format(**_categorical_docs) def violinplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, bw="scott", cut=2, scale="area", scale_hue=True, gridsize=100, width=.8, inner="box", split=False, orient=None, linewidth=None, color=None, palette=None, saturation=.75, ax=None, **kwargs): # Try to handle broken backwards-compatability # This should help with the lack of a smooth deprecation, # but won't catch everything warn = False if isinstance(x, pd.DataFrame): data = x x = None warn = True if "vals" in kwargs: x = kwargs.pop("vals") warn = True if "groupby" in kwargs: y = x x = kwargs.pop("groupby") warn = True if "vert" in kwargs: vert = kwargs.pop("vert", True) if not vert: x, y = y, x orient = "v" if vert else "h" warn = True msg = ("The violinplot API has been changed. Attempting to adjust your " "arguments for the new API (which might not work). Please update " "your code. See the version 0.6 release notes for more info.") if warn: warnings.warn(msg, UserWarning) plotter = _ViolinPlotter(x, y, hue, data, order, hue_order, bw, cut, scale, scale_hue, gridsize, width, inner, split, orient, linewidth, color, palette, saturation) if ax is None: ax = plt.gca() plotter.plot(ax) return ax violinplot.__doc__ = dedent("""\ Draw a combination of boxplot and kernel density estimate. A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual datapoints, the violin plot features a kernel density estimation of the underlying distribution. This can be an effective and attractive way to show multiple distributions of data at once, but keep in mind that the estimation procedure is influenced by the sample size, and violins for relatively small samples might look misleadingly smooth. {main_api_narrative} Parameters ---------- {input_params} {data} {order_vars} bw : {{'scott', 'silverman', float}}, optional Either the name of a reference rule or the scale factor to use when computing the kernel bandwidth. The actual kernel size will be determined by multiplying the scale factor by the standard deviation of the data within each bin. cut : float, optional Distance, in units of bandwidth size, to extend the density past the extreme datapoints. Set to 0 to limit the violin range within the range of the observed data (i.e., to have the same effect as ``trim=True`` in ``ggplot``. scale : {{"area", "count", "width"}}, optional The method used to scale the width of each violin. If ``area``, each violin will have the same area. If ``count``, the width of the violins will be scaled by the number of observations in that bin. If ``width``, each violin will have the same width. scale_hue : bool, optional When nesting violins using a ``hue`` variable, this parameter determines whether the scaling is computed within each level of the major grouping variable (``scale_hue=True``) or across all the violins on the plot (``scale_hue=False``). gridsize : int, optional Number of points in the discrete grid used to compute the kernel density estimate. {width} inner : {{"box", "quartile", "point", "stick", None}}, optional Representation of the datapoints in the violin interior. If ``box``, draw a miniature boxplot. If ``quartiles``, draw the quartiles of the distribution. If ``point`` or ``stick``, show each underlying datapoint. Using ``None`` will draw unadorned violins. split : bool, optional When using hue nesting with a variable that takes two levels, setting ``split`` to True will draw half of a violin for each level. This can make it easier to directly compare the distributions. {orient} {linewidth} {color} {palette} {saturation} {ax_in} Returns ------- {ax_out} See Also -------- {boxplot} {stripplot} Examples -------- Draw a single horizontal violinplot: .. plot:: :context: close-figs >>> import seaborn as sns >>> sns.set_style("whitegrid") >>> tips = sns.load_dataset("tips") >>> ax = sns.violinplot(x=tips["total_bill"]) Draw a vertical violinplot grouped by a categorical variable: .. plot:: :context: close-figs >>> ax = sns.violinplot(x="day", y="total_bill", data=tips) Draw a violinplot with nested grouping by two categorical variables: .. plot:: :context: close-figs >>> ax = sns.violinplot(x="day", y="total_bill", hue="smoker", ... data=tips, palette="muted") Draw split violins to compare the across the hue variable: .. plot:: :context: close-figs >>> ax = sns.violinplot(x="day", y="total_bill", hue="smoker", ... data=tips, palette="muted", split=True) Control violin order by sorting the input data: .. plot:: :context: close-figs >>> ax = sns.violinplot(x="size", y="tip", data=tips.sort("size")) Control violin order by passing an explicit order: .. plot:: :context: close-figs >>> ax = sns.violinplot(x="size", y="tip", data=tips, ... order=np.arange(1, 7), palette="Blues_d") Scale the violin width by the number of observations in each bin: .. plot:: :context: close-figs >>> ax = sns.violinplot(x="day", y="total_bill", hue="sex", ... data=tips, palette="Set2", split=True, ... scale="count") Draw the quartiles as horizontal lines instead of a mini-box: .. plot:: :context: close-figs >>> ax = sns.violinplot(x="day", y="total_bill", hue="sex", ... data=tips, palette="Set2", split=True, ... scale="count", inner="quartile") Show each observation with a stick inside the violin: .. plot:: :context: close-figs >>> ax = sns.violinplot(x="day", y="total_bill", hue="sex", ... data=tips, palette="Set2", split=True, ... scale="count", inner="stick") Scale the density relative to the counts across all bins: .. plot:: :context: close-figs >>> ax = sns.violinplot(x="day", y="total_bill", hue="sex", ... data=tips, palette="Set2", split=True, ... scale="count", inner="stick", scale_hue=False) Use a narrow bandwidth to reduce the amount of smoothing: .. plot:: :context: close-figs >>> ax = sns.violinplot(x="day", y="total_bill", hue="sex", ... data=tips, palette="Set2", split=True, ... scale="count", inner="stick", ... scale_hue=False, bw=.2) Draw horizontal violins: .. plot:: :context: close-figs >>> planets = sns.load_dataset("planets") >>> ax = sns.violinplot(x="orbital_period", y="method", ... data=planets[planets.orbital_period < 1000], ... scale="width", palette="Set3") Draw a violin plot on to a :class:`FacetGrid` to group within an additional categorical variable: .. plot:: :context: close-figs >>> g = sns.FacetGrid(tips, col="time", size=4, aspect=.7) >>> (g.map(sns.violinplot, "sex", "total_bill", "smoker", split=True) ... .despine(left=True) ... .add_legend(title="smoker")) # doctest: +ELLIPSIS """).format(**_categorical_docs) def stripplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, jitter=False, split=True, orient=None, color=None, palette=None, size=7, edgecolor="w", linewidth=1, ax=None, **kwargs): plotter = _StripPlotter(x, y, hue, data, order, hue_order, jitter, split, orient, color, palette) if ax is None: ax = plt.gca() kwargs.update(dict(s=size ** 2, edgecolor=edgecolor, linewidth=linewidth)) if edgecolor == "gray": kwargs["edgecolor"] = plotter.gray plotter.plot(ax, kwargs) return ax stripplot.__doc__ = dedent("""\ Draw a scatterplot where one variable is categorical. A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution. {main_api_narrative} Parameters ---------- {input_params} {data} {order_vars} jitter : float, ``True``/``1`` is special-cased, optional Amount of jitter (only along the categorical axis) to apply. This can be useful when you have many points and they overlap, so that it is easier to see the distribution. You can specify the amount of jitter (half the width of the uniform random variable support), or just use ``True`` for a good default. split : bool, optional When using ``hue`` nesting, setting this to ``True`` will separate the strips for different hue levels along the categorical axis. Otherwise, the points for each level will be plotted on top of each other. {orient} {color} {palette} size : float, optional Diameter of the markers, in points. (Although ``plt.scatter`` is used to draw the points, the ``size`` argument here takes a "normal" markersize and not size^2 like ``plt.scatter``. edgecolor : matplotlib color, "gray" is special-cased, optional Color of the lines around each point. If you pass ``"gray"``, the brightness is determined by the color palette used for the body of the points. {linewidth} {ax_in} Returns ------- {ax_out} See Also -------- {boxplot} {violinplot} Examples -------- Draw a single horizontal strip plot: .. plot:: :context: close-figs >>> import seaborn as sns >>> sns.set_style("whitegrid") >>> tips = sns.load_dataset("tips") >>> ax = sns.stripplot(x=tips["total_bill"]) Group the strips by a categorical variable: .. plot:: :context: close-figs >>> ax = sns.stripplot(x="day", y="total_bill", data=tips) Add jitter to bring out the distribution of values: .. plot:: :context: close-figs >>> ax = sns.stripplot(x="day", y="total_bill", data=tips, jitter=True) Use a smaller amount of jitter: .. plot:: :context: close-figs >>> ax = sns.stripplot(x="day", y="total_bill", data=tips, jitter=0.05) Draw horizontal strips: .. plot:: :context: close-figs >>> ax = sns.stripplot(x="total_bill", y="day", data=tips, ... jitter=True) Nest the strips within a second categorical variable: .. plot:: :context: close-figs >>> ax = sns.stripplot(x="sex", y="total_bill", hue="day", ... data=tips, jitter=True) Draw each level of the ``hue`` variable at the same location on the major categorical axis: .. plot:: :context: close-figs >>> ax = sns.stripplot(x="day", y="total_bill", hue="smoker", ... data=tips, jitter=True, ... palette="Set2", split=False) Control strip order by sorting the input data: .. plot:: :context: close-figs >>> ax = sns.stripplot(x="size", y="tip", data=tips.sort("size")) Control strip order by passing an explicit order: .. plot:: :context: close-figs >>> ax = sns.stripplot(x="size", y="tip", data=tips, ... order=np.arange(1, 7), palette="Blues_d") Draw strips with large points and different aesthetics: .. plot:: :context: close-figs >>> ax = sns.stripplot("day", "total_bill", "smoker", data=tips, ... palette="Set2", size=20, marker="D", ... edgecolor="gray", alpha=.25) Draw strips of observations on top of a box plot: .. plot:: :context: close-figs >>> ax = sns.boxplot(x="tip", y="day", data=tips, whis=np.inf) >>> ax = sns.stripplot(x="tip", y="day", data=tips, jitter=True) Draw strips of observations on top of a violin plot: .. plot:: :context: close-figs >>> ax = sns.violinplot(x="day", y="total_bill", data=tips, inner=None) >>> ax = sns.stripplot(x="day", y="total_bill", data=tips, ... jitter=True, color="white", edgecolor="gray") """).format(**_categorical_docs) def barplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, estimator=np.mean, ci=95, n_boot=1000, units=None, orient=None, color=None, palette=None, saturation=.75, errcolor=".26", ax=None, **kwargs): # Handle some deprecated arguments if "hline" in kwargs: kwargs.pop("hline") warnings.warn("The `hline` parameter has been removed", UserWarning) if "dropna" in kwargs: kwargs.pop("dropna") warnings.warn("The `dropna` parameter has been removed", UserWarning) if "x_order" in kwargs: order = kwargs.pop("x_order") warnings.warn("The `x_order` parameter has been renamed `order`", UserWarning) plotter = _BarPlotter(x, y, hue, data, order, hue_order, estimator, ci, n_boot, units, orient, color, palette, saturation, errcolor) if ax is None: ax = plt.gca() plotter.plot(ax, kwargs) return ax barplot.__doc__ = dedent("""\ Show point estimates and confidence intervals as rectangular bars. A bar plot represents an estimate of central tendency for a numeric variable with the height of each rectangle and provides some indication of the uncertainty around that estimate using error bars. Bar plots include 0 in the quantitative axis range, and they are a good choice when 0 is a meaningful value for the quantitative variable, and you want to make comparisons against it. For datasets where 0 is not a meaningful value, a point plot will allow you to focus on differences between levels of one or more categorical variables. It is also important to keep in mind that a bar plot shows only the mean (or other estimator) value, but in many cases it may be more informative to show the distribution of values at each level of the categorical variables. In that case, other approaches such as a box or violin plot may be more appropriate. {main_api_narrative} Parameters ---------- {input_params} {data} {order_vars} {stat_api_params} {orient} {color} {palette} {saturation} errcolor : matplotlib color Color for the lines that represent the confidence interval. {ax_in} kwargs : key, value mappings Other keyword arguments are passed through to ``plt.bar`` at draw time. Returns ------- {ax_out} See Also -------- {countplot} {pointplot} {factorplot} Examples -------- Draw a set of vertical bar plots grouped by a categorical variable: .. plot:: :context: close-figs >>> import seaborn as sns >>> sns.set_style("whitegrid") >>> tips = sns.load_dataset("tips") >>> ax = sns.barplot(x="day", y="total_bill", data=tips) Draw a set of vertical bars with nested grouping by a two variables: .. plot:: :context: close-figs >>> ax = sns.barplot(x="day", y="total_bill", hue="sex", data=tips) Draw a set of horizontal bars: .. plot:: :context: close-figs >>> ax = sns.barplot(x="tip", y="day", data=tips) Control bar order by sorting the input data: .. plot:: :context: close-figs >>> ax = sns.barplot(x="size", y="tip", data=tips.sort("size")) Control bar order by passing an explicit order: .. plot:: :context: close-figs >>> ax = sns.barplot(x="size", y="tip", data=tips, ... order=np.arange(1, 7), palette="Blues_d") Use median as the estimate of central tendency: .. plot:: :context: close-figs >>> from numpy import median >>> ax = sns.barplot(x="day", y="tip", data=tips, estimator=median) Show the standard error of the mean with the error bars: .. plot:: :context: close-figs >>> ax = sns.barplot(x="day", y="tip", data=tips, ci=68) Use a different color palette for the bars: .. plot:: :context: close-figs >>> ax = sns.barplot("size", y="total_bill", data=tips.sort("size"), ... palette="Blues_d") Plot all bars in a single color: .. plot:: :context: close-figs >>> ax = sns.barplot("size", y="total_bill", data=tips.sort("size"), ... color="salmon", saturation=.5) Use ``plt.bar`` keyword arguments to further change the aesthetic: .. plot:: :context: close-figs >>> ax = sns.barplot("day", "total_bill", data=tips, ... linewidth=2.5, facecolor=(1, 1, 1, 0), ... errcolor=".2", edgecolor=".2") """).format(**_categorical_docs) def pointplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, estimator=np.mean, ci=95, n_boot=1000, units=None, markers="o", linestyles="-", dodge=False, join=True, scale=1, orient=None, color=None, palette=None, ax=None, **kwargs): # Handle some deprecated arguments if "hline" in kwargs: kwargs.pop("hline") warnings.warn("The `hline` parameter has been removed", UserWarning) if "dropna" in kwargs: kwargs.pop("dropna") warnings.warn("The `dropna` parameter has been removed", UserWarning) if "x_order" in kwargs: order = kwargs.pop("x_order") warnings.warn("The `x_order` parameter has been renamed `order`", UserWarning) plotter = _PointPlotter(x, y, hue, data, order, hue_order, estimator, ci, n_boot, units, markers, linestyles, dodge, join, scale, orient, color, palette) if ax is None: ax = plt.gca() plotter.plot(ax) return ax pointplot.__doc__ = dedent("""\ Show point estimates and confidence intervals using scatter plot glyphs. A point plot represents an estimate of central tendency for a numeric variable by the position of scatter plot points and provides some indication of the uncertainty around that estimate using error bars. Point plots can be more useful than bar plots for focusing comparisons between different levels of one or more categorical variables. They are particularly adept at showing interactions: how the relationship between levels of one categorical variable changes across levels of a second categorical variable. The lines that join each point from the same ``hue`` level allow interactions to be judged by differences in slope, which is easier for the eyes than comparing the heights of several groups of points or bars. It is important to keep in mind that a point plot shows only the mean (or other estimator) value, but in many cases it may be more informative to show the distribution of values at each level of the categorical variables. In that case, other approaches such as a box or violin plot may be more appropriate. {main_api_narrative} Parameters ---------- {input_params} {data} {order_vars} {stat_api_params} markers : string or list of strings, optional Markers to use for each of the ``hue`` levels. linestyles : string or list of strings, optional Line styles to use for each of the ``hue`` levels. dodge : bool or float, optional Amount to separate the points for each level of the ``hue`` variable along the categorical axis. join : bool, optional If ``True``, lines will be drawn between point estimates at the same ``hue`` level. scale : float, optional Scale factor for the plot elements. {orient} {color} {palette} {ax_in} Returns ------- {ax_out} See Also -------- {barplot} {factorplot} Examples -------- Draw a set of vertical point plots grouped by a categorical variable: .. plot:: :context: close-figs >>> import seaborn as sns >>> sns.set_style("darkgrid") >>> tips = sns.load_dataset("tips") >>> ax = sns.pointplot(x="time", y="total_bill", data=tips) Draw a set of vertical points with nested grouping by a two variables: .. plot:: :context: close-figs >>> ax = sns.pointplot(x="time", y="total_bill", hue="smoker", ... data=tips) Separate the points for different hue levels along the categorical axis: .. plot:: :context: close-figs >>> ax = sns.pointplot(x="time", y="total_bill", hue="smoker", ... data=tips, dodge=True) Use a different marker and line style for the hue levels: .. plot:: :context: close-figs >>> ax = sns.pointplot(x="time", y="total_bill", hue="smoker", ... data=tips, ... markers=["o", "x"], ... linestyles=["-", "--"]) Draw a set of horizontal points: .. plot:: :context: close-figs >>> ax = sns.pointplot(x="tip", y="day", data=tips) Don't draw a line connecting each point: .. plot:: :context: close-figs >>> ax = sns.pointplot(x="tip", y="day", data=tips, join=False) Use a different color for a single-layer plot: .. plot:: :context: close-figs >>> ax = sns.pointplot("time", y="total_bill", data=tips, ... color="#bb3f3f") Use a different color palette for the points: .. plot:: :context: close-figs >>> ax = sns.pointplot(x="time", y="total_bill", hue="smoker", ... data=tips, palette="Set2") Control point order by sorting the input data: .. plot:: :context: close-figs >>> ax = sns.pointplot(x="size", y="tip", data=tips.sort("size")) Control point order by passing an explicit order: .. plot:: :context: close-figs >>> ax = sns.pointplot(x="size", y="tip", data=tips, ... order=np.arange(1, 7), palette="Blues_d") Use median as the estimate of central tendency: .. plot:: :context: close-figs >>> from numpy import median >>> ax = sns.pointplot(x="day", y="tip", data=tips, estimator=median) Show the standard error of the mean with the error bars: .. plot:: :context: close-figs >>> ax = sns.pointplot(x="day", y="tip", data=tips, ci=68) """).format(**_categorical_docs) def countplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=.75, ax=None, **kwargs): estimator = len ci = None n_boot = 0 units = None errcolor = None if x is None and y is not None: orient = "h" x = y elif y is None and x is not None: orient = "v" y = x elif x is not None and y is not None: raise TypeError("Cannot pass values for both `x` and `y`") else: raise TypeError("Must pass valus for either `x` or `y`") plotter = _BarPlotter(x, y, hue, data, order, hue_order, estimator, ci, n_boot, units, orient, color, palette, saturation, errcolor) plotter.value_label = "count" if ax is None: ax = plt.gca() plotter.plot(ax, kwargs) return ax countplot.__doc__ = dedent("""\ Show the counts of observations in each categorical bin using bars. A count plot can be thought of as a histogram across a categorical, instead of quantitative, variable. The basic API and options are identical to those for :func:`barplot`, so you can compare counts across nested variables. {main_api_narrative} Parameters ---------- {input_params} {data} {order_vars} {orient} {color} {palette} {saturation} {ax_in} kwargs : key, value mappings Other keyword arguments are passed to ``plt.bar``. Returns ------- {ax_out} See Also -------- {barplot} {factorplot} Examples -------- Show value counts for a single categorical variable: .. plot:: :context: close-figs >>> import seaborn as sns >>> sns.set(style="darkgrid") >>> titanic = sns.load_dataset("titanic") >>> ax = sns.countplot(x="class", data=titanic) Show value counts for two categorical variables: .. plot:: :context: close-figs >>> ax = sns.countplot(x="class", hue="who", data=titanic) Plot the bars horizontally: .. plot:: :context: close-figs >>> ax = sns.countplot(y="class", hue="who", data=titanic) Use a different color palette: .. plot:: :context: close-figs >>> ax = sns.countplot(x="who", data=titanic, palette="Set3") Use ``plt.bar`` keyword arguments for a different look: .. plot:: :context: close-figs >>> ax = sns.countplot(x="who", data=titanic, ... facecolor=(0, 0, 0, 0), ... linewidth=5, ... edgecolor=sns.color_palette("dark", 3)) """).format(**_categorical_docs) def factorplot(x=None, y=None, hue=None, data=None, row=None, col=None, col_wrap=None, estimator=np.mean, ci=95, n_boot=1000, units=None, order=None, hue_order=None, row_order=None, col_order=None, kind="point", size=4, aspect=1, orient=None, color=None, palette=None, legend=True, legend_out=True, sharex=True, sharey=True, margin_titles=False, facet_kws=None, **kwargs): # Handle some deprecated arguments if "hline" in kwargs: kwargs.pop("hline") warnings.warn("The `hline` parameter has been removed", UserWarning) if "dropna" in kwargs: kwargs.pop("dropna") warnings.warn("The `dropna` parameter has been removed", UserWarning) if "x_order" in kwargs: order = kwargs.pop("x_order") warnings.warn("The `x_order` parameter has been renamed `order`", UserWarning) # Determine the plotting function try: plot_func = globals()[kind + "plot"] except KeyError: err = "Plot kind '{}' is not recognized".format(kind) raise ValueError(err) # Alias the input variables to determine categorical order and palette # correctly in the case of a count plot if kind == "count": if x is None and y is not None: x_, y_, orient = y, y, "h" elif y is None and x is not None: x_, y_, orient = x, x, "v" else: x_, y_ = x, y # Determine the order for the whole dataset, which will be used in all # facets to ensure representation of all data in the final plot p = _CategoricalPlotter() p.establish_variables(x_, y_, hue, data, orient, order, hue_order) order = p.group_names hue_order = p.hue_names # Determine the palette to use # (FacetGrid will pass a value for ``color`` to the plotting function # so we need to define ``palette`` to get default behavior for the # categorical functions p.establish_colors(color, palette, 1) if kind != "point" or hue is not None: palette = p.colors # Determine keyword arguments for the facets facet_kws = {} if facet_kws is None else facet_kws facet_kws.update( data=data, row=row, col=col, row_order=row_order, col_order=col_order, col_wrap=col_wrap, size=size, aspect=aspect, sharex=sharex, sharey=sharey, legend_out=legend_out, margin_titles=margin_titles, dropna=False, ) # Determine keyword arguments for the plotting function plot_kws = dict( order=order, hue_order=hue_order, orient=orient, color=color, palette=palette, ) plot_kws.update(kwargs) if kind in ["bar", "point"]: plot_kws.update( estimator=estimator, ci=ci, n_boot=n_boot, units=units, ) # Initialize the facets g = FacetGrid(**facet_kws) # Draw the plot onto the facets g.map_dataframe(plot_func, x, y, hue, **plot_kws) # Special case axis labels for a count type plot if kind == "count": if x is None: g.set_axis_labels(x_var="count") if y is None: g.set_axis_labels(y_var="count") if legend and (hue is not None) and (hue not in [x, row, col]): hue_order = list(map(str, hue_order)) g.add_legend(title=hue, label_order=hue_order) return g factorplot.__doc__ = dedent("""\ Draw a categorical plot onto a FacetGrid. The default plot that is shown is a point plot, but other seaborn categorical plots can be chosen with the ``kind`` parameter, including box plots, violin plots, bar plots, or strip plots. It is important to choose how variables get mapped to the plot structure such that the most important comparisons are easiest to make. As a general rule, it is easier to compare positions that are closer together, so the ``hue`` variable should be used for the most important comparisons. For secondary comparisons, try to share the quantitative axis (so, use ``col`` for vertical plots and ``row`` for horizontal plots). Note that, although it is possible to make rather complex plots using this function, in many cases you may be better served by created several smaller and more focused plots than by trying to stuff many comparisons into one figure. After plotting, the :class:`FacetGrid` with the plot is returned and can be used directly to tweak supporting plot details or add other layers. Note that, unlike when using the underlying plotting functions directly, data must be passed in a long-form DataFrame with variables specified by passing strings to ``x``, ``y``, ``hue``, and other parameters. As in the case with the underlying plot functions, if variables have a ``categorical`` data type, the correct orientation of the plot elements, the levels of the categorical variables, and their order will be inferred from the objects. Otherwise you may have to use the function parameters (``orient``, ``order``, ``hue_order``, etc.) to set up the plot correctly. Parameters ---------- {string_input_params} {long_form_data} row, col : names of variables in ``data``, optional Categorical variables that will determine the faceting of the grid. {col_wrap} {stat_api_params} {order_vars} row_order, col_order : lists of strings, optional Order to organize the rows and/or columns of the grid in, otherwise the orders are inferred from the data objects. kind : {{``point``, ``bar``, ``count``, ``box``, ``violin``, ``strip``}} The kind of plot to draw. {size} {aspect} {orient} {color} {palette} legend : bool, optional If ``True`` and there is a ``hue`` variable, draw a legend on the plot. {legend_out} {share_xy} {margin_titles} facet_kws : dict, optional Dictionary of other keyword arguments to pass to :class:`FacetGrid`. kwargs : key, value pairings Other keyword arguments are passed through to the underlying plotting function. Returns ------- g : :class:`FacetGrid` Returns the :class:`FacetGrid` object with the plot on it for further tweaking. Examples -------- Draw a single facet to use the :class:`FacetGrid` legend placement: .. plot:: :context: close-figs >>> import seaborn as sns >>> sns.set(style="ticks") >>> exercise = sns.load_dataset("exercise") >>> g = sns.factorplot(x="time", y="pulse", hue="kind", data=exercise) Use a different plot kind to visualize the same data: .. plot:: :context: close-figs >>> g = sns.factorplot(x="time", y="pulse", hue="kind", ... data=exercise, kind="violin") Facet along the columns to show a third categorical variable: .. plot:: :context: close-figs >>> g = sns.factorplot(x="time", y="pulse", hue="kind", ... col="diet", data=exercise) Use a different size and aspect ratio for the facets: .. plot:: :context: close-figs >>> g = sns.factorplot(x="time", y="pulse", hue="kind", ... col="diet", data=exercise, ... size=5, aspect=.8) Make many column facets and wrap them into the rows of the grid: .. plot:: :context: close-figs >>> titanic = sns.load_dataset("titanic") >>> g = sns.factorplot("alive", col="deck", col_wrap=4, ... data=titanic[titanic.deck.notnull()], ... kind="count", size=2.5, aspect=.8) Plot horizontally and pass other keyword arguments to the plot function: .. plot:: :context: close-figs >>> g = sns.factorplot(x="age", y="embark_town", ... hue="sex", row="class", ... data=titanic[titanic.embark_town.notnull()], ... orient="h", size=2, aspect=3.5, palette="Set3", ... kind="violin", split=True, cut=0, bw=.2) Use methods on the returned :class:`FacetGrid` to tweak the presentation: .. plot:: :context: close-figs >>> g = sns.factorplot(x="who", y="survived", col="class", ... data=titanic, saturation=.5, ... kind="bar", ci=None, aspect=.6) >>> (g.set_axis_labels("", "Survival Rate") ... .set_xticklabels(["Men", "Women", "Children"]) ... .set_titles("{{col_name}} {{col_var}}") ... .set(ylim=(0, 1)) ... .despine(left=True)) #doctest: +ELLIPSIS """).format(**_categorical_docs) seaborn-0.6.0/seaborn/crayons.py000066400000000000000000000103521254425553400166410ustar00rootroot00000000000000crayons = {'Almond': '#EFDECD', 'Antique Brass': '#CD9575', 'Apricot': '#FDD9B5', 'Aquamarine': '#78DBE2', 'Asparagus': '#87A96B', 'Atomic Tangerine': '#FFA474', 'Banana Mania': '#FAE7B5', 'Beaver': '#9F8170', 'Bittersweet': '#FD7C6E', 'Black': '#000000', 'Blue': '#1F75FE', 'Blue Bell': '#A2A2D0', 'Blue Green': '#0D98BA', 'Blue Violet': '#7366BD', 'Blush': '#DE5D83', 'Brick Red': '#CB4154', 'Brown': '#B4674D', 'Burnt Orange': '#FF7F49', 'Burnt Sienna': '#EA7E5D', 'Cadet Blue': '#B0B7C6', 'Canary': '#FFFF99', 'Caribbean Green': '#00CC99', 'Carnation Pink': '#FFAACC', 'Cerise': '#DD4492', 'Cerulean': '#1DACD6', 'Chestnut': '#BC5D58', 'Copper': '#DD9475', 'Cornflower': '#9ACEEB', 'Cotton Candy': '#FFBCD9', 'Dandelion': '#FDDB6D', 'Denim': '#2B6CC4', 'Desert Sand': '#EFCDB8', 'Eggplant': '#6E5160', 'Electric Lime': '#CEFF1D', 'Fern': '#71BC78', 'Forest Green': '#6DAE81', 'Fuchsia': '#C364C5', 'Fuzzy Wuzzy': '#CC6666', 'Gold': '#E7C697', 'Goldenrod': '#FCD975', 'Granny Smith Apple': '#A8E4A0', 'Gray': '#95918C', 'Green': '#1CAC78', 'Green Yellow': '#F0E891', 'Hot Magenta': '#FF1DCE', 'Inchworm': '#B2EC5D', 'Indigo': '#5D76CB', 'Jazzberry Jam': '#CA3767', 'Jungle Green': '#3BB08F', 'Laser Lemon': '#FEFE22', 'Lavender': '#FCB4D5', 'Macaroni and Cheese': '#FFBD88', 'Magenta': '#F664AF', 'Mahogany': '#CD4A4C', 'Manatee': '#979AAA', 'Mango Tango': '#FF8243', 'Maroon': '#C8385A', 'Mauvelous': '#EF98AA', 'Melon': '#FDBCB4', 'Midnight Blue': '#1A4876', 'Mountain Meadow': '#30BA8F', 'Navy Blue': '#1974D2', 'Neon Carrot': '#FFA343', 'Olive Green': '#BAB86C', 'Orange': '#FF7538', 'Orchid': '#E6A8D7', 'Outer Space': '#414A4C', 'Outrageous Orange': '#FF6E4A', 'Pacific Blue': '#1CA9C9', 'Peach': '#FFCFAB', 'Periwinkle': '#C5D0E6', 'Piggy Pink': '#FDDDE6', 'Pine Green': '#158078', 'Pink Flamingo': '#FC74FD', 'Pink Sherbert': '#F78FA7', 'Plum': '#8E4585', 'Purple Heart': '#7442C8', "Purple Mountains' Majesty": '#9D81BA', 'Purple Pizzazz': '#FE4EDA', 'Radical Red': '#FF496C', 'Raw Sienna': '#D68A59', 'Razzle Dazzle Rose': '#FF48D0', 'Razzmatazz': '#E3256B', 'Red': '#EE204D', 'Red Orange': '#FF5349', 'Red Violet': '#C0448F', "Robin's Egg Blue": '#1FCECB', 'Royal Purple': '#7851A9', 'Salmon': '#FF9BAA', 'Scarlet': '#FC2847', "Screamin' Green": '#76FF7A', 'Sea Green': '#93DFB8', 'Sepia': '#A5694F', 'Shadow': '#8A795D', 'Shamrock': '#45CEA2', 'Shocking Pink': '#FB7EFD', 'Silver': '#CDC5C2', 'Sky Blue': '#80DAEB', 'Spring Green': '#ECEABE', 'Sunglow': '#FFCF48', 'Sunset Orange': '#FD5E53', 'Tan': '#FAA76C', 'Tickle Me Pink': '#FC89AC', 'Timberwolf': '#DBD7D2', 'Tropical Rain Forest': '#17806D', 'Tumbleweed': '#DEAA88', 'Turquoise Blue': '#77DDE7', 'Unmellow Yellow': '#FFFF66', 'Violet (Purple)': '#926EAE', 'Violet Red': '#F75394', 'Vivid Tangerine': '#FFA089', 'Vivid Violet': '#8F509D', 'White': '#FFFFFF', 'Wild Blue Yonder': '#A2ADD0', 'Wild Strawberry': '#FF43A4', 'Wild Watermelon': '#FC6C85', 'Wisteria': '#CDA4DE', 'Yellow': '#FCE883', 'Yellow Green': '#C5E384', 'Yellow Orange': '#FFAE42'} seaborn-0.6.0/seaborn/distributions.py000066400000000000000000000672501254425553400200760ustar00rootroot00000000000000"""Plotting functions for visualizing distributions.""" from __future__ import division import numpy as np from scipy import stats import pandas as pd import matplotlib as mpl import matplotlib.pyplot as plt import warnings from six import string_types try: import statsmodels.nonparametric.api as smnp _has_statsmodels = True except ImportError: _has_statsmodels = False from .utils import set_hls_values, iqr, _kde_support from .palettes import color_palette, blend_palette from .axisgrid import JointGrid def _freedman_diaconis_bins(a): """Calculate number of hist bins using Freedman-Diaconis rule.""" # From http://stats.stackexchange.com/questions/798/ a = np.asarray(a) h = 2 * iqr(a) / (len(a) ** (1 / 3)) # fall back to sqrt(a) bins if iqr is 0 if h == 0: return np.sqrt(a.size) else: return np.ceil((a.max() - a.min()) / h) def distplot(a, bins=None, hist=True, kde=True, rug=False, fit=None, hist_kws=None, kde_kws=None, rug_kws=None, fit_kws=None, color=None, vertical=False, norm_hist=False, axlabel=None, label=None, ax=None): """Flexibly plot a univariate distribution of observations. This function combines the matplotlib ``hist`` function (with automatic calculation of a good default bin size) with the seaborn :func:`kdeplot` and :func:`rugplot` functions. It can also fit ``scipy.stats`` distributions and plot the estimated PDF over the data. Parameters ---------- a : Series, 1d-array, or list. Observed data. If this is a Series object with a ``name`` attribute, the name will be used to label the data axis. bins : argument for matplotlib hist(), or None, optional Specification of hist bins, or None to use Freedman-Diaconis rule. hist : bool, optional Whether to plot a (normed) histogram. kde : bool, optional Whether to plot a gaussian kernel density estimate. rug : bool, optional Whether to draw a rugplot on the support axis. fit : random variable object, optional An object with `fit` method, returning a tuple that can be passed to a `pdf` method a positional arguments following an grid of values to evaluate the pdf on. {hist, kde, rug, fit}_kws : dictionaries, optional Keyword arguments for underlying plotting functions. color : matplotlib color, optional Color to plot everything but the fitted curve in. vertical : bool, optional If True, oberved values are on y-axis. norm_hist : bool, otional If True, the histogram height shows a density rather than a count. This is implied if a KDE or fitted density is plotted. axlabel : string, False, or None, optional Name for the support axis label. If None, will try to get it from a.namel if False, do not set a label. label : string, optional Legend label for the relevent component of the plot ax : matplotlib axis, optional if provided, plot on this axis Returns ------- ax : matplotlib Axes Returns the Axes object with the plot for further tweaking. See Also -------- kdeplot : Show a univariate or bivariate distribution with a kernel density estimate. rugplot : Draw small vertical lines to show each observation in a distribution. Examples -------- Show a default plot with a kernel density estimate and histogram with bin size determined automatically with a reference rule: .. plot:: :context: close-figs >>> import seaborn as sns, numpy as np >>> sns.set(rc={"figure.figsize": (8, 4)}); np.random.seed(0) >>> x = np.random.randn(100) >>> ax = sns.distplot(x) Use Pandas objects to get an informative axis label: .. plot:: :context: close-figs >>> import pandas as pd >>> x = pd.Series(x, name="x variable") >>> ax = sns.distplot(x) Plot the distribution with a kenel density estimate and rug plot: .. plot:: :context: close-figs >>> ax = sns.distplot(x, rug=True, hist=False) Plot the distribution with a histogram and maximum likelihood gaussian distribution fit: .. plot:: :context: close-figs >>> from scipy.stats import norm >>> ax = sns.distplot(x, fit=norm, kde=False) Plot the distribution on the vertical axis: .. plot:: :context: close-figs >>> ax = sns.distplot(x, vertical=True) Change the color of all the plot elements: .. plot:: :context: close-figs >>> sns.set_color_codes() >>> ax = sns.distplot(x, color="y") Pass specific parameters to the underlying plot functions: .. plot:: :context: close-figs >>> ax = sns.distplot(x, rug=True, rug_kws={"color": "g"}, ... kde_kws={"color": "k", "lw": 3, "label": "KDE"}, ... hist_kws={"histtype": "step", "linewidth": 3, ... "alpha": 1, "color": "g"}) """ if ax is None: ax = plt.gca() # Intelligently label the support axis label_ax = bool(axlabel) if axlabel is None and hasattr(a, "name"): axlabel = a.name if axlabel is not None: label_ax = True # Make a a 1-d array a = np.asarray(a).squeeze() # Decide if the hist is normed norm_hist = norm_hist or kde or (fit is not None) # Handle dictionary defaults if hist_kws is None: hist_kws = dict() if kde_kws is None: kde_kws = dict() if rug_kws is None: rug_kws = dict() if fit_kws is None: fit_kws = dict() # Get the color from the current color cycle if color is None: if vertical: line, = ax.plot(0, a.mean()) else: line, = ax.plot(a.mean(), 0) color = line.get_color() line.remove() # Plug the label into the right kwarg dictionary if label is not None: if hist: hist_kws["label"] = label elif kde: kde_kws["label"] = label elif rug: rug_kws["label"] = label elif fit: fit_kws["label"] = label if hist: if bins is None: bins = min(_freedman_diaconis_bins(a), 50) hist_kws.setdefault("alpha", 0.4) hist_kws.setdefault("normed", norm_hist) orientation = "horizontal" if vertical else "vertical" hist_color = hist_kws.pop("color", color) ax.hist(a, bins, orientation=orientation, color=hist_color, **hist_kws) if hist_color != color: hist_kws["color"] = hist_color if kde: kde_color = kde_kws.pop("color", color) kdeplot(a, vertical=vertical, ax=ax, color=kde_color, **kde_kws) if kde_color != color: kde_kws["color"] = kde_color if rug: rug_color = rug_kws.pop("color", color) axis = "y" if vertical else "x" rugplot(a, axis=axis, ax=ax, color=rug_color, **rug_kws) if rug_color != color: rug_kws["color"] = rug_color if fit is not None: fit_color = fit_kws.pop("color", "#282828") gridsize = fit_kws.pop("gridsize", 200) cut = fit_kws.pop("cut", 3) clip = fit_kws.pop("clip", (-np.inf, np.inf)) bw = stats.gaussian_kde(a).scotts_factor() * a.std(ddof=1) x = _kde_support(a, bw, gridsize, cut, clip) params = fit.fit(a) pdf = lambda x: fit.pdf(x, *params) y = pdf(x) if vertical: x, y = y, x ax.plot(x, y, color=fit_color, **fit_kws) if fit_color != "#282828": fit_kws["color"] = fit_color if label_ax: if vertical: ax.set_ylabel(axlabel) else: ax.set_xlabel(axlabel) return ax def _univariate_kdeplot(data, shade, vertical, kernel, bw, gridsize, cut, clip, legend, ax, cumulative=False, **kwargs): """Plot a univariate kernel density estimate on one of the axes.""" # Sort out the clipping if clip is None: clip = (-np.inf, np.inf) # Calculate the KDE if _has_statsmodels: # Prefer using statsmodels for kernel flexibility x, y = _statsmodels_univariate_kde(data, kernel, bw, gridsize, cut, clip, cumulative=cumulative) else: # Fall back to scipy if missing statsmodels if kernel != "gau": kernel = "gau" msg = "Kernel other than `gau` requires statsmodels." warnings.warn(msg, UserWarning) if cumulative: raise ImportError("Cumulative distributions are currently" "only implemented in statsmodels." "Please install statsmodels.") x, y = _scipy_univariate_kde(data, bw, gridsize, cut, clip) # Make sure the density is nonnegative y = np.amax(np.c_[np.zeros_like(y), y], axis=1) # Flip the data if the plot should be on the y axis if vertical: x, y = y, x # Check if a label was specified in the call label = kwargs.pop("label", None) # Otherwise check if the data object has a name if label is None and hasattr(data, "name"): label = data.name # Decide if we're going to add a legend legend = label is not None and legend label = "_nolegend_" if label is None else label # Use the active color cycle to find the plot color line, = ax.plot(x, y, **kwargs) color = line.get_color() line.remove() kwargs.pop("color", None) # Draw the KDE plot and, optionally, shade ax.plot(x, y, color=color, label=label, **kwargs) alpha = kwargs.get("alpha", 0.25) if shade: if vertical: ax.fill_betweenx(y, 1e-12, x, color=color, alpha=alpha) else: ax.fill_between(x, 1e-12, y, color=color, alpha=alpha) # Draw the legend here if legend: ax.legend(loc="best") return ax def _statsmodels_univariate_kde(data, kernel, bw, gridsize, cut, clip, cumulative=False): """Compute a univariate kernel density estimate using statsmodels.""" fft = kernel == "gau" kde = smnp.KDEUnivariate(data) kde.fit(kernel, bw, fft, gridsize=gridsize, cut=cut, clip=clip) if cumulative: grid, y = kde.support, kde.cdf else: grid, y = kde.support, kde.density return grid, y def _scipy_univariate_kde(data, bw, gridsize, cut, clip): """Compute a univariate kernel density estimate using scipy.""" try: kde = stats.gaussian_kde(data, bw_method=bw) except TypeError: kde = stats.gaussian_kde(data) if bw != "scott": # scipy default msg = ("Ignoring bandwidth choice, " "please upgrade scipy to use a different bandwidth.") warnings.warn(msg, UserWarning) if isinstance(bw, string_types): bw = "scotts" if bw == "scott" else bw bw = getattr(kde, "%s_factor" % bw)() grid = _kde_support(data, bw, gridsize, cut, clip) y = kde(grid) return grid, y def _bivariate_kdeplot(x, y, filled, fill_lowest, kernel, bw, gridsize, cut, clip, axlabel, ax, **kwargs): """Plot a joint KDE estimate as a bivariate contour plot.""" # Determine the clipping if clip is None: clip = [(-np.inf, np.inf), (-np.inf, np.inf)] elif np.ndim(clip) == 1: clip = [clip, clip] # Calculate the KDE if _has_statsmodels: xx, yy, z = _statsmodels_bivariate_kde(x, y, bw, gridsize, cut, clip) else: xx, yy, z = _scipy_bivariate_kde(x, y, bw, gridsize, cut, clip) # Plot the contours n_levels = kwargs.pop("n_levels", 10) cmap = kwargs.get("cmap", "BuGn" if filled else "BuGn_d") if isinstance(cmap, string_types): if cmap.endswith("_d"): pal = ["#333333"] pal.extend(color_palette(cmap.replace("_d", "_r"), 2)) cmap = blend_palette(pal, as_cmap=True) else: cmap = mpl.cm.get_cmap(cmap) kwargs["cmap"] = cmap contour_func = ax.contourf if filled else ax.contour cset = contour_func(xx, yy, z, n_levels, **kwargs) if filled and not fill_lowest: cset.collections[0].set_alpha(0) kwargs["n_levels"] = n_levels # Label the axes if hasattr(x, "name") and axlabel: ax.set_xlabel(x.name) if hasattr(y, "name") and axlabel: ax.set_ylabel(y.name) return ax def _statsmodels_bivariate_kde(x, y, bw, gridsize, cut, clip): """Compute a bivariate kde using statsmodels.""" if isinstance(bw, string_types): bw_func = getattr(smnp.bandwidths, "bw_" + bw) x_bw = bw_func(x) y_bw = bw_func(y) bw = [x_bw, y_bw] elif np.isscalar(bw): bw = [bw, bw] if isinstance(x, pd.Series): x = x.values if isinstance(y, pd.Series): y = y.values kde = smnp.KDEMultivariate([x, y], "cc", bw) x_support = _kde_support(x, kde.bw[0], gridsize, cut, clip[0]) y_support = _kde_support(y, kde.bw[1], gridsize, cut, clip[1]) xx, yy = np.meshgrid(x_support, y_support) z = kde.pdf([xx.ravel(), yy.ravel()]).reshape(xx.shape) return xx, yy, z def _scipy_bivariate_kde(x, y, bw, gridsize, cut, clip): """Compute a bivariate kde using scipy.""" data = np.c_[x, y] kde = stats.gaussian_kde(data.T) data_std = data.std(axis=0, ddof=1) if isinstance(bw, string_types): bw = "scotts" if bw == "scott" else bw bw_x = getattr(kde, "%s_factor" % bw)() * data_std[0] bw_y = getattr(kde, "%s_factor" % bw)() * data_std[1] elif np.isscalar(bw): bw_x, bw_y = bw, bw else: msg = ("Cannot specify a different bandwidth for each dimension " "with the scipy backend. You should install statsmodels.") raise ValueError(msg) x_support = _kde_support(data[:, 0], bw_x, gridsize, cut, clip[0]) y_support = _kde_support(data[:, 1], bw_y, gridsize, cut, clip[1]) xx, yy = np.meshgrid(x_support, y_support) z = kde([xx.ravel(), yy.ravel()]).reshape(xx.shape) return xx, yy, z def kdeplot(data, data2=None, shade=False, vertical=False, kernel="gau", bw="scott", gridsize=100, cut=3, clip=None, legend=True, cumulative=False, shade_lowest=True, ax=None, **kwargs): """Fit and plot a univariate or bivariate kernel density estimate. Parameters ---------- data : 1d array-like Input data. data2: 1d array-like Second input data. If present, a bivariate KDE will be estimated. shade : bool, optional If True, shade in the area under the KDE curve (or draw with filled contours when data is bivariate). vertical : bool If True, density is on x-axis. kernel : {'gau' | 'cos' | 'biw' | 'epa' | 'tri' | 'triw' }, optional Code for shape of kernel to fit with. Bivariate KDE can only use gaussian kernel. bw : {'scott' | 'silverman' | scalar | pair of scalars }, optional Name of reference method to determine kernel size, scalar factor, or scalar for each dimension of the bivariate plot. gridsize : int, optional Number of discrete points in the evaluation grid. cut : scalar, optional Draw the estimate to cut * bw from the extreme data points. clip : pair of scalars, or pair of pair of scalars, optional Lower and upper bounds for datapoints used to fit KDE. Can provide a pair of (low, high) bounds for bivariate plots. legend : bool, optinal If True, add a legend or label the axes when possible. cumulative : bool If True, draw the cumulative distribution estimated by the kde. shade_lowest : bool If True, shade the lowest contour of a bivariate KDE plot. Not relevant when drawing a univariate plot or when ``shade=False``. Setting this to ``False`` can be useful when you want multiple densities on the same Axes. ax : matplotlib axis, optional Axis to plot on, otherwise uses current axis. kwargs : key, value pairings Other keyword arguments are passed to ``plt.plot()`` or ``plt.contour{f}`` depending on whether a univariate or bivariate plot is being drawn. Returns ------- ax : matplotlib Axes Axes with plot. See Also -------- distplot: Flexibly plot a univariate distribution of observations. jointplot: Plot a joint dataset with bivariate and marginal distributions. Examples -------- Plot a basic univariate density: .. plot:: :context: close-figs >>> import numpy as np; np.random.seed(10) >>> import seaborn as sns; sns.set(color_codes=True) >>> mean, cov = [0, 2], [(1, .5), (.5, 1)] >>> x, y = np.random.multivariate_normal(mean, cov, size=50).T >>> ax = sns.kdeplot(x) Shade under the density curve and use a different color: .. plot:: :context: close-figs >>> ax = sns.kdeplot(x, shade=True, color="r") Plot a bivariate density: .. plot:: :context: close-figs >>> ax = sns.kdeplot(x, y) Use filled contours: .. plot:: :context: close-figs >>> ax = sns.kdeplot(x, y, shade=True) Use more contour levels and a different color palette: .. plot:: :context: close-figs >>> ax = sns.kdeplot(x, y, n_levels=30, cmap="Purples_d") Use a narrower bandwith: .. plot:: :context: close-figs >>> ax = sns.kdeplot(x, bw=.15) Plot the density on the vertical axis: .. plot:: :context: close-figs >>> ax = sns.kdeplot(y, vertical=True) Limit the density curve within the range of the data: .. plot:: :context: close-figs >>> ax = sns.kdeplot(x, cut=0) Plot two shaded bivariate densities: .. plot:: :context: close-figs >>> iris = sns.load_dataset("iris") >>> setosa = iris.loc[iris.species == "setosa"] >>> virginica = iris.loc[iris.species == "virginica"] >>> ax = sns.kdeplot(setosa.sepal_width, setosa.sepal_length, ... cmap="Reds", shade=True, shade_lowest=False) >>> ax = sns.kdeplot(virginica.sepal_width, virginica.sepal_length, ... cmap="Blues", shade=True, shade_lowest=False) """ if ax is None: ax = plt.gca() data = data.astype(np.float64) if data2 is not None: data2 = data2.astype(np.float64) bivariate = False if isinstance(data, np.ndarray) and np.ndim(data) > 1: bivariate = True x, y = data.T elif isinstance(data, pd.DataFrame) and np.ndim(data) > 1: bivariate = True x = data.iloc[:, 0].values y = data.iloc[:, 1].values elif data2 is not None: bivariate = True x = data y = data2 if bivariate and cumulative: raise TypeError("Cumulative distribution plots are not" "supported for bivariate distributions.") if bivariate: ax = _bivariate_kdeplot(x, y, shade, shade_lowest, kernel, bw, gridsize, cut, clip, legend, ax, **kwargs) else: ax = _univariate_kdeplot(data, shade, vertical, kernel, bw, gridsize, cut, clip, legend, ax, cumulative=cumulative, **kwargs) return ax def rugplot(a, height=.05, axis="x", ax=None, **kwargs): """Plot datapoints in an array as sticks on an axis. Parameters ---------- a : vector 1D array of observations. height : scalar, optional Height of ticks as proportion of the axis. axis : {'x' | 'y'}, optional Axis to draw rugplot on. ax : matplotlib axes Axes to draw plot into; otherwise grabs current axes. kwargs : key, value mappings Other keyword arguments are passed to ``axvline`` or ``axhline``. Returns ------- ax : matplotlib axes The Axes object with the plot on it. """ if ax is None: ax = plt.gca() a = np.asarray(a) vertical = kwargs.pop("vertical", axis == "y") func = ax.axhline if vertical else ax.axvline kwargs.setdefault("linewidth", 1) for pt in a: func(pt, 0, height, **kwargs) return ax def jointplot(x, y, data=None, kind="scatter", stat_func=stats.pearsonr, color=None, size=6, ratio=5, space=.2, dropna=True, xlim=None, ylim=None, joint_kws=None, marginal_kws=None, annot_kws=None, **kwargs): """Draw a plot of two variables with bivariate and univariate graphs. This function provides a convenient interface to the :class:`JointGrid` class, with several canned plot kinds. This is intended to be a fairly lightweight wrapper; if you need more flexibility, you should use :class:`JointGrid` directly. Parameters ---------- x, y : strings or vectors Data or names of variables in ``data``. data : DataFrame, optional DataFrame when ``x`` and ``y`` are variable names. kind : { "scatter" | "reg" | "resid" | "kde" | "hex" }, optional Kind of plot to draw. stat_func : callable or None Function used to calculate a statistic about the relationship and annotate the plot. Should map `x` and `y` either to a single value or to a (value, p) tuple. Set to ``None`` if you don't want to annotate the plot. color : matplotlib color, optional Color used for the plot elements. size : numeric, optional Size of the figure (it will be square). ratio : numeric, optional Ratio of joint axes size to marginal axes height. space : numeric, optional Space between the joint and marginal axes dropna : bool, optional If True, remove observations that are missing from ``x`` and ``y``. {x, y}lim : two-tuples, optional Axis limits to set before plotting. {joint, marginal, annot}_kws : dicts Additional keyword arguments for the plot components. kwargs : key, value pairs Additional keyword arguments are passed to the function used to draw the plot on the joint Axes, superseding items in the ``joint_kws`` dictionary. Returns ------- grid : :class:`JointGrid` :class:`JointGrid` object with the plot on it. See Also -------- JointGrid : The Grid class used for drawing this plot. Use it directly if you need more flexibility. Examples -------- Draw a scatterplot with marginal histograms: .. plot:: :context: close-figs >>> import numpy as np, pandas as pd; np.random.seed(0) >>> import seaborn as sns; sns.set(style="white", color_codes=True) >>> tips = sns.load_dataset("tips") >>> g = sns.jointplot(x="total_bill", y="tip", data=tips) Add regression and kernel density fits: .. plot:: :context: close-figs >>> g = sns.jointplot("total_bill", "tip", data=tips, kind="reg") Replace the scatterplot with a joint histogram using hexagonal bins: .. plot:: :context: close-figs >>> g = sns.jointplot("total_bill", "tip", data=tips, kind="hex") Replace the scatterplots and histograms with density estimates and align the marginal Axes tightly with the joint Axes: .. plot:: :context: close-figs >>> iris = sns.load_dataset("iris") >>> g = sns.jointplot("sepal_width", "petal_length", data=iris, ... kind="kde", space=0, color="g") Use a different statistic for the annotation: .. plot:: :context: close-figs >>> from scipy.stats import spearmanr >>> g = sns.jointplot("size", "total_bill", data=tips, ... stat_func=spearmanr, color="m") Draw a scatterplot, then add a joint density estimate: .. plot:: :context: close-figs >>> g = (sns.jointplot("sepal_length", "sepal_width", ... data=iris, color="k") ... .plot_joint(sns.kdeplot, zorder=0, n_levels=6)) Pass vectors in directly without using Pandas, then name the axes: .. plot:: :context: close-figs >>> x, y = np.random.randn(2, 300) >>> g = (sns.jointplot(x, y, kind="hex", stat_func=None) ... .set_axis_labels("x", "y")) Draw a smaller figure with more space devoted to the marginal plots: .. plot:: :context: close-figs >>> g = sns.jointplot("total_bill", "tip", data=tips, ... size=5, ratio=3, color="g") Pass keyword arguments down to the underlying plots: .. plot:: :context: close-figs >>> g = sns.jointplot("petal_length", "sepal_length", data=iris, ... marginal_kws=dict(bins=15, rug=True), ... annot_kws=dict(stat="r"), ... s=40, edgecolor="w", linewidth=1) """ # Set up empty default kwarg dicts if joint_kws is None: joint_kws = {} joint_kws.update(kwargs) if marginal_kws is None: marginal_kws = {} if annot_kws is None: annot_kws = {} # Make a colormap based off the plot color if color is None: color = color_palette()[0] color_rgb = mpl.colors.colorConverter.to_rgb(color) colors = [set_hls_values(color_rgb, l=l) for l in np.linspace(1, 0, 12)] cmap = blend_palette(colors, as_cmap=True) # Initialize the JointGrid object grid = JointGrid(x, y, data, dropna=dropna, size=size, ratio=ratio, space=space, xlim=xlim, ylim=ylim) # Plot the data using the grid if kind == "scatter": joint_kws.setdefault("color", color) grid.plot_joint(plt.scatter, **joint_kws) marginal_kws.setdefault("kde", False) marginal_kws.setdefault("color", color) grid.plot_marginals(distplot, **marginal_kws) elif kind.startswith("hex"): x_bins = _freedman_diaconis_bins(grid.x) y_bins = _freedman_diaconis_bins(grid.y) gridsize = int(np.mean([x_bins, y_bins])) joint_kws.setdefault("gridsize", gridsize) joint_kws.setdefault("cmap", cmap) grid.plot_joint(plt.hexbin, **joint_kws) marginal_kws.setdefault("kde", False) marginal_kws.setdefault("color", color) grid.plot_marginals(distplot, **marginal_kws) elif kind.startswith("kde"): joint_kws.setdefault("shade", True) joint_kws.setdefault("cmap", cmap) grid.plot_joint(kdeplot, **joint_kws) marginal_kws.setdefault("shade", True) marginal_kws.setdefault("color", color) grid.plot_marginals(kdeplot, **marginal_kws) elif kind.startswith("reg"): from .linearmodels import regplot marginal_kws.setdefault("color", color) grid.plot_marginals(distplot, **marginal_kws) joint_kws.setdefault("color", color) grid.plot_joint(regplot, **joint_kws) elif kind.startswith("resid"): from .linearmodels import residplot joint_kws.setdefault("color", color) grid.plot_joint(residplot, **joint_kws) x, y = grid.ax_joint.collections[0].get_offsets().T marginal_kws.setdefault("color", color) marginal_kws.setdefault("kde", False) distplot(x, ax=grid.ax_marg_x, **marginal_kws) distplot(y, vertical=True, fit=stats.norm, ax=grid.ax_marg_y, **marginal_kws) stat_func = None else: msg = "kind must be either 'scatter', 'reg', 'resid', 'kde', or 'hex'" raise ValueError(msg) if stat_func is not None: grid.annotate(stat_func, **annot_kws) return grid seaborn-0.6.0/seaborn/external/000077500000000000000000000000001254425553400164325ustar00rootroot00000000000000seaborn-0.6.0/seaborn/external/__init__.py000066400000000000000000000000001254425553400205310ustar00rootroot00000000000000seaborn-0.6.0/seaborn/external/husl.py000066400000000000000000000150051254425553400177600ustar00rootroot00000000000000import operator import math __version__ = "2.1.0" m = [ [3.2406, -1.5372, -0.4986], [-0.9689, 1.8758, 0.0415], [0.0557, -0.2040, 1.0570] ] m_inv = [ [0.4124, 0.3576, 0.1805], [0.2126, 0.7152, 0.0722], [0.0193, 0.1192, 0.9505] ] # Hard-coded D65 illuminant refX = 0.95047 refY = 1.00000 refZ = 1.08883 refU = 0.19784 refV = 0.46834 lab_e = 0.008856 lab_k = 903.3 # Public API def husl_to_rgb(h, s, l): return lch_to_rgb(*husl_to_lch([h, s, l])) def husl_to_hex(h, s, l): return rgb_to_hex(husl_to_rgb(h, s, l)) def rgb_to_husl(r, g, b): return lch_to_husl(rgb_to_lch(r, g, b)) def hex_to_husl(hex): return rgb_to_husl(*hex_to_rgb(hex)) def huslp_to_rgb(h, s, l): return lch_to_rgb(*huslp_to_lch([h, s, l])) def huslp_to_hex(h, s, l): return rgb_to_hex(huslp_to_rgb(h, s, l)) def rgb_to_huslp(r, g, b): return lch_to_huslp(rgb_to_lch(r, g, b)) def hex_to_huslp(hex): return rgb_to_huslp(*hex_to_rgb(hex)) def lch_to_rgb(l, c, h): return xyz_to_rgb(luv_to_xyz(lch_to_luv([l, c, h]))) def rgb_to_lch(r, g, b): return luv_to_lch(xyz_to_luv(rgb_to_xyz([r, g, b]))) def max_chroma(L, H): hrad = math.radians(H) sinH = (math.sin(hrad)) cosH = (math.cos(hrad)) sub1 = (math.pow(L + 16, 3.0) / 1560896.0) sub2 = sub1 if sub1 > 0.008856 else (L / 903.3) result = float("inf") for row in m: m1 = row[0] m2 = row[1] m3 = row[2] top = ((0.99915 * m1 + 1.05122 * m2 + 1.14460 * m3) * sub2) rbottom = (0.86330 * m3 - 0.17266 * m2) lbottom = (0.12949 * m3 - 0.38848 * m1) bottom = (rbottom * sinH + lbottom * cosH) * sub2 for t in (0.0, 1.0): C = (L * (top - 1.05122 * t) / (bottom + 0.17266 * sinH * t)) if C > 0.0 and C < result: result = C return result def _hrad_extremum(L): lhs = (math.pow(L, 3.0) + 48.0 * math.pow(L, 2.0) + 768.0 * L + 4096.0) / 1560896.0 rhs = 1107.0 / 125000.0 sub = lhs if lhs > rhs else 10.0 * L / 9033.0 chroma = float("inf") result = None for row in m: for limit in (0.0, 1.0): [m1, m2, m3] = row top = -3015466475.0 * m3 * sub + 603093295.0 * m2 * sub - 603093295.0 * limit bottom = 1356959916.0 * m1 * sub - 452319972.0 * m3 * sub hrad = math.atan2(top, bottom) # This is a math hack to deal with tan quadrants, I'm too lazy to figure # out how to do this properly if limit == 0.0: hrad += math.pi test = max_chroma(L, math.degrees(hrad)) if test < chroma: chroma = test result = hrad return result def max_chroma_pastel(L): H = math.degrees(_hrad_extremum(L)) return max_chroma(L, H) def dot_product(a, b): return sum(map(operator.mul, a, b)) def f(t): if t > lab_e: return (math.pow(t, 1.0 / 3.0)) else: return (7.787 * t + 16.0 / 116.0) def f_inv(t): if math.pow(t, 3.0) > lab_e: return (math.pow(t, 3.0)) else: return (116.0 * t - 16.0) / lab_k def from_linear(c): if c <= 0.0031308: return 12.92 * c else: return (1.055 * math.pow(c, 1.0 / 2.4) - 0.055) def to_linear(c): a = 0.055 if c > 0.04045: return (math.pow((c + a) / (1.0 + a), 2.4)) else: return (c / 12.92) def rgb_prepare(triple): ret = [] for ch in triple: ch = round(ch, 3) if ch < -0.0001 or ch > 1.0001: raise Exception("Illegal RGB value %f" % ch) if ch < 0: ch = 0 if ch > 1: ch = 1 # Fix for Python 3 which by default rounds 4.5 down to 4.0 # instead of Python 2 which is rounded to 5.0 which caused # a couple off by one errors in the tests. Tests now all pass # in Python 2 and Python 3 ret.append(round(ch * 255 + 0.001, 0)) return ret def hex_to_rgb(hex): if hex.startswith('#'): hex = hex[1:] r = int(hex[0:2], 16) / 255.0 g = int(hex[2:4], 16) / 255.0 b = int(hex[4:6], 16) / 255.0 return [r, g, b] def rgb_to_hex(triple): [r, g, b] = triple return '#%02x%02x%02x' % tuple(rgb_prepare([r, g, b])) def xyz_to_rgb(triple): xyz = map(lambda row: dot_product(row, triple), m) return list(map(from_linear, xyz)) def rgb_to_xyz(triple): rgbl = list(map(to_linear, triple)) return list(map(lambda row: dot_product(row, rgbl), m_inv)) def xyz_to_luv(triple): X, Y, Z = triple if X == Y == Z == 0.0: return [0.0, 0.0, 0.0] varU = (4.0 * X) / (X + (15.0 * Y) + (3.0 * Z)) varV = (9.0 * Y) / (X + (15.0 * Y) + (3.0 * Z)) L = 116.0 * f(Y / refY) - 16.0 # Black will create a divide-by-zero error if L == 0.0: return [0.0, 0.0, 0.0] U = 13.0 * L * (varU - refU) V = 13.0 * L * (varV - refV) return [L, U, V] def luv_to_xyz(triple): L, U, V = triple if L == 0: return [0.0, 0.0, 0.0] varY = f_inv((L + 16.0) / 116.0) varU = U / (13.0 * L) + refU varV = V / (13.0 * L) + refV Y = varY * refY X = 0.0 - (9.0 * Y * varU) / ((varU - 4.0) * varV - varU * varV) Z = (9.0 * Y - (15.0 * varV * Y) - (varV * X)) / (3.0 * varV) return [X, Y, Z] def luv_to_lch(triple): L, U, V = triple C = (math.pow(math.pow(U, 2) + math.pow(V, 2), (1.0 / 2.0))) hrad = (math.atan2(V, U)) H = math.degrees(hrad) if H < 0.0: H = 360.0 + H return [L, C, H] def lch_to_luv(triple): L, C, H = triple Hrad = math.radians(H) U = (math.cos(Hrad) * C) V = (math.sin(Hrad) * C) return [L, U, V] def husl_to_lch(triple): H, S, L = triple if L > 99.9999999: return [100, 0.0, H] if L < 0.00000001: return [0.0, 0.0, H] mx = max_chroma(L, H) C = mx / 100.0 * S return [L, C, H] def lch_to_husl(triple): L, C, H = triple if L > 99.9999999: return [H, 0.0, 100.0] if L < 0.00000001: return [H, 0.0, 0.0] mx = max_chroma(L, H) S = C / mx * 100.0 return [H, S, L] def huslp_to_lch(triple): H, S, L = triple if L > 99.9999999: return [100, 0.0, H] if L < 0.00000001: return [0.0, 0.0, H] mx = max_chroma_pastel(L) C = mx / 100.0 * S return [L, C, H] def lch_to_huslp(triple): L, C, H = triple if L > 99.9999999: return [H, 0.0, 100.0] if L < 0.00000001: return [H, 0.0, 0.0] mx = max_chroma_pastel(L) S = C / mx * 100.0 return [H, S, L] seaborn-0.6.0/seaborn/external/six.py000066400000000000000000000545111254425553400176150ustar00rootroot00000000000000"""Utilities for writing code that runs on Python 2 and 3""" # Copyright (c) 2010-2014 Benjamin Peterson # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. import operator import sys import types __author__ = "Benjamin Peterson " __version__ = "1.5.2" # Useful for very coarse version differentiation. PY2 = sys.version_info[0] == 2 PY3 = sys.version_info[0] == 3 if PY3: string_types = str, integer_types = int, class_types = type, text_type = str binary_type = bytes MAXSIZE = sys.maxsize else: string_types = basestring, integer_types = (int, long) class_types = (type, types.ClassType) text_type = unicode binary_type = str if sys.platform.startswith("java"): # Jython always uses 32 bits. MAXSIZE = int((1 << 31) - 1) else: # It's possible to have sizeof(long) != sizeof(Py_ssize_t). class X(object): def __len__(self): return 1 << 31 try: len(X()) except OverflowError: # 32-bit MAXSIZE = int((1 << 31) - 1) else: # 64-bit MAXSIZE = int((1 << 63) - 1) del X def _add_doc(func, doc): """Add documentation to a function.""" func.__doc__ = doc def _import_module(name): """Import module, returning the module after the last dot.""" __import__(name) return sys.modules[name] class _LazyDescr(object): def __init__(self, name): self.name = name def __get__(self, obj, tp): result = self._resolve() setattr(obj, self.name, result) # Invokes __set__. # This is a bit ugly, but it avoids running this again. delattr(obj.__class__, self.name) return result class MovedModule(_LazyDescr): def __init__(self, name, old, new=None): super(MovedModule, self).__init__(name) if PY3: if new is None: new = name self.mod = new else: self.mod = old def _resolve(self): return _import_module(self.mod) def __getattr__(self, attr): # Hack around the Django autoreloader. The reloader tries to get # __file__ or __name__ of every module in sys.modules. This doesn't work # well if this MovedModule is for an module that is unavailable on this # machine (like winreg on Unix systems). Thus, we pretend __file__ and # __name__ don't exist if the module hasn't been loaded yet. See issues # #51 and #53. if attr in ("__file__", "__name__") and self.mod not in sys.modules: raise AttributeError _module = self._resolve() value = getattr(_module, attr) setattr(self, attr, value) return value class _LazyModule(types.ModuleType): def __init__(self, name): super(_LazyModule, self).__init__(name) self.__doc__ = self.__class__.__doc__ def __dir__(self): attrs = ["__doc__", "__name__"] attrs += [attr.name for attr in self._moved_attributes] return attrs # Subclasses should override this _moved_attributes = [] class MovedAttribute(_LazyDescr): def __init__(self, name, old_mod, new_mod, old_attr=None, new_attr=None): super(MovedAttribute, self).__init__(name) if PY3: if new_mod is None: new_mod = name self.mod = new_mod if new_attr is None: if old_attr is None: new_attr = name else: new_attr = old_attr self.attr = new_attr else: self.mod = old_mod if old_attr is None: old_attr = name self.attr = old_attr def _resolve(self): module = _import_module(self.mod) return getattr(module, self.attr) class _MovedItems(_LazyModule): """Lazy loading of moved objects""" _moved_attributes = [ MovedAttribute("cStringIO", "cStringIO", "io", "StringIO"), MovedAttribute("filter", "itertools", "builtins", "ifilter", "filter"), MovedAttribute("filterfalse", "itertools", "itertools", "ifilterfalse", "filterfalse"), MovedAttribute("input", "__builtin__", "builtins", "raw_input", "input"), MovedAttribute("map", "itertools", "builtins", "imap", "map"), MovedAttribute("range", "__builtin__", "builtins", "xrange", "range"), MovedAttribute("reload_module", "__builtin__", "imp", "reload"), MovedAttribute("reduce", "__builtin__", "functools"), MovedAttribute("StringIO", "StringIO", "io"), MovedAttribute("UserString", "UserString", "collections"), MovedAttribute("xrange", "__builtin__", "builtins", "xrange", "range"), MovedAttribute("zip", "itertools", "builtins", "izip", "zip"), MovedAttribute("zip_longest", "itertools", "itertools", "izip_longest", "zip_longest"), MovedModule("builtins", "__builtin__"), MovedModule("configparser", "ConfigParser"), MovedModule("copyreg", "copy_reg"), MovedModule("dbm_gnu", "gdbm", "dbm.gnu"), MovedModule("http_cookiejar", "cookielib", "http.cookiejar"), MovedModule("http_cookies", "Cookie", "http.cookies"), MovedModule("html_entities", "htmlentitydefs", "html.entities"), MovedModule("html_parser", "HTMLParser", "html.parser"), MovedModule("http_client", "httplib", "http.client"), MovedModule("email_mime_multipart", "email.MIMEMultipart", "email.mime.multipart"), MovedModule("email_mime_text", "email.MIMEText", "email.mime.text"), MovedModule("email_mime_base", "email.MIMEBase", "email.mime.base"), MovedModule("BaseHTTPServer", "BaseHTTPServer", "http.server"), MovedModule("CGIHTTPServer", "CGIHTTPServer", "http.server"), MovedModule("SimpleHTTPServer", "SimpleHTTPServer", "http.server"), MovedModule("cPickle", "cPickle", "pickle"), MovedModule("queue", "Queue"), MovedModule("reprlib", "repr"), MovedModule("socketserver", "SocketServer"), MovedModule("_thread", "thread", "_thread"), MovedModule("tkinter", "Tkinter"), MovedModule("tkinter_dialog", "Dialog", "tkinter.dialog"), MovedModule("tkinter_filedialog", "FileDialog", "tkinter.filedialog"), MovedModule("tkinter_scrolledtext", "ScrolledText", "tkinter.scrolledtext"), MovedModule("tkinter_simpledialog", "SimpleDialog", "tkinter.simpledialog"), MovedModule("tkinter_tix", "Tix", "tkinter.tix"), MovedModule("tkinter_ttk", "ttk", "tkinter.ttk"), MovedModule("tkinter_constants", "Tkconstants", "tkinter.constants"), MovedModule("tkinter_dnd", "Tkdnd", "tkinter.dnd"), MovedModule("tkinter_colorchooser", "tkColorChooser", "tkinter.colorchooser"), MovedModule("tkinter_commondialog", "tkCommonDialog", "tkinter.commondialog"), MovedModule("tkinter_tkfiledialog", "tkFileDialog", "tkinter.filedialog"), MovedModule("tkinter_font", "tkFont", "tkinter.font"), MovedModule("tkinter_messagebox", "tkMessageBox", "tkinter.messagebox"), MovedModule("tkinter_tksimpledialog", "tkSimpleDialog", "tkinter.simpledialog"), MovedModule("urllib_parse", __name__ + ".moves.urllib_parse", "urllib.parse"), MovedModule("urllib_error", __name__ + ".moves.urllib_error", "urllib.error"), MovedModule("urllib", __name__ + ".moves.urllib", __name__ + ".moves.urllib"), MovedModule("urllib_robotparser", "robotparser", "urllib.robotparser"), MovedModule("xmlrpc_client", "xmlrpclib", "xmlrpc.client"), MovedModule("winreg", "_winreg"), ] for attr in _moved_attributes: setattr(_MovedItems, attr.name, attr) if isinstance(attr, MovedModule): sys.modules[__name__ + ".moves." + attr.name] = attr del attr _MovedItems._moved_attributes = _moved_attributes moves = sys.modules[__name__ + ".moves"] = _MovedItems(__name__ + ".moves") class Module_six_moves_urllib_parse(_LazyModule): """Lazy loading of moved objects in six.moves.urllib_parse""" _urllib_parse_moved_attributes = [ MovedAttribute("ParseResult", "urlparse", "urllib.parse"), MovedAttribute("parse_qs", "urlparse", "urllib.parse"), MovedAttribute("parse_qsl", "urlparse", "urllib.parse"), MovedAttribute("urldefrag", "urlparse", "urllib.parse"), MovedAttribute("urljoin", "urlparse", "urllib.parse"), MovedAttribute("urlparse", "urlparse", "urllib.parse"), MovedAttribute("urlsplit", "urlparse", "urllib.parse"), MovedAttribute("urlunparse", "urlparse", "urllib.parse"), MovedAttribute("urlunsplit", "urlparse", "urllib.parse"), MovedAttribute("quote", "urllib", "urllib.parse"), MovedAttribute("quote_plus", "urllib", "urllib.parse"), MovedAttribute("unquote", "urllib", "urllib.parse"), MovedAttribute("unquote_plus", "urllib", "urllib.parse"), MovedAttribute("urlencode", "urllib", "urllib.parse"), ] for attr in _urllib_parse_moved_attributes: setattr(Module_six_moves_urllib_parse, attr.name, attr) del attr Module_six_moves_urllib_parse._moved_attributes = _urllib_parse_moved_attributes sys.modules[__name__ + ".moves.urllib_parse"] = sys.modules[__name__ + ".moves.urllib.parse"] = Module_six_moves_urllib_parse(__name__ + ".moves.urllib_parse") class Module_six_moves_urllib_error(_LazyModule): """Lazy loading of moved objects in six.moves.urllib_error""" _urllib_error_moved_attributes = [ MovedAttribute("URLError", "urllib2", "urllib.error"), MovedAttribute("HTTPError", "urllib2", "urllib.error"), MovedAttribute("ContentTooShortError", "urllib", "urllib.error"), ] for attr in _urllib_error_moved_attributes: setattr(Module_six_moves_urllib_error, attr.name, attr) del attr Module_six_moves_urllib_error._moved_attributes = _urllib_error_moved_attributes sys.modules[__name__ + ".moves.urllib_error"] = sys.modules[__name__ + ".moves.urllib.error"] = Module_six_moves_urllib_error(__name__ + ".moves.urllib.error") class Module_six_moves_urllib_request(_LazyModule): """Lazy loading of moved objects in six.moves.urllib_request""" _urllib_request_moved_attributes = [ MovedAttribute("urlopen", "urllib2", "urllib.request"), MovedAttribute("install_opener", "urllib2", "urllib.request"), MovedAttribute("build_opener", "urllib2", "urllib.request"), MovedAttribute("pathname2url", "urllib", "urllib.request"), MovedAttribute("url2pathname", "urllib", "urllib.request"), MovedAttribute("getproxies", "urllib", "urllib.request"), MovedAttribute("Request", "urllib2", "urllib.request"), MovedAttribute("OpenerDirector", "urllib2", "urllib.request"), MovedAttribute("HTTPDefaultErrorHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPRedirectHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPCookieProcessor", "urllib2", "urllib.request"), MovedAttribute("ProxyHandler", "urllib2", "urllib.request"), MovedAttribute("BaseHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPPasswordMgr", "urllib2", "urllib.request"), MovedAttribute("HTTPPasswordMgrWithDefaultRealm", "urllib2", "urllib.request"), MovedAttribute("AbstractBasicAuthHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPBasicAuthHandler", "urllib2", "urllib.request"), MovedAttribute("ProxyBasicAuthHandler", "urllib2", "urllib.request"), MovedAttribute("AbstractDigestAuthHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPDigestAuthHandler", "urllib2", "urllib.request"), MovedAttribute("ProxyDigestAuthHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPSHandler", "urllib2", "urllib.request"), MovedAttribute("FileHandler", "urllib2", "urllib.request"), MovedAttribute("FTPHandler", "urllib2", "urllib.request"), MovedAttribute("CacheFTPHandler", "urllib2", "urllib.request"), MovedAttribute("UnknownHandler", "urllib2", "urllib.request"), MovedAttribute("HTTPErrorProcessor", "urllib2", "urllib.request"), MovedAttribute("urlretrieve", "urllib", "urllib.request"), MovedAttribute("urlcleanup", "urllib", "urllib.request"), MovedAttribute("URLopener", "urllib", "urllib.request"), MovedAttribute("FancyURLopener", "urllib", "urllib.request"), MovedAttribute("proxy_bypass", "urllib", "urllib.request"), ] for attr in _urllib_request_moved_attributes: setattr(Module_six_moves_urllib_request, attr.name, attr) del attr Module_six_moves_urllib_request._moved_attributes = _urllib_request_moved_attributes sys.modules[__name__ + ".moves.urllib_request"] = sys.modules[__name__ + ".moves.urllib.request"] = Module_six_moves_urllib_request(__name__ + ".moves.urllib.request") class Module_six_moves_urllib_response(_LazyModule): """Lazy loading of moved objects in six.moves.urllib_response""" _urllib_response_moved_attributes = [ MovedAttribute("addbase", "urllib", "urllib.response"), MovedAttribute("addclosehook", "urllib", "urllib.response"), MovedAttribute("addinfo", "urllib", "urllib.response"), MovedAttribute("addinfourl", "urllib", "urllib.response"), ] for attr in _urllib_response_moved_attributes: setattr(Module_six_moves_urllib_response, attr.name, attr) del attr Module_six_moves_urllib_response._moved_attributes = _urllib_response_moved_attributes sys.modules[__name__ + ".moves.urllib_response"] = sys.modules[__name__ + ".moves.urllib.response"] = Module_six_moves_urllib_response(__name__ + ".moves.urllib.response") class Module_six_moves_urllib_robotparser(_LazyModule): """Lazy loading of moved objects in six.moves.urllib_robotparser""" _urllib_robotparser_moved_attributes = [ MovedAttribute("RobotFileParser", "robotparser", "urllib.robotparser"), ] for attr in _urllib_robotparser_moved_attributes: setattr(Module_six_moves_urllib_robotparser, attr.name, attr) del attr Module_six_moves_urllib_robotparser._moved_attributes = _urllib_robotparser_moved_attributes sys.modules[__name__ + ".moves.urllib_robotparser"] = sys.modules[__name__ + ".moves.urllib.robotparser"] = Module_six_moves_urllib_robotparser(__name__ + ".moves.urllib.robotparser") class Module_six_moves_urllib(types.ModuleType): """Create a six.moves.urllib namespace that resembles the Python 3 namespace""" parse = sys.modules[__name__ + ".moves.urllib_parse"] error = sys.modules[__name__ + ".moves.urllib_error"] request = sys.modules[__name__ + ".moves.urllib_request"] response = sys.modules[__name__ + ".moves.urllib_response"] robotparser = sys.modules[__name__ + ".moves.urllib_robotparser"] def __dir__(self): return ['parse', 'error', 'request', 'response', 'robotparser'] sys.modules[__name__ + ".moves.urllib"] = Module_six_moves_urllib(__name__ + ".moves.urllib") def add_move(move): """Add an item to six.moves.""" setattr(_MovedItems, move.name, move) def remove_move(name): """Remove item from six.moves.""" try: delattr(_MovedItems, name) except AttributeError: try: del moves.__dict__[name] except KeyError: raise AttributeError("no such move, %r" % (name,)) if PY3: _meth_func = "__func__" _meth_self = "__self__" _func_closure = "__closure__" _func_code = "__code__" _func_defaults = "__defaults__" _func_globals = "__globals__" _iterkeys = "keys" _itervalues = "values" _iteritems = "items" _iterlists = "lists" else: _meth_func = "im_func" _meth_self = "im_self" _func_closure = "func_closure" _func_code = "func_code" _func_defaults = "func_defaults" _func_globals = "func_globals" _iterkeys = "iterkeys" _itervalues = "itervalues" _iteritems = "iteritems" _iterlists = "iterlists" try: advance_iterator = next except NameError: def advance_iterator(it): return it.next() next = advance_iterator try: callable = callable except NameError: def callable(obj): return any("__call__" in klass.__dict__ for klass in type(obj).__mro__) if PY3: def get_unbound_function(unbound): return unbound create_bound_method = types.MethodType Iterator = object else: def get_unbound_function(unbound): return unbound.im_func def create_bound_method(func, obj): return types.MethodType(func, obj, obj.__class__) class Iterator(object): def next(self): return type(self).__next__(self) callable = callable _add_doc(get_unbound_function, """Get the function out of a possibly unbound function""") get_method_function = operator.attrgetter(_meth_func) get_method_self = operator.attrgetter(_meth_self) get_function_closure = operator.attrgetter(_func_closure) get_function_code = operator.attrgetter(_func_code) get_function_defaults = operator.attrgetter(_func_defaults) get_function_globals = operator.attrgetter(_func_globals) def iterkeys(d, **kw): """Return an iterator over the keys of a dictionary.""" return iter(getattr(d, _iterkeys)(**kw)) def itervalues(d, **kw): """Return an iterator over the values of a dictionary.""" return iter(getattr(d, _itervalues)(**kw)) def iteritems(d, **kw): """Return an iterator over the (key, value) pairs of a dictionary.""" return iter(getattr(d, _iteritems)(**kw)) def iterlists(d, **kw): """Return an iterator over the (key, [values]) pairs of a dictionary.""" return iter(getattr(d, _iterlists)(**kw)) if PY3: def b(s): return s.encode("latin-1") def u(s): return s unichr = chr if sys.version_info[1] <= 1: def int2byte(i): return bytes((i,)) else: # This is about 2x faster than the implementation above on 3.2+ int2byte = operator.methodcaller("to_bytes", 1, "big") byte2int = operator.itemgetter(0) indexbytes = operator.getitem iterbytes = iter import io StringIO = io.StringIO BytesIO = io.BytesIO else: def b(s): return s # Workaround for standalone backslash def u(s): return unicode(s.replace(r'\\', r'\\\\'), "unicode_escape") unichr = unichr int2byte = chr def byte2int(bs): return ord(bs[0]) def indexbytes(buf, i): return ord(buf[i]) def iterbytes(buf): return (ord(byte) for byte in buf) import StringIO StringIO = BytesIO = StringIO.StringIO _add_doc(b, """Byte literal""") _add_doc(u, """Text literal""") if PY3: exec_ = getattr(moves.builtins, "exec") def reraise(tp, value, tb=None): if value.__traceback__ is not tb: raise value.with_traceback(tb) raise value else: def exec_(_code_, _globs_=None, _locs_=None): """Execute code in a namespace.""" if _globs_ is None: frame = sys._getframe(1) _globs_ = frame.f_globals if _locs_ is None: _locs_ = frame.f_locals del frame elif _locs_ is None: _locs_ = _globs_ exec("""exec _code_ in _globs_, _locs_""") exec_("""def reraise(tp, value, tb=None): raise tp, value, tb """) print_ = getattr(moves.builtins, "print", None) if print_ is None: def print_(*args, **kwargs): """The new-style print function for Python 2.4 and 2.5.""" fp = kwargs.pop("file", sys.stdout) if fp is None: return def write(data): if not isinstance(data, basestring): data = str(data) # If the file has an encoding, encode unicode with it. if (isinstance(fp, file) and isinstance(data, unicode) and fp.encoding is not None): errors = getattr(fp, "errors", None) if errors is None: errors = "strict" data = data.encode(fp.encoding, errors) fp.write(data) want_unicode = False sep = kwargs.pop("sep", None) if sep is not None: if isinstance(sep, unicode): want_unicode = True elif not isinstance(sep, str): raise TypeError("sep must be None or a string") end = kwargs.pop("end", None) if end is not None: if isinstance(end, unicode): want_unicode = True elif not isinstance(end, str): raise TypeError("end must be None or a string") if kwargs: raise TypeError("invalid keyword arguments to print()") if not want_unicode: for arg in args: if isinstance(arg, unicode): want_unicode = True break if want_unicode: newline = unicode("\n") space = unicode(" ") else: newline = "\n" space = " " if sep is None: sep = space if end is None: end = newline for i, arg in enumerate(args): if i: write(sep) write(arg) write(end) _add_doc(reraise, """Reraise an exception.""") def with_metaclass(meta, *bases): """Create a base class with a metaclass.""" return meta("NewBase", bases, {}) def add_metaclass(metaclass): """Class decorator for creating a class with a metaclass.""" def wrapper(cls): orig_vars = cls.__dict__.copy() orig_vars.pop('__dict__', None) orig_vars.pop('__weakref__', None) slots = orig_vars.get('__slots__') if slots is not None: if isinstance(slots, str): slots = [slots] for slots_var in slots: orig_vars.pop(slots_var) return metaclass(cls.__name__, cls.__bases__, orig_vars) return wrapper seaborn-0.6.0/seaborn/linearmodels.py000066400000000000000000001600711254425553400176450ustar00rootroot00000000000000"""Plotting functions for linear models (broadly construed).""" from __future__ import division import copy import itertools from textwrap import dedent import numpy as np import pandas as pd from scipy.spatial import distance import matplotlib as mpl import matplotlib.pyplot as plt import warnings try: import statsmodels assert statsmodels _has_statsmodels = True except ImportError: _has_statsmodels = False from .external.six import string_types from .external.six.moves import range from . import utils from . import algorithms as algo from .palettes import color_palette from .axisgrid import FacetGrid, PairGrid, _facet_docs from .distributions import kdeplot class _LinearPlotter(object): """Base class for plotting relational data in tidy format. To get anything useful done you'll have to inherit from this, but setup code that can be abstracted out should be put here. """ def establish_variables(self, data, **kws): """Extract variables from data or use directly.""" self.data = data # Validate the inputs any_strings = any([isinstance(v, string_types) for v in kws.values()]) if any_strings and data is None: raise ValueError("Must pass `data` if using named variables.") # Set the variables for var, val in kws.items(): if isinstance(val, string_types): setattr(self, var, data[val]) else: setattr(self, var, val) def dropna(self, *vars): """Remove observations with missing data.""" vals = [getattr(self, var) for var in vars] vals = [v for v in vals if v is not None] not_na = np.all(np.column_stack([pd.notnull(v) for v in vals]), axis=1) for var in vars: val = getattr(self, var) if val is not None: setattr(self, var, val[not_na]) def plot(self, ax): raise NotImplementedError class _RegressionPlotter(_LinearPlotter): """Plotter for numeric independent variables with regression model. This does the computations and drawing for the `regplot` function, and is thus also used indirectly by `lmplot`. """ def __init__(self, x, y, data=None, x_estimator=None, x_bins=None, x_ci="ci", scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, order=1, logistic=False, lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=False, dropna=True, x_jitter=None, y_jitter=None, color=None, label=None): # Set member attributes self.x_estimator = x_estimator self.ci = ci self.x_ci = ci if x_ci == "ci" else x_ci self.n_boot = n_boot self.scatter = scatter self.fit_reg = fit_reg self.order = order self.logistic = logistic self.lowess = lowess self.robust = robust self.logx = logx self.truncate = truncate self.x_jitter = x_jitter self.y_jitter = y_jitter self.color = color self.label = label # Validate the regression options: if sum((order > 1, logistic, robust, lowess, logx)) > 1: raise ValueError("Mutually exclusive regression options.") # Extract the data vals from the arguments or passed dataframe self.establish_variables(data, x=x, y=y, units=units, x_partial=x_partial, y_partial=y_partial) # Drop null observations if dropna: self.dropna("x", "y", "units", "x_partial", "y_partial") # Regress nuisance variables out of the data if self.x_partial is not None: self.x = self.regress_out(self.x, self.x_partial) if self.y_partial is not None: self.y = self.regress_out(self.y, self.y_partial) # Possibly bin the predictor variable, which implies a point estimate if x_bins is not None: self.x_estimator = np.mean if x_estimator is None else x_estimator x_discrete, x_bins = self.bin_predictor(x_bins) self.x_discrete = x_discrete else: self.x_discrete = self.x # Save the range of the x variable for the grid later self.x_range = self.x.min(), self.x.max() @property def scatter_data(self): """Data where each observation is a point.""" x_j = self.x_jitter if x_j is None: x = self.x else: x = self.x + np.random.uniform(-x_j, x_j, len(self.x)) y_j = self.y_jitter if y_j is None: y = self.y else: y = self.y + np.random.uniform(-y_j, y_j, len(self.y)) return x, y @property def estimate_data(self): """Data with a point estimate and CI for each discrete x value.""" x, y = self.x_discrete, self.y vals = sorted(np.unique(x)) points, cis = [], [] for val in vals: # Get the point estimate of the y variable _y = y[x == val] est = self.x_estimator(_y) points.append(est) # Compute the confidence interval for this estimate if self.x_ci is None: cis.append(None) else: units = None if self.units is not None: units = self.units[x == val] boots = algo.bootstrap(_y, func=self.x_estimator, n_boot=self.n_boot, units=units) _ci = utils.ci(boots, self.x_ci) cis.append(_ci) return vals, points, cis def fit_regression(self, ax=None, x_range=None, grid=None): """Fit the regression model.""" # Create the grid for the regression if grid is None: if self.truncate: x_min, x_max = self.x_range else: if ax is None: x_min, x_max = x_range else: x_min, x_max = ax.get_xlim() grid = np.linspace(x_min, x_max, 100) ci = self.ci # Fit the regression if self.order > 1: yhat, yhat_boots = self.fit_poly(grid, self.order) elif self.logistic: from statsmodels.genmod.generalized_linear_model import GLM from statsmodels.genmod.families import Binomial yhat, yhat_boots = self.fit_statsmodels(grid, GLM, family=Binomial()) elif self.lowess: ci = None grid, yhat = self.fit_lowess() elif self.robust: from statsmodels.robust.robust_linear_model import RLM yhat, yhat_boots = self.fit_statsmodels(grid, RLM) elif self.logx: yhat, yhat_boots = self.fit_logx(grid) else: yhat, yhat_boots = self.fit_fast(grid) # Compute the confidence interval at each grid point if ci is None: err_bands = None else: err_bands = utils.ci(yhat_boots, ci, axis=0) return grid, yhat, err_bands def fit_fast(self, grid): """Low-level regression and prediction using linear algebra.""" X, y = np.c_[np.ones(len(self.x)), self.x], self.y grid = np.c_[np.ones(len(grid)), grid] reg_func = lambda _x, _y: np.linalg.pinv(_x).dot(_y) yhat = grid.dot(reg_func(X, y)) if self.ci is None: return yhat, None beta_boots = algo.bootstrap(X, y, func=reg_func, n_boot=self.n_boot, units=self.units).T yhat_boots = grid.dot(beta_boots).T return yhat, yhat_boots def fit_poly(self, grid, order): """Regression using numpy polyfit for higher-order trends.""" x, y = self.x, self.y reg_func = lambda _x, _y: np.polyval(np.polyfit(_x, _y, order), grid) yhat = reg_func(x, y) if self.ci is None: return yhat, None yhat_boots = algo.bootstrap(x, y, func=reg_func, n_boot=self.n_boot, units=self.units) return yhat, yhat_boots def fit_statsmodels(self, grid, model, **kwargs): """More general regression function using statsmodels objects.""" X, y = np.c_[np.ones(len(self.x)), self.x], self.y grid = np.c_[np.ones(len(grid)), grid] reg_func = lambda _x, _y: model(_y, _x, **kwargs).fit().predict(grid) yhat = reg_func(X, y) if self.ci is None: return yhat, None yhat_boots = algo.bootstrap(X, y, func=reg_func, n_boot=self.n_boot, units=self.units) return yhat, yhat_boots def fit_lowess(self): """Fit a locally-weighted regression, which returns its own grid.""" from statsmodels.nonparametric.smoothers_lowess import lowess grid, yhat = lowess(self.y, self.x).T return grid, yhat def fit_logx(self, grid): """Fit the model in log-space.""" X, y = np.c_[np.ones(len(self.x)), self.x], self.y grid = np.c_[np.ones(len(grid)), np.log(grid)] def reg_func(_x, _y): _x = np.c_[_x[:, 0], np.log(_x[:, 1])] return np.linalg.pinv(_x).dot(_y) yhat = grid.dot(reg_func(X, y)) if self.ci is None: return yhat, None beta_boots = algo.bootstrap(X, y, func=reg_func, n_boot=self.n_boot, units=self.units).T yhat_boots = grid.dot(beta_boots).T return yhat, yhat_boots def bin_predictor(self, bins): """Discretize a predictor by assigning value to closest bin.""" x = self.x if np.isscalar(bins): percentiles = np.linspace(0, 100, bins + 2)[1:-1] bins = np.c_[utils.percentiles(x, percentiles)] else: bins = np.c_[np.ravel(bins)] dist = distance.cdist(np.c_[x], bins) x_binned = bins[np.argmin(dist, axis=1)].ravel() return x_binned, bins.ravel() def regress_out(self, a, b): """Regress b from a keeping a's original mean.""" a_mean = a.mean() a = a - a_mean b = b - b.mean() b = np.c_[b] a_prime = a - b.dot(np.linalg.pinv(b).dot(a)) return (a_prime + a_mean).reshape(a.shape) def plot(self, ax, scatter_kws, line_kws): """Draw the full plot.""" # Insert the plot label into the correct set of keyword arguments if self.scatter: scatter_kws["label"] = self.label else: line_kws["label"] = self.label # Use the current color cycle state as a default if self.color is None: lines, = plt.plot(self.x.mean(), self.y.mean()) color = lines.get_color() lines.remove() else: color = self.color # Let color in keyword arguments override overall plot color scatter_kws.setdefault("color", color) line_kws.setdefault("color", color) # Draw the constituent plots if self.scatter: self.scatterplot(ax, scatter_kws) if self.fit_reg: self.lineplot(ax, line_kws) # Label the axes if hasattr(self.x, "name"): ax.set_xlabel(self.x.name) if hasattr(self.y, "name"): ax.set_ylabel(self.y.name) def scatterplot(self, ax, kws): """Draw the data.""" # Treat the line-based markers specially, explicitly setting larger # linewidth than is provided by the seaborn style defaults. # This would ideally be handled better in matplotlib (i.e., distinguish # between edgewidth for solid glyphs and linewidth for line glyphs # but this should do for now. line_markers = ["1", "2", "3", "4", "+", "x", "|", "_"] if self.x_estimator is None: if "marker" in kws and kws["marker"] in line_markers: lw = mpl.rcParams["lines.linewidth"] else: lw = mpl.rcParams["lines.markeredgewidth"] kws.setdefault("linewidths", lw) if not hasattr(kws['color'], 'shape') or kws['color'].shape[1] < 4: kws.setdefault("alpha", .8) x, y = self.scatter_data ax.scatter(x, y, **kws) else: # TODO abstraction ci_kws = {"color": kws["color"]} ci_kws["linewidth"] = mpl.rcParams["lines.linewidth"] * 1.75 kws.setdefault("s", 50) xs, ys, cis = self.estimate_data if [ci for ci in cis if ci is not None]: for x, ci in zip(xs, cis): ax.plot([x, x], ci, **ci_kws) ax.scatter(xs, ys, **kws) def lineplot(self, ax, kws): """Draw the model.""" xlim = ax.get_xlim() # Fit the regression model grid, yhat, err_bands = self.fit_regression(ax) # Get set default aesthetics fill_color = kws["color"] lw = kws.pop("lw", mpl.rcParams["lines.linewidth"] * 1.5) kws.setdefault("linewidth", lw) # Draw the regression line and confidence interval ax.plot(grid, yhat, **kws) if err_bands is not None: ax.fill_between(grid, *err_bands, color=fill_color, alpha=.15) ax.set_xlim(*xlim) _regression_docs = dict( model_api=dedent("""\ There are a number of mutually exclusive options for estimating the regression model: ``order``, ``logistic``, ``lowess``, ``robust``, and ``logx``. See the parameter docs for more information on these options.\ """), regplot_vs_lmplot=dedent("""\ Understanding the difference between :func:`regplot` and :func:`lmplot` can be a bit tricky. In fact, they are closely related, as :func:`lmplot` uses :func:`regplot` internally and takes most of its parameters. However, :func:`regplot` is an axes-level function, so it draws directly onto an axes (either the currently active axes or the one provided by the ``ax`` parameter), while :func:`lmplot` is a figure-level function and creates its own figure, which is managed through a :class:`FacetGrid`. This has a few consequences, namely that :func:`regplot` can happily coexist in a figure with other kinds of plots and will follow the global matplotlib color cycle. In contrast, :func:`lmplot` needs to occupy an entire figure, and the size and color cycle are controlled through function parameters, ignoring the global defaults.\ """), x_estimator=dedent("""\ x_estimator : callable that maps vector -> scalar, optional Apply this function to each unique value of ``x`` and plot the resulting estimate. This is useful when ``x`` is a discrete variable. If ``x_ci`` is not ``None``, this estimate will be bootstrapped and a confidence interval will be drawn.\ """), x_bins=dedent("""\ x_bins : int or vector, optional Bin the ``x`` variable into discrete bins and then estimate the central tendency and a confidence interval. This binning only influences how the scatterplot is drawn; the regression is still fit to the original data. This parameter is interpreted either as the number of evenly-sized (not necessary spaced) bins or the positions of the bin centers. When this parameter is used, it implies that the default of ``x_estimator`` is ``numpy.mean``.\ """), x_ci=dedent("""\ x_ci : "ci", int in [0, 100] or None, optional Size of the confidence interval used when plotting a central tendency for discrete values of ``x``. If "ci", defer to the value of the``ci`` parameter.\ """), scatter=dedent("""\ scatter : bool, optional If ``True``, draw a scatterplot with the underlying observations (or the ``x_estimator`` values).\ """), fit_reg=dedent("""\ fit_reg : bool, optional If ``True``, estimate and plot a regression model relating the ``x`` and ``y`` variables.\ """), ci=dedent("""\ ci : int in [0, 100] or None, optional Size of the confidence interval for the regression estimate. This will be drawn using translucent bands around the regression line. The confidence interval is estimated using a bootstrap; for large datasets, it may be advisable to avoid that computation by setting this parameter to None.\ """), n_boot=dedent("""\ n_boot : int, optional Number of bootstrap resamples used to estimate the ``ci``. The default value attempts to balance time and stability; you may want to increase this value for "final" versions of plots.\ """), units=dedent("""\ units : variable name in ``data``, optional If the ``x`` and ``y`` observations are nested within sampling units, those can be specified here. This will be taken into account when computing the confidence intervals by performing a multilevel bootstrap that resamples both units and observations (within unit). This does not otherwise influence how the regression is estimated or drawn.\ """), order=dedent("""\ order : int, optional If ``order`` is greater than 1, use ``numpy.polyfit`` to estimate a polynomial regression.\ """), logistic=dedent("""\ logistic : bool, optional If ``True``, assume that ``y`` is a binary variable and use ``statsmodels`` to estimate a logistic regression model. Note that this is substantially more computationally intensive than linear regression, so you may wish to decrease the number of bootstrap resamples (``n_boot``) or set ``ci`` to None.\ """), lowess=dedent("""\ lowess : bool, optional If ``True``, use ``statsmodels`` to estimate a nonparametric lowess model (locally weighted linear regression). Note that confidence intervals cannot currently be drawn for this kind of model.\ """), robust=dedent("""\ robust : bool, optional If ``True``, use ``statsmodels`` to estimate a robust regression. This will de-weight outliers. Note that this is substantially more computationally intensive than standard linear regression, so you may wish to decrease the number of bootstrap resamples (``n_boot``) or set ``ci`` to None.\ """), logx=dedent("""\ logx : bool, optional If ``True``, estimate a linear regression of the form y ~ log(x), but plot the scatterplot and regression model in the input space. Note that ``x`` must be positive for this to work.\ """), xy_partial=dedent("""\ {x,y}_partial : strings in ``data`` or matrices Confounding variables to regress out of the ``x`` or ``y`` variables before plotting.\ """), truncate=dedent("""\ truncate : bool, optional By default, the regression line is drawn to fill the x axis limits after the scatterplot is drawn. If ``truncate`` is ``True``, it will instead by bounded by the data limits.\ """), xy_jitter=dedent("""\ {x,y}_jitter : floats, optional Add uniform random noise of this size to either the ``x`` or ``y`` variables. The noise is added to a copy of the data after fitting the regression, and only influences the look of the scatterplot. This can be helpful when plotting variables that take discrete values.\ """), scatter_line_kws=dedent("""\ {scatter,line}_kws : dictionaries Additional keyword arguments to pass to ``plt.scatter`` and ``plt.plot``.\ """), ) _regression_docs.update(_facet_docs) def lmplot(x, y, data, hue=None, col=None, row=None, palette=None, col_wrap=None, size=5, aspect=1, markers="o", sharex=True, sharey=True, hue_order=None, col_order=None, row_order=None, legend=True, legend_out=True, x_estimator=None, x_bins=None, x_ci="ci", scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, order=1, logistic=False, lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=False, x_jitter=None, y_jitter=None, scatter_kws=None, line_kws=None): # Reduce the dataframe to only needed columns need_cols = [x, y, hue, col, row, units, x_partial, y_partial] cols = np.unique([a for a in need_cols if a is not None]).tolist() data = data[cols] # Initialize the grid facets = FacetGrid(data, row, col, hue, palette=palette, row_order=row_order, col_order=col_order, hue_order=hue_order, size=size, aspect=aspect, col_wrap=col_wrap, sharex=sharex, sharey=sharey, legend_out=legend_out) # Add the markers here as FacetGrid has figured out how many levels of the # hue variable are needed and we don't want to duplicate that process if facets.hue_names is None: n_markers = 1 else: n_markers = len(facets.hue_names) if not isinstance(markers, list): markers = [markers] * n_markers if len(markers) != n_markers: raise ValueError(("markers must be a singeton or a list of markers " "for each level of the hue variable")) facets.hue_kws = {"marker": markers} # Hack to set the x limits properly, which needs to happen here # because the extent of the regression estimate is determined # by the limits of the plot if sharex: for ax in facets.axes.flat: scatter = ax.scatter(data[x], np.ones(len(data)) * data[y].mean()) scatter.remove() # Draw the regression plot on each facet regplot_kws = dict( x_estimator=x_estimator, x_bins=x_bins, x_ci=x_ci, scatter=scatter, fit_reg=fit_reg, ci=ci, n_boot=n_boot, units=units, order=order, logistic=logistic, lowess=lowess, robust=robust, logx=logx, x_partial=x_partial, y_partial=y_partial, truncate=truncate, x_jitter=x_jitter, y_jitter=y_jitter, scatter_kws=scatter_kws, line_kws=line_kws, ) facets.map_dataframe(regplot, x, y, **regplot_kws) # Add a legend if legend and (hue is not None) and (hue not in [col, row]): facets.add_legend() return facets lmplot.__doc__ = dedent("""\ Plot data and regression model fits across a FacetGrid. This function combines :func:`regplot` and :class:`FacetGrid`. It is intended as a convenient interface to fit regression models across conditional subsets of a dataset. When thinking about how to assign variables to different facets, a general rule is that it makes sense to use ``hue`` for the most important comparison, followed by ``col`` and ``row``. However, always think about your particular dataset and the goals of the visualization you are creating. {model_api} The parameters to this function span most of the options in :class:`FacetGrid`, although there may be occasional cases where you will want to use that class and :func:`regplot` directly. Parameters ---------- x, y : strings, optional Input variables; these should be column names in ``data``. {data} hue, col, row : strings Variables that define subsets of the data, which will be drawn on separate facets in the grid. See the ``*_order`` parameters to control the order of levels of this variable. {palette} {col_wrap} {size} {aspect} markers : matplotlib marker code or list of marker codes, optional Markers for the scatterplot. If a list, each marker in the list will be used for each level of the ``hue`` variable. {share_xy} {{hue,col,row}}_order : lists, optional Order for the levels of the faceting variables. By default, this will be the order that the levels appear in ``data`` or, if the variables are pandas categoricals, the category order. legend : bool, optional If ``True`` and there is a ``hue`` variable, add a legend. {legend_out} {x_estimator} {x_bins} {x_ci} {scatter} {fit_reg} {ci} {n_boot} {units} {order} {logistic} {lowess} {robust} {logx} {xy_partial} {truncate} {xy_jitter} {scatter_line_kws} See Also -------- regplot : Plot data and a conditional model fit. FacetGrid : Subplot grid for plotting conditional relationships. pairplot : Combine :func:`regplot` and :class:`PairGrid` (when used with ``kind="reg"``). Notes ----- {regplot_vs_lmplot} Examples -------- These examples focus on basic regression model plots to exhibit the various faceting options; see the :func:`regplot` docs for demonstrations of the other options for plotting the data and models. There are also other examples for how to manipulate plot using the returned object on the :class:`FacetGrid` docs. Plot a simple linear relationship between two variables: .. plot:: :context: close-figs >>> import seaborn as sns; sns.set(color_codes=True) >>> tips = sns.load_dataset("tips") >>> g = sns.lmplot(x="total_bill", y="tip", data=tips) Condition on a third variable and plot the levels in different colors: .. plot:: :context: close-figs >>> g = sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips) Use different markers as well as colors so the plot will reproduce to black-and-white more easily: .. plot:: :context: close-figs >>> g = sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips, ... markers=["o", "x"]) Use a different color palette: .. plot:: :context: close-figs >>> g = sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips, ... palette="Set1") Map ``hue`` levels to colors with a dictionary: .. plot:: :context: close-figs >>> g = sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips, ... palette=dict(Yes="g", No="m")) Plot the levels of the third variable across different columns: .. plot:: :context: close-figs >>> g = sns.lmplot(x="total_bill", y="tip", col="smoker", data=tips) Change the size and aspect ratio of the facets: .. plot:: :context: close-figs >>> g = sns.lmplot(x="size", y="total_bill", hue="day", col="day", ... data=tips, aspect=.4, x_jitter=.1) Wrap the levels of the column variable into multiple rows: .. plot:: :context: close-figs >>> g = sns.lmplot(x="total_bill", y="tip", col="day", hue="day", ... data=tips, col_wrap=2, size=3) Condition on two variables to make a full grid: .. plot:: :context: close-figs >>> g = sns.lmplot(x="total_bill", y="tip", row="sex", col="time", ... data=tips, size=3) Use methods on the returned :class:`FacetGrid` instance to further tweak the plot: .. plot:: :context: close-figs >>> g = sns.lmplot(x="total_bill", y="tip", row="sex", col="time", ... data=tips, size=3) >>> g = (g.set_axis_labels("Total bill (US Dollars)", "Tip") ... .set(xlim=(0, 60), ylim=(0, 12), ... xticks=[10, 30, 50], yticks=[2, 6, 10]) ... .fig.subplots_adjust(wspace=.02)) """).format(**_regression_docs) def regplot(x, y, data=None, x_estimator=None, x_bins=None, x_ci="ci", scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, order=1, logistic=False, lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=False, dropna=True, x_jitter=None, y_jitter=None, label=None, color=None, marker="o", scatter_kws=None, line_kws=None, ax=None): plotter = _RegressionPlotter(x, y, data, x_estimator, x_bins, x_ci, scatter, fit_reg, ci, n_boot, units, order, logistic, lowess, robust, logx, x_partial, y_partial, truncate, dropna, x_jitter, y_jitter, color, label) if ax is None: ax = plt.gca() scatter_kws = {} if scatter_kws is None else copy.copy(scatter_kws) scatter_kws["marker"] = marker line_kws = {} if line_kws is None else copy.copy(line_kws) plotter.plot(ax, scatter_kws, line_kws) return ax regplot.__doc__ = dedent("""\ Plot data and a linear regression model fit. {model_api} Parameters ---------- x, y: string, series, or vector array Input variables. If strings, these should correspond with column names in ``data``. When pandas objects are used, axes will be labeled with the series name. {data} {x_estimator} {x_bins} {x_ci} {scatter} {fit_reg} {ci} {n_boot} {units} {order} {logistic} {lowess} {robust} {logx} {xy_partial} {truncate} {xy_jitter} label : string Label to apply to ether the scatterplot or regression line (if ``scatter`` is ``False``) for use in a legend. color : matplotlib color Color to apply to all plot elements; will be superseded by colors passed in ``scatter_kws`` or ``line_kws``. marker : matplotlib marker code Marker to use for the scatterplot glyphs. {scatter_line_kws} ax : matplotlib Axes, optional Axes object to draw the plot onto, otherwise uses the current Axes. Returns ------- ax : matplotlib Axes The Axes object containing the plot. See Also -------- lmplot : Combine :func:`regplot` and :class:`FacetGrid` to plot multiple linear relationships in a dataset. jointplot : Combine :func:`regplot` and :class:`JointGrid` (when used with ``kind="reg"``). pairplot : Combine :func:`regplot` and :class:`PairGrid` (when used with ``kind="reg"``). residplot : Plot the residuals of a linear regression model. interactplot : Plot a two-way interaction between continuous variables Notes ----- {regplot_vs_lmplot} It's also easy to combine combine :func:`regplot` and :class:`JointGrid` or :class:`PairGrid` through the :func:`jointplot` and :func:`pairplot` functions, although these do not directly accept all of :func:`regplot`'s parameters. Examples -------- Plot the relationship between two variables in a DataFrame: .. plot:: :context: close-figs >>> import seaborn as sns; sns.set(color_codes=True) >>> tips = sns.load_dataset("tips") >>> ax = sns.regplot(x="total_bill", y="tip", data=tips) Plot with two variables defined as numpy arrays; use a different color: .. plot:: :context: close-figs >>> import numpy as np; np.random.seed(8) >>> mean, cov = [4, 6], [(1.5, .7), (.7, 1)] >>> x, y = np.random.multivariate_normal(mean, cov, 80).T >>> ax = sns.regplot(x=x, y=y, color="g") Plot with two variables defined as pandas Series; use a different marker: .. plot:: :context: close-figs >>> import pandas as pd >>> x, y = pd.Series(x, name="x_var"), pd.Series(y, name="y_var") >>> ax = sns.regplot(x=x, y=y, marker="+") Use a 68% confidence interval, which corresponds with the standard error of the estimate: .. plot:: :context: close-figs >>> ax = sns.regplot(x=x, y=y, ci=68) Plot with a discrete ``x`` variable and add some jitter: .. plot:: :context: close-figs >>> ax = sns.regplot(x="size", y="total_bill", data=tips, x_jitter=.1) Plot with a discrete ``x`` variable showing means and confidence intervals for unique values: .. plot:: :context: close-figs >>> ax = sns.regplot(x="size", y="total_bill", data=tips, ... x_estimator=np.mean) Plot with a continuous variable divided into discrete bins: .. plot:: :context: close-figs >>> ax = sns.regplot(x=x, y=y, x_bins=4) Fit a higher-order polynomial regression and truncate the model prediction: .. plot:: :context: close-figs >>> ans = sns.load_dataset("anscombe") >>> ax = sns.regplot(x="x", y="y", data=ans.loc[ans.dataset == "II"], ... scatter_kws={{"s": 80}}, ... order=2, ci=None, truncate=True) Fit a robust regression and don't plot a confidence interval: .. plot:: :context: close-figs >>> ax = sns.regplot(x="x", y="y", data=ans.loc[ans.dataset == "III"], ... scatter_kws={{"s": 80}}, ... robust=True, ci=None) Fit a logistic regression; jitter the y variable and use fewer bootstrap iterations: .. plot:: :context: close-figs >>> tips["big_tip"] = (tips.tip / tips.total_bill) > .175 >>> ax = sns.regplot(x="total_bill", y="big_tip", data=tips, ... logistic=True, n_boot=500, y_jitter=.03) Fit the regression model using log(x) and truncate the model prediction: .. plot:: :context: close-figs >>> ax = sns.regplot(x="size", y="total_bill", data=tips, ... x_estimator=np.mean, logx=True, truncate=True) """).format(**_regression_docs) def residplot(x, y, data=None, lowess=False, x_partial=None, y_partial=None, order=1, robust=False, dropna=True, label=None, color=None, scatter_kws=None, line_kws=None, ax=None): """Plot the residuals of a linear regression. This function will regress y on x (possibly as a robust or polynomial regression) and then draw a scatterplot of the residuals. You can optionally fit a lowess smoother to the residual plot, which can help in determining if there is structure to the residuals. Parameters ---------- x : vector or string Data or column name in `data` for the predictor variable. y : vector or string Data or column name in `data` for the response variable. data : DataFrame, optional DataFrame to use if `x` and `y` are column names. lowess : boolean, optional Fit a lowess smoother to the residual scatterplot. {x, y}_partial : matrix or string(s) , optional Matrix with same first dimension as `x`, or column name(s) in `data`. These variables are treated as confounding and are removed from the `x` or `y` variables before plotting. order : int, optional Order of the polynomial to fit when calculating the residuals. robust : boolean, optional Fit a robust linear regression when calculating the residuals. dropna : boolean, optional If True, ignore observations with missing data when fitting and plotting. label : string, optional Label that will be used in any plot legends. color : matplotlib color, optional Color to use for all elements of the plot. {scatter, line}_kws : dictionaries, optional Additional keyword arguments passed to scatter() and plot() for drawing the components of the plot. ax : matplotlib axis, optional Plot into this axis, otherwise grab the current axis or make a new one if not existing. Returns ------- ax: matplotlib axes Axes with the regression plot. See Also -------- regplot : Plot a simple linear regression model. jointplot (with kind="resid"): Draw a residplot with univariate marginal distrbutions. """ plotter = _RegressionPlotter(x, y, data, ci=None, order=order, robust=robust, x_partial=x_partial, y_partial=y_partial, dropna=dropna, color=color, label=label) if ax is None: ax = plt.gca() # Calculate the residual from a linear regression _, yhat, _ = plotter.fit_regression(grid=plotter.x) plotter.y = plotter.y - yhat # Set the regression option on the plotter if lowess: plotter.lowess = True else: plotter.fit_reg = False # Plot a horizontal line at 0 ax.axhline(0, ls=":", c=".2") # Draw the scatterplot scatter_kws = {} if scatter_kws is None else scatter_kws line_kws = {} if line_kws is None else line_kws plotter.plot(ax, scatter_kws, line_kws) return ax def coefplot(formula, data, groupby=None, intercept=False, ci=95, palette="husl"): """Plot the coefficients from a linear model. Parameters ---------- formula : string patsy formula for ols model data : dataframe data for the plot; formula terms must appear in columns groupby : grouping object, optional object to group data with to fit conditional models intercept : bool, optional if False, strips the intercept term before plotting ci : float, optional size of confidence intervals palette : seaborn color palette, optional palette for the horizonal plots """ if not _has_statsmodels: raise ImportError("The `coefplot` function requires statsmodels") import statsmodels.formula.api as sf alpha = 1 - ci / 100 if groupby is None: coefs = sf.ols(formula, data).fit().params cis = sf.ols(formula, data).fit().conf_int(alpha) else: grouped = data.groupby(groupby) coefs = grouped.apply(lambda d: sf.ols(formula, d).fit().params).T cis = grouped.apply(lambda d: sf.ols(formula, d).fit().conf_int(alpha)) # Possibly ignore the intercept if not intercept: coefs = coefs.ix[1:] n_terms = len(coefs) # Plot seperately depending on groupby w, h = mpl.rcParams["figure.figsize"] hsize = lambda n: n * (h / 2) wsize = lambda n: n * (w / (4 * (n / 5))) if groupby is None: colors = itertools.cycle(color_palette(palette, n_terms)) f, ax = plt.subplots(1, 1, figsize=(wsize(n_terms), hsize(1))) for i, term in enumerate(coefs.index): color = next(colors) low, high = cis.ix[term] ax.plot([i, i], [low, high], c=color, solid_capstyle="round", lw=2.5) ax.plot(i, coefs.ix[term], "o", c=color, ms=8) ax.set_xlim(-.5, n_terms - .5) ax.axhline(0, ls="--", c="dimgray") ax.set_xticks(range(n_terms)) ax.set_xticklabels(coefs.index) else: n_groups = len(coefs.columns) f, axes = plt.subplots(n_terms, 1, sharex=True, figsize=(wsize(n_groups), hsize(n_terms))) if n_terms == 1: axes = [axes] colors = itertools.cycle(color_palette(palette, n_groups)) for ax, term in zip(axes, coefs.index): for i, group in enumerate(coefs.columns): color = next(colors) low, high = cis.ix[(group, term)] ax.plot([i, i], [low, high], c=color, solid_capstyle="round", lw=2.5) ax.plot(i, coefs.loc[term, group], "o", c=color, ms=8) ax.set_xlim(-.5, n_groups - .5) ax.axhline(0, ls="--", c="dimgray") ax.set_title(term) ax.set_xlabel(groupby) ax.set_xticks(range(n_groups)) ax.set_xticklabels(coefs.columns) def interactplot(x1, x2, y, data=None, filled=False, cmap="RdBu_r", colorbar=True, levels=30, logistic=False, contour_kws=None, scatter_kws=None, ax=None, **kwargs): """Visualize a continuous two-way interaction with a contour plot. Parameters ---------- x1, x2, y, strings or array-like Either the two independent variables and the dependent variable, or keys to extract them from `data` data : DataFrame Pandas DataFrame with the data in the columns. filled : bool Whether to plot with filled or unfilled contours cmap : matplotlib colormap Colormap to represent yhat in the countour plot. colorbar : bool Whether to draw the colorbar for interpreting the color values. levels : int or sequence Number or position of contour plot levels. logistic : bool Fit a logistic regression model instead of linear regression. contour_kws : dictionary Keyword arguments for contour[f](). scatter_kws : dictionary Keyword arguments for plot(). ax : matplotlib axis Axis to draw plot in. Returns ------- ax : Matplotlib axis Axis with the contour plot. """ if not _has_statsmodels: raise ImportError("The `interactplot` function requires statsmodels") from statsmodels.regression.linear_model import OLS from statsmodels.genmod.generalized_linear_model import GLM from statsmodels.genmod.families import Binomial # Handle the form of the data if data is not None: x1 = data[x1] x2 = data[x2] y = data[y] if hasattr(x1, "name"): xlabel = x1.name else: xlabel = None if hasattr(x2, "name"): ylabel = x2.name else: ylabel = None if hasattr(y, "name"): clabel = y.name else: clabel = None x1 = np.asarray(x1) x2 = np.asarray(x2) y = np.asarray(y) # Initialize the scatter keyword dictionary if scatter_kws is None: scatter_kws = {} if not ("color" in scatter_kws or "c" in scatter_kws): scatter_kws["color"] = "#222222" if "alpha" not in scatter_kws: scatter_kws["alpha"] = 0.75 # Intialize the contour keyword dictionary if contour_kws is None: contour_kws = {} # Initialize the axis if ax is None: ax = plt.gca() # Plot once to let matplotlib sort out the axis limits ax.plot(x1, x2, "o", **scatter_kws) # Find the plot limits x1min, x1max = ax.get_xlim() x2min, x2max = ax.get_ylim() # Make the grid for the contour plot x1_points = np.linspace(x1min, x1max, 100) x2_points = np.linspace(x2min, x2max, 100) xx1, xx2 = np.meshgrid(x1_points, x2_points) # Fit the model with an interaction X = np.c_[np.ones(x1.size), x1, x2, x1 * x2] if logistic: lm = GLM(y, X, family=Binomial()).fit() else: lm = OLS(y, X).fit() # Evaluate the model on the grid eval = np.vectorize(lambda x1_, x2_: lm.predict([1, x1_, x2_, x1_ * x2_])) yhat = eval(xx1, xx2) # Default color limits put the midpoint at mean(y) y_bar = y.mean() c_min = min(np.percentile(y, 2), yhat.min()) c_max = max(np.percentile(y, 98), yhat.max()) delta = max(c_max - y_bar, y_bar - c_min) c_min, cmax = y_bar - delta, y_bar + delta contour_kws.setdefault("vmin", c_min) contour_kws.setdefault("vmax", c_max) # Draw the contour plot func_name = "contourf" if filled else "contour" contour = getattr(ax, func_name) c = contour(xx1, xx2, yhat, levels, cmap=cmap, **contour_kws) # Draw the scatter again so it's visible ax.plot(x1, x2, "o", **scatter_kws) # Draw a colorbar, maybe if colorbar: bar = plt.colorbar(c) # Label the axes if xlabel is not None: ax.set_xlabel(xlabel) if ylabel is not None: ax.set_ylabel(ylabel) if clabel is not None and colorbar: clabel = "P(%s)" % clabel if logistic else clabel bar.set_label(clabel, labelpad=15, rotation=270) return ax def corrplot(data, names=None, annot=True, sig_stars=True, sig_tail="both", sig_corr=True, cmap=None, cmap_range=None, cbar=True, diag_names=True, method=None, ax=None, **kwargs): """Plot a correlation matrix with colormap and r values. NOTE: This function is deprecated in favor of :func:`heatmap` and will be removed in a forthcoming release. Parameters ---------- data : Dataframe or nobs x nvars array Rectangular input data with variabes in the columns. names : sequence of strings Names to associate with variables if `data` is not a DataFrame. annot : bool Whether to annotate the upper triangle with correlation coefficients. sig_stars : bool If True, get significance with permutation test and denote with stars. sig_tail : both | upper | lower Direction for significance test. Also controls the default colorbar. sig_corr : bool If True, use FWE-corrected p values for the sig stars. cmap : colormap Colormap name as string or colormap object. cmap_range : None, "full", (low, high) Either truncate colormap at (-max(abs(r)), max(abs(r))), use the full range (-1, 1), or specify (min, max) values for the colormap. cbar : bool If true, plot the colorbar legend. method: None (pearson) | kendall | spearman Correlation method to compute pairwise correlations. Methods other than the default pearson correlation will not have a significance computed. ax : matplotlib axis Axis to draw plot in. kwargs : other keyword arguments Passed to ax.matshow() Returns ------- ax : matplotlib axis Axis object with plot. """ warnings.warn(("The `corrplot` function has been deprecated in favor " "of `heatmap` and will be removed in a forthcoming " "release. Please update your code.")) if not isinstance(data, pd.DataFrame): if names is None: names = ["var_%d" % i for i in range(data.shape[1])] data = pd.DataFrame(data, columns=names, dtype=np.float) # Calculate the correlation matrix of the dataframe if method is None: corrmat = data.corr() else: corrmat = data.corr(method=method) # Pandas will drop non-numeric columns; let's keep track of that operation names = corrmat.columns data = data[names] # Get p values with a permutation test if annot and sig_stars and method is None: p_mat = algo.randomize_corrmat(data.values.T, sig_tail, sig_corr) else: p_mat = None # Sort out the color range if cmap_range is None: triu = np.triu_indices(len(corrmat), 1) vmax = min(1, np.max(np.abs(corrmat.values[triu])) * 1.15) vmin = -vmax if sig_tail == "both": cmap_range = vmin, vmax elif sig_tail == "upper": cmap_range = 0, vmax elif sig_tail == "lower": cmap_range = vmin, 0 elif cmap_range == "full": cmap_range = (-1, 1) # Find a colormapping, somewhat intelligently if cmap is None: if min(cmap_range) >= 0: cmap = "OrRd" elif max(cmap_range) <= 0: cmap = "PuBu_r" else: cmap = "coolwarm" if cmap == "jet": # Paternalism raise ValueError("Never use the 'jet' colormap!") # Plot using the more general symmatplot function ax = symmatplot(corrmat, p_mat, names, cmap, cmap_range, cbar, annot, diag_names, ax, **kwargs) return ax def symmatplot(mat, p_mat=None, names=None, cmap="Greys", cmap_range=None, cbar=True, annot=True, diag_names=True, ax=None, **kwargs): """Plot a symmetric matrix with colormap and statistic values. NOTE: This function is deprecated in favor of :func:`heatmap` and will be removed in a forthcoming release. """ warnings.warn(("The `symmatplot` function has been deprecated in favor " "of `heatmap` and will be removed in a forthcoming " "release. Please update your code.")) if ax is None: ax = plt.gca() nvars = len(mat) if isinstance(mat, pd.DataFrame): plotmat = mat.values.copy() mat = mat.values else: plotmat = mat.copy() plotmat[np.triu_indices(nvars)] = np.nan if cmap_range is None: vmax = np.nanmax(plotmat) * 1.15 vmin = np.nanmin(plotmat) * 1.15 elif len(cmap_range) == 2: vmin, vmax = cmap_range else: raise ValueError("cmap_range argument not understood") mat_img = ax.matshow(plotmat, cmap=cmap, vmin=vmin, vmax=vmax, **kwargs) if cbar: plt.colorbar(mat_img, shrink=.75) if p_mat is None: p_mat = np.ones((nvars, nvars)) if annot: for i, j in zip(*np.triu_indices(nvars, 1)): val = mat[i, j] stars = utils.sig_stars(p_mat[i, j]) ax.text(j, i, "\n%.2g\n%s" % (val, stars), fontdict=dict(ha="center", va="center")) else: fill = np.ones_like(plotmat) fill[np.tril_indices_from(fill, -1)] = np.nan ax.matshow(fill, cmap="Greys", vmin=0, vmax=0, zorder=2) if names is None: names = ["var%d" % i for i in range(nvars)] if diag_names: for i, name in enumerate(names): ax.text(i, i, name, fontdict=dict(ha="center", va="center", weight="bold", rotation=45)) ax.set_xticklabels(()) ax.set_yticklabels(()) else: ax.xaxis.set_ticks_position("bottom") xnames = names if annot else names[:-1] ax.set_xticklabels(xnames, rotation=90) ynames = names if annot else names[1:] ax.set_yticklabels(ynames) minor_ticks = np.linspace(-.5, nvars - 1.5, nvars) ax.set_xticks(minor_ticks, True) ax.set_yticks(minor_ticks, True) major_ticks = np.linspace(0, nvars - 1, nvars) xticks = major_ticks if annot else major_ticks[:-1] ax.set_xticks(xticks) yticks = major_ticks if annot else major_ticks[1:] ax.set_yticks(yticks) ax.grid(False, which="major") ax.grid(True, which="minor", linestyle="-") return ax def pairplot(data, hue=None, hue_order=None, palette=None, vars=None, x_vars=None, y_vars=None, kind="scatter", diag_kind="hist", markers=None, size=2.5, aspect=1, dropna=True, plot_kws=None, diag_kws=None, grid_kws=None): """Plot pairwise relationships in a dataset. By default, this function will create a grid of Axes such that each variable in ``data`` will by shared in the y-axis across a single row and in the x-axis across a single column. The diagonal Axes are treated differently, drawing a plot to show the univariate distribution of the data for the variable in that column. It is also possible to show a subset of variables or plot different variables on the rows and columns. This is a high-level interface for :class:`PairGrid` that is intended to make it easy to draw a few common styles. You should use :class`PairGrid` directly if you need more flexibility. Parameters ---------- data : DataFrame Tidy (long-form) dataframe where each column is a variable and each row is an observation. hue : string (variable name), optional Variable in ``data`` to map plot aspects to different colors. hue_order : list of strings Order for the levels of the hue variable in the palette palette : dict or seaborn color palette Set of colors for mapping the ``hue`` variable. If a dict, keys should be values in the ``hue`` variable. vars : list of variable names, optional Variables within ``data`` to use, otherwise use every column with a numeric datatype. {x, y}_vars : lists of variable names, optional Variables within ``data`` to use separately for the rows and columns of the figure; i.e. to make a non-square plot. kind : {'scatter', 'reg'}, optional Kind of plot for the non-identity relationships. diag_kind : {'hist', 'kde'}, optional Kind of plot for the diagonal subplots. markers : single matplotlib marker code or list, optional Either the marker to use for all datapoints or a list of markers with a length the same as the number of levels in the hue variable so that differently colored points will also have different scatterplot markers. size : scalar, optional Height (in inches) of each facet. aspect : scalar, optional Aspect * size gives the width (in inches) of each facet. dropna : boolean, optional Drop missing values from the data before plotting. {plot, diag, grid}_kws : dicts, optional Dictionaries of keyword arguments. Returns ------- grid : PairGrid Returns the underlying ``PairGrid`` instance for further tweaking. See Also -------- PairGrid : Subplot grid for more flexible plotting of pairwise relationships. Examples -------- Draw scatterplots for joint relationships and histograms for univariate distributions: .. plot:: :context: close-figs >>> import seaborn as sns; sns.set(style="ticks", color_codes=True) >>> iris = sns.load_dataset("iris") >>> g = sns.pairplot(iris) Show different levels of a categorical variable by the color of plot elements: .. plot:: :context: close-figs >>> g = sns.pairplot(iris, hue="species") Use a different color palette: .. plot:: :context: close-figs >>> g = sns.pairplot(iris, hue="species", palette="husl") Use different markers for each level of the hue variable: .. plot:: :context: close-figs >>> g = sns.pairplot(iris, hue="species", markers=["o", "s", "D"]) Plot a subset of variables: .. plot:: :context: close-figs >>> g = sns.pairplot(iris, vars=["sepal_width", "sepal_length"]) Draw larger plots: .. plot:: :context: close-figs >>> g = sns.pairplot(iris, size=3, ... vars=["sepal_width", "sepal_length"]) Plot different variables in the rows and columns: .. plot:: :context: close-figs >>> g = sns.pairplot(iris, ... x_vars=["sepal_width", "sepal_length"], ... y_vars=["petal_width", "petal_length"]) Use kernel density estimates for univariate plots: .. plot:: :context: close-figs >>> g = sns.pairplot(iris, diag_kind="kde") Fit linear regression models to the scatter plots: .. plot:: :context: close-figs >>> g = sns.pairplot(iris, kind="reg") Pass keyword arguments down to the underlying functions (it may be easier to use :class:`PairGrid` directly): .. plot:: :context: close-figs >>> g = sns.pairplot(iris, diag_kind="kde", markers="+", ... plot_kws=dict(s=50, edgecolor="b", linewidth=1), ... diag_kws=dict(shade=True)) """ if plot_kws is None: plot_kws = {} if diag_kws is None: diag_kws = {} if grid_kws is None: grid_kws = {} # Set up the PairGrid diag_sharey = diag_kind == "hist" grid = PairGrid(data, vars=vars, x_vars=x_vars, y_vars=y_vars, hue=hue, hue_order=hue_order, palette=palette, diag_sharey=diag_sharey, size=size, aspect=aspect, dropna=dropna, **grid_kws) # Add the markers here as PairGrid has figured out how many levels of the # hue variable are needed and we don't want to duplicate that process if markers is not None: if grid.hue_names is None: n_markers = 1 else: n_markers = len(grid.hue_names) if not isinstance(markers, list): markers = [markers] * n_markers if len(markers) != n_markers: raise ValueError(("markers must be a singeton or a list of markers" " for each level of the hue variable")) grid.hue_kws = {"marker": markers} # Maybe plot on the diagonal if grid.square_grid: if diag_kind == "hist": grid.map_diag(plt.hist, **diag_kws) elif diag_kind == "kde": diag_kws["legend"] = False grid.map_diag(kdeplot, **diag_kws) # Maybe plot on the off-diagonals if grid.square_grid and diag_kind is not None: plotter = grid.map_offdiag else: plotter = grid.map if kind == "scatter": plot_kws.setdefault("edgecolor", "white") plotter(plt.scatter, **plot_kws) elif kind == "reg": plotter(regplot, **plot_kws) # Add a legend if hue is not None: grid.add_legend() return grid seaborn-0.6.0/seaborn/matrix.py000066400000000000000000001177341254425553400165030ustar00rootroot00000000000000"""Functions to visualize matrices of data.""" import itertools import colorsys import matplotlib as mpl import matplotlib.pyplot as plt from matplotlib import gridspec import numpy as np import pandas as pd from scipy.spatial import distance from scipy.cluster import hierarchy from .axisgrid import Grid from .palettes import cubehelix_palette from .utils import despine, axis_ticklabels_overlap from .external.six.moves import range def _index_to_label(index): """Convert a pandas index or multiindex to an axis label.""" if isinstance(index, pd.MultiIndex): return "-".join(map(str, index.names)) else: return index.name def _index_to_ticklabels(index): """Convert a pandas index or multiindex into ticklabels.""" if isinstance(index, pd.MultiIndex): return ["-".join(map(str, i)) for i in index.values] else: return index.values def _convert_colors(colors): """Convert either a list of colors or nested lists of colors to RGB.""" to_rgb = mpl.colors.colorConverter.to_rgb try: to_rgb(colors[0]) # If this works, there is only one level of colors return list(map(to_rgb, colors)) except ValueError: # If we get here, we have nested lists return [list(map(to_rgb, l)) for l in colors] def _matrix_mask(data, mask): """Ensure that data and mask are compatabile and add missing values. Values will be plotted for cells where ``mask`` is ``False``. ``data`` is expected to be a DataFrame; ``mask`` can be an array or a DataFrame. """ if mask is None: mask = np.zeros(data.shape, np.bool) if isinstance(mask, np.ndarray): # For array masks, ensure that shape matches data then convert if mask.shape != data.shape: raise ValueError("Mask must have the same shape as data.") mask = pd.DataFrame(mask, index=data.index, columns=data.columns, dtype=np.bool) elif isinstance(mask, pd.DataFrame): # For DataFrame masks, ensure that semantic labels match data if not mask.index.equals(data.index) \ and mask.columns.equals(data.columns): err = "Mask must have the same index and columns as data." raise ValueError(err) # Add any cells with missing data to the mask # This works around an issue where `plt.pcolormesh` doesn't represent # missing data properly mask = mask | pd.isnull(data) return mask class _HeatMapper(object): """Draw a heatmap plot of a matrix with nice labels and colormaps.""" def __init__(self, data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, cbar, cbar_kws, xticklabels=True, yticklabels=True, mask=None): """Initialize the plotting object.""" # We always want to have a DataFrame with semantic information # and an ndarray to pass to matplotlib if isinstance(data, pd.DataFrame): plot_data = data.values else: plot_data = np.asarray(data) data = pd.DataFrame(plot_data) # Validate the mask and convet to DataFrame mask = _matrix_mask(data, mask) # Reverse the rows so the plot looks like the matrix plot_data = plot_data[::-1] data = data.ix[::-1] mask = mask.ix[::-1] plot_data = np.ma.masked_where(np.asarray(mask), plot_data) # Get good names for the rows and columns xtickevery = 1 if isinstance(xticklabels, int) and xticklabels > 1: xtickevery = xticklabels xticklabels = _index_to_ticklabels(data.columns) elif isinstance(xticklabels, bool) and xticklabels: xticklabels = _index_to_ticklabels(data.columns) elif isinstance(xticklabels, bool) and not xticklabels: xticklabels = ['' for _ in range(data.shape[1])] ytickevery = 1 if isinstance(yticklabels, int) and yticklabels > 1: ytickevery = yticklabels yticklabels = _index_to_ticklabels(data.index) elif isinstance(yticklabels, bool) and yticklabels: yticklabels = _index_to_ticklabels(data.index) elif isinstance(yticklabels, bool) and not yticklabels: yticklabels = ['' for _ in range(data.shape[0])] else: yticklabels = yticklabels[::-1] # Get the positions and used label for the ticks nx, ny = data.T.shape xstart, xend, xstep = 0, nx, xtickevery self.xticks = np.arange(xstart, xend, xstep) + .5 self.xticklabels = xticklabels[xstart:xend:xstep] ystart, yend, ystep = (ny - 1) % ytickevery, ny, ytickevery self.yticks = np.arange(ystart, yend, ystep) + .5 self.yticklabels = yticklabels[ystart:yend:ystep] # Get good names for the axis labels xlabel = _index_to_label(data.columns) ylabel = _index_to_label(data.index) self.xlabel = xlabel if xlabel is not None else "" self.ylabel = ylabel if ylabel is not None else "" # Determine good default values for the colormapping self._determine_cmap_params(plot_data, vmin, vmax, cmap, center, robust) # Save other attributes to the object self.data = data self.plot_data = plot_data self.annot = annot self.fmt = fmt self.annot_kws = {} if annot_kws is None else annot_kws self.cbar = cbar self.cbar_kws = {} if cbar_kws is None else cbar_kws def _determine_cmap_params(self, plot_data, vmin, vmax, cmap, center, robust): """Use some heuristics to set good defaults for colorbar and range.""" calc_data = plot_data.data[~np.isnan(plot_data.data)] if vmin is None: vmin = np.percentile(calc_data, 2) if robust else calc_data.min() if vmax is None: vmax = np.percentile(calc_data, 98) if robust else calc_data.max() # Simple heuristics for whether these data should have a divergent map divergent = ((vmin < 0) and (vmax > 0)) or center is not None # Now set center to 0 so math below makes sense if center is None: center = 0 # A divergent map should be symmetric around the center value if divergent: vlim = max(abs(vmin - center), abs(vmax - center)) vmin, vmax = -vlim, vlim self.divergent = divergent # Now add in the centering value and set the limits vmin += center vmax += center self.vmin = vmin self.vmax = vmax # Choose default colormaps if not provided if cmap is None: if divergent: self.cmap = "RdBu_r" else: self.cmap = cubehelix_palette(light=.95, as_cmap=True) else: self.cmap = cmap def _annotate_heatmap(self, ax, mesh): """Add textual labels with the value in each cell.""" xpos, ypos = np.meshgrid(ax.get_xticks(), ax.get_yticks()) for x, y, val, color in zip(xpos.flat, ypos.flat, mesh.get_array(), mesh.get_facecolors()): if val is not np.ma.masked: _, l, _ = colorsys.rgb_to_hls(*color[:3]) text_color = ".15" if l > .5 else "w" val = ("{:" + self.fmt + "}").format(val) ax.text(x, y, val, color=text_color, ha="center", va="center", **self.annot_kws) def plot(self, ax, cax, kws): """Draw the heatmap on the provided Axes.""" # Remove all the Axes spines despine(ax=ax, left=True, bottom=True) # Draw the heatmap mesh = ax.pcolormesh(self.plot_data, vmin=self.vmin, vmax=self.vmax, cmap=self.cmap, **kws) # Set the axis limits ax.set(xlim=(0, self.data.shape[1]), ylim=(0, self.data.shape[0])) # Add row and column labels ax.set(xticks=self.xticks, yticks=self.yticks) xtl = ax.set_xticklabels(self.xticklabels) ytl = ax.set_yticklabels(self.yticklabels, rotation="vertical") # Possibly rotate them if they overlap plt.draw() if axis_ticklabels_overlap(xtl): plt.setp(xtl, rotation="vertical") if axis_ticklabels_overlap(ytl): plt.setp(ytl, rotation="horizontal") # Add the axis labels ax.set(xlabel=self.xlabel, ylabel=self.ylabel) # Annotate the cells with the formatted values if self.annot: self._annotate_heatmap(ax, mesh) # Possibly add a colorbar if self.cbar: ticker = mpl.ticker.MaxNLocator(6) cb = ax.figure.colorbar(mesh, cax, ax, ticks=ticker, **self.cbar_kws) cb.outline.set_linewidth(0) def heatmap(data, vmin=None, vmax=None, cmap=None, center=None, robust=False, annot=False, fmt=".2g", annot_kws=None, linewidths=0, linecolor="white", cbar=True, cbar_kws=None, cbar_ax=None, square=False, ax=None, xticklabels=True, yticklabels=True, mask=None, **kwargs): """Plot rectangular data as a color-encoded matrix. This function tries to infer a good colormap to use from the data, but this is not guaranteed to work, so take care to make sure the kind of colormap (sequential or diverging) and its limits are appropriate. This is an Axes-level function and will draw the heatmap into the currently-active Axes if none is provided to the ``ax`` argument. Part of this Axes space will be taken and used to plot a colormap, unless ``cbar`` is False or a separate Axes is provided to ``cbar_ax``. Parameters ---------- data : rectangular dataset 2D dataset that can be coerced into an ndarray. If a Pandas DataFrame is provided, the index/column information will be used to label the columns and rows. vmin, vmax : floats, optional Values to anchor the colormap, otherwise they are inferred from the data and other keyword arguments. When a diverging dataset is inferred, one of these values may be ignored. cmap : matplotlib colormap name or object, optional The mapping from data values to color space. If not provided, this will be either a cubehelix map (if the function infers a sequential dataset) or ``RdBu_r`` (if the function infers a diverging dataset). center : float, optional The value at which to center the colormap. Passing this value implies use of a diverging colormap. robust : bool, optional If True and ``vmin`` or ``vmax`` are absent, the colormap range is computed with robust quantiles instead of the extreme values. annot : bool, optional If True, write the data value in each cell. fmt : string, optional String formatting code to use when ``annot`` is True. annot_kws : dict of key, value mappings, optional Keyword arguments for ``ax.text`` when ``annot`` is True. linewidths : float, optional Width of the lines that will divide each cell. linecolor : color, optional Color of the lines that will divide each cell. cbar : boolean, optional Whether to draw a colorbar. cbar_kws : dict of key, value mappings, optional Keyword arguments for `fig.colorbar`. cbar_ax : matplotlib Axes, optional Axes in which to draw the colorbar, otherwise take space from the main Axes. square : boolean, optional If True, set the Axes aspect to "equal" so each cell will be square-shaped. ax : matplotlib Axes, optional Axes in which to draw the plot, otherwise use the currently-active Axes. xticklabels : list-like, int, or bool, optional If True, plot the column names of the dataframe. If False, don't plot the column names. If list-like, plot these alternate labels as the xticklabels. If an integer, use the column names but plot only every n label. yticklabels : list-like, int, or bool, optional If True, plot the row names of the dataframe. If False, don't plot the row names. If list-like, plot these alternate labels as the yticklabels. If an integer, use the index names but plot only every n label. mask : boolean array or DataFrame, optional If passed, data will not be shown in cells where ``mask`` is True. Cells with missing values are automatically masked. kwargs : other keyword arguments All other keyword arguments are passed to ``ax.pcolormesh``. Returns ------- ax : matplotlib Axes Axes object with the heatmap. Examples -------- Plot a heatmap for a numpy array: .. plot:: :context: close-figs >>> import numpy as np; np.random.seed(0) >>> import seaborn as sns; sns.set() >>> uniform_data = np.random.rand(10, 12) >>> ax = sns.heatmap(uniform_data) Change the limits of the colormap: .. plot:: :context: close-figs >>> ax = sns.heatmap(uniform_data, vmin=0, vmax=1) Plot a heatmap for data centered on 0: .. plot:: :context: close-figs >>> normal_data = np.random.randn(10, 12) >>> ax = sns.heatmap(normal_data) Plot a dataframe with meaningful row and column labels: .. plot:: :context: close-figs >>> flights = sns.load_dataset("flights") >>> flights = flights.pivot("month", "year", "passengers") >>> ax = sns.heatmap(flights) Annotate each cell with the numeric value using integer formatting: .. plot:: :context: close-figs >>> ax = sns.heatmap(flights, annot=True, fmt="d") Add lines between each cell: .. plot:: :context: close-figs >>> ax = sns.heatmap(flights, linewidths=.5) Use a different colormap: .. plot:: :context: close-figs >>> ax = sns.heatmap(flights, cmap="YlGnBu") Center the colormap at a specific value: .. plot:: :context: close-figs >>> ax = sns.heatmap(flights, center=flights.loc["January", 1955]) Plot every other column label and don't plot row labels: .. plot:: :context: close-figs >>> data = np.random.randn(50, 20) >>> ax = sns.heatmap(data, xticklabels=2, yticklabels=False) Don't draw a colorbar: .. plot:: :context: close-figs >>> ax = sns.heatmap(flights, cbar=False) Use different axes for the colorbar: .. plot:: :context: close-figs >>> grid_kws = {"height_ratios": (.9, .05), "hspace": .3} >>> f, (ax, cbar_ax) = plt.subplots(2, gridspec_kw=grid_kws) >>> ax = sns.heatmap(flights, ax=ax, ... cbar_ax=cbar_ax, ... cbar_kws={"orientation": "horizontal"}) Use a mask to plot only part of a matrix .. plot:: :context: close-figs >>> corr = np.corrcoef(np.random.randn(10, 200)) >>> mask = np.zeros_like(corr) >>> mask[np.triu_indices_from(mask)] = True >>> with sns.axes_style("white"): ... ax = sns.heatmap(corr, mask=mask, vmax=.3, square=True) """ # Initialize the plotter object plotter = _HeatMapper(data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, cbar, cbar_kws, xticklabels, yticklabels, mask) # Add the pcolormesh kwargs here kwargs["linewidths"] = linewidths kwargs["edgecolor"] = linecolor # Draw the plot and return the Axes if ax is None: ax = plt.gca() if square: ax.set_aspect("equal") plotter.plot(ax, cbar_ax, kwargs) return ax class _DendrogramPlotter(object): """Object for drawing tree of similarities between data rows/columns""" def __init__(self, data, linkage, metric, method, axis, label, rotate): """Plot a dendrogram of the relationships between the columns of data Parameters ---------- data : pandas.DataFrame Rectangular data """ self.axis = axis if self.axis == 1: data = data.T if isinstance(data, pd.DataFrame): array = data.values else: array = np.asarray(data) data = pd.DataFrame(array) self.array = array self.data = data self.shape = self.data.shape self.metric = metric self.method = method self.axis = axis self.label = label self.rotate = rotate if linkage is None: self.linkage = self.calculated_linkage else: self.linkage = linkage self.dendrogram = self.calculate_dendrogram() # Dendrogram ends are always at multiples of 5, who knows why ticks = 10 * np.arange(self.data.shape[0]) + 5 if self.label: ticklabels = _index_to_ticklabels(self.data.index) ticklabels = [ticklabels[i] for i in self.reordered_ind] if self.rotate: self.xticks = [] self.yticks = ticks self.xticklabels = [] self.yticklabels = ticklabels self.ylabel = _index_to_label(self.data.index) self.xlabel = '' else: self.xticks = ticks self.yticks = [] self.xticklabels = ticklabels self.yticklabels = [] self.ylabel = '' self.xlabel = _index_to_label(self.data.index) else: self.xticks, self.yticks = [], [] self.yticklabels, self.xticklabels = [], [] self.xlabel, self.ylabel = '', '' if self.rotate: self.X = self.dendrogram['dcoord'] self.Y = self.dendrogram['icoord'] else: self.X = self.dendrogram['icoord'] self.Y = self.dendrogram['dcoord'] def _calculate_linkage_scipy(self): if np.product(self.shape) >= 10000: UserWarning('This will be slow... (gentle suggestion: ' '"pip install fastcluster")') pairwise_dists = distance.squareform( distance.pdist(self.array, metric=self.metric)) linkage = hierarchy.linkage(pairwise_dists, method=self.method) del pairwise_dists return linkage def _calculate_linkage_fastcluster(self): import fastcluster # Fastcluster has a memory-saving vectorized version, but only # with certain linkage methods, and mostly with euclidean metric vector_methods = ('single', 'centroid', 'median', 'ward') euclidean_methods = ('centroid', 'median', 'ward') euclidean = self.metric == 'euclidean' and self.method in \ euclidean_methods if euclidean or self.method == 'single': return fastcluster.linkage_vector(self.array, method=self.method, metric=self.metric) else: pairwise_dists = distance.pdist(self.array, metric=self.metric) linkage = fastcluster.linkage(pairwise_dists, method=self.method) del pairwise_dists return linkage @property def calculated_linkage(self): try: return self._calculate_linkage_fastcluster() except ImportError: return self._calculate_linkage_scipy() def calculate_dendrogram(self): """Calculates a dendrogram based on the linkage matrix Made a separate function, not a property because don't want to recalculate the dendrogram every time it is accessed. Returns ------- dendrogram : dict Dendrogram dictionary as returned by scipy.cluster.hierarchy .dendrogram. The important key-value pairing is "reordered_ind" which indicates the re-ordering of the matrix """ return hierarchy.dendrogram(self.linkage, no_plot=True, color_list=['k'], color_threshold=-np.inf) @property def reordered_ind(self): """Indices of the matrix, reordered by the dendrogram""" return self.dendrogram['leaves'] def plot(self, ax): """Plots a dendrogram of the similarities between data on the axes Parameters ---------- ax : matplotlib.axes.Axes Axes object upon which the dendrogram is plotted """ for x, y in zip(self.X, self.Y): ax.plot(x, y, color='k', linewidth=.5) if self.rotate and self.axis == 0: ax.invert_xaxis() ax.yaxis.set_ticks_position('right') ymax = min(map(min, self.Y)) + max(map(max, self.Y)) ax.set_ylim(0, ymax) ax.invert_yaxis() else: xmax = min(map(min, self.X)) + max(map(max, self.X)) ax.set_xlim(0, xmax) despine(ax=ax, bottom=True, left=True) ax.set(xticks=self.xticks, yticks=self.yticks, xlabel=self.xlabel, ylabel=self.ylabel) xtl = ax.set_xticklabels(self.xticklabels) ytl = ax.set_yticklabels(self.yticklabels, rotation='vertical') # Force a draw of the plot to avoid matplotlib window error plt.draw() if len(ytl) > 0 and axis_ticklabels_overlap(ytl): plt.setp(ytl, rotation="horizontal") if len(xtl) > 0 and axis_ticklabels_overlap(xtl): plt.setp(xtl, rotation="vertical") return self def dendrogram(data, linkage=None, axis=1, label=True, metric='euclidean', method='average', rotate=False, ax=None): """Draw a tree diagram of relationships within a matrix Parameters ---------- data : pandas.DataFrame Rectangular data linkage : numpy.array, optional Linkage matrix axis : int, optional Which axis to use to calculate linkage. 0 is rows, 1 is columns. label : bool, optional If True, label the dendrogram at leaves with column or row names metric : str, optional Distance metric. Anything valid for scipy.spatial.distance.pdist method : str, optional Linkage method to use. Anything valid for scipy.cluster.hierarchy.linkage rotate : bool, optional When plotting the matrix, whether to rotate it 90 degrees counter-clockwise, so the leaves face right ax : matplotlib axis, optional Axis to plot on, otherwise uses current axis Returns ------- dendrogramplotter : _DendrogramPlotter A Dendrogram plotter object. Notes ----- Access the reordered dendrogram indices with dendrogramplotter.reordered_ind """ plotter = _DendrogramPlotter(data, linkage=linkage, axis=axis, metric=metric, method=method, label=label, rotate=rotate) if ax is None: ax = plt.gca() return plotter.plot(ax=ax) class ClusterGrid(Grid): def __init__(self, data, pivot_kws=None, z_score=None, standard_scale=None, figsize=None, row_colors=None, col_colors=None, mask=None): """Grid object for organizing clustered heatmap input on to axes""" if isinstance(data, pd.DataFrame): self.data = data else: self.data = pd.DataFrame(data) self.data2d = self.format_data(self.data, pivot_kws, z_score, standard_scale) self.mask = _matrix_mask(self.data2d, mask) if figsize is None: width, height = 10, 10 figsize = (width, height) self.fig = plt.figure(figsize=figsize) if row_colors is not None: row_colors = _convert_colors(row_colors) self.row_colors = row_colors if col_colors is not None: col_colors = _convert_colors(col_colors) self.col_colors = col_colors width_ratios = self.dim_ratios(self.row_colors, figsize=figsize, axis=1) height_ratios = self.dim_ratios(self.col_colors, figsize=figsize, axis=0) nrows = 3 if self.col_colors is None else 4 ncols = 3 if self.row_colors is None else 4 self.gs = gridspec.GridSpec(nrows, ncols, wspace=0.01, hspace=0.01, width_ratios=width_ratios, height_ratios=height_ratios) self.ax_row_dendrogram = self.fig.add_subplot(self.gs[nrows - 1, 0:2], axisbg="white") self.ax_col_dendrogram = self.fig.add_subplot(self.gs[0:2, ncols - 1], axisbg="white") self.ax_row_colors = None self.ax_col_colors = None if self.row_colors is not None: self.ax_row_colors = self.fig.add_subplot( self.gs[nrows - 1, ncols - 2]) if self.col_colors is not None: self.ax_col_colors = self.fig.add_subplot( self.gs[nrows - 2, ncols - 1]) self.ax_heatmap = self.fig.add_subplot(self.gs[nrows - 1, ncols - 1]) # colorbar for scale to left corner self.cax = self.fig.add_subplot(self.gs[0, 0]) self.dendrogram_row = None self.dendrogram_col = None def format_data(self, data, pivot_kws, z_score=None, standard_scale=None): """Extract variables from data or use directly.""" # Either the data is already in 2d matrix format, or need to do a pivot if pivot_kws is not None: data2d = data.pivot(**pivot_kws) else: data2d = data if z_score is not None and standard_scale is not None: raise ValueError( 'Cannot perform both z-scoring and standard-scaling on data') if z_score is not None: data2d = self.z_score(data2d, z_score) if standard_scale is not None: data2d = self.standard_scale(data2d, standard_scale) return data2d @staticmethod def z_score(data2d, axis=1): """Standarize the mean and variance of the data axis Parameters ---------- data2d : pandas.DataFrame Data to normalize axis : int Which axis to normalize across. If 0, normalize across rows, if 1, normalize across columns. Returns ------- normalized : pandas.DataFrame Noramlized data with a mean of 0 and variance of 1 across the specified axis. """ if axis == 1: z_scored = data2d else: z_scored = data2d.T z_scored = (z_scored - z_scored.mean()) / z_scored.var() if axis == 1: return z_scored else: return z_scored.T @staticmethod def standard_scale(data2d, axis=1): """Divide the data by the difference between the max and min Parameters ---------- data2d : pandas.DataFrame Data to normalize axis : int Which axis to normalize across. If 0, normalize across rows, if 1, normalize across columns. vmin : int If 0, then subtract the minimum of the data before dividing by the range. Returns ------- standardized : pandas.DataFrame Noramlized data with a mean of 0 and variance of 1 across the specified axis. >>> import numpy as np >>> d = np.arange(5, 8, 0.5) >>> ClusterGrid.standard_scale(d) array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]) """ # Normalize these values to range from 0 to 1 if axis == 1: standardized = data2d else: standardized = data2d.T subtract = standardized.min() standardized = (standardized - subtract) / ( standardized.max() - standardized.min()) if axis == 1: return standardized else: return standardized.T def dim_ratios(self, side_colors, axis, figsize, side_colors_ratio=0.05): """Get the proportions of the figure taken up by each axes """ figdim = figsize[axis] # Get resizing proportion of this figure for the dendrogram and # colorbar, so only the heatmap gets bigger but the dendrogram stays # the same size. dendrogram = min(2. / figdim, .2) # add the colorbar colorbar_width = .8 * dendrogram colorbar_height = .2 * dendrogram if axis == 0: ratios = [colorbar_width, colorbar_height] else: ratios = [colorbar_height, colorbar_width] if side_colors is not None: # Add room for the colors ratios += [side_colors_ratio] # Add the ratio for the heatmap itself ratios += [.8] return ratios @staticmethod def color_list_to_matrix_and_cmap(colors, ind, axis=0): """Turns a list of colors into a numpy matrix and matplotlib colormap These arguments can now be plotted using heatmap(matrix, cmap) and the provided colors will be plotted. Parameters ---------- colors : list of matplotlib colors Colors to label the rows or columns of a dataframe. ind : list of ints Ordering of the rows or columns, to reorder the original colors by the clustered dendrogram order axis : int Which axis this is labeling Returns ------- matrix : numpy.array A numpy array of integer values, where each corresponds to a color from the originally provided list of colors cmap : matplotlib.colors.ListedColormap """ # check for nested lists/color palettes. # Will fail if matplotlib color is list not tuple if any(issubclass(type(x), list) for x in colors): all_colors = set(itertools.chain(*colors)) n = len(colors) m = len(colors[0]) else: all_colors = set(colors) n = 1 m = len(colors) colors = [colors] color_to_value = dict((col, i) for i, col in enumerate(all_colors)) matrix = np.array([color_to_value[c] for color in colors for c in color]) shape = (n, m) matrix = matrix.reshape(shape) matrix = matrix[:, ind] if axis == 0: # row-side: matrix = matrix.T cmap = mpl.colors.ListedColormap(all_colors) return matrix, cmap def savefig(self, *args, **kwargs): if 'bbox_inches' not in kwargs: kwargs['bbox_inches'] = 'tight' self.fig.savefig(*args, **kwargs) def plot_dendrograms(self, row_cluster, col_cluster, metric, method, row_linkage, col_linkage): # Plot the row dendrogram if row_cluster: self.dendrogram_row = dendrogram( self.data2d, metric=metric, method=method, label=False, axis=0, ax=self.ax_row_dendrogram, rotate=True, linkage=row_linkage) else: self.ax_row_dendrogram.set_xticks([]) self.ax_row_dendrogram.set_yticks([]) # PLot the column dendrogram if col_cluster: self.dendrogram_col = dendrogram( self.data2d, metric=metric, method=method, label=False, axis=1, ax=self.ax_col_dendrogram, linkage=col_linkage) else: self.ax_col_dendrogram.set_xticks([]) self.ax_col_dendrogram.set_yticks([]) despine(ax=self.ax_row_dendrogram, bottom=True, left=True) despine(ax=self.ax_col_dendrogram, bottom=True, left=True) def plot_colors(self, xind, yind, **kws): """Plots color labels between the dendrogram and the heatmap Parameters ---------- heatmap_kws : dict Keyword arguments heatmap """ # Remove any custom colormap and centering kws = kws.copy() kws.pop('cmap', None) kws.pop('center', None) kws.pop('vmin', None) kws.pop('vmax', None) kws.pop('xticklabels', None) kws.pop('yticklabels', None) if self.row_colors is not None: matrix, cmap = self.color_list_to_matrix_and_cmap( self.row_colors, yind, axis=0) heatmap(matrix, cmap=cmap, cbar=False, ax=self.ax_row_colors, xticklabels=False, yticklabels=False, **kws) else: despine(self.ax_row_colors, left=True, bottom=True) if self.col_colors is not None: matrix, cmap = self.color_list_to_matrix_and_cmap( self.col_colors, xind, axis=1) heatmap(matrix, cmap=cmap, cbar=False, ax=self.ax_col_colors, xticklabels=False, yticklabels=False, **kws) else: despine(self.ax_col_colors, left=True, bottom=True) def plot_matrix(self, colorbar_kws, xind, yind, **kws): self.data2d = self.data2d.iloc[yind, xind] self.mask = self.mask.iloc[yind, xind] # Try to reorganize specified tick labels, if provided xtl = kws.pop("xticklabels", True) try: xtl = np.asarray(xtl)[xind] except (TypeError, IndexError): pass ytl = kws.pop("yticklabels", True) try: ytl = np.asarray(ytl)[yind] except (TypeError, IndexError): pass heatmap(self.data2d, ax=self.ax_heatmap, cbar_ax=self.cax, cbar_kws=colorbar_kws, mask=self.mask, xticklabels=xtl, yticklabels=ytl, **kws) self.ax_heatmap.yaxis.set_ticks_position('right') self.ax_heatmap.yaxis.set_label_position('right') def plot(self, metric, method, colorbar_kws, row_cluster, col_cluster, row_linkage, col_linkage, **kws): colorbar_kws = {} if colorbar_kws is None else colorbar_kws self.plot_dendrograms(row_cluster, col_cluster, metric, method, row_linkage=row_linkage, col_linkage=col_linkage) try: xind = self.dendrogram_col.reordered_ind except AttributeError: xind = np.arange(self.data2d.shape[1]) try: yind = self.dendrogram_row.reordered_ind except AttributeError: yind = np.arange(self.data2d.shape[0]) self.plot_colors(xind, yind, **kws) self.plot_matrix(colorbar_kws, xind, yind, **kws) return self def clustermap(data, pivot_kws=None, method='average', metric='euclidean', z_score=None, standard_scale=None, figsize=None, cbar_kws=None, row_cluster=True, col_cluster=True, row_linkage=None, col_linkage=None, row_colors=None, col_colors=None, mask=None, **kwargs): """Plot a hierarchically clustered heatmap of a pandas DataFrame Parameters ---------- data: pandas.DataFrame Rectangular data for clustering. Cannot contain NAs. pivot_kws : dict, optional If `data` is a tidy dataframe, can provide keyword arguments for pivot to create a rectangular dataframe. method : str, optional Linkage method to use for calculating clusters. See scipy.cluster.hierarchy.linkage documentation for more information: http://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html metric : str, optional Distance metric to use for the data. See scipy.spatial.distance.pdist documentation for more options http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html z_score : int or None, optional Either 0 (rows) or 1 (columns). Whether or not to calculate z-scores for the rows or the columns. Z scores are: z = (x - mean)/std, so values in each row (column) will get the mean of the row (column) subtracted, then divided by the standard deviation of the row (column). This ensures that each row (column) has mean of 0 and variance of 1. standard_scale : int or None, optional Either 0 (rows) or 1 (columns). Whether or not to standardize that dimension, meaning for each row or column, subtract the minimum and divide each by its maximum. figsize: tuple of two ints, optional Size of the figure to create. cbar_kws : dict, optional Keyword arguments to pass to ``cbar_kws`` in ``heatmap``, e.g. to add a label to the colorbar. {row,col}_cluster : bool, optional If True, cluster the {rows, columns}. {row,col}_linkage : numpy.array, optional Precomputed linkage matrix for the rows or columns. See scipy.cluster.hierarchy.linkage for specific formats. {row,col}_colors : list-like, optional List of colors to label for either the rows or columns. Useful to evaluate whether samples within a group are clustered together. Can use nested lists for multiple color levels of labeling. mask : boolean array or DataFrame, optional If passed, data will not be shown in cells where ``mask`` is True. Cells with missing values are automatically masked. Only used for visualizing, not for calculating. kwargs : other keyword arguments All other keyword arguments are passed to ``sns.heatmap`` Returns ------- clustergrid : ClusterGrid A ClusterGrid instance. Notes ----- The returned object has a ``savefig`` method that should be used if you want to save the figure object without clipping the dendrograms. To access the reordered row indices, use: ``clustergrid.dendrogram_row.reordered_ind`` Column indices, use: ``clustergrid.dendrogram_col.reordered_ind`` Examples -------- Plot a clustered heatmap: .. plot:: :context: close-figs >>> import seaborn as sns; sns.set() >>> flights = sns.load_dataset("flights") >>> flights = flights.pivot("month", "year", "passengers") >>> g = sns.clustermap(flights) Don't cluster one of the axes: .. plot:: :context: close-figs >>> g = sns.clustermap(flights, col_cluster=False) Use a different colormap and add lines to separate the cells: .. plot:: :context: close-figs >>> cmap = sns.cubehelix_palette(as_cmap=True, rot=-.3, light=1) >>> g = sns.clustermap(flights, cmap=cmap, linewidths=.5) Use a different figure size: .. plot:: :context: close-figs >>> g = sns.clustermap(flights, cmap=cmap, figsize=(7, 5)) Standardize the data across the columns: .. plot:: :context: close-figs >>> g = sns.clustermap(flights, standard_scale=1) Normalize the data across the rows: .. plot:: :context: close-figs >>> g = sns.clustermap(flights, z_score=0) Use a different clustering method: .. plot:: :context: close-figs >>> g = sns.clustermap(flights, method="single", metric="cosine") Add colored labels on one of the axes: .. plot:: :context: close-figs >>> season_colors = (sns.color_palette("BuPu", 3) + ... sns.color_palette("RdPu", 3) + ... sns.color_palette("YlGn", 3) + ... sns.color_palette("OrRd", 3)) >>> g = sns.clustermap(flights, row_colors=season_colors) """ plotter = ClusterGrid(data, pivot_kws=pivot_kws, figsize=figsize, row_colors=row_colors, col_colors=col_colors, z_score=z_score, standard_scale=standard_scale, mask=mask) return plotter.plot(metric=metric, method=method, colorbar_kws=cbar_kws, row_cluster=row_cluster, col_cluster=col_cluster, row_linkage=row_linkage, col_linkage=col_linkage, **kwargs) seaborn-0.6.0/seaborn/miscplot.py000066400000000000000000000027321254425553400170200ustar00rootroot00000000000000from __future__ import division import numpy as np import matplotlib as mpl import matplotlib.pyplot as plt def palplot(pal, size=1): """Plot the values in a color palette as a horizontal array. Parameters ---------- pal : sequence of matplotlib colors colors, i.e. as returned by seaborn.color_palette() size : scaling factor for size of plot """ n = len(pal) f, ax = plt.subplots(1, 1, figsize=(n * size, size)) ax.imshow(np.arange(n).reshape(1, n), cmap=mpl.colors.ListedColormap(list(pal)), interpolation="nearest", aspect="auto") ax.set_xticks(np.arange(n) - .5) ax.set_yticks([-.5, .5]) ax.set_xticklabels([]) ax.set_yticklabels([]) def puppyplot(grown_up=False): """Plot today's daily puppy. Only works in the IPython notebook.""" from .external.six.moves.urllib.request import urlopen from IPython.display import HTML try: from bs4 import BeautifulSoup url = "http://www.dailypuppy.com" if grown_up: url += "/dogs" html_doc = urlopen(url) soup = BeautifulSoup(html_doc) puppy = soup.find("div", {"class": "daily_puppy"}) return HTML(str(puppy.img)) except ImportError: html = ('') return HTML(html) seaborn-0.6.0/seaborn/palettes.py000066400000000000000000001247131254425553400170130ustar00rootroot00000000000000from __future__ import division import colorsys from itertools import cycle import numpy as np import matplotlib as mpl import matplotlib.pyplot as plt from matplotlib.colors import LinearSegmentedColormap from .external import husl from .external.six import string_types from .external.six.moves import range from .utils import desaturate, set_hls_values from .xkcd_rgb import xkcd_rgb from .crayons import crayons from .miscplot import palplot SEABORN_PALETTES = dict( deep=["#4C72B0", "#55A868", "#C44E52", "#8172B2", "#CCB974", "#64B5CD"], muted=["#4878CF", "#6ACC65", "#D65F5F", "#B47CC7", "#C4AD66", "#77BEDB"], pastel=["#92C6FF", "#97F0AA", "#FF9F9A", "#D0BBFF", "#FFFEA3", "#B0E0E6"], bright=["#003FFF", "#03ED3A", "#E8000B", "#8A2BE2", "#FFC400", "#00D7FF"], dark=["#001C7F", "#017517", "#8C0900", "#7600A1", "#B8860B", "#006374"], colorblind=["#0072B2", "#009E73", "#D55E00", "#CC79A7", "#F0E442", "#56B4E9"] ) class _ColorPalette(list): """Set the color palette in a with statement, otherwise be a list.""" def __enter__(self): """Open the context.""" from .rcmod import set_palette self._orig_palette = color_palette() set_palette(self) return self def __exit__(self, *args): """Close the context.""" from .rcmod import set_palette set_palette(self._orig_palette) def as_hex(self): """Return a color palette with hex codes instead of RGB values.""" hex = [mpl.colors.rgb2hex(rgb) for rgb in self] return _ColorPalette(hex) def color_palette(palette=None, n_colors=None, desat=None): """Return a list of colors defining a color palette. Availible seaborn palette names: deep, muted, bright, pastel, dark, colorblind Other options: hls, husl, any named matplotlib palette, list of colors Calling this function with ``palette=None`` will return the current matplotlib color cycle. Matplotlib paletes can be specified as reversed palettes by appending "_r" to the name or as dark palettes by appending "_d" to the name. (These options are mutually exclusive, but the resulting list of colors can also be reversed). This function can also be used in a ``with`` statement to temporarily set the color cycle for a plot or set of plots. Parameters ---------- palette: None, string, or sequence, optional Name of palette or None to return current palette. If a sequence, input colors are used but possibly cycled and desaturated. n_colors : int, optional Number of colors in the palette. If ``None``, the default will depend on how ``palette`` is specified. Named palettes default to 6 colors, but grabbing the current palette or passing in a list of colors will not change the number of colors unless this is specified. Asking for more colors than exist in the palette will cause it to cycle. desat : float, optional Proportion to desaturate each color by. Returns ------- palette : list of RGB tuples. Color palette. Behaves like a list, but can be used as a context manager and possesses an ``as_hex`` method to convert to hex color codes. See Also -------- set_palette : Set the default color cycle for all plots. set_color_codes : Reassign color codes like ``"b"``, ``"g"``, etc. to colors from one of the seaborn palettes. Examples -------- Show one of the "seaborn palettes", which have the same basic order of hues as the default matplotlib color cycle but more attractive colors. .. plot:: :context: close-figs >>> import seaborn as sns; sns.set() >>> sns.palplot(sns.color_palette("muted")) Use discrete values from one of the built-in matplotlib colormaps. .. plot:: :context: close-figs >>> sns.palplot(sns.color_palette("RdBu", n_colors=7)) Make a "dark" matplotlib sequential palette variant. (This can be good when coloring multiple lines or points that correspond to an ordered variable, where you don't want the lightest lines to be invisible). .. plot:: :context: close-figs >>> sns.palplot(sns.color_palette("Blues_d")) Use a categorical matplotlib palette, add some desaturation. (This can be good when making plots with large patches, which look best with dimmer colors). .. plot:: :context: close-figs >>> sns.palplot(sns.color_palette("Set1", n_colors=8, desat=.5)) Use as a context manager: .. plot:: :context: close-figs >>> import numpy as np, matplotlib.pyplot as plt >>> with sns.color_palette("husl", 8): ... _ = plt.plot(np.c_[np.zeros(8), np.arange(8)].T) """ if palette is None: palette = mpl.rcParams["axes.color_cycle"] if n_colors is None: n_colors = len(palette) elif not isinstance(palette, string_types): palette = palette if n_colors is None: n_colors = len(palette) else: if n_colors is None: n_colors = 6 if palette == "hls": palette = hls_palette(n_colors) elif palette == "husl": palette = husl_palette(n_colors) elif palette.lower() == "jet": raise ValueError("No.") elif palette in SEABORN_PALETTES: palette = SEABORN_PALETTES[palette] elif palette in dir(mpl.cm): palette = mpl_palette(palette, n_colors) elif palette[:-2] in dir(mpl.cm): palette = mpl_palette(palette, n_colors) else: raise ValueError("%s is not a valid palette name" % palette) if desat is not None: palette = [desaturate(c, desat) for c in palette] # Always return as many colors as we asked for pal_cycle = cycle(palette) palette = [next(pal_cycle) for _ in range(n_colors)] # Always return in r, g, b tuple format try: palette = map(mpl.colors.colorConverter.to_rgb, palette) palette = _ColorPalette(palette) except ValueError: raise ValueError("Could not generate a palette for %s" % str(palette)) return palette def hls_palette(n_colors=6, h=.01, l=.6, s=.65): """Get a set of evenly spaced colors in HLS hue space. h, l, and s should be between 0 and 1 Parameters ---------- n_colors : int number of colors in the palette h : float first hue l : float lightness s : float saturation Returns ------- palette : seaborn color palette List-like object of colors as RGB tuples. See Also -------- husl_palette : Make a palette using evently spaced circular hues in the HUSL system. Examples -------- Create a palette of 10 colors with the default parameters: .. plot:: :context: close-figs >>> import seaborn as sns; sns.set() >>> sns.palplot(sns.hls_palette(10)) Create a palette of 10 colors that begins at a different hue value: .. plot:: :context: close-figs >>> sns.palplot(sns.hls_palette(10, h=.5)) Create a palette of 10 colors that are darker than the default: .. plot:: :context: close-figs >>> sns.palplot(sns.hls_palette(10, l=.4)) Create a palette of 10 colors that are less saturated than the default: .. plot:: :context: close-figs >>> sns.palplot(sns.hls_palette(10, s=.4)) """ hues = np.linspace(0, 1, n_colors + 1)[:-1] hues += h hues %= 1 hues -= hues.astype(int) palette = [colorsys.hls_to_rgb(h_i, l, s) for h_i in hues] return _ColorPalette(palette) def husl_palette(n_colors=6, h=.01, s=.9, l=.65): """Get a set of evenly spaced colors in HUSL hue space. h, s, and l should be between 0 and 1 Parameters ---------- n_colors : int number of colors in the palette h : float first hue s : float saturation l : float lightness Returns ------- palette : seaborn color palette List-like object of colors as RGB tuples. See Also -------- hls_palette : Make a palette using evently spaced circular hues in the HSL system. Examples -------- Create a palette of 10 colors with the default parameters: .. plot:: :context: close-figs >>> import seaborn as sns; sns.set() >>> sns.palplot(sns.husl_palette(10)) Create a palette of 10 colors that begins at a different hue value: .. plot:: :context: close-figs >>> sns.palplot(sns.husl_palette(10, h=.5)) Create a palette of 10 colors that are darker than the default: .. plot:: :context: close-figs >>> sns.palplot(sns.husl_palette(10, l=.4)) Create a palette of 10 colors that are less saturated than the default: .. plot:: :context: close-figs >>> sns.palplot(sns.husl_palette(10, s=.4)) """ hues = np.linspace(0, 1, n_colors + 1)[:-1] hues += h hues %= 1 hues *= 359 s *= 99 l *= 99 palette = [husl.husl_to_rgb(h_i, s, l) for h_i in hues] return _ColorPalette(palette) def mpl_palette(name, n_colors=6): """Return discrete colors from a matplotlib palette. Note that this handles the qualitative colorbrewer palettes properly, although if you ask for more colors than a particular qualitative palette can provide you will get fewer than you are expecting. In contrast, asking for qualitative color brewer palettes using :func:`color_palette` will return the expected number of colors, but they will cycle. If you are using the IPython notebook, you can also use the function :func:`choose_colorbrewer_palette` to interactively select palettes. Parameters ---------- name : string Name of the palette. This should be a named matplotlib colormap. n_colors : int Number of discrete colors in the palette. Returns ------- palette or cmap : seaborn color palette or matplotlib colormap List-like object of colors as RGB tuples, or colormap object that can map continuous values to colors, depending on the value of the ``as_cmap`` parameter. Examples -------- Create a qualitative colorbrewer palette with 8 colors: .. plot:: :context: close-figs >>> import seaborn as sns; sns.set() >>> sns.palplot(sns.mpl_palette("Set2", 8)) Create a sequential colorbrewer palette: .. plot:: :context: close-figs >>> sns.palplot(sns.mpl_palette("Blues")) Create a diverging palette: .. plot:: :context: close-figs >>> sns.palplot(sns.mpl_palette("seismic", 8)) Create a "dark" sequential palette: .. plot:: :context: close-figs >>> sns.palplot(sns.mpl_palette("GnBu_d")) """ brewer_qual_pals = {"Accent": 8, "Dark2": 8, "Paired": 12, "Pastel1": 9, "Pastel2": 8, "Set1": 9, "Set2": 8, "Set3": 12} if name.endswith("_d"): pal = ["#333333"] pal.extend(color_palette(name.replace("_d", "_r"), 2)) cmap = blend_palette(pal, n_colors, as_cmap=True) else: cmap = getattr(mpl.cm, name) if name in brewer_qual_pals: bins = np.linspace(0, 1, brewer_qual_pals[name])[:n_colors] else: bins = np.linspace(0, 1, n_colors + 2)[1:-1] palette = list(map(tuple, cmap(bins)[:, :3])) return _ColorPalette(palette) def _color_to_rgb(color, input): """Add some more flexibility to color choices.""" if input == "hls": color = colorsys.hls_to_rgb(*color) elif input == "husl": color = husl.husl_to_rgb(*color) elif input == "xkcd": color = xkcd_rgb[color] return color def dark_palette(color, n_colors=6, reverse=False, as_cmap=False, input="rgb"): """Make a sequential palette that blends from dark to ``color``. This kind of palette is good for data that range between relatively uninteresting low values and interesting high values. The ``color`` parameter can be specified in a number of ways, including all options for defining a color in matplotlib and several additional color spaces that are handled by seaborn. You can also use the database of named colors from the XKCD color survey. If you are using the IPython notebook, you can also choose this palette interactively with the :func:`choose_dark_palette` function. Parameters ---------- color : base color for high values hex, rgb-tuple, or html color name n_colors : int, optional number of colors in the palette reverse : bool, optional if True, reverse the direction of the blend as_cmap : bool, optional if True, return as a matplotlib colormap instead of list input : {'rgb', 'hls', 'husl', xkcd'} Color space to interpret the input color. The first three options apply to tuple inputs and the latter applies to string inputs. Returns ------- palette or cmap : seaborn color palette or matplotlib colormap List-like object of colors as RGB tuples, or colormap object that can map continuous values to colors, depending on the value of the ``as_cmap`` parameter. See Also -------- light_palette : Create a sequential palette with bright low values. diverging_palette : Create a diverging palette with two colors. Examples -------- Generate a palette from an HTML color: .. plot:: :context: close-figs >>> import seaborn as sns; sns.set() >>> sns.palplot(sns.dark_palette("purple")) Generate a palette that decreases in lightness: .. plot:: :context: close-figs >>> sns.palplot(sns.dark_palette("seagreen", reverse=True)) Generate a palette from an HUSL-space seed: .. plot:: :context: close-figs >>> sns.palplot(sns.dark_palette((260, 75, 60), input="husl")) Generate a colormap object: .. plot:: :context: close-figs >>> from numpy import arange >>> x = arange(25).reshape(5, 5) >>> cmap = sns.dark_palette("#2ecc71", as_cmap=True) >>> ax = sns.heatmap(x, cmap=cmap) """ color = _color_to_rgb(color, input) gray = "#222222" colors = [color, gray] if reverse else [gray, color] return blend_palette(colors, n_colors, as_cmap) def light_palette(color, n_colors=6, reverse=False, as_cmap=False, input="rgb"): """Make a sequential palette that blends from light to ``color``. This kind of palette is good for data that range between relatively uninteresting low values and interesting high values. The ``color`` parameter can be specified in a number of ways, including all options for defining a color in matplotlib and several additional color spaces that are handled by seaborn. You can also use the database of named colors from the XKCD color survey. If you are using the IPython notebook, you can also choose this palette interactively with the :func:`choose_light_palette` function. Parameters ---------- color : base color for high values hex code, html color name, or tuple in ``input`` space. n_colors : int, optional number of colors in the palette reverse : bool, optional if True, reverse the direction of the blend as_cmap : bool, optional if True, return as a matplotlib colormap instead of list input : {'rgb', 'hls', 'husl', xkcd'} Color space to interpret the input color. The first three options apply to tuple inputs and the latter applies to string inputs. Returns ------- palette or cmap : seaborn color palette or matplotlib colormap List-like object of colors as RGB tuples, or colormap object that can map continuous values to colors, depending on the value of the ``as_cmap`` parameter. See Also -------- dark_palette : Create a sequential palette with dark low values. diverging_palette : Create a diverging palette with two colors. Examples -------- Generate a palette from an HTML color: .. plot:: :context: close-figs >>> import seaborn as sns; sns.set() >>> sns.palplot(sns.light_palette("purple")) Generate a palette that increases in lightness: .. plot:: :context: close-figs >>> sns.palplot(sns.light_palette("seagreen", reverse=True)) Generate a palette from an HUSL-space seed: .. plot:: :context: close-figs >>> sns.palplot(sns.light_palette((260, 75, 60), input="husl")) Generate a colormap object: .. plot:: :context: close-figs >>> from numpy import arange >>> x = arange(25).reshape(5, 5) >>> cmap = sns.light_palette("#2ecc71", as_cmap=True) >>> ax = sns.heatmap(x, cmap=cmap) """ color = _color_to_rgb(color, input) light = set_hls_values(color, l=.95) colors = [color, light] if reverse else [light, color] return blend_palette(colors, n_colors, as_cmap) def _flat_palette(color, n_colors=6, reverse=False, as_cmap=False, input="rgb"): """Make a sequential palette that blends from gray to ``color``. Parameters ---------- color : matplotlib color hex, rgb-tuple, or html color name n_colors : int, optional number of colors in the palette reverse : bool, optional if True, reverse the direction of the blend as_cmap : bool, optional if True, return as a matplotlib colormap instead of list Returns ------- palette : list or colormap dark_palette : Create a sequential palette with dark low values. """ color = _color_to_rgb(color, input) flat = desaturate(color, 0) colors = [color, flat] if reverse else [flat, color] return blend_palette(colors, n_colors, as_cmap) def diverging_palette(h_neg, h_pos, s=75, l=50, sep=10, n=6, center="light", as_cmap=False): """Make a diverging palette between two HUSL colors. If you are using the IPython notebook, you can also choose this palette interactively with the :func:`choose_diverging_palette` function. Parameters ---------- h_neg, h_pos : float in [0, 359] Anchor hues for negative and positive extents of the map. s : float in [0, 100], optional Anchor saturation for both extents of the map. l : float in [0, 100], optional Anchor lightness for both extents of the map. n : int, optional Number of colors in the palette (if not returning a cmap) center : {"light", "dark"}, optional Whether the center of the palette is light or dark as_cmap : bool, optional If true, return a matplotlib colormap object rather than a list of colors. Returns ------- palette or cmap : seaborn color palette or matplotlib colormap List-like object of colors as RGB tuples, or colormap object that can map continuous values to colors, depending on the value of the ``as_cmap`` parameter. See Also -------- dark_palette : Create a sequential palette with dark values. light_palette : Create a sequential palette with light values. Examples -------- Generate a blue-white-red palette: .. plot:: :context: close-figs >>> import seaborn as sns; sns.set() >>> sns.palplot(sns.diverging_palette(240, 10, n=9)) Generate a brighter green-white-purple palette: .. plot:: :context: close-figs >>> sns.palplot(sns.diverging_palette(150, 275, s=80, l=55, n=9)) Generate a blue-black-red palette: .. plot:: :context: close-figs >>> sns.palplot(sns.diverging_palette(250, 15, s=75, l=40, ... n=9, center="dark")) Generate a colormap object: .. plot:: :context: close-figs >>> from numpy import arange >>> x = arange(25).reshape(5, 5) >>> cmap = sns.diverging_palette(220, 20, sep=20, as_cmap=True) >>> ax = sns.heatmap(x, cmap=cmap) """ palfunc = dark_palette if center == "dark" else light_palette neg = palfunc((h_neg, s, l), 128 - (sep / 2), reverse=True, input="husl") pos = palfunc((h_pos, s, l), 128 - (sep / 2), input="husl") midpoint = dict(light=[(.95, .95, .95, 1.)], dark=[(.133, .133, .133, 1.)])[center] mid = midpoint * sep pal = blend_palette(np.concatenate([neg, mid, pos]), n, as_cmap=as_cmap) return pal def blend_palette(colors, n_colors=6, as_cmap=False, input="rgb"): """Make a palette that blends between a list of colors. Parameters ---------- colors : sequence of colors in various formats interpreted by ``input`` hex code, html color name, or tuple in ``input`` space. n_colors : int, optional Number of colors in the palette. as_cmap : bool, optional If True, return as a matplotlib colormap instead of list. Returns ------- palette or cmap : seaborn color palette or matplotlib colormap List-like object of colors as RGB tuples, or colormap object that can map continuous values to colors, depending on the value of the ``as_cmap`` parameter. """ colors = [_color_to_rgb(color, input) for color in colors] name = "blend" pal = mpl.colors.LinearSegmentedColormap.from_list(name, colors) if not as_cmap: pal = _ColorPalette(pal(np.linspace(0, 1, n_colors))) return pal def xkcd_palette(colors): """Make a palette with color names from the xkcd color survey. See xkcd for the full list of colors: http://xkcd.com/color/rgb/ This is just a simple wrapper around the ``seaborn.xkcd_rgb`` dictionary. Parameters ---------- colors : list of strings List of keys in the ``seaborn.xkcd_rgb`` dictionary. Returns ------- palette : seaborn color palette Returns the list of colors as RGB tuples in an object that behaves like other seaborn color palettes. See Also -------- crayon_palette : Make a palette with Crayola crayon colors. """ palette = [xkcd_rgb[name] for name in colors] return color_palette(palette, len(palette)) def crayon_palette(colors): """Make a palette with color names from Crayola crayons. Colors are taken from here: http://en.wikipedia.org/wiki/List_of_Crayola_crayon_colors This is just a simple wrapper around the ``seaborn.crayons`` dictionary. Parameters ---------- colors : list of strings List of keys in the ``seaborn.crayons`` dictionary. Returns ------- palette : seaborn color palette Returns the list of colors as rgb tuples in an object that behaves like other seaborn color palettes. See Also -------- xkcd_palette : Make a palette with named colors from the XKCD color survey. """ palette = [crayons[name] for name in colors] return color_palette(palette, len(palette)) def cubehelix_palette(n_colors=6, start=0, rot=.4, gamma=1.0, hue=0.8, light=.85, dark=.15, reverse=False, as_cmap=False): """Make a sequential palette from the cubehelix system. This produces a colormap with linearly-decreasing (or increasing) brightness. That means that information will be preserved if printed to black and white or viewed by someone who is colorblind. "cubehelix" is also availible as a matplotlib-based palette, but this function gives the user more control over the look of the palette and has a different set of defaults. Parameters ---------- n_colors : int Number of colors in the palette. start : float, 0 <= start <= 3 The hue at the start of the helix. rot : float Rotations around the hue wheel over the range of the palette. gamma : float 0 <= gamma Gamma factor to emphasize darker (gamma < 1) or lighter (gamma > 1) colors. hue : float, 0 <= hue <= 1 Saturation of the colors. dark : float 0 <= dark <= 1 Intensity of the darkest color in the palette. light : float 0 <= light <= 1 Intensity of the lightest color in the palette. reverse : bool If True, the palette will go from dark to light. as_cmap : bool If True, return a matplotlib colormap instead of a list of colors. Returns ------- palette or cmap : seaborn color palette or matplotlib colormap List-like object of colors as RGB tuples, or colormap object that can map continuous values to colors, depending on the value of the ``as_cmap`` parameter. See Also -------- choose_cubehelix_palette : Launch an interactive widget to select cubehelix palette parameters. dark_palette : Create a sequential palette with dark low values. light_palette : Create a sequential palette with bright low values. References ---------- Green, D. A. (2011). "A colour scheme for the display of astronomical intensity images". Bulletin of the Astromical Society of India, Vol. 39, p. 289-295. Examples -------- Generate the default palette: .. plot:: :context: close-figs >>> import seaborn as sns; sns.set() >>> sns.palplot(sns.cubehelix_palette()) Rotate backwards from the same starting location: .. plot:: :context: close-figs >>> sns.palplot(sns.cubehelix_palette(rot=-.4)) Use a different starting point and shorter rotation: .. plot:: :context: close-figs >>> sns.palplot(sns.cubehelix_palette(start=2.8, rot=.1)) Reverse the direction of the lightness ramp: .. plot:: :context: close-figs >>> sns.palplot(sns.cubehelix_palette(reverse=True)) Generate a colormap object: .. plot:: :context: close-figs >>> from numpy import arange >>> x = arange(25).reshape(5, 5) >>> cmap = sns.cubehelix_palette(as_cmap=True) >>> ax = sns.heatmap(x, cmap=cmap) Use the full lightness range: .. plot:: :context: close-figs >>> cmap = sns.cubehelix_palette(dark=0, light=1, as_cmap=True) >>> ax = sns.heatmap(x, cmap=cmap) """ cdict = mpl._cm.cubehelix(gamma, start, rot, hue) cmap = mpl.colors.LinearSegmentedColormap("cubehelix", cdict) x = np.linspace(light, dark, n_colors) pal = cmap(x)[:, :3].tolist() if reverse: pal = pal[::-1] if as_cmap: x_256 = np.linspace(light, dark, 256) if reverse: x_256 = x_256[::-1] pal_256 = cmap(x_256) cmap = mpl.colors.ListedColormap(pal_256) return cmap else: return _ColorPalette(pal) def set_color_codes(palette="deep"): """Change how matplotlib color shorthands are interpreted. Calling this will change how shorthand codes like "b" or "g" are interpreted by matplotlib in subsequent plots. Parameters ---------- palette : {deep, muted, pastel, dark, bright, colorblind} Named seaborn palette to use as the source of colors. See Also -------- set : Color codes can be set through the high-level seaborn style manager. set_palette : Color codes can also be set through the function that sets the matplotlib color cycle. Examples -------- Map matplotlib color codes to the default seaborn palette. .. plot:: :context: close-figs >>> import matplotlib.pyplot as plt >>> import seaborn as sns; sns.set() >>> sns.set_color_codes() >>> _ = plt.plot([0, 1], color="r") Use a different seaborn palette. .. plot:: :context: close-figs >>> sns.set_color_codes("dark") >>> _ = plt.plot([0, 1], color="g") >>> _ = plt.plot([0, 2], color="m") """ if palette == "reset": colors = [(0., 0., 1.), (0., .5, 0.), (1., 0., 0.), (.75, .75, 0.), (.75, .75, 0.), (0., .75, .75), (0., 0., 0.)] else: colors = SEABORN_PALETTES[palette] + [(.1, .1, .1)] for code, color in zip("bgrmyck", colors): rgb = mpl.colors.colorConverter.to_rgb(color) mpl.colors.colorConverter.colors[code] = rgb mpl.colors.colorConverter.cache[code] = rgb def _init_mutable_colormap(): """Create a matplotlib colormap that will be updated by the widgets.""" greys = color_palette("Greys", 256) cmap = LinearSegmentedColormap.from_list("interactive", greys) cmap._init() cmap._set_extremes() return cmap def _update_lut(cmap, colors): """Change the LUT values in a matplotlib colormap in-place.""" cmap._lut[:256] = colors cmap._set_extremes() def _show_cmap(cmap): """Show a continuous matplotlib colormap.""" from .rcmod import axes_style # Avoid circular import with axes_style("white"): f, ax = plt.subplots(figsize=(8.25, .75)) ax.set(xticks=[], yticks=[]) x = np.linspace(0, 1, 256)[np.newaxis, :] ax.pcolormesh(x, cmap=cmap) def choose_colorbrewer_palette(data_type, as_cmap=False): """Select a palette from the ColorBrewer set. These palettes are built into matplotlib and can be used by name in many seaborn functions, or by passing the object returned by this function. Parameters ---------- data_type : {'sequential', 'diverging', 'qualitative'} This describes the kind of data you want to visualize. See the seaborn color palette docs for more information about how to choose this value. Note that you can pass substrings (e.g. 'q' for 'qualitative. as_cmap : bool If True, the return value is a matplotlib colormap rather than a list of discrete colors. Returns ------- pal or cmap : list of colors or matplotlib colormap Object that can be passed to plotting functions. See Also -------- dark_palette : Create a sequential palette with dark low values. light_palette : Create a sequential palette with bright low values. diverging_palette : Create a diverging palette from selected colors. cubehelix_palette : Create a sequential palette or colormap using the cubehelix system. """ from IPython.html.widgets import interact, FloatSliderWidget if data_type.startswith("q") and as_cmap: raise ValueError("Qualitative palettes cannot be colormaps.") pal = [] if as_cmap: cmap = _init_mutable_colormap() if data_type.startswith("s"): opts = ["Greys", "Reds", "Greens", "Blues", "Oranges", "Purples", "BuGn", "BuPu", "GnBu", "OrRd", "PuBu", "PuRd", "RdPu", "YlGn", "PuBuGn", "YlGnBu", "YlOrBr", "YlOrRd"] variants = ["regular", "reverse", "dark"] @interact def choose_sequential(name=opts, n=(2, 18), desat=FloatSliderWidget(min=0, max=1, value=1), variant=variants): if variant == "reverse": name += "_r" elif variant == "dark": name += "_d" if as_cmap: colors = color_palette(name, 256, desat) _update_lut(cmap, np.c_[colors, np.ones(256)]) _show_cmap(cmap) else: pal[:] = color_palette(name, n, desat) palplot(pal) elif data_type.startswith("d"): opts = ["RdBu", "RdGy", "PRGn", "PiYG", "BrBG", "RdYlBu", "RdYlGn", "Spectral"] variants = ["regular", "reverse"] @interact def choose_diverging(name=opts, n=(2, 16), desat=FloatSliderWidget(min=0, max=1, value=1), variant=variants): if variant == "reverse": name += "_r" if as_cmap: colors = color_palette(name, 256, desat) _update_lut(cmap, np.c_[colors, np.ones(256)]) _show_cmap(cmap) else: pal[:] = color_palette(name, n, desat) palplot(pal) elif data_type.startswith("q"): opts = ["Set1", "Set2", "Set3", "Paired", "Accent", "Pastel1", "Pastel2", "Dark2"] @interact def choose_qualitative(name=opts, n=(2, 16), desat=FloatSliderWidget(min=0, max=1, value=1)): pal[:] = color_palette(name, n, desat) palplot(pal) if as_cmap: return cmap return pal def choose_dark_palette(input="husl", as_cmap=False): """Launch an interactive widget to create a dark sequential palette. This corresponds with the :func:`dark_palette` function. This kind of palette is good for data that range between relatively uninteresting low values and interesting high values. Requires IPython 2+ and must be used in the notebook. Parameters ---------- input : {'husl', 'hls', 'rgb'} Color space for defining the seed value. Note that the default is different than the default input for :func:`dark_palette`. as_cmap : bool If True, the return value is a matplotlib colormap rather than a list of discrete colors. Returns ------- pal or cmap : list of colors or matplotlib colormap Object that can be passed to plotting functions. See Also -------- dark_palette : Create a sequential palette with dark low values. light_palette : Create a sequential palette with bright low values. cubehelix_palette : Create a sequential palette or colormap using the cubehelix system. """ from IPython.html.widgets import interact pal = [] if as_cmap: cmap = _init_mutable_colormap() if input == "rgb": @interact def choose_dark_palette_rgb(r=(0., 1.), g=(0., 1.), b=(0., 1.), n=(3, 17)): color = r, g, b if as_cmap: colors = dark_palette(color, 256, input="rgb") _update_lut(cmap, colors) _show_cmap(cmap) else: pal[:] = dark_palette(color, n, input="rgb") palplot(pal) elif input == "hls": @interact def choose_dark_palette_hls(h=(0., 1.), l=(0., 1.), s=(0., 1.), n=(3, 17)): color = h, l, s if as_cmap: colors = dark_palette(color, 256, input="hls") _update_lut(cmap, colors) _show_cmap(cmap) else: pal[:] = dark_palette(color, n, input="hls") palplot(pal) elif input == "husl": @interact def choose_dark_palette_husl(h=(0, 359), s=(0, 99), l=(0, 99), n=(3, 17)): color = h, s, l if as_cmap: colors = dark_palette(color, 256, input="husl") _update_lut(cmap, colors) _show_cmap(cmap) else: pal[:] = dark_palette(color, n, input="husl") palplot(pal) if as_cmap: return cmap return pal def choose_light_palette(input="husl", as_cmap=False): """Launch an interactive widget to create a light sequential palette. This corresponds with the :func:`light_palette` function. This kind of palette is good for data that range between relatively uninteresting low values and interesting high values. Requires IPython 2+ and must be used in the notebook. Parameters ---------- input : {'husl', 'hls', 'rgb'} Color space for defining the seed value. Note that the default is different than the default input for :func:`light_palette`. as_cmap : bool If True, the return value is a matplotlib colormap rather than a list of discrete colors. Returns ------- pal or cmap : list of colors or matplotlib colormap Object that can be passed to plotting functions. See Also -------- light_palette : Create a sequential palette with bright low values. dark_palette : Create a sequential palette with dark low values. cubehelix_palette : Create a sequential palette or colormap using the cubehelix system. """ from IPython.html.widgets import interact pal = [] if as_cmap: cmap = _init_mutable_colormap() if input == "rgb": @interact def choose_light_palette_rgb(r=(0., 1.), g=(0., 1.), b=(0., 1.), n=(3, 17)): color = r, g, b if as_cmap: colors = light_palette(color, 256, input="rgb") _update_lut(cmap, colors) _show_cmap(cmap) else: pal[:] = light_palette(color, n, input="rgb") palplot(pal) elif input == "hls": @interact def choose_light_palette_hls(h=(0., 1.), l=(0., 1.), s=(0., 1.), n=(3, 17)): color = h, l, s if as_cmap: colors = light_palette(color, 256, input="hls") _update_lut(cmap, colors) _show_cmap(cmap) else: pal[:] = light_palette(color, n, input="hls") palplot(pal) elif input == "husl": @interact def choose_light_palette_husl(h=(0, 359), s=(0, 99), l=(0, 99), n=(3, 17)): color = h, s, l if as_cmap: colors = light_palette(color, 256, input="husl") _update_lut(cmap, colors) _show_cmap(cmap) else: pal[:] = light_palette(color, n, input="husl") palplot(pal) if as_cmap: return cmap return pal def choose_diverging_palette(as_cmap=False): """Launch an interactive widget to choose a diverging color palette. This corresponds with the :func:`diverging_palette` function. This kind of palette is good for data that range between interesting low values and interesting high values with a meaningful midpoint. (For example, change scores relative to some baseline value). Requires IPython 2+ and must be used in the notebook. Parameters ---------- as_cmap : bool If True, the return value is a matplotlib colormap rather than a list of discrete colors. Returns ------- pal or cmap : list of colors or matplotlib colormap Object that can be passed to plotting functions. See Also -------- diverging_palette : Create a diverging color palette or colormap. choose_colorbrewer_palette : Interactively choose palettes from the colorbrewer set, including diverging palettes. """ from IPython.html.widgets import interact, IntSliderWidget pal = [] if as_cmap: cmap = _init_mutable_colormap() @interact def choose_diverging_palette(h_neg=IntSliderWidget(min=0, max=359, value=220), h_pos=IntSliderWidget(min=0, max=359, value=10), s=IntSliderWidget(min=0, max=99, value=74), l=IntSliderWidget(min=0, max=99, value=50), sep=IntSliderWidget(min=1, max=50, value=10), n=(2, 16), center=["light", "dark"]): if as_cmap: colors = diverging_palette(h_neg, h_pos, s, l, sep, 256, center) _update_lut(cmap, colors) _show_cmap(cmap) else: pal[:] = diverging_palette(h_neg, h_pos, s, l, sep, n, center) palplot(pal) if as_cmap: return cmap return pal def choose_cubehelix_palette(as_cmap=False): """Launch an interactive widget to create a sequential cubehelix palette. This corresponds with the :func:`cubehelix_palette` function. This kind of palette is good for data that range between relatively uninteresting low values and interesting high values. The cubehelix system allows the palette to have more hue variance across the range, which can be helpful for distinguishing a wider range of values. Requires IPython 2+ and must be used in the notebook. Parameters ---------- as_cmap : bool If True, the return value is a matplotlib colormap rather than a list of discrete colors. Returns ------- pal or cmap : list of colors or matplotlib colormap Object that can be passed to plotting functions. See Also -------- cubehelix_palette : Create a sequential palette or colormap using the cubehelix system. """ from IPython.html.widgets import (interact, FloatSliderWidget, IntSliderWidget) pal = [] if as_cmap: cmap = _init_mutable_colormap() @interact def choose_cubehelix(n_colors=IntSliderWidget(min=2, max=16, value=9), start=FloatSliderWidget(min=0, max=3, value=0), rot=FloatSliderWidget(min=-1, max=1, value=.4), gamma=FloatSliderWidget(min=0, max=5, value=1), hue=FloatSliderWidget(min=0, max=1, value=.8), light=FloatSliderWidget(min=0, max=1, value=.85), dark=FloatSliderWidget(min=0, max=1, value=.15), reverse=False): if as_cmap: colors = cubehelix_palette(256, start, rot, gamma, hue, light, dark, reverse) _update_lut(cmap, np.c_[colors, np.ones(256)]) _show_cmap(cmap) else: pal[:] = cubehelix_palette(n_colors, start, rot, gamma, hue, light, dark, reverse) palplot(pal) if as_cmap: return cmap return pal seaborn-0.6.0/seaborn/rcmod.py000066400000000000000000000365141254425553400162770ustar00rootroot00000000000000"""Functions that alter the matplotlib rc dictionary on the fly.""" import numpy as np import matplotlib as mpl from . import palettes _style_keys = ( "axes.facecolor", "axes.edgecolor", "axes.grid", "axes.axisbelow", "axes.linewidth", "axes.labelcolor", "figure.facecolor", "grid.color", "grid.linestyle", "text.color", "xtick.color", "ytick.color", "xtick.direction", "ytick.direction", "xtick.major.size", "ytick.major.size", "xtick.minor.size", "ytick.minor.size", "legend.frameon", "legend.numpoints", "legend.scatterpoints", "lines.solid_capstyle", "image.cmap", "font.family", "font.sans-serif", ) _context_keys = ( "figure.figsize", "axes.labelsize", "axes.titlesize", "xtick.labelsize", "ytick.labelsize", "legend.fontsize", "grid.linewidth", "lines.linewidth", "patch.linewidth", "lines.markersize", "lines.markeredgewidth", "xtick.major.width", "ytick.major.width", "xtick.minor.width", "ytick.minor.width", "xtick.major.pad", "ytick.major.pad" ) def set(context="notebook", style="darkgrid", palette="deep", font="sans-serif", font_scale=1, color_codes=False, rc=None): """Set aesthetic parameters in one step. Each set of parameters can be set directly or temporarily, see the referenced functions below for more information. Parameters ---------- context : string or dict Plotting context parameters, see :func:`plotting_context` style : string or dict Axes style parameters, see :func:`axes_style` palette : string or sequence Color palette, see :func:`color_palette` font : string Font family, see matplotlib font manager. font_scale : float, optional Separate scaling factor to independently scale the size of the font elements. color_codes : bool If ``True`` and ``palette`` is a seaborn palette, remap the shorthand color codes (e.g. "b", "g", "r", etc.) to the colors from this palette. rc : dict or None Dictionary of rc parameter mappings to override the above. """ set_context(context, font_scale) set_style(style, rc={"font.family": font}) set_palette(palette, color_codes=color_codes) if rc is not None: mpl.rcParams.update(rc) def reset_defaults(): """Restore all RC params to default settings.""" mpl.rcParams.update(mpl.rcParamsDefault) def reset_orig(): """Restore all RC params to original settings (respects custom rc).""" mpl.rcParams.update(mpl.rcParamsOrig) class _AxesStyle(dict): """Light wrapper on a dict to set style temporarily.""" def __enter__(self): """Open the context.""" rc = mpl.rcParams self._orig_style = {k: rc[k] for k in _style_keys} set_style(self) return self def __exit__(self, *args): """Close the context.""" set_style(self._orig_style) class _PlottingContext(dict): """Light wrapper on a dict to set context temporarily.""" def __enter__(self): """Open the context.""" rc = mpl.rcParams self._orig_context = {k: rc[k] for k in _context_keys} set_context(self) return self def __exit__(self, *args): """Close the context.""" set_context(self._orig_context) def axes_style(style=None, rc=None): """Return a parameter dict for the aesthetic style of the plots. This affects things like the color of the axes, whether a grid is enabled by default, and other aesthetic elements. This function returns an object that can be used in a ``with`` statement to temporarily change the style parameters. Parameters ---------- style : dict, None, or one of {darkgrid, whitegrid, dark, white, ticks} A dictionary of parameters or the name of a preconfigured set. rc : dict, optional Parameter mappings to override the values in the preset seaborn style dictionaries. This only updates parameters that are considered part of the style definition. Examples -------- >>> st = axes_style("whitegrid") >>> set_style("ticks", {"xtick.major.size": 8, "ytick.major.size": 8}) >>> import matplotlib.pyplot as plt >>> with axes_style("white"): ... f, ax = plt.subplots() ... ax.plot(x, y) # doctest: +SKIP See Also -------- set_style : set the matplotlib parameters for a seaborn theme plotting_context : return a parameter dict to to scale plot elements color_palette : define the color palette for a plot """ if style is None: style_dict = {k: mpl.rcParams[k] for k in _style_keys} elif isinstance(style, dict): style_dict = style else: styles = ["white", "dark", "whitegrid", "darkgrid", "ticks"] if style not in styles: raise ValueError("style must be one of %s" % ", ".join(styles)) # Define colors here dark_gray = ".15" light_gray = ".8" # Common parameters style_dict = { "figure.facecolor": "white", "text.color": dark_gray, "axes.labelcolor": dark_gray, "legend.frameon": False, "legend.numpoints": 1, "legend.scatterpoints": 1, "xtick.direction": "out", "ytick.direction": "out", "xtick.color": dark_gray, "ytick.color": dark_gray, "axes.axisbelow": True, "image.cmap": "Greys", "font.family": ["sans-serif"], "font.sans-serif": ["Arial", "Liberation Sans", "Bitstream Vera Sans", "sans-serif"], "grid.linestyle": "-", "lines.solid_capstyle": "round", } # Set grid on or off if "grid" in style: style_dict.update({ "axes.grid": True, }) else: style_dict.update({ "axes.grid": False, }) # Set the color of the background, spines, and grids if style.startswith("dark"): style_dict.update({ "axes.facecolor": "#EAEAF2", "axes.edgecolor": "white", "axes.linewidth": 0, "grid.color": "white", }) elif style == "whitegrid": style_dict.update({ "axes.facecolor": "white", "axes.edgecolor": light_gray, "axes.linewidth": 1, "grid.color": light_gray, }) elif style in ["white", "ticks"]: style_dict.update({ "axes.facecolor": "white", "axes.edgecolor": dark_gray, "axes.linewidth": 1.25, "grid.color": light_gray, }) # Show or hide the axes ticks if style == "ticks": style_dict.update({ "xtick.major.size": 6, "ytick.major.size": 6, "xtick.minor.size": 3, "ytick.minor.size": 3, }) else: style_dict.update({ "xtick.major.size": 0, "ytick.major.size": 0, "xtick.minor.size": 0, "ytick.minor.size": 0, }) # Override these settings with the provided rc dictionary if rc is not None: rc = {k: v for k, v in rc.items() if k in _style_keys} style_dict.update(rc) # Wrap in an _AxesStyle object so this can be used in a with statement style_object = _AxesStyle(style_dict) return style_object def set_style(style=None, rc=None): """Set the aesthetic style of the plots. This affects things like the color of the axes, whether a grid is enabled by default, and other aesthetic elements. Parameters ---------- style : dict, None, or one of {darkgrid, whitegrid, dark, white, ticks} A dictionary of parameters or the name of a preconfigured set. rc : dict, optional Parameter mappings to override the values in the preset seaborn style dictionaries. This only updates parameters that are considered part of the style definition. Examples -------- >>> set_style("whitegrid") >>> set_style("ticks", {"xtick.major.size": 8, "ytick.major.size": 8}) See Also -------- axes_style : return a dict of parameters or use in a ``with`` statement to temporarily set the style. set_context : set parameters to scale plot elements set_palette : set the default color palette for figures """ style_object = axes_style(style, rc) mpl.rcParams.update(style_object) def plotting_context(context=None, font_scale=1, rc=None): """Return a parameter dict to scale elements of the figure. This affects things like the size of the labels, lines, and other elements of the plot, but not the overall style. The base context is "notebook", and the other contexts are "paper", "talk", and "poster", which are version of the notebook parameters scaled by .8, 1.3, and 1.6, respectively. This function returns an object that can be used in a ``with`` statement to temporarily change the context parameters. Parameters ---------- context : dict, None, or one of {paper, notebook, talk, poster} A dictionary of parameters or the name of a preconfigured set. font_scale : float, optional Separate scaling factor to independently scale the size of the font elements. rc : dict, optional Parameter mappings to override the values in the preset seaborn context dictionaries. This only updates parameters that are considered part of the context definition. Examples -------- >>> c = plotting_context("poster") >>> c = plotting_context("notebook", font_scale=1.5) >>> c = plotting_context("talk", rc={"lines.linewidth": 2}) >>> import matplotlib.pyplot as plt >>> with plotting_context("paper"): ... f, ax = plt.subplots() ... ax.plot(x, y) # doctest: +SKIP See Also -------- set_context : set the matplotlib parameters to scale plot elements axes_style : return a dict of parameters defining a figure style color_palette : define the color palette for a plot """ if context is None: context_dict = {k: mpl.rcParams[k] for k in _context_keys} elif isinstance(context, dict): context_dict = context else: contexts = ["paper", "notebook", "talk", "poster"] if context not in contexts: raise ValueError("context must be in %s" % ", ".join(contexts)) # Set up dictionary of default parameters base_context = { "figure.figsize": np.array([8, 5.5]), "axes.labelsize": 11, "axes.titlesize": 12, "xtick.labelsize": 10, "ytick.labelsize": 10, "legend.fontsize": 10, "grid.linewidth": 1, "lines.linewidth": 1.75, "patch.linewidth": .3, "lines.markersize": 7, "lines.markeredgewidth": 0, "xtick.major.width": 1, "ytick.major.width": 1, "xtick.minor.width": .5, "ytick.minor.width": .5, "xtick.major.pad": 7, "ytick.major.pad": 7, } # Scale all the parameters by the same factor depending on the context scaling = dict(paper=.8, notebook=1, talk=1.3, poster=1.6)[context] context_dict = {k: v * scaling for k, v in base_context.items()} # Now independently scale the fonts font_keys = ["axes.labelsize", "axes.titlesize", "legend.fontsize", "xtick.labelsize", "ytick.labelsize"] font_dict = {k: context_dict[k] * font_scale for k in font_keys} context_dict.update(font_dict) # Implement hack workaround for matplotlib bug # See https://github.com/mwaskom/seaborn/issues/344 # There is a bug in matplotlib 1.4.2 that makes points invisible when # they don't have an edgewidth. It will supposedly be fixed in 1.4.3. if mpl.__version__ == "1.4.2": context_dict["lines.markeredgewidth"] = 0.01 # Override these settings with the provided rc dictionary if rc is not None: rc = {k: v for k, v in rc.items() if k in _context_keys} context_dict.update(rc) # Wrap in a _PlottingContext object so this can be used in a with statement context_object = _PlottingContext(context_dict) return context_object def set_context(context=None, font_scale=1, rc=None): """Set the plotting context parameters. This affects things like the size of the labels, lines, and other elements of the plot, but not the overall style. The base context is "notebook", and the other contexts are "paper", "talk", and "poster", which are version of the notebook parameters scaled by .8, 1.3, and 1.6, respectively. Parameters ---------- context : dict, None, or one of {paper, notebook, talk, poster} A dictionary of parameters or the name of a preconfigured set. font_scale : float, optional Separate scaling factor to independently scale the size of the font elements. rc : dict, optional Parameter mappings to override the values in the preset seaborn context dictionaries. This only updates parameters that are considered part of the context definition. Examples -------- >>> set_context("paper") >>> set_context("talk", font_scale=1.4) >>> set_context("talk", rc={"lines.linewidth": 2}) See Also -------- plotting_context : return a dictionary of rc parameters, or use in a ``with`` statement to temporarily set the context. set_style : set the default parameters for figure style set_palette : set the default color palette for figures """ context_object = plotting_context(context, font_scale, rc) mpl.rcParams.update(context_object) def set_palette(palette, n_colors=None, desat=None, color_codes=False): """Set the matplotlib color cycle using a seaborn palette. Parameters ---------- palette : hls | husl | matplotlib colormap | seaborn color palette Palette definition. Should be something that :func:`color_palette` can process. n_colors : int Number of colors in the cycle. The default number of colors will depend on the format of ``palette``, see the :func:`color_palette` documentation for more information. desat : float Proportion to desaturate each color by. color_codes : bool If ``True`` and ``palette`` is a seaborn palette, remap the shorthand color codes (e.g. "b", "g", "r", etc.) to the colors from this palette. Examples -------- >>> set_palette("Reds") >>> set_palette("Set1", 8, .75) See Also -------- color_palette : build a color palette or set the color cycle temporarily in a ``with`` statement. set_context : set parameters to scale plot elements set_style : set the default parameters for figure style """ colors = palettes.color_palette(palette, n_colors, desat) mpl.rcParams["axes.color_cycle"] = list(colors) mpl.rcParams["patch.facecolor"] = colors[0] if color_codes: palettes.set_color_codes(palette) seaborn-0.6.0/seaborn/tests/000077500000000000000000000000001254425553400157525ustar00rootroot00000000000000seaborn-0.6.0/seaborn/tests/__init__.py000066400000000000000000000000001254425553400200510ustar00rootroot00000000000000seaborn-0.6.0/seaborn/tests/test_algorithms.py000066400000000000000000000147171254425553400215460ustar00rootroot00000000000000import numpy as np from scipy import stats from ..external.six.moves import range import numpy.testing as npt from numpy.testing import assert_array_equal import nose.tools from nose.tools import assert_equal, raises from .. import algorithms as algo rs = np.random.RandomState(sum(map(ord, "test_algorithms"))) a_norm = rs.randn(100) def test_bootstrap(): """Test that bootstrapping gives the right answer in dumb cases.""" a_ones = np.ones(10) n_boot = 5 out1 = algo.bootstrap(a_ones, n_boot=n_boot) assert_array_equal(out1, np.ones(n_boot)) out2 = algo.bootstrap(a_ones, n_boot=n_boot, func=np.median) assert_array_equal(out2, np.ones(n_boot)) def test_bootstrap_length(): """Test that we get a bootstrap array of the right shape.""" out = algo.bootstrap(a_norm) assert_equal(len(out), 10000) n_boot = 100 out = algo.bootstrap(a_norm, n_boot=n_boot) assert_equal(len(out), n_boot) def test_bootstrap_range(): """Test that boostrapping a random array stays within the right range.""" min, max = a_norm.min(), a_norm.max() out = algo.bootstrap(a_norm) nose.tools.assert_less(min, out.min()) nose.tools.assert_greater_equal(max, out.max()) def test_bootstrap_multiarg(): """Test that bootstrap works with multiple input arrays.""" x = np.vstack([[1, 10] for i in range(10)]) y = np.vstack([[5, 5] for i in range(10)]) test_func = lambda x, y: np.vstack((x, y)).max(axis=0) out_actual = algo.bootstrap(x, y, n_boot=2, func=test_func) out_wanted = np.array([[5, 10], [5, 10]]) assert_array_equal(out_actual, out_wanted) def test_bootstrap_axis(): """Test axis kwarg to bootstrap function.""" x = rs.randn(10, 20) n_boot = 100 out_default = algo.bootstrap(x, n_boot=n_boot) assert_equal(out_default.shape, (n_boot,)) out_axis = algo.bootstrap(x, n_boot=n_boot, axis=0) assert_equal(out_axis.shape, (n_boot, 20)) def test_bootstrap_random_seed(): """Test that we can get reproducible resamples by seeding the RNG.""" data = rs.randn(50) seed = 42 boots1 = algo.bootstrap(data, random_seed=seed) boots2 = algo.bootstrap(data, random_seed=seed) assert_array_equal(boots1, boots2) def test_smooth_bootstrap(): """Test smooth bootstrap.""" x = rs.randn(15) n_boot = 100 out_smooth = algo.bootstrap(x, n_boot=n_boot, smooth=True, func=np.median) assert(not np.median(out_smooth) in x) def test_bootstrap_ols(): """Test bootstrap of OLS model fit.""" ols_fit = lambda X, y: np.dot(np.dot(np.linalg.inv( np.dot(X.T, X)), X.T), y) X = np.column_stack((rs.randn(50, 4), np.ones(50))) w = [2, 4, 0, 3, 5] y_noisy = np.dot(X, w) + rs.randn(50) * 20 y_lownoise = np.dot(X, w) + rs.randn(50) n_boot = 500 w_boot_noisy = algo.bootstrap(X, y_noisy, n_boot=n_boot, func=ols_fit) w_boot_lownoise = algo.bootstrap(X, y_lownoise, n_boot=n_boot, func=ols_fit) assert_equal(w_boot_noisy.shape, (n_boot, 5)) assert_equal(w_boot_lownoise.shape, (n_boot, 5)) nose.tools.assert_greater(w_boot_noisy.std(), w_boot_lownoise.std()) def test_bootstrap_units(): """Test that results make sense when passing unit IDs to bootstrap.""" data = rs.randn(50) ids = np.repeat(range(10), 5) bwerr = rs.normal(0, 2, 10) bwerr = bwerr[ids] data_rm = data + bwerr seed = 77 boots_orig = algo.bootstrap(data_rm, random_seed=seed) boots_rm = algo.bootstrap(data_rm, units=ids, random_seed=seed) nose.tools.assert_greater(boots_rm.std(), boots_orig.std()) @raises(ValueError) def test_bootstrap_arglength(): """Test that different length args raise ValueError.""" algo.bootstrap(np.arange(5), np.arange(10)) @raises(TypeError) def test_bootstrap_noncallable(): """Test that we get a TypeError with noncallable algo.unc.""" non_func = "mean" algo.bootstrap(a_norm, 100, non_func) def test_randomize_corrmat(): """Test the correctness of the correlation matrix p values.""" a = rs.randn(30) b = a + rs.rand(30) * 3 c = rs.randn(30) d = [a, b, c] p_mat, dist = algo.randomize_corrmat(d, tail="upper", corrected=False, return_dist=True) nose.tools.assert_greater(p_mat[2, 0], p_mat[1, 0]) corrmat = np.corrcoef(d) pctile = 100 - stats.percentileofscore(dist[2, 1], corrmat[2, 1]) nose.tools.assert_almost_equal(p_mat[2, 1] * 100, pctile) d[1] = -a + rs.rand(30) p_mat = algo.randomize_corrmat(d) nose.tools.assert_greater(0.05, p_mat[1, 0]) def test_randomize_corrmat_dist(): """Test that the distribution looks right.""" a = rs.randn(3, 20) for n_i in [5, 10]: p_mat, dist = algo.randomize_corrmat(a, n_iter=n_i, return_dist=True) assert_equal(n_i, dist.shape[-1]) p_mat, dist = algo.randomize_corrmat(a, n_iter=10000, return_dist=True) diag_mean = dist[0, 0].mean() assert_equal(diag_mean, 1) off_diag_mean = dist[0, 1].mean() nose.tools.assert_greater(0.05, off_diag_mean) def test_randomize_corrmat_correction(): """Test that FWE correction works.""" a = rs.randn(3, 20) p_mat = algo.randomize_corrmat(a, "upper", False) p_mat_corr = algo.randomize_corrmat(a, "upper", True) triu = np.triu_indices(3, 1) npt.assert_array_less(p_mat[triu], p_mat_corr[triu]) def test_randimoize_corrmat_tails(): """Test that the tail argument works.""" a = rs.randn(30) b = a + rs.rand(30) * 8 c = rs.randn(30) d = [a, b, c] p_mat_b = algo.randomize_corrmat(d, "both", False, random_seed=0) p_mat_u = algo.randomize_corrmat(d, "upper", False, random_seed=0) p_mat_l = algo.randomize_corrmat(d, "lower", False, random_seed=0) assert_equal(p_mat_b[0, 1], p_mat_u[0, 1] * 2) assert_equal(p_mat_l[0, 1], 1 - p_mat_u[0, 1]) def test_randomise_corrmat_seed(): """Test that we can seed the corrmat randomization.""" a = rs.randn(3, 20) _, dist1 = algo.randomize_corrmat(a, random_seed=0, return_dist=True) _, dist2 = algo.randomize_corrmat(a, random_seed=0, return_dist=True) assert_array_equal(dist1, dist2) @raises(ValueError) def test_randomize_corrmat_tail_error(): """Test that we are strict about tail paramete.""" a = rs.randn(3, 30) algo.randomize_corrmat(a, "hello") seaborn-0.6.0/seaborn/tests/test_axisgrid.py000066400000000000000000001234051254425553400212020ustar00rootroot00000000000000import warnings import numpy as np import pandas as pd from scipy import stats import matplotlib as mpl import matplotlib.pyplot as plt from distutils.version import LooseVersion import nose.tools as nt import numpy.testing as npt from numpy.testing.decorators import skipif import pandas.util.testing as tm from .. import axisgrid as ag from .. import rcmod from ..palettes import color_palette from ..distributions import kdeplot from ..categorical import pointplot from ..linearmodels import pairplot rs = np.random.RandomState(0) old_matplotlib = LooseVersion(mpl.__version__) < "1.4" class TestFacetGrid(object): df = pd.DataFrame(dict(x=rs.normal(size=60), y=rs.gamma(4, size=60), a=np.repeat(list("abc"), 20), b=np.tile(list("mn"), 30), c=np.tile(list("tuv"), 20), d=np.tile(list("abcdefghij"), 6))) def test_self_data(self): g = ag.FacetGrid(self.df) nt.assert_is(g.data, self.df) plt.close("all") def test_self_fig(self): g = ag.FacetGrid(self.df) nt.assert_is_instance(g.fig, plt.Figure) plt.close("all") def test_self_axes(self): g = ag.FacetGrid(self.df, row="a", col="b", hue="c") for ax in g.axes.flat: nt.assert_is_instance(ax, plt.Axes) plt.close("all") def test_axes_array_size(self): g1 = ag.FacetGrid(self.df) nt.assert_equal(g1.axes.shape, (1, 1)) g2 = ag.FacetGrid(self.df, row="a") nt.assert_equal(g2.axes.shape, (3, 1)) g3 = ag.FacetGrid(self.df, col="b") nt.assert_equal(g3.axes.shape, (1, 2)) g4 = ag.FacetGrid(self.df, hue="c") nt.assert_equal(g4.axes.shape, (1, 1)) g5 = ag.FacetGrid(self.df, row="a", col="b", hue="c") nt.assert_equal(g5.axes.shape, (3, 2)) for ax in g5.axes.flat: nt.assert_is_instance(ax, plt.Axes) plt.close("all") def test_single_axes(self): g1 = ag.FacetGrid(self.df) nt.assert_is_instance(g1.ax, plt.Axes) g2 = ag.FacetGrid(self.df, row="a") with nt.assert_raises(AttributeError): g2.ax g3 = ag.FacetGrid(self.df, col="a") with nt.assert_raises(AttributeError): g3.ax g4 = ag.FacetGrid(self.df, col="a", row="b") with nt.assert_raises(AttributeError): g4.ax def test_col_wrap(self): g = ag.FacetGrid(self.df, col="d") nt.assert_equal(g.axes.shape, (1, 10)) nt.assert_is(g.facet_axis(0, 8), g.axes[0, 8]) g_wrap = ag.FacetGrid(self.df, col="d", col_wrap=4) nt.assert_equal(g_wrap.axes.shape, (10,)) nt.assert_is(g_wrap.facet_axis(0, 8), g_wrap.axes[8]) nt.assert_equal(g_wrap._ncol, 4) nt.assert_equal(g_wrap._nrow, 3) with nt.assert_raises(ValueError): g = ag.FacetGrid(self.df, row="b", col="d", col_wrap=4) df = self.df.copy() df.loc[df.d == "j"] = np.nan g_missing = ag.FacetGrid(df, col="d") nt.assert_equal(g_missing.axes.shape, (1, 9)) g_missing_wrap = ag.FacetGrid(df, col="d", col_wrap=4) nt.assert_equal(g_missing_wrap.axes.shape, (9,)) plt.close("all") def test_normal_axes(self): null = np.empty(0, object).flat g = ag.FacetGrid(self.df) npt.assert_array_equal(g._bottom_axes, g.axes.flat) npt.assert_array_equal(g._not_bottom_axes, null) npt.assert_array_equal(g._left_axes, g.axes.flat) npt.assert_array_equal(g._not_left_axes, null) npt.assert_array_equal(g._inner_axes, null) g = ag.FacetGrid(self.df, col="c") npt.assert_array_equal(g._bottom_axes, g.axes.flat) npt.assert_array_equal(g._not_bottom_axes, null) npt.assert_array_equal(g._left_axes, g.axes[:, 0].flat) npt.assert_array_equal(g._not_left_axes, g.axes[:, 1:].flat) npt.assert_array_equal(g._inner_axes, null) g = ag.FacetGrid(self.df, row="c") npt.assert_array_equal(g._bottom_axes, g.axes[-1, :].flat) npt.assert_array_equal(g._not_bottom_axes, g.axes[:-1, :].flat) npt.assert_array_equal(g._left_axes, g.axes.flat) npt.assert_array_equal(g._not_left_axes, null) npt.assert_array_equal(g._inner_axes, null) g = ag.FacetGrid(self.df, col="a", row="c") npt.assert_array_equal(g._bottom_axes, g.axes[-1, :].flat) npt.assert_array_equal(g._not_bottom_axes, g.axes[:-1, :].flat) npt.assert_array_equal(g._left_axes, g.axes[:, 0].flat) npt.assert_array_equal(g._not_left_axes, g.axes[:, 1:].flat) npt.assert_array_equal(g._inner_axes, g.axes[:-1, 1:].flat) plt.close("all") def test_wrapped_axes(self): null = np.empty(0, object).flat g = ag.FacetGrid(self.df, col="a", col_wrap=2) npt.assert_array_equal(g._bottom_axes, g.axes[np.array([1, 2])].flat) npt.assert_array_equal(g._not_bottom_axes, g.axes[:1].flat) npt.assert_array_equal(g._left_axes, g.axes[np.array([0, 2])].flat) npt.assert_array_equal(g._not_left_axes, g.axes[np.array([1])].flat) npt.assert_array_equal(g._inner_axes, null) plt.close("all") def test_figure_size(self): g = ag.FacetGrid(self.df, row="a", col="b") npt.assert_array_equal(g.fig.get_size_inches(), (6, 9)) g = ag.FacetGrid(self.df, row="a", col="b", size=6) npt.assert_array_equal(g.fig.get_size_inches(), (12, 18)) g = ag.FacetGrid(self.df, col="c", size=4, aspect=.5) npt.assert_array_equal(g.fig.get_size_inches(), (6, 4)) plt.close("all") def test_figure_size_with_legend(self): g1 = ag.FacetGrid(self.df, col="a", hue="c", size=4, aspect=.5) npt.assert_array_equal(g1.fig.get_size_inches(), (6, 4)) g1.add_legend() nt.assert_greater(g1.fig.get_size_inches()[0], 6) g2 = ag.FacetGrid(self.df, col="a", hue="c", size=4, aspect=.5, legend_out=False) npt.assert_array_equal(g2.fig.get_size_inches(), (6, 4)) g2.add_legend() npt.assert_array_equal(g2.fig.get_size_inches(), (6, 4)) plt.close("all") def test_legend_data(self): g1 = ag.FacetGrid(self.df, hue="a") g1.map(plt.plot, "x", "y") g1.add_legend() palette = color_palette(n_colors=3) nt.assert_equal(g1._legend.get_title().get_text(), "a") a_levels = sorted(self.df.a.unique()) lines = g1._legend.get_lines() nt.assert_equal(len(lines), len(a_levels)) for line, hue in zip(lines, palette): nt.assert_equal(line.get_color(), hue) labels = g1._legend.get_texts() nt.assert_equal(len(labels), len(a_levels)) for label, level in zip(labels, a_levels): nt.assert_equal(label.get_text(), level) plt.close("all") def test_legend_data_missing_level(self): g1 = ag.FacetGrid(self.df, hue="a", hue_order=list("azbc")) g1.map(plt.plot, "x", "y") g1.add_legend() b, g, r, p = color_palette(n_colors=4) palette = [b, r, p] nt.assert_equal(g1._legend.get_title().get_text(), "a") a_levels = sorted(self.df.a.unique()) lines = g1._legend.get_lines() nt.assert_equal(len(lines), len(a_levels)) for line, hue in zip(lines, palette): nt.assert_equal(line.get_color(), hue) labels = g1._legend.get_texts() nt.assert_equal(len(labels), 4) for label, level in zip(labels, list("azbc")): nt.assert_equal(label.get_text(), level) plt.close("all") def test_get_boolean_legend_data(self): self.df["b_bool"] = self.df.b == "m" g1 = ag.FacetGrid(self.df, hue="b_bool") g1.map(plt.plot, "x", "y") g1.add_legend() palette = color_palette(n_colors=2) nt.assert_equal(g1._legend.get_title().get_text(), "b_bool") b_levels = list(map(str, self.df.b_bool.unique())) lines = g1._legend.get_lines() nt.assert_equal(len(lines), len(b_levels)) for line, hue in zip(lines, palette): nt.assert_equal(line.get_color(), hue) labels = g1._legend.get_texts() nt.assert_equal(len(labels), len(b_levels)) for label, level in zip(labels, b_levels): nt.assert_equal(label.get_text(), level) plt.close("all") def test_legend_options(self): g1 = ag.FacetGrid(self.df, hue="b") g1.map(plt.plot, "x", "y") g1.add_legend() def test_legendout_with_colwrap(self): g = ag.FacetGrid(self.df, col="d", hue='b', col_wrap=4, legend_out=False) g.map(plt.plot, "x", "y", linewidth=3) g.add_legend() def test_subplot_kws(self): g = ag.FacetGrid(self.df, subplot_kws=dict(axisbg="blue")) for ax in g.axes.flat: nt.assert_equal(ax.get_axis_bgcolor(), "blue") @skipif(old_matplotlib) def test_gridspec_kws(self): ratios = [3, 1, 2] sizes = [0.46, 0.15, 0.31] gskws = dict(width_ratios=ratios, height_ratios=ratios) g = ag.FacetGrid(self.df, col='c', row='a', gridspec_kws=gskws) # clear out all ticks for ax in g.axes.flat: ax.set_xticks([]) ax.set_yticks([]) g.fig.tight_layout() widths, heights = np.meshgrid(sizes, sizes) for n, ax in enumerate(g.axes.flat): npt.assert_almost_equal( ax.get_position().width, widths.flatten()[n], decimal=2 ) npt.assert_almost_equal( ax.get_position().height, heights.flatten()[n], decimal=2 ) @skipif(old_matplotlib) def test_gridspec_kws_col_wrap(self): ratios = [3, 1, 2, 1, 1] sizes = [0.46, 0.15, 0.31] gskws = dict(width_ratios=ratios) with warnings.catch_warnings(): warnings.resetwarnings() warnings.simplefilter("always") npt.assert_warns(UserWarning, ag.FacetGrid, self.df, col='d', col_wrap=5, gridspec_kws=gskws) @skipif(not old_matplotlib) def test_gridsic_kws_old_mpl(self): ratios = [3, 1, 2] sizes = [0.46, 0.15, 0.31] gskws = dict(width_ratios=ratios, height_ratios=ratios) with warnings.catch_warnings(): warnings.resetwarnings() warnings.simplefilter("always") npt.assert_warns(UserWarning, ag.FacetGrid, self.df, col='c', row='a', gridspec_kws=gskws) def test_data_generator(self): g = ag.FacetGrid(self.df, row="a") d = list(g.facet_data()) nt.assert_equal(len(d), 3) tup, data = d[0] nt.assert_equal(tup, (0, 0, 0)) nt.assert_true((data["a"] == "a").all()) tup, data = d[1] nt.assert_equal(tup, (1, 0, 0)) nt.assert_true((data["a"] == "b").all()) g = ag.FacetGrid(self.df, row="a", col="b") d = list(g.facet_data()) nt.assert_equal(len(d), 6) tup, data = d[0] nt.assert_equal(tup, (0, 0, 0)) nt.assert_true((data["a"] == "a").all()) nt.assert_true((data["b"] == "m").all()) tup, data = d[1] nt.assert_equal(tup, (0, 1, 0)) nt.assert_true((data["a"] == "a").all()) nt.assert_true((data["b"] == "n").all()) tup, data = d[2] nt.assert_equal(tup, (1, 0, 0)) nt.assert_true((data["a"] == "b").all()) nt.assert_true((data["b"] == "m").all()) g = ag.FacetGrid(self.df, hue="c") d = list(g.facet_data()) nt.assert_equal(len(d), 3) tup, data = d[1] nt.assert_equal(tup, (0, 0, 1)) nt.assert_true((data["c"] == "u").all()) plt.close("all") def test_map(self): g = ag.FacetGrid(self.df, row="a", col="b", hue="c") g.map(plt.plot, "x", "y", linewidth=3) lines = g.axes[0, 0].lines nt.assert_equal(len(lines), 3) line1, _, _ = lines nt.assert_equal(line1.get_linewidth(), 3) x, y = line1.get_data() mask = (self.df.a == "a") & (self.df.b == "m") & (self.df.c == "t") npt.assert_array_equal(x, self.df.x[mask]) npt.assert_array_equal(y, self.df.y[mask]) def test_map_dataframe(self): g = ag.FacetGrid(self.df, row="a", col="b", hue="c") plot = lambda x, y, data=None, **kws: plt.plot(data[x], data[y], **kws) g.map_dataframe(plot, "x", "y", linestyle="--") lines = g.axes[0, 0].lines nt.assert_equal(len(lines), 3) line1, _, _ = lines nt.assert_equal(line1.get_linestyle(), "--") x, y = line1.get_data() mask = (self.df.a == "a") & (self.df.b == "m") & (self.df.c == "t") npt.assert_array_equal(x, self.df.x[mask]) npt.assert_array_equal(y, self.df.y[mask]) def test_set(self): g = ag.FacetGrid(self.df, row="a", col="b") xlim = (-2, 5) ylim = (3, 6) xticks = [-2, 0, 3, 5] yticks = [3, 4.5, 6] g.set(xlim=xlim, ylim=ylim, xticks=xticks, yticks=yticks) for ax in g.axes.flat: npt.assert_array_equal(ax.get_xlim(), xlim) npt.assert_array_equal(ax.get_ylim(), ylim) npt.assert_array_equal(ax.get_xticks(), xticks) npt.assert_array_equal(ax.get_yticks(), yticks) plt.close("all") def test_set_titles(self): g = ag.FacetGrid(self.df, row="a", col="b") g.map(plt.plot, "x", "y") # Test the default titles nt.assert_equal(g.axes[0, 0].get_title(), "a = a | b = m") nt.assert_equal(g.axes[0, 1].get_title(), "a = a | b = n") nt.assert_equal(g.axes[1, 0].get_title(), "a = b | b = m") # Test a provided title g.set_titles("{row_var} == {row_name} \/ {col_var} == {col_name}") nt.assert_equal(g.axes[0, 0].get_title(), "a == a \/ b == m") nt.assert_equal(g.axes[0, 1].get_title(), "a == a \/ b == n") nt.assert_equal(g.axes[1, 0].get_title(), "a == b \/ b == m") # Test a single row g = ag.FacetGrid(self.df, col="b") g.map(plt.plot, "x", "y") # Test the default titles nt.assert_equal(g.axes[0, 0].get_title(), "b = m") nt.assert_equal(g.axes[0, 1].get_title(), "b = n") # test with dropna=False g = ag.FacetGrid(self.df, col="b", hue="b", dropna=False) g.map(plt.plot, 'x', 'y') plt.close("all") def test_set_titles_margin_titles(self): g = ag.FacetGrid(self.df, row="a", col="b", margin_titles=True) g.map(plt.plot, "x", "y") # Test the default titles nt.assert_equal(g.axes[0, 0].get_title(), "b = m") nt.assert_equal(g.axes[0, 1].get_title(), "b = n") nt.assert_equal(g.axes[1, 0].get_title(), "") # Test the row "titles" nt.assert_equal(g.axes[0, 1].texts[0].get_text(), "a = a") nt.assert_equal(g.axes[1, 1].texts[0].get_text(), "a = b") # Test a provided title g.set_titles(col_template="{col_var} == {col_name}") nt.assert_equal(g.axes[0, 0].get_title(), "b == m") nt.assert_equal(g.axes[0, 1].get_title(), "b == n") nt.assert_equal(g.axes[1, 0].get_title(), "") plt.close("all") def test_set_ticklabels(self): g = ag.FacetGrid(self.df, row="a", col="b") g.map(plt.plot, "x", "y") xlab = [l.get_text() + "h" for l in g.axes[1, 0].get_xticklabels()] ylab = [l.get_text() for l in g.axes[1, 0].get_yticklabels()] g.set_xticklabels(xlab) g.set_yticklabels(rotation=90) got_x = [l.get_text() + "h" for l in g.axes[1, 1].get_xticklabels()] got_y = [l.get_text() for l in g.axes[0, 0].get_yticklabels()] npt.assert_array_equal(got_x, xlab) npt.assert_array_equal(got_y, ylab) x, y = np.arange(10), np.arange(10) df = pd.DataFrame(np.c_[x, y], columns=["x", "y"]) g = ag.FacetGrid(df).map(pointplot, "x", "y") g.set_xticklabels(step=2) got_x = [int(l.get_text()) for l in g.axes[0, 0].get_xticklabels()] npt.assert_array_equal(x[::2], got_x) g = ag.FacetGrid(self.df, col="d", col_wrap=5) g.map(plt.plot, "x", "y") g.set_xticklabels(rotation=45) g.set_yticklabels(rotation=75) for ax in g._bottom_axes: for l in ax.get_xticklabels(): nt.assert_equal(l.get_rotation(), 45) for ax in g._left_axes: for l in ax.get_yticklabels(): nt.assert_equal(l.get_rotation(), 75) plt.close("all") def test_set_axis_labels(self): g = ag.FacetGrid(self.df, row="a", col="b") g.map(plt.plot, "x", "y") xlab = 'xx' ylab = 'yy' g.set_axis_labels(xlab, ylab) got_x = [ax.get_xlabel() for ax in g.axes[-1, :]] got_y = [ax.get_ylabel() for ax in g.axes[:, 0]] npt.assert_array_equal(got_x, xlab) npt.assert_array_equal(got_y, ylab) plt.close("all") def test_axis_lims(self): g = ag.FacetGrid(self.df, row="a", col="b", xlim=(0, 4), ylim=(-2, 3)) nt.assert_equal(g.axes[0, 0].get_xlim(), (0, 4)) nt.assert_equal(g.axes[0, 0].get_ylim(), (-2, 3)) plt.close("all") def test_data_orders(self): g = ag.FacetGrid(self.df, row="a", col="b", hue="c") nt.assert_equal(g.row_names, list("abc")) nt.assert_equal(g.col_names, list("mn")) nt.assert_equal(g.hue_names, list("tuv")) nt.assert_equal(g.axes.shape, (3, 2)) g = ag.FacetGrid(self.df, row="a", col="b", hue="c", row_order=list("bca"), col_order=list("nm"), hue_order=list("vtu")) nt.assert_equal(g.row_names, list("bca")) nt.assert_equal(g.col_names, list("nm")) nt.assert_equal(g.hue_names, list("vtu")) nt.assert_equal(g.axes.shape, (3, 2)) g = ag.FacetGrid(self.df, row="a", col="b", hue="c", row_order=list("bcda"), col_order=list("nom"), hue_order=list("qvtu")) nt.assert_equal(g.row_names, list("bcda")) nt.assert_equal(g.col_names, list("nom")) nt.assert_equal(g.hue_names, list("qvtu")) nt.assert_equal(g.axes.shape, (4, 3)) plt.close("all") def test_palette(self): rcmod.set() g = ag.FacetGrid(self.df, hue="c") nt.assert_equal(g._colors, color_palette(n_colors=3)) g = ag.FacetGrid(self.df, hue="d") nt.assert_equal(g._colors, color_palette("husl", 10)) g = ag.FacetGrid(self.df, hue="c", palette="Set2") nt.assert_equal(g._colors, color_palette("Set2", 3)) dict_pal = dict(t="red", u="green", v="blue") list_pal = color_palette(["red", "green", "blue"], 3) g = ag.FacetGrid(self.df, hue="c", palette=dict_pal) nt.assert_equal(g._colors, list_pal) list_pal = color_palette(["green", "blue", "red"], 3) g = ag.FacetGrid(self.df, hue="c", hue_order=list("uvt"), palette=dict_pal) nt.assert_equal(g._colors, list_pal) plt.close("all") def test_hue_kws(self): kws = dict(marker=["o", "s", "D"]) g = ag.FacetGrid(self.df, hue="c", hue_kws=kws) g.map(plt.plot, "x", "y") for line, marker in zip(g.axes[0, 0].lines, kws["marker"]): nt.assert_equal(line.get_marker(), marker) def test_dropna(self): df = self.df.copy() hasna = pd.Series(np.tile(np.arange(6), 10), dtype=np.float) hasna[hasna == 5] = np.nan df["hasna"] = hasna g = ag.FacetGrid(df, dropna=False, row="hasna") nt.assert_equal(g._not_na.sum(), 60) g = ag.FacetGrid(df, dropna=True, row="hasna") nt.assert_equal(g._not_na.sum(), 50) plt.close("all") @classmethod def teardown_class(cls): """Ensure that all figures are closed on exit.""" plt.close("all") class TestPairGrid(object): rs = np.random.RandomState(sum(map(ord, "PairGrid"))) df = pd.DataFrame(dict(x=rs.normal(size=80), y=rs.randint(0, 4, size=(80)), z=rs.gamma(3, size=80), a=np.repeat(list("abcd"), 20), b=np.repeat(list("abcdefgh"), 10))) def test_self_data(self): g = ag.PairGrid(self.df) nt.assert_is(g.data, self.df) plt.close("all") def test_ignore_datelike_data(self): df = self.df.copy() df['date'] = pd.date_range('2010-01-01', periods=len(df), freq='d') result = ag.PairGrid(self.df).data expected = df.drop('date', axis=1) tm.assert_frame_equal(result, expected) plt.close("all") def test_self_fig(self): g = ag.PairGrid(self.df) nt.assert_is_instance(g.fig, plt.Figure) plt.close("all") def test_self_axes(self): g = ag.PairGrid(self.df) for ax in g.axes.flat: nt.assert_is_instance(ax, plt.Axes) plt.close("all") def test_default_axes(self): g = ag.PairGrid(self.df) nt.assert_equal(g.axes.shape, (3, 3)) nt.assert_equal(g.x_vars, ["x", "y", "z"]) nt.assert_equal(g.y_vars, ["x", "y", "z"]) nt.assert_true(g.square_grid) plt.close("all") def test_specific_square_axes(self): vars = ["z", "x"] g = ag.PairGrid(self.df, vars=vars) nt.assert_equal(g.axes.shape, (len(vars), len(vars))) nt.assert_equal(g.x_vars, vars) nt.assert_equal(g.y_vars, vars) nt.assert_true(g.square_grid) plt.close("all") def test_specific_nonsquare_axes(self): x_vars = ["x", "y"] y_vars = ["z", "y", "x"] g = ag.PairGrid(self.df, x_vars=x_vars, y_vars=y_vars) nt.assert_equal(g.axes.shape, (len(y_vars), len(x_vars))) nt.assert_equal(g.x_vars, x_vars) nt.assert_equal(g.y_vars, y_vars) nt.assert_true(not g.square_grid) x_vars = ["x", "y"] y_vars = "z" g = ag.PairGrid(self.df, x_vars=x_vars, y_vars=y_vars) nt.assert_equal(g.axes.shape, (len(y_vars), len(x_vars))) nt.assert_equal(g.x_vars, list(x_vars)) nt.assert_equal(g.y_vars, list(y_vars)) nt.assert_true(not g.square_grid) plt.close("all") def test_specific_square_axes_with_array(self): vars = np.array(["z", "x"]) g = ag.PairGrid(self.df, vars=vars) nt.assert_equal(g.axes.shape, (len(vars), len(vars))) nt.assert_equal(g.x_vars, list(vars)) nt.assert_equal(g.y_vars, list(vars)) nt.assert_true(g.square_grid) plt.close("all") def test_specific_nonsquare_axes_with_array(self): x_vars = np.array(["x", "y"]) y_vars = np.array(["z", "y", "x"]) g = ag.PairGrid(self.df, x_vars=x_vars, y_vars=y_vars) nt.assert_equal(g.axes.shape, (len(y_vars), len(x_vars))) nt.assert_equal(g.x_vars, list(x_vars)) nt.assert_equal(g.y_vars, list(y_vars)) nt.assert_true(not g.square_grid) plt.close("all") def test_size(self): g1 = ag.PairGrid(self.df, size=3) npt.assert_array_equal(g1.fig.get_size_inches(), (9, 9)) g2 = ag.PairGrid(self.df, size=4, aspect=.5) npt.assert_array_equal(g2.fig.get_size_inches(), (6, 12)) g3 = ag.PairGrid(self.df, y_vars=["z"], x_vars=["x", "y"], size=2, aspect=2) npt.assert_array_equal(g3.fig.get_size_inches(), (8, 2)) plt.close("all") def test_map(self): vars = ["x", "y", "z"] g1 = ag.PairGrid(self.df) g1.map(plt.scatter) for i, axes_i in enumerate(g1.axes): for j, ax in enumerate(axes_i): x_in = self.df[vars[j]] y_in = self.df[vars[i]] x_out, y_out = ax.collections[0].get_offsets().T npt.assert_array_equal(x_in, x_out) npt.assert_array_equal(y_in, y_out) g2 = ag.PairGrid(self.df, "a") g2.map(plt.scatter) for i, axes_i in enumerate(g2.axes): for j, ax in enumerate(axes_i): x_in = self.df[vars[j]] y_in = self.df[vars[i]] for k, k_level in enumerate("abcd"): x_in_k = x_in[self.df.a == k_level] y_in_k = y_in[self.df.a == k_level] x_out, y_out = ax.collections[k].get_offsets().T npt.assert_array_equal(x_in_k, x_out) npt.assert_array_equal(y_in_k, y_out) plt.close("all") def test_map_nonsquare(self): x_vars = ["x"] y_vars = ["y", "z"] g = ag.PairGrid(self.df, x_vars=x_vars, y_vars=y_vars) g.map(plt.scatter) x_in = self.df.x for i, i_var in enumerate(y_vars): ax = g.axes[i, 0] y_in = self.df[i_var] x_out, y_out = ax.collections[0].get_offsets().T npt.assert_array_equal(x_in, x_out) npt.assert_array_equal(y_in, y_out) plt.close("all") def test_map_lower(self): vars = ["x", "y", "z"] g = ag.PairGrid(self.df) g.map_lower(plt.scatter) for i, j in zip(*np.tril_indices_from(g.axes, -1)): ax = g.axes[i, j] x_in = self.df[vars[j]] y_in = self.df[vars[i]] x_out, y_out = ax.collections[0].get_offsets().T npt.assert_array_equal(x_in, x_out) npt.assert_array_equal(y_in, y_out) for i, j in zip(*np.triu_indices_from(g.axes)): ax = g.axes[i, j] nt.assert_equal(len(ax.collections), 0) plt.close("all") def test_map_upper(self): vars = ["x", "y", "z"] g = ag.PairGrid(self.df) g.map_upper(plt.scatter) for i, j in zip(*np.triu_indices_from(g.axes, 1)): ax = g.axes[i, j] x_in = self.df[vars[j]] y_in = self.df[vars[i]] x_out, y_out = ax.collections[0].get_offsets().T npt.assert_array_equal(x_in, x_out) npt.assert_array_equal(y_in, y_out) for i, j in zip(*np.tril_indices_from(g.axes)): ax = g.axes[i, j] nt.assert_equal(len(ax.collections), 0) plt.close("all") @skipif(old_matplotlib) def test_map_diag(self): g1 = ag.PairGrid(self.df) g1.map_diag(plt.hist) for ax in g1.diag_axes: nt.assert_equal(len(ax.patches), 10) g2 = ag.PairGrid(self.df) g2.map_diag(plt.hist, bins=15) for ax in g2.diag_axes: nt.assert_equal(len(ax.patches), 15) g3 = ag.PairGrid(self.df, hue="a") g3.map_diag(plt.hist) for ax in g3.diag_axes: nt.assert_equal(len(ax.patches), 40) plt.close("all") @skipif(old_matplotlib) def test_map_diag_and_offdiag(self): vars = ["x", "y", "z"] g = ag.PairGrid(self.df) g.map_offdiag(plt.scatter) g.map_diag(plt.hist) for ax in g.diag_axes: nt.assert_equal(len(ax.patches), 10) for i, j in zip(*np.triu_indices_from(g.axes, 1)): ax = g.axes[i, j] x_in = self.df[vars[j]] y_in = self.df[vars[i]] x_out, y_out = ax.collections[0].get_offsets().T npt.assert_array_equal(x_in, x_out) npt.assert_array_equal(y_in, y_out) for i, j in zip(*np.tril_indices_from(g.axes, -1)): ax = g.axes[i, j] x_in = self.df[vars[j]] y_in = self.df[vars[i]] x_out, y_out = ax.collections[0].get_offsets().T npt.assert_array_equal(x_in, x_out) npt.assert_array_equal(y_in, y_out) for i, j in zip(*np.diag_indices_from(g.axes)): ax = g.axes[i, j] nt.assert_equal(len(ax.collections), 0) plt.close("all") def test_palette(self): rcmod.set() g = ag.PairGrid(self.df, hue="a") nt.assert_equal(g.palette, color_palette(n_colors=4)) g = ag.PairGrid(self.df, hue="b") nt.assert_equal(g.palette, color_palette("husl", 8)) g = ag.PairGrid(self.df, hue="a", palette="Set2") nt.assert_equal(g.palette, color_palette("Set2", 4)) dict_pal = dict(a="red", b="green", c="blue", d="purple") list_pal = color_palette(["red", "green", "blue", "purple"], 4) g = ag.PairGrid(self.df, hue="a", palette=dict_pal) nt.assert_equal(g.palette, list_pal) list_pal = color_palette(["purple", "blue", "red", "green"], 4) g = ag.PairGrid(self.df, hue="a", hue_order=list("dcab"), palette=dict_pal) nt.assert_equal(g.palette, list_pal) plt.close("all") def test_hue_kws(self): kws = dict(marker=["o", "s", "d", "+"]) g = ag.PairGrid(self.df, hue="a", hue_kws=kws) g.map(plt.plot) for line, marker in zip(g.axes[0, 0].lines, kws["marker"]): nt.assert_equal(line.get_marker(), marker) g = ag.PairGrid(self.df, hue="a", hue_kws=kws, hue_order=list("dcab")) g.map(plt.plot) for line, marker in zip(g.axes[0, 0].lines, kws["marker"]): nt.assert_equal(line.get_marker(), marker) plt.close("all") @skipif(old_matplotlib) def test_hue_order(self): order = list("dcab") g = ag.PairGrid(self.df, hue="a", hue_order=order) g.map(plt.plot) for line, level in zip(g.axes[1, 0].lines, order): x, y = line.get_xydata().T npt.assert_array_equal(x, self.df.loc[self.df.a == level, "x"]) npt.assert_array_equal(y, self.df.loc[self.df.a == level, "y"]) plt.close("all") g = ag.PairGrid(self.df, hue="a", hue_order=order) g.map_diag(plt.plot) for line, level in zip(g.axes[0, 0].lines, order): x, y = line.get_xydata().T npt.assert_array_equal(x, self.df.loc[self.df.a == level, "x"]) npt.assert_array_equal(y, self.df.loc[self.df.a == level, "x"]) plt.close("all") g = ag.PairGrid(self.df, hue="a", hue_order=order) g.map_lower(plt.plot) for line, level in zip(g.axes[1, 0].lines, order): x, y = line.get_xydata().T npt.assert_array_equal(x, self.df.loc[self.df.a == level, "x"]) npt.assert_array_equal(y, self.df.loc[self.df.a == level, "y"]) plt.close("all") g = ag.PairGrid(self.df, hue="a", hue_order=order) g.map_upper(plt.plot) for line, level in zip(g.axes[0, 1].lines, order): x, y = line.get_xydata().T npt.assert_array_equal(x, self.df.loc[self.df.a == level, "y"]) npt.assert_array_equal(y, self.df.loc[self.df.a == level, "x"]) plt.close("all") @skipif(old_matplotlib) def test_hue_order_missing_level(self): order = list("dcaeb") g = ag.PairGrid(self.df, hue="a", hue_order=order) g.map(plt.plot) for line, level in zip(g.axes[1, 0].lines, order): x, y = line.get_xydata().T npt.assert_array_equal(x, self.df.loc[self.df.a == level, "x"]) npt.assert_array_equal(y, self.df.loc[self.df.a == level, "y"]) plt.close("all") g = ag.PairGrid(self.df, hue="a", hue_order=order) g.map_diag(plt.plot) for line, level in zip(g.axes[0, 0].lines, order): x, y = line.get_xydata().T npt.assert_array_equal(x, self.df.loc[self.df.a == level, "x"]) npt.assert_array_equal(y, self.df.loc[self.df.a == level, "x"]) plt.close("all") g = ag.PairGrid(self.df, hue="a", hue_order=order) g.map_lower(plt.plot) for line, level in zip(g.axes[1, 0].lines, order): x, y = line.get_xydata().T npt.assert_array_equal(x, self.df.loc[self.df.a == level, "x"]) npt.assert_array_equal(y, self.df.loc[self.df.a == level, "y"]) plt.close("all") g = ag.PairGrid(self.df, hue="a", hue_order=order) g.map_upper(plt.plot) for line, level in zip(g.axes[0, 1].lines, order): x, y = line.get_xydata().T npt.assert_array_equal(x, self.df.loc[self.df.a == level, "y"]) npt.assert_array_equal(y, self.df.loc[self.df.a == level, "x"]) plt.close("all") def test_nondefault_index(self): df = self.df.copy().set_index("b") vars = ["x", "y", "z"] g1 = ag.PairGrid(df) g1.map(plt.scatter) for i, axes_i in enumerate(g1.axes): for j, ax in enumerate(axes_i): x_in = self.df[vars[j]] y_in = self.df[vars[i]] x_out, y_out = ax.collections[0].get_offsets().T npt.assert_array_equal(x_in, x_out) npt.assert_array_equal(y_in, y_out) g2 = ag.PairGrid(df, "a") g2.map(plt.scatter) for i, axes_i in enumerate(g2.axes): for j, ax in enumerate(axes_i): x_in = self.df[vars[j]] y_in = self.df[vars[i]] for k, k_level in enumerate("abcd"): x_in_k = x_in[self.df.a == k_level] y_in_k = y_in[self.df.a == k_level] x_out, y_out = ax.collections[k].get_offsets().T npt.assert_array_equal(x_in_k, x_out) npt.assert_array_equal(y_in_k, y_out) plt.close("all") @skipif(old_matplotlib) def test_pairplot(self): vars = ["x", "y", "z"] g = pairplot(self.df) for ax in g.diag_axes: nt.assert_equal(len(ax.patches), 10) for i, j in zip(*np.triu_indices_from(g.axes, 1)): ax = g.axes[i, j] x_in = self.df[vars[j]] y_in = self.df[vars[i]] x_out, y_out = ax.collections[0].get_offsets().T npt.assert_array_equal(x_in, x_out) npt.assert_array_equal(y_in, y_out) for i, j in zip(*np.tril_indices_from(g.axes, -1)): ax = g.axes[i, j] x_in = self.df[vars[j]] y_in = self.df[vars[i]] x_out, y_out = ax.collections[0].get_offsets().T npt.assert_array_equal(x_in, x_out) npt.assert_array_equal(y_in, y_out) for i, j in zip(*np.diag_indices_from(g.axes)): ax = g.axes[i, j] nt.assert_equal(len(ax.collections), 0) plt.close("all") @skipif(old_matplotlib) def test_pairplot_reg(self): vars = ["x", "y", "z"] g = pairplot(self.df, kind="reg") for ax in g.diag_axes: nt.assert_equal(len(ax.patches), 10) for i, j in zip(*np.triu_indices_from(g.axes, 1)): ax = g.axes[i, j] x_in = self.df[vars[j]] y_in = self.df[vars[i]] x_out, y_out = ax.collections[0].get_offsets().T npt.assert_array_equal(x_in, x_out) npt.assert_array_equal(y_in, y_out) nt.assert_equal(len(ax.lines), 1) nt.assert_equal(len(ax.collections), 2) for i, j in zip(*np.tril_indices_from(g.axes, -1)): ax = g.axes[i, j] x_in = self.df[vars[j]] y_in = self.df[vars[i]] x_out, y_out = ax.collections[0].get_offsets().T npt.assert_array_equal(x_in, x_out) npt.assert_array_equal(y_in, y_out) nt.assert_equal(len(ax.lines), 1) nt.assert_equal(len(ax.collections), 2) for i, j in zip(*np.diag_indices_from(g.axes)): ax = g.axes[i, j] nt.assert_equal(len(ax.collections), 0) plt.close("all") @skipif(old_matplotlib) def test_pairplot_kde(self): vars = ["x", "y", "z"] g = pairplot(self.df, diag_kind="kde") for ax in g.diag_axes: nt.assert_equal(len(ax.lines), 1) for i, j in zip(*np.triu_indices_from(g.axes, 1)): ax = g.axes[i, j] x_in = self.df[vars[j]] y_in = self.df[vars[i]] x_out, y_out = ax.collections[0].get_offsets().T npt.assert_array_equal(x_in, x_out) npt.assert_array_equal(y_in, y_out) for i, j in zip(*np.tril_indices_from(g.axes, -1)): ax = g.axes[i, j] x_in = self.df[vars[j]] y_in = self.df[vars[i]] x_out, y_out = ax.collections[0].get_offsets().T npt.assert_array_equal(x_in, x_out) npt.assert_array_equal(y_in, y_out) for i, j in zip(*np.diag_indices_from(g.axes)): ax = g.axes[i, j] nt.assert_equal(len(ax.collections), 0) plt.close("all") @skipif(old_matplotlib) def test_pairplot_markers(self): vars = ["x", "y", "z"] markers = ["o", "x", "s", "d"] g = pairplot(self.df, hue="a", vars=vars, markers=markers) nt.assert_equal(g.hue_kws["marker"], markers) plt.close("all") with nt.assert_raises(ValueError): g = pairplot(self.df, hue="a", vars=vars, markers=markers[:-2]) @classmethod def teardown_class(cls): """Ensure that all figures are closed on exit.""" plt.close("all") class TestJointGrid(object): rs = np.random.RandomState(sum(map(ord, "JointGrid"))) x = rs.randn(100) y = rs.randn(100) x_na = x.copy() x_na[10] = np.nan x_na[20] = np.nan data = pd.DataFrame(dict(x=x, y=y, x_na=x_na)) def test_margin_grid_from_arrays(self): g = ag.JointGrid(self.x, self.y) npt.assert_array_equal(g.x, self.x) npt.assert_array_equal(g.y, self.y) plt.close("all") def test_margin_grid_from_series(self): g = ag.JointGrid(self.data.x, self.data.y) npt.assert_array_equal(g.x, self.x) npt.assert_array_equal(g.y, self.y) plt.close("all") def test_margin_grid_from_dataframe(self): g = ag.JointGrid("x", "y", self.data) npt.assert_array_equal(g.x, self.x) npt.assert_array_equal(g.y, self.y) plt.close("all") def test_margin_grid_axis_labels(self): g = ag.JointGrid("x", "y", self.data) xlabel, ylabel = g.ax_joint.get_xlabel(), g.ax_joint.get_ylabel() nt.assert_equal(xlabel, "x") nt.assert_equal(ylabel, "y") g.set_axis_labels("x variable", "y variable") xlabel, ylabel = g.ax_joint.get_xlabel(), g.ax_joint.get_ylabel() nt.assert_equal(xlabel, "x variable") nt.assert_equal(ylabel, "y variable") plt.close("all") def test_dropna(self): g = ag.JointGrid("x_na", "y", self.data, dropna=False) nt.assert_equal(len(g.x), len(self.x_na)) g = ag.JointGrid("x_na", "y", self.data, dropna=True) nt.assert_equal(len(g.x), pd.notnull(self.x_na).sum()) plt.close("all") def test_axlims(self): lim = (-3, 3) g = ag.JointGrid("x", "y", self.data, xlim=lim, ylim=lim) nt.assert_equal(g.ax_joint.get_xlim(), lim) nt.assert_equal(g.ax_joint.get_ylim(), lim) nt.assert_equal(g.ax_marg_x.get_xlim(), lim) nt.assert_equal(g.ax_marg_y.get_ylim(), lim) def test_marginal_ticks(self): g = ag.JointGrid("x", "y", self.data) nt.assert_true(~len(g.ax_marg_x.get_xticks())) nt.assert_true(~len(g.ax_marg_y.get_yticks())) plt.close("all") def test_bivariate_plot(self): g = ag.JointGrid("x", "y", self.data) g.plot_joint(plt.plot) x, y = g.ax_joint.lines[0].get_xydata().T npt.assert_array_equal(x, self.x) npt.assert_array_equal(y, self.y) plt.close("all") def test_univariate_plot(self): g = ag.JointGrid("x", "x", self.data) g.plot_marginals(kdeplot) _, y1 = g.ax_marg_x.lines[0].get_xydata().T y2, _ = g.ax_marg_y.lines[0].get_xydata().T npt.assert_array_equal(y1, y2) plt.close("all") def test_plot(self): g = ag.JointGrid("x", "x", self.data) g.plot(plt.plot, kdeplot) x, y = g.ax_joint.lines[0].get_xydata().T npt.assert_array_equal(x, self.x) npt.assert_array_equal(y, self.x) _, y1 = g.ax_marg_x.lines[0].get_xydata().T y2, _ = g.ax_marg_y.lines[0].get_xydata().T npt.assert_array_equal(y1, y2) plt.close("all") def test_annotate(self): g = ag.JointGrid("x", "y", self.data) rp = stats.pearsonr(self.x, self.y) g.annotate(stats.pearsonr) annotation = g.ax_joint.legend_.texts[0].get_text() nt.assert_equal(annotation, "pearsonr = %.2g; p = %.2g" % rp) g.annotate(stats.pearsonr, stat="correlation") annotation = g.ax_joint.legend_.texts[0].get_text() nt.assert_equal(annotation, "correlation = %.2g; p = %.2g" % rp) def rsquared(x, y): return stats.pearsonr(x, y)[0] ** 2 r2 = rsquared(self.x, self.y) g.annotate(rsquared) annotation = g.ax_joint.legend_.texts[0].get_text() nt.assert_equal(annotation, "rsquared = %.2g" % r2) template = "{stat} = {val:.3g} (p = {p:.3g})" g.annotate(stats.pearsonr, template=template) annotation = g.ax_joint.legend_.texts[0].get_text() nt.assert_equal(annotation, template.format(stat="pearsonr", val=rp[0], p=rp[1])) plt.close("all") def test_space(self): g = ag.JointGrid("x", "y", self.data, space=0) joint_bounds = g.ax_joint.bbox.bounds marg_x_bounds = g.ax_marg_x.bbox.bounds marg_y_bounds = g.ax_marg_y.bbox.bounds nt.assert_equal(joint_bounds[2], marg_x_bounds[2]) nt.assert_equal(joint_bounds[3], marg_y_bounds[3]) @classmethod def teardown_class(cls): """Ensure that all figures are closed on exit.""" plt.close("all") seaborn-0.6.0/seaborn/tests/test_categorical.py000066400000000000000000002233021254425553400216420ustar00rootroot00000000000000import numpy as np import pandas as pd import scipy from scipy import stats import matplotlib as mpl import matplotlib.pyplot as plt from distutils.version import LooseVersion pandas_has_categoricals = LooseVersion(pd.__version__) >= "0.15" import nose.tools as nt import numpy.testing as npt from numpy.testing.decorators import skipif from .. import categorical as cat from .. import palettes class CategoricalFixture(object): """Test boxplot (also base class for things like violinplots).""" rs = np.random.RandomState(30) n_total = 60 x = rs.randn(n_total / 3, 3) x_df = pd.DataFrame(x, columns=pd.Series(list("XYZ"), name="big")) y = pd.Series(rs.randn(n_total), name="y_data") g = pd.Series(np.repeat(list("abc"), n_total / 3), name="small") h = pd.Series(np.tile(list("mn"), n_total / 2), name="medium") u = pd.Series(np.tile(list("jkh"), n_total / 3)) df = pd.DataFrame(dict(y=y, g=g, h=h, u=u)) x_df["W"] = g class TestCategoricalPlotter(CategoricalFixture): def test_wide_df_data(self): p = cat._CategoricalPlotter() # Test basic wide DataFrame p.establish_variables(data=self.x_df) # Check data attribute for x, y, in zip(p.plot_data, self.x_df[["X", "Y", "Z"]].values.T): npt.assert_array_equal(x, y) # Check semantic attributes nt.assert_equal(p.orient, "v") nt.assert_is(p.plot_hues, None) nt.assert_is(p.group_label, "big") nt.assert_is(p.value_label, None) # Test wide dataframe with forced horizontal orientation p.establish_variables(data=self.x_df, orient="horiz") nt.assert_equal(p.orient, "h") # Text exception by trying to hue-group with a wide dataframe with nt.assert_raises(ValueError): p.establish_variables(hue="d", data=self.x_df) def test_1d_input_data(self): p = cat._CategoricalPlotter() # Test basic vector data x_1d_array = self.x.ravel() p.establish_variables(data=x_1d_array) nt.assert_equal(len(p.plot_data), 1) nt.assert_equal(len(p.plot_data[0]), self.n_total) nt.assert_is(p.group_label, None) nt.assert_is(p.value_label, None) # Test basic vector data in list form x_1d_list = x_1d_array.tolist() p.establish_variables(data=x_1d_list) nt.assert_equal(len(p.plot_data), 1) nt.assert_equal(len(p.plot_data[0]), self.n_total) nt.assert_is(p.group_label, None) nt.assert_is(p.value_label, None) # Test an object array that looks 1D but isn't x_notreally_1d = np.array([self.x.ravel(), self.x.ravel()[:self.n_total / 2]]) p.establish_variables(data=x_notreally_1d) nt.assert_equal(len(p.plot_data), 2) nt.assert_equal(len(p.plot_data[0]), self.n_total) nt.assert_equal(len(p.plot_data[1]), self.n_total / 2) nt.assert_is(p.group_label, None) nt.assert_is(p.value_label, None) def test_2d_input_data(self): p = cat._CategoricalPlotter() x = self.x[:, 0] # Test vector data that looks 2D but doesn't really have columns p.establish_variables(data=x[:, np.newaxis]) nt.assert_equal(len(p.plot_data), 1) nt.assert_equal(len(p.plot_data[0]), self.x.shape[0]) nt.assert_is(p.group_label, None) nt.assert_is(p.value_label, None) # Test vector data that looks 2D but doesn't really have rows p.establish_variables(data=x[np.newaxis, :]) nt.assert_equal(len(p.plot_data), 1) nt.assert_equal(len(p.plot_data[0]), self.x.shape[0]) nt.assert_is(p.group_label, None) nt.assert_is(p.value_label, None) def test_3d_input_data(self): p = cat._CategoricalPlotter() # Test that passing actually 3D data raises x = np.zeros((5, 5, 5)) with nt.assert_raises(ValueError): p.establish_variables(data=x) def test_list_of_array_input_data(self): p = cat._CategoricalPlotter() # Test 2D input in list form x_list = self.x.T.tolist() p.establish_variables(data=x_list) nt.assert_equal(len(p.plot_data), 3) lengths = [len(v_i) for v_i in p.plot_data] nt.assert_equal(lengths, [self.n_total / 3] * 3) nt.assert_is(p.group_label, None) nt.assert_is(p.value_label, None) def test_wide_array_input_data(self): p = cat._CategoricalPlotter() # Test 2D input in array form p.establish_variables(data=self.x) nt.assert_equal(np.shape(p.plot_data), (3, self.n_total / 3)) npt.assert_array_equal(p.plot_data, self.x.T) nt.assert_is(p.group_label, None) nt.assert_is(p.value_label, None) def test_single_long_direct_inputs(self): p = cat._CategoricalPlotter() # Test passing a series to the x variable p.establish_variables(x=self.y) npt.assert_equal(p.plot_data, [self.y]) nt.assert_equal(p.orient, "h") nt.assert_equal(p.value_label, "y_data") nt.assert_is(p.group_label, None) # Test passing a series to the y variable p.establish_variables(y=self.y) npt.assert_equal(p.plot_data, [self.y]) nt.assert_equal(p.orient, "v") nt.assert_equal(p.value_label, "y_data") nt.assert_is(p.group_label, None) # Test passing an array to the y variable p.establish_variables(y=self.y.values) npt.assert_equal(p.plot_data, [self.y]) nt.assert_equal(p.orient, "v") nt.assert_is(p.value_label, None) nt.assert_is(p.group_label, None) def test_single_long_indirect_inputs(self): p = cat._CategoricalPlotter() # Test referencing a DataFrame series in the x variable p.establish_variables(x="y", data=self.df) npt.assert_equal(p.plot_data, [self.y]) nt.assert_equal(p.orient, "h") nt.assert_equal(p.value_label, "y") nt.assert_is(p.group_label, None) # Test referencing a DataFrame series in the y variable p.establish_variables(y="y", data=self.df) npt.assert_equal(p.plot_data, [self.y]) nt.assert_equal(p.orient, "v") nt.assert_equal(p.value_label, "y") nt.assert_is(p.group_label, None) def test_longform_groupby(self): p = cat._CategoricalPlotter() # Test a vertically oriented grouped and nested plot p.establish_variables("g", "y", "h", data=self.df) nt.assert_equal(len(p.plot_data), 3) nt.assert_equal(len(p.plot_hues), 3) nt.assert_equal(p.orient, "v") nt.assert_equal(p.value_label, "y") nt.assert_equal(p.group_label, "g") nt.assert_equal(p.hue_title, "h") for group, vals in zip(["a", "b", "c"], p.plot_data): npt.assert_array_equal(vals, self.y[self.g == group]) for group, hues in zip(["a", "b", "c"], p.plot_hues): npt.assert_array_equal(hues, self.h[self.g == group]) # Test a grouped and nested plot with direct array value data p.establish_variables("g", self.y.values, "h", self.df) nt.assert_is(p.value_label, None) nt.assert_equal(p.group_label, "g") for group, vals in zip(["a", "b", "c"], p.plot_data): npt.assert_array_equal(vals, self.y[self.g == group]) # Test a grouped and nested plot with direct array hue data p.establish_variables("g", "y", self.h.values, self.df) for group, hues in zip(["a", "b", "c"], p.plot_hues): npt.assert_array_equal(hues, self.h[self.g == group]) # Test categorical grouping data if pandas_has_categoricals: df = self.df.copy() df.g = df.g.astype("category") # Test that horizontal orientation is automatically detected p.establish_variables("y", "g", "h", data=df) nt.assert_equal(len(p.plot_data), 3) nt.assert_equal(len(p.plot_hues), 3) nt.assert_equal(p.orient, "h") nt.assert_equal(p.value_label, "y") nt.assert_equal(p.group_label, "g") nt.assert_equal(p.hue_title, "h") for group, vals in zip(["a", "b", "c"], p.plot_data): npt.assert_array_equal(vals, self.y[self.g == group]) for group, hues in zip(["a", "b", "c"], p.plot_hues): npt.assert_array_equal(hues, self.h[self.g == group]) def test_input_validation(self): p = cat._CategoricalPlotter() kws = dict(x="g", y="y", hue="h", units="u", data=self.df) for input in ["x", "y", "hue", "units"]: input_kws = kws.copy() input_kws[input] = "bad_input" with nt.assert_raises(ValueError): p.establish_variables(**input_kws) def test_order(self): p = cat._CategoricalPlotter() # Test inferred order from a wide dataframe input p.establish_variables(data=self.x_df) nt.assert_equal(p.group_names, ["X", "Y", "Z"]) # Test specified order with a wide dataframe input p.establish_variables(data=self.x_df, order=["Y", "Z", "X"]) nt.assert_equal(p.group_names, ["Y", "Z", "X"]) for group, vals in zip(["Y", "Z", "X"], p.plot_data): npt.assert_array_equal(vals, self.x_df[group]) with nt.assert_raises(ValueError): p.establish_variables(data=self.x, order=[1, 2, 0]) # Test inferred order from a grouped longform input p.establish_variables("g", "y", data=self.df) nt.assert_equal(p.group_names, ["a", "b", "c"]) # Test specified order from a grouped longform input p.establish_variables("g", "y", data=self.df, order=["b", "a", "c"]) nt.assert_equal(p.group_names, ["b", "a", "c"]) for group, vals in zip(["b", "a", "c"], p.plot_data): npt.assert_array_equal(vals, self.y[self.g == group]) # Test inferred order from a grouped input with categorical groups if pandas_has_categoricals: df = self.df.copy() df.g = df.g.astype("category") df.g = df.g.cat.reorder_categories(["c", "b", "a"]) p.establish_variables("g", "y", data=df) nt.assert_equal(p.group_names, ["c", "b", "a"]) for group, vals in zip(["c", "b", "a"], p.plot_data): npt.assert_array_equal(vals, self.y[self.g == group]) df.g = (df.g.cat.add_categories("d") .cat.reorder_categories(["c", "b", "d", "a"])) p.establish_variables("g", "y", data=df) nt.assert_equal(p.group_names, ["c", "b", "d", "a"]) def test_hue_order(self): p = cat._CategoricalPlotter() # Test inferred hue order p.establish_variables("g", "y", "h", data=self.df) nt.assert_equal(p.hue_names, ["m", "n"]) # Test specified hue order p.establish_variables("g", "y", "h", data=self.df, hue_order=["n", "m"]) nt.assert_equal(p.hue_names, ["n", "m"]) # Test inferred hue order from a categorical hue input if pandas_has_categoricals: df = self.df.copy() df.h = df.h.astype("category") df.h = df.h.cat.reorder_categories(["n", "m"]) p.establish_variables("g", "y", "h", data=df) nt.assert_equal(p.hue_names, ["n", "m"]) df.h = (df.h.cat.add_categories("o") .cat.reorder_categories(["o", "m", "n"])) p.establish_variables("g", "y", "h", data=df) nt.assert_equal(p.hue_names, ["o", "m", "n"]) def test_plot_units(self): p = cat._CategoricalPlotter() p.establish_variables("g", "y", "h", data=self.df) nt.assert_is(p.plot_units, None) p.establish_variables("g", "y", "h", data=self.df, units="u") for group, units in zip(["a", "b", "c"], p.plot_units): npt.assert_array_equal(units, self.u[self.g == group]) def test_infer_orient(self): p = cat._CategoricalPlotter() cats = pd.Series(["a", "b", "c"] * 10) nums = pd.Series(self.rs.randn(30)) nt.assert_equal(p.infer_orient(cats, nums), "v") nt.assert_equal(p.infer_orient(nums, cats), "h") nt.assert_equal(p.infer_orient(nums, None), "h") nt.assert_equal(p.infer_orient(None, nums), "v") nt.assert_equal(p.infer_orient(nums, nums, "vert"), "v") nt.assert_equal(p.infer_orient(nums, nums, "hori"), "h") with nt.assert_raises(ValueError): p.infer_orient(cats, cats) if pandas_has_categoricals: cats = pd.Series([0, 1, 2] * 10, dtype="category") nt.assert_equal(p.infer_orient(cats, nums), "v") nt.assert_equal(p.infer_orient(nums, cats), "h") with nt.assert_raises(ValueError): p.infer_orient(cats, cats) def test_default_palettes(self): p = cat._CategoricalPlotter() # Test palette mapping the x position p.establish_variables("g", "y", data=self.df) p.establish_colors(None, None, 1) nt.assert_equal(p.colors, palettes.color_palette("deep", 3)) # Test palette mapping the hue position p.establish_variables("g", "y", "h", data=self.df) p.establish_colors(None, None, 1) nt.assert_equal(p.colors, palettes.color_palette("deep", 2)) def test_default_palette_with_many_levels(self): with palettes.color_palette(["blue", "red"], 2): p = cat._CategoricalPlotter() p.establish_variables("g", "y", data=self.df) p.establish_colors(None, None, 1) npt.assert_array_equal(p.colors, palettes.husl_palette(3, l=.7)) def test_specific_color(self): p = cat._CategoricalPlotter() # Test the same color for each x position p.establish_variables("g", "y", data=self.df) p.establish_colors("blue", None, 1) blue_rgb = mpl.colors.colorConverter.to_rgb("blue") nt.assert_equal(p.colors, [blue_rgb] * 3) # Test a color-based blend for the hue mapping p.establish_variables("g", "y", "h", data=self.df) p.establish_colors("#ff0022", None, 1) rgba_array = np.array(palettes.light_palette("#ff0022", 2)) npt.assert_array_almost_equal(p.colors, rgba_array[:, :3]) def test_specific_palette(self): p = cat._CategoricalPlotter() # Test palette mapping the x position p.establish_variables("g", "y", data=self.df) p.establish_colors(None, "dark", 1) nt.assert_equal(p.colors, palettes.color_palette("dark", 3)) # Test that non-None `color` and `hue` raises an error p.establish_variables("g", "y", "h", data=self.df) p.establish_colors(None, "muted", 1) nt.assert_equal(p.colors, palettes.color_palette("muted", 2)) # Test that specified palette overrides specified color p = cat._CategoricalPlotter() p.establish_variables("g", "y", data=self.df) p.establish_colors("blue", "deep", 1) nt.assert_equal(p.colors, palettes.color_palette("deep", 3)) def test_dict_as_palette(self): p = cat._CategoricalPlotter() p.establish_variables("g", "y", "h", data=self.df) pal = {"m": (0, 0, 1), "n": (1, 0, 0)} p.establish_colors(None, pal, 1) nt.assert_equal(p.colors, [(0, 0, 1), (1, 0, 0)]) def test_palette_desaturation(self): p = cat._CategoricalPlotter() p.establish_variables("g", "y", data=self.df) p.establish_colors((0, 0, 1), None, .5) nt.assert_equal(p.colors, [(.25, .25, .75)] * 3) p.establish_colors(None, [(0, 0, 1), (1, 0, 0), "w"], .5) nt.assert_equal(p.colors, [(.25, .25, .75), (.75, .25, .25), (1, 1, 1)]) class TestCategoricalStatPlotter(CategoricalFixture): def test_no_bootstrappig(self): p = cat._CategoricalStatPlotter() p.establish_variables("g", "y", data=self.df) p.estimate_statistic(np.mean, None, 100) npt.assert_array_equal(p.confint, np.array([])) p.establish_variables("g", "y", "h", data=self.df) p.estimate_statistic(np.mean, None, 100) npt.assert_array_equal(p.confint, np.array([[], [], []])) def test_single_layer_stats(self): p = cat._CategoricalStatPlotter() g = pd.Series(np.repeat(list("abc"), 100)) y = pd.Series(np.random.RandomState(0).randn(300)) p.establish_variables(g, y) p.estimate_statistic(np.mean, 95, 10000) nt.assert_equal(p.statistic.shape, (3,)) nt.assert_equal(p.confint.shape, (3, 2)) npt.assert_array_almost_equal(p.statistic, y.groupby(g).mean()) for ci, (_, grp_y) in zip(p.confint, y.groupby(g)): sem = stats.sem(grp_y) mean = grp_y.mean() stats.norm.ppf(.975) half_ci = stats.norm.ppf(.975) * sem ci_want = mean - half_ci, mean + half_ci npt.assert_array_almost_equal(ci_want, ci, 2) def test_single_layer_stats_with_units(self): p = cat._CategoricalStatPlotter() g = pd.Series(np.repeat(list("abc"), 90)) y = pd.Series(np.random.RandomState(0).randn(270)) u = pd.Series(np.repeat(np.tile(list("xyz"), 30), 3)) y[u == "x"] -= 3 y[u == "y"] += 3 p.establish_variables(g, y) p.estimate_statistic(np.mean, 95, 10000) stat1, ci1 = p.statistic, p.confint p.establish_variables(g, y, units=u) p.estimate_statistic(np.mean, 95, 10000) stat2, ci2 = p.statistic, p.confint npt.assert_array_equal(stat1, stat2) ci1_size = ci1[:, 1] - ci1[:, 0] ci2_size = ci2[:, 1] - ci2[:, 0] npt.assert_array_less(ci1_size, ci2_size) def test_single_layer_stats_with_missing_data(self): p = cat._CategoricalStatPlotter() g = pd.Series(np.repeat(list("abc"), 100)) y = pd.Series(np.random.RandomState(0).randn(300)) p.establish_variables(g, y, order=list("abdc")) p.estimate_statistic(np.mean, 95, 10000) nt.assert_equal(p.statistic.shape, (4,)) nt.assert_equal(p.confint.shape, (4, 2)) mean = y[g == "b"].mean() sem = stats.sem(y[g == "b"]) half_ci = stats.norm.ppf(.975) * sem ci = mean - half_ci, mean + half_ci npt.assert_equal(p.statistic[1], mean) npt.assert_array_almost_equal(p.confint[1], ci, 2) npt.assert_equal(p.statistic[2], np.nan) npt.assert_array_equal(p.confint[2], (np.nan, np.nan)) def test_nested_stats(self): p = cat._CategoricalStatPlotter() g = pd.Series(np.repeat(list("abc"), 100)) h = pd.Series(np.tile(list("xy"), 150)) y = pd.Series(np.random.RandomState(0).randn(300)) p.establish_variables(g, y, h) p.estimate_statistic(np.mean, 95, 50000) nt.assert_equal(p.statistic.shape, (3, 2)) nt.assert_equal(p.confint.shape, (3, 2, 2)) npt.assert_array_almost_equal(p.statistic, y.groupby([g, h]).mean().unstack()) for ci_g, (_, grp_y) in zip(p.confint, y.groupby(g)): for ci, hue_y in zip(ci_g, [grp_y[::2], grp_y[1::2]]): sem = stats.sem(hue_y) mean = hue_y.mean() half_ci = stats.norm.ppf(.975) * sem ci_want = mean - half_ci, mean + half_ci npt.assert_array_almost_equal(ci_want, ci, 2) def test_nested_stats_with_units(self): p = cat._CategoricalStatPlotter() g = pd.Series(np.repeat(list("abc"), 90)) h = pd.Series(np.tile(list("xy"), 135)) u = pd.Series(np.repeat(list("ijkijk"), 45)) y = pd.Series(np.random.RandomState(0).randn(270)) y[u == "i"] -= 3 y[u == "k"] += 3 p.establish_variables(g, y, h) p.estimate_statistic(np.mean, 95, 10000) stat1, ci1 = p.statistic, p.confint p.establish_variables(g, y, h, units=u) p.estimate_statistic(np.mean, 95, 10000) stat2, ci2 = p.statistic, p.confint npt.assert_array_equal(stat1, stat2) ci1_size = ci1[:, 0, 1] - ci1[:, 0, 0] ci2_size = ci2[:, 0, 1] - ci2[:, 0, 0] npt.assert_array_less(ci1_size, ci2_size) def test_nested_stats_with_missing_data(self): p = cat._CategoricalStatPlotter() g = pd.Series(np.repeat(list("abc"), 100)) y = pd.Series(np.random.RandomState(0).randn(300)) h = pd.Series(np.tile(list("xy"), 150)) p.establish_variables(g, y, h, order=list("abdc"), hue_order=list("zyx")) p.estimate_statistic(np.mean, 95, 50000) nt.assert_equal(p.statistic.shape, (4, 3)) nt.assert_equal(p.confint.shape, (4, 3, 2)) mean = y[(g == "b") & (h == "x")].mean() sem = stats.sem(y[(g == "b") & (h == "x")]) half_ci = stats.norm.ppf(.975) * sem ci = mean - half_ci, mean + half_ci npt.assert_equal(p.statistic[1, 2], mean) npt.assert_array_almost_equal(p.confint[1, 2], ci, 2) npt.assert_array_equal(p.statistic[:, 0], [np.nan] * 4) npt.assert_array_equal(p.statistic[2], [np.nan] * 3) npt.assert_array_equal(p.confint[:, 0], np.zeros((4, 2)) * np.nan) npt.assert_array_equal(p.confint[2], np.zeros((3, 2)) * np.nan) def test_estimator_value_label(self): p = cat._CategoricalStatPlotter() p.establish_variables("g", "y", data=self.df) p.estimate_statistic(np.mean, None, 100) nt.assert_equal(p.value_label, "mean(y)") p = cat._CategoricalStatPlotter() p.establish_variables("g", "y", data=self.df) p.estimate_statistic(np.median, None, 100) nt.assert_equal(p.value_label, "median(y)") def test_draw_cis(self): p = cat._CategoricalStatPlotter() # Test vertical CIs p.orient = "v" f, ax = plt.subplots() at_group = [0, 1] confints = [(.5, 1.5), (.25, .8)] colors = [".2", ".3"] p.draw_confints(ax, at_group, confints, colors) lines = ax.lines for line, at, ci, c in zip(lines, at_group, confints, colors): x, y = line.get_xydata().T npt.assert_array_equal(x, [at, at]) npt.assert_array_equal(y, ci) nt.assert_equal(line.get_color(), c) plt.close("all") # Test horizontal CIs p.orient = "h" f, ax = plt.subplots() p.draw_confints(ax, at_group, confints, colors) lines = ax.lines for line, at, ci, c in zip(lines, at_group, confints, colors): x, y = line.get_xydata().T npt.assert_array_equal(x, ci) npt.assert_array_equal(y, [at, at]) nt.assert_equal(line.get_color(), c) plt.close("all") # Test extra keyword arguments f, ax = plt.subplots() p.draw_confints(ax, at_group, confints, colors, lw=4) line = ax.lines[0] nt.assert_equal(line.get_linewidth(), 4) plt.close("all") class TestBoxPlotter(CategoricalFixture): default_kws = dict(x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=.75, width=.8, fliersize=5, linewidth=None) def test_nested_width(self): p = cat._BoxPlotter(**self.default_kws) p.establish_variables("g", "y", "h", data=self.df) nt.assert_equal(p.nested_width, .4 * .98) kws = self.default_kws.copy() kws["width"] = .6 p = cat._BoxPlotter(**kws) p.establish_variables("g", "y", "h", data=self.df) nt.assert_equal(p.nested_width, .3 * .98) def test_hue_offsets(self): p = cat._BoxPlotter(**self.default_kws) p.establish_variables("g", "y", "h", data=self.df) npt.assert_array_equal(p.hue_offsets, [-.2, .2]) kws = self.default_kws.copy() kws["width"] = .6 p = cat._BoxPlotter(**kws) p.establish_variables("g", "y", "h", data=self.df) npt.assert_array_equal(p.hue_offsets, [-.15, .15]) p = cat._BoxPlotter(**kws) p.establish_variables("h", "y", "g", data=self.df) npt.assert_array_almost_equal(p.hue_offsets, [-.2, 0, .2]) def test_axes_data(self): ax = cat.boxplot("g", "y", data=self.df) nt.assert_equal(len(ax.artists), 3) plt.close("all") ax = cat.boxplot("g", "y", "h", data=self.df) nt.assert_equal(len(ax.artists), 6) plt.close("all") def test_box_colors(self): ax = cat.boxplot("g", "y", data=self.df, saturation=1) pal = palettes.color_palette("deep", 3) for patch, color in zip(ax.artists, pal): nt.assert_equal(patch.get_facecolor()[:3], color) plt.close("all") ax = cat.boxplot("g", "y", "h", data=self.df, saturation=1) pal = palettes.color_palette("deep", 2) for patch, color in zip(ax.artists, pal * 2): nt.assert_equal(patch.get_facecolor()[:3], color) plt.close("all") def test_draw_missing_boxes(self): ax = cat.boxplot("g", "y", data=self.df, order=["a", "b", "c", "d"]) nt.assert_equal(len(ax.artists), 3) plt.close("all") def test_missing_data(self): x = ["a", "a", "b", "b", "c", "c", "d", "d"] h = ["x", "y", "x", "y", "x", "y", "x", "y"] y = self.rs.randn(8) y[-2:] = np.nan ax = cat.boxplot(x, y) nt.assert_equal(len(ax.artists), 3) plt.close("all") y[-1] = 0 ax = cat.boxplot(x, y, h) nt.assert_equal(len(ax.artists), 7) plt.close("all") def test_boxplots(self): # Smoke test the high level boxplot options cat.boxplot("y", data=self.df) plt.close("all") cat.boxplot(y="y", data=self.df) plt.close("all") cat.boxplot("g", "y", data=self.df) plt.close("all") cat.boxplot("y", "g", data=self.df, orient="h") plt.close("all") cat.boxplot("g", "y", "h", data=self.df) plt.close("all") cat.boxplot("g", "y", "h", order=list("nabc"), data=self.df) plt.close("all") cat.boxplot("g", "y", "h", hue_order=list("omn"), data=self.df) plt.close("all") cat.boxplot("y", "g", "h", data=self.df, orient="h") plt.close("all") def test_axes_annotation(self): ax = cat.boxplot("g", "y", data=self.df) nt.assert_equal(ax.get_xlabel(), "g") nt.assert_equal(ax.get_ylabel(), "y") nt.assert_equal(ax.get_xlim(), (-.5, 2.5)) npt.assert_array_equal(ax.get_xticks(), [0, 1, 2]) npt.assert_array_equal([l.get_text() for l in ax.get_xticklabels()], ["a", "b", "c"]) plt.close("all") ax = cat.boxplot("g", "y", "h", data=self.df) nt.assert_equal(ax.get_xlabel(), "g") nt.assert_equal(ax.get_ylabel(), "y") npt.assert_array_equal(ax.get_xticks(), [0, 1, 2]) npt.assert_array_equal([l.get_text() for l in ax.get_xticklabels()], ["a", "b", "c"]) npt.assert_array_equal([l.get_text() for l in ax.legend_.get_texts()], ["m", "n"]) plt.close("all") ax = cat.boxplot("y", "g", data=self.df, orient="h") nt.assert_equal(ax.get_xlabel(), "y") nt.assert_equal(ax.get_ylabel(), "g") nt.assert_equal(ax.get_ylim(), (2.5, -.5)) npt.assert_array_equal(ax.get_yticks(), [0, 1, 2]) npt.assert_array_equal([l.get_text() for l in ax.get_yticklabels()], ["a", "b", "c"]) plt.close("all") class TestViolinPlotter(CategoricalFixture): default_kws = dict(x=None, y=None, hue=None, data=None, order=None, hue_order=None, bw="scott", cut=2, scale="area", scale_hue=True, gridsize=100, width=.8, inner="box", split=False, orient=None, linewidth=None, color=None, palette=None, saturation=.75) def test_split_error(self): kws = self.default_kws.copy() kws.update(dict(x="h", y="y", hue="g", data=self.df, split=True)) with nt.assert_raises(ValueError): cat._ViolinPlotter(**kws) def test_no_observations(self): p = cat._ViolinPlotter(**self.default_kws) x = ["a", "a", "b"] y = self.rs.randn(3) y[-1] = np.nan p.establish_variables(x, y) p.estimate_densities("scott", 2, "area", True, 20) nt.assert_equal(len(p.support[0]), 20) nt.assert_equal(len(p.support[1]), 0) nt.assert_equal(len(p.density[0]), 20) nt.assert_equal(len(p.density[1]), 1) nt.assert_equal(p.density[1].item(), 1) p.estimate_densities("scott", 2, "count", True, 20) nt.assert_equal(p.density[1].item(), 0) x = ["a"] * 4 + ["b"] * 2 y = self.rs.randn(6) h = ["m", "n"] * 2 + ["m"] * 2 p.establish_variables(x, y, h) p.estimate_densities("scott", 2, "area", True, 20) nt.assert_equal(len(p.support[1][0]), 20) nt.assert_equal(len(p.support[1][1]), 0) nt.assert_equal(len(p.density[1][0]), 20) nt.assert_equal(len(p.density[1][1]), 1) nt.assert_equal(p.density[1][1].item(), 1) p.estimate_densities("scott", 2, "count", False, 20) nt.assert_equal(p.density[1][1].item(), 0) def test_single_observation(self): p = cat._ViolinPlotter(**self.default_kws) x = ["a", "a", "b"] y = self.rs.randn(3) p.establish_variables(x, y) p.estimate_densities("scott", 2, "area", True, 20) nt.assert_equal(len(p.support[0]), 20) nt.assert_equal(len(p.support[1]), 1) nt.assert_equal(len(p.density[0]), 20) nt.assert_equal(len(p.density[1]), 1) nt.assert_equal(p.density[1].item(), 1) p.estimate_densities("scott", 2, "count", True, 20) nt.assert_equal(p.density[1].item(), .5) x = ["b"] * 4 + ["a"] * 3 y = self.rs.randn(7) h = (["m", "n"] * 4)[:-1] p.establish_variables(x, y, h) p.estimate_densities("scott", 2, "area", True, 20) nt.assert_equal(len(p.support[1][0]), 20) nt.assert_equal(len(p.support[1][1]), 1) nt.assert_equal(len(p.density[1][0]), 20) nt.assert_equal(len(p.density[1][1]), 1) nt.assert_equal(p.density[1][1].item(), 1) p.estimate_densities("scott", 2, "count", False, 20) nt.assert_equal(p.density[1][1].item(), .5) def test_dwidth(self): kws = self.default_kws.copy() kws.update(dict(x="g", y="y", data=self.df)) p = cat._ViolinPlotter(**kws) nt.assert_equal(p.dwidth, .4) kws.update(dict(width=.4)) p = cat._ViolinPlotter(**kws) nt.assert_equal(p.dwidth, .2) kws.update(dict(hue="h", width=.8)) p = cat._ViolinPlotter(**kws) nt.assert_equal(p.dwidth, .2) kws.update(dict(split=True)) p = cat._ViolinPlotter(**kws) nt.assert_equal(p.dwidth, .4) def test_scale_area(self): kws = self.default_kws.copy() kws["scale"] = "area" p = cat._ViolinPlotter(**kws) # Test single layer of grouping p.hue_names = None density = [self.rs.uniform(0, .8, 50), self.rs.uniform(0, .2, 50)] max_before = np.array([d.max() for d in density]) p.scale_area(density, max_before, False) max_after = np.array([d.max() for d in density]) nt.assert_equal(max_after[0], 1) before_ratio = max_before[1] / max_before[0] after_ratio = max_after[1] / max_after[0] nt.assert_equal(before_ratio, after_ratio) # Test nested grouping scaling across all densities p.hue_names = ["foo", "bar"] density = [[self.rs.uniform(0, .8, 50), self.rs.uniform(0, .2, 50)], [self.rs.uniform(0, .1, 50), self.rs.uniform(0, .02, 50)]] max_before = np.array([[r.max() for r in row] for row in density]) p.scale_area(density, max_before, False) max_after = np.array([[r.max() for r in row] for row in density]) nt.assert_equal(max_after[0, 0], 1) before_ratio = max_before[1, 1] / max_before[0, 0] after_ratio = max_after[1, 1] / max_after[0, 0] nt.assert_equal(before_ratio, after_ratio) # Test nested grouping scaling within hue p.hue_names = ["foo", "bar"] density = [[self.rs.uniform(0, .8, 50), self.rs.uniform(0, .2, 50)], [self.rs.uniform(0, .1, 50), self.rs.uniform(0, .02, 50)]] max_before = np.array([[r.max() for r in row] for row in density]) p.scale_area(density, max_before, True) max_after = np.array([[r.max() for r in row] for row in density]) nt.assert_equal(max_after[0, 0], 1) nt.assert_equal(max_after[1, 0], 1) before_ratio = max_before[1, 1] / max_before[1, 0] after_ratio = max_after[1, 1] / max_after[1, 0] nt.assert_equal(before_ratio, after_ratio) def test_scale_width(self): kws = self.default_kws.copy() kws["scale"] = "width" p = cat._ViolinPlotter(**kws) # Test single layer of grouping p.hue_names = None density = [self.rs.uniform(0, .8, 50), self.rs.uniform(0, .2, 50)] p.scale_width(density) max_after = np.array([d.max() for d in density]) npt.assert_array_equal(max_after, [1, 1]) # Test nested grouping p.hue_names = ["foo", "bar"] density = [[self.rs.uniform(0, .8, 50), self.rs.uniform(0, .2, 50)], [self.rs.uniform(0, .1, 50), self.rs.uniform(0, .02, 50)]] p.scale_width(density) max_after = np.array([[r.max() for r in row] for row in density]) npt.assert_array_equal(max_after, [[1, 1], [1, 1]]) def test_scale_count(self): kws = self.default_kws.copy() kws["scale"] = "count" p = cat._ViolinPlotter(**kws) # Test single layer of grouping p.hue_names = None density = [self.rs.uniform(0, .8, 20), self.rs.uniform(0, .2, 40)] counts = np.array([20, 40]) p.scale_count(density, counts, False) max_after = np.array([d.max() for d in density]) npt.assert_array_equal(max_after, [.5, 1]) # Test nested grouping scaling across all densities p.hue_names = ["foo", "bar"] density = [[self.rs.uniform(0, .8, 5), self.rs.uniform(0, .2, 40)], [self.rs.uniform(0, .1, 100), self.rs.uniform(0, .02, 50)]] counts = np.array([[5, 40], [100, 50]]) p.scale_count(density, counts, False) max_after = np.array([[r.max() for r in row] for row in density]) npt.assert_array_equal(max_after, [[.05, .4], [1, .5]]) # Test nested grouping scaling within hue p.hue_names = ["foo", "bar"] density = [[self.rs.uniform(0, .8, 5), self.rs.uniform(0, .2, 40)], [self.rs.uniform(0, .1, 100), self.rs.uniform(0, .02, 50)]] counts = np.array([[5, 40], [100, 50]]) p.scale_count(density, counts, True) max_after = np.array([[r.max() for r in row] for row in density]) npt.assert_array_equal(max_after, [[.125, 1], [1, .5]]) def test_bad_scale(self): kws = self.default_kws.copy() kws["scale"] = "not_a_scale_type" with nt.assert_raises(ValueError): cat._ViolinPlotter(**kws) def test_kde_fit(self): p = cat._ViolinPlotter(**self.default_kws) data = self.y data_std = data.std(ddof=1) # Bandwidth behavior depends on scipy version if LooseVersion(scipy.__version__) < "0.11": # Test ignoring custom bandwidth on old scipy kde, bw = p.fit_kde(self.y, .2) nt.assert_is_instance(kde, stats.gaussian_kde) nt.assert_equal(kde.factor, kde.scotts_factor()) else: # Test reference rule bandwidth kde, bw = p.fit_kde(data, "scott") nt.assert_is_instance(kde, stats.gaussian_kde) nt.assert_equal(kde.factor, kde.scotts_factor()) nt.assert_equal(bw, kde.scotts_factor() * data_std) # Test numeric scale factor kde, bw = p.fit_kde(self.y, .2) nt.assert_is_instance(kde, stats.gaussian_kde) nt.assert_equal(kde.factor, .2) nt.assert_equal(bw, .2 * data_std) def test_draw_to_density(self): p = cat._ViolinPlotter(**self.default_kws) # p.dwidth will be 1 for easier testing p.width = 2 # Test verical plots support = np.array([.2, .6]) density = np.array([.1, .4]) # Test full vertical plot _, ax = plt.subplots() p.draw_to_density(ax, 0, .5, support, density, False) x, y = ax.lines[0].get_xydata().T npt.assert_array_equal(x, [.99 * -.4, .99 * .4]) npt.assert_array_equal(y, [.5, .5]) plt.close("all") # Test left vertical plot _, ax = plt.subplots() p.draw_to_density(ax, 0, .5, support, density, "left") x, y = ax.lines[0].get_xydata().T npt.assert_array_equal(x, [.99 * -.4, 0]) npt.assert_array_equal(y, [.5, .5]) plt.close("all") # Test right vertical plot _, ax = plt.subplots() p.draw_to_density(ax, 0, .5, support, density, "right") x, y = ax.lines[0].get_xydata().T npt.assert_array_equal(x, [0, .99 * .4]) npt.assert_array_equal(y, [.5, .5]) plt.close("all") # Switch orientation to test horizontal plots p.orient = "h" support = np.array([.2, .5]) density = np.array([.3, .7]) # Test full horizontal plot _, ax = plt.subplots() p.draw_to_density(ax, 0, .6, support, density, False) x, y = ax.lines[0].get_xydata().T npt.assert_array_equal(x, [.6, .6]) npt.assert_array_equal(y, [.99 * -.7, .99 * .7]) plt.close("all") # Test left horizontal plot _, ax = plt.subplots() p.draw_to_density(ax, 0, .6, support, density, "left") x, y = ax.lines[0].get_xydata().T npt.assert_array_equal(x, [.6, .6]) npt.assert_array_equal(y, [.99 * -.7, 0]) plt.close("all") # Test right horizontal plot _, ax = plt.subplots() p.draw_to_density(ax, 0, .6, support, density, "right") x, y = ax.lines[0].get_xydata().T npt.assert_array_equal(x, [.6, .6]) npt.assert_array_equal(y, [0, .99 * .7]) plt.close("all") def test_draw_single_observations(self): p = cat._ViolinPlotter(**self.default_kws) p.width = 2 # Test vertical plot _, ax = plt.subplots() p.draw_single_observation(ax, 1, 1.5, 1) x, y = ax.lines[0].get_xydata().T npt.assert_array_equal(x, [0, 2]) npt.assert_array_equal(y, [1.5, 1.5]) plt.close("all") # Test horizontal plot p.orient = "h" _, ax = plt.subplots() p.draw_single_observation(ax, 2, 2.2, .5) x, y = ax.lines[0].get_xydata().T npt.assert_array_equal(x, [2.2, 2.2]) npt.assert_array_equal(y, [1.5, 2.5]) plt.close("all") def test_draw_box_lines(self): # Test vertical plot kws = self.default_kws.copy() kws.update(dict(y="y", data=self.df, inner=None)) p = cat._ViolinPlotter(**kws) _, ax = plt.subplots() p.draw_box_lines(ax, self.y, p.support[0], p.density[0], 0) nt.assert_equal(len(ax.lines), 2) q25, q50, q75 = np.percentile(self.y, [25, 50, 75]) _, y = ax.lines[1].get_xydata().T npt.assert_array_equal(y, [q25, q75]) _, y = ax.collections[0].get_offsets().T nt.assert_equal(y, q50) plt.close("all") # Test horizontal plot kws = self.default_kws.copy() kws.update(dict(x="y", data=self.df, inner=None)) p = cat._ViolinPlotter(**kws) _, ax = plt.subplots() p.draw_box_lines(ax, self.y, p.support[0], p.density[0], 0) nt.assert_equal(len(ax.lines), 2) q25, q50, q75 = np.percentile(self.y, [25, 50, 75]) x, _ = ax.lines[1].get_xydata().T npt.assert_array_equal(x, [q25, q75]) x, _ = ax.collections[0].get_offsets().T nt.assert_equal(x, q50) plt.close("all") def test_draw_quartiles(self): kws = self.default_kws.copy() kws.update(dict(y="y", data=self.df, inner=None)) p = cat._ViolinPlotter(**kws) _, ax = plt.subplots() p.draw_quartiles(ax, self.y, p.support[0], p.density[0], 0) for val, line in zip(np.percentile(self.y, [25, 50, 75]), ax.lines): _, y = line.get_xydata().T npt.assert_array_equal(y, [val, val]) plt.close("all") def test_draw_points(self): p = cat._ViolinPlotter(**self.default_kws) # Test vertical plot _, ax = plt.subplots() p.draw_points(ax, self.y, 0) x, y = ax.collections[0].get_offsets().T npt.assert_array_equal(x, np.zeros_like(self.y)) npt.assert_array_equal(y, self.y) plt.close("all") # Test horizontal plot p.orient = "h" _, ax = plt.subplots() p.draw_points(ax, self.y, 0) x, y = ax.collections[0].get_offsets().T npt.assert_array_equal(x, self.y) npt.assert_array_equal(y, np.zeros_like(self.y)) plt.close("all") def test_draw_sticks(self): kws = self.default_kws.copy() kws.update(dict(y="y", data=self.df, inner=None)) p = cat._ViolinPlotter(**kws) # Test vertical plot _, ax = plt.subplots() p.draw_stick_lines(ax, self.y, p.support[0], p.density[0], 0) for val, line in zip(self.y, ax.lines): _, y = line.get_xydata().T npt.assert_array_equal(y, [val, val]) plt.close("all") # Test horizontal plot p.orient = "h" _, ax = plt.subplots() p.draw_stick_lines(ax, self.y, p.support[0], p.density[0], 0) for val, line in zip(self.y, ax.lines): x, _ = line.get_xydata().T npt.assert_array_equal(x, [val, val]) plt.close("all") def test_validate_inner(self): kws = self.default_kws.copy() kws.update(dict(inner="bad_inner")) with nt.assert_raises(ValueError): cat._ViolinPlotter(**kws) def test_draw_violinplots(self): kws = self.default_kws.copy() # Test single vertical violin kws.update(dict(y="y", data=self.df, inner=None, saturation=1, color=(1, 0, 0, 1))) p = cat._ViolinPlotter(**kws) _, ax = plt.subplots() p.draw_violins(ax) nt.assert_equal(len(ax.collections), 1) npt.assert_array_equal(ax.collections[0].get_facecolors(), [(1, 0, 0, 1)]) plt.close("all") # Test single horizontal violin kws.update(dict(x="y", y=None, color=(0, 1, 0, 1))) p = cat._ViolinPlotter(**kws) _, ax = plt.subplots() p.draw_violins(ax) nt.assert_equal(len(ax.collections), 1) npt.assert_array_equal(ax.collections[0].get_facecolors(), [(0, 1, 0, 1)]) plt.close("all") # Test multiple vertical violins kws.update(dict(x="g", y="y", color=None,)) p = cat._ViolinPlotter(**kws) _, ax = plt.subplots() p.draw_violins(ax) nt.assert_equal(len(ax.collections), 3) for violin, color in zip(ax.collections, palettes.color_palette()): npt.assert_array_equal(violin.get_facecolors()[0, :-1], color) plt.close("all") # Test multiple violins with hue nesting kws.update(dict(hue="h")) p = cat._ViolinPlotter(**kws) _, ax = plt.subplots() p.draw_violins(ax) nt.assert_equal(len(ax.collections), 6) for violin, color in zip(ax.collections, palettes.color_palette(n_colors=2) * 3): npt.assert_array_equal(violin.get_facecolors()[0, :-1], color) plt.close("all") # Test multiple split violins kws.update(dict(split=True, palette="muted")) p = cat._ViolinPlotter(**kws) _, ax = plt.subplots() p.draw_violins(ax) nt.assert_equal(len(ax.collections), 6) for violin, color in zip(ax.collections, palettes.color_palette("muted", n_colors=2) * 3): npt.assert_array_equal(violin.get_facecolors()[0, :-1], color) plt.close("all") def test_draw_violinplots_no_observations(self): kws = self.default_kws.copy() kws["inner"] = None # Test single layer of grouping x = ["a", "a", "b"] y = self.rs.randn(3) y[-1] = np.nan kws.update(x=x, y=y) p = cat._ViolinPlotter(**kws) _, ax = plt.subplots() p.draw_violins(ax) nt.assert_equal(len(ax.collections), 1) nt.assert_equal(len(ax.lines), 0) plt.close("all") # Test nested hue grouping x = ["a"] * 4 + ["b"] * 2 y = self.rs.randn(6) h = ["m", "n"] * 2 + ["m"] * 2 kws.update(x=x, y=y, hue=h) p = cat._ViolinPlotter(**kws) _, ax = plt.subplots() p.draw_violins(ax) nt.assert_equal(len(ax.collections), 3) nt.assert_equal(len(ax.lines), 0) plt.close("all") def test_draw_violinplots_single_observations(self): kws = self.default_kws.copy() kws["inner"] = None # Test single layer of grouping x = ["a", "a", "b"] y = self.rs.randn(3) kws.update(x=x, y=y) p = cat._ViolinPlotter(**kws) _, ax = plt.subplots() p.draw_violins(ax) nt.assert_equal(len(ax.collections), 1) nt.assert_equal(len(ax.lines), 1) plt.close("all") # Test nested hue grouping x = ["b"] * 4 + ["a"] * 3 y = self.rs.randn(7) h = (["m", "n"] * 4)[:-1] kws.update(x=x, y=y, hue=h) p = cat._ViolinPlotter(**kws) _, ax = plt.subplots() p.draw_violins(ax) nt.assert_equal(len(ax.collections), 3) nt.assert_equal(len(ax.lines), 1) plt.close("all") # Test nested hue grouping with split kws["split"] = True p = cat._ViolinPlotter(**kws) _, ax = plt.subplots() p.draw_violins(ax) nt.assert_equal(len(ax.collections), 3) nt.assert_equal(len(ax.lines), 1) plt.close("all") def test_violinplots(self): # Smoke test the high level violinplot options cat.violinplot("y", data=self.df) plt.close("all") cat.violinplot(y="y", data=self.df) plt.close("all") cat.violinplot("g", "y", data=self.df) plt.close("all") cat.violinplot("y", "g", data=self.df, orient="h") plt.close("all") cat.violinplot("g", "y", "h", data=self.df) plt.close("all") cat.violinplot("g", "y", "h", order=list("nabc"), data=self.df) plt.close("all") cat.violinplot("g", "y", "h", hue_order=list("omn"), data=self.df) plt.close("all") cat.violinplot("y", "g", "h", data=self.df, orient="h") plt.close("all") for inner in ["box", "quart", "point", "stick", None]: cat.violinplot("g", "y", data=self.df, inner=inner) plt.close("all") cat.violinplot("g", "y", "h", data=self.df, inner=inner) plt.close("all") cat.violinplot("g", "y", "h", data=self.df, inner=inner, split=True) plt.close("all") class TestStripPlotter(CategoricalFixture): def test_stripplot_vertical(self): pal = palettes.color_palette() ax = cat.stripplot("g", "y", data=self.df) for i, (_, vals) in enumerate(self.y.groupby(self.g)): x, y = ax.collections[i].get_offsets().T npt.assert_array_equal(x, np.ones(len(x)) * i) npt.assert_array_equal(y, vals) npt.assert_equal(ax.collections[i].get_facecolors()[0, :3], pal[i]) plt.close("all") @skipif(not pandas_has_categoricals) def test_stripplot_horiztonal(self): df = self.df.copy() df.g = df.g.astype("category") ax = cat.stripplot("y", "g", data=df) for i, (_, vals) in enumerate(self.y.groupby(self.g)): x, y = ax.collections[i].get_offsets().T npt.assert_array_equal(x, vals) npt.assert_array_equal(y, np.ones(len(x)) * i) plt.close("all") def test_stripplot_jitter(self): pal = palettes.color_palette() ax = cat.stripplot("g", "y", data=self.df, jitter=True) for i, (_, vals) in enumerate(self.y.groupby(self.g)): x, y = ax.collections[i].get_offsets().T npt.assert_array_less(np.ones(len(x)) * i - .1, x) npt.assert_array_less(x, np.ones(len(x)) * i + .1) npt.assert_array_equal(y, vals) npt.assert_equal(ax.collections[i].get_facecolors()[0, :3], pal[i]) plt.close("all") def test_split_nested_stripplot_vertical(self): pal = palettes.color_palette() ax = cat.stripplot("g", "y", "h", data=self.df) for i, (_, group_vals) in enumerate(self.y.groupby(self.g)): for j, (_, vals) in enumerate(group_vals.groupby(self.h)): x, y = ax.collections[i * 2 + j].get_offsets().T npt.assert_array_equal(x, np.ones(len(x)) * i + [-.2, .2][j]) npt.assert_array_equal(y, vals) fc = ax.collections[i * 2 + j].get_facecolors()[0, :3] npt.assert_equal(fc, pal[j]) plt.close("all") @skipif(not pandas_has_categoricals) def test_split_nested_stripplot_horizontal(self): df = self.df.copy() df.g = df.g.astype("category") ax = cat.stripplot("y", "g", "h", data=df) for i, (_, group_vals) in enumerate(self.y.groupby(self.g)): for j, (_, vals) in enumerate(group_vals.groupby(self.h)): x, y = ax.collections[i * 2 + j].get_offsets().T npt.assert_array_equal(x, vals) npt.assert_array_equal(y, np.ones(len(x)) * i + [-.2, .2][j]) plt.close("all") def test_unsplit_nested_stripplot_vertical(self): pal = palettes.color_palette() # Test a simple vertical strip plot ax = cat.stripplot("g", "y", "h", data=self.df, split=False) for i, (_, group_vals) in enumerate(self.y.groupby(self.g)): for j, (_, vals) in enumerate(group_vals.groupby(self.h)): x, y = ax.collections[i * 2 + j].get_offsets().T npt.assert_array_equal(x, np.ones(len(x)) * i) npt.assert_array_equal(y, vals) fc = ax.collections[i * 2 + j].get_facecolors()[0, :3] npt.assert_equal(fc, pal[j]) plt.close("all") @skipif(not pandas_has_categoricals) def test_unsplit_nested_stripplot_horizontal(self): df = self.df.copy() df.g = df.g.astype("category") ax = cat.stripplot("y", "g", "h", data=df, split=False) for i, (_, group_vals) in enumerate(self.y.groupby(self.g)): for j, (_, vals) in enumerate(group_vals.groupby(self.h)): x, y = ax.collections[i * 2 + j].get_offsets().T npt.assert_array_equal(x, vals) npt.assert_array_equal(y, np.ones(len(x)) * i) plt.close("all") class TestBarPlotter(CategoricalFixture): default_kws = dict(x=None, y=None, hue=None, data=None, estimator=np.mean, ci=95, n_boot=100, units=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=.75, errcolor=".26") def test_nested_width(self): kws = self.default_kws.copy() p = cat._BarPlotter(**kws) p.establish_variables("g", "y", "h", data=self.df) nt.assert_equal(p.nested_width, .8 / 2) p = cat._BarPlotter(**kws) p.establish_variables("h", "y", "g", data=self.df) nt.assert_equal(p.nested_width, .8 / 3) def test_draw_vertical_bars(self): kws = self.default_kws.copy() kws.update(x="g", y="y", data=self.df) p = cat._BarPlotter(**kws) f, ax = plt.subplots() p.draw_bars(ax, {}) nt.assert_equal(len(ax.patches), len(p.plot_data)) nt.assert_equal(len(ax.lines), len(p.plot_data)) for bar, color in zip(ax.patches, p.colors): nt.assert_equal(bar.get_facecolor()[:-1], color) positions = np.arange(len(p.plot_data)) - p.width / 2 for bar, pos, stat in zip(ax.patches, positions, p.statistic): nt.assert_equal(bar.get_x(), pos) nt.assert_equal(bar.get_y(), min(0, stat)) nt.assert_equal(bar.get_height(), abs(stat)) nt.assert_equal(bar.get_width(), p.width) plt.close("all") def test_draw_horizontal_bars(self): kws = self.default_kws.copy() kws.update(x="y", y="g", orient="h", data=self.df) p = cat._BarPlotter(**kws) f, ax = plt.subplots() p.draw_bars(ax, {}) nt.assert_equal(len(ax.patches), len(p.plot_data)) nt.assert_equal(len(ax.lines), len(p.plot_data)) for bar, color in zip(ax.patches, p.colors): nt.assert_equal(bar.get_facecolor()[:-1], color) positions = np.arange(len(p.plot_data)) - p.width / 2 for bar, pos, stat in zip(ax.patches, positions, p.statistic): nt.assert_equal(bar.get_x(), min(0, stat)) nt.assert_equal(bar.get_y(), pos) nt.assert_equal(bar.get_height(), p.width) nt.assert_equal(bar.get_width(), abs(stat)) plt.close("all") def test_draw_nested_vertical_bars(self): kws = self.default_kws.copy() kws.update(x="g", y="y", hue="h", data=self.df) p = cat._BarPlotter(**kws) f, ax = plt.subplots() p.draw_bars(ax, {}) n_groups, n_hues = len(p.plot_data), len(p.hue_names) nt.assert_equal(len(ax.patches), n_groups * n_hues) nt.assert_equal(len(ax.lines), n_groups * n_hues) for bar in ax.patches[:n_groups]: nt.assert_equal(bar.get_facecolor()[:-1], p.colors[0]) for bar in ax.patches[n_groups:]: nt.assert_equal(bar.get_facecolor()[:-1], p.colors[1]) for bar, stat in zip(ax.patches, p.statistic.T.flat): nt.assert_almost_equal(bar.get_y(), min(0, stat)) nt.assert_almost_equal(bar.get_height(), abs(stat)) positions = np.arange(len(p.plot_data)) for bar, pos in zip(ax.patches[:n_groups], positions): nt.assert_almost_equal(bar.get_x(), pos - p.width / 2) nt.assert_almost_equal(bar.get_width(), p.nested_width) plt.close("all") def test_draw_nested_horizontal_bars(self): kws = self.default_kws.copy() kws.update(x="y", y="g", hue="h", orient="h", data=self.df) p = cat._BarPlotter(**kws) f, ax = plt.subplots() p.draw_bars(ax, {}) n_groups, n_hues = len(p.plot_data), len(p.hue_names) nt.assert_equal(len(ax.patches), n_groups * n_hues) nt.assert_equal(len(ax.lines), n_groups * n_hues) for bar in ax.patches[:n_groups]: nt.assert_equal(bar.get_facecolor()[:-1], p.colors[0]) for bar in ax.patches[n_groups:]: nt.assert_equal(bar.get_facecolor()[:-1], p.colors[1]) positions = np.arange(len(p.plot_data)) for bar, pos in zip(ax.patches[:n_groups], positions): nt.assert_almost_equal(bar.get_y(), pos - p.width / 2) nt.assert_almost_equal(bar.get_height(), p.nested_width) for bar, stat in zip(ax.patches, p.statistic.T.flat): nt.assert_almost_equal(bar.get_x(), min(0, stat)) nt.assert_almost_equal(bar.get_width(), abs(stat)) plt.close("all") def test_draw_missing_bars(self): kws = self.default_kws.copy() order = list("abcd") kws.update(x="g", y="y", order=order, data=self.df) p = cat._BarPlotter(**kws) f, ax = plt.subplots() p.draw_bars(ax, {}) nt.assert_equal(len(ax.patches), len(order)) nt.assert_equal(len(ax.lines), len(order)) plt.close("all") hue_order = list("mno") kws.update(x="g", y="y", hue="h", hue_order=hue_order, data=self.df) p = cat._BarPlotter(**kws) f, ax = plt.subplots() p.draw_bars(ax, {}) nt.assert_equal(len(ax.patches), len(p.plot_data) * len(hue_order)) nt.assert_equal(len(ax.lines), len(p.plot_data) * len(hue_order)) plt.close("all") def test_barplot_colors(self): # Test unnested palette colors kws = self.default_kws.copy() kws.update(x="g", y="y", data=self.df, saturation=1, palette="muted") p = cat._BarPlotter(**kws) f, ax = plt.subplots() p.draw_bars(ax, {}) palette = palettes.color_palette("muted", len(self.g.unique())) for patch, pal_color in zip(ax.patches, palette): nt.assert_equal(patch.get_facecolor()[:-1], pal_color) plt.close("all") # Test single color color = (.2, .2, .3, 1) kws = self.default_kws.copy() kws.update(x="g", y="y", data=self.df, saturation=1, color=color) p = cat._BarPlotter(**kws) f, ax = plt.subplots() p.draw_bars(ax, {}) for patch in ax.patches: nt.assert_equal(patch.get_facecolor(), color) plt.close("all") # Test nested palette colors kws = self.default_kws.copy() kws.update(x="g", y="y", hue="h", data=self.df, saturation=1, palette="Set2") p = cat._BarPlotter(**kws) f, ax = plt.subplots() p.draw_bars(ax, {}) palette = palettes.color_palette("Set2", len(self.h.unique())) for patch in ax.patches[:len(self.g.unique())]: nt.assert_equal(patch.get_facecolor()[:-1], palette[0]) for patch in ax.patches[len(self.g.unique()):]: nt.assert_equal(patch.get_facecolor()[:-1], palette[1]) plt.close("all") def test_simple_barplots(self): ax = cat.barplot("g", "y", data=self.df) nt.assert_equal(len(ax.patches), len(self.g.unique())) nt.assert_equal(ax.get_xlabel(), "g") nt.assert_equal(ax.get_ylabel(), "mean(y)") plt.close("all") ax = cat.barplot("y", "g", orient="h", data=self.df) nt.assert_equal(len(ax.patches), len(self.g.unique())) nt.assert_equal(ax.get_xlabel(), "mean(y)") nt.assert_equal(ax.get_ylabel(), "g") plt.close("all") ax = cat.barplot("g", "y", "h", data=self.df) nt.assert_equal(len(ax.patches), len(self.g.unique()) * len(self.h.unique())) nt.assert_equal(ax.get_xlabel(), "g") nt.assert_equal(ax.get_ylabel(), "mean(y)") plt.close("all") ax = cat.barplot("y", "g", "h", orient="h", data=self.df) nt.assert_equal(len(ax.patches), len(self.g.unique()) * len(self.h.unique())) nt.assert_equal(ax.get_xlabel(), "mean(y)") nt.assert_equal(ax.get_ylabel(), "g") plt.close("all") class TestPointPlotter(CategoricalFixture): default_kws = dict(x=None, y=None, hue=None, data=None, estimator=np.mean, ci=95, n_boot=100, units=None, order=None, hue_order=None, markers="o", linestyles="-", dodge=0, join=True, scale=1, orient=None, color=None, palette=None) def test_different_defualt_colors(self): kws = self.default_kws.copy() kws.update(dict(x="g", y="y", data=self.df)) p = cat._PointPlotter(**kws) color = palettes.color_palette()[0] npt.assert_array_equal(p.colors, [color, color, color]) def test_hue_offsets(self): kws = self.default_kws.copy() kws.update(dict(x="g", y="y", hue="h", data=self.df)) p = cat._PointPlotter(**kws) npt.assert_array_equal(p.hue_offsets, [0, 0]) kws.update(dict(dodge=.5)) p = cat._PointPlotter(**kws) npt.assert_array_equal(p.hue_offsets, [-.25, .25]) kws.update(dict(x="h", hue="g", dodge=0)) p = cat._PointPlotter(**kws) npt.assert_array_equal(p.hue_offsets, [0, 0, 0]) kws.update(dict(dodge=.3)) p = cat._PointPlotter(**kws) npt.assert_array_equal(p.hue_offsets, [-.15, 0, .15]) def test_draw_vertical_points(self): kws = self.default_kws.copy() kws.update(x="g", y="y", data=self.df) p = cat._PointPlotter(**kws) f, ax = plt.subplots() p.draw_points(ax) nt.assert_equal(len(ax.collections), 1) nt.assert_equal(len(ax.lines), len(p.plot_data) + 1) points = ax.collections[0] nt.assert_equal(len(points.get_offsets()), len(p.plot_data)) x, y = points.get_offsets().T npt.assert_array_equal(x, np.arange(len(p.plot_data))) npt.assert_array_equal(y, p.statistic) for got_color, want_color in zip(points.get_facecolors(), p.colors): npt.assert_array_equal(got_color[:-1], want_color) plt.close("all") def test_draw_horizontal_points(self): kws = self.default_kws.copy() kws.update(x="y", y="g", orient="h", data=self.df) p = cat._PointPlotter(**kws) f, ax = plt.subplots() p.draw_points(ax) nt.assert_equal(len(ax.collections), 1) nt.assert_equal(len(ax.lines), len(p.plot_data) + 1) points = ax.collections[0] nt.assert_equal(len(points.get_offsets()), len(p.plot_data)) x, y = points.get_offsets().T npt.assert_array_equal(x, p.statistic) npt.assert_array_equal(y, np.arange(len(p.plot_data))) for got_color, want_color in zip(points.get_facecolors(), p.colors): npt.assert_array_equal(got_color[:-1], want_color) plt.close("all") def test_draw_vertical_nested_points(self): kws = self.default_kws.copy() kws.update(x="g", y="y", hue="h", data=self.df) p = cat._PointPlotter(**kws) f, ax = plt.subplots() p.draw_points(ax) nt.assert_equal(len(ax.collections), 2) nt.assert_equal(len(ax.lines), len(p.plot_data) * len(p.hue_names) + len(p.hue_names)) for points, stats, color in zip(ax.collections, p.statistic.T, p.colors): nt.assert_equal(len(points.get_offsets()), len(p.plot_data)) x, y = points.get_offsets().T npt.assert_array_equal(x, np.arange(len(p.plot_data))) npt.assert_array_equal(y, stats) for got_color in points.get_facecolors(): npt.assert_array_equal(got_color[:-1], color) plt.close("all") def test_draw_horizontal_nested_points(self): kws = self.default_kws.copy() kws.update(x="y", y="g", hue="h", orient="h", data=self.df) p = cat._PointPlotter(**kws) f, ax = plt.subplots() p.draw_points(ax) nt.assert_equal(len(ax.collections), 2) nt.assert_equal(len(ax.lines), len(p.plot_data) * len(p.hue_names) + len(p.hue_names)) for points, stats, color in zip(ax.collections, p.statistic.T, p.colors): nt.assert_equal(len(points.get_offsets()), len(p.plot_data)) x, y = points.get_offsets().T npt.assert_array_equal(x, stats) npt.assert_array_equal(y, np.arange(len(p.plot_data))) for got_color in points.get_facecolors(): npt.assert_array_equal(got_color[:-1], color) plt.close("all") def test_pointplot_colors(self): # Test a single-color unnested plot color = (.2, .2, .3, 1) kws = self.default_kws.copy() kws.update(x="g", y="y", data=self.df, color=color) p = cat._PointPlotter(**kws) f, ax = plt.subplots() p.draw_points(ax) for line in ax.lines: nt.assert_equal(line.get_color(), color[:-1]) for got_color in ax.collections[0].get_facecolors(): npt.assert_array_equal(got_color, color) plt.close("all") # Test a multi-color unnested plot palette = palettes.color_palette("Set1", 3) kws.update(x="g", y="y", data=self.df, palette="Set1") p = cat._PointPlotter(**kws) nt.assert_true(not p.join) f, ax = plt.subplots() p.draw_points(ax) for line, pal_color in zip(ax.lines, palette): npt.assert_array_equal(line.get_color(), pal_color) for point_color, pal_color in zip(ax.collections[0].get_facecolors(), palette): npt.assert_array_equal(point_color[:-1], pal_color) plt.close("all") # Test a multi-colored nested plot palette = palettes.color_palette("dark", 2) kws.update(x="g", y="y", hue="h", data=self.df, palette="dark") p = cat._PointPlotter(**kws) f, ax = plt.subplots() p.draw_points(ax) for line in ax.lines[:(len(p.plot_data) + 1)]: nt.assert_equal(line.get_color(), palette[0]) for line in ax.lines[(len(p.plot_data) + 1):]: nt.assert_equal(line.get_color(), palette[1]) for i, pal_color in enumerate(palette): for point_color in ax.collections[i].get_facecolors(): npt.assert_array_equal(point_color[:-1], pal_color) plt.close("all") def test_simple_pointplots(self): ax = cat.pointplot("g", "y", data=self.df) nt.assert_equal(len(ax.collections), 1) nt.assert_equal(len(ax.lines), len(self.g.unique()) + 1) nt.assert_equal(ax.get_xlabel(), "g") nt.assert_equal(ax.get_ylabel(), "mean(y)") plt.close("all") ax = cat.pointplot("y", "g", orient="h", data=self.df) nt.assert_equal(len(ax.collections), 1) nt.assert_equal(len(ax.lines), len(self.g.unique()) + 1) nt.assert_equal(ax.get_xlabel(), "mean(y)") nt.assert_equal(ax.get_ylabel(), "g") plt.close("all") ax = cat.pointplot("g", "y", "h", data=self.df) nt.assert_equal(len(ax.collections), len(self.h.unique())) nt.assert_equal(len(ax.lines), (len(self.g.unique()) * len(self.h.unique()) + len(self.h.unique()))) nt.assert_equal(ax.get_xlabel(), "g") nt.assert_equal(ax.get_ylabel(), "mean(y)") plt.close("all") ax = cat.pointplot("y", "g", "h", orient="h", data=self.df) nt.assert_equal(len(ax.collections), len(self.h.unique())) nt.assert_equal(len(ax.lines), (len(self.g.unique()) * len(self.h.unique()) + len(self.h.unique()))) nt.assert_equal(ax.get_xlabel(), "mean(y)") nt.assert_equal(ax.get_ylabel(), "g") plt.close("all") class TestCountPlot(CategoricalFixture): def test_plot_elements(self): ax = cat.countplot("g", data=self.df) nt.assert_equal(len(ax.patches), self.g.unique().size) for p in ax.patches: nt.assert_equal(p.get_y(), 0) nt.assert_equal(p.get_height(), self.g.size / self.g.unique().size) plt.close("all") ax = cat.countplot(y="g", data=self.df) nt.assert_equal(len(ax.patches), self.g.unique().size) for p in ax.patches: nt.assert_equal(p.get_x(), 0) nt.assert_equal(p.get_width(), self.g.size / self.g.unique().size) plt.close("all") ax = cat.countplot("g", hue="h", data=self.df) nt.assert_equal(len(ax.patches), self.g.unique().size * self.h.unique().size) plt.close("all") ax = cat.countplot(y="g", hue="h", data=self.df) nt.assert_equal(len(ax.patches), self.g.unique().size * self.h.unique().size) plt.close("all") def test_input_error(self): with nt.assert_raises(TypeError): cat.countplot() with nt.assert_raises(TypeError): cat.countplot(x="g", y="h", data=self.df) class TestFactorPlot(CategoricalFixture): def test_facet_organization(self): g = cat.factorplot("g", "y", data=self.df) nt.assert_equal(g.axes.shape, (1, 1)) g = cat.factorplot("g", "y", col="h", data=self.df) nt.assert_equal(g.axes.shape, (1, 2)) g = cat.factorplot("g", "y", row="h", data=self.df) nt.assert_equal(g.axes.shape, (2, 1)) g = cat.factorplot("g", "y", col="u", row="h", data=self.df) nt.assert_equal(g.axes.shape, (2, 3)) plt.close("all") def test_plot_elements(self): g = cat.factorplot("g", "y", data=self.df) nt.assert_equal(len(g.ax.collections), 1) want_lines = self.g.unique().size + 1 nt.assert_equal(len(g.ax.lines), want_lines) g = cat.factorplot("g", "y", "h", data=self.df) want_collections = self.h.unique().size nt.assert_equal(len(g.ax.collections), want_collections) want_lines = (self.g.unique().size + 1) * self.h.unique().size nt.assert_equal(len(g.ax.lines), want_lines) g = cat.factorplot("g", "y", data=self.df, kind="bar") want_elements = self.g.unique().size nt.assert_equal(len(g.ax.patches), want_elements) nt.assert_equal(len(g.ax.lines), want_elements) g = cat.factorplot("g", "y", "h", data=self.df, kind="bar") want_elements = self.g.unique().size * self.h.unique().size nt.assert_equal(len(g.ax.patches), want_elements) nt.assert_equal(len(g.ax.lines), want_elements) g = cat.factorplot("g", data=self.df, kind="count") want_elements = self.g.unique().size nt.assert_equal(len(g.ax.patches), want_elements) nt.assert_equal(len(g.ax.lines), 0) g = cat.factorplot("g", hue="h", data=self.df, kind="count") want_elements = self.g.unique().size * self.h.unique().size nt.assert_equal(len(g.ax.patches), want_elements) nt.assert_equal(len(g.ax.lines), 0) g = cat.factorplot("g", "y", data=self.df, kind="box") want_artists = self.g.unique().size nt.assert_equal(len(g.ax.artists), want_artists) g = cat.factorplot("g", "y", "h", data=self.df, kind="box") want_artists = self.g.unique().size * self.h.unique().size nt.assert_equal(len(g.ax.artists), want_artists) g = cat.factorplot("g", "y", data=self.df, kind="violin", inner=None) want_elements = self.g.unique().size nt.assert_equal(len(g.ax.collections), want_elements) g = cat.factorplot("g", "y", "h", data=self.df, kind="violin", inner=None) want_elements = self.g.unique().size * self.h.unique().size nt.assert_equal(len(g.ax.collections), want_elements) g = cat.factorplot("g", "y", data=self.df, kind="strip") want_elements = self.g.unique().size nt.assert_equal(len(g.ax.collections), want_elements) g = cat.factorplot("g", "y", "h", data=self.df, kind="strip") want_elements = self.g.unique().size * self.h.unique().size nt.assert_equal(len(g.ax.collections), want_elements) plt.close("all") def test_bad_plot_kind_error(self): with nt.assert_raises(ValueError): cat.factorplot("g", "y", data=self.df, kind="not_a_kind") def test_plot_colors(self): ax = cat.barplot("g", "y", data=self.df) g = cat.factorplot("g", "y", data=self.df, kind="bar") for p1, p2 in zip(ax.patches, g.ax.patches): nt.assert_equal(p1.get_facecolor(), p2.get_facecolor()) plt.close("all") ax = cat.barplot("g", "y", data=self.df, color="purple") g = cat.factorplot("g", "y", data=self.df, kind="bar", color="purple") for p1, p2 in zip(ax.patches, g.ax.patches): nt.assert_equal(p1.get_facecolor(), p2.get_facecolor()) plt.close("all") ax = cat.barplot("g", "y", data=self.df, palette="Set2") g = cat.factorplot("g", "y", data=self.df, kind="bar", palette="Set2") for p1, p2 in zip(ax.patches, g.ax.patches): nt.assert_equal(p1.get_facecolor(), p2.get_facecolor()) plt.close("all") ax = cat.pointplot("g", "y", data=self.df) g = cat.factorplot("g", "y", data=self.df) for l1, l2 in zip(ax.lines, g.ax.lines): nt.assert_equal(l1.get_color(), l2.get_color()) plt.close("all") ax = cat.pointplot("g", "y", data=self.df, color="purple") g = cat.factorplot("g", "y", data=self.df, color="purple") for l1, l2 in zip(ax.lines, g.ax.lines): nt.assert_equal(l1.get_color(), l2.get_color()) plt.close("all") ax = cat.pointplot("g", "y", data=self.df, palette="Set2") g = cat.factorplot("g", "y", data=self.df, palette="Set2") for l1, l2 in zip(ax.lines, g.ax.lines): nt.assert_equal(l1.get_color(), l2.get_color()) plt.close("all") seaborn-0.6.0/seaborn/tests/test_distributions.py000066400000000000000000000203521254425553400222670ustar00rootroot00000000000000import numpy as np import pandas as pd import matplotlib as mpl import matplotlib.pyplot as plt import nose.tools as nt import numpy.testing as npt from numpy.testing.decorators import skipif from .. import distributions as dist try: import statsmodels.nonparametric.api assert statsmodels.nonparametric.api _no_statsmodels = False except ImportError: _no_statsmodels = True class TestKDE(object): rs = np.random.RandomState(0) x = rs.randn(50) y = rs.randn(50) kernel = "gau" bw = "scott" gridsize = 128 clip = (-np.inf, np.inf) cut = 3 def test_scipy_univariate_kde(self): """Test the univariate KDE estimation with scipy.""" grid, y = dist._scipy_univariate_kde(self.x, self.bw, self.gridsize, self.cut, self.clip) nt.assert_equal(len(grid), self.gridsize) nt.assert_equal(len(y), self.gridsize) for bw in ["silverman", .2]: dist._scipy_univariate_kde(self.x, bw, self.gridsize, self.cut, self.clip) @skipif(_no_statsmodels) def test_statsmodels_univariate_kde(self): """Test the univariate KDE estimation with statsmodels.""" grid, y = dist._statsmodels_univariate_kde(self.x, self.kernel, self.bw, self.gridsize, self.cut, self.clip) nt.assert_equal(len(grid), self.gridsize) nt.assert_equal(len(y), self.gridsize) for bw in ["silverman", .2]: dist._statsmodels_univariate_kde(self.x, self.kernel, bw, self.gridsize, self.cut, self.clip) def test_scipy_bivariate_kde(self): """Test the bivariate KDE estimation with scipy.""" clip = [self.clip, self.clip] x, y, z = dist._scipy_bivariate_kde(self.x, self.y, self.bw, self.gridsize, self.cut, clip) nt.assert_equal(x.shape, (self.gridsize, self.gridsize)) nt.assert_equal(y.shape, (self.gridsize, self.gridsize)) nt.assert_equal(len(z), self.gridsize) # Test a specific bandwidth clip = [self.clip, self.clip] x, y, z = dist._scipy_bivariate_kde(self.x, self.y, 1, self.gridsize, self.cut, clip) # Test that we get an error with an invalid bandwidth with nt.assert_raises(ValueError): dist._scipy_bivariate_kde(self.x, self.y, (1, 2), self.gridsize, self.cut, clip) @skipif(_no_statsmodels) def test_statsmodels_bivariate_kde(self): """Test the bivariate KDE estimation with statsmodels.""" clip = [self.clip, self.clip] x, y, z = dist._statsmodels_bivariate_kde(self.x, self.y, self.bw, self.gridsize, self.cut, clip) nt.assert_equal(x.shape, (self.gridsize, self.gridsize)) nt.assert_equal(y.shape, (self.gridsize, self.gridsize)) nt.assert_equal(len(z), self.gridsize) @skipif(_no_statsmodels) def test_statsmodels_kde_cumulative(self): """Test computation of cumulative KDE.""" grid, y = dist._statsmodels_univariate_kde(self.x, self.kernel, self.bw, self.gridsize, self.cut, self.clip, cumulative=True) nt.assert_equal(len(grid), self.gridsize) nt.assert_equal(len(y), self.gridsize) # make sure y is monotonically increasing npt.assert_((np.diff(y) > 0).all()) def test_kde_cummulative_2d(self): """Check error if args indicate bivariate KDE and cumulative.""" with npt.assert_raises(TypeError): dist.kdeplot(self.x, data2=self.y, cumulative=True) def test_bivariate_kde_series(self): df = pd.DataFrame({'x': self.x, 'y': self.y}) ax_series = dist.kdeplot(df.x, df.y) ax_values = dist.kdeplot(df.x.values, df.y.values) nt.assert_equal(len(ax_series.collections), len(ax_values.collections)) nt.assert_equal(ax_series.collections[0].get_paths(), ax_values.collections[0].get_paths()) plt.close("all") class TestJointPlot(object): rs = np.random.RandomState(sum(map(ord, "jointplot"))) x = rs.randn(100) y = rs.randn(100) data = pd.DataFrame(dict(x=x, y=y)) def test_scatter(self): g = dist.jointplot("x", "y", self.data) nt.assert_equal(len(g.ax_joint.collections), 1) x, y = g.ax_joint.collections[0].get_offsets().T npt.assert_array_equal(self.x, x) npt.assert_array_equal(self.y, y) x_bins = dist._freedman_diaconis_bins(self.x) nt.assert_equal(len(g.ax_marg_x.patches), x_bins) y_bins = dist._freedman_diaconis_bins(self.y) nt.assert_equal(len(g.ax_marg_y.patches), y_bins) plt.close("all") def test_reg(self): g = dist.jointplot("x", "y", self.data, kind="reg") nt.assert_equal(len(g.ax_joint.collections), 2) x, y = g.ax_joint.collections[0].get_offsets().T npt.assert_array_equal(self.x, x) npt.assert_array_equal(self.y, y) x_bins = dist._freedman_diaconis_bins(self.x) nt.assert_equal(len(g.ax_marg_x.patches), x_bins) y_bins = dist._freedman_diaconis_bins(self.y) nt.assert_equal(len(g.ax_marg_y.patches), y_bins) nt.assert_equal(len(g.ax_joint.lines), 1) nt.assert_equal(len(g.ax_marg_x.lines), 1) nt.assert_equal(len(g.ax_marg_y.lines), 1) plt.close("all") def test_resid(self): g = dist.jointplot("x", "y", self.data, kind="resid") nt.assert_equal(len(g.ax_joint.collections), 1) nt.assert_equal(len(g.ax_joint.lines), 1) nt.assert_equal(len(g.ax_marg_x.lines), 0) nt.assert_equal(len(g.ax_marg_y.lines), 1) plt.close("all") def test_hex(self): g = dist.jointplot("x", "y", self.data, kind="hex") nt.assert_equal(len(g.ax_joint.collections), 1) x_bins = dist._freedman_diaconis_bins(self.x) nt.assert_equal(len(g.ax_marg_x.patches), x_bins) y_bins = dist._freedman_diaconis_bins(self.y) nt.assert_equal(len(g.ax_marg_y.patches), y_bins) plt.close("all") def test_kde(self): g = dist.jointplot("x", "y", self.data, kind="kde") nt.assert_true(len(g.ax_joint.collections) > 0) nt.assert_equal(len(g.ax_marg_x.collections), 1) nt.assert_equal(len(g.ax_marg_y.collections), 1) nt.assert_equal(len(g.ax_marg_x.lines), 1) nt.assert_equal(len(g.ax_marg_y.lines), 1) plt.close("all") def test_color(self): g = dist.jointplot("x", "y", self.data, color="purple") purple = mpl.colors.colorConverter.to_rgb("purple") scatter_color = g.ax_joint.collections[0].get_facecolor()[0, :3] nt.assert_equal(tuple(scatter_color), purple) hist_color = g.ax_marg_x.patches[0].get_facecolor()[:3] nt.assert_equal(hist_color, purple) plt.close("all") def test_annotation(self): g = dist.jointplot("x", "y", self.data) nt.assert_equal(len(g.ax_joint.legend_.get_texts()), 1) g = dist.jointplot("x", "y", self.data, stat_func=None) nt.assert_is(g.ax_joint.legend_, None) plt.close("all") def test_hex_customise(self): # test that default gridsize can be overridden g = dist.jointplot("x", "y", self.data, kind="hex", joint_kws=dict(gridsize=5)) nt.assert_equal(len(g.ax_joint.collections), 1) a = g.ax_joint.collections[0].get_array() nt.assert_equal(28, a.shape[0]) # 28 hexagons expected for gridsize 5 plt.close("all") def test_bad_kind(self): with nt.assert_raises(ValueError): dist.jointplot("x", "y", self.data, kind="not_a_kind") @classmethod def teardown_class(cls): """Ensure that all figures are closed on exit.""" plt.close("all") seaborn-0.6.0/seaborn/tests/test_linearmodels.py000066400000000000000000000453251254425553400220520ustar00rootroot00000000000000import numpy as np import matplotlib as mpl import matplotlib.pyplot as plt import pandas as pd import nose.tools as nt import numpy.testing as npt import pandas.util.testing as pdt from numpy.testing.decorators import skipif from nose import SkipTest try: import statsmodels.regression.linear_model as smlm _no_statsmodels = False except ImportError: _no_statsmodels = True from .. import linearmodels as lm from .. import algorithms as algo from .. import utils from ..palettes import color_palette rs = np.random.RandomState(0) class TestLinearPlotter(object): rs = np.random.RandomState(77) df = pd.DataFrame(dict(x=rs.normal(size=60), d=rs.randint(-2, 3, 60), y=rs.gamma(4, size=60), s=np.tile(list("abcdefghij"), 6))) df["z"] = df.y + rs.randn(60) df["y_na"] = df.y.copy() df.y_na.ix[[10, 20, 30]] = np.nan def test_establish_variables_from_frame(self): p = lm._LinearPlotter() p.establish_variables(self.df, x="x", y="y") pdt.assert_series_equal(p.x, self.df.x) pdt.assert_series_equal(p.y, self.df.y) pdt.assert_frame_equal(p.data, self.df) def test_establish_variables_from_series(self): p = lm._LinearPlotter() p.establish_variables(None, x=self.df.x, y=self.df.y) pdt.assert_series_equal(p.x, self.df.x) pdt.assert_series_equal(p.y, self.df.y) nt.assert_is(p.data, None) def test_establish_variables_from_array(self): p = lm._LinearPlotter() p.establish_variables(None, x=self.df.x.values, y=self.df.y.values) npt.assert_array_equal(p.x, self.df.x) npt.assert_array_equal(p.y, self.df.y) nt.assert_is(p.data, None) def test_establish_variables_from_mix(self): p = lm._LinearPlotter() p.establish_variables(self.df, x="x", y=self.df.y) pdt.assert_series_equal(p.x, self.df.x) pdt.assert_series_equal(p.y, self.df.y) pdt.assert_frame_equal(p.data, self.df) def test_establish_variables_from_bad(self): p = lm._LinearPlotter() with nt.assert_raises(ValueError): p.establish_variables(None, x="x", y=self.df.y) def test_dropna(self): p = lm._LinearPlotter() p.establish_variables(self.df, x="x", y_na="y_na") pdt.assert_series_equal(p.x, self.df.x) pdt.assert_series_equal(p.y_na, self.df.y_na) p.dropna("x", "y_na") mask = self.df.y_na.notnull() pdt.assert_series_equal(p.x, self.df.x[mask]) pdt.assert_series_equal(p.y_na, self.df.y_na[mask]) class TestRegressionPlotter(object): rs = np.random.RandomState(49) grid = np.linspace(-3, 3, 30) n_boot = 100 bins_numeric = 3 bins_given = [-1, 0, 1] df = pd.DataFrame(dict(x=rs.normal(size=60), d=rs.randint(-2, 3, 60), y=rs.gamma(4, size=60), s=np.tile(list(range(6)), 10))) df["z"] = df.y + rs.randn(60) df["y_na"] = df.y.copy() bw_err = rs.randn(6)[df.s.values] * 2 df.y += bw_err p = 1 / (1 + np.exp(-(df.x * 2 + rs.randn(60)))) df["c"] = [rs.binomial(1, p_i) for p_i in p] df.y_na.ix[[10, 20, 30]] = np.nan def test_variables_from_frame(self): p = lm._RegressionPlotter("x", "y", data=self.df, units="s") pdt.assert_series_equal(p.x, self.df.x) pdt.assert_series_equal(p.y, self.df.y) pdt.assert_series_equal(p.units, self.df.s) pdt.assert_frame_equal(p.data, self.df) def test_variables_from_series(self): p = lm._RegressionPlotter(self.df.x, self.df.y, units=self.df.s) npt.assert_array_equal(p.x, self.df.x) npt.assert_array_equal(p.y, self.df.y) npt.assert_array_equal(p.units, self.df.s) nt.assert_is(p.data, None) def test_variables_from_mix(self): p = lm._RegressionPlotter("x", self.df.y + 1, data=self.df) npt.assert_array_equal(p.x, self.df.x) npt.assert_array_equal(p.y, self.df.y + 1) pdt.assert_frame_equal(p.data, self.df) def test_dropna(self): p = lm._RegressionPlotter("x", "y_na", data=self.df) nt.assert_equal(len(p.x), pd.notnull(self.df.y_na).sum()) p = lm._RegressionPlotter("x", "y_na", data=self.df, dropna=False) nt.assert_equal(len(p.x), len(self.df.y_na)) def test_ci(self): p = lm._RegressionPlotter("x", "y", data=self.df, ci=95) nt.assert_equal(p.ci, 95) nt.assert_equal(p.x_ci, 95) p = lm._RegressionPlotter("x", "y", data=self.df, ci=95, x_ci=68) nt.assert_equal(p.ci, 95) nt.assert_equal(p.x_ci, 68) @skipif(_no_statsmodels) def test_fast_regression(self): p = lm._RegressionPlotter("x", "y", data=self.df, n_boot=self.n_boot) # Fit with the "fast" function, which just does linear algebra yhat_fast, _ = p.fit_fast(self.grid) # Fit using the statsmodels function with an OLS model yhat_smod, _ = p.fit_statsmodels(self.grid, smlm.OLS) # Compare the vector of y_hat values npt.assert_array_almost_equal(yhat_fast, yhat_smod) @skipif(_no_statsmodels) def test_regress_poly(self): p = lm._RegressionPlotter("x", "y", data=self.df, n_boot=self.n_boot) # Fit an first-order polynomial yhat_poly, _ = p.fit_poly(self.grid, 1) # Fit using the statsmodels function with an OLS model yhat_smod, _ = p.fit_statsmodels(self.grid, smlm.OLS) # Compare the vector of y_hat values npt.assert_array_almost_equal(yhat_poly, yhat_smod) def test_regress_logx(self): x = np.arange(1, 10) y = np.arange(1, 10) grid = np.linspace(1, 10, 100) p = lm._RegressionPlotter(x, y, n_boot=self.n_boot) yhat_lin, _ = p.fit_fast(grid) yhat_log, _ = p.fit_logx(grid) nt.assert_greater(yhat_lin[0], yhat_log[0]) nt.assert_greater(yhat_log[20], yhat_lin[20]) nt.assert_greater(yhat_lin[90], yhat_log[90]) @skipif(_no_statsmodels) def test_regress_n_boot(self): p = lm._RegressionPlotter("x", "y", data=self.df, n_boot=self.n_boot) # Fast (linear algebra) version _, boots_fast = p.fit_fast(self.grid) npt.assert_equal(boots_fast.shape, (self.n_boot, self.grid.size)) # Slower (np.polyfit) version _, boots_poly = p.fit_poly(self.grid, 1) npt.assert_equal(boots_poly.shape, (self.n_boot, self.grid.size)) # Slowest (statsmodels) version _, boots_smod = p.fit_statsmodels(self.grid, smlm.OLS) npt.assert_equal(boots_smod.shape, (self.n_boot, self.grid.size)) @skipif(_no_statsmodels) def test_regress_without_bootstrap(self): p = lm._RegressionPlotter("x", "y", data=self.df, n_boot=self.n_boot, ci=None) # Fast (linear algebra) version _, boots_fast = p.fit_fast(self.grid) nt.assert_is(boots_fast, None) # Slower (np.polyfit) version _, boots_poly = p.fit_poly(self.grid, 1) nt.assert_is(boots_poly, None) # Slowest (statsmodels) version _, boots_smod = p.fit_statsmodels(self.grid, smlm.OLS) nt.assert_is(boots_smod, None) def test_numeric_bins(self): p = lm._RegressionPlotter(self.df.x, self.df.y) x_binned, bins = p.bin_predictor(self.bins_numeric) npt.assert_equal(len(bins), self.bins_numeric) npt.assert_array_equal(np.unique(x_binned), bins) def test_provided_bins(self): p = lm._RegressionPlotter(self.df.x, self.df.y) x_binned, bins = p.bin_predictor(self.bins_given) npt.assert_array_equal(np.unique(x_binned), self.bins_given) def test_bin_results(self): p = lm._RegressionPlotter(self.df.x, self.df.y) x_binned, bins = p.bin_predictor(self.bins_given) nt.assert_greater(self.df.x[x_binned == 0].min(), self.df.x[x_binned == -1].max()) nt.assert_greater(self.df.x[x_binned == 1].min(), self.df.x[x_binned == 0].max()) def test_scatter_data(self): p = lm._RegressionPlotter(self.df.x, self.df.y) x, y = p.scatter_data npt.assert_array_equal(x, self.df.x) npt.assert_array_equal(y, self.df.y) p = lm._RegressionPlotter(self.df.d, self.df.y) x, y = p.scatter_data npt.assert_array_equal(x, self.df.d) npt.assert_array_equal(y, self.df.y) p = lm._RegressionPlotter(self.df.d, self.df.y, x_jitter=.1) x, y = p.scatter_data nt.assert_true((x != self.df.d).any()) npt.assert_array_less(np.abs(self.df.d - x), np.repeat(.1, len(x))) npt.assert_array_equal(y, self.df.y) p = lm._RegressionPlotter(self.df.d, self.df.y, y_jitter=.05) x, y = p.scatter_data npt.assert_array_equal(x, self.df.d) npt.assert_array_less(np.abs(self.df.y - y), np.repeat(.1, len(y))) def test_estimate_data(self): p = lm._RegressionPlotter(self.df.d, self.df.y, x_estimator=np.mean) x, y, ci = p.estimate_data npt.assert_array_equal(x, np.sort(np.unique(self.df.d))) npt.assert_array_almost_equal(y, self.df.groupby("d").y.mean()) npt.assert_array_less(np.array(ci)[:, 0], y) npt.assert_array_less(y, np.array(ci)[:, 1]) def test_estimate_cis(self): # set known good seed to avoid the test stochastically failing np.random.seed(123) p = lm._RegressionPlotter(self.df.d, self.df.y, x_estimator=np.mean, ci=95) _, _, ci_big = p.estimate_data p = lm._RegressionPlotter(self.df.d, self.df.y, x_estimator=np.mean, ci=50) _, _, ci_wee = p.estimate_data npt.assert_array_less(np.diff(ci_wee), np.diff(ci_big)) p = lm._RegressionPlotter(self.df.d, self.df.y, x_estimator=np.mean, ci=None) _, _, ci_nil = p.estimate_data npt.assert_array_equal(ci_nil, [None] * len(ci_nil)) def test_estimate_units(self): # Seed the RNG locally np.random.seed(345) p = lm._RegressionPlotter("x", "y", data=self.df, units="s", x_bins=3) _, _, ci_big = p.estimate_data ci_big = np.diff(ci_big, axis=1) p = lm._RegressionPlotter("x", "y", data=self.df, x_bins=3) _, _, ci_wee = p.estimate_data ci_wee = np.diff(ci_wee, axis=1) npt.assert_array_less(ci_wee, ci_big) def test_partial(self): x = self.rs.randn(100) y = x + self.rs.randn(100) z = x + self.rs.randn(100) p = lm._RegressionPlotter(y, z) _, r_orig = np.corrcoef(p.x, p.y)[0] p = lm._RegressionPlotter(y, z, y_partial=x) _, r_semipartial = np.corrcoef(p.x, p.y)[0] nt.assert_less(r_semipartial, r_orig) p = lm._RegressionPlotter(y, z, x_partial=x, y_partial=x) _, r_partial = np.corrcoef(p.x, p.y)[0] nt.assert_less(r_partial, r_orig) @skipif(_no_statsmodels) def test_logistic_regression(self): p = lm._RegressionPlotter("x", "c", data=self.df, logistic=True, n_boot=self.n_boot) _, yhat, _ = p.fit_regression(x_range=(-3, 3)) npt.assert_array_less(yhat, 1) npt.assert_array_less(0, yhat) @skipif(_no_statsmodels) def test_robust_regression(self): p_ols = lm._RegressionPlotter("x", "y", data=self.df, n_boot=self.n_boot) _, ols_yhat, _ = p_ols.fit_regression(x_range=(-3, 3)) p_robust = lm._RegressionPlotter("x", "y", data=self.df, robust=True, n_boot=self.n_boot) _, robust_yhat, _ = p_robust.fit_regression(x_range=(-3, 3)) nt.assert_equal(len(ols_yhat), len(robust_yhat)) @skipif(_no_statsmodels) def test_lowess_regression(self): p = lm._RegressionPlotter("x", "y", data=self.df, lowess=True) grid, yhat, err_bands = p.fit_regression(x_range=(-3, 3)) nt.assert_equal(len(grid), len(yhat)) nt.assert_is(err_bands, None) def test_regression_options(self): with nt.assert_raises(ValueError): lm._RegressionPlotter("x", "y", data=self.df, lowess=True, order=2) with nt.assert_raises(ValueError): lm._RegressionPlotter("x", "y", data=self.df, lowess=True, logistic=True) def test_regression_limits(self): f, ax = plt.subplots() ax.scatter(self.df.x, self.df.y) p = lm._RegressionPlotter("x", "y", data=self.df) grid, _, _ = p.fit_regression(ax) xlim = ax.get_xlim() nt.assert_equal(grid.min(), xlim[0]) nt.assert_equal(grid.max(), xlim[1]) p = lm._RegressionPlotter("x", "y", data=self.df, truncate=True) grid, _, _ = p.fit_regression() nt.assert_equal(grid.min(), self.df.x.min()) nt.assert_equal(grid.max(), self.df.x.max()) plt.close("all") class TestRegressionPlots(object): rs = np.random.RandomState(56) df = pd.DataFrame(dict(x=rs.randn(90), y=rs.randn(90) + 5, z=rs.randint(0, 1, 90), g=np.repeat(list("abc"), 30), h=np.tile(list("xy"), 45), u=np.tile(np.arange(6), 15))) bw_err = rs.randn(6)[df.u.values] df.y += bw_err def test_regplot_basic(self): f, ax = plt.subplots() lm.regplot("x", "y", self.df) nt.assert_equal(len(ax.lines), 1) nt.assert_equal(len(ax.collections), 2) x, y = ax.collections[0].get_offsets().T npt.assert_array_equal(x, self.df.x) npt.assert_array_equal(y, self.df.y) plt.close("all") def test_regplot_selective(self): f, ax = plt.subplots() ax = lm.regplot("x", "y", self.df, scatter=False, ax=ax) nt.assert_equal(len(ax.lines), 1) nt.assert_equal(len(ax.collections), 1) ax.clear() f, ax = plt.subplots() ax = lm.regplot("x", "y", self.df, fit_reg=False) nt.assert_equal(len(ax.lines), 0) nt.assert_equal(len(ax.collections), 1) ax.clear() f, ax = plt.subplots() ax = lm.regplot("x", "y", self.df, ci=None) nt.assert_equal(len(ax.lines), 1) nt.assert_equal(len(ax.collections), 1) ax.clear() plt.close("all") def test_regplot_scatter_kws_alpha(self): f, ax = plt.subplots() color = np.array([[0.3, 0.8, 0.5, 0.5]]) ax = lm.regplot("x", "y", self.df, scatter_kws={'color': color}) nt.assert_is(ax.collections[0]._alpha, None) nt.assert_equal(ax.collections[0]._facecolors[0, 3], 0.5) f, ax = plt.subplots() color = np.array([[0.3, 0.8, 0.5]]) ax = lm.regplot("x", "y", self.df, scatter_kws={'color': color}) nt.assert_equal(ax.collections[0]._alpha, 0.8) f, ax = plt.subplots() color = np.array([[0.3, 0.8, 0.5]]) ax = lm.regplot("x", "y", self.df, scatter_kws={'color': color, 'alpha': 0.4}) nt.assert_equal(ax.collections[0]._alpha, 0.4) f, ax = plt.subplots() color = 'r' ax = lm.regplot("x", "y", self.df, scatter_kws={'color': color}) nt.assert_equal(ax.collections[0]._alpha, 0.8) plt.close("all") def test_regplot_binned(self): ax = lm.regplot("x", "y", self.df, x_bins=5) nt.assert_equal(len(ax.lines), 6) nt.assert_equal(len(ax.collections), 2) plt.close("all") def test_lmplot_basic(self): g = lm.lmplot("x", "y", self.df) ax = g.axes[0, 0] nt.assert_equal(len(ax.lines), 1) nt.assert_equal(len(ax.collections), 2) x, y = ax.collections[0].get_offsets().T npt.assert_array_equal(x, self.df.x) npt.assert_array_equal(y, self.df.y) plt.close("all") def test_lmplot_hue(self): g = lm.lmplot("x", "y", data=self.df, hue="h") ax = g.axes[0, 0] nt.assert_equal(len(ax.lines), 2) nt.assert_equal(len(ax.collections), 4) plt.close("all") def test_lmplot_markers(self): g1 = lm.lmplot("x", "y", data=self.df, hue="h", markers="s") nt.assert_equal(g1.hue_kws, {"marker": ["s", "s"]}) g2 = lm.lmplot("x", "y", data=self.df, hue="h", markers=["o", "s"]) nt.assert_equal(g2.hue_kws, {"marker": ["o", "s"]}) with nt.assert_raises(ValueError): lm.lmplot("x", "y", data=self.df, hue="h", markers=["o", "s", "d"]) plt.close("all") def test_lmplot_marker_linewidths(self): if mpl.__version__ == "1.4.2": raise SkipTest g = lm.lmplot("x", "y", data=self.df, hue="h", fit_reg=False, markers=["o", "+"]) c = g.axes[0, 0].collections nt.assert_equal(c[0].get_linewidths()[0], 0) rclw = mpl.rcParams["lines.linewidth"] nt.assert_equal(c[1].get_linewidths()[0], rclw) plt.close("all") def test_lmplot_facets(self): g = lm.lmplot("x", "y", data=self.df, row="g", col="h") nt.assert_equal(g.axes.shape, (3, 2)) g = lm.lmplot("x", "y", data=self.df, col="u", col_wrap=4) nt.assert_equal(g.axes.shape, (6,)) g = lm.lmplot("x", "y", data=self.df, hue="h", col="u") nt.assert_equal(g.axes.shape, (1, 6)) plt.close("all") def test_lmplot_hue_col_nolegend(self): g = lm.lmplot("x", "y", data=self.df, col="h", hue="h") nt.assert_is(g._legend, None) plt.close("all") def test_lmplot_scatter_kws(self): g = lm.lmplot("x", "y", hue="h", data=self.df, ci=None) red_scatter, blue_scatter = g.axes[0, 0].collections red, blue = color_palette(n_colors=2) npt.assert_array_equal(red, red_scatter.get_facecolors()[0, :3]) npt.assert_array_equal(blue, blue_scatter.get_facecolors()[0, :3]) plt.close("all") def test_residplot(self): x, y = self.df.x, self.df.y ax = lm.residplot(x, y) resid = y - np.polyval(np.polyfit(x, y, 1), x) x_plot, y_plot = ax.collections[0].get_offsets().T npt.assert_array_equal(x, x_plot) npt.assert_array_almost_equal(resid, y_plot) plt.close("all") @skipif(_no_statsmodels) def test_residplot_lowess(self): ax = lm.residplot("x", "y", self.df, lowess=True) nt.assert_equal(len(ax.lines), 2) x, y = ax.lines[1].get_xydata().T npt.assert_array_equal(x, np.sort(self.df.x)) plt.close("all") seaborn-0.6.0/seaborn/tests/test_matrix.py000066400000000000000000000766201254425553400207020ustar00rootroot00000000000000import itertools import tempfile import numpy as np import matplotlib as mpl import matplotlib.pyplot as plt import pandas as pd from scipy.spatial import distance from scipy.cluster import hierarchy import nose.tools as nt import numpy.testing as npt import pandas.util.testing as pdt from numpy.testing.decorators import skipif from .. import matrix as mat from .. import color_palette from ..external.six.moves import range try: import fastcluster assert fastcluster _no_fastcluster = False except ImportError: _no_fastcluster = True class TestHeatmap(object): rs = np.random.RandomState(sum(map(ord, "heatmap"))) x_norm = rs.randn(4, 8) letters = pd.Series(["A", "B", "C", "D"], name="letters") df_norm = pd.DataFrame(x_norm, index=letters) x_unif = rs.rand(20, 13) df_unif = pd.DataFrame(x_unif) default_kws = dict(vmin=None, vmax=None, cmap=None, center=None, robust=False, annot=False, fmt=".2f", annot_kws=None, cbar=True, cbar_kws=None, mask=None) def test_ndarray_input(self): p = mat._HeatMapper(self.x_norm, **self.default_kws) npt.assert_array_equal(p.plot_data, self.x_norm[::-1]) pdt.assert_frame_equal(p.data, pd.DataFrame(self.x_norm).ix[::-1]) npt.assert_array_equal(p.xticklabels, np.arange(8)) npt.assert_array_equal(p.yticklabels, np.arange(4)[::-1]) nt.assert_equal(p.xlabel, "") nt.assert_equal(p.ylabel, "") def test_df_input(self): p = mat._HeatMapper(self.df_norm, **self.default_kws) npt.assert_array_equal(p.plot_data, self.x_norm[::-1]) pdt.assert_frame_equal(p.data, self.df_norm.ix[::-1]) npt.assert_array_equal(p.xticklabels, np.arange(8)) npt.assert_array_equal(p.yticklabels, ["D", "C", "B", "A"]) nt.assert_equal(p.xlabel, "") nt.assert_equal(p.ylabel, "letters") def test_df_multindex_input(self): df = self.df_norm.copy() index = pd.MultiIndex.from_tuples([("A", 1), ("B", 2), ("C", 3), ("D", 4)], names=["letter", "number"]) index.name = "letter-number" df.index = index p = mat._HeatMapper(df, **self.default_kws) npt.assert_array_equal(p.yticklabels, ["D-4", "C-3", "B-2", "A-1"]) nt.assert_equal(p.ylabel, "letter-number") p = mat._HeatMapper(df.T, **self.default_kws) npt.assert_array_equal(p.xticklabels, ["A-1", "B-2", "C-3", "D-4"]) nt.assert_equal(p.xlabel, "letter-number") def test_mask_input(self): kws = self.default_kws.copy() mask = self.x_norm > 0 kws['mask'] = mask p = mat._HeatMapper(self.x_norm, **kws) plot_data = np.ma.masked_where(mask, self.x_norm) npt.assert_array_equal(p.plot_data, plot_data[::-1]) def test_default_sequential_vlims(self): p = mat._HeatMapper(self.df_unif, **self.default_kws) nt.assert_equal(p.vmin, self.x_unif.min()) nt.assert_equal(p.vmax, self.x_unif.max()) nt.assert_true(not p.divergent) def test_default_diverging_vlims(self): p = mat._HeatMapper(self.df_norm, **self.default_kws) vlim = max(abs(self.x_norm.min()), abs(self.x_norm.max())) nt.assert_equal(p.vmin, -vlim) nt.assert_equal(p.vmax, vlim) nt.assert_true(p.divergent) def test_robust_sequential_vlims(self): kws = self.default_kws.copy() kws["robust"] = True p = mat._HeatMapper(self.df_unif, **kws) nt.assert_equal(p.vmin, np.percentile(self.x_unif, 2)) nt.assert_equal(p.vmax, np.percentile(self.x_unif, 98)) def test_custom_sequential_vlims(self): kws = self.default_kws.copy() kws["vmin"] = 0 kws["vmax"] = 1 p = mat._HeatMapper(self.df_unif, **kws) nt.assert_equal(p.vmin, 0) nt.assert_equal(p.vmax, 1) def test_custom_diverging_vlims(self): kws = self.default_kws.copy() kws["vmin"] = -4 kws["vmax"] = 5 p = mat._HeatMapper(self.df_norm, **kws) nt.assert_equal(p.vmin, -5) nt.assert_equal(p.vmax, 5) def test_array_with_nans(self): x1 = self.rs.rand(10, 10) nulls = np.zeros(10) * np.nan x2 = np.c_[x1, nulls] m1 = mat._HeatMapper(x1, **self.default_kws) m2 = mat._HeatMapper(x2, **self.default_kws) nt.assert_equal(m1.vmin, m2.vmin) nt.assert_equal(m1.vmax, m2.vmax) def test_mask(self): df = pd.DataFrame(data={'a': [1, 1, 1], 'b': [2, np.nan, 2], 'c': [3, 3, np.nan]}) kws = self.default_kws.copy() kws["mask"] = np.isnan(df.values) m = mat._HeatMapper(df, **kws) npt.assert_array_equal(np.isnan(m.plot_data.data), m.plot_data.mask) def test_custom_cmap(self): kws = self.default_kws.copy() kws["cmap"] = "BuGn" p = mat._HeatMapper(self.df_unif, **kws) nt.assert_equal(p.cmap, "BuGn") def test_centered_vlims(self): kws = self.default_kws.copy() kws["center"] = .5 p = mat._HeatMapper(self.df_unif, **kws) nt.assert_true(p.divergent) nt.assert_equal(p.vmax - .5, .5 - p.vmin) def test_tickabels_off(self): kws = self.default_kws.copy() kws['xticklabels'] = False kws['yticklabels'] = False p = mat._HeatMapper(self.df_norm, **kws) nt.assert_equal(p.xticklabels, ['' for _ in range( self.df_norm.shape[1])]) nt.assert_equal(p.yticklabels, ['' for _ in range( self.df_norm.shape[0])]) def test_custom_ticklabels(self): kws = self.default_kws.copy() xticklabels = list('iheartheatmaps'[:self.df_norm.shape[1]]) yticklabels = list('heatmapsarecool'[:self.df_norm.shape[0]]) kws['xticklabels'] = xticklabels kws['yticklabels'] = yticklabels p = mat._HeatMapper(self.df_norm, **kws) nt.assert_equal(p.xticklabels, xticklabels) nt.assert_equal(p.yticklabels, yticklabels[::-1]) def test_custom_ticklabel_interval(self): kws = self.default_kws.copy() kws['xticklabels'] = 2 kws['yticklabels'] = 3 p = mat._HeatMapper(self.df_norm, **kws) nx, ny = self.df_norm.T.shape ystart = (ny - 1) % 3 npt.assert_array_equal(p.xticks, np.arange(0, nx, 2) + .5) npt.assert_array_equal(p.yticks, np.arange(ystart, ny, 3) + .5) npt.assert_array_equal(p.xticklabels, self.df_norm.columns[::2]) npt.assert_array_equal(p.yticklabels, self.df_norm.index[::-1][ystart:ny:3]) def test_heatmap_annotation(self): ax = mat.heatmap(self.df_norm, annot=True, fmt=".1f", annot_kws={"fontsize": 14}) for val, text in zip(self.x_norm[::-1].flat, ax.texts): nt.assert_equal(text.get_text(), "{:.1f}".format(val)) nt.assert_equal(text.get_fontsize(), 14) plt.close("all") def test_heatmap_annotation_with_mask(self): df = pd.DataFrame(data={'a': [1, 1, 1], 'b': [2, np.nan, 2], 'c': [3, 3, np.nan]}) mask = np.isnan(df.values) df_masked = np.ma.masked_where(mask, df) ax = mat.heatmap(df, annot=True, fmt='.1f', mask=mask) nt.assert_equal(len(df_masked[::-1].compressed()), len(ax.texts)) for val, text in zip(df_masked[::-1].compressed(), ax.texts): nt.assert_equal("{:.1f}".format(val), text.get_text()) plt.close("all") def test_heatmap_cbar(self): f = plt.figure() mat.heatmap(self.df_norm) nt.assert_equal(len(f.axes), 2) plt.close(f) f = plt.figure() mat.heatmap(self.df_norm, cbar=False) nt.assert_equal(len(f.axes), 1) plt.close(f) f, (ax1, ax2) = plt.subplots(2) mat.heatmap(self.df_norm, ax=ax1, cbar_ax=ax2) nt.assert_equal(len(f.axes), 2) plt.close(f) def test_heatmap_axes(self): ax = mat.heatmap(self.df_norm) xtl = [int(l.get_text()) for l in ax.get_xticklabels()] nt.assert_equal(xtl, list(self.df_norm.columns)) ytl = [l.get_text() for l in ax.get_yticklabels()] nt.assert_equal(ytl, list(self.df_norm.index[::-1])) nt.assert_equal(ax.get_xlabel(), "") nt.assert_equal(ax.get_ylabel(), "letters") nt.assert_equal(ax.get_xlim(), (0, 8)) nt.assert_equal(ax.get_ylim(), (0, 4)) plt.close("all") def test_heatmap_ticklabel_rotation(self): f, ax = plt.subplots(figsize=(2, 2)) mat.heatmap(self.df_norm, ax=ax) for t in ax.get_xticklabels(): nt.assert_equal(t.get_rotation(), 0) for t in ax.get_yticklabels(): nt.assert_equal(t.get_rotation(), 90) plt.close(f) df = self.df_norm.copy() df.columns = [str(c) * 10 for c in df.columns] df.index = [i * 10 for i in df.index] f, ax = plt.subplots(figsize=(2, 2)) mat.heatmap(df, ax=ax) for t in ax.get_xticklabels(): nt.assert_equal(t.get_rotation(), 90) for t in ax.get_yticklabels(): nt.assert_equal(t.get_rotation(), 0) plt.close(f) def test_heatmap_inner_lines(self): c = (0, 0, 1, 1) ax = mat.heatmap(self.df_norm, linewidths=2, linecolor=c) mesh = ax.collections[0] nt.assert_equal(mesh.get_linewidths()[0], 2) nt.assert_equal(tuple(mesh.get_edgecolor()[0]), c) plt.close("all") def test_square_aspect(self): ax = mat.heatmap(self.df_norm, square=True) nt.assert_equal(ax.get_aspect(), "equal") plt.close("all") def test_mask_validation(self): mask = mat._matrix_mask(self.df_norm, None) nt.assert_equal(mask.shape, self.df_norm.shape) nt.assert_equal(mask.values.sum(), 0) with nt.assert_raises(ValueError): bad_array_mask = self.rs.randn(3, 6) > 0 mat._matrix_mask(self.df_norm, bad_array_mask) with nt.assert_raises(ValueError): bad_df_mask = pd.DataFrame(self.rs.randn(4, 8) > 0) mat._matrix_mask(self.df_norm, bad_df_mask) def test_missing_data_mask(self): data = pd.DataFrame(np.arange(4, dtype=np.float).reshape(2, 2)) data.loc[0, 0] = np.nan mask = mat._matrix_mask(data, None) npt.assert_array_equal(mask, [[True, False], [False, False]]) mask_in = np.array([[False, True], [False, False]]) mask_out = mat._matrix_mask(data, mask_in) npt.assert_array_equal(mask_out, [[True, True], [False, False]]) class TestDendrogram(object): rs = np.random.RandomState(sum(map(ord, "dendrogram"))) x_norm = rs.randn(4, 8) + np.arange(8) x_norm = (x_norm.T + np.arange(4)).T letters = pd.Series(["A", "B", "C", "D", "E", "F", "G", "H"], name="letters") df_norm = pd.DataFrame(x_norm, columns=letters) try: import fastcluster x_norm_linkage = fastcluster.linkage_vector(x_norm.T, metric='euclidean', method='single') except ImportError: x_norm_distances = distance.squareform( distance.pdist(x_norm.T, metric='euclidean')) x_norm_linkage = hierarchy.linkage(x_norm_distances, method='single') x_norm_dendrogram = hierarchy.dendrogram(x_norm_linkage, no_plot=True, color_list=['k'], color_threshold=-np.inf) x_norm_leaves = x_norm_dendrogram['leaves'] df_norm_leaves = np.asarray(df_norm.columns[x_norm_leaves]) default_kws = dict(linkage=None, metric='euclidean', method='single', axis=1, label=True, rotate=False) def test_ndarray_input(self): p = mat._DendrogramPlotter(self.x_norm, **self.default_kws) npt.assert_array_equal(p.array.T, self.x_norm) pdt.assert_frame_equal(p.data.T, pd.DataFrame(self.x_norm)) npt.assert_array_equal(p.linkage, self.x_norm_linkage) nt.assert_dict_equal(p.dendrogram, self.x_norm_dendrogram) npt.assert_array_equal(p.reordered_ind, self.x_norm_leaves) npt.assert_array_equal(p.xticklabels, self.x_norm_leaves) npt.assert_array_equal(p.yticklabels, []) nt.assert_equal(p.xlabel, None) nt.assert_equal(p.ylabel, '') def test_df_input(self): p = mat._DendrogramPlotter(self.df_norm, **self.default_kws) npt.assert_array_equal(p.array.T, np.asarray(self.df_norm)) pdt.assert_frame_equal(p.data.T, self.df_norm) npt.assert_array_equal(p.linkage, self.x_norm_linkage) nt.assert_dict_equal(p.dendrogram, self.x_norm_dendrogram) npt.assert_array_equal(p.xticklabels, np.asarray(self.df_norm.columns)[ self.x_norm_leaves]) npt.assert_array_equal(p.yticklabels, []) nt.assert_equal(p.xlabel, 'letters') nt.assert_equal(p.ylabel, '') def test_df_multindex_input(self): df = self.df_norm.copy() index = pd.MultiIndex.from_tuples([("A", 1), ("B", 2), ("C", 3), ("D", 4)], names=["letter", "number"]) index.name = "letter-number" df.index = index kws = self.default_kws.copy() kws['label'] = True p = mat._DendrogramPlotter(df.T, **kws) xticklabels = ["A-1", "B-2", "C-3", "D-4"] xticklabels = [xticklabels[i] for i in p.reordered_ind] npt.assert_array_equal(p.xticklabels, xticklabels) npt.assert_array_equal(p.yticklabels, []) nt.assert_equal(p.xlabel, "letter-number") def test_axis0_input(self): kws = self.default_kws.copy() kws['axis'] = 0 p = mat._DendrogramPlotter(self.df_norm.T, **kws) npt.assert_array_equal(p.array, np.asarray(self.df_norm.T)) pdt.assert_frame_equal(p.data, self.df_norm.T) npt.assert_array_equal(p.linkage, self.x_norm_linkage) nt.assert_dict_equal(p.dendrogram, self.x_norm_dendrogram) npt.assert_array_equal(p.xticklabels, self.df_norm_leaves) npt.assert_array_equal(p.yticklabels, []) nt.assert_equal(p.xlabel, 'letters') nt.assert_equal(p.ylabel, '') def test_rotate_input(self): kws = self.default_kws.copy() kws['rotate'] = True p = mat._DendrogramPlotter(self.df_norm, **kws) npt.assert_array_equal(p.array.T, np.asarray(self.df_norm)) pdt.assert_frame_equal(p.data.T, self.df_norm) npt.assert_array_equal(p.xticklabels, []) npt.assert_array_equal(p.yticklabels, self.df_norm_leaves) nt.assert_equal(p.xlabel, '') nt.assert_equal(p.ylabel, 'letters') def test_rotate_axis0_input(self): kws = self.default_kws.copy() kws['rotate'] = True kws['axis'] = 0 p = mat._DendrogramPlotter(self.df_norm.T, **kws) npt.assert_array_equal(p.reordered_ind, self.x_norm_leaves) def test_custom_linkage(self): kws = self.default_kws.copy() try: import fastcluster linkage = fastcluster.linkage_vector(self.x_norm, method='single', metric='euclidean') except ImportError: d = distance.squareform(distance.pdist(self.x_norm, metric='euclidean')) linkage = hierarchy.linkage(d, method='single') dendrogram = hierarchy.dendrogram(linkage, no_plot=True, color_list=['k'], color_threshold=-np.inf) kws['linkage'] = linkage p = mat._DendrogramPlotter(self.df_norm, **kws) npt.assert_array_equal(p.linkage, linkage) nt.assert_dict_equal(p.dendrogram, dendrogram) def test_label_false(self): kws = self.default_kws.copy() kws['label'] = False p = mat._DendrogramPlotter(self.df_norm, **kws) nt.assert_equal(p.xticks, []) nt.assert_equal(p.yticks, []) nt.assert_equal(p.xticklabels, []) nt.assert_equal(p.yticklabels, []) nt.assert_equal(p.xlabel, "") nt.assert_equal(p.ylabel, "") def test_linkage_scipy(self): p = mat._DendrogramPlotter(self.x_norm, **self.default_kws) scipy_linkage = p._calculate_linkage_scipy() from scipy.spatial import distance from scipy.cluster import hierarchy dists = distance.squareform(distance.pdist(self.x_norm.T, metric=self.default_kws[ 'metric'])) linkage = hierarchy.linkage(dists, method=self.default_kws['method']) npt.assert_array_equal(scipy_linkage, linkage) @skipif(_no_fastcluster) def test_fastcluster_other_method(self): import fastcluster kws = self.default_kws.copy() kws['method'] = 'average' linkage = fastcluster.linkage(self.x_norm.T, method='average', metric='euclidean') p = mat._DendrogramPlotter(self.x_norm, **kws) npt.assert_array_equal(p.linkage, linkage) @skipif(_no_fastcluster) def test_fastcluster_non_euclidean(self): import fastcluster kws = self.default_kws.copy() kws['metric'] = 'cosine' kws['method'] = 'average' linkage = fastcluster.linkage(self.x_norm.T, method=kws['method'], metric=kws['metric']) p = mat._DendrogramPlotter(self.x_norm, **kws) npt.assert_array_equal(p.linkage, linkage) def test_dendrogram_plot(self): d = mat.dendrogram(self.x_norm, **self.default_kws) ax = plt.gca() d.xmin, d.xmax = ax.get_xlim() xmax = min(map(min, d.X)) + max(map(max, d.X)) nt.assert_equal(d.xmin, 0) nt.assert_equal(d.xmax, xmax) nt.assert_equal(len(ax.get_lines()), len(d.X)) nt.assert_equal(len(ax.get_lines()), len(d.Y)) plt.close('all') def test_dendrogram_rotate(self): kws = self.default_kws.copy() kws['rotate'] = True d = mat.dendrogram(self.x_norm, **kws) ax = plt.gca() d.ymin, d.ymax = ax.get_ylim() ymax = min(map(min, d.Y)) + max(map(max, d.Y)) nt.assert_equal(d.ymin, 0) nt.assert_equal(d.ymax, ymax) plt.close('all') def test_dendrogram_ticklabel_rotation(self): f, ax = plt.subplots(figsize=(2, 2)) mat.dendrogram(self.df_norm, ax=ax) for t in ax.get_xticklabels(): nt.assert_equal(t.get_rotation(), 0) plt.close(f) df = self.df_norm.copy() df.columns = [str(c) * 10 for c in df.columns] df.index = [i * 10 for i in df.index] f, ax = plt.subplots(figsize=(2, 2)) mat.dendrogram(df, ax=ax) for t in ax.get_xticklabels(): nt.assert_equal(t.get_rotation(), 90) plt.close(f) f, ax = plt.subplots(figsize=(2, 2)) mat.dendrogram(df.T, axis=0, rotate=True) for t in ax.get_yticklabels(): nt.assert_equal(t.get_rotation(), 0) plt.close(f) class TestClustermap(object): rs = np.random.RandomState(sum(map(ord, "clustermap"))) x_norm = rs.randn(4, 8) + np.arange(8) x_norm = (x_norm.T + np.arange(4)).T letters = pd.Series(["A", "B", "C", "D", "E", "F", "G", "H"], name="letters") df_norm = pd.DataFrame(x_norm, columns=letters) try: import fastcluster x_norm_linkage = fastcluster.linkage_vector(x_norm.T, metric='euclidean', method='single') except ImportError: x_norm_distances = distance.squareform( distance.pdist(x_norm.T, metric='euclidean')) x_norm_linkage = hierarchy.linkage(x_norm_distances, method='single') x_norm_dendrogram = hierarchy.dendrogram(x_norm_linkage, no_plot=True, color_list=['k'], color_threshold=-np.inf) x_norm_leaves = x_norm_dendrogram['leaves'] df_norm_leaves = np.asarray(df_norm.columns[x_norm_leaves]) default_kws = dict(pivot_kws=None, z_score=None, standard_scale=None, figsize=None, row_colors=None, col_colors=None) default_plot_kws = dict(metric='euclidean', method='average', colorbar_kws=None, row_cluster=True, col_cluster=True, row_linkage=None, col_linkage=None) row_colors = color_palette('Set2', df_norm.shape[0]) col_colors = color_palette('Dark2', df_norm.shape[1]) def test_ndarray_input(self): cm = mat.ClusterGrid(self.x_norm, **self.default_kws) pdt.assert_frame_equal(cm.data, pd.DataFrame(self.x_norm)) nt.assert_equal(len(cm.fig.axes), 4) nt.assert_equal(cm.ax_row_colors, None) nt.assert_equal(cm.ax_col_colors, None) plt.close('all') def test_df_input(self): cm = mat.ClusterGrid(self.df_norm, **self.default_kws) pdt.assert_frame_equal(cm.data, self.df_norm) plt.close('all') def test_corr_df_input(self): df = self.df_norm.corr() cg = mat.ClusterGrid(df, **self.default_kws) cg.plot(**self.default_plot_kws) diag = cg.data2d.values[np.diag_indices_from(cg.data2d)] npt.assert_array_equal(diag, np.ones(cg.data2d.shape[0])) plt.close('all') def test_pivot_input(self): df_norm = self.df_norm.copy() df_norm.index.name = 'numbers' df_long = pd.melt(df_norm.reset_index(), var_name='letters', id_vars='numbers') kws = self.default_kws.copy() kws['pivot_kws'] = dict(index='numbers', columns='letters', values='value') cm = mat.ClusterGrid(df_long, **kws) pdt.assert_frame_equal(cm.data2d, df_norm) plt.close('all') def test_colors_input(self): kws = self.default_kws.copy() kws['row_colors'] = self.row_colors kws['col_colors'] = self.col_colors cm = mat.ClusterGrid(self.df_norm, **kws) npt.assert_array_equal(cm.row_colors, self.row_colors) npt.assert_array_equal(cm.col_colors, self.col_colors) nt.assert_equal(len(cm.fig.axes), 6) plt.close('all') def test_nested_colors_input(self): kws = self.default_kws.copy() row_colors = [self.row_colors, self.row_colors] col_colors = [self.col_colors, self.col_colors] kws['row_colors'] = row_colors kws['col_colors'] = col_colors cm = mat.ClusterGrid(self.df_norm, **kws) npt.assert_array_equal(cm.row_colors, row_colors) npt.assert_array_equal(cm.col_colors, col_colors) nt.assert_equal(len(cm.fig.axes), 6) plt.close('all') def test_colors_input_custom_cmap(self): kws = self.default_kws.copy() kws['cmap'] = mpl.cm.PRGn kws['row_colors'] = self.row_colors kws['col_colors'] = self.col_colors cm = mat.clustermap(self.df_norm, **kws) npt.assert_array_equal(cm.row_colors, self.row_colors) npt.assert_array_equal(cm.col_colors, self.col_colors) nt.assert_equal(len(cm.fig.axes), 6) plt.close('all') def test_z_score(self): df = self.df_norm.copy() df = (df - df.mean()) / df.var() kws = self.default_kws.copy() kws['z_score'] = 1 cm = mat.ClusterGrid(self.df_norm, **kws) pdt.assert_frame_equal(cm.data2d, df) plt.close('all') def test_z_score_axis0(self): df = self.df_norm.copy() df = df.T df = (df - df.mean()) / df.var() df = df.T kws = self.default_kws.copy() kws['z_score'] = 0 cm = mat.ClusterGrid(self.df_norm, **kws) pdt.assert_frame_equal(cm.data2d, df) plt.close('all') def test_standard_scale(self): df = self.df_norm.copy() df = (df - df.min()) / (df.max() - df.min()) kws = self.default_kws.copy() kws['standard_scale'] = 1 cm = mat.ClusterGrid(self.df_norm, **kws) pdt.assert_frame_equal(cm.data2d, df) plt.close('all') def test_standard_scale_axis0(self): df = self.df_norm.copy() df = df.T df = (df - df.min()) / (df.max() - df.min()) df = df.T kws = self.default_kws.copy() kws['standard_scale'] = 0 cm = mat.ClusterGrid(self.df_norm, **kws) pdt.assert_frame_equal(cm.data2d, df) plt.close('all') def test_z_score_standard_scale(self): kws = self.default_kws.copy() kws['z_score'] = True kws['standard_scale'] = True with nt.assert_raises(ValueError): cm = mat.ClusterGrid(self.df_norm, **kws) plt.close('all') def test_color_list_to_matrix_and_cmap(self): matrix, cmap = mat.ClusterGrid.color_list_to_matrix_and_cmap( self.col_colors, self.x_norm_leaves) colors_set = set(self.col_colors) col_to_value = dict((col, i) for i, col in enumerate(colors_set)) matrix_test = np.array([col_to_value[col] for col in self.col_colors])[self.x_norm_leaves] shape = len(self.col_colors), 1 matrix_test = matrix_test.reshape(shape) cmap_test = mpl.colors.ListedColormap(colors_set) npt.assert_array_equal(matrix, matrix_test) npt.assert_array_equal(cmap.colors, cmap_test.colors) plt.close('all') def test_nested_color_list_to_matrix_and_cmap(self): colors = [self.col_colors, self.col_colors] matrix, cmap = mat.ClusterGrid.color_list_to_matrix_and_cmap( colors, self.x_norm_leaves) all_colors = set(itertools.chain(*colors)) color_to_value = dict((col, i) for i, col in enumerate(all_colors)) matrix_test = np.array( [color_to_value[c] for color in colors for c in color]) shape = len(colors), len(colors[0]) matrix_test = matrix_test.reshape(shape) matrix_test = matrix_test[:, self.x_norm_leaves] matrix_test = matrix_test.T cmap_test = mpl.colors.ListedColormap(all_colors) npt.assert_array_equal(matrix, matrix_test) npt.assert_array_equal(cmap.colors, cmap_test.colors) plt.close('all') def test_color_list_to_matrix_and_cmap_axis1(self): matrix, cmap = mat.ClusterGrid.color_list_to_matrix_and_cmap( self.col_colors, self.x_norm_leaves, axis=1) colors_set = set(self.col_colors) col_to_value = dict((col, i) for i, col in enumerate(colors_set)) matrix_test = np.array([col_to_value[col] for col in self.col_colors])[self.x_norm_leaves] shape = 1, len(self.col_colors) matrix_test = matrix_test.reshape(shape) cmap_test = mpl.colors.ListedColormap(colors_set) npt.assert_array_equal(matrix, matrix_test) npt.assert_array_equal(cmap.colors, cmap_test.colors) plt.close('all') def test_savefig(self): # Not sure if this is the right way to test.... cm = mat.ClusterGrid(self.df_norm, **self.default_kws) cm.plot(**self.default_plot_kws) cm.savefig(tempfile.NamedTemporaryFile(), format='png') plt.close('all') def test_plot_dendrograms(self): cm = mat.clustermap(self.df_norm, **self.default_kws) nt.assert_equal(len(cm.ax_row_dendrogram.get_lines()), len(cm.dendrogram_row.X)) nt.assert_equal(len(cm.ax_col_dendrogram.get_lines()), len(cm.dendrogram_col.X)) data2d = self.df_norm.iloc[cm.dendrogram_row.reordered_ind, cm.dendrogram_col.reordered_ind] pdt.assert_frame_equal(cm.data2d, data2d) plt.close('all') def test_cluster_false(self): kws = self.default_kws.copy() kws['row_cluster'] = False kws['col_cluster'] = False cm = mat.clustermap(self.df_norm, **kws) nt.assert_equal(len(cm.ax_row_dendrogram.lines), 0) nt.assert_equal(len(cm.ax_col_dendrogram.lines), 0) nt.assert_equal(len(cm.ax_row_dendrogram.get_xticks()), 0) nt.assert_equal(len(cm.ax_row_dendrogram.get_yticks()), 0) nt.assert_equal(len(cm.ax_col_dendrogram.get_xticks()), 0) nt.assert_equal(len(cm.ax_col_dendrogram.get_yticks()), 0) pdt.assert_frame_equal(cm.data2d, self.df_norm) plt.close('all') def test_row_col_colors(self): kws = self.default_kws.copy() kws['row_colors'] = self.row_colors kws['col_colors'] = self.col_colors cm = mat.clustermap(self.df_norm, **kws) nt.assert_equal(len(cm.ax_row_colors.collections), 1) nt.assert_equal(len(cm.ax_col_colors.collections), 1) plt.close('all') def test_cluster_false_row_col_colors(self): kws = self.default_kws.copy() kws['row_cluster'] = False kws['col_cluster'] = False kws['row_colors'] = self.row_colors kws['col_colors'] = self.col_colors cm = mat.clustermap(self.df_norm, **kws) nt.assert_equal(len(cm.ax_row_dendrogram.lines), 0) nt.assert_equal(len(cm.ax_col_dendrogram.lines), 0) nt.assert_equal(len(cm.ax_row_dendrogram.get_xticks()), 0) nt.assert_equal(len(cm.ax_row_dendrogram.get_yticks()), 0) nt.assert_equal(len(cm.ax_col_dendrogram.get_xticks()), 0) nt.assert_equal(len(cm.ax_col_dendrogram.get_yticks()), 0) nt.assert_equal(len(cm.ax_row_colors.collections), 1) nt.assert_equal(len(cm.ax_col_colors.collections), 1) pdt.assert_frame_equal(cm.data2d, self.df_norm) plt.close('all') def test_mask_reorganization(self): kws = self.default_kws.copy() kws["mask"] = self.df_norm > 0 g = mat.clustermap(self.df_norm, **kws) npt.assert_array_equal(g.data2d.index, g.mask.index) npt.assert_array_equal(g.data2d.columns, g.mask.columns) npt.assert_array_equal(g.mask.index, self.df_norm.index[ g.dendrogram_row.reordered_ind]) npt.assert_array_equal(g.mask.columns, self.df_norm.columns[ g.dendrogram_col.reordered_ind]) plt.close("all") def test_ticklabel_reorganization(self): kws = self.default_kws.copy() xtl = np.arange(self.df_norm.shape[1]) kws["xticklabels"] = list(xtl) ytl = self.letters.ix[:self.df_norm.shape[0]] kws["yticklabels"] = ytl g = mat.clustermap(self.df_norm, **kws) xtl_actual = [t.get_text() for t in g.ax_heatmap.get_xticklabels()] ytl_actual = [t.get_text() for t in g.ax_heatmap.get_yticklabels()] xtl_want = xtl[g.dendrogram_col.reordered_ind].astype("= "0.15" from pandas.util.testing import network from ..utils import get_dataset_names, load_dataset try: from bs4 import BeautifulSoup except ImportError: BeautifulSoup = None from .. import utils, rcmod a_norm = np.random.randn(100) def test_pmf_hist_basics(): """Test the function to return barplot args for pmf hist.""" out = utils.pmf_hist(a_norm) assert_equal(len(out), 3) x, h, w = out assert_equal(len(x), len(h)) # Test simple case a = np.arange(10) x, h, w = utils.pmf_hist(a, 10) nose.tools.assert_true(np.all(h == h[0])) def test_pmf_hist_widths(): """Test histogram width is correct.""" x, h, w = utils.pmf_hist(a_norm) assert_equal(x[1] - x[0], w) def test_pmf_hist_normalization(): """Test that output data behaves like a PMF.""" x, h, w = utils.pmf_hist(a_norm) nose.tools.assert_almost_equal(sum(h), 1) nose.tools.assert_less_equal(h.max(), 1) def test_pmf_hist_bins(): """Test bin specification.""" x, h, w = utils.pmf_hist(a_norm, 20) assert_equal(len(x), 20) def test_ci_to_errsize(): """Test behavior of ci_to_errsize.""" cis = [[.5, .5], [1.25, 1.5]] heights = [1, 1.5] actual_errsize = np.array([[.5, 1], [.25, 0]]) test_errsize = utils.ci_to_errsize(cis, heights) npt.assert_array_equal(actual_errsize, test_errsize) def test_desaturate(): """Test color desaturation.""" out1 = utils.desaturate("red", .5) assert_equal(out1, (.75, .25, .25)) out2 = utils.desaturate("#00FF00", .5) assert_equal(out2, (.25, .75, .25)) out3 = utils.desaturate((0, 0, 1), .5) assert_equal(out3, (.25, .25, .75)) out4 = utils.desaturate("red", .5) assert_equal(out4, (.75, .25, .25)) @raises(ValueError) def test_desaturation_prop(): """Test that pct outside of [0, 1] raises exception.""" utils.desaturate("blue", 50) def test_saturate(): """Test performance of saturation function.""" out = utils.saturate((.75, .25, .25)) assert_equal(out, (1, 0, 0)) def test_iqr(): """Test the IQR function.""" a = np.arange(5) iqr = utils.iqr(a) assert_equal(iqr, 2) class TestSpineUtils(object): sides = ["left", "right", "bottom", "top"] outer_sides = ["top", "right"] inner_sides = ["left", "bottom"] offset = 10 original_position = ("outward", 0) offset_position = ("outward", offset) def test_despine(self): f, ax = plt.subplots() for side in self.sides: nt.assert_true(ax.spines[side].get_visible()) utils.despine() for side in self.outer_sides: nt.assert_true(~ax.spines[side].get_visible()) for side in self.inner_sides: nt.assert_true(ax.spines[side].get_visible()) utils.despine(**dict(zip(self.sides, [True] * 4))) for side in self.sides: nt.assert_true(~ax.spines[side].get_visible()) plt.close("all") def test_despine_specific_axes(self): f, (ax1, ax2) = plt.subplots(2, 1) utils.despine(ax=ax2) for side in self.sides: nt.assert_true(ax1.spines[side].get_visible()) for side in self.outer_sides: nt.assert_true(~ax2.spines[side].get_visible()) for side in self.inner_sides: nt.assert_true(ax2.spines[side].get_visible()) plt.close("all") def test_despine_with_offset(self): f, ax = plt.subplots() for side in self.sides: nt.assert_equal(ax.spines[side].get_position(), self.original_position) utils.despine(ax=ax, offset=self.offset) for side in self.sides: is_visible = ax.spines[side].get_visible() new_position = ax.spines[side].get_position() if is_visible: nt.assert_equal(new_position, self.offset_position) else: nt.assert_equal(new_position, self.original_position) plt.close("all") def test_despine_with_offset_specific_axes(self): f, (ax1, ax2) = plt.subplots(2, 1) utils.despine(offset=self.offset, ax=ax2) for side in self.sides: nt.assert_equal(ax1.spines[side].get_position(), self.original_position) if ax2.spines[side].get_visible(): nt.assert_equal(ax2.spines[side].get_position(), self.offset_position) else: nt.assert_equal(ax2.spines[side].get_position(), self.original_position) plt.close("all") def test_despine_trim_spines(self): f, ax = plt.subplots() ax.plot([1, 2, 3], [1, 2, 3]) ax.set_xlim(.75, 3.25) utils.despine(trim=True) for side in self.inner_sides: bounds = ax.spines[side].get_bounds() nt.assert_equal(bounds, (1, 3)) plt.close("all") def test_despine_trim_inverted(self): f, ax = plt.subplots() ax.plot([1, 2, 3], [1, 2, 3]) ax.set_ylim(.85, 3.15) ax.invert_yaxis() utils.despine(trim=True) for side in self.inner_sides: bounds = ax.spines[side].get_bounds() nt.assert_equal(bounds, (1, 3)) plt.close("all") def test_despine_trim_noticks(self): f, ax = plt.subplots() ax.plot([1, 2, 3], [1, 2, 3]) ax.set_yticks([]) utils.despine(trim=True) nt.assert_equal(ax.get_yticks().size, 0) def test_offset_spines_warns(self): with warnings.catch_warnings(record=True) as w: warnings.simplefilter("always", category=UserWarning) f, ax = plt.subplots() utils.offset_spines(offset=self.offset) nt.assert_true('deprecated' in str(w[0].message)) nt.assert_true(issubclass(w[0].category, UserWarning)) plt.close('all') def test_offset_spines(self): with warnings.catch_warnings(record=True) as w: warnings.simplefilter("always", category=UserWarning) f, ax = plt.subplots() for side in self.sides: nt.assert_equal(ax.spines[side].get_position(), self.original_position) utils.offset_spines(offset=self.offset) for side in self.sides: nt.assert_equal(ax.spines[side].get_position(), self.offset_position) plt.close("all") def test_offset_spines_specific_axes(self): with warnings.catch_warnings(record=True) as w: warnings.simplefilter("always", category=UserWarning) f, (ax1, ax2) = plt.subplots(2, 1) utils.offset_spines(offset=self.offset, ax=ax2) for side in self.sides: nt.assert_equal(ax1.spines[side].get_position(), self.original_position) nt.assert_equal(ax2.spines[side].get_position(), self.offset_position) plt.close("all") def test_ticklabels_overlap(): rcmod.set() f, ax = plt.subplots(figsize=(2, 2)) f.tight_layout() # This gets the Agg renderer working assert not utils.axis_ticklabels_overlap(ax.get_xticklabels()) big_strings = "abcdefgh", "ijklmnop" ax.set_xlim(-.5, 1.5) ax.set_xticks([0, 1]) ax.set_xticklabels(big_strings) assert utils.axis_ticklabels_overlap(ax.get_xticklabels()) x, y = utils.axes_ticklabels_overlap(ax) assert x assert not y def test_categorical_order(): x = ["a", "c", "c", "b", "a", "d"] order = ["a", "b", "c", "d"] out = utils.categorical_order(x) nt.assert_equal(out, ["a", "c", "b", "d"]) out = utils.categorical_order(x, order) nt.assert_equal(out, order) out = utils.categorical_order(x, ["b", "a"]) nt.assert_equal(out, ["b", "a"]) out = utils.categorical_order(np.array(x)) nt.assert_equal(out, ["a", "c", "b", "d"]) out = utils.categorical_order(pd.Series(x)) nt.assert_equal(out, ["a", "c", "b", "d"]) if pandas_has_categoricals: x = pd.Categorical(x, order) out = utils.categorical_order(x) nt.assert_equal(out, list(x.categories)) x = pd.Series(x) out = utils.categorical_order(x) nt.assert_equal(out, list(x.cat.categories)) out = utils.categorical_order(x, ["b", "a"]) nt.assert_equal(out, ["b", "a"]) x = ["a", np.nan, "c", "c", "b", "a", "d"] out = utils.categorical_order(x) nt.assert_equal(out, ["a", "c", "b", "d"]) if LooseVersion(pd.__version__) >= "0.15": def check_load_dataset(name): ds = load_dataset(name, cache=False) assert(isinstance(ds, pd.DataFrame)) def check_load_cached_dataset(name): # Test the cacheing using a temporary file. # With Python 3.2+, we could use the tempfile.TemporaryDirectory() # context manager instead of this try...finally statement tmpdir = tempfile.mkdtemp() try: # download and cache ds = load_dataset(name, cache=True, data_home=tmpdir) # use cached version ds2 = load_dataset(name, cache=True, data_home=tmpdir) pdt.assert_frame_equal(ds, ds2) finally: shutil.rmtree(tmpdir) @network(url="https://github.com/mwaskom/seaborn-data") def test_get_dataset_names(): if not BeautifulSoup: raise nose.SkipTest("No BeautifulSoup available for parsing html") names = get_dataset_names() assert(len(names) > 0) assert(u"titanic" in names) @network(url="https://github.com/mwaskom/seaborn-data") def test_load_datasets(): if not BeautifulSoup: raise nose.SkipTest("No BeautifulSoup available for parsing html") # Heavy test to verify that we can load all available datasets for name in get_dataset_names(): # unfortunately @network somehow obscures this generator so it # does not get in effect, so we need to call explicitly # yield check_load_dataset, name check_load_dataset(name) @network(url="https://github.com/mwaskom/seaborn-data") def test_load_cached_datasets(): if not BeautifulSoup: raise nose.SkipTest("No BeautifulSoup available for parsing html") # Heavy test to verify that we can load all available datasets for name in get_dataset_names(): # unfortunately @network somehow obscures this generator so it # does not get in effect, so we need to call explicitly # yield check_load_dataset, name check_load_cached_dataset(name) seaborn-0.6.0/seaborn/timeseries.py000066400000000000000000000355621254425553400173460ustar00rootroot00000000000000"""Timeseries plotting functions.""" from __future__ import division import numpy as np import pandas as pd from scipy import stats, interpolate import matplotlib as mpl import matplotlib.pyplot as plt from .external.six import string_types from . import utils from . import algorithms as algo from .palettes import color_palette def tsplot(data, time=None, unit=None, condition=None, value=None, err_style="ci_band", ci=68, interpolate=True, color=None, estimator=np.mean, n_boot=5000, err_palette=None, err_kws=None, legend=True, ax=None, **kwargs): """Plot one or more timeseries with flexible representation of uncertainty. This function is intended to be used with data where observations are nested within sampling units that were measured at multiple timepoints. It can take data specified either as a long-form (tidy) DataFrame or as an ndarray with dimensions (unit, time) The interpretation of some of the other parameters changes depending on the type of object passed as data. Parameters ---------- data : DataFrame or ndarray Data for the plot. Should either be a "long form" dataframe or an array with dimensions (unit, time, condition). In both cases, the condition field/dimension is optional. The type of this argument determines the interpretation of the next few parameters. When using a DataFrame, the index has to be sequential. time : string or series-like Either the name of the field corresponding to time in the data DataFrame or x values for a plot when data is an array. If a Series, the name will be used to label the x axis. unit : string Field in the data DataFrame identifying the sampling unit (e.g. subject, neuron, etc.). The error representation will collapse over units at each time/condition observation. This has no role when data is an array. value : string Either the name of the field corresponding to the data values in the data DataFrame (i.e. the y coordinate) or a string that forms the y axis label when data is an array. condition : string or Series-like Either the name of the field identifying the condition an observation falls under in the data DataFrame, or a sequence of names with a length equal to the size of the third dimension of data. There will be a separate trace plotted for each condition. If condition is a Series with a name attribute, the name will form the title for the plot legend (unless legend is set to False). err_style : string or list of strings or None Names of ways to plot uncertainty across units from set of {ci_band, ci_bars, boot_traces, boot_kde, unit_traces, unit_points}. Can use one or more than one method. ci : float or list of floats in [0, 100] Confidence interaval size(s). If a list, it will stack the error plots for each confidence interval. Only relevant for error styles with "ci" in the name. interpolate : boolean Whether to do a linear interpolation between each timepoint when plotting. The value of this parameter also determines the marker used for the main plot traces, unless marker is specified as a keyword argument. color : seaborn palette or matplotlib color name or dictionary Palette or color for the main plots and error representation (unless plotting by unit, which can be separately controlled with err_palette). If a dictionary, should map condition name to color spec. estimator : callable Function to determine central tendency and to pass to bootstrap must take an ``axis`` argument. n_boot : int Number of bootstrap iterations. err_palette : seaborn palette Palette name or list of colors used when plotting data for each unit. err_kws : dict, optional Keyword argument dictionary passed through to matplotlib function generating the error plot, legend : bool, optional If ``True`` and there is a ``condition`` variable, add a legend to the plot. ax : axis object, optional Plot in given axis; if None creates a new figure kwargs : Other keyword arguments are passed to main plot() call Returns ------- ax : matplotlib axis axis with plot data Examples -------- Plot a trace with translucent confidence bands: .. plot:: :context: close-figs >>> import numpy as np; np.random.seed(22) >>> import seaborn as sns; sns.set(color_codes=True) >>> x = np.linspace(0, 15, 31) >>> data = np.sin(x) + np.random.rand(10, 31) + np.random.randn(10, 1) >>> ax = sns.tsplot(data=data) Plot a long-form dataframe with several conditions: .. plot:: :context: close-figs >>> gammas = sns.load_dataset("gammas") >>> ax = sns.tsplot(time="timepoint", value="BOLD signal", ... unit="subject", condition="ROI", ... data=gammas) Use error bars at the positions of the observations: .. plot:: :context: close-figs >>> ax = sns.tsplot(data=data, err_style="ci_bars", color="g") Don't interpolate between the observations: .. plot:: :context: close-figs >>> import matplotlib.pyplot as plt >>> ax = sns.tsplot(data=data, err_style="ci_bars", interpolate=False) Show multiple confidence bands: .. plot:: :context: close-figs >>> ax = sns.tsplot(data=data, ci=[68, 95], color="m") Use a different estimator: .. plot:: :context: close-figs >>> ax = sns.tsplot(data=data, estimator=np.median) Show each bootstrap resample: .. plot:: :context: close-figs >>> ax = sns.tsplot(data=data, err_style="boot_traces", n_boot=500) Show the trace from each sampling unit: .. plot:: :context: close-figs >>> ax = sns.tsplot(data=data, err_style="unit_traces") """ # Sort out default values for the parameters if ax is None: ax = plt.gca() if err_kws is None: err_kws = {} # Handle different types of input data if isinstance(data, pd.DataFrame): xlabel = time ylabel = value # Condition is optional if condition is None: condition = pd.Series(np.ones(len(data))) legend = False legend_name = None n_cond = 1 else: legend = True and legend legend_name = condition n_cond = len(data[condition].unique()) else: data = np.asarray(data) # Data can be a timecourse from a single unit or # several observations in one condition if data.ndim == 1: data = data[np.newaxis, :, np.newaxis] elif data.ndim == 2: data = data[:, :, np.newaxis] n_unit, n_time, n_cond = data.shape # Units are experimental observations. Maybe subjects, or neurons if unit is None: units = np.arange(n_unit) unit = "unit" units = np.repeat(units, n_time * n_cond) ylabel = None # Time forms the xaxis of the plot if time is None: times = np.arange(n_time) else: times = np.asarray(time) xlabel = None if hasattr(time, "name"): xlabel = time.name time = "time" times = np.tile(np.repeat(times, n_cond), n_unit) # Conditions split the timeseries plots if condition is None: conds = range(n_cond) legend = False if isinstance(color, dict): err = "Must have condition names if using color dict." raise ValueError(err) else: conds = np.asarray(condition) legend = True and legend if hasattr(condition, "name"): legend_name = condition.name else: legend_name = None condition = "cond" conds = np.tile(conds, n_unit * n_time) # Value forms the y value in the plot if value is None: ylabel = None else: ylabel = value value = "value" # Convert to long-form DataFrame data = pd.DataFrame(dict(value=data.ravel(), time=times, unit=units, cond=conds)) # Set up the err_style and ci arguments for the loop below if isinstance(err_style, string_types): err_style = [err_style] elif err_style is None: err_style = [] if not hasattr(ci, "__iter__"): ci = [ci] # Set up the color palette if color is None: current_palette = mpl.rcParams["axes.color_cycle"] if len(current_palette) < n_cond: colors = color_palette("husl", n_cond) else: colors = color_palette(n_colors=n_cond) elif isinstance(color, dict): colors = [color[c] for c in data[condition].unique()] else: try: colors = color_palette(color, n_cond) except ValueError: color = mpl.colors.colorConverter.to_rgb(color) colors = [color] * n_cond # Do a groupby with condition and plot each trace for c, (cond, df_c) in enumerate(data.groupby(condition, sort=False)): df_c = df_c.pivot(unit, time, value) x = df_c.columns.values.astype(np.float) # Bootstrap the data for confidence intervals boot_data = algo.bootstrap(df_c.values, n_boot=n_boot, axis=0, func=estimator) cis = [utils.ci(boot_data, v, axis=0) for v in ci] central_data = estimator(df_c.values, axis=0) # Get the color for this condition color = colors[c] # Use subroutines to plot the uncertainty for style in err_style: # Allow for null style (only plot central tendency) if style is None: continue # Grab the function from the global environment try: plot_func = globals()["_plot_%s" % style] except KeyError: raise ValueError("%s is not a valid err_style" % style) # Possibly set up to plot each observation in a different color if err_palette is not None and "unit" in style: orig_color = color color = color_palette(err_palette, len(df_c.values)) # Pass all parameters to the error plotter as keyword args plot_kwargs = dict(ax=ax, x=x, data=df_c.values, boot_data=boot_data, central_data=central_data, color=color, err_kws=err_kws) # Plot the error representation, possibly for multiple cis for ci_i in cis: plot_kwargs["ci"] = ci_i plot_func(**plot_kwargs) if err_palette is not None and "unit" in style: color = orig_color # Plot the central trace kwargs.setdefault("marker", "" if interpolate else "o") ls = kwargs.pop("ls", "-" if interpolate else "") kwargs.setdefault("linestyle", ls) label = cond if legend else "_nolegend_" ax.plot(x, central_data, color=color, label=label, **kwargs) # Pad the sides of the plot only when not interpolating ax.set_xlim(x.min(), x.max()) x_diff = x[1] - x[0] if not interpolate: ax.set_xlim(x.min() - x_diff, x.max() + x_diff) # Add the plot labels if xlabel is not None: ax.set_xlabel(xlabel) if ylabel is not None: ax.set_ylabel(ylabel) if legend: ax.legend(loc=0, title=legend_name) return ax # Subroutines for tsplot errorbar plotting # ---------------------------------------- def _plot_ci_band(ax, x, ci, color, err_kws, **kwargs): """Plot translucent error bands around the central tendancy.""" low, high = ci if "alpha" not in err_kws: err_kws["alpha"] = 0.2 ax.fill_between(x, low, high, color=color, **err_kws) def _plot_ci_bars(ax, x, central_data, ci, color, err_kws, **kwargs): """Plot error bars at each data point.""" for x_i, y_i, (low, high) in zip(x, central_data, ci.T): ax.plot([x_i, x_i], [low, high], color=color, solid_capstyle="round", **err_kws) def _plot_boot_traces(ax, x, boot_data, color, err_kws, **kwargs): """Plot 250 traces from bootstrap.""" err_kws.setdefault("alpha", 0.25) err_kws.setdefault("linewidth", 0.25) if "lw" in err_kws: err_kws["linewidth"] = err_kws.pop("lw") ax.plot(x, boot_data.T, color=color, label="_nolegend_", **err_kws) def _plot_unit_traces(ax, x, data, ci, color, err_kws, **kwargs): """Plot a trace for each observation in the original data.""" if isinstance(color, list): if "alpha" not in err_kws: err_kws["alpha"] = .5 for i, obs in enumerate(data): ax.plot(x, obs, color=color[i], label="_nolegend_", **err_kws) else: if "alpha" not in err_kws: err_kws["alpha"] = .2 ax.plot(x, data.T, color=color, label="_nolegend_", **err_kws) def _plot_unit_points(ax, x, data, color, err_kws, **kwargs): """Plot each original data point discretely.""" if isinstance(color, list): for i, obs in enumerate(data): ax.plot(x, obs, "o", color=color[i], alpha=0.8, markersize=4, label="_nolegend_", **err_kws) else: ax.plot(x, data.T, "o", color=color, alpha=0.5, markersize=4, label="_nolegend_", **err_kws) def _plot_boot_kde(ax, x, boot_data, color, **kwargs): """Plot the kernal density estimate of the bootstrap distribution.""" kwargs.pop("data") _ts_kde(ax, x, boot_data, color, **kwargs) def _plot_unit_kde(ax, x, data, color, **kwargs): """Plot the kernal density estimate over the sample.""" _ts_kde(ax, x, data, color, **kwargs) def _ts_kde(ax, x, data, color, **kwargs): """Upsample over time and plot a KDE of the bootstrap distribution.""" kde_data = [] y_min, y_max = data.min(), data.max() y_vals = np.linspace(y_min, y_max, 100) upsampler = interpolate.interp1d(x, data) data_upsample = upsampler(np.linspace(x.min(), x.max(), 100)) for pt_data in data_upsample.T: pt_kde = stats.kde.gaussian_kde(pt_data) kde_data.append(pt_kde(y_vals)) kde_data = np.transpose(kde_data) rgb = mpl.colors.ColorConverter().to_rgb(color) img = np.zeros((kde_data.shape[0], kde_data.shape[1], 4)) img[:, :, :3] = rgb kde_data /= kde_data.max(axis=0) kde_data[kde_data > 1] = 1 img[:, :, 3] = kde_data ax.imshow(img, interpolation="spline16", zorder=2, extent=(x.min(), x.max(), y_min, y_max), aspect="auto", origin="lower") seaborn-0.6.0/seaborn/utils.py000066400000000000000000000357131254425553400163330ustar00rootroot00000000000000"""Small plotting-related utility functions.""" from __future__ import print_function, division import colorsys import warnings import os import numpy as np from scipy import stats import pandas as pd import matplotlib.colors as mplcol import matplotlib.pyplot as plt from distutils.version import LooseVersion pandas_has_categoricals = LooseVersion(pd.__version__) >= "0.15" from .external.six.moves.urllib.request import urlopen, urlretrieve def ci_to_errsize(cis, heights): """Convert intervals to error arguments relative to plot heights. Parameters ---------- cis: 2 x n sequence sequence of confidence interval limits heights : n sequence sequence of plot heights Returns ------- errsize : 2 x n array sequence of error size relative to height values in correct format as argument for plt.bar """ cis = np.atleast_2d(cis).reshape(2, -1) heights = np.atleast_1d(heights) errsize = [] for i, (low, high) in enumerate(np.transpose(cis)): h = heights[i] elow = h - low ehigh = high - h errsize.append([elow, ehigh]) errsize = np.asarray(errsize).T return errsize def pmf_hist(a, bins=10): """Return arguments to plt.bar for pmf-like histogram of an array. Parameters ---------- a: array-like array to make histogram of bins: int number of bins Returns ------- x: array left x position of bars h: array height of bars w: float width of bars """ n, x = np.histogram(a, bins) h = n / n.sum() w = x[1] - x[0] return x[:-1], h, w def desaturate(color, prop): """Decrease the saturation channel of a color by some percent. Parameters ---------- color : matplotlib color hex, rgb-tuple, or html color name prop : float saturation channel of color will be multiplied by this value Returns ------- new_color : rgb tuple desaturated color code in RGB tuple representation """ # Check inputs if not 0 <= prop <= 1: raise ValueError("prop must be between 0 and 1") # Get rgb tuple rep rgb = mplcol.colorConverter.to_rgb(color) # Convert to hls h, l, s = colorsys.rgb_to_hls(*rgb) # Desaturate the saturation channel s *= prop # Convert back to rgb new_color = colorsys.hls_to_rgb(h, l, s) return new_color def saturate(color): """Return a fully saturated color with the same hue. Parameters ---------- color : matplotlib color hex, rgb-tuple, or html color name Returns ------- new_color : rgb tuple saturated color code in RGB tuple representation """ return set_hls_values(color, s=1) def set_hls_values(color, h=None, l=None, s=None): """Independently manipulate the h, l, or s channels of a color. Parameters ---------- color : matplotlib color hex, rgb-tuple, or html color name h, l, s : floats between 0 and 1, or None new values for each channel in hls space Returns ------- new_color : rgb tuple new color code in RGB tuple representation """ # Get rgb tuple representation rgb = mplcol.colorConverter.to_rgb(color) vals = list(colorsys.rgb_to_hls(*rgb)) for i, val in enumerate([h, l, s]): if val is not None: vals[i] = val rgb = colorsys.hls_to_rgb(*vals) return rgb def axlabel(xlabel, ylabel, **kwargs): """Grab current axis and label it.""" ax = plt.gca() ax.set_xlabel(xlabel, **kwargs) ax.set_ylabel(ylabel, **kwargs) def despine(fig=None, ax=None, top=True, right=True, left=False, bottom=False, offset=None, trim=False): """Remove the top and right spines from plot(s). fig : matplotlib figure, optional Figure to despine all axes of, default uses current figure. ax : matplotlib axes, optional Specific axes object to despine. top, right, left, bottom : boolean, optional If True, remove that spine. offset : int or None (default), optional Absolute distance, in points, spines should be moved away from the axes (negative values move spines inward). trim : bool, optional If true, limit spines to the smallest and largest major tick on each non-despined axis. Returns ------- None """ # Get references to the axes we want if fig is None and ax is None: axes = plt.gcf().axes elif fig is not None: axes = fig.axes elif ax is not None: axes = [ax] for ax_i in axes: for side in ["top", "right", "left", "bottom"]: # Toggle the spine objects is_visible = not locals()[side] ax_i.spines[side].set_visible(is_visible) if offset is not None and is_visible: _set_spine_position(ax_i.spines[side], ('outward', offset)) # Set the ticks appropriately if bottom: ax_i.xaxis.tick_top() if top: ax_i.xaxis.tick_bottom() if left: ax_i.yaxis.tick_right() if right: ax_i.yaxis.tick_left() if trim: # clip off the parts of the spines that extend past major ticks xticks = ax_i.get_xticks() if xticks.size: firsttick = np.compress(xticks >= min(ax_i.get_xlim()), xticks)[0] lasttick = np.compress(xticks <= max(ax_i.get_xlim()), xticks)[-1] ax_i.spines['bottom'].set_bounds(firsttick, lasttick) ax_i.spines['top'].set_bounds(firsttick, lasttick) newticks = xticks.compress(xticks <= lasttick) newticks = newticks.compress(newticks >= firsttick) ax_i.set_xticks(newticks) yticks = ax_i.get_yticks() if yticks.size: firsttick = np.compress(yticks >= min(ax_i.get_ylim()), yticks)[0] lasttick = np.compress(yticks <= max(ax_i.get_ylim()), yticks)[-1] ax_i.spines['left'].set_bounds(firsttick, lasttick) ax_i.spines['right'].set_bounds(firsttick, lasttick) newticks = yticks.compress(yticks <= lasttick) newticks = newticks.compress(newticks >= firsttick) ax_i.set_yticks(newticks) def offset_spines(offset=10, fig=None, ax=None): """Simple function to offset spines away from axes. Use this immediately after creating figure and axes objects. Offsetting spines after plotting or manipulating the axes objects may result in loss of labels, ticks, and formatting. Parameters ---------- offset : int, optional Absolute distance, in points, spines should be moved away from the axes (negative values move spines inward). fig : matplotlib figure, optional Figure to despine all axes of, default uses current figure. ax : matplotlib axes, optional Specific axes object to despine Returns ------- None """ warn_msg = "`offset_spines` is deprecated and will be removed in v0.5" warnings.warn(warn_msg, UserWarning) # Get references to the axes we want if fig is None and ax is None: axes = plt.gcf().axes elif fig is not None: axes = fig.axes elif ax is not None: axes = [ax] for ax_i in axes: for spine in ax_i.spines.values(): _set_spine_position(spine, ('outward', offset)) def _set_spine_position(spine, position): """ Set the spine's position without resetting an associated axis. As of matplotlib v. 1.0.0, if a spine has an associated axis, then spine.set_position() calls axis.cla(), which resets locators, formatters, etc. We temporarily replace that call with axis.reset_ticks(), which is sufficient for our purposes. """ axis = spine.axis if axis is not None: cla = axis.cla axis.cla = axis.reset_ticks spine.set_position(position) if axis is not None: axis.cla = cla def _kde_support(data, bw, gridsize, cut, clip): """Establish support for a kernel density estimate.""" support_min = max(data.min() - bw * cut, clip[0]) support_max = min(data.max() + bw * cut, clip[1]) return np.linspace(support_min, support_max, gridsize) def percentiles(a, pcts, axis=None): """Like scoreatpercentile but can take and return array of percentiles. Parameters ---------- a : array data pcts : sequence of percentile values percentile or percentiles to find score at axis : int or None if not None, computes scores over this axis Returns ------- scores: array array of scores at requested percentiles first dimension is length of object passed to ``pcts`` """ scores = [] try: n = len(pcts) except TypeError: pcts = [pcts] n = 0 for i, p in enumerate(pcts): if axis is None: score = stats.scoreatpercentile(a.ravel(), p) else: score = np.apply_along_axis(stats.scoreatpercentile, axis, a, p) scores.append(score) scores = np.asarray(scores) if not n: scores = scores.squeeze() return scores def ci(a, which=95, axis=None): """Return a percentile range from an array of values.""" p = 50 - which / 2, 50 + which / 2 return percentiles(a, p, axis) def sig_stars(p): """Return a R-style significance string corresponding to p values.""" if p < 0.001: return "***" elif p < 0.01: return "**" elif p < 0.05: return "*" elif p < 0.1: return "." return "" def iqr(a): """Calculate the IQR for an array of numbers.""" a = np.asarray(a) q1 = stats.scoreatpercentile(a, 25) q3 = stats.scoreatpercentile(a, 75) return q3 - q1 def get_dataset_names(): """Report available example datasets, useful for reporting issues.""" # delayed import to not demand bs4 unless this function is actually used from bs4 import BeautifulSoup http = urlopen('https://github.com/mwaskom/seaborn-data/') gh_list = BeautifulSoup(http) return [l.text.replace('.csv', '') for l in gh_list.find_all("a", {"class": "js-directory-link"}) if l.text.endswith('.csv')] def get_data_home(data_home=None): """Return the path of the seaborn data directory. This is used by the ``load_dataset`` function. If the ``data_home`` argument is not specified, the default location is ``~/seaborn-data``. Alternatively, a different default location can be specified using the environment variable ``SEABORN_DATA``. """ if data_home is None: data_home = os.environ.get('SEABORN_DATA', os.path.join('~', 'seaborn-data')) data_home = os.path.expanduser(data_home) if not os.path.exists(data_home): os.makedirs(data_home) return data_home def load_dataset(name, cache=True, data_home=None, **kws): """Load a dataset from the online repository (requires internet). Parameters ---------- name : str Name of the dataset (`name`.csv on https://github.com/mwaskom/seaborn-data). You can obtain list of available datasets using :func:`get_dataset_names` cache : boolean, optional If True, then cache data locally and use the cache on subsequent calls data_home : string, optional The directory in which to cache data. By default, uses ~/seaborn_data/ kws : dict, optional Passed to pandas.read_csv """ path = "https://github.com/mwaskom/seaborn-data/raw/master/{0}.csv" full_path = path.format(name) if cache: cache_path = os.path.join(get_data_home(data_home), os.path.basename(full_path)) if not os.path.exists(cache_path): urlretrieve(full_path, cache_path) full_path = cache_path df = pd.read_csv(full_path, **kws) if df.iloc[-1].isnull().all(): df = df.iloc[:-1] if not pandas_has_categoricals: return df # Set some columns as a categorical type with ordered levels if name == "tips": df["day"] = pd.Categorical(df["day"], ["Thur", "Fri", "Sat", "Sun"]) df["sex"] = pd.Categorical(df["sex"], ["Male", "Female"]) df["time"] = pd.Categorical(df["time"], ["Lunch", "Dinner"]) df["smoker"] = pd.Categorical(df["smoker"], ["Yes", "No"]) if name == "flights": df["month"] = pd.Categorical(df["month"], df.month.unique()) if name == "exercise": df["time"] = pd.Categorical(df["time"], ["1 min", "15 min", "30 min"]) df["kind"] = pd.Categorical(df["kind"], ["rest", "walking", "running"]) df["diet"] = pd.Categorical(df["diet"], ["no fat", "low fat"]) if name == "titanic": df["class"] = pd.Categorical(df["class"], ["First", "Second", "Third"]) df["deck"] = pd.Categorical(df["deck"], list("ABCDEFG")) return df def axis_ticklabels_overlap(labels): """Return a boolean for whether the list of ticklabels have overlaps. Parameters ---------- labels : list of ticklabels Returns ------- overlap : boolean True if any of the labels overlap. """ if not labels: return False try: bboxes = [l.get_window_extent() for l in labels] overlaps = [b.count_overlaps(bboxes) for b in bboxes] return max(overlaps) > 1 except RuntimeError: # Issue on macosx backend rasies an error in the above code return False def axes_ticklabels_overlap(ax): """Return booleans for whether the x and y ticklabels on an Axes overlap. Parameters ---------- ax : matplotlib Axes Returns ------- x_overlap, y_overlap : booleans True when the labels on that axis overlap. """ return (axis_ticklabels_overlap(ax.get_xticklabels()), axis_ticklabels_overlap(ax.get_yticklabels())) def categorical_order(values, order=None): """Return a list of unique data values. Determine an ordered list of levels in ``values``. Parameters ---------- values : list, array, Categorical, or Series Vector of "categorical" values order : list-like, optional Desired order of category levels to override the order determined from the ``values`` object. Returns ------- order : list Ordered list of category levels not including null values. """ if order is None: if hasattr(values, "categories"): order = values.categories else: try: order = values.cat.categories except (TypeError, AttributeError): try: order = values.unique() except AttributeError: order = pd.unique(values) order = filter(pd.notnull, order) return list(order) seaborn-0.6.0/seaborn/xkcd_rgb.py000066400000000000000000001050631254425553400167520ustar00rootroot00000000000000xkcd_rgb = {'acid green': '#8ffe09', 'adobe': '#bd6c48', 'algae': '#54ac68', 'algae green': '#21c36f', 'almost black': '#070d0d', 'amber': '#feb308', 'amethyst': '#9b5fc0', 'apple': '#6ecb3c', 'apple green': '#76cd26', 'apricot': '#ffb16d', 'aqua': '#13eac9', 'aqua blue': '#02d8e9', 'aqua green': '#12e193', 'aqua marine': '#2ee8bb', 'aquamarine': '#04d8b2', 'army green': '#4b5d16', 'asparagus': '#77ab56', 'aubergine': '#3d0734', 'auburn': '#9a3001', 'avocado': '#90b134', 'avocado green': '#87a922', 'azul': '#1d5dec', 'azure': '#069af3', 'baby blue': '#a2cffe', 'baby green': '#8cff9e', 'baby pink': '#ffb7ce', 'baby poo': '#ab9004', 'baby poop': '#937c00', 'baby poop green': '#8f9805', 'baby puke green': '#b6c406', 'baby purple': '#ca9bf7', 'baby shit brown': '#ad900d', 'baby shit green': '#889717', 'banana': '#ffff7e', 'banana yellow': '#fafe4b', 'barbie pink': '#fe46a5', 'barf green': '#94ac02', 'barney': '#ac1db8', 'barney purple': '#a00498', 'battleship grey': '#6b7c85', 'beige': '#e6daa6', 'berry': '#990f4b', 'bile': '#b5c306', 'black': '#000000', 'bland': '#afa88b', 'blood': '#770001', 'blood orange': '#fe4b03', 'blood red': '#980002', 'blue': '#0343df', 'blue blue': '#2242c7', 'blue green': '#137e6d', 'blue grey': '#607c8e', 'blue purple': '#5729ce', 'blue violet': '#5d06e9', 'blue with a hint of purple': '#533cc6', 'blue/green': '#0f9b8e', 'blue/grey': '#758da3', 'blue/purple': '#5a06ef', 'blueberry': '#464196', 'bluegreen': '#017a79', 'bluegrey': '#85a3b2', 'bluey green': '#2bb179', 'bluey grey': '#89a0b0', 'bluey purple': '#6241c7', 'bluish': '#2976bb', 'bluish green': '#10a674', 'bluish grey': '#748b97', 'bluish purple': '#703be7', 'blurple': '#5539cc', 'blush': '#f29e8e', 'blush pink': '#fe828c', 'booger': '#9bb53c', 'booger green': '#96b403', 'bordeaux': '#7b002c', 'boring green': '#63b365', 'bottle green': '#044a05', 'brick': '#a03623', 'brick orange': '#c14a09', 'brick red': '#8f1402', 'bright aqua': '#0bf9ea', 'bright blue': '#0165fc', 'bright cyan': '#41fdfe', 'bright green': '#01ff07', 'bright lavender': '#c760ff', 'bright light blue': '#26f7fd', 'bright light green': '#2dfe54', 'bright lilac': '#c95efb', 'bright lime': '#87fd05', 'bright lime green': '#65fe08', 'bright magenta': '#ff08e8', 'bright olive': '#9cbb04', 'bright orange': '#ff5b00', 'bright pink': '#fe01b1', 'bright purple': '#be03fd', 'bright red': '#ff000d', 'bright sea green': '#05ffa6', 'bright sky blue': '#02ccfe', 'bright teal': '#01f9c6', 'bright turquoise': '#0ffef9', 'bright violet': '#ad0afd', 'bright yellow': '#fffd01', 'bright yellow green': '#9dff00', 'british racing green': '#05480d', 'bronze': '#a87900', 'brown': '#653700', 'brown green': '#706c11', 'brown grey': '#8d8468', 'brown orange': '#b96902', 'brown red': '#922b05', 'brown yellow': '#b29705', 'brownish': '#9c6d57', 'brownish green': '#6a6e09', 'brownish grey': '#86775f', 'brownish orange': '#cb7723', 'brownish pink': '#c27e79', 'brownish purple': '#76424e', 'brownish red': '#9e3623', 'brownish yellow': '#c9b003', 'browny green': '#6f6c0a', 'browny orange': '#ca6b02', 'bruise': '#7e4071', 'bubble gum pink': '#ff69af', 'bubblegum': '#ff6cb5', 'bubblegum pink': '#fe83cc', 'buff': '#fef69e', 'burgundy': '#610023', 'burnt orange': '#c04e01', 'burnt red': '#9f2305', 'burnt siena': '#b75203', 'burnt sienna': '#b04e0f', 'burnt umber': '#a0450e', 'burnt yellow': '#d5ab09', 'burple': '#6832e3', 'butter': '#ffff81', 'butter yellow': '#fffd74', 'butterscotch': '#fdb147', 'cadet blue': '#4e7496', 'camel': '#c69f59', 'camo': '#7f8f4e', 'camo green': '#526525', 'camouflage green': '#4b6113', 'canary': '#fdff63', 'canary yellow': '#fffe40', 'candy pink': '#ff63e9', 'caramel': '#af6f09', 'carmine': '#9d0216', 'carnation': '#fd798f', 'carnation pink': '#ff7fa7', 'carolina blue': '#8ab8fe', 'celadon': '#befdb7', 'celery': '#c1fd95', 'cement': '#a5a391', 'cerise': '#de0c62', 'cerulean': '#0485d1', 'cerulean blue': '#056eee', 'charcoal': '#343837', 'charcoal grey': '#3c4142', 'chartreuse': '#c1f80a', 'cherry': '#cf0234', 'cherry red': '#f7022a', 'chestnut': '#742802', 'chocolate': '#3d1c02', 'chocolate brown': '#411900', 'cinnamon': '#ac4f06', 'claret': '#680018', 'clay': '#b66a50', 'clay brown': '#b2713d', 'clear blue': '#247afd', 'cloudy blue': '#acc2d9', 'cobalt': '#1e488f', 'cobalt blue': '#030aa7', 'cocoa': '#875f42', 'coffee': '#a6814c', 'cool blue': '#4984b8', 'cool green': '#33b864', 'cool grey': '#95a3a6', 'copper': '#b66325', 'coral': '#fc5a50', 'coral pink': '#ff6163', 'cornflower': '#6a79f7', 'cornflower blue': '#5170d7', 'cranberry': '#9e003a', 'cream': '#ffffc2', 'creme': '#ffffb6', 'crimson': '#8c000f', 'custard': '#fffd78', 'cyan': '#00ffff', 'dandelion': '#fedf08', 'dark': '#1b2431', 'dark aqua': '#05696b', 'dark aquamarine': '#017371', 'dark beige': '#ac9362', 'dark blue': '#00035b', 'dark blue green': '#005249', 'dark blue grey': '#1f3b4d', 'dark brown': '#341c02', 'dark coral': '#cf524e', 'dark cream': '#fff39a', 'dark cyan': '#0a888a', 'dark forest green': '#002d04', 'dark fuchsia': '#9d0759', 'dark gold': '#b59410', 'dark grass green': '#388004', 'dark green': '#033500', 'dark green blue': '#1f6357', 'dark grey': '#363737', 'dark grey blue': '#29465b', 'dark hot pink': '#d90166', 'dark indigo': '#1f0954', 'dark khaki': '#9b8f55', 'dark lavender': '#856798', 'dark lilac': '#9c6da5', 'dark lime': '#84b701', 'dark lime green': '#7ebd01', 'dark magenta': '#960056', 'dark maroon': '#3c0008', 'dark mauve': '#874c62', 'dark mint': '#48c072', 'dark mint green': '#20c073', 'dark mustard': '#a88905', 'dark navy': '#000435', 'dark navy blue': '#00022e', 'dark olive': '#373e02', 'dark olive green': '#3c4d03', 'dark orange': '#c65102', 'dark pastel green': '#56ae57', 'dark peach': '#de7e5d', 'dark periwinkle': '#665fd1', 'dark pink': '#cb416b', 'dark plum': '#3f012c', 'dark purple': '#35063e', 'dark red': '#840000', 'dark rose': '#b5485d', 'dark royal blue': '#02066f', 'dark sage': '#598556', 'dark salmon': '#c85a53', 'dark sand': '#a88f59', 'dark sea green': '#11875d', 'dark seafoam': '#1fb57a', 'dark seafoam green': '#3eaf76', 'dark sky blue': '#448ee4', 'dark slate blue': '#214761', 'dark tan': '#af884a', 'dark taupe': '#7f684e', 'dark teal': '#014d4e', 'dark turquoise': '#045c5a', 'dark violet': '#34013f', 'dark yellow': '#d5b60a', 'dark yellow green': '#728f02', 'darkblue': '#030764', 'darkgreen': '#054907', 'darkish blue': '#014182', 'darkish green': '#287c37', 'darkish pink': '#da467d', 'darkish purple': '#751973', 'darkish red': '#a90308', 'deep aqua': '#08787f', 'deep blue': '#040273', 'deep brown': '#410200', 'deep green': '#02590f', 'deep lavender': '#8d5eb7', 'deep lilac': '#966ebd', 'deep magenta': '#a0025c', 'deep orange': '#dc4d01', 'deep pink': '#cb0162', 'deep purple': '#36013f', 'deep red': '#9a0200', 'deep rose': '#c74767', 'deep sea blue': '#015482', 'deep sky blue': '#0d75f8', 'deep teal': '#00555a', 'deep turquoise': '#017374', 'deep violet': '#490648', 'denim': '#3b638c', 'denim blue': '#3b5b92', 'desert': '#ccad60', 'diarrhea': '#9f8303', 'dirt': '#8a6e45', 'dirt brown': '#836539', 'dirty blue': '#3f829d', 'dirty green': '#667e2c', 'dirty orange': '#c87606', 'dirty pink': '#ca7b80', 'dirty purple': '#734a65', 'dirty yellow': '#cdc50a', 'dodger blue': '#3e82fc', 'drab': '#828344', 'drab green': '#749551', 'dried blood': '#4b0101', 'duck egg blue': '#c3fbf4', 'dull blue': '#49759c', 'dull brown': '#876e4b', 'dull green': '#74a662', 'dull orange': '#d8863b', 'dull pink': '#d5869d', 'dull purple': '#84597e', 'dull red': '#bb3f3f', 'dull teal': '#5f9e8f', 'dull yellow': '#eedc5b', 'dusk': '#4e5481', 'dusk blue': '#26538d', 'dusky blue': '#475f94', 'dusky pink': '#cc7a8b', 'dusky purple': '#895b7b', 'dusky rose': '#ba6873', 'dust': '#b2996e', 'dusty blue': '#5a86ad', 'dusty green': '#76a973', 'dusty lavender': '#ac86a8', 'dusty orange': '#f0833a', 'dusty pink': '#d58a94', 'dusty purple': '#825f87', 'dusty red': '#b9484e', 'dusty rose': '#c0737a', 'dusty teal': '#4c9085', 'earth': '#a2653e', 'easter green': '#8cfd7e', 'easter purple': '#c071fe', 'ecru': '#feffca', 'egg shell': '#fffcc4', 'eggplant': '#380835', 'eggplant purple': '#430541', 'eggshell': '#ffffd4', 'eggshell blue': '#c4fff7', 'electric blue': '#0652ff', 'electric green': '#21fc0d', 'electric lime': '#a8ff04', 'electric pink': '#ff0490', 'electric purple': '#aa23ff', 'emerald': '#01a049', 'emerald green': '#028f1e', 'evergreen': '#05472a', 'faded blue': '#658cbb', 'faded green': '#7bb274', 'faded orange': '#f0944d', 'faded pink': '#de9dac', 'faded purple': '#916e99', 'faded red': '#d3494e', 'faded yellow': '#feff7f', 'fawn': '#cfaf7b', 'fern': '#63a950', 'fern green': '#548d44', 'fire engine red': '#fe0002', 'flat blue': '#3c73a8', 'flat green': '#699d4c', 'fluorescent green': '#08ff08', 'fluro green': '#0aff02', 'foam green': '#90fda9', 'forest': '#0b5509', 'forest green': '#06470c', 'forrest green': '#154406', 'french blue': '#436bad', 'fresh green': '#69d84f', 'frog green': '#58bc08', 'fuchsia': '#ed0dd9', 'gold': '#dbb40c', 'golden': '#f5bf03', 'golden brown': '#b27a01', 'golden rod': '#f9bc08', 'golden yellow': '#fec615', 'goldenrod': '#fac205', 'grape': '#6c3461', 'grape purple': '#5d1451', 'grapefruit': '#fd5956', 'grass': '#5cac2d', 'grass green': '#3f9b0b', 'grassy green': '#419c03', 'green': '#15b01a', 'green apple': '#5edc1f', 'green blue': '#06b48b', 'green brown': '#544e03', 'green grey': '#77926f', 'green teal': '#0cb577', 'green yellow': '#c9ff27', 'green/blue': '#01c08d', 'green/yellow': '#b5ce08', 'greenblue': '#23c48b', 'greenish': '#40a368', 'greenish beige': '#c9d179', 'greenish blue': '#0b8b87', 'greenish brown': '#696112', 'greenish cyan': '#2afeb7', 'greenish grey': '#96ae8d', 'greenish tan': '#bccb7a', 'greenish teal': '#32bf84', 'greenish turquoise': '#00fbb0', 'greenish yellow': '#cdfd02', 'greeny blue': '#42b395', 'greeny brown': '#696006', 'greeny grey': '#7ea07a', 'greeny yellow': '#c6f808', 'grey': '#929591', 'grey blue': '#6b8ba4', 'grey brown': '#7f7053', 'grey green': '#789b73', 'grey pink': '#c3909b', 'grey purple': '#826d8c', 'grey teal': '#5e9b8a', 'grey/blue': '#647d8e', 'grey/green': '#86a17d', 'greyblue': '#77a1b5', 'greyish': '#a8a495', 'greyish blue': '#5e819d', 'greyish brown': '#7a6a4f', 'greyish green': '#82a67d', 'greyish pink': '#c88d94', 'greyish purple': '#887191', 'greyish teal': '#719f91', 'gross green': '#a0bf16', 'gunmetal': '#536267', 'hazel': '#8e7618', 'heather': '#a484ac', 'heliotrope': '#d94ff5', 'highlighter green': '#1bfc06', 'hospital green': '#9be5aa', 'hot green': '#25ff29', 'hot magenta': '#f504c9', 'hot pink': '#ff028d', 'hot purple': '#cb00f5', 'hunter green': '#0b4008', 'ice': '#d6fffa', 'ice blue': '#d7fffe', 'icky green': '#8fae22', 'indian red': '#850e04', 'indigo': '#380282', 'indigo blue': '#3a18b1', 'iris': '#6258c4', 'irish green': '#019529', 'ivory': '#ffffcb', 'jade': '#1fa774', 'jade green': '#2baf6a', 'jungle green': '#048243', 'kelley green': '#009337', 'kelly green': '#02ab2e', 'kermit green': '#5cb200', 'key lime': '#aeff6e', 'khaki': '#aaa662', 'khaki green': '#728639', 'kiwi': '#9cef43', 'kiwi green': '#8ee53f', 'lavender': '#c79fef', 'lavender blue': '#8b88f8', 'lavender pink': '#dd85d7', 'lawn green': '#4da409', 'leaf': '#71aa34', 'leaf green': '#5ca904', 'leafy green': '#51b73b', 'leather': '#ac7434', 'lemon': '#fdff52', 'lemon green': '#adf802', 'lemon lime': '#bffe28', 'lemon yellow': '#fdff38', 'lichen': '#8fb67b', 'light aqua': '#8cffdb', 'light aquamarine': '#7bfdc7', 'light beige': '#fffeb6', 'light blue': '#95d0fc', 'light blue green': '#7efbb3', 'light blue grey': '#b7c9e2', 'light bluish green': '#76fda8', 'light bright green': '#53fe5c', 'light brown': '#ad8150', 'light burgundy': '#a8415b', 'light cyan': '#acfffc', 'light eggplant': '#894585', 'light forest green': '#4f9153', 'light gold': '#fddc5c', 'light grass green': '#9af764', 'light green': '#96f97b', 'light green blue': '#56fca2', 'light greenish blue': '#63f7b4', 'light grey': '#d8dcd6', 'light grey blue': '#9dbcd4', 'light grey green': '#b7e1a1', 'light indigo': '#6d5acf', 'light khaki': '#e6f2a2', 'light lavendar': '#efc0fe', 'light lavender': '#dfc5fe', 'light light blue': '#cafffb', 'light light green': '#c8ffb0', 'light lilac': '#edc8ff', 'light lime': '#aefd6c', 'light lime green': '#b9ff66', 'light magenta': '#fa5ff7', 'light maroon': '#a24857', 'light mauve': '#c292a1', 'light mint': '#b6ffbb', 'light mint green': '#a6fbb2', 'light moss green': '#a6c875', 'light mustard': '#f7d560', 'light navy': '#155084', 'light navy blue': '#2e5a88', 'light neon green': '#4efd54', 'light olive': '#acbf69', 'light olive green': '#a4be5c', 'light orange': '#fdaa48', 'light pastel green': '#b2fba5', 'light pea green': '#c4fe82', 'light peach': '#ffd8b1', 'light periwinkle': '#c1c6fc', 'light pink': '#ffd1df', 'light plum': '#9d5783', 'light purple': '#bf77f6', 'light red': '#ff474c', 'light rose': '#ffc5cb', 'light royal blue': '#3a2efe', 'light sage': '#bcecac', 'light salmon': '#fea993', 'light sea green': '#98f6b0', 'light seafoam': '#a0febf', 'light seafoam green': '#a7ffb5', 'light sky blue': '#c6fcff', 'light tan': '#fbeeac', 'light teal': '#90e4c1', 'light turquoise': '#7ef4cc', 'light urple': '#b36ff6', 'light violet': '#d6b4fc', 'light yellow': '#fffe7a', 'light yellow green': '#ccfd7f', 'light yellowish green': '#c2ff89', 'lightblue': '#7bc8f6', 'lighter green': '#75fd63', 'lighter purple': '#a55af4', 'lightgreen': '#76ff7b', 'lightish blue': '#3d7afd', 'lightish green': '#61e160', 'lightish purple': '#a552e6', 'lightish red': '#fe2f4a', 'lilac': '#cea2fd', 'liliac': '#c48efd', 'lime': '#aaff32', 'lime green': '#89fe05', 'lime yellow': '#d0fe1d', 'lipstick': '#d5174e', 'lipstick red': '#c0022f', 'macaroni and cheese': '#efb435', 'magenta': '#c20078', 'mahogany': '#4a0100', 'maize': '#f4d054', 'mango': '#ffa62b', 'manilla': '#fffa86', 'marigold': '#fcc006', 'marine': '#042e60', 'marine blue': '#01386a', 'maroon': '#650021', 'mauve': '#ae7181', 'medium blue': '#2c6fbb', 'medium brown': '#7f5112', 'medium green': '#39ad48', 'medium grey': '#7d7f7c', 'medium pink': '#f36196', 'medium purple': '#9e43a2', 'melon': '#ff7855', 'merlot': '#730039', 'metallic blue': '#4f738e', 'mid blue': '#276ab3', 'mid green': '#50a747', 'midnight': '#03012d', 'midnight blue': '#020035', 'midnight purple': '#280137', 'military green': '#667c3e', 'milk chocolate': '#7f4e1e', 'mint': '#9ffeb0', 'mint green': '#8fff9f', 'minty green': '#0bf77d', 'mocha': '#9d7651', 'moss': '#769958', 'moss green': '#658b38', 'mossy green': '#638b27', 'mud': '#735c12', 'mud brown': '#60460f', 'mud green': '#606602', 'muddy brown': '#886806', 'muddy green': '#657432', 'muddy yellow': '#bfac05', 'mulberry': '#920a4e', 'murky green': '#6c7a0e', 'mushroom': '#ba9e88', 'mustard': '#ceb301', 'mustard brown': '#ac7e04', 'mustard green': '#a8b504', 'mustard yellow': '#d2bd0a', 'muted blue': '#3b719f', 'muted green': '#5fa052', 'muted pink': '#d1768f', 'muted purple': '#805b87', 'nasty green': '#70b23f', 'navy': '#01153e', 'navy blue': '#001146', 'navy green': '#35530a', 'neon blue': '#04d9ff', 'neon green': '#0cff0c', 'neon pink': '#fe019a', 'neon purple': '#bc13fe', 'neon red': '#ff073a', 'neon yellow': '#cfff04', 'nice blue': '#107ab0', 'night blue': '#040348', 'ocean': '#017b92', 'ocean blue': '#03719c', 'ocean green': '#3d9973', 'ocher': '#bf9b0c', 'ochre': '#bf9005', 'ocre': '#c69c04', 'off blue': '#5684ae', 'off green': '#6ba353', 'off white': '#ffffe4', 'off yellow': '#f1f33f', 'old pink': '#c77986', 'old rose': '#c87f89', 'olive': '#6e750e', 'olive brown': '#645403', 'olive drab': '#6f7632', 'olive green': '#677a04', 'olive yellow': '#c2b709', 'orange': '#f97306', 'orange brown': '#be6400', 'orange pink': '#ff6f52', 'orange red': '#fd411e', 'orange yellow': '#ffad01', 'orangeish': '#fd8d49', 'orangered': '#fe420f', 'orangey brown': '#b16002', 'orangey red': '#fa4224', 'orangey yellow': '#fdb915', 'orangish': '#fc824a', 'orangish brown': '#b25f03', 'orangish red': '#f43605', 'orchid': '#c875c4', 'pale': '#fff9d0', 'pale aqua': '#b8ffeb', 'pale blue': '#d0fefe', 'pale brown': '#b1916e', 'pale cyan': '#b7fffa', 'pale gold': '#fdde6c', 'pale green': '#c7fdb5', 'pale grey': '#fdfdfe', 'pale lavender': '#eecffe', 'pale light green': '#b1fc99', 'pale lilac': '#e4cbff', 'pale lime': '#befd73', 'pale lime green': '#b1ff65', 'pale magenta': '#d767ad', 'pale mauve': '#fed0fc', 'pale olive': '#b9cc81', 'pale olive green': '#b1d27b', 'pale orange': '#ffa756', 'pale peach': '#ffe5ad', 'pale pink': '#ffcfdc', 'pale purple': '#b790d4', 'pale red': '#d9544d', 'pale rose': '#fdc1c5', 'pale salmon': '#ffb19a', 'pale sky blue': '#bdf6fe', 'pale teal': '#82cbb2', 'pale turquoise': '#a5fbd5', 'pale violet': '#ceaefa', 'pale yellow': '#ffff84', 'parchment': '#fefcaf', 'pastel blue': '#a2bffe', 'pastel green': '#b0ff9d', 'pastel orange': '#ff964f', 'pastel pink': '#ffbacd', 'pastel purple': '#caa0ff', 'pastel red': '#db5856', 'pastel yellow': '#fffe71', 'pea': '#a4bf20', 'pea green': '#8eab12', 'pea soup': '#929901', 'pea soup green': '#94a617', 'peach': '#ffb07c', 'peachy pink': '#ff9a8a', 'peacock blue': '#016795', 'pear': '#cbf85f', 'periwinkle': '#8e82fe', 'periwinkle blue': '#8f99fb', 'perrywinkle': '#8f8ce7', 'petrol': '#005f6a', 'pig pink': '#e78ea5', 'pine': '#2b5d34', 'pine green': '#0a481e', 'pink': '#ff81c0', 'pink purple': '#db4bda', 'pink red': '#f5054f', 'pink/purple': '#ef1de7', 'pinkish': '#d46a7e', 'pinkish brown': '#b17261', 'pinkish grey': '#c8aca9', 'pinkish orange': '#ff724c', 'pinkish purple': '#d648d7', 'pinkish red': '#f10c45', 'pinkish tan': '#d99b82', 'pinky': '#fc86aa', 'pinky purple': '#c94cbe', 'pinky red': '#fc2647', 'piss yellow': '#ddd618', 'pistachio': '#c0fa8b', 'plum': '#580f41', 'plum purple': '#4e0550', 'poison green': '#40fd14', 'poo': '#8f7303', 'poo brown': '#885f01', 'poop': '#7f5e00', 'poop brown': '#7a5901', 'poop green': '#6f7c00', 'powder blue': '#b1d1fc', 'powder pink': '#ffb2d0', 'primary blue': '#0804f9', 'prussian blue': '#004577', 'puce': '#a57e52', 'puke': '#a5a502', 'puke brown': '#947706', 'puke green': '#9aae07', 'puke yellow': '#c2be0e', 'pumpkin': '#e17701', 'pumpkin orange': '#fb7d07', 'pure blue': '#0203e2', 'purple': '#7e1e9c', 'purple blue': '#632de9', 'purple brown': '#673a3f', 'purple grey': '#866f85', 'purple pink': '#e03fd8', 'purple red': '#990147', 'purple/blue': '#5d21d0', 'purple/pink': '#d725de', 'purpleish': '#98568d', 'purpleish blue': '#6140ef', 'purpleish pink': '#df4ec8', 'purpley': '#8756e4', 'purpley blue': '#5f34e7', 'purpley grey': '#947e94', 'purpley pink': '#c83cb9', 'purplish': '#94568c', 'purplish blue': '#601ef9', 'purplish brown': '#6b4247', 'purplish grey': '#7a687f', 'purplish pink': '#ce5dae', 'purplish red': '#b0054b', 'purply': '#983fb2', 'purply blue': '#661aee', 'purply pink': '#f075e6', 'putty': '#beae8a', 'racing green': '#014600', 'radioactive green': '#2cfa1f', 'raspberry': '#b00149', 'raw sienna': '#9a6200', 'raw umber': '#a75e09', 'really light blue': '#d4ffff', 'red': '#e50000', 'red brown': '#8b2e16', 'red orange': '#fd3c06', 'red pink': '#fa2a55', 'red purple': '#820747', 'red violet': '#9e0168', 'red wine': '#8c0034', 'reddish': '#c44240', 'reddish brown': '#7f2b0a', 'reddish grey': '#997570', 'reddish orange': '#f8481c', 'reddish pink': '#fe2c54', 'reddish purple': '#910951', 'reddy brown': '#6e1005', 'rich blue': '#021bf9', 'rich purple': '#720058', 'robin egg blue': '#8af1fe', "robin's egg": '#6dedfd', "robin's egg blue": '#98eff9', 'rosa': '#fe86a4', 'rose': '#cf6275', 'rose pink': '#f7879a', 'rose red': '#be013c', 'rosy pink': '#f6688e', 'rouge': '#ab1239', 'royal': '#0c1793', 'royal blue': '#0504aa', 'royal purple': '#4b006e', 'ruby': '#ca0147', 'russet': '#a13905', 'rust': '#a83c09', 'rust brown': '#8b3103', 'rust orange': '#c45508', 'rust red': '#aa2704', 'rusty orange': '#cd5909', 'rusty red': '#af2f0d', 'saffron': '#feb209', 'sage': '#87ae73', 'sage green': '#88b378', 'salmon': '#ff796c', 'salmon pink': '#fe7b7c', 'sand': '#e2ca76', 'sand brown': '#cba560', 'sand yellow': '#fce166', 'sandstone': '#c9ae74', 'sandy': '#f1da7a', 'sandy brown': '#c4a661', 'sandy yellow': '#fdee73', 'sap green': '#5c8b15', 'sapphire': '#2138ab', 'scarlet': '#be0119', 'sea': '#3c9992', 'sea blue': '#047495', 'sea green': '#53fca1', 'seafoam': '#80f9ad', 'seafoam blue': '#78d1b6', 'seafoam green': '#7af9ab', 'seaweed': '#18d17b', 'seaweed green': '#35ad6b', 'sepia': '#985e2b', 'shamrock': '#01b44c', 'shamrock green': '#02c14d', 'shit': '#7f5f00', 'shit brown': '#7b5804', 'shit green': '#758000', 'shocking pink': '#fe02a2', 'sick green': '#9db92c', 'sickly green': '#94b21c', 'sickly yellow': '#d0e429', 'sienna': '#a9561e', 'silver': '#c5c9c7', 'sky': '#82cafc', 'sky blue': '#75bbfd', 'slate': '#516572', 'slate blue': '#5b7c99', 'slate green': '#658d6d', 'slate grey': '#59656d', 'slime green': '#99cc04', 'snot': '#acbb0d', 'snot green': '#9dc100', 'soft blue': '#6488ea', 'soft green': '#6fc276', 'soft pink': '#fdb0c0', 'soft purple': '#a66fb5', 'spearmint': '#1ef876', 'spring green': '#a9f971', 'spruce': '#0a5f38', 'squash': '#f2ab15', 'steel': '#738595', 'steel blue': '#5a7d9a', 'steel grey': '#6f828a', 'stone': '#ada587', 'stormy blue': '#507b9c', 'straw': '#fcf679', 'strawberry': '#fb2943', 'strong blue': '#0c06f7', 'strong pink': '#ff0789', 'sun yellow': '#ffdf22', 'sunflower': '#ffc512', 'sunflower yellow': '#ffda03', 'sunny yellow': '#fff917', 'sunshine yellow': '#fffd37', 'swamp': '#698339', 'swamp green': '#748500', 'tan': '#d1b26f', 'tan brown': '#ab7e4c', 'tan green': '#a9be70', 'tangerine': '#ff9408', 'taupe': '#b9a281', 'tea': '#65ab7c', 'tea green': '#bdf8a3', 'teal': '#029386', 'teal blue': '#01889f', 'teal green': '#25a36f', 'tealish': '#24bca8', 'tealish green': '#0cdc73', 'terra cotta': '#c9643b', 'terracota': '#cb6843', 'terracotta': '#ca6641', 'tiffany blue': '#7bf2da', 'tomato': '#ef4026', 'tomato red': '#ec2d01', 'topaz': '#13bbaf', 'toupe': '#c7ac7d', 'toxic green': '#61de2a', 'tree green': '#2a7e19', 'true blue': '#010fcc', 'true green': '#089404', 'turquoise': '#06c2ac', 'turquoise blue': '#06b1c4', 'turquoise green': '#04f489', 'turtle green': '#75b84f', 'twilight': '#4e518b', 'twilight blue': '#0a437a', 'ugly blue': '#31668a', 'ugly brown': '#7d7103', 'ugly green': '#7a9703', 'ugly pink': '#cd7584', 'ugly purple': '#a442a0', 'ugly yellow': '#d0c101', 'ultramarine': '#2000b1', 'ultramarine blue': '#1805db', 'umber': '#b26400', 'velvet': '#750851', 'vermillion': '#f4320c', 'very dark blue': '#000133', 'very dark brown': '#1d0200', 'very dark green': '#062e03', 'very dark purple': '#2a0134', 'very light blue': '#d5ffff', 'very light brown': '#d3b683', 'very light green': '#d1ffbd', 'very light pink': '#fff4f2', 'very light purple': '#f6cefc', 'very pale blue': '#d6fffe', 'very pale green': '#cffdbc', 'vibrant blue': '#0339f8', 'vibrant green': '#0add08', 'vibrant purple': '#ad03de', 'violet': '#9a0eea', 'violet blue': '#510ac9', 'violet pink': '#fb5ffc', 'violet red': '#a50055', 'viridian': '#1e9167', 'vivid blue': '#152eff', 'vivid green': '#2fef10', 'vivid purple': '#9900fa', 'vomit': '#a2a415', 'vomit green': '#89a203', 'vomit yellow': '#c7c10c', 'warm blue': '#4b57db', 'warm brown': '#964e02', 'warm grey': '#978a84', 'warm pink': '#fb5581', 'warm purple': '#952e8f', 'washed out green': '#bcf5a6', 'water blue': '#0e87cc', 'watermelon': '#fd4659', 'weird green': '#3ae57f', 'wheat': '#fbdd7e', 'white': '#ffffff', 'windows blue': '#3778bf', 'wine': '#80013f', 'wine red': '#7b0323', 'wintergreen': '#20f986', 'wisteria': '#a87dc2', 'yellow': '#ffff14', 'yellow brown': '#b79400', 'yellow green': '#c0fb2d', 'yellow ochre': '#cb9d06', 'yellow orange': '#fcb001', 'yellow tan': '#ffe36e', 'yellow/green': '#c8fd3d', 'yellowgreen': '#bbf90f', 'yellowish': '#faee66', 'yellowish brown': '#9b7a01', 'yellowish green': '#b0dd16', 'yellowish orange': '#ffab0f', 'yellowish tan': '#fcfc81', 'yellowy brown': '#ae8b0c', 'yellowy green': '#bff128'} seaborn-0.6.0/setup.py000066400000000000000000000070431254425553400146750ustar00rootroot00000000000000#! /usr/bin/env python # # Copyright (C) 2012-2014 Michael Waskom import os # temporarily redirect config directory to prevent matplotlib importing # testing that for writeable directory which results in sandbox error in # certain easy_install versions os.environ["MPLCONFIGDIR"] = "." DESCRIPTION = "Seaborn: statistical data visualization" LONG_DESCRIPTION = """\ Seaborn is a library for making attractive and informative statistical graphics in Python. It is built on top of matplotlib and tightly integrated with the PyData stack, including support for numpy and pandas data structures and statistical routines from scipy and statsmodels. Some of the features that seaborn offers are - Several built-in themes that improve on the default matplotlib aesthetics - Tools for choosing color palettes to make beautiful plots that reveal patterns in your data - Functions for visualizing univariate and bivariate distributions or for comparing them between subsets of data - Tools that fit and visualize linear regression models for different kinds of independent and dependent variables - Functions that visualize matrices of data and use clustering algorithms to discover structure in those matrices - A function to plot statistical timeseries data with flexible estimation and representation of uncertainty around the estimate - High-level abstractions for structuring grids of plots that let you easily build complex visualizations """ DISTNAME = 'seaborn' MAINTAINER = 'Michael Waskom' MAINTAINER_EMAIL = 'mwaskom@stanford.edu' URL = 'http://stanford.edu/~mwaskom/software/seaborn/' LICENSE = 'BSD (3-clause)' DOWNLOAD_URL = 'https://github.com/mwaskom/seaborn/' VERSION = '0.6.0' try: from setuptools import setup _has_setuptools = True except ImportError: from distutils.core import setup def check_dependencies(): install_requires = [] # Just make sure dependencies exist, I haven't rigorously # tested what the minimal versions that will work are # (help on that would be awesome) try: import numpy except ImportError: install_requires.append('numpy') try: import scipy except ImportError: install_requires.append('scipy') try: import matplotlib except ImportError: install_requires.append('matplotlib') try: import pandas except ImportError: install_requires.append('pandas') return install_requires if __name__ == "__main__": install_requires = check_dependencies() setup(name=DISTNAME, author=MAINTAINER, author_email=MAINTAINER_EMAIL, maintainer=MAINTAINER, maintainer_email=MAINTAINER_EMAIL, description=DESCRIPTION, long_description=LONG_DESCRIPTION, license=LICENSE, url=URL, version=VERSION, download_url=DOWNLOAD_URL, install_requires=install_requires, packages=['seaborn', 'seaborn.external', 'seaborn.tests'], classifiers=[ 'Intended Audience :: Science/Research', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3.3', 'Programming Language :: Python :: 3.4', 'License :: OSI Approved :: BSD License', 'Topic :: Scientific/Engineering :: Visualization', 'Topic :: Multimedia :: Graphics', 'Operating System :: POSIX', 'Operating System :: Unix', 'Operating System :: MacOS'], ) seaborn-0.6.0/testing/000077500000000000000000000000001254425553400146345ustar00rootroot00000000000000seaborn-0.6.0/testing/deps_minimal_2.7.txt000066400000000000000000000001541254425553400204240ustar00rootroot00000000000000ipython-notebook nose numpy=1.6.2 scipy=0.11.0 matplotlib=1.1.1 pandas=0.12.0 statsmodels=0.5.0 patsy=0.2.1 seaborn-0.6.0/testing/deps_modern_2.7.txt000066400000000000000000000001061254425553400202570ustar00rootroot00000000000000ipython-notebook nose numpy scipy matplotlib pandas statsmodels patsy seaborn-0.6.0/testing/deps_modern_3.3.txt000066400000000000000000000001061254425553400202540ustar00rootroot00000000000000ipython-notebook nose numpy scipy matplotlib pandas statsmodels patsy seaborn-0.6.0/testing/deps_modern_3.4.txt000066400000000000000000000001001254425553400202470ustar00rootroot00000000000000ipython-notebook nose numpy scipy matplotlib pandas statsmodels seaborn-0.6.0/testing/matplotlibrc000066400000000000000000000000161254425553400172500ustar00rootroot00000000000000backend : Agg